> but that's a matter of convention & taste, with no bearing on the correctness ...

tagrun · on May 4, 2022

Unlike the example in the link you give, η isn't a generic random name like a or x that can mean anything. If you ever read a paper on stochastic gradient optimization, you'd know that η means learning rate in that context.

It is bikeshedding because it is analogous to insisting that using "angle" instead of "θ", or "radius" instead of "r" in a 2D geometry library is superior and takes your code from being lackluster to something that shines (in the words of the original author), while not having anything useful to say about the mathematical/technical aspects of the code itself.

Here is the definition of bikeshedding:

> The term was coined as a metaphor to illuminate Parkinson’s Law of Triviality. Parkinson observed that a committee whose job is to approve plans for a nuclear power plant may spend the majority of its time on relatively unimportant but easy-to-grasp issues, such as what materials to use for the staff bikeshed, while neglecting the design of the power plant itself, which is far more important but also far more difficult to criticize constructively. It was popularized in the Berkeley Software Distribution community by Poul-Henning Kamp and has spread from there to the software industry at large.

from https://en.wiktionary.org/wiki/bikeshedding

00ajcr · on May 4, 2022

My interpretation of the point in the blog post was that explicitly spelling out variable names makes APIs and the underlying code much more accessible to a wider audience.

Sure, there'll be a subset of users of these libraries that have read ML textbooks and are familiar with what η means in this context.

Today, many (most?) users of ML libraries will probably not know what η means without looking it up. Adhering to mathematical notation puts up an unnecessary barrier to using the API/code and ultimately limits wider engagement/collaboration.

To attract a bigger slice of the ML community, choosing names that the ML hobbyist can read, understand and use without pause is the better path forward.

tagrun · on May 4, 2022

You are saying most people don't know what η in that context means (=people who likely haven't read a book or a paper on stochastic gradient, and don't know how it actually works), but they would somehow magically figure out what it actually does if we call it "learning_rate" in ASCII letters. How does that work?

FYI, the documentation of the function https://fluxml.ai/Flux.jl/stable/training/optimisers/ explicitly says it is learning rate:

> Learning rate (η): Amount by which gradients are discounted before updating the weights.

so this is already explicit to anyone who reads the documentation. The quibble in the post is about the named parameter.

jstx1 · on May 4, 2022

> How does that work?

You can look up "learning rate" much more easily than you can look up "what is this Greek letter on my screen", followed by "what is the use of this Greek letter in my context", and only then followed by searching for "learning rate".

More importantly, it's possible to know what a learning rate is without knowing what Greek letter it's commonly denoted as. Especially since mathematical notation is so inconsistent across authors. I want less ambiguity in code, not more. Explicit is better than implicit.

Mathematical notation is notorious for being an absolute mess of inconsistencies. Who in their right mind looked at it and went "yep, I want more of this in my source code"?

mattkrause · on May 5, 2022

This depends a lot on the target audience for your code.

For research-focused code, it is likely that whatever you're implementing was initially described in terms of mathematical notation (e.g., in a paper or book). It can be helpful to have variables that unambiguously match that canonical source. In fact, a lot of my Julia code has docstrings containing references/links to the original paper and a comment noting that it uses the notation therein.

This sidesteps the problem where textual descriptions like `learning_rate` can sometimes be ambiguous: is it the original learning rate, or perhaps the current rate after applying some sort of schedule or decay? I think the Flux documentation is pretty close to ideal, in that it's got a symbol you can match against equations (though no reference to them) as well as text that you can search to learn more.
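To make the ambiguity concrete, here is a minimal sketch in Python (not Julia, and not taken from Flux). The class `DecayingRate`, its fields, and the exponential decay rule are all hypothetical names and choices made up for this illustration. It just shows that "the learning rate" can mean two different numbers once a schedule is involved.

```python
from dataclasses import dataclass


@dataclass
class DecayingRate:
    """Hypothetical exponential-decay schedule, for illustration only."""

    initial_learning_rate: float  # the value the user configured
    decay: float = 0.5            # assumed factor applied once per step

    def current_learning_rate(self, step: int) -> float:
        # The rate actually in effect at `step`, after decay has been applied.
        return self.initial_learning_rate * self.decay ** step


schedule = DecayingRate(initial_learning_rate=0.1)
print(schedule.initial_learning_rate)     # the configured value
print(schedule.current_learning_rate(3))  # the value in effect at step 3
```

A bare `learning_rate` could plausibly refer to either of these, which is where a pointer to the paper's notation (or a more specific name) helps.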

tagrun · on May 4, 2022

You are not answering the (rhetorical) question that you quoted, and the answer to your response is already in the paragraph that followed it:

As I said, the necessary keywords for Googling it, along with a brief description, are already present in the documentation.

The quibble here is about the necessity of reproducing all the necessary keywords for an accurate Google search during every single function call.

> Mathematical notation is notorious for being an absolute mess of inconsistencies.

According to whom? What exactly is inconsistent?

nullstyle · on May 5, 2022

> The quibble here is about the necessity of reproducing all the necessary keywords for an accurate Google search during every single function call.

No, that is not the quibble. My quibble is with choosing identifiers that make code less legible when taken by itself. The best code teaches future readers about how it works.

mcabbott · on May 5, 2022

The struct's field name is `eta`, but this is an internal detail. Its constructor takes a positional-only argument, no public name.

The Greek letter is used in the documentation. And the reason is that every optimiser's documentation links to the original paper, and tries to follow that. If the Adam paper calls the two decay rates β1, β2, then staying close to that seems the least confusing option.
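For anyone who hasn't met positional-only arguments, a rough analogue in Python (not Julia, and not Flux's actual source) might look like the sketch below. The class body, the default of 0.1, and the `step` method are assumptions made for illustration; only the ASCII field name `eta` and the positional-only constructor come from the description above.

```python
class Descent:
    """Hypothetical stand-in for a gradient-descent optimiser; not Flux's code."""

    def __init__(self, eta=0.1, /):
        # The "/" makes `eta` positional-only: callers cannot pass it by name,
        # so the parameter name is not part of the public call syntax.
        # The default of 0.1 is an arbitrary choice for this sketch.
        self.eta = eta  # ASCII field name; documented as the learning rate (η)

    def step(self, weight, gradient):
        # Plain gradient descent update: w <- w - η * gradient
        return weight - self.eta * gradient


opt = Descent(0.01)  # the learning rate is supplied by position
print(opt.eta)       # the field is readable without typing Greek

try:
    Descent(eta=0.01)  # rejected: a positional-only parameter has no public name
except TypeError as err:
    print("TypeError:", err)
```

In this sketch the Greek letter only ever appears in comments and documentation, never in anything the caller has to type.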

nullstyle · on May 5, 2022

Perhaps I'm missing your point, but I think you're focussing too much on the specific case that someone who isn't me came up with.

My most general point is that the identifiers we use in our code are almost never just convention or taste when we are sharing that code with anyone else (and for most, "anyone else" includes our future selves). Getting a little more specific, I'm specifically interested in Julia and look forward to working in it further, but I've personally felt pain around scientific/mathematical notation when trying to understand code I've found on GitHub. tagrun dismissing my pain as nonsense and the people who argue for my ilk as perpetrators of bikeshedding is dipshitted. Yeah, I'm probably the asshole for being a college dropout trying to leverage modern scientific computing for my own ends (snoogins), but I'm also willing to bet tagrun is probably the member of a team that talks down to junior members and complains they haven't read enough papers or the right papers to see the magnificence of their code ;).

It's fine to write code that demands a domain expert to understand, but don't pretend like it's good across all dimensions. There are tradeoffs involved.

Personally I find the preponderance of scientific/mathematical notation (whatever you want to call it) in Julia to be cute; it certainly does bind the code to linked papers in a pretty cool way when it all fits together properly. That said, it's a pain in the ass when it doesn't fit together properly, and I've personally had a journey into Julia spoiled due to the frequency at which I had to figure out how to notate something or what word to use when regarding some squiggle I haven't encountered before. I look forward to having a better intuition for the Greek alphabet, but until then Julia will often be harder to read, let alone understand, when compared to Ruby or JavaScript or Go or C# or any other of the roughly dozen programming languages I've worked with and feel comfortable translating between.

mcabbott · on May 5, 2022

> > Learning rate (η): Amount by which gradients are discounted before updating the weights.

> so this is already explicit to anyone who reads the documentation. The quibble in the post is about the named parameter.

As far as I can tell it's a documentation complaint. He has to remember "η" from the line with the signature, past the line "Gradient descent optimizer with learning rate η ...", and a heading "Parameters" until the line quoted which explains this in full.

He says this is the API, but that's inaccurate. The API being explained is that the first positional argument is the learning rate. It's not a keyword argument, so you cannot supply it by name. What variable names are used in the code is private, and in fact the struct's field name is `eta` so that you can access it without typing Greek.

If this makes the top 10 list (even the top 10 list of documentation complaints) then Flux is doing OK. Especially the top 10 list of a guy with a PhD in a mathematical field. (From the sort of university which used to require students to know Latin & Greek, too.)

nullstyle · on May 5, 2022

> Unlike the example in the link you give, η isn't a generic random name like a or x that can mean anything. If you ever read a paper on stochastic gradient optimization, you'd know that η means learning rate in that context.

Why should reading a paper on stochastic gradient optimization be a prerequisite to understanding your idiosyncratic choices for identifiers? The fact of the matter is that I can understand code much better than academic prose. I'll learn the code and then supplement with the paper as needed. By using idiosyncratic identifiers you're gating off your code from people who haven't jumped through the same specific hoops you have and don't have the same mental muscles you have developed.

> It is bikeshedding because it is analogous to insisting that using "angle" instead of "θ", or "radius" instead of "r" in a 2D geometry library is superior and takes your code from being lackluster to something that shines (in the words of the original author), while not having anything useful to say about the mathematical/technical aspects of the code itself.

No. To someone who doesn't have an established mental muscle for mathematical notation, it is analogous to using Thai script to write for an audience that primarily reads English: I can still use Google Translate, but the cognitive load is much higher than it is for members of the audience who read Thai natively. That isn't bikeshedding, that's caring about understandability.

> You are saying most people don't know what η in that context means (=people who likely haven't read a book or a paper on stochastic gradient, and don't know how it actually works), but they would somehow magically figure out what it actually does if we call it "learning_rate" in ASCII letters. How does that work?

I learn from code a whole lot faster than I do from academic prose. I usually start with code then read the whitepapers the code refers to as I go. I learn slower from code that uses symbology I'm not familiar with. In the context of learning a new code base, unfamiliar symbols are bad in several ways.

> so this is already explicit to anyone who reads the documentation. The quibble in the post is about the named parameter.

My quibble is with the assumptions I think you make about what comprises good code quality while at the same time having suffered through code from people who share your attitudes. Naming matters to me in more ways than you're apparently versed with. The article I linked is just one small discussion on naming but not comprehensive by any means. I was linking it more in hopes that you would do further thinking of your own about a pretty wide subject. Just my opinion.

Furthermore, the page on FluxML demonstrates the problem I'm referring to. Just down from where you linked you'll find an entry like `RMSProp(η = 0.001, ρ = 0.9, ϵ = 1.0e-8)` in which `ϵ` is described nowhere in the entry. It's a random symbol that I understand to be used usually in set membership notation, but in this context (the context of some random link some random person posted in the interwebs) I have no clue what it means and thus it is a barrier to my understanding.

ChrisRackauckas · on May 5, 2022

Unicode should not be in public APIs. This is a standard around Julia. Flux is breaking the standard. Yes, it's not a good thing.

mcabbott · on May 5, 2022

The Unicode epsilon isn't in the public API, it's describing the 3rd positional argument.

This was added recently, and for some reason the PR (1840) didn't fix the docs, which is bad. The Optimisers.jl version has an explanation: https://fluxml.ai/Optimisers.jl/dev/api/#Optimisers.RMSProp

nullstyle · on May 5, 2022

Interesting! Do you have any links talking about that standard? I'm super interested in Julia and this seems like a good opportunity to learn something I've been missing so far.

ChrisRackauckas · on May 5, 2022

I'm not sure if/where it's formalized, but it's just generally something that's been enforced throughout Julia's Base, along with many of the package organizations (like SciML among others). It's something that would be mentioned at code review time by most contributors. It's why you don't see Unicode keyword arguments. There are a lot of reasons. I think the best one is that you want the API to be compatible with old terminals you tend to get on HPCs which do not tend to support Unicode. We should probably make it a part of the standard formatter rules or something at this point.
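As a rough idea of what such a rule could check, here is a minimal sketch in Python (not Julia, and not an existing formatter or linter rule). The helper `non_ascii_parameters` and the two example signatures are hypothetical, written only for this illustration. It flags any parameter name in a function signature that contains non-ASCII characters.

```python
import inspect


def non_ascii_parameters(func):
    """Return the parameter names of `func` that contain non-ASCII characters."""
    names = inspect.signature(func).parameters
    return [name for name in names if not name.isascii()]


# Two hypothetical signatures, invented only for this example.
def rmsprop_greek(η=0.001, ρ=0.9, ϵ=1e-8):
    return (η, ρ, ϵ)


def rmsprop_ascii(eta=0.001, rho=0.9, epsilon=1e-8):
    return (eta, rho, epsilon)


print(non_ascii_parameters(rmsprop_greek))  # lists the Greek parameter names
print(non_ascii_parameters(rmsprop_ascii))  # nothing to flag
```

A check like this only looks at names a caller could type, so Greek locals inside a function body that mirror a paper's notation would pass untouched.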

nullstyle · on May 5, 2022

Word. Thank you for the response.