Matlab vs. Julia vs. Python (tobydriscoll.net)
151 points by mbauman on July 3, 2019 | hide | past | favorite | 146 comments


Personally, matlab drives me absolutely up the wall when it comes to ANYTHING other than flipping big matrices around. As a domain-specific tool for linear algebra, I certainly prefer it over R, but as a general purpose tool it makes me want to pull my own teeth out.

It's just not designed to make good pipeline tools that are maintainable, easily tested, and easily refactored, and never was. Its ability to handle things like "easy and sensible string and path manipulations" is ... rudimentary, at best, and a weird pastiche of C, Fortran, Java, and whatever other language was faddish when that feature was added.

I have more or less one real annoyance with the technical content of matlab (one-indexing I can live with):

  size([1])
  ans = [1 1]
which is flatly wrong, and numpy gets it right:

  In[0] np.array([1]).shape
  Out[0] (1,)
Python is actually a general purpose language which has a mature scientific stack, and I feel more secure in my numerical computations there because I can have a robust test suite and command-line entry points into my code that increase my confidence that my code is doing what I think it should, and make it easy to use.

Packaging is more or less a coin-flip. Python packaging is a giant faff; matlab packaging is nonexistent (you have to vendor every dependency yourself) and expensive (your users have to shell out for the toolboxes you use, and/or have the MCR installed and can't edit your code).

I'll maintain matlab when I have to, but I don't enjoy it very much.


One behavior I like about Matlab is that functions are functions. Arguments are passed by value and there's no way a function can modify its arguments (well, except perhaps handle objects, which are less common).

Matlab will try to optimize and avoid a copy if the function does not modify the argument. Matlab is also (sometimes) smart enough so that A=f(A) will modify A in place instead of making a copy.

This is what I expect from a math oriented language. Maintain the illusion of referential transparency but optimize under the hood if possible.
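For contrast, here is a minimal numpy sketch of the opposite semantics (the `scale` helper is hypothetical): arrays are passed by reference, so a function can silently mutate its argument.

```python
import numpy as np

def scale(a, factor):
    # In-place multiply: this mutates the caller's array, because
    # numpy passes array references rather than copies.
    a *= factor

x = np.array([1.0, 2.0, 3.0])
scale(x, 2.0)
print(x)  # the caller's x is now [2. 4. 6.]
```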

Also Matlab has a reasonable JIT compiler. And a good debugger.

I no longer use Matlab but it is a very productive environment for scientific computing (simulations, exploration).


Many numpy functions take an `out` argument (and pandas methods an `inplace` flag) that controls whether the operation happens in place or returns a new array.


It's completely arbitrary whether a numpy function or method modifies the array or not. There's no way to tell except the documentation or trying it out.
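A concrete instance of the ambiguity, using two real numpy spellings of sort:

```python
import numpy as np

a = np.array([3, 1, 2])

b = np.sort(a)  # module-level function: returns a sorted copy, `a` untouched
a.sort()        # method of the same name: sorts `a` in place, returns None
```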


Yeah, I was more than a little surprised to see his perspective - I thought for sure he was going to rave about Julia, pick on Python a little, and skewer Matlab. My graduate thesis advisor forced me to use Matlab to do all of my master’s thesis work, and I found it to be about the most frustrating environment possible, at least for any actual programming. I’m a little shocked to read that anybody uses Matlab outside of a pure academic environment… I guess I just don’t work on sophisticated enough projects?


It's extremely popular in control engineering domains.

You would be hard pressed to find an automobile manufacturer or platform-level supplier that doesn't use Matlab & Simulink all over the place.


The author of this post is in academia, so Matlab outside of academia may be as rare as you think.


I know of a few major trading firms that use MATLAB for a lot of their analysis. Many companies that hire engineers also use MATLAB, as it's sufficient for what they need those engineers to do (the programmers at those companies use different languages), and many engineers come out of their degrees feeling most comfortable with MATLAB for scientific computation.


MATLAB is common at NASA, although Python is starting to be used a lot, too. Julia is really interesting... I'd love for it to become a standard.


I think it is actually prevalent in a lot of industry. They don't make a lot of money off academia with those ~$30 student licenses.


Historically, the free or cheap student licenses were believed to pay them back when those students moved into industry and were inclined to install what they were familiar with. There's still a lot of MATLAB in college teaching curricula.

Part of how I onboard newly graduated colleagues (typically engineers and scientists, not hired as programmers) is to reassure them that they can get a MATLAB license if it will help them get productive quickly. This has happened once or twice. Many of them never bother, as they get busy enough with CAD and basic design work that they don't really find a use for scientific computation. But to an increasing extent, they're willing to make the hop to Python, possibly just because it's a popular buzzword, but in any event, they are able to get themselves up to speed pretty quickly.


Oh I have no doubt that the cheap license is to get folks hooked, but as I said, it is not where they make money.

A single user license and a few toolboxes brings in more cash than all the student licenses my University probably used that year.


It is widely used in the RF signal processing community: communications, radar, that sort of thing.


That's where I used it, although it was a while ago now. Processing results from EMI/EMC testing.


People in my place of work love it for DSP algorithms.


> (...) ANYTHING other that flipping big matricies around.

but... what else is there, in life? Flipping big matrices around is nearly everything I do, and the python stuff seems too cumbersome for me to bother.


What kind of flipping are you talking about? I can transpose a matrix in numpy with X.T. What is cumbersome about this?


In python, I hate hate hate having to do

    import numpy
    import scipy.sparse
    import scipy.sparse.linalg
just to begin writing something.

You cannot create a literal array without calling a function. You cannot concatenate arrays without calling a function, and moreover this function has a different name depending on whether your arrays are full or sparse. The @ notation for matrix products is horrendous (and .dot is even worse). Arrays are not first-class objects of the language, and you have to use an external library. This is made more infuriating by the fact that other, more complex data structures like dictionaries or strings are natively supported, even though they are mostly useless for numerical computation.

Compare the clean matrix flipping in octave

      z = [ kron(speye(rows(y)), x) ; kron(y, speye(cols(x))) ]
to the python monstrosity

      z = scipy.sparse.vstack([
              scipy.sparse.kron(scipy.sparse.eye(y.shape[0]), x),
              scipy.sparse.kron(y, scipy.sparse.eye(x.shape[1]))
          ])


lol don't blame the tools for your ignorance of them

    from scipy.sparse import eye, kron
    from scipy.sparse import vstack as vs

    z = vs([
            kron(eye(y.shape[0]), x),
            kron(y, eye(x.shape[1]))
          ])


It's exactly the same thing that I wrote, isn't it?

The thing that kills my soul is the need for the "vs" function. Why aren't the symbols [] enough?


Because python is a different language from Matlab, and python lists are lists, not 1d arrays?


>which is flatly wrong

Why is it wrong?

Edit: typo


In and of itself, it's not wrong. Where it goes sideways is how Matlab chooses to allow matrix operations against these 1x1 matrices that should otherwise fail, how Matlab chooses to iterate (over columns, unless it's a "row-vector"), and how Matlab chooses to define length (the max dimension).

Amusingly this also leads to things like

    x = 1
    x(end+1) = 2
    x == [1 2]
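For comparison, numpy makes the growth explicit; a rough Python analogue of the snippet above:

```python
import numpy as np

x = np.array([1])
# x[1] = 2 would raise IndexError -- numpy arrays never grow on assignment
x = np.append(x, 2)  # growing requires an explicit call, which copies
print(x)  # [1 2]
```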


I don't see how any of those behaviors are wrong. It's just a standard that MATLAB elects to follow.


a 1-tensor is not a 2-tensor, and scalars have no area.

matlab:

  size(1)
  ans = [1,1]
python:

  In[0]: np.array(1).shape
  Out[0]: ()
0-tensors are not 2-tensors, either. You can read matlab's result as "scalars are matrices that have one row and one column", and that's nonsense -- matlab effectively lacks a scalar numeric type; under the covers, everything is a matrix.
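The rank distinction is easy to demonstrate in numpy, where the three are genuinely different objects:

```python
import numpy as np

scalar = np.array(1)      # rank 0: shape ()
vector = np.array([1])    # rank 1: shape (1,)
matrix = np.array([[1]])  # rank 2: shape (1, 1)

print(scalar.ndim, vector.ndim, matrix.ndim)  # 0 1 2
```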


Arrays are not tensors (except when they are, of course). If you want to talk about (n-dimensional) arrays of numbers why wouldn’t the word “array” be enough?


In the nomenclature I'm used to, scalars, vectors, and matrices are all special cases of tensors (they have rank 0, 1, and 2 respectively).

what numpy calls it is more or less moot.


It's not about what nomenclature numpy uses, it's about the word tensor being used by some people to refer to arrays for no discernible reason at all (except the coolness factor, maybe?).


I guess I don't see your point. I'm a biomedical researcher; we do lots of numerical analysis in my part of the world, and these are the words we use to unambiguously refer to different entities and concepts of linear algebra.

This is not a CS data structures 101 thing, this is a "there are proper names for mathematical entities" thing. I think you're confusing representation and existence, maybe.


> "there are proper names for mathematical entities"

That's my point. Have you ever used a proper tensor? I mean, the mathematical entity: https://en.wikipedia.org/wiki/Tensor


yes, often. This comes up all the time in tissue mechanics, because continuum mechanics turns out to be quite useful for studying the mechanical properties of soft tissue.


In that case I think you can easily see my point. Numerical computing has done pretty well since the fifties using just "multidimensional arrays".

[Edit: You may care about vector algebra but 99.999% of the users of numerical arrays do not. I just think that the proper term in a general discussion about data structures in numerical computing like the one here is “array”.]

[Edit2: in case you’re really not familiar with the abuse of the term, people do now often use the word tensor to talk about multidimensional arrays that have nothing to do with the mathematical entity called tensor. And I don’t think either matlab or python pretend that they have tensors.]


Now _scalars_ not being distinct from 1x1 matrices I do agree is highly problematic. Not having a real 1-tensor, though, works surprisingly well.


Isn't it just a design choice? It's called "MAT"lab after all.


Matlab has its quirks, but I've never come across a better IDE for debugging 'scientific' code/scripts. Seeing current values by hovering over variables, the ability to pause and execute some 'testing' code, or overwrite things and carry on, being able to easily edit arrays/matrices in an excel-like table, having matrix arithmetic that doesn't look like shit when written as code... these are things that I find very useful. I use it in my field (structural engineering) for those reasons alone, even if it might be slower, or more difficult to accomplish certain tasks.


I've used quite a bit of R and Python and I've never touched Matlab. Similarly to your comment - Python has nothing that comes even close to RStudio for working with data. Jupyter, Spyder, PyCharm, VSCode/Atom with data science extensions - none of them are as good.


Agreed. R might not have the breadth of Python, and it's less conventional as a language (procedural, vector based).

It has some key strengths though:

1. RStudio IDE as you note. It's a really great, focused IDE for doing most of the things people do with R.

2. Shiny. Such a well conceived and constructed toolkit for building interactive apps

3. The package ecosystem: lots of really good quality, high performance packages

4. RStudio the company, who contribute a lot to the community - both open source (RStudio IDE, Shiny, tidyverse, ...) and commercial (RSConnect, Package Manager).

From a language design pov I like Julia over Python over R. But for number-heavy computing I prefer the R ecosystem overall.


One of the packages created by RStudio is reticulate, which allows integration with python. They have also recently added some support for python to the IDE:

https://rstudio.github.io/reticulate/articles/rstudio_ide.ht...


On the other hand, base R has to maintain (some degree of?) compatibility with S. Which means that all the strange design choices and weird behaviour in base R have little hope of ever changing. No number of additional packages can fix this.


There is one nice thing about R core's focus on maintaining backwards compatibility, however: code from a decade ago (more often than not) will run without a hiccup on current versions of R.

Related tweets:

https://twitter.com/hrbrmstr/status/1124016682413039616

https://twitter.com/hrbrmstr/status/1122186751987073025


What about Rodeo? From the looks of it, it's getting close. And honestly, at this point I continue to use R because of RStudio and the tidyverse, otherwise I would have ditched the language a long time ago.


> I continue to use R because of RStudio and the tidyverse

Tidyverse is massively overrated if you ask me. The good parts of it (dplyr and ggplot) are nice for interactive work. And that's about it - if you're deploying the code in an application, you're best off sticking to base R as much as possible.


Disregard my previous comment. I looked into it and apparently Rodeo's company was acquired and the IDE became unsupported and is now a dumpster fire.

Also, it used electron.

So long live RStudio, I guess.


You should check out Spyder, included with the Anaconda Python distribution. It includes most of what makes MATLAB a productive IDE. In particular, you can inspect various types of variables (including Pandas tables in a nice grid view) and watch how the contents change as the code executes.


Spyder is good for inspecting, coming close to Matlab (for the rather simple stuff I use it for), but it's lacking on the code editing/browsing/plugin front. VSCode, on the other hand, is pretty good for the latter, but still lacks what I look for in Spyder and Matlab (i.e. if I'm debugging something, I just want a REPL in which I can type with full syntax completion, enter multiline text, etc.). As a result I mostly use both at the same time. Though I wouldn't be surprised if VSCode gets better at some point; on GitHub there are already multiple issues dedicated to this with at least some activity from the team.


I definitely understand this point. Another alternative is Python (with numpy, scipy and matplotlib) using the Pycharm IDE. It has many of the features you're talking about.


PyCharm now has exactly the layout of Matlab:

https://i.imgur.com/epByE8g.png


My (math graduate school) perspective:

MATLAB is adored in academia for a number of reasons. It is easy to make readable small scripts for in-class examples. The debugging feature/IDE is easy to navigate. The school pays for the licenses; there is no overhead work to compile or download packages (unless you want to do something 'fancy').

I took a ML course that was taught in Python. All my Numerical Analysis and Modeling courses relied on MATLAB for examples and homework. I (as a programmer outside of just the math world) picked Julia for research. Now I do much more theoretical research, as I did not enjoy mixing coding and mathematics.

A fellow student, developing PDE solvers in FORTRAN, was told by a mentor to get it working in MATLAB first and then move on to faster languages.

Happy to answer any questions :)


The biggest gripe I have with matlab is that it teaches absolutely horrendous programming habits. Most of the graduate students who only used/learned matlab in their studies (I'm in an engineering field) program everything into one big script which they copy around and change a couple of parameters in. Now obviously you can do things properly in matlab; it's just that matlab's structure encourages you not to (who thought that one function per file and no namespaces were a good idea?!). Students who used Python, on the other hand, typically have much better programming habits (not necessarily good ones); I think because of its roots as a programming language first, you get much more exposed to programming paradigms when you learn it.

The other thing I found weird was the complaint about matrix objects being deprecated. After the addition of the '@' operator it is really the same as matlab, except matlab's default is a matrix, while for numpy it's an array. As a side note, the author complains about '@'; what about matlab's stupid decision to use the most easily overlooked ASCII character for distinguishing between matrix and elementwise operations? In my experience, almost all the time a matlab calculation returns weird or garbage results, the bug hunt is a game of "find the missing '.'".
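Side by side in numpy, where the two products at least look different:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

C = A @ B  # matrix product      -> [[19, 22], [43, 50]]
D = A * B  # elementwise product -> [[ 5, 12], [21, 32]]
```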


> This [Julia] is the first language I’ve used that goes beyond ASCII.

Python 3 has supported unicode variable names for 12 years. Not all of unicode is permitted, but all the useful bits are.
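For example, plain Greek letters (though not the subscripted forms) are accepted as Python 3 identifiers:

```python
# Plain Greek letters are valid Python 3 identifiers:
α = 0.05
Δx = 2.0
μ = α * Δx
print(μ)
```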


Subscripts and superscripts are probably some of my most-used unicode identifiers. Supported in Julia but not Python 3.

    In [1]: α₁ = 1
      File "<ipython-input-1-3c2973844bb9>", line 1
        α₁ = 1
         ^
    SyntaxError: invalid character in identifier


Out of curiosity, how do you physically enter these subscripts? It seems a standard keyboard would be somewhat limiting.


It's just a simple tab completion in the REPL and usual editors, see: https://news.ycombinator.com/item?id=20346931


Why do you prefer using subscripts rather than simple ASCII numerals in variable names?


It's not a big thing, and obviously it's a matter of taste, but I find it improves readability slightly. It's most notable if I'm doing work based on literature with standard notations (e.g., α₀ instead of alpha_0), but it can also be really nice to help keep track of different states of the same value, e.g., loc₀ vs. locᵢ and locₖ inside loop(s) or with staggered indexing computations.


Inline? I dunno, shouldn't the thinking be separated from the showing? IOW, of course python can display sub/superscript, but using it in the code doesn't feel necessary.


Yes, absolutely "inline" as identifiers in your code. So many algorithms use subscript or superscript notations, and it's so nice to be able to use names like vₓ or H₀ or χ² or Aᵀ directly.


Nice as long as you are only reading a paper printout of the code and never have to write or edit it yourself, since your keyboard is never going to be set up for typing someone else’s choice of weird symbols.


That's also not true. Julia itself defines the tab completions for all these characters — and they'll be very familiar to you if you've ever used latex. They're just a tab away in every editor worth its salt.

    α₁  \alpha<TAB>\_1<TAB>
    vₓ  v\_x<TAB>     
    H₀  H\_0<TAB>
    χ²  \chi<TAB>\^2<TAB>
    Aᵀ  A\^T<TAB>
When I went to do the python demo above, IPython also tab-completed the `\alpha`... but the easiest way for me to get the ₁ was to type it in my Julia editor.


So now anyone who might want to work with your program has to be trained in LaTeX and the Greek alphabet, and has to use a specialized IDE/editor.

Fair enough in some contexts, but I don’t find that to be the nicest general-audience feature. YMMV.


Folks who want to work with my program are going to know Julia. These tab completions aren't limited to editors/IDEs, they also work at the REPL. Just copy-paste the character you didn't know into Julia's help prompt:

    help?> α
    "α" can be typed by \alpha<tab>
Code is for reading much more than writing — especially for a new person. I maintain that matching the canonical form of the algorithm I'm implementing will help them gain their feet faster.

Good example:

https://github.com/JuliaStats/Distances.jl/blob/c21aab0fae30...

vs.

https://www.npmjs.com/package/haversine-geolocation#introduc... (note the mathematical formula on that page and the JS code compared to Julia's)


The use of λ₁ and φ₂ instead of l1 and a2 or x1 and y2 or longitude1 and latitude2 is cute, but really not that big a practical improvement.

Sure someone can pattern-match the code to the already-derived line in a reference book somewhere, but that doesn’t help at all with reasoning about the geometrical relationships or developing new algorithms, following control flow, building abstractions, improving precision, handling edge cases, ...

It’s mostly helpful when you want to treat your formulas as an externally defined black box.

* * *

Aside: The problem in this particular example is that spherical coordinates and spherical trigonometry are just not a very good formalism for calculating anything, in either theory or practice. Unfortunately cartography, geodesy, etc. are bound by tradition and there hasn’t been much effort to switch them to better tools.

I’ve been trying to read a spherical trigonometry textbook (Todhunter, 1878) the last few days and following along is a huge pain.

Much better is to switch to cartesian coordinates or stereographically projected coordinates, and then use vector methods (and skip writing explicit coordinates in your code to the extent possible). All of the proofs and derivations get nicer, with geometrically meaningful steps and conclusions. Now your points can just be called p and q or a and b (or if you have a lot of them and aren’t pressed for space in each expression, points[i] and points[i+1]), and the coordinates stay internal.

In addition to using clearer code, the calculations will also be faster, more precise, with better numerical stability, take up less memory/bandwidth, ...

Here’s some general math for an arbitrary-dimensional sphere; for just the 2-sphere everything is simpler. http://geocalc.clas.asu.edu/pdf/CompGeom-ch3.pdf


Good grief man. I literally just grepped through my installed packages for '₁', looked for a relatively interesting line with lots of unicode, saw "haversine" and realized it was a simple formula others would understand and that I would be able to find an image of online, and then hit google images looking for it.

The point isn't the algorithm itself. The point is just how using unicode allows you to match the style of an arbitrary algorithm out of a textbook.


Copying random formulas out of a textbook is only a tiny part of writing programs, at least for me. Making this have slightly less friction just isn’t that big a practical improvement for me. YMMV.

In general I see code or explanations relying on Greek letters as no better than ones with English words for names.


From hours and hours of studying math, those symbols are imbued with implicit meaning - in a way, they are simultaneously a data type, a name, and often a purpose or context (e.g. delta-x).

That's a lot of information for such a compact representation. And that's mostly unconscious, which is great. I really found the Julia example above to be far, far easier to understand than the linked JS (though I think that JS was not particularly great).

To me, it seems like it'd be nice-to-have, but alone, I don't think it would be enough to make me switch languages.


Isn't the conversation about using Unicode variables as opposed to ASCII variables and how they are useful when implementing equations from papers and textbooks? How is spherical coordinates not being as useful as cartesian coordinates relevant at all (not that I agree with this opinion)? There are tons and tons of other equations where Unicode letters are useful and the specifics of spherical vs cartesian is not at all relevant to the usefulness of Unicode letters.


> So now anyone who might want to work with your program has to be trained in LaTeX

I mean everyone who uses my python code is going to have to be trained in pip, virtualenv, mypy, sphinx, and git. Anyone editing code written by and for mathematicians or scientists is going to know LaTeX, it's like the html of academia.


How does "write \alpha to get the α symbol" remotely compare to being trained in LaTeX?

And as pointed out, you don't need any IDEs, this is even supported in the REPL. And using a specialized editor for a language/environment is hardly unusual (not that it's needed).


In that case the "general audience" is mathematically literate people who all know the Greek alphabet already. They already use Greek letters elsewhere, and they will be happy to find they can still do that from within Julia.


Mathematica allows non-ASCII variable names. Odd the author didn't mention that, since Mathematica is pretty common in academia.

I learned Mathematica and MATLAB in my math and science courses (Physics) and was going to learn Java before I dropped Computer Science. Interesting I could probably replace all those with Julia now.


Indeed, and for a little longer than 12 years!

Flipping through a few of his notebooks, I guess he's from the part of academia that was core Matlab-land, engineering-style number-crunching. Mathematica is mostly people doing symbolic things, and people doing lots of stats are yet another story.


I also wondered about that (I still remember a friend of mine, after he started using Python 3: "I can use Greek letters for variables now"). It seems like Python needs more experience than porting a numerics textbook...

Besides that: how do you actually input a significant number of Greek letters without consulting a layout picture or Wikipedia?


If you're on Linux, you can set up your compose key map to do the job. here's an example: https://gist.github.com/carlobaldassi/8951743


Matlab sells its onerously expensive licenses by marketing itself as having unbeatable numerics performance. This is mostly a farce.

The vast majority of Matlab's vaunted numerics performance comes from using MKL instead of OpenBLAS. However, Intel has made MKL free of charge, meaning that you can easily build NumPy on top of it. NumPy+MKL will compile down to virtually identical assembly as Matlab.

There's very little reason in this day and age to pay Mathworks such an insane licensing fee.
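If you want to verify which BLAS a given numpy build actually linked against (the claim above hinges on it), numpy can report it:

```python
import numpy as np

# Print the BLAS/LAPACK libraries this numpy build was linked against
# (MKL, OpenBLAS, Accelerate, ...).
np.show_config()

# Large dense matmul performance is dominated by that backend (dgemm):
n = 500
A = np.random.rand(n, n)
C = A @ A
```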


> Matlab sells its onerously expensive licenses by marketing itself as having unbeatable numerics performance.

Not at all. At Lockheed (probably one of Mathworks' biggest customers) the big use case is the toolboxes. Scientific/engineering packages like signal processing, radar, phased array, and embedded/VHDL are second to none and are used daily. Some sites also use a lot more of the Simulink side for modeling & simulation.

[0] https://www.mathworks.com/help/signal/ref/signalanalyzer-app...

[1] https://www.mathworks.com/help/comm/examples/rf-satellite-li...

[2] https://www.mathworks.com/help/fusion/examples/multi-patform...

[3] https://www.mathworks.com/products/hdl-coder.html


Only some of the packages are 2nd to none, but as you note some of them like Simulink have large and dedicated userbases. I've heard very mixed results from the VHDL stuff, but some people love it.


Matlab's hook is the domain specific toolkits which don't have polished open source equivalents.


Which ones? They seem pretty overpriced for what you get. IMO Mathematica is a better bet in most cases if you don't want Python or Julia.


A lot of them are very domain specific, like controls engineering and stuff like that.

If I recall, a big chunk of the code that controls Toyota engines is generated from a massive Simulink model.


Simulink is probably the biggest.


Ah. That I would agree with. The rest of the packages don't impress me as much.


For what it's worth, OpenBLAS is not significantly slower than MKL, at least for level-3 linear algebra on x86 from AVX2 downwards. I don't know whether it supports SKX well now, but otherwise BLIS does. (Also, OpenBLAS and BLIS are infinitely faster than MKL on POWER and ARM, where MKL isn't available.) I don't understand "compiling down" -- the linear algebra performance is determined by the library, which is, or could be, the same in each case.

[I agree about the marketing aspects and the huge waste of university resources buying into that rather than funding development of free software.]


The choice of language usually comes down to the packages. In any of the three aforementioned languages one can easily and quickly manipulate matrices short of an unwillingness to learn. Julia is nice because it's fast with native code. Python is nice because of Scipy. Matlab is nice because it decides how to spend your money without cause.

I'm an AI researcher / practitioner. For me code accompanying papers is very useful and usually this code is in Python. Occasionally it's Matlab but let's be honest, who cares about those papers :). I'd love to use Julia but the package support just isn't there. Ironically people like me are supposed to be writing this code but with a demanding job and a family it's not likely I will be improving their DataFrame effort anytime soon.

Anyway the MAIN reason I use open source software is because if it isn't working correctly I simply fix the code myself. This isn't possible in the proprietary world. Why would you trust your research or production work with code you can't see and edit?

There's been a lot of talk about documentation. Docs are secondary sources, like WIRED; read the code if you're serious about being correct. Even (especially) hired hands make mistakes and fail to write good tests.

This article reminded me of the Simpsons gag headline "Old Man Yells at Cloud". It's funny, and he may have a point, but it has no relevance.


Matlab is a calculator. It is a really nice calculator for some things, but it's definitely a calculator with programming-language features bolted on. I strongly believe that Matlab is unsuitable for writing most software. It is nonetheless extremely popular in some engineering fields for write-only scripts.

I'll never forget when I took a controls class and we were given an option to use Python on our own, or matlab with guidance and support from the professor and TAs. I chose Python since my background was slightly different than most of the students', but almost everyone chose matlab. It was highly amusing to watch the whole lab suffer for days because no one understood matlab's semantics. (The base framework had been written by some expert who retired and made extensive use of both handle and value classes. Problem was, no one still involved with the class knew what the difference was.)

Meanwhile, I spent about 6 seconds longer writing out np.dot a few times.

Matlab is good for math. Most software (even math-heavy stuff, EM simulations, etc.) has little math (in terms of source, not execution time).


"MATLAB is the BMW sedan of the scientific computing world. It’s expensive, and that’s before you start talking about accessories (toolboxes). You’re paying for a rock-solid, smooth performance and service..."

mmmmhmmmm...


I do scientific programming in both Python and Matlab. The two things that to me are major benefits of Matlab are the ease of setting up your installation, and the documentation. The Matlab documentation is amazing when compared to the numpy/scipy documentation, and is almost reason alone for a beginner to use Matlab. FWIW, the Mathematica documentation is also fantastic.


Where is the discussion of R? You talk of scientific computation and don't speak of R? That's an oversight, given that the majority of the scientists I know have used R. There's also Stata, which economists love, and which can do some things in my workflow much quicker than R. There is also a huge contingent of analysts that uses SAS, especially in healthcare and finance.

The car analogies in the blogpost are not particularly useful... Why do people feel the need to dumb down a topic with off-the-wall analogies? Talking about Julia like it's Tesla is laughable. Tesla is a huge innovator, Julia is another tool that does similar things to the other tools. The apt analogy for Julia would be a new ICE company, not a new EV company.


"Julia is a huge innovation; a Tesla is another car that does similar things to the other cars."


Yes, but the majority of computational scientists do not use statistics - something that may be hard to believe.

As an example, most physics programs don't even have an introductory statistics class in their curriculum.

Some engineering disciplines use it more, and it is relied on a lot in industry. But most engineering research makes little to no use of it.


"If your experiment needs statistics, you ought to have done a better experiment." Lord Rutherford (maybe)


Except there's a whole field of statistics called Design of Experiments, which describes the mathematical basis for how to create a good experiment :-)


In physics you often have the luxury, not always available in other disciplines, of having a good theory :-)


Oh, I know (I have a PhD in physics)--but I'd say the main reason statistics isn't used in most fields of physics isn't because it wouldn't help, but rather because most physicists don't know much about statistics.

And publication-wise, it's a bit of a chicken and the egg problem: since most reviewers are also not well-versed in statistics, they are less likely to accept a paper that uses statistical methods to confirm or reject a hypothesis. Thus, there's no institutional pressure for a physicist to learn statistics.

(And a graduate course in statistical mechanics does not count, saying that as someone who used to use that excuse)


I agree with you, but IMO this can be a good thing sometimes. Avoiding statistics to some extent requires more deterministic experiments, where the statistics are limited to quantum uncertainty or Gaussian noise. Otherwise there is probably some unknown mechanism to be revealed, and that's exactly what the community expects physicists to do.


There's no such thing as a deterministic experiment. There are so many sources of uncertainty in all fields of experimental physics, and statistics can help quantify and even design better experiments. It's just a shame statistics has never become a part of the standard physicist's education.


I agree that statistics can help a lot in experiments and should be part of the standard education. My point here is that when physicists encounter uncertainty, the current culture encourages them to design a better experiment with deterministic theory to reduce the error, rather than to analyse the results statistically. It would be better to have both, but focusing on the former isn't as bad as you might think, at least in some fields I'm familiar with.


>And publication-wise, it's a bit of a chicken and the egg problem: since most reviewers are also not well-versed in statistics, they are less likely to accept a paper that uses statistical methods to confirm or reject a hypothesis. Thus, there's no institutional pressure for a physicist to learn statistics.

Not to mention that if experimentalists actually used statistics, they would have to describe their experiment in more detail when publishing for their statistical analysis to make sense. And in my experience, they really prefer to list as little about their experimental setup as they can get away with.

>And a graduate course in statistical mechanics does not count, saying that as someone who used to use that excuse

Heh heh. Most physicists I know would invoke "But we use probability in quantum mechanics all the time!"


> Most physicists I know would invoke "But we use probability in quantum mechanics all the time!"

I have absolutely heard that one before :-)


I think a variant from A Random Walk in Science was "The experiment failed, so we had to use statistics", which I appreciate up to a point. However, I'd hope that physicists calculate error estimates in some sense, and goodness of fit. You hear the statistical significance of results from CERN, for instance.


There is a lot more to scientific computation than statistics, and the only thing R is good at is statistics.


It's quite popular in bioinformatics as well: https://www.bioconductor.org/


Unfortunately the quality of the Bioconductor packages varies wildly, from pretty good to downright abysmal.

Some can't even work: take a look at this one[1], which is part of the current release of BioC: it will never work because the hardcoded host there is no longer functioning, and the author wrote it for their master's thesis and has since moved on.

In addition, there's no centralized bug reporting, and some packages use GH issues while for others you may need to email the author and hope for the best (sometimes works, sometimes doesn't).

I use BioC every day because I have no other alternatives, but its main advantage is the sheer number of packages available for almost every bioinformatic niche, rather than the quality of the packages themselves.

[1] https://github.com/AllenTiTaiWang/anamiR/blob/7a7a133c553f8c...


That's not a true statement. R has packages for a wide variety of fields--statistics is just the most well developed.

And an important aspect of scientific computation is data visualization, an area in which R is decidedly more advanced than other languages.


One thing I can think of is that R is not well suited for computationally expensive work --- by "computationally expensive" I mean something that takes months to run on a cluster. Not only is R 100--1000x slower than Fortran, but it also does not have strong support for parallelism or GPU computing. I have yet to see a computational fluid dynamicist using R. Though Python is also not much better in that aspect, it can be used as a glue language to interact with native C/C++/Fortran code under the hood.
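To make the "glue language" point concrete, here is a minimal sketch of calling a compiled C library from Python via the standard-library ctypes module. It assumes a Unix-like system where the C math library resolves via `find_library("m")`; the lookup may fail elsewhere.

```python
import ctypes
import ctypes.util

# Load the C math library and describe the signature of cos() so
# ctypes marshals doubles correctly in both directions.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

result = libm.cos(0.0)  # calls the native implementation directly
```

In practice the heavy lifting in numpy/scipy happens through exactly this kind of binding to native BLAS/LAPACK code, just with more machinery around it.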


R is actually the basis of one of the CORAL2 benchmark sets, and there are published scaling results to 5000 cores as far as I remember. I've not done it, but presumably you can substitute its BLAS with a GPU version, the same as with an OpenMP-parallelized one to get performance for anything linear algebra-ish.

[HPC clusters on which you'd run CFD typically don't let you run for more than days, let alone months. There's probably no reason you couldn't use R the same way as Python for such things.]


R has interfaces for Fortran, C, and C++ as well. Quite good ones, in fact.


You are so wrong. R is a godsend for earth scientists. Not even Python has as many packages/libraries for earth science as R does.


I've a lot of experience with both MATLAB and Python. I find Python to be the better designed language, with fewer rough edges. But MATLAB still promotes and encourages the most productive approach to programming, one that NumPy just falls short of.


For exploratory work MATLAB and the MATLAB IDE are pretty good and shorter scripts are fine. However once you start trying to write actual programs and pass 1-2 kLOC or so, MATLAB just gets more and more painful.


And "MATLAB packages" are especially painful. The fact that you "install packages" by adding folders to the path is somewhat insane, especially since it's a silent global change. I think the hardest thing to do with MATLAB is get someone else's scripts running, because it's always "oh, I forgot to give you my special plot function which I keep in this folder, but it requires other things, so just add this whole thing to your global state, but it's incompatible with XXX since I use a function named A". Remembering those incidents makes me shed a tear... while Python and Julia let you import without exposing every internal name, and you're done. And in Python and Julia, people add continuous integration tests to packages.
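The contrast with MATLAB's path-based global state is that a Python import binds only the names you ask for, a small illustration using the standard library:

```python
# Importing a module binds exactly one name; none of the module's
# internals leak into your namespace unless you opt in per-name.
import math

y = math.sin(0.0)        # qualified access through the module name
from math import sin     # explicit, per-name opt-in
z = sin(0.0)
```

Nothing here touches any global search path, so two modules can each define their own `plot` without colliding.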


I like to say that Matlab was designed by applied mathematicians who don't want to care about software engineering, and python+numpy+scipy was designed by software engineers who don't want to care about applied mathematics :P

I don't know that that's strictly true but it gives you a general idea for why things are the way they are.


Somehow it's nice to have a global namespace so I don't have to remember which libs have "plot()" or "sind()".

Nothing beats Matlab/Gnu Octave for me when I just want to make some quick calculations.


Julia's "default" behavior (the 'using' keyword) also imports everything (that is exported by the selected module) into the global namespace, using multiple dispatch to safely assign the optimal implementation of each function for each argument set without clashing. It's not unusual in Julia to have one function with more than a hundred implementations, and the user doesn't need to care which one will be chosen.

Though you can also use 'import' to keep the namespaces separate.


It's not really a default though. You have to always choose to do `using` instead of `import`. A lot of people choose `using`, because it's generally more safe than something like MATLAB or Python because of multiple dispatch, but there's nothing in the language that actually defaults to `using` other than the convention of some people.


When you say numpy falls short, do you also include pandas?


Try using R's data.table and then use pandas. Pandas is just awful to use in comparison.


I'm of the same mind, and have been surprised for a long time that this isn't commonly said. dplyr/tidyverse also suits my thinking process around working with data much better than pandas, which is by far the worst of the lot.

h2o.ai has been working on data.table for python (https://github.com/h2oai/datatable). Hope it matures quickly


The author does not mention modern Fortran, which does have array operations, like Matlab, Python/Numpy, and Julia. I wonder if its lack of a REPL is the main reason why.


I suspect doing all this in the context of a numerical analysis textbook has contributed to the author's perspective. One nice thing about Python is that you can build and prototype applications and services around it. This is important, as many applications need numerical analysis. Could you imagine writing your entire application in matlab?


I find it quite ridiculous that a discussion about the matlab language in the context of free software projects does not even mention the excellent interpreter Octave. I say "ridiculous" to be generous; in reality it seems mostly bad faith.


I noticed that, too. I don't understand why Octave is this "dark horse" when it comes to these kinds of things...


Toby Driscoll is the author of a great Matlab library for computing conformal maps, the Schwarz-Christoffel Toolbox

http://www.math.udel.edu/~driscoll/SC/ or on github https://github.com/tobydriscoll/sc-toolbox

But I can’t say I agree with much of this.

> One function per disk file in a flat namespace was refreshingly simple for a small project, but a headache for a large one.

This is an absolutely horrible bonkers limitation for a “small project”. It’s “refreshing” like someone constantly dumping buckets of ice water over your head. Being able to define functions in the repl, export multiple functions from a single file, etc. are things I pretty much can’t live without in an interactive programming language. Needing to make every tiny utility function into its own file adds SO MUCH FRICTION to basic prototyping workflows.

The result is that when working with Matlab I try to make as many functions as possible into lambdas, e.g. square = @(x) x*x. But these are very limited in practice, and the workarounds to make a function work as a lambda often compromise readability, performance, correctness, and functionality.

+ + +

Is an occasional V.conj().T @ D**3 @ V really that much worse than V' * D^3 * V?

I find that in even small programs, syntax complexity and clarity is dominated by logic flow rather than matrix multiplication. My Python programs end up dramatically easier to read than my Matlab programs, including the almost-pure-numerics parts of them.

If concise built-in syntax for numerical operations were our highest ideal, we’d all be using APL.

> exists a matrix class, and yet its use is discouraged and will be deprecated

I think Driscoll doesn’t understand what’s going on here. There is plenty of discussion for someone who searches about the problems with the matrix class. The matrix class dated from a time when there was no @ operator, so ∗ was used for matrix multiplication instead of elementwise multiplication. This made it inconsistent with everything else and artificially limiting in obnoxious ways. Now that Python has an @ operator it is no longer useful or necessary. This has nothing to do with matrices being important or not.
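A quick numpy sketch of the point: with plain ndarrays and the @ operator there's no longer any role for the matrix class, since elementwise and matrix products each get their own operator.

```python
import numpy as np

# On plain ndarrays, * is elementwise and @ (Python 3.5+) is matrix
# multiplication -- the distinction np.matrix existed to paper over.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

elementwise = A * B   # Hadamard product
matmul = A @ B        # true matrix product
```

With both spellings available on the same type, the old inconsistency (an object whose `*` silently meant something different from every other array's `*`) simply disappears.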

> Matplotlib package is an amazing piece of work, and for a while it looked better than MATLAB, but I find it quite lacking in 3D still

YMMV, and maybe I’m spoiled by D3, Vega, Altair, ggplot2, etc., but I really don’t like Matlab’s plotting tools or Matplotlib. They are inflexible and full of arcane details, and produce mediocre output. We should aspire to better plotting than those in all of our environments.

+ + +

> The big feature of multiple dispatch makes some things a lot easier and clearer than object orientation does.

This is partly because Matlab’s version of “object orientation” is a horrendously broken pile of trash.

> Partly that’s my relative inexperience and the kinds of tasks I do, but it’s also partly because MathWorks has done an incredible job automatically optimizing code.

I’ve poked around in several large Matlab projects, and there are huge performance problems everywhere. In particular any project using Matlab’s version of object orientation ends up incurring huge amounts of overhead.

+ + +

Overall my impression is that Julia (and Matlab’s) language choices are driven by people who want to directly type their math paper into a program with as little thought and as few changes as possible.

For folks with decades of experience reading and writing math papers, this is fair enough I guess.

For many people from a software background it seems like a poor choice of priorities.

Programs written by researchers are often unintelligible without the accompanying paper (and sometimes with, depending on the paper). Full of 1-letter variable names defined off-screen somewhere with no comment explaining what it stands for, weird API inconsistencies, lack of structure, hacks that worked on one set of inputs for a demo but don’t handle edge cases, ....


My position is that despite all of Julia's problems (many of the worst being internal) I think it's the best choice of the three. And at least as far as the move from MatLab to Julia that's just a win for all of science.

> This is partly because Matlab’s version of “object orientation” is a horrendously broken pile of trash.

Sure, but multi-methods are also objectively better than single-dispatch object systems. And per Graham's web of power, this is easily demonstrated by the fact that what a multimethod can do in one line, a single-dispatch system requires n^m lines of pattern code to match in expressiveness (n being the number of objects in the hierarchy, m being the number of arguments in the call, sans any metaprogramming features of course).

The fact that Julia can use multi-methods with type inference to improve the compilation of typeless methods to the level of native compiled program speeds is just gravy.
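A hypothetical Python sketch of what a single-dispatch language makes you write by hand: dispatch on the types of *both* arguments via an explicit table. The class and function names here are illustrative, not from any real library.

```python
# Hand-rolled double dispatch: a table keyed on the pair of argument
# types. A multimethod system generates this lookup for you; without
# one, every (type, type) combination is a hand-written case.
class Asteroid: ...
class Ship: ...

_COLLIDE = {
    (Asteroid, Asteroid): lambda a, b: "asteroid-asteroid",
    (Asteroid, Ship):     lambda a, b: "asteroid-ship",
    (Ship, Asteroid):     lambda a, b: "ship-asteroid",
    (Ship, Ship):         lambda a, b: "ship-ship",
}

def collide(a, b):
    return _COLLIDE[type(a), type(b)](a, b)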

> I’ve poked around in several large Matlab projects, and there are huge performance problems everywhere.

A problem python can share, with the worst offenses requiring rewriting the given chunk of code in a different language and then writing bindings for it, and managing the compilation of it.

And that's what Julia works very hard to avoid, through its gradual type system (often allowing a programmer to add a type to a single line of code and have the entire project get 10x performance gains) and multimethods (allowing programmers to optimize specific edge cases without intruding on the formula code).

> Overall my impression is that Julia (and Matlab’s) language choices are driven by people who want to directly type their math paper into a program with as little thought and as few changes as possible.

Yes, but at least Julia does so in a way that allows for professional programmers to easily work with and maintain it. Between multi-methods, optional typing, choose-your-own-starting-array-index, a programmer can modify an existing Julia code base without the language itself being the problem.

While Julia could certainly do better here (they are often hostile to attempts to improve the software engineering story around Julia), they at least have a path forward for a programmer attempting to maintain or optimize their scientific computing code in a sane way. That's something that neither python nor Matlab can really claim.


What do you see as Julia's most egregious problems, internal or otherwise? and internal in particular?


I think the most egregious problem from a holistic perspective is their lack of prioritization for software engineering. Things like packing large amounts of specialized mathematics constructs (and whole libraries like git and markdown) into the main distribution (causing memory size and startup time problems), lack of interfaces or similar abstraction tools (which are strictly speaking not necessary from a functionality perspective, but are useful as organizational and code safety tools), and an inability to tune their garbage collector. Their debugger was also pretty shoddy when I last used it (e.g. it wasn't a priority for many years), but I'm pretty sure they have fixed it significantly by now. And hey they have macros, which just invokes the problems with lisps, but at least you can do whatever you want programming construct wise if you really want to.

"Internally" they have some poor interactions with external developers (meaning people not from MIT). [deleted a rant] In the end I suspect Linus is probably worse to work with, it probably comes with the territory, so it's not like it's a deal breaker.

All of this is to say I agree with this article: http://worrydream.com/ClimateChange/ one of the best things a programmer can do is contribute to tools, specifically including Julia, which advance our ability to understand climate change. It really is in my opinion the best programming language to further scientific advancement. I just wish I had the capability to work with them.


> And there’s zero-indexing (as opposed to indexes that start at 1)

What, and that's exactly the part I dislike about both Matlab and Julia! The amount of "+ 1" and "- 1"s in matlab indices you need when subdividing matrices into multiple equal sized parts is horrible.

Weird, I thought mathematicians would know better what makes sense and what not. Starting with an offset of 1 is clearly what does not make sense, you start at distance 0 from your starting point.
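The "+1 and -1" complaint is concrete in the equal-parts case. With 0-based indices the bounds fall out directly; the 1-based version of the same slice runs from (i-1)*m + 1 through i*m:

```python
# Splitting a length k*m sequence into k equal parts with 0-based
# indexing: part i is a[i*m : (i+1)*m], with no off-by-one shifts.
def chunks(a, k):
    m = len(a) // k
    return [a[i * m:(i + 1) * m] for i in range(k)]
```

The half-open convention also composes: the end of one chunk is literally the start of the next, so no bound ever needs adjusting by one.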


In mathematics, a matrix doesn't have an 'offset' or a 'starting point.' I think that perceiving matrices in those terms is an artifact of thinking of matrices as sitting in an address space, where the elements of your matrix are part of a larger span that can contain other data.

A matrix is a collection of elements and nothing more. Indexing starts at 1 because that's the first element in your matrix, and numbering the first element 1 makes sense. Numbering from 0 makes sense when it represents an offset into something, but a matrix isn't that. 0-based indexing just always feels to me like letting the implementation details leak out. (I don't feel this way when an array actually represents a chunk of memory, rather than a math object.)

The proliferation of +1 and -1 depends on the application. Some work better with 1-indexed, some work better with 0-indexed. Personally I get annoyed at having to use `len-1` too often when working with 0-indexed arrays. This is why some languages like Julia (and FORTRAN apparently?) let you choose your index-base, which... has trade-offs.


> and numbering the first element 1 makes sense

Why?


Ever heard someone talk about their zeroth kid, without making a programming joke? English counts objects one, two, three. I've never heard of a natural language doing otherwise; we would surely translate what speakers say when pointing at two apples as "two".

Mathematicians of course label objects however suits the problem at hand. It's extremely common to write, say, c_0 + c_1 x + c_2 x^2 + ... for a Taylor series starting with a constant, or to start at -1 or -n if that's tidier.


No, but mathematics is not spoken language... the origin of the number line happens to be at 0, not at 1. If you start indexing at 1, you're missing a piece. That may not be noticeable if all you do is index a fixed size thing with a manually chosen index, but as soon as you need to compute with the indices themselves, you notice it, and it's not pretty.

I'm not going to say "zeroeth" in spoken language, but for any serious computation, you're going to be indexing variable sized things and need to compute the indices themselves. And I don't want to make code or formulas less readable due to shifts by 1.


Like I said, what's most readable is problem-dependent. Sometimes it's about modular arithmetic, but sometimes it's about not having to say Γ(n+1) too often.

The number line concerns real numbers, not elements.


This discussion provided me some food for thought. You (and Matlab, and many other like-minded individuals) have chosen to work with natural numbers¹ as indexes, as these provide the most value for the amount of hassle. A 0-based index model stops being optional the moment one is forced to work with the indexes themselves, especially when there are negative indexes to be considered, as we encounter in the real world. What is the chance of going on past the integers, having a proportion or some irrational number² like π as an index?

¹ https://en.m.wikipedia.org/wiki/Natural_number

² https://en.m.wikipedia.org/wiki/Irrational_number


It's how people count things. If you want to know how many sheep or apples are in front of you, you start counting at one, and the number you get up to is the number that there are. If there are i elements of a set, the first element is the 1st element and the last element is the ith element. Doing it otherwise risks fencepost errors.

(There are perfectly reasonable arguments why computer languages might be zero-indexed, whereby objects start at the zeroth offset. But counting from 1 upwards is definitely more natural for counting, it's disingenuous to pretend otherwise).



> thought mathematicians would know better what makes sense and what not

I'm no mathematician, but when I studied linear algebra, I remember index starts from 1 in matrix. I think MATLAB just follows that convention, it's "Matrix lab" after all.


Basically if you do any kind of modulo arithmetic on an index you need to shift by 1 in a 1-based indexing system, while 0-based just works.


Which is why Julia offers the `fld1`, `mod1` and `fldmod1` functions:

https://docs.julialang.org/en/v1/base/math/#Base.fld1
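For readers without Julia at hand, rough Python equivalents of those helpers look like this; they fold the usual "-1 ... +1" shift into the function so 1-based modular index arithmetic reads cleanly at the call site:

```python
# mod1(x, n): like x % n but mapping onto 1..n instead of 0..n-1,
# matching Julia's 1-based wraparound indexing.
def mod1(x, n):
    return (x - 1) % n + 1

# fld1(x, n): flooring division paired with mod1, so that
# x == (fld1(x, n) - 1) * n + mod1(x, n).
def fld1(x, n):
    return (x - 1) // n + 1
```

So wrapping a 1-based index i around a length-n buffer is `mod1(i, n)` rather than the error-prone `(i - 1) % n + 1` scattered through the code.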


What kills me about MATLAB is that its CLI still doesn't have any terminal shortcuts beyond Ctrl+C. No Ctrl+D, no Ctrl+W, no Ctrl+left arrow or Ctrl+right arrow.

It's also a gigantic memory hog and all the plots are copy-on-write, and the memory usually isn't deallocated afterwards.


Specifically the author's mention of extended character support in Julia for math symbols, as well as the emphasis on matrix support, makes me wonder why APL didn't maintain popularity among the academic crowd.


Well...the reason is APL can't do scientific computing well.

It doesn't by default have scientific libraries and builtins for solving equations. It has a fast interpreter, but scientific computing often needs much faster.

It was also late getting off the mainframe.

I really like APL, but it doesn't have the horsepower for my needs.


There is also a completely different but complementary set of tools that handle symbolic calculus, such as Maple and its libre alternative Maxima, that are worth considering.


Mandatory matlab in coursework is usually harmful and should be avoided if possible.

Julia and python are full-featured languages that aren't marketing for mathworks.


I just don't think this guy tried at all to account for his biases, and that sort of makes the entire article hard to believe beyond his MATLAB experiences.


I would dismiss a lot of his comments on Python because they were so biased by his preference of a MATLAB-like syntax.


I think he tried, but didn't succeed. Still, it's interesting to see how things look to a very experienced Matlab user.


Run this python jewel:

exec(''.join(chr(int(''.join(str(ord(i)-8203)for i in c),2))for c in ' '.split(' ')))



