> Cooperative multitasking came out slower than preemptive in the nineties
This wasn't really the reason for the shift away from cooperative multitasking. The real issue is that cooperative multitasking isn't as robust or well-behaved unless you have a lot of control over which tasks are trying to run together.
In theory, cooperative multitasking should have better throughput (latency is another story), because each task can yield at a point where its state is much simpler to snapshot, rather than the system having to record exact register values and handle preemption at arbitrary points.
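To make the yield-point idea concrete, here's a toy round-robin cooperative scheduler sketched with Python generators (all names here are made up for illustration). "Saving state" is just suspending a generator frame at an explicit yield point; there are no registers to capture:

```python
from collections import deque

def task(name, steps):
    # A cooperative task: each `yield` is an explicit, convenient stopping point.
    for i in range(steps):
        yield f"{name} step {i}"

def run(tasks):
    # Round-robin scheduler: resume each task until its next yield point.
    ready = deque(tasks)
    log = []
    while ready:
        t = ready.popleft()
        try:
            log.append(next(t))   # resume until the task yields again
            ready.append(t)       # reschedule it at the back of the queue
        except StopIteration:
            pass                  # task finished; drop it
    return log

print(run([task("A", 2), task("B", 2)]))
# round-robin order: A step 0, B step 0, A step 1, B step 1
```

The scheduler never interrupts a task mid-computation; a task's "context switch" is only ever at a point the task itself chose, which is exactly the snapshot-simplicity argument above.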
... I never meant to imply that performance was the reason for the switch.
We've had a track record of technologies which:
1) Automated things (relieving programmers from thinking about stuff)
2) Were expected to make stuff slower
3) In reality, sped stuff up, at least in the typical case, once algorithms got smart
That's true for interpreted/dynamic languages, automated memory management/garbage collection, managed runtimes of different sorts, high-level descriptive languages like SQL, etc.
Sometimes, it took a lot of time to figure out how to do this. Interpreters started out an order of magnitude or more slower than compilers. It took until we had bytecode+JIT that performance roughly lined up. Then, once we started doing profiling / optimization based on data about what the program was actually doing, and potentially aligning compilation to the individual user's hardware, things suddenly got a smidgeon faster than static compilers.
There is something really odd to me about the whole async thing with Python. Writing async code in Python is super-manual: I'm constantly making decisions which ought to be abstracted away for me, and changing those decisions later is super-expensive. I'd like to write.
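To illustrate the "super-manual" complaint, a minimal sketch (function names are hypothetical): the sync/async decision leaks into every signature and every call site, so flipping it later means rewriting both sides.

```python
import asyncio
import time

def work_sync():
    time.sleep(0.01)              # ordinary blocking call
    return "done"

async def work_async():
    # The async variant needs a different sleep (time.sleep would block
    # the whole event loop), a different signature, and differently-written
    # callers -- none of the decision is abstracted away.
    await asyncio.sleep(0.01)
    return "done"

print(work_sync())                    # plain call
print(asyncio.run(work_async()))      # needs an event loop and `await` plumbing
```

Changing `work_sync` into `work_async` (or back) ripples through every caller, which is the "changing the decision later is super-expensive" part.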
> In reality, sped stuff up ... That's true for interpreted/dynamic languages, automated memory management/garbage collection, managed runtimes of different sorts, high-level descriptive languages like SQL, etc.
None of that is true.
Even SQL, which models declarative work in the form of queries, requires significant tuning all the time.
The rest of the list is egregious.
> things suddenly got a smidgeon faster than static compilers.
> It took until we had bytecode+JIT that performance roughly lined up.
It really didn't. Yes, in highly specialized benchmark situations, JITs sometimes manage to outperform AOT compilers, but not in the general case, where they usually lag significantly. I wrote a somewhat lengthy piece about this, Jitterdämmerung:
Well, if you wanna go that route, in the general case, code will be structured differently. On one side, you have duck typing, closures, automated memory management, and the ability to dynamically modify code.
On the other side, you don't.
That linguistic flexibility often leads to big-O level improvements in performance which aren't well-captured in microscopic benchmarks.
If the question is whether GC will beat malloc/free when translating C code into a JIT language, then yes, it will. If the question is whether malloc/free will beat code written assuming memory will get garbage collected, it becomes more complex.
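One concrete (if standard) illustration of the big-O point above: a memo table that holds references to results of arbitrary lifetime is a one-liner when you assume GC, whereas with malloc/free you would have to decide who owns and who frees every cached value.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # cached results are shared freely; GC manages their lifetime
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # O(n) calls instead of O(2^n), thanks to the shared cache
```

The algorithmic win comes from how naturally the language lets you share structure, not from the allocator's per-call speed, which is the distinction being argued here.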
Objective-C has duck typing (if you want), closures, automated memory management and the ability to dynamically modify code.
And is AOT compiled.
GC can only "beat" malloc/free if it has several times the memory available, and usually also only if the malloc/free code is hopelessly naive.
And you've got the micro-benchmark / real-world thing backward: it is JITs that sometimes do really well on microbenchmarks but invariably perform markedly worse in the real world. I talk about this at length in my article (see above).
> That's true for interpreted/dynamic languages, automated memory management/garbage collection, managed runtimes of different sorts, high-level descriptive languages like SQL, etc.
Of the things you mention, I agree on SQL, and "managed runtimes" is generic enough that I cannot really judge.
I'm thoroughly unconvinced about the rest being faster than the alternatives (and that's why you don't see many SQL servers written in interpreted languages with garbage collection).
Well, I think you missed part of what I said: "at least in the typical case" (which is fair -- it was a bit hidden in there)
There's a big difference between normal code and hand-tweaked optimized code. SQL servers are extremely tuned, performant code. Short of hand-written assembly tuned to the metal, little beats hand-optimized C.
I was talking about normal apps. If I'm writing a generic database-backed web app, a machine learning system, or a video game, then I'm writing normal code. Most of those, when written in C, are finished once they work, or at the very most have some very basic, minimal profiling / optimization.
For most code:
1) Expressing that in a high-level system will typically give better performance than writing it in a low-level system for V0, the stage where I first get to working code (before I've profiled or optimized much). At this stage, the automated systems do better than most programmers do, at least without incredible time investments.
2) I'll be able to do algorithmic optimizations much more quickly in a high-level programming language than in C. With a reasonably time-bounded investment, my high-level code tends to be faster than my low-level code: I'll have the big-O level optimizations finished in a fraction of the time, so I can do more of them.
3) My low-level code gets to be faster once I get into a very high level of hand-optimization and analysis.
Or in other words, I can design memory management better than the automated stuff, but my get-the-stuff-working level of memory management is no longer better than the automated stuff. I can design data structures and algorithms better than PostgreSQL specific to my use case, but those won't be the first ones I write (and in most cases, they'll be good enough, so I won't bother improving them). Etc.
I am sorry to be blunt, but that sounds like a PR statement filled with nonsense.
> If I'm writing a generic database-backed web app
If you are writing a system where performance does not matter, then performance does not matter.
> a machine learning system or a video game. Most of those, when written in C, are finished once they work, or at the very most have some very basic, minimal profiling / optimization.
Wait, what? ML engine backends and high-level descriptions, and video games are some of the most heavily tuned and optimized systems in existence.
> At this stage, the automated systems do better than most programmers do, at least without incredible time investments.
General-purpose JIT languages are so far from being an actual high-level declarative model of computation that the JIT compiler cannot perform any magic of the kind you are describing.
Even actual declarative, optimizable models such as SQL or Prolog require careful thinking and tuning all the time to make the optimizer do what you want.
> 2) I'll be able to do algorithmic optimizations much more quickly in a high-level programming language than in C.
C is not the only low-level AOT language. C is intentionally a tiny language with a tiny standard library.
Take a look at C++, D, Rust, Zig and others. In those, changing a data structure or algorithm is as easy as in your usual JIT one like C#, Java, Python, etc.
> 3) My low-level code gets to be faster once I get into a very high level of hand-optimization and analysis.
You seem to be implying that a low-level language disallows you from properly designing your application. Nonsense.
> I can design memory management better than the automated stuff, but my get-the-stuff-working level of memory management is no longer better than the automated stuff
You seem to believe low-level programming looks like the C kernel code of a college assignment.
> If you are writing a system where performance does not matter, then performance does not matter.
It's not binary. Performance always matters, but there are different levels of value to that performance. Writing hand-tweaked assembly code is rarely a good point on the ROI curve.
> Wait, what? ML engine backends and high-level descriptions, and video games are some of the most heavily tuned and optimized systems in existence.
Indeed they are. And the major language most machine learning researchers use is Python. There is highly-optimized vector code behind the scenes, which is then orchestrated and controlled from Python through toolchains like PyTorch.
> Take a look at C++, D, Rust, Zig and others. In those, changing a data structure or algorithm is as easy as in your usual JIT one like C#, Java, Python, etc.
I used to think that too before I spent years doing functional programming. I was an incredible C++ hacker, and really prided myself on being able to implement things like highly-optimized numerical code with templates. I understood every nook and cranny of the massive language. It actually took a few years before my code in Lisp, Scheme, JavaScript, and Python stopped being structured like C++.
You putting "Python" and "Java" in the same sentence shows this isn't a process you've gone through yet. Java has roughly the same limitations as C and C++. Python and JavaScript, in contrast, can be used as a Lisp.
I'd recommend working through SICP.
> You seem to be implying that a low-level language disallows you from properly designing your application. Nonsense.
Okay: Here's a challenge for you. In Scheme, I can write a program where I:
1) Write the Lagrangian, as a normal Scheme function. (one line of code)
2) Take a derivative of that, symbolically (it passes in symbols like 'x and 'y for the parameters). I get back a Scheme function, and if I pretty-print that function, I get the equation rendered in LaTeX.
3) Compile the resulting function into optimized native code
4) Run it through an optimized numeric integrator.
This is all around 40 lines of code in MIT-Scheme. Oh, and on step 1, I can reuse functions you wrote in Scheme, without you being aware they would ever be symbolically manipulated or compiled.
If you'd like to see how this works in Scheme, you can look here:
That requires being able to duck type, introspect code, have closures, GC, and all sorts of other things which are simply not reasonably expressible in C++ (at least without first building a Lisp in C++, and having everything written in that DSL).
The MIT-Scheme compiler isn't as efficient as a good C++ compiler, so you lose maybe 10-30% performance there. In exchange, you get back a couple of orders of magnitude from (1) being able to symbolically convert a high-level expression of a dynamic system into the equations of motion suitable for numerical integration and (2) compiling that into native code.
(and yes, I understand C++11 kinda-added closures)
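For readers without MIT-Scheme handy, the flavor of step 2 can be sketched in a few lines of Python, with expressions as plain nested data that are differentiated structurally. This is only a toy analogue of the Scheme/scmutils pipeline, not the real thing (the real system also pretty-prints and compiles the result):

```python
# Toy symbolic differentiation over plain nested tuples, e.g.
# ('+', ('*', 3, ('^', 'x', 2)), 'x')  represents  3*x**2 + x

def d(expr, var):
    """Structurally differentiate expr with respect to var."""
    if isinstance(expr, (int, float)):
        return 0
    if isinstance(expr, str):
        return 1 if expr == var else 0
    op, a, b = expr
    if op == '+':
        return ('+', d(a, var), d(b, var))
    if op == '*':   # product rule
        return ('+', ('*', d(a, var), b), ('*', a, d(b, var)))
    if op == '^':   # power rule; b is assumed a constant integer exponent
        return ('*', ('*', b, ('^', a, b - 1)), d(a, var))
    raise ValueError(f"unknown operator: {op}")

def evaluate(expr, env):
    """Evaluate an expression tuple against a variable environment."""
    if isinstance(expr, (int, float)):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, a, b = expr
    x, y = evaluate(a, env), evaluate(b, env)
    return {'+': x + y, '*': x * y, '^': x ** y}[op]

f = ('+', ('*', 3, ('^', 'x', 2)), 'x')    # 3*x**2 + x
print(evaluate(d(f, 'x'), {'x': 2}))       # d/dx = 6*x + 1, which is 13 at x = 2
```

The point of the challenge is that code is just data here: `f` can be built from functions written by someone who never anticipated symbolic manipulation, which is what's hard to express in a language without that flexibility.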
> And the major language most machine learning researchers use is Python.
Read again what I wrote. Even the model itself is optimized. The fact that it is written in Python or in any DSL is irrelevant.
> I used to think that too before I spent years doing functional programming.
I have done functional programming in many languages, ranging from lambda calculus itself to OCaml to Haskell, including inside and outside academia. It does not change anything I have said.
Perhaps you spent way too many years in high-level languages that you have started believing magical properties about their compilers.
> prided myself on being able to implement things like highly-optimized numerical code with templates.
Optimizing numerical code has little to do with code monomorphization.
It does sound like you were abusing C++ thinking you were "optimizing" code without actually having a clue.
Like in the previous point, it seemed you attributed magical properties to C++ compilers back then, and now you do the same with high-level ones.
> It actually took a few years before my code in Lisp, Scheme, JavaScript, and Python stopped being structured like C++.
How do you even manage to write code in Lisp etc. "like C++"? What does that even mean?
> You putting "Python" and "Java" in the same sentence shows this isn't a process you've gone through yet. Java has roughly the same limitations as C and C++.
Pure nonsense. Java is nowhere close to C or C++.
> Here's a challenge for you.
I would use Mathematica or Julia for that. Not Scheme, not C++. Particularly since you already declared the last 30% of performance is irrelevant.
You are again mixing up domains. You are picking a high-level domain and then complaining that a low-level tool does not fit nicely. That has nothing to do with the discussion, and we could apply that flawed logic to back any statement we want.
> Perhaps you spent way too many years in high-level languages that you have started believing magical properties about their compilers.
> It does sound like you were abusing C++ thinking you were "optimizing" code without actually having a clue.
> Like in the previous point, it seemed you attributed magical properties to C++ compilers back then, and now you do the same with high-level ones.
I think at this point, I'm checking out. You're making a lot of statements and assumptions about who I am, what my background is, what I know, and so on. I neither have the time nor the inclination to debunk them. You don't know me.
When you make it personal and start insulting people, that's a good sign you've lost the technical argument. Technical errors in your posts highlight that too.
If you do want to have a little bit of fun, though, you should look up the template-based linear algebra libraries of the late nineties and early 00's. They were pretty clever, and for a while were leading the benchmarks. They would generate code at compile time optimized to the size of your vectors and matrices, unroll loops, and similar. They seem pretty well-aligned with your background. I think you'll appreciate them.
Yes, the whole hoopla about async and particularly async/await has been a bit puzzling, to say the least.
Except for a few very special cases, it is perfectly fine to block on I/O. Operating systems have been heavily optimized to make synchronous I/O fast, and can also spare the threads to do this.
Certainly in client applications, where the amount of separate I/O that can be usefully accomplished is limited, far below any limits imposed by kernel threads.
Where it might make sense is servers with an insane number of connections, each with fairly low load, i.e. mostly idle. Even in server tasks, quality of implementation appears to far outweigh whether the server is synchronous or asynchronous (see attempts to build web servers with Apple's GCD).
For lots of connections actually under load, you are going to run out of actual CPU and I/O capacity to serve those threads long before you run out of threads.
Which leaves the case of JavaScript being single threaded, which admittedly is a large special case, but no reason for other systems that are not so constrained to follow suit.
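As a concrete baseline for the blocking-I/O argument, a thread-per-connection echo server is only a few lines of stdlib Python (a sketch for illustration, not production code); the kernel parks each thread cheaply while its socket is idle:

```python
import socket
import threading

def handle(conn):
    # Blocking recv is fine: the kernel parks this thread until data arrives.
    with conn:
        while data := conn.recv(4096):
            conn.sendall(data)

def serve(listener):
    # One kernel thread per connection; the OS does the multiplexing.
    while True:
        conn, _ = listener.accept()
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen()
threading.Thread(target=serve, args=(listener,), daemon=True).start()

# one synchronous round trip as a smoke test
client = socket.create_connection(listener.getsockname())
client.sendall(b"ping")
print(client.recv(4096))
```

Nothing here is asynchronous, yet it comfortably handles the "limited separate I/O" client case described above; the thread count only becomes the bottleneck at connection counts most applications never reach.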
I think my question is whether async Python is slower in the case it was designed for -- many, long-running open sockets.
Async was traditionally used server-side for things like chat servers, where I might have millions of sockets simultaneously open.
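For comparison, the asyncio shape of that chat-server case, one cheap coroutine per socket instead of one kernel thread, can be sketched with the stdlib alone (a minimal echo variant, not a real chat server); the open question above is how its per-connection overhead behaves once sockets number in the millions:

```python
import asyncio

async def handle(reader, writer):
    # One coroutine per connection; the event loop multiplexes them all.
    while data := await reader.read(4096):
        writer.write(data)
        await writer.drain()
    writer.close()

async def demo():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()

    # one round trip through the server as a smoke test
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(b"ping")
    await writer.drain()
    reply = await reader.read(4096)
    writer.close()
    server.close()
    await server.wait_closed()
    return reply

print(asyncio.run(demo()))
```

Each idle connection costs one suspended coroutine rather than one kernel thread stack, which is precisely the mostly-idle, many-sockets regime async was designed for.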