Guide to Java Virtual Threads (rockthejvm.com)
135 points by saikatsg on March 15, 2023 | 110 comments


> The problem with platform threads is that they are expensive from a lot of points of view. First, they are costly to create. Whenever a platform thread is made, the OS must allocate a large amount of memory (megabytes) in the stack to store the thread context, native, and Java call stacks. This is due to the not resizable nature of the stack. Moreover, whenever the scheduler preempts a thread from execution, this enormous amount of memory must be moved around.

Scheduler pre-emption does not cause stack memory to be copied. Perhaps they're thinking of registers.

(As an aside, suspension and resumption of a Java virtual thread does result in its stack being copied--saved and restored--as this was deemed less costly[1] than growable stacks, which is how Go works.)

> As we can imagine, this is a costly operation, in space and time. In fact, the massive size of the stack frame limits the number of threads that can be created. We can reach an OutOfMemoryError quite easily in Java, continually instantiating new platform threads till the OS runs out of memory:

Stack frame != stack.

The author seems confused about some concepts. I didn't read beyond this so don't know whether that confusion affected any of their conclusions or advice.

[1] EDIT: Whether less costly in terms of performance or development effort I'm not sure. A major reason JavaScript and many other languages don't implement stackful coroutines is that the virtual machines--interpreters, JITs, etc--are written in a way that in-language function calls and recursion directly or indirectly rely on the underlying native "C" stack. This correspondence is not something you can typically remedy without completely rewriting the implementation from scratch. Language implementations like Go and Lua were written from the beginning to avoid this correspondence. To accomplish stackful coroutines, languages like Java and, IIUC, OCaml really had no choice but to rely on some other tricks, though I think OCaml permitted some tricks not available to Java, because OCaml could do some transforms which Java couldn't given the nature of the JVM.


I think JavaScript doesn't want this because, semantically, the whole language is designed around the idea of a single thread of execution, that can't be suspended except explicitly with an await statement. So if you call a function, you know that it can't suspend and let some other thread take control and make arbitrary changes out from under you before control returns to you. Breaking this assumption would probably break too much existing code.


What do stackful coroutines have to do with cooperative vs non-cooperative concurrency? They're entirely orthogonal problems.

JavaScript absolutely does want stackful coroutines (even if they're not called coroutines, but just async stacks). That's why Chrome has so much magic inside it to reassemble async stack traces for exceptions. But it has to do that via all manner of complex bookkeeping and jiggery-pokery, which is often broken by libraries doing clever things. Having async functionality built on top of cooperative coroutines, all sharing a single system thread, would make async stacktraces trivial to produce accurately, and make it substantially easier to debug highly interwoven async code.


The connection is that you can inefficiently simulate stackful coroutines with non-cooperative concurrency. Just give each coroutine its own OS thread, and block whenever the coroutine yields. It's slow and not a good way to build real systems, but the fact that it's possible at all means that libraries have to be designed to be able to handle it (e.g., that if they call into user code, it might yield the thread) and so not assume for correctness that their code will keep running in the order they control without interruption.
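
A minimal sketch of that simulation in Java (the class name and the rendezvous-queue trick are my own illustration, not from TFA):

    import java.util.concurrent.SynchronousQueue;

    // A "stackful coroutine" faked with a dedicated OS thread: put() blocks
    // the producer mid-call-stack until the consumer takes the value, which
    // is a yield/resume pair, just paid for with a whole thread.
    public class ThreadCoroutine {
        public static void main(String[] args) throws InterruptedException {
            SynchronousQueue<Integer> channel = new SynchronousQueue<>();
            Thread producer = new Thread(() -> {
                try {
                    for (int i = 0; i < 3; i++) {
                        channel.put(i); // "yield i"
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            producer.start();
            for (int i = 0; i < 3; i++) {
                System.out.println("resumed with " + channel.take());
            }
            producer.join();
        }
    }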

JavaScript wasn't designed to support non-cooperative concurrency, and so a lot of existing code does assume that it's safe to, e.g., mutate global state in a way that other code doesn't expect or can't handle, run some operation, and then put the global state back how it was. Chrome's async stack traces don't undermine this because they are only for debugging; they can't in any way affect control flow. Stackful coroutines would break this assumption, the same way preemptive multithreading would. You can argue that the language should have included these features from the beginning, but it didn't, and changing it now would mess up the ecosystem.


What on earth are you talking about. Why would anyone want to inefficiently simulate stackful co-routines using system threads? And what does underlying engine design have to do with the JavaScript spec? Additionally, why do you think anyone is advocating for the use of a single system thread per co-routine in a discussion about Java virtual threads, a technology that does the literal opposite?

There’s nothing preventing anyone from creating a JavaScript engine that simulates async JavaScript using stackful cooperative co-routines, all operating in a single shared system thread. Perfectly upholding the existing JS thread model, but substantially reducing the bookkeeping needed to produce useful stack traces.


Also the space for the native thread stack ("C stack") is just allocated virtual memory (at least on *ix), not physical memory. When the program starts to touch stack pages, on first touch the user code will trap to the OS where the vm system will transparently fill in the needed physical pages as the usage grows.

Virtual threads don't seem like a worthwhile complexity tradeoff unless you're trying to run lots of threads in 32-bit address space. I wonder if this got started in that era and just took time to mature.
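
A rough way to see the virtual-vs-physical distinction from Java (the 64 MiB figure is arbitrary, and the stackSize argument is only a hint the JVM/OS may round or ignore):

    // 100 threads x 64 MiB of requested stack is ~6 GiB of address space,
    // yet resident memory stays small because untouched pages are never
    // backed by physical RAM.
    public class BigStacks {
        public static void main(String[] args) throws InterruptedException {
            for (int i = 0; i < 100; i++) {
                new Thread(null, () -> {
                    try {
                        Thread.sleep(10_000);
                    } catch (InterruptedException ignored) {
                    }
                }, "big-stack-" + i, 64L * 1024 * 1024).start();
            }
            Thread.sleep(10_000); // compare RSS vs. virtual size in top/ps
        }
    }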


> unless you're trying to run lots of threads

Well this is exactly the design point. The authors want to promote a coding style where spawning a new thread is ultra-cheap, possibly at a ratio of very few IO calls per virtual thread.

The ideal application would be WhatsApp's use of Erlang [1][2]: 2.8M active connections per server (in 2012! 100GB RAM servers), each of them mostly idle with 200k msgs/sec.

All of this while keeping the threading model, keeping your stacks intact for debugging, and possibly a hierarchy of threads where you can kill a whole branch and hot-reload it with new code. (which is a thing that's not easy to do with reactive programming / async await)

[1] http://highscalability.com/blog/2014/2/26/the-whatsapp-archi...

[2] https://web.archive.org/web/20221220020352/http://www.erlang...


To expand the quote:

>> Virtual threads don't seem like a worthwhile complexity tradeoff unless you're trying to run lots of threads in 32-bit address space

Bolting M:N threads onto the JVM runtime in the hope it will enable people to program the JVM like Erlang (without the rest of the Erlang feature set that actually makes it good) still seems to me a clear-cut case of "not a worthwhile complexity tradeoff". The added JVM implementation complexity and maintenance burden would seem to be pretty heavy for something like this.


> 2.8M active connections per server

I think this is a use-case best left for nginx and other proxies written in C.

Java should be easy to use. Leave weird special-case programming to special tools. Don't try to make Java solve all problems.

Java NIO became a monster of complexity which made implementing an HTTP server much more complex than it should be.


With modern hardware 2.8M doesn't seem a particularly large number. Modern Java VMs are pretty efficient (more so after project Loom), so I don't see why Java wouldn't be appropriate for this use case. In fact efficient virtual threads would allow avoiding nio and provide a fast and sane blocking interface.

Note: I say this as a C++ programmer.


Having that many connections requires tons of buffering and effectively manual scheduling (or clients can experience latency issues). At the very least, non-blocking IO allows you to tune the scheduling depending on the clients' behavior and importance. Personal experience: tons of Java NIO and lock-free structures.


Rightly or wrongly, that's exactly what Java is designed for. Write once, run anywhere. It's not just for HTTP servers and web services.


Virtual threads usually don't suffer the overhead of asking the OS to spawn a thread. If your workload consists of many small concurrent tasks, spawning threads can easily become costlier than processing the workload itself.

These "let's make our own threads" implementations all seem to stem from "I want threads, but I don't want to wait for the kernel to do its thing". This approach has some downsides for implementations (there's a reason the kernel takes a moment to spawn a thread and now you have to deal with the implications) but staying in userland also has some advantages in terms of pure performance.

Such tasks could of course be done faster using thread pools and a manual division of the workload (or adding locking to a dynamic work queue, etc.) but the threading model can be easier to visualise and reason about. It sits somewhere in the middle between the performance of a custom threading solution and the ease of use of single threaded code.

I imagine things like web servers, dealing with tons of different connections, will be able to use this mechanism quite effectively. If you're just batching through a dataset, I don't think you'll have much of an advantage using this model.
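
A sketch of that web-server shape (assuming Java 21; the port and the echo behaviour are made up for illustration):

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.Executors;

    // One cheap virtual thread per connection, written in plain blocking
    // style; the runtime multiplexes them over a few carrier threads.
    public class EchoServer {
        public static void main(String[] args) throws IOException {
            try (ServerSocket server = new ServerSocket(8080);
                 var executor = Executors.newVirtualThreadPerTaskExecutor()) {
                while (true) {
                    Socket socket = server.accept();
                    executor.submit(() -> {
                        try (socket) {
                            socket.getInputStream().transferTo(socket.getOutputStream());
                        } catch (IOException e) {
                            // client went away; fine for a sketch
                        }
                    });
                }
            }
        }
    }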


>If your workload consists of many small concurrent tasks, spawning threads can easily become costlier than processing the workload itself.

Executors have been in Java since 1.5, and even before that people used queues and thread pools. Starting threads on demand has never been an accepted way to write code aside from, perhaps, some blog posts.
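
E.g. the long-accepted shape (java.util.concurrent, since Java 5):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Reuse a small fixed pool instead of paying the OS thread-spawn
    // cost per task.
    public class PoolDemo {
        public static void main(String[] args) {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 100; i++) {
                int task = i;
                pool.submit(() -> System.out.println(
                        "task " + task + " on " + Thread.currentThread().getName()));
            }
            pool.shutdown();
        }
    }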


This sort of lightweight thread is at the core of things like Erlang or Go. You can just spin up processes/goroutines by the thousands without impacting performance too much. It just completely changes the way you write concurrency code.


>concurrency code.

Likely parallel but not concurrent, i.e. there are very few (if any) contention points.


People are worried about GPT4 producing nonsense huh... articles like this prove humans still have that market on lock.


> The problem with platform threads is that they are expensive from a lot of points of view. First, they are costly to create. Whenever a platform thread is made, the OS must allocate a large amount of memory (megabytes) in the stack to store the thread context, native, and Java call stacks. This is due to the not resizable nature of the stack. Moreover, whenever the scheduler preempts a thread from execution, this enormous amount of memory must be moved around.

There are a few issues and inaccuracies in this statement.

While it is true that platform threads can be expensive in terms of resources, the claim that the OS must allocate "megabytes" of memory for the stack is an exaggeration. The actual size of the stack depends on the operating system and the specific implementation, but typical default values range from a few dozen kilobytes to a few hundred kilobytes, not megabytes.

The statement implies that the entire stack is moved around when the scheduler preempts a thread from execution. This is not accurate. When a thread is preempted, the operating system saves the context of the thread, which is a relatively small amount of data, including the values of the CPU registers and the program counter. The stack itself is not moved around during this process.

It is not correct to say that the stack is "not resizable." While the default stack size is set by the operating system, many programming languages and operating systems allow you to specify the stack size when creating a new thread. However, it is true that once a thread has been created, its stack size typically cannot be changed.

Sent from OpenAI.


> It is not correct to say that the stack is "not resizable." While the default stack size is set by the operating system, many programming languages and operating systems allow you to specify the stack size when creating a new thread. However, it is true that once a thread has been created, its stack size typically cannot be changed.

Seems like a typical confused/pedantic Stack Overflow answer. I would say that stack size is "configurable", but not "resizable", because you indeed can't change the stack size (down, at least) of an already-created OS thread.


You can create pthreads with manually allocated stack segments and resize them later.

https://www.ibm.com/docs/en/zos/2.2.0?topic=functions-pthrea...


Does that actually resize the stack in a way that the C runtime will recognize (e.g., add a guard page, making the stack size available to future calls of `pthread_getattr_np`)? I guess you'd only resize the stack yourself if your code was planning on somehow keeping track of the stack size, so I suppose it's not too big of a concern, but still.


The only way to resize the segment is mapping in pages at the end of the original stack, and that only works if you were careful and arranged for virtual space to be reserved and available at the end. And the only robust way to do that is to map something in. At that point you might as well assign the space to the stack from the beginning.


> The actual size of the stack depends on the operating system and the specific implementation, but typical default values range from a few dozen kilobytes to a few hundred kilobytes, not megabytes.

also lazily allocated in many implementations

address space gets used, not memory


I would say this article is a far cry from "nonsense" - it's quite informative, even if there are a few small inaccuracies or naming issues.


There are quite a few inaccuracies. The fact they are OK w/ using a naked sleep and w/o checking the interrupt flag is a flag on its own. (If you want sleep w/o exceptions, use LockSupport.parkNanos, not Lombok.)
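
For reference, a sketch of that suggestion (parkNanos returns early, with the interrupt flag still set, if the thread is interrupted):

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.locks.LockSupport;

    // Sleeps without a checked exception; an interrupt makes park return
    // immediately and leaves the interrupt status set for the caller.
    public class ParkDemo {
        public static void main(String[] args) {
            LockSupport.parkNanos(TimeUnit.MILLISECONDS.toNanos(100));
            if (Thread.currentThread().isInterrupted()) {
                System.out.println("interrupted while parked");
            }
        }
    }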

The more glaring one is the fact that stack memory uses virtual memory space and it's not 'allocated' until used. On a 64-bit OS that means it's quite cheap. To me that's unmissable if one has spent any time doing anything with IO or threads in any capacity.

There is a distinct difference between parallel tasks and concurrent ones, and the article seems to be confused about that part as well. (There is a reason the term is 'embarrassingly parallel' instead of 'embarrassingly concurrent'.)


C# .NET had async and await in 2012, for comparison. I've always loved Java but Microsoft deserves immense credit for raising the bar, and so quickly, too.


Java Virtual Threads seem to be a ("better" in their mind) alternative to async/await, so I'm not sure Microsoft should be credited for Java Virtual Threads?

> Also, the async/await approach, such as Kotlin coroutines, has its own problems. Even though it aims to model the one task per thread approach, it can’t rely on any native JVM construct. For example, Kotlin coroutines based the whole story on suspending functions, i.e., functions that can suspend a coroutine. However, the suspension is wholly based upon non-blocking IO, which we can achieve using libraries based on Netty, but not every task can be expressed in terms of non-blocking IO. Ultimately, we must divide our program into two parts: one based on non-blocking IO (suspending functions) and one that does not. This is a challenging task; it takes work to do it correctly. Moreover, we lose again the simplicity we want in our programs.

> The above are reasons why the JVM community is looking for a better way to write concurrent programs. Project Loom is one of the attempts to solve the problem. So, let’s introduce the first brick of the project: virtual threads.


It's not really an alternative to async/await. Implicit and cooperative multithreading each have their own pros and cons.


The authors of Loom seem to disagree with you.

> An alternative solution to that of fibers to concurrency's simplicity vs. performance issue is known as async/await, and has been adopted by C# and Node.js, and will likely be adopted by standard JavaScript. Continuations and fibers dominate async/await in the sense that async/await is easily implemented with continuations (in fact, it can be implemented with a weak form of delimited continuations known as stackless continuations, that don't capture an entire call-stack but only the local context of a single subroutine), but not vice-versa.

> While implementing async/await is easier than full-blown continuations and fibers, that solution falls far too short of addressing the problem. While async/await makes code simpler and gives it the appearance of normal, sequential code, like asynchronous code it still requires significant changes to existing code, explicit support in libraries, and does not interoperate well with synchronous code. In other words, it does not solve what's known as the "colored function" problem.

This is under the section called "Other Approaches" in https://cr.openjdk.org/~rpressler/loom/Loom-Proposal.html

That seems to indicate to me that async/await would be an alternative to virtual threads.


There are differences between implicit/pre-emptive and co-operative multitasking though, and there are probably tradeoffs for each. For Java however, much of the existing API exposes threads and locks. How do you improve the scaling of existing threading code without significant porting or rewrites of existing codebases? The "we are where we are" problem exists a lot more in Java than say .NET/Node/Go/etc, which IMO tips the scales more to this kind of approach.

I think that both approaches solve a lot of the same problems, but there are some problems that fit one paradigm better than the other, and I don't think they are entirely mutually exclusive either. I've heard it mentioned that Loom solves the async "colour" problem, however I think people "overblow" the color problem of the Async API too much. If most things are async it isn't really that painful. I actually like when something is marked "async" and I have to be careful how to use it. How is it async? Can the code run in parallel and join later? Is there race condition potential? Deadlock? How is it actually async, and what's its sync vs async behaviour? How do I propagate the need to halt the async operation across multiple layers? Who owns the whole workflow, and can I make sure the async context is propagated to that layer so they can cancel it, for example?

Probably an unpopular opinion but knowing a method has async behavior, at the type level, could be a feature not a bug. It forces you to handle it at the sync/async junction and think about these things.


> I think people "overblow" the color problem of the Async API too much. If most things are async it isn't really that painful.

There are two problems with async/await's cooperative multitasking:

1. The first is only relevant to languages that also have threads: it splits the APIs into two worlds with very similar semantics but disjoint syntactic "universes." You always need a separate API for either world and need to do everything twice.

2. Even when that is the only paradigm, it is less composable. When scheduling points are explicit, adding a scheduling point inside a subroutine requires changing all of its callers, transitively. In contrast, with non-cooperative multitasking, any subroutine in the hierarchy can individually choose to exclude interleaving (and in a finer-grained way) for atomic operations with various constructs (the simplest being locks). Since most operations need not be atomic and are independent, this is not only more composable and evolvable, but also a more reasonable default.
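
A sketch of point 2 (made-up names; LockSupport.parkNanos stands in for any blocking call that becomes a scheduling point):

    import java.util.concurrent.locks.LockSupport;
    import java.util.concurrent.locks.ReentrantLock;

    // Making a leaf method block adds a scheduling point without touching
    // any caller's signature; a caller that needs atomicity opts in
    // locally with a lock rather than the whole call chain changing color.
    class Composability {
        static final ReentrantLock lock = new ReentrantLock();

        static String fetch() {
            LockSupport.parkNanos(10_000_000); // new scheduling point
            return "data";
        }

        static String fetchAtomically() {
            lock.lock(); // exclude interleaving just where it matters
            try {
                return fetch();
            } finally {
                lock.unlock();
            }
        }
    }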


Async annotations in the vast majority of languages (possibly outside of JS and maybe Rust) don't protect any useful invariant. You can still block in async code by calling any already existing blocking function not marked async, and async code can run in parallel when you have multiple underlying executors and shared memory.

If you design a language from scratch, async could be useful, but even then there would be no need to annotate functions, as they could be inferred. Generally, effect systems seem a superior solution anyway.


That statement also contains factual errors. async/await came from standard JS (as part of the Harmony track of work), not from Node.


Yes, but I would hold that the impetus for pushing it through came from what people in the Node community were doing.

Once early Node code started seeing real use, the callback pyramids of doom started gaining notoriety. There was even a small period before async/await where people abused generators to achieve a similar style of code (the name of the Koa framework was a play on generator co-routines); luckily everyone migrated over afterwards.


There's some overlap for sure, but if you're trying to juggle tasks on a UI thread with fine grain control, Loom code is going to look like await without the sugar.

Hopefully I'm wrong and there's a new golden age of GUI programming in Java but I'm currently skeptical.


async/await vs virtual threads is about stackful vs stackless. Cooperative vs preemptive multitasking is completely orthogonal.


Can you explain how it's orthogonal when async/await is cooperative and virtual threads are pre-emptive?


Async is cooperative only if your executor is single-threaded. If you have multiple event loops running in parallel (for example multiple C++ coroutines running in an ASIO io_service pool), it is indistinguishable from a preemptive model.

Of course some specific implementations can be single-threaded only (JS) or use shared-nothing threads (Rust), but that's orthogonal to the model.


Well I'm specifically talking about the single threaded executor case. (A very common case in UI programming).

If you call a plain function from an async function, you know it will not run on a different thread no matter the executor. In this Loom model, you don't know that.

You control the thread until you yield execution, no matter the executor. Explicit async methods also mean you get explicit sync methods. In that sense, you will not be pre-empted when code changes under you. You can write async tasks that you know will run atomically without locks because you never yielded execution.

In this implicit model, I don't believe that's the case. Because yield points can be added at any time, and thus you can be pre-empted at any time, you need to use locks for atomicity (or give up and revert to using a normal Thread). I don't think it's possible to interweave this implicit yielding with atomic execution in a way that's as simple as calling a sync function. Is there some other way?


Well we are discussing models in general.

If you do not want preemption, disable it. For example, with POSIX threads, use SCHED_FIFO and pin to one core.

Loom might provide similar features if needed.


At least from the Java language advocates' perspective, async/await is a worse solution to the problem of async than the structured concurrency approach that virtual threads will enable.


async/await is using tasks, which are the equivalent of virtual threads.

What I would like to see in C# is something akin to goroutines from Go or actors in Elixir.


This response confuses me since goroutines and Erlang processes are really very similar to project Loom's virtual threads.

The mistake is that async/await is not equivalent to virtual threads: the former can do only a subset of what the latter can.


Virtual actors for C# are offered by solutions like MS Orleans as I understand it.


> What I would like to see in C# is something akin to goroutines from Go or actors in Elixir.

A task is equivalent to a goroutine.



That's interesting, any good examples?


Tons of writing and debate on this subject but the canonical starting point is "What Color Is Your Function?" https://journal.stuffwithstuff.com/2015/02/01/what-color-is-...


Async/await is also much slower than properly implemented (see: Golang) fibers.

Basically, with async/await you need to allocate frames on the heap for each level of the async/await call stack. You basically get a linked list of stack frames.

This is essentially isomorphic to segmented stacks that Go ditched in favor of resizable stacks, because segmented stacks kill performance if a tight loop just happens to cross a segment boundary.


>Async/await is also much slower than properly implemented (see: Golang) fibers.

Are there any benchmarks to back that claim?


Hm. I know this from my personal work, replacing Go code with async/await code.

There's a bunch of benchmarks that compare C# async/await and Goroutines (e.g. https://alexyakunin.medium.com/go-vs-c-part-1-goroutines-vs-... ). Go still wins there, but they all miss the very important case of nested async/awaits.

These microbenchmarks also focus on looking at Go channels versus other synchronization methods, which in my experience is not that relevant (Go channels just suck).

I guess I'll need to write a blog post.


Those benchmarks are really old, things can change quite fast when it comes to programming language implementations.


I wrote a quick-and-dirty benchmark: https://blog.alex.net/benchmarking-go-and-c-async-await

Results: Go - 32ms, C# - 494ms. The difference also becomes bigger as you add more recursion levels.


Async/await proliferates. If you call await, your function must be async too.

Green threads are decoupled from synchronization.


You can use tasks without async/await.


Not only async, but C# had LINQ, lambdas, records, pattern matching, pointers, hardware instructions via intrinsics, stackalloc etc. before similar constructs came into Java, if they ever did.

Probably there are examples where Java introduced something first, but I don't know because I'm not so well versed in Java.

While similar to and inspired by Java, I do prefer C# because it is less verbose, requires less boilerplate, generally has only one proper way to do things, and is a kind of jack of all trades, in the sense that you can tackle any area of programming besides very low-level systems programming - and it could quite reach that point too, if there were a way to disable GC and allow manual memory management.

Web backend - check, web frontend - check, mobile apps - check, desktop - check, multi platform - check, embedded - check, games - check, VM - check, native AOT - check.

It also looks great in benchmarks: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

I am biased but since I recognize it, maybe you shouldn't downvote me just for that. :)

And what's even better than C# is F#, but it's too bad nobody likes functional programming or hires programmers to use functional languages.


There's plenty of stuff Java did first (or only) -- an optimising JIT, two new generations of GC (G1, ZGC--a low latency GC with <1ms max pause time), low-overhead deep profiling (Java Flight Recorder) -- but they're all in the runtime. Java's strategy since the beginning has always been to innovate on the platform and be a last-mover on the language, keeping it conservative. .NET seems to follow an opposite strategy.

That's how we've been able to avoid properties and just have algebraic data types, avoid async/await and do virtual threads, avoid string interpolation and just have safe string templates. This also allows us to keep the number of features in the language relatively low -- as we, and most of our users, like it.


I'm sure Java did some things first, as I said. It's great that you mention some.

>That's how we've been able to avoid properties and just have algebraic data types

By algebraic data types you mean product types or sum types?


Both. Java has records for products, sealed hierarchies for sums, and offers pattern matching over them. See https://www.infoq.com/articles/data-oriented-programming-jav...
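
For example (Java 21 syntax; Shape/Circle/Rect are made-up names):

    // record = product type, sealed interface = sum type; the switch is
    // exhaustiveness-checked by the compiler because the hierarchy is sealed.
    sealed interface Shape permits Circle, Rect {}
    record Circle(double radius) implements Shape {}
    record Rect(double w, double h) implements Shape {}

    class Areas {
        static double area(Shape s) {
            return switch (s) { // no default branch needed
                case Circle c -> Math.PI * c.radius() * c.radius();
                case Rect r -> r.w() * r.h();
            };
        }
    }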


>sealed hierarchies for sums

I wouldn't call that a good example of sum types. You can use sealed classes or other mechanisms in C# to emulate sum types. I was thinking about something built into the language.

Example:

type IntOrBool = | I of int | B of bool


It's built into the language and the VM (the VM enforces the "sealedness", and pattern matching uses it for exhaustiveness checks). It's just that all types in Java are nominal.


Both of those (sum and products) are ADTs.


> Not only async, but C# had LINQ, lambdas, records, pattern matching, pointers, hardware instructions via intrinsics, stackalloc etc. before similar constructs came into Java, if they ever did.

Just a clarification regarding one statement which I know for sure is not true: Java team started working on records before C# started similar work.

I don't know about the rest.


And pattern matching was started in java first (goes hand-in-hand with records to some extent).


Yes! It was distrust towards Microsoft that kept .NET from growing to be a universal language and kept Java in the game, to be honest. The language / framework, per se, is to be celebrated as a great leap forward borne out of the good kind of competition.


Have you looked at the source of some of the C# benchmarks? I don’t believe they’re representative of how one would actually write C#. They’re all extremely hand tuned using raw pointers and unsafe blocks. The regex benchmark actually just delegates to a C library over FFI.


> And what's even better than C# is F#, but that's too bad nobody likes functional programming or hire programmers to use functional languages.

In JVM land there is Scala and people do hire for it, more so than any other typed FP language, AFAIK.


That head start has had a huge impact on the ecosystem too. Random libraries (ex: Google.Cloud.PubSub.V1) have first class, mature support for async and streams. Compare that to Python (and I'd have to assume Python is much more popular on Google Cloud) which only recently got async support and it's still kludgy. This really applies across the board for anything web related.

JavaScript/TypeScript is probably the only ecosystem with comparable async support, unless you count Golang, which achieves a similar result with different ergonomics. I'm still on the fence on which I personally prefer... I can see the appeal of not having the async logic pollute the callstack, but at the same time the magic[1] way Golang handles I/O seems antithetical to its philosophy of being simple and explicit (for example, with respect to error handling).

[1]: https://www.reddit.com/r/golang/comments/xiu4zg/comment/ip77...


And they are experimenting with green threads now. Would be... hilarious? If it landed in .NET before Java.

https://github.com/dotnet/runtimelab/tree/feature/green-thre...


In Java it already landed as a preview feature last September and will become a final feature this September.


This is way better than async/await. This is Go/Erlang levels of green thread ease of use.


I guess everyone is entitled to their opinion.


Java had green threads in v1.0.


My understanding is that green threads are different on the implementation side as well. Basically they are cheaper at the system level? The API does not even change much, last time I checked. It's all happening in how the JVM spawns threads.


> .NET had async and await in 2012, for comparison.

Java had ExecutorService and futures support for a while; it is just more syntactic sugar around the same solution, although Java's approach is much more powerful and flexible.


java green threads strike back


That would be the best name by far!


They were called green as well as "M:N" threads: https://en.wikipedia.org/wiki/Green_thread#Green_threads_in_...


Solaris Internals (https://a.co/d/2hw5NSv) has the greatest explanation of this that I know of, including what you said.


Green used to mean non-preemptive, which is not the case here, or for Golang or Erlang (afaik). M:N is all about resource usage; the semantics are mostly the same as OS threads.



Sorry to ask a somewhat unrelated question: How did the author create those flow charts? That style is Just. So. Cool.


The style, and especially the splotchy background coloring, made Paper[1] my first guess. (By WeTransfer, it seems? When it was released they were called FiftyThree.)

[1]: https://apps.apple.com/us/app/paper-by-wetransfer/id50600381...


Looks like they used Excalidraw.

It's OSS and a PWA. A lot of people seem to use it because it lets you draw without worrying about details.


It could also be an Onyx or ReMarkable tablet as well.


These are great, but Loom discussions almost always mix details with issues of the API.

It's hard to fathom what that means for end developers.

Will anything at all change but performance?

What are the practical implications beyond that?

We just don't bother with Executors and live happily?


The way i think about the situation is:

1. Platform threads + blocking APIs = comfortable to use, does not scale to many connections

2. Platform threads + non-blocking APIs = painful to use, scales to many connections

3. Virtual threads + blocking APIs = comfortable to use, scales to many connections

So, at the moment, if you want to write software which scales to many connections (which not everybody needs to do, but some do), you have to suffer for it. With virtual threads, it will be a lot more pleasant.

As a concrete example, with non-blocking architecture, there is no way to have an InputStream or OutputStream which streams an arbitrarily large amount of data over the network without buffering it all. Because the contracts of those classes say that they block when there is no data or space available! If you look at the APIs of web servers based on non-blocking APIs (eg [1], [2]), they want you to read or write the whole body in one go, or maybe in chunks, which will be pushed to you if you're reading. You can only build a stream on top of that by buffering everything, or using another thread, which destroys the advantage of non-blocking architecture. There is loads of useful I/O machinery that is built on top of streams, like JSON parsers and generators, so your choice is either not using that, or accepting buffering or extra threads.

With virtual threads, you just use blocking I/O, servers and clients can expose streams for request and response bodies, and it's trivial to use any I/O machinery you like. It's also far easier to write your own.

[1] https://undertow.io/javadoc/1.3.x/io/undertow/io/Receiver.ht...

[2] https://undertow.io/javadoc/1.3.x/io/undertow/io/Sender.html


In these discussions it's good to quantify what you mean by "many connections".

Eg the web is full of people posting Java OutOfMemory stack traces when they haven't increased the OS resource limits from the default and are imagining that the limit is 10k threads instead of 1M threads on their hw, falsely concluding that Java uses a lot of physical memory per thread stack.


So why can't I just use a regular OS thread with a non-blocking API?

What is the advantage of a virtual thread in that case? You are saying that it's painful? How exactly? If you use an Executor that's set up with a proper thread pool, it's mostly painless?

I mean, virtual threads are nicer, but I see that as almost a syntax issue. A bit cleaner.

I guess I'm asking what you mean by 'painful to use'.


I literally include a concrete example in the comment you are replying to.


>2. Platform threads + non-blocking APIs = painful to use, scales to many connections

Wow, I have written a substantial amount of non-blocking IO since around Java 1.4.2 (when it actually became stable). It was not much harder to use at all (compared to IO streams). The issues w/ buffering/scalability and internal scheduling will be exactly the same with green threads.


Me too, and that's not correct.

Firstly, the idea that NIO is "not much harder to use at all (compared to io streams)" is wild on its own. People built Netty (and XNIO) precisely because of how awkward NIO is to use.

But secondly, there are common, useful patterns that are trivial with streams and extremely awkward with asynchronous I/O. For example, here's some code which queries a database and streams the results to the client as JSON:

    OutputStream responseBody; // assume you get this from somewhere
    try (JsonGenerator json = Json.createGenerator(responseBody)) {
        json.writeStartArray();
        try (Connection connection = openDatabaseConnection()) {
            ResultSet results = connection.prepareStatement("select first_name, last_name from users").executeQuery();
            while (results.next()) {
                json.writeStartObject();
                json.write("first_name", results.getString("first_name"));
                json.write("last_name", results.getString("last_name"));
                json.writeEnd();
            }
        }
        json.writeEnd();
    }
It does not load all the data into memory, it does not buffer all the JSON in memory, it automatically handles backpressure (if the socket blocks on a write, it will pause reading results from the database), it releases its resources cleanly if an exception occurs, and it's fourteen lines of very simple code.

You cannot do anything like this with an asynchronous API.


>It does not load all the data into memory, it does not buffer all the JSON in memory, it automatically handles backpressure (if the socket blocks on a write, it will pause reading results from the database),

This particular case feels a lot better with streams, as JDBC is blocking by nature and has no other options.

However, the simple code has its own issues. It closes the output stream on finishing the read; unless the output is really large, that wastes quite a lot (the TCP/TLS handshakes, authentication/authorization/etc.). If the output is large enough (implied by not loading the JSON into memory, and not having a where clause in the SQL), it exposes the database server to denial of service when the clients don't read fast enough, i.e. the database has to support as many connections as the frontend Java.


Additionally, Loom is about structured concurrency, built on top of virtual threads.

Two different projects, albeit that one is making use of the output of the other.


Weird for the article to compare OS threads to Loom threads, and skip over the last ten years of Futures.


But Futures still tie up a real OS thread. You can chain them together which alleviates some additional thread cost but it's not M:N scheduling, you're just being more clever with your OS threads. Virtual Threads separate those concepts.


> but it's not M:N scheduling

Why do you think it is not M:N scheduling? You have M tasks and N OS threads in the pool?


There is an example in TFA which shows how, if you have a normal thread pool with only a single thread to execute two tasks, one of them sleeping indefinitely, then the second task never executes. Then the example switches to virtual threads and both can execute (after also adding a sleep). This shows how you can have more tasks than threads and everything still gets a chance to run. That's not true with Futures prior to this. M:N means you can run more tasks concurrently than you have threads, just not with a speedup or more parallelism. So only virtual threads are M:N.


I think what you are describing is an API for manual scheduling: you call sleep() to give up control to the scheduler.

With tasks/futures you're supposed to submit another task to the pool, which will wake up after a certain time.

Both are "green threads" with a manual scheduler (you need to make an explicit call to give up control to the scheduler) but a different API.


Thread.sleep on a platform thread takes that thread out of action until the sleep ends. If an executor with 10 threads got 10 tasks that all called Thread.sleep(1000000) then it can run no more tasks until one of the threads wakes up.

When a virtual thread sleeps or blocks on IO it is unhooked from the underlying platform thread so another virtual thread can run. You can have an almost unlimited number of virtual threads multiplexed over a small number of carrier (platform) threads. Hence M:N
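
A toy illustration (assuming Java 21; the counts are arbitrary):

    import java.time.Duration;
    import java.util.concurrent.Executors;

    // 100,000 concurrently sleeping virtual threads; each sleep unmounts
    // its carrier, so a handful of platform threads carries them all.
    public class ManySleepers {
        public static void main(String[] args) {
            try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < 100_000; i++) {
                    executor.submit(() -> {
                        Thread.sleep(Duration.ofSeconds(1));
                        return null; // Callable, so sleep may throw
                    });
                }
            } // close() waits for the tasks to finish
        }
    }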


In your view the advancement is that with "green threads" they overloaded Thread.sleep(..), so it doesn't call the real Thread.sleep() but does something like Futures + ExecutorService underneath instead.

Java already had tons of non-blocking IO/HTTP and many other frameworks without "green threads", just using futures and ExecutorService. Green threads look like syntactic sugar.


I think Thread.yield() would do that?


Thread.yield() is very weird, and something I'd not advise using aside from a spin lock with backoff. It's OS dependent, and it may just bounce to the OS and back, continuing the current execution.


That's what should have been in the article.


But that's the topic of the article, per its title.

I say that as someone who once had a Solaris Internals book (and worked at Sun in that area), and it would be pretty great for that book's explanation to be online for Java people who never used the original green threads to see.


I learned a new word from this article, "Omissis", which means "omissions and redactions". Equivalent in this context to "...snip..." or "some code omitted".


Can I forget about reactive programming now? It's been fashionable for the last 5+ years, and I hate it. Kinda cool to play around with, but it just seems to double the complexity of everything.


Great, so Java has virtual threads 13 years after C# introduced tasks. Why can't Java introduce useful concepts faster?


The Java equivalent of Task is probably Future or CompletableFuture, which have been in Java since 2004 / 2014. In terms of running them, Executors have been there since 2004.

(Obviously this would have been 1:1 with OS threads)


Java had green threads in Java 1.1 (February 1997), C# 1.0 was released in January 2002...



