Ask HN: Dynamic memory/CPU provisioning for VMs?
51 points by sscarduzio on July 18, 2020 | hide | past | favorite | 31 comments
When AWS EC2 (Elastic Compute Cloud) was launched, the young inexperienced me initially understood this was a service to hire virtual servers by the hour, but the price would vary (in an "Elastic" way) according to how much RAM or CPU resources you use.

To my disappointment, this was obviously not the case.

Now, 15 years of technological development later, would such a service be possible?

What is the closest service to a truly "elastic" VM instance to date?



GCP e2 instances (which are roughly 30% cheaper) are the closest match to what you're asking for. These VMs run on overcommitted capacity and are migrated to a different physical host seamlessly when the resources are reclaimed.

Edit: e2 not n2 - https://www.google.com/amp/s/cloudblog.withgoogle.com/produc...


For CPU you have hotplug and quota scheduling; for memory you have hotplug and ballooning.

But when you say "the price would vary according to how much RAM or CPU resources you use" you get into the real complexity: resource sharing. If your VM temporarily gives up some RAM, can another VM use that RAM? This is very hard to do, because the provider doesn't know when/if you'll want that RAM back. They don't want a physical server to get into a situation where the RAM demand is higher than the installed RAM because there is no good solution to that scenario. If you're running hundreds of "micro" VMs/containers on one server you can rely on statistical multiplexing and luck, but it doesn't really work for large workloads.
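The "statistical multiplexing" point above can be made concrete with a toy model. This is an illustrative sketch only, with made-up numbers, assuming each VM's RAM demand is independent and using a normal approximation for the total:

```python
import math

def p_overcommit(n_vms, mean_gb, sd_gb, host_gb):
    """Normal-approximation probability that total RAM demand
    across n_vms independent VMs exceeds the host's installed RAM."""
    mu = n_vms * mean_gb
    sigma = math.sqrt(n_vms) * sd_gb
    z = (host_gb - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))

# Many small VMs: demand concentrates around the mean, so modest
# overcommit is safe -- 512 VMs averaging 0.5 GB on a 288 GB host:
print(p_overcommit(512, 0.5, 0.25, 288))   # vanishingly small

# A few large VMs: the same 12.5% headroom is far riskier --
# 8 VMs averaging 32 GB on the same 288 GB host:
print(p_overcommit(8, 32, 16, 288))        # roughly a 1-in-4 chance
```

Both scenarios overcommit by the same ratio; only the law of large numbers makes the first one safe, which is why the trick works for "micro" VMs but not large workloads.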

A provider called NearlyFreeSpeech has been charging based on "the use of one gigabyte of RAM for one minute, or the equivalent amount of CPU power" since even before EC2 existed AFAIK, but I suspect this complexity is more scary than attractive for most people. https://www.nearlyfreespeech.net/services/hosting
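Billing by the gigabyte-minute, as described above, is arithmetically simple; the complexity is all in the metering and the unpredictability of the bill. A minimal sketch (the rate and the sampling interval are made up, not NearlyFreeSpeech's actual pricing):

```python
def ram_charge(samples_gb, rate_per_gb_minute=0.00016):
    """Bill RAM by the gigabyte-minute: samples_gb holds one RAM
    reading (in GB) per minute of the billing window. The rate
    here is a toy number, not any provider's real price."""
    gb_minutes = sum(samples_gb)
    return gb_minutes * rate_per_gb_minute

# A process idling at 0.25 GB for 50 minutes, then spiking
# to 2 GB for 10 minutes: 32.5 GB-minutes in total.
usage = [0.25] * 50 + [2.0] * 10
print(round(ram_charge(usage), 5))
```

The scary part for customers is exactly this shape: the same workload can cost 8x more in a bad hour, which is harder to budget for than a flat instance price.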


Turns out to be a somewhat annoying problem at the VM level. Not impossible but complex enough that maybe higher-level solutions like functions are better.

Consider memory usage: operating systems (and some applications) are designed to grab all the memory they can and use it for caching etc. So it's hard for the VM host to know when it can grab memory and stop billing the user for it.

But there is this idea called memory ballooning -- you have a little process running on the VM guest OS that grabs lots of memory, but is actually in cahoots with the host, and just tells the host -- "hey I got all this memory, you can take it back and use it somewhere else".

Okay, so doesn't ballooning solve the problem? There are a few problems with it -- you can't balloon when you need the memory, because it's not fast enough. So you have to balloon pro-actively. And you don't know how much to balloon, so you have to guess: do it wrong and the guest OS will start swapping, or it might activate its OOM killer and start killing processes.
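The "you have to guess" part above is the crux. A hedged sketch of what such a guess might look like, where every threshold is an assumption pulled out of thin air (that's the point -- there is no principled value):

```python
def balloon_target(total_kb, free_kb, cached_kb, reserve_kb=262144):
    """Pick how much guest memory (KiB) to hand back to the host.
    Heuristic sketch: reclaim free memory plus half the page cache,
    but always keep reserve_kb spare so the guest doesn't start
    swapping or OOM-killing. All thresholds are guesses by design."""
    reclaimable = free_kb + cached_kb // 2
    return max(0, reclaimable - reserve_kb)

# A 4 GB guest with 1 GB free and 1 GB of page cache:
print(balloon_target(4194304, 1048576, 1048576))  # ~1.25 GB reclaimable
```

Guess too aggressively (small `reserve_kb`) and the guest swaps or OOM-kills; too conservatively and the host reclaims almost nothing. That tension is the whole problem.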

So making memory usage follow the application is kinda sorta possible but comes with hairy problems. What about CPU? CPU usage already follows the application, so you could just measure and bill accordingly -- except that nothing is gained if memory doesn't also follow usage.

All in all it's way simpler to get towards this goal with clearly-defined higher level services like Lambda.


And even with Lambda you have to decide beforehand. At least power tuning helps a bit here.


And keep in mind the premium price that comes with cloud Functions as a Service. Unless you're running a very intermittent batch job with strict control and security policies, most traditional organisations really shouldn't need to build on this type of architecture. If your ops and dev team is fewer than five people, maybe consider it to save on ops costs. But I think it's fair to say that for some orgs this premium just isn't worth it, and they don't know it until they've been billed millions of dollars over the years and a CFO/CTO finally runs cost-cutting programs.


The idea is that the ops for a comparable architecture without FaaS is more expensive than just running FaaS and not having to pay, or talk to, as many people.


I'm interested in CPU core ballooning. I'd be willing to manually configure my code to request more CPU cores.


https://github.com/purpleidea/mgmt/ can dynamically add and remove vcpus to a running vm. Each change has sub-second precision, and a second https://github.com/purpleidea/mgmt/ running in the vm can detect this and charge workloads accordingly if so desired.

There are videos of it happening, but no blog post yet.


t3 instances on AWS have burst capacity charges if you choose the unlimited option. It's $0.05 per vCPU-hour, and is only charged if you exceed the accumulated burst capacity.

So running a t3 would allow you to pay for a baseline and then only pay for the CPU you end up needing beyond that baseline.
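The baseline-plus-overage arithmetic is easy to sketch. This is a deliberate simplification (it ignores the CPU-credit bucket you earn and spend before surplus charges kick in), using the published $0.05/vCPU-hour surplus rate:

```python
def t3_extra_charge(vcpus, baseline_pct, avg_util_pct, hours,
                    rate_per_vcpu_hour=0.05):
    """Simplified surplus charge for a t3 in unlimited mode:
    utilization above the baseline, averaged over the period, is
    billed per vCPU-hour. Ignores the earned credit bucket, so it
    overestimates the real bill for short bursts."""
    over = max(0.0, avg_util_pct - baseline_pct) / 100.0
    return vcpus * over * hours * rate_per_vcpu_hour

# A t3.micro-like instance (2 vCPUs, 10% baseline) averaging
# 25% utilization for a 720-hour month:
print(round(t3_extra_charge(2, 10, 25, 720), 2))  # about $10.80 extra
```

So you do pay roughly for what you use above the baseline -- it's the closest AWS gets to the "elastic" CPU pricing the question asks about.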


Removing RAM from VMs turns out to be quite tricky. Hot-unplug very rarely works, because the OS tends to have allocated stuff all over, so you don't have a nice large DIMM-like quantity to unplug. Ballooning kind of works, but it's more advisory: there's nothing to stop a guest gobbling that RAM up again (and it has other issues). David Hildenbrand's virtio-mem might help solve this; see: https://www.youtube.com/watch?v=H65FDUDPu9s


Not for VMs but definitely for containers and functions (like AWS Lambda). You can configure them with soft and hard limits but also invocation counts and runtime.

Doing the same in a VM might be possible (the technologies exist) but it's often the task or workload that needs to be modified to support it and when you are already doing that a step to containers, functions or horizontal scaling is just as easy (or hard). Horizontal scaling based on load is pretty common (classic ASGs but also overcapacity cost bidding based scaling).


It would be possible. Paying for CPU and memory was the business model of hundreds of data processing companies that ran mainframe batch jobs and time-sharing services on partitioned machines. Mainframe companies billed users by machine usage (with the machine on-site). This is obviously doable.

It raises some isolation concerns, however. To make this work, the host needs to know how much memory is allocated to a given tenant, and that's difficult without having access to the OS running inside the machine -- easy with containers, not so much with VMs. The tenant can turn off virtual CPU cores in the VM and the hypervisor can pick that signal up (at least in Linux), but I'm not sure it's possible right now to do the same with virtual memory modules. I'd love it if AWS allowed me to do that, because the boot process of some of my workloads is more CPU-bound than the rest of the machine's lifetime, and having a bunch of extra cores at boot would do wonders. If, under memory pressure, I could "plug in" more virtual memory modules, that would also be quite nice.

There could be a market opportunity, but whoever does it would need to beat the current incumbents in price or this would not fly. Also there would be some difficulty scaling up - all these extra CPUs and memory would need to come from somewhere and would necessarily fail (or need to trigger a live VM migration, which, IIRC, nobody does in cloud environments right now) if the host box's resources are fully allocated.


With appropriate cooperation between host and guest, you could use memory ballooning [1] for this. Basically, the guest is told the system memory is huge, but allocates most of it to a balloon driver that just gives it back to the host. You could signal the host to give it back to the guest when needed / paid for.

Complications: a lot of OSes and some programs tune themselves on startup to the size of the RAM, so if the balloon is large, you would have a lot of RAM tied up in bigger kernel structures that might never be used. There's probably a practical limit on how much your memory could grow. Provisioning would be complex, as you mentioned.

Google Cloud is documented to do live migration away from hosts that need maintenance, so it seems possible to at least potentially move guests if they want more RAM and can't get it on the current system. You could also colocate expandable guests with preemptible guests, and preempt guests when expansion was needed. Presumably, you could also make a tier with preemptible RAM, but that seems pretty specialized; I'm not sure you would get enough users to meet provisioning needs.

Some physical systems have hotplug ram, but it's not very common. The expense doesn't seem worth the gains in most cases.

[1] https://en.m.wikipedia.org/wiki/Memory_ballooning


Given that hotplug exists, couldn't it be used to communicate the amount of available RAM to the guest instead of ballooning? That would leave the OS responsible for swapping hungry programs out and resizing kernel structures.

I guess the missing piece is that it doesn't leave a good way for the guest to request more RAM (maxing out the balloon?).


What if the guest's way to ask for more RAM is a super simple API call, coming from a simple autoscaling system based on basic CPU/RAM usage metrics?
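Such a guest-side autoscaler could be a tiny policy loop. A sketch under stated assumptions: the thresholds, step size, and the resize endpoint it would call are all hypothetical -- no cloud provider exposes this API today, which is the thread's whole point:

```python
def scale_decision(used_gb, total_gb, step_gb=1.0,
                   high=0.90, low=0.40):
    """Tiny autoscaling policy for a guest agent: return how many GB
    to request from (+) or return to (-) the host. The thresholds
    and the request mechanism are hypothetical."""
    util = used_gb / total_gb
    if util > high:
        return +step_gb   # e.g. POST to a hypothetical /vm/self/resize
    if util < low and total_gb - step_gb >= used_gb / high:
        return -step_gb   # shrink only if we'd still be below `high`
    return 0.0

print(scale_decision(3.7, 4.0))   # under pressure: ask for 1 more GB
print(scale_decision(1.0, 4.0))   # mostly idle: give 1 GB back
print(scale_decision(2.0, 4.0))   # in band: do nothing
```

The low/high hysteresis band is there so the guest doesn't flap between growing and shrinking on every sample.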


I expect the mainframe model will re-emerge at some point. Paying for consumption instead of capacity is just too attractive.

One thing that would make this easier would be a shift towards systems like LegoOS[0]. Instead of a host-centric design, you directly attach hardware components to a "fast enough" network, each with a little bit of local controlling logic. So there are no hosts per se, just a pool of CPUs, a pool of RAM, a pool of disks etc.

It wouldn't be suitable for workloads where locality dominates flexibility. But in the long run, it could be very useful for scale economies. Especially since different workloads have different patterns of demand for CPU, RAM and disk.

[0] https://www.usenix.org/system/files/osdi18-shan.pdf


for memory the tech already exists: https://en.wikipedia.org/wiki/Memory_ballooning

you could bill based on how much has/hasn't been ballooned out

CPU: AWS already has burstable instance types, which bill you more for usage if you exceed your quota: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstabl...


Burstable cores work as long as you are sharing the core with other processes. The moment you need a whole core to yourself, you either migrate the process to a faster core (imagine a Xeon Lakefield with 16 beefy cores and 128 Atom-like ones), possibly on a different host, or add more cores to the VM.


I have a related question: how can you know that a VM is being limited in terms of CPU power?

As an experiment I deployed several apps to the Azure S1 tier, and they appear to be a good bit slower than running on a VM on our Dell R740 on-premises server.


Be aware, most cloud companies consider a 'core' to be a hyperthread.

When ARM instances launched on AWS, it was a double punch, because there's no hyperthreading on those chips, so you get a full core at a cheaper price.


The cloud is always going to be slower than physical hardware of similar specs. That's a fact. You don't use the cloud for performance.

You can't, unless the provider is exposing those kinds of metrics to the system. Some systems, such as VMware, can expose CPU allocation time, CPU wait, and other metrics that show how much time you're waiting for resources.

Azure S1s are shared CPU. My guess is that Azure is running 4 VMs per core or something, so you can get more than 25% of a CPU if there isn't any contention, but you might only get 25% of that core.


It's possible in VMware to add additional CPUs: https://blogs.vmware.com/performance/2019/12/cpu-hot-add-per...

However we kind of moved on to disposable servers which can be scaled on demand. In which case you just add additional servers or adjust server type depending on requirements. Same with containers.

The need to add RAM or CPU to a running instance dates from the time when you had a single long-lived instance serving an application.


Yeah of course you can.

Simply allocate each VM one core per hyperthread and record the CPU time used in total.

Nobody actually does this because it makes billing complicated. Both practically and for sales.

My guess is this would result in less revenue overall, as well; AWS is for sure making lots of money selling the same CPU time to a dozen people who aren't actually using it.
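Metering per-VM CPU time is technically trivial; it's the billing model that nobody wants. A sketch of the metering side, assuming you sample a monotonic per-VM CPU-time counter (such as a cgroup's `cpuacct.usage`, in nanoseconds) once a minute:

```python
def cpu_seconds_billed(samples_ns):
    """Turn monotonic per-VM CPU-time counter samples (nanoseconds,
    e.g. periodic reads of a cgroup's cpuacct.usage) into billable
    CPU-seconds, tolerating counter resets after VM restarts."""
    total = 0
    for prev, cur in zip(samples_ns, samples_ns[1:]):
        if cur >= prev:          # normal monotonic growth
            total += cur - prev
        else:                    # counter reset: count from zero
            total += cur
    return total / 1e9

# Three sampling intervals; the VM burned 90 CPU-seconds in total:
print(cpu_seconds_billed(
    [0, 30_000_000_000, 75_000_000_000, 90_000_000_000]))  # 90.0
```

The hard part isn't this arithmetic -- it's that a bill derived from it is unpredictable to the customer and harder to sell than a flat per-instance price, which is the parent's point.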


> AWS is for sure making lots of money selling the same cpu time to a dozen people not actually using it.

They "for sure" aren't; their prices are so high precisely because the resources are guaranteed (in most cases).


I agree with that. Amazon pretends to sell you "burstable" instances, but they know most people are going to be bursting at the same times of day: most applications want low latency and are hosted close to their users, applications that are worth paying for are used by businesses, and business users use the app between 9 and 5 on weekdays. As a result, you don't get much bursting, and the instances aren't much cheaper than non-burstable instances.

Good resource utilization will only happen when you have non-interactive tasks that can be scheduled to smooth the demand. When I worked at Google, this is something we had; plenty of people were willing to run their mapreduces overnight, and take advantage of the purchased interactive capacity that other services needed during the day. In the real world, I've never had anything like that. Users use our website during the day. Employees use the internal apps during the day. Developers send their code off to CI during the day. That means from 9-5 there isn't enough capacity (builds have to wait) and from 5-9 the computers are just sitting there bored. I don't see a way around it, even at cloud provider scale, just because so many users are like me. I might be willing to send CI builds off to some part of the world where it's nighttime -- if you'll give me the CPUs for free. Nobody appears to have written software to do this (maybe the big CI providers do this in the background; but I haven't noticed a lot of latency while ssh-ing to CircleCI jobs for example), so we sit here using computers very inefficiently.

Ultimately, computers are cheap enough that it doesn't matter. I have a 10-core workstation that sits idle most of the time. Having a fast build when I need one is nice, and the cost of having those cores sit idle is minimal. I would be happy spending even more money on my workstation for that reason. I imagine people treat their servers the same way, and aren't tempted to tweak things, because their workloads are interactive and autoscaling introduces latency.


The performance of EC2 instances varies significantly between instances.

If the resources were actually guaranteed that simply wouldn't happen.

"Noisy neighbors" wouldn't be a phrase people even knew if providers actually guaranteed resources.


As far as memory goes, your OS will normally gobble up as much RAM as it can get for caching. You would have to go out of your way to make it memory-frugal so it could yield RAM back to the hypervisor, and that could impact performance.

It's easier with CPUs where you can just yield back if there's no work.


AWS’s Aurora Serverless is charged for in this manner albeit with a minimum level.


Your title implies dynamic hardware provisioning while the post is about pricing for use? Which is it?


I understand it to be about both. VMs can have CPUs and RAM added or removed while running as long as the kernel supports it. Linux supports it. I'm not aware of a service that sells this feature though


Use AWS Fargate, Lambda, or other serverless technology.



