Blurred rounded rectangles

gfxgirl · on April 26, 2020

> elaborate 3d scenes built up out mostly out of distance field primitives, a stunning demonstration of the power and flexibility of the technique.

Also a demonstration of how slow that technique is. I can run stunning games with entire cities of buildings and people and cars and mountains in the distance and trees and grass and clouds all running at 60fps or faster. Or I can run some SDF that runs at 0.2 to 3 fps on the same machine.

Don't get me wrong, I'm blown away by those shaders but they aren't remotely performant.

This particular technique might be okay but you'd still arguably be better running it on 4 quads that make a frame. There's no reason you want to be computing pixels in the middle of the frame where there is no shadow.

raphlinus · on April 26, 2020

It depends on how you use it. The Shadertoy examples demonstrate that it's possible to do a whole scene as one shader, but I agree the performance is not great for complex scenes.

This one is intended to be used as part of a 2D renderer. It would be easy to write this up as a fragment shader (and the parameter calculation can be done in either the vertex shader on the CPU). I believe, without having tested it yet, that it would be lightning fast, especially because the heart of the erf approximation can be done with inversesqrt.

I'm also working on a compute-centric renderer, and for that I expect it to be even faster. There, I'll break the scene into tiles, and for each tile there can be an analysis of what's inside. So tiles on the interior of a large blurred rounded rectangle can be solid colors, the edges can just compute a 1d function, etc, and only the corners with the full version.

Keep in mind, what's done in many, if not most, rendering pipelines is to render the rounded rect into a buffer, run a Gaussian blur shader over that buffer into another one (often two or more stages for the separable version of the kernel or something like a dual Kawase blur). Even the simple fragment shader version should massively outperform that.

kroltan · on April 26, 2020

> Also a demonstration of how slow that technique is.

Or how fast! SDFs can compute approximations of volumetric effects that on a regular raytracing engine would take a few seconds to render.

Additionally, games have been using (baked) 2D SDFs [1] for ages to render world-space (and recently even plain screen-space) text, it's plausible to use the same technique to generate other kinds of shapes.

It can be very useful for UI elements since you can have just a few source assets, and with some shader parameters you get fully animatable effects such as drop shadows, glow, and even normals.

[1]: https://steamcdn-a.akamaihd.net/apps/valve/2007/SIGGRAPH2007...

BubRoss · on April 26, 2020

You are conflating multiple things here. Marching through volumes for volumetric effects are not going to be made faster by SDFs generally because you need to march through the videos from an eye ray and march back to the light sources while doing it.

2D signed distance fields are also different, since as textures, each shader fragment is still looking at the same pixels it would have seen before and is just able to do a little extra work with the values it finds to create sharp text.

dahart · on April 26, 2020

Lots of amazing SDF examples run at 60Hz, are you thinking of some specific examples that run at 3fps and below?

It’s not really fair to compare SDF with rasterizing using a game engine that has bounding volumes. They are different things.

Some of the terrain & grass SDF examples run faster than any raster engine can ever do it.

> you’d still arguably be better running it on 4 quads

Nothing about the article precludes doing that, right? The technique would work without modification if you attach it to some quads and leave out the middle. Probably even better yet, just exclude the middle from any calculations in the shader and use 1 quad...

Lorin · on April 27, 2020

The folks at CIG (makers of the upcoming Star Citizen game) are hard at work optimizing SDF for shaders.

grenoire · on April 26, 2020

Fun fact: iOS app icons are not rounded rectangles (rectangles with 90 degree arc corners) but squircles, which are roughly superellipses with n=5. The linked Wikipedia articles also remark this.

busymom0 · on April 26, 2020

Same thing with the rounded corner of the iPhone X series. One time I was trying to make one of he UIView match the corner radius of the screen and it wasn’t matching until I learnt that the rounded corners of the screen/device is a 38.5 squircle (not sure if that’s the real term). Basically instead of having a sharp curve, it starts much earlier.

foxes · on April 26, 2020

If you compute the curvature it starts flat, does a few bump and then flattens again (this curvature is not a very smooth function however). An n-ellipse is not quite the right curve. Apple used a bunch of Bezier curves. I think their design could be made a little more aesthetic if you used some smoothed piecewise bump (like a Gaussian nearly). You can numerically solve for the curve given the curvature and then approximate it.

saagarjha · on April 27, 2020

That’s a slightly different curve, and I believe the corner radius for that is defined to be 39.

busymom0 · on April 28, 2020

This is where I originally learnt about it. Seems like it's called a continuous corner, or “squircle”:

https://kylebashour.com/posts/finding-the-real-iphone-x-corn...

saagarjha · on April 28, 2020

The corner radius is 39; you can find it by peeking inside iOS. CALayer has "continuous corners", which are very similar to but just slightly different than the app icon shape, which is a 16-part Bézier curve generated inside of the MobileIcons framework. This curve is as far as I can tell identical to the one that UIBezierPath.init(roundedRect:cornerRadius:) will give you when the corner radius is 22.5% of the side length. (Note that the app icon on the home screen, being of a constant size, is actually generated via an image mask rather than dynamic clipping.)

jacobolus · on April 26, 2020

Raph also linked to https://www.figma.com/blog/desperately-seeking-squircles/

ape4 · on April 26, 2020

https://en.wikipedia.org/wiki/Squircle Please don't vote this up ;)

DominikD · on April 26, 2020

Fun fact: physical controls on the second generation Zune (released 2008) were squircles.

dirtydroog · on April 26, 2020

It reminds me of the techniques used in railways to smooth the transition from straights to bends.

https://en.m.wikipedia.org/wiki/Track_transition_curve

raphlinus · on April 26, 2020

Not a coincidence. I talk about that a bit in my thesis on curves: https://levien.com/phd/phd.html

saagarjha · on April 27, 2020

Actually, iOS app icons are not superellipses but a 16-part Bézier curve.

pixelpoet · on April 26, 2020

See also the great article from 2001 by Michael Herf (these days best known for f.lux): http://stereopsis.com/shadowrect/

lainga · on April 26, 2020

Why again are explicit `min` and `max` faster? Is that GLSL specific, and (unlike say C++, where std::min and std::max are just `if (__a < __b) return __a; return __b;`) the compiler won't be able to turn a one-line conditional into an ARB MIN or MAX instruction?

Jasper_ · on April 27, 2020

The classic explanation is "divergence", and it goes something like this: On GPUs, branching is tricky because the same code is evaluating many pixels at the same time. If half of those pixels go one way, and half of those pixels go another way, the GPU has to run both pieces of code, with half the results "masked out" [1]. This is why branchless code tends to be more idiomatic in shading languages.

You might ask how max() and min() are implemented with a branchless model. Sometimes the GPU has a native instruction for it, and a "sufficiently stupid compiler" might not be able to recognize the branch and turn it into the corresponding max/min.

The modern reality is that most all GPUs all have conditional move instructions which allow them to do some amount of branchless conditional across vector lanes like "x >= 1.0 ? x : 0.0;" without incurring the penalty of true flow control.

However, some are still uncomfortable with trusting the compiler to recognize and support this, especially on mobile chipsets with poor quality compilers. Others still just prefer the coding style of the idiomatic branchless expressions, since it's what they're used to.

[1] Footnote: On super old GPUs, like those in the Direct3D 8 era, flow control was emulated completely through branchless systems. The native machine ISA was something like a series r=lerp(A,B,C)+D instructions, and flow control amounted to clever abuse of this paradigm -- lerping to 0.0 or 1.0 can get you a form of conditional move.

raphlinus · on April 26, 2020

There are two considerations here. One is whether you get the optimal assembly for min and max operations. Using the GLSL intrinsic is probably the best way to get confidence in that, but it's also likely that compilers are smart enough to figure it out when given other input. If it did compile to an actual branch, it would be much slower. Also note on AVX there are VMINPS and VMAXPS instructions.

The other consideration is the style of writing the code. If you were writing for a sequential processor with fast branching, it would be very tempting to write "if in the corner, compute this. If exterior on the edge, compute that. etc." This might save quite a bit in the number of "actual work" operations, but is much more likely to compile into branches, which on a GPU (or SIMD or, likely, any modern CPU) the will cost more than the work saved.

So it's very idiomatic when writing shader code to use min and max to combine a bunch of cases into a unified code path that can be executed as a straight line.

anonymoushn · on April 26, 2020

It probably compiles to the same code as the equivalent if statements. Source: this person who certainly knows better than I do https://twitter.com/bgolus/status/1235254923819802626

benkoller · on April 26, 2020

It really gives me great pleasure to find gems like this article on HN. I can't see a future in which I'd otherwise gained the insights I've now gained through reading (your?) piece about the complexity of blurring complex shapes. Thanks for that.

dahart · on April 26, 2020

Nice article! I have a couple of questions.

> [reciprocal square root] it is particularly well supported in SIMD and GPU and is generally about the same speed as simple division.

Curious, Raph - why is the erf using f64? Reciprocal square root is well supported for single precision, but not double precision. And the spline fit constants in there are single precision anyway. I’m guessing it’d be a lot faster with no harmful effects as f32. (Seems to work fine on ShaderToy BTW).

Also curious if erf() might be overkill? Did you compare to using a smoothstep()? What are the quality indicators you’re looking for? It seems like I get very close to the same results as your erf approximation if I use smoothstep(0., blurwidth, sqrt(d)) where d is the SDF distance to the box. (With the added benefit that I automatically have a strict bound on the blur.)

raphlinus · on April 26, 2020

Sure, you'd want to do this with 32 bit floats in production, the f64 was really for prototyping.

I didn't compare smoothstep. It's worth doing an analysis of the tradeoff between performance and quality. In any case, I think in practice this erf approximation will be plenty fast, and probably a bit better quality, especially in the tail region.