> Since Go doesn't inline assembly, I decided to include the search loop in the assembly itself and to return the index of the non-matching character. Sadly I didn't get it to work fully, because I'm missing tools to actually debug the assembly itself.
Um... "gdb <my_binary>", "break <my_function>", "run", just like any other tool.
Then "disassemble" to see the instructions around the PC, and "stepi" to execute one instruction. "info registers" gives the current state (or use registers in an expression, e.g. "print $rax" or "print $r11").
And that's it. Debuggers natively operate at the level of machine instructions; the language syntax support is a layer on top.
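A minimal session along those lines, commands only (binary and function names are made up):

```
$ gdb ./my_binary
(gdb) break my_function
(gdb) run
(gdb) disassemble
(gdb) stepi
(gdb) info registers
(gdb) print $rax
```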
Assembly is one of those things that is feared by so many engineers, but is actually incredibly accessible. Whenever I find an excuse to drop down to assembly, my colleagues look at me like I'm a wizard with three heads.
The tooling to debug, dump or inspect, or to link or otherwise glue bits of assembly together has existed for decades. It's all right there!
I agree it's unnecessarily feared. At its core, there's very little complexity to it.
But actually writing asm is an experience straight from the 80s. Compared to modern languages, the tooling is simply antiquated. I noticed this once I started writing larger sections of asm at a time. It's "fine" in the same sense that writing makefiles is "fine" - it becomes a PITA after a certain size.
I think there's lots to improve on the current state of asm, but it's just too niche. The only power users I know are hobbyists working on operating systems and games for retro hardware.
Also add cryptography people to that list. A friend of mine told me that they write constant-time algorithms.
These have to be written in assembly, because otherwise the compiler will try to speed up the algorithm. If it does, you can theoretically measure the execution time of the encryption and reverse-engineer the possible keys.
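For illustration, here's the shape of such code sketched in C (the function name is made up; and as the comment above says, an optimizer is in principle free to undo this, which is exactly why crypto libraries often drop to assembly):

```c
#include <stddef.h>
#include <stdint.h>

/* Constant-time byte-array comparison: touches every byte and never
   branches on secret data. Contrast with memcmp, which returns at the
   first mismatch and so leaks the mismatch position through timing. */
int ct_equal(const uint8_t *a, const uint8_t *b, size_t len) {
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= a[i] ^ b[i];   /* accumulate all differences, no early exit */
    return diff == 0;          /* 1 if equal, 0 otherwise */
}
```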
It's one of those areas where LLMs really help you get started. Not much invention is needed, but the setup is quite involved if you don't know precisely what you're looking for. Having them spit out the boilerplate and the needed invocations lowers the barrier substantially.
Assuming you have a C compiler, you should use C for the basic boilerplate and use "asm" or "asm volatile" to drop down to the assembly level.
This is true for GPUs, CPUs like x86 and ARM, and more. Only on the smallest of embedded systems like AVR with 4kB RAM have I ever found dropping to raw assembly to be useful.
Bonus points: input/output/clobber constraints work well with the GCC and Clang optimizers. It's not always intuitive how the compiler can still move your assembly around, but it's surprisingly full-featured as a way to get data into and out of registers.
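A sketch of that input/output/clobber syntax (x86-64 AT&T syntax assumed; the function name is made up):

```c
#include <stdint.h>

/* GCC/Clang extended inline asm: "=r" declares a register output,
   "0" ties an input to output operand 0, "r" is a plain register
   input, and "cc" tells the optimizer the condition flags are
   clobbered. The compiler chooses the actual registers. */
static inline uint64_t add_u64(uint64_t a, uint64_t b) {
    uint64_t result;
    __asm__("addq %2, %0"
            : "=r"(result)     /* outputs */
            : "0"(a), "r"(b)   /* inputs  */
            : "cc");           /* clobbers */
    return result;
}
```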
There are rare times where the C boilerplate is counterproductive. I guess OS level setup is another spot where raw non-C assembly is useful.
Intrinsics, when available, are a convenient tool. But even then, I think assembly these days has very limited applications and most programmers don't need to know any assembly to be effective.
Preemptive disclaimer: all my jobs were the exception to the rule and I sadly had to work with various assembly languages on a daily basis. Still not a fan.
Things like condition flags, GPU execution masks, or other 'implicit' state are where assembly is needed.
Otherwise, intrinsics are sufficient. And with wavefront voting / ballot instructions (and intrinsics) being so readily available on modern GPUs, there's very little reason to go to those assembly level / machine code flags or registers.
Similarly, back when cmov instructions weren't reliably output by compilers, you'd probably want raw assembly. But today I think it's safe to rely upon the optimizers.
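For example, with optimizations on, current GCC and Clang will typically compile a simple select like this one to a conditional move rather than a branch (a sketch, not a guarantee for every target or flag combination):

```c
/* Branchless select between two already-computed values: the classic
   pattern that optimizing compilers turn into cmov on x86. */
int select_min(int a, int b) {
    return (a < b) ? a : b;
}
```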
You should be able to read assembly, not least because understanding what's generated by the compiler or jit is pretty important to performance optimization.
Reading is indeed the main thing I would like to learn. For my master's thesis I did some GPU programming (8 years ago), and back then it was super useful to read the assembly to reduce the number of instructions and to understand the 'execution model'.
It also lets you make sure your 'optimization' actually optimized anything.
Aren't there many other steps with better bang for your buck to be done before that? Libraries, better algorithms and data structures, parallelism, etc.
I suspect it depends: when you write a YAML/JSON parser, you can only change the algorithm up to a point.
After that you have to start doing some bit fiddling, and then being able to see the assembly can be really valuable.
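The kind of bit fiddling meant here, sketched in C: a SWAR check for a byte in a 64-bit word, the sort of trick where looking at the compiler's assembly output tells you whether it did what you hoped (helper names are made up):

```c
#include <stdint.h>
#include <string.h>

/* SWAR: scan 8 input bytes at once for a target byte (say '"' while
   parsing JSON). XOR zeroes out the matching bytes; the classic
   (x - 0x01...) & ~x & 0x80... expression is nonzero iff some byte
   of x is zero, i.e. iff the target occurs in the word. */
static int word_has_byte(uint64_t word, uint8_t target) {
    uint64_t x = word ^ (0x0101010101010101ULL * target);
    return ((x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL) != 0;
}

/* Convenience wrapper over the first 8 bytes of a buffer. */
static int chunk_has_byte(const char *p, uint8_t target) {
    uint64_t w;
    memcpy(&w, p, 8);   /* safe unaligned load */
    return word_has_byte(w, target);
}
```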
How many programmers write a YAML/JSON parser vs. use an existing library?
How many of the ones who write their own parser would benefit from using a library more than from reading assembly?
If your answer is: "well, the ones writing the library benefit from learning assembly"... think about what percentage of programmers they represent. Not to mention that source-level profiling will still give them better bang for their buck.
As somebody who has read a ton of assembly in their career because those marginal gains mattered: 99% of programmers are better off investing their time elsewhere.
Yes, I agree with that one: most people don't need it. They should first use a profiler; with that alone they can easily improve performance by 10x.
For example, I optimized a calculation with Python dataframes by monkeypatching the 'unique' method so it would skip the sort, since my data was already sorted. This gained me a 5% performance improvement. (There were a few other tricks, which reduced the calculation time from 3h to 20m, making iterating faster.)
So I guess the assembly part is just a personal interest, and it is only useful for the innermost loop of a program, which you can't avoid.
It seems that, in general, using SIMD/intrinsics is already part of a developer's very advanced playbook, just like reflection, classpath scanning, GPU acceleration, etc.
Ideally the standard library should provide the fastest JSON/YAML/CSV parser so no other attempts are made to improve on the standard.
I suspect your argument could even be extended: if you need performance, it might be easier to just switch languages. Somebody once excitedly told me that he used a JavaScript lib which generated optimized deserialization code for SQL queries at runtime. I bluntly said: well, shouldn't you just use another language to avoid this complexity?
Curious question: why did you read assembly so often in your career?
Yes, I understand. But the only way that gives you good bang for your buck is if you have already exhausted a number of other areas earlier. I.e. it is marginal gains.
How many programmers out there would be better served by spending time learning more about those other areas before they even start thinking about whether the compiler is generating slightly suboptimal assembly?
I also like "disp /3i $pc", which tells gdb to disassemble the next three instructions whenever it stops (including after stepi). Though I'm sure you can configure it to give you more sophisticated disassembly display and context, especially with the tui interface, this is usually "good enough" for me when I run into something I want to step through at the assembly level in the middle of debugging a program that's primarily C.