Location: France
Remote: Yes
Willing to relocate: No
Technologies: Rust, C/C++, distributed systems, high perf computing
Résumé/CV: Started as compilers guy (embedded C, Verilog-A), continued in embedded dev in distributed systems, then spent the last decade as Rust dev. cf https://www.linkedin.com/in/xavierbestel https://github.com/bestouff
Email: xav@bes.tel
> Successive governments have succeeded in making Britain less attractive to immigrants. Unfortunately, they may also have made the immigrants who arrive less attractive to Britons. As the number of migrant workers has collapsed, asylum-seekers have become a bigger part of the mix
Wow, if only there were a way for a sovereign nation to control the composition of its immigrants... Crazy how this is framed as a factor completely out of their control, rather than the deliberate policy choice that it is.
The problem of UB is not really that it may crash in some architecture. The real problem is that the compiler expects UB code to NOT happen, so if you write UB code anyway the compiler (and especially the optimizer) is allowed to translate that to anything that's convenient for its happy path. And sometimes that "anything" can be really unexpected (like removing big chunks of code).
One example along this path as an example is that every function must either terminate or have a side effect. I don't think one has bitten me yet but I could completely see how you accidentally write some kind of infinite loop or recursion and the function gets deleted. Also, bonus points for tail recursion so this bug might only show up with a higher optimization level if during debug nothing hit the infinite loop.
Infinite loop without side effects == program stuck and not responding on user input and not outputting anything. That's not something a useful program will ever want to do.
Yes, the C++ committee has been making some stupid decisions lately. This is not the only one.
Low level platform-specific code that needs to hot spin until an interrupt happens can use assembly for that part which it will need to do for the interrupt handler anyway.
The problem is when you accidentally write an infinite loop. In a different language, you run the code, see that it gets stuck and fix it. In C, the compiler may delete the function, making it hard to realize what is happening.
I'll help. A call to `stpcpy` that ignores the return value can be swapped with a call to the (more likely to be optimized) `strcpy`. Since that's infinite recursion, and there is no forward progress, it's undefined behavior and anything goes.
This isn't just theory, it actually broke things in practice for me.
C does allow unconditional infinite loops (e.g. "while (1) { }" isn't UB) but still is UB if the controlling expression isn't constant (e.g. "while (two < 10) { }" is UB if two is a variable less than 10)
Yes, that is a problem, but this is also the most useful feature and reason for UB. People that suggest to just define it or make it unspecified, miss, that the compiler being able to remove whole parts of a program is the point. When I write code, that is UB for certain inputs, it is because I do not intend the program to have any behaviour for these inputs. I do want the compiler to optimize those away or do anything that effects from the behaviour of the other defined cases. It is deeply satisfying to add some conditions triggering log strings and see that they do not occur in the binary, because they can be only reached via UB.
The point in the article that 'It's not about optimisations' really got my attention. I've previously done some work where we wrote an analysis pass under the assumption that it executed last in the transformation pipeline and this was needed for correctness. The assumption was that since no further optimisations happened it was safe. Now I'm not so sure...
Removing code paths that the programmer has explicitly laid out in the source code should be made a hard compile error unless the operation has been tagged with an attribute (anyone who wants to add the unsafe keyword to C? ).
Another commenter suggested using LLMs, but I disagree. Having clangd emit warning squiggles for unchecked operations (like signed addition) would be a good start.
> Removing code paths that the programmer has explicitly laid out in the source code should be made a hard compile error unless the operation has been tagged with an attribute (anyone who wants to add the unsafe keyword to C? ).
Dead code elimination is essential for performance, especially when using templates (this is basically what enables the fabled "zero cost abstraction" because complex template code may generate a lot of 'inactive' code which needs to be removed by the optimizer).
The actual issue is that the compiler is free to eliminate code paths after UB, but that's also not trivial to fix (and some optimizations are actually enabled by manually injecting UB (like `__builtin_unreachable()` which can make a measurable difference in the right places).
Dead code elimination is run multiple times, including after other optimizations. So code that is not initially dead may become dead after propagating other information. Converting dead code into an error condition would make most generic code that is specialized for a particular context illegal.
This is trickier than it initially seems. Using preprocessor directives to include or exclude swaths of code is a very common thing, and implementing a compiler error as you described would break the building of countless C codebases.
enum op_t{ add, mul };
int exec(op_t op, int a, int b) {
if(op == add) { return a+b; }
if(op == mul) { return a\*b; }
}
c = exec(add, a,b);
Should be the compiler be prevented from inlining exec and constant-propagating op and removing the mul branch? What about if a and b are constants and the addition itself is optimized away?
reply