#define hc_task_yield(task)     \
  do {                          \
    task->state = __LINE__;     \
    return;                     \
    case __LINE__:;             \
  } while (0)
That's just diabolical. I would not have thought to write "case __LINE__". In the case of a macro, using __LINE__ twice expands to the same value where the macro is used, even if the macro has newlines. It makes sense, but TIL.
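The trick can be seen end to end in a small, self-contained sketch. The names `struct task`, `TASK_BEGIN`, `TASK_YIELD`, `TASK_END` and `count_to_three` are hypothetical, not taken from the book's code. The yield macro follows the same pattern as `hc_task_yield`. The whole function body sits inside a `switch`, so each call jumps straight to the `case __LINE__:` label recorded by the previous yield.

```c
#include <assert.h>

/* Hypothetical task type: `state` is the line to resume at
   (0 = not started, -1 = finished). Anything that must survive a yield
   lives here, because the stack frame is gone after `return`. */
struct task {
  int state;
  int counter;
  int steps_run;
};

/* Opens a switch on the saved state; closed again by TASK_END. */
#define TASK_BEGIN(t) switch ((t)->state) { case 0:

/* Same idea as hc_task_yield: both __LINE__ uses expand to the line of
   the macro invocation, so the stored state matches the case label. */
#define TASK_YIELD(t)      \
  do {                     \
    (t)->state = __LINE__; \
    return;                \
    case __LINE__:;        \
  } while (0)

/* Closes the switch and marks the task as finished. */
#define TASK_END(t) } (t)->state = -1

/* Runs one loop iteration per call, resuming inside the for loop. */
static void count_to_three(struct task *t) {
  assert(t->state != -1); /* do not resume a finished task */
  TASK_BEGIN(t);
  for (t->counter = 1; t->counter <= 3; t->counter++) {
    t->steps_run++;
    TASK_YIELD(t);
  }
  TASK_END(t);
}
```

Calling `count_to_three` until `state` becomes -1 takes four calls: three that each run one step, and a last one that lets the loop finish. The struct makes the main limitation visible: every variable that must survive a yield has to live outside the function's stack frame, and two yields must never share a source line.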
I've written C on-and-off for over 30 years (just various throw-away prototypes and OS/app interaction microbenchmarks) and it took a while + a web search to get it. Diabolical indeed. Edit: And makes sense in hindsight.
> The reason I believe C is and always will be important is that it stands in a class of its own as a mostly portable assembler language, offering similar levels of freedom.
Only when your computer is a PDP-11; otherwise it is a high-level systems language like any other.
Less controversially, when you write C, you write for a virtual machine described by the C spec, not your actual hardware.
Your C optimizer is emulating that VM when performing symbolic execution, and the compiler backend is cross-compiling from it. It's an abstract hardware that doesn't have signed overflow, has a hidden extra bit for every byte of memory that says whether it's initialized or not, etc.
Assembly-level languages let you write your own calling conventions, arrange the stack how you want, and don't make padding bytes in structs cursed.
These are all such nonsensical misinterpretations of what people mean when they say C is "low level". You absolutely don't write C for the C abstract machine, because the C spec says nothing about performance, whereas performance is one of the primary reasons people write C.
The existence of undefined behaviour isn't proof that there is a C "virtual machine" that code is being run on. Undefined behaviour is a relaxation of requirements on the compiler. It's not that the C abstract machine lacks signed overflow; rather, it allows the compiler to do whatever it likes when signed overflow is encountered. This was originally a concession to portability, since the common saying is not that C is close to assembly, but rather that it is "portable" assembler. It is kept around because it benefits performance, which is again one of the primary reasons people write C.
C performance exists thanks to UB and the value optimising compilers extract from it. During the days of 8- and 16-bit home computers, any average Assembly developer could write better code than C compilers were able to spit out.
I'm not trying to prove a novel concept, just explain how the C spec thinks about C:
> The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant.
This belief that C targets the hardware directly makes C devs frustrated that UB seems like an intentional trap added by compilers that refuse to "just" do what the target CPU does.
The reality is that the front-end/back-end split in compilers gave us the machine from the C spec as its own optimization target, with its own semantics.
Before C got formalised in this form, it wasn't very portable beyond PDP. C was too opinionated and bloated for 8-bit computers. It wouldn't assume 8-bit bytes (because PDP-11 didn't have them), but it did assume linear memory (even though most 16-bit CPUs didn't have it). All those "checking wetness of water... wet" checks in ./configure used to have a purpose!
Originally C didn't count as an assembly any more than asm.js does today. C was too abstract to let programmers choose addressing modes and use flags back when these mattered (e.g. you could mark a variable as `register`, but not specifically as an A register on 68K). C was too high level for tricks like self-modifying code (pretty standard practice where performance mattered until I-cache and OoO killed it).
C is now a portable assembly more because CPUs that didn't fit C's model have died out (VLIW) or remained non-standard specialized targets (SIMT).
> Less controversially, when you write C, you write for a virtual machine described by the C spec, not your actual hardware.
Isn't this true for most higher level languages as well? C++ for instance builds on top of C and many languages call into and out of C based libraries. Go might be slightly different as it is interacting with slightly less C code (especially if you avoid CGO).
That's a curious remark, although I guess it doesn't look high level from the eyes of someone looking at programming languages today.
C has always been classed as a high level language since its inception. That term's meaning has shifted though. When C was created, it wasn't assembly (middle) or directly writing CPU op codes in binary/hex (low level).
> Describing C as "high-level" seems like deliberate abuse of the term
Honestly, it doesn't really matter. High level and low level are relative to each other (and to machine language), and nothing changes based on what label you use.
While C was adapted to the PDP-11, that adaptation consisted of adding byte-level memory access. Otherwise, I do not think there is anything in C specific to the PDP-11; what would that be?
What makes C low-level is that it can work directly with the representation of objects in memory. This has nothing to do with CPU features, but with direct interoperability with other components of a system. And this is what C can do better than any other language: solve problems by being a part of a more complex system.
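As a minimal sketch of what working with an object's representation looks like (the helper names are hypothetical): any object may be inspected through an `unsigned char` pointer, and `memcpy` can rebuild a value from raw bytes. The byte order you observe is platform-dependent, which is the point: this is the actual in-memory layout, not an abstraction over it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies the bytes of a uint32_t into `out`. Reading any object through
   an unsigned char pointer is allowed by the aliasing rules. */
static void to_bytes(uint32_t value, unsigned char out[sizeof(uint32_t)]) {
  const unsigned char *p = (const unsigned char *)&value;
  for (size_t i = 0; i < sizeof value; i++) {
    out[i] = p[i];
  }
}

/* Rebuilds a uint32_t from raw bytes; memcpy is the portable way to
   reinterpret bytes as another type. */
static uint32_t from_bytes(const unsigned char in[sizeof(uint32_t)]) {
  uint32_t value;
  memcpy(&value, in, sizeof value);
  return value;
}

/* Returns 1 if the lowest-addressed byte holds the least significant
   bits (little-endian), 0 otherwise. */
static int is_little_endian(void) {
  unsigned char b[sizeof(uint32_t)];
  to_bytes(1u, b);
  return b[0] == 1;
}
```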
The post-increment and post-decrement operators mapped directly onto PDP-11 CPU addressing modes.
The integral promotion rules come directly from the PDP-11 CPU instruction set.
If I recall correctly, so do the float->double promotions.
CPUs started adapting to C semantics around the mid-'80s. CPU designers would profile C-generated code and change their designs to run it more efficiently.
Thanks. I guess the integral promotion is related to byte-addressing. If you have bytes but can not directly do arithmetic on them, promoting them to word size seems natural.
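A short sketch of the promotion rule in action, assuming the usual 8-bit `unsigned char` and a wider `int`. The macro and function names are made up for the illustration. Arithmetic on two `unsigned char` values is carried out in `int`, so nothing wraps until the result is converted back to a byte.

```c
#include <assert.h>

/* Evaluates to 1 if the expression has type int, else 0 (C11 _Generic). */
#define HAS_TYPE_INT(expr) _Generic((expr), int: 1, default: 0)

/* Both operands are promoted to int before the addition, so the sum is
   computed in int and does not wrap at 256. */
static int add_promoted(unsigned char a, unsigned char b) {
  return a + b;
}

/* Converting back to unsigned char reduces the result modulo 256,
   which is well-defined for unsigned types. */
static unsigned char add_wrapped(unsigned char a, unsigned char b) {
  return (unsigned char)(a + b);
}
```

So with `a = 200` and `b = 100`, `a + b` has type `int` and value 300, while `add_wrapped(a, b)` gives 44.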
Can you elaborate? C constructs generally map to one or a few assembly instructions at most. You can easily look at C and predict the generated assembly. This is in contrast to other compiled languages, like Go, that inject instructions for garbage collection and other runtime features.
Yeah, people keep repeating that like a broken record lately, it smells like Rust to me.
No one is claiming it was built for today's processors, just that it puts fewer obstacles between you and the hardware than almost any other language, Assembler and Forth being the two exceptions I'm familiar with.
If a language is unpopular, people won't want to work for you and you'll run into poor support. Rewriting a library may take months of dev time, whereas C has an infinite number of libraries to work with and examples to look at.
Being old doesn't mean anyone knows the language. I mean, if the language predates C significantly and nobody uses it, then there's probably a really good reason for it. The goalposts aren't moving; they're just missing the shot.
C++ for one: it has atomics with well-defined memory barriers, and guarantees for what happens around them.
The real answer is obviously Assembly - pick a random instruction from any random modern CPU and I'd wager there's a 95% chance it's something you can't express in C at all. If the goal is to model hardware (it's not), it's doing a terrible job.
C lacks sympathy with nearly all additions to hardware capabilities since the late 80s. And it's only with the addition of atomics that it earns the qualification of "nearly". The only thing that makes it appear as lower level than other languages is the lack of high-level abstraction capabilities, not any special affinity for the hardware.
For one, I would expect that a low-level language wouldn't be so completely worthless at bit twiddling. For another, if C is so low level, why can't I define a new calling convention optimized for my use case? Why doesn't C have a rich library for working with SIMD types, which have been ubiquitous in processors for 25 years?
It puts fewer obstacles in the way of dealing with hardware than almost any other language, for sure.
What's standardized was never as important in C land, at least traditionally, which I guess partly explains why it's trailing so far behind. But the stability of the language is also one of its features.
SIMD doesn't make much sense as a standard feature/library for a general-purpose language.
If you're doing SIMD, it's because you're doing something particular for a particular machine and you want to leverage platform-specific instructions. That's why intrinsics (or hell, even externally linked blobs written in asm) are the way to go, and C supports that just fine.
But sure, if all you're doing is dot products, I guess you can write a standard function that will work on most SIMD platforms, but who cares; use a linalg library instead.
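A sketch of the intrinsics route, assuming an x86 target with SSE (`<xmmintrin.h>`). On other targets the preprocessor check leaves only the portable scalar loop. `dot` is a hypothetical helper. The vector path adds in a different order than the scalar loop, so for general floating-point inputs the two can differ in the last bits.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Dot product of two float arrays of length n. With SSE, four lanes are
   processed per iteration; the remaining elements (or everything, on
   non-SSE targets) go through the plain loop at the end. */
static float dot(const float *a, const float *b, size_t n) {
  size_t i = 0;
  float sum = 0.0f;
#if HAVE_SSE
  __m128 acc = _mm_setzero_ps();
  for (; i + 4 <= n; i += 4) {
    __m128 va = _mm_loadu_ps(a + i); /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b + i);
    acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
  }
  float lanes[4];
  _mm_storeu_ps(lanes, acc);         /* horizontal sum via a store */
  sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
  for (; i < n; i++) {               /* scalar tail or fallback */
    sum += a[i] * b[i];
  }
  return sum;
}
```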
Like, say I have a data structure that is four bits wide (consisting of a couple of flags or something) and I want to make an array of them and access them randomly. What help do I get from C to do this? C says "fuck you".
Pick an appropriate base type (uintN_t) for a bitset, make an array of K of those (holding K * N/4 four-bit entries), and write a couple of inline functions or macros to set and clear those bits.
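The suggestion above might look like this with `uint8_t` as the base type (so N = 8 and two 4-bit entries per byte). `NIBBLES_BYTES`, `nibble_get` and `nibble_set` are hypothetical names. This is a sketch without bounds checking on the index: the caller must size the array with `NIBBLES_BYTES(count)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to store `count` 4-bit entries, two per byte. */
#define NIBBLES_BYTES(count) (((count) + 1) / 2)

/* Entry i lives in byte i / 2: even indices use the low nibble,
   odd indices the high nibble. */
static unsigned nibble_get(const uint8_t *arr, size_t i) {
  unsigned shift = (unsigned)(i % 2) * 4;
  return (arr[i / 2] >> shift) & 0xFu;
}

/* Clears the target nibble, then ORs in the new 4-bit value. */
static void nibble_set(uint8_t *arr, size_t i, unsigned value) {
  assert(value <= 0xFu);
  unsigned shift = (unsigned)(i % 2) * 4;
  arr[i / 2] = (uint8_t)((arr[i / 2] & ~(0xFu << shift)) | (value << shift));
}
```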
> Using a stricter language helps with reducing some classes of bugs, at the cost of reduced flexibility in expressing a solution and increased effort creating the software.
First of all, those languages do not "help" "reducing" some classes of bugs. They often entirely remove them.
Then, even assuming that any safe language with unsafe regions (Rust, C#, etc) would not give you comparable flexibility at a fraction of the risk... if your flexible, effortless solution contains entire classes of bugs, then there is no point in comparing "effort". You should at least take into account the effort in providing a software with a high confidence that those bugs are not there.
No amount of chest-thumping about how good of a programmer you are and telling everyone else to "get good" has had any effect on the rate of CVEs caused by memory safety bugs that are trivial to introduce in a C program.
There are good reasons to use C. It's best to approach it with a clear mind and a practical understanding of its limitations. Be prepared to mitigate those shortcomings. It's no small task!
I am not sure the number of CVEs measures anything meaningful. The price for zero-days for important targets goes into the millions.
While I am sure there can not be enough security, I am not at all sure the extreme focus on memory safety is worth it, and I am also not sure the added complexity of Rust is really worth it. I would prefer to simplify the stack and make C safer.
If that's your preference you're going about it all wrong. Rust's safety is about culture and you're looking at the technology, it's not that Rust doesn't have technology but the technology isn't where you start.
This was the only notable failing of Sean's (abandoned) "Safe C++" - it delivers all the technology a safe C++ culture would have needed, but there is no safe C++ culture so it was waved away as unimportant.
The guy whose mine loses fifty miners in a roof collapse doesn't need better mining technology; inadequate technology isn't why those miners died, culture is. His mine didn't have a safety culture, probably because he didn't give a shit about safety, and his workers either shared this dismissal or had no choice in the matter.
Also "extreme focus" is a misinterpretation. It's not an extreme focus, it's just mandatory, it's like if you said humans have an "extreme focus" on breathing air, they really don't - they barely even think about breathing air - it was just mandatory so if you don't do it then I guess that stands out.
Let's turn it around: Do you think the mining guy that does not care about safety will start caring about a safety culture because there is a new safety tool? And if it is mandated by government, will it be implemented in a meaningful way, or just on paper?
It will certainly be implemented in a meaningful way, if the consequences for the mining guy are hard enough that there won't be a second failure done by the same person.
Hence why I am so into cybersecurity laws, and if this is the only way to make C and C++ communities embrace a safety culture, instead of downplaying it as straitjacket programming like in the C vs Pascal/Modula-2 Usenet discussion days, then so be it.
So there's a funny thing about mouthing the words: the way the human mind works, the easiest way to explain to ourselves why we're mouthing the words is that we agree with them. And so, in that sense, what seems like a useless paper exercise can be effective.
Also, relevantly here, nobody actually wants these terrible bugs. This is not A or B, Red or Blue, this is very much Cake or Death and like, there just aren't any people queueing up for Death, there are people who don't particularly want Cake but that's not the same thing at all.
At some point, in order to make C safer, you're going to have to introduce some way of writing a more formal specification of the stack, heap and the lifetime of references into the language.
Maybe that could be through a type system. Maybe that could be through a more capable run-time system. We've tried these avenues through other languages, through experimental compilers, etc.
Without introducing anything new to the language we have a plethora of tools at our disposal:
- Coq + Iris, or some other proof automation framework with separation logic.
- TLA+, Alloy, or some form of model checking where proofs are too burdensome/unnecessary
- AFL, Valgrind and other testing and static analysis tools
- CompCert: formally verified compilers
- MISRA and other coding guidelines
... and all of this to be used in tandem in order to really say that for the parts specified and tested, we're confident there are no use-after-free memory errors or leaks. That is a lot of effort in order to make that statement. The vast, vast majority of software out there won't even use most of these tools. Most software developers argue that they'll never use formal methods in industry because it's just too hard. Maybe they'll use Valgrind if you're lucky.
Or -- you could add something to the language in order to prevent at least some of the errors by definition.
I'm not a big Rust user. Maybe it's not great and is too difficult to use, I don't know. And I do like C. I just think people need to be aware that writing safe C is really expensive and time consuming, difficult and nothing is guaranteed. It might be worth the effort to learn Rust or use another language and at least get some guarantees; it's probably not as hard as writing safe C.
(Maybe not as safe as using Rust + formal methods, but at least you'll be forced to think about your specification up front before your code goes into production... and where you do have unsafe code, hopefully it will be small and not too hard to verify for correctness)
I suppose the better response is that it removes those classes of bugs where they are absolutely unnecessary. Tricky code will always be tricky, but in the straightforward 80% (or more) of your code such bugs can be completely eliminated.
It's unfortunate that C has so many truly unnecessary bugs which are only caused by stupid overly "clever" exploitation of undefined behaviour by compilers.
The ones GP is referring to all go away when you use -O0. They're completely artificially constructed by compiler writers language-lawyering the language. They were unforeseeable to the people who actually wrote the language, who expected interpretations like "dereferencing null crashes the program" or "dereferencing null accesses the interrupt vector table" and absolutely were not expecting "dereferencing null deletes the previous three lines of code"
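A minimal illustration of the pattern, with hypothetical function names. In the first function the dereference comes before the null check, so an optimizing compiler is entitled to assume `p` is non-null and remove the check (in GCC this reasoning is tied to `-fdelete-null-pointer-checks`). Only the second version is well-defined for a null argument; the first must never be called with one.

```c
#include <assert.h>
#include <stddef.h>

/* Dereference first, check later: the check can be optimized away,
   because reaching it implies *p was already evaluated. */
static int first_then_check(const int *p) {
  int value = *p;   /* UB if p is NULL... */
  if (p == NULL) {  /* ...so the compiler may treat this as dead code */
    return -1;
  }
  return value;
}

/* The well-defined version: check before the dereference. */
static int check_then_first(const int *p) {
  if (p == NULL) {
    return -1;
  }
  return *p;
}
```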
> Predictable response: "But they can only occur in unsafe regions which you can grep for" and my response to that: "so?"
The situation is both worse than this and better than this. Consider the .set_len() method on Rust's Vec. It's unsafe, because you could just .set_len(1_000_000) and then the Vec would happily let you try to read the nonexistent millionth element and segfault. However, if you could edit the standard library sources, you could add this new method to Vec without touching any unsafe code:
This is exactly the same as the real set_len, except it's a "fn" instead of an "unsafe fn". Now the Vec API is totally broken, and safe callers can corrupt memory. Also critically, we didn't write any unsafe code in "set_len_totally_safe_i_promise". The key detail is that this new method has access to the private self.len field of Vec that unsafe blocks in the same module rely on.
In other words, grepping for all the unsafe blocks isn't sufficient for saying that a program is UB-free. You also have to make sure that none of the safe code ever violates an invariant that the unsafe blocks rely on. Read the comments, think really hard, etc.
So... what's the point of all this? The point is that it lets us define a notion of "soundness", such that if I only write safe code, and I only use libraries that are "sound", we can guarantee that my program is UB-free. In other words, any UB in my program would necessarily point to a bug in one of my dependencies, in the stdlib, or in the compiler. (Or, you know, in the hardware, or in mathematics itself.) In other other words, instead of auditing my entire gigantic (safe) program for UB, we can reduce the problem to auditing my dependencies for soundness. Crucially, this decouples the difficulty of the problem from the size of my program. This wouldn't be very interesting if "safe code" were some impoverished subset, like "unsigned integer arithmetic only". But in fact safe code can use pointers, tagged unions, pointers into tagged unions, heap allocation/freeing, and multithreading. Lots of large, complicated, useful, real-world programs are written in 100% safe code. Here's the version of this story with all the caveats and footnotes: https://jacko.io/safety_and_soundness.html
No matter whether you are using C for "freedom" or "flexibility" or "power", 95% of the time you only need that in a very small portion of your codebase. You almost certainly do _not_ need any of that in, say, the logic to parse CLI arguments or config files, which, however, is a prime example of a place where vulnerabilities are known to happen.
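To show how much care even a trivial parse needs in C, here is a sketch of reading a port number from a command-line string with `strtol` rather than `atoi`. The function name and the 1..65535 range are assumptions made for the example. Each check guards against a distinct failure: empty input, trailing garbage, overflow, and out-of-range values.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parses a decimal port number. Returns 0 and stores the value in *out
   on success; returns -1 and leaves *out untouched on any error. */
static int parse_port(const char *text, int *out) {
  if (text == NULL || *text == '\0') {
    return -1;                          /* missing or empty argument */
  }
  char *end = NULL;
  errno = 0;
  long v = strtol(text, &end, 10);
  if (errno == ERANGE || *end != '\0') {
    return -1;                          /* overflow or trailing junk */
  }
  if (v < 1 || v > 65535) {
    return -1;                          /* outside the accepted range */
  }
  *out = (int)v;
  return 0;
}
```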
Which is why, in the past, I would reach for something like Perl in its heyday, given its coverage of the UNIX API as part of the standard library, for anything manipulating CLI tools or config files.
Nowadays, pick your scripting language and, if C is really needed, place it cleanly in a loadable module, with all the security invariants enforced in that scripting or managed language, instead of writing 100% pure C source.
Usually they can also happen outside, if you did something wrong in the unsafe region.
edit: I'm sorry that my captain obvious moment is turning out to be some truth bomb for some. Please keep downvoting as a way to regain your inner peace.
Some points about the introduction, but otherwise this seems like an interesting collection of (slightly deranged?) patterns in C.
> The truth is that any reasonably complicated software system created by humans will have bugs, regardless of what technology was used to create it.
"Drivers wearing seatbelts still die in car accidents and in some cases seatbelts prevent drivers from getting out of the wreckage so we're better off without them." This is cope.
> Using a stricter language helps with reducing some classes of bugs, at the cost of reduced flexibility in expressing a solution and increased effort creating the software.
...and a much smaller effort debugging the software. A logic error is much easier to reason about than memory corruption or a race condition on shared memory. The time you spend designing your system and handling the errors upfront pays dividends later when you get the inevitable errors.
I'm not saying that all software should be rewritten in memory-safe languages, but I'd rather those who choose to use the only language where these kinds of errors regularly happen be honest about it.
However, there is no official roadmap regarding C23 support, and now, with the whole safety discussion going on and the Secure Future Initiative, it will probably never happen.
Additionally clang is a blessed compiler at Microsoft, it is included on Visual Studio, so whatever MSVC doesn't support can be done in clang as alternative.
They have added one feature (typeof) from C23, so maybe they will add the rest when they release C++26. Or maybe they won't. Microsoft is an expert in inflicting the cruelty of providing just enough hope.
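For readers who haven't used it, a small sketch of what `typeof` buys you: a type-generic swap macro. The `TYPEOF` and `SWAP` names are hypothetical. The fallback to `__typeof__` before C23 assumes GCC or Clang, which have long provided it as an extension.

```c
#include <assert.h>

/* Standard typeof in C23; GCC/Clang's __typeof__ extension otherwise. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
#define TYPEOF(x) typeof(x)
#else
#define TYPEOF(x) __typeof__(x)
#endif

/* Swaps two lvalues of the same type without naming that type. */
#define SWAP(a, b)             \
  do {                         \
    TYPEOF(a) swap_tmp_ = (a); \
    (a) = (b);                 \
    (b) = swap_tmp_;           \
  } while (0)
```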
C++26? They are having issues delivering C++23 ever since the whole change in security focus (Rust, Go, C# and Java first; C and C++ for existing codebases), which is most likely one of the reasons Herb Sutter is no longer at Microsoft.
Oh wow, I don't write C++, so I didn't know how bad the situation was. My recollection that MSVC always implemented C++ standards posthaste is clearly outdated.
Yup, we are never getting C23. Good thing C11 is decent enough, I guess.
I think he's referring to C specifically, not C++. It's true that modern versions of MSVC are compliant (and they're also typically faster at implementing features than gcc and clang), but for the longest time there were subtle differences in their C library. To this day I don't think they support VLAs, which are technically standard C (At least until recently, I'm not sure about the latest versions, hopefully someone more knowledgeable can say more).
For C (not C++), MSVC got C17 in 2020, apart from VLAs - which are never planned. No real roadmap for if/when it will get C23 - which is not just fully implemented in GCC, but the default used standard.
For those also wondering like myself, this refers to "hacker" in the whiny "security hacking is only called cracking Reeeee" sense, so this is just aimed at programmers and not security professionals.
> C doesn't try to save you from making mistakes. It has very few opinions about your code and happily assumes that you know exactly what you're doing. Freedom with responsibility.
I love C because it doesn't make my life very inconvenient to protect me from stubbing my toe in it. I hate C when I stub my toe in it.
> It has very few opinions about your code
I understand where this is coming from, but I think this is less true than it used to be, and (for that reason) it often devolves into arguments about whether the C standard is the actual source of truth for what you're "really" allowed to do in C. For example, the standard says I must never:
- cast a `struct Foo*` into a `struct Bar*` and access the Foo through it (in practice we teach this as the "strict aliasing" rules, and that's how all(?) compilers implement it, but that's not what §6.5 paragraph 7 of the standard says!)
- allow a signed integer to overflow
- pass a NULL pointer to memcpy, even if the length is zero
- read an uninitialized object, even if I "don't care" what value I get
- read and write a value from different threads without locking or atomics, even if I know exactly what instructions those reads and writes compile into and the ISA manual says it's 100% fine to do that
All of these are ways that (modern, standard) C doesn't really "do what the programmer said". A lot of big real-world projects build with flags like -fno-strict-aliasing, so that they can get away with doing these things even though the standard says they shouldn't. But then, are they really writing C or "C with custom extensions"? When we compare C to other languages, whose extensions are we talking about?
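For comparison, here is a sketch of the usual well-defined workarounds for three items on that list; the helper names are hypothetical. Overflow is checked before the addition rather than after, the zero-length `memcpy` is guarded, and type punning goes through `memcpy` instead of a cast pointer. `float_bits` assumes `float` is 32 bits (enforced by a static assertion).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Signed overflow is UB, so decide before adding whether it would
   happen. Returns 1 and stores the sum on success, 0 on overflow. */
static int checked_add(int a, int b, int *out) {
  if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
    return 0;
  }
  *out = a + b;
  return 1;
}

/* memcpy requires valid pointers even when n == 0, so skip the call. */
static void copy_bytes(void *dst, const void *src, size_t n) {
  if (n != 0) {
    memcpy(dst, src, n);
  }
}

/* Reading a float through a uint32_t* violates the aliasing rules;
   copying its bytes is well-defined. */
static uint32_t float_bits(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  return bits;
}
```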
I've heard it put another way that I enjoyed: "C assumes you know what you're doing, which is only a problem if you don't know what you're doing."
Having spent many, many years paid to write C, and with no wish to write any more now that I've learned Rust, I would suggest a rewording:
"C assumes you know what you're doing, which is only a problem because you don't know what you're doing."
Periodically, especially in r/cpp I run into people who are apparently faultless and so don't make the mistakes that make these languages dangerous, weirdly none of these people seem to have written any software I can inspect to see for myself what that looks like, and furthermore the universe I live in doesn't seem to have any of the resulting software. I choose to interpret this mystery as: People are idiots and liars, but of course there could be other interpretations.
> Periodically, especially in r/cpp I run into people who are apparently faultless and so don't make the mistakes that make these languages dangerous, weirdly none of these people seem to have written any software I can inspect to see for myself what that looks like, and furthermore the universe I live in doesn't seem to have any of the resulting software.
So basically Jeff Sutherland ever since he started talking about AI. "My AI agents have formed a Scrum team that's 30 times faster than any human developer!" Great, Jeff. Working in which company's production codebase?
Yeah, well, as stated: software written by humans will have bugs.
The real danger with Rust is the cult-like delusion that that's not the case for them.
To be sure, my Rust has bugs in it, but none of them come close to the spooky nonsense that could happen in my C and yet the performance is excellent. Probably more than once a day Rust's compiler rejects code that an analogous C compiler would wave through - and maybe it'd survive testing too, at least for a while.
No, it just makes it inconvenient to try to protect yourself from stubbing your toe in it.
C doesn't make anything inconvenient, that's its major appeal. Some things are convenient by design, yes, but it's not trying to prevent you from doing anything. That's a feature.
> C doesn't make anything inconvenient
Other than writing memory safe code, as history has shown.
Difficult, not inconvenient.
Because it allows things that are difficult, like writing your own memory allocators.
If you don't like working at that difficulty level, then C programming isn't for you. And that's fine.
It doesn't allow me to write my own memory allocator, it forces me to.
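For context, the simplest custom allocator people end up writing in C is a bump (arena) allocator. This is a minimal sketch, not a production design. It has no per-object free, no growth, and no thread safety, and the alignment arithmetic assumes a conventional flat address space where converting a pointer to `uintptr_t` preserves its alignment. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocations are carved sequentially out of one malloc'd buffer and
   released all at once. */
struct arena {
  unsigned char *base; /* backing buffer (NULL if init failed) */
  size_t cap;          /* total bytes available */
  size_t used;         /* bytes handed out so far, including padding */
};

/* Returns 1 on success, 0 if the backing buffer could not be allocated. */
static int arena_init(struct arena *a, size_t cap) {
  a->base = malloc(cap);
  a->cap = a->base != NULL ? cap : 0;
  a->used = 0;
  return a->base != NULL;
}

/* Returns `size` bytes aligned to `align` (a power of two), or NULL if
   the arena does not have enough room left. */
static void *arena_alloc(struct arena *a, size_t size, size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  if (a->base == NULL) {
    return NULL;
  }
  uintptr_t start = (uintptr_t)(a->base + a->used);
  uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
  size_t padding = (size_t)(aligned - start);
  size_t remaining = a->cap - a->used;
  if (padding > remaining || size > remaining - padding) {
    return NULL;
  }
  void *p = a->base + a->used + padding;
  a->used += padding + size;
  return p;
}

/* Forgets every allocation at once; the buffer is kept for reuse. */
static void arena_reset(struct arena *a) { a->used = 0; }

/* Releases the backing buffer. */
static void arena_free(struct arena *a) {
  free(a->base);
  a->base = NULL;
  a->cap = 0;
  a->used = 0;
}
```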
This line of argumentation reminds me of this:
Advertise and promote a shortcoming or a fault as a virtue.
For example, ultra-cheap single-use film cameras are advertised as "No Focusing Required." The truth is, no focusing is possible, because those cameras have cheap plastic fixed-focus lenses that won't move and can't be focused. What is a serious shortcoming for a camera — the inability to properly focus on the subject — is sold as a convenience: "You don't have to bother with focusing."
https://orangepapers.eth.limo/orange-propaganda.html#make_vi...
Oh, very much likewise, but there's always two sides to a coin.
Usually stubbing your toe does not take your whole leg.
Minor correction: macros CAN'T have newlines; you need to splice lines during preprocessing using \ followed by a newline. The actual code has these:
from https://github.com/codr7/hacktical-c/blob/main/macro/macro.h
    #define hc_align(base, size) ({                                    \
        __auto_type _base = base;                                      \
        __auto_type _size = hc_min((size), _Alignof(max_align_t));     \
        (_base) + _size - ((ptrdiff_t)(_base)) % _size;                \
      })                                                               \
After preprocessing it is a single line.
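To see splicing on a smaller scale, here is a hypothetical ALIGN_UP macro, which is not from hacktical-c, written in plain ISO C. It assumes align is a power of two. Unlike hc_align, it evaluates align twice, because without a GNU statement expression it has no way to bind its arguments to locals:

```c
#include <stddef.h>

/* Each backslash-newline pair is deleted in translation phase 2, before
   preprocessing directives run, so the preprocessor sees one logical line.
   Hypothetical macro: rounds x up to the next multiple of align, which
   must be a power of two. align is evaluated twice. */
#define ALIGN_UP(x, align)                 \
    (((size_t)(x) + ((size_t)(align) - 1)) \
     & ~((size_t)(align) - 1))
```

For example, ALIGN_UP(13, 8) yields 16 and ALIGN_UP(16, 8) stays 16.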
We might get multi-line macros in C2y standard: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3524.txt
Credit to Simon Tatham
https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html
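Here is a minimal sketch of the trick, written for this thread rather than copied from the article. Case labels placed inside the loop body let each call resume right after the previous return. The macro names loosely follow the article's crBegin/crReturn/crFinish convention, and count_to_three is a hypothetical example. Because the state lives in static variables, this version is not reentrant:

```c
/* Switch-based coroutine macros in the style of Tatham's article.
   The resume point is stored as a line number in a static variable. */
#define crBegin     static int cr_state = 0; switch (cr_state) { case 0:
#define crReturn(x) do { cr_state = __LINE__; return (x); case __LINE__:; } while (0)
#define crFinish    }

/* Returns 0, 1, 2 on successive calls, then -1 on every later call. */
int count_to_three(void) {
    static int i;   /* must be static: ordinary locals do not survive the return */
    crBegin;
    for (i = 0; i < 3; i++)
        crReturn(i);   /* the next call jumps back to the case label in here */
    crFinish;
    return -1;
}
```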
I knew the name sounded familiar:
Simon Tatham's Portable Puzzle Collection https://www.chiark.greenend.org.uk/~sgtatham/puzzles/
With GNU extensions, you can make a simpler coroutine macro without switch/case abuse:
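Below is a minimal sketch of that idea. It assumes GCC or Clang, because labels as values (&&label and goto *ptr) are a GNU extension rather than ISO C. The CO_* macros and count_up are hypothetical names, not the commenter's code. The resume point is a saved label address instead of a case number, so no switch is involved:

```c
/* Coroutine macros built on GNU "labels as values": &&label takes the
   address of a label, and goto *ptr jumps to it. GCC/Clang only. */
#define CO_LABEL_(n) co_resume_##n
#define CO_LABEL(n)  CO_LABEL_(n)

/* Jump to the saved resume point, if there is one. */
#define CO_BEGIN(st) do { if (st) goto *(st); } while (0)

/* Save a resume point just after the return, then return x.
   Labels are named after __LINE__, so use at most one CO_YIELD per line. */
#define CO_YIELD(st, x)                  \
    do {                                 \
        (st) = &&CO_LABEL(__LINE__);     \
        return (x);                      \
        CO_LABEL(__LINE__):;             \
    } while (0)

/* Returns 0, 1, 2 on successive calls, then -1 on every later call. */
int count_up(void) {
    static void *resume = 0;   /* saved label address; null means "start" */
    static int i;              /* static so it survives between calls */
    CO_BEGIN(resume);
    for (i = 0; i < 3; i++)
        CO_YIELD(resume, i);
    return -1;
}
```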
I've written C on-and-off for over 30 years (just various throw-away prototypes and OS/app interaction microbenchmarks) and it took a while + a web search to get it. Diabolical indeed. Edit: And makes sense in hindsight.
As someone who has a file with similar hacks, I will say this: I am not a C++ fan, but if you find yourself writing C code where you simulate methods via structs with function pointers often, just use C++ as a basic "C with classes" at that point. You want methods anyway, you have to go through a pointer dereference to call the function, it's just not worth the code weirdness. If you have the grit to use structs with function pointers everywhere, you have the grit to stick to the simpler subset of C++.
I'm torn. The step from C to any C++ is big. Now if you want anybody to be able to use your code, they need to be using C++ or you have to provide a C API anyway. On the other hand, manually implementing vtables is annoying. I've been sticking to pure C and haven't been bothered enough to go back to any C++ yet (about 6 months on my current project). I mostly only miss templated containers so far.
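For context, a manual vtable in C can look like the following minimal sketch; shape, rect, square and their functions are hypothetical names. Each concrete type embeds the base struct as its first member. C guarantees that a pointer to a struct, suitably converted, points to its initial member and vice versa, so a pointer to the base can be converted back to the concrete type:

```c
/* Hypothetical interface: one table of function pointers per concrete type. */
struct shape;

struct shape_vtable {
    double (*area)(const struct shape *self);
};

struct shape {
    const struct shape_vtable *vt;   /* the manual "vptr" */
};

/* Concrete types embed struct shape as their first member. */
struct rect {
    struct shape base;
    double w, h;
};

struct square {
    struct shape base;
    double side;
};

static double rect_area(const struct shape *self) {
    const struct rect *r = (const struct rect *)self;   /* base -> derived */
    return r->w * r->h;
}

static double square_area(const struct shape *self) {
    const struct square *s = (const struct square *)self;
    return s->side * s->side;
}

static const struct shape_vtable rect_vtable = { rect_area };
static const struct shape_vtable square_vtable = { square_area };

struct rect rect_make(double w, double h) {
    struct rect r = { { &rect_vtable }, w, h };
    return r;
}

struct square square_make(double side) {
    struct square s = { { &square_vtable }, side };
    return s;
}

/* The "virtual call": load the table pointer, then call through it. */
double shape_area(const struct shape *s) {
    return s->vt->area(s);
}
```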
Why? I do not find the syntactic sugar C++ adds very helpful and it misses other C features.
Nope, not from my experience.
Because in C++ the features are just there right around the corner, they will seep into the code base.
And I don't even want classes; there's too much junk in there that I don't need.
> The reason I believe C is and always will be important is that it stands in a class of its own as a mostly portable assembler language, offering similar levels of freedom.
Only when your computer is a PDP-11; otherwise it is a high-level systems language like any other.
Less controversially, when you write C, you write for a virtual machine described by the C spec, not your actual hardware.
Your C optimizer is emulating that VM when performing symbolic execution, and the compiler backend is cross-compiling from it. It's abstract hardware that doesn't have signed overflow, has a hidden extra bit for every byte of memory that says whether it's initialized or not, etc.
Assembly-level languages let you write your own calling conventions, arrange the stack how you want, and don't make padding bytes in structs cursed.
These are all such nonsensical misinterpretations of what people mean when they say C is "low level". You absolutely don't write C for the C abstract machine, because the C spec says nothing about performance, whereas performance is one of the primary reasons people write C.
The existence of undefined behaviour isn't proof that there is a C "virtual machine" that code is being run on. Undefined behaviour is a relaxation of requirements on the compiler. It's not that the C abstract machine lacks signed overflow; rather, the standard allows the compiler to do what it likes when signed overflow is encountered. This was originally a concession to portability, since the common saying is not that C is close to assembly, but rather that it is "portable" assembler. It is kept around because it benefits performance, which is again one of the primary reasons people write C.
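In practice this means code has to rule out signed overflow before it happens rather than inspect the result afterwards. Here is a minimal portable sketch; add_checked is a hypothetical name. Where available, GCC/Clang's __builtin_add_overflow or C23's ckd_add from <stdckdint.h> are ready-made alternatives:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out only if the sum fits in an int.
   The range check runs before the addition, so no signed overflow
   (undefined behaviour) can ever occur. */
bool add_checked(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;   /* the sum would overflow */
    *out = a + b;
    return true;
}
```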
C performance exists thanks to UB and the value optimising compilers extract out of it. During the 8- and 16-bit home computer days, any average Assembly developer could write better code than C compilers were able to spit out.
And also because it doesn't get in your way of doing exactly what you want to do.
If that was true then the optimizers wouldn't need to exist in the first place.
Compared to the alternatives.
It gets very frustrating to communicate at this level.
The alternatives outside Bell Labs were just as capable.
I don't think compilers allowing trash through is a good thing.
I'm not trying to prove a novel concept, just explain how the C spec thinks about C:
> The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant.
This belief that C targets the hardware directly makes C devs frustrated that UB seems like an intentional trap added by compilers that refuse to "just" do what the target CPU does.
The reality is that the front-end/back-end split in compilers gave us the machine from the C spec as its own optimization target, with its own semantics.
Before C got formalised in this form, it wasn't very portable beyond PDP. C was too opinionated and bloated for 8-bit computers. It wouldn't assume 8-bit bytes (because PDP-11 didn't have them), but it did assume linear memory (even though most 16-bit CPUs didn't have it). All those "checking wetness of water... wet" checks in ./configure used to have a purpose!
Originally C didn't count as an assembly any more than asm.js does today. C was too abstract to let programmers choose addressing modes and use flags back when these mattered (e.g. you could mark a variable as `register`, but not specifically as an A register on 68K). C was too high level for tricks like self-modifying code (pretty standard practice where performance mattered until I-cache and OoO killed it).
C is now a portable assembly more because CPUs that didn't fit C's model have died out (VLIW) or remained non-standard specialized targets (SIMT).
> Less controversially, when you write C, you write for a virtual machine described by the C spec, not your actual hardware.
Isn't this true for most higher level languages as well? C++ for instance builds on top of C and many languages call into and out of C based libraries. Go might be slightly different as it is interacting with slightly less C code (especially if you avoid CGO).
> When your computer is a PDP-11, otherwise it is a high level systems language like any other.
Describing C as "high-level" seems like deliberate abuse of the term. The virtual machine abstraction doesn't imply any benefits to the developer.
That's a curious remark, although I guess it doesn't look high level from the eyes of someone looking at programming languages today.
C has always been classed as a high level language since its inception. That term's meaning has shifted though. When C was created, it wasn't assembly (middle) or directly writing CPU op codes in binary/hex (low level).
Neither does pretending C is a macro Assembler.
> Describing C as "high-level" seems like deliberate abuse of the term
Honestly, it doesn't really matter. High level and low level are relative to each other (and to machine language), and nothing changes based on what label you use.
Best thing to do is shrug and say "ok".
While C was adapted to the PDP-11, that adaptation was adding byte-level memory access. Otherwise I do not think there is anything in C specific to the PDP-11; what would that be?
What makes C low-level is that it can work directly with the representation of objects in memory. This has nothing to do with CPU features, but with direct interoperability with other components of a system. And this is what C can do better than any other language: solve problems by being a part of a more complex system.
The post-increment and post-decrement operators mapped directly onto PDP-11 CPU addressing modes.
The integral promotion rules come directly from the PDP-11 CPU instruction set.
If I recall correctly, so do the float->double promotions.
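Here is a small sketch of both promotions as they behave in today's C; add_bytes and sum_floats are hypothetical names. unsigned char operands are promoted to int before arithmetic. A float passed through a variadic "..." is promoted to double by the default argument promotions:

```c
#include <stdarg.h>

/* Both operands are promoted to int before the addition, so 200 + 100
   yields 300 instead of wrapping to 44 in 8 bits. */
int add_bytes(unsigned char a, unsigned char b) {
    return a + b;
}

/* Floats passed through "..." arrive as doubles, so they must be read
   with va_arg(ap, double); va_arg(ap, float) would be undefined. */
double sum_floats(int count, ...) {
    va_list ap;
    double total = 0.0;
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}
```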
CPUs started adapting to C semantics around the mid-80s. CPU designers would profile C-generated code and change their designs to run it more efficiently.
Thanks. I guess the integral promotion is related to byte addressing. If you have bytes but cannot do arithmetic on them directly, promoting them to word size seems natural.
Can you elaborate? C constructs generally map to one or a few assembly instructions at most. You can easily look at C and predict the generated assembly. This is in contrast to other compiled languages, like Go, that inject instructions for garbage collection and other runtime features.
See my list of languages on a sibling thread, same applies to those, nothing special about C there.
Yeah, people keep repeating that like a broken record lately, it smells like Rust to me.
No one is claiming it was built for today's processors, just that it puts fewer obstacles between you and the hardware than almost any other language, Assembler and Forth being the two exceptions I'm familiar with.
Because people keep repeating the urban myth of portable assembler and being the very first systems programming language.
One of the very first systems programming languages was JOVIAL, from 1958. C's inventors were still finalising their studies.
Which other popular language more accurately represents a random access machine of fixed word length?
I don't know, Ada, Modula-2, Object Pascal, PL/I, NEWP, PL.8, D, Zig, Mesa, ATS,....
But then again, you booby-trapped the question with "popular language".
If a language is unpopular, people won't want to work for you and you'll run into poor support. Rewriting a library may take months of dev time, whereas C has an infinite number of libraries to work with and examples to look at.
Moving goalposts regarding systems programming language features; some in the group predate C by a decade.
Being old doesn't mean anyone knows the language. I mean, if the language predates C significantly and nobody uses it, then there's probably a really good reason for that. The goalposts aren't moving; they're just missing the shot.
Popularity isn't a measure of quality. Never has been and certainly not in the case of programming languages.
There is unpopular - and then there is can I get a working toolchain for modern OS that’s not emulated.
Still not a measure of quality.
Are we having a discussion about the greatest language of all time? What’s your context here.
Many of those languages do not have pointers - which are fundamental to how modern instruction sets work.
Yes they do. Point to an example from that group, and I will gladly prove you wrong.
Well sounds like you are confident and we are going to get into a semantic argument about what qualifies as a pointer.
So which of these languages do you think is a better representation of hardware and not a PDP-11?
Better representation of the hardware?
None of them, you use Assembly if you want the better representation of hardware.
Yes, I am quite confident, because I have been dispelling the C myth of the true and only systems programming language since the 1990's.
So then your comment about C being an outdated PDP-11 must be equally true of other languages. So it says nothing.
None, but that's not what computers are. C assumes that in a few places, e.g. variadic functions, and those are the worst parts of the language.
> but that's not what computers are
Which language more accurately represents hardware then?
C++ for one: it has atomics with well-defined memory barriers, and guarantees for what happens around them.
The real answer is obviously Assembly - pick a random instruction from any random modern CPU and I'd wager there's a 95% chance it's something you can't express in C at all. If the goal is to model hardware (it's not), it's doing a terrible job.
C has the same atomics and concurrency model as C++.
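For reference, C11's <stdatomic.h> exposes the same memory orders as C++'s std::atomic. The following is a minimal release/acquire publication sketch; publish and try_consume are hypothetical names. In real code, these functions would run on different threads:

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;                 /* ordinary, non-atomic data */
static atomic_bool ready = false;   /* flag that publishes the payload */

/* Writer: store the data first, then publish it with a release store. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: an acquire load that sees true also sees the payload write.
   Returns false if nothing has been published yet. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```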
C++ better represents the machine?
Assembly language from the hardware vendor.
isn't it translated to microcode before being executed?
Depends on the hardware design.
[flagged]
bro just quoted a chatbot
C lacks sympathy with nearly all additions to hardware capabilities since the late 80s. And it's only with the addition of atomics that it earns the qualification of "nearly". The only thing that makes it appear as lower level than other languages is the lack of high-level abstraction capabilities, not any special affinity for the hardware.
For one, I would expect a low-level language not to be so completely worthless at bit twiddling. Another thing: if C is so low level, why can't I define a new calling convention optimized for my use case? Why doesn't C have a rich library for working with the SIMD types that have been ubiquitous in processors for 25 years?
It puts fewer obstacles in the way of dealing with hardware than almost any other language, for sure.
What's standardized was never as important in C land, at least traditionally, which I guess partly explains why it's trailing so far behind. But the stability of the language is also one of its features.
It also has pointers which are absent from most languages but essential to instruction sets.
Lots of languages since the 1950's have pointers.
SIMD doesn't make much sense as a standard feature/library for a general-purpose language. If you're doing SIMD, it's because you're doing something particular for a particular machine and you want to leverage platform-specific instructions. That's why intrinsics (or hell, even externally linked blobs written in asm) are the way to go, and C supports that just fine.
But sure, if all you're doing is dot products, I guess you can write a standard function that will work on most SIMD platforms, but who cares; use a linalg library instead.
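Here is a sketch of the intrinsics approach with a portable fallback. It assumes an x86 compiler that defines __SSE2__ when SSE2 is enabled, and sum_ints is a hypothetical name. On other targets, the same function compiles to a plain loop:

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Sums n ints, four at a time with SSE2 intrinsics when available. */
int sum_ints(const int *v, size_t n) {
    size_t i = 0;
    int total = 0;
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (; i + 4 <= n; i += 4)   /* unaligned loads of four ints */
        acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)(v + i)));
    int lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    for (; i < n; i++)   /* scalar tail, or the whole array without SSE2 */
        total += v[i];
    return total;
}
```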
Like, say I have a data structure that is four bits wide (consisting of a couple of flags or something) and I want to make an array of them and access them randomly. What help do I get from C to do this? C says "fuck you".
Pick an appropriate base type (uintN_t) for a bitset, make an array of those (K * N/4) and write a couple inline functions or macros to set and clear those bits.
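Here is a minimal sketch of that approach. It chooses uint8_t as the base type, so each byte holds two 4-bit entries; NIBBLE_BYTES, nibble_get and nibble_set are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to store `count` 4-bit entries. */
#define NIBBLE_BYTES(count) (((count) + 1) / 2)

/* Even indices live in the low nibble of a byte, odd ones in the high nibble. */
static inline unsigned nibble_get(const uint8_t *arr, size_t i) {
    unsigned shift = (unsigned)(i % 2) * 4;
    return (arr[i / 2] >> shift) & 0xFu;
}

/* Clears the target nibble, then ORs in the low 4 bits of v. */
static inline void nibble_set(uint8_t *arr, size_t i, unsigned v) {
    unsigned shift = (unsigned)(i % 2) * 4;
    arr[i / 2] = (uint8_t)((arr[i / 2] & ~(0xFu << shift)) | ((v & 0xFu) << shift));
}
```

The same pattern scales to uint32_t or uint64_t words by changing the index arithmetic (i / 8 and (i % 8) * 4 for 32-bit words).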
Only if you don't know C.
Otherwise it says, do whatever you feel like.
Honest q: after skimming through the book, it's unclear how it's targeted towards hackers (cf. academics)?
Defined as practical, curious problem solvers, I'm aware the word has other interpretations.
Any quick way to make a PDF out of this?
You'll need pandoc and xelatex
> Using a stricter language helps with reducing some classes of bugs, at the cost of reduced flexibility in expressing a solution and increased effort creating the software.
First of all, those languages do not "help" "reducing" some classes of bugs. They often entirely remove them.
Then, even assuming that safe languages with unsafe regions (Rust, C#, etc.) would not give you comparable flexibility at a fraction of the risk... if your flexible, effortless solution contains entire classes of bugs, then there is no point in comparing "effort". You should at least take into account the effort of delivering software with high confidence that those bugs are not there.
No amount of chest-thumping about how good of a programmer you are and telling everyone else to "get good" has had any effect on the rate of CVEs caused by memory safety bugs that are trivial to introduce in a C program.
There are good reasons to use C. It's best to approach it with a clear mind and a practical understanding of its limitations. Be prepared to mitigate those shortcomings. It's no small task!
I am not sure the number of CVEs measures anything meaningful. The price for zero-days for important targets goes into the millions.
While I am sure there can not be enough security, I am not at all sure the extreme focus on memory safety is worth it, and I am also not sure the added complexity of Rust is really worth it. I would prefer to simplify the stack and make C safer.
If that's your preference you're going about it all wrong. Rust's safety is about culture and you're looking at the technology, it's not that Rust doesn't have technology but the technology isn't where you start.
This was the only notable failing of Sean's (abandoned) "Safe C++" - it delivers all the technology a safe C++ culture would have needed, but there is no safe C++ culture so it was waved away as unimportant.
The guy whose mine loses fifty miners in a roof collapse doesn't need better mining technology; inadequate technology isn't why those miners died, culture is. His mine didn't have a safety culture, probably because he didn't give a shit about safety, and his workers either shared this dismissal or had no choice in the matter.
Also "extreme focus" is a misinterpretation. It's not an extreme focus, it's just mandatory, it's like if you said humans have an "extreme focus" on breathing air, they really don't - they barely even think about breathing air - it was just mandatory so if you don't do it then I guess that stands out.
Let's turn it around: Do you think the mining guy that does not care about safety will start caring about a safety culture because there is a new safety tool? And if it is mandated by government, will it be implemented in a meaningful way, or just on paper?
It will certainly be implemented in a meaningful way, if the consequences for the mining guy are hard enough that there won't be a second failure done by the same person.
Hence why I am so into cybersecurity laws, and if this is the only way to make C and C++ communities embrace a safety culture, instead of downplaying it as straitjacket programming like in the C vs Pascal/Modula-2 Usenet discussion days, then so be it.
So there's a funny thing about mouthing the words, the way the human mind works the easiest way to explain to ourselves why we're mouthing the words is that we agree with them. And so in that sense what seems like a useless paper exercise can be effective.
Also, relevantly here, nobody actually wants these terrible bugs. This is not A or B, Red or Blue, this is very much Cake or Death and like, there just aren't any people queueing up for Death, there are people who don't particularly want Cake but that's not the same thing at all.
At some point, in order to make C safer, you're going to have to introduce some way of writing a more formal specification of the stack, heap and the lifetime of references into the language.
Maybe that could be through a type system. Maybe that could be through a more capable run-time system. We've tried these avenues through other languages, through experimental compilers, etc.
Without introducing anything new to the language we have a plethora of tools at our disposal:
- Coq + Iris, or some other proof automation framework with separation logic.
- TLA+, Alloy, or some form of model checking where proofs are too burdensome/unnecessary
- AFL, Valgrind and other testing and static analysis tools
- CompCert: formally verified compilers
- MISRA and other coding guidelines
... and all of this to be used in tandem in order to really say that for the parts specified and tested, we're confident there are no use-after-free memory errors or leaks. That is a lot of effort in order to make that statement. The vast, vast majority of software out there won't even use most of these tools. Most software developers argue that they'll never use formal methods in industry because it's just too hard. Maybe they'll use Valgrind if you're lucky.
Or -- you could add something to the language in order to prevent at least some of the errors by definition.
I'm not a big Rust user. Maybe it's not great and is too difficult to use, I don't know. And I do like C. I just think people need to be aware that writing safe C is really expensive and time consuming, difficult and nothing is guaranteed. It might be worth the effort to learn Rust or use another language and at least get some guarantees; it's probably not as hard as writing safe C.
(Maybe not as safe as using Rust + formal methods, but at least you'll be forced to think about your specification up front before your code goes into production... and where you do have unsafe code, hopefully it will be small and not too hard to verify for correctness)
Update: fixed markup
The problem is not tools don't exist, lint was created in 1979 at Bell Labs after all.
It is the lack of a culture of using them unless there is a government mandate imposing them, like in safety-critical computing.
I agree.
Definitely, but the idea is that its unique feature set is worth it.
Yeah, there are still good reasons to use it.
So use Rust, fine by me.
I might too some day, who knows.
If the language has unsafe regions, it doesn't entirely remove classes of bugs, since they can still occur in unsafe regions.
(Predictable response: "But they can only occur in unsafe regions which you can grep for" and my response to that: "so?")
I suppose the better response is that it removes those classes of bugs where they are absolutely unnecessary. Tricky code will always be tricky, but in the straightforward 80% (or more) of your code such bugs can be completely eliminated.
It's unfortunate that C has so many truly unnecessary bugs which are only caused by stupid overly "clever" exploitation of undefined behaviour by compilers.
Unfortunate, yes.
But what bugs? Suboptimal choices maybe; but any backwards compatible, popular language is going to have its share of those.
The ones GP is referring to all go away when you use -O0. They're completely artificially constructed by compiler writers language-lawyering the language. They were unforeseeable to the people who actually wrote the language, who expected interpretations like "dereferencing null crashes the program" or "dereferencing null accesses the interrupt vector table" and absolutely were not expecting "dereferencing null deletes the previous three lines of code"
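Here is a minimal sketch of the kind of code being described; the function names are hypothetical. Dereferencing a null pointer is undefined, so after the load a compiler may assume p is non-null and drop the later check. GCC documents this optimization under -fdelete-null-pointer-checks, which is enabled by default on most targets:

```c
#include <stddef.h>

/* Risky: p is dereferenced before it is checked. Because that load is UB
   when p == NULL, an optimizer may treat the check below as dead code. */
int first_or_zero_risky(const int *p) {
    int v = *p;
    if (p == NULL)
        return 0;   /* may be deleted by the optimizer */
    return v;
}

/* Safe: check first, dereference afterwards. */
int first_or_zero(const int *p) {
    if (p == NULL)
        return 0;
    return *p;
}
```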
Which I would definitely recommend as a strong default.
> Predictable response: "But they can only occur in unsafe regions which you can grep for" and my response to that: "so?"
The situation is both worse than this and better than this. Consider the .set_len() method on Rust's Vec. It's unsafe, because you could just .set_len(1_000_000) and then the Vec would happily let you try to read the nonexistent millionth element and segfault. However, if you could edit the standard library sources, you could add this new method to Vec without touching any unsafe code:
This is exactly the same as the real set_len, except it's a "fn" instead of an "unsafe fn". Now the Vec API is totally broken, and safe callers can corrupt memory. Also critically, we didn't write any unsafe code in "set_len_totally_safe_i_promise". The key detail is that this new method has access to the private self.len field of Vec that unsafe blocks in the same module rely on. In other words, grepping for all the unsafe blocks isn't sufficient for saying that a program is UB-free. You also have to make sure that none of the safe code ever violates an invariant that the unsafe blocks rely on. Read the comments, think really hard, etc.
So...what's the point of all this? The point is that it lets us define a notion of "soundness", such that if I only write safe code, and I only use libraries that are "sound", we can guarantee that my program is UB-free. In other words, any UB in my program would necessarily point to a bug in one of my dependencies, in the stdlib, or in the compiler. (Or you know, in the hardware, or in mathematics itself.)
In other other words, instead of auditing my entire gigantic (safe) program for UB, we can reduce the problem to auditing my dependencies for soundness. Crucially, this decouples the difficulty of the problem from the size of my program.
This wouldn't be very interesting if "safe code" was some impoverished subset, like "unsigned integer arithmetic only". But in fact safe code can use pointers, tagged unions, pointers into tagged unions, heap allocation/freeing, and multithreading. Lots of large, complicated, useful, real-world programs are written in 100% safe code. Here's the version of this story with all the caveats and footnotes: https://jacko.io/safety_and_soundness.html
You still need to audit the safe part for other bugs...
But yes, this is nice and we should (and probably will) have a safe mode in C too.
No matter whether you are using C for "freedom" or "flexibility" or "power", 95% of the time you only need that in a very small portion of your codebase. You almost certainly do _not_ need any of that in, say, the logic to parse CLI arguments or config files, which is nonetheless a prime example of a place where vulnerabilities are known to happen.
Which is why in the past I would reach for something like Perl in its heyday, given its coverage of the UNIX API as part of the standard library, for anything manipulating CLI tools or config files.
Nowadays, pick your scripting language and, if C is really needed, cleanly place it in a loadable module for that scripting or managed language, with all the security invariants enforced on that side, instead of writing 100% pure C source.
My solution since early 2000's.
Agreed, there's a lot to win from gluing C to a more protected language, I'm a fan of embedding a scripting language.
Usually they can also happen outside, if you did something wrong in the unsafe region.
edit: I'm sorry that my captain obvious moment is turning out to be some truth bomb for some. Please keep downvoting as a way to regain your inner peace.
> if you did something wrong in the unsafe region.
*you or anyone else in your chain of dependencies that use unsafe
Some points about the introduction, but otherwise this seems like an interesting collection of (slightly deranged?) patterns in C.
> The truth is that any reasonably complicated software system created by humans will have bugs, regardless of what technology was used to create it.
"Drivers wearing seatbelts still die in car accidents and in some cases seatbelts prevent drivers from getting out of the wreckage so we're better off without them." This is cope.
> Using a stricter language helps with reducing some classes of bugs, at the cost of reduced flexibility in expressing a solution and increased effort creating the software.
...and a much smaller effort debugging the software. A logic error is much easier to reason about than memory corruption or a race condition on shared memory. The time you spend designing your system and handling the errors upfront pays dividends later when you get the inevitable errors.
I'm not saying that all software should be rewritten in memory-safe languages, but I'd rather those who choose the only language where these kinds of errors regularly happen be honest about it.
Debugging specific classes of bugs, yes.
I'm not trying to hide anything, just help shift the balance back to common sense.
> Microsoft has unfortunately chosen to neglect C for a long time, its compilers dragging far behind the rest of the pack.
Is this still true? MSVC is pretty good at compiling C++ nowadays
They are talking about C, not C++. For Microsoft, C was done; it was time to move on to C++.
This was the official position in 2012,
https://herbsutter.com/2012/05/03/reader-qa-what-about-vc-an...
However after the Microsoft reboot with Satya, there was a change of heart regarding C, back in 2020, with C11 and C17 being supported,
https://devblogs.microsoft.com/cppblog/c11-and-c17-standard-...
And 2022
https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual...
However there is no official roadmap regarding C23 support, and now with the whole safety discussion going on and Secure Future Initiative, probably will never happen.
Additionally clang is a blessed compiler at Microsoft, it is included on Visual Studio, so whatever MSVC doesn't support can be done in clang as alternative.
They have added one feature (typeof) from C23, so maybe they will add the rest when they release C++26. Or maybe they won't. Microsoft is an expert in inflicting the cruelty of providing just enough hope.
C++26? They are having issues delivering C++23, given the whole change in security focus (Rust, Go, C#, and Java first; C and C++ only for existing codebases), and that is most likely one of the reasons Herb Sutter is no longer at Microsoft.
https://developercommunity.visualstudio.com/t/Implement-C23-...
https://developercommunity.visualstudio.com/t/Implement-C26-...
Security changes,
https://azure.microsoft.com/en-us/blog/microsoft-azure-secur...
https://blogs.windows.com/windowsexperience/2024/11/19/windo...
Oh wow, I don't write C++, so I didn't know how bad the situation was. My recollection that MSVC always implemented C++ standards posthaste is clearly outdated.
Yup, we are never getting C23. Good thing C11 is decent enough, I guess.
Microsoft took 30 years to implement a C89 compatible preprocessor: https://docs.microsoft.com/en-us/cpp/preprocessor/preprocess...
I think he's referring to C specifically, not C++. It's true that modern versions of MSVC are compliant (and they're also typically faster at implementing features than gcc and clang), but for the longest time there were subtle differences in their C library. To this day I don't think they support VLAs, which are technically standard C (at least until recently; I'm not sure about the latest versions, so hopefully someone more knowledgeable can say more).
I see. I kind of assumed improving the C++ compiler required improving the C parts as well.
VLA situation seems complex: https://stackoverflow.com/questions/55696680/in-which-versio...
Compare performance, features or anything of Clang and MSVC and you'll see the differences.
For C (not C++), MSVC got C17 in 2020, apart from VLAs, which are never planned. There's no real roadmap for if/when it will get C23, which is not just fully implemented in GCC but is the default standard there.
MSVC always focused on C++, and C was treated as an afterthought.
For those also wondering like myself, this refers to hacker in the whiny "security hacking is only called cracking Reeeee" manner, so this is just aimed at programmers and not security professionals.
It's the original meaning of the word. This complaint is kind of ironic considering the site you're posting on :)