> Modern clang and gcc won't compile the LLVM used back then (C++ has changed too much)
Is this due to changing default values for the standard used, and would it be "fixed" by adding "-std=xxx" to the CXXFLAGS?
I've successfully built ~2011-era LLVM with no issues from the compiler itself (after that option change) using GCC last year. There were a couple of bugs in the LLVM code that I had to work around, though: mainly reliance on transitive includes from the standard library, or incorrect LLVM code that newer compilers now detect.
One of the big pain points I have with C++ is the dogmatic support of "old" code, to the current version's detriment, I'd argue. But because of that I've never had an issue with backwards compatibility across code versions.
LegionMammal978 4 hours ago [-]
Even -fpermissive is no longer sufficient for some of the things that appear in the old LLVM codebase. It's mostly related to syntax issues that older compilers accepted even though the standard never permitted them.
o11c 4 hours ago [-]
Well, one thing I've noticed about LLVM is that it blatantly and intentionally relies on UB. The particular example I encountered probably isn't what causes the version breakage, but it's certainly a bad indicator.
That said, failures in building old software are very often due to one of:
* transitive headers (as you mentioned)
* typedef changes (`siginfo_t` vs `struct siginfo` comes to mind)
* macros with bad names (I was involved in the zlib `ON` drama)
* changes in library arrangement (the ncurses/tinfo split comes to mind, libcurl3/4 conditional ABI change, abuse of `dlopen`)
Most of these are one-line fixes if you're willing to patch the old code, which significantly increases the range of versions supported and thus reduces the number of artifacts you need to build for bootstrapping all the way to a modern version.
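To make the "one-line fix" point concrete, here's what the transitive-header case typically looks like (a made-up snippet, not from any real project): the code uses `uint64_t` but never includes `<cstdint>`, and only built in 2011 because some standard-library header happened to pull it in transitively.

```cpp
// hypothetical_old_code.cpp -- illustrative only, not from any real codebase.
// In 2011 this compiled because <string> happened to drag in <cstdint>
// transitively; newer standard libraries don't, so the build fails with
// "error: 'uint64_t' was not declared in this scope".
#include <string>
#include <cstdint>  // <-- the typical one-line patch: include what you use

uint64_t hashName(const std::string &name) {
    uint64_t h = 0xcbf29ce484222325ULL;                 // FNV-1a offset basis
    for (std::string::size_type i = 0; i < name.size(); ++i) {
        h ^= static_cast<unsigned char>(name[i]);
        h *= 0x100000001b3ULL;                          // FNV-1a prime
    }
    return h;
}
```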
ummonk 8 minutes ago [-]
Rather ironic that it relies on UB, given the extent to which Clang + LLVM insist on interpreting UB in the most creative way possible to optimize code…
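For anyone unfamiliar with what that "creative" interpretation looks like in practice, here's the textbook case (a generic snippet, nothing LLVM-specific): because signed overflow is UB, the optimizer is allowed to assume it never happens and may delete the check outright.

```cpp
#include <limits.h>
#include <stdio.h>

// Signed integer overflow is undefined behavior, so an optimizing compiler
// may assume `x + 1` never overflows and fold this whole test to false.
int will_overflow(int x) {
    return x + 1 < x;  // intended as an overflow check, but UB when x == INT_MAX
}

int main(void) {
    // Likely prints 1 at -O0 and 0 at -O2 with gcc/clang, though neither
    // result is guaranteed -- that's the point of UB.
    printf("%d\n", will_overflow(INT_MAX));
    return 0;
}
```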
jasonthorsness 6 hours ago [-]
The difficulty in reproducing builds and steps even from a time as recent as 2011 is somewhat disturbing; will technology stabilize or is this going to get even worse? At what point do we end up with something in-use that we can’t make anymore?
jcranmer 4 hours ago [-]
I'd imagine that it's going to end up both getting somewhat better and somewhat worse.
2011 is around the time that programmers started taking undefined behavior seriously as an actual bug in their code rather than in the compiler, especially as we started to see the birth of tools that could diagnose undefined behavior the compilers didn't (yet) take advantage of. There's also a set of major, language-breaking changes to the C and C++ standards that took effect around that time (e.g., C99 introduced inline with different semantics from gcc's extension, which broke a lot of software until gcc finally switched its default from gnu89 to gnu11 with GCC 5 in 2015). And newer language versions tend to obsolete hacky workarounds that are brittle precisely because they exploit unintentional complexity (e.g., constexpr-if removes the need for a decent chunk of the template metaprogramming that relied on SFINAE, a concept which is difficult to explain even to knowledgeable C++ programmers). So in general, newer code is likely to be substantially more compatible with future compilers and future language changes.
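As a concrete illustration of the constexpr-if point (my own toy example, not drawn from any particular codebase): the pre-C++17 version needs a pair of std::enable_if overloads, while the C++17 version collapses into one ordinary-looking function.

```cpp
#include <string>
#include <type_traits>

// Pre-C++17: dispatch on a type property via SFINAE / std::enable_if.
template <typename T>
typename std::enable_if<std::is_integral<T>::value, std::string>::type
describe(T) { return "integral"; }

template <typename T>
typename std::enable_if<!std::is_integral<T>::value, std::string>::type
describe(T) { return "something else"; }

// C++17: the same dispatch with `if constexpr`, no overload trickery.
template <typename T>
std::string describe17(T) {
    if constexpr (std::is_integral_v<T>)
        return "integral";
    else
        return "something else";
}
```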
But on the other hand, we've also seen a greater trend towards libraries with less-well-defined and less stable APIs, which means future software is probably going to have a rougher time getting all the libraries to play nice with each other if you're trying to work with old versions. Even worse, modern software tends to be a lot more aggressive about dropping compatibility with obsolete systems: accessing the modern web with decade-old software (as mentioned in the blog post), for example, is going to be incredibly difficult.
lmm 1 hour ago [-]
The telephone network was famously thought to be impossible to bootstrap even 50 years ago. We won't ever be able to "black start" our computers unless someone cares enough to put money and effort into it. (Also, all technological civilisation is somewhat self-dependent; e.g., do you think it would be possible to make microprocessors without running computers?) Possibly reproducible-build efforts and things like Guix will make it happen.
Sharlin 5 hours ago [-]
Enter Vinge's programmer-archaeologists!
bee_rider 5 hours ago [-]
I think we must have some software in use for which the compiler or the source code just isn’t around anymore. It probably isn’t a massive problem. There’s just a slow trickle of tech we can’t economically reproduce, but we replace it with better stuff. Or, if it was really crucial, it would become worth paying for, right?
Complete speculation: they might not have had it in the first place, or might not have had a legal license to modify it themselves. The About box shown in the article implies Microsoft just licensed MathType from Design Science, Inc. DSI got acquired by WIRIS just a few months before that, in 2017, which may also have had something to do with it: https://en.wikipedia.org/wiki/MathType
skissane 4 hours ago [-]
I think with advances in AI-assisted decompilation, we may soon end up in a situation where, given a binary, you can produce realistic-looking source (sane variable and function names, even comments) that compiles to the same binary, even though it's non-identical to the original source code.
bee_rider 3 hours ago [-]
Could be, although I don’t think that’ll give them any more HDL to train on (unless they also get access to a whole lot of high end microscopes!)
LegionMammal978 4 hours ago [-]
I've done this project myself, based on Ubuntu 20.04 and a whole lot of patchsets [0]. I got up to the 2014-01-20 snapshot before running into weird LLVM stack issues that I couldn't figure out how to resolve. One big annoyance is that the snapshot file refers to some commit hashes that don't appear to point to any surviving public repo, so it takes a fair bit of effort to reconstruct which changes must have been included in those missing commits.

[0] https://github.com/LegionMammal978/rust-from-ocaml
Why do I have to use a VPN and pick a US server to access this article?
neilv 4 hours ago [-]
Is there, or could there be, a simple implementation of a compiler for the latest full Rust language (in C, Python, Scheme/Racket, or anything except Rust) that is greatly simplified because, although it accepts the latest full Rust language as input, it assumes the input is correct?
Could this simple non-checking Rust implementation transliterate the real Rust compiler's code to unchecked C that is good enough for that minimal-steps, sustainable bootstrapping?
This simple non-checking compiler only has to be able to compile one program, and only under controlled conditions, possibly only on hardware with a ton of memory.
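To sketch what I mean by "transliterate to unchecked C" (a purely hypothetical illustration, not what mrustc or any real tool actually emits): borrows become plain pointers, slices become pointer/length pairs, and bounds and overflow checks are simply omitted because the input program is assumed to already be correct.

```cpp
// Rust input, assumed to be an already-correct program:
//
//     fn sum(xs: &[i32]) -> i32 {
//         let mut total = 0;
//         for x in xs { total += x; }
//         total
//     }
//
// A naive, unchecked transliteration: no borrow checking, no bounds checks,
// no overflow checks -- the source is trusted, so none of that is re-verified.
#include <stddef.h>
#include <stdint.h>

typedef struct { const int32_t *ptr; size_t len; } slice_i32;

int32_t rust_sum(slice_i32 xs) {
    int32_t total = 0;
    for (size_t i = 0; i < xs.len; ++i)
        total += xs.ptr[i];
    return total;
}
```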
No, it can't. Not for RISC-V/musl, so I'm sure that must be true for other platforms too.
JoshTriplett 1 hours ago [-]
Once you've compiled it for one platform, you've re-bootstrapped it, at which point you can use the real compiler to cross-compile for another platform.
yjftsjthsd-h 3 hours ago [-]
So.... It can, just not for a particular target platform? Or am I missing your point?
neilv 4 hours ago [-]
`mrustc` might be exactly what I wanted. Thank you.
charcircuit 4 hours ago [-]
Rust can self-bootstrap by compiling the Rust code of the compiler itself.
15155 4 hours ago [-]
I imagine you just need to update CA certs and the known_hosts file to get GitHub communication working again.
oasisbob 4 hours ago [-]
A few more hurdles might involve expectations of SHA-1 cert signing and TLS 1.0 deprecation.
fcoury 6 hours ago [-]
Not sure why, but I am getting 403 Forbidden; if you are getting the same, here's an archive.is link: https://archive.is/UH5fg
superkuh 6 hours ago [-]
You're not the only one getting blocked. I emailed Dreamwidth about this in the past and they say it's something their upstream network host does, and that they couldn't fix it even if their site's users wanted it fixed. They're a somewhat limited and broken host partially repackaging another company's services.
>Dreamwidth Studios Support: I'm sorry about the frustrations you're having. The "semi-randomly selected to solve a CAPTCHA" interstitial with a visual CAPTCHA is coming from our hosting provider, not from us: ... and we don't have any control over whether or not someone from a particular network is shown a CAPTCHA or not because we aren't the ones who control the restriction.
This also applies to the 403's.
neilv 4 hours ago [-]
This needs a catchy name, but I don't have a good one. CloudFlaritis? CloudFlareup? (CloudFlareDown?)
Regardless of whether Cloudflare is the particular infra company, the company who uses them responds to blocked people: "We don't know why some users can't access our Web site, and we don't even know the percentage of users who get blocked, but we're just cargo-culting our jobs here, so sux2bu."
The outsourced infra company's response is: "We're running a business here, and our current solution works well enough for that purpose, so sux2bu."
o11c 4 hours ago [-]
Hmm, "cloudfail" is already in use, and "cloudfuckyou" while descriptive is profane enough that it will cause unnecessary friction with certain people, and "clownflare" is too vague/silly (and is less applicable to other service providers).
So I propose "cloudfart" - just rude enough it can't be casually dismissed, but still tolerable in polite company. "I can't access your website (through the cloudfart |, it's just cloudfarting at me)."
Other names (not all applicable for this exact use): cloudfable, cloudunfair, cloudfalse, cloudfarce, cloudfault, cloudfear, cloudfeeble, cloudfeudalism, cloudflake, cloudfluke, cloudfreeze, cloudfuneral.
neilv 4 hours ago [-]
Would be nice if the name punished a perpetrator's brand, not just sounded like we're taking an unavoidable fact of nature in stride.
Want people to stop saying "CloudFlareup" (like a social disease)? Stop causing it.
eptcyka 4 hours ago [-]
Can’t say I’m a fan of Nix evangelists pointing their finger at any problem and yelling about how it would be solved better by using Nix, but in this case one could pin a nixpkgs version and all the sources for llvm, gcc and ocaml, and thus have a reproducible bootstrap. Ultimately it wouldn’t do anything different from what was done manually here, but pinning commits would save the archaeological burden for the next bootstrapper.
chubot 4 hours ago [-]
Does re-bootstrapping Rust like this actually work? How much work is it?
LegionMammal978 2 hours ago [-]
Lots of work: you need hundreds of steps across the snapshots, and patches for each one to get them to work. (E.g., the makefile had -Werror hardcoded for ages.) Not to mention that if you want to make it portable, you must always start with the i686 version and cross-compile from there. (Preferably leaving x86 as late as possible: the old LLVM versions are full of architecture-specific quirks.)
neilv 4 hours ago [-]
> Debian has maintained both EOL'ed docker images and still-functioning fetchable package archives at the same URLs as 14 years ago.
Debian FTW.