This is wildly disingenuous, I speak as a flight instructor and major IT incident investigator. Modern software authors have the professional discipline of a cute puppy in comparison to aviation practitioners.
I agree with Chris.
This is the kind of thinking that leads to "Why can't we just have building codes for software? It worked to protect against earthquakes and fire!"
Earthquakes and fire aren't conscious adversaries. Try writing a standards document on how to win at chess.
Airplanes *are* under constant attack by gravity, weather, system complexity, human factors, delivery processes, training consistency, traffic congestion, and now even by software developers.
Software is authored by an organization, including programmers, architects, technical writers, quality assurance, UX, and most importantly, management. They're all authors with a responsibility to implement and improve standards of professional discipline.
The PMP credential exists for a real reason: even management can grasp the value of a shared body of knowledge for constructing and improving workflows and processes. Don't blame "management"; they're trying.
Here's an excellent example: 35 years ago, an airline bought its first metric airliner, and management cancelled the project to update all the ground paperwork to metric. The plane ran out of fuel and the engines shut down in the air. The 200-page report: data2.collectionscanada.gc.ca/e/e444/e011083…
Where's the detailed 200-page public report from Facebook on how their management failed to prevent major disinformation campaigns in the US election? There isn't one, because they're just not that mature.
Sadly, @ErrataRob is making the same mistakes here. The objective in aviation is "Safe Transportation" and not "Preventing accidents" - a subtle wording difference, but an entirely different mindset at a much higher level.
Since this is picking up steam, I want to be clear that it's not "engineering standards" or "way more money" that gives the aviation industry the edge -- it's the constant, daily, global, organized and disciplined continuous improvement.
Surgeons didn't want to use checklists because they were too full of themselves, but then accidental deaths fell by 30-50% in hospitals that adopted them. Know who else often suffers from the same hubris? Programmers.
So many programmers are feeling defensive because they just think I'm talking about bugs. I'm not. There were no "bugs" exploited in the theft of Podesta's emails, there were no "bugs" exploited in the 2016 Facebook disinformation campaign.
Google and Facebook need to get absolutely spanked around because they keep pretending that they are software companies when they are not. They're platforms, environments, ecosystems, societies, whatever.
I know @zeynep has been talking about this stuff for ages -- so long as these companies think that their product is software, and don't get held accountable by society, humanity will increasingly suffer.
If you wonder if gov departments can make progress on reducing vulnerability and threats to their constituents, have a jealous read of the UK's @NCSC Active Cyber Defence report, one year in, by Dr. Ian Levy.
Practical. Measured. Informative.
So, I uttered the words DO-178C and Esterel/ANSYS SCADE in reply to Stamos; what do you think about those two?
Personally, I think they missed the point: the comparison being made is not between election systems and avionics, but deliberate attacks on elections and on avionics.
I have no problems with your points, really - but it's worth pointing out that "software" is an incredibly wide industry. Not saying aviation is simple or a narrow field, but every player has the same interest. In software, you could be making Facebook, control software for (1/x)
surgical robots, traffic lights, a guest book for your personal home page or the means to control a space station. It's just a tool to accomplish something within a "real" domain, so to speak. And many of these disciplines are just not as mature as the aviation industry, (2/x)
...where everyone pulls in the same direction.
And then, for orgs with smaller budgets, the expectations are insanely high even for short term, "cheap" projects. Everyone's colored by how Google and Facebook works, and if their software is in any way worse, (3/x)
it's not good enough. Even though their budget is tiny in comparison. However, with all the open source tooling, all the conferences that are out there etc, I would indeed say the software industry is interested in learning from its own mistakes. (4/x)
It's just an insane amount of interacting parties, and very few standards bodies in comparison. There are some, and many things are indeed standardised, but probably not even close to how the aviation industry is regulated. Sadly.
Let's give software twenty more years and see 🙃
Another problem in the "software industry", I guess, is that many companies that really want good software hire consultants on a short-term project basis. They come, stay for two years, and leave. A year later, the company hires new consultants to upgrade/fix/whatever, and then they leave too.
Instead of hiring their own people permanently, which would probably be cheaper, produce better software, give a more stable delivery rate, and keep knowledge in-house. At least here in Norway this is a very clear trend. Exceptions exist, but for many this is how it goes.
I'd rather spank around the many, at all levels, who still insist on regulating, breaking, nationalizing, rebuilding... whatever those same platforms, instead of making them obsolete by realistic, but truly decentralized personal clouds. E.g. this, mfioretti.com/2018/02/calicu…
My first reaction was also: well, managing an airplane and surgery are repeatable (whilst complex) operations. Constructing a software system for a domain/use case for which there is no precedent (framework, library) has much more variability.
But I am wondering myself now whether that is actually true. A lot of software engineering efforts focus on making knowledge on a particular domain reusable despite the variability. Would that maybe be a good starting point? Making attention to critical aspects "reusable"?
I have seen software devs be highly resistant to having even the most rudimentary operational review requirements for new services. Simple checklists of things like "does it have alerts", "does it log", etc.
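Even a minimal version of such a review can be written down and enforced mechanically. Here's a hypothetical sketch — the check names are invented for illustration, not any particular company's list:

```python
# Hypothetical operational-readiness review: each entry is a yes/no question
# a new service must answer before launch. The checks are illustrative.
READINESS_CHECKS = [
    "Does it emit structured logs?",
    "Does it have alerts wired to an on-call rotation?",
    "Does it have a runbook for its top three failure modes?",
    "Can it be rolled back in one step?",
]

def review(answers):
    """Return the list of unmet checks; an empty list means ready to launch."""
    return [check for check, ok in zip(READINESS_CHECKS, answers) if not ok]

# A service that has logs, alerts, and rollback but no runbook:
failures = review([True, True, False, True])
print(failures)  # the single unmet runbook check
```

The point isn't the code — it's that a list this short is still resisted, even though it takes minutes to answer.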
Money is not sufficient, but it is necessary. The Shuttle group had the culture described; IIRC there were 17 documented defects in 400k LOC over ~30 years, and each one prompted new learning. The perceived risk/benefit for most software is far less, and the outlay (effort) is correspondingly less.
Which is only an explanation. Because while all planes have to fail safe, it truly isn't justified to spend that kind of effort on Angry Birds. So the question is how to better judge value and risk, so that truly important things can be resourced to support the culture you describe.
I have yet to see credible analysis showing electronic voting offers the same level of security as paper ballots. It's an extra risk, and for what? Saving a few hours of counting ballots? Just because a problem can be addressed with software doesn't mean it should be.
Hmm, I believe civil aviation devices (planes :-) are not well protected against attacks. Like rockets (omg, this happens more often than I thought: en.m.wikipedia.org/wiki/List_of_a…), bombs, and malicious insiders (MH370?, GWI18G).
I think that is what @ErrataRob is talking about.
I studied software engineering as a degree. One point that was made by a professor stuck with me. Software engineers are NOT engineers. Engineers build structures that don’t fall down: bridges, buildings... The IT industry writes software which breaks ALL the time.
* Pedestrian bridge collapse in Florida
* Hyatt Regency walkway collapse
* Tacoma Narrows Bridge
Should I continue?
Henry Petroski's "To Engineer Is Human: The Role of Failure in Successful Design" is a good read regarding this.
FWIW the Stockfish chess engine code annotates relevant routines with expected ELO gains. IIRC the engine goes through a suite of test positions for hours against other engines, stats are taken, it's not guesswork. Some teachable stuff in there IMO.
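For reference, the Elo difference implied by a match score follows from the standard logistic rating model; here's a quick sketch (the numbers are illustrative only, not Stockfish's actual test framework):

```python
import math

def elo_diff(wins, losses, draws):
    """Estimate the Elo difference between two engines from match results,
    using the logistic model: score = 1 / (1 + 10**(-diff/400))."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games  # fractional score of the new version
    return -400 * math.log10(1 / score - 1)

# e.g. 1200 wins, 1000 losses, 1800 draws over 4000 games:
print(round(elo_diff(1200, 1000, 1800), 1))  # → 17.4
```

This is why long match suites are needed: small Elo gains only emerge from the noise after thousands of games.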
Major difference here - government body vs private company. There is no upside for Facebook to spend the time or effort on this or to release it.
Inside the government and some large companies massive reports do exist, they just aren’t often published.
Additionally, while I agree with your general thought, often catastrophic things have to occur to spark this level of insight (e.g. planes falling out of the sky and people dying).
It’s only recently begun happening in software. There is simply less history of truly bad failures
The rules in Canada are simple: engineering is regulated and requires licensing. Programming is not regulated and can be done with just a CS degree. That's enough of a start; the remaining details come naturally with continuous-improvement feedback loops over decades.
There's no legal obligation for Facebook to prevent "disinformation campaigns", and not even consensus that there's an ethical obligation, given the free speech issues involved. Get that consensus first, and then you can talk about professional discipline.
In early days, web software was cheap infrastructure for nonessential tasks. “Don’t worry. It’s just a chat client.” But we got tempted to use the same designs for critical infrastructure. The industry needs to mature and do the unglamorous work of fixing that mistake.
We overpromised on the security and safety of software. We continue to overpromise. The industry is still focused on "disruption" and "innovation". We throw a fit and make excuses whenever anyone asks about safety, security, or trust. We're not young anymore. We have to grow up.
Not a miracle: it was a government initiative to give hundreds of kids taxpayer-subsidized scholarships every year to fly gliders, to make sure there was a recruitment pipeline for military and industry. This is a significant factor in why Western countries have far safer airlines.
OSS software is often authored by a handful of individuals left alone to support tens or hundreds of thousands of users (or more) with little to no resources. I *wish* I had tech writers, QA, UX, and project management. Please tell me where/how to get them with my zero budget.
You're misreading where I'm coming from. I initially felt you were giving undue emphasis to the devs themselves, rather than the larger, more inclusive, picture. I've worked several roles within both telephony/internet systems and medical industry. I've seen sausage being made.
This is why I find Air Crash Investigation so reassuring. Whenever something goes wrong, there's so much effort put into working out what it was and what to do to prevent it from happening again. Everyone thinks I'm weird.
What would it look like if the government wanted there to be a vulnerability in every airplane so they could crash planes on demand? If users sought other forms of influence to mandate air travel un-safety?
As if airplane passengers don't actively resist security measures too? I see tons of people complaining about and/or ignoring them all the time.
Until we are at freaking NASA moon-landing software standards, we're nowhere near other industries' standards.
Sure, but things like that never happen in aviation? I thought it was about higher standards, not about perfect systems, because those don’t exist. You know Armstrong did it because it’s public. There is probably a ton of research documentation on every trip into space.
It's not about how high the standards are, it's about applying momentum to continuous improvement of those standards. Any standard can be improved, and it's that improvement cycle that's missing/immature in the software industry.
Fact: Eagle was off course because of navigational computer computation error, headed for a crater wall.
Fact: Neil Armstrong landed Eagle manually.
Fact: The 1201 and 1202 alarms were unrelated to the above, but added to the confusion at the time.
I think you're making Rob's point quite nicely: users don't want "no security". They want to get the(ir) job done, they want convenience, they need that attachment, etc. Air passengers don't want crashes, they want less hassle at security checks or smoke a cig onboard.
OK. I think my point still holds, but the examples change. Users want to vote quickly and get the job of managing an election done easily. There may be a higher risk of deliberate attacks if they're easy and people think they can get away with it (e.g. voting twice).
The attachment example also underlines the "no culture of learning from incidents" point -- remember the "iloveyou" virus? Nothing has been learned WRT executing active content from mail. That's why more than 10 years later, ransomware-by-mail attacks were so easily carried out.
It's not that users don't want security, it's that they have been trained by horrendous IT to accept exceptionally low levels of security. Like LinkedIn asking for your Gmail password to get at your contacts. That should get people fired.
NASA & JPL do have software engineers who can write low defect software. Doing so is slow and expensive, and may also require clear requirements. Most end user software is written like toddlers build towers, piling stuff up and hoping for the best.
Eh, not toddlers building towers, more like Lego (but not nec. the Technic kind) - I mean it works, and there can be some good principles in it, but rarely is it truly robust especially in new environs.
Aviation Eng and Builders are solving one set of well-defined problems: keep a plane flying et al, keep a building standing (Ok, dramatic oversimplification) Software is asked to solve HUUGE variety of problems (and cheaply! w/ stakes usually low) Oh, and new UI plz.
You're looking directly at programmers -- but software authorship is done at a management level, defining requirements and such. Of course software programming sucks when software management sucks -- my comments are at the authorship level.
As @kirkjerk was saying, modern software is everywhere, solving every problem. Doesn't make sense to apply the same methodology to every project. Many software shops wouldn't be able to exist if they followed waterfall or somesuch.
Which is to say, this whole "you ain't writing software" gatekeeping act you're putting on here is nonsense. Some software shops are completely flat, no mgmt, no ivory tower handing down algorithms and flow charts... and some of those are making 8+ figures of revenue.
Seriously, software is just a fucking job. I write it to get a paycheck. If my software makes enough money to sustain the business, what more could I ask for?
Nothing. That's why I mention revenue. That's really all that matters here. The rest of this diatribe is faff.
You push paper around a desk and punch keys on a keyboard. You don't have a life in your hands. (The lives that use your product are in the hands of thousands)
If your head got any bigger it'd start affecting the tides.
Agree. From personal experience, after moving to aviation from "normal" software industry, it takes a bit of time to appreciate the role of proper management and process. But when the product needs to be in service for 35 years, the "startup fever" is not the right mindset.
I'm glad that you have changed the pitch from "this is how it was 50 years ago". Yes, there are many dimensions, so please allow others to do the safety-critical stuff the way it needs to be done. Have fun with HFT, but beware of med imaging: some of it might need certification.
And shortly after, somewhere else I found this (fresh and close to "med imaging"): bbc.co.uk/news/av/techno…
You don't want it. And making billions in revenue on software that is selling ads gives no credibility in *this* area. Same for autonomous cars, etc.
Conversely, having just gotten to peer into the cockpit of the B757 we were supposed to take but was grounded, and beheld the finest software 1990 had to offer (as well as the iPads suction cupped to the windows to compensate): there are downsides to excessive conservatism.
The cowboys who built Facebook and Google are now billionaires. The software industry doesn't reward careful engineering right now... it rewards shiny new bells and whistles. I do a lot of programming, and there's always huge pressure to build more features in less time.
(Not *quite* a literal quote. But far far too close...)
There is a field that deals with safety critical software engineering, but I'd bet an index finger 90% of professional software engineers couldn't describe any of those principles or practices. The knowledge exists to enable us to build robust software systems; we choose not to.
Yes, modern software discipline is a lot worse than aviation discipline. But the attacks are also more powerful.
Weather, etc. are like the network connection going out. Software should be able to survive that, and often doesn't. But an *attack* is like an AA gun.
Almost none of the things on your list of things attacking airplanes actually *want* the plane to fail, they just make it hard to succeed.
With software, you often *do* have a malicious attack, which is harder to deal with.
Yes, it does suck. But there's still a material difference.
If something can go wrong with a flight one time in a hundred billion by chance, there haven't been enough flights to notice. If there's a five-byte input that causes your program to crash, an attacker will find it.
I didn't say I did; I was just pointing out that it's not *all* bad discipline.
(Although I think with enough care (and time and money), we could have secure, safe voting machines. We're not there yet and a lot of smart people disagree it's even in-principle possible, though.)
There is also a lot of work to make electronic systems safer. Solvers to prove algorithms will work the same way. Fuzzing to find those five byte inputs. There is plenty of work and education left to go into incremental improvement before “trust” isn’t a question for software.
Electronic voting systems also don’t just have electronics problems. You can make your box “unhackable” and then the shipping company sends the wrong one that has a nicer PCB. Change in process of large systems requires methodical introspection.
The point of the five byte figure is it's a nice round number greater than 100 billion, the number I used for airplane problems. I should have taken into account that running the software is cheaper and more common than flying; the point is "wouldn't happen accidentally".
Some fuzzers (AFL, for example) are smart enough to try to find inputs that trigger weird behavior, but if I had said "one kilobyte" instead of "five bytes", the point would hold but fuzzing might not find it.
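To make the asymmetry concrete, here's a toy sketch: blind random testing essentially never hits one specific five-byte input, while even a crude greedy stand-in for coverage guidance (real fuzzers like AFL are vastly more sophisticated) finds it in at most 5 × 256 attempts. The "parser" and its planted bug are invented for illustration:

```python
import os

SECRET = b"CRASH"  # hypothetical five-byte input that crashes the parser

def parse(data):
    """Toy parser with a planted bug: it only crashes on one 5-byte input."""
    if data == SECRET:
        raise RuntimeError("parser crashed")

def random_fuzz(tries):
    """Blind random testing: 256**5 possible inputs, so hits are vanishingly rare."""
    for _ in range(tries):
        try:
            parse(os.urandom(5))
        except RuntimeError:
            return True
    return False

def coverage(data):
    """Stand-in for coverage instrumentation: how far the input gets down
    the crashing code path (length of its common prefix with SECRET)."""
    n = 0
    for a, b in zip(data, SECRET):
        if a != b:
            break
        n += 1
    return n

def guided_fuzz():
    """Greedy byte-at-a-time search driven by the coverage signal; a crude
    caricature of what coverage-guided fuzzers such as AFL do."""
    candidate = bytearray(5)
    for i in range(5):
        for byte in range(256):
            candidate[i] = byte
            if coverage(bytes(candidate)) > i:
                break
    return bytes(candidate)

print(random_fuzz(10_000))      # almost certainly False
print(guided_fuzz() == SECRET)  # True, after at most 5 * 256 attempts
```

An attacker with any feedback channel is closer to the guided search than the random one — which is why "it would never happen by chance" is no defense.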
You could get software regulated to the level of airlines if you could convince people to pay the same kind of pricing as for aircraft, and do similar maintenance. There is a small limited market for this.
Sometimes software on my *phone*, which should be used to a flaky network, misbehaves on router reboot. Software quality could be a lot higher than it is, even without attackers.
But I got that app for free, and nobody's giving out free airplanes.
If a civilian airplane is perfectly safe unless 40 mm, 800 g balls of metal hit it a couple of times a second at faster than the speed of sound, nobody will buy it, because it's too heavy and they want something lighter even if it is less safe.
Indeed. Speaking as a software engineer who was on a National Academies panel on software dependability, I totally agree. Yes, software is hard. But Facebook didn't use known best practices like social threat modeling. Shockingly unprofessional.
Speaking as a software developer, tutor, and code reviewer: the software development industry does *not* have any kind of widespread concept of 'professional ethics'.
That's where the problem starts. There's absolutely no common sense of responsibility.
This creates a lot of compounding issues:
1) Issues are treated as 'solved, never think about again'.
2) Because most developers are negligent, clients expect no expenses on security; which makes ethical developers non-competitive.
3) And so on.
Great thread, and something I've been going on about for ages, but it's not disingenuous, it's cultural. Software as a field is so new and subjected to so much light and heat that the practitioners don't know it could be better.
There are other structural problems with bringing standards and good practice to software, the wild growth of the field combined with the lack of centralization or structure to the field, comparatively.
It would be as if most people who got into aviation did so by building a plane at home first. But despite this I do think software liability would go a long way to starting to get the incentives in line with creating good standards similar to medicine and aviation.
and honestly, i think the reason wannacry gets to keep exploiting SMB v1 and Boeings don't get to keep crashing is that we can take pictures of one, and see it clearly, and treat it as an event mentally.
There’s also the monoculture vs diversity aspect though: each airplane has its own set of pilots double-checking each other. Software is like a single godlike pilot flying all the world’s planes at the same time, but she sometimes gets sleepy.
Today a key subsystem failed because the third-party datasource API on which it depended simply disappeared, deactivated and delisted without warning. Whose fault was it—theirs for going down or ours for relying on it? The turtles are all wobbly, all the way down.
it literally doesn't matter. fault is the wrong frame, as the original thread made clear: responsibility is the right frame. Who deals with it and what recourse do they have against others who may have failed their own responsibilities in the chain.
no one in the chain cares. i know we love to beat up on the suits, but this is a toxic culture, and has been as long as i've been in it. the privilege to treat quality as esoteric and impossible is of the same piece as why it's so dominated by whiteness, sexism, etc.
dev culture is a lot of the problem, and management alone can't fix that. the place it needs to get fixed first, or can be, is software liability.
honestly, it's going to be the goddamned insurance companies if it's anyone.
Robustness (against random errors) is much, much easier to achieve than security (robustness against targeted errors). Imagine that an adversary could change gravity for each atom all over the world; now design an airplane for that.
You might think so, but it's just a matter of properly designing for the threat model and failure modes.
Hiring a Red Team to attack the system at various stages of the process (including initial design documents) is one way to develop an appropriate threat model.
I had a reality check when I read somewhere that there was no way NASA would be interested in most tech companies, as they need things to work with little room for error, and could do without side challenges, like npm package management or github merge conflicts 😂
Many topics covered in this thread and incredulity is expected. However, as someone who has been “bridging” Systems Safety/Human Factors and modern software engineering for the past ~8 years, I can confidently say: it’s not as simple as you make it out.
Yes, the longevity and maturity of "software engineering" are part of this. Yes, it's likely that code-of-ethics (possibly licensing, but I'm unsure of that) and 'professionalism' differences between aviation and software play a part.
But: this is a simplistic comparison between fields.
Regulation plays a significant role here. Some positive in some directions (independent investigation organizations, for example) and some negative in other directions (reports that list bullshit such as “pilot error” or “loss of situational awareness” as causes).
*All* software has potential for unintended consequences, regardless of the domain.
Airplanes, cars, social media, email...all of it. Those unintended consequences manifest sometimes as ‘vulnerabilities’ exploited by adversaries or bugs resulting in unavailability or...
...or software that works exactly as the developer intended but used differently by users, or many others.
The same is true for other domains. Comparisons like these are not even apples and oranges, they’re apples and doorknobs.
I’d agree that he is drastically understating the threat that the information security community faces, but that makes it even more concerning that he is bang on the mark about the degree of seriousness with which the problem is approached.
And that is in large part because their employers aren’t being held to account for their negligence, and legislators aren’t being held to account for their failure to engage with the world we have been actually living in for the last thirty years...
One thing I think the industry needs to do is get a lot more forthright about the damage being caused by these incidents, rather than the “no personal details were leaked” bullshit that often gets dragged out.
"Degree of seriousness" is just another form of "lack of airmanship" found in aviation accident reports. If there are no breaches or they are decreasing, does that indicate a proper degree of seriousness? (no)
Calls for "better" standards of practice is a common reaction to all consequential accidents. It's easy to do, adds very little to dialogue about future improvement, and ignores the real "messy details" of actual work.
Having operated nuclear submarines for many years and more recently computer services, I get excited every time this debate comes up. I authored a few RCAs in the Navy, and read thousands of others. Almost all were attributed to "Human Error."
When I switched from submarines to software services, the differences were puzzling: why were there no operating procedures, periodic maintenance schedules, incident procedures, on-call rotations, checklists, standing orders, hydrostatic test, incident drills, etc
...some of this has changed over the last 15 years. Wikis serve as evolving operating/incident/maintenance procedures on some teams. On-call rotations are ubiquitous. And yet these debates are often, to quote Crash Davis in Bull Durham, like "a martian talking to a fungo".
I've puzzled over these differences and have followed the work of Drs Allspaw and Cook with great interest as they create a new field. Yet the debate still explodes occasionally with a practitioner of traditional accident investigation saying, "do the RCAs, human error is real...
...MTTR, MTTB, MTTD, MTTC."
My experience is that the traditional methodologies worked very well on submarines, but were much less successful in software services. For a while I attributed this to immature culture, lack of leadership, lack of accountability.
This thread already mentions two other differences, regulation and the maturity of "the field."
My sense is there are three attributes of "traditional operations" that makes "traditional Problem Management" (RCA, Post Mortems, Continuous Improvement) work:
1. Stable architecture: An operator from WWII would be quite familiar with the "architecture" of a submarine: engine room in the back, control room, sonar, torpedo room, ballast tanks. While the flat-screen displays would surprise him, we have replicated the intuitive utility of the dial gauge on them.
Any software service that scales, by necessity, changes architecture. One could argue that at some super high-level there is just a "front end" and "back end", but under the hood there will be entirely new components every few years that dramatically change how operators interact
2. Horizontal scaling: sure, everyone is going to say "of course I horizontally scale my service," but let's compare the count of submarines, airplanes, or automobiles to web search services or crypto-currencies. Replication of the platform creates a fleet of identical machines,
...each with a different Ops team. This creates a large 'n' for a central office to collect and compare incidents, thereby refining and converging procedures and designs. Read the introductory chapters of ITIL.
3. Comprehensive procedures: The beginning of the submarine reactor plant manual stated that "everything you will ever need to do to this machine is documented here. If you think you need to do something that is not documented, read it again. If you still don't find it, surface the ship and radio home."
If submarines were more like software services, you'd have to imagine a single submarine in the fleet that held a billion torpedoes, traveled at half the speed of light and fit in the palm of your hand.
The operators of that machine would likely be challenged to use traditional methods of accident investigation.
I am excited how the new field generates methodologies that can be retrofitted back onto the traditional Ops disciplines. That has often happened in history. (EOM)
The rule of thumb I like is: you’ve got the root cause if you have a (simple/understood/doable) course of action which clearly fixes the problem. Otherwise you’re still working with symptoms and your investigation is incomplete.
Nope. There is no single root cause of complex systems failure. It doesn't exist, it's not a thing, and it's not only a waste of time trying to find it, it's dangerous to assert confidence with.
That this concept continues to survive is why we will continue to have accidents.
Root Cause: The most basic cause (or causes) of an incident that management has control to fix (i.e. a process/procedure that is Missing, Incomplete or Not followed) and, when fixed, will prevent (or significantly reduce the likelihood of) additional problems of the same type.
My strong suggestion is to read the multiple sources of research on how these (and other) definitions are critically problematic. You could start here: kitchensoap.com/2012/02/10/eac… or cut to the chase and read Dekker's "Field Guide To Understanding Human Error" 3rd edition.
Cost of developing aviation software vs cost of mundane developing business software is very different, because risks are very different. Market forces clearly say that non-safety IT accidents are acceptable.
But to do so for all of software engineering would be like the aviation industry setting standards for everything from a tricycle to an attack helicopter.
There's just too much variation in the types of software being produced for a single set of best practices to work.
That's why most companies do it internally and have their own best practices to prevent past mistakes. But in a capitalist economy, they'll consider those trade secrets as long as they can (<- one place I totally agree with you; major failure analysis should be public).
If you tried to create a "programming checklist" even for a smaller-scope, such as writing netcode, there's still so much variation and innovation that half the items would be checked off as N/A. That's not a useful checklist.
Safety and complexity related to software operating machinery in the physical world are very different to safety and complexity operating in a human design problem space. A lot of abstract thought is needed to have meaningful discussions on risk.
The way I see it, most software doesn't actually need to be that error-free. Where we've recognized that it does, it actually is (airplanes, etc).
The issue is that we haven't recognized that some categories, including voting software (!), need to be error-free.
I was using the term error-free holistically, as in "software on planes never crashes so badly as to crash the plane" which isn't true of, say, many word processors.
(If there's a better word that captures that concept, lmk so I can improve my communication)
The chess documents have been written. The real issue is that none of the standards stakeholders can agree on, or even want, a decent standard. Try writing a standard when the earthquakes and meteors from outer space are the standards' authors.