
This is wildly disingenuous; I speak as a flight instructor and major IT incident investigator. Modern software authors have the professional discipline of a cute puppy in comparison to aviation practitioners.
I agree with Chris. This is the kind of thinking that leads to "Why can't we just have building codes for software? It worked to protect against earthquakes and fire!" Earthquakes and fire aren't conscious adversaries. Try writing a standards document on how to win at chess.

Airplanes *are* under constant attack by gravity, weather, system complexity, human factors, delivery process, training consistency, traffic congestion, and even under attack now by software developers
But every time a disaster happens, we learn from it, publicly, and we share. We're still learning from crashes decades ago. Software developers? Bullshit.
Software is authored by an organization, including programmers, architects, technical writers, quality assurance, UX, and most importantly, management. They're all authors with a responsibility to implement and improve standards of professional discipline.
The PMP credential exists for a real reason, because even management can grasp the value of a shared body of knowledge to use in the construction and improvement of workflows and processes. Don't blame "management," they're trying.
Here's an excellent example - 35 years ago, an airline bought their first metric airliner, management cancelled the project to update all ground paperwork from metric. Plane ran out of gas and engines shut down in the air. 200 page report: data2.collectionscanada.gc.ca/e/e444/e011083…
Where's the detailed 200-page public report from Facebook on how their management failed to prevent major disinformation campaigns in the US election? There isn't one, because they're just not that mature.
And "try to write a standards document on how to win at chess" give me a break my dude there is one and it is software and it works and you know that.
Sadly, @ErrataRob is making the same mistakes here. The objective in aviation is "Safe Transportation" and not "Preventing accidents" - a subtle wording difference, but an entirely different mindset at a much higher level.
That XKCD on voting machine software is wrong
The latest XKCD comic on voting machine software is wrong, profoundly so. It's the sort of thing that appeals to our prejudices, but mistake...
blog.erratasec.com
Similarly, the objective in elections is "Confidence in democracy" and not "stopping attackers," which the CSE clearly lays out as one of many fronts: cse-cst.gc.ca/sites/default/…
Simplistic focus on the machine and loss of perspective on the bigger system & society is the hubris that keeps the technology industry trapped in the footgun cycle.
I mean "Airplanes and elevators are designed to avoid accidental failures" come on have you never heard of fail-safe design? Elevators and planes fail *all the time* but they fail SAFE.
Since this is picking up steam, I want to be clear that it's not "engineering standards" or "way more money" that gives the aviation industry the edge -- it's the constant, daily, global, organized and disciplined continuous improvement.
But the medical community is learning now, and Microsoft even brought in a surgeon to lecture them on lessons the medical community has learned from the aviation industry:
The Checklist Manifesto
We live in a world of great and increasing complexity, where even the most expert professionals struggle to master the tasks they face. Longer training, more...
youtube.com
Surgeons didn't want to use checklists because they were too full of themselves, but then accidental deaths fell by 30-50% in hospitals that adopted them. Know who else often suffers from the same hubris? Programmers.
So many programmers are feeling defensive because they just think I'm talking about bugs. I'm not. There were no "bugs" exploited in the theft of Podesta's emails, there were no "bugs" exploited in the 2016 Facebook disinformation campaign.
Google and Facebook need to get absolutely spanked around because they keep pretending that they are software companies when they are not. They're platforms, environments, ecosystems, societies, whatever.
I know @zeynep has been talking about this stuff for ages -- so long as these companies think that their product is software, and don't get held accountable by society, humanity will increasingly suffer.
More amplification of smarter voices than mine: "How Complex Systems Fail," Cognitive Technologies Laboratory, web.mit.edu/2.75/resources…
STELLA Report from the SNAFUcatchers Workshop on Coping With Complexity: snafucatchers.github.io
Root Causes don't reflect a technical understanding of the nature of failure:
Friends, never forget that “post accident attribution to a ‘root cause’ is fundamentally wrong” @ri_cook web.mit.edu/2.75/resources…
UK's @NCSC Active Cyber Defence report, one year in, by Dr. Ian Levy
If you wonder if gov departments can make progress on reducing vulnerability and threats to their constituents, have a jealous read of the UK's @NCSC Active Cyber Defence report, one year in, by Dr. Ian Levy. Practical. Measured. Informative.
You'll want to learn more about THERAC-25, here's a good start:
I wrote about software safety practices, for people starting a programming career.
jorendorff/talks
Some talks I've given
github.com
IT Practitioners can't even start to make things better unless they start with a baseline of psychological safety: usenix.org/system/files/l…
Don't just learn from your own mistakes, learn from *every* industry that has to manage complexity:
For the record: Checklists are not the solution, they are a first baby step towards maturity. The aviation industry has already moved well beyond just checklists to a full SMS model. en.wikipedia.org/wiki/Safety_ma…
The responsibility for fixing the software industry is 100% with software industry leadership, and not with the grunt programming labour told to do whatever management wants.
But it‘s also the other way round. Management thinks the programmers cannot think, cannot decide. As always the truth is in the middle. Managers and Developers share the responsibility for success in the end.
Programmers can't even organize themselves well enough to get offices with doors, how can we expect programmers to effect moral change?
The railway engineers who worked out the timetables for Sobibór and Bełżec likely had a good explanation for why it was better to do the loathsome job competently than leave it to the less skilled and virtuous.
So many programmers with fake twitter names trying to explain to me how planes work right now. [zero empathy expected from you, ladies, feel free to chuckle]
For folks late to this thread, this is about **voting machine software**, and how the software industry (Google and Facebook, specifically) hasn't earned the trust required to manage democracy securely.
Two heavyweights from Google (Chrome "engineer") and Facebook's former CISO saw a cartoon and attacked the cartoonist:
Voting Software
xkcd.com
They called the cartoonist a "non-practitioner" (though he had programming experience at NASA), and a nihilist, and belittled the relative maturity of risk management in the aviation industry (they appear to have been "non-practitioners" of aviation).
There's nothing that the software industry can gain by belittling the life-safety industries that they can learn so much from, just because a comic strip hurt their feelings.
Thank you for facilitating this discussion. I am learning a lot....my own profession of Medicine still has a way to go.
How many THOUSAND people got out of the twin towers due to the fire safety rating that kept them standing nearly an hour in the worst case, despite an attack way beyond what was anticipated/imaginable when they were designed? NOW let’s talk life- and society-critical software
They got out safely only because of lessons learned from the previous terrorist attacks on those two towers specifically
Procedures implemented long after the building was constructed. The original procedures would have left thousands more dead.
This is an example proving my point - what's needed is just the Deming wheel, everything else follows
Also because of fire safety standards during design & construction. Standards that slowed spread of the fire, delayed the collapse
Prior to the 1990s bombings at the WTC, the building's procedures explicitly stated that the towers should *not* be evacuated. My argument in this thread is that systems maturity in complex environments comes from industry-wide continuous improvement processes.
The fire safety standards during design and construction are absolutely a product of industry-wide continuous improvement processes -- tailored to the needs of the industry based on lessons learned and public sharing of knowledge.
I don't want to say that "standards" are the solution to the things that are wrong in the software industry, because that's way too specific. Standards might be part of what gets implemented, but it'll be a waste of time until they have industry-wide continuous improvement
There were plenty of standards and regulations in place that didn't prevent the Grenfell tower disaster. I think that incident is a powerful example for the need to firmly entrench industry-wide continuous improvement with Just Culture and SMS.
There are many ways to share information between competitors to create a greater good. Patents, with lags and leakage issues. Guild-type codes. Employee turnover⇒info sharing. See Elinor Ostrom’s Nobel work on “the commons.” And Standards, enforced by an external regulator.
Where do you think standards come from?
#actually bit of a funny story there: the WTC towers had waivers for NYC’s fire code, which were granted over the objections of the FDNY commissioner at the time. “The Fires” by Joe Flood, goes into this in some detail.
The miracle of the 9/11 collapse is that it didn’t kill _more_ people.
Almost as if “Move fast and break things” was A Thing before Facebook. Thanks for the point, showing you can’t look at “standards,” “protection” or “regulations” in isolation from the political process that enforces them. That they evolve in a confrontational dialectic
Just wait until they have to deal with the equivalent of mud dauber wasps in the pitot tubes …
If we'd adopted "building codes" for software and were working on the next level of prevention, people would understand. Arguing "building codes won't stop nuclear strikes" then letting homes routinely fall over in rainstorms, isn't going to cut it.
You're working very hard to miss the point.
I don't want to say that "standards" are the solution to the things that are wrong in the software industry, because that's way too specific. Standards might be part of what gets implemented, but it'll be a waste of time until they have industry-wide continuous improvement
Maybe I'm making a different one? We're on the same page with the need to improve. The question is what you mean by standards. We can think of them as both things we do to ensure a good result (processes and rules we follow). I think that's what you meant by "standards" right?
I think you're trying to sound smart without first reading the things I wrote.
If that's what you think, I'll leave you to it.
But standards also play a role in enabling continuous improvement. Without a definition of what's to be produced (a standard definition, or requirement) there is no way to measure the quality of what you produce and whether you're getting better or not.
And it's worse than that. As long as I'm making up my own definitions as I do the work, I'm never wrong and there is no need to improve. This is where we find the software industry: we say they need to improve, and their answer is, "Improve what? My stuff is great!"
So yes, we can simply say "continuously improve", but there is a fundamental cognitive issue that keeps that from happening: they see the software they wrote yesterday as "good" and failures in the newspaper as "outside hackers", and as long as they do your words can't be heard.
That makes me…laugh? Sigh? I worked on B-1Bs starting in 1987 or so. We had one crash and people die because of a fucking pelican hitting the wing pivot and severing all four hydraulic lines. Because what were the odds? As it turned out…
When I saw this cartoon, I printed it out, replaced "blockchain" with "Rust" (because I had a reputation for advocating for it), and put it on my cubicle wall.
Reading this back then, I concluded we are living through the 747 moment of IT: the future will not be higher, faster, better, but more reliable and failing safely. idlewords.com/talks/web_desi…
'splanation solidarity high five
Rob - this is a FANTASTIC thread. Thanks for focusing on SAFETY, Organizational Authorship and lack of professional maturity in software. There's a lot of wisdom in THE CHECKLIST MANIFESTO. I've been banging this drum for a decade.
Actually the best engineering teams in history have never had offices with doors. Witness NASA from nearly day 1 and countless similar examples. The science behind this is overwhelming.
Sorry for being flippant about the door comment before the thread blew up - my core audience would recognize that as a long-running joke from @Pinboard and @TechSolidarity
The best engineering teams succeed despite offices with no doors. The open plan office productivity myth has been debunked multiple times. Here’s a recent study: rstb.royalsocietypublishing.org/content/373/17…
@statvfs Oh there is SO much wrong with this study I don't know where to begin...
@statvfs I guess it's worth saying that blindly opening up all office spaces with no clue on how to do it would def yield the results in this "study".
OK I was deeply into you and your thread and THEN I saw this comment and I died. Bravo.
No, I disagree. People are able to think, to speak up, to demand change. Nobody gets off the hook. Everybody has the power and the responsibility to do something to make life better!
Much value in this thread. But! Take grunt out of the vocabulary.
Not just the word (obviously) but all it signifies; the attitude that leaks through in that word undermines the thread
If non-licenced developers stop calling themselves engineers, I'll stop calling them grunts. I think that's a reasonable compromise.
Because calling yourself an engineer when you don't have an engineering licence is dangerous.
Nah, strawman. It is an argument that has a place, but not as a defense for calling _anyone_ a grunt. And no industry gets better if,
I'm regularly told that calling any codemonkey an Engineer is part of the software industry culture in the US. I'm happy to set that straw man on fire.
After three years you get "Senior Engineer"
Oh gee guys, you’re so right. My bad. Let’s fix this by calling people grunts and monkeys. Sure. I’ll just shut up now. NOT!!!!
Digging in doesn’t make it right.
If much of the problem is hubris, then much of the solution is humility.
You don’t induce humility by demeaning; you do model (lead by example) humility by recognizing when you made a mistake.
This is a thread about a cartoon on a bird website. This is exactly where I aim to make the most mistakes.
I can see that ;) Seriously, it’s a great thread — but it is undermined at the point where, not only do you erroneously attribute all
responsibility to (senior) management, you also demean.
Our industry has strengths — even if it can be hard to disentangle the strengths from the weaknesses. And one of our strengths is how
often we will speak our truth to power :)
Not often enough, by any means. But still.
Super interesting stuff. Too interesting for short-form, almost. There are certainly a lot of practices and mentalities, especially around dealing with errors and crises and the question of consequences, that software can learn from earlier critical infrastructure.
Thankfully, checklists (e.g. MITRE ATT&CK and PCI DSS) are beginning to catch on. As you say, far from complete solutions, but if your org works through diligently and in good faith (ha!), you'll be in a better position than orgs that don't.
Thx for that, I heard of the CSR report but couldn't put my finger on it
I can't recommend Trevor Kletz's "What Went Wrong" highly enough on this. There's even a chapter on problems that computers introduce to chemical engineering!
Really interesting thread!
that accident and the amazing Leveson report detailing the failures changed me. i’ve been annoyed at my own profession ever since, and consider the title “software engineer” [i wore it on and off at various int’l orgs] a howler.
I think most 'hacking' is social hacking, not code.
Not all failures are accidents, though. Some are repeatable, and happen over and over—those frequently have smaller sets of causes, or even a single root cause.
I interpret the cited Short Treatise as saying this: Thinking in terms of *root cause* doesn't reflect a technical understanding of the nature of failure. Thinking in terms of *root causes* might help, though.
While we're at it, how is it that we're both in Toronto but we've never met?
A good root cause analysis would include factors similar to the ones named. Any modern iso management system is supposed to consider environments and roles and responsibilities- all the way to the C suite.
A good fishbone diagram may end up with dozens of factors and relations.
We will "just" experience unforseen consequences.
So, I uttered the words DO-178C and Esterel/ANSYS SCADE in reply to Stamos; what do you think about those two? Personally, I think they missed the point: the comparison being made is not between election systems and avionics, but deliberate attacks on elections and on avionics.
I have no problems with your points, really - but it's worth pointing out that "software" is an incredibly wide industry. Not saying aviation is simple or a narrow field, but every player has the same interest. In software, you could be making Facebook, control software for (1/x)
surgical robots, traffic lights, a guest book for your personal home page or the means to control a space station. It's just a tool to accomplish something within a "real" domain, so to speak. And many of these disciplines are just not as mature as the aviation industry, (2/x)
...where everyone pulls in the same direction. And then, for orgs with smaller budgets, the expectations are insanely high even for short term, "cheap" projects. Everyone's colored by how Google and Facebook work, and if their software is in any way worse, (3/x)
it's not good enough. Even though their budget is tiny in comparison. However, with all the open source tooling, all the conferences that are out there etc, I would indeed say the software industry is interested in learning from its own mistakes. (4/x)
It's just an insane amount of interacting parties, and very few standards bodies in comparison. There are some, and many things are indeed standardised, but probably not even close to how the aviation industry is regulated. Sadly. Let's give software twenty more years and see 🙃
Agreed on many points. Software written in mature industries is mature, software written in the software industry is not.
Another problem in the "software industry", I guess, is that many companies that really want good software hire consultants on short term project basis. They come, stay for two years, and leave. A year later, they hire new consultants to upgrade/fix/whatever and then leave.
Instead of hiring their own people permanently, which would probably be cheaper, produce better software, give a more stable delivery rate, and keep knowledge in-house. At least here in Norway this is a very clear trend. Exceptions exist, but for many this is it.
They're governments of societies. Digital societies where subjects have no vote and are essentially serfs.
I'd rather spank around the many, at all levels, who still insist on regulating, breaking, nationalizing, rebuilding... whatever those same platforms, instead of making them obsolete by realistic, but truly decentralized personal clouds. E.g. this, mfioretti.com/2018/02/calicu…
they are a bunch of buffoons "running code" (scripted) on someone else's tech. environment would be appropriate. true programmers write real apps and applications and work on the console.
My first reaction was also: Well managing an airplane and surgery are repeatable (whilst complex) operations. Constructing a software system for a domain/use case for which there is no precedence (framework, library) has much more variability.
But I am wondering myself now whether that is actually true. A lot of software engineering efforts focus on making knowledge on a particular domain reusable despite the variability. Would that maybe be a good starting point? Making attention to critical aspects "reusable"?
Are a tech company's business practices in the realm of engineering here? Software engineering is hella immature but these examples seem like a stretch
All of the flaws exploited by the Russians were executive leadership flaws. To build an airliner, the whole company needs a license, not just the engineers.
Some airlines have shit business practices; are those under the remit of engineering? Facebook is both Boeing and *example airline X* in this case, so the distinction between engineering, operator training (aircrew vs ... devOps/SRE?) and business decision makers?
Engineering, as other disciplines, is under the remit of the management environment of their respective industries. These aren't engineering challenges, they are industry challenges.
Maybe if you work in a place that hasn't adopted CI and static analysis tools (are there any of those left)? Automated checklists.
Likewise: Doctors didn’t want to use flow approaches because “you can’t reduce our work to a factory”
We know about washing our hands now too, yes. Ha
I have seen software devs be highly resistant to having even the most rudimentary operational review requirements for new services. Simple checklists of things like "does it have alerts", "does it log", etc.
Well, one reason is that those checklists can trump and overtake the actual functionality. Then the project gets software that does things right without doing the right things.
Proper observability of production software is basic functionality.
Just those two phrases are way too scary. ☑️Produces actionable alerts according to recommendation XYZ... ☑️Sends structured event logs while masking sensitive data. Good checklists take work.
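For illustration only, a minimal sketch of what an automated operational-readiness gate of the kind discussed above could look like in Python. The manifest fields and check names here are hypothetical, not any particular org's real checklist.

```python
# Minimal sketch of an automated operational-readiness checklist.
# The manifest fields and check names are hypothetical, for illustration only.

SERVICE_MANIFEST = {
    "name": "example-service",
    "has_alerts": True,
    "emits_structured_logs": True,
    "masks_sensitive_fields": False,
    "runbook_url": None,
}

CHECKS = [
    ("produces actionable alerts", lambda m: m.get("has_alerts", False)),
    ("sends structured event logs", lambda m: m.get("emits_structured_logs", False)),
    ("masks sensitive data in logs", lambda m: m.get("masks_sensitive_fields", False)),
    ("has an on-call runbook", lambda m: bool(m.get("runbook_url"))),
]

def run_checklist(manifest):
    """Return a list of (check name, passed) results for the manifest."""
    return [(name, bool(check(manifest))) for name, check in CHECKS]

if __name__ == "__main__":
    failures = [name for name, ok in run_checklist(SERVICE_MANIFEST) if not ok]
    for name in failures:
        print(f"FAIL: {name}")
    # A CI gate could refuse to ship while any check fails.
    raise SystemExit(1 if failures else 0)
```

The hard part, as the tweet says, is deciding what belongs in the manifest and what "actionable" or "structured" actually mean; the gate itself is trivial.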
Aviation is the safest way to travel thanks to checklists.
And yet programmers who work for companies like Intuitive Surgical and Mazor Robotics ARE in fact writing program code for robotic surgery that is BETTER than the human surgeons.
And management got those companies licensed and certified in their industry. That's software written in the medical industry, not in the software industry where management has no regulation.
Ok...perhaps starting to follow you a bit more. When you say "software industry" you should clarify the boundaries. Software is rarely "stand alone" and is embedded in almost every other industry (as a core or side effect)...
I'm pretty sure that my thread, as a whole, makes it quite clear what I'm talking about.
For folks late to this thread, this is about **voting machine software**, and how the software industry (Google and Facebook, specifically) hasn't earned the trust required to manage democracy securely.
Lots of stuff on the interwebs. Sometimes I don't read every single post. But enjoyed your perspective even if not aligned on all points. Good dialog tho!
What I frequently call "culture", or "engineering culture"—am I wrong?
Some culture is bad, some is good. Not all culture has constant, daily, global, organized and disciplined continuous improvement.
Right. And I think current bubble economics tends to amplify the problems you've mentioned.
How does the organised continuous improvement happen? Enforced at a professional or organisational level? Just culture?
It's called not killing people, and every time you do, the NTSB (or other nation's equivalent) are all over you and make sure everyone in your industry learns from your fuck up
Money is not sufficient, but is necessary. The Shuttle group had the culture described, IIRC there were 17 documented defects in 400k LOC over ~30 years, and each one prompted new learning. Perceived risk/benefit for most software is far less. Outlay (effort) correspondingly less
Which is only an explanation. Because while all planes have to fail safe, it truly isn't justified to spend that kind of effort on Angry Birds. So the question is how to better judge value and risk, so that truly important things can be resourced to support the culture you describe
The mantra of Silicon Valley is "move fast and break things". That's all you need to know. Collateral damage is other people's concern, not theirs.
#aviation does well because it embeds #SystemsEngineering & #SystemSafetyEngineering in the design of #aircraft, aircraft systems and operations. Dont get me wrong, #aviation is not perfect, but its light years ahead of other industries, especially IT who are simply not as mature
I have yet to see credible analysis showing electronic voting offers the same level of security as paper ballots. It's an extra risk, and for what? Saving a few hours of counting ballots? Just because a problem can be addressed with software doesn't mean it should be.
Hmm, I believe the civil aviation devices (planes :-) are not well protected against attacks. Like from rockets (omg, happens more often than I thought en.m.wikipedia.org/wiki/List_of_a…), bombs and malicious insiders (MH370?, GWI18G). I think that is what @ErrataRob is talking about
What he's talking about is that he thinks software programmers can be trusted with the management of democracy. That was the joke in the comic.
=8-o I need to read the article.
Ok. I reread the article. I still don't think it imposes "trust the software developers". It imposes failsafes which work despite the software
I studied software engineering as a degree. One point that was made by a professor stuck with me. Software engineers are NOT engineers. Engineers build structures that don’t fall down: bridges, buildings... The IT industry writes software which breaks ALL the time.
Really??? * Pedestrian footpath collapses in Florida * Hyatt Regency walkway collapse * Tacoma Narrows Bridge Should I continue? Henry Petroski's "To Engineer Is Human: The Role of Failure in Successful Design" is a good read regarding this
Yes really. You just proved my point in that those failures are heavily investigated and are well known because they are rare. However software failures are way more common and not investigated. Are you trying to counter the principles I’m putting forward with specific examples?
"Engineers build structures that don’t fall down"??? The fall down. Even after 10k years of building bridges! Socioeconomic will cause systems to fail. & Don't get me wrong. U should learn all u can from other disciplines. But I see the "real engineering" discussion as damaging
I was generalizing to save space. I guess I should have said “On the whole engineers mostly build structures that do not regularly collapse. Whereas software engineers build software that crashes all the time.” Hopefully that clarifies the point I’m making.
I believe all engineering is under constant pressure to optimise, to build things cheaper and faster. Until it breaks. Then it takes a step back.
can always count on his sixhead to float up my timeline being obtuse
Dear IT folks, please read @www_ora_tion_ca's thread. Think on it. Read it again. ↑
Please don’t put me in the position of having to defend an XKCD comic. That’s a bridge too fucking far.
FWIW the Stockfish chess engine code annotates relevant routines with expected ELO gains. IIRC the engine goes through a suite of test positions for hours against other engines, stats are taken, it's not guesswork. Some teachable stuff in there IMO.
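As a rough illustration of "stats are taken, it's not guesswork": a match record between two engine versions can be converted into an estimated Elo difference with the standard logistic Elo model. This is not Stockfish's actual testing pipeline, and the match numbers below are made up.

```python
import math

def elo_diff_from_score(wins, draws, losses):
    """Estimate the Elo difference implied by a match record using the
    standard logistic Elo model (illustrative; not Stockfish's real pipeline)."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    if score in (0.0, 1.0):
        return float("inf") if score == 1.0 else float("-inf")
    return -400.0 * math.log10(1.0 / score - 1.0)

# Hypothetical match result for a candidate patch vs. the current engine.
print(round(elo_diff_from_score(wins=1200, draws=2600, losses=1100), 1))  # ~ +7 Elo
```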
Major difference here - government body vs private company. There is no upside for Facebook to spend the time or effort on this or to release it. Inside the government and some large companies massive reports do exist, they just aren’t often published.
Additionally while I agree with your general thought - often catastrophic things have to occur to spark this level of insight (Eg planes falling out of the sky and people dying). It’s only recently begun happening in software. There is simply less history of truly bad failures
My argument would be that the regulators are part of the sector, and the lack of regulators are a major contributor to the lack of maturity in software creation.
Agree on that front. The question is what parts need to be regulated and at what level? Lots of software in the world, not much of it is actually critical /important.
The rules in Canada are simple: Engineering is regulated and requires licencing. Programming is not regulated and can be done with just a CS degree. That's enough of a start, the remaining details come naturally with continuous improvement feedbackery loops over decades.
CS degree optional
Contractors follow rules too when building a building. Degree or not. Apprenticeships work for unions, and they work for CS degrees in Germany.
That not every programmer who has passed through formal training has heard of the Therac-25 is a goddamn indictment of our profession.
There's no legal obligation for Facebook to prevent "disinformation campaigns", and not even consensus that there's an ethical obligation, given the free speech issues involved. Get that consensus first, and then you can talk about professional discipline.
In early days, web software was cheap infrastructure for nonessential tasks. “Don’t worry. It’s just a chat client.” But we got tempted to use the same designs for critical infrastructure. The industry needs to mature and do the unglamorous work of fixing that mistake.
We overpromised on the security and safety of software. We continue to overpromise. The industry is still focused on “disruption” and “innovation”. We throw a fit and make excuses whenever anyone asks about safety, security, or trust. We’re not young anymore. We have to grow up.
That they had a glider pilot in the cockpit was a GD miracle
Not a miracle: it was a government initiative to give hundreds of kids taxpayer-subsidized scholarships every year to fly gliders, to make sure there was a recruitment pipeline for military and industry; this is a significant factor in why western countries have far safer airlines
i.e. Bob Pearson and Sully Sullenberger and Chris Hadfield all had taxpayer-funded glider training as kids. Investment pays off.
Presumably the Gimli Glider. Thorough.
Without looking at the link - the Gimli Glider?
And customers accept ridiculous liability statements. How did we get here?
Too much control of underlying net infrastructure given to private actors who value their control over public scrutiny
OSS software is often authored by a handful of individuals left alone to support tens or hundreds of thousands of users (or more) with little to no resources. I *wish* I had tech writers, QA, UX, and project management. Please tell me where/how to get them with my zero budget.
I will not defend all s/w devs, but a lot of times it's their mgmt that is the problem. The devs (some of them) *want* to do the right thing in the right way.
Aviation has known for decades that management kills people, so they fixed management. Your argument is invalid.
no, you support it. I blame mgmt for a lot of our life's ills.
Management is a major part of the software development process, and a major part of the industry. If you think programming alone is what results in software.... Well.
You're misreading where I'm coming from. I initially felt you were giving undue emphasis to the devs themselves, rather than the larger, more inclusive, picture. I've worked several roles within both telephony/internet systems and medical industry. I've seen sausage being made.
This is why I find Air Crash Investigation so reassuring. Whenever something goes wrong, there's so much effort put into working out what it was and what to do to prevent it from happening again. Everyone thinks I'm weird.
All the best people are weird
Don't forget snakes.
Yeah, you don’t get it, though. A lot of users actively don’t want security in their software. What would aviation look like if 30% of air passengers actively wanted the plane to crash?
What would it look like if the government wanted there to be a vulnerability in every airplane so they could crash planes on demand? If users sought other forms of influence to mandate air travel un-safety?
Hardware itself defeats security. What would aviation look like if Boeing built every aircraft with a button on each passenger’s seat that would let them take control of the cockpit?
Tip from someone who "gets it": There *are* government-mandated vulnerabilities in airliners so that governments can crash them.
Like users of airplanes don’t actively not want security on the airplanes? I see tons of people complaining and/or ignoring them all the time. Until we are at freaking NASA moon landing software standards, we’re no where near other industry standards
Um: Neil Armstrong switched that computer off and landed manually because he did not want to die.
1202 Data overflow alarm at mission time 102:38:26, 1201 alarm at 102:42:19, Manual control due to computer navigational error established at 102:43:15 -- hq.nasa.gov/alsj/a11/a11.l…
Sure, but things like that never happen in aviation? I thought it was about higher standards, not about perfect systems, because those don’t exist. You know Armstrong did it because it’s public. There is probably a ton of research documentation on every trip into space.
It's not about how high the standards are, it's about applying momentum to continuous improvement of those standards. Any standard can be improved, and it's that improvement cycle that's missing/immature in the software industry.
Fact: Eagle was off course because of navigational computer computation error, headed for a crater wall. Fact: Neil Armstrong landed Eagle manually. Fact: The 1201 and 1202 alarms were unrelated to the above, but added to the confusion at the time.
Ars is disputing that the 1201 and 1202 alarms almost caused an abort, which is a fair argument to make, but unrelated to the fact that the computer was about to fly Eagle sideways into rock.
Do you have good pointers on the “navigational computer computation error”?
Computational error based on incorrect inputs from the ground, search this oral history for the keyword "perturbations"
The Oral History of the Apollo 11 Moon Landing
The knuckle-biting story of the first lunar landing from the people who were there.
popularmechanics.com
Fun quote from later in there: "Under the software control, it did a software restart. Five times during the landing, the whole software was flushed and reconstructed in terms of what was being executed."
Read that, but again it looks related to the 1201/1202. It seems that the suboptimal trajectory was indeed because ground gave the initial go 4-5s too late
Why mention 1201/1202 then? It especially added confusion to your tweet (at least it confused me :)
Because space jargon is FUN
My understanding of this is that 1201/1202 were harmless for the LM but @TheRealBuzz and Neil didn't know it at first, and their handling prevented them from scanning the landing area to choose a safe one. Then Neil decided to switch to manual to pick the precise landing site himself
I'm not aware of a navigational error (based on 102:42:35) so any relevant (and accurate, considering the amount of noise one can find on this) pointer would be nice :)
I think you're making Rob's point quite nicely: users don't want "no security". They want to get the(ir) job done, they want convenience, they need that attachment, etc. Air passengers don't want crashes, they want less hassle at security checks or smoke a cig onboard.
We're talking about a comic about voting machine software, not everyone's daily software to "get their job done" and fart around with attachments.
OK. I think my point still holds, but the examples change. Users want to vote quickly and get the job of managing an election done easily. There may be a higher risk of deliberate attacks if they're easy and people think they can get away with it (e.g. voting twice).
The attachment example also underlines the "no culture of learning from incidents" point -- remember the "iloveyou" virus? Nothing has been learned WRT executing active content from mail. That's why more than 10 years later, ransomware-by-mail attacks were so easily carried out.
It's not that users don't want security, it's that they have been trained by horrendous IT to be used to exceptionnaly low levels of security. Like LinkedIn asking your GMail password to get to your contacts. That should get people fired.
You make a good point. This problem needs to be addressed. Large passenger aircraft are safer than small aircraft and cars because most societies have zero appetite for 100s of people dying in a fireball. I wonder if Malaysian Airlines is still hemorrhaging $.
NASA & JPL do have software engineers who can write low defect software. Doing so is slow and expensive, and may also require clear requirements. Most end user software is written like toddlers build towers, piling stuff up and hoping for the best.
Eh, not toddlers building towers, more like Lego (but not nec. the Technic kind) - I mean it works, and there can be some good principles in it, but rarely is it truly robust especially in new environs.
You know what's more expensive? The current infosec industry. The financial, social and PERSONAL cost of breaches. Shoddy software is only cheap because we've externalized the cleanup costs.
Aviation Eng and Builders are solving one set of well-defined problems: keep a plane flying et al, keep a building standing (Ok, dramatic oversimplification) Software is asked to solve HUUGE variety of problems (and cheaply! w/ stakes usually low) Oh, and new UI plz.
You're looking directly at programmers -- but software authorship is done at a management level, defining requirements and such. Of course software programming sucks when software management sucks -- my comments are at the authorship level.
Programmers write code, they do not make software. It takes a hell of a lot more than programming to make a software or a service, but programmers still think management is an unmanageable boogeyman.
I can see why you got into aviation... your horse is so high you need a whirlybird to reach the saddle.
The whole "mgmt writes specs & coders are just modern day scribes" ethos you're flaunting here is how software was done 50 yrs ago.
As @kirkjerk was saying, modern software is everywhere, solving every problem. Doesn't make sense to apply the same methodology to every project. Many software shops wouldn't be able to exist if they followed waterfall or somesuch.
Which is to say, this whole "you ain't writing software" gatekeeping act you're putting on here is nonsense. Some software shops are completely flat, no mgmt, no ivory tower handing down algorithms and flow charts... and some of those are making 8+ figures of revenue.
yes yes and your value to society and humanity is defined by your revenue right sure
Humanity, huh... so is that an integration test I need to add to my CI/CD pipeline? Can't ship to production unless a child achieves enlightenment by my algorithms?
And here I thought you couldn't get any more sanctimonious. Boy, have I got some egg on my face!
Seriously, software is just a fucking job. I write it to get a paycheck. If my software makes enough money to sustain the business, what more could I ask for? Nothing. That's why I mention revenue. That's really all that matters here. The rest of this diatribe is faff.
A job and a profession are two completely different things. It's nice you have a job, I'm glad you enjoy it, I hope your doctor is a professional.
You push paper around a desk and punch keys on a keyboard. You don't have a life in your hands. (The lives that use your product are in the hands of thousands) If your head got any bigger it'd start affecting the tides.
We're talking about a comic about voting machine software, not punching keyboards and pushing paper.
That might be what you're thinking, but it sure ain't how you're coming off.
There's no P.E. or M.D. after your name. Looks like you have a job too... professionals are accredited and regulated.
I have a real name that I'm willing to stand behind, you don't even have that. Yes, I have accreditations and professional designations and licences, but that's not what makes me right about stuff on a silly bird website.
Ooh he's moved from gatekeeping to bullying. You want to meet me in the parking lot, that why you want my name? How very...
yes yes im the first to start bullying in this thread boo hoo anonymous coward
Is that part of attaining a professional accreditation: demonstrating mastery of playground taunts? What letters do you get after your name, there... P.U.?
Agree. From personal experience, after moving to aviation from "normal" software industry, it takes a bit of time to appreciate the role of proper management and process. But when the product needs to be in service for 35 years, the "startup fever" is not the right mindset.
This isn't a dichotomy, nor is it even a spectrum. There are many dimensions to our work products. "35 years" is a useless figure to someone who works on HFT algos or med imaging.
I'm glad that you have changed the pitch from "this is how it was 50 yrs ago". Yes, there are many dimensions, so please allow others do the safety critical stuff the way it needs to be done. Have fun with HFT, but beware med imaging - some of it might need certification.
And shortly after, somewhere else I found this (fresh and close to "med imaging"): bbc.co.uk/news/av/techno… You don't want it. And making billions in revenue on software that is selling ads gives no credibility in *this* area. Same for autonomous cars, etc.
Hack attack can stop people's hearts
Researchers disclose an unfixed vulnerability that threatens medical devices.
bbc.co.uk
But it‘s also the other way round. Management thinks the programmers cannot think, cannot decide. As always the truth is in the middle. Managers and Developers share the responsibility for success in the end.
1/ It sounds like you’re saying the root cause of problematic s/w is human error & humans at fault are management. But management answers to Board/investors, who are in turn “answering” to profits/customers; custs are operating w/incomplete info. Root cause attrib is hard.
2/ Calling out Facebook‘s failure to prevent foreign election influence conveniently ignores how ludicrous the very idea of bots influencing an election was, not long ago. FB has scaled 2000x in 14 years. Outcomes are more obvious in hindsight.
Sociologists predicted it, gamer gate proved it was coming, but Facebook ignored their expertise and warnings.
There is no single root cause, just accountability.
Rob - your concept of "authorship" is flawed. A building author IS the architect. S/he is the one licensed, regulated & accountable to produce a safe, well engineered building. The prop developer cannot override the laws & regs nor will be liable like the architect for flaws.
You're being overly theoretical and superficial. 72 people died in Grenfell, and I'll wager a beer that no individual architect will lose their license.
Not unique to ANY industry. And theory is precisely the core of the aviation industry (and others). Being theoretical is kinda what engineers (and pioneers) do, ESP in aerospace. So let's set attacks on proper science aside.
as a modern software engineer, I agree. The few times we let software control really important stuff like airplanes, the rules are completely different.
Conversely, having just gotten to peer into the cockpit of the B757 we were supposed to take but was grounded, and beheld the finest software 1990 had to offer (as well as the iPads suction cupped to the windows to compensate): there are downsides to excessive conservatism.
The slightest change in avionics needs to be Certified. And Certification is... *very* expensive. Hence the conservatism.
I’m aware. The discussion was about software/design/compliance. My point was pointing to aviation as an unqualified success of more rigorous design does not tell the whole story.
Rigorous design wasn't what saved the aviation industry, it was applying the Deming wheel of continuous improvement with transparency and accountability and sharing of best practices.
That actually happens pretty well in software too. Issue is one of priorities and incentives: cloud providers take this stuff real seriously, IoT providers (for instance) can’t be bothered to care.
Would you rather that Aviation be conservative. Or have a shiny UI each flight with a 99% chance of crashing?
Aviation is the most innovative industry we have, from kitty hawk to supersonic space shuttle landings in a single person's lifetime. Aviation isn't conservative, it's disciplined.
100% agree. Am updating my mental models...
Sigh. Meaninglessly hyperbolic statement. 20th century had a lot of technological low-hanging fruit that was grabbed. Aviation, along with basically every other area I can think of, had remarkable advances.
There is actually a variety of interesting discussions to be had here if anyone cared about nuance. The dynamics of aviation are very different from other tech.
Needs to be much more conservative than other areas because of safety, but unclear whether it needs to be as heavyweight and expensive as it is. Conversely, reasons for crappy reliability in other areas of software have mostly to do with incentives, not ignorance.
E.g., cloud providers do a pretty good job with security, actually, balanced against fast iteration. IoT vendors are terrible because they have little incentive to do better.
I would like airline pilots to have access to UI advances from the past 30 years to give them better situational awareness. I have better tools at my disposal as a private pilot than airline pilots generally do.
The iPads largely replace a lot of paper, including heavy flight bags pilots used to have to carry. They could build that functionality into the airplane, but it turns out the ideal form factor for this interface is a... tablet, that the pilots can hold like a piece of paper.
Yep, the iPad is part of the pilot's equipment, not part of the aircraft's equipment -- and at no point does the safety of the aircraft depend on the iPad.
Sure does if misreading the approach plate means pilot flies into a mountain, which is more of a problem than wings failing. Also, iPads tend to be suction cupped to window, so not exactly loose in cabin. “Pilot” vs “plane” not interesting w.r.t. safety: unified system.
That's like saying the aircraft's safety in the paper approach plate days relied on paper. The pilot misreading the approach plate has nothing to do with it being on an iPad versus paper. Of course, the pilot is not going to set the iPad down loosely when not using it.
My point is planes (currently) don’t fly themselves. You can end up dead on the best engineered plane if the pilot makes a mistake. And user interface design affects the pilot’s awareness of what is going on. How information is displayed to pilot, and when, 100% affects safety.
Pilots train on user interfaces before flying with them, it's never a surprise.
Not the issue. E.g., G1000 gives better awareness of what’s going on, lowers cognitive load, alerts urgent issues, highlights most relevant info. Equivalently experienced pilot using g1000 is _safer_ than pilot using old steam gauges. User interfaces matter. Ask any pilot.
What does this have to do with iPads "compensating" for the 757's software? A G1000 is aircraft equipment, and iPad is not. The iPad does not interface with the aircraft's controls or displays.
I'm a pilot, and a flight instructor, so I asked myself. You're very, very wrong. We have to turn all that blinky shit off for our student pilots.
No digital readout will ever beat the intuitive design of these instruments. Here's the ASI that I flew with today -- I don't even have to read a number, I get what I need with a glance in a quarter of a second, then I get my head back out of the cockpit.
Unless you're flying IFR, the G1000 is a dangerous distraction.
Every second that my eyes are on an instrument will take me away from the real dangers that can knock me out of the sky -- rich doctors in SR22s staring at fancy G1000s who aren't looking where they're going.
I’ve worked both in aerospace and the software tech industry and I completely agree with this.
The cowboys who built Facebook and Google are now billionaires. The software industry doesn't reward careful engineering right now... it rewards shiny new bells and whistles. I do a lot of programming, and there's always huge pressure to build more features in less time.
If my boss on the ship I work on accidentally dumped oil overboard, he could: lose his engineer's license, get tossed in jail, or be required to pay a huge fine.
More like a kitten on super catnip laced with cocaine
"All of my previous work was JavaScript but I had a revelation while drunk and high last night and learned Rust and we're coding your key components in it now wheeeeeee I'm still high wheeeee" (Not *quite* a literal quote. But far far too close...)
There is a field that deals with safety critical software engineering, but I'd bet an index finger 90% of professional software engineers couldn't describe any of those principles or practices. The knowledge exists to enable us to build robust software systems; we choose not to.
Yes, modern software discipline is a lot worse than aviation discipline. But the attacks are also more powerful. Weather, etc. are like the network connection going out. Software should be able to survive that, and often doesn't. But an *attack* is like an AA gun.
Almost none of the things on your list of things attacking airplanes actually *want* the plane to fail, they just make it hard to succeed. With software, you often *do* have a malicious attack, which is harder to deal with.
You deal with it the same way you deal with everything else: disciplined continuous improvement, which the software industry sucks at compared to the aviation industry.
Yes, it does suck. But there's still a material difference. If something can go wrong with a flight one time in a hundred billion by chance, there haven't been enough flights to notice. If there's a five-byte input that causes your program to crash, an attacker will find it.
Okay so why would you want a voting machine like that
I didn't say I did; I was just pointing out that it's not *all* bad discipline. (Although I think with enough care (and time and money), we could have secure, safe voting machines. We're not there yet and a lot of smart people disagree it's even in-principle possible, though.)
There is also a lot of work to make electronic systems safer. Solvers to prove algorithms will work the same way. Fuzzing to find those five byte inputs. There is plenty of work and education left to go into incremental improvement before “trust” isn’t a question for software.
Electronic voting systems also don’t just have electronics problems. You can make your box “unhackable” and then the shipping company sends the wrong one that has a nicer PCB. Change in process of large systems requires methodical introspection.
The point of the five byte figure is it's a nice round number greater than 100 billion, the number I used for airplane problems. I should have taken into account that running the software is cheaper and more common than flying; the point is "wouldn't happen accidentally".
Some fuzzers (AFL, for example) are smart enough to try to find inputs that trigger weird behavior, but if I had said "one kilobyte" instead of "five bytes", the point would hold but fuzzing might not find it.
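For illustration of the point about attackers finding crashing inputs, a minimal brute-force fuzzing sketch: a toy parser with a planted out-of-bounds bug, and a naive random fuzzer that stumbles onto a short crashing input. Real coverage-guided fuzzers like AFL are far smarter than this; the parser and its bug are entirely hypothetical.

```python
import random

def parse_record(data: bytes) -> bytes:
    """Toy parser with a planted bug: it trusts the declared length in
    byte 0 and never checks it against the actual size of the input."""
    declared_len = data[0]
    payload = data[1:1 + declared_len]
    # Bug: if the input is shorter than declared, this index is out of range.
    checksum = data[1 + declared_len]
    return payload + bytes([checksum])

def fuzz(iterations: int = 100_000, max_len: int = 8):
    """Naive random fuzzer: throw short random inputs at the parser and
    return the first one that raises an unexpected exception."""
    rng = random.Random(0)
    for _ in range(iterations):
        data = bytes(rng.randrange(256) for _ in range(rng.randint(1, max_len)))
        try:
            parse_record(data)
        except IndexError:
            return data  # a crashing input, typically found almost immediately
    return None

if __name__ == "__main__":
    crash = fuzz()
    print("crashing input:", crash.hex() if crash else "none found")
```

Random generation finds this planted bug quickly because almost any short input triggers it; a specific magic byte sequence, as the tweet notes, would need coverage guidance or far more time to hit.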
An attack against software looks like what happened to MH17. I don't think the airline industry is building defenses for civilian airliners. Bad weather is a router reboot.
You could get software regulated to the level of airlines if you could convince people to pay the same kind of pricing as for aircraft, and do similar maintenance. There is a small limited market for this.
Sometimes software on my *phone*, which should be used to a flaky network, misbehaves on router reboot. Software quality could be a lot higher than it is, even without attackers. But I got that app for free, and nobody's giving out free airplanes.
If a civilian airplane is perfectly safe unless 40mm, 800g balls of metal hits it a couple times a second at faster than the speed of sound, nobody will buy it because it's too heavy and they want something lighter even if it is less safe.
So how many lines of code have come off of your fingers and into a product someone loved?
Quibble: cute puppies can be housebroken in less than a year.
But not after your puppies have puppies. (15 years for GHOST on Linux, who knows how long link_ntoa on BSD...)
Indeed. Speaking as a software engineer who was on a National Academies panel on software dependability, I totally agree. Yes, software is hard. But Facebook didn't use known best practices like social threat modeling. Shockingly unprofessional. cc @digitalsista
Speaking as a software developer, tutor, and code reviewer: the software development industry does *not* have any kind of widespread concept of 'professional ethics'. That's where the problem starts. There's absolutely no common sense of responsibility.
This creates a lot of compounding issues: 1) Issues are treated as 'solved, never think about again'. 2) Because most developers are negligent, clients expect no expenses on security; which makes ethical developers non-competitive. 3) And so on.
Which unfortunately means that this is something that can't be solved on an individual level; it's a systemic issue and there are no incentives for individual developers to change it.
Great thread, and something I've been going on about for ages, but it's not disingenuous, it's cultural. Software as a field is so new and subjected to so much light and heat that the practitioners don't know it could be better.
There are other structural problems with bringing standards and good practice to software, the wild growth of the field combined with the lack of centralization or structure to the field, comparatively.
It would be as if most people who got into aviation did so by building a plane at home first. But despite this I do think software liability would go a long way to starting to get the incentives in line with creating good standards similar to medicine and aviation.
There’s also an issue of scale: if planes failed the way software does, the next time a single Boeing 777 crashes, every other Boeing 777 in the air would also lose power.
There are various places where the metaphor fails, but also doesn't. In a way that's exactly what we say with wannacry -- over and over again.
and honestly, i think the reason wannacry gets to keep exploiting SMB v1 and Boeings don't get to keep crashing is that we can take pictures of one, and see it clearly, and treat it as an event mentally.
There’s also the monoculture vs diversity aspect though: each airplane has its own set of pilots double-checking each other. Software is like a single godlike pilot flying all the world’s planes at the same time, but she sometimes gets sleepy.
i think that's a misfitting metaphor. pilots might match better with sysadmins/ops, who aren't ime the biggest fans of programmer bullshit.
Today a key subsystem failed because the third-party datasource API on which it depended simply disappeared, deactivated and delisted without warning. Whose fault was it—theirs for going down or ours for relying on it? The turtles are all wobbly, all the way down.
it literally doesn't matter. fault is the wrong frame, as the original thread made clear: responsibility is the right frame. Who deals with it and what recourse do they have against others who may have failed their own responsibilities in the chain.
Attribution is always a political choice, right now that choice is to attribute in a way that protects vendors from any liability or responsibility for what their software does in the world.
The fault was with the management that decided years ago not to care
no one in the chain cares. i know we love to beat up on the suits, but this is a toxic culture, and has been as long as i've been in it. the privilege to treat quality as esoteric and impossible is of the same piece as why it's so dominated by whiteness, sexism, etc.
bullshit is bullshit, and you see it expressed in myriad ways.
sorry i'm so ranty this has been many many years of my life and i'm so over it.
Anyway, this is the most recent piece I've written about it: emptywheel.net/2017/09/14/sof…
Software is not authored by a single programmer; it takes a village. The problems need management to be fixed first.
dev culture is a lot of the problem, and management alone can't fix that. the place it needs to get fixed first, or can be, is software liability. honestly, it's going to be the goddamned insurance companies if it's anyone.
liability for what though? FOSS is strictly non-liable, and strict liability sounds more like indentured servitude, which is worse if you work with non-geniuses.
As @allspaw points out, this could be as much good as bad. I personally think bad. Something that is under-discussed, and that we ought to talk about more, is how we move the culture forward toward safety. Lots of possibilities for action, some already happening.
Personally, I would love to see existing groups like the ACM begin to apply their code of ethics publicly... changing the fubar interview process and more remote jobs could help in allowing developers to say "no". We forget that a huge part of the field is invisible.
All these "enterprise" devs that still spend their days doing Java or Perl or COBOL at BoA, at most hardware manufacturers, at car manufacturers, at Samsung, etc. You never see them, but they also do not see us. The whole conference circuit: they do not know it. They are stuck in the 90s.
Giving them personal liability will do nothing, because they will not know about it. How would they know? A trial? Really? That will just not happen. They know nothing about the consequences of their acts: no one understands these systems. They are complex and "organically grown".
Maybe this says more about your career and the places you work as "incident investigator" than it says about the software engineering field?
You *can* have reliable software, and it can be developed in a professional and deeply risk averse manner. It costs about 100x as much and takes 10x as long.
But that doesn’t make for a great ivory tower monologue
Are you also taking into account the cost of having the US Elections hacked by Russia? Externalities are frequently ignored, but no less costly.
That's the cost of designing systems that will fail safely in the event of a hack. It might be more expensive than that, even, so for now the gold standard will continue to be paper.
@rantydave This is a thread specifically about voting machines.
@rantydave Doesn’t make @rantydave’s point any less valid. Even if it were deemed important enough to make it “the secure way”, the mere fact that other software is made at such a reduced price + speed almost guarantees someone will cut corners to be more economical
@rantydave My argument was that Facebook and Google, as corporations, should not be trusted to build voting machines. I fail to see any counter-argument in David's tweet.
@rantydave I think he's saying that other corporations in other industries could theoretically be trusted, which is a point irrelevant to my argument and thread.
@rantydave Ahhh thank you for clarifying. Yeah, let’s not give them the contracts for voting 😅
It's not wildly disingenuous, it's completely oblivious and uninformed. There's no need to even discuss it.
Robustness (robustness against random errors) is much, much easier to handle than security (robustness against targeted errors). Imagine that anyone could change gravity for each atom all over the world, then design an airplane for that.
You might think so, but it's just a matter of properly designing for the threat model and failure modes. Hiring a Red Team to attack the system at various stages of the process (including initial design documents) is one way to develop an appropriate threat model.
The real problem there is that the threat-defense model is organization-based, but the actual threat surface ranges over 50 billion nodes, some of which are obsolete smart light bulbs and XP machines.
Those are the potential attack vectors. If you design a system, you have opportunities at several stages to limit the attack surface: say, by not using massive general-purpose OS distributions as a base, and by not using public networks for communication.
That seems pretty restrictive for general software.
For word processors and games, sure, but there are sensitive applications like voting machines where it is the minimal cost of entry if you want a secure system. I'd say it applies to light bulbs, too.
From the xkcd is wrong link: " Security against human attack consists of the entire infrastructure outside the plane, such as TSA forcing us to take off our shoes, to trade restrictions to prevent the proliferation of Stinger missiles." 1/2
And the same thing in the OP: "Simplistic focus on the machine and loss of perspective on the bigger system & society is the hubris that keeps the technology industry trapped in the footgun cycle." So while we can prolly agree TSA is silly, the larger issue is the larger surface.
True, but when designing systems you define attack surfaces at multiple layers. This isn't just the machine, but also the operators and the administrators of the machine, and how all of them together interact with the outside world. We can do this, with discipline and patience.
This is a thread about voting machines, not general software
"Where's the detailed 200-page public report from Facebook" kind of threw me off. Sorry.
I had a reality check when I read somewhere that there was no way NASA would be interested in most tech companies, as they need things to work with little room for error, and could do without side challenges, like npm package management or github merge conflicts 😂
Many topics are covered in this thread, and incredulity is expected. However, as someone who has been “bridging” Systems Safety/Human Factors and modern software engineering for the past ~8 years, I can confidently say: it’s not as simple as you make it out to be.
Yes, the longevity and maturity of “software engineering” is part of this. Yes, it’s likely that differences in codes of ethics (possibly licensing, but I’m unsure of that) and ‘professionalism’ between aviation and software play a part. But: this is a simplistic comparison between fields.
Regulation plays a significant role here. Some positive in some directions (independent investigation organizations, for example) and some negative in other directions (reports that list bullshit such as “pilot error” or “loss of situational awareness” as causes).
*All* software has potential for unintended consequences, regardless of the domain. Airplanes, cars, social media, email...all of it. Those unintended consequences manifest sometimes as ‘vulnerabilities’ exploited by adversaries or bugs resulting in unavailability or...
...or software that works exactly as the developer intended but used differently by users, or many others. The same is true for other domains. Comparisons like these are not even apples and oranges, they’re apples and doorknobs.
It's not simple to successfully accomplish. It's simple to start trying.
Trying should include resisting the temptation to a) compare incompatible domains, and b) oversimplify complex adaptive systems in 280 characters. :)
Many major diffs, inc. ICAO and major accidents
I’d agree that he is drastically understating the threat that the information security community faces, but that makes it even more concerning that he is bang on the mark about the degree of seriousness with which the problem is approached.
And this is mostly because information security tends to be critically under-resourced compared to the level of risk.
And that is in large part because their employers aren’t being held to account for their negligence, and legislators aren’t being held to account for their failure to engage with the world we have been actually living in for the last thirty years...
In particular the Equifax and Facebook incidents stand out as opportunities to send a message, that were dramatically overlooked.
One thing I think the industry needs to do is get a lot more forthright about the damage being caused by these incidents, rather than the “no personal details were leaked” bullshit that often gets dragged out.
And the public, governments and (I’m guessing the financial sector) need to get far more confrontational about the way risk is being externalised to them.
"Degree of seriousness" is just another form of "lack of airmanship" found in aviation accident reports. If there are no breaches or they are decreasing, does that indicate a proper degree of seriousness? (no)
Calls for "better" standards of practice is a common reaction to all consequential accidents. It's easy to do, adds very little to dialogue about future improvement, and ignores the real "messy details" of actual work.
...non-linear systems with emergent properties. This is a really hard problem. We know enough about how hard software engineering is to know that voting machines are a threat to democracy. Secret ballot on paper, public scrutiny of the tally and counting process is all we need.
Having operated nuclear submarines for many years and more recently computer services, I get excited every time this debate comes up. I authored a few RCAs in the Navy, and read thousands of others. Almost all were attributed to "Human Error."
When I switched from submarines to software services, the differences were puzzling: why were there no operating procedures, periodic maintenance schedules, incident procedures, on-call rotations, checklists, standing orders, hydrostatic test, incident drills, etc
...some of this has changed over the last 15 years. A wiki serves as an evolving operating/incident/maintenance procedure on some teams. On-call rotations are ubiquitous. And yet these debates are often, to quote Crash Davis in Bull Durham, like a "martian talking to a fungo"
I've puzzled over these differences and have followed the work of Drs Allspaw and Cook with great interest as they create a new field. Yet the debate still explodes occasionally with a practitioner of traditional accident investigation saying, "do the RCAs, human error is real...
...MTTR, MTTB, MTTD, MTTC." My experience is that the traditional methodologies worked very well on submarines, but were much less successful in software services. For a while I attributed this to immature culture, lack of leadership, lack of accountability.
This thread already mentions two other differences, regulation and the maturity of "the field." My sense is there are three attributes of "traditional operations" that makes "traditional Problem Management" (RCA, Post Mortems, Continuous Improvement) work:
1. stability of architecture 2. horizontal scaling 3. absence of Moore's Law (related to #2)
An operator from WWII would be quite familiar with the "architecture" of a submarine: engine room in the back, control room, sonar, torpedo room, ballast tanks. While the flat-screen displays would surprise them, we have replicated the intuitive utility of the dial gauge on those.
Any software service that scales, by necessity, changes architecture. One could argue that at some super high level there is just a "front end" and a "back end", but under the hood there will be entirely new components every few years that dramatically change how operators interact with it.
2. Horizontal scaling: sure, everyone is going to say "of course I horizontally scale my service," but let's compare the count of submarines, airplanes, or automobiles to the count of web search services or cryptocurrencies. Replication of a platform creates a fleet of identical machines,
...each with a different Ops team. This creates a large 'n' for a central office to collect and compare incidents, thereby refining and converging procedures and designs. Read the introductory chapters of ITIL.
The beginning of the submarine reactor plant manual stated that "everything you will ever need to do to this machine is documented here. If you think you need to do something that is not documented, read it again. If you still don't find it, surface the ship and radio home"
In years of operations, I never encountered an exception. Name a software service wiki for which that is true. The scaling and rapid iteration of architecture is enabled in services by #3, Moore's Law.
If submarines were more like software services, you'd have to imagine a single submarine in the fleet that held a billion torpedoes, traveled at half the speed of light and fit in the palm of your hand.
The operators of that machine would likely be challenged to use traditional methods of accident investigation. I am excited how the new field generates methodologies that can be retrofitted back onto the traditional Ops disciplines. That has often happened in history. (EOM)
All this boils down to "if we don't know how the machine works, we can't understand how it failed, and if we don't take the time to understand how it might fail, we'll never understand how it works."
Rickover took uranium from the ground and put it to work in a submarine -- a feat of engineering much more difficult than anything most programmers will encounter. He was able to publish the book you read because he made choices and established rules for how the machine operated.
Along the way there were thousands of decisions made to eliminate risk and create the best system. There were experiments with multi-reactors and sodium-cooled models, which were abandoned. But all of it began and ended with safety, a mindset that nuclear accidents weren't OK.
There is no such thinking in most software development. But there is in some areas, and in those areas the kinds of things we're discussing routinely happen and the software is expected to work every time.
It is not that SW is some special, magical gift unlike anything else in our universe. It's that we can either have an attitude that errors are inevitable or an attitude that errors can and must be prevented. It's as simple as that.
Here's the reality, @www_ora_tion_ca, hospitals are literally killing their patients from medical errors and can't agree that this shouldn't happen. Each industry sees what is different about it and uses it as a shield such that ideas from outside are difficult to consider.
We have a perfect example here. Someone who worked in a very disciplined industry goes to a much less-disciplined one and what happens? Do they change the industry or does the industry change them?
If SW devs begin flight school, do they argue that complexity compels them to make errors during landing? Or do they focus their minds and talents to properly land the plane every time? And when they've mastered that and go back to work, their mindset changes once again.
It's not a matter of "can", it's a matter of will. Of want to. The same people coding in the morning and flying in the afternoon adopt very different mindsets, each adapting to their environment. Each seems rational by that standard, yet each is clearly a choice.
SW devs have been some of my most challenging student pilots during instruction; their tendency isn't to complain about complexity, the challenge is to prevent a distracting hyperfocus on a minor detail.
And if they do, will they be leaping on their own, or will they have been pushed from a comfortable resting place?
They're getting pushed by Bezos, Buffett, and Dimon.
While you may intend derision with "don't know how the machine works" that is in fact the starting point for these new approaches to system failure. Hard work, brilliant engineering and unparalleled adherence to high standards were the keys to Rickover's incredible achievement...
...resulting in a machine that for all practical purposes we fully understand, even in failure modes. But systems live on a spectrum of complexity from my toaster on one end to the global economy on the other. Cook et al have been pretty clear that when they say "complex"...
...they mean that it isn't fully understood by any single operator. Now you might say it's irresponsible, or lazy or indicative of a lack of character to build a machine that isn't fully understood, but in truth hard nosed entrepreneurs build them all the time...
...and people are better off for them. If you don't believe that complex systems (defined in this way) do or should exist, then you're expressing an orthogonal argument.
I think complexity is something that we can generally structure ourselves out of if we choose to do so. You've already seen Rickover do that by turning the atomic bomb into atomic power plants run by 20 year-old kids.
If you want to argue that sending and processing information through a set of pre-defined rules is more complex than that, OK; but deferring to someone who has spent more time on complexity science than I have, I'll give you @hvgard and his article: pni2.org/2014/12/comple…
And it is also "used" by totally untrained, ordinary, people from across the world to do everything from finding a job to arranging a coffee meeting to buying a house to learning about a nuclear submarine's power plant failure protocols.
A key source of complexity is that successful software gets built upon by other systems. Just look at how we rely on a Simple Mail Transfer Protocol (SMTP) that was invented by one person. Much software complexity comes from vertical integration of systems
Raymond Tomlinson, Who Put the @ Sign in Email, Is Dead at 74
The computer programmer chose the “at” sign to separate a user name from a destination address in his new messaging program.
nytimes.com
Human error is a constant of the universe and never a root cause. Root cause is always process, management, culture, etc.
The rule of thumb I like is: you’ve got the root cause if you have a (simple/understood/doable) course of action which clearly fixes the problem. Otherwise you’re still working with symptoms and your investigation is incomplete.
Nope. There is no single root cause of complex systems failure. It doesn't exist, it's not a thing, and it's not only a waste of time trying to find it, it's dangerous to assert confidence with. That this concept continues to survive is why we will continue to have accidents.
True that there is never only one root cause if you look hard enough, though I always have to keep investigations from stopping until we've found at least one that meets the definition.
Root Cause: The most basic cause (or causes) of an incident that management has control to fix (i.e. a process/procedure that is Missing, Incomplete or Not followed) and, when fixed, will prevent (or significantly reduce the likelihood of) additional problems of the same type.
(The actual definition is longer, I shortened it for the character limitation)
Can I just add a slight counterpoint? The failure of 9/11 was a social phenomenon, in that technology (big planes) was used for an unplanned purpose. These problems clearly exist for both software and hardware, and safety systems sometimes amount to nought. All systems fail.
Perhaps in some systems, especially social ones, we can never achieve 'fail-safe'? If that isn't an option, what are the other ones? Fail less often? Fail less catastrophically? Fail publicly? Fail too often and you get regulated? I really don't know.
My strong suggestion is to read the multiple sources of research on how these (and other) definitions are critically problematic. You could start here: kitchensoap.com/2012/02/10/eac… or cut to the chase and read Dekker's "Field Guide To Understanding Human Error" 3rd edition.
Kitchen Soap – Each necessary, but only jointly sufficient
I thought it might be worth digging in a bit deeper on something that I mentioned in the Advanced Postmortem Fu talk I gave at last year’s Velocity c
kitchensoap.com
While there is a hint at convenience, there should be a nod to “socially constructed”. The ideas of “will prevent” and “same type” are also problematic in *complex* systems, esp. wrt emergence. All safety specialists would benefit from reading about the philosophy of causation.
You can identify the start of the failure cascade, but is it even fair to claim a system is complex if it can fail with a singular root? Lack of preventative measures is just as much a causal issue.
As is inadequate monitoring! I see each causal chain as having a root, but there are many chains.
Root Cause: [...] (or causes) All the rest of the palaver is made irrelevant by this idiocy. Gods, give us fucking patience. When the second train carriage is derailed the root cause is the engine being derailed, not the first carriage being derailed.
Let's just tell the second carriage to "not follow" the first carriage.
"Ah, but the root cause of the second carriage being derailed was the damaged line that derailed the engine." Wrong. By that iterative logic, the root cause was the Earth coming into existence, or God always existing. Doh. Engineer your answers.
You haven't gotten to anything that management has control to fix yet. Why was the line damaged? Why didn't maintenance catch it?
In this example, 18 different causal chains were traced to their roots: tsb.gc.ca/eng/rapports-r…
And the causes weren't really engineering.
Yes, the train carriages weren't a very good example, just a bad approximation. Language syntax is a part of this problem, specifically here, the interpretation of root. As @allspaw says "...no single root cause of complex systems failure..."
Introducing "that management has control to fix" appears to prevent reductio ad absurdum but masks unpredictability in complex intersections. This artificial bound provides for KPI's at the cost of admitting the improbable making an appearance.
Attempting to allow for all improbables is simply a no-go. Compiler design has long recognised this with a state of "undefined".
The point is that KPI based thinking can instill unwarranted confidence in covering all the bases. It's a necessary evil, without which nobody would have a job, or the wrong people would have the job.
But it abrogates recognition that in complex systems, things will go wrong, unpredictably.
Applying the word "single" as part of a root cause is scary. Looking for the roots of many causal chains is a better goal.
Delayed response, work... We've come to believe that effort in mitigation of the results of failure in complex systems should be viewed as having higher "goodness" in employment KPI terms than extended root causes (note, plural) analysis.
This is not meant to imply better, it's a very soft concept, couched in social mores, not amenable to hard analysis, particularly in purely financial terms. Or more succinctly, "what does a human life cost?".
In case of dick-measuring reactions: we've been the buck-stops-here person for a national airline network and have consulted for other life-at-risk industry verticals.
So your search continues until you’ve found something that satisfies an arbitrarily created definition without basis in the realities of a complex world. Interesting.
It's not an arbitrary definition, there's a tremendous amount of research behind it.
Really not. Complexity science doesn't agree at all.
My reference texts seem to be in general agreement as to the definition of a root cause, and I've already spent 15 years arguing over specific word choices. It's a high-level definition, because every complex situation is different.
Research? Can you cite a paper that tested it against something else as a measure of reality?
Not playing that game with someone only bringing negativity here. Say what you believe to be true, don't just say that others are wrong.
Imagine if this is how science actually worked.
I'm sorry you seem to be confusing a bird website with sci-hub
My apologies, but differences of opinion = diversity. Negativity is something else. The concept of “root causes” is an oversimplified model of the world. What are culture, processes and management but the actions of people?
How about: “root cause” language helps people to point to situational factors they believe to be salient while system safety and resilience engineering language help people to elicit and combine all available information, and to seek out broader perspective as necessary.
Re-upping: like most tragedies, accidents and breaches play out on a “stage” that was constructed years in advance. System safety + resilience help illuminate both the script and the stage, not just an arbitrary fatal flaw (“root cause”). twitter.com/anoncept/statu…
A metaphor for people seeking rich(er) narratives: if accidents are tragedies, then on what stage are they played out? — and when and how was the stage constructed and set, by whom, and to what ends?
An especially important class of example: any time we see the same story unfold multiple times with different participants, it’s a good bet that there’s an important common environmental cause and that systems safety tools may help. After all: replacing the people didn’t.
I have given supporting references in this thread. I’m genuinely and earnestly interested in what research you’re referring to, since modern Safety Science, Human Factors and related fields have long since dismissed the concept of root cause in complex systems.
Sorry you got caught up in my frustration, John, my criticism was of Steven and Andrew's behaviour. I agree that the concept of a single root cause is bogus.
That said, my executive stakeholders demand root causes at the conclusion of my investigations. My assertions here are that the term is defined.
Even dismissed concepts still have definitions, and those definitions have been very useful to me in managing executive stakeholder expectations.
I've been successful in completing investigation reports with a dozen causal chains, where each chain gets traced to a root.
I can't finish a report with zero root causes, so I compromise by finishing the reports with several.
As a practitioner in industry, I don't have many of the luxuries that academics can bandy about.
Just today I had to work with senior executives who insisted that a root cause is what you get after asking "why" 5 times.
I (and @docgrose and @StevenShorrock) understand this dilemma in industry (we are all practitioner-researchers) and have been working to change this.
It can be done, albeit slowly. For a great example, see the US Forest Service’s Learning Review Guide as an attempt to bring systems thinking to accident investigation in wilderness firefighting: wildfirelessons.net/HigherLogic/Sy…
"behaviour"... I suggest that you rethink think that wrt science. Seriously.
Demonstrably mature comment, consistent with expectations.
No need for name calling. My point was that the "behaviour" you referred to was asking for evidence and making the point that root cause is a debunked concept in safety science, complexity science, and the philosophy of causation. That's all.
In European ATM this is now understood, with the help of David Woods, Erik Hollnagel, Richard Cook, etc., who have explained why this is the case to audiences of aviation safety specialists. It may seem a small language issue, but it has significant ramifications wrt analysis, e.g. RCA in healthcare.
People also increasingly understand the problem with the ideas of linearity, fishbones, and causal chains. There are still reversions to simple and mechanistic system language, but understanding is growing, and finally folk are talking about patterns, interactions, influences, emergence...
Admittedly this process also involves some simplifications in information presentation in order to enhance understanding (eg. skybrary.aero/index.php/Tool…) since the research on safety science, complexity and philosophy of causation is not accessible to many.
Sorry. But Really not.
It's disappointing when experts grumpily fling poop instead of contributing positively to a discussion. If we will stipulate that you're smart, can you be constructive?
Human error can rarely be the "root cause" of an incident in isolated cases. Everything else is systemic.
It's never a root cause. Ever. By definition.
Was waiting for you to spot this.
spot-on thread, Rob. (I'm a professional software developer).
The cost of developing aviation software vs the cost of developing mundane business software is very different, because the risks are very different. Market forces clearly say that non-safety IT accidents are acceptable.
This is a very interesting thread, but I have to disagree on the simple matter of scope. The standard for SSL and its implementers do follow an open process very much like what you're describing.
But to do so for all of software engineering would be like the aviation industry setting standards for everything from a tricycle to an attack helicopter. There's just too much variation in the types of software being produced for a single set of best practices to work.
That's why most companies do it internally and have their own best practices to prevent past mistakes. But in a capitalist economy, they'll consider those trade secrets as long as they can (<- one place I totally agree with you; major failure analysis should be public).
If you tried to create a "programming checklist" even for a smaller-scope, such as writing netcode, there's still so much variation and innovation that half the items would be checked off as N/A. That's not a useful checklist.
What makes programmers unique is that we won't even attempt to make the checklist and learn from which parts are useful or not. We just assume that we know best, it's too complicated and hard for other fields to bother trying to help us organize, so we just keep buttonmashing.
A former boss once told me that he wished everyone did their job like a fighter pilot -- because if a fighter pilot @#$%s up, they *die.*
Safety and complexity related to software operating machinery in the physical world are very different to safety and complexity operating in a human design problem space. A lot of abstract thought is needed to have meaningful discussions on risk.
The way I see it, most software doesn't actually need to be that error-free. Where we've recognized that it does, it actually is (airplanes, etc). The issue is that we haven't recognized that some categories, including voting software (!), need to be error-free.
Airplane software is *not* error-free. I've had flight computers crash on me mid-flight several times. But, I'm alive and well -- because the software is designed to be a safely fallible component.
I was using the term error-free holistically, as in "software on planes never crashes so badly as to crash the plane" which isn't true of, say, many word processors. (If there's a better word that captures that concept, lmk so I can improve my communication)
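The "safely fallible component" idea is worth making concrete. Below is a minimal fail-safe sketch in C; the names and values (compute_pitch_command, the 0.1 gain, the NaN sensor reading) are hypothetical, not any real avionics code. The point is only the structure: the fancy computation is allowed to fail, and the caller degrades to a known-safe default instead of taking the whole system down.

/* Minimal fail-safe sketch -- hypothetical names, not real avionics code.
 * The component may fail, but the system holds a known-safe output
 * instead of failing dangerously. */
#include <math.h>
#include <stdio.h>

/* Returns 0 on success, -1 when the "fancy" path fails (e.g. a bad sensor). */
static int compute_pitch_command(double sensor_value, double *out) {
    if (isnan(sensor_value))
        return -1;                     /* component failure */
    *out = sensor_value * 0.1;         /* stand-in for the real computation */
    return 0;
}

int main(void) {
    const double safe_default = 0.0;   /* neutral, known-safe command */
    double inputs[] = { 1.0, 2.0, NAN, 3.0 };   /* third reading is garbage */
    for (int i = 0; i < 4; i++) {
        double cmd;
        if (compute_pitch_command(inputs[i], &cmd) != 0) {
            cmd = safe_default;        /* fail SAFE and keep going */
            fprintf(stderr, "component failed on input %d, holding safe default\n", i);
        }
        printf("command %d: %.2f\n", i, cmd);
    }
    return 0;
}

The contrast with a word processor that takes your document down with it is exactly the difference being described: same class of component failure, very different system-level outcome.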
Agree. Good point on its own. However I think that: - most developers/orgs overestimate (not deliberately) the quality & completeness of their software. - the realization that “software doesn’t actually need to be that error-free” is NOT the reason why software quality is poor.
I'd say it's not the developers but the customers who are making that determination, and setting up appropriate incentives
Customers are definitely complicit but not equally culpable. Customers don’t (can’t) appreciate the costs of developing software. Developers don’t either and so overpromise & underdeliver. It’s a vicious cycle that results in mediocre (at best) software & missed expectations.
I guess I just see it differently; I don't see it as necessarily a bad thing. Perfection isn't always the goal, and I think software is an example of an area where good enough is good enough.
The chess document has been written. The real issue is that none of the standards stakeholders can agree on, or want, a decent standard. Try writing a standard when the earthquakes and meteors from outer space are the standard's authors.
An old ('94) article about this: "The Professional Responsibilities of Software Engineers" by David Parnas.