
The Case Against Python3… /me puts on fire suit and grits teeth.
196 replies and sub-replies as of Mar 22 2017

py2 example under "Statically Typed Strings": should read y = u"hello", no? Otherwise type(y) == type(x). No __future__ unicode_literals import.
I don't follow the Turing complete argument. Is that tongue-in-cheek?
Did you read the note?
I do not understand the note. Turing completeness is an established concept and is about what is *theoretically* possible.
Weird, Python 3 is Turing complete? So that means the VM could run Python 2 code? Why do they say it can't?
Yeah, I just wasn't putting 2+2 together for some reason. Carry on!
Extreme sarcasm. But shocked at that 30% claim, had no idea.
Yep, there have been about two studies that put it around 30%. Time for a new plan.
On the flip side...almost everyone is on post-schism Ruby. The transition to "new Ruby" has been comparatively smooth.
I remember berating the Rubyists about Unicode in like 2005.
Python 2 had unicode support already
That "berating" is why Python fucked it up. Unicode is an image format, not the god of all text.
that has been exactly my feeling when reading through the Unicode specs, thank you for putting words to it.
FWIW, I think having Unicode & UTF-8 that Just Work is table stakes for any serious language.
Agree fully. Py3's implementation feels like punishment for not knowing this, which makes it just as broken.
this is misdirection. The question is how to transition. Ruby did it right. Python did it wrong.
And, glaringly obviously wrong, but if you tell them that they freak out and dig their heels in.
in retrospect I wish m17n had been formalized a bit more. It is hard to reason with at times…
I agree that the berating wasn't productive. Thankful the Ruby core team is thoughtful.
Gentle berating :) At RubyConf in 2005(6?) I evangelized & gave Matz a preprint of Unicode 4.
That can't be right. No modern language has strings that Just Work with Unicode astral planes.
Pick your favorite language and I guarantee you it gives the wrong str length for astral-plane code points
>>> len('💩') == 1 in Python 3. Not all languages have broken UTF-16 "string" types.
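That claim is easy to check; a quick sketch (Python 3 assumed) of why code-point strings differ from UTF-16 code-unit strings:

```python
# Python 3 strings are sequences of code points, so an astral-plane
# character such as U+1F4A9 PILE OF POO counts as one element.
s = '\U0001F4A9'
print(len(s))                            # 1

# UTF-16 string types (Java, JavaScript) count code *units* instead,
# and U+1F4A9 needs a surrogate pair, so they report 2:
print(len(s.encode('utf-16-le')) // 2)   # 2
```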
Python 3 has better Unicode support! But still kinda broken.
only kinda, though. :) thankfully unicodedata.normalize() is in the stdlib
No standard library in any language handles combining marks and grapheme clusters correctly.
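A small illustration of that stdlib gap in Python 3; the 'g' + combining diaeresis example is mine, chosen because Unicode has no precomposed form for it:

```python
import unicodedata

# One user-perceived character (a grapheme cluster), two code points:
s = 'g\u0308'                    # 'g' + U+0308 COMBINING DIAERESIS
print(len(s))                    # 2

# NFC cannot collapse it, since no precomposed code point exists:
print(len(unicodedata.normalize('NFC', s)))   # still 2

# Counting grapheme clusters (UAX #29 segmentation) needs a
# third-party library; the stdlib stops at normalization.
```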
Hey, that's new to me. @timbray I guess Apple just picked a default normalization form? How bad could it be? 😬
In conclusion: Zed's right (as always). Treat Unicode strings as opaque byte streams… 𝘫𝘶𝘴𝘵 𝘭𝘪𝘬𝘦 𝘪𝘮𝘢𝘨𝘦𝘴.
So wrong. There are 3 things it’s reasonable to do with a string: 1) ask how many bytes it takes to store it.
Ask how many bytes to store it? In which encoding? You know, how JPEG, GIF, and PNG all have different "bytes to store".
you shouldn't have to care about encoding (which should however be UTF-8). But "café".bytesize or equiv should work.
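`bytesize` is the Ruby spelling; a rough Python 3 equivalent shows why "bytes to store" is meaningless until you name an encoding (the encodings below are just my examples):

```python
s = 'caf\xE9'                      # "café", precomposed é
print(len(s))                      # 4 code points
print(len(s.encode('utf-8')))      # 5 bytes: 'é' takes two
print(len(s.encode('utf-16-le')))  # 8 bytes
print(len(s.encode('latin-1')))    # 4 bytes
```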
Well the Python folks think you should have to care about the encoding very very very much.
FYI, you no longer need to use ". " to reply publicly on Twitter. Instead, reply normally, then retweet your own reply.
2) Get a character iterator.
3) Ask what 2-D space it occupies in some particular display context.
#1 and #3 are what we do with image formats, which is why Zed's right. #2 doesn't work in any language you or I use.
Sadly, no. Try it with "Zalgo" text. Go returns code points, not symbols.
Show me code. We can test Ruby.
Here's a nice blog post with code examples.… Note 'ma\xF1ana' != 'man\u0303ana'
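That mañana comparison is runnable in Python 3; normalization (NFC here) is what makes the two spellings compare equal:

```python
import unicodedata

a = 'ma\xF1ana'      # precomposed: U+00F1 LATIN SMALL LETTER N WITH TILDE
b = 'man\u0303ana'   # decomposed: 'n' + U+0303 COMBINING TILDE

print(a == b)        # False: same text, different code point sequences
nfc = unicodedata.normalize
print(nfc('NFC', a) == nfc('NFC', b))   # True after normalizing both
```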
Haha, I blogged on Java’s sins (at excessive length)… in 2003:…
I don't find these articles interesting mostly because if there is a standard system it should just work?
They're bug reports. But the bugs are in specifications. We can't just "fix" JavaScript w/out breaking stuff
And the origin of this thread was Python killing itself with v3 just to get somewhat better Unicode support
Ewww, that’s appalling. Java is also broken although not quite so egregiously. Still gonna blog this…
Code points (except surrogates) are the only safe/correct thing to return. Need to blog this, #2long4Twitter
Agree, but also need to be able to iterate over grapheme clusters.
With, I suppose, a normalization-form enum as an argument?
Every major lang has some kind of lib for grapheme clusters (but none have it in the standard library).
4/4) Characters are not units of storage, nor of input, nor of display. Deal with it.
Once you accept that, you don’t need a Char type anymore, your char iterator can just return smaller strings. Perl was right.
Which language has a correct character iterator that works with astral plane/grapheme clusters? (Nothing you or I use.)
We were arguing whether any language's standard lib "just works" for Unicode. I'd say Rust's doc concedes that it doesn't.
Pointer to that Rust doc?
Iterating over Unicode scalars is an understandable thing to do, but the doc has to warn you about its behavior.
I don't know what you mean with "just works". There's different use cases – depending on what you want you need to choose.
We only iterate over code points because it's easy to implement. It's a "worse is better" approach…
We started with this tweet. "Worse is better" is not "Just Works"
+@thobe +@zedshaw +@headius FWIW, I think having Unicode & UTF-8 that Just Work is table stakes for any serious language.
It wasn't worth having Python commit community suicide to get better-but-still-broken Unicode support. @zedshaw
I disagree. There is no "use case" for iterating over "n" and U+0303 COMBINING TILDE as separate entities.
Unless you need to match the sender’s hash of the contents. Or use a query interface with its own opinion on normalization…
We agree that hashes should use opaque bytes, not code points, scalars, or clusters.
So escaping HTML using numeric character references is not a use case?:…
Iterating over code points isn't always useless, but it's always better to use either bytes or grapheme clusters instead.
e.g. If converting an input UTF-8 string to ASCII via HTML-encoding, better to iterate over GCs and *output* code points.
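One way to sketch that conversion in Python 3: the stdlib's `xmlcharrefreplace` error handler emits one numeric reference per code point, which demonstrates exactly the precomposed-vs-decomposed pitfall being debated (the sample strings are mine):

```python
def html_ascii(s):
    # Escape non-ASCII characters as HTML numeric character references.
    return s.encode('ascii', 'xmlcharrefreplace').decode('ascii')

print(html_ascii('ma\xF1ana'))     # ma&#241;ana
print(html_ascii('man\u0303ana'))  # man&#771;ana -- the reference
                                   # splits the grapheme cluster in two
```

Normalizing to NFC first (or iterating per grapheme cluster) avoids the split reference in the second case.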
So how would you output the code points of a GC? By iterating over its code points maybe?
Just out of curiosity, why that conversion? Most things that process HTML are UTF-8-savvy. In fact, most things generally.
I dunno, it was @niborst's example. (Or did I misinterpret it?)
So escaping HTML using numeric character references is not a use case?:…
There's also heaps of simple parsing use cases where it's useful. (With UTF-8 you can also do some of those on bytes.)
For any use case, bytes or grapheme clusters are better. Code points are only useful when CPs == GCs, or CPs == bytes.
Code points == grapheme clusters when there are no combining marks etc. Then CPs are useful because they're GCs.
(Well, except perhaps: you'd use code points to implement another, better, Unicode library)
yes, I'm not debating the importance of good Unicode support, just noting the effort it puts into encoding visual glyphs. ffi anyone?
U+FB03 is completely optional if you’ve a good font with support for automatic ligatures. Only there for backwds compat. /cc @timbray
?!?! BS. Unicode specifies no glyphs. The ones in the charts are examples. Glyphs live in fonts.
carefully choosing ones words AND fitting them in 140 chars. You are right, of course.
just because it contains fluff doesn't mean the crunchy bits aren't useful (?)
Visual glyphs are the one thing that Unicode explicitly does not care about. The weird stuff is just for compatibility. @timbray
I dunno, Ruby did pretty well the right thing, and still left room for non-Unicode encodings. No regrets.
It just wasn't a lot of fun to port it.
I will admit having the flexibility of multiple runtime encodings has made many things easier for Ruby programmers.
It does suck for FFI at times though
unless you are github then you make it all ascii-8bit
Only easier for people doing weird corner cases; but those happen to include Matz’s personal ancient-scriptures sideline.
Python screwed up with unicode (overestimated the importance), but people cope
Many of the cases you call out will just work in Ruby, even with wildly different encodings.
I have been impressed that Ruby managed to blunt most string issues by having extensive encoding negotiation logic.
Yep, Ruby did this way better. Also transitioning to new Ruby worked better.
Tough to disagree. When using python 3 I find myself desperately .encode()ing things to fix random blowups. The struggle is real.
If you do that it's likely your py2 code has the same issues. No exception doesn't mean the code is correct. Encoding is worth understanding
yeah, my experience is that these encoding issues are time bombs waiting to go off, py3 forces you to do it properly
another case of "explicit is better than implicit" IMO, which is a core part of Python's philosophy
So important you have to smash everyone in the face with it over and over? Oh that's right, no magic (usability) allowed.
As opposed to silently ignoring mistakes? Sure it is.
Strawman. I didn't say "silently ignore mistakes", I said the Py3 string implementation sacrifices usability for punishment.
And I've got a stash of conference tags with my name printed in uniquely incorrect ways to prove that more smashing is required every day.
Also in Go & Rust, strings are Unicode. Bytes are just a different thing, as are ints. PHP mixes them all up, of course.
Why do you want to explicitly decode a string to an int? It prevents mistakes! Same for strings vs. bytes.
The view that strings are 'just' bytes is inherently US-centric, in my opinion.
Py2 str is easier, but Py3 makes it easy to write working software. Py2 programs are often subtly broken. It's the worst!
The problem is the belief that this is because Py3 throws up hard to read exceptions at every turn. It can also be *usable*.
Can you elaborate? b'' + '' gives me a backtrace and "can't concat bytes to str" is clear to me. Beginners hardly use bytes anyway.
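For reference, the failure mode under discussion, with the explicit fix at the boundary (Python 3; the request-line string is just an example):

```python
# Python 3 refuses to mix bytes and str implicitly:
try:
    b'GET ' + '/index.html'
except TypeError as exc:
    print('TypeError:', exc)

# The fix is an explicit encode at the bytes/str boundary:
line = b'GET ' + '/index.html'.encode('utf-8')
print(line)    # b'GET /index.html'
```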
Which variables?
That would be nice, I agree. But py2 doesn't have that either for its typing exceptions. Clang/rust/ghc are great for it though!
So you're saying that other languages do this better? Sounds like people might just go use those other languages.
In my opinion, Python should be insanely proud that it lets people write programs that work. Definitely more so in Py3.
Zed, I usually appreciate your writing, but that is... not up to your usual standard of quality. "Not Turing-complete"? Really?
Did you read the note?
Yeah. It left me more confused why you framed the issue that way in the first place. :-/
So, either they broke Python 3 and made it not-TC, or they're lying and taking advantage of people.
B/c they lie to people claiming that everyone has to manually convert code b/c it's impossible to run both.
I think there's a legit criticism buried in there, but phrasing it that way obscures your meaning.
And I think that "how" should be "have"
Probably, but I have a raging fever right now so will just fix it tomorrow if I don't die first. Thanks!
that Turing incompleteness is astounding. Seems to fly in the face of math and professionalism.
Oh Python 3 is totally Turing Complete, but the people working on it still think you can't run 2 and 3, or translate 2 to 3.
from what you said about 2to3 as well seems like they’re deliberately misinforming ppl and crippling tools. 15% success w/ 2to3👎
85% success, 15% failure, but yes it is weird it works so terribly.
sorry, read that too fast. Still, both languages are under their roof. I’m guessing Guido is not in charge of 3?
No idea, but yes. It's not like they're required to run Python 2 on Brainfuck or use a VM they don't control.
I don't know if you caught this, but you've co-opted "Turing Complete" to mean something it does not.
it would be great if you edited the post to fix misinformation; I have no problem with your points against Python 3.
Are you working on How to Win Friends the Hard Way?
If you read the fanmail I get you'd realize I have the exact right kind of friends.
you're just trolling computer scientists at this point. Py3 is obviously Turing complete. Doesn't take much to make it so.
Correction: You don't need to 'make' Python 3 Turing complete. It is Turing complete.
It is? Then why do they claim it's impossible to run Python 2 under the Python 3 VM?
It's perfectly possible with a translation layer, but interop with py3 would be painful, akin to FFI. That's why nobody makes it.
So then not TC right? If you can't compile Py2 code to Py3 byte code and design that to work then they broke it.
TC has nothing to do with this. It's not as if you can easily interop between C# and Haskell, but they're both TC.
I actually said that already, so it seems you need to RTFA.
You repeatedly write that py3 is not Turing complete. It's even one of your section titles. Good troll, but you're actively misleading.
Did you read the note?
The "lol, joking, but no really, Python developers say it's not Turing complete" note? I don't think it explains anything. Still misleading.
Sorry, your sarcasm was genuinely lost on me. I get it now.
LOL, no problem. But, keep in mind, many Python project members actually believe this is impossible. YMMV.
You never say in your article that Python 3 is actually Turing complete. Could definitely use more sarcasm markers.
But then how would I spot the people who can't read?
Beginners don't understand all details; they'd think you're making a serious point about Py3 not being an actual prog language :)
typo: "Python 3 comes with a tool called 2to3which is supposed to take Python 3 code and translate it to Python 2 code." See it?
Thanks. Fixed it.
Does anyone actually use it in production yet?
Yes we do. Actually we have both 2 and 3 running on prod.
About 30% apparently, but I think that's mostly new entrants to the language as experienced coders will just jump ship.
huh, that many. Oh, out of curiosity - know of any good expect/tcl books out there?
Expect? I think there's 1 book on it. One ancient O'Reilly book.
Hrm. Go figure. Just like trying to find UUCP books.
I had that book, and sent it away. Rewriting it to use ssh looked useful.
No. No one I know certainly.
Many do; most new projects do. It's an excellent language, and Python 3 fixes most of the issues in Python 2.
He's also comically wrong on a lot of parts, like Turing completeness. Makefiles are Turing complete, it's a low bar.
Wait, Python 3 is Turing complete? Then why do they claim it's impossible to run Python 2?
It's as Turing complete as a C-based language can get (if you ignore the finite space/type requirements).
I could run Py3 with a CMake backend, but just because it hasn't been done **yet** doesn't mean it **couldn't** be done.
Uhhhhh isn't CMake written in Python?
No, it's C/C++. You're confusing it with SCons or Waf, which are build systems written in Python.
Ah yes, SCons. Oh man what a piece of crap.
If you ignore that, Py2 is just as Turing complete as Py3 is. Someone could write a tool to execute Py2 code from Py3.
Really? But, Python people say that's impossible. Keep in mind, "impossible" is the word used.
If they meant mathematically, they're wrong. If they meant it's a lot of work, then they just don't aim to support it.
And the scary part is nobody sat to think it through until I made a joke about Py3 not being TC.
But even if the Turing complete part is a joke, I still have a few more issues with the blog post.
Learning Python the hard way, I was taught to find creative ways to make and break programs.
A lot of what got me in early programs was code that silently worked, but failed because it made no sense.
Unicode + string concatenation, sorting strings with ints, many things that worked normally but failed when I scaled up.
It all failed immediately when I tried to run it in Python3, which is the strength of a strongly-typed language.
It might take 5 more minutes to learn, but when finding creative ways to break programs, shouldn't unicode/bytes be crucial?
Yes, but it can be crucial *and* easy to use. Python has this habit of taking important things and making them abusive.
But isn't explicitly failing when you do dangerous things, such as operations on different types, easy to use?
It's a matter of degrees. Adding bytes+strings is not exception worthy. Making the exception lack all var names is abusive.
Then when I point out TC they jump to ABI. When I point out F#/C#, JRuby/Java, they claim classes.
I have literally sat there and had numerous people claim it's mathematically impossible. Not shitting you.
Then they're simply wrong. I'm not doubting it would be a copious amount of work, the low-level interface has changed a lot.
You're almost there: So who changed this low level interface? Who was it that made manual migration necessary?
Python core developers. Python 2, as it's gotten larger and larger, has changed dramatically. Old classes, exceptions.
If you have a language with many exceptions to the core rule, is it really beginner friendly?
It would be a shame if you added a fact to this. This is arrogant drivel stated by someone with no knowledge on the subject.
Your book taught me to program and I still use python 2 for some scripting. But I've switched to Go coz python3 sucks so bad.
Yep, that's been my feeling too. Watching people shoot the messenger when I try to tell them that is more proof.
You could have just said "upgrading is too hard, no one cares, and Unicode is ugly" and dropped the sarcasm. It hurts your point.
Thanks for the writing advice.
(Disclaimer: I'm a PHP guy still very new to Python, but I ran the GoPHP5 effort back in 2007.)
I unfortunately have to agree. And it's a shame: such a good language, but now looking elsewhere.
Python sucks. I learned 2 from @zedshaw then realized it was a slow moving trainwreck w/3. Then got a job writing C#. Go is good too
Thx for taking the time and explaining this, it took a big pressure off my shoulders. It was too confusing.
When Python 3 defeated Python 2, @gvanrossum missed the opportunity to cast del into the fires of Mount Doom.
Nice writeup. There's a typo at "disconnect the particulars of any langauge's syntax from the execution requirements": langauge
does "I'm ready to move on" mean... *gasp* no more rants on Python 3 from you?
Probably not. I mean there's always assholes like you around right?
I see nothing in that post that would justify such a rude reply. You seem to suggest that Py3k users are assholes? Is this correct?