A tale from the @Tailscale build system:
@apenwarr: we could use git smudge for this
me: that's not a thing
google: it's a thing
@dave_universetf: when in doubt, use more git features
@bradfitz: there are a few we're not using yet!
@apenwarr: *apologizes in Canadian*
72 replies and sub-replies as of Feb 22 2020

You rolled your own build system? Is it all bash and make? 😏 or is this a shiny new thing y’all built?
Redo, so kinda all bash? 😜
Yes it's all bash: redo.readthedocs.io/en/latest/ That bit is easy and works well. It's the git submodules that get complicated.
Why is it that wherever there’s use of git submodules, there’s complexity in these things? 😏
The problems are caused by having multiple lockstep git repos in the first place (in this case our corp and opensource repos, and wireguard’s repo). git submodules are just one of the weak and buggy tools for supporting that. So are go modules.
The weakness (and/or rarity) of submodules cascades into other tools. Consider: if we checked in our wireguard-go staging repository into our public tailscale repository with subtree, then go.mod replace directives would "just work" for us.
(That said, it does feel like there's a missing go.mod feature in here somewhere. Not sure what it is yet. Generally multi-module development feels awkward.)
We would have done that by removing the sub go.mod files, and we wouldn’t be able to rebase wireguard at all. It would be a hard fork of wireguard-go. Similarly if we just tossed opensource tarballs over the wall periodically these problems would go away, but we’d have new ones.
Replacing rebasing with manually diffing trees to create patches for upstream/downstream is not the same as tossing tarballs; treating them alike is a false equivalence. Real OSS projects do manual diffs all the time. In fact, I suspect it's more common than submodules. Lockstep multi-repository is rare.
It's not equivalence, it's just the set of available options. This whole thing is not really about the tools; nobody has ever adequately solved this class of problems, anywhere. It would make a good topic for a very ranty (and very long) blog post.
The subtree option is just a monorepo. I know how monorepos work with open source: devs import a project once, patch it sometimes, and it never sees the outside world again. One 20%er writes a tool to occasionally extract tarballs from third_party and throw them over the wall.
OS package managers are actually a variant of this. So is the whole "monolithic app" to "microservices" spectrum. It's all dependency management and version skew, all different ways, each unsatisfying. One can totally see how the Go team concluded static linking is a good idea.
In all the conversations about monorepos I've had with Googlers (and I'm mostly sold on the idea), I've never had a good apology for this part. “Upgraded only when urgently needed for a project likely to result in promotion” is how I'd describe it 😂
And by “upgraded” I mean “imported again at a new version, under a different path”. Do _you_ want to make sure you didn't break _every existing piece of code_ that uses the old version of libfoo?
Exactly. But at least the monorepo+tests give you a way to actually detect all the stuff you're about to break. When you remove the monorepo, you remove the problems of lockstep, but this creates other problems (like security holes not getting patched everywhere).
I have this conversation once a week here. Most of the arguments are, “we could move faster if we were just in our own repo.” IMO, the best argument for 1000 individual repos with semver is that people are now used to it, because that's how open source code works.
The problem with how open source code works is it assumes stable-ish APIs, where breaking changes are uncommon. I want to write 3-4 cross-repo API breaking changes every day. The tools aren't there. Avery is trying to make some and I get to be the guinea pig. (Hence grumbling.)
Yeah. The conclusion I've reached is that while we're all capable of full version discipline, backporting fixes, and defined version support windows, it's a lot of work, and monorepos let you avoid it. Our @CashApp folks are open-sourcing much more, so have “polyrepos”…
They have polyrepo scripts which understand the topology, and can move sub-projects in and out of active development (vs import as a library), by tweaking IDE config, module overrides, etc. It's mostly convenience rather than rigor, but seems to work for them…
I suspect for a @Tailscale-sized team, having a master repo that includes such tools, and also acts as a global version sequence by stamping in versions of all sub-projects might work, despite being completely unsound (because you might merge sub-project PRs in interleaved order)
Hello and Welcome, I will be the commentator for your fall down the rabbit hole today. Please note that the sharp jagged rocks at the bottom are composed of git submodules. Good luck on the mid-air sewing of your parachute, we're all cheering for you!
What you're talking about is basically our infernal combination of git submodules (global version sequence) and go modules (works with open source). It's not all roses. It is nominally theoretically sound, which I guess is more than I would have claimed yesterday.
If I merge a meta-PR across sub-repos A,B,C and you merge one across B,C,D and my B PR merges ahead of yours, but my C PR merges after, then the global sequence needs to know to use max(the versions I asked for, the versions you asked for), right?
With a few developers (and thus relatively rare interleaves), that seems fine. With enough sub-repos and/or enough developers, every PR will interleave and you won't be able to find the max faster than new (interleaving) PRs move it again. That's just a guess though :-)
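A minimal sketch of the max() rule from the question above, under assumptions the thread does not state: each sub-repo's version is treated as a monotonically increasing integer, and the repo names and numbers are made up for illustration. It only shows that taking the per-repo max gives the same pins whichever meta-PR lands first; it does nothing about the scaling worry.

```python
# Hypothetical sketch: versions are plain integers that only ever increase
# within one sub-repo. Real setups would need their own ordering.

def merge_pins(current, requested):
    """Return new pins: for each sub-repo, the max of current and requested."""
    merged = dict(current)
    for repo, version in requested.items():
        merged[repo] = max(merged.get(repo, version), version)
    return merged


start = {"A": 1, "B": 1, "C": 1, "D": 1}

# My meta-PR touches A, B, C; yours touches B, C, D.
# My B change landed before yours (lower number), my C change landed after.
mine = {"A": 2, "B": 2, "C": 3}
yours = {"B": 3, "C": 2, "D": 2}

mine_then_yours = merge_pins(merge_pins(start, mine), yours)
yours_then_mine = merge_pins(merge_pins(start, yours), mine)

# Both orders settle on the same global pins because max is commutative.
assert mine_then_yours == yours_then_mine
print(mine_then_yours)
```

Because max is commutative and associative, the order of the two merges does not matter here. Whether the resulting combination of versions actually builds is a separate question that this sketch does not address.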
The typescript/react, Swift, and Kotlin repos are going to make it even more fun…
Actually I'm not too worried about that. I'm pretty satisfied at redo's ability to glue together very disjoint platform tools. And if we can make all this submodule crap work reliably once, specific app platforms shouldn't make it any worse. At the bottom it's just files.
Oh, are you using git smudge for the go module rewrites for local development? I guess that would work…
Oh no, you guessed it. That means your git brain worms have made far more progress than I anticipated. My condolences.
The @Tailscale interview process is just going to be Avery explaining the build system to you for four hours, and then if you can explain it back, you're in
Narrator: nobody was in
A bit more reading, and figuring out why perfectly normal shell functions result in syntax errors, and I may finally be hireable!
(Though in redo's defense, now that I've internalized the few rules it has, I can reason about what's happening just fine. Can't say that about most build systems I've used.)
Starting out my career as a consultant, 90% of the pain we caused ourselves was reading in the docs that a piece of software had a capability, and then using it.
Eventually I learned the difference between the beaten path and the path that beats you.
I know what you mean. But if you can believe it, all the other non-git-smudge workaround options in this case were even more horrible. "Forget git submodules" and "forget go modules" were really tempting (and simplifying) but not willing to sacrifice that much quite yet.
Have you looked at copybara? It seems to be pretty decent nowadays for doing more than just "drop a tarball".
Have you considered git-subrepo? Or this: github.com/grailbio/grit? Android’s repo tool (source.android.com/setup/develop)? Jiri (fuchsia.googlesource.com/jiri/)? I’m sure y’all know all that “Googly” stuff. It’s just that I’d avoid submodules at all cost :)
I've tried most of them. They each have tradeoffs, as does the "solution" we currently use. You might as well add my own tools, git-subtree and git-subtrac, to the list. This will continue to suck until someone spends time to really fix all the git bugs. Not me, not today.
Yeah, looks like life is all about trade offs ¯\_(ツ)_/¯
Shall there be an awesome-repo-tooling list with all that stuff somewhere on GitHub or something like that? :)
Oh, if the name is going to have "awesome" in it you'll be needing to exclude my tools after all. :)
I thought that “awesome” has to be in quotes, right after publishing the tweet 😂
I am unreasonably happy that y'all are using redo. I imagine “because Avery” is a common reason for lots of things at @Tailscale… 😂
To be clear, our multi-platform corp repo (which builds deb/rpm, Swift binaries, Windows installers, etc) uses redo. Our open source go tools (for whatever platform) intentionally avoid the need for any non-go build system. Oddly *that* separation works pretty well.
Yeah, we're using vanilla Go for our Go monorepo, and it works well. Just like git, it's too good of a local maximum to be replaced by anything. Although I could see us moving to gazelle+bazel for build sharding, as long as Go devs don't need to wrangle BUILD files.
Tell me more :) I’ve never heard about this redo thing. Will take a look. But at the moment I’m sooo frustrated by the existing build systems 🤦‍♂️ Seems like only Bazel got it right, but I still can’t convince people to run scripts to generate BUILD files.
Update: I used git smudge to solve my problem. Now I have two problems: the one in the worktree and the one in the repo.
I love it when twitter is your corporate slack.
I thought about checking in with press@ before tweeting but we haven't set up that alias yet.
First gotta write that new mailserver.
What about the state of the art HVAC control plane
yeah we all win :)
How many people I like are working there???
This gives me an idea for the name of our upcoming corp blog: Tale Scale. Or Scale Tales.
Six Apart had Foo Bar. Google had TGIF. I proposed we have Tail Ale.
Maybe @ESBAle can make you a company beer
I was angry that I didn't get my invite and almost registered stalekale[dot]com to insult. @apenwarr saved me from the trouble :D
Updating the title right now.
/me looks up `git smudge` documentation… Oh hell no! [Later] Oh hell yes, I know what I can use… smudge/clean is so on-brand for git's particular type of horribleness, now that I know they exist, I would be disappointed if they didn't
You have to give them credit for the name though. There's "clean" files (unmodified) and "dirty" files (modified vs repo). But we want a file that's kinda modified, but doesn't show up as modified, but it's kinda got some dirt hidden under it, so...
(Useless trivia: filters were originally introduced to satisfy the CVS weirdos who wanted $Id$ and other special strings to be expanded when they check files out of the repo, then contracted when they check them back in, just like CVS used to do.)
(This wasn't hard to implement in cvs because "cvs status" would literally just re-extract every file, one by one, and compare it to the one on disk, so it was no big deal to expand a string while you're there. cvs was... slow.)
(svn "optimized" this by storing an extra copy of the original checked-out version of every single worktree file in the .svn directory. So it wasn't too hard for them to implement this feature either.)
What is this specific smudge usecase? $Id$ ?
Auto-adding 'replace' clauses in go.mod under certain conditions. I would not really recommend doing this.
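For readers wondering what that could look like: a rough sketch of a smudge/clean filter pair for go.mod, and not the actual Tailscale implementation, which the thread does not show. Git runs the smudge command on checkout and the clean command on checkin, each reading the file on stdin and writing it to stdout; the driver is wired up through `filter.<name>.smudge` / `filter.<name>.clean` in git config plus a `filter=<name>` attribute in `.gitattributes`. The marker comment, module path, and local path below are all hypothetical.

```python
import sys

# Hypothetical marker so clean can recognise the lines smudge added.
MARKER = "// local-dev-replace"

# Hypothetical mapping of module path -> local checkout directory.
LOCAL_REPLACES = {"example.com/upstream/lib": "../lib"}


def clean(text):
    """Checkin direction: drop marked lines so the repo never sees them."""
    kept = [line for line in text.split("\n") if not line.endswith(MARKER)]
    return "\n".join(kept)


def smudge(text, replaces):
    """Checkout direction: append marked replace directives to go.mod text."""
    # Clean first so running smudge twice does not duplicate directives.
    # Note: trailing blank lines are normalised to a single newline.
    lines = clean(text).rstrip("\n").split("\n")
    for module, path in sorted(replaces.items()):
        lines.append(f"replace {module} => {path} {MARKER}")
    return "\n".join(lines) + "\n"


# Only act as a filter when a mode is given, e.g. `script.py smudge`.
if __name__ == "__main__" and len(sys.argv) > 1:
    data = sys.stdin.read()
    if sys.argv[1] == "smudge":
        sys.stdout.write(smudge(data, LOCAL_REPLACES))
    else:
        sys.stdout.write(clean(data))
```

The worktree copy carries the extra `replace` lines while the committed copy does not, which is exactly the "kinda modified, but doesn't show up as modified" state described earlier in the thread. The "under certain conditions" part is left out here, since the thread does not say what those conditions are.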