What do you call the parts of a story? Or: why can’t journalists spell “lead”?

I’ve been working as a journalism-adjacent programmer for some time. It’s an area I find very rewarding, but no job is without its downsides. Let’s face it: for people whose job involves writing professionally, journalists are bad at spelling.

“Hed”, “lede”, and other bits of jargon are just part of the problem. The deeper issue is that every publication has its own nomenclature, and jargon has drifted in meaning since the switch to web publication. “Slug”, for example, began as a literal slug of lead melted into a row of letters by a Linotype machine. Now, it generally refers to a “short label for something, containing only letters, numbers, underscores or hyphens”, and it might mean the keywords in a URL or an internal ID system used by a publication for editorial workflow. Those two meanings overlap but aren’t actually the same, which leads to confusion when developers talk to journalists and editors.
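To make the URL sense of "slug" concrete, here is a minimal sketch of a function that reduces a headline to only letters, numbers, underscores and hyphens. The name `slugify` and the specific rules (lowercasing, dropping non-ASCII characters, collapsing runs into one hyphen) are assumptions for illustration, not any particular CMS's behavior.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Reduce text to lowercase letters, digits, underscores and hyphens."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse each run of disallowed characters into a single hyphen.
    hyphenated = re.sub(r"[^a-z0-9_]+", "-", ascii_text.lower())
    # Trim hyphens left over from leading or trailing punctuation.
    return hyphenated.strip("-")


print(slugify("Why Can't Journalists Spell Lead?"))
```

An internal workflow slug would probably be assigned by an editor rather than derived from the headline, which is part of why the two meanings get confused.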

I have been working on the website for Spotlight PA, and I wanted to try to give the parts of an article more-or-less standard names in our CMS. There was only one problem: just what are the standard names? To find out, I made a survey and asked in chatrooms and on Twitter for other news nerds to fill it out.

Read more…

Link: Maciej Cegłowski · Privacy Rights and Data Collection in a Digital Economy

For sixty years, we have called the threat of totalitarian surveillance ‘Orwellian’, but the word no longer fits the threat. The better word now may be ‘Californian’. A truly sophisticated system of social control, of the kind being pioneered in China, will not compel obedience, but nudge people towards it. Rather than censoring or punishing those who dissent, it will simply make sure their voices are not heard. It will reward complacent behavior, and sideline troublemakers. It’s even possible that, judiciously wielded, such a system of social control might enjoy wide public support in our own country.

But I hope you will agree with me that such a future would be profoundly un-American.

The Ethically-Trained Programmer Sells Out

As all tech industry observers are aware, the twin pillars of a successful startup are cost shifting (i.e. user-generated labor) and regulatory arbitrage (i.e. tax avoidance).

For years, Facebook had its users write its content for it, and Amazon has avoided sales tax. Google’s famous “PageRank” algorithm gets every website in the world to do the hard work of ranking links for it, and Apple successfully avoided billions of dollars in taxation at the cost of a few awkward meetings. The greatest unicorns do both, like Uber and Airbnb, which avoid fare taxes and hotel taxes respectively, while pushing the costs and risks of car and property ownership onto their “partners.” YouTube and Instagram have even managed to work around child labor laws by the simple expedient of not paying their content producers. It’s win-win!

Looking at this competitive landscape, it is clear that if The Ethically-Trained Programmer is ever going to “exit”, I need to find a novel tax to scofflaw and a way to force my inventory costs onto my “partners”.

Read more…

Link: John Millikin · No Haunted Forests

Fresh graduates often push for a rewrite at the first sign of complexity, because they’ve spent the last four years in an environment where codebase lifetimes are measured in weeks. After their first unsuccessful rewrite they will evolve into Junior Engineers, repeating the parable of Chesterton’s Fence and linking to that old Joel Spolsky thinkpiece about Netscape [3].

Be careful not to confuse this reactive anti-rewrite sentiment with true objections to your particular rewrite. Remind them that Joel wrote that when source control meant CVS.

3. The real reason Netscape failed is they wrote a dreadful browser, then spent three years writing a second dreadful browser. The fourth rewrite (Firefox) briefly had a chance at being the most popular browser, until Google’s rewrite of Konqueror took the lead. The moral of this story: rewrites are a good idea if the new version will be better.

The Joel article bugs me for a number of reasons, and the fact that his core example is totally demonstrably historically wrong is one of them.

Rewriting software is something you should only do if you can answer the question, “Why will it be better this time?”

Programmers new to a codebase think the answer is “because the last people working on this were idiots.” Sometimes that’s true! If you’re inheriting, e.g., some PHP code written by people who weren’t really web developers but just learned by copy-pasting, you might be enough smarter to make a rewrite work. However, that’s the exception. Typically, you’re not any smarter than the people who wrote the code in the first place, so you’re not going to do any better.

Another answer that’s possible but unlikely to be correct is “they made bad technology choices.” Yes, some technology choices really are so bad that correcting them can make a difference, but moving, e.g., from Rails+MySQL to Node+Mongo is unlikely to make things better.

A good answer is “the business requirements they built this product for no longer apply.” For example, Microsoft Office vs. Google Docs or iTunes vs. Spotify. The former product in both cases is more robust, more complete, technically capable of doing it all, but the latter product by virtue of not having to do things that are no longer necessary can be radically simplified.

Anyway, John Millikin is correct: no haunted forests!

More Than a Dozen Command Line Tools I've Written—and So Can You!

Way back in 2013, I wrote Google Go: The Good, the Bad, and the Meh, which means I now have more than the job-recruiting-required “at least five years of experience” using Go. But I don’t want to write about that. I want to write a little about a baker’s dozen of the many small tools I’ve written in Go to scratch personal itches since then.

Most of these programs were the result of some passing enthusiasm, now mostly forgotten. They tend to be work-adjacent but not actually part of my direct job responsibilities. (That is to say, I was never asked to write any of these by a boss, and probably my bosses would see them as a waste of time if they knew about them.) Some of them I use a lot, and others I wrote and then forgot about completely. Mostly though, they’re just fun to write and satisfying to look back on.

Read more…

Git Standards

Note: At my old job, I helped write up our Git standards, but I lost access to that document when I changed jobs. To keep track of my thoughts on Git in perpetuity, I am posting them publicly here.

Git is the democracy of programming: it is the worst tool for version control, except all those other tools that have been tried from time to time. Among other problems, Git’s command names are obtuse (what is a “rebase”?) and non-orthogonal (how is “resetting” different than “checking out”?), and Git generally impedes the creation of an accurate mental model, but without understanding its underlying directed acyclic graph, one can’t move from beginner to intermediate user. Still, it is a necessary tool in every developer’s toolbox, and following good Git practices leads to smoother, more productive development. This guide assumes you already know how to use Git and discusses some of the higher level issues around standards for collaboration.
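As a rough illustration of the directed acyclic graph mentioned above, the sketch below models commits as nodes that point at their parents and branches as names that point at a single commit. The commit IDs, branch names, and the `history` helper are all hypothetical; real Git stores far more than this.

```python
# Hypothetical commit graph: each commit ID maps to its parent commit IDs.
PARENTS = {
    "a1": [],
    "b2": ["a1"],
    "c3": ["b2"],        # commit made on main
    "d4": ["b2"],        # commit made on feature
    "e5": ["c3", "d4"],  # a merge commit has two parents
}

# A branch is only a movable name pointing at one commit.
BRANCHES = {"main": "e5", "feature": "d4"}


def history(branch: str) -> set[str]:
    """Return every commit reachable from the tip of the branch."""
    reachable: set[str] = set()
    stack = [BRANCHES[branch]]
    while stack:
        commit = stack.pop()
        if commit not in reachable:
            reachable.add(commit)
            stack.extend(PARENTS[commit])
    return reachable


print(sorted(history("feature")))
print(sorted(history("main")))
```

In this model, the history of a branch is simply whatever is reachable from its tip, and moving a branch means changing which commit its name points at.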

 

If that doesn't fix it, git.txt contains the phone number of a friend of mine who understands git. Just wait through a few minutes of 'It's really pretty simple, just think of branches as...' and eventually you'll learn the commands that will fix everything.

XKCD on Git

Read more…

Link: What Is Jason Goldstein? · I’m too stupid for AsyncIO

My friend and former colleague Jason Goldstein has a great article up about the problems with Python’s asyncio framework.

For what it’s worth, when I was at PBS, a different coworker and I tried to do a test project to learn how to write asynchronous code. We wrote scripts in both Python 3 and Go that would go onto GitHub, get a list of users on our project, and download their personal repo information concurrently. When we finished, we compared the apps to see the strengths and weaknesses of the languages.

Both apps ended up working (although the Python app cheated in a few ways, for example by ignoring paginated responses), but I found the Go app to be easier to write than the Python app, even though it was significantly more verbose. One of the biggest problems for the Python app was just finding documentation that I could understand and apply. In Go, the main problem was that you’re writing the concurrency scaffolding yourself, so it’s easy to write a spaghetti mess if you let yourself. In Python you more often run into the problem that doing something concurrently is a pain, so you do it in a blocking manner even when you shouldn’t. For example, really you should be collecting links asynchronously and adding new links to a queue as you go, but it turns out to be easier to do things one at a time, even if that’s less efficient.
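For reference, the "collect links asynchronously and add new links to a queue as you go" approach might look something like the sketch below. It is not the code from that test project: the `FAKE_PAGES` table, `fetch_links`, `worker`, and `crawl` are all hypothetical, and the fake table stands in for real paginated API responses so the example runs without a network.

```python
import asyncio

# Hypothetical link graph standing in for real paginated API responses.
FAKE_PAGES = {
    "users?page=1": ["users?page=2", "repos/alice"],
    "users?page=2": ["repos/bob", "repos/alice"],
    "repos/alice": [],
    "repos/bob": [],
}


async def fetch_links(url: str) -> list[str]:
    """Pretend to download a page and return the links found on it."""
    await asyncio.sleep(0.01)  # stands in for network latency
    return FAKE_PAGES.get(url, [])


async def worker(queue: asyncio.Queue, seen: set[str]) -> None:
    """Pull URLs off the queue forever, enqueueing any unseen links."""
    while True:
        url = await queue.get()
        try:
            for link in await fetch_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.put_nowait(link)
        finally:
            queue.task_done()


async def crawl(start: str, concurrency: int = 3) -> set[str]:
    """Visit every URL reachable from start using several workers."""
    queue: asyncio.Queue = asyncio.Queue()
    seen = {start}
    queue.put_nowait(start)
    workers = [
        asyncio.create_task(worker(queue, seen)) for _ in range(concurrency)
    ]
    await queue.join()  # returns once every queued URL has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return seen


print(sorted(asyncio.run(crawl("users?page=1"))))
```

Even in this toy form, the scaffolding (a queue, a seen set, worker tasks, cancellation) outweighs the actual work, which is a fair picture of why the blocking version is tempting.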

How I Build My Static Assets for Hugo

Update: This post is obsolete as of Hugo 0.43. See Regis Philibert’s great summary of best practices instead.


Hugo is a great static site generator written in Go. I use it for this blog. Its advantages are that it’s very fast, very easy to set up, and very flexible, but its disadvantage is that it doesn’t have the mature community support that Jekyll has. One example of that is that Hugo has no particular recommended route for managing a static asset pipeline. In this post, I’d like to explain how my personal pipeline works to see if it can help other Hugo users.

Read more…

Tracking Users is Bad for Advertisers

At its Worldwide Developers Conference on June 5, Apple announced that one of the tentpole features of macOS High Sierra will be anti-ad tracking technology:

Intelligent Tracking Prevention in Safari uses machine learning to identify and remove the tracking data that advertisers employ to follow users’ web activity.

At first glance, this may seem to be bad for Google and other online advertisers. However, that perception is mistaken.

Read more…