If You Use Unix, Use Version Control

If you’ve used Unix (or Linux, or Mac OS X; this probably applies to various flavors of Windows as well), you’ve no doubt found yourself editing configuration files with a text editor. This is especially true if you’ve been administering a machine, either professionally or because you got roped into doing it.

And if you’ve been doing it for more than a day or two, you’ve made a mistake, and wished you could undo it, or at least see what things looked like before you started messing with them.

This is something that programmers have been dealing with for a long time, so they’ve developed an impressive array of tools to allow you to keep track of how a file has changed over time. Most of them have too much overhead for someone who doesn’t do this stuff full-time. But I’m going to talk about RCS, which is simple enough that you can just start using it.

Most programmers will tell you that RCS has severe limitations, such as only being able to work on one file at a time rather than on a collection of files, that make it unsuitable for use in all but a few special circumstances. Thankfully, Unix system administration happens to be one of those circumstances!

What’s version control?

Basically, it allows you to track changes to a file, over time. You check a file in, meaning that you want to keep track of its changes. Periodically, you check it in again, which is a bit like bookmarking a particular version so you can come back to it later. And you can check a file out, that is, retrieve it from the history archive. Or you can compare the file as it is now to how it looked one, five, or a hundred versions ago.

Note that RCS doesn’t record any changes unless you tell it to. That means that you should get into the habit of checking in your changes when you’re done messing with a file.

Starting out

Let’s create a file:

# echo "first" > myfile

Now let’s check it in, to tell RCS that we want to track it:

# ci -u myfile
myfile,v <-- myfile
enter description, terminated with single '.' or end of file:
NOTE: This is NOT the log message!
>> This is a test file
>> .
initial revision: 1.1

ci stands for “check in”, and is RCS’s tool for checking files in. The -u option says to unlock it after checking in.

Locking is a feature of RCS that helps prevent two people from stepping on each other’s toes by editing a file at the same time. We’ll talk more about this later.

Note that I typed in This is a test file. I could have given a description on multiple lines if I wanted to, but usually you want to keep this short: “DNS config file” or “Login message”, or something similar.

End the description with a single dot on a line by itself.

You’ll notice that you now have a file called myfile,v. That’s the history file for myfile.

Since you probably don’t want ,v files lying around cluttering the place up, know that if there’s a directory called RCS, the RCS utilities will look for ,v history files in that directory. So before we get in too deep, let’s create an RCS directory:

# mkdir RCS

Now delete myfile and the myfile,v history file, and start again from scratch, as above.

Done? Good. By the way, you could also have cheated and just moved the ,v file into the RCS directory. Now you know for next time.

Making a change

All right, so now you want to make a change to your file. This happens in three steps:

  1. Check out the file and lock it.
  2. Make the change(s).
  3. Check in your changes and unlock the file.

Check out the file:

# co -l myfile
RCS/myfile,v --> myfile
revision 1.1 (locked)

co is RCS’s check-out utility. In this case, it pulls the latest version out of the history archive, if it’s not there already.

The -l (lower-case ell) flag says to lock the file. This helps to prevent other people from working on the file at the same time as you. It’s still possible for other people to step on your toes, especially if they’re working as root and can overwrite anything, but it makes it a little harder. Just remember that co is almost always followed by -l.

Now let’s change the file. Edit it with your favorite editor and replace the word “first” with the word “second”.

If you want to see what has changed between the last version in history and the current version of the file, use rcsdiff:

# rcsdiff -u myfile
RCS file: RCS/myfile,v
retrieving revision 1.1
diff -u -r1.1 myfile
--- myfile 2016/06/07 20:18:12 1.1
+++ myfile 2016/06/07 20:32:38
@@ -1 +1 @@
-first
+second

The -u option makes it print the difference in “unified-diff” format, which I find more readable than the other possibilities. Read the man page for more options.

In unified-diff format, lines that were deleted are preceded with a minus sign, and lines that were added are preceded by a plus sign. So the above is saying that the line “first” was removed, and “second” was added.

Finally, let’s check in your change:

# ci -u myfile
RCS/myfile,v <-- myfile
new revision: 1.2; previous revision: 1.1
enter log message, terminated with single '.' or end of file:
>> Made a change.
>> .

Again, we were prompted to list the changes we made to the file (with a dot on a line by itself to mark the end of our text). You’ll want to be concise yet descriptive in this text, because these are notes you’re making for your future self when you want to go back and find out when and why a change was made.

Viewing a file’s history

Use the rlog command to see a file’s history:

# rlog myfile

RCS file: RCS/myfile,v
Working file: myfile
head: 1.2
locks: strict
access list:
symbolic names:
keyword substitution: kv
total revisions: 2; selected revisions: 2
This is a test file
revision 1.2
date: 2016/06/07 20:36:52; author: arensb; state: Exp; lines: +1 -1
Made a change.
revision 1.1
date: 2016/06/07 20:18:12; author: arensb; state: Exp;
Initial revision

In this case, there are two revisions: 1.1, with the log message “Initial revision”, and 1.2, with the log “Made a change.”.

Undoing a change

You’ve already seen rlog, which shows you a file’s history. And you’ve seen one way to use rcsdiff.

You can also give rcsdiff one or two -rrevision-number arguments, to see the difference between specific revisions:

# rcsdiff -u -r1.1 myfile

will show you the difference between revision 1.1 and what’s in the file right now, and

# rcsdiff -u -r1.1 -r1.2 myfile

will show you the difference between revisions 1.1 and 1.2.

(Yes, RCS will just increment the second number in the revision number, so after a while you’ll be editing revision 1.2486 of the file. Getting to revision 2.0 is an advanced topic that we won’t cover here.)

With the tools you already have, the simplest way to revert an unintended change is to see what the file used to look like, and copy-paste that into a new revision.

Once you’re comfortable with that, you can read the manual and read up on things like deleting revisions with rcs -o1.2 myfile.

Checking in someone else’s changes

You will inevitably run into cases where someone changes your file without going through RCS. Either it’ll be a coworker managing the same system who didn’t notice the ,v file lying around, or else you yourself will forget to check in your changes after making them.

Here’s a simple way to see whether someone (possibly you) has made changes without your knowledge:

# co -l myfile
RCS/myfile,v --> myfile
revision 1.2 (locked)
writable myfile exists; remove it? [ny](n):

In this case, either you forgot to check in your changes, or else someone made the file writable with chmod, then (possibly) edited it.

In the former case, see what you did with rcsdiff, check in your changes, then check the file out again to do what you were going to do.

The latter case requires a bit more work, because you don’t want to lose your coworker’s changes, even though they bypassed version control.

  1. Move the file aside.
  2. Check out the latest version of the file.
  3. Overwrite that file with your coworker’s version.
  4. Check those changes in.
  5. Check the file out and make your changes.
  6. Have a talk with your coworker about the benefits of using version control.

You already know, from the above, how to do all of this. But just to recap:

Move the file aside:

# mv myfile myfile.new

Check out the latest version:

# co -l myfile

Overwrite it with your coworker’s changes:

# mv myfile.new myfile

Check in those changes:

# ci -u myfile
RCS/myfile,v <-- myfile
new revision: 1.3; previous revision: 1.2
enter log message, terminated with single '.' or end of file:
>> Checking in Bob's changes:
>> Route around Internet damage.
>> .

That should be enough to get you started. Play around with this, and I’m sure you’ll find that this is a huge improvement over what you’ve probably been using so far: a not-system of making innumerable copies of files when you remember to, with names like “file.bob”, “file.new”, “file.new2”, “file.newer”, and other names that mean nothing to you a week later.

There Are Days When I Hate XML

…and days when I really hate XML.

In this case, I have an XML document like

<?xml version="1.0" ?>
<foo xmlns="http://some.org/foo">
  <thing id="first"/>
  <thing id="second"/>
  <thing id="third"/>
</foo>

and I want to get things out of it with XPath.

But I couldn’t manage to select elements precisely, such as the <foo> element at the top. You’d think “/foo” would do it, but it didn’t.

Eventually I found out that the problem is the xmlns="..." attribute. It looks perfectly normal: it says that an element written without a prefix, like <thing> (as opposed to <ns:thing>), is in the “http://some.org/foo” namespace.

However, in XPath, if you specify “ns:thing”, it means “a thing element in whichever namespace the ns prefix corresponds to”. BUT “thing” means “a thing element that’s not in a namespace”.

So how do you select an element that’s in the default namespace, as above? The obvious way would be to select “:thing”, but that doesn’t work. Too simple, I suppose. Maybe that gets confused with CSS pseudo-selectors or something.

No, apparently the thing you need to do is to invent a prefix for the standard elements of the file you’re parsing. That is, add “ns” as another prefix that maps onto “http://some.org/foo” and then select “ns:thing”. There are different ways of doing this, depending which library you’re using to make XPath queries, but still, it seems like a giant pain in the ass.
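Here’s a minimal sketch of that workaround using Python’s standard-library ElementTree (other XPath libraries differ in how you register the prefix, but the idea is the same; the prefix ns and the document are just the example from above):

```python
# Sketch: selecting elements in a default namespace by inventing a prefix.
# Uses only the Python standard library; "ns" is our made-up prefix.
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" ?>
<foo xmlns="http://some.org/foo">
  <thing id="first"/>
  <thing id="second"/>
  <thing id="third"/>
</foo>"""

root = ET.fromstring(doc)

# An unprefixed name means "not in any namespace", so this finds nothing:
assert root.findall("thing") == []

# Map an invented prefix onto the document's default namespace...
ns = {"ns": "http://some.org/foo"}

# ...and now "ns:thing" matches the <thing> elements:
ids = [t.get("id") for t in root.findall("ns:thing", ns)]
assert ids == ["first", "second", "third"]
```

ElementTree only implements a subset of XPath, but the namespace gotcha is the same one described above.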

Would You Get Off the Plane?

There’s an old joke about an instructor who asks, “if you were on a plane, and found out that the inflight systems were controlled by a beta version of software written by your team, would you get off the plane?” Most of the students pale and say yes. One person says, “No. If it were written by my team, it wouldn’t make it to the runway.”

The implication is that most software is far crappier than people realize, and that programmers are keenly aware of this. But I’d like to present another reason for getting off the plane:

Even if my team writes good code, the fact that it’s still in beta means that it hasn’t been fully tested. That means that there must have been some kind of snafu at the airline and/or the software company for untested software to be put in control of a flight with ordinary passengers. And if management permitted this error, what other problems might there be? Maybe the fuel tanks are only half-full. Maybe the landing gear hasn’t been inspected.

DD-WRT Config Backups

So the other day I managed to hose my DD-WRT configuration at home, badly enough that I figured I really ought to back up my config so I don’t wind up trying to reconstruct my config from memory.

(If you just want to see the script, you can jump ahead.)

I have nightly backups of my desktop (and so do you, right? Right?!) so I figured the simplest thing is just to copy the router’s config to my desktop, and let it be swept up with the nightly backups. So then it’s just a question of getting the configuration and such to my desktop.

In DD-WRT, under “Administration → Backup”, you can do manual backups and restores. This is what we want, except that we want this to happen automatically.

The trivial way to get the config is

curl -O -u admin:$PASSWORD http://$ROUTER/nvrambak.bin

or, using wget:

wget --http-user=admin --http-password=$PASSWORD http://$ROUTER/nvrambak.bin

where $ROUTER is your router’s name or IP address, and $PASSWORD is your admin password. But this sends your password over the network in the clear, so it’s not at all secure.

Instead, let’s invest some time into setting up ssh on the router; specifically, the public key method that’ll allow passwordless login.

Go follow the instructions there. When you come back, you should have working ssh, and a key file that we’ll use for backups.

Back? Good. Now you can log in to the router with

ssh root@$ROUTER

and entering your password. Or you can log in without a password with

ssh -i /path/to/key-file root@$ROUTER

Once on the router, you can save a copy of the configuration with nvram backup /tmp/nvrambak.bin. So let’s combine those last two parts with:

ssh -i /path/to/key-file root@$ROUTER "nvram backup /tmp/nvrambak.bin"

And finally, let’s copy that file from the router to the desktop:

scp -i /path/to/key-file root@$ROUTER:/tmp/nvrambak.bin /path/to/local/nvrambak.bin

So the simple script becomes:

SSH=ssh
SCP=scp
RSA_ID="-i /path/to/key-file"
ROUTER=192.168.1.1              # your router's name or IP address
LOCALFILE=/path/to/local/nvrambak.bin

$SSH -4 -q ${RSA_ID} root@${ROUTER} "nvram backup /tmp/nvrambak.bin"
$SCP -4 -q ${RSA_ID} root@${ROUTER}:/tmp/nvrambak.bin ${LOCALFILE}

I’ve put this in the “run this before starting a backup” section of my backup utility. But you can also make it a daily cron job. Either way, you should wind up with backups of your router’s config.
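If you go the cron route, the crontab entry might look something like this (the 3:30 am schedule and the script path are placeholder assumptions; substitute wherever you saved the script):

```
# min hour day month weekday  command
30 3 * * * /usr/local/sbin/dd-wrt-backup.sh
```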

I Don’t Want Flying Cars; I Just Want Working Bluetooth

I love Bluetooth. I love that it’s supported on all my various electronic gadgets, and lets them talk to each other and exchange information, be it streaming audio data, or a text note, or what have you.

Or at least I love the idea of Bluetooth. The unfortunate reality is that the implementations that I’ve seen never quite live up to the ideal.

For instance, it often takes several attempts to pair two devices, even when they’re two feet from each other. Sometimes devices disconnect for no obvious reason, or seem to become unpaired without me doing anything.

And then there’s the stuttering, which might be related. I have yet to find a Bluetooth headset, speaker, or other audio receiver that doesn’t stutter for five minutes until it finds its groove. In fairness, after the initial five minutes, things tend to stay pretty stable (at least until I, say, move my phone five feet further from the speaker, at which point they need to resync). But if it’s a matter of the two devices negotiating, I don’t know, frequencies and data throttling rates and protocols, why don’t they do that at the beginning? Or is it a TCP thing, where the two start out using little bandwidth and ramp up over time?

Lastly, there are the tunnel-vision implementations. From what I’ve seen, the Bluetooth standard defines roles that each device can play, e.g., “I can play audio”, “I can dial a phone number”, “I can display an address card”, “I can store files”, and so forth. But in practice, that doesn’t always work: my cell phone sees my desk phone as an earpiece, and earpieces can’t handle address cards, don’t be silly, so I can’t copy my cell phone’s contact list to my desk phone.

In the age of the Internet of Things, my desk phone can store contacts, my TV can run a browser, and pretty soon my toaster will be able to share its 5G hotspot with the neighborhood. There’s no reason to be limited by a noun on the box it came in.

I understand that most of the above is likely caused by bad implementation of a fundamentally decent protocol. But Bluetooth has been around for, what, a decade or more? And I still regularly run into these problems. That points to something systemic in the software community.

Artificial Lightning

A while back, I ran across this page by Daniel Kennett, explaining how to make an Arduino control a set of Ikea Dioder LED lights. Very cool stuff, and I was able to use his work to build something similar.

So with Halloween fast approaching, I thought it would be neat to use this to create lightning: tape a set of LED lights in the window at the front of the house, and make them produce lightning flashes.


One of the things I learned in math is that a polynomial of degree N can pass through N+1 arbitrary points. A straight line goes through any two points, a parabola goes through any three points, and so forth. The practical upshot of this is that if your equation is complex enough, you can fit it to any data set.
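As a quick illustration of that claim, here’s a small pure-Python sketch (the sample points are invented): Lagrange interpolation constructs the unique degree-N polynomial through N+1 given points, and you can check that it really hits every one of them.

```python
# Lagrange interpolation: the unique degree-N polynomial through N+1 points.

def lagrange(points, x):
    """Evaluate, at x, the polynomial of degree len(points)-1
    that passes through every (xi, yi) in points."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# A line through any two points...
assert abs(lagrange([(0, 0), (2, 4)], 1) - 2.0) < 1e-9

# ...and a cubic through any four: it fits them all exactly,
# no matter how arbitrary the data.
pts = [(0, 1), (1, 3), (2, 2), (3, 8)]
for xi, yi in pts:
    assert abs(lagrange(pts, xi) - yi) < 1e-9
```

Fit enough terms and you can “explain” any data set perfectly; the question is whether the fit predicts the next data point.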

That’s basically what happened to the geocentric model: it started out simple, with planets going around the Earth in circles. Except that some of the planets wobbled a bit. So they added more terms to the equations to account for the wobbles. Then there turned out to be more wobbles on top of the first wobbles, and more terms had to be added to the equations to take those into account, and so on until the theory collapsed under its own weight. There wasn’t any physical mechanism or cause behind the epicycles (as these wobbles were called). They were just mathematical artifacts. And so, one could argue that the theory was simpler when it had fewer epicycles and didn’t explain all of the data, but also was less wrong.

Take another example (adapted from Russell Glasser, who got it from his CS instructor): let’s say you and I order a pizza, and it comes with olives. I hate olives and you love them, so we want to cut it up in such a way that we both get slices of the same size, but your slice has as many of the olives as possible, and mine have as few as possible. (And don’t tell me we could just order a half-olive pizza; I’m using this as another example.)

We could take a photo of the pizza, feed it into an algorithm that’ll find the position of each olive and come up with the best way to slice the pizza fairly, but with a maximum of olives on your slices.

The problem is, this tells us nothing about how to slice the next such pizza that we order. Unless there’s some reason to think that the olives will be laid out in a similar way on the next pizza, we can’t tell the pizza parlor how to slice it up when we place our next order.

In contrast, imagine if we’d looked at the pizza and said, “Hm. Looks like the cook is sloppy, and just tossed a handful of olives on the left side, without bothering to spread them around.” Then we could ask the parlor to slice it into wedges, and we’d have good odds of winding up with three slices with extra olives and three with minimal olives. Or suppose we’d found that the cook puts the olives in the middle and doesn’t spread them around. Then we could ask the parlor to slice the pizza into a grid; you take the middle pieces, and I’ll take the outside ones.

But our original super-optimal algorithm doesn’t allow us to do that: by trying to perfectly account for every single olive in that one pizza, it doesn’t help us at all in trying to predict the next pizza.

In The Signal and the Noise, Nate Silver calls this overfitting. It’s often tempting to overfit, because then you can say, “See! My theory of Economic Epicycles explains 29 of the last 30 recessions, as well as 85% of the changes in the Dow Jones Industrial Average!” But is this exciting new theory right? That is, does it help us figure out what the future holds; whether we’re looking at a slight economic dip, a recession, or a full-fledged depression?

We’ve probably all heard the one about how the Dow goes up and down along with skirt hems. Or that the performance of the Washington Redskins predicts the outcome of US presidential elections. Of course, there’s no reason to think that fashion designers control Wall Street, or that football players have special insight into politics. Rather, these examples go to show that if you dig long enough, you can find some data set that matches the one you’re looking at. And in this interconnected, online, googlable world, it’s easier than ever to find some data set that matches what you want to see.

These two examples are easy to see through, because there’s obviously no causal relationship between football and politics. But we humans are good at telling convincing stories. What if I told you that pizza sales (with or without olives) can help predict recessions? After all, when people have less spending money, they eat out less, and pizza sales suffer.

I just made this up, both the pizza example and the explanation. So it’s bogus, unless by some million-to-one chance I stumbled on something right. But it’s a lot more plausible than the skirt or football examples, and thus we need to be more careful before believing it.

Update: John Armstrong pointed out that the first paragraph should say “N+1”, not “N”.

Update 2: As if on cue, Wonkette helps demonstrate the problems with trying to explain too much in this post about Glenn Beck somehow managing to tie together John Kerry’s presence or absence on a boat, his wife’s seizure, and Hillary Clinton’s answering or not answering questions about Benghazi. Probably NSFW because hey, Wonkette. But also full of Glenn Beck-ey crazy.

Removing Missing Podcast Episodes from iTunes

So I figured something out today.

I add and remove podcasts in iTunes all the time. And every so often, iTunes loses track of, er, tracks. This usually shows up in a smart playlist, where a podcast episode exists in iTunes’s database, but there’s no corresponding MP3 file on disk. You can’t manually delete entries from smart playlists; and if I’ve deleted the podcast, then I can’t even go back to the podcast list to delete the bogus episode. Nor does it show up in the “Music” list, since it’s a podcast episode.

What I finally figured out is that if you change the Media type from “Podcast” to “Music”, that’ll move the episode to the Music list, where you can delete it.

You can change the media type with “Get info” (⌘-I) → Options → “Media Kind”.

Unfortunately, if the file is missing, iTunes won’t let you edit the media kind. So first you need to associate the podcast episode with an MP3 file. The easiest way is to copy an existing MP3 file to /tmp/foo.mp3 or some such. Then, when you ⌘-I and iTunes says no such file and “Do you want to locate it?”, say yes, and point it at /tmp/foo.mp3. Then you can edit the media kind, and delete the file from the Music list.

Death of the Desktop

I’m a geek. If you didn’t know this, it’s because you’ve never met me or talked to me for more than five minutes.

I keep reading that the desktop PC is dying, even as tablet and smartphone sales are rising. One popular theory, then, is that people are doing on their tablets and phones what they used to do on their desktop PCs.

I hope this isn’t the case, because frankly, tablets and phones are crap when it comes to doing real work.

Don’t get me wrong: I’ve been using PDAs, and now a smartphone, for well over a decade. I also have an iPad that I use regularly. I also have a Swiss army knife, but while it’s a wonderful tool in a pinch, I’d rather have a real set of tools if they’re available.

The same goes for laptops, tablets, and phones: they’re portable, and that certainly counts for a lot. But size matters, too, and size is intrinsically non-portable.

I’m not terribly picky about keyboards: as long as it’s full-sized (i.e., the keys are roughly the size of my fingers), has arrow keys, function keys, a keypad, and reasonable action (in the piano sense of the word), I’m happy. I know people who swear by the industrial-style IBM keyboards, and while I don’t share their enthusiasm, I get it: not only are they nigh-indestructible, they also have decent springs and make a satisfying “click” noise when you type. When you’ve typed a key, you know it. It’s a small thing, but it makes a difference.

At home, I have a 20-inch monitor, and wouldn’t consider anything smaller. In fact, I wouldn’t mind adding a second one, the way I have at work, to be able to have more windows in front of me.

I see people who resize their browser or spreadsheet or whatever to the full size of the display, and I don’t get it. Half the screen seems ample, and would allow them to see what else is open at the same time. Even worse are people who have a full-screen browser with multiple tabs open. How can they see what’s going on in those other tabs, with the current one blocking their view?

I’m not terribly picky when it comes to mice, though I do prefer a mouse to a trackball or laptop-style trackpad (though I find myself tempted by Apple’s super-sized trackpad). It’s more a matter of dexterity and fine control than anything else. I’m not as good zeroing in on a small button with a trackpad that lies between my thumbs as I am with a mouse that has its own area to the side.

All of these things are relatively minor: they don’t stop me from doing work, they just make it a little easier, a little more pleasant. But then, what makes a workspace pleasant isn’t so much the things it does, as the things it doesn’t do: the annoyances that aren’t there so they don’t get in the way. Not having to look down at my fingers to make sure they’re on the home row. Not clicking on the wrong button by mistake.

But the other thing, the thing that keeps getting me awed reactions about how fast I work, is keybindings. I’ve taken the time to either learn or customize my environment—both the windowing environment and common applications—to be able to do common operations from the keyboard, without having to move my hand all the way to the mouse. Again, I see people who raise their hand, move it over to the mouse, click on the window they want to switch to, then put their hand back on the keyboard. It’s like watching someone with a 1970s-era TV set get up off the couch, turn the station knob on the set, and come back to the couch. You’d want to say “Why don’t you just use the remote?” just as I want to yell “Why don’t you just alt-tab through to the window you want?”

(Also, in both Firefox and Chrome, did you know that you can set up a URL with a keyword, that’ll fill in whatever you type after the keyword? If I want to look up Star Wars at IMDb, I don’t type in eye em dee bee dot com into the browser’s URL bar, then click on the search box and type in the name. I just type “imdb Star Wars” into the URL bar, and the browser replaces that with “http://www.imdb.com/find?s=all&q=Star%20Wars”. Try it with images.google.com, Wikipedia, or Bible Gateway and see how convenient it is.)

Yes, these things only take a few seconds each. But a few seconds here, a few seconds there, and it all eventually adds up to significant time.

So when I hear it suggested that people are abandoning desktop machines for portable ones, what I hear is that people are switching from dedicated workspaces where you can get stuff done comfortably, to something fundamentally inferior.

In principle, there’s no reason why a portable device running, say, Android, couldn’t be as flexible and configurable as a Linux/FreeBSD/Solaris box running X11/KDE/GNOME/what have you. But in practice, they’re not. Whether it’s a matter of limiting the configurability to simplify development, or the fact that Android apps are sandboxed and can’t talk to each other, or something else, I don’t know. But the fact is that right now, I couldn’t bind “shake the phone up and down” to mean “create a new appointment in the calendar” if I wanted to.

And then comes along something like Ubuntu’s Unity, which aims to be a common UI for both desktop and portable devices. Which is to say, it aims to strip down the desktop to allow only those things that are convenient on tablets.

That’s taking away configurability; it’s simplification that makes it harder to get work done, and that annoys me.

UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever things.
—Doug Gwyn

In Defense of Gussie Fink-Nottle

For those who may have forgotten, Gussie Fink-Nottle is a character in the Jeeves stories by P.G. Wodehouse. He is the series’s stereotypical nerd: socially inept, a teetotaler, and physically unimpressive. His most memorable trait, however, is his fascination with newts.

Clearly Wodehouse tried to find the least interesting subject he could think of, to allow his character to easily bore all the other characters to tears by going on at length about his pathetic pet subject.

I don’t remember his early life ever being discussed in any detail, but I imagine that, as a weakling, he was never any good at sports and thus never developed an interest in them. Unable to hold his liquor, he never got into the habit of meeting with the chaps over drinks and experiencing the sorts of things that only seem to happen during alcohol-fueled debaucheries. His social ineptitude meant that he never became a lothario. Eventually, he was forced to become interested in that most uninteresting of subjects, newts.

But I would look at it from another angle: there is an infinite number of subjects in the world. What is it about newts that’s so interesting that Gussie would choose to devote his life to them?

Stephen Jay Gould, as I recall, did his graduate research on snails. Carl Sagan was interested in points of light in the sky. Bertrand Russell worked on breaking down existing mathematical proofs into longer chains of simpler steps. In each case, they found something interesting in what might appear to be an uninteresting subject.

Likewise, I sometimes wonder what makes people want to go into professions like accounting or proctology. It can’t just be the money, can it? Presumably there’s something there that I don’t see, some hidden bit of beauty that I haven’t seen or had explained to me.

I don’t want to think, “Wow, what a loser, for being interested in something as boring as newts.” Rather, I want to ask, “What is it about newts that’s so interesting?”

XKCD: Beauty