WFHing with Emacs: Work Mode and Command-Line Options

Like the rest of the world, I’m working from home these days. One of the changes I’ve made has been to set up Emacs to work from home.

I use Emacs extensively both at home and at work. So far, my method for keeping personal stuff and work stuff separate has been to, well, keep separate copies of ~/.emacs.d/ on work and home machines. But now that my home machine is my work machine, I figured I’d combine their configs.

To do this, I just added a -work command-line option, so that emacs -work runs in work mode. The command-switch-alist variable is useful here: it allows you to define a command-line option, and a function to call when it is encountered:

(defun my-work-setup (arg)
   ;; Do setup for work-mode
  )
(add-to-list 'command-switch-alist
  '("work" . my-work-setup))

Of course, I’ve never liked defining functions to be called only once. That’s what lambda expressions are for:

(add-to-list 'command-switch-alist
  '("work" .
    (lambda (arg)
      ;; Do setup for work-mode
      (setq my-mode 'work)
      )))

One thing to bear in mind about command-switch-alist is that it gets called as soon as the command-line option is seen. So let’s say you have a -work argument and a -logging option. And the -logging-related code needs to know whether work mode is turned on. That means you would always have to remember to put the -work option before the -logging option, which isn’t very elegant.

A better approach is to use the command-switch-alist entries to just record that a certain option has been set. The sample code above simply sets my-mode to 'work when the -work option is set. Then do the real startup stuff after the command line has been parsed and you know all of the options that have been passed in.

Unsurprisingly, Emacs has a place to put that: emacs-startup-hook:

(defvar my-mode 'home
  "Current mode. Either home or work.")
(add-to-list 'command-switch-alist
  '("work" . (lambda (arg)
                (setq my-mode 'work))))

(defvar logging-p nil
  "True if and only if we want extra logging.")
(add-to-list 'command-switch-alist
  '("logging" . (lambda (arg)
                  (setq logging-p t))))

(add-hook 'emacs-startup-hook
  (lambda nil
    (if (eq my-mode 'work)
      (message "Work mode is turned on."))
    (if logging-p
      (message "Extra logging is turned on."))
    (if (and (eq my-mode 'work)
             logging-p)
      (message "Work mode and logging are both turned on."))))

Check the *Messages* buffer to see the output.

Ansible As Scripting Language

Ansible is billed as a configuration manager similar to Puppet or cfengine. But it occurred to me recently that it’s really (at least) two things:

  1. A configuration manager.
  2. A scripting language for the machine room.

Mode 1 is the normal, expected one: here’s a description; now make the machine(s) look like the description. Same as Puppet.

Mode 2 is, I think, far more difficult to achieve in Puppet than it is in Ansible. This is where you make things happen in a particular order, not just on one machine (you’d use /bin/sh for that), but on multiple hosts.

For instance, adding a new user might involve:

  1. Generate a random password on localhost.
  2. Add a user on the Active Directory server.
  3. Create and populate a home directory on the home directory server.
  4. Add a stub web page on the web server.

This is something I’d rather write as an ansible play, than as a Puppet manifest or module.

Which brings me to my next point: it seems that for Mode 1, you want to think mainly in terms of roles, while for Mode 2, you’ll want to focus on playbooks. A role is designed to encapsulate the notion of “here’s a description of what a machine should look like, and the steps to take, if any, to make it match that description”, while a playbook is naturally organized as “step 1; step 2; step 3; …”.

These are, of course, just guidelines. And Mode 2 suffers from the fact that YAML is not a good way to express programming concepts.  But I find this to be a useful way of thinking about what I’m doing in Ansible.

Ansible: Running Commands in Dry-Run Mode in Check Mode

Say you have an Ansible playbook that invokes a command. Normally, that command executes when you run ansible normally, and doesn’t execute at all when you run ansible in check mode.

But a lot of commands, like rsync have a -n or --dry-run argument that shows what would be done, without actually making any changes. So it would be nice to combine the two.

Let’s start with a simple playbook that copies some files with rsync:

- name: Copy files
  tasks:
    - name: rsync the files
      command: >-
        rsync
        -avi
        /tmp/source/
        /tmp/destination/
  hosts: localhost
  become: no
  gather_facts: no

When you execute this playboook with ansible-playbook foo.yml rsync runs, and when you run in check mode, with ansible-playbook -C foo.yml, rsync doesn’t run.

This is inconvenient, because we’d like to see what rsync would have done before we commit to doing it. So let’s force it to run even in check mode, with check_mode: no, but also run rsync in dry-run mode, so we don’t make changes while we’re still debugging the playbook:

- name: Copy files
  tasks:
    - name: rsync the files
      command: >-
        rsync
        --dry-run
        -avi
        /tmp/source/
        /tmp/destination/
      check_mode: no
  hosts: localhost
  become: no
  gather_facts: no

Now we just need to remember to remove the --dry-run argument when we’re ready to run it for real. And turn it back on again when we need to debug the playbook.

Or we could do the smart thing, and try to add that argument only when we’re running Ansible in check mode. Thankfully, there’s a variable for that: ansible_check_mode, so we can set the argument dynamically:

- name: Copy files
  tasks:
    - name: rsync the files
      command: >-
        rsync
        {{ '--dry-run' if ansible_check_mode else '' }}
        -avi
        /tmp/source/
        /tmp/destination/
      check_mode: no
  hosts: localhost
  become: no
  gather_facts: no

You can check that this works with ansible-playbook -v -C foo.yml and ansible-playbook -v foo.yml.

Pseudo-Numeric Identifiers

Let’s say you’re a programmer, and your application uses Library of Congress Control Numbers for books, e.g., 2001012345, or ZIP codes, like 90210. What data types would you use to represent them? Or maybe something like the Dewey Decimal System, which uses 320 to classify a book as Political Science, 320.5 for Political Theory, and 320.973 for “Political institutions and public administration (United States)”?

If you said “integer”, “floating point”, or any kind of numeric type, then clearly you weren’t paying attention during the title.

The correct answer was “string” (or some kind of array of tokens), because although these entities consist of digits, they’re not numbers: they’re identifiers, same as “root” or “Jane Smith”. You can assign them, sort them, group them by common features, but you can’t meaningfully add or multiply them together. If you’re old enough, you may remember the TV series The Prisoner or Get Smart, where characters, most of them secret agents, refer to each other by their code numbers all the time; when agents 86 and 99 team up, they don’t become agent 185 all of a sudden.

If you keep in mind this distinction between numbers, which represent quantities, and strings that merely look like numbers because they happen to consist entirely of integers, you can save yourself a lot of grief. For instance, when your manager decides to store the phone number 18003569377 as “1-800-FLOWERS”, dashes and all. Or when you need to store a foreign phone number and have to put a plus sign in front of the country code.

Removing Magic

So this was one of those real-life mysteries.

I like crossword puzzles. And in particular, I like indie crossword puzzles, because they tend to be more inventive and less censored than ones that run in newspapers. So I follow several crossword designers on Twitter.

Yesterday, one of them mentioned that people were having a problem with his latest puzzle. I tried downloading it on my iPad, and yeah, it wouldn’t open in Across Lite. Other people were saying that their computers thought the file was in PostScript format. I dumped the HTTP header with

lynx -head -dump http://url.to/crossword.puz

and found the header

Content-type: application/postscript

which was definitely wrong for a .puz file. What’s more, other .puz files in the same directory were showing up as

Content-type: application/octet-stream

as they should.

I mentioned all this to the designer, which led to us chatting back and forth to see what the problem was. And eventually I had the proverbial aha moment.

.puz files begin with a two-byte checksum. In this particular case, they turned out to be 0x25 and 0x21. Or, in ASCII, “%!“. And as it turns out, PostScript files begin with “%!“, according to Unix’s magic file.

So evidently what happened was: the hosting server didn’t have a default type for files ending in .puz. Not terribly surprising, since that’s not really a widely-used format. So since it didn’t recognize the filename extension, it did the next-best thing and looked at the first few bytes of the file (probably with file or something equivalent) to see if it could make an educated guess. It saw the checksum as “%!” and decided it was a PostScript file.

The obvious fix was to change something about the file: rewrite a clue, add a note, change the copyright statement, anything to change the contents of the file, and thus the checksum.

The more permanent solution was to add a .htaccess file to the puzzle file directory, with

AddType application/octet-stream .puz

assuming that the hosting provider used Apache or something compatible.

This didn’t take immediately; I think the provider cached this metadata for a few hours. But eventually things cleared up.

I’m not sure what the lesson is, here. “Don’t use two-byte checksums at offset 0”, maybe?

Programming Tip: Open and Close at the Same Time

One useful programming habit I picked up at some point is: if you open or start something, immediately close it or end it. If you open a bracket, immediately write its closing bracket. If you open a file, immediately write the code to close it.

These days, development environments take care of the niggling little details like matching parentheses and brackets for you. That’s great, but that’s just syntax. The same principle extends further, and automatic tools can’t guess what it is you want to do.

There’s a problem in a lot of code called a resource leak. The classic example is memory leaks in C: the code asks for, and gets, a chunk of memory. But if you don’t free the memory when you’re done with it, then your program will get larger and larger — like a coffee table where a new magazine is added every month but none are ever taken away — until eventually the machine runs out of memory.

These days, languages keep track of memory for you, so it’s easier to avoid memory leaks than it used to be. But the best way I’ve found to manage them is: when you allocate memory (or some other resource), plan to release it when you’re done.

The same principle applies to any resource: if you read or write a file, you’ll need a file handle. If you never close them, they’ll keep lying around, and you’ll eventually run out. So plan ahead, and free the resource as soon as you’ve alocated it:

Once you’ve written

open INFILE, "<", "/path/to/myfile";

go ahead and immediately write the code to close that file:

open INFILE, "<", "/path/to/myfile";
close INFILE;

and only then write the code to do stuff with the file:

open INFILE, "<", "/path/to/myfile";
while ()
{
	print "hello\n" if /foo/;
}
close INFILE;

The corollary of this is, if you’ve written the open but aren’t sure where to put the close, then you may want to take a look at the structure of your code, and refactor it.

This same principle applies in many situations: when you open a connection to a remote web server, database server, etc., immediately write the code to close the connection. If you’re writing HTML, and you’ve written <foo>, immediately write the corresponding </foo>. If you’ve sent off an asynchronous AJAX request, figure out where you’re going to receive the reply. When you throw an exception, decide where you’re going to catch it.

And only then write the meat of the code, the stuff that goes between the opening and closing code.

As I said, I originally came across this as a tip for avoiding memory leaks. But I’ve found that doing things this way forces me to be mindful of the structure of my code, and avoid costly surprises down the line.

Nigerian Scammers Are Good People

Via Slashdot comes an IEEE Spectrum article about a new scam from Nigeria. In brief, instead of asking you for money directly, they redirect your business email. They wait until someone orders something from your company, then rewrites the bank routing numbers and such so that the client sends money to the scammers’ account instead of yours.

So far, so bad. Technically interesting, ethically very bad. The moral of the story, as always, is be careful where you type your password, and if something looks hinky, think about it.

But then there’s this part:

Bettke and Stewart estimate the group they studied has at least 30 members and is likely earning a total of about $3 million a year from the thefts. The scammers appear to be “family men” in their late 20s to 40s who are well-respected, church-going figures in their communities. “They’re increasing the economic potential of the region they’re living in by doing this, and I think they feel somewhat of a duty to do this,” Stewart says.

Let’s just toss that on the pile marked “Religion doesn’t make people more moral”, shall we?

If You Use Unix, Use Version Control

If you’ve used Unix (or Linux. This applies to Linux, and MacOS X, and probably various flavors of Windows as well), you’ve no doubt found yourself editing configuration files with a text editor. This is especially true if you’ve been administering a machine, either professionally or because you got roped into doing it.

And if you’ve been doing it for more than a day or two, you’ve made a mistake, and wished you could undo it, or at least see what things looked like before you started messing with them.

This is something that programmers have been dealing with for a long time, so they’ve developed an impressive array of tools to allow you to keep track of how a file has changed over time. Most of them have too much overhead for someone who doesn’t do this stuff full-time. But I’m going to talk about RCS, which is simple enough that you can just start using it.

Most programmers will tell you that RCS has severe limitations, like only being able to work on one file at a time, rather than a collection of files, that makes it unsuitable for use in all but a few special circumstances. Thankfully, Unix system administration happens to be one of those circumstances!

What’s version control?

Basically, it allows you to track changes to a file, over time. You check a file in, meaning that you want to keep track of its changes. Periodically, you check it in again, which is a bit like bookmarking a particular version so you can come back to it later. And you can check a file out, that is, retrieve it from the history archive. Or you can compare the file as it is now to how it looked one, five, or a hundred versions ago.

Note that RCS doesn’t record any changes unless you tell it to. That means that you should get into the habit of checking in your changes when you’re done messing with a file.

Starting out

Let’s create a file:

# echo "first" > myfile

Now let’s check it in, to tell RCS that we want to track it:

# ci -u myfile
myfile,v <-- myfile
enter description, terminated with single '.' or end of file:
NOTE: This is NOT the log message!
>> This is a test file
>> .
initial revision: 1.1
done

ci stands for “check in”, and is RCS’s tool for checking files in. The -u option says to unlock it after checking in.

Locking is a feature of RCS that helps prevent two people from stepping on each other’s toes by editing a file at the same time. We’ll talk more about this later.

Note that I typed in This is a test file. I could have given a description on multiple lines if I wanted to, but usually you want to keep this short: “DNS config file” or “Login message”, or something similar.

End the description with a single dot on a line by itself.

You’ll notice that you now have a file called myfile,v. That’s the history file for myfile.

Since you probably don’t want ,v files lying around cluttering the place up, know that if there’s a directory called RCS, the RCS utilities will look for ,v history files in that directory. So before we get in too deep, let’s create an RCS directory:

# mkdir RCS

Now delete myfile and start from scratch, above.

Done? Good. By the way, you could also have cheated and just moved the ,v file into the RCS directory. Now you know for next time.

Making a change

All right, so now you want to make a change to your file. This happens in three steps:

  1. Check out the file and lock it.
  2. Make the change(s)
  3. Check in your changes and unlock the file.

Check out the file:

# co -l myfile
RCS/myfile,v --> myfile
revision 1.1 (locked)
done

co is RCS’s check-out utility. In this case, it pulls the latest version out of the history archive, if it’s not there already.

The -l (lower-case ell) flag says to lock the file. This helps to prevent other people from working on the file at the same time as you. It’s still possible for other people to step on your toes, especially if they’re working as root and can overwrite anything, but it makes it a little harder. Just remember that co is almost always followed by -l.

Now let’s change the file. Edit it with your favorite editor and replace the word “first” with the word “second”.

If you want to see what has changed between the last version in history and the current version of the file, use rcsdiff:

# rcsdiff -u myfile
===================================================================
RCS file: RCS/myfile,v
retrieving revision 1.1
diff -u -r1.1 myfile
--- myfile 2016/06/07 20:18:12 1.1
+++ myfile 2016/06/07 20:32:38
@@ -1 +1 @@
-first
+second

The -u option makes it print the difference in “unified-diff” format, which I find more readable than the other possibilities. Read the man page for more options.

In unified-diff format, lines that were deleted are preceded with a minus sign, and lines that were added are preceded by a plus sign. So the above is saying that the line “first” was removed, and “second” was added.

Finally, let’s check in your change:

# ci -u myfile
RCS/myfile,v <-- myfile
new revision: 1.2; previous revision: 1.1
enter log message, terminated with single '.' or end of file:
>> Updated to second version.
>> .
done

Again, we were prompted to list the changes we made to the file (with a dot on a line by itself to mark the end of our text). You’ll want to be concise yet descriptive in this text, because these are notes you’re making for your future self when you want to go back and find out when and why a change was made.

Viewing a file’s history

Use the rlog command to see a file’s history:

# rlog myfile

RCS file: RCS/myfile,v
Working file: myfile
head: 1.2
branch:
locks: strict
access list:
symbolic names:
keyword substitution: kv
total revisions: 2; selected revisions: 2
description:
Test file.
----------------------------
revision 1.2
date: 2016/06/07 20:36:52; author: arensb; state: Exp; lines: +1 -1
Made a change.
----------------------------
revision 1.1
date: 2016/06/07 20:18:12; author: arensb; state: Exp;
Initial revision
=============================================================================

In this case, there are two revisions: 1.1, with the log message “Initial revision”, and 1.2, with the log “Made a change.”.

Undoing a change

You’ve already see rlog, which shows you a file’s history. And you’ve seen one way to use rcsdiff.

You can also use either one or two -rrevision-number arguments, to see the difference between specific revisions:

# rcsdiff -u -r1.1 myfile

will show you the difference between revision 1.1 and what’s in the file right now, and

# rcsdiff -u -r1.1 -r1.2 myfile

will show you the difference between revisions 1.1 and 1.2.

(Yes, RCS will just increment the second number in the revision number, so after a while you’ll be editing revision 1.2486 of the file. Getting to revision 2.0 is an advanced topic that we won’t cover here.)

With the tools you already have, the simplest way to revert an unintended change to a file is simply to see what the file used to look like, and copy-paste that into a new revision.

Once you’re comfortable with that, you can read the manual and read up on things like deleting revisions with rcs -o1.2 myfile.

Checking in someone else’s changes

You will inevitably run into cases where someone changes your file without going through RCS. Either it’ll be a coworker managing the same system who didn’t notice the ,v file lying around, or else you’ll forget to check in your changes after making changes.

Here’s a simple way to see whether someone (possibly you) has made changes without your knowledge:

# co -l myfile
RCS/myfile,v --> myfile
revision 1.2 (locked)
writable myfile exists; remove it? [ny](n):

In this case, either you forgot to check in your changes, or else someone made the file writable with chmod, then (possibly) edited it.

In the former case, see what you did with rcsdiff, check in your changes, then check the file out again to do what you were going to do.

The latter case requires a bit more work, because you don’t want to lose your coworker’s changes, even though they bypassed version control.

  1. Make a copy of the file
  2. Check out the latest version of the file.
  3. Overwrite that file with your coworker’s version.
  4. Check those changes in.
  5. Check the file out and make your changes.
  6. Have a talk with your coworker about the benefits of using version control..

You already know, from the above, how to do all of this. But just to recap:

Move the file aside:

# mv myfile myfile.new

Check out the latest version:

# co -l myfile

Overwrite it with your coworker’s changes:

# mv myfile.new myfile

Check in those changes:

# ci -u myfile
RCS/myfile,v <-- myfile
new revision: 1.3; previous revision: 1.2
enter log message, terminated with single '.' or end of file:
>> Checking in Bob's changes:.
>> Route around Internet damage.
>> .
done

That should be enough to get you started. Play around with this, and I’m sure you’ll find that this is a huge improvement over what you’ve probably been using so far: a not-system of making innumerable copies of files when you remember to, with names like “file.bob”, “file.new”, “file.new2”, “file.newer”, and other names that mean nothing to you a week later.

There Are Days When I Hate XML

…and days when I really hate XML.

In this case, I have an XML document like

<?xml version="1.0" ?>
<foo xmlns="http://some.org/foo">
  <thing id="first"/>
  <thing id="second"/>
  <thing id="third"/>
</foo>

and I want to get things out of it with XPath.

But I couldn’t manage to select things exactly, such as the <foo> element at the top. You’d think “/foo” would do it, but it didn’t.

Eventually I found out that the problem is the xmlns=”...” attribute. It looks perfectly normal, saying that if you have <thing> without a prefix (like “<ns:thing>”), then it’s in the “http://some.org/foo” namespace.

However, in XPath, if you specify “ns:thing”, it means “a thing element in whichever namespace the ns prefix corresponds to”. BUT “thing” means “a thing element that’s not in a namespace”.

So how do you specify an element that’s the empty-string namespace, as above? The obvious way would be to select “:thing”, but that doesn’t work. Too simple, I suppose. Maybe that gets confused with CSS pseudo-selectors or something.

No, apparently the thing you need to do is to invent a prefix for the standard elements of the file you’re parsing. That is, add “ns” as another prefix that maps onto “http://some.org/foo” and then select “ns:thing”. There are different ways of doing this, depending which library you’re using to make XPath queries, but still, it seems like a giant pain in the ass.

Would You Get Off the Plane?

There’s an old joke about an instructor who asks, “if you were on a plane, and found out that the inflight systems were controlled by a beta version of software written by your team, would you get off the plane?” Most of the students pale and say yes. One person says, “No. If it were written by my team, it wouldn’t make it to the runway.”

The implication is that most software is far crappier than people realize, and that programmers are keenly aware of this. But I’d like to present another reason for getting off the plane:

Even if my team writes good code, the fact that it’s still in beta means that it hasn’t been fully tested. That means that there must have been some kind of snafu at the airline and/or the software company for untested software to be put in control of a flight with ordinary passengers. And if management permitted this error, what other problems might there be? Maybe the fuel tanks are only half-full. Maybe the landing gear hasn’t been inspected.