Ansible As Scripting Language

Ansible is billed as a configuration manager similar to Puppet or cfengine. But it occurred to me recently that it’s really (at least) two things:

  1. A configuration manager.
  2. A scripting language for the machine room.

Mode 1 is the normal, expected one: here’s a description; now make the machine(s) look like the description. Same as Puppet.

Mode 2 is, I think, far more difficult to achieve in Puppet than it is in Ansible. This is where you make things happen in a particular order, not just on one machine (you’d use /bin/sh for that), but on multiple hosts.

For instance, adding a new user might involve:

  1. Generate a random password on localhost.
  2. Add a user on the Active Directory server.
  3. Create and populate a home directory on the home directory server.
  4. Add a stub web page on the web server.

This is something I’d rather write as an Ansible play than as a Puppet manifest or module.

Which brings me to my next point: it seems that for Mode 1, you want to think mainly in terms of roles, while for Mode 2, you’ll want to focus on playbooks. A role is designed to encapsulate the notion of “here’s a description of what a machine should look like, and the steps to take, if any, to make it match that description”, while a playbook is naturally organized as “step 1; step 2; step 3; …”.

These are, of course, just guidelines. And Mode 2 suffers from the fact that YAML is not a good way to express programming concepts.  But I find this to be a useful way of thinking about what I’m doing in Ansible.

Ansible: Running Commands in Dry-Run Mode in Check Mode

Say you have an Ansible playbook that invokes a command. That command executes when you run ansible normally, and doesn’t execute at all when you run ansible in check mode.

But a lot of commands, like rsync, have a -n or --dry-run argument that shows what would be done without actually making any changes. So it would be nice to combine the two.

Let’s start with a simple playbook that copies some files with rsync:

- name: Copy files
  tasks:
    - name: rsync the files
      command: >-
        rsync
        -avi
        /tmp/source/
        /tmp/destination/
  hosts: localhost
  become: no
  gather_facts: no

When you execute this playbook with ansible-playbook foo.yml, rsync runs, and when you run it in check mode, with ansible-playbook -C foo.yml, rsync doesn’t run.

This is inconvenient, because we’d like to see what rsync would have done before we commit to doing it. So let’s force it to run even in check mode, with check_mode: no, but also run rsync in dry-run mode, so we don’t make changes while we’re still debugging the playbook:

- name: Copy files
  tasks:
    - name: rsync the files
      command: >-
        rsync
        --dry-run
        -avi
        /tmp/source/
        /tmp/destination/
      check_mode: no
  hosts: localhost
  become: no
  gather_facts: no

Now we just need to remember to remove the --dry-run argument when we’re ready to run it for real, and to add it back when we need to debug the playbook again.

Or we could do the smart thing, and try to add that argument only when we’re running Ansible in check mode. Thankfully, there’s a variable for that: ansible_check_mode, so we can set the argument dynamically:

- name: Copy files
  tasks:
    - name: rsync the files
      command: >-
        rsync
        {{ '--dry-run' if ansible_check_mode else '' }}
        -avi
        /tmp/source/
        /tmp/destination/
      check_mode: no
  hosts: localhost
  become: no
  gather_facts: no

You can check that this works with ansible-playbook -v -C foo.yml and ansible-playbook -v foo.yml.
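The same “add --dry-run only in check mode” trick carries over to plain shell scripts. Here’s a minimal sketch of just the command-line construction; the rsync_cmd function and its “check” argument are hypothetical stand-ins for the playbook task and ansible_check_mode, and the function only prints the command so you can see what it would run:

```shell
#!/bin/sh
# Hypothetical helper: build the rsync command line, adding --dry-run only
# when called as "rsync_cmd check" (standing in for ansible_check_mode).
rsync_cmd() {
    if [ "${1:-}" = check ]; then
        dry_run=--dry-run
    else
        dry_run=
    fi
    # $dry_run is deliberately unquoted: when it's empty, it vanishes entirely,
    # much like the empty string produced by the Jinja expression above.
    echo rsync $dry_run -avi /tmp/source/ /tmp/destination/
}

rsync_cmd          # prints: rsync -avi /tmp/source/ /tmp/destination/
rsync_cmd check    # prints: rsync --dry-run -avi /tmp/source/ /tmp/destination/
```

Once the output looks right, replace echo with a real invocation.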

Removing Magic

So this was one of those real-life mysteries.

I like crossword puzzles. And in particular, I like indie crossword puzzles, because they tend to be more inventive and less censored than ones that run in newspapers. So I follow several crossword designers on Twitter.

Yesterday, one of them mentioned that people were having a problem with his latest puzzle. I tried downloading it on my iPad, and yeah, it wouldn’t open in Across Lite. Other people were saying that their computers thought the file was in PostScript format. I dumped the HTTP header with

lynx -head -dump http://url.to/crossword.puz

and found the header

Content-type: application/postscript

which was definitely wrong for a .puz file. What’s more, other .puz files in the same directory were showing up as

Content-type: application/octet-stream

as they should.

I mentioned all this to the designer, which led to us chatting back and forth to see what the problem was. And eventually I had the proverbial aha moment.

.puz files begin with a two-byte checksum. In this particular case, the two bytes turned out to be 0x25 and 0x21. Or, in ASCII, “%!”. And as it turns out, PostScript files begin with “%!”, according to the magic file that the Unix file command uses.
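If you want to see this for yourself, dd and od will show you the first two bytes of a file in hex. A small sketch; first_bytes is a made-up helper name and crossword.puz is a placeholder:

```shell
#!/bin/sh
# Print the first two bytes of a file in hex, e.g. "25 21" for "%!".
first_bytes() {
    # dd copies exactly 2 bytes; od -An -tx1 prints them as hex bytes
    # without the offset column.
    dd if="$1" bs=1 count=2 2>/dev/null | od -An -tx1
}

# Usage:
#   first_bytes crossword.puz
```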

So evidently what happened was: the hosting server didn’t have a default type for files ending in .puz. Not terribly surprising, since that’s not really a widely-used format. So since it didn’t recognize the filename extension, it did the next-best thing and looked at the first few bytes of the file (probably with file or something equivalent) to see if it could make an educated guess. It saw the checksum as “%!” and decided it was a PostScript file.

The obvious fix was to change something about the file: rewrite a clue, add a note, change the copyright statement, anything to change the contents of the file, and thus the checksum.

The more permanent solution was to add a .htaccess file to the puzzle file directory, with

AddType application/octet-stream .puz

assuming that the hosting provider used Apache or something compatible.

This didn’t take immediately; I think the provider cached this metadata for a few hours. But eventually things cleared up.
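To check whether the new type has taken effect, you can look at the headers again. Here’s a sketch using curl -sI (which sends a HEAD request and prints the response headers) instead of lynx. content_type is a hypothetical helper, split out so it can be tried on canned input, and the URL is the same placeholder as above:

```shell
#!/bin/sh
# Print the value of the Content-Type header from HTTP headers on stdin.
content_type() {
    # Strip the CR from CRLF line endings, then match the header name
    # case-insensitively (servers differ: "Content-type", "content-type", ...).
    tr -d '\r' | sed -n 's/^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Tt][Yy][Pp][Ee]: *//p'
}

# Usage (needs network access):
#   curl -sI http://url.to/crossword.puz | content_type
```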

I’m not sure what the lesson is, here. “Don’t use two-byte checksums at offset 0”, maybe?

If You Use Unix, Use Version Control

If you’ve used Unix (and this applies to Linux, Mac OS X, and probably various flavors of Windows as well), you’ve no doubt found yourself editing configuration files with a text editor. This is especially true if you’ve been administering a machine, either professionally or because you got roped into doing it.

And if you’ve been doing it for more than a day or two, you’ve made a mistake, and wished you could undo it, or at least see what things looked like before you started messing with them.

This is something that programmers have been dealing with for a long time, so they’ve developed an impressive array of tools to allow you to keep track of how a file has changed over time. Most of them have too much overhead for someone who doesn’t do this stuff full-time. But I’m going to talk about RCS, which is simple enough that you can just start using it.

Most programmers will tell you that RCS has severe limitations, like only being able to work on one file at a time rather than a collection of files, that make it unsuitable for use in all but a few special circumstances. Thankfully, Unix system administration happens to be one of those circumstances!

What’s version control?

Basically, it allows you to track changes to a file, over time. You check a file in, meaning that you want to keep track of its changes. Periodically, you check it in again, which is a bit like bookmarking a particular version so you can come back to it later. And you can check a file out, that is, retrieve it from the history archive. Or you can compare the file as it is now to how it looked one, five, or a hundred versions ago.

Note that RCS doesn’t record any changes unless you tell it to. That means that you should get into the habit of checking in your changes when you’re done messing with a file.

Starting out

Let’s create a file:

# echo "first" > myfile

Now let’s check it in, to tell RCS that we want to track it:

# ci -u myfile
myfile,v <-- myfile
enter description, terminated with single '.' or end of file:
NOTE: This is NOT the log message!
>> This is a test file
>> .
initial revision: 1.1
done

ci stands for “check in”, and is RCS’s tool for checking files in. The -u option says to unlock it after checking in.

Locking is a feature of RCS that helps prevent two people from stepping on each other’s toes by editing a file at the same time. We’ll talk more about this later.

Note that I typed in This is a test file. I could have given a description on multiple lines if I wanted to, but usually you want to keep this short: “DNS config file” or “Login message”, or something similar.

End the description with a single dot on a line by itself.

You’ll notice that you now have a file called myfile,v. That’s the history file for myfile.

Since you probably don’t want ,v files lying around cluttering the place up, know that if there’s a directory called RCS, the RCS utilities will look for ,v history files in that directory. So before we get in too deep, let’s create an RCS directory:

# mkdir RCS

Now delete both myfile and myfile,v, and start again from the top.

Done? Good. By the way, you could also have cheated and just moved the ,v file into the RCS directory. Now you know for next time.
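ci doesn’t have to prompt you, which is handy when you’re putting a lot of files under version control at once: the -t- option takes the description on the command line, and -m the log message. A sketch; rcs_track is a hypothetical helper name, and it assumes the file is in the current directory:

```shell
#!/bin/sh
# Put a file under RCS control without any interactive prompts.
#   -u       leave an unlocked, read-only working copy behind
#   -t-TEXT  use TEXT as the file's description (note the leading dash)
#   -mTEXT   use TEXT as the log message for the first revision
rcs_track() {
    file=$1
    description=$2
    mkdir -p RCS   # keep the ,v history files out of the way
    ci -u -t-"$description" -m"Initial revision" "$file"
}

# Usage:
#   echo "first" > myfile
#   rcs_track myfile "This is a test file"
```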

Making a change

All right, so now you want to make a change to your file. This happens in three steps:

  1. Check out the file and lock it.
  2. Make the change(s).
  3. Check in your changes and unlock the file.

Check out the file:

# co -l myfile
RCS/myfile,v --> myfile
revision 1.1 (locked)
done

co is RCS’s check-out utility. In this case, it pulls the latest revision out of the history archive and writes it to the working file, myfile.

The -l (lower-case ell) flag says to lock the file. This helps to prevent other people from working on the file at the same time as you. It’s still possible for other people to step on your toes, especially if they’re working as root and can overwrite anything, but it makes it a little harder. Just remember that co is almost always followed by -l.

Now let’s change the file. Edit it with your favorite editor and replace the word “first” with the word “second”.

If you want to see what has changed between the last version in history and the current version of the file, use rcsdiff:

# rcsdiff -u myfile
===================================================================
RCS file: RCS/myfile,v
retrieving revision 1.1
diff -u -r1.1 myfile
--- myfile 2016/06/07 20:18:12 1.1
+++ myfile 2016/06/07 20:32:38
@@ -1 +1 @@
-first
+second

The -u option makes it print the difference in “unified-diff” format, which I find more readable than the other possibilities. Read the man page for more options.

In unified-diff format, lines that were deleted are preceded with a minus sign, and lines that were added are preceded by a plus sign. So the above is saying that the line “first” was removed, and “second” was added.

Finally, let’s check in the change:

# ci -u myfile
RCS/myfile,v <-- myfile
new revision: 1.2; previous revision: 1.1
enter log message, terminated with single '.' or end of file:
>> Updated to second version.
>> .
done

Again, we were prompted to list the changes we made to the file (with a dot on a line by itself to mark the end of our text). You’ll want to be concise yet descriptive in this text, because these are notes you’re making for your future self when you want to go back and find out when and why a change was made.
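Since you’ll be going through these steps over and over, you might wrap them in a shell function. This is a sketch, not a standard RCS tool: rcsedit is a hypothetical name, and it assumes $EDITOR is set (falling back to vi):

```shell
#!/bin/sh
# Hypothetical helper wrapping the lock / edit / review / check-in cycle.
rcsedit() {
    file=$1
    co -l "$file" || return 1    # 1. check out and lock
    ${EDITOR:-vi} "$file"        # 2. make the change(s)
    rcsdiff -u "$file"           # review; exit status 1 just means "there are differences"
    ci -u "$file"                # 3. check in and unlock (prompts for a log message)
}

# Usage:
#   rcsedit myfile
```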

Viewing a file’s history

Use the rlog command to see a file’s history:

# rlog myfile

RCS file: RCS/myfile,v
Working file: myfile
head: 1.2
branch:
locks: strict
access list:
symbolic names:
keyword substitution: kv
total revisions: 2; selected revisions: 2
description:
Test file.
----------------------------
revision 1.2
date: 2016/06/07 20:36:52; author: arensb; state: Exp; lines: +1 -1
Made a change.
----------------------------
revision 1.1
date: 2016/06/07 20:18:12; author: arensb; state: Exp;
Initial revision
=============================================================================

In this case, there are two revisions: 1.1, with the log message “Initial revision”, and 1.2, with the log “Made a change.”.
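rlog can also answer a question that comes up a lot on shared machines: which files are checked out and locked right now? Its -L option skips RCS files that have no locks, and -R prints only the RCS file name. A sketch, assuming the history files live in ./RCS; locked_files is a hypothetical helper name:

```shell
#!/bin/sh
# List RCS history files that currently have a lock set, i.e. files someone
# has checked out with co -l and not yet checked back in.
#   -L  ignore RCS files with no locks
#   -R  print only the name of each RCS file
locked_files() {
    rlog -L -R RCS/*,v
}

# Usage:
#   locked_files
```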

Undoing a change

You’ve already seen rlog, which shows you a file’s history. And you’ve seen one way to use rcsdiff.

You can also use either one or two -rrevision-number arguments, to see the difference between specific revisions:

# rcsdiff -u -r1.1 myfile

will show you the difference between revision 1.1 and what’s in the file right now, and

# rcsdiff -u -r1.1 -r1.2 myfile

will show you the difference between revisions 1.1 and 1.2.

(Yes, RCS will just increment the second number in the revision number, so after a while you’ll be editing revision 1.2486 of the file. Getting to revision 2.0 is an advanced topic that we won’t cover here.)

With the tools you already have, the simplest way to revert an unintended change is to look at what the file used to look like, and copy and paste that into a new revision.

Once you’re comfortable with that, you can read the manual and learn about things like deleting revisions with rcs -o1.2 myfile.
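Rather than copy-pasting, you can let co do the copying: co -p prints a revision to standard output instead of writing the working file. Here’s a sketch of a revert that keeps the full history by checking the old contents in as a new revision. rcs_revert is a hypothetical name, and it assumes you don’t already have the file locked with uncommitted edits (otherwise co -l will ask about the writable file):

```shell
#!/bin/sh
# Revert a file to an earlier revision by checking that revision's contents
# in as a new revision; nothing is deleted from the history.
#   co -pREV  print revision REV to standard output
rcs_revert() {
    file=$1
    rev=$2
    co -l "$file" || return 1                  # lock the latest revision
    co -p"$rev" "$file" > "$file" || return 1  # overwrite with the old contents
    ci -u -m"Revert to revision $rev" "$file"  # record that as a new revision
}

# Usage:
#   rcs_revert myfile 1.1
```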

Checking in someone else’s changes

You will inevitably run into cases where someone changes your file without going through RCS. Either it’ll be a coworker managing the same system who didn’t notice the ,v file lying around, or you’ll forget to check in your own changes after editing the file.

Here’s a simple way to see whether someone (possibly you) has made changes without your knowledge:

# co -l myfile
RCS/myfile,v --> myfile
revision 1.2 (locked)
writable myfile exists; remove it? [ny](n):

In this case, either you forgot to check in your changes, or else someone made the file writable with chmod, then (possibly) edited it.

In the former case, see what you did with rcsdiff, check in your changes, then check the file out again to do what you were going to do.

The latter case requires a bit more work, because you don’t want to lose your coworker’s changes, even though they bypassed version control.

  1. Move the file aside.
  2. Check out the latest version of the file.
  3. Overwrite that file with your coworker’s version.
  4. Check those changes in.
  5. Check the file out and make your changes.
  6. Have a talk with your coworker about the benefits of using version control.

You already know, from the above, how to do all of this. But just to recap:

Move the file aside:

# mv myfile myfile.new

Check out the latest version:

# co -l myfile

Overwrite it with your coworker’s changes:

# mv myfile.new myfile

Check in those changes:

# ci -u myfile
RCS/myfile,v <-- myfile
new revision: 1.3; previous revision: 1.2
enter log message, terminated with single '.' or end of file:
>> Checking in Bob's changes:
>> Route around Internet damage.
>> .
done

That should be enough to get you started. Play around with this, and I’m sure you’ll find that this is a huge improvement over what you’ve probably been using so far: a not-system of making innumerable copies of files when you remember to, with names like “file.bob”, “file.new”, “file.new2”, “file.newer”, and other names that mean nothing to you a week later.
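If you look after a whole directory of configuration files, you can check all of them for changes that never made it into RCS at once. This sketch relies on rcsdiff’s exit status (0 for no differences, 1 for differences, 2 for trouble) and assumes the history files are in ./RCS; unchecked_changes is a hypothetical helper name:

```shell
#!/bin/sh
# Report working files in the current directory that differ from their
# latest checked-in revision, e.g. edits someone forgot to check in.
unchecked_changes() {
    for rcsfile in RCS/*,v; do
        [ -e "$rcsfile" ] || continue      # glob matched nothing: no RCS files
        file=$(basename "$rcsfile" ,v)
        [ -e "$file" ] || continue         # not checked out; nothing to compare
        rcsdiff -q "$file" >/dev/null 2>&1
        case $? in
            1) echo "$file: changed since last check-in" ;;
            2) echo "$file: rcsdiff failed" >&2 ;;
        esac
    done
}

# Usage:
#   cd /etc && unchecked_changes
```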

Disk Hack

One of the things I enjoy about Unix system administration is the MacGyver aspect of it: when something goes pear-shaped, and your preferred tools aren’t available because they’re on the disk that just died, or on the other side of the pile of smoking ashes that used to be a router, you have to figure out how to recover with what you’ve got left. It’s a bit like that scene in Apollo 13 when they realize that the space capsule has a round hole for the air filter, but only square filters, and the engineer dumps all the equipment the astronauts have available onto the table, and says “We’ve got to find a way to make this fit into the hole for this, using nothing but that”.

So anyway, my mom’s Mac recently died. And, naturally, there are no available backups. But I said I’d do what I could, and took the machine home.

I’m glad we wrote off the old machine as a total loss, since it was (the tense should give you an idea of what’s coming) an iMac, one of those compact everything-in-the-monitor models that tries oh-so-hard to fit everything into as small a space as possible. Of course, this compactness means that there’s no room to do anything: the disk is behind the LCD display, wedged in a tight slot between the graphics card and the DVD drive. And the whole thing is wrapped in (I kid you not) foil, most likely to help control airflow. So basically I wound up ripping things out with little or no grace or elegance. If it wasn’t totaled then, it certainly is now.

At any rate, that left me with a disk that, thankfully, turned out to be unharmed. The next question was how to hook it up. I was pretty sure it had an HFS or HFS+ filesystem, which meant that the obvious thing to do would be to put it in a Mac to read. But I don’t have a Mac that I could put a second internal drive in. I toyed briefly with the idea of finding an enclosure and whatever conversion hardware would be necessary to turn an internal SATA drive into an external USB drive, but figured that was too hard for a one-shot. Then I found that my Linux box has hfs and hfsplus filesystem kernel modules, so hey.

(Of course, I don’t know how stable the Linux HFS driver is, so I figured it’d be best to write-protect the disk. At which point I discovered that this model only has one hardware jumper slot, and it doesn’t write-protect the disk. Fuck you very much, Maxtor/Apple.)

The Linux HFS driver turned out to be good enough for reading, and I could mount the disk and read files, so yay. The next question was how to get the files from there to a laptop that I could bring over to my parents’. This is complicated by the fact that on HFS, a file isn’t just a stream of bytes, the way it is in Unix; it has two “forks”: the data fork contains the actual data of the file (e.g., a JPEG image), and the resource fork contains metadata about the file, such as its icon, the application that should open the file by default, and so on. I didn’t want to lose that if I could help it.

The way the Linux HFS driver deals with resource forks is to create virtual (or transient, or whatever you want to call them) files: if you open myfile, you’ll get the data fork, which looks just like any file. But if it has a resource fork, you can also open myfile/rsrc and read the contents of that. This meant that in the worst case, I could 1) copy over the data forks to a directory on the Mac, then 2) find which files have resource forks, with something like

find /mnt -type f |
while IFS= read -r filename; do
    # A non-empty $filename/rsrc means the file has a resource fork
    if [ -s "$filename/rsrc" ]; then
        echo "$filename"
    fi
done

and 3) somehow re-graft the resource forks onto the files on the Mac end. But that seemed like a lot of work.

Apple software is often distributed as .dmg (disk image) files, which are mounted as virtual disks. I figured that’d be the obvious way to package up the contents of the disk. So I used dd to copy the raw disk device (/dev/sdb rather than /dev/sdb2, which was the mounted partition, in order to get the entire disk, including the partition map and such), but when I tried to mount the result, it didn’t work, so presumably there’s more to a .dmg file than just the raw disk data. I tried a couple of variations on that theme, but without success.

(In passing, I also noticed that rsync supports “extended attributes” on both my Mac and my Linux box, so I tried using that to copy files (with resource forks) over, only to find that the two implementations use different options to say “turn on extended attributes”, so the client couldn’t start the remote server correctly.)

Eventually, I realized that dd could be used not just to read a disk image, but to write one. Yes, I said above that reading from the disk and writing to a file didn’t produce a usable disk image. But I also said that .dmg files are mounted like disks, and that implies that there has to be a device to mount.

So on the Mac, I created a disk image file with hdiutil create, then opened it with the Disk Utility. Forcing “Verify disk” made the Disk Utility mount the image on /dev/disk2s9, just before telling me that there was no usable filesystem on the disk image. That was fine; all I wanted was for it to create a /dev/disk* device that I could write to. Then I was able to

ssh linuxbox 'dd if=/dev/sdb bs=2M' | dd bs=2048k of=/dev/disk2

to transfer the raw contents of the disk to the “disk data” portion of the disk image.

To my slight surprise, this actually worked. Yes, I had to repair the disk image, but from the log messages, that appears to be because some of the superblock copies were missing (the disk is 120 GB, but I only created a 32 GB image).

The remaining problem would be getting the disk image from my locked-down(-ish) laptop to my mom’s new vanilla Mac, but I don’t think I’ll bother. It’ll be a lot easier, and better in the long run, to put the old disk image onto an external drive that can then double as a backup disk, so I don’t have to do this again.

Because while it can be fun to solve a puzzle and figure out how to fix something with suboptimal tools, there’s also wisdom in avoiding getting into such situations in the first place.

Bourne Shell Introspection

So I was thinking about how to refactor our custom Linux and Solaris init scripts at work. The way FreeBSD does it is to have the scripts in /etc/rc.d define variables with the commands to execute, e.g.,

start_cmd='/usr/sbin/foobard'
stop_cmd='kill `cat /var/run/foobar.pid`'

run_rc_command "$1"

where $1 is “start”, “stop”, or whatever, and run_rc_command is a function loaded from an external file. It can check whether $stop_cmd is defined, and if not, take some default action.
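Here’s a minimal sketch of that “use the variable if it’s set, otherwise do something sensible” logic. run_cmd is a hypothetical stand-in, much simpler than FreeBSD’s real run_rc_command in /etc/rc.subr, and its default action just prints a message:

```shell
#!/bin/sh
# Hypothetical dispatcher: if the daemon's script set ${action}_cmd,
# run it; otherwise fall back to a default.
run_cmd() {
    action=$1
    case $action in
        start|stop|restart|status) ;;
        *) echo "usage: run_cmd start|stop|restart|status" >&2; return 1 ;;
    esac
    # Indirect lookup: for "stop", this sets cmd to the value of $stop_cmd, if any.
    # The eval is safe because $action was checked against a fixed list above.
    eval "cmd=\${${action}_cmd:-}"
    if [ -n "$cmd" ]; then
        # eval so that backquotes and the like in the string are expanded,
        # e.g. stop_cmd='kill `cat /var/run/foobar.pid`'
        eval "$cmd"
    else
        echo "no ${action}_cmd defined; doing the default $action"
    fi
}

# Usage:
#   start_cmd='/usr/sbin/foobard'
#   run_cmd "$1"
```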

This is great and all, but I was wondering whether it would be possible to check whether a given shell function exists. That way, a common file could implement a generic structure for starting and stopping daemons, and the daemon-specific file could just set the specifics by defining do_start and do_stop functions.

The way to do this in Perl is to iterate over the symbol table of the package you’re looking for, and see whether each entry is a function. The symbol table for Foo::Bar is %Foo::Bar::; for the main package, it’s %::. Thus:

foreach my $k (sort keys %::)
{
	# $k is a symbol name; check whether a sub main::$k is defined
	if (defined &{"main::$k"})
	{
		print "$k is a function\n";
	}
}

sub test_x() {}
sub test_y() {}
sub test_z() {}

But I didn’t know how to do it in the Bourne shell.

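Enter type, which tells you whether a name is something the shell can run (a function, a builtin, or an external command), and sets its exit status accordingly: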

#!/bin/sh

# List of all known commands
STD_CMDS="start stop restart status verify"
MORE_CMDS="graceful something_incredibly_daemon_specific"

do_start="This is a string, not a function"

do_restart() {
	echo "I ought to restart something"
}

do_graceful() {
	echo "I am so fucking graceful"
}

for cmd in ${STD_CMDS} ${MORE_CMDS}; do
	if type "do_$cmd" >/dev/null 2>&1; then
		echo "* do_$cmd is defined"
	else
		echo "- do_$cmd is not defined"
	fi
done

And yes, this works not just in bash, but in the traditional, bourne-just-once shell, on every platform that I care about.
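One caveat: type succeeds for anything the shell can run, so if there happened to be a program or builtin called, say, do_status, the loop above would report it as defined. If you want functions only, one option is to look at what type says. POSIX doesn’t pin down the wording of type’s output, so the sketch below assumes the first line ends in “function” (bash prints “is a function”, dash “is a shell function”); treat it as something to verify on your own platforms. is_function is a hypothetical helper name:

```shell
#!/bin/sh
# Succeed only if $1 is a shell function, not a builtin or external command.
# Assumes the first line of type's output ends in "function", as in
# bash ("... is a function") and dash ("... is a shell function").
is_function() {
    type "$1" 2>/dev/null | sed -n 1p | grep -q 'function$'
}

do_restart() {
    echo "I ought to restart something"
}

is_function do_restart && echo "do_restart is a function"
is_function echo || echo "echo is not a function"
```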

So yay, it turns out that the Bourne shell has more introspection than I thought.