Removing Magic

So this was one of those real-life mysteries.

I like crossword puzzles. And in particular, I like indie crossword puzzles, because they tend to be more inventive and less censored than ones that run in newspapers. So I follow several crossword designers on Twitter.

Yesterday, one of them mentioned that people were having a problem with his latest puzzle. I tried downloading it on my iPad, and yeah, it wouldn’t open inĀ Across Lite. Other people were saying that their computers thought the file was in PostScript format. I dumped the HTTP header with

lynx -head -dump http://url.to/crossword.puz

and found the header

Content-type: application/postscript

which was definitely wrong for a .puz file. What’s more, other .puz files in the same directory were showing up as

Content-type: application/octet-stream

as they should.

I mentioned all this to the designer, which led to us chatting back and forth to see what the problem was. And eventually I had the proverbial aha moment.

.puz files begin with a two-byte checksum. In this particular case, they turned out to be 0x25 and 0x21. Or, in ASCII, “%!“. And as it turns out, PostScript files begin with “%!“, according to Unix’s magic file.

So evidently what happened was: the hosting server didn’t have a default type for files ending in .puz. Not terribly surprising, since that’s not really a widely-used format. So since it didn’t recognize the filename extension, it did the next-best thing and looked at the first few bytes of the file (probably with file or something equivalent) to see if it could make an educated guess. It saw the checksum as “%!” and decided it was a PostScript file.

The obvious fix was to change something about the file: rewrite a clue, add a note, change the copyright statement, anything to change the contents of the file, and thus the checksum.

The more permanent solution was to add a .htaccess file to the puzzle file directory, with

AddType application/octet-stream .puz

assuming that the hosting provider used Apache or something compatible.

This didn’t take immediately; I think the provider cached this metadata for a few hours. But eventually things cleared up.

I’m not sure what the lesson is, here. “Don’t use two-byte checksums at offset 0”, maybe?