The government is going to have to come up with new ways to block out information in public documents. There was already a scandal when the State Department released a sensitive document as a PDF file in which only the image of the words was blacked out, not the selectable text layer underneath the image. Now a cryptography expert has treated blacked-out text as a “code” to be “cracked” and seems to have succeeded. The technique depends on measuring the width of the blacked-out area and then working out which words would occupy that same width in the document's font.
The program rejected all of the words that were not within three pixels of the length of the word that was probably under the blackened-out area in the document.
The software then reduced the number of possible words to just 7 from 1,530 by using semantic guidelines, including the grammatical context. The researchers selected the word “Egyptian” from the seven possible words, rejecting “Ukrainian” and “Ugandan,” because those countries would be less likely to have such information.
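The width-filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' program: the character widths in `TOY_WIDTHS` are made-up numbers, and the names `word_width` and `candidates` are hypothetical. Real widths would come from the metrics of the actual font at the document's size and resolution.

```python
# Hypothetical per-character widths in pixels (invented for illustration).
# Real values would be read from the font used in the document.
TOY_WIDTHS = {
    "i": 3, "l": 3, "t": 4, "r": 4, "s": 5, "a": 6, "e": 6,
    "m": 11, "E": 9, "U": 10,
}
DEFAULT_WIDTH = 7  # assumed width for any character not listed above


def word_width(word, widths=TOY_WIDTHS, default=DEFAULT_WIDTH):
    """Sum the per-character widths to estimate the rendered width of a word."""
    return sum(widths.get(ch, default) for ch in word)


def candidates(dictionary, redaction_width, tolerance=3):
    """Keep only the words whose estimated width is within `tolerance`
    pixels of the measured width of the blacked-out area."""
    return [
        word for word in dictionary
        if abs(word_width(word) - redaction_width) <= tolerance
    ]


if __name__ == "__main__":
    words = ["Egyptian", "Ukrainian", "Ugandan", "Iraqi", "Indonesian"]
    # Pretend the blacked-out area measured 51 pixels wide.
    print(candidates(words, redaction_width=51))
```

The second stage, picking among the surviving words by grammar and context, is a judgment step and is not modeled here.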
It seems that the government made things worse when it decided to switch from Courier to Times New Roman:
In January, the State Department required that its documents use a more modern font, Times New Roman, instead of Courier, Mr. Naccache said. Because Courier is a monospace font, in which all letters are of the same width, it is harder to decipher with the computer technique. There is no indication that the State Department knew that.
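To see why a monospace font gives the technique less to work with, here is a small sketch comparing the two cases. The pixel values and the function names are assumptions made up for illustration, not real Courier or Times New Roman metrics.

```python
# Invented widths for a proportional font; unlisted characters get `default`.
PROPORTIONAL_TOY = {"i": 3, "l": 3, "t": 4, "r": 4, "m": 11, "E": 9}


def monospace_width(word, cell_px=7):
    """In a monospace font every character occupies the same cell,
    so the width reveals only the number of characters."""
    return len(word) * cell_px


def proportional_width(word, widths=PROPORTIONAL_TOY, default=7):
    """In a proportional font the width depends on which letters appear."""
    return sum(widths.get(ch, default) for ch in word)


def distinct_widths(words, width_fn):
    """Count how many different widths a list of words produces."""
    return len({width_fn(w) for w in words})


if __name__ == "__main__":
    same_length = ["Egyptian", "American", "Japanese", "Canadian"]
    print(distinct_widths(same_length, monospace_width))
    print(distinct_widths(same_length, proportional_width))
```

Under these assumptions, every eight-letter word has the same monospace width, so a width filter cannot separate them, while the proportional widths spread the same words apart.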
While it is always nice to see hackers get one over on the government, there is reason to worry that this could lead to even less information being released to the public:
Experts on the Freedom of Information Act said they feared the computer technique might be used as an excuse by government agencies to release even more restricted versions of documents.