Leaking info

22 December 2003

Sensitive commercial information that you’d prefer customers, suppliers or rivals not to see may be leaking out of your business every time you send an email attachment or create a downloadable document for your web site.

Create, edit or circulate any type of MS Office document, such as a Word or Excel file, and a multitude of background details can be uncovered by a knowledgeable recipient.

With Windows being the dominant platform, MS Office is inevitably the most widely used office software in business. Typically, most people just want to use the tools immediately to hand, safe in the knowledge that the person they email will be able to open their document. But this convenience comes at a price: when the Send button is clicked, all the properties and history of the document tag along in that email attachment too.

As well as previous versions, comments, original names and client data, there are around 30 embedded pieces of information which you might not want others to see. All sorts of gems are there, and taken together they can help build up an alternative picture that challenges the idea of ‘What you see is what you get’.

Just ask ex-Downing Street spin doctor Alastair Campbell, who had some explaining to do to the House of Commons Foreign Affairs Select Committee. Embedded information in the so-called Iraq ‘dodgy dossier’ revealed, among other things, the names of four civil servants who had worked on the document. Campbell had to spell out who these people were to the MPs investigating the background to the dossier’s production.

The risk of information leaking is greatest when:

• a document has been through a number of revisions
• a number of people have worked on it and added comments
• another document has been used as the basis of a new one.

The type of information that can be gleaned from such a Word or Excel document includes:

• Text from other documents open at the same time
• Previously deleted text
• E-mail headers and server information
• Printer names
• Data about the machine where the document was written
• Where the document was saved
• Word version number and document format
• Names and usernames of document authors
• When the document was created
• How long was spent editing the document
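
To see how such details can surface, here is a minimal sketch in Python that scans the raw bytes of a file for runs of readable text, much as the UNIX strings utility does. It is an illustration rather than a proper Word parser: the function name embedded_strings, the six-character threshold and the sample bytes (including the made-up user name and printer name) are all assumptions for the example.

```python
import re

MIN_RUN = 6  # assumed threshold: shorter runs are mostly noise


def embedded_strings(data: bytes, min_run: int = MIN_RUN) -> list:
    """Return runs of printable text found in raw bytes.

    Looks for both 8-bit text and UTF-16LE text (one printable byte
    followed by a zero byte), since binary documents may hold either.
    """
    ascii_pat = re.compile(rb"[\x20-\x7e]{%d,}" % min_run)
    utf16_pat = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_run)
    found = [m.group().decode("ascii") for m in ascii_pat.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_pat.finditer(data)]
    return found


# Invented stand-in for a document's bytes; not a real Word file.
sample = (
    b"\xd0\xcf\x11\xe0" + b"\x00" * 8
    + b"Final quotation" + b"\x00\x01\x02"
    + "C:\\Users\\jsmith\\quote_v1.doc".encode("utf-16-le")
    + b"\xff\xfe" + b"\\\\PRINTSRV\\Sales-Laser" + b"\x00"
)

for text in embedded_strings(sample):
    print(text)
```

Run against a real attachment (read with open(path, "rb").read()), a scan like this tends to turn up file paths, user names and printer names alongside the visible text, although exactly what appears depends on the file.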


As well as revealing potentially embarrassing details not intended for the public domain, the metadata is a security risk. Some of the information, such as printer and server details, can assist hackers trying to break into a network, or help criminals bent on identity theft.

So, in the interests of responsible journalism, the author carried out a quick audit of his in-box for Word doc attachments. The first put to the test was a Word document sent by a friend regarding a non-business matter. It revealed that the document had been knocked out at work on November 11. It was created at 13:06 and completed by 13:27, so presumably he was writing it during his lunch break. Only one revision had been made and, when the document was rolled back to the original, it was clear he couldn’t spell ‘reputation’.

Not particularly earth-shattering, but interesting enough. For a business, though, this kind of information can be particularly damaging.

Say my friend’s document had been a quotation for a customer and I had called his company the day before and been told the quote was ‘nearly there’ and would be sent out by close of play the next day. I wouldn’t have been very impressed to later learn the quote hadn’t even been started when I phoned and that it was bashed out in 21 minutes over someone’s lunch break. Is this the sort of company I like to do business with? I don’t think so.

When you consider collaboration, where a single document may be worked on by a number of different people, the potential risk is multiplied. As the document goes through the process of writing, amendment and final revision, most of those involved will make changes or add comments for the other reviewers. Although these are edited out of the final draft, it is still possible to roll back the document and see its entire genesis. And this could include comments from the sales department like, “We think we can get away with a 30 per cent margin on this contract. They didn’t seem to bat an eyelid last time.”

Hairy stuff, indeed.

There are ways to strip this information out before a document leaves the building. UNIX and Linux users, for instance, can use tools such as Antiword and Catdoc to reduce the document, including its formatting information, to a simple text file.
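
As a rough sketch of how that conversion might be scripted, the Python below looks for Antiword or Catdoc on the system and runs whichever it finds first, relying on both tools printing the document text to standard output when given a file name. The function name doc_to_text and the file name quote.doc are assumptions for illustration, and the sketch returns None if neither tool is installed.

```python
import shutil
import subprocess
from typing import Optional, Sequence


def doc_to_text(path: str,
                tools: Sequence[str] = ("antiword", "catdoc")) -> Optional[str]:
    """Return the text of a Word file using the first converter found on PATH."""
    for tool in tools:
        exe = shutil.which(tool)  # None if the tool is not installed
        if exe is None:
            continue
        # Both tools write the plain text to standard output.
        result = subprocess.run([exe, path], capture_output=True,
                                text=True, check=True)
        return result.stdout
    return None  # no converter available


if __name__ == "__main__":
    text = doc_to_text("quote.doc")  # hypothetical file name
    if text is None:
        print("Neither antiword nor catdoc was found on this machine.")
    else:
        print(text)
```

The plain text that comes back can then be pasted into an email or a fresh document, leaving the original file's properties and history behind.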

But the simplest way to combat the threat posed by metadata is to convert Word docs to PDFs using either Adobe Distiller or Adobe PDF Writer. Shareware for creating PDF documents is also available to download from sites like Tucows.com and download.com. Certainly the UK Government thinks this is a good move: it has now largely abandoned Microsoft Word for documents that become public.

Alternatively, there is always other word processing software out there. OpenOffice is one, and it can open and save MS Office file formats.

Links in this story
http://www.openoffice.org/

Useful links
Microsoft Knowledge Base http://support.microsoft.com
Metadata help search