Metadata, “Hash” and Signature Analysis, Steganography, Spyware, Digital Watermarking

Metadata is defined as “data about data,” a concept that applies to electronic documents. Metadata answers questions such as: who created a document; what the title and abstract of the document are; when and where the document was created; what keywords the document includes; how pages are ordered to form chapters within the document; what the file type and other technical details of the document are; what language the document is written in; what tools were used to create the document; where to go for more information about the subject of the document; what information is needed to archive and preserve a resource; and who can access the document.

Metadata can be embedded in a digital object or it can be stored separately. Metadata is often inserted in HTML documents and the headers of image files. For example, an image may include metadata that describes how large the picture is, the color depth, the image resolution, when the image was created, and other data. Metadata can also include the history of the modifications that occurred in a document.
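As an illustration of metadata stored in an image header, the following minimal Python sketch reads the width, height, bit depth, and color type from the first chunk (IHDR) of a PNG file. The function names read_png_header and make_png_header are hypothetical, and the sample header is built in memory rather than read from a real image; other image formats store comparable fields in different layouts.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data):
    """Return the basic image metadata stored in a PNG file's IHDR chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # After the 8-byte signature comes the first chunk:
    # a 4-byte length, a 4-byte type, then the chunk data.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing IHDR chunk")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
    }


def make_png_header(width, height, bit_depth=8, color_type=2):
    """Build only the signature and IHDR chunk of a PNG (illustrative sample data)."""
    ihdr = struct.pack(">IIBBBBB", width, height, bit_depth, color_type, 0, 0, 0)
    crc = zlib.crc32(b"IHDR" + ihdr)
    return (PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR"
            + ihdr + struct.pack(">I", crc))


sample = make_png_header(640, 480)
header_info = read_png_header(sample)
print(header_info)
```

The image content itself is never decoded here; everything reported comes from the header alone, which is why such data is described as metadata.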

Web pages often include metadata in the form of “meta tags.” Description and keyword meta tags are commonly used to describe a Web page's content. A meta tag is a special HTML tag that is used to store information about a Web page but is not displayed in a Web browser. For example, meta tags provide information such as what program was used to create the page, a description of the page, and keywords that are relevant to the page. Many search engines use the information stored in meta tags when they index Web pages.
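To make the idea of meta tags concrete, here is a short Python sketch that collects the name and content of each meta tag in a page using the standard library's HTML parser. The sample page, the class name MetaTagCollector, and the "ExampleEditor 1.0" generator value are all invented for illustration.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects name/content pairs from <meta> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta_tags(html_text):
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.meta


# Hypothetical page: the meta tags are not shown by a browser.
SAMPLE_PAGE = """<html><head>
<title>Sample page</title>
<meta name="description" content="An overview of metadata in legal practice">
<meta name="keywords" content="metadata, ethics, e-discovery">
<meta name="generator" content="ExampleEditor 1.0">
</head><body><p>Visible text only.</p></body></html>"""

tags = extract_meta_tags(SAMPLE_PAGE)
for name, content in tags.items():
    print(f"{name}: {content}")
```

A browser would display only "Visible text only." for this page, while the description, keywords, and generator values remain available to any program that reads the HTML source.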

An important reason for creating descriptive metadata is to facilitate discovery of relevant information. Metadata can help organize electronic resources, facilitate interoperability and legacy resource integration, provide digital identification, and support archiving and preservation.

When an attorney transmits documents, the attorney has a duty to take reasonable precautions to ensure that confidential information contained in metadata is removed prior to transmittal. A New York State bar opinion held that an attorney has “a duty… to use reasonable care when transmitting documents by e-mail to prevent the disclosure of Metadata containing client confidences or secrets.” NY Eth. Op. 782 (2004).

What if a lawyer sent a final draft of a document to the opposing counsel without first removing the metadata from the document? The opposing counsel then may be able to track the different stages of document modification. In his article titled “The Hidden Perils of Metadata,” published in the Newsletter from the ABA Standing Committee on Lawyers’ Professional Liability, Vol. 9, No. 2, Fall 2006, David L. Brandon, Esq. reported the story of an attorney who sent a final draft of a document to the opposing counsel. The final draft was prepared by using an old agreement for another client as a model. The attorney had changed the names and added modifications specific to the new matter. Because certain provisions required the client’s input, comments or questions were inserted directly into the body of the document, and some paragraphs were highlighted. This earlier draft was sent to the client via e-mail. The client then used redlines to make some changes in the document. After several e-mails and drafts, the attorney prepared a final draft and sent it via e-mail to the opposing counsel, not realizing that the opposing counsel could use metadata to uncover all of the modifications and comments made in the drafts prior to the final version, thereby potentially jeopardizing the entire negotiation of the agreement.

It is important that an attorney have a formal policy regarding the handling of electronic transmissions, including the education of employees and clients. Documents sent to clients and/or opposing counsel should be screened for metadata. Lawyers should be aware that converting a Microsoft Word document into a PDF file does not by itself guarantee that the metadata is removed. Several software products on the market remove metadata from documents. As technology advances, even unsophisticated users will be able to find metadata, intentionally or unintentionally.
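As a rough illustration of what screening an outgoing document might involve, the Python sketch below inspects a .docx file, which is a ZIP package of XML parts, for author properties, a comments part, and tracked insertions or deletions. It is a simplified sketch and not a substitute for a dedicated metadata-removal product: the function names are hypothetical, the sample package built in memory is a stripped-down stand-in for a real Word file, and real documents can carry metadata in other parts that this check does not examine.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CORE_PROPS = "docProps/core.xml"
COMMENTS = "word/comments.xml"
MAIN_DOC = "word/document.xml"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def screen_docx(file_obj):
    """Summarize metadata that may travel with a .docx package."""
    report = {"core_properties": {}, "has_comments": False, "tracked_changes": 0}
    with zipfile.ZipFile(file_obj) as package:
        names = set(package.namelist())
        if CORE_PROPS in names:
            root = ET.fromstring(package.read(CORE_PROPS))
            for element in root:
                # Strip the "{namespace}" prefix ElementTree adds to tag names.
                key = element.tag.split("}", 1)[-1]
                if element.text:
                    report["core_properties"][key] = element.text
        report["has_comments"] = COMMENTS in names
        if MAIN_DOC in names:
            body = ET.fromstring(package.read(MAIN_DOC))
            # Tracked insertions and deletions are stored as w:ins and w:del.
            inserted = body.findall(f".//{{{W_NS}}}ins")
            deleted = body.findall(f".//{{{W_NS}}}del")
            report["tracked_changes"] = len(inserted) + len(deleted)
    return report


def build_sample_docx():
    """Build a minimal stand-in package in memory (not a complete Word file)."""
    core = ('<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/'
            'package/2006/metadata/core-properties" '
            'xmlns:dc="http://purl.org/dc/elements/1.1/">'
            '<dc:creator>Jane Associate</dc:creator>'
            '<cp:lastModifiedBy>Client Reviewer</cp:lastModifiedBy>'
            '<cp:revision>7</cp:revision>'
            '</cp:coreProperties>')
    document = (f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
                '<w:ins w:id="1" w:author="Client Reviewer">'
                '<w:r><w:t>added</w:t></w:r></w:ins>'
                '<w:del w:id="2" w:author="Client Reviewer">'
                '<w:r><w:delText>old</w:delText></w:r></w:del>'
                '</w:p></w:body></w:document>')
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(CORE_PROPS, core)
        package.writestr(MAIN_DOC, document)
        package.writestr(COMMENTS, f'<w:comments xmlns:w="{W_NS}"/>')
    buffer.seek(0)
    return buffer


report = screen_docx(build_sample_docx())
print(report)
```

A report showing a prior editor's name, a revision count, a comments part, or a nonzero number of tracked changes would be a signal to clean the document before it leaves the office.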

Another ethical question regarding metadata is whether an attorney who receives a document can search it for metadata. Some jurisdictions have imposed an affirmative duty on lawyers to refrain from searching for metadata [NY Eth. Op. 749 (2001): “a lawyer may not make use of computer software applications to surreptitiously get behind visible documents…”]. Because of the evolution of technology, it is not certain what positions other jurisdictions may take on these issues. A simple internal procedure of removing metadata with affordable software may be viewed as a basic attorney obligation to protect a client’s information.

Computer forensics is a process that uses scientific knowledge to collect, analyze, and present digital evidence to courts. Since files are the standard persistent form of data on computers, the collection, analysis, and presentation of computer files as digital evidence is essential in computer forensics. However, data can be disguised, for example by giving a file a misleading name or extension, well enough to deceive a casual visual inspection. Therefore, a more comprehensive method of analysis called File Signature Analysis is needed to support the process of computer forensics. File Signature Analysis is used for two purposes:

1. Spotting file signature and file extension mismatches

2. File carving – a process to identify file headers and footers using predefined file signatures. This is typically used to find deleted files.
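Both purposes can be sketched in a few lines of Python. The example below is a simplified illustration, not a forensic tool: the signature table lists only four well-known file types, the function names are hypothetical, and the "disk image" is a short byte string built in memory. Real tools use far larger signature databases, allow several extensions per signature, and validate carved data more carefully.

```python
# Well-known leading bytes ("magic numbers") for a few common file types.
SIGNATURES = {
    "jpg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "pdf": b"%PDF-",
    "zip": b"PK\x03\x04",
}
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"


def detect_type(data):
    """Return the file type whose signature matches the start of data, if any."""
    for file_type, magic in SIGNATURES.items():
        if data.startswith(magic):
            return file_type
    return None


def extension_mismatch(filename, data):
    """True when the content signature is recognized and contradicts the extension."""
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if extension == "jpeg":
        extension = "jpg"
    detected = detect_type(data)
    return detected is not None and detected != extension


def carve_jpegs(raw):
    """Return byte runs that begin with a JPEG header and end with a JPEG footer."""
    carved = []
    position = 0
    while True:
        start = raw.find(JPEG_HEADER, position)
        if start == -1:
            break
        end = raw.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break
        end += len(JPEG_FOOTER)
        carved.append(raw[start:end])
        position = end
    return carved


# Illustrative data: a fake JPEG and a raw byte stream containing two copies.
fake_jpeg = JPEG_HEADER + b"\xe0" + b"image-bytes" + JPEG_FOOTER
disk_image = b"\x00" * 16 + fake_jpeg + b"slack space" + fake_jpeg + b"\x00" * 8

print(extension_mismatch("notes.txt", fake_jpeg))  # JPEG content named .txt
print(extension_mismatch("photo.jpg", fake_jpeg))  # extension agrees
print(len(carve_jpegs(disk_image)))                # files recovered by carving
```

Carving works on the raw bytes without consulting the file system, which is why it can recover files whose directory entries have been deleted.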

However, a hard disk may hold millions of files, many of which are ordinary system files, so a process called Hash Analysis can be carried out before the File Signature Analysis. Hash Analysis works by comparing hashes of files from the disk with a list of predefined file hashes. Files that match fall into one of two categories: known or notable. This automates the process of separating files that can be ignored, such as typical system files, from those that may be of evidentiary value, such as Internet browser history files. [See “Signature Analysis and Computer Forensics” by Michael Yip, School of Computer Science, University of Birmingham, United Kingdom, December 26, 2008.]
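The comparison step in Hash Analysis can be sketched as follows. This is a minimal Python illustration under stated assumptions: SHA-256 is chosen here only as an example hash function, the file contents and hash lists are invented, and the names classify_files, known_hashes, and notable_hashes are hypothetical. In practice the lists would come from a curated hash database rather than being computed on the spot.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def classify_files(files, known_hashes, notable_hashes):
    """Label each file as 'known', 'notable', or 'unknown' by its hash.

    files maps a file name to its content as bytes.
    """
    results = {}
    for name, content in files.items():
        digest = sha256_hex(content)
        if digest in notable_hashes:
            results[name] = "notable"
        elif digest in known_hashes:
            results[name] = "known"
        else:
            results[name] = "unknown"
    return results


# Invented sample data standing in for files read from a disk.
files_on_disk = {
    "system32/driver.sys": b"standard operating system component",
    "users/docs/ledger.xls": b"contents flagged in an earlier case",
    "users/docs/letter.txt": b"a file no list has seen before",
}

# Hypothetical reference lists of previously catalogued hashes.
known_hashes = {sha256_hex(b"standard operating system component")}
notable_hashes = {sha256_hex(b"contents flagged in an earlier case")}

classification = classify_files(files_on_disk, known_hashes, notable_hashes)
for name, label in classification.items():
    print(f"{label:8} {name}")
```

Because a hash depends only on a file's content, renaming or moving a file does not change its classification; files labeled unknown are the ones left for further review, including File Signature Analysis.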

Steganography is the technique of hiding a message inside another seemingly harmless message (such as a grocery list or a spam e-mail) so that no one suspects the existence of the hidden message. Though the practice dates back to the days of the ancient Greeks, the advent of the personal computer spawned the creation of new digital steganographic techniques: messages hidden in text, image, and video files. Steganography, also known as “steg” or “stego,” poses a major challenge to law enforcement, according to the National Institute of Justice, a research agency of the U.S. Department of Justice. Often, the files are not just hidden but also encrypted, adding another layer of security in attempts to thwart investigators. Data can be hidden "in plain sight" in a wide range of digital files, including video and music. Files on a computer's hard disk can be made invisible to those who do not have the file name and its corresponding password. Not only are there many steganographic algorithms and programs readily available, but their techniques are growing in sophistication. Currently, there are more than 30 publicly available steg-encoding programs employing many different encryption algorithms, and these diverse techniques have to date precluded any universal test for steganography. One of the most common illicit uses of steganography is for the possession and storage of child pornography images. However, steganography can also be used to further fraud, terrorist activities, and other illegal acts.
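One of the simplest digital techniques, least-significant-bit (LSB) embedding, shows why hidden data is hard to see. The Python sketch below is a teaching illustration only and does not represent any particular steg-encoding program: the cover is a plain sequence of bytes standing in for pixel values, the two-byte length prefix is an assumption made for this example, and the function names embed and extract are hypothetical. No encryption is applied.

```python
def embed(cover, message):
    """Hide message in the lowest bit of each cover byte (with a 2-byte length prefix)."""
    payload = len(message).to_bytes(2, "big") + message
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    stego = bytearray(cover)
    for index, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        stego[index] = (stego[index] & 0xFE) | bit
    return bytes(stego)


def extract(stego):
    """Recover a message hidden by embed()."""

    def read_bytes(start_bit, count):
        out = bytearray()
        for i in range(count):
            value = 0
            for j in range(8):
                value = (value << 1) | (stego[start_bit + i * 8 + j] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 2), "big")
    return read_bytes(16, length)


# Illustrative cover data: 512 byte values standing in for image pixels.
cover = bytes(range(256)) * 2
message = b"meet at noon"

stego = embed(cover, message)
recovered = extract(stego)
largest_change = max(abs(a - b) for a, b in zip(cover, stego))

print(recovered)
print(largest_change)
```

Each cover byte changes by at most one, which in an image would be an imperceptible shift in color, and the altered file is the same size as the original. That is why visual inspection and file size alone give an investigator little to go on.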

Spyware is software, typically downloaded for free from the Internet for some other purpose, that also sends information from your computer to third parties without your knowledge whenever the computer connects to the Internet. In general, this is benign information about web surfing habits that is not linked to you personally, but that is not always the case. [See “Computer Viruses to Spyware: Things You Don’t Want To Pick Up Online” by Jim Calloway, Oklahoma Bar Association Journal, October 2003]

Be aware of techniques to trick you into downloading files or passing on information for unintended use. Read all license agreements and privacy policies. Do not download software from illegal file sharing sites. Avoid “over 18” sites. Do not provide personal information to anyone unless you have checked the validity of the request. Do not download viewers from websites that you do not recognize. If a dialog box appears, study it carefully and make sure it is legitimate before taking action. Practice due diligence or check the Better Business Bureau to see if the company is reputable. Do not click on pop-up ads or blinking ads.

Obtain software to help monitor and eradicate problems from spyware. Products that have received good reviews include Spybot Search and Destroy, Ad-Aware, Microsoft Antispyware, Spysweeper, and eTrust PestPatrol [See “Attack of the Pernicious Spyware,” by Catherine Sanders Reach, MLIS, ABA Legal Technology Resource Center, Law Technology News, February 2005].

Digital watermarking is defined by the Center for Democracy & Technology (CDT) as technology that embeds machine‑readable information within the content of a digital media file (image, audio, or video). The information is encoded through subtle changes to the image, audio, or video. Much like watermarks on stationery, these changes typically would not be noticeable to a person viewing or listening to the content. Indeed, digital watermarks often are not perceptible by humans at all, but rather are designed to be detected and decoded only by machines specifically programmed to do so. Digital watermarking can be used to embed various types of data, depending on the particular application and intended use. For example, a watermark in a digital movie file might simply identify the name or version of the movie. In the alternative, it might convey copyright or licensing information from the movie’s creator. Or it might embed a customer or transaction number that could be used to identify individual payment or transaction data relating to that particular copy of the movie. However, the number of bits that can be contained in a watermark itself today is typically modest – enough to provide some basic codes or identifiers, but not enough to include the equivalent of a full sentence of text.

A number of watermarking applications embed data that can help identify a class of files – such as photos owned by a particular professional photographer, or songs distributed by a particular music store, or copies of a particular movie. In this kind of application, the watermarks do not identify or aid in identifying any individual transaction, consumer, or device. This kind of watermarking could be termed “generic” in the sense that identical watermarks (corresponding to, for example, the name of the photographer, music store, or movie) are embedded in many separate digital media files. The watermark signals that a file belongs to a general class, but does not distinguish the file from other members of that class.

Privacy issues surrounding digital watermarking have been raised mainly with respect to applications in which the data contained in watermarks corresponds to individual transactions, consumers, or devices. In applications of this type, different copies of the same digital content (the same movie, for example) are likely to contain different watermarks. Accordingly, the watermarks might signal something about the individual uses or users of the watermarked files.

Types of watermarking that can be used to associate a file with an individual transaction, consumer, or device may be useful for a variety of legitimate applications. They can also raise privacy questions. Possibly the most frequently raised privacy concern is the idea that watermarks could enable increased monitoring, recording, or disclosure of an individual’s media purchases or usage. The fear is that watermarking could compromise an individual’s ability to use and enjoy lawfully acquired media on a private, anonymous basis. Specific media usage choices could be sensitive if exposed, or could contribute to the creation of profiles of individuals’ overall media purchase and consumption habits, which might be used in ways that the individuals do not expect or understand. Additional possible privacy concerns include the risk that watermarks could contain personal information that could be exposed to third parties, and the risk that errors in or manipulation of watermark data could paint a false picture of an individual’s behavior and perhaps lead to adverse consequences, including potential legal liability. See “Privacy Principles for Digital Watermarking: May 2008 – Version 1.0,” Center for Democracy & Technology.