Blog: Tips on Legal Tech, E-Discovery, & More

Dictionary of E-Discovery: A Helpful Glossary of ESI Terminology

Written by Jeff Kerr | April 5, 2016

Lawyers who “do” e-discovery tend to use a lot of hard-core terminology that is bewildering to the uninitiated. That’s unfortunate. Given that e-discovery is discovery, the terminology we use should be user-friendly for all litigators, not just geeks. In this post, we’ll break down some of the most forbidding e-discovery terms and hopefully show that the concepts they refer to are pretty straightforward.

If we missed anything, or if you disagree with any of our definitions, shoot us an email or find us on Twitter.

The Dictionary of E-Discovery

Carving: the process of searching through the unused parts of a disk for files that haven’t been overwritten and recovering those files. Word to the wise: “deleted” does not mean gone. Deleting a file usually just unlinks it from your computer’s file system, and with the right software, deleted files can often be recovered until the space they occupied is overwritten.
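To make the idea concrete, here is a minimal Python sketch of header/footer carving, the simplest form of carving. It assumes the data is a raw byte dump (for example, a small disk image read into memory) and looks only for JPEG images, whose data begins with the bytes FF D8 FF and ends with FF D9. The function name carve_jpegs is hypothetical. Real carving tools handle fragmented files, many file types, and JPEGs with embedded thumbnails (which this naive version would cut short), so treat this as an illustration, not a forensic tool.

```python
def carve_jpegs(raw: bytes) -> list[bytes]:
    """Naive header/footer carving: return candidate JPEGs found in a raw byte dump."""
    header = b"\xff\xd8\xff"  # JPEG start-of-image marker, followed by the next marker's 0xFF
    footer = b"\xff\xd9"      # JPEG end-of-image marker
    found = []
    pos = 0
    while True:
        start = raw.find(header, pos)
        if start == -1:
            break  # no more headers in the unused space
        end = raw.find(footer, start + len(header))
        if end == -1:
            break  # header with no footer: likely partially overwritten
        found.append(raw[start:end + len(footer)])
        pos = end + len(footer)  # resume scanning after this file
    return found
```

To try it, you might pass it the bytes of a small disk image, e.g. carve_jpegs(open("disk.img", "rb").read()), where "disk.img" is a hypothetical file name.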

Clawback Agreement: a very handy agreement which states that if you accidentally give the other side your privileged documents, they have to give them back and can’t use them against you or claim they aren’t privileged anymore. There are no known reasons for not having a clawback agreement, but there are very good reasons to have one in place. A serviceable clawback agreement can usually be written in one paragraph. 

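Checksum: a short sequence of numbers and letters computed from a file’s contents. Identical files produce identical checksums, while files that differ by even a single character almost certainly produce different ones, so for practical purposes a checksum uniquely identifies a file’s contents. Comes in several different flavors, including MD5, SHA-1, and SHA-256. Extremely useful for finding duplicates, determining if someone has files they shouldn’t have, and identifying evidence.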
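Here is a minimal Python sketch of computing checksums with the standard library’s hashlib module. The function name file_checksums is our own. It reads the file in chunks so that large files don’t need to fit in memory, and two files with identical contents will return identical results no matter what they are named or where they live.

```python
import hashlib


def file_checksums(path: str, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Compute MD5, SHA-1, and SHA-256 checksums of a file's contents."""
    hashers = {
        "md5": hashlib.md5(),
        "sha1": hashlib.sha1(),
        "sha256": hashlib.sha256(),
    }
    with open(path, "rb") as f:
        # Read in chunks so large files don't have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

For example, file_checksums("memo.docx")["sha256"] (a hypothetical file name) gives a 64-character hex string that you could compare against the same value computed on the other side’s copy.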

Cost Shifting: when the requesting party is made to pay (usually by court order) some or all of the responding party’s costs of responding to certain discovery. Often a Solomonic remedy imposed by the judge when one party is asking for too much but maybe shouldn’t be prevented outright from getting it. Under the so-called American rule, the responding party ordinarily bears its own costs of complying with discovery, so cost shifting is unlikely to be applied to well-drafted and reasonable discovery requests.

Culling: processing a large set of data and removing the junk data so that it’s easier to search and less expensive to host or transfer. It’s best for the parties to agree on the criteria that will be used to cull the data.

Custodian: a person who cleans … just kidding. A person... Full stop. Seriously, a custodian is just a person. Why do we need a word that just means “person” in the context of e-discovery?

Deduplication: a process that removes multiple copies of the same file from a set of files, leaving you with only one of the copies. This is super helpful when you have to review a large number of files and you don’t want to waste your time going line-by-line through two files to see if they are the same. Horizontal deduplication means removing all the duplicates across the board, regardless of custodian. Vertical deduplication means removing duplicates only within each custodian’s files, so a copy is kept for every custodian (see above) who has one. With vertical deduplication and 9 custodians, what is the maximum number of copies of the same file you might have after deduplication? If you answered 9, you are smart. Have a cookie.
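Here is a minimal Python sketch of both flavors, assuming each document is represented by a hypothetical Doc record holding its custodian, file name, and contents. Duplicates are detected by checksum (see Checksum above), so two files with different names but identical contents count as duplicates. The usage at the bottom works through the cookie question.

```python
import hashlib
from typing import NamedTuple


class Doc(NamedTuple):
    custodian: str
    name: str
    content: bytes


def _key(doc: Doc) -> str:
    # Duplicates are identified by a checksum of the contents, not by file name.
    return hashlib.sha256(doc.content).hexdigest()


def dedupe_horizontal(docs: list[Doc]) -> list[Doc]:
    """Keep one copy of each unique file across all custodians."""
    seen: set[str] = set()
    kept = []
    for doc in docs:
        k = _key(doc)
        if k not in seen:
            seen.add(k)
            kept.append(doc)
    return kept


def dedupe_vertical(docs: list[Doc]) -> list[Doc]:
    """Keep one copy of each unique file per custodian."""
    seen: set[tuple[str, str]] = set()
    kept = []
    for doc in docs:
        k = (doc.custodian, _key(doc))
        if k not in seen:
            seen.add(k)
            kept.append(doc)
    return kept


# Nine custodians, each holding two copies of the same memo.
docs = [
    Doc(f"custodian{i}", f"memo_copy{j}.docx", b"same memo")
    for i in range(1, 10)
    for j in range(2)
]
print(len(dedupe_horizontal(docs)), len(dedupe_vertical(docs)))  # prints: 1 9
```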

DeNISTing: one way of culling data (see above). One takes a huge list of checksums (see above) for known junk files and removes any matching files from the data set. The NIST part derives from the National Institute of Standards and Technology, which, among other things, maintains the list of known files (the National Software Reference Library).
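A minimal Python sketch of DeNISTing, assuming the data set is a hypothetical mapping of file names to file contents and the reference list has already been loaded as a set of SHA-1 hex strings. Parsing NIST’s actual distribution files is not shown. Because matching is by checksum rather than by file name, renaming a system file doesn’t let it slip through.

```python
import hashlib


def denist(files: dict[str, bytes], known_sha1s: set[str]) -> dict[str, bytes]:
    """Remove every file whose SHA-1 checksum appears on a known-file list."""
    # Published hash lists may use upper-case hex; normalize before comparing.
    known = {h.lower() for h in known_sha1s}
    return {
        name: data
        for name, data in files.items()
        if hashlib.sha1(data).hexdigest() not in known
    }
```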

E-Discovery: a process where the parties to litigation exchange electronic evidence. E-Discovery has been the subject of much teeth-gnashing and hair-pulling, with many lawyers and commentators complaining about its cost and difficulty, but e-discovery is inescapable unless the parties live in caves and do not use computers. E-Discovery tools such as case management, document review, transcript management, timeline software, and others make building a case more efficient. If a lawyer wants to prove that certain facts did or did not occur, then e-discovery is strongly recommended.

Fielded: a form of production (usually native, or nearly native -- see below) wherein the fields that hold discrete bits of information remain in place. For example, an email when converted to a PDF file is no longer fielded because the “to:” and “from:” fields of the email in a PDF document have the same status as any of the other text on the page. In contrast, when email is produced in a native or near-native format, the “to:” and “from:” fields retain their special status, and it is possible to construct searches like ‘from:hook@bidness.com to:crook@bidness.org subject:conspir!’ using a review platform. This can be very effective.
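Here is a minimal Python sketch of why fields matter, using the standard library’s email module to parse a message whose headers are still intact. The function fielded_match and the sample message are hypothetical; a review platform does something similar at scale. Because the sender and recipients are stored in separate fields, the search can distinguish an email sent by hook@bidness.com from one that merely mentions that address, something a plain-text search of a PDF cannot do.

```python
import re
from email import message_from_string
from email.utils import getaddresses, parseaddr
from typing import Optional

# Hypothetical message with its header fields intact.
RAW_EMAIL = """\
From: hook@bidness.com
To: crook@bidness.org
Subject: Conspiracy details

See you at noon.
"""


def fielded_match(raw: str, sender: Optional[str] = None,
                  recipient: Optional[str] = None,
                  subject_stem: Optional[str] = None) -> bool:
    """Rough analogue of 'from:... to:... subject:stem!' on a single email."""
    msg = message_from_string(raw)
    if sender is not None:
        _, addr = parseaddr(msg.get("From", ""))
        if addr.lower() != sender.lower():
            return False
    if recipient is not None:
        addrs = [a.lower() for _, a in getaddresses(msg.get_all("To", []))]
        if recipient.lower() not in addrs:
            return False
    if subject_stem is not None:
        # Like "conspir!": match any subject word that starts with the stem.
        pattern = r"\b" + re.escape(subject_stem)
        if not re.search(pattern, msg.get("Subject", ""), re.IGNORECASE):
            return False
    return True
```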

Forms of Production: electronic evidence can be “produced” (i.e., exchanged) in multiple forms. For example, if there is a Word file on your client’s laptop, and you need to produce it to another party, you have several choices: (1) you can copy the file to some sort of transfer media (e.g., a thumb drive) to produce an exact copy; (2) you can convert the file to PDF and produce the PDF file; (3) you can print the file to TIFF (see below) and also produce a load file (also see below) that contains searchable text; or (4) you can literally print the file out on a piece of paper using a printer and deliver a copy of the paper to the other party. There are pros and cons to each form of production. If you are billing hourly, the only known “pro” of option 4 (printing) is that it wastes a lot of time and paper, and often results in motion practice. For reasons that we do not comprehend, some attorneys are flustered by native production and instead choose to have files produced as PDFs. Recommendation: talk about forms of production with your opposing counsel before discovery starts; if you are requesting evidence, tell the other party (in writing) the form of production that you want.

Hash: see Checksum

Linear Review: assume that your client has 1TB of data that could be responsive to discovery requests. Assume that you agree on some keywords with the other party. Assume that those keywords are “hits” for 500,000 documents. Linear review is the process of having a human (usually a lawyer) set eyes on each of the documents before any of them are produced to the other side. On average, human reviewers can review 55 documents per hour, and the average cost of a reviewer is $70 per hour. That means you’ll spend, ahem, more than $600,000 on document review! The process will also take several months, even for a large review team. But the legal system ain’t got time for this. Discovery is supposed to finish; it can’t drag on and on for years while reviewers strain their eyes and wonder if this is what they went to law school for. In short, linear review is a bad idea: it’s prohibitively expensive and time-consuming. Alternatives include technology-assisted review and creative use of keyword searches, selective review, and clawback agreements.
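The arithmetic behind that estimate, as a short Python sketch. The document count, review speed, and hourly rate come from the example above; the team of 10 reviewers working 40 hours a week is our own assumption, included only to show how the calendar time adds up.

```python
def linear_review_estimate(num_docs: int, docs_per_hour: float, hourly_rate: float,
                           team_size: int, hours_per_week: float) -> tuple[float, float, float]:
    """Return (reviewer-hours, total cost, calendar weeks) for a linear review."""
    hours = num_docs / docs_per_hour
    cost = hours * hourly_rate
    weeks = hours / (team_size * hours_per_week)
    return hours, cost, weeks


# Figures from the example above; team size and weekly hours are assumptions.
hours, cost, weeks = linear_review_estimate(500_000, 55, 70, team_size=10, hours_per_week=40)
print(f"{hours:,.0f} reviewer-hours, ${cost:,.0f}, {weeks:.1f} weeks")
# prints: 9,091 reviewer-hours, $636,364, 22.7 weeks
```

Even with ten reviewers working full time, that is more than five months of calendar time.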

Litigation Hold: a document provided to a custodian when litigation is on the horizon or already happening that instructs him or her how to avoid deleting or corrupting evidence. Sometimes litigation hold letters confuse ordinary people by telling them things like “cease rotating backup tapes.” Ideally, a litigation hold should be readable and comprehensible by its target audience, and compliance with the hold should be monitored. Watch out for company-sponsored paper shredding or hard-drive dumping events!

Load File: a special file that you get (or give) with other files that provides additional information about those files, such as the directories they came from, metadata not contained in the files themselves, Bates numbers corresponding to the files, and information about the requests to which the files are supposed to be responsive. Even though load files are essentially “flat” (i.e., non-relational databases, like Excel files), they appear in any number of bizarre proprietary formats. There is no agreed-upon standard for formatting load files, and unless one happens to own the same software that was used to generate the load file, viewing one can be a serious pain in the hindquarters. If you don’t own the software that generated the load file, you may want to ask for a comma-delimited (CSV) file instead, which at least you can open in Excel.
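If you do get a CSV load file, you don’t even need Excel. Here is a minimal Python sketch using the standard csv module; the column names (BegBates, EndBates, Custodian, SourcePath, Request) and the sample rows are hypothetical, since every vendor names its fields differently. If a load file uses an unusual field separator, csv.DictReader’s delimiter parameter can often handle it.

```python
import csv
import io

# Hypothetical load file contents; real column names vary by vendor.
SAMPLE_LOAD_FILE = """\
BegBates,EndBates,Custodian,SourcePath,Request
ABC000001,ABC000003,J. Smith,C:/Users/jsmith/Documents/memo.docx,RFP 4
ABC000004,ABC000004,A. Jones,C:/Users/ajones/Desktop/budget.xlsx,RFP 7
"""


def read_load_file(text: str, delimiter: str = ",") -> list[dict[str, str]]:
    """Parse a delimited load file into one dict per produced document."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


rows = read_load_file(SAMPLE_LOAD_FILE)
for row in rows:
    print(row["BegBates"], row["Custodian"], row["SourcePath"])
```

For a load file on disk, you would open it with newline="" (as the csv documentation recommends) and pass its contents, or the file object itself, to the reader.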

MD5: see Checksum

Meet & Confer: a meeting (or phone call) at the beginning of a case for lawyers to talk about discovery and try to reach agreement on preliminary matters like forms of production and dates for depositions. Required in federal court. Most often the meet & confer session is “phoned-in” both literally and figuratively, to the detriment of everyone involved. Best if counsel prepare beforehand, talk with their clients about e-discovery and the evidence that’s likely to be sought, and come with a game plan.

Metadata: least helpful definition: “data about data.” More helpful definition: contextual information about computer files that helps explain how/when/where/why they were created. Metadata can also prove that a piece of evidence “is what it purports to be,” e.g., “even though he denies it ladies and gentlemen, this email is in fact an email written by Mr. X on [insert date] from his home computer.” Metadata comes in two main categories: embedded metadata and system metadata. The handy thing about embedded metadata is that it is stored inside the file itself, so it travels with the file: if you copy the file to transfer media and give it to your opponent, it will still be there. In contrast, system metadata is kept by the computer’s file system rather than in the file, so it does not travel and is therefore difficult to produce in discovery. Examples of system metadata are directory paths, last-modified dates, and created dates. System metadata is often produced in a load file (see above) that accompanies the discovery response.
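A minimal Python sketch of where system metadata lives: it is read from the file system with os.stat, not from the file’s own bytes. The helper names are hypothetical. The second function shows why system metadata doesn’t travel on its own: shutil.copy copies the contents and permission bits but gives the copy a fresh last-modified date, so the original date has to be recorded somewhere else, such as a load file. Creation dates are platform-dependent in os.stat results, so the sketch reports only path, size, and last-modified date.

```python
import os
import shutil
from datetime import datetime, timezone


def system_metadata(path: str) -> dict:
    """Collect file-system metadata for one file (not metadata embedded in it)."""
    st = os.stat(path)
    return {
        "path": os.path.abspath(path),
        "size_bytes": st.st_size,
        "last_modified_utc": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
    }


def plain_copy_changes_mtime(src: str, dst: str) -> bool:
    """Copy src to dst with shutil.copy and report whether the modified date changed."""
    shutil.copy(src, dst)  # copies contents and permission bits, not timestamps
    return int(os.stat(src).st_mtime) != int(os.stat(dst).st_mtime)
```

shutil.copy2, by contrast, tries to carry the timestamps over as well, but the dates survive only as long as every later copy preserves them.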

Native: A file that is in the form in which it was originally created. If the file started its life by someone opening Microsoft Word, typing something, and then hitting “save,” then the native file will have a “.doc” or “.docx” extension. The opposite of a native file is a “.doc” file printed to paper or to “virtual” paper, e.g., TIFF (see below) or PDF.

Near Native: functionally the same as native. Because some things can’t really be produced in the application that created them, we call the next best thing near-native. An example is an email generated in Gmail.

Review Platform: software for examining electronic evidence, either your own or the other side’s. Can be hosted in a “cloud” environment, in which case expect to pay per GB, and don’t say I didn’t warn you. Alternatively, software that runs on one’s desktop. Ranges from inexpensive to insanely expensive. More often the latter. We’re trying to change this.

Preservation Demand: a letter or email to your adversary demanding that he or she keep evidence safe and prevent it from being destroyed. Sometimes critical to point to when seeking sanctions at a later date if the other side “lost” some evidence. Preservation demands are often wildly overbroad, but hey, how is the sender supposed to know what the receiver has and doesn’t have?

PST: Microsoft Outlook’s super handy file format for wrapping up huge numbers of emails and attachments in a way that preserves their ability to be searched. We like PSTs. Ask for them, often.

Quick Peek Agreement: whoever came up with this term should be shot. It just means an agreement that lets you sit in the same room with your opponent and look at certain documents, to help you reach some sort of agreement about what to do with them next. There may be other provisions, such as one barring you from telling the judge what you saw. Frankly, I’ve never had occasion to use one of these, but I’m sure I wouldn’t call it a “quick peek” if I did! Inane terminology overload.

Redaction: taking the secret parts of a document and crossing them out with a black Sharpie.

Relational Database: the thing that runs your bank account, your email account, your smartphone app, your Facebook account, your doctor’s medical records system, and generally everything else in the world, including many of the things that are valuable in e-discovery. Learning a bit about them and their lingo--highly recommended.

Rule 502(d): see Clawback

Sources: the places where electronic evidence lives--computer disks, smart phones, thumb drives, Dropbox. Custodians (see above) have been known to have them.

TIFF: a mild form of disagreement among opposing counsel, usually caused by bickering about forms of production. Ok, sorry for the pun. A TIFF is an image file, like a JPEG, PNG, or GIF, except that it has almost no legitimate purpose for existing. (At least one can make hilarious cat videos with GIFs!) In very backward, retrograde forms of e-discovery, native files (see above) are converted to TIFF images and produced as such, with a load file (see above) provided to make up for the fact that the TIFF conversion process strips out almost every useful piece of information contained in the original file! Responding parties: please stop giving us TIFFs. Requesting parties: don’t accept TIFFs.

Vendor: people who firmly believe that you cannot survive without them. Sometimes, but not always, they are right. If you are about to try carving (see above) and are not yourself trained in this field, please call a vendor.