With so many e-discovery software options available to attorneys today, it can be challenging to find the solution that best suits the needs of your firm. When reviewing a document production from your adversary, do you really need all the bells and whistles or could you cut costs by using a simpler option?
In my last blog post, I discussed the challenges of using consumer-grade software for e-discovery by evaluating three different types of document productions: PDFs, TIFF images and native files. In this, the second post of our three-part series on e-discovery software, I'll explore the top 10 features to consider when evaluating solutions designed specifically for the legal community.
E-Discovery Review Software: Core Features
Even though e-discovery review software comes in two varieties (desktop and cloud-based), all review applications share a common set of core features, with some variations. Indeed, to be considered e-discovery review software, a program must have certain basic components: recursive data parsing, searching capabilities and organizational features, such as tagging.
1. Recursive Data Parsing
First, and perhaps foremost, e-discovery review software must be able to read the document production in its original format (PDF, native, or TIFF). This is no small task, especially when the documents are in their native format. A diverse native production could include deeply nested subdirectories, various types of files that contain other files, password-protected files, files whose extension has been tampered with, corrupted files, and even files infected with malware.
When the parser encounters any kind of file, its job is to extract the text content and metadata so that they can be searched and viewed by the user. No parser will be able to make sense of every anomaly, but it should at least notify the user in a helpful way when it encounters a file it can’t handle. The parser must also be recursive, meaning it can open container files that are nested inside other container files. For instance, it should be able to open a zip file that contains numerous other files and file types, including other zip files, and process each file it finds at every level.
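For readers curious about what "recursive" means in practice, here is a minimal Python sketch that handles zip files only. The function name walk_container, the status labels, and the max_depth limit are my own illustrative choices, not taken from any particular review product, and a real parser would handle many more container types.

```python
import io
import zipfile


def walk_container(data: bytes, path: str = "", depth: int = 0,
                   max_depth: int = 10) -> list[tuple[str, str]]:
    """Return (path, status) pairs for every file found, descending into nested zips."""
    if depth > max_depth:
        # Guard against maliciously deep nesting (hypothetical limit).
        return [(path, "nesting too deep")]
    try:
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile:
        # Report the problem instead of failing silently.
        return [(path, "unreadable container")]
    results = []
    with archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            inner_path = f"{path}/{info.filename}" if path else info.filename
            try:
                payload = archive.read(info)
            except RuntimeError:
                # zipfile raises RuntimeError for encrypted members read without a password.
                results.append((inner_path, "password-protected"))
                continue
            except zipfile.BadZipFile:
                results.append((inner_path, "corrupted"))
                continue
            if zipfile.is_zipfile(io.BytesIO(payload)):
                # A container inside a container: recurse into it.
                results.extend(walk_container(payload, inner_path, depth + 1, max_depth))
            else:
                results.append((inner_path, "ok"))
    return results
```

Note that modern Office files such as .docx are themselves zip containers, so this sketch would descend into them as well. Every file the function cannot open is still reported with a status, which is the "notify the user" behavior described above.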
2. Search Index & Interface
The parser works to provide clean, normalized, and structured data that can be used to build a search index. The search index is a bit like the index at the end of a book: a compact listing of every word (or “token” in computer science terminology) in the original data and every location in which the word appears. For example, the word “overtime” could appear in Document 1, on page 2, at line 12, starting 34 characters into the line – and in various other places. A search index allows the user to obtain almost instantaneous results when performing a full-text search for various words or phrases.
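To make the book-index analogy concrete, here is a minimal Python sketch of a positional index. It assumes documents are already plain text, treats any run of letters or digits as a token, and records only a character offset per document rather than page and line. The name build_index is hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents: dict[str, str]) -> dict[str, list[tuple[str, int]]]:
    """Map each lowercase token to the (document id, character offset) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        # Lowercasing is a simple form of normalization so "Overtime" matches "overtime".
        for match in re.finditer(r"\w+", text.lower()):
            index[match.group()].append((doc_id, match.start()))
    return dict(index)
```

Once the index is built, looking up a word is a single dictionary lookup, no matter how many documents were indexed. That is why results appear almost instantly.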
Some e-discovery applications also let the user configure the search index to include or omit so-called “stop words” such as articles and pronouns. Some are also optimized to allow for foreign language content or unusual characters, like emojis. At a minimum, the software should reveal how its index is configured so that you can understand which searches will succeed.
Generally, e-discovery review platforms allow you to perform powerful searches using a combination of techniques:
- searching for documents containing two words, but not containing a third word (Boolean search)
- searching for documents containing any one of a set of words (Boolean)
- searching for documents containing one word that is separated by no more than 3 words from another word (proximity search)
- searching for documents containing one or more words that start with the prefix ‘hir’ (prefix or wildcard search, often offered alongside stemming, which matches variants of the same root word)
- searching for documents containing the word “gauguin” but allowing for misspellings where up to 2 characters are different (fuzzy search)
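The techniques in this list can be sketched in a few lines of Python. This is an illustration only: it scans a single document's tokens rather than using an index, and every function name here is my own invention rather than the query syntax of any real platform.

```python
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def boolean_match(tokens, must=(), must_not=(), any_of=()) -> bool:
    """All of `must`, none of `must_not`, and (if given) at least one of `any_of`."""
    words = set(tokens)
    return (all(w in words for w in must)
            and not any(w in words for w in must_not)
            and (not any_of or any(w in words for w in any_of)))


def proximity_match(tokens, first, second, max_gap=3) -> bool:
    """True if the two words are separated by no more than max_gap other words."""
    pos_a = [i for i, t in enumerate(tokens) if t == first]
    pos_b = [i for i, t in enumerate(tokens) if t == second]
    return any(a != b and abs(a - b) - 1 <= max_gap for a in pos_a for b in pos_b)


def prefix_match(tokens, prefix) -> bool:
    return any(t.startswith(prefix) for t in tokens)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the fewest single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete a character
                            curr[j - 1] + 1,            # insert a character
                            prev[j - 1] + (ca != cb)))  # substitute a character
        prev = curr
    return prev[-1]


def fuzzy_match(tokens, word, max_edits=2) -> bool:
    return any(edit_distance(t, word) <= max_edits for t in tokens)
```

For example, fuzzy_match(tokenize("the Gaugin exhibit"), "gauguin") succeeds because the misspelling is one edit away from the search term.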
Search indexes also enable so-called “fielded” searches that filter or search using common properties of the documents. Take an email, for instance. A fielded search can filter by sent date, sender address, recipient address, and more. Fielded search is one of the most powerful features for handling big review jobs efficiently, so be sure that any platform you’re investigating includes it.
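As a rough illustration of fielded search over email, the Python sketch below filters on sender, recipient, and a sent-date range. The Email class and the fielded_search function are hypothetical. A real platform would store these fields in its index rather than loop over every message.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Email:
    sender: str
    recipients: list[str]
    sent: date
    subject: str


def fielded_search(emails, sender=None, recipient=None, sent_from=None, sent_to=None):
    """Return emails matching every field that was supplied; omitted fields are ignored."""
    hits = []
    for e in emails:
        if sender and e.sender.lower() != sender.lower():
            continue
        if recipient and recipient.lower() not in [r.lower() for r in e.recipients]:
            continue
        if sent_from and e.sent < sent_from:
            continue
        if sent_to and e.sent > sent_to:
            continue
        hits.append(e)
    return hits
```

A call such as fielded_search(emails, sender="ceo@example.com", sent_from=date(2020, 1, 1)) narrows a large production to one custodian and date range before any keyword search is run.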
3. Tagging & Organizing Documents
Another essential feature of e-discovery review software is the ability to organize the documents in your case with labels or tags. Attorneys can create simple tags, such as “Relevant,” “Privileged,” and “Hot” – or more complex tags that correspond to the elements of the claims and defenses. With tags, subsets of documents can be pulled up for various purposes, such as generating reports, exporting the files in bulk, or converting to a format that can be attached as an exhibit to a motion.
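Under the hood, tagging can be as simple as keeping a set of document IDs for each tag. The Python sketch below uses a hypothetical TagStore class of my own to show how subsets such as "Relevant but not Privileged" fall out of ordinary set operations.

```python
from collections import defaultdict


class TagStore:
    """Keeps, for each tag, the set of document ids carrying that tag."""

    def __init__(self):
        self._docs_by_tag = defaultdict(set)

    def tag(self, doc_id: str, *tags: str) -> None:
        for t in tags:
            self._docs_by_tag[t].add(doc_id)

    def with_all(self, *tags: str) -> set:
        """Documents that carry every one of the given tags."""
        sets = [self._docs_by_tag.get(t, set()) for t in tags]
        return set.intersection(*sets) if sets else set()

    def with_tag_but_not(self, include: str, exclude: str) -> set:
        """Documents tagged `include` that are not tagged `exclude`."""
        return self._docs_by_tag.get(include, set()) - self._docs_by_tag.get(exclude, set())
```

A set of IDs returned this way is what a tool would then hand to its reporting, bulk export, or conversion features.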
E-Discovery Review Software: Extra Features
Besides the three core features discussed above, most e-discovery platforms have extra features that can be extremely valuable.
4. Conversion to Other Formats
Frequently, it is useful – if not essential – to be able to convert a native format document to another format so that it can be used in litigation. For example, most electronic filing systems only permit PDF documents to be uploaded, so documents in any other formats must first be converted. Most review tools offer the ability to convert documents from any supported format to PDF.
5. Document Viewing Options
When you open a document in your e-discovery tool, you may see it as a chunk of plain text, or it may be displayed much as it would appear if opened in its original application. When reviewing documents, being able to view a styled version without opening the original document is often useful. Remember, opening the original document is rarely a good idea – you could inadvertently change the file or expose your computer to a virus or other malware.
6. Audio Transcription
Electronic document productions often contain a variety of media, but they can only be searched for matching text. As a result, a file from which text cannot be extracted is invisible to the search engine. Fortunately, some e-discovery applications now offer the ability to create text transcriptions from audio or video files, permitting those files to be searched with text-based queries. If audio transcription is not available, some review tools will flag audio, video, and other files that lack text content so that they can be reviewed manually.
7. Optical Character Recognition (OCR)
OCR is to images what text transcription is to audio – they both locate text in non-textual media. Sometimes document sets include images taken from a smartphone or camera, and these images could be pictures of documents or even screenshots. OCR is necessary for the information in these images to be visible to the search index.
8. Email Threading
Emails are some of the most common sources of electronic evidence, and they can also be the most revealing. Having adequate email review functionality is critical for any document review project. Each email in the review tool should be grouped with its attachments and with other emails in the “chain” to which the email belongs. For example, if you are viewing an email that is a reply to an original message and that was subsequently replied to, there should be links to the original message and the later replies.
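One common way to link replies to their originals is to follow the standard Message-ID and In-Reply-To email headers. The Python sketch below assumes every message has a Message-ID header and ignores the separate References header. The function name build_threads is hypothetical, and commercial tools typically use additional signals such as subject lines and quoted text.

```python
from collections import defaultdict
from email import message_from_string


def build_threads(raw_messages: list[str]) -> dict[str, list[str]]:
    """Map each thread's root Message-ID to the Message-IDs in that thread, in input order."""
    parent = {}
    order = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg["Message-ID"].strip()  # assumption: the header is always present
        reply_to = msg["In-Reply-To"]
        parent[msg_id] = reply_to.strip() if reply_to else None
        order.append(msg_id)

    def root_of(msg_id: str) -> str:
        # Walk up the reply chain until reaching a message with no known parent.
        seen = set()
        while parent.get(msg_id) and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent[msg_id]
        return msg_id

    threads = defaultdict(list)
    for msg_id in order:
        threads[root_of(msg_id)].append(msg_id)
    return dict(threads)
```

If the original message is missing from the production, its replies are still grouped together under the missing message's ID, which can itself be a useful clue that something was withheld or lost.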
9. Machine Learning
Machine learning (known in e-discovery as “predictive coding” or “technology-assisted review”) is an exciting technology that offers the promise of locating relevant documents accurately and automatically. Some e-discovery experts believe that machine learning is able to locate relevant documents faster and more accurately than keyword searching. When working on a case with a large data set (e.g., multiple terabytes of ESI), I would recommend choosing a platform that has machine-learning capabilities.
10. Data Visualization
Data visualization in e-discovery runs the gamut from simple pie graphs showing the relative frequency of various file types in the source data to complex diagrams showing the volume of email traffic between different persons of interest in the case over time. While a well-designed data visualization can help litigators see the forest and not just the individual trees, visualizations are often gimmicky and not very helpful. It’s important to have an idea of what you’d like to see depicted in visualizations first, and then to assess whether a tool supplies that specific need.
The best way to obtain a good e-discovery solution for your case or your firm is to be as educated as possible about the technology, compare multiple options and negotiate pricing. Fortunately, a good e-discovery tool will pay for itself by making document review more efficient and helping you to locate “smoking gun” evidence in more of your cases.
In my final post in this series, I'll compare the pros and cons of desktop versus cloud-based review software. In the meantime, leave a comment below if you have any questions.