Blog: Tips on Legal Tech, E-Discovery, & More

Database Discovery 102

Written by Jeff Kerr | June 12, 2018

In a previous post, we covered the nuts and bolts of databases and some of the terminology you need to know when discussing them. Now, we can get down to brass tacks and tell you how to get the data out so that you can use it to prove your case. Getting data out of certain databases is REALLY hard, and in some cases, you won't be able to extract raw data. But there are always options. I'll provide some tried-and-true techniques to use in a variety of database discovery situations.

Getting Data When Exports are Enabled

Conscientious app developers allow users to download their own data in a variety of formats, and most of them publish online documentation explaining exactly how an export is generated. The best way to find out whether exports are enabled is to read that documentation, and finding it is much easier than it sounds. Let's see why.

Assume that you know your opponent uses Kronos as its payroll and time-tracking database, and you've requested data from the Kronos database, including data showing when various records were edited and who edited them. To home in on the documentation you need, just search Google. Here's the query I'd use: "kronos data export csv". I add "csv" to the query because CSV is the format I most often want for database exports (more on this later). In the first few results, I confirmed that Kronos enabled end users to generate custom reports in exactly the format I want.

If your opposing counsel is claiming that providing data in the format you want is too expensive or even not possible, the best rejoinder is to send them what you learned from the support documentation for the software their client uses. Having this documentation would also be very useful in the event you need to file a motion to compel. Google is your friend, and knowledge is power.

Related: DIY eDiscovery: Taking Control & Saving Money

Embedded Databases

Some databases are a nightmare to work with. For example, have you ever used an app that doesn't require you to create an online account? Lots of apps work this way but still save data so that you see your past activity when you reopen the app. They generally do this with the help of a tiny embedded database, one per app. The software typically used in this scenario is called SQLite. If you are able to obtain the SQLite database (usually a single file), then you may be able to extract data from it with the help of a computer expert, or, if you're feeling brave, you can browse the data with an app like DB Browser for SQLite.
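
If you do get the file and want to try extracting it yourself, here is a minimal sketch in Python that copies every table in an SQLite file into its own CSV file, using only the standard library's sqlite3 and csv modules. The function name export_sqlite_tables and the paths are hypothetical. Run it against a copy of the database, not the original evidence, and note that binary (BLOB) columns would need separate handling.

```python
import csv
import sqlite3
from pathlib import Path


def export_sqlite_tables(db_path, out_dir):
    """Write every user table in an SQLite file to its own CSV file (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Open via a file: URI with mode=ro so SQLite will not write to the file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # sqlite_master describes the schema; skip SQLite's internal tables.
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
        written = []
        for table in tables:
            # Quote the table name in case it contains spaces or quotes.
            quoted = '"' + table.replace('"', '""') + '"'
            cursor = conn.execute(f"SELECT * FROM {quoted}")
            headers = [col[0] for col in cursor.description]
            out_path = out_dir / f"{table}.csv"
            with open(out_path, "w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f)
                writer.writerow(headers)   # column names as the first row
                writer.writerows(cursor)   # then every row in the table
            written.append(out_path)
        return written
    finally:
        conn.close()
```

Opening the file with mode=ro asks SQLite not to modify it, but that is no substitute for working from a forensic copy.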

That's all well and good when it works, but in many cases you won't be able to obtain the database. If you don't have the database, then your options can be extremely limited. In some cases, your best bet will be to obtain screenshots of the various pages in the app showing how the data appears on screen. Be sure to consider how these will be authenticated.

Embedded databases are everywhere. You might have one in your refrigerator, your car, or your watch. It's important to be aware of them and to devise strategies for extracting data from them.

Related: Dictionary of E-Discovery: A Helpful Glossary of ESI Terminology

Third-Party Cloud Databases

Apps like Facebook, Twitter, and Instagram run on databases, but you're never going to get access to the actual database. For one thing, you'd have to invest millions in setting up a data center just to host that quantity of information. More importantly, the companies could not give it to you: the data you're seeking is mingled with data belonging to millions of other users.

Again, your best option is to research the available export options with a Google search. Both Twitter and Facebook let end users export their own data. It's not clear how complete these exports are, but they are better than nothing, and they usually come in a friendly format.

Related: Cloud Computing and the Practice of Law

Export Formats: CSV and JSON

Dealing with databases in discovery requires knowing about the most common data interchange formats. You may already be familiar with CSV (it stands for "Comma Separated Values"), and in fact this format is very user friendly because you can open CSV files in Excel or Google Sheets. Still, CSV exports can pose difficulties. For instance, if you receive a raw export from a relational database, you may see numbers in places where you expected to see names. Let's see why.

Assume you've obtained CSV file exports from a timekeeping database. Let's assume that the database has separate "tables" for employees and for time punches. (In fact, it would be a very poorly designed database if employee data and time-punch data appeared in the same table!) The time-punch CSV file will probably have a column (called a "field") named "Employee," but you won't see employee names here. You'll see numbers, which are references to the ID number of each employee in the employees table (and CSV file).
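
Inside the database itself, those ID numbers are resolved with what SQL calls a JOIN. If you ever get query access to the database, or an SQLite copy of it, the lookup might look like the sketch below, run through Python's built-in sqlite3 module. The table and column names (employees, time_punches, employee, punch_time) and the sample data are made up for illustration; a real timekeeping system will have its own schema.

```python
import sqlite3


def punches_with_names(conn):
    """Return (punch_time, name) pairs, replacing employee IDs with names."""
    # time_punches stores only the employee's ID number; the JOIN looks up
    # each ID in the employees table and pulls in the matching name.
    query = """
        SELECT p.punch_time, e.name
        FROM time_punches AS p
        JOIN employees AS e ON e.id = p.employee
        ORDER BY p.punch_time
    """
    return conn.execute(query).fetchall()


# A tiny in-memory database with hypothetical data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE time_punches (employee INTEGER, punch_time TEXT);
    INSERT INTO employees VALUES (101, 'Alice Smith'), (102, 'Bob Jones');
    INSERT INTO time_punches VALUES
        (101, '2018-06-01 08:58'), (102, '2018-06-01 09:03'),
        (101, '2018-06-01 17:02');
""")
for punch_time, name in punches_with_names(conn):
    print(punch_time, name)
# 2018-06-01 08:58 Alice Smith
# 2018-06-01 09:03 Bob Jones
# 2018-06-01 17:02 Alice Smith
```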

To make the data more readable, you'll probably want to add the employee names to the time-punch CSV file. There are a variety of methods for doing this, but they aren't entirely trivial. I personally would write a short script in Python to accomplish the task, but lookup functions such as VLOOKUP in Excel or Google Sheets can do the same thing. If you're not a computer ninja, you may want to check with an expert.
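
Here is roughly what such a short Python script could look like, using only the standard csv module. It assumes, hypothetically, that the employees file has ID and Name columns and that the time-punch file has the Employee column described above; rename those to match the headers in your actual exports. The function name add_employee_names is likewise just for illustration.

```python
import csv


def add_employee_names(employees_csv, punches_csv, output_csv):
    """Copy the time-punch file, adding an EmployeeName column looked up by ID."""
    # Build an ID -> name lookup table from the employees export.
    with open(employees_csv, newline="", encoding="utf-8-sig") as f:
        names = {row["ID"]: row["Name"] for row in csv.DictReader(f)}

    with open(punches_csv, newline="", encoding="utf-8-sig") as src, \
         open(output_csv, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames + ["EmployeeName"])
        writer.writeheader()
        for row in reader:
            # Flag IDs with no matching employee instead of silently dropping them.
            row["EmployeeName"] = names.get(row["Employee"], "UNKNOWN ID")
            writer.writerow(row)
```

Reading with the utf-8-sig encoding strips the byte-order mark that Excel sometimes adds to CSV files, and unmatched IDs are flagged rather than dropped so that gaps in the data stay visible.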

Related: 3 Simple & Effective Uses of Excel for Litigation

JSON (JavaScript Object Notation) is the new kid at the party when it comes to data interchange formats. It's extremely popular now due to the rise of complex websites that hot-swap data without reloading the page (called Single-Page Apps). You can't open a JSON file in Excel the way you can a CSV file, but you may be able to convert JSON to CSV using an online utility, such as Convert JSON to CSV. A word of caution, though: I'd avoid uploading confidential data to an untrusted website, so don't use that site for real data in a case. Again, you may want to consult an expert, but don't let them overcharge you! Converting JSON to CSV, even for an enormous JSON file, can be done in about four lines of code that a beginner coder could write.
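
To show how little code is involved, here is a Python sketch (a bit longer than four lines, for safety) using the standard json and csv modules. It assumes the file is a top-level JSON array of flat objects, a common but not universal export shape; nested objects would need to be flattened first, and json.load reads the whole file into memory. The function name json_to_csv is hypothetical.

```python
import csv
import json


def json_to_csv(json_path, csv_path):
    """Convert a JSON array of flat objects into a CSV file."""
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)  # assumes a top-level list of objects

    # Gather every key from every record, in first-seen order, because
    # JSON records do not all have to share the same fields.
    fieldnames = list(dict.fromkeys(key for record in records for key in record))

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)  # fields missing from a record become empty cells
```

Collecting field names from every record, rather than just the first, keeps a field that appears only in later records from being lost.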

Wrapping Up

Databases are everywhere, and they will have relevant information in nearly every case you litigate — especially today, in 2018. The best approach to dealing with databases is to do your homework and to be as pragmatic as possible. Due to the variety of databases you'll encounter, a one-size-fits-all approach won't work. You'll probably want to find someone who can help you with the more technically challenging aspects of database discovery. Or you can develop some useful skills yourself in a few weekends using sites like Codecademy or Treehouse (check out the tutorials on SQL and on Python).
