DNA: Database of the New Age

The other day, my uncle asked me if my other uncle was “inventing yottabytes.” I took the question more literally than I should have and replied with some snide remark about it making no sense to say that anyone is “inventing yottabytes.” What he was getting at, obviously, is this: is my uncle working on technology to handle that amount of information? The name of my uncle’s company is, in fact, Yottabyte, and in terms of far-reaching goals, sure, he’d love to be the one to harness (or just store) that amount of data, but so far it’s just a name. The fact is, no one has yet coined a term for any amount of data larger than a yottabyte because there’s nowhere close to that much information waiting to be stored, and we certainly wouldn’t have the means to store it all even if there were.

Let’s get some perspective:

1 yottabyte = 1,000,000,000,000,000,000,000,000 bytes = 10²⁴ bytes

If printed out, it would be something like 500 quintillion (500,000,000,000,000,000,000) pages of text. Currently, the largest hunks of data that are being thrown around are exabytes, and only by big guns like the NSA.1 

1 exabyte = 1,000,000,000,000,000,000 bytes = 10¹⁸ bytes

That’s 1 quintillion bytes. A yottabyte is 1 septillion bytes. In other words, it’s a whole hell of a lot bigger.2 Storing this much data requires massive databases, and I mean massive. Just to give you an idea, the NSA’s Utah Data Center will cover 1 million square feet and include four 25,000 square foot facilities housing rows and rows of servers, not to mention the 60,000 tons of cooling equipment needed to keep the servers from overheating and the huge electrical substation charged with meeting the center’s estimated 65 megawatt demand. To put it simply: big data is super high maintenance.
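If you want to sanity-check these numbers yourself, here’s a quick back-of-the-envelope sketch in Python (the 2,000 bytes of text per printed page is my own rough assumption, not anything official):

    # Back-of-the-envelope scale check (decimal SI units).
    exabyte = 10**18    # 1 quintillion bytes
    yottabyte = 10**24  # 1 septillion bytes

    # A yottabyte is a million exabytes.
    print(yottabyte // exabyte)  # 1000000

    # At roughly 2,000 bytes of text per printed page (my assumption):
    pages = yottabyte // 2000
    print(pages)  # 500000000000000000000, i.e. 500 quintillion pages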

This is why “we wouldn’t dream of blanketing every square meter of Earth with cameras, and recording every moment for all eternity/human posterity—we simply don’t have the storage capacity.” The question is: will we ever? The solution, it turns out, may be in our very own DNA.

In an instance of serendipity brought to me by my good friend Twitter, I stumbled upon the breaking news that a bioengineer and a geneticist at Harvard’s Wyss Institute had smashed the DNA storage density record by a factor of a thousand. DNA is an ideal candidate for data storage for three reasons:

  1. It’s really dense—you can store 1 bit in a base that’s only a few atoms large.
  2. It’s volumetric—shaped like a beaker rather than like a flat sheet the way traditional planar disks are. Think of it as pouring data into a jar rather than writing it out on a page.
  3. It’s super stable—DNA can survive for hundreds of thousands of years in a cardboard box in your basement. It doesn’t need to be kept cool or stored in a sub-zero vacuum the way other storage media do.

“The work, carried out by George Church and Sri Kosuri, basically treats DNA as just another digital storage device. Instead of binary data being encoded as magnetic regions on a hard drive platter, strands of DNA that store 96 bits are synthesized, with each of the bases (TGAC) representing a binary value (T and G = 1, A and C = 0). To read the data stored in DNA, you simply sequence it—just as if you were sequencing the human genome—and convert each of the TGAC bases back into binary.”3
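To make that scheme concrete, here’s a minimal sketch in Python. The mapping (T and G = 1, A and C = 0) comes from the article; which of the two synonymous bases gets written for a given bit is an arbitrary choice on my part, and the 96-bit message just echoes the per-strand payload quoted above:

    # A minimal sketch of the one-bit-per-base scheme described above.
    # Per the article, T and G read as 1, and A and C read as 0; which of
    # the two synonymous bases gets written for a bit is arbitrary here.
    WRITE = {"1": "T", "0": "A"}
    READ = {"T": "1", "G": "1", "A": "0", "C": "0"}

    def encode(bits):
        """Synthesize: turn a bit string into a strand of bases."""
        return "".join(WRITE[b] for b in bits)

    def decode(strand):
        """Sequence: turn a strand of bases back into a bit string."""
        return "".join(READ[base] for base in strand)

    message = "01101000" * 12  # 96 bits, the per-strand payload quoted above
    strand = encode(message)
    assert decode(strand) == message
    print(strand[:8])  # ATTATAAA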

Church and Kosuri successfully stored 700 terabytes (a terabyte is 10¹² bytes) of data in a single gram of DNA. I want you to think about this for a minute. Really think about it. A single gram of DNA. 1 gram of DNA, a droplet that can fit on the tip of your little finger, can store 700 terabytes of data. You’d need 700 one-terabyte drives to store that kind of information on hard drives. The densest data storage devices in use today are 3TB drives; you’d need about 233 of those, and together they would weigh 151 kilos. I can’t fit 151 kilos of anything on the tip of my little finger.

Sure, DNA isn’t exactly the fastest storage system out there, even though the speed with which we can sequence DNA has increased exponentially. (The Human Genome Project’s first sequencing of the human genome took years; today, it takes only a few hours. That’s astonishing growth, but it’s still not fast by data storage standards.) But for the long-term archival of truly massive amounts of data, DNA outperforms every other storage system by a margin that isn’t even fair. Throw in the possibility of storing information in living cells (even though living cells can only hold information for short periods of time) and we’re talking about storing information in our skin. And not just information, but a lot of information.
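For anyone who wants to check the drive math, here it is spelled out (the 0.65 kg per 3.5-inch drive is my own rough figure, not a spec sheet):

    # How many 3TB drives does 700TB take, and what would they weigh?
    data_tb = 700
    drive_tb = 3
    drive_kg = 0.65  # rough weight of one 3.5-inch desktop drive (my estimate)

    drives = round(data_tb / drive_tb)
    print(drives)                    # 233
    print(round(drives * drive_kg))  # 151 kilos, give or take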

It would appear then that the answer to the question posed earlier—will we ever have the storage capacity to record every moment for all eternity/human posterity—is yes. Biological storage will allow us to record any and everything without reservation. And if the entirety of human knowledge—every book, uttered word, and cat video—can be stored in a few hundred kilos of DNA, why wouldn’t we record it?

So what exactly did these two guys encode at their record-smashing storage density? George Church’s book Regenesis, of course. They made 70 billion copies of it.

  1.  I strongly urge everyone to read this article. It’s from a few months ago, and I’m shocked that I haven’t heard more about it. 

  2.  In between an exabyte and a yottabyte there is a zettabyte, which is equal to 1 billion terabytes, or 1,000,000,000,000,000,000,000 (10²¹) bytes. As of this writing, no database has stored a zettabyte of information yet. 

  3.  “Harvard cracks DNA storage, crams 700 terabytes of data into a single gram,” ExtremeTech. 
