Computer scientists at Columbia University have developed a technique that compresses digital files so they can be stored in DNA. Embedding digital files in the genetic code could allow the data to survive for millennia.

The technique first compresses the files into a single master file and splits it into short strings of binary code made up of ones and zeros. The researchers then used an algorithm based on fountain codes to combine those strings at random into "droplets," and mapped the ones and zeros in each droplet onto the four nucleotide bases of DNA: A, G, C, and T. The algorithm discards any droplet whose letter combinations, such as long runs of the same base, are likely to cause errors. Each droplet also carries a bar code that helps reassemble the files when needed.
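The encoding steps above can be sketched in Python. This is a simplified illustration written for this article, not the researchers' software. The segment size, the 4-byte random seed used as each droplet's bar code, the A/C/G/T bit assignment, the uniform choice of one to three segments per droplet (real fountain codes use a carefully tuned distribution), and the screening thresholds (no run of more than three identical bases, 45 to 55 percent G and C) are all assumptions. It also leaves out the extra error protection a real strand would carry.

```python
import random

# Two bits per base. This particular bit-to-base assignment is an assumption.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def split_into_segments(data: bytes, segment_size: int) -> list[bytes]:
    """Cut the compressed master file into equal-length segments, zero-padding the last one."""
    padded_len = -(-len(data) // segment_size) * segment_size  # ceiling division
    padded = data.ljust(padded_len, b"\x00")
    return [padded[i:i + segment_size] for i in range(0, padded_len, segment_size)]


def segment_indices(seed: int, num_segments: int) -> list[int]:
    """Choose, reproducibly from the droplet's seed, which segments it combines.

    Simplification (hypothetical scheme): the number of segments is uniform over 1..3
    instead of following a soliton-style distribution.
    """
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, num_segments))
    return rng.sample(range(num_segments), degree)


def make_droplet(segments: list[bytes], seed: int) -> bytes:
    """XOR the chosen segments together and prefix the 4-byte seed as the droplet's bar code."""
    payload = bytearray(len(segments[0]))
    for idx in segment_indices(seed, len(segments)):
        for i, byte in enumerate(segments[idx]):
            payload[i] ^= byte
    return seed.to_bytes(4, "big") + bytes(payload)


def to_bases(data: bytes) -> str:
    """Map every pair of bits onto one of the four nucleotide bases."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def passes_screen(strand: str, max_run: int = 3,
                  gc_low: float = 0.45, gc_high: float = 0.55) -> bool:
    """Reject strands with long single-base runs or unbalanced G/C content (assumed thresholds)."""
    run = 1
    for prev, cur in zip(strand, strand[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return False
    gc = (strand.count("G") + strand.count("C")) / len(strand)
    return gc_low <= gc <= gc_high


def encode(data: bytes, segment_size: int, num_strands: int, rng_seed: int = 0) -> list[str]:
    """Generate droplets with random 32-bit seeds, keeping only those that pass the screen."""
    segments = split_into_segments(data, segment_size)
    rng = random.Random(rng_seed)
    strands: list[str] = []
    while len(strands) < num_strands:
        seed = rng.getrandbits(32)
        strand = to_bases(make_droplet(segments, seed))
        if passes_screen(strand):
            strands.append(strand)
    return strands
```

For example, `encode(b"Hello, DNA storage!", segment_size=8, num_strands=10)` returns ten 48-base strands (4 seed bytes plus 8 data bytes, at four bases per byte). Because each droplet's seed regenerates its list of segments, no separate index has to be stored alongside the strands.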

The scientists generated a digital list of 72,000 DNA strands, each 200 bases long, and sent it to Twist Bioscience, a San Francisco startup that specializes in converting digital data into biological data.
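The strand count and length give a quick upper bound on how much data the synthesized DNA could hold. With four possible bases, each position can encode two bits, so 72,000 strands of 200 bases come to at most 3.6 million bytes (about 3.6 megabytes). The actual file payload is smaller, because part of each strand is spent on the bar code rather than on file contents. The helper below is an illustrative sketch; its name is hypothetical.

```python
BITS_PER_BASE = 2  # four bases -> log2(4) = 2 bits per position


def raw_capacity_bytes(num_strands: int, bases_per_strand: int) -> float:
    """Theoretical ceiling: assumes every base carries file data, with nothing spent on bar codes."""
    total_bits = num_strands * bases_per_strand * BITS_PER_BASE
    return total_bits / 8
```

Calling `raw_capacity_bytes(72_000, 200)` returns `3600000.0`.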

So what's inside that speck of biological data?

Yaniv Erlich, a computer science professor at Columbia Engineering, and Dina Zielinski, an associate scientist at the New York Genome Center, squeezed six files into the DNA: a computer virus, a full computer operating system, a $50 Amazon gift card, the Pioneer plaque, the French film "Arrival of a Train at La Ciotat", and a 1948 study by Claude Shannon.

Retrieving and Copying the Data

To retrieve the files, the scientists used modern sequencing technology to read the DNA strands and software to translate them back into binary code. The files were recovered without a single error.
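Decoding runs the same steps in reverse. The sketch below, again a simplified illustration rather than the published software, converts each strand back into bytes, reads the 4-byte seed at its start to regenerate which segments that droplet combined, and then applies a "peeling" decoder: any droplet that covers exactly one still-unknown segment reveals it, and each recovered segment is XORed out of the other droplets. It assumes a hypothetical seed scheme and bit-to-base mapping that must match the encoder's exactly, and that every strand was read without errors; a real pipeline would also have to detect and discard corrupted reads.

```python
import random
from typing import Optional

# Inverse of the assumed 2-bit mapping (A=00, C=01, G=10, T=11).
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def to_bytes(strand: str) -> bytes:
    """Turn a sequenced strand back into bytes, four bases per byte."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def segment_indices(seed: int, num_segments: int) -> list[int]:
    """Hypothetical scheme: must match the encoder exactly, since the seed is the only index."""
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, num_segments))
    return rng.sample(range(num_segments), degree)


def decode(strands: list[str], num_segments: int) -> Optional[bytes]:
    """Peeling decoder: any droplet covering exactly one unknown segment reveals it."""
    droplets = []
    for strand in strands:
        raw = to_bytes(strand)
        seed = int.from_bytes(raw[:4], "big")  # the 4-byte bar code
        droplets.append((set(segment_indices(seed, num_segments)), bytearray(raw[4:])))

    recovered: dict[int, bytes] = {}
    progress = True
    while progress and len(recovered) < num_segments:
        progress = False
        for indices, payload in droplets:
            # XOR out every segment this droplet covers that is already known.
            for idx in [i for i in indices if i in recovered]:
                for pos, byte in enumerate(recovered[idx]):
                    payload[pos] ^= byte
                indices.discard(idx)
            # One unknown segment left: the remaining payload is that segment.
            if len(indices) == 1:
                idx = indices.pop()
                recovered[idx] = bytes(payload)
                progress = True

    if len(recovered) < num_segments:
        return None  # not enough droplets to recover everything yet
    return b"".join(recovered[i] for i in range(num_segments))
```

`decode` returns `None` when the droplets on hand cannot yet pin down every segment. That is the appeal of a fountain code: a large enough random set of droplets usually suffices, regardless of which particular strands were lost.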

They also said the technique can create a virtually unlimited number of copies by amplifying the DNA sample with the polymerase chain reaction. Remarkably, those copies are error-free as well.
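The copying step rests on a simple exponential. In the standard idealized model, each polymerase chain reaction cycle multiplies the number of strands by (1 + E), where E is the reaction efficiency (1.0 means perfect doubling), so n cycles turn N0 strands into N0(1 + E)^n. The function name and the efficiency parameter below are illustrative assumptions; real reactions fall somewhat short of perfect doubling.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: N = N0 * (1 + E) ** n, where E = 1.0 means every cycle doubles."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles
```

With perfect doubling, `pcr_copies(1, 30)` gives 2**30, a little over a billion copies of a single strand.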

The real whopper, however, is the amount of data a single gram of DNA can hold: an amazing 215 petabytes. One petabyte is one quadrillion bytes, or 1,000 terabytes. If that still doesn't make sense, a single petabyte can hold about 13.3 years of HDTV content, or roughly 58,000 movies.
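The unit conversions in this paragraph can be checked with a few lines of Python. The sketch uses decimal (SI) prefixes, where each step up is a factor of 1,000, and hard-codes the article's per-petabyte HDTV and movie equivalents as constants; the `describe` helper is hypothetical.

```python
BYTES_PER_TERABYTE = 10**12  # SI prefixes: tera = 10^12
BYTES_PER_PETABYTE = 10**15  # peta = 10^15, i.e. one quadrillion

HDTV_YEARS_PER_PETABYTE = 13.3  # the article's figure
MOVIES_PER_PETABYTE = 58_000    # the article's figure


def describe(petabytes: float) -> dict[str, float]:
    """Express a storage size in bytes, terabytes, and the article's everyday equivalents."""
    return {
        "bytes": petabytes * BYTES_PER_PETABYTE,
        "terabytes": petabytes * BYTES_PER_PETABYTE / BYTES_PER_TERABYTE,
        "hdtv_years": petabytes * HDTV_YEARS_PER_PETABYTE,
        "movies": petabytes * MOVIES_PER_PETABYTE,
    }
```

Passing any per-gram density to `describe` scales all four figures together.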

Despite the excitement, the scientists say further research is needed to make the technique much cheaper before it can become a common method of data storage.

The study was published in the journal Science.
