Thursday, December 25, 2008

Entering the Petabyte Age

Imagine a world 100% digitized and interconnected, gathering information and content via devices and sensors of all kinds (small to micro to nano to dust) and showing it on dashboards of all kinds on smartphones, PCs, tablets, readers, watches, and TVs. Well, that world is upon us. And the real challenge will be the amount of data it generates and how to extract relevant decision-making information from it all.

We measure data storage in bytes. A kilobyte is 1,000 bytes. A megabyte is 1,000,000 bytes. A gigabyte is 1,000,000,000 bytes. A terabyte is 1,000,000,000,000 bytes. And a petabyte is 1,000,000,000,000,000 bytes.

In binary terms, a petabyte is 2 to the 50th power, or 1,125,899,906,842,624 bytes. In decimal terms, it is 10 to the 15th power, or 1,000,000,000,000,000 bytes. To avoid ambiguity, the binary figure is properly called a pebibyte rather than a petabyte, though both meanings of petabyte remain in common use. Either way, a petabyte is the unit above the terabyte (1,000 terabytes in decimal terms, 1,024 in binary terms) and the unit below the exabyte.
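The gap between the decimal and binary definitions is easy to check with a little arithmetic. Here is a minimal Python sketch; the table and function names (SI_UNITS, IEC_UNITS, gap_percent) are made up for illustration and do not come from any library.

```python
# Decimal (SI) units are powers of 1,000; binary (IEC) units are powers of 1,024.
SI_UNITS = {
    "kilobyte": 10**3,
    "megabyte": 10**6,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
}
IEC_UNITS = {
    "kibibyte": 2**10,
    "mebibyte": 2**20,
    "gibibyte": 2**30,
    "tebibyte": 2**40,
    "pebibyte": 2**50,
    "exbibyte": 2**60,
}


def gap_percent(si_name, iec_name):
    """Return how much larger the binary unit is than the decimal unit, in percent."""
    si = SI_UNITS[si_name]
    iec = IEC_UNITS[iec_name]
    return (iec - si) / si * 100


print(IEC_UNITS["pebibyte"])                             # 1125899906842624
print(IEC_UNITS["pebibyte"] // IEC_UNITS["tebibyte"])    # 1024
print(SI_UNITS["petabyte"] // SI_UNITS["terabyte"])      # 1000
print(round(gap_percent("petabyte", "pebibyte"), 2))     # 12.59
```

At the kilobyte level the two conventions differ by only 2.4%, but the difference compounds with every step up, reaching roughly 12.6% at the petabyte level.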

So how do you get a feeling for what a terabyte or petabyte holds? Let's try a few examples:

1 terabyte - a drive that goes for less than $200 and can hold some 300,000 songs.

20 terabytes - the volume of photos uploaded to Facebook each month.

200 terabytes - all the data in the U.S. Library of Congress.

500 terabytes - all the videos on YouTube.

1,000 terabytes or 1 petabyte - data processed by Google's servers every 2 hours.

60,000 terabytes or 60 petabytes - one year of 5-minute-interval data reads from all the meters in the U.S.

10 million terabytes or 10,000 petabytes - one year of one-minute-interval data reads from all the electric appliances/devices connected to the U.S. grids.
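To get a feel for how interval reads add up to totals like the last two, here is a rough Python sketch. It uses decimal units. The function names and, above all, the meter count are assumptions chosen for illustration; the figures above do not say how many meters or how many bytes per read they assume.

```python
TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes


def reads_per_year(interval_minutes, days=365):
    """Number of reads one device produces in a year at a fixed interval."""
    return days * 24 * 60 // interval_minutes


def bytes_per_read(total_bytes, device_count, interval_minutes):
    """Average bytes per read implied by a yearly total and a device count."""
    return total_bytes / (device_count * reads_per_year(interval_minutes))


print(reads_per_year(5))  # 105120 reads per meter per year
print(reads_per_year(1))  # 525600 reads per device per year

# Hypothetical figure, NOT taken from the post: assume 100 million meters.
HYPOTHETICAL_METER_COUNT = 100_000_000
print(bytes_per_read(60 * PB, HYPOTHETICAL_METER_COUNT, 5))
```

Under that assumed meter count, 60 petabytes a year works out to a few kilobytes per read. Change the count and the per-read size moves with it, so treat the result as a back-of-the-envelope check rather than a measurement.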

Computers made the digitization of information possible sixty years ago. The Internet made that information reachable twenty years ago. Search engine crawlers made it all a single database ten years ago. And now new semantic search products are about to turn all that data into incredibly valuable information for many. Rules engines, correlation engines, predictive engines, and autonomic engines are all redefining how data becomes information.

With sensors everywhere, clouds of processors, and infinite storage, we could capture, warehouse, and understand massive amounts of data, changing how we make decisions in any discipline. How would science, marketing, medicine, financial services, energy, law, and many other crafts change with the true ability to capture every data point and make decisions from it? And what about using that data to predict decisions into the future?

The possibilities are endless for new opportunities.
