The throughput of a hard disk

Today, we routinely send even highly confidential data over public networks, i.e. the Internet. However, as we have seen, Quantum Computing means that our currently safe encrypted data connections are not guaranteed to remain safe in the future.

So we can either trust computer scientists to come up with more sophisticated means of securing our network traffic, or look out for alternatives. As one perhaps not so obvious solution, let’s think about sending a data medium by postal mail.

Sending a bank transaction

We start by considering a bank transaction, which contains data such as the sending and receiving bank accounts and account holders, the amount and currency, the desired transaction date and some more information. Overall, this data fits into much less than 1 MB. With a current 100 MBit/s Internet connection, such a transaction can be sent within a fraction of a second. Comparing this to postal mail, which takes a day or two, is simply ridiculous.

Sending Big Data

But we know that not all datasets are that small, and there is this Big Data buzzword we hear all the time. When talking to people, you will quickly discover that this “Big” is interpreted quite differently. For example, we at Cloudflight are processing satellite data measured in Petabytes (a Petabyte is 1,000 Terabytes or 1 million Gigabytes).

Now, let’s think of a common 4 Terabyte hard disk and its throughput (“bandwidth”) when sending it via postal mail. We assume that it takes exactly 2 days until the disk arrives at its destination. Doing some unit conversions (counting 1 Terabyte as 1,024 × 1,024 MB and 8 MBit per MB), this is about 33.5 million MBit sent in 172,800 seconds. The result is a throughput of 194 MBit/s, which suddenly corresponds to a very fast Internet connection.
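The arithmetic above can be reproduced in a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes binary units (1 Terabyte = 1,024 × 1,024 MB), which is the conversion that yields the 33.5 million MBit figure.

```python
def shipping_throughput_mbit_s(capacity_tb: float, transit_days: float) -> float:
    """Effective throughput in MBit/s of a disk travelling by mail.

    Assumption: 1 TB = 1024 * 1024 MB and 1 MB = 8 MBit.
    """
    megabits = capacity_tb * 1024 * 1024 * 8   # TB -> MB -> MBit
    seconds = transit_days * 24 * 60 * 60      # days -> seconds
    return megabits / seconds


# A 4 TB disk that needs exactly 2 days to arrive
print(round(shipping_throughput_mbit_s(4, 2)))  # 194
```

Putting several disks into the same parcel scales the result linearly, since the transit time stays the same.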

What remains is a latency of those 2 days. However, as the amount of data sent increases, this becomes more and more irrelevant for use cases where, in contrast to streaming, a complete dataset has to be transferred.
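To get a feeling for when the 2-day latency stops mattering, one can estimate the dataset size at which the parcel and a network link finish at the same time. The following sketch uses a hypothetical helper name and makes simplifying assumptions: the parcel always takes the same time regardless of size, the link runs at full speed the whole time, copying data onto and off the disks is ignored, and 1 Terabyte is counted as 1,024 × 1,024 MB.

```python
def break_even_tb(link_mbit_s: float, transit_days: float) -> float:
    """Dataset size in TB that a network link transfers in exactly the
    time a parcel needs to arrive. Anything larger arrives sooner by mail.

    Assumption: 1 TB = 1024 * 1024 MB and 1 MB = 8 MBit.
    """
    seconds = transit_days * 24 * 60 * 60
    megabits = link_mbit_s * seconds           # what the link moves meanwhile
    return megabits / (8 * 1024 * 1024)        # MBit -> TB


# 100 MBit/s link versus a 2-day delivery
print(round(break_even_tb(100, 2), 2))  # 2.06
```

Under these assumptions, any complete dataset larger than roughly 2 Terabytes would arrive earlier by mail than over a 100 MBit/s connection.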

Application

If this sounds very theoretical to you, you may be surprised that this is actually done in practice. Migrating or mirroring Petabytes or more of data from one data center to another can take years via ordinary Internet connections.
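A rough estimate shows where the "years" come from. The sketch below uses a made-up function name and assumes a link that is fully saturated around the clock, 1 Petabyte = 1,000 Terabytes as stated earlier, and 1 Terabyte = 1,024 × 1,024 MB.

```python
def network_transfer_years(petabytes: float, link_mbit_s: float) -> float:
    """Years needed to push a dataset through a network link.

    Assumptions: 1 PB = 1000 TB, 1 TB = 1024 * 1024 MB, 1 MB = 8 MBit,
    and the link is used at full speed without interruption.
    """
    megabits = petabytes * 1000 * 1024 * 1024 * 8
    seconds = megabits / link_mbit_s
    return seconds / (365 * 24 * 60 * 60)


# 1 Petabyte over a 100 MBit/s connection
print(f"{network_transfer_years(1, 100):.1f} years")
```

With these numbers, a single Petabyte already keeps a 100 MBit/s connection busy for well over two years, and real links rarely run at full speed all the time.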

The big data center companies provide services where they ship special boxes or even trucks filled with storage for the migration or delivery of Peta- or Exabytes of data.

To sum up, sending data via traditional means instead of today’s Internet connections is a method actually applied in practice. In addition, it allows for data security mechanisms other than encryption, which is necessary when public networks are used.


Are you curious now what the actual programming challenge will be? Register for the Coding Contest and find out this Friday among several thousand fellow software developers!

We at Cloudflight are continuously searching for motivated colleagues who want to join our team and push the boundaries of the latest technologies together with us.