Dropbox is a really cool service. I love being able to share files with friends and co-workers with such ease.
But recently I noticed something a bit surprising.
When I copied a large disk image to my Dropbox, within a few seconds, it had finished uploading: “All Files up to date.” OK, I have a pretty good connection, but it seemed impossible that I could somehow upload hundreds of megabytes in just a few seconds. I then went to download that file from a different computer, and sure enough, the whole file came down, intact.
How was Dropbox able to upload so damn quickly? I tested this again several times using various large files (music, movies, whatever). What I found was that certain large files uploaded almost immediately, while others took hours. Was it compression? Maybe delayed uploads?
Here’s a screencast showing this in action (without sound):
dropbox exposed from Jeremy Seitz on Vimeo.
From what I can tell, Dropbox calculates and remembers the unique fingerprints of files. So if user A uploads a big file, and later, user B uploads the SAME file, then Dropbox recognizes that. There’s no need to waste the bandwidth (and presumably, the disk space in Dropbox’s cloud) for that second upload.
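To make that guess concrete, here is a minimal sketch of how fingerprint-based deduplication could work. This is not Dropbox's actual code or protocol: the `DedupStore` class, the `upload` function, and the choice of SHA-256 as the fingerprint are all my own assumptions for illustration.

```python
import hashlib


class DedupStore:
    """Toy stand-in for the server side (hypothetical, not Dropbox's API)."""

    def __init__(self):
        self.blobs = {}             # fingerprint -> file contents
        self.bytes_transferred = 0  # how much data actually crossed the wire

    def has(self, fingerprint):
        return fingerprint in self.blobs

    def receive(self, fingerprint, data):
        self.blobs[fingerprint] = data
        self.bytes_transferred += len(data)


def fingerprint(data):
    # Assumption: a cryptographic hash serves as the file's unique fingerprint.
    return hashlib.sha256(data).hexdigest()


def upload(store, data):
    """Return (fingerprint, was_sent). Data is only sent if the store lacks it."""
    fp = fingerprint(data)
    if store.has(fp):
        return fp, False  # someone already uploaded these exact bytes
    store.receive(fp, data)
    return fp, True


store = DedupStore()
disk_image = b"\x00" * 1_000_000

_, sent_a = upload(store, disk_image)  # user A: full upload
_, sent_b = upload(store, disk_image)  # user B: finishes "instantly"
print(sent_a, sent_b, store.bytes_transferred)
```

In this sketch the second upload transfers nothing, which would match the "finished in a few seconds" behavior I saw. Notice that the store answers the "do you already have this?" question without caring which user asked.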
This is brilliant, because I’m sure it saves them huge amounts of bandwidth and disk resources. However, in my testing, the files I uploaded were not shared with anyone, and they were not public. Dropbox was able to “magically” upload huge files that I had never put in my account before. I can only assume that another user had uploaded the same files to their Dropbox before I had.
Maybe everyone is cool with it, but I think the privacy implications are pretty significant. Dropbox claims that their employees can’t see the CONTENTS of your files. But apparently, they know that different users have the SAME FILES.
With that in mind, here are some hypothetical ways in which this could be exploited:
Suppose a movie industry lawyer uploads a pirated film to Dropbox. If the upload finished instantly, they could potentially prove that at least one user had that file. Perhaps this is enough to convince a court to force Dropbox to release information?
Suppose a recording studio wanted to make sure that a hot new album, ready for release, had not been leaked to the Internet. So they put the files in their own Dropbox folder. If the upload finished right away, they would realize that there was a security problem at hand.
For a hacker, knowing when a file is NOT UNIQUE could be particularly interesting. Let’s suppose that company A has a file on Dropbox that contains encrypted data. Evil company B has some information about that file, but they don’t have the encryption key. Could they use this feature to crack it?
Maybe this sounds overly paranoid. Again, I like Dropbox, but I would not put sensitive or private data there. Clever and cool tech? Definitely. Scary? I think it is, a little.
UPDATE: According to the Wikipedia page for Dropbox, they use Delta Technology: “Files are split into chunks, the chunks are hashed and only those chunks that have never been uploaded before by any user are transmitted again. This makes uploading popular files very efficient and helps if only small portions of a large file has changed.” That certainly explains what I observed.
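Here is a rough sketch of the chunk-level scheme that quote describes, as I understand it. Again, this is only an illustration under my own assumptions: the function names, the fixed-size chunking, the tiny `CHUNK_SIZE`, and SHA-256 are all made up for the example and are not taken from Dropbox.

```python
import hashlib

CHUNK_SIZE = 4  # assumption: absurdly small so the example is easy to follow


def split_chunks(data, size=CHUNK_SIZE):
    """Cut the data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def upload_chunked(known_chunks, data, size=CHUNK_SIZE):
    """Send only chunks whose hash is not already in known_chunks.

    known_chunks is a dict shared by ALL users (hash -> chunk bytes).
    Returns (manifest, bytes_sent), where manifest is the ordered list
    of chunk hashes needed to rebuild the file.
    """
    manifest = []
    bytes_sent = 0
    for chunk in split_chunks(data, size):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in known_chunks:
            known_chunks[digest] = chunk  # only new chunks are transmitted
            bytes_sent += len(chunk)
        manifest.append(digest)
    return manifest, bytes_sent


def download(known_chunks, manifest):
    """Rebuild a file from its manifest."""
    return b"".join(known_chunks[d] for d in manifest)


shared = {}
_, first = upload_chunked(shared, b"AAAABBBBCCCC")          # brand-new file
manifest, second = upload_chunked(shared, b"AAAAXXXXCCCC")  # middle chunk changed
print(first, second)
```

With these inputs, the first upload sends every chunk and the second sends only the one chunk that differs, which is why popular or slightly edited files would go up so fast.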