Timely! I'm just in the process of uploading 32 million objects to S3 :-)

The parallel upload code I'm using is written in Python with the multiprocessing and boto libraries, and it's here:

http://github.com/twpayne/s3-parallel-put
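For anyone curious about the general shape of a parallel uploader, here's a minimal sketch. It is not the s3-parallel-put code. `put_object` is a hypothetical stub standing in for a real boto PUT so the sketch runs offline. I've used `multiprocessing.pool.ThreadPool`, which puts threads behind the same interface as `multiprocessing.Pool`. Uploads are mostly network-bound, so threads are usually enough. If you swap in `multiprocessing.Pool` for separate processes, keep `put_object` at module level so it can be pickled, and start the pool under an `if __name__ == "__main__":` guard.

```python
from multiprocessing.pool import ThreadPool


def put_object(job):
    """Hypothetical stand-in for a real S3 PUT (e.g. via boto).

    It just returns the key and body length so the sketch runs offline.
    """
    key, body = job
    return key, len(body)


def parallel_put(jobs, workers=8):
    """Fan (key, body) jobs out over a pool of worker threads.

    imap_unordered yields each result as soon as its upload finishes,
    so one slow object doesn't hold up the rest.
    """
    with ThreadPool(processes=workers) as pool:
        # Consume every result before the with-block terminates the pool.
        return dict(pool.imap_unordered(put_object, jobs))
```

For example, `parallel_put([("a", b"xy"), ("b", b"z")], workers=2)` returns `{"a": 2, "b": 1}`.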

It has some nice features. It reads values directly from uncompressed tar files, so your disk heads scan linearly rather than seeking around. It can also gzip values and set Content-Encoding, restart interrupted transfers from its own log files, and compare MD5 checksums to avoid re-putting keys that are already set.
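Here's a rough sketch of how those three ideas fit together, using only the standard library. Again, this is my illustration and not how s3-parallel-put implements them. `plan_uploads` and the `existing_etags` dict are hypothetical; the dict stands in for what a HEAD request against the bucket would report. Tar mode `"r|"` reads the archive as a forward-only stream, so there's no seeking. The MD5 skip relies on S3 reporting the ETag of a plain single-part PUT as the quoted hex MD5 of the body. That doesn't hold for multipart uploads or SSE-KMS/SSE-C objects. Compression uses `mtime=0` so the same input always gzips to the same bytes. Otherwise the timestamp in the gzip header would change the MD5 on every run and defeat the check.

```python
import gzip
import hashlib
import io
import tarfile


def iter_tar_members(fileobj):
    """Yield (name, data) for each regular file, in archive order."""
    # "r|" = uncompressed tar read as a stream: strictly sequential I/O.
    with tarfile.open(fileobj=fileobj, mode="r|") as tar:
        for member in tar:
            if member.isfile():
                # Must read now; stream mode can't go back to this member.
                yield member.name, tar.extractfile(member).read()


def encode_body(data, use_gzip):
    """Return (body, headers); gzipping adds a Content-Encoding header."""
    headers = {}
    if use_gzip:
        data = gzip.compress(data, mtime=0)  # deterministic output
        headers["Content-Encoding"] = "gzip"
    return data, headers


def should_skip(body, existing_etag):
    """True if an existing single-part object's ETag matches body's MD5."""
    if existing_etag is None:
        return False
    return existing_etag.strip('"') == hashlib.md5(body).hexdigest()


def plan_uploads(tar_fileobj, existing_etags, use_gzip=True):
    """Return [(key, body, headers)] for members that still need a PUT."""
    plan = []
    for name, data in iter_tar_members(tar_fileobj):
        body, headers = encode_body(data, use_gzip)
        if not should_skip(body, existing_etags.get(name)):
            plan.append((name, body, headers))
    return plan


def make_tar(files):
    """Build an in-memory uncompressed tar from a {name: bytes} dict."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf
```

If the bucket already holds `a.txt` with matching content, then `plan_uploads(make_tar({"a.txt": b"hello", "b.txt": b"world"}), existing)` plans a PUT for `b.txt` only.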

Comments and feedback welcome.