Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Timely! I'm just in the process of uploading 32 million objects to S3 :-)

The parallel upload code I'm using is written in Python, using the multiprocessing and boto libraries and is here:

http://github.com/twpayne/s3-parallel-put

It has some nice features, like reading values directly from uncompressed tar files - this means your disk heads will scan linearly rather than seeking around. It can also gzip and set Content-Encoding, restart interrupted transfers from its own log files, and do a MD5 sum check to avoid putting keys that are already set.

Comments and feedback welcome.



Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: