I was faced with the challenge of transferring some files from an FTP server to S3. At first, the problem seemed simple: I just had to get the files to my local machine and then upload them to S3. Piece of cake! But lo and behold, there were over 32K files, and each step took me almost 2 minutes! Ooh la la!
I knew this was a job for (fanfare sound here) the almighty Ruby! :)
And so, I set my priorities straight: I had to work elegantly and save time. I'm feeling quite exhausted and would really love to get some sleep soon, so here I am, depending on my great yet simple Ruby skills and the power of Ruby itself to help me achieve this task!
As I write this article, my script is silently running in the background as a very obedient and competent slave. :P But to help you out, here is a brief guide on how to sit back, relax and let the task be accomplished in no time! You should really be sending me gifts, or better yet, donate to me via PayPal! I'll love it, I promise. I might even give you a page of exposure! :)
So, here's the deal. You have a set of files to download via FTP and a particular folder structure where you want them to go.
Gather your FTP credentials as follows:
We will need some gems here. If you don't have them yet, now is a good time to install them:
sudo gem install net-sftp
sudo gem install fastercsv
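net-sftp handles the communication with your server. Note that it speaks SFTP (file transfer over SSH); if your server only offers plain FTP, for example with anonymous access, Ruby's built-in Net::FTP library covers that case. You can use another CSV parser if you already have something else in mind (on Ruby 1.9 and later, FasterCSV is the standard CSV library, so no gem is needed). Just tweak the code for that section.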
Now, pull out your beloved spreadsheet editor (mine is OpenOffice) and aggregate your data into two columns, one row per file:
| Full remote path to the files | Full local path to save into |
| /your/ultra/deep-stack/of/remote/folders/path/to/A super special filename here.doc | |
With this done, export your file to CSV format and clear the Text Delimiter setting, so that the paths are not wrapped in quotes.
[Image: Export]
Open your CSV file using your favorite text editor (mine is TextWrangler, you might want to try it. It's free). Make sure that all spaces in your filenames are escaped properly, so that a path like the first line below becomes the second:
/your/deep-stack/of/remote/folders/A super special filename here.doc
/your/deep-stack/of/remote/folders/A\ super\ special\ filename\ here.doc
Then don't forget to save your CSV file.
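If you would rather not escape 32K paths by hand, the step above can be sketched in Ruby. This is only an illustration: `escape_spaces` is a hypothetical helper name, and it assumes spaces are the only characters that need escaping.

```ruby
# Escape every space that is not already preceded by a backslash.
# `escape_spaces` is a hypothetical helper, not part of any library.
def escape_spaces(path)
  # The block form inserts the replacement literally, with no
  # backslash interpretation by gsub.
  path.gsub(/(?<!\\) /) { '\\ ' }
end

puts escape_spaces('/your/deep-stack/of/remote/folders/A super special filename here.doc')
```

Running it twice on the same path changes nothing the second time, so it is safe to apply to a file that is already partly escaped.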
Now you have reached a fork in the road. You have two (2) choices. Either one works fine. Just pick your desired approach.
- Save the remote paths and the local paths in two separate CSV files for loading (version 1), or
- Use one big CSV file with the remote and local path on each row (version 2)
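The two choices could be loaded along these lines. This is a rough sketch and not the original script: the method names are made up for illustration, and it uses Ruby's standard `csv` library in place of the FasterCSV gem. Both versions return the same list of `[remote, local]` pairs, so the rest of the script does not care which one you picked.

```ruby
require 'csv'

# Version 1: two files with one path per line, matched up by line number.
# (Hypothetical helper name.)
def load_pairs_from_two_files(remote_file, local_file)
  remotes = File.readlines(remote_file, chomp: true).reject(&:empty?)
  locals  = File.readlines(local_file, chomp: true).reject(&:empty?)
  unless remotes.size == locals.size
    raise ArgumentError, 'both files must list the same number of paths'
  end
  remotes.zip(locals)
end

# Version 2: one file with "remote,local" on each row.
# (Hypothetical helper name.)
def load_pairs_from_one_file(csv_file)
  CSV.read(csv_file, skip_blanks: true).map do |row|
    # Keep the first two columns and trim stray whitespace around them.
    row.first(2).map { |cell| cell.to_s.strip }
  end
end
```

Version 1 reads plain lines, so a comma inside a path does no harm there; in version 2 a comma inside a path would split the row in the wrong place.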
The rest is quite straightforward from this point onwards. Just fill in your constants in the Ruby script and make sure that your files are in the right directories. All set? Execute away!
The generated errors.txt file can help you retry the parts of the download or upload that failed. Sometimes a path is wrong, or you forgot to specify the correct local directory.
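For reference, the download loop with its error log might look something like the sketch below. It is not the original script: the method names and the layout of each errors.txt line are assumptions made for illustration. The paths are passed to the server exactly as they appear in your CSV, so whether the backslash escapes are needed depends on how your own script hands them over.

```ruby
require 'fileutils'

# Runs the given block once per [remote, local] pair. A failure is appended
# to the error log and the loop moves on to the next file.
# Returns the pairs that failed so they can be retried later.
# (Hypothetical helper; the log line format is an assumption.)
def transfer_all(pairs, error_log: 'errors.txt')
  failed = []
  File.open(error_log, 'a') do |log|
    pairs.each do |remote, local|
      begin
        # Create the local folder structure before saving into it.
        FileUtils.mkdir_p(File.dirname(local))
        yield remote, local
      rescue StandardError => e
        log.puts "#{remote},#{local},#{e.class}: #{e.message}"
        failed << [remote, local]
      end
    end
  end
  failed
end

# One way to plug net-sftp into the loop. The gem is loaded only when
# this method is called.
def download_over_sftp(pairs, host, user, password)
  require 'net/sftp'
  Net::SFTP.start(host, user, password: password) do |sftp|
    transfer_all(pairs) { |remote, local| sftp.download!(remote, local) }
  end
end
```

Because `transfer_all` returns the failed pairs, you can fix the paths listed in errors.txt and feed just those pairs back in, without redoing the files that already made it.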
Now your files should be on your machine. Now what? For this, you will need the handy-dandy S3Fox. Get it here: http://www.s3fox.net/ Launch the tool from your browser. You can then simply sync your folders from your machine to your S3 buckets. It's as easy as that!
Grab your favorite coffee and wait for the sync to complete. Try to sleep for a while, watch a movie, maybe give me a tip?
Check your buckets after some time and enjoy the presence of your easily uploaded files. :)