PIVOTAL LABS
Standup 08/29/2008
  • Using multiple buckets for Amazon S3. One of our sites has a lot of images (perhaps 30+ photos per page, different for each page and user) and got a significant benefit from using four buckets instead of one. Because each bucket is served from its own hostname, multiple buckets let browsers fetch several images in parallel. Increasing the count beyond four probably wouldn’t help, as browsers have a limit on how many parallel requests they will send. (One way to map images onto buckets is sketched after this list.)

  • Amazon S3 now has a copy command. This is useful, for example, if you have a lot of data in a single bucket and want to move it into multiple buckets: a server-side copy is faster than downloading and re-uploading all that data. The Ruby S3 gem, however, only lets you copy within a single bucket, so for cross-bucket copies you’ll need to bypass the gem and issue the request yourself (sketched after this list).

  • We wrote a script to dump a local SQL database and push it up to a remote server (for example, a demo or production server). This is the mirror image of a script we wrote some months ago that copies a database from demo down to a local workstation (for test data, reproducing data-driven bugs, etc.). The push-to-remote script came out of a situation where a large amount of data had to be generated from some XML input files, and we could afford to bog down a workstation for half an hour, but not an overloaded (and perhaps underpowered) server. (The general shape of such a task is sketched after this list.)

  • Deprec is a set of Capistrano recipes for setting up a remote server (in conjunction with deploying an application): creating accounts, installing SSH keys, setting up init scripts, logrotate, and so on. (An illustration of that style of recipe appears after this list.)

  • Capistrano 2.3 has weird sudo issues (around deleting old releases, or something similar). We recommend Capistrano 2.5 instead.
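
For the multiple-bucket item above, here is a minimal sketch (in Ruby) of one way to spread image assets across buckets. The bucket names, the image_url helper, and the MD5-based mapping are placeholders for illustration, not the code from our project; the property that matters is that a given image always maps to the same bucket, so its URL stays stable and browser caches can still get hits (see the comments below).

    require 'digest/md5'

    # Hypothetical bucket names -- substitute your own.
    IMAGE_BUCKETS = %w(myapp-images-0 myapp-images-1 myapp-images-2 myapp-images-3)

    # Map an image key onto one of the buckets deterministically, so the same
    # image is always served from the same hostname (and stays cacheable),
    # while different images spread across all four hostnames.
    def image_url(key)
      bucket = IMAGE_BUCKETS[Digest::MD5.hexdigest(key).to_i(16) % IMAGE_BUCKETS.size]
      "http://#{bucket}.s3.amazonaws.com/#{key}"
    end

    # image_url("users/42/photo_7.jpg")
    #   => "http://myapp-images-<n>.s3.amazonaws.com/users/42/photo_7.jpg"
    #      (always the same bucket for a given key)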
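
For the copy-command item above, here is a sketch of issuing a cross-bucket copy directly against the S3 REST API, bypassing the gem. The x-amz-copy-source header is S3's documented mechanism for server-side copies; the method name and argument order are made up for illustration, and the signing shown is the 2008-era HMAC-SHA1 request signature, so treat this as a starting point rather than production code.

    require 'net/https'
    require 'openssl'
    require 'base64'
    require 'time'

    # Hypothetical helper: server-side copy of src_bucket/src_key to dest_bucket/dest_key.
    def s3_copy_object(access_key, secret_key, src_bucket, src_key, dest_bucket, dest_key)
      date         = Time.now.httpdate
      source       = "/#{src_bucket}/#{src_key}"
      content_type = "application/octet-stream"

      # String-to-sign for S3's HMAC-SHA1 scheme: verb, Content-MD5, Content-Type,
      # Date, canonicalized x-amz- headers, then the canonicalized resource.
      string_to_sign = "PUT\n\n#{content_type}\n#{date}\n" \
                       "x-amz-copy-source:#{source}\n" \
                       "/#{dest_bucket}/#{dest_key}"
      signature = Base64.encode64(
        OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'), secret_key, string_to_sign)
      ).strip

      http = Net::HTTP.new("#{dest_bucket}.s3.amazonaws.com", 443)
      http.use_ssl = true

      request = Net::HTTP::Put.new("/#{dest_key}")
      request.body                 = ""        # a copy request carries no payload
      request['Content-Type']      = content_type
      request['Date']              = date
      request['x-amz-copy-source'] = source    # tells S3 to copy server-side
      request['Authorization']     = "AWS #{access_key}:#{signature}"

      http.request(request)                    # 200 with a CopyObjectResult body on success
    end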
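
For the database-push item above, the real script was tied to its XML-driven data generation, but the general dump, compress, copy, load shape can be sketched as a Rake task that shells out to the usual tools. Every name below (the task, the databases, the host) is a placeholder.

    # lib/tasks/db_push.rake -- hypothetical task and settings, for illustration only.
    namespace :db do
      desc "Dump the local database and load it onto a remote server"
      task :push_to_remote do
        local_db  = "myapp_development"         # placeholder
        remote    = "deploy@demo.example.com"   # placeholder
        remote_db = "myapp_demo"                # placeholder
        dump      = "/tmp/#{local_db}.sql.gz"

        # Do the expensive dump and compression locally, where we can afford the load.
        sh "mysqldump #{local_db} | gzip > #{dump}"

        # Copy the dump up, then load it on the remote box.
        sh "scp #{dump} #{remote}:#{dump}"
        sh "ssh #{remote} 'gunzip -c #{dump} | mysql #{remote_db}'"
      end
    end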
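
For the deprec item above, the gem ships its recipes ready-made; the task below is not deprec's actual API, just an illustration in plain Capistrano 2 of the kind of server-setup work those recipes do (here, creating a deploy user and installing an SSH key). You would run it with cap provision:deploy_user.

    # config/deploy.rb -- illustrative only; deprec provides ready-made tasks along these lines.
    namespace :provision do
      desc "Create a deploy user and install our public key (illustration, not deprec's API)"
      task :deploy_user, :roles => :app do
        sudo "useradd --create-home --shell /bin/bash deploy"
        sudo "mkdir -p /home/deploy/.ssh"
        put File.read(File.expand_path("~/.ssh/id_rsa.pub")), "/tmp/deploy_key.pub"
        sudo "sh -c 'cat /tmp/deploy_key.pub >> /home/deploy/.ssh/authorized_keys'"
        sudo "chown -R deploy:deploy /home/deploy/.ssh"
      end
    end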

Comments
  1. Strass says:

    Regarding your solution with 4 buckets, here is our take on the situation. We’d call the approach you describe “use S3 as a CDN,” while there’s another one: “use S3 as a storage solution.” In the latter scenario, we run nginx on an EC2 instance as a reverse proxy between users’ browsers and S3. There’s also a side effect of using 4 buckets and serving S3 URLs directly to your users: if you don’t set “Expires” metadata on the objects in your S3 buckets, browsers will re-download the content every time, which can lead to a huge bill at the end of the month.

  2. Joseph Palermo says:

    With no cache headers on the images we put in S3, we see the browser do a cache check and get a 304 back from S3. So we’ll have more GET requests than if we had set a Cache-Control header, but the bandwidth shouldn’t be much different.

    For us, we don’t have the option of setting a cache control header, since user’s can update our image assets. But if the images you’re putting in S3 are static, you should definitely set a cache header, not only to reduce bandwidth, but not having to do all the cache checks on page load can be significant too.

    I’m not sure I see how using 4 buckets affects your caching either way, though. Unless you have every image in all of your buckets and pick a random one for each request. In that case, though, cache headers won’t help either, since the image will end up with 1 of 4 different URLs. We currently keep each image in only 1 of the 4 buckets, so its URL doesn’t change and browsers should get a cache hit.
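
Following up on the cache-header discussion in these comments: for static assets, the header can be set once at upload time. Below is a hedged sketch using the aws-s3 gem mentioned in the post; AWS::S3::Base.establish_connection! and AWS::S3::S3Object.store are the gem's API, but whether extra string-keyed headers such as Cache-Control pass straight through to the PUT exactly as written is an assumption, so verify against your gem version. The bucket, key, and credentials are placeholders.

    require 'aws/s3'   # the aws-s3 gem discussed in the post
    require 'time'

    AWS::S3::Base.establish_connection!(
      :access_key_id     => 'YOUR_ACCESS_KEY',   # placeholder
      :secret_access_key => 'YOUR_SECRET_KEY'    # placeholder
    )

    # For a truly static image, tell browsers they may cache it for a year,
    # which avoids both the re-download and the per-request 304 check.
    AWS::S3::S3Object.store(
      'logos/header.png',
      File.open('logos/header.png'),
      'myapp-images-1',                                # placeholder bucket
      :access         => :public_read,
      :content_type   => 'image/png',
      'Cache-Control' => 'public, max-age=31536000',   # assumption: passed through as a request header
      'Expires'       => (Time.now + 365 * 24 * 60 * 60).httpdate
    )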
