Amazon S3, Ruby and Rails slides

published Dec 05, 2007

The slides from the talk are here. (Yes, they’re hosted on S3).

There are two points in the presentation where I switched to a different window.

At the ‘S3SH DEMO’ slide, I did some live coding showing how you can work with S3 using s3sh. It basically followed the script shown in ‘s3sh demo script’ below, so read that part when you see the ‘S3SH DEMO’ slide.

At the ‘Example: S3Syncer’ slide, I switched over to TextMate and showed the code for a simple script that synchronizes a single directory to S3. I then demoed the script to show it working. So, when you see the ‘Example: S3Syncer’ slide, read the S3Syncer code and S3Syncer demo sections below.

s3sh demo script

Start up s3sh


$> s3sh

Create a bucket.
Show that you can create a bucket multiple times if you own it, but trying to create a bucket that somebody else owns raises an error.

>> Bucket.create('spatten_s3demo')
=> true
>> Bucket.create('spatten_s3demo')
=> true
>> Bucket.create('test')
AWS::S3::BucketAlreadyExists: The requested bucket name is not available. The bucket namespace is shared by all users of the system. Please select a different name and try again.
        from /usr/local/lib/ruby/gems/1.8/gems/aws-s3-0.4.0/bin/../lib/aws/s3/error.rb:38:in `raise'
        from /usr/local/lib/ruby/gems/1.8/gems/aws-s3-0.4.0/bin/../lib/aws/s3/base.rb:72:in `request'
        from /usr/local/lib/ruby/gems/1.8/gems/aws-s3-0.4.0/bin/../lib/aws/s3/base.rb:83:in `put'
        from /usr/local/lib/ruby/gems/1.8/gems/aws-s3-0.4.0/bin/../lib/aws/s3/bucket.rb:79:in `create'
        from (irb):3

You can save a bucket in a variable using Bucket.find


>> b = Bucket.find('spatten_s3demo')
=> #<AWS::S3::Bucket:0x14ae7b8 @attributes={"prefix"=>nil, "name"=>"spatten_s3demo", "marker"=>nil, "max_keys"=>1000, "is_truncated"=>false, "xmlns"=>"http://s3.amazonaws.com/doc/2006-03-01/"}, @object_cache=[]>

Create a text object


>> S3Object.store('test.txt', 'This is a test', 'spatten_s3demo')
=> #
>> b.objects
=> [#<AWS::S3::S3Object:0x10804170 '/spatten_s3demo/test.txt'>]
>> pp b.objects[0].about
{"last-modified"=>"Wed, 05 Dec 2007 19:56:49 GMT",
 "x-amz-id-2"=>
  "JACm9T+m9CgZhmj4q6q00OSGHgSyBVAbQ1cgRWGydYZLTKdhLc/IUZ+K7b/1snOc",
 "content-type"=>"text/plain",
 "etag"=>"\"ce114e4501d2f4e2dcea3e17b546f339\"",
 "date"=>"Wed, 05 Dec 2007 19:57:03 GMT",
 "x-amz-request-id"=>"CA170D2AA5DEB0C9",
 "server"=>"AmazonS3",
 "content-length"=>"14"}
=> nil
>> b.objects[0].key
=> "test.txt"
>> b.objects[0].value
=> "This is a test"

Create a binary object and show it in a browser


>> S3Object.store('vampire.jpg', File.open('vampire.jpg'), 'spatten_s3demo')
=> #

Show the photo in browser

This doesn’t work, as the file is only readable by me. Make it publicly readable and store it again.

>> S3Object.store('vampire.jpg', File.open('vampire.jpg'), 'spatten_s3demo', 
     :access => :public_read)
=> #<AWS::S3::S3Object::Response:0x10747950 200 OK>

Show it in a browser again. It works this time.

Look at bucket.objects. We have to reload the bucket to show the new object.

>> b.objects
=> [#<AWS::S3::S3Object:0x10804170 '/spatten_s3demo/test.txt'>]
>> b.objects(:reload)
=> [#<AWS::S3::S3Object:0x10708080 '/spatten_s3demo/test.txt'>, #<AWS::S3::S3Object:0x10708070 '/spatten_s3demo/vampire.jpg'>]

Hash access to bucket objects


>> b['vampire.jpg']
=> #<AWS::S3::S3Object:0x10708070 '/spatten_s3demo/vampire.jpg'>
>> vamp = b['vampire.jpg']
=> #<AWS::S3::S3Object:0x10708070 '/spatten_s3demo/vampire.jpg'>

A look at metadata

>> vamp.content_type
=> "image/jpeg"
>> vamp.size
=> 10817
>> vamp.metadata
=> {}
>> vamp.metadata['subject'] = 'Claire'
=> "Claire"
>> vamp.metadata['photographer'] = 'Nadine Inkster'
=> "Nadine Inkster"
>> vamp.store
=> true

Storing the picture data in a variable


>> picdata = vamp.value
=> "\377\330\377\340\000\020JFIF\000\001\002\000…….

Downloading a picture by writing its value out to a local file.

>> File.open('vampire_downloaded.jpg', 'w') {|file| file.write(vamp.value)}
=> 10817
>> exit
s3demo $>ls
flowers.jpg             vampire.jpg
test.txt                vampire_downloaded.jpg
s3demo $>open vampire_downloaded.jpg 
s3demo $>
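Writing `vamp.value` to a file works, but it pulls the whole object into memory first. A sketch of a chunked alternative, where `source` stands in for anything that yields the body in pieces (with the aws-s3 gem that would be `S3Object.stream`, wrapped in a lambda here only so the write loop can be shown without credentials — `download_in_chunks` is my own name, not part of the gem):

```ruby
# Write whatever a chunk-yielding source produces straight to disk, so a
# large object never has to fit in memory all at once.
def download_in_chunks(path, source)
  File.open(path, 'wb') do |file|     # 'wb' so binary data survives on every platform
    source.call { |chunk| file.write(chunk) }
  end
end

# With aws-s3, the source could be:
#   lambda { |&blk| S3Object.stream('vampire.jpg', 'spatten_s3demo', &blk) }
```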

S3Syncer Code

Please note that this code is really only useful as an example of how to synchronize with S3.

It won’t recurse directories and it dies a horrible death if there are any symlinked files in a directory.

If you are looking for something to synchronize directories, check out s3sync.rb.
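For what it's worth, both limitations could be addressed with nothing but the standard library. A sketch, assuming you want symlinks skipped outright rather than followed (`files_for_upload` is my own illustrative name, not part of the script below):

```ruby
require 'find'

# Recursively collect regular files under +directory+, pruning symlinks
# entirely (the case the demo script chokes on). Keys are returned relative
# to the root so they could be passed straight to S3Object.store.
def files_for_upload(directory)
  keys = []
  Find.find(directory) do |path|
    Find.prune if File.symlink?(path)   # skip symlinked files and directories
    keys << path.sub(%r{\A#{Regexp.escape(directory)}/}, '') if File.file?(path)
  end
  keys.sort
end
```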

#!/usr/bin/env ruby

require 'digest/md5'
require 'aws/s3'
include AWS::S3

class S3Syncer
  attr_reader :local_files, :files_to_upload
  
  def initialize(directory, bucket_name)
    @directory = directory
    @bucket_name = bucket_name
  end
  
  def S3Syncer.sync(directory, bucket)
    syncer = S3Syncer.new(directory, bucket)
    syncer.get_local_files
    syncer.connect_to_s3
    syncer.get_bucket
    syncer.select_files_to_upload
    syncer.sync
  end
  
  # This does not recurse directories.
  def get_local_files
    @local_files = Dir.entries(@directory)
  end
  
  def connect_to_s3
    Base.establish_connection!(
        :access_key_id     => ENV['AMAZON_ACCESS_KEY_ID'],
        :secret_access_key => ENV['AMAZON_SECRET_ACCESS_KEY']
      )
  
    raise "\nERROR: Connection not made or bad access key " +
          "or bad secret access key.  Exiting" unless AWS::S3::Base.connected? 
  end  
  
  def get_bucket
    Bucket.create(@bucket_name)
    @bucket = Bucket.find(@bucket_name) 
  end
  
  # Files should be uploaded if 
  #   The file doesn't exist in the bucket
  #      OR
  #   The MD5 hashes don't match
  def select_files_to_upload
    @files_to_upload = @local_files.select do |file|                 
      case
      when File.directory?(local_name(file))
         false # Don't upload directories
      when !@bucket[file]
         true  # Upload if file does not exist on S3
      when @bucket[file].etag != Digest::MD5.hexdigest(File.read(local_name(file)))
         true  # Upload if MD5 sums don't match
      else
        false  # the MD5 matches and it exists already, so don't upload it
      end
    end
  end
  
  # This will choke on symlinked files
  def sync
    (puts "Directories are in sync"; return) if @files_to_upload.empty?

    @files_to_upload.each do |file|
      puts "#{file} ===> #{@bucket.name}:#{file}"
      S3Object.store(file, File.open(local_name(file), 'r'), @bucket_name)      
    end
  end
  
  private 
  
  def local_name(file)
    File.join(@directory, file)
  end
  
end

if __FILE__ == $0
  S3Syncer.sync('/Users/Scott/versioned/spattendesign/presentations/s3-on-rails/s3demo', 
                'spatten_syncdemo')
end
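The etag comparison in `select_files_to_upload` relies on S3 returning the MD5 of the object body as the ETag, which holds for simple PUTs like these. One subtlety: the raw header value arrives wrapped in quotes — the s3sh demo above showed `"etag"=>"\"ce114e...\""` — though aws-s3's `etag` accessor appears to strip them, since the script compares it to a bare hex digest. A standalone version of the check that handles the quoted form (`needs_upload?` is my own name, not part of the script):

```ruby
require 'digest/md5'

# A file needs uploading if it isn't on S3 yet (nil etag) or if the S3 ETag,
# with its surrounding quotes stripped, differs from the local MD5 digest.
def needs_upload?(etag, path)
  etag.nil? || etag.delete('"') != Digest::MD5.hexdigest(File.read(path))
end
```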

S3Syncer demo

Start with spatten_syncdemo bucket empty, and four files in the local directory.

Run the script


s3demo $>ls
flowers.jpg             vampire.jpg
test.txt                vampire_downloaded.jpg
s3demo $>s3syncer
flowers.jpg ===> spatten_syncdemo:flowers.jpg
test.txt ===> spatten_syncdemo:test.txt
vampire.jpg ===> spatten_syncdemo:vampire.jpg
vampire_downloaded.jpg ===> spatten_syncdemo:vampire_downloaded.jpg

Run it again; this time it reports that there’s nothing to do.

s3demo $>s3syncer
Directories are in sync

Change a file locally and sync again


s3demo $> vi test.txt
Make some changes using vi
s3demo $>s3syncer
test.txt ===> spatten_syncdemo:test.txt

Delete flowers.jpg using the Firefox S3 Organizer extension and then sync again.


s3demo $>s3syncer
flowers.jpg ===> spatten_syncdemo:flowers.jpg

So there you go, a quick intro to the wonders of Amazon S3.
