Lojic Technologies

Name Guessing in Ruby

leave a comment »

I came across a web site recently that prompted me to enter two names before I could view the site. So I wondered how difficult it would be to write a Ruby program that could pass the challenge. As you’d expect, it’s trivial in Ruby. If you encounter such a site, please don’t use the code in part two unless it’s permitted by the terms of service of the site in question.

Part One – Obtain a file of names

I found a government site that provided lists of the 1,000 most popular names for each decade, so I wrote a Ruby program to screen scrape the names into a list in order of popularity.

Make the open-uri library available.

require 'open-uri'

Create a simple Struct to contain the url for each decade along with an array for boy names and girl names (the site provides an ordered list for each gender). Then create an array of Decade instances for 80’s, 90’s and 00’s.

Decade = Struct.new(:url, :boy_names, :girl_names)
decades = [
  Decade.new('http://www.ssa.gov/OACT/babynames/decades/names1980s.html', [], []),
  Decade.new('http://www.ssa.gov/OACT/babynames/decades/names1990s.html', [], []),
  Decade.new('http://www.ssa.gov/OACT/babynames/decades/names2000s.html', [], [])
]

Iterate over each decade and screen scrape the list of boy names and girl names.

decades.each do |decade|
  str = open(decade.url).read
  str.scan(/<tr><td align="right">.*?<td width="15%">(w+)</td>.*?<td width="15%">(w+)</td>/m) do
    decade.boy_names << $1
    decade.girl_names << $2
  end
end

Now we have a pair of lists (boys & girls) for each decade with each list in order of popularity – 6 lists in total. We want to form a single list by selecting the most popular name from each of the 6 lists, followed by the second most popular name, etc. The zip function comes to mind, so we’ll first create an array of the 6 lists.

name_lists = decades.inject([]) do |result, decade|
  result << decade.boy_names
  result << decade.girl_names
  result
end

Now we have a list containing 6 lists. Here is where the “everything is an object” aspect of ruby is a bit of a pain. What I want to do is simply zip the 6 lists together, for example:

x = zip(one, two, three, four, five, six)

but I’m forced to pick one of the lists to be the instance for the zip method, for example:

x = one.zip(two, three, four, five, six)

I suppose my dissatisfaction is due in part to my recent research in functional programming.

The line below will do the following (I really like Ruby 🙂 ):

  1. pop one of the lists off of the main list (leaving 5) to be the instance for zip
  2. explode the list of 5 remaining lists into 5 individual arguments via the * operator
  3. zip the 6 lists together
  4. flatten the resulting list of lists produced by zip into one list
  5. remove duplicate entries
  6. write each name to standard output
(name_lists.shift).zip(*name_lists).flatten.uniq.each {|name| puts name }

Run the program as follows to create a file named names.txt which will be used in part two.

ruby scrape_names.rb > names.txt

Part Two – Brute force form submission

Now that we have a list of over 2,000 names, it’s a simple matter of repeatedly trying pairs of names in order of popularity. Here’s the program in its entirety with comments following:

 1  require 'net/http'

 2  names = ARGF.inject([]) {|result, line| result < 'application/x-www-form-urlencoded' }
 5    (1...names.length).each do |i|
 6      (0..i).each do |j|
 7        response = query.post('/guess/',
 8          "name1=#{names[i]}&name2=#{names[j]}", headers)
 9        if !response.body.include?('Failed')
10          puts "Answer: #{names[i]}, #{names[j]}"
11          exit
12        end
13      end
14    end
15  end

Line 1: makes the HTTP library accessible.

Line 2: reads a list of names from standard input into an array.

Line 3: opens a TCP connection to the specified web server (site withheld for privacy).

Line 4: through trial and error I discovered that contrary to the example in the Pick Axe p. 700, I need to add this header to get the form submission to work.

Lines 5-6: setup the iteration to compare each name (beginning with the second) with each of the previous names up to and including the current one (in case the answer is two identical names).

Lines 7-8: perform an HTTP post with the pair of names.

Lines 9-11: if the retrieved page doesn’t contain a failure string, print the solution to standard output and exit.

Run the program as follows:

ruby guess_names.rb < names.txt

There you have it – a pair of Ruby programs to pass a “guess two names to enter” challenge. The amount of time to find a solution will depend on the uncommonness of the names used. This is an O(n^2) algorithm, so if the names in the solution are near the bottom of the popularity scale, expect a long run time. The name list I produced had 2,697 names, so the worst case scenario is (2697 * 2698) /2 -1 = 3,638,252 requests. If only the first 100 names need to be compared there would be 5,049 requests.

This algorithm is trivial to parallelize – just change line 5 above from (1…names.length) to multiple non-overlapping ranges (one per process), and let her fly.

Ruby has many benefits and is currently my most productive programming language, but I think it particularly shines in this type of ad-hoc, web-enabled task because of the expressiveness of the language and the availability of useful libraries. I highly recommend putting it high on your list of languages to learn if you don’t already use it.

Written by Brian Adkins

December 20, 2007 at 9:04 pm

Posted in programming

Tagged with ,

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: