Lojic Technologies

How to Write a Spelling Corrector in Ruby

with 14 comments

Update 10/16/2015: Please see the Racket Version also.

Peter Norvig wrote a simple spelling corrector in 20 lines of Python 2.5,
so I thought I’d see what it looks like in Ruby. Here are some areas I’m not pleased with:

  1. List comprehensions in Python made the edits1 function more elegant IMO.
  2. The boolean expression in the correct function evaluates empty sets/arrays as false in Python but not in Ruby, so I had to add the “result.empty? ? nil : result” expression to several functions. I expect there’s a better way to handle this also.

Otherwise, the translation was pretty straightforward.

Here’s a link to Norvig’s page:
http://www.norvig.com/spell-correct.html

That page includes a link to a text file that I saved locally as
holmes.txt: http://www.norvig.com/holmes.txt

def words text
  text.downcase.scan(/[a-z]+/)
end

def train features
  model = Hash.new(1)
  features.each {|f| model[f] += 1 }
  return model
end

NWORDS = train(words(File.new('holmes.txt').read))
LETTERS = ("a".."z").to_a.join

def edits1 word
  n = word.length
  deletion = (0...n).collect {|i| word[0...i]+word[i+1..-1] }
  transposition = (0...n-1).collect {|i| word[0...i]+word[i+1,1]+word[i,1]+word[i+2..-1] }
  alteration = []
  n.times {|i| LETTERS.each_byte {|l| alteration << word[0...i]+l.chr+word[i+1..-1] } }
  insertion = []
  (n+1).times {|i| LETTERS.each_byte {|l| insertion << word[0...i]+l.chr+word[i..-1] } }
  result = deletion + transposition + alteration + insertion
  result.empty? ? nil : result
end

def known_edits2 word
  result = []
  edits1(word).each {|e1| edits1(e1).each {|e2| result << e2 if NWORDS.has_key?(e2) }}
  result.empty? ? nil : result
end

def known words
  result = words.find_all {|w| NWORDS.has_key?(w) }
  result.empty? ? nil : result
end

def correct word
  (known([word]) or known(edits1(word)) or known_edits2(word) or
    [word]).max {|a,b| NWORDS[a] <=> NWORDS[b] }
end

After you’ve saved the holmes.txt file, load the code into irb and call the correct function with a string as follows:

badkins:~/sync/code/ruby$ irb
irb(main):001:0> require 'spelling_corrector.rb'
=> true
irb(main):002:0> correct "whree"
=> "where"

Written by Brian Adkins

September 4, 2008 at 3:58 pm

Posted in people, programming

Tagged with ,

14 Responses

Subscribe to comments with RSS.

  1. […] How to Write a Spelling Corrector in Ruby Peter Norvig wrote a simple spelling corrector in 20 lines of Python 2.5, so I thought I’d see what it looks like in Ruby. Here are some areas I’m not pleased with: (tags: ruby programming) […]

  2. Brian, is it okay with you if I include your code in the Ruby Benchmark Suite? It’s an open source project that’s released under the MIT license. Your file would therefore be released as MIT as well, carry a copyright line with your name, and a link back to this post.

    Antonio Cangiano

    April 12, 2009 at 10:48 am

  3. @Antonio – sure, as long as the copyright is retained.

    Brian Adkins

    April 12, 2009 at 4:57 pm

  4. I know both Python and Ruby, and IMHO your Ruby edition is much more readable. As pretty as a typical elementary number theory theorem and proof.

    Ray Pereda

    January 26, 2010 at 5:06 pm

  5. for the alteration (and insertion) you could also use this syntax, for consistency:

    alteration = (0…n).collect { |i| (0…26).collect { |l| word[0…i] + LETTERS[l].chr+word[i+1..-1] } }.flatten

    McGee

    September 16, 2010 at 7:35 pm

  6. I think this is a bit more Rubyish:

    !result.empty? && result

    Mark Wilden

    December 23, 2010 at 5:03 pm

  7. result.any? && result

    Also, result = words.find_all {|w| NWORDS.has_key?(w) } can be rewritten as:

    result = words & HWORDS.keys

    which is faster and removes duplicates.

    Michael Dvorkin

    December 24, 2010 at 12:03 am

  8. […] – Brian Adkins’ article How to Write a Spelling Corrector in Ruby helped a lot when I was trying to translate the python […]

  9. thanks for the ruby version, remember that in the
    train method you don’t need to put return model as in
    other languages.

    alejandrodumas

    January 19, 2011 at 8:17 am

  10. […] Brian Adkins […]

    Spellbound « newlisper

    February 8, 2011 at 9:43 am

  11. Thanks for this code. It will be very useful in one of my websites. I only need to adapt it to Portuguese language (with some special chars).

    Carlos Werberich

    May 4, 2012 at 12:48 pm

  12. […] September, 2008, I translated Peter Norvig’s spelling corrector into Ruby. My current favorite language is Racket, so I thought it would be a good exercise to port it to […]

  13. In your second last line you want to have NWORDS[a] NWORDS[b]. The operator is missing.

    Prab

    September 8, 2016 at 4:15 pm

    • Looks like that operator isn’t displaying on this page. <=>

      Prab

      September 8, 2016 at 4:15 pm


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: