Peter Norvig wrote a simple spelling corrector in 20 lines of Python 2.5,
so I thought I’d see what it looks like in Ruby. Here are some areas I’m not pleased with:
- List comprehensions in Python made the edits1 function more elegant IMO.
- The boolean expression in the correct function evaluates empty sets/arrays as false in Python but not in Ruby, so I had to add the “result.empty? ? nil : result” expression to several functions. I expect there’s a better way to handle this also.
Otherwise, the translation was pretty straightforward.
Here’s a link to Norvig’s page:
http://www.norvig.com/spell-correct.html
That page includes a link to a text file that I saved locally as
holmes.txt: http://www.norvig.com/holmes.txt
def words text
text.downcase.scan(/[a-z]+/)
end
def train features
model = Hash.new(1)
features.each {|f| model[f] += 1 }
return model
end
NWORDS = train(words(File.new('holmes.txt').read))
LETTERS = ("a".."z").to_a.join
def edits1 word
n = word.length
deletion = (0...n).collect {|i| word[0...i]+word[i+1..-1] }
transposition = (0...n-1).collect {|i| word[0...i]+word[i+1,1]+word[i,1]+word[i+2..-1] }
alteration = []
n.times {|i| LETTERS.each_byte {|l| alteration << word[0...i]+l.chr+word[i+1..-1] } }
insertion = []
(n+1).times {|i| LETTERS.each_byte {|l| insertion << word[0...i]+l.chr+word[i..-1] } }
result = deletion + transposition + alteration + insertion
result.empty? ? nil : result
end
def known_edits2 word
result = []
edits1(word).each {|e1| edits1(e1).each {|e2| result << e2 if NWORDS.has_key?(e2) }}
result.empty? ? nil : result
end
def known words
result = words.find_all {|w| NWORDS.has_key?(w) }
result.empty? ? nil : result
end
def correct word
(known([word]) or known(edits1(word)) or known_edits2(word) or
[word]).max {|a,b| NWORDS[a] NWORDS[b] }
end
After you’ve saved the holmes.txt file, load the code into irb and call the correct function with a string as follows:
badkins:~/sync/code/ruby$ irb irb(main):001:0> require 'spelling_corrector.rb' => true irb(main):002:0> correct "whree" => "where"

11 comments
Comments feed for this article
February 6, 2009 at 9:00 am
links for 2009-02-06 « Brent Sordyl’s Blog
[...] How to Write a Spelling Corrector in Ruby Peter Norvig wrote a simple spelling corrector in 20 lines of Python 2.5, so I thought I’d see what it looks like in Ruby. Here are some areas I’m not pleased with: (tags: ruby programming) [...]
April 12, 2009 at 10:48 am
Antonio Cangiano
Brian, is it okay with you if I include your code in the Ruby Benchmark Suite? It’s an open source project that’s released under the MIT license. Your file would therefore be released as MIT as well, carry a copyright line with your name, and a link back to this post.
April 12, 2009 at 4:57 pm
Brian Adkins
@Antonio – sure, as long as the copyright is retained.
January 26, 2010 at 5:06 pm
Ray Pereda
I know both Python and Ruby, and IMHO your Ruby edition is much more readable. As pretty as a typical elementary number theory theorem and proof.
September 16, 2010 at 7:35 pm
McGee
for the alteration (and insertion) you could also use this syntax, for consistency:
alteration = (0…n).collect { |i| (0…26).collect { |l| word[0...i] + LETTERS[l].chr+word[i+1..-1] } }.flatten
December 23, 2010 at 5:03 pm
Mark Wilden
I think this is a bit more Rubyish:
!result.empty? && result
December 24, 2010 at 12:03 am
Michael Dvorkin
result.any? && result
Also, result = words.find_all {|w| NWORDS.has_key?(w) } can be rewritten as:
result = words & HWORDS.keys
which is faster and removes duplicates.
January 1, 2011 at 2:42 am
SpellingBee gem released | Binary Doodles
[...] – Brian Adkins’ article How to Write a Spelling Corrector in Ruby helped a lot when I was trying to translate the python [...]
January 19, 2011 at 8:17 am
alejandrodumas
thanks for the ruby version, remember that in the
train method you don’t need to put return model as in
other languages.
February 8, 2011 at 9:43 am
Spellbound « newlisper
[...] Brian Adkins [...]
May 4, 2012 at 12:48 pm
Carlos Werberich
Thanks for this code. It will be very useful in one of my websites. I only need to adapt it to Portuguese language (with some special chars).