Lojic Technologies

How to Write a Spelling Corrector in Racket

with one comment

In September, 2008, I translated Peter Norvig’s spelling corrector into Ruby. My current favorite language is Racket, so I thought it would be a good exercise to port it to Racket. After some helpful tips by Vincent St-Amour and Sam Tobin-Hochstadt in the #racket IRC channel, I came up with the following. I’ll show it two different ways, the first minimizes the line count (without sacrificing too much stylistically) to 27 lines, and the second is closer to how I’d normally format it:

#lang racket

(define (words text) (regexp-match* #rx"[a-z]+" (string-downcase text)))

(define (train features)
  (define model (make-hash))
  (for ([f features]) (hash-update! model f add1 1)) model)

(define nwords (train (words (file->string "big.txt"))))

(define (edits1 word)
  (let* ([alphabet "abcdefghijklmnopqrstuvwxyz"]
         [splits (for/list ([i (in-range (+ (string-length word) 1))])
                   (cons (substring word 0 i) (substring word i)))]
         [deletes (for/set ([(a b) (in-dict splits)] #:when (> (string-length b) 0))
                    (string-append a (substring b 1)))]
         [transposes (for/set ([(a b) (in-dict splits)] #:when (> (string-length b) 1))
                       (string-append a (substring b 1 2) (substring b 0 1) (substring b 2)))]
         [replaces (for/set ([(a b) (in-dict splits)] #:when (> (string-length b) 0) [c alphabet])
                     (string-append a (string c) (substring b 1)))]
         [inserts (for*/set ([(a b) (in-dict splits)] [c alphabet])
                    (string-append a (string c) b))])
    (set-union deletes transposes replaces inserts)))

(define (known-edits2 word)
  (for*/set ([e1 (edits1 word)] [e2 (edits1 e1)] #:when (hash-has-key? nwords e2)) e2))

(define (known words) (for/set ([w words] #:when (hash-has-key? nwords w)) w))

(define (nes set) (if (set-empty? set) #f set))

(define (correct word)
  (let ([candidates (or (nes (known (list word))) (nes (known (edits1 word)))
                        (nes (known-edits2 word)) (set word))])
    (argmax (λ (x) (hash-ref nwords x 1)) (set->list candidates))))

 

And here is a more aesthetically pleasing format:

 

#lang racket

(define (words text)
  (regexp-match* #rx"[a-z]+" (string-downcase text)))

(define (train features)
  (define model (make-hash))
  (for ([f features])
    (hash-update! model f add1 1))
  model)

(define nwords
  (train (words (file->string "big.txt"))))

(define (edits1 word)
  (let* ([alphabet "abcdefghijklmnopqrstuvwxyz"]
         [splits (for/list ([i (in-range (+ (string-length word) 1))])
                   (cons (substring word 0 i) (substring word i)))]
         [deletes (for/set ([(a b) (in-dict splits)]
                            #:when (> (string-length b) 0))
                    (string-append a (substring b 1)))]
         [transposes (for/set ([(a b) (in-dict splits)]
                               #:when (> (string-length b) 1))
                       (string-append a
                                      (substring b 1 2)
                                      (substring b 0 1)
                                      (substring b 2)))]
         [replaces (for/set ([(a b) (in-dict splits)]
                             #:when (> (string-length b) 0)
                             [c alphabet])
                     (string-append a (string c) (substring b 1)))]
         [inserts (for*/set ([(a b) (in-dict splits)]
                             [c alphabet])
                    (string-append a (string c) b))])
    (set-union deletes transposes replaces inserts)))

(define (known-edits2 word)
  (for*/set ([e1 (edits1 word)]
             [e2 (edits1 e1)]
             #:when (hash-has-key? nwords e2))
    e2))

(define (known words)
  (for/set ([w words] #:when (hash-has-key? nwords w))
    w))

(define (nes set)
  (if (set-empty? set)
      #f
      set))

(define (correct word)
  (let ([candidates (or (nes (known (list word)))
                        (nes (known (edits1 word)))
                        (nes (known-edits2 word))
                        (set word))])
    (argmax (λ (x) (hash-ref nwords x 1)) (set->list candidates))))

Written by Brian Adkins

October 16, 2015 at 12:36 am

Posted in programming

Tagged with

One Response

Subscribe to comments with RSS.

  1. […] 10/16/2015: Please see the Racket Version […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: