Find Longest Substring Palindrome in Haskell

2011-03-13 :: programming, haskell, ruby

I stumbled upon a programming challenge a company was using for recruitment purposes and thought I’d create a Haskell solution as a learning exercise. The first problem was to find the longest palindrome embedded in a text string.

The following Haskell solution seems very readable to me, but it’s a naive solution that’s inefficient. It computes an answer on my 2.6 year old Macbook Pro in under 4 seconds, but a 2x increase in text requires a 7x increase in CPU time.

I believe there are algorithms to find the longest embedded palindrome in linear time, so I may post a refinement later.

-- Find the longest palindrome in a text string.

module Main where
import Char

text = "I'll just type in some example text here and embed a little 
palindrome - A man, a plan, a canal, Panama! - I expect that will be 
the longest palindrome found in this text.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Integer volutpat lorem imperdiet ante bibendum ullamcorper. Mauris 
tempor hendrerit justo at elementum. Vivamus elit magna, accumsan id 
condimentum a, luctus a ipsum. Donec fermentum, lectus at posuere 
ullamcorper, mauris lectus tincidunt nulla, ut placerat justo odio sed
 odio. Nulla blandit lorem sit amet odio varius nec vestibulum ante 
ornare. Aliquam feugiat, velit a rhoncus rutrum, turpis metus pretium 
dolor, et mattis leo turpis non est. Sed aliquet, sapien quis 
consequat condimentum, sem magna ornare ligula, id blandit odio nisl 
vitae erat. Nam vulputate tincidunt quam, non lacinia risus tincidunt 
lacinia. Aenean fermentum tristique porttitor. Nam id dolor a eros 
accumsan imperdiet. Aliquam quis nibh et dui ultricies cursus. Nunc 
et ante non sapien vehicula rutrum. Duis posuere dictum blandit. Nunc 
vitae tempus purus."

clean = map toLower . filter isAlpha

palindrome str = str == reverse str

substrings []     = []
substrings (x:xs) = substrings' (x:xs) ++ substrings xs where
  substrings' []     = []
  substrings' (y:ys) = [y] : [ (y:s) | s <- substrings' ys ]

longest []     = []
longest (x:xs) = if length x > length max then x else max
  where max = longest xs

longest_palindrome xs =
  longest (filter palindrome (substrings (clean text)))

main = print (longest_palindrome text)

As a comparison, I translated the program into Ruby. I program predominantly in Ruby these days, and I like it, but the Ruby version is 25 times slower (98 sec. vs. 4 sec.), and it’s 2.4 times more lines of code (31 vs. 13 – excluding the text).

A gain in runtime efficiency, expressive power and multi-core capability is very attractive!

I’m using Ruby 1.9.2 and GHC 6.12.3 on Mac OS X 10.5.8 on a 2.4 GHz Core 2 Duo w/ 4 GB RAM.

ruby 1.9.2p0 (2010–08–18 revision 29036) [i386-darwin9.8.0]
Glasgow Haskell Compiler, Version 6.12.3, for Haskell 98, stage 2 booted by GHC version 6.12.2

TEXT = <<END
I'll just type in some example text here and embed a little
palindrome - A man, a plan, a canal, Panama! - I expect that will be
the longest palindrome found in this text.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Integer volutpat lorem imperdiet ante bibendum ullamcorper. Mauris
tempor hendrerit justo at elementum. Vivamus elit magna, accumsan id
condimentum a, luctus a ipsum. Donec fermentum, lectus at posuere
ullamcorper, mauris lectus tincidunt nulla, ut placerat justo odio sed
 odio. Nulla blandit lorem sit amet odio varius nec vestibulum ante
ornare. Aliquam feugiat, velit a rhoncus rutrum, turpis metus pretium
dolor, et mattis leo turpis non est. Sed aliquet, sapien quis
consequat condimentum, sem magna ornare ligula, id blandit odio nisl
vitae erat. Nam vulputate tincidunt quam, non lacinia risus tincidunt
lacinia. Aenean fermentum tristique porttitor. Nam id dolor a eros
accumsan imperdiet. Aliquam quis nibh et dui ultricies cursus. Nunc
et ante non sapien vehicula rutrum. Duis posuere dictum blandit. Nunc
vitae tempus purus.
END

def clean str
  str.gsub(/[^A-Za-z]/,'').downcase
end

def palindrome? str
  str == str.reverse
end

def subs str
  return [] if str.empty?
  y = str[0,1]
  subs(str[1..-1]).inject([y]) do |result, s|
    result << y + s
    result
  end
end

def substrings str
  return [] if str.empty?
  subs(str) + substrings(str[1..-1])
end

def longest strs
  strs.inject("") do |max, str|
    max = str if str.length > max.length
    max
  end
end

def longest_palindrome str
  longest(substrings(clean(str)).inject([]) {|result, str|
            result << str if palindrome?(str)
            result
          })
end

puts longest_palindrome(TEXT)

Update 3/13/11 19:30

Rick’s comment (see his blog post linked to in his comment) regarding the importance of algorithm choice is certainly a valid one – a better algorithm in a slower language may win over an inferior algorithm in a faster language (for large enough datasets). However, what I’m becoming interested in is the fact that one may gain both productivity/power and runtime speed by making wise programming language choices.

In the case of an interpreted language such as Ruby, it’s helpful to “stay in C code” as much as possible. In other words, to favor built-in library routines that have been implemented in C over hand written Ruby code. As much as I like Ruby, this is one of the things that bothers me – the fact that the Ruby code written by the programmer is vastly inferior in performance to the built-in library routines.

I wasn’t planning on refining the Haskell version this soon, but after seeing Rick’s blog post response, I couldn’t resist 🙂

I should first provide some background info for context. When I wrote the original Haskell version above, I was in the middle of an online programming challenge, and my goal was simply to compute the answer in a reasonable amount of time to move on to the next challenge, so the brief, easily understandable, Haskell version worked great. Hence, my disclaimers in the post regarding the naivity and inefficiency of the solution. The Ruby code in this post was an afterthought simply to see a comparison between identical algorithms. Given that apple & apple comparison, I’m impressed with the brevity and speed of the Haskell version.

Since I’m still very much a Haskell newbie, and short on time, I found an incredible solution by Johan Jeuring. It’s a little bit longer than the Ruby version, but much, much faster. It’s so fast, I had to increase the input quite a bit to get a reasonable comparison – I replicated the original text 23 times and reversed one of the replications to make a fairly long palindrome.

Rick’s Ruby version took 169.4 seconds, Johan’s Haskell version took 0.032 seconds. In other words, Ruby takes over 5,000 times as long to compute the result. Clearly this is an apples and oranges comparison, but I fully expect that a Ruby version using an identical algorithm will take 100 times as long (or longer) to run and would be less concise. Giving up runtime performance to gain programmer power is one thing, but giving up runtime performance and power is a tough pill to swallow.

Here is a slightly modified version of Johan Jeuring’s code. He was also kind enough to provide his code here:

-- Reorganized from Johan Jeuring's solution:
module Main where
import Data.List (maximumBy,intersperse)
import Data.Char
import Data.Array

text = "I'll just type in some example text here and embed a little 
palindrome - A man, a plan, a canal, Panama! - I expect that will be 
the longest palindrome found in this text.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Integer volutpat lorem imperdiet ante bibendum ullamcorper. Mauris 
tempor hendrerit justo at elementum. Vivamus elit magna, accumsan id 
condimentum a, luctus a ipsum. Donec fermentum, lectus at posuere 
ullamcorper, mauris lectus tincidunt nulla, ut placerat justo odio sed
 odio. Nulla blandit lorem sit amet odio varius nec vestibulum ante 
ornare. Aliquam feugiat, velit a rhoncus rutrum, turpis metus pretium 
dolor, et mattis leo turpis non est. Sed aliquet, sapien quis 
consequat condimentum, sem magna ornare ligula, id blandit odio nisl 
vitae erat. Nam vulputate tincidunt quam, non lacinia risus tincidunt 
lacinia. Aenean fermentum tristique porttitor. Nam id dolor a eros 
accumsan imperdiet. Aliquam quis nibh et dui ultricies cursus. Nunc 
et ante non sapien vehicula rutrum. Duis posuere dictum blandit. Nunc 
vitae tempus purus."

clean = map toLower . filter isAlpha

longestPalindrome input =
  let inputArray      =  listArrayl0 input
      (maxLength,pos) =  maximumBy
                            ((l,_) (l',_) -> compare l l')
                            (zip (palindromesAroundCentres inputArray) [0..])
  in showPalindrome inputArray (maxLength,pos)

longestPalindromes m input =
  let inputArray =  listArrayl0 input
  in concat $ intersperse "n"
            $ map (showPalindrome inputArray)
            $ filter ((m String
lengthLongestPalindrome = show . maximum . palindromesAroundCentres . listArrayl0

lengthLongestPalindromes :: String -> String
lengthLongestPalindromes = show . palindromesAroundCentres . listArrayl0

palindromesAroundCentres a =
  let (afirst,_) = bounds a
  in reverse $ extendTail a afirst 0 []

extendTail a n currentTail centres
  | n > alast = finalCentres currentTail centres
                   (currentTail:centres)
  | n-currentTail == afirst =
      extendCentres a n (currentTail:centres)
                    centres currentTail
  | a!n == a!(n-currentTail-1) =
      extendTail a (n+1) (currentTail+2) centres
  | otherwise =
      extendCentres a n (currentTail:centres)
                    centres currentTail
  where  (afirst,alast)  =  bounds a

extendCentres a n centres tcentres centreDistance
  | centreDistance == 0 =
      extendTail a (n+1) 1 centres
  | centreDistance-1 == head tcentres  =
      extendTail a n (head tcentres) centres
  | otherwise =
      extendCentres a n (min (head tcentres)
                    (centreDistance-1):centres)
                    (tail tcentres) (centreDistance-1)

finalCentres 0     _        centres  =  centres
finalCentres (n+1) tcentres centres  =
  finalCentres n
               (tail tcentres)
               (min (head tcentres) n:centres)
finalCentres _     _        _        =  error "finalCentres: input < 0"

showPalindrome a (len,pos) =
  let startpos = pos `div` 2 - len `div` 2
      endpos   = if odd len
                 then pos `div` 2 + len `div` 2
                 else pos `div` 2 + len `div` 2 - 1
  in show [a!n|n <- [startpos .. endpos]]

listArrayl0 string  = listArray (0,length string - 1) string

sampleText s = concat (replicate 8 s ++ [ "x" ] ++ [ reverse s ] ++ replicate 14 s)

main = print (longestPalindrome (clean (sampleText text)))

Here’s the relevant change to Rick’s Ruby version:

def sample_text str
  str * 8 + 'x' + str.reverse + str * 14
end

puts clean(sample_text(TEXT)).longest_palindrome

From the comments on my old blog

Rick DeNatale March 13, 2011 9:52

I couldn’t resist. My first bit of programming during DST in 2011.

http://talklikeaduck.denhaven2.com/2011/03/13/algorithm-choice-trumps-programming-language-choice-for

Brian Adkins March 13, 2011 12:25

Nice work Rick. As I mentioned my solution above is naive with respect to runtime performance, but I do like the brevity of the Haskell version.

Algorithm choice is certainly important, but what about when you use the same algorithm in both Haskell and Ruby ? :)

Brian Adkins March 13, 2011 21:31

I stand corrected on my “100 times slower” statement in the post. I thought I’d post a comment before someone calls me on it 🙂

I just incorporated Sam McCall’s improved Ruby function (from a comment on Rick’s blog post), and I’m now getting interpreted Ruby times of only double that of compiled Haskell – even when cranking up the text sizes considerably. Sam’s Ruby function is very concise also. Well done.

I’ve already jumped to a few conclusions, so I’ll refrain from doing that now, but I think I will poke around the Haskell and Ruby profilers

Rick DeNatale March 13, 2011 23:14

A few comments.

In your original article, you seemed to equate ‘expressiveness’ with fewer lines of code, something I’d equate more to Paul Graham’s notion of ‘power’.

On the other hand, the improved Haskell program has blown up to 60 non-blank lines which is about twice as long as my Ruby version and about 3 times as long as Sam’s.

And I for one find the Ruby code far easier to comprehend, but that’s probably my failing.

And I also have to point out that I wrote my version this morning literally in about 30 minutes, while drinking my first cup of coffee on the first morning of daylight saving time. On the other hand I did have the advantage of looking at your solution first.

Brian Adkins March 15, 2011 7:30

I took Sam McCall’s idea and implemented it in Haskell. It’s much shorter than Johan’s, 30% faster and more readable IMO, but it’s not as readable as Sam’s Ruby because of my hack job with ByteStrings. The Ruby code takes 2.6 times longer to run. Given that, I’d still probably give the edge to Sam’s Ruby code because of greater readability. I’m pretty sure the lack of elegance in the following program is just due to my lack of skill, so I think I’ll be able to come up with something satisfactory with a little more learnin’.

Don Wilson March 14, 2011 23:43

This little back-and-forth conversation between you guys is very intriguing and you’ve both made great points. There definitely are somewhat drastic performance and aesthetic differences between the languages. I’m personally kind of bummed that Ruby doesn’t perform as well as other languages. But, I definitely love the way that Ruby looks compared to Haskell. It’s nice to be able to express so much in so few lines of code, but it is not very readable.

Anyways, my main point is that it’s great to see both of your opinions and points be made. (And to me, they don’t appear to clash in any way).

Thanks!

module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Char

sampleText = "I'll just type..."

clean = map toLower . filter isAlpha

palindrome s b e
  | b < e     = (B.index s b) == (B.index s e) && palindrome s (b+1) (e-1)
  | otherwise = True

longestCentered s (b,e) (b',e') =
  if b >= 0 && e < B.length s && palindrome s b e then
    longestCentered s (b-1,e+1) (b,e)
  else
    (b',e')

longestPalindrome s c (b',e')
  | c < B.length s =
    let (odd1,odd2)   = longestCentered s (c,c)   (b',e')
        (even1,even2) = longestCentered s (c,c+1) (b',e')
        (best1,best2) =
          if odd2-odd1 > even2-even1 then
            if odd2-odd1 > e'-b' then (odd1,odd2) else (b',e')
          else
            if even2-even1 > e'-b' then (even1,even2) else (b',e')
    in longestPalindrome s (c+1) (best1,best2)
  | otherwise = (b',e')

main = print (palToStr text (longestPalindrome text 0 (0,0))) where
  text = BC.pack (concat
      (replicate 800 s ++ [ "x" ] ++ [ reverse s ] ++ replicate 14 s))
    where s = clean sampleText
  palToStr s (b,e) = B.pack [ B.index s c | c <- [b..e]]

Brian Adkins March 15, 2011 8:52

As a benchmark, the following C program runs in 0.184 sec vs. 0.636 sec for the previous Haskell version and 1.674 sec for Sam’s Ruby version. So, using the C program as 1.0, Haskell runs in 3.46x and the Ruby in 9.10x.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int palindrome(char * s, int b, int e) {
  while (b < e) {
    if (s[b++] != s[e--]) { return 0; }
  }

  return 1;
}

void longest_centered(char * s, int slen, int b, int e, int * b1, int * e1) {
  while (b >= 0 && e < slen && palindrome(s, b, e)) {
    *b1 = b--;
    *e1 = e++;
  }
}

void longest_palindrome(char * s, int slen, int c, int* b1, int* e1) {
  while (c < slen) {
    int odd1 = 0, odd2 = 0;
    longest_centered(s, slen, c, c, &odd1, &odd2);
    int odd = odd2 - odd1;

    int even1 = 0, even2 = 0;
    longest_centered(s, slen, c, c+1, &even1, &even2);
    int even = even2 - even1;
    int best = *e1 - *b1;

    if (odd > even) {
      if (odd > best) {
        *b1 = odd1;
        *e1 = odd2;
      }
    } else {
      if (even > best) {
        *b1 = even1;
        *e1 = even2;
      }
    }

    ++c;
  }
}

char * reverse(char * s) {
  int len = strlen(s);
  char * result = malloc(len + 1);
  int src = len - 1;
  int dst = 0;
  while (src >= 0) {
    result[dst++] = s[src--];
  }
  result[dst] = 0;
  return result;
}

char * sample_text(char * text) {
  int text_len = strlen(text);
  char * s = malloc(text_len * 800 + 1 + text_len + text_len * 14 + 1);
  strcpy(s, text);
  int i;
  for (i = 1; i <= 800; i++) { strcat (s, text); }
  strcat(s, "x");
  strcat(s, reverse(text));
  for (i = 1; i <= 14; i++) { strcat (s, text); }
  return s;
}

int main(void) {
  char * text = "illjusttype...";

  char * s = sample_text(text);
  int beg = 0;
  int end = 0;
  int slen = strlen(s);

  longest_palindrome(s, slen, 0, &beg, &end);

  printf("beg=%d, end=%dn", beg, end);

  return 0;
}