Find Longest Substring Palindrome in Haskell
I stumbled upon a programming challenge a company was using for recruitment purposes and thought I’d create a Haskell solution as a learning exercise. The first problem was to find the longest palindrome embedded in a text string.
The following Haskell solution seems very readable to me, but it’s a naive solution that’s inefficient. It computes an answer on my 2.6 year old Macbook Pro in under 4 seconds, but a 2x increase in text requires a 7x increase in CPU time.
I believe there are algorithms to find the longest embedded palindrome in linear time, so I may post a refinement later.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
-- Find the longest palindrome in a text string. module Main where import Char text = "I'll just type in some example text here and embed a little palindrome - A man, a plan, a canal, Panama! - I expect that will be the longest palindrome found in this text. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer volutpat lorem imperdiet ante bibendum ullamcorper. Mauris tempor hendrerit justo at elementum. Vivamus elit magna, accumsan id condimentum a, luctus a ipsum. Donec fermentum, lectus at posuere ullamcorper, mauris lectus tincidunt nulla, ut placerat justo odio sed odio. Nulla blandit lorem sit amet odio varius nec vestibulum ante ornare. Aliquam feugiat, velit a rhoncus rutrum, turpis metus pretium dolor, et mattis leo turpis non est. Sed aliquet, sapien quis consequat condimentum, sem magna ornare ligula, id blandit odio nisl vitae erat. Nam vulputate tincidunt quam, non lacinia risus tincidunt lacinia. Aenean fermentum tristique porttitor. Nam id dolor a eros accumsan imperdiet. Aliquam quis nibh et dui ultricies cursus. Nunc et ante non sapien vehicula rutrum. Duis posuere dictum blandit. Nunc vitae tempus purus." clean = map toLower . filter isAlpha palindrome str = str == reverse str substrings [] = [] substrings (x:xs) = substrings' (x:xs) ++ substrings xs where substrings' [] = [] substrings' (y:ys) = [y] : [ (y:s) | s <- substrings' ys ] longest [] = [] longest (x:xs) = if length x > length max then x else max where max = longest xs longest_palindrome xs = longest (filter palindrome (substrings (clean text))) main = print (longest_palindrome text) |
As a comparison, I translated the program into Ruby. I program predominantly in Ruby these days, and I like it, but the Ruby version is 25 times slower (98 sec. vs. 4 sec.), and it’s 2.4 times more lines of code (31 vs. 13 – excluding the text).
A gain in runtime efficiency, expressive power and multi-core capability is very attractive!
I’m using Ruby 1.9.2 and GHC 6.12.3 on Mac OS X 10.5.8 on a 2.4 GHz Core 2 Duo w/ 4 GB RAM.
ruby 1.9.2p0 (2010–08–18 revision 29036) [i386-darwin9.8.0]
Glasgow Haskell Compiler, Version 6.12.3, for Haskell 98, stage 2 booted by GHC version 6.12.2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 |
TEXT = <<END I'll just type in some example text here and embed a little palindrome - A man, a plan, a canal, Panama! - I expect that will be the longest palindrome found in this text. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer volutpat lorem imperdiet ante bibendum ullamcorper. Mauris tempor hendrerit justo at elementum. Vivamus elit magna, accumsan id condimentum a, luctus a ipsum. Donec fermentum, lectus at posuere ullamcorper, mauris lectus tincidunt nulla, ut placerat justo odio sed odio. Nulla blandit lorem sit amet odio varius nec vestibulum ante ornare. Aliquam feugiat, velit a rhoncus rutrum, turpis metus pretium dolor, et mattis leo turpis non est. Sed aliquet, sapien quis consequat condimentum, sem magna ornare ligula, id blandit odio nisl vitae erat. Nam vulputate tincidunt quam, non lacinia risus tincidunt lacinia. Aenean fermentum tristique porttitor. Nam id dolor a eros accumsan imperdiet. Aliquam quis nibh et dui ultricies cursus. Nunc et ante non sapien vehicula rutrum. Duis posuere dictum blandit. Nunc vitae tempus purus. END def clean str str.gsub(/[^A-Za-z]/,'').downcase end def palindrome? str str == str.reverse end def subs str return [] if str.empty? y = str[0,1] subs(str[1..-1]).inject([y]) do |result, s| result << y + s result end end def substrings str return [] if str.empty? subs(str) + substrings(str[1..-1]) end def longest strs strs.inject("") do |max, str| max = str if str.length > max.length max end end def longest_palindrome str longest(substrings(clean(str)).inject([]) {|result, str| result << str if palindrome?(str) result }) end puts longest_palindrome(TEXT) |
Update 3/13/11 19:30
Rick’s comment (see his blog post linked to in his comment) regarding the importance of algorithm choice is certainly a valid one – a better algorithm in a slower language may win over an inferior algorithm in a faster language (for large enough datasets). However, what I’m becoming interested in is the fact that one may gain both productivity/power and runtime speed by making wise programming language choices.
In the case of an interpreted language such as Ruby, it’s helpful to “stay in C code” as much as possible. In other words, to favor built-in library routines that have been implemented in C over hand written Ruby code. As much as I like Ruby, this is one of the things that bothers me – the fact that the Ruby code written by the programmer is vastly inferior in performance to the built-in library routines.
I wasn’t planning on refining the Haskell version this soon, but after seeing Rick’s blog post response, I couldn’t resist 🙂
I should first provide some background info for context. When I wrote the original Haskell version above, I was in the middle of an online programming challenge, and my goal was simply to compute the answer in a reasonable amount of time to move on to the next challenge, so the brief, easily understandable, Haskell version worked great. Hence, my disclaimers in the post regarding the naivity and inefficiency of the solution. The Ruby code in this post was an afterthought simply to see a comparison between identical algorithms. Given that apple & apple comparison, I’m impressed with the brevity and speed of the Haskell version.
Since I’m still very much a Haskell newbie, and short on time, I found an incredible solution by Johan Jeuring. It’s a little bit longer than the Ruby version, but much, much faster. It’s so fast, I had to increase the input quite a bit to get a reasonable comparison – I replicated the original text 23 times and reversed one of the replications to make a fairly long palindrome.
Rick’s Ruby version took 169.4 seconds, Johan’s Haskell version took 0.032 seconds. In other words, Ruby takes over 5,000 times as long to compute the result. Clearly this is an apples and oranges comparison, but I fully expect that a Ruby version using an identical algorithm will take 100 times as long (or longer) to run and would be less concise. Giving up runtime performance to gain programmer power is one thing, but giving up runtime performance and power is a tough pill to swallow.
Here is a slightly modified version of Johan Jeuring’s code. He was also kind enough to provide his code here:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 |
-- Reorganized from Johan Jeuring's solution: module Main where import Data.List (maximumBy,intersperse) import Data.Char import Data.Array text = "I'll just type in some example text here and embed a little palindrome - A man, a plan, a canal, Panama! - I expect that will be the longest palindrome found in this text. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer volutpat lorem imperdiet ante bibendum ullamcorper. Mauris tempor hendrerit justo at elementum. Vivamus elit magna, accumsan id condimentum a, luctus a ipsum. Donec fermentum, lectus at posuere ullamcorper, mauris lectus tincidunt nulla, ut placerat justo odio sed odio. Nulla blandit lorem sit amet odio varius nec vestibulum ante ornare. Aliquam feugiat, velit a rhoncus rutrum, turpis metus pretium dolor, et mattis leo turpis non est. Sed aliquet, sapien quis consequat condimentum, sem magna ornare ligula, id blandit odio nisl vitae erat. Nam vulputate tincidunt quam, non lacinia risus tincidunt lacinia. Aenean fermentum tristique porttitor. Nam id dolor a eros accumsan imperdiet. Aliquam quis nibh et dui ultricies cursus. Nunc et ante non sapien vehicula rutrum. Duis posuere dictum blandit. Nunc vitae tempus purus." clean = map toLower . filter isAlpha longestPalindrome input = let inputArray = listArrayl0 input (maxLength,pos) = maximumBy ((l,_) (l',_) -> compare l l') (zip (palindromesAroundCentres inputArray) [0..]) in showPalindrome inputArray (maxLength,pos) longestPalindromes m input = let inputArray = listArrayl0 input in concat $ intersperse "n" $ map (showPalindrome inputArray) $ filter ((m String lengthLongestPalindrome = show . maximum . palindromesAroundCentres . listArrayl0 lengthLongestPalindromes :: String -> String lengthLongestPalindromes = show . palindromesAroundCentres . listArrayl0 palindromesAroundCentres a = let (afirst,_) = bounds a in reverse $ extendTail a afirst 0 [] extendTail a n currentTail centres | n > alast = finalCentres currentTail centres (currentTail:centres) | n-currentTail == afirst = extendCentres a n (currentTail:centres) centres currentTail | a!n == a!(n-currentTail-1) = extendTail a (n+1) (currentTail+2) centres | otherwise = extendCentres a n (currentTail:centres) centres currentTail where (afirst,alast) = bounds a extendCentres a n centres tcentres centreDistance | centreDistance == 0 = extendTail a (n+1) 1 centres | centreDistance-1 == head tcentres = extendTail a n (head tcentres) centres | otherwise = extendCentres a n (min (head tcentres) (centreDistance-1):centres) (tail tcentres) (centreDistance-1) finalCentres 0 _ centres = centres finalCentres (n+1) tcentres centres = finalCentres n (tail tcentres) (min (head tcentres) n:centres) finalCentres _ _ _ = error "finalCentres: input < 0" showPalindrome a (len,pos) = let startpos = pos `div` 2 - len `div` 2 endpos = if odd len then pos `div` 2 + len `div` 2 else pos `div` 2 + len `div` 2 - 1 in show [a!n|n <- [startpos .. endpos]] listArrayl0 string = listArray (0,length string - 1) string sampleText s = concat (replicate 8 s ++ [ "x" ] ++ [ reverse s ] ++ replicate 14 s) main = print (longestPalindrome (clean (sampleText text))) |
Here’s the relevant change to Rick’s Ruby version:
1 2 3 4 5 |
def sample_text str str * 8 + 'x' + str.reverse + str * 14 end puts clean(sample_text(TEXT)).longest_palindrome |
From the comments on my old blog
Rick DeNatale March 13, 2011 9:52
I couldn’t resist. My first bit of programming during DST in 2011.
Brian Adkins March 13, 2011 12:25
Nice work Rick. As I mentioned my solution above is naive with respect to runtime performance, but I do like the brevity of the Haskell version.
Algorithm choice is certainly important, but what about when you use the same algorithm in both Haskell and Ruby ? :)
Brian Adkins March 13, 2011 21:31
I stand corrected on my “100 times slower” statement in the post. I thought I’d post a comment before someone calls me on it 🙂
I just incorporated Sam McCall’s improved Ruby function (from a comment on Rick’s blog post), and I’m now getting interpreted Ruby times of only double that of compiled Haskell – even when cranking up the text sizes considerably. Sam’s Ruby function is very concise also. Well done.
I’ve already jumped to a few conclusions, so I’ll refrain from doing that now, but I think I will poke around the Haskell and Ruby profilers
Rick DeNatale March 13, 2011 23:14
A few comments.
In your original article, you seemed to equate ‘expressiveness’ with fewer lines of code, something I’d equate more to Paul Graham’s notion of ‘power’.
On the other hand, the improved Haskell program has blown up to 60 non-blank lines which is about twice as long as my Ruby version and about 3 times as long as Sam’s.
And I for one find the Ruby code far easier to comprehend, but that’s probably my failing.
And I also have to point out that I wrote my version this morning literally in about 30 minutes, while drinking my first cup of coffee on the first morning of daylight saving time. On the other hand I did have the advantage of looking at your solution first.
Brian Adkins March 15, 2011 7:30
I took Sam McCall’s idea and implemented it in Haskell. It’s much shorter than Johan’s, 30% faster and more readable IMO, but it’s not as readable as Sam’s Ruby because of my hack job with ByteStrings. The Ruby code takes 2.6 times longer to run. Given that, I’d still probably give the edge to Sam’s Ruby code because of greater readability. I’m pretty sure the lack of elegance in the following program is just due to my lack of skill, so I think I’ll be able to come up with something satisfactory with a little more learnin’.
Don Wilson March 14, 2011 23:43
This little back-and-forth conversation between you guys is very intriguing and you’ve both made great points. There definitely are somewhat drastic performance and aesthetic differences between the languages. I’m personally kind of bummed that Ruby doesn’t perform as well as other languages. But, I definitely love the way that Ruby looks compared to Haskell. It’s nice to be able to express so much in so few lines of code, but it is not very readable.
Anyways, my main point is that it’s great to see both of your opinions and points be made. (And to me, they don’t appear to clash in any way).
Thanks!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
module Main where import qualified Data.ByteString as B import qualified Data.ByteString.Char8 as BC import Char sampleText = "I'll just type..." clean = map toLower . filter isAlpha palindrome s b e | b < e = (B.index s b) == (B.index s e) && palindrome s (b+1) (e-1) | otherwise = True longestCentered s (b,e) (b',e') = if b >= 0 && e < B.length s && palindrome s b e then longestCentered s (b-1,e+1) (b,e) else (b',e') longestPalindrome s c (b',e') | c < B.length s = let (odd1,odd2) = longestCentered s (c,c) (b',e') (even1,even2) = longestCentered s (c,c+1) (b',e') (best1,best2) = if odd2-odd1 > even2-even1 then if odd2-odd1 > e'-b' then (odd1,odd2) else (b',e') else if even2-even1 > e'-b' then (even1,even2) else (b',e') in longestPalindrome s (c+1) (best1,best2) | otherwise = (b',e') main = print (palToStr text (longestPalindrome text 0 (0,0))) where text = BC.pack (concat (replicate 800 s ++ [ "x" ] ++ [ reverse s ] ++ replicate 14 s)) where s = clean sampleText palToStr s (b,e) = B.pack [ B.index s c | c <- [b..e]] |
Brian Adkins March 15, 2011 8:52
As a benchmark, the following C program runs in 0.184 sec vs. 0.636 sec for the previous Haskell version and 1.674 sec for Sam’s Ruby version. So, using the C program as 1.0, Haskell runs in 3.46x and the Ruby in 9.10x.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 |
#include <stdio.h> #include <string.h> #include <stdlib.h> int palindrome(char * s, int b, int e) { while (b < e) { if (s[b++] != s[e--]) { return 0; } } return 1; } void longest_centered(char * s, int slen, int b, int e, int * b1, int * e1) { while (b >= 0 && e < slen && palindrome(s, b, e)) { *b1 = b--; *e1 = e++; } } void longest_palindrome(char * s, int slen, int c, int* b1, int* e1) { while (c < slen) { int odd1 = 0, odd2 = 0; longest_centered(s, slen, c, c, &odd1, &odd2); int odd = odd2 - odd1; int even1 = 0, even2 = 0; longest_centered(s, slen, c, c+1, &even1, &even2); int even = even2 - even1; int best = *e1 - *b1; if (odd > even) { if (odd > best) { *b1 = odd1; *e1 = odd2; } } else { if (even > best) { *b1 = even1; *e1 = even2; } } ++c; } } char * reverse(char * s) { int len = strlen(s); char * result = malloc(len + 1); int src = len - 1; int dst = 0; while (src >= 0) { result[dst++] = s[src--]; } result[dst] = 0; return result; } char * sample_text(char * text) { int text_len = strlen(text); char * s = malloc(text_len * 800 + 1 + text_len + text_len * 14 + 1); strcpy(s, text); int i; for (i = 1; i <= 800; i++) { strcat (s, text); } strcat(s, "x"); strcat(s, reverse(text)); for (i = 1; i <= 14; i++) { strcat (s, text); } return s; } int main(void) { char * text = "illjusttype..."; char * s = sample_text(text); int beg = 0; int end = 0; int slen = strlen(s); longest_palindrome(s, slen, 0, &beg, &end); printf("beg=%d, end=%dn", beg, end); return 0; } |