The End of Moore’s Law

For the last few years (since 2009), I’ve been pitching the idea to
my peers that language speed & concurrency/parallel capabilities
will become more important as CPU clock speeds plateau and
manufacturers add more CPU cores instead of advancing clock
rates. My 2+ year old Macbook Pro has 4 cores and
8 hyperthreads.

I, and many of my peers, program primarily in Ruby which is one of
the slowest languages, but is also one of the most productive. Even
though Ruby is often on the critical path (as opposed to the
database, network, ec.), with caching provided by Rails, more server
resources, client side processing with JavaScript, etc., the
productivity of Ruby has thus far outweighed the slowness
disadvantage (enter mantra “hardware is cheaper than developers”),
so I haven’t received much (any?) buy in from the programmers I hang
with.

Programmer productivity will continue to be important, but the
question that I’ve been considering has been, “What if a language
provides the same productivity as Ruby, but much better resource
efficiency?”

The minimum bar for efficiency is fuzzy & variable, but I’ve
nonetheless discounted any language below that bar. Except for R
which would be a special purpose language for me, I’m not
considering any language below the Racket/Clojure level of
efficiency for the future, and ideally the language has a good
concurrency/parallelism story.

Clojure

“A Lisp for Functional Programming symbiotic with an established
Platform designed for Concurrency.”

Reasons for interest:

  • A lisp
  • Macros
  • Functional programming
  • Concurrency
  • Efficiency
  • Concision
  • Clojurescript
  • Community

Common Lisp

“Common Lisp is a general-purpose, multi-paradigm programming
language. It supports a combination of procedural, functional, and
object-oriented programming paradigms. As a dynamic programming
language, it facilitates evolutionary and incremental software
development, with iterative compilation into efficient run-time
programs.”

Reasons for interest:

  • A lisp
  • Macros
  • Multi-paradigm
  • Efficiency
  • Huge standard library

Go

“Go is an open source programming environment that makes it easy to
build simple, reliable, and efficient software.”

Reasons for interest:

  • C replacement
  • Efficiency
  • Concurrency
  • Systems programming
  • Concision

Haskell

“Haskell is an advanced purely-functional programming language. An
open-source product of more than twenty years of cutting-edge
research, it allows rapid development of robust, concise, correct
software. With strong support for integration with other languages,
built-in concurrency and parallelism, debuggers, profilers, rich
libraries and an active community, Haskell makes it easier to
produce flexible, maintainable, high-quality software.”

Reasons for interest:

  • Functional programming
  • Concision
  • Purity
  • Advanced type system
  • Parallelism & concurrency
  • Efficiency
  • Monads
  • Community

J

“J is a modern, high-level, general-purpose, high-performance
programming language. J is particularly strong in the mathematical,
statistical, and logical analysis of data.”

Reasons for interest:

  • APL-like power with a normal keyboard
  • Array language
  • Concision
  • Big data

JavaScript

The language of the web.

Reasons for interest:

  • Only game in town for browsers
  • Compilation target for browsers

Prolog

“Prolog is a high-level programming language based on formal
logic. Unlike traditional programming languages that are based on
performing sequences of commands, Prolog is based on defining and
then solving logical formulas. Prolog is sometimes called a
declarative language or a rule-based language because its programs
consist of a list of facts and rules. Prolog is used widely for
artificial intelligence applications, particularly expert systems.”

Reasons for interest:

  • Logic language
  • A new paradigm
  • AI

R

“R is a language and environment for statistical computing and
graphics. It is a GNU project which is similar to the S language and
environment which was developed at Bell Laboratories (formerly AT&T,
now Lucent Technologies) by John Chambers and colleagues. R can be
considered as a different implementation of S. There are some
important differences, but much code written for S runs unaltered
under R.”

Reasons for interest:

  • Statistics
  • Big data
  • Extensive statistical/numerical library

Racket

“Racket (formerly named PLT Scheme) is a general purpose,
multi-paradigm programming language in the Lisp/Scheme family. One
of its design goals is to serve as a platform for language creation,
design, and implementation. The language is used in a variety of
contexts such as scripting, general-purpose programming, computer
science education, and research.”

Reasons for interest:

  • A lisp
  • Functional programming
  • Continuations
  • Hygienic macros
  • Concurrency
  • Efficiency
  • Built-in web server
  • Concision
  • Can be small & clean
  • pg used it to implement Arc

Ruby

“Ruby is a language of careful balance. Its creator, Yukihiro “Matz”
Matsumoto, blended parts of his favorite languages (Perl, Smalltalk,
Eiffel, Ada, and Lisp) to form a new language that balanced
functional programming with imperative programming.”

Reasons for interest:

  • Joy to program
  • Concision
  • Rails
  • Community
  • Primary revenue generating language currently
  • Dead slow, but lovable

Summary

I hope to decide on which of the three lisps I want to pursue,
determine if J can supplant R, and see if Go can be replaced with
one of the others, so that would leave six.

  • Haskell
  • J
  • JavaScript
  • A lisp
  • Prolog
  • Ruby

I’ve briefly researched Erlang, Factor, Lua, OCaml, Scala, Smalltalk
& Standard ML. At the moment, I don’t feel they should replace any
of the languages on the above list for subjective, practical and/or
technical reasons, but if you’re excited about those languages, or
others, comments are welcome.

Standard ML came close to bumping Haskell from the list, but despite
the nicety of SML and the advocacy of Robert Harper, I felt the
platforms (compilers, runtimes, libraries, etc.) were antiquated and
dividing/conquering themselves, so I couldn’t trust them to advance
quickly enough – particularly in the area of parallelism &
concurrency.

Update 1/17/2014:
For a lower level language, I think the D language is a candidate to replace Go. For numerical & statistical computing, I think Julia is a candidate to replace R. Lastly, I think other Scheme languages that are candidates to replace Racket include Gambit, Bigloo and Chicken.

I compiled some programming language popularity statistics in April 2009October 2009October 2010September 2011August 2012 and Februrary 2013. Here’s an update for September 2013:

I made a number of Google searches of the forms below and summed the results:

"implemented in <language>"
"written in <language>"
"developed in <language>"
"programmed in <language>"

I’ve divided the table into sections based on large percentage drops from one language to the next.

|------+-----------------+------------+----------+-------|
| Rank | Language        | # Results  | Previous |  Rank |
|      |                 |            |     Rank | Delta |
|------+-----------------+------------+----------+-------|
|    1 | PHP             | 54,516,000 |        1 |       |
|------+-----------------+------------+----------+-------|
|    2 | C               | 31,536,000 |        2 |       |
|    3 | Python          | 24,700,000 |        4 |     1 |
|    4 | C++             | 22,697,000 |        3 |    -1 |
|    5 | C#              | 20,309,000 |        5 |       |
|------+-----------------+------------+----------+-------|
|    6 | Java            | 13,314,000 |        7 |     1 |
|    7 | Perl            | 11,588,000 |        6 |    -1 |
|------+-----------------+------------+----------+-------|
|    8 | JavaScript      |  6,389,000 |        8 |       |
|    9 | Ruby            |  4,710,000 |        9 |       |
|   10 | FORTRAN         |  4,167,000 |       11 |     1 |
|------+-----------------+------------+----------+-------|
|   11 | Lisp Family (1) |  2,444,860 |       10 |    -1 |
|   12 | ML Family (2)   |  1,787,586 |       16 |     4 |
|   13 | Lua             |  1,653,100 |       15 |     2 |
|   14 | Prolog          |  1,389,900 |       22 |     8 |
|   15 | Smalltalk       |  1,003,200 |       27 |    12 |
|   16 | R               |  1,000,700 |       13 |    -3 |
|   17 | COBOL           |    856,500 |       18 |     1 |
|   18 | Lisp            |    835,600 |       12 |    -6 |
|   19 | Erlang          |    808,400 |       17 |    -2 |
|   20 | Haskell         |    798,700 |       19 |    -1 |
|------+-----------------+------------+----------+-------|
|   21 | (S)ML (3)       |    559,186 |       23 |     2 |
|   22 | Common Lisp     |    548,100 |       20 |    -2 |
|   23 | Go              |    490,230 |       28 |     5 |
|   24 | Scheme          |    487,100 |       25 |     1 |
|   25 | Scala           |    481,100 |       24 |    -1 |
|   26 | OCaml           |    429,700 |       21 |    -5 |
|   27 | Clojure         |    352,000 |       30 |     3 |
|------+-----------------+------------+----------+-------|
|   28 | Forth           |    230,630 |       31 |     3 |
|   29 | Racket          |    222,060 |       33 |     4 |
|   30 | CoffeeScript    |    205,809 |       29 |    -1 |
|------+-----------------+------------+----------+-------|

(1) combines Lisp, Common Lisp, Scheme, Clojure & Racket
(2) combines Haskell, (S)ML & OCaml
(3) summed separate searches for standard ml, sml & ml

I compiled some programming language popularity statistics in April 2009, October 2009, October 2010, September 2011 and August 2012 . Here’s an update for February 2013:

I made a number of Google searches of the forms below and summed the results:

"implemented in <language>"
  "written in <language>"

Naturally this is of very limited utility, and the numbers are only useful when comparing relatively within the same search since the number of results Google returns can vary greatly over time.

I’ve divided the table into sections based on large percentage drops from one language to the next.

|------+-----------------+------------+----------+-------+------------|
| Rank | Language        |    # Search| Previous |  Rank | Delta from |
|      |                 |     Results|     Rank | Delta |    Apr '09 |
|------+-----------------+------------+----------+-------+------------|
|    1 | PHP             |  52,699,000|        1 |       |          3 |
|    2 | C               |  39,330,000|        2 |       |         -1 |
|    3 | C++             |  26,490,000|        4 |     1 |            |
|    4 | Python          |  22,410,000|        3 |    -1 |          1 |
|    5 | C#              |  21,474,000|        5 |       |          2 |
|------+-----------------+------------+----------+-------+------------|
|    6 | Perl            |  11,013,000|        8 |     2 |            |
|    7 | Java            |  10,150,000|        6 |    -1 |         -5 |
|    8 | JavaScript      |   7,340,000|        9 |     1 |          1 |
|------+-----------------+------------+----------+-------+------------|
|    9 | Ruby            |   3,456,000|        7 |    -2 |          1 |
|   10 | Lisp Family (1) |   2,955,000|       10 |       |         -2 |
|   11 | FORTRAN         |   2,256,000|       11 |       |        N/A |
|   12 | Lisp            |   1,708,000|       17 |     5 |            |
|   13 | R               |   1,305,000|       21 |     8 |        N/A |
|   14 | Tcl             |   1,072,100|       13 |    -1 |         -1 |
|   15 | Lua             |   1,011,000|       19 |     4 |          5 |
|   16 | ML Family (2)   |     988,400|       16 |       |         -2 |
|   17 | Erlang          |     842,000|       18 |     1 |         -1 |
|   18 | COBOL           |     729,200|       23 |     5 |        N/A |
|   19 | Haskell         |     707,000|       12 |    -7 |         -4 |
|   20 | Common Lisp     |     557,000|       20 |       |         -2 |
|   21 | OCaml           |     528,000|       24 |     3 |         -4 |
|   22 | Prolog          |     521,000|       25 |     3 |         -3 |
|   23 | (S)ML (3)       |     496,800|       27 |     4 |          1 |
|   24 | Scala           |     426,100|       22 |    -2 |          1 |
|   25 | Scheme          |     347,000|       28 |     3 |        -14 |
|   26 | Groovy          |     320,000|       14 |   -12 |        N/A |
|------+-----------------+------------+----------+-------+------------|
|   27 | Smalltalk       |     201,400|       29 |     2 |         -6 |
|   28 | Go              |     201,200|       15 |   -13 |        N/A |
|   29 | CoffeeScript    |     182,800|       31 |     2 |        N/A |
|   30 | Clojure         |     173,100|       30 |       |         -2 |
|   31 | Forth           |     128,800|       26 |    -5 |         -8 |
|   32 | Caml            |     102,600|       34 |     2 |         -6 |
|   33 | Racket          |      93,500|       33 |       |        N/A |
|   34 | Arc             |      76,400|       32 |    -2 |        -12 |
|   35 | Io              |      60,200|       35 |       |         -8 |
|------+-----------------+------------+----------+-------+------------|

(1) combines Lisp, Scheme, Common Lisp, Racket, Arc & Clojure
(2) combines OCaml, (S)ML, Caml
(3) summed separate searches for standard ml, sml & ml

I compiled some programming language popularity statistics in April 2009, October 2009, October 2010 and September 2011 . Here’s an update for August 2012:

I made a number of Google searches of the forms below and summed the results (some earlier posts averaged the results):

"implemented in <language>"
  "written in <language>"

Naturally this is of very limited utility, and the numbers are only useful when comparing relatively within the same search since the number of results Google returns can vary greatly over time.

While formatting the table, I noticed there was a fairly natural break into three sections. Languages more popular than FORTRAN, languages between FORTRAN and COBOL and languages less popular than COBOL, so I highlighted the three sections :). I’m curious to see if any languages jump into a higher or lower section next time.

I also switched from HTML to using Emacs org-mode to create a textual table since the latter is so much nicer to deal with.

Update 8/5/12: Someone on the TriFunc mailing list mentioned the omission of Groovy. I don’t feel like updating the table, but I’ll record the total number of results here to allow for a comparison next time. Groovy came in at 460,300 results which puts it above Go and in the middle section.

|----+-----------------+------------+----------+----------+------------|
|    | Language        |   # Search | Previous | Position | Delta from |
|    |                 |    Results | Position |    Delta |    Apr '09 |
|----+-----------------+------------+----------+----------+------------|
|  1 | PHP             | 21,120,000 |        2 |        1 |          3 |
|  2 | C               | 15,440,000 |        1 |       -1 |         -1 |
|  3 | Python          | 11,441,000 |        4 |        1 |          2 |
|  4 | C++             |  9,788,000 |        3 |       -1 |         -1 |
|  5 | C#              |  8,319,000 |        5 |        0 |          2 |
|  6 | Java            |  6,020,000 |        6 |        0 |         -4 |
|  7 | Ruby            |  4,784,000 |        9 |        2 |          3 |
|  8 | Perl            |  4,183,000 |        7 |       -1 |         -2 |
|  9 | JavaScript      |  3,117,000 |        8 |       -1 |          0 |
| 10 | Lisp Family (1) |    898,680 |       10 |        0 |         -2 |
|----+-----------------+------------+----------+----------+------------|
| 11 | FORTRAN         |    795,500 |       11 |        0 |        N/A |
|----+-----------------+------------+----------+----------+------------|
| 12 | Haskell         |    490,000 |       14 |        2 |          3 |
| 13 | Tcl             |    476,000 |       12 |       -1 |          0 |
| 14 | Go              |    391,100 |      N/A |      N/A |        N/A |
| 15 | ML Family (2)   |    375,780 |       17 |        2 |         -1 |
| 16 | Lisp            |    352,000 |       13 |       -3 |         -4 |
| 17 | Erlang          |    334,000 |       15 |       -2 |         -1 |
| 18 | Lua             |    304,000 |       16 |       -2 |          2 |
| 19 | Common Lisp     |    256,000 |       19 |        0 |         -1 |
| 20 | R               |    221,000 |      N/A |      N/A |        N/A |
| 21 | Scala           |    219,000 |       22 |        1 |          4 |
|----+-----------------+------------+----------+----------+------------|
| 22 | COBOL           |    218,600 |       18 |       -4 |        N/A |
|----+-----------------+------------+----------+----------+------------|
| 23 | OCaml           |    181,100 |       20 |       -3 |         -6 |
| 24 | Prolog          |    176,300 |       21 |       -3 |         -5 |
| 25 | Forth           |    168,700 |       27 |        2 |         -2 |
| 26 | (S)ML (3)       |    159,700 |       26 |        0 |         -2 |
| 27 | Scheme          |    135,500 |       23 |       -4 |        -16 |
| 28 | Smalltalk       |    114,500 |       24 |       -4 |         -7 |
| 29 | Clojure         |     74,500 |       25 |       -4 |          0 |
| 30 | CoffeeScript    |     68,660 |      N/A |      N/A |        N/A |
| 31 | Arc             |     40,990 |       30 |       -1 |         -9 |
| 32 | Racket          |     39,690 |      N/A |      N/A |        N/A |
| 33 | Caml            |     34,980 |       28 |       -5 |         -7 |
| 34 | Io              |     23,110 |       29 |       -5 |         -7 |
|----+-----------------+------------+----------+----------+------------|

(1) combines Lisp, Scheme, Common Lisp, Racket, Arc & Clojure
(2) combines OCaml, (S)ML, Caml
(3) summed separate searches for standard ml, sml & ml

The Problem

I noticed some unusual behavior with respect to YAML String serialization between my Linux production system and my Mac OSX development system.

After dumping the production database via pg_dump -O –no-acl mydb | gzip > ~/mydb.sql.gz and then restoring it on my development system via rake db:drop; rake db:create; psql mydb < mydb.sql, I noticed that a particular serialized field in my Rails app that should always be an Array of String objects occasionally contained Integers.

After a little research and experimentation, I discovered that the production Linux system would occasionally omit quotations around Strings containing only numeric digits. I haven’t analyzed the pattern fully, but here are some examples where the YAML serialization did or did not use quotes:

  • “90103″
  • 000080
  • “000071″
  • “000124″
  • “000003″
  • 008397
  • 000408
  • 000009
  • 000188
  • “000021″

Further investigation revealed that the Linux production system was using Syck (a “dated C implementation of YAML 1.0″) and my Mac OSX development system was using psych (a “libyaml wrapper (in Ruby core for 1.9.2)”). libyaml is a “fast C implementation of YAML 1.1. So, either the quotation rules have changed between YAML 1.0 and YAML 1.1, or there is a bug in one of the implementations (likely Syck).

The Solution

The solution for proper “future” behavior is pretty simple. Install libyaml on the Linux system as follows:

wget http://pyyaml.org/download/libyaml/yaml-0.1.4.tar.gz
tar xzf yaml-0.1.4.tar.gz
cd yaml-0.1.4
./configure
make
make install

I think that’s enough, but I went ahead and rebuilt my Ruby 1.9.2 just in case it needed to know about the existence of libyaml at build time.

The solution for converting my database with YAML 1.0 serialization to YAML 1.1 serialization is a bit trickier. Since the “dump” and “load” operations are matched for a particular version of YAML, it seems difficult to load the data using YAML 1.0 (thereby retaining the String type when reading an unquoted 000088) and then dump the data using YAML 1.1 (to get proper quoting of ’000088′). Further complicating this is the fact that Rails handles the serialization operations automatically.

It does appear possible to dynamically switch between syck and psyck by using the following:

YAML::ENGINE.yamler = 'syck'
YAML::ENGINE.yamler = 'psych'

So, one option is to repeatedly switch to syck, read in data, switch to psych, and then write the data. <sigh>

Update:

It appears that due to the semantics of the Rails serialize function, it’s not enough to just read the model object using syck and then immediately write with psych because that doesn’t appear to be enough to cause the field to be deserialized. I had to refer to the field for each object. This is a pain because it prevents me from doing a generic loop where I can handle all model objects easily w/o reference to their specific fields.

I’ll withhold judgment for a while, but my first inclination is to consider abandoning YAML serialization for something a little more robust and portable.

Update 2:

It appears my welcome from psych is a serious memory leak. I’ve been running long running Ruby/Rails processes for years, and this is the first time I’ve experienced a failure due to an out of memory condition. There are a number of Google hits regarding the issue. After I fix the leak, I’ll begin researching alternatives to YAML serialization in Rails.

Update 3:

The number of bug reports on psych and rubygems I’ve had to wade through recently is amazing. My current solution is to remove the psych system gem and install Ruby 1.9.3p0 which required upgrading Passenger to the latest version from source to get Ruby 1.9.3 compatibility. I still had to track down a few odd errors such as “undefined method `yaml’ for #<Psych::Nodes::Stream:…>” and “invalid date format in specification: “2011-10-02 00:00:00.000000000Z”” – all because I chose to use the default Rails serialization assuming there would be no issues. Lesson learned.

See Part Five

I compiled some programming language popularity statistics in April 2009, October 2009 and October 2010 . Here’s an update for September 2011:

I made a number of Google searches of the forms below and summed the results (previous posts averaged the results):

"implemented in <language>"
  "written in <language>"

Naturally this is of very limited utility, and the numbers are only useful when comparing relatively within the same search since the number of results Google returns can vary greatly over time.

Language Total Prev. Position Position Delta
C 10,360,000 2 1
PHP 10,351,000 1 -1
C++ 6,495,000 3 0
Python 5,759,000 5 1
C# 5,335,000 4 -1
 
Java 4,890,000 8 2
Perl 3,702,000 6 -1
JavaScript 3,077,000 7 -1
Ruby 1,654,000 9 0
Lisp Family1 1,022,870 11 1
 
FORTRAN 975,600 10 -1
Tcl 594,500 12 0
Lisp 486,000 14 1
Haskell 450,500 16 2
Erlang 419,700 13 -2
 
Lua 367,100 18 2
ML Family2 348,400 17 0
COBOL 308,270 15 -3
Common Lisp 254,900 19 0
OCaml 240,300 21 1
 
Prolog 224,000 20 -1
Scala 203,400 23 1
Scheme 184,700 22 -1
Smalltalk 129,700 24 0
Clojure 84,600 27 2
 
(S)ML3 83,630 25 -1
Forth 69,980 26 -1
Caml 24,470 28 0
Io 17,700 30 1
Arc 12,670 29 -1

1 combines Lisp, Scheme, Common Lisp, Arc & Clojure
2 combines OCaml, (S)ML, Caml
3 summed separate searches for sml and ml

It’s foolish to use an absolute term such as securely in the context of computer security. A better title would have used the qualifier more. If you happen to be a security expert, please be so kind as to leave a comment with any tips, admonishments, etc.

The Task

I need to allow users to upload files to a web server via an SFTP client, period. I don’t want them to be able to do anything else, and I don’t want them to be able to see any files other than their own.

Configure SSH

SSH configuration is located in the /etc/ssh/sshd_config file.

I prefer to disallow the root user from logging in remotely via ssh, change yes to no for PermitRootLogin:

PermitRootLogin no

I prefer to disallow using passwords when logging in remotely, so I require the use of public keys. Change yes to no for PasswordAuthentication. Before you activate the following line, make sure you’ve setup your SSH keys properly or you may lock yourself out of your server. The short answer is to generate a public/private key pair on your local machine and place the public key (typically id_rsa.pub) in the ~/.ssh/authorized_keys file on the server, then test it by logging in via ssh – you should not be prompted for a password if successful:

PasswordAuthentication no

To allow access for these special sftp users, I change the Subsystem sftp line to be:

Subsystem sftp internal-sftp

and add the following at the very end of the config file:

Match Group sftp
        ChrootDirectory /home/%u
        PasswordAuthentication yes
        X11Forwarding no
        ForceCommand internal-sftp

The Match statement sets up a chroot jail for any user in the sftp group so that the user will only have access to their home directory. It also overrides the global rule to disallow password login so the sftp users may use the traditional means of username/password to login. It also forces them to use the sftp command instead of allowing normal shell access.

To activate these changes, restart the ssh daemon:

sudo /etc/init.d/ssh restart

Add Users

When adding users, we need to ensure they’re in the sftp group so that the Match statement in sshd_config will actually match. Alternatively, you could match on some other criteria such as username. First create the sftp group:

sudo groupadd sftp

Then add a user with that group and no shell specified:

sudo adduser --shell /bin/false --ingroup sftp brian

Conclusion

That should take care of everything. Test the new user by attempting to login via ssh:

ssh brian@10.0.0.1
brian@10.0.0.1's password:
This service allows sftp connections only.
Connection to 10.0.0.1 closed.

That should fail as shown above. Also test to see if you can view any directories outside of the users’s home directory:

sftp brian@10.0.0.1
sftp> ls /etc
Couldn't stat remote file: No such file or directory
Can't ls: "/etc" not found
sftp>

Some additional tools to enhance security:

  • shorewall firewall
  • fail2ban (locks ip addresses out for a time period after N failed attempts)
  • rkhunter (checks for root kits)
  • chkrootkit (checks for root kits)

I stumbled upon a programming challenge a company was using for recruitment purposes and thought I’d create a Haskell solution as a learning exercise. The first problem was to find the longest palindrome embedded in a text string.

The following Haskell solution seems very readable to me, but it’s a naive solution that’s inefficient. It computes an answer on my 2.6 year old Macbook Pro in under 4 seconds, but a 2x increase in text requires a 7x increase in CPU time.

I believe there are algorithms to find the longest embedded palindrome in linear time, so I may post a refinement later.

-- Find the longest palindrome in a text string.

module Main where
import Char

text = "I'll just type in some example text here and embed a little 
palindrome - A man, a plan, a canal, Panama! - I expect that will be 
the longest palindrome found in this text.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Integer volutpat lorem imperdiet ante bibendum ullamcorper. Mauris 
tempor hendrerit justo at elementum. Vivamus elit magna, accumsan id 
condimentum a, luctus a ipsum. Donec fermentum, lectus at posuere 
ullamcorper, mauris lectus tincidunt nulla, ut placerat justo odio sed
 odio. Nulla blandit lorem sit amet odio varius nec vestibulum ante 
ornare. Aliquam feugiat, velit a rhoncus rutrum, turpis metus pretium 
dolor, et mattis leo turpis non est. Sed aliquet, sapien quis 
consequat condimentum, sem magna ornare ligula, id blandit odio nisl 
vitae erat. Nam vulputate tincidunt quam, non lacinia risus tincidunt 
lacinia. Aenean fermentum tristique porttitor. Nam id dolor a eros 
accumsan imperdiet. Aliquam quis nibh et dui ultricies cursus. Nunc 
et ante non sapien vehicula rutrum. Duis posuere dictum blandit. Nunc 
vitae tempus purus."

clean = map toLower . filter isAlpha

palindrome str = str == reverse str

substrings []     = []
substrings (x:xs) = substrings' (x:xs) ++ substrings xs where
  substrings' []     = []
  substrings' (y:ys) = [y] : [ (y:s) | s <- substrings' ys ]

longest []     = []
longest (x:xs) = if length x > length max then x else max
  where max = longest xs

longest_palindrome xs =
  longest (filter palindrome (substrings (clean text)))

main = print (longest_palindrome text)

As a comparison, I translated the program into Ruby. I program predominantly in Ruby these days, and I like it, but the Ruby version is 25 times slower (98 sec. vs. 4 sec.), and it’s 2.4 times more lines of code (31 vs. 13 – excluding the text).

A gain in runtime efficiency, expressive power and multi-core capability is very attractive!

I’m using Ruby 1.9.2 and GHC 6.12.3 on Mac OS X 10.5.8 on a 2.4 GHz Core 2 Duo w/ 4 GB RAM.

ruby 1.9.2p0 (2010-08-18 revision 29036) [i386-darwin9.8.0]
Glasgow Haskell Compiler, Version 6.12.3, for Haskell 98, stage 2 booted by GHC version 6.12.2

TEXT = <<END
I'll just type in some example text here and embed a little
palindrome - A man, a plan, a canal, Panama! - I expect that will be
the longest palindrome found in this text.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Integer volutpat lorem imperdiet ante bibendum ullamcorper. Mauris
tempor hendrerit justo at elementum. Vivamus elit magna, accumsan id
condimentum a, luctus a ipsum. Donec fermentum, lectus at posuere
ullamcorper, mauris lectus tincidunt nulla, ut placerat justo odio sed
 odio. Nulla blandit lorem sit amet odio varius nec vestibulum ante
ornare. Aliquam feugiat, velit a rhoncus rutrum, turpis metus pretium
dolor, et mattis leo turpis non est. Sed aliquet, sapien quis
consequat condimentum, sem magna ornare ligula, id blandit odio nisl
vitae erat. Nam vulputate tincidunt quam, non lacinia risus tincidunt
lacinia. Aenean fermentum tristique porttitor. Nam id dolor a eros
accumsan imperdiet. Aliquam quis nibh et dui ultricies cursus. Nunc
et ante non sapien vehicula rutrum. Duis posuere dictum blandit. Nunc
vitae tempus purus.
END

def clean str
  str.gsub(/[^A-Za-z]/,'').downcase
end

def palindrome? str
  str == str.reverse
end

def subs str
  return [] if str.empty?
  y = str[0,1]
  subs(str[1..-1]).inject([y]) do |result, s|
    result << y + s
    result
  end
end

def substrings str
  return [] if str.empty?
  subs(str) + substrings(str[1..-1])
end

def longest strs
  strs.inject("") do |max, str|
    max = str if str.length > max.length
    max
  end
end

def longest_palindrome str
  longest(substrings(clean(str)).inject([]) {|result, str|
            result << str if palindrome?(str)
            result
          })
end

puts longest_palindrome(TEXT)

Update 3/13/11 19:30
Rick’s comment (see his blog post linked to in his comment) regarding the importance of algorithm choice is certainly a valid one – a better algorithm in a slower language may win over an inferior algorithm in a faster language (for large enough datasets). However, what I’m becoming interested in is the fact that one may gain both productivity/power and runtime speed by making wise programming language choices.

In the case of an interpreted language such as Ruby, it’s helpful to “stay in C code” as much as possible. In other words, to favor built-in library routines that have been implemented in C over hand written Ruby code. As much as I like Ruby, this is one of the things that bothers me – the fact that the Ruby code written by the programmer is vastly inferior in performance to the built-in library routines.

I wasn’t planning on refining the Haskell version this soon, but after seeing Rick’s blog post response, I couldn’t resist :)

I should first provide some background info for context. When I wrote the original Haskell version above, I was in the middle of an online programming challenge, and my goal was simply to compute the answer in a reasonable amount of time to move on to the next challenge, so the brief, easily understandable, Haskell version worked great. Hence, my disclaimers in the post regarding the naivity and inefficiency of the solution. The Ruby code in this post was an afterthought simply to see a comparison between identical algorithms. Given that apple & apple comparison, I’m impressed with the brevity and speed of the Haskell version.

Since I’m still very much a Haskell newbie, and short on time, I found an incredible solution by Johan Jeuring. It’s a little bit longer than the Ruby version, but much, much faster. It’s so fast, I had to increase the input quite a bit to get a reasonable comparison – I replicated the original text 23 times and reversed one of the replications to make a fairly long palindrome.

Rick’s Ruby version took 169.4 seconds, Johan’s Haskell version took 0.032 seconds. In other words, Ruby takes over 5,000 times as long to compute the result. Clearly this is an apples and oranges comparison, but I fully expect that a Ruby version using an identical algorithm will take 100 times as long (or longer) to run and would be less concise. Giving up runtime performance to gain programmer power is one thing, but giving up runtime performance and power is a tough pill to swallow.

Here is a slightly modified version of Johan Jeuring’s code. He was also kind enough to provide his code here:

-- Reorganized from Johan Jeuring's solution:
module Main where
import Data.List (maximumBy,intersperse)
import Data.Char
import Data.Array

text = "I'll just type in some example text here and embed a little 
palindrome - A man, a plan, a canal, Panama! - I expect that will be 
the longest palindrome found in this text.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Integer volutpat lorem imperdiet ante bibendum ullamcorper. Mauris 
tempor hendrerit justo at elementum. Vivamus elit magna, accumsan id 
condimentum a, luctus a ipsum. Donec fermentum, lectus at posuere 
ullamcorper, mauris lectus tincidunt nulla, ut placerat justo odio sed
 odio. Nulla blandit lorem sit amet odio varius nec vestibulum ante 
ornare. Aliquam feugiat, velit a rhoncus rutrum, turpis metus pretium 
dolor, et mattis leo turpis non est. Sed aliquet, sapien quis 
consequat condimentum, sem magna ornare ligula, id blandit odio nisl 
vitae erat. Nam vulputate tincidunt quam, non lacinia risus tincidunt 
lacinia. Aenean fermentum tristique porttitor. Nam id dolor a eros 
accumsan imperdiet. Aliquam quis nibh et dui ultricies cursus. Nunc 
et ante non sapien vehicula rutrum. Duis posuere dictum blandit. Nunc 
vitae tempus purus."

clean = map toLower . filter isAlpha

longestPalindrome input =
  let inputArray      =  listArrayl0 input
      (maxLength,pos) =  maximumBy
                            ((l,_) (l',_) -> compare l l')
                            (zip (palindromesAroundCentres inputArray) [0..])
  in showPalindrome inputArray (maxLength,pos)

longestPalindromes m input =
  let inputArray =  listArrayl0 input
  in concat $ intersperse "n"
            $ map (showPalindrome inputArray)
            $ filter ((m String
lengthLongestPalindrome = show . maximum . palindromesAroundCentres . listArrayl0

lengthLongestPalindromes :: String -> String
lengthLongestPalindromes = show . palindromesAroundCentres . listArrayl0

palindromesAroundCentres a =
  let (afirst,_) = bounds a
  in reverse $ extendTail a afirst 0 []

extendTail a n currentTail centres
  | n > alast = finalCentres currentTail centres
                   (currentTail:centres)
  | n-currentTail == afirst =
      extendCentres a n (currentTail:centres)
                    centres currentTail
  | a!n == a!(n-currentTail-1) =
      extendTail a (n+1) (currentTail+2) centres
  | otherwise =
      extendCentres a n (currentTail:centres)
                    centres currentTail
  where  (afirst,alast)  =  bounds a

extendCentres a n centres tcentres centreDistance
  | centreDistance == 0 =
      extendTail a (n+1) 1 centres
  | centreDistance-1 == head tcentres  =
      extendTail a n (head tcentres) centres
  | otherwise =
      extendCentres a n (min (head tcentres)
                    (centreDistance-1):centres)
                    (tail tcentres) (centreDistance-1)

finalCentres 0     _        centres  =  centres
finalCentres (n+1) tcentres centres  =
  finalCentres n
               (tail tcentres)
               (min (head tcentres) n:centres)
finalCentres _     _        _        =  error "finalCentres: input < 0"

showPalindrome a (len,pos) =
  let startpos = pos `div` 2 - len `div` 2
      endpos   = if odd len
                 then pos `div` 2 + len `div` 2
                 else pos `div` 2 + len `div` 2 - 1
  in show [a!n|n <- [startpos .. endpos]]

listArrayl0 string  = listArray (0,length string - 1) string

sampleText s = concat (replicate 8 s ++ [ "x" ] ++ [ reverse s ] ++ replicate 14 s)

main = print (longestPalindrome (clean (sampleText text)))

Here’s the relevant change to Rick’s Ruby version:


def sample_text str
  str * 8 + 'x' + str.reverse + str * 14
end

puts clean(sample_text(TEXT)).longest_palindrome

I first wrote a program to solve the Cracker Barrel peg board puzzle (15 holes arranged in a triangle with 14 golf tees) many years ago as youth using the BASIC language. I wish I still had the source to that, because I’m pretty sure this Haskell version would kick its butt :)

I’m still trying to get my head around Haskell, so I expect there are many possible improvements to this program, but even so, I’m pleased with how Haskell allows me to express logic.

-- Solve the Cracker Barrel Peg Board Puzzle

module Main where

type Pos = (Int, Int)
type Move = (Pos, Pos)
type Board = [ Pos ]

isOccupied b p = elem p b
isEmpty b p    = not (isOccupied b p)
isPos (r,c)    = elem r [0..4] && elem c [0..r]

-- Possible moves for one position
positionMoves b p = [ (p, dst) | (neighbor, dst)  isPos p1 && isPos p2)
                   [ ((r + or `div` 2, c + oc `div` 2),(r + or, c + oc)) |
                     (or, oc) <- [ (-2,0), (0,2), (2,2), (2,0), (0,-2), (-2,-2) ] ]

-- Possible moves for all positions on the board
possibleMoves b = concat [ positionMoves b pos | pos  (pos /= src) && (pos /= neighbor)

-- Make moves until the goal position is met
play b p moves =
  if null nextMoves then
    if length b == 1 && head b == p then reverse moves else []
  else
    tryMoves nextMoves
  where
    nextMoves       = possibleMoves b
    tryMoves []     = []
    tryMoves (m:ms) =
      let result = play (move b m) p (m:moves)
      in if null result then tryMoves ms else result

-- Compute the initial empty position to know the goal, then solve the puzzle
solve b = let emptyPos = head [ (r,c) | r <- [0..4], c <- [0..r], isEmpty b (r,c) ]
          in play b emptyPos []

-- A sample board with the topmost hole empty
board :: Board
board = [ (1,0), (1,1),
          (2,0), (2,1), (2,2),
          (3,0), (3,1), (3,2), (3,3),
          (4,0), (4,1), (4,2), (4,3), (4,4) ]

main = print (solve board)

See Part Five

I compiled some programming language popularity statistics in April 2009 and October 2009 . Here’s an update for October 2010:

I made a number of Google searches of the forms below and averaged the results:

"implemented in <language>"
  "written in <language>"

Naturally this is of very limited utility, and the numbers are only useful when comparing relatively within one column since the number of results Google returns can vary greatly over time.

Language Apr 2009 Oct 2009 Oct 2010 Position Delta
PHP 680,000 5,083,500 14,096,000 +3
C 1,905,500 16,975,000 9,675,000 -1
C++ 699,000 6,270,000 6,510,000 -1
C# 349,700 2,125,000 5,132,000 +4
Python 396,000 3,407,000 5,114,500 +1
Perl 365,500 3,132,500 4,675,000 +1
JavaScript 102,700 1,163,000 2,120,000 +4
Java 850,000 5,118,000 1,495,500 -5
Ruby 99,650 227,000 1,426,000 +13
FORTRAN 1,621,000 770,850 0
Lisp Family1 176,507 3,489,650 399,685 -6
Tcl 44,800 382,000 313,400 +5
Erlang 22,285 161,700 188,800 +12
Lisp 61,900 486,500 174,050 +1
COBOL 247,300 166,435 +6
Haskell 22,550 280,500 157,150 +4
ML Family2 29,062 1,003,800 149,005 -5
Lua 13,065 131,800 128,150 +9
Common Lisp 20,600 554,500 112,750 -5
Prolog 17,750 390,500 100,000 -4
OCaml 22,000 343,500 99,050 -3
Scheme 86,450 2,100,000 82,650 -13
Scala 3,570 66,250 65,950 +6
Smalltalk 9,105 187,500 56,950 0
(S)ML3 5,173 590,700 42,130 -12
Forth 6,465 146,450 25,880 0
Clojure 782 62,200 23,525 +3
Caml 1,889 69,600 7,825 0
Arc 6,775 286,500 6,710 -10
Io 1,760 198,500 3,025 -7

1 combines Lisp, Scheme, Common Lisp, Arc & Clojure
2 combines OCaml, (S)ML, Caml
3 summed separate searches for sml and ml

Follow

Get every new post delivered to your Inbox.