Lojic Technologies

Posts Tagged ‘programming

Beware of YAML serialization on Linux without libyaml

leave a comment »

The Problem

I noticed some unusual behavior with respect to YAML String serialization between my Linux production system and my Mac OSX development system.

After dumping the production database via pg_dump -O –no-acl mydb | gzip > ~/mydb.sql.gz and then restoring it on my development system via rake db:drop; rake db:create; psql mydb < mydb.sql, I noticed that a particular serialized field in my Rails app that should always be an Array of String objects occasionally contained Integers.

After a little research and experimentation, I discovered that the production Linux system would occasionally omit quotations around Strings containing only numeric digits. I haven’t analyzed the pattern fully, but here are some examples where the YAML serialization did or did not use quotes:

  • “90103”
  • 000080
  • “000071”
  • “000124”
  • “000003”
  • 008397
  • 000408
  • 000009
  • 000188
  • “000021”

Further investigation revealed that the Linux production system was using Syck (a “dated C implementation of YAML 1.0”) and my Mac OSX development system was using psych (a “libyaml wrapper (in Ruby core for 1.9.2)”). libyaml is a “fast C implementation of YAML 1.1. So, either the quotation rules have changed between YAML 1.0 and YAML 1.1, or there is a bug in one of the implementations (likely Syck).

The Solution

The solution for proper “future” behavior is pretty simple. Install libyaml on the Linux system as follows:

wget http://pyyaml.org/download/libyaml/yaml-0.1.4.tar.gz
tar xzf yaml-0.1.4.tar.gz
cd yaml-0.1.4
make install

I think that’s enough, but I went ahead and rebuilt my Ruby 1.9.2 just in case it needed to know about the existence of libyaml at build time.

The solution for converting my database with YAML 1.0 serialization to YAML 1.1 serialization is a bit trickier. Since the “dump” and “load” operations are matched for a particular version of YAML, it seems difficult to load the data using YAML 1.0 (thereby retaining the String type when reading an unquoted 000088) and then dump the data using YAML 1.1 (to get proper quoting of ‘000088’). Further complicating this is the fact that Rails handles the serialization operations automatically.

It does appear possible to dynamically switch between syck and psyck by using the following:

YAML::ENGINE.yamler = 'syck'
YAML::ENGINE.yamler = 'psych'

So, one option is to repeatedly switch to syck, read in data, switch to psych, and then write the data. <sigh>


It appears that due to the semantics of the Rails serialize function, it’s not enough to just read the model object using syck and then immediately write with psych because that doesn’t appear to be enough to cause the field to be deserialized. I had to refer to the field for each object. This is a pain because it prevents me from doing a generic loop where I can handle all model objects easily w/o reference to their specific fields.

I’ll withhold judgment for a while, but my first inclination is to consider abandoning YAML serialization for something a little more robust and portable.

Update 2:

It appears my welcome from psych is a serious memory leak. I’ve been running long running Ruby/Rails processes for years, and this is the first time I’ve experienced a failure due to an out of memory condition. There are a number of Google hits regarding the issue. After I fix the leak, I’ll begin researching alternatives to YAML serialization in Rails.

Update 3:

The number of bug reports on psych and rubygems I’ve had to wade through recently is amazing. My current solution is to remove the psych system gem and install Ruby 1.9.3p0 which required upgrading Passenger to the latest version from source to get Ruby 1.9.3 compatibility. I still had to track down a few odd errors such as “undefined method `yaml’ for #<Psych::Nodes::Stream:…>” and “invalid date format in specification: “2011-10-02 00:00:00.000000000Z”” – all because I chose to use the default Rails serialization assuming there would be no issues. Lesson learned.

Written by Brian Adkins

November 22, 2011 at 2:44 pm

Posted in programming

Tagged with , ,

Programming Language Popularity – Part Four

leave a comment »

See Part Five

I compiled some programming language popularity statistics in April 2009, October 2009 and October 2010 . Here’s an update for September 2011:

I made a number of Google searches of the forms below and summed the results (previous posts averaged the results):

"implemented in <language>"
  "written in <language>"

Naturally this is of very limited utility, and the numbers are only useful when comparing relatively within the same search since the number of results Google returns can vary greatly over time.

Language Total Prev. Position Position Delta
C 10,360,000 2 1
PHP 10,351,000 1 -1
C++ 6,495,000 3 0
Python 5,759,000 5 1
C# 5,335,000 4 -1
Java 4,890,000 8 2
Perl 3,702,000 6 -1
JavaScript 3,077,000 7 -1
Ruby 1,654,000 9 0
Lisp Family1 1,022,870 11 1
FORTRAN 975,600 10 -1
Tcl 594,500 12 0
Lisp 486,000 14 1
Haskell 450,500 16 2
Erlang 419,700 13 -2
Lua 367,100 18 2
ML Family2 348,400 17 0
COBOL 308,270 15 -3
Common Lisp 254,900 19 0
OCaml 240,300 21 1
Prolog 224,000 20 -1
Scala 203,400 23 1
Scheme 184,700 22 -1
Smalltalk 129,700 24 0
Clojure 84,600 27 2
(S)ML3 83,630 25 -1
Forth 69,980 26 -1
Caml 24,470 28 0
Io 17,700 30 1
Arc 12,670 29 -1

1 combines Lisp, Scheme, Common Lisp, Arc & Clojure
2 combines OCaml, (S)ML, Caml
3 summed separate searches for sml and ml

Written by Brian Adkins

September 22, 2011 at 10:37 am

Programming Language Popularity – Part Three

with one comment

See Part Five

I compiled some programming language popularity statistics in April 2009 and October 2009 . Here’s an update for October 2010:

I made a number of Google searches of the forms below and averaged the results:

"implemented in <language>"
  "written in <language>"

Naturally this is of very limited utility, and the numbers are only useful when comparing relatively within one column since the number of results Google returns can vary greatly over time.

Language Apr 2009 Oct 2009 Oct 2010 Position Delta
PHP 680,000 5,083,500 14,096,000 +3
C 1,905,500 16,975,000 9,675,000 -1
C++ 699,000 6,270,000 6,510,000 -1
C# 349,700 2,125,000 5,132,000 +4
Python 396,000 3,407,000 5,114,500 +1
Perl 365,500 3,132,500 4,675,000 +1
JavaScript 102,700 1,163,000 2,120,000 +4
Java 850,000 5,118,000 1,495,500 -5
Ruby 99,650 227,000 1,426,000 +13
FORTRAN 1,621,000 770,850 0
Lisp Family1 176,507 3,489,650 399,685 -6
Tcl 44,800 382,000 313,400 +5
Erlang 22,285 161,700 188,800 +12
Lisp 61,900 486,500 174,050 +1
COBOL 247,300 166,435 +6
Haskell 22,550 280,500 157,150 +4
ML Family2 29,062 1,003,800 149,005 -5
Lua 13,065 131,800 128,150 +9
Common Lisp 20,600 554,500 112,750 -5
Prolog 17,750 390,500 100,000 -4
OCaml 22,000 343,500 99,050 -3
Scheme 86,450 2,100,000 82,650 -13
Scala 3,570 66,250 65,950 +6
Smalltalk 9,105 187,500 56,950 0
(S)ML3 5,173 590,700 42,130 -12
Forth 6,465 146,450 25,880 0
Clojure 782 62,200 23,525 +3
Caml 1,889 69,600 7,825 0
Arc 6,775 286,500 6,710 -10
Io 1,760 198,500 3,025 -7

1 combines Lisp, Scheme, Common Lisp, Arc & Clojure
2 combines OCaml, (S)ML, Caml
3 summed separate searches for sml and ml

Written by Brian Adkins

October 8, 2010 at 3:30 pm

Programming in Standard ML – Part 3

leave a comment »

Table of Contents

Chapter 6 – Case Analysis

Tuple types are homogeneous e.g. all values of type int*real are pairs containing an int and a real. Match failures occur at compile time. Types that have more than one form, such as int, are heterogeneous. Pattern matches fail at run time as a bind failure.

ML defines functions over heterogeneous types using clausal function expressions. For example:

fn pat1 => exp1
 | pat2 => exp2
 | ...
 | patn => expn

Each component pat => exp is called a clause, or a rule. The entire assembly of rules is a called a match. For example:

val recip : int -> int =
    fn 0 => 0 | n:int => 1 div n

The fun notation is generalized so we can be more concise:

fun recip 0 = 0
  | recip (n:int) = 1 div n

Case analysis on the values of a heterogeneous type is performed by application of a clausally-defined function. The notation:

case exp
  of pat1 => exp1
   | ...
   | patn => expn

is short for the application:

(fn pat1 => exp1
  | ...
  | patn => expn)

Matches are subject to two forms of “sanity check” – exhaustiveness checking and redundancy checking.

Chapter 7 – Recursive Functions

For a function to be able to call itself, it needs a name. For example:

val rec factorial : int->int =
    fn 0 => 1 | n:int => n * factorial (n-1)

or using fun notation:

fun factorial 0 = 1
  | factorial (n:int) = n * factorial (n-1)


If we define a helper function that accepts an accumulator we can reduce the storage needed:

fun helper (0,r:int) = r
  | helper (n:int,r:int) = helper (n-1,n*r)
fun factorial (n:int) = helper (n,1)

It’s better programming style to hide the helper function w/in a local declaration:

    fun helper (0,r:int) = r
      | helper (n:int,r:int) = helper (n-1,n*r)
    fun factorial (n:int) = helper (n,1)

Tail recursive functions are analogous to loops in imperative languages – they iterate the computation w/o needing auxiliary storage.

Mutual Recursion

Definitions which are mutually recursive can be joined together with the and keyword to indicate they are defined simultaneously by mutual recursion:

fun even 0 = true
  | even n = odd (n-1)
and odd 0 = false
  | odd n = even (n-1)

Chapter 8 – Type Inference and Polymorphism

Standard ML allows you to omit type information whenever it can be determined from context. Consider the following:

fn s:string => s ^ "n"

There is no need to declare the type of s since it can be inferred from the context, so we may write:

fn s => s ^ "n"

A type scheme is a type expression involving one or more type variables standing for an unknown, but arbitrary type expression. Type variables are written ‘a (pronounced alpha), ‘b (pronounced beta), etc. For example, the type scheme ‘a->’a has instances int->int, string->string, (int*int)->(int*int), and (‘b->’b)->(‘b->’b), etc. It does not have the type int->string.

We may bind an identity function to the variable I as follows:

val I : 'a->'a = fn x=>x

We may also write:

fun I(x:'a) : 'a = x

Standard ML eliminates the need to ascribe a type scheme to the variable:

val I = fn x=>x


fun I(x) = x

Chapter 9 – Programming with Lists

The values of type type list are the finite lists of values of type type:

1. nil is a value of type typ list.
2. if h is a value of type typ, and t is a value of type typ list,
   then h::t is a value of type typ list.
3. Nothing else is a value of type typ list.

The type expression typ list is a postfix notation for the application of the type constructor list to the type typ.

A value val of type typ list has the form:
val1 :: (val2 :: (… :: (valn :: nil) … ))

The :: operator is right-associative, so we may omit parentheses:
val1 :: val2 :: … :: valn :: nil

Or, we may use list notation:
[ val1, val2, …, valn ]

Computing With Lists

Some examples:

fun length nil = 0
  | length (_::t) = 1 + length t

Note we do not give a name to the head of the list, instead we use a wildcard _

fun append (nil, l) = l
  | append (h::t, l) = h :: append (t, l)

The latter is built into Standard ML and is written using infix as: exp1 @ exp2

fun rev nil = nil
  | rev (h::t) = rev t @ [h]

The running time of the latter is O(n2). The following definition makes use of an accumulator and has a running time of O(n):

    fun helper (nil, a) = a
      | helper (h::t, a) = helper (t, h::a)
    fun rev' l = helper (l, nil)

Chapter 10 – Concrete Data Types

Non-Recursive Datatypes

Example of nullary i.e. zero argument, constructors:

datatype suit = Spades | Hearts | Diamonds | Clubs

It is conventional to capitalize the names of value constructors, but this is not required by the language.

Datatypes may be parameterized by a type:

datatype 'a option = NONE | SOME of 'a

The values are NONE or Some val, where val is a value of type typ. The option type constructor is pre-defined in Standard ML.

Option types can also be used in aggregate data structures:

type entry = { name:string, spouse string option }

An entry for an unmarried person would have a spouse field with a value of NONE.

Recursive Datatypes

datatype 'a tree =
  Empty |
  Node of 'a tree * 'a * 'a tree

1. The empty tree Empty is a binary tree.
2. If tree_1 and tree_2 are binary trees, and val is a value of type type, then Node (tree_1, val, tree_2) is a binary tree.
3. Nothing else is a binary tree.
A function to compute the height of a binary tree, and one to compute the number of nodes:

fun height Empty = 0
  | height (Node (lft, _, rht)) = 1 + max (height lft, height rht)

fun size Empty = 0
  | size (Node (lft, _, rht)) = 1 + size lft + size rht

Heterogeneous Data Structures

The tree data type above requires that the type of the data items at the nodes must be the same for every node of the tree. To represent a heterogeneous tree, the data item must be labelled with enough info to determine the type at run-time.

datatype int_or_string =
  Int of int |
  String of string

type int_or_string =
  int_or_string tree

Datatype declarations and pattern matching can be useful for manipulating the abstract syntax of a language. Consider an example representing arithmetic expressions:

datatype expr =
  Numeral of int |
  Plus of expr * expr |
  Times of expr * expr

fun eval (Numeral n) = Numeral n
  | eval (Plus (e1, e2)) =
        val Numeral n1 = eval e1
        val Numeral n2 = eval e2
        Numeral (n1+n2)
  | eval (Times (e1, e2)) =
        val Numeral n1 = eval e1
        val Numeral n2 = eval e2
        Numeral (n1*n2)

If we extend the expr datatype as follows:

datatype expr =
  Numeral of int |
  Plus of expr * expr |
  Times of expr * expr
  Recip of expr

The compiler will complain about eval being incompatible with the new version of expr. Recompiling eval will produce an inexhaustive match warning since eval lacks a case for Recip. This is one of the benefits of static typing provided in Standard ML.

Written by Brian Adkins

August 15, 2009 at 2:43 pm

Posted in programming

Tagged with ,

Send Growl Notifications From Carbon Emacs On OSX

leave a comment »

I found a handy article for sending growl notifications from Emacs and made some modifications for my particular setup.

Step One: Register Emacs with Growl
We need to register Carbon Emacs with Growl. This is a one time setup task. The following AppleScript script can be run in Script Editor.

tell application "GrowlHelperApp"
	-- Declare a list of notification types
	set the allNotificationsList to {"Emacs Notification"}

	-- Declare list of active notifications.  If some of them
	-- isn't activated, user can do this later via preferences
	set the enabledNotificationsList to {"Emacs Notification"}

	-- Register our application in Growl.
	register as application "Emacs.app" all notifications allNotificationsList default notifications enabledNotificationsList icon of application "Emacs.app"
end tell

Step Two: Create Growl Notification Script
This is a template of the AppleScript that will be used in the Emacs Lisp function. It can be tested by executing it in the Script Editor.

tell application "GrowlHelperApp"
	notify with name "Emacs Notification" title "Emacs alert" description "Message!!!" application name "Emacs.app"
end tell

Step Three: Emacs Lisp Function
Create an Emacs Lisp function that can be invoked with a title and message.

(defun bja-growl-notification (title message &optional sticky)
  "Send a Growl notification"
   (format "tell application "GrowlHelperApp"
              notify with name "Emacs Notification" title "%s" description "%s" application name "Emacs.app" sticky %s
           end tell"
           (replace-regexp-in-string """ "''" message)
           (if sticky "yes" "no"))))

This can be tested with the following function which sends two growl notifications. The “sticky” one requires an explicit dismissal.

(defun bja-growl-test ()
  (bja-growl-notification "Emacs Notification" "This is my message")
  (bja-growl-notification "Emacs Notification" "This is my sticky message" t))

UPDATE: I noticed bja-growl-notification was failing if the message had embedded double quotes, so I added a call to replace-regexp-in-string to replace them with two apostrophes.

UPDATE 2: My main motivation for setting this up was to be able to generate growl notifications from ERC (Emacs IRC client). However, I quickly realized that having a generic reminder that I can easily fire up from within Emacs is very handy. Here’s the code:

(defun bja-growl-timer (minutes message)
  "Issue a Growl notification after specified minutes"
  (interactive (list (read-from-minibuffer "Minutes: " "10")
                     (read-from-minibuffer "Message: " "Reminder") ))
  (run-at-time (* (string-to-number minutes) 60)
               (lambda (minutes, message)
                 (bja-growl-notification "Emacs Reminder" message t))

Written by Brian Adkins

August 6, 2009 at 6:42 am

Posted in programming

Tagged with , ,

Use Ruby to parse NMEA sentences from your GPS

with 2 comments

I recently obtained a mobile broadband device that has a built in GPS receiver and can emit NMEA sentences. My old Garmin portable GPS can emit NMEA also, but it’s a pain to hookup to the laptop. Combining a GPS unit in a mobile broadband device is a great idea.

Update: it appears that the accuracy radius of the wireless card is quite a bit larger than my old Garmin unit. The Garmin is usually between 15 and 30 feet, but the Sierra Wireless 598U ranges from 100 to 1,000 feet or more.

After installing the ruby-serialport gem, I was able to write a simple Ruby program to read GPS information from the device and update a remote file on my web server to allow real time location tracking.

Add a simple server side script to read the file and update an iframed Google Map and you’re all set.

The code is also in the Ruby section of my sample code repository on Github.

sudo gem install ruby-serialport
# Author: Brian Adkins
# Date:   2009/04/08
# Copyright 2009 Brian Adkins - All Rights Reserved
# Ruby program to retrieve and parse GPS information (via NMEA sentences)
# from a Sprint Sierra Wireless 598U device.
# ruby gps-nmea.rb                # prints latititude/longitude info
# ruby gps-nmea.rb update-remote  # scp a file of location info to a remote server
# This program depends on the ruby-serialport gem:
# sudo gem install ruby-serialport
# From: http://www.gpsinformation.org/dale/nmea.htm#GGA
#  $GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47
# Where:
#      GGA          Global Positioning System Fix Data
#      123519       Fix taken at 12:35:19 UTC
#      4807.038,N   Latitude 48 deg 07.038' N
#      01131.000,E  Longitude 11 deg 31.000' E
#      1            Fix quality: 0 = invalid
#                                1 = GPS fix (SPS)
#                                2 = DGPS fix
#                                3 = PPS fix
#              4 = Real Time Kinematic
#              5 = Float RTK
#                                6 = estimated (dead reckoning) (2.3 feature)
#              7 = Manual input mode
#              8 = Simulation mode
#      08           Number of satellites being tracked
#      0.9          Horizontal dilution of position
#      545.4,M      Altitude, Meters, above mean sea level
#      46.9,M       Height of geoid (mean sea level) above WGS84
#                       ellipsoid
#      (empty field) time in seconds since last DGPS update
#      (empty field) DGPS station ID number
#      *47          the checksum data, always begins with *

require 'rubygems'
require 'serialport'

# Emacs macro to reset user modified values (highlight, then: M-x eval-region )
# ((lambda (&optional arg) "Keyboard macro." (interactive "p") (kmacro-exec-ring-item (quote ("USERNAME
372"" 0 "%d")) arg)))

# --- MODIFY THESE -- #
USERNAME   = ""  # Username for remote host
HOSTNAME   = ""  # Remote host name e.g. foo.com
REMOTE_DIR = ""  # Remote directory e.g. /var/www/bar
# --- MODIFY THESE -- #

port_str  = '/dev/cu.sierra05'
baud_rate = 9600
data_bits = 8
stop_bits = 1
parity    = SerialPort::NONE

sp = SerialPort.new(port_str, baud_rate, data_bits, stop_bits, parity)

# lat is of the form 4807.038 where the first 2 digits are degrees and
#   the remainder is minutes.
# dir is either 'N' or 'S'
def convert_lat lat, dir
  degrees = lat[0,2].to_f + (lat[2,lat.length-2].to_f / 60.0)
  dir == 'N' ? degrees : -degrees

# lon is of the form 01131.000 where the first 3 digits are degrees and
#   the remainder is minutes.
# dir is either 'E', or 'W'
def convert_lon lon, dir
  degrees = lon[0,3].to_f + (lon[3,lon.length-2].to_f / 60.0)
  dir == 'E' ? degrees : -degrees

TEMP_PATH = '/tmp'
TEMP_FILE = 'location.txt'

def update_remote_info lat, lon
  File.open("#{TEMP_PATH}/#{TEMP_FILE}", 'w') do |tf|
    tf.puts Time.now.to_s
    tf.puts "#{lat},#{lon}"
  puts 'Updating remote location info'

# 99 requests should be sufficient to find a $GPGGA sentence
99.times do
  if (str = sp.gets) =~ /^$GPGGA/
    fix = str.split(',')
    if fix[6] == '1'
      lat = convert_lat(fix[2], fix[3])
      lon = convert_lon(fix[4], fix[5])
      if ARGV[0] == 'update-remote'
        puts "#{lat}, #{lon}"
      exit 0

puts "Invalid data - GPS coordinates not found"

Written by Brian Adkins

April 8, 2009 at 11:07 am

Posted in programming, technology

Tagged with , , ,