The timeless repository

Sexp for Rubyists

Written by Magnus Holm.

Sorry to interrupt you, but you’ve totally lost me. What is this “Sexp” you’re speaking of? I’ve heard it before, but never quite understood it…

Oh, don’t feel sorry! It’s still quite esoteric for Rubyists.

Yeah…

Okay. Let’s start at the beginning. Lisp!

With all the parenthesis?

Bingo! Have you tried it?

Not really. It seems a little too “hardcore”, if you know what I mean?

Ah, yes. It’s just a neat little language, nothing to be afraid of. Let me show you an example:

(define (plus-five n)
  (+ n 5))

It’s a method that adds five to the argument? Like this:

def plus_five(n)
  n + 5
end

Yep, but more importantly: It’s also an example of S-expressions. Because Sexp is really just a data format.

Data format? Like YAML?

Just like YAML. It has support for numbers, symbols, pairs, lists and nil.

Really?

Let me explain a few more things. You see, Lisp wasn’t originally supposed to be programmed in S-expression. Nope, the idea was to have complex, human-readable M-expressions which should manipulate data in S-expressions.

    Data  =  S-expressions
    Code  =  M-expressions

Another idea was that M-expressions could easily be compiled into S-expressions. So the computer itself would only work with S-expressions, while humans could focus on M-expressions.

    M-expressions  -->   S-expressions

However, the first Lisp implementations only accepted S-expressions since it was easier to start from there and rather add M-expression support later.

So what happend with these M-expressions?

They were “receded into the indefinite future” as the creator, John McCarthy, said. It turned out that S-expressions were readable enough.

Go on, please.

Now Lisp suddenly used S-expressions for both code and data. The interesting thing is how code was represented in Lisp. It’s in fact very simple: The first element in the list is the operator/function and the rest is the operands/arguments:

(+ 1 2)
(plus-five 5)
(factorial 10)

This makes Lisp a very simple language to implement. You need very few primitives before you have a fully working Lisp system. In fact, I would recommend watching James Coglan’s screencast. There isn’t any audio since he used it in a talk he gave, but it’s still fairy interesting.

Scheme interpreter in 15 minutes from James Coglan on Vimeo.

Wow. I must definitely check out Lisp.

Indeed. He’s also written a “real” Lisp implementation in Ruby if you’re interested: Heist. Oh, and I’ll also have to recommend The Little Schemer by Daniel P. Friedman and Matthias Felleisen, it has inspired me in several ways…

I will, but back to Sexp?

No. Let’s talk about AST.

AST? While I certainly appreciate these digressions, (whispers) others might be a little bored, you know?

Don’t worry, this is pretty essential. An Abstract Syntax Tree is a tree representation of source code. Compilers and interpreters never work directly on the source code, but parses it into an AST which is much easier work with. Your Ruby example above could look something like:

[Def,
  plus_five
  [n]
  [Call, n, +, [[Lit, 5]]]]

The basic idea here is that we only deal with nodes like Def, Call and Lit, and every node has a set of children. In this tree the nodes are built like this:

[Def, method_name, arguments, body] # => def method_name(*arguments); body; end
[Lit, number]                       # => (integer)
[Call, receiver, method_name, arguments] # => receiver.method_name(*arguments)

I think I get your point. This is very similar to S-expressions!

(def plus_five (n)
  (call n + 5))

Yes! It turns out that S-expressions are excellent to represent abstract syntax trees. And when compiler gurus talk about S-expressions, they don’t talk about the file format, they talk about the way Lisp represent it’s code: The first element in the list is the operator/function and the rest are the operands/arguments.

So S-expressions themselves aren’t very exciting, it all depends on what you put in them?

Exactly. The Sexp-class basically looks like this:

class Sexp < Array
  def sexp_type
    self[0]
  end
  
  def sexp_body
    self[1..-1]
  end
end

A glorified Array. That’s all there is to it.

And what do people put in these Sexps?

Let me show you an example. First, gem install ruby_parser. Then:

require 'ruby_parser'
require 'pp'

pp RubyParser.new.parse(<<-EOF)
  def plus_five(n)
    n + 5
  end
EOF

Interesting:

s(:defn,
 :plus_five,
 s(:args, :n),
 s(:scope,
   s(:block,
     s(:call, s(:lvar, :n), :+, s(:arglist, s(:lit, 5))))))

This is the syntatic strucure of our code. It’s not possible to reconstruct the original snippet character-by-character; you only get what the compiler/interpreter/virtual machine actually sees. And often that’s exactly the only thing we want.

Aha. Who decided this structure and these names by the way?

Well, I’m not quite certain, but I believe ParseTree by Ryan Davis started it all.

Oh, ParseTree sounds familiar.

It extracts the raw AST that Ruby 1.8 uses internally. It’s not very elegant, so Ryan also wrote UnifiedRuby which cleans it up a bit. RubyParser returns such a cleanup up Sexp.

And what do people use these cleaned up Sexps for?

There’s plenty of projects floating around, here’s some I’ve heard of:

Cool, that’s more than I thought.

And if more people knew about them, I guess there would be even more.

Let me recap:

For Rubyists, Sexp is an Array-ish list where the first element is the operator/function and the rest is the operands/arguments. The arguments can also contain other Sexp. When I think about it, that’s probably the most important thing: They are nested.

I guess it’s mostly useful when we’re actually manipulating code that executes. Like source code or templates?

Exactly!

Ah, excellent. Why were we talking about Sexps again?

Hm.. Now that’s a good question. I simply don’t remember. Let’s talk about it another time, shall we?