Close
Glad You're Ready. Let's Get Started!

Let us know how we can contact you.

Thank you!

We'll respond shortly.

LABS
Treetop: Bringing the Elegance of Ruby to Syntactic Analysis

(This is my proposal for this year’s RubyConf. Fingers crossed!)

Treetop is a parsing framework that brings the elegance and simplicity of Ruby to syntactic analysis. Rather than being just another copy of classic LALR/LR based generators like Lex and Yacc, Treetop blends the unique expressive power of Ruby with cutting edge parsing research. Its “packrat” implementation enables recognition of parsing expression grammars, which dispense with the need for lexical scanning and can take advantage Ruby’s mixin and inheritance model for composition.

I will begin with an overview of recursive descent parsing, explaining why the exponential time complexity of backtracking make naive implementations impractical. Packrat parsers solve this problem with memoization, exchanging heap space for linear time performance. In the process, they obviate the look-ahead techniques that have traditionally been used to reign in the time complexity of parsing.

This opens a new world for grammar definitions.

Without lookahead, a compositionality-destroying tokenization process is no longer necessary. Combined with the embrace of backtracking, this results in a grammar definition language that is more intuitive and free of the hidden obstacles that plague more restrictive conceptual frameworks. In the literature, these definitions are called parsing expression grammars. They are free of ambiguity and explicit operator precedence annotations. They can also be freely composed.

Imagine taking the Ruby grammar and the SQL grammar and combining them to yield a grammar for Ruby that recognizes the syntax of SQL statements embedded in strings. A token-free, compositional approach is the key to this sort of reusability.

The rest of the talk explores the manifestation of these expressive advantages in the Treetop framework. Treetop’s grammar definition language is a superset of Ruby, conservatively extending its syntax with two keywords (grammar and rule), along with inline parsing expressions. It blends parsing expression constructs with Ruby method declarations, which are automatically installed on nodes of the syntax tree produced during the parse. I will introduce these fundamental concepts by example, creating a toy parser that recognizes and evaluates arithmetic expressions. Here is a shortened sample:

module RubyConf
  grammar Arithmetic
    rule additive
      number '+' additive {
        def value
          number.value + additive.value
        end
      }
      /
      number
    end

    rule number
      [1-9][0-9]* {
        def value
          text_value.to_i
        end
      }
    end
  end
end

I will then discuss the system’s implementation. Important issues include:

  • The metagrammar and the system’s metacircular bootstrapping process
  • The code generator and the mapping of parsing expressions to Ruby code
  • Riding on Ruby’s semantics for grammar composition, including the use of mixins and the super keyword.

Time permitting, I will explore more advanced usage by writing a simple interpreter for the untyped lambda calculus atop the framework, which I will extend with facilities for arithmetic by mixing in the grammar from the first example.

Comments
  1. Nick Kallen says:

    Awesome, Sobo! Good Luck!

  2. Wilson Bilkovich says:

    Awesome. I plan to attend this talk.

Post a Comment

Your Information (Name required. Email address will not be displayed with comment.)

* Copy This Password *

* Type Or Paste Password Here *