Planet Stratego

You have a term and it don't look good, who're you gonna call?

January 23, 2012

Karl Trygve Kalleberg

An Eclipse Console for Spoofax

More good news, everyone! I have found time to integrate the command line shell for Spoofax into Eclipse. You can now have all sorts of tricky conversations with the Spoofax interpreter inside a perfectly innocent-looking Eclipse console:

A conversiation with Spoofax

As you can see, there are still rough edges to be ironed out. One of them is clearly the color palette. Another is the lack of inline rules, which are not supported yet. Neither are dynamic rules (and it is doubtful they ever will be — we are currently exploring other opportunities for dealing with context-sensitive rewrite rules).

You can bring up the actual console by using the console page selector in the top right-hand corner of your console, like you do for the other types of consoles:

Console page selector

Remember that you can always report the issues you come across in YellowGrass and tag me (@karltk).

January 23, 2012 17:34

November 02, 2010

Karl Trygve Kalleberg

Hackathon in Delft: Go!

It’s that time of the year again. The glorious month of intensive parser implementation, compiler engineering and language workbenches — the essentials of any IDE — has arrived. I’ve retreated to the TU Delft campus for the month of November to hack on interactive language infrastructure for our startup, and to think big thoughts about IDEs and DSLs in general.

Our startup is a user of both Stratego and Spoofax, so it was only natural to join forces with Eelco Visser and some of his henchmen here in Delft, who are the maintainers of said projects.

For somewhat practical and somewhat nostalgic reasons, I decided to stay at the TU Delft campus. Campus life here already brings back memories of my year living at Cambridgelaan: the access to lightning fast broadband in your dorm room, the four minute walk to the lab, the 24-hour party people (aka Erasmus students), and the freedom to follow your biological clock.

Time for some commits:)

November 02, 2010 19:33

October 06, 2010

Karl Trygve Kalleberg

Prototype of a Scannerless, Generalized Left-to-right Rightmost (SGLR) derivation parser for JavaScript

Need a Scannerless, Generalized Left-to-right Rightmost (SGLR) derivation parser for JavaScript? Then my JSGLR-GWT proof-of-concept might interest you. (If you don’t know what a scannerless GLR parser is, then you are probably not interested in writing modular grammars for your programming language implementation anyway — or you might want to read a few sections down).

A word of warning: I’m not claiming it is written in JavaScript, merely for JavaScript. I simply took our old Java-based SGLR implementation and massaged it quite heavily until GWT could spit out a working version for the JavaScript interpreters.

Performance

The performance is, quite frankly, beyond my wildest expectations. I’ve not made a single attempt at performance tweaking for the JavaScript engines yet, and still I manage to parse through 830 lines of Stratego code in ~3 seconds (Chromium 7.0.519.0). For smaller fragments, even Firefox 3.5 manages to keep up: it manages ~20 lines in ~1s (Chromium does that test in 74ms).

To most people, this probably doesn’t sound very fast at all, but I was much worse off when I initially wrote JSGLR back in 2006 — and that used a term(tree) library that wasn’t hacked together over night. All in all, I think there are plenty of performance tweaking opportunities, and the JavaScript engines are getting faster almost by the minute.

Now, another reason for my optimism, is my use case. I’m interested only interested in JavaScript port of SGLR for interactive use. Think web IDE. My aim is to attain interactive speeds when editing text. A few optimization ideas off the top of my head:

What is a Scannerless GLR parser anyway?

(optional reading)

SGLR does not separate your parser into the usual two steps: (1) lexer/tokenizer/scanner followed by (2) LL- or LR-style parsing of the token stream. Instead, SGLR runs the parser on every character. This is why it’s scannerless.

The next point, GLR, is tricker to explain briefly. Remember that a parser takes a string of text and creates a (parse) tree for it. There might, at some stage during the parsing, be multiple possible trees that might fit the text seen so far. This is called an ambiguity. We don’t yet know what is meant by the text. Only by continue reading, will it (hopefully) become apparent. However, this is where the k comes in. LL(k)/LALR(k) lets you look only k tokens into the future. GLR allows you to look all tokens into the future (for SGLR, I should probably say all characters into the future.).

Typically, LL and LALR-parsers, such as the archaic Yacc and Bison parsers, force you to contort your grammar a lot to meet the k-token lookahead. If you have ever run into the dreaded “shift/reduce” conflict, you know what I’m talking about.

The combination of scannerless and GLR gives us something wonderful: grammars are closed under composition. So, if you write grammars for languages L1 and L2, you can just make a new module L3, include L1 and L2, et voila — you have L3 = L1 ∪ L2.

Trying such a stunt with LALR will almost invariably push you into the “shift/reduce”-conflict corner. Admittedly, there is a significant trade-off or gotcha with the (S)GLR approach. And it is very similar to the trade-offs/gotchas with early vs late binding, or static vs dynamic typing.

If, after processing the entire text, there still are multiple possible parse trees, GLR will give you all of them. So, you catch the ambiguity at runtime. But you can use GLR to say more: the set of grammars you can express with GLR is larger than that of LL(k) and LALR(k).

The trade-off is simple: with LL(k)/LALR(k), you are guaranteed to only have one, unambiguous, tree at the end — but the you might not be able to express the language you want to design. With SGLR, you are free to express the full class of context-free grammars, but the parsing result might be a forest of trees. It’s up to you to prune the forest afterwards, finding the tree you want.

I’m not claiming that SGLR is the best solution for all kinds for parsing needs. With great power comes great responsibility. You might not always need great power, and great responsibilty is usually just a liability, anyway;)

Implementation Details

To interested parties, this is what I did:

October 06, 2010 17:57

October 04, 2010

Eelco Visser

Static Consistency Checking of Web Applications with WebDSL

Zef Hemel, Danny M. Groenewegen, Lennart C. L. Kats, Eelco Visser. Static Consistency Checking of Web Applications with WebDSL. Journal of Symbolic Computation, 2010 [doi:10.1016/j.jsc.2010.08.006] [researchr (pre-print)]

Abstract Modern web application development frameworks provide web application developers with high- level abstractions to improve their productivity. However, their support for static verification of applications is limited. Inconsistencies in an application are often not detected statically, but appear as errors at run-time. The reports about these errors are often obscure and hard to trace back to the source of the inconsistency. A major part of this inadequate consistency checking can be traced back to the lack of linguistic integration of these frameworks. Parts of an applications are defined with separate domain-specific languages, which are not checked for consistency with the rest of the application. Examples include regular expressions, query languages and XML- based languages for definition of user interfaces. We give an overview and analysis of typical problems arising in development with frameworks for web application development, with Ruby on Rails, Lift and Seam as representatives.

To remedy these problems, in this paper, we argue that domain-specific languages should be designed from the ground up with static verification and cross-aspect consistency checking in mind, providing linguistic integration of domain-specific sub-languages. We show how this approach is applied in the design of WebDSL, a domain-specific language for web applications, by examining how its compiler detects inconsistencies not caught by web frameworks, providing accurate and clear error messages. Furthermore, we show how this consistency analysis can be expressed with a declarative rule-based approach using the Stratego transformation language.

October 04, 2010 06:36

September 12, 2010

Karl Trygve Kalleberg

Great things in the Eclipse JDT

Since we started with the KolibriFX project six months ago, I have programmed a lot of Scala and Java, plus some Stratego and JavaScript. I love programming in Scala and Stratego because my implementations are shorter and usually a lot easier to reason about. I often get this warm, fuzzy feeling of having captured something tricky in an elegant fashion.

However, in day-to-day programming, Java has one killer feature that makes up for a lot of cruft in the language: the Eclipse JDT. I recently spent a few days rewriting some of our Scala code to Java, and that really brought the point home. I find that, at least for our code-base, the JDT offers productivity benefits that far outweigh the linguistic disadvantage Java suffers compared to Scala.

The points that follow will be an old hat to programmers familiar with the JDT, though they might serve as a reminder that for all its flaws, the JDT is a rather great tool, after all.

Realtime Compilation

This is probably the biggest tool-provided productivity booster I know of. I’ve always been a huge fan of realtime compilation, because it encourages flow. It brings me into “the zone”, “the flow” or whatever you prefer to call it. Whenever I have to wait for the compiler for more than a few seconds, I lose my flow and switch to another desktop “to remain productive”. This invariably forces an expensive mental context switch, and ultimately reduces my productivity, but without reducing my perception of productivity — I’m still doing stuff, so I feel I’m moving forward, except I’m not.

Realtime compiler error messages and warnings, allow me to mindlessly work through the live updated list until it’s empty. At that point, I run my unit tests and continue flowing until they’re all green. I get to decide how quickly I work. The computer (nearly) always follows my pace, so I’m the performance bottleneck. It makes me feel in control. I easily enter the flow, and I stay there. One thing I use as a flow marker is how often I watch the time. With Scala, 5 minutes rarely go by before I check it, since I’m waiting for the compiler to churn through. With the JDT, an hour may pass, easily.

This is not a bash on Scala. Stratego has the same problem, but we’re working on an incremental compiler to improve the problem there. I predict that both Scala and Stratego will eventually have compilers working at interactive speeds, and the same degree of flow will be attained as with the JDT compiler.

Quick Fixes

Beyond realtime compilation, the JDT offers additional inspiration for up-and-coming languages: its refactorings and quick fixes make life with Java easy and nearly enjoyable. Based on my usage the last week, these are my favourite quick fixes (admittedly, this list changes from week to week, based on the nature of the programming tasks):

Assign to Field

class X {
   X(int y|) {
   }
 }

With the cursor placed after y, quick fix suggests the creation of a new field, private final int y, and also adds the line this.y = y to the constructor body.

Extract to Separate File

File X.java:

class X { ... }
class Y| { ... }

With the cursor after Y, quick fix suggests that the class Y be moved to a separate file, Y.java, and it deals with all the imports at the top of both X.java and Y.java.

Generate Getters and Setters

class X {
   private int x;
   private int y;
}

Creates getX/setX and getY/setY. You can control if you want both get and set on a per-variable basis.

Change Visibility to ‘x’.

File foo/X.java:

package foo;
class X {
   Horoscope computeHoroscope();
}

File bar/Y.java:

package bar;
class Y {
   ...
   x.computeHoroscope()|;
}

If you are in another package and invoke, x.computeHoroscope(), quick fix tells you that friendly access is not possible, and suggests adding public to the computeHoroscope() method declaration.

Add/remove Argument to Match.

File IntPair.java:

class IntPair {
   Pair(int x) {
   }
}

File Main.java:

IntPair x = new IntPair(10, 100)|;

The usage doesn’t match any of the known constructors of IntPair. Quick fix is going to suggest that another argument is added to IntPair#IntPair(int), so that it becomes IntPair#IntPair(int,int).

Change type of ‘x’ to match expression.

String currentTime = new Set>();

The type mismatch prompts quick fix to suggest replacing String with Set> for the variable currentTime. You may also sometimes get away with

currentTime| = new Set>();

and ask quick fix to create new local variable with the cursor placed near currentTime.

Infer generic arguments

Map map = new HashMap()|;

Here, quick fix can compute the correct generic arguments for Map ().

Import Type ‘x’.

new HashMap

Quick fix may be used to quickly import java.util.HashMap. It allows you to stop fussing about the endless imports lists, and it actually works flawlessly 99% of the time (sometimes, it may insert the import in a funny place if your code is not parseable at the time you ask for an import).

Wrapping Up

Most of the above features are great when writing fresh code. However, they’re even better when you need to change code — especially the realtime compilation feature. Refactorings that may take hours in Stratego or Scala are usually done in minutes in Java, thanks to the IDE support. Imagine where we may go with that kind of IDE support for the newer languages…

September 12, 2010 21:50

August 14, 2010

Karl Trygve Kalleberg

"Parsing is considered a solved problem. Unfortunately, this view is naïve, rooted in the widely..."

“Parsing is considered a solved problem. Unfortunately, this view is naïve, rooted in the widely believed myth that programming languages exist.”

- Al Bessey, Ken Block, Ben Chelf, Andy Chou, Bryan Fulton, Seth Hallem, Charles Henri-Gros, Asya Kamsky, Scott McPeak, Dawson Engler, in A Few Billion Lines of Code Later: Using Static Analysis to Find Bugs in the Real World.

August 14, 2010 10:43

July 08, 2010

Eelco Visser

TOOLS 2010 Portraits

I finally got back into the photo processing game by mastering selection with Adobe Bridge; first make a selection of photos and then only process the ones with highest rank; separation of duties. I shot some photos at last week's TOOLS 2010 conference, mostly during the dinners, since I was busy at other times. I uploaded the selection to a flickr set; here is a selection from the selection:

Jean Bezivin

Bertrand Meyer

Michael Ernst

Florian Heidenreich

Dennis Wagelaar

July 08, 2010 07:37

July 01, 2010

Eelco Visser

Lambdas in Spoofax

Today I've been participating in the Transformation Tool Contest 2010, which is held as part of the TOOLS 2010 conference in Malaga. The plan for my participation was the use of researchr for use in the evaluation of the results of the contest. Then it turned out that the example problem for the live part of the contest was evaluation of lambda terms. An excellent problem to tackle with Spoofax. So after doing some last improvements of researchr, I spent the afternoon creating a little lambda language environment.

The source code of the project is part of the Spoofax contributions directory.

Here is the Eclipse editor:

In the figure example.lam is the input for the transformation in concrete syntax; example.aterm is the abstract syntax for example.lam; example.nf.aterm is the result of normalizing example.aterm; finally, example.nf.lam is the result of pretty-printing the normal form. The result should be the correct addition of the lambda numeral for 2 with itself.

The syntax of the language is defined in SDF:
The Lambda.sdf module contains the definition of the concrete syntax. The Lambda.str module is automatically derived from Lambda.sdf and defines the abstract syntax (as an algebraic signature in Stratego).

The transformation is defined in Stratego. Here are the rules for pretty-printing and beta reduction:

Renaming rules are used to introduce the bindings of bound variables. The check rule verifies that no two definitions for the same name are given. The warning rule finds free variables and gives a warning in the editor.

Here's a screencast with a little demo to give an impression of the environment:

download

July 01, 2010 15:12

June 29, 2010

Eelco Visser

mobl

In my keynote talk at the Model-Driven Service Engineering workshop in Malaga (co-located with ICMT and TOOLS) I gave a presentation of mobl, the domain-specific language for mobile (web) application development that we (Zef Hemel) are developing.

June 29, 2010 14:58

June 21, 2010

Eelco Visser

Pure and Declarative Syntax Definition: Paradise Lost and Regained

"In the beginning were the words, and the words were trees, and the trees were words. All words were made through grammars, and without grammars was not any word made that was made. Those were the days of the garden of Eden. And there where language engineers strolling through the garden. They made languages which were sets of words by making grammars full of beauty. And with these grammars, they turned words into trees and trees into words. And the trees were natural, and pure, and beautiful, as were the grammars.

Among them were software engineers who made software as the language engineers made languages. And they dwelt with them and they were one people. The language engineers were software engineers and the software engineers were language engineers. And the language engineers made language software. They made recognizers to know words, and generators to make words, and parsers to turn words into trees, and formatters to turn trees into words.

But the software they made was not as natural, and pure, and beautiful as the grammars they made. So they made software to make language software and began to make language software by making syntax definitions. And the syntax definitions were grammars and grammars were syntax definitions. With their software, they turned syntax definitions into language software. And the syntax definitions were language software and language software were syntax definitions. And the syntax definitions were natural, and pure, and beautiful, as were the grammars."

Thus starts our essay on declarative syntax definition, which has been accepted for presentation at the Onward! 2010 conference:
Lennart C. L. Kats, Eelco Visser, Guido Wachsmuth. Pure and Declarative Syntax Definition: Paradise Lost and Regained. In Proceedings of Onward! 2010. ACM, 2010. (draft of final version)

Abstract Syntax definitions are pervasive in modern software systems, and serve as the basis for language processing tools like parsers and compilers. Mainstream parser generators pose restrictions on syntax definitions that follow from their implementation algorithm. They hamper evolution, maintainability, and compositionality of syntax definitions. The pureness and declarativity of syntax definitions is lost. We analyze how these problems arise for different aspects of syntax definitions, discuss their consequences for language engineers, and show how the pure and declarative nature of syntax definitions can be regained.

June 21, 2010 18:55

June 18, 2010

Eelco Visser

Building DSLs with Spoofax Language Workbench

The slides of my keynote talk at Code Generation 2010 with an appendix about model-to-model transformation and transformation strategies.

June 18, 2010 10:38

June 13, 2010

Eelco Visser

Strategies for Design & Implementation of Domain-Specific Languages

In the fifth lecture in the Model-Driven Software Development course at TU Delft I talked about strategies for design and implementation of domain-specific languages, illustrated with a sketch of the mapping from WebDSL to Java.

June 13, 2010 09:59

June 11, 2010

Eelco Visser

Encapsulating Software Platform Logic by Aspect-Oriented Programming

The Stratego transformation language was originally developed on and for the C/Linux platform. A couple of years ago we started porting the language and the Stratego/XT infrastructure to Java, which eventually enabled us to build the Spoofax Language Workbench. The back-end of the Stratego compiler was fairly unproblematic; the back-end for C is a bit over 1000 lines of code translating the compact Stratego Core language. Porting this was a matter of replacing the translation scheme. However, the extensive library, partly consisting on primitives implemented in C, depended on the Linux platform. To adapt the library to the Java platform, we extended Stratego with a simple aspect mechanism. Using override and proceed stragey definitions from libraries can be replaced with new definitions. The aspect extension and its application for achieving portability is discussed in the following paper (which has been accepted for the SCAM 2010 conference):

Lennart C. L. Kats, Eelco Visser. Encapsulating Software Platform Logic by Aspect-Oriented Programming: A Case Study in Using Aspects for Language Portability. In Cristina Marinescu, Jurgen J. Vinju, editors, Proceedings of the Tenth IEEE International Working Conference on Source Code Analysis and Manipulation 2010. 2010.

Abstract: Software platforms such as the Java Virtual Machine or the CLR .NET virtual machine have their own ecosystem of a core programming language or instruction set, libraries, and developer community. Programming languages can target multiple software platforms to increase interoperability or to boost performance. Introducing a new compiler backend for a language is the first step towards targeting a new platform, translating the language to the platform's language or instruction set. Programs written in modern languages generally make extensive use of APIs, based on the runtime system of the software platform, introducing additional portability concerns. They may use APIs that are implemented by platform-specific libraries. Libraries may perform platform-specific operations, make direct native calls, or make assumptions about performance characteristics of operations or about the file system. This paper proposes to use aspect weaving to invasively adapt programs and libraries to address such portability concerns, and identifies four classes of aspects for this purpose. We evaluate this approach through a case study where we retarget the Stratego program transformation language towards the Java Virtual Machine.

June 11, 2010 13:18

June 09, 2010

Eelco Visser

Web Abstractions

The third and fourth lectures in the course on model-driven software development that I teach at TU Delft. I talked about domain-specific abstractions in the domain of web programming, as incarnated in the design of WebDSL.

The material is based on the following publications:

June 09, 2010 23:15

June 08, 2010

Eelco Visser

Code Generation by Model Transformation

The journal version of our ICMT 2008 paper has finally and officially been published in the Software and Systems Modeling journal.

Zef Hemel, Lennart C. L. Kats, Danny M. Groenewegen, Eelco Visser. Code generation by model transformation: a case study in transformation modularity. Software and Systems Modeling, 9(3):375-402, June 2010.

The big contribution with respect to the earlier paper is a new approach to type checking with Stratego. Instead of defining a type checker as a single traversal that takes care of name resolution, type analysis, error checking and generation of error messages, these aspects are defined separately and can also be used separately. This allows a much cleaner style for defining type checkers and it enables a smooth combination of type analysis and normalizing transformations. The style also turned out to work very well in Spoofax setting where analyses such as name resolution are used for multiple purposes in the IDE.

The paper is published under the open access regime, so you can read it even without subscription to the journal.

Abstract The realization of model-driven software development requires effective techniques for implementing code generators for domain-specific languages. This paper identifies techniques for improving separation of concerns in the implementation of generators. The core technique is code generation by model transformation, that is, the generation of a structured representation (model) of the target program instead of plain text. This approach enables the transformation of code after generation, which in turn enables the extension of the target language with features that allow better modularity in code generation rules. The technique can also be applied to ‘internal code generation’ for the translation of high-level extensions of a DSL to lower-level constructs within the same DSL using model-to-model transformations. This paper refines our earlier description of code generation by model transformation with an improved architecture for the composition of model-to-model normalization rules, solving the problem of combining type analysis and transformation. Instead of coarse-grained stages that alternate between normalization and type analysis, we have developed a new style of type analysis that can be integrated with normalizing transformations in a fine-grained manner. The normalization strategy has a simple extension interface and integrates non-local, context-sensitive transformation rules. We have applied the techniques in a realistic case study of domain-specific language engineering, i.e. the code generator for WebDSL, using Stratego, a high-level transformation language that integrates model-to-model, model-to-code, and code-to-code transformations.

June 08, 2010 11:39

May 31, 2010

Eelco Visser

Domain Analysis & Data Modeling

Second lecture in the course on model-driven software development that I teach at TU Delft. I talked about domain analysis, interaction design, and domain-driven design.

May 31, 2010 20:30

May 28, 2010

Eelco Visser

Spoofax 0.5

We're pleased to announce the 0.5 release of the Spoofax language workbench, an Eclipse plugin that seamlessly integrates Java versions of Stratego and SDF into Eclipse. Spoofax can be used to develop new languages and transformations based on SDF and Stratego in the Eclipse environment.

Stratego and SDF have traditonally been implemented using C, but to increase portability we have developed Java versions of the Stratego compiler and the JSGLR parser for SDF. These new implementations are seamlessly integrated into the Spoofax environment, but can also be used as stand-alone tools.

IDE support has become essential for developers to be productive with programming languages. Spoofax provides IDE support for Stratego and SDF for developers of languages and transformations. It also aids in the development of IDE support for new languages: from the first version of an !SDF grammar, an editor can be created for the language and used side-by-side with the definition in Eclipse. Using Stratego, the editor can be enhanced with transformations and semantic editor services such as reference resolving and content completion. The screenshot below illustrates some of the IDE features supported by editors created with Spoofax:
Spoofax editor features

Spoofax can be downloaded from spoofax.org or strategoxt.org/Spoofax. When installed in Eclipse, the plugin provides a "New project" wizard that creates a new skeleton project illustrating some of the Spoofax features. The website also includes a tour further showcasing the features of the workbench. For migrating C-based Stratego projects to Spoofax, please read our FAQ or contact us in case of other questions.

An overview of the architecture of Spoofax and how Spoofax can be used in the development of new languages and IDE services is given in the paper The Spoofax Language Workbench. Rules for Declarative Specification of Languages and IDEs by Lennart Kats and Eelco Visser, accepted for publication at SPLASH/OOPSLA 2010. Further documentation can be found on the Spoofax website.

May 28, 2010 21:41

May 03, 2010

Eelco Visser

Performing Systematic Literature Reviews with Researchr

Eelco Visser. Performing Systematic Literature Reviews with Researchr: Tool Demonstration. Technical Report TUD-SERG-2010-010, Software Engineering Research Group, Delft University of Technology, Delft, The Netherlands, May 2010. [researchr]

In this paper, I describe the integrated workflow for performing systematic reviews with researchr, a web application for management of bibliographic data. Researchr semantically links publications to authors, journals, proceedings, and conferences, supporting reliable browsing. Publications can be classified using public (shared) tags. Researchr has over a million publication records, mainly in computer science. The core of the collection is based on the DBLP database (as provided via its XML export), but is extended with contributions from users. Researchr is open for contributions; users can contribute missing publications and can make corrections to publication records in the database and add missing information such as abstracts and citations. The quality of such modifications is guarded by a reputation system. Users can use researchr to provide a profile of their research with publications. More importantly, the site supports literature reviews, by creating bibiographies, collections of publications about a topic of choice. In this paper I describe the elements of the systematic reviewing workflow in researchr. Performing a systematic review consists of creating a bibliography, defining and executing a search strategy, defining classification schemes, and reviewing and classifying papers.

May 03, 2010 13:13

April 30, 2010

Eelco Visser

Model-Driven Software Development: Introduction

I'll be posting my slides from a course on model-driven software development that I'm teaching at Delft University of Technology. Here's the first, introducing the course.

April 30, 2010 19:35

April 12, 2010

Karl Trygve Kalleberg

English as a Programming Language, part I

Together with Einar, I’ve been deeply immersed in domain analysis for our upcoming trading language for the past month — i.e., analyzing the domain of trading. One of the crucial things we’ve been focusing on, is the rich vocabulary you are likely to hear when listening to traders working together. Their vocabulary captures how trading ideas are transmitted between humans. As the philosopher Gadamer might have argued, we think in words. And traders have invested time and effort into learning “the trading words”, because with language comes the knowledge and vice versa.

We want to capitalize on their investment by exposing the raw computing power of modern computers using a familiar language. If our target group were mathematicians, the language and notation of choice would be + - / * f(x), and all the other weird symbols we learned in school (and many we didn’t learn). If our target group were programmers, formal languages like Java or C++ would be an good starting point for inventing a new variant for trading.

But our future users are traders first, then a little bit of mathematician, then programmers as a distant third. With this in mind, we have done some serious consideration into using natural language as a basis for expressing trading rules. As part of this consideration, I did a (perhaps too short, but very enlightening) survey of the state of the art of using English as a programming language. I’m sharing my findings here, since they might be of broader interest.

Background

Let me give you a very concrete idea of what we want: Our motivation for using English as a programming language is to be able to support trading rules written almost exactly like regular English sentences, e.g.

if seeing more than 3 news headlines
   from distinct sources
   containing 'Hurricane' and 'US'
   within 1 minute:
    flatten positions

Again, our users (c.f. Takuya) are domain experts, but not programmers. They will mostly likely be reading code more often than they write it. They will not be coding every day, only now and again. They’re usually not interested in spending all that much time getting into that specific frame of mind for solving complex problems that regular programmers love so much (you know what I mean only too well, admit it).

As anybody will tell you, the English language is rather unbounded and pretty rife with ambiguities. I’ve come across a wide range of efforts to manage this problem, and I’ll share some highlights from the various efforts here.

Natural Language Programming

One of the major selling points of the Natural Language Programming technique is the ability to write technical documents shared by computers and humans. The technique exposes to its user an iterative process whereby rules for turning sentences into code are formulated. This multi-step process goes as follows:

  1. Define your taxonomy of concepts — i.e. names of interest in your domain. In our case: prices, instruments, pips, pricing, trends, support, resistance, etc.
  2. Define top-level sentences using said concepts, e.g. define support if bottom is at same level within X minutes.
  3. Define top-level sentences in terms of sequences of other sentences which also include said concepts.
  4. When you reach the most primitive sentences, you define a mapping from these to code in an (any) existing programming language, somewhat like a template system.
  5. Repeat until there is a mapping for all paths from the top-level sentences to code.

The example provided at Wikipedia is more detailed:

If  U_ is 'smc01-control',  then do the following.
Define surface weights Alpha as "[0.5, 0.5]". Initialise matrix Phi 
   as a 'unit matrix'.
Define J as the 'inertia matrix' of Spc01.
Compute matrix J2 as the inverse of J.
Compute position velocity error Ve and angular velocity error Oe 
   from dynamical state X, guidance reference Xnow.
Define the joint sliding surface G2 from the position velocity error 
   Ve and angular velocity error Oe using the surface weights Alpha.
Compute the smoothed sign function SG2 from the joint sliding 
   surface G2 with sign threshold 0.01.
Compute special dynamical force F from dynamical state  X and 
   surface weights Alpha.
Compute control torque T and control force U from matrix J2, surface 
   weights Alpha, special dynamical force F , smoothed sign function SG2.
Finish conditional actions.

The technique is supported by a system called System English. There is a page listing a few more examples, all of them in the style of the example above.

I must admit that this is a rather intriguing technique. My present reservation is that all the examples are rather verbose, and a dreary read. The documents (i.e., programs) are very repetitive, with a lot of sentences starting with either define or compute. This might be perfectly fine for technical documentation. I’ll not make any hasty judgments about its applicability to trading before actually having tested it for our domain, though.

Maude

It might come as a surprise to find Maude on this list, but let me explain why I included it. Maude is a language for equational and rewriting logic specification and programming. That sounds scary, but let’s focus on the parts of Maude which allow you build sentence fragments, the operators.

op add _ to _  : Num List -> List

This defines the operator (sentence fragment) add to. Assuming list is a list, the definition allows you to write

add 1 to list

which (provided that the add to operator does its job correctly), will add 1 to a list of numbers.

Since you are free to define any operator, you can also start defining your own mathematical operators

op _ + _ : Num Num -> Num
op _ * _ : Num Num -> Num

This allows you to write expressions a * b + c, as you might expect. There is one crucial point here, however. The precedence of the operators are not fixed. You are free to define these yourself. This means that the expression just given, will be ambiguous. It’s not (yet) defined whether it should be taken to mean (a * b) + c or a * (b + c). As you might guess, these are wrinkles that are rather important to iron out.

Maude comes to the rescue with a precedence annotation:

op _ + _ : Num Num -> Num [prec 35]
op _ * _ : Num Num -> Num [prec 25]

Think of this as a little bit of grammatic chemistry: the operator with the lower precedence will bind stronger to its arguments. With the precedence annotation in place, the expression a * b + c is now always interpreted as (a * b) + c, because * binds stronger than ‘+’.

The operator + brings another ambiguity to the table: associativity. When writing a + b + c, should it be taken to mean (a + b) + c or a + (b + c). The precedence cannot help us, since it will always be the same for the two “competing” pluses (plus competing against itself). As it happens, for the + operator, it really doesn’t matter. Both alternatives are the same. In mathematics, an operator with this property is called associative. Maude provides the [assoc] annotation for marking an operator with this property. Commutativity (that x op y is the same as y op x) is marked by [comm].

However, there may be cases with an operator is not associative, such as the divide operator. The Maude 2.0 Primer suggests that vs is also such an operator:

op _ vs _ Player Player -> Player [comm]

Let’s assume we’re back in the early 90s and that the following battle is at hand: Chun-Li vs Ryu vs Ken. If we read this is as (Chun-Li vs Ryu) vs Ken the winner of Chun-Li and Ryu will face Ken. In the interpretation Chun-Li vs (Ryu vs Ken), Chun-Li will face the winner of Ken and Ryu. Clearly not the same meaning.

Again, Maude comes to the rescue with an annotation: gather. Gather takes a number of flags equal to the number of arguments for your operator (in this case two). Each gather flag tells us something about the precedence of the operator found at  argument. A flag may be either of e (strictly less), E (less or equal), & (any). Confuzzled? Let’s try it out.

op _ vs _ Player Player -> Player [comm] [prec 33] [gather(e E)]

This defines that the first argument to vs must be of precedence strictly less than that of vs. Since the precedence of vs can never be strictly less than itself; (Chun-Li vs Ryu) vs Ken is therefore not allowed — the only valid interpretation left is Chun-Li vs (Ryu vs Ken). With that ambiguity out of the way, we can proceed to the real action. Fight!

Clearly, these mechanisms make Maude very expressive. In practice, Maude allows programmers to fairly freely define their own language with a rather natural looking syntax for any library they might want to design. The downside is that mastering Maude’s disambiguation mechanics is an art unto its own.

Action Semantics

Another little language with a very English-like syntax I’ve come across in previous travels is that of Peter D. Mosses’ formalism for defining semantics for programming languages. I.e., a “programming language” for defining the meaning of programming languages. The following, which tells how the identifier I is to be evaluated, should provide some flavor:

evaluate I = give the expressible bound to I 
             or 
             enact the abstraction bound to I

While not exactly in widespread use, it has been used to define semantics for some non-trivial languages, such as the dynamic semantics for Pascal.

Mosses and colleagues have written research tools to fully generate compilers based on language definitions in the action semantics formalism. The value proposition of Action Semantics is pretty clear: it allows you define your programming language at a high level, and then generate language tools such as interpreters and compilers (almost) for free — provided your language fits into the AS formalism.

Metafor

Metafor was a research prototype of a user interface for visualizing interactively typed stories as code (not available for general consumption). The idea was that users type English sentences describing their domain. Metafor then continually analyzes the typed story and maintains a “Python-view”. The Python view contains the structure of the code.

For example, when the user types the code:

There is a bar with a bartender who makes drinks.

Metafor would generate the following code:

class bar:
    the_bartender = bartender()
class bartender:
    def make(drink): pass

The “translation” from English to code is done using a slew of natural language processing tricks, including a natural language parser and a large database of  English “common sense” sentences from which “knowledge” about the various words is extracted.

As far as I know, the system was only ever able to generate the outline of your program. The larger, and perhaps more interesting, problem of writing program logic in English is still left as an exercise to the reader.

An improved version of Metafor be useful for doing structural/architectural design of code and classes, but the project seems discontinued. Fortunately, some of the appealing properties of Metafor are available in the Inform-series of adventure game/interactive fiction-creators.

Inform 7

The Inform system might be seen as a continuation of the good old Infocom-style of games, but on steroids and done really well. Inform is a design system for interactive fiction based on natural language. It comes with an “IDE” for developing interactive fiction, or text-based adventures, if you will.

You can easily define locations and entities:

The wood-slatted crate is in the Gazebo.
The crate is a container.
Mr Jones wears a top hat.
The crate contains a croquet mallet.

In the nomenclature of Inform, these are assertions. A number of built-in verbs are used to state assertions: to beto haveto carryto wearto contain and to support. It is possible for users to extend this set of verbs.

Based on the above sentences, the system will create the objects ‘crate’, ‘Mr Jones’ and ‘croquet mallet’. It will even place the croquet mallet inside the crate, so that the player must open it to find the mallet.

All in all, the system seems very well crafted and is the most impressive example of natural language programming I have seen to date. Again, I have no practical experience using it, so I cannot offer any advice on how expressive such and approach be might be for areas other than interactive fiction.

AppleScript

Perhaps the English-like language with the most widespread use, is that of AppleScript, as found on all Apple computers today. AppleScript was designed in the early 90s and is a descendent of the HyperCard hype of the late 80s. According to its creators, AppleScript is aimed at “casual programmers”. This includes programmers of all experience levels who are primarily out to solve problems, not write programs.

A simple AppleScript program for manipulating Excel:

tell application ‘‘ Excel ’’ on machine x
    put 3.14 into cell 1 of row 2 of window 1
end

Another script for doing a bit of cut-n-pasting:

copy the name of the first window of application ‘‘ Excel ’’
to the end of the first paragraph of app ‘‘ Scriptable Text Editor ’’

I came across this excellent report: AppleScript. William R. Cook, University of Texas at Austin With contributions from Warren Harris, Kurt Piersol, Dave Curbow, Donn Denman, Edmund Lai, Ron Lichty, Larry Tesler, Mitchell Gass & Donald Olson September 29, 2006.

This report is a really interesting read to anyone who considers using English as a programming language. To set the stage, let’s start with their conclusion:

Many of the current problems in AppleScript can be traced to the use of syntax based on natural language; however, the ability to create pluggable dialects may provide a solution in the future, by creating a new syntax based on conventional programming languages.

The English-like syntax, initially touted as a great way to capture the minds and souls of non-programmer and programmer alike, turned out to be the major Achilles heel of the language. Hmmm.

In the discussion, the authors go into a bit more detail:

The experiment in designing a language that resembled natural languages (English and Japanese) was not successful. It was assume that scripts should be presented in “natural language” so that average people could read and write them. […] In the end the syntactic variations and flexibility did more to confuse programmers than to help them out. The main problem is that AppleScript only appears to be a natural language. In fact is an artificial language, like any other programming language. […] It is easy to read AppleScript, but quite hard to write it.

When writing programs or scripts, users prefer a more conventional programming language structure. Later versions of AppleScript dropped support for dialects. In hindsight, we believe that AppleScript should have adopted the Professional Dialect that was developed but never shipped.

So, the main problem seems to stem from AppleScript not providing “English as we know it”. It has a surface syntax which reads pretty much like English. However, when trying express yourself in AppleScript, most of the grammar rules, idioms, proverbs and personal style you are accustomed to, is simply not there. Instead, you are forced to mould your ideas into a sort of Pidgin English, the AppleScript English, which doesn’t really behave like you expect. And the worst thing programmers know is when things don’t behave as they expect.

In retrospect, this might not come as a surprise. Many, if not most, programming languages are designed to be somewhat syntactically and grammatically minimal. There is often only one, or at most a few, ways of stating the basic things. Based on a set of core language constructs, you can build successively larger programs. But the set of core constructs behave rather simply and predictably. (Yes, Perl is a rather famous example of the opposite.)

The Osmosian Order

Osmosian Order is not as much a project as it appears to be a movement. According to their manifesto:

The Osmosian Order of Plain English Programmers is a group of like-minded developers and educators dedicated to the rescue of computer science from the pervasive fog of confusion engulfing it today.

They even have a plan:

Our initial goal is to see Plain English (and other natural-language variants, such as Plain Spanish and Plain German) adopted as de facto standard languages.

To that end, they have (allegedly) written a compiler and development environment for programming using Plain English. An example Plan English program is given below:

To create the background:
    Draw the screen's box with the white color.
    Loop.
    Pick a spot anywhere in the screen's box.
    Pick a color between the lightest gray color and the white color.
    Dab the color on the spot.
    If a counter is past 80000, break.
    If the counter is evenly divisible by 1000, refresh the screen.
    Repeat.
    Extract the background given the screen's box. 
       \or Create the background from the screen.
    Or something.

To finish a work:
    If the work is nil, exit.
    If the work is finished, exit.
    Create a picture given the work's URL.
    If the picture is nil, exit.
    Resize the picture to 5-1/2 inches by 5-1/2 inches.
    Center the picture in the screen's box.
    Draw the background.
    Draw the picture.
    Loop.
    Pick a spot anywhere near the picture's box.
    Mix a color given the spot.
    Dab the color on the spot.
    If a counter is past 20000, break.
    Repeat.
    Extract the work's painting given the picture's box.
    Destroy the picture.

Reputedly, The Order provides a fast compiler which is able to turn the stylized English above into an executable Windows program. The compiler isn’t freely downloadable, but their web page says you can get it by e-mail.

Tentative conclusion

Using English as a programming language is definitely not a no-brainer. There are some hard trade-offs to be juggled.

My present thinking is that if the price of predictability is the cost of learning a new language, people with extensive scripting needs will pay it gladly. If this cost is kept low by providing a set of core language constructs already known from other programming languages, getting started with scripting becomes trivial. While the ultimate goal will always be do-what-I-mean (and not as I think or say) computers, an intermediate step seems to be where most computer literates have a working knowledge of scripting. As long as most scripting languages resemble each other, it’s sort of like knowing any one of the Scandinavian languages — you can easily get by in all of Scandinavia knowing just one of them.

April 12, 2010 08:45

March 15, 2010

Eelco Visser

Talk about the Researchr Digital Library

At the recent IFIP WG 2.11 meeting in St Andrews, I gave a talk about researchr, the digital library application I have been working on in the past year. Here is the recording of that talk, which gives an introduction to the features of the tool.

Download: 768x576

March 15, 2010 11:04

March 14, 2010

Eelco Visser

Spoofax: The Language Workbench

At the recent IFIP WG 2.11 meeting in St Andrews, I gave a demonstration talk about the Spoofax language workbench. I made a recording. The talk is, eh, a little, eh, rough, but should be useful to give an impression of the basic features of Spoofax.

Download: 512x384, 768x576, 1024x768

March 14, 2010 22:20

March 13, 2010

Eelco Visser

The Unbearable Fragility of Software Configuration

Issue 52 of the Spoofax project in YellowGrass is a perfect illustration of the fragility of software configuration; the presence or absence of a line break makes the difference between a working and a failing system. What is worse is that the error is very hard to detect and requires a hunch from an expert. If the eclipse.ini file would have a proper syntax and static semantic constraints, the error could be caught right on load time, or preferably when editing the file in the first place. Configuration files are models in a domain-specific language that should be treated just like the classes of a Java program.

March 13, 2010 17:44

February 27, 2010

Eelco Visser

Publication Clouds

Previously researchr supported 'tag cloud' views for bibliographies providing a quick overview of the important and lesser important topics in a set of publications collected in a bibliography. Similarly, an 'author cloud' gives an overview of the authors of a bibliograpy, emphasizing authors with many papers.

Now clouds are not just provided for tags and authors, but also for year, publication type, and publication venue. Furthermore, cloud views are provided for all 'publication lists' in researchr: author profile, alias page (publications authored by author with same name), bibliography, tag, and also for the publications of the editions of a conference. The latter provides a nice overview of the community and topics of a conference series.

February 27, 2010 21:01

February 20, 2010

Eelco Visser

Coloring Language Aspects

I was looking for a way to illustrate the integration of different language aspects in WebDSL. After tedious experimentation with graphical editors to highlight the parts of a program, it dawned on me that the syntax coloring configuration of the Spoofax editor for WebDSL is the perfect tool for the job. Spoofax provides DSL for syntactic editor services, such as syntax coloring. The Spoofax generator, generates default configurations for the various services. For instance, for syntax coloring the following settings (left) are generated. To illustrate the role of data models, user interface templates, and expressions/actions I added the following customization of the default (right):

The result can be seen in the following fragments consisting of an entity declaration (data model) and a page definition with form and action:

I find it hard to choose good colors (e.g. yellow provides bad contrast on white), but this example nicely show how Spoofax allows assigning colors to syntactic categories, not just tokens.

February 20, 2010 20:17

February 19, 2010

Eelco Visser

WebDSL Applications

February 19, 2010 09:48

February 12, 2010

Eelco Visser

Building IDEs for Domain-Specific Languages with Spoofax/IMP

At the IFIP WG 2.11 meeting in March I'll give a talk on Spoofax/IMP; I'll try to record the talk and post it here.

Domain-specific languages are a key component of program generation. While we have ample experience building code generators and compilers, modern software developers expect integrated development environments such as Eclipse and Visual Studio to boost their productivity. To achieve the productivity gain promised by domain-specific languages, it is required that they come with strong IDE support. Since the production of DSLs cannot afford the effort that is put into IDEs for general-purpose languages, better tools are needed.

In this talk I present Spoofax/IMP, an Eclipse plugin for creating Eclipse plugins for custom (domain-specific) languages. Spoofax integrates several domain-specific languages for language definition. SDF supports modular, declarative definition of syntax with arbitrary context-free grammars integrating lexical syntax. A new implementation of the SGLR parser for SDF supports sophisticated error recovery. Stratego supports model transformation, code generation, static analysis, and refactoring with rewrite rules and programmable strategies. Editor service DSLs support declaration and customization of syntactic editor services such as syntax highlighting, folding, outline views, and semantic editor services such as error checking, cross references, content completion, and refactorings.

I'll introduce Spoofax by discussing a definition for a subset of the WebDSL web programming language. The talk will be partly slides and partly demonstration.

February 12, 2010 15:00

November 13, 2009

Eelco Visser

Composing Domain-Specific Languages

Eelco Visser Last week I was in Bergen (Norway) for the PhD thesis defense of Anya Bagge and the opening of the Bergen Language Design Laboratory. For the occasion of the opening a few people where asked to talk about the history and future of programming language design. Don Sannella talked about the failure of the Extended ML formal methods project. Bjarne Stroustrup talked about social issues of language design such as the (enormous) impact of C++ and the hassles of language standardization (and took my picture during the BLDL dinner). Paul Klint gave a flavour of his new Rascal meta-programming language. Horacio Bouzas and Carl Seger talked about the role of languages in oil exploration and chip design.

As a long time collaborator of the Bergen group, I also was asked to shine my light on language design. Instead of giving a straight up talk about Stratego or WebDSL, I decided to try to give a somewhat more general slant on the currently dominant theme of our research at TU Delft: composition of domain-specific languages. Today I extended the talk for delivery at the TU Eindhoven CS colloquium. I like the style of the talk and it gave rise to a good discussion about the trade-offs between internal and external DSLs afterwards. This is also the start of the development of a lectures series for my MDSD course in the Spring semester. Since there was an expression of interest, I've uploaded my slides. They're presentation-zenish, so there's not a whole lot of explanatory text and may not be useful without me talking about them. Feedback is welcome.

Abstract The history of programming languages shows a progressive development from low-level programming languages close to the machine, to high-level languages close to the problems being solved with software. Domain-specific languages take this a step further than general purpose programming languages by making assumptions about the class of applications for which the language is intended. Complete applications typically require programs in multiple (technical) domains, which can be catered for by separate domain-specific languages. While such separation of concerns is beneficial for domain expressivity, it often leads to loose coupling and lack of static verification. Hence, the design of individual DSLs needs to be complemented with their linguistic integration.

In this talk, I illustrate these ideas with the design of WebDSL, a domain-specific language for data centric web applications. WebDSL linguistically integrates the definition of data models, user interfaces, actions, access control rules, data validation rules, styling rules, and workflow definitions. While maintaining separation between these concerns through specialized sub-languages, linguistic integration ensures static consistency checking and correct code generation. The language allows developers to concentrate on the essential design of web applications, abstracting from accidental complexity, such as the details of data persistence. The combination of high-level and low-level constructs ensures high expressivity, while supporting customization to application requirements.

November 13, 2009 00:06

October 23, 2009

Eelco Visser

An Eclipse Editor for WebDSL

This afternoon I created an Eclipse editor for WebDSL from just its syntax definition using Spoofax/IMP in a matter of minutes. The editor provides syntactic editor services such as syntax highlighting, folding, outlining, error recovery; all out of the box. Here's a screenshot with an impression.

Note how my decision to include 'sections' in WebDSL turns out to play very nice with code folding. Spoofax/IMP also supports semantic error services such as displaying type check errors and cross-references, but that requires the WebDSL compiler to be refactored from a batch compiler to an incremental compiler.

October 23, 2009 23:16

October 18, 2009

Eelco Visser

Take the red PIL (or choose any other color you like)

The Platform Independent Language PIL now has its own website with a manual for the language and downloads of the compiler. You can use PIL as target for your domain-specific language and have it work on all PIL supported platforms. Currently only Java and Python are supported, but creation of PIL back-ends is cheap, probably much cheaper than making dedicated back-ends for your own DSL. You can get involved and contribute a new back-end for a new platform. To start we would like to have back-ends for PHP, C#, C, and Objective C, but any others are welcome too.

October 18, 2009 20:35