Planet Stratego
You have a term and it don't look good, who're you gonna call?
November 13, 2009
Last week I was in Bergen (Norway) for the PhD thesis defense of Anya Bagge and the opening of the Bergen Language Design Laboratory.
For the occasion of the opening, a few people were asked to talk about the history and future of programming language design.
Don Sannella talked about the failure of the Extended ML formal methods project.
Bjarne Stroustrup talked about social issues of language design such as the (enormous) impact of C++ and the hassles of language standardization (and took my picture during the BLDL dinner).
Paul Klint gave a flavour of his new Rascal meta-programming language.
Horacio Bouzas and Carl Seger talked about the role of languages in oil exploration and chip design.
As a long-time collaborator of the Bergen group, I was also asked to share my views on language design. Instead of giving a straight-up talk about Stratego or WebDSL, I decided to try to give a somewhat more general slant on the currently dominant theme of our research at TU Delft: composition of domain-specific languages. Today I extended the talk for delivery at the TU Eindhoven CS colloquium. I like the style of the talk, and it gave rise to a good discussion afterwards about the trade-offs between internal and external DSLs. This is also the start of the development of a lecture series for my MDSD course in the Spring semester. Since there was an expression of interest, I've uploaded my slides. They're presentation-zenish, so there's not a whole lot of explanatory text, and they may not be useful without me talking about them. Feedback is welcome.
Abstract
The history of programming languages shows a progressive development from low-level programming languages close to the machine, to high-level languages close to the problems being solved with software. Domain-specific languages take this a step further than general purpose programming languages by making assumptions about the class of applications for which the language is intended. Complete applications typically require programs in multiple (technical) domains, which can be catered for by separate domain-specific languages. While such separation of concerns is beneficial for domain expressivity, it often leads to loose coupling and lack of static verification. Hence, the design of individual DSLs needs to be complemented with their linguistic integration.
In this talk, I illustrate these ideas with the design of WebDSL, a domain-specific language for data centric web applications. WebDSL linguistically integrates the definition of data models, user interfaces, actions, access control rules, data validation rules, styling rules, and workflow definitions. While maintaining separation between these concerns through specialized sub-languages, linguistic integration ensures static consistency checking and correct code generation. The language allows developers to concentrate on the essential design of web applications, abstracting from accidental complexity, such as the details of data persistence. The combination of high-level and low-level constructs ensures high expressivity, while supporting customization to application requirements.
November 13, 2009 00:06
October 23, 2009
This afternoon I created an Eclipse editor for WebDSL from just its syntax definition
using Spoofax/IMP in a matter of minutes.
The editor provides syntactic editor services such as syntax highlighting, folding, outlining, error recovery; all out of the box.
Here's a screenshot with an impression.
Note how my decision to include 'sections' in WebDSL turns out to play very nicely with code folding.
Spoofax/IMP also supports semantic error services such as displaying type check errors and cross-references, but that requires the WebDSL compiler to be refactored from a batch compiler to an incremental compiler.
October 23, 2009 23:16
October 18, 2009
The Platform Independent Language PIL now has its own website with a manual for the language and downloads of the compiler.
You can use PIL as target for your domain-specific language and have it work on all PIL supported platforms.
Currently only Java and Python are supported, but creation of PIL back-ends is cheap, probably much cheaper than making dedicated back-ends for your own DSL.
You can get involved and contribute a new back-end for a new platform.
To start we would like to have back-ends for PHP, C#, C, and Objective C, but any others are welcome too.
October 18, 2009 20:35
October 10, 2009
While attending GPCE, SLE, and MoDELS in Denver, I recorded episode 2 of The Big Scheme
of Things screencast series. Using a simple to-do application as
example application, I illustrate various aspects of the use of WebDSL, a domain-specific language for web applications.
In this episode, I show the use of templates to divide a page definition into smaller fragments, the definition of actions in templates, and the use of icons as actions.
Download:
512x384,
768x576,
1024x768
[This is as much an experiment in screencasting, as it is a WebDSL tutorial. Feedback is welcome. Thanks for the feedback on the previous episode. For this recording I have used a proper microphone (Rode Podcaster) instead of the built-in microphone of my MacBook; sound quality should be better. I still have to look into other formats than Quicktime; suggestions for useful formats are welcome. An iTunes feed is in the planning as well. There was a request for a start and end tune; apart from the requirement that such a tune should be copyright free, music is not my strong point; suggestions are welcome.]
Continue reading "The Big Scheme of Things: Episode 2 -- Templates"
October 10, 2009 13:53
October 05, 2009
One of the tricky aspects of getting a web application right is data validation, that is, checking that the inputs to forms are correct in some sense that is relevant for the application. For example, the input for a year field should be an integer that represents a year that is sensible in the context of the application. Data validation actually goes beyond checking the validity of single input fields in a form: it includes invariants over the data model that should be maintained, and assertions that should hold at some point in processing input data.
The problem of data validation is two-fold. First, we'd like a declarative way to specify validation rules. Second, the checking of these rules should be integrated with the rest of the application; in particular, error messages should be injected into the UI.
In a paper that Danny Groenewegen will present this afternoon at the Software Language Engineering conference (SLE 2009), we describe how we have extended WebDSL with declarative data validation rules that are checked automatically and for which error messages are injected in the UI.
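To give a feel for what "declarative rules with injected error messages" can mean, here is a minimal sketch in Python. It is not WebDSL syntax and not how the WebDSL compiler implements validation; the `Rule` class, the `RULES` list, the field names, and the year range are all hypothetical, chosen only to illustrate the idea of stating rules once and collecting messages per input field.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def as_int(value: object) -> Optional[int]:
    """Return value as an int, or None if it is not a well-formed integer."""
    try:
        return int(str(value))
    except ValueError:
        return None


@dataclass
class Rule:
    field: str                     # form field the message is attached to
    check: Callable[[dict], bool]  # predicate over the submitted form data
    message: str                   # text to show next to the field


# Hypothetical rules for a form with 'title' and 'year' inputs.
RULES = [
    Rule("title", lambda d: bool(d.get("title", "").strip()), "Title is required"),
    Rule("year", lambda d: as_int(d.get("year")) is not None, "Year must be an integer"),
    Rule("year",
         lambda d: as_int(d.get("year")) is None or 1900 <= as_int(d.get("year")) <= 2100,
         "Year must be between 1900 and 2100"),
]


def validate(data: dict) -> dict:
    """Check every rule and group the failing messages by field."""
    errors: dict = {}
    for rule in RULES:
        if not rule.check(data):
            errors.setdefault(rule.field, []).append(rule.message)
    return errors


print(validate({"title": "SLE", "year": "2009"}))
print(validate({"title": " ", "year": "next year"}))
```

A UI layer could then render each message next to the input it belongs to, which is the part that the paper integrates into the language itself.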
Danny M. Groenewegen, Eelco Visser. Integration of Data Validation and User Interface Concerns in a DSL for Web Applications. In Mark G. J. van den Brand, Jeff Gray, editors, Software Language Engineering, Second International Conference, SLE 2009, Denver, USA, October, 2009. Revised Selected Papers. Lecture Notes in Computer Science, Springer, 2009.
[researchr]
Abstract: Data validation rules constitute the constraints that data input and processing must adhere to in addition to the structural constraints imposed by a data model. Web modeling tools do not address data validation concerns explicitly, hampering full code generation and model expressivity. Web application frameworks do not offer a consistent interface for data validation. In this paper, we present a solution for the integration of declarative data validation rules with user interface models in the domain of web applications, unifying syntax, mechanisms for error handling, and semantics of validation checks, and covering value well-formedness, data invariants, input assertions, and action assertions. We have implemented the approach in WebDSL, a domain-specific language for the definition of web applications.
October 05, 2009 09:34
October 04, 2009
Tonight I was reminiscing with Tijs van der Storm about conferences of the past, in particular the day that I wrote a Quine in Stratego at OOPSLA 2004. I reported about that in my pre-blog and at the Stratego/XT site, but not in this blog. Since it was a fun example, I thought it deserved a place on my official blog. Here goes:
After GPCE/OOPSLA in Vancouver Tijs van der Storm challenged me to write a Stratego
program that prints its own source. So I set to work, with the following result.
I'll make it educational and reconstruct the design process.
The basic idea
The core of a self-printing program is that it should contain its own source
and duplicate that. The following program implements this idea, but with
'prefix' and 'suffix' instead of the actual source text.
----------------------------------
module quine
imports lib
strategies
  main =
    !["prefix", "suffix"]
    ; \ [x, y] -> [x, x, y, y] \
    ; concat-strings
    ; <fprint>(stdout, [<id>])
    ; <exit> 0
-----------------------------------
Compiling and running this program produces:
prefixprefixsuffixsuffix
Quoting the source
Next we replace 'prefix' and 'suffix' by the prefix and suffix of the program
with respect to the list. Note that we don't bother with the layout of the
quoted fragment. Note that the backslashes of the anonymous rewrite rule need
to be escaped in the string.
------------------------------------------------------------------------------------------
module quine
imports lib
strategies
  main =
    !["module quine imports lib strategies main = ![",
      "]; \\ [x, y] -> [x, x, y, y] \\; concat-strings; <fprint>(stdout, [<id>]); <exit> 0"]
    ; \ [x, y] -> [x, x, y, y] \
    ; concat-strings
    ; <fprint>(stdout, [<id>])
    ; <exit> 0
------------------------------------------------------------------------------------------
The output of this program is
------------------------------------------------------------------
module quine imports lib strategies main = ![module quine
imports lib strategies main = ![]; \ [x, y] -> [x, x, y, y] \;
concat-strings; <fprint>(stdout, [<id>]); <exit> 0]; \ [x, y] ->
[x, x, y, y] \; concat-strings; <fprint>(stdout, [<id>]); <exit> 0
------------------------------------------------------------------
which is starting to look good, but not quite there, since it doesn't compile.
Getting the quotes right
What we have to do now is introduce quotes in the printed program, such that the pieces
of code in the list are actually parsed as strings.
------------------------------------------------------------------------------------------
module quine
imports lib
strategies
  main =
    !["module quine imports lib strategies main = ![\"",
      "\"]; \\ [x, y] -> [x, x, y, y] \\; concat-strings; <fprint>(stdout, [<id>]); <exit> 0"]
    ; \ [x, y] -> [x, x, "\",\"", y, y] \
    ; concat-strings
    ; <fprint>(stdout, [<id>])
    ; <exit> 0
------------------------------------------------------------------------------------------
The output of this program is
------------------------------------------------------------------
module quine imports lib strategies main = !["module quine
imports lib strategies main = !["",""]; \ [x, y] -> [x, x, y, y] \;
concat-strings; <fprint>(stdout, [<id>]); <exit> 0"]; \ [x, y] ->
[x, x, y, y] \; concat-strings; <fprint>(stdout, [<id>]); <exit> 0
------------------------------------------------------------------
which still does not compile because of the doublequotes embedded in the string.
Getting the quotes more right
We need to escape the embedded strings. This can be achieved easily by using the
escape strategy from the library, which escapes doublequotes and backslashes.
------------------------------------------------------------------------------------------
module quine
imports lib
strategies
  main =
    !["module quine imports lib strategies main = ![\"",
      "\"]; \\ [x, y] -> [x, <escape> x, \"\\\",\\\"\", <escape> y, y] \\; concat-strings; <fprint>(stdout, [<id>]); <exit> 0"]
    ; \ [x, y] -> [x, <escape> x, "\",\"", <escape> y, y] \
    ; concat-strings
    ; <fprint>(stdout, [<id>])
    ; <exit> 0
------------------------------------------------------------------------------------------
This produces (without the newlines)
-----------------------------------------------------------------------------
module quine imports lib strategies main = !["module quine imports
lib strategies main = ![\"","\"]; \\ [x, y] -> [x, <escape> x,
\"\\\",\\\"\", <escape> y, y] \\; concat-strings;
<fprint>(stdout, [<id>]); <exit> 0"]; \ [x, y] ->
[x, <escape> x, "\",\"", <escape> y, y] \;
concat-strings; <fprint>(stdout, [<id>]); <exit> 0
-----------------------------------------------------------------------------
Bootstrapping
Now the printed program is not exactly the same as the program that printed it, since the layout differs, but if we
compile and run the printed program, it produces its own source literally.
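The same fixed-point argument can be checked mechanically. The following Python sketch is only a simulation under stated assumptions: `program(["` and `"])` stand in for the real prefix and suffix, `escape` mimics a library strategy that escapes doublequotes and backslashes, and `run` plays the role of "compile and run" by reading the two string literals back out of the text and expanding them again. None of these names come from the Stratego library.

```python
def escape(s: str) -> str:
    """Escape backslashes and double quotes so s can sit inside a string literal."""
    return s.replace("\\", "\\\\").replace('"', '\\"')


def expand(x: str, y: str) -> str:
    """The rewrite rule: [x, y] -> x, escape(x), '","', escape(y), y, concatenated."""
    return x + escape(x) + '","' + escape(y) + y


def string_literals(text: str) -> list:
    """Read back the double-quoted literals in text, undoing the escaping."""
    literals, current, i = [], None, 0
    while i < len(text):
        c = text[i]
        if current is None:
            if c == '"':
                current = []            # opening quote of a literal
        elif c == "\\":
            i += 1
            current.append(text[i])     # escaped character, taken literally
        elif c == '"':
            literals.append("".join(current))
            current = None              # closing quote
        else:
            current.append(c)
        i += 1
    return literals


def run(text: str) -> str:
    """Stand-in for 'compile and run': expand the two literals found in text."""
    x, y = string_literals(text)
    return expand(x, y)


# Hypothetical prefix and suffix of a toy program of the form program(["...","..."])
prefix, suffix = 'program(["', '"])'
printed = expand(prefix, suffix)
print(printed)
print(run(printed) == printed)
```

Without `escape`, the text read back from the output no longer contains the original two strings, which is exactly the problem the "Getting the quotes more right" step solves.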
October 04, 2009 21:18
October 03, 2009
(Update: the position has been filled)
Yesterday, I had an interesting phone discussion with researchers from T-Mobile, which strengthened me in my plan to make the next
MoDSE case study about abstractions for mobile applications, and to make that the focus for the open
postdoc position in the project.
The
Software Engineering Research Group of
Delft University of Technology has a vacancy for a postdoc position in the “
Model-Driven Software Evolution (MoDSE)” project funded by the Jacquard Software Engineering Research program of NWO.
Job description: The candidate will lead the next MoDSE case study in domain-specific language engineering into the domain of mobile applications. The case study will consist of finding high-level, platform independent abstractions for mobile applications and creating a collection of domain-specific languages covering the domain, building on the experience and tools from the
WebDSL case study.
Requirements: The candidate should have a PhD degree in Computer Science with a strong background in software engineering, specifically software language engineering, model-driven engineering, and/or programming languages. Ideally the candidate has experience with language design and implementation.
Contract: 2 years, 8 months
Salary: Maximum of €3755 per month gross depending on past experience.
TU Delft offers an attractive benefits package, including a flexible work week, free high-speed Internet access from home, and the option of assembling a customized compensation and benefits package (the 'IKA'). Salary and benefits are in accordance with the Collective Labour Agreement for Dutch Universities.
How to apply: Send applications consisting of an application letter and a detailed CV in PDF by
email to
Eelco Visser
October 03, 2009 01:06
September 21, 2009
WebDSL is a domain-specific language for building web applications. WebDSL
avoids the boilerplate that is common in other web technologies by
providing tuned domain-specific notations for the concerns of web
development. Furthermore, the consistency of these concerns is verified
at compile-time.
WebDSL is available from webdsl.org.
There you can also find instructions for installing the compiler and the
other components that you need to run WebDSL applications.
In this series of screencasts I will illustrate the use of WebDSL, by
building a web application from scratch. The application we are going to
build is called 'The Big Scheme of Things', a to-do list application.
In this first episode I keep it really simple and create a single page
app that allows us to add and remove tasks. First I show how to get an
even simpler "Hello web!" app running.
Download:
512x384,
768x576,
1024x768
(This is as much an experiment in screencasting, as it is a WebDSL tutorial. Feedback is welcome.)
Continue reading "The Big Scheme of Things: Episode 1 "
September 21, 2009 08:14
September 09, 2009
WebDSL is not the first domain-specific language I have designed and implemented.
With each language, the question is which platform to develop the language for: which language should the compiler generate (or the interpreter be written in), and which libraries should be used with that language as run-time system?
After the first generation of the ASF+SDF MetaEnvironment that was written in LeLisp, we decided in the 1990s that the second generation should target C as a fairly portable and ubiquitous 'assembly language'. Thus, the prototype implementation of the SGLR parser for SDF was written in C.
For my next project, the transformation language Stratego, I also chose C as platform with the ATerm library as run-time system providing term representation and garbage collection.
When I started work on WebDSL in 2007, Java and the JVM provided the best platform for web application development. Thus, Java is the target language generated from WebDSL applications. (The set of frameworks used to create the run-time system is subject to change.)
SDF, Stratego, and WebDSL transcend the platforms they are implemented on. Each of the languages provides abstractions that are independent of the implementation platform and could easily be used on other platforms as well.
However, creating back-ends for all sorts of platforms is a tedious and laborious undertaking, and then maintaining the consistency of these back-ends is worse.
This problem is not unique to languages that I come up with, but is rather a universal problem for domain-specific languages.
To address this problem as part of our work on a
general approach to the development and evolution of domain-specific languages, we are developing PIL, a
platform independent language that abstracts from the differences between object-oriented languages.
Back-ends that translate PIL to languages such as Java and Python are much smaller than the back-end of the WebDSL compiler, which encapsulates a lot of domain-specific implementation knowledge about the domain of web applications.
With PIL we hope that we can not only develop domain-specific platform-independent abstractions, but also provide portable implementations for a variety of platforms.
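As a rough illustration of why back-ends for an intermediate language can stay small, here is a toy sketch in Python. It is not PIL and says nothing about PIL's actual design: the `Assign` and `Print` node types, the neutral type names, and both emitter functions are invented for this example. The point is only that a DSL compiler targets the intermediate form once, and each platform needs just a thin printer.

```python
from dataclasses import dataclass


@dataclass
class Assign:
    name: str   # variable being declared
    type: str   # platform-neutral type name (hypothetical: "Int", "String")
    value: str  # literal, already rendered as text


@dataclass
class Print:
    name: str   # variable to print


def to_java(stmts: list) -> str:
    """Tiny Java back-end: maps neutral types and statements to Java syntax."""
    types = {"Int": "int", "String": "String"}
    lines = []
    for s in stmts:
        if isinstance(s, Assign):
            lines.append(f"{types[s.type]} {s.name} = {s.value};")
        elif isinstance(s, Print):
            lines.append(f"System.out.println({s.name});")
    return "\n".join(lines)


def to_python(stmts: list) -> str:
    """Tiny Python back-end for the same intermediate statements."""
    lines = []
    for s in stmts:
        if isinstance(s, Assign):
            lines.append(f"{s.name} = {s.value}")
        elif isinstance(s, Print):
            lines.append(f"print({s.name})")
    return "\n".join(lines)


program = [Assign("n", "Int", "42"), Print("n")]
print(to_java(program))
print(to_python(program))
```

Adding a third platform in this sketch means writing one more function of the same size, without touching whatever produced `program`.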
A paper about PIL has been accepted for the second conference on Software Language Engineering:
Zef Hemel, Eelco Visser. PIL: A Platform Independent Language for Retargetable DSLs. In Mark G. J. van den Brand, Jeff Gray, editors,
Software Language Engineering, Second International Conference, SLE 2009, Denver, USA, October, 2009. Lecture Notes in Computer Science, Springer, 2009. (to appear)
[researchr]
Abstract:
Intermediate languages are used in compiler construction to simplify
retargeting compilers to multiple machine architectures. In the
implementation of domain-specific languages (DSLs), compilers
typically generate high-level source code, rather than low-level
machine instructions. DSL compilers target a software platform, i.e.
a programming language with a set of libraries, deployable on one or
more operating systems. DSLs enable targeting multiple
software platforms if its abstractions are platform independent. While
transformations from DSL to each targeted platform are often
conceptually very similar, there is little reuse between
transformations due to syntactic and API differences of the target
platforms, making supporting multiple platforms expensive. In this
paper, we discuss the design and implementation of PIL, a Platform
Independent Language, an intermediate language providing a layer of
abstraction between DSL and target platform code, abstracting from
syntactic and API differences between platforms, thereby removing the
need for platform-specific transformations. We discuss the use of PIL
in an implementation of WebDSL, a DSL for building web applications.
September 09, 2009 18:25
August 27, 2009
At OOPSLA this year, Danny Groenewegen and I will be giving demonstrations of WebDSL.
There will probably be two slots scheduled in the program for the demonstration, and we'll be happy to give unscheduled demonstrations throughout the conference.
We wrote a little paper for the proceedings that gives an impression of what we will be showing.
Abstract:
WebDSL is a domain-specific language for the development of web applications that integrates data models, user interface models, actions, validation, access control, and workflow. The compiler verifies the consistency of applications and generates complete implementations in Java or Python. We illustrate the key concepts of the language with a small web application.
August 27, 2009 07:59
July 08, 2009
Stratego/XT 0.17 is now available from
http://www.stratego-language.org/Stratego/StrategoRelease017
Stratego/XT 0.17 introduces major improvements across the board,
including language additions, a new compiler library, numerous
improvements to the compiler, significant changes to the library
handling, new libraries for parsing, pretty printing and term
validation, 64-bit support, stack traces and more.
For this release, over 200 outstanding issues have been addressed,
much thanks to the efforts of external bug reporters and contributors.
The manual has been updated to reflect the changes made to the
language and libraries, and the new libraries come with up-to-date
source code documentation.
This release of Stratego/XT has taken a while to be promoted to a
major release. After the release of 0.16 we had decided that
'major releases' were an archaic notion, since most users subscribe
to the continuous integration builds produced by the buildfarm.
However, major releases do provide stable baselines and a better
indication of progress of the project. To improve clarity and
stability we are changing the release policy. Our goal is to release
a new major version of Stratego/XT every few months.
July 08, 2009 09:35
July 07, 2009
After some recent activity on the psat-dev mailing list I became aware of the lack of available builds for php-sat. Even though the development speed is not what I would want it to be (so many fun things to do, so few hours in a day!), I still believe it is important to release early and often.
Fortunately, Eelco Dolstra had some time to migrate php-front, php-sat and php-tools to Hydra, the new Nix-based continuous build system. After some tweaking we now again have access to unstable builds for all PSAT projects. Go Hydra!

July 07, 2009 22:19
June 10, 2009
Our paper on StringBorg is being published by Science of Computer Programming:
M. Bravenboer, E. Dolstra, and E. Visser. Preventing Injection Attacks with Syntax Embeddings. A Host and Guest Language Independent Approach. Science of Computer Programming, 2009.
StringBorg is a technique for embedding 'string' languages in general purpose languages in a safe way, to avoid injection attacks.
The paradigmatic example is the embedding of SQL queries, which typically is done using string literals as in the following example:
String userName = getParam("userName");
String password = getParam("password");
String query = "SELECT id FROM users "
             + "WHERE name = '" + userName + "' "
             + "AND password = '" + password + "'";
if (executeQuery(query).size() == 0)
  throw new Exception("bad user/password");
In these approaches it is very easy to forget to escape SQL meta characters in the values obtained from the client. This opens the
door to an attack through a query that escapes from the programmed query.
StringBorg prevents such attacks by syntactically embedding the query language in the host language. For example, the query
above can then be written as follows:
SQL q = <| SELECT id FROM users
           WHERE name = ${userName} AND password = ${password} |>;
if (executeQuery(q.toString()).size() == 0) ...
Now, the syntax of the query is checked statically. But more importantly, at run-time the query is constructed by a query
API that ensures that the query constructed has the same syntactic structure as the one defined by the programmer.
Furthermore, it enforces escaping meta-characters in values spliced into the query, thus guaranteeing that no injection
attacks can occur.
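To make the difference concrete, here is a small self-contained sketch in Python using the standard `sqlite3` module and an in-memory database. It does not use StringBorg: the safe variant relies on sqlite3's `?` parameter placeholders, a different mechanism that keeps the query structure fixed in a comparable way. The table contents, the function names, and the attack string are made up for the demonstration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a single (hypothetical) user."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER, name TEXT, password TEXT)")
    db.execute("INSERT INTO users VALUES (1, 'alice', 'secret')")
    return db


def login_concat(db, user_name: str, password: str) -> list:
    """Unsafe: builds the query by string concatenation, as in the Java example."""
    query = ("SELECT id FROM users "
             "WHERE name = '" + user_name + "' "
             "AND password = '" + password + "'")
    return db.execute(query).fetchall()


def login_param(db, user_name: str, password: str) -> list:
    """Safe: the values are passed separately and cannot change the query structure."""
    query = "SELECT id FROM users WHERE name = ? AND password = ?"
    return db.execute(query, (user_name, password)).fetchall()


db = make_db()
attack = "' OR '1'='1"  # closes the string literal and appends an always-true condition

print(login_concat(db, "alice", attack))  # the injected OR makes the WHERE clause true
print(login_param(db, "alice", attack))   # treated as a (wrong) password
```

With the concatenated query the attacker logs in without knowing the password, because the input changes the syntactic structure of the query; with the structure fixed up front, the same input is just data.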
The paper does not just provide a solution for embedding SQL in Java, but offers a generic approach for embedding any guest language in any host language with little more effort than providing syntax definitions for host and guest language.
Abstract: Software written in one language often needs to construct sentences in another language, such as SQL queries, XML output, or shell command invocations. This is almost always done using unhygienic string manipulation, the concatenation of constants and client-supplied strings. A client can then supply specially crafted input that causes the constructed sentence to be interpreted in an unintended way, leading to an injection attack. We describe a more natural style of programming that yields code that is impervious to injections by construction. Our approach embeds the grammars of the guest languages (e.g. SQL) into that of the host language (e.g. Java) and automatically generates code that maps the embedded language to constructs in the host language that reconstruct the embedded sentences, adding escaping functions where appropriate. This approach is generic, meaning that it can be applied with relative ease to any combination of context-free host and guest languages.
June 10, 2009 08:38
May 30, 2009
For the participants of our hands-on tutorial on Creating DSLs with Stratego/XT at the Code Generation 2009 conference, we (i.e. Rob Vermaas) created a VirtualBox image with all the software needed during the tutorial.
In particular, it contains a full installation of Stratego/XT with compiler, libraries, and auxiliary packages such as java-front.
In addition, the image contains a built-from-source installation of the WebDSL language for building web applications, and an installation of Tomcat for deploying created web apps.
The image and instructions for its installation are available from
http://strategoxt.org/Stratego/CodeGeneration2009Tutorial
During the tutorial additional material will be handed out with concrete exercises.
The virtual machine is also useful for exploring Stratego/XT and WebDSL outside the context of this particular tutorial.
May 30, 2009 17:41
May 23, 2009
Spoofax/IMP is a toolset for the creation of interactive development environments for custom languages based on domain-specific languages for editor services. The toolset is especially aimed at the developers of domain-specific languages, allowing them to provide IDE support for their specialist language under development. An important feature of Spoofax/IMP is the support for language composition, i.e. for languages consisting of multiple, syntactically different, sub-languages. Furthermore, the toolset allows the customization of heuristically generated editor services without losing the ability to regenerate these services when a language evolves.
At LDTA 2009 we presented a paper about Spoofax/IMP. The final version of that paper is now finished, and a pre-print is available.
L. C. L. Kats, K. T. Kalleberg, and E. Visser. Domain-Specific Languages for Composable Editor Plugins. In T. Ekman and J. Vinju, editors, Proceedings of the Ninth Workshop on Language Descriptions, Tools, and Applications (LDTA 2009), Electronic Notes in Theoretical Computer Science. Elsevier Science Publishers, April 2009.
[pdf]
Abstract:
Modern IDEs increase developer productivity by incorporating many
different kinds of editor services. These can be purely syntactic,
such as syntax highlighting, code folding, and an outline for
navigation; or they can be based on the language semantics, such as
in-line type error reporting and resolving identifier declarations.
Building all these services from scratch requires both the extensive
knowledge of the sometimes complicated and highly interdependent APIs
and extension mechanisms of an IDE framework, and an in-depth
understanding of the structure and semantics of the targeted language.
This paper describes Spoofax/IMP, a meta-tooling suite that provides
high-level domain-specific languages for describing editor services,
relieving editor developers from much of the framework-specific
programming. Editor services are defined as composable modules of
rules coupled to a modular SDF grammar. The composability provided by
the SGLR parser and the declaratively defined services allows embedded
languages and language extensions to be easily formulated as
additional rules extending an existing language definition. The
service definitions are used to generate Eclipse editor plugins. We
discuss two examples: an editor plugin for WebDSL, a domain-specific
language for web applications, and the embedding of WebDSL in
Stratego, used for expressing the (static) semantic rules of WebDSL.
May 23, 2009 11:29
May 10, 2009
We just got the notification that our submission to OOPSLA 2009 has been accepted.
The paper presents a solution to error recovery for the SGLR parsing algorithm. Here's the full citation and abstract (pre-print will follow later):
Lennart C. L. Kats, Maartje de Jonge, Emma Nilsson-Nyman, and Eelco Visser.
"Providing Rapid Feedback in Generated Modular Language Environments. Adding Error Recovery to Scannerless Generalized-LR Parsing"
In Gary T. Leavens, editor, Proceedings of the 24th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2009), New York, NY, USA, October 2009. ACM. (to appear).
Abstract: Integrated Development Environments (IDEs) increase programmer
productivity, providing rapid, interactive feedback based on the
syntax and semantics of a language. A heavy burden lies on developers
of new languages to provide adequate IDE support. Code generation
techniques provide a viable, efficient approach to semi-automatically
produce IDE plugins. Key components for the realization of plugins are
the language's grammar and parser. For embedded languages and
language extensions, constituent IDE plugin modules and their grammars
can be combined. Unlike conventional parsing algorithms, scannerless
generalized-LR parsing supports the full set of context-free grammars,
which is closed under composition, and hence can parse language
embeddings and extensions composed from separate grammar modules. To
apply this algorithm in an interactive environment, this paper
introduces a novel error recovery mechanism, which allows it to be
used with files with syntax errors -- common in interactive
editing. Error recovery is vital for providing rapid feedback in case
of syntax errors, as most IDE services depend on the parser -- from
syntax highlighting to semantic analysis and cross-referencing. We
base our approach on the principles of island grammars, and
automatically generate new productions for existing grammars, making
them more permissive of their inputs. To cope with the added
complexity of these grammars, we adapt the parser to support
backtracking. We evaluate the recovery quality and performance of our
approach using a set of composed languages, based on Java and
Stratego.
May 10, 2009 22:19
May 02, 2009
I have been playing around for a couple of minutes with yUML, an online service by Tobin Harris for creating UML diagrams using a textual input language.
The diagram below is generated while you load this page.
The input needed to generate the diagram is the following list of relations:
[Publication]++->*[Author],
[AbstractAuthor]^[Author],
[AbstractAuthor]*->1[Person],
[AbstractAuthor]*->1[Affiliation],
[Person]*->*[Publication],
[Publication]^[PrintPublication],
[PrintPublication]^[Article],
[PrintPublication]^[InProceedings],
[Publication]^[PublishedVolume],
[PublishedVolume]^[Proceedings],
[InProceedings]->[Proceedings],
[AbstractAuthor]^[Editor],
[PublishedVolume]++->*[Editor],
[Person]*->*[PublishedVolume],
[PublishedVolume]^[Book],
[PrintPublication]^[InCollection],
[Book]->*[InCollection] .
This diagram documents a (small) subset of the data model underlying the
researchr.org application for bibliography sharing and reviewing.
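Since the input is just a list of textual relations, it is easy to generate from a program. The sketch below, in Python, builds a few of the relations listed above from plain function calls. The helper names `inherits` and `assoc` are invented here, and the sketch only produces the relation text; it does not call the yUML service.

```python
def inherits(parent: str, child: str) -> str:
    """Inheritance relation, written [Parent]^[Child] in the list above."""
    return f"[{parent}]^[{child}]"


def assoc(source: str, target: str, source_end: str = "", target_end: str = "") -> str:
    """Association with optional decorations on either end, e.g. '*' or '++'."""
    return f"[{source}]{source_end}->{target_end}[{target}]"


relations = [
    assoc("Publication", "Author", "++", "*"),
    inherits("AbstractAuthor", "Author"),
    assoc("AbstractAuthor", "Person", "*", "1"),
    assoc("InProceedings", "Proceedings"),
]

# One relation per line, separated by commas as in the hand-written input.
diagram = ",\n".join(relations)
print(diagram)
```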
May 02, 2009 09:28
April 30, 2009
I have been invited to give a talk at the
Sixth International Workshop on Web Information Systems Modeling (WISM 2009),
which will be held in June in Amsterdam (co-located with CAiSE 2009).
Here's the abstract I wrote for the talk.
Abstract:
In this talk I give an overview of the design and application of
WebDSL, a domain-specific language for data centric web applications.
WebDSL linguistically integrates the definition of data models, user
interfaces, actions, access control rules, data validation rules,
styling rules, and workflow definitions. While maintaining separation
between these concerns through specialized sub-languages, linguistic
integration ensures static consistency checking and correct code
generation. The language allows developers to concentrate on the
essential design of web applications, abstracting from accidental
complexity, such as the details of data persistence. The combination
of high-level and low-level constructs ensures high expressivity,
while supporting customization to application requirements. The
application of WebDSL is illustrated using the researchr.org
application for bibliography sharing and reviewing.
Links:
April 30, 2009 09:38
April 13, 2009
It was probably unavoidable. Tweetie app on iPod makes it easy. It seems I'm tweeting.
Blogging for the lazy. Let's see if I have more to say, or more frequently at least, on twitter than on this blog.
April 13, 2009 12:23
April 06, 2009
At the Code Generation 2009 conference,
Lennart Kats and I will give a hands-on tutorial
about building DSLs with Stratego/XT. The program says it thus:
Stratego/XT is a state-of-the-art language and toolset for the development of domain-specific language implementations. In this session the participants learn to use Stratego/XT by developing a generator for a small DSL for web applications generating PHP. The tutorial covers declarative syntax definition with SDF, code generation by model transformation, and model-to-model transformation by rewriting. The tutorial is based on a course in model-driven software development developed at Delft University of Technology.
NB Since this is a hands-on session places are strictly limited. Please let us know whether you plan to attend this session when you book your conference place. Places will be allocated on a first-come first-served basis.
April 06, 2009 20:35
March 05, 2009
These days I am writing a book on ‘domain-specific language engineering’ for use
in a master’s course at Delft University. The book is about the design and implementation
of domain-specific languages, i.e. the definition of their syntax, static
semantics, and code generators. But it also contains a dose of linguistic reflection,
by studying the phenomenon of defining languages, and of defining languages for
defining languages (which is called meta-modelling these days).
Writing chapters on syntax definition and modeling of languages takes me
back to my days as a PhD student at the University of Amsterdam. Our quarters
were in the university building at the Watergraafsmeer, which was connected to the
CWI building via a bridge. Since the ASF+SDF group of Paul Klint was divided
over the two locations, meetings required a walk to the other end of the building.
So, I would regularly wander to the CWI part of the building to chat.
[While the third application we learned to use in our Unix course in 1989 was talk, with which
one could synchronously talk with someone else on the internet (the first application was probably
csh and the second email), face-to-face meetings were still the primary mode of communication;
as opposed to the use of IRC to talk to one’s officemate.]
Often I
would look into Jan Heering’s office to say hi, and more often than not would end
up spending the rest of the afternoon discussing research and meta-research.
One of the recurring topics in these conversations was the importance of examples.
Jan was fascinated by the notion of ‘programming by example’, i.e. deriving
a program from a bunch of examples of its expected behaviour, instead of a rigorous
and complete definition for all cases. But the other use of examples was for
validation, a word I didn’t learn until long after writing my thesis.
The culture of the day (and probably location?) was heavily influenced by
mathematics and theoretical computer science. The game was the definition, preferably
algebraically, of the artifacts of interest, and then, possibly proving interesting
properties. The application minded would actually implement stuff. As a language
engineer I was mostly interested in making languages with cool features. The motivation
for these features was often highly abstract. The main test example driving
much of the work on the ASF+SDF MetaEnvironment was creating an interactive
environment for the Pico language (While with variable declarations). The idea
being that once an environment for Pico was realized, creating one for a more realistic
language would be a matter of scaling up the Pico definition (mere engineering).
Actually making a language (implementation) and using that to write programs
would be a real test. To be fair, there were specifications of larger languages undertaken,
such as ones of (mini-) ML [8] and Pascal [6]. As a student I had developed a
specification of the syntax and static semantics of the object-oriented programming
language Eiffel [16], but that was so big it was not usable on the Sun workstations
we had at that time.
Time and again, Jan Heering would stress the importance of real examples to
show the relevance of a technique and/or to discover the requirements for a design.
While I thought it was a cool idea, I didn’t have examples. At least not to sell the
design and implementation of SDF2, the syntax definition formalism that turned
out to be the main contribution of my PhD thesis [20].
Continue reading "Example-Driven Research"
March 05, 2009 10:20
February 25, 2009
If you're doing research into domain-specific languages, model-driven engineering, or program generation, your agenda for the coming months is set. Early October the three main conferences on these topics are co-located in Denver. The deadlines are somewhat spread, so you should be able to submit a paper to each conference:
May 10:
Model Driven Engineering Languages and Systems (MODELS'09)
May 18:
Generative Programming and Component Engineering (GPCE'09)
July 10:
Software Language Engineering (SLE 2009)
I'm looking forward to your submission, and to meeting you in Denver.
February 25, 2009 14:50
February 01, 2009
Lennart Kats, Eelco Visser and I just got a paper accepted to LDTA'09. The paper is about declarative languages for describing programming editors. The main part of it is Lennart's work, but it's running on top of the Spoofax transformation infrastructure. The idea is simple: you don't want to fight with Java, complicated APIs and complicated XML when you implement an Eclipse-based editor for your DSL. Instead, you describe your language's grammar with SDF, provide some auxiliary information using our declarative editor languages, and Spoofax/IMP does the rest by generating the editor engine for you.
The abstract explains it in the usual academic style:
Modern IDEs increase developer productivity by incorporating many different kinds of editor services. These can be purely syntactic, such as syntax highlighting, code folding, and an outline for navigation; or they can be based on the language semantics, such as in-line type error reporting and resolving identifier declarations. Building all these services from scratch requires both the extensive knowledge of the sometimes complicated and highly interdependent APIs and extension mechanisms of an IDE framework, and an in-depth understanding of the structure and semantics of the targeted language.
This paper describes Spoofax/IMP, a meta-tooling suite that provides high-level domain-specific languages for describing editor services, relieving editor developers from much of the framework-specific programming. Editor services are defined as composable modules of rules coupled to a modular SDF grammar. The composability provided by the SGLR parser and the declaratively defined services allows embedded languages and language extensions to be easily formulated as additional rules extending an existing language definition. The service definitions are used to generate Eclipse editor plugins.
We discuss two examples: an editor plugin for WebDSL, a domain-specific language for web applications, and the embedding of WebDSL in Stratego, used for expressing the semantic rules of WebDSL.
Once I get bibtex-tools running on 64bit again, I'll link to the bib and pdf.
February 01, 2009 18:41
January 17, 2009

Like last year, I'm going to FOSDEM. Last year, I traveled from Paris with the express train (great experience). This year, it's back to planes again. Not looking forward to the security hysteria.
If you're interested in meeting me there, don't hesitate to fire off an e-mail.
January 17, 2009 21:55
December 19, 2008
As mentioned before, we've been doing some real parsing research to better support parsers for extensible languages. Parse table composition provides separate compilation for syntax components such that syntax extensions can be provided as plugins to a compiler for a base language. Due to various distractions last Summer I seem to have forgotten to blog about the paper that Martin Bravenboer and I got accepted at the first international conference on Software Language Engineering (which Martin was looking forward to).
M. Bravenboer and E. Visser. Parse Table Composition. Separate Compilation and Binary Extensibility of Grammars. In D. Gasevic and E. van Wyk, editors,
First International Conference on Software Language Engineering (SLE 2008). To appear in Lecture Notes in Computer Science, Heidelberg, 2009. Springer.
[pdf]

Abstract:
Module systems, separate compilation, deployment of binary
components, and dynamic linking have enjoyed wide acceptance in
programming languages and systems. In contrast, the syntax of
languages is usually defined in a non-modular way, cannot be
compiled separately, cannot easily be combined with the syntax of
other languages, and cannot be deployed as a component for later
composition. Grammar formalisms that do support modules use whole
program compilation.
Current extensible compilers focus on source-level extensibility,
which requires users to compile the compiler with a specific
configuration of extensions. A compound parser needs to be
generated for every combination of extensions. The generation of
parse tables is expensive, which is a particular problem when the
composition configuration is not fixed to enable users to choose
language extensions.
In this paper we introduce an algorithm for
parse table
composition to support separate compilation of grammars to
parse table components. Parse table components can be
composed (linked) efficiently at runtime, i.e. just before
parsing. While the worst-case time complexity of parse table
composition is exponential (like the complexity of parse table
generation itself), for realistic language combination scenarios
involving grammars for real languages, our parse table composition
algorithm is an order of magnitude faster than computation of the
parse table for the combined grammars.
The experimental parser generator is available
online.
December 19, 2008 09:50
December 16, 2008
A few weeks ago an e-mail from Didier Garcin popped up on the Stratego mailing list. He explained that he had written a Python script that could visualize an abstract syntax signature. The script was also sent to the list and I finally had some time to check it out.
It turned out that I had almost everything installed to use the script; only the pydot dependency needed some work. This was mostly because the script only seems to work with the 0.9.10 version of this library. After getting the script started it was really simple to generate the signature from the PHP-Front grammar.
So, here is the one for PHP4:

And the one for PHP5:

The first thing I noticed were the big rectangles in both versions. These rectangles are the statements (smaller one) and expressions grouped together. What I also noticed was that in both versions the bottom of the graph (which corresponds to the smallest units in the language) looks the most complicated. This corresponds very well with the amount of effort put into the modeling of this part of the language.
Comparing both grammars to each other we can see that the latest version is the most complicated one. Furthermore, if we compare both graphs to the graphs shown in this post we can see that the Java graph appears to be the most similar one.
Actually, I do not think these images show anything, but please explain it to me if you think I just don't see it. Anyway, at least we have some nice pictures now :)
P.S. for those who are interested, more detailed images (in svg-format) are available here.
December 16, 2008 21:36
December 15, 2008
Last Summer I attended the Code Generation 2008 conference in Cambridge to give a tutorial on WebDSL, as a case study in domain-specific language engineering. The conference was an interesting change from the usual academic conferences I visit, in that the majority of the audience were from industry. It was good to see the interest in code generation in industry, but also disconcerting to observe the gap between academic research and industrial practice; but more about that some other time.

During the conference I was interviewed by Laurence Tratt for Software Engineering Radio about parsing.
The interview podcast recently appeared as Episode 118.
It was a long time ago (1997) that I defended my PhD thesis, which was mostly about syntax definition and parsing.
In particular, I introduced SDF2, which radically integrates lexical and context-free syntax, and the SGLR parsing algorithm for parsing arbitrary 'character-level' context-free grammars.
Since finishing my thesis I have done quite a bit of `applied parsing research', using SDF and SGLR for applications such as
meta-programming with concrete object syntax and
DSL embedding, but I don't consider myself a hard-core parsing researcher any more.
So I had to dig deep in my memory to talk about Noam Chomsky's language hierarchy, grammars as string rewrite systems, and parsing algorithms. I find the result a bit awkward to listen to, but people assure me that is because it is my own voice I'm listening to.
In the meantime my relation to parsing is changing again.
While SDF/SGLR still provides the best approach to declarative definition of composite languages (in my opinion at least),
it has some fundamental limitations which have never been addressed.
A first step in addressing these limitations was taken in the SLE 2008 paper with Martin Bravenboer on parse table
composition (see upcoming blog) to provide separate compilation for grammars.
With a new PhD student starting in the new year, I hope to address other limitations such as the lack of error recovery.
December 15, 2008 21:11
December 12, 2008
The paper "Decorated Attribute Grammars" by Lennart Kats, Tony Sloane and Eelco Visser has been accepted for presentation at the International Conference on Compiler Construction (CC 2009) to be held in March 2009 in York (UK).
[pdf]

Abstract:
Attribute grammars are a powerful specification formalism for
tree-based computation, particularly for software language
processing. Various extensions have been proposed to abstract
over common patterns in attribute grammar
specifications. These include
various forms of copy rules to support non-local dependencies,
collection attributes, and expressing dependencies that are
evaluated to a fixed point. Rather than implementing
extensions natively in an attribute evaluator, we propose
attribute decorators that describe an abstract
evaluation mechanism for attributes, making it possible to
provide such extensions as part of a library of
decorators. Inspired by strategic programming, they are
specified using generic traversal operators. To demonstrate
their effectiveness, we describe how to employ decorators in
name, type, and flow analysis.
The ideas have been implemented in Aster, an extension of Stratego
with reference attribute grammars.
December 12, 2008 09:45
December 04, 2008
I'm working on the specification of pointer analysis for Java using Datalog. Basically, a pointer analysis computes for each variable in a program the set of objects it may point to at run-time.
For this purpose I need to express parts of the JVM Spec in Datalog as well. As a simple example, the following Datalog rules define when a class is a subclass of another class.
/**
* JVM Spec:
* - A class A is a subclass of a class C if A is a direct
* subclass of C
*/
Subclass(?c, ?a) <-
DirectSubclass[?a] = ?c.
/**
* JVM Spec:
* - A class A is a subclass of a class C if there is a direct
* subclass B of C and class A is a subclass of B
*/
Subclass(?c, ?a) <-
Subclass(?b, ?a),
DirectSubclass[?b] = ?c.
As you can see, this is remarkably close to the original specification (quoted in comments). You can clearly see the relationship between the spec and the code, even if you are not familiar with Datalog.
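For readers who do not know Datalog, the two rules can be read as a least fixed point: start from the direct subclass facts and keep applying the second rule until nothing new is derived. The Python sketch below is only an illustration of that reading. The `direct_subclass` dictionary, the three-class hierarchy, and the function name are made up for the example, and the naive loop says nothing about how a real Datalog engine evaluates the rules.

```python
# Hypothetical input facts: DirectSubclass[?a] = ?c maps a class ?a
# to its direct superclass ?c.
direct_subclass = {
    "java.lang.Integer": "java.lang.Number",
    "java.lang.Number": "java.lang.Object",
}


def subclass_facts(direct):
    """Return all pairs (c, a) such that class a is a subclass of class c."""
    # Rule 1: Subclass(?c, ?a) <- DirectSubclass[?a] = ?c.
    facts = {(c, a) for a, c in direct.items()}
    changed = True
    while changed:
        changed = False
        # Rule 2: Subclass(?c, ?a) <- Subclass(?b, ?a), DirectSubclass[?b] = ?c.
        for (b, a) in list(facts):
            c = direct.get(b)
            if c is not None and (c, a) not in facts:
                facts.add((c, a))
                changed = True
    return facts


subclass = subclass_facts(direct_subclass)
print(sorted(subclass))
```

The loop stops as soon as a full pass adds no new pair, which is the same termination argument that makes the recursive Datalog rule well defined.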
Recently, I was working on the specification of the checkcast instruction. This instruction checks at run-time whether an object can be cast to some type. The JVM Spec for checkcast first defines some variables:
The following rules are used to determine whether an objectref that
is not null can be cast to the resolved type: if S is the class of
the object referred to by objectref and T is the resolved class,
array, or interface type, checkcast determines whether objectref can
be cast to type T as follows:
So, this basically says that we're checking the cast (T) S.
The first rule for this cast is straightforward:
If S is an ordinary (nonarray) class, then:
- If T is a class type, then S must be the same class as T, or a
subclass of T.
- If T is an interface type, then S must implement interface
T.
Well, if you're somewhat familiar with Java, or object-oriented
programming, then this part is obvious. Again, the specification in
Datalog is easy:
CheckCast(?s, ?s) <-
ClassType(?s).
CheckCast(?s, ?t) <-
Subclass(?t, ?s).
CheckCast(?s, ?t) <-
ClassType(?s),
Superinterface(?t, ?s).
However, the next alternative in the specification is confusing:
If S is an interface type, then:
- If T is a class type, then T must be Object.
- If T is an interface type, then T must be the same interface as
S or a superinterface of S.
The specification is crystal clear, but how can S ever be an interface
type? S is the type of the object that is being cast, and how can an
object ever have a run-time type that is an interface? Of course, the
static type of an expression can be an interface, but we're talking
about the run-time here!
I searched the web, which only resulted in a few hits. There was one question on a Sun forum years ago, where the one answer didn't make a lot of sense.
It turns out that this is indeed an `impossible' case. The reason why
this item is in the specification, is because checkcast is recursively
defined for arrays:
If S is a class representing the array type SC[], that is, an array of
components of type SC, then:
- ...
- If T is an array type TC[], that is, an array of components of
type TC, then one of the following must be true:
- ...
- TC and SC are reference types, and type SC can be cast to TC
by recursive application of these rules.
So, if you have an object of type List[] that is cast to
a Collection[], then the rules for checkcast get
recursively invoked for the types S = List and T =
Collection. Notice that List is an interface, but an object can
have type List[] at run-time. I have not verified this with the JVM
Spec maintainers, but as far as I can see, this is the only reason why
the rule for interface types is there.
Just to show a little bit more of my specifications, here is the rule
for the array case I just quoted from the JVM Spec:
CheckCast(?s, ?t) <-
ComponentType[?s] = ?sc,
ComponentType[?t] = ?tc,
ReferenceType(?sc),
ReferenceType(?tc),
CheckCast(?sc, ?tc).
Isn't it beautiful how this exactly corresponds to the formal
specification?
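To see how the interface rule is only ever reached through the array rule, here is a small Python sketch of the rules quoted above. It is a rough model under assumptions, not the Datalog specification itself: the class hierarchy is a hand-written, simplified fragment, all names (`SUPERCLASS`, `DIRECT_INTERFACES`, `INTERFACE_TYPES`, `check_cast`) are hypothetical, and the parts of the array rule that are elided with "..." in the quote are deliberately not modeled.

```python
# Hypothetical, simplified fragment of the Java library hierarchy.
SUPERCLASS = {  # class -> direct superclass
    "java.util.ArrayList": "java.util.AbstractList",
    "java.util.AbstractList": "java.util.AbstractCollection",
    "java.util.AbstractCollection": "java.lang.Object",
}
DIRECT_INTERFACES = {  # class or interface -> direct superinterfaces
    "java.util.ArrayList": ["java.util.List"],
    "java.util.List": ["java.util.Collection"],
}
INTERFACE_TYPES = {"java.util.List", "java.util.Collection"}


def superclasses(c):
    """All proper superclasses of class c, nearest first."""
    result = []
    while c in SUPERCLASS:
        c = SUPERCLASS[c]
        result.append(c)
    return result


def superinterfaces(t):
    """All interfaces that t implements or extends, directly or indirectly."""
    todo = list(DIRECT_INTERFACES.get(t, []))
    for c in superclasses(t):  # interfaces inherited via superclasses
        todo.extend(DIRECT_INTERFACES.get(c, []))
    seen = set()
    while todo:
        i = todo.pop()
        if i not in seen:
            seen.add(i)
            todo.extend(DIRECT_INTERFACES.get(i, []))
    return seen


def check_cast(s, t):
    """Can an object with run-time type s be cast to type t?"""
    if s.endswith("[]"):
        # Array rule: only the quoted case (both component types are
        # reference types) is modeled; the elided cases return False here.
        if t.endswith("[]"):
            return check_cast(s[:-2], t[:-2])
        return False
    if s in INTERFACE_TYPES:
        # Reached only through the recursive array rule above.
        if t in INTERFACE_TYPES:
            return t == s or t in superinterfaces(s)
        return t == "java.lang.Object"
    # s is an ordinary (nonarray) class.
    if t in INTERFACE_TYPES:
        return t in superinterfaces(s)
    return t == s or t in superclasses(s)


print(check_cast("java.util.List[]", "java.util.Collection[]"))
```

The call at the end strips the array brackets and recurses with S = List and T = Collection, which is exactly the situation in which S is an interface type.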
Unfortunately, even formal specifications can have errors, so I also
specified a large testsuite that checks the specifications with
concrete code. Here are some of the tests for CheckCast.
test Casting to self
using database tests/hello/Empty.jar
assert
CheckCast("java.lang.Integer", "java.lang.Integer")
test Casting to superclasses
using database tests/hello/Empty.jar
assert
CheckCast("java.lang.Integer", "java.lang.Number")
CheckCast("java.lang.Integer", "java.lang.Object")
test Cast ArrayList to various superinterfaces
using database tests/hello/Arrays.jar
assert
CheckCast("java.util.ArrayList", "java.util.List")
CheckCast("java.util.ArrayList", "java.util.Collection")
CheckCast("java.util.ArrayList", "java.io.Serializable")
test Cast class[] to implemented interface[]
using database tests/hello/Arrays.jar
assert
CheckCast("java.util.ArrayList[]", "java.util.List[]")
CheckCast("java.lang.Integer[]", "java.io.Serializable[]")
test Cast interface[] to superinterface[]
using database tests/hello/Arrays.jar
assert
CheckCast("java.util.List[]", "java.util.Collection[]")
The tests are specified in a little domain-specific language for
unit-testing Datalog that I implemented, initially for IRIS and later for LogicBlox. This tool is similar to
parse-unit,
a tool I wrote earlier for testing parsers in Stratego/XT. The concise syntax
of a test encourages you to write a lot of tests. Domain-specific
languages rock for this purpose!
December 04, 2008 15:03
July 11, 2008
The paper "WebWorkFlow: An Object-Oriented Workflow Modeling Language for Web Applications" by Zef Hemel, Ruben Verhaaf and Eelco Visser has been accepted for presentation at the conference on Model Driven Engineering Languages and Systems (MODELS 2008) to be held in Toulouse, France at the end of September 2008.
Abstract:
Workflow languages are designed for the high-level description
of processes and are typically not suitable for the generation
of complete applications.
In this paper, we present WebWorkFlow, an object-oriented
workflow modeling language for the high-level description of
workflows in web applications.
Workflow descriptions define procedures operating on domain
objects. Procedures are composed using sequential and
concurrent process combinators.
WebWorkFlow is an embedded language, extending WebDSL, a
domain-specific language for web application development, with
workflow abstractions.
The extension is implemented by means of model-to-model
transformations.
Rather than providing an exclusive workflow language,
WebWorkFlow supports interaction with the underlying WebDSL
language. WebWorkFlow supports most of the basic workflow
control patterns.
July 11, 2008 19:39