Planet Stratego

You have a term and it don't look good, who're you gonna call?

May 08, 2008

Eelco Visser

Stratego: 10 years

-------------------------------------------------------------------------
Dear author,

We are pleased to inform you that your paper entitled

     Building Program Optimizers with Rewriting [Strategies]

has been accepted for presentation at ICFP'98. In a next message, you
will receive reviews from the Program Committee that we hope you can
use to improve the final draft, which is due on July 14th. We will
also be sending you an ACM Copyright Release form, which must be
signed and returned by the same deadline.

Congratulations, and thank you for your submittal to ICFP'98.

  Paul Hudak
  Christian Queinnec
  co-Chairs, ICFP'98 Program Committee
-------------------------------------------------------------------------
The text of an email from June 21, 1998 that announced the acceptance at ICFP'98 of the first paper on Stratego, written with Zino Benaissa and Andrew Tolmach. The implementation of the language for the paper marked the first version of the language and compiler. The idea of traversal strategies had been explored before, embedded in ASF+SDF. Just before the conference in September I managed to bootstrap the compiler. The language did not have a name in that paper yet. And the reviews were not very enthusiastic.

May 08, 2008 18:54

April 04, 2008

Eelco Visser

Declarative Access Control for WebDSL

The paper "Declarative Access Control for WebDSL: Combining Language Integration and Separation of Concerns" by Danny Groenewegen and Eelco Visser has been accepted for presentation at the International Conference on Web Engineering (ICWE'08), which will be held in July 2008 in Yorktown Heights, New York. [pdf]

I'm especially proud of this acceptance as it is (1) based on the Master's thesis work of Danny Groenewegen, and (2) the first paper about WebDSL to be accepted in the web engineering research community. (And also the first attempt; the other two papers featuring WebDSL appear in transformation venues.)

Abstract: In this paper, we present the extension of WebDSL, a domain-specific language for web application development, with abstractions for declarative definition of access control. The extension supports the definition of a wide range of access control policies concisely and transparently as a separate concern. In addition to regulating the access to pages and actions, access control rules are used to infer navigation options not accessible to the current user, preventing the presentation of inaccessible links. The extension is an illustration of a general approach to the design of domain-specific languages for different technical domains to support separation of concerns in application development, while preserving linguistic integration. This approach is realized by means of a transformational semantics that weaves separately defined aspects into an integrated implementation.

April 04, 2008 20:27

March 19, 2008

Eelco Visser

Code Generation by Model Transformation

The paper "Code Generation by Model Transformation" by Zef Hemel, Lennart Kats, and Eelco Visser was accepted for presentation at the International Conference on Model Transformation (ICMT'08).

Abstract: The realization of model-driven software development requires effective techniques for implementing code generators. In this paper, we present a case study of code generation by model transformation with Stratego, a high-level transformation language based on the paradigm of rewrite rules with programmable strategies that integrates model-to-model, model-to-code, and code-to-code transformations. The use of concrete object syntax guarantees syntactic correctness of code patterns, and supports the subsequent transformation of generated code. The composability of strategies supports two dimensions of transformation modularity. Vertical modularity is achieved by designing a generator as a pipeline of model-to-model transformations that gradually transforms a high-level input model to an implementation. Horizontal modularity is achieved by supporting the definition of plugins which implement all aspects of a language feature. We discuss the application of these techniques in the implementation of WebDSL, a domain-specific language for dynamic web applications with a rich data model.

March 19, 2008 11:14

March 18, 2008

Eelco Visser

Attribute Grammars in Stratego II

Today we had a follow-up on last week's discussion about an attribute grammar extension of Stratego. By now, Nicolas Pierron has created a proper extension of Stratego with attribute equations and made a translation to basic Stratego in combination with the Transformers run-time extension for attribute evaluation support. Next up is the port of the copy rules generator that makes writing attribute equations much less verbose. In the meantime Lennart Kats is working on a JastAdd-style implementation and Tony Sloane on an Eli-style (static scheduling) implementation. With these implementations in place we will be able to do some proper exploration of the combination of attribute evaluation and rewriting (strategies). I can't wait to make an implementation of the WebDSL typechecker using the attribute extension. To be continued.

March 18, 2008 11:21

March 14, 2008

Karl Trygve Kalleberg

Proper navigation support for Spoofax

I finally figured out how to add proper navigation history support to Spoofax today. This one has been bugging me for quite some time. I remember spending far too much time diving through the documentation with the hopes of figuring out how this should be done properly. No luck.

Today I had a flash of inspiration, so I dug into the JDT code base. That code seemed to solve the same problem in a very complicated way, so I didn't want to copy their approach outright. Stymied, I started tracing exactly what happens with the navigation history when positions are placed into it. After a bit of fiddling around, I figured out that when I move the cursor, I should mark the position both before and after the cursor/focus moves to get the behaviour of JDT (which I tried to emulate). Until now I had only tried saving the editor location state either before I changed it or afterwards, never both. I also tried all kinds of alternative calls on the EditorPart hierarchy in vain. I now use ITextEditor.setHighlightRange() which appears to do the job, provided I call markInNavigationHistory() "properly".

Anyway, the lesson is simple: if you call AbstractTextEditor.markInNavigationHistory(), remember to do it twice -- once before you change the editor/focus and once afterwards.
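The call order can be sketched without Eclipse on the classpath. The StubEditor class below is a hypothetical stand-in, not Spoofax or Eclipse code: it only mirrors the names of the two editor calls mentioned above and records which offsets get marked, on the assumption that a mark captures the cursor position at the time of the call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an Eclipse text editor. It only records which
// offsets were marked, so the call order can be observed without Eclipse.
class StubEditor {
    private int cursor = 0;
    final List<Integer> history = new ArrayList<>();

    // Plays the role of AbstractTextEditor.markInNavigationHistory().
    void markInNavigationHistory() {
        history.add(cursor);
    }

    // Plays the role of ITextEditor.setHighlightRange(offset, length, moveCursor).
    void setHighlightRange(int offset, int length, boolean moveCursor) {
        if (moveCursor) {
            cursor = offset;
        }
    }
}

class NavigationDemo {
    // Jump to a new offset, marking the location on both sides of the move.
    static void jumpTo(StubEditor editor, int offset, int length) {
        editor.markInNavigationHistory();               // where we came from
        editor.setHighlightRange(offset, length, true); // move the cursor
        editor.markInNavigationHistory();               // where we landed
    }

    public static void main(String[] args) {
        StubEditor editor = new StubEditor();
        jumpTo(editor, 120, 10);
        System.out.println(editor.history); // prints [0, 120]
    }
}
```

With only one of the two marks, the stub history would hold either the origin or the destination, which matches the half-working behaviour described above.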

March 14, 2008 16:46

March 13, 2008

Karl Trygve Kalleberg

Porting Eclipse IMP from Eclipse 3.2 to 3.3

It's official: I'm the bootstrapper. My hacking life in the last few weeks has hardly been anything but bootstrapping. I've already said a few things about the Stratego compiler hacking. Since it takes ~3-4 hours for a full build of the Stratego compiler in the Delft buildfarm, I've had a couple of other projects to dive into in parallel. One of these has been the porting of Eclipse IMP from Eclipse 3.2 to 3.3.

In short, IMP is an IDE generator based on Eclipse. It provides a set of plugins and wizards that make the development of programming language environments (a lot) easier. The basic workflow when building an IDE for your favourite language with IMP is: (1) provide a grammar defined using the LPG grammar language, (2) use the IMP-provided wizards inside Eclipse to generate things like syntax highlighting support, outline support, code folding support, templates, text hovers, etc., then (3) fill in the skeletons provided by the generator. My personal view (subject to change without warning) of the generated code is that it's a guide to which parts of the Eclipse framework you need to extend in order to provide a given piece of functionality. Sort of a little helpful gnome pointing you in the right direction. In some cases, the generated code will actually do all you want, but more often than not, you will want to go beyond it.

That was the backgrounder on IMP. A major drawback of the current IMP releases is that they will only work on 3.2. Oh, and, of course, that IMP requires IMP to build IMP. Getting this beast ported to 3.3 wasn't as straightforward as I'd hoped. It took a few iterations. The first was getting it to build properly without any problems on my plain 3.2 installation. That took me several days. All kinds of subtle bugs surfaced, presumably because I have a different set of development habits than the IMPers.

Once those were patched and fixed upstream, I managed to bootstrap my first version on 3.2. A battle with race conditions in the startup code of various plug-ins followed. I hate static initializers, but apparently not everybody does. In a multi-plugin architecture where the order of plugin loading is not guaranteed, I cannot see how you can safely assume the order of static initializers across plugins, but those questions are not for me to ponder. I ripped them out, and replaced them with lazy initializers as far as possible, and that worked wonders. With that hurdle out of the way, it was all downhill: a couple of internal JFace and JDT classes had changed locations and APIs between 3.2 and 3.3, but it was quick enough to rewrite the offending code (another reason why depending on internal APIs is a bitch, though I realize that the features in question could not have been provided without doing so).
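For readers who haven't done this kind of replacement, here is a minimal sketch of one common lazy-initializer idiom in Java, the initialization-on-demand holder. It is not the actual IMP patch; LanguageRegistry and its contents are made-up names for illustration. The point is that the instance is built on first use instead of whenever the class happens to be loaded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical registry standing in for state that one plug-in sets up and
// another plug-in reads. The name and contents are illustrative only.
final class LanguageRegistry {
    // Records construction so the demo can observe when initialization happens.
    static final List<String> LOG = new ArrayList<>();

    private final List<String> languages = new ArrayList<>();

    private LanguageRegistry() {
        LOG.add("registry created");
        languages.add("stratego");
    }

    // Initialization-on-demand holder: the JVM initializes Holder, and with it
    // INSTANCE, on the first call to getInstance(), and does so thread-safely.
    private static final class Holder {
        static final LanguageRegistry INSTANCE = new LanguageRegistry();
    }

    static LanguageRegistry getInstance() {
        return Holder.INSTANCE;
    }

    List<String> languages() {
        return languages;
    }
}

class LazyInitDemo {
    public static void main(String[] args) {
        // Touching LOG loads LanguageRegistry but does not build the instance.
        System.out.println("before first use: " + LanguageRegistry.LOG);
        System.out.println("languages: " + LanguageRegistry.getInstance().languages());
        System.out.println("after first use: " + LanguageRegistry.LOG);
    }
}
```

Because nothing is constructed until some plug-in actually asks for the instance, the result no longer depends on which plug-in class gets loaded first.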

It's a huge disappointment to realize that my patches are only a couple of hundred lines. I felt like I had to rewrite the world, at places... Anyway, here's hoping for its inclusion in one of the pending releases. I've updated our sdf2imp tool to use the 3.3-based IMP, so we're already seeing a return on my investment:)


March 13, 2008 16:33

March 11, 2008

Eelco Visser

Attribute Grammars in Stratego

This morning I gave my very first lecture on attribute grammars, featuring Knuth's binary numbers AG (ripped from the jastadd.org site), and my own implementation of WebDSL data models in JastAdd. In the slides I explored the implementation model of JastAdd by including the code it generates. While I had not added any JastAdd code since last week, the presentation further improved my understanding of what is going on in JastAdd. At the same time, I'm realizing I do not fully understand the design space we are considering, and the distinction between notational and essential differences between the various approaches.

In the afternoon, Nicolas Pierron, a student from the Transformers group at Epita who is doing an internship in Delft, showed the code of his implementation of WebDSL typechecking in the Transformers SDF attribute grammar extension, which is implemented by generating Stratego code with some native code hacks. The hacks basically implement thunks of deferred attribute evaluations for nodes in the tree with pointers to dependent attribute value thunks. Attributes are then evaluated on demand, and thus no static scheduling is done. This approach is essentially the same as that of the encoding of attribute grammars in a lazy functional language such as Haskell (a hobby of my former colleagues in Utrecht). If we would just add heap-bound closures to Stratego, implementing this pattern would become that much easier. The Transformers AG code operates on ATerms without sharing. Again, I'm not sure about the implications of this design choice. Are the Stratego libraries at all usable without maximal sharing? I guess I should ask Nicolas tomorrow. To be continued.
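To make the thunk idea concrete, here is a small Java sketch of demand-driven attribute evaluation on a binary numbers example. It is not the Transformers implementation and not JastAdd-generated code; Thunk, Bit, and BinaryNumber are names invented for this illustration. Each attribute is a memoizing thunk that may refer to other attribute thunks, and nothing is computed until a value is demanded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// A memoizing thunk: the computation runs at most once, on first demand.
final class Thunk<T> {
    private Supplier<T> compute;
    private T value;
    private boolean forced;

    Thunk(Supplier<T> compute) {
        this.compute = compute;
    }

    boolean isForced() {
        return forced;
    }

    T force() {
        if (!forced) {
            value = compute.get();
            forced = true;
            compute = null; // drop the closure once the value is cached
        }
        return value;
    }
}

// One binary digit with an inherited attribute (scale) and a synthesized one (value).
final class Bit {
    Thunk<Integer> scale;       // inherited: set by the parent node
    final Thunk<Integer> value; // synthesized: digit * 2^scale

    Bit(int digit) {
        // The thunk points at the scale thunk, which is read only when forced.
        this.value = new Thunk<>(() -> digit << scale.force());
    }
}

final class BinaryNumber {
    final List<Bit> bits = new ArrayList<>();
    final Thunk<Integer> value; // synthesized: sum of the bit values

    BinaryNumber(String text) {
        for (char c : text.toCharArray()) {
            bits.add(new Bit(c - '0'));
        }
        int n = bits.size();
        for (int i = 0; i < n; i++) {
            final int position = n - 1 - i; // leftmost bit has the highest scale
            bits.get(i).scale = new Thunk<>(() -> position);
        }
        value = new Thunk<>(() -> {
            int sum = 0;
            for (Bit b : bits) {
                sum += b.value.force();
            }
            return sum;
        });
    }
}

class AttributeDemo {
    public static void main(String[] args) {
        BinaryNumber number = new BinaryNumber("1101");
        System.out.println(number.value.isForced()); // prints false
        System.out.println(number.value.force());    // prints 13
    }
}
```

No evaluation order is fixed in advance: forcing the number's value pulls in the bit values, which in turn pull in their scales, which is the on-demand behaviour without static scheduling described above.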

March 11, 2008 21:41

Karl Trygve Kalleberg

Stack tracing improvements

A limitation of my previous stack tracing patches was that io-wrap and io-stream-wrap did not properly report traces on failure. The reason for this is easy to spot if we look at how the error is handled (this is where execution flow ends up when you call io-wrap):

  option-wrap(opts, usage, about, announce, s) =
    parse-options(opts, usage, about)
    ; announce
    ; (s; report-success + report-failure)

  report-failure =
      report-run-time
    ; <fprintnl> (stderr(), [ <whoami> (), ": rewriting failed"])
    ; <exit> 1

As you can imagine, even though the program now happily prints a stack trace when the main strategy exits with a failure, it will not be printed when exit is called.

I've introduced a couple of stack introspection functions for dealing with this: stacktrace-get-current-frame-name returns the name of the current frame, stacktrace-get-all-frame-names returns a list of all frame names, and stacktrace-get-current-frame-index returns an integer that holds the current depth of the stack. These are actually implemented by primitives in the Stratego Standard Library (SSL).

A caveat of these strategies is that calling them will of course alter the stack. Even in the wonderful world of computing, we're not entirely free of Heisenbergian effects, apparently. However, there's a simple workaround: call the primitives directly, since this bypasses the way the compiler registers the stack frames.

With this trick in hand, I rewrote the two above strategies to include proper stack tracing for io-wrap:

  option-wrap(opts, usage, about, announce, s) =
    parse-options(opts, usage, about)
    ; announce
    ; (s; report-success + prim("SSL_stacktrace_get_all_frame_names") ; report-failure)

  report-failure =
      ?stacktrace
    ; report-run-time
    ; <fprintnl> (stderr(), [ <whoami> (), ": rewriting failed, trace:"])
    ; <reverse ; map(<fprintnl> (stderr(), ["\t", <id>]))> stacktrace
    ; <exit> 1

Applying the modified io-wrap on the following sample program

 
  main = io-wrap(my-wrap(foo))

  my-wrap(s) = s

  foo = debug(!"foo") ; bar

  bar = debug(!"bar") ; fap ; zap

  fap = debug(!"fap") ; id

  zap = debug(!"zap") ; debug ; fail

gives

./prog: rewriting failed, trace:
        main_0_0
        io_wrap_1_0
        option_wrap_5_0
        lifted144
        input_1_0
        lifted145
        output_1_0
        lifted0
        my_wrap_1_0
        foo_0_0
        bar_0_0
        zap_0_0

Due to the compiler lifting inner strategies into freshly named, top-level strategies, the trace will contain some lifted* entries. Also, should you call strategies or rules which are compiled with older versions of the compiler, there will be "dark spots" in your trace. It won't be truncated -- only the frames due to the old library will be hidden.

March 11, 2008 13:05

March 10, 2008

Karl Trygve Kalleberg

Would you like a stack trace with your "rewriting failed"?

Prompted by my visit to EPITA, I hacked together some very basic support for stack traces in Stratego that might come in handy when a Stratego program fails.

Here's a simple Stratego program, called prog (which, if you look at it closely, will always fail):

  main = foo

  foo = bar

  bar = fap ; zap

  fap = id

  zap = fail

On the latest and greatest version of the compiler (build 17522 and later), you will get the following trace when this program is executed:

prog: rewriting failed, trace:
        main_0_0
        foo_0_0
        bar_0_0
        zap_0_0 

There are a number of caveats with the tracing that I will try to get rid of, and, when there are only very hard problems left, explain myself out of, in a couple of future posts.

March 10, 2008 16:49

February 28, 2008

Eelco Visser

WebDSL in JastAdd

Earlier this week I was visiting the Programming Tools Group of Oege de Moor in Oxford. The visit was associated with the refactoring project in which I am an official collaborator. The project aims at studying the high-level description of refactorings. The starting point is to combine the strengths of program analysis with attribute grammars and program transformation with (strategic) rewriting based approaches. They are experimenting with the JastAdd-based Java compiler. For me the project is interesting as we are considering integrating attribute grammars in Stratego. To start exploring this topic I've started writing an implementation of WebDSL in JastAdd together with Torbjörn Ekman, the main developer of JastAdd, who is a postdoc in Oxford. We got to implement a basic version of the data model sub-language complete with modules and code generation (using the StringTemplate library of Terence Parr). I'm planning to finish the implementation of the data model and use that as the basis for my lecture on JastAdd in the program transformation course. (And then hand the implementation to the students who can then add the implementation of (a subset of) the UI language.) We had further discussions about the relation between JastAdd and Stratego; although quite different at first sight, there are interesting similarities (that I hadn't realised before). More to follow about that in the future.

February 28, 2008 17:00

February 27, 2008

Karl Trygve Kalleberg

Visit to EPITA

I visited Akim Demaille and his posse at EPITA today, and apparently there still is such a thing as a free lunch (although, in my excitement over the good food, I kinda promised to help out with fixing some Stratego issues they are experiencing, so it was not entirely without entanglements).

I got to sit in on one of the bi-weekly status updates for the LRDE. The room numbered a little under 30 people, including students and faculty. They were kind enough to hold the meeting in English so that I could follow it. I found it surprising and very encouraging to have everybody report their progress (and, in a very few instances, lack thereof) in front of the entire lab. I've been missing this in many of the institutions I've been working at. It certainly increases the level of team feeling, and also makes it easier to uncover opportunities for collaboration between the various groups. For example, they all shared a lot of common infrastructure, including setups for newsgroups, a build farm, svn repos, etc.

I met two of the guys from the "previous" Transformers generation, Florian and Maxime. Florian was putting the finishing touches on a visualization tool for ambiguities in Transformers' attributed parse trees. It looked pretty sweet. Maxime was hacking a translator from a DSL for their Olena image processing library.

I also got to meet the new generation of Transformer students. I expect that I'll interact a lot more with them in the coming months, as they come to grips with Stratego.


February 27, 2008 17:55

February 21, 2008

Eelco Visser

WebDSL at Code Generation 2008

My proposal for a tutorial on 'WebDSL: A Case Study in Domain-Specific Language Engineering' has been accepted by the organizers of Code Generation 2008. This conference emerged from the codegeneration.net site, which collects information about code generation techniques and tools. As opposed to the conferences I usually visit, this one attracts quite a crowd from industry, I understand. Last year's event was quite a success, according to attendees I talked to, so I'm looking forward to the event in general, and the opportunity to present Stratego/XT, SDF, and WebDSL to industry in particular. Now to think how to squeeze that into 75 min;)

February 21, 2008 21:27

February 13, 2008

Eelco Visser

Pull Deployment of Services

The NWO/EZ Jacquard Software Engineering Program has awarded the project Pull Deployment of Services a grant of 368K Euro, which should pay for a PhD student (4 years) and a postdoc (3 years). In the project, for which I am principal investigator, we collaborate with Merijn de Jonge from Philips Research and the buildfarm project at TU Delft, in which software deployment expert Eelco Dolstra is a postdoc. Here's the text from the proposal summary:

Hospitals are complex organizations, requiring the coordination of specialists and support staff operating complex medical equipment, involving large data sets, to take care of the health of large numbers of patients. The information technology infrastructure of hospitals is heterogeneous and may consist of thousands of electronic devices, ranging from workstations to medical equipment such as MRI scanners. These devices are connected by wired and wireless networks with complex topologies with different security and privacy policies applicable to different nodes. Software deployment in such a heterogeneous environment is inherently difficult. In order to make health-care professionals more effective and deployment and maintenance more tractable, the hospital information technology infrastructure is changing from a device-oriented to a service-oriented environment, in which the access to services is decoupled from the physical access to particular devices.

In this project, we propose a pull model for service deployment in which the components comprising a service are distributed over nodes in the network, depending on the network topology, properties of the application, and quality of service requirements. The goal of this project is to expand the state-of-the-art in software deployment to support pull deployment of services. In order to realize this goal we will conduct research in (1) modeling of services and network architectures, (2) technology for distributed deployment, and (3) tools for testing implementations of distributed services. We will build on our previous research in software deployment (Nix) and model-based software development (Stratego/XT). The project will be conducted in close collaboration with Philips as industrial partner and will consist of a series of experiments building prototype systems which implement service distribution scenarios of increasing complexity.

February 13, 2008 10:37

February 10, 2008

Karl Trygve Kalleberg

FOSDEM 2008

I'm going to FOSDEM again this year. A bunch of old friends will be coming, so the opportunity is too good to pass up. Also, since I'll be in Paris at the time around FOSDEM, travel is both fast and reasonably cheap. (Three cheers for high speed trains.)

If you're interested in meeting me there, don't hesitate to fire off an e-mail. There's no Gentoo room this year, so I'll be hanging around elsewhere. I'm bound to drop by the Free Java devroom, for sure:) Another gang I'm anxious to meet again are the Nix people.


February 10, 2008 13:27

February 05, 2008

Eelco Visser

Program Transformation & Generation

Today I started teaching a new master's course on Program Transformation & Generation at Delft University. The course studies principles, techniques, and applications of program transformation and generation. Using WebDSL as a case study, several paradigms for implementing domain-specific languages will be studied, including term rewriting (Stratego), attribute grammars (Eli, JastAdd), and graph transformation. This is a departure from earlier courses I taught at Utrecht University about the same subject. There I would spend a full quarter teaching just Stratego/XT, which I felt was necessary to prepare master's students for a master project in this area. With the current state of documentation of Stratego/XT, it appears that such an in-depth course is no longer necessary. At least, that is the experience with several (PhD) students who recently started developing Stratego applications successfully without any prior training.

February 05, 2008 20:30

February 04, 2008

Eelco Visser

Generating Editors for Embedded Languages

The paper "Generating Editors for Embedded Languages: Integrating SGLR into IMP" by Lennart Kats, Karl Trygve Kalleberg, and Eelco Visser has been accepted by Language Descriptions, Tools, and Applications (LDTA'08) to be held in Budapest, Hungary in April 2008 as part of ETAPS'08. The paper reports on the successful integration of the SGLR parser in the IMP framework for building language-specific Eclipse plugins. Through this integration the capability of SDF/SGLR to support language embeddings is extended to the IDE. This project is a first step towards generation of full-fledged IDEs from SDF/Stratego language definitions. From the abstract:

Integrated Development Environments (IDEs) increase productivity by providing a rich user interface and rapid feedback for a specific language. Creating an editor for a specific language is not a trivial undertaking, and is a cumbersome task even when working with an extensible framework such as Eclipse. The IMP framework relieves the IDE developer from a significant portion of the required work by providing various abstractions for this. For embedded languages, such as embedded regular expressions, SQL queries, or code generation templates, its LALR parser generator falls short, however. Scannerless parsing with SGLR enables concise, modular definition of such languages. In this paper, we present an integration of SGLR into IMP, demonstrating that a scannerless parser can be successfully integrated into an IDE. Given an SDF syntax definition, the sdf2imp tool automatically generates an editor plugin based on the IMP API, complete with syntax checking, syntax highlighting, outline view, and code folding. Using declarative domain-specific languages, these services can be customized, and using the IMP metatooling framework it can be extended with other features.

February 04, 2008 14:15

January 28, 2008

Karl Trygve Kalleberg

New dissertation page and other Stratego updates

Martin recently got his PhD. It's very well deserved. I've seen first hand how serious and focused he's been for the last 4+ years.

Inspired by his didactical skills, I decided to rearrange my own dissertation page so that the individual chapters of my dissertation are easily downloadable.

Since I don't expect anybody to have either the time or the inclination to read the entire thesis from start to finish, Martin's idea of making it available as a split download makes a lot of sense.

Having done this, I got inspired to continue with spring (winter?) cleaning on a lot of other pieces of my PhD work.

I've set up an Ant Ivy repository for Spoofax. This means that you are now able to check out the various Spoofax subprojects from the source code repository and expect each subproject to compile separately, since all its dependencies will be fetched from my Ivy repo. Some of the subprojects require Eclipse. For those, you must run a script, fetch.sh, which will pick out the necessary jars from your Eclipse installation. It would be best to have this repo hosted along with the rest of Stratego/XT, since it's definitely part of the Stratego/XT umbrella, but the new infrastructure in Delft is still being set up, I've been told.

Trying my hand as a webmonkey, I've decided to upload new Spoofax pages with a revamped design.

With those things out of the way, I'm now working on a reflection API for Stratego/J so that we may easily instantiate Java objects and call methods on them from Stratego scripts. This is needed for another project I'm cooking. However, I keep running into the lack of a fully interactive Stratego interpreter on the JVM, and that's a very itchy spot just now...;)


January 28, 2008 20:23

January 20, 2008

Martin Bravenboer

Ph.D. Thesis: Exercises in Free Syntax

It has been awfully quiet here, I'm sorry about that. There are a few reasons for that. The first one is that I assembled my PhD thesis from my publications. This took quite some time and energy, but the result is great! My dissertation Exercises in Free Syntax is available online. If you are interested in having a dead tree version, just let me know!

I will defend my thesis tomorrow, January 21 (see the Dutch announcement). It's weird to realize that tomorrow is the culmination of 4 years of working intensely!

For the library I created an English abstract. To give you an idea what the thesis is about, let me quote it here:

In modern software development the use of multiple software languages to constitute a single application is ubiquitous. Despite the omnipresent use of combinations of languages, the principles and techniques for using languages together are ad-hoc, unfriendly to programmers, and result in a poor level of integration. We work towards a principled and generic solution to language extension by studying the applicability of modular syntax definition, scannerless parsing, generalized parsing algorithms, and program transformations.

We describe MetaBorg, a method for providing concrete syntax for domain abstractions to application programmers. Since object-oriented languages are designed for extensibility and reuse, the language constructs are often sufficient for expressing domain abstractions at the semantic level. However, they do not provide the right abstractions at the syntactic level. The MetaBorg method consists of embedding domain-specific languages in a general purpose host language and assimilating the embedded domain code into the surrounding host code. Instead of extending the implementation of the host language, the assimilation phase implements domain abstractions in terms of existing APIs leaving the host language undisturbed.

We present a solution to injection vulnerabilities. Software written in one language often needs to construct sentences in another language, such as SQL queries, XML output, or shell command invocations. This is almost always done using unhygienic string manipulation. A client can then supply specially crafted input that causes the constructed sentence to be interpreted in an unintended way, leading to an injection attack. We describe a more natural style of programming that yields code that is impervious to injections by construction. Our approach embeds the grammars of the guest languages into that of the host language and automatically generates code that maps the embedded language to constructs in the host language that reconstruct the embedded sentences, adding escaping functions where appropriate.

We study AspectJ as a typical example of a language conglomerate, i.e. a language composed of a number of separate languages with different syntactic styles. We show that the combination of the lexical syntax leads to considerable complexity in the lexical states to be processed. We show how scannerless parsing elegantly addresses this. We present the design of a modular, extensible, and formal definition of the lexical and context-free aspects of the AspectJ syntax. We introduce grammar mixins, which allow the declarative definition of keyword policies and combination of extensions.

We introduce separate compilation of grammars to enable deployment of languages as plugins to a compiler. Current extensible compilers focus on source-level extensibility, which requires users to compile the compiler with a specific configuration of extensions. A compound parser needs to be generated for every combination. We introduce an algorithm for parse table composition to support separate compilation of grammars to parse table components. Parse table components can be composed (linked) efficiently at runtime, i.e. just before parsing. For realistic language combination scenarios involving grammars for real languages, our parse table composition algorithm is an order of magnitude faster than computation of the parse table for the combined grammars, making online language composition feasible.

Also, they asked me for a Dutch, non-technical summary for news websites. Translated into English:

We present a collection of methods and techniques for combining programming languages. Our methods make it possible, for example, to use within a general-purpose programming language a sublanguage that better fits the domain of a particular part of an application. This allows a programmer to implement an aspect of the software more clearly and compactly.

Based on the same techniques, we present a method that protects programmers against the mistakes that cause the most common security problem, the so-called injection attack. By programming in a slightly different way, the programmer is guaranteed that the software is not vulnerable to such attacks. In contrast to previously proposed solutions, our method gives absolute guarantees, is simpler for the programmer, and can be used in all cases where injection attacks can occur (for example, it is not specific to the SQL language).

Finally, our techniques make it possible to define the syntax of some programming languages more clearly and formally. Some modern programming languages are really a fusion of several sublanguages (so-called language conglomerates). For such languages it was until now unclear how the syntax could be formulated precisely, which is necessary for standardization and compatibility.

January 20, 2008 12:08

December 24, 2007

Karl Trygve Kalleberg

Summer Job Hunting

With my comp.sci PhD finished, printed and published, I'm now back to being a full-time medical student. It's really rewarding and fun -- being an introvert geek, I've learned a lot about how I relate to other people by being shoved into a room with a patient who expects me to talk to him/her about the most intimate details of his/her situation. Alas, being a student doesn't pay at all.

I've still some money left from earlier jobs, and I live quite comfortably, but prudence (and interest) requires me to look for a job this summer as well. If all pending exams go well, I'll get my temporary license at the end of the spring semester, which means I can apply for work as a hospital doctor during the summer. There's no denying that this would be quite a lot of fun, but I'm not all that hopeful -- the competition to get hospital jobs seems fierce, and I've not been a star performer when it comes to medicine, I'm sad to say (but hopefully things will pick up now that I can focus on it).

For these reasons, I'll probably also be looking around for comp.sci jobs as a backup because I'm fairly good at it, it pays well, and it's usually a lot of fun. Also, it's easier to get jobs abroad, even in countries where you don't speak the native language fluently :)


December 24, 2007 01:42

December 23, 2007

Eelco Visser

www.webdsl.org online

Since my last blog post I have been pretty busy teaching a course on programming languages, and developing a web application for the webdsl.org site. It is now finally online. We presented the first release of WebDSL last Thursday at the MoDSE Colloquium, using a presentation embedded in the webdsl.org site. As should be the case, the database crashed midway through the presentation (probably due to a corruption of the filesystem of the virtual machine we were using), but after a short break and some frantic hacking we got the presentation back on track. By now the site is available outside the TU Delft firewall. The site includes a wiki (with page history), blogs, forums, news items, and an issue tracker. Proper user management should avoid the user registration spam disaster that we experienced with previous (T)wikis. The aim is to evolve the application into a full-blown software project management and community site that should be usable by other projects as well. For starters, I am now working on migrating the Stratego/XT and Program Transformation wikis to (clones of) the webdsl.org application. While usable and useful, the application can be improved in numerous ways.

December 23, 2007 11:38

December 08, 2007

Karl Trygve Kalleberg

Stratego Java backend in progress

It's been rather quiet on the northern front for quite some time. I've been mostly busy with diagnosing old ladies with chest pain of late, and trying to make heads or tails of the horrible electronic health record system at the hospital. Sheesh.

Anyway, today I found time to do some compiler hacking. It feels great, as always! I resurrected the strc-java project -- a Java backend for the Stratego compiler. After a couple of hours of fiddling around, I now have an extremely rudimentary runtime up and running, and the compiler can compile simple build expressions properly.

Given the simple strategy

main = !Foo(1,2)

the following Java code is produced:

public static class main_0_0 extends Strategy
{
  public final static main_0_0 instance = new main_0_0();

  public ATerm apply(ATerm term)
  {
    try
    {
      {
        ATerm[] b_0 = new ATerm[2];
        {
          ATerm c_0 = atermFactory.makeInt(1);
          b_0[0] = c_0;
        }
        {
          ATerm d_0 = atermFactory.makeInt(2);
          b_0[1] = d_0;
        }
        ATerm a_0 = atermFactory.makeAppl(atermFactory.makeAFun("Foo", 2, false), b_0);
        term = a_0;
      }
    }
    catch(Failure f)
    {
      return null;
    }
    return term;
  }
}

There are a number of unnecessary blocks in the above code fragment, but that's an artifact of the way I wrote the Java code templates. I'll see if I can't get rid of them eventually.

I've spent some time hacking about in order to get closures working without too much overhead. I think the current scheme will work, but will require a bit of sophistication and context-awareness in the code generator.

You can see the scheme in the example above. Every strategy is compiled to its own class, with an apply method. The signature for this method is not fixed. Rather, the number of strategy and term arguments may vary. The last argument is always the current term. Every class has a singleton instance, called instance. This is how we get the pointer. All context information that's required will have to be passed in, through the argument list.

There are in principle two possible schemes for passing in arguments. The first is to do as Stratego/J (the Stratego interpreter for Java): use two arrays, e.g. ATerm apply(Strategy[] svars, ATerm[] tvars, ATerm currentTerm). This costs two calls to new (in the general case) for every strategy invocation. Not very appealing.

The other possibility is to sequence the strategy and term arguments in the argument list, e.g.:

ATerm apply(Strategy s0, Strategy s1, ATerm t0, ATerm t1, ATerm currentTerm)

The problem here is that the arity of s.apply() is not fixed. We really have:

ATerm apply(Strategy<x0,y0> s0, Strategy<x1,y1> s1, ATerm t0, ATerm t1, ATerm currentTerm)

where x and y are the strategy and term arities, respectively. If we were generating C++ code, we could just use integers here. In Java, we'll have to insert real types. I'm tempted to use enums, and manually define the types N0 through N31. Nobody will ever invent a strategy with more than 32 strategy or term arguments, right?
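To make the trade-off between the two argument-passing schemes a bit more concrete, here is a minimal sketch. It is not strc-java output: plain String stands in for ATerm so the fragment has no dependencies, null stands in for failure, and all names (ArgPassingSketch, ArrayStrategy, Strategy00, Strategy11, wrap) are made up for illustration. Instead of the enum-indexed Strategy<x,y> idea, it simply uses one hand-written interface per arity pair, which is the bluntest way to give s.apply() a fixed arity.

```java
public class ArgPassingSketch {
    /** Scheme 1: one uniform signature, arguments packed into two arrays. */
    interface ArrayStrategy {
        String apply(ArrayStrategy[] svars, String[] tvars, String current);
    }

    /** Scheme 2: one interface per (strategy arity, term arity) pair. */
    interface Strategy00 {
        String apply(String current);
    }

    interface Strategy11 {
        String apply(Strategy00 s0, String t0, String current);
    }

    // Hypothetical strategy wrap(s | t): run s on the current term,
    // then wrap the result in a constructor named t.
    static final ArrayStrategy WRAP_ARRAYS = (svars, tvars, current) -> {
        String inner = svars[0].apply(new ArrayStrategy[0], new String[0], current);
        return inner == null ? null : tvars[0] + "(" + inner + ")";
    };

    static final Strategy11 WRAP_POSITIONAL = (s0, t0, current) -> {
        String inner = s0.apply(current);
        return inner == null ? null : t0 + "(" + inner + ")";
    };

    static String viaArrays(String term) {
        ArrayStrategy id = (svars, tvars, current) -> current;
        // Two array allocations for this one call: the overhead of scheme 1.
        return WRAP_ARRAYS.apply(new ArrayStrategy[] { id }, new String[] { "Foo" }, term);
    }

    static String viaPositional(String term) {
        Strategy00 id = current -> current;
        // No argument arrays, but the callee's arity is baked into its type.
        return WRAP_POSITIONAL.apply(id, "Foo", term);
    }

    public static void main(String[] args) {
        System.out.println(viaArrays("1"));
        System.out.println(viaPositional("1"));
    }
}
```

Both calls build the same term. The positional version avoids the per-call arrays, at the price of needing a distinct type for every arity combination that actually occurs, which is exactly the bookkeeping the N0 through N31 trick would try to automate.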

I'll keep mulling this one over a bit. Feel free to drop me a line if you see better solutions.

December 08, 2007 22:38

October 23, 2007

Eric Bouwers

So much interesting to read ...

... and less time to do it!

Since I started my full-time job I noticed that it is a lot harder to stay up to date with all the latest (online) developments. During the writing of my thesis I had a considerable amount of time which I could spend on reading blogs and news-items, and following mailing-lists. When you spend most of your day actually working you learn to prioritize which things you want to read :)

One of the things I did manage to read, albeit a bit late, is this thread on the PHP.internals list written by Wietse Venema. It is a follow-up on this thread in which a first proposal was made to integrate a perl-style taint-mode into the core of PHP. The results posted in the follow-up thread look promising. It is also interesting to see that he has gone from a black-and-white taint-mode, to a more leveled approach. Currently the proposal only contains a subset of the levels available in PHP-Sat, but I think that the most fundamental ones are definitely there.

Even though Wietse is developing a prototype I think he will have a hard time getting this taint-mode into the actual core of PHP. Within both threads the general opinion seems to be that the idea is nice, but the developers of PHP seem to think of many situations in which it could fail. I hope to see more results of this idea soon!

Going over some other interesting threads in the internals-list I found a reference to a tool called PHPLint. (Isn't it funny to see that there are all sorts of initiatives popping up that try to make PHP more secure/stricter?) I haven't had time to take a better look at this tool, but a first glance definitely showed potential. I'll try to examine this tool more thoroughly at the end of this week.

While there is less time to read on-line material, there is more time to read off-line stuff. Since I am using public transportation to get to work I have an extra hour a day to read actual books and publications. One of the books I have read in the past few weeks is a printed version of "Producing Open Source Software" (ProducingOss) written by Karl Fogel. My conclusion: absolutely worth reading!

ProducingOss contains all sorts of tips, hints and best practices. Even if you are not involved in an open source project it is still useful to read. Almost everything in the book can also be applied to closed-source projects. Furthermore, it contains many pointers to other interesting literature. One of these pointers led me to The Cathedral and the Bazaar, my current read-while-traveling-to-and-from-work book.

I intend to use several things from ProducingOss within PHP-Sat. I will just have to think about how I can fit the project in my current schedule, but it will definitely be fitted in.

October 23, 2007 20:47

September 01, 2007

Eric Bouwers

A little migration helper

You probably already heard about it, but support for PHP version 4 is partly dropped as of 31-12-2007. And after 08-08-2008 the support ends completely.

Luckily, the PHP documentation team provides you with a set of migration guides. Going through these guides can take some time, but it enables you to upgrade your code easily to be PHP5 compatible.

To make your life even easier, the PHP-tools project is extended with the tool test-migration. This tool performs some checks that are described by the migration guide to detect whether the code can be run under version 5 of PHP. These checks include:

The first two checks are rather easy to understand: PHP5 will simply halt execution with an error when these issues are detected at runtime. Therefore, the warnings that are generated for these kinds of patterns are shown with a 'serious'-level.
On the other hand, the last two checks do not find constructions that can halt execution. They detect places in which certain constructions are used. These constructions were already available within PHP version 4, but their behavior changed in version 5. To easily find these constructions they are flagged by a 'minor'-warning. More details about the changed behavior can be found here.

The first version of this tool is quite basic and performs only a few checks. If you would like to see more checks included, don't hesitate to drop me an email or put them in the comments.

September 01, 2007 17:33

July 13, 2007

Karl Trygve Kalleberg

Post-Defense Redux

Finally! It's over! Never again! The defense mostly followed the specified procedure. I first had about 45 minutes to give a presentation of the results of the dissertation, then the first opponent, Neil Jones, gave a 15 minute summary putting my work into a larger context.

After his summary, he proceeded to ask several high-level questions about various parts of the dissertation. One question I liked a lot was (paraphrased): "are the axiom-based Java testing techniques you propose in your case study applicable to Stratego and would you actually use them?". All the tools and prototypes discussed in the thesis are written in Stratego, and are applied to Java, C and a toy language called TIL. However, few of the tools are actually available for Stratego itself. This is the classical story of the cobbler's children's shoes... I certainly think it would be worthwhile to do the work necessary to make some of the tools available to Stratego as well.

Peter Mosses followed with a series of detailed questions. Clearly, Peter had read the text and figures very carefully, because some of his questions were about rather subtle issues and ambiguities in my work. There were also a few (fortunately minor) mistakes that made some of the figures more difficult to comprehend than necessary. He also nailed me on a very embarrassing definite-instead-of-indefinite article mistake. Normally, these things do not matter very much, but in this particular sentence it sort of reversed one of my main arguments in the dissertation. Whoops;)

After I'd answered their questions as best as I could, they retired to discuss whether my performance was good enough. This is mostly a formality in the current tradition, so I can't say I was very worried at that point. Once they came back, the dean proclaimed my successful completion of the degree, and we all rushed off for some (sadly delayed) champagne and cake.

I even wore a suit, and here's a picture to prove it:




Thanks to Uwe Wolter, who was the local member of the committee and therefore the grand orchestrator of all the formalities, the formal parts went smoothly. After the defense, the stressful part of the day started: I had to collect all people's menu choices for the evening, send my family and friends shopping for the evening's party, clean the apartment and of course smear huge swaths of marzipan cake all over my suit. Thanks to very good help from my brother, his girlfriend, Håvard (my roommate) and my mother, and Tilde, we managed to get ready just in time to arrive ten minutes late for the scheduled dinner.

Magne and Peter:


Neil and Uwe:


Tilde:


Since Eelco took the pictures, he's not in any of them. Fortunately, Tilde has a few pictures of Eelco, and of the other people present. I'm still waiting for those and will upload a few once I get them (and get some green lights from the people depicted).

After dinner, we all (committee, advisors, friends and family) drove back to my apartment and Tilde whipped up drinks for all. She even tricked one into me;) I was very pleased to see that my office neighbour and student advisor, Ida Holen, found the time to show up. Also, I had friends flying in from Oslo (okay, Holmestrand) and Trondheim for the event, namely Karl Thomas, Karina and Leif Olav. Thanks guys! Hope the drinks and food were worth it;) The usual Bergen posse showed up as well, including Stig, Fay, Knute, Glenn, Espen, Tommy x 2 and Paul Simon (if I forgot somebody, ping me).


July 13, 2007 03:23

July 01, 2007

Eric Bouwers

It is getting noticed

Within the last two weeks I noticed that PHP-Front and PHP-Sat are being discovered by people that are looking for PHP-specific solutions. It is not that we are flooded with requests, but I am still happy with every question :)

The first question was sent to the psat-dev-mailinglist and was about the cyclomatic complexity of PHP code. I replied that it would not get into PHP-Sat because it is not a bug-pattern. However, it would be a nice tool for the PHP-Tools project. I made a similar tool for Java because of an assignment in the past, so it is probably just a matter of renaming the Strategies to use the PHP-Front api. Unfortunately, I didn't get an answer about what the report should look like. If you have any ideas please let me know in the comments, or in the issue.

A second question was about the grammar of PHP-Front, or actually the license of this grammar. The people behind TXL have derived a PHP-grammar for TXL from the SDF-grammar in PHP-Front. Since our license does not state anything about derived work without common source, we were asked for our permission to distribute this new grammar. Naturally, this permission was given very quickly and we were also allowed to take a peek at the source. I must say that I find it interesting, but I currently do not have time to look into it all. I imagine that the definitions of the grammars and TXL itself are similar to how things work in Stratego, but I have to look into that.

The last question in this series is about defined functions. Finding out which functions are defined in a project is easy when classes are ignored, a simple grep on the project will do. When a project also includes classes it becomes trickier to get all functions defined outside of a class. The question was whether PHP-Front could help with this issue, and the answer is of course yes!
Within the reflection part of the library the list of defined functions and classes is already available. This makes it possible to write a tool to show all defined functions in just a few lines of code. Since it was also a nice tool for the PHP-Tools project I added a tool for this last Friday.

Another issue that was brought up by the last e-mail is the issue of our implementation language. Since Stratego is relatively unknown, the project has a steep learning curve. On the other hand, if I had chosen a different implementation language it would have taken me way longer to implement the current features. And besides, this piece of code is not that hard to understand, right?

  defined-functions-main =
    include-files-complex
    ; get-php-environment
    ; get-functions
    ; if ?[]
      then !"No functions defined."
      else map(transform-to-message)
           ; lines
      end

July 01, 2007 16:51

June 28, 2007

Eelco Visser

Abstractions for Language Independent Transformations

Yesterday, Karl Trygve Kalleberg defended his PhD thesis at Bergen University. The thesis treats the subject of abstractions for program transformation, with the aim of making available transformation abstractions independent of the source language being transformed. The thesis introduces a number of extensions to the Stratego transformation language and reports on several case studies. I understand the thesis will be available soon. (More photos)

June 28, 2007 20:01

June 24, 2007

Eric Bouwers

Where you at?

Now that the propagation of safety-types seems to go smoothly it was time to dive into another subject: accessing the location of terms. In this case, the location of a term is defined as the location of the text in the original file that was parsed to that term. Since the strategy to annotate the AST with position-info is available in the standard libraries nowadays, it should be easy to access these locations and finally solve PSAT-91, right? Let's find out!

The first thing I did was to add a separate module to handle the low-level stuff of getting the location annotations. This module contains several getter-strategies that can retrieve, for example, the number of the start-line. The location info is captured in six different numbers: start-line, end-line, start-column, end-column, offset and length. A getter-strategy is available for all of them. Furthermore, the name of the file in which the term is defined can be retrieved.

Although these getter-strategies are useful, they are not meant to be called directly. I figured that the most common use of these functions would be reporting the values in some kind of (formatted) message. In order to capture this kind of behavior the strategy format-location-string(|message) is defined. This strategy takes a message with holes in the form of [STARTLINE] as parameter and fills these holes with values from the current term. A rather useful strategy if I say so myself.
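For readers who don't speak Stratego, the hole-filling idea behind format-location-string can be sketched in a few lines of Java. This is only an illustration of the technique, not the PHP-Front implementation: the Location class and every hole name other than [STARTLINE] (which is the one mentioned above) are assumptions made up for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LocationFormatSketch {
    /** Hypothetical holder for the six location numbers plus the file name. */
    static final class Location {
        final String file;
        final int startLine, endLine, startColumn, endColumn, offset, length;

        Location(String file, int startLine, int endLine,
                 int startColumn, int endColumn, int offset, int length) {
            this.file = file;
            this.startLine = startLine;
            this.endLine = endLine;
            this.startColumn = startColumn;
            this.endColumn = endColumn;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Replaces every hole in the message with the matching value from loc. */
    static String format(String message, Location loc) {
        // Hole names besides [STARTLINE] are assumed, not taken from PHP-Front.
        Map<String, String> holes = new LinkedHashMap<>();
        holes.put("[STARTLINE]", String.valueOf(loc.startLine));
        holes.put("[ENDLINE]", String.valueOf(loc.endLine));
        holes.put("[STARTCOLUMN]", String.valueOf(loc.startColumn));
        holes.put("[ENDCOLUMN]", String.valueOf(loc.endColumn));
        holes.put("[OFFSET]", String.valueOf(loc.offset));
        holes.put("[LENGTH]", String.valueOf(loc.length));
        // The file name goes last so its text is never scanned for other holes.
        holes.put("[FILE]", loc.file);

        String result = message;
        for (Map.Entry<String, String> hole : holes.entrySet()) {
            result = result.replace(hole.getKey(), hole.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Location loc = new Location("index.php", 3, 3, 5, 9, 40, 4);
        System.out.println(format("[FILE]:[STARTLINE]:[STARTCOLUMN] tainted value", loc));
    }
}
```

A message without any holes passes through unchanged, so callers can use the same function for plain and location-aware messages.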

To practice with this new piece of functionality I have added an extra option to the tool input-vector of the php-tools-project. This option allows the user to choose between the normal list, or the same list with line-numbers printed for each access. More information about this option and how to add an option yourself can be found here.

After this was done I moved to php-sat to make the output more concise. It was actually pretty easy to implement. The algorithm is nothing more than get-terms-with-annotations, make-nice-output. I actually spent more time on creating a test-setup for calling php-sat through a shell-script than on generating the more concise format. The only problem was that the adding of position-info everywhere interfered with the dynamic-rules. A few well-placed rm-annotations were needed to fix this. Please let me know if you like the new output, or whether something should be added.

The next application of the location info is the tracking of where untainted data enters an application. When a function is called with a parameter $foo which is tainted, it would be nice to show when it was tainted. I think this is not too difficult to add, but bugs always seem to lurk in 'I-think-it-is-easy-to-add'-features.

A last remark about locations is a small problem without an actual solution. Eventually php-sat must support function-calls. The algorithm to analyze function-calls is not complicated, but how can bug-patterns within a function be reported? A message before each call to this function? Within the file in which the function is defined? And what about cases in which one call is flagged and the other one isn't? And can we also handle object-creation in the same way? I haven't figured out how to handle this, so if you have any ideas please let me know.

June 24, 2007 19:37

Eelco Visser

Domain-Specific Language Engineering (Mark I)

There it is. The first version of the 'Domain-Specific Language Engineering' paper. It is still somewhat rough around the edges and can use more meta-level reflection. Therefore, this is 'Mark I'. I expect at least another version, and maybe two before the final one. Comments on any aspect of the work would be greatly appreciated. Here is a quote from the introduction:

In recent years there has been an increasing momentum (some call it hype) for approaches with names such as domain-specific languages, model-driven architecture, software factories, language workbenches, and intentional programming. While there are differences between these approaches (mostly of a technological nature?), the common goal is to achieve a higher level of abstraction in software development by abstracting from low-level boilerplate code. (Making domain-specific languages the approach of my choice, I'll use its terminology from now on.) The idea of domain-specific languages has been around for a long time, but what seems to be new in the current wave is the requirement to use DSL design and implementation as a standard tool in the software development process. The challenge then is to develop a systematic method for designing new domain-specific languages.

This tutorial describes an experiment in DSL design and implementation. The experiment is simply to take a new domain (web applications), to develop a DSL (set of DSLs) for this domain, and observe the process to extract ingredients for a standard process. The target of the experiment are web applications with a rich domain model that can serve as content management system editable via the browser, but also allow querying and aggregation based on the structure of the data. The tutorial takes one particular combination of technologies. The DSL will be a textual language. The generator targets Java with a certain collection of frameworks for implementation of web applications. The DSL is implemented using Stratego/XT, SDF, and Nix.

Eelco Visser. Domain-Specific Language Engineering. A Case Study in Agile DSL Development (Mark I). Technical Report TUD-SERG-2007-017, Software Engineering Research Group, Delft University of Technology, June 2007. To appear in the proceedings of the Summer School on Generative and Transformational Techniques in Software Engineering (GTTSE'07). (pdf)

June 24, 2007 09:38

June 21, 2007

Eelco Visser

Scrap your Boilertemplate

In February I had promised 'to report about my journey through webland and towards the GTTSE tutorial on this blog', but have failed miserably. I find I'm not a good blogger. Not that I have nothing to say. During the process of designing and implementing WebDSL I have thought about dozens of things I could blog about, but either didn't have the time, or didn't think it would be interesting for anyone but myself. I regret that; now that I have something that works it would have been interesting to see a log of the process. Not all is lost though. I am finalizing a paper for the GTTSE'07 summer school in which I explain the design process. Here's a quote from the paper:

The boilertemplate smell is characterized by similar target coding patterns used in different templates, only large chunks of target code (a complete page type) considered as a reusable programming pattern, and limited expressivity, since adding a slightly different pattern (type of page) already requires extending the generator.

High time for some generator refactoring. The refactoring we're going to use here is called 'find an intermediate language' also known as 'scrap your boilertemplate'. In order to gain expressivity we need to better cover the variability in the application domain. While implementing the domain model DSL, we've explored the capabilities of the target platform, so by now we have a better idea how to implement variations on the CRUD theme by combining the basics of JSF and Seam in different ways. What we now need is a language that sits in between the high-level domain modeling language and the low-level details of JSF/Seam and allows us to provide more variability to application developers while still maintaining an advantage over direct programming.
In preparation for my GTTSE presentation I have given a two-part lecture at the MoDSE Colloquium entitled Domain-Specific Language Engineering (part 1, part 2). At the first MoDSE workshop yesterday I gave a talk with the title A Sweet WebDSL focusing on the role of desugarings in a DSL for web applications.

June 21, 2007 09:36

June 08, 2007

Eelco Visser

Preventing Injection Attacks with Syntax Embeddings

"Software written in one language often needs to construct sentences in another language, such as SQL queries, XML output, or shell command invocations. This is almost always done using unhygienic string manipulation, the concatenation of constants and client-supplied strings. A client can then supply specially crafted input that causes the constructed sentence to be interpreted in an unintended way, leading to an injection attack. We describe a more natural style of programming that yields code that is impervious to injections by construction. Our approach embeds the grammars of the guest languages (e.g., SQL) into that of the host language (e.g., Java) and automatically generates code that maps the embedded language to constructs in the host language that reconstruct the embedded sentences, adding escaping functions where appropriate. This approach is generic, meaning that it can be applied with relative ease to any combination of host and guest languages."

The paper Preventing Injection Attacks with Syntax Embeddings, A Host and Guest Language Independent Approach by Martin Bravenboer, Eelco Dolstra, and Eelco Visser has been accepted at GPCE'07. (pdf)
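To illustrate the problem the abstract describes, here is a small hand-written Java sketch that contrasts unhygienic string concatenation with a version that escapes the client-supplied value. It is not the code our generator produces, and the class and method names (InjectionSketch, unsafeQuery, quote, safeQuery) are invented for the example. It also assumes a SQL dialect in which doubling the single quote is sufficient to escape a string literal, which is not true of every database.

```java
public class InjectionSketch {
    /** Unhygienic: client input is pasted straight into the SQL sentence. */
    static String unsafeQuery(String name) {
        return "SELECT * FROM users WHERE name = '" + name + "'";
    }

    /** Escapes a value as a SQL string literal by doubling single quotes. */
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    /** Same query, but the value can no longer break out of its literal. */
    static String safeQuery(String name) {
        return "SELECT * FROM users WHERE name = " + quote(name);
    }

    public static void main(String[] args) {
        String crafted = "x' OR '1'='1";
        // The crafted input changes the structure of the first query.
        System.out.println(unsafeQuery(crafted));
        // In the second query it stays a single string literal.
        System.out.println(safeQuery(crafted));
    }
}
```

With a syntax embedding the programmer would write the query in SQL syntax directly and never call an escaping function by hand; the point of the sketch is only to show which calls have to end up in the host-language code, and how easy it is to forget them when writing that code manually.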

June 08, 2007 09:28