On GitHub: melix/s2gx-deepdive-groovy-compiler
by Cédric Champeau (@CedricChampeau)
speaker {
    name 'Cédric Champeau'
    company 'Gradle Inc'
    oss 'Apache Groovy committer',
    successes (['Static type checker',
                'Static compilation',
                'Traits',
                'Markup template engine',
                'DSLs'])
    failures Stream.of(bugs),
    twitter '@CedricChampeau',
    github 'melix',
    extraDescription '''Groovy in Action 2 co-author
Misc OSS contribs (Gradle plugins, deck2pdf, jlangdetect, ...)'''
}
Special coupon code: ctwspringo2gx
Interpreted vs compiled
Scripts vs classes
Parsing
Abstract Syntax Trees
Resolving
Run-time vs compile-time
Static type checking
Bytecode generation
Class loading
Groovy is a dynamic language
Dynamic != interpreted
Interpreted == a runtime interprets an AST
JVM is an interpreter + a JIT
Groovy compiles down to JVM bytecode
public class Greeter {
    public static void main(String... args) {
        System.out.println("Hello, " + args[0]);
    }
}
println "Hello, ${args[0]}"
Classes are compiled to bytecode
Scripts are also compiled to bytecode
So it’s more a run-time vs compile-time discussion!
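One way to check that a script ends up as a class is to parse it with `GroovyShell` and inspect what comes back. This is a minimal sketch: the variable names are illustrative, and the generated class name is an implementation detail, so it is printed rather than assumed.

```groovy
// Parse (compile) the script without running it; single quotes keep ${...} literal
def script = new GroovyShell().parse('println "Hello, ${args[0]}"')

// The compiled script is an instance of a generated class extending groovy.lang.Script
assert script instanceof Script
println "Generated class: ${script.class.name}"

// Like the Java class, the generated class has a static main(String[]) entry point,
// plus a run() method holding the script body
def mainMethod = script.class.getMethod('main', String[].class)
def runMethod = script.class.getMethod('run')
println "main is static: ${java.lang.reflect.Modifier.isStatic(mainMethod.modifiers)}"
```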
Given a set of source files
Compile them
Output is bytecode
cacheable (library, jar file, …)
loadable by the runtime (classloader)
Same as compile-time but…
done during execution of the program!
Groovy does both
consequences on packaging
consequences on the size of the runtime
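Compiling at run time can be sketched with `GroovyShell.evaluate`, which takes source text that only exists while the program runs. The source string here is an arbitrary illustration; the point is that the compiler itself has to ship with the application for this to work, which is where the packaging and size consequences come from.

```groovy
// Source code that only exists as text while the program is running
def source = '(1..4).sum()'

// GroovyShell compiles the text to bytecode in memory, loads it and runs it
def result = new GroovyShell().evaluate(source)
println result
```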
A runtime provides support libraries to execute a compiled program
Run-time is what happens at run time
Groovy has a runtime
Java also (the JRE, providing core classes)
Groovy has 9 compilation phases (see org.codehaus.groovy.control.CompilePhase)
initialization
parsing
conversion
semantic analysis
canonicalization
instruction selection
class generation
output
finalization
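The list above can be reproduced from the `CompilePhase` enum itself. A small sketch; the `phases` variable is just an illustrative name.

```groovy
import org.codehaus.groovy.control.CompilePhase

// Each enum constant carries its phase number (1 for the first phase)
def phases = CompilePhase.values().collect { "${it.phaseNumber}: ${it.name()}".toString() }
phases.each { println it }
```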
Parsing converts source code (text) into a concrete syntax tree (CST)
Where syntax errors are reported
Groovy tries to minimize the errors raised in that phase
We make use of Antlr 2
Migration to Antlr 4 in progress
See org.codehaus.groovy.antlr.AntlrParserPlugin
Limited transformations available (and not recommended)
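To see a syntax error coming out of the parser, one can feed it deliberately broken source. This is only a sketch: the broken snippet is made up for illustration, and the exact error text depends on the Groovy version, so it is printed rather than quoted.

```groovy
import org.codehaus.groovy.control.MultipleCompilationErrorsException

def failure = null
try {
    // The parenthesis is never closed, so the parser rejects the source
    new GroovyShell().parse('def x = (1 + ')
} catch (MultipleCompilationErrorsException e) {
    failure = e
}

// The exception wraps an ErrorCollector holding one message per reported error
println failure?.message
```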
Conversion turns the CST into an Abstract Syntax Tree (AST)
AST nodes are what the other compilation phases rely on
There’s already semantic information in an AST
Earliest phase an AST transformation can hook into
2 categories
statements (IfStatement, BlockStatement, …)
expressions (ConstantExpression, MethodCallExpression, …)
Know your AST!
particularly useful if you plan on writing AST transformations
println "Hello, ${args[0]}"
typically where an interpreter would step in
at the core of the Groovy compiler
AST classes live in org.codehaus.groovy.ast
Still somewhat runtime-agnostic
In practice, ClassNode already bridges to java.lang.Class
Start of visitor pattern
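A quick way to get to know the AST is to build one from a string and look at the node types. The sketch below uses `AstBuilder`, which assumes the AST builder is on the classpath (it is part of core Groovy 2.x and a separate `groovy-astbuilder` module in later versions). Variable names are illustrative.

```groovy
import org.codehaus.groovy.ast.builder.AstBuilder
import org.codehaus.groovy.ast.expr.GStringExpression
import org.codehaus.groovy.ast.expr.MethodCallExpression
import org.codehaus.groovy.ast.stmt.BlockStatement
import org.codehaus.groovy.control.CompilePhase

// Compile the one-liner up to the conversion phase and keep only its statements
def nodes = new AstBuilder().buildFromString(
        CompilePhase.CONVERSION, true, 'println "Hello, ${args[0]}"')

def block = nodes[0]                           // a statement: the script body
def call = block.statements[0].expression      // an expression: the println call
def argument = call.arguments.expressions[0]   // an expression: the interpolated string

println "${block.class.simpleName} > ${call.class.simpleName}" +
        "(${call.methodAsString}) > ${argument.class.simpleName}"
```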
Semantic analysis is a computation-intensive phase
resolves class literals (symbols in AST, imports, …)
resolves static imports (constants, methods)
computes the scope of parameters and local variables
checks static scope vs instance scope
updates the AST of inner classes
collects AST transformations information
High price in compilation time
When we see Foo, need to:
check if Foo is something on classpath
check if Foo is another class being compiled (or script)
Must avoid class initialization
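What happens when `Foo` is neither on the classpath nor part of the compilation can be sketched as follows. `Foo` is assumed not to exist anywhere; `Date` resolves because `java.util` is imported by default in Groovy.

```groovy
import org.codehaus.groovy.control.MultipleCompilationErrorsException

def shell = new GroovyShell()

// Date is found on the classpath through Groovy's default imports
shell.parse('Date d = null')

def message = null
try {
    // Foo is neither on the classpath nor a class being compiled
    shell.parse('Foo foo = null')
} catch (MultipleCompilationErrorsException e) {
    message = e.message
}
println message
```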
Canonicalization finalizes the AST with information deduced from the semantic analysis
Completes generation of AST of inner classes
Completes enumerations with calls to super
Weaves trait aspects into classes implementing traits
Usually last chance to hook an AST transformation
Instruction selection: formerly used to select the instruction set (Java version, …)
(Optional) Type checking
Post-type checking trait corrections
(optional) static compiler specific AST transformations
in short: all AST operations that need to be done just before generating bytecode
Class generation converts an AST into bytecode
Makes use of the ASM library
we’ll get back to it…
Output: (optional) writes the generated bytecode into a file
Finalization: supposed to perform cleanup tasks
Unused today!
CompilationUnit is responsible for the compile phases lifecycle
processes a set of SourceUnit
a SourceUnit represents a single source file (or script)
a CompileUnit gathers all ASTs of a compilation unit in a single place
typically used for resolution
all source units are processed phase by phase
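Driving the phases by hand could look like the sketch below: one in-memory source is added to a `CompilationUnit`, compiled up to class generation (so nothing is written to disk), and the resulting bytecode is inspected. The `Greeter` source and the variable names are illustrative.

```groovy
import org.codehaus.groovy.control.CompilationUnit
import org.codehaus.groovy.control.Phases

def unit = new CompilationUnit()

// Each added source becomes a SourceUnit managed by the CompilationUnit
unit.addSource('Greeter.groovy',
        'class Greeter { String greet(String n) { "Hello, " + n } }')

// Run every phase up to and including class generation; skip output
unit.compile(Phases.CLASS_GENERATION)

// The CompileUnit gathers the ASTs, the CompilationUnit the generated classes
def astClassNames = unit.getAST().getClasses()*.name
def generated = unit.getClasses()
byte[] bytes = generated[0].bytes

// Every JVM class file starts with the magic number 0xCAFEBABE
def magic = bytes[0..3].collect { it & 0xFF }
println "${generated[0].name}: ${bytes.length} bytes"
```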
User code that hooks into the compiler
Allows transforming the AST during compilation
A transform runs at a specific phase
at the earliest, conversion
usually, semantic analysis
no later than canonicalization
If you do it later… all bets are off!
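A full AST transformation needs an annotation and a transformation class compiled ahead of the code using them. As a lighter sketch of user code hooking into a given phase, the example below uses a `CompilationCustomizer` instead, which is a swapped-in technique rather than an AST transformation proper. The `AddAnswerMethod` class, the `answer` method and the `Deck` class are all hypothetical names.

```groovy
import java.lang.reflect.Modifier
import org.codehaus.groovy.ast.ClassHelper
import org.codehaus.groovy.ast.ClassNode
import org.codehaus.groovy.ast.Parameter
import org.codehaus.groovy.ast.expr.ConstantExpression
import org.codehaus.groovy.ast.stmt.ReturnStatement
import org.codehaus.groovy.classgen.GeneratorContext
import org.codehaus.groovy.control.CompilePhase
import org.codehaus.groovy.control.CompilerConfiguration
import org.codehaus.groovy.control.SourceUnit
import org.codehaus.groovy.control.customizers.CompilationCustomizer

// Hypothetical customizer: adds a public answer() method to every compiled class
class AddAnswerMethod extends CompilationCustomizer {
    AddAnswerMethod() {
        super(CompilePhase.SEMANTIC_ANALYSIS)   // the phase this hook runs in
    }

    @Override
    void call(SourceUnit source, GeneratorContext context, ClassNode classNode) {
        classNode.addMethod('answer', Modifier.PUBLIC, ClassHelper.STRING_TYPE,
                Parameter.EMPTY_ARRAY, ClassNode.EMPTY_ARRAY,
                new ReturnStatement(new ConstantExpression('forty-two')))
    }
}

def config = new CompilerConfiguration()
config.addCompilationCustomizers(new AddAnswerMethod())

// Deck declares no method at all; answer() only exists because of the hook
def result = new GroovyShell(config).evaluate('''
class Deck {}
new Deck().answer()
''')
println result
```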
Groovy comes with several AST xforms
some features of the compiler are implemented as AST xforms
traits
static type checking
Implemented (mostly) as an AST transformation
Annotates AST nodes with metadata
Flow typing
Must be done as late as possible in the compiler phases
INSTRUCTION_SELECTION
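Both sides of the type checker can be sketched in a few lines: a typo turned into a compile-time error, and flow typing accepting a variable whose type changes along the way. The method names and the misspelled `lenght()` are made up for illustration.

```groovy
import org.codehaus.groovy.control.MultipleCompilationErrorsException

def shell = new GroovyShell()

// 1. A misspelled method is rejected at compile time instead of failing at run time
def error = null
try {
    shell.parse('''
@groovy.transform.TypeChecked
int length(String s) { s.lenght() }
''')
} catch (MultipleCompilationErrorsException e) {
    error = e.message
}
println error

// 2. Flow typing: the inferred type of 'o' follows its assignments
def flowResult = shell.evaluate('''
@groovy.transform.TypeChecked
int flow() {
    def o = 'text'
    int n = o.length()   // o is known to be a String here
    o = [1, 2, 3]
    n + o.size()         // and a List here
}
flow()
''')
println flowResult
```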
Traits are a superior replacement to mixins
Built-in since Groovy 2.3
How are they compiled?
trait HasName {
    String name
}

class NamedObject implements HasName {}
Converts a trait into "JVM compatible" objects
interface HasName {
    void setName(String name)
    String getName()

    static class HasName$Trait$Helper {
        public static void $init$(HasName $self) { }
        public static void $static$init$(java.lang.Class<HasName> $static$self) { }
        public static String getName(HasName $self) {
            ((HasName$Trait$FieldHelper) ($self)).HasName__name$get()
        }
        public static void setName(HasName $self, String value) {
            ((HasName$Trait$FieldHelper) ($self)).HasName__name$set(value)
        }
    }

    static interface HasName$Trait$FieldHelper {
        final public static String $ins$1HasName__name
        String HasName__name$set(String val)
        String HasName__name$get()
    }
}
At canonicalization:
class NamedObject implements HasName, HasName$Trait$FieldHelper {
    static {
        HasName$Trait$Helper.$static$init$(NamedObject)
    }

    private String HasName__name

    @groovy.transform.CompileStatic
    public String HasName__name$get() {
        return HasName__name
    }

    @groovy.transform.CompileStatic
    public String HasName__name$set(String val) {
        HasName__name = val
    }

    @Traits$TraitBridge(traitClass = HasName, desc = '(Ljava/lang/String;)V')
    public void setName(String arg1) {
        HasName$Trait$Helper.setName(this, arg1)
    }

    @Traits$TraitBridge(traitClass = HasName, desc = '()Ljava/lang/String;')
    public String getName() {
        HasName$Trait$Helper.getName(this)
    }

    public String HasNametrait$super$getName() {
        if (this instanceof GeneratedGroovyProxy) {
            (String) (InvokerHelper.invokeMethod(((GeneratedGroovyProxy) this).getProxyTarget(), 'getName', new Object[0]))
        } else {
            super.getName()
        }
    }

    public void HasNametrait$super$setName(String value) {
        if (this instanceof GeneratedGroovyProxy) {
            InvokerHelper.invokeMethod(((GeneratedGroovyProxy) this).getProxyTarget(), 'setName', [value] as Object[])
        } else {
            super.setName(value)
        }
    }
}
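The generated structure can be observed at run time with plain reflection. A sketch, assuming the trait and the class are declared in the default package of a script; the helper names are the ones shown above and are implementation details that may differ between Groovy versions.

```groovy
trait HasName {
    String name
}

class NamedObject implements HasName {}

// From the outside, the trait behaves like a class with a 'name' property
def obj = new NamedObject(name: 'Bob')
println obj.name

// On the JVM, the trait itself is a plain interface
println "HasName is an interface: ${HasName.isInterface()}"

// The implementing class also implements the generated field helper interface
def interfaceNames = NamedObject.interfaces*.name
println interfaceNames

// The behaviour lives in static methods of the generated helper class
def helper = HasName.classLoader.loadClass('HasName$Trait$Helper')
def helperMethods = helper.methods*.name
println helperMethods.findAll { it in ['getName', 'setName'] }
```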
Groovy targets the JVM
Android is supported by post-processing bytecode (dex)
Bytecode generation library: ASM
3 different backends
legacy
invokedynamic
static compilation
ASM is a low level API
Groovy uses a higher level API
AsmCodeGenerator : entry point, visitor pattern for the Groovy AST
writers: WriterController, BinaryExpressionWriter, InvocationWriter, … map ASTs to ASM patterns
helpers: BytecodeHelper, CompileStack, OperandStack simplify the generation of bytecode
Dedicated writer versions
CallSiteWriter → StaticTypesCallSiteWriter
Optimized paths
Primitive optimizations
Static compilation
Static compiler can delegate to a dynamic writer
int sum(int... values) { values.sum() }
groovyc example.groovy
javap -v example.class
 0: invokestatic    #17  // Method $getCallSiteArray:()[Lorg/codehaus/groovy/runtime/callsite/CallSite;
 3: astore_2
 4: aload_2
 5: ldc             #42  // int 1
 7: aaload
 8: aload_1
 9: invokeinterface #45, 2  // InterfaceMethod org/codehaus/groovy/runtime/callsite/CallSite.call:(Ljava/lang/Object;)Ljava/lang/Object;
14: invokestatic    #51  // Method org/codehaus/groovy/runtime/typehandling/DefaultTypeTransformation.intUnbox:(Ljava/lang/Object;)I
17: ireturn
groovyc --indy example.groovy
0: aload_1
1: invokedynamic #50, 0  // InvokeDynamic #1:invoke:([I)Ljava/lang/Object;
6: invokestatic  #56     // Method org/codehaus/groovy/runtime/typehandling/DefaultTypeTransformation.intUnbox:(Ljava/lang/Object;)I
9: ireturn
groovyc --configscript config.groovy example.groovy
0: aload_1
1: invokestatic #38  // Method org/codehaus/groovy/runtime/DefaultGroovyMethods.sum:([I)I
4: ireturn
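The content of `config.groovy` is not shown here. A plausible version applies `@CompileStatic` to every class, and the sketch below does the equivalent programmatically through `CompilerConfiguration`. Treat it as an assumption about what the config script does, not as the file used for the listing above.

```groovy
import groovy.transform.CompileStatic
import org.codehaus.groovy.control.CompilerConfiguration
import org.codehaus.groovy.control.customizers.ASTTransformationCustomizer

// Assumed equivalent, in a config script: withConfig(configuration) { ast(CompileStatic) }
def config = new CompilerConfiguration()
config.addCompilationCustomizers(new ASTTransformationCustomizer(CompileStatic))

// Every class compiled by this shell is statically compiled
def total = new GroovyShell(config).evaluate('''
int sum(int... values) { values.sum() }
sum(1, 2, 3)
''')
println total
```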
@Bytecode
int run(int i) {
    _new 'java/lang/Integer'
    dup
    iload 1
    invokespecial 'java/lang/Integer.<init>','(I)V'
    invokevirtual 'java/lang/Integer.intValue','()I'
    ireturn
}
An AST transformation is applied (@Bytecode)
Transforms "bytecode-like" method calls into actual ASM method calls
So allows writing "bytecode" directly as method body
Very useful for learning purposes
Limited to method bodies
Bytecode → byte[]
Still have to load that code
For precompiled classes, can be done by any classloader
GroovyClassLoader
supports generation of classes at runtime
will cache the generated classes
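Generating and caching a class at run time can be sketched as follows. The `Greeter` source is illustrative; the sketch only relies on `parseClass`, `loadClass` and `getLoadedClasses` from `GroovyClassLoader`.

```groovy
def loader = new GroovyClassLoader()

// Compile source text to a class while the program is running
Class greeterClass = loader.parseClass(
        'class Greeter { String greet(String n) { "Hello, " + n } }')
def greeting = greeterClass.getDeclaredConstructor().newInstance().greet('World')
println greeting

// The generated class is cached: asking for it by name returns the same Class object
def cached = loader.loadClass('Greeter')
def loadedNames = loader.loadedClasses*.name
println loadedNames
```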
RootLoader: a special classloader that reverses the logic of parent vs child
Used to implement different classpath
Mutable
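Groovy's `org.codehaus.groovy.tools.RootLoader` is one classloader with this child-first, mutable behaviour. The sketch below only shows the mutable part and the fallback to the parent; the `build/classes` directory is hypothetical and does not need to exist.

```groovy
import org.codehaus.groovy.tools.RootLoader

// Start with an empty classpath and the system classloader as parent
def root = new RootLoader(new URL[0], ClassLoader.systemClassLoader)
def before = root.getURLs().length

// Mutable: the classpath can grow after the loader has been created
root.addURL(new File('build/classes').toURI().toURL())   // hypothetical directory
def after = root.getURLs().length

// Classes not found in the loader's own URLs are still served by the parent
def stringClass = root.loadClass('java.lang.String')
println "URLs before: ${before}, after: ${after}"
```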
CallSiteClassLoader: used only on the legacy dynamic runtime
Loads call site classes
Call site class: dynamically generated classes which avoid use of reflection
Slides and code : https://github.com/melix/s2gx-deepdive-groovy-compiler
Groovy documentation : http://groovy-lang.org/documentation.html
Follow me: @CedricChampeau