Understanding the GWT Compiler
[img_assist|nid=3421|title=|desc=|link=url|url=http://www.manning.com/affiliate/idevaffiliate.php?id|align=left|width=208|height=388]The GWT compiler is the fulcrum of GWT. The entire approach GWT takes, encapsulating browser differences and compiling JavaScript from Java, is made possible by the design and architecture of the compiler. The GWT compiler compiles Java into JavaScript, but it’s important to understand that the compiler doesn’t compile Java the same way javac does. The GWT compiler is really a Java source to JavaScript source translator. The GWT compiler needs hints about the work that it must perform partly because it operates from source. These hints come in the form of the module descriptor, the marker interfaces that denote serializable types, the JavaDoc style annotations used in serializable types for collections, and more. Although these hints may sometimes seem like overkill, they’re needed because the GWT compiler will optimize your application at compile time. This doesn’t just mean compressing the JavaScript naming to the shortest possible form; it also includes pruning unused classes, and even methods and attributes, from your code. The core engineering goal of the GWT compiler is summarized succinctly: you pay for what you use. This optimization offers big advantages over other Ajax/JavaScript libraries, where a large initial download of a library may be needed even if just a few elements are used. In Java, serialization marked by the java.io.Serializable interface is handled at the bytecode level. GWT examines your code and only provides serialization for the classes where you explicitly need it. Like GWTShell, GWTCompiler supports a set of useful command-line options. They’re described in table 1: GWTCompiler [-logLevel level] [-gen dir] [-out dir] [-treeLogger] [-style style] module Table 1 GWTCompiler parameters Option Description -logLevel The logging level: ERROR, WARN, INFO, TRACE, DEBUG, SPAM, or ALL -gen The directory into which generated files will be written for review -out The directory to which output files will be written (defaults to the current directory) -treeLogger Logs output in a graphical tree view -style The script output style: OBF[uscated], PRETTY, or DETAILED(defaults to OBF) module The name of the module to compile The -gen and -out command-line options specify where generated files and the final output directory are to be, respectively. And -logLevel, as in the case of GWTShell, is used to indicate the level of logging performed during the compilation. You can even use the -treeLogger option to bring up a window to view the hierarchical logging information you would see in the shell’s console display. The GWT compiler supports several styles of output, each of use in looking at how your code is executing in the browser. JavaScript output style When working with the GWT compiler, you can use several values with the -style command-line option to control what the generated JavaScript looks like. These options are as follows: OBF - Obfuscated mode. This is a non-human-readable, compressed version suitable for production use. PRETTY - Pretty-printed JavaScript with meaningful names. DETAILED - Pretty-printed JavaScript with fully qualified names. To give you an idea of what these options mean, let’s look at examples of java.lang.StringBuffer compiled in the three different modes. First, in listing 1, is the obfuscated mode. Listing 1 StringBuffer in obfuscated compilation function A0(){this.B0();return this.js[0];}function C0(){if(this.js.length > 1){this.js = [this.js.join('')];this.length = this.js[0].length;}function D0(E0){this.js = [E0];this.length = E0.length;}function Ez(F0,a1){return F0.yx(yZ(a1));}function yB(b1){c1(b1);return b1;}function c1(d1){d1.e1('');}function zB(){}_ = zB.prototype = new f();_.yx = w0;_.vB = A0;_.B0 = C0;_.e1 = D0;_.i = 'java.lang.StringBuffer';_.j = 75;function f1(){f1 = a;g1 = new iX();h1 = new iX();return window;} Obfuscated mode is just that. This is intended to be the final compiled version of your application, which has names compressed and whitespace cleaned. Next is the pretty mode, shown in listing 2. Listing 2 StringBuffer in pretty compilation function _append2(_toAppend){ var _last = this.js.length - 1; var _lastLength = this.js[_last].length; if (this.length > _lastLength * _lastLength) { this.js[_last] = this.js[_last] + _toAppend; } else { this.js.push(_toAppend); } this.length += _toAppend.length; return this;}function _toString0(){ this._normalize(); return this.js[0];}// Some stuff omitted.function _$StringBuffer(_this$static){ _$assign(_this$static); return _this$static;}function _$assign(_this$static){ _this$static._assign0('');}function _StringBuffer(){}_ = _StringBuffer.prototype = new _Object();_._append = _append2;_._toString = _toString0;_._normalize = _normalize0;_._assign0 = _assign;_._typeName = 'java.lang.StringBuffer'; _._typeId = 75; append() becomes _append2() to avoid collision> toString() becomes _toString0()> _typeName holds name of original Java class> Pretty mode is useful for debugging as method names are somewhat preserved. However, collisions are resolved with suffixes, as the _toString0() method name shows . Last, we have the detailed mode, as displayed in listing 3. Listing 3 StringBuffer in detailed compilation function java_lang_StringBuffer_append__Ljava_lang _String_2(toAppend){ var last = this.js.length - 1; var lastLength = this.js[last].length; if (this.length > lastLength * lastLength) { this.js[last] = this.js[last] + toAppend; } else { this.js.push(toAppend); } this.length += toAppend.length; return this;}function java_lang_StringBuffer_toString__(){ this.normalize__(); return this.js[0];}function java_lang_StringBuffer_normalize__(){ if (this.js.length > 1) { this.js = [this.js.join('')]; this.length = this.js[0].length; }// . . . some stuff omittedfunction java_lang_StringBuffer(){}_ = java_lang_StringBuffer.prototype = new java_lang_Object();_.append__Ljava_lang_String_2 = java_lang_StringBuffer_append__Ljava_lang_String_2;_.toString__ = java_lang_StringBuffer_toString__;_.normalize__ = java_lang_StringBuffer_normalize__;_.assign__Ljava_lang_String_2 = java_lang_StringBuffer_assign__Ljava_lang_String_2;_.java_lang_Object_typeName = 'java.lang.StringBuffer';_.java_lang_Object_typeId = 75; Detailed mode preserves the full class name, as well as the method name #2. For overloaded methods, the signature of the method is encoded into the name, as in the case of the append() method #1. There are some important concepts to grasp about this compilation structure, especially given the way GWT interacts with native JavaScript, through the JavaScript Native Interface (JSNI), which will be discussed in the section on the compiler lifecycle. The names of your classes and methods in their JavaScript form aren’t guaranteed, even for different compilations of the same application. Use of the special syntax provided with JSNI will let you invoke known JavaScript objects from your Java code and invoke your compiled Java classes from within JavaScript; but you can’t freely invoke your JavaScript when using obfuscated style, predictably. This imposes certain limitations on your development: If you intend to expose your JavaScript API for external use, you need to create the references for calls into GWT code using JSNI registrations. You can’t rely on JavaScript naming in an object hash to give you java.lang.reflect.* type functionality, since the naming of methods isn’t reliable. Although they’re rare, you should consider potential conflicts with other JavaScript libraries you’re including in your page, especially if you’re publishing using the PRETTY setting. In addition to being aware of the available compiler output options and how they affect your application, you should also be familiar with a few other compiler nuances. Additional compiler nuances Currently, the compiler is limited to J2SE 1.4 syntactical structures. This means that exposing generics or annotations in your GWT projects can cause problems. Other options are available for many of the purposes for which you might wish to use annotations. For example, you can often use JavaDoc-style annotations, to which GWT provides its own extensions. Along with the J2SE 1.4 limitations, you also need to keep in mind the limited subset of Java classes that are supported in the GWT Java Runtime Environment (JRE) emulation library. This library is growing, and there are third-party extensions, but you need to be aware of the constructs you can use in client-side GWT code—the complete JRE you’re accustomed to isn’t available. Of course, one of the great advantages of GWT’s approach to compiling JavaScript from plain Java is that you get to leverage your existing toolbox while building your Ajax application. When you execute the hosted mode browser, you’re running regularly compiled Java classes. This, again, means you can use all the standard Java tooling—static analysis tools, debuggers, IDEs, and the like. These tools are useful for writing any code, but they become even more important in the GWT world because cleaning up your code means less transfer time to high-latency clients. Also, because JavaScript is a fairly slow execution environment, such cleanup can have a large impact on ultimate performance. The GWT compiler helps by optimizing the JavaScript it emits to include only classes and methods that are on the execution stack of your module and by using native browser functions where possible, but you should always keep the nature of JavaScript in mind. To that end, we’ll now take a closer look at the lifecycle of a GWT compilation and at how this JavaScript is generated. The compiler lifecycle When the GWT compiler runs, it goes through several stages for building the final compiled project. In these stages, the need for the GWT module definition file becomes clear. First, the compiler identifies which combinations of files need to be built. Then, it generates any client-side code using the generator metaprogramming model. Last, it produces the final output. We’ll look at each of these steps in the compiler lifecycle in more detail. dentifying build combinations One of the great strengths of GWT is that it builds specific versions of the application, each exactly targeted to what the client needs (user agent, locale, so on). This keeps the final download, and the operational overhead at the client level, very lean. A particular build combination is defined in the GWT module definition using a tag. This establishes a base set of values that are used for the build. The first and very obvious property is the user agent that the JavaScript will be built for. Listing 4 shows the core GWT UserAgent module definition. Listing 4 The GWT UserAgent definition = 6000) { return "ie6"; } } } else if (ua.indexOf("gecko") != -1) { var result = /rv:([0-9]+)\.([0-9]+)/.exec(ua); if (result && result.length == 3) { if (makeVersion(result) >= 1008) return "gecko1_8"; } return "gecko"; } return "unknown";]]> Here the tag establishes a number of different builds that the final compiler will output #1: in this case, ie6, gecko, gecko1_8, safari, and opera. This means each of these will be processed as a build of the final JavaScript that GWT emits. GWT can then switch implementations of classes based on properties using the tag in the module definition. As the GWT application starts up, the JavaScript snippet contained within the tag determines which of these implementations is used #2. This snippet is built into the startup script that determines which compiled artifact is loaded by the client. At the core of the GWT UI classes is the DOM class. This gets replaced based on the user.agent property. Listing 5 shows this definition. Listing 5 Changing the DOM implementation by UserAgent Now you can see the usefulness of this system. The basic DOM class is implemented with the same interface for each of the browsers, providing a core set of operations on which cross-platform code can easily be written. Classes replaced in this method can’t be instantiated with simple constructors but must be created using the GWT.create() method. In practice, the DOM object is a singleton exposing static methods that are called by applications, so this GWT.create() invocation is still invisible. This is an important point to remember if you want to provide alternative implementations based on compile-time settings in your application. You can also define your own properties and property providers for switching implementations. We have found that doing this for different runtime settings can be useful. For example, we have defined debug, test, and production settings, and replacing some functionality in the application based on this property can help smooth development in certain cases. This technique of identifying build combinations and then spinning off into specific implementations during the compile process is known in GWT terms as deferred binding. The GWT documentation sums this approach up as “the Google Web Toolkit answer to Java reflection.” Dynamic loading of classes (dynamic binding) isn’t truly available in a JavaScript environment, so GWT provides another way. For example, obj.getClass().getName() isn’t available, but GWT.getTypeName(obj) is. The same is true for Class.forName("MyClass"), which has GWT.create(MyClass) as a counterpart. By using deferred binding, the GWT compiler can figure out every possible variation, or axis, for every type and feature needed at compile time. Then, at runtime, the correct permutation for the context in use can be downloaded and run. Remember, though, that each axis you add becomes a combinatory compile. If you use 4 languages and 4 browser versions, you must compile 16 final versions of the application; and if you use several runtime settings, you end up with many more combinations in the mix. This concept is depicted in figure 1. Figure 1 Multiple versions of a GWT application are created by the compiler for each axis or variant application property, such as user agent and locale. Compiling a new monolithic version of your application for each axis doesn’t affect your end users negatively. Rather, this technique allows each user to download only the exact application version they need, without taking any unused portion along for the ride. This is beneficial for users, but it slows compile time considerably, and it can be a painful point for developers. Reducing the compile variants to speed up compile time Even though GWT compile time can be long, keep in mind that the end result for users is well optimized. Also, the GWT module system allows you to tweak the compile time variants for the situation. During day-to-day development, you may want to use the tag in your module definition to confine the compile to a single language, or single browser version, to speed up the compile step. Another important use for module properties is in code generation, which is the next step of the compilation process. Generating code GWT’s compiler includes a code generation or metaprogramming facility that allows you to generate code based on module properties at compile time. Perhaps the best example is the internationalization support. The i18n module defines several no-method interfaces that you extend to define Constants, Messages (which include in-text replacement), or Dictionary classes (which extract values from the HTML host page). The implementations of each of these classes are built at compile time using the code generation facility, producing a lean, custom version of your application in each language. The i18n module does this through the tag, which lets you add additional iterative values to a property in a module. Listing 6 demonstrates the use of this concept to add French and Italian support to a GWT application. Listing 6 Defining French and Italian using extend-property When an application inherits the i18n module, the GWT compiler searches for interfaces that extend one of the i18n classes and generates an implementation for the class based on a resource bundle matching the language code. This is accomplished via the tag in the i18n module definition. Listing 7 shows this along with the tag, which is used for establishing which language will be needed at runtime. Listing 7 The i18n module’s locale property declarations = 0) { var language = args.substring(startLang); var begin = language.indexOf("=") + 1; var end = language.indexOf("&"); if (end == -1) { end = language.length; } locale = language.substring(begin, end); } } if (locale == null) { // Look for the locale on the web page locale = __gwt_getMetaProperty("locale") } if (locale == null) { return "default"; } while (!__gwt_isKnownPropertyValue("locale", locale)) { var lastIndex = locale.lastIndexOf("_"); if (lastIndex == -1) { locale = "default"; break; } else { locale = locale.substring(0,lastIndex); } } return locale; } catch(e) { alert("Unexpected exception in locale "+ "detection, using default: " + e); return "default"; } ]]> The module first establishes the locale property. This is the property we extended in listing 6 to include it and fr. Next, the property provider is defined . The value is checked first as a parameter on the request URL in the format locale=xx and then as a tag on the host page in the format ; finally, it defaults to default. The last step is to define a generator class. Here it tells the compiler to generate implementations of all classes that extend or implement Localizable with the LocalizableGenerator . This class writes out Java files that implement the appropriate interfaces for each of the user’s defined Constants, Messages, or Dictionary classes. Notice that, to this point, nowhere have we dealt specifically with JavaScript outside of small snippets in the modules. GWT will produce the JavaScript in the final step. Producing output You can do a great deal from Java with the GWT module system, but you may need to get down to JavaScript-level implementations at some point if you wish to integrate existing JavaScript or extend or add lower-level components. For this, GWT includes JSNI. This is a special syntax that allows you to define method implementations on a Java class using JavaScript. Listing 8 shows a simple JSNI method. Listing 8 A simple JSNI method public class Alert { public Alert() { super(); } public native void alert(String message) /*-{ alert(message); }-*/; } In this listing, we first define the alert() method using the native keyword. Much like the Java Native Interface (JNI) in regular Java, this indicates that we are going to use a native implementation of this method. Unlike JNI, the implementation lives right in the Java file, surrounded by a special /*- -*/ comment syntax. To reference Java code from JavaScript implementations, syntax similar to JNI is provided. Figure 2 shows the structure of calls back into Java from JavaScript. Figure 2 The structure of JSNI call syntax GWT reuses the JNI typing system to reference Java types. The final compiled output will have synthetic names for methods and classes, so the use of this syntax is important to ensure that GWT knows how to direct a call from JavaScript. In the final step of the compilation process, GWT takes all the Java files, whether provided or generated, and the JSNI method implementations, and examines the call tree, pruning unused methods and attributes. Then it transforms all of this into a number of unique JavaScript files, targeted very specifically at each needed axis. This minimizes both code download time and execution time in the client. Although this complex compilation process can be time consuming for the developer, it ensures that the end user’s experience is the best it can possibly be for the application that is being built. This article is excerpted from Chapter 1 of GWT in Practice, by Robert Cooper and Charlie Collins, and published in May 2008 by Manning Publications.
June 17, 2008
by Schalk Neethling
·
125,327 Views