<?xml version="1.0"?>
<!--
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
-->
<document>
  <properties>
    <title>The BCEL API</title>
  </properties>

  <body>
    <section name="The BCEL API">
      <p>
        The <font face="helvetica,arial">BCEL</font> API abstracts away
        the concrete details of the Java Virtual Machine and of how
        binary Java class files are read and written.
        The API mainly consists
        of three parts:
      </p>

      <p>

        <ol type="1">
          <li> A package containing classes that describe the "static"
          constraints of class files, i.e., that reflect the class file
          format and are not intended for byte code modifications. The
          classes may be used to read class files from, and write them to,
          a file. This is especially useful for analyzing Java classes
          when the source files are not at hand. The main data structure
          is called <tt>JavaClass</tt>, which contains methods, fields,
          etc.</li>

          <li> A package to dynamically generate or modify
          <tt>JavaClass</tt> or <tt>Method</tt> objects. It may be used to
          insert analysis code, to strip unnecessary information from class
          files, or to implement the code generator back-end of a Java
          compiler.</li>

          <li> Various code examples and utilities, such as a class file
          viewer, a tool to convert class files into HTML, and a converter
          from class files to the <a
          href="http://jasmin.sourceforge.net">Jasmin</a> assembly
          language.</li>
        </ol>
      </p>

      <subsection name="JavaClass">
        <p>
          The "static" component of the <font
          face="helvetica,arial">BCEL</font> API resides in the package
          <tt>org.apache.bcel.classfile</tt> and closely represents class
          files. All of the binary components and data structures declared
          in the <a
          href="http://docs.oracle.com/javase/specs/">JVM
          specification</a> and described in section <a
          href="#2 The Java Virtual Machine">2</a> are mapped to classes.

          <a href="#Figure 3">Figure 3</a> shows a UML diagram of the
          hierarchy of classes of the <font face="helvetica,arial">BCEL</font>
          API. <a href="#Figure 8">Figure 8</a> in the appendix also
          shows a detailed diagram of the <tt>ConstantPool</tt> components.
        </p>

        <p align="center">
          <a name="Figure 3">
          <img src="../images/javaclass.gif"/> <br/>
          Figure 3: UML diagram for the JavaClass API</a>
        </p>

        <p>
          The top-level data structure is <tt>JavaClass</tt>, which in most
          cases is created by a <tt>ClassParser</tt> object capable of
          parsing binary class files. A <tt>JavaClass</tt> object
          essentially consists of fields, methods, and symbolic references
          to the superclass and to the implemented interfaces.
        </p>

        <p>
          The constant pool serves as a kind of central repository and is
          therefore essential to all other components.
          <tt>ConstantPool</tt> objects contain a fixed-size array of
          <tt>Constant</tt> entries, which may be retrieved via the
          <tt>getConstant()</tt> method, taking an integer index as its
          argument. Indexes into the constant pool may occur in
          instructions, in other components of a class file, and in
          constant pool entries themselves.
        </p>

        <p>
          Methods and fields carry a signature that symbolically defines
          their types. Access flags like <tt>public static final</tt> occur
          in several places and are encoded as an integer bit mask, e.g.,
          <tt>public static final</tt> corresponds to the Java expression
        </p>


        <source>int access_flags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;</source>

        <p>
          As already mentioned in <a href="jvm.html#Java_class_file_format">section
          2.1</a>, several components may contain <em>attribute</em>
          objects: classes, fields, methods, and <tt>Code</tt> objects
          (introduced in <a href="jvm.html#Method_code">section 2.3</a>). The
          latter is itself an attribute and contains the actual byte code
          array, the maximum stack size, the number of local variables, a
          table of handled exceptions, and some optional debugging
          information encoded as <tt>LineNumberTable</tt> and
          <tt>LocalVariableTable</tt> attributes.
          In general, attributes are
          specific to one kind of data structure, i.e., no two kinds of
          components share the same kind of attribute, although this is not
          explicitly forbidden. In the figure, the <tt>Attribute</tt> classes
          are stereotyped with the component they belong to.
        </p>

      </subsection>

      <subsection name="Class repository">
        <p>
          Using the provided <tt>Repository</tt> class, reading class files into
          a <tt>JavaClass</tt> object is quite simple:
        </p>

        <source>JavaClass clazz = Repository.lookupClass("java.lang.String");</source>

        <p>
          The repository also contains methods providing the dynamic equivalent
          of the <tt>instanceof</tt> operator, and other useful routines:
        </p>

        <source>
if (Repository.instanceOf(clazz, super_class)) {
  ...
}
        </source>

      </subsection>

      <h4>Accessing class file data</h4>

      <p>
        Information within the class file components may be accessed like
        Java Beans via intuitive set/get methods. All of them also define
        a <tt>toString()</tt> method, so implementing a simple class
        viewer is very easy.
        In fact, all of the examples used here were
        produced this way:
      </p>

      <source>
System.out.println(clazz);
printCode(clazz.getMethods());
...
public static void printCode(Method[] methods) {
  for (Method method : methods) {
    System.out.println(method);

    Code code = method.getCode();
    if (code != null) { // null for abstract and native methods
      System.out.println(code);
    }
  }
}
      </source>

      <h4>Analyzing class data</h4>
      <p>
        Last but not least, <font face="helvetica,arial">BCEL</font>
        supports the <em>Visitor</em> design pattern, so one can write
        visitor objects to traverse and analyze the contents of a class
        file. Included in the distribution is a class
        <tt>JasminVisitor</tt> that converts class files into the <a
        href="http://jasmin.sourceforge.net">Jasmin</a>
        assembly language.
      </p>

      <subsection name="ClassGen">
        <p>
          This part of the API (package <tt>org.apache.bcel.generic</tt>)
          supplies an abstraction level for creating or transforming class
          files dynamically. It makes static constraints of Java class
          files, such as hard-coded byte code addresses, "generic".
          The
          generic constant pool, for example, is implemented by the class
          <tt>ConstantPoolGen</tt>, which offers methods for adding different
          types of constants. Accordingly, <tt>ClassGen</tt> offers an
          interface to add methods, fields, and attributes.
          <a href="#Figure 4">Figure 4</a> gives an overview of this part of the API.
        </p>

        <p align="center">
          <a name="Figure 4">
          <img src="../images/classgen.gif"/>
          <br/>
          Figure 4: UML diagram of the ClassGen API</a>
        </p>

        <h4>Types</h4>
        <p>
          We abstract away the concrete details of the type signature syntax
          (see <a href="jvm.html#Type_information">section 2.5</a>) by
          introducing the <tt>Type</tt> class, which is used, for example, by
          methods to define their return and argument types. Concrete
          subclasses are <tt>BasicType</tt>, <tt>ObjectType</tt>, and
          <tt>ArrayType</tt>, which consists of the element type and the
          number of dimensions. For commonly used types, the class offers
          predefined constants.
          For example, the method signature of the
          <tt>main</tt> method as shown in
          <a href="jvm.html#Type_information">section 2.5</a> is represented by:
        </p>

        <source>
Type return_type = Type.VOID;
Type[] arg_types = new Type[] { new ArrayType(Type.STRING, 1) };
        </source>

        <p>
          <tt>Type</tt> also contains methods to convert types into textual
          signatures and vice versa. The subclasses contain implementations
          of the routines and constraints specified by the Java Language
          Specification.
        </p>

        <h4>Generic fields and methods</h4>
        <p>
          Fields are represented by <tt>FieldGen</tt> objects, which may be
          freely modified by the user. If they are declared
          <tt>static final</tt>, i.e., are constants, and are of a basic
          type, they may optionally have an initial value.
        </p>

        <p>
          Generic methods offer methods for adding the exceptions the method
          may throw, local variables, and exception handlers. The latter two
          are represented by user-configurable objects as well. Because
          exception handlers and local variables contain references to byte
          code addresses, they also take the role of an <em>instruction
          targeter</em> in our terminology.
          Instruction targeters provide a
          method <tt>updateTarget()</tt> to redirect a reference. This is
          somewhat related to the Observer design pattern. Generic
          (non-abstract) methods refer to <em>instruction lists</em> that
          consist of instruction objects. References to byte code addresses
          are implemented by handles to instruction objects. If the list is
          updated, the instruction targeters are informed about it. This
          is explained in more detail in the following sections.
        </p>

        <p>
          The maximum stack size needed by the method and the maximum number
          of local variables used may be set manually or computed
          automatically via the <tt>setMaxStack()</tt> and
          <tt>setMaxLocals()</tt> methods.
        </p>

        <h4>Instructions</h4>
        <p>
          Modeling instructions as objects may look somewhat odd at first
          sight, but it in fact gives programmers a high-level view of
          control flow without having to handle details like concrete byte
          code offsets. Instructions consist of an opcode (sometimes called a
          tag), their length in bytes, and an offset (or index) within the
          byte code. Since many instructions are immutable (e.g., stack
          operators), the <tt>InstructionConstants</tt> interface offers
          shareable predefined "fly-weight" constants.
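        </p>

        <p>
          The benefit of sharing immutable instructions can be illustrated
          with a small toy model. It is not BCEL code: <tt>FlyweightDemo</tt>,
          <tt>Insn</tt>, and <tt>Handle</tt> are hypothetical names, and the
          model anticipates the instruction handles introduced under
          "Instruction lists" below. Each shared instruction object stores
          only its mnemonic and its length in bytes, while a handle records
          the byte offset of one particular occurrence, so a single object
          can appear at several places in the same method.
        </p>

        <source>
/**
 * Toy model of shared ("fly-weight") instructions wrapped in handles.
 * Hypothetical classes for illustration only, not BCEL's Instruction/InstructionHandle.
 */
public final class FlyweightDemo {
    /** Immutable instruction: mnemonic and encoded length in bytes. */
    static final class Insn {
        final String mnemonic;
        final int length;

        Insn(String mnemonic, int length) {
            this.mnemonic = mnemonic;
            this.length = length;
        }
    }

    // Shared constants: safe to reuse anywhere because they have no mutable state.
    static final Insn ICONST_1 = new Insn("iconst_1", 1);
    static final Insn IADD = new Insn("iadd", 1);
    static final Insn IRETURN = new Insn("ireturn", 1);

    /** One occurrence of an instruction; the byte offset lives here, not in Insn. */
    static final class Handle {
        final Insn insn;
        int position;

        Handle(Insn insn) {
            this.insn = insn;
        }
    }

    /** Assigns byte offsets by summing instruction lengths in list order. */
    static void assignPositions(Handle[] code) {
        int pc = 0;
        for (Handle h : code) {
            h.position = pc;
            pc += h.insn.length;
        }
    }

    public static void main(String[] args) {
        // bipush carries an operand, so this model creates it per use instead of sharing it.
        Insn bipush10 = new Insn("bipush 10", 2);
        Handle[] code = {
            new Handle(ICONST_1), new Handle(bipush10), new Handle(IADD),
            new Handle(ICONST_1), new Handle(IADD), new Handle(IRETURN)
        };
        assignPositions(code);
        for (Handle h : code) {
            System.out.println(h.position + ": " + h.insn.mnemonic);
        }
    }
}
        </source>

        <p>
          Running <tt>main()</tt> prints the offsets 0, 1, 3, 4, 5, and 6:
          <tt>bipush</tt> occupies two bytes, and both occurrences of the
          shared <tt>ICONST_1</tt> object receive different offsets because
          the offset is stored in the handle.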
        </p>

        <p>
          Instructions are grouped via subclassing; the type hierarchy of
          instruction classes is illustrated by the (incomplete) figure in
          the appendix. The most important family of instructions is the
          <em>branch instructions</em>, e.g., <tt>goto</tt>, which branch to
          targets somewhere within the byte code. Naturally, this makes them
          candidates for playing an <tt>InstructionTargeter</tt> role,
          too. Instructions are further grouped by the interfaces they
          implement: there are, e.g., <tt>TypedInstruction</tt>s that are
          associated with a specific type, like <tt>ldc</tt>, and
          <tt>ExceptionThrower</tt> instructions that may raise exceptions
          when executed.
        </p>

        <p>
          All instructions can be traversed via <tt>accept(Visitor v)</tt>
          methods, i.e., the Visitor design pattern. However, these methods
          use a special trick that allows the handling of certain
          instruction groups to be merged. The <tt>accept()</tt> methods do
          not only call the corresponding <tt>visit()</tt> method, but first
          call the <tt>visit()</tt> methods of their respective superclasses
          and implemented interfaces, i.e., the most specific
          <tt>visit()</tt> call comes last. Thus one can group the handling
          of, say, all <tt>BranchInstruction</tt>s into one single method.
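        </p>

        <p>
          The dispatch order described above can be modeled without BCEL.
          In the following toy sketch, all names (<tt>VisitorDemo</tt>,
          <tt>Insn</tt>, <tt>Branch</tt>, <tt>Goto</tt>, <tt>Nop</tt>, and the
          visit methods) are hypothetical and only mirror the idea; BCEL's
          real <tt>Visitor</tt> interface declares many more visit methods.
          Each <tt>accept()</tt> calls the most general visit method first
          and the most specific one last, so a visitor that overrides only
          <tt>visitBranchInstruction()</tt> sees every branch.
        </p>

        <source>
/**
 * Toy model of BCEL-style visitor dispatch (hypothetical names, not BCEL's classes):
 * accept() calls the most general visit method first and the most specific one last.
 */
public final class VisitorDemo {
    interface Visitor {
        void visitInstruction(Insn i);
        void visitBranchInstruction(Branch b);
        void visitGOTO(Goto g);
        void visitNOP(Nop n);
    }

    static abstract class Insn {
        abstract void accept(Visitor v);
    }

    /** Group of all branch instructions. */
    static abstract class Branch extends Insn {
    }

    static final class Goto extends Branch {
        @Override
        void accept(Visitor v) {
            v.visitInstruction(this);       // most general
            v.visitBranchInstruction(this); // group
            v.visitGOTO(this);              // most specific, called last
        }
    }

    static final class Nop extends Insn {
        @Override
        void accept(Visitor v) {
            v.visitInstruction(this);
            v.visitNOP(this);
        }
    }

    /** Does nothing by default, so subclasses override only what they need. */
    static class EmptyVisitor implements Visitor {
        public void visitInstruction(Insn i) { }
        public void visitBranchInstruction(Branch b) { }
        public void visitGOTO(Goto g) { }
        public void visitNOP(Nop n) { }
    }

    /** Handles every branch instruction in a single method. */
    static final class BranchCounter extends EmptyVisitor {
        int branches;

        @Override
        public void visitBranchInstruction(Branch b) {
            branches++;
        }
    }

    static int countBranches(Insn[] code) {
        BranchCounter counter = new BranchCounter();
        for (Insn insn : code) {
            insn.accept(counter);
        }
        return counter.branches;
    }

    /** Records the order in which the visit methods are called for one instruction. */
    static String trace(Insn insn) {
        StringBuilder sb = new StringBuilder();
        insn.accept(new EmptyVisitor() {
            @Override public void visitInstruction(Insn i) { sb.append("Instruction "); }
            @Override public void visitBranchInstruction(Branch b) { sb.append("BranchInstruction "); }
            @Override public void visitGOTO(Goto g) { sb.append("GOTO"); }
            @Override public void visitNOP(Nop n) { sb.append("NOP"); }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        Insn[] code = { new Nop(), new Goto(), new Nop(), new Goto() };
        System.out.println(countBranches(code));
        System.out.println(trace(new Goto()));
    }
}
        </source>

        <p>
          Here <tt>trace(new Goto())</tt> returns
          "Instruction BranchInstruction GOTO", which shows the call order,
          and <tt>countBranches()</tt> returns 2 for the sample in
          <tt>main()</tt> although <tt>BranchCounter</tt> never mentions
          <tt>Goto</tt>.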
        </p>

        <p>
          For debugging purposes, it may even make sense to "invent" your own
          instructions. In a sophisticated code generator like the one used
          as the backend of the <a href="http://barat.sourceforge.net">Barat
          framework</a> for static analysis, one often has to insert
          temporary <tt>nop</tt> (no operation) instructions. When examining
          the generated code, it may be very difficult to trace where the
          <tt>nop</tt> was actually inserted. One could think of a derived
          <tt>nop2</tt> instruction that contains additional debugging
          information. When the instruction list is dumped to byte code, the
          extra data is simply dropped.
        </p>

        <p>
          One could also think of new byte code instructions operating on
          complex numbers that are replaced by normal byte code at
          load time or are recognized by a new JVM.
        </p>

        <h4>Instruction lists</h4>
        <p>
          An <em>instruction list</em> is implemented by a list of
          <em>instruction handles</em> encapsulating instruction objects.
          References to instructions in the list are thus not implemented by
          direct pointers to instructions but by pointers to instruction
          <em>handles</em>.
          This makes appending, inserting, and deleting
          regions of code very simple and also allows us to reuse immutable
          instruction objects (fly-weight objects). Since we use symbolic
          references, concrete byte code offsets do not need to be computed
          until finalization, i.e., until the user has finished generating
          or transforming code. We will use the terms instruction handle and
          instruction synonymously throughout the rest of the paper.
          Instruction handles may hold additional user-defined data via the
          <tt>addAttribute()</tt> method.
        </p>

        <p>
          <b>Appending:</b> Instructions or other instruction lists can be
          appended anywhere in an existing list; they are placed after the
          given instruction handle. All append methods return a new
          instruction handle, which may then be used as the target of a
          branch instruction, e.g.:
        </p>

        <source>
InstructionList il = new InstructionList();
...
GOTO g = new GOTO(null);  // target not known yet
il.append(g);
...
// Use immutable fly-weight object
InstructionHandle ih = il.append(InstructionConstants.ACONST_NULL);
g.setTarget(ih);          // resolve the forward jump
        </source>

        <p>
          <b>Inserting:</b> Instructions may be inserted anywhere into an
          existing list.
          They are inserted before the given instruction
          handle. All insert methods return a new instruction handle, which
          may then be used as the start address of an exception handler, for
          example.
        </p>

        <source>
InstructionHandle start = il.insert(insertion_point, InstructionConstants.NOP);
...
mg.addExceptionHandler(start, end, handler, new ObjectType("java.io.IOException"));
        </source>

        <p>
          <b>Deleting:</b> Deletion of instructions is also
          straightforward: all instruction handles and the contained
          instructions within a given range are removed from the instruction
          list and disposed of. However, the <tt>delete()</tt> method may
          throw a <tt>TargetLostException</tt> when instruction targeters
          still reference one of the deleted instructions. The user is
          forced to handle such exceptions in a <tt>try-catch</tt> clause
          and to redirect these references elsewhere. The <em>peephole</em>
          optimizer described in the appendix gives a detailed example of
          this.
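        </p>

        <p>
          To see why deletion can lose targets, consider a toy model of
          handles and targeters. It is a sketch under simplifying
          assumptions, not BCEL code: <tt>TargeterDemo</tt>, <tt>Handle</tt>,
          <tt>Branch</tt>, and <tt>TargetLost</tt> are hypothetical names, the
          list is singly linked, and targeters are found by scanning all
          branches rather than by asking each handle for its targeters.
          Branches refer to handles instead of byte offsets, and
          <tt>delete()</tt> unlinks the range and then reports every removed
          handle that a branch still points to, so that the caller can
          redirect it.
        </p>

        <source>
import java.util.Arrays;

/**
 * Toy model of instruction handles and targeters (hypothetical names, not BCEL's classes).
 * Branches point at handles, so deleting a handle that is still targeted must be reported.
 */
public final class TargeterDemo {
    /** One element of a singly linked instruction list. */
    static final class Handle {
        final String insn;
        Handle next;
        Handle(String insn) { this.insn = insn; }
    }

    /** A targeter: a branch that refers to a handle instead of a byte offset. */
    static final class Branch {
        Handle target;
        Branch(Handle target) { this.target = target; }

        void updateTarget(Handle oldTarget, Handle newTarget) {
            if (target == oldTarget) {
                target = newTarget;
            }
        }
    }

    /** Signals that deleted handles are still referenced by some branch. */
    static final class TargetLost extends Exception {
        final Handle[] targets;

        TargetLost(Handle[] targets) {
            super("deleted handles are still targeted");
            this.targets = targets;
        }
    }

    Handle head;
    Handle tail;
    Branch[] branches = new Branch[0];

    Handle append(String insn) {
        Handle h = new Handle(insn);
        if (head == null) {
            head = h;
        } else {
            tail.next = h;
        }
        tail = h;
        return h;
    }

    void addBranch(Branch b) {
        branches = Arrays.copyOf(branches, branches.length + 1);
        branches[branches.length - 1] = b;
    }

    /** Unlinks first..last (inclusive, assumed to be in this list), then reports lost targets. */
    void delete(Handle first, Handle last) throws TargetLost {
        Handle prev = null;
        for (Handle h = head; h != first; h = h.next) {
            prev = h;
        }
        Handle after = last.next;
        if (prev == null) {
            head = after;
        } else {
            prev.next = after;
        }
        if (tail == last) {
            tail = prev;
        }
        // The removed handles still link to each other, so the range can be scanned.
        Handle[] lost = new Handle[0];
        for (Handle h = first; h != after; h = h.next) {
            for (Branch b : branches) {
                if (b.target == h) {
                    lost = Arrays.copyOf(lost, lost.length + 1);
                    lost[lost.length - 1] = h;
                    break;
                }
            }
        }
        if (lost.length != 0) {
            throw new TargetLost(lost);
        }
    }

    public static void main(String[] args) {
        TargeterDemo code = new TargeterDemo();
        code.append("iconst_0");
        Handle dead = code.append("nop");
        Handle ret = code.append("ireturn");
        Branch jump = new Branch(dead);   // e.g., a goto aimed at the nop
        code.addBranch(jump);
        try {
            code.delete(dead, dead);
        } catch (TargetLost e) {
            // Redirect every lost reference to the instruction after the deleted range.
            for (Handle lost : e.targets) {
                for (Branch b : code.branches) {
                    b.updateTarget(lost, ret);
                }
            }
        }
        System.out.println(jump.target.insn);
    }
}
        </source>

        <p>
          In <tt>main()</tt>, the branch aimed at the deleted <tt>nop</tt> is
          redirected to the following <tt>ireturn</tt>, so the program prints
          <tt>ireturn</tt>. With BCEL itself, the corresponding redirect loop
          is written against <tt>TargetLostException</tt> and
          <tt>InstructionTargeter</tt>: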
        </p>

        <source>
try {
  il.delete(first, last);
} catch (TargetLostException e) {
  // Redirect every reference to a deleted handle; new_target is the
  // handle that should take its place.
  for (InstructionHandle target : e.getTargets()) {
    for (InstructionTargeter targeter : target.getTargeters()) {
      targeter.updateTarget(target, new_target);
    }
  }
}
        </source>

        <p>
          <b>Finalizing:</b> When the instruction list is ready to be dumped
          to pure byte code, all symbolic references must be mapped to real
          byte code offsets. This is done by the <tt>getByteCode()</tt>
          method, which <tt>MethodGen.getMethod()</tt> calls by default.
          Afterwards, you should call <tt>dispose()</tt> so that the
          instruction handles can be reused internally, which reduces memory
          usage.
        </p>

        <source>
InstructionList il = new InstructionList();

ClassGen cg = new ClassGen("HelloWorld", "java.lang.Object",
                           "&lt;generated&gt;", ACC_PUBLIC | ACC_SUPER, null);
ConstantPoolGen cp = cg.getConstantPool();
MethodGen mg = new MethodGen(ACC_STATIC | ACC_PUBLIC,
                             Type.VOID, new Type[] { new ArrayType(Type.STRING, 1) },
                             new String[] { "argv" }, "main", "HelloWorld", il, cp);
...
mg.setMaxStack();   // compute the limits before creating the Method
mg.setMaxLocals();
cg.addMethod(mg.getMethod());
il.dispose(); // Reuse instruction handles of list
        </source>

        <h4>Code example revisited</h4>
        <p>
          Using instruction lists gives us a generic view of the code: in
          <a href="#Figure 5">Figure 5</a> we again present the code chunk
          of the <tt>readInt()</tt> method from the factorial example in section
          <a href="jvm.html#Code_example">2.6</a>. The local variables
          <tt>n</tt> and <tt>e1</tt> each hold two references to
          instructions, defining their scope. There are two <tt>goto</tt>s
          branching to the <tt>iload</tt> at the end of the method. One of
          the exception handlers is displayed, too: it references the start
          and the end of the <tt>try</tt> block and also the exception
          handler code.
        </p>

        <p align="center">
          <a name="Figure 5">
          <img src="../images/il.gif"/>
          <br/>
          Figure 5: Instruction list for <tt>readInt()</tt> method</a>
        </p>

        <h4>Instruction factories</h4>
        <p>
          To simplify the creation of certain instructions, the user can use
          the supplied <tt>InstructionFactory</tt> class, which offers many
          useful methods for creating instructions from
          scratch.
          Alternatively, the user can also use <em>compound
          instructions</em>: when producing byte code, some patterns
          occur very frequently, for instance the compilation of
          arithmetic or comparison expressions. You certainly do not want
          to rewrite the code that translates such expressions into byte
          code in every place they may appear. To support this, the
          <font face="helvetica,arial">BCEL</font> API includes a <em>compound
          instruction</em> (an interface with a single
          <tt>getInstructionList()</tt> method). Instances of classes
          implementing it may be used anywhere normal instructions would
          occur, particularly in append operations.
        </p>

        <p>
          <b>Example: Pushing constants</b> Pushing constants onto the
          operand stack may be coded in different ways. As explained in <a
          href="jvm.html#Byte_code_instruction_set">section 2.2</a>, there are
          some "short-cut" instructions that make the produced byte code
          more compact. The smallest instruction to push a single
          <tt>1</tt> onto the stack is <tt>iconst_1</tt>; other
          possibilities are <tt>bipush</tt> (for values
          between -128 and 127), <tt>sipush</tt> (between -32768 and 32767),
          or <tt>ldc</tt> (load a constant from the constant pool).
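        </p>

        <p>
          The selection among these short-cut instructions can be sketched
          as follows. This is a hypothetical helper, not BCEL's
          <tt>PUSH</tt> implementation: it handles only <tt>int</tt>
          constants, returns mnemonics as strings, and ignores details such
          as <tt>ldc_w</tt>, which is needed when the constant pool index
          exceeds 255. The value ranges follow the JVM specification:
          <tt>iconst_m1</tt> to <tt>iconst_5</tt> cover -1 to 5,
          <tt>bipush</tt> takes a signed byte, and <tt>sipush</tt> a signed
          short.
        </p>

        <source>
/**
 * Sketch: choose the shortest JVM instruction that pushes an int constant.
 * Hypothetical helper, not BCEL's PUSH class; ldc_w and non-int constants are ignored.
 */
public final class PushDemo {
    /** True if v lies between lo and hi (inclusive), checked by clamping v into that range. */
    static boolean inRange(int v, int lo, int hi) {
        return Math.max(lo, Math.min(v, hi)) == v;
    }

    /** Returns the mnemonic (with operand) of the most compact push for value. */
    static String pushFor(int value) {
        if (inRange(value, -1, 5)) {
            // iconst_m1 ... iconst_5: 1 byte, no operand
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        if (inRange(value, Byte.MIN_VALUE, Byte.MAX_VALUE)) {
            return "bipush " + value;   // 2 bytes: opcode + signed byte
        }
        if (inRange(value, Short.MIN_VALUE, Short.MAX_VALUE)) {
            return "sipush " + value;   // 3 bytes: opcode + signed short
        }
        return "ldc " + value;          // value must be stored in the constant pool
    }

    public static void main(String[] args) {
        int[] samples = { 1, -1, 100, 4711, 100000 };
        for (int v : samples) {
            System.out.println(v + " -> " + pushFor(v));
        }
    }
}
        </source>

        <p>
          For the samples in <tt>main()</tt>, 1 becomes <tt>iconst_1</tt>,
          100 becomes <tt>bipush 100</tt>, 4711 becomes
          <tt>sipush 4711</tt>, and 100000 falls through to <tt>ldc</tt>.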
        </p>

        <p>
          Instead of repeatedly selecting the most compact instruction in,
          say, a switch, one can use the compound <tt>PUSH</tt> instruction
          whenever pushing a constant number or string. It produces the
          appropriate byte code instruction and inserts entries into the
          constant pool if necessary.
        </p>

        <source>
InstructionFactory f = new InstructionFactory(class_gen);
ConstantPoolGen cp = class_gen.getConstantPool();
InstructionList il = new InstructionList();
...
il.append(new PUSH(cp, "Hello, world"));
il.append(new PUSH(cp, 4711));
...
il.append(f.createPrintln("Hello World"));
...
il.append(f.createReturn(type)); // type: return type of the generated method
        </source>

        <h4>Code patterns using regular expressions</h4>
        <p>
          When transforming code, for instance during optimization or when
          inserting analysis method calls, one typically searches for
          certain code patterns at which to perform the transformation. To
          simplify handling such situations, <font
          face="helvetica,arial">BCEL</font> introduces a special feature:
          one can search for given code patterns within an instruction list
          using <em>regular expressions</em>.
          In such expressions,
          instructions are represented by their opcode names, e.g.,
          <tt>LDC</tt>; one may also use their respective superclasses, e.g.,
          "<tt>IfInstruction</tt>". Meta characters like <tt>+</tt>,
          <tt>*</tt>, and <tt>(..|..)</tt> have their usual meanings. Thus,
          the expression
        </p>

        <source>"NOP+(ILOAD|ALOAD)*"</source>

        <p>
          represents a piece of code consisting of at least one <tt>NOP</tt>
          followed by a possibly empty sequence of <tt>ILOAD</tt> and
          <tt>ALOAD</tt> instructions.
        </p>

        <p>
          The <tt>search()</tt> method of class
          <tt>org.apache.bcel.util.InstructionFinder</tt> takes a regular
          expression and a starting point as arguments and returns an
          iterator over the matched instruction ranges. Additional
          constraints on the matched instructions, which cannot be
          expressed via regular expressions, may be supplied as <em>code
          constraint</em> objects.
        </p>

        <h4>Example: Optimizing boolean expressions</h4>
        <p>
          In Java, the boolean values true and false are mapped to 1 and 0,
          respectively. Thus, the simplest way to evaluate boolean
          expressions is to push a 1 or a 0 onto the operand stack depending
          on the truth value of the expression.
    But this way, the
    subsequent combination of boolean expressions (e.g., with
    <tt>&&</tt>) yields long chunks of code that push
    lots of 1s and 0s onto the stack.
    </p>

    <p>
    Once the code has been finalized, these chunks can be optimized
    with a <em>peephole</em> algorithm: An <tt>IfInstruction</tt>
    (e.g., the integer comparison <tt>if_icmpeq</tt>) whose outcome is
    turned into a 1 or a 0 on the stack, and which is followed by an
    <tt>ifne</tt> instruction (branch if the stack value is not 0), may
    be replaced by the <tt>IfInstruction</tt> alone, with its branch
    target set to the target of the <tt>ifne</tt> instruction. If the
    chunk ends in <tt>ifeq</tt> instead, the condition of the
    <tt>IfInstruction</tt> must be negated as well:
    </p>

    <source>
CodeConstraint constraint = new CodeConstraint() {
  public boolean checkCode(InstructionHandle[] match) {
    IfInstruction if1 = (IfInstruction) match[0].getInstruction();
    GOTO g = (GOTO) match[2].getInstruction();
    return (if1.getTarget() == match[3]) &&
           (g.getTarget() == match[4]);
  }
};

InstructionFinder f = new InstructionFinder(il);
String pat = "IfInstruction ICONST_0 GOTO ICONST_1 NOP(IFEQ|IFNE)";

for (Iterator e = f.search(pat, constraint); e.hasNext(); ) {
  InstructionHandle[] match = (InstructionHandle[]) e.next();
  ...
  IfInstruction if2 = (IfInstruction) match[5].getInstruction();
  ((IfInstruction) match[0].getInstruction()).setTarget(if2.getTarget()); // Update target
  ...
  try {
    il.delete(match[1], match[5]);
  } catch (TargetLostException ex) {
    ...
  }
}
    </source>

    <p>
    The code constraint object ensures that the matched code really
    corresponds to the targeted expression pattern, i.e., that the
    branches jump to the expected <tt>ICONST_1</tt> and <tt>NOP</tt>
    instructions. Repeated application of this algorithm removes all
    unnecessary stack operations and branch instructions from the byte
    code. If any of the deleted instructions is still referenced by an
    <tt>InstructionTargeter</tt> object, the reference has to be
    updated in the <tt>catch</tt> clause.
    </p>

    <p>
    <b>Example application:</b>
    The expression:
    </p>

    <source>
  if ((a == null) || (i < 2))
    System.out.println("Ooops");
    </source>

    <p>
    can be compiled to either of the two chunks of byte code shown in <a
    href="#Figure 6">figure 6</a>.
    The left column represents the
    unoptimized code, while the right column displays the same code
    after the peephole algorithm has been applied:
    </p>

    <p align="center"><a name="Figure 6">
    <table>
      <tr>
        <td valign="top"><pre>
  5: aload_0
  6: ifnull #13
  9: iconst_0
 10: goto #14
 13: iconst_1
 14: nop
 15: ifne #36
 18: iload_1
 19: iconst_2
 20: if_icmplt #27
 23: iconst_0
 24: goto #28
 27: iconst_1
 28: nop
 29: ifne #36
 32: iconst_0
 33: goto #37
 36: iconst_1
 37: nop
 38: ifeq #52
 41: getstatic System.out
 44: ldc "Ooops"
 46: invokevirtual println
 52: return
</pre></td>
        <td valign="top"><pre>
 10: aload_0
 11: ifnull #19
 14: iload_1
 15: iconst_2
 16: if_icmpge #27
 19: getstatic System.out
 22: ldc "Ooops"
 24: invokevirtual println
 27: return
</pre></td>
      </tr>
    </table>
    </a>
    </p>
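    <p>
    The rewrite of figure 6 can be retraced without <font
    face="helvetica,arial">BCEL</font>. The following is a minimal toy
    sketch, not BCEL's API: the class <tt>PeepholeSketch</tt>, its
    <tt>Insn</tt> handles and the methods <tt>rewriteAt()</tt>,
    <tt>optimize()</tt>, <tt>render()</tt> and <tt>figure6Left()</tt>
    are hypothetical. Instructions are reduced to opcode names, byte
    offsets are replaced by list positions, and only the opcodes that
    occur in figure 6 are handled. The opcode test plays the role of the
    regular expression, the branch-target test that of the code
    constraint, and the redirection of other jumps into a deleted chunk
    that of the <tt>TargetLostException</tt> handler.
    </p>

```java
// Hypothetical toy model of the peephole rewrite in figure 6; Insn, rewriteAt() and
// optimize() are illustrative names, not part of BCEL's API.
final class PeepholeSketch {
    /** A handle in a singly linked instruction list, loosely like InstructionHandle. */
    static final class Insn {
        String op;    // opcode name, e.g. "ifnull"
        Insn target;  // branch target, null for non-branch instructions
        Insn next;    // successor in the list, null at the end
        Insn(String op) { this.op = op; }
    }

    /** Inverts the conditional branches occurring in figure 6; null if unsupported. */
    static String negate(String op) {
        String[] pairs = {"ifeq", "ifne", "ifnull", "ifnonnull", "if_icmplt", "if_icmpge"};
        int i = java.util.Arrays.asList(pairs).indexOf(op);
        return i >= 0 ? pairs[i ^ 1] : null; // partners sit at indices 2k and 2k+1
    }

    /** Returns the n-th successor of h, or null if the list ends first. */
    static Insn skip(Insn h, int n) {
        while (h != null && n > 0) { h = h.next; n--; }
        return h;
    }

    /** True if x is one of the handles first..last (inclusive). */
    static boolean inChunk(Insn x, Insn first, Insn last) {
        for (Insn t = first; t != last.next; t = t.next) if (t == x) return true;
        return false;
    }

    /** Tries "if ICONST_0 GOTO ICONST_1 NOP (IFEQ|IFNE)" at h; true if the list changed. */
    static boolean rewriteAt(Insn head, Insn h) {
        Insn i1 = skip(h, 1), g = skip(h, 2), i3 = skip(h, 3), i4 = skip(h, 4), i5 = skip(h, 5);
        if (i5 == null || i5.next == null || !h.op.startsWith("if")) return false;
        boolean ifeq = i5.op.equals("ifeq");
        // Opcode pattern: the part BCEL expresses as a regular expression.
        if (!i1.op.equals("iconst_0") || !g.op.equals("goto") || !i3.op.equals("iconst_1")
                || !i4.op.equals("nop") || !(ifeq || i5.op.equals("ifne"))) return false;
        // Code constraint: the branches must form the push-0/push-1 diamond.
        if (h.target != i3 || g.target != i4 || inChunk(i5.target, i1, i5)) return false;
        String newOp = ifeq ? negate(h.op) : h.op; // IFEQ needs the inverted condition
        if (newOp == null) return false;
        Insn onZero = ifeq ? i5.target : i5.next; // where a jump to ICONST_0 ends up
        Insn onOne = ifeq ? i5.next : i5.target;  // where a jump to ICONST_1 ends up
        // Other jumps into the chunk can only be redirected if they hit ICONST_0 or ICONST_1.
        for (Insn t = head; t != null; t = t.next) {
            if (t == h || inChunk(t, i1, i5) || !inChunk(t.target, i1, i5)) continue;
            if (t.target != i1 && t.target != i3) return false;
        }
        for (Insn t = head; t != null; t = t.next) {
            if (t == h || inChunk(t, i1, i5)) continue;
            if (t.target == i1) t.target = onZero;
            else if (t.target == i3) t.target = onOne;
        }
        h.op = newOp;
        h.target = i5.target; // branch directly to where IFEQ/IFNE would have gone
        h.next = i5.next;     // unlink the five dead instructions
        return true;
    }

    /** Rewrites until no chunk is left; the head is never removed, so it is returned as is. */
    static Insn optimize(Insn head) {
        for (Insn h = head; h != null; ) {
            if (!rewriteAt(head, h)) h = h.next; // after a rewrite, re-examine the same h
        }
        return head;
    }

    /** Renders each instruction as "op", or as "op->targetOp" for branches. */
    static String render(Insn head) {
        java.util.StringJoiner out = new java.util.StringJoiner(" ");
        for (Insn t = head; t != null; t = t.next) {
            out.add(t.target == null ? t.op : t.op + "->" + t.target.op);
        }
        return out.toString();
    }

    /** The left column of figure 6, with byte offsets replaced by list indices. */
    static Insn figure6Left() {
        String[] ops = {"aload_0", "ifnull", "iconst_0", "goto", "iconst_1", "nop", "ifne",
            "iload_1", "iconst_2", "if_icmplt", "iconst_0", "goto", "iconst_1", "nop", "ifne",
            "iconst_0", "goto", "iconst_1", "nop", "ifeq", "getstatic", "ldc", "invokevirtual", "return"};
        int[] targets = {-1, 4, -1, 5, -1, -1, 17, -1, -1, 12, -1, 13, -1, -1, 17,
            -1, 18, -1, -1, 23, -1, -1, -1, -1};
        Insn[] hs = new Insn[ops.length + 1]; // hs[ops.length] stays null and ends the list
        for (int i = ops.length - 1; i >= 0; i--) { // back to front: figure 6 only jumps forward
            hs[i] = new Insn(ops[i]);
            hs[i].next = hs[i + 1];
            if (targets[i] >= 0) hs[i].target = hs[targets[i]];
        }
        return hs[0];
    }

    public static void main(String[] args) {
        System.out.println(render(figure6Left()));
        System.out.println(render(optimize(figure6Left())));
    }
}
```

    <p>
    Applied to the left column, <tt>optimize()</tt> leaves
    <tt>aload_0 ifnull->getstatic iload_1 iconst_2 if_icmpge->return getstatic ldc invokevirtual return</tt>,
    i.e., the instruction sequence of the right column. After the second
    rewrite, <tt>if_icmplt</tt> itself heads the third chunk, which ends
    in <tt>ifeq</tt>; this is why the sketch re-examines an instruction
    after rewriting it, and why that instruction ends up as
    <tt>if_icmpge</tt>.
    </p>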
    </subsection>
  </section>
  </body>
</document>