<?xml version="1.0"?>
<!--
    * Licensed to the Apache Software Foundation (ASF) under one
    * or more contributor license agreements.  See the NOTICE file
    * distributed with this work for additional information
    * regarding copyright ownership.  The ASF licenses this file
    * to you under the Apache License, Version 2.0 (the
    * "License"); you may not use this file except in compliance
    * with the License.  You may obtain a copy of the License at
    *
    *   http://www.apache.org/licenses/LICENSE-2.0
    *
    * Unless required by applicable law or agreed to in writing,
    * software distributed under the License is distributed on an
    * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    * KIND, either express or implied.  See the License for the
    * specific language governing permissions and limitations
    * under the License.
-->
<document>
  <properties>
    <title>The BCEL API</title>
  </properties>

  <body>
    <section name="The BCEL API">
      <p>
        The <font face="helvetica,arial">BCEL</font> API abstracts from
        the concrete circumstances of the Java Virtual Machine and from
        the details of reading and writing binary Java class files. The
        API mainly consists of three parts:
      </p>

      <p>

        <ol type="1">
          <li> A package that contains classes that describe "static"
            constraints of class files, i.e., it reflects the class file
            format and is not intended for byte code modifications. The
            classes may be used to read and write class files from or to a
            file.  This is especially useful for analyzing Java classes
            without having the source files at hand.  The main data
            structure is called <tt>JavaClass</tt>, which contains methods,
            fields, etc.</li>

          <li> A package to dynamically generate or modify
            <tt>JavaClass</tt> or <tt>Method</tt> objects.  It may be used to
            insert analysis code, to strip unnecessary information from class
            files, or to implement the code generator back-end of a Java
            compiler.</li>

          <li> Various code examples and utilities like a class file viewer,
            a tool to convert class files into HTML, and a converter from
            class files to the <a
                    href="http://jasmin.sourceforge.net">Jasmin</a> assembly
            language.</li>
        </ol>
      </p>

    <subsection name="JavaClass">
      <p>
        The "static" component of the <font
              face="helvetica,arial">BCEL</font> API resides in the package
        <tt>org.apache.bcel.classfile</tt> and closely represents class
        files. All of the binary components and data structures declared
        in the <a
              href="http://docs.oracle.com/javase/specs/">JVM
        specification</a> and described in section <a
              href="#2 The Java Virtual Machine">2</a> are mapped to classes.

        <a href="#Figure 3">Figure 3</a> shows a UML diagram of the
        hierarchy of classes of the <font face="helvetica,arial">BCEL</font>
        API. <a href="#Figure 8">Figure 8</a> in the appendix also
        shows a detailed diagram of the <tt>ConstantPool</tt> components.
      </p>

      <p align="center">
        <a name="Figure 3">
          <img src="../images/javaclass.gif"/> <br/>
          Figure 3: UML diagram for the JavaClass API</a>
      </p>

      <p>
        The top-level data structure is <tt>JavaClass</tt>, which in most
        cases is created by a <tt>ClassParser</tt> object that is capable
        of parsing binary class files. A <tt>JavaClass</tt> object
        basically consists of fields, methods, and symbolic references to
        the super class and to the implemented interfaces.
      </p>

      <p>
        The constant pool serves as a kind of central repository and is
        thus of outstanding importance for all components.
        <tt>ConstantPool</tt> objects contain a fixed-size array of
        <tt>Constant</tt> entries, which may be retrieved via the
        <tt>getConstant()</tt> method taking an integer index as argument.
        Indexes to the constant pool may be contained in instructions as
        well as in other components of a class file and in constant pool
        entries themselves.
      </p>

      <p>
        Methods and fields contain a signature, symbolically defining
        their types.  Access flags like <tt>public static final</tt> occur
        in several places and are encoded by an integer bit mask, e.g.,
        <tt>public static final</tt> corresponds to the Java expression
      </p>


      <source>int access_flags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;</source>
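
      <p>
        The bit-mask encoding can be tried out with the JDK alone. The
        following is a minimal sketch, assuming that the constants in
        <tt>java.lang.reflect.Modifier</tt> carry the same bit values as
        the class file flags for <tt>public</tt>, <tt>static</tt>, and
        <tt>final</tt>. The class name <tt>AccessFlagsDemo</tt> is
        hypothetical and not part of BCEL.
      </p>

```java
import java.lang.reflect.Modifier;

// Hypothetical demo class; it uses only the JDK, not BCEL.
final class AccessFlagsDemo {
    // Combine three flags into one integer bit mask.
    static int publicStaticFinal() {
        return Modifier.PUBLIC | Modifier.STATIC | Modifier.FINAL;
    }

    // Decode a bit mask back into Java modifier keywords.
    static String describe(int accessFlags) {
        return Modifier.toString(accessFlags);
    }

    public static void main(String[] args) {
        int flags = publicStaticFinal();
        System.out.println(flags + " = " + describe(flags));
        System.out.println("static? " + Modifier.isStatic(flags));
    }
}
```

      <p>
        Testing a single flag amounts to checking one bit of the mask,
        which is what helpers such as <tt>Modifier.isStatic()</tt> do.
      </p>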

      <p>
        As mentioned in <a href="jvm.html#Java_class_file_format">section
        2.1</a> already, several components may contain <em>attribute</em>
        objects: classes, fields, methods, and <tt>Code</tt> objects
        (introduced in <a href="jvm.html#Method_code">section 2.3</a>).  The
        latter is an attribute itself that contains the actual byte code
        array, the maximum stack size, the number of local variables, a
        table of handled exceptions, and some optional debugging
        information encoded as <tt>LineNumberTable</tt> and
        <tt>LocalVariableTable</tt> attributes. Attributes are in general
        specific to some data structure, i.e., no two components share the
        same kind of attribute, though this is not explicitly
        forbidden. In the figure the <tt>Attribute</tt> classes are stereotyped
        with the component they belong to.
      </p>

    </subsection>

    <subsection name="Class repository">
      <p>
        Using the provided <tt>Repository</tt> class, reading class files into
        a <tt>JavaClass</tt> object is quite simple:
      </p>

      <source>JavaClass clazz = Repository.lookupClass("java.lang.String");</source>

      <p>
        The repository also contains methods providing the dynamic equivalent
        of the <tt>instanceof</tt> operator, and other useful routines:
      </p>

      <source>
if (Repository.instanceOf(clazz, super_class)) {
    ...
}
      </source>

    </subsection>

    <h4>Accessing class file data</h4>

      <p>
        Information within the class file components may be accessed like
        Java Beans via intuitive set/get methods. All of them also define
        a <tt>toString()</tt> method so that implementing a simple class
        viewer is very easy. In fact all of the examples used here have
        been produced this way:
      </p>

      <source>
System.out.println(clazz);
printCode(clazz.getMethods());
...
public static void printCode(Method[] methods) {
    for (Method method : methods) {
        System.out.println(method);

        Code code = method.getCode();
        if (code != null) { // Abstract and native methods have no code
            System.out.println(code);
        }
    }
}
      </source>

    <h4>Analyzing class data</h4>
      <p>
        Last but not least, <font face="helvetica,arial">BCEL</font>
        supports the <em>Visitor</em> design pattern, so one can write
        visitor objects to traverse and analyze the contents of a class
        file. Included in the distribution is a class
        <tt>JasminVisitor</tt> that converts class files into the <a
              href="http://jasmin.sourceforge.net">Jasmin</a>
        assembler language.
      </p>

    <subsection name="ClassGen">
      <p>
        This part of the API (package <tt>org.apache.bcel.generic</tt>)
        supplies an abstraction level for creating or transforming class
        files dynamically. It makes the static constraints of Java class
        files like the hard-coded byte code addresses "generic". The
        generic constant pool, for example, is implemented by the class
        <tt>ConstantPoolGen</tt> which offers methods for adding different
        types of constants. Accordingly, <tt>ClassGen</tt> offers an
        interface to add methods, fields, and attributes.
        <a href="#Figure 4">Figure 4</a> gives an overview of this part of the API.
      </p>

      <p align="center">
        <a name="Figure 4">
          <img src="../images/classgen.gif"/>
          <br/>
          Figure 4: UML diagram of the ClassGen API</a>
      </p>

    <h4>Types</h4>
      <p>
        We abstract from the concrete details of the type signature syntax
        (see <a href="jvm.html#Type_information">2.5</a>) by introducing the
        <tt>Type</tt> class, which is used, for example, by methods to
        define their return and argument types. Concrete sub-classes are
        <tt>BasicType</tt>, <tt>ObjectType</tt>, and <tt>ArrayType</tt>,
        the last of which consists of the element type and the number of
        dimensions. For commonly used types the class offers some
        predefined constants. For example, the method signature of the
        <tt>main</tt> method as shown in
        <a href="jvm.html#Type_information">section 2.5</a> is represented by:
      </p>

      <source>
Type return_type = Type.VOID;
Type[] arg_types = new Type[] { new ArrayType(Type.STRING, 1) };
      </source>

      <p>
        <tt>Type</tt> also contains methods to convert types into textual
        signatures and vice versa. The sub-classes contain implementations
        of the routines and constraints specified by the Java Language
        Specification.
      </p>
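
      <p>
        The textual signature that <tt>Type</tt> abstracts from can be
        inspected with the JDK alone. The following sketch is an
        illustration under that assumption: it uses
        <tt>java.lang.invoke.MethodType</tt> rather than BCEL's own
        conversion methods, and the class name <tt>DescriptorDemo</tt> is
        hypothetical.
      </p>

```java
import java.lang.invoke.MethodType;

// Hypothetical demo class; it uses only the JDK, not BCEL's Type class.
final class DescriptorDemo {
    // Build the descriptor of "void main(String[])" from its types.
    static String mainDescriptor() {
        return MethodType.methodType(void.class, String[].class)
                .toMethodDescriptorString();
    }

    // Parse a descriptor string back into its return and argument types.
    static MethodType parse(String descriptor) {
        return MethodType.fromMethodDescriptorString(
                descriptor, DescriptorDemo.class.getClassLoader());
    }

    public static void main(String[] args) {
        String descriptor = mainDescriptor();
        MethodType type = parse(descriptor);
        System.out.println(descriptor);
        System.out.println(type.returnType() + " " + type.parameterType(0));
    }
}
```

      <p>
        The descriptor produced for <tt>main</tt> is
        <tt>([Ljava/lang/String;)V</tt>, i.e., a one-dimensional array of
        <tt>String</tt> as the only argument and <tt>void</tt> as the
        return type.
      </p>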

    <h4>Generic fields and methods</h4>
      <p>
        Fields are represented by <tt>FieldGen</tt> objects, which may be
        freely modified by the user. If they have the access rights
        <tt>static final</tt>, i.e., are constants and of basic type, they
        may optionally have an initializing value.
      </p>

      <p>
        Generic methods offer methods to add exceptions the method may
        throw, local variables, and exception handlers. The latter two are
        represented by user-configurable objects as well. Because
        exception handlers and local variables contain references to byte
        code addresses, they also take the role of an <em>instruction
        targeter</em> in our terminology. Instruction targeters contain a
        method <tt>updateTarget()</tt> to redirect a reference. This is
        somewhat related to the Observer design pattern. Generic
        (non-abstract) methods refer to <em>instruction lists</em> that
        consist of instruction objects. References to byte code addresses
        are implemented by handles to instruction objects. If the list is
        updated the instruction targeters will be informed about it. This
        is explained in more detail in the following sections.
      </p>

      <p>
        The maximum stack size needed by the method and the maximum number
        of local variables used may be set manually or computed
        automatically via the <tt>setMaxStack()</tt> and
        <tt>setMaxLocals()</tt> methods.
      </p>

    <h4>Instructions</h4>
      <p>
        Modeling instructions as objects may look somewhat odd at first
        sight, but in fact enables programmers to obtain a high-level view
        of control flow without handling details like concrete byte code
        offsets.  Instructions consist of an opcode (sometimes called
        tag), their length in bytes and an offset (or index) within the
        byte code. Since many instructions are immutable (e.g., stack
        operators), the <tt>InstructionConstants</tt> interface offers
        shareable predefined "fly-weight" constants to use.
      </p>

      <p>
        Instructions are grouped via sub-classing; the type hierarchy of
        instruction classes is illustrated by an (incomplete) figure in the
        appendix. The most important family of instructions are the
        <em>branch instructions</em>, e.g., <tt>goto</tt>, that branch to
        targets somewhere within the byte code. Obviously, this makes them
        candidates for playing an <tt>InstructionTargeter</tt> role,
        too. Instructions are further grouped by the interfaces they
        implement; there are, e.g., <tt>TypedInstruction</tt>s that are
        associated with a specific type like <tt>ldc</tt>, or
        <tt>ExceptionThrower</tt> instructions that may raise exceptions
        when executed.
      </p>

      <p>
        All instructions can be traversed via <tt>accept(Visitor v)</tt>
        methods, i.e., the Visitor design pattern. There is however a
        special trick in these methods that allows merging the handling
        of certain instruction groups. The <tt>accept()</tt> methods do not
        only call the corresponding <tt>visit()</tt> method, but call
        <tt>visit()</tt> methods of their respective super classes and
        implemented interfaces first, i.e., the most specific
        <tt>visit()</tt> call is last. Thus one can group the handling of,
        say, all <tt>BranchInstruction</tt>s into one single method.
      </p>
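
      <p>
        The call order described above can be sketched with a toy
        hierarchy. All names below (<tt>VisitorOrderDemo</tt>,
        <tt>Branch</tt>, <tt>Goto</tt>, and the two visit methods) are
        hypothetical stand-ins chosen for illustration; this is not the
        BCEL <tt>Visitor</tt> interface itself.
      </p>

```java
// Hypothetical toy model of the visiting order; not BCEL's classes.
final class VisitorOrderDemo {
    // One visit method per level of the small hierarchy.
    interface Visitor {
        void visitBranchInstruction(Branch b);
        void visitGoto(Goto g);
    }

    static abstract class Branch {
        abstract void accept(Visitor v);
    }

    static final class Goto extends Branch {
        @Override
        void accept(Visitor v) {
            v.visitBranchInstruction(this); // general group first
            v.visitGoto(this);              // most specific call last
        }
    }

    // Record the order in which the visit methods are invoked.
    static String trace() {
        StringBuilder log = new StringBuilder();
        Visitor v = new Visitor() {
            @Override
            public void visitBranchInstruction(Branch b) {
                log.append("branch;");
            }

            @Override
            public void visitGoto(Goto g) {
                log.append("goto;");
            }
        };
        new Goto().accept(v);
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace());
    }
}
```

      <p>
        A visitor that only cares about branches in general would put its
        logic into the group-level method and leave the specific one
        empty.
      </p>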

      <p>
        For debugging purposes it may even make sense to "invent" your own
        instructions. In a sophisticated code generator like the one used
        as a backend of the <a href="http://barat.sourceforge.net">Barat
        framework</a> for static analysis one often has to insert
        temporary <tt>nop</tt> (No operation) instructions. When examining
        the produced code it may be very difficult to trace back where the
        <tt>nop</tt> was actually inserted. One could think of a derived
        <tt>nop2</tt> instruction that contains additional debugging
        information. When the instruction list is dumped to byte code, the
        extra data is simply dropped.
      </p>

      <p>
        One could also think of new byte code instructions operating on
        complex numbers that are replaced by normal byte code at
        load time or are recognized by a new JVM.
      </p>

    <h4>Instruction lists</h4>
      <p>
        An <em>instruction list</em> is implemented by a list of
        <em>instruction handles</em> encapsulating instruction objects.
        References to instructions in the list are thus not implemented by
        direct pointers to instructions but by pointers to instruction
        <em>handles</em>. This makes appending, inserting and deleting
        areas of code very simple and also allows us to reuse immutable
        instruction objects (fly-weight objects). Since we use symbolic
        references, computation of concrete byte code offsets does not
        need to occur until finalization, i.e., until the user has
        finished the process of generating or transforming code. We will
        use the terms instruction handle and instruction synonymously
        throughout the rest of the paper. Instruction handles may contain
        additional user-defined data using the <tt>addAttribute()</tt>
        method.
      </p>
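
      <p>
        Why handles make editing cheap can be seen in a strongly
        simplified model. The sketch below is an assumption-laden toy,
        not BCEL's <tt>InstructionList</tt>: the names
        <tt>HandleListDemo</tt>, <tt>Handle</tt>, <tt>insertAfter()</tt>,
        and <tt>setPositions()</tt> are hypothetical here, and
        instructions are reduced to a mnemonic and a length in bytes.
      </p>

```java
// Hypothetical toy model of an instruction list; not BCEL's classes.
final class HandleListDemo {
    // A handle wraps one instruction; its offset stays unknown until
    // setPositions() is called.
    static final class Handle {
        final String mnemonic;
        final int length;   // encoded size in bytes
        Handle next;        // following handle in the list
        Handle target;      // symbolic branch target, or null
        int offset = -1;    // concrete offset, assigned on finalization

        Handle(String mnemonic, int length) {
            this.mnemonic = mnemonic;
            this.length = length;
        }
    }

    private Handle first;
    private Handle last;

    // Append an instruction at the end and return its handle.
    Handle append(String mnemonic, int length) {
        Handle h = new Handle(mnemonic, length);
        if (first == null) {
            first = h;
        } else {
            last.next = h;
        }
        last = h;
        return h;
    }

    // Insert an instruction directly after the given handle.
    Handle insertAfter(Handle position, String mnemonic, int length) {
        Handle h = new Handle(mnemonic, length);
        h.next = position.next;
        position.next = h;
        if (position == last) {
            last = h;
        }
        return h;
    }

    // Finalization: walk the list once and assign concrete offsets.
    int setPositions() {
        int pc = 0;
        for (Handle h = first; h != null; h = h.next) {
            h.offset = pc;
            pc += h.length;
        }
        return pc;
    }

    // The goto keeps pointing at its target handle even though another
    // instruction is inserted between the two afterwards.
    static int demoBranchOffset() {
        HandleListDemo il = new HandleListDemo();
        Handle jump = il.append("goto", 3);
        Handle target = il.append("aconst_null", 1);
        jump.target = target;
        il.insertAfter(jump, "nop", 1);
        il.setPositions();
        return jump.target.offset - jump.offset;
    }

    public static void main(String[] args) {
        System.out.println("relative branch offset: " + demoBranchOffset());
    }
}
```

      <p>
        Because the branch stores a handle rather than a number, the
        later insertion needs no fix-up; the relative offset is derived
        only once all positions are known.
      </p>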

      <p>
        <b>Appending:</b> One can append instructions or other instruction
        lists anywhere to an existing list. The instructions are appended
        after the given instruction handle. All append methods return a
        new instruction handle which may then be used as the target of a
        branch instruction, e.g.:
      </p>

      <source>
InstructionList il = new InstructionList();
...
GOTO g = new GOTO(null);
il.append(g);
...
// Use immutable fly-weight object
InstructionHandle ih = il.append(InstructionConstants.ACONST_NULL);
g.setTarget(ih);
      </source>

      <p>
        <b>Inserting:</b> Instructions may be inserted anywhere into an
        existing list. They are inserted before the given instruction
        handle. All insert methods return a new instruction handle which
        may then be used as the start address of an exception handler, for
        example.
      </p>

      <source>
InstructionHandle start = il.insert(insertion_point, InstructionConstants.NOP);
...
// The caught exception class is passed as an ObjectType
mg.addExceptionHandler(start, end, handler, new ObjectType("java.io.IOException"));
      </source>

      <p>
        <b>Deleting:</b> Deletion of instructions is also very
        straightforward; all instruction handles and the contained
        instructions within a given range are removed from the instruction
        list and disposed of. The <tt>delete()</tt> method may however throw
        a <tt>TargetLostException</tt> when there are instruction
        targeters still referencing one of the deleted instructions. The
        user is forced to handle such exceptions in a <tt>try-catch</tt>
        clause and redirect these references elsewhere. The <em>peep
        hole</em> optimizer described in the appendix gives a detailed
        example for this.
      </p>

      <source>
try {
    il.delete(first, last);
} catch (TargetLostException e) {
    for (InstructionHandle target : e.getTargets()) {
        for (InstructionTargeter targeter : target.getTargeters()) {
            targeter.updateTarget(target, new_target);
        }
    }
}
      </source>

      <p>
        <b>Finalizing:</b> When the instruction list is ready to be dumped
        to pure byte code, all symbolic references must be mapped to real
        byte code offsets. This is done by the <tt>getByteCode()</tt>
        method which is called by default by
        <tt>MethodGen.getMethod()</tt>. Afterwards you should call
        <tt>dispose()</tt> so that the instruction handles can be reused
        internally. This helps to improve memory usage.
      </p>

      <source>
InstructionList il = new InstructionList();

ClassGen  cg = new ClassGen("HelloWorld", "java.lang.Object",
        "&lt;generated&gt;", ACC_PUBLIC | ACC_SUPER, null);
ConstantPoolGen cp = cg.getConstantPool(); // Pool shared by class and method
MethodGen mg = new MethodGen(ACC_STATIC | ACC_PUBLIC,
        Type.VOID, new Type[] { new ArrayType(Type.STRING, 1) },
        new String[] { "argv" }, "main", "HelloWorld", il, cp);
...
cg.addMethod(mg.getMethod());
il.dispose(); // Reuse instruction handles of list
      </source>

    <h4>Code example revisited</h4>
      <p>
        Using instruction lists gives us a generic view of the code. In
        <a href="#Figure 5">Figure 5</a> we again present the code chunk
        of the <tt>readInt()</tt> method of the factorial example in section
        <a href="jvm.html#Code_example">2.6</a>. The local variables
        <tt>n</tt> and <tt>e1</tt> both hold two references to
        instructions, defining their scope.  There are two <tt>goto</tt>s
        branching to the <tt>iload</tt> at the end of the method. One of
        the exception handlers is displayed, too: it references the start
        and the end of the <tt>try</tt> block and also the exception
        handler code.
      </p>

      <p align="center">
        <a name="Figure 5">
          <img src="../images/il.gif"/>
          <br/>
          Figure 5: Instruction list for <tt>readInt()</tt> method</a>
      </p>

    <h4>Instruction factories</h4>
      <p>
        To simplify the creation of certain instructions the user can use
        the supplied <tt>InstructionFactory</tt> class which offers a lot
        of useful methods to create instructions from
        scratch. Alternatively, one can also use <em>compound
        instructions</em>: When producing byte code, some patterns
        typically occur very frequently, for instance the compilation of
        arithmetic or comparison expressions. You certainly do not want
        to rewrite the code that translates such expressions into byte
        code in every place they may appear. In order to support this, the
        <font face="helvetica,arial">BCEL</font> API includes a <em>compound
        instruction</em> (an interface with a single
        <tt>getInstructionList()</tt> method). Instances of classes
        implementing this interface may be used in any place where normal
        instructions would occur, particularly in append operations.
      </p>

      <p>
        <b>Example: Pushing constants</b> Pushing constants onto the
        operand stack may be coded in different ways. As explained in <a
              href="jvm.html#Byte_code_instruction_set">section 2.2</a> there are
        some "short-cut" instructions that can be used to make the
        produced byte code more compact. The smallest instruction to push
        a single <tt>1</tt> onto the stack is <tt>iconst_1</tt>; other
        possibilities are <tt>bipush</tt> (can be used to push values
        between -128 and 127), <tt>sipush</tt> (between -32768 and 32767),
        or <tt>ldc</tt> (load constant from constant pool).
      </p>
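
      <p>
        The selection among these instructions can be sketched in plain
        Java. The helper below is hypothetical (the names
        <tt>PushSelectorDemo</tt>, <tt>inRange()</tt>, and
        <tt>smallestPush()</tt> are not BCEL API), it only returns the
        mnemonic instead of emitting byte code, and it assumes the
        <tt>iconst</tt> short-cuts cover the values -1 to 5.
      </p>

```java
// Hypothetical helper; it only names the instruction and emits no byte code.
final class PushSelectorDemo {
    // True when v lies between lo and hi, both bounds included.
    static boolean inRange(int v, int lo, int hi) {
        return !(lo > v || v > hi);
    }

    // Pick the most compact instruction that can push the given int.
    static String smallestPush(int value) {
        if (inRange(value, -1, 5)) {
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        if (inRange(value, Byte.MIN_VALUE, Byte.MAX_VALUE)) {
            return "bipush";
        }
        if (inRange(value, Short.MIN_VALUE, Short.MAX_VALUE)) {
            return "sipush";
        }
        return "ldc"; // needs an entry in the constant pool
    }

    public static void main(String[] args) {
        for (int value : new int[] { 1, 100, 4711, 100000 }) {
            System.out.println(value + " -> " + smallestPush(value));
        }
    }
}
```

      <p>
        Only the last case requires a constant pool entry, which is why
        a code generator has to carry the pool along when pushing
        arbitrary values.
      </p>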

      <p>
        Instead of repeatedly selecting the most compact instruction in,
        say, a switch, one can use the compound <tt>PUSH</tt> instruction
        whenever pushing a constant number or string. It will produce the
        appropriate byte code instruction and insert entries into the
        constant pool if necessary.
      </p>

      <source>
InstructionFactory f  = new InstructionFactory(class_gen);
InstructionList    il = new InstructionList();
...
il.append(new PUSH(cp, "Hello, world"));
il.append(new PUSH(cp, 4711));
...
il.append(f.createPrintln("Hello World"));
...
il.append(f.createReturn(type));
      </source>
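
      <p>
        The selection that <tt>PUSH</tt> performs for integer constants
        can be illustrated without BCEL. The following is a minimal
        sketch, not BCEL's implementation: the class <tt>PushSketch</tt>
        and its method <tt>select()</tt> are hypothetical names,
        instructions are represented as plain strings, and constant pool
        handling is left out, so <tt>ldc</tt> merely stands in for any
        constant that has to be loaded from the pool.
      </p>

```java
/** Sketch: pick the most compact JVM instruction for pushing an int constant. */
final class PushSketch {
    /** Returns the mnemonic (with operand, if any) for pushing the given value. */
    static String select(int value) {
        switch (value) {
            case -1:
                return "iconst_m1";        // dedicated one-byte opcode for -1
            case 0: case 1: case 2: case 3: case 4: case 5:
                return "iconst_" + value;  // one-byte opcodes iconst_0 .. iconst_5
            default:
                break;
        }
        if (value == (byte) value) {
            return "bipush " + value;      // survives a round trip through byte
        }
        if (value == (short) value) {
            return "sipush " + value;      // survives a round trip through short
        }
        return "ldc " + value;             // everything else: constant pool entry
    }

    public static void main(String[] args) {
        int[] samples = {1, -1, 100, -129, 4711, 70000};
        for (int v : samples) {
            System.out.println(v + " -> " + select(v));
        }
    }
}
```

      <p>
        For example, <tt>PushSketch.select(4711)</tt> yields
        <tt>sipush 4711</tt>, whereas 70000 no longer fits into a signed
        short and falls back to <tt>ldc</tt>.
      </p>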

    <h4>Code patterns using regular expressions</h4>
      <p>
        When transforming code, for instance during optimization or when
        inserting analysis method calls, one typically searches for
        certain patterns of code at which to perform the transformation. To
        simplify handling such situations, <font
              face="helvetica,arial">BCEL</font> introduces a special feature:
        one can search for given code patterns within an instruction list
        using <em>regular expressions</em>. In such expressions,
        instructions are represented by their opcode names, e.g.,
        <tt>LDC</tt>; one may also use their respective super classes, e.g.,
        "<tt>IfInstruction</tt>". Meta characters like <tt>+</tt>,
        <tt>*</tt>, and <tt>(..|..)</tt> have their usual meanings. Thus,
        the expression
      </p>

      <source>"NOP+(ILOAD|ALOAD)*"</source>

      <p>
        represents a piece of code consisting of at least one <tt>NOP</tt>
        followed by a possibly empty sequence of <tt>ILOAD</tt> and
        <tt>ALOAD</tt> instructions.
      </p>
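
      <p>
        One way to implement such matching, sketched here under
        simplifying assumptions, is to encode every instruction as a
        single character and to hand the translated pattern to
        <tt>java.util.regex</tt>. The class <tt>OpcodePatternSketch</tt>,
        its methods, and the five-entry instruction set are hypothetical;
        super class names such as <tt>IfInstruction</tt>, which would
        have to expand to an alternation of all their subclasses, are not
        handled.
      </p>

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: match a name-based pattern against a sequence of instruction names. */
final class OpcodePatternSketch {
    // Hypothetical mini instruction set; entry i is encoded as the letter 'a' + i.
    static final String[] NAMES = {"NOP", "ILOAD", "ALOAD", "ICONST_1", "RETURN"};

    /** Returns the one-letter code of an instruction name. */
    static char code(String name) {
        int index = Arrays.asList(NAMES).indexOf(name);
        if (index == -1) {
            throw new IllegalArgumentException("unknown instruction: " + name);
        }
        return (char) ('a' + index);
    }

    /** Replaces every instruction name in the pattern by its code; drops whitespace. */
    static String compile(String pattern) {
        Matcher names = Pattern.compile("[A-Za-z_][A-Za-z_0-9]*").matcher(pattern);
        StringBuffer out = new StringBuffer();
        while (names.find()) {
            names.appendReplacement(out, String.valueOf(code(names.group())));
        }
        names.appendTail(out);
        return out.toString().replaceAll("\\s+", "");
    }

    /** Returns {start, end} (end exclusive) of the first match, or null if none. */
    static int[] search(String pattern, String... instructions) {
        StringBuilder encoded = new StringBuilder();
        for (String name : instructions) {
            encoded.append(code(name));
        }
        Matcher m = Pattern.compile(compile(pattern)).matcher(encoded);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }
}
```

      <p>
        With the expression above and the sequence <tt>ICONST_1</tt>,
        <tt>NOP</tt>, <tt>NOP</tt>, <tt>ILOAD</tt>, <tt>ALOAD</tt>,
        <tt>RETURN</tt>, the sketch's <tt>search()</tt> reports the range
        from index 1 (inclusive) to index 5 (exclusive).
      </p>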

      <p>
        The <tt>search()</tt> method of class
        <tt>org.apache.bcel.util.InstructionFinder</tt> takes a regular
        expression and a starting point as arguments and returns an
        iterator over the matching instruction sequences. Additional
        constraints on the matched instructions that cannot be
        expressed by regular expressions may be implemented via <em>code
        constraint</em> objects.
      </p>

    <h4>Example: Optimizing boolean expressions</h4>
      <p>
        In Java, the boolean values <tt>true</tt> and <tt>false</tt> are
        mapped to 1 and 0, respectively. Thus, the simplest way to
        evaluate boolean expressions is to push a 1 or a 0 onto the
        operand stack depending on the truth value of the expression.
        But this way, the subsequent combination of boolean expressions
        (with <tt>&amp;&amp;</tt>, e.g.) yields long chunks of code that
        push lots of 1s and 0s onto the stack.
      </p>

      <p>
        When the code has been finalized, these chunks can be optimized
        with a <em>peephole</em> algorithm: an <tt>IfInstruction</tt>
        (e.g., the comparison of two integers: <tt>if_icmpeq</tt>) that
        produces either a 1 or a 0 on the stack and is followed by an
        <tt>ifne</tt> instruction (branch if the stack value is not 0)
        may be replaced by the <tt>IfInstruction</tt> with its branch
        target replaced by the target of the <tt>ifne</tt> instruction:
      </p>

      <source>
import java.util.Iterator;
import org.apache.bcel.generic.*;
import org.apache.bcel.util.InstructionFinder;
import org.apache.bcel.util.InstructionFinder.CodeConstraint;

// Accept a match only if both branches stay inside the matched pattern.
CodeConstraint constraint = new CodeConstraint() {
    public boolean checkCode(InstructionHandle[] match) {
        IfInstruction if1 = (IfInstruction) match[0].getInstruction();
        GOTO g = (GOTO) match[2].getInstruction();
        return (if1.getTarget() == match[3]) &amp;&amp;
            (g.getTarget() == match[4]);
    }
};

InstructionFinder f = new InstructionFinder(il);
String pat = "IfInstruction ICONST_0 GOTO ICONST_1 NOP(IFEQ|IFNE)";

for (Iterator e = f.search(pat, constraint); e.hasNext(); ) {
    InstructionHandle[] match = (InstructionHandle[]) e.next();
    ...
    // Targets belong to the branch instructions, not to their handles.
    IfInstruction if1 = (IfInstruction) match[0].getInstruction();
    BranchInstruction last = (BranchInstruction) match[5].getInstruction();
    if1.setTarget(last.getTarget()); // Update target
    ...
    try {
        il.delete(match[1], match[5]);
    } catch (TargetLostException ex) {
        ...
    }
}
      </source>

      <p>
        The applied code constraint object ensures that the matched code
        really corresponds to the targeted expression pattern. Repeated
        application of this algorithm removes all unnecessary stack
        operations and branch instructions from the byte code. If any of
        the deleted instructions is still referenced by an
        <tt>InstructionTargeter</tt> object, the reference has to be
        updated in the <tt>catch</tt> clause.
      </p>

      <p>
        <b>Example application:</b>
        The statement:
      </p>

      <source>
if ((a == null) || (i &lt; 2))
    System.out.println("Ooops");
      </source>

      <p>
        can be mapped to both of the chunks of byte code shown in <a
              href="#Figure 6">figure 6</a>. The left column represents the
        unoptimized code while the right column displays the same code
        after the peephole algorithm has been applied:
      </p>

      <p align="center"><a name="Figure 6">
        <table>
          <tr>
            <td valign="top"><pre>
              5:  aload_0
              6:  ifnull        #13
              9:  iconst_0
              10: goto          #14
              13: iconst_1
              14: nop
              15: ifne          #36
              18: iload_1
              19: iconst_2
              20: if_icmplt     #27
              23: iconst_0
              24: goto          #28
              27: iconst_1
              28: nop
              29: ifne          #36
              32: iconst_0
              33: goto          #37
              36: iconst_1
              37: nop
              38: ifeq          #52
              41: getstatic     System.out
              44: ldc           "Ooops"
              46: invokevirtual println
              52: return
            </pre></td>
            <td valign="top"><pre>
              10: aload_0
              11: ifnull        #19
              14: iload_1
              15: iconst_2
              16: if_icmpge     #27
              19: getstatic     System.out
              22: ldc           "Ooops"
              24: invokevirtual println
              27: return
            </pre></td>
          </tr>
        </table>
      </a>
      </p>
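
      <p>
        The transformation shown in <a href="#Figure 6">figure 6</a> can
        also be tried out in isolation. The sketch below is not BCEL
        code: it models instructions as a hypothetical linked list of
        <tt>Insn</tt> objects whose branch targets are object references,
        matches the six-instruction pattern by mnemonic, and redirects
        stale targets to the instruction that follows the removed range,
        which is only one possible policy for what the <tt>catch</tt>
        clause has to do. Only the few branch mnemonics needed here can
        be negated.
      </p>

```java
/** Sketch of the peephole rewrite on a toy linked instruction list (not BCEL). */
final class PeepholeSketch {
    /** Toy instruction: mnemonic, optional branch target, successor. */
    static final class Insn {
        String op;
        Insn target;
        Insn next;

        /** Creates an instruction and links it behind prev (which may be null). */
        Insn(Insn prev, String op) {
            this.op = op;
            if (prev != null) {
                prev.next = this;
            }
        }
    }

    /** Returns the six instructions starting at start, or null if fewer remain. */
    static Insn[] window(Insn start) {
        Insn[] w = new Insn[6];
        Insn cur = start;
        for (int i = 0; i != w.length; i++) {
            if (cur == null) {
                return null;
            }
            w[i] = cur;
            cur = cur.next;
        }
        return w;
    }

    /** Pattern plus constraint: a branch that only selects between 0 and 1. */
    static boolean isBooleanIdiom(Insn[] w) {
        StringBuilder ops = new StringBuilder();
        for (Insn insn : w) {
            ops.append(insn.op).append(' ');
        }
        if (!ops.toString().matches("if\\w* iconst_0 goto iconst_1 nop if(eq|ne) ")) {
            return false;
        }
        if (w[0].target != w[3]) {
            return false;               // the branch must jump to the iconst_1
        }
        return w[2].target == w[4];     // and the goto must jump to the nop
    }

    /** Negations for the few branch mnemonics used here (deliberately incomplete). */
    static String negate(String op) {
        switch (op) {
            case "ifnull":    return "ifnonnull";
            case "ifnonnull": return "ifnull";
            case "ifeq":      return "ifne";
            case "ifne":      return "ifeq";
            case "if_icmplt": return "if_icmpge";
            case "if_icmpge": return "if_icmplt";
            default: throw new IllegalArgumentException("no negation known for " + op);
        }
    }

    /** Rewrites matches in place until none is left; head itself is never removed. */
    static void optimize(Insn head) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Insn cur = head; cur != null; cur = cur.next) {
                Insn[] w = window(cur);
                if (w == null || !isBooleanIdiom(w)) {
                    continue;
                }
                if (w[5].op.equals("ifeq")) {
                    w[0].op = negate(w[0].op);  // ifeq branches when the condition is false
                }
                w[0].target = w[5].target;      // branch straight to the final target
                w[0].next = w[5].next;          // unlink w[1] .. w[5]
                for (Insn i = head; i != null; i = i.next) {
                    for (int k = 1; k != w.length; k++) {
                        if (i.target == w[k]) {
                            i.target = w[5].next;  // stale target: use the successor
                        }
                    }
                }
                changed = true;
            }
        }
    }

    /** Renders the list as "op" or "op->targetOp" entries separated by spaces. */
    static String dump(Insn head) {
        StringBuilder out = new StringBuilder();
        for (Insn i = head; i != null; i = i.next) {
            out.append(i.op).append(i.target == null ? "" : "->" + i.target.op).append(' ');
        }
        return out.toString().trim();
    }
}
```

      <p>
        Building the left column of the figure as a chain of
        <tt>Insn</tt> objects and calling <tt>optimize()</tt> on its
        first instruction leaves the nine instructions of the right
        column: the <tt>ifnull</tt> now targets <tt>getstatic</tt>, and
        <tt>if_icmplt</tt> has become <tt>if_icmpge</tt> targeting
        <tt>return</tt>.
      </p>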
    </subsection>
    </section>
  </body>
</document>