1*055d4590SKeyi Gui<html> 2*055d4590SKeyi Gui<head> 3*055d4590SKeyi Gui <title>Dalvik Optimization and Verification</title> 4*055d4590SKeyi Gui</head> 5*055d4590SKeyi Gui 6*055d4590SKeyi Gui<body> 7*055d4590SKeyi Gui<h1>Dalvik Optimization and Verification With <i>dexopt</i></h1> 8*055d4590SKeyi Gui 9*055d4590SKeyi Gui<p> 10*055d4590SKeyi GuiThe Dalvik virtual machine was designed specifically for the Android 11*055d4590SKeyi Guimobile platform. The target systems have little RAM, store data on slow 12*055d4590SKeyi Guiinternal flash memory, and generally have the performance characteristics 13*055d4590SKeyi Guiof decade-old desktop systems. They also run Linux, which provides 14*055d4590SKeyi Guivirtual memory, processes and threads, and UID-based security mechanisms. 15*055d4590SKeyi Gui<p> 16*055d4590SKeyi GuiThe features and limitations caused us to focus on certain goals: 17*055d4590SKeyi Gui 18*055d4590SKeyi Gui<ul> 19*055d4590SKeyi Gui <li>Class data, notably bytecode, must be shared between multiple 20*055d4590SKeyi Gui processes to minimize total system memory usage. 21*055d4590SKeyi Gui <li>The overhead in launching a new app must be minimized to keep 22*055d4590SKeyi Gui the device responsive. 23*055d4590SKeyi Gui <li>Storing class data in individual files results in a lot of 24*055d4590SKeyi Gui redundancy, especially with respect to strings. To conserve disk 25*055d4590SKeyi Gui space we need to factor this out. 26*055d4590SKeyi Gui <li>Parsing class data fields adds unnecessary overhead during 27*055d4590SKeyi Gui class loading. Accessing data values (e.g. integers and strings) 28*055d4590SKeyi Gui directly as C types is better. 29*055d4590SKeyi Gui <li>Bytecode verification is necessary, but slow, so we want to verify 30*055d4590SKeyi Gui as much as possible outside app execution. 31*055d4590SKeyi Gui <li>Bytecode optimization (quickened instructions, method pruning) is 32*055d4590SKeyi Gui important for speed and battery life. 33*055d4590SKeyi Gui <li>For security reasons, processes may not edit shared code. 34*055d4590SKeyi Gui</ul> 35*055d4590SKeyi Gui 36*055d4590SKeyi Gui<p> 37*055d4590SKeyi GuiThe typical VM implementation uncompresses individual classes from a 38*055d4590SKeyi Guicompressed archive and stores them on the heap. This implies a separate 39*055d4590SKeyi Guicopy of each class in every process, and slows application startup because 40*055d4590SKeyi Guithe code must be uncompressed (or at least read off disk in many small 41*055d4590SKeyi Guipieces). On the other hand, having the bytecode on the local heap makes 42*055d4590SKeyi Guiit easy to rewrite instructions on first use, facilitating a number of 43*055d4590SKeyi Guidifferent optimizations. 44*055d4590SKeyi Gui<p> 45*055d4590SKeyi GuiThe goals led us to make some fundamental decisions: 46*055d4590SKeyi Gui 47*055d4590SKeyi Gui<ul> 48*055d4590SKeyi Gui <li>Multiple classes are aggregated into a single "DEX" file. 49*055d4590SKeyi Gui <li>DEX files are mapped read-only and shared between processes. 50*055d4590SKeyi Gui <li>Byte ordering and word alignment are adjusted to suit the local 51*055d4590SKeyi Gui system. 52*055d4590SKeyi Gui <li>Bytecode verification is mandatory for all classes, but we want 53*055d4590SKeyi Gui to "pre-verify" whatever we can. 54*055d4590SKeyi Gui <li>Optimizations that require rewriting bytecode must be done ahead 55*055d4590SKeyi Gui of time. 56*055d4590SKeyi Gui</ul> 57*055d4590SKeyi Gui 58*055d4590SKeyi Gui<p> 59*055d4590SKeyi GuiThe consequences of these decisions are explained in the following sections. 60*055d4590SKeyi Gui 61*055d4590SKeyi Gui 62*055d4590SKeyi Gui<h2>VM Operation</h2> 63*055d4590SKeyi Gui 64*055d4590SKeyi Gui<p> 65*055d4590SKeyi GuiApplication code is delivered to the system in a <code>.jar</code> 66*055d4590SKeyi Guior <code>.apk</code> file. These are really just <code>.zip</code> 67*055d4590SKeyi Guiarchives with some meta-data files added. The Dalvik DEX data file 68*055d4590SKeyi Guiis always called <code>classes.dex</code>. 69*055d4590SKeyi Gui<p> 70*055d4590SKeyi GuiThe bytecode cannot be memory-mapped and executed directly from the zip 71*055d4590SKeyi Guifile, because the data is compressed and the start of the file is not 72*055d4590SKeyi Guiguaranteed to be word-aligned. These problems could be addressed by 73*055d4590SKeyi Guistoring <code>classes.dex</code> without compression and padding out the zip 74*055d4590SKeyi Guifile, but that would increase the size of the package sent across the 75*055d4590SKeyi Guidata network. 76*055d4590SKeyi Gui<p> 77*055d4590SKeyi GuiWe need to extract <code>classes.dex</code> from the zip archive before 78*055d4590SKeyi Guiwe can use it. While we have the file available, we might as well perform 79*055d4590SKeyi Guisome of the other actions (realignment, optimization, verification) described 80*055d4590SKeyi Guiearlier. This raises a new question however: who is responsible for doing 81*055d4590SKeyi Guithis, and where do we keep the output? 82*055d4590SKeyi Gui 83*055d4590SKeyi Gui<h3>Preparation</h3> 84*055d4590SKeyi Gui 85*055d4590SKeyi Gui<p> 86*055d4590SKeyi GuiThere are at least three different ways to create a "prepared" DEX file, 87*055d4590SKeyi Guisometimes known as "ODEX" (for Optimized DEX): 88*055d4590SKeyi Gui<ol> 89*055d4590SKeyi Gui <li>The VM does it "just in time". The output goes into a special 90*055d4590SKeyi Gui <code>dalvik-cache</code> directory. This works on the desktop and 91*055d4590SKeyi Gui engineering-only device builds where the permissions on the 92*055d4590SKeyi Gui <code>dalvik-cache</code> directory are not restricted. On production 93*055d4590SKeyi Gui devices, this is not allowed. 94*055d4590SKeyi Gui <li>The system installer does it when an application is first added. 95*055d4590SKeyi Gui It has the privileges required to write to <code>dalvik-cache</code>. 96*055d4590SKeyi Gui <li>The build system does it ahead of time. The relevant <code>jar</code> 97*055d4590SKeyi Gui / <code>apk</code> files are present, but the <code>classes.dex</code> 98*055d4590SKeyi Gui is stripped out. The optimized DEX is stored next to the original 99*055d4590SKeyi Gui zip archive, not in <code>dalvik-cache</code>, and is part of the 100*055d4590SKeyi Gui system image. 101*055d4590SKeyi Gui</ol> 102*055d4590SKeyi Gui<p> 103*055d4590SKeyi GuiThe <code>dalvik-cache</code> directory is more accurately 104*055d4590SKeyi Gui<code>$ANDROID_DATA/data/dalvik-cache</code>. The files inside it have 105*055d4590SKeyi Guinames derived from the full path of the source DEX. On the device the 106*055d4590SKeyi Guidirectory is owned by <code>system</code> / <code>system</code> 107*055d4590SKeyi Guiand has 0771 permissions, and the optimized DEX files stored there are 108*055d4590SKeyi Guiowned by <code>system</code> and the 109*055d4590SKeyi Guiapplication's group, with 0644 permissions. DRM-locked applications will 110*055d4590SKeyi Guiuse 640 permissions to prevent other user applications from examining them. 111*055d4590SKeyi GuiThe bottom line is that you can read your own DEX file and those of most 112*055d4590SKeyi Guiother applications, but you cannot create, modify, or remove them. 113*055d4590SKeyi Gui<p> 114*055d4590SKeyi GuiPreparation of the DEX file for the "just in time" and "system installer" 115*055d4590SKeyi Guiapproaches proceeds in three steps: 116*055d4590SKeyi Gui<p> 117*055d4590SKeyi GuiFirst, the dalvik-cache file is created. This must be done in a process 118*055d4590SKeyi Guiwith appropriate privileges, so for the "system installer" case this is 119*055d4590SKeyi Guidone within <code>installd</code>, which runs as root. 120*055d4590SKeyi Gui<p> 121*055d4590SKeyi GuiSecond, the <code>classes.dex</code> entry is extracted from the the zip 122*055d4590SKeyi Guiarchive. A small amount of space is left at the start of the file for 123*055d4590SKeyi Guithe ODEX header. 124*055d4590SKeyi Gui<p> 125*055d4590SKeyi GuiThird, the file is memory-mapped for easy access and tweaked for use on 126*055d4590SKeyi Guithe current system. This includes byte-swapping and structure realigning, 127*055d4590SKeyi Guibut no meaningful changes to the DEX file. We also do some basic 128*055d4590SKeyi Guistructure checks, such as ensuring that file offsets and data indices 129*055d4590SKeyi Guifall within valid ranges. 130*055d4590SKeyi Gui<p> 131*055d4590SKeyi GuiThe build system uses a hairy process that involves starting the 132*055d4590SKeyi Guiemulator, forcing just-in-time optimization of all relevant DEX files, 133*055d4590SKeyi Guiand then extracting the results from <code>dalvik-cache</code>. The 134*055d4590SKeyi Guireasons for doing this, rather than using a tool that runs on the desktop, 135*055d4590SKeyi Guiwill become more apparent when the optimizations are explained. 136*055d4590SKeyi Gui<p> 137*055d4590SKeyi GuiOnce the code is byte-swapped and aligned, we're ready to go. We append 138*055d4590SKeyi Guisome pre-computed data, fill in the ODEX header at the start of the file, 139*055d4590SKeyi Guiand start executing. (The header is filled in last, so that we don't 140*055d4590SKeyi Guitry to use a partial file.) If we're interested in verification and 141*055d4590SKeyi Guioptimization, however, we need to insert a step after the initial prep. 142*055d4590SKeyi Gui 143*055d4590SKeyi Gui<h3>dexopt</h3> 144*055d4590SKeyi Gui 145*055d4590SKeyi Gui<p> 146*055d4590SKeyi GuiWe want to verify and optimize all of the classes in the DEX file. The 147*055d4590SKeyi Guieasiest and safest way to do this is to load all of the classes into 148*055d4590SKeyi Guithe VM and run through them. Anything that fails to load is simply not 149*055d4590SKeyi Guiverified or optimized. Unfortunately, this can cause allocation of some 150*055d4590SKeyi Guiresources that are difficult to release (e.g. loading of native shared 151*055d4590SKeyi Guilibraries), so we don't want to do it in the same virtual machine that 152*055d4590SKeyi Guiwe're running applications in. 153*055d4590SKeyi Gui<p> 154*055d4590SKeyi GuiThe solution is to invoke a program called <code>dexopt</code>, which 155*055d4590SKeyi Guiis really just a back door into the VM. It performs an abbreviated VM 156*055d4590SKeyi Guiinitialization, loads zero or more DEX files from the bootstrap class 157*055d4590SKeyi Guipath, and then sets about verifying and optimizing whatever it can from 158*055d4590SKeyi Guithe target DEX. On completion, the process exits, freeing all resources. 159*055d4590SKeyi Gui<p> 160*055d4590SKeyi GuiIt is possible for multiple VMs to want the same DEX file at the same 161*055d4590SKeyi Guitime. File locking is used to ensure that dexopt is only run once. 162*055d4590SKeyi Gui 163*055d4590SKeyi Gui 164*055d4590SKeyi Gui<h2>Verification</h2> 165*055d4590SKeyi Gui 166*055d4590SKeyi Gui<p> 167*055d4590SKeyi GuiThe bytecode verification process involves scanning through the instructions 168*055d4590SKeyi Guiin every method in every class in a DEX file. The goal is to identify 169*055d4590SKeyi Guiillegal instruction sequences so that we don't have to check for them at 170*055d4590SKeyi Guirun time. Many of the computations involved are also necessary for "exact" 171*055d4590SKeyi Guigarbage collection. See 172*055d4590SKeyi Gui<a href="verifier.html">Dalvik Bytecode Verifier Notes</a> for more 173*055d4590SKeyi Guiinformation. 174*055d4590SKeyi Gui<p> 175*055d4590SKeyi GuiFor performance reasons, the optimizer (described in the next section) 176*055d4590SKeyi Guiassumes that the verifier has run successfully, and makes some potentially 177*055d4590SKeyi Guiunsafe assumptions. By default, Dalvik insists upon verifying all classes, 178*055d4590SKeyi Guiand only optimizes classes that have been verified. If you want to 179*055d4590SKeyi Guidisable the verifier, you can use command-line flags to do so. See also 180*055d4590SKeyi Gui<a href="embedded-vm-control.html"> Controlling the Embedded VM</a> 181*055d4590SKeyi Guifor instructions on controlling these 182*055d4590SKeyi Guifeatures within the Android application framework. 183*055d4590SKeyi Gui<p> 184*055d4590SKeyi GuiReporting of verification failures is a tricky issue. For example, 185*055d4590SKeyi Guicalling a package-scope method on a class in a different package is 186*055d4590SKeyi Guiillegal and will be caught by the verifier. We don't necessarily want 187*055d4590SKeyi Guito report it during verification though -- we actually want to throw 188*055d4590SKeyi Guian exception when the method call is attempted. Checking the access 189*055d4590SKeyi Guiflags on every method call is expensive though. The 190*055d4590SKeyi Gui<a href="verifier.html">Dalvik Bytecode Verifier Notes</a> document 191*055d4590SKeyi Guiaddresses this issue. 192*055d4590SKeyi Gui<p> 193*055d4590SKeyi GuiClasses that have been verified successfully have a flag set in the ODEX. 194*055d4590SKeyi GuiThey will not be re-verified when loaded. The Linux access permissions 195*055d4590SKeyi Guiare expected to prevent tampering; if you can get around those, installing 196*055d4590SKeyi Guifaulty bytecode is far from the easiest line of attack. The ODEX file has 197*055d4590SKeyi Guia 32-bit checksum, but that's chiefly present as a quick check for 198*055d4590SKeyi Guicorrupted data. 199*055d4590SKeyi Gui 200*055d4590SKeyi Gui 201*055d4590SKeyi Gui<h2>Optimization</h2> 202*055d4590SKeyi Gui 203*055d4590SKeyi Gui<p> 204*055d4590SKeyi GuiVirtual machine interpreters typically perform certain optimizations the 205*055d4590SKeyi Guifirst time a piece of code is used. Constant pool references are replaced 206*055d4590SKeyi Guiwith pointers to internal data structures, operations that always succeed 207*055d4590SKeyi Guior always work a certain way are replaced with simpler forms. Some of 208*055d4590SKeyi Guithese require information only available at runtime, others can be inferred 209*055d4590SKeyi Guistatically when certain assumptions are made. 210*055d4590SKeyi Gui<p> 211*055d4590SKeyi GuiThe Dalvik optimizer does the following: 212*055d4590SKeyi Gui<ul> 213*055d4590SKeyi Gui <li>For virtual method calls, replace the method index with a 214*055d4590SKeyi Gui vtable index. 215*055d4590SKeyi Gui <li>For instance field get/put, replace the field index with 216*055d4590SKeyi Gui a byte offset. Also, merge the boolean / byte / char / short 217*055d4590SKeyi Gui variants into a single 32-bit form (less code in the interpreter 218*055d4590SKeyi Gui means more room in the CPU I-cache). 219*055d4590SKeyi Gui <li>Replace a handful of high-volume calls, like String.length(), 220*055d4590SKeyi Gui with "inline" replacements. This skips the usual method call 221*055d4590SKeyi Gui overhead, directly switching from the interpreter to a native 222*055d4590SKeyi Gui implementation. 223*055d4590SKeyi Gui <li>Prune empty methods. The simplest example is 224*055d4590SKeyi Gui <code>Object.<init></code>, which does nothing, but must be 225*055d4590SKeyi Gui called whenever any object is allocated. The instruction is 226*055d4590SKeyi Gui replaced with a new version that acts as a no-op unless a debugger 227*055d4590SKeyi Gui is attached. 228*055d4590SKeyi Gui <li>Append pre-computed data. For example, the VM wants to have a 229*055d4590SKeyi Gui hash table for lookups on class name. Instead of computing this 230*055d4590SKeyi Gui when the DEX file is loaded, we can compute it now, saving heap 231*055d4590SKeyi Gui space and computation time in every VM where the DEX is loaded. 232*055d4590SKeyi Gui</ul> 233*055d4590SKeyi Gui 234*055d4590SKeyi Gui<p> 235*055d4590SKeyi GuiAll of the instruction modifications involve replacing the opcode with 236*055d4590SKeyi Guione not defined by the Dalvik specification. This allows us to freely 237*055d4590SKeyi Guimix optimized and unoptimized instructions. The set of optimized 238*055d4590SKeyi Guiinstructions, and their exact representation, is tied closely to the VM 239*055d4590SKeyi Guiversion. 240*055d4590SKeyi Gui<p> 241*055d4590SKeyi GuiMost of the optimizations are obvious "wins". The use of raw indices 242*055d4590SKeyi Guiand offsets not only allows us to execute more quickly, we can also 243*055d4590SKeyi Guiskip the initial symbolic resolution. Pre-computation eats up 244*055d4590SKeyi Guidisk space, and so must be done in moderation. 245*055d4590SKeyi Gui<p> 246*055d4590SKeyi GuiThere are a couple of potential sources of trouble with these 247*055d4590SKeyi Guioptimizations. First, vtable indices and byte offsets are subject to 248*055d4590SKeyi Guichange if the VM is updated. Second, if a superclass is in a different 249*055d4590SKeyi GuiDEX, and that other DEX is updated, we need to ensure that our optimized 250*055d4590SKeyi Guiindices and offsets are updated as well. A similar but more subtle 251*055d4590SKeyi Guiproblem emerges when user-defined class loaders are employed: the class 252*055d4590SKeyi Guiwe actually call may not be the one we expected to call. 253*055d4590SKeyi Gui<p>These problems are addressed with dependency lists and some limitations 254*055d4590SKeyi Guion what can be optimized. 255*055d4590SKeyi Gui 256*055d4590SKeyi Gui 257*055d4590SKeyi Gui<h2>Dependencies and Limitations</h2> 258*055d4590SKeyi Gui 259*055d4590SKeyi Gui<p> 260*055d4590SKeyi GuiThe optimized DEX file includes a list of dependencies on other DEX files, 261*055d4590SKeyi Guiplus the CRC-32 and modification date from the originating 262*055d4590SKeyi Gui<code>classes.dex</code> zip file entry. The dependency list includes the 263*055d4590SKeyi Guifull path to the <code>dalvik-cache</code> file, and the file's SHA-1 264*055d4590SKeyi Guisignature. The timestamps of files on the device are unreliable and 265*055d4590SKeyi Guinot used. The dependency area also includes the VM version number. 266*055d4590SKeyi Gui<p> 267*055d4590SKeyi GuiAn optimized DEX is dependent upon all of the DEX files in the bootstrap 268*055d4590SKeyi Guiclass path. DEX files that are part of the bootstrap class path depend 269*055d4590SKeyi Guiupon the DEX files that appeared earlier. To ensure that nothing outside 270*055d4590SKeyi Guithe dependent DEX files is available, <code>dexopt</code> only loads the 271*055d4590SKeyi Guibootstrap classes. References to classes in other DEX files fail, which 272*055d4590SKeyi Guicauses class loading and/or verification to fail, and classes with 273*055d4590SKeyi Guiexternal dependencies are simply not optimized. 274*055d4590SKeyi Gui<p> 275*055d4590SKeyi GuiThis means that splitting code out into many separate DEX files has a 276*055d4590SKeyi Guidisadvantage: virtual method calls and instance field lookups between 277*055d4590SKeyi Guinon-boot DEX files can't be optimized. Because verification is pass/fail 278*055d4590SKeyi Guiwith class granularity, no method in a class that has any reliance on 279*055d4590SKeyi Guiclasses in external DEX files can be optimized. This may be a bit 280*055d4590SKeyi Guiheavy-handed, but it's the only way to guarantee that nothing breaks 281*055d4590SKeyi Guiwhen individual pieces are updated. 282*055d4590SKeyi Gui<p> 283*055d4590SKeyi GuiAnother negative consequence: any change to a bootstrap DEX will result 284*055d4590SKeyi Guiin rejection of all optimized DEX files. This makes it hard to keep 285*055d4590SKeyi Guisystem updates small. 286*055d4590SKeyi Gui<p> 287*055d4590SKeyi GuiDespite our caution, there is still a possibility that a class in a DEX 288*055d4590SKeyi Guifile loaded by a user-defined class loader could ask for a bootstrap class 289*055d4590SKeyi Gui(say, String) and be given a different class with the same name. If a 290*055d4590SKeyi Guiclass in the DEX file being processed has the same name as a class in the 291*055d4590SKeyi Guibootstrap DEX files, the class will be flagged as ambiguous and references 292*055d4590SKeyi Guito it will not be resolved during verification / optimization. The class 293*055d4590SKeyi Guilinking code in the VM does additional checks to plug another hole; 294*055d4590SKeyi Guisee the verbose description in the VM sources for details (vm/oo/Class.c). 295*055d4590SKeyi Gui<p> 296*055d4590SKeyi GuiIf one of the dependencies is updated, we need to re-verify and 297*055d4590SKeyi Guire-optimize the DEX file. If we can do a just-in-time <code>dexopt</code> 298*055d4590SKeyi Guiinvocation, this is easy. If we have to rely on the installer daemon, or 299*055d4590SKeyi Guithe DEX was shipped only in ODEX, then the VM has to reject the DEX. 300*055d4590SKeyi Gui<p> 301*055d4590SKeyi GuiThe output of <code>dexopt</code> is byte-swapped and struct-aligned 302*055d4590SKeyi Guifor the host, and contains indices and offsets that are highly VM-specific 303*055d4590SKeyi Gui(both version-wise and platform-wise). For this reason it's tricky to 304*055d4590SKeyi Guiwrite a version of <code>dexopt</code> that runs on the desktop but 305*055d4590SKeyi Guigenerates output suitable for a particular device. The safest way to 306*055d4590SKeyi Guiinvoke it is on the target device, or on an emulator for that device. 307*055d4590SKeyi Gui 308*055d4590SKeyi Gui 309*055d4590SKeyi Gui<h2>Generated DEX</h2> 310*055d4590SKeyi Gui 311*055d4590SKeyi Gui<p> 312*055d4590SKeyi GuiSome languages and frameworks rely on the ability to generate bytecode 313*055d4590SKeyi Guiand execute it. The rather heavy <code>dexopt</code> verification and 314*055d4590SKeyi Guioptimization model doesn't work well with that. 315*055d4590SKeyi Gui<p> 316*055d4590SKeyi GuiWe intend to support this in a future release, but the exact method is 317*055d4590SKeyi Guito be determined. We may allow individual classes to be added or whole 318*055d4590SKeyi GuiDEX files; may allow Java bytecode or Dalvik bytecode in instructions; 319*055d4590SKeyi Guimay perform the usual set of optimizations, or use a separate interpreter 320*055d4590SKeyi Guithat performs on-first-use optimizations directly on the bytecode (which 321*055d4590SKeyi Guiwon't be mapped read-only, since it's locally defined). 322*055d4590SKeyi Gui 323*055d4590SKeyi Gui<address>Copyright © 2008 The Android Open Source Project</address> 324*055d4590SKeyi Gui 325*055d4590SKeyi Gui</body> 326*055d4590SKeyi Gui</html> 327