Notes on WST StructuredDocument
-------------------------------

Created: 2010/11/26
References: WST 3.1.x, Eclipse 3.5 Galileo

To manipulate XML documents in refactorings, we sometimes use the WST/SSE
"StructuredDocument" API. There isn't a lot of documentation on this out
there, so this is a short explanation of how it works, based entirely on
_empirical_ evidence. As such, it must be taken with a grain of salt.

Examples of usage can be found in
  sdk/eclipse/plugins/com.android.ide.eclipse.adt/src/com/android/ide/eclipse/adt/internal/refactorings/


1- Get a document instance
--------------------------

To get a document from an existing IFile resource:

    IModelManager modelMan = StructuredModelManager.getModelManager();
    IStructuredDocument sdoc = modelMan.createStructuredDocumentFor(file);

Note that IStructuredDocument and all the associated interfaces used
below are located under org.eclipse.wst.sse.core.internal.provisional,
meaning they _might_ change later.

Also note that this parses the content of the file on disk, not of a buffer
with pending unsaved modifications opened in an editor.

There is a counterpart for non-existent resources:

    IModelManager.createNewStructuredDocumentFor(IFile)

However our goal so far has been to _parse_ existing documents, find
the place that we want to modify and then generate a TextFileChange
for a refactoring operation. Consequently this document doesn't say
anything about using this model to modify content directly.


2- Structured Document overview
-------------------------------

The IStructuredDocument is organized in "regions", which are little pieces
of text.

The document contains a list of region collections, each one being
a list of regions. Each region has a type, as well as text.

Since we use this to parse XML, let's look at this XML example:

<?xml version="1.0" encoding="utf-8"?> \n
<resources> \n
    <color/>
    <string name="my_string">Some Value</string>  <!-- comment -->\n
</resources>


This will result in the following regions and sub-regions:
(all the constants below are located in DOMRegionContext)

XML_PI_OPEN
    XML_PI_OPEN:<?
    XML_TAG_NAME:xml
    XML_TAG_ATTRIBUTE_NAME:version
    XML_TAG_ATTRIBUTE_EQUALS:=
    XML_TAG_ATTRIBUTE_VALUE:"1.0"
    XML_TAG_ATTRIBUTE_NAME:encoding
    XML_TAG_ATTRIBUTE_EQUALS:=
    XML_TAG_ATTRIBUTE_VALUE:"utf-8"
    XML_PI_CLOSE:?>

XML_CONTENT
    XML_CONTENT:\n

XML_TAG_NAME
    XML_TAG_OPEN:<
    XML_TAG_NAME:resources
    XML_TAG_CLOSE:>

XML_CONTENT
    XML_CONTENT:\n + whitespace before color

XML_TAG_NAME
    XML_TAG_OPEN:<
    XML_TAG_NAME:color
    XML_EMPTY_TAG_CLOSE:/>

XML_CONTENT
    XML_CONTENT:\n + whitespace before string

XML_TAG_NAME
    XML_TAG_OPEN:<
    XML_TAG_NAME:string
    XML_TAG_ATTRIBUTE_NAME:name
    XML_TAG_ATTRIBUTE_EQUALS:=
    XML_TAG_ATTRIBUTE_VALUE:"my_string"
    XML_TAG_CLOSE:>

XML_CONTENT
    XML_CONTENT:Some Value

XML_TAG_NAME
    XML_END_TAG_OPEN:</
    XML_TAG_NAME:string
    XML_TAG_CLOSE:>

XML_CONTENT
    XML_CONTENT: (2 spaces before the comment)

XML_COMMENT_TEXT
    XML_COMMENT_OPEN:<!--
    XML_COMMENT_TEXT: comment
    XML_COMMENT_CLOSE:-->

XML_CONTENT
    XML_CONTENT: \n after comment

XML_TAG_NAME
    XML_END_TAG_OPEN:</
    XML_TAG_NAME:resources
    XML_TAG_CLOSE:>

XML_CONTENT
    XML_CONTENT:


3- Iterating through regions
----------------------------

To iterate through all regions, we need to process the list of top-level regions and then
iterate over inner regions:

    // each top-level entry is a region collection (one tag, one run of content, one comment)
    for (IStructuredDocumentRegion regions : sdoc.getStructuredDocumentRegions()) {
        // process inner regions
        for (int i = 0; i < regions.getNumberOfRegions(); i++) {
            ITextRegion region = regions.getRegions().get(i);
            String type = region.getType();
            // the text is queried from the outer collection, not from the inner region
            String text = regions.getText(region);
        }
    }

Each "region collection" basically matches one XML tag (or one run of content, or one comment),
with sub-regions for all the tokens inside it.

Note that an XML_CONTENT region is the text between tags, whitespace included, which is known
as a TEXT node in the W3C DOM.

Also note that each outer region has a type, but the inner regions also reuse a similar type.
So for example an outer XML_TAG_NAME region collection is a proper XML tag, and it will contain
an opening token (XML_TAG_OPEN), a closing token (XML_TAG_CLOSE) and also an inner XML_TAG_NAME
that is the tag name itself.

Surprisingly, the inner regions do not have many access methods we can use on them, except their
type and start/length/end. There are two variants of the length and end methods:
- getLength() and getEnd() take any whitespace into account.
- getTextLength() and getTextEnd() exclude some typical trailing whitespace.

Regarding the trailing whitespace, empirical evidence shows that in the XML case
here, the only case where it matters is in a tag such as <string name="my_string">: for the
XML_TAG_NAME region, getLength is 7 (string + space) and getTextLength is 6 (string, no space).
Spacing between XML elements is collapsed into its own region.

If you want the text of the inner region, you actually need to query it from the outer region.
The outer IStructuredDocumentRegion (the region collection) contains many more useful access
methods, some of which return details on the inner regions:
- getText : without the whitespace.
- getFullText : with the whitespace.
- getStart / getLength / getEnd : type-dependent offset, including whitespace.
- getStart / getTextLength / getTextEnd : type-dependent offset, excluding "irrelevant" whitespace.
- getStartOffset / getEndOffset / getTextEndOffset : relative to document.

Empirical evidence shows that there is no discernible difference between the getStart/getEnd
values and those returned by getStartOffset/getEndOffset. When in doubt, follow the javadoc.

All offsets start at zero.

Given a region collection, you can also browse regions either using the getRegions() list, or
using getFirstRegion()/getLastRegion(), or using getRegionAtCharacterOffset(). Iterating the
region list seems the most useful scenario. There's no actual iterator provided for inner regions.
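
As a rough illustration of the length and offset rules above, the sketch below models the
regions of <string name="my_string"> in plain Java. It does not use the WST classes:
RegionSketch, Region, RegionCollection and sample() are hypothetical stand-ins, the document
offset 100 is made up, and the idea that inner region starts are relative to the enclosing
collection is an assumption that should be checked against the javadoc.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of one region collection. This is NOT the WST API. */
final class RegionSketch {

    /** Hypothetical inner region. Assumption: start is relative to the collection. */
    static final class Region {
        final String type;
        final int start;       // offset inside the collection's text
        final int textLength;  // length without trailing whitespace
        final int length;      // length including trailing whitespace

        Region(String type, int start, int textLength, int length) {
            this.type = type;
            this.start = start;
            this.textLength = textLength;
            this.length = length;
        }
    }

    /** Hypothetical region collection: one tag plus its inner regions. */
    static final class RegionCollection {
        final int startOffset;  // offset of the tag in the whole document
        final String fullText;
        final List<Region> regions;

        RegionCollection(int startOffset, String fullText, List<Region> regions) {
            this.startOffset = startOffset;
            this.fullText = fullText;
            this.regions = regions;
        }

        /** Text of an inner region, trailing whitespace excluded. */
        String getText(Region r) {
            return fullText.substring(r.start, r.start + r.textLength);
        }

        /** Text of an inner region, trailing whitespace included. */
        String getFullText(Region r) {
            return fullText.substring(r.start, r.start + r.length);
        }

        /** Document-relative start of an inner region. */
        int getStartOffset(Region r) {
            return startOffset + r.start;
        }

        /** Document-relative end of the region's text, trailing whitespace excluded. */
        int getTextEndOffset(Region r) {
            return startOffset + r.start + r.textLength;
        }
    }

    /** Builds the inner regions of the sample tag at a made-up document offset. */
    static RegionCollection sample(int startOffset) {
        String tag = "<string name=\"my_string\">";
        return new RegionCollection(startOffset, tag, Arrays.asList(
                new Region("XML_TAG_OPEN", 0, 1, 1),
                new Region("XML_TAG_NAME", 1, 6, 7),  // "string" plus one trailing space
                new Region("XML_TAG_ATTRIBUTE_NAME", 8, 4, 4),
                new Region("XML_TAG_ATTRIBUTE_EQUALS", 12, 1, 1),
                new Region("XML_TAG_ATTRIBUTE_VALUE", 13, 11, 11),
                new Region("XML_TAG_CLOSE", 24, 1, 1)));
    }

    public static void main(String[] args) {
        RegionCollection tag = sample(100);
        // Print each inner region with its document-relative text range.
        for (Region r : tag.regions) {
            System.out.println(r.type + " [" + tag.getStartOffset(r) + ", "
                    + tag.getTextEndOffset(r) + ") '" + tag.getText(r) + "'");
        }
    }
}
```

The document-relative pair (start offset, text end offset) is the kind of range one would
hand to a text edit when building a TextFileChange. Whether the real WST getters line up
exactly with this model has not been verified here.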

There are a few other methods available in the region classes. This is not an exhaustive list.


----