Notes on WST StructuredDocument
-------------------------------

Created:    2010/11/26
References: WST 3.1.x, Eclipse 3.5 Galileo

To manipulate XML documents in refactorings, we sometimes use the WST/SSE
"StructuredDocument" API. There is very little documentation on it, so
this is a short explanation of how it works, based entirely on _empirical_
evidence. As such, it must be taken with a grain of salt.

Examples of usage can be found in
  sdk/eclipse/plugins/com.android.ide.eclipse.adt/src/com/android/ide/eclipse/adt/internal/refactorings/


1- Get a document instance
--------------------------

To get a document from an existing IFile resource:

    // createStructuredDocumentFor() declares IOException and CoreException.
    IModelManager modelMan = StructuredModelManager.getModelManager();
    IStructuredDocument sdoc = modelMan.createStructuredDocumentFor(file);

Note that IStructuredDocument and the associated interfaces used below are
located in org.eclipse.wst.sse.core.internal.provisional and its
subpackages, meaning they _might_ change later.

Also note that this parses the content of the file on disk, not the content
of an editor buffer with pending unsaved modifications.

There is a counterpart for non-existent resources:

    IModelManager.createNewStructuredDocumentFor(IFile)

However, our goal so far has been to _parse_ existing documents, find
the place that we want to modify, and then generate a TextFileChange
for a refactoring operation. Consequently this document doesn't say
anything about using this model to modify content directly.


2- Structured Document overview
-------------------------------

The IStructuredDocument is organized in "regions", which are small pieces
of text.

The document contains a list of region collections, each one being
a list of regions. Each region has a type, as well as text.

Since we use this to parse XML, let's look at this XML example:

<?xml version="1.0" encoding="utf-8"?> \n
<resources> \n
    <color/> \n
    <string name="my_string">Some Value</string>  <!-- comment -->\n
</resources>


This will result in the following regions and sub-regions:
(all the constants below are located in DOMRegionContext)

XML_PI_OPEN
    XML_PI_OPEN:<?
    XML_TAG_NAME:xml
    XML_TAG_ATTRIBUTE_NAME:version
    XML_TAG_ATTRIBUTE_EQUALS:=
    XML_TAG_ATTRIBUTE_VALUE:"1.0"
    XML_TAG_ATTRIBUTE_NAME:encoding
    XML_TAG_ATTRIBUTE_EQUALS:=
    XML_TAG_ATTRIBUTE_VALUE:"utf-8"
    XML_PI_CLOSE:?>

XML_CONTENT
    XML_CONTENT:\n

XML_TAG_NAME
    XML_TAG_OPEN:<
    XML_TAG_NAME:resources
    XML_TAG_CLOSE:>

XML_CONTENT
    XML_CONTENT:\n + whitespace before color

XML_TAG_NAME
    XML_TAG_OPEN:<
    XML_TAG_NAME:color
    XML_EMPTY_TAG_CLOSE:/>

XML_CONTENT
    XML_CONTENT:\n + whitespace before string

XML_TAG_NAME
    XML_TAG_OPEN:<
    XML_TAG_NAME:string
    XML_TAG_ATTRIBUTE_NAME:name
    XML_TAG_ATTRIBUTE_EQUALS:=
    XML_TAG_ATTRIBUTE_VALUE:"my_string"
    XML_TAG_CLOSE:>

XML_CONTENT
    XML_CONTENT:Some Value

XML_TAG_NAME
    XML_END_TAG_OPEN:</
    XML_TAG_NAME:string
    XML_TAG_CLOSE:>

XML_CONTENT
    XML_CONTENT: (2 spaces before the comment)

XML_COMMENT_TEXT
    XML_COMMENT_OPEN:<!--
    XML_COMMENT_TEXT: comment
    XML_COMMENT_CLOSE:-->

XML_CONTENT
    XML_CONTENT: \n after comment

XML_TAG_NAME
    XML_END_TAG_OPEN:</
    XML_TAG_NAME:resources
    XML_TAG_CLOSE:>

XML_CONTENT
    XML_CONTENT:


3- Iterating through regions
----------------------------

To iterate through all regions, we need to process the list of top-level regions and then
iterate over inner regions:

    // Each 'regions' value is one region collection (roughly one XML tag).
    for (IStructuredDocumentRegion regions : sdoc.getStructuredDocumentRegions()) {
        // process inner regions
        for (int i = 0; i < regions.getNumberOfRegions(); i++) {
            ITextRegion region = regions.getRegions().get(i);
            String type = region.getType();
            // The text is queried from the outer collection, not from the inner region.
            String text = regions.getText(region);
        }
    }

Each "region collection" basically matches one XML tag, with sub-regions for all the tokens
inside a tag.

Note that an XML_CONTENT region holds the text between tags, including whitespace-only runs:
what is known as a Text node in the W3C DOM.

Also note that each outer region has a type, and the inner regions reuse the same set of types.
So for example an outer XML_TAG_NAME region collection is a proper XML tag, and it will contain
an XML_TAG_OPEN, an XML_TAG_CLOSE, and also an inner XML_TAG_NAME that is the tag name itself.
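
To make the outer/inner type reuse concrete, here is a minimal, self-contained sketch. It uses
toy stand-in classes, not the real WST interfaces: TagNameSketch, Region and RegionCollection
are names invented for this example, and the type names are plain strings, whereas real code
would compare against the DOMRegionContext constants and ask the outer region for the text.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: these classes are assumptions, NOT the WST API.
final class TagNameSketch {

    /** Hypothetical inner region: a type and its text. */
    static final class Region {
        final String type;
        final String text;
        Region(String type, String text) { this.type = type; this.text = text; }
    }

    /** Hypothetical region collection: an outer type plus its inner regions. */
    static final class RegionCollection {
        final String type;
        final List<Region> regions;
        RegionCollection(String type, Region... regions) {
            this.type = type;
            this.regions = Arrays.asList(regions);
        }
    }

    /** Returns the tag name held by an outer XML_TAG_NAME collection, or null. */
    static String tagName(RegionCollection c) {
        if (!"XML_TAG_NAME".equals(c.type)) {
            return null; // content, comment, processing instruction...
        }
        for (Region r : c.regions) {
            if ("XML_TAG_NAME".equals(r.type)) {
                return r.text; // the inner region carrying the name itself
            }
        }
        return null;
    }

    /** True if the collection starts with XML_END_TAG_OPEN, as in </string>. */
    static boolean isEndTag(RegionCollection c) {
        return !c.regions.isEmpty() && "XML_END_TAG_OPEN".equals(c.regions.get(0).type);
    }

    /** Builds the collection listed above for </string>. */
    static RegionCollection sampleEndTag() {
        return new RegionCollection("XML_TAG_NAME",
                new Region("XML_END_TAG_OPEN", "</"),
                new Region("XML_TAG_NAME", "string"),
                new Region("XML_TAG_CLOSE", ">"));
    }

    public static void main(String[] args) {
        RegionCollection end = sampleEndTag();
        System.out.println(tagName(end) + " endTag=" + isEndTag(end));
    }
}
```

The type of the first inner region is what tells an opening tag from an end tag here; the outer
type alone is XML_TAG_NAME in both cases.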

Surprisingly, the inner regions do not have many access methods we can use on them, except their
type and start/length/end. There are two variants of the length and end methods:
- getLength() and getEnd() take any whitespace into account.
- getTextLength() and getTextEnd() exclude some typical trailing whitespace.

Note that regarding the trailing whitespace, empirical evidence shows that in the XML case
here, the only case where it matters is in a tag such as <string name="my_string">: for the
XML_TAG_NAME region, getLength is 7 (string + space) and getTextLength is 6 (string, no space).
Spacing between XML elements is its own collapsed region.
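
The 7 versus 6 observation can be reproduced with a small stand-alone sketch. This is not how
WST computes the values (its tokenizer does that internally); the class TextLengthSketch and
its textLength() helper are hypothetical and only mimic the observed behaviour by dropping
trailing whitespace from a token.

```java
// Toy illustration only; does not use or reimplement the WST classes.
final class TextLengthSketch {

    /** Length of a token once trailing whitespace is dropped. */
    static int textLength(String fullText) {
        int end = fullText.length();
        while (end > 0 && Character.isWhitespace(fullText.charAt(end - 1))) {
            end--;
        }
        return end;
    }

    public static void main(String[] args) {
        // The tag-name token of <string name="my_string">, trailing space included.
        String token = "string ";
        System.out.println("length=" + token.length()
                + " textLength=" + textLength(token)); // prints "length=7 textLength=6"
    }
}
```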

If you want the text of the inner region, you actually need to query it from the outer region.
The outer IStructuredDocumentRegion (the region collection) contains many more useful access
methods, some of which return details on the inner regions:
- getText     : without the whitespace.
- getFullText : with the whitespace.
- getStart / getLength / getEnd : type-dependent offset, including whitespace.
- getStart / getTextLength / getTextEnd : type-dependent offset, excluding "irrelevant" whitespace.
- getStartOffset / getEndOffset / getTextEndOffset : relative to document.

Empirical evidence shows that there is no discernible difference between the getStart/getEnd
values and those returned by getStartOffset/getEndOffset. Nevertheless, please abide by the javadoc.

All offsets start at zero.
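
Since the end goal is a text change at a document offset, the arithmetic can be sketched as
follows. This is a toy example on plain strings, under the assumption that an inner region's
start is relative to its enclosing collection, so that the document offset is the collection's
start plus the inner start. OffsetSketch and its helpers are hypothetical names; the resulting
(offset, length, text) triple is the kind of data a replace-style text edit would need.

```java
// Toy illustration of offset arithmetic; no WST or Eclipse classes involved.
final class OffsetSketch {

    /** Document offset of an inner region, assuming its start is collection-relative. */
    static int documentOffset(int collectionStart, int regionStart) {
        return collectionStart + regionStart;
    }

    /** Replaces the range [offset, offset + length) of the document text. */
    static String replace(String doc, int offset, int length, String newText) {
        return doc.substring(0, offset) + newText + doc.substring(offset + length);
    }

    public static void main(String[] args) {
        String doc = "<resources>\n    <color/>";
        int collectionStart = doc.indexOf("<color"); // start of the <color/> collection
        int nameStart = 1;       // inner XML_TAG_NAME comes right after '<'
        int nameTextLength = 5;  // "color", no trailing whitespace

        int offset = documentOffset(collectionStart, nameStart);
        System.out.println(offset); // prints 17
        System.out.println(replace(doc, offset, nameTextLength, "colour"));
    }
}
```

Using the text length rather than the full length keeps any trailing whitespace of the token
out of the replaced range.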

Given a region collection, you can also browse regions either using a getRegions() list, or
using getFirst/getLastRegion, or using getRegionAtCharacterOffset(). Iterating the region
list seems the most useful scenario. There's no actual iterator provided for inner regions.
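
As a rough picture of what an offset-based lookup amounts to, the sketch below scans a list of
toy regions with an index-based loop and returns the one that contains a given offset. The
RegionLookupSketch class, its Region type and the collection-relative offsets are assumptions
made for this example; this is not the WST implementation of getRegionAtCharacterOffset().

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: plain start/length pairs standing in for inner regions.
final class RegionLookupSketch {

    /** Hypothetical inner region with a start and length relative to its collection. */
    static final class Region {
        final String type;
        final int start;
        final int length;
        Region(String type, int start, int length) {
            this.type = type;
            this.start = start;
            this.length = length;
        }
    }

    /** Returns the region containing the offset, or null if none does. */
    static Region regionAt(List<Region> regions, int offset) {
        for (int i = 0; i < regions.size(); i++) {
            Region r = regions.get(i);
            if (offset >= r.start && offset < r.start + r.length) {
                return r;
            }
        }
        return null;
    }

    /** The three inner regions of <color/> from the listing above. */
    static List<Region> colorTag() {
        return Arrays.asList(
                new Region("XML_TAG_OPEN", 0, 1),         // <
                new Region("XML_TAG_NAME", 1, 5),         // color
                new Region("XML_EMPTY_TAG_CLOSE", 6, 2)); // />
    }

    public static void main(String[] args) {
        System.out.println(regionAt(colorTag(), 3).type); // prints XML_TAG_NAME
    }
}
```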

There are a few other methods available in the region classes. This is not an exhaustive list.


----