#!/usr/bin/ruby
# encoding: utf-8

=begin LICENSE

[The "BSD licence"]
Copyright (c) 2009-2010 Kyle Yetter
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

 1. Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
 2. Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in the
    documentation and/or other materials provided with the distribution.
 3. The name of the author may not be used to endorse or promote products
    derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

=end

module ANTLR3


=begin rdoc ANTLR3::Stream

= ANTLR3 Streams

This documentation first covers the general concept of streams as used by ANTLR
recognizers, and then discusses the specific <tt>ANTLR3::Stream</tt> module.

== ANTLR Stream Classes

ANTLR recognizers need a way to walk through input data in a serialized IO-style
fashion. They also need some book-keeping about the input to provide useful
information to developers, such as current line number and column. Furthermore,
to implement backtracking and various error recovery techniques, recognizers
need a way to record various locations in the input at a number of points in the
recognition process so the input state may be restored back to a prior state.

ANTLR bundles all of this functionality into a number of Stream classes, each
designed to be used by recognizers for a specific recognition task. Most of the
Stream hierarchy is implemented in antlr3/stream.rb, which is loaded by default
when 'antlr3' is required.

---

Here's a brief overview of the various stream classes and their respective
purposes:

StringStream::
  Similar to StringIO from the standard Ruby library, StringStream wraps raw
  String data in a Stream interface for use by ANTLR lexers.
FileStream::
  A subclass of StringStream, FileStream simply wraps data read from an IO or
  File object for use by lexers.
CommonTokenStream::
  The job of a TokenStream is to read lexer output and then provide ANTLR
  parsers with the means to sequentially walk through a series of tokens.
  CommonTokenStream is the default TokenStream implementation.
TokenRewriteStream::
  A subclass of CommonTokenStream, TokenRewriteStream gives rewriting parsers
  the ability to produce new output text from an input token sequence by
  managing rewrite "programs" on top of the stream.
CommonTreeNodeStream::
  In a similar fashion to CommonTokenStream, CommonTreeNodeStream feeds tokens
  to recognizers in a sequential fashion. However, the stream object serializes
  an Abstract Syntax Tree into a flat, one-dimensional sequence, but preserves
  the two-dimensional shape of the tree using special UP and DOWN tokens. The
  sequence is primarily used by ANTLR Tree Parsers. *note* -- this is not
  defined in antlr3/stream.rb, but in antlr3/tree.rb

---

The next few sections cover the most significant methods of all stream classes.

=== consume / look / peek

<tt>stream.consume</tt> is used to advance a stream one unit. StringStreams are
advanced by one character and TokenStreams are advanced by one token.

<tt>stream.peek(k = 1)</tt> is used to quickly retrieve the object of interest
to a recognizer at the look-ahead position specified by <tt>k</tt>. For
<b>StringStreams</b>, this is the <i>integer value of the character</i>
<tt>k</tt> characters ahead of the stream cursor. For <b>TokenStreams</b>, this
is the <i>integer token type of the token</i> <tt>k</tt> tokens ahead of the
stream cursor.

<tt>stream.look(k = 1)</tt> is used to retrieve the full object of interest at
the look-ahead position specified by <tt>k</tt>.
While <tt>peek</tt> provides the
<i>bare-minimum lightweight information</i> that the recognizer needs,
<tt>look</tt> provides the <i>full object of concern</i> in the stream. For
<b>StringStreams</b>, this is a <i>string object containing the single
character</i> <tt>k</tt> characters ahead of the stream cursor. For
<b>TokenStreams</b>, this is the <i>full token structure</i> <tt>k</tt> tokens
ahead of the stream cursor.

<b>Note:</b> in most ANTLR runtime APIs for other languages, <tt>peek</tt> is
implemented by some method with a name like <tt>LA(k)</tt> and <tt>look</tt> is
implemented by some method with a name like <tt>LT(k)</tt>. When writing this
Ruby runtime API, I found this naming practice confusing, ambiguous, and
un-Ruby-like. Thus, I chose <tt>peek</tt> and <tt>look</tt> to represent a
quick-look (peek) and a full-fledged look-ahead operation (look). If this causes
confusion or any sort of compatibility strife for developers using this
implementation, all apologies.

=== mark / rewind / release

<tt>marker = stream.mark</tt> causes the stream to record important information
about the current stream state, place the data in an internal memory table, and
return a memento, <tt>marker</tt>. The marker object is typically an integer key
to the stream's internal memory table.

Used in tandem with <tt>stream.rewind(mark = last_marker)</tt>, the marker can
be used to restore the stream to an earlier state. This is used by recognizers
to perform tasks such as backtracking and error recovery.

<tt>stream.release(marker = last_marker)</tt> can be used to release an existing
state marker from the memory table.

=== seek

<tt>stream.seek(position)</tt> moves the stream cursor to an absolute position
within the stream, much like typical Ruby <tt>IO#seek</tt> style methods.
However, unlike <tt>IO#seek</tt>, ANTLR streams currently always use absolute
position seeking.

== The Stream Module

<tt>ANTLR3::Stream</tt> is an abstract-ish base mixin for all IO-like stream
classes used by ANTLR recognizers.

The module doesn't do much on its own besides define arguably annoying
``abstract'' pseudo-methods that demand implementation when it is mixed in to a
class that wants to be a Stream. Right now this exists as an artifact of porting
the ANTLR Java/Python runtime library to Ruby. In Java, of course, this is
represented as an interface.
In Ruby, however, objects are duck-typed and
interfaces aren't that useful as programmatic entities -- in fact, it's mildly
wasteful to have a module like this hanging out. Thus, I may axe it.

When mixed in, it does give the class <tt>#size</tt> and <tt>#source_name</tt>
attribute methods.

Except in a small handful of places, most of the ANTLR runtime library uses
duck-typing and not type checking on objects. This means that the methods which
manipulate stream objects don't usually bother checking that the object is a
Stream and assume that the object implements the proper stream interface. Thus,
it is not strictly necessary that custom stream objects include ANTLR3::Stream,
though it isn't a bad idea.
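
As a rough illustration of the duck-typed interface described above, here is a
minimal sketch of a custom character stream that does not include
ANTLR3::Stream at all. The class name +TinyCharStream+, its local +EOF+ value,
and its marker-release policy are assumptions made for this example only; they
are not part of the runtime library, and the sketch only supports forward
look-ahead.

```ruby
# Hypothetical sketch: TinyCharStream is not part of the ANTLR3 runtime.
class TinyCharStream
  EOF = -1 # assumed end-of-input sentinel for this sketch only

  attr_reader :position
  attr_accessor :source_name

  def initialize(text, source_name = '(sketch)')
    @chars = text.chars.to_a
    @position = 0
    @markers = []
    @source_name = source_name
  end

  def size
    @chars.length
  end

  # advance one character; returns the integer value consumed, or EOF
  def consume
    c = peek
    @position += 1 if @position < size
    c
  end

  # integer value of the character k positions ahead (k = 1 is the next one)
  def peek(k = 1)
    c = look(k)
    c ? c.ord : EOF
  end

  # the same character as a one-character String; nil when out of range
  def look(k = 1)
    k < 1 ? nil : @chars[@position + k - 1]
  end

  # remember the current position and return an integer marker
  def mark
    @markers << @position
    @markers.length - 1
  end

  # restore the position saved under marker, then release it
  def rewind(marker = @markers.length - 1)
    saved = @markers[marker] if marker >= 0
    return self unless saved
    @position = saved
    release(marker)
    self
  end

  # forget marker and every marker created after it (a choice made for this sketch)
  def release(marker = @markers.length - 1)
    @markers.slice!(marker..-1) if marker >= 0
    self
  end

  # jump to an absolute position
  def seek(position)
    @position = position
  end
end

stream = TinyCharStream.new("ab")
marker = stream.mark   # remember position 0
stream.consume
stream.consume
stream.rewind(marker)  # back at position 0
```

Because nothing in the sketch refers to ANTLR3::Stream, it relies entirely on
responding to the same method names, which is the sense in which the runtime
is duck-typed.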

=end

module Stream
  include ANTLR3::Constants
  extend ClassMacros

  ##
  # :method: consume
  # used to advance a stream one unit (such as character or token)
  abstract :consume

  ##
  # :method: peek( k = 1 )
  # used to quickly retrieve the object of interest to a recognizer at lookahead
  # position specified by <tt>k</tt> (such as integer value of a character or an
  # integer token type)
  abstract :peek

  ##
  # :method: look( k = 1 )
  # used to retrieve the full object of interest at lookahead position specified
  # by <tt>k</tt> (such as a character string or a token structure)
  abstract :look

  ##
  # :method: mark
  # saves the current position for the purposes of backtracking and
  # returns a value to pass to #rewind at a later time
  abstract :mark

  ##
  # :method: index
  # returns the current position of the stream
  abstract :index

  ##
  # :method: rewind( marker = last_marker )
  # restores the stream position using the state information previously saved
  # by the given marker
  abstract :rewind

  ##
  # :method: release( marker = last_marker )
  # clears the saved state information associated with the given marker value
  abstract :release

  ##
  # :method: seek( position )
  # moves the stream to the absolute index given by +position+
  abstract :seek

  ##
  # the total number of symbols in the stream
  attr_reader :size

  ##
  # indicates an identifying name for the stream -- usually the file path of the input
  attr_accessor :source_name
end

=begin rdoc ANTLR3::CharacterStream

CharacterStream further extends the abstract-ish base mixin Stream to add
methods specific to navigating character-based input data. Thus, it serves as an
imitation of the Java interface for text-based streams, which are primarily
used by lexers.

It adds the ``abstract'' method, <tt>substring(start, stop)</tt>, which must be
implemented to return a slice of the input string from position <tt>start</tt>
to position <tt>stop</tt>. It also adds attribute accessor methods <tt>line</tt>
and <tt>column</tt>, which are expected to indicate the current line number and
position within the current line, respectively.

== A Word About <tt>line</tt> and <tt>column</tt> attributes

Presumably, the concept of <tt>line</tt> and <tt>column</tt> attributes of text
is familiar to most developers. Line numbers of text are indexed from number 1
up (not 0). Column numbers are indexed from 0 up. Thus, examining sample text:

  Hey this is the first line.
  Oh, and this is the second line.

Line 1 is the string "Hey this is the first line.\\n". If a character stream is at
line 2, column 0, the stream cursor is sitting between the characters "\\n"
and "O".

*Note:* most ANTLR runtime APIs for other languages refer to <tt>column</tt>
with the more-precise, but lengthy name <tt>charPositionInLine</tt>. I preferred
to keep it simple and familiar in this Ruby runtime API.
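
The indexing convention above can be sketched with a small stand-alone helper.
The method name +locate+ is hypothetical and exists only for this example; it
walks the text up to a cursor index and reports the line (counted from 1) and
column (counted from 0) at that point.

```ruby
# Hypothetical helper for illustration; not part of the runtime library.
# Returns [line, column] for a cursor sitting just before text[index].
def locate(text, index)
  line, column = 1, 0
  text[0, index].each_char do |c|
    if c == "\n"
      line += 1   # a newline starts the next line...
      column = 0  # ...and resets the column counter
    else
      column += 1
    end
  end
  [line, column]
end

sample = "Hey this is the first line.\nOh, and this is the second line.\n"
start_of_second_line = sample.index("Oh")
line, column = locate(sample, start_of_second_line)
puts "line #{line}, column #{column}"
```

With the cursor sitting just before the "O" of the second line, the helper
reports line 2, column 0, matching the description above.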

=end

module CharacterStream
  include Stream
  extend ClassMacros
  include Constants

  ##
  # :method: substring(start,stop)
  abstract :substring

  attr_accessor :line
  attr_accessor :column
end


=begin rdoc ANTLR3::TokenStream

TokenStream further extends the abstract-ish base mixin Stream to add methods
specific to navigating token sequences. Thus, it serves as an imitation of the
Java interface for token-based streams, which are used by many different
components in ANTLR, including parsers and tree parsers.

== Token Streams

Token streams wrap a sequence of token objects produced by some token source,
usually a lexer. They provide the operations required by higher-level
recognizers, such as parsers and tree parsers for navigating through the
sequence of tokens. Unlike simple character-based streams, such as StringStream,
token-based streams have an additional level of complexity because they must
manage the task of "tuning" to a specific token channel.

One of the main advantages of ANTLR-based recognition is the token
<i>channel</i> feature, which allows you to hold on to all tokens of interest
while only presenting a specific set of interesting tokens to a parser. For
example, if you need to hide whitespace and comments from a parser, but hang on
to them for some other purpose, you have the lexer assign the comments and
whitespace to channel value HIDDEN as it creates the tokens.

When you create a token stream, you can tune it to some specific channel value.
Then, all <tt>peek</tt>, <tt>look</tt>, and <tt>consume</tt> operations only
yield tokens that have the same value for <tt>channel</tt>. The stream skips
over any non-matching tokens in between.

== The TokenStream Interface

In addition to the abstract methods and attribute methods provided by the base
Stream module, TokenStream adds a number of additional method implementation
requirements and attributes.
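
To make the idea of channel tuning more concrete, here is a small stand-alone
sketch. Everything in it is an assumption made for illustration:
+SketchToken+, +TunedTokens+, and the channel values +DEFAULT_CH+ and
+HIDDEN_CH+ are hypothetical names and numbers, not the runtime's own classes
or constants, and the real CommonTokenStream may be organized differently.

```ruby
# Hypothetical names throughout; this is not the runtime's CommonTokenStream.
SketchToken = Struct.new(:type, :text, :channel)
DEFAULT_CH = 0   # assumed value for the parser-visible channel
HIDDEN_CH  = 99  # assumed value for the hidden channel

class TunedTokens
  def initialize(tokens, channel = DEFAULT_CH)
    @tokens = tokens
    @channel = channel
    @position = next_on_channel(0) # start on the first visible token
  end

  # full token k visible tokens ahead (k = 1 is the current one), or nil
  def look(k = 1)
    return nil if k < 1
    i = @position
    (k - 1).times { i = next_on_channel(i + 1) }
    @tokens[i]
  end

  # token type k visible tokens ahead, or nil past the end
  def peek(k = 1)
    token = look(k)
    token && token.type
  end

  # advance to the next token on the tuned channel
  def consume
    @position = next_on_channel(@position + 1) if @position < @tokens.length
  end

  private

  # index of the first token at or after i whose channel matches
  def next_on_channel(i)
    i += 1 while i < @tokens.length && @tokens[i].channel != @channel
    i
  end
end

tokens = [
  SketchToken.new(:ID,  "x", DEFAULT_CH),
  SketchToken.new(:WS,  " ", HIDDEN_CH),
  SketchToken.new(:EQ,  "=", DEFAULT_CH),
  SketchToken.new(:WS,  " ", HIDDEN_CH),
  SketchToken.new(:INT, "1", DEFAULT_CH)
]
stream = TunedTokens.new(tokens)
second = stream.look(2) # skips the hidden whitespace token
```

The hidden whitespace tokens remain in the underlying array, so other code can
still get at them, but look-ahead and consumption never stop on them.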

=end

module TokenStream
  include Stream
  extend ClassMacros

  ##
  # expected to return the token source object (such as a lexer) from which
  # all tokens in the stream were retrieved
  attr_reader :token_source

  ##
  # expected to return the value of the last marker produced by a call to
  # <tt>stream.mark</tt>
  attr_reader :last_marker

  ##
  # expected to return the integer index of the stream cursor
  attr_reader :position

  ##
  # the integer channel value to which the stream is ``tuned''
  attr_accessor :channel

  ##
  # :method: to_s(start=0,stop=tokens.length-1)
  # should take the tokens between start and stop in the sequence, extract their text
  # and return the concatenation of all the text chunks
  abstract :to_s

  ##
  # :method: at( i )
  # return the stream symbol at index +i+
  abstract :at
end

=begin rdoc ANTLR3::StringStream

A StringStream's purpose is to wrap the basic, naked text input of a recognition
system. Like all other stream types, it provides serial navigation of the input;
a recognizer can arbitrarily step forward and backward through the stream's
symbols as it requires. StringStream and its subclasses are the main way to
feed text input into an ANTLR Lexer for token processing.

The stream's symbols of interest, of course, are character values. Thus, the
#peek method returns the integer character value at look-ahead position
<tt>k</tt> and the #look method returns the character value as a +String+.
StringStreams also track various pieces of information such as the line and
column numbers at the current position.

=== Note About Text Encoding

This version of the runtime library primarily targets Ruby version 1.8, which
does not have strong built-in support for multi-byte character encodings. Thus,
characters are assumed to be represented by a single byte -- an integer between
0 and 255. Ruby 1.9 does provide built-in encoding support for multi-byte
characters, but currently this library does not provide any streams to handle
non-ASCII encoding. However, encoding-savvy recognition code is a future
development goal for this project.
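
The difference between a byte-oriented and a character-oriented view of the
input can be seen with plain Ruby 1.9+ String methods. This is only a sketch
of the underlying issue, not something the library does on your behalf; the
variable names are made up for the example.

```ruby
# "\u00e9" is a single character (e with acute accent) that takes two bytes in UTF-8
text = "\u00e9".encode(Encoding::UTF_8)

byte_values = text.bytes.to_a       # one integer per byte, each between 0 and 255
char_values = text.codepoints.to_a  # one integer per character

puts "bytes:      #{byte_values.inspect}"
puts "codepoints: #{char_values.inspect}"
```

A stream that assumes one byte per character would see two symbols here,
whereas a stream built on codepoints would see one.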

=end

class StringStream
  NEWLINE = ?\n.ord

  include CharacterStream

  # current integer character index of the stream
  attr_reader :position

  # the current line number of the input, indexed upward from 1
  attr_reader :line

  # the current character position within the current line, indexed upward from 0
  attr_reader :column

  # the name associated with the stream -- usually a file name
  # defaults to <tt>"(string)"</tt>
  attr_accessor :name

  # the entire string that is wrapped by the stream
  attr_reader :data
  attr_reader :string

  if RUBY_VERSION =~ /^1\.9/

    # creates a new StringStream object where +data+ is the string data to stream.
    # accepts the following options in a symbol-to-value hash:
    #
    # [:file or :name] the (file) name to associate with the stream; default: <tt>'(string)'</tt>
    # [:line] the initial line number; default: +1+
    # [:column] the initial column number; default: +0+
    #
    def initialize( data, options = {} ) # for 1.9
      @string = data.to_s.encode( Encoding::UTF_8 ).freeze
      @data = @string.codepoints.to_a.freeze
      @position = options.fetch :position, 0
      @line = options.fetch :line, 1
      @column = options.fetch :column, 0
      @markers = []
      @name ||= options[ :file ] || options[ :name ] # || '(string)'
      mark
    end

    #
    # identical to #peek, except it returns the character value as a String
    #
    def look( k = 1 ) # for 1.9
      k == 0 and return nil
      k += 1 if k < 0

      index = @position + k - 1
      index < 0 and return nil

      @string[ index ]
    end

  else

    # creates a new StringStream object where +data+ is the string data to stream.
    # accepts the following options in a symbol-to-value hash:
    #
    # [:file or :name] the (file) name to associate with the stream; default: <tt>'(string)'</tt>
    # [:line] the initial line number; default: +1+
    # [:column] the initial column number; default: +0+
    #
    def initialize( data, options = {} ) # for 1.8
      @data = data.to_s
      @data.equal?( data ) and @data = @data.clone
      @data.freeze
      @string = @data
      @position = options.fetch :position, 0
      @line = options.fetch :line, 1
      @column = options.fetch :column, 0
      @markers = []
      @name ||= options[ :file ] || options[ :name ] # || '(string)'
      mark
    end

    #
    # identical to #peek, except it returns the character value as a String
    #
    def look( k = 1 ) # for 1.8
      k == 0 and return nil
      k += 1 if k < 0

      index = @position + k - 1
      index < 0 and return nil

      c = @data[ index ] and c.chr
    end

  end

  def size
    @data.length
  end

  alias length size

  #
  # rewinds the stream back to the start and clears out any existing marker entries
  #
  def reset
    initial_location = @markers.first
    @position, @line, @column = initial_location
    @markers.clear
    @markers << initial_location
    return self
  end

  #
  # advance the stream by one character; returns the character consumed
  #
  def consume
    c = @data[ @position ] || EOF
    if @position < @data.length
      @column += 1
      if c == NEWLINE
        @line += 1
        @column = 0
      end
      @position += 1
    end
    return( c )
  end

  #
  # return the character at look-ahead distance +k+ as an integer. <tt>k = 1</tt> represents
  # the current character. +k+ greater than 1 represents upcoming characters. A negative
  # value of +k+ returns previous characters consumed, where <tt>k = -1</tt> is the last
  # character consumed.
  # <tt>k = 0</tt> has undefined behavior and returns +nil+
  #
  def peek( k = 1 )
    k == 0 and return nil
    k += 1 if k < 0
    index = @position + k - 1
    index < 0 and return nil
    @data[ index ] or EOF
  end

  #
  # return a substring around the stream cursor at a distance +k+
  # if <tt>k >= 0</tt>, return the next k characters
  # if <tt>k < 0</tt>, return the previous <tt>|k|</tt> characters
  #
  def through( k )
    if k >= 0 then @string[ @position, k ] else
      start = ( @position + k ).at_least( 0 ) # start cannot be negative or index will wrap around
      @string[ start ... @position ]
    end
  end

  # operator style look-ahead
  alias >> look

  # operator style look-behind
  # (delegates to >> with a negated distance; calling << here would recurse forever)
  def <<( k )
    self >> -k
  end

  alias index position
  alias character_index position

  alias source_name name

  #
  # Returns true if the stream appears to be at the beginning of a new line.
  # This is an extra utility method for use inside lexer actions if needed.
  #
  def beginning_of_line?
    @position.zero? or @data[ @position - 1 ] == NEWLINE
  end

  #
  # Returns true if the stream appears to be at the end of a line.
  # This is an extra utility method for use inside lexer actions if needed.
  #
  def end_of_line?
    @data[ @position ] == NEWLINE #if @position < @data.length
  end

  #
  # Returns true if the stream has been exhausted.
  # This is an extra utility method for use inside lexer actions if needed.
  #
  def end_of_string?
    @position >= @data.length
  end

  #
  # Returns true if the stream is at the beginning of the stream (position = 0).
  # This is an extra utility method for use inside lexer actions if needed.
  #
  def beginning_of_string?
    @position == 0
  end

  alias eof? end_of_string?
  alias bof? beginning_of_string?

  #
  # record the current stream location parameters in the stream's marker table and
  # return an integer-valued bookmark that may be used to restore the stream's
  # position with the #rewind method. This method is used to implement backtracking.
  #
  def mark
    state = [ @position, @line, @column ].freeze
    @markers << state
    return @markers.length - 1
  end

  #
  # restore the stream to an earlier location recorded by #mark. If no marker value is
  # provided, the last marker generated by #mark will be used.
  #
  def rewind( marker = @markers.length - 1, release = true )
    ( marker >= 0 and location = @markers[ marker ] ) or return( self )
    @position, @line, @column = location
    release( marker ) if release
    return self
  end

  #
  # the total number of markers currently in existence
  #
  def mark_depth
    @markers.length
  end

  #
  # the last marker value created by a call to #mark
  #
  def last_marker
    @markers.length - 1
  end

  #
  # let go of the bookmark data for the marker and all marker
  # values created after the marker.
  #
  def release( marker = @markers.length - 1 )
    marker.between?( 1, @markers.length - 1 ) or return
    @markers.pop( @markers.length - marker )
    return self
  end

  #
  # jump to the absolute position value given by +index+.
  # note: if +index+ is before the current position, the +line+ and +column+
  #       attributes of the stream will probably be incorrect
  #
  def seek( index )
    index = index.bound( 0, @data.length )  # ensures index is within the stream's range
    if index > @position
      skipped = through( index - @position )
      if lc = skipped.count( "\n" ) and lc.zero?
        @column += skipped.length
      else
        @line += lc
        @column = skipped.length - skipped.rindex( "\n" ) - 1
      end
    end
    @position = index
    return nil
  end

  #
  # customized object inspection that shows:
  # * the stream class
  # * the stream's location in <tt>index / line:column</tt> format
  # * +before_chars+ characters before the cursor (6 characters by default)
  # * +after_chars+ characters after the cursor (10 characters by default)
  #
  def inspect( before_chars = 6, after_chars = 10 )
    before = through( -before_chars ).inspect
    @position - before_chars > 0 and before.insert( 0, '... ' )

    after = through( after_chars ).inspect
    @position + after_chars + 1 < @data.length and after << ' ...'

    location = "#@position / line #@line:#@column"
    "#<#{ self.class }: #{ before } | #{ after } @ #{ location }>"
  end

  #
  # return the string slice between position +start+ and +stop+ (inclusive)
  #
  def substring( start, stop )
    @string[ start, stop - start + 1 ]
  end

  #
  # identical to String#[]
  #
  def []( start, *args )
    @string[ start, *args ]
  end
end


=begin rdoc ANTLR3::FileStream

FileStream is a character stream that uses data stored in some external file. It
is nearly identical to StringStream, but it takes its data from a file and
automatically sets up the +source_name+ and +line+ parameters. It does
not actually use any buffered IO operations throughout the stream navigation
process. Instead, it reads the file data once when the stream is initialized.

=end

class FileStream < StringStream

  #
  # creates a new FileStream object using the given +file+ object.
  # If +file+ is a path string, the file will be read, its contents
  # will be used, and the +name+ attribute will be set to the path.
  # If +file+ is an IO-like object (that responds to :read),
  # the content of the object will be used and the stream will
  # attempt to set its +name+ attribute, first trying the method #name
  # on the object, then trying the method #path on the object.
  #
  # see StringStream.new for a list of additional options
  # the constructor accepts
  #
  def initialize( file, options = {} )
    case file
    when $stdin then
      data = $stdin.read
      @name = '(stdin)'
    when ARGF
      data = file.read
      @name = file.path
    when ::File then
      file = file.clone
      file.reopen( file.path, 'r' )
      @name = file.path
      data = file.read
      file.close
    else
      if file.respond_to?( :read )
        data = file.read
        if file.respond_to?( :name ) then @name = file.name
        elsif file.respond_to?( :path ) then @name = file.path
        end
      else
        @name = file.to_s
        if test( ?f, @name ) then data = File.read( @name )
        else raise ArgumentError, "could not find an existing file at %p" % @name
        end
      end
    end
    super( data, options )
  end

end

=begin rdoc ANTLR3::CommonTokenStream

CommonTokenStream serves as the primary token stream implementation for feeding
sequential token input into parsers.

Using some TokenSource (such as a lexer), the stream collects a token sequence,
setting the token's <tt>index</tt> attribute to indicate the token's position
within the stream. The streams may be tuned to some channel value; off-channel
tokens will be filtered out by the #peek, #look, and #consume methods.
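
As a rough illustration of the indexing and channel filtering described above,
the sketch below uses a hypothetical +sketch_token+ struct and plain symbols
(<tt>:default</tt>, <tt>:hidden</tt>) in place of the real ANTLR3 token classes
and channel constants. The lambdas +index_tokens+ and +on_channel+ are also
invented for this example. It is not how CommonTokenStream is implemented, only
a picture of the effect.

```ruby
# hypothetical stand-in for a token; real ANTLR3 tokens carry more fields
sketch_token = Struct.new( :type, :text, :channel, :index )

# number the tokens in buffer order and return the buffer
# (assumed helper, mirrors the idea of setting each token's index)
index_tokens = lambda do |tokens|
  tokens.each_with_index { |t, i| t.index = i }
end

# select the tokens a stream tuned to +channel+ would hand out
# (assumed helper, illustrative only)
on_channel = lambda do |tokens, channel|
  tokens.select { |t| t.channel == channel }
end

buffer = index_tokens.call( [
  sketch_token.new( :INT,  '35', :default ),
  sketch_token.new( :WS,   ' ',  :hidden ),
  sketch_token.new( :MULT, '*',  :default ),
] )

visible = on_channel.call( buffer, :default )
hidden  = on_channel.call( buffer, :hidden )

visible.each { |t| puts "#{ t.index } #{ t.type }[#{ t.text.inspect }]" }
```

In this sketch the on-channel tokens keep their original buffer indexes, so an
index reflects a token's place among all harvested tokens rather than among the
on-channel ones only.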

=== Sample Usage


  source_input = ANTLR3::StringStream.new("35 * 4 - 1")
  lexer = Calculator::Lexer.new(source_input)
  tokens = ANTLR3::CommonTokenStream.new(lexer)

  # assume this grammar defines whitespace as tokens on channel HIDDEN
  # and numbers and operations as tokens on channel DEFAULT
  tokens.look         # => 0 INT["35"] @ line 1 col 0 (0..1)
  tokens.look(2)      # => 2 MULT["*"] @ line 1 col 3 (3..3)
  tokens.tokens(0, 2)
  # => [0 INT["35"] @ line 1 col 0 (0..1),
  #     1 WS[" "] @ line 1 col 2 (2..2),
  #     2 MULT["*"] @ line 1 col 3 (3..3)]
  # notice the #tokens method does not filter off-channel tokens

  lexer.reset
  hidden_tokens =
    ANTLR3::CommonTokenStream.new(lexer, :channel => ANTLR3::HIDDEN)
  hidden_tokens.look  # => 1 WS[" "] @ line 1 col 2 (2..2)

=end

class CommonTokenStream
  include TokenStream
  include Enumerable

  #
  # constructs a new token stream using the +token_source+ provided. +token_source+ is
  # usually a lexer, but can be any object that implements +next_token+ and includes
  # ANTLR3::TokenSource.
  #
  # If a block is provided, each token harvested will be yielded and if the block
  # returns a +nil+ or +false+ value, the token will not be added to the stream --
  # it will be discarded.
  #
  # === Options
  # [:channel] The channel value the stream should be tuned to initially
  # [:source_name] The source name (file name) attribute of the stream
  #
  # === Example
  #
  #   # create a new token stream that is tuned to channel :comment, and
  #   # discard all WHITE_SPACE tokens
  #   ANTLR3::CommonTokenStream.new(lexer, :channel => :comment) do |token|
  #     token.name != 'WHITE_SPACE'
  #   end
  #
  def initialize( token_source, options = {} )
    case token_source
    when CommonTokenStream
      # this is useful in cases where you want to convert a CommonTokenStream
      # to a RewriteTokenStream or other variation of the standard token stream
      stream = token_source
      @token_source = stream.token_source
      @channel = options.fetch( :channel ) { stream.channel or DEFAULT_CHANNEL }
      @source_name = options.fetch( :source_name ) { stream.source_name }
      tokens = stream.tokens.map { | t | t.dup }
    else
      @token_source = token_source
      @channel = options.fetch( :channel, DEFAULT_CHANNEL )
      @source_name = options.fetch( :source_name ) { @token_source.source_name rescue nil }
      tokens = @token_source.to_a
    end
    @last_marker = nil
    @tokens = block_given? ? tokens.select { | t | yield( t, self ) } : tokens
    @tokens.each_with_index { |t, i| t.index = i }
    @position =
      if first_token = @tokens.find { |t| t.channel == @channel }
        @tokens.index( first_token )
      else @tokens.length
      end
  end

  #
  # resets the token stream and rebuilds it with a potentially new token source.
  # If no +token_source+ value is provided, the stream will attempt to reset the
  # current +token_source+ by calling +reset+ on the object. The stream will
  # then clear the token buffer and attempt to harvest new tokens. Identical in
  # behavior to CommonTokenStream.new, if a block is provided, tokens will be
  # yielded and discarded if the block returns a +false+ or +nil+ value.
  #
  def rebuild( token_source = nil )
    if token_source.nil?
      @token_source.reset rescue nil
    else @token_source = token_source
    end
    @tokens = block_given? ? @token_source.select { |token| yield( token ) } :
                             @token_source.to_a
    @tokens.each_with_index { |t, i| t.index = i }
    @last_marker = nil
    @position =
      if first_token = @tokens.find { |t| t.channel == @channel }
        @tokens.index( first_token )
      else @tokens.length
      end
    return self
  end

  #
  # tune the stream to a new channel value
  #
  def tune_to( channel )
    @channel = channel
  end

  def token_class
    @token_source.token_class
  rescue NoMethodError
    @position == -1 and fill_buffer
    @tokens.empty? ? CommonToken : @tokens.first.class
  end

  alias index position

  def size
    @tokens.length
  end

  alias length size

  ###### State-Control ################################################

  #
  # rewind the stream to its initial state
  #
  def reset
    @position = 0
    @position += 1 while token = @tokens[ @position ] and
                         token.channel != @channel
    @last_marker = nil
    return self
  end

  #
  # bookmark the current position of the input stream
  #
  def mark
    @last_marker = @position
  end

  def release( marker = nil )
    # do nothing
  end


  def rewind( marker = @last_marker, release = true )
    seek( marker )
  end

  #
  # saves the current stream position, yields to the block,
  # and then ensures the stream's position is restored before
  # returning the value of the block
  #
  def hold( pos = @position )
    block_given? or return enum_for( :hold, pos )
    begin
      yield
    ensure
      seek( pos )
    end
  end

  ###### Stream Navigation ###########################################

  #
  # advance the stream one step to the next on-channel token
  #
  def consume
    token = @tokens[ @position ] || EOF_TOKEN
    if @position < @tokens.length
      @position = future?( 2 ) || @tokens.length
    end
    return( token )
  end

  #
  # jump to the stream position specified by +index+
  # note: seek does not check whether or not the
  #       token at the specified position is on-channel.
  #
  def seek( index )
    @position = index.to_i.bound( 0, @tokens.length )
    return self
  end

  #
  # return the type of the on-channel token at look-ahead distance +k+. <tt>k = 1</tt> represents
  # the current token. +k+ greater than 1 represents upcoming on-channel tokens. A negative
  # value of +k+ returns previous on-channel tokens consumed, where <tt>k = -1</tt> is the last
  # on-channel token consumed. <tt>k = 0</tt> has undefined behavior and returns +nil+
  #
  def peek( k = 1 )
    tk = look( k ) and return( tk.type )
  end

  #
  # operates similarly to #peek, but returns the full token object at look-ahead position +k+
  #
  def look( k = 1 )
    index = future?( k ) or return nil
    @tokens.fetch( index, EOF_TOKEN )
  end

  alias >> look
  def << k
    self >> -k
  end

  #
  # returns the index of the on-channel token at look-ahead position +k+ or nil if no other
  # on-channel tokens exist
  #
  def future?( k = 1 )
    @position == -1 and fill_buffer

    case
    when k == 0 then nil
    when k < 0 then past?( -k )
    when k == 1 then @position
    else
      # since the stream only yields on-channel
      # tokens, the stream can't just go to the
      # next position, but rather must skip
      # over off-channel tokens
      ( k - 1 ).times.inject( @position ) do |cursor, |
        begin
          tk = @tokens.at( cursor += 1 ) or return( cursor )
          # ^- if tk is nil (i.e. cursor is outside array limits)
        end until tk.channel == @channel
        cursor
      end
    end
  end

  #
  # returns the index of the on-channel token at look-behind position +k+ or nil if no other
  # on-channel tokens exist before the current token
  #
  def past?( k = 1 )
    @position == -1 and fill_buffer

    case
    when k == 0 then nil
    when @position - k < 0 then nil
    else

      k.times.inject( @position ) do |cursor, |
        begin
          cursor <= 0 and return( nil )
          tk = @tokens.at( cursor -= 1 ) or return( nil )
        end until tk.channel == @channel
        cursor
      end

    end
  end

  #
  # yields each token in the stream (including off-channel tokens)
  # If no block is provided, the method returns an Enumerator object.
  # #each accepts the same arguments as #tokens
  #
  def each( *args )
    block_given? or return enum_for( :each, *args )
    tokens( *args ).each { |token| yield( token ) }
  end


  #
  # yields each token in the stream with the given channel value
  # If no channel value is given, the stream's tuned channel value will be used.
  # If no block is given, an enumerator will be returned.
  #
  def each_on_channel( channel = @channel )
    block_given? or return enum_for( :each_on_channel, channel )
    for token in @tokens
      token.channel == channel and yield( token )
    end
  end

  #
  # iterates through the token stream, yielding each on-channel token along the way.
  # After iteration has completed, the stream's position will be restored to where
  # it was before #walk was called. While #each and #each_on_channel do not change
  # the stream's position during iteration, #walk advances through the stream. This
  # makes it possible to look ahead and behind the current token during iteration.
  # If no block is given, an enumerator will be returned.
  #
  def walk
    block_given? or return enum_for( :walk )
    initial_position = @position
    begin
      while token = look and token.type != EOF
        consume
        yield( token )
      end
      return self
    ensure
      @position = initial_position
    end
  end

  #
  # returns a copy of the token buffer. If +start+ and +stop+ are provided, tokens
  # returns a slice of the token buffer from <tt>start..stop</tt>. The parameters
  # are converted to integers with their <tt>to_i</tt> methods, and thus tokens
  # can be provided to specify start and stop. If a block is provided, tokens are
  # yielded and filtered out of the return array if the block returns a +false+
  # or +nil+ value.
  #
  def tokens( start = nil, stop = nil )
    stop.nil? || stop >= @tokens.length and stop = @tokens.length - 1
    start.nil? || stop < 0 and start = 0
    tokens = @tokens[ start..stop ]

    if block_given?
      tokens.delete_if { |t| not yield( t ) }
    end

    return( tokens )
  end


  def at( i )
    @tokens.at i
  end

  #
  # identical to Array#[], as applied to the stream's token buffer
  #
  def []( i, *args )
    @tokens[ i, *args ]
  end

  ###### Standard Conversion Methods ###############################
  def inspect
    string = "#<%p: @token_source=%p @ %p/%p" %
      [ self.class, @token_source.class, @position, @tokens.length ]
    tk = look( -1 ) and string << " #{ tk.inspect } <--"
    tk = look( 1 ) and string << " --> #{ tk.inspect }"
    string << '>'
  end

  #
  # fetches the text content of all tokens between +start+ and +stop+ and
  # joins the chunks into a single string
  #
  def extract_text( start = 0, stop = @tokens.length - 1 )
    start = start.to_i.at_least( 0 )
    stop = stop.to_i.at_most( @tokens.length )
    @tokens[ start..stop ].map! { |t| t.text }.join( '' )
  end

  alias to_s extract_text

end

end