1*16467b97STreehugger Robot#!/usr/bin/ruby 2*16467b97STreehugger Robot# encoding: utf-8 3*16467b97STreehugger Robot 4*16467b97STreehugger Robot=begin LICENSE 5*16467b97STreehugger Robot 6*16467b97STreehugger Robot[The "BSD licence"] 7*16467b97STreehugger RobotCopyright (c) 2009-2010 Kyle Yetter 8*16467b97STreehugger RobotAll rights reserved. 9*16467b97STreehugger Robot 10*16467b97STreehugger RobotRedistribution and use in source and binary forms, with or without 11*16467b97STreehugger Robotmodification, are permitted provided that the following conditions 12*16467b97STreehugger Robotare met: 13*16467b97STreehugger Robot 14*16467b97STreehugger Robot 1. Redistributions of source code must retain the above copyright 15*16467b97STreehugger Robot notice, this list of conditions and the following disclaimer. 16*16467b97STreehugger Robot 2. Redistributions in binary form must reproduce the above copyright 17*16467b97STreehugger Robot notice, this list of conditions and the following disclaimer in the 18*16467b97STreehugger Robot documentation and/or other materials provided with the distribution. 19*16467b97STreehugger Robot 3. The name of the author may not be used to endorse or promote products 20*16467b97STreehugger Robot derived from this software without specific prior written permission. 21*16467b97STreehugger Robot 22*16467b97STreehugger RobotTHIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR 23*16467b97STreehugger RobotIMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES 24*16467b97STreehugger RobotOF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 25*16467b97STreehugger RobotIN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, 26*16467b97STreehugger RobotINCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT 27*16467b97STreehugger RobotNOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 28*16467b97STreehugger RobotDATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 29*16467b97STreehugger RobotTHEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 30*16467b97STreehugger Robot(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF 31*16467b97STreehugger RobotTHIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 32*16467b97STreehugger Robot 33*16467b97STreehugger Robot=end 34*16467b97STreehugger Robot 35*16467b97STreehugger Robotmodule ANTLR3 36*16467b97STreehugger Robot 37*16467b97STreehugger Robot=begin rdoc ANTLR3::DFA 38*16467b97STreehugger Robot 39*16467b97STreehugger RobotDFA is a class that implements a finite state machine that chooses between 40*16467b97STreehugger Robotalternatives in a rule based upon lookahead symbols from an input stream. 41*16467b97STreehugger Robot 42*16467b97STreehugger RobotDeterministic Finite Automata (DFA) are finite state machines that are capable 43*16467b97STreehugger Robotof recognizing <i>regular languages</i>. For more background on the subject, 44*16467b97STreehugger Robotcheck out http://en.wikipedia.org/wiki/Deterministic_finite-state_machine or 45*16467b97STreehugger Robotcheck out general ANTLR documentation at http://www.antlr.org 46*16467b97STreehugger Robot 47*16467b97STreehugger RobotANTLR implements most of its decision logic directly using code branching 48*16467b97STreehugger Robotstructures in methods. However, for certain types of decisions, ANTLR will 49*16467b97STreehugger Robotgenerate a special DFA class definition to implement a decision. 50*16467b97STreehugger Robot 51*16467b97STreehugger RobotConceptually, these state machines are defined by a number of states, each state 52*16467b97STreehugger Robotrepresented by an integer indexed upward from zero. State number +0+ is the 53*16467b97STreehugger Robot<i>start state</i> of the machine; every prediction will begin in state +0+. At 54*16467b97STreehugger Roboteach step, the machine examines the next symbol on the input stream, checks the 55*16467b97STreehugger Robotvalue against the transition parameters associated with the current state, and 56*16467b97STreehugger Roboteither moves to a new state number to repeat the process or decides that the 57*16467b97STreehugger Robotmachine cannot transition any further. If the machine cannot transition any 58*16467b97STreehugger Robotfurther and the current state is defined as an <i>accept state</i>, an 59*16467b97STreehugger Robotalternative has been chosen successfully and the prediction procedure ends. If 60*16467b97STreehugger Robotthe current state is not an <i>accept state</i>, the prediction has failed and 61*16467b97STreehugger Robotthere is <i>no viable alternative</i>. 62*16467b97STreehugger Robot 63*16467b97STreehugger RobotIn generated code, ANTLR defines DFA states using seven parameters, each defined 64*16467b97STreehugger Robotas a member of seven seperate array constants -- +MIN+, +MAX+, +EOT+, +EOF+, 65*16467b97STreehugger Robot+SPECIAL+, +ACCEPT+, and +TRANSITION+. The parameters that characterize state 66*16467b97STreehugger Robot+s+ are defined by the value of these lists at index +s+. 67*16467b97STreehugger Robot 68*16467b97STreehugger RobotMIN[s]:: 69*16467b97STreehugger Robot The smallest value of the next input symbol that has 70*16467b97STreehugger Robot a transition for state +s+ 71*16467b97STreehugger RobotMAX[s]:: 72*16467b97STreehugger Robot The largest value of the next input symbol that has 73*16467b97STreehugger Robot a transition for state +s+ 74*16467b97STreehugger RobotTRANSITION[s]:: 75*16467b97STreehugger Robot A list that defines the next state number based upon 76*16467b97STreehugger Robot the current input symbol. 77*16467b97STreehugger RobotEOT[s]:: 78*16467b97STreehugger Robot If positive, it specifies a state transition in 79*16467b97STreehugger Robot situations where a non-matching input symbol does 80*16467b97STreehugger Robot not indicate failure. 81*16467b97STreehugger RobotSPECIAL[s]:: 82*16467b97STreehugger Robot If positive, it indicates that the prediction 83*16467b97STreehugger Robot algorithm must defer to a special code block 84*16467b97STreehugger Robot to determine the next state. The value is used 85*16467b97STreehugger Robot by the special state code to determine the next 86*16467b97STreehugger Robot state. 87*16467b97STreehugger RobotACCEPT[s]:: 88*16467b97STreehugger Robot If positive and there are no possible state 89*16467b97STreehugger Robot transitions, this is the alternative number 90*16467b97STreehugger Robot that has been predicted 91*16467b97STreehugger RobotEOF[s]:: 92*16467b97STreehugger Robot If positive and the input stream has been exhausted, 93*16467b97STreehugger Robot this is the alternative number that has been predicted. 94*16467b97STreehugger Robot 95*16467b97STreehugger RobotFor more information on the prediction algorithm, check out the #predict method 96*16467b97STreehugger Robotbelow. 97*16467b97STreehugger Robot 98*16467b97STreehugger Robot=end 99*16467b97STreehugger Robot 100*16467b97STreehugger Robotclass DFA 101*16467b97STreehugger Robot include Constants 102*16467b97STreehugger Robot include Error 103*16467b97STreehugger Robot 104*16467b97STreehugger Robot attr_reader :recognizer, :decision_number, :eot, :eof, :min, :max, 105*16467b97STreehugger Robot :accept, :special, :transition, :special_block 106*16467b97STreehugger Robot 107*16467b97STreehugger Robot class << self 108*16467b97STreehugger Robot attr_reader :decision, :eot, :eof, :min, :max, 109*16467b97STreehugger Robot :accept, :special, :transition 110*16467b97STreehugger Robot 111*16467b97STreehugger Robot def unpack( *data ) 112*16467b97STreehugger Robot data.empty? and return [].freeze 113*16467b97STreehugger Robot 114*16467b97STreehugger Robot n = data.length / 2 115*16467b97STreehugger Robot size = 0 116*16467b97STreehugger Robot n.times { |i| size += data[ 2*i ] } 117*16467b97STreehugger Robot if size > 1024 118*16467b97STreehugger Robot values = Hash.new( 0 ) 119*16467b97STreehugger Robot data.each_slice( 2 ) do |count, value| 120*16467b97STreehugger Robot values[ value ] += count 121*16467b97STreehugger Robot end 122*16467b97STreehugger Robot default = values.keys.max_by { |v| values[ v ] } 123*16467b97STreehugger Robot 124*16467b97STreehugger Robot unpacked = Hash.new( default ) 125*16467b97STreehugger Robot position = 0 126*16467b97STreehugger Robot data.each_slice( 2 ) do |count, value| 127*16467b97STreehugger Robot unless value == default 128*16467b97STreehugger Robot count.times { |i| unpacked[ position + i ] = value } 129*16467b97STreehugger Robot end 130*16467b97STreehugger Robot position += count 131*16467b97STreehugger Robot end 132*16467b97STreehugger Robot else 133*16467b97STreehugger Robot unpacked = [] 134*16467b97STreehugger Robot data.each_slice( 2 ) do |count, value| 135*16467b97STreehugger Robot unpacked.fill( value, unpacked.length, count ) 136*16467b97STreehugger Robot end 137*16467b97STreehugger Robot end 138*16467b97STreehugger Robot 139*16467b97STreehugger Robot return unpacked 140*16467b97STreehugger Robot end 141*16467b97STreehugger Robot 142*16467b97STreehugger Robot end 143*16467b97STreehugger Robot 144*16467b97STreehugger Robot def initialize( recognizer, decision_number = nil, 145*16467b97STreehugger Robot eot = nil, eof = nil, min = nil, max = nil, 146*16467b97STreehugger Robot accept = nil, special = nil, 147*16467b97STreehugger Robot transition = nil, &special_block ) 148*16467b97STreehugger Robot @recognizer = recognizer 149*16467b97STreehugger Robot @decision_number = decision_number || self.class.decision 150*16467b97STreehugger Robot @eot = eot || self.class::EOT #.eot 151*16467b97STreehugger Robot @eof = eof || self.class::EOF #.eof 152*16467b97STreehugger Robot @min = min || self.class::MIN #.min 153*16467b97STreehugger Robot @max = max || self.class::MAX #.max 154*16467b97STreehugger Robot @accept = accept || self.class::ACCEPT #.accept 155*16467b97STreehugger Robot @special = special || self.class::SPECIAL #.special 156*16467b97STreehugger Robot @transition = transition || self.class::TRANSITION #.transition 157*16467b97STreehugger Robot @special_block = special_block 158*16467b97STreehugger Robot rescue NameError => e 159*16467b97STreehugger Robot raise unless e.message =~ /uninitialized constant/ 160*16467b97STreehugger Robot constant = e.name 161*16467b97STreehugger Robot message = Util.tidy( <<-END ) 162*16467b97STreehugger Robot | No #{ constant } information provided. 163*16467b97STreehugger Robot | DFA cannot be instantiated without providing state array information. 164*16467b97STreehugger Robot | When DFAs are generated by ANTLR, this information should already be 165*16467b97STreehugger Robot | provided in the DFA subclass constants. 166*16467b97STreehugger Robot END 167*16467b97STreehugger Robot end 168*16467b97STreehugger Robot 169*16467b97STreehugger Robot if RUBY_VERSION =~ /^1\.9/ 170*16467b97STreehugger Robot 171*16467b97STreehugger Robot def predict( input ) 172*16467b97STreehugger Robot mark = input.mark 173*16467b97STreehugger Robot state = 0 174*16467b97STreehugger Robot 175*16467b97STreehugger Robot 50000.times do 176*16467b97STreehugger Robot special_state = @special[ state ] 177*16467b97STreehugger Robot if special_state >= 0 178*16467b97STreehugger Robot state = @special_block.call( special_state ) 179*16467b97STreehugger Robot if state == -1 180*16467b97STreehugger Robot no_viable_alternative( state, input ) 181*16467b97STreehugger Robot return 0 182*16467b97STreehugger Robot end 183*16467b97STreehugger Robot input.consume 184*16467b97STreehugger Robot next 185*16467b97STreehugger Robot end 186*16467b97STreehugger Robot @accept[ state ] >= 1 and return @accept[ state ] 187*16467b97STreehugger Robot 188*16467b97STreehugger Robot # look for a normal char transition 189*16467b97STreehugger Robot 190*16467b97STreehugger Robot c = input.peek.ord 191*16467b97STreehugger Robot # the @min and @max arrays contain the bounds of the character (or token type) 192*16467b97STreehugger Robot # ranges for the transition decisions 193*16467b97STreehugger Robot if c.between?( @min[ state ], @max[ state ] ) 194*16467b97STreehugger Robot # c - @min[state] is the position of the character within the range 195*16467b97STreehugger Robot # so for a range like ?a..?z, a match of ?a would be 0, 196*16467b97STreehugger Robot # ?c would be 2, and ?z would be 25 197*16467b97STreehugger Robot next_state = @transition[ state ][ c - @min[ state ] ] 198*16467b97STreehugger Robot if next_state < 0 199*16467b97STreehugger Robot if @eot[ state ] >= 0 200*16467b97STreehugger Robot state = @eot[ state ] 201*16467b97STreehugger Robot input.consume 202*16467b97STreehugger Robot next 203*16467b97STreehugger Robot end 204*16467b97STreehugger Robot no_viable_alternative( state, input ) 205*16467b97STreehugger Robot return 0 206*16467b97STreehugger Robot end 207*16467b97STreehugger Robot 208*16467b97STreehugger Robot state = next_state 209*16467b97STreehugger Robot input.consume 210*16467b97STreehugger Robot next 211*16467b97STreehugger Robot end 212*16467b97STreehugger Robot 213*16467b97STreehugger Robot if @eot[ state ] >= 0 214*16467b97STreehugger Robot state = @eot[ state ] 215*16467b97STreehugger Robot input.consume() 216*16467b97STreehugger Robot next 217*16467b97STreehugger Robot end 218*16467b97STreehugger Robot 219*16467b97STreehugger Robot ( c == EOF && @eof[ state ] >= 0 ) and return @accept[ @eof[ state ] ] 220*16467b97STreehugger Robot no_viable_alternative( state, input ) 221*16467b97STreehugger Robot return 0 222*16467b97STreehugger Robot end 223*16467b97STreehugger Robot 224*16467b97STreehugger Robot ANTLR3.bug!( Util.tidy( <<-END ) ) 225*16467b97STreehugger Robot | DFA BANG! 226*16467b97STreehugger Robot | The prediction loop has exceeded a maximum limit of 50000 iterations 227*16467b97STreehugger Robot | ---- 228*16467b97STreehugger Robot | decision: #@decision_number 229*16467b97STreehugger Robot | description: #{ description } 230*16467b97STreehugger Robot END 231*16467b97STreehugger Robot ensure 232*16467b97STreehugger Robot input.rewind( mark ) 233*16467b97STreehugger Robot end 234*16467b97STreehugger Robot 235*16467b97STreehugger Robot else 236*16467b97STreehugger Robot 237*16467b97STreehugger Robot def predict( input ) 238*16467b97STreehugger Robot mark = input.mark 239*16467b97STreehugger Robot state = 0 240*16467b97STreehugger Robot 241*16467b97STreehugger Robot 50000.times do 242*16467b97STreehugger Robot special_state = @special[ state ] 243*16467b97STreehugger Robot if special_state >= 0 244*16467b97STreehugger Robot state = @special_block.call( special_state ) 245*16467b97STreehugger Robot if state == -1 246*16467b97STreehugger Robot no_viable_alternative( state, input ) 247*16467b97STreehugger Robot return 0 248*16467b97STreehugger Robot end 249*16467b97STreehugger Robot input.consume 250*16467b97STreehugger Robot next 251*16467b97STreehugger Robot end 252*16467b97STreehugger Robot @accept[ state ] >= 1 and return @accept[ state ] 253*16467b97STreehugger Robot 254*16467b97STreehugger Robot # look for a normal char transition 255*16467b97STreehugger Robot 256*16467b97STreehugger Robot c = input.peek 257*16467b97STreehugger Robot # the @min and @max arrays contain the bounds of the character (or token type) 258*16467b97STreehugger Robot # ranges for the transition decisions 259*16467b97STreehugger Robot if c.between?( @min[ state ], @max[ state ] ) 260*16467b97STreehugger Robot # c - @min[state] is the position of the character within the range 261*16467b97STreehugger Robot # so for a range like ?a..?z, a match of ?a would be 0, 262*16467b97STreehugger Robot # ?c would be 2, and ?z would be 25 263*16467b97STreehugger Robot next_state = @transition[ state ][ c - @min[ state ] ] 264*16467b97STreehugger Robot if next_state < 0 265*16467b97STreehugger Robot if @eot[ state ] >= 0 266*16467b97STreehugger Robot state = @eot[ state ] 267*16467b97STreehugger Robot input.consume 268*16467b97STreehugger Robot next 269*16467b97STreehugger Robot end 270*16467b97STreehugger Robot no_viable_alternative( state, input ) 271*16467b97STreehugger Robot return 0 272*16467b97STreehugger Robot end 273*16467b97STreehugger Robot 274*16467b97STreehugger Robot state = next_state 275*16467b97STreehugger Robot input.consume() 276*16467b97STreehugger Robot next 277*16467b97STreehugger Robot end 278*16467b97STreehugger Robot if @eot[ state ] >= 0 279*16467b97STreehugger Robot state = @eot[ state ] 280*16467b97STreehugger Robot input.consume() 281*16467b97STreehugger Robot next 282*16467b97STreehugger Robot end 283*16467b97STreehugger Robot ( c == EOF && @eof[ state ] >= 0 ) and return @accept[ @eof[ state ] ] 284*16467b97STreehugger Robot no_viable_alternative( state, input ) 285*16467b97STreehugger Robot return 0 286*16467b97STreehugger Robot end 287*16467b97STreehugger Robot 288*16467b97STreehugger Robot ANTLR3.bug!( Util.tidy( <<-END ) ) 289*16467b97STreehugger Robot | DFA BANG! 290*16467b97STreehugger Robot | The prediction loop has exceeded a maximum limit of 50000 iterations 291*16467b97STreehugger Robot | ---- 292*16467b97STreehugger Robot | decision: #@decision_number 293*16467b97STreehugger Robot | description: #{ description } 294*16467b97STreehugger Robot END 295*16467b97STreehugger Robot ensure 296*16467b97STreehugger Robot input.rewind( mark ) 297*16467b97STreehugger Robot end 298*16467b97STreehugger Robot 299*16467b97STreehugger Robot end 300*16467b97STreehugger Robot 301*16467b97STreehugger Robot def no_viable_alternative( state, input ) 302*16467b97STreehugger Robot raise( BacktrackingFailed ) if @recognizer.state.backtracking > 0 303*16467b97STreehugger Robot except = NoViableAlternative.new( description, @decision_number, state, input ) 304*16467b97STreehugger Robot error( except ) 305*16467b97STreehugger Robot raise( except ) 306*16467b97STreehugger Robot end 307*16467b97STreehugger Robot 308*16467b97STreehugger Robot def error( except ) 309*16467b97STreehugger Robot # overridable debugging hook 310*16467b97STreehugger Robot end 311*16467b97STreehugger Robot 312*16467b97STreehugger Robot def special_state_transition( state, input ) 313*16467b97STreehugger Robot return -1 314*16467b97STreehugger Robot end 315*16467b97STreehugger Robot 316*16467b97STreehugger Robot def description 317*16467b97STreehugger Robot return "n/a" 318*16467b97STreehugger Robot end 319*16467b97STreehugger Robotend 320*16467b97STreehugger Robotend 321