xref: /aosp_15_r20/external/antlr/runtime/Ruby/lib/antlr3/dfa.rb (revision 16467b971bd3e2009fad32dd79016f2c7e421deb)
1*16467b97STreehugger Robot#!/usr/bin/ruby
2*16467b97STreehugger Robot# encoding: utf-8
3*16467b97STreehugger Robot
4*16467b97STreehugger Robot=begin LICENSE
5*16467b97STreehugger Robot
6*16467b97STreehugger Robot[The "BSD licence"]
7*16467b97STreehugger RobotCopyright (c) 2009-2010 Kyle Yetter
8*16467b97STreehugger RobotAll rights reserved.
9*16467b97STreehugger Robot
10*16467b97STreehugger RobotRedistribution and use in source and binary forms, with or without
11*16467b97STreehugger Robotmodification, are permitted provided that the following conditions
12*16467b97STreehugger Robotare met:
13*16467b97STreehugger Robot
14*16467b97STreehugger Robot 1. Redistributions of source code must retain the above copyright
15*16467b97STreehugger Robot    notice, this list of conditions and the following disclaimer.
16*16467b97STreehugger Robot 2. Redistributions in binary form must reproduce the above copyright
17*16467b97STreehugger Robot    notice, this list of conditions and the following disclaimer in the
18*16467b97STreehugger Robot    documentation and/or other materials provided with the distribution.
19*16467b97STreehugger Robot 3. The name of the author may not be used to endorse or promote products
20*16467b97STreehugger Robot    derived from this software without specific prior written permission.
21*16467b97STreehugger Robot
22*16467b97STreehugger RobotTHIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
23*16467b97STreehugger RobotIMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
24*16467b97STreehugger RobotOF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
25*16467b97STreehugger RobotIN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
26*16467b97STreehugger RobotINCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
27*16467b97STreehugger RobotNOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
28*16467b97STreehugger RobotDATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
29*16467b97STreehugger RobotTHEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
30*16467b97STreehugger Robot(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
31*16467b97STreehugger RobotTHIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
32*16467b97STreehugger Robot
33*16467b97STreehugger Robot=end
34*16467b97STreehugger Robot
35*16467b97STreehugger Robotmodule ANTLR3
36*16467b97STreehugger Robot
37*16467b97STreehugger Robot=begin rdoc ANTLR3::DFA
38*16467b97STreehugger Robot
39*16467b97STreehugger RobotDFA is a class that implements a finite state machine that chooses between
40*16467b97STreehugger Robotalternatives in a rule based upon lookahead symbols from an input stream.
41*16467b97STreehugger Robot
42*16467b97STreehugger RobotDeterministic Finite Automata (DFA) are finite state machines that are capable
43*16467b97STreehugger Robotof recognizing <i>regular languages</i>. For more background on the subject,
44*16467b97STreehugger Robotcheck out http://en.wikipedia.org/wiki/Deterministic_finite-state_machine or
45*16467b97STreehugger Robotcheck out general ANTLR documentation at http://www.antlr.org
46*16467b97STreehugger Robot
47*16467b97STreehugger RobotANTLR implements most of its decision logic directly using code branching
48*16467b97STreehugger Robotstructures in methods. However, for certain types of decisions, ANTLR will
49*16467b97STreehugger Robotgenerate a special DFA class definition to implement a decision.
50*16467b97STreehugger Robot
51*16467b97STreehugger RobotConceptually, these state machines are defined by a number of states, each state
52*16467b97STreehugger Robotrepresented by an integer indexed upward from zero. State number +0+ is the
53*16467b97STreehugger Robot<i>start state</i> of the machine; every prediction will begin in state +0+. At
54*16467b97STreehugger Roboteach step, the machine examines the next symbol on the input stream, checks the
55*16467b97STreehugger Robotvalue against the transition parameters associated with the current state, and
56*16467b97STreehugger Roboteither moves to a new state number to repeat the process or decides that the
57*16467b97STreehugger Robotmachine cannot transition any further. If the machine cannot transition any
58*16467b97STreehugger Robotfurther and the current state is defined as an <i>accept state</i>, an
59*16467b97STreehugger Robotalternative has been chosen successfully and the prediction procedure ends. If
60*16467b97STreehugger Robotthe current state is not an <i>accept state</i>, the prediction has failed and
61*16467b97STreehugger Robotthere is <i>no viable alternative</i>.
62*16467b97STreehugger Robot
63*16467b97STreehugger RobotIn generated code, ANTLR defines DFA states using seven parameters, each defined
64*16467b97STreehugger Robotas a member of seven seperate array constants -- +MIN+, +MAX+, +EOT+, +EOF+,
65*16467b97STreehugger Robot+SPECIAL+, +ACCEPT+, and +TRANSITION+. The parameters that characterize state
66*16467b97STreehugger Robot+s+ are defined by the value of these lists at index +s+.
67*16467b97STreehugger Robot
68*16467b97STreehugger RobotMIN[s]::
69*16467b97STreehugger Robot  The smallest value of the next input symbol that has
70*16467b97STreehugger Robot  a transition for state +s+
71*16467b97STreehugger RobotMAX[s]::
72*16467b97STreehugger Robot  The largest value of the next input symbol that has
73*16467b97STreehugger Robot  a transition for state +s+
74*16467b97STreehugger RobotTRANSITION[s]::
75*16467b97STreehugger Robot  A list that defines the next state number based upon
76*16467b97STreehugger Robot  the current input symbol.
77*16467b97STreehugger RobotEOT[s]::
78*16467b97STreehugger Robot  If positive, it specifies a state transition in
79*16467b97STreehugger Robot  situations where a non-matching input symbol does
80*16467b97STreehugger Robot  not indicate failure.
81*16467b97STreehugger RobotSPECIAL[s]::
82*16467b97STreehugger Robot  If positive, it indicates that the prediction
83*16467b97STreehugger Robot  algorithm must defer to a special code block
84*16467b97STreehugger Robot  to determine the next state. The value is used
85*16467b97STreehugger Robot  by the special state code to determine the next
86*16467b97STreehugger Robot  state.
87*16467b97STreehugger RobotACCEPT[s]::
88*16467b97STreehugger Robot  If positive and there are no possible state
89*16467b97STreehugger Robot  transitions, this is the alternative number
90*16467b97STreehugger Robot  that has been predicted
91*16467b97STreehugger RobotEOF[s]::
92*16467b97STreehugger Robot  If positive and the input stream has been exhausted,
93*16467b97STreehugger Robot  this is the alternative number that has been predicted.
94*16467b97STreehugger Robot
95*16467b97STreehugger RobotFor more information on the prediction algorithm, check out the #predict method
96*16467b97STreehugger Robotbelow.
97*16467b97STreehugger Robot
98*16467b97STreehugger Robot=end
99*16467b97STreehugger Robot
100*16467b97STreehugger Robotclass DFA
101*16467b97STreehugger Robot  include Constants
102*16467b97STreehugger Robot  include Error
103*16467b97STreehugger Robot
104*16467b97STreehugger Robot  attr_reader :recognizer, :decision_number, :eot, :eof, :min, :max,
105*16467b97STreehugger Robot              :accept, :special, :transition, :special_block
106*16467b97STreehugger Robot
107*16467b97STreehugger Robot  class << self
108*16467b97STreehugger Robot    attr_reader :decision, :eot, :eof, :min, :max,
109*16467b97STreehugger Robot                :accept, :special, :transition
110*16467b97STreehugger Robot
111*16467b97STreehugger Robot    def unpack( *data )
112*16467b97STreehugger Robot      data.empty? and return [].freeze
113*16467b97STreehugger Robot
114*16467b97STreehugger Robot      n = data.length / 2
115*16467b97STreehugger Robot      size = 0
116*16467b97STreehugger Robot      n.times { |i| size += data[ 2*i ] }
117*16467b97STreehugger Robot      if size > 1024
118*16467b97STreehugger Robot        values = Hash.new( 0 )
119*16467b97STreehugger Robot        data.each_slice( 2 ) do |count, value|
120*16467b97STreehugger Robot          values[ value ] += count
121*16467b97STreehugger Robot        end
122*16467b97STreehugger Robot        default = values.keys.max_by { |v| values[ v ] }
123*16467b97STreehugger Robot
124*16467b97STreehugger Robot        unpacked = Hash.new( default )
125*16467b97STreehugger Robot        position = 0
126*16467b97STreehugger Robot        data.each_slice( 2 ) do |count, value|
127*16467b97STreehugger Robot          unless value == default
128*16467b97STreehugger Robot            count.times { |i| unpacked[ position + i ] = value }
129*16467b97STreehugger Robot          end
130*16467b97STreehugger Robot          position += count
131*16467b97STreehugger Robot        end
132*16467b97STreehugger Robot      else
133*16467b97STreehugger Robot        unpacked = []
134*16467b97STreehugger Robot        data.each_slice( 2 ) do |count, value|
135*16467b97STreehugger Robot          unpacked.fill( value, unpacked.length, count )
136*16467b97STreehugger Robot        end
137*16467b97STreehugger Robot      end
138*16467b97STreehugger Robot
139*16467b97STreehugger Robot      return unpacked
140*16467b97STreehugger Robot    end
141*16467b97STreehugger Robot
142*16467b97STreehugger Robot  end
143*16467b97STreehugger Robot
144*16467b97STreehugger Robot  def initialize( recognizer, decision_number = nil,
145*16467b97STreehugger Robot                 eot = nil, eof = nil, min = nil, max = nil,
146*16467b97STreehugger Robot                 accept = nil, special = nil,
147*16467b97STreehugger Robot                 transition = nil, &special_block )
148*16467b97STreehugger Robot    @recognizer = recognizer
149*16467b97STreehugger Robot    @decision_number = decision_number || self.class.decision
150*16467b97STreehugger Robot    @eot = eot || self.class::EOT #.eot
151*16467b97STreehugger Robot    @eof = eof || self.class::EOF #.eof
152*16467b97STreehugger Robot    @min = min || self.class::MIN #.min
153*16467b97STreehugger Robot    @max = max || self.class::MAX #.max
154*16467b97STreehugger Robot    @accept = accept || self.class::ACCEPT #.accept
155*16467b97STreehugger Robot    @special = special || self.class::SPECIAL #.special
156*16467b97STreehugger Robot    @transition = transition || self.class::TRANSITION #.transition
157*16467b97STreehugger Robot    @special_block = special_block
158*16467b97STreehugger Robot  rescue NameError => e
159*16467b97STreehugger Robot    raise unless e.message =~ /uninitialized constant/
160*16467b97STreehugger Robot    constant = e.name
161*16467b97STreehugger Robot    message = Util.tidy( <<-END )
162*16467b97STreehugger Robot    | No #{ constant } information provided.
163*16467b97STreehugger Robot    | DFA cannot be instantiated without providing state array information.
164*16467b97STreehugger Robot    | When DFAs are generated by ANTLR, this information should already be
165*16467b97STreehugger Robot    | provided in the DFA subclass constants.
166*16467b97STreehugger Robot    END
167*16467b97STreehugger Robot  end
168*16467b97STreehugger Robot
169*16467b97STreehugger Robot  if RUBY_VERSION =~ /^1\.9/
170*16467b97STreehugger Robot
171*16467b97STreehugger Robot    def predict( input )
172*16467b97STreehugger Robot      mark = input.mark
173*16467b97STreehugger Robot      state = 0
174*16467b97STreehugger Robot
175*16467b97STreehugger Robot      50000.times do
176*16467b97STreehugger Robot        special_state = @special[ state ]
177*16467b97STreehugger Robot        if special_state >= 0
178*16467b97STreehugger Robot          state = @special_block.call( special_state )
179*16467b97STreehugger Robot          if state == -1
180*16467b97STreehugger Robot            no_viable_alternative( state, input )
181*16467b97STreehugger Robot            return 0
182*16467b97STreehugger Robot          end
183*16467b97STreehugger Robot          input.consume
184*16467b97STreehugger Robot          next
185*16467b97STreehugger Robot        end
186*16467b97STreehugger Robot        @accept[ state ] >= 1 and return @accept[ state ]
187*16467b97STreehugger Robot
188*16467b97STreehugger Robot        # look for a normal char transition
189*16467b97STreehugger Robot
190*16467b97STreehugger Robot        c = input.peek.ord
191*16467b97STreehugger Robot        # the @min and @max arrays contain the bounds of the character (or token type)
192*16467b97STreehugger Robot        # ranges for the transition decisions
193*16467b97STreehugger Robot        if c.between?( @min[ state ], @max[ state ] )
194*16467b97STreehugger Robot          # c - @min[state] is the position of the character within the range
195*16467b97STreehugger Robot          # so for a range like ?a..?z, a match of ?a would be 0,
196*16467b97STreehugger Robot          # ?c would be 2, and ?z would be 25
197*16467b97STreehugger Robot          next_state = @transition[ state ][ c - @min[ state ] ]
198*16467b97STreehugger Robot          if next_state < 0
199*16467b97STreehugger Robot            if @eot[ state ] >= 0
200*16467b97STreehugger Robot              state = @eot[ state ]
201*16467b97STreehugger Robot              input.consume
202*16467b97STreehugger Robot              next
203*16467b97STreehugger Robot            end
204*16467b97STreehugger Robot            no_viable_alternative( state, input )
205*16467b97STreehugger Robot            return 0
206*16467b97STreehugger Robot          end
207*16467b97STreehugger Robot
208*16467b97STreehugger Robot          state = next_state
209*16467b97STreehugger Robot          input.consume
210*16467b97STreehugger Robot          next
211*16467b97STreehugger Robot        end
212*16467b97STreehugger Robot
213*16467b97STreehugger Robot        if @eot[ state ] >= 0
214*16467b97STreehugger Robot          state = @eot[ state ]
215*16467b97STreehugger Robot          input.consume()
216*16467b97STreehugger Robot          next
217*16467b97STreehugger Robot        end
218*16467b97STreehugger Robot
219*16467b97STreehugger Robot        ( c == EOF && @eof[ state ] >= 0 ) and return @accept[ @eof[ state ] ]
220*16467b97STreehugger Robot        no_viable_alternative( state, input )
221*16467b97STreehugger Robot        return 0
222*16467b97STreehugger Robot      end
223*16467b97STreehugger Robot
224*16467b97STreehugger Robot      ANTLR3.bug!( Util.tidy( <<-END ) )
225*16467b97STreehugger Robot      | DFA BANG!
226*16467b97STreehugger Robot      |   The prediction loop has exceeded a maximum limit of 50000 iterations
227*16467b97STreehugger Robot      | ----
228*16467b97STreehugger Robot      | decision: #@decision_number
229*16467b97STreehugger Robot      | description: #{ description }
230*16467b97STreehugger Robot      END
231*16467b97STreehugger Robot    ensure
232*16467b97STreehugger Robot      input.rewind( mark )
233*16467b97STreehugger Robot    end
234*16467b97STreehugger Robot
235*16467b97STreehugger Robot  else
236*16467b97STreehugger Robot
237*16467b97STreehugger Robot    def predict( input )
238*16467b97STreehugger Robot      mark = input.mark
239*16467b97STreehugger Robot      state = 0
240*16467b97STreehugger Robot
241*16467b97STreehugger Robot      50000.times do
242*16467b97STreehugger Robot        special_state = @special[ state ]
243*16467b97STreehugger Robot        if special_state >= 0
244*16467b97STreehugger Robot          state = @special_block.call( special_state )
245*16467b97STreehugger Robot          if state == -1
246*16467b97STreehugger Robot            no_viable_alternative( state, input )
247*16467b97STreehugger Robot            return 0
248*16467b97STreehugger Robot          end
249*16467b97STreehugger Robot          input.consume
250*16467b97STreehugger Robot          next
251*16467b97STreehugger Robot        end
252*16467b97STreehugger Robot        @accept[ state ] >= 1 and return @accept[ state ]
253*16467b97STreehugger Robot
254*16467b97STreehugger Robot        # look for a normal char transition
255*16467b97STreehugger Robot
256*16467b97STreehugger Robot        c = input.peek
257*16467b97STreehugger Robot        # the @min and @max arrays contain the bounds of the character (or token type)
258*16467b97STreehugger Robot        # ranges for the transition decisions
259*16467b97STreehugger Robot        if c.between?( @min[ state ], @max[ state ] )
260*16467b97STreehugger Robot          # c - @min[state] is the position of the character within the range
261*16467b97STreehugger Robot          # so for a range like ?a..?z, a match of ?a would be 0,
262*16467b97STreehugger Robot          # ?c would be 2, and ?z would be 25
263*16467b97STreehugger Robot          next_state = @transition[ state ][ c - @min[ state ] ]
264*16467b97STreehugger Robot          if next_state < 0
265*16467b97STreehugger Robot            if @eot[ state ] >= 0
266*16467b97STreehugger Robot              state = @eot[ state ]
267*16467b97STreehugger Robot              input.consume
268*16467b97STreehugger Robot              next
269*16467b97STreehugger Robot            end
270*16467b97STreehugger Robot            no_viable_alternative( state, input )
271*16467b97STreehugger Robot            return 0
272*16467b97STreehugger Robot          end
273*16467b97STreehugger Robot
274*16467b97STreehugger Robot          state = next_state
275*16467b97STreehugger Robot          input.consume()
276*16467b97STreehugger Robot          next
277*16467b97STreehugger Robot        end
278*16467b97STreehugger Robot        if @eot[ state ] >= 0
279*16467b97STreehugger Robot          state = @eot[ state ]
280*16467b97STreehugger Robot          input.consume()
281*16467b97STreehugger Robot          next
282*16467b97STreehugger Robot        end
283*16467b97STreehugger Robot        ( c == EOF && @eof[ state ] >= 0 ) and return @accept[ @eof[ state ] ]
284*16467b97STreehugger Robot        no_viable_alternative( state, input )
285*16467b97STreehugger Robot        return 0
286*16467b97STreehugger Robot      end
287*16467b97STreehugger Robot
288*16467b97STreehugger Robot      ANTLR3.bug!( Util.tidy( <<-END ) )
289*16467b97STreehugger Robot      | DFA BANG!
290*16467b97STreehugger Robot      |   The prediction loop has exceeded a maximum limit of 50000 iterations
291*16467b97STreehugger Robot      | ----
292*16467b97STreehugger Robot      | decision: #@decision_number
293*16467b97STreehugger Robot      | description: #{ description }
294*16467b97STreehugger Robot      END
295*16467b97STreehugger Robot    ensure
296*16467b97STreehugger Robot      input.rewind( mark )
297*16467b97STreehugger Robot    end
298*16467b97STreehugger Robot
299*16467b97STreehugger Robot  end
300*16467b97STreehugger Robot
301*16467b97STreehugger Robot  def no_viable_alternative( state, input )
302*16467b97STreehugger Robot    raise( BacktrackingFailed ) if @recognizer.state.backtracking > 0
303*16467b97STreehugger Robot    except = NoViableAlternative.new( description, @decision_number, state, input )
304*16467b97STreehugger Robot    error( except )
305*16467b97STreehugger Robot    raise( except )
306*16467b97STreehugger Robot  end
307*16467b97STreehugger Robot
308*16467b97STreehugger Robot  def error( except )
309*16467b97STreehugger Robot    # overridable debugging hook
310*16467b97STreehugger Robot  end
311*16467b97STreehugger Robot
312*16467b97STreehugger Robot  def special_state_transition( state, input )
313*16467b97STreehugger Robot    return -1
314*16467b97STreehugger Robot  end
315*16467b97STreehugger Robot
316*16467b97STreehugger Robot  def description
317*16467b97STreehugger Robot    return "n/a"
318*16467b97STreehugger Robot  end
319*16467b97STreehugger Robotend
320*16467b97STreehugger Robotend
321