<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Checker Developer Manual</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->

<div id="content">

<h3 style="color:red">This Page Is Under Construction</h3>

<h1>Checker Developer Manual</h1>

<p>The static analyzer engine performs path-sensitive exploration of the program and
relies on a set of checkers to implement the logic for detecting and
constructing specific bug reports. Anyone interested in implementing their own
checker should check out the Building a Checker in 24 Hours talk
(<a href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>,
 <a href="http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4">video</a>)
and refer to this page for additional information on writing a checker. The static analyzer is
part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a>
and the <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
for developer guidelines, and send your questions and proposals to the
<a href="http://lists.llvm.org/mailman/listinfo/cfe-dev">cfe-dev mailing list</a>.
</p>

    <ul>
      <li><a href="#start">Getting Started</a></li>
      <li><a href="#analyzer">Static Analyzer Overview</a>
      <ul>
        <li><a href="#interaction">Interaction with Checkers</a></li>
        <li><a href="#values">Representing Values</a></li>
      </ul></li>
      <li><a href="#idea">Idea for a Checker</a></li>
      <li><a href="#registration">Checker Registration</a></li>
      <li><a href="#events_callbacks">Events, Callbacks, and Checker Class Structure</a></li>
      <li><a href="#extendingstates">Custom Program States</a></li>
      <li><a href="#bugs">Bug Reports</a></li>
      <li><a href="#ast">AST Visitors</a></li>
      <li><a href="#testing">Testing</a></li>
      <li><a href="#commands">Useful Commands/Debugging Hints</a></li>
      <li><a href="#additioninformation">Additional Sources of Information</a></li>
      <li><a href="#links">Useful Links</a></li>
    </ul>

<h2 id=start>Getting Started</h2>
  <ul>
    <li>To check out the source code and build the project, follow steps 1-4 of
    the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
  page.</li>

    <li>The analyzer source code is located under the Clang source tree:
    <br><tt>
    $ <b>cd llvm/tools/clang</b>
    </tt>
    <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
     <tt>test/Analysis</tt>.</li>

    <li>The analyzer regression tests can be executed from Clang's build
    directory:
    <br><tt>
    $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
    </tt></li>

    <li>Analyze a file with the specified checker:
    <br><tt>
    $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
    </tt></li>

    <li>List the available checkers:
    <br><tt>
    $ <b>clang -cc1 -analyzer-checker-help</b>
    </tt></li>

    <li>See the analyzer help for different output formats, fine tuning, and
    debug options:
    <br><tt>
    $ <b>clang -cc1 -help | grep "analyzer"</b>
    </tt></li>

  </ul>

<h2 id=analyzer>Static Analyzer Overview</h2>
  The analyzer core performs symbolic execution of the given program. All the
  input values are represented with symbolic values; further, the engine deduces
  the values of all the expressions in the program based on the input symbols
  and the path. The execution is path sensitive, and the engine attempts to
  explore every possible path through the program. The explored execution traces
  are represented with an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
  Each node of the graph is an
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
  represents the corresponding location in the program (or the CFG).
  <tt>ProgramPoint</tt> is also used to record additional information on
  when/how the state was added. For example, a <tt>ProgramPoint</tt> of kind
  <tt>PostPurgeDeadSymbolsKind</tt> means that the state is the result of
  purging dead symbols, the analyzer's equivalent of garbage collection.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
  represents the abstract state of the program. It consists of:
  <ul>
    <li><tt>Environment</tt> - a mapping from source code expressions to symbolic
    values
    <li><tt>Store</tt> - a mapping from memory locations to symbolic values
    <li><tt>GenericDataMap</tt> - constraints on symbolic values, plus any
    checker-defined state
  </ul>

  <h3 id=interaction>Interaction with Checkers</h3>
  Checkers are not merely passive receivers of the analyzer core changes - they
  actively participate in the <tt>ProgramState</tt> construction through the
  <tt>GenericDataMap</tt> which can be used to store the checker-defined part
  of the state. Each time the analyzer engine explores a new statement, it
  notifies each checker registered to listen for that statement, giving it an
  opportunity to either report a bug or modify the state. (As a rule of thumb,
  the checker itself should be stateless.) The checkers are called one after another
  in the predefined order; thus, calling all the checkers adds a chain to the
  <tt>ExplodedGraph</tt>.
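<p>The chaining described above can be pictured with a small, self-contained toy
model. Everything in it (<tt>State</tt>, <tt>Node</tt>, <tt>Callback</tt>,
<tt>runCheckers</tt>, <tt>chainLength</tt>) is a hypothetical stand-in invented
for illustration and is not part of the analyzer's API; it only sketches the idea
that each checker receives the state left by the previous one and contributes one
node to the chain.</p>

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; NOT the analyzer's real classes.
using State = std::map<std::string, int>; // checker-defined data only

struct Node {
  State S;                          // state after this step
  std::shared_ptr<const Node> Pred; // predecessor in the chain
};

// A callback inspects the incoming state and returns the state it wants next.
using Callback = std::function<State(const State &)>;

// Runs the callbacks in registration order. Each callback sees the state
// produced by the previous one and adds exactly one node to the chain.
inline std::shared_ptr<const Node>
runCheckers(std::shared_ptr<const Node> Start,
            const std::vector<Callback> &Checkers) {
  assert(Start && "a starting node is required");
  std::shared_ptr<const Node> Current = Start;
  for (const Callback &Check : Checkers)
    Current = std::make_shared<Node>(Node{Check(Current->S), Current});
  return Current;
}

// Counts the nodes from N back to the start of the chain.
inline int chainLength(std::shared_ptr<const Node> N) {
  int Length = 0;
  for (; N; N = N->Pred)
    ++Length;
  return Length;
}
```

<p>In this sketch the callbacks keep no data of their own; whatever they want to
remember travels in the state, which mirrors the rule of thumb that a checker
should be stateless.</p>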

  <h3 id=values>Representing Values</h3>
  During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
  objects are used to represent the semantic evaluation of expressions.
  They can represent things like concrete
  integers, symbolic values, or memory locations (which are memory regions).
  They are a discriminated union of "values", symbolic and otherwise.
  If a value isn't symbolic, usually that means there is no symbolic
  information to track. For example, if the value was an integer, such as
  <tt>42</tt>, it would be a <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>,
  and the checker doesn't usually need to track any state with the concrete
  number. In some cases, an <tt>SVal</tt> is not a symbol even though it really
  should be a symbolic value. This happens when the analyzer cannot reason about
  something (yet). An example is floating point numbers. In such cases, the
  <tt>SVal</tt> will evaluate to <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>.
  This represents a case that is outside the realm of the analyzer's reasoning
  capabilities. <tt>SVals</tt> are value objects and their values can be viewed
  using the <tt>.dump()</tt> method. Often they wrap persistent objects such as
  symbols or regions.
  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
  is meant to represent an abstract, but named, symbolic value. A symbol represents
  an actual (immutable) value. We might not know what its specific value is, but
  we can associate constraints with that value as we analyze a path. For
  example, we might record that the value of a symbol is greater than
  <tt>0</tt>, etc.

  <p>
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
  It is used to provide a lexicon of how to describe abstract memory. Regions can
  layer on top of other regions, providing a layered approach to representing memory.
  For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
  but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could
  be used to represent the memory associated with a specific field of that object.
  So how do we represent symbolic memory regions? That's what
  <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
  symbol is unique and has a unique name, that symbol names the region.

  <P>
  Let's see how the analyzer processes the expressions in the following example:
  <p>
  <pre class="code_example">
  int foo(int x) {
     int y = x * 2;
     int z = x;
     ...
  }
  </pre>
  <p>
Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,
we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>; in
this case, it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
which references the value <b>currently bound</b> to <tt>x</tt>. That value is
symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,
and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,
and create a new <tt>SVal</tt> that represents their multiplication (which in
this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
to the <tt>MemRegion</tt> in the symbolic store.
<br>
The second line is similar. When we evaluate <tt>x</tt> again, we do the same
dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note that two <tt>SVals</tt>
might reference the same underlying value.

<p>
To summarize, MemRegions are unique names for blocks of memory. Symbols are
unique names for abstract symbolic values. Some MemRegions represent abstract
symbolic chunks of memory, and thus are also based on symbols. SVals are just
references to values, and can reference either MemRegions, Symbols, or concrete
values (e.g., the number 1).
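<p>The walk-through above can be replayed with a small toy model. The types
below (<tt>ConcreteInt</tt>, <tt>Symbol</tt>, <tt>Region</tt>, <tt>SVal</tt>,
<tt>Store</tt>, <tt>ToyEngine</tt>) are simplified, hypothetical stand-ins that
merely borrow the analyzer's vocabulary; they are not the real classes. The
sketch assumes that any product involving a symbol is given a fresh symbol,
which is cruder than what the analyzer does.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Toy stand-ins for illustration only; NOT the analyzer's real classes.
struct ConcreteInt { int Value; };   // a known number, e.g. 2
struct Symbol { int Id; };           // an unknown but named value, e.g. $0
struct Region { std::string Name; }; // a named block of memory, e.g. "x"

// A value is a discriminated union over the three kinds above.
using SVal = std::variant<ConcreteInt, Symbol, Region>;

// The store binds memory regions (keyed by name here) to values.
using Store = std::map<std::string, SVal>;

struct ToyEngine {
  Store Bindings;
  int NextSymbol = 0;

  // Hands out a fresh symbol: $0, $1, ...
  Symbol freshSymbol() { return Symbol{NextSymbol++}; }

  // Lvalue-to-rvalue conversion: look up what is currently bound to R.
  SVal load(const Region &R) const {
    assert(Bindings.count(R.Name) && "region has no binding");
    return Bindings.at(R.Name);
  }

  // Assignment: bind V to the region R.
  void bind(const Region &R, SVal V) { Bindings[R.Name] = V; }

  // Two concrete ints are folded; anything else yields a new symbol
  // standing for the product.
  SVal multiply(const SVal &L, const SVal &R) {
    const auto *A = std::get_if<ConcreteInt>(&L);
    const auto *B = std::get_if<ConcreteInt>(&R);
    if (A && B)
      return ConcreteInt{A->Value * B->Value};
    return freshSymbol();
  }
};

// Replays `int y = x * 2; int z = x;` for a parameter x of unknown value.
inline ToyEngine evaluateFoo() {
  ToyEngine E;
  Region X{"x"}, Y{"y"}, Z{"z"};
  E.bind(X, E.freshSymbol());                       // x holds $0 on entry
  E.bind(Y, E.multiply(E.load(X), ConcreteInt{2})); // y holds $1
  E.bind(Z, E.load(X));                             // z holds $0 again
  return E;
}
```

<p>After <tt>evaluateFoo()</tt> runs, the bindings for <tt>x</tt> and
<tt>z</tt> hold the same symbol, which is the toy counterpart of two
<tt>SVals</tt> referencing the same underlying value.</p>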

  <!--
  TODO: Add a picture.
  <br>
  Symbols<br>
  FunctionalObjects are used throughout.
  -->

<h2 id=idea>Idea for a Checker</h2>
  Here are several questions which you should consider when evaluating your
  checker idea:
  <ul>
    <li>Can the check be effectively implemented without path-sensitive
    analysis? See <a href="#ast">AST Visitors</a>.</li>

    <li>How high is the false positive rate going to be? Looking at the occurrences
    of the issue you want to write a checker for in the existing code bases might
    give you some ideas. </li>

    <li>How will the current limitations of the analysis affect the false alarm
    rate? Currently, the analyzer only reasons about one procedure at a time (no
    inter-procedural analysis). Also, it uses a simple range tracking based
    solver to model symbolic execution.</li>

    <li>Consult the <a
    href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&amp;bug_status=NEW&amp;bug_status=REOPENED&amp;version=trunk&amp;component=Static%20Analyzer&amp;product=clang">Bugzilla database</a>
    to get some ideas for new checkers and consider starting with improving/fixing
    bugs in the existing checkers.</li>
  </ul>

<p>Once an idea for a checker has been chosen, there are two key decisions that
need to be made:
  <ul>
    <li> Which events the checker should be tracking. This is discussed in more
    detail in the section <a href="#events_callbacks">Events, Callbacks, and
    Checker Class Structure</a>.
    <li> What checker-specific data needs to be stored as part of the program
    state (if any). This should be minimized as much as possible. More detail about
    implementing custom program state is given in section <a
    href="#extendingstates">Custom Program States</a>.
  </ul>


<h2 id=registration>Checker Registration</h2>
  All checker implementation files are located in the
  <tt>clang/lib/StaticAnalyzer/Checkers</tt> folder. The steps below describe
  how the checker <tt>SimpleStreamChecker</tt>, which checks for misuses of
  stream APIs, was registered with the analyzer.
  Similar steps should be followed for a new checker.
<ol>
  <li>A new checker implementation file, <tt>SimpleStreamChecker.cpp</tt>, was
  created in the directory <tt>lib/StaticAnalyzer/Checkers</tt>.
  <li>The following registration code was added to the implementation file:
<pre class="code_example">
void ento::registerSimpleStreamChecker(CheckerManager &amp;mgr) {
  mgr.registerChecker&lt;SimpleStreamChecker&gt;();
}
</pre>
<li>A package was selected for the checker and the checker was defined in the
table of checkers at <tt>lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Since all
checkers should first be developed as "alpha", and the SimpleStreamChecker
performs UNIX API checks, the correct package is "alpha.unix", and the following
was added to the corresponding <tt>UnixAlpha</tt> section of <tt>Checkers.td</tt>:
<pre class="code_example">
let ParentPackage = UnixAlpha in {
...
def SimpleStreamChecker : Checker&lt;"SimpleStream"&gt;,
  HelpText&lt;"Check for misuses of stream APIs"&gt;,
  DescFile&lt;"SimpleStreamChecker.cpp"&gt;;
...
} // end "alpha.unix"
</pre>

<li>The source code file was made visible to CMake by adding it to
<tt>lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.

</ol>

After adding a new checker to the analyzer, one can verify that the new checker
was successfully added by seeing if it appears in the list of available checkers:
<br> <tt>$ <b>clang -cc1 -analyzer-checker-help</b></tt>

<h2 id=events_callbacks>Events, Callbacks, and Checker Class Structure</h2>

<p> All checkers inherit from the <tt><a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1Checker.html">
Checker</a></tt> template class; the template parameter(s) describe the type of
events that the checker is interested in processing. The various types of events
that are available are described in the file <a
href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
CheckerDocumentation.cpp</a>.

<p> For each event type requested, a corresponding callback function must be
defined in the checker class (<a
href="http://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
CheckerDocumentation.cpp</a> shows the
correct function name and signature for each event type).

<p> As an example, consider <tt>SimpleStreamChecker</tt>. This checker needs to
take action at the following times:

<ul>
<li>Before making a call to a function, check if the function is <tt>fclose</tt>.
If so, check the parameter being passed.
<li>After making a function call, check if the function is <tt>fopen</tt>. If
so, process the return value.
<li>When values go out of scope, check whether they are still-open file
descriptors, and report a bug if so. In addition, remove any information about
them from the program state in order to keep the state as small as possible.
<li>When file pointers "escape" (are used in a way that the analyzer can no longer
track them), mark them as such. This prevents false positives in the cases where
the analyzer cannot be sure whether the file was closed or not.
</ul>

<p>The events that will be used for each of these actions are, respectively, <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PreCall.html">PreCall</a>,
<a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PostCall.html">PostCall</a>,
<a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1DeadSymbols.html">DeadSymbols</a>,
and <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PointerEscape.html">PointerEscape</a>.
The high-level structure of the checker's class is thus:

<pre class="code_example">
class SimpleStreamChecker : public Checker&lt;check::PreCall,
                                           check::PostCall,
                                           check::DeadSymbols,
                                           check::PointerEscape&gt; {
public:

  void checkPreCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;

  void checkPostCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;

  void checkDeadSymbols(SymbolReaper &amp;SR, CheckerContext &amp;C) const;

  ProgramStateRef checkPointerEscape(ProgramStateRef State,
                                     const InvalidatedSymbols &amp;Escaped,
                                     const CallEvent *Call,
                                     PointerEscapeKind Kind) const;
};
</pre>
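<p>To make the link between template parameters and callbacks more tangible, here
is a self-contained toy model of the idea. It is only a sketch: <tt>ToyManager</tt>,
<tt>ToyChecker</tt>, the <tt>toy_check</tt> tags, <tt>ToyStreamChecker</tt>, and
<tt>evalCall</tt> are hypothetical names made up for this illustration, callees
are plain strings, and the mechanism is not claimed to match how the real
<tt>Checker</tt> template is implemented. It requires C++17.</p>

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; NOT the analyzer's real classes.
using Log = std::vector<std::string>;
using Handler = std::function<void(const std::string &Callee, Log &Out)>;

struct ToyManager {
  std::vector<Handler> PreCall;  // run before a call is evaluated
  std::vector<Handler> PostCall; // run after a call is evaluated
};

namespace toy_check {
// Each event tag knows how to hook one callback of a checker into the manager.
struct PreCall {
  template <typename CheckerT>
  static void hook(const CheckerT &C, ToyManager &M) {
    M.PreCall.push_back(
        [&C](const std::string &F, Log &Out) { C.checkPreCall(F, Out); });
  }
};
struct PostCall {
  template <typename CheckerT>
  static void hook(const CheckerT &C, ToyManager &M) {
    M.PostCall.push_back(
        [&C](const std::string &F, Log &Out) { C.checkPostCall(F, Out); });
  }
};
} // namespace toy_check

// The template arguments list the events a checker subscribes to.
template <typename... Events> struct ToyChecker {
  template <typename CheckerT>
  static void registerWith(const CheckerT &C, ToyManager &M) {
    (Events::hook(C, M), ...); // one hook per requested event
  }
};

class ToyStreamChecker
    : public ToyChecker<toy_check::PreCall, toy_check::PostCall> {
public:
  void checkPreCall(const std::string &Callee, Log &Out) const {
    if (Callee == "fclose")
      Out.push_back("pre fclose");
  }
  void checkPostCall(const std::string &Callee, Log &Out) const {
    if (Callee == "fopen")
      Out.push_back("post fopen");
  }
};

// Simulates the engine evaluating one call: pre-call handlers, then post-call.
inline void evalCall(const ToyManager &M, const std::string &Callee, Log &Out) {
  for (const Handler &H : M.PreCall)
    H(Callee, Out);
  for (const Handler &H : M.PostCall)
    H(Callee, Out);
}
```

<p>Listing an event tag without defining the matching callback fails to compile
in this sketch, which is the toy analogue of the requirement that every requested
event type has a corresponding callback in the checker class. The registered
handlers hold a reference to the checker, so the checker object must outlive the
manager.</p>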

<h2 id=extendingstates>Custom Program States</h2>

<p> Checkers often need to keep track of information specific to the checks they
perform. However, since checkers have no guarantee about the order in which the
program will be explored, or even that all possible paths will be explored, this
state information cannot be kept within individual checkers. Therefore, if
checkers need to store custom information, they need to add new categories of
data to the <tt>ProgramState</tt>. The preferred way to do so is to use one of
several macros designed for this purpose. They are:

<ul>
<li><a
href="http://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#ae4cddb54383cd702a045d7c61b009147">REGISTER_TRAIT_WITH_PROGRAMSTATE</a>:
Used when the state information is a single value. The methods available for
state types declared with this macro are <tt>get</tt>, <tt>set</tt>, and
<tt>remove</tt>.
<li><a
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#aa27656fa0ce65b0d9ba12eb3c02e8be9">REGISTER_LIST_WITH_PROGRAMSTATE</a>:
Used when the state information is a list of values. The methods available for
state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
<tt>remove</tt>, and <tt>contains</tt>.
<li><a
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#ad90f9387b94b344eaaf499afec05f4d1">REGISTER_SET_WITH_PROGRAMSTATE</a>:
Used when the state information is a set of values. The methods available for
state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
<tt>remove</tt>, and <tt>contains</tt>.
<li><a
href="http://clang.llvm.org/doxygen/CheckerContext_8h.html#a6d1893bb8c18543337b6c363c1319fcf">REGISTER_MAP_WITH_PROGRAMSTATE</a>:
Used when the state information is a map from a key to a value. The methods
available for state types declared with this macro are <tt>add</tt>,
<tt>set</tt>, <tt>get</tt>, <tt>remove</tt>, and <tt>contains</tt>.
</ul>

<p>All of these macros take as parameters the name to be used for the custom
category of state information and the data type(s) to be used for storage. The
data type(s) specified will become the parameter type and/or return type of the
methods that manipulate the new category of state information. Each of these
methods is templated with the name of the custom data type.

<p>For example, a common case is the need to track data associated with a
symbolic expression; a map type is the most logical way to implement this. The
key for this map will be a pointer to a symbolic expression
(<tt>SymbolRef</tt>). If the data type to be associated with the symbolic
expression is an integer, then the custom category of state information would be
declared as

<pre class="code_example">
REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int)
</pre>

The data would be accessed with the function <tt>get</tt>, which for a map
returns a pointer to the stored value (null if the key has no entry):

<pre class="code_example">
ProgramStateRef state;
SymbolRef Sym;
...
const int *currentValue = state-&gt;get&lt;ExampleDataType&gt;(Sym);
</pre>

and set with the function

<pre class="code_example">
ProgramStateRef state;
SymbolRef Sym;
int newValue;
...
ProgramStateRef newState = state-&gt;set&lt;ExampleDataType&gt;(Sym, newValue);
</pre>

<p>In addition, the macros define a data type used for storing the data of the
new data category; the name of this type is the name of the data category with
"Ty" appended. For <tt>REGISTER_TRAIT_WITH_PROGRAMSTATE</tt>, this will simply
be the data type passed to the macro; for the other three macros, this will be a
specialized version of the <a
href="http://llvm.org/doxygen/classllvm_1_1ImmutableList.html">llvm::ImmutableList</a>,
<a
href="http://llvm.org/doxygen/classllvm_1_1ImmutableSet.html">llvm::ImmutableSet</a>,
or <a
href="http://llvm.org/doxygen/classllvm_1_1ImmutableMap.html">llvm::ImmutableMap</a>
templated class. For the <tt>ExampleDataType</tt> example above, the type
created would be equivalent to writing the declaration:

<pre class="code_example">
typedef llvm::ImmutableMap&lt;SymbolRef, int&gt; ExampleDataTypeTy;
</pre>

<p>These macros will cover a majority of use cases; however, they still have a
few limitations. They cannot be used inside namespaces (since they expand to
contain top-level namespace references), and the data types that they define
cannot be referenced from more than one file.

<p>Note that <tt>ProgramStates</tt> are immutable; instead of modifying an existing
one, functions that modify the state will return a copy of the previous state
with the change applied. This updated state must then be provided to the
analyzer core by calling the <tt>CheckerContext::addTransition</tt> function.
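<p>The consequence of immutability is easy to miss, so here is a self-contained
toy model of it. <tt>ToyState</tt>, <tt>ToyStateRef</tt>, <tt>ToyContext</tt>,
and <tt>SymbolId</tt> are hypothetical stand-ins written for this illustration,
not analyzer classes. The sketch copies the whole map on every change for
simplicity, which is an assumption of the toy and says nothing about how the
<tt>llvm::Immutable*</tt> containers store their data.</p>

```cpp
#include <cassert>
#include <map>
#include <memory>

// Toy stand-ins for illustration only; NOT the analyzer's real classes.
using SymbolId = int;

class ToyState;
using ToyStateRef = std::shared_ptr<const ToyState>;

// An immutable symbol-to-int map, loosely modelled on what a map declared
// with REGISTER_MAP_WITH_PROGRAMSTATE offers through get and set.
class ToyState {
  std::map<SymbolId, int> Data;

public:
  // Returns a pointer to the value tracked for Sym, or nullptr if none.
  const int *get(SymbolId Sym) const {
    auto It = Data.find(Sym);
    return It == Data.end() ? nullptr : &It->second;
  }

  // Returns a NEW state with Sym mapped to Value; *this is left untouched.
  ToyStateRef set(SymbolId Sym, int Value) const {
    auto Copy = std::make_shared<ToyState>(*this);
    Copy->Data[Sym] = Value;
    return Copy;
  }
};

// Stand-in for the checker context: a new state only takes effect once it
// has been handed back through addTransition.
class ToyContext {
  ToyStateRef Current;

public:
  explicit ToyContext(ToyStateRef Initial) : Current(Initial) {}
  ToyStateRef getState() const { return Current; }
  void addTransition(ToyStateRef Next) {
    assert(Next && "cannot transition to a null state");
    Current = Next;
  }
};
```

<p>In this model, calling <tt>set</tt> and discarding the result changes
nothing; the same mistake in a real checker silently loses the update.</p>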

<h2 id=bugs>Bug Reports</h2>


<p> When a checker detects a mistake in the analyzed code, it needs a way to
report it to the analyzer core so that it can be displayed. The two classes used
to construct this report are <tt><a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugType.html">BugType</a></tt>
and <tt><a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugReport.html">
BugReport</a></tt>.

<p>
<tt>BugType</tt>, as the name would suggest, represents a type of bug. The
constructor for <tt>BugType</tt> takes two parameters: the name of the bug
type, and the name of the category of the bug. These are used, for example, in
the summary page generated by the scan-build tool.

<P>
  The <tt>BugReport</tt> class represents a specific occurrence of a bug. In
  the most common case, three parameters are used to form a <tt>BugReport</tt>:
<ol>
<li>The type of bug, specified as an instance of the <tt>BugType</tt> class.
<li>A short descriptive string. This is placed at the location of the bug in
the detailed line-by-line output generated by scan-build.
<li>The context in which the bug occurred. This includes both the location of
the bug in the program and the program's state when the location is reached. These are
both encapsulated in an <tt>ExplodedNode</tt>.
</ol>

470*67e74705SXin Li
<p>In order to obtain the correct <tt>ExplodedNode</tt>, a decision must be made
as to whether or not analysis can continue along the current path. This decision
is based on whether the detected bug is one that would prevent the program under
analysis from continuing. For example, leaking of a resource should not stop
analysis, as the program can continue to run after the leak. Dereferencing a
null pointer, on the other hand, should stop analysis, as there is no way for
the program to meaningfully continue after such an error.

<p>If analysis can continue, then the most recent <tt>ExplodedNode</tt>
generated by the checker can be passed to the <tt>BugReport</tt> constructor
without additional modification. This <tt>ExplodedNode</tt> will be the one
returned by the most recent call to <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition</a>.
If no transition has been performed during the current callback, the checker should call <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition</a>
and use the returned node for bug reporting.

<p>If analysis cannot continue, then the current state should be transitioned
into a so-called <i>sink node</i>, a node from which no further analysis will be
performed. This is done by calling the <a
href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#adeea33a5a2bed190210c4a2bb807a6f0">CheckerContext::generateSink</a>
function, which behaves like <tt>addTransition</tt> but marks the resulting node
as a sink node. Like <tt>addTransition</tt>, it returns an <tt>ExplodedNode</tt>
with the updated state, which can then be passed to the <tt>BugReport</tt>
constructor.

<p>
After a <tt>BugReport</tt> is created, it should be passed to the analyzer core
by calling <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#ae7738af2cbfd1d713edec33d3203dff5">CheckerContext::emitReport</a>.
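<p>The reporting flow described above can be modeled with a small stand-alone
program. The sketch below is not the Clang API: <tt>BugType</tt>,
<tt>BugReport</tt>, <tt>ExplodedNode</tt> and <tt>CheckerContext</tt> are
simplified stand-ins that only borrow the names used in the text, and
<tt>reportBug</tt>, <tt>pathEnded</tt> and <tt>reports</tt> are hypothetical
helpers introduced for illustration. It is meant only to show how the choice
between continuing the path and generating a sink determines the node attached
to the report.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for the analyzer classes; NOT the real Clang API.
struct BugType {
  std::string Name;     // name of the bug type
  std::string Category; // name of the bug's category
};

// A node pairs a program location with a flag saying whether the path ends.
struct ExplodedNode {
  int Location; // stand-in for a program point plus program state
  bool IsSink;  // true if no further analysis is performed from this node
};

// One specific occurrence of a bug.
struct BugReport {
  const BugType *Type;     // the kind of bug
  std::string Description; // short descriptive string
  ExplodedNode ErrorNode;  // context in which the bug occurred
};

class CheckerContext {
public:
  explicit CheckerContext(int Location) : Location(Location) {}

  // Analysis can continue: return an ordinary (non-sink) node.
  ExplodedNode addTransition() { return ExplodedNode{Location, false}; }

  // Analysis cannot continue: return a sink node and end the path.
  ExplodedNode generateSink() {
    PathEnded = true;
    return ExplodedNode{Location, true};
  }

  // Hand the finished report over to the (modeled) analyzer core.
  void emitReport(const BugReport &R) { Reports.push_back(R); }

  // Hypothetical accessors used only to inspect the model.
  bool pathEnded() const { return PathEnded; }
  const std::vector<BugReport> &reports() const { return Reports; }

private:
  int Location;
  bool PathEnded = false;
  std::vector<BugReport> Reports;
};

// Hypothetical helper: pick the node according to whether the bug is fatal
// for the analyzed program, then build and emit the report.
void reportBug(CheckerContext &C, const BugType &BT, const std::string &Msg,
               bool Fatal) {
  ExplodedNode N = Fatal ? C.generateSink() : C.addTransition();
  C.emitReport(BugReport{&BT, Msg, N});
}
```

<p>In this model, calling <tt>reportBug</tt> with <tt>Fatal</tt> set to
<tt>false</tt> (a resource leak, say) leaves the path open, while passing
<tt>true</tt> (a null pointer dereference) attaches a sink node to the report
and ends the path.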

<h2 id=ast>AST Visitors</h2>
  Some checks might not require path-sensitivity to be effective; a simple AST walk
  might be sufficient. If that is the case, consider implementing a Clang
  compiler warning. On the other hand, a check might not be acceptable as a compiler
  warning, for example because of a relatively high false positive rate. In this
  situation, the AST callbacks <tt><b>checkASTDecl</b></tt> and
  <tt><b>checkASTCodeBody</b></tt> are your best friends.

<h2 id=testing>Testing</h2>
  Every patch should be well tested with Clang regression tests. The checker tests
  live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,
  execute the following from the <tt>clang</tt> build directory:
    <pre class="code">
    $ <b>TESTDIRS=Analysis make test</b>
    </pre>
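<p>A regression test in that folder is typically a source file that starts with
a <tt>RUN</tt> line and marks each expected diagnostic with an
<tt>expected-warning</tt> comment. The file below is a hypothetical sketch, not
a test taken from the repository. It assumes the <tt>%clang_cc1</tt>
substitution provided by the test runner and the <tt>-verify</tt> option, and
the exact wording of the diagnostic may differ between analyzer versions.

```cpp
// RUN: %clang_cc1 -analyze -analyzer-checker=core -verify %s

// Hypothetical test input: returns the pointee when the pointer is non-null.
int deref(int *p) {
  if (p)
    return *p; // no warning: p is known to be non-null on this path
  return *p;   // expected-warning{{Dereference of null pointer}}
}
```

<p>With <tt>-verify</tt>, the test is expected to fail both when a marked
warning is missing and when an unmarked diagnostic appears, so the file
documents exactly which lines the checker should flag.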

<h2 id=commands>Useful Commands/Debugging Hints</h2>
<ul>
<li>
While investigating a checker-related issue, instruct the analyzer to execute
only a single checker:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
</tt>
</li>
<li>
To dump the AST:
<br><tt>
$ <b>clang -cc1 -ast-dump test.c</b>
</tt>
</li>
<li>
To view or dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checker:
<br><tt>
$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
</tt>
</li>
<li>
To see all available debug checkers:
<br><tt>
$ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
</tt>
</li>
<li>
To see which function is failing while processing a large file, use the
<tt>-analyzer-display-progress</tt> option.
</li>
<li>
While debugging, execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt>
instead of <tt>clang --analyze</tt>, as the latter would call the compiler
in a separate process.
</li>
<li>
To view the <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while
debugging, go to a frame that has a <tt>clang::ento::ExprEngine</tt> object and
execute:
<br><tt>
(gdb) <b>p ViewGraph(0)</b>
</tt>
</li>
<li>
To see the <tt>ProgramState</tt> while debugging, use the following command:
<br><tt>
(gdb) <b>p State->dump()</b>
</tt>
</li>
<li>
To see a <tt>clang::Expr</tt> while debugging, use the following command. If you
pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the
source code.
<br><tt>
(gdb) <b>p E->dump()</b>
</tt>
</li>
<li>
To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs to:
<br><tt>
(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
<br>
(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b>
</tt>
</li>
</ul>

<h2 id=additioninformation>Additional Sources of Information</h2>

Here are some additional resources that are useful when working on the Clang
Static Analyzer:

<ul>
<li> <a href="http://clang.llvm.org/doxygen">Clang doxygen</a>. Contains
up-to-date documentation about the APIs available in Clang. Relevant entries
have been linked throughout this page. Also of use is the
<a href="http://llvm.org/doxygen">LLVM doxygen</a>, when dealing with classes
from LLVM.
<li> The <a href="http://lists.llvm.org/mailman/listinfo/cfe-dev">cfe-dev
mailing list</a>. This is the primary mailing list used for
discussion of Clang development (including static code analysis). The
<a href="http://lists.llvm.org/pipermail/cfe-dev">archive</a> also contains
a lot of information.
<li> The "Building a Checker in 24 hours" presentation given at the <a
href="http://llvm.org/devmtg/2012-11">November 2012 LLVM Developer's
meeting</a>. Describes the construction of SimpleStreamChecker. <a
href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">Slides</a>
and <a
href="http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4">video</a>
are available.
</ul>

<h2 id=links>Useful Links</h2>
<ul>
<li>The list of <a href="implicit_checks.html">Implicit Checkers</a></li>
</ul>

</div>
</div>
</body>
</html>