pycparser is a parser for the C language, written in pure Python. It is a module designed to be easily integrated into applications that need to parse C source code.
Anything that needs C code to be parsed. The following are some uses for pycparser, taken from real user reports:
- C code obfuscator
- Front-end for various specialized C compilers
- Static code checker
- Automatic unit-test discovery
- Adding specialized extensions to the C language
One of the most popular uses of pycparser is in the cffi library, which uses it to parse the declarations of C functions and types in order to auto-generate FFIs.
pycparser is unique in the sense that it's written in pure Python - a very high level language that's easy to experiment with and tweak. To people familiar with Lex and Yacc, pycparser's code will be simple to understand. It also has no external dependencies (except for a Python interpreter), making it very simple to install and deploy.
pycparser aims to support the full C99 language (according to the standard ISO/IEC 9899). Some features from C11 are also supported, and patches to support more are welcome.
pycparser supports very few GCC extensions, but it's fairly easy to set things up so that it parses code with a lot of GCC-isms successfully. See the FAQ for more details.
pycparser very closely follows the C grammar provided in Annex A of the C99 standard (ISO/IEC 9899).
For reporting problems with pycparser or submitting feature requests, please open an issue, or submit a pull request.
- pycparser was tested with Python 3.8+ on Linux, macOS and Windows.
- pycparser has no external dependencies. The only non-stdlib library it
uses is PLY, which is bundled in
pycparser/ply
. The current PLY version is 3.10, retrieved from http://www.dabeaz.com/ply/
Note that pycparser (and PLY) uses docstrings for grammar specifications.
Python installations that strip docstrings (such as when using the Python
-OO
option) will fail to instantiate and use pycparser. You can try to
work around this problem by making sure the PLY parsing tables are pre-generated
in normal mode; this isn't an officially supported/tested mode of operation,
though.
The recommended way to install pycparser is with pip
:
> pip install pycparser
In order to be compilable, C code must be preprocessed by the C preprocessor -
cpp
. cpp
handles preprocessing directives like #include
and
#define
, removes comments, and performs other minor tasks that prepare the C
code for compilation.
For all but the most trivial snippets of C code pycparser, like a C
compiler, must receive preprocessed C code in order to function correctly. If
you import the top-level parse_file
function from the pycparser package,
it will interact with cpp
for you, as long as it's in your PATH, or you
provide a path to it.
Note also that you can use gcc -E
or clang -E
instead of cpp
. See
the using_gcc_E_libc.py
example for more details. Windows users can download
and install a binary build of Clang for Windows from this website.
C code almost always #include
s various header files from the standard C
library, like stdio.h
. While (with some effort) pycparser can be made to
parse the standard headers from any C compiler, it's much simpler to use the
provided "fake" standard includes for C11 in utils/fake_libc_include
. These
are standard C header files that contain only the bare necessities to allow
valid parsing of the files that use them. As a bonus, since they're minimal, it
can significantly improve the performance of parsing large C files.
The key point to understand here is that pycparser doesn't really care about the semantics of types. It only needs to know whether some token encountered in the source is a previously defined type. This is essential in order to be able to parse C correctly.
See this blog post for more details.
Note that the fake headers are not included in the pip
package nor installed
via setup.py
(#224).
Take a look at the examples
directory of the distribution for a few examples
of using pycparser. These should be enough to get you started. Please note
that most realistic C code samples would require running the C preprocessor
before passing the code to pycparser; see the previous sections for more
details.
The public interface of pycparser is well documented with comments in
pycparser/c_parser.py
. For a detailed overview of the various AST nodes
created by the parser, see pycparser/_c_ast.cfg
.
There's also a FAQ available here. In any case, you can always drop me an email for help.
There are a few points to keep in mind when modifying pycparser:
- The code for pycparser's AST nodes is automatically generated from a
configuration file -
_c_ast.cfg
, by_ast_gen.py
. If you modify the AST configuration, make sure to re-generate the code. This can be done by running the_build_tables.py
script from thepycparser
directory. - Make sure you understand the optimized mode of pycparser - for that you
must read the docstring in the constructor of the
CParser
class. For development you should create the parser without optimizations, so that it will regenerate the Yacc and Lex tables when you change the grammar.
Once you unzip the pycparser
package, you'll see the following files and
directories:
- README.rst:
- This README file.
- LICENSE:
- The pycparser license
- setup.py:
- Installation script
- examples/:
- A directory with some examples of using pycparser
- pycparser/:
- The pycparser module source code.
- tests/:
- Unit tests.
- utils/fake_libc_include:
- Minimal standard C library include files that should allow to parse any C code.
Note that these headers now include C11 code, so they may not work when the
preprocessor is configured to an earlier C standard (like
-std=c99
). - utils/internal/:
- Internal utilities for my own use. You probably don't need them.
Some people have contributed to pycparser by opening issues on bugs they've found and/or submitting patches. The list of contributors is in the CONTRIBUTORS file in the source distribution. After pycparser moved to Github I stopped updating this list because Github does a much better job at tracking contributions.