This is a COBOL Copybook parser in Python featuring the following options:
- Parse the Copybook into a usable format to use in Python
- Clean up the Copybook by processing REDEFINES statements and remove unused definitions
- Denormalize the Copybook
- Write the cleaned Copybook in COBOL
- Strip prefixes of field names and ensure that the Copybook only contains unique names
- Can be used from the command-line or included in own Python projects
Because I couldn't find a COBOL Copybook parser that fitted all my needs I wrote my own. It doesn't support all functions found in the Copybook, just the ones that I met on my path: REDEFINES, INDEXED BY, OCCURS.
On a day to day basis I use it so that Informatica PowerCenter creates only 1 table of my COBOL data instead multiple.
The code uses the pic parser code from pyCOBOL.
This code is licensed under GPLv3.
The easiest way to install python-cobol
is using pip
:
$ pip install python-cobol
Below is an example Copybook file before and after being processed.
Before:
00000 * Example COBOL Copybook file AAAAAAAA
00000 01 PAULUS-EXAMPLE-GROUP. AAAAAAAA
00000 05 PAULUS-ANOTHER-GROUP OCCURS 0003 TIMES. AAAAAAAA
00000 10 PAULUS-FIELD-1 PIC X(3). AAAAAAAA
00000 10 PAULUS-FIELD-2 REDEFINES PAULUS-FIELD-1 PIC 9(3). AAAAAAAA
00000 10 PAULUS-FIELD-3 OCCURS 0002 TIMES AAAAAAAA
00000 PIC S9(3)V99. AAAAAAAA
00000 05 PAULUS-THIS-IS-ANOTHER-GROUP. AAAAAAAA
00000 10 PAULUS-YES PIC X(5). AAAAAAAA
After:
01 EXAMPLE-GROUP.
05 FIELD-2-1 PIC 9(3).
05 FIELD-3-1-1 PIC S9(3)V99.
05 FIELD-3-1-2 PIC S9(3)V99.
05 FIELD-2-2 PIC 9(3).
05 FIELD-3-2-1 PIC S9(3)V99.
05 FIELD-3-2-2 PIC S9(3)V99.
05 FIELD-2-3 PIC 9(3).
05 FIELD-3-3-1 PIC S9(3)V99.
05 FIELD-3-3-2 PIC S9(3)V99.
05 THIS-IS-ANOTHER-GROUP.
10 YES PIC X(5).
You can use it in two ways: inside your own python code or as a stand-alone command-line utility.
Do a git clone from the repository and inside your brand new python-cobol folder run:
$ python cobol.py example.cbl
This will process the redefines, denormalize the file, strip the prefixes and ensure all names are unique.
The utility allows for some command-line switches to disable some processing steps.
$ python cobol.py --help
usage: cobol.py [-h] [--skip-all-processing] [--skip-unique-names]
[--skip-denormalize] [--skip-strip-prefix]
filename
Parse COBOL Copybooks
positional arguments:
filename The filename of the copybook.
optional arguments:
-h, --help show this help message and exit
--skip-all-processing
Skips unique names, denormalization and .
--skip-unique-names Skips making all names unique.
--skip-denormalize Skips denormalizing the COBOL.
--skip-strip-prefix Skips stripping the prefix from the names.
The parser can also be called from your Python code. All you need is a list of lines in COBOL Copybook format. See example.py how one would do it:
import cobol
with open("example.cbl",'r') as f:
for row in cobol.process_cobol(f.readlines()):
print row['name']
It is also possible to call one of the more specialized functions within cobol.py:
-
clean_cobol(lines)
Cleans the COBOL by converting the cobol informaton to single lines
-
parse_cobol(lines)
Parses a list of COBOL field definitions into a list of dictionaries containing the parsed information.
-
denormalize_cobol(lines)
Denormalizes parsed COBOL lines
-
clean_names(lines, ensure_unique_names=False, strip_prefix=False, make_database_safe=False)
Cleans names of the fields in a list of parsed COBOL lines. Options to strip prefixes, enforce unique names and make the names database safe by converting dashes (-) to underscores (_)
-
print_cobol(lines)
Prints parsed COBOL lines in the Copybook format