Skip to content

Parsing tabbed tree formatted text files into dict node trees

License

Notifications You must be signed in to change notification settings

iturrovia/tabtree-py

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

tabtree - Parsing tabbed tree formatted text files into dict node trees

Build Status PyPI

  1. tabtree is a tiny package to convert tabbed trees from text files into dict trees.
  2. The functions in this module use lazy evaluation (take line iterators as input and return record iterators as output), so they can be used with very large files without becoming a memory hog.
  3. Although there is a default format for the dict nodes in the output node trees, it can be fully customized by passing a node generating function as an optional argument

How to use tabtree?

Lets assume we have a file (name data.txt) with the following content:

item 0
	item 0 0
	item 0 1
		item 0 1 0
		item 0 1 1
	item 0 2 
item 1 
	item 1 0
	item 1 1
		item 1 1 0
		item 1 1 1
	item 1 2

then the following code:

from tabtree import parser
with open('data.txt', 'r') as lines:
	node_trees = parser.lines_to_node_trees(lines)
	for node_tree in node_trees:
		print(node_tree)

will print the following two trees:

{'data': 'item 0', 'children': [
	{'data': 'item 0 0', 'children': []},
	{'data': 'item 0 1', 'children': [
		{'data': 'item 0 1 0', 'children': []},
		{'data': 'item 0 1 1', 'children': []}
	]},
	{'data': 'item 0 2', 'children': []}
]}
{'data': 'item 1', 'children': [
	{'data': 'item 1 0', 'children': []},
	{'data': 'item 1 1', 'children': [
		{'data': 'item 1 1 0', 'children': []},
		{'data': 'item 1 1 1', 'children': []}
	]},
	{'data': 'item 1 2', 'children': []}
]}

Why implementing this module?

Although this is actually a tiny piece of code, it is a functionality I need quite often. However, I found very few existing python implementations of it (mostly code snippets), and none of them was entirely suitable for my needs:

  • Lazy evaluation (I am working with very big files, so I cannot afford to load all the data into a big tree in memory).
  • Ability to use customized node formats. I'm parsing different types of data that require different representations, so customizing the node format at tree creation time is a big plus (otherwise you need to stick with one-and-only-library-defined-node-class or either apply a traverse-the-trees-and-transform-node process)

About

Parsing tabbed tree formatted text files into dict node trees

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages