CRE is a high-performance forward-chaining rule engine for Python built with the numba just-in-time compiler toolset. CRE rules are written in Python but run with C-like performance.
NOTE: CRE is a work in progress and currently only implements only a subset of its planned capabilities.
- C-like Data-Structures for Facts + Working Memory
- Complex Pythonic Conditions
- High-Performance Incremental Condition Matching
- Pythonic Rule Specification
- High-Performance Rule Engine
- Incremental Data Processing Utilities
- Set Chaining Planner
Facts are struct-like objects that are declared to a MemSet
(i.e. working-memory). Unlike normal Python objects facts are strongly typed and consists of a fixed number of typed attribute-value pairs.
The layout for a new fact type can be defined with define_fact
.
from cre import define_fact
Element = define_fact("Element", {"id" : str, 'val' : int, 'next' : 'Element'})
Using 'inherit_from'
will make a fact type inherit from another, and facts types can even have fact-like (e.g. next
above) or list-like (e.g. children
below) members.
Container = define_fact("Container", {'inherit_from' : Element, 'children' : 'List(Element)'})
Internally a fact's members are allocated in a contiguous region of memory outside of Python. However, we can still instantiate and modify facts within native Python:
a = Element('a', 1) # positional arguments
b = Element(A=2,B='b',next=a) # or key-word arguments
c = Element() # omitted args default => Element(id='', val=0, next=None)
p = Container('parent', 7, children=[a,b,c])
c.next = b # Can also assign afer instantiation
A MemSet
is a container that holds facts. This is CRE's data-structure for working memory, but it also has a wider range of uses.
Facts can be added and removed from a MemSet using the declare
and retract
methods.
from cre import MemSet
wm = MemSet()
wm.declare(a)
wm.declare(b)
wm.declare(c)
wm.declare(p)
wm.retract(a)
print(wm)
Output:
MemSet(facts=(
Element(id='b', val=2, next=<Element at 0x32ace50>)
Element(id='', val=0, next=<Element at 0x32acf00>)
Container(id='parent', val=7, next=None, children=List([<BaseFact at 0x32ace50>, <BaseFact at 0x32acf00>, <BaseFact at 0x32cded0>]))
)
----
MemSets keep lightweight records of all changes made to them. Each chage has an associated identification record or idrec
, which is just a packed 64-bit uint consisting of three unsigned-values:
t_id
(16-bit) identifies a type (usually a fact type).f_id
(40-bit) identifies the particular fact witht_id
within a MemSet.a_id
(8-bit) identifies an argument within a fact.
from cre.utils import encode_idrec, decode_idrec
idrec = encode_idrec(20,1,0) # 5629499534213376
t_id, f_id, a_id = decode_idrec(idrec) # (20, 1, 0)
When a fact is declared it is assigned an idrec
with a unique f_id
and a_id=0
. declare
will return the declared fact's idrec. Instead of keeping references to each fact we can use a fact's idrec
with get_fact
or retract
to retreive or remove it from a memset.
idrec = wm.declare(Element(id='z', val=9))
z = wm.get_fact(idrec)
wm.retract(idrec)
When a fact is declared its mutability becomes protected. If we try to set an attribute of a declared fact directly we'll get an AttributeError
error.
wm.declare(c)
c.id = 'c'
Output:
AttributeError: Facts objects are immutable once declared. Use mem.modify instead.
Instead we can use the modify
method of MemSet to change a fact. This helps MemSet to track all changes to the fact.
wm.modify(c, 'id', 'c')
print(c)
Output:
Element(id='c', val=0, next=<Element at 0x24b1be0>)
get_facts
can be used to get all instances of a particular type of fact.
for fact in wm.get_facts(Element):
print(fact)
Output:
Element(id='c', val=0, next=<Element at 0x3e8cad0>)
Element(id='b', val=2, next=<Element at 0x3e8ca20>)
Element(id='c', val=0, next=<Element at 0x3e8cad0>)
Container(id='parent', val=7, next=None, children=List([<BaseFact at 0x3e8ca20>, <BaseFact at 0x3e8cad0>, <BaseFact at 0x3e6eff0>]))
By default get_fact will retrieve subtypes of the given fact type. For instance the 'parent' Container
fact is included in the output above. To only get the Element
facts we could add nosubtypes=True
like so wm.get_facts(Element, no_subtypes=True)
. To get all types regardless of type we can omit the type like wm.get_facts()
.
We can define a set of Conditions
that pick out sets objects in a MemSet
.
Var
is used in various places in CRE to define typed variables in declarative statements. For instance, when writing conditions each var Var
indicates one fact we would like to extract during matching. Vars are provided a type and an optional alias
.
from cre import Var
A = Var(Element) # An unaliased Var
B = Var(Element, "B") # A Var aliased to "B"
P = Var(Container, "P") # A Var aliased to "P"
⚠️ EveryVar
instance is an independant object even if multiple instances are assigned the same alias. To avoid confusing print statements it is good practice to give every instance ofVar
a unique alias in the context of a single declarative statement (like within the sameConditions
object).
Conditions are logical statements in Disjunctive Normal Form. In other on Conditions
object is a disjunction of conjunctions of Literal
statements. Most conditions can be built by simply writing python expressions Literal
s encloded in ()
connected by &
and |' qualifiers for
ANDand
OR` respectively.
c1 = (((A.id == "c") & (A.val >= 1))
c2 = (A.val < B.val) & (A.next.id == B.id) & (A.next == B)
Each literal statement can contain one Var (an Alpha Literal) or two Vars (a Beta Literal), and a set of Conditions can contain an arbitrary number of Vars.
We can then use get_matches
to get all sets of facts that match these conditions. For instance lets build a new working memory and get the matches for c1
and c2
on it:
wm = MemSet()
c = Element('c', 3)
b = Element('b', 2, next=c)
a = Element('a', 1, next=b)
wm.declare(a)
wm.declare(b)
wm.declare(c)
wm.declare(Container('parent', 7, children=[a,b,c]))
print("c1 matches:")
for match in c1.get_matches(wm):
print(match)
print("c2 matches:")
for match in c2.get_matches(wm):
print(match)
Output:
c1 matches:
(Element(id='c', val=3.0, next=None), Container(id='parent', val=7.0, next=None, children=List([<BaseFact at 0x2e1a040>, <BaseFact at 0x2e19f90>, <BaseFact at 0x2da9fe0>])))
c2 matches:
(Element(id='a', val=1.0, next=<Element at 0x2e19f90>), Element(id='b', val=2.0, next=<Element at 0x2da9fe0>))
(Element(id='b', val=2.0, next=<Element at 0x2da9fe0>), Element(id='c', val=3.0, next=None))
The antiunify
method of a Conditions
object can be used to to get the antiunion of two logical statements---essentially the intersection of two Conditions
objects.
from numba import f8
x, y, z = Var(f8,'x'), Var(f8,'y'), Var(f8,'z')
a, b, c, d = Var(f8,'a'), Var(f8,'b'), Var(f8,'c'), Var(f8,'d')
c1 = (x < y) & (y < z) & (y < z) & (z != x) & (y != 0)
c2 = (a < b) & (b < c) & (b < c) & (b < c) & (c != a) & (b != 0) & (d != 0)
print(c1.antiunify(c2))
Output:
((x < y) & (y < z) & (y < z) & (z != x) & (y != 0))
In the example above the antiunion is expressed using the variables for c1
. antiunify
has found the optimal structure mapping between of variables of the two statements x->a
, y->b
, z->c
and eliminated the variable `d'. Disjunctive statements can even be rearranged by this structure mapping.
c1 = ((x < y) & (z != x) & (y != 0) | # 1
(x < y) & (z == x) & (y != 7) | # 2
(x > y) & (z != x) & (y != 2) # 3
)
c2 = ((a < b) & (c == a) & (b != 7) & (d > 0) | #2
(a < b) & (c != a) & (b != 0) | #1
(a > b) & (c != a) & (b != 0) & (d != 7) #3
)
print(c1.antiunify(c2))
Output:
((x < y) & ~(z == x) & ~(y == 0) |\
(x < y) & (z == x) & ~(y == 7) |\
(x > y) & ~(z == x))
Op
is a special declarative function object used by CRE for a number of purposes. For instance, each Literal
in a Conditions
object has an underlying Op
that is executed to check candidate facts or pair of facts. Several types of Op
are implemented in cre.default_ops
and new Op
types can be defined by the user.
Below is the standard @decorator
notation for defining an Op
.
def denom_not_zero(a,b):
return b != 0
@Op(shorthand = '({0} / {1})', check=denom_not_zero)
def Divide(a, b):
return a / b
The decorator is applied to the Op
's call
function and an optional check
function may also be provided to indicate when the Op
can be applied (and also guard against errors). shorthand
indicates a string template to use when str
is called.
Since we haven't specified the input and output types for our Op
, type the above is actually an UntypedOp
. A proper Op
can be defined by providing a signature
.
from numba.types import unicode_type #numba str type
@Op(shorthand = '({0} + {1})',
signature = unicode_type(unicode_type,unicode_type),
commutes=False)
def Concatenate(a, b):
return a + b
We define signatures using numba
types with the pattern return_type(*arg_types)
. For instance unicode_type
is the numba equivalent of the native Python str
. Alternatively we can type an UntypedOp
by simply calling it with the chosen types.
from numba import f8 #numba float type (specifically an 8 byte float)
Divide_f8 = Divide(f8,f8)
Op
s and Var
s can also be composed to creat new Op
s.
from cre.default_ops import Add, Multiply
Add, Multiply = Add(f8, f8), Multiply(f8, f8)
Double = Multiply(Var(float,'x'), 2)
DoublePlusOne = Add(Double,1)
TimesDoublePlusOne = Multiply(DoublePlusOne,Var(float,'y'))
print(str(Double), Double(2))
print(str(DoublePlusOne), DoublePlusOne(2))
print(str(TimesDoublePlusOne), TimesDoublePlusOne(2,3))
('(x * 2)', 4.0)
('((x * 2) + 1)', 5.0)
('(((x * 2) + 1) * y)', 15.0)
Composing Op
s can be costly and should be avoided in performance critical sections. When Op
s are compsed source code is dynamically createed and compiled to build the composed Op
. Future versions of CRE may implement dynamic composability eliminating the need for this compilation step.
TODO
TODO