Deadline: Fri, 08.03.2024, 12:00pm
Instructors: Amir Shaikhha
TAs: Nikos Mavrogeorgis, Mahdi Ghorbani, Kim Stonehouse
Demonstrators: Amir Khordadi, Pei Mu, Maks Kret
This coursework already provides two semantic analyses:
- Check Assign Target: This analysis checks that the LHS of an assignment contains only expressions that can serve as targets of an assignment.
- Name Analysis: This analysis checks that only variable and function names are used that have previously been introduced.
- Core: Type Checking
- Expert: Dead Code Analysis
First, you will need to clone the repository. If you used GitHub before, you will already have either an HTTPS access token, or an SSH key. If you do not have either, you will need to create this. You can use either, just follow the guides below:
- To create an SSH key, use this
- To create an HTTPS token, use this At some point, you will be asked to specify what you want to do with the token: feel free to tick all the boxes.
At this point, you can clone the repository:
- if you used HTTPS above, use the command
git clone https://github.com/compiling-techniques/REPOSITORY_NAME.git
- if you used SSH, use the command
git clone [email protected]:compiling-techniques/REPOSITORY_NAME.git
Now enter the repository directory: cd REPOSITORY_NAME
. If this is your first time using Git, you will need to set up a bit of configuration:
git config --global user.name "your_github_username"
git config --global user.email "your_github_email"
This will set your username and email globally (for all repositories, unless they overwrite this), so all future GitHub repositories will already have this set up. If you do not wish to do that, just omit the --global
.
You can verify the setup using git config -l
, which will just tell you what settings you set.
You can create an isolated python environment using venv.
To set up venv
for the assignment, follow the steps below (a summary is given below the bulleted list):
- Set up a virtual environment for the assignment with
python3.10 -m venv env
. This creates a subdirectory calledenv
in the current folder that creates an isolated version of Python. - Activate the virtual environment by
source
ing the activation file:source env/bin/activate
- Confirm that you are in the virtual environment by running
which python
. The output should be/path/to/coursework/env/bin/python
. - Upgrade pip with
pip install --upgrade pip
. - Run
pip install -U -r requirements.txt
to install dependencies within the virtual environment. - Run
pip install -e .
to install the ChocoPy compiler as a package. - If you are using PyCharm, please configure PyCharm to work with the environment by following the instructions on this page of the PyCharm manual: Configure a virtual environment.
In summary, the process looks as follows:
/path/to/coursework$ python3.10 -m venv env # set up virtual environment called `env`
/path/to/coursework$ source env/bin/activate # run activation file
(env) /path/to/coursework$ # (env) shows up
(env) /path/to/coursework$ which python # confirm python path
/path/to/coursework/env/bin/python
/path/to/coursework$ pip install --upgrade pip # upgrade pip
(env) /path/to/coursework$ pip install -U -r requirements.txt # install dependencies
(env) /path/to/coursework$ pip install -e . # install ChocoPy as a package
(env) /path/to/coursework$ # get to hacking, and best of luck! :)
It would be convenient for you, if you used a modern IDE for Python.
A popular choice, PyCharm
, comes pre-installed in your DICE desktop environment.
If you decide to use PyCharm
, in order to install the packages, you should open the embedded terminal (Alt+F12
by default)
and follow the previous instructions using pip install -U -r requirements.txt
.
Instead of cloning the repository to your local machine, you can also use GitHub codespaces to do your coursework. This is a beta-test, so it is an optional offer that is delivered on a best-effort basis. To use GitHub codespaces click on the green "Code" button at the top of this repository, select "code spaces" and create your personal codespace. Then enter the console and run:
$ export PATH=/home/codespace/.local/lib/python3.10/site-packages/bin/:$PATH
$ pip install -U -r requirements.txt # install dependencies
$ pip install -e . # install ChocoPy as a package
$ # get to hacking, and best of luck! :)
You can use lit
to automatically test your code, which is include in the requirements.txt
.
To run it locally, do:
lit -v tests/type-checking/
This will examine recursively all the files with valid formats inside the above directory.
The -v
flag adds a verbose output with more information in case some tests fail.
You can also leverage the --timeout <seconds>
flag, in order to bound the time allowed for your test cases to run.
This way you can detect if your implementation loops infinitely in some test cases.
For more details on the configuration of lit
, see tests/lit.cfg
.
For more info on lit
check the online documentation.
The goal of task 1 is to write a type checker for the ChocoPy Language in order to check if a syntactically valid program satisfies ChocoPy's type system and reject programs with meaningless or invalid code. We call a program that has passed type checking well typed.
The template already provides a partial implementation of the type checker. You will have to implement the rest.
First read this README completely and carefully! It explains the type system of ChocoPy by looking at the types, typing environments, and type checking rules. We have also added some hints that are useful for this coursework.
To get started coding look at the file choco/type_checking.py
which contains an incomplete implementation of the type checker.
The file contains of:
- Python classes representing the ChocoPy types.
- The operations on types:
is_subtype
,is_assignment_compatible
, andjoin
. You will use them when implementing typing rules. - Helper functions that will throw a
SemanticError
if their checks fail. You can use them while implementing your type checking. - The class
FunctionInfo
representing the information about functions stored in the environment. - The
LocalEnvironment
type that is represented as a Python dictionary. - A function
type_checking
that is the entry point for the type checker. - A function
build_env
that builds the initial local environment. - The "dispatch" functions:
check_stmt_or_def_list
,check_stmt_or_def
, andcheck_expr
. These functions traverse the AST and must decide which typing rule to invoke for a given AST node. You will need to complete their implementations.check_stmt_or_def_list
is already given. - The typing rules, each implemented by a Python function. You will need to complete their implementations.
- Read the file carefully and understand how the overall type checking process works.
- Identify an existing or write a new test where the type checking does not work correctly yet.
- Identify the typing rules that are missing to make the test pass.
- Implement the typing rules by corresponding functions.
- Modify the dispatch functions to call the new typing rules.
- Go back to 2. until you have covered all typing rules.
- Single variable assignment and list creation. (VAR-ASSIGN-STMT, LIST-DISPLAY, NIL)
- Logical operators and conditional expressions. (AND, OR, NOT, IS)
- Index operation. (STR-SELECT, LIST-SELECT)
- Pass statement, control flow statements. (PASS, IF-ELSE, WHILE, FOR-STR, FOR-LIST)
- Comparison operators. (INT-COMPARE, STR-COMPARE, BOOL-COMPARE)
- Arithmetic operators. (NEGATE, ARITH, LIST-CONCAT, STR-CONCAT)
- Complex assign statements. (LIST-ASSIGN-STMT, MULT-ASSIGN-STMT)
- Function definitions. (FUNC-DEF, INVOKE, RETURN-E, RETURN)
The tests/type-checking
folder contains a directory for each section, making it easier
to check your progress.
Note that it is important to have all the tests in the first section, assign-and-list
,
entirely correct, since the typing rules tested there are used throughout all tests.
From the grammar of CW1 we know that a ChocoPy user can write the following types:
type := type_name | `[` type `]`
type_name := object | int | bool | str
In addition, ChocoPy has the following types that can not be written explicitly by the user:
<Empty>
is the type of the expression[]
<None>
is the type of the expressionNone
- The bottom type
⊥
(the type that has no value)
All types are organized into a type hierarchy, which is visualized in the following diagram:
object
______________________/ \__________________________
/ | | | | | \
int bool str [int] … [[bool]] <Empty> <None>
\______|_____|______|__ ______|_________|_________/
\ /
⊥
We can provide a precise description of the type hierarchy by defining a subtyping relationship.
A type T₁
is considered a subtype of T₂
(written as T₁ ≤ T₂
>) iff at least one of the following is true:
T ≤ T
for all typesT
int ≤ object
bool ≤ object
str ≤ object
[T] ≤ object
for all typesT
<Empty> ≤ object
<None> ≤ object
⊥ ≤ T
for all typesT
To describe assignments, passing arguments to functions, and returning values form functions, we define assignment compatibility. The idea is that we may assign a value of type T₁
to something of type T₂
iff T₁
is assignment compatible with T₂
.
A type T₁
is assignment compatible with T₂
(written as T₁ ≤a T₂
) iff at least one of the following is true:
T₁ ≤ T₂
(i.e.,T₁
is a subtype ofT₂
)T₁
is<None>
andT₂
is notint
,bool
,str
, or⊥
T₂
is a list type[T]
andT₁
is<Empty>
T₂
is a list type[T]
andT₁
is[<None>]
, where<None> ≤a T
.
The last rule allows us to write programs such as:
x: [object] = [None, None]
Sometimes (e.g, when type checking an if-then-else
expression), we need to find a single type that can be used to represent the two original types. For this, we define the join operator.
The join of two types T₁
and T₂
(written as T₁ ⨆ T₂
) is:
T₂
ifT₁ ≤a T₂
T₁
ifT₂ ≤a T₁
object
otherwise, as it is the least common ancestor ofT₁
andT₂
To type check a program, we will traverse its AST and inspect every program term.
When checking a term we will investigate its sub-terms to determine and check its type.
For example, to check the type of the term 1 + 4
we investigate 1
and 4
, conclude that both are of type int
and, therefore, the addition is also of type int
and is well typed.
We cannot determine the type of term without knowing the context of the term. Consider, for example, the call expression foo(x)
.
We cannot say if this expression is well types, without knowing with what types x
and foo
were declared.
To capture this contextual information, we use a typing environment. Our typing environment consists of two parts:
- the local environment
O
which records information about currently declared variables and their types as well as information about the declared functions; - the return type
R
of the current function.
The content of typing environment changes when we traverse the AST, always representing the context of the sub-term we are currently investigating.
For a variable v
we write: O(v) = T
to record that the variable has type T
.
For a function f
we store more information in the local environment and write:
O(f) = {T₁ ⨯ … ⨯ Tₙ → T₀; x₁, …, xₙ; v₁: T'₁, …, vₘ: T'ₘ}
to record:
- the function type of
f
with:T₁
, …,Tₙ
the types of the function parametersT₀
the function return type
x₁
, …,xₙ
the names of the function parameters.v₁: T'₁
, …,vₘ: T'ₘ
the identifiers and types of variables declared in the body of the function.
R
is set to ⊥
when we are inspecting a term outside any function.
When type checking function definitions, we create a new typing environment to type check the body of the function.
This new environment is initialized with the content of the typing environment from the current scope.
We then need to update or add the parameters and their types to the new typing environment.
We write O[T/c]
to add or update the binding of a variable, that is:
O[T/c](c) = T
O[T/c](d) = O(d)
ifd ≠ c
The type checking rules describe how to check the type for a specific program term. The general form of a type checking rule is:
⋮
---------------
O, R ⊢ e : T
The rule consists of two parts:
- zero, one, or multiple premises above the line, and
- a conclusion below the line.
The rule states, that if the premises are true, then the conclusion is true as well.
O, R ⊢ e : T
is a typing judgement, where the turnstyle ⊢
separates the typing environment from the proposition.
This judgement should be read as:
"In the type environment with the local environment
O
and the return typeR
the expressione
is well typed and has typeT
".
A first example, is the typing rule for the negation operation (i.e., -x
).
This rule states, that in the environment O, R
we can conclude that -e
is well typed and has type int
if the sub-expression e
has type int
in the same environment:
O, R ⊢ e : int
------------------ [NEGATE]
O, R ⊢ -e : int
When defining and initializing a variable, the rule states that in the environment O, R
we can conclude that id : T = e₁
is well typed, if:
id
has the typeT
in the environmentO
;- the sub-expression
e₁
has typeT₁
in the same environment; T₁
is assignment compatible toT
:
O(id) = T
O, R ⊢ e₁ : T₁
T₁ ≤a T
--------------------- [VAR-INIT]
O, R ⊢ id : T = e₁
The :
used in the conclusion of the rule is here the syntax for type annotations.
For conditional expressions we use the join
operation in the conclusion:
O, R ⊢ e₀ : bool
O, R ⊢ e₁ : T₁
O, R ⊢ e₂ : T₂
------------------------------------ [COND]
O, R ⊢ e₁ if e₀ else e₂ : T₁ ⨆ T₂
The rule for function definitions is the most complicated.
T = T₀ if return type is present, <None> otherwise
O(f) = {T₁ ⨯ … ⨯ Tₙ → T; x₁, …, xₙ; v₁: T'₁, …, vₘ: T'ₘ}
O[T₁/x₁]…[Tₙ/xₙ][T'₁/v₁]…[T'ₘ/vₘ], R ⊢ b
------------------------------------------------------------------ [FUNC-DEF]
O, R ⊢ def f(x₁: T₁, …, xₙ: Tₙ) ⟦→ T₀⟧? : b
Here, we determine the return type T
depending on the return type annotation.
Then, we look of the function information in the environment O(f)
.
Finally, we check the type of the function body b
using a different environment:
- where we have added the function parameters and nested variable definitions to the local environment
O
; and - where we use the return type
R
.
Variables
When reading a variable, the rule states that if the environment O, R
assigns an identifier id
a type T
,
then the expression id
has type T
:
O(id) = T; where T is not a function type.
-------------------------------------------- [VAR-READ]
O, R ⊢ id : T
id
has the typeT
in the environmentO
;T
is not a function type (like the{T₁ ⨯ … ⨯ Tₙ -> T₀; x₁, …, xₙ}
which was described before)
We must, however, prohibit identifiers with function types when reading values (that is, when identifiers are used as expressions in the syntax). This simply reflects the fact that ChocoPy does not treat functions as first-class (assignable, storable) values.
Variable definitions and assignments
O(id) = T
O, R ⊢ e₁ : T₁
T₁ ≤a T
--------------------- [VAR-INIT]
O, R ⊢ id : T = e₁
O(id) = T
O, R ⊢ e₁ : T₁
T₁ ≤a T
------------------------- [VAR-ASSIGN-STMT]
O, R ⊢ id = e₁
Statement and Definitions list: Type check all the component definitions and statements.
O, R ⊢ s₁
O, R ⊢ s₂
…
O, R ⊢ sₙ
n ≥ 1
---------------------------------------------- [STMT-DEF-LIST]
O, R ⊢ s₁ NEWLINE s₂ NEWLINE … sₙ NEWLINE
Pass Statements
------------------------- [PASS]
O, R ⊢ pass
Expression Statements
O, R ⊢ e : T
------------------------- [EXPR-STMT]
O, R ⊢ e
Literals
------------------------- [BOOL-FALSE]
O, R ⊢ False : bool
------------------------- [BOOL-TRUE]
O, R ⊢ True : bool
i is an integer literal
------------------------- [INT]
O, R ⊢ i : int
s is an string literal
------------------------- [STR]
O, R ⊢ s : str
The None literal is assigned the (unmentionable) type <None>
:
------------------------- [NONE]
O, R ⊢ None : <None>
Arithmetic and numerical relational operators
O, R ⊢ e : int
------------------------- [NEGATE]
O, R ⊢ -e : int
O, R ⊢ e₁ : int
O, R ⊢ e₂ : int
op ∈ {+, -, *, //, %}
------------------------- [ARITH]
O, R ⊢ e₁ op e₂ : int
O, R ⊢ e₁ : int
O, R ⊢ e₂ : int
⨝ ∈ {<, <=, >, >=, ==, !=}
---------------------------- [INT-COMPARE]
O, R ⊢ e₁ ⨝ e₂ : bool
O, R ⊢ e₁ : bool
O, R ⊢ e₂ : bool
⨝ ∈ {==, !=}
------------------------- [BOOL-COMPARE]
O, R ⊢ e₁ ⨝ e₂ : bool
Logical Operators
O, R ⊢ e₁ : bool
O, R ⊢ e₂ : bool
-------------------------- [AND]
O, R ⊢ e₁ and e₂ : bool
O, R ⊢ e₁ : bool
O, R ⊢ e₂ : bool
------------------------- [OR]
O, R ⊢ e₁ or e₂ : bool
O, R ⊢ e : bool
------------------------- [NOT]
O, R ⊢ not e : bool
Conditional expressions
O, R ⊢ e₀ : bool
O, R ⊢ e₁ : T₁
O, R ⊢ e₂ : T₂
----------------------------------- [COND]
O, R ⊢ e₁ if e₀ else e₂ : T₁ ⨆ T₂
String operations
O, R ⊢ e₁ : str
O, R ⊢ e₂ : str
⨝ ∈ {==, !=}
------------------------- [STR-COMPARE]
O, R ⊢ e₁ ⨝ e₂ : bool
O, R ⊢ e₁ : str
O, R ⊢ e₂ : str
------------------------- [STR-CONCAT]
O, R ⊢ e₁ + e₂ : str
O, R ⊢ e₁ : str
O, R ⊢ e₂ : int
------------------------- [STR-SELECT]
O, R ⊢ e₁[e₂] : str
The 'is' operator
O, R ⊢ e₁ : T₁
O, R ⊢ e₂ : T₂
T₁, T₂ are not one of int, bool, str
------------------------------------- [IS]
O, R ⊢ e₁ is e₂: bool
List Displays
n ≥ 1
O, R ⊢ e₁ : T₁
O, R ⊢ e₂ : T₂
…
O, R ⊢ en : Tₙ
T = T₁ ⨆ T₂ ⨆ … ⨆ Tₙ
--------------------------------- [LIST-DISPLAY]
O, R ⊢ [e₁, e₂, …, en] : [T]
The empty list is a special case.
------------------------- [NIL]
O, R ⊢ [] : <Empty>
List Operators
O, R ⊢ e₁ : [T₁]
O, R ⊢ e₂ : [T₂]
T = T₁ ⨆ T₂
------------------------- [LIST-CONCAT]
O, R ⊢ e₁ + e₂ : [T]
O, R ⊢ e₁ : [T]
O, R ⊢ e₂ : int
------------------------- [LIST-SELECT]
O, R ⊢ e₁[e₂] : T
O, R ⊢ e₁ : [T₁]
O, R ⊢ e₂ : int
O, R ⊢ e₃ : T₃
T₃ ≤a T₁
------------------------- [LIST-ASSIGN-STMT]
O, R ⊢ e₁[e₂] = e₃
Multiple assignments: Multiple assignment is type-checked by decomposing into individual single assignments, as follows:
n > 1
O, R ⊢ e₀ : T₀
O, R ⊢ e₁ = e₀
…
O, R ⊢ en = e₀
T₀ ≠ [<None>]
--------------------------------- [MULTI-ASSIGN-STMT]
O, R ⊢ e₁ = e₂ = … = eₙ = e₀
The restriction that T₀ ≠ [<None>]
avoids a subtle type-safety issue.
It is dangerous to allow there to be two different views of a list with differing element types.
The type [<None>]
can only arise from list displays.
As long as the value of such a display is immediately consumed by assignment to a single variable,
parameter, or operand (+
for lists), there will be only be one opinion as to its type subsequently.
But, multiple assignment opens the possibility of programs like this:
x: [object] = None
y: [[int]] = None
x = y = [None] # Trouble ahead!
x[0] = 5
print(y[0][0]) # ???
There is another very subtle point lurking here in the case where e₀ has type <Empty>
(the type of the empty list).
In this case, however, we need no special rule because in ChocoPy, there is no .append method to allow elements to be added to an empty list.
Were that not the case, we could get this situation:
A: [int] = None
B: [str] = None
A = B = []
A.append(3)
and we would subsequently have B[0]
returning the value 3, which is certainly not a string.
Function applications
O, R ⊢ e₁ : T''₁
O, R ⊢ e₂ : T''₂
…
O, R ⊢ eₙ : T''ₙ
n ≥ 0
O(f) = {T₁ ⨯ … ⨯ Tₙ -> T₀; x₁, …, xₙ; v₁ : T'₁, …, vₘ : T'ₘ}
∀ 1 ≤ i ≤ n : T''ᵢ ≤a Tᵢ
-------------------------------------------------------------------- [INVOKE]
O, R ⊢ f(e₁, e₂, …, eₙ) : T₀
To type check a function invocation, each of the arguments to the function must be first type checked. The type of each argument must conform to the types of the corresponding formal parameter of the function. The invocation expression is assigned the function’s declared return type.
Return Statements: This is where the return-type environment comes into play:
O, R ⊢ e : T
T ≤a R
------------------------- [RETURN-E]
O, R ⊢ return e
<None> ≤a R
------------------------- [RETURN]
O, R ⊢ return
Conditional statements
O, R ⊢ c : bool
O, R ⊢ b₀
O, R ⊢ b₁
------------------------------------------------------ [IF-ELSE]
O, R ⊢ if c:b₀ else:b₁
While statements
O, R ⊢ e : bool
O, R ⊢ b
------------------------- [WHILE]
O, R ⊢ while e:b
For statements
O, R ⊢ e : str
O(id) = T
str ≤a T
O, R ⊢ b
------------------------- [FOR-STR]
O, R ⊢ for id in e:b
O, R ⊢ e : [T₁]
O(id) = T
T₁ ≤a T
O, R ⊢ b
------------------------- [FOR-LIST]
O, R ⊢ for id in e:b
Function Definitions
T = T₀ if return type is present, <None> otherwise
O(f) = {T₁ x … x Tₙ → T; x₁, …, xₙ; v₁: T'₁, …, vₘ: T'ₘ}
O[T₁/x₁]…[Tₙ/xₙ][T'₁/v₁]…[T'ₘ/vₘ], T ⊢ b
------------------------------------------------------------------ [FUNC-DEF]
O, R ⊢ def f(x₁: T₁, …, xₙ: Tₙ) ⟦→ T₀⟧? : b
The Global Typing Environment: The following functions are predefined globally:
O(len) = {object -> int; arg}
O(print) = {object -> <None>; arg}
O(input) = {-> str}
As shown in the type checking skeleton, in order to type check the AST nodes, you need to be able to access their fields properly. The node fields are either attributes, which give information about the node, or regions.
Attributes can be accessed directly.
For the regions, you are interested in accessing the operations contained in the region.
In the case of VarDef
, we have the following code:
if isinstance(op, choco_ast.VarDef):
typed_var = op.typed_var.op
assert isinstance(typed_var, choco_ast.TypedVar)
id = typed_var.var_name.data
e1 = op.literal.op
So, op
is a VarDef
node.
We access the typed variable by accessing the .typed_var
region of the node and use .op
to access the operation in it.
Later, we access the name of the typed variable through the ".data" property of the attribute .var_name
.
In the last statement, we get the expression e1
assigned to the literal operation.
Other nodes will have more operations in their blocks, instead of just one. In this case we can use .ops
instead of .op
.
See again dialects/choco_ast.py
for more info.
As mentioned, the current coursework already provides a pass that checks the assignment targets,
as well as a name analysis check.
The choco-opt
tool has been updated to enable this extra functionality with the -p
flag,
which takes check-assign-target
and name-analysis
as arguments:
cd /path/to/coursework
tools/choco_opt.py -p name-analysis tests/name-analysis/global_write.mlir # Gives semantic error
# Easier way to invoke tools/choco_opt.py
choco-opt -p name-analysis tests/name-analysis/global_write.mlir # Gives semantic error
choco-opt -p check-assign-target tests/type-checking/assign-and-list/assign/assign_int.mlir # Passes
choco-opt -p check-assign-target tests/check-assign-target/addition.mlir # Gives semantic error
The type checking functionality is triggered with the argument type-checking
.
The other flags are also active in the given type checking test cases.
You can try the current minimal type-checking functionality by running:
cd /path/to/coursework
choco-opt -p check-assign-target,name-analysis,type-checking tests/type-checking/base-rules/empty.mlir # Passes
Note that you can run the current type-checking directly on a chocopy file by giving a .choc
file instead of a .mlir
file:
cd /path/to/coursework
choco-opt -p check-assign-target,name-analysis,type-checking your-test.choc
Note that you need to keep all three passes activated when developing your code.
As a shortcut, you can use the keyword type
as a pass, instead of typing all of them:
cd /path/to/coursework
choco-opt -p type tests/type-checking/base-rules/empty.mlir # Passes
Finally, you are only interested in passing the test cases of the tests/type-checking
folder (these will be the visible ones).
The function build_env
has already built the local environment, so it already provides the type information for each function.
The goal of the function definition rule is, given the function's type information, to type check the function body.
To this end, you need to carefully read the rule of the function definition, which requires to type check the function body using a different local environment that you will update with the information you got from the function.
Given that these features are checked by the name analysis pass (you can see the code to find out how), there isn't anything special you need to do about them.
Dead code is a section of source code that can either never be executed at run-time or whose result is never used in any computation. Removing the dead code will, therefore, not change the programs' observable behaviour.
The goal of task 2 is to write a pass that warns the programmer of dead code in the input program. You should not remove the dead code, but only warn the programmer about it.
Of course there exist many possible forms of dead code. We give examples of three different forms below.
The file choco/warn_dead_code.py
contains a function warn_dead_code
that is the entry point for the dead code warning pass. This function should print a warning if the program contains dead code.
To invoke the dead code analysis, add the warn-dead-code
pass in choco-opt
:
cd /path/to/coursework
choco-opt -p check-assign-target,name-analysis,type-checking,warn-dead-code your-test.choc
As a shortcut, you can use the keyword warn
as a pass, instead of typing all of them:
cd /path/to/coursework
choco-opt -p warn your-test.choc
You can use any way you like to access and analyse the AST of the program.
We recommend that you make use of the choco/ast_visitor.py
, that we have discussed in the Lecture and that is used for example in the choco/name_analysis.py
pass.
Use the provided exceptions to print a warning that the program contains dead code. If multiple forms of dead code are present, you can report any of them.
Here are all the type of dead code we will grade you on.
The following program has unreachable code
def foo():
return
print("DEAD") # Unreachable statement
return # Unreachable statement
Here, the print
statement is never executed, as the first return
statement will finish the execution of the function.
A statement can be unreachable for multiple reasons, such as because it is after a return
statement, or in
an if/else
branch that is never executed. Note that since pass
statements are ignored, an empty branch of an if/else
statement is not considered unreachable if it is never executed.
Expressions can also be unreachable, such as the right hand side of an and
expression that has the left hand side always evaluated to False
, since and
is short-circuiting.
Statements should be reported with UnreachableStatementsError
, while expressions should be reported with UnreachableExpressionError
.
Note that none of the exceptions require any arguments.
As a final example, consider:
def foo():
print("foo")
5 if True else foo()
This would give an unreachable expression, since the last statement itself is executed, but part of the expression inside the statement is not executed.
Hence, you should report UnreachableExpressionError
.
x: int = 0
x = 5 # Unused store
Here the first assignment x = 5
is dead code, because its result is not used.
Note that initialization values (like 0
here) are not considered dead code.
The following example has also an unused store, because it is overriden by the next assignment:
x: int = 0
x = 2 # Unused store
x = 6
print(x)
We limit the scope of unused stores to variables, ignoring the indices case. In particular, this means that we do not consider this program to have dead code:
x: [int] = None
x = [2, 3]
x[0] = 5
x[0] = 6
print(x[0])
Also, note that even if stores have side effects, they can be considered dead code, since it's the store itself that is not used. For instance, the following program is considered to have dead code:
def f(x: int) -> int:
print(x)
return x
x: int = 0
x = f(3) # Unused store
x = 6
print(x)
An unused store should be reported with UnusedStoreError
. It should have as parameter the Assign
operation
that is unused.
x: int = 0
Here, the variable declaration x: int = 0
is dead code, as the variable is never used.
Note that a variable is only considered unused if it is not used in any way. If it is used
in an unreachable expression or statement, it is still considered used.
These warnings should be reported with UnusedVariableError
, UnusedArgumentError
, and UnusedFunctionError
.
These exceptions are parametrized by the name of the unused variable, argument, or function.
As final example, consider:
def foo():
print("hello")
if False:
foo()
We do not consider this function unused, but the statement that calls it unreachable.
If a function is not used because an expression or statement is unreachable, then we should warn about the expression or statement first, since the function is still called (though never executed).
Thus, you should return an UnreachableStatementsError
.
x: int = 5
x * 2 # Unused expression
Here, the expression x * 2
is dead code, as the result of the expression is never used,
is not None
, and has no side-effects.
Be aware that expressions may have side-effects, such as print
, or x[5]
(which will
raise an error if x
is not long enough). However, if you can guarantee that the expression
has no side-effects, then it is dead code.
These warnings should be reported with UnusedExpressionError
. You should give as parameter the
expression that is unused.
Dead code may require complex analyses to be detected. In fact, dead code analysis is undecidable in our language. Like compiler engineers, you should first focus your analysis on simple cases that are easy to handle, and then try more complex analyses to detect more complex features.
To simplify your analysis, we will only consider programs with the following
operators: . if . else .
, or
, and
, not
, ==
, <
, +
, -(binary)
,
*
, //
, -(unary)
, [](index)
, [](list initialization)
. Note that all
statements can still appear. We also will not test your code on programs that
use nonlocal
and global
statements, or nested assignments (x = y = z
).
Your analysis should also not take into account overflows, but should take into
account that runtime errors (e.g. division by zero) are side-effects.
We recommend you to first implement a simple dead code analysis that only works
when expressions are constant (like if True: ...
), and then try to extend it
with a constant analysis.
We will evaluate task 2 strictly on our test cases, contrary to the task 2 of coursework 1 where you could provide your own.
You are encouraged to commit your changes regularly. This allows you to track the history of your changes so that you can revert to an earlier version of your code if you need to. It also protects you from losing any of your work in the case of a computer failure.
Furthermore, every time you commit and push your changes in the main
branch, your points are updated,
giving you continuous feedback on your implementation.
This badge shows how many successful test cases you've had so far.
These points are provisional and are not related to the final grade. The number of successful test cases is only an indicator of how strong your implementation is, with respect to how many language features it covers. Your implementation will be tested also on other test cases, so it is important that you write your own tests to ensure that your code is thoroughly tested.
Once you have pushed all of your changes to GitHub and you are happy with your code and your points, you're finished! We will grade your last submission on the main
branch.
Submissions after the deadline, will result in penalties without an approved extension. See "Assessment" on Learn for details.
Follow our academic guidelines and take advantage of Piazza to discuss any open questions.
For the following courseworks, the full lexer and parser will be provided.
Please remember the good scholarly practice requirements of the University regarding work for credit. You can find guidance at the School page:
https://web.inf.ed.ac.uk/infweb/admin/policies/academic-misconduct
This also has links to the relevant University pages.
The number of passing test cases is intended to give you an idea of the quality of the code. However, the actual grading of your coursework takes place after the deadline and takes into account public and hidden automatic tests, as well as potential manual reviews.
Submitted code will be checked for similarity with other submissions using the MOSS system. MOSS has been effective in the past at finding similarities and it is not fooled by name changes or reordering of code blocks. Courseworks are INDIVIDUAL, and we expect everyone to turn in their sole, independent work.
Extensions: Please refer to the "Assessment" page on Learn for information on extensions.
If you have questions, you should consult the lecture slides and recordings. If you have questions about the coursework, please start by checking existing discussions on Piazza. If you can't find the answer to your question, start a new discussion. It is quite possible that other students will have encountered and solved the same problem and will be able to help you. The TA will also monitor Piazza and clarify things as necessary, after allowing time for student discussion to take place.