Rewrite the documentation #85

Merged (7 commits) on Mar 25, 2024
Changes from 4 commits
131 changes: 131 additions & 0 deletions CODE_OF_CONDUCT.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,131 @@
# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual
identity and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.

## Our Standards

Examples of behavior that contributes to a positive environment for our
community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
* Focusing on what is best not just for us as individuals, but for the overall
community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or advances of
any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email address,
without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.

Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.

## Scope

This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
CommunityCodeOfConduct AT intel DOT com.
All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the
reporter of any incident.

## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:

### 1. Correction

**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.

**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.

### 2. Warning

**Community Impact**: A violation through a single incident or series of
actions.

**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or permanent
ban.

### 3. Temporary Ban

**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.

**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.

### 4. Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior, harassment of an
individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within the
community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.1, available at
[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].

Community Impact Guidelines were inspired by
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].

For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
[https://www.contributor-covenant.org/translations][translations].

[homepage]: https://www.contributor-covenant.org
[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations
132 changes: 71 additions & 61 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,72 +1,38 @@
# Code Base Investigator
Code Base Investigator (CBI) is a tool designed to help developers reason about the use of _specialization_ (i.e. code written specifically to provide support for or improve performance on some set of platforms) in a code base. Specialization is often necessary, but how a developer chooses to express it may impact code portability and future maintenance costs.

The [definition of platform](https://doi.org/10.1016/j.future.2017.08.007) used by CBI is deliberately very flexible and completely user-defined; a platform can represent any execution environment for which code may be specialized. A platform could be a compiler, an operating system, a micro-architecture or some combination of these options.
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.5018974.svg)](https://doi.org/10.5281/zenodo.5018974)
[![OpenSSF Best Practices](https://www.bestpractices.dev/projects/8679/badge)](https://www.bestpractices.dev/projects/8679)

## Code Divergence
CBI measures the amount of specialization in a code base using [code divergence](http://doi.org/10.1109/P3HPC.2018.00006), which is defined as the arithmetic mean pair-wise distance between the code-paths used by each platform.
Code Base Investigator (CBI) is an analysis tool that provides insight into the
portability and maintainability of an application's source code.

At the two extremes, a code divergence of 0 means that all of the platforms use exactly the same code, while a code divergence of 1 means that there is no code shared between any of the platforms. The code divergence of real codes will fall somewhere in between.
- Measure [code divergence](http://doi.org/10.1109/P3HPC.2018.00006) to
understand how much code is specialized for different compilers, operating
systems, hardware micro-architectures and more.

## How it Works
![Abstract Syntax Tree](./docs/example-ast.png)
- Visualize the distance between the code paths used to support different
compilation targets.

CBI tracks specialization in two forms: source files that are not compiled for all platforms; and regions of source files that are guarded by C preprocessor directives (e.g. `#ifdef`). A typical run of CBI consists of a three step process:
1) Extract source files and compilation commands from a configuration file or compilation database.
2) Build an AST representing which source lines of code (LOC) are associated with each specialization.
3) Record which specializations are used by each platform.
- Identify stale, legacy code paths that are unused by any compilation target.
Contributor:

Does it directly? Or is it just by going through the gamut of finding residuals after running every possible target?

Contributor Author:

The latter, I think. Code in a file that isn't touched by any platform in the configuration file gets associated with the empty platform set, and also called out as a percentage of SLOC:

---------------------------------------------
                      Platform Set LOC % LOC
---------------------------------------------
                                {}   2  4.88
                           {GPU 1}   1  2.44
                           {GPU 2}   1  2.44
                           {CPU 2}   1  2.44
                           {CPU 1}   1  2.44
                            {FPGA}  14 34.15
                    {GPU 2, GPU 1}   6 14.63
                    {CPU 1, CPU 2}   6 14.63
{FPGA, CPU 1, GPU 2, GPU 1, CPU 2}   9 21.95
---------------------------------------------
Code Divergence: 0.55
Unused Code (%): 4.88
Total SLOC: 41

We can't identify code we don't see at all.

Contributor:

Maybe worth considering this as a feature in the future and have some routines that sniff the files to see if they could be code or not, and have a "lines of potentially stale code" value as well since unused code is still a maintenance burden.

Contributor Author:

Great idea. I've opened #86 to track this.

The legacy interface (arguably) had better support for this, because the user was required to describe all the code in the codebase (via globs) in addition to providing the compile commands. But it still wasn't perfect, because we couldn't identify source files outside of the globs. Actually walking the source directory and alerting the user to anything we find would be much more useful.


## Usage
- Export metrics and code path information required for P3 analysis using [other
tools](https://intel.github.io/p3-analysis-library/).

The `codebasin` script analyzes a code base described in a YAML configuration file and produces one or more output reports. Example configuration files can be found in the [examples](./examples) directory; see the [configuration file documentation](docs/configuration.md) for a detailed description of the configuration file format.

To see a complete list of `codebasin` options, run `codebasin -h`.
## Table of Contents

> [!IMPORTANT]
> In previous releases of Code Base Investigator, the main script was called `codebasin.py`. The old naming was a bug that needed to be fixed, and we made the difficult decision to rename the script ahead of the next major release.
- [Dependencies](#dependencies)
- [Installation](#installation)
- [Getting Started](#getting-started)
- [Contribute](#contribute)
- [License](#license)
- [Security](#security)
- [Code of Conduct](#code-of-conduct)
- [Citations](#citations)

### Summary Report
The summary report (`-R summary`) gives a high-level summary of a code base, as shown below:
```
---------------------------------------------
                      Platform Set LOC % LOC
---------------------------------------------
                                {}   2  4.88
                           {GPU 1}   1  2.44
                           {GPU 2}   1  2.44
                           {CPU 2}   1  2.44
                           {CPU 1}   1  2.44
                            {FPGA}  14 34.15
                    {GPU 2, GPU 1}   6 14.63
                    {CPU 1, CPU 2}   6 14.63
{FPGA, CPU 1, GPU 2, GPU 1, CPU 2}   9 21.95
---------------------------------------------
Code Divergence: 0.55
Unused Code (%): 4.88
Total SLOC: 41
```
Each row in the table shows the amount of code that is unique to a given set of platforms. Listed below the table are the computed code divergence, the amount of code in the code base that was not compiled for any platform, and the total size of the code base.
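
As a rough illustration of how the figures listed under the table relate to its rows, the sketch below recomputes the total size and the unused-code percentage from the example data. The `loc_by_platform_set` dictionary and the variable names are assumptions made for this example; they are not part of CBI's interface or its internal representation.

```python
# LOC per platform set, copied from the example summary report.
# NOTE: this dictionary layout is hypothetical, chosen only for illustration.
loc_by_platform_set = {
    frozenset(): 2,
    frozenset({"GPU 1"}): 1,
    frozenset({"GPU 2"}): 1,
    frozenset({"CPU 2"}): 1,
    frozenset({"CPU 1"}): 1,
    frozenset({"FPGA"}): 14,
    frozenset({"GPU 2", "GPU 1"}): 6,
    frozenset({"CPU 1", "CPU 2"}): 6,
    frozenset({"FPGA", "CPU 1", "GPU 2", "GPU 1", "CPU 2"}): 9,
}

# Every line belongs to exactly one row, so the rows sum to the total size.
total_sloc = sum(loc_by_platform_set.values())

# Code associated with the empty platform set was not compiled for any platform.
unused_loc = loc_by_platform_set.get(frozenset(), 0)
unused_pct = 100 * unused_loc / total_sloc

# Lines used by a platform: the sum of every row whose set contains it.
platforms = sorted(set().union(*loc_by_platform_set))
loc_per_platform = {
    p: sum(n for s, n in loc_by_platform_set.items() if p in s)
    for p in platforms
}

print(f"Unused Code (%): {unused_pct:.2f}")
print(f"Total SLOC: {total_sloc}")
for p in platforms:
    print(f"{p}: {loc_per_platform[p]} LOC")
```

Because each line of code appears in exactly one row, the per-platform totals overlap: the 9 lines shared by all five platforms count toward every platform's total.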

### Clustering Report
The clustering report (`-R clustering`) consists of a pair-wise distance matrix, showing the ratio of platform-specific code to code used by both platforms. These distances are the same as those used to compute code divergence.
```
Distance Matrix
-----------------------------------
       FPGA CPU 1 GPU 2 GPU 1 CPU 2
-----------------------------------
 FPGA  0.00  0.70  0.70  0.70  0.70
CPU 1  0.70  0.00  0.61  0.61  0.12
GPU 2  0.70  0.61  0.00  0.12  0.61
GPU 1  0.70  0.61  0.12  0.00  0.61
CPU 2  0.70  0.12  0.61  0.61  0.00
-----------------------------------
```
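
The values in this example are consistent with a Jaccard distance over lines of code: one minus the ratio of the code shared by two platforms to the code used by either of them. The sketch below assumes that definition (it is not taken from CBI's source) and recomputes the matrix and the code divergence from the example summary data. The `loc_by_platform_set` dictionary and the `distance` helper are hypothetical names introduced for illustration.

```python
from itertools import combinations

# LOC per platform set, copied from the example summary report.
# NOTE: hypothetical layout, not CBI's internal representation.
loc_by_platform_set = {
    frozenset(): 2,
    frozenset({"GPU 1"}): 1,
    frozenset({"GPU 2"}): 1,
    frozenset({"CPU 2"}): 1,
    frozenset({"CPU 1"}): 1,
    frozenset({"FPGA"}): 14,
    frozenset({"GPU 2", "GPU 1"}): 6,
    frozenset({"CPU 1", "CPU 2"}): 6,
    frozenset({"FPGA", "CPU 1", "GPU 2", "GPU 1", "CPU 2"}): 9,
}


def distance(a, b):
    """Jaccard distance between the code used by platforms a and b (assumed)."""
    shared = sum(n for s, n in loc_by_platform_set.items() if a in s and b in s)
    either = sum(n for s, n in loc_by_platform_set.items() if a in s or b in s)
    return 1 - shared / either if either else 0.0


platforms = ["FPGA", "CPU 1", "GPU 2", "GPU 1", "CPU 2"]

# Code divergence: arithmetic mean of the distance over all platform pairs.
pairs = list(combinations(platforms, 2))
divergence = sum(distance(a, b) for a, b in pairs) / len(pairs)

for a in platforms:
    print(f"{a:>5} " + " ".join(f"{distance(a, b):5.2f}" for b in platforms))
print(f"Code Divergence: {divergence:.2f}")
```

Under this assumption, FPGA and CPU 1 share 9 of the 30 lines that either platform uses, giving a distance of 0.70, and the mean over the ten platform pairs rounds to the reported code divergence of 0.55.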

The distances can also be used to produce a dendrogram, showing the result of hierarchical clustering by platform similarity:

![Dendrogram](./docs/example-dendrogram.png)

## Dependencies

- jsonschema
- Matplotlib
- NumPy
@@ -75,15 +41,59 @@ The distances can also be used to produce a dendrogram, showing the result of hi
- PyYAML
- SciPy

CBI and its dependencies can be installed using `setup.py`:

## Installation

The latest release of CBI is version 1.2.0. To download and install this
release, run the following:

```
python3 setup.py install
git clone --branch 1.2.0 https://github.com/intel/code-base-investigator.git
cd code-base-investigator
pip install .
```

The master branch of CBI is the development branch, and should not be used in production. Tagged releases are available [here](https://github.com/intel/code-base-investigator/releases).
We strongly recommend installing CBI within a [virtual
environment](https://docs.python.org/3/library/venv.html).

## Getting Started

After installation, run `codebasin -h` to see a complete list of options.

A full tutorial can be found in the [online
documentation](https://intel.github.io/code-base-investigator/).


## Contribute

Contributions to CBI are welcome in the form of issues and pull requests.

See [CONTRIBUTING](CONTRIBUTING.md) for more information.


## License

[BSD 3-Clause](./LICENSE)

## Contributing
See the [contribution guidelines](./CONTRIBUTING.md) for details.

## Security

See [SECURITY](SECURITY.md) for more information.

The main branch of CBI is the development branch, and should not be used in
production. Tagged releases are available
[here](https://github.com/intel/code-base-investigator/releases).


## Code of Conduct

Intel has adopted the Contributor Covenant as the Code of Conduct for all of
its open source projects. See [CODE OF CONDUCT](CODE_OF_CONDUCT.md) for more
information.


## Citations

If your use of CBI results in a research publication, please consider citing
the software and/or the papers that inspired its functionality (as
appropriate). See [CITATION](CITATION.cff) for more information.
20 changes: 20 additions & 0 deletions docs/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
78 changes: 0 additions & 78 deletions docs/configuration.md

This file was deleted.
