DCVis is software for Visual Knowledge Discovery (VKD) developed at Central Washington University's Visual Knowledge Discovery Lab. Initiated in 2022, the project is built by Alice Williams, James Battistoni IV, Charles Recaido, and Dr. Boris Kovalerchuk.
DCVis allows users to interactively build visual ML models for classifying new data. It offers tools to explore multidimensional data using General Line Coordinate visualizations and includes features for interactive classification and synthetic data generation to enhance supervised learning models.
Please see a quick 3 minute video to watch the software in action here.
- Python 3.x: Recommended version 3.6 or higher (primarily developed on Python 3.10.6).
- Dependencies: Install libraries from requirements.txt by running
pip install -r requirements.txt
in the terminal.
To set up the project:
-
Download the project and open it in a terminal:
- Use git to download and open in an IDE with an integrated terminal.
-
Create a virtual environment:
python -m venv venv
(The virtual environment will be named
venv
) -
Activate the virtual environment: (Refer to the table below for your OS and shell)
-
Install the required libraries:
python -m pip install -r requirements.txt
-
Start the application:
python DCVIS_MAIN.py
Platform | Shell | Command to activate virtual environment |
---|---|---|
POSIX | bash/zsh | $ source <venv>/bin/activate |
fish | $ source <venv>/bin/activate.fish |
|
csh/tcsh | $ source <venv>/bin/activate.csh |
|
PowerShell | $ <venv>/bin/Activate.ps1 |
|
Windows | cmd.exe | C:\> <venv>\Scripts\activate.bat |
PowerShell | PS C:\> <venv>\Scripts\Activate.ps1 |
Source: Python Virtual Environments Documentation
Execute the DCVIS_MAIN.py
script to launch the DCVis application:
python DCVIS_MAIN.py
DCVis offers tools for visualizing and analyzing multidimensional numerical data:
- 2D Visualizations: Use OpenGL to render multidimensional data from .csv or .txt files.
- Interactive Plotting: Pan, zoom, and adjust color/transparency options.
- Dynamic Visualization: Reorder or invert axes to uncover patterns in n-D data.
- Clipping Tools: Dual-right click to highlight, analyze, export, or hide data.
- Point Selection: Select points directly on the plot for detailed examination.
- Rule-Based Classification: Create and combine classification rules to build visual ML models.
- Enhanced Learning Models: Use interactive tools and synthetic data generation to improve supervised learning classifiers.
- Configurable Display: Customize background, class, and axes colors.
- Visibility Toggles: Show or hide axes, points, and other elements.
- Trace Mode: Use alternating colors to trace data points across visualization schemes.
- Interactive UI Elements: Manage data and visualization settings with intuitive controls.
- Hotkeys and Shortcuts: Access frequently used functions with keyboard shortcuts.
These features provide an interactive, user-friendly experience for data analysis.
- Pan Plot: Click and drag with the scroll-wheel.
- Zoom: Scroll the mouse wheel.
- Box Clipping: Right-click twice to create a clipping rectangle.
- Select Point: Left-click to select single or multiple points.
- Grow Clipping Box: Middle-click once to create clipping box, again to grow it.
- Cycle Selections:
Q
andE
keys. - Adjust Vertical Position:
W
andS
keys. - Delete Samples:
D
key. - Print Samples:
P
key. - Clone Samples:
C
key. - Insert Sample:
I
key.
- Highlight Clipped Cases: Clipped cases are highlighted; use 'Add Classification Rule' to convert to a new rule.
- Rule Visibility: Toggle visibility with the checkbox next to the rule.
- Reorder Tables: Drag and drop within class and attribute tables.
- Adjust Transparency: Use the slider below the attribute table.
Dataset | Cases | Features | Classes | File Name |
---|---|---|---|---|
Fisher Iris | 150 | 4 | 3 | fisher_iris.csv |
Breast Cancer Wisconsin (30 features) | 569 | 30 | 2 | breast-cancer-wisconsin.csv |
Breast Cancer Wisconsin (9 features) | 569 | 9 | 2 | breast-cancer-wisconsin-9f.csv |
Diabetes | 768 | 8 | 2 | diabetes.csv |
Heart Disease | 1,025 | 13 | 2 | heart_disease.csv |
Ionosphere | 351 | 34 | 2 | ionosphere.csv |
MNIST Capital Letters | 20,000 | 16 | 26 | MNIST_letters.csv |
Sinusoidal Wave | 200 | 2 | 2 | sin_cos.csv |
Wheat Seed | 210 | 7 | 3 | wheat_seeds.csv |
Wine | 4,898 | 11 | 7 | wine.csv |
Fisher Iris 2 (doubled classes) | 300 | 4 | 6 | fisher_iris2.csv |
Fisher Iris Setosa vs Versicolor | 100 | 4 | 2 | fisher_iris_SvVe.csv |
Iris Setosa | 50 | 4 | 1 | iris_setosa.csv |
Iris Setosa vs Virginica | 100 | 4 | 2 | iris_S_vs_VV.csv |
Iris Versicolor vs Virginica | 100 | 4 | 2 | iris_V_vs_V.csv |
Iris Setosa vs Versicolor vs Virginica | 150 | 4 | 3 | iris_SVe_vs_Vi.csv |
Synthetic Grades (100 cases) | 100 | 3 | 3 | synthetic_grades_100case.csv |
Synthetic Grades (250 cases) | 250 | 3 | 3 | synthetic_grades_250case.csv |
Synthetic Grades (25 cases) | 25 | 3 | 3 | synthetic_grades_25case.csv |
Synthetic Grades (50 cases) | 50 | 3 | 3 | synthetic_grades_50case.csv |
Synthetic Grades (75 cases) | 75 | 3 | 3 | synthetic_grades_75case.csv |
MNIST Capital Letters A vs B | 10,000 | 16 | 2 | MNIST_letters_AvB.csv |
Artificial Test | 10 | 2 | 2 | artiftest.txt |
Datasets included meet data formatting requirements listed in the next section. Files placed in subfolders of the datasets folder are in the .gitignore to store personal data.
DCVis works with structured numerical datasets and requires a data format of:
- File Format: .txt or .csv
- Header Row: Must include headers for feature names and the label column.
- Label Column: Must be named 'class'.
- Feature Columns:
- Variable number of columns
- Labeled in the header
- Must contain numerical data only
These requirements ensure proper data recognition and processing for accurate visualizations in DCVis. Datasets included meet these requirements.
Generalized vertex class included for adding visualization methods.
This visualization tool features multiple visualisation methods:
- Parallel Coordinates (PC)
- Shifted Paired Coordinates (SPC)
- Dynamic Scaffold Coordinates 1 (DSC1)
- Dynamic Scaffold Coordinates 2 (DSC2)
- Static Circular Coordinates (SCC)
- Dynamic Circular Coordinates (DCC)
93.33% classification of Iris with three rules found in SPC
DCC Iris Setosa and Versicolor classes with LDA coefficients separation
If you encounter any difficulties with the software please submit an issue with screenshots of the behavior occurring and explanation of the expected behavior.
- [Preprint] Synthetic Data Generation and Automated Multidimensional Data Labeling for AI/ML in General and Circular Coordinates by Alice Williams and Dr. Boris Kovalerchuk
- Interpretable Machine Learning for Self-Service High-Risk Decision-Making by Charles Recaido and Dr. Boris Kovalerchuk
DCVis is a complete rebuild of the DSCVis software, with new visualizations and enhanced tools.
DCVis is licensed under the MIT License, allowing both personal and commercial use. For full details, see the LICENSE
file.