[Docs] Updates for SDG README
Signed-off-by: Kelly Brown <[email protected]>
kelbrown20 committed Sep 19, 2024
1 parent 432c2d1 commit 70ceff8
Showing 1 changed file with 74 additions and 1 deletion.
75 changes: 74 additions & 1 deletion README.md
@@ -1,8 +1,81 @@
# sdg
# Synthetic Data Generation (SDG)

![Lint](https://github.com/instructlab/sdg/actions/workflows/lint.yml/badge.svg?branch=main)
![Build](https://github.com/instructlab/sdg/actions/workflows/pypi.yaml/badge.svg?branch=main)
![Release](https://img.shields.io/github/v/release/instructlab/sdg)
![License](https://img.shields.io/github/license/instructlab/sdg)

Python library for Synthetic Data Generation

## Introduction

Synthetic Data Generation (SDG) is a process that creates an artificially generated dataset that mimics real data based on provided examples. SDG uses a YAML file containing question-and-answer pairs as input data.
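
To make the input format more concrete, here is a minimal sketch of how question-and-answer pairs might look once the YAML file has been parsed into Python, with a small check that each pair is complete. The field names `seed_examples`, `question`, and `answer`, the sample data, and the helper `find_invalid_examples` are assumptions for illustration and are not taken from the SDG code.

```python
from typing import Any

# Hypothetical in-memory form of a question-and-answer YAML file after parsing.
# The field names ("seed_examples", "question", "answer") are assumptions.
qna_data: dict[str, Any] = {
    "seed_examples": [
        {"question": "What is the capital of France?", "answer": "Paris."},
        {"question": "How many days are in a leap year?", "answer": "366."},
    ]
}


def find_invalid_examples(data: dict[str, Any]) -> list[int]:
    """Return the indexes of examples missing a non-empty question or answer."""
    invalid = []
    for index, example in enumerate(data.get("seed_examples", [])):
        question = str(example.get("question", "")).strip()
        answer = str(example.get("answer", "")).strip()
        if not question or not answer:
            invalid.append(index)
    return invalid
```

Running a check like this before generation can catch incomplete pairs early, since the generated dataset is modeled on the provided examples.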

## Installing the SDG library

Clone the library and navigate to the repo:

```bash
git clone https://github.com/instructlab/sdg
cd sdg
```

Install the library:

```bash
pip install .
```
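
After installing, one way to confirm that the package is visible to your interpreter is to look it up without importing it. The sketch below uses only the Python standard library. The helper name `sdg_available` is hypothetical, and the module path `instructlab.sdg` is taken from the import statements in this README.

```python
import importlib.util


def sdg_available() -> bool:
    """Return True if the instructlab.sdg package can be located."""
    try:
        # find_spec imports the parent package ("instructlab") first and
        # raises ModuleNotFoundError if that parent is not installed.
        return importlib.util.find_spec("instructlab.sdg") is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    print("instructlab.sdg found" if sdg_available() else "instructlab.sdg not found")
```

If the check reports that the package is not found, make sure `pip install .` ran in the same environment as the interpreter you are using.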

## Using the library

You can use the SDG library by importing the following items:

```python
from instructlab.sdg.generate_data import generate_data
from instructlab.sdg.utils import GenerateException
```

<!--Not sure what more you're thinking of adding here -->

## Pipelines

SDG uses four pipelines. Each pipeline has its own hardware requirements.
<!--TODO: Add explanations of pipelines-->

*Full* -

Minimum hardware requirements for running the full pipeline

*Simple* -

Minimum hardware requirements for running the simple pipeline

### Pipeline architecture

All the pipelines are written in YAML format.

Knowledge:

Grounded Skills:

Freeform Skills:

<!--TODO: Add content here-->

## Repository structure

```bash
|-- sdg/src/instructlab/pipelines/ (1)
|-- sdg/src/instructlab/configs/ (2)
|-- sdg/src/instructlab/utils/ (3)
|-- sdg/docs/ (4)
|-- sdg/scripts/ (5)
|-- sdg/tests/ (6)
```

1. Contains the YAML code that configures the SDG pipelines
2.
3.
4.
5.
6. Contains all the CI tests for the SDG repository
