Merge pull request #283 from gridai/dev
0.8.58 Release
Esther Quansah authored Jun 7, 2022
2 parents 3a855ab + b23ad5f commit a75bb66
Showing 7 changed files with 119 additions and 5 deletions.
58 changes: 58 additions & 0 deletions changelog/2022-06-07.md
@@ -0,0 +1,58 @@
## :zap: June 7, 2022

**CLI version: 0.8.58**


## Grid Cloud Instance Types

We've made some changes to the platform that will impact start times for Sessions and Runs.

As a result of these changes, you'll experience longer start times for Sessions and Runs that use the `p3.2xlarge` instance type. If you're looking for a faster start time, we suggest using the `g4dn.xlarge` instance type instead.

**In future Grid releases, the following instance types will be supported:**

| Name | CPU | GPU | Memory (GB) | Accelerator | numberOfAccelerators acceleratorType availableMemory |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **m5a.large (recommended for fast startup times)** | 2 | 0 | 8 | CPU | 2_CPU_8GB |
| m5a.2xlarge | 8 | 0 | 32 | CPU | 8_CPU_32GB |
| **g4dn.xlarge (recommended for fast startup times)** | 4 | 1 | 16 | T4 | 1_T4_16GB |
| p3.2xlarge | 8 | 1 | 61 | V100 | 1_V100_61GB |
| p3.8xlarge | 32 | 4 | 244 | V100 | 4_V100_244GB |



### Why have we made these changes?

We closely monitor usage of Grid and are always looking for improvements that will make the platform more straightforward, easier to use, and cost-effective.
In changing how we manage certain instance types, we're able to offer faster start times on cheaper instances. Managing these instance types is a key area that will make Grid more sustainable and less expensive to use in the long term. We always want to ensure that Grid users are getting the compute resources they need at a price that is fair and transparent.

### BYOC Instance Types

If you are currently using the BYOC feature, you will continue to have access to the full list of [supported AWS instance types](../docs/platform/3_machines.md#machines). If you are not currently using BYOC and want access to or information about additional instance types, reach out to us at [email protected].


If you've got questions about these changes, reach out to us at [email protected].

## Fixes and Enhancements

- Adds UI support for [skipping parameter evaluation](../docs/features/runs/1_Creating%20Runs/1_Basic%20Runs/3_sweep-syntax.md#skipping-parameter-evaluation) when running hyperparameter sweeps

- Improvements to the process of integrating Grid with public and private GitHub organizations

- BYOC users: Fixes issue with starting runs with unavailable instance types. If the default instance type is not available, the first instance in the specified list of instances will be used instead

- Stability improvements in the UI to make analyzing experiment results a better experience

- Better error messaging in the CLI

- Fixes CLI issue where users could only retrieve the 50 most recent runs. To request details for a specific run in your run history, use `grid status RUN_NAME`
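
The BYOC fallback described in the fixes above can be pictured with a short sketch. This is only an illustration of the stated rule, not Grid's implementation: the function name `pick_instance_type` and the idea that the "specified list of instances" arrives as a plain Python list are assumptions made for the example.

```python
from typing import List


def pick_instance_type(default: str, specified: List[str]) -> str:
    """Hypothetical helper: return the default instance type if it is in the
    specified list, otherwise fall back to the first entry of that list."""
    if default in specified:
        return default
    if not specified:
        # The changelog does not say what happens with an empty list;
        # raising here is an assumption for the sketch.
        raise ValueError("no instance types were specified")
    return specified[0]


# The default is not in the list, so the first listed type is used instead.
print(pick_instance_type("t2.medium", ["g4dn.xlarge", "m5a.large"]))
```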

## :warning: Known Issues

- When creating a run in the UI, specify the path to the GitHub repo where the script is located. Providing the URL to the specific script is not currently supported.

- When creating a Datastore, data directories that contain symbolic links (soft links) will cause the Datastore upload to fail. To prevent this failure, replace the symbolic links with hard links.
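
One way to apply the symlink workaround before creating a Datastore is sketched below. It is a minimal example using only the Python standard library, not a Grid tool: the function name `replace_symlinks_with_hard_links` is made up for illustration. It assumes the link targets are regular files on the same filesystem as the data directory, since hard links cannot point at directories or cross filesystems. Links that do not meet those conditions are left untouched.

```python
import os
from typing import List


def replace_symlinks_with_hard_links(root: str) -> List[str]:
    """Replace each symlink to a regular file under `root` with a hard link.

    Returns the paths that were converted. Broken links and links to
    directories are skipped.
    """
    replaced = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            if not os.path.isfile(target):
                continue  # broken link, or the target is not a regular file
            # Create the hard link under a temporary name first, then swap it
            # into place, so the original symlink survives if linking fails.
            tmp_path = path + ".hardlink-tmp"
            os.link(target, tmp_path)
            os.replace(tmp_path, path)
            replaced.append(path)
    return replaced
```

Calling `replace_symlinks_with_hard_links("path/to/data")` on a copy of the data directory first is a cautious way to check the result before uploading.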


---

56 changes: 56 additions & 0 deletions docs/changelog.md
@@ -12,6 +12,62 @@ Upgrade your CLI with `pip install lightning-grid --upgrade`
:heart: Find us in our [Slack Community](http://gridai-community.slack.com) to say hi and/or to express your thoughts/questions.

---
## :zap: June 7, 2022

**CLI version: 0.8.58**

Today's release includes several bug fixes that improve the Grid experience.

## Grid Cloud Instance Types

We've made some changes to the platform that will impact start times for Sessions and Runs.

As a result of these changes, you'll experience longer start times for Sessions and Runs that use the `p3.2xlarge` instance type. If you're looking for a faster start time, we suggest using the `g4dn.xlarge` instance type instead.

**In future Grid releases, the following instance types will be supported:**

| Name | CPU | GPU | Memory (GB) | Accelerator | numberOfAccelerators acceleratorType availableMemory |
| :--- | :--- | :--- | :--- | :--- | :--- |
| m5a.large (recommended for fast startup times) | 2 | 0 | 8 | CPU | 2_CPU_8GB |
| m5a.2xlarge | 8 | 0 | 32 | CPU | 8_CPU_32GB |
| g4dn.xlarge (recommended for fast startup times) | 4 | 1 | 16 | T4 | 1_T4_16GB |
| p3.2xlarge | 8 | 1 | 61 | V100 | 1_V100_61GB |
| p3.8xlarge | 32 | 4 | 244 | V100 | 4_V100_244GB |



### Why have we made these changes?

We closely monitor usage of Grid and are always looking for improvements that will make the platform more straightforward, easier to use, and cost-effective.
In changing how we manage certain instance types, we're able to offer faster start times on cheaper instances. Managing these instance types is a key area that will make Grid more sustainable and less expensive to use in the long term. We always want to ensure that Grid users are getting the compute resources they need at a price that is fair and transparent.

### BYOC Instance Types

If you are currently using the BYOC feature, you will continue to have access to the full list of [supported AWS instance types](../docs/platform/3_machines.md#machines). If you are not currently using BYOC and want access to or information about additional instance types, reach out to us at [email protected].


If you've got questions about these changes, reach out to us at [email protected].

## Fixes and Enhancements

- Improvements to the process of integrating Grid with public and private GitHub organizations

- BYOC users: Fixes issue with starting runs with unavailable instance types. If the default instance type is not available, the first instance in the specified list of instances will be used instead.

- Better error messaging in the CLI!

- Fixes CLI issue where users could only retrieve the 50 most recent runs. To request details for a specific run in your run history, use `grid status RUN_NAME`.

## :warning: Known Issues

- When creating a run in the UI, specify the path to the GitHub repo where the script is located. Providing the URL to the specific script is not currently supported.

- When creating a Datastore, data directories that contain symbolic links (soft links) will cause the Datastore upload to fail. To prevent this failure, replace the symbolic links with hard links.


---


## :partying_face: May 17, 2022

**CLI version: 0.8.47**
4 changes: 2 additions & 2 deletions docs/cli.md
@@ -481,8 +481,8 @@ grid run [OPTIONS] [RUN_COMMAND]...
| `--seed` | text | Seed value for the `random_search` strategy | None |
| `--instance_type` | text | Instance type to start training session in | `t2.medium` |
| `--gpus` | integer | Number of GPUs to allocate per experiment | `0` |
| `--cpus` | integer | Number of CPUs to allocate per experiment | `1` |
| `--memory` | text | How much memory an experiment needs | `100` |
| `--cpus` | integer | Number of CPUs to allocate per experiment. This parameter also determines how much memory (RAM) each experiment receives: RAM is allocated in the same proportion as the requested CPUs are to the total CPUs of the chosen instance type. For example, on a machine with 16 CPUs and 64 GB of RAM, the default value of 1 CPU gives each experiment 1/16 * 64 GB = 4 GB of RAM. | `1` |
| `--memory` | text | How much disk storage an experiment needs, in GB | `100` |
| `--datastore_name` | text | Datastore name to be mounted in training | None |
| `--datastore_version` | integer | Datastore version to be mounted in training | None |
| `--datastore_mount_dir` | text | Directory to mount Datastore in training job. The default datastore mount location is /datastores | None |
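
The proportional RAM rule in the `--cpus` description can be checked with a few lines of arithmetic. The helper below is a sketch for illustration only; `ram_per_experiment_gb` is a made-up name and is not part of the Grid CLI.

```python
def ram_per_experiment_gb(cpus_requested: int, instance_cpus: int, instance_ram_gb: float) -> float:
    """Hypothetical helper: RAM per experiment, proportional to the share of
    the instance's CPUs that the experiment requests."""
    if not 0 < cpus_requested <= instance_cpus:
        raise ValueError("cpus_requested must be between 1 and instance_cpus")
    return cpus_requested / instance_cpus * instance_ram_gb


# The example from the option description: 1 CPU on a 16-CPU, 64 GB machine.
print(ram_per_experiment_gb(1, 16, 64))  # 4.0
```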
2 changes: 1 addition & 1 deletion docs/features/runs/1_README.md
@@ -107,4 +107,4 @@ Grid Run respects the use of .ignore files; these files are used to tell a progr
![](/images/runs/run_start.gif)

# Next Steps
Check out our documentation on [using runs](https://docs.grid.ai/features/runs/creating-basic-runs)
Check out our documentation on [using runs](https://docs.grid.ai/features/runs/Creating%20Runs/Basic%20Runs/basic-runs)
2 changes: 1 addition & 1 deletion docs/features/runs/3_faq.md
@@ -23,7 +23,7 @@ pip freeze > requirements.txt
It's as easy as running `grid artifacts my-run-name`! This will download all artifacts from the run into a new directory called `grid_artifacts`.

### From the UI
https://user-images.githubusercontent.com/47154698/146597173-30a6f5af-4ecc-4958-866a-95ddb1ba70e0.mp4
<Video src="https://user-images.githubusercontent.com/47154698/146597173-30a6f5af-4ecc-4958-866a-95ddb1ba70e0.mp4" type="video/mp4"/>

## How long are artifacts stored?
Artifacts are stored until the run or experiment that generated the artifacts is deleted.
2 changes: 1 addition & 1 deletion docs/support/2_maintneance_windows.md
@@ -13,7 +13,7 @@ Grid.ai is committed to continually minimizing the customer impact during the ma

The current maintenance window is:

- 9 AM - 10 AM Eastern Mon - Fri
- 10 AM - 11 AM Eastern Mon - Fri

:::note
The maintenance window can be shorter than the published maintenance window without notice.
Binary file added static/images/runs/downloading-artifacts.mp4
Binary file not shown.
