Merge pull request #36 from gsbdarc/mason-edits
Mason edits two pages
Showing 3 changed files with 78 additions and 99 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -42,6 +42,16 @@ We recommend taking a look at the [official documentation](https://jupyter-noteb | |
|
||
Notebooks allow you to write code and execute it within a web browser. Code is written into cells, which can be run in any order, on demand. You can also include text, images, and plots to make your code read like a lab notebook. As of March 2020, the above coding languages are supported. Contact the [DARC team](mailto:[email protected]) if you have a language you would like installed. | ||
|
||
### Text File Editor | ||
------------------------- | ||
![](/assets/images/intro_to_yens/editor.png) | ||
|
||
Finally, you can also edit text files like R scripts directly on JupyterHub. Clicking on the Text File icon will open a new file that you can edit. Similarly, clicking on Python File will create an empty `.py` file and clicking on R File will create an empty `.r` file. | ||
You can also navigate to a directory that has the scripts you want to edit and double-click on the script name to open it in the Text Editor. | ||
|
||
For example, navigate to the `intro_yens_sep_2023` folder in the file browser, then double-click on the `investment-npv-parallel.R` file to open it in the text editor: | ||
![](/images/intro_to_yens/edit-r-script.png) | ||
|
||
### RStudio | ||
----------- | ||
![](/images/jupyter_rstudio.png "RStudio") | ||
|
@@ -345,16 +355,5 @@ JupyterHub instance will shut down after 3 hours idle (no notebooks actively run | |
|
||
If your processes require more than these limits, reach out to the <a href="/services/researchSupportRequest.html" target="_blank">DARC team</a> for support. | ||
|
||
|
||
### Text File Editor | ||
------------------------- | ||
![](/images/intro_to_yens/editor.png) | ||
|
||
Finally, you can also edit text files like R scripts directly on JupyterHub. Clicking on the Text File icon will open a new file that you can edit. Similarly, clicking on Python File will create an empty `.py` file and clicking on R File will create an empty `.r` file. | ||
You can also navigate to a directory that has the scripts you want to edit and double-click on the script name to open it in the Text Editor. | ||
|
||
For example, navigate to the `intro_yens_sep_2023` folder in the file browser, then double-click on the `investment-npv-parallel.R` file to open it in the text editor: | ||
![](/images/intro_to_yens/edit-r-script.png) | ||
|
||
--- | ||
<a href="/gettingStarted/7_transfer_files.html"><span class="glyphicon glyphicon-menu-left fa-lg" style="float: left;"/></a> <a href="/gettingStarted/9_run_jobs.html"><span class="glyphicon glyphicon-menu-right fa-lg" style="float: right;"/></a> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,10 @@ | ||
|
||
# Parallel Processing in Python | ||
|
||
## Common Python Libraries (Numpy, Sklearn, Pytorch, etc...) | ||
## Common Python Libraries (`numpy`, `sklearn`, `pytorch`, etc...) | ||
Some Python libraries will parallelize tasks for you. A few of these libraries include `numpy`, `sklearn`, and `pytorch`. If you are working on a shared system like the Yens, you may want to limit the number of cores these packages can use. The following code should work for the packages listed above: | ||
|
||
Some Python libraries will parallelize tasks for you. A few of these libraries include numpy, sklearn, and pytorch. If you are working on a shared system like the Yens, you may want to limit the number of cores these packages can use. The following code should work for the packages listed above: | ||
|
||
```python | ||
```python title="Python Code" | ||
import os | ||
os.environ["OMP_NUM_THREADS"] = "6" | ||
os.environ["OPENBLAS_NUM_THREADS"] = "6" | ||
|
@@ -21,9 +20,9 @@ import torch | |
Note that the core count is set *before* importing the packages. | ||
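
As a minimal sketch of that ordering, the snippet below sets the thread limits first and leaves the library import as the last step. The helper name `limit_threads` is hypothetical, and `MKL_NUM_THREADS` is an assumption added for illustration (it applies to MKL-backed builds); only `OMP_NUM_THREADS` and `OPENBLAS_NUM_THREADS` appear in the snippet above.

```python
import os

# Variables read by common threading backends when they are first loaded.
# MKL_NUM_THREADS is an illustrative addition, not part of the original snippet.
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_threads(n_threads):
    """Hypothetical helper: cap the thread count before numeric imports."""
    for var in THREAD_VARS:
        os.environ[var] = str(n_threads)  # environment values must be strings
    return {var: os.environ[var] for var in THREAD_VARS}


limits = limit_threads(6)

# Import numpy, sklearn, or torch only after this point, for example:
# import numpy as np
```

Setting the variables after the import is usually too late, because the threading backend typically reads them once when it is loaded.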
|
||
## Using the `multiprocessing` library | ||
This library has several methods to help you parallelize your code. The most common example is using the `Pool` object. In general, the `Pool` object works by applying a processing function you've created to a number of items you need processed. Take the following example: | ||
This library has several methods to help you parallelize your code. The most common example is using the `Pool` object. In general, the `Pool` object works by applying a processing function you've created to a number of items you need processed. Take the following example: | ||
|
||
```python | ||
```python title="Python Code" | ||
from multiprocessing import Pool | ||
|
||
def f(x): | ||
|
@@ -37,7 +36,7 @@ This code will open a `Pool` of 5 processes, and execute the function `f` over e | |
|
||
If you've got a directory full of files you need to process, this library can be very helpful. Look at this example: | ||
|
||
```python | ||
```python title="Python Code" | ||
import multiprocessing | ||
import os | ||
|
||
|
@@ -50,21 +49,22 @@ input_file_dir = '/path/to/input/files/' | |
output_file_dir = '/path/to/output/' | ||
|
||
input_files = [input_file_dir+filename for filename in os.listdir(input_file_dir)] | ||
output_files =[output_file_dir+filename for filename in os.listdir(input_file_dir)] | ||
output_files = [output_file_dir+filename for filename in os.listdir(input_file_dir)] | ||
|
||
with Pool(5) as p: | ||
results = p.map(process_file, zip(input_files,output_files)) | ||
results = p.map(process_file, zip(input_files,output_files)) | ||
``` | ||
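
The diff does not show the body of `process_file`, so here is a self-contained sketch of the same pattern under stated assumptions: `process_file` is a hypothetical worker that upper-cases a text file, `pair_paths` is a hypothetical helper, and temporary directories stand in for the real input and output paths. Note that `Pool.map` passes each zipped `(input, output)` pair to the worker as a single tuple argument.

```python
import os
import tempfile
from multiprocessing import Pool


def process_file(paths):
    """Hypothetical worker: copy one file to its output path in upper case."""
    input_path, output_path = paths  # Pool.map passes each pair as one tuple
    with open(input_path) as src, open(output_path, "w") as dst:
        dst.write(src.read().upper())
    return output_path


def pair_paths(input_dir, output_dir):
    """Build matching (input, output) path pairs from one directory listing."""
    names = sorted(os.listdir(input_dir))
    return [(os.path.join(input_dir, n), os.path.join(output_dir, n)) for n in names]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as in_dir, tempfile.TemporaryDirectory() as out_dir:
        # Create a few small input files to stand in for real data
        for i in range(3):
            with open(os.path.join(in_dir, f"file{i}.txt"), "w") as fh:
                fh.write(f"contents of file {i}")

        # Two worker processes share the list of path pairs
        with Pool(2) as p:
            results = p.map(process_file, pair_paths(in_dir, out_dir))

        print(sorted(os.path.basename(r) for r in results))
```

Listing the directory once and building both paths from the same name keeps every input paired with the output of the same name.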
|
||
You can check your processes and their core usage on the Yens using `htop`! | ||
!!! tip | ||
You can check your processes and their core usage on the Yens using `htop`! See [this page](/_user_guide/best_practices_monitor_usage/){:target="_blank"} for more information on monitoring your resource usage. | ||
|
||
## Example | ||
We will use `numpy` and `multiprocessing` packages to do a giant matrix inversion (which will | ||
take a long time to run so we have time to monitor our CPU utilization). | ||
|
||
Here is a python script (save it as `matrix_invert.py`): | ||
|
||
```python | ||
```python title="Python Code" | ||
import os | ||
|
||
# set number of CPUs to run on | ||
|
@@ -95,62 +95,52 @@ with Pool(6) as p: | |
results = p.map(f, data) | ||
``` | ||
|
||
In the above script, we are setting a few environment variables to limit the number of cores that `numpy` wants to use. | ||
In this case, we are setting the number of cores to 1. Then using `multiprocessing` package, we can create a parallel region | ||
in our code with the `Pool` object which will run a function `f` in parallel 100 times (for each of 100 elements in | ||
the `data` list). | ||
|
||
On the yens, before we start running the script, let's login in a second terminal window so we can monitor our | ||
CPU usage. Remember to login to the same yen machine! Check `hostname` in terminal 1 then `ssh` to the same yen in terminal 2. | ||
For example, | ||
In the above script, we set a few environment variables to limit the number of cores that `numpy` uses, in this case to 1. Then, using the `multiprocessing` package, we create a parallel region in our code with the `Pool` object, which runs the function `f` over each of the 100 elements in the `data` list, 6 at a time. | ||
|
||
```bash | ||
$ hostname | ||
On the Yens, before we start running the script, let's log in from a second terminal window so we can monitor our CPU usage. Remember to log in to the same Yen machine! Check which Yen you are on in terminal 1 by typing `hostname`, then `ssh` to the same Yen in terminal 2. For example, running `hostname` might output: | ||
|
||
# output | ||
yen4 | ||
```{.yaml .no-copy title="Terminal Output"} | ||
yen4 | ||
``` | ||
I am logged in to yen4 in my terminal window, so I will connect to the same yen in the second one: | ||
|
||
```bash | ||
$ ssh $USER@yen4.stanford.edu: | ||
```title="Terminal Command" | ||
ssh [email protected] | ||
``` | ||
Enter your password and authenticate with Duo. We will be running `htop` in this terminal to see how many | ||
cores our program is claiming as we change the parameters in the python script. | ||
|
||
Run: | ||
```bash | ||
$ htop -u $USER | ||
```title="Terminal Command" | ||
htop -u $USER | ||
``` | ||
where `$USER` is your SUNet ID. | ||
where `$USER` is your SUNet ID. | ||
|
||
You should see something like: | ||
|
||
![](/images/htop-no-processes.png) | ||
![](/assets/images/htop-no-processes.png) | ||
|
||
No python processes are running yet. | ||
|
||
Back in terminal one, let's start the python program and watch what happens to CPU usage with `htop` in terminal 2: | ||
Back in terminal one, let's start the Python program and watch what happens to CPU usage with `htop` in terminal 2: | ||
|
||
```bash | ||
$ python3 matrix_invert.py | ||
```title="Terminal Command" | ||
python3 matrix_invert.py | ||
``` | ||
|
||
Now you should see 6 python processes running with each CPU utilized close to 100%. | ||
|
||
![](/images/htop-6-python-processes.png) | ||
![](/assets/images/htop-6-python-processes.png) | ||
|
||
Those 6 processes come from the `p.map()` call, where `p` is the `Pool` created with 6 worker processes. | ||
The environment variables set 1 core for each of the spawned processes so we end up with 6 CPU cores being | ||
efficiently utilized but not overloaded. | ||
The environment variables set 1 core for each of the spawned processes, so we end up with 6 CPU cores being efficiently utilized but not overloaded. | ||
|
||
#### CPU overloading with `multiprocessing` | ||
It is easy to overload the CPU utilization and exceed 100% which will have a negative impact on performance of your code. | ||
If we were to change `ncore` parameter to say 6 and leave `Pool` as 6, we will end up overloading the 6 cores (spawning 6 processes with 6 cores each). | ||
It is easy to overload the CPUs and push utilization past 100%, which will have a negative impact on the performance of your code. If we were to change the `ncore` parameter to, say, 6 and leave `Pool` at 6, we would end up overloading the 6 cores (spawning 6 processes with 6 cores each). | ||
|
||
Let's update the `ncore` to 6 in the python script, then for each process in the pool we will use `6 cores: | ||
Let's update `ncore` to 6 in the Python script, so that each process in the pool uses 6 cores: | ||
|
||
```python | ||
```python title="Python Code" | ||
import os | ||
|
||
# set number of CPUs to run on | ||
|
@@ -183,15 +173,15 @@ with Pool(6) as p: | |
|
||
After the update, rerun the python script: | ||
|
||
```bash | ||
$ python3 matrix_invert.py | ||
```title="Terminal Command" | ||
python3 matrix_invert.py | ||
``` | ||
|
||
and watch `htop` in the other terminal and you should see something like: | ||
|
||
![](/images/htop-overload-cpus.png) | ||
![](/assets/images/htop-overload-cpus.png) | ||
|
||
There were 36 python processes spawned and CPU utilization exceeds 100% which the user would want to avoid. | ||
There are 36 python processes spawned and CPU utilization exceeds 100% which the user would want to avoid. | ||
|
||
{% include note.html content="Another way to limit the script to 6 cores but utilize them efficiently is to set `ncore` to 6 but limit `Pool` to 1. | ||
You can think of Pool as setting the number of different processes with multiple CPU *cores* per process." %} | ||
!!! note | ||
    Another way to limit the script to 6 cores but utilize them efficiently is to set `ncore` to 6 but limit `Pool` to 1. You can think of `Pool` as setting the number of different processes, with multiple CPU *cores* per process. |
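
The arithmetic behind these configurations can be sketched as follows. The function names and the 6-core budget are illustrative assumptions; the point is only that the number of busy threads is roughly the pool size multiplied by the per-process thread limit.

```python
def total_threads(pool_size, ncore):
    """Approximate number of busy threads: worker processes times threads each."""
    return pool_size * ncore


def is_oversubscribed(pool_size, ncore, core_budget):
    """True when the configuration asks for more threads than the core budget."""
    return total_threads(pool_size, ncore) > core_budget


CORE_BUDGET = 6  # assumed number of cores we intend to use

# The three configurations discussed above: (pool size, ncore)
for pool_size, ncore in [(6, 1), (6, 6), (1, 6)]:
    over = is_oversubscribed(pool_size, ncore, CORE_BUDGET)
    status = "oversubscribed" if over else "ok"
    print(f"Pool({pool_size}) with ncore={ncore}: "
          f"{total_threads(pool_size, ncore)} threads, {status}")
```

Only the `Pool(6)` with `ncore` of 6 case exceeds the budget, which matches the 36 threads seen in `htop`.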
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,76 +1,66 @@ | ||
--- | ||
date: | ||
created: 2023-09-20 | ||
pin: true | ||
categories: | ||
- File Editing | ||
authors: | ||
- jeffotter | ||
--- | ||
# Editing Files on the Command Line | ||
When working within JupyterHub, one can utilize the built-in Text Editor to [directly edit scripts on the Yens](/_getting_started/jupyter/#text-file-editor){:target="_blank"}. | ||
However, it is sometimes more convenient and faster to edit files directly from the command line, for instance if you are logged into a terminal and need to make small changes to your Slurm script prior to submission. | ||
|
||
# Edit Files on the Command Line | ||
In this post, we will illustrate how you can do this using the Vim text editor that comes with Linux distributions and can be used on any HPC system or server. As a specific example, we will make small changes to a Python script from the command line. | ||
|
||
To start, Vim has several modes: | ||
|
||
We can utilize JupyterHub Text Editor to <a href="/gettingStarted/8_jupyterhub.html#text-file-editor" target="_blank">directly edit scripts on the Yens</a>. | ||
However, sometimes it is more convenient to quickly edit files from the command line (without needing to go to a different program like the browser | ||
or editing on your local machine then transferring the edited files). | ||
|
||
We will use vim text editor that comes with Linux distributions and can be used on any HPC system or server. | ||
|
||
For the purposes of this course, we will make small changes to the slurm scripts or python scripts from the command line. | ||
Do not be discouraged as vim is notorious for its steep learning curve! With practice, it becomes a lot easier. | ||
|
||
Vim has several modes: | ||
- Command mode: in which the user issues editing commands such as search, replace, delete a block and so on | ||
but cannot type directly. When you open a new vi file, you will be in Command mode. | ||
- Insert mode: in which the user types in edits to the files. We can switch from Command mode to the Insert mode to type in by pressing the `i` key. | ||
When you are in Insert mode, the bottom of the editor displays `-- INSERT --`. Switch back to Command mode by pressing the `esc` key. | ||
- **Command mode**: This is the mode you will be in when you first enter Vim. The user issues commands such as search, replace, block deletion and so on, but cannot type new content directly. It is in this mode where you can also save the edited file. | ||
- **Insert mode**: The user types in content edits to files. We can switch from **Command mode** to this mode by pressing the `i` key. When you are in **Insert mode**, the bottom of the editor displays `-- INSERT --`. Switch back to **Command mode** by pressing the `esc` key. | ||
|
||
Let's open up a test file and edit it: | ||
|
||
```bash | ||
$ vi test.py | ||
```title="Terminal Command" | ||
vi test.py | ||
``` | ||
|
||
On the bottom of the editor, you should see: | ||
```bash | ||
On the bottom of the editor, we see: | ||
```{.yaml .no-copy title="Terminal Output"} | ||
"test.py" [New File] | ||
``` | ||
which says the name of the file you are editing and signals that you are in Command mode (default mode when opening a file). | ||
which presents the name of the file we are editing and signals that we are in **Command mode** (default mode when opening a file). | ||
|
||
Now if we want to start typing, press `i` key and make sure the bottom of the editor now says: | ||
```bash | ||
Now if we want to start typing content edits, we press the `i` key and make sure the bottom of the editor now says: | ||
```{.yaml .no-copy title="Terminal Output"} | ||
-- INSERT -- | ||
``` | ||
which indicates that **Insert mode** is activated. | ||
|
||
Then type some test python command: | ||
|
||
```python | ||
We then add in a line with a test python command: | ||
```python title="Python Code" | ||
print("hello world!") | ||
``` | ||
|
||
To change the position of the cursor, we use the arrow keys or the `h`, `j`, `k`, `l` keys. This will allow us to jump to a different line or position the cursor within a line. | ||
|
||
To change position of the mouse cursor use arrow keys (to jump to a different line or position the cursor within a line) | ||
or `h`, `j`, `k`, `l` keys. | ||
|
||
Let's save and quit the vi editor. First, press `esc` to go back to Command mode (the bottom of the editor should not say Insert). | ||
Then type `:wq` to write and quit vi. This should save the file and return you back to the command line. | ||
Let's now save and quit the Vim editor. First, we press `esc` to go back to **Command mode** (the bottom of the editor should no longer show `-- INSERT --`). | ||
Then type `:wq` to write and quit Vim. This should save the file and return you back to the command line. | ||
|
||
After you are back on the command line, let's make sure the file is saved correctly: | ||
|
||
```bash | ||
$ cat test.py | ||
```title="Terminal Command" | ||
cat test.py | ||
``` | ||
|
||
You should see the file's content that we created: | ||
|
||
```py | ||
```{.yaml .no-copy title="Terminal Output"} | ||
print("hello world!") | ||
``` | ||
|
||
Mostly we will stick to these three vi commands throughout the course: | ||
|
||
- `i` : switch to Insert mode from Command mode | ||
- `esc`: switch back to Command mode from Insert mode | ||
- `:wq` : to save file and quit out of vi (must be in Command mode) | ||
In summary, use: | ||
|
||
Download a <a href="https://drive.google.com/file/d/1sBbdrk_UcfX_tfy1jgxBaomwhDWKli2T/view?usp=sharing" target="_blank">short list of useful vim commands</a> to get going with vim | ||
or <a href="https://vim-adventures.com" target="_blank">learn vim while playing a game</a>. | ||
- `i` : switch to **Insert mode** from **Command mode** | ||
- `esc`: switch back to **Command mode** from **Insert mode** | ||
- `:wq` : save file and quit out of Vim (must be in **Command mode**) | ||
|
||
Once you get a hang of vi commands, you will have the power to edit files quickly and directly from the command line. | ||
Finally, you can download a [short list of useful Vim commands](https://drive.google.com/file/d/1sBbdrk_UcfX_tfy1jgxBaomwhDWKli2T/view?usp=sharing){:target="_blank"} to reference while using the editor | ||
and [learn Vim while playing a game](https://vim-adventures.com){:target="_blank"}. |