
Commit

Merge pull request #36 from gsbdarc/mason-edits
Mason edits two pages
mpj2104 authored Oct 24, 2024
2 parents b680d78 + c212cbf commit 50d294a
Showing 3 changed files with 78 additions and 99 deletions.
21 changes: 10 additions & 11 deletions docs/_getting_started/jupyter.md
@@ -42,6 +42,16 @@ We recommend taking a look at the [official documentation](https://jupyter-noteb

Notebooks allow you to write code and execute it within a web browser. Code is written into cells, which can be run in any order, on demand. You can also include text, images, and plots to make your code read like a lab notebook. As of March 2020, the above coding languages are supported. Contact the [DARC team](mailto:[email protected]) if you have a language you would like installed.

### Text File Editor
-------------------------
![](/assets/images/intro_to_yens/editor.png)

Finally, you can also edit text files like R scripts directly on JupyterHub. Clicking the Text File icon opens a new file that you can edit. Similarly, clicking Python File creates an empty `.py` file and clicking R File creates an empty `.r` file.
You can also navigate to a directory that contains the scripts you want to edit and double-click a script name to open it in the Text Editor.

For example, navigate to the `intro_yens_sep_2023` folder in the file browser, then double-click the `investment-npv-parallel.R` file to open it in the text editor:
![](/images/intro_to_yens/edit-r-script.png)

### RStudio
-----------
![](/images/jupyter_rstudio.png "RStudio")
@@ -345,16 +355,5 @@ JupyterHub instance will shut down after 3 hours idle (no notebooks actively run

If your processes require more than these limits, reach out to the <a href="/services/researchSupportRequest.html" target="_blank">DARC team</a> for support.


### Text File Editor
-------------------------
![](/images/intro_to_yens/editor.png)

Finally, you can also edit text files like R scripts directly on JupyterHub. Clicking the Text File icon opens a new file that you can edit. Similarly, clicking Python File creates an empty `.py` file and clicking R File creates an empty `.r` file.
You can also navigate to a directory that contains the scripts you want to edit and double-click a script name to open it in the Text Editor.

For example, navigate to the `intro_yens_sep_2023` folder in the file browser, then double-click the `investment-npv-parallel.R` file to open it in the text editor:
![](/images/intro_to_yens/edit-r-script.png)

---
<a href="/gettingStarted/7_transfer_files.html"><span class="glyphicon glyphicon-menu-left fa-lg" style="float: left;"/></a> <a href="/gettingStarted/9_run_jobs.html"><span class="glyphicon glyphicon-menu-right fa-lg" style="float: right;"/></a>
80 changes: 35 additions & 45 deletions docs/_user_guide/best_practices_parallel_processing_python.md
@@ -1,11 +1,10 @@

# Parallel Processing in Python

## Common Python Libraries (Numpy, Sklearn, Pytorch, etc...)
## Common Python Libraries (`numpy`, `sklearn`, `pytorch`, etc...)
Some Python libraries will parallelize tasks for you. A few of these libraries include `numpy`, `sklearn`, and `pytorch`. If you are working on a shared system like the Yens, you may want to limit the number of cores these packages can use. The following code should work for the packages listed above:

Some Python libraries will parallelize tasks for you. A few of these libraries include numpy, sklearn, and pytorch. If you are working on a shared system like the Yens, you may want to limit the amount of cores these packages can use. The following code should work for the packages listed above:

```python
```python title="Python Code"
import os
os.environ["OMP_NUM_THREADS"] = "6"
os.environ["OPENBLAS_NUM_THREADS"] = "6"
@@ -21,9 +20,9 @@ import torch
Note that the core count is set *before* importing the packages.

## Using the `multiprocessing` library
This library has several methods to help you parallelize your code. The most common example is using the `Pool` object. In general, the `Pool` object works by applying a processing function you've created to a number of items you need processed. Take the following example:
This library has several methods to help you parallelize your code. The most common example is using the `Pool` object. In general, the `Pool` object works by applying a processing function you've created to a number of items you need processed. Take the following example:

```python
```python title="Python Code"
from multiprocessing import Pool

def f(x):
@@ -37,7 +36,7 @@ This code will open a `Pool` of 5 processes, and execute the function `f` over e

If you've got a directory full of files you need to process, this library can be very helpful. Look at this example:

```python
```python title="Python Code"
import multiprocessing
import os

@@ -50,21 +49,22 @@ input_file_dir = '/path/to/input/files/'
output_file_dir = '/path/to/output/'

input_files = [input_file_dir+filename for filename in os.listdir(input_file_dir)]
output_files =[output_file_dir+filename for filename in os.listdir(input_file_dir)]
output_files = [output_file_dir+filename for filename in os.listdir(input_file_dir)]

with Pool(5) as p:
    results = p.map(process_file, zip(input_files,output_files))
    results = p.map(process_file, zip(input_files,output_files))
```

You can check your processes and their core usage on the Yens using `htop`!
!!! tip
    You can check your processes and their core usage on the Yens using `htop`! See [this page](/_user_guide/best_practices_monitor_usage/){:target="_blank"} for more information on monitoring your resource usage.

## Example
We will use the `numpy` and `multiprocessing` packages to do a giant matrix inversion (which will
take a long time to run, so we have time to monitor our CPU utilization).

Here is a Python script (save it as `matrix_invert.py`):

```python
```python title="Python Code"
import os

# set number of CPUs to run on
@@ -95,62 +95,52 @@ with Pool(6) as p:
    results = p.map(f, data)
```

In the above script, we are setting a few environment variables to limit the number of cores that `numpy` wants to use.
In this case, we are setting the number of cores to 1. Then using `multiprocessing` package, we can create a parallel region
in our code with the `Pool` object which will run a function `f` in parallel 100 times (for each of 100 elements in
the `data` list).

On the yens, before we start running the script, let's login in a second terminal window so we can monitor our
CPU usage. Remember to login to the same yen machine! Check `hostname` in terminal 1 then `ssh` to the same yen in terminal 2.
For example,
In the above script, we set a few environment variables to limit the number of cores that `numpy` uses; in this case, to 1. Then, using the `multiprocessing` package, we create a parallel region in our code with the `Pool` object, which applies the function `f` to each of the 100 elements in the `data` list, with up to 6 calls running at a time.
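
The pattern described in this paragraph can be sketched end to end as follows. This is a minimal illustration rather than the `matrix_invert.py` script itself: the helper names `limit_library_threads` and `busy_work` are hypothetical, and a pure-Python loop stands in for the matrix inversion so that the sketch runs without `numpy`.

```python
import os
from multiprocessing import Pool


def limit_library_threads(n_threads):
    """Cap the threads that numerical backends may use (hypothetical helper).

    Call this before importing numpy, sklearn, or torch, mirroring the
    ordering used in the script above.
    """
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[var] = str(n_threads)


def busy_work(x):
    """Stand-in for an expensive task such as a matrix inversion."""
    total = 0
    for i in range(10_000):
        total += (x * i) % 7
    return total


if __name__ == "__main__":
    limit_library_threads(1)   # 1 thread per worker process
    data = range(100)          # 100 independent tasks
    with Pool(6) as p:         # 6 worker processes
        results = p.map(busy_work, data)
    print(len(results))
```

The `if __name__ == "__main__":` guard keeps the pool from being re-created when worker processes import the module, which matters on platforms that do not fork.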

```bash
$ hostname
On the Yens, before we start running the script, let's log in from a second terminal window so we can monitor our CPU usage. Remember to log in to the same yen machine! Check which yen you are on in terminal 1 by typing `hostname`, then `ssh` to the same yen in terminal 2. For example, running `hostname` might output:

# output
yen4
```{.yaml .no-copy title="Terminal Output"}
yen4
```
I am logged in to yen4 in my terminal window, so I will connect to the same yen in the second one:

```bash
$ ssh $USER@yen4.stanford.edu:
```title="Terminal Command"
ssh $USER@yen4.stanford.edu
```
Enter your password and authenticate with Duo. We will be running `htop` in this terminal to see how many
cores our program is claiming as we change the parameters in the python script.

Run:
```bash
$ htop -u $USER
```title="Terminal Command"
htop -u $USER
```
where `$USER` is your SUNet ID.
where `$USER` is your SUNet ID.

You should see something like:

![](/images/htop-no-processes.png)
![](/assets/images/htop-no-processes.png)

No python processes are running yet.

Back in terminal one, let's start the python program and watch what happens to CPU usage with `htop` in terminal 2:
Back in terminal one, let's start the Python program and watch what happens to CPU usage with `htop` in terminal 2:

```bash
$ python3 matrix_invert.py
```title="Terminal Command"
python3 matrix_invert.py
```

Now you should see 6 python processes running with each CPU utilized close to 100%.

![](/images/htop-6-python-processes.png)
![](/assets/images/htop-6-python-processes.png)

Those 6 processes come from the `p.map()` call, where `p` is the `Pool` created with 6 processes.
The environment variable set 1 core for each of the spawned processes so we end up with 6 CPU cores being
efficiently utilized but not overloaded.
The environment variables limit each spawned process to 1 core, so we end up with 6 CPU cores being efficiently utilized but not overloaded.

#### CPU overloading with `multiprocessing`
It is easy to overload the CPU utilization and exceed 100% which will have a negative impact on performance of your code.
If we were to change `ncore` parameter to say 6 and leave `Pool` as 6, we will end up overloading the 6 cores (spawning 6 processes with 6 cores each).
It is easy to overload the CPUs and exceed 100% utilization, which hurts the performance of your code. If we change the `ncore` parameter to, say, 6 and leave `Pool` at 6, we end up overloading the 6 cores (spawning 6 processes with 6 cores each).
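
The arithmetic behind the overload is simply pool size times the per-process thread limit. The sketch below is only an illustration of that bookkeeping: `total_threads`, `is_oversubscribed`, and `core_budget` are hypothetical names, not part of `multiprocessing` or of the script in this guide.

```python
def total_threads(pool_size, threads_per_process):
    """Worst-case busy threads: every worker uses its full thread limit."""
    return pool_size * threads_per_process


def is_oversubscribed(pool_size, threads_per_process, core_budget):
    """True when the workers together may demand more cores than budgeted."""
    return total_threads(pool_size, threads_per_process) > core_budget


core_budget = 6  # cores we intend to use on the shared machine (assumed)

# (pool_size, ncore) combinations discussed in this section
for pool_size, ncore in [(6, 1), (6, 6), (1, 6)]:
    demand = total_threads(pool_size, ncore)
    status = "overloaded" if is_oversubscribed(pool_size, ncore, core_budget) else "ok"
    print(f"Pool({pool_size}) x ncore={ncore}: {demand} threads -> {status}")
```

Only the `Pool(6)` with `ncore = 6` combination exceeds the budget, at 36 threads; the other two stay at 6.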

Let's update the `ncore` to 6 in the python script, then for each process in the pool we will use `6 cores:
Let's update `ncore` to 6 in the Python script so that each process in the pool uses 6 cores:

```python
```python title="Python Code"
import os

# set number of CPUs to run on
@@ -183,15 +173,15 @@ with Pool(6) as p:

After the update, rerun the python script:

```bash
$ python3 matrix_invert.py
```title="Terminal Command"
python3 matrix_invert.py
```

and watch `htop` in the other terminal and you should see something like:

![](/images/htop-overload-cpus.png)
![](/assets/images/htop-overload-cpus.png)

There were 36 python processes spawned and CPU utilization exceeds 100% which the user would want to avoid.
There are 36 Python processes spawned and CPU utilization exceeds 100%, which we want to avoid.

{% include note.html content="Another way to limit the script to 6 cores but utilize them effiently is to set `ncore` to 6 but limit `Pool` to 1.
You can think of Pool as setting the number of different processes with multiple CPU *cores* per process." %}
!!! note
    Another way to limit the script to 6 cores but still use them efficiently is to set `ncore` to 6 and limit `Pool` to 1. You can think of `Pool` as setting the number of separate processes and `ncore` as setting the number of CPU *cores* each process may use.
76 changes: 33 additions & 43 deletions docs/blog/posts/2023-09-20-edit-files-with-vim.md
@@ -1,76 +1,66 @@
---
date:
  created: 2023-09-20
pin: true
categories:
- File Editing
authors:
- jeffotter
---
# Editing Files on the Command Line
When working within JupyterHub, one can utilize the built-in Text Editor to [directly edit scripts on the Yens](/_getting_started/jupyter/#text-file-editor){:target="_blank"}.
However, it is sometimes more convenient and faster to edit files directly from the command line, for instance if you are logged into a terminal and need to make small changes to your Slurm script prior to submission.

# Edit Files on the Command Line
In this post, we will illustrate how you can do this using the Vim text editor that comes with Linux distributions and can be used on any HPC system or server. As a specific example, we will make small changes to a Python script from the command line.

To start, Vim has several modes:

We can utilize JupyterHub Text Editor to <a href="/gettingStarted/8_jupyterhub.html#text-file-editor" target="_blank">directly edit scripts on the Yens</a>.
However, sometimes it is more convenient to quickly edit files from the command line (without needing to go to a different program like the browser
or editing on your local machine then transferring the edited files).

We will use vim text editor that comes with Linux distributions and can be used on any HPC system or server.

For the purposes of this course, we will make small changes to the slurm scripts or python scripts from the command line.
Do not be discouraged as vim is notorious for its steep learning curve! With practice, it becomes a lot easier.

Vim has several modes:
- Command mode: in which the user issues editing commands such as search, replace, delete a block and so on
but cannot type directly. When you open a new vi file, you will be in Command mode.
- Insert mode: in which the user types in edits to the files. We can switch from Command mode to the Insert mode to type in by pressing the `i` key.
When you are in Insert mode, the bottom of the editor displays `-- INSERT --`. Switch back to Command mode by pressing the `esc` key.
- **Command mode**: This is the mode you will be in when you first enter Vim. The user issues commands such as search, replace, block deletion and so on, but cannot type new content directly. It is in this mode where you can also save the edited file.
- **Insert mode**: The user types in content edits to files. We can switch from **Command mode** to this mode by pressing the `i` key. When you are in **Insert mode**, the bottom of the editor displays `-- INSERT --`. Switch back to **Command mode** by pressing the `esc` key.

Let's open up a test file and edit it:

```bash
$ vi test.py
```title="Terminal Command"
vi test.py
```

On the bottom of the editor, you should see:
```bash
On the bottom of the editor, we see:
```{.yaml .no-copy title="Terminal Output"}
"test.py" [New File]
```
which says the name of the file you are editing and signals that you are in Command mode (default mode when opening a file).
which presents the name of the file we are editing and signals that we are in **Command mode** (default mode when opening a file).

Now if we want to start typing, press `i` key and make sure the bottom of the editor now says:
```bash
Now if we want to start typing content edits, we press the `i` key and make sure the bottom of the editor now says:
```{.yaml .no-copy title="Terminal Output"}
-- INSERT --
```
which indicates that **Insert mode** is activated.

Then type some test python command:

```python
We then add a line with a test Python command:
```python title="Python Code"
print("hello world!")
```

To change the position of the cursor, we use the arrow keys or the `h`, `j`, `k`, `l` keys. This allows us to jump to a different line or move the cursor within a line.

To change position of the mouse cursor use arrow keys (to jump to a different line or position the cursor within a line)
or `h`, `j`, `k`, `l` keys.

Let's save and quit the vi editor. First, press `esc` to go back to Command mode (the bottom of the editor should not say Insert).
Then type `:wq` to write and quit vi. This should save the file and return you back to the command line.
Let's now save and quit the Vim editor. First, we press `esc` to go back to **Command mode** (the bottom of the editor should no longer show `-- INSERT --`).
Then type `:wq` to write and quit Vim. This should save the file and return you back to the command line.

After you are back on the command line, let's make sure the file is saved correctly:

```bash
$ cat test.py
```title="Terminal Command"
cat test.py
```

You should see the file's content that we created:

```py
```{.yaml .no-copy title="Terminal Output"}
print("hello world!")
```

Mostly we will stick to these three vi commands throughout the course:

- `i` : switch to Insert mode from Command mode
- `esc`: switch back to Command mode from Insert mode
- `:wq` : to save file and quit out of vi (must be in Command mode)
In summary, use:

Download a <a href="https://drive.google.com/file/d/1sBbdrk_UcfX_tfy1jgxBaomwhDWKli2T/view?usp=sharing" target="_blank">short list of useful vim commands</a> to get going with vim
or <a href="https://vim-adventures.com" target="_blank">learn vim while playing a game</a>.
- `i` : switch to **Insert mode** from **Command mode**
- `esc`: switch back to **Command mode** from **Insert mode**
- `:wq` : save file and quit out of Vim (must be in **Command mode**)

Once you get a hang of vi commands, you will have the power to edit files quickly and directly from the command line.
Finally, you can download a [short list of useful Vim commands](https://drive.google.com/file/d/1sBbdrk_UcfX_tfy1jgxBaomwhDWKli2T/view?usp=sharing){:target="_blank"} to reference while using the editor
and [learn Vim while playing a game](https://vim-adventures.com){:target="_blank"}.
