Chatbot parser generic output part 1 #676

Draft: wants to merge 153 commits into base: main

Commits (153)
1ebc363
initial commit
EwDa291 Aug 8, 2024
34df842
Merge branch 'hpcugent:main' into chatbot_parser
EwDa291 Aug 8, 2024
10edb20
some cleanup
EwDa291 Aug 8, 2024
85a93ec
used jinja to replace macros
EwDa291 Aug 9, 2024
dfff5fa
adapt if-mangler to accommodate for nested if-clauses
EwDa291 Aug 9, 2024
649ddec
adapt the parser to take all files as input, not all files get parsed…
EwDa291 Aug 9, 2024
2116d6e
adapt the parser to take all files as input, not all files get parsed…
EwDa291 Aug 9, 2024
159aa62
small update, not important
EwDa291 Aug 9, 2024
75765e5
change to the templates
EwDa291 Aug 9, 2024
57d9cfe
change to accommodate for more nested if-clauses
EwDa291 Aug 9, 2024
75d345b
Delete scripts/HPC chatbot preprocessor/start_checker.py
EwDa291 Aug 9, 2024
ff7a9fc
make sure files with duplicate names between normal files and linux-t…
EwDa291 Aug 12, 2024
47a33b7
Merge branch 'chatbot_parser' of https://github.com/EwDa291/vsc_user_…
EwDa291 Aug 12, 2024
7d279d6
fixed the problem of some files being written in reST instead of mark…
EwDa291 Aug 12, 2024
8047572
some small fixes
EwDa291 Aug 12, 2024
7d1c5ed
remove try-except-structure
EwDa291 Aug 13, 2024
984b0cd
collapse all code into one file
EwDa291 Aug 13, 2024
8f5eeaa
Rename file
EwDa291 Aug 13, 2024
2b97b7a
cleanup repository
EwDa291 Aug 13, 2024
b595301
Rename directory
EwDa291 Aug 13, 2024
90c8ab7
add a main function
EwDa291 Aug 13, 2024
b8ae706
make file paths non os-specific
EwDa291 Aug 13, 2024
b751497
use docstrings to document the functions
EwDa291 Aug 13, 2024
0f8eb5d
rewrite the if-mangler to make it more readable
EwDa291 Aug 13, 2024
9938e92
got rid of most global variables
EwDa291 Aug 13, 2024
508b22c
fixed some issues with if statements
EwDa291 Aug 13, 2024
a25ce2d
fixed some issues with if statements
EwDa291 Aug 13, 2024
80d0535
got rid of all global variables
EwDa291 Aug 13, 2024
9163a75
small changes to make file more readable
EwDa291 Aug 14, 2024
1dcffc1
codeblocks, tips, warnings and info reformatted
EwDa291 Aug 14, 2024
4d7fbdb
small optimisations
EwDa291 Aug 14, 2024
671f7f3
small optimisations
EwDa291 Aug 14, 2024
e5c39bd
initial commit
EwDa291 Aug 14, 2024
c6492fc
added requirements
EwDa291 Aug 14, 2024
aff8198
added requirements and usage info
EwDa291 Aug 14, 2024
a981002
minor changes to the print statements
EwDa291 Aug 14, 2024
1f3b343
reworked function to take care of html structures
EwDa291 Aug 16, 2024
b6388d3
Merge branch 'hpcugent:main' into chatbot_parser
EwDa291 Aug 16, 2024
48cad97
filter out images
EwDa291 Aug 16, 2024
df58f23
get rid of backquotes, asterisks, pluses and underscores used for for…
EwDa291 Aug 16, 2024
c423e07
dump to json files instead of txt files
EwDa291 Aug 16, 2024
2c333fe
cleaned up parser with macros
EwDa291 Aug 16, 2024
ce52352
cleaned up parser with macros
EwDa291 Aug 16, 2024
5db34af
cleaned up parser with macros
EwDa291 Aug 16, 2024
4226d28
Update README.md
EwDa291 Aug 19, 2024
d730a26
Update README.md
EwDa291 Aug 19, 2024
f3182e3
added section about restrictions on input files
EwDa291 Aug 19, 2024
aee54de
Merge branch 'hpcugent:main' into chatbot_parser
EwDa291 Aug 19, 2024
675bec5
adapted section about restrictions on input files
EwDa291 Aug 19, 2024
f1e58ef
adapted section about restrictions on input files
EwDa291 Aug 19, 2024
2bf1075
Merge branch 'chatbot_parser' of https://github.com/EwDa291/vsc_user_…
EwDa291 Aug 19, 2024
a168509
change variables to be lowercase
EwDa291 Aug 19, 2024
09b86c9
take out some copy pasting
EwDa291 Aug 19, 2024
f95b99e
added warning about long filepaths
EwDa291 Aug 19, 2024
06bb7b9
fixing typos
EwDa291 Aug 19, 2024
2f3e5b3
take out copy pasting
EwDa291 Aug 19, 2024
0c4dbe8
first draft version of the restructured script to accommodate for the…
EwDa291 Aug 20, 2024
38c4572
added support to filter out collapsable admonitions
EwDa291 Aug 20, 2024
5cbd653
attempt at fix for problems with jinja include, not working yet
EwDa291 Aug 20, 2024
0e6f8b2
fixed an issue with jinja templates
EwDa291 Aug 21, 2024
cd77837
added docstrings to new functions
EwDa291 Aug 21, 2024
98eb695
only add necessary if-statements in front of non-if-complete sections
EwDa291 Aug 21, 2024
27457e3
fixed some more jinja problems
EwDa291 Aug 21, 2024
bb72287
implemented extra test to make sure generic files dont accidentally g…
EwDa291 Aug 21, 2024
67cb19e
make sure empty os-specific files are not saved
EwDa291 Aug 21, 2024
cf9834a
clean up unused code
EwDa291 Aug 21, 2024
da32459
introduce more macros
EwDa291 Aug 21, 2024
093200b
reintroduce logic to remove unnecessary directories
EwDa291 Aug 21, 2024
5d0ffe9
added functionality to include links or leave them out
EwDa291 Aug 21, 2024
a3e34a9
added functionality to include links or leave them out
EwDa291 Aug 21, 2024
7c6154b
adapt filenames to allow for splitting on something other than subtitles
EwDa291 Aug 21, 2024
8d5b50d
making some changes to prepare to add paragraph level splitting tomorrow
EwDa291 Aug 21, 2024
0c10376
making some changes to prepare to add paragraph level splitting tomorrow
EwDa291 Aug 21, 2024
f8ee860
making some changes to prepare to add paragraph level splitting tomorrow
EwDa291 Aug 21, 2024
6533733
adapted the parsing script to allow for testing in a semi-efficient way
EwDa291 Aug 21, 2024
2e7a00f
added test for make_valid_title
EwDa291 Aug 21, 2024
f5e0579
removed useless lines from testscript
EwDa291 Aug 21, 2024
6757b4f
First attempt at splitting in paragraphs (need for other fixes for ti…
EwDa291 Aug 22, 2024
6d9558d
make two functions for different ways of dividing the text
EwDa291 Aug 22, 2024
2c7025a
added docstrings to new functions
EwDa291 Aug 22, 2024
ae99bb9
update test for valid titles
EwDa291 Aug 22, 2024
084b421
fixed problem with splitting os-specific text (metadata not fixed yet)
EwDa291 Aug 22, 2024
cf7f5f0
fix for metadata of os-specific sections
EwDa291 Aug 22, 2024
b7c10d3
clean up temporary version
EwDa291 Aug 22, 2024
4a441f3
added command line options for custom macros
EwDa291 Aug 22, 2024
662134f
small fix to macros
EwDa291 Aug 22, 2024
05eab4a
clean up test for valid title
EwDa291 Aug 22, 2024
b85a8fb
add a test for write_metadata
EwDa291 Aug 22, 2024
39a3c99
added functionality to split on paragraphs
EwDa291 Aug 23, 2024
af9e6cc
clean up
EwDa291 Aug 23, 2024
f4163a7
clean up
EwDa291 Aug 23, 2024
833f964
further clean up and added shebang
EwDa291 Aug 23, 2024
79b1a56
clean up
EwDa291 Aug 23, 2024
cec154c
added test for if mangler
EwDa291 Aug 23, 2024
2f4a277
clean up
EwDa291 Aug 23, 2024
cd0c8eb
clean up customizable options
EwDa291 Aug 23, 2024
3be262a
further adapt the script to be able to test it
EwDa291 Aug 26, 2024
1d32aab
make changes to usage in command line to be more intuitive
EwDa291 Aug 26, 2024
5902c96
first revised version of the README
EwDa291 Aug 26, 2024
6f97d5f
Merge branch 'hpcugent:main' into chatbot_parser
EwDa291 Aug 26, 2024
6e48800
added docstring to main function
EwDa291 Aug 26, 2024
0bc440b
include chatbot_prepprocessor
EwDa291 Aug 26, 2024
e6e6023
added options for source and destination directories
EwDa291 Aug 26, 2024
a6d99d9
cleanup
EwDa291 Aug 26, 2024
2be834f
cleanup
EwDa291 Aug 26, 2024
532543a
cleanup
EwDa291 Aug 26, 2024
107464e
relocate test files
EwDa291 Aug 26, 2024
dd64381
update arguments of if mangler
EwDa291 Aug 26, 2024
ef3fd58
relocate full test files
EwDa291 Aug 26, 2024
4d7db8f
Revert "update arguments of if mangler"
EwDa291 Aug 26, 2024
df9bac5
Revert "relocate full test files"
EwDa291 Aug 26, 2024
631d9e9
update test to adapt to new arguments in if mangler
EwDa291 Aug 26, 2024
c6e600d
relocated full test files
EwDa291 Aug 26, 2024
d1c6194
Rename test_paragraph_split_1.md to test_paragraph_split_1_input.md
EwDa291 Aug 26, 2024
695ffd6
Rename test_title_split_1.md to test_title_split_1_input.md
EwDa291 Aug 26, 2024
af4832b
smal fix
EwDa291 Aug 26, 2024
8805c8c
test text for paragraph split
EwDa291 Aug 26, 2024
a265ffd
start of a fix for double title problem, not done yet
EwDa291 Aug 26, 2024
6c2a61c
Fix for double title bug when splitting on paragraph
EwDa291 Aug 27, 2024
ed08879
Fix bug for empty linklist in metadata
EwDa291 Aug 27, 2024
176af13
fix bug where too many directories were sometimes created
EwDa291 Aug 27, 2024
d4ceac8
test of full script, test files not ready to be pushed yet
EwDa291 Aug 27, 2024
815a863
updated requirements.txt
EwDa291 Aug 27, 2024
d15469f
updated docstring in main function
EwDa291 Aug 27, 2024
daa6b36
add support for comments for the bot to be included in the source files
EwDa291 Aug 27, 2024
4c19f44
changed the default for min paragraph length
EwDa291 Aug 27, 2024
9a6ff58
added test files for full script test
EwDa291 Aug 27, 2024
56543f0
small fix for double title bug
EwDa291 Aug 27, 2024
52a3861
added examples of output of the script when splitting on paragraphs w…
EwDa291 Aug 27, 2024
692e77b
fix for issue with html links
EwDa291 Aug 27, 2024
7f493a1
fix for issue with html links
EwDa291 Aug 27, 2024
0e34396
fix for issue with relative links to the same document
EwDa291 Aug 27, 2024
fa00044
added test for replace_markdown_markers
EwDa291 Aug 27, 2024
b3952b2
fix to small inconsistency in metadata
EwDa291 Aug 27, 2024
73072bf
added test for insert_links
EwDa291 Aug 27, 2024
3161309
make sure paragraphs only include full lists
EwDa291 Aug 28, 2024
7d4d7f9
Merge branch 'hpcugent:main' into chatbot_parser
EwDa291 Aug 28, 2024
3407be3
adapted to the new source files
EwDa291 Aug 28, 2024
6d04bbc
add source-directory to metadata and verbose mode
EwDa291 Aug 28, 2024
f33cfb3
added verbose mode
EwDa291 Aug 28, 2024
1c389d7
Merge branch 'hpcugent:main' into chatbot_parser
EwDa291 Aug 28, 2024
3227f19
Added limitation on lists
EwDa291 Aug 29, 2024
67aed53
fix for non os-specific if-statement not being recognised
EwDa291 Aug 29, 2024
9e297b1
new test for links
EwDa291 Aug 29, 2024
b6b8610
new test to make sure lists are kept as one section
EwDa291 Aug 29, 2024
57a2139
updated test_file for list test
EwDa291 Aug 29, 2024
170a10c
dropped <> around links and started new function to calculate length …
EwDa291 Aug 30, 2024
04efff6
removed parsed mds
EwDa291 Aug 30, 2024
1ef1f10
Changed paragraphs to decide length based on tokens instead of charac…
EwDa291 Aug 30, 2024
621c0a3
Changed paragraphs to decide length based on tokens instead of charac…
EwDa291 Aug 30, 2024
adf364d
Changed paragraphs to decide length based on tokens instead of charac…
EwDa291 Aug 30, 2024
32f884d
Added output of chatbot_parser script part 1
EwDa291 Aug 30, 2024
22b62de
removing unnecessary files
EwDa291 Aug 30, 2024
@@ -0,0 +1,33 @@
Frequently Asked Questions (FAQ)
Reviewer comment: the FAQ needs to be treated separately: question and answer are one document. This intro block can be a separate block as well.

New users should consult the Introduction to HPC
Reviewer comment: or move this intro into a FAQ entry of its own: "Where to start when using the HPC", or something in more proper English.

to get started, which is a great resource for learning the basics, troubleshooting, and looking up specifics.
If you want to use software that's not yet installed on the HPC, send us a
software installation request.
Overview of HPC-UGent Tier-2 infrastructure
Reviewer comment: loose sentence

Composing a job
How many cores/nodes should I request?
An important factor in this question is how well your task is being parallelized:
Reviewer comment: is being -> can be

does it actually run faster with more resources? You can test this yourself:
start with 4 cores, then 8, then 16... The execution time should each time be reduced to
around half of what it was before. You can also try this with full nodes: 1 node, 2 nodes.
A rule of thumb is that you're around the limit when you double the resources but the
execution time is still ~60-70% of what it was before. That's a signal to stop increasing the core count.
Reviewer comment: core -> core or node

See also: Running batch jobs.
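For illustration, a hypothetical scaling test as described above (the timings are made up; only the pattern matters):

4 cores:  100 min
8 cores:   55 min
16 cores:  30 min
32 cores:  21 min   (doubling the resources only gained ~30%, so stop around this point)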
Which packages are available?
Reviewer comment: maybe add a special comment for the LLM with some alias questions, like "What software is available".

When connected to the HPC, use the commands module avail [search_text] and module spider [module]
Reviewer comment: remove the []

to find installed modules and get information on them.
Among others, many packages for both Python and R are readily available on the HPC.
These aren't always easy to find, though, as we've bundled them together.
Specifically, the module SciPy-bundle includes numpy, pandas, scipy and a few others.
Reviewer comment: specifically -> in particular

For R, the normal R module has many libraries included. The bundle R-bundle-Bioconductor
contains more libraries.
Reviewer comment: yeah, sure. Maybe add more common bioinformatics packages.

Use the command module spider [module] to find the specifics on these bundles.
Reviewer comment: remove []

If the package or library you want is not available, send us a
software installation request.
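For illustration, a typical lookup of such a bundle (the version shown is a hypothetical example; check module avail for what is actually installed):

$ module avail SciPy-bundle            # list the installed SciPy-bundle versions
$ module spider SciPy-bundle/2023.07   # show details, including the numpy/pandas/scipy versions it bundles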
How do I choose the job modules?
Modules each come with a suffix that describes the toolchain used to install them.
Examples:
AlphaFold/2.2.2-foss-2021a*
tqdm/4.61.2-GCCcore-10.3.0*
Python/3.9.5-GCCcore-10.3.0*
matplotlib/3.4.2-foss-2021a*
Reviewer comment: faq entries can't be split

@@ -0,0 +1,19 @@
{
"main_title": "FAQ",
"subtitle": "How-do-I-choose-the-job-modules",
"source_file": "../../mkdocs/docs/HPC/FAQ.md",
"title_depth": 3,
"directory": "FAQ",
"links": {
"0": "https://www.ugent.be/hpc/en/training/2023/introhpcugent",
"1": "https://www.ugent.be/hpc/en/support/software-installation-request",
"2": "https://www.ugent.be/hpc/en/infrastructure",
"3": "https://docs.hpc.ugent.be/running_batch_jobs",
"4": "https://www.ugent.be/hpc/en/support/software-installation-request"
},
"parent_title": "",
"previous_title": null,
"next_title": "FAQ_paragraph_2",
"OS": "generic",
"reference_link": "https://docs.hpc.ugent.be/FAQ/#how-do-i-choose-the-job-modules"
}
@@ -0,0 +1,34 @@
Modules from the same toolchain always work together, and modules from a
\*different version of the same toolchain\* never work together.
The above set of modules works together: an overview of compatible toolchains can be found here:
https://docs.easybuild.io/en/latest/Common-toolchains.html#overview-of-common-toolchains.
Reviewer comment: that URL is way too detailed. Also, the rows should be re-sorted to have the most recent modules first (or clean up the really old ones).

Reviewer comment: the issue is probably that the subsection reference is broken, and the current link points to the beginning of the page. https://docs.easybuild.io/common-toolchains/#common_toolchains_overview is probably what you want.

You can use module avail [search_text] to see which versions on which toolchains are available to use.
Reviewer comment: no []

Reviewer comment: on which -> of which

If you need something that's not available yet, you can request it through a
software installation request.
It is possible to use the modules without specifying a version or toolchain. However,
this will probably cause incompatible modules to be loaded. Don't do it if you use multiple modules.
Even if it works now, as more modules get installed on the HPC, your job can suddenly break.
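As a sketch of the difference, using the example modules listed above:

$ module load Python/3.9.5-GCCcore-10.3.0 matplotlib/3.4.2-foss-2021a   # explicit versions from matching toolchains
$ module load Python                                                    # avoid: loads some default version that may clash with other modules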
Troubleshooting
My modules don't work together
When incompatible modules are loaded, you might encounter an error like this:
Lmod has detected the following error: A different version of the 'GCC' module
is already loaded (see output of 'ml').
You should load another foss module for that is compatible with the currently
Reviewer comment: for that -> that

loaded version of GCC.
Use ml spider foss to get an overview of the available versions.
Modules from the same toolchain always work together, and modules from a
_different version of the same toolchain_ never work together.
Reviewer comment: remove the _

An overview of compatible toolchains can be found here:
https://docs.easybuild.io/en/latest/Common-toolchains.html#overview-of-common-toolchains.
Reviewer comment: broken link, well, the anchor doesn't exist

See also: How do I choose the job modules?
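A minimal sketch of how to inspect the conflict (the module name is only an example):

$ ml                # 'ml' on its own lists the currently loaded modules
$ ml spider foss    # overview of the available foss toolchain versions; pick one and stick to it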
My job takes longer than 72 hours
The 72 hour walltime limit will not be extended. However, you can work around this barrier:
* Check that all available resources are being used. See also:
* How many cores/nodes should I request?.
* My job is slow.
* My job isn't using any GPUs.
* Use a faster cluster.
* Divide the job into more parallel processes.
* Divide the job into shorter processes, which you can submit as separate jobs (see the sketch after this list).
* Use the built-in checkpointing of your software.
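A hedged sketch of that splitting approach, using standard PBS/Torque job dependencies (the exact options may differ per cluster; the job script names are hypothetical):

$ FIRST=$(qsub part1.sh)                  # qsub prints the job id of the submitted job
$ qsub -W depend=afterok:$FIRST part2.sh  # part2 only starts after part1 finished successfully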
Job failed: SEGV Segmentation fault
@@ -0,0 +1,20 @@
{
"main_title": "FAQ",
"subtitle": "Job-failed-SEGV-Segmentation-fault",
"source_file": "../../mkdocs/docs/HPC/FAQ.md",
"title_depth": 3,
"directory": "FAQ",
"links": {
"0": "https://www.ugent.be/hpc/en/support/software-installation-request",
"1": "https://docs.hpc.ugent.be/FAQ/#how-do-i-choose-the-job-modules",
"2": "https://docs.hpc.ugent.be/FAQ/#how-many-coresnodes-should-i-request",
"3": "https://docs.hpc.ugent.be/FAQ/#my-job-runs-slower-than-i-expected",
"4": "https://docs.hpc.ugent.be/FAQ/#my-job-isnt-using-any-gpus",
"5": "https://www.ugent.be/hpc/en/infrastructure"
},
"parent_title": "",
"previous_title": "FAQ_paragraph_1",
"next_title": "FAQ_paragraph_3",
"OS": "generic",
"reference_link": "https://docs.hpc.ugent.be/FAQ/#job-failed-segv-segmentation-fault"
}
@@ -0,0 +1,28 @@
Any error mentioning SEGV or Segmentation fault/violation has something to do with a memory error.
If you weren't messing around with memory-unsafe applications or programming, your job probably hit its memory limit.
When there's no memory amount specified in a job script, your job will get access to a proportional
share of the total memory on the node: If you request a full node, all memory will be available.
If you request 8 cores on a cluster where nodes have 2x18 cores, you will get 8/36 = 2/9
of the total memory on the node.
Try requesting a bit more memory than your proportional share, and see if that solves the issue.
See also: Specifying memory requirements.
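For example, a sketch of requesting memory explicitly in a job script (the amount and the exact resource keyword are assumptions here; the linked section has the details for this cluster):

#PBS -l nodes=1:ppn=8
#PBS -l mem=20gb     # ask for an explicit amount instead of relying on the proportional default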
My compilation/command fails on login node
When logging in, you are using a connection to the login nodes. There are somewhat strict
limitations on what you can do in those sessions: check out the output of ulimit -a.
Reviewer comment: what is this about? And ulimit -a will most likely not show the actual limits.

Specifically, the memory and the amount of processes you can use may present an issue.
This is common with MATLAB compilation and Nextflow. An error caused by the login session
limitations can look like this: Aborted (core dumped).
It's easy to get around these limitations: start an interactive session on one of the clusters.
Then, you are acting as a node on that cluster instead of a login node. Notably, the
Reviewer comment: acting as a node? what does that mean?

debug/interactive cluster will grant such a session immediately, while other clusters might make you wait a bit.
Example command: ml swap cluster/donphan && qsub -I -l nodes=1:ppn=8
See also: Running interactive jobs.
My job isn't using any GPUs
Only two clusters have GPUs. Check out the infrastructure overview,
Reviewer comment: well, donphan also has a gpu

to see which one suits your needs. Make sure that you manually switch to the GPU cluster before you submit
the job. Inside the job script, you need to explicitly request the GPUs:
#PBS -l nodes=1:ppn=24:gpus=2
Some software modules don't have GPU support, even when running on the GPU cluster. For example,
when running module avail alphafold on the joltik cluster, you will find versions on both
the foss toolchain and the fossCUDA toolchain. Of these, only the CUDA versions will
use GPU power. When in doubt, CUDA means GPU support.
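For illustration, checking this before submitting (cluster and module names as used in the text above):

$ module swap cluster/joltik    # switch your session to a GPU cluster first
$ module avail AlphaFold        # only the versions on a CUDA-enabled toolchain will use the GPUs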
@@ -0,0 +1,18 @@
{
"main_title": "FAQ",
"subtitle": "My-job-isn't-using-any-GPUs",
"source_file": "../../mkdocs/docs/HPC/FAQ.md",
"title_depth": 3,
"directory": "FAQ",
"links": {
"0": "https://docs.hpc.ugent.be/fine_tuning_job_specifications/#specifying-memory-requirements",
"1": "https://docs.hpc.ugent.be/interactive_debug",
"2": "https://docs.hpc.ugent.be/running_interactive_jobs",
"3": "https://www.ugent.be/hpc/en/infrastructure"
},
"parent_title": "",
"previous_title": "FAQ_paragraph_2",
"next_title": "FAQ_paragraph_4",
"OS": "generic",
"reference_link": "https://docs.hpc.ugent.be/FAQ/#my-job-isnt-using-any-gpus"
}
@@ -0,0 +1,29 @@
See also: HPC-UGent GPU clusters.
My job runs slower than I expected
There are a few possible causes why a job can perform worse than expected.
Is your job using all the available cores you've requested? You can test this by increasing and
decreasing the core amount: If the execution time stays the same, the job was not using all cores.
Some workloads just don't scale well with more cores. If you expect the job to be very parallelizable
and you encounter this problem, maybe you missed some settings that enable multicore execution.
See also: How many cores/nodes should i request?
Does your job have access to the GPUs you requested?
See also: My job isn't using any GPUs
Not all file locations perform the same. In particular, the $VSC_HOME and $VSC_DATA
directories are, relatively, very slow to access. Your jobs should rather use the
$VSC_SCRATCH directory, or other fast locations (depending on your needs), described
in Where to store your data on the HPC.
As an example how to do this: The job can copy the input to the scratch directory, then execute
the computations, and lastly copy the output back to the data directory.
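A minimal sketch of that pattern in a job script (program and file names are hypothetical):

#!/bin/bash
#PBS -l nodes=1:ppn=8
#PBS -l walltime=04:00:00
cd $VSC_SCRATCH
cp $VSC_DATA/input.dat .            # copy the input from the slow data directory to fast scratch
my_program input.dat > output.dat   # the actual computation, running against scratch storage
cp output.dat $VSC_DATA/            # copy the results back once the job is done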
Using the home and data directories is especially a problem when UGent isn't your home institution:
your files may be stored, for example, in Leuven while you're running a job in Ghent.
My MPI job fails
Use mympirun in your job script instead of mpirun. It is a tool that makes sure everything
gets set up correctly for the HPC infrastructure. You need to load it as a module in your
job script: module load vsc-mympirun.
To submit the job, use the qsub command rather than sbatch. Although both will submit a job,
qsub will correctly interpret the #PBS parameters inside the job script. sbatch might not
set the job environment up correctly for mympirun/OpenMPI.
See also: Multi core jobs/Parallel Computing
and Mympirun.
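A minimal example job script (the toolchain and program names are hypothetical):

#!/bin/bash
#PBS -l nodes=2:ppn=8
#PBS -l walltime=01:00:00
module load vsc-mympirun
module load foss/2023a      # hypothetical toolchain that provides the MPI runtime for the program
mympirun ./my_mpi_program   # mympirun picks up the node/core allocation from the job

Submit it with qsub, as described above.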
mympirun seems to ignore its arguments
For example, we have a simple script (./hello.sh):
@@ -0,0 +1,20 @@
{
"main_title": "FAQ",
"subtitle": "`mympirun`-seems-to-ignore-its-arguments",
"source_file": "../../mkdocs/docs/HPC/FAQ.md",
"title_depth": 3,
"directory": "FAQ",
"links": {
"0": "https://docs.hpc.ugent.be/gpu_gent",
"1": "https://docs.hpc.ugent.be/FAQ/#how-many-coresnodes-should-i-request",
"2": "https://docs.hpc.ugent.be/FAQ/#my-job-isnt-using-any-gpus",
"3": "https://docs.hpc.ugent.be/running_jobs_with_input_output_data/#where-to-store-your-data-on-the-hpc",
"4": "https://docs.hpc.ugent.be/multi_core_jobs",
"5": "https://docs.hpc.ugent.be/mympirun"
},
"parent_title": "",
"previous_title": "FAQ_paragraph_3",
"next_title": "FAQ_paragraph_5",
"OS": "generic",
"reference_link": "https://docs.hpc.ugent.be/FAQ/#mympirun-seems-to-ignore-its-arguments"
}
@@ -0,0 +1,36 @@
#!/bin/bash
echo "hello world"
And we run it like mympirun ./hello.sh --output output.txt.
To our surprise, this doesn't output to the file output.txt, but to
standard out! This is because mympirun expects the program name and
the arguments of the program to be its last arguments. Here, the
--output output.txt arguments are passed to ./hello.sh instead of to
mympirun. The correct way to run it is:
mympirun --output output.txt ./hello.sh
When will my job start?
See the explanation about how jobs get prioritized in When will my job start.
Reviewer comment: this is annoying, it looks like a self-referring paragraph. Maybe add a short summary here and point to the other URL for full details.

Why do I get a "No space left on device" error, while I still have storage space left?
When trying to create files, errors like this can occur:
No space left on device
The error "No space left on device" can mean two different things:
- all available storage quota on the file system in question has been used;
- the inode limit has been reached on that file system.
An inode can be seen as a "file slot", meaning that when the limit is reached, no more additional files can be created.
There is a standard inode limit in place that will be increased if needed.
Reviewer comment: will be? rather can be

The number of inodes used per file system can be checked on the VSC account page.
Possible solutions to this problem include cleaning up unused files and directories or
compressing directories with a lot of files into zip- or tar-files.
If the problem persists, feel free to contact support.
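For example, packing a directory with many small files into a single archive frees up inodes (the directory name is hypothetical):

$ tar czf old_results.tar.gz old_results/   # one archive file instead of thousands of small files
$ tar tzf old_results.tar.gz | head         # verify the archive before removing the originals
$ rm -rf old_results/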
Other
Can I share my account with someone else?
NO. You are not allowed to share your VSC account with anyone else, it is
strictly personal.
See
https://helpdesk.ugent.be/account/en/regels.php.
If you want to share data, there are alternatives (like a shared directories in VO
space, see Virtual organisations).
Can I share my data with other HPC users?
Yes, you can use the chmod or setfacl commands to change permissions
of files so other users can access the data. For example, the following
command will enable a user named "otheruser" to read the file named
Reviewer comment: that is a bit too simple. This is only the case if the other user has read and execute access on the directory (and on all parent dirs).

dataset.txt. See
@@ -0,0 +1,19 @@
{
"main_title": "FAQ",
"subtitle": "Can-I-share-my-data-with-other-HPC-users",
"source_file": "../../mkdocs/docs/HPC/FAQ.md",
"title_depth": 3,
"directory": "FAQ",
"links": {
"0": "https://docs.hpc.ugent.be/running_batch_jobs/#when-will-my-job-start",
"1": "https://account.vscentrum.be",
"2": "https://docs.hpc.ugent.be/linux-tutorial/manipulating_files_and_directories/#zipping-gzipgunzip-zipunzip",
"3": "https://docs.hpc.ugent.be/FAQ/#i-have-another-questionproblem",
"4": "https://docs.hpc.ugent.be/running_jobs_with_input_output_data/#virtual-organisations"
},
"parent_title": "",
"previous_title": "FAQ_paragraph_4",
"next_title": "FAQ_paragraph_6",
"OS": "generic",
"reference_link": "https://docs.hpc.ugent.be/FAQ/#can-i-share-my-data-with-other-hpc-users"
}
@@ -0,0 +1,37 @@
$ setfacl -m u:otheruser:r dataset.txt
$ ls -l dataset.txt
-rwxr-x---+ 2 vsc40000 mygroup 40 Apr 12 15:00 dataset.txt
For more information about chmod or setfacl, see
Linux tutorial.
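As the reviewer notes above, the other user also needs execute (traversal) access on the directory containing the file, and on all parent directories, before the file itself becomes reachable. A sketch, assuming dataset.txt lives directly under $VSC_DATA:

$ setfacl -m u:otheruser:x $VSC_DATA              # allow traversing the parent directory
$ setfacl -m u:otheruser:r $VSC_DATA/dataset.txt  # then grant read access on the file itself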
Can I use multiple different SSH key pairs to connect to my VSC account?
Yes, and this is recommended when working from different computers.
Please see Adding multiple SSH public keys on how to do this.
I want to use software that is not available on the clusters yet
Please fill out the details about the software and why you need it in
this form:
https://www.ugent.be/hpc/en/support/software-installation-request.
When submitting the form, a mail will be sent to [email protected] containing all the
provided information. The HPC team will look into your request as soon
as possible you and contact you when the installation is done or if
Reviewer comment: possible you -> possible

further information is required.
Is my connection compromised? Remote host identification has changed
On Monday 25 April 2022, the login nodes received an update to RHEL8.
Reviewer comment: euhm, this is a section that should be removed (and then re-added with the EL9 variant).

This means that the host keys of those servers also changed. As a result,
you could encounter the following warnings.
MacOS & Linux (on Windows, only the second part is shown):
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
xx:xx:xx.
Please contact your system administrator.
Add correct host key in /home/hostname/.ssh/known_hosts to get rid of this message.
Offending RSA key in /var/lib/sss/pubconf/known_hosts:1
RSA host key for user has changed and you have requested strict checking.
Host key verification failed.
Please follow the instructions at migration to RHEL8
Reviewer comment: do we publish the correct SSH keys somewhere else, or only on the migration page?

to ensure it really is not a hacking attempt \- you will find the correct host key to compare.
You will also find how to hide the warning.
@@ -0,0 +1,17 @@
{
"main_title": "FAQ",
"subtitle": "Is-my-connection-compromised-Remote-host-identification-has-changed",
"source_file": "../../mkdocs/docs/HPC/FAQ.md",
"title_depth": 3,
"directory": "FAQ",
"links": {
"0": "https://docs.hpc.ugent.be/linux-tutorial/manipulating_files_and_directories/#changing-permissions-chmod",
"1": "https://docs.hpc.ugent.be/account/#adding-multiple-ssh-public-keys-optional",
"2": "https://www.ugent.be/hpc/en/infrastructure/migration_to_rhel8"
},
"parent_title": "",
"previous_title": "FAQ_paragraph_5",
"next_title": "FAQ_paragraph_7",
"OS": "generic",
"reference_link": "https://docs.hpc.ugent.be/FAQ/#is-my-connection-compromised-remote-host-identification-has-changed"
}
@@ -0,0 +1,25 @@
VO: how does it work?
A Virtual Organisation consists of a number of members and moderators. A moderator can:
* Manage the VO members (but can't access/remove their data on the system).
* See how much storage each member has used, and set limits per member.
* Request additional storage for the VO.
One person can only be part of one VO, be it as a member or moderator.
It's possible to leave a VO and join another one. However, it's not
recommended to keep switching between VO's (to supervise groups, for example).
See also: Virtual Organisations.
My UGent shared drives don't show up
After mounting the UGent shared drives with kinit [email protected],
Reviewer comment: @kwaegema is [email protected] correct? that can't be right

Reviewer comment: and the rest refers to username, so this should be username

you might not see an entry with your username when listing ls /UGent.
This is normal: try ls /UGent/your_username or cd /UGent/your_username, and you should be able to access the drives.
Be sure to use your UGent username and not your VSC username here.
See also: Your UGent home drive and shares.
My home directory is (almost) full, and I don't know why
Your home directory might be full without looking like it due to hidden files.
Hidden files and subdirectories have a name starting with a dot and do not show up when running ls.
If you want to check where the storage in your home directory is used, you can make use of the du command to find out what the largest files and subdirectories are:
du -h --max-depth 1 $VSC_HOME | egrep '[0-9]{3}M|[0-9]G'
Reviewer comment: egrep? E.g. on Fedora: alias egrep='grep -E --color=auto'; not sure you should teach them about egrep ;)

The du command returns the size of every file and subdirectory in the $VSC_HOME directory. This output is then piped into an egrep to filter the lines to the ones that matter the most.
The egrep command will only let entries that match with the specified regular expression [0-9]{3}M|[0-9]G through, which corresponds with files that consume more than 100 MB.
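The same filter can also be written with grep -E, the form the reviewer suggests instead of egrep:

du -h --max-depth 1 $VSC_HOME | grep -E '[0-9]{3}M|[0-9]G'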
How can I get more storage space?
By default you get 3 GB of storage space for your home directory and 25 GB in your personal directories on both the data ($VSC_DATA) and scratch ($VSC_SCRATCH) filesystems.
It is not possible to expand the storage quota for these personal directories.