Add nb-culler cronjob #9

Merged
merged 1 commit into from
Feb 14, 2024
53 changes: 47 additions & 6 deletions README.md
@@ -6,7 +6,7 @@ This repository is a collection of useful scripts and tools for TAs and professo

### group-sync

-This cronjob runs once every hour hours at the top of the hour, adding all users with the edit rolebinding in the specified namespace to the specified group.
+This cronjob runs once every hour at the top of the hour, adding all users with the edit rolebinding in the specified namespace to the specified group.
This offers us a way to keep class users added to course namespaces via ColdFront in sync with the in cluster OCP course group. To run this cronjob:

1. Ensure you are logged in to your OpenShift account via the CLI and you have access to the rhods-notebooks namespace.
@@ -15,18 +15,18 @@ This offers us a way to keep class users added to course namespaces via ColdFron
oc project <namespace>
```

-3. Update the group_name and namespace env variables in cronjobs/group-sync/cronjob.yaml
+3. Update the `GROUP_NAME` and `NAMESPACE` env variables in cronjobs/group-sync/cronjob.yaml
4. From cronjobs/group-sync/ directory run:
```
-oc apply -k .
+oc apply -k . --as system:admin
```

-This will deploy all the necessary resources for the cronjob to run on the specified schedule.(Every hour by default)
+This will deploy all the necessary resources for the cronjob to run on the specified schedule (every hour by default).

-Alternatively, to run the script immediately:
+Alternatively, to run the script immediately:

1. Ensure you followed the steps above
-2. Verify the cronjob ope-notebook-culler exists
+2. Verify the cronjob `group-sync` exists
```
oc get cronjob group-sync
```
@@ -36,6 +36,47 @@ Alternatively, to run the script immediately:
kubectl create -n rhods-notebooks job --from=cronjob/group-sync group-sync
```

### nb-culler

This cronjob runs once every hour at the top of the hour. It applies only to notebooks owned by members of the specified user groups and does not affect other notebooks in the rhods-notebooks namespace. The cronjob performs the following actions:

1. **Shuts down notebooks exceeding X hours of runtime**: any notebook that has been running for more than X hours is shut down to conserve resources. Its PVC is kept, so data persists across the shutdown.
2. **Deletes notebooks with wrong images**: students may only launch notebook instances with their class image. Notebooks running an image that is not approved for use are deleted along with their associated PVCs.
3. **Deletes notebooks with wrong container size**: notebooks configured with a container size other than **X Small** are deleted, including their PVCs.
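
The three rules above can be sketched as a single decision function. This is only an illustration, not the code the cronjob runs: `decide_action` is a hypothetical helper, and the image string `ucsls-nerc-rhoai:latest` is a made-up example value. The order of the checks (image, then size, then runtime) follows the script in cronjobs/nb-culler/cronjob.yaml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide what the culler would do with one notebook.
# Arguments: image annotation, size annotation, runtime in seconds,
#            cutoff in seconds, approved image name (substring match).
decide_action() {
  local image=$1 size=$2 age=$3 cutoff=$4 approved=$5
  if [[ $image != *"$approved"* ]]; then
    echo "delete"   # wrong image: notebook and PVC are removed
  elif [[ $size != "X Small" ]]; then
    echo "delete"   # wrong container size: notebook and PVC are removed
  elif (( age > cutoff )); then
    echo "stop"     # over the runtime limit: notebook is stopped, PVC is kept
  else
    echo "keep"
  fi
}

# Example: approved image, correct size, but running longer than 6 hours.
decide_action "ucsls-nerc-rhoai:latest" "X Small" 30000 21600 "ucsls-nerc-rhoai"   # prints "stop"
```

Because the image and size checks come first, a notebook that violates either rule is deleted even if it has only just started.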

To add resources to the rhods-notebooks namespace:

1. Ensure you are logged in to your OpenShift account via the CLI and you have access to the rhods-notebooks namespace.
2. Switch to the rhods-notebooks namespace:
```
oc project rhods-notebooks
```

3. Ensure the environment variables in cronjobs/nb-culler/cronjob.yaml are set correctly: `GROUP_NAME_1` and `GROUP_NAME_2`, `CUTOFF_TIME_1` and `CUTOFF_TIME_2` (in seconds), and `IMAGE_NAME`.

4. From cronjobs/nb-culler/ directory run:
```
oc apply -k . --as system:admin
```

This will deploy all the necessary resources for the cronjob to run on the specified schedule.
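
Since the cutoff values are given in seconds, it can help to derive them from hours rather than typing them by hand. The sketch below uses a hypothetical `hours_to_seconds` helper; the two sample calls reproduce the values that cronjob.yaml in this change ships (`21600` and `43200`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a runtime limit in whole hours to seconds.
hours_to_seconds() {
  local hours=$1
  echo $(( hours * 3600 ))
}

hours_to_seconds 6    # prints 21600
hours_to_seconds 12   # prints 43200
```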

Alternatively, to run the script immediately:

1. Ensure you followed the steps above
2. Verify the cronjob `nb-culler` exists
```
oc get cronjob nb-culler
```

3. Run:
```
kubectl create -n rhods-notebooks job --from=cronjob/nb-culler nb-culler
```

This creates a job from the cronjob immediately, without waiting for the next scheduled run.
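
Job names must be unique within a namespace, so repeating the command above while a job named `nb-culler` still exists will fail. One possible workaround, sketched here under the assumption that a timestamp suffix is acceptable, is to generate a fresh name per run. `manual_job_name` is a hypothetical helper, and the script only prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a job name with a UTC timestamp suffix
# so that repeated manual runs do not collide.
manual_job_name() {
  local cronjob=$1
  echo "${cronjob}-manual-$(date -u +%Y%m%d%H%M%S)"
}

job_name=$(manual_job_name nb-culler)
# Print the command for review; remove the echo to actually run it.
echo "kubectl create -n rhods-notebooks job --from=cronjob/nb-culler ${job_name}"
```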


## Scripts

### get_url.py
5 changes: 2 additions & 3 deletions cronjobs/group-sync/cronjob.yaml
@@ -13,7 +13,7 @@ spec:
serviceAccountName: group-sync
containers:
- name: group-sync
-image: ghcr.io/ocp-on-nerc/bu-rhoai:toolkit
+image: ghcr.io/ocp-on-nerc/bu-rhoai:toolkit
command: ["python", "group-sync.py"]
env:
# EDIT VALUE HERE BEFORE RUNNING
@@ -23,7 +23,6 @@ spec:
- name: NAMESPACE
value: <namespace>
imagePullPolicy: Always
-restartPolicy: Never
+restartPolicy: Never
successfulJobsHistoryLimit: 7
failedJobsHistoryLimit: 7

32 changes: 32 additions & 0 deletions cronjobs/nb-culler/clusterrole.yaml
@@ -0,0 +1,32 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nb-culler
rules:
  - apiGroups:
      - user.openshift.io
    resources:
      - groups
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - kubeflow.org
    resources:
      - notebooks
    verbs:
      - get
      - list
      - watch
      - delete
      - patch
  - apiGroups:
      - ""
    resources:
      - persistentvolumeclaims
    verbs:
      - get
      - list
      - watch
      - delete
11 changes: 11 additions & 0 deletions cronjobs/nb-culler/clusterrolebinding.yaml
@@ -0,0 +1,11 @@
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nb-culler
subjects:
  - kind: ServiceAccount
    name: nb-culler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nb-culler
108 changes: 108 additions & 0 deletions cronjobs/nb-culler/cronjob.yaml
@@ -0,0 +1,108 @@
kind: CronJob
apiVersion: batch/v1
metadata:
  name: nb-culler
  labels:
    component.opendatahub.io/name: nb-culler
    opendatahub.io/component: 'true'
    opendatahub.io/modified: 'false'
spec:
  schedule: '0 * * * *'
  startingDeadlineSeconds: 200
  concurrencyPolicy: Replace
  suspend: false
  jobTemplate:
    metadata:
      labels:
        component.opendatahub.io/name: nb-culler
        opendatahub.io/component: 'true'
    spec:
      template:
        metadata:
          labels:
            component.opendatahub.io/name: nb-culler
            opendatahub.io/component: 'true'
            parent: nb-culler
        spec:
          restartPolicy: Never
          serviceAccountName: nb-culler
          schedulerName: default-scheduler
          terminationGracePeriodSeconds: 30
          securityContext: {}
          containers:
            - name: oc-cli
              image: >-
                registry.redhat.io/openshift4/ose-cli@sha256:25fef269ac6e7491cb8340119a9b473acbeb53bc6970ad029fdaae59c3d0ca61
              command: ["/bin/bash", "-c", "--"]
              args:
                - |
                  notebooks=$(oc get notebooks -n rhods-notebooks -o jsonpath="{range .items[?(@.status.containerState.running)]}{.metadata.name}{' '}{.metadata.namespace}{' '}{.status.containerState.running.startedAt}{' '}{.metadata.annotations['opendatahub\.io/username']}{' '}{.metadata.annotations['notebooks\.opendatahub\.io/last-image-selection']}{' '}{.metadata.annotations['notebooks\.opendatahub\.io/last-size-selection']}{'\n'}{end}")
                  if [ -z "$notebooks" ]; then
                      echo "No running notebooks found"
                      exit 0
                  fi
                  group_members_1=$(oc get group $GROUP_NAME_1 -o=jsonpath='{.users[*]}')
                  group_members_2=$(oc get group $GROUP_NAME_2 -o=jsonpath='{.users[*]}')

                  # Loop through each notebook
                  while read -r nb ns ts user image size; do
                      current_time=$(date -u +%s)
                      timestamp=$(date -d $ts +%s)
                      difference=$((current_time - timestamp))
                      user_in_group1=false
                      user_in_group2=false

                      if [[ " $group_members_1 " =~ " $user " ]]; then
                          echo "$user is in the $GROUP_NAME_1 group."
                          user_in_group1=true
                          cutoff_time=$CUTOFF_TIME_1
                      elif [[ " $group_members_2 " =~ " $user " ]]; then
                          echo "$user is in the $GROUP_NAME_2 group."
                          user_in_group2=true
                          cutoff_time=$CUTOFF_TIME_2
                      fi

                      if $user_in_group1 || $user_in_group2; then
                          if [[ $image != *$IMAGE_NAME* ]]; then
                              echo "$nb is not using $IMAGE_NAME image, deleting the notebook"
                              oc delete notebook $nb -n $ns
                              oc delete pvc $nb -n $ns
                          elif [[ $size != "X Small" ]]; then
                              echo "$nb resource size is not correct, deleting the notebook"
                              oc delete notebook $nb -n $ns
                              oc delete pvc $nb -n $ns
                          elif [ $difference -gt $cutoff_time ]; then
                              echo "$nb is more than $(($cutoff_time / 3600)) hours old, stopping the notebook"
                              oc patch notebook $nb -n $ns --type merge -p '{"metadata":{"annotations":{"kubeflow-resource-stopped":"'$(date -u +"%Y-%m-%dT%H:%M:%SZ")'"}}}'
                          fi
                      else
                          echo "Skipping $nb: user $user does not belong to any monitored group."
                      fi
                  done <<< "$notebooks"
              env:
                # EDIT VALUE HERE BEFORE RUNNING
                - name: GROUP_NAME_1
                  value: <group_1>
                # EDIT VALUE HERE BEFORE RUNNING
                - name: GROUP_NAME_2
                  value: <group_2>
                # EDIT VALUE HERE BEFORE RUNNING
                - name: CUTOFF_TIME_1
                  value: "21600"
                # EDIT VALUE HERE BEFORE RUNNING
                - name: CUTOFF_TIME_2
                  value: "43200"
                # EDIT VALUE HERE BEFORE RUNNING
                - name: IMAGE_NAME
                  value: "ucsls-nerc-rhoai"
              resources:
                limits:
                  memory: 800Mi
                requests:
                  memory: 400Mi
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
              imagePullPolicy: IfNotPresent
          dnsPolicy: ClusterFirst
  successfulJobsHistoryLimit: 7
  failedJobsHistoryLimit: 7
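
The loop in the script above relies on two small pieces of shell logic: computing a notebook's age from its `startedAt` timestamp, and testing whether a user appears in the space-separated member list returned by `oc get group`. The sketch below isolates both so they can be tried without a cluster. `age_seconds` and `in_group` are hypothetical helper names, the timestamps and usernames are made-up sample values, and GNU `date` is assumed for the `-d` option (as in the script itself).

```shell
#!/usr/bin/env bash
# Seconds elapsed between an ISO 8601 start time and a reference time.
# Assumes GNU date, which accepts -d with timestamps like 2024-02-14T08:00:00Z.
age_seconds() {
  local started_at=$1 now=$2
  echo $(( $(date -u -d "$now" +%s) - $(date -u -d "$started_at" +%s) ))
}

# Succeeds when user $2 appears as a whole word in the space-separated list $1.
# Padding both sides with spaces prevents "bo" from matching "bob".
in_group() {
  local members=$1 user=$2
  [[ " $members " == *" $user "* ]]
}

age_seconds "2024-02-14T08:00:00Z" "2024-02-14T15:00:00Z"   # prints 25200 (7 hours)
in_group "alice bob" "bob" && echo "bob is a member"
in_group "alice bob" "bo"  || echo "bo is not a member"
```

With the default `CUTOFF_TIME_1` of `21600`, the 25200-second notebook in this example would be over the limit and stopped.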
9 changes: 9 additions & 0 deletions cronjobs/nb-culler/kustomization.yaml
@@ -0,0 +1,9 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- clusterrole.yaml
- clusterrolebinding.yaml
- cronjob.yaml
- rolebinding.yaml
- serviceaccount.yaml
namespace: ope-rhods-testing-1fef2f
11 changes: 11 additions & 0 deletions cronjobs/nb-culler/rolebinding.yaml
@@ -0,0 +1,11 @@
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nb-culler
subjects:
  - kind: ServiceAccount
    name: nb-culler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
4 changes: 4 additions & 0 deletions cronjobs/nb-culler/serviceaccount.yaml
@@ -0,0 +1,4 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nb-culler