Update spark on k8s operator to the latest version and support suspend #53

Open
wants to merge 178 commits into
base: master
from


KunWuLuan

No description provided.

yuchaoran2011 and others added 30 commits February 27, 2020 10:22
This would make it match the location.
Expand the description of when to use volumes instead of `/tmp` for scratch space.
…to driver pods (kubeflow#811)

* feat: delete driver pods with a grace period

* feat: adding lifecycle pod spec for driver pods

* adding tests for grace period and lifecycle

* fix: adding user guide for termination grace period and container hooks
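To illustrate how a user-configured grace period might be passed to the driver pod deletion call, here is a minimal Go sketch. The `deleteOptions` and `driverSpec` types are hypothetical stand-ins (real code would use the Kubernetes `metav1.DeleteOptions` type and the operator's own driver spec), and `driverDeleteOptions` is an illustrative name, not a function from this PR.

```go
package main

// deleteOptions is a hypothetical stand-in for metav1.DeleteOptions;
// only the grace-period field is modeled.
type deleteOptions struct {
	// A nil GracePeriodSeconds means "use the pod's own
	// terminationGracePeriodSeconds".
	GracePeriodSeconds *int64
}

// driverSpec is a hypothetical, trimmed-down driver spec carrying an
// optional user-configured termination grace period.
type driverSpec struct {
	TerminationGracePeriodSeconds *int64
}

// driverDeleteOptions forwards the configured grace period, if any,
// to the delete call for the driver pod.
func driverDeleteOptions(spec driverSpec) deleteOptions {
	if spec.TerminationGracePeriodSeconds == nil {
		return deleteOptions{}
	}
	// Copy the value so the options do not alias the spec.
	grace := *spec.TerminationGracePeriodSeconds
	return deleteOptions{GracePeriodSeconds: &grace}
}
```

Leaving the field nil when nothing is configured lets the API server fall back to the grace period declared on the pod itself.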
Updated a broken link and reference on the page.
…ubeflow#847)

* add fix for metricsProperties when HasPrometheusConfigFile is true.

* add new config MetricsPropertiesFile.

* add missing auto-generated code from previous PRs.

* fix monitoring_config_test.go test condition, redo the configmap logic in monitoring_config.go.

* redo the configmap & javaOption logic in monitoring_config.go.

* set back the configmap & javaOption logic in monitoring_config.go

* update log.
…low#852)

* Add new metric for job start latency

* Add job latency histogram metric and namespace tag

Job start latency is defined as the time between when the user
submits the job and when the job reaches the running state or any of
the terminal states. We use a histogram with configurable boundaries
because different users are interested in different boundaries. They
can use one of them as their SLO/SLA and use the histogram values to
compute the percentage of jobs that meet the SLA. We also added the
namespace label to all applicable metrics when the user specifies it
via the command-line option. In addition, we fixed the controller state
machine diagram.

* Add start latency metrics doc, fix based on review

Added start latency summary and histogram metrics doc in
quick-start-guide.md. Added fixes based on the code review comments in
the PR.

Co-authored-by: Vaishnavi Giridaran <[email protected]>
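A minimal Go sketch of the histogram idea described in this commit: user-supplied bucket boundaries, one observation per job, and the share of jobs meeting an SLA computed from the buckets. This is a hypothetical, dependency-free illustration rather than the operator's Prometheus-based metric code; `latencyHistogram` and its methods are invented names.

```go
package main

import "sort"

// latencyHistogram is a hypothetical histogram over job start latency
// (seconds) with user-configurable upper bounds.
type latencyHistogram struct {
	bounds []float64 // sorted upper bounds, e.g. 30, 60, 120
	counts []uint64  // counts[i] = observations in (bounds[i-1], bounds[i]]
	total  uint64    // all observations, including those above the last bound
}

func newLatencyHistogram(bounds []float64) *latencyHistogram {
	b := append([]float64(nil), bounds...)
	sort.Float64s(b)
	return &latencyHistogram{bounds: b, counts: make([]uint64, len(b))}
}

// observe records one job's start latency: the time from submission
// until the job reached the running state or a terminal state.
func (h *latencyHistogram) observe(seconds float64) {
	h.total++
	// Index of the first bound >= seconds.
	i := sort.SearchFloat64s(h.bounds, seconds)
	if i < len(h.counts) {
		h.counts[i]++
	}
	// Values above the largest bound only count toward total
	// (the implicit +Inf bucket).
}

// fractionWithin sums the buckets whose upper bound is <= sla and
// divides by the total. When sla is one of the configured bounds, this
// is exactly the share of jobs with latency <= sla.
func (h *latencyHistogram) fractionWithin(sla float64) float64 {
	if h.total == 0 {
		return 0
	}
	var n uint64
	for i, b := range h.bounds {
		if b > sla {
			break
		}
		n += h.counts[i]
	}
	return float64(n) / float64(h.total)
}
```

For example, with boundaries of 30, 60 and 120 seconds, an SLA of 60 seconds is answered by adding the first two buckets and dividing by the number of observed jobs, which is why the SLA is best chosen from the configured boundaries.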
Trivial fixes on typos, links and formats in several docs.
… schedule changes (kubeflow#857)

* scheduledSparkApplications: NextRun should be recalculated whenever schedule changes

* updatedScheduleRuntime -> updatedNextRunTime
* Add total SparkApplication count metric

Total SparkApplication count is the total number of SparkApplications
that have been processed by the operator. This metric can be used to
track how many SparkApplications users have submitted to the K8s API
server, and can also serve as the denominator when computing, for
example, the job success rate.

* Export SparkApp count metric in sparkapp_metrics.go

Invoking the export of SparkApp count metric in exportMetrics() in
sparkapp_metrics.go instead of syncSparkApplication() in controller.go,
in order to align with the metric exporting convention in the code base.
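As a sketch of the denominator use mentioned in this commit, here is a hypothetical Go helper that turns two counter values into a success rate. The `appCounters` type and `successRate` function are illustrative only; in practice this ratio would more likely be computed in the monitoring system from the exported metrics.

```go
package main

// appCounters is a hypothetical snapshot of two counters: the total
// number of SparkApplications processed and how many succeeded.
type appCounters struct {
	Total     uint64
	Succeeded uint64
}

// successRate divides successful applications by the total count.
// It returns ok=false when nothing has been processed yet instead of
// dividing by zero.
func successRate(c appCounters) (rate float64, ok bool) {
	if c.Total == 0 {
		return 0, false
	}
	return float64(c.Succeeded) / float64(c.Total), true
}
```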
…Forbid) (kubeflow#865)

* Fix scheduled app test

* fix typo

Co-authored-by: Thi Nguyen <[email protected]>
…flow#867)

* update code to fix error on termination time when using sidecar

* fix and add new tests

* fixup! update code to fix error on termination time when using sidecar
* Cache Sync before Worker Threads

* Remove newline
* Migrate to go mod

* rm gopkg

* update travis ci

* remove dep totally

* remove dep from developer guide
* Upgrade volcano to v0.4.0

* fix gen code

* fix build

* fix test

* update developer guide

* fmt
* Set podgroup to stable version

Signed-off-by: Peng Gao <[email protected]>

* Update operator rbac

Signed-off-by: Peng Gao <[email protected]>
* support custom service port and target port

* support customizing ingress object with specific port and annotations

* change expositionOptions into sparkUIOptions + code enhancement

* update return of functions getUIServicePort and getUITargetPort
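A hedged Go sketch of how the UI service port and target port might be resolved, falling back to Spark's default UI port of 4040. The `sparkUIOptions` type and both helpers are hypothetical and do not reproduce the operator's `getUIServicePort`/`getUITargetPort`; the target port is assumed to come from the standard `spark.ui.port` Spark setting.

```go
package main

import "strconv"

// defaultSparkUIPort is Spark's default web UI port.
const defaultSparkUIPort int32 = 4040

// sparkUIOptions is a hypothetical, trimmed-down UI options type;
// only the service port is modeled.
type sparkUIOptions struct {
	ServicePort *int32
}

// uiServicePort returns the user-configured service port, or the
// default UI port when none is set.
func uiServicePort(opts *sparkUIOptions) int32 {
	if opts != nil && opts.ServicePort != nil {
		return *opts.ServicePort
	}
	return defaultSparkUIPort
}

// uiTargetPort returns the container port the UI listens on: the
// spark.ui.port value from the Spark conf if it parses, else the default.
func uiTargetPort(sparkConf map[string]string) int32 {
	if v, ok := sparkConf["spark.ui.port"]; ok {
		if p, err := strconv.ParseInt(v, 10, 32); err == nil {
			return int32(p)
		}
	}
	return defaultSparkUIPort
}
```

Keeping the two ports separate lets the Service expose one port while forwarding to whatever port the driver's UI actually binds.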
…beflow#924)

Spark 3.0 will support specifying a k8s service account for the executor pods. This CL prepares the operator to support that in the upcoming Spark 3.0.0 release.
TomHellier and others added 30 commits December 3, 2021 15:51
This change renames the github workflows to be clearer about their purpose, and adds a set of tests which
aim to force developers to increment the appVersion if they have changed anything with the spark-operator docker container
or the chart version if they have updated the chart.

Signed-off-by: Tom Hellier <[email protected]>
… internal error because the kind was / instead of AdmissionReview (kubeflow#1421)

Changed where we set Response
…w#1422)

* use github container registry instead of gcr.io for releases

This appears to be possible, judging by this workflow:

https://github.com/GoogleCloudPlatform/gke-autoneg-controller/blame/master/.github/workflows/go.yml

It can only be done on the master branch, though, so we can't find out until after the merge.

Update main.yaml

Update main.yaml

* Update to use github token

Following the guidance from here: https://github.blog/changelog/2021-03-24-packages-container-registry-now-supports-github_token/
This still might not work, but I don't think there is anything else that can be done.
* This commit addresses the deprecation of the extensions/v1beta1 and
networking.k8s.io/v1beta1 Ingress APIs, which are removed in Kubernetes v1.22+

* Updated logic for handling both extensions/v1beta1 and networking.k8s.io/v1

* Bumped appVersion

* Bumped chart version

* Updated version matrix in README
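To illustrate handling both Ingress APIs, here is a minimal Go sketch that picks the API group/version from the cluster's server version. It assumes the selection is version-based (the networking.k8s.io/v1 Ingress is GA from Kubernetes 1.19, and the beta Ingress APIs are removed in 1.22); `ingressAPIVersion` is a hypothetical helper, and the operator may detect API availability differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ingressAPIVersion returns the Ingress API to use for a server version
// such as "v1.22.3", "v1.19.0-gke.1" or "1.18".
func ingressAPIVersion(serverVersion string) (string, error) {
	v := strings.TrimPrefix(serverVersion, "v")
	parts := strings.SplitN(v, ".", 3)
	if len(parts) < 2 {
		return "", fmt.Errorf("unexpected server version %q", serverVersion)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return "", err
	}
	// Some providers report the minor version with a trailing "+", e.g. "22+".
	minor, err := strconv.Atoi(strings.TrimRight(parts[1], "+"))
	if err != nil {
		return "", err
	}
	if major > 1 || (major == 1 && minor >= 19) {
		return "networking.k8s.io/v1", nil
	}
	return "extensions/v1beta1", nil
}
```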
* GitHub Actions workflow fix for Helm chart deployment

* Updated the if condition for Release Spark-Operator Docker Image
* Updated default registry to ghcr.io

* Bumped chart version and built api-docs

* Updated build-helm-chart job, step Run chart-testing (install) to look for the correct registry
* Operator volumes and volumeMounts

* Update values.yaml

* Update Chart.yaml

chart version up to 1.1.18

* Update Chart.yaml

1.1.19

* Update values.yaml

remove white space
* Add ingress-class-name controller flag

Add a flag to set the ingressClassName field on ingress
objects created for the Spark UI.
This makes the ingress compliant with Kubernetes >v1.19 and
better supports clusters running multiple ingress controllers.

* Update Helm chart to v1.1.20/appVersion 1.3.4

Include ingressClassName changes

Co-authored-by: Hristo Voyvodov <[email protected]>
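A minimal Go sketch of the ingress-class-name flag, using the standard `flag` package. The flag name follows the commit title, but the `ingressSpec` type (a stand-in for the Kubernetes networking/v1 `IngressSpec`) and both helpers are hypothetical. The design choice shown is to leave `ingressClassName` unset when the flag is empty, so clusters that rely on a default IngressClass are unaffected.

```go
package main

import "flag"

// ingressSpec is a hypothetical stand-in for networking/v1 IngressSpec;
// only the field relevant here is modeled.
type ingressSpec struct {
	IngressClassName *string
}

// parseIngressClassName registers and parses the flag on its own
// FlagSet so the sketch can run without touching os.Args.
func parseIngressClassName(args []string) (string, error) {
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	name := fs.String("ingress-class-name", "",
		"ingressClassName to set on Spark UI ingresses (empty = cluster default)")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *name, nil
}

// applyIngressClass sets ingressClassName only when a class was configured.
func applyIngressClass(spec *ingressSpec, className string) {
	if className != "" {
		spec.IngressClassName = &className
	}
}
```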
…ubeflow#1521)

* Ensure that driver is deleted prior to sparkapplication resubmission

Signed-off-by: Khor Shu Heng <[email protected]>

* Update app version

Signed-off-by: Khor Shu Heng <[email protected]>

* Update chart version

Signed-off-by: Khor Shu Heng <[email protected]>

Co-authored-by: Khor Shu Heng <[email protected]>
kubeflow#1504)

* added missing manifest yaml, point the manifest to the right direction

* re-run checks
…#1550)

* Update README - specify sidecars needing mutating webhooks

Like Init-Containers, Sidecars require mutating admission webhooks to work.

* Also update for mounting secrets
Added new RBAC permissions needed by default for leader election for the coordination/v1 API.
Required after upgrade to golang:1.19.2.
In k8s.io/[email protected]/tools/leaderelection/resourcelock/interface.go:166 `ConfigMapsResourceLock` was removed and should be replaced by `ConfigMapsLeasesResourceLock`.
* Added support for setting extra commonLabels

* Added support for podLabels on cleanup and init job

* Fixed templating errors

* Added documentation