forked from kubeflow/spark-operator
Update spark on k8s operator to the latest version and support suspend #53
Open
KunWuLuan wants to merge 178 commits into AliyunContainerService:master from KunWuLuan:master
This would make it match the location.
Expand the description of when to use volumes instead of `/tmp` for scratch space.
…to driver pods (kubeflow#811) * feat: delete driver pods with a grace period * feat: adding lifecycle pod spec for driver pods * adding tests for grace period and lifecycle * fix: adding user guide for termination grace period and container hooks
Updated a broken link and reference on the page.
…ubeflow#847) * add fix for metricsProperties when HasPrometheusConfigFile is true. * add new config MetricsPropertiesFile. * add missing auto-generated code from previous PRs. * fix monitoring_config_test.go test condition, redo the configmap logic in monitoring_config.go. * redo the configmap & javaOption logic in monitoring_config.go. * set back the configmap & javaOption logic in monitoring_config.go * update log.
…low#852)
* Add new metric for job start latency
* Add job latency histogram metric and namespace tag
  Job start latency is defined as the time difference between when the job is submitted by the user and when the job reaches a running state or any of the terminal states. We use a histogram with configurable boundaries because users can provide the boundaries they are interested in. They can use one of them as their SLO/SLA and use the histogram values to compute the percentage of jobs that meet the SLA. We also added the namespace label to all applicable metrics when users specify it in the command line option. In addition, we fixed the controller state machine diagram.
* Add start latency metrics doc, fix based on review
  Added start latency summary and histogram metrics doc in quick-start-guide.md. Added fixes based on the code review comments in the PR.
Co-authored-by: Vaishnavi Giridaran <[email protected]>
Trivial fixes on typos, links and formats in several docs.
… schedule changes (kubeflow#857) * scheduledSparkApplications: NextRun should be recalculated whenever schedule changes * updatedScheduleRuntime -> updatedNextRunTime
* Add total SparkApplication count metric
  Total SparkApplication count is the total number of SparkApplications that have been processed by the operator. This metric can be used to track how many SparkApplications users have submitted to the K8s API server, and can also be used as the denominator when computing the job success rate, for example.
* Export SparkApp count metric in sparkapp_metric.go
  Invoke the export of the SparkApp count metric in exportMetrics() in sparkapp_metrics.go instead of syncSparkApplication() in controller.go, to align with the metric exporting convention in the code base.
…Forbid) (kubeflow#865) * Fix scheduled app test * fix typo Co-authored-by: Thi Nguyen <[email protected]> Co-authored-by: Thi Nguyen <[email protected]>
…flow#867) * update code to fix error on termination time when using sidecar * fix and add new tests * fixup! update code to fix error on termination time when using sidecar
* Cache Sync before Worker Threads * Remove newline
* Migrate to go mod * rm gopkg * update travis ci * remove dep totally * remove dep from developer guide
* Upgrade volcano to v0.4.0 * fix gen code * fix build * fix test * update developer guide * fmt
* Set podgroup to stable version Signed-off-by: Peng Gao <[email protected]> * Update operator rbac Signed-off-by: Peng Gao <[email protected]>
* support custom serviceport and targetport * support customizing ingress object with specific port and annotations * change expositionOptions into sparkUIOptions + code enhancement * update return of functions getUIServicePort and getUITargetPort
…beflow#924) Spark 3.0 will support specifying a k8s service account for the executor pods. This CL prepares the operator to support that in the upcoming Spark 3.0.0 release.
This change renames the github workflows to be clearer about their purpose, and adds a set of tests which aim to force developers to increment the appVersion if they have changed anything with the spark-operator docker container or the chart version if they have updated the chart. Signed-off-by: Tom Hellier <[email protected]>
… internal error because the kind was / instead of AdmissionReview (kubeflow#1421) Changed where we set Response
…w#1422)
* use github container registry instead of gcr.io for releases
  It appears that it is possible looking at this workflow. https://github.com/GoogleCloudPlatform/gke-autoneg-controller/blame/master/.github/workflows/go.yml
  It can only be done on the master branch though, so can't find out until after merge
  Update main.yaml
  Update main.yaml
* Update to use github token
  Following the guidance from here. This still might not work, but I don't think there is anything else that can be done. https://github.blog/changelog/2021-03-24-packages-container-registry-now-supports-github_token/
… changes in resources used in docker file" (kubeflow#1452)
* This commit addresses deprecation of extensions/v1beta1 and networking/v1beta1 on kubernetes v1.22+ * Updated logic for handling both extensions/v1beta1 and networking.k8s.io/v1 * Bumped appVersion * Bumped chart version * Updated version matrix in README
* Github actions workflow fix for Helm chart deployment * Updated the if condition for Release Spark-Operator Docker Image
* Updated default registry to ghcr.io * Bumped chart version and built api-docs * Updated build-helm-chart job, step Run chart-testing (install) to look for the correct registry
* Operator volumes and volumeMounts * Update values.yaml * Update Chart.yaml chart version up to 1.1.18 * Update Chart.yaml 1.1.19 * Update values.yaml remove white space
* Add ingress-class-name controller flag Add a flag to set ingressClassName field for ingress objects created for Spark UI. This will make ingress compliant with Kubernetes >v1.19 and better utilizing multiple ingress controllers * Update Helm chart to v1.1.20/appVersion 1.3.4 Include ingressClassName changes Co-authored-by: Hristo Voyvodov <[email protected]>
…ubeflow#1521) * Ensure that driver is deleted prior to sparkapplication resubmission Signed-off-by: Khor Shu Heng <[email protected]> * Update app version Signed-off-by: Khor Shu Heng <[email protected]> * Update chart version Signed-off-by: Khor Shu Heng <[email protected]> Co-authored-by: Khor Shu Heng <[email protected]>
kubeflow#1504) * added missing manifest yaml, point the manifest to the right direction * re-run checks
…sts and to follow logs (kubeflow#1506)
Signed-off-by: York Chen <[email protected]>
…#1550) * Update README - specify sidecars needing mutating webhooks Like Init-Containers, Sidecars require mutating admission webhooks to work. * Also update for mounting secrets
Signed-off-by: André Bauer <[email protected]>
Added new RBAC permissions needed by default for leader election for the coordination.k8s.io/v1 API. Required after upgrade to golang:1.19.2. In k8s.io/[email protected]/tools/leaderelection/resourcelock/interface.go:166 `configMapsResourceLock` was removed and should be replaced by `ConfigMapsLeasesResourceLock`.
* Added support for setting extra commonLabels * Added support for podLabels on cleanup and init job * Fixed templating errors * Added documentation
No description provided.