Add All Go metrics to admin endpoint default #629

Merged
merged 2 commits into from
Mar 13, 2024

Conversation

SuperQ
Contributor

@SuperQ SuperQ commented Sep 13, 2023

Closes #

💸 TL;DR

Add all Go runtime/metrics to the set of metrics included by default.

This will add the following new metrics per process:

  • go_cgo_go_to_c_calls_calls_total
  • go_cpu_classes_gc_mark_assist_cpu_seconds_total
  • go_cpu_classes_gc_mark_dedicated_cpu_seconds_total
  • go_cpu_classes_gc_mark_idle_cpu_seconds_total
  • go_cpu_classes_gc_pause_cpu_seconds_total
  • go_cpu_classes_gc_total_cpu_seconds_total
  • go_cpu_classes_idle_cpu_seconds_total
  • go_cpu_classes_scavenge_assist_cpu_seconds_total
  • go_cpu_classes_scavenge_background_cpu_seconds_total
  • go_cpu_classes_scavenge_total_cpu_seconds_total
  • go_cpu_classes_total_cpu_seconds_total
  • go_cpu_classes_user_cpu_seconds_total
  • go_gc_cycles_automatic_gc_cycles_total
  • go_gc_cycles_forced_gc_cycles_total
  • go_gc_cycles_total_gc_cycles_total
  • go_gc_gogc_percent
  • go_gc_gomemlimit_bytes
  • go_gc_heap_allocs_by_size_bytes
  • go_gc_heap_allocs_bytes_total
  • go_gc_heap_allocs_objects_total
  • go_gc_heap_frees_by_size_bytes
  • go_gc_heap_frees_bytes_total
  • go_gc_heap_frees_objects_total
  • go_gc_heap_goal_bytes
  • go_gc_heap_live_bytes
  • go_gc_heap_objects_objects
  • go_gc_heap_tiny_allocs_objects_total
  • go_gc_limiter_last_enabled_gc_cycle
  • go_gc_pauses_seconds
  • go_gc_scan_globals_bytes
  • go_gc_scan_heap_bytes
  • go_gc_scan_stack_bytes
  • go_gc_scan_total_bytes
  • go_gc_stack_starting_size_bytes
  • go_godebug_non_default_behavior_execerrdot_events_total
  • go_godebug_non_default_behavior_gocachehash_events_total
  • go_godebug_non_default_behavior_gocachetest_events_total
  • go_godebug_non_default_behavior_gocacheverify_events_total
  • go_godebug_non_default_behavior_gotypesalias_events_total
  • go_godebug_non_default_behavior_http2client_events_total
  • go_godebug_non_default_behavior_http2server_events_total
  • go_godebug_non_default_behavior_httplaxcontentlength_events_total
  • go_godebug_non_default_behavior_httpmuxgo121_events_total
  • go_godebug_non_default_behavior_installgoroot_events_total
  • go_godebug_non_default_behavior_jstmpllitinterp_events_total
  • go_godebug_non_default_behavior_multipartmaxheaders_events_total
  • go_godebug_non_default_behavior_multipartmaxparts_events_total
  • go_godebug_non_default_behavior_multipathtcp_events_total
  • go_godebug_non_default_behavior_panicnil_events_total
  • go_godebug_non_default_behavior_randautoseed_events_total
  • go_godebug_non_default_behavior_tarinsecurepath_events_total
  • go_godebug_non_default_behavior_tls10server_events_total
  • go_godebug_non_default_behavior_tlsmaxrsasize_events_total
  • go_godebug_non_default_behavior_tlsrsakex_events_total
  • go_godebug_non_default_behavior_tlsunsafeekm_events_total
  • go_godebug_non_default_behavior_x509sha1_events_total
  • go_godebug_non_default_behavior_x509usefallbackroots_events_total
  • go_godebug_non_default_behavior_x509usepolicies_events_total
  • go_godebug_non_default_behavior_zipinsecurepath_events_total
  • go_memory_classes_heap_free_bytes
  • go_memory_classes_heap_objects_bytes
  • go_memory_classes_heap_released_bytes
  • go_memory_classes_heap_stacks_bytes
  • go_memory_classes_heap_unused_bytes
  • go_memory_classes_metadata_mcache_free_bytes
  • go_memory_classes_metadata_mcache_inuse_bytes
  • go_memory_classes_metadata_mspan_free_bytes
  • go_memory_classes_metadata_mspan_inuse_bytes
  • go_memory_classes_metadata_other_bytes
  • go_memory_classes_os_stacks_bytes
  • go_memory_classes_other_bytes
  • go_memory_classes_profiling_buckets_bytes
  • go_memory_classes_total_bytes
  • go_sched_pauses_stopping_gc_seconds
  • go_sched_pauses_stopping_other_seconds
  • go_sched_pauses_total_gc_seconds
  • go_sched_pauses_total_other_seconds
  • go_sync_mutex_wait_total_seconds_total

📜 Details

Design Doc

Jira

🧪 Testing Steps / Validation

✅ Checks

  • CI tests (if present) are passing
  • Adheres to code style for repo
  • Contributor License Agreement (CLA) completed if not a Reddit employee

@SuperQ SuperQ requested a review from a team as a code owner September 13, 2023 14:13
@SuperQ SuperQ requested review from fishy, kylelemons and pacejackson and removed request for a team September 13, 2023 14:13
@@ -9,7 +9,22 @@ import (
)

var expectedMetrics = []string{
Member

Looks like on Go 1.21 we get more than these: https://github.com/reddit/baseplate.go/actions/runs/6173805147/job/16756970260?pr=629

Maybe instead of using a diff for this test (exact equality), we only check that all the expectedMetrics are reported (a minimal set)?

Contributor Author

Updated tests for Go >= 1.21

Contributor Author

The other thing we could do is create our own custom filter and only include the metrics we want.

  RedditMetricsGC = collectors.GoRuntimeMetricsRule{Matcher: regexp.MustCompile(`^/gc/.*`)}

Member

The new ones added in Go 1.21 are also go_gc_.*; Go 1.21 just reports more GC metrics, so that likely won't work.

Contributor Author

I meant that we could customize this regexp, not use the default.

internal/admin/server_121_test.go (outdated, resolved)
internal/admin/server.go (outdated, resolved)
@SuperQ SuperQ force-pushed the MetricsGC branch 2 times, most recently from 21d61da to 6e3f72c Compare March 12, 2024 15:09
@SuperQ SuperQ requested a review from kylelemons March 12, 2024 15:10
@SuperQ
Contributor Author

SuperQ commented Mar 12, 2024

Updated now that 1.21 is the minimum version.

@SuperQ SuperQ changed the title Add Go GC metrics to admin endpoint default Add All Go metrics to admin endpoint default Mar 12, 2024
@SuperQ
Contributor Author

SuperQ commented Mar 12, 2024

Updated to include MetricsAll.

Add All Go `runtime/metrics` to the default included metrics.

Signed-off-by: SuperQ <[email protected]>
Member

@fishy fishy left a comment

I'm still not sure that having a unit test that pins down everything defined in upstream packages we don't control (Go itself and Prometheus) is a good idea or scales well. I think the unit test should check for a subset (making sure a few key metrics are there) instead of equality.

@SuperQ
Contributor Author

SuperQ commented Mar 12, 2024

The unit test is already set up as a subset check. It only checks one way: whether everything in the expected list is in the output set, not the other way around.
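
As a rough illustration of a one-way check like the one described here, the sketch below reports only the expected names that are absent from the output and ignores any extras. The helper `missing` and the sample data are hypothetical; this is not the test code from this PR.

```go
package main

import "fmt"

// missing is a hypothetical helper that returns every name in expected
// that does not appear in got. Extra names in got are ignored, which is
// what makes this a subset check rather than an equality check.
func missing(expected, got []string) []string {
	seen := make(map[string]struct{}, len(got))
	for _, g := range got {
		seen[g] = struct{}{}
	}
	var out []string
	for _, e := range expected {
		if _, ok := seen[e]; !ok {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	expected := []string{"go_gc_gogc_percent", "go_gc_heap_live_bytes"}
	// The reported set contains one name that is not in expected.
	got := []string{"go_gc_gogc_percent", "go_gc_heap_live_bytes", "go_gc_scan_heap_bytes"}

	fmt.Println(len(missing(expected, got))) // prints 0: the extra metric does not fail the check
}
```

With this shape, a newer Go release that reports additional metrics leaves the check passing, while dropping an expected metric still fails it.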

@bjk-reddit bjk-reddit merged commit 8a597c5 into reddit:master Mar 13, 2024
2 checks passed
@SuperQ SuperQ deleted the MetricsGC branch March 13, 2024 15:08
5 participants