Dashboards reference

This document contains a complete reference on Sourcegraph’s available dashboards, as well as details on how to interpret the panels and metrics.

To learn more about Sourcegraph’s metrics and how to view these dashboards, see our metrics guide.

Frontend

Serves all end-user browser and API requests.

Frontend: Search at a glance

frontend: 99th_percentile_search_request_duration

This panel indicates 99th percentile successful search request duration over 5m.

Managed by the Sourcegraph Search team.


frontend: 90th_percentile_search_request_duration

This panel indicates 90th percentile successful search request duration over 5m.

Managed by the Sourcegraph Search team.


frontend: hard_timeout_search_responses

This panel indicates hard timeout search responses every 5m.

Managed by the Sourcegraph Search team.


frontend: hard_error_search_responses

This panel indicates hard error search responses every 5m.

Managed by the Sourcegraph Search team.


frontend: partial_timeout_search_responses

This panel indicates partial timeout search responses every 5m.

Managed by the Sourcegraph Search team.


frontend: search_alert_user_suggestions

This panel indicates search alert user suggestions shown every 5m.

Managed by the Sourcegraph Search team.


frontend: page_load_latency

This panel indicates 90th percentile page load latency over all routes over 10m.

Managed by the Sourcegraph Core application team.


frontend: blob_load_latency

This panel indicates 90th percentile blob load latency over 10m.

Managed by the Sourcegraph Core application team.


Frontend: Search-based code intelligence at a glance

frontend: 99th_percentile_search_codeintel_request_duration

This panel indicates 99th percentile code-intel successful search request duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: 90th_percentile_search_codeintel_request_duration

This panel indicates 90th percentile code-intel successful search request duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: hard_timeout_search_codeintel_responses

This panel indicates hard timeout search code-intel responses every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: hard_error_search_codeintel_responses

This panel indicates hard error search code-intel responses every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: partial_timeout_search_codeintel_responses

This panel indicates partial timeout search code-intel responses every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: search_codeintel_alert_user_suggestions

This panel indicates search code-intel alert user suggestions shown every 5m.

Managed by the Sourcegraph Code-intelligence team.


Frontend: Search API usage at a glance

frontend: 99th_percentile_search_api_request_duration

This panel indicates 99th percentile successful search API request duration over 5m.

Managed by the Sourcegraph Search team.


frontend: 90th_percentile_search_api_request_duration

This panel indicates 90th percentile successful search API request duration over 5m.

Managed by the Sourcegraph Search team.


frontend: hard_timeout_search_api_responses

This panel indicates hard timeout search API responses every 5m.

Managed by the Sourcegraph Search team.


frontend: hard_error_search_api_responses

This panel indicates hard error search API responses every 5m.

Managed by the Sourcegraph Search team.


frontend: partial_timeout_search_api_responses

This panel indicates partial timeout search API responses every 5m.

Managed by the Sourcegraph Search team.


frontend: search_api_alert_user_suggestions

This panel indicates search API alert user suggestions shown every 5m.

Managed by the Sourcegraph Search team.


Frontend: Precise code intelligence usage at a glance

frontend: codeintel_resolvers_99th_percentile_duration

This panel indicates 99th percentile successful resolver duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_resolvers_errors

This panel indicates resolver errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


Frontend: Precise code intelligence stores and clients

frontend: codeintel_dbstore_99th_percentile_duration

This panel indicates 99th percentile successful database store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_dbstore_errors

This panel indicates database store errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_upload_workerstore_99th_percentile_duration

This panel indicates 99th percentile successful upload worker store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_upload_workerstore_errors

This panel indicates upload worker store errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_index_workerstore_99th_percentile_duration

This panel indicates 99th percentile successful index worker store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_index_workerstore_errors

This panel indicates index worker store errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_lsifstore_99th_percentile_duration

This panel indicates 99th percentile successful LSIF store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_lsifstore_errors

This panel indicates LSIF store errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_uploadstore_99th_percentile_duration

This panel indicates 99th percentile successful upload store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_uploadstore_errors

This panel indicates upload store errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_gitserverclient_99th_percentile_duration

This panel indicates 99th percentile successful gitserver client operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


frontend: codeintel_gitserverclient_errors

This panel indicates gitserver client errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


Frontend: Out of band migrations

frontend: out_of_band_migrations_up_99th_percentile_duration

This panel indicates 99th percentile successful out-of-band up migration invocation (single batch processed) duration over 5m.

Managed by the Sourcegraph Core application team.


frontend: out_of_band_migrations_up_errors

This panel indicates out-of-band up migration errors every 5m.

Managed by the Sourcegraph Core application team.


frontend: out_of_band_migrations_down_99th_percentile_duration

This panel indicates 99th percentile successful out-of-band down migration invocation (single batch processed) duration over 5m.

Managed by the Sourcegraph Core application team.


frontend: out_of_band_migrations_down_errors

This panel indicates out-of-band down migration errors every 5m.

Managed by the Sourcegraph Core application team.


Frontend: Internal service requests

frontend: internal_indexed_search_error_responses

This panel indicates internal indexed search error responses every 5m.

Managed by the Sourcegraph Search team.


frontend: internal_unindexed_search_error_responses

This panel indicates internal unindexed search error responses every 5m.

Managed by the Sourcegraph Search team.


frontend: internal_api_error_responses

This panel indicates internal API error responses every 5m by route.

Managed by the Sourcegraph Core application team.


frontend: 99th_percentile_gitserver_duration

This panel indicates 99th percentile successful gitserver query duration over 5m.

Managed by the Sourcegraph Core application team.


frontend: gitserver_error_responses

This panel indicates gitserver error responses every 5m.

Managed by the Sourcegraph Core application team.


frontend: observability_test_alert_warning

This panel indicates warning test alert metric.

Managed by the Sourcegraph Distribution team.


frontend: observability_test_alert_critical

This panel indicates critical test alert metric.

Managed by the Sourcegraph Distribution team.


Frontend: Database connections

frontend: max_open_conns

This panel indicates maximum open connections.

Managed by the Sourcegraph Core application team.


frontend: open_conns

This panel indicates established connections.

Managed by the Sourcegraph Core application team.


frontend: in_use

This panel indicates connections in use.

Managed by the Sourcegraph Core application team.


frontend: idle

This panel indicates idle connections.

Managed by the Sourcegraph Core application team.


frontend: mean_blocked_seconds_per_conn_request

This panel indicates mean blocked seconds per conn request.

Managed by the Sourcegraph Core application team.


frontend: closed_max_idle

This panel indicates closed by SetMaxIdleConns.

Managed by the Sourcegraph Core application team.


frontend: closed_max_lifetime

This panel indicates closed by SetConnMaxLifetime.

Managed by the Sourcegraph Core application team.


frontend: closed_max_idle_time

This panel indicates closed by SetConnMaxIdleTime.

Managed by the Sourcegraph Core application team.


Frontend: Container monitoring (not available on server)

frontend: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reason. A consolidated check is sketched after the list below.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod (frontend|sourcegraph-frontend) (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p (frontend|sourcegraph-frontend).
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' (frontend|sourcegraph-frontend) (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the (frontend|sourcegraph-frontend) container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs (frontend|sourcegraph-frontend) (note this will include logs from the previous and currently running container).
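
A consolidated sketch of the checks above, assuming the container is named sourcegraph-frontend (substitute frontend, or the relevant container name for other services):

  # Kubernetes: was the pod OOM killed? (look for OOMKilled: true)
  kubectl describe pod sourcegraph-frontend
  # Kubernetes: logs from the previous container instance (look for panic: messages)
  kubectl logs -p sourcegraph-frontend

  # Docker Compose: was the container OOM killed? (look for "OOMKilled":true)
  docker inspect -f '{{json .State}}' sourcegraph-frontend
  # Docker Compose: logs include the previous and currently running container
  docker logs sourcegraph-frontend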

Managed by the Sourcegraph Core application team.


frontend: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Core application team.


frontend: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Core application team.


frontend: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with frontend issues.

Managed by the Sourcegraph Core application team.


Frontend: Provisioning indicators (not available on server)

frontend: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Core application team.


frontend: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Core application team.


frontend: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Core application team.


frontend: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Core application team.


Frontend: Golang runtime monitoring

frontend: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Core application team.


frontend: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Core application team.


Frontend: Kubernetes monitoring (only available on Kubernetes)

frontend: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Core application team.


Frontend: Sentinel queries (only on sourcegraph.com)

frontend: mean_successful_sentinel_duration_5m

This panel indicates mean successful sentinel search duration over 5m.

Managed by the Sourcegraph Search team.


frontend: mean_sentinel_stream_latency_5m

This panel indicates mean sentinel stream latency over 5m.

Managed by the Sourcegraph Search team.


frontend: 90th_percentile_successful_sentinel_duration_5m

This panel indicates 90th percentile successful sentinel search duration over 5m.

Managed by the Sourcegraph Search team.


frontend: 90th_percentile_sentinel_stream_latency_5m

This panel indicates 90th percentile sentinel stream latency over 5m.

Managed by the Sourcegraph Search team.


frontend: mean_successful_sentinel_duration_by_query_5m

This panel indicates mean successful sentinel search duration by query over 5m.

  • The mean search duration for sentinel queries, broken down by query. Useful for debugging whether a slowdown is limited to a specific type of query.

Managed by the Sourcegraph Search team.


frontend: mean_sentinel_stream_latency_by_query_5m

This panel indicates mean sentinel stream latency by query over 5m.

  • The mean streaming search latency for sentinel queries, broken down by query. Useful for debugging whether a slowdown is limited to a specific type of query.

Managed by the Sourcegraph Search team.


frontend: unsuccessful_status_rate_5m

This panel indicates unsuccessful status rate per 5m.

  • The rate of unsuccessful sentinel queries, broken down by failure type.

Managed by the Sourcegraph Search team.


Git Server

Stores, manages, and operates Git repositories.

gitserver: memory_working_set

This panel indicates memory working set.

Managed by the Sourcegraph Core application team.


gitserver: go_routines

This panel indicates goroutines.

Managed by the Sourcegraph Core application team.


gitserver: cpu_throttling_time

This panel indicates container CPU throttling time %.

Managed by the Sourcegraph Core application team.


gitserver: cpu_usage_seconds

This panel indicates cpu usage seconds.

Managed by the Sourcegraph Core application team.


gitserver: disk_space_remaining

This panel indicates disk space remaining by instance.
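
A quick command-line cross-check; a minimal sketch, assuming a Kubernetes deployment with a pod named gitserver-0 and repositories stored under /data/repos (both names are assumptions; adjust for your deployment):

  # Show free space on the gitserver repository volume
  kubectl exec gitserver-0 -- df -h /data/repos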

Managed by the Sourcegraph Core application team.


gitserver: io_reads_total

This panel indicates i/o reads total.

Managed by the Sourcegraph Core application team.


gitserver: io_writes_total

This panel indicates i/o writes total.

Managed by the Sourcegraph Core application team.


gitserver: io_reads

This panel indicates i/o reads.

Managed by the Sourcegraph Core application team.


gitserver: io_writes

This panel indicates i/o writes.

Managed by the Sourcegraph Core application team.


gitserver: io_read_througput

This panel indicates i/o read throughput.

Managed by the Sourcegraph Core application team.


gitserver: io_write_throughput

This panel indicates i/o write throughput.

Managed by the Sourcegraph Core application team.


gitserver: running_git_commands

This panel indicates git commands running on each gitserver instance.

A high value signals load.

Managed by the Sourcegraph Core application team.


gitserver: git_commands_received

This panel indicates rate of git commands received across all instances.

Per-second rate per command across all instances

Managed by the Sourcegraph Core application team.


gitserver: repository_clone_queue_size

This panel indicates repository clone queue size.

Managed by the Sourcegraph Core application team.


gitserver: repository_existence_check_queue_size

This panel indicates repository existence check queue size.

Managed by the Sourcegraph Core application team.


gitserver: echo_command_duration_test

This panel indicates echo test command duration.

A high value here likely indicates a problem, especially if consistently high. You can query for individual commands using sum by (cmd)(src_gitserver_exec_running) in Grafana (/-/debug/grafana) to see if a specific Git Server command might be spiking in frequency.

If this value is consistently high, consider the following:

  • Single container deployments: Upgrade to a Docker Compose deployment which offers better scalability and resource isolation.
  • Kubernetes and Docker Compose: Check that you are running a similar number of git server replicas and that their CPU/memory limits are allocated according to what is shown in the Sourcegraph resource estimator.
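
As noted above, individual commands can be broken down with sum by (cmd)(src_gitserver_exec_running). A minimal sketch of running that query against the bundled Prometheus HTTP API, assuming it has been made reachable at localhost:9090 (the port-forward shown is illustrative; service names and ports vary by deployment):

  # Make the bundled Prometheus reachable locally, e.g. on Kubernetes:
  #   kubectl port-forward svc/prometheus 9090:30090
  # Break down currently running git commands by command name
  curl -sG 'http://localhost:9090/api/v1/query' \
    --data-urlencode 'query=sum by (cmd)(src_gitserver_exec_running)'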

Managed by the Sourcegraph Core application team.


gitserver: frontend_internal_api_error_responses

This panel indicates frontend-internal API error responses every 5m by route.

Managed by the Sourcegraph Core application team.


Git Server: Gitserver cleanup jobs

gitserver: janitor_running

This panel indicates whether the janitor process is running.

1, if the janitor process is currently running

Managed by the Sourcegraph Core application team.


gitserver: janitor_job_duration

This panel indicates 95th percentile job run duration.

95th percentile job run duration

Managed by the Sourcegraph Core application team.


gitserver: repos_removed

This panel indicates repositories removed due to disk pressure.

Repositories removed due to disk pressure

Managed by the Sourcegraph Core application team.


Git Server: Database connections

gitserver: max_open_conns

This panel indicates maximum open connections.

Managed by the Sourcegraph Core application team.


gitserver: open_conns

This panel indicates established connections.

Managed by the Sourcegraph Core application team.


gitserver: in_use

This panel indicates connections in use.

Managed by the Sourcegraph Core application team.


gitserver: idle

This panel indicates idle connections.

Managed by the Sourcegraph Core application team.


gitserver: mean_blocked_seconds_per_conn_request

This panel indicates mean blocked seconds per conn request.

Managed by the Sourcegraph Core application team.


gitserver: closed_max_idle

This panel indicates closed by SetMaxIdleConns.

Managed by the Sourcegraph Core application team.


gitserver: closed_max_lifetime

This panel indicates closed by SetConnMaxLifetime.

Managed by the Sourcegraph Core application team.


gitserver: closed_max_idle_time

This panel indicates closed by SetConnMaxIdleTime.

Managed by the Sourcegraph Core application team.


Git Server: Container monitoring (not available on server)

gitserver: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod gitserver (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p gitserver.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' gitserver (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the gitserver container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs gitserver (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Core application team.


gitserver: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Core application team.


gitserver: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Core application team.


gitserver: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with gitserver issues.

Managed by the Sourcegraph Core application team.


Git Server: Provisioning indicators (not available on server)

gitserver: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Core application team.


gitserver: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Git Server is expected to use up all the memory it is provided.

Managed by the Sourcegraph Core application team.


gitserver: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Core application team.


gitserver: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Git Server is expected to use up all the memory it is provided.

Managed by the Sourcegraph Core application team.


Git Server: Golang runtime monitoring

gitserver: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Core application team.


gitserver: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Core application team.


Git Server: Kubernetes monitoring (only available on Kubernetes)

gitserver: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Core application team.


GitHub Proxy

Proxies all requests to github.com, keeping track of and managing rate limits.

GitHub Proxy: GitHub API monitoring

github-proxy: github_proxy_waiting_requests

This panel indicates number of requests waiting on the global mutex.

Managed by the Sourcegraph Core application team.


GitHub Proxy: Container monitoring (not available on server)

github-proxy: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod github-proxy (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p github-proxy.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' github-proxy (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the github-proxy container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs github-proxy (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Core application team.


github-proxy: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Core application team.


github-proxy: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Core application team.


github-proxy: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with github-proxy issues.

Managed by the Sourcegraph Core application team.


GitHub Proxy: Provisioning indicators (not available on server)

github-proxy: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Core application team.


github-proxy: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Core application team.


github-proxy: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Core application team.


github-proxy: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Core application team.


GitHub Proxy: Golang runtime monitoring

github-proxy: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Core application team.


github-proxy: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Core application team.


GitHub Proxy: Kubernetes monitoring (only available on Kubernetes)

github-proxy: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Core application team.


Postgres

Postgres metrics, exported from postgres_exporter (only available on Kubernetes).

postgres: connections

This panel indicates active connections.

Managed by the Sourcegraph Core application team.


postgres: transaction_durations

This panel indicates maximum transaction durations.
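
To see which sessions are holding the longest-running transactions, the standard pg_stat_activity view can be queried directly; a minimal sketch, assuming psql access to the Sourcegraph database (connection flags are deployment-specific and omitted):

  psql -c "SELECT pid, now() - xact_start AS xact_age, state, left(query, 60) AS query
           FROM pg_stat_activity
           WHERE xact_start IS NOT NULL
           ORDER BY xact_age DESC
           LIMIT 10;"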

Managed by the Sourcegraph Core application team.


Postgres: Database and collector status

postgres: postgres_up

This panel indicates database availability.

A non-zero value indicates the database is online.

Managed by the Sourcegraph Core application team.


postgres: invalid_indexes

This panel indicates invalid indexes (unusable by the query planner).

A non-zero value indicates that Postgres failed to build an index. Expect degraded performance until the index is manually rebuilt.
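
Invalid indexes can be listed and rebuilt directly in Postgres; a minimal sketch, assuming psql access to the affected database (the index name in the REINDEX is a placeholder):

  # List indexes the query planner cannot use
  psql -c "SELECT n.nspname AS schema, c.relname AS index
           FROM pg_index i
           JOIN pg_class c ON c.oid = i.indexrelid
           JOIN pg_namespace n ON n.oid = c.relnamespace
           WHERE NOT i.indisvalid;"
  # Rebuild a reported index (CONCURRENTLY avoids blocking writes on PostgreSQL 12+)
  psql -c "REINDEX INDEX CONCURRENTLY <schema>.<index>;"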

Managed by the Sourcegraph Core application team.


postgres: pg_exporter_err

This panel indicates errors scraping postgres exporter.

This value indicates issues retrieving metrics from postgres_exporter.

Managed by the Sourcegraph Core application team.


postgres: migration_in_progress

This panel indicates active schema migration.

A 0 value indicates that no migration is in progress.

Managed by the Sourcegraph Core application team.


Postgres: Object size and bloat

postgres: pg_table_size

This panel indicates table size.

Total size of this table
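
Sizes can also be checked ad hoc with Postgres' built-in size functions; a minimal sketch, assuming psql access (the table name is a placeholder):

  psql -c "SELECT pg_size_pretty(pg_table_size('<table>')) AS table_size,
                  pg_size_pretty(pg_indexes_size('<table>')) AS indexes_size,
                  pg_size_pretty(pg_total_relation_size('<table>')) AS total;"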

Managed by the Sourcegraph Core application team.


postgres: pg_table_bloat_ratio

This panel indicates table bloat ratio.

Estimated bloat ratio of this table (high bloat = high overhead)

Managed by the Sourcegraph Core application team.


postgres: pg_index_size

This panel indicates index size.

Total size of this index

Managed by the Sourcegraph Core application team.


postgres: pg_index_bloat_ratio

This panel indicates index bloat ratio.

Estimated bloat ratio of this index (high bloat = high overhead)

Managed by the Sourcegraph Core application team.


Postgres: Provisioning indicators (not available on server)

postgres: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Core application team.


postgres: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Core application team.


postgres: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Core application team.


postgres: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Core application team.


Postgres: Kubernetes monitoring (only available on Kubernetes)

postgres: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Core application team.


Precise Code Intel Worker

Handles conversion of uploaded precise code intelligence bundles.

Precise Code Intel Worker: Upload queue

precise-code-intel-worker: upload_queue_size

This panel indicates queue size.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: upload_queue_growth_rate

This panel indicates queue growth rate over 30m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: job_errors

This panel indicates job errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: active_workers

This panel indicates active workers processing uploads.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: active_jobs

This panel indicates active jobs.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Worker: Workers

precise-code-intel-worker: job_99th_percentile_duration

This panel indicates 99th percentile successful job duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Worker: Stores and clients

precise-code-intel-worker: codeintel_dbstore_99th_percentile_duration

This panel indicates 99th percentile successful database store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_dbstore_errors

This panel indicates database store errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_workerstore_99th_percentile_duration

This panel indicates 99th percentile successful worker store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_workerstore_errors

This panel indicates worker store errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_lsifstore_99th_percentile_duration

This panel indicates 99th percentile successful LSIF store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_lsifstore_errors

This panel indicates LSIF store errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_uploadstore_99th_percentile_duration

This panel indicates 99th percentile successful upload store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_uploadstore_errors

This panel indicates upload store errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_gitserverclient_99th_percentile_duration

This panel indicates 99th percentile successful gitserver client operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: codeintel_gitserverclient_errors

This panel indicates gitserver client errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Worker: Internal service requests

precise-code-intel-worker: frontend_internal_api_error_responses

This panel indicates frontend-internal API error responses every 5m by route.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Worker: Database connections

precise-code-intel-worker: max_open_conns

This panel indicates maximum open connections.

Managed by the Sourcegraph Core application team.


precise-code-intel-worker: open_conns

This panel indicates established connections.

Managed by the Sourcegraph Core application team.


precise-code-intel-worker: in_use

This panel indicates connections in use.

Managed by the Sourcegraph Core application team.


precise-code-intel-worker: idle

This panel indicates idle connections.

Managed by the Sourcegraph Core application team.


precise-code-intel-worker: mean_blocked_seconds_per_conn_request

This panel indicates mean blocked seconds per conn request.

Managed by the Sourcegraph Core application team.


precise-code-intel-worker: closed_max_idle

This panel indicates closed by SetMaxIdleConns.

Managed by the Sourcegraph Core application team.


precise-code-intel-worker: closed_max_lifetime

This panel indicates closed by SetConnMaxLifetime.

Managed by the Sourcegraph Core application team.


precise-code-intel-worker: closed_max_idle_time

This panel indicates closed by SetConnMaxIdleTime.

Managed by the Sourcegraph Core application team.


Precise Code Intel Worker: Container monitoring (not available on server)

precise-code-intel-worker: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod precise-code-intel-worker (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p precise-code-intel-worker.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' precise-code-intel-worker (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the precise-code-intel-worker container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs precise-code-intel-worker (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with precise-code-intel-worker issues.

Managed by the Sourcegraph Core application team.


Precise Code Intel Worker: Provisioning indicators (not available on server)

precise-code-intel-worker: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Worker: Golang runtime monitoring

precise-code-intel-worker: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-worker: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Worker: Kubernetes monitoring (only available on Kubernetes)

precise-code-intel-worker: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Code-intelligence team.


Query Runner

Periodically runs saved searches and instructs the frontend to send out notifications.

Query Runner: Internal service requests

query-runner: frontend_internal_api_error_responses

This panel indicates frontend-internal API error responses every 5m by route.

Managed by the Sourcegraph Search team.


Query Runner: Container monitoring (not available on server)

query-runner: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod query-runner (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p query-runner.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' query-runner (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the query-runner container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs query-runner (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Search team.


query-runner: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Search team.


query-runner: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Search team.


query-runner: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with query-runner issues.

Managed by the Sourcegraph Core application team.


Query Runner: Provisioning indicators (not available on server)

query-runner: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Search team.


query-runner: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Search team.


query-runner: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Search team.


query-runner: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Search team.


Query Runner: Golang runtime monitoring

query-runner: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Search team.


query-runner: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Search team.


Query Runner: Kubernetes monitoring (only available on Kubernetes)

query-runner: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Search team.


Worker

Manages background processes.

Worker: Active jobs

worker: worker_job_count

This panel indicates number of worker instances running each job.

The number of worker instances running each job type. It is necessary for each job type to be managed by at least one worker instance.


worker: worker_job_codeintel-janitor_count

This panel indicates number of worker instances running the codeintel-janitor job.

Managed by the Sourcegraph Code-intelligence team.


worker: worker_job_codeintel-commitgraph_count

This panel indicates number of worker instances running the codeintel-commitgraph job.

Managed by the Sourcegraph Code-intelligence team.


worker: worker_job_codeintel-auto-indexing_count

This panel indicates number of worker instances running the codeintel-auto-indexing job.

Managed by the Sourcegraph Code-intelligence team.


Worker: Precise code intelligence commit graph updater

worker: codeintel_commit_graph_queue_size

This panel indicates commit graph queue size.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_commit_graph_queue_growth_rate

This panel indicates commit graph queue growth rate over 30m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_commit_graph_updater_99th_percentile_duration

This panel indicates 99th percentile successful commit graph updater operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_commit_graph_updater_errors

This panel indicates commit graph updater errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


Worker: Precise code intelligence janitor

worker: codeintel_janitor_errors

This panel indicates janitor errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_upload_records_removed

This panel indicates upload records expired or deleted every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_index_records_removed

This panel indicates index records expired or deleted every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_lsif_data_removed

This panel indicates data for unreferenced upload records removed every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_background_upload_resets

This panel indicates upload records re-queued (due to unresponsive worker) every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_background_upload_reset_failures

This panel indicates upload records errored due to repeated reset every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_background_index_resets

This panel indicates index records re-queued (due to unresponsive indexer) every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_background_index_reset_failures

This panel indicates index records errored due to repeated reset every 5m.

Managed by the Sourcegraph Code-intelligence team.


Worker: Auto-indexing

worker: codeintel_indexing_99th_percentile_duration

This panel indicates 99th percentile successful indexing operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_indexing_errors

This panel indicates indexing errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_autoindex_enqueuer_99th_percentile_duration

This panel indicates 99th percentile successful index enqueuer operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


worker: codeintel_autoindex_enqueuer_errors

This panel indicates index enqueuer errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


Worker: Internal service requests

worker: frontend_internal_api_error_responses

This panel indicates frontend-internal API error responses every 5m by route.

Managed by the Sourcegraph Code-intelligence team.


Worker: Database connections

worker: max_open_conns

This panel indicates maximum open connections.

Managed by the Sourcegraph Core application team.


worker: open_conns

This panel indicates established connections.

Managed by the Sourcegraph Core application team.


worker: in_use

This panel indicates connections in use.

Managed by the Sourcegraph Core application team.


worker: idle

This panel indicates idle connections.

Managed by the Sourcegraph Core application team.


worker: mean_blocked_seconds_per_conn_request

This panel indicates mean blocked seconds per conn request.

Managed by the Sourcegraph Core application team.


worker: closed_max_idle

This panel indicates closed by SetMaxIdleConns.

Managed by the Sourcegraph Core application team.


worker: closed_max_lifetime

This panel indicates closed by SetConnMaxLifetime.

Managed by the Sourcegraph Core application team.


worker: closed_max_idle_time

This panel indicates closed by SetConnMaxIdleTime.

Managed by the Sourcegraph Core application team.


Worker: Container monitoring (not available on server)

worker: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independently of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reason.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod worker (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p worker.
  • Docker Compose:
    • Determine if the pod was OOM killed using docker inspect -f '{{json .State}}' worker (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the worker container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs worker (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Code-intelligence team.


worker: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


worker: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Code-intelligence team.


worker: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with worker issues.

Managed by the Sourcegraph Core application team.


Worker: Provisioning indicators (not available on server)

worker: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


worker: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


worker: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


worker: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


Worker: Golang runtime monitoring

worker: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Code-intelligence team.


worker: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Code-intelligence team.


Worker: Kubernetes monitoring (only available on Kubernetes)

worker: pods_available_percentage

This panel indicates percentage pods available.

Managed by the Sourcegraph Code-intelligence team.


Repo Updater

Manages interaction with code hosts and instructs Gitserver to update repositories.

Repo Updater: Repositories

repo-updater: syncer_sync_last_time

This panel indicates time since last sync.

A high value here indicates issues synchronizing repo metadata. If the value is persistently high, make sure all external services have valid tokens.

Managed by the Sourcegraph Core application team.


repo-updater: src_repoupdater_max_sync_backoff

This panel indicates time since oldest sync.

Managed by the Sourcegraph Core application team.


repo-updater: src_repoupdater_syncer_sync_errors_total

This panel indicates site level external service sync error rate.

Managed by the Sourcegraph Core application team.


repo-updater: syncer_sync_start

This panel indicates repo metadata sync was started.

Managed by the Sourcegraph Core application team.


repo-updater: syncer_sync_duration

This panel indicates 95th percentile repositories sync duration.

Managed by the Sourcegraph Core application team.


repo-updater: source_duration

This panel indicates 95th percentile repositories source duration.

Managed by the Sourcegraph Core application team.


repo-updater: syncer_synced_repos

This panel indicates repositories synced.

Managed by the Sourcegraph Core application team.


repo-updater: sourced_repos

This panel indicates repositories sourced.

Managed by the Sourcegraph Core application team.


repo-updater: user_added_repos

This panel indicates total number of user added repos.

Managed by the Sourcegraph Core application team.


repo-updater: purge_failed

This panel indicates repositories purge failed.

Managed by the Sourcegraph Core application team.


repo-updater: sched_auto_fetch

This panel indicates repositories scheduled due to hitting a deadline.

Managed by the Sourcegraph Core application team.


repo-updater: sched_manual_fetch

This panel indicates repositories scheduled due to user traffic.

Check repo-updater logs if this value is persistently high. This does not indicate anything if there are no user added code hosts.

Managed by the Sourcegraph Core application team.


repo-updater: sched_known_repos

This panel indicates repositories managed by the scheduler.

Managed by the Sourcegraph Core application team.


repo-updater: sched_update_queue_length

This panel indicates rate of growth of update queue length over 5 minutes.

Managed by the Sourcegraph Core application team.


repo-updater: sched_loops

This panel indicates scheduler loops.

Managed by the Sourcegraph Core application team.


repo-updater: sched_error

This panel indicates repositories schedule error rate.

Managed by the Sourcegraph Core application team.


Repo Updater: Permissions

repo-updater: perms_syncer_perms

This panel indicates time gap between least and most up to date permissions.

Managed by the Sourcegraph Core application team.


repo-updater: perms_syncer_stale_perms

This panel indicates number of entities with stale permissions.

Managed by the Sourcegraph Core application team.


repo-updater: perms_syncer_no_perms

This panel indicates number of entities with no permissions.

Managed by the Sourcegraph Core application team.


repo-updater: perms_syncer_sync_duration

This panel indicates 95th percentile permissions sync duration.

Managed by the Sourcegraph Core application team.


repo-updater: perms_syncer_queue_size

This panel indicates permissions sync queued items.

Managed by the Sourcegraph Core application team.


repo-updater: perms_syncer_sync_errors

This panel indicates permissions sync error rate.

Managed by the Sourcegraph Core application team.


Repo Updater: External services

repo-updater: src_repoupdater_external_services_total

This panel indicates the total number of external services.

Managed by the Sourcegraph Core application team.


repo-updater: src_repoupdater_user_external_services_total

This panel indicates the total number of user added external services.

Managed by the Sourcegraph Core application team.


repo-updater: repoupdater_queued_sync_jobs_total

This panel indicates the total number of queued sync jobs.

Managed by the Sourcegraph Core application team.


repo-updater: repoupdater_completed_sync_jobs_total

This panel indicates the total number of completed sync jobs.

Managed by the Sourcegraph Core application team.


repo-updater: repoupdater_errored_sync_jobs_percentage

This panel indicates the percentage of external services that have failed their most recent sync.

Managed by the Sourcegraph Core application team.


repo-updater: github_graphql_rate_limit_remaining

This panel indicates remaining calls to the GitHub GraphQL API before hitting the rate limit.
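
These values can be cross-checked against what GitHub itself reports for the token in use; a minimal sketch, assuming a personal access token is available in the GITHUB_TOKEN environment variable:

  # Reports remaining core (REST), search, and GraphQL quota for this token
  curl -s -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit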

Managed by the Sourcegraph Core application team.


repo-updater: github_rest_rate_limit_remaining

This panel indicates remaining calls to the GitHub REST API before hitting the rate limit.

Managed by the Sourcegraph Core application team.


repo-updater: github_search_rate_limit_remaining

This panel indicates remaining calls to the GitHub search API before hitting the rate limit.

Managed by the Sourcegraph Core application team.


repo-updater: github_graphql_rate_limit_wait_duration

This panel indicates time spent waiting for the GitHub GraphQL API rate limiter.

Indicates how long we're waiting on the rate limit once it has been exceeded

Managed by the Sourcegraph Core application team.


repo-updater: github_rest_rate_limit_wait_duration

This panel indicates time spent waiting for the GitHub REST API rate limiter.

Indicates how long we're waiting on the rate limit once it has been exceeded

Managed by the Sourcegraph Core application team.


repo-updater: github_search_rate_limit_wait_duration

This panel indicates time spent waiting for the GitHub search API rate limiter.

Indicates how long we're waiting on the rate limit once it has been exceeded

Managed by the Sourcegraph Core application team.


repo-updater: gitlab_rest_rate_limit_remaining

This panel indicates remaining calls to the GitLab REST API before hitting the rate limit.

Managed by the Sourcegraph Core application team.


repo-updater: gitlab_rest_rate_limit_wait_duration

This panel indicates time spent waiting for the GitLab REST API rate limiter.

Indicates how long we're waiting on the rate limit once it has been exceeded

Managed by the Sourcegraph Core application team.


Repo Updater: Internal service requests

repo-updater: frontend_internal_api_error_responses

This panel indicates frontend-internal API error responses every 5m by route.

Managed by the Sourcegraph Core application team.


Repo Updater: Database connections

repo-updater: max_open_conns

This panel indicates maximum open connections.

Managed by the Sourcegraph Core application team.


repo-updater: open_conns

This panel indicates established connections.

Managed by the Sourcegraph Core application team.


repo-updater: in_use

This panel indicates connections in use.

Managed by the Sourcegraph Core application team.


repo-updater: idle

This panel indicates idle connections.

Managed by the Sourcegraph Core application team.


repo-updater: mean_blocked_seconds_per_conn_request

This panel indicates mean blocked seconds per conn request.

Managed by the Sourcegraph Core application team.


repo-updater: closed_max_idle

This panel indicates closed by SetMaxIdleConns.

Managed by the Sourcegraph Core application team.


repo-updater: closed_max_lifetime

This panel indicates the number of connections closed due to SetConnMaxLifetime.

Managed by the Sourcegraph Core application team.


repo-updater: closed_max_idle_time

This panel indicates the number of connections closed due to SetConnMaxIdleTime.

Managed by the Sourcegraph Core application team.
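
The panels in this section appear to correspond to the connection-pool statistics that Go's database/sql package reports via DB.Stats (MaxOpenConnections, OpenConnections, InUse, Idle, WaitDuration, MaxIdleClosed, MaxLifetimeClosed, MaxIdleTimeClosed). Below is a minimal Go sketch of those pool settings and statistics; the driver, connection string, and limits are placeholders for illustration, not Sourcegraph's actual configuration.

  package main

  import (
      "database/sql"
      "fmt"
      "time"

      _ "github.com/lib/pq" // hypothetical Postgres driver choice for this example
  )

  func main() {
      // Placeholder connection string, not a real Sourcegraph setting.
      db, err := sql.Open("postgres", "postgres://user:pass@localhost/db")
      if err != nil {
          panic(err)
      }

      // Pool limits whose effects the panels above surface (values are illustrative).
      db.SetMaxOpenConns(30)                 // ceiling tracked by max_open_conns
      db.SetMaxIdleConns(10)                 // connections closed by this limit -> closed_max_idle
      db.SetConnMaxLifetime(time.Hour)       // connections closed by this limit -> closed_max_lifetime
      db.SetConnMaxIdleTime(5 * time.Minute) // connections closed by this limit -> closed_max_idle_time

      // DB.Stats exposes counters with the same shape as these panels.
      s := db.Stats()
      fmt.Println(s.MaxOpenConnections, s.OpenConnections, s.InUse, s.Idle, s.WaitDuration)
  }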


Repo Updater: Container monitoring (not available on server)

repo-updater: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod repo-updater (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p repo-updater.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' repo-updater (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the repo-updater container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs repo-updater (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Core application team.


repo-updater: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Core application team.


repo-updater: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Core application team.


repo-updater: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with repo-updater issues.

Managed by the Sourcegraph Core application team.


Repo Updater: Provisioning indicators (not available on server)

repo-updater: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Core application team.


repo-updater: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Core application team.


repo-updater: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Core application team.


repo-updater: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Core application team.


Repo Updater: Golang runtime monitoring

repo-updater: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Core application team.
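
If this panel suggests a goroutine leak, the count can be cross-checked from the process itself. Below is a minimal, illustrative Go sketch of that check; the pprof listen address is an assumption for the example, not a Sourcegraph default.

  package main

  import (
      "fmt"
      "log"
      "net/http"
      _ "net/http/pprof" // registers /debug/pprof/ handlers, including the goroutine profile
      "runtime"
      "time"
  )

  func main() {
      go func() {
          // http://localhost:6060/debug/pprof/goroutine?debug=1 lists live goroutines.
          log.Println(http.ListenAndServe("localhost:6060", nil))
      }()

      // A count that climbs without bound alongside this panel suggests a leak.
      for {
          fmt.Println("goroutines:", runtime.NumGoroutine())
          time.Sleep(10 * time.Second)
      }
  }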


repo-updater: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Core application team.


Repo Updater: Kubernetes monitoring (only available on Kubernetes)

repo-updater: pods_available_percentage

This panel indicates the percentage of pods available.

Managed by the Sourcegraph Core application team.


Searcher

Performs unindexed searches (diff and commit search, text search for unindexed branches).

searcher: unindexed_search_request_errors

This panel indicates unindexed search request errors every 5m by code.

Managed by the Sourcegraph Search team.


searcher: replica_traffic

This panel indicates requests per second over 10m.

Managed by the Sourcegraph Search team.


Searcher: Internal service requests

searcher: frontend_internal_api_error_responses

This panel indicates frontend-internal API error responses every 5m by route.

Managed by the Sourcegraph Search team.


Searcher: Container monitoring (not available on server)

searcher: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod searcher (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p searcher.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' searcher (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the searcher container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs searcher (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Search team.


searcher: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Search team.


searcher: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Search team.


searcher: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with searcher issues.

Managed by the Sourcegraph Core application team.


Searcher: Provisioning indicators (not available on server)

searcher: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Search team.


searcher: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Search team.


searcher: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Search team.


searcher: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Search team.


Searcher: Golang runtime monitoring

searcher: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Search team.


searcher: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Search team.


Searcher: Kubernetes monitoring (only available on Kubernetes)

searcher: pods_available_percentage

This panel indicates the percentage of pods available.

Managed by the Sourcegraph Search team.


Symbols

Handles symbol searches for unindexed branches.

symbols: store_fetch_failures

This panel indicates store fetch failures every 5m.

Managed by the Sourcegraph Code-intelligence team.


symbols: current_fetch_queue_size

This panel indicates current fetch queue size.

Managed by the Sourcegraph Code-intelligence team.


Symbols: Internal service requests

symbols: frontend_internal_api_error_responses

This panel indicates frontend-internal API error responses every 5m by route.

Managed by the Sourcegraph Code-intelligence team.


Symbols: Container monitoring (not available on server)

symbols: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod symbols (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p symbols.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' symbols (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the symbols container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs symbols (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Code-intelligence team.


symbols: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


symbols: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Code-intelligence team.


symbols: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with symbols issues.

Managed by the Sourcegraph Core application team.


Symbols: Provisioning indicators (not available on server)

symbols: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


symbols: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


symbols: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


symbols: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


Symbols: Golang runtime monitoring

symbols: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Code-intelligence team.


symbols: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Code-intelligence team.


Symbols: Kubernetes monitoring (only available on Kubernetes)

symbols: pods_available_percentage

This panel indicates the percentage of pods available.

Managed by the Sourcegraph Code-intelligence team.


Syntect Server

Handles syntax highlighting for code files.

syntect-server: syntax_highlighting_errors

This panel indicates syntax highlighting errors every 5m.

Managed by the Sourcegraph Core application team.


syntect-server: syntax_highlighting_timeouts

This panel indicates syntax highlighting timeouts every 5m.

Managed by the Sourcegraph Core application team.


syntect-server: syntax_highlighting_panics

This panel indicates syntax highlighting panics every 5m.

Managed by the Sourcegraph Core application team.


syntect-server: syntax_highlighting_worker_deaths

This panel indicates syntax highlighter worker deaths every 5m.

Managed by the Sourcegraph Core application team.


Syntect Server: Container monitoring (not available on server)

syntect-server: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod syntect-server (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p syntect-server.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' syntect-server (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the syntect-server container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs syntect-server (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Core application team.


syntect-server: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Core application team.


syntect-server: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Core application team.


syntect-server: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with syntect-server issues.

Managed by the Sourcegraph Core application team.


Syntect Server: Provisioning indicators (not available on server)

syntect-server: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Core application team.


syntect-server: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Core application team.


syntect-server: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Core application team.


syntect-server: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Core application team.


Syntect Server: Kubernetes monitoring (only available on Kubernetes)

syntect-server: pods_available_percentage

This panel indicates the percentage of pods available.

Managed by the Sourcegraph Core application team.


Zoekt Index Server

Indexes repositories and populates the search index.

zoekt-indexserver: repos_assigned

This panel indicates the total number of repos.

Sudden changes should be caused by indexing configuration changes.

Managed by the Sourcegraph Search team.


zoekt-indexserver: repos_priorities

This panel indicates the total number of repos with priorities for ranking.

Sudden changes should be caused by indexing configuration changes.

Managed by the Sourcegraph Search team.


zoekt-indexserver: repo_index_state

This panel indicates indexing results over 5m (noop=no changes, empty=no branches to index).

A persistent failing state indicates some repositories cannot be indexed, perhaps due to size and timeouts.

Managed by the Sourcegraph Search team.


zoekt-indexserver: repo_index_success_speed

This panel indicates successful indexing durations.

Latency increases can indicate bottlenecks in the indexserver.

Managed by the Sourcegraph Search team.


zoekt-indexserver: repo_index_fail_speed

This panel indicates failed indexing durations.

Failures happening after a long time indicate timeouts.

Managed by the Sourcegraph Search team.


zoekt-indexserver: average_resolve_revision_duration

This panel indicates average resolve revision duration over 5m.

Managed by the Sourcegraph Search team.


Zoekt Index Server: Container monitoring (not available on server)

zoekt-indexserver: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod zoekt-indexserver (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p zoekt-indexserver.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' zoekt-indexserver (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the zoekt-indexserver container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs zoekt-indexserver (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Search team.


zoekt-indexserver: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Search team.


zoekt-indexserver: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Search team.


zoekt-indexserver: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with zoekt-indexserver issues.

Managed by the Sourcegraph Core application team.


Zoekt Index Server: Provisioning indicators (not available on server)

zoekt-indexserver: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Search team.


zoekt-indexserver: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Search team.


zoekt-indexserver: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Search team.


zoekt-indexserver: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Search team.


Zoekt Index Server: Kubernetes monitoring (only available on Kubernetes)

zoekt-indexserver: pods_available_percentage

This panel indicates the percentage of pods available.

Managed by the Sourcegraph Search team.


Zoekt Web Server

Serves indexed search requests using the search index.

zoekt-webserver: indexed_search_request_errors

This panel indicates indexed search request errors every 5m by code.

Managed by the Sourcegraph Search team.


Zoekt Web Server: Container monitoring (not available on server)

zoekt-webserver: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod zoekt-webserver (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p zoekt-webserver.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' zoekt-webserver (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the zoekt-webserver container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs zoekt-webserver (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Search team.


zoekt-webserver: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Search team.


zoekt-webserver: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Search team.


zoekt-webserver: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with zoekt-webserver issues.

Managed by the Sourcegraph Core application team.


Zoekt Web Server: Provisioning indicators (not available on server)

zoekt-webserver: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Search team.


zoekt-webserver: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Search team.


zoekt-webserver: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Search team.


zoekt-webserver: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Search team.


Prometheus

Sourcegraph's all-in-one Prometheus and Alertmanager service.

Prometheus: Metrics

prometheus: prometheus_rule_eval_duration

This panel indicates average prometheus rule group evaluation duration over 10m by rule group.

A high value here indicates Prometheus rule evaluation is taking longer than expected. It might indicate that certain rule groups are taking too long to evaluate, or Prometheus is underprovisioned.

Rules that Sourcegraph ships with are grouped under /sg_config_prometheus. Custom rules are grouped under /sg_prometheus_addons.

Managed by the Sourcegraph Distribution team.
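
The same evaluation-duration data can also be pulled from Prometheus directly when Grafana is unavailable. Below is a hedged Go sketch using the prometheus/client_golang HTTP API client; the address and the PromQL expression are illustrative assumptions, not the panel's exact definition.

  package main

  import (
      "context"
      "fmt"
      "time"

      "github.com/prometheus/client_golang/api"
      v1 "github.com/prometheus/client_golang/api/prometheus/v1"
  )

  func main() {
      // Placeholder address; point this at your Sourcegraph Prometheus instance.
      client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
      if err != nil {
          panic(err)
      }
      promAPI := v1.NewAPI(client)

      ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
      defer cancel()

      // An approximation of a rule-group evaluation duration query, averaged over 10m.
      const query = `avg by (rule_group) (avg_over_time(prometheus_rule_group_last_duration_seconds[10m]))`
      result, warnings, err := promAPI.Query(ctx, query, time.Now())
      if err != nil {
          panic(err)
      }
      if len(warnings) > 0 {
          fmt.Println("warnings:", warnings)
      }
      fmt.Println(result)
  }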


prometheus: prometheus_rule_eval_failures

This panel indicates failed prometheus rule evaluations over 5m by rule group.

Rules that Sourcegraph ships with are grouped under /sg_config_prometheus. Custom rules are grouped under /sg_prometheus_addons.

Managed by the Sourcegraph Distribution team.


Prometheus: Alerts

prometheus: alertmanager_notification_latency

This panel indicates alertmanager notification latency over 1m by integration.

Managed by the Sourcegraph Distribution team.


prometheus: alertmanager_notification_failures

This panel indicates failed alertmanager notifications over 1m by integration.

Managed by the Sourcegraph Distribution team.


Prometheus: Internals

prometheus: prometheus_config_status

This panel indicates prometheus configuration reload status.

A 1 indicates Prometheus reloaded its configuration successfully.

Managed by the Sourcegraph Distribution team.


prometheus: alertmanager_config_status

This panel indicates alertmanager configuration reload status.

A 1 indicates Alertmanager reloaded its configuration successfully.

Managed by the Sourcegraph Distribution team.


prometheus: prometheus_tsdb_op_failure

This panel indicates prometheus tsdb failures by operation over 1m.

Managed by the Sourcegraph Distribution team.


prometheus: prometheus_target_sample_exceeded

This panel indicates prometheus scrapes that exceed the sample limit over 10m.

Managed by the Sourcegraph Distribution team.


prometheus: prometheus_target_sample_duplicate

This panel indicates prometheus scrapes rejected due to duplicate timestamps over 10m.

Managed by the Sourcegraph Distribution team.


Prometheus: Container monitoring (not available on server)

prometheus: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod prometheus (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p prometheus.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' prometheus (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the prometheus container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs prometheus (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Distribution team.


prometheus: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Distribution team.


prometheus: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Distribution team.


prometheus: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with prometheus issues.

Managed by the Sourcegraph Core application team.


Prometheus: Provisioning indicators (not available on server)

prometheus: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Distribution team.


prometheus: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Distribution team.


prometheus: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Distribution team.


prometheus: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Distribution team.


Prometheus: Kubernetes monitoring (only available on Kubernetes)

prometheus: pods_available_percentage

This panel indicates the percentage of pods available.

Managed by the Sourcegraph Distribution team.


Executor Queue

Coordinates the executor work queues.

Executor Queue: Code intelligence queue

executor-queue: codeintel_queue_size

This panel indicates queue size.

Managed by the Sourcegraph Code-intelligence team.


executor-queue: codeintel_queue_growth_rate

This panel indicates queue growth rate over 30m.

Managed by the Sourcegraph Code-intelligence team.


executor-queue: codeintel_job_errors

This panel indicates job errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


executor-queue: codeintel_active_executors

This panel indicates active executors processing codeintel jobs.

Managed by the Sourcegraph Code-intelligence team.


executor-queue: codeintel_active_jobs

This panel indicates active jobs.

Managed by the Sourcegraph Code-intelligence team.


Executor Queue: Stores and clients

executor-queue: codeintel_workerstore_99th_percentile_duration

This panel indicates 99th percentile successful worker store operation duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


executor-queue: codeintel_workerstore_errors

This panel indicates worker store errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


Executor Queue: Internal service requests

executor-queue: frontend_internal_api_error_responses

This panel indicates frontend-internal API error responses every 5m by route.

Managed by the Sourcegraph Code-intelligence team.


Executor Queue: Database connections

executor-queue: max_open_conns

This panel indicates the maximum number of open connections to the database.

Managed by the Sourcegraph Core application team.


executor-queue: open_conns

This panel indicates the number of established connections, both in use and idle.

Managed by the Sourcegraph Core application team.


executor-queue: in_use

This panel indicates the number of connections currently in use.

Managed by the Sourcegraph Core application team.


executor-queue: idle

This panel indicates the number of idle connections.

Managed by the Sourcegraph Core application team.


executor-queue: mean_blocked_seconds_per_conn_request

This panel indicates the mean number of seconds spent blocked waiting for a new connection, per connection request.

Managed by the Sourcegraph Core application team.


executor-queue: closed_max_idle

This panel indicates the number of connections closed due to SetMaxIdleConns.

Managed by the Sourcegraph Core application team.


executor-queue: closed_max_lifetime

This panel indicates the number of connections closed due to SetConnMaxLifetime.

Managed by the Sourcegraph Core application team.


executor-queue: closed_max_idle_time

This panel indicates the number of connections closed due to SetConnMaxIdleTime.

Managed by the Sourcegraph Core application team.


Executor Queue: Container monitoring (not available on server)

executor-queue: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod executor-queue (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p executor-queue.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' executor-queue (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the executor-queue container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs executor-queue (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Code-intelligence team.


executor-queue: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


executor-queue: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Code-intelligence team.


executor-queue: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with executor-queue issues.

Managed by the Sourcegraph Core application team.


Executor Queue: Provisioning indicators (not available on server)

executor-queue: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


executor-queue: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


executor-queue: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


executor-queue: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


Executor Queue: Golang runtime monitoring

executor-queue: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Code-intelligence team.


executor-queue: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Code-intelligence team.


Executor Queue: Kubernetes monitoring (only available on Kubernetes)

executor-queue: pods_available_percentage

This panel indicates the percentage of pods available.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Indexer

Executes jobs from the "codeintel" work queue.

Precise Code Intel Indexer: Executor

precise-code-intel-indexer: codeintel_job_99th_percentile_duration

This panel indicates 99th percentile successful job duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-indexer: codeintel_active_handlers

This panel indicates active handlers processing jobs.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-indexer: codeintel_job_errors

This panel indicates job errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Indexer: Stores and clients

precise-code-intel-indexer: executor_apiclient_99th_percentile_duration

This panel indicates 99th percentile successful API request duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-indexer: executor_apiclient_errors

This panel indicates API errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Indexer: Commands

precise-code-intel-indexer: executor_setup_command_99th_percentile_duration

This panel indicates 99th percentile successful setup command duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-indexer: executor_setup_command_errors

This panel indicates setup command errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-indexer: executor_exec_command_99th_percentile_duration

This panel indicates 99th percentile successful exec command duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-indexer: executor_exec_command_errors

This panel indicates exec command errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-indexer: executor_teardown_command_99th_percentile_duration

This panel indicates 99th percentile successful teardown command duration over 5m.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-indexer: executor_teardown_command_errors

This panel indicates teardown command errors every 5m.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Indexer: Container monitoring (not available on server)

precise-code-intel-indexer: container_missing

This panel indicates container missing.

This value is the number of times a container has not been seen for more than one minute. If you observe this value change independent of deployment events (such as an upgrade), it could indicate pods are being OOM killed or terminated for some other reasons.

  • Kubernetes:
    • Determine if the pod was OOM killed using kubectl describe pod precise-code-intel-indexer (look for OOMKilled: true) and, if so, consider increasing the memory limit in the relevant Deployment.yaml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using kubectl logs -p precise-code-intel-indexer.
  • Docker Compose:
    • Determine if the container was OOM killed using docker inspect -f '{{json .State}}' precise-code-intel-indexer (look for "OOMKilled":true) and, if so, consider increasing the memory limit of the precise-code-intel-indexer container in docker-compose.yml.
    • Check the logs before the container restarted to see if there are panic: messages or similar using docker logs precise-code-intel-indexer (note this will include logs from the previous and currently running container).

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-indexer: container_cpu_usage

This panel indicates container cpu usage total (1m average) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-indexer: container_memory_usage

This panel indicates container memory usage by instance.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-indexer: fs_io_operations

This panel indicates filesystem reads and writes rate by instance over 1h.

This value indicates the number of filesystem read and write operations by containers of this service. When extremely high, this can indicate a resource usage problem, or can cause problems with the service itself, especially if high values or spikes correlate with precise-code-intel-indexer issues.

Managed by the Sourcegraph Core application team.


Precise Code Intel Indexer: Provisioning indicators (not available on server)

precise-code-intel-indexer: provisioning_container_cpu_usage_long_term

This panel indicates container cpu usage total (90th percentile over 1d) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-indexer: provisioning_container_memory_usage_long_term

This panel indicates container memory usage (1d maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-indexer: provisioning_container_cpu_usage_short_term

This panel indicates container cpu usage total (5m maximum) across all cores by instance.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-indexer: provisioning_container_memory_usage_short_term

This panel indicates container memory usage (5m maximum) by instance.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Indexer: Golang runtime monitoring

precise-code-intel-indexer: go_goroutines

This panel indicates maximum active goroutines.

A high value here indicates a possible goroutine leak.

Managed by the Sourcegraph Code-intelligence team.


precise-code-intel-indexer: go_gc_duration_seconds

This panel indicates maximum go garbage collection duration.

Managed by the Sourcegraph Code-intelligence team.


Precise Code Intel Indexer: Kubernetes monitoring (only available on Kubernetes)

precise-code-intel-indexer: pods_available_percentage

This panel indicates the percentage of pods available.

Managed by the Sourcegraph Code-intelligence team.