System metrics with Prometheus¶
Team Edition system performance can be monitored to understand system health, evaluate network traffic, and detect issues. Each of the Team Edition services exposes a set of metrics that can be visualized using the built-in Prometheus expression browser. Metrics are provided in OpenMetrics (Prometheus) format.
To access your system metrics from the UI, click on the My account button in the top right, and then click Metrics. This will open a new browser tab with the root URL of the Team Edition installation appended with /prometheus/. For example, https://yourcompany.com/prometheus/.
Alternatively, you can add /prometheus to the root URL of the Team Edition installation.
Using the expression browser¶
Prometheus uses a built-in expression browser for time series visualizations of system metrics.
Follow these steps to create visualizations from the expression browser:
Select a metric from the dropdown by clicking in the cell that reads insert metric at cursor.
Click on the Graph tab.
Select a time period. We recommend two weeks (“2w”).
Select an “until” date, which is the point in time up to which the selected metric will be displayed on the graph. The default is the current date and time.
A graph will be populated with the selected metric, and a console readout will appear beneath it.
You can isolate a single resource by clicking it in the legend below the graph.
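The same selection (a metric, a two-week window, and an "until" time) can also be expressed as a request to the Prometheus HTTP API range-query endpoint. The following is a minimal sketch that only builds the request URL. It assumes the API is reachable under the same /prometheus/ path as the expression browser, which may not hold for every installation; the function name and the one-hour step are illustrative choices.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def build_range_query_url(root_url, metric, until,
                          window=timedelta(weeks=2), step="1h"):
    """Build a query_range URL covering `window` up to `until`.

    Assumption: the Prometheus API is served under <root_url>/prometheus/.
    """
    start = until - window
    params = {
        "query": metric,             # metric name or any PromQL expression
        "start": start.timestamp(),  # Unix timestamps, in seconds
        "end": until.timestamp(),
        "step": step,                # resolution of the returned series
    }
    base = root_url.rstrip("/")
    return f"{base}/prometheus/api/v1/query_range?{urlencode(params)}"


# Example: the `up` metric for the two weeks ending 15 January 2024 (UTC).
until = datetime(2024, 1, 15, tzinfo=timezone.utc)
url = build_range_query_url("https://yourcompany.com", "up", until)
print(url)
```

Fetching the URL (for example with urllib.request) requires whatever authentication your installation enforces, so that step is left out here.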
While not the most exciting graph, the up metric tells you whether your instance is running.
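In Prometheus, up is 1 when the last scrape of a target succeeded and 0 when it failed. As a rough sketch, the function below picks out the targets that are down from an instant-query response in the standard Prometheus API JSON shape. The job and instance labels in the sample are hypothetical and do not describe actual Team Edition service names.

```python
def down_targets(response):
    """Return the label sets of all series whose `up` sample is 0."""
    if response.get("status") != "success":
        raise ValueError("query did not succeed")
    down = []
    for series in response["data"]["result"]:
        # Each instant-vector sample is [unix_timestamp, "value as string"].
        _timestamp, value = series["value"]
        if float(value) == 0:
            down.append(series["metric"])
    return down


# Hypothetical response for the query `up`; the labels are made up.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "service-a",
                        "instance": "host-a:9100"},
             "value": [1705276800.0, "1"]},
            {"metric": {"__name__": "up", "job": "service-b",
                        "instance": "host-b:9100"},
             "value": [1705276800.0, "0"]},
        ],
    },
}

for labels in down_targets(sample_response):
    print("down:", labels["job"], labels["instance"])
```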
process_open_fds counts the file descriptors held by the process, which tells you how many regular files, sockets, pseudo terminals, etc. you currently have open.
process_max_fds reads /proc/<PID>/limits and uses the Soft Limit from the Max Open Files row. The limits file lists both soft and hard limits. The soft limit is the value the kernel enforces for the corresponding resource, while the hard limit acts as the ceiling for the soft limit.
Setting a file limit alert¶
Using the two metrics above, process_open_fds and process_max_fds, you can quickly write an alert to warn you when a process hits, say, 80% of the limit:
groups:
  - name: example
    rules:
      - alert: ProcessNearFDLimits
        expr: process_open_fds / process_max_fds > 0.8
        for: 10m
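The rule combines a threshold (the ratio must exceed 0.8) with a hold period (for: 10m), so a brief spike leaves the alert pending rather than firing. The sketch below imitates that behavior over a list of samples to show how the two parts interact. It is a simplified illustration and not how Prometheus evaluates rules internally; the function name and sample values are made up.

```python
def alert_state(samples, threshold=0.8, hold_seconds=600):
    """Return 'inactive', 'pending', or 'firing' as of the last sample.

    `samples` is a time-ordered list of (unix_ts, open_fds, max_fds).
    """
    breach_start = None
    for ts, open_fds, max_fds in samples:
        if max_fds > 0 and open_fds / max_fds > threshold:
            if breach_start is None:
                breach_start = ts   # condition just became true
        else:
            breach_start = None     # any dip resets the hold period
    if breach_start is None:
        return "inactive"
    held_for = samples[-1][0] - breach_start
    return "firing" if held_for >= hold_seconds else "pending"


# 900 of 1024 descriptors is about 88%, above the 80% threshold.
short_spike = [(0, 900, 1024), (300, 900, 1024)]
sustained = [(0, 900, 1024), (300, 910, 1024), (600, 905, 1024)]
print(alert_state(short_spike))
print(alert_state(sustained))
```

The hold period trades detection speed for fewer false alarms: a process that touches the threshold momentarily never notifies anyone, while one that stays near its limit for ten minutes does.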