What Is Amplify?

NGINX Amplify is a tool for comprehensive NGINX monitoring. With NGINX Amplify it’s easy to proactively analyze and fix problems related to running and scaling NGINX-based web applications.

You can use NGINX Amplify to do the following:

  • Visualize and identify NGINX performance bottlenecks, overloaded servers, or potential DDoS attacks
  • Improve and optimize NGINX performance with intelligent advice and recommendations
  • Get notified when something is wrong with the application infrastructure
  • Plan web application capacity and performance
  • Keep track of the systems running NGINX

Main Components

NGINX Amplify is a SaaS product, and it’s hosted on AWS public cloud. It includes the following key components:

  • NGINX Amplify Agent

The agent is a Python application that runs on monitored systems. All communications between the agent and the SaaS backend are done securely over SSL/TLS. All traffic is always initiated by the agent.

  • NGINX Amplify Web UI

The user interface is compatible with all major browsers. The web interface is accessible only via TLS/SSL.

  • NGINX Amplify Backend (implemented as a SaaS)

The core system component, implemented as a SaaS. It encompasses scalable metrics collection infrastructure, a database, an analytics engine, and a core API.

How NGINX Amplify Agent Works

NGINX Amplify Agent is a compact application written in Python. Its role is to collect various metrics and metadata and send them securely to the backend for storage and visualization.

You will need to install the Amplify Agent on all hosts that you have to monitor.

After proper installation, the agent will automatically start to report metrics, and you should see the real-time metrics data in the NGINX Amplify web interface in about 60 seconds or so.

NGINX Amplify can currently monitor and collect performance metrics for:

  1. Operating system
  2. NGINX and NGINX Plus
  3. PHP-FPM

The agent is currently officially packaged and supported for the following Linux flavors only:

  • Ubuntu 14.04, 16.04, 17.04
  • Debian 7, 8, 9
  • CentOS 6, 7
  • Red Hat 6, 7 (and systems based on it, e.g. Oracle Linux)
  • Amazon Linux (latest release)

The operating systems and distributions listed below are not fully supported yet, and no agent packages are available for them. However, you can grab a specialized install script here and see if it works for you. Run install-source.sh (as root) instead of install.sh and follow the dialog. You can copy the API key from the Amplify UI (find it in the Settings or in the New System pop-up).

  • FreeBSD 10, 11
  • SLES 12
  • Alpine 3.3
  • Fedora 24, 26

The agent considers an NGINX instance to be any running NGINX master process that has a unique path to the binary, and possibly a unique configuration.

Note. There’s no need to manually add or configure anything in the web interface after installing the agent. When the agent is started, the metrics and the metadata are automatically reported to the Amplify backend, and visualized in the web interface.

When a system or an NGINX instance is removed from the infrastructure for whatever reason, and is no longer reporting (and therefore no longer necessary), you should manually delete it in the web interface. The “Remove object” button can be found in the metadata viewer popup.

Metadata and Metrics Collection

NGINX Amplify Agent collects the following types of data:

  • NGINX metrics. The agent collects a lot of NGINX related metrics from stub_status, the NGINX Plus extended status, the NGINX log files, and from the NGINX process state.
  • System metrics. These are various key metrics describing the system, e.g. CPU usage, memory usage, network traffic, etc.
  • PHP-FPM metrics. The agent can obtain metrics from the PHP-FPM pool status, if it detects a running PHP-FPM master process.
  • NGINX metadata. This is what describes your NGINX instances, and it includes package data, build information, the path to the binary, build configuration options, etc. NGINX metadata also includes the NGINX configuration elements.
  • System metadata. This is the basic information about the OS environment where the agent runs. This could be the hostname, uptime, OS flavor, and other data.

The agent mostly uses the Python psutil library to collect the metrics, but occasionally it may also invoke certain system utilities like ps(1).

While the agent is running on the host, it collects metrics at regular 20 second intervals. Metrics then get downsampled and sent to the Amplify backend once a minute.

Metadata is also reported every minute. Changes in the metadata can be examined through the Amplify web interface.

NGINX config updates are reported only when a configuration change is detected.

If the agent is not able to reach the Amplify backend to send the accumulated metrics, it will continue to collect metrics, and will send them over to Amplify as soon as connectivity is re-established. The maximum amount of data that can be buffered by the agent is about 2 hours’ worth.
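The collection cycle described above can be sketched roughly as follows. This is only an illustration, not the agent's actual code: the MetricBuffer class, its method names, and the use of a plain mean for downsampling are assumptions. Only the 20-second sampling interval, the one-minute reporting period, and the roughly two-hour buffer limit come from the text.

```python
from collections import deque
from statistics import mean

SAMPLE_INTERVAL = 20         # seconds between raw samples (from the text)
REPORT_INTERVAL = 60         # seconds between reports to the backend (from the text)
MAX_BUFFERED_MINUTES = 120   # about two hours of per-minute points (from the text)


class MetricBuffer:
    """Hypothetical helper: downsample raw samples and buffer per-minute points."""

    def __init__(self):
        self.raw = []
        # A bounded deque silently drops the oldest point once it is full.
        self.pending = deque(maxlen=MAX_BUFFERED_MINUTES)

    def add_sample(self, value):
        """Record one raw sample; emit a downsampled point once per minute."""
        self.raw.append(value)
        if len(self.raw) == REPORT_INTERVAL // SAMPLE_INTERVAL:
            self.pending.append(mean(self.raw))  # assumed downsampling: mean
            self.raw = []

    def flush(self, send):
        """Deliver buffered points in order; stop and keep them if send fails."""
        while self.pending:
            if not send(self.pending[0]):
                break
            self.pending.popleft()
```

Here send stands for any callable that returns True on successful delivery, so points stay buffered while the backend is unreachable and are delivered oldest-first once it comes back.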

Detecting and Monitoring NGINX Instances

NGINX Amplify Agent is capable of detecting several types of NGINX instances:

  • Installed from a repository package
  • Built and installed manually

A separate instance of NGINX as seen by the agent would be the following:

  • A unique master process and its workers, started with an absolute path to a distinct NGINX binary
  • A master process running with a default config path, or with a custom path set in the command-line parameters

Note. The agent will try to detect and monitor all unique NGINX instances currently running on a host. Separate sets of metrics and metadata are collected for each unique NGINX instance.
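To make the notion of a unique instance more concrete, here is a small sketch that groups NGINX master processes by binary path and configuration path. It assumes the process titles have already been obtained (for example from ps(1)) and that a master started without -c uses /etc/nginx/nginx.conf; the function name and that default are hypothetical, and the agent's real detection logic may differ.

```python
import shlex

MASTER_PREFIX = "nginx: master process "


def find_instances(process_titles, default_conf="/etc/nginx/nginx.conf"):
    """Return a set of (binary_path, config_path) pairs, one per unique instance."""
    instances = set()
    for title in process_titles:
        if not title.startswith(MASTER_PREFIX):
            continue  # workers and unrelated processes are skipped
        args = shlex.split(title[len(MASTER_PREFIX):])
        if not args or not args[0].startswith("/"):
            continue  # only masters started with an absolute path are considered
        conf = default_conf  # assumed default when -c is not given
        if "-c" in args[1:]:
            position = args.index("-c", 1)
            if position + 1 < len(args):
                conf = args[position + 1]
        instances.add((args[0], conf))
    return instances
```

Two masters started from the same binary with different -c arguments would appear as two separate pairs, which matches the idea of separate metric sets per instance.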

Configuring NGINX for Metric Collection

In order to monitor an NGINX instance, the agent should be able to find the relevant NGINX master process first, and determine its key characteristics.

Metrics from stub_status

You need to define stub_status in your NGINX configuration for key NGINX graphs to appear in the web interface. If stub_status is already enabled, the agent should be able to locate it automatically.

If you’re using NGINX Plus, then you need to configure either the stub_status module, or the NGINX Plus extended status monitoring.

Without stub_status or the NGINX Plus extended status, the agent will NOT be able to collect key NGINX metrics required for further monitoring and analysis.

Add the stub_status configuration as follows. You may also grab this config snippet here:


# cd /etc/nginx

# grep -i include\.*conf nginx.conf
    include /etc/nginx/conf.d/*.conf;

# cat > conf.d/stub_status.conf
server {
    listen 127.0.0.1:80;
    server_name 127.0.0.1;
    location /nginx_status {
        stub_status on;
        allow 127.0.0.1;
        deny all;
    }
}
<Ctrl-D>

# ls -la conf.d/stub_status.conf
-rw-r--r-- 1 root root 162 Nov  4 02:40 conf.d/stub_status.conf

# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

# kill -HUP `cat /var/run/nginx.pid`

Don’t forget to test your NGINX configuration after you’ve added the stub_status section above. Make sure there’s no ambiguity with either the listen or the server_name configuration. The agent should be able to clearly identify the stub_status URL and will default to 127.0.0.1 if the configuration is incomplete.

Note. If you use the conf.d directory to keep common parts of your NGINX configuration that are then automatically included in the server sections across your NGINX config, do not use the snippet above. Instead you should configure stub_status manually within an appropriate location or server block.

Note. There’s no need to use exactly the above example nginx_status URI for stub_status. The agent will determine the correct URI automatically upon parsing your NGINX configuration. Please make sure that the directory and the actual configuration file with stub_status are readable by the agent, otherwise the agent won’t be able to correctly determine the stub_status URL. If the agent fails to find stub_status, please refer to the workaround described here.

For more information about stub_status, please refer to the NGINX documentation here.

Please make sure the stub_status ACL is correctly configured, especially if your system is IPv6-enabled. Test the reachability of stub_status metrics with wget(1) or curl(1). When testing, use the exact URL matching your NGINX configuration.

If everything is configured properly, you should see something along these lines when testing it with curl(1):

$ curl http://127.0.0.1/nginx_status
Active connections: 2
server accepts handled requests
 344014 344014 661581
Reading: 0 Writing: 1 Waiting: 1

If the above doesn’t work, make sure to check where the requests to /nginx_status are being routed. In many cases other server blocks can be the reason you (and the agent) can’t access stub_status.
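For scripted checks, the stub_status text shown above can be turned into numbers with a small parser. The sketch below is an illustration only; the function name and the returned dictionary keys are assumptions chosen to mirror the field names in the output.

```python
import re

# Matches the four-line stub_status response shown above.
STUB_STATUS_RE = re.compile(
    r"Active connections:\s+(?P<active>\d+)\s+"
    r"server accepts handled requests\s+"
    r"(?P<accepts>\d+)\s+(?P<handled>\d+)\s+(?P<requests>\d+)\s+"
    r"Reading:\s+(?P<reading>\d+)\s+"
    r"Writing:\s+(?P<writing>\d+)\s+"
    r"Waiting:\s+(?P<waiting>\d+)"
)


def parse_stub_status(text):
    """Return the stub_status counters as a dict of integers."""
    match = STUB_STATUS_RE.search(text)
    if match is None:
        raise ValueError("unexpected stub_status output")
    return {name: int(value) for name, value in match.groupdict().items()}
```

The input text could come from any HTTP client, for example urllib.request.urlopen() pointed at the exact URL from your NGINX configuration.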

The agent uses data from stub_status to calculate metrics related to server-wide HTTP connections and requests as described below:

nginx.http.conn.accepted = stub_status.accepts
nginx.http.conn.active = stub_status.active - stub_status.waiting
nginx.http.conn.current = stub_status.active
nginx.http.conn.dropped = stub_status.accepts - stub_status.handled
nginx.http.conn.idle = stub_status.waiting
nginx.http.request.count = stub_status.requests
nginx.http.request.current = stub_status.reading + stub_status.writing
nginx.http.request.reading = stub_status.reading
nginx.http.request.writing = stub_status.writing

For NGINX Plus the agent will automatically use similar metrics available from the extended status output.
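The mapping from stub_status fields to metric names listed above can be written out directly. This is a literal transcription of those formulas into Python; the function name and the input dictionary layout (keys named after the stub_status fields) are assumptions, and any further processing the agent applies to the counters is not modelled here.

```python
def derive_http_metrics(stub):
    """Apply the documented stub_status formulas to a dict of raw counters."""
    return {
        "nginx.http.conn.accepted": stub["accepts"],
        "nginx.http.conn.active": stub["active"] - stub["waiting"],
        "nginx.http.conn.current": stub["active"],
        "nginx.http.conn.dropped": stub["accepts"] - stub["handled"],
        "nginx.http.conn.idle": stub["waiting"],
        "nginx.http.request.count": stub["requests"],
        "nginx.http.request.current": stub["reading"] + stub["writing"],
        "nginx.http.request.reading": stub["reading"],
        "nginx.http.request.writing": stub["writing"],
    }
```

With the curl(1) sample output shown earlier (2 active connections, 1 waiting, 1 writing), this gives one active connection, one idle connection, no dropped connections, and one current request.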

For more information about the metric list, please refer to Metrics and Metadata.

Metrics from access.log and error.log

NGINX Amplify Agent will also collect more NGINX metrics from the access.log and the error.log files. In order to do that, the agent should be able to read the logs. Make sure that either the nginx user or the user defined in the NGINX config (such as www-data) can read the log files. Please also make sure that the log files are being written normally.

You don’t have to specifically point the agent to either the NGINX configuration or the NGINX log files — it should detect their location automatically.

The agent will also try to detect the log format for a particular log, in order to be able to parse it properly and possibly extract even more useful metrics, e.g. $upstream_response_time.

Note. A number of metrics outlined in Metrics and Metadata will only be available if the corresponding variables are included in a custom access.log format used for logging requests. You can find a complete list of NGINX log variables here.
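To illustrate why the variables in the log format matter, the following sketch converts a log_format string into a regular expression and extracts named fields from a log line. It is a simplified illustration, not the agent's parser: the function name is hypothetical, and the approach assumes plain $variable references (no ${variable} form) that each appear only once.

```python
import re


def log_format_to_regex(log_format):
    """Turn an NGINX log_format string into a compiled regex with named groups."""
    # re.split with a capturing group yields: literal, variable, literal, ...
    parts = re.split(r"\$(\w+)", log_format)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)        # literal text between variables
        else:
            pattern += "(?P<%s>.*?)" % part   # one named group per variable
    return re.compile("^" + pattern + "$")
```

A field such as upstream_response_time can only be extracted when the corresponding variable is present in the format string; otherwise there is simply no group to read it from.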

Using Syslog for Metric Collection

If you configured the agent for syslog metric collection (see below), make sure to add the following settings to the NGINX configuration:

  1. Check that you are using NGINX version 1.9.5 or newer (or NGINX Plus Release 8 or newer).
  2. Edit the NGINX configuration file and specify the syslog listener address as the first parameter to the access_log directive. Include the amplify tag, and your preferred log format:
access_log syslog:server=127.0.0.1:12000,tag=amplify,severity=info main_ext;

(see also how to extend the NGINX log format to collect additional metrics)

  3. Reload NGINX:
# nginx -s reload

(or service nginx reload)

Note: To send the NGINX logs to both the existing logging facility and the Amplify Agent, include a separate access_log directive for each destination.

What to Check if the Agent Isn’t Reporting Metrics

After you install and start the agent, normally it should just start reporting right away, pushing aggregated data to the Amplify backend at regular 1 minute intervals. It’ll take about a minute for a new system to appear in the Amplify web interface.

If you don’t see the new system or NGINX in the web interface, or (some) metrics aren’t being collected, please check the following:

  1. The Amplify Agent package has been successfully installed, and no warnings were seen upon the installation.
  2. The amplify-agent process is running and updating its log file.
  3. The agent is running under the same user as your NGINX worker processes.
  4. NGINX is started with an absolute path. Currently the agent can’t detect NGINX instances launched with a relative path (e.g. “./nginx”).
  5. The user ID that is used by the agent and NGINX can run ps(1) to see all system processes. If ps(1) is restricted for non-privileged users, the agent won’t be able to find and properly detect the NGINX master process.
  6. The time is set correctly. If the time on the system where the agent runs is ahead or behind the world’s clock, you won’t be able to see the graphs.
  7. stub_status is properly configured, and the stub_status module is included in the NGINX build (this can be checked with nginx -V).
  8. NGINX access.log and error.log files are readable by the user nginx (or by the user set in NGINX config).
  9. All NGINX configuration files are readable by the agent user ID (check owner, group and permissions).
  10. Extra configuration steps have been performed as required for the additional metrics to be collected.
  11. The system resolver is correctly configured, and receiver.amplify.nginx.com can be successfully resolved.
  12. Outbound TLS/SSL from the system to receiver.amplify.nginx.com is not restricted. This can be checked with curl(1). Configure a proxy server for the agent if required.
  13. selinux(8), apparmor(7) or grsecurity are not interfering with the metric collection. E.g. for selinux(8) check /etc/selinux/config, try setenforce 0 temporarily and see if it improves the situation for certain metrics.
  14. Some VPS providers use hardened Linux kernels that may restrict non-root users from accessing /proc and /sys. Metrics describing system and NGINX disk I/O are usually affected. There is no easy workaround for this except for allowing the agent to run as root. Sometimes fixing permissions for /proc and /sys/block may work.
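A couple of the checks in this list (file readability and name resolution) are easy to script. The sketch below is a hypothetical helper, not part of the agent; the function names are assumptions. For the readability check to be meaningful it has to run under the same user ID as the agent.

```python
import os
import socket


def unreadable_paths(paths):
    """Return the subset of paths that the current user cannot read."""
    return [path for path in paths if not os.access(path, os.R_OK)]


def can_resolve(hostname="receiver.amplify.nginx.com"):
    """Return True if the system resolver can resolve the hostname."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
```

For example, unreadable_paths(["/etc/nginx/nginx.conf", "/var/log/nginx/access.log"]) returns an empty list when both files are readable by the current user.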

NGINX Configuration Analysis

NGINX Amplify Agent is able to automatically find all relevant NGINX configuration files, parse them, extract their logical structure, and send the associated JSON data to the Amplify backend for further analysis and reporting. For more information on configuration analysis, please see the Analyzer section below.

After the agent finds a particular NGINX configuration, it then automatically starts to keep track of its changes. When a change is detected with NGINX — e.g. a master process restarts, or the NGINX config is edited, an update is sent to the Amplify backend.

Note. The agent DOES NOT ever send the raw unprocessed config files to the backend system. In addition, the following directives in the NGINX configuration are NOT analyzed, and their parameters ARE NOT exported to the SaaS backend:

  • ssl_certificate_key
  • ssl_client_certificate
  • ssl_password_file
  • ssl_stapling_file
  • ssl_trusted_certificate
  • auth_basic_user_file
  • secure_link_secret
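The exclusion of sensitive directives can be pictured as a filter over the parsed configuration. The sketch below is purely illustrative: the nested list-of-dicts layout (with "directive", "args" and "block" keys) is a made-up stand-in for the agent's JSON structure, and dropping the whole directive is one possible reading of the rule above.

```python
# Directive names taken from the list above.
EXCLUDED_DIRECTIVES = frozenset({
    "ssl_certificate_key",
    "ssl_client_certificate",
    "ssl_password_file",
    "ssl_stapling_file",
    "ssl_trusted_certificate",
    "auth_basic_user_file",
    "secure_link_secret",
})


def strip_excluded(directives):
    """Return a copy of a parsed config tree without the excluded directives."""
    cleaned = []
    for item in directives:
        if item["directive"] in EXCLUDED_DIRECTIVES:
            continue  # neither the directive nor its parameters are kept
        copy = dict(item)
        if "block" in copy:
            copy["block"] = strip_excluded(copy["block"])  # recurse into blocks
        cleaned.append(copy)
    return cleaned
```

The function returns a new tree and leaves its input untouched, so the unfiltered structure never needs to be modified in place.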

Source Code for NGINX Amplify Agent

NGINX Amplify Agent is an open source application. It is licensed under the 2-clause BSD license, and is available here:

Installing and Managing NGINX Amplify Agent

Installing the Agent

In order to be able to use NGINX Amplify to monitor your infrastructure, you need to install NGINX Amplify Agent on each system that has to be checked.

Note. The agent will drop root privileges on startup. It will then use the user ID of the user nginx to set its effective user ID. The package install procedure will add the nginx user automatically unless it’s already found in the system. If the user directive appears in the NGINX configuration, the agent will pick up the user specified in the NGINX config for its effective user ID (e.g. www-data).

Using the Install Script

The installation procedure can be as simple as this.

  1. Download and run the install script.
# curl -sS -L -O \
https://github.com/nginxinc/nginx-amplify-agent/raw/master/packages/install.sh && \
API_KEY='ffeedd0102030405060708' sh ./install.sh

where API_KEY is a unique API key assigned to your Amplify account. You will see your API key when adding a new system in the Amplify web interface. You can also find the API key in the Account menu.

  2. Verify that the agent has started.
# ps ax | grep -i 'amplify\-'
2552 ?        S      0:00 amplify-agent

Installing the Agent Manually

Installing on Ubuntu or Debian

  1. Add the NGINX public key.
# curl -fs http://nginx.org/keys/nginx_signing.key | apt-key add -

or

# wget -q -O - \
http://nginx.org/keys/nginx_signing.key | apt-key add -

  2. Configure the repository as follows.
# codename=`lsb_release -cs` && \
os=`lsb_release -is | tr '[:upper:]' '[:lower:]'` && \
echo "deb http://packages.amplify.nginx.com/${os}/ ${codename} amplify-agent" > \
/etc/apt/sources.list.d/nginx-amplify.list

  3. Verify the repository config file (Ubuntu 14.04 example follows).
# cat /etc/apt/sources.list.d/nginx-amplify.list
deb http://packages.amplify.nginx.com/ubuntu/ trusty amplify-agent

  4. Update the package index files.
# apt-get update

  5. Install and run the agent.
# apt-get install nginx-amplify-agent

Installing on CentOS, Red Hat Linux, or Amazon Linux

  1. Add the NGINX public key.
# curl -sS -L -O http://nginx.org/keys/nginx_signing.key && \
rpm --import nginx_signing.key

or

# wget -q -O nginx_signing.key http://nginx.org/keys/nginx_signing.key && \
rpm --import nginx_signing.key

  2. Create the repository config as follows (mind the correct release number).

Use the first snippet below for CentOS and Red Hat Linux. The second one applies to Amazon Linux.

# release="7" && \
printf "[nginx-amplify]\nname=nginx amplify repo\nbaseurl=http://packages.amplify.nginx.com/centos/${release}/\$basearch\ngpgcheck=1\nenabled=1\n" > \
/etc/yum.repos.d/nginx-amplify.repo

# release="latest" && \
printf "[nginx-amplify]\nname=nginx amplify repo\nbaseurl=http://packages.amplify.nginx.com/amzn/${release}/\$basearch\ngpgcheck=1\nenabled=1\n" > \
/etc/yum.repos.d/nginx-amplify.repo

  3. Verify the repository config file (RHEL 7.1 example follows).
# cat /etc/yum.repos.d/nginx-amplify.repo
[nginx-amplify]
name=nginx amplify repo
baseurl=http://packages.amplify.nginx.com/centos/7/$basearch
gpgcheck=1
enabled=1

  4. Update the package metadata.
# yum makecache

  5. Install and run the agent.
# yum install nginx-amplify-agent

Creating the Config File from a Template

# api_key="ffeedd0102030405060708" && \
sed "s/api_key.*$/api_key = ${api_key}/" \
/etc/amplify-agent/agent.conf.default > \
/etc/amplify-agent/agent.conf

Here api_key is the unique API key assigned to your Amplify account. You will see your API key when adding a new system in the Amplify web interface. You can also find the API key in the Account menu.

Starting and Stopping the Agent

# service amplify-agent start
# service amplify-agent stop
# service amplify-agent restart

Verifying that the Agent Has Started

# ps ax | grep -i 'amplify\-'
2552 ?        S      0:00 amplify-agent

Updating the Agent

It is highly recommended that you periodically check for updates and install the latest stable version of the agent.

  1. On Ubuntu/Debian use:
# apt-get update && \
apt-get install nginx-amplify-agent
  2. On CentOS/Red Hat use:
# yum makecache && \
yum update nginx-amplify-agent

Using the Agent with Docker

You can use NGINX Amplify Agent in a Docker environment. Although it’s still work-in-progress, the agent can collect most of the metrics, and send them over to the Amplify backend in either “standalone” or “aggregate” mode. The standalone mode of operation is the simplest one, where there’s a separate “host” created for each Docker container. Alternatively the metrics from the agents running in different containers can be aggregated on a “per-image” basis — this is the aggregate mode of deploying the Amplify Agent with Docker.

For more information, please refer to our Amplify Dockerfile repository.

Configuring the Agent

NGINX Amplify Agent keeps its configuration in /etc/amplify-agent/agent.conf. The agent configuration is a text-based file.

Overriding the Effective User ID

NGINX Amplify Agent will drop root privileges on startup. By default it will then use the user ID of the user nginx to set its effective user ID. The package install procedure will add the nginx user automatically unless it’s already found in the system. If the user directive appears in the NGINX configuration, the agent will pick up the user specified in the NGINX config for its effective user ID (e.g. www-data).

It is really important for the agent and the running NGINX instances to use the same user ID, so that the agent is able to properly collect all NGINX metrics.

In case you’d like to manually specify the user ID that the agent should use for its effective user ID, there’s a specialized section in /etc/amplify-agent/agent.conf for that:

[nginx]
user =
configfile = /etc/nginx/nginx.conf

There’s an option here to explicitly set the real user ID which the agent should pick for its effective user ID. If the user directive has a non-empty parameter, the agent startup script will use it to look up the real user ID.

In addition, there’s another option to explicitly tell the agent where it should look for an NGINX configuration file suitable for detecting the real user ID. It’s /etc/nginx/nginx.conf by default.

Changing the API Key

When you first install the agent using the procedure above, your API key is written to the agent.conf file automatically. If you ever need to change the API key, please edit the following section in agent.conf accordingly:

[credentials]
api_key = ffeedd0102030405060708
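If you prefer to script this change rather than edit the file by hand, the standard Python configparser module can rewrite the option. The sketch below works on the file contents as a string; the function name is hypothetical, and it assumes agent.conf is plain INI that configparser can parse. Note that configparser drops comments when writing, so keep a backup of the original file.

```python
import configparser
import io


def set_api_key(conf_text, api_key):
    """Return agent.conf contents with [credentials] api_key replaced."""
    # interpolation=None keeps any literal % characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("credentials"):
        parser.add_section("credentials")
    parser.set("credentials", "api_key", api_key)
    output = io.StringIO()
    parser.write(output)
    return output.getvalue()
```

After writing the result back to /etc/amplify-agent/agent.conf, restart the agent so that it picks up the new key.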

Changing the Hostname and UUID

In order to create unique objects for monitoring, the agent must be able to extract a valid hostname from the system. The hostname is also utilized as one of the components for generating a unique identifier. Essentially, the hostname and the UUID unambiguously identify a particular instance of the agent to the Amplify backend. If the hostname or the UUID are changed, the agent and the backend will register a new object for monitoring.

When first generated, the uuid is written to agent.conf. Typically this happens automatically when the agent starts and successfully detects the hostname for the first time. Normally you SHOULD NOT change the UUID in agent.conf.

The agent will try its best to determine the correct hostname. If it fails to determine the hostname, you can set the hostname manually in the agent.conf file. Check for the following section, and put the desired hostname in here:

[credentials]
..
hostname = myhostname1

The hostname should be something real. The agent won’t start unless a valid hostname is defined. The following aren’t valid hostnames:

  • localhost
  • localhost.localdomain
  • localhost6.localdomain6
  • ip6-localhost
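The rule above amounts to a simple membership test. The following sketch is an assumption about how such a check could look, not the agent's code; the function name and the case-insensitive comparison are hypothetical, and the real validation may be stricter.

```python
# Names listed above as invalid hostnames.
INVALID_HOSTNAMES = frozenset({
    "localhost",
    "localhost.localdomain",
    "localhost6.localdomain6",
    "ip6-localhost",
})


def is_valid_hostname(hostname):
    """Reject empty hostnames and the generic loopback names listed above."""
    name = (hostname or "").strip().lower()
    return bool(name) and name not in INVALID_HOSTNAMES
```

A quick way to see what the system reports is socket.gethostname(); if the result fails this check, set the hostname option in agent.conf.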

Note. You can also use the above method to replace the system’s hostname with an arbitrary alias. Keep in mind that if you redefine the hostname for a live object, the existing object will be marked as failed in the web interface. Redefining the hostname in the agent’s configuration essentially creates a new UUID, and a new system for monitoring.

Alternatively you can define an “alias” for the host in the UI (see the Graphs section below).

Configuring the URL for stub_status or Extended Status

When the agent finds a running NGINX instance, it automatically detects the stub_status or the NGINX Plus extended status locations from the NGINX configuration.

To override the stub_status URI/URL, use the stub_status configuration option.

[nginx]
..
stub_status = http://127.0.0.1/nginx_status

To override the extended status URI/URL, use the plus_status option.

[nginx]
..
plus_status = /status

Note. If only the URI part is specified with the options above, the agent will use http://127.0.0.1 to construct the full URL to access either the stub_status or the NGINX Plus extended status metrics.
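The URI-versus-URL rule in the note above can be expressed in a few lines. This is a sketch of the described behavior only; the function name is hypothetical and the agent's own handling of edge cases is not known from the text.

```python
def status_url(value, default_base="http://127.0.0.1"):
    """Expand a bare URI such as /status into a full URL; keep full URLs as-is."""
    if value.startswith(("http://", "https://")):
        return value
    return default_base + "/" + value.lstrip("/")
```

So plus_status = /status would be fetched from http://127.0.0.1/status, while a value that already includes the scheme and host is used unchanged.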

Configuring the Path to the NGINX Configuration File

The agent detects the NGINX configuration file automatically. You DO NOT need to explicitly point the agent to the nginx.conf.

If for some reason the agent is not able to find the NGINX configuration, use the following option in /etc/amplify-agent/agent.conf:

[nginx]
configfile = /etc/nginx/nginx.conf

Note. It is better to avoid this option and use it only as a workaround. If you had to add the path to the NGINX config file manually, please take a moment to file a support ticket; this would be much appreciated.

Configuring Host Tags

You can define arbitrary tags on a “per-host” basis. Tags can be configured in the UI (see the Graphs section below), or set in the /etc/amplify-agent/agent.conf file:

[tags]
tags = foo,bar,foo:bar

You can use tags to build custom graphs, configure alerts, and filter the systems on the Graphs page.

Configuring Syslog

The agent can collect the NGINX log files via syslog. This could be useful when you don’t keep the NGINX logs on disk, or when monitoring a container environment such as Docker with NGINX Amplify.

To configure the agent for syslog, add the following to the /etc/amplify-agent/agent.conf:

[listeners]
keys = syslog-default

[listener_syslog-default]
address = 127.0.0.1:12000

Restart the agent to have it reload the configuration and start listening on the specified IP address and port:

# service amplify-agent restart

Make sure to add the syslog settings to your NGINX configuration as well.

Excluding Certain NGINX Log Files

By default the agent will try to find and watch all access.log files described in the NGINX configuration. If there are multiple log files where the same request is logged, the metrics may get counted more than once.

To exclude specific NGINX log files from the metric collection, add something along these lines to /etc/amplify-agent/agent.conf:

[nginx]
exclude_logs=/var/log/nginx/app1/*,access-app1-*.log,sender1-*.log
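To get a feel for which files such patterns would cover, the sketch below matches log paths against a comma-separated exclude_logs value using shell-style wildcards. The matching rules here (full path or file name, via Python's fnmatch) are an assumption for illustration; the agent's actual matching may differ, so verify the result in the agent log.

```python
import os
from fnmatch import fnmatch


def is_excluded(log_path, exclude_logs):
    """Return True if log_path matches any comma-separated wildcard pattern."""
    patterns = [p.strip() for p in exclude_logs.split(",") if p.strip()]
    file_name = os.path.basename(log_path)
    # Assumed rule: a pattern may match either the full path or the file name.
    return any(fnmatch(log_path, p) or fnmatch(file_name, p) for p in patterns)
```

With the example value above, /var/log/nginx/app1/access.log and /var/log/nginx/sender1-a.log would be excluded, while /var/log/nginx/access.log would still be watched.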

Setting Up a Proxy

If your system is in a DMZ environment without direct access to the Internet, the only way for the agent to report collected metrics to Amplify would be through a proxy.

The agent obeys the usual environment variables that are common on Linux systems (e.g. https_proxy or HTTP_PROXY). However, you can also define an HTTPS proxy manually in agent.conf. This can be done as follows:

[proxies]
https = https://10.20.30.40:3030
..
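The two ways of supplying a proxy can be combined as in the sketch below. The precedence shown (the agent.conf value first, then the environment) is an assumption made for illustration, since the text does not state which source wins; the function name is hypothetical as well.

```python
def resolve_https_proxy(config_value, environ):
    """Pick the HTTPS proxy URL from agent.conf or from the environment."""
    if config_value:
        return config_value  # assumed: an explicit [proxies] entry wins
    for name in ("https_proxy", "HTTPS_PROXY"):
        if environ.get(name):
            return environ[name]
    return None  # no proxy configured: connect directly
```

In practice the second argument would be os.environ, and the first would be the https value read from the [proxies] section.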

Agent Logfile

The agent maintains its log file in /var/log/amplify-agent/agent.log

Upon installation, the agent’s log rotation schedule is added to /etc/logrotate.d/amplify-agent

The normal level of logging for the agent is INFO. If you ever need to debug the agent, change the level to DEBUG as follows. Bear in mind, the size of the agent’s log file can grow really fast with DEBUG. After you change the log level, please restart the agent.


[logger_agent-default]
level = DEBUG
..

[handler_agent-default]
class = logging.handlers.WatchedFileHandler
level = DEBUG
..

Uninstalling the Agent

To completely delete a previously monitored object, perform the following steps:

  1. Uninstall the agent

On Ubuntu/Debian use:

apt-get remove nginx-amplify-agent

On CentOS and Red Hat use:

yum remove nginx-amplify-agent
  2. Delete objects from the web interface

To delete a system using the web interface — find it in the Inventory, and choose the [i] icon. You can delete objects from the popup window that appears next.

Bear in mind — deleting objects in the UI will not stop the agent. To completely remove a system from monitoring, stop and/or uninstall the agent first, and then clean it up in the web interface. Don’t forget to also clean up any alert rules.

  3. Delete alerts

Check the Alerts page and remove/mute the irrelevant rules.

User Interface

Overview

The Overview page is designed to provide a quick summary of the state of your NGINX infrastructure. Here you can quickly check the total number of HTTP 5xx errors over the past 24 hours and compare it to the previous 24 hours.

Five key overlay graphs are displayed for the selected time period. By switching over various time periods you can compare trends and see if anything abnormal shows up.

The cumulative metrics displayed on the Overview page are:

  • Total requests — sum of nginx.http.request.count
  • HTTP 5xx errors — sum of nginx.http.status.5xx
  • Request time (P95) — average of nginx.http.request.time.pctl95
  • Traffic — sum of system.net.bytes_sent rate
  • CPU Usage — average of system.cpu.user

By default the metrics above are calculated for all monitored hosts. You can configure specific tags in the Overview settings popup to display the metrics for a set of hosts (e.g. only the “production environment”). You may see zero values if some metrics are not being gathered. E.g. if the request time (P95) is 0.000s, please check that you have properly configured the NGINX log format for additional metric collection.

Application Health Score

The upper left block displays a total score that reflects your web app performance. It’s called Application Health Score (AHS).

The Application Health Score (AHS) is an Apdex-like numerical measure that can be used to estimate the quality of experience for your web application.

AHS is a product of 3 derivative service level indicators (SLI) — percentage of successful requests, percentage of “timely” requests, and agent availability. The “timely” requests are those with the total observed average request time P95 either below the low threshold (100% satisfying) or between the low and high threshold (partially satisfying).

A simplified formula for AHS is the following:

AHS = (Successful Requests %) * (Timely Requests %) * (Agent Availability %)

Each individual SLI in this formula can be turned on or off. By default only the percentage of successful requests is on.

There are T1 and T2 thresholds for the total observed average request time P95, that you can configure for AHS:

  • T1 is the low threshold for satisfying requests
  • T2 is the high threshold for partially satisfying requests

If the average request time (P95) for the selected time period is below T1, this is considered 100% satisfying state of requests. If the request time is above T1 and below T2, a “satisfaction ratio” is calculated accordingly. Requests above T2 are considered totally unsatisfying. E.g. with T1=0.2s and T2=1s, a request time greater than 1s would be considered unsatisfying, and the resulting score would be 0%.

The detailed algorithm for the AHS is the following:

successful_req_pct = (nginx.http.request.count - nginx.http.status.5xx) / nginx.http.request.count

if (nginx.http.request.time.pctl95 < T1)
   timely_req_pct = 1
else
   if (nginx.http.request.time.pctl95 < T2)
       timely_req_pct = 1 - (nginx.http.request.time.pctl95 - T1) / (T2 - T1)
   else
       timely_req_pct = 0

m1 = successful_req_pct
m2 = timely_req_pct
m3 = agent_up_pct

app_health_score = m1 * m2 * m3
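The algorithm above translates into Python almost line by line. The sketch below is an illustration, not the backend's implementation: the function and parameter names are hypothetical, the on/off switches model the statement that each SLI can be turned on or off (with only successful requests on by default), and treating a period with zero requests as 100% successful is an assumption.

```python
def app_health_score(request_count, status_5xx, request_time_p95,
                     agent_up_pct, t1, t2,
                     use_success=True, use_timely=False, use_agent=False):
    """Multiply the enabled service level indicators into one score (0.0 to 1.0)."""
    score = 1.0
    if use_success:
        if request_count:
            score *= (request_count - status_5xx) / request_count
        # assumed: with no requests at all, the success SLI counts as 1.0
    if use_timely:
        if request_time_p95 < t1:
            timely = 1.0
        elif request_time_p95 < t2:
            timely = 1 - (request_time_p95 - t1) / (t2 - t1)
        else:
            timely = 0.0
        score *= timely
    if use_agent:
        score *= agent_up_pct
    return score
```

For example, with 1000 requests, 10 of them HTTP 5xx, a P95 request time of 0.6s, T1=0.2s and T2=1s, the successful-request SLI is 0.99 and the timely-request SLI is 0.5, so with both enabled the score is about 0.495.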

Graphs

When you log in to Amplify, you’re presented with a collection of predefined graphs on the Graphs page. Here you can see an overview of the key metric stats, such as CPU, memory, and disk usage for all of your systems.

If you click on a system on the left, the graphs will change to reflect the metrics for the selected system. The graphs are further split into tabs such as “System”, “NGINX” and so on.

Some graphs have an additional selector. E.g., with “Disk Latency” or “Network Traffic” you can select what device or interface you’re analyzing.

Above the graphs, you will find the following:

  • Hostname or alias for the selected system
  • System properties editor where you can set up an alias for the host, and/or assign host tags
  • List of tags assigned to the system
  • Time range selector, which helps to display different time periods for the graphs

You can also copy a predefined graph to a custom dashboard by focusing on the graph and clicking on the arrow in the top right corner.

Check the Metrics and Metadata section below to learn more about the displayed metrics.

Inventory

From the top menu bar, you can always open the inventory of the systems that are being monitored. When the agent is properly installed on a new system and reporting, it’s automatically visible in the system index on the left and in the Inventory.

The Inventory allows you to check the status of all systems at a glance. It also provides a quick overview of the key metrics.

In the rightmost column of the Inventory you will also find the settings and the metadata viewer icons. Click on the [i] icon and the popup will appear with various useful information about the OS and the monitored NGINX instances. If you need to remove an object from the monitoring, it’s in the metadata viewer popup where you can find the “Remove object” buttons. Removing the OS object will delete the NGINX objects too.

You can apply sorting, search, and filters to the Inventory to quickly find the system in question. You can search and filter by hostname, IP address, architecture etc. You can use regular expressions with the search function.

Note. Bear in mind that you also need to stop or uninstall the agent on the systems being removed from monitoring; otherwise the objects will reappear in the UI. Be sure to delete any system-specific alert rules too.

Dashboards

You can create your own dashboards populated with highly customizable graphs of NGINX and system-level metrics.

Some of the use cases for a custom set of graphs are the following:

  • Checking NGINX performance for a particular application or microservice, e.g. based on the URI path
  • Displaying metrics per virtual server
  • Visualizing the performance of a group of NGINX servers — for example, front-end load balancers, or an NGINX edge caching layer
  • Analyzing a detailed breakdown of HTTP status codes per application

When building a custom graph, metrics can be summed or averaged across several NGINX servers. By using metric filters it is also possible to create additional “metric dimensions” — for example, reporting the number of POST requests for a specific URI.

To create a custom dashboard, click CREATE DASHBOARD on the Dashboards drop-down menu. Then click New Graph in the upper right corner to start adding graphs to the dashboard.

When adding or editing a graph, the following dialog appears:

To define a graph, perform these steps:

  1. Enter the graph title.
  2. Pick one or more metrics. You can combine multiple metrics on the same graph using the “Add another metric” button.
  3. After the metric is selected, you are able to see the systems for which the metric has been observed. Select one or multiple systems here. You can also use tags to specify the systems.
  4. When aggregating across multiple systems, select either “Sum” or “Avg” as the aggregation function.
  5. Last but not least, the “filter” functionality is also available for NGINX metrics collected from the log files. If you click on “Add metric filter”, you can then add multiple criteria in order to define specific “metric dimensions”. In the example above, we are matching the NGINX upstream response time against the /api/feed/reports URI. You can also build other filters, e.g. displaying metric nginx.http.status.2xx for the responses with the status code 201.
  6. Click “Save” when you’re done, and the graph is added to the dashboard. You can also edit the graph later on if needed, move it around, resize, stack the graphs on top of each other, etc.
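
To illustrate what the “Sum” and “Avg” aggregation functions in step 4 mean, here is a minimal Python sketch. It is not how the Amplify backend is implemented; the function name and the input layout (one equal-length list of data points per system) are assumptions made for the example.

```python
from statistics import mean


def aggregate(series_by_system, function="sum"):
    """Combine per-system series point by point using "sum" or "avg"."""
    reducers = {"sum": sum, "avg": mean}
    reduce_fn = reducers[function]
    # zip(*...) groups the values reported by every system for the same time slot
    return [reduce_fn(points) for points in zip(*series_by_system.values())]
```

With two load balancers reporting [10, 20] and [30, 40] requests, “Sum” gives the total per time slot, while “Avg” gives the per-server average for the same slots.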

Note. When using filters, the “metric dimensions” aren’t stored in the NGINX Amplify backend by default. A particular filter starts to slice the metric according to the specification only after the graph is created. Hence, it can be a while before the “filtered” metric is displayed on the graph. The end result depends on how quickly the log files are being populated with new entries, but typically you should see the first data points in under 5 minutes.

Because NGINX Amplify is not a SaaS log analyzer, the additional slicing for “metric dimensions” is implemented inside the agent. The agent can parse the NGINX access logs on-the-fly and extract all the necessary metrics without sending the raw log entries elsewhere. Moreover, the agent understands custom log formats automatically, and will start looking for various newly defined “metric dimensions” following a particular log_format specification.

Essentially, the agent performs a combination of real-time log analytics and standard metrics collection (e.g. metrics from the stub_status module). The agent does only the real-time log processing, and always on the same host where it is running.
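
The idea of slicing a metric into a “metric dimension” on the host can be sketched as follows. This is not the agent’s actual parser: the regular expression only covers the leading fields of the default combined log layout, and the names LINE_RE and count_matching are made up for the example. It counts the requests with a given method for a given URI prefix, similar to the “number of POST requests for a specific URI” example mentioned earlier.

```python
import re

# Leading fields of the combined layout:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent
LINE_RE = re.compile(
    r'^(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<uri>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+)'
)


def count_matching(lines, method, uri_prefix):
    """Count log entries with the given method whose URI starts with uri_prefix."""
    count = 0
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip entries that do not follow the expected layout
        if match.group("method") == method and match.group("uri").startswith(uri_prefix):
            count += 1
    return count
```

Only the resulting counter would need to leave the host; the raw log entries stay where they are.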

Metric filters can be really powerful. By using the filters and creating additional “metric dimensions”, it is possible to build highly granular and very informative graphs. To enable the agent to slice the metrics you must add the corresponding log variables to the active NGINX log format. Please see the Additional NGINX metrics section below.

Metric filters are available only for the metrics generated from the log files. For other metrics some additional modifiers can be set when editing a graph. E.g. for NGINX Plus it is possible to specify the extended status zones to build more detailed visualizations.

When editing a custom dashboard, you can also use additional features like “Clone” or “New Set” to streamline the workflow. The “New Set” function in particular can be very helpful to quickly create various metric visualizations for NGINX or the operating system.

Analyzer

NGINX Amplify Agent parses NGINX configuration files and transmits them to the backend component for further analysis. This is where Amplify offers configuration recommendations to help improve the performance, reliability, and security of your applications. With well-thought-out and detailed recommendations you’ll know exactly where the problem is, why it is a problem, and how to fix it.

When you switch to the Analyzer page, click on a particular system on the left in order to see the associated report. Unless an NGINX instance is found on a system, there will be no report for it.

The following information is provided when a report is run against an NGINX config structure:

  • Version information
    • Branch, release date, and the latest version in the branch
  • Overview
    • Path to NGINX config file(s)
    • Whether the parser failed or not, and the results of nginx -t
    • Last-modified info
    • 3rd party modules found
    • Breakdown of the key configuration elements (servers, locations, upstreams)
    • Breakdown of IPv4/IPv6 usage
  • Security
    • Any security advisories that apply to this version of NGINX
  • Virtual servers
    • Breakdown of the virtual host configuration (think “apachectl -S”)
  • SSL
    • OpenSSL version information
    • Breakdown of the number of SSL or HTTP/2 servers configured
    • Information about the configured SSL certificates
    • Warnings about common SSL configuration errors
  • Static analysis
    • Various suggestions about configuration structure
    • Typical configuration gotchas highlighted
    • Common advice about proxy configurations
    • Suggestions about simplifying rewrites for certain use cases
    • Key security measures (e.g. stub_status is unprotected)
    • Typical errors in configuring locations, especially with regex

To parse SSL certificate metadata the Amplify Agent uses standard openssl(1) functions. SSL certificates are parsed and analyzed only when the corresponding settings are turned on. SSL certificate analysis is off by default.

Static analysis will only include information about specific issues with the NGINX configuration if those are found in your NGINX setup.

In the future, the Analyzer page will also include dynamic analysis, effectively linking the observed NGINX behavior to its configuration — e.g. when it makes sense to increase or decrease certain parameters like proxy_buffers etc. Stay tuned!

Note. Config analysis is on by default. If you don’t want your NGINX configuration to be checked, unset the corresponding setting in either Global, or Local (per-system) settings. See Settings below.

Alerts

The Alerts page describes the configuration of the alert rules used to notify you of any anomalies in the behavior of your systems.

Alerts are based on setting a rule to monitor a particular metric. Alert rules allow the user to specify the metric, the trigger condition, the threshold, and the email for notifications.

The way alert rules work is the following:

  1. Incoming metric updates are being continuously monitored against the set of rules.
  2. If there’s a rule for a metric, the new metric update is checked against the threshold.
  3. If the threshold is met, an alert notification is generated, and the rule will continue to be monitored.
  4. If subsequent metric updates show that the metric no longer violates the threshold for the configured period, the alert is cleared.

By default there’s no filtering by host. If a specific alert should only be raised for a particular system, you should specify the hostname(s) or tags when configuring the alert. Currently metrics can’t be aggregated across all systems; instead any system will match a particular rule unless a host is specified.

There’s one special rule, which is about the amplify.agent.status metric. This metric reflects the state of the agent (and hence, the state of the system as seen by Amplify). You can only configure a 2-minute interval and only 0 (zero) as the threshold for amplify.agent.status.

You shouldn’t see consecutive notifications about the same alert over and over again. Instead there will be digest information sent out every 30 minutes, describing which alerts were generated and which ones were cleared.

Note. Gauges are averaged over the interval configured in the rule. Counters are summed up. Currently that’s not user configurable and these are the only reduce functions available for configuring metric thresholds.
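
The rule evaluation described in this section can be sketched as follows. It is a simplified illustration, not the backend implementation: the function names are hypothetical, the rule is assumed to use an “above threshold” condition, and each inner list is assumed to hold the metric updates received during one configured interval.

```python
def reduce_samples(metric_type, samples):
    """Gauges are averaged over the interval; counters are summed up."""
    if not samples:
        raise ValueError("no samples in the interval")
    if metric_type == "gauge":
        return sum(samples) / len(samples)
    if metric_type == "counter":
        return sum(samples)
    raise ValueError(f"unknown metric type: {metric_type}")


def evaluate_rule(metric_type, intervals, threshold):
    """Return (event, value) pairs for a hypothetical "above threshold" rule."""
    events = []
    active = False
    for samples in intervals:
        value = reduce_samples(metric_type, samples)
        violates = value > threshold
        if violates and not active:
            events.append(("raised", value))   # threshold met: alert generated
        elif active and not violates:
            events.append(("cleared", value))  # no longer violating: alert cleared
        active = violates
    return events
```

Note that an alert that stays above the threshold for several intervals produces a single “raised” event here, which matches the behavior of not repeating the same notification over and over.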

Note. Emails are sent using AWS SES. Make sure your mail relay accepts their traffic. Also make sure to verify the specified email and check the verification status in the Account menu.

Account Settings

The Account option in the “hamburger” menu at the top right corner of the web interface contains various important settings.

First of all, you can always check the information you provided upon signing up, and edit specific fields.

You can also see the current limits such as “maximum number of agents”, “maximum number of custom dashboards”, etc.

The global settings section is used to enable or disable account-wide behavior for:

  • NGINX configuration files analysis
  • Periodic NGINX configuration syntax checking with “nginx -t”
  • Analyzing SSL certs

Per-system settings are accessible via the “Settings” icon that can be found for a particular system in the Inventory.

Per-system settings override the global settings. If you generally prefer to monitor your NGINX configurations on all but some specific systems, you can uncheck the corresponding settings in the per-system settings menu.

In the Emails section you will find the information about the emails currently registered with your account, and whether they are verified or not. The alert notifications are only sent to verified emails.

Last but not least, inside the Users section you will see the list of the user logins that are associated with this particular account. If you are the admin user, you can also invite your team members to the account.

Metrics and Metadata

Most metrics are collected by the agent without requiring the user to perform any additional setup. For troubleshooting, see What to Check if the Agent Isn’t Reporting Metrics.

Some additional metrics for NGINX monitoring will only be reported if the NGINX configuration file is modified accordingly. See Additional NGINX Metrics below, and pay attention to the Source and Variable fields in the metric descriptions that follow.

OS Metrics

  • amplify.agent.status
Type:        internal, integer
Description: 1 - agent is up, 0 - agent is down.
  • amplify.agent.cpu.system
  • amplify.agent.cpu.user
Type:        gauge, percent
Description: CPU utilization percentage observed from the agent process.
  • amplify.agent.mem.rss
  • amplify.agent.mem.vms
Type:        gauge, bytes
Description: Memory utilized by the agent process.
  • system.cpu.idle
  • system.cpu.iowait
  • system.cpu.system
  • system.cpu.user
Type:        gauge, percent
Description: System CPU utilization.
  • system.cpu.stolen
Type:        gauge, percent
Description: System CPU stolen. Represents time when the real CPU was not available to
             the current VM.
  • system.disk.free
  • system.disk.total
  • system.disk.used
Type:        gauge, bytes
Description: System disk usage statistics.
  • system.disk.in_use
Type:        gauge, percent
Description: System disk usage statistics, percentage.
  • system.io.iops_r
  • system.io.iops_w
Type:        counter, integer
Description: Number of reads or writes per sampling window.
  • system.io.kbs_r
  • system.io.kbs_w
Type:        counter, kilobytes
Description: Number of kilobytes read or written.
  • system.io.wait_r
  • system.io.wait_w
Type:        gauge, milliseconds
Description: Time spent reading from or writing to disk.
  • system.load.1
  • system.load.5
  • system.load.15
Type:        gauge, float
Description: Number of processes in the system run queue, averaged over the last 1, 5,
             and 15 min.
  • system.mem.available
  • system.mem.buffered
  • system.mem.cached
  • system.mem.free
  • system.mem.shared
  • system.mem.total
  • system.mem.used
Type:        gauge, bytes
Description: Statistics about system memory usage.
  • system.mem.pct_used
Type:        gauge, percent
Description: Statistics about system memory usage, percentage.
  • system.net.bytes_rcvd
  • system.net.bytes_sent
Type:        counter, bytes
Description: Network I/O statistics. Number of bytes received or sent, per network
             interface.
  • system.net.drops_in.count
  • system.net.drops_out.count
Type:        counter, integer
Description: Network I/O statistics. Total number of inbound or outbound packets
             dropped, per network interface.
  • system.net.packets_in.count
  • system.net.packets_out.count
Type:        counter, integer
Description: Network I/O statistics. Number of packets received or sent, per network
             interface.
  • system.net.packets_in.error
  • system.net.packets_out.error
Type:        counter, integer
Description: Network I/O statistics. Total number of errors while receiving or sending,
             per network interface.
  • system.net.listen_overflows
Type:        counter, integer
Description: Number of times the listen queue of a socket overflowed.
  • system.swap.free
  • system.swap.total
  • system.swap.used
Type:        gauge, bytes
Description: System swap memory statistics.
  • system.swap.pct_free
Type:        gauge, percent
Description: System swap memory statistics, percentage.

NGINX Metrics

HTTP Connections and Requests

  • nginx.http.conn.accepted
  • nginx.http.conn.dropped
Type:        counter, integer
Description: NGINX-wide statistics describing HTTP connections.
Source:      stub_status (or N+ extended status)
  • nginx.http.conn.active
  • nginx.http.conn.current
  • nginx.http.conn.idle
Type:        gauge, integer
Description: NGINX-wide statistics describing HTTP connections.
Source:      stub_status (or N+ extended status)
  • nginx.http.request.count
Type:        counter, integer
Description: Total number of client requests.
Source:      stub_status (or N+ extended status)
  • nginx.http.request.current
  • nginx.http.request.reading
  • nginx.http.request.writing
Type:        gauge, integer
Description: Number of currently active requests (reading and writing). Number of
             requests reading headers or writing responses to clients.
Source:      stub_status (or N+ extended status)
  • nginx.http.request.malformed
Type:        counter, integer
Description: Number of malformed requests.
Source:      access.log
  • nginx.http.request.body_bytes_sent
Type:        counter, integer
Description: Number of bytes sent to clients, not counting response headers.
Source:      access.log

HTTP Methods

  • nginx.http.method.get
  • nginx.http.method.head
  • nginx.http.method.post
  • nginx.http.method.put
  • nginx.http.method.delete
  • nginx.http.method.options
Type:        counter, integer
Description: Statistics about observed request methods.
Source:      access.log

HTTP Status Codes

  • nginx.http.status.1xx
  • nginx.http.status.2xx
  • nginx.http.status.3xx
  • nginx.http.status.4xx
  • nginx.http.status.5xx
Type:        counter, integer
Description: Number of requests with specific HTTP status codes.
Source:      access.log
  • nginx.http.status.discarded
Type:        counter, integer
Description: Number of requests finalized with status code 499 which is logged when the
             client closes the connection.
Source:      access.log

HTTP Protocol Versions

  • nginx.http.v0_9
  • nginx.http.v1_0
  • nginx.http.v1_1
  • nginx.http.v2
Type:        counter, integer
Description: Number of requests using a specific version of the HTTP protocol.
Source:      access.log

NGINX Process Metrics

  • nginx.workers.count
Type:        gauge, integer
Description: Number of NGINX worker processes observed.
  • nginx.workers.cpu.system
  • nginx.workers.cpu.total
  • nginx.workers.cpu.user
Type:        gauge, percent
Description: CPU utilization percentage observed for NGINX worker processes.
  • nginx.workers.fds_count
Type:        gauge, integer
Description: Number of file descriptors utilized by NGINX worker processes.
  • nginx.workers.io.kbs_r
  • nginx.workers.io.kbs_w
Type:        counter, integer
Description: Number of kilobytes read from or written to disk by NGINX worker processes.
  • nginx.workers.mem.rss
  • nginx.workers.mem.vms
Type:        gauge, bytes
Description: Memory utilized by NGINX worker processes.
  • nginx.workers.mem.rss_pct
Type:        gauge, percent
Description: Memory utilization percentage for NGINX worker processes.
  • nginx.workers.rlimit_nofile
Type:        gauge, integer
Description: Hard limit on the number of file descriptors as seen by NGINX worker
             processes.

Additional NGINX Metrics

NGINX Amplify Agent can collect a number of additional useful metrics described below. To enable these metrics, please make the following configuration changes. More predefined graphs will be added to the Graphs page if the agent finds additional metrics. With the required log format configuration, you’ll be able to build more specific custom graphs.

  • The access.log log format should include an extended set of NGINX variables. Please add a new log format or modify the existing one — and use it with the access_log directives in your NGINX configuration.
log_format  main_ext  '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for" '
                        '"$host" sn="$server_name" '
                        'rt=$request_time '
                        'ua="$upstream_addr" us="$upstream_status" '
                        'ut="$upstream_response_time" ul="$upstream_response_length" '
                        'cs=$upstream_cache_status' ;
  • Here’s how you may use the extended log format with your access log configuration:
access_log  /var/log/nginx/access.log  main_ext;

Note. Please bear in mind that by default the agent will process all access logs that are found in your log directory. If you define a new log file with the extended log format that contains entries already being logged to another access log, your metrics might be counted twice. Please refer to the agent configuration section above to learn how to exclude specific log files from processing.

  • The error.log log level should be set to warn.
error_log  /var/log/nginx/error.log warn;

Note. Don’t forget to reload your NGINX configuration with either kill -HUP or service nginx reload.
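
To check that the extended log format produces the expected entries, you could parse a few lines by hand. The Python sketch below is only a diagnostic aid and not the agent’s parser. It assumes the trailing fields appear exactly as in the main_ext format (sn, rt, ua, us, ut, ul, cs), with cs carrying $upstream_cache_status; the names EXT_RE and parse_extended_fields are made up for the example.

```python
import re

# Trailing key=value fields of the main_ext log format
EXT_RE = re.compile(
    r'sn="(?P<sn>[^"]*)" rt=(?P<rt>\S+) '
    r'ua="(?P<ua>[^"]*)" us="(?P<us>[^"]*)" '
    r'ut="(?P<ut>[^"]*)" ul="(?P<ul>[^"]*)" '
    r'cs=(?P<cs>\S+)\s*$'
)


def parse_extended_fields(line):
    """Return the extended fields of a main_ext log entry, or None if absent."""
    match = EXT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # $request_time is logged in seconds with millisecond resolution
    fields["rt"] = float(fields["rt"])
    # Upstream fields are kept as raw strings: they can be "-" or hold
    # several values when more than one upstream server was contacted.
    return fields
```

If the function returns None for fresh entries, the access_log directive is most likely still using a different log format.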

Here is the list of additional metrics that can be collected from the NGINX log files:

  • nginx.http.request.bytes_sent
Type:        counter, integer
Description: Number of bytes sent to clients.
Source:      access.log (requires custom log format)
Variable:    $bytes_sent
  • nginx.http.request.length
Type:        gauge, integer
Description: Request length, including request line, header, and body.
Source:      access.log (requires custom log format)
Variable:    $request_length
  • nginx.http.request.time
  • nginx.http.request.time.count
  • nginx.http.request.time.max
  • nginx.http.request.time.median
  • nginx.http.request.time.pctl95
Type:        gauge, seconds.milliseconds
Description: Request processing time — time elapsed between reading the first bytes from
             the client and writing a log entry after the last bytes were sent.
Source:      access.log (requires custom log format)
Variable:    $request_time
  • nginx.http.request.buffered
Type:        counter, integer
Description: Number of requests that were buffered to disk.
Source:      error.log (requires 'warn' log level)
  • nginx.http.gzip.ratio
Type:        gauge, float
Description: Achieved compression ratio, calculated as the ratio between the original
             and compressed response sizes.
Source:      access.log (requires custom log format)
Variable:    $gzip_ratio

Upstream Metrics

  • nginx.upstream.connect.time
  • nginx.upstream.connect.time.count
  • nginx.upstream.connect.time.max
  • nginx.upstream.connect.time.median
  • nginx.upstream.connect.time.pctl95
Type:        gauge, seconds.milliseconds
Description: Time spent on establishing connections with upstream servers. With SSL, it
             also includes time spent on the handshake.
Source:      access.log (requires custom log format)
Variable:    $upstream_connect_time
  • nginx.upstream.header.time
  • nginx.upstream.header.time.count
  • nginx.upstream.header.time.max
  • nginx.upstream.header.time.median
  • nginx.upstream.header.time.pctl95
Type:        gauge, seconds.milliseconds
Description: Time spent on receiving response headers from upstream servers.
Source:      access.log (requires custom log format)
Variable:    $upstream_header_time
  • nginx.upstream.response.buffered
Type:        counter, integer
Description: Number of upstream responses buffered to disk.
Source:      error.log (requires 'warn' log level)
  • nginx.upstream.request.count
  • nginx.upstream.next.count
Type:        counter, integer
Description: Number of requests that were sent to upstream servers.
Source:      access.log (requires custom log format)
Variable:    $upstream_*
  • nginx.upstream.request.failed
  • nginx.upstream.response.failed
Type:        counter, integer
Description: Number of requests and responses that failed while proxying.
Source:      error.log (requires 'error' log level)
  • nginx.upstream.response.length
Type:        gauge, bytes
Description: Average length of the responses obtained from the upstream servers.
Source:      access.log (requires custom log format)
Variable:    $upstream_response_length
  • nginx.upstream.response.time
  • nginx.upstream.response.time.count
  • nginx.upstream.response.time.max
  • nginx.upstream.response.time.median
  • nginx.upstream.response.time.pctl95
Type:        gauge, seconds.milliseconds
Description: Time spent on receiving responses from upstream servers.
Source:      access.log (requires custom log format)
Variable:    $upstream_response_time
  • nginx.upstream.status.1xx
  • nginx.upstream.status.2xx
  • nginx.upstream.status.3xx
  • nginx.upstream.status.4xx
  • nginx.upstream.status.5xx
Type:        counter, integer
Description: Number of responses from upstream servers with specific HTTP status codes.
Source:      access.log (requires custom log format)
Variable:    $upstream_status

Cache Metrics

  • nginx.cache.bypass
  • nginx.cache.expired
  • nginx.cache.hit
  • nginx.cache.miss
  • nginx.cache.revalidated
  • nginx.cache.stale
  • nginx.cache.updating
Type:        counter, integer
Description: Various statistics about NGINX cache usage.
Source:      access.log (requires custom log format)
Variable:    $upstream_cache_status

NGINX Plus Metrics

In NGINX Plus a number of additional metrics describing various aspects of NGINX performance are available. The extended status module in NGINX Plus is responsible for collecting and exposing all of the additional counters and gauges.

The NGINX Plus metrics currently supported by the agent are described below. The NGINX Plus extended status metrics have the “plus” prefix in their names.

Some of the NGINX Plus extended metrics extracted from the connections and the requests datasets are used to generate the following server-wide metrics (instead of using the stub_status metrics):

nginx.http.conn.accepted = connections.accepted
nginx.http.conn.active = connections.active
nginx.http.conn.current = connections.active + connections.idle
nginx.http.conn.dropped = connections.dropped
nginx.http.conn.idle = connections.idle
nginx.http.request.count = requests.total
nginx.http.request.current = requests.current
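
The mapping above can be expressed as a small Python function. This is a sketch under the assumption that the extended status data has already been decoded into a dictionary with “connections” and “requests” objects, as the names in the mapping suggest; the function name is hypothetical.

```python
def server_wide_metrics(status):
    """Derive the server-wide metrics from NGINX Plus extended status data."""
    conn = status["connections"]
    req = status["requests"]
    return {
        "nginx.http.conn.accepted": conn["accepted"],
        "nginx.http.conn.active": conn["active"],
        # current connections = active + idle
        "nginx.http.conn.current": conn["active"] + conn["idle"],
        "nginx.http.conn.dropped": conn["dropped"],
        "nginx.http.conn.idle": conn["idle"],
        "nginx.http.request.count": req["total"],
        "nginx.http.request.current": req["current"],
    }
```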

Please see the following reference documentation and a solution brief for more information about the NGINX Plus extended status.

The NGINX Plus metrics below are collected per zone. When configuring a graph using these metrics, please make sure to pick the correct server, upstream or cache zone. A more granular peer-specific breakdown of the metrics below is currently not supported in NGINX Amplify.

A cumulative metric set is also maintained internally by summing up the per-zone metrics. If you don’t configure a specific zone when building graphs, this will result in an “all zones” visualization. E.g. for something like plus.http.status.2xx omitting zone will display the instance-wide sum of the successful requests across all zones.
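
As an illustration of the “all zones” behavior, the cumulative set is simply the per-zone values added together. The following Python sketch assumes the per-zone counters are available as a dictionary keyed by zone name; the function name and the zone names in the usage note are hypothetical.

```python
from collections import Counter


def all_zones(per_zone):
    """Sum per-zone counters into an instance-wide "all zones" metric set."""
    total = Counter()
    for metrics in per_zone.values():
        total.update(metrics)  # adds the values metric by metric
    return dict(total)
```

For instance, if zone “site-a” reports 120 for plus.http.status.2xx and zone “site-b” reports 80, a graph without a zone selected would show 200.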

Server Zone Metrics

  • plus.http.request.count
  • plus.http.response.count
Type:        counter, integer
Description: Number of client requests received, and responses sent to clients.
Source:      NGINX Plus extended status
  • plus.http.request.bytes_rcvd
  • plus.http.request.bytes_sent
Type:        counter, bytes
Description: Number of bytes received from clients, and bytes sent to clients.
Source:      NGINX Plus extended status
  • plus.http.status.1xx
  • plus.http.status.2xx
  • plus.http.status.3xx
  • plus.http.status.4xx
  • plus.http.status.5xx
Type:        counter, integer
Description: Number of responses with status codes 1xx, 2xx, 3xx, 4xx, and 5xx.
Source:      NGINX Plus extended status
  • plus.http.status.discarded
Type:        counter, integer
Description: Number of requests completed without sending a response.
Source:      NGINX Plus extended status

Upstream Zone Metrics

  • plus.upstream.peer.count
Type:        gauge, integer
Description: Current number of live upstream servers in an upstream group. If
             graphed/monitored without specifying an upstream, it's the current
             number of all live upstream servers in all upstream groups.
Source:      NGINX Plus extended status
  • plus.upstream.request.count
  • plus.upstream.response.count
Type:        counter, integer
Description: Number of client requests forwarded to the upstream servers, and responses obtained.
Source:      NGINX Plus extended status
  • plus.upstream.conn.active
Type:        gauge, integer
Description: Current number of active connections to the upstream servers.
Source:      NGINX Plus extended status
  • plus.upstream.bytes_rcvd
  • plus.upstream.bytes_sent
Type:        counter, integer
Description: Number of bytes received from the upstream servers, and bytes sent.
Source:      NGINX Plus extended status
  • plus.upstream.status.1xx
  • plus.upstream.status.2xx
  • plus.upstream.status.3xx
  • plus.upstream.status.4xx
  • plus.upstream.status.5xx
Type:        counter, integer
Description: Number of responses from the upstream servers with status codes 1xx, 2xx,
             3xx, 4xx, and 5xx.
Source:      NGINX Plus extended status
  • plus.upstream.header.time
  • plus.upstream.header.time.count
  • plus.upstream.header.time.max
  • plus.upstream.header.time.median
  • plus.upstream.header.time.pctl95
Type:        gauge, seconds.milliseconds
Description: Average time to get the response header from the upstream servers.
Source:      NGINX Plus extended status
  • plus.upstream.response.time
  • plus.upstream.response.time.count
  • plus.upstream.response.time.max
  • plus.upstream.response.time.median
  • plus.upstream.response.time.pctl95
Type:        gauge, seconds.milliseconds
Description: Average time to get the full response from the upstream servers.
Source:      NGINX Plus extended status
  • plus.upstream.fails.count
  • plus.upstream.unavail.count
Type:        counter, integer
Description: Number of unsuccessful attempts to communicate with upstream servers, and
             how many times upstream servers became unavailable for client requests.
Source:      NGINX Plus extended status
  • plus.upstream.health.checks
  • plus.upstream.health.fails
  • plus.upstream.health.unhealthy
Type:        counter, integer
Description: Number of performed health check requests, failed health checks, and
             how many times the upstream servers became unhealthy.
Source:      NGINX Plus extended status
  • plus.upstream.queue.size
Type:        gauge, integer
Description: Current number of queued requests.
Source:      NGINX Plus extended status
  • plus.upstream.queue.overflows
Type:        counter, integer
Description: Number of requests rejected due to queue overflows.
Source:      NGINX Plus extended status

Cache Zone Metrics

  • plus.cache.bypass
  • plus.cache.bypass.bytes
  • plus.cache.expired
  • plus.cache.expired.bytes
  • plus.cache.hit
  • plus.cache.hit.bytes
  • plus.cache.miss
  • plus.cache.miss.bytes
  • plus.cache.revalidated
  • plus.cache.revalidated.bytes
  • plus.cache.size
  • plus.cache.stale
  • plus.cache.stale.bytes
  • plus.cache.updating
  • plus.cache.updating.bytes
Type:        counter, integer; counter, bytes
Description: Various statistics about NGINX Plus cache usage.
Source:      NGINX Plus extended status

Other metrics

PHP-FPM metrics

You can also monitor your PHP-FPM applications with NGINX Amplify. The agent should run in the same process environment as PHP-FPM, and be able to find the php-fpm processes with ps(1), otherwise the PHP-FPM metric collection won’t work.

When the agent finds a PHP-FPM master process, it tries to auto-detect the path to the PHP-FPM configuration. When the PHP-FPM configuration is found, the agent will look up the pool definitions, and the corresponding pm.status_path directives.

The agent will find all pools and status URIs currently configured. The agent then queries the PHP-FPM pool status(es) via FastCGI. There’s no need to define an HTTP proxy in your NGINX configuration that will point to the PHP-FPM status URIs.

To start monitoring PHP-FPM, follow the steps below:

  1. Make sure that PHP-FPM status is enabled for at least one pool (if not, uncomment the pm.status_path directive for the pool, and restart PHP-FPM).
  2. Check that NGINX, the Amplify Agent, and the PHP-FPM workers all run under the same user ID (e.g. www-data). If there are multiple pools configured with different user IDs, make sure the agent’s user ID is included in the group IDs of the PHP-FPM workers. This is required in order for the agent to access the PHP-FPM pool socket(s) when querying for metrics.
  3. Check that the listen socket for the PHP-FPM pool you want to monitor (and for which you enabled pm.status_path) is properly configured with listen.owner and listen.group, e.g.
listen.owner = www-data
listen.group = www-data
listen.mode = 0660
  4. Check that the PHP-FPM listen socket for the pool is properly created and has the right permissions.
  5. Check that you can query the PHP-FPM status for the pool from the command line, e.g.
# SCRIPT_NAME=/status SCRIPT_FILENAME=/status QUERY_STRING= REQUEST_METHOD=GET cgi-fcgi -bind -connect /var/run/php5-fpm.sock

and that the above command (or a similar one) returns the proper set of PHP-FPM metrics.

Note: the cgi-fcgi tool has to be installed separately (usually from the libfcgi-dev package). This tool is not required for the agent to collect and report PHP-FPM metrics; however, it can be used to quickly diagnose possible issues with PHP-FPM metric collection.

  6. If your PHP-FPM is configured to use a TCP socket instead of a Unix domain socket, make sure you can query the PHP-FPM metrics manually with cgi-fcgi. Double-check that your TCP socket configuration is secure (ideally, the PHP-FPM pool listens on 127.0.0.1, and listen.allowed_clients is enabled as well).
  7. Update the agent to the most recent version.
  8. Check that the following options are set in /etc/amplify-agent/agent.conf:
[extensions]
phpfpm = True
  9. Restart the agent.

The agent should be able to detect the PHP-FPM master and workers, obtain access to the status, and collect the necessary metrics.
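
As a quick sanity check of the agent.conf setting from the steps above, the following Python sketch reads the [extensions] section with the standard configparser module. The function name phpfpm_extension_enabled is hypothetical, and treating agent.conf as a plain INI file is an assumption based on the snippet shown in the steps.

```python
import configparser


def phpfpm_extension_enabled(conf_text):
    """Return True if '[extensions] phpfpm' is set to a true value.

    Hypothetical helper; it only inspects the text it is given.
    """
    # Interpolation is disabled so that '%' characters in other
    # options cannot cause parsing surprises.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # A missing section or option falls back to False.
    return parser.getboolean("extensions", "phpfpm", fallback=False)
```

To check a real installation, one could pass the contents of /etc/amplify-agent/agent.conf, for example phpfpm_extension_enabled(open("/etc/amplify-agent/agent.conf").read()), provided the file is readable by the current user.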

Here is the list of caveats to look for if the PHP-FPM metrics are not being collected:

  • No status enabled for any of the pools.
  • Different user IDs used by the agent and the PHP-FPM workers, or lack of a common group (when using PHP-FPM with a Unix domain socket).
  • Wrong permissions configured for the PHP-FPM listen socket (when using PHP-FPM with a Unix domain socket).
  • Agent can’t connect to the TCP socket (when using PHP-FPM with a TCP socket).
  • Agent can’t parse the PHP-FPM configuration. A possible workaround is to not have any ungrouped directives. Try to move any ungrouped directives under [global] and pool section headers.
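
The user ID and socket permission caveats above can be checked with a short Python sketch such as the one below. It is a diagnostic aid written for illustration, not part of the agent; the function name diagnose_socket and its return strings are assumptions. Run it as the same user as the agent, since the result depends on the calling user's permissions.

```python
import os
import stat


def diagnose_socket(path):
    """Describe whether the current user can use the Unix socket at path.

    Hypothetical helper for troubleshooting only.
    """
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "no permission to stat"
    if not stat.S_ISSOCK(info.st_mode):
        return "not a socket"
    # Connecting to a Unix domain socket requires write permission on it.
    if not os.access(path, os.R_OK | os.W_OK):
        return "not accessible (mode %o)" % stat.S_IMODE(info.st_mode)
    return "ok"
```

For example, diagnose_socket("/var/run/php5-fpm.sock") reports whether the socket from the earlier cgi-fcgi example exists and is accessible to the user running the check.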

If checking the above issues didn’t help, please enable the agent’s debug log, restart the agent, wait a few minutes, and then create an issue via Intercom. Please attach the log to the Intercom chat.

With all of the above successfully configured, the end result should be an additional tab displayed on the Graphs page, with the pre-defined visualization of the PHP-FPM metrics.

The PHP-FPM metrics on the Graphs page are cumulative, across all automatically detected pools. If you need per-pool graphs, go to Dashboards and create custom graphs per pool.

Below is the list of the currently supported PHP-FPM metrics.

  • php.fpm.conn.accepted
Type:        counter, integer
Description: The number of requests accepted by the pool.
Source:      PHP-FPM status (accepted conn)
  • php.fpm.queue.current
Type:        gauge, integer
Description: The number of requests in the queue of pending connections.
Source:      PHP-FPM status (listen queue)
  • php.fpm.queue.max
Type:        gauge, integer
Description: The maximum number of requests in the queue of pending connections since FPM has started.
Source:      PHP-FPM status (max listen queue)
  • php.fpm.queue.len
Type:        gauge, integer
Description: The size of the socket queue of pending connections.
Source:      PHP-FPM status (listen queue len)
  • php.fpm.proc.idle
Type:        gauge, integer
Description: The number of idle processes.
Source:      PHP-FPM status (idle processes)
  • php.fpm.proc.active
Type:        gauge, integer
Description: The number of active processes.
Source:      PHP-FPM status (active processes)
  • php.fpm.proc.total
Type:        gauge, integer
Description: The number of idle + active processes.
Source:      PHP-FPM status (total processes)
  • php.fpm.proc.max_active
Type:        gauge, integer
Description: The maximum number of active processes since FPM has started.
Source:      PHP-FPM status (max active processes)
  • php.fpm.proc.max_child
Type:        gauge, integer
Description: The number of times the process limit has been reached.
Source:      PHP-FPM status (max children reached)
  • php.fpm.slow_req
Type:        counter, integer
Description: The number of requests that exceeded the request_slowlog_timeout value.
Source:      PHP-FPM status (slow requests)
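
The mapping between PHP-FPM status fields and the metric names listed above can be sketched in Python as follows. The mapping itself is taken from the Source lines of the list; the function name parse_status and the assumption that the status page is returned in its default plain-text "key: value" form are illustrative, and this is not the agent's actual parser.

```python
# PHP-FPM status field -> Amplify metric name, as listed above.
STATUS_TO_METRIC = {
    "accepted conn": "php.fpm.conn.accepted",
    "listen queue": "php.fpm.queue.current",
    "max listen queue": "php.fpm.queue.max",
    "listen queue len": "php.fpm.queue.len",
    "idle processes": "php.fpm.proc.idle",
    "active processes": "php.fpm.proc.active",
    "total processes": "php.fpm.proc.total",
    "max active processes": "php.fpm.proc.max_active",
    "max children reached": "php.fpm.proc.max_child",
    "slow requests": "php.fpm.slow_req",
}


def parse_status(text):
    """Convert plain-text PHP-FPM status output into a metrics dict."""
    metrics = {}
    for line in text.splitlines():
        # Split on the first colon only, so values that contain
        # colons (such as timestamps) are left intact.
        key, sep, value = line.partition(":")
        name = STATUS_TO_METRIC.get(key.strip())
        if sep and name is not None:
            metrics[name] = int(value.strip())
    return metrics
```

Lines that do not correspond to a listed metric, such as the pool name, the start time, or any response headers printed by cgi-fcgi, are ignored.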

Source: https://github.com/nginxinc/nginx-amplify-doc
