Zabbix is a universal monitoring tool that can perform the following functions:
- collecting statistics in a specified environment and responding according to predefined rules in different situations,
- monitoring the performance dynamics of network equipment and servers,
- alerting on, and responding quickly to, potential load problems.
Technology overview
The system was created by Alexei Vladishev and was originally written in Perl. The project was later substantially reworked, including its architecture, and rewritten in C and PHP. It was released as open source in 2001, and the first stable version followed three years later.
The Zabbix web interface is written in PHP. Data can be stored in MySQL, PostgreSQL, Oracle, IBM DB2 or SQLite.
At the time of writing, Zabbix 4.4 is the current version available for download on the official website, which also offers webinars and official training courses for beginners.
Zabbix architecture
Next, we consider the four main components of Zabbix, which together monitor a given environment and collect the data needed to optimize its operation:
- Server is the core component that stores all system data (configuration, statistical and operational data). It alerts the administrator to problems with monitored equipment and can also monitor network services remotely.
- Web interface is part of the server side and usually runs on the same physical host as the Zabbix server. It requires a web server.
- Agent is a program that runs on a monitored host and collects statistics on local applications and resources (processor, RAM, drives, etc.).
- Proxy is a service that collects availability and performance data on behalf of the server. It buffers the collected data and then uploads it to the server, which reduces the processor and disk load on the server. Zabbix proxy requires its own separate database.
Key features
Zabbix supports standard checks for the most popular services (FTP, SMTP, SSH, DBMS, POP, NTP, VMware, Telnet). You can adjust these checks yourself or extend them through the API.
Basic system functions:
- Ping to check the availability of nodes on the network.
- CPU load control (including individual processes).
- Network activity monitoring.
- Hard drive activity monitoring.
- Collecting data on the amount of free physical memory (RAM).
Checks
Two key concepts are used to describe monitoring in Zabbix:
- Data elements (items in the English interface) are independent metrics through which data is collected from network nodes. They are configured on the “Data Element” tab, or automatically by linking a template.
- Network nodes (hosts) are the devices to be checked, such as switches, workstations and servers, which can be organized into groups. Practical work with Zabbix often begins with creating and configuring network nodes.
By collecting the relevant data, the Zabbix agent shows the current state of a physical server. Among the many available metrics, you can check:
- total swap space,
- CPU iowait time,
- processor load, etc.
Zabbix provides 17 ways to collect information. The most popular are:
- Calculated – values the system computes from data it has already collected (including values from previous collections).
- Zabbix agent – the server connects to the agent at a set interval and collects data from it.
- SSH agent – the system connects to a host via SSH and runs the specified commands.
- Simple check – simple agentless operations, such as ping.
- Zabbix aggregate – aggregate values computed from data already stored in the database.
- Zabbix trapper – data sent to the system by trappers (bridges between the monitored services and Zabbix) instead of being polled.
Templates make it easy to create new checks. Web server availability can also be checked by simulating requests to it.
Performing a check via a user parameter:
To perform a custom check through an agent, add the command to the Zabbix agent configuration file as a user parameter, using the following syntax:
UserParameter=<key>,<command>
Here <key> is the key of the data element, which must be unique within the network node. You choose the key yourself; when you create the data element, you use this key to reference the command defined in the user parameter. For example:
UserParameter=ping,echo 1
This line configures the agent to always return the value “1” for the data element with the key “ping”.
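To make the key-to-command mapping concrete, here is a simplified Python model of how an agent could read UserParameter lines and answer a check. It is a sketch under stated assumptions, not the agent's actual parser: the helper names `parse_user_parameters` and `run_check` are hypothetical, only the first comma is treated as the key/command separator, and flexible parameters and other agent options are ignored.

```python
import subprocess

PREFIX = "UserParameter="


def parse_user_parameters(config_text: str) -> dict[str, str]:
    """Map each data element key to its command (hypothetical helper, simplified model)."""
    params: dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        # Ignore blank lines, comments and other agent directives.
        if not line.startswith(PREFIX):
            continue
        # Only the first comma separates the key; the command itself may contain commas.
        key, sep, command = line[len(PREFIX):].partition(",")
        if not sep or not key:
            raise ValueError(f"malformed user parameter: {line!r}")
        # Keys must be unique within one network node.
        if key in params:
            raise ValueError(f"duplicate key: {key!r}")
        params[key] = command
    return params


def run_check(params: dict[str, str], key: str) -> str:
    """Run the command bound to `key` and return its trimmed standard output."""
    result = subprocess.run(params[key], shell=True, capture_output=True, text=True, check=True)
    return result.stdout.strip()


config = """
# Custom checks
UserParameter=ping,echo 1
"""
params = parse_user_parameters(config)
print(params)                     # {'ping': 'echo 1'}
print(run_check(params, "ping"))  # 1
```

Splitting on the first comma only matters in practice: shell commands often contain commas of their own, and they must stay part of the command.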
Triggers
Triggers are used to evaluate collected data. A trigger is a logical expression that evaluates to TRUE, FALSE or UNKNOWN and can be created manually. Before a trigger is put into use, its expression is usually tested against arbitrary sample values.
Each trigger has a severity level. Each level is marked with its own color and can be signaled by an audio alert in the web interface:
- Disaster – red.
- High – light red.
- Average – orange.
- Warning – yellow.
- Information – light blue.
- Not classified – gray.
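Severity levels are ordered, which is what makes rules such as "notify only for High and above" possible. The Python sketch below models that ordering with the numeric values 0 (Not classified) to 5 (Disaster) that Zabbix uses for trigger severities; the `at_least` helper and the sample problems are hypothetical and serve only to illustrate filtering by a minimum severity.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Trigger severities, ordered from least to most severe."""
    NOT_CLASSIFIED = 0
    INFORMATION = 1
    WARNING = 2
    AVERAGE = 3
    HIGH = 4
    DISASTER = 5


# Display colors as described above (color names, not exact frontend color codes).
COLORS = {
    Severity.DISASTER: "red",
    Severity.HIGH: "light red",
    Severity.AVERAGE: "orange",
    Severity.WARNING: "yellow",
    Severity.INFORMATION: "light blue",
    Severity.NOT_CLASSIFIED: "gray",
}


def at_least(problems: list[tuple[str, Severity]], minimum: Severity) -> list[str]:
    """Hypothetical filter: names of problems whose severity is >= `minimum`."""
    return [name for name, severity in problems if severity >= minimum]


problems = [
    ("Disk full", Severity.DISASTER),
    ("High CPU load", Severity.AVERAGE),
    ("Agent restarted", Severity.INFORMATION),
]
print(at_least(problems, Severity.AVERAGE))  # ['Disk full', 'High CPU load']
```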
Trigger functions:
- time – current time in HHMMSS format.
- sum – the sum of values over the specified interval or number of samples.
- abschange – the absolute difference between the last and penultimate values.
- change – the difference between the last and penultimate values.
- avg – the average value over a given interval (in seconds) or number of samples.
- prev – the penultimate value.
- now – the current time as a UNIX timestamp (seconds since the epoch).
- delta – the difference between the maximum and minimum values over a given interval or number of samples.
- count – the number of samples that satisfy a criterion.
- max/min – the maximum and minimum values over the specified interval or number of samples.
- last – the most recent value of the data element, or the Nth value counting back from the end.
- diff – 1 if the last and penultimate values differ, 0 if they are equal.
- dayofweek – the day of the week, from 1 (Monday) to 7 (Sunday).
- date – the current date in YYYYMMDD format.
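Several of the value-based functions above can be sketched in Python over a plain list of samples, which also shows how they combine into a trigger condition. This is a hypothetical model for illustration, not Zabbix's implementation: only the "last N samples" form is covered (time-based intervals are omitted), and the function names simply mirror the list above.

```python
# History of one data element, oldest value first (hypothetical sample data).


def last(values, n=1):
    """Nth most recent value; n=1 is the latest."""
    return values[-n]


def prev(values):
    return values[-2]


def change(values):
    return values[-1] - values[-2]


def abschange(values):
    return abs(change(values))


def diff(values):
    return 1 if values[-1] != values[-2] else 0


def window(values, n):
    """The last n samples."""
    return values[-n:]


def avg(values, n):
    w = window(values, n)
    return sum(w) / len(w)


def delta(values, n):
    w = window(values, n)
    return max(w) - min(w)


def count(values, n, predicate):
    return sum(1 for v in window(values, n) if predicate(v))


cpu_load = [40, 75, 90, 85, 95]
# A trigger condition in the spirit of "average CPU load over the last 3 samples is above 80".
problem = avg(cpu_load, 3) > 80
print(last(cpu_load), prev(cpu_load), change(cpu_load), diff(cpu_load))  # 95 85 10 1
print(avg(cpu_load, 3), delta(cpu_load, 3), count(cpu_load, 5, lambda v: v > 80))  # 90.0 10 3
print(problem)  # True
```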
Forecasting:
One very important monitoring function is forecasting, which uses previously collected data to predict future values and the time at which they are likely to occur.
By analyzing previously collected data, a trigger can identify future problems and alert the administrator to their likelihood in advance.
This helps prevent problems such as load peaks, running out of limited disk space, or overloading equipment. Forecasting was added in Zabbix 3.0, released in February 2016.
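Zabbix exposes forecasting through the `forecast()` and `timeleft()` trigger functions, which by default fit a linear trend to historical data. The Python sketch below shows the underlying idea with an ordinary least-squares line; it is not Zabbix's code, ignores the other fit types and options the real functions support, and the helper names and disk-usage data are hypothetical.

```python
import math


def linear_fit(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares line v = slope * t + intercept through (timestamp, value) points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    if sxx == 0:
        raise ValueError("timestamps must not all be equal")
    sxv = sum((t - mean_t) * (v - mean_v) for t, v in points)
    slope = sxv / sxx
    return slope, mean_v - slope * mean_t


def forecast(points, horizon: float) -> float:
    """Predicted value `horizon` seconds after the last sample."""
    slope, intercept = linear_fit(points)
    return slope * (points[-1][0] + horizon) + intercept


def timeleft(points, threshold: float) -> float:
    """Seconds after the last sample until the trend line reaches `threshold`.

    Returns math.inf if the line does not reach it after the last sample
    (this sketch does not distinguish "already crossed" from "never").
    """
    slope, intercept = linear_fit(points)
    if slope == 0:
        return math.inf
    remaining = (threshold - intercept) / slope - points[-1][0]
    return remaining if remaining >= 0 else math.inf


# Disk usage in percent, sampled hourly (hypothetical data).
disk = [(0, 50.0), (3600, 52.0), (7200, 54.0), (10800, 56.0)]
print(round(forecast(disk, 3600), 2))  # 58.0
print(round(timeleft(disk, 90.0)))     # 61200 (seconds, i.e. 17 hours)
```

A trigger built on this idea fires while there is still time to react, for example when the predicted time until the disk reaches 90% drops below a day.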
Action
An action is a predefined reaction to an event. An action can be set up for a single event or a group of events, manually or automatically.
Action parameters:
- Name – the name of the action.
- Status – action status (“active” or “disabled”).
- Recovery subject – the subject line of the notification sent after the problem is resolved.
- Default subject – the default subject line of notifications.
- Period – the duration of one escalation step (in seconds).
- Recovery message – notification text after solving the problem.
- Default message – standard message text.
Conditions can be set for trigger and discovery events. For example:
- "Service type" with the operators = and <> – checks whether the discovered service is (or is not) of the specified type.
- "Application" with the operators =, like and not like – checks whether the trigger belongs to the specified application.
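The operators in these conditions can be modeled in a few lines of Python. In this sketch `like` and `not like` are treated as substring matches, which is how we understand Zabbix to apply them; the `Condition` class, the attribute names and the rule that all conditions must hold (plain AND) are simplifying assumptions, since Zabbix also offers other ways to combine conditions.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    attribute: str  # e.g. "Service type" or "Application"
    operator: str   # one of "=", "<>", "like", "not like"
    value: str


def matches(cond: Condition, event: dict[str, str]) -> bool:
    """Evaluate one condition against an event's attributes."""
    actual = event.get(cond.attribute, "")
    if cond.operator == "=":
        return actual == cond.value
    if cond.operator == "<>":
        return actual != cond.value
    if cond.operator == "like":  # substring match
        return cond.value in actual
    if cond.operator == "not like":
        return cond.value not in actual
    raise ValueError(f"unsupported operator: {cond.operator!r}")


def action_applies(conditions: list[Condition], event: dict[str, str]) -> bool:
    """Simplification: the action applies only if every condition holds (AND)."""
    return all(matches(c, event) for c in conditions)


# Discovery event: an HTTP service was found.
print(matches(Condition("Service type", "<>", "SSH"), {"Service type": "HTTP"}))  # True

# Trigger event: the trigger belongs to the "Web server" application.
web_rules = [Condition("Application", "like", "Web"), Condition("Application", "not like", "Test")]
print(action_applies(web_rules, {"Application": "Web server"}))  # True
```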
Operations
The user can specify an operation, or a group of operations, to perform for events.
Operation parameters:
- Remote command – a command executed on a remote host.
- Event source – the source of events.
- Step – the escalation step or steps at which the operation is performed.
- Subject – the subject line of the message.
- Message – message text.
- Default message – default text.
- Send a message to – a group message (User group) or a separate message (Single user).
- Operation type – actions at a certain step (for example: "Execute command", or "Send message").
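Steps and the escalation period work together: each step lasts one period, and each operation runs on a range of steps. The Python sketch below computes which operations run at which step and when. It assumes a fixed period for every step and the convention that a "to" step of 0 means "repeat at every later step"; it ignores per-operation step durations and does not stop when the problem is resolved. The `Operation` class and `escalation_plan` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    description: str
    step_from: int
    step_to: int  # 0 means "repeat at every step from step_from onward"


def escalation_plan(operations: list[Operation], period: int, steps: int) -> list[tuple[int, int, str]]:
    """(seconds after the event, step number, operation) for the first `steps` steps."""
    plan = []
    for step in range(1, steps + 1):
        start = (step - 1) * period  # step 1 starts immediately
        for op in operations:
            if op.step_from <= step and (op.step_to == 0 or step <= op.step_to):
                plan.append((start, step, op.description))
    return plan


ops = [
    Operation("Send message to User group: Admins", 1, 2),
    Operation("Send message to User group: Managers", 3, 0),
]
for entry in escalation_plan(ops, period=3600, steps=4):
    print(entry)
# (0, 1, 'Send message to User group: Admins')
# (3600, 2, 'Send message to User group: Admins')
# (7200, 3, 'Send message to User group: Managers')
# (10800, 4, 'Send message to User group: Managers')
```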
Low-level detection
Low-level detection (LLD) automatically creates data elements and triggers for the entities it finds on monitored servers. It is enabled and configured through detection rule attributes, in the following order:
“Setup” → “Templates” → “Detection” → “Detection Rules” / “Filters” tabs.
What can be detected:
- Windows Services.
- Common OIDs used by SNMP.
- ODBC.
- Network interfaces.
- File systems.
- Processors (including their cores).
Additional types:
Using the JSON format, you can define your own low-level detection types. In addition, network discovery offers the following check types, for which you can specify a list or range of ports:
- IMAP;
- TCP;
- SSH;
- LDAP;
- POP;
- NNTP;
- SMTP;
- FTP;
- HTTP.
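A custom low-level detection type, as mentioned above, boils down to a script or check that returns discovered entities as JSON, with each entity described by LLD macros of the form {#NAME}. The Python sketch below produces a plain JSON array of such objects, which is the format we understand Zabbix 4.2 and later to accept (earlier versions expected the array under a "data" key). The `{#QUEUE}` macro, the queue names and the `lld_json` helper are hypothetical, and the allowed macro characters are an assumption.

```python
import json
import re

# Assumed LLD macro syntax: {#NAME} with uppercase letters, digits, "_" and ".".
MACRO = re.compile(r"\{#[A-Z0-9_.]+\}")


def lld_json(macro: str, values: list[str]) -> str:
    """Serialize discovered entities as a JSON array of {macro: value} objects."""
    if not MACRO.fullmatch(macro):
        raise ValueError(f"not an LLD macro: {macro!r}")
    return json.dumps([{macro: value} for value in values])


# Hypothetical custom detection: the agent could expose this output through a user
# parameter, e.g. UserParameter=queue.discovery,python3 discover_queues.py
print(lld_json("{#QUEUE}", ["orders", "billing"]))
# [{"{#QUEUE}": "orders"}, {"{#QUEUE}": "billing"}]
```

Data element and trigger prototypes can then use {#QUEUE} in their keys and names, so one prototype yields one real data element per discovered queue.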
Proxy
Consider a situation where the infrastructure is so large that a single dedicated server cannot cope with the load. In this case, a proxy is used to buffer data. A proxy is an intermediate link: it collects information from agents into a buffer and then sends the data to the server.
A proxy is also useful when agents are reachable only within a local network, or are located far from each other.
Zabbix proxy runs as a daemon and requires its own separate database.
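The buffering role of a proxy is essentially a store-and-forward pattern. The Python sketch below illustrates it: values are collected locally and sent to the server in batches, and if the server is unreachable the remaining data stays buffered for the next attempt. This is an illustration of the pattern only; the real proxy keeps its buffer in its database and uses its own protocol, and the `ProxyBuffer` class, the `send` callback and the batch size are hypothetical.

```python
from collections import deque
from typing import Callable

Value = tuple[str, str, str]  # (network node, data element key, value)


class ProxyBuffer:
    """Store-and-forward buffer: collect locally, send to the server in batches."""

    def __init__(self, send: Callable[[list[Value]], bool], batch_size: int = 100) -> None:
        self._queue: deque[Value] = deque()
        self._send = send  # returns True if the server accepted the batch
        self._batch_size = batch_size

    def collect(self, host: str, key: str, value: str) -> None:
        self._queue.append((host, key, value))

    def flush(self) -> int:
        """Send buffered values oldest first; stop and keep the rest if a send fails."""
        sent = 0
        while self._queue:
            batch = [self._queue[i] for i in range(min(self._batch_size, len(self._queue)))]
            if not self._send(batch):
                break  # server unreachable: data stays buffered for the next flush
            for _ in batch:
                self._queue.popleft()
            sent += len(batch)
        return sent

    def __len__(self) -> int:
        return len(self._queue)


received: list[list[Value]] = []


def fake_send(batch: list[Value]) -> bool:
    received.append(batch)
    return True


buf = ProxyBuffer(fake_send, batch_size=2)
for i in range(5):
    buf.collect("web01", "system.cpu.load", str(i))
print(buf.flush(), [len(b) for b in received], len(buf))  # 5 [2, 2, 1] 0
```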
Web interface. Important features
Features of the Zabbix monitoring system web interface:
- For security, a user is automatically logged out after 30 minutes of inactivity.
- The console lets you view the collected data and configure monitoring.
- Five functional sections are available to the user:
  - Administration;
  - Monitoring;
  - Configuration;
  - Reports;
  - Inventory.
Necessary information about the status of network nodes and triggers is always displayed on the screen.
Host groups are available in the "Configuration" section. For each entry in the list, more information is available (data graphs, latest events).
In the Templates subsection, the administrator can manage the available templates.
Zabbix 4.4 features
You can check the version of the installed Zabbix server in its log file, which records the version when the server starts.
Important innovations in Zabbix 4.4:
- Official support for more platforms (MSI installer for the Windows agent, SUSE Linux Enterprise Server 15, RHEL 8, Debian 10, Mac OS X, Raspbian 10).
- Official TimescaleDB support.
- A new Zabbix agent (zabbix_agent2), written in Go.
- Histograms and data grouping.
- Options for displaying data graphs.
- Knowledge base for data components and triggers.
- Webhook integrations with external notification and issue-tracking systems.
Conclusion
Today Zabbix is one of the most popular, reliable and advanced tools for remote monitoring of hardware and software resources. Its many functions are aimed mainly at monitoring network activity and server performance, and at warning users about possible errors and dangerous situations. These capabilities make Zabbix a solid basis for a reliable strategy for using IT infrastructure effectively in a wide range of companies.