Introduction
These release notes provide a comprehensive overview of the new features, enhanced functionalities, and resolved issues found in version 4.0 of SRE.
What's New in SRE 4.0
Platform Upgrade
All underlying frameworks and libraries have been updated to the latest versions. Starting from release 4, the reference Operating System is now RHEL 8, and support for RHEL 7 has been discontinued.
New Broker
The broker process, sre-broker, responsible for dispatching SIP requests from the SIP stack to the sre-call-processor instances, has been completely rewritten for improved performance.
New Licensing
The licensing system has been revised to provide greater flexibility. Several new licenses have been introduced to differentiate SIP redirect scenarios from SIP relay scenarios. Consequently, it is now possible to obtain different limits depending on the type of scenario. Additionally, a new license, based on the number of platform-wide simultaneous sessions, has been added.
Note
Please note that if you are updating from a previous release, we strongly recommend contacting Netaxis Support to request your licensing keys for the new licensing scheme to prevent any disruption of service.
Call Processing
SIP Registration Client
A new framework has been introduced, allowing the SRE to function as a SIP registration client for multiple endpoints. To facilitate this, a new process named sre-remote-registration has been implemented, responsible for periodically registering SIP endpoints. In its role as a SIP registration client, the SRE manages all aspects of SIP registration. This includes tasks like sending SIP REGISTER messages, handling authentications, and managing expiry timers. Configuration options for SIP endpoints are available through the new menu System -> SIP Remote Registration, or they can be dynamically generated from Datamodel data by creating a specific Service Logic (SL) to be activated. In this SL, the output node "Add registrations" can be utilized to provide a list of SIP endpoints (stored in a records list) for registration.
When deployed across multiple Call Processors (CPs), the sre-remote-registration processes will distribute the SIP endpoints among them for registration. In the event of a CP or process failure, the remaining elements will take over the role of SIP registration until the failed component is restored.
Note
To activate this feature, a new configuration definition is necessary, and you must use the following command to enable the WITH_REGISTER_CLIENT setting:
# echo '#!define WITH_REGISTER_CLIENT' >> /etc/kamailio/kamailio-local.cfg
Then restart Kamailio.
On the dashboard, a new tab titled "SIP Registration" has been added to facilitate the monitoring of SIP endpoints' status.
Hitless Upgrade
SRE 4.0 enables the redirection of traffic from one call processor to another, offering a practical means to upgrade SRE with minimal disruption to ongoing traffic. For the requirements, refer to the "Hitless upgrade" paragraph of the Installation Guide.
To redirect traffic to another CP use the following command (confirmation is required):
# /opt/sre/bin/sre-admin kamailio set-redirection
A working CP is automatically chosen by this command, and a pre-check is performed to ensure the target CP is alive.
To deactivate traffic redirection, you can employ the following command:
# /opt/sre/bin/sre-admin kamailio reset-redirection
To get the status of traffic redirection run:
# /opt/sre/bin/sre-admin kamailio get-redirection
Service Logic Editor
Change Log
It is now possible to compare the current (development) version of a service logic with previous release versions of that same service logic. The tool is accessible from the Changelog tab of the Service Logic Editor (SLE). It lists changes in existing node parameters, added nodes, removed nodes, and modifications to links between nodes.
Tracing Local Storage
Tracing has been redesigned to store traces locally on both EMs rather than in-memory. This modification enables increased trace retention and preserves them in the event of an EM manager process restart.
Service Logic Tagging
A new tagging system has been implemented, enabling users to assign one or more dynamically created tags to service logics. This feature allows users to group service logics by category. These tags can also be automatically assigned upon importing a Service Logic. This functionality enables users to swiftly identify an imported service logic along with all its sub-service logics.
Bulk Service Logic Deletion
A button has been added to allow the deletion of multiple service logics at once based on tags.
Service Selection Menu
New buttons have been added to the service selection menu, allowing immediate editing or opening of the currently active service logic.
Automatic Table Joins
For the "Database Query" node, when executing joins between different tables, selecting the left column of the join now automatically populates the right column based on the foreign key of that column. This enhancement eliminates the need to manually select both sides of the join, reducing the potential for errors.
Online Documentation
Links have been incorporated to directly access online documentation from both the GUI menu and the nodes' properties dialog.
Improved Nodes
- The "Regular Expression Match" node has been enhanced to support the use of variables for the regular expression, enabling dynamic behavior (e.g. retrieving the regular expression from the database). This flexibility allows for the utilization of different regular expressions based on scenarios, allowing the matching of another variable with varying expressions.
- For the HTTP query nodes, support for mutual TLS has been added. It is now possible to configure the client certificate, client private key, and CA certificate to establish mutual TLS with the external system.
- The "Set Variables" node has been enhanced to facilitate the indirect setting of a variable through a dereferencing-like mechanism. In practice, the target variable name is not static but is defined inside another variable.
- The "DNS Query" node has been improved to support querying for SRV records, enabling the retrieval of priority, weight, port, and target information from SRV records.
- A new fire-and-forget option has been added to the "HTTP JSON Query" node, allowing the SRE to proceed to the next node without waiting for the result of the request.
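The variable-driven behaviour of the "Regular Expression Match" and "Set Variables" nodes can be pictured with a plain Python dictionary standing in for the call descriptor. This is only an analogy and not SRE code: the helper functions, the variable names (`pattern_var`, `target_name`, `called`, `routing_zone`) and the sample values are all hypothetical.

```python
import re


def regex_match(variables, pattern_var, subject_var):
    """Match one variable against a regular expression held in another variable."""
    pattern = variables[pattern_var]
    return re.search(pattern, variables[subject_var]) is not None


def set_indirect(variables, name_var, value):
    """Set the variable whose *name* is stored in `name_var` (dereferencing)."""
    target = variables[name_var]
    variables[target] = value
    return variables


# Hypothetical call descriptor content, for illustration only.
variables = {
    "called": "+3225551234",
    "pattern_var": r"^\+32",        # could come from a database record
    "target_name": "routing_zone",  # name of the variable to be set
}

matched = regex_match(variables, "pattern_var", "called")
set_indirect(variables, "target_name", "BE")
```

After this runs, `matched` is true and the dictionary contains a new `routing_zone` entry, even though neither the pattern nor the target variable name was hard-coded in the two helpers.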
New Nodes
A new node, "Remove SIP Body," has been introduced to facilitate the removal of the SIP body during response processing, such as in the case of a 180 Ringing or 183 Session Progress.
Another node, "Strip To Tag," has been added to remove the tag parameter from the To header during response processing.
Two new nodes have been introduced to facilitate data caching through a MongoDB database. The "Set Cached Data" node enables the storage of data by specifying a key, value, and time-to-live. The corresponding "Get Cached Data" node facilitates the retrieval of cached data. Both nodes allow the definition of the MongoDB connection string, optionally incorporating replica set information. A built-in cooldown mechanism ensures a graceful bypass if the underlying database becomes temporarily unavailable. Together, these nodes enable platform-wide data caching accessible from any Call Processor.
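The idea of a time-to-live cache with a cooldown bypass can be sketched as follows. This is a conceptual illustration under stated assumptions, not the implementation used by the nodes: an in-memory class stands in for MongoDB, and the class names, the `cooldown` parameter and the use of `ConnectionError` to signal an unavailable backend are all hypothetical.

```python
import time


class InMemoryBackend:
    """Stand-in for the database: stores (value, expiry) pairs per key."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}
        self.available = True  # flip to False to simulate an outage

    def set(self, key, value, ttl):
        if not self.available:
            raise ConnectionError("backend unavailable")
        self.data[key] = (value, self.clock() + ttl)

    def get(self, key):
        if not self.available:
            raise ConnectionError("backend unavailable")
        entry = self.data.get(key)
        if entry is None or entry[1] <= self.clock():
            return None  # missing or expired
        return entry[0]


class CooldownCache:
    """Bypasses the backend for `cooldown` seconds after a failure."""

    def __init__(self, backend, cooldown=30.0, clock=time.monotonic):
        self.backend = backend
        self.cooldown = cooldown
        self.clock = clock
        self.bypass_until = 0.0

    def _call(self, func, *args):
        if self.clock() < self.bypass_until:
            return None  # still cooling down: do not touch the backend
        try:
            return func(*args)
        except ConnectionError:
            self.bypass_until = self.clock() + self.cooldown
            return None

    def set(self, key, value, ttl):
        self._call(self.backend.set, key, value, ttl)

    def get(self, key):
        return self._call(self.backend.get, key)
```

With this shape, a caller always gets either the cached value or `None`, so an unavailable backend degrades to a cache miss instead of an error.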
A new node, "Set CDR Filename," has been introduced to enable the selection of the CDR filename prefix used for the output CDR. This feature facilitates the directing of CDR generation to different CDR files on a per-call basis.
Built-in Scheduler
A new service logic interface has been introduced to execute Service Logic at regular intervals. The execution itself is managed by a new process called "sre-scheduler." Each SL execution begins with a call descriptor containing two variables: "now," filled with the current timestamp, and "hostname," set to the hostname of the running SL (enabling distinct behavior based on the host). The execution interval is controlled by the custom endpoint mechanism, allowing the definition of various custom endpoints with different execution frequencies for the SL.
Statistics
Telegraf Integration
Telegraf integration has been implemented, allowing the Telegraf agent to transmit statistical data to the statistics database (operating on InfluxDB) on the Element Managers when activated. This data can subsequently be visualized through the built-in dashboard functionality of SRE. To support this integration, several built-in graph definitions have been introduced:
- Disk activity operations I/O
- Disk activity bytes I/O
- MongoDB commands
- MongoDB connections
- Processes
- CPU Usage over time
- DB cache
- DB connections
- Records activity
- Network bytes I/O
- Network packets I/O
- Network TCP segments I/O
- Network UDP datagrams I/O
These new metrics offer deeper insights into the health and performance of the platform, encompassing aspects such as the operating system, network, and various subsystems like PostgreSQL or MongoDB.
Alarms
New alarms have been introduced to monitor the number of open files, triggering when this count reaches a configured threshold. Additionally, another alarm has been added to signal when the number of open database (DB) connections reaches a configured percentage of the available DB connections.
A scheduled job has been implemented to automatically purge alarms after reaching a configured retention threshold.
Alongside the new automatic DB switchover feature introduced in this release, new alarms have been added to notify of state transitions of the repmgrd daemon and in the case of DB switchover.
Operations & Maintenance
Database Automatic Switchover
Automatic switchover of the PostgreSQL DB has been implemented on the Element Managers through the repmgrd daemon. In the event of a DB switchover, the Call Processors' local DBs will also be directed to follow the newly elected master.
Events Log
A new log, located by default at /var/log/sre/events.log, records all state-changing events. This log offers a comprehensive view of the system and maintains a history of significant events, including alarms or changes in database state, among others. Every midnight, the current state for various items is logged to this file.
Example:
2023-12-22 00:00:00,022 CURRENT: [sre-40-em2] status.db.replication.state: master
2023-12-22 00:00:00,023 CURRENT: [sre-40-em2] status.influxDB.hosts."sre-40-em2".state: pass
2023-12-22 00:00:00,023 CURRENT: [sre-40-em2] status.mongoDB.localMongo: True
2023-12-22 00:00:00,023 CURRENT: [sre-40-em2] status.mongoDB.members."10.0.161.192:27017".lastHeartbeatMessage:
2023-12-22 00:00:00,023 CURRENT: [sre-40-em2] status.mongoDB.members."10.0.161.192:27017".state: SECONDARY
2023-12-22 00:00:00,023 CURRENT: [sre-40-em2] status.mongoDB.members."10.0.161.192:27017".syncSourceHost: 10.0.161.193:27017
...
2023-12-22 09:34:25,356 NEW: [sre-40-cp1] alarm.sre.process.telegraf.stopped: active
2023-12-22 09:34:30,425 CHANGE: [sre-40-cp1] alarm.sre.process.telegraf.stopped: active -> system-cleared
...
2023-12-22 09:34:50,847 CHANGE: [sre-40-cp1] alarm.sre.cpu.critical: system-cleared -> active
2023-12-22 09:34:55,886 CHANGE: [sre-40-cp1] alarm.sre.cpu.critical: active -> system-cleared
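For post-processing, lines of this shape can be split into their parts with a small parser. The sketch below is derived only from the sample lines above; it assumes that keys never contain whitespace and that state transitions are always written as `old -> new`. The function and field names are hypothetical.

```python
import re

# timestamp, event kind, [host], key, optional value
EVENT_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<kind>CURRENT|NEW|CHANGE): "
    r"\[(?P<host>[^\]]+)\] "
    r"(?P<key>\S+?):(?:\s+(?P<value>.*))?$"
)


def parse_event(line):
    """Return a dict describing one events.log line, or None if it does not match."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    value = event.pop("value") or ""
    if event["kind"] == "CHANGE" and " -> " in value:
        # State transition: keep both the previous and the new state.
        event["old"], event["new"] = value.split(" -> ", 1)
    else:
        event["old"], event["new"] = None, value
    return event
```

The key is matched non-greedily up to the first colon that is followed by whitespace or the end of the line, so quoted members such as `"10.0.161.192:27017"` stay inside the key.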
Improved Summary Log
The call summary log, part of sre.log, has been enhanced to include executions for the ENUM and HTTP interfaces, in addition to the SIP interface. For all interfaces, the switch of service logics is now explicitly detailed, and the total processing duration is also included. This information can be used to identify slow executions.
Example log for SIP, HTTP & ENUM requests:
[sre.cp.tracing.summary INFO]-2023-12-27 09:06:54,228 <sre-40-cp2> hub51u3z: input: 123 -> 456 (c636e7a9-5a13-4729-8d99-27fd95bffe35), logic: (SL=loop)Start->a->410, output: {'nit': 'sipResponse', 'responseCode': '410', 'reasonPhrase': 'Gone', 'reasonHeader': None, 'statCode': '410'}, duration: 72.7 msecs
[sre.cp.tracing.summary INFO]-2023-12-27 09:12:03,448 <sre-40-cp2> ygy11ynp: input: GET / , logic: (SL=RBEN HTTP )Start->crl->response->200, output: {'nit': 'http', 'body': '{"message": "success"}', 'contentType': 'application/json', 'headers': {}, 'responseCode': '200', 'actions': []}, duration: 37.2 msecs
[sre.cp.tracing.summary INFO]-2023-12-27 09:15:34,344 <sre-40-cp2> k7wv2p5p: input: 32495212354 (4.5.3.2.1.2.5.9.4.2.3.e164.arpa), logic: (SL=RBEN ENUM)Start->sample enum, output: {'nit': 'enum', 'answers': [{'ttl': 900, 'order': 100, 'preference': 10, 'flags': 'u', 'service': 'sip+E2U', 'regex': '!.*!domain.com!'}], 'actions': []}, duration: 1670.9 msecs
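One way to spot slow executions is to extract the trailing duration from each summary line and compare it with a threshold. The sketch below relies only on the layout of the sample lines above; the 1000 ms default threshold, the function name and the field names are assumptions chosen for illustration.

```python
import re

# <host> identifier: ... (SL=name) ... duration: N msecs
SUMMARY_RE = re.compile(
    r"<(?P<host>[^>]+)> (?P<ref>\w+): "
    r".*?\(SL=(?P<service_logic>[^)]*)\)"
    r".*duration: (?P<duration_ms>\d+(?:\.\d+)?) msecs\s*$"
)


def slow_executions(lines, threshold_ms=1000.0):
    """Return (host, ref, service logic, duration in ms) for each slow execution."""
    slow = []
    for line in lines:
        match = SUMMARY_RE.search(line)
        if match is None:
            continue  # not a summary line
        duration = float(match["duration_ms"])
        if duration >= threshold_ms:
            slow.append(
                (match["host"], match["ref"], match["service_logic"].strip(), duration)
            )
    return slow
```

Applied to the three sample lines with the default threshold, only the ENUM request (1670.9 msecs) would be reported.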
Maintenance Jobs
By leveraging the new scheduler mechanism, maintenance tasks such as backups or file cleanup, among others, have been incorporated into the new scheduler. As a result, it is no longer necessary to configure Cron for these maintenance tasks. Parameters for these jobs have been included in the System -> Settings menu.
Automatic Rolling Upgrade
Note
Ansible is required. Installation details are available in the Installation Guide.
The new SRE RPM must first be downloaded to the master EM. Then run the following command on the master EM:
# ansible-playbook -i /etc/ansible/sre-inventory.py /opt/sre/playbooks/upgrade.yml -e src_rpm=sre-x.y.z.x86_64.rpm
REST API
Single Record Retrieval
The GET API has been enhanced to enable the return of the first matched record as a dictionary, rather than as a list of matching records when performing a query on several records.
Licenses Endpoint
A new API endpoint, accessible at /licenses, has been added to retrieve the current license usage.
Example output:
[
{
"name": "enum-execution-processor",
"expire": "2100-01-02",
"callsPerSeconds": 1,
"usagePercent": 0
},
{
"name": "http-execution-processor",
"expire": "2100-01-02",
"callsPerSeconds": 1,
"usagePercent": 0
},
{
"name": "sip-relay-call-processor",
"expire": "2100-01-02",
"callsPerSeconds": 1,
"usagePercent": 0
},
{
"name": "sip-redirect-call-processor",
"expire": "2100-01-02",
"callsPerSeconds": 1,
"usagePercent": 0
},
{
"name": "sip-session-execution-platform",
"expire": "2100-01-02",
"limit": 3,
"usagePercent": 0
}
]
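A monitoring script could consume this output as sketched below. The code only relies on the fields visible in the example output (`name`, `callsPerSeconds` or `limit`, `usagePercent`); the function names and the idea of a usage threshold are assumptions, and fetching the payload from the API is deliberately left out.

```python
import json


def summarize_licenses(payload):
    """Map each license name to (limit field, limit value, usage percent)."""
    summary = {}
    for lic in json.loads(payload):
        # In the example output, rate-based licenses carry "callsPerSeconds"
        # and the session-based license carries "limit".
        field = "callsPerSeconds" if "callsPerSeconds" in lic else "limit"
        summary[lic["name"]] = (field, lic[field], lic["usagePercent"])
    return summary


def licenses_at_or_above(payload, threshold_percent):
    """Return the names of licenses whose usage reaches the given percentage."""
    return [
        lic["name"]
        for lic in json.loads(payload)
        if lic["usagePercent"] >= threshold_percent
    ]


# Two entries copied from the example output above.
SAMPLE = """[
  {"name": "sip-relay-call-processor", "expire": "2100-01-02",
   "callsPerSeconds": 1, "usagePercent": 0},
  {"name": "sip-session-execution-platform", "expire": "2100-01-02",
   "limit": 3, "usagePercent": 0}
]"""

summary = summarize_licenses(SAMPLE)
```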
Security & Auditing
Password Salting
Starting from this release, local user passwords are now salted in the database. Password salting is a security measure that involves adding a unique, random value (the "salt") to each user's password before hashing and storing it. This prevents attackers from using precomputed tables (like rainbow tables) and enhances security by ensuring that even identical passwords result in different hashed values.
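The principle can be illustrated with the Python standard library. This is a generic sketch of salted hashing, not the scheme used by SRE: the choice of PBKDF2 with SHA-256, the 16-byte salt, the iteration count and the function names are all assumptions made for the example.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected_digest, iterations=100_000):
    """Re-hash the candidate password with the stored salt and compare safely."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected_digest)
```

Hashing the same password twice yields two different digests because each call draws its own salt, which is why a precomputed table built for unsalted hashes is of no use.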
Service Logic Fine-Grained Access Control
By leveraging the new tagging system for Service Logics (SLs), it is now possible to define roles by specifying which SL tag a role can access. This allows for fine-grained access control of the SLs.
GUI
Basic/Advanced Settings Split
In the System -> Settings menu, configuration parameters have been categorized into basic and advanced settings. Typically, only basic settings may need to be modified under normal conditions. Advanced settings provide more granular control over deeper components.
Miscellaneous Enhancements
The following is a list of minor enhancements which do not affect the main functionality of SRE:
- adapted REST audit log to ease parsing
- updated SREaaS Ansible syntax
- fixed top navbar
- added sre-admin tool to rename datamodel table in datamodel definition and/or service logic nodes
- added configuration parameter to modify listen/connection address of broker
- added sre-admin command to redirect traffic to sibling broker
- added sre-admin broker benchmarking tool
- added operators is NULL and is not NULL for data admin search page
- changed dashboard graphs refresh to active tab only
- added log back mechanism
Patch Versions Release Notes
Release 4.0.1
Pull id | Fix |
---|---|
1585 | made SLE nodes copy/paste functionality standard |
1582 | fixed API error 500 if module license is added |
1575 | fixed global stats to show aggregated data for all servers |
1571 | fixed race condition for CAC record removal when BYE arrives right after INVITE 200 OK |
1569 | fixed HTTP custom endpoint URL resolver to allow spaces in URL requests |
1565 | removed call-id-based seeding for nodes "Shuffle list of records" and "Random pick a row" to improve result randomness |
1560 | added missing level filter and class for alarm level warning in alarms browser |
1556 | forced microseconds resolution for CDR fields |
1553 | fixed CSV export with query |
1549 | fixed nodes "Create database record" and "Update database record" when trying to set an integer column to NULL |
1545 | fixed HTTP trace summary display |
1540 | fixed license loading for sre-http-processor; fixed custom endpoint resolver for sre-http-processor; pre-filled record editing with default DM values for input fields |
1538 | improved REST API to consider itself as master only if it is replicating to other servers; fixed validator alphanumeric to allow numbers at start |
1536 | fixed location of import of kamailio-local.cfg into main kamailio.cfg |
1531 | improved batch provisioning foreign key resolver when using numeric references |
1527 | adapted repmgr check to ignore error code; removed links when cloning node |
Release 4.0.2
Pull id | Fix |
---|---|
1676 | fixed regression for "Regular expression substitution" node |
1673 | fixed display of local databases metrics on dashboard for SREaaS |
1669 | fixed service logic changelog when some specific nodes are present |
1654 | fixed performance regression on Datamodel management page |
1650 | adapted node "DB Query" to prevent empty offset value to match empty string key in call descriptor |
1637 | added option to control Kamailio configuration update for SREaaS deployment scripts |
1635 | adapted custom SIP agents probing to use the probing R-URI as key for checking live status |
1633 | updated registry URL for SREaaS |
1629 | reduced frequency of InfluxDB health checks by sre-health-monitor to reduce load |
1625 | fixed HTML escaping of DME constraints |
1623 | improved performance of node "Extract JSON path" by compiling with basic parser; removed cache expiration of compiled JSON path expressions |
1610 | adapted service logic tags search to apply AND logic |
1608 | adapted signal handling during process shutdown for process sre-cdr-collector |
1589 | added index on alarm table for performance |
1680 | fixed dashboard display of cluster status |
Upgrade From 3.3
Note
If you are coming from a release prior to 3.3, refer to the release notes for that release to perform the intermediate steps.
As SRE 4.0 runs on a different OS version (i.e., RHEL 8) than the previous 3.x versions (i.e., RHEL 7), upgrading an existing SRE platform involves prioritizing the OS upgrade. Two possibilities exist for this upgrade:
- Upgrade the OS in-place using RHEL conversion/upgrade tools and transition from SRE 3.3.x to SRE 4.0.x.
- Provision new virtual machines with RHEL 8 and install the SRE RPM on top.
Option 1 minimizes operations, making it easier to maintain IP addresses and ongoing database replication, albeit with some downtime. Option 2 allows for smoother preparation but may require more operations, especially if IP addresses need changing. In such cases, replication operations and configuration adjustments may be necessary, both within SRE and external equipment (IT systems and SIP clients).
High-Level Procedures
All the options detailed below aim to avoid any call processing downtime but differ in the duration of provisioning/monitoring downtime and the configuration changes required for external systems.
Option 1A
Option 1A can be performed when some provisioning/monitoring downtime is acceptable during the upgrade of the master EM. These steps must be carried out:
- Stop SRE on standby EM
- Upgrade the OS on standby EM
- Upgrade the SRE RPM on standby EM
- Stop SRE on master EM
- Upgrade the OS on master EM
- Upgrade the SRE RPM on master EM
- Update the database schema on master EM
- Start SRE on master EM
- Start SRE on standby EM
- Upgrade the CPs one by one, preferably isolating them from network requests.
Option 1B
Option 1B should be performed when provisioning/monitoring downtime must be minimized. These steps must be carried out:
- Stop SRE on standby EM
- Upgrade the OS on standby EM
- Upgrade the SRE RPM on standby EM
- Perform a switchover of the database to promote the standby EM as master
- Update the database schema on the new master EM
- Start SRE on the new master EM
- Stop SRE on the new standby EM
- Upgrade the OS on the new standby EM
- Upgrade the SRE RPM on the new standby EM
- Start SRE on the new standby EM
- Upgrade the CPs one by one, preferably isolating them from network requests.
Option 2A
Option 2A can be performed when modifying configuration on other systems is possible. These steps must be carried out:
- Provision a new master EM VM with different IP addresses and RHEL 8.
- Install SRE on top of the OS.
- Set up database synchronization from the SRE 3.3 master VM to this new master VM.
- Provision a new standby EM VM with different IP addresses and RHEL 8.
- Install SRE on top of the OS.
- Set up database synchronization from the new master VM to this new standby VM (resulting in cascading replication: Old SRE master EM -> new SRE master EM -> new SRE standby EM).
- Update the EM IP addresses in the configuration.
- Shut down the old master EM VM.
- Shut down the old standby EM VM.
- Make the new master EM VM the primary DB.
- Instruct all CPs to follow the new master DB.
- Update the database schema on the new master EM.
- Start SRE on the new master EM.
- Start SRE on the new standby EM.
- Instruct external IT systems to point to the new EM addresses.
- Restart SRE on the CPs one by one so that they communicate their logs and information to the new EMs.
- Upgrade the CPs one by one, preferably isolating them from network requests and updating the IP addresses on client equipment accordingly.
Option 2B
Option 2B can be performed when modifying configuration on other systems is difficult, and provisioning/monitoring downtime must be kept to a minimum. These steps must be carried out:
- Provision a new standby VM with the same IP addresses and RHEL 8, in isolation mode to avoid conflicts with the existing VMs.
- Install SRE on top of the OS.
- Shut down the current standby EM VM.
- Activate the network on the new standby VM.
- Set up database synchronization from the SRE 3.3 master VM to this new standby VM.
- Switch over the database from the master EM to the standby EM.
- Instruct all CPs to follow the new master DB.
- Update the database schema on the new master EM.
- Start SRE on the new master EM.
- Provision a new standby VM with the same IP addresses and RHEL 8, in isolation mode to avoid conflicts with the existing VMs.
- Install SRE on top of the OS.
- Shut down the standby EM.
- Activate the network on the new standby EM.
- Set up database synchronization from the new master VM to this new standby VM.
- Start SRE on the new standby EM.
- Upgrade the CPs one by one, preferably isolating them from network requests.
Element Managers
SRE RPM Update
To launch the upgrade, run the following command on all EMs:
# yum install /<path>/sre.4.0.x-y.x86_64.rpm
You must upgrade the internal DB schema. Therefore on the master EM node only, run:
# /opt/sre/bin/sre-admin db upgrade
The DB schema change will be applied to the other nodes through standard DB replication.
Afterwards, restart SRE on both EMs with:
# systemctl restart sre
In the SRE GUI, under Settings->Licenses, set the new license keys you obtained from Netaxis Support.
Once these changes have been performed, restart SRE with the command:
# systemctl restart sre
As maintenance jobs are now handled by the built-in scheduler, the SRE crontab file, placed by default under /etc/cron.d, should be removed.
Call Processors
Call processors must be upgraded one by one.
If the call processor runs the SIP stack, perform the following steps:
- Take the CP offline from the GUI (System->Node operational status->out-of-service). Alternatively, you can set the CP out-of-service from the SIP client equipment (e.g. SBC, ...). Verify that traffic has stopped on the CP using tcpdump, sngrep or the dashboard statistics.
- Shut down Kamailio with:
# systemctl stop kamailio
- Upgrade SRE from the RPM with the same command used for EM:
# yum install /<path>/sre.4.0.x-y.x86_64.rpm
# systemctl restart sre
- Copy the file /opt/sre/etc/kamailio/kamailio.cfg to /etc/kamailio
- Adapt the file /etc/kamailio/kamailio.cfg depending on the deployment (usually only the line listen, which contains the listening address of your Kamailio instance)
- Start Kamailio again with:
# systemctl start kamailio
- Enable traffic from the GUI (System->Node operational status->in-service)
If the call processor runs the ENUM interface or the HTTP interface, perform these steps:
- If the client equipment allows it, first put the SRE CP out-of-service so that no requests are sent to it.
- Upgrade SRE from the RPM with the same command used for EM:
# yum install /<path>/sre.4.0.x-y.x86_64.rpm
Once at least one CP node has been upgraded, make sure that this CP handles requests as expected, as it did in the previous release. Verify that CDRs are created on the EMs (if enabled) for the requests handled by this CP.
If this is confirmed, proceed to the next CP node.
Downgrade From 4.0 to 3.3
You must downgrade the internal DB schema. Therefore on the master EM node run as user postgres:
# psql
and use the following commands:
postgres=# \c sre
sre=# ALTER TABLE service_logic DROP COLUMN tags CASCADE;
Install the previous RPM on all EMs and CPs with the command:
# yum downgrade /<path>/sre.3.3.x-y.x86_64.rpm
On CPs restore the previous Kamailio configuration file and restart kamailio with:
# systemctl restart kamailio
Patch Upgrade Path From 4.0.x
To upgrade to a target patch release, the Admin needs to check the upgrade path to know which actions to take.
Note
It is important to highlight that an action needed at a patch level 4.0.N is also needed for direct upgrade to 4.0.N+1, 4.0.N+2, ...
Patch release | Needed actions |
---|---|
4.0.1 | None |
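The cumulative rule stated in the note above can be expressed as a short helper that collects the actions of every patch level crossed during a direct upgrade. This is only a sketch: the function name and the dictionary layout are hypothetical, and the table content is limited to what these release notes list.

```python
def needed_actions(current_patch, target_patch, actions_by_patch):
    """Collect the actions of every patch level after `current_patch`,
    up to and including `target_patch`."""
    actions = []
    for patch in range(current_patch + 1, target_patch + 1):
        actions.extend(actions_by_patch.get(patch, []))
    return actions


# Upgrade path table from these release notes: 4.0.1 requires no action.
ACTIONS_4_0 = {1: []}

# Upgrading from 4.0.0 to 4.0.1 therefore yields an empty list.
print(needed_actions(0, 1, ACTIONS_4_0))
```

Patch levels are represented by their last digit only (1 for 4.0.1), which keeps the range arithmetic trivial.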
In addition to the listed needed actions:
On all nodes, do as root:
# yum update /<path>/sre.4.0.x-y.x86_64.rpm
# systemctl restart sre
Always check for differences between the following files with the diff command:
# diff /etc/kamailio/kamailio.cfg /opt/sre/etc/kamailio/kamailio.cfg
If any difference is observed, verify with Netaxis Support/R&D.