Changes:
- Introducing a a new config key
agent_status_timeout
and setting it to 30s by default. Until version 2.2.17 it was hardcoded to 5 seconds.
Fixes:
- Fixed a bug where pod logs were removed from ingestion because of a temporrary K8s API error and the agent was not able to add them again.
Changes:
- Several packages were upgraded to mitigate vulnerabilities - orjson==3.10.7, PyMySQL==1.1.1
- Unused root CA file scalyr_agent_ca_root was removed
- Alpine base image upgraded to 2.19.4
Fixes:
- Added safety check when handling pending addEvents task to handle the following error:
AttributeError: 'AddEventsTask' object has no attribute '_CopyingManagerWorkerSession__receive_response_status'
- Added missing zstandard library to alpine image
Changes:
- Fixed steady growth of memory usage of Syslog Monitor. syslog_processing_thread_count can be used to limit the number of threads and memory consumption.
- Added the max_allowed_checkpoint_age option to allow configuring the checkpoint age (default 15 minutes), useful for agents monitoring a large number of short-lived logs
- Added the Kubernetes monitor k8s_cri_query_filesystem_retain_not_found option to disable caching info about pods that are dead (default is true), useful for agents that monitor clusters whose pods are not cleaned up frequently
Features: Ability to ignore k8s labels (#1213). A user can now define glob filters for including and excluding container and controller labels in logs. For label to be included it needs to match one of the globs in k8s_label_include_globs and none of the globs in k8s_label_exclude_globs.
From documentation: | k8s_label_include_globs | Optional, (defaults to ['*']). Specifies a list of K8s labels to be added to logs. | | k8s_label_exclude_globs | Optional, (defaults to []]). Specifies a list of K8s labels to be ignored and not added to logs. |
Follows the same logic as label_include_globs and label_exclude_globs from the docker_monitor.
Changes: Support sending events to multiple Dataset accounts via K8s annotations. This feature was updated to use secrets from the same namespace as the pod and hierarchy of default secrets were added: default secret -> namespace secret -> pod secret -> container secret. See kubernetes_monitor.md for documentation.
Feature:
- Kubernetes Monitor - when K8s Pod API returnes 404, the logs are ingested based on SCALYR_K8S_INCLUDE_ALL_CONTAINERS regardless of pods annotations. Short-lived pods are usually affected.
Feature:
- Support sending events to multiple Dataset accounts via K8s annotations. See kubernetes_monitor.md for documentation.
Security fix:
- Alpine image upgraded to 3.19.1 to mitigate CVE-2023-7104 and CVE-2023-5363
Improvement:
- Windows Event Log Monitor - EventID field has been expanded into three fields. Until now EventID from the Windows Event Log was represented only by EventID field computed the following way win32api.MAKELONG(EventID, EventIDQualifiers). EventID and EventIDQualifiers now hold the values from the Windows Event Log without any modification and InstanceID is computed from EventID and EventIDQualifiers as described above.
Fix:
- Missing monitor field added to the event data sent by Syslog Monitor. The following field was added - ‘monitor’:'agentSyslog'.
Feature:
- Support for CRI multi lines - implementing support for the following format: https://github.com/kubernetes/design-proposals-archive/blob/main/node/kubelet-cri-logging.md
Fix:
- Python2 suport for Syslog monitor fixed
Security Fix:
- Alpine image upgraded from 3.18.13 to 3.18.14 to fix recently found vulnerabilities
Features:
- Added allow_http_monitors root configuration option. If true it forces monitors to use https instead of http where relevant. By default true for fips docker images, false otherwise.
Bug fixes:
- Tls certificate handling defect in 2.2.4 - #1199
- Python2 support fixed (broken with version 2.2.7)
- SCALYR_K8S_INCLUDE_ALL_CONTAINERS evironment flag fix - When the K8s Pod API returned 404, the logs were being imported without a label marking the pod as included.
- Base Ubuntu Fips image was upgraded to 2.0.32
- Syslog Monitor uses a thread pool to process requests
Windows:
- Support execution as any administrator account
Other:
- Add missing CA root certificates for FedRAMP environments (DigiCert Global CA root, DigiCert Global G2 CA root, GoDaddy CA root).
- Alpine base image upgraded to 3.8.13
Docker Images / Kubernetes:
- Kubernetes and Docker images switched from Debian to Ubuntu. Python version used in those images now matches Python in this distribution.
- Upgraded bundled Python dependencies to the latest stable versions to include all the latest security fixes (requests==2.28,1, pysnmp==4.4.12, docker==6.1.3)
Kubernetes:
- Fix a bug with Kubernetes Events monitor not emitting events for
CronJob
objects when using Kubernetes >= 1.25.0. The bug was related to the API endpoint being promoted fromv1beta
tostable
. The code has been updated to support both locations - old one (beta) and the new one (stable). - Update Kubernetes monitor to also log the value of
SCALYR_COMPRESSION_LEVEL
environment variable on start up. Contributed by @MichalKoziorowski-TomTom #1104.
Improvements:
- Support SSL connections to PostgreSQL
Kubernetes:
- Kubernetes ClusterRole resource definition now includes Argo Rollouts. The agent won't receive permission errors anymore when it interrogates Argo Rollout resources.
Other:
- Linux
deb
andrpm
packages are now shipped with its own Python interpreter, so agent does not have to rely on system Python interpreter anymore. Please refer to the RELEASE_NOTES document, for more information. - The PostgreSQL monitor now supports connecting to servers over SSL. Configure the monitor with the setting
use_ssl
set toTrue
to enable this.
Bug fixes:
- Fix bug in the Docker monitor which would, under some edge cases, cause a specific log messages to be logged very often and as such, exhausting the CPU cycles available to the Docker monitor.
Docker Images / Kubernetes:
- Docker images now are built against Python:3.8.16 image. Debian and Alpine based Docker images have also been updated to include all the latest security updates for the system packages.
Kubernetes:
- Support
Rollout
controllers from argo-rollouts as a source of server metadata with which to annotate events.
Kubernetes:
- Add
securityContext.allowPrivilegeEscalation: false
annotation to the Scalyr Agent DaemonSet container specification. - Fix bug that caused logging of the Kubernetes cache stats to agent status.
- More accurate docker.mem.usage metric (no longer includes filesystem cache).
- Monitor leader election algorithm now uses pods in the owning ReplicaSet or DaemonSet.
Windows:
- Add support for placeholder parameter (ie %%1234) replacement in event data.
Other
- Added support for
--debug
flag to the scalyr-agent-2 status command. When this flag is used, agent prints additional debug related information with the status output. NOTE: Right now it's only supported with the healthcheck option (--health_check
,-H
).
Kubernetes:
- Fix a bug / edge case in the Kubernetes caching PodProcessor code which could cause an agent to get stuck in an infinite loop when processing controllers which have a custom Kind which is not supported by the agent defined. Contributed by @xdvpser #998 #999.
- Allows user to configure Docker client library connection timeout and maximum connection pool size when using Docker container enumerator via new
k8s_docker_client_timeout
andk8s_docker_client_max_pool_size
config option.
Docker Images:
- Upgrade Linux Docker images to use Python 3.8.14.
- Upgrade docker-py dependency to v6.0.0.
- Upgrade orjson and yappi dependency to the latest stable version (3.8.0 and 1.3.6).
Windows:
- Upgrade agent Windows binary to use Python 3.10.7.
- Fix a bug where under some scenarios log record would not contain "message" attribute.
Other:
- Log how long it took (in milliseconds) to read data from socket when we encounter HTTP errors (e.g. connection aborted, connection reset, etc.).
- Log which version of docker-py and requests Python library is being used so it's easier to spot if bundled / vendored or system version is used.
Windows:
- The monitor field is set to
windows_event_log_monitor
in thewindows_event_log
monitor if thejson
config option is true. - Proper handling of forwarded windows events (ie handling of EvtOpenPublisherMetadata exceptions) in the
windows_event_log
monitor.
Windows:
- More sophisticated handling of array-based (event field) names in
windows_event_log
monitor.
Improvements:
- Add support for tracemalloc based memory profiler (
"memory_profiler": "tracemalloc"
config option). Keep in mind that tracemalloc is only available when running the agent under >= Python 3.5. - Add new
memory_profiler_max_items
config option which sets maximum number of items by memory usage reported by the memory profiler. - Add new
enable_cpu_profiling
andenable_memory_profiling
config option with which user can enable either CPU or memory profiler, or both. Existingenable_profiling
config behavior didn't change and setting it totrue
will enable both profilers (CPU and memory). - Allow user to specify additional trace filters (path globs) for tracemalloc memory profiler using
memory_profiler_ignore_path_globs
config option. (e.g.memory_profiler_ignore_path_globs: ["**/scalyr_agent/util.py", "**/scalyr_agent/json_lib/**"]
). - Update syslog_monitor to ignore errors when decoding data from bytes into unicode if data falls outside of the utf-8 range.
Bug fixes:
- Update windows_event_log_monitor to handle embedded double quotes in fields.
Improvements:
- Add option
stop_agent_on_failure
for each monitor's configuration. Iftrue
, the agent will stop if the monitor fails. For Kubernetes deployments this istrue
by default for the Kubernetes monitor (scalyr_agent.builtin_monitors.kubernetes_monitor
). The agent's pod will restart if the monitor fails.
Kubernetes Explorer:
- Update code to calculate per second rate for various metrics used by Kubernetes Explorer on the client (agent) side. This may result in slight CPU and memory usage increase when using Kubernetes Explorer functionality.
Windows:
- Add new
json
config option to thewindows_event_log
monitor. When this option is set to true, events are formatted as JSON.
Other:
- Changed log severity level from
ERROR
toWARNING
for non-fatal and temporary network client error. - Update agent packages to also bundle new LetsEncrypt CA root certificate (ISRG Root X2). Some of the environments use LetsEncrypt issued certificates.
- Update agent code base to log a warning with the server side SSL certificate in PEM format on SSL certificate validation failure for easier troubleshooting.
- Upgrade various bundled dependencies (orjson, docker).
Bug fixes:
- Set new
stop_agent_on_failure
monitor config option totrue
in the agent Docker image for Kubernetes deployments. Solves an issue, present in some rare edge cases, where the Kubernetes Monitor (scalyr_agent.builtin_monitors.kubernetes_monitor
) exits, but the agent continues to run. The agent's pod will restart if the monitor fails.
Windows:
- Fix bug in Windows System Metrics and Windows Process Metrics monitor where user wasn't able to override / change default sampling rating of 30 seconds (
sample_interval
monitor config option). - Update Windows Process Metrics monitor to log a message in case process with the specified pid / command line string is not found when retrieving process metrics.
- Update Windows Process Metrics monitor to throw an error in case invalid monitor configuration is specified (neither "pid" nor "commandline" config option is specified or both config options which are mutually exclusive are specified).
Bug fixes:
- Fix a bug with
import_vars
functionality which didn't work correctly when the same variable name prefix was used (e.g.SCALYR_FOO_TEST
,SCALYR_FOO
). - Fix a bug with handling the log file of the Kubernetes Event Monitor twice, which led to duplication in the agent's status.
- Fix a bug in scalyr-agent-2-config
--export-config
,import-config
options caused by Python 2 and 3 code incompatibility. - Fix a bug with the wrong executable
scalyr-agent-2-config
in Docker and Kubernetes, due to which it could not be used.
Docker images:
- Upgrade various dependencies: orjson, requests, zstandard, lz4, docker.
Other:
- Support for Python 2.6 has been dropped.
- Support for
ujson
JSON library (json_library
configuration option) has been removed in favor oforjson
. - Update agent log messages to include full name of the module which produced the message.
Windows:
- Update base Python interpreter for Windows MSI package from 3.8 to 3.10.
- Upgrade various bundled dependencies (pywin32, orjson, urllib3, six).
Bug fixes:
- Fix a regression introduced in v2.1.29 which would cause the agent to inadvertently skip connectivity check on startup.
- Default value for
check_remote_if_no_tty
config option isFalse
. Previously the changelog entry incorrectly stated it defaults toTrue
. This means that a connectivity check is not performed on startup if tty is not available. - Fix a bug in syslog monitor on Windows under Python 3 which would prevent TCP handler from working.
- Fixed agent checkpoint selection bug that could cause old log files to be re-uploaded.
- Small bug with command line argument parsing for the Agent. Agent raised unhandled exception instead of normal argparse error message when agent main command wasn't specified.
- Fix a bug in the Agent's custom JSON parser, which did not raise error on unexpected ending of the JSON document which might be caused by a JSON syntax error.
Docker images:
- Upgrade orjson dependency
Other:
- Monitor
emit_value()
method now correctly sanitizes / escapes metric field names which are "reserved" (logfile, metric, value, serverHost, instance, severity). This is done to prevent possible collisions with special / reserved metric event attribute names which could cause issues with some queries. Metric field names which are escaped get added_
suffix (e.g.metric
becomesmetric_
). - Upgrade dependency
requests
library to 2.25.1. - Failed docker container metric status requests from the docker client now logged as warnings instead of errors.
Kubernetes:
- Agent has been updated to periodically try to re-read Kubernetes authentication token value from
/var/run/secrets/kubernetes.io/serviceaccount/token
file on disk (every 5 minutes by default). This way agent also supports Kubernetes deployments where token files are periodically automatically refreshed / rotated (e.g. https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/#service-account-token-volume-projection). - Fix an edge case with handling empty
resourceVersion
value in the Kubernetes Events Monitor.
Docker images:
- Upgrade Python used by Docker images from 3.8.12 to 3.8.13.
Bug fixes:
- Fix minimum Python version detection in the deb/rpm package pre/post install script and make sure agent packages also support Python >= 3.10 (e.g. Ubuntu 22.04). Contributed by Arkadiusz Skalski (@askalski85).
Other:
- Update the code to log connection related errors which are retried and are not fatal under WARNING instead of ERROR log level.
- Update agent start up log message to use format which is supported by the agent message parser.
Docker images:
- Kubernetes Docker image (
scalyr-k8s-agent
) has been updated to ignore checkpoint data for ephemeral log files (anything matching/var/log/scalyr-agent-2/*.log
) on agent start up. Those log files are ephemeral (aka only available during container runtime) which means we don't want to re-use checkpoints for those log files across pod restarts (recreations). Previously, those checkpoints were preserved across restarts which meant that on subsequent pod restarts some of the early internal agent log messages produced by the agent during start up phase were not ingested.
Bug fixes:
- Kubernetes monitor now correctly dynamically detects pod metadata changes (e.g. annotations) when using containerd runtime. Previously metadata updates were not detected dynamically which meant agent needed to be restarted to pick up any metadata changes (such as Scalyr related annotations).
Improvements:
- Docker and Kubernetes monitors now support parsing date and time information from the container log lines with custom timezones (aka non UTC).
Docker images:
- Docker images now include udatetime time dependency which should speed up parsing date and time information from Docker container log lines.
- Upgrade zstandard and orjson dependency used by the Docker image.
Improvements:
- Add new
debug_level_logger_names
config option. With this config option user can specify a list of logger names for which to set debug log level for. Previously user could only set debug level for all the loggers. This comes handy when we want to set debug level for a single of subset of loggers (e.g. for a single monitor). - If profiling is enabled (
enable_profiling
configuration option), memory profiling data will be ingested by default (CPU profiling data was already being ingested when profiling was enabled, but not memory one).
Docker images:
- Upgrade Python used by Docker images from 3.8.10 to 3.8.12.
- Base distribution version for non slim Alpine Linux based images has been upgraded from Debian Buster (10) to Debian Bullseye (11).
Bug fixes:
- Fix a bug with the URL monitor not working correctly with POST methods when
request_data
was specified under Python 3.
Improvements:
- Add
collect_replica_metrics
config option to the MySQL monitor. When this option is set to False (for backward compatibility reasons it defaults to True) we don't try to collect replica metrics and as such, user which is used to connect to the database only needs PROCESS permissions and nothing else. - Add support for specifying new
server_url
config option for each session worker. This allows user to configure different session workers to use different server urls (e.g. some workers send data to agent.scalyr.com and other send data to eu.scalyr.com).
Other:
- Update agent to also log response
message
field value when we receiveerror/client/badParam
status code from the API to make troubleshooting easier.
Docker Images:
- Docker images now utilize Python 3.8 (in the past they used Python 2.7).
- Docker images are now also produced and pushed to the registry for the
linux/arm64
andlinux/arm/v7
architecture. - Docker image now includes
pympler
dependency by default. This means memory profiling can be enabled via the agent configuration option without the need to modify and re-build the Docker image. ujson
dependency has been removed from the Docker image in favor oforjson
which is more performant and now used by default.- Alpine based Docker images which are 50% small than regular Debian bullseye-slim based ones are now available. Alpine based images contains
-alpine
tag name suffix.
Other:
- Added a LetsEncrypt root certificate to the Agent's included certificate bundle.
- Update Kubernetes monitor to log a message under INFO log level that some container level metrics won't be available when detected runtime is
containerd
and notdocker
and container level metrics reporting is enabled.
Bug fixes:
- Fix plaintext mode in the Graphite monitor.
- Update Linux system metric monitor so it doesn't print an error and ignores
/proc/net/netstat
lines we don't support metrics for.
Improvements:
- Implemented optional line merging when running in Docker based Kubernetes to work around Docker's own 16KB line length limit. Use the
merge_json_parsed_lines
config option orSCALYR_MERGE_JSON_PARSED_LINES
environment variable to enable this functionality.
Bug fixes:
- Update agent to throw more user-friendly error when we fail to parse fragment file due to invalid content.
Bug fixes:
- Fix docker container builder scripts to only use
buildx
if it is available. - Fix memory leak in the Syslog monitor which is caused by a bug in standard TCP/UDP servers in Python 3 (https://bugs.python.org/issue37193). The version of Python for Windows was updated to 3.8.10. For Linux users, who run agent on Python 3 and use Syslog monitor, it is also recommended to check if their Python 3 installation is up to date and has appropriate bug fixes.
- Fix bug in the copying manager which works with monitor (such as Docker or Syslog monitor). This bug might cause re-uploading of the old log messages in some edge cases.
Bug fixes:
- Don't log "skipping copying log lines" messages in case number of last produced bytes is 0.
- Fix Kubernetes Agent DaemonSet liveness probe timeout too short and unhealthy agent pod not restarting when a liveness probe timeout occurs.
Other:
- Update Windows Event Log monitor to emit a warning if
maximum_records_per_source
config option is set to a non-default value when using a new event API where that config option has no affect.
Improvements:
- Allow journald monitor to be configured via globs on multiple fields via
journald_logs.journald_globs
config option. Contributed by @imron. #741
Bug fixes:
- Fix an issue where log lines may be duplicated or lost in the Kubernetes monitor when running under CRI with an unstable connection to the K8s API.
- Fix an issue where LogFileIterator during the copy-truncate process picks wrong pending file with a similar name causing loss of the log and showing negative bytes in agent status.
Other:
- The discovery logic of the log files, which have been rotated with copy truncate method, now can handle only default rogrotate configurations.
- Agent now emits a warning if running under Python 2.6 which we will stop supporting in the next release.
Improvements:
- Add support for collecting some metrics Kubernetes when running in a CRI runtime.
- Add support for SSL in the
mysql_monitor
. This can be enabled by settinguse_ssl
to True in the monitor configuration.
Bug fixes:
- Ensure pod digest which we calculate and use to determine if pod info in the Kubernetes monitor has changed is deterministic and doesn't depend on dictionary item ordering.
- Fix a race condition in the worker session checkpoint read/write logic, which was introduced with the
multi-worker
feature.
Other:
- Changed the logging level of "not found" errors while querying pods from the Kubernetes API from ERROR to DEBUG, as these errors are always transient and result in no data loss.
Improvements:
- Add
Endpoint
to the defaultevent_object_filter
values for the Kubernetes Event Monitor. - Update
scalyr-agent-2 status -v
output to also include process id (pid) of the main agent process and children worker processes in scenarios where the agent is configured with multiple worker processes. - Add a new experimental
--systemd-managed
flag to thescalyr-agent-2-config
tool which converts existing scalyr agent installation to be managed by systemd instead of init.d. - Add support for capturing k8s container names to the kubernetes monitor via log line attributes. Container name can be captured using
${k8s_container_name}
config template syntax. - Kubernetes monitor has been updated to log 403 errors when connecting to the Kubernetes API before falling back to the old fallback URL under warning log level. Previously those errors were logged under debug log level which would make it hard to troubleshoot some issues related to missing permissions, etc. since those log messages would only end up in debug log file which is disabled by default.
Bug fixes:
- Fix
scalyr-agent-status -v
to not emit / print warnings under some edge cases. - Fix regression with syslog monitor which caused the ingestion of the file with a wrong parser. The root cause was in the too broad path glob that was used in the agent log's
LogMatcher
, so it also processed theagent_syslog.log
as an agent log file.
Improvements:
- Add new
tcp_request_parser
andtcp_message_delimiter
config option to thesyslog_monitor
. Valid values fortcp_request_parser
includedefault
andbatch
. New TCP recv batch oriented request parser is much more efficient than the default one and should be a preferred choice in most situations. For backward compatibility reasons, the default parser hasn't been changed yet. - Update agent to emit a warning if
k8s_logs
config option is defined, but Kubernetes monitor is not enabled / configured. - Update Kubernetes and Docker monitor to not propagate and show some non-fatal errors.
- Field values in log lines for monitor metrics which contain extra fields are now sorted in alphabetic order. This should have no impact on the end user since server side parsers already support arbitrary ordering, but it's done to ensure consistent ordering and output for for monitor log lines.
- Update agent to throw more user-friendly exceptions on Windows when the agent doesn't have access to the agent config file and Windows registry.
- Update code which periodically prints out useful configuration settings to also include actual value of the json library used in case the config option is set to "auto".
Bug fixes:
- Fix a race condition in
docker_monitor
which could cause the monitor to throw exception on start up. - Fix a config deprecated options bug when they are set to
false
. - Fix agent so it doesn't throw an exception on Windows when trying to escalate permissions on agent start.
- Make sure we only print the value of
win32_max_open_fds
config option on Windows if it has changed. - Fix a bug which could result in docker, journald and windows event log checkpoint files to be deleted when restarting the agent. This would only affect docker monitor configurations which are setup to ingest logs via Docker API and not json log files on disk (aka
docker_raw_logs: false
docker monitor option is set). - Fix a small bug which might skip first lines of the worker session log file.
Other:
- Pre-built frozen Windows binaries now bundle and utilize Python 3 (3.8.7). In the past releases, they utilized Python 2.7, but due to the Python 2.7 being EOL for a year now, new release will utilize Python 3. This change should be opaque to the end user and everything should continue to work as it did in the previous releases.
Bug fixes:
- Fix syslog monitor default TCP message parser bug which was inadvertently introduced in 2.1.16 which may sometimes cause for a delayed ingest of some syslog data. This would only affect installations utilizing syslog monitor in TCP mode using the default message parser. As part of this fix, we have removed the new message parsers added in 2.1.16. If you were using them, you will revert to the default parser
Features:
- Add copy truncate log rotation support. This is enabled by default. It does not support copy truncate with compression unless the
delaycompress
option is used. This feature can be disabled by settingenable_copy_truncate_log_rotation_support
to false.
Improvements:
- Add new
tcp_request_parser
andtcp_message_delimiter
config option. Valid values fortcp_request_parser
includedefault
andbatch
. New TCP recv batch oriented request parser is much more efficient than the default one and should be a preferred choice in most situations. For backward compatibility reasons, the default parser hasn't been changed yet. shell_monitor
now outputs two additional metrics during each sample gather interval -duration
andexit_code
. First one represents how many seconds it took to execute the shell command / script and the second one represents that script exit (status).
Misc:
- On startup and when parsing a config file, agent now emits a warning if the config file is readable by others.
- Add the config option
enable_worker_process_metrics_gather
to enable 'linux_process_metrics' monitor for each multiprocess worker. - Each session, which runs in a separate process, periodically writes its stats in the log file. The interval between writes can be changed by using the
default_worker_session_status_message_interval
- Rename some of the configuration parameters:
use_miltiprocess_copying_workers
touse_multiprocess_workers
,default_workers_per_api_key
todefault_sessions_per_api_key
. Previous option names are preserved for the backward compatibility but they are marked as deprecated. NOTE: The appropriate environment variable names are changed too. - Update docker monitor so we don't log some non-fatal errors under warning log level when consuming logs using Docker API.
- Add support for
compression_type: none
config option which completely disables compression for outgoing requests. Right now one of the main bottle necks in the high volume scenarios in the agent is compression operation. Disabling it can, in some scenarios, lead to large increase to the overall throughput (up to 2x). Disabling the compression will in most cases result in larger data egress traffic which may incur additional charges on your infrastructure provider so this option should never be set tonone
unless explicitly advised by the technical support. - Linux system metrics monitor has been updated to also ignore
/var/lib/docker/*
and/snap/*
mount points by default. Capturing metrics for those mount points usually offers no additional insight to the end user. For information on how to change the ignore list via configuration option, please see RELEASE_NOTES. - The agent install bash script now adds the Scalyr repositories directly without installing the
scalyr-repo
packages. This also eliminates errors caused by re-acquiring the package manager's lock file during the pre/post install/uninstall scripts. The issue occurred in bothapt
andrpm
package managers.
Security fixes and improvements:
- Agent installation artifacts have been updated so the default
agent.json
file which is bundled with the agent is not readable by "other" system users by default anymore. For more context, details and impact, please see RELEASE_NOTES.
Feature:
- Ability to upload logs to different Scalyr team accounts by specifying different API keys for different log files. See RELEASE_NOTES for more details.
- New configuration option
default_workers_per_api_key
which creates more than one session with the Scalyr servers to increase upload throughput. This may be set using theSCALYR_DEFAULT_WORKERS_PER_API_KEY
environment variable. - New configuration option
use_multiprocess_copying_workers
which uses separate processes for each upload session, thereby providing more CPU resources to the agent. This may be set using theSCALYR_USE_MULTIPROCESS_COPYING_WORKERS
environment variable. Improvements: - Linux system metrics monitor now ignores the following special mounts points by default:
/sys/*
,/dev*
,/run*
. If you want still capturedf.*
metrics for those mount points, please refer to RELEASE_NOTES. - Update
url_monitor
so it sends correctUser-Agent
header which identifies requests are originating from the agent.
Misc:
- The default value for the
k8s_cri_query_filesystem
Kubernetes monitor config option (set via theSCALYR_K8S_CRI_QUERY_FILESYSTEM
environment var) has changed toTrue
. This means that by default when in CRI mode, the monitor will only query the filesystem for the list of active containers, rather than first querying the Kubelet API. If you wish to revert to the original default to prefer using the Kubelet API, setSCALYR_K8S_CRI_QUERY_FILESYSTEM
the environment variable to "false" for the Scalyr Agent daemonset. - New
global_monitor_sample_interval_enable_jitter
config option has been added which is enabled by default. When this option is enabled, random sleep between 2/10 and 8/10 of the configured monitor sample gather interval is used before gathering the sample for the first time. This ensures that sample gathering for all the monitors doesn't run at the same time. This comes in handy when running agent configured with many monitors on lower powered devices to spread the monitor sample gathering related load spike across a longer time frame.
Bug fixes:
- Fix to make sure we don't expect a valid Docker socket when running Kubernetes monitor in CRI mode. This fixes an issue preventing the K8s monitor from running in CRI mode if Docker is not available.
- Fix line grouping code and make sure we don't throw if line data contains bad or partial unicode escape sequence.
- Fix
scalyr_agent/run_monitor.py
script so it also works correctly out of the box when using source code installation. - Update Windows System Metrics monitor to better handle a situation when disk io counters are not available.
- Docker monitor has been fixed that when running in "API mode" (
docker_raw_logs: false
) it also correctly ingests logs from containerstderr
. Previously only logs fromstdout
have been ingested.
Features:
- Add new
initial_stopped_container_collection_window
configuration option to the Kubernetes monitor, which can be configured by setting theSCALY_INITIAL_STOPPED_CONTAINER_COLLECTION_WINDOW
environment variable. By default, the Scalyr Agent does not collect the logs from any pods stopped before the agent was started. To override this, set this parameter to the number of seconds the agent will look in the past (before it was started). It will collect logs for any pods that was started and stopped during this window. This can be useful in autoscaling environments to ensure all pod logs are captured since node creation, even if the Scalyr Agent daemonset starts just after other pods.
Improvements:
- Improve logging in the Kubernetes monitor.
- On agent start up we now also log the locale (language code and encoding) used by the agent process. This will make it easier to troubleshoot issues which are related to the agent process not using UTF-8 coding.
- Default value for
tcp_buffer_size
Syslog monitor config option has been increased from 2048 to 8192 bytes. - New
message_size_can_exceed_tcp_buffer
config option has been added to Syslog monitor. When set to True, monitor will support messages which are larger thantcp_buffer_size
bytes in size andtcp_buffer_size
config option will tell how much bytes we try to read from the socket at once / in a single recv() call. For backward compatibility reasons, it defaults to False.
Bug fixes:
- Fix a bug / race-condition in Docker monitor which could cause, under some scenarios, when monitoring containers running on the same host, logs to stop being ingested after the container restart. There was a relatively short time window when this could happen and it was more likely to affect containers which take longer to stop / start.
- Update code for all the monitors to correctly use UTC timezone everywhere. Previously some of the code incorrectly used local server time instead of UTC. This means some of those monitors could exhibit incorrect / undefined behavior when running the agent on a server which has local time set to something else than UTC.
- Fix
docker_raw_logs: false
functionality in the Docker monitor which has been broken for a while now. - Update Windows System Metrics monitor to better handle a situation when disk io counters are not available.
Bug fixes:
- Fix
scalyr-agent-2 status
command non-fatal error when running status command multiple times concurrently or in a short time frame. - Fix
scalyr-agent-status
command to not log config override warning to stdout since it may interfere with consumers of the status command output. - Fix merging of active-checkpoints.json and checkpoints.json checkpoint file data. Previously data from active checkpoints file was not correctly merged into full checkpoint data file which means that under some scenarios (e.g. agent crashed after active checkpoint file was written, but before full checkpoint file was written), data which was already sent to the server could be sent twice. Actual time window when this could happen was relatively small since full checkpoint data is written out every 60 seconds by default. Reported by @anton-ryzhov. #638
- Fix Postgres monitor error when specifying the Postgres
database_port
in the agent config.
Mics:
- Upgrade
psutil
dependency which incorporates many critical fixes. As part of the change, Windows Server 2003/XP is no longer supported. - Small fix for the
pywin32
library which is used in the Windows package.
Features:
- Add new
win32_max_open_fds
configuration option which allows user to overwrite maximum open file limit on Windows for the scalyr agent process.
Bug fixes:
- Fix bug in packaging which would cause agent to sometimes crash on Windows when using windows event log monitor.
Bug fixes:
- Fix formatting of the "Health Check:" line in
scalyr-agent-2 status -v
command output and make sure the value is left padded and consistent with other lines. - Fix reporting of "Last successful communication with Scalyr" line value in the ``scalyr-agent-2 status -v` command output if we never successfuly establish connection with the Scalyr API.
- Fix a regression in
scalyr-agent-2-config --upgrade-windows
functionality which would sometimes throw an exception, depending on the configuration values.
Security fixes and improvments:
- Fix a bug with the agent not correctly validating that the hostname which is stored inside the certificate returned by the server matches the one the agent is trying to connect to (
scalyr_config
option). This would open up a possibility for MITM attack in case the attacker was able to spoof or control the DNS. - Fix a bug with the agent not correctly validating the server certificate and hostname when using
scalyr-agent-2-config --upgrade-windows
functionality under Python < 2.7.9. This would open up a possibility for MITM attack in case the attacker was able to spoof or control the DNS. - When connecting to the Scalyr API, agent now explicitly requests TLS v1.2 and aborts connection if the server doesn't support it or tries to use an older version. Recently Scalyr API deprecated support for TLS v1.1 which allows us to implement this change which makes the agent more robust against potential downgrade attacks. Due to lack of required functionality in older Python versions, this is only true when running the agent under Python >= 2.7.9.
- When connecting to the Scalyr API, server now sends a SNI header which matches the host specified in the agent config. Due to lack of required functionality in older Python versions, this is only true when running the agent under Python >= 2.7.9.
Bug fixes:
- Fixed a regression in Scalyr Windows Agent cmdlet script (
ScalyrShell.cmd
) which prevents the agent from starting.
Features:
- The
status -v
command now contains health check information, and will have a return code of2
if the health check has failed. New optional flag for thestatus
CLI command-H
returns a short status with only health check info. A new configuration featurehealthy_max_time_since_last_copy_attempt
defines how many seconds is acceptable for the Agent to not attempt to send up logs before the health check should fail, defaulting to60.0
. For more information, please refer to the release notes document. - Kubernetes yaml has been updated to include a liveliness check based on the new health check info, which will cause a pod restart if the agent is considered unhealthy.
Bug fixes:
- Fixed race condition in pipelined requests which could lead to duplicate log upload, especially for systems with a large number of inactive log files. Log files would be reuploaded from their start over short period of time (seconds to minutes). This bug is triggered when pipelining is enabled, either by explicitly setting the
pipeline_threshold
config option or by using a Scalyr Agent release >= 2.1.6 (pipelining was turned on by default in 2.1.6). - Fixed the misconfiguration in Windows packager which causes some number of the monitors to not be included in Windows version. This generates import errors when attempting to use monitors like the syslog or shell monitor. Misc:
compression_level
configuration option now defaults to6
when usingdeflate
compression_type
(deflate
is the default value for thecompression_type
configuration option). 6 offers the best trade off between compression ratio and CPU usage. For more information, please refer to the release notes document.
Features:
- New configuration feature
k8s_logs
allows configuring of Kubernetes logs similarly to thelogs
configuration but matches based on Kubernetes pod, namespace, and container name. Please see the RELEASE_NOTES for more details.
Bug fixes:
- Fixed race condition that sometimes resulted in duplicated K8s logs being uploaded on agent restart or configuration update.
Misc:
- The Windows package is now built using
pyInstaller
instead ofpy2exe
. As part of the change, we are no longer supporting 32-bit Windows systems. Nothing else should change due move topy2Installer
.
Features:
- New configuration option
max_send_rate_enforcement
allows setting a limit on the rate at which the Agent will upload log bytes to Scalyr. You may wish to set this if you are worried about bursts of log data from problematic files and want to avoid getting charged for these bursts. - New default overrides for a number of configuration parameters that will result in a higher throughput for the Agent. If you were relying on the lower throughput as a makeshift rate limiter we recommend setting the new
max_send_rate_enforcement
configuration option to an acceptable rate or "legacy" to maintain the current behavior. See the RELEASE_NOTES for more details.
Minor updates:
- Default value for
max_line_size
has been raised to 49900. If you have this value in your configuration you may wish to not set it anymore to use the new default.
Bug fixes:
- Fixed Syslog monitor issue causing monitor to write binary strings to the log file.
Bug fix
- Fixed issue causing the Windows version of the Scalyr Agent to not start when starting in forked mode.
Critical bug fix
-
Updated bundled certificates used to verify Scalyr TLS certificate. Existing bundled certificates had expired causing some customers to not be able to verify the TLS connection.
-
Added ability to specify an extra configuration snippet directory (in addition to
agent.d
). You may specify the extra directory by setting the SCALYR_EXTRA_CONFIG_DIR environment variable to the path of your desired directory. This has been added mainly to help mounting additional configuration in Kubernetes.
Bug fix
- Fix timing issue in
kubernetes_monitor
that could result in not collecting logs from very short lived containers.
Features
- The Kubernetes and Kubernetes event monitors now allow you to restrict the namespaces for which the Scalyr Agent will collect logs and events. This may be controlled via the
SCALYR_K8S_INCLUDE_NAMESPACES
environment variable ork8s_include_namespaces
configuration option. See release notes for more information.
Bugs
- Fixed Kubernetes monitor to verify by default TLS connections made to the local Kubelet. This was causing an excessive amount of warnings to be emitted to the stdout. See release notes for details on how to disable verification if this causes issues.
- Fixed issue preventing the Apache monitor from running under Python 2.
- Fixed some issues causing some logging to fail due to incorrect arguments.
Features
- The Agent no longer enforces all events must have monotonically increasing timestamps when pushed into Scalyr. The is now possible due to changes in the server. This allow better use of the default timestamps, such as in Docker and Kubernetes.
- Log lines collected via the Kubernetes or Docker monitors will now have their default timestamps set to the timestamp generated by K8s/Docker. As long as a parser does not override the timestamp, the K8s and Docker timestamps will be used in Scalyr.
- Agent logs with severity higher than INFO will now be output to stdout as well as the
agent.log
file when running with the--no-fork
flag. - The Journald monitor now supports a
detect_escaped_strings
option which automatically detects if thedetails
field is already an escaped string, and if so, just emits it directly to the log instead of re-escaping it. This option is specified using thejournald_logs
array. See the journald monitor documentation for more details.
Bugs
- Fixed monitor configuration bug that prevented config options that did not have a default nor were required from being properly set via environment variables.
Optimizations
- Optimize RFC3339 date strings parsing. This should result in better throughput under highly loaded scenarios (many lines per second) when using Docker / Kubernetes monitor.
- Speed up event serialization under highly loaded scenarios by optimizing json encoding and encoding of event attributes.
- We now default to
orjson
JSON library under Python 3 (if the library is available).orjson
is substantially faster thanujson
for encoding.
Features
- Major update of code base to support running under Python 2 and 3. See release notes for more information.
- Agent now supports Python 2.6, 2.7 and >= 3.5
- When installing using RPM or Debian, Python 2 will be used by default unless unavailable.
- The Python version used to run the Agent may be controlled using the
scalyr-switch-python
command. - The K8s manifest files have been changed to run the Agent in the
scalyr
namespace instead of thedefault
namespace. When updating an existing Scalyr Agent K8s instance, you must follow the upgrade instructions described in the upgrade notes. - RPM and Debian packages no longer declare dependency on Python to promote cross-distribution compatibility. The dependency is now verified at package install time.
- Added option to
scalyr-agent-2 status -v
to emit JSON (--format=[text|json]
). - Add new
metric_name_blacklist
supported attribute to each monitor configuration section. With this attribute, user can define a list of metric names which should be excluded and not shipped to Scalyr. - Add support for
orjson
JSON library when running under Python 3. This library offers significantly better performance and can be enabled by settingjson_library
config option toorjson
and installingorjson
Python package using pip.
Bugs
- Fix authentication issue in
kubernetes_monitor
when accessing kublet API. - Better error messages issued when missing required certificate files.
- Updated Kubernetes manifest file to use
apps/v1
for DaemonSet API version instead of beta version. - Update scalyr client code to log raw uncompressed body under debug log level to aid with troubleshooting.
- Metric type for
app.disk.requests.{read,write}
metrics has been fixed. - Fix
iostat
monitor so it also works with newer versions of Linux kernel. - Fix invalid extra field for two
tomcat.runtime.threads
metrics in the Tomcat monitor (all of the metrics had type set tomax
whereas one should have type set toactive
and the other tobusy
).
Minor updates
- The
/etc/init.d/scalyr-agent-2
symlink is no longer marked as a conf file in Debian packages. - Update of embedded ecsda library to 0.13.3
- Docker support now requires the docker 4.1 client library
- Changed which signal is used to execute
scalyr-agent-2 status -v
under Linux to improve handling of SIGINT. PreviouslySIGINT
was used, nowSIGUSR1
is used. - When running in foreground mode (
--no-fork
flag), SIGINT signal (aka CTRL+C) now starts the graceful shutdown procedure. - Two new metrics (
app.io.wait
,app.mem.majflt
) are now emitted by the Linux process monitor. If you want those metrics to be excluded for your monitors, you can utilize newmetric_name_blacklist
monitor config option.
Testing updates
- Numerous changes to improve testing and coverage reporting
Feature
- Improved support for handling long log lines captured by
kubernetes_monitor
anddocker_monitor
. When parsing docker logs, the serialized JSON objects can now be up tointernal_parse_max_line_size
bytes (defaults to 64KB). This replaces the requirement they be less thanmax_line_size
. Thelog
field extracted from the JSON object is still subjected to themax_line_size
limit. - Improved handling when
kubernetes_monitor
fails due to an uninitialized K8s cache. If akubernetes_monitor
becomes stuck for 2 minutes and cannot initialize, the agent will terminate its container to allow for K8s to restart it.
Bugs
- Add additional diagnostic information to help customers troubleshoot issues in the
kubernetes_monitor
due to failures in K8s cache initialization. - Fix bug due to protected access in the
journald_monitor
.
Packaged by Steven Czerwinski [email protected] on Jan 23, 2020 10:30 -0800
Bugs
- Fixed error in Windows release caused by missing
glob2
module - Fixed removal of
stream
attribute from logs uploaded using Docker and K8s monitors
Bugs
- Fixed error in
syslog-monitor
causing monitor to stop working withoutdocker-py
installed on host system
Features
- Added support to
journald
plugin to create separate log files (with distinct configuration such as parsers) based onunit
field. - Add supported for
**
in file path glob patterns recursively drill down into sub-directories. - Changed to new API format that allows for more efficient encoding of log file attributes
Bugs
- Fixed bug causing SSL connection failures on certain Windows configurations.
Features
- Added TSLite support to allow customers using Python 2.6 to connect via TLSv1.1
Bugs
- Fix issue causing log copying slow down when there is a high churn in the log files it is copying (typically K8s or Docker with lots of containers coming up down)
Miscellaneous changes
- No longer fail startup if cannot connect successfully to server
- Change default of
check_remote_if_no_tty
config option toTrue
- Test cleanups
Features
- Collection of per CPU metric data under docker/kubernetes has been placed behind the config option
docker_percpu_metrics
(defaulted toFalse
) to prevent collecting excessive metrics on machines with a large number of cores.
Miscellaneous changes
- Added scripts for bringing up/tearing down large clusters for testing
- Fixed problem with smoketest scripts
Features
- K8s serverHosts are no longer displayed in Scalyr UI Log Sources (thus avoiding situations where hundreds of serverHost entries pollute the Log Sources overview).
Bugs
- MySQL monitor plugin fails to connect to Amazon RDS instances. Note: the fix requires python 2.7 so customers running older python versions will no longer be able to run the MySQL monitor from this release onward.
Features
- Major refactor of kubernetes_monitor (rate-limit k8s api master calls and avoid large api master queries). Tested on a 1000-node cluster.
Bugs
- You can now set config params to empty string, thus allowing overriding of non-null default values (e.g. set
k8s_ignore_namespaces
to empty list instead of defaultkube-system
). k8s_ignore_namespaces
is now a comma-separated array-of-strings (legacy space-separated behavior is still supported).- JSON logs were causing
total_bytes_skipped
to incorrectly report as always increasing in agent.logagent_status
line. - Python 2.5 sometimes fails to launch because
httplib.HTTPConnection._tunnel_host
is not present.
Miscellaneous changes
- Emit log message indicating whether kubernetes_monitor is using docker socket or cri filesystem to query running containers.
- If an error is encountered while parsing JSON logs, do not log the offending JSON line in agent.log.
- Better reporting of stack trace when exception occurs in kubernetes & docker monitor _get_container() loops.
Features
- Default
docker_raw_logs
to True. - Default
compression_type
tobz2
(previously compression was off). - Add
SCALYR_SERVER
to sample config map and docker readme.
Bugs
- Fix custom json lib handling of negative integers.
- Fix bug to allow agent to inherits environment umask.
- Restore default value of
report_k8s_metrics
to True.
Features
- Add
container_globs_exclude
option to docker monitor for blacklisting containers by name. - Many
docker_monitor
plugin config variables are now environment-aware. - The global
compression_level
config variable is now environment-aware. - Add SCALYR_K8S_EVENTS_DISABLE in addition to K8S_EVENTS_DISABLE, but either will work.
- Allow tilde (~) character in
linux_system_metrics
tags (e.g. mounted file names may contain it). - Incorporate profiler into agent for runtime performance analysis.
- Add random skew to k8s api queries to lessen peak load on k8s api masters.
Bugs
- Fix Dockerfile.custom_k8s_config to use correct image
- Fix bug in Windows upgrade utility
- Fix attribute error in k8s controller object
Miscellaneous changes
scalyr-agent-2.yaml
now exports SCALYR_K8S_EVENTS_DISABLE instead of K8S_EVENTS_DISABLE
Bugs
- Fix memory leak in linux process metrics introduced in v2.0.47 which manifests as high CPU or memory use.
Features
- ContainerD support
- Reduced Agent docker image sizes (using Alpine)
Bugs
- Fix bug where if hashing is used with redaction, non-matching lines are not uploaded
- Fixed code bugs causing Agent not to start in versions of Python < 2.7
- Fixed github issue #180
- Updated docker/README.md
Features
- Expanded mapping of config vars to top-level and k8s environment variables in code (not
import_vars
). - Expanded env vars enable agent configuration via single k8s configMap (imported with
envFrom
). - Config vars such as
container_globs
and logsexclude
globs are now assigned a type (comma-separated strings). - Display Scalyr-related environment variables in
agent status -v
. - Include Docker versioning information when reporting to Scalyr.
Bugs
- Fix issue preventing k8s agent from recognizing new pods.
Features
- Publish image supporting Docker logs via
json
driver instead ofsyslog
- Include K8s versioning information when reporting to Scalyr
Bugs
- Remove Python 2.7 specific uses for
format
- Fix parser not being respected when set in log file attributes
Features
- Add configuration-via-labels support for the
docker_monitor
. - Record
eventId
inwindows_events_monitor
.
Bugs
- Fix issue preventing collection of remote events in
windows_events_monitor
. - Fix incorrect metrics collection in
postgres_monitor
due to not properly reseting cursor. - Eliminate redudant K8s cache in the agent running on the K8s event leader.
Features
- Release of the
kubernetes_events_monitor
plugin with collects Kubernetes Events and pushes them into Scalyr. See the monitor documentation for more information.
Bugs
- Fix issue preventing log configs with glob patterns to override exact match patterns. This is required to help override redaction and parsers rules for Docker.
- Fix issue in mysql_monitor preventing error message from being logged
Features
- Add in v2 K8s support. You must configure a k8s cluster name to start using the new v2 support. See Kubernetes Agent Install directions.
Bugs
- Only allow non-secure connections to Scalyr if explicit override option is set.
- Redesign of k8s cache to better handle large number of pods
- Allow the k8s api server address to be set via a configuration option.
- Parallelized container metric fetches to avoid not being able to collect metrics in time.
Bugs
- Fix bad class name for wrapped responses in Docker and Kubernetes monitor.
- Fix high CPU consumption in Kubernetes monitor due to encoding detection.
- Add defensive code to Kubernetes cache to prevent cache death spiral.
Bugs
- Fix urgent incompatibility issue with python version < 2.7 in call to string decoding.
Bugs
- Adjusted unicode decoding to be more tolerant of bad input.
- Added support for optimized json parsing library. Changed k8s agent to use optimized version.
Features
- Added ability to specify default values for environment variables imported in configuration files.
Bugs
- Added temporary hack to new Kubernetes support to upload Daemonsets as Deployments.
Features
- Finalized new support for Kubernetes. Not turned on by default yet. Will be generally available in November.
Bugs
- Fix journald monitor to system default path.
Bug fixes
- Remove / rework debug log statements that were consuming too much CPU.
- Add ability to disable the Kubernetes monitor from parsing the json log lines on the client
- Do not write out the full checkpoint.json file every iteration. Instead, write out only the active log files on every iteration with a consolidation every one minute.
Features:
- Added support in the
kubernetes_monitor
for annotations-based configuration. You can now specify parsers and whether or not a pod's logs can be sent to Scalyr via k8s annotations. See documentation onkubernetes_monitor
for more details.
Bug fixes:
- Added fixes that should eliminate memory leak.
Features:
- Added
journald_monitor
for collecting logs from a localjournald
service. Requires installation of the Pythonsystemd
library. - Added
garbage_monitor
for debugging memory consumption issues in the agent. - Added
verify_k8s_api_queries
option to thekubernetes_monitor
to allow disabling SSL verification on local API calls.
Bug fixes:
- Fixed issues causing the
linux_process_metrics_monitor
to throw exceptions when the monitored process dies. - Additional debug flags to aide in investigating memory issues with the agent.
Features:
- Relaxed configuration rules to allow for any config option to be specified in a configuration fragment in the
agent.d
directory.
Bug fixes:
- Fixed bug in
linux_process_metrics
that would generate errors when a monitored process died - Fixed issue in the docker and kubernetes monitors that would prevent the agent from sending logs from short lived containers
- Fixed issue in the docker and kubernetes monitors that could cause the tail end of a containers logs to not be uploaded after it exits
- Fixed issue in docker monitor where the
metrics_only
config option was not being obeyed. - Added various options to help investigate memory leak
Bug fixes:
- Added various options to help investigate memory leak
Features:
- Added kubernetes support with introduction of new kubernetes monitor along with kubernete image published on dockerhub.com under scalyr/scalyr-k8s-agent. See online documentation for more information.
- Added
network_interface_suffix
option tolinux_system_metrics
monitor to control regular expression used to validate interfaces names. Additionally, widened existing rule to accept letters at the end of the interface name. - Added
aggregate_multiple_processes
tolinux_process_metrics
which allows multiple processes to be matched by process matching rule. The reported metrics will include the statistics from all matching processed. - Added
include_child_processes
tolinux_process_metrics
to include all child processes from the matched process in the reported metrics.
Features:
- Changed default for
max_line_size
to 9900 to match new largermessage
field support on the server.
Bug fixes:
- Fixed bug causing the Docker plugin to
leak
file descriptors as new containers were added while using thedocker_api
mode.
Features:
- Support to parse log lines written as a JSON object to extract line content before sending to Scalyr. You may turn this on by setting
parse_lines_as_json
totrue
in your log configuration stanza for a particular file. This is useful when uploading raw Docker logs. - Read the Scalyr api key from an environment variable
- Add support for
PUT
requests to thehttp_monitor
. - Add support for replacing redacted values with a hash of the actual value. See the redaction documentation for more details.
Bug fixes:
- Fix win32 build such that it correctly pulls packages from the third party repository. This fixes a bug resulting in SOCK5 proxy support not working under Windows.
New features:
- Upgraded
requests
library to support new proxy protocols such asSOCKS5
. To useSOCKS5
, use either thesocks5
orsocks5h
protocol when specifying the proxy address (socks5
resolves DNS addresses locally, whereassocks5h
resolves addresses at theSOCKS5
server).
Bug fixes:
- Fix bug preventing multiple instances of the
syslog_monitor
from logging to separate log files.
New features:
- Added new option
report_container_metrics
todocker_monitor
to allow disabling of gathering and reporting metrics for each Docker container.
Bug fixes:
- Fix recent breakage causing the
run_monitor
code to fail. - Fix error / warning logging in
syslog_monitor
to better capture issues with Docker support.
New features:
- Added feature to allow completely disabling the use of the Docker socket in certain Docker configurations.
- New
max_log_size
andmax_log_rotations
configuration options that can be used to set the maximum length an agent-generated log file can grow before it is rotated and the maximum number of rotations to keep.
Bug fixes:
- Fix bug preventing rotated Docker files from being deleted
- Fix bug that resulted in
https_proxy
and ``http_proxy` configuration options being ignored.
New features:
- Docker integration: Old docker log files will be deleted after unused for 1 day (by default).
- Change to pidfile format in preparation for better
systemd
support
Bug fixes:
- Syslog/Docker fix that was causing logs being uploaded with default parser instead of user-specified parser
- Minor fixes for BMP unicode
New features:
- New
rename_logfile
option to change the file name and path for a log file when uploading it to Scalyr. - New
copy_from_start
option to instruct the agent to copy a log file from its start when it first matches. - New metric in
linux_system_metrics
to record number of CPUs used by a machine. Metric name issys.cpu.count
. - New
strip_domain_from_default_server_host
configuration option to remove any domain name fromserverHost
when using the default value.
Bug fixes:
- Syslog/Docker fix that generated a
port already in use
error when reading new configuration file - Syslog/Docker fix that caused dropped log lines when write rate exceeded certain threshold
- Syslog CPU performance improvements
Major overhaul of Docker support:
- New approach to improve stability across multiple versions of Docker
- Relies on Docker's
syslog
logging plugin - Automatic collection of Docker metrics for all containers running on host
- Improved Dockerfiles to ease creating images with custom Scalyr Agent configuration
- Improved methods for configuring Scalyr Agent running within the container
- Customize where container log files are written
- See the Docker installation documentation for more details.
Additional features:
- New
exclude
option for writing log matching rules. When specifying which logs to collect using a glob pattern, you can exclude any of the matching logs using another glob. The field value should be an array of glob patterns. - The
import_vars
feature has been extended to work in all files in theagent.d
directory. The variables imported by a file are only applied to that file. - Both the
api_key
andscalyr_server
fields may now be set in any file in theagent.d
directory. - Initial support for compressing upload payloads
Bug fixes:
- Removed spurious exceptions in the
linux_process_metrics
module due to reading blank lines from/prod/pid/io
- Prevent log upload being wedged when encountering invalid utf-8 characters
- Prevent errors in Windows when file descriptors were attempted to be close
- More diagnostic output and defensive code for the
linux_process_metrics
module to help investigate issue
Bug fixes:
- Fixed bug in
linux_process_metrics
causing errors when monitored process disappears
Features:
- Added ability to set HTTP and HTTPS proxies via configuration file. Use new configuration variables
http_proxy
andhttps_proxy
. You must also setuse_requests_lib
to true. - Report the number of open file descriptors held by a process for
linux_process_metrics
. The metric name isapp.io.fds
. - The assigned the
scalyrAgentLog
parser to the agent log - Added ability to skip the Scalyr connectivity check at agent start up using
--no-check-remote-server
.
Bug fixes:
- Added third-party library to fix issue with
postgres_monitor
. - Added extra logging to track issue reported with Windows agent of silent shutdown.
- Fixed issue with
haltBefore
line grouper that prevented logs from being uploaded for several minutes. - Changed default for
line_completion_wait_time
from 300 secs to 5 secs. - Always delete the
*.pyc
files when doing an upgrade or uninstall
Features:
- Added feature to prevent tracking stale log files on a per-directory basis. Reduces checkpoint size due to large directories.
Bug fixes:
- Fix bug causing agent to fail startup due to
'module' object has no attribute 'UUID'
. - Improved log line format for
snmp_monitor
. You may disable using new format by setting the monitor config optionuse_legacy_format
to true.
Features:
- Release of new
snmp_monitor
plugin, used to monitor SNMP devices
Bug fixes:
- Fix bug in
windows_process_metrics
when matching by commandline. - Fix invalid character bug when parsing unicode characters with decimal value > 2^16
- Fix bug in
shell_monitor
plugin resulting in defunct processes lingering - Fix bug in
url_monitor
plugin resulting in not emitting metric in some failure cases
Features:
- Integration of new request library, allowing routing through target specified by
proxy
environment variable. New library turned off by default for now. To turn on, setuse_requests_lib
.
Bug fixes:
- Fix
windows_system_metrics
monitor to no longer report disk usage on empty drives - Fix
mysql_monitor
to recreate connection to db on failures - Fix metrics names rather than throw errors when invalid metrics names are emitted by modules such as graphite.
Features:
- Windows Event Log monitor now supports the EvtLog API implemented by Windows Vista and newer system. This allows for more expressive queries, as well as selecting events by severity level such as "Critical".
Bug fixes:
- Fix UTF-8 truncation bug in redis plugin.
- New option in redis plugin that controls logging of UTF-8 converstion errors. See the
utf8_warning_interval
for more details. - Fix bug where
network_interface_prefixes
option was broken when listing multiple interfaces. - Fix
no attribute _tunnel_host
bug with python 2.4 ssl library
Features:
- Added plugin for reading SLOWLOG from redis
- Heavily optimized common path for copying logs to increase throughput
Features:
- New sequence id and number implementation to help prevent events being added to the database multiple times
- Create new configuration option
max_existing_log_offset_size
to set a limit on how far back in a log file we are willing to go for a log file we have been previously copying. - Enabled
syslog_monitor
for Windows platforms - Preview release of
windows_event_log_monitor
, a monitor for copying the Windows event log to Scalyr. - Preview release of docker support. E-mail [email protected] for more details
- Added new option
network_interface_prefixes
to linux_system_metrics monitor to override the prefix for monitored network interfaces.
Bug fixes:
- Fixed some race conditions in writing the pidfile on POSIX systems.
- Fixed bug causing agent to stop copying logs when previously copied logs are removed
- Disabled checking the command of the running agent process to guard against pid re-use since this was leading to some false negatives in the
is the agent running
checks. - Remove exception logging when bad response received from server
- Print more informative message when server responds with
server too busy
- Fix Utf-8 encoding bug with redaction rules
Bug fixes:
- Added some extra diagnostic information to help investigate some customer issues
New features:
- Enable line groupers on the agent. Each log file may specify line grouping rules in its configuration. The agent can use this to group consecutive log lines together into a single logic log line. All future sampling and redaction rules will work on this single line instead of the individual log lines.
Bug fixes:
- Close file handles for files being scanned that have not had any activity in the last hour. Should fix too many open files bug.
- Reduce the default
max_line_size
value to 3400 to avoid truncation issues introduced by the server.
New features:
- Allow any monitor's metric log's rate limiter to be modified via configuration, along with flush aggregation.
- Provide
--upgrade-without-ui
onscalyr-agent-2-config
to allow upgrading the agent without UI (Windows). - Write user-friendly error message to event log when a major configuration problem is seen during start up (Windows)
Bug fixes:
- Do not fail Scalyr Agent service start if configuration file registry is not set (Windows).
- Remove rate limiter on metric log for graphite monitor and add in flush aggregation.
- Fix that prevented graphite monitor thread from starting when accepting both text and pickle protocols
Bug fixes:
- Prevent syslog monitor from writing syslog message to agent log.
- Performance improvement for syslog monitor to prevent it from flushing the traffic log file too often
- Fix bug in error reporting from client that caused exceptions to be thrown while logging an exception
New features:
- Added syslog monitor for receiving logs via syslog protocol.
Bug fixes:
- Fix bug causing agent to temporarily stop copying logs if the underlying log file disappeared or read access was removed for Windows.
New features:
- Make log processing parameters such as
max_log_offset_size
adjustable via the configuration file.
Bug fixes:
- Fix bug preventing turning off default system and agent process monitors.
- Fix bugs preventing release of open file handles
Bug fixes:
- Relax metric name checking to allow including dashes.
- Fix bug where linux_system_metrics was ignoring the configured sample interval time.
New features:
- Add a new option
--set-server-host
in scalyr-agent-2-config to set the server host from the commandline. - Add new instance variables
_log_write_rate
and_log_max_write_burst
to ScalyrMonitor to allow monitor developers to override the rate limits imposed on their monitor's log. See the comments in test_monitor.py for more details. - Add new parameter
emit_to_metric_log=True
to force a log line to go to the metric log instead of the agent log for monitors. See the comments in test_monitor.py for more details. - Added new configuration parameters to easily change any or all monitor's sample rate. Use
global_monitor_sample_interval
in the general configuration to set it for all monitors, orsample_interval
in an individual monitor's config. The value should be the number of seconds between samples and can be fractional.
Bug fixes:
- Fix failing to accept unicode characters in metric values
New features:
- Support for Windows. Many changes to support this platform.
Bug fixes:
- Fix excessive CPU usage bug when monitoring thousands of logs that are not growing.
New features:
- The
run_monitor.py
takes a new option-d
to set the debug level.
Bug fixes:
- Fix false warning message about file contents disappearing.
- Assign a unique thread id for each log file being copied to mimic old agent's behavior.
New features:
- Added the MySQL, Apache, and Nginx monitor plugins.
- New functions for defining and adding documentation for Scalyr Plugin Monitor configuration options, log entries, and metrics.
- Added support for variable substitution in configuration files.
- New
print_monitor_doc.py
tool for printing documentation for a monitor. This tool is experimental.
Bug fixes:
- Fixed bug that prevented having release notes for multiple releases in CHANGELOG.md
- Fixed bug with CR/LF on Windows preventing log uploads
Documentation fixes:
- Updated MySQL monitor documentation to note deprecated metrics.
Internal:
- Refactored code to extract platform-specific logic in preparation for Windows work.
New core features:
- Complete rewrite of the Scalyr Agent in Python!
- No more Java dependency.
- Easier to configure.
- Easier to diagnosis issues.
- Monitor plugin support.
- Detailed status page.
Bugs fixed since beta and alpha testing:
- Slow memory "leak".
- graphite_module not accepting connections from localhost when only_accept_local is false.
- Ignore old pidfile if the process is no longer running. Added pid collison detection as well.