Releases

Additional features we're working on:

Cortex Alert Manager:
- Scalability and HA consistency improvements for opni-alerting

v0.10.1 (TBA)

Gateway/Agent
- Add ping endpoint to gateway
- Add rate limiting to streaming connections
- Auto update of agent
Logging
- Bug fixes to controlplane logs and events

v0.10.0 (June 6, 2023)

Metrics
- Use OTEL collector for metrics collection (greatly reduced agent footprint)

v0.9.2 (April 26, 2023)

AIOps enhancements/fixes
- Update AI services to report clearer statuses for the Deep Learning model while it is training. (#1272)
- Fix a bug where updating a watchlist by only removing a deployment caused the training status to hang. (#1344)
Alerting enhancements/fixes
- Resolve a data race in internal alerting conditions (#1174)
- Further improved performance of alarm status listing (#1022)
Monitoring fixes:
- Fixed an issue where an agent reconnecting could cause errors scraping gateway metrics (#1295)
CLI fixes:
- Fixed bash completion not working correctly (#1299)

v0.9.1 (April 6, 2023)

AIOps enhancements/fixes
- Update AI Services to use NATS JetStream KV storage (#1146)
- Remove "Enable GPU Services" button from Opni Admin Dashboard. (#1147)
- Update AIOps gateway plugin to update the NATS JetStream KV store when a model is submitted for training. (#1148)
- Update AI Services to launch model training when the service starts (#1233)

v0.9.0 (March 16, 2023)

Alerting enhancements
- Metrics based alarm incidents are now tracked in the alerting timeline
Monitoring enhancements
- Opni internal metrics containing cluster names are now stored per-tenant instead of in the local cluster's tenant
- Upstream Opni local agent identification
Logging enhancements
- Move to using open-telemetry-collector to collect logs
- Send logs to the central cluster using gRPC
- Store cluster id to name mapping for better visibility
AIOps enhancements/fixes
- Update the Opensearch query for per-deployment log counts to use the updated field names from open-telemetry-collector (#1181)
- Update the Opensearch query that fetches logs for training the Deep Learning model to use the updated field names from open-telemetry-collector (#19)
UI
- Fixed a bug that was preventing the Alert timeline from fully rendering
- Added a friendly name field to the Manual Installation section of Add Agent to coincide with a chart update. (#1177)
- Agents in the table are now tagged with an icon indicating whether the agent is local. (#1166)

v0.8.3 (March 2, 2023)

AIOps Enhancements/Fixes
- Optimize query for filtering out anomalous keywords within training controller. (#18)
- Implement a streaming data loader for log anomaly detection model training to reduce memory pressure from large training datasets. (pr40, pr17)
UI Enhancements/Fixes
- Fixed an issue where the agents page crashed when an agent no longer exists for a role that was applied to the agent (#1109)
- Fixed Monitoring capability metrics not showing up (#1111)
- Changed how agents are added via the UI to improve reliability (#965)
- Switched to using the new alarm API. (pr118)

v0.8.2 (February 16, 2023)

Monitoring enhancements
- Metrics admin CLI can list, filter, and aggregate Cortex rules
  - Includes listing and querying stateful information about alerting rules
- Metrics admin CLI can create, read, update, or delete standard Prometheus rule group files on clusters
- An initial cluster name can be set during bootstrap of new agents by setting the opni-agent.friendlyName helm value. When a cluster name is set in the UI, this flag will be added to the copyable helm install command.
Minor update to ingest plugin to prepare for Opni preprocessing service
Notable dependency updates
- go to 1.20
- cortex to v1.14.1
- go-plugin to v1.4.8
- nats-server to v2.9.12
- alertmanager to v0.25.0
- prometheus to v0.41.0
- opentelemetry to v1.12.0
- grpc to v1.52.1
- kubernetes sdk to v0.26.1
- controller-runtime to v0.14.4
- cert-manager to v1.10.2
- gin to v1.8.2
UI Enhancements/Fixes
- Updating the tooltip in the alerting overview page to improve accuracy (#1006)
- Fixed a bug where we didn't show all log templates in the Opensearch plugin. Also added a count to the top of the same table. (pr 106)
- Fixed a bug where you couldn't update the s3 endpoint in the monitoring config (#1009)
- Fixed a bug where you couldn't edit PagerDuty alarm endpoints (#958)
- Improved the consistency of Cluster/Agent name and id labeling (#1048)
- Improved chart labeling on the insights page of the opensearch dashboard (#1046)
- Only show relevant storage options when selecting the 'Highly Available' Monitoring option
- Compensate for an overzealous 'no changes to apply' error the Alerting backend emits when installing Alerting.
AIOps Enhancements/Fixes
- Improved pre-trained DRAIN models with more ground truth (pr16)
- Increased test coverage of the deep-learning model service to 64% (pr38)
- Fixed the NotFoundError raised while downloading training data from Opensearch (pr39)
- Updated Opensearch query to filter out anomaly keywords in training data (pr16)
- Bump PyTorch version to latest stable version (pr1035)
Alerting enhancements
- Scalability optimizations; reduced memory footprint for the Alerting backend
- Improved fault tolerance for the Alerting backend notification system

v0.8.1 (December 18, 2022)

Bugfix release:

- Add retry for fetching client certs
- Fix agent crash on startup

v0.8.0 (December 16, 2022)

Introduction of Opni AI workload training module:

- Prerequisite: an NVIDIA GPU is enabled in the cluster where Opni is installed
- Users can select the deployments that are important to them in the Opni Admin UI. A model is then trained for the user, and future logs belonging to those deployments are given a "Normal" or "Anomalous" label
- Users can ingest these insights in the Opni plugin in their Opensearch Dashboards (enabled when the Opni Logging backend is set up)

Alerting enhancements

- New PagerDuty endpoint for receiving notifications
- Clone operation for cloning alarms to any target cluster(s)
- Alarms for when downstream agent capabilities are unhealthy
- Alarms for tracking the health of the Opni monitoring backend
- Alarms based on Prometheus queries
- Alarms for tracking the states of Kubernetes objects

Opensearch changes:

- Opensearch version updated to 2.4.0
- Dataprepper version updated to 2.0.1
- Reconcilers now use a TLS client cert to authenticate to the Opensearch API

v0.7.0 (November 23, 2022)

Introduction of Opni Alerting:

- Slack and email endpoints for receiving notifications
- Alarms on agent connection status
- Overview for breached alarms
