Commit

Control Plane Logs Collection for OCNE and Standalone Kubernetes Clusters (#56)

* Control Plane Logs Collection for OCNE and Standalone Kubernetes Clusters
naga-barri authored Nov 15, 2023
1 parent a802a45 commit 67012e7
Showing 8 changed files with 137 additions and 5 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,11 @@
 # Change Log
 
+## 2023-11-07
+### Added
+- Control Plane Logs Collection for OCNE and Standalone Kubernetes Clusters.
+- Support for launching Fluentd containers in privileged mode (default false).
+- Added FAQ for triaging log collection setup issues in OCNE and Standalone Kubernetes Clusters.
+
 ## 2023-10-31
 ### Changed
 - Ruby upgrade from 2.7.8 to 3.1.2 for OL8-Slim Fluentd container image. It also includes Fluentd (1.15.3 to 1.16.2) and other dependency gem upgrades.
2 changes: 1 addition & 1 deletion charts/logan/Chart.yaml
@@ -5,7 +5,7 @@ apiVersion: v2
 name: oci-onm-logan
 description: Charts for sending Kubernetes platform logs, compute logs, and Kubernetes Objects information to OCI Logging Analytics.
 type: application
-version: 3.1.0
+version: 3.1.1
 appVersion: "3.0.0"
 
 dependencies:
6 changes: 6 additions & 0 deletions charts/logan/templates/fluentd-daemonset.yaml
@@ -33,6 +33,8 @@ spec:
 tolerations:
 - key: node-role.kubernetes.io/master
 effect: NoSchedule
+- key: node-role.kubernetes.io/control-plane
+effect: NoSchedule
 {{- if $imagePullSecrets }}
 imagePullSecrets:
 - name: {{ .Values.image.imagePullSecrets }}
@@ -41,6 +43,10 @@ spec:
 - name: {{ $resourceNamePrefix }}-fluentd
 image: {{ .Values.image.url }}
 imagePullPolicy: {{ default "IfNotPresent" .Values.image.imagePullPolicy }}
+{{- if .Values.privileged }}
+securityContext:
+privileged: {{ .Values.privileged }}
+{{- end}}
 env:
 - name: FLUENTD_CONF
 value: {{ .Values.fluentd.path }}/{{ .Values.fluentd.file }}
4 changes: 4 additions & 0 deletions charts/logan/templates/fluentd-deployment.yaml
@@ -38,6 +38,10 @@ spec:
 - name: {{ $resourceNamePrefix }}-fluentd
 image: {{ .Values.image.url }}
 imagePullPolicy: {{ default "IfNotPresent" .Values.image.imagePullPolicy }}
+{{- if .Values.privileged }}
+securityContext:
+privileged: {{ .Values.privileged }}
+{{- end}}
 env:
 - name: FLUENTD_CONF
 value: {{ .Values.fluentd.path }}/{{ .Values.fluentd.file }}
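The `{{- if .Values.privileged }}` guard added to both the DaemonSet and the Deployment templates emits the `securityContext` block only when the flag is truthy. With the default `false`, the container spec gets no `securityContext` key at all, rather than an explicit `privileged: false`. The Python sketch below mimics that conditional for illustration only; `render_container` and its dict layout are hypothetical and not part of the chart, and only the fields touched by this commit are modelled.

```python
from typing import Any, Dict, Optional


def render_container(name: str, image: str, privileged: bool = False,
                     pull_policy: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical Python mirror of the Fluentd container template above."""
    container: Dict[str, Any] = {
        "name": name,
        "image": image,
        # Like {{ default "IfNotPresent" .Values.image.imagePullPolicy }}:
        # an unset or empty value falls back to "IfNotPresent".
        "imagePullPolicy": pull_policy or "IfNotPresent",
    }
    # Like {{- if .Values.privileged }} ... {{- end}}: when the flag is falsy
    # the whole securityContext block is left out, not rendered as false.
    if privileged:
        container["securityContext"] = {"privileged": privileged}
    return container
```

For example, `render_container("fluentd", "example-image")` returns a dict without a `securityContext` key, while passing `privileged=True` adds `{"privileged": True}`.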
6 changes: 4 additions & 2 deletions charts/logan/templates/logs-configmap.yaml
@@ -73,7 +73,9 @@ data:
 encoding {{ $.Values.fluentd.tailPlugin.encoding }}
 {{- end }}
 <parse>
-{{- if eq $runtime "docker" }}
+{{- if eq $name "kube-audit" }}
+@type none
+{{- else if eq $runtime "docker" }}
 @type json
 {{- else}}
 @type cri
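After this change, the `<parse>` block picks the parser per log source: `kube-audit` is read with `@type none`, Docker-runtime sources with `@type json`, and everything else with `@type cri`. A Python sketch of that decision order follows; `parser_type` is a hypothetical helper that only restates the template logic, not code shipped with the chart.

```python
def parser_type(name: str, runtime: str) -> str:
    """Hypothetical mirror of the <parse> @type selection in logs-configmap.yaml.

    name is the log source key from values.yaml (e.g. "kube-audit");
    runtime is the container runtime ("docker" or "cri").
    """
    if name == "kube-audit":
        # Audit files come from /var/log/kubernetes/audit/, not from the
        # container runtime, so there is no runtime envelope to strip.
        return "none"
    if runtime == "docker":
        # Docker's json-file driver wraps each line in a JSON object.
        return "json"
    # Everything else is treated as CRI-formatted container output.
    return "cri"
```

Because the `kube-audit` check comes first, audit logs bypass the runtime-specific parsers on both Docker and CRI nodes. Fluentd's `none` parser keeps each line intact in a single field (`message` by default), so the audit event is passed on unparsed.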
@@ -119,7 +121,7 @@ data:
 # Concat filter to handle partial logs in CRI/ContainerD
 # Docker can also have partial logs but handling is different for different docker versions. Considering Kubernetes/OKE moved to ContainerD/CRI since last 4-5 releases, ignoring docker handling.
 # This filter can not be clubbed with concat filter for multiline as both are mutually exclusive.
-{{- if eq $runtime "cri" }}
+{{- if and (ne $name "kube-audit") (eq $runtime "cri") }}
 <filter oci{{- ternary (print "." $currWorker) "" $multiWorkersEnabled }}.oke.{{ $name }}.**>
 @type concat
 key message
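The second configmap hunk skips the CRI concat filter for `kube-audit`, since those files are not CRI-formatted. CRI log lines have the form `<timestamp> <stream> <tag> <message>`, where tag `P` marks a partial fragment and `F` the final one; stitching `P` fragments onto the following `F` line is what the concat filter is there for. The Python sketch below shows both the guard and the stitching, assuming a single stream and well-formed lines. `needs_cri_concat` and `join_partials` are hypothetical helpers, not the fluent-plugin-concat implementation.

```python
from typing import Iterable, List


def needs_cri_concat(name: str, runtime: str) -> bool:
    """Mirror of: {{- if and (ne $name "kube-audit") (eq $runtime "cri") }}"""
    return name != "kube-audit" and runtime == "cri"


def join_partials(lines: Iterable[str]) -> List[str]:
    """Stitch CRI partial (P) fragments onto the next full (F) line.

    Assumes each line is '<time> <stream> <P|F> <message>' from one stream.
    """
    merged: List[str] = []
    buffer: List[str] = []
    for line in lines:
        _time, _stream, tag, message = line.split(" ", 3)
        buffer.append(message)
        if tag == "F":
            # The F fragment closes the logical line.
            merged.append("".join(buffer))
            buffer = []
    # Keep any trailing partial fragments that never saw an F line.
    if buffer:
        merged.append("".join(buffer))
    return merged
```

The real filter also has to cope with timeouts and interleaved stdout/stderr streams, which this sketch ignores.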
51 changes: 51 additions & 0 deletions charts/logan/values.yaml
@@ -67,6 +67,11 @@ kubernetesClusterID:
 # e.g. production-cluster
 kubernetesClusterName:
 
+# -- Kubernetes Security Context privileged flag
+# Default: 'false'. This is not required for OKE clusters.
+# In Kubernetes environments where SELinux mode is enforced, set this flag to 'true' to allow fluentd pods to access log files.
+privileged: false
+
 # -- Logging Analytics OCID for OKE Cluster
 #ociLAEntityID:
 
@@ -303,6 +308,48 @@ fluentd:
 ociLALogSourceName: "Kubernetes Autoscaler Logs"
 # The regular expression pattern for the starting line in case of multi-line logs.
 multilineStartRegExp: /^\S\d{2}\d{2}\s+[^\:]+:[^\:]+:[^\.]+\.\d{0,3}/
+
+# Config specific to API Server Logs Collection
+kube-apiserver:
+# The path to the source files.
+path: /var/log/containers/kube-apiserver-*.log
+# Logging Analytics log source to use for parsing and processing the logs: Kubernetes API Server Logs.
+ociLALogSourceName: "Kubernetes API Server Logs"
+# The regular expression pattern for the starting line in case of multi-line logs.
+multilineStartRegExp: /^\S\d{2}\d{2}\s+[^\:]+:[^\:]+:[^\.]+\.\d{0,3}/
+
+# Config specific to etcd Logs Collection
+etcd:
+# The path to the source files.
+path: /var/log/containers/etcd-*.log
+# Logging Analytics log source to use for parsing and processing the logs: Kubernetes etcd Logs.
+ociLALogSourceName: "Kubernetes etcd Logs"
+
+# Config specific to kube-controller-manager Logs Collection
+kube-controller-manager:
+# The path to the source files.
+path: /var/log/containers/kube-controller-manager-*.log
+# Logging Analytics log source to use for parsing and processing the logs: Kubernetes Controller Manager Logs.
+ociLALogSourceName: "Kubernetes Controller Manager Logs"
+# The regular expression pattern for the starting line in case of multi-line logs.
+multilineStartRegExp: /^\S\d{2}\d{2}\s+[^\:]+:[^\:]+:[^\.]+\.\d{0,3}/
+
+# Config specific to kube-scheduler Logs Collection
+kube-scheduler:
+# The path to the source files.
+path: /var/log/containers/kube-scheduler-*.log
+# Logging Analytics log source to use for parsing and processing the logs: Kubernetes Scheduler Logs.
+ociLALogSourceName: "Kubernetes Scheduler Logs"
+# The regular expression pattern for the starting line in case of multi-line logs.
+multilineStartRegExp: /^\S\d{2}\d{2}\s+[^\:]+:[^\:]+:[^\.]+\.\d{0,3}/
+
+# Config specific to Kubernetes Audit Logs Collection
+kube-audit:
+# The path to the source files.
+path: /var/log/kubernetes/audit/audit*
+# Logging Analytics log source to use for parsing and processing the logs: Kubernetes Audit Logs.
+ociLALogSourceName: "Kubernetes Audit Logs"
+
 # Configuration for Linux System specific logs like CronLogs and SecureLogs
 linuxSystem:
 # Setting the following properties will override the default/generic configuration and applies to all Kubernetes system logs
@@ -394,6 +441,10 @@ fluentd:
 - '"/var/log/containers/csi-oci-node-*.log"'
 - '"/var/log/containers/proxymux-client-*.log"'
 - '"/var/log/containers/cluster-autoscaler-*.log"'
+- '"/var/log/containers/kube-apiserver-*.log"'
+- '"/var/log/containers/etcd-*.log"'
+- '"/var/log/containers/kube-controller-manager-*.log"'
+- '"/var/log/containers/kube-scheduler-*.log"'
 # Worker number in case of multi process workers enabled. If not set when multi process workers enabled, then it defaults to 0.
 #worker: 1
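The new values route each control-plane log file to a Logging Analytics log source by path, and three of the sources share the same `multilineStartRegExp`. The Python sketch below checks both against sample input and is illustrative only: `source_for` and `is_multiline_start` are hypothetical helpers, Python's `fnmatch` only approximates in_tail's glob matching (its `*` also matches `/`), and the sample file names assume the kubelet's usual `<pod>_<namespace>_<container>-<id>.log` naming under `/var/log/containers`. Reading the regex as a matcher for klog-style headers such as `I1115 10:17:13.710854 ...` is an inference from the pattern, not something the chart states.

```python
import fnmatch
import re
from typing import Optional

# Path globs and log sources copied from the values.yaml hunk above.
CONTROL_PLANE_SOURCES = {
    "/var/log/containers/kube-apiserver-*.log": "Kubernetes API Server Logs",
    "/var/log/containers/etcd-*.log": "Kubernetes etcd Logs",
    "/var/log/containers/kube-controller-manager-*.log": "Kubernetes Controller Manager Logs",
    "/var/log/containers/kube-scheduler-*.log": "Kubernetes Scheduler Logs",
    "/var/log/kubernetes/audit/audit*": "Kubernetes Audit Logs",
}

# multilineStartRegExp shared by kube-apiserver, kube-controller-manager
# and kube-scheduler (the Ruby /.../ literal, written as a Python raw string).
MULTILINE_START = re.compile(r"^\S\d{2}\d{2}\s+[^\:]+:[^\:]+:[^\.]+\.\d{0,3}")


def source_for(path: str) -> Optional[str]:
    """Return the log source whose glob matches path, or None (approximation)."""
    for pattern, source in CONTROL_PLANE_SOURCES.items():
        if fnmatch.fnmatchcase(path, pattern):
            return source
    return None


def is_multiline_start(line: str) -> bool:
    """True if line looks like the first line of a new log record."""
    return MULTILINE_START.match(line) is not None
```

A line such as `I1115 10:17:13.710854       1 server.go:150] Version: v1.26.6` starts a new record, while an indented stack-trace line like `        goroutine 1 [running]:` fails the leading `\S` and would be appended to the previous record.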

4 changes: 2 additions & 2 deletions charts/oci-onm/Chart.yaml
@@ -18,7 +18,7 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 3.1.0
+version: 3.1.1
 
 # This is the version number of the application being deployed. This version number should be
 # incremented each time you make changes to the application. Versions are not expected to
@@ -32,7 +32,7 @@ dependencies:
 repository: "file://../common"
 condition: oci-onm-common.enabled
 - name: oci-onm-logan
-version: "3.1.0"
+version: "3.1.1"
 repository: "file://../logan"
 condition: oci-onm-logan.enabled
 - name: oci-onm-mgmt-agent
63 changes: 63 additions & 0 deletions docs/FAQ.md
@@ -275,3 +275,66 @@ oci-onm-logan:
worker: 1
```

### Log Collection for OCNE (Oracle Cloud Native Environment)

#### How to fix the _execution expired_ error?

Log location: `/var/log/oci-logging-analytics.log`

Sample Error:
```
E, [2023-08-07T10:17:13.710854 #18] ERROR -- : oci upload exception : Error while uploading the payload. { 'message': 'execution expired', 'status': 0, 'opc-request-id': 'D733ED0C244340748973D8A035068955', 'response-body': '' }
```

* Check whether your OCNE setup configuration sets `restrict-service-externalip` to `true` for the Kubernetes module. If it does, set it to `false` so that containers can reach the Logging Analytics endpoint. Refer to [Enabling Access to all externalIPs](https://docs.oracle.com/en/operating-systems/olcne/1.3/orchestration/external-ips.html#8.4-Enabling-Access-to-all-externalIPs) for more details.
* If the issue persists, check whether your OCNE setup configuration sets `selinux` to `enforcing` in the globals section. If it does, you may need to run the Fluentd containers in privileged mode by setting `privileged` to `true` in override_values.yaml:

```
..
..
oci-onm-logan:
  ..
  ..
  privileged: true
```

#### How to fix the _Permission denied @ dir_s_mkdir - /var/log/oci_la_fluentd_outplugin_ error?

Log location: pod logs of the `oci-onm-logan` DaemonSet

Set `privileged` to `true` in override_values.yaml to resolve this:

```
..
..
oci-onm-logan:
  ..
  ..
  privileged: true
```

### Log Collection for Standalone Clusters (Docker Runtime)

#### How to fix the _/var/log/containers/..log unreadable_ warning?

Log location: pod logs of the `oci-onm-logan` DaemonSet

Sample Error:
```
2023-10-10 13:00:16 +0000 [warn]: #0 [in_tail_containerlogs] /var/log/containers/kube-flannel-ds-kl9bb_kube-flannel_kube-flannel-c2a954a05c57f4f68bc3ab348f071812be2405c76bd1631890638eac7c503506.log unreadable. It is excluded and would be examined next time.
```

In a typical standalone cluster, Docker writes container logs under `/var/lib/docker/containers`. The files in `/var/log/containers` are symlinks that ultimately resolve into that directory, so Fluentd can follow them only if that directory is also mounted into the pod. Verify the path on your nodes and update `containerdataHostPath` in override_values.yaml accordingly:

```
..
..
oci-onm-logan:
  ..
  ..
  volumes:
    ..
    containerdataHostPath: /var/lib/docker/containers
```
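To find the actual path on a node, resolve one of the unreadable files under `/var/log/containers` to its final target and take the containers directory above it. This is a minimal Python sketch, assuming it runs directly on the node (not inside the Fluentd pod), that Python is available there, and that the Docker json-file layout `<data-root>/containers/<id>/<id>-json.log` applies; `docker_containers_dir` is a hypothetical helper, not part of the chart.

```python
import glob
import os


def docker_containers_dir(log_link: str) -> str:
    """Return the '<data-root>/containers' directory behind a container log link.

    Hypothetical helper: assumes the link chain ends at
    '<data-root>/containers/<id>/<id>-json.log'.
    """
    # Follow every symlink (typically /var/log/containers -> /var/log/pods -> Docker data).
    target = os.path.realpath(log_link)
    container_dir = os.path.dirname(target)  # <data-root>/containers/<id>
    return os.path.dirname(container_dir)    # <data-root>/containers


if __name__ == "__main__":
    # Print the resolved directory for a few container log links on this node.
    for link in sorted(glob.glob("/var/log/containers/*.log"))[:5]:
        print(link, "->", docker_containers_dir(link))
```

If every link resolves to the same directory, that directory is the value to use for `containerdataHostPath`.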

