Chapter 9: Monitoring & Logging

Authored by syscook.dev

What is Monitoring and Logging in Kubernetes?

Monitoring and logging in Kubernetes involve collecting, analyzing, and visualizing metrics and logs from your cluster and applications. This provides visibility into system performance, health, and issues for effective troubleshooting and optimization.

Key Concepts:

  • Metrics Collection: CPU, memory, network, and custom metrics
  • Log Aggregation: Centralized log collection and analysis
  • Alerting: Proactive notification of issues
  • Dashboards: Visual representation of metrics and logs
  • Tracing: Distributed request tracing
  • Health Monitoring: Application and infrastructure health
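
As a minimal sketch of the alerting concept, the Prometheus rule file below fires when any scrape target has been unreachable for five minutes. The file name, group name, alert name, severity label, and five-minute window are illustrative assumptions. The file would also need to be listed under `rule_files` in `prometheus.yml`, and sending notifications requires Alertmanager, which this chapter does not deploy.

```yaml
# alert-rules.yml (hypothetical file name)
groups:
  - name: availability            # hypothetical group name
    rules:
      - alert: TargetDown         # hypothetical alert name
        # Prometheus sets `up` for every scrape target: 1 = scrape succeeded, 0 = scrape failed
        expr: up == 0
        for: 5m                   # assumed window; the condition must hold this long before firing
        labels:
          severity: critical      # assumed label, typically used for routing
        annotations:
          summary: "Target {{ $labels.instance }} of job {{ $labels.job }} is down"
```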

Why Use Monitoring and Logging?

1. System Visibility

Monitor cluster and application performance in real time.

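# Example: Prometheus configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s

    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          # Keep only the API server endpoints; without this filter the job
          # would try to scrape every endpoint in the cluster.
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https

      - job_name: 'kubernetes-nodes'
        kubernetes_sd_configs:
          - role: node
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token

      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Scrape only pods annotated with prometheus.io/scrape: "true"
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          # Use the metrics path from the prometheus.io/path annotation
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          # Use the port from the prometheus.io/port annotation
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__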
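
The scrape jobs above rely on Kubernetes service discovery, so the Prometheus pod needs permission to list and watch the discovered objects. The manifests below are a minimal sketch of that RBAC setup; the `prometheus` ServiceAccount, ClusterRole, and ClusterRoleBinding names are assumptions and do not appear elsewhere in this chapter.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus              # hypothetical name
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus              # hypothetical name
rules:
  # Read access to the objects used by the node, endpoints, and pod discovery roles
  - apiGroups: [""]
    resources: ["nodes", "nodes/metrics", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  # Allows scraping the /metrics endpoint of the API server
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus              # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
```

To take effect, the Prometheus pod template would set `serviceAccountName: prometheus` so that the mounted token carries these permissions.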

2. Centralized Logging

Collect and analyze logs from all applications and components.

# Example: Fluentd configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: monitoring
data:
  fluent.conf: |
    # Tail the container log files on each node
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      <parse>
        @type json
        time_key time
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    # Enrich each record with pod, namespace, and label metadata
    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>

    # Ship records to Elasticsearch; the host matches the
    # elasticsearch-service Service defined later in this chapter
    <match kubernetes.**>
      @type elasticsearch
      host elasticsearch-service.monitoring.svc.cluster.local
      port 9200
      index_name kubernetes
      type_name _doc
    </match>
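
The ConfigMap above only holds the configuration; something still has to run Fluentd on every node. A DaemonSet is the usual way to do that, and the sketch below shows one possible shape. The image tag, the `/fluentd/etc` config directory, and the `fluentd` ServiceAccount are assumptions to verify against the image you actually use; the image must include the `elasticsearch` and `kubernetes_metadata` plugins.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd                 # hypothetical name
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      # Hypothetical ServiceAccount; the kubernetes_metadata filter needs
      # get/list/watch on pods and namespaces
      serviceAccountName: fluentd
      containers:
        - name: fluentd
          # Assumed image and tag; check that it bundles the required plugins
          image: fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch
          volumeMounts:
            - name: varlog
              mountPath: /var/log          # matches the path and pos_file in fluent.conf
            - name: config
              mountPath: /fluentd/etc      # assumed config directory for this image
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: config
          configMap:
            name: fluentd-config
```

On Docker-based nodes the files under `/var/log/containers` are symlinks into `/var/lib/docker/containers`, so that directory would need a second `hostPath` mount.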

How to Implement Monitoring and Logging?

1. Prometheus and Grafana Setup

Prometheus Deployment

# Example: Prometheus deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:latest  # pin a specific version tag for reproducible deployments
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus
            - name: storage-volume
              mountPath: /prometheus
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: storage-volume
          emptyDir: {}  # metrics data is lost when the pod is deleted or rescheduled
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-service
  namespace: monitoring
spec:
  selector:
    app: prometheus
  ports:
    - port: 9090
      targetPort: 9090
  type: LoadBalancer  # exposes Prometheus outside the cluster; use ClusterIP if that is not intended
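
Because the Deployment above stores metrics in an `emptyDir`, the data disappears whenever the pod is rescheduled. One way to keep it is a PersistentVolumeClaim, sketched below. The claim name and the 20Gi size are assumptions, and the claim relies on the cluster having a default StorageClass.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-storage      # hypothetical name
  namespace: monitoring
spec:
  accessModes:
    - ReadWriteOnce             # a single Prometheus replica mounts the volume
  resources:
    requests:
      storage: 20Gi             # assumed size; depends on retention and series count

# In the Deployment, the storage-volume entry would then read:
#   - name: storage-volume
#     persistentVolumeClaim:
#       claimName: prometheus-storage
```

The `prom/prometheus` image runs as a non-root user, so depending on the storage backend the pod may also need a `securityContext.fsGroup` to be able to write to the volume.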

Grafana Deployment

# Example: Grafana deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:latest  # pin a specific version tag for reproducible deployments
          ports:
            - containerPort: 3000
          env:
            - name: GF_SECURITY_ADMIN_PASSWORD
              value: "admin123"  # demo only; load the password from a Secret in real deployments
          volumeMounts:
            - name: grafana-storage
              mountPath: /var/lib/grafana
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "200m"
      volumes:
        - name: grafana-storage
          emptyDir: {}  # dashboards and settings are lost when the pod is deleted or rescheduled
---
apiVersion: v1
kind: Service
metadata:
  name: grafana-service
  namespace: monitoring
spec:
  selector:
    app: grafana
  ports:
    - port: 3000
      targetPort: 3000
  type: LoadBalancer
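
Grafana does not know about Prometheus until a data source is configured. Instead of adding it by hand in the UI, it can be provisioned from a file. The ConfigMap below is a minimal sketch of that approach; the ConfigMap name and file name are assumptions, while the URL points at the `prometheus-service` Service defined earlier.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources     # hypothetical name
  namespace: monitoring
data:
  prometheus.yaml: |
    # Grafana data source provisioning file
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy           # Grafana's backend issues the queries
        url: http://prometheus-service:9090
        isDefault: true
```

For Grafana to pick the file up, the ConfigMap would be mounted into the Grafana container at `/etc/grafana/provisioning/datasources`, the default provisioning directory.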

2. EFK Stack (Elasticsearch, Fluentd, Kibana) for Logging

Elasticsearch Deployment

# Example: Elasticsearch deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: elasticsearch
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
        - name: elasticsearch
          image: elasticsearch:7.15.0
          ports:
            - containerPort: 9200
          env:
            - name: discovery.type
              value: "single-node"  # single-node mode, suitable for demos and development
            - name: ES_JAVA_OPTS
              value: "-Xms512m -Xmx512m"  # JVM heap, kept well below the container memory limit
          volumeMounts:
            - name: elasticsearch-storage
              mountPath: /usr/share/elasticsearch/data
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
      volumes:
        - name: elasticsearch-storage
          emptyDir: {}  # indexed logs are lost when the pod is deleted or rescheduled
---
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch-service
  namespace: monitoring
spec:
  selector:
    app: elasticsearch
  ports:
    - port: 9200
      targetPort: 9200

Kibana Deployment

# Example: Kibana deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
        - name: kibana
          image: kibana:7.15.0
          ports:
            - containerPort: 5601
          env:
            - name: ELASTICSEARCH_HOSTS
              value: "http://elasticsearch-service:9200"
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "200m"
---
apiVersion: v1
kind: Service
metadata:
  name: kibana-service
  namespace: monitoring
spec:
  selector:
    app: kibana
  ports:
    - port: 5601
      targetPort: 5601
  type: LoadBalancer

3. Application Monitoring

Metrics Collection

# Example: Application with metrics
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitored-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: monitored-app
  template:
    metadata:
      labels:
        app: monitored-app
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: app
          image: myapp:latest
          ports:
            - containerPort: 8080
            - containerPort: 9090
          env:
            - name: METRICS_ENABLED
              value: "true"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"

Practical Examples

1. Complete Monitoring Stack

Step 1: Install Prometheus and Grafana

#!/bin/bash
# install-monitoring.sh
set -euo pipefail  # stop at the first failing command

# Create the monitoring namespace (idempotent, so the script can be re-run)
kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f -

# Install Prometheus
# Note: this Deployment mounts the prometheus-config ConfigMap shown earlier;
# apply that ConfigMap first, or the pod will not start.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-service
  namespace: monitoring
spec:
  selector:
    app: prometheus
  ports:
    - port: 9090
      targetPort: 9090
  type: LoadBalancer
EOF

# Install Grafana
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:latest
          ports:
            - containerPort: 3000
          env:
            - name: GF_SECURITY_ADMIN_PASSWORD
              value: "admin123"  # demo only; change before exposing Grafana
---
apiVersion: v1
kind: Service
metadata:
  name: grafana-service
  namespace: monitoring
spec:
  selector:
    app: grafana
  ports:
    - port: 3000
      targetPort: 3000
  type: LoadBalancer
EOF

echo "Monitoring stack manifests applied successfully!"

Step 2: Configure Application Metrics

#!/bin/bash
# configure-app-metrics.sh
set -euo pipefail  # stop at the first failing command

# Make sure the target namespace exists (idempotent)
kubectl create namespace production --dry-run=client -o yaml | kubectl apply -f -

kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: monitored-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: monitored-app
  template:
    metadata:
      labels:
        app: monitored-app
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: app
          image: myapp:latest
          ports:
            - containerPort: 8080
          env:
            - name: METRICS_ENABLED
              value: "true"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
EOF

echo "Application metrics configured successfully!"

Best Practices

1. Comprehensive Monitoring

# Good: Comprehensive monitoring configuration
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"

2. Resource Monitoring

# Good: Resource monitoring
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
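
Setting requests and limits is only half of resource monitoring; the other half is watching how close containers come to those limits. The rule below is a rough sketch that fires when a container's working-set memory stays above 90% of its limit. It assumes the kubelet's cAdvisor metrics (`container_memory_working_set_bytes`, `container_spec_memory_limit_bytes`) are being scraped, which the scrape configuration in this chapter may not do as written. The group name, alert name, threshold, and window are illustrative.

```yaml
groups:
  - name: resources                       # hypothetical group name
    rules:
      - alert: ContainerMemoryNearLimit   # hypothetical alert name
        # Containers without a memory limit report a limit of 0; the "> 0"
        # filter drops them so the ratio is only computed where a limit exists.
        expr: |
          container_memory_working_set_bytes{container!=""}
            / (container_spec_memory_limit_bytes{container!=""} > 0)
            > 0.9
        for: 10m                          # assumed window
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is above 90% of its memory limit"
```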

3. Health Checks

# Good: Comprehensive health checks
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
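
For applications that start slowly, a fixed `initialDelaySeconds` is a guess that is either too short or wastefully long. A `startupProbe` is an alternative: the liveness probe is held back until the startup probe succeeds. The fragment below is a sketch that reuses the `/health` path and port 8080 from the example above; the thresholds and timeouts are assumed values to tune per application.

```yaml
# Fragment of a container spec
startupProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 5
  failureThreshold: 30    # up to 30 attempts x 5s = 150s allowed for startup (assumed budget)

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 2       # assumed; each check fails if no response arrives within 2s
  failureThreshold: 3     # restart the container after 3 consecutive failures
```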

Common Pitfalls and Solutions

1. Missing Metrics Configuration

# ❌ Bad: No metrics configuration
metadata:
  labels:
    app: myapp

# ✅ Good: Proper metrics configuration
metadata:
  labels:
    app: myapp
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"

2. Inadequate Health Checks

# ❌ Bad: No health checks
spec:
  containers:
    - name: app
      image: myapp:latest

# ✅ Good: Proper health checks
spec:
  containers:
    - name: app
      image: myapp:latest
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080

Conclusion

Monitoring and logging are essential for maintaining healthy and performant Kubernetes applications. This chapter covered:

  • What monitoring and logging are
  • Why they're crucial for system visibility and troubleshooting
  • How to implement them with Prometheus, Grafana, Fluentd, Elasticsearch, and Kibana

With these practices in place, you can build applications that are observable, maintainable, and performant, and you can quickly identify and resolve issues in production environments.

Next Steps

  • Practice with different monitoring scenarios
  • Learn about security and RBAC
  • Move on to Chapter 10: Security & RBAC

This tutorial is part of the Kubernetes Mastery series by syscook.dev