Chapter 12: Production Best Practices
Authored by syscook.dev
What are Production Best Practices?
Production best practices in Kubernetes are proven methodologies, configurations, and operational procedures that ensure your applications run reliably, securely, and efficiently in production environments. These practices cover security, performance, monitoring, disaster recovery, and operational excellence.
Key Concepts:
- Security Hardening: Multi-layered security approach
- Performance Optimization: Resource efficiency and scaling
- Monitoring and Observability: Comprehensive visibility
- Disaster Recovery: Backup and restore strategies
- Operational Excellence: Automation and reliability
- Compliance: Regulatory and industry standards
Why Follow Production Best Practices?
1. Security and Compliance
Protect your applications and data from threats and ensure regulatory compliance.
# Example: Secure pod configuration
apiVersion: v1
kind: Pod
metadata:
name: secure-pod
namespace: production
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
runAsGroup: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
containers:
- name: app
image: myapp:latest
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
capabilities:
drop:
- ALL
add:
- NET_BIND_SERVICE
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
2. High Availability and Reliability
Ensure your applications are always available and can handle failures gracefully.
# Example: High availability deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: ha-deployment
namespace: production
spec:
replicas: 5
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 2
selector:
matchLabels:
app: web-app
template:
metadata:
labels:
app: web-app
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- web-app
topologyKey: kubernetes.io/hostname
containers:
- name: web-server
image: nginx:1.20
ports:
- containerPort: 80
resources:
requests:
memory: "128Mi"
cpu: "250m"
limits:
memory: "256Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: 80
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 80
initialDelaySeconds: 5
periodSeconds: 5
3. Performance and Scalability
Optimize resource usage and enable automatic scaling.
# Example: Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: web-app-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: web-app-deployment
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 50
periodSeconds: 60
How to Implement Production Best Practices?
1. Security Hardening
Pod Security Standards
# Example: Pod Security Policy
apiVersion: v1
kind: Namespace
metadata:
name: secure-namespace
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: restricted
pod-security.kubernetes.io/warn: restricted
---
apiVersion: v1
kind: LimitRange
metadata:
name: secure-limit-range
namespace: secure-namespace
spec:
limits:
- default:
memory: "512Mi"
cpu: "500m"
defaultRequest:
memory: "256Mi"
cpu: "250m"
type: Container
- max:
memory: "1Gi"
cpu: "1000m"
min:
memory: "128Mi"
cpu: "100m"
type: Container
Network Security
# Example: Comprehensive network policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
namespace: production
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-ingress-nginx
namespace: production
spec:
podSelector:
matchLabels:
app: web-app
policyTypes:
- Ingress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx
ports:
- protocol: TCP
port: 80
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-database-access
namespace: production
spec:
podSelector:
matchLabels:
app: postgres
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
app: api
ports:
- protocol: TCP
port: 5432
2. Monitoring and Observability
Comprehensive Monitoring
# Example: Monitoring configuration
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: monitoring
data:
prometheus.yml: |
global:
scrape_interval: 15s
evaluation_interval: 15s
rule_files:
- "alert_rules.yml"
alerting:
alertmanagers:
- static_configs:
- targets:
- alertmanager:9093
scrape_configs:
- job_name: 'kubernetes-apiservers'
kubernetes_sd_configs:
- role: endpoints
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
- job_name: 'kubernetes-nodes'
kubernetes_sd_configs:
- role: node
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
Log Aggregation
# Example: Fluentd configuration
apiVersion: v1
kind: ConfigMap
metadata:
name: fluentd-config
namespace: monitoring
data:
fluent.conf: |
<source>
@type tail
path /var/log/containers/*.log
pos_file /var/log/fluentd-containers.log.pos
tag kubernetes.*
format json
time_key time
time_format %Y-%m-%dT%H:%M:%S.%NZ
</source>
<filter kubernetes.**>
@type kubernetes_metadata
</filter>
<match kubernetes.**>
@type elasticsearch
host elasticsearch.monitoring.svc.cluster.local
port 9200
index_name kubernetes
type_name _doc
</match>
3. Disaster Recovery
Backup Strategy
# Example: Backup CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
name: database-backup
namespace: production
spec:
schedule: "0 2 * * *" # Daily at 2 AM
jobTemplate:
spec:
template:
spec:
containers:
- name: backup
image: postgres:13
command:
- /bin/bash
- -c
- |
# Create backup
pg_dump -h postgres-service -U user -d myapp > /backup/backup-$(date +%Y%m%d).sql
# Compress backup
gzip /backup/backup-$(date +%Y%m%d).sql
# Upload to S3
aws s3 cp /backup/backup-$(date +%Y%m%d).sql.gz s3://my-backup-bucket/
# Clean up local backup
rm /backup/backup-$(date +%Y%m%d).sql.gz
echo "Backup completed successfully"
env:
- name: PGPASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: password
- name: AWS_ACCESS_KEY_ID
valueFrom:
secretKeyRef:
name: aws-secret
key: access-key-id
- name: AWS_SECRET_ACCESS_KEY
valueFrom:
secretKeyRef:
name: aws-secret
key: secret-access-key
volumeMounts:
- name: backup-storage
mountPath: /backup
volumes:
- name: backup-storage
emptyDir: {}
restartPolicy: OnFailure
Practical Examples
1. Complete Production Setup
Step 1: Security Configuration
#!/bin/bash
# setup-security.sh
# Create secure namespace
kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
name: production
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: restricted
pod-security.kubernetes.io/warn: restricted
EOF
# Create network policies
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
namespace: production
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-ingress-nginx
namespace: production
spec:
podSelector:
matchLabels:
app: web-app
policyTypes:
- Ingress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx
ports:
- protocol: TCP
port: 80
EOF
echo "Security configuration completed!"
Step 2: Monitoring Setup
#!/bin/bash
# setup-monitoring.sh
# Create monitoring namespace
kubectl create namespace monitoring
# Install Prometheus and Grafana
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--set grafana.adminPassword=admin123 \
--set prometheus.prometheusSpec.retention=30d \
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
echo "Monitoring setup completed!"
Step 3: Application Deployment
#!/bin/bash
# deploy-application.sh
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-app-deployment
namespace: production
spec:
replicas: 5
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 2
selector:
matchLabels:
app: web-app
template:
metadata:
labels:
app: web-app
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- web-app
topologyKey: kubernetes.io/hostname
containers:
- name: web-server
image: nginx:1.20
ports:
- containerPort: 80
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
capabilities:
drop:
- ALL
resources:
requests:
memory: "128Mi"
cpu: "250m"
limits:
memory: "256Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: 80
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 80
initialDelaySeconds: 5
periodSeconds: 5
EOF
echo "Application deployed successfully!"
Best Practices
1. Security First
# Good: Security-first configuration
securityContext:
runAsNonRoot: true
runAsUser: 1000
runAsGroup: 1000
fsGroup: 1000
seccompProfile:
type: RuntimeDefault
2. Resource Management
# Good: Proper resource management
resources:
requests:
memory: "128Mi"
cpu: "250m"
limits:
memory: "256Mi"
cpu: "500m"
3. Health Checks
# Good: Comprehensive health checks
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
4. Monitoring and Observability
# Good: Comprehensive monitoring
metadata:
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
Common Pitfalls and Solutions
1. Security Issues
# ❌ Bad: Insecure configuration
spec:
containers:
- name: app
image: myapp:latest
securityContext:
runAsUser: 0
# ✅ Good: Secure configuration
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
containers:
- name: app
image: myapp:latest
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
2. Resource Issues
# ❌ Bad: No resource limits
spec:
containers:
- name: app
image: myapp:latest
# ✅ Good: Proper resource limits
spec:
containers:
- name: app
image: myapp:latest
resources:
requests:
memory: "128Mi"
cpu: "250m"
limits:
memory: "256Mi"
cpu: "500m"
3. Monitoring Issues
# ❌ Bad: No monitoring
metadata:
labels:
app: myapp
# ✅ Good: Proper monitoring
metadata:
labels:
app: myapp
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
Conclusion
Production best practices are essential for running reliable, secure, and performant Kubernetes applications. By understanding:
- What production best practices are and their importance
- Why they're crucial for production environments
- How to implement them effectively
You can create robust applications that meet enterprise requirements for security, performance, and reliability. Following these practices ensures that your applications are production-ready and can handle real-world challenges.
Next Steps
- Practice with different production scenarios
- Implement monitoring and alerting
- Set up disaster recovery procedures
- Continuously improve and optimize
This tutorial is part of the Kubernetes Mastery series by syscook.dev