KubeNidra Agent
Configuration
The KubeNidra Agent uses a YAML configuration file to control its behavior. Configuration can be loaded from a file or set via environment variables.
Configuration File
By default, the agent looks for configuration in /etc/kubenidra/kubenidra.yaml. You can specify a custom path using the -c flag:
```shell
KUBECONFIG=~/.kube/config ./bin/kubenidra-agent -c /path/to/config.yaml
```
Configuration Structure
```yaml
# Logging
log_level: "info" # debug, info, warn, error

# Server configuration
server:
  port: 8118 # HTTP server port

# Prometheus configuration
prometheus:
  endpoint: "http://prometheus:9090" # Prometheus server URL
  timeout: "30s" # Query timeout

# Namespaces to watch
watched_namespaces:
  - "development"
  - "staging"
  - "test"
  # Use an empty array or omit this key to watch all namespaces

# Snooze configuration
snooze:
  # Resource thresholds
  cpu_threshold: 0.01 # CPU threshold in cores (0.01 = 10m, i.e. 10 millicores)
  memory_threshold: 52428800 # Memory threshold in bytes (50 MiB)

  # Timing configuration
  idle_duration: "10m" # How long a workload must be idle
  check_interval: "5m" # How often to check workloads
  wake_check_interval: "1m" # How often to check for wake conditions

  # Grace periods
  wake_grace_period: "30m" # Grace period after a manual wake
  operation_cooldown: "5m" # Cooldown between operations
  new_workload_grace_period: "20m" # Extra grace period for new workloads

  # Behavior mode
  behavior_mode: "aggressive" # aggressive, conservative, manual

  # Smart automation
  max_operations_per_hour: 10 # Max operations per hour before backoff
  operation_history_limit: 50 # Number of operations to track
  backoff_multiplier: 2.0 # Multiplier for backoff calculation
  max_backoff_duration: "4h" # Maximum backoff duration

  # Metrics validation
  minimum_running_duration: "15m" # Min time a workload must have been running
  minimum_data_coverage: 0.8 # Min fraction of the time window with data (0.8 = 80%)
  minimum_data_points: 5 # Min number of data points required
  prometheus_validation_enabled: true # Enable validation by default
```
Environment Variables
All configuration options can also be set via environment variables with the KUBENIDRA_ prefix. A few examples:
```shell
# Logging
export KUBENIDRA_LOG_LEVEL="info"
# Server configuration
export KUBENIDRA_SERVER_PORT=8118
# Prometheus configuration
export KUBENIDRA_PROMETHEUS_ENDPOINT="http://prometheus:9090"
export KUBENIDRA_PROMETHEUS_TIMEOUT="30s"
# Namespaces
export KUBENIDRA_WATCHED_NAMESPACES="development,staging,test"
# Snooze configuration
export KUBENIDRA_SNOOZE_CPU_THRESHOLD=0.01
export KUBENIDRA_SNOOZE_MEMORY_THRESHOLD=52428800
export KUBENIDRA_SNOOZE_IDLE_DURATION="10m"
export KUBENIDRA_SNOOZE_CHECK_INTERVAL="5m"
export KUBENIDRA_SNOOZE_BEHAVIOR_MODE="aggressive"
# ...
```
Configuration Options
Server Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| server.port | int | 8118 | HTTP server port for health checks and metrics |
Prometheus Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| prometheus.endpoint | string | "http://localhost:9090" | Prometheus server URL |
| prometheus.timeout | duration | "30s" | Query timeout for Prometheus requests |
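To show how prometheus.endpoint and prometheus.timeout fit together, the sketch below builds a request URL for the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`), which accepts `query` and `timeout` parameters. The `build_query_url` helper and the PromQL expression are hypothetical illustrations. The agent's actual queries are not documented here.

```python
from urllib.parse import urlencode


def build_query_url(endpoint: str, promql: str, timeout: str = "30s") -> str:
    """Build an instant-query URL for the Prometheus HTTP API.

    endpoint maps to prometheus.endpoint and timeout to prometheus.timeout.
    Hypothetical helper for illustration; not part of KubeNidra.
    """
    params = urlencode({"query": promql, "timeout": timeout})
    return f"{endpoint.rstrip('/')}/api/v1/query?{params}"


# Hypothetical PromQL: CPU usage (in cores) of all containers in one namespace.
query = 'sum(rate(container_cpu_usage_seconds_total{namespace="staging"}[5m]))'
print(build_query_url("http://prometheus:9090", query))
```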
Snooze Configuration
Resource Thresholds
| Option | Type | Default | Description |
|---|---|---|---|
| snooze.cpu_threshold | float | 0.01 | CPU threshold in cores (0.01 = 10m, i.e. 10 millicores) |
| snooze.memory_threshold | int | 52428800 | Memory threshold in bytes (52428800 = 50 MiB) |
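Below is a minimal sketch of how the two thresholds might be applied. It assumes that a workload counts as idle only when both its CPU usage (in cores) and its memory usage (in bytes) fall strictly below the thresholds. That combination rule, the strict comparison, and the `is_idle` and `Thresholds` names are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    cpu_cores: float = 0.01       # snooze.cpu_threshold (10 millicores)
    memory_bytes: int = 52428800  # snooze.memory_threshold (50 MiB)


def is_idle(cpu_cores: float, memory_bytes: int, t: Thresholds = Thresholds()) -> bool:
    """Assumed rule: idle only if BOTH usages are below their thresholds."""
    return cpu_cores < t.cpu_cores and memory_bytes < t.memory_bytes


MiB = 1024 * 1024
print(is_idle(0.004, 30 * MiB))  # True: 4m CPU and 30 MiB are both below the defaults
print(is_idle(0.250, 30 * MiB))  # False: 250m CPU is above 10m
```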
Timing Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| snooze.idle_duration | duration | "10m" | How long a workload must be idle before it is snoozed |
| snooze.check_interval | duration | "5m" | How often workloads are checked for snooze conditions |
| snooze.wake_check_interval | duration | "1m" | How often to check for wake conditions |
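Duration options are written as strings such as "30s", "10m" or "4h". The sketch below parses that form (h, m and s units only, as a simplified stand-in for Go-style durations) and applies idle_duration to a workload's idle start time. The `parse_duration` and `idle_long_enough` helpers are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

_PART = re.compile(r"(\d+)([hms])")
_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> timedelta:
    """Parse "30s", "10m", "4h" or combinations such as "1h30m".

    Simplified stand-in for Go-style durations: only h, m and s units.
    """
    parts = _PART.findall(text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"invalid duration: {text!r}")
    return timedelta(seconds=sum(int(n) * _SECONDS[u] for n, u in parts))


def idle_long_enough(idle_since: datetime, now: datetime, idle_duration: str = "10m") -> bool:
    """True once the workload has been continuously idle for idle_duration."""
    return now - idle_since >= parse_duration(idle_duration)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(idle_long_enough(now - timedelta(minutes=12), now))  # True
print(idle_long_enough(now - timedelta(minutes=4), now))   # False
```

Workloads are only evaluated every check_interval (5m by default). A workload that crosses idle_duration between two checks would therefore presumably be snoozed at the next check, so the observed delay can be somewhat longer than idle_duration.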
Grace Periods
| Option | Type | Default | Description |
|---|---|---|---|
| snooze.wake_grace_period | duration | "30m" | Grace period after a manual wake |
| snooze.operation_cooldown | duration | "5m" | Cooldown between operations |
| snooze.new_workload_grace_period | duration | "20m" | Extra grace period for newly created workloads |
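The following sketch shows one plausible reading of the three grace periods. Under this reading, a workload is left alone if it was created less than new_workload_grace_period ago, was manually woken less than wake_grace_period ago, or had any operation less than operation_cooldown ago. The order of these checks, and the `in_grace_period` name, are assumptions rather than the agent's documented logic.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

WAKE_GRACE_PERIOD = timedelta(minutes=30)          # snooze.wake_grace_period
OPERATION_COOLDOWN = timedelta(minutes=5)          # snooze.operation_cooldown
NEW_WORKLOAD_GRACE_PERIOD = timedelta(minutes=20)  # snooze.new_workload_grace_period


def in_grace_period(
    now: datetime,
    created_at: datetime,
    last_manual_wake: Optional[datetime] = None,
    last_operation: Optional[datetime] = None,
) -> Optional[str]:
    """Return why a workload is protected from snoozing, or None if it is not."""
    if now - created_at < NEW_WORKLOAD_GRACE_PERIOD:
        return "new workload"
    if last_manual_wake is not None and now - last_manual_wake < WAKE_GRACE_PERIOD:
        return "recently woken manually"
    if last_operation is not None and now - last_operation < OPERATION_COOLDOWN:
        return "operation cooldown"
    return None


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(in_grace_period(now, created_at=now - timedelta(minutes=10)))  # new workload
print(in_grace_period(now, created_at=now - timedelta(days=1),
                      last_manual_wake=now - timedelta(minutes=45)))  # None
```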
Behavior Mode
| Option | Type | Default | Description |
|---|---|---|---|
| snooze.behavior_mode | string | "aggressive" | Behavior mode: aggressive, conservative, manual |
Behavior Modes:
- aggressive: Actively snoozes idle workloads
- conservative: More cautious, requires longer idle periods
- manual: Only snoozes on explicit requests
Smart Automation
| Option | Type | Default | Description |
|---|---|---|---|
| snooze.max_operations_per_hour | int | 10 | Maximum operations per hour before backoff applies |
| snooze.operation_history_limit | int | 50 | Number of past operations to track |
| snooze.backoff_multiplier | float | 2.0 | Multiplier for backoff calculation |
| snooze.max_backoff_duration | duration | "4h" | Maximum backoff duration |
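Below is a minimal exponential-backoff sketch built from these settings. Operations are kept in a history bounded by operation_history_limit. Once the count within the last hour reaches max_operations_per_hour, each additional operation multiplies the wait by backoff_multiplier, and the result is capped at max_backoff_duration. The base delay (reusing operation_cooldown, 5m) and the exact trigger are assumptions; the table only names the knobs.

```python
from collections import deque
from datetime import datetime, timedelta, timezone

MAX_OPERATIONS_PER_HOUR = 10       # snooze.max_operations_per_hour
OPERATION_HISTORY_LIMIT = 50       # snooze.operation_history_limit
BACKOFF_MULTIPLIER = 2.0           # snooze.backoff_multiplier
MAX_BACKOFF = timedelta(hours=4)   # snooze.max_backoff_duration
BASE_DELAY = timedelta(minutes=5)  # assumption: operation_cooldown used as the base


class OperationHistory:
    """Bounded record of recent snooze/wake operations (hypothetical)."""

    def __init__(self) -> None:
        # deque drops the oldest entry once the limit is reached.
        self.timestamps: deque = deque(maxlen=OPERATION_HISTORY_LIMIT)

    def record(self, when: datetime) -> None:
        self.timestamps.append(when)

    def backoff(self, now: datetime) -> timedelta:
        """Extra wait before the next operation (zero while under the limit)."""
        recent = sum(1 for t in self.timestamps if now - t <= timedelta(hours=1))
        if recent < MAX_OPERATIONS_PER_HOUR:
            return timedelta(0)
        excess = recent - MAX_OPERATIONS_PER_HOUR
        # Compute in seconds so large exponents cannot overflow timedelta.
        seconds = BASE_DELAY.total_seconds() * BACKOFF_MULTIPLIER ** excess
        return timedelta(seconds=min(seconds, MAX_BACKOFF.total_seconds()))


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
history = OperationHistory()
for minutes_ago in range(12):  # 12 operations within the last hour
    history.record(now - timedelta(minutes=minutes_ago))
print(history.backoff(now))  # 0:20:00 (5m * 2.0 ** 2)
```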
Metrics Validation
| Option | Type | Default | Description |
|---|---|---|---|
| snooze.minimum_running_duration | duration | "15m" | Minimum time a workload must have been running before it can be snoozed |
| snooze.minimum_data_coverage | float | 0.8 | Minimum fraction of the time window that must have data (0.8 = 80%) |
| snooze.minimum_data_points | int | 5 | Minimum number of data points required |
| snooze.prometheus_validation_enabled | bool | true | Enable Prometheus validation |
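The metrics-validation settings could be evaluated as in the sketch below. It assumes samples arrive at a fixed scrape step over the evaluation window, and it treats coverage as the fraction of expected samples actually present. The `has_enough_data` helper and the window/step values are hypothetical; the agent's exact formula is not documented here.

```python
from datetime import timedelta

MINIMUM_RUNNING_DURATION = timedelta(minutes=15)  # snooze.minimum_running_duration
MINIMUM_DATA_COVERAGE = 0.8                       # snooze.minimum_data_coverage
MINIMUM_DATA_POINTS = 5                           # snooze.minimum_data_points


def has_enough_data(samples: int, window: timedelta, step: timedelta,
                    running_for: timedelta) -> bool:
    """Decide whether the Prometheus data is complete enough to act on."""
    if running_for < MINIMUM_RUNNING_DURATION:
        return False  # workload has not been running long enough
    if samples < MINIMUM_DATA_POINTS:
        return False  # too few points to judge idleness
    expected = window // step  # samples a gap-free window would contain
    coverage = samples / expected if expected else 0.0
    return coverage >= MINIMUM_DATA_COVERAGE


# A 10-minute window scraped every minute should hold 10 samples.
print(has_enough_data(9, timedelta(minutes=10), timedelta(minutes=1), timedelta(hours=2)))  # True
print(has_enough_data(6, timedelta(minutes=10), timedelta(minutes=1), timedelta(hours=2)))  # False
```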
Namespace Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| watched_namespaces | []string | [""] | Namespaces to watch (empty string = all namespaces) |
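The namespace filter is sketched below. It assumes that both an empty list and the default [""] mean "watch every namespace", and that KUBENIDRA_WATCHED_NAMESPACES is a comma-separated list, as in the environment-variable examples. Both helper names are hypothetical.

```python
import os
from typing import List, Mapping, Optional


def watched_namespaces_from_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Parse the comma-separated KUBENIDRA_WATCHED_NAMESPACES variable."""
    env = os.environ if env is None else env
    raw = env.get("KUBENIDRA_WATCHED_NAMESPACES", "")
    return [ns.strip() for ns in raw.split(",") if ns.strip()]


def is_watched(namespace: str, watched: List[str]) -> bool:
    """Assumed rule: [] and [""] both mean "watch all namespaces"."""
    names = [ns for ns in watched if ns]
    return not names or namespace in names


watched = watched_namespaces_from_env({"KUBENIDRA_WATCHED_NAMESPACES": "development,staging,test"})
print(watched)                            # ['development', 'staging', 'test']
print(is_watched("production", watched))  # False
print(is_watched("production", [""]))     # True (default: all namespaces)
```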
Logging Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| log_level | string | "info" | Log level: debug, info, warn, error |
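Each option in these tables can also be set from the environment. The names in the Environment Variables examples follow a mechanical pattern: prefix KUBENIDRA_, uppercase the option path, and replace dots with underscores. The hypothetical helper below applies that pattern. Names for options not shown in the examples are extrapolated from it rather than taken from the agent's source.

```python
PREFIX = "KUBENIDRA_"


def env_var_name(option: str) -> str:
    """Map a dotted option path (as in the tables) to its environment variable."""
    return PREFIX + option.replace(".", "_").upper()


for option in ("log_level", "server.port", "prometheus.timeout", "snooze.max_backoff_duration"):
    print(f"{option:30} -> {env_var_name(option)}")
```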
Configuration Validation
Configuration values must satisfy these rules:
- Port: Must be between 1 and 65535
- Timeouts: Must be a positive duration
- Thresholds: Must be positive numbers
- Intervals: Must be a positive duration
- Coverage: Must be between 0.0 and 1.0
- Data points: Must be a positive integer
- Behavior mode: Must be one of aggressive, conservative, manual
- Log level: Must be one of debug, info, warn, error
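The validation rules can be expressed as a single check over the parsed configuration. The sketch below assumes the YAML has already been loaded into a nested Python dict and fills in table defaults for missing keys. The `validate` function is hypothetical and mirrors the listed rules; it is not the agent's own validator.

```python
import re
from typing import Any, Dict, List

BEHAVIOR_MODES = {"aggressive", "conservative", "manual"}
LOG_LEVELS = {"debug", "info", "warn", "error"}
_DURATION = re.compile(r"(\d+[hms])+")


def _positive_duration(value: str) -> bool:
    # Well-formed h/m/s duration with at least one non-zero component.
    return bool(_DURATION.fullmatch(value)) and any(int(n) for n in re.findall(r"\d+", value))


def validate(cfg: Dict[str, Any]) -> List[str]:
    """Check a parsed config dict against the rules; return all violations."""
    errors: List[str] = []
    snooze = cfg.get("snooze", {})

    port = cfg.get("server", {}).get("port", 8118)
    if not 1 <= port <= 65535:
        errors.append(f"server.port must be between 1 and 65535, got {port}")

    for key in ("cpu_threshold", "memory_threshold"):
        if snooze.get(key, 1) <= 0:
            errors.append(f"snooze.{key} must be positive")

    durations = {"prometheus.timeout": cfg.get("prometheus", {}).get("timeout")}
    for key in ("idle_duration", "check_interval", "wake_check_interval"):
        durations[f"snooze.{key}"] = snooze.get(key)
    for name, value in durations.items():
        if value is not None and not _positive_duration(value):
            errors.append(f"{name} must be a positive duration, got {value!r}")

    if not 0.0 <= snooze.get("minimum_data_coverage", 0.8) <= 1.0:
        errors.append("snooze.minimum_data_coverage must be between 0.0 and 1.0")

    points = snooze.get("minimum_data_points", 5)
    if not isinstance(points, int) or points <= 0:
        errors.append("snooze.minimum_data_points must be a positive integer")

    if snooze.get("behavior_mode", "aggressive") not in BEHAVIOR_MODES:
        errors.append("snooze.behavior_mode must be aggressive, conservative or manual")
    if cfg.get("log_level", "info") not in LOG_LEVELS:
        errors.append("log_level must be debug, info, warn or error")
    return errors


bad = {"server": {"port": 70000}, "snooze": {"minimum_data_coverage": 80, "behavior_mode": "lazy"}}
for problem in validate(bad):
    print(problem)
```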
Configuration Reload
The agent does not support hot reloading. Configuration changes require a restart:
```shell
# Restart the agent
kubectl rollout restart deployment/kubenidra-agent -n kubenidra
# Check status
kubectl rollout status deployment/kubenidra-agent -n kubenidra
```
Configuration Best Practices
1. Start Conservative
Begin with conservative settings and gradually adjust:
```yaml
snooze:
  behavior_mode: "conservative"
  idle_duration: "30m"
  check_interval: "10m"
  wake_grace_period: "60m"
```
2. Use Namespace Isolation
Limit the agent's scope to specific namespaces:
```yaml
watched_namespaces:
  - "development"
  - "staging"
  # Avoid production initially
```
3. Enable Prometheus Validation
Always enable Prometheus validation in production:
```yaml
snooze:
  prometheus_validation_enabled: true
  minimum_data_coverage: 0.8
  minimum_data_points: 5
```
4. Set Appropriate Thresholds
Adjust thresholds based on your workloads:
```yaml
snooze:
  cpu_threshold: 0.1 # 100m (100 millicores)
  memory_threshold: 104857600 # 100 MiB
  # Adjust based on your application patterns
```
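cpu_threshold is expressed in cores and memory_threshold in bytes. It can therefore help to derive both from the Kubernetes-style quantities already used in resource requests, such as 100m or 100Mi. The hypothetical helpers below handle only the common suffixes: m (milli) for CPU, binary Ki/Mi/Gi (powers of 1024) and decimal k/M/G (powers of 1000) for memory. They are not part of KubeNidra.

```python
BINARY = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
DECIMAL = {"k": 1000, "M": 1000**2, "G": 1000**3}


def cpu_cores(quantity: str) -> float:
    """Convert a CPU quantity such as "100m" or "0.5" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "100Mi" or "128M" to bytes."""
    # Binary suffixes are checked first; none of them ends with a decimal suffix.
    for suffix, factor in {**BINARY, **DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


print(cpu_cores("100m"), memory_bytes("100Mi"))  # 0.1 104857600
print(cpu_cores("10m"), memory_bytes("50Mi"))    # 0.01 52428800
```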