Configuration

The KubeNidra Agent uses a YAML configuration file to control its behavior. Configuration can be loaded from a file or set via environment variables.

Configuration File

By default, the agent looks for configuration in /etc/kubenidra/kubenidra.yaml. You can specify a custom path using the -c flag:

KUBECONFIG=~/.kube/config ./bin/kubenidra-agent -c /path/to/config.yaml

Configuration Structure

# Logging
log_level: "info" # debug, info, warn, error

# Server configuration
server:
  port: 8118 # HTTP server port

# Prometheus configuration
prometheus:
  endpoint: "http://prometheus:9090" # Prometheus server URL
  timeout: "30s" # Query timeout

# Namespaces to watch
watched_namespaces:
  - "development"
  - "staging"
  - "test"
  # Empty array or omit to watch all namespaces

# Snooze configuration
snooze:
  # Resource thresholds
  cpu_threshold: 0.01 # CPU threshold in cores (0.01 = 10 millicores, written 10m)
  memory_threshold: 52428800 # Memory threshold in bytes (50 MiB)

  # Timing configuration
  idle_duration: "10m" # How long workload must be idle
  check_interval: "5m" # How often to check workloads
  wake_check_interval: "1m" # How often to check for wake conditions

  # Grace periods
  wake_grace_period: "30m" # Grace period after manual wake
  operation_cooldown: "5m" # Cooldown between operations
  new_workload_grace_period: "20m" # Extra grace for new workloads

  # Behavior mode
  behavior_mode: "aggressive" # aggressive, conservative, manual

  # Smart automation
  max_operations_per_hour: 10 # Max operations before backoff
  operation_history_limit: 50 # Number of operations to track
  backoff_multiplier: 2.0 # Multiplier for backoff calculation
  max_backoff_duration: "4h" # Maximum backoff duration

  # Metrics validation
  minimum_running_duration: "15m" # Min time workload must run
  minimum_data_coverage: 0.8 # Min fraction of the time window with data (0.8 = 80%)
  minimum_data_points: 5 # Min number of data points required
  prometheus_validation_enabled: true # Enable Prometheus validation

Environment Variables

All configuration options can be set via environment variables with the KUBENIDRA_ prefix. A few examples:

# Logging
export KUBENIDRA_LOG_LEVEL="info"

# Server configuration
export KUBENIDRA_SERVER_PORT=8118

# Prometheus configuration
export KUBENIDRA_PROMETHEUS_ENDPOINT="http://prometheus:9090"
export KUBENIDRA_PROMETHEUS_TIMEOUT="30s"

# Namespaces
export KUBENIDRA_WATCHED_NAMESPACES="development,staging,test"

# Snooze configuration
export KUBENIDRA_SNOOZE_CPU_THRESHOLD=0.01
export KUBENIDRA_SNOOZE_MEMORY_THRESHOLD=52428800
export KUBENIDRA_SNOOZE_IDLE_DURATION="10m"
export KUBENIDRA_SNOOZE_CHECK_INTERVAL="5m"
export KUBENIDRA_SNOOZE_BEHAVIOR_MODE="aggressive"
...
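
The examples above suggest a naming convention: take the dotted option path, replace the dots with underscores, uppercase it, and add the KUBENIDRA_ prefix. The following is a minimal Python sketch of that mapping, assuming the convention holds for every option; the helper name `env_var_name` is hypothetical and not part of the agent.

```python
def env_var_name(option: str, prefix: str = "KUBENIDRA_") -> str:
    """Map a dotted option path to the variable name implied by the examples.

    Assumption: every option follows the pattern seen in the examples
    (dots become underscores, everything is uppercased).
    """
    return prefix + option.replace(".", "_").upper()


for option in ("log_level", "server.port", "snooze.cpu_threshold"):
    print(f"{option} -> {env_var_name(option)}")
```

This can help when translating an existing YAML file into environment variables for a container spec.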

Configuration Options

Server Configuration

Option	Type	Default	Description
`server.port`	int	8118	HTTP server port for health checks and metrics

Prometheus Configuration

Option	Type	Default	Description
`prometheus.endpoint`	string	"http://localhost:9090"	Prometheus server URL
`prometheus.timeout`	duration	"30s"	Query timeout for Prometheus requests

Snooze Configuration

Resource Thresholds

Option	Type	Default	Description
`snooze.cpu_threshold`	float	0.01	CPU threshold in cores (0.01 = 10 millicores, written 10m)
`snooze.memory_threshold`	int	52428800	Memory threshold in bytes (50 MiB)
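
Because the thresholds are given in cores and bytes rather than in Kubernetes-style quantities, a little arithmetic is needed when choosing values: 0.01 cores is 10 millicores, and 52428800 bytes is 50 × 1024 × 1024. The sketch below shows the conversions; the helper names are illustrative and not part of the agent.

```python
def cores_to_millicores(cores: float) -> int:
    """Convert a CPU threshold in cores to millicores (1 core = 1000m)."""
    return round(cores * 1000)


def mib_to_bytes(mib: int) -> int:
    """Convert mebibytes to bytes (1 MiB = 1024 * 1024 bytes)."""
    return mib * 1024 * 1024


print(cores_to_millicores(0.01))  # 10
print(mib_to_bytes(50))           # 52428800
```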

Timing Configuration

Option	Type	Default	Description
`snooze.idle_duration`	duration	"10m"	How long workload must be idle before snoozing
`snooze.check_interval`	duration	"5m"	How often to check workloads for snooze conditions
`snooze.wake_check_interval`	duration	"1m"	How often to check for wake conditions

Grace Periods

Option	Type	Default	Description
`snooze.wake_grace_period`	duration	"30m"	Grace period after manual wake
`snooze.operation_cooldown`	duration	"5m"	Cooldown between operations
`snooze.new_workload_grace_period`	duration	"20m"	Extra grace for new workloads

Behavior Mode

Option	Type	Default	Description
`snooze.behavior_mode`	string	"aggressive"	Behavior mode: aggressive, conservative, manual

Behavior Modes:

aggressive: Actively snoozes idle workloads
conservative: More cautious, requires longer idle periods
manual: Only snoozes on explicit requests

Smart Automation

Option	Type	Default	Description
`snooze.max_operations_per_hour`	int	10	Max operations before backoff
`snooze.operation_history_limit`	int	50	Number of operations to track
`snooze.backoff_multiplier`	float	2.0	Multiplier for backoff calculation
`snooze.max_backoff_duration`	duration	"4h"	Maximum backoff duration
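
This page does not spell out the backoff formula. One plausible reading of `backoff_multiplier` and `max_backoff_duration` is a capped exponential backoff, sketched below in Python. The function name, the 5-minute base delay, and the idea of counting steps past the hourly limit are all assumptions made for illustration, not the agent's documented behavior.

```python
from datetime import timedelta


def backoff_delay(
    excess_operations: int,
    base: timedelta = timedelta(minutes=5),         # assumed base delay
    multiplier: float = 2.0,                        # snooze.backoff_multiplier
    max_backoff: timedelta = timedelta(hours=4),    # snooze.max_backoff_duration
) -> timedelta:
    """Grow the delay by `multiplier` per excess operation, capped at `max_backoff`."""
    cap = max_backoff.total_seconds()
    delay = base.total_seconds()
    for _ in range(excess_operations):
        delay *= multiplier
        if delay >= cap:
            # Stop early so very large counts cannot overflow.
            return max_backoff
    return timedelta(seconds=min(delay, cap))


for n in (0, 1, 3, 6):
    print(n, backoff_delay(n))
```

Under these assumptions the delay doubles from 5 minutes with each additional operation and stops growing once it reaches 4 hours.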

Metrics Validation

Option	Type	Default	Description
`snooze.minimum_running_duration`	duration	"15m"	Min time workload must run before snoozing
`snooze.minimum_data_coverage`	float	0.8	Min fraction of the time window with data (0.8 = 80%)
`snooze.minimum_data_points`	int	5	Min number of data points required
`snooze.prometheus_validation_enabled`	bool	true	Enable Prometheus validation
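
The coverage and data-point settings can be read as a gate that must pass before metrics are trusted. The Python sketch below illustrates one way such a gate could work, assuming coverage is the number of samples received divided by the number expected for the window at a fixed scrape step. That definition, the function name, and its parameters are assumptions; the agent's actual calculation is not described here.

```python
def has_enough_data(
    sample_count: int,
    window_seconds: int,
    step_seconds: int,
    min_coverage: float = 0.8,   # snooze.minimum_data_coverage
    min_points: int = 5,         # snooze.minimum_data_points
) -> bool:
    """Return True when both the point count and the coverage fraction are met."""
    expected = max(1, window_seconds // step_seconds)
    coverage = min(1.0, sample_count / expected)
    return sample_count >= min_points and coverage >= min_coverage


# 10-minute window scraped every 60 seconds: 10 samples expected.
print(has_enough_data(8, window_seconds=600, step_seconds=60))
print(has_enough_data(7, window_seconds=600, step_seconds=60))
```

With the defaults, 8 of 10 expected samples passes, while 7 of 10 falls below the 0.8 coverage requirement.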

Namespace Configuration

Option	Type	Default	Description
`watched_namespaces`	[]string	[""]	Namespaces to watch (empty string = all namespaces)

Logging Configuration

Option	Type	Default	Description
`log_level`	string	"info"	Log level: debug, info, warn, error

Configuration Validation

Port: Must be between 1 and 65535
Timeouts: Must be a positive duration
Thresholds: Must be positive numbers
Intervals: Must be a positive duration
Coverage: Must be between 0.0 and 1.0
Data points: Must be a positive integer
Behavior mode: Must be one of: aggressive, conservative, manual
Log level: Must be one of: debug, info, warn, error
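
To catch mistakes before restarting the agent, the rules above can be approximated in a small pre-flight check. The following Python sketch operates on a dictionary shaped like the YAML file. The function names, the choice of which options count as "timeouts" and "intervals", the set of accepted duration units, and the decision to skip absent options are all assumptions; the agent's own validation may differ.

```python
import re

# Units accepted by this sketch; the agent's real duration parser may accept more.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")


def parse_duration(text):
    """Convert a string such as "30s", "10m" or "1h30m" to seconds."""
    if not isinstance(text, str) or not text:
        raise ValueError(f"invalid duration: {text!r}")
    pos, total = 0, 0.0
    while pos < len(text):
        part = _DURATION_PART.match(text, pos)
        if part is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(part.group(1)) * _UNIT_SECONDS[part.group(2)]
        pos = part.end()
    return total


def validate(cfg):
    """Return a list of rule violations; an empty list means every check passed."""
    errors = []
    snooze = cfg.get("snooze") or {}

    def check(name, value, ok, rule):
        # Absent options are skipped, on the assumption that defaults apply.
        if value is None:
            return
        try:
            passed = ok(value)
        except (TypeError, ValueError):
            passed = False
        if not passed:
            errors.append(f"{name} {rule}")

    def positive_duration(value):
        return parse_duration(value) > 0

    check("server.port", (cfg.get("server") or {}).get("port"),
          lambda v: isinstance(v, int) and 1 <= v <= 65535,
          "must be between 1 and 65535")
    check("prometheus.timeout", (cfg.get("prometheus") or {}).get("timeout"),
          positive_duration, "must be a positive duration")
    for key in ("cpu_threshold", "memory_threshold"):
        check(f"snooze.{key}", snooze.get(key),
              lambda v: v > 0, "must be a positive number")
    for key in ("check_interval", "wake_check_interval"):
        check(f"snooze.{key}", snooze.get(key),
              positive_duration, "must be a positive duration")
    check("snooze.minimum_data_coverage", snooze.get("minimum_data_coverage"),
          lambda v: 0.0 <= v <= 1.0, "must be between 0.0 and 1.0")
    check("snooze.minimum_data_points", snooze.get("minimum_data_points"),
          lambda v: isinstance(v, int) and v > 0, "must be a positive integer")
    check("snooze.behavior_mode", snooze.get("behavior_mode"),
          lambda v: v in ("aggressive", "conservative", "manual"),
          "must be one of: aggressive, conservative, manual")
    check("log_level", cfg.get("log_level"),
          lambda v: v in ("debug", "info", "warn", "error"),
          "must be one of: debug, info, warn, error")
    return errors


print(validate({"server": {"port": 70000}, "snooze": {"behavior_mode": "lazy"}}))
```

Returning a list of messages instead of raising on the first failure lets a single run report every problem in the file.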

Configuration Reload

The agent does not support hot reloading. Configuration changes require a restart:

# Restart the agent
kubectl rollout restart deployment/kubenidra-agent -n kubenidra

# Check status
kubectl rollout status deployment/kubenidra-agent -n kubenidra

Configuration Best Practices

1. Start Conservative

Begin with conservative settings and gradually adjust:

snooze:
  behavior_mode: "conservative"
  idle_duration: "30m"
  check_interval: "10m"
  wake_grace_period: "60m"

2. Use Namespace Isolation

Limit scope to specific namespaces:

watched_namespaces:
  - "development"
  - "staging"
  # Avoid production initially

3. Enable Prometheus Validation

Always enable Prometheus validation in production:

snooze:
  prometheus_validation_enabled: true
  minimum_data_coverage: 0.8
  minimum_data_points: 5

4. Set Appropriate Thresholds

Adjust thresholds based on your workloads:

snooze:
  cpu_threshold: 0.1 # 100 millicores (100m)
  memory_threshold: 104857600 # 100 MiB
  # Adjust based on your application patterns
