💤 KubeNidra
🤖 KubeNidra Agent

Configuration

The KubeNidra Agent uses a YAML configuration file to control its behavior. Configuration can be loaded from a file or set via environment variables.

Configuration File

By default, the agent looks for configuration in /etc/kubenidra/kubenidra.yaml. You can specify a custom path using the -c flag:

KUBECONFIG=~/.kube/config ./bin/kubenidra-agent -c /path/to/config.yaml

Configuration Structure

# Logging
log_level: "info" # debug, info, warn, error

# Server configuration
server:
  port: 8118 # HTTP server port

# Prometheus configuration
prometheus:
  endpoint: "http://prometheus:9090" # Prometheus server URL
  timeout: "30s" # Query timeout

# Namespaces to watch
watched_namespaces:
  - "development"
  - "staging"
  - "test"
  # Use an empty array, or omit this key, to watch all namespaces

# Snooze configuration
snooze:
  # Resource thresholds
  cpu_threshold: 0.01 # CPU threshold in cores (0.01 = 10 millicores, i.e. 10m)
  memory_threshold: 52428800 # Memory threshold in bytes (50 MiB)

  # Timing configuration
  idle_duration: "10m" # How long workload must be idle
  check_interval: "5m" # How often to check workloads
  wake_check_interval: "1m" # How often to check for wake conditions

  # Grace periods
  wake_grace_period: "30m" # Grace period after manual wake
  operation_cooldown: "5m" # Cooldown between operations
  new_workload_grace_period: "20m" # Extra grace for new workloads

  # Behavior mode
  behavior_mode: "aggressive" # aggressive, conservative, manual

  # Smart automation
  max_operations_per_hour: 10 # Max operations before backoff
  operation_history_limit: 50 # Number of operations to track
  backoff_multiplier: 2.0 # Multiplier for backoff calculation
  max_backoff_duration: "4h" # Maximum backoff duration

  # Metrics validation
  minimum_running_duration: "15m" # Min time workload must run
  minimum_data_coverage: 0.8 # Min fraction of the time window with data (0.8 = 80%)
  minimum_data_points: 5 # Min number of data points required
  prometheus_validation_enabled: true # Enable validation by default

Environment Variables

All configuration options can be set via environment variables with the KUBENIDRA_ prefix. A few examples:

# Logging
export KUBENIDRA_LOG_LEVEL="info"

# Server configuration
export KUBENIDRA_SERVER_PORT=8118

# Prometheus configuration
export KUBENIDRA_PROMETHEUS_ENDPOINT="http://prometheus:9090"
export KUBENIDRA_PROMETHEUS_TIMEOUT="30s"

# Namespaces
export KUBENIDRA_WATCHED_NAMESPACES="development,staging,test"

# Snooze configuration
export KUBENIDRA_SNOOZE_CPU_THRESHOLD=0.01
export KUBENIDRA_SNOOZE_MEMORY_THRESHOLD=52428800
export KUBENIDRA_SNOOZE_IDLE_DURATION="10m"
export KUBENIDRA_SNOOZE_CHECK_INTERVAL="5m"
export KUBENIDRA_SNOOZE_BEHAVIOR_MODE="aggressive"
...
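
Judging from the examples above, the variable name appears to be the YAML key path joined with underscores, uppercased, and prefixed with KUBENIDRA_, with list values given as a comma-separated string. The following Python sketch illustrates that naming convention only; it is inferred from the examples, and the helper names env_var_name and parse_namespaces are hypothetical, not part of the agent.

```python
def env_var_name(key_path: str, prefix: str = "KUBENIDRA") -> str:
    """Turn a dotted config key such as 'snooze.cpu_threshold' into an env var name.

    Assumption: nested keys are joined with '_' and uppercased, as the
    documented examples suggest.
    """
    return prefix + "_" + key_path.replace(".", "_").upper()


def parse_namespaces(value: str) -> list:
    """Split a comma-separated namespace list, dropping empty entries."""
    return [ns.strip() for ns in value.split(",") if ns.strip()]


print(env_var_name("snooze.cpu_threshold"))
print(parse_namespaces("development,staging,test"))
```

For instance, env_var_name("prometheus.timeout") yields KUBENIDRA_PROMETHEUS_TIMEOUT, which matches the variable listed above.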

Configuration Options

Server Configuration

| Option | Type | Default | Description |
|---|---|---|---|
| server.port | int | 8118 | HTTP server port for health checks and metrics |

Prometheus Configuration

| Option | Type | Default | Description |
|---|---|---|---|
| prometheus.endpoint | string | "http://localhost:9090" | Prometheus server URL |
| prometheus.timeout | duration | "30s" | Query timeout for Prometheus requests |

Snooze Configuration

Resource Thresholds

| Option | Type | Default | Description |
|---|---|---|---|
| snooze.cpu_threshold | float | 0.01 | CPU threshold in cores (0.01 = 10 millicores, i.e. 10m) |
| snooze.memory_threshold | int | 52428800 | Memory threshold in bytes (50 MiB) |
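
Since cpu_threshold is expressed in cores and memory_threshold in bytes, it can be convenient to derive both from more familiar units. A small Python sketch of the arithmetic; the helper names are hypothetical and only illustrate the conversion:

```python
def millicores_to_cores(millicores: int) -> float:
    """Convert Kubernetes-style millicores (e.g. 10m) to cores."""
    return millicores / 1000


def mib_to_bytes(mib: int) -> int:
    """Convert mebibytes to bytes (1 MiB = 1024 * 1024 bytes)."""
    return mib * 1024 * 1024


print(millicores_to_cores(10))  # 0.01
print(mib_to_bytes(50))         # 52428800
```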

Timing Configuration

| Option | Type | Default | Description |
|---|---|---|---|
| snooze.idle_duration | duration | "10m" | How long workload must be idle before snoozing |
| snooze.check_interval | duration | "5m" | How often to check workloads for snooze conditions |
| snooze.wake_check_interval | duration | "1m" | How often to check for wake conditions |

Grace Periods

| Option | Type | Default | Description |
|---|---|---|---|
| snooze.wake_grace_period | duration | "30m" | Grace period after manual wake |
| snooze.operation_cooldown | duration | "5m" | Cooldown between operations |
| snooze.new_workload_grace_period | duration | "20m" | Extra grace for new workloads |

Behavior Mode

| Option | Type | Default | Description |
|---|---|---|---|
| snooze.behavior_mode | string | "aggressive" | Behavior mode: aggressive, conservative, manual |

Behavior Modes:

  • aggressive: Actively snoozes idle workloads
  • conservative: More cautious, requires longer idle periods
  • manual: Only snoozes on explicit requests

Smart Automation

| Option | Type | Default | Description |
|---|---|---|---|
| snooze.max_operations_per_hour | int | 10 | Max operations before backoff |
| snooze.operation_history_limit | int | 50 | Number of operations to track |
| snooze.backoff_multiplier | float | 2.0 | Multiplier for backoff calculation |
| snooze.max_backoff_duration | duration | "4h" | Maximum backoff duration |
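
This page does not spell out how the backoff is calculated. One common scheme that fits these options is exponential backoff capped at max_backoff_duration. The Python sketch below illustrates that scheme only; using the 5m operation_cooldown as the base delay and counting consecutive backoffs are assumptions, not the agent's documented behavior.

```python
from datetime import timedelta


def backoff_delay(consecutive_backoffs: int,
                  base: timedelta = timedelta(minutes=5),    # assumed base delay
                  multiplier: float = 2.0,                    # backoff_multiplier
                  max_backoff: timedelta = timedelta(hours=4)  # max_backoff_duration
                  ) -> timedelta:
    """Exponential backoff: base * multiplier^n, never exceeding max_backoff."""
    delay = base * (multiplier ** consecutive_backoffs)
    return min(delay, max_backoff)


for n in range(7):
    print(n, backoff_delay(n))
```

Under these assumptions the delay doubles from 5 minutes and reaches the 4h cap on the seventh step.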

Metrics Validation

| Option | Type | Default | Description |
|---|---|---|---|
| snooze.minimum_running_duration | duration | "15m" | Min time workload must run before snoozing |
| snooze.minimum_data_coverage | float | 0.8 | Min fraction of the time window with data (0.8 = 80%) |
| snooze.minimum_data_points | int | 5 | Min number of data points required |
| snooze.prometheus_validation_enabled | bool | true | Enable Prometheus validation |
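
To make the two data-quality options concrete, here is a hedged Python sketch of how such a check could work. It assumes coverage is the number of samples received divided by the number expected for the query window and step; that definition, and the function name has_enough_data, are assumptions for illustration rather than the agent's actual logic.

```python
def has_enough_data(sample_count: int,
                    window_seconds: int,
                    step_seconds: int,
                    minimum_data_coverage: float = 0.8,
                    minimum_data_points: int = 5) -> bool:
    """Return True when the metric series is dense enough to trust.

    Assumption: coverage = samples received / samples expected in the window.
    """
    expected = max(window_seconds // step_seconds, 1)
    coverage = min(sample_count / expected, 1.0)
    return (sample_count >= minimum_data_points
            and coverage >= minimum_data_coverage)


# 10-minute window sampled every 60s: 10 samples expected.
print(has_enough_data(8, 600, 60))  # True: 8 points, 80% coverage
print(has_enough_data(7, 600, 60))  # False: coverage is only 70%
```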

Namespace Configuration

| Option | Type | Default | Description |
|---|---|---|---|
| watched_namespaces | []string | [""] | Namespaces to watch (empty string = all namespaces) |

Logging Configuration

| Option | Type | Default | Description |
|---|---|---|---|
| log_level | string | "info" | Log level: debug, info, warn, error |

Configuration Validation

  1. Port: Must be between 1 and 65535
  2. Timeouts: Must be a positive duration
  3. Thresholds: Must be positive numbers
  4. Intervals: Must be a positive duration
  5. Coverage: Must be between 0.0 and 1.0
  6. Data points: Must be a positive integer
  7. Behavior mode: Must be one of: aggressive, conservative, manual
  8. Log level: Must be one of: debug, info, warn, error
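
The rules above can be checked before deploying a config. The following Python sketch is a pre-flight check written against this list, not the agent's own validator; the function name validate and the error messages are hypothetical, the defaults are taken from the option tables, and duration checks (rules 2 and 4) are left out for brevity.

```python
VALID_MODES = {"aggressive", "conservative", "manual"}
VALID_LOG_LEVELS = {"debug", "info", "warn", "error"}


def validate(config: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    errors = []
    snooze = config.get("snooze", {})

    port = config.get("server", {}).get("port", 8118)
    if not isinstance(port, int) or not 1 <= port <= 65535:
        errors.append("server.port must be between 1 and 65535")

    for key in ("cpu_threshold", "memory_threshold"):
        if key in snooze and snooze[key] <= 0:
            errors.append(f"snooze.{key} must be positive")

    if not 0.0 <= snooze.get("minimum_data_coverage", 0.8) <= 1.0:
        errors.append("snooze.minimum_data_coverage must be between 0.0 and 1.0")

    if snooze.get("minimum_data_points", 5) <= 0:
        errors.append("snooze.minimum_data_points must be a positive integer")

    if snooze.get("behavior_mode", "aggressive") not in VALID_MODES:
        errors.append("snooze.behavior_mode must be aggressive, conservative, or manual")

    if config.get("log_level", "info") not in VALID_LOG_LEVELS:
        errors.append("log_level must be debug, info, warn, or error")

    return errors


print(validate({"server": {"port": 70000}, "log_level": "trace"}))
```

Returning all problems at once, rather than stopping at the first, makes it easier to fix a config in a single pass.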

Configuration Reload

The agent does not support hot reloading. Configuration changes require a restart:

# Restart the agent
kubectl rollout restart deployment/kubenidra-agent -n kubenidra

# Check status
kubectl rollout status deployment/kubenidra-agent -n kubenidra

Configuration Best Practices

1. Start Conservative

Begin with conservative settings and gradually adjust:

snooze:
  behavior_mode: "conservative"
  idle_duration: "30m"
  check_interval: "10m"
  wake_grace_period: "60m"

2. Use Namespace Isolation

Limit scope to specific namespaces:

watched_namespaces:
  - "development"
  - "staging"
  # Avoid production initially

3. Enable Prometheus Validation

Always enable Prometheus validation in production:

snooze:
  prometheus_validation_enabled: true
  minimum_data_coverage: 0.8
  minimum_data_points: 5

4. Set Appropriate Thresholds

Adjust thresholds based on your workloads:

snooze:
  cpu_threshold: 0.1 # 0.1 cores = 100 millicores (100m)
  memory_threshold: 104857600 # 100 MiB
  # Adjust based on your application patterns