Amazon CloudWatch Integration

Retrieve and stream logs from Amazon CloudWatch Logs with the LogFlux Agent

The LogFlux CloudWatch integration retrieves and streams logs from Amazon CloudWatch Logs so that logs from your AWS infrastructure can be analyzed centrally. The plugin supports several authentication methods and uses CloudWatch filter patterns to select which events are retrieved.

Overview

The CloudWatch plugin provides:

  • CloudWatch Logs Integration: Direct connection to AWS CloudWatch Logs service
  • Multiple Authentication Methods: IAM roles, profiles, access keys, and credential chains
  • Log Group and Stream Filtering: Target specific log groups and streams
  • Pattern Filtering: Apply CloudWatch filter patterns to reduce noise
  • Follow Mode: Continuously poll for new log entries at a configurable interval (near real time)
  • Batch Processing: Efficient batching for high-volume log retrieval
  • Flexible Time Ranges: Query historical logs or follow new entries as they arrive
  • Auto-discovery: Discover available log groups automatically
  • Cross-Region Support: Connect to CloudWatch in any AWS region

Installation

The CloudWatch plugin is included with the LogFlux Agent but disabled by default.

Prerequisites

  • LogFlux Agent installed (see Installation Guide)
  • AWS credentials configured (IAM role, AWS CLI profile, or access keys)
  • Appropriate IAM permissions for CloudWatch Logs access
  • Network connectivity to AWS CloudWatch endpoints

Required IAM Permissions

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams",
        "logs:GetLogEvents",
        "logs:FilterLogEvents"
      ],
      "Resource": "*"
    }
  ]
}

Enable the Plugin

# Enable and start the CloudWatch plugin
sudo systemctl enable --now logflux-cloudwatch

# Check status
sudo systemctl status logflux-cloudwatch

Configuration

Basic Configuration

Create or edit the CloudWatch plugin configuration:

sudo nano /etc/logflux-agent/plugins/cloudwatch.yaml

Basic configuration:

# CloudWatch Plugin Configuration
name: cloudwatch
version: 1.0.0
source: cloudwatch-plugin

# Agent connection
agent:
  socket_path: /tmp/logflux-agent.sock

# AWS Configuration
aws:
  region: us-east-1
  profile: ""  # AWS profile name (optional)
  
# Log retrieval settings
cloudwatch:
  # Log groups to monitor
  log_groups:
    - "/aws/lambda/my-function"
    - "/aws/apigateway/my-api"
  
  # Specific log streams (optional)
  log_streams: []
  
  # Follow mode for real-time streaming
  follow: true
  poll_interval: 30s
  
  # Maximum events per request
  max_events: 10000
  
  # Filter pattern (CloudWatch syntax)
  filter_pattern: ""

# Logging metadata
logging:
  verbose: false
  labels:
    plugin: cloudwatch
    source: aws

# Batching for efficiency
batch:
  enabled: true
  size: 100
  flush_interval: 5s
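
The `batch` section above groups entries before they are handed to the agent. As a rough illustration only (the plugin's internals are not documented here), size-or-interval batching can be sketched as follows. The `Batcher` class, its `send` callback, and the injectable `clock` are hypothetical names; `size` and `flush_interval_s` mirror the `size` and `flush_interval` settings.

```python
import time
from typing import Any, Callable, List


class Batcher:
    """Collect entries; flush when `size` is reached or `flush_interval_s` has elapsed.

    Illustrative sketch, not the plugin's code. The interval is only checked
    when an entry is added (there is no background timer).
    """

    def __init__(
        self,
        send: Callable[[List[Any]], None],
        size: int = 100,
        flush_interval_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._size = size
        self._interval = flush_interval_s
        self._clock = clock
        self._pending: List[Any] = []
        self._last_flush = clock()

    def add(self, entry: Any) -> None:
        self._pending.append(entry)
        full = len(self._pending) >= self._size
        stale = self._clock() - self._last_flush >= self._interval
        if full or stale:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._send(self._pending)
            self._pending = []  # start a new list; the sent one is left untouched
        self._last_flush = self._clock()


if __name__ == "__main__":
    batcher = Batcher(send=lambda batch: print(f"sending {len(batch)} entries"), size=3)
    for line in ["a", "b", "c", "d"]:
        batcher.add(line)
    batcher.flush()  # send whatever is left on shutdown
```

A larger `size` means fewer, bigger hand-offs to the agent; a shorter `flush_interval` bounds how long a quiet log group can hold entries back.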

Advanced Configuration

# Advanced CloudWatch Configuration
name: cloudwatch
version: 1.0.0
source: cloudwatch-plugin

# Enhanced agent settings
agent:
  socket_path: /tmp/logflux-agent.sock
  connect_timeout: 30s
  max_retries: 5
  retry_delay: 10s

# AWS Configuration
aws:
  region: us-west-2
  
  # Authentication options
  profile: "production"  # Named AWS profile
  
  # Or explicit credentials (not recommended for production)
  # access_key: "AKIA..."
  # secret_key: "..."
  # session_token: "..."  # For temporary credentials

# Advanced CloudWatch settings
cloudwatch:
  # Multiple log groups with patterns
  log_groups:
    - "/aws/lambda/*"
    - "/aws/apigateway/*"
    - "/aws/ecs/cluster/*"
    - "/aws/rds/instance/*/error"
    - "/aws/elasticloadbalancing/*"
  
  # Specific log streams
  log_streams:
    - "2024/01/20/[$LATEST]"
    - "application-logs"
  
  # Time range for historical data
  start_time: "-1h"  # 1 hour ago
  end_time: ""       # Now (empty = current time)
  
  # Real-time following
  follow: true
  poll_interval: 15s
  
  # Request limits
  max_events: 50000
  
  # Advanced filtering
  filter_pattern: '[timestamp, request_id, level="ERROR", ...]'
  
  # Auto-discovery settings
  auto_discover: true
  discovery_pattern: "/aws/lambda/*"

# Enhanced metadata
logging:
  verbose: true
  labels:
    plugin: cloudwatch
    source: aws
    environment: production
    region: us-west-2
  
  # Custom field mapping
  field_mapping:
    log_group: "aws_log_group"
    log_stream: "aws_log_stream"
    event_id: "aws_event_id"
    ingestion_time: "aws_ingestion_time"

# Advanced batching
batch:
  enabled: true
  size: 500
  buffer_size: 10000
  flush_interval: 10s
  
  # Memory management
  max_memory: 100MB

# Monitoring and health
health:
  check_interval: 60s
  max_api_errors: 10
  alert_on_rate_limit: true
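
The wildcard entries in `log_groups` and `discovery_pattern` above imply some matching against the log group names that exist in the account. The CloudWatch Logs `DescribeLogGroups` API filters by name prefix, so a pattern such as `/aws/rds/instance/*/error` would presumably be resolved on the client side. The sketch below shows one way that could work using shell-style globbing; `select_log_groups` is a hypothetical helper, and the exact wildcard semantics of the plugin are an assumption.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_log_groups(available: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the log group names that match at least one glob pattern.

    Assumption: '*' behaves like a shell wildcard and may span '/' characters.
    """
    pattern_list = list(patterns)
    return [
        name
        for name in available
        if any(fnmatchcase(name, pattern) for pattern in pattern_list)
    ]


if __name__ == "__main__":
    discovered = [
        "/aws/lambda/api-handler",
        "/aws/rds/instance/prod-db/error",
        "/aws/rds/instance/prod-db/slowquery",
        "/ecs/user-service",
    ]
    for group in select_log_groups(discovered, ["/aws/lambda/*", "/aws/rds/instance/*/error"]):
        print(group)
```

Broad patterns can match many log groups, and each matched group adds API requests on every poll.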

Usage Examples

Lambda Function Logs

# Monitor specific Lambda function
sudo logflux-cloudwatch \
  -region us-east-1 \
  -log-groups "/aws/lambda/my-function" \
  -follow

# Monitor multiple Lambda functions
sudo logflux-cloudwatch \
  -region us-east-1 \
  -log-groups "/aws/lambda/function1,/aws/lambda/function2" \
  -follow

# Filter for errors only
sudo logflux-cloudwatch \
  -region us-east-1 \
  -log-groups "/aws/lambda/my-function" \
  -filter-pattern "[timestamp, request_id, level=ERROR, ...]" \
  -follow
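
Conceptually, the `-follow` flag used above turns a one-off query into a polling loop that remembers how far it has read. The following is a minimal sketch of that idea, not the plugin's implementation: `fetch_events` is a hypothetical callable standing in for a CloudWatch `FilterLogEvents` request that returns events at or after a start timestamp, and deduplicating on `eventId` is an assumption about how overlapping polls could be handled.

```python
import time
from typing import Any, Callable, Dict, Iterator, List

Event = Dict[str, Any]


def follow(
    fetch_events: Callable[[int], List[Event]],
    start_ms: int,
    poll_interval_s: float = 30.0,
    max_polls: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Event]:
    """Yield each new event once, polling fetch_events(cursor) between sleeps."""
    cursor_ms = start_ms
    seen_at_cursor = set()  # IDs already yielded with timestamp == cursor_ms
    for poll in range(max_polls):
        events = sorted(fetch_events(cursor_ms), key=lambda e: e["timestamp"])
        for event in events:
            if event["eventId"] in seen_at_cursor:
                continue  # returned again because its timestamp equals the cursor
            if event["timestamp"] > cursor_ms:
                cursor_ms = event["timestamp"]
                seen_at_cursor.clear()  # older events can no longer be returned
            seen_at_cursor.add(event["eventId"])
            yield event
        if poll < max_polls - 1:
            sleep(poll_interval_s)


if __name__ == "__main__":
    def demo_fetch(start_ms: int) -> List[Event]:
        # Hypothetical data source; a real one would call the CloudWatch Logs API.
        return [{"eventId": "1", "timestamp": 1000, "message": "hello"}]

    for item in follow(demo_fetch, start_ms=0, poll_interval_s=0.1, max_polls=2):
        print(item["message"])
```

A real follower would loop until stopped rather than for a fixed `max_polls`; the bound is here only to keep the sketch finite.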

API Gateway Logs

# API Gateway monitoring
cloudwatch:
  log_groups:
    - "/aws/apigateway/my-api"
  
  filter_pattern: '[timestamp, request_id, ip, user, request_time, method, resource, protocol, status, error, ...]'
  
  follow: true
  poll_interval: 30s

logging:
  labels:
    service: api_gateway
    log_type: access

ECS Container Logs

# ECS cluster monitoring
cloudwatch:
  log_groups:
    - "/aws/ecs/containerinsights/my-cluster/application"
    - "/aws/ecs/containerinsights/my-cluster/performance"
  
  follow: true
  poll_interval: 20s

logging:
  labels:
    service: ecs
    cluster: my-cluster

RDS Database Logs

# RDS error log monitoring
cloudwatch:
  log_groups:
    - "/aws/rds/instance/prod-db/error"
    - "/aws/rds/instance/prod-db/slowquery"
  
  filter_pattern: "ERROR"
  follow: true

logging:
  labels:
    service: rds
    database: prod-db
    log_type: database

Command Line Usage

Basic Commands

# Monitor specific log group
logflux-cloudwatch -region us-east-1 -log-groups "/aws/lambda/my-function"

# Follow mode for real-time logs
logflux-cloudwatch -region us-east-1 -log-groups "/aws/lambda/my-function" -follow

# Historical logs from last hour
logflux-cloudwatch -region us-east-1 -log-groups "/aws/lambda/my-function" -start-time "-1h"

# Multiple log groups
logflux-cloudwatch -region us-east-1 -log-groups "/aws/lambda/func1,/aws/lambda/func2"

# Using AWS profile
logflux-cloudwatch -profile production -region us-west-2 -log-groups "/aws/lambda/my-function"

# Specific time range
logflux-cloudwatch -region us-east-1 \
  -log-groups "/aws/lambda/my-function" \
  -start-time "2024-01-20T10:00:00Z" \
  -end-time "2024-01-20T11:00:00Z"
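
The commands above pass `-start-time` either as a relative offset (`-1h`) or as an absolute UTC timestamp, while the CloudWatch Logs API expects start and end times in milliseconds since the Unix epoch. The conversion can be sketched as below. `to_epoch_ms` is a hypothetical helper; the set of accepted unit suffixes (`s`, `m`, `h`, `d`) is an assumption, since only `-1h` appears in this guide.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def to_epoch_ms(value: str, now: Optional[datetime] = None) -> int:
    """Convert '-1h' (relative to now) or an ISO 8601 timestamp to epoch milliseconds."""
    now = now or datetime.now(timezone.utc)
    match = re.fullmatch(r"-(\d+)([smhd])", value)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        moment = now - timedelta(**{_UNITS[unit]: amount})
    else:
        # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+,
        # so rewrite it as an explicit UTC offset first.
        moment = datetime.fromisoformat(value.replace("Z", "+00:00"))
        if moment.tzinfo is None:
            moment = moment.replace(tzinfo=timezone.utc)  # assume UTC when unspecified
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


if __name__ == "__main__":
    print(to_epoch_ms("2024-01-20T10:00:00Z"))
    print(to_epoch_ms("-1h"))
```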

Advanced Options

# Custom filter pattern
logflux-cloudwatch -region us-east-1 \
  -log-groups "/aws/lambda/my-function" \
  -filter-pattern "[timestamp, request_id, level=ERROR, ...]"

# Specific log streams (single quotes keep the shell from expanding $LATEST)
logflux-cloudwatch -region us-east-1 \
  -log-groups "/aws/lambda/my-function" \
  -log-streams '2024/01/20/[$LATEST]a1b2c3d4'

# Custom batch settings
logflux-cloudwatch -region us-east-1 \
  -log-groups "/aws/lambda/my-function" \
  -batch-size 200 \
  -flush-interval 10s

# Explicit AWS credentials (not recommended)
logflux-cloudwatch -region us-east-1 \
  -access-key "AKIA..." \
  -secret-key "..." \
  -log-groups "/aws/lambda/my-function"

# Verbose output
logflux-cloudwatch -region us-east-1 \
  -log-groups "/aws/lambda/my-function" \
  -verbose

# Configuration file
logflux-cloudwatch -config /etc/logflux-agent/plugins/cloudwatch.yaml

Authentication Methods

IAM Roles

# EC2 instance with IAM role
# No additional configuration needed - uses instance profile

# ECS task with task role
# Configure task definition with appropriate IAM role

AWS Profile

# Configure AWS CLI profile
aws configure --profile production
AWS Access Key ID [None]: AKIA...
AWS Secret Access Key [None]: ...
Default region name [None]: us-east-1
Default output format [None]: json

# Use in configuration
aws:
  profile: "production"
  region: us-east-1

Environment Variables

# Set AWS credentials via environment
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"

# Optional session token for temporary credentials
export AWS_SESSION_TOKEN="..."

Explicit Credentials

# Not recommended for production
aws:
  access_key: "AKIA..."
  secret_key: "..."
  session_token: ""  # Optional
  region: us-east-1

CloudWatch Filter Patterns

Common Filter Patterns

# Error logs only
-filter-pattern "ERROR"

# Specific log level
-filter-pattern '[timestamp, request_id, level="ERROR", ...]'

# Multiple conditions
-filter-pattern '[timestamp, request_id, level="ERROR" || level="WARN", ...]'

# Field extraction
-filter-pattern '[timestamp, request_id="*-*-*", level, message]'

# Numeric filtering
-filter-pattern '[timestamp, request_id, level, duration > 1000]'

# Exclude patterns
-filter-pattern '[timestamp, request_id, level != "DEBUG", ...]'

# JSON log filtering
-filter-pattern '{ $.level = "ERROR" }'

# Complex JSON filtering
-filter-pattern '{ ($.level = "ERROR") && ($.service = "api") }'
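
When a JSON filter pattern returns nothing, it can help to reproduce its logic locally against a few sample messages. The sketch below approximates the last pattern above, `{ ($.level = "ERROR") && ($.service = "api") }`, in plain Python. It is a local stand-in for reasoning about the pattern, not CloudWatch's own evaluator, and `matches_api_error` is a hypothetical name.

```python
import json


def matches_api_error(message: str) -> bool:
    """Approximate { ($.level = "ERROR") && ($.service = "api") } for one log message."""
    try:
        event = json.loads(message)
    except json.JSONDecodeError:
        return False  # a JSON pattern cannot match a message that is not JSON
    if not isinstance(event, dict):
        return False  # $.level only makes sense on a JSON object
    return event.get("level") == "ERROR" and event.get("service") == "api"


if __name__ == "__main__":
    samples = [
        '{"level": "ERROR", "service": "api", "msg": "upstream timeout"}',
        '{"level": "ERROR", "service": "worker"}',
        "ERROR: plain text line",
    ]
    for sample in samples:
        print(matches_api_error(sample), sample)
```

Plain-text lines never match a JSON pattern, which is a common reason for an unexpectedly empty result when a log group mixes structured and unstructured output.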

Pattern Examples by Service

Lambda Functions:

# Lambda errors
-filter-pattern "[timestamp, request_id, level=ERROR, ...]"

# Lambda cold starts
-filter-pattern "INIT_START"

# Lambda timeouts (the inner double quotes make CloudWatch match the exact phrase)
-filter-pattern '"Task timed out"'

API Gateway:

# 4xx/5xx responses
-filter-pattern "[timestamp, request_id, ip, user, request_time, method, resource, protocol, status>=400, ...]"

# Specific endpoint errors
-filter-pattern '[timestamp, request_id, ip, user, request_time, method, resource="/api/users", protocol, status>=400, ...]'

ECS/Container:

# Container crashes
-filter-pattern "OOMKilled"

# Health check failures (the inner double quotes make CloudWatch match the exact phrase)
-filter-pattern '"Health check failed"'

Metadata and Output Format

Metadata Fields

The plugin adds CloudWatch-specific metadata:

Field              | Description                    | Example
source_type        | Always “plugin”                | plugin
source_name        | Always “cloudwatch”            | cloudwatch
aws_log_group      | CloudWatch log group name      | /aws/lambda/my-function
aws_log_stream     | CloudWatch log stream name     | 2024/01/20/[$LATEST]a1b2c3d4
aws_event_id       | CloudWatch event ID            | 12345678901234567890
aws_ingestion_time | CloudWatch ingestion timestamp | 1642679850000
aws_region         | AWS region                     | us-east-1

LogFlux Output Format

Input CloudWatch Event:

{
  "eventId": "12345678901234567890",
  "ingestionTime": 1642679850000,
  "logGroupName": "/aws/lambda/my-function",
  "logStreamName": "2024/01/20/[$LATEST]a1b2c3d4",
  "message": "ERROR: Database connection failed",
  "timestamp": 1642679850000
}

Output LogFlux Log:

{
  "timestamp": "2022-01-20T11:57:30.000Z",
  "level": "info",
  "message": "ERROR: Database connection failed",
  "node": "aws",
  "metadata": {
    "source_type": "plugin",
    "source_name": "cloudwatch",
    "aws_log_group": "/aws/lambda/my-function",
    "aws_log_stream": "2024/01/20/[$LATEST]a1b2c3d4",
    "aws_event_id": "12345678901234567890",
    "aws_ingestion_time": 1642679850000,
    "aws_region": "us-east-1",
    "plugin": "cloudwatch",
    "environment": "production"
  }
}
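
The mapping from a CloudWatch event to a LogFlux entry can be sketched as a small function. This is an illustration under assumptions rather than the plugin's code: `to_logflux_record` is a hypothetical name, the `level` and `node` values are copied from the sample output above because how the plugin derives them is not documented here, and configured labels are assumed to be merged into `metadata`.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict


def to_logflux_record(event: Dict[str, Any], region: str, labels: Dict[str, str]) -> Dict[str, Any]:
    """Build a LogFlux-style entry from one CloudWatch Logs event."""
    ms = int(event["timestamp"])  # CloudWatch timestamps are epoch milliseconds
    moment = datetime.fromtimestamp(ms // 1000, tz=timezone.utc) + timedelta(milliseconds=ms % 1000)
    metadata: Dict[str, Any] = {
        "source_type": "plugin",
        "source_name": "cloudwatch",
        "aws_log_group": event["logGroupName"],
        "aws_log_stream": event["logStreamName"],
        "aws_event_id": event["eventId"],
        "aws_ingestion_time": event["ingestionTime"],
        "aws_region": region,
    }
    metadata.update(labels)  # assumption: labels from the config land in metadata
    return {
        "timestamp": moment.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "level": "info",  # taken from the sample output; derivation is not documented
        "message": event["message"],
        "node": "aws",  # taken from the sample output
        "metadata": metadata,
    }


if __name__ == "__main__":
    sample = {
        "eventId": "12345678901234567890",
        "ingestionTime": 1642679850000,
        "logGroupName": "/aws/lambda/my-function",
        "logStreamName": "2024/01/20/[$LATEST]a1b2c3d4",
        "message": "ERROR: Database connection failed",
        "timestamp": 1642679850000,
    }
    print(to_logflux_record(sample, "us-east-1", {"plugin": "cloudwatch", "environment": "production"}))
```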

Performance Optimization

High-Volume Configuration

# High-throughput settings
cloudwatch:
  max_events: 100000
  poll_interval: 10s

  # Use specific log groups to reduce API calls
  log_groups:
    - "/aws/lambda/high-volume-function"

batch:
  size: 1000
  buffer_size: 50000
  flush_interval: 30s
  max_memory: 500MB

Cost Optimization

# Reduce CloudWatch API calls
cloudwatch:
  poll_interval: 60s  # Less frequent polling
  max_events: 1000    # Smaller batch sizes

  # Use filter patterns to reduce data transfer
  filter_pattern: "ERROR"

  # Target specific log streams
  log_streams:
    - "recent-stream-name"

Regional Optimization

# Run plugin in same region as resources
aws:
  region: us-east-1  # Same region as log groups

# Use VPC endpoints to avoid data transfer costs
# Configure a VPC endpoint for logs.<region>.amazonaws.com

Monitoring and Alerting

Plugin Health Monitoring

#!/bin/bash
# check-cloudwatch-plugin.sh

if ! systemctl is-active --quiet logflux-cloudwatch; then
    echo "CRITICAL: LogFlux CloudWatch plugin is not running"
    exit 2
fi

# Check AWS connectivity
if ! aws logs describe-log-groups --region us-east-1 --max-items 1 &>/dev/null; then
    echo "CRITICAL: Cannot connect to CloudWatch Logs API"
    exit 2
fi

# Check recent log processing
if ! journalctl -u logflux-cloudwatch --since="10 minutes ago" | grep -q "events processed"; then
    echo "WARNING: No events processed in last 10 minutes"
    exit 1
fi

echo "OK: LogFlux CloudWatch plugin is healthy"
exit 0

CloudWatch Metrics Monitoring

# Check log ingestion volume for a log group (date syntax below is GNU date)
aws cloudwatch get-metric-statistics \
  --namespace AWS/Logs \
  --metric-name IncomingLogEvents \
  --dimensions Name=LogGroupName,Value=/aws/lambda/my-function \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 300 \
  --statistics Sum

# List the metric filters defined on a log group
aws logs describe-metric-filters \
  --log-group-name /aws/lambda/my-function

Common Use Cases

AWS Lambda Monitoring

# Lambda function monitoring
cloudwatch:
  log_groups:
    - "/aws/lambda/api-handler"
    - "/aws/lambda/data-processor"
    - "/aws/lambda/auth-service"
  
  filter_pattern: '[timestamp, request_id, level="ERROR", ...]'
  follow: true
  poll_interval: 30s

logging:
  labels:
    service: lambda
    environment: production

Microservices on ECS

# ECS service monitoring
cloudwatch:
  log_groups:
    - "/ecs/user-service"
    - "/ecs/order-service"
    - "/ecs/payment-service"
  
  follow: true
  poll_interval: 20s

logging:
  labels:
    architecture: microservices
    platform: ecs

Database Monitoring

# RDS and Aurora monitoring
cloudwatch:
  log_groups:
    - "/aws/rds/instance/prod-db/error"
    - "/aws/rds/cluster/aurora-prod/audit"
    - "/aws/rds/instance/prod-db/slowquery"
  
  filter_pattern: "ERROR"
  follow: true

logging:
  labels:
    service: database
    tier: data

API Gateway Monitoring

# API Gateway access logs
cloudwatch:
  log_groups:
    - "/aws/apigateway/prod-api"
    - "/aws/apigateway/stage-api"
  
  # Monitor 4xx and 5xx errors
  filter_pattern: '[timestamp, request_id, ip, user, request_time, method, resource, protocol, status>=400, ...]'
  
  follow: true
  poll_interval: 30s

logging:
  labels:
    service: api_gateway
    log_type: access

Security Considerations

IAM Best Practices

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams",
        "logs:GetLogEvents",
        "logs:FilterLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:us-east-1:123456789012:log-group:/aws/lambda/*",
        "arn:aws:logs:us-east-1:123456789012:log-group:/aws/apigateway/*"
      ]
    }
  ]
}
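
A scoped policy like the one above is easy to get subtly wrong when written by hand for many log group prefixes. As a sketch, it can be generated from a list of prefixes using the four read actions listed under Required IAM Permissions. `build_read_policy` is a hypothetical helper, and the account ID and prefixes are placeholders.

```python
import json
from typing import Any, Dict, List

READ_ACTIONS = [
    "logs:DescribeLogGroups",
    "logs:DescribeLogStreams",
    "logs:GetLogEvents",
    "logs:FilterLogEvents",
]


def build_read_policy(region: str, account_id: str, log_group_prefixes: List[str]) -> Dict[str, Any]:
    """Return an IAM policy document limited to log groups under the given prefixes."""
    resources = [
        f"arn:aws:logs:{region}:{account_id}:log-group:{prefix}*"
        for prefix in log_group_prefixes
    ]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(READ_ACTIONS), "Resource": resources}
        ],
    }


if __name__ == "__main__":
    policy = build_read_policy("us-east-1", "123456789012", ["/aws/lambda/", "/aws/apigateway/"])
    print(json.dumps(policy, indent=2))
```

The trailing `*` in each ARN also covers the log streams inside the matching log groups.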

Network Security

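# Security group for the VPC endpoint (must allow inbound HTTPS on port 443 from the agent)
aws ec2 create-security-group \
  --group-name cloudwatch-logs-endpoint \
  --description "Security group for CloudWatch Logs VPC endpoint" \
  --vpc-id vpc-12345678

# Interface VPC endpoint for CloudWatch Logs
# (the subnet and security group IDs are placeholders)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-12345678 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.logs \
  --subnet-ids subnet-12345678 \
  --security-group-ids sg-12345678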

Credential Management

# Use IAM roles instead of access keys
aws:
  region: us-east-1
  # No credentials - use IAM role

# Rotate credentials regularly if using access keys
# Store credentials in AWS Secrets Manager or Parameter Store

Troubleshooting

Common Issues

Authentication Failures:

# Check AWS credentials
aws sts get-caller-identity

# Test CloudWatch access
aws logs describe-log-groups --region us-east-1 --max-items 1

# Check IAM permissions
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:role/LogFluxRole \
  --action-names logs:DescribeLogGroups \
  --resource-arns "arn:aws:logs:us-east-1:123456789012:*"

No Logs Retrieved:

# Verify log group exists
aws logs describe-log-groups \
  --log-group-name-prefix "/aws/lambda/my-function" \
  --region us-east-1

# Check log group has recent data
aws logs describe-log-streams \
  --log-group-name "/aws/lambda/my-function" \
  --order-by LastEventTime \
  --descending \
  --max-items 5

# Test filter pattern
aws logs filter-log-events \
  --log-group-name "/aws/lambda/my-function" \
  --start-time 1642679850000 \
  --filter-pattern "ERROR"

Rate Limiting:

# Check CloudWatch Logs quotas
aws service-quotas get-service-quota \
  --service-code logs \
  --quota-code L-F50550BC  # GetLogEvents rate

# Increase poll interval
cloudwatch:
  poll_interval: 60s  # Reduce API frequency
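
Besides polling less often, clients of a rate-limited API usually retry throttled requests with exponential backoff and jitter. The sketch below shows the general technique only; whether and how the plugin retries is not documented here. `ThrottledError` is a hypothetical stand-in for the throttling error an AWS SDK would raise, and `call_with_backoff` is an illustrative name.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error returned by the API."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call request(), retrying on ThrottledError with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(rng() * cap)  # "full jitter": wait a random fraction of the cap
    raise RuntimeError("max_attempts must be at least 1")


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_request() -> str:
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ThrottledError("rate exceeded")
        return "ok"

    print(call_with_backoff(flaky_request, base_delay_s=0.01))
```

Randomizing the delay keeps several agents that were throttled at the same moment from retrying in lockstep.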

High Costs:

# Optimize for cost
cloudwatch:
  # Use specific log groups
  log_groups:
    - "/aws/lambda/critical-function"
  
  # Apply filters to reduce data transfer
  filter_pattern: "ERROR"
  
  # Increase poll interval
  poll_interval: 300s  # 5 minutes
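
The effect of `poll_interval` on request volume is simple arithmetic. As a rough lower bound, assume one API request per log group per poll (an assumption; pagination of large result sets adds more requests). `estimated_requests_per_day` is a hypothetical helper for comparing settings.

```python
def estimated_requests_per_day(log_groups: int, poll_interval_s: float) -> int:
    """Lower-bound estimate: one request per log group per poll, over 24 hours."""
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    polls_per_day = 86_400 / poll_interval_s
    return int(log_groups * polls_per_day)


if __name__ == "__main__":
    for interval in (15, 30, 60, 300):
        print(f"{interval:>4}s -> {estimated_requests_per_day(1, interval)} requests/day per log group")
```

Moving a single log group from a 30-second to a 300-second interval cuts the estimate from 2,880 to 288 requests per day.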

Debugging

# Enable verbose logging
sudo systemctl edit logflux-cloudwatch
# Add:
[Service]
Environment="LOGFLUX_LOG_LEVEL=debug"

# Restart the service so the override takes effect
sudo systemctl restart logflux-cloudwatch

# Monitor API calls
aws logs describe-log-groups --debug

# Check plugin logs
sudo journalctl -u logflux-cloudwatch -f

# Test connectivity
telnet logs.us-east-1.amazonaws.com 443

Best Practices

Configuration Management

  1. Use IAM roles instead of access keys when possible
  2. Apply filter patterns to reduce costs and noise
  3. Monitor specific log groups rather than broad patterns
  4. Set appropriate poll intervals based on log volume

Performance

  1. Optimize batch sizes for your log volume
  2. Use regional optimization - run in same region as log groups
  3. Implement VPC endpoints to reduce data transfer costs
  4. Monitor CloudWatch API quotas and adjust accordingly

Security

  1. Follow least privilege principle for IAM permissions
  2. Use VPC endpoints for private connectivity
  3. Rotate credentials regularly if using access keys
  4. Monitor API access through CloudTrail

Cost Management

  1. Use filter patterns to reduce data retrieval
  2. Target specific log streams when possible
  3. Adjust poll intervals based on requirements
  4. Monitor CloudWatch costs in AWS Billing

Disclaimer

Amazon Web Services, AWS, CloudWatch, and the AWS logo are trademarks of Amazon.com, Inc. or its affiliates. LogFlux is not affiliated with, endorsed by, or sponsored by Amazon Web Services, Inc. The AWS services and logos are referenced solely for identification purposes to indicate compatibility with AWS CloudWatch Logs.

Next Steps