File Stream Integration

Monitor and stream file contents in real-time with LogFlux Agent

The LogFlux File Stream plugin monitors files and streams their contents to LogFlux in real time, much like tail -f, and adds glob pattern matching, file rotation handling, and efficient batching. It is suited to monitoring application logs, system files, and custom log formats.

Overview

The File Stream plugin provides:

  • Real-time File Following: Continuously monitors files for new content like tail -f
  • Glob Pattern Support: Monitor multiple files using patterns like /var/log/*.log
  • File Rotation Handling: Automatically detects file truncation and rotation
  • Efficient Batching: Groups multiple log lines for optimal transmission
  • Rich Metadata: Includes filename, line numbers, and directory information
  • Flexible Positioning: Start reading from beginning or end of files
  • Cross-platform Support: Works on Linux, macOS, and Windows
  • Graceful Shutdown: Properly handles signals and flushes pending logs

Installation

The File Stream plugin is included with the LogFlux Agent but disabled by default.

Prerequisites

  • LogFlux Agent installed (see Installation Guide)
  • Read access to the files you want to monitor
  • Sufficient disk space for file monitoring metadata

Enable the Plugin

# Enable and start the filestream plugin
sudo systemctl enable --now logflux-filestream

# Check status
sudo systemctl status logflux-filestream

Configuration

Basic Configuration

Create or edit the File Stream plugin configuration:

sudo nano /etc/logflux-agent/plugins/filestream.yaml

Basic configuration:

# File Stream Plugin Configuration
name: filestream
version: 1.0.0
source: filestream-plugin

# Agent connection
agent:
  socket_path: /tmp/logflux-agent.sock
  network: unix
  connect_timeout: 10s
  max_retries: 3
  retry_delay: 1s

# File monitoring
plugin:
  # Specific files to monitor
  files:
    - /var/log/app.log
    - /var/log/error.log
  
  # Glob patterns for multiple files
  patterns:
    - "/var/log/*.log"
    - "/tmp/app-*.txt"
  
  # Follow files in real-time
  follow: true
  
  # Start from end of file (false) or beginning (true)
  from_beginning: false
  
  # Include filename in metadata
  include_filename: true
  
  # File check interval
  check_interval: 100ms

# Logging metadata
logging:
  level: info
  labels:
    component: filestream
    source: files
  prefix: ""

# Batching for efficiency
batch:
  enabled: true
  max_size: 100
  flush_interval: 5s
  auto_flush: true
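
The batch settings above describe two flush triggers: a size limit (max_size) and a time limit (flush_interval). The following is a minimal sketch of that idea, not the plugin's actual implementation. The Batcher class, its send callback, and the tick method are hypothetical names introduced for illustration, and the sketch assumes max_size counts log lines.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Collects lines and flushes on size or age (hypothetical helper)."""

    def __init__(self, send: Callable[[List[str]], None],
                 max_size: int = 100, flush_interval: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send = send                      # called with each full batch
        self.max_size = max_size              # assumed to count lines
        self.flush_interval = flush_interval  # seconds
        self.clock = clock
        self.lines: List[str] = []
        self.first_added: Optional[float] = None

    def add(self, line: str) -> None:
        if not self.lines:
            self.first_added = self.clock()   # age is measured from the first line
        self.lines.append(line)
        if len(self.lines) >= self.max_size:
            self.flush()

    def tick(self) -> None:
        """Call periodically; flushes a partial batch that has waited long enough."""
        if self.lines and self.clock() - self.first_added >= self.flush_interval:
            self.flush()

    def flush(self) -> None:
        if self.lines:
            self.send(self.lines)
            self.lines = []
            self.first_added = None
```

With max_size set to 100 and flush_interval set to 5 seconds, a burst of 250 lines would leave as two full batches immediately, and the remaining 50 lines would follow once the interval elapses.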

Advanced Configuration

# Advanced File Stream Configuration
name: filestream
version: 1.0.0
source: filestream-plugin

# Enhanced agent settings
agent:
  socket_path: /tmp/logflux-agent.sock
  network: unix
  address: ""
  connect_timeout: 30s
  auth_mode: auto
  shared_secret: ""
  shared_secret_file: ""
  auto_discovery: true
  max_retries: 5
  retry_delay: 2s

# Advanced file monitoring
plugin:
  # File lists and patterns
  files:
    - /var/log/nginx/access.log
    - /var/log/nginx/error.log
    - /var/log/postgresql/postgresql.log
    - /var/log/redis/redis-server.log
  
  patterns:
    - "/var/log/app-*/*.log"
    - "/tmp/debug-*.txt"
    - "/opt/services/*/logs/*.log"
  
  # Reading behavior
  follow: true
  from_beginning: false
  include_filename: true
  include_line_number: true
  
  # Performance tuning
  check_interval: 50ms
  buffer_size: 1048576  # 1MB
  max_line_length: 65536  # 64KB
  
  # File rotation handling
  rotation_detection: true
  reopen_on_truncate: true
  rescan_interval: 30s
  
  # Filtering
  exclude_patterns:
    - "^\\s*$"  # Empty lines
    - "^#.*"    # Comments
  
  include_patterns:
    - ".*ERROR.*"
    - ".*WARN.*"
    - ".*INFO.*"

# Enhanced metadata
logging:
  level: info
  labels:
    component: filestream
    plugin: filestream
    environment: production
    server: web-01
  prefix: "FILE"
  verbose: false
  
  # Custom field mapping
  field_mapping:
    file_path: "source_file"
    line_number: "line"
    filename: "file"

# Advanced batching
batch:
  enabled: true
  max_size: 500
  flush_interval: 10s
  auto_flush: true
  
  # Memory management
  max_memory: 50MB
  compression: false

# Monitoring and health
health:
  check_interval: 60s
  max_file_age: 24h
  alert_on_missing: true
  
# Security
security:
  max_open_files: 1000
  file_permissions_check: true
  sandbox_mode: false
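
To reason about how exclude_patterns and include_patterns interact, it can help to test them against sample lines before deploying. The sketch below assumes that exclusions are applied first and that an empty include list keeps everything; the source does not state the plugin's evaluation order, so treat both as assumptions. The function names are hypothetical.

```python
import re
from typing import List, Pattern


def compile_all(patterns: List[str]) -> List[Pattern[str]]:
    """Compile each regular expression once, up front."""
    return [re.compile(p) for p in patterns]


def keep_line(line: str, include: List[Pattern[str]],
              exclude: List[Pattern[str]]) -> bool:
    """Return True if the line would be forwarded under the assumed rules."""
    if any(p.search(line) for p in exclude):
        return False                      # assumption: exclusions win
    if include:
        return any(p.search(line) for p in include)
    return True                           # assumption: no include list keeps all


# Patterns copied from the advanced configuration above
EXCLUDE = compile_all([r"^\s*$", r"^#.*"])
INCLUDE = compile_all([r".*ERROR.*", r".*WARN.*", r".*INFO.*"])

for sample in ["", "# INFO comment", "[ERROR] disk full", "[DEBUG] cache hit"]:
    print(repr(sample), keep_line(sample, INCLUDE, EXCLUDE))
```

Under these assumptions only the "[ERROR] disk full" line is kept: the empty line and the comment are excluded, and the DEBUG line matches no include pattern.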

Usage Examples

Monitor Application Logs

# Monitor single application log
sudo logflux-filestream -files /var/log/app.log

# Monitor multiple application logs
sudo logflux-filestream -files "/var/log/app.log,/var/log/error.log"

# Use configuration file
sudo logflux-filestream -config /etc/logflux-agent/plugins/filestream.yaml

Web Server Log Monitoring

# nginx log monitoring
plugin:
  files:
    - /var/log/nginx/access.log
    - /var/log/nginx/error.log
  
  patterns:
    - "/var/log/nginx/sites/*_access.log"
    - "/var/log/nginx/sites/*_error.log"
  
  follow: true
  from_beginning: false

logging:
  labels:
    service: nginx
    log_type: web_server

Database Log Monitoring

# postgresql log monitoring
plugin:
  files:
    - /var/log/postgresql/postgresql-13-main.log
  
  patterns:
    - "/var/log/postgresql/*.log"
  
  # Include historical logs
  from_beginning: true

logging:
  labels:
    service: postgresql
    log_type: database

Development Environment

# development log monitoring
plugin:
  patterns:
    - "/tmp/debug-*.log"
    - "/var/log/development/*.log"
    - "./logs/*.log"
  
  follow: true
  from_beginning: true
  check_interval: 50ms

logging:
  level: debug
  labels:
    environment: development

Command Line Usage

Basic Commands

# Stream a single file
logflux-filestream -files /var/log/app.log

# Stream multiple files
logflux-filestream -files "/var/log/app.log,/var/log/error.log"

# Use glob patterns
logflux-filestream -patterns "/var/log/*.log,/tmp/app-*.txt"

# Read from beginning
logflux-filestream -files /var/log/app.log -beginning

# Custom batch size
logflux-filestream -files /var/log/app.log -batch-size 50

# Disable batching (not recommended)
logflux-filestream -files /var/log/app.log -no-batch

# Verbose output
logflux-filestream -files /var/log/app.log -verbose

Advanced Options

# Custom configuration
logflux-filestream -config /path/to/config.yaml

# Custom flush interval
logflux-filestream -files /var/log/app.log -flush-interval 10s

# Include filename in metadata
logflux-filestream -files /var/log/app.log -include-filename

# Show version
logflux-filestream -version

# Show help
logflux-filestream -help

Metadata and Output Format

Metadata Fields

The plugin adds rich metadata to each log entry:

Field       | Description              | Example
source_type | Always “plugin”          | plugin
source_name | Always “filestream”      | filestream
file_path   | Full path to source file | /var/log/app.log
filename    | Base filename            | app.log
directory   | Directory path           | /var/log
line_number | Line number in file      | 1234
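
The path-derived fields follow directly from the file path. As an illustration only (the function name is hypothetical, and this is not the plugin's code), the table above can be reproduced like this:

```python
import posixpath


def file_metadata(path: str, line_number: int) -> dict:
    """Build the metadata fields listed in the table for one log line."""
    return {
        "source_type": "plugin",       # constant per the table
        "source_name": "filestream",   # constant per the table
        "file_path": path,
        "filename": posixpath.basename(path),
        "directory": posixpath.dirname(path),
        "line_number": line_number,
    }


print(file_metadata("/var/log/app.log", 1234))
```

The sketch uses posixpath so that the result is the same on any platform; on Windows hosts the real plugin presumably uses native path rules.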

LogFlux Output Format

Input File Line:

2024-01-20 14:30:50 [ERROR] Database connection failed: timeout

Output LogFlux Log:

{
  "timestamp": "2024-01-20T14:30:50.000Z",
  "level": "info",
  "message": "2024-01-20 14:30:50 [ERROR] Database connection failed: timeout",
  "node": "files",
  "metadata": {
    "source_type": "plugin",
    "source_name": "filestream",
    "file_path": "/var/log/app.log",
    "filename": "app.log",
    "directory": "/var/log",
    "line_number": 1234,
    "component": "filestream",
    "environment": "production"
  }
}

File Rotation Handling

Automatic Detection

The plugin automatically handles common file rotation scenarios:

  1. File Truncation: When a file is truncated (its size becomes smaller), the plugin seeks back to the beginning
  2. File Rotation: Detects when files are moved/renamed and continues reading
  3. New Files: When using glob patterns, periodically scans for new matching files
  4. Log Rotation: Works with logrotate, rsyslog, and other rotation systems
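
The truncation and rotation cases above can be sketched with a small polling follower. This is a simplified illustration under stated assumptions, not the plugin's implementation: the Follower class is hypothetical, rotation is detected by comparing inode numbers (a Unix-oriented approach), and a file that is truncated and then grows past the old read position between two polls would go unnoticed.

```python
import os
from typing import List


class Follower:
    """Polls one file for complete new lines (hypothetical helper)."""

    def __init__(self, path: str, from_beginning: bool = False) -> None:
        self.path = path
        self._open(from_beginning)

    def _open(self, from_beginning: bool) -> None:
        self.fh = open(self.path, "rb")
        if not from_beginning:
            self.fh.seek(0, os.SEEK_END)
        self.inode = os.fstat(self.fh.fileno()).st_ino

    def _read_lines(self) -> List[str]:
        out = []
        while True:
            pos = self.fh.tell()
            raw = self.fh.readline()
            if not raw.endswith(b"\n"):
                self.fh.seek(pos)     # no data or partial line: retry next poll
                break
            out.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
        return out

    def poll(self) -> List[str]:
        lines = self._read_lines()    # drain whatever the open handle still has
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return lines              # rotated away and not yet recreated
        if st.st_ino != self.inode:
            # Rotation: the path now names a different file, so reopen it
            self.fh.close()
            self._open(from_beginning=True)
            lines += self._read_lines()
        elif st.st_size < self.fh.tell():
            # Truncation: the file is shorter than our position, so start over
            self.fh.seek(0)
            lines += self._read_lines()
        return lines
```

Calling poll() on a timer corresponds roughly to the check_interval setting; reading the old handle before checking the path is what lets lines written just before a rename still be picked up.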

Configuration

plugin:
  # Enable rotation detection
  rotation_detection: true
  
  # Reopen file when truncated
  reopen_on_truncate: true
  
  # Rescan for new files every 30s
  rescan_interval: 30s
  
  # Handle common rotation patterns
  patterns:
    - "/var/log/app.log*"  # Includes rotated files
    - "/var/log/nginx/*.log*"

Logrotate Integration

# /etc/logrotate.d/app
/var/log/app.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    # copytruncate is compatible with filestream
    copytruncate
}

Performance Optimization

High-Volume Configuration

# High-throughput settings
plugin:
  check_interval: 25ms
  buffer_size: 2097152  # 2MB
  max_line_length: 131072  # 128KB

batch:
  max_size: 1000
  flush_interval: 30s
  max_memory: 200MB

health:
  check_interval: 300s  # Less frequent health checks
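
When choosing max_size and max_line_length together, a quick upper bound on the size of one batch is their product. The calculation below assumes max_size counts lines and max_line_length is in bytes (the inline "128KB" comment suggests bytes, but the source does not say so explicitly); the function name is hypothetical.

```python
def worst_case_batch_bytes(max_size: int, max_line_length: int) -> int:
    """Upper bound on one batch if every line is at the maximum length."""
    return max_size * max_line_length


# High-throughput settings from above: 1000 lines of up to 131072 bytes each
high_volume = worst_case_batch_bytes(1000, 131072)
print(high_volume)                  # 131072000 bytes
print(high_volume / (1024 * 1024))  # 125.0 MiB
```

Under these assumptions a full batch of maximum-length lines stays below the 200MB max_memory setting, whichever way "MB" is interpreted.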

Resource-Constrained Environments

# Low-resource settings
plugin:
  check_interval: 1s
  buffer_size: 65536  # 64KB

  # Monitor fewer files
  patterns:
    - "/var/log/critical.log"

batch:
  max_size: 25
  flush_interval: 5s
  max_memory: 10MB

security:
  max_open_files: 10

Memory Management

# Memory optimization
batch:
  max_memory: 25MB
  compression: true  # Compress batches

plugin:
  max_line_length: 8192  # 8KB max line
  buffer_size: 262144    # 256KB buffer

# Limit concurrent files
security:
  max_open_files: 50

Monitoring and Health Checks

Plugin Health Monitoring

#!/bin/bash
# check-filestream-plugin.sh

if ! systemctl is-active --quiet logflux-filestream; then
    echo "CRITICAL: LogFlux filestream plugin is not running"
    exit 2
fi

# Check if plugin is reading files
if ! journalctl -u logflux-filestream --since="5 minutes ago" | grep -q "lines processed"; then
    echo "WARNING: No file processing detected in last 5 minutes"
    exit 1
fi

echo "OK: LogFlux filestream plugin is healthy"
exit 0

File Monitoring

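# Monitor file access (lsof -c matches on the start of the command name)
sudo lsof -c logflux-file

# Check open file descriptors (pgrep needs -f for names longer than 15 characters)
sudo ls -la /proc/$(pgrep -o -f logflux-filestream)/fd/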

# Monitor disk usage
df -h /var/log/

# Check file permissions
ls -la /var/log/*.log

Common Use Cases

Web Application Monitoring

# Monitor web app stack
plugin:
  files:
    - /var/log/nginx/access.log
    - /var/log/nginx/error.log
    - /var/log/php8.1-fpm.log
  
  patterns:
    - "/var/www/*/logs/*.log"

logging:
  labels:
    stack: lamp
    application: webstore

Microservices Monitoring

# Monitor containerized services
plugin:
  patterns:
    - "/var/lib/docker/containers/*/*-json.log"
    - "/opt/services/*/logs/*.log"
    - "/tmp/service-*.log"

logging:
  labels:
    architecture: microservices
    deployment: docker

Development and Debugging

# Development environment
plugin:
  files:
    - ./debug.log
    - ./application.log
  
  patterns:
    - "./logs/*.log"
    - "/tmp/debug-*.log"
  
  from_beginning: true
  check_interval: 100ms

logging:
  level: debug
  labels:
    environment: development

System Log Monitoring

# System and security logs
plugin:
  files:
    - /var/log/syslog
    - /var/log/auth.log
    - /var/log/fail2ban.log
  
  patterns:
    - "/var/log/security/*.log"

logging:
  labels:
    log_type: system
    security: true

Security Considerations

File System Permissions

# Ensure proper permissions
sudo chown logflux:logflux /etc/logflux-agent/plugins/filestream.yaml
sudo chmod 644 /etc/logflux-agent/plugins/filestream.yaml

# Add user to log group if needed
sudo usermod -a -G adm logflux
sudo usermod -a -G systemd-journal logflux

Access Controls

# Security configuration
security:
  max_open_files: 100
  file_permissions_check: true
  sandbox_mode: true

# Limit patterns to safe directories
plugin:
  patterns:
    - "/var/log/*.log"     # Safe: standard log directory
    - "/opt/app/logs/*.log" # Safe: application directory
  
  # Avoid dangerous patterns
  # - "/*"                 # Dangerous: entire filesystem
  # - "/etc/*"             # Dangerous: system configs
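
One way to enforce the safe/dangerous distinction above is to lint patterns before they reach the configuration file. The sketch below is a hypothetical pre-deployment check, not a feature of the plugin: ALLOWED_ROOTS and both function names are assumptions, and symbolic links are not resolved.

```python
import posixpath
from typing import Iterable

# Hypothetical allow-list, taken from the "safe" examples above
ALLOWED_ROOTS = ("/var/log", "/opt/app/logs")


def literal_prefix(pattern: str) -> str:
    """Return the normalized directory part before the first wildcard."""
    parts = []
    for part in pattern.split("/"):
        if any(ch in part for ch in "*?["):
            break
        parts.append(part)
    return posixpath.normpath("/".join(parts) or "/")


def is_safe_pattern(pattern: str, roots: Iterable[str] = ALLOWED_ROOTS) -> bool:
    """True if the pattern's fixed prefix lies inside an allowed root."""
    prefix = literal_prefix(pattern)
    return any(prefix == r or prefix.startswith(r + "/") for r in roots)


for p in ["/var/log/*.log", "/opt/app/logs/*.log", "/*", "/etc/*",
          "/var/log/../../etc/*.conf"]:
    print(p, is_safe_pattern(p))
```

Normalizing the prefix first means a pattern such as /var/log/../../etc/*.conf is rejected even though it starts with an allowed directory.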

Data Sanitization

# Filter sensitive information
plugin:
  exclude_patterns:
    - ".*password.*"
    - ".*token.*"
    - ".*secret.*"
    - ".*api[_-]?key.*"

logging:
  # Sanitize metadata
  sanitize_paths: true
  exclude_sensitive: true
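
Before relying on these exclusions, it is worth checking them against representative lines. The source does not say whether the plugin matches case-sensitively, so the sketch below (with a hypothetical dropped function) shows both behaviors using Python's re module; the plugin's regex engine may differ.

```python
import re
from typing import Sequence

# Patterns copied from the configuration above
SENSITIVE = [r".*password.*", r".*token.*", r".*secret.*", r".*api[_-]?key.*"]


def dropped(line: str, patterns: Sequence[str] = SENSITIVE,
            ignore_case: bool = False) -> bool:
    """Return True if any pattern matches, i.e. the line would be excluded."""
    flags = re.IGNORECASE if ignore_case else 0
    return any(re.search(p, line, flags) for p in patterns)


samples = [
    "login failed password=hunter2",
    "Password reset requested",
    "API_KEY=abc123",
    "GET /health 200",
]
for line in samples:
    print(line, dropped(line), dropped(line, ignore_case=True))
```

If matching turns out to be case-sensitive, lines such as "Password reset requested" and "API_KEY=abc123" would pass through, so the patterns may need case-insensitive variants.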

Troubleshooting

Common Issues

Plugin Won’t Start:

# Run in the foreground with verbose output to surface configuration errors
sudo logflux-filestream -config /etc/logflux-agent/plugins/filestream.yaml -verbose

# Verify file permissions
ls -la /var/log/*.log

# Check system limits
ulimit -n

Files Not Being Monitored:

# Test file access
sudo -u logflux cat /var/log/app.log

# Check glob pattern matching
ls -la /var/log/*.log

# Verify file exists and is readable
stat /var/log/app.log
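
The same checks can be scripted for every configured pattern at once. The sketch below expands each glob and reports whether the current user can read each match; check_patterns is a hypothetical helper, and Python's glob rules may not match the plugin's pattern syntax exactly.

```python
import glob
import os
from typing import Dict, List, Tuple


def check_patterns(patterns: List[str]) -> Dict[str, List[Tuple[str, bool]]]:
    """Map each pattern to its matching files and whether each is readable."""
    report = {}
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        report[pattern] = [
            (path, os.access(path, os.R_OK))   # readable by the current user?
            for path in matches
            if os.path.isfile(path)
        ]
    return report


if __name__ == "__main__":
    for pattern, files in check_patterns(["/var/log/*.log"]).items():
        if not files:
            print(f"{pattern}: no matching files")
        for path, readable in files:
            print(f"{pattern}: {path} readable={readable}")
```

Running the script as the service account (for example with sudo -u logflux) shows what that user can actually read, which is what matters for the plugin.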

High Memory Usage:

# Reduce memory consumption
batch:
  max_memory: 10MB
  max_size: 50

plugin:
  buffer_size: 65536  # 64KB
  max_line_length: 4096  # 4KB

Missing Log Lines:

# Increase the buffer size and check files more often
plugin:
  buffer_size: 2097152  # 2MB
  check_interval: 50ms

batch:
  flush_interval: 1s  # Faster flushing

Debugging

# Enable debug logging
sudo systemctl edit logflux-filestream
# Add:
[Service]
Environment="LOGFLUX_LOG_LEVEL=debug"

# Monitor plugin activity
sudo journalctl -u logflux-filestream -f

# Test file monitoring
echo "test message" >> /var/log/app.log

# Check file descriptors (pgrep needs -f for names longer than 15 characters)
sudo lsof -p $(pgrep -o -f logflux-filestream)

Best Practices

Configuration Management

  1. Use specific file paths when possible instead of broad glob patterns
  2. Enable batching for production environments to improve performance
  3. Set appropriate buffer sizes based on log line lengths
  4. Monitor file descriptor usage to prevent resource exhaustion

Performance

  1. Optimize check intervals - balance responsiveness vs CPU usage
  2. Use efficient glob patterns - avoid overly broad patterns
  3. Enable compression for high-volume environments
  4. Monitor memory usage and set appropriate limits

Security

  1. Follow least privilege principle - only monitor necessary files
  2. Use specific directory patterns - avoid monitoring sensitive locations
  3. Filter sensitive information - exclude passwords, tokens, etc.
  4. Regular permission audits - ensure appropriate file access

Monitoring

  1. Set up health checks for plugin availability
  2. Monitor file growth rates and disk usage
  3. Track processing rates and identify bottlenecks
  4. Alert on permission errors or file access issues

Integration Examples

ELK Stack Replacement

# Replace Filebeat with LogFlux filestream
plugin:
  patterns:
    - "/var/log/elasticsearch/*.log"
    - "/var/log/logstash/*.log"
    - "/var/log/kibana/*.log"

logging:
  labels:
    stack: elk
    replaced: filebeat

Splunk Migration

# Migrate from Splunk Universal Forwarder
plugin:
  files:
    - /var/log/app.log
    - /var/log/security.log
  
  patterns:
    - "/opt/apps/*/logs/*.log"

logging:
  labels:
    migration: splunk
    indexer: logflux

Cloud Migration

# Prepare for cloud migration
plugin:
  patterns:
    - "/var/log/*.log"
  
# Include cloud metadata
logging:
  labels:
    cloud_ready: true
    migration_phase: preparation

Next Steps