Zipkin Distributed Tracing Integration

Receive and process distributed traces from Zipkin-instrumented applications with the LogFlux Agent


The LogFlux Zipkin integration receives distributed traces from Zipkin-instrumented applications and converts them into searchable trace entries within LogFlux. This plugin provides full Zipkin v2 API compatibility while adding the benefits of LogFlux’s centralized observability platform, advanced querying capabilities, and correlation with logs and metrics.

Overview

The Zipkin plugin provides:

  • Full Zipkin v2 API Compatibility: Drop-in replacement for Zipkin server with /api/v2/spans endpoint
  • Legacy v1 Support: Backward compatibility with older Zipkin clients
  • High-Performance Processing: >20,000 spans/second throughput with batching optimization
  • Comprehensive Span Data: Preserves all Zipkin span metadata including tags, annotations, and timing
  • Service Discovery Integration: Automatic service and operation discovery from trace data
  • Advanced Filtering: Filter spans by service, operation, duration, errors, and custom tags
  • Centralized Storage: Store traces in LogFlux for correlation with logs and metrics
  • Multi-Client Support: Compatible with all major Zipkin client libraries

Installation

The Zipkin plugin is included with the LogFlux Agent but disabled by default.

Prerequisites

  • LogFlux Agent installed (see Installation Guide)
  • Applications instrumented with Zipkin tracing libraries
  • Network connectivity on port 9411 (Zipkin standard port)
  • Optional: Load balancer configuration for high availability

Enable the Plugin

# Enable and start the Zipkin plugin
sudo systemctl enable --now logflux-zipkin

# Check status
sudo systemctl status logflux-zipkin

Configuration

Basic Configuration

Create or edit the Zipkin plugin configuration:

sudo nano /etc/logflux-agent/plugins/zipkin.yaml

Basic configuration:

# Zipkin Plugin Configuration
name: zipkin
version: 1.0.0
source: zipkin-plugin

# Agent connection
agent:
  socket_path: /tmp/logflux-agent.sock

# HTTP server settings
server:
  # Zipkin standard port
  bind_addr: "0.0.0.0:9411"
  
  # Request limits
  max_request_size: "10MB"
  max_spans_per_request: 5000
  
  # Timeouts
  read_timeout: "30s"
  write_timeout: "10s"
  idle_timeout: "120s"

# Span processing
processing:
  # Convert span data to log levels
  auto_detect_levels: true
  
  # Default level for spans without error indicators
  default_level: "info"

# Metadata and labeling
metadata:
  labels:
    plugin: zipkin
    source: distributed_tracing
  
  # Include comprehensive span metadata
  include_span_metadata: true

# Batching for efficiency
batch:
  enabled: true
  size: 100
  flush_interval: 5s

Advanced Configuration

# Advanced Zipkin Configuration
name: zipkin
version: 1.0.0
source: zipkin-plugin

# Enhanced agent settings
agent:
  socket_path: /tmp/logflux-agent.sock
  connect_timeout: 30s
  max_retries: 5
  retry_delay: 10s

# HTTP server configuration
server:
  bind_addr: "0.0.0.0:9411"
  
  # Enhanced request handling
  max_request_size: "25MB"
  max_spans_per_request: 10000
  max_concurrent_requests: 200
  
  # Timeout configuration
  read_timeout: "60s"
  write_timeout: "30s"
  idle_timeout: "300s"
  
  # Keep-alive settings
  keep_alive_enabled: true
  keep_alive_timeout: "30s"

# Advanced span filtering
filters:
  # Service filtering
  allowed_services:
    - "user-service"
    - "payment-service"
    - "notification-service"
  
  denied_services:
    - "health-check-service"
    - "ping-service"
  
  # Operation filtering
  allowed_operations: []
  denied_operations:
    - "health-check"
    - "ping"
    - "metrics"
  
  # Duration-based filtering (microseconds)
  min_duration: 1000        # 1ms minimum
  max_duration: 30000000    # 30s maximum
  
  # Error and warning filtering
  errors_only: false
  include_warnings: true
  
  # Tag-based filtering
  required_tags:
    - "http.method"
    - "http.url"
  
  excluded_tags:
    - "internal.debug"
    - "temp.data"
  
  # Custom tag filters
  tag_filters:
    "http.status_code": "^[45]\\d{2}$"  # 4xx and 5xx errors only
    "environment": "production"

# Enhanced span processing
processing:
  # Level detection from span data
  auto_detect_levels: true
  level_detection:
    error_indicators:
      - "error=true"
      - "http.status_code=5xx"
      - "exception"
    warning_indicators:
      - "warning=true"
      - "http.status_code=4xx"
      - "slow_query"
  
  # Span enrichment
  enrich_spans: true
  enrichment:
    add_service_version: true
    add_environment_info: true
    add_host_info: true
  
  # Data sanitization
  sanitize_data: true
  sensitive_tag_patterns:
    - "password"
    - "secret"
    - "token"
    - "api_key"

# Performance optimization
performance:
  # Batch processing
  batch_size: 200
  flush_interval: 10s
  max_batch_wait: "30s"
  
  # Memory management
  max_memory: "512MB"
  span_buffer_size: 10000
  
  # Concurrent processing
  worker_threads: 10
  queue_size: 50000
  
  # Rate limiting
  max_spans_per_second: 10000
  burst_limit: 15000

# Enhanced metadata
metadata:
  verbose: true
  labels:
    plugin: zipkin
    source: distributed_tracing
    environment: production
    cluster: main
  
  # Span metadata mapping
  include_span_metadata: true
  metadata_prefix: "span_"
  
  # Custom field mapping
  field_mapping:
    trace_id: "trace_id"
    span_id: "span_id"
    parent_span_id: "parent_span_id"
    operation_name: "operation"
    service_name: "service"
    span_kind: "span_kind"
    duration: "duration_us"

# Resource limits
limits:
  max_concurrent_requests: 500
  memory_limit: "1GB"
  cpu_limit: "4"
  
  # Span limits per service
  service_limits:
    high_volume_service: 50000
    normal_service: 10000
    low_priority_service: 1000

# Health monitoring
health:
  check_interval: 30s
  max_processing_errors: 100
  alert_on_high_latency: true
  latency_threshold: "100ms"
  stats_collection: true
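The tag_filters values above are regular expressions matched against span tag values. A minimal sketch of that matching behavior (illustrative only — the plugin's internals aren't published, and the function name here is hypothetical):

```python
import re

def span_passes_tag_filters(span_tags, tag_filters):
    """Return True only if every configured regex matches the span's tag value."""
    for tag_key, pattern in tag_filters.items():
        value = span_tags.get(tag_key)
        # A span missing a filtered tag is rejected
        if value is None or not re.search(pattern, value):
            return False
    return True

tag_filters = {
    "http.status_code": r"^[45]\d{2}$",  # 4xx and 5xx only
    "environment": "production",
}
```

With these filters, a span tagged `http.status_code=503, environment=production` passes, while a `200` response or a span missing either tag is dropped.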

Usage Examples

Microservices Tracing

# Microservices distributed tracing
server:
  bind_addr: "0.0.0.0:9411"

filters:
  allowed_services:
    - "api-gateway"
    - "user-service"
    - "product-service"
    - "order-service"
    - "payment-service"

processing:
  auto_detect_levels: true
  enrich_spans: true

metadata:
  labels:
    architecture: microservices
    environment: production

Error and Performance Monitoring

# Focus on errors and slow operations
filters:
  # Include errors and slow operations
  min_duration: 5000000  # 5 seconds
  
  tag_filters:
    "http.status_code": "^[45]\\d{2}$"  # 4xx/5xx errors
  
  include_warnings: true

processing:
  level_detection:
    error_indicators:
      - "error=true"
      - "exception"
      - "http.status_code=5xx"
    warning_indicators:
      - "http.status_code=4xx"
      - "slow_query=true"

metadata:
  labels:
    monitoring_type: error_tracking
    focus: performance

High-Volume Production

# High-throughput production environment
server:
  max_concurrent_requests: 1000
  max_spans_per_request: 50000

performance:
  batch_size: 1000
  flush_interval: 5s
  worker_threads: 20
  max_spans_per_second: 50000

limits:
  memory_limit: "2GB"
  cpu_limit: "8"

metadata:
  labels:
    volume: high
    optimization: throughput

Client Integration

Java (Spring Boot + Sleuth)

# application.yml
spring:
  sleuth:
    zipkin:
      base-url: http://logflux-agent:9411
      sender:
        type: web
    sampler:
      probability: 1.0  # 100% sampling for development
// Custom span creation
@Component
public class PaymentService {
    
    @Autowired
    private Tracer tracer;
    
    public void processPayment(String userId, Double amount) {
        Span span = tracer.nextSpan()
            .name("payment-processing")
            .tag("user.id", userId)
            .tag("payment.amount", amount.toString())
            .start();
        
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) { // Sleuth 3.x Tracer API
            // Payment processing logic
            span.tag("payment.status", "success");
        } catch (Exception e) {
            span.tag("error", "true");
            span.tag("error.message", e.getMessage());
            throw e;
        } finally {
            span.end();
        }
    }
}

Python

# Python with py_zipkin
import requests
from py_zipkin.zipkin import zipkin_span

def http_transport(encoded_span):
    # Ship encoded spans to the LogFlux agent's Zipkin endpoint
    requests.post(
        'http://logflux-agent:9411/api/v2/spans',
        data=encoded_span,
        headers={'Content-Type': 'application/json'},
    )

def get_user(user_id):
    with zipkin_span(
        service_name='user-service',
        span_name='get_user',
        transport_handler=http_transport,
        port=9411,
        sample_rate=100.0,  # 100% sampling
    ) as span:
        span.update_binary_annotations({'user.id': user_id})

        # Business logic
        user = fetch_user_from_db(user_id)

        span.update_binary_annotations({
            'user.found': str(user is not None),
        })
        return user

Go

package main

import (
    "context"
    "net/http"

    "github.com/openzipkin/zipkin-go"
    zipkinhttp "github.com/openzipkin/zipkin-go/middleware/http"
    httpreporter "github.com/openzipkin/zipkin-go/reporter/http"
)

// Package-level tracer so HTTP handlers can create child spans
var tracer *zipkin.Tracer

func main() {
    // Create HTTP reporter pointing at the LogFlux agent
    rep := httpreporter.NewReporter("http://logflux-agent:9411/api/v2/spans")
    defer rep.Close()

    // Describe this service
    endpoint, err := zipkin.NewEndpoint("order-service", "localhost:8080")
    if err != nil {
        panic(err)
    }

    // Create tracer
    tracer, err = zipkin.NewTracer(
        rep,
        zipkin.WithLocalEndpoint(endpoint),
        zipkin.WithSampler(zipkin.NewModuloSampler(1)), // 100% sampling
    )
    if err != nil {
        panic(err)
    }

    // Instrument HTTP server
    http.Handle("/orders", zipkinhttp.NewServerMiddleware(
        tracer,
        zipkinhttp.TagResponseSize(true),
        zipkinhttp.SpanName("create_order"),
    )(http.HandlerFunc(createOrderHandler)))

    http.ListenAndServe(":8080", nil)
}

func createOrderHandler(w http.ResponseWriter, r *http.Request) {
    // Create child span
    span, ctx := tracer.StartSpanFromContext(r.Context(), "validate_order")
    defer span.Finish()

    // Add custom tags
    span.Tag("order.type", "online")
    span.Tag("user.id", r.Header.Get("User-ID"))

    // Business logic with tracing
    if err := validateOrder(ctx); err != nil {
        span.Tag("error", "true")
        span.Tag("error.message", err.Error())
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }

    w.WriteHeader(http.StatusCreated)
    w.Write([]byte("Order created successfully"))
}

func validateOrder(ctx context.Context) error {
    // Order validation logic goes here
    return nil
}

Node.js

const express = require('express');
const CLSContext = require('zipkin-context-cls');
const { Tracer, BatchRecorder, jsonEncoder } = require('zipkin');
const { HttpLogger } = require('zipkin-transport-http');
const zipkinMiddleware = require('zipkin-instrumentation-express').expressMiddleware;

// Configure Zipkin tracer
const recorder = new BatchRecorder({
  logger: new HttpLogger({
    endpoint: 'http://logflux-agent:9411/api/v2/spans',
    jsonEncoder: jsonEncoder.JSON_V2
  })
});

const tracer = new Tracer({
  ctxImpl: new CLSContext('zipkin'),
  recorder,
  localServiceName: 'notification-service',
  supportsJoin: false
});

const app = express();
app.use(express.json());

// Add Zipkin middleware
app.use(zipkinMiddleware({ tracer }));

app.post('/notify', (req, res) => {
  // Manual child span via tracer.local
  tracer.local('send_notification', () => {
    tracer.recordBinary('notification.type', req.body.type);
    tracer.recordBinary('user.id', req.body.userId);

    try {
      // Send notification logic
      sendNotification(req.body);
      tracer.recordBinary('notification.status', 'sent');
    } catch (error) {
      tracer.recordBinary('error', 'true');
      tracer.recordBinary('error.message', error.message);
      throw error;
    }
  });

  res.json({ status: 'sent' });
});

app.listen(3000);

Span Data Processing

Zipkin Span Format

Input Zipkin Span:

{
  "traceId": "1234567890abcdef1234567890abcdef",
  "id": "abcdef1234567890",
  "parentId": "fedcba0987654321",
  "name": "payment-processing",
  "timestamp": 1642694450123456,
  "duration": 125000,
  "kind": "SERVER",
  "localEndpoint": {
    "serviceName": "payment-service",
    "ipv4": "10.0.0.5",
    "port": 8080
  },
  "remoteEndpoint": {
    "serviceName": "api-gateway",
    "ipv4": "10.0.0.1",
    "port": 80
  },
  "tags": {
    "http.method": "POST",
    "http.url": "/api/payments",
    "http.status_code": "200",
    "user.id": "user123",
    "payment.amount": "99.99"
  },
  "annotations": [
    {
      "timestamp": 1642694450148456,
      "value": "payment.validated"
    }
  ]
}

Output LogFlux Trace Entry:

{
  "timestamp": "2024-01-20T14:30:50.123Z",
  "level": "info",
  "message": "payment-processing",
  "node": "payment-service",
  "type": "trace",
  "metadata": {
    "source_type": "plugin",
    "source_name": "zipkin",
    "trace_id": "1234567890abcdef1234567890abcdef",
    "span_id": "abcdef1234567890",
    "parent_span_id": "fedcba0987654321",
    "operation": "payment-processing",
    "service_name": "payment-service",
    "span_kind": "SERVER",
    "duration_us": 125000,
    "local_endpoint": "10.0.0.5:8080",
    "remote_endpoint": "10.0.0.1:80",
    "tag_http.method": "POST",
    "tag_http.url": "/api/payments",
    "tag_http.status_code": "200",
    "tag_user.id": "user123",
    "tag_payment.amount": "99.99",
    "annotations": [
      {
        "timestamp": "2024-01-20T14:30:50.148Z",
        "value": "payment.validated"
      }
    ],
    "plugin": "zipkin",
    "environment": "production"
  }
}
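The conversion above can be sketched as follows. This is an illustrative reimplementation, not the plugin's actual source; field names follow the example output, and the `zipkin_to_logflux` name is hypothetical:

```python
from datetime import datetime, timezone

def zipkin_to_logflux(span):
    """Convert a Zipkin v2 span (dict) into a LogFlux-style trace entry."""
    def iso_ms(micros):
        # Zipkin timestamps are epoch microseconds; LogFlux uses ISO-8601 with ms
        dt = datetime.fromtimestamp(micros // 1_000_000, tz=timezone.utc)
        return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{(micros % 1_000_000) // 1000:03d}Z"

    metadata = {
        "trace_id": span["traceId"],
        "span_id": span["id"],
        "parent_span_id": span.get("parentId"),
        "operation": span["name"],
        "service_name": span["localEndpoint"]["serviceName"],
        "span_kind": span.get("kind"),
        "duration_us": span.get("duration"),
    }
    # Flatten span tags under a "tag_" prefix, as in the example output
    for key, value in span.get("tags", {}).items():
        metadata["tag_" + key] = value

    return {
        "timestamp": iso_ms(span["timestamp"]),
        "level": "info",
        "message": span["name"],
        "node": span["localEndpoint"]["serviceName"],
        "type": "trace",
        "metadata": metadata,
    }

entry = zipkin_to_logflux({
    "traceId": "1234567890abcdef1234567890abcdef",
    "id": "abcdef1234567890",
    "name": "payment-processing",
    "timestamp": 1642694450123456,
    "duration": 125000,
    "kind": "SERVER",
    "localEndpoint": {"serviceName": "payment-service"},
    "tags": {"http.method": "POST"},
})
```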

Span Kind Mapping

Zipkin Kind | Description      | Use Case
CLIENT      | Outbound request | HTTP client, database query, external API call
SERVER      | Inbound request  | HTTP server, message handler, RPC server
PRODUCER    | Message sent     | Kafka producer, queue publisher
CONSUMER    | Message received | Kafka consumer, queue subscriber

Level Detection

The plugin automatically maps span characteristics to log levels:

Condition         | Level   | Description
error=true tag    | Error   | Explicit error indication
HTTP 5xx status   | Error   | Server errors
Exception in tags | Error   | Exception or error occurred
HTTP 4xx status   | Warning | Client errors
warning=true tag  | Warning | Explicit warning indication
Slow operation    | Warning | Duration exceeds threshold
Default           | Info    | Normal operation
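The mapping above amounts to a simple precedence check, errors before warnings. A sketch of that logic (illustrative, not the plugin's source; the 5-second slow threshold is an assumed example value):

```python
def detect_level(tags, status_code=None, duration_us=0, slow_threshold_us=5_000_000):
    """Map span characteristics to a log level, error conditions taking precedence."""
    if tags.get("error") == "true" or "exception" in tags:
        return "error"
    if status_code is not None and 500 <= status_code <= 599:
        return "error"
    if status_code is not None and 400 <= status_code <= 499:
        return "warning"
    if tags.get("warning") == "true" or duration_us > slow_threshold_us:
        return "warning"
    return "info"
```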

Performance Optimization

High-Throughput Configuration

# Optimize for maximum throughput
server:
  max_concurrent_requests: 2000
  max_spans_per_request: 100000

performance:
  batch_size: 2000
  flush_interval: 3s
  worker_threads: 50
  max_spans_per_second: 100000

limits:
  memory_limit: "4GB"
  cpu_limit: "16"

Low-Latency Configuration

# Optimize for minimal latency
performance:
  batch_size: 10
  flush_interval: 100ms
  worker_threads: 5

batch:
  enabled: false  # Disable batching for immediate processing

server:
  read_timeout: "1s"
  write_timeout: "500ms"

Memory-Constrained Configuration

# Optimize for limited memory
performance:
  batch_size: 50
  span_buffer_size: 1000
  max_memory: "128MB"

limits:
  memory_limit: "256MB"
  max_spans_per_second: 5000

server:
  max_concurrent_requests: 50
  max_spans_per_request: 1000

API Endpoints

Health Check

# Health check endpoint
curl http://localhost:9411/health

# Response
{
  "status": "UP",
  "zipkin": {
    "status": "UP",
    "spans_received": 125847,
    "uptime": "2h15m30s"
  }
}

Spans Endpoint (v2)

# Send spans to v2 API
curl -X POST http://localhost:9411/api/v2/spans \
  -H "Content-Type: application/json" \
  -d '[{
    "traceId": "1234567890abcdef",
    "id": "abcdef1234567890",
    "name": "test-span",
    "timestamp": 1642694450123456,
    "duration": 10000,
    "localEndpoint": {
      "serviceName": "test-service"
    }
  }]'

Legacy v1 Support

# Legacy v1 API support
curl -X POST http://localhost:9411/api/v1/spans \
  -H "Content-Type: application/json" \
  -d '[{
    "traceId": "1234567890abcdef",
    "id": "abcdef1234567890",
    "name": "test-span",
    "annotations": []
  }]'

Monitoring and Alerting

Plugin Health Monitoring

#!/bin/bash
# check-zipkin-plugin.sh

if ! systemctl is-active --quiet logflux-zipkin; then
    echo "CRITICAL: LogFlux Zipkin plugin is not running"
    exit 2
fi

# Check HTTP endpoint
if ! curl -s http://localhost:9411/health | grep -q '"status": *"UP"'; then
    echo "CRITICAL: Zipkin endpoint not responding"
    exit 2
fi

# Check recent span processing
if ! journalctl -u logflux-zipkin --since="10 minutes ago" | grep -q "spans processed"; then
    echo "WARNING: No spans processed in last 10 minutes"
    exit 1
fi

echo "OK: LogFlux Zipkin plugin is healthy"
exit 0

Performance Metrics

# Check span processing rate
curl -s http://localhost:9411/health | jq '.zipkin.spans_received'

# Monitor plugin resource usage
ps aux | grep logflux-zipkin

# Check HTTP request metrics
netstat -an | grep :9411 | wc -l

Alerting Examples

# Grafana/Prometheus alerting
- alert: ZipkinHighErrorRate
  expr: rate(zipkin_spans_errors_total[5m]) > 0.1
  labels:
    severity: warning
  annotations:
    summary: "High error rate in Zipkin spans"
    
- alert: ZipkinHighLatency
  expr: zipkin_span_processing_duration_seconds > 0.1
  labels:
    severity: critical
  annotations:
    summary: "High span processing latency"

Common Use Cases

Service Mesh Integration

# Istio/Envoy integration
filters:
  allowed_services:
    - "productpage"
    - "details"
    - "reviews"
    - "ratings"
  
  # Focus on service mesh spans
  required_tags:
    - "istio.mesh_id"
    - "envoy.cluster_name"

metadata:
  labels:
    platform: service_mesh
    mesh_type: istio

Database Query Tracing

# Database operation monitoring
filters:
  allowed_operations:
    - "db.query"
    - "db.insert"
    - "db.update"
    - "db.delete"
  
  # Slow query detection
  min_duration: 10000  # 10ms
  
  tag_filters:
    "db.type": "(mysql|postgresql|mongodb)"

processing:
  level_detection:
    warning_indicators:
      - "slow_query=true"
      - "duration>100000"  # >100ms

metadata:
  labels:
    monitoring_type: database
    focus: performance

API Gateway Tracing

# API gateway and routing
filters:
  allowed_services:
    - "api-gateway"
    - "auth-service"
  
  required_tags:
    - "http.method"
    - "http.url"
    - "http.status_code"

processing:
  level_detection:
    error_indicators:
      - "http.status_code=5xx"
    warning_indicators:
      - "http.status_code=4xx"

metadata:
  labels:
    service_type: api_gateway
    monitoring: traffic

Troubleshooting

Common Issues

Plugin Won’t Start:

# Check port availability
netstat -an | grep :9411

# Check systemd service
systemctl status logflux-zipkin
journalctl -u logflux-zipkin -f

Spans Not Appearing:

# Test endpoint connectivity
curl http://localhost:9411/health

# Send test span
curl -X POST http://localhost:9411/api/v2/spans \
  -H "Content-Type: application/json" \
  -d '[{"traceId":"1234567890abcdef","id":"abcdef1234567890","name":"test"}]'

# Check agent connectivity
ls -la /tmp/logflux-agent.sock

High Memory Usage:

# Reduce memory consumption
performance:
  batch_size: 100
  span_buffer_size: 5000
  max_memory: "256MB"

limits:
  memory_limit: "512MB"
  max_spans_per_second: 10000

Connection Timeouts:

# Adjust timeout settings
server:
  read_timeout: "60s"
  write_timeout: "30s"
  idle_timeout: "300s"
  keep_alive_timeout: "60s"

Debugging

# Enable verbose logging
export LOGFLUX_LOG_LEVEL=debug
logflux-zipkin -config /etc/logflux-agent/plugins/zipkin.yaml -verbose

# Monitor span processing
journalctl -u logflux-zipkin -f

# Test with curl
curl -v -X POST http://localhost:9411/api/v2/spans \
  -H "Content-Type: application/json" \
  -d '[{"traceId":"1234567890abcdef","id":"abcdef1234567890","name":"debug-span"}]'

Best Practices

Configuration Management

  1. Use appropriate batch sizes based on trace volume
  2. Configure service filtering to focus on important services
  3. Set reasonable timeouts for your network environment
  4. Monitor resource usage and adjust limits accordingly

Instrumentation

  1. Add meaningful span names that describe operations
  2. Use consistent tag naming across services
  3. Include error information in span tags when failures occur
  4. Sample appropriately based on traffic volume

Performance

  1. Batch spans efficiently to balance latency and throughput
  2. Filter noisy operations like health checks and metrics
  3. Use appropriate sampling rates to manage volume
  4. Monitor processing latency and adjust worker threads

Security

  1. Avoid sensitive data in span names and tags
  2. Use network security to protect the Zipkin endpoint
  3. Implement proper firewall rules for production environments
  4. Monitor for unusual trace patterns that might indicate issues
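The sensitive_tag_patterns setting shown in the advanced configuration backs the first point above. A minimal sketch of that redaction (illustrative only; the function name and `[REDACTED]` placeholder are assumptions, not the plugin's documented behavior):

```python
def sanitize_tags(tags, sensitive_patterns=("password", "secret", "token", "api_key")):
    """Replace the values of tags whose keys contain a sensitive substring."""
    return {
        key: "[REDACTED]" if any(p in key.lower() for p in sensitive_patterns) else value
        for key, value in tags.items()
    }
```

For example, `sanitize_tags({"http.method": "POST", "auth.token": "abc123"})` keeps the HTTP method but masks the token value.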

Migration from Zipkin Server

Configuration Changes

# Before (Zipkin Server)
# No configuration needed - just point clients to zipkin-server:9411

# After (LogFlux Zipkin Plugin)
server:
  bind_addr: "0.0.0.0:9411"  # Same port, same API

# Optional enhancements
filters:
  # Filter out noise that Zipkin Server couldn't handle
  denied_operations: ["health-check", "ping"]

metadata:
  labels:
    migrated_from: zipkin_server

Client Changes

No client-side changes required! The plugin provides full API compatibility:

  • Same endpoints: /api/v2/spans, /api/v1/spans
  • Same data format: Standard Zipkin JSON
  • Same port: 9411 (configurable)
  • Same behavior: Batch processing, HTTP API

Query Migration

# Before: Zipkin UI queries
http://zipkin-server:9411/zipkin/traces?serviceName=user-service

# After: LogFlux API queries  
curl "http://logflux-api/search" \
  -d '{"query": "service_name:user-service AND type:trace"}'

Disclaimer

Zipkin and the Zipkin logo are trademarks of The Linux Foundation. LogFlux is not affiliated with, endorsed by, or sponsored by The Linux Foundation or the Zipkin project. The Zipkin logo is used solely for identification purposes to indicate compatibility with the Zipkin distributed tracing system.

Next Steps