DeepTrace

Yantao Geng*, Han Zhang*, Zhiheng Wu

Tsinghua University | ACM SIGCOMM 2025


DeepTrace is a non-intrusive distributed tracing framework that correlates requests through transaction analysis. It is designed to address the challenges of monitoring and diagnosing large-scale microservice systems in production environments.

Core Features

  • No application code changes: efficient and accurate end-to-end request tracing is achieved solely through host-side transaction analysis.
  • Multi-protocol compatibility: Supports over 20 mainstream application-layer protocols.
  • Precise tracing: High-accuracy request correlation via transaction analysis.
  • Low overhead: Low performance overhead and low memory usage, suitable for high-concurrency scenarios.

Practical Applications

DeepTrace has been deployed in the production environments of dozens of companies for the following tasks:

  • Fault diagnosis
  • Resource optimization
  • Performance analysis
  • Security auditing

In high-concurrency scenarios, DeepTrace maintains a tracing accuracy of over 95%.


Cross-Protocol Tracing

DeepTrace constructs Spans across protocols without modifying applications. It uses eBPF to capture raw network data at the system-call layer.

Based on an analysis of RFCs and open-source implementations, DeepTrace defines protocol templates (covering over 20 mainstream application-layer protocols) that perform protocol type inference and content partitioning.
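Protocol type inference can be illustrated with a minimal Python sketch. The `ProtocolTemplate` class, the `infer_protocol` function, and the three signatures below are hypothetical and much simpler than real templates. They only check well-known leading bytes: an HTTP/1.x method or status line, the HTTP/2 client connection preface, and a RESP type byte for Redis.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical template: a name plus a predicate over the first captured bytes.
@dataclass(frozen=True)
class ProtocolTemplate:
    name: str
    matches: Callable[[bytes], bool]

HTTP1_PREFIXES = (b"GET ", b"POST ", b"PUT ", b"DELETE ", b"HEAD ",
                  b"OPTIONS ", b"PATCH ", b"HTTP/1.")
HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"  # HTTP/2 client connection preface
RESP_TYPE_BYTES = (b"*", b"+", b"-", b":", b"$")     # RESP2 type markers

# Order matters: more specific signatures are tried first.
TEMPLATES = [
    ProtocolTemplate("http2", lambda p: p.startswith(HTTP2_PREFACE)),
    ProtocolTemplate("http1", lambda p: p.startswith(HTTP1_PREFIXES)),
    ProtocolTemplate("redis", lambda p: p[:1] in RESP_TYPE_BYTES),
]

def infer_protocol(payload: bytes) -> Optional[str]:
    """Return the name of the first template whose signature matches, else None."""
    for template in TEMPLATES:
        if template.matches(payload):
            return template.name
    return None
```

For example, `infer_protocol(b"*1\r\n$4\r\nPING\r\n")` returns `"redis"`. A production template would validate more of the message before committing to a protocol, because single-byte checks such as the RESP one are prone to false positives.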

Content partitioning uses the length fields built into protocols to perform "skip parsing": rather than scanning every byte, the parser jumps ahead by the declared length. This segments request boundaries efficiently and accurately.
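Here is a minimal sketch of skip parsing. It assumes a protocol whose messages carry a 4-byte big-endian length prefix, a hypothetical framing similar to Kafka's size field. The parser reads only each header and then jumps over the body to the next boundary. `split_messages` is an illustrative name, not part of DeepTrace.

```python
import struct
from typing import List, Tuple

HEADER = struct.Struct(">I")  # assumed framing: 4-byte big-endian body length

def split_messages(stream: bytes) -> Tuple[List[bytes], bytes]:
    """Split a captured byte stream into complete length-prefixed messages.

    Returns (messages, remainder); the remainder is an incomplete trailing
    message to be prepended to the next captured chunk.
    """
    messages: List[bytes] = []
    offset = 0
    while len(stream) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(stream, offset)  # read only the header
        end = offset + HEADER.size + length
        if end > len(stream):
            break  # body not fully captured yet
        messages.append(stream[offset + HEADER.size:end])  # body is never scanned
        offset = end  # jump straight to the next message boundary
    return messages, stream[offset:]
```

The cost of finding boundaries depends on the number of messages, not the number of body bytes. This is the main advantage of using length fields over scanning for delimiters.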

Figure: Cross-protocol tracing visualization (protocol template, skip parsing, request partitioning)

High-Precision Request Correlation

The core innovation of DeepTrace is its transaction-based Span correlation mechanism, which addresses the accuracy limitations of traditional non-intrusive solutions under high concurrency.

It infers causal relationships between requests by analyzing stateful request content, including API call relationships and transaction fields carried in request payloads. It also combines the probability distributions of multiple Span metrics (such as start/end time gaps, request size, and duration) to estimate, across multiple dimensions, the probability that one request is the parent of another.

Even under high concurrency, DeepTrace maintains high tracing accuracy.
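The sketch below shows the general idea of scoring candidate parents, but it rests on strong assumptions. Spans are assumed to carry a dictionary of already-extracted transaction fields. A child must be nested within its parent in time. The start and end gaps are modeled as Gaussians with made-up parameters. `Span`, `parent_score`, `best_parent`, the field weight of 5.0, and the distribution parameters are all hypothetical. Request size and duration models are omitted for brevity. This is an illustration, not DeepTrace's actual model.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Span:
    span_id: str
    start: float  # seconds
    end: float    # seconds
    fields: Dict[str, str] = field(default_factory=dict)  # e.g. {"order_id": "42"}

def gaussian_logpdf(x: float, mean: float, std: float) -> float:
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

# Hypothetical metric models; in practice these would be fitted from observed traffic.
START_GAP = (0.002, 0.001)  # child typically starts ~2 ms after its parent
END_GAP = (0.001, 0.001)    # parent typically ends ~1 ms after its child
FIELD_WEIGHT = 5.0          # log-score bonus per matching transaction field

def parent_score(parent: Span, child: Span) -> float:
    """Log-score that `parent` is the parent of `child` (higher is more likely)."""
    if not (parent.start <= child.start and child.end <= parent.end):
        return -math.inf  # causality: a child must be nested inside its parent
    shared = sum(1 for k, v in child.fields.items() if parent.fields.get(k) == v)
    score = FIELD_WEIGHT * shared
    score += gaussian_logpdf(child.start - parent.start, *START_GAP)
    score += gaussian_logpdf(parent.end - child.end, *END_GAP)
    return score

def best_parent(child: Span, candidates: List[Span]) -> Optional[Span]:
    """Pick the highest-scoring feasible candidate, or None if none is feasible."""
    scored = [(parent_score(p, child), p) for p in candidates]
    feasible = [(s, p) for s, p in scored if s != -math.inf]
    if not feasible:
        return None
    return max(feasible, key=lambda sp: sp[0])[1]
```

Suppose two concurrent requests have identical timing. In that case the timing terms cannot separate them, so a shared transaction field decides the match. This is the case in which purely timing-based correlation tends to fail.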

Figure: High-precision request correlation (transaction analysis, causality, API calls, multi-dimensional metrics, probability distributions)

Lightweight On-Demand Collection

To substantially reduce the transmission and storage overhead of massive numbers of Spans, DeepTrace adopts a query-driven strategy that assembles traces on demand. The Agent temporarily stores Spans in memory and builds two indexes:

1) An inverted index for each tag type (e.g., hostname, service name);

2) A histogram (bucketed by value) for each metric (e.g., duration), which reduces cache overhead.

This design avoids redundant storage of identical tags and similar metric values.
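Here is a minimal sketch of the two indexes. It assumes spans are identified by integer IDs and metrics are bucketed by fixed upper bounds. `SpanIndex` and its methods are hypothetical. Each distinct tag value appears once as an index key, and metric values are reduced to bucket numbers. As a result, a metric filter is only as precise as the bucket width.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Set

class SpanIndex:
    """Hypothetical agent-side index: inverted index for tags, histogram for metrics."""

    def __init__(self, bucket_bounds: List[float]):
        self.bounds = sorted(bucket_bounds)
        # tag key -> tag value -> span IDs (each distinct value stored once)
        self.tags: Dict[str, Dict[str, Set[int]]] = defaultdict(lambda: defaultdict(set))
        # metric name -> bucket number -> span IDs (similar values share a bucket)
        self.hist: Dict[str, Dict[int, Set[int]]] = defaultdict(lambda: defaultdict(set))
        self.all_ids: Set[int] = set()

    def bucket(self, value: float) -> int:
        # Bucket i holds values in (bounds[i-1], bounds[i]]; the last bucket is overflow.
        return bisect.bisect_left(self.bounds, value)

    def add(self, span_id: int, tags: Dict[str, str], metrics: Dict[str, float]) -> None:
        self.all_ids.add(span_id)
        for key, value in tags.items():
            self.tags[key][value].add(span_id)
        for name, value in metrics.items():
            self.hist[name][self.bucket(value)].add(span_id)  # exact value is discarded

    def query(self, tags: Dict[str, str], metric: Optional[str] = None,
              at_least: float = 0.0) -> Set[int]:
        """Span IDs matching all tags and (at bucket granularity) metric >= at_least."""
        result = set(self.all_ids)
        for key, value in tags.items():
            result &= self.tags.get(key, {}).get(value, set())
        if metric is not None:
            first = self.bucket(at_least)
            hits: Set[int] = set()
            for b, ids in self.hist.get(metric, {}).items():
                if b >= first:
                    hits |= ids
            result &= hits
        return result
```

In a query-driven design, only the spans whose IDs match a query would need to be assembled and shipped, rather than every span the agent observes.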

Figure: Lightweight on-demand collection (dual index: inverted index and histogram; query-driven, low overhead)