Observation tools for understanding occasionally-slow performance in large-scale distributed transaction systems are not keeping up with the complexity of the environment. The same applies to large database systems, to real-time control systems in cars and airplanes, and to operating system design.
Extremely low-overhead tracing can reveal the true execution and non-execution (waiting) dynamics of such software, running in situ with live traffic. KUtrace is such a tool, based on small Linux kernel patches recording and timestamping every transition between kernel- and user-mode execution across all CPUs of a datacenter or vehicle computer. The resulting displays show exactly what each transaction is doing every nanosecond, and hence show why unpredictable ones are slow, all with tracing overhead well under 1%. Recent additions to KUtrace also show interference between programs and show profiles within long execution stretches that have no transitions.
The net result is deep insight into the dynamics of complex software, leading to often-simple changes to improve performance.
Richard L. Sites wrote his first computer program in 1959 and has spent most of his career at the boundary between hardware and software, with a particular interest in CPU/software performance. His past work includes writing VAX microcode, co-architecting the DEC Alpha, and inventing the performance counters found in nearly all processors today. He has done low-overhead microcode and software tracing at DEC, Adobe, Google, and Tesla. Dr. Sites earned his PhD at Stanford in 1974; he holds 66 patents and is a member of the National Academy of Engineering.