Application-Level Tracing: The Good, the Bad, and the Alternative
Traditional app-level tracing works well only up to a point. For complex troubleshooting, live debugging is taking the lead in modern observability.
The Sailormen's Legacy: A "Trace" of the Past
By the late 16th century, sailors were measuring speed with a chip log, a method crucial for navigation. Knots were tied at regular intervals in a rope, with one end attached to a pie-slice-shaped piece of wood ("chip") that was flung behind the vessel. The rope played out freely as the ship traveled for a fixed amount of time, which was measured using an hourglass. The number of knots that passed over the stern was counted to determine the ship's speed, with one knot equaling one nautical mile an hour. Thus, a ship going at 15 knots could traverse 15 nautical miles in an hour. Records of these measurements were kept in a logbook and used to estimate the distance covered over a given period. The logbook was also used to record the ship's course, weather conditions, and other relevant information. The term "log" was derived from this practice, and it has since been used in various contexts to refer to a record of events or activities.
In software engineering, the principle of logging has been refined and expanded to include "traces." Tracing represents a dynamic method of observing, debugging, and recording activities within an application, providing more detailed, and sometimes interactive, insight than traditional logging.
Unlike the static nature of logbooks used by sailors, application-level tracing involves real-time collection of data about a program's execution. This technique is vital for debugging and optimizing software, allowing developers to follow the flow of execution and understand the state of a program at different points in time. Tracing can help identify performance bottlenecks, detect errors, and analyze the behavior of complex systems.
The Trace of the Matter: What Is Trace-Level Logging?
Some of the most common logging levels are "Debug," "Info," "Warn," "Error," and "Fatal." Based on the severity of the message, developers can choose the appropriate level to log. For example, a "Debug" message might be used to log detailed information about the program's execution, while an "Error" message might be used to log a critical failure. However, there are cases where developers need more detailed information about the program's execution, which is where "Trace" logging comes in. While there's no real standard definition for the "Trace" level, Apache Log4j's official documentation, for instance, defines the "Trace" level as follows:
Fine-grained debug message, typically capturing the flow through the application.
It's true that this level is not as common as the others, but as far as debugging is concerned, it can be a game-changer. Here is a brief overview of the most common logging levels to better position "Trace" in the hierarchy:
Level | Description |
---|---|
Error | Critical errors that require immediate attention |
Warn | Potential problems that are not critical |
Info | Informational messages |
Debug | Detailed information for debugging purposes |
Trace | Very detailed information for lower-level debugging |
All of these log levels are essential for creating a multi-level logging system that can be used in different scenarios and environments. Typically, a developer would use the "Debug" level for general debugging purposes, while the "Trace" level would be used for a deeper level of debugging that requires sophisticated analysis of the program's execution.
Application-Level Tracing in Practice
For the sake of this article, we use Python to demonstrate how developers can use traces in their applications.
Unlike logging frameworks in some other languages (such as Log4j in Java), Python's built-in logging module does not define a trace level out of the box. However, developers can create their own logging levels since the logging module is flexible enough to allow custom levels. That is not our focus here, though. Instead, we will focus on what kind of logs should be added at the trace level. The developer is free to decide: they can either log the same kind of information as at the debug level but in greater detail, or log the execution of the program at a very granular level. The choice depends on the developer's needs, but typically, logging at the trace level is used to capture detailed execution traces, the state of variables, and the flow of the program.
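As a side note, registering a custom trace level with the built-in logging module takes only a few lines. The following is a minimal sketch; the level number 5 (below DEBUG at 10) and the trace() helper attached to Logger are illustrative choices, not a standard API:

import logging

TRACE = 5  # below DEBUG (10); value chosen for this sketch
logging.addLevelName(TRACE, "TRACE")

def trace(self, message, *args, **kwargs):
    # Emit only if the TRACE level is enabled for this logger
    if self.isEnabledFor(TRACE):
        self._log(TRACE, message, args, **kwargs)

logging.Logger.trace = trace  # makes logger.trace(...) available

logging.basicConfig(level=TRACE,
                    format="%(asctime)s | %(levelname)s | %(message)s")
logger = logging.getLogger(__name__)
logger.trace("Entering compute() with n=%d", 42)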
Note that, theoretically, trace-level logging and logging traces are not exactly the same concept. One is a logging level, and the other is a technique to profile the program's execution. However, in our case, we will be logging traces at the trace-level log, which is a common practice. Therefore, we will use the terms "trace-level logging" and "application-level tracing" interchangeably to refer to the practice of logging execution traces.
To demonstrate how trace-level logging can be used in Python and what kind of logs should be added at this level, we can use loguru, a Python logging library that provides a more user-friendly interface than the built-in logging module. Here is an example that demonstrates how a nested function call can be traced using loguru:
import sys
import traceback
from loguru import logger
from itertools import takewhile

# Custom formatter for Loguru to include stack trace in logs without Loguru's internal frames
def tracing_formatter(record):
    frames = takewhile(lambda f: "/loguru/" not in f.filename,
                       traceback.extract_stack())
    stack = " > ".join("{}:{}:{}".format(f.filename, f.name, f.lineno)
                       for f in frames)
    record["extra"]["stack"] = stack
    return "{time} | {level} | {extra[stack]} - {message}\n{exception}"

# Configure logger
logger.add(sys.stderr, format=tracing_formatter,
           backtrace=True, diagnose=True, level="TRACE")

def function_level_3():
    logger.trace("Entering Level 3")
    # Simulate some processing
    res = 1
    logger.trace("Exiting Level 3 with result: {}".format(res))
    return res

def function_level_2():
    logger.trace("Entering Level 2")
    res = function_level_3()
    logger.trace("Exiting Level 2 with result: {}".format(res))
    return res

def function_level_1():
    logger.trace("Entering Level 1")
    res = function_level_2()
    logger.trace("Exiting Level 1 with result: {}".format(res))
    return res

def main():
    logger.trace("Program started")
    res = function_level_1()
    logger.trace("Program finished with result: {}".format(res))
    return res

if __name__ == '__main__':
    while True:
        res = main()
        input("Press Enter to run again")
After executing the code above, the trace logs will be displayed in the console, showing the path of execution through the nested function calls and the state of the variables at different points in time. Here is an example of the trace logs generated by the code above:
2024-07-06T10:16:57.293111+0200 | TRACE | app.py:<module>:46 > app.py:main:39 - Program started
2024-07-06T10:16:57.293519+0200 | TRACE | app.py:<module>:46 > app.py:main:40 > app.py:function_level_1:33 - Entering Level 1
2024-07-06T10:16:57.293592+0200 | TRACE | app.py:<module>:46 > app.py:main:40 > app.py:function_level_1:34 > app.py:function_level_2:27 - Entering Level 2
2024-07-06T10:16:57.293642+0200 | TRACE | app.py:<module>:46 > app.py:main:40 > app.py:function_level_1:34 > app.py:function_level_2:28 > app.py:function_level_3:20 - Entering Level 3
2024-07-06T10:16:57.293690+0200 | TRACE | app.py:<module>:46 > app.py:main:40 > app.py:function_level_1:34 > app.py:function_level_2:28 > app.py:function_level_3:23 - Exiting Level 3 with result: 1
2024-07-06T10:16:57.293737+0200 | TRACE | app.py:<module>:46 > app.py:main:40 > app.py:function_level_1:34 > app.py:function_level_2:29 - Exiting Level 2 with result: 1
2024-07-06T10:16:57.293781+0200 | TRACE | app.py:<module>:46 > app.py:main:40 > app.py:function_level_1:35 - Exiting Level 1 with result: 1
2024-07-06T10:16:57.293824+0200 | TRACE | app.py:<module>:46 > app.py:main:41 - Program finished with result: 1
Press Enter to run again
There are other tools that can be used to trace the execution of a Python program at an even finer level of granularity, such as the built-in trace module. The same module (trace) can also be used programmatically to trace the program's execution and to generate logs that developers can then add to their code with logger.debug() (or logger.trace(), after defining the custom trace-level logger); a short programmatic sketch is shown after the command-line walkthrough below. To better understand how to use this module, let's consider the following code that simulates an inventory management system. The code allows users to add, view, update, and delete items from the inventory.
"""
filename: app.py
usage: python app.py
description: Inventory management system
"""
import json
# Load inventory from file
def load_inventory(filename):
try:
with open(filename, 'r') as file:
return json.load(file)
except FileNotFoundError:
return {}
# Save inventory to file
def save_inventory(inventory, filename):
with open(filename, 'w') as file:
json.dump(inventory, file, indent=4)
# Add a new item to the inventory
def add_item(inventory):
item_name = input("Enter item name: ")
if item_name in inventory:
print("Item already exists in the inventory.")
else:
quantity = int(input("Enter quantity: "))
price = float(input("Enter price: "))
inventory[item_name] = {'quantity': quantity, 'price': price}
print(f"Item {item_name} added to the inventory.")
# View the inventory
def view_inventory(inventory):
if not inventory:
print("The inventory is empty.")
else:
for item_name, details in inventory.items():
print(
f"Item: {item_name}, Quantity: {details['quantity']}, Price: {details['price']}")
# Update the quantity of an item
def update_quantity(inventory):
item_name = input("Enter item name: ")
if item_name in inventory:
quantity = int(input("Enter new quantity: "))
inventory[item_name]['quantity'] = quantity
print(f"Quantity of {item_name} updated to {quantity}.")
else:
print("Item not found in the inventory.")
# Delete an item from the inventory
def delete_item(inventory):
item_name = input("Enter item name: ")
if item_name in inventory:
del inventory[item_name]
print(f"Item {item_name} deleted from the inventory.")
else:
print("Item not found in the inventory.")
def main():
filename = 'inventory.json'
inventory = load_inventory(filename)
while True:
print("\nInventory Management System")
print("1. Add Item")
print("2. View Inventory")
print("3. Update Quantity")
print("4. Delete Item")
print("5. Save and Exit")
choice = input("Enter your choice: ")
if choice == '1':
add_item(inventory)
elif choice == '2':
view_inventory(inventory)
elif choice == '3':
update_quantity(inventory)
elif choice == '4':
delete_item(inventory)
elif choice == '5':
save_inventory(inventory, filename)
print("Inventory saved. Exiting...")
break
else:
print("Invalid choice. Please try again.")
if __name__ == "__main__":
main()
Here is an example of how to log execution traces using the trace module in Python:
python -m trace --trace app.py
After running the command above, the program will start executing, and the trace output will be displayed in the console. This is an example:
--- modulename: app, funcname: <module>
app.py(1): import json
--- modulename: _bootstrap, funcname: _find_and_load
<frozen importlib._bootstrap>(988): --- modulename: _bootstrap, funcname: __init__
[...]
[...]
[...]
[...]
[...]
ap>(112): --- modulename: _bootstrap, funcname: cb
<frozen importlib._bootstrap>(177): <frozen importlib._bootstrap>(178): <frozen importlib._bootstrap>(182): <frozen importlib._bootstrap>(183): <frozen importlib._bootstrap>(185): app.py(14): def load_inventory(filename):
app.py(24): def save_inventory(inventory, filename):
app.py(31): def add_item(inventory):
app.py(44): def view_inventory(inventory):
app.py(55): def update_quantity(inventory):
app.py(67): def delete_item(inventory):
app.py(76): def main():
app.py(104): if __name__ == "__main__":
app.py(105): main()
--- modulename: app, funcname: main
app.py(77): filename = 'inventory.json'
app.py(78): inventory = load_inventory(filename)
--- modulename: app, funcname: load_inventory
app.py(15): try:
app.py(16): with open(filename, 'r') as file:
app.py(18): except FileNotFoundError:
app.py(19): return {}
app.py(80): print("\nInventory Management System")
Inventory Management System
app.py(81): print("1. Add Item")
1. Add Item
app.py(82): print("2. View Inventory")
1. View Inventory
app.py(83): print("3. Update Quantity")
1. Update Quantity
app.py(84): print("4. Delete Item")
1. Delete Item
app.py(85): print("5. Save and Exit")
1. Save and Exit
app.py(86): choice = input("Enter your choice: ")
Enter your choice:
At this level, we need to enter a choice to see the trace output. We can, for example, choose "1" to add a new item to the inventory. The trace output will then show the execution of the add_item function along with the input values. For example, for the following trace, we added "Dell Server" as the item name, "2" as the quantity, and "2300" as the price:
app.py(88): if choice == '1':
app.py(89): add_item(inventory)
--- modulename: app, funcname: add_item
app.py(32): item_name = input("Enter item name: ")
Enter item name: Dell Server
app.py(33): if item_name in inventory:
app.py(36): quantity = int(input("Enter quantity: "))
Enter quantity: 2
app.py(37): price = float(input("Enter price: "))
Enter price: 2300
app.py(38): inventory[item_name] = {'quantity': quantity, 'price': price}
app.py(39): print(f"Item {item_name} added to the inventory.")
Item Dell Server added to the inventory.
app.py(80): print("\nInventory Management System")
Inventory Management System
app.py(81): print("1. Add Item")
1. Add Item
app.py(82): print("2. View Inventory")
2. View Inventory
app.py(83): print("3. Update Quantity")
3. Update Quantity
app.py(84): print("4. Delete Item")
4. Delete Item
app.py(85): print("5. Save and Exit")
5. Save and Exit
Then selecting "5" to save and exit, the trace output will show the execution of the save_inventory
function and the saving of the inventory to the file:
app.py(86): choice = input("Enter your choice: ")
Enter your choice: 5
app.py(88): if choice == '1':
app.py(90): elif choice == '2':
app.py(92): elif choice == '3':
app.py(94): elif choice == '4':
app.py(96): elif choice == '5':
app.py(97): save_inventory(inventory, filename)
--- modulename: app, funcname: save_inventory
app.py(25): with open(filename, 'w') as file:
--- modulename: _bootlocale, funcname: getpreferredencoding
_bootlocale.py(34): assert not do_setlocale
_bootlocale.py(35): if sys.flags.utf8_mode:
_bootlocale.py(37): result = _locale.nl_langinfo(_locale.CODESET)
_bootlocale.py(38): if not result and sys.platform == 'darwin':
_bootlocale.py(46): return result
--- modulename: codecs, funcname: __init__
codecs.py(194): self.errors = errors
codecs.py(195): self.buffer = ""
app.py(26): json.dump(inventory, file, indent=4)
--- modulename: __init__, funcname: dump
[...]
[...]
[...]
[...]
[...]
--- modulename: encoder, funcname: _iterencode
--- modulename: encoder, funcname: _iterencode_dict
encoder.py(409): yield '}'
__init__.py(180): fp.write(chunk)
__init__.py(179): for chunk in iterable:
--- modulename: encoder, funcname: _iterencode
--- modulename: encoder, funcname: _iterencode_dict
encoder.py(410): if markers is not None:
encoder.py(411): del markers[markerid]
app.py(98): print("Inventory saved. Exiting...")
Inventory saved. Exiting...
app.py(99): break
To keep the full trace output, we can redirect the console output to a file. Alternatively, the --count option (combined with -C to choose the output directory) records how many times each line was executed and writes annotated .cover files, as follows:
python -m trace --count -C . app.py
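As mentioned earlier, the trace module can also be driven programmatically instead of from the command line. Here is a minimal sketch, assuming main() from the inventory example above can be imported from app.py (the __name__ guard at the bottom of the file keeps it from running on import):

import sys
import trace
from app import main  # the inventory example above, assumed importable

tracer = trace.Trace(
    ignoredirs=[sys.prefix, sys.exec_prefix],  # skip standard-library frames
    trace=True,   # print each source line as it executes
    count=True,   # also collect per-line execution counts
)

tracer.runfunc(main)  # run the program under the tracer

# Write the collected counts as annotated .cover files in the current directory
results = tracer.results()
results.write_results(show_missing=True, coverdir=".")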
The Problem With Application-Level Tracing
Tracing is a powerful tool for debugging complex applications, but it has its limitations and drawbacks.
Execution tracing is an indispensable tool for developers when debugging intricate issues, as it can reveal the precise sequence of function calls, the state of variables at different points in time, and the detailed execution of loops and conditionals. For example, a trace log might show that a function processData() was called with specific parameters, followed by logs detailing each step within the function, such as the initialization of variables, entry and exit from loops, and the results of conditional checks. This level of detail allows developers to pinpoint the exact moment and conditions under which an error occurred.
However, due to their detailed nature, these logs can generate a substantial amount of data, which can be overwhelming to sift through and can degrade the performance of the application if not managed properly. For instance, enabling trace logging might cause an application that usually runs with minimal logging overhead to experience significant latency due to the sheer volume of log entries being written. Each log entry involves I/O operations, which can slow down the overall execution, especially if the logs are written to a remote logging server or a slow disk.
Consequently, logging traces is typically reserved for local or development environments where performance impacts are less critical. In these settings, the trade-off between performance and the granularity of information provided by trace logs is acceptable. In production or staging environments, the use of tracing is generally avoided due to the potential for significant performance degradation and the risk of exposing sensitive information. For example, trace logs might inadvertently capture sensitive data such as user credentials or personal information, which, if exposed, could lead to security breaches.
Furthermore, generating traces often requires either modifying the application code or updating the application's startup configuration. This process can be labor-intensive and requires careful planning to ensure that the logs provide useful information without cluttering the codebase. Developers need to strategically place trace statements to capture relevant data without overwhelming the log files. For instance, adding traces within deeply nested loops or frequently called functions could lead to excessive log generation. As a consequence, it becomes hard to identify the relevant traces among the noise of irrelevant log entries.
Indeed, handling and analyzing large volumes of trace logs can be challenging, necessitating the use of filtering and conditional logging techniques to focus on relevant data. Developers often use log levels to control the verbosity of the output, enabling trace logs only when needed. Tools and frameworks that support conditional logging, such as the logging module in Python or Log4j in Java, allow developers to set conditions under which trace logs are generated, such as specific user actions or error conditions. However, testing different logging configurations and filtering criteria can be time-consuming and may require trial and error to find the optimal settings, especially since every change to the logging configuration requires restarting or reloading the application, redeploying it, or sometimes both. In many cases, changes to the logging configuration in a remote environment systematically require a redeployment of the application, which can be painfully time-consuming, error-prone, and frustrating.
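To make the idea of conditional logging concrete, here is a minimal sketch using Python's logging module and the custom TRACE level from the earlier example; the filter class, the user_id attribute, and the values are hypothetical:

import logging

TRACE = 5  # the custom trace level registered earlier in this article
logging.addLevelName(TRACE, "TRACE")

class TraceOnlyForUser(logging.Filter):
    """Let TRACE records through only when they carry a matching user_id."""
    def __init__(self, user_id):
        super().__init__()
        self.user_id = user_id

    def filter(self, record):
        if record.levelno != TRACE:
            return True  # leave non-trace records untouched
        return getattr(record, "user_id", None) == self.user_id

handler = logging.StreamHandler()
handler.addFilter(TraceOnlyForUser(user_id="42"))
logging.basicConfig(level=TRACE, handlers=[handler],
                    format="%(asctime)s | %(levelname)s | %(message)s")

logger = logging.getLogger(__name__)
logger.log(TRACE, "cart recomputed", extra={"user_id": "42"})  # emitted
logger.log(TRACE, "cart recomputed", extra={"user_id": "7"})   # filtered out

Even with this kind of filtering in place, the filter itself still has to be deployed with the application, which is exactly the redeployment problem described above.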
The Light at the End of the Trace
While tracing can be useful for debugging applications, it often comes with performance trade-offs, so it's essential to use it wisely. Modern platforms offer an alternative that allows developers to capture detailed debugging information without the overhead typically associated with traditional tracing.
Challenges With Traditional Tracing
1. Generating Trace-Level Logs in Production
Trace-level logging is generally avoided in production due to performance concerns. Live debugging platforms address this by allowing developers to add snapshots (virtual breakpoints) to the application code and view the execution trace in real time without restarting the application. This makes it possible to safely trace applications in production environments without new deployments or restarts.
2. Tracing in Local vs. Production Environments
Trace-level logging is often done in local or development environments. However, issues seen locally may not replicate in production. Modern platforms enable tracing in any remote environment, including production and staging, without compromise.
3. Code Modifications for Tracing
Traditional tracing requires modifying the application code to add log statements. Platforms eliminate this need by enabling snapshots without code changes. This reduces debugging time and minimizes the risk of introducing new bugs.
4. Stopping Application Execution for Debugging
Conventional debugging tools may require stopping the application at breakpoints to inspect variables or trace the program's flow. Live debugging platforms instead rely on non-intrusive snapshots, so the application keeps running while developers inspect its state.
5. Overwhelming Data from Tracing
Tracing can generate vast amounts of data, making analysis challenging. Advanced platforms offer filtering capabilities, letting developers set conditions to focus on relevant traces.
6. Scalability Issues with Tracing
Scaling application-level tracing can be difficult. Platforms may be designed to scale, enabling multiple developers to trace multiple applications simultaneously. This is crucial for debugging complex, distributed systems.
7. Security Concerns
Security is vital when generating traces. Platforms can provide a secure tracing method, ensuring sensitive information is protected, encrypting data in transit and at rest, and actions like adding snapshots are emulated in dedicated sandboxed environments.
8. Developer-Friendliness
Traditional debugging tools can be cumbersome and hard to use: they often provide too much unnecessary detail and lack a user-friendly interface. Modern live debugging platforms aim to surface only the information relevant to the problem at hand through a more approachable interface.
Conclusion: Dynamic Instrumentation and Live Debugging
Available platforms can support multiple programming languages and provide a unified approach to application observability and live debugging. This is especially useful for teams working with diverse technologies, as it allows for dynamic observability and live debugging across various environments, making developer productivity a reality.