Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LoggingHandler: logger.exception fails to serialize exceptions as expected #4409

Open
nhubbard opened this issue Feb 3, 2025 · 0 comments
Open
Labels
bug Something isn't working

Comments

@nhubbard
Copy link

nhubbard commented Feb 3, 2025

Describe your environment

OS: Debian Bookworm (from python:3.12-slim Docker image)
Python version: 3.12.8
SDK version: 1.29.0
API version: 1.29.0

What happened?

We are trying to log exceptions from Pika using the Logs interface that's still in beta. When we get an AMQPConnectionError (or other exception), our code does the following:

try:
  # ...
except AMQPConnectionError as e:
  logger.exception(e)
  # other code, including a trace session

where logger is a standard Python logger that has a LoggingHandler added to it.

When such an error occurs, we instead get an exception related to failure to serialize the exception.

Steps to Reproduce

Install the following dependencies:

pika==1.3.2
opentelemetry-api==1.29.0
opentelemetry-sdk==1.29.0
opentelemetry-instrumentation-logging

Create a Python script, call it whatever you'd like. You don't have to spin up a collector or RabbitMQ instance to reproduce the bug.

import os
import pika
import logging

from opentelemetry import _logs as logs
from opentelemetry.exporter.otlp.proto.grpc._log_exporter import OTLPLogExporter
from opentelemetry.sdk._logs.export import ConsoleLogExporter, BatchLogRecordProcessor
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk.resources import Resource


# The function we use to create our LoggingHandler
def create_logger(name: str, console: bool = False) -> LoggingHandler:
    log_provider = LoggerProvider(
        resource=Resource.create(
            {"service.name": name, "service.instance.id": os.uname().nodename}
        )
    )
    logs.set_logger_provider(log_provider)

    # Create OTLP collector log exporter
    # Example URL we use in our Docker setup, feel free to change as necessary
    otlp_exporter = OTLPLogExporter(endpoint="http://collector:4317", insecure=True)
    log_provider.add_log_record_processor(BatchLogRecordProcessor(otlp_exporter))

    # Create console log exporter
    if console:
        console_exporter = ConsoleLogExporter()
        log_provider.add_log_record_processor(BatchLogRecordProcessor(console_exporter))

    return LoggingHandler(level=logging.DEBUG, logger_provider=log_provider)


logger = logging.getLogger(__name__)
logger.addHandler(create_logger(__name__, console=True))

try:
    # Don't start a RabbitMQ server for this issue -- it's easiest to reproduce the error
    # when the connection fails!
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(
            host="localhost:5672",
            heartbeat=600,
            blocked_connection_timeout=300,
        )
    )
    channel = connection.channel()
    channel.queue_declare(queue="example_queue", durable=True)
    channel.basic_qos(prefetch_count=1)
except Exception as e:
    logger.error(f"Failed to initialize Pika connection: {str(e)}")
    # This line triggers the issue
    logger.exception(e)

Expected Result

We were thinking that if the serialization doesn't pass, it could either ignore the log entry, or convert the exception to a string, possibly using traceback.format_exc. Instead, it throws an exception; I'm thinking that it could be an issue to discourage people from using logger.exception because OTel can't serialize the exception.

Actual Result

2025-02-03 11:06:58 Exception: Invalid type <class 'pika.exceptions.AMQPConnectionError'> of value 
2025-02-03 11:07:03 Exception while exporting logs.
2025-02-03 11:07:03 Traceback (most recent call last):
2025-02-03 11:07:03   File "/usr/local/lib/python3.12/site-packages/opentelemetry/sdk/_logs/_internal/export/__init__.py", line 307, in _export_batch
2025-02-03 11:07:03     self._exporter.export(self._log_records[:idx])  # type: ignore
2025-02-03 11:07:03     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-03 11:07:03   File "/usr/local/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/grpc/_log_exporter/__init__.py", line 111, in export
2025-02-03 11:07:03     return self._export(batch)
2025-02-03 11:07:03            ^^^^^^^^^^^^^^^^^^^
2025-02-03 11:07:03   File "/usr/local/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py", line 300, in _export
2025-02-03 11:07:03     request=self._translate_data(data),
2025-02-03 11:07:03             ^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-03 11:07:03   File "/usr/local/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/grpc/_log_exporter/__init__.py", line 108, in _translate_data
2025-02-03 11:07:03     return encode_logs(data)
2025-02-03 11:07:03            ^^^^^^^^^^^^^^^^^
2025-02-03 11:07:03   File "/usr/local/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/common/_internal/_log_encoder/__init__.py", line 37, in encode_logs
2025-02-03 11:07:03     return ExportLogsServiceRequest(resource_logs=_encode_resource_logs(batch))
2025-02-03 11:07:03                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-02-03 11:07:03   File "/usr/local/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/common/_internal/_log_encoder/__init__.py", line 72, in _encode_resource_logs
2025-02-03 11:07:03     pb2_log = _encode_log(sdk_log)
2025-02-03 11:07:03               ^^^^^^^^^^^^^^^^^^^^
2025-02-03 11:07:03   File "/usr/local/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/common/_internal/_log_encoder/__init__.py", line 58, in _encode_log
2025-02-03 11:07:03     body=_encode_value(body) if body is not None else None,
2025-02-03 11:07:03          ^^^^^^^^^^^^^^^^^^^
2025-02-03 11:07:03   File "/usr/local/lib/python3.12/site-packages/opentelemetry/exporter/otlp/proto/common/_internal/__init__.py", line 90, in _encode_value
2025-02-03 11:07:03     raise Exception(f"Invalid type {type(value)} of value {value}")
2025-02-03 11:07:03 Exception: Invalid type <class 'pika.exceptions.AMQPConnectionError'> of value 

Note that you're more likely to get a socket.gaierror as the invalid type, but the traceback is still identical.

Workaround: inside the except block, use traceback.format_exc() to get a string representation of the traceback, and log that at the error level instead.

Additional context

The example should work just fine; I tested it on my machine. Hopefully it works for you as well.

We also record the exception using a Tracer span, but it has proven easier to extract exceptions from the logs than the tracer using Grafana, so until we figure out how to use Tempo to get exceptions, we've decided to do both logs and traces for capturing exceptions.

Would you like to implement a fix?

I'm not familiar with the inner workings of OTel, or how the specification defines such behavior, so I don't think I would be the right person to make a fix.

@nhubbard nhubbard added the bug Something isn't working label Feb 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant