Skip to content

Haystack should not configure root logger handlers #8681

@CSRessel

Description

@CSRessel

Describe the bug
Any application that imports this library cannot expect their own configuration of the Python root logger to be respected, because this library adds to the root logger's list of handlers.

This issue occurred previously in #2485 and #4202

Expected behavior
An application using this library should be able to import haystack and then use logging.basicConfig() as normal.

Additional context
This issue was introduced here

This is an issue because logging.basicConfig() is ignored once any handlers are configured. At a bare minimum, it is reasonable to expect all libraries make no modifications to the root handler. The quickest fix is to edit line 89 so as to only add the handler onto the subloggers that will be used throughout the library:

    haystack_logger = logging.getLogger("haystack")  # only add our handler within our library's hierarchy

    # avoid adding our handler twice
    old_handlers = [
        h for h in haystack_logger.handlers
        if (isinstance(h, logging.StreamHandler) and h.name == "HaystackLoggingHandler")
    ]
    for old_handler in old_handlers:
        haystack_logger.removeHandler(old_handler)

    haystack_logger.addHandler(handler)

    # or more succinctly, only add if not already present
    # if not old_handlers:
    #     haystack_logger.addHandler(handler)

However, it is also generally expected that the application and not the library is the arbiter of all log handlers, as recommended in the python docs' Logging Cookbook. This would mean it is unusual for any library to implicitly add a log handler -- it is the application developer who knows best what log formats they need.

I agree that providing recommended overrides can be very convenient; one route would be to export a factory for the provided handler so that the consuming application can easily opt-in to this feature:

from haystack.logging import configure_logging_handler  # function to create the HaystackLoggingHandler
logging.getLogger().addHandler(configure_logging_handler())  # app dev can choose to add at the root, at the haystack level, or not at all

Quick blog post summary of developer expectations on this topic: http://rednafi.com/python/no_hijack_root_logger/

To Reproduce
Minimal repro:

from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore

import logging
import pandas as pd
logging.basicConfig(level=logging.CRITICAL)

document_store = InMemoryDocumentStore()
document_store.write_documents([
    # still prints a warning, because of the logging.getLogger().root changes within haystack
    Document(
        content="My name is Jean and I live in Paris.",
        dataframe=pd.DataFrame({"name": ["Jean"], "city": ["Paris"]}),
    ),
])

FAQ Check

System:

  • OS: Ubuntu 22.04
  • GPU/CPU: N/A
  • Haystack version (commit or version number): 2.7.0 in my testing, up to present
  • DocumentStore: N/A
  • Reader: N/A
  • Retriever: N/A

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2Medium priority, add to the next sprint if no P1 available

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions