Summary:
The recent work on The Boston Wrongs project focused heavily on improving observability and tracing within the application. I spent significant time troubleshooting tracing logic and session management, and ensuring proper propagation of observability context (spans, traces, session IDs) through both the frontend and backend components. The goal was clear: ensure clean, understandable observability data that accurately represents user interactions and model usage. Despite some initial frustration and confusion over tracing behaviors, I made incremental progress in understanding and refining the observability implementation.
Details:
Over the course of two days, my primary focus was establishing proper observability for the project's backend and frontend components. Using Langfuse as the tracing solution, I aimed to understand and correctly implement spans and traces to monitor both interactions with the LLMs and frontend user actions.
Clarifying Tracing Scopes and Nested Spans
A significant part of the work involved understanding how to structure traces and spans effectively. I questioned whether broadly scoped or tightly scoped spans were best practice. Specifically, I wondered whether the infinite loop that repeatedly calls `agent_respond` would record every invocation within a single trace, or whether each iteration would be treated as a separate trace. After reviewing the structure, it appeared that each agent interaction was being grouped under a single trace, but I remained uncertain whether this was optimal.
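The contrast I was weighing can be sketched roughly as follows, assuming the Langfuse Python SDK's `@observe` decorator; the wrapper functions and the `agent_respond` stub here are illustrative, not the project's actual code.

```python
# Sketch only: contrasting trace scopes with Langfuse's @observe decorator.
from langfuse.decorators import observe


@observe()
def agent_respond(history: list[str]) -> str:
    # Stand-in stub for the real agent call; recorded as an observation.
    return f"reply to: {history[-1]}"


@observe()
def run_conversation_single_trace(user_messages: list[str]) -> None:
    # Decorating the whole loop groups every agent_respond call as a nested
    # observation under ONE trace.
    history: list[str] = []
    for msg in user_messages:
        history.append(msg)
        history.append(agent_respond(history))


def run_conversation_per_turn_traces(user_messages: list[str]) -> None:
    # Calling the decorated agent_respond from an undecorated loop instead
    # starts a separate top-level trace for every iteration.
    history: list[str] = []
    for msg in user_messages:
        history.append(msg)
        history.append(agent_respond(history))
```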
Additionally, I explored nesting spans using the `with trace()` statement provided by Langfuse. Initially, the dashboard output was confusing and messy—traces and spans were breaking out into separate lines unexpectedly. This prompted me to revisit the Langfuse decorators and context management within the backend Python files. Questions such as "can traces record function calls that aren't LLM-based?" highlighted my attempt to fully grasp what Langfuse tracing could handle.
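As far as I can tell, spans are not limited to LLM calls: any Python function can be decorated and will nest following the call graph. A minimal sketch of that idea, with illustrative function names and an assumed `gpt-4o-mini` model string:

```python
# Sketch only: non-LLM work recorded as nested spans alongside a generation.
from langfuse.decorators import observe, langfuse_context


@observe()  # plain (non-LLM) work shows up as a nested span
def resize_image(raw_bytes: bytes) -> bytes:
    langfuse_context.update_current_observation(
        metadata={"input_size": len(raw_bytes)}
    )
    return raw_bytes  # placeholder for real image processing


@observe(as_type="generation")  # LLM calls can be marked as generations
def call_model(prompt: str) -> str:
    langfuse_context.update_current_observation(model="gpt-4o-mini")
    return "model output"  # placeholder for the real model call


@observe()  # the outer span; both helpers above appear nested beneath it
def handle_new_image(raw_bytes: bytes) -> str:
    resized = resize_image(raw_bytes)
    return call_model(f"describe an image of {len(resized)} bytes")
```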
Session ID Management and Observability Context
Another critical area involved managing session IDs for tracing purposes. Initially, session IDs were handled through a global variable, raising concerns about scalability given the application's requirement to support hundreds or even thousands of simultaneous users. Recognizing this limitation, I decided to introduce a dedicated session creation endpoint. This endpoint would generate session IDs and pass them to the client frontend, which would store them (potentially in cookies) and resend them with subsequent interactions. This approach would better support multiple simultaneous users and accurately track their interactions.
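A minimal sketch of that endpoint, assuming a FastAPI backend (the framework isn't named here) and an in-memory store; the route and field names are illustrative:

```python
# Sketch only: a dedicated session-creation endpoint instead of a global variable.
import uuid

from fastapi import FastAPI

app = FastAPI()
active_sessions: set[str] = set()  # a real deployment would use a shared store


@app.post("/session")
def create_session() -> dict[str, str]:
    # Each client gets its own ID, so many concurrent users can be traced
    # independently; the frontend stores and resends it with later requests.
    session_id = str(uuid.uuid4())
    active_sessions.add(session_id)
    return {"session_id": session_id}
```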
During frontend integration, I identified inconsistencies in session ID propagation. Certain frontend functions, such as the one handling new images, correctly included session IDs, but others, such as the agent-response handler, did not. This inconsistency contributed to the confusion observed in the tracing dashboard and needed correction.
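The fix I had in mind amounts to tagging the current trace with the client-supplied session ID on every request path, roughly like the hedged sketch below (function and parameter names are illustrative):

```python
# Sketch only: attaching the propagated session ID to the current Langfuse trace.
from langfuse.decorators import observe, langfuse_context


@observe()
def agent_respond(user_message: str, session_id: str) -> str:
    # Both the image-handling and agent-response paths should do this
    # consistently so all spans for one user group under the same session.
    langfuse_context.update_current_trace(session_id=session_id)
    return f"echo: {user_message}"  # placeholder for the real agent call
```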
Troubleshooting Model Identification in Traces
On the frontend side, I spent a significant amount of time tracking down the mechanics behind changing and propagating the LLM model used by the agents. I closely examined the handler code in `App.tsx`, `Chat.tsx`, and `sessionManager.ts` to see how model changes were handled and propagated. Initially, I observed that changing the model updated state via a function like `set_model`, but it was unclear how this change was applied downstream in the backend interactions.
I found myself repeatedly asking questions such as:
- "Where is the handler function for when the model is changed?"
- "Can you tell me if the model is actually being changed so the agents use it in their conversation?"
- "Why don't we see the model showing up in the trace?"
These questions indicated significant confusion around how frontend state changes were reflected in backend observability data. Eventually, I confirmed that the model information was indeed visible within the OpenAI tracing dashboard, yet it remained unclear why it wasn't consistently appearing as expected in Langfuse traces.
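One possible remedy, sketched under the assumption that the frontend sends its current model choice with each request (the `ChatRequest` shape and the handler name are illustrative), is to record the model explicitly on the Langfuse observation and trace:

```python
# Sketch only: making the selected model visible in Langfuse traces.
from langfuse.decorators import observe, langfuse_context
from pydantic import BaseModel


class ChatRequest(BaseModel):
    session_id: str
    model: str    # e.g. the value chosen via set_model in the frontend
    message: str


@observe(as_type="generation")
def respond(req: ChatRequest) -> str:
    # Recording the model on the observation (and as trace metadata) means it
    # shows up in Langfuse even if other instrumentation only reports it elsewhere.
    langfuse_context.update_current_observation(model=req.model)
    langfuse_context.update_current_trace(metadata={"model": req.model})
    return "model output"  # placeholder for the real agent/LLM call
```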
Frustrations and Incremental Progress
Throughout this period, frustration surfaced frequently, particularly when tracing didn't behave as expected. For instance, seeing messages like "No trace found in current context" and observing broken or incomplete traces in the dashboard caused confusion. There were several moments of difficulty, such as a failed dependency installation (`pip install pydantic-ai[logfire]`) and uncertainty around the asyncio task implementation. I had initially used `asyncio.create_task` but questioned its necessity and ultimately chose to simplify the architecture by moving the relevant logic into the main event loop.
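The asyncio simplification boils down to awaiting the agent call directly instead of wrapping it in a task that is awaited immediately anyway; a minimal sketch with illustrative names:

```python
# Sketch only: contrasting the create_task wrapper with a direct await.
import asyncio


async def agent_respond(message: str) -> str:
    await asyncio.sleep(0.1)  # placeholder for the real async agent call
    return f"echo: {message}"


async def handle_request_with_task(message: str) -> str:
    # Earlier approach: spawn a task, then await it right away, which adds
    # indirection without changing behaviour here.
    task = asyncio.create_task(agent_respond(message))
    return await task


async def handle_request_simplified(message: str) -> str:
    # Simplified approach: await the coroutine directly in the main event loop.
    return await agent_respond(message)


if __name__ == "__main__":
    print(asyncio.run(handle_request_simplified("hello")))
```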
Despite these challenges, incremental progress was made. I was able to clarify some fundamental misunderstandings about how Langfuse handled spans and traces, and better understood the necessary architectural changes required for proper session management.
In summary, this phase of work on The Boston Wrongs project was largely about gaining clarity on observability practices, correct session management, and ensuring traceability of model choices throughout the application stack. While still a work in progress, the adjustments made during this period laid important groundwork for improved observability and scalability going forward.
Written by Cursor Journal