Debug Workbench
Overview
The Debug Workbench is a bona fide Swiss Army knife. It enables you to do all of the following:
- Visualize how events (like new customer messages) traveled through your assistant's flow.
- Debug your functions and review your prompts, token usage, and execution time
- Quickly replay events against your live draft, allowing for rapid prompt engineering and function development
- Promote events into regression test cases and add assertions against the final state of the context
Generating Events
The Debug Workbench centers on the concept of an Event. For Customer Assistants, events typically correspond to 'user turns': the assistant's flow execution in response to a new message from the customer.
Events are generated in three ways:
- Test conversations that occur in the Chat Panel implicitly log events for rapid development
- Any error that occurs on a published assistant automatically results in a Debug Workbench event
- Events can be logged in non-error scenarios on published assistants via the Log Debug Event Action
Regardless of where they originate, all events can be visually traced, replayed and promoted to test cases.
Tracing Event Execution
In the screenshot above, the event navigation section on the left shows three separate events. In this case the events occurred against the published assistant. The most recent event shows what happened after the customer sent a message asking “can my mom fly free”. This event has been expanded to show the flow that the assistant took afterward.
We can see that execution flowed through 6 behaviors, beginning with main-question, then proceeding to classifier-prompt, and so on. By selecting a behavior within the event trace, we can see more detail about how that behavior executed and what impact it had on the context.
The trace details depend on the type of behavior selected. In the case of a Prompt AI behavior, we can see all parallel prompts that were executed by the behavior. In this case, 7 parallel prompts were executed, all for the purpose of performing various classifications on the conversation. By selecting the build_topic_prompt -> parse_topic_comp prompt, we can see the prompt that was built as well as the completion it generated and any actions that resulted.
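To make the trace concrete, here is a minimal, hypothetical sketch of the kind of prompt-builder and completion-parser pair that such a trace entry represents. The function names are taken from the trace example above, but the signatures, classification labels, and context shape are assumptions for illustration only and are not the actual AI Studio function API.

```python
# Hypothetical sketch, not the actual AI Studio function API. It illustrates
# the prompt-builder / completion-parser pair that a Prompt AI trace entry
# (e.g. build_topic_prompt -> parse_topic_comp) lets you inspect: the prompt
# that was built, the completion, and the change it made to the context.

def build_topic_prompt(context: dict) -> list[dict]:
    """Build the chat messages sent to the model for topic classification."""
    transcript = context.get("transcript", "")
    return [
        {"role": "system",
         "content": "Classify the customer's topic as one of: baggage, loyalty, booking, other."},
        {"role": "user", "content": transcript},
    ]


def parse_topic_comp(completion: str, context: dict) -> dict:
    """Parse the model's completion and record the classification on the context."""
    topic = completion.strip().lower()
    context["topic"] = topic if topic in {"baggage", "loyalty", "booking"} else "other"
    return context
```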
In addition to event navigation and trace details, there is another powerful tracing tool called the Inspector:
The Inspector is toggled on and off via the right-most button in the Debug Workbench. It allows you to do several things:
- View the full conversation transcript at the time the event occurred
- View the console output (print output) of the Functions that executed. This is very useful for debugging functions.
- View the change in field values from the beginning to end of flow execution
In the screenshot above, the user has the Fields inspector open. We can see that a field called userType, which was previously unset, was set to the value 'prospect' during flow execution.
Event Replay
Writing functions like prompt builders and API response handlers requires a tight development loop where you can quickly reproduce and fix coding errors. Similarly, building good AI assistants requires significant prompt engineering, which depends on an equally tight experimentation loop. Event Replay satisfies both of these needs.
In the screenshot above, an event occurred that triggered a (Python) Function Error. We can quickly go to that location in the Function Editor to remediate the problem (in this case, the key name should be startTime rather than start). We can then verify our fix by replaying the event, which executes our current assistant definition (even if the event occurred against a published version of the assistant). The result is shown below:
After fixing our coding mistake and replaying the event, we now see a full event trace without errors. You can toggle back to seeing the original event by clicking Show Original. You can also view original and replayed event traces side-by-side by hitting the 'Compare Replay' icon next to Show Original.
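For context, the kind of response-handler fix described above might look something like the following. The handler name, payload shape, and field name are hypothetical; only the startTime-versus-start key correction comes from the example.

```python
# Hypothetical response handler, for illustration only. The handler name,
# payload shape, and field name are invented; the real function isn't shown
# in full above. Only the startTime-vs-start key fix comes from the example.

def handle_flight_lookup_response(response: dict, context: dict) -> dict:
    """Copy the departure time from the API response onto the context."""
    first_leg = response["legs"][0]
    # The buggy version read first_leg["start"], which raised a KeyError
    # because the payload key is actually "startTime".
    context["departureTime"] = first_leg["startTime"]
    # print output appears in the Inspector's console view when debugging
    print(f"departureTime set to {context['departureTime']}")
    return context
```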
Although this example focused on fixing a coding error, Event Replay is just as useful for quickly experimenting with prompts.
Test Cases & Assertions
Test cases enable you to make changes to your assistant with confidence. This is especially important in the era of prompt engineering where seemingly innocuous changes to prompts can have unintended consequences.
Test cases in AI Studio are created from events. To create one, find an event you wish to promote into a test case and click the test beaker icon on the right of the workbench.
You might choose to create a test case in either of the following scenarios:
- You're happy with the assistant's behavior and want to establish a regression test to make sure it always works that way
- An event triggered an error or undesirable behavior, such as a misclassification, that will require further prompt engineering
In the first case, you simply create the test case and begin adding assertions. In the second case, you create the test case, fix any problems, and verify your fixes by repeatedly running the test case, which is very similar to event replay.
Test cases are organized into Test Sets. You can run tests individually or run all tests in a set.
Here we've successfully executed a test case. We see a trace that mimics what is displayed in Events. But how much do we actually know about our assistant's behavior? All this test case verifies is that we don't have any full-blown errors in our flow. In order to verify desirable assistant behavior, we need to add assertions. There are two ways to add assertions:
- Clicking Add Assertion as shown in the screenshot and editing a rule condition against the final context state. This is the method to use if your bot currently doesn't behave correctly, but you know how it should behave
- Opening the Inspector -> Fields view and creating an assertion directly from the final field state. This convenient method is useful when an assistant is already behaving as expected
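Conceptually, a user-turn assertion is a check against the final context state. The sketch below is purely illustrative (assertions are edited in the Debug Workbench UI, not written as code); the field name and value come from the earlier Fields inspector example.

```python
# Purely illustrative: assertions are configured in the Debug Workbench UI,
# not written as Python. A user-turn assertion against the final context
# state is roughly equivalent to a check like this one, using the userType
# field from the earlier Fields inspector example.

final_fields = {"userType": "prospect"}  # final field values after the flow ran (hypothetical)

assert final_fields.get("userType") == "prospect", \
    "expected the customer to be classified as a prospect"
```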
Other noteworthy features:
- There's a button for copying test cases and changing the last customer message. This can sometimes be an easier way to build up a test set
- In addition to tests that execute the entire flow and assert against the final state (user turn tests), it's also possible to create test cases directly against prompt behaviors. This can be desirable if your flow takes a while to execute and you only wish to test or develop a specific prompt node. To create a Prompt test, select a prompt behavior in the event navigation tree and then click the test beaker and select 'Add Prompt Test Case'