This is the continuation of the series on unit testing in GenAI. Please read the first part before continuing: Mocking OpenAI - Unit testing in the age of LLMs (Part One)
In this post, we will structure the code slightly differently to enable a different testing technique called “faking”.
With mocking, when the tests are running, the part of the code that shouldn’t run (e.g. the actual call to OpenAI) is replaced with a mock function that simply returns the value we want, and then we can carry on testing the business logic we actually wrote.
Mocking requires extra packages and dependencies to set up and implement.
Faking requires you to organise your code slightly differently, but needs nothing beyond plain Python.
Dependency Injection
“If you have a hammer, every problem looks like a nail”
This “slightly differently” is called Dependency Injection, probably the most important code organisation principle for achieving “modularity”. It allows complex behaviour to be assembled _after_ its parts are defined: later in a different part of the program, or later at runtime rather than at writing time.
Here, we will use it to enable faking. We will wrap the part we mocked in Part One in a class and “inject” it into the service class (pass it as an argument to the service’s __init__() constructor). During regular operation, the real client is injected. During testing, we write a “fake” class that, instead of calling OpenAI, simply returns the value we want (very similar to the mocking in Part One).
Let’s look at it in practice:
See the code at https://github.com/xLaszlo/mopenai
Remember, this was the original service:
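Roughly like this (a sketch; the names OpenAIService and get_answer are my illustration, the exact code is in the repo):

```python
from openai import AsyncOpenAI


class OpenAIService:
    def __init__(self, api_key: str):
        # The service constructs the OpenAI client itself,
        # which is exactly what forced us to mock in Part One
        self.client = AsyncOpenAI(api_key=api_key)

    async def get_answer(self, question: str) -> str:
        response = await self.client.chat.completions.create(
            model='gpt-4o-mini',
            messages=[{'role': 'user', 'content': question}],
        )
        return response.choices[0].message.content
```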
Changing the OpenAI Service
Rewrite it the following way:
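Something along these lines (a sketch of the idea; the signatures are assumptions, check the repo for the authoritative version — the AsyncOpenAIClient wrapper itself is shown further below):

```python
class OpenAIService:
    def __init__(self, client):
        # The client is injected instead of being constructed here
        self.client = client

    async def get_answer(self, question: str) -> str:
        response = await self.client.chat_completions_create(
            model='gpt-4o-mini',
            messages=[{'role': 'user', 'content': question}],
        )
        return response.choices[0].message.content


def get_service() -> OpenAIService:
    # Factory: builds the real client and injects it
    return OpenAIService(client=AsyncOpenAIClient())
```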
As you can see, the service receives an “AsyncOpenAIClient” instance at construction, whose chat_completions_create() function has the same signature as OpenAI’s AsyncClient.
There is also a get_service() factory function that constructs the class (a standard convenience technique).
The client itself is very simple:
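Sketched here assuming the v1 OpenAI SDK:

```python
from openai import AsyncOpenAI


class AsyncOpenAIClient:
    def __init__(self):
        # AsyncOpenAI reads OPENAI_API_KEY from the environment by default
        self.client = AsyncOpenAI()

    async def chat_completions_create(self, model: str, messages: list):
        # A thin pass-through with no logic of its own
        return await self.client.chat.completions.create(model=model, messages=messages)
```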
Given that we will fake it, we need to make sure it has no moving parts that can break, because this code will _not_ run during testing. If you want to make it more generic, you can use *args/**kwargs (I am not sure which is better).
Also, to ensure as much one-to-one consistency as possible, I used the same function naming as in the original client:
“chat.completions.create()” → “chat_completions_create()”
This gives us a standard approach to implementing the rest of the API.
Main
The main() function is only slightly different; we are just using the factory function.
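For example (a sketch, assuming the service and factory above):

```python
import asyncio


async def main():
    service = get_service()  # the factory injects the real client
    answer = await service.get_answer('What is the capital of France?')
    print(answer)


if __name__ == '__main__':
    asyncio.run(main())
```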
Testing
Let’s look at the test:
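A sketch of the test (assuming pytest with the pytest-asyncio plugin; the fake client, FakeAsyncOpenAIClient, is shown right after):

```python
import pytest


@pytest.mark.asyncio
async def test_get_answer():
    # Given: a service with the fake client injected
    fake_client = FakeAsyncOpenAIClient(content='Paris')
    service = OpenAIService(client=fake_client)
    # When: the business logic runs as usual
    answer = await service.get_answer('What is the capital of France?')
    # Then: we got the canned answer and the client was called exactly once
    assert answer == 'Paris'
    assert fake_client.call_count == 1
```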
As you can see, the code is much simpler. Only plain Python code is present in the Given-When-Then structure. You don’t need to remember any mocking-related details; you just write standard Python.
Let’s take a look at the client:
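A sketch, assuming the v1 SDK’s Pydantic response types (the required fields may shift between SDK versions, which is part of the mess mentioned below):

```python
from openai.types.chat import ChatCompletion, ChatCompletionMessage
from openai.types.chat.chat_completion import Choice


class FakeAsyncOpenAIClient:
    def __init__(self, content: str):
        self.content = content
        self.call_count = 0  # spy: counts how many times the client was called

    async def chat_completions_create(self, model: str, messages: list) -> ChatCompletion:
        self.call_count += 1
        # Rebuild the same Pydantic structure the real SDK returns
        return ChatCompletion(
            id='fake-id',
            object='chat.completion',
            created=0,
            model=model,
            choices=[
                Choice(
                    index=0,
                    finish_reason='stop',
                    message=ChatCompletionMessage(role='assistant', content=self.content),
                )
            ],
        )
```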
As you can see, it has one function with the same signature as “AsyncOpenAIClient”.
I must admit that figuring out the correct return value was much harder and messier than I expected. This is because OpenAI wraps the API response JSON into Pydantic classes, and to remain completely authentic, this needs to be replicated.
The benefit is that if OpenAI changes its definitions, this test will fail, and you can reconcile your code with the changes.
If you don’t want to bother with this much detail, there is the “SimpleNamespace” class type that suits this role well:
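A minimal sketch using types.SimpleNamespace, where only the attributes the service actually reads need to exist:

```python
from types import SimpleNamespace


class SimpleFakeClient:
    def __init__(self, content: str):
        self.content = content
        self.call_count = 0  # spy counter, same as before

    async def chat_completions_create(self, model: str, messages: list):
        self.call_count += 1
        # response.choices[0].message.content is all the service touches
        return SimpleNamespace(
            choices=[SimpleNamespace(message=SimpleNamespace(content=self.content))]
        )
```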
You can see that we only need to replicate the minimum of the return value. (I didn’t know about this, TBH. Thanks, ChatGPT!)
Fakes, Spies and Test Doubles
Both fake classes count the number of times they are called. A fake class with this capability is called a “Spy”; of course, more complex spying behaviour can also be implemented here. (Sometimes, fake classes are called “Test Doubles”, as they stand in for the real thing during tests, like in a movie.)
As you can see in the assertions, the Spy enabled us to assert that the client was actually called.
And that’s it…
Final Thoughts
Dependency Injection is a widely used technique, and here we used it to help us write more straightforward tests. This lets you avoid remembering the exact mocking syntax and just write generic Python code. I think less cognitive load is always better. In all honesty, there are many situations where you can’t avoid mocking, but the above technique is quite generally applicable.
I hope you enjoyed this blog post. Next time, I will be writing about async in Python. Subscribe to be notified: