10 Tips for adding SQLAlchemy to FastAPI
Introducing a relational database such as PostgreSQL or MySQL to FastAPI the proper way as a data scientist.
The popularity of FastAPI has been steadily growing over the last few years. It is slowly climbing the top 25 web frameworks and technologies used in the Stack Overflow Developer Survey 2022, being right behind Flask and Django (but beating both in Loved vs. Dreaded). In an age where data and AI are claiming a bigger part of any business, it is no surprise there is a lot of interest in the lightweight, performant API framework for Python. And you only need a few lines of code to get started:
FastAPI’s usage among data scientists and ML engineers is growing as we speak. The framework being so minimalistic might tempt you to believe there is not a lot of room for error. But complexity is incremental. Eventually these API’s will have to be updated, extended and maintained. You would be better off having read The Art of Computer Programming than The Art of Statistics. Just thinking about the poor excuse for an ETL pipeline I hacked together years ago using Pandas makes me shiver to this day.
Therefore, we want to share some of our learnings from maturing FastAPI apps, in particular: adding a relational database. Whether you use FastAPI in a large web app or a small ML inference service, chances are you introduce one at some point. You might want to track metrics, or add some users. The most popular Python package to use for adding a relational database is SQLAlchemy. It provides a core framework for communicating with a database, as well as an ORM. Everyone working with a relational database in Python has worked with - or at least considered - SQLAlchemy. In this blog post, we share 10 tips for integrating SQLAlchemy with your FastAPI application. The code samples can be seen in action in our FastAPI Template Project.
FastAPI and DDD
When you decided to add a relational database, it is important to integrate it properly. In this section, we introduce a few practices and abstractions we have found to be helpful for both maintaining and testing the code.
1. Don’t put BaseModels in your endpoints
SQLAlchemy ORM uses what is called the data mapper pattern. The data mapper pattern is great for its flexibility and attempts to bridge the Object–relational impedance mismatch. But there are disadvantages to using these ORM models throughout your application, particularly your API. What if you want to keep the endpoints simple to users, but store objects across multiple tables or hide certain fields to users? The code could easily end up bending over backwards to add awkward logic. This could make your API hard to use. You should consider evolving your API and database schema separately.
In particular, you should not try to create a class that is both a Pydantic BaseModel and a declarative base from SQLAlchemy. Pydantic is mainly an input validation library. Above we see an example of a model for which both a Pydantic BaseModel is created for the API and a SQLAlchemy model. You only need to translate these objects into one another. (The next tip tells you where to do that.)
2. Use the Repositories Pattern
Being able to swap components in your software tells you that it is modular. And although you are probably not switching databases anytime soon, being able to swap real persistence with in-process memory can both allow you to postpone the decision to introduce new entities (or even a whole database) as well as make it easier to write unit tests. It requires creating an interface around the storage layer and its querying logic. Enter the Repository Pattern.
The Repository Pattern can be found in books such as Domain Driven Design. For our purposes, repositories are intermediaries responsible for storing the entities in your domain model. A basic repository typically has methods such as
get_all_enabled(). The signatures of these methods consist of domain models and query parameters such as filters. They do not expose how they persist entities, only which entities to store and receive using domain models.
The interface of a repository can be implemented by a SQLAlchemy implementation and a basic in-memory implementation. As mentioned before, the latter can be used for unit tests.
3. Create Query Objects
One pattern I personally found to be helpful is the use of Query Objects. Since you often want to query your database with several filters, collecting then in a Query Object allows you to add or remove filter parameters, change defaults filter values and even reuse filters without changing a Repository’s interface. Another benefit is that it reduces the amount of arguments you have to keep track of when fetching entities.
We have all written functions with too many arguments at some point; a code-smell suggesting you might want to introduce a new object. In the case of repositories, it’s often a Query Object you need.
Managing SQLAlchemy scopes
Both FastAPI and SQLAlchemy have their own documentation on how to add a database using depends (FastAPI) and managing connections, sessions and transactions, etc. (SQLAlchemy). But the former works with a rather succinct setup, while the latter is a very detailed explanation of how to work with SQLAlchemy in all its glory. Here, we show a complete yet simple setup that should do the trick for most API’s. This leverages SQLAlchemy’s different scopes without coupling implementation details to your domain.
4. Re-use database connections
As every documentation page will tell you, a big anti-pattern is not re-using database connections. They are expensive to set up, so reconnecting for every transaction introduces significant overhead. The FastAPI documentation shows you how you could approach this. One way to deal with this is caching the function that initialises the engine (the object managing the connection pool).
This way you re-use the engine between request and only pay for the connections on the first one.
5. One session per request
The second layer of bookkeeping is managing sessions. Sessions are not expensive to create. They form a layer between the database that performs transactions, identity mapped objects etc. SQLAlchemy has its own documentation on sessions, where they define the transaction scope and the session scope. In short, in the case of API’s it is good practice to create one session per request.
To manage a sessions life cycle, we can leverage FastAPI’s
Depends by creating an iterator.
The FastAPI endpoint to create a Todo would then simply read:
Every time the endpoint is called, a repository is instantiated with a new session that is automatically closed after the request ends. Even when an exception occurs, we rollback the session.
6. Transactions using context managers
The last thing to manage is individual transactions. Sometimes, one transaction per request offers enough granularity. In that case, committing the session right before closing in the
create_todo_repository function does the trick. When you want to commit transactions in the service layer, repositories need a way to construct transactions without introducing a leaky abstraction (by exposing the session). Luckily, context managers are a great solution for this.
To make a context manager out of a repository, you need to implement an
__enter__() and an
__exit__() method. We can commit the session when exiting the context. The
__exit__() even allows you to handle any exceptions raised in the current context, so you can safely rollback any pending transactions before exiting.
7. Add Alembic
You are most likely going to change your database schema at one point. Alembic has been the standard schema management solution for SQLAlchemy. (They were also developed by the same author!)
8. Test your repositories
The great thing about repositories is that they are easy to test. You should check if they save and return the right entities, perform the right filtering and they do not accept invalid values. If you test your repositories well, it gives you a lot of comfort writing more involved business logic since you know that you are at least using the right entities. If you have in-memory implementations (and perhaps add ContractTests for your repositories as well), you can even swap the SQL implementations for these lightweight test doubles to unit test business logic.
While we are at it, we might as well test our migrations by using Alembic in our test setup. This ensures you do not forget to generate and commit your migrations, since your tests will fail without them.
When to add a database to FastAPI
Before diving into SQLAlchemy, here are two tips that help make the right considerations when you have not bitten the bullet yet about introducing SQLAlchemy.
9. Make sure your FastAPI app needs a relational database
Everything is a trade-off. And although a big part of the web runs on relational databases, PostgreSQL is not the answer to everything. Despite their versatility, relational databases can be a mismatch for API’s, especially at an early stage. Does your inference endpoint only need a trained model that updates its version once in a while? Then it is probably enough to add the version number to the models filename. (Python wheels is one of the many examples of how data can be managed largely by a naming convention.) Want to cache some calculations to speed up response times? Check out the functools caching decorators before adding Redis. Even when adding authentication to your API, if it is only for your launching customer you could just verify their hashed credentials directly in your environment.
On the other hand, your API might have performance requirements that can not be met by a relational database. Relational databases are designed and optimized for OLTP purposes. For OLAP use-cases where your API serves complex analytics dashboards on large amounts of data, there may be more suitable database technologies (check out Google BigQuery, Amazon Redshift, HBase… or even DuckDB!). The expected amount of reads and writes, the most common query pattern and ACID requirements are all factors that should be considered. The data domain has developed vastly and if we have learned something, it is that there is no one size fits all.
10. Spend some time on your data model
If you postponed adding a relational database long enough, you should take another critical look at your data model. Of course, if you manage the data yourself and it is not mission critical, you get away with almost anything. But when you put your app in a production environment and start storing other people’s data, migrations of a quickly evolving database schema can become a real pain. Classics include creating a nullable column that should have been mandatory or decreasing VARCHAR lengths. And how normalized should your schema be? Do you create that extra table and sacrifice some query performance because of extra joins you need to do, or do you accept the duplication?
Architecture is what’s hard to change, and your database schema is part of it. Getting an outside view can be very helpful as well!
SQLAlchemy and FastAPI are both great Python packages that get you up and running with minimal setup. In this post we listed several aspects to take into account as you add a relational database to your API and your project matures. But as mentioned above, everything is a trade-off. You might find turning on Pydantic’s ORM mode a lot easier to work with and never run into needing more flexibility on both your API and database mapping. Maybe you want to leave out the context manager and just set
autocommit=True while creating a new session. In the end, you know the requirements of your application best.
Having said that, hopefully these tips help you to build out a great data application or at least give food for thought regarding its design and setup. If you want to see all the code in this post in action, you can check out our FastAPI Template Project on Github and read our post about it. If you think we missed important tips or you disagree on the ones listed, feel free to reach out! We are always happy to discuss and improve on ideas about Python API’s and data.
About the author
Donny is wiskundige en begon als software- en data-engineer te freelancen voordat hij BiteStreams begon. Hij is goed in het ontwerpen van systemen die aan (complexe) behoeftes van de klant voldoen. In zijn vrije tijd sport en leest hij en gaat hij graag een drankje doen met vrienden.About us
Vond je deze post interessant? Bekijk ook onze andere artikelen.
Data-Gedreven Worden: Een Sleutel Tot Success Voor Startups
Data-gedreven besluitvorming is cruciaal voor startups om hun bedrijfsactiviteiten te optimaliseren en langetermijnsucces te behalen. Helaas beginnen veel startups te laat met het worden van een data-gedreven bedrijf, waardoor het later moeilijk is om hun aanpak te veranderen.
5 Hoofdzaken voor effectieve BI dashboards
Business Intelligence (BI) gaat over het gebruik van gegevens om betere besluitvorming te stimuleren. BI-dashboards moeten echter worden ontworpen met bepaalde aspecten in gedachten om echt effectief te zijn. In deze post onderzoeken we enkele essentiële praktijken om organisaties te helpen de waarde van hun BI-inspanningen te maximaliseren.
Do you need a professional data API for the right business insights? Contact us now
Get more data-driven with BiteStreams, and leave the competition behind.Contact us