Python: Do We Still Need Dataclasses Now That Pydantic Is Here?
Hi everyone! In one of my previous articles I discussed Python dataclasses. But there is an alternative package, Pydantic, and it adds a couple of really cool features. So in this article I will discuss Pydantic and when you should choose it over the built-in dataclasses.
What is Pydantic?
Pydantic is a library that makes it easy to define data structures with validation and default values. It is built on Python's type annotations (standardized for type hints in Python 3.5 by PEP 484), which let you annotate function and variable definitions with type information.
Pydantic uses these type annotations to automatically generate validation code for your data structures. For example, if we define a class with a string field and a numeric field, Pydantic will automatically generate code to ensure that the string field is a string and the numeric field is a number.
Here is an example of a Pydantic model:
from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str
This defines a simple User model with three fields: id, name, and email. The id field is an integer, and the name and email fields are strings.
How does Pydantic work?
Pydantic works by analyzing the type annotations in the code and generating validation code based on them. When we create an instance of a Pydantic model, Pydantic uses the type annotations to validate the data we pass in.
Let's say we create a User object with an id that is not (and cannot be coerced to) an integer: Pydantic will raise a validation error. Likewise, if we create a User object without an id field at all, Pydantic will raise a validation error as well.
Pydantic also provides a number of built-in validators and field types that we can use to ensure our data is valid. For example, we can use the EmailStr type (which requires the optional email-validator package) to ensure that an email address is actually a valid email address; a plain str field accepts any string.
Understanding Pydantic Validation
Let's suppose I have a JSON file and I want to load it. I can do that simply using the code below -
import json

def main():
    with open("./data.json") as file:
        data = json.load(file)
    print(data[0])

if __name__ == "__main__":
    main()
But what if we don't know anything about the structure of the data and want to validate it easily? That's where Pydantic is useful.
So now I will show you how to use Pydantic to work with the data and add validations to it.
Below is the same code using a Pydantic model -
import json
from typing import List, Optional

import pydantic

class Book(pydantic.BaseModel):
    title: str
    author: str
    publisher: str
    publishing_no: Optional[str] = None
    verification_no: Optional[str] = None
    subtitle: Optional[str] = None

def main():
    with open("./data.json") as file:
        data = json.load(file)
    books: List[Book] = [Book(**item) for item in data]
    print(books)

if __name__ == "__main__":
    main()
So in the above code, a Book class is defined as a Pydantic BaseModel. The class has six attributes: title, author, publisher, publishing_no, verification_no, and subtitle. title, author, and publisher are required fields, while publishing_no, verification_no, and subtitle are optional fields.
The List type hint indicates that the books variable will be a list of Book objects. A list comprehension creates a new Book object for each item in the data list, and the **item syntax unpacks each dictionary into keyword arguments that are passed to the Book constructor.
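To make the unpacking concrete, here is a self-contained sketch using an in-memory list of dicts (hypothetical sample records standing in for the contents of data.json):

```python
from typing import List, Optional
import pydantic

class Book(pydantic.BaseModel):
    title: str
    author: str
    publisher: str
    publishing_no: Optional[str] = None
    verification_no: Optional[str] = None
    subtitle: Optional[str] = None

# Hypothetical records standing in for data.json.
data = [
    {"title": "Book One", "author": "A. Writer", "publisher": "Acme",
     "publishing_no": "1"},
    {"title": "Book Two", "author": "B. Writer", "publisher": "Acme",
     "verification_no": "42", "subtitle": "Second Edition"},
]

# Book(**item) unpacks each dict into keyword arguments.
books: List[Book] = [Book(**item) for item in data]

print(books[0].subtitle)         # None: optional field absent from the input
print(books[1].verification_no)  # "42"
```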
Let's now add some validation. For example, I want to validate that each record in the JSON data has either a publishing_no or a verification_no.
For example -
class Book(pydantic.BaseModel):
    title: str
    author: str
    publisher: str
    publishing_no: Optional[str] = None
    verification_no: Optional[str] = None
    subtitle: Optional[str] = None

    @pydantic.root_validator(pre=True)
    @classmethod
    def check_publishing_or_verification(cls, values):
        if "publishing_no" not in values and "verification_no" not in values:
            raise ValueError("Document should have either a publishing_no or a verification_no")
        return values
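Here is a quick sketch of how such a root validator behaves (Pydantic v1-style API; raising ValueError lets Pydantic wrap the failure into a ValidationError at construction time):

```python
from typing import Optional
import pydantic

class Book(pydantic.BaseModel):
    title: str
    publishing_no: Optional[str] = None
    verification_no: Optional[str] = None

    @pydantic.root_validator(pre=True)
    def check_publishing_or_verification(cls, values):
        # pre=True runs on the raw input dict, before field validation.
        if "publishing_no" not in values and "verification_no" not in values:
            raise ValueError("Document should have either a publishing_no or a verification_no")
        return values

ok = Book(title="Valid", publishing_no="1")  # passes the check
print(ok.publishing_no)

try:
    Book(title="Invalid")  # has neither field
except pydantic.ValidationError as e:
    print(e)
```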
In a similar way we can add validations on individual attributes, like below -
class Book(pydantic.BaseModel):
    title: str
    author: str
    publisher: str
    publishing_no: Optional[str] = None
    verification_no: Optional[str] = None
    subtitle: Optional[str] = None

    @pydantic.validator("verification_no")
    @classmethod
    def verification_no_valid(cls, value: str) -> str:
        ## add validation logic here
        return value
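As a concrete sketch, suppose (hypothetically) that a verification_no must be at least 10 characters long when present:

```python
from typing import Optional
import pydantic

class Book(pydantic.BaseModel):
    title: str
    verification_no: Optional[str] = None

    @pydantic.validator("verification_no")
    def verification_no_valid(cls, value):
        # Hypothetical rule: require at least 10 characters when a value is given.
        if value is not None and len(value) < 10:
            raise ValueError("verification_no must be at least 10 characters")
        return value

book = Book(title="Ok", verification_no="9780753555194")
print(book.verification_no)

try:
    Book(title="Bad", verification_no="123")  # too short
except pydantic.ValidationError as e:
    print(e)
```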
There are a few more things we can do with Pydantic: models have a Config class that we can add to a BaseModel to change some settings.
For example -
class Book(pydantic.BaseModel):
    title: str
    author: str
    publisher: str
    publishing_no: Optional[str] = None
    verification_no: Optional[str] = None
    subtitle: Optional[str] = None

    class Config:
        allow_mutation = False
        anystr_lower = True
Here, I have made Book an immutable model, so we can't change its values any more. For example, if we run the code below it will throw an error -
def main() -> None:
    with open("./data.json") as file:
        data = json.load(file)
    books: List[Book] = [Book(**item) for item in data]
    books[0].title = "new_value"
    ## TypeError: "Book" is immutable and does not support item assignment
And with anystr_lower I am also converting every string value to lowercase, which can be helpful when cleaning up data. For example -
def main() -> None:
    with open("./data.json") as file:
        data = json.load(file)
    books: List[Book] = [Book(**item) for item in data]
    print(books[0])
    ## title='python: going beyond basic string formatting using f-string' author='pravash' publisher='python community' publishing_no='1' verification_no='978-0753555194' subtitle="faster and more efficient string formatting with python's f-strings"
Understanding Pydantic Settings Management
Let's take a scenario where our Python application needs to read configuration settings. Pydantic's BaseSettings can read them from environment variables (and optionally a dotenv file), then validate them. We can define a model to represent the expected settings, like this:
import pydantic

class Settings(pydantic.BaseSettings):
    app_name: str = "My App"
    api_key: str
    timeout: int = 10
In the above example, the Settings class extends pydantic.BaseSettings, which provides the settings management functionality. The class has three attributes: app_name, api_key, and timeout.
Now, to load settings from a file, we can create an instance of the Settings class and pass the path to a dotenv file (plain KEY=VALUE lines, not JSON) as the _env_file parameter:
settings = Settings(_env_file=".env")
This will load the settings from the file (and from environment variables, which take precedence) and validate and type-check them using the Settings model. The resulting settings object will have the same attributes as the model; any attribute without a default, such as api_key, must be supplied. We can then access these settings like regular attributes:
print(settings.app_name)
print(settings.api_key)
print(settings.timeout)
Understanding Pydantic Serialization
You can also convert these objects to plain dictionaries using the dict() method, and then do other things with them. For example, if you want to include or exclude certain keys, you can do that -
def main():
    with open("./data.json") as file:
        data = json.load(file)
    books: List[Book] = [Book(**item) for item in data]
    print(books[0].dict(exclude={"subtitle"}))
    print(books[0].dict(include={"subtitle"}))
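Here is a self-contained sketch (in-memory data instead of data.json, with a hypothetical record) showing exactly what include and exclude produce:

```python
from typing import Optional
import pydantic

class Book(pydantic.BaseModel):
    title: str
    author: str
    subtitle: Optional[str] = None

# Hypothetical record for illustration.
book = Book(title="Python Tricks", author="Pravash", subtitle="An Example")

# exclude drops the named keys from the resulting dict.
without_subtitle = book.dict(exclude={"subtitle"})
print(without_subtitle)  # {'title': 'Python Tricks', 'author': 'Pravash'}

# include keeps only the named keys.
only_subtitle = book.dict(include={"subtitle"})
print(only_subtitle)  # {'subtitle': 'An Example'}
```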
Understanding Pydantic Automatic Documentation
Let's say we have a Pydantic model representing a person, like this:
import pydantic

class Person(pydantic.BaseModel):
    first_name: str
    last_name: str
    age: int
    email: str
And to generate documentation for the Person model, we can use the Person.schema() method, which returns a dictionary representing the structure of the model as a JSON Schema.
import json
person_schema = Person.schema()
print(json.dumps(person_schema, indent=4))
This will output the following JSON to the console:
{
    "title": "Person",
    "type": "object",
    "properties": {
        "first_name": {
            "title": "First Name",
            "type": "string"
        },
        "last_name": {
            "title": "Last Name",
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "email": {
            "title": "Email",
            "type": "string"
        }
    },
    "required": [
        "first_name",
        "last_name",
        "age",
        "email"
    ]
}
So overall, I believe Pydantic is a good solution for validating data, generating documentation automatically, and easy serialization.
I am not saying you shouldn't use dataclasses; I think dataclasses are a good alternative when we don't need data validation. It's always good to work with built-in packages, because anyone who needs to run your code won't have to install any third-party packages.
But if you're working with complex data structures in your Python projects, Pydantic is definitely worth checking out.
Connect with me on LinkedIn