Python: Do We Still Need Dataclasses If Pydantic Is Here?

Revolutionize Your Python Code Quality with Pydantic

Pravash
6 min read · Apr 17, 2023

Hi everyone! In one of my previous articles I discussed Python dataclasses. But there is an alternative package, Pydantic, and it adds a couple of really cool features. So in this article I will discuss Pydantic and when you should choose it over the built-in dataclasses.

What is Pydantic?

Pydantic is a library that makes it easy to define data structures with validation and default values. It is designed to be used with Python’s type annotations (type hints, standardized in Python 3.5), which let you annotate function and variable definitions with type information.

Pydantic uses these type annotations to automatically generate validation code for your data structures. For example, if we define a class with a string field and a numeric field, Pydantic will automatically generate code to ensure that the string field is a string and the numeric field is a number.
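To see what “generating validation from annotations” means, here is a toy, stdlib-only sketch of the idea. This is only an illustration of the concept, not how Pydantic is actually implemented: it reads a class’s annotations with typing.get_type_hints and checks each attribute against its declared type.

```python
from typing import get_type_hints


def validate(obj) -> None:
    """Check each annotated attribute of obj against its declared type."""
    for name, expected in get_type_hints(type(obj)).items():
        value = getattr(obj, name)
        if not isinstance(value, expected):
            raise TypeError(
                f"{name} must be {expected.__name__}, got {type(value).__name__}"
            )


class Point:
    x: int
    y: int

    def __init__(self, x, y):
        self.x = x
        self.y = y


validate(Point(1, 2))  # passes silently

try:
    validate(Point(1, "two"))
except TypeError as exc:
    print(exc)  # y must be int, got str
```

Pydantic does far more than this (coercion, nested models, detailed error reports), but the starting point is the same: the annotations drive the checks.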

Here is an example of a Pydantic model:

from pydantic import BaseModel

class User(BaseModel):
    id: int
    name: str
    email: str

This defines a simple User model with three fields: id, name, and email. The id field is an integer, and the name and email fields are strings.

How does Pydantic work?

Pydantic works by analyzing the type annotations in the code and generating validation code based on them. When we create an instance of a Pydantic model, Pydantic uses the type annotations to validate the data we pass in.

Let’s say we create a User object with an id that cannot be interpreted as an integer — Pydantic will raise a validation error. Likewise, if we create a User object without an id field, Pydantic will raise a validation error as well. (Note that a plain str field does not check email format; for that, Pydantic provides the EmailStr type.)
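For example (assuming pydantic is installed; the field values here are made up), invalid input raises pydantic.ValidationError, which you can catch and inspect:

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    id: int
    name: str
    email: str


# id cannot be coerced to an integer -> ValidationError
try:
    User(id="not-a-number", name="Jane", email="jane@example.com")
except ValidationError as exc:
    print("invalid id:", exc.errors()[0]["loc"])

# id missing entirely -> ValidationError as well
try:
    User(name="Jane", email="jane@example.com")
except ValidationError as exc:
    print("missing field:", exc.errors()[0]["loc"])
```

The errors() method returns a list of structured error records, which is handy for logging or for returning readable messages from an API.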

Pydantic also provides a number of built-in validators that we can use to ensure that our data is valid. For example, we can use the EmailStr type to ensure that a field contains a valid email address.

Understand Pydantic Validation

Let’s suppose I have a JSON file and I want to load it. I can do that simply using the code below -

import json

def main():
    with open("./data.json") as file:
        data = json.load(file)
        print(data[0])

if __name__ == "__main__":
    main()

But what if we don’t know anything about the structure of the data, and we want to validate it easily? That’s where Pydantic is useful.

So now I will show you how to use Pydantic to work with this data and add validations to it.
Below is the code for the same using a Pydantic model -

import json
import pydantic
from typing import Optional, List


class Book(pydantic.BaseModel):
    title: str
    author: str
    publisher: str
    publishing_no: Optional[str]
    verification_no: Optional[str]
    subtitle: Optional[str]


def main():
    with open("./data.json") as file:
        data = json.load(file)
    books: List[Book] = [Book(**item) for item in data]
    print(books)


if __name__ == "__main__":
    main()

So in the above code, a Book class is defined as a Pydantic BaseModel. The class has six attributes: title, author, publisher, publishing_no, verification_no and subtitle. title, author, and publisher are required fields, while publishing_no, verification_no and subtitle are optional fields.

The List type hint is used to indicate that the books variable will be a list of Book objects.

A list comprehension is used to create a new Book object for each item in the data list. The **item syntax unpacks each dictionary into keyword arguments that are passed to the Book constructor.
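The unpacking itself is plain Python, not something Pydantic-specific. A minimal stdlib-only illustration of what Book(**item) does with each dictionary (the book data here is just an example):

```python
def describe(title, author, publisher):
    return f"{title} by {author} ({publisher})"


item = {
    "title": "Fluent Python",
    "author": "Luciano Ramalho",
    "publisher": "O'Reilly",
}

# **item expands the dict into title=..., author=..., publisher=...
print(describe(**item))  # Fluent Python by Luciano Ramalho (O'Reilly)
```

With a Pydantic model, the same unpacking additionally triggers validation of every keyword argument.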

Let’s now add some validation. For example, I want to validate that the JSON data has either a publishing_no or a verification_no -

class Book(pydantic.BaseModel):
    title: str
    author: str
    publisher: str
    publishing_no: Optional[str]
    verification_no: Optional[str]
    subtitle: Optional[str]

    @pydantic.root_validator(pre=True)
    @classmethod
    def check_publishing_or_verification(cls, values):
        if "publishing_no" not in values and "verification_no" not in values:
            # raise ValueError so Pydantic wraps it in a ValidationError
            raise ValueError("Document should have either a publishing_no or verification_no")
        return values

In a similar way, we can add validations for individual attributes, like below -

class Book(pydantic.BaseModel):
    title: str
    author: str
    publisher: str
    publishing_no: Optional[str]
    verification_no: Optional[str]
    subtitle: Optional[str]

    @pydantic.validator("verification_no")
    @classmethod
    def verification_no_valid(cls, value: Optional[str]) -> Optional[str]:
        ## add your validation here, e.g. check the format
        ## and raise a ValueError if it does not match
        return value

There are a few more things we can do with Pydantic: it has a Config class that we can add to a base model to change some settings.
For example -

class Book(pydantic.BaseModel):
    title: str
    author: str
    publisher: str
    publishing_no: Optional[str]
    verification_no: Optional[str]
    subtitle: Optional[str]

    class Config:
        allow_mutation = False
        anystr_lower = True

Here, I have made Book instances immutable, so now we can’t change their values. For example, if we run the code below it will throw an error -

def main() -> None:
    with open("./data.json") as file:
        data = json.load(file)
    books: List[Book] = [Book(**item) for item in data]
    books[0].title = "new_value"

## TypeError: "Book" is immutable and does not support item assignment

I am also converting every string to lowercase (anystr_lower), which can be helpful when we are working with data. For example -

def main() -> None:
    with open("./data.json") as file:
        data = json.load(file)
    books: List[Book] = [Book(**item) for item in data]
    print(books[0])

## title='python: going beyond basic string formatting using f-string' author='pravash' publisher='python community' publishing_no='1' verification_no='978–0753555194' subtitle="faster and more efficient string formatting with python's f-strings"

Understand Pydantic Settings Management

Let’s take a scenario where we have a Python application that needs to read settings from the environment or from an env file. We can define a Pydantic settings model to represent the expected structure, like this:

import pydantic

class Settings(pydantic.BaseSettings):
    app_name: str = "My App"
    api_key: str
    timeout: int = 10

In the above example, the Settings class extends pydantic.BaseSettings, which provides the settings management functionality. The class has three attributes: app_name, api_key, and timeout.

Now, to load settings from a file, we can create an instance of the Settings class and pass the file path as the _env_file parameter (note that the file is parsed in dotenv KEY=value format):

settings = Settings(_env_file=".env.json")

This will load the settings from the file (plus any matching environment variables) and validate and type-check them using the Settings model. The resulting settings object has the same attributes as the model, with values based on the loaded data. We can then access these settings like regular attributes:

print(settings.app_name) 
print(settings.api_key)
print(settings.timeout)

Understand Pydantic Serialization

You can also convert these objects to dictionaries using the dict() method, and then do other things with the result. For example, if you want to include or exclude some keys, you can do that -

def main():
    with open("./data.json") as file:
        data = json.load(file)
    books: List[Book] = [Book(**item) for item in data]
    print(books[0].dict(exclude={"subtitle"}))
    print(books[0].dict(include={"subtitle"}))
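Besides dict(), a model can be serialized straight to a JSON string with the json() method, which accepts the same include/exclude arguments. A small sketch with made-up book data (requires pydantic):

```python
import json
from typing import Optional

import pydantic


class Book(pydantic.BaseModel):
    title: str
    author: str
    publisher: str
    subtitle: Optional[str] = None


book = Book(title="My Book", author="Jane Doe", publisher="Acme Press")

print(book.json())                      # serialize the whole model to a JSON string
print(book.json(exclude={"subtitle"}))  # same, but without the subtitle key

# round-trip back to a plain dict with the stdlib json module
print(json.loads(book.json())["title"])  # My Book
```

This is convenient when returning models from web handlers or writing them to files, since you skip the manual dict-then-dumps step.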

Understand Pydantic Automatic Documentation

Let’s say we have a Pydantic model representing a person, like this:

import pydantic

class Person(pydantic.BaseModel):
    first_name: str
    last_name: str
    age: int
    email: str

And to generate documentation for the Person model, we can use the Person.schema() method, which returns a dictionary describing the structure of the model as a JSON Schema.

import json

person_schema = Person.schema()
print(json.dumps(person_schema, indent=4))

This will output the following JSON to the console:

{
    "title": "Person",
    "type": "object",
    "properties": {
        "first_name": {
            "title": "First Name",
            "type": "string"
        },
        "last_name": {
            "title": "Last Name",
            "type": "string"
        },
        "age": {
            "title": "Age",
            "type": "integer"
        },
        "email": {
            "title": "Email",
            "type": "string"
        }
    },
    "required": [
        "first_name",
        "last_name",
        "age",
        "email"
    ]
}

So, overall, I believe Pydantic is a good solution for validating data, automatic documentation, and easy serialization.

I am not saying you shouldn’t use dataclasses; I think dataclasses are a good alternative when we don’t need data validation. It’s always good to work with built-in packages, because if someone else needs to run your code, they don’t have to install any third-party packages.

So, If you’re working with complex data structures in your Python projects, Pydantic is definitely worth checking out.

Connect with me on LinkedIn
