DataZip¶

DataZip is a Python library that extends zipfile.ZipFile to provide seamless serialization and deserialization of complex Python objects — a more flexible and readable alternative to pickle for data science workflows.

Why DataZip?¶

Human-inspectable archives: DataZip files are standard .zip files. You can open them with any archive tool and inspect the contents.
Broad type support: Works out of the box with pandas DataFrames/Series, NumPy arrays, Polars DataFrames, datetimes, paths, sets, frozensets, complex numbers, and custom classes.
Efficient storage: Tabular data is stored as Parquet; arrays as .npy. JSON is used for metadata and simple types.
No pickle by default: Most types are serialized without pickle, making files safer and more portable.
Custom class integration: Any class that implements __getstate__/__setstate__ (the standard pickle protocol) works automatically. The IOMixin makes it even simpler.

Quick Example¶

from io import BytesIO
import pandas as pd
from datazip import DataZip

# Write
buffer = BytesIO()
with DataZip(buffer, "w") as z:
    z["df"] = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
    z["config"] = {"threshold": 0.5, "labels": ["a", "b"]}
    z["values"] = {1, 2, frozenset([3, 4])}

# Read
with DataZip(buffer, "r") as z:
    df = z["df"]
    config = z["config"]

Supported Types¶

Category	Types
Primitives	`str`, `int`, `float`, `bool`, `None`, `complex`
Collections	`dict`, `list`, `tuple`, `set`, `frozenset`, `deque`, `defaultdict`
Date/Time	`datetime`, `pandas.Timestamp`
Paths	`pathlib.Path`
Custom	Any class with `__getstate__`/`__setstate__`
Optional	`numpy.ndarray`, `pandas.DataFrame`, `pandas.Series`, `polars.DataFrame`, `polars.LazyFrame`, `polars.Series`, Plotly figures

Installation¶

See the Installation page for full details including optional dependencies.