DataZip
DataZip is a Python library that extends `zipfile.ZipFile` to serialize and deserialize complex Python objects. It is intended as a more flexible and readable alternative to pickle for data science workflows.
Why DataZip?
- Human-inspectable archives: DataZip files are standard `.zip` files. You can open them with any archive tool and inspect the contents.
- Broad type support: Works out of the box with pandas DataFrames/Series, NumPy arrays, Polars DataFrames, datetimes, paths, sets, frozensets, complex numbers, and custom classes.
- Efficient storage: Tabular data is stored as Parquet; arrays as `.npy`. JSON is used for metadata and simple types.
- No pickle by default: Most types are serialized without pickle, making files safer and more portable.
- Custom class integration: Any class that implements `__getstate__`/`__setstate__` (the standard pickle protocol) works automatically. The `IOMixin` makes it even simpler.
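To make the custom class point concrete, here is a minimal sketch of a class that implements `__getstate__`/`__setstate__`. The `Thermostat` class and its attributes are hypothetical and exist only for illustration. The round trip is done by hand with plain Python, so the snippet does not depend on DataZip.

```python
class Thermostat:
    """Hypothetical example class; not part of DataZip."""

    def __init__(self, target, history=None):
        self.target = target
        self.history = list(history or [])
        self._cache = {}  # derived data that should not be persisted

    def __getstate__(self):
        # Return only the plain data needed to rebuild the object.
        return {"target": self.target, "history": self.history}

    def __setstate__(self, state):
        # Restore attributes from the saved state and reset the cache.
        self.target = state["target"]
        self.history = state["history"]
        self._cache = {}


original = Thermostat(21.5, history=[20.0, 21.0])

# Round trip by hand: capture the state, then rebuild without calling __init__.
state = original.__getstate__()
restored = Thermostat.__new__(Thermostat)
restored.__setstate__(state)
```

Because the state is a plain dict of floats and lists, it consists only of types that appear in the supported types table. Based on the description above, assigning such an object to a key in a `DataZip` should be enough to store it, though the exact behavior is best confirmed against the library's own documentation.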
Quick Example
    from io import BytesIO

    import pandas as pd
    from datazip import DataZip

    # Write
    buffer = BytesIO()
    with DataZip(buffer, "w") as z:
        z["df"] = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
        z["config"] = {"threshold": 0.5, "labels": ["a", "b"]}
        z["values"] = {1, 2, frozenset([3, 4])}

    # Read
    with DataZip(buffer, "r") as z:
        df = z["df"]
        config = z["config"]
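Since DataZip files are standard zip archives, the standard library's `zipfile` module can list and read their entries. The sketch below shows the idea on a stand-in archive built with `zipfile` itself. The entry names (`config.json`, `notes.txt`) are made up for illustration and are not a description of the layout DataZip actually writes.

```python
import json
import zipfile
from io import BytesIO

# Build a stand-in archive with the standard library. The entry names
# are hypothetical; DataZip chooses its own internal layout.
buffer = BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("config.json", json.dumps({"threshold": 0.5}))
    zf.writestr("notes.txt", "hello")

# Any zip reader can list the entries and read them back.
with zipfile.ZipFile(buffer, "r") as zf:
    names = zf.namelist()
    config = json.loads(zf.read("config.json"))
```

The same `namelist()` and `read()` calls should work on a file or buffer written by `DataZip`, which is a quick way to see what was stored without loading any objects.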
Supported Types
| Category | Types |
|---|---|
| Primitives | `str`, `int`, `float`, `bool`, `None`, `complex` |
| Collections | `dict`, `list`, `tuple`, `set`, `frozenset`, `deque`, `defaultdict` |
| Date/Time | `datetime`, `pandas.Timestamp` |
| Paths | `pathlib.Path` |
| Custom | Any class with `__getstate__`/`__setstate__` |
| Optional | `numpy.ndarray`, `pandas.DataFrame`, `pandas.Series`, `polars.DataFrame`, `polars.LazyFrame`, `polars.Series`, Plotly figures |
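The Optional row presumably depends on third-party packages being installed. As a rough sketch under that assumption, the helper below (the name `available_optional_packages` is hypothetical, not part of DataZip) checks which of those packages can be imported in the current environment, using only the standard library.

```python
from importlib.util import find_spec

# Top-level import names of the packages behind the Optional row.
OPTIONAL_PACKAGES = ("numpy", "pandas", "polars", "plotly")


def available_optional_packages(names=OPTIONAL_PACKAGES):
    """Map each package name to True if it can be imported here."""
    return {name: find_spec(name) is not None for name in names}


availability = available_optional_packages()
for name, installed in availability.items():
    print(f"{name}: {'installed' if installed else 'missing'}")
```

This only reports what is importable. Which extras DataZip itself requires for each type is covered on the Installation page.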
Installation
See the Installation page for full details including optional dependencies.