Parquet.Net Help

DataFrame Support

Support for Microsoft.Data.Analysis was added in v4.8.

What's supported?

Because DataFrame is generally less expressive than Parquet, only primitive (atomic) columns are supported at the moment. If DataFrame gains more functionality in the future (see related links below), this integration can be extended.

When reading and writing, this integration will ignore any columns that are not atomic (primitive).

Writing

The conversion is handled under the hood. As a user, you only need to call the WriteAsync() extension method on a DataFrame and pass the destination stream to write to, like so:

DataFrame df; // an existing, populated DataFrame
await df.WriteAsync(stream); // stream is any writable System.IO.Stream
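For a fuller picture, here is a minimal sketch of writing a small DataFrame to a file. It makes several assumptions that are not stated on this page: that the extension methods live in the Parquet.Data.Analysis namespace, that the project references the corresponding NuGet package, and that "numbers.parquet" is just a hypothetical output path. The column types are limited to primitives to stay within the supported set.

```csharp
using System.IO;
using Microsoft.Data.Analysis;
using Parquet.Data.Analysis; // assumption: namespace that hosts the WriteAsync() extension

// Build a small DataFrame that contains only primitive (atomic) columns.
var df = new DataFrame(
    new Int32DataFrameColumn("id", new[] { 1, 2, 3 }),
    new DoubleDataFrameColumn("score", new[] { 0.5, 1.5, 2.5 }));

// "numbers.parquet" is a hypothetical path used only for illustration.
await using FileStream stream = File.Create("numbers.parquet");

// Serialize the DataFrame to the stream in Parquet format.
await df.WriteAsync(stream);
```

Any writable stream should work in place of the FileStream, for example a MemoryStream when the data is sent elsewhere afterwards.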

Reading

As with writing, the conversion is handled under the hood. Call the ReadParquetAsDataFrameAsync() extension method on a System.IO.Stream to read Parquet data into a DataFrame:

DataFrame df = await fs.ReadParquetAsDataFrameAsync();
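The following sketch shows the read call in context: it opens a file, loads it into a DataFrame, and lists the columns that were loaded. As assumptions, the Parquet.Data.Analysis namespace is taken to host the extension method, and "data.parquet" is a hypothetical input path standing in for a real file.

```csharp
using System;
using System.IO;
using Microsoft.Data.Analysis;
using Parquet.Data.Analysis; // assumption: namespace that hosts ReadParquetAsDataFrameAsync()

// "data.parquet" is a hypothetical path used only for illustration.
await using FileStream fs = File.OpenRead("data.parquet");

// Load the Parquet data into a DataFrame.
DataFrame df = await fs.ReadParquetAsDataFrameAsync();

// Inspect what was loaded. Non-atomic columns in the file are not expected here,
// because the integration ignores them.
foreach (DataFrameColumn column in df.Columns)
{
    Console.WriteLine($"{column.Name}: {column.DataType.Name}, {column.Length} values");
}
```

Listing the columns this way is a quick check on whether any columns in the source file were skipped because they are not primitive.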

Samples

For your convenience, a sample Jupyter notebook is available that demonstrates reading Parquet files into a DataFrame and displaying them.

To run this notebook, you can use VS Code with the Polyglot Notebooks extension.

Last modified: 14 November 2024