Apache Parquet

Features

Shortcomings

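  • No ACID transactions for Parquet data lakes, so a failed or concurrent write can leave a table in an inconsistent state
  • Deleting rows from a Parquet table is not easy: the affected files have to be rewritten
  • No transactional DML operations such as UPDATE, DELETE, or MERGE
  • No change data feed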
  • Listing the files of a large table is slow
  • Expensive footer reads to gather statistics for file skipping
  • Columns cannot be renamed, reordered, or dropped without rewriting the whole table