Examples: Apache Arrow integration showcase #181

Closed
opened 2026-01-04 18:46:44 +00:00 by navicore · 1 comment
navicore commented 2026-01-04 18:46:44 +00:00 (Migrated from github.com)

Summary

Create a comprehensive set of examples demonstrating Seq integration with components of the Apache Arrow ecosystem.

Background

Apache Arrow is a cross-language development platform for in-memory analytics, with components for:

  • Columnar data format
  • Flight (RPC framework)
  • DataFusion (query engine)
  • Parquet file format

Proposed Examples

1. Arrow IPC / File Format

  • Read/write Arrow IPC files
  • Interop with Arrow data from other languages
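As a rough illustration of what an IPC round-trip example would have to recognize, here is a minimal sketch in Rust (not Seq, since no Seq Arrow bindings exist yet). It relies only on two framing details of the Arrow IPC format: the file format opens and closes with the magic bytes `ARROW1`, and the stream format prefixes each encapsulated message with a `0xFFFFFFFF` continuation marker. The names `IpcKind` and `sniff_ipc` are hypothetical and come from neither Arrow nor Seq.

```rust
/// Magic bytes that open and close an Arrow IPC *file*.
const ARROW_MAGIC: &[u8] = b"ARROW1";
/// Continuation marker that prefixes each encapsulated message
/// in the Arrow IPC *stream* format.
const CONTINUATION: [u8; 4] = [0xFF, 0xFF, 0xFF, 0xFF];

/// Hypothetical classification of a byte buffer (name is illustrative only).
#[derive(Debug, PartialEq, Eq)]
pub enum IpcKind {
    File,
    Stream,
    Unknown,
}

/// Guess whether `bytes` holds an Arrow IPC file or stream by looking at
/// framing bytes only. This does not parse or validate any metadata.
pub fn sniff_ipc(bytes: &[u8]) -> IpcKind {
    // A file needs room for both the leading and the trailing magic.
    if bytes.len() >= 2 * ARROW_MAGIC.len()
        && bytes.starts_with(ARROW_MAGIC)
        && bytes.ends_with(ARROW_MAGIC)
    {
        return IpcKind::File;
    }
    if bytes.starts_with(&CONTINUATION) {
        return IpcKind::Stream;
    }
    IpcKind::Unknown
}
```

A check like this would only tell an example which reader to hand the bytes to; decoding the schema and record batches still needs a real Arrow implementation.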

2. Parquet Integration

  • Read Parquet files
  • Query columnar data
  • Write results back to Parquet
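To give a sense of the first step a Parquet reader takes, here is a minimal Rust sketch (again not Seq) that locates the footer of an unencrypted Parquet file. It assumes only the documented file layout: a leading `PAR1` magic, and a trailer made of a 4-byte little-endian footer length followed by `PAR1`. The function name `parquet_footer_len` is hypothetical, and decoding the Thrift-encoded footer metadata is out of scope here.

```rust
/// Magic bytes at the start and end of an unencrypted Parquet file.
const PARQUET_MAGIC: &[u8] = b"PAR1";

/// Return the length in bytes of the footer metadata, or `None` if the
/// buffer does not look like a complete, unencrypted Parquet file.
/// (Hypothetical helper; it only inspects the framing bytes.)
pub fn parquet_footer_len(bytes: &[u8]) -> Option<u32> {
    // Leading magic (4) + footer length (4) + trailing magic (4).
    const FRAMING: usize = 12;
    if bytes.len() < FRAMING
        || !bytes.starts_with(PARQUET_MAGIC)
        || !bytes.ends_with(PARQUET_MAGIC)
    {
        return None;
    }
    let n = bytes.len();
    // The footer length sits just before the trailing magic, little-endian.
    let len_bytes = [bytes[n - 8], bytes[n - 7], bytes[n - 6], bytes[n - 5]];
    let footer_len = u32::from_le_bytes(len_bytes);
    // The footer has to fit between the leading magic and the trailer.
    if footer_len as usize > n - FRAMING {
        return None;
    }
    Some(footer_len)
}
```

With the footer length in hand, the metadata occupies the `footer_len` bytes immediately before the 8-byte trailer; everything after that point (schema, row groups, column chunks) would come from a Parquet library rather than hand-written parsing.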

3. Flight Client

  • Connect to Arrow Flight server
  • Stream data queries
  • Handle record batches

4. DataFusion Queries

  • SQL queries over Arrow data
  • Custom UDFs in Seq
  • Query optimization showcase

Implementation Approach

  • Use FFI to call Arrow C Data Interface
  • Or create Seq bindings via external builtins
  • May require new stdlib modules (std:arrow, std:parquet, etc.)
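To make the FFI option more concrete, here is a minimal Rust sketch of one half of the Arrow C Data Interface: the `ArrowSchema` struct, mirrored field for field from the specification, plus a producer and a consumer for a single nullable int32 field. The struct layout, the `"i"` format string for int32, and the nullable flag value `2` come from the specification. Everything else (`Owned`, `release_schema`, `export_int32_field`, `consume`) is a hypothetical name chosen for illustration, and this is not how Seq would necessarily expose it. Note that the struct carries a `release` function pointer, so even this simple case passes a callback across the FFI boundary.

```rust
use std::ffi::{c_char, c_void, CStr, CString};
use std::ptr;

/// Flag value for a nullable field in the Arrow C Data Interface.
const ARROW_FLAG_NULLABLE: i64 = 2;

/// Mirror of `struct ArrowSchema` from the Arrow C Data Interface.
#[repr(C)]
pub struct ArrowSchema {
    pub format: *const c_char,
    pub name: *const c_char,
    pub metadata: *const c_char,
    pub flags: i64,
    pub n_children: i64,
    pub children: *mut *mut ArrowSchema,
    pub dictionary: *mut ArrowSchema,
    pub release: Option<unsafe extern "C" fn(*mut ArrowSchema)>,
    pub private_data: *mut c_void,
}

/// Strings owned by the producer; freed by the release callback.
struct Owned {
    format: CString,
    name: CString,
}

/// Release callback: frees producer-owned data and marks the struct released.
unsafe extern "C" fn release_schema(schema: *mut ArrowSchema) {
    if schema.is_null() {
        return;
    }
    unsafe {
        let s = &mut *schema;
        if !s.private_data.is_null() {
            drop(Box::from_raw(s.private_data as *mut Owned));
            s.private_data = ptr::null_mut();
        }
        // A released struct is signalled by a null release pointer.
        s.release = None;
    }
}

/// Producer side: describe one nullable int32 field called `name`.
pub fn export_int32_field(name: &str) -> ArrowSchema {
    let owned = Box::new(Owned {
        format: CString::new("i").expect("no interior NUL"), // "i" = int32
        name: CString::new(name).expect("no interior NUL"),
    });
    // The CString buffers live on the heap, so these pointers stay valid
    // until the release callback drops `Owned`.
    let format = owned.format.as_ptr();
    let name_ptr = owned.name.as_ptr();
    ArrowSchema {
        format,
        name: name_ptr,
        metadata: ptr::null(),
        flags: ARROW_FLAG_NULLABLE,
        n_children: 0,
        children: ptr::null_mut(),
        dictionary: ptr::null_mut(),
        release: Some(release_schema),
        private_data: Box::into_raw(owned) as *mut c_void,
    }
}

/// Copy a C string into an owned `String`, treating null as empty.
fn cstr_or_empty(p: *const c_char) -> String {
    if p.is_null() {
        return String::new();
    }
    unsafe { CStr::from_ptr(p) }.to_string_lossy().into_owned()
}

/// Consumer side: copy out what is needed, then call the release callback.
pub fn consume(schema: &mut ArrowSchema) -> (String, String) {
    let format = cstr_or_empty(schema.format);
    let name = cstr_or_empty(schema.name);
    if let Some(release) = schema.release {
        unsafe { release(schema as *mut ArrowSchema) };
    }
    (format, name)
}
```

Calling `export_int32_field("price")` and passing the result to `consume` yields the pair `("i", "price")` and leaves the struct with `release` cleared. A full binding would also need the companion `ArrowArray` struct for the data buffers, which follows the same ownership pattern.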

Prerequisites

  • FFI infrastructure for C libraries
  • Possibly Arrow C++ or arrow-rs bindings

Directory Structure

examples/arrow/
├── README.md
├── ipc-roundtrip.seq
├── parquet-read.seq
├── flight-client.seq
└── datafusion-query.seq

Labels

enhancement, examples, data-engineering

navicore commented 2026-01-23 04:48:32 +00:00 (Migrated from github.com)

The callbacks through FFI would be hard and would make us question why we are doing this. Parquet support would be nice, but that is probably where we would stop.
Reference
navicore/patch-seq#181