OSPP24 Idea: Implement ETL CLI Tools for GraphAr #463

Open
acezen opened this issue Apr 24, 2024 · 3 comments

Comments

@acezen
Contributor

acezen commented Apr 24, 2024

Describe the enhancement requested

Description

GraphAr is designed as a unified storage format for graph data, aiming to provide a standardized format that makes graph data easy to import, export, exchange, and share. Beyond the foundational format design, GraphAr also offers libraries in C++, Java, Python, and Scala so that users can work with GraphAr formatted data across different programming environments.

To make GraphAr formatted data easier to use, we aim to provide a command-line tool built on these libraries. The tool will convert data from various sources into GraphAr format and, in the other direction, convert GraphAr formatted data into other formats.

This command-line tool needs to support the following features:

  • A user-friendly command-line interface
  • Graph data management: Users can use the CLI tool to view basic information about GraphAr formatted data, such as the number of nodes, edges, and properties, and the related schema information.
  • Graph data import: Users can import data from other formats into GraphAr format through the CLI tool.
  • Support for importing large-scale data: Users can use the CLI tool to import massive datasets into GraphAr format.
  • (Optional) Graph data export: Users can export GraphAr formatted data into other formats using the CLI tool (lower priority).
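As a rough illustration of how the features above could map onto subcommands, here is a minimal sketch using Python's standard `argparse` module. Everything in it is an assumption made for illustration: the tool name `graphar-cli`, the subcommand names `info`, `import`, and `export`, and the `--format`, `--output`, and `--chunk-size` options are hypothetical and not part of any existing GraphAr library or agreed design. The handlers are left out; a real tool would dispatch to the GraphAr libraries.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per requested feature.

    All command and option names here are hypothetical placeholders.
    """
    parser = argparse.ArgumentParser(
        prog="graphar-cli",  # hypothetical tool name
        description="Sketch of an ETL CLI for GraphAr formatted data.",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    # Graph data management: node/edge counts, properties, schema.
    info = sub.add_parser("info", help="show basic information about a graph")
    info.add_argument("graph_info", help="path to the graph metadata file")

    # Graph data import; a chunk size lets large inputs be processed in batches.
    imp = sub.add_parser("import", help="convert another format into GraphAr")
    imp.add_argument("source", help="path to the input data")
    imp.add_argument("--format", choices=["csv", "parquet"], default="csv")
    imp.add_argument("--output", required=True, help="output directory")
    imp.add_argument("--chunk-size", type=int, default=1_000_000)

    # (Optional) graph data export.
    exp = sub.add_parser("export", help="convert GraphAr into another format")
    exp.add_argument("graph_info", help="path to the graph metadata file")
    exp.add_argument("--format", choices=["csv", "parquet"], default="csv")
    exp.add_argument("--output", required=True, help="output directory")
    return parser


def main(argv=None) -> int:
    """Parse arguments and echo the request; real handlers are omitted."""
    args = build_parser().parse_args(argv)
    print(f"{args.command}: {vars(args)}")
    return 0
```

Calling `main(["import", "edges.csv", "--output", "out/"])` parses a hypothetical import request. Keeping each feature in its own subcommand means the optional export can be added later without changing the other commands.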

Deliverables

  1. A CLI tool that meets the above requirements
  2. Detailed design and usage documentation

Component(s)

Other

Reference

@acezen acezen added the enhancement New feature or request label Apr 24, 2024
@acezen acezen pinned this issue Apr 26, 2024
@acezen
Contributor Author

acezen commented May 21, 2024

parquet-cli is a good reference for the CLI design:
https://github.com/apache/parquet-mr/tree/master/parquet-cli

@ywh555hhh
Contributor

@acezen Hi, could you please check and reply to my email about OSPP when you have time?

I sent it to [email protected]

@acezen
Contributor Author

acezen commented Jul 11, 2024

Databend's native CLI, bendsql:
https://github.com/datafuselabs/bendsql
