Automatic CSV delimiter detection #1081

Open
ws-garcia opened this issue Jan 12, 2025 · 0 comments

ws-garcia commented Jan 12, 2025

Although PapaParse has a delimiter-guessing mechanism, the method is far from accurate. Research on this subject is still ongoing. Libraries like CleverCSV implement robust dialect-sniffing strategies backed by scientific research.

However, it is really difficult to separate the delimiter-guessing mechanism from the rest of Python's CleverCSV library. Recent research has proposed a new universal method that can be integrated into any CSV parser and that, as its research paper demonstrates, proved far more reliable than CleverCSV.

The proposal is to implement the Table Uniformity Method, which is already implemented in Python and is being considered for a Rust implementation, by porting the code. In this way, the wonderful PapaParse project would have a state-of-the-art delimiter-guessing strategy, significantly improving automation in CSV processing.
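
To make the idea of scoring candidate delimiters concrete, here is a minimal sketch in plain JavaScript. It is not the Table Uniformity Method and not PapaParse's internal code; it is a naive stand-in that only rewards consistent field counts across rows. The names `guessDelimiter` and `countFields` and the scoring formula are my own assumptions for illustration.

```javascript
// Naive illustration only: NOT the Table Uniformity Method and NOT
// PapaParse's internal implementation. Function names and the scoring
// formula are hypothetical.

// Counts the fields in one line, ignoring delimiters inside double quotes.
function countFields(line, delimiter) {
  let fields = 1;
  let inQuotes = false;
  for (const ch of line) {
    if (ch === '"') inQuotes = !inQuotes;
    else if (ch === delimiter && !inQuotes) fields += 1;
  }
  return fields;
}

// Returns the candidate whose per-row field counts are highest and most
// consistent, or null when no candidate splits the rows at all.
function guessDelimiter(text, candidates = [",", ";", "\t", "|"]) {
  // Quoted fields containing line breaks are not handled, for brevity.
  const lines = text.split(/\r\n|\n|\r/).filter((line) => line.length > 0);
  if (lines.length === 0) return null;

  let best = { delimiter: null, score: 0 };
  for (const delimiter of candidates) {
    const counts = lines.map((line) => countFields(line, delimiter));
    const mean = counts.reduce((a, b) => a + b, 0) / counts.length;
    if (mean <= 1) continue; // this candidate never splits a row
    const variance =
      counts.reduce((acc, c) => acc + (c - mean) ** 2, 0) / counts.length;
    // Higher is better: many fields per row, little variation between rows.
    const score = mean / (1 + variance);
    if (score > best.score) best = { delimiter, score };
  }
  return best.delimiter;
}
```

A guess produced this way could, in principle, be handed to PapaParse through its existing `delimiter` config option. A port of the Table Uniformity Method would replace the scoring step with the one described in its paper.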

