
Data loading issues #142

Open
nandita0401 opened this issue May 8, 2021 · 12 comments

@nandita0401

Screenshot (717)

OverflowError: Python int too large to convert to C long
I am getting this error for both the train.csv and dev.csv files. How can I solve it?

@Breta01
Owner

Breta01 commented May 12, 2021

Can you try replacing the line csv.field_size_limit(sys.maxsize) in the file src\ocr\datahelpres.py with the following code:

import csv
import sys

max_int = sys.maxsize

while True:
    # Decrease max_int by a factor of 10
    # as long as the OverflowError occurs.
    try:
        csv.field_size_limit(max_int)
        break
    except OverflowError:
        max_int = int(max_int / 10)
It seems that sys.maxsize behaves differently across platforms, which can cause this error. You can also try replacing sys.maxsize with a fixed number (e.g. csv.field_size_limit(2147483647)), but I am not sure how big the number needs to be; if it is too small, it will cause errors later during loading. Please try it and let me know how it goes.
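For reference, one portable alternative (a sketch, not this project's code) is to derive the cap from the platform's C long size via ctypes instead of guessing a fixed number:

```python
import csv
import ctypes

# csv.field_size_limit stores the limit in a C long. On Windows a
# C long is 32 bits even in 64-bit Python, so sys.maxsize (2**63 - 1)
# overflows it. Computing the platform's C long maximum avoids guessing:
c_long_max = 2 ** (8 * ctypes.sizeof(ctypes.c_long) - 1) - 1
csv.field_size_limit(c_long_max)
```

On a 64-bit Linux build this yields 2**63 - 1 (same as sys.maxsize); on Windows it yields 2**31 - 1 (2147483647).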

@nandita0401
Author

It is taking too long to execute; it has been running for more than 12 hours now. Is there any solution?

@Breta01
Copy link
Owner

Breta01 commented May 14, 2021

Oh, that definitely shouldn't take that long (just a few seconds, I would guess). Did you try setting a fixed number, like csv.field_size_limit(2147483647)?
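To illustrate why the fixed number must be large enough (a standalone sketch, not code from this repo): a limit smaller than the longest field makes csv.reader raise csv.Error, while a sufficiently large limit lets the read succeed:

```python
import csv
import io

# With a tiny limit, reading a 100-character field fails.
csv.field_size_limit(10)
row = "a," + "x" * 100 + "\n"
try:
    list(csv.reader(io.StringIO(row)))
except csv.Error as err:
    print("too small:", err)

# With the large fixed limit suggested above, it succeeds.
csv.field_size_limit(2147483647)  # 2**31 - 1, fits a 32-bit C long
rows = list(csv.reader(io.StringIO(row)))
print(len(rows[0][1]))  # 100
```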

@nandita0401
Author

Screenshot (795)
Where can I get the dataset from?

@Breta01
Owner

Breta01 commented May 16, 2021

Well, the steps are a bit old and I would like to rework them once I have more time.
You have to download the datasets according to the instructions in the data/ folder (not all datasets are necessary).
Then go to src/data/ and run these scripts in the following order (some extra parameters might be necessary):

  1. python data_extractor.py
  2. python data_normalization.py
  3. python data_create_sets.py --csv

@nandita0401
Author

Screenshot (800)

How to solve this error?

@Breta01
Owner

Breta01 commented May 16, 2021

It should work now; just pull the latest changes from the repo.

@Breta01 Breta01 changed the title OverflowError Data loading issues May 16, 2021
@nandita0401
Author

Can you please elaborate?

@Breta01
Owner

Breta01 commented May 30, 2021

On what exactly?

@nandita0401
Author

image
How to solve this error?

@rohit0810

(quoting Breta01's csv.field_size_limit workaround from May 12, 2021 above)

@rohit0810

The error hasn't been resolved for me either, even after replacing the code you provided.

3 participants