Can't read config.cfg #1

Open
Bhavin1996 opened this issue Feb 6, 2021 · 18 comments

Comments

@Bhavin1996

OSError: [E053] Could not read config.cfg from C:\Users\bhavi\AppData\Local\Programs\Python\Python39\lib\site-packages\resume_parser\degree\model\config.cfg

@kbrajwani
Owner

Hey,
make sure you have installed the correct versions: spacy==2.3.5 and en_core_web_sm==2.3.1.
See this Colab notebook:
https://colab.research.google.com/drive/1p6rhi9g0ughtGBojnCJcPVRRNqziuk3K?usp=sharing

@diracsol

I ran python -m spacy validate and confirmed that spaCy is version 2.3.5 and en_core_web_sm is version 2.3.1.
When I run from resume_parser import resumeparse, I get a UserWarning [W031] saying that model 'en_training' (0.0.0) requires spaCy v2.2 and is incompatible with spaCy 2.3.5.
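
The same version check can be done from Python. This is a minimal sketch, assuming the pins recommended by the maintainer in this thread (spacy 2.3.5, en_core_web_sm 2.3.1); `check_pins` is a hypothetical helper and not part of resume_parser or spaCy.

```python
from importlib import metadata

# Versions recommended by the maintainer in this thread.
PINS = {"spacy": "2.3.5", "en_core_web_sm": "2.3.1"}


def check_pins(pins=PINS, get_version=metadata.version):
    """Return {package: (expected, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

An empty result from `check_pins()` means both packages match the pins. It says nothing about the 'en_training' model bundled with resume_parser, which is what W031 complains about.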

@Jeyandranath

I encounter this issue too.
It works fine in Colab, with some warnings, but when I run it on my Ubuntu server it gets stuck after the warning.

@kbrajwani
Owner

Hey @Jeyandranath, can you please share the logs from the point where the process gets stuck? Also, can you share the resume it gets stuck on?

@Jeyandranath

Tested on Windows, where it works fine apart from the warning below:
UserWarning: [W031] Model 'en_training' (0.0.0) requires spaCy v2.2 and is incompatible with the current spaCy version (2.3.5). This may lead to unexpected results or runtime errors. To resolve this, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
warnings.warn(warn_msg)

data = resumeparse.read_file('hello.pdf')
2021-03-21 00:40:45,448 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar to C:\Users\CHARUJ1\AppData\Local\Temp\tika-server.jar.
2021-03-21 00:41:16,323 [MainThread ] [INFO ] Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.24/tika-server-1.24.jar.md5 to C:\Users\CHARUJ1\AppData\Local\Temp\tika-server.jar.md5.
2021-03-21 00:41:19,471 [MainThread ] [WARNI] Failed to see startup log message; retrying...
2021-03-21 00:41:24,493 [MainThread ] [WARNI] Failed to see startup log message; retrying...
print(data)
{'email': '[email protected]', 'phone': '+91-98845-92980', 'name': 'SHRAVAN KUMAR', 'total_exp': 4, 'university': [], 'designition': ['finance analyst', 'operations tech', 'deputy manager'], 'degree': ['B.Com Degree'], 'skills': ['Known: Tamil', ' English', ' and Tulu', 'Present Address: 22 Vijayalakshmi Avenue', 'Poonamallee', ' Chennai-56'], 'Companies worked at': ['92980', 'SAP', 'Hyundai Motor India Ltd', 'Hyundai Motor India Ltd.']}

@Jeyandranath

@kbrajwani On Ubuntu, the process gets stuck right after the warning below. The resume is attached:
hello.pdf

UserWarning: [W031] Model 'en_training' (0.0.0) requires spaCy v2.2 and is incompatible with the current spaCy version (2.3.5). This may lead to unexpected results or runtime errors. To resolve this, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
warnings.warn(warn_msg)

@Jeyandranath

I think Java is the issue...

@RohitJacob

There is no file called config.cfg in the path resume_parser\degree\model\, not even in the GitHub repository. What should config.cfg contain?

@GuidoBartoli

GuidoBartoli commented Apr 16, 2021

Yep, same problem here within a Python 3.8 virtual environment (I followed the official installation instructions from here):

>>> from resume_parser import resumeparse
/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/util.py:715: UserWarning: [W094] Model 'en_training' (0.0.0) specifies an under-constrained spaCy version requirement: >=2.2.4. This can lead to compatibility problems with older versions, or as new spaCy versions are released, because the model may say it's compatible when it's not. Consider changing the "spacy_version" in your meta.json to a version range, with a lower and upper pin. For example: >=3.0.5,<3.1.0
  warnings.warn(warn_msg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/resume_parser/__init__.py", line 1, in <module>
    from resume_parser.resumeparse import resumeparse
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/resume_parser/resumeparse.py", line 50, in <module>
    custom_nlp2 = spacy.load(os.path.join(base_path,"degree","model"))
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/__init__.py", line 47, in load
    return util.load_model(name, disable=disable, exclude=exclude, config=config)
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/util.py", line 324, in load_model
    return load_model_from_path(Path(name), **kwargs)
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/util.py", line 388, in load_model_from_path
    config = load_config(config_path, overrides=dict_to_dot(config))
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/util.py", line 545, in load_config
    raise IOError(Errors.E053.format(path=config_path, name="config.cfg"))
OSError: [E053] Could not read config.cfg from /home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/resume_parser/degree/model/config.cfg

That config file does not actually exist at that location, but if it is located somewhere else, I can move it there. Where is it, and what should it contain?
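
One way to see why spaCy asks for config.cfg is to look at what the bundled model directory actually contains. spaCy v3 pipelines ship a config.cfg next to meta.json, while models saved with spaCy v2 have only meta.json, so a v3 install loading a v2 model directory would be expected to fail with E053. The sketch below only reads files; `inspect_model_dir` is a hypothetical helper, not a spaCy or resume_parser function.

```python
import json
from pathlib import Path


def inspect_model_dir(model_dir):
    """Summarise the files spaCy looks for in a saved model directory."""
    model_dir = Path(model_dir)
    meta_path = model_dir / "meta.json"
    info = {
        "has_meta_json": meta_path.is_file(),
        "has_config_cfg": (model_dir / "config.cfg").is_file(),
        "spacy_version": None,
    }
    if info["has_meta_json"]:
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        # Version range the model declares itself compatible with.
        info["spacy_version"] = meta.get("spacy_version")
    return info
```

If this reading is right, calling it on the resume_parser/degree/model directory named in the traceback should report `has_config_cfg` as False, with a `spacy_version` matching the >=2.2.4 requirement quoted in the W094 warning.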

@GuidoBartoli

After some experiments, I found a config.cfg file inside my virtual environment (in ~/.virtualenvs/rsm/lib/python3.8/site-packages/en_core_web_sm/en_core_web_sm-3.0.0) and copied it to the folder required by resume_parser. That resolved the previous error, but another one appears:

>>> from resume_parser import resumeparse
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/resume_parser/__init__.py", line 1, in <module>
    from resume_parser.resumeparse import resumeparse
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/resume_parser/resumeparse.py", line 50, in <module>
    custom_nlp2 = spacy.load(os.path.join(base_path,"degree","model"))
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/__init__.py", line 47, in load
    return util.load_model(name, disable=disable, exclude=exclude, config=config)
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/util.py", line 324, in load_model
    return load_model_from_path(Path(name), **kwargs)
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/util.py", line 390, in load_model_from_path
    return nlp.from_disk(model_path, exclude=exclude)
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/language.py", line 1863, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/util.py", line 1174, in from_disk
    reader(path / key)
  File "/home/bartoli/.virtualenvs/rsm/lib/python3.8/site-packages/spacy/language.py", line 1849, in <lambda>
    deserializers["tokenizer"] = lambda p: self.tokenizer.from_disk(
  File "spacy/tokenizer.pyx", line 740, in spacy.tokenizer.Tokenizer.from_disk
  File "spacy/tokenizer.pyx", line 803, in spacy.tokenizer.Tokenizer.from_bytes
  File "spacy/tokenizer.pyx", line 570, in spacy.tokenizer.Tokenizer._load_special_cases
  File "spacy/tokenizer.pyx", line 589, in spacy.tokenizer.Tokenizer._validate_special_case
ValueError: [E1005] Unable to set attribute 'POS' in tokenizer exception for '	'. Tokenizer exceptions are only allowed to specify ORTH and NORM.

This is harder to understand... do you have any suggestions?

@kbrajwani
Owner

Hey, please make sure your installed versions match these requirements: spacy==2.3.5 and en_core_web_sm==2.3.1.
config.cfg is a spaCy configuration file; it is downloaded when the en_core_web_sm package is installed.
I will try to update the model when I get some time.
Thanks

@ranyaphat29

I have the same problem. I installed the library following the requirements, but it doesn't work for me.

@bharath-ts

I have faced the same issue: the runtime gets stuck while importing resume_parser (with spacy 2.3.5 and en_core_web_sm 2.3.1). Even the Colab notebook got stuck at the same point. Could you fix this, or let us know the reason for it?

@1zineb

1zineb commented Apr 29, 2021

I have the same issue as @GuidoBartoli: the E1005 tokenizer error after copying config.cfg. Do you have any suggestions, please?

@kbrajwani
Owner

@bharath-ts I have also encountered the hang while importing resume_parser. Can you please check locally by installing the same way as in the Colab notebook?
I will fix it when I get time.

@kbrajwani
Owner

Hey guys, I have solved it in the Colab notebook.
If you want to install it locally, please follow the steps below.

  1. Create a new Python environment: https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/
  2. Install the library: pip install resume-parser
  3. Install en_core_web_sm: pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz
  4. Install importlib-metadata: pip install importlib-metadata==3.2.0

Now you can use the library.
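
The pip steps above can also be scripted. This is a minimal sketch, assuming it is run with the interpreter of the freshly created environment; `install_commands` and `run_install` are hypothetical helpers, and the package names, versions, and URL are the ones listed in the steps.

```python
import subprocess
import sys

MODEL_URL = (
    "https://github.com/explosion/spacy-models/releases/download/"
    "en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz"
)


def install_commands(python=sys.executable):
    """Build the pip invocations for steps 2-4, in order."""
    pip = [python, "-m", "pip", "install"]
    return [
        pip + ["resume-parser"],
        pip + [MODEL_URL],
        pip + ["importlib-metadata==3.2.0"],
    ]


def run_install(python=sys.executable):
    """Run each command in turn, stopping at the first failure."""
    for cmd in install_commands(python):
        subprocess.check_call(cmd)
```

Calling pip through `sys.executable -m pip` keeps the installs inside whichever environment the script is launched from, which matters here because the fix depends on not mixing in a spaCy installed elsewhere.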

@sz332

sz332 commented Dec 9, 2021

I had some trouble understanding the steps correctly, so here are my additions to @kbrajwani's comments.

  1. Follow his description.
  2. From Python you MUST execute the nltk.download() commands, which download the necessary data. This is something I totally missed.
  3. Install Java on the machine. The library uses Apache Tika, which is written in Java and extracts the content of a PDF file very nicely, so parsing will be more efficient.
  4. Try to use Python 3.8; I had some issues with 3.9 and 3.10.
  5. Try to use Linux. On Windows, I had compilation issues.
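
The environment points in this list can be checked before importing the library. This is a minimal sketch: `preflight` and `download_nltk_data` are hypothetical helpers, the Python 3.8 preference comes from point 4, and the NLTK resource names are left to the caller because this thread does not list them.

```python
import shutil
import sys


def preflight(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; empty means none were found."""
    problems = []
    # Point 4: Python 3.8 worked; 3.9 and 3.10 caused issues for the commenter.
    if tuple(version_info[:2]) != (3, 8):
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} in use; 3.8 was reported to work"
        )
    # Point 3: Apache Tika needs a Java runtime on PATH.
    if which("java") is None:
        problems.append("java not found on PATH (needed by Apache Tika)")
    return problems


def download_nltk_data(resources):
    """Point 2: download the named NLTK resources.

    `resources` must come from the project's documentation; none are assumed here.
    """
    import nltk  # imported lazily so preflight() works without NLTK installed

    for name in resources:
        nltk.download(name)
```

Running `preflight()` first gives a quick list of anything that differs from the setup reported to work, before the slower import of resume_parser is attempted.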

@kbrajwani
Owner

Thanks, @sz332, for sharing your experience.
