I have a Databricks workspace that I use to parse large .pdf files into lists of LangChain documents. I store those lists in a dictionary keyed by the name of the .pdf they came from.
I need to use these preprocessed documents in my AI application, so I pickle the dictionary in Databricks, move the pickle file to the Docker container in which my app runs, and unpickle it there. Then I can access the LangChain docs associated with those .pdfs.
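For illustration, the workflow described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual application code: plain dicts stand in for the LangChain documents so that it runs without LangChain installed, and the file names and the helpers `save_docs` and `load_docs` are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in data: each .pdf name maps to a list of parsed
# "documents". Plain dicts are used here instead of LangChain documents.
docs_by_pdf = {
    "report_a.pdf": [
        {"page_content": "First page of report A", "metadata": {"page": 1}},
        {"page_content": "Second page of report A", "metadata": {"page": 2}},
    ],
    "report_b.pdf": [
        {"page_content": "Only page of report B", "metadata": {"page": 1}},
    ],
}


def save_docs(docs, path):
    """Pickle the dictionary (the step done in Databricks)."""
    with open(path, "wb") as f:
        pickle.dump(docs, f)


def load_docs(path):
    """Unpickle the dictionary (the step done inside the Docker container)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round-trip through a temporary file to mimic moving the pickle between
# the two environments.
with tempfile.TemporaryDirectory() as tmp:
    pickle_path = Path(tmp) / "docs_by_pdf.pkl"
    save_docs(docs_by_pdf, pickle_path)
    restored = load_docs(pickle_path)
```

With plain dicts the round trip succeeds, which matches the observation that the process works for lists of strings. Unpickling real document objects additionally requires the loading environment to rebuild them with the class definitions installed there.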
This has worked fine for months. But on Friday I had to parse some new .pdfs so I went through the same process. This time, however, when I attempt to unpickle the dictionary, I get this error:
  File "/usr/local/lib/python3.11/site-packages/pydantic/v1/main.py", line 417, in __setstate__
    object_setattr(self, '__fields_set__', state['__fields_set__'])
                                           ~~~~~^^^^^^^^^^^^^^^^^^
KeyError: '__fields_set__'
Now, the dictionary is not created using Pydantic, and the process works if the dictionary holds lists of strings instead of lists of LangChain documents. So my guess is that LangChain uses Pydantic, and that is how Pydantic gets involved. My research suggested that this error may be due to different Python versions, different pickle versions, or different Pydantic versions between the two environments. To address this, I set the Python version in both Databricks and the Docker container to 3.9.19 (and then I also tried 3.11). In both cases, the pickle version is 4.0. The Pydantic version is 2.9.2 in both environments.
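One way to make the version comparison above systematic is to print the same report in both environments and diff the results. The sketch below is an assumption-laden illustration rather than a diagnosis: the helper names `environment_report` and `diff_reports` are hypothetical, and the package list is a guess at what might be relevant. The LangChain packages are included because the traceback goes through `pydantic/v1`, the compatibility namespace that ships inside Pydantic 2, so matching Pydantic versions alone may not guarantee that both sides build documents on the same base class.

```python
import pickle
import platform
from importlib import metadata

# Assumed list of distributions worth comparing; adjust as needed.
PACKAGES = ("pydantic", "langchain", "langchain-core", "langchain-community")


def environment_report(packages=PACKAGES):
    """Collect interpreter, pickle, and package versions for this environment."""
    report = {
        "python": platform.python_version(),
        "pickle_format_version": pickle.format_version,
        "pickle_default_protocol": pickle.DEFAULT_PROTOCOL,
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Distribution is not installed in this environment.
            report[name] = None
    return report


def diff_reports(left, right):
    """Return only the keys whose values differ between two reports."""
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k), right.get(k)) for k in keys if left.get(k) != right.get(k)}


# Run this in Databricks and in the Docker container, then compare the output.
print(environment_report())
```

Copying the printed dictionary from one environment into the other and passing both to `diff_reports` leaves only the entries that differ.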
This was working fine for months. The only change I can think of is that I could no longer import chromadb from langchain.vectorstores, so I now import it from langchain_community.vectorstores.
Can anyone suggest other things I can try?
System Info
The platform in Databricks is Linux 5.15 (Azure OS).
The platform in my Docker container is Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.36