Build/Install issues on x86_64 Linux #1400

Open
dushankw opened this issue Jun 17, 2024 · 5 comments

Comments

@dushankw

dushankw commented Jun 17, 2024

Describe the bug
Presidio fails to build/install against Python 3.11 (officially supported per the docs) and 3.12 on x86_64 Linux.

Having tried both spaCy and Stanza as per https://microsoft.github.io/presidio/installation/, I always hit the following error, which looks like a version incompatibility between numpy and something else (probably a compiled library lower down in the import graph).

ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

I have replicated the same issue in a clean container using the official upstream Python image.

Thank you for looking into it 🙏

To Reproduce

  1. Create the following Dockerfile
$ cat Dockerfile
FROM docker.io/library/python:3.11.9
RUN pip install presidio_analyzer && pip install presidio_anonymizer && python -m spacy download en_core_web_lg
  2. Build it
$ podman build .
STEP 1/2: FROM docker.io/library/python:3.11.9
STEP 2/2: RUN pip install presidio_analyzer && pip install presidio_anonymizer && python -m spacy download en_core_web_lg
Collecting presidio_analyzer
  Downloading presidio_analyzer-2.2.354-py3-none-any.whl.metadata (2.6 kB)
Collecting spacy<4.0.0,>=3.4.4 (from presidio_analyzer)
  Downloading spacy-3.7.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (27 kB)
<CUT_FOR_BREVITY>
  3. Observe the error towards the end of the build (NOTE: the warning about running as root and the venv is noise as this is in a container)
Installing collected packages: pycryptodome, presidio_anonymizer
Successfully installed presidio_anonymizer-2.2.354 pycryptodome-3.20.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 148, in _get_module_details
  File "<frozen runpy>", line 112, in _get_module_details
  File "/usr/local/lib/python3.11/site-packages/spacy/__init__.py", line 6, in <module>
    from .errors import setup_default_warnings
  File "/usr/local/lib/python3.11/site-packages/spacy/errors.py", line 3, in <module>
    from .compat import Literal
  File "/usr/local/lib/python3.11/site-packages/spacy/compat.py", line 39, in <module>
    from thinc.api import Optimizer  # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/thinc/api.py", line 1, in <module>
    from .backends import (
  File "/usr/local/lib/python3.11/site-packages/thinc/backends/__init__.py", line 17, in <module>
    from .cupy_ops import CupyOps
  File "/usr/local/lib/python3.11/site-packages/thinc/backends/cupy_ops.py", line 16, in <module>
    from .numpy_ops import NumpyOps
  File "thinc/backends/numpy_ops.pyx", line 1, in init thinc.backends.numpy_ops
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject
Error: building at STEP "RUN pip install presidio_analyzer && pip install presidio_anonymizer && python -m spacy download en_core_web_lg": while running runtime: exit status 1

Note: Using Stanza instead of spaCy, we are able to build the container successfully (install the libraries), but we hit the same error as soon as we try to use the library, e.g.:

from presidio_analyzer import AnalyzerEngine

This triggers the same error.
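
For anyone triaging the same failure, here is a minimal diagnostic sketch that tells this binary-incompatibility error apart from a package that is simply not installed. The helper names `is_binary_mismatch` and `probe_import` are hypothetical (not part of Presidio, spaCy, or thinc), and the check assumes the error text always contains the phrase seen in the traceback above.

```python
import importlib

# Phrase taken from the traceback in this issue; assumed to be stable.
BINARY_MISMATCH_MARKER = "numpy.dtype size changed"


def is_binary_mismatch(exc: BaseException) -> bool:
    """Return True if exc looks like the numpy ABI mismatch reported here."""
    return isinstance(exc, ValueError) and BINARY_MISMATCH_MARKER in str(exc)


def probe_import(module_name: str) -> str:
    """Try to import a module and classify the outcome (hypothetical helper)."""
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError:
        return "missing"
    except ValueError as exc:
        if is_binary_mismatch(exc):
            return "binary-mismatch"
        raise  # some other ValueError: do not hide it
    return "ok"
```

Calling `probe_import` on `thinc`, `spacy`, and `presidio_analyzer` in turn should show which layer of the import chain first reports `binary-mismatch`.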

Expected behavior
Able to install the library, import it and run the demo code (https://microsoft.github.io/presidio/getting_started/)

Screenshots
N/A

Additional context
Looking at the official Docker image, it seems 3.9 is being used

$ podman run --rm -it mcr.microsoft.com/presidio-analyzer bash
root@d9fed78f0a52:/usr/bin/presidio-analyzer# python -V
Python 3.9.19

Building against this exact version of Python yields the same error.

@codingbandit

We began seeing this issue in the past day or two as well. Following.

@omri374
Contributor

omri374 commented Jun 18, 2024

Thanks for posting. Looks like an issue between spaCy and numpy. Consider trying to pip install numpy as well.

@JosephCatrambone

A new version of numpy got released two days ago. I had some luck pinning numpy==1.26.4.
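
A quick way to check whether an environment picked up the new release is sketched below. It assumes the problematic release is the numpy 2.x series, so anything with a major version of 2 or higher is flagged; `needs_numpy_pin` and `installed_numpy_version` are hypothetical helper names, and the pin printed is the one mentioned above.

```python
from importlib import metadata
from typing import Optional


def needs_numpy_pin(numpy_version: str) -> bool:
    """Return True if the version string has a major version of 2 or higher.

    Assumption: the binary incompatibility appears with numpy 2.x.
    """
    major = int(numpy_version.split(".")[0])
    return major >= 2


def installed_numpy_version() -> Optional[str]:
    """Return the installed numpy version, or None if numpy is absent."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_numpy_version()
    if version is None:
        print("numpy is not installed")
    elif needs_numpy_pin(version):
        print(f"numpy {version} found; consider: pip install numpy==1.26.4")
    else:
        print(f"numpy {version} found; no pin needed under this assumption")
```

Reading the version from package metadata avoids importing numpy itself, so the check can run even in an environment where the compiled extensions fail to load.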

@omri374
Contributor

omri374 commented Jun 19, 2024

Root cause: explosion/thinc#939


5 participants