INVM Crawling for Packages, etc. does not work with dockerized crawler #376

Open
canturkisci opened this issue Aug 9, 2018 · 4 comments

Comments

@canturkisci
Member

Description

Build a dockerized crawler and use it to inspect the host itself (INVM mode). The crawler runs, but it inspects its own container instead of the host.

How to Reproduce

Build the container

sudo docker build -t crawler .

Run the crawler to collect only packages and count the output lines as a quick test:

$ sudo docker run --privileged --net=host --pid=host -v /cgroup:/cgroup:ro -v /sys/fs/cgroup:/sys/fs/cgroup:ro -v /var/lib/docker:/var/lib/docker:ro -v /var/run/docker.sock:/var/run/docker.sock -v $PWD/output:/crawler/output -it crawler --features package | wc -l

429 (428 pkgs + 1 metadata line)

Do the same from the host:

$ dpkg -l | wc -l

598

Do the same directly from the crawler container:

$ sudo docker run -it --entrypoint /bin/bash crawler
root@bc94958a3b78:/crawler# dpkg -l | wc -l

433 (428 pkgs + 5 header lines)

What to do

I am not sure what we should do about this. If we also want to run the dockerized version against hosts, the crawler needs to scan the host's package contents.

@sahilsuneja1
Contributor

Thanks Canturk!
If a dockerized crawl of the host is desired, we would need to mount the host's /etc and similar directories inside the container,
and then tell the crawler to look at the new mount point when crawling packages.
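
As a rough illustration of pointing a package count at a different root, the sketch below counts entries marked installed in a dpkg status file under an alternate root directory. The function name `count_installed_pkgs` and the `/host` mount point are hypothetical and not part of the crawler; the crawler's own package feature may work differently.

```shell
# Sketch only: count packages marked "installed" in the dpkg status file
# found under an alternate root. The /host mount point used in the usage
# note is a hypothetical example, not a crawler option.
count_installed_pkgs() {
    root="${1:-/}"
    # Strip a trailing slash so "/" and "/host/" both resolve cleanly.
    awk '/^Status: install ok installed/ { n++ } END { print n + 0 }' \
        "${root%/}/var/lib/dpkg/status"
}
```

For example, with the host's /var/lib/dpkg mounted read-only at /host/var/lib/dpkg, `count_installed_pkgs /host` would report the host's count. The number will not match `dpkg -l | wc -l` exactly, since that also counts header lines and packages in states other than installed.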

@canturkisci
Member Author

I am not sure about this. Ideally we would expect parity between the native and dockerized crawlers, but this is not the key feature of the dockerized crawler.

Can you please tell me how complicated this seems to you? Is it possible to just share the host's filesystem, so that the container only contains the process but shares most of the fs, just as you do with the pid and other namespaces?

@canturkisci
Member Author

If it gets complicated, the alternative is to document it so people don't expect INVM behavior to be the same in both modes.

@sahilsuneja1
Contributor

sahilsuneja1 commented Aug 10, 2018

It would depend on the feature. For packages, for example, we can mount the host's /var/lib/dpkg directly inside the crawler container and it should work.
For files we would need to mount /etc and so on.
It's doable, but one would need to go over all features individually to ensure parity.
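
For the package case, this might look like the command from the issue description with one additional read-only bind mount that places the host's /var/lib/dpkg over the container's own. This is an untested sketch that assumes a dpkg-based host; the wrapper function `crawler_host_pkg_cmd` is hypothetical and only prints the command so it can be reviewed before running.

```shell
# Sketch only: print the docker command from the issue description with one
# extra bind mount (host /var/lib/dpkg over the container's copy).
# The function name is hypothetical; nothing is executed here.
crawler_host_pkg_cmd() {
    printf '%s ' sudo docker run --privileged --net=host --pid=host \
        -v /cgroup:/cgroup:ro \
        -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
        -v /var/lib/docker:/var/lib/docker:ro \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v /var/lib/dpkg:/var/lib/dpkg:ro \
        -v "$PWD/output:/crawler/output" \
        -it crawler --features package
    printf '\n'
}
```

If the suggestion holds, piping the resulting run through `wc -l` should give a count close to the host's package count instead of the container's 429 lines. Other features would each need their own mounts checked in the same way.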
