Add a xrootd demo/link to a demo for very large files #2

Open
betatim opened this issue Oct 11, 2018 · 25 comments

Comments

@betatim
Member

betatim commented Oct 11, 2018

We recently made it possible to talk to xrootd directly on mybinder.org. We should add/link to a repository that makes use of this (and preferably just this) to access very large files without having to download them first.

CERN has a lot of open data that we could build on.

A starting point could be https://github.com/lukasheinrich/atlasbinder

@betatim
Member Author

betatim commented Oct 11, 2018

cc @matthewfeickert, apparently I can't assign you

@matthewfeickert

I was originally going to suggest just fixing up https://github.com/cernphsft/rootbinder so that the Dockerfile builds again and works with Binder. But I then remembered that XRootD is only supported on SLC, Solaris, and macOS, so I should probably start from an SLC6 image that has ROOT installed and then install XRootD as part of the ROOT build process (assuming that this image would build ROOT from source).

@matthewfeickert

I still need to make a repo that actually demos XRootD and CERN Open Data, but this PR at least offers a Binder-compatible image with ROOT.

@betatim betatim changed the title Add a xrrotd demo/link to a demo for very large files Add a xrootd demo/link to a demo for very large files Oct 15, 2018
@betatim
Member Author

betatim commented Nov 16, 2018

Some potential datasets: https://root-forum.cern.ch/t/public-dataset-available-via-xrootd/31535

@matthewfeickert

@betatim Are there restrictions on how services inside of a Binder image can access the outside world?

I have two Dockerfiles for Binders with ROOT + XRootD on Ubuntu (Binder) and CentOS 7 (Binder). If I build those images with docker build locally I am able to run

xrdcp root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root .

in them (albeit the transfer is a bit slow). However, if I launch one of those images on Binder and run the above command, it just hangs and never starts the transfer. Do I need to give Binder some additional information?

@matthewfeickert

matthewfeickert commented Mar 1, 2019

Are there restrictions on how services inside of a Binder image can access the outside world?

Oh yes, if I had read.

There are a few restrictions on outgoing traffic from your Binder that are imposed by the team operating https://mybinder.org. Currently only connections to HTTP and Git are allowed.

So I guess I need to figure out how to properly configure XRootD with HTTPS
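
One way to tell an egress restriction apart from an XRootD misconfiguration is a plain TCP connection test against the server named in the URL. The following is a minimal sketch, not part of the original discussion: the helper names `xrootd_endpoint` and `can_connect` are hypothetical, and it assumes port 1094 (the default XRootD port) when the URL does not name one.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit

# Assumption: 1094 is used when the URL names no port (the default XRootD port).
DEFAULT_XROOTD_PORT = 1094


def xrootd_endpoint(url: str) -> Tuple[str, int]:
    """Return the (host, port) that a root:// URL points at."""
    parts = urlsplit(url)
    if parts.scheme != "root" or not parts.hostname:
        raise ValueError(f"not a root:// URL: {url!r}")
    return parts.hostname, parts.port or DEFAULT_XROOTD_PORT


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Try to open a TCP connection; False on refusal, timeout, or DNS failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `can_connect(*xrootd_endpoint(url))` returns `True` in a local container but `False` in a Binder session, that would point at blocked outgoing traffic rather than at XRootD itself.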

@betatim
Member Author

betatim commented Mar 1, 2019

@matthewfeickert

matthewfeickert commented Mar 1, 2019

Hm, very strange. I'll have to poke at this more then, as if I do a repo2docker build

jupyter-repo2docker --ref binder-ubuntu-root-with-xrootd https://github.com/matthewfeickert/rootbinder

then things work as expected

$ which xrdcp
/opt/xrootd/bin/xrdcp
$ xrdcp -v root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root ./data/
secgsi: unknown CA: cannot verify server certificate
[144MB/3.231GB][  4%][==>                                               ][4.645MB/s]

So there is some issue with actually being deployed on BinderHub it seems. :?

@betatim
Member Author

betatim commented Mar 1, 2019

This is weird. It seems like it has already transferred a whole bunch of data and then fails because of a certificate validation error?

Maybe @minrk has an idea.

@matthewfeickert

matthewfeickert commented Mar 1, 2019

It seems like it has already transferred a whole bunch of data and then fails because of a certificate validation error?

Oh no, sorry, it doesn't fail. I just copied and pasted that as it was running (I haven't let it download the full dataset because I didn't want to wait, but it happily downloaded over 1.5 GB before I hit ctrl+c). The

secgsi: unknown CA: cannot verify server certificate

is just a warning; it doesn't stop the transfer in any way that I can tell.

The strange thing to me is that this repo2docker image seems happy to work, but when I try to run the same thing in a terminal when it has been deployed on BinderHub the transfer never starts. It just hangs.

@stwunsch

Hi @matthewfeickert @betatim, I would like to revive this discussion, since this is the key point for making open data analysis at CERN available through Binder.

I've set up a minimal example using the conda build of ROOT, which comes with XRootD, together with a minimal analysis example: https://github.com/stwunsch/test-root-binder

Is there anything that could be done to make this happen? I see that you've already opened the port, but I also see xrdcp hanging in the terminal. The minimal reproducer is as follows:

xrdcp root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root .

@stwunsch

To make the topic perhaps even more attractive: I plan to push the "technical" tutorials from ROOT like this one or this one to the CERN Open Data portal, which could be powered by CERN's Jupyter service SWAN. However, SWAN is restricted to CERN users only. Using Binder, these analyses could be run by everyone out of the box, which is obviously the way to go for open data and open analyses!

@betatim
Member Author

betatim commented May 21, 2019

From running with -d3 (for all the debug output) I see the following in the logs:

[2019-05-21 10:35:39.623404 +0000][Debug  ][PostMaster        ] Creating new channel to: p06636710s16574.cern.ch:1095 1 stream(s)

This makes me think there are more ports that need opening.

You could make a PR that adds 1095 as a port here. Maybe you can find someone at CERN or at one of the experiments who can tell us whether these are all the ports or whether there are more.
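
To see which ports a transfer actually tries to use, the debug output can be scanned for the "Creating new channel to:" messages like the one quoted above. This is a minimal sketch that assumes the debug output has been saved as text and that the message format matches the quoted line; the function name `channel_endpoints` is hypothetical.

```python
import re
from typing import List, Tuple

# Matches the "Creating new channel to: host:port" message quoted above.
CHANNEL_RE = re.compile(r"Creating new channel to: (\S+):(\d+)")


def channel_endpoints(log_text: str) -> List[Tuple[str, int]]:
    """Return the unique (host, port) pairs the client opened channels to."""
    found = {(host, int(port)) for host, port in CHANNEL_RE.findall(log_text)}
    return sorted(found)
```

Any port in the result that the deployment does not allow for outgoing traffic is a candidate for the hang.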

@stwunsch

Thanks for the information! I'm going to look for someone ;)

@chrisburr

Two things:

  • XRootD is now available from conda-forge
  • The minimal firewall configuration required to get it to work is (tested in a docker container):
# Start from an empty rule set
iptables --flush
# Lock everything down
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT DROP
# Allow incoming replies from DNS (53/udp) and XRootD (1094/tcp, 1095/tcp)
iptables -A INPUT --proto udp --sport 53 -j ACCEPT
iptables -A INPUT --proto tcp --sport 1094 -j ACCEPT
iptables -A INPUT --proto tcp --sport 1095 -j ACCEPT
# Allow outgoing requests to DNS (53/udp) and XRootD (1094/tcp, 1095/tcp)
iptables -A OUTPUT --proto udp --dport 53 -j ACCEPT
iptables -A OUTPUT --proto tcp --dport 1094 -j ACCEPT
iptables -A OUTPUT --proto tcp --dport 1095 -j ACCEPT
# View the rules
iptables -L -v -n
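
If more ports turn out to be needed, the same rule set can be generated for an arbitrary port list. The sketch below only builds the command strings and does not touch any firewall; the function name `egress_rules` is hypothetical, and it uses `-p`, the short form of the protocol option, where the rules above spell it `--proto`.

```python
from typing import Iterable, List


def egress_rules(tcp_ports: Iterable[int] = (1094, 1095)) -> List[str]:
    """Build iptables commands mirroring the rule set above for the given TCP ports."""
    rules = [
        "iptables --flush",
        "iptables -P INPUT DROP",
        "iptables -P FORWARD DROP",
        "iptables -P OUTPUT DROP",
        # DNS is needed to resolve the server names.
        "iptables -A INPUT -p udp --sport 53 -j ACCEPT",
        "iptables -A OUTPUT -p udp --dport 53 -j ACCEPT",
    ]
    for port in tcp_ports:
        # One rule for replies coming in, one for requests going out.
        rules.append(f"iptables -A INPUT -p tcp --sport {port} -j ACCEPT")
        rules.append(f"iptables -A OUTPUT -p tcp --dport {port} -j ACCEPT")
    return rules
```

Calling `print("\n".join(egress_rules()))` prints the commands for the two ports listed above.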

@betatim
Member Author

betatim commented May 23, 2019

jupyterhub/mybinder.org-deploy#973 opened the port. It should be live in ~10 minutes. Thanks for finding out, Chris!

@stwunsch

Many thanks @chrisburr !

@stwunsch

It's amazing!

[image]

@matthewfeickert

With this my Binder built FROM rootproject/root-ubuntu16:6.12 works now too (c.f. Notebooks/XRootD_Example.ipynb): Binder 👍

The goal here was to add a demo of XRootD. Since conda-forge has ROOT binaries that seem to be the latest release, @betatim, does it make more sense in your mind to adapt and expand the notebook that I have and make a binder-examples controlled repo, or to just link to the tutorials that @stwunsch will make?

The Binder I currently have is meant to get the (seemingly abandoned) JupyROOT examples working again for people who want a C++ kernel rather than a Python3 kernel with PyROOT, but the notebooks can be adapted.

@betatim
Member Author

betatim commented May 23, 2019

Given that this repo won't be able to demo all the possible "access really large data" options I'd make a new repo and link to it from here.

re: C++ kernels, I think https://github.com/QuantStack/xeus-cling is the future there so I would make a pure Python example, instead of using JupyROOT which does weird stuff.

The example can even use uproot to read the file. It is, after all, about demo'ing how to access large datasets, where the whole ROOT thing (and its weird ways of doing notebooks) is more of a distraction IMHO.
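
A pure-Python read along these lines might look like the sketch below. It makes several assumptions that are not confirmed in this thread: it uses the `uproot.open` interface of uproot 4 or later (the uproot available in 2019 had a different API), it needs an XRootD backend for Python installed alongside uproot, and the tree and branch names passed to it have to match the file. The helper names `eos_url` and `read_branch` are hypothetical.

```python
def eos_url(path: str, host: str = "eospublic.cern.ch") -> str:
    """Build a root:// URL; an absolute path yields the usual double slash."""
    return f"root://{host}/{path}"


def read_branch(url: str, tree: str, branch: str, entry_stop: int = 1000):
    """Read the first `entry_stop` entries of one branch as a NumPy array."""
    # Imported lazily so the URL helper works without uproot installed.
    import uproot

    # Only the requested entries of the requested branch are fetched,
    # so the multi-GB file is never downloaded in full.
    with uproot.open(url) as root_file:
        return root_file[tree][branch].array(entry_stop=entry_stop, library="np")
```

For the file used earlier in this thread, a call could be `read_branch(eos_url("/eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root"), "Events", "nMuon")`, where the tree name `Events` and branch name `nMuon` are assumptions about the file's contents.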

@matthewfeickert

matthewfeickert commented May 24, 2019

Given that this repo won't be able to demo all the possible "access really large data" options I'd make a new repo and link to it from here.

@betatim Sounds good! Since I don't have the ability to make repos under binder-examples could you make an empty one that I can fork and then make a PR against with content? Or do you mean a new repo not controlled through binder-examples?

re: C++ kernels, I think https://github.com/QuantStack/xeus-cling is the future there so I would make a pure Python example, instead of using JupyROOT which does weird stuff.

🚀 We're in agreement. I wasn't proposing using JupyROOT (just explaining why it was the way I had it so far).

The example can even use uproot to read the file. It is, after all, about demo'ing how to access large datasets, where the whole ROOT thing (and its weird ways of doing notebooks) is more of a distraction IMHO.

I'm an uproot fanboy, so that's an excellent suggestion. :)

@betatim
Member Author

betatim commented May 24, 2019

Created https://github.com/binder-examples/getting-data-xroot. In your first PR, can you also add a LICENSE like https://github.com/binder-examples/minimal-dockerfile/blob/master/LICENSE?

@stwunsch

Regarding the C++ kernel: I don't know whether I got you wrong, but adding ROOT from conda-forge in the environment.yaml seems sufficient for the C++ Jupyter kernel to be picked up. You can try https://github.com/stwunsch/test-root-binder and simply select the C++ kernel from there.

@stwunsch

Done with Binder using the environment from here:
[image]

@stwunsch

Here are some demo notebooks we use for teaching ROOT with C++ and Python, which use XRootD and now run nicely with Binder: https://github.com/stwunsch/root_dataframe_tutorial

These notebooks also showcase what you can do with the data mentioned above:

[image]
