Add a xrootd demo/link to a demo for very large files #2
Comments
cc @matthewfeickert, apparently I can't assign you |
I was originally going to suggest just fixing up https://github.com/cernphsft/rootbinder so that the Dockerfile built again and worked with Binder. I then remembered that XRootD is only supported on SLC, Solaris, and macOS, so I should probably start by getting an SLC6 image that has ROOT installed on it and then install XRootD as part of the ROOT build process (assuming that this image would build ROOT from source). |
Still need to make a repo that actually demos XRootD and CERN Open Data, but this PR at least offers a Binder-compatible image with ROOT. |
Some potential datasets: https://root-forum.cern.ch/t/public-dataset-available-via-xrootd/31535 |
@betatim Are there restrictions on how services inside of a Binder image can access the outside world? I have two
in them (albeit the transfer is a bit slow). However, if I launch one of those images and run the above they just hang and never start the transfer. Do I need to tell Binder some additional information? |
Oh yes, if I had read.
So I guess I need to figure out how to properly configure XRootD with HTTPS |
https://github.com/jupyterhub/mybinder.org-deploy/blob/72ed9d319b394981630ed1df392f499f3dc7cf8b/mybinder/values.yaml#L38 so xrootd should be possible. |
Hm, very strange. I'll have to poke at this more then, as if I do a
then things work as expected
So there is some issue with actually being deployed on BinderHub it seems. :? |
This is weird. It seems like it has already transferred a whole bunch of data and then fails because of a certificate validation error? Maybe @minrk has an idea. |
Oh no, sorry, it doesn't fail. I just copied and pasted that as it was running (I haven't let it download the full dataset because I didn't want to wait, but it happily downloaded over 1.5 GB before I hit
is just a warning, it doesn't stop the transfer in any way that I can tell. The strange thing to me is that this |
Hi @matthewfeickert @betatim, I would like to revive this discussion, since this is the key to making open data analysis at CERN available through Binder. I've set up a minimal example using the conda build of ROOT, which comes with XRootD, together with a minimal analysis example: https://github.com/stwunsch/test-root-binder Is there anything that could be done to make this happen? I see that you've already opened the port, but I also see this command hang: xrdcp root://eospublic.cern.ch//eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root . |
To make the topic probably even more attractive: I plan to push the "technical" tutorials from ROOT like this one or this one to the CERN Open Data portal, which could be powered by CERN's Jupyter service SWAN. However, this is restricted to CERN users only. Using Binder, these analyses could be run by everyone out of the box, which is obviously the way to go for open data and open analyses! |
From running with
This makes me think there are more ports that need opening. You could make a PR that adds 1095 as a port here. Maybe you can find someone at CERN or at one of the experiments who can tell us whether these are all the ports or if there are more. |
Thanks for the information! I'm going to look for someone ;) |
Two things:
iptables --flush
# Lock everything down
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT DROP
# Allow inbound replies from DNS + XRootD
iptables -A INPUT --proto udp --sport 53 -j ACCEPT
iptables -A INPUT --proto tcp --sport 1094 -j ACCEPT
iptables -A INPUT --proto tcp --sport 1095 -j ACCEPT
# Allow outbound requests to DNS + XRootD
iptables -A OUTPUT --proto udp --dport 53 -j ACCEPT
iptables -A OUTPUT --proto tcp --dport 1094 -j ACCEPT
iptables -A OUTPUT --proto tcp --dport 1095 -j ACCEPT
# View the rules
iptables -L -v -n |
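A quick way to check whether rules like the ones above let XRootD traffic out is to attempt a plain TCP connection to ports 1094 and 1095 from inside the container. The following is a minimal Python sketch and not part of the original thread; the helper names `can_connect` and `check_xrootd_ports` are hypothetical. A successful TCP connection only shows that the port is reachable, not that an XRootD transfer will succeed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_xrootd_ports(host: str, ports=(1094, 1095), timeout: float = 5.0) -> dict:
    """Map each port to whether a TCP connection could be opened.

    The default ports are the two discussed in this thread.
    """
    return {port: can_connect(host, port, timeout) for port in ports}
```

Calling `check_xrootd_ports("eospublic.cern.ch")` from a Binder session would return a dictionary with one boolean per port; a `False` entry suggests the port is blocked or the host is unreachable.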
jupyterhub/mybinder.org-deploy#973 opened the port. Should be live in ~10 minutes. Thanks for finding out, Chris! |
Many thanks @chrisburr ! |
With this my Binder built The goal here was to add a demo of XRootD, so as The Binder I currently have is to get the (seemingly abandoned) JupyROOT examples to work again if people wanted to have a C++ kernel rather than a Python3 kernel with PyROOT, but the notebooks can get adapted. |
Given that this repo won't be able to demo all the possible "access really large data" options I'd make a new repo and link to it from here. re: C++ kernels, I think https://github.com/QuantStack/xeus-cling is the future there so I would make a pure Python example, instead of using JupyROOT which does weird stuff. The example can even use uproot to read the file. It is, after all, about demo'ing how to access large datasets, where the whole ROOT thing (and its weird ways of doing notebooks) is more of a distraction IMHO. |
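As a rough illustration of the pure Python approach suggested above, the sketch below opens a remote file over XRootD with uproot and lists its contents without downloading the file first. It is not from the original thread: the helper names `xrootd_url` and `list_remote_keys` are hypothetical, and it assumes that uproot and an XRootD backend for Python are installed in the Binder environment. The host and path are the ones quoted earlier in this thread.

```python
def xrootd_url(host: str, path: str) -> str:
    """Build a root:// URL; an absolute path follows the host after '//'."""
    if path.startswith("/"):
        return f"root://{host}/{path}"
    return f"root://{host}//{path}"


def list_remote_keys(url: str) -> list:
    """Open a remote ROOT file and return the names of its top-level objects.

    Assumption: uproot is installed together with XRootD support.
    It is imported lazily so the URL helper works without it.
    """
    import uproot  # third-party package, not part of the standard library

    # Opening the file reads only the metadata it needs, not the whole file.
    with uproot.open(url) as remote_file:
        return remote_file.keys()


# Example URL for the dataset mentioned earlier in the thread.
URL = xrootd_url(
    "eospublic.cern.ch",
    "/eos/root-eos/cms_opendata_2012_nanoaod/Run2012B_DoubleMuParked.root",
)
```

Calling `list_remote_keys(URL)` needs network access on the XRootD ports, so it is left out of the block itself.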
@betatim Sounds good! Since I don't have the ability to make repos under
🚀 We're in agreement. I wasn't proposing using JupyROOT (just explaining why it was the way I had it so far).
I'm an |
created https://github.com/binder-examples/getting-data-xroot. In your first PR can you also add a |
Regarding the C++ kernel: I don't know whether I got you wrong but adding ROOT from conda-forge in the |
Done with binder using the environment from here: |
Here are some demo notebooks we use for teaching ROOT with C++ and Python, which use XRootD and now run nicely with Binder: https://github.com/stwunsch/root_dataframe_tutorial These notebooks also showcase what you can do with the data mentioned above: |
We recently made it possible to talk to xrootd directly on mybinder.org. We should add/link to a repository that makes use of this (and preferably just this) to access very large files without having to download them first.
CERN has a lot of open data that we could build on.
A starting point could be https://github.com/lukasheinrich/atlasbinder