OpenScienceWorkshop2019

Welcome to the Open Science Workshop GitHub repository! You can find the slides for the workshop at https://zenodo.org/record/3550236.

This repository contains some simple examples of how code, specifically Jupyter Notebooks, integrates with GitHub and can be run online.

This repository has two demonstrations for hosting code online (see below).

Structure of a Repository

In general, there are several key files you will tend to see over and over in GitHub Repositories:

README.MD file
requirements.txt
LICENSE
.gitignore

The README.MD file is a description of your repository (more on that below).

The requirements.txt file is typically a list of the Python packages that the repository needs. These lists are typically compatible with the pip package manager (documentation here), but the important point is to make it clear, somewhere in the repository, exactly what is needed to run your code. The specifics of how you do that are less important.
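As a rough illustration of what a requirements.txt holds, the sketch below parses a few pip-style lines into package names and version specifiers. The package names and versions are placeholders rather than this repository's actual dependencies, and the parser is an assumption that only handles simple lines; pip itself accepts a much richer syntax.

```python
import re

# Hypothetical contents of a requirements.txt file (placeholder packages).
EXAMPLE_REQUIREMENTS = """\
# core numerical stack
numpy>=1.16
pandas==0.25.3
matplotlib
"""


def parse_requirements(text):
    """Return (package, specifier) pairs from simple pip-style lines."""
    parsed = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = re.match(r"^([A-Za-z0-9_.\-]+)\s*(.*)$", line)
        if match:
            parsed.append((match.group(1), match.group(2)))
    return parsed


for name, spec in parse_requirements(EXAMPLE_REQUIREMENTS):
    print(name, spec or "(any version)")
```

Pinning exact versions (as in the `pandas` line) makes a result easier to reproduce, while a lower bound (as in the `numpy` line) is more forgiving for people installing the code later.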

The LICENSE file is common, but generally not necessary for smaller projects. I've chosen the MIT license, and GitHub makes it easy to choose among several. There is more information here, but generally this is something you don't need to worry too much about unless you are building software.

The .gitignore file lists all of the files that you don't want to include in the repository. A lot of programs create temporary and/or hidden files (often starting with a '.') that can clutter a repository. The .gitignore specifies files to ignore for the purposes of version control.
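To give a feel for how ignore patterns work, here is a simplified sketch that matches file names against a few wildcard patterns using Python's `fnmatch` module. The patterns are hypothetical examples of what a Python project might ignore, and real .gitignore rules (directory patterns, negation with `!`, and so on) are richer than this approximation.

```python
from fnmatch import fnmatchcase

# Hypothetical ignore patterns, similar to what a Python project might list.
IGNORE_PATTERNS = ["*.pyc", ".DS_Store", ".ipynb_checkpoints", "*.tmp"]


def is_ignored(filename, patterns=IGNORE_PATTERNS):
    """Return True if the bare file name matches any pattern (simplified)."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


for name in ["analysis.py", "analysis.pyc", ".DS_Store", "notes.tmp"]:
    print(name, "-> ignored" if is_ignored(name) else "-> tracked")
```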

What can you host in GitHub?

(Almost) anything! GitHub recommends keeping repositories under 1 GB and limits each file to a maximum of 100 MB. It's a great place for non-sensitive data, analyses, simulations, presentations, websites, etc.
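Before pushing a project, it can help to check whether any file would hit the per-file limit. The sketch below walks a directory and reports files larger than a threshold. The function name is hypothetical, and the threshold assumes 100 MB means 100 × 1024 × 1024 bytes.

```python
import os

# Assumed interpretation of the 100 MB per-file limit, in bytes.
MAX_FILE_BYTES = 100 * 1024 * 1024


def find_large_files(root=".", limit=MAX_FILE_BYTES):
    """Return (path, size_in_bytes) for every file under root larger than limit."""
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own internal storage directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # e.g. a broken symbolic link
            if size > limit:
                large.append((path, size))
    return large


for path, size in find_large_files("."):
    print(f"{path}: {size / 1e6:.1f} MB")
```

Any file this reports is a candidate for the .gitignore or for hosting somewhere designed for large data.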

Use your GitHub for whatever you want. It is an easy way to share files, with special tools for code, and specifically, Python code.

How is this different from git?

GitHub is an online host for git repositories, and a git repository is really just a project that uses git for version control. The basic features of both git and GitHub are very useful and simple to learn. Advanced users will find both to be very powerful and flexible programs, but don't let these complex features scare you off! The simplest use case for GitHub is a place to store data and code. The simplest use case for git is a way of tracking the changes you've made to a project.

There are a number of resources online about both. Here are a couple of helpful links to get you started:

Documenting your GitHub

What you are reading now is the README.MD file. This file is written in Markdown, a lightweight markup language that converts to HTML and allows you to quickly and easily write a web page. With Markdown, you can quickly add links, basic formatting, even code. GitHub will automatically render any file named README.MD in the body of the repository. You can see the raw markdown for this README.MD file: raw markdown.

Here is what code looks like rendered:

def my_fun(a, b):
    return a + b

and here's a fun photo of me and a llama:

The point is you can do a lot with Markdown very easily. Your README.MD file can be as expressive as you like, and having a well-documented and detailed README.MD file can really help make your work intelligible to someone else.

Here is a good guide on writing Markdown in README.MD files: Mastering Markdown.

Hosting code online

You have a couple of options that allow you to host the code in your repository online so that others can use the code without going through a time-consuming installation process. Making sure that your code will run on someone else's computer can be a huge pain point, but luckily, there are two good and easy solutions for hosting Python code within Jupyter notebooks: Binder and Google Colab.

Binder

Binder is a resource for running Jupyter notebooks online. It integrates directly with GitHub and will install any packages listed in your requirements.txt. It creates a Docker image of your repository and allows you to run your code online.
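A Binder launch link is just a URL that names the GitHub account, repository, and git reference to build. The sketch below assembles one from its parts. The account, repository, and notebook names are hypothetical, the helper function is not part of any Binder library, and the URL pattern is an assumption based on the public link format used by mybinder.org (the `filepath` query parameter is the older form; newer links may use a different parameter).

```python
from urllib.parse import quote


def binder_url(user, repo, ref="HEAD", notebook_path=None):
    """Build a mybinder.org launch link for a public GitHub repository."""
    url = f"https://mybinder.org/v2/gh/{user}/{repo}/{quote(ref, safe='')}"
    if notebook_path:
        # Percent-encode the path so slashes survive inside the query string.
        url += "?filepath=" + quote(notebook_path, safe="")
    return url


# Hypothetical account, repository, and notebook names.
print(binder_url("some-user", "some-repo", notebook_path="notebooks/demo.ipynb"))
```

A link like this is commonly placed in the README so that readers can open a notebook with one click.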

We've created an example notebook to use with Binder. This example looks at the problem of change-points and how a simple reinforcement learning model handles them:

Google Colab

Google Colab is similar, but more powerful. Colab has many packages pre-installed and comes with access to GPUs, making it ideal for sharing neural networks.
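Colab can open a notebook directly from a public GitHub repository through a URL that mirrors the notebook's location on GitHub. The sketch below builds such a link. The account, repository, branch, and notebook names are hypothetical, and the helper is only an illustration of the URL pattern, not an official API.

```python
def colab_url(user, repo, notebook_path, branch="master"):
    """Build a link that opens a GitHub-hosted notebook in Google Colab."""
    return (
        "https://colab.research.google.com/github/"
        f"{user}/{repo}/blob/{branch}/{notebook_path}"
    )


# Hypothetical account, repository, and notebook names.
print(colab_url("some-user", "some-repo", "notebooks/demo.ipynb"))
```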

Below, we have a tutorial on writing user-friendly code, with some helpful functions in Colab.

Deep Learning in Colab

One of the nice things about Colab is that you can use GPUs for free to run deep learning libraries like TensorFlow and PyTorch. Google has a number of excellent tutorials for this on the main page (here), but here is a demonstration of a Variational Autoencoder processing handwritten digits: link
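Before training a model, it is worth confirming that the notebook actually has a GPU attached (in Colab, this is set under Runtime > Change runtime type). The sketch below is a minimal check, assuming TensorFlow 2.1 or later; the `list_gpus` helper is hypothetical and falls back to an empty list when TensorFlow is not installed.

```python
def list_gpus():
    """Return the GPUs TensorFlow can see, or an empty list if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return tf.config.list_physical_devices("GPU")


gpus = list_gpus()
print(f"Found {len(gpus)} GPU(s)")
```

If the count is zero in Colab, switching the runtime type to a GPU and restarting the notebook is usually the first thing to try.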
