# Raw vs Jupyter Kernels
What is the difference between `raw` mode and `jupyter` mode for kernels? What reasons did we have to make this distinction?
The jupyter organization provides a server that allows connecting to different kernels. Something like so:
In the jupyter extension, we call these jupyter kernels.
To use them, we just create a connection settings object that describes the URI for the server and pass it to the jupyterlab services API. The API will then allow us to:
- List existing kernels
- List existing sessions
- Start new kernels
- Execute code on those kernels
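Under the hood, the jupyterlab services API drives the server's standard REST endpoints. A minimal sketch of the requests involved (the base URL and token below are placeholders; only the request objects are built, nothing is sent):

```python
import urllib.request

BASE = "http://localhost:8888"  # placeholder server URI
TOKEN = "<token>"               # placeholder auth token

def request(path: str, method: str = "GET") -> urllib.request.Request:
    """Build an authenticated request against the Jupyter Server REST API."""
    req = urllib.request.Request(BASE + path, method=method)
    req.add_header("Authorization", f"token {TOKEN}")
    return req

list_kernels = request("/api/kernels")           # list existing kernels
list_sessions = request("/api/sessions")         # list existing sessions
start_kernel = request("/api/kernels", "POST")   # start a new kernel
# Executing code then happens over a websocket at /api/kernels/{id}/channels.
```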
What then are raw kernels? This is the same diagram but with raw kernels:
Using raw kernels means talking directly to the kernel process instead of going through the jupyter server.
Given this kernelspec:

```json
{
  "argv": ["python3", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "display_name": "Python 3",
  "language": "python"
}
```
A raw kernel is launched directly by the extension. This means we:

- Compute open ports on the local machine (these become part of the `{connection_file}` argument above)
- Compute the necessary environment (see more details here)
- Start the correct python process with the args in the `argv` setting
- Connect using the zeromq package to the ports opened
- Create a raw websocket. This allows us to use the same jupyterlab services API to control the kernel.
- Use a patched version of the jupyterlab services API to connect to those ports. Patched because we have to do our own serialization of the messages: the kernel expects the messages serialized, but the zeromq library doesn't do that on its own.
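The first two steps above (finding open ports and writing the connection file the kernel reads on startup) can be sketched like this. The field names match the standard Jupyter connection-file format; everything else (the use of `tempfile`, the illustrative port-picking) is a simplification:

```python
import json
import socket
import tempfile
import uuid

def pick_free_port() -> int:
    """Ask the OS for an unused local port (sketch: real code must
    guard against another process grabbing it before the kernel binds)."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

def write_connection_file() -> str:
    """Write a Jupyter connection file with one port per zeromq channel."""
    info = {
        "shell_port": pick_free_port(),
        "iopub_port": pick_free_port(),
        "stdin_port": pick_free_port(),
        "control_port": pick_free_port(),
        "hb_port": pick_free_port(),
        "ip": "127.0.0.1",
        "key": uuid.uuid4().hex,  # HMAC key used to sign every message
        "transport": "tcp",
        "signature_scheme": "hmac-sha256",
    }
    fd, path = tempfile.mkstemp(suffix=".json")
    with open(path, "w") as f:
        json.dump(info, f)
    return path
```

The `key` is what the "patched serialization" mentioned above uses: each message going over zeromq must carry an HMAC signature computed with it.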
A jupyter kernel, in contrast, will:

- Compute connection info for the server URI
- Connect using a normal websocket to the URI and the 'name' of the kernel
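For comparison, the jupyter-kernel connection boils down to building a websocket URL against the server (host, kernel id, and token below are placeholders):

```python
def kernel_ws_url(base_url: str, kernel_id: str, token: str) -> str:
    """Websocket endpoint the jupyterlab services API connects to
    for a server-managed kernel."""
    ws_base = base_url.replace("http://", "ws://").replace("https://", "wss://")
    return f"{ws_base}/api/kernels/{kernel_id}/channels?token={token}"

url = kernel_ws_url("http://localhost:8888", "abc-123", "secret")
# url: "ws://localhost:8888/api/kernels/abc-123/channels?token=secret"
```

The server does all the port allocation, process launching, and message signing itself, which is why this path is so much shorter.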
Why did we bother with this? All of those extra steps seem unnecessary when the jupyterlab services API is so simple. There are a number of reasons:
| Reason | Description |
|---|---|
| Direct connection is faster | On Windows, we found starting a python process can take up to 5 seconds. If we had to start jupyter first, that doubles the time. |
| Fewer things to install | IPython and ipykernel are all that is necessary to get a python kernel up and running. This is a lot less than installing jupyter into a python environment. |
| Non-python kernels | Some kernels don't need python at all. We can support these kernels without having to install anything. There was a lot of customer feedback from other kernel owners about our extension requiring Jupyter to work. |
| Raw kernels fail faster | Jupyter has its own logic to detect when a kernel goes down, retrying 5 times by default. This takes a lot longer than just having the process die as soon as we start it. |
| Raw kernels die with more information | Jupyter (especially remote) does not always return errors when kernels fail to start. In the raw case we get a lot more information from the kernel's stderr during startup. |
| Raw kernels don't require adding custom kernelspecs for Jupyter to find | Originally this was a problem because we thought we had to put them into the same directory as other kernelspecs; now we can get Jupyter to find them elsewhere. But with raw we never need a kernelspec on disk at all. |
| Raw kernels use less CPU and less memory | There's no middleman process in the way. That middleman can also start other processes on the side to handle things it doesn't need to do in our scenario. With raw, there is only the kernel process. |
| Raw kernels don't need write permissions for folders | Jupyter requires the kernelspecs and the location of a notebook to be writable. |
| Raw kernels don't write notebook checkpoint files | See vscode-jupyter 6510. |
| Raw kernels can use domain sockets | Jupyter opens public ports on the machine. Note this was never implemented and is a theoretical advantage. |
| Raw kernels make environment variables easier to set up | Since the jupyter server starts with an environment, all kernels share that environment unless the kernelspec has an override. This means for a jupyter kernel, the kernelspec has to be updated just before running the kernel. In the raw case, we just compute the environment in memory and pass it to the kernel process. |
| Jupyter server flakiness | We consistently found startup issues when running jupyter in our CI tests. This has calmed down, but any time there's more code involved in doing something, there are more points of failure. Using raw eliminates a big chunk of code in the middle. |
We still support jupyter kernels for a number of reasons:
| Reason | Description |
|---|---|
| ZeroMQ is not supported everywhere | ZeroMQ uses native node modules, so we can't ship support for every OS out there. It also has some libstdc++ dependencies that may or may not be installed on a Unix machine. If we want to run everywhere, we need to use Jupyter sometimes. |
| Remote | We support connecting to a remote server URI so that people can bring their own hardware. In fact, this is the only way web support works at the moment. |