Workgroup:DataScience

From NixOS Wiki
Revision as of 21:30, 2 December 2019 by imported>Ariutta (Reference unmerged pull request)

This workgroup is dedicated to improving the state of the data science stack in Nixpkgs. This includes work on packages and modules for scientific computing, artificial intelligence and data processing, as well as data science IDEs.


JupyterLab

Work has been done (but not merged) to make it easy to deploy arbitrary kernels and JupyterLab extensions with Nix. There are some limitations because JupyterLab extensions rely heavily on npm and webpack to compile their JavaScript modules, so an impure setup is the easiest way to get things working. The following `default.nix` shell installs 20 JupyterLab extensions and 4 kernels (C, Python, Go, and Ansible). The only parts a user needs to edit are `kernels`, `additionalExtensions`, and `buildInputs`. The rest is automatic: entering the shell with `nix-shell` launches a JupyterLab instance for you.

{ pkgs ? import <nixpkgs> {}, pythonPackages ? pkgs.python36Packages }:

let # Kernel packages; each one's share/jupyter directory is added to JUPYTER_PATH
    kernels = [
      pythonPackages.ansible-kernel
      pythonPackages.jupyter-c-kernel
      pkgs.gophernotes
    ];

    # Extensions installed from npm in addition to any declared by buildInputs
    additionalExtensions = [
      "@jupyterlab/toc"
      "@jupyterlab/fasta-extension"
      "@jupyterlab/geojson-extension"
      "@jupyterlab/katex-extension"
      "@jupyterlab/mathjax3-extension"
      "@jupyterlab/plotly-extension"
      "@jupyterlab/vega2-extension"
      "@jupyterlab/vega3-extension"
      "@jupyterlab/xkcd-extension"
      "jupyterlab-drawio"
      "@jupyterlab/hub-extension"
      "jupyterlab_bokeh"
    ];
in
pkgs.mkShell rec {
  buildInputs = [
    ### Base Packages
    pythonPackages.jupyterlab pkgs.nodejs

    ### Extensions
    pythonPackages.ipywidgets
    pythonPackages.ipydatawidgets
    pythonPackages.ipywebrtc
    pythonPackages.pythreejs
    pythonPackages.ipyvolume
    pythonPackages.jupyterlab-git
    pythonPackages.jupyterlab-latex
    pythonPackages.ipyleaflet
    pythonPackages.ipympl
  ] ++ kernels;

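  shellHook = ''
    # Writable copy of the JupyterLab app directory (the store copy is read-only)
    TEMPDIR=$(mktemp -d -p /tmp)
    cp -r ${pythonPackages.jupyterlab}/share/jupyter/lab/* "$TEMPDIR"
    chmod -R 755 "$TEMPDIR"
    echo "$TEMPDIR is the app directory"

    # kernels: expose each kernel package's share/jupyter to Jupyter
    export JUPYTER_PATH="${pkgs.lib.concatMapStringsSep ":" (p: "${p}/share/jupyter/") kernels}"

    # labextensions: those declared via passthru.jupyterlabExtensions in
    # buildInputs plus additionalExtensions, deduplicated, installed without
    # triggering a separate build for each one
    ${pkgs.lib.concatMapStrings
        (s: "jupyter labextension install --no-build --app-dir=$TEMPDIR ${s}; ")
        (pkgs.lib.unique
          ((pkgs.lib.concatMap
              (d: pkgs.lib.attrByPath ["passthru" "jupyterlabExtensions"] [] d)
              buildInputs) ++ additionalExtensions))}
    # a single npm/webpack build for all extensions (impure: needs network access)
    jupyter lab build --app-dir=$TEMPDIR

    # start jupyterlab
    jupyter lab --app-dir=$TEMPDIR
  '';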

}
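The extension list that the shellHook installs can be checked on its own. The following is a minimal sketch, meant to be evaluated with `nix-instantiate --eval --strict`, that replays the same `unique`/`concatMap`/`attrByPath` expression. The attribute sets `widgets` and `plainPkg` are hypothetical stand-ins for packages in `buildInputs`; it assumes packages announce their extensions through a `passthru.jupyterlabExtensions` list, which is what the expression above looks for.

```nix
# Evaluate with: nix-instantiate --eval --strict extensions.nix
let
  lib = import <nixpkgs/lib>;

  # Hypothetical stand-ins for buildInputs entries; only the
  # passthru.jupyterlabExtensions attribute matters to the expression.
  widgets = { passthru.jupyterlabExtensions = [ "@jupyter-widgets/jupyterlab-manager" ]; };
  plainPkg = { };  # no such attribute, so attrByPath falls back to []

  buildInputs = [ widgets plainPkg ];
  additionalExtensions = [ "@jupyterlab/toc" "@jupyter-widgets/jupyterlab-manager" ];
in
# Same expression as in the shellHook: collect, append, deduplicate
lib.unique
  ((lib.concatMap
      (d: lib.attrByPath [ "passthru" "jupyterlabExtensions" ] [ ] d)
      buildInputs) ++ additionalExtensions)
```

Because `lib.unique` keeps the first occurrence of each element, the extension listed both by `widgets` and in `additionalExtensions` should appear only once, so `jupyter labextension install` runs once per extension.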

Some recent examples of work done on libraries:


There has also been notable work on the data science infrastructure:


with highlights such as @aborsu's Jupyter kernels defined in Nix:

./modules/datasci.nix
...
  python3kernel = let

   env = (pkgs.python3.withPackages
     (pythonPackages: with pythonPackages; [
       ipykernel
       pandas
       scikitlearn
       ]));
  
  in {

    displayName = "Python 3 for machine learning";

    argv = [
      "${env.interpreter}"
      "-m"
      "ipykernel_launcher"
      "-f"
      "{connection_file}"
    ];
    language = "python";
    # sitePackages is relative (lib/pythonX.Y/site-packages), so prefix the env path
    logo32 = "${env}/${env.sitePackages}/ipykernel/resources/logo-32x32.png";
    logo64 = "${env}/${env.sitePackages}/ipykernel/resources/logo-64x64.png";
  };
...
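A kernel definition like the one above presumably ends up as a `kernel.json` file that Jupyter discovers through `JUPYTER_PATH`. The following is a minimal standalone sketch of that idea, not the module's actual code. It assumes the standard Jupyter kernel spec keys (`display_name`, `argv`, `language`), uses `pkgs.writeTextDir` to place the file under `share/jupyter/kernels/`, and picks the hypothetical kernel directory name `python3-ml`. The logo files and the extra libraries are left out to keep it short.

```nix
# Build with: nix-build kernel.nix
{ pkgs ? import <nixpkgs> {} }:

let
  # Add pandas, scikit-learn, ... to this list as needed
  env = pkgs.python3.withPackages (ps: with ps; [ ipykernel ]);
in
# Produces $out/share/jupyter/kernels/python3-ml/kernel.json; Jupyter looks in
# the kernels/ subdirectory of every entry on JUPYTER_PATH.
pkgs.writeTextDir "share/jupyter/kernels/python3-ml/kernel.json"
  (builtins.toJSON {
    display_name = "Python 3 for machine learning";
    # env.interpreter carries its store path as string context, so the
    # Python environment becomes a runtime dependency of kernel.json
    argv = [ env.interpreter "-m" "ipykernel_launcher" "-f" "{connection_file}" ];
    language = "python";
  })
```

If `JUPYTER_PATH` points at `result/share/jupyter`, as the JupyterLab shell above does for each entry in `kernels`, the new kernel should show up in `jupyter kernelspec list`.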


It looks like NixOS is well on its way to becoming a solid data science platform; its reproducible, language-agnostic approach is a natural match for the task. But could a coordinated effort help step up the game?

Let's continue the discussion here and in #nixos-data.

Channels

#nixos-data on Freenode

People

Ixxie