Workgroup:DataScience: Difference between revisions

From NixOS Wiki
{{outdated|Other than site-wide fixes, this page has not seen recent updates.}}
This workgroup is dedicated to improving the state of the data science stack in Nixpkgs. This includes work on packages and modules for scientific computation, artificial intelligence and data processing, as well as data science IDEs.
=== JupyterLab ===
The [https://github.com/tweag/jupyterWith JupyterWith] repo "provides a Nix-based framework for the definition of declarative and reproducible Jupyter environments. These environments include JupyterLab - configurable with extensions - the classic notebook, and configurable Jupyter kernels."
Alternatively, [https://github.com/NixOS/nixpkgs/pull/49807 there is an unmerged pull request] that makes it easy to deploy arbitrary kernels and Jupyter extensions with Nix. It has some limitations, because JupyterLab extensions rely heavily on npm and webpack to compile their JavaScript modules, so an impure setup was considered the easiest way to get it working. If the pull request were merged, the following <code>default.nix</code> shell would install 20 JupyterLab extensions and 4 kernels (C, Python, Go, and Ansible). The user would only need to edit <code>kernels</code>, <code>additionalExtensions</code>, and <code>buildInputs</code>; the rest is automatic, and entering the shell launches a JupyterLab instance.
<syntaxhighlight lang="nix">
{ pkgs ? import <nixpkgs> {}, pythonPackages ? pkgs.python36Packages }:
let
  # Extra Jupyter kernels; the Python kernel comes with jupyterlab itself.
  kernels = [
    pythonPackages.ansible-kernel
    pythonPackages.jupyter-c-kernel
    pkgs.gophernotes
  ];
  # JupyterLab extensions installed from npm by name.
  additionalExtensions = [
    "@jupyterlab/toc"
    "@jupyterlab/fasta-extension"
    "@jupyterlab/geojson-extension"
    "@jupyterlab/katex-extension"
    "@jupyterlab/mathjax3-extension"
    "@jupyterlab/plotly-extension"
    "@jupyterlab/vega2-extension"
    "@jupyterlab/vega3-extension"
    "@jupyterlab/xkcd-extension"
    "jupyterlab-drawio"
    "@jupyterlab/hub-extension"
    "jupyterlab_bokeh"
  ];
in
pkgs.mkShell rec {
  buildInputs = [
    ### Base Packages
    pythonPackages.jupyterlab pkgs.nodejs
    ### Extensions
    pythonPackages.ipywidgets
    pythonPackages.ipydatawidgets
    pythonPackages.ipywebrtc
    pythonPackages.pythreejs
    pythonPackages.ipyvolume
    pythonPackages.jupyterlab-git
    pythonPackages.jupyterlab-latex
    pythonPackages.ipyleaflet
    pythonPackages.ipympl
  ] ++ kernels;
  shellHook = ''
    # Copy the read-only app directory out of the Nix store so that
    # extensions can be installed into a writable location.
    TEMPDIR=$(mktemp -d -p /tmp)
    cp -r ${pythonPackages.jupyterlab}/share/jupyter/lab/* "$TEMPDIR"
    chmod -R 755 "$TEMPDIR"
    echo "$TEMPDIR is the app directory"
    # kernels
    export JUPYTER_PATH="${pkgs.lib.concatMapStringsSep ":" (p: "${p}/share/jupyter/") kernels}"
    # labextensions: names declared by packages in buildInputs via
    # passthru.jupyterlabExtensions, plus additionalExtensions
    ${pkgs.lib.concatMapStrings
      (s: "jupyter labextension install --no-build --app-dir=$TEMPDIR ${s}; ")
      (pkgs.lib.unique
        ((pkgs.lib.concatMap
            (d: pkgs.lib.attrByPath ["passthru" "jupyterlabExtensions"] [] d)
            buildInputs) ++ additionalExtensions))}
    jupyter lab build --app-dir="$TEMPDIR"
    # start jupyterlab
    jupyter lab --app-dir="$TEMPDIR"
  '';
}
</syntaxhighlight>
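The list of extensions passed to <code>jupyter labextension install</code> in the shell hook can be explored on its own. The following is a minimal sketch, not part of the pull request: <code>fakeWidgets</code> and <code>fakePlain</code> are hypothetical stand-ins for packages, and the extension name attached to <code>fakeWidgets</code> is only an assumed example of what a package might declare under <code>passthru.jupyterlabExtensions</code>.

<syntaxhighlight lang="nix">
let
  lib = (import <nixpkgs> {}).lib;

  # Hypothetical stand-in for a package that declares a lab extension.
  fakeWidgets = {
    passthru.jupyterlabExtensions = [ "@jupyter-widgets/jupyterlab-manager" ];
  };
  # Hypothetical stand-in for a package without the passthru attribute.
  fakePlain = { };

  buildInputs = [ fakeWidgets fakePlain ];
  additionalExtensions = [
    "@jupyterlab/toc"
    "@jupyter-widgets/jupyterlab-manager" # duplicate on purpose
  ];
in
# Collect the declared extensions (defaulting to [] when the attribute is
# missing), append the manually listed ones, and drop duplicates.
lib.unique
  (lib.concatMap
    (d: lib.attrByPath [ "passthru" "jupyterlabExtensions" ] [ ] d)
    buildInputs
  ++ additionalExtensions)
</syntaxhighlight>

Saved as, say, <code>extensions.nix</code>, this can be evaluated with <code>nix-instantiate --eval --strict extensions.nix</code>. The result should be a list of two names, since <code>attrByPath</code> falls back to the empty list for <code>fakePlain</code> and <code>unique</code> removes the repeated entry.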
Some recent examples of work done on libraries:
* [https://github.com/NixOS/nixpkgs/pulls?utf8=%E2%9C%93&q=is%3Apr+nlp+ nlp]
* [https://github.com/NixOS/nixpkgs/pulls?utf8=%E2%9C%93&q=is%3Apr+sklearn scikit-learn]
* [https://github.com/NixOS/nixpkgs/pulls?utf8=%E2%9C%93&q=is%3Apr+tensorflow tensorflow]
There has also been notable work on the data science infrastructure:
* [https://github.com/NixOS/nixpkgs/pulls?utf8=%E2%9C%93&q=is%3Apr+jupyter Jupyter]
* [https://github.com/NixOS/nixpkgs/pull/38566 Jupyterlab package]
* [https://github.com/NixOS/nixpkgs/pulls?utf8=%E2%9C%93&q=is%3Apr+jupyterhub Jupyterhub]
Highlights include [https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/services/development/jupyter/default.nix Jupyter kernels written in Nix]:
{{file|./modules/datasci.nix|nix|<nowiki>
...
  python3kernel = let
    # Python environment that backs the kernel
    env = (pkgs.python3.withPackages
      (pythonPackages: with pythonPackages; [
        ipykernel
        pandas
        scikit-learn
      ]));
  in {
    displayName = "Python 3 for machine learning";
    argv = [
      "${env.interpreter}"
      "-m"
      "ipykernel_launcher"
      "-f"
      "{connection_file}"
    ];
    language = "python";
    logo32 = "${env.sitePackages}/ipykernel/resources/logo-32x32.png";
    logo64 = "${env.sitePackages}/ipykernel/resources/logo-64x64.png";
  };
...
</nowiki>}}
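For context, a kernel definition like this sits under the <code>services.jupyter.kernels</code> option of the linked NixOS module. The following is a minimal sketch of a complete module under that assumption; the secret file path in <code>password</code> is a placeholder that must be replaced, and the package selection is only an example.

<syntaxhighlight lang="nix">
{ pkgs, ... }:
let
  # Python environment that backs the kernel (example package selection).
  env = pkgs.python3.withPackages (ps: with ps; [
    ipykernel
    pandas
    scikit-learn
  ]);
in
{
  services.jupyter = {
    enable = true;
    # Placeholder: a Python expression that evaluates to the hashed
    # notebook password. The file path here is hypothetical.
    password = "open('/path/secret_file', 'r', encoding='utf8').read().strip()";
    kernels.python3kernel = {
      displayName = "Python 3 for machine learning";
      argv = [
        "${env.interpreter}"
        "-m"
        "ipykernel_launcher"
        "-f"
        "{connection_file}"
      ];
      language = "python";
    };
  };
}
</syntaxhighlight>

Importing such a file from <code>configuration.nix</code> and rebuilding should make the kernel selectable in the Jupyter service, provided the option names still match the module version in use.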


It looks like NixOS is well on its way to becoming a solid data science platform; its reproducible, language-agnostic approach is a natural match for the task. But perhaps a coordinated effort would help step up the game?
 
Let's continue the discussion here and at #nixos-data.
 
== Channels ==
 
[https://matrix.to/#/#datascience:nixos.org #datascience:nixos.org on Matrix]


== People ==


[[User:Ixxie|Ixxie]]

Latest revision as of 06:08, 15 September 2024
