Administrator Guide

29 RAPIDS Scaling on Dell EMC PowerEdge Servers
G Notebook NYC-Taxi Set Up
Follow the steps below to start the Notebook Server and run the NYC-Taxi notebook example:
1. From inside the container, start the Notebook Server (this runs JupyterLab on port 8888 of the
host machine):
(rapids) root@container:/rapids/notebooks# bash utils/start-jupyter.sh
Note: To run JupyterLab on a different port, edit the start-jupyter.sh file to add the flag
--port=<another_port> as shown below, then restart the Notebook Server:
jupyter-lab --allow-root --ip=0.0.0.0 --no-browser --NotebookApp.token='' --port=<another_port>
2. To access JupyterLab, open a browser at the following URL:
http://<IP_local_host>:<port>/
The nyc-taxi notebook can be found in the following directory:
rapids/notebooks/contrib/intermediate_notebooks/E2E/taxi/NYCTaxi_E2E.ipynb
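If the browser cannot load the page, it can help to confirm that the chosen port is reachable before troubleshooting further. The following is a minimal sketch, not part of the RAPIDS tooling; the helper names jupyter_url and port_is_open are hypothetical, and the host and port values you pass in are assumptions that must match your own setup.

```python
import socket


def jupyter_url(host: str, port: int = 8888) -> str:
    """Build the JupyterLab URL for a host and port (8888 is the default in this guide)."""
    return f"http://{host}:{port}/"


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("<IP_local_host>", 8888) should return True once the Notebook Server is running, and jupyter_url gives the address to open in the browser. A True result only shows that something is listening on the port, not that it is JupyterLab.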
3. Edit the NYCTaxi_E2E.ipynb notebook and set base_path to the data path in the previously
mounted volume:
base_path = '/home/dell/rapids/data/nyc-taxi/'
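Before running the rest of the notebook, it may be worth checking that base_path actually points at the mounted data. The sketch below is one way to do that in a notebook cell; the helper name list_data_files is hypothetical, and the assumption that the data files are CSVs (the "*.csv" pattern) is not confirmed by this guide, so adjust the pattern to your files.

```python
from pathlib import Path
from typing import List


def list_data_files(base_path: str, pattern: str = "*.csv") -> List[str]:
    """Return the data files under base_path, relative to it, in sorted order.

    Assumption: the data files match `pattern`; change it if yours differ.
    """
    root = Path(base_path)
    if not root.is_dir():
        # Usually means the volume was not mounted or the path has a typo
        raise FileNotFoundError(f"Data path not found: {base_path}")
    # rglob searches subdirectories too, e.g. one folder per year
    return sorted(p.relative_to(root).as_posix() for p in root.rglob(pattern))
```

Calling list_data_files(base_path) in a cell lists what the notebook will be able to read; an empty list or a FileNotFoundError suggests the volume mount or the path needs to be revisited.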
4. To run the notebook on a specific year, comment out the cells intended to increase the data size
and limit the DataFrame to that year. For example:
Limit the dataset to a specific year:
taxi_df = dask.dataframe.multi.concat([df_2015])
Include multiple years:
taxi_df = dask.dataframe.multi.concat([df_2014, df_2015, df_2016])
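As an alternative to editing the concat line by hand each time, the year selection can be driven by a list. This is a minimal sketch under stated assumptions: the helper select_years and the frames_by_year mapping are hypothetical names, and the per-year DataFrames (df_2014, df_2015, df_2016) are assumed to have been loaded already by the notebook.

```python
def select_years(frames_by_year, years):
    """Return the per-year DataFrames for the requested years, in the order given.

    frames_by_year: dict mapping a year (int) to its already-loaded DataFrame.
    years: iterable of years to include.
    """
    years = list(years)
    missing = [y for y in years if y not in frames_by_year]
    if missing:
        raise KeyError(f"No DataFrame loaded for year(s): {missing}")
    return [frames_by_year[y] for y in years]


# Usage inside the notebook (assumes df_2014, df_2015, df_2016 exist there):
# frames_by_year = {2014: df_2014, 2015: df_2015, 2016: df_2016}
# taxi_df = dask.dataframe.multi.concat(select_years(frames_by_year, [2015]))
```

Passing [2014, 2015, 2016] instead of [2015] reproduces the multi-year case shown above, and a year with no loaded DataFrame fails early with a KeyError.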