Using your Jupyter Notebook app

Once you have created your Jupyter Notebook app, you are ready to start using it. In this article, you will find some guidelines on how to install packages and access workspace files from your Python and R kernels.

Python kernel

Package management

The Python kernel comes with some of the most common data science packages pre-installed: NumPy, pandas, sklearn, dask, scipy, seaborn, and many more. To see the full list, run pip list in your Python notebook.

To install new packages, you should use conda:

!conda install package_name -y

This will install the package in the Python folder in the workspace files directory. To import and use the package in the same Jupyter session, you will need to restart the kernel (Kernel > Restart) or restart the Mini-app.

Please note that packages installed using conda without the ! prefix or using pip will not be retained once you close the session.
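After restarting the kernel, it can be useful to confirm that a package is actually importable before relying on it. The following is a minimal sketch using only the Python standard library; the helper name is_installed is hypothetical, and package_name is the same placeholder used above. Keep in mind that the import name can differ from the install name (for example, scikit-learn is imported as sklearn).

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python; "package_name" is a placeholder for your package.
for name in ("json", "package_name"):
    status = "available" if is_installed(name) else "not found"
    print(f"{name}: {status}")
```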

Accessing files

The Jupyter Notebook working directory is the location of your notebook file. You can print this directory in your notebook using the following command:

import os
print(os.getcwd())

In this example, the notebook is saved in the scripts folder in the workspace files.

For example, if you wanted to open a CSV file located in your workspace files as a pandas DataFrame, you would use:

import pandas as pd
data = pd.read_csv('/home/workspace/files/data.csv')
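If you read several files, it may be convenient to build the paths in one place and check that a file exists before opening it. The sketch below assumes the workspace files root is /home/workspace/files, as in the example above; the helper workspace_file and the constant WORKSPACE_FILES are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Root of the workspace files, taken from the read_csv example above.
WORKSPACE_FILES = Path("/home/workspace/files")


def workspace_file(relative_path: str, root: Path = WORKSPACE_FILES) -> Path:
    """Build the full path to a file stored under the workspace files root."""
    return root / relative_path


csv_path = workspace_file("data.csv")
if csv_path.exists():
    print(f"Found {csv_path}")
else:
    print(f"No file at {csv_path}")
```

The resulting path can be passed directly to pd.read_csv, which accepts path objects as well as strings.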

Please note that access to Blobs is currently not supported.

Accessing datasets

The database tables stored in the Datasets tab can be accessed using the psycopg2 package. This module provides functionality for connecting to the PostgreSQL server and running SQL queries from Python.

To connect to the workspace database:

import psycopg2 as p

# Connect to the workspace database
conn = p.connect("")
# Open a cursor to perform DB operations
cur = conn.cursor()

Run your SQL query. The example below selects all columns from the breast_cancer dataset and retrieves the first two rows as Python objects:

# Query the database and obtain data as Python objects
cur.execute("SELECT * FROM breast_cancer;")
cur.fetchmany(2)

Close the communication to the database:

# Close connection
cur.close()
conn.close()
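The rows returned by fetchmany are plain tuples, so the column names are not attached to the values. The sketch below shows one way to pair them, using the description attribute that DB-API 2.0 cursors expose. To keep it runnable anywhere, it uses the standard library sqlite3 module as a stand-in for the workspace database; the demo table and the helper rows_as_dicts are hypothetical. A psycopg2 cursor is expected to work with the same helper because it follows the same interface, but that is an assumption you should verify in your workspace.

```python
import sqlite3
from contextlib import closing


def rows_as_dicts(cursor, limit):
    """Fetch up to `limit` rows and map each value to its column name."""
    # The first element of each description entry is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchmany(limit)]


# Stand-in database: an in-memory SQLite table instead of the workspace DB.
with closing(sqlite3.connect(":memory:")) as conn:
    with closing(conn.cursor()) as cur:
        cur.execute("CREATE TABLE demo (id INTEGER, label TEXT)")
        cur.executemany(
            "INSERT INTO demo VALUES (?, ?)",
            [(1, "a"), (2, "b"), (3, "c")],
        )
        cur.execute("SELECT * FROM demo ORDER BY id")
        first_two = rows_as_dicts(cur, 2)

print(first_two)
```

Wrapping the connection and cursor in closing() ensures both are closed even if the query raises an error, which replaces the explicit close() calls shown above.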

For more information, please see the psycopg2 package documentation.

If you do not want to work with the workspace database directly, you could also try converting your dataset to a CSV file first and then reading that file in.

R kernel

Package management

The R kernel comes with a few pre-installed packages such as dplyr, knitr, RPostgreSQL, ggplot2, and ggvis. To see the full list, run installed.packages() in your R notebook.

To install a new package, simply run:

install.packages('package_name')

This will install the package in the /R/version_number/ folder in the workspace files directory. Once a package is installed, it can be loaded at any time, even after the app has been restarted.

Accessing files

The Jupyter Notebook working directory is the location of your notebook file. You can print this directory in your notebook using the getwd() command.

In this example, the notebook is saved in the scripts folder in the workspace files.

For example, if you wanted to import data from a CSV file located in your workspace files into a data frame, you would use:

data <- read.csv('/home/workspace/files/data.csv')

Please note that access to Blobs is currently not supported.

Updated on July 28, 2021