Using your Jupyter Notebook app

Once you have created your Jupyter Notebook app, you are ready to start using it. In this article, you will find some guidelines on how to install packages and access workspace files from your Python and R kernels.

Python kernel

Package management

The Python kernel comes with some of the most common data science packages pre-installed: NumPy, pandas, scikit-learn, Dask, SciPy, seaborn, and many more. To see the full list, run !pip list in your Python notebook.
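If you only need to know whether a particular package is available, and in which version, a programmatic check can be quicker than scanning the full list. The following is a minimal sketch using the standard library's importlib.metadata; the helper name installed_version is hypothetical, and the package names in the loop are just examples taken from the list above.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Report the versions of a few packages mentioned in this article.
for name in ["numpy", "pandas", "scikit-learn"]:
    print(name, installed_version(name))
```

Note that this lookup expects the distribution name used by pip (for example scikit-learn), which can differ from the name used in an import statement (sklearn).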

To install new packages, you should use pip:

!pip install <package_name>

We are working on a way to persist these packages between app runs, but for now you will need to reinstall your packages every time you start the app.
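Because packages have to be reinstalled on every start, it can help to put a small guard at the top of your notebook that installs a package only when it is missing. This is a sketch under assumptions, not an official mechanism: the helper name ensure_installed and its import_name parameter are hypothetical, and it assumes pip is available to the kernel's interpreter.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(package_name, import_name=None):
    """Install package_name with pip unless its module can already be imported.

    Returns True if pip was run, False if the module was already available.
    """
    module = import_name or package_name
    if importlib.util.find_spec(module) is not None:
        return False
    # Use the kernel's own interpreter so the package lands in the right environment.
    subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
    return True
```

For a package whose import name differs from its pip name, pass both, for example ensure_installed("scikit-learn", import_name="sklearn").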

Accessing files

The Jupyter Notebook working directory is the location of your notebook file. You can print this directory in your notebook using the following command:

import os
print(os.getcwd())

In this example, the notebook is saved in the scripts folder in the workspace files.

For example, if you wanted to open a CSV file located in your workspace files as a pandas DataFrame, you would use:

import pandas as pd
data = pd.read_csv('/home/workspace/files/data.csv')
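If you read several files, building paths from a single root avoids repeating the absolute prefix and gives a clearer error when a file is missing. The sketch below assumes that /home/workspace/files, taken from the example path above, is the root of your workspace files; the names WORKSPACE_FILES and workspace_file are hypothetical.

```python
from pathlib import Path

# Root of the workspace files, as used in the example path above (an assumption).
WORKSPACE_FILES = Path("/home/workspace/files")


def workspace_file(relative_path, root=WORKSPACE_FILES):
    """Return the full path of a file under the workspace files directory."""
    path = Path(root) / relative_path
    if not path.is_file():
        raise FileNotFoundError(f"No such workspace file: {path}")
    return path
```

You could then pass the result straight to pandas, for example pd.read_csv(workspace_file("data.csv")).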

Please note that access to Blobs is currently not supported.

Accessing datasets

The database tables stored in the Datasets tab can be accessed using the psycopg2 package, which provides functionality for connecting to the PostgreSQL server and running SQL queries from Python.

To connect to the workspace database:

import psycopg2 as p

# Connect to the workspace database
conn = p.connect("")
# Open a cursor to perform DB operations
cur = conn.cursor()

Run your SQL query. This selects all the columns from the breast_cancer dataset and then retrieves the first two rows as Python objects:

# Query the database and obtain data as Python objects
cur.execute("SELECT * FROM breast_cancer;")
cur.fetchmany(2)
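The rows come back as plain tuples without column names. If you would rather work with named fields, the cursor's description attribute, which is part of the Python DB-API that psycopg2 implements, lists the result columns. The helper below is a minimal sketch of pairing the two; the name fetch_dicts is hypothetical, and it should be called after cur.execute(...) in place of cur.fetchmany(...).

```python
def fetch_dicts(cursor, size):
    """Fetch up to `size` rows from an executed cursor as a list of dicts."""
    # cursor.description has one entry per result column; item 0 is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchmany(size)]
```

With the cursor from the example, fetch_dicts(cur, 2) would return the first two rows keyed by column name.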

Close the cursor and the connection to the database:

# Close the cursor, then the connection
cur.close()
conn.close()
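If a query raises an error, the close calls above are never reached. One way to make sure the cursor and connection are always released is contextlib.closing from the standard library. The following is a sketch, not the only approach: run_query is a hypothetical helper, and its connect argument is any zero-argument function that returns a DB-API connection.

```python
from contextlib import closing


def run_query(connect, sql, size):
    """Open a connection, run one query, and return up to `size` rows.

    closing() calls close() on the cursor and the connection when the
    block exits, even if the query raises an exception.
    """
    with closing(connect()) as conn, closing(conn.cursor()) as cur:
        cur.execute(sql)
        return cur.fetchmany(size)
```

For the workspace database, this could be called as run_query(lambda: psycopg2.connect(""), "SELECT * FROM breast_cancer;", 2), using the same empty connection string as in the example above.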

For more information, please see the psycopg2 package documentation.

If you do not want to work with the workspace database directly, you could also try converting your dataset to a CSV file first and then reading that file in.

R kernel

Package management

The R kernel comes with a few pre-installed packages such as dplyr, knitr, RPostgreSQL, ggplot2, and ggvis. To see the full list, run installed.packages() in your R notebook.

To install a new package, simply run:

install.packages('package_name')

This will install the package in the /R/version_number/ folder in the workspace files directory. Once a package is installed, it can be loaded at any time, even after the app has restarted.

Accessing files

The Jupyter Notebook working directory is the location of your notebook file. You can print this directory in your notebook using the getwd() command.

In this example, the notebook is saved in the scripts folder in the workspace files.

For example, if you wanted to import data from a CSV file located in your workspace files into a data frame, you would use:

data <- read.csv('/home/workspace/files/data.csv')

Please note that access to Blobs is currently not supported.

Updated on April 1, 2022
