Jupyter

Jupyter notebooks provide an interactive environment for data analysis, visualization, and code execution. They are particularly useful for bioinformatics workflows, exploratory data analysis, and sharing reproducible research.

Setting Up Jupyter on an EC2 Instance

When running Jupyter on an EC2 instance, you need to configure it to allow remote access securely. Follow these steps:

Install Jupyter (if not already installed):

pip install jupyter

Generate a config file:

jupyter notebook --generate-config

Create a password for added security:

jupyter notebook password

Edit the config file:

nano ~/.jupyter/jupyter_notebook_config.py

Add or modify these lines in the config file:

c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8888
c.NotebookApp.allow_origin = '*'
c.NotebookApp.allow_remote_access = True
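
To confirm which settings are actually active after editing, the uncommented assignments can be read back from the file. This is a minimal Python sketch, not part of Jupyter itself. The function name read_notebook_settings is hypothetical, and it assumes simple one-line assignments like the ones above. It also accepts the c.ServerApp prefix, on the assumption that a newer Jupyter Server based install is in use.

```python
import ast
import re

# Matches lines such as: c.NotebookApp.port = 8888
# (c.ServerApp is accepted too, as an assumption about newer installs)
SETTING = re.compile(r"^\s*c\.(?:NotebookApp|ServerApp)\.(\w+)\s*=\s*(.+?)\s*$")


def read_notebook_settings(config_text):
    """Return a dict of the uncommented settings found in config_text."""
    settings = {}
    for line in config_text.splitlines():
        # The generated config file ships with every option commented out.
        if line.lstrip().startswith("#"):
            continue
        match = SETTING.match(line)
        if not match:
            continue
        name, raw_value = match.groups()
        try:
            # Turn "'0.0.0.0'", "False", "8888" into Python values.
            settings[name] = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            # Keep anything more complex as raw text.
            settings[name] = raw_value
    return settings
```

Passing the text of ~/.jupyter/jupyter_notebook_config.py to this function should return the five settings listed above if they were saved uncommented.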

Start Jupyter:

jupyter notebook --no-browser

Accessing Jupyter from Your Local Machine

Method 1: Direct Access (Less Secure)

Ensure inbound TCP port 8888 is open in your EC2 security group, ideally restricted to your own IP address.

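Access Jupyter using your EC2 instance's public IP. The configuration above does not enable TLS, so use http rather than https unless you have added a certificate:

http://<your-ec2-public-ip>:8888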

Enter the password you set earlier.

Method 2: SSH Tunnel (More Secure)

Create an SSH tunnel:

ssh -i <your-key.pem> -L 8888:localhost:8888 ubuntu@<your-ec2-public-ip>

Access Jupyter locally:

http://localhost:8888

Enter the password you set earlier.
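
If the tunnel is opened from a script, or local port 8888 is already in use, the same ssh command can be assembled programmatically. The following is a small Python sketch. The function name build_tunnel_command and its parameters are hypothetical, and the default user ubuntu is taken from the command above.

```python
import shlex


def build_tunnel_command(key_path, host, user="ubuntu",
                         local_port=8888, remote_port=8888):
    """Return the ssh argument list for forwarding local_port to Jupyter."""
    return [
        "ssh",
        "-i", key_path,
        # Forward localhost:local_port on this machine to
        # localhost:remote_port on the EC2 instance.
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]


def format_command(args):
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(args)
```

For example, calling build_tunnel_command with local_port=8889 forwards a different local port, in which case Jupyter is reached at http://localhost:8889 instead. The list form can be passed to subprocess.run without shell quoting concerns.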

Best Practices

Use virtual environments to manage dependencies:

python -m venv jupyter_env
source jupyter_env/bin/activate
pip install jupyter

Install and use JupyterLab for a more feature-rich environment:

pip install jupyterlab
jupyter lab --no-browser

Use version control for notebooks. Install nbstripout to strip cell outputs before committing, and run the install step from inside the Git repository that holds your notebooks:

pip install nbstripout
nbstripout --install
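
To illustrate what stripping output means, here is a minimal Python sketch that clears outputs from a notebook loaded as a dictionary. It is not nbstripout's actual implementation, which does more. The function name strip_outputs is hypothetical, and the sketch assumes the version 4 notebook format, where cells sit in a top-level "cells" list.

```python
import copy


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs cleared."""
    stripped = copy.deepcopy(notebook)  # leave the caller's dict untouched
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop results, plots, tracebacks
            cell["execution_count"] = None  # drop the In[n] counter
    return stripped
```

A notebook file is plain JSON, so it can be read with json.load, passed through this function, and written back with json.dump. Removing outputs keeps diffs small and avoids committing large embedded images or sensitive results.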

For long-running tasks, start Jupyter inside tmux or screen so it keeps running after your SSH session disconnects:

tmux new -s jupyter_session
jupyter notebook --no-browser
# Press Ctrl-B then D to detach
# To reattach: tmux attach -t jupyter_session

Regularly update Jupyter and its dependencies:

pip install --upgrade jupyter jupyterlab

Troubleshooting

If you're having trouble connecting to Jupyter, check the following:

  • EC2 Security Group: Ensure port 8888 is open (for direct access method).
  • Jupyter Configuration: Verify the config file settings.
  • Firewall: Check if the EC2 instance's firewall is blocking connections.
  • Jupyter Server: Confirm Jupyter is running and note any error messages.
  • SSL Certificate: If using HTTPS, ensure your certificate is valid and trusted.
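
Several of the checks above come down to whether a TCP connection to port 8888 succeeds. The following is a minimal Python sketch of that test using only the standard library. The function name port_is_open is hypothetical.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run from your local machine, port_is_open("localhost", 8888) checks the SSH tunnel, while passing the instance's public IP checks the direct access path. A False result for the public IP while Jupyter is running on the instance usually points to the security group or a firewall rather than to Jupyter itself.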

Advanced Topics

  • Setting up JupyterHub for multi-user environments
  • Integrating Jupyter with AWS S3 for data storage
  • Using Jupyter kernels for different programming languages (R, Julia, etc.)

Remember to always prioritize security when working with remote servers and sensitive data. Regularly update your EC2 instance, Jupyter, and all dependencies to ensure you have the latest security patches.

Created by Ryan D. Najac for the Palomero Lab at the Institute for Cancer Genetics.
Page last updated on 2024-10-17.