Jupyter
Jupyter notebooks provide an interactive environment for data analysis, visualization, and code execution. They are particularly useful for bioinformatics workflows, exploratory data analysis, and sharing reproducible research.
Setting Up Jupyter on an EC2 Instance
When running Jupyter on an EC2 instance, you need to configure it to allow remote access securely. Follow these steps:
Install Jupyter (if not already installed):
pip install jupyter
Generate a config file:
jupyter notebook --generate-config
Create a password for added security:
jupyter notebook password
Edit the config file:
nano ~/.jupyter/jupyter_notebook_config.py
Add or modify these lines in the config file:
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8888
c.NotebookApp.allow_origin = '*'
c.NotebookApp.allow_remote_access = True
Start Jupyter:
jupyter notebook --no-browser
Accessing Jupyter from Your Local Machine
Method 1: Direct Access (Less Secure)
Ensure port 8888 is open in your EC2 security group.
Access Jupyter using your EC2 instance's public IP:
https://<your-ec2-public-ip>:8888
Enter the password you set earlier.
Method 2: SSH Tunnel (More Secure, Recommended)
Create an SSH tunnel:
ssh -i <your-key.pem> -L 8888:localhost:8888 ubuntu@<your-ec2-public-ip>
Access Jupyter locally:
http://localhost:8888
Enter the password you set earlier.
Best Practices
Use virtual environments to manage dependencies:
python -m venv jupyter_env
source jupyter_env/bin/activate
pip install jupyter
Install and use JupyterLab for a more feature-rich environment:
pip install jupyterlab
jupyter lab --no-browser
Use version control for notebooks. Install nbstripout to remove output before committing:
pip install nbstripout
nbstripout --install
For long-running tasks, use tmux
or screen
:
tmux new -s jupyter_session
jupyter notebook --no-browser
# Press Ctrl-B then D to detach
# To reattach: tmux attach -t jupyter_session
Regularly update Jupyter and its dependencies:
pip install --upgrade jupyter jupyterlab
Troubleshooting
If you're having trouble connecting to Jupyter, check the following:
- EC2 Security Group: Ensure port 8888 is open (for direct access method).
- Jupyter Configuration: Verify the config file settings.
- Firewall: Check if the EC2 instance's firewall is blocking connections.
- Jupyter Server: Confirm Jupyter is running and note any error messages.
- SSL Certificate: If using HTTPS, ensure your certificate is valid and trusted.
Advanced Topics
- Setting up JupyterHub for multi-user environments
- Integrating Jupyter with AWS S3 for data storage
- Using Jupyter kernels for different programming languages (R, Julia, etc.)
Remember to always prioritize security when working with remote servers and sensitive data. Regularly update your EC2 instance, Jupyter, and all dependencies to ensure you have the latest security patches.
Created by Ryan D. Najac for the Palomero Lab at the Institute for Cancer Genetics.Page last updated on 2024-10-17.