Commit e6ade7e

# dev: add make notebook (#2528)
## Rationale for this change

Add two new Make commands: `make notebook` spins up a Jupyter notebook, and `make notebook-infra` spins up a Jupyter notebook along with the integration test infrastructure.

### PyIceberg Example Notebook

The PyIceberg example notebook (`notebooks/pyiceberg_example.ipynb`) is based on the [Getting Started with PyIceberg](https://py.iceberg.apache.org/#getting-started-with-pyiceberg) page and doesn't require additional test infra.

### Spark Example Notebook

The Spark integration example notebook (`notebooks/spark_integration_example.ipynb`) is based on the [Spark Getting Started](https://iceberg.apache.org/docs/nightly/spark-getting-started/) guide and requires the integration test infrastructure (Spark, Iceberg REST Catalog, S3).

With Spark Connect (#2491) and our testing setup, we can quickly spin up a local environment with `make test-integration-exec`, which includes:

* Spark
* Iceberg REST Catalog
* Hive Metastore
* MinIO

In the Jupyter notebook, connect to Spark easily:

```python
from pyspark.sql import SparkSession

# Create a SparkSession against the remote Spark Connect server
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
spark.sql("SHOW CATALOGS").show()
```

## Are these changes tested?

Yes, I ran both `make notebook` and `make notebook-infra` locally and ran the example notebooks.

## Are there any user-facing changes?
1 parent e373ebd commit e6ade7e

File tree

10 files changed: +1480 −2 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions

```diff
@@ -41,6 +41,9 @@ bin/
 .mypy_cache/
 htmlcov
 
+# Jupyter notebook checkpoints
+.ipynb_checkpoints/
+
 pyiceberg/avro/decoder_fast.c
 pyiceberg/avro/*.html
 pyiceberg/avro/*.so
```

.pre-commit-config.yaml

Lines changed: 5 additions & 0 deletions

```diff
@@ -32,6 +32,11 @@ repos:
       - id: ruff
         args: [ --fix, --exit-non-zero-on-fix ]
       - id: ruff-format
+  - repo: https://github.com/nbQA-dev/nbQA
+    rev: 1.9.1
+    hooks:
+      - id: nbqa-ruff
+        args: [ --fix, --exit-non-zero-on-fix ]
   - repo: https://github.com/pre-commit/mirrors-mypy
     rev: v1.18.2
     hooks:
```

Makefile

Lines changed: 18 additions & 1 deletion

```diff
@@ -97,7 +97,7 @@ test: ## Run all unit tests (excluding integration)
 
 test-integration: test-integration-setup test-integration-exec test-integration-cleanup ## Run integration tests
 
-test-integration-setup: ## Start Docker services for integration tests
+test-integration-setup: install ## Start Docker services for integration tests
 	docker compose -f dev/docker-compose-integration.yml kill
 	docker compose -f dev/docker-compose-integration.yml rm -f
 	docker compose -f dev/docker-compose-integration.yml up -d --build --wait
@@ -153,6 +153,21 @@ docs-serve: ## Serve local docs preview (hot reload)
 docs-build: ## Build the static documentation site
 	uv run $(PYTHON_ARG) mkdocs build -f mkdocs/mkdocs.yml --strict
 
+# ========================
+# Experimentation
+# ========================
+
+##@ Experimentation
+
+notebook-install: ## Install notebook dependencies
+	uv sync $(PYTHON_ARG) --all-extras --group notebook
+
+notebook: notebook-install ## Launch notebook for experimentation
+	uv run jupyter lab --notebook-dir=notebooks
+
+notebook-infra: notebook-install test-integration-setup ## Launch notebook with integration test infra (Spark, Iceberg Rest Catalog, object storage, etc.)
+	uv run jupyter lab --notebook-dir=notebooks
+
 # ===================
 # Project Maintenance
 # ===================
@@ -167,6 +182,8 @@ clean: ## Remove build artifacts and caches
 	@find . -name "__pycache__" -exec echo Deleting {} \; -exec rm -rf {} +
 	@find . -name "*.pyd" -exec echo Deleting {} \; -delete
 	@find . -name "*.pyo" -exec echo Deleting {} \; -delete
+	@echo "Cleaning up Jupyter notebook checkpoints..."
+	@find . -name ".ipynb_checkpoints" -exec echo Deleting {} \; -exec rm -rf {} +
 	@echo "Cleanup complete."
 
 uv-lock: ## Regenerate uv.lock file from pyproject.toml
```
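The updated `clean` target removes `.ipynb_checkpoints` directories with `find ... -exec rm -rf {} +`. For reference, the same cleanup can be sketched in pure Python; this is a hypothetical equivalent for illustration, not part of the repo:

```python
import shutil
from pathlib import Path

def clean_notebook_checkpoints(root: str = ".") -> list[str]:
    """Delete every .ipynb_checkpoints directory under `root`, mirroring
    `find . -name ".ipynb_checkpoints" -exec rm -rf {} +` in the Makefile.

    Returns the paths that were removed.
    """
    removed = []
    for path in sorted(Path(root).rglob(".ipynb_checkpoints")):
        # is_dir() also guards against paths already deleted with a parent
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(str(path))
    return removed
```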

dev/.rat-excludes

Lines changed: 1 addition & 0 deletions

```diff
@@ -5,3 +5,4 @@ build
 .gitignore
 uv.lock
 mkdocs/*
+notebooks/*
```

mkdocs/docs/contributing.md

Lines changed: 35 additions & 0 deletions

````diff
@@ -228,6 +228,41 @@ export PYICEBERG_CATALOG__TEST_CATALOG__ACCESS_KEY_ID=username
 export PYICEBERG_CATALOG__TEST_CATALOG__SECRET_ACCESS_KEY=password
 ```
 
+## Notebooks for Experimentation
+
+PyIceberg provides Jupyter notebooks for quick experimentation and learning. Two Make commands are available depending on your needs:
+
+### PyIceberg Examples (`make notebook`)
+
+For basic PyIceberg experimentation without additional infrastructure:
+
+```bash
+make notebook
+```
+
+This will install notebook dependencies and launch Jupyter Lab in the `notebooks/` directory.
+
+**PyIceberg Example Notebook** (`notebooks/pyiceberg_example.ipynb`) is based on the [Getting Started with PyIceberg](https://py.iceberg.apache.org/#getting-started-with-pyiceberg) page. It demonstrates basic PyIceberg operations like creating catalogs, schemas, and querying tables without requiring any external services.
+
+### Spark Integration Examples (`make notebook-infra`)
+
+For working with PyIceberg alongside Spark, use the infrastructure-enabled notebook environment:
+
+```bash
+make notebook-infra
+```
+
+This command spins up the full integration test infrastructure via Docker Compose, including:
+
+- **Spark** (with Spark Connect)
+- **Iceberg REST Catalog** (using the [`apache/iceberg-rest-fixture`](https://hub.docker.com/r/apache/iceberg-rest-fixture) image)
+- **Hive Metastore**
+- **S3-compatible object storage** (Minio)
+
+**Spark Example Notebook** (`notebooks/spark_integration_example.ipynb`) is based on the [Spark Getting Started](https://iceberg.apache.org/docs/nightly/spark-getting-started/) guide. This notebook demonstrates how to work with PyIceberg alongside Spark, leveraging the Docker-based testing setup for a complete local development environment.
+
+After running `make notebook-infra`, open `spark_integration_example.ipynb` in the Jupyter Lab interface to explore Spark integration capabilities.
+
 ## Code standards
 
 Below are the formalized conventions that we adhere to in the PyIceberg project. The goal of this is to have a common agreement on how to evolve the codebase, but also using it as guidelines for newcomers to the project.
````
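`make notebook-infra` depends on the Docker Compose services being up before the Spark notebook is useful. A small, hypothetical readiness check using only the standard library (the URL in the usage comment is an assumption — adjust it to wherever your REST catalog or other HTTP endpoint is exposed locally):

```python
import urllib.error
import urllib.request

def service_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` (any status code),
    False if the connection cannot be established at all."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True  # server is up, it just returned an error status
    except (urllib.error.URLError, OSError):
        return False

# Example usage (hypothetical local endpoint for the Iceberg REST catalog):
# service_ready("http://localhost:8181/v1/config")
```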

mkdocs/docs/index.md

Lines changed: 4 additions & 0 deletions

````diff
@@ -198,6 +198,10 @@ Since the catalog was configured to use the local filesystem, we can explore how
 find /tmp/warehouse/
 ```
 
+## Try it yourself with Jupyter Notebooks
+
+PyIceberg provides Jupyter notebooks for hands-on experimentation with the examples above and more. Check out the [Notebooks for Experimentation](contributing.md#notebooks-for-experimentation) guide.
+
 ## More details
 
 For the details, please check the [CLI](cli.md) or [Python API](api.md) page.
````
