Skip to content

fix(cache-downloader): Skip re-downloading already cached toolkit versions#84

Merged
mfranczel merged 4 commits intomainfrom
michal/blu-5873-dont-re-download-already-downloaded-toolkit-cache
Mar 27, 2026
Merged

fix(cache-downloader): Skip re-downloading already cached toolkit versions#84
mfranczel merged 4 commits intomainfrom
michal/blu-5873-dont-re-download-already-downloaded-toolkit-cache

Conversation

@mfranczel
Copy link
Copy Markdown
Contributor

@mfranczel mfranczel commented Mar 26, 2026

Summary

  • Cache-downloader now checks for the done file before downloading, skipping S3 download + extraction for versions already on disk
  • Fixes open().close() pattern to use Path.touch() to avoid potential resource warnings

Summary by CodeRabbit

  • Performance
    • Cache downloader now detects existing cached versions and skips redundant download/extract work, speeding repeated builds and cache operations.
  • Tests
    • Added unit tests covering both the skip-on-existing-cache behavior and the full download/extract flow when cache is absent.

Enhances the download_dependency function to check for an existing "done" file before initiating a download, preventing unnecessary downloads of already cached dependencies. Adds unit tests to verify this behavior, ensuring that downloads are skipped when the cache is present and triggered when it is absent.
@linear
Copy link
Copy Markdown

linear bot commented Mar 26, 2026

@mfranczel mfranczel self-assigned this Mar 26, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0d2b99d9-2efd-468c-8e1f-0e4a54053a93

📥 Commits

Reviewing files that changed from the base of the PR and between 7e637bb and d965444.

📒 Files selected for processing (1)
  • tests/unit/test_cache_downloader.py

📝 Walkthrough

Walkthrough

The PR makes download_dependency() idempotent by checking for a per-version "{py_ver}-done" marker file (Path(done_file).is_file()) and returning early with a timestamped message if it exists, avoiding directory creation and download/extract steps. After a successful download/extract, the marker is created using Path(done_file).touch() instead of open(...).close(). Unit tests were added to assert early exit when the marker exists and that subprocesses are invoked and the marker is created when it does not.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant FS as FileSystem
    participant SubP as Subprocess (aws/tar)

    Caller->>FS: check {py_ver}-done marker
    alt marker exists
        FS-->>Caller: marker present
        Caller->>Caller: log "already cached, skipping download"
    else marker missing
        Caller->>FS: create target directory
        Caller->>SubP: run aws s3 cp (Popen)
        SubP-->>Caller: aws returns
        Caller->>SubP: run tar -x (Popen)
        SubP-->>Caller: tar returns
        Caller->>FS: touch {py_ver}-done marker
    end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Updates Docs ⚠️ Warning No documentation updated in deepnote-toolkit repository for cache-downloader optimization feature. Update configuration guide or add cache-downloader README documenting the new caching behavior and done marker file mechanism.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed Title clearly describes the main change: implementing cache detection to skip re-downloads of already cached toolkit versions, which matches the code modifications that add done-file checking.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 26, 2026

📦 Python package built successfully!

  • Version: 2.2.0.dev6+1894e99
  • Wheel: deepnote_toolkit-2.2.0.dev6+1894e99-py3-none-any.whl
  • Install:
    pip install "deepnote-toolkit @ https://deepnote-staging-runtime-artifactory.s3.amazonaws.com/deepnote-toolkit-packages/2.2.0.dev6%2B1894e99/deepnote_toolkit-2.2.0.dev6%2B1894e99-py3-none-any.whl"

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.32%. Comparing base (9745351) to head (31709ba).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #84   +/-   ##
=======================================
  Coverage   74.32%   74.32%           
=======================================
  Files          94       94           
  Lines        5534     5534           
  Branches      824      824           
=======================================
  Hits         4113     4113           
  Misses       1155     1155           
  Partials      266      266           
Flag Coverage Δ
combined 74.32% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@dockerfiles/cache-downloader/cache-downloader.py`:
- Around line 22-26: The check uses os.path.isfile(done_file) while the codebase
uses pathlib.Path; replace this with a Path-based check by ensuring done_file is
a pathlib.Path (or wrap it: Path(done_file)) and call its is_file() method
(i.e., use done_file.is_file()) in the block that prints the
"{datetime.datetime.now()}: {release_name} python{python_version} already
cached..." message so it follows the project's pathlib convention.

In `@tests/unit/test_cache_downloader.py`:
- Around line 184-188: The nested context managers should be combined into a
single with statement to make the test more concise and PEP8-friendly: replace
the two nested with patch.object calls (patch.object(_mod, "BASE_PATH",
str(tmp_path)) and patch.object(_mod.subprocess, "Popen") as mock_popen) with
one combined with patch.object(_mod, "BASE_PATH", str(tmp_path)),
patch.object(_mod.subprocess, "Popen") as mock_popen: before calling
download_dependency(release, py_ver, "fake-bucket") and keep
mock_popen.assert_not_called() after the combined block.
- Around line 206-210: The two nested context managers in the test should be
combined into a single with statement to simplify the code: merge the with
patch.object(_mod, "BASE_PATH", str(tmp_path)) and with
patch.object(_mod.subprocess, "Popen", side_effect=[mock_aws, mock_tar]) as
mock_popen into one line using a comma-separated with, keeping the same as
mock_popen binding and then call download_dependency(release, py_ver,
"fake-bucket") inside that single context.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8c0c1c98-952b-4597-86b2-3d16d0325133

📥 Commits

Reviewing files that changed from the base of the PR and between 1a9d469 and 7379d1c.

📒 Files selected for processing (2)
  • dockerfiles/cache-downloader/cache-downloader.py
  • tests/unit/test_cache_downloader.py

@deepnote-bot
Copy link
Copy Markdown

deepnote-bot commented Mar 26, 2026

🚀 Review App Deployment Started

📝 Description 🌐 Link / Info
🌍 Review application ra-84
🔑 Sign-in URL Click to sign-in
📊 Application logs View logs
🔄 Actions Click to redeploy
🚀 ArgoCD deployment View deployment
Last deployed 2026-03-27 15:23:04 (UTC)
📜 Deployed commit 98a5d66bddd951707fb7fd081890ffc09826f394
🛠️ Toolkit version 1894e99

Replaces os.path.isfile with pathlib.Path.is_file in the download_dependency function to improve code readability and adhere to project standards. Updates unit tests to maintain functionality while ensuring consistent use of context managers for patching.
coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 26, 2026
@mfranczel mfranczel requested a review from tkislan March 26, 2026 09:03
@mfranczel mfranczel marked this pull request as ready for review March 26, 2026 09:03
@mfranczel mfranczel requested a review from a team as a code owner March 26, 2026 09:03
@mfranczel mfranczel requested review from m1so and removed request for m1so March 26, 2026 09:03
@mfranczel mfranczel merged commit 46807ba into main Mar 27, 2026
30 of 32 checks passed
@mfranczel mfranczel deleted the michal/blu-5873-dont-re-download-already-downloaded-toolkit-cache branch March 27, 2026 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants