Add embedding recipe: build domain-specific embeddings from raw documents #85
Draft
oliverholworthy wants to merge 3 commits into main from
Force-pushed from e90e6b4 to f0c63a4.
Force-pushed from f0c63a4 to 4fd435d.
marcromeyn reviewed on Mar 24, 2026
@@ -0,0 +1,18 @@
For weeks, the Amazon rainforest has been burning at a startling rate. Tens of thousands of fires have been recorded this year largely started by humans clearing land for logging, ranching or mining.
I wonder if we should publish the dummy data on the Hugging Face Hub?
Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
Force-pushed from 4fd435d to 67e75af.
Move detailed documentation from the recipe README into docs/nemotron/embed/ to follow the nano3/super3 pattern. Add grid card and toctree entry in docs/index.md.

Signed-off-by: Oliver Holworthy <oholworthy@nvidia.com>
Remove bundled sample data from the repo and download it on demand from Hugging Face (nvidia/Retrieval-Synthetic-NVDocs-v1). The SDG stage now supports hf:// URIs in the corpus_dir config, e.g.:

    hf://nvidia/Retrieval-Synthetic-NVDocs-v1@<sha>/sample_corpus/nv_pp_random

This keeps the repo lightweight while preserving a zero-config quick start: the default config auto-downloads the sample corpus on first run.

Signed-off-by: Oliver Holworthy <oholworthy@nvidia.com>
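The commit message shows an hf:// value for corpus_dir but not how it is split into its parts. A minimal sketch of that parsing step follows, assuming the shape `hf://<org>/<repo>[@<revision>][/<path>]` implied by the example; `HFLocation`, `parse_hf_uri` and `is_hf_uri` are hypothetical names, not the recipe's code. Once parsed, a loader could fetch just that subdirectory with `huggingface_hub.snapshot_download(repo_id=..., revision=..., allow_patterns=...)`, passing `repo_type="dataset"` if the repo is a dataset repo (an assumption here).

```python
from dataclasses import dataclass
from typing import Optional

HF_SCHEME = "hf://"


@dataclass(frozen=True)
class HFLocation:
    """Parsed form of hf://<org>/<repo>[@<revision>][/<path>] (hypothetical)."""
    repo_id: str             # "<org>/<repo>"
    revision: Optional[str]  # commit SHA or branch after "@"; None if unpinned
    path_in_repo: str        # sub-directory inside the repo; "" means repo root


def is_hf_uri(value: str) -> bool:
    """True if a corpus_dir value should be resolved from the Hub, not disk."""
    return value.startswith(HF_SCHEME)


def parse_hf_uri(uri: str) -> HFLocation:
    if not is_hf_uri(uri):
        raise ValueError(f"expected an hf:// URI, got {uri!r}")
    # Split into org, repo(@revision), and whatever path remains.
    parts = uri[len(HF_SCHEME):].split("/", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"missing <org>/<repo> in {uri!r}")
    org, repo_and_rev = parts[0], parts[1]
    path = parts[2].strip("/") if len(parts) == 3 else ""
    # An "@" after the repo name pins a revision; a bare "@" is rejected.
    repo, sep, revision = repo_and_rev.partition("@")
    if not repo or (sep and not revision):
        raise ValueError(f"malformed repo or revision in {uri!r}")
    return HFLocation(
        repo_id=f"{org}/{repo}",
        revision=revision or None,
        path_in_repo=path,
    )
```

Pinning the revision in the URI, as the commit's example does with `@<sha>`, means the downloaded sample corpus stays fixed even if the Hub repo changes later.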
Summary
- `nemotron embed` (`sdg`, `prep`, `finetune`, `eval`, `export`, `deploy`, `run`)
- `nemo_runspec` for local-docker execution
- `--help`

Test plan
- `nemotron embed finetune --run local-docker` launches and streams logs
- `nemotron embed finetune --dry-run` shows config without executing
- `nemotron embed finetune --help` displays config options from the Pydantic model
- `pytest tests/recipes/embed/`
- `nemotron nano3` commands
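The test plan expects `--help` to list config options taken from a typed config model and `--dry-run` to print the config without executing anything. That pattern can be sketched with the standard library alone. This sketch swaps a dataclass in for the Pydantic model to stay dependency-free, and `FinetuneConfig`, its fields, `add_config_options`, `build_parser` and `run_cli` are all hypothetical names, not the recipe's actual CLI.

```python
import argparse
import dataclasses
import json
from typing import Optional, Sequence

STAGES = ("sdg", "prep", "finetune", "eval", "export", "deploy", "run")


@dataclasses.dataclass
class FinetuneConfig:
    # Hypothetical fields for illustration; stands in for a Pydantic model.
    model_name: str = dataclasses.field(
        default="base-embedder", metadata={"help": "Base model to fine-tune"})
    learning_rate: float = dataclasses.field(
        default=2e-5, metadata={"help": "Optimizer learning rate"})
    epochs: int = dataclasses.field(
        default=1, metadata={"help": "Number of training epochs"})


def add_config_options(parser: argparse.ArgumentParser, config_cls) -> None:
    """Register one --flag per config field so --help lists every option.

    Assumes simple scalar field types (str, int, float) usable as argparse `type`.
    """
    for f in dataclasses.fields(config_cls):
        parser.add_argument(
            f"--{f.name.replace('_', '-')}",
            dest=f.name,
            type=f.type,
            default=f.default,
            help=f"{f.metadata.get('help', '')} (default: %(default)s)",
        )


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="embed-cli-sketch")
    subparsers = parser.add_subparsers(dest="stage", required=True)
    for stage in STAGES:
        sub = subparsers.add_parser(stage)
        sub.add_argument("--dry-run", action="store_true",
                         help="Print the resolved config and exit without executing")
        sub.add_argument("--run", metavar="PROFILE", default=None,
                         help="Execution profile, e.g. local-docker")
        if stage == "finetune":
            add_config_options(sub, FinetuneConfig)
    return parser


def run_cli(argv: Optional[Sequence[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    if args.stage != "finetune":
        return f"{args.stage}: not covered by this sketch"
    # Rebuild the typed config from the parsed flags.
    config = FinetuneConfig(
        **{f.name: getattr(args, f.name) for f in dataclasses.fields(FinetuneConfig)})
    if args.dry_run:
        # Show the resolved config without launching anything.
        return json.dumps(dataclasses.asdict(config), indent=2)
    return f"would launch finetune via {args.run or 'default executor'} with {config}"
```

For example, `run_cli(["finetune", "--dry-run", "--epochs", "3"])` returns the config as JSON with `epochs` set to 3 and the other fields at their defaults. Because the flags and the dry-run output both come from the same class, a new config field shows up in `--help` and in `--dry-run` without further CLI changes.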