Best practices for handling persistent 202 responses from the contributor stats API #190711
🏷️ Discussion Type: Question
💬 Feature/Topic Area: API

Hello, folks 👋🏼!

**What we're doing**

We're using the Get all contributor commit activity endpoint to track active contributors across repositories in an organization. We've run into persistent `202` responses. For each repo in an org, we call that API endpoint. When we get a `202`, …

**What we've observed**

When we polled the endpoint manually after getting a `202`, … A few specific observations:
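For concreteness, our per-repo scan loop looks roughly like this (simplified sketch: function names are ours, only the first page of org repos is fetched, and pagination/error handling is omitted):

```python
import requests

API = "https://api.github.com"


def list_repos(org, token):
    """Fetch the org's repositories (simplified: first page only)."""
    resp = requests.get(
        f"{API}/orgs/{org}/repos",
        headers={"Authorization": f"Bearer {token}"},
        params={"per_page": 100},
    )
    resp.raise_for_status()
    return [r["name"] for r in resp.json()]


def stats_url(org, repo):
    """Build the contributor-stats endpoint URL for one repo."""
    return f"{API}/repos/{org}/{repo}/stats/contributors"


def scan_org(org, token):
    """Hit the stats endpoint once per repo; split 200s from 202s."""
    results, pending = {}, []
    for repo in list_repos(org, token):
        resp = requests.get(stats_url(org, repo),
                            headers={"Authorization": f"Bearer {token}"})
        if resp.status_code == 202:   # stats still being computed server-side
            pending.append(repo)
        elif resp.status_code == 200:
            results[repo] = resp.json()
    return results, pending
```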
**Questions**

Any help from the community or GitHub staff would be appreciated. Thanks!
Replies: 3 comments 3 replies
---
The `202` means the stats are still being computed in the background. Here's what works reliably:

**Retry with exponential backoff**

```python
import time

import requests


def get_contributor_stats(owner, repo, token, max_retries=5):
    url = f"https://api.github.com/repos/{owner}/{repo}/stats/contributors"
    headers = {"Authorization": f"Bearer {token}"}
    for attempt in range(max_retries):
        resp = requests.get(url, headers=headers)
        if resp.status_code == 200:
            return resp.json()
        if resp.status_code == 202:
            wait = 2 ** attempt  # 1s, 2s, 4s, 8s, 16s
            time.sleep(wait)
            continue
        resp.raise_for_status()
    return None  # still computing after retries
```

**Key things to know:**
The `202` is not an error; it's GitHub saying "I'm working on it." The exponential backoff pattern above handles it cleanly for most use cases.
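One refinement worth layering on top of that pattern (a sketch, not documented GitHub behavior for this endpoint): honor a `Retry-After` header defensively if one happens to be present, and add jitter so retries across many repos don't synchronize. `backoff_delay` here is a hypothetical helper:

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before the next retry.

    If the server supplied an explicit Retry-After value, honor it.
    Otherwise use capped exponential backoff with full jitter, so
    retries for dozens of repos don't all fire at the same instant.
    """
    if retry_after is not None:
        return float(retry_after)
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

In the loop above you'd replace the fixed `time.sleep(wait)` with `time.sleep(backoff_delay(attempt, resp.headers.get("Retry-After")))`.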
---
Thanks @Sagargupta16 for the detailed response! The exponential backoff pattern makes sense for tight polling scenarios, and the 24-hour cache TTL is a useful data point we weren't aware of that explains why our weekly scans consistently hit cold caches. A couple of follow-up questions based on what you shared:
These edge cases are what's making this endpoint tricky to rely on for our use case.
---
You can try simple logging:

```python
if status == 202:
    ...
```
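Expanding on that idea, a minimal logging sketch (the logger name and message formats are illustrative, not from any GitHub tooling) that makes repos stuck on `202` visible per repo rather than lost in an aggregate count:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("contributor-stats")


def stats_log_line(repo, status, attempt):
    """Format one line describing how a stats request resolved."""
    if status == 200:
        return f"{repo}: 200 after {attempt} attempt(s)"
    if status == 202:
        return f"{repo}: 202 (stats still computing), attempt {attempt}"
    return f"{repo}: unexpected status {status}"


def log_stats_response(repo, status, attempt):
    """Log at a severity that makes stuck repos stand out."""
    level = logging.INFO if status == 200 else logging.WARNING
    log.log(level, stats_log_line(repo, status, attempt))
```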
Good follow-ups. I dug deeper into the official docs and want to correct/refine a few things from my earlier response.
**On the persistent 202s across 76% of repos**

I want to correct myself: I mentioned a "24-hour cache TTL" earlier, but the docs don't specify any time-based TTL. The cache is actually keyed by the SHA of the default branch, and the only documented invalidation trigger is pushing to that branch.
The likely reason you're seeing 76% 202s even after 5-15 minute delays is concurrency. GitHub's best practices explicitly say to "make requests serially, not concurrently." When you fire off warming requests to dozens of repos simultaneously, you're likely overwhelming the backgroun…
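Concretely, a serialized warming pass might look like this (a sketch; `warm_stats_serially` and the one-second pause are my own choices, not documented requirements):

```python
import time

import requests


def pending_repos(responses):
    """Given (repo, status_code) pairs, return the repos that answered 202."""
    return [repo for repo, status in responses if status == 202]


def warm_stats_serially(repos, token, delay=1.0):
    """Fire one warming request per (owner, repo), strictly one at a time,
    per GitHub's 'make requests serially' guidance. Returns the repos that
    answered 202 so a later pass can re-check just those."""
    seen = []
    for owner, repo in repos:
        resp = requests.get(
            f"https://api.github.com/repos/{owner}/{repo}/stats/contributors",
            headers={"Authorization": f"Bearer {token}"},
        )
        seen.append((f"{owner}/{repo}", resp.status_code))
        time.sleep(delay)  # deliberate pacing; the value is a guess
    return pending_repos(seen)
```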