Best practices for handling persistent 202 responses from the contributor stats API #190711
🏷️ Discussion Type: Question
💬 Feature/Topic Area: API

Hello, folks 👋🏼!

**What we're doing**

We're using the Get all contributor commit activity endpoint to track active contributors across repositories in an organization. We've run into persistent `202` responses. For each repo in an org, we call that API endpoint. When we get a `202`, …

**What we've observed**

When we polled the endpoint manually after getting a `202`, … A few specific observations:
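For concreteness, our per-repo scan loop looks roughly like this (simplified sketch: function names are ours, only the first page of org repos is fetched, and pagination/error handling is omitted):

```python
import requests

API = "https://api.github.com"


def list_repos(org, token):
    """Fetch the org's repositories (simplified: first page only)."""
    resp = requests.get(
        f"{API}/orgs/{org}/repos",
        headers={"Authorization": f"Bearer {token}"},
        params={"per_page": 100},
    )
    resp.raise_for_status()
    return [r["name"] for r in resp.json()]


def stats_url(org, repo):
    """Build the contributor-stats endpoint URL for one repo."""
    return f"{API}/repos/{org}/{repo}/stats/contributors"


def scan_org(org, token):
    """Hit the stats endpoint once per repo; split 200s from 202s."""
    results, pending = {}, []
    for repo in list_repos(org, token):
        resp = requests.get(stats_url(org, repo),
                            headers={"Authorization": f"Bearer {token}"})
        if resp.status_code == 202:   # stats still being computed server-side
            pending.append(repo)
        elif resp.status_code == 200:
            results[repo] = resp.json()
    return results, pending
```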
**Questions**

Any help from the community or GitHub staff would be appreciated. Thanks!
Replies: 3 comments 3 replies
---
The `202` means the stats are still being computed in the background. Here's what works reliably:

**Retry with exponential backoff**

```python
import time

import requests


def get_contributor_stats(owner, repo, token, max_retries=5):
    url = f"https://api.github.com/repos/{owner}/{repo}/stats/contributors"
    headers = {"Authorization": f"Bearer {token}"}
    for attempt in range(max_retries):
        resp = requests.get(url, headers=headers)
        if resp.status_code == 200:
            return resp.json()
        if resp.status_code == 202:
            wait = 2 ** attempt  # 1s, 2s, 4s, 8s, 16s
            time.sleep(wait)
            continue
        resp.raise_for_status()
    return None  # still computing after retries
```

**Key things to know:**
The `202` is not an error; it's GitHub saying "I'm working on it." The exponential backoff pattern above handles it cleanly for most use cases.
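One refinement worth layering on top of that pattern (a sketch, not documented GitHub behavior for this endpoint): honor a `Retry-After` header defensively if one happens to be present, and add jitter so retries across many repos don't synchronize. `backoff_delay` here is a hypothetical helper:

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before the next retry.

    If the server supplied an explicit Retry-After value, honor it.
    Otherwise use capped exponential backoff with full jitter, so
    retries for dozens of repos don't all fire at the same instant.
    """
    if retry_after is not None:
        return float(retry_after)
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

In the loop above you'd replace the fixed `time.sleep(wait)` with `time.sleep(backoff_delay(attempt, resp.headers.get("Retry-After")))`.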
---
Thanks @Sagargupta16 for the detailed response! The exponential backoff pattern makes sense for tight polling scenarios, and the 24-hour cache TTL is a useful data point we weren't aware of that explains why our weekly scans consistently hit cold caches. A couple of follow-up questions based on what you shared:
These edge cases are what's making this endpoint tricky to rely on for our use case.
---
You can try simple logging:

```python
if status == 202:
    ...
```
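Expanding on that idea, a minimal logging sketch (the logger name and message formats are illustrative, not from any GitHub tooling) that makes repos stuck on `202` visible per repo rather than lost in an aggregate count:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("contributor-stats")


def stats_log_line(repo, status, attempt):
    """Format one line describing how a stats request resolved."""
    if status == 200:
        return f"{repo}: 200 after {attempt} attempt(s)"
    if status == 202:
        return f"{repo}: 202 (stats still computing), attempt {attempt}"
    return f"{repo}: unexpected status {status}"


def log_stats_response(repo, status, attempt):
    """Log at a severity that makes stuck repos stand out."""
    level = logging.INFO if status == 200 else logging.WARNING
    log.log(level, stats_log_line(repo, status, attempt))
```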
Good follow-ups. I dug deeper into the official docs and want to correct/refine a few things from my earlier response.
**On the persistent 202s across 76% of repos**

I want to correct myself: I mentioned a "24-hour cache TTL" earlier, but the docs don't specify any time-based TTL. The cache is actually keyed by the SHA of the default branch, and the only documented invalidation trigger is pushing to that branch.
The likely reason you're seeing 76% 202s even after 5-15 minute delays is concurrency. GitHub's best practices explicitly say to "make requests serially, not concurrently." When you fire off warming requests to dozens of repos simultaneously, you're likely overwhelming the backgroun…
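Concretely, a serialized warming pass might look like this (a sketch; `warm_stats_serially` and the one-second pause are my own choices, not documented requirements):

```python
import time

import requests


def pending_repos(responses):
    """Given (repo, status_code) pairs, return the repos that answered 202."""
    return [repo for repo, status in responses if status == 202]


def warm_stats_serially(repos, token, delay=1.0):
    """Fire one warming request per (owner, repo), strictly one at a time,
    per GitHub's 'make requests serially' guidance. Returns the repos that
    answered 202 so a later pass can re-check just those."""
    seen = []
    for owner, repo in repos:
        resp = requests.get(
            f"https://api.github.com/repos/{owner}/{repo}/stats/contributors",
            headers={"Authorization": f"Bearer {token}"},
        )
        seen.append((f"{owner}/{repo}", resp.status_code))
        time.sleep(delay)  # deliberate pacing; the value is a guess
    return pending_repos(seen)
```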