
Improve workflow perfs #18376

Merged
charlesBochet merged 4 commits into main from tt-improve-workflow-perfs
Mar 4, 2026

Conversation

@thomtrp
Contributor

@thomtrp commented Mar 4, 2026

Workflow crons take a few minutes to run. Loading each repo takes ~200–300ms locally; adding a lite mode brings that under 100ms. Also batching promises.

Finally, cleaning runs times out when there are too many, so deletions are batched as well.

@thomtrp force-pushed the tt-improve-workflow-perfs branch from 8bb9879 to 1d88ad3 on March 4, 2026 13:12
@thomtrp marked this pull request as ready for review on March 4, 2026 13:19
@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Greptile Summary

This PR improves workflow cron job performance by introducing a lite workspace context (skipping feature flags, permissions, indexes and RLS) that cuts workspace loading time from ~200–300ms to <100ms, and by parallelising per-workspace checks with Promise.allSettled in batches of 50. It also refactors workflow run cleanup to use batched raw SQL DELETE … RETURNING loops instead of a full ORM-delete-per-row approach, and eliminates a redundant getWorkflowRunOrFail DB call in the iterator action.

Key changes:

  • loadLiteWorkspaceContext added to GlobalWorkspaceOrmManager to skip expensive metadata loading (feature flags, permissions, indexes, RLS predicates).
  • Three cron jobs (WorkflowCleanWorkflowRunsCronJob, WorkflowHandleStaledRunsCronJob, WorkflowRunEnqueueCronJob) now process workspaces in parallel batches of 50 with Promise.allSettled.
  • WorkflowCleanWorkflowRunsJob replaces bulk ORM deletes with two batched raw-SQL loops in deleteOldRuns and deleteExcessRunsPerWorkflow.
  • skipAndFailSafelyStepsThenContinue is promoted to public and batches step-info updates into a single DB write.
  • updateWorkflowRunStepInfos now performs a per-step deep merge (preserving existing fields like result when only status is updated) rather than shallow replacement.
  • Iterator action removes the redundant second getWorkflowRunOrFail call; test mocks should include explicit call-count assertions to prevent regressions.
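The batched parallelisation described above can be sketched as follows — a minimal illustration, assuming a hypothetical per-item handler; names are illustrative, not the PR's actual code:

```typescript
// Minimal sketch of processing items in fixed-size batches with
// Promise.allSettled, as the cron jobs above do with workspaces.
const BATCH_SIZE = 50;

async function processInBatches<T>(
  items: T[],
  batchSize: number,
  handler: (item: T) => Promise<void>,
): Promise<void> {
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // allSettled so one failing item does not abort the whole batch
    await Promise.allSettled(batch.map(handler));
  }
}

// Usage sketch with hypothetical workspace ids:
const workspaceIds = ["w1", "w2", "w3"];
void processInBatches(workspaceIds, BATCH_SIZE, async (id) => {
  console.log(`checked ${id}`);
});
```

Promise.allSettled (rather than Promise.all) matters here: a rejection in one workspace check is recorded instead of short-circuiting the remaining workspaces in the batch.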

Confidence Score: 4/5

  • Safe to merge; performance improvements are well-scoped and logic changes are localised to cron/cleanup paths.
  • The lite-context optimisation is sound for read-only count queries. Iterator test should add explicit toHaveBeenCalledTimes(1) assertions to prevent regressions, but the current test setup will still detect actual behavioral changes. All batching and parallelization logic is correct and performance gains are substantial.
  • Iterator action test should include explicit call-count assertions for getWorkflowRunOrFail in the reset-loop and completion test cases.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Cron Job fires] --> B[Load all active workspaces]
    B --> C{Batch workspaces<br/>in groups of 50}
    C --> D[Promise.allSettled per batch]
    D --> E{checkAndEnqueue<br/>per workspace}
    E -->|lite context| F[loadLiteWorkspaceContext<br/>skip flags/perms/indexes]
    F --> G{hasRuns/hasStaledRuns<br/>hasNotStartedRuns?}
    G -->|yes| H[Enqueue job]
    G -->|no| I[Skip]
    H --> J[WorkflowCleanWorkflowRunsJob]
    J --> K[deleteOldRuns loop<br/>DELETE … RETURNING, BATCH=200]
    K -->|deletedCount > 0| K
    K -->|done| L[deleteExcessRunsPerWorkflow loop<br/>CTE + DELETE … RETURNING, BATCH=200]
    L -->|deletedCount > 0| L
    L -->|done| M[Log total deleted]

Last reviewed commit: 4a7995a

`,
);

deletedCount = result[0].length;
Contributor


Crash bug: result[0].length always causes TypeError or incorrect batching

dataSource.query(DELETE ... RETURNING id) returns an array of row objects, e.g. [{id: 'uuid1'}, {id: 'uuid2'}, ...]. Therefore:

  • result[0] is the first row object {id: 'uuid1'}, not the rows array
  • result[0].length is undefined on a row object — making deletedCount = undefined and the while (undefined > 0) check always false

This has two consequences:

  1. The batch loop always exits after one iteration, so only the first 200 rows are ever deleted per job run
  2. When there are no rows to delete (e.g. when only deleteExcessRunsPerWorkflow has work but deleteOldRuns does not), result is [], result[0] is undefined, and accessing undefined.length throws a TypeError, crashing the job

The same issue exists at the equivalent line in deleteExcessRunsPerWorkflow (line 129).

The fix is to use result.length in both places:

Suggested change
deletedCount = result[0].length;
deletedCount = result.length;

Contributor Author


TypeORM's dataSource.query() returns a tuple [rows, affectedCount]
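The shape the author describes can be illustrated without a database — a simulated result, assuming the Postgres driver's raw `[rows, affectedCount]` tuple for `DELETE … RETURNING`; the values below are made up, not a real query result:

```typescript
// Simulated raw result of a Postgres DELETE ... RETURNING id via
// TypeORM's dataSource.query(): a two-element tuple [rows, affectedCount].
type DeleteReturningRaw = [Array<{ id: string }>, number];

const result: DeleteReturningRaw = [[{ id: "uuid1" }, { id: "uuid2" }], 2];

const [rows, affectedCount] = result;
const deletedCount = result[0].length; // counts returned rows, as in the PR

console.log(deletedCount === rows.length && deletedCount === affectedCount);
```

Under this shape, `result[0].length` is a valid row count; the bot's objection would only hold if `query()` returned the rows array directly.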

await workflowRunRepository.delete(workflowRunToDelete.id);
}
deletedCount = result[0].length;
totalDeleted += deletedCount;
Contributor


Same result[0].length bug in deleteExcessRunsPerWorkflow

Same issue as in deleteOldRuns above — result is the array of returned row objects from TypeORM, so result[0] is a single row object {id: '...'} with no .length. The batch loop will always exit after the first iteration.

Suggested change
deletedCount = result[0].length;
deletedCount = result.length;

Contributor Author


see answer #18376 (comment)

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


2 issues found across 12 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/twenty-server/src/modules/workflow/workflow-runner/workflow-run-queue/jobs/workflow-clean-workflow-runs.job.ts">

<violation number="1" location="packages/twenty-server/src/modules/workflow/workflow-runner/workflow-run-queue/jobs/workflow-clean-workflow-runs.job.ts:88">
P2: `dataSource.query` results are row arrays; `result[0].length` is undefined and breaks the batching logic. Count rows with `result.length` instead.</violation>
</file>

<file name="packages/twenty-server/src/modules/workflow/workflow-runner/jobs/run-workflow.job.ts">

<violation number="1" location="packages/twenty-server/src/modules/workflow/workflow-runner/jobs/run-workflow.job.ts:165">
P1: The new `hasStepsToSkipOrFailSafely` check uses `isDefined` instead of array length, so empty skip/fail lists are treated as work to do and can prevent terminal workflow runs from being marked COMPLETED.</violation>
</file>
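The P1 violation above can be demonstrated in isolation — a sketch with hypothetical helper and variable names, not the PR's actual code:

```typescript
// An empty array is still "defined", so a definedness check treats an
// empty skip/fail list as pending work, whereas a length check does not.
function isDefined<T>(value: T | null | undefined): value is T {
  return value !== null && value !== undefined;
}

const stepsToSkipOrFailSafely: string[] = []; // no work to do

const wrongCheck = isDefined(stepsToSkipOrFailSafely); // true even when empty
const correctCheck = stepsToSkipOrFailSafely.length > 0; // false: no work

console.log(wrongCheck, correctCheck);
```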

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@thomtrp
Contributor Author

thomtrp commented Mar 4, 2026

@greptile-ai

@thomtrp force-pushed the tt-improve-workflow-perfs branch from 9412aed to 4a7995a on March 4, 2026 14:18
@greptile-apps
Contributor

greptile-apps bot commented Mar 4, 2026

Additional Comments (1)

packages/twenty-server/src/modules/workflow/workflow-executor/workflow-actions/iterator/__tests__/iterator-action.workflow-action.spec.ts, line 208
The PR removes one of the two mockResolvedValueOnce setups for getWorkflowRunOrFail to reflect the optimization (one fewer DB call), but the test lacks an explicit call count assertion. If a regression re-introduces the second call, the test will silently pass — the extra call would return undefined, potentially causing subtle failures rather than a clear assertion error.

Consider adding an explicit assertion to lock in this behavior:

      const result = await service.execute(input);

      expect(
        workflowRunWorkspaceService.getWorkflowRunOrFail,
      ).toHaveBeenCalledTimes(1);

      expect(result).toEqual({

Context Used: Rule from dashboard - Check that mocked functions in tests are called exactly once with the expected arguments.

let deletedCount: number;

do {
const result = await this.dataSource.query(
Member


If you really can't avoid raw queries, can we at least write it like this

const result = await this.dataSource.query(
  `
    DELETE FROM ${schemaName}."workflowRun"
    WHERE id IN (
      SELECT id FROM ${schemaName}."workflowRun"
      WHERE status IN ($1, $2)
        AND "createdAt" < NOW() - MAKE_INTERVAL(days => $3)
      LIMIT $4
    )
    RETURNING id;
  `,
  [
    WorkflowRunStatus.COMPLETED,
    WorkflowRunStatus.FAILED,
    RUNS_TO_CLEAN_THRESHOLD_DAYS,
    batchSize,
  ],
);

@@ -81,6 +87,21 @@ export class WorkflowRunEnqueueCronJob {
);
}

private async checkAndEnqueue(workspaceId: string): Promise<boolean> {
const hasNotStartedRuns = await this.hasNotStartedRuns(workspaceId);
Member


I'm wondering if we actually want to run executeInWorkspaceContext 50 times concurrently (as we know this is quite heavy). Or, if we really want to do that, I'd start with something lower like 10 🤔

authContext,
flatObjectMetadataMaps,
flatFieldMetadataMaps,
flatIndexMaps: {
Member


note: Ideally those maps should throw if they are accessed by the ORM in "lite" mode. We should only allow this new mode in controlled paths where we know a limited subset of the cache is accessed; otherwise some things could fail silently with empty maps

Contributor Author


Asked AI quickly and it looks a bit complicated. Let me know if you want us to dive into this

Member


> Asked AI quickly and it looks a bit complicated. Let me know if you want us to dive into this

Ah, @charlesBochet already merged it. Ok @thomtrp, I'll show you what I had in mind tomorrow. 👍
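One speculative way the guard discussed above could look (a sketch of the idea, not the reviewer's actual plan — names are illustrative): wrap the lite-mode placeholder maps in a Proxy that throws on any property access, so code paths needing the full cache fail loudly instead of silently reading empty maps.

```typescript
// Wrap a lite-mode placeholder map so any read throws instead of
// silently returning undefined from an empty map.
function failLoudlyInLiteMode<T extends object>(mapName: string, target: T): T {
  return new Proxy(target, {
    get() {
      throw new Error(`${mapName} is not loaded in lite workspace context`);
    },
  });
}

const flatIndexMaps = failLoudlyInLiteMode(
  "flatIndexMaps",
  {} as Record<string, object>,
);

let message = "";
try {
  // Any read should throw in lite mode
  void (flatIndexMaps as Record<string, object>)["someIndex"];
} catch (error) {
  message = (error as Error).message;
}

console.log(message);
```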

@charlesBochet merged commit 911a46a into main on Mar 4, 2026
66 checks passed
@charlesBochet deleted the tt-improve-workflow-perfs branch on March 4, 2026 17:29
@twenty-eng-sync

Hey @thomtrp! After you've done the QA of your Pull Request, you can mark it as done here. Thank you!
