Regarding the first question about the "window": this looks to me like you are looking for something like an "async semaphore". In libunifex there is `async_mutex`, which provides the member functions `async_lock()` (returns a sender) and `unlock()`: https://github.com/facebookexperimental/libunifex/blob/main/include/unifex/v1/async_mutex.hpp

Regarding the "deadlock" with `run_loop`: are you aware that `sync_wait` internally provides a `run_loop` that you can use to schedule work on?
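To make the "async semaphore" idea concrete, here is a minimal callback-based sketch in plain C++ (the `async_semaphore`, `acquire`, and `release` names are invented for illustration; libunifex's `async_mutex` is the sender-shaped version of the same pattern). A waiter that can't get a permit is queued instead of blocking a thread, and a later `release()` hands the permit straight to it:

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical illustration of an "async semaphore": acquire() never
// blocks a thread; if no permit is free, the continuation is parked and
// resumed by a later release().
class async_semaphore {
  std::mutex m_;
  int permits_;
  std::deque<std::function<void()>> waiters_;

public:
  explicit async_semaphore(int permits) : permits_(permits) {}

  void acquire(std::function<void()> continuation) {
    std::unique_lock lk(m_);
    if (permits_ > 0) {
      --permits_;
      lk.unlock();      // run the continuation outside the lock
      continuation();   // permit available: proceed immediately
    } else {
      waiters_.push_back(std::move(continuation)); // park without blocking
    }
  }

  void release() {
    std::function<void()> next;
    {
      std::lock_guard lk(m_);
      if (waiters_.empty()) {
        ++permits_;
        return;
      }
      next = std::move(waiters_.front());
      waiters_.pop_front();
    }
    next(); // hand the permit straight to the oldest waiter
  }
};
```

A sender-based version would return a sender from `acquire()` and complete it from `release()`, but the queuing logic is the same.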
Hi @maikel, thank you very much for your reply, I really appreciate it. Yes, I think an "async semaphore" would be ideal; perhaps I can review the implementation of `async_mutex` in libunifex and see if it's possible to recreate it in `stdexec`. I was not aware that `sync_wait` provides a `run_loop` internally, so thank you for pointing that out. I guess I'm also wondering, at a higher level, whether it's difficult to build this type of thing today without coroutines. Thanks again, and I'll definitely take a look at what you mentioned.
Hi there,
I've recently been getting interested in `std::execution`, and I've been trying to see if I can recreate algorithms I've used before in async code using the API provided by `std::execution`. Unfortunately, I've been coming up short, and was hoping I might be able to ask for clarification on the optimal/idiomatic way to create a sliding window of work.

The situation I'd like to solve for is: say I have 100 requests (just to keep things simple) and a downstream service I do not want to overwhelm. The service might take a little bit of time to handle each request and can only handle a fixed load, so I'd like to send an initial batch of 10 requests immediately, and then, as each response comes back, send another request straight away, creating a sliding-window effect where there are never more than 10 requests in flight at once.
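The window semantics I'm after can be sketched without any sender machinery at all. This is a hypothetical, single-threaded illustration (all names invented): completing a request immediately launches the next pending one, so the window "slides" without ever blocking a thread:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <queue>

// Hypothetical sketch of the sliding-window semantics: at most `window`
// requests are in flight; a completion callback launches the next one.
struct window_dispatcher {
  int window;
  int total;
  int next = 0;          // next request index to launch
  int in_flight = 0;     // requests currently outstanding
  int max_in_flight = 0; // high-water mark, for checking the invariant
  std::function<void(int, std::function<void()>)> send; // send(i, on_done)

  void pump() {
    // Launch requests until the window is full or we run out of work.
    while (in_flight < window && next < total) {
      ++in_flight;
      max_in_flight = std::max(max_in_flight, in_flight);
      int i = next++;
      send(i, [this] {   // completion callback from the "service"
        --in_flight;
        pump();          // slide the window: launch the next request
      });
    }
  }
};
```

To exercise it, a fake service can just queue the completion callbacks and invoke them later (even out of order); the `max_in_flight` high-water mark never exceeds the window size.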
This is the first part of the problem. The next part, building on top of the first, is to store the responses as they come back in an associative container, keyed by the response index. The reason for this is that we want to return the responses in order to an upstream service, but as they might come back out of order, we need to store them in an associative container and have a separate thread with an index/counter that it uses to look up whether a response has come back in slot 0, then slot 1, 2, etc. This allows the results to be streamed back in the order the requests were made (in a simple example, this streaming could just be writing the responses to a file in request order).
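For illustration, the in-order drain described above might look like this in plain C++ (names hypothetical): results arrive keyed by index, and everything contiguous from the next expected index is emitted as soon as it becomes available:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical reorder buffer: out-of-order delivery, in-order emission.
struct reorder_buffer {
  std::unordered_map<int, std::string> results; // index -> result
  int out_index = 0;                            // next index to emit
  std::vector<std::string> emitted;             // stands in for the stream/file

  void deliver(int index, std::string value) {
    results.emplace(index, std::move(value));
    // Drain every contiguous slot starting at out_index.
    for (auto it = results.find(out_index); it != results.end();
         it = results.find(out_index)) {
      emitted.push_back(std::move(it->second));
      results.erase(it);
      ++out_index;
    }
  }
};
```

Delivering index 2 first emits nothing; once 0 and 1 arrive, all three are flushed in order.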
The last thing to mention is that before we write the results to the map, we need to do some further processing on each response to get it into the right shape. This work is CPU-bound, and to avoid blocking the I/O thread, we have a separate thread pool that takes the responses, does the processing, and writes them to the associative container before the results are picked up in order and streamed out (to a file or another service).
Now, to do this the old-school way, I'd send a bunch of async requests with callbacks. As each callback returns, I'd spawn a worker thread and move the response data to it so the processing can happen, and I'd have a mutex around an associative container (probably an `unordered_map` or equivalent for simplicity) that each worker thread writes to when it has finished building its result. A separate thread would then handle streaming the results back; it would be woken on a condition variable each time a result is written to the container, to check whether it's the index it's waiting on. There might be a smarter/better/simpler way to do this, but that's the basic pattern.

Now, if I want to do the equivalent with senders/receivers (`std::execution`), I think I'd need a combination of `counting_scope`, `run_loop` and some different schedulers. I have cobbled something together which looks like this (I'm sure it's hopeless, but it's just a sketch/outline/idea):

```cpp
{
  exec::static_thread_pool cpu_pool(std::thread::hardware_concurrency());
  auto cpu_sch = cpu_pool.get_scheduler();

  stdexec::counting_scope scope;
  stdexec::run_loop loop;
  auto main_sch = loop.get_scheduler();

  struct payload_t {
    float f; // some data...
  };

  struct response_t {
    int index;
  };

  struct data_t {
    int index;
    payload_t payload;
  };

  std::unordered_map<int, data_t> results;
  std::ofstream out{"out.txt"};

  const int request_count = 100;
  int out_index = 0;
  std::atomic<int> active = 0;
  std::counting_semaphore sem(10);

  for (int i = 0; i < request_count; i++) {
    stdexec::spawn(
      stdexec::just()
        | stdexec::then([&sem, &active, i]() noexcept {
            sem.acquire();
            active++;
            std::println(
              "task {} launching - active {}, thread {}", i, active.load(),
              std::this_thread::get_id());
            return response_t{.index = i};
          })
        | stdexec::continues_on(cpu_sch)
        | stdexec::then([&sem, &active](const response_t& response) noexcept {
            std::println(
              "task {} running - active {}, thread {}", response.index,
              active.load(), std::this_thread::get_id());
            sem.release();
            active--;
            // do some processing...
            return data_t{
              .index = response.index,
              .payload = {.f = static_cast<float>(response.index)}};
          })
        | stdexec::continues_on(main_sch)
        | stdexec::then(
            [&results, &out_index, &out](const data_t& data) noexcept {
              std::println(
                "appending results: {}, thread {}", data.index,
                std::this_thread::get_id());
              results[data.index] = data;
              while (out_index < request_count) {
                if (auto ordered_it = results.find(out_index);
                    ordered_it != results.end()) {
                  out_index++;
                  out << ordered_it->second.payload.f + 1.0f << std::endl;
                } else {
                  break;
                }
              }
            }),
      scope.get_token());
  }

  // deadlocks
  loop.run();
  stdexec::sync_wait(scope.join());
}
```

With this code I do get everything written in the correct order in
`out.txt` (as I'm using `std::endl` to flush each write to the output), but it deadlocks with the `run_loop` logic right now, and if I try to do something like this... then I get a very long compile error complaining about `error: no matching function for call to object of type 'stdexec::set_error_t'` (which I'm not sure exactly how to solve; I've tried adding an extra error handler, but that doesn't seem to have helped). I also think the use of a
`semaphore` here isn't great, as I'm completely blocking a thread that could be doing something else useful while waiting for one of the 10 requests to come back (this could potentially be solved with a coroutine, but I was thinking it should be possible to implement with just senders/receivers, though perhaps I'm mistaken on this). I would be incredibly grateful and interested if someone could share a better/valid approach that implements roughly what I'm thinking of. My understanding with
`std::execution` is that all the primitives available today should allow you to build any async computation, even if a friendlier API doesn't yet exist for it. This might be the case in my example, and if it fundamentally relies on a lower-level building block, maybe it's not so simple, but I'd love to know whether something like this is recommended/encouraged versus using raw threads.

Thank you very much for your time, and I look forward to a response in some shape or form (sorry for such a long question; there's no obligation or rush to reply, I just thought I'd try my luck). If there are existing resources such as articles/talks/repos that would be worth looking at to better understand this, by all means point me at those instead. Thank you!