Get the source code for this video here → https://the-dotnet-weekly.kit.com/scaling-slow-apis
Want to master Clean Architecture? Go here: https://dub.sh/clean-architecture
Want to master Modular Monoliths? Go here: https://dub.sh/modular-monolith
Join the .NET Architects Club: https://www.skool.com/mj-tech-community-5418/about
Get the 2026 .NET Developer roadmap here → https://the-dotnet-weekly.ck.page/2026-roadmap

Long-running API requests can quietly destroy your API's scalability. At first, the solution seems obvious: accept the request, do the work, return the result. But what happens when that work takes several minutes? Your users are left waiting, your API is holding connections open, and during traffic spikes your server can quickly run out of capacity just handling slow requests.

In this video, I'll show you the right way to scale long-running API requests using a step-by-step architecture progression. We'll start with the naive synchronous approach, then improve it by returning 202 Accepted, storing work in a jobs table, processing jobs in the background, and eventually introducing a queue with competing consumers.

You'll learn how to:
- Avoid making users wait for slow API requests
- Use 202 Accepted correctly
- Track long-running work with a jobs table
- Move processing into background workers
- Use queues to smooth out traffic spikes
- Scale background processing independently from your API
- Notify users when processing is complete
- Handle retries, failures, and dead-letter scenarios

This design is especially useful for reports, large file processing, external API workflows, heavy database operations, and any API request that doesn't need to return the final result immediately.

Source code is available in the pinned comment.

Check out my courses: https://www.milanjovanovic.tech/courses
Read my blog here: https://www.milanjovanovic.tech/blog
Join my weekly .NET newsletter: https://www.milanjovanovic.tech

Chapters
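The "return 202 Accepted and track work in a jobs table" step can be sketched roughly as follows. This is a minimal in-memory Python sketch (the video itself targets .NET); the function names, the `jobs` dictionary, and the `/jobs/{id}` status URL are illustrative assumptions, not taken from the video, and a real system would persist jobs in a database:

```python
import uuid

# Illustrative in-memory stand-in for a jobs table; a real API would use a database.
jobs: dict = {}

def submit_report_request(payload: dict):
    """Accept the request immediately and hand back 202 with a status URL.

    The slow work is NOT done here; it is recorded as a pending job
    for a background worker to pick up later.
    """
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "payload": payload, "result": None}
    # 202 Accepted: the work is queued, not finished. The client polls
    # the Location URL (or gets notified) to learn when it completes.
    return 202, {"Location": f"/jobs/{job_id}"}

def get_job_status(job_id: str):
    """Status endpoint the client polls until the job is done."""
    job = jobs.get(job_id)
    if job is None:
        return 404, {}
    return 200, {"status": job["status"], "result": job["result"]}
```

The key design point is that the API's only job is to record the request and answer quickly; the connection is released in milliseconds instead of being held for minutes.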
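The "queue with competing consumers" step can be sketched with a shared queue and several workers pulling from it, so traffic spikes drain in parallel. This Python sketch uses threads and `queue.Queue` purely for illustration; a production system would use a durable message broker, and `processed = item * 2` is a placeholder for the real slow job:

```python
import queue
import threading

def run_competing_consumers(items, worker_count=3):
    """Several workers compete for jobs on one shared queue.

    Adding workers scales background processing independently
    from the API that enqueued the work.
    """
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = work.get()
            if item is None:          # sentinel: shut this worker down
                work.task_done()
                return
            processed = item * 2      # placeholder for the slow job
            with lock:
                results.append(processed)
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for item in items:
        work.put(item)
    for _ in threads:                 # one sentinel per worker
        work.put(None)
    work.join()
    for t in threads:
        t.join()
    return sorted(results)
```

Because each job is taken by exactly one consumer, the queue also acts as a buffer: a burst of requests piles up in the queue instead of overwhelming the workers.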
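The "retries, failures, and dead-letter scenarios" bullet can be sketched as a bounded-retry loop: a job that keeps failing is moved aside to a dead-letter collection instead of being retried forever. All names here are illustrative, and the dead-letter list stands in for a real dead-letter queue:

```python
def process_with_retries(job, handler, max_attempts=3, dead_letter=None):
    """Run handler(job), retrying on failure up to max_attempts times.

    If every attempt fails, the job is appended to dead_letter for
    later inspection and the function returns None instead of raising.
    """
    if dead_letter is None:
        dead_letter = []
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(job)
        except Exception:
            if attempt == max_attempts:
                # Retries exhausted: park the job instead of losing it
                # or blocking the worker forever.
                dead_letter.append(job)
                return None
```

Capping retries matters because a poison message that always fails would otherwise occupy a worker indefinitely; dead-lettering keeps the pipeline moving while preserving the failed job.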