Operating Lambda: Application design and Service Quotas

vipul kumar
Feb 17, 2021


In this post, we will discuss how to work with service quotas, when to request an increase, and how to architect with quotas in mind.

Understanding Quotas

Service quotas are guardrails that protect your account and the workloads of other applications. Every AWS service has quotas. Some are hard limits, which you cannot change. Others are soft limits, for which you can request an increase.

Soft limits start at a default value that can be increased as your service grows. The conservative default protects your account from unexpected costs caused by unintended usage.

Quotas can apply at the account level or the Region level. Some also include time-interval restrictions, such as a maximum number of requests per second.
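A per-interval quota can be made concrete with a small example. The sketch below enforces a requests-per-second limit using a token bucket. `TokenBucket`, its parameters, and the injectable clock are hypothetical names chosen for illustration. The sketch shows the general idea of a rate quota. It is not a description of how any particular AWS service implements throttling.

```python
import time


class TokenBucket:
    """Minimal token-bucket limiter for a per-second quota (illustrative only)."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens refilled per second (steady-state limit)
        self.capacity = burst         # maximum tokens that can accumulate (burst limit)
        self.tokens = float(burst)    # start full, so an initial burst is allowed
        self.clock = clock            # injectable clock makes the sketch testable
        self.last = clock()

    def allow(self) -> bool:
        """Return True and consume a token if the request fits within the quota."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over quota: the caller should throttle or retry later


# Usage with a fake clock: 2 requests/second, burst of 2.
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=2, burst=2, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(3)])  # [True, True, False]
fake_now[0] = 0.5  # half a second later, one token has been refilled
print(bucket.allow())  # True
```

In this model, the burst capacity absorbs short spikes, and the refill rate sets the sustained limit.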

Architecting with Service Quotas

Once you know which AWS services your application will use, you can compare their quotas and find mismatches that could become bottlenecks.

For example, API Gateway has a default throttling quota of 10,000 requests per second, while Lambda has a default quota of 1,000 concurrent executions. Because API Gateway invokes Lambda synchronously, API Gateway can accept more incoming requests than Lambda can handle concurrently. In this case, we need to request an increase to the Lambda concurrency quota.
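Whether Lambda keeps up depends on the function's duration as well as the request rate. The concurrency a synchronous workload needs is roughly the requests per second multiplied by the average duration in seconds. The sketch below performs that back-of-the-envelope check. The function names are hypothetical. The default of 1,000 is the concurrency quota mentioned above.

```python
import math


def required_concurrency(requests_per_sec: float, avg_duration_sec: float) -> int:
    """Estimate concurrent executions as arrival rate x average duration."""
    return math.ceil(requests_per_sec * avg_duration_sec)


def check_concurrency(requests_per_sec: float,
                      avg_duration_sec: float,
                      concurrency_quota: int = 1000) -> tuple:
    """Return (needed_concurrency, fits_within_quota)."""
    needed = required_concurrency(requests_per_sec, avg_duration_sec)
    return needed, needed <= concurrency_quota


# API Gateway at its 10,000 rps quota, function averaging 0.5 s per request.
print(check_concurrency(10_000, 0.5))  # (5000, False) -> request a quota increase
# A lighter load of 1,500 rps with the same duration fits the default.
print(check_concurrency(1_500, 0.5))   # (750, True)
```

Shortening the function's duration lowers the needed concurrency just as effectively as reducing traffic does. This makes duration a second lever alongside a quota increase.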

Another example involves payload size. API Gateway supports payloads up to 10 MB, and Lambda's payload limit for synchronous invocations is 6 MB. However, the SQS maximum message size is a hard limit of 256 KB. To support larger requests, we can upload the payload to S3 and pass a reference to the object between services.
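Passing an S3 reference instead of the payload is often called the claim-check pattern. The sketch below assumes a plain dict stands in for S3 so that it runs without AWS access. In real code, the store and load steps would be calls such as boto3's `put_object` and `get_object`. `build_message`, `read_message`, and the bucket name `example-payload-bucket` are hypothetical. The size check is approximate because it ignores message attributes, which also count toward the SQS limit.

```python
import json
import uuid

SQS_MAX_MESSAGE_BYTES = 256 * 1024  # SQS message size limit cited above


def build_message(payload: dict, object_store: dict,
                  bucket: str = "example-payload-bucket") -> str:
    """Return a message body: the payload inline if it fits, else a reference to it."""
    inline = json.dumps({"inline": payload})
    if len(inline.encode("utf-8")) <= SQS_MAX_MESSAGE_BYTES:
        return inline
    # Too large for SQS: store the payload (stand-in for S3) and send a pointer.
    key = f"payloads/{uuid.uuid4()}.json"
    object_store[(bucket, key)] = json.dumps(payload)
    return json.dumps({"s3_ref": {"bucket": bucket, "key": key}})


def read_message(message: str, object_store: dict) -> dict:
    """Resolve a message body back into the original payload."""
    msg = json.loads(message)
    if "inline" in msg:
        return msg["inline"]
    ref = msg["s3_ref"]
    return json.loads(object_store[(ref["bucket"], ref["key"])])


store = {}
large = build_message({"blob": "x" * 300_000}, store)
print("s3_ref" in json.loads(large))                              # True
print(read_message(large, store) == {"blob": "x" * 300_000})      # True
```

Small payloads still travel inline. The extra storage round trip is paid only when the message would not fit.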

Load testing is a practical way to identify which service quotas are the limiting factor for your application.

Using multiple AWS accounts for managing quotas

Some quotas apply at the account level and are shared by every workload running in that account, so each workload gets only part of the quota. Also, if we use the same account for development and production traffic, a test workload can unintentionally exhaust the quotas that the production service depends on.

This can be avoided by using separate AWS accounts for different Regions and environments. For example, each developer can have their own AWS account for development and testing, with no risk of affecting the production environment.

Multiple AWS accounts can be managed with AWS Organizations, which lets you enforce policies across all the accounts that belong to the same organization.

Controlling traffic flow for server-based resources

If your serverless application makes requests to server-based resources, such as a SQL database on Amazon RDS, its ability to scale quickly can overwhelm those downstream resources.

For example, if your Lambda function connects to an RDS instance, a traffic spike can produce thousands of concurrent executions. Each execution opens its own connection to the database, and together they can overwhelm the RDS server.
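The problem can be quantified by comparing the connections the function could open with the database's connection limit. On engines such as MySQL and PostgreSQL, that limit is the `max_connections` setting. The sketch below uses hypothetical numbers and function names. The real limit depends on the engine, instance class, and parameter group.

```python
def connection_headroom(concurrent_executions: int,
                        db_max_connections: int,
                        connections_per_execution: int = 1) -> int:
    """Connections left over; a negative value means the database is overwhelmed."""
    # Assumption: each concurrent execution holds its own connection(s).
    demand = concurrent_executions * connections_per_execution
    return db_max_connections - demand


def max_safe_concurrency(db_max_connections: int,
                         reserved_for_others: int = 0,
                         connections_per_execution: int = 1) -> int:
    """Highest concurrency that stays within the database's connection limit."""
    available = max(0, db_max_connections - reserved_for_others)
    return available // connections_per_execution


# Hypothetical: default 1,000 concurrency vs. a database allowing 400 connections.
print(connection_headroom(1000, 400))                     # -600
print(max_safe_concurrency(400, reserved_for_others=50))  # 350
```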

Hence, when working with server-based resources, APIs, or third-party services, it is important to know their limits on connections, transactions, and data transfer. If your serverless application can overwhelm those resources, consider using an SQS queue to decouple the two, so that the downstream resource receives requests at a rate it can handle.
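A queue turns a traffic spike into a backlog. The downstream resource sees at most as many concurrent requests as there are consumers, and the burst simply takes longer to process. The toy, in-memory simulation below illustrates that trade-off and makes no AWS calls. `drain_queue`, the tick-based model, and the numbers are hypothetical. In a real deployment, the consumer count would correspond to the concurrency of the Lambda function reading from the queue. That concurrency needs its own cap (for example, reserved concurrency) for the queue to actually protect the database.

```python
from collections import deque


def drain_queue(burst_size: int, max_consumers: int) -> tuple:
    """Simulate a queue absorbing a burst while a fixed consumer pool drains it.

    Returns (ticks_taken, peak_concurrent_downstream_requests).
    """
    queue = deque(range(burst_size))  # every request is buffered; none are rejected
    ticks, peak = 0, 0
    while queue:
        # Each tick, at most `max_consumers` messages are processed, so at most
        # that many downstream connections are open at the same time.
        batch = [queue.popleft() for _ in range(min(max_consumers, len(queue)))]
        peak = max(peak, len(batch))
        ticks += 1
    return ticks, peak


# A 5,000-request burst drained by 350 consumers (the safe cap from the database).
print(drain_queue(burst_size=5000, max_consumers=350))  # (15, 350)
```

The design choice is latency for safety. Requests wait in the queue instead of failing, and the database never sees more than the chosen number of concurrent connections.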