fix(redis): prevent false rate limits and code execution failures during redis outages#3289
Merged
waleedlatif1 merged 1 commit into staging on Feb 21, 2026
Contributor
Greptile Summary
This PR implements critical resilience improvements to prevent false rate limits and code execution failures during Redis outages. The changes shift from fail-closed to fail-open behavior across three key systems:
Key Changes:
Testing:
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
    participant App as Application
    participant Redis as Redis Health Check
    participant RateLimit as Rate Limiter
    participant Storage as Storage Factory
    participant IVM as IVM Distributed Lease
    participant PG as PostgreSQL

    Note over Redis: Every 30s PING check
    Redis->>Redis: PING fails (3 consecutive)
    Redis->>Redis: Force disconnect(true)
    Redis->>Storage: Notify reconnect listeners
    Storage->>Storage: Clear cached adapter

    App->>RateLimit: Check rate limit
    RateLimit->>Storage: Get storage adapter
    alt Redis unavailable
        Storage->>PG: Fall back to DbTokenBucket
        Storage-->>RateLimit: Return DB adapter
    else Redis available
        Storage-->>RateLimit: Return Redis adapter
    end
    alt Storage error during check
        RateLimit-->>App: Fail open (allow=true)
    else Storage success
        RateLimit-->>App: Return actual limit result
    end

    App->>IVM: Execute code
    IVM->>IVM: Try acquire distributed lease
    alt Redis unavailable
        IVM->>IVM: Fall back to local execution
        IVM-->>App: Execute with local pool limits
    else Redis error
        IVM->>IVM: Fall back to local execution
        IVM-->>App: Execute with local pool limits
    else Lease acquired
        IVM-->>App: Execute normally
    end
Last reviewed commit: 6e66168
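The health-check portion of the diagram (a periodic PING, a forced disconnect after 3 consecutive failures, then a notification to reconnect listeners) could be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `RedisHealthTracker`, `recordPing`, `onReconnect`, and the injected `forceDisconnect` callback are all hypothetical names, and the 30s timer that would drive `recordPing` is left to the caller.

```typescript
// Hypothetical sketch; names are illustrative and not taken from the PR's diff.
type ReconnectListener = () => void;

class RedisHealthTracker {
  private consecutiveFailures = 0;
  private readonly listeners: ReconnectListener[] = [];
  private readonly forceDisconnect: () => void;
  private readonly maxFailures: number;

  // forceDisconnect stands in for whatever tears down the stale connection.
  constructor(forceDisconnect: () => void, maxFailures = 3) {
    this.forceDisconnect = forceDisconnect;
    this.maxFailures = maxFailures;
  }

  // Register a callback, e.g. a storage factory clearing its cached adapter.
  onReconnect(listener: ReconnectListener): void {
    this.listeners.push(listener);
  }

  // Call once per health-check tick with the outcome of the PING.
  recordPing(ok: boolean): void {
    if (ok) {
      // Any success resets the streak; only consecutive failures count.
      this.consecutiveFailures = 0;
      return;
    }
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures < this.maxFailures) {
      return;
    }
    // Threshold reached: reset the streak, drop the connection, notify listeners.
    this.consecutiveFailures = 0;
    this.forceDisconnect();
    for (const listener of this.listeners) {
      listener();
    }
  }
}
```

Counting only consecutive failures means a single slow PING does not tear down a healthy connection, while a connection that has gone stale is dropped after three ticks.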
Summary
Context
On Feb 21, a stale TCP connection between ECS and Redis Cloud caused all Redis commands to time out. Because the rate limiter failed closed and the IVM lease returned errors on Redis failures, all non-manual workflow executions were silently blocked, producing 329 false "Rate limit exceeded" errors and 248 "Code execution is temporarily unavailable" errors across 15+ workspaces. An ECS restart fixed the immediate issue. These changes prevent recurrence.
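To illustrate the difference between failing closed and failing open described above, here is a minimal sketch under stated assumptions. The interfaces and function names (`RateLimitStorage`, `checkRateLimit`, `acquireLeaseOrFallback`) are hypothetical and are not taken from this PR's diff. The case where a lease is legitimately denied because a limit was reached is not modeled.

```typescript
// Hypothetical shapes; none of these names come from the PR's diff.
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
}

interface RateLimitStorage {
  consume(key: string): Promise<RateLimitResult>;
}

// Fail open: a storage error is treated as "allowed" rather than
// surfacing as a false "Rate limit exceeded" error.
async function checkRateLimit(
  storage: RateLimitStorage,
  key: string,
): Promise<RateLimitResult> {
  try {
    return await storage.consume(key);
  } catch {
    return { allowed: true, remaining: 0 };
  }
}

type LeaseAttempt = "acquired" | "redis_unavailable";
type ExecutionMode = "distributed" | "local";

// Lease fallback: when Redis is unavailable or the attempt throws, run
// under local pool limits instead of rejecting the execution.
async function acquireLeaseOrFallback(
  tryAcquire: () => Promise<LeaseAttempt>,
): Promise<ExecutionMode> {
  try {
    const attempt = await tryAcquire();
    return attempt === "acquired" ? "distributed" : "local";
  } catch {
    return "local";
  }
}
```

The trade-off is that limits are briefly unenforced while storage is down. The incident above suggests that blocking every execution was the more costly failure.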
Type of Change
Testing
Checklist