Tuesday, 28 January 2025

Resiliency Patterns

 

Resiliency Patterns in Microservices

Resiliency patterns help microservices stay reliable and available when failures occur. They let a system handle faults gracefully, keep a failure in one service from cascading to others, and protect the user experience.


1. Circuit Breaker Pattern

When to Use?

  • When a service call is failing repeatedly.
  • When a dependent service is slow or unresponsive.
  • To prevent excessive retries that can overload the system.

Why Use It?

  • Prevents system overload by stopping calls to a failing service and failing fast instead.
  • Gives the failing service time to recover (for example, to restart) instead of flooding it with requests.

Example:

  • A payment service in an e-commerce system relies on a third-party payment gateway.
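  • If the gateway is down and failures cross a threshold, the Circuit Breaker trips (opens) and rejects further requests immediately, instead of letting each one wait and fail.
  • After a cooldown period, the Circuit Breaker lets a trial request through (half-open). If it succeeds, the breaker resets (closes) and normal traffic resumes.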

Tools: Netflix Hystrix (in maintenance mode since 2018), Resilience4j
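To make the closed, open, and half-open states concrete, here is a minimal single-threaded Python sketch. The names `CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, and `reset_timeout` are hypothetical and are not the Hystrix or Resilience4j API. The clock is injectable so the cooldown can be tested without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Single-threaded sketch of a closed -> open -> half-open circuit breaker."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.reset_timeout = reset_timeout          # seconds to stay open before a trial call
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at: Optional[float] = None      # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open, failing fast")
        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed trial call (half-open) or too many failures (closed) opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In the payment example, each gateway request would be wrapped as `breaker.call(lambda: gateway.charge(order))`, where `gateway.charge` is a hypothetical client call. While the circuit is open, this raises `CircuitOpenError` immediately instead of waiting on a dead gateway.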


2. Retry Pattern

When to Use?

  • When temporary failures occur due to network issues or rate limits.
  • When the failure is intermittent and expected to recover soon.

Why Use It?

  • Automatically retries failed operations instead of failing immediately.
  • Keeps temporary errors from affecting the user experience.

Example:

  • A weather app makes API calls to a weather provider.
  • If a request fails due to a network timeout, the app retries after a short delay (often increasing with each attempt) before showing an error to the user.

Tools: Spring Retry, Polly (.NET), Resilience4j
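A minimal Python sketch of retry with exponential backoff and jitter, assuming the operation is safe to repeat (see the Idempotency Pattern below). The `retry` function and its parameters are hypothetical and do not mirror the Spring Retry, Polly, or Resilience4j APIs. `sleep` is injectable so tests can record delays instead of waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(func: Callable[[], T], attempts: int = 3, base_delay: float = 0.2,
          retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying transient errors with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts):
        try:
            return func()
        except retry_on:
            # Delay doubles each time (0.2 s, 0.4 s, ...) plus up to 10% random jitter
            # so that many clients do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    return func()  # final attempt: any error now propagates to the caller
```

Only the listed transient exceptions are retried. Anything else, such as a bad-request error, fails immediately because retrying it would not help.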


3. Bulkhead Pattern

When to Use?

  • When different services or operations should be isolated to prevent cascading failures.
  • When multiple components share resources like threads or database connections.

Why Use It?

  • Prevents one failing service from taking down the entire system.
  • Ensures that critical services continue running even if non-critical ones fail.

Example:

  • A food delivery app has:
    • Order Service
    • Restaurant Search Service
    • User Profile Service
  • If Restaurant Search is overwhelmed with requests, a bulkhead caps the threads and connections it can use, so it cannot exhaust shared resources and the Order Service keeps processing orders.

Tools: Netflix Hystrix, Istio Service Mesh
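One common way to build a bulkhead inside a single process is a separate concurrency limit per dependency. The Python sketch below uses a semaphore for this. `Bulkhead`, `BulkheadFullError`, and the compartment sizes are hypothetical and chosen only for illustration. Istio applies similar limits at the network level rather than in application code.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class BulkheadFullError(Exception):
    """Raised when a compartment has no free slots."""


class Bulkhead:
    """Caps how many calls into one dependency may run concurrently."""

    def __init__(self, name: str, max_concurrent: int) -> None:
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func: Callable[[], T]) -> T:
        # Non-blocking acquire: reject immediately instead of queueing more threads.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError(f"{self.name} bulkhead is full")
        try:
            return func()
        finally:
            self._slots.release()


# One compartment per downstream service, sized independently (sizes are illustrative).
bulkheads = {
    "orders": Bulkhead("orders", max_concurrent=20),
    "search": Bulkhead("search", max_concurrent=5),
    "profile": Bulkhead("profile", max_concurrent=5),
}
```

When search is saturated, only search callers get `BulkheadFullError`. Order calls still find free slots in their own compartment.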


4. Fallback Pattern

When to Use?

  • When a dependent service is unavailable, but a default response can be provided.
  • When some functionality is better than complete failure.

Why Use It?

  • Improves user experience by providing a degraded but usable service.
  • Helps maintain system functionality during failures.

Example:

  • A flight booking system calls an external Seat Availability API.
  • If the API is down, a fallback response shows “Availability data is currently unavailable, please try again later” instead of an error.

Tools: Resilience4j, Spring Cloud Hystrix
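Here is a small Python sketch of a fallback for the flight-booking example. `with_fallback`, `fetch_seat_availability`, and `seat_availability_view` are hypothetical names, and the API client simply simulates an outage. The fallback returns "unknown" availability with the user-facing message instead of raising an error.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback: Callable[[Exception], T]) -> T:
    """Return primary()'s result, or fallback(error) if primary raises."""
    try:
        return primary()
    except Exception as exc:
        # In production you would typically log exc here before degrading.
        return fallback(exc)


def fetch_seat_availability(flight_id: str) -> dict:
    # Hypothetical client for the external Seat Availability API; simulates an outage.
    raise ConnectionError("Seat Availability API unreachable")


def seat_availability_view(flight_id: str) -> dict:
    return with_fallback(
        lambda: fetch_seat_availability(flight_id),
        lambda exc: {
            "flight": flight_id,
            "available": None,  # unknown, rather than a misleading "0 seats"
            "message": "Availability data is currently unavailable, please try again later",
        },
    )
```

Fallbacks are often combined with a circuit breaker. While the circuit is open, the fallback is served immediately.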


5. Timeout Pattern

When to Use?

  • When calling a service that may respond slowly.
  • To keep a request from hanging indefinitely.

Why Use It?

  • Stops slow services from tying up threads, connections, and other resources.
  • Prevents user frustration due to long waits.

Example:

  • A banking app requests a user's transaction history.
  • If the request takes longer than 3 seconds, it times out and returns a default response to avoid locking the user interface.

Tools: Spring Boot, Netflix Ribbon
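As a minimal Python sketch, assuming the slow call runs in a worker thread, a timeout can be enforced with `concurrent.futures`. `call_with_timeout` is a hypothetical helper. For the banking example it would be called with `timeout_s=3.0` and an empty history as the default. Many HTTP clients also let you set connect and read timeouts directly, which is usually preferable when available.

```python
import concurrent.futures as cf
from typing import Callable, TypeVar

T = TypeVar("T")

# A shared pool: a `with ThreadPoolExecutor()` block would wait for the slow call on exit,
# which defeats the timeout. Python threads cannot be killed, so a timed-out call keeps
# running in the background; the caller simply stops waiting for it.
_pool = cf.ThreadPoolExecutor(max_workers=8)


def call_with_timeout(func: Callable[[], T], timeout_s: float, default: T) -> T:
    """Return func()'s result if it finishes within timeout_s seconds, else default."""
    future = _pool.submit(func)
    try:
        return future.result(timeout=timeout_s)
    except cf.TimeoutError:
        future.cancel()  # only takes effect if the task has not started running yet
        return default
```

For example, `call_with_timeout(lambda: fetch_history(user_id), 3.0, default=[])`, where `fetch_history` is a hypothetical backend call, returns `[]` after at most about three seconds.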


6. Rate Limiting Pattern

When to Use?

  • To prevent excessive API requests from overloading services.
  • To enforce quota-based API consumption per client.

Why Use It?

  • Helps protect services from traffic spikes and abusive clients (it mitigates DDoS attacks but is not a complete defense on its own).
  • Ensures fair usage across clients.

Example:

  • A stock trading platform limits each user to 100 API calls per minute.
  • If a user exceeds the limit, they receive an error (typically HTTP 429 Too Many Requests) with the message: "Rate limit exceeded, try again later."

Tools: Kong API Gateway, AWS API Gateway, Nginx
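The stock-trading limit can be sketched in Python as a per-user token bucket. This is one common algorithm that averages 100 calls per minute while allowing short bursts, and it is not necessarily what the listed gateways use internally. `TokenBucketLimiter` is a hypothetical name. State is kept in memory for a single process, so several service instances would need a shared store.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-user token bucket: up to `capacity` requests, refilled evenly over `period_s`."""

    def __init__(self, capacity: int = 100, period_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_rate = capacity / period_s  # tokens added per second
        self.clock = clock
        # user_id -> (tokens remaining, time of last update); in-memory, not thread-safe.
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(user_id, (float(self.capacity), now))
        # Refill for the time elapsed since the last request, never above capacity.
        tokens = min(float(self.capacity), tokens + (now - last) * self.refill_rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self._buckets[user_id] = (tokens, now)
        return allowed
```

When `allow` returns `False`, the API layer would reject the request with the "Rate limit exceeded" error instead of forwarding it.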


7. Idempotency Pattern

When to Use?

  • To ensure duplicate requests don’t cause unintended side effects.
  • When dealing with financial transactions or order processing.

Why Use It?

  • Prevents accidental duplicate processing.
  • Ensures consistency even if a request is retried due to failures.

Example:

  • A payment service processes a request to charge $100.
  • If a network issue causes the client to retry the request, the system checks whether a request with the same idempotency key was already processed and, if so, returns the original result instead of charging twice.

Tools: Unique Request IDs, Idempotency Keys (Stripe, PayPal)
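A minimal Python sketch of server-side idempotency keys, assuming the client generates one key per logical payment (for example a UUID) and resends it with every retry. Stripe, for instance, accepts such a key in an `Idempotency-Key` request header. `IdempotentProcessor` and `charge_card` are hypothetical. A real service would persist results in a durable store rather than in a dictionary.

```python
import threading
import uuid
from typing import Callable, Dict, List


class IdempotentProcessor:
    """Runs an operation at most once per idempotency key and replays the stored result."""

    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}  # in-memory; a real service needs a durable store
        self._lock = threading.Lock()         # held for the whole call: simple but serializes work

    def process(self, key: str, operation: Callable[[], dict]) -> dict:
        with self._lock:
            if key in self._results:
                return self._results[key]  # duplicate request: return the original outcome
            result = operation()           # if this raises, nothing is stored and a retry may run it
            self._results[key] = result
            return result


charges: List[int] = []  # records real charges so duplicates are visible


def charge_card(amount_cents: int) -> dict:
    # Hypothetical payment-gateway call.
    charges.append(amount_cents)
    return {"charge_id": str(uuid.uuid4()), "amount_cents": amount_cents}
```

A retried $100 charge with the same key, `processor.process(key, lambda: charge_card(10_000))`, returns the first charge's result and leaves a single entry in `charges`.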


8. Shadow Traffic Testing Pattern

When to Use?

  • When testing a new service version without affecting real users.
  • When ensuring a system can handle increased load before deployment.

Why Use It?

  • Identifies potential failures before releasing changes.
  • Helps validate resiliency under real traffic conditions.

Example:

  • A ride-sharing app launches a new Matching Algorithm.
  • The new algorithm receives a mirrored copy of live requests alongside the existing system, but its results are discarded rather than returned, so real users are unaffected and engineers can compare the two safely.

Tools: AWS VPC Traffic Mirroring, Nginx request mirroring (ngx_http_mirror_module)


Final Thoughts

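Resiliency patterns help microservices handle failures effectively and maintain a seamless user experience. Here’s a quick summary of when to use each pattern:

  • Circuit Breaker: a dependency is failing repeatedly or is slow or unresponsive.
  • Retry: failures are temporary (network issues, rate limits) and likely to clear soon.
  • Bulkhead: components share threads or connections and one must not starve the others.
  • Fallback: a default or degraded response is better than an error.
  • Timeout: a call may respond slowly or hang.
  • Rate Limiting: clients could overload a service or must stay within quotas.
  • Idempotency: retried or duplicate requests must not repeat side effects, as with payments.
  • Shadow Traffic Testing: a new version must be validated against real traffic before release.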

By implementing these patterns, you can build a fault-tolerant, scalable, and robust microservices architecture.
