Google Kubernetes Engine (GKE)
This document provides a detailed overview of our application hosting architecture on Google Kubernetes Engine (GKE). It covers our rationale for choosing GKE, the cluster setup, networking, application configuration, and deployment strategies.
History and Rationale
Before 2020, our application was deployed on managed AWS EC2 hosts using the Capistrano gem and the Passenger web server. As part of a major modernization effort, we migrated to a container-based workflow using Docker and Kubernetes.
The primary goals of this migration were to achieve:
- Cloud Independence: Avoid vendor lock-in and create a more portable infrastructure.
- Scalability: Easily handle fluctuations in traffic and load.
- Reliability: Ensure high availability and fault tolerance for our services.
Kubernetes inherently provides these benefits. Scalability is achieved through horizontal pod autoscaling, which automatically adjusts the number of running application instances based on resource utilization, and node autoscaling, which adjusts the size of the cluster itself. Reliability is built-in through features like self-healing (restarting failed containers), rolling updates for zero-downtime deployments, and sophisticated health checks.
We chose GKE as our managed Kubernetes provider because Kubernetes originated at Google, so GKE offers first-class support and deep integration with the Google Cloud ecosystem. This allows us to focus more on our application and less on managing the underlying infrastructure.
GKE Cluster Architecture
All our environments, from QA to Production, run within a single GKE cluster to simplify management. The cluster is configured with high availability and cost-effectiveness in mind.
- Region and Zones: We currently operate in the `us-east1` region, with our nodes distributed across 3 availability zones for high availability. We may evaluate a multi-region setup in the future if necessary.
- Node Pools: We utilize two distinct node pools:
  - `production-general`: A dedicated pool for our production workloads, ensuring they have isolated, stable resources.
  - `preemptible-pool`: A cost-effective pool using preemptible VMs for all our QA and staging environments.
- Node Configuration:
  - Machine Type: `e2-standard-4` (4 vCPU, 16 GB memory).
  - Image Type: Container-Optimized OS with `containerd`, a secure and efficient operating system designed for running containers.
  - Node Autoscaling: Enabled for both node pools, allowing the cluster to automatically add or remove nodes based on overall resource demand.
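GKE labels every node with its pool name via the `cloud.google.com/gke-nodepool` label, so pinning a workload to a pool only requires a `nodeSelector`. A minimal sketch (the deployment name is illustrative; only the `nodeSelector` reflects our actual setup):

```yaml
# Deployment fragment pinning pods to the production node pool.
# GKE applies the cloud.google.com/gke-nodepool label automatically.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api          # illustrative name
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: production-general
```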
Networking and Load Balancing
We leverage GKE Ingress for all external load balancing. This approach provides several key benefits out-of-the-box:
- Managed Services: We get easy access to powerful Google Cloud services like Cloud CDN for caching static assets closer to our users, and Cloud Armor for DDoS protection and a web application firewall.
- Automated SSL: GKE automatically provisions and renews SSL certificates for our domains (`www.catercow.com`, `qa.catercow.com`) via the `ManagedCertificate` resource.
- Container-Native Routing: The Ingress uses container-native load balancing, routing traffic directly to the appropriate service pods based on the URL path (`/api`, `/caterer`, `/customer/`, etc.) defined in `ingress.yaml`. This eliminates the need for an extra proxy layer, simplifying the architecture and reducing latency. All HTTP traffic is also automatically redirected to HTTPS.
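The path-based rules in `ingress.yaml` can be sketched roughly as follows. The hostnames and domains match this document, but the resource names, service names, and ports are illustrative assumptions:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main-ingress            # illustrative name
  annotations:
    # Attach the Google-managed certificate defined below.
    networking.gke.io/managed-certificates: main-cert
spec:
  rules:
    - host: www.catercow.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api        # service name and port are illustrative
                port:
                  number: 80
          - path: /caterer
            pathType: Prefix
            backend:
              service:
                name: nuxt-caterer
                port:
                  number: 80
---
# The ManagedCertificate resource covering our domains.
apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: main-cert
spec:
  domains:
    - www.catercow.com
    - qa.catercow.com
```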
The `service.yaml` file defines `NodePort` services for our applications, which the Ingress then exposes externally according to its routing rules.
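A `NodePort` service of the kind defined in `service.yaml` might look like this; the name, label, and ports are illustrative, while the `cloud.google.com/neg` annotation is how GKE enables container-native load balancing via network endpoint groups:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api                      # illustrative name
  annotations:
    # Route Ingress traffic directly to pod IPs (container-native LB).
    cloud.google.com/neg: '{"ingress": true}'
spec:
  type: NodePort
  selector:
    app: api                     # illustrative label
  ports:
    - port: 80
      targetPort: 3000           # illustrative container port
```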
Application Deployment on GKE
Our platform is composed of several microservices, each deployed as a separate workload in Kubernetes.
Production Resource Allocation
Below are the CPU and memory resources allocated to our main production services. All workloads are scheduled to run on the production-general node pool. We currently provision more resources than necessary, as the cost is minimal, and it provides a large buffer for traffic spikes.
| Service | Replica Count (Min-Max) | CPU Request | Memory Request | CPU Limit | Memory Limit |
|---|---|---|---|---|---|
| API | 3-10 (Autoscaled) | 500m | 3 Gi | Not Set | 4 Gi |
| Sidekiq | 1 | 1 | 2 Gi | 3 | 8 Gi |
| Nuxt Public | 2-4 (Autoscaled) | 500m | 250 Mi | Not Set | 1 Gi |
| Nuxt Caterer | 2 | 500m | 250 Mi | Not Set | 1 Gi |
| Nuxt Admin | 2 | 500m | 250 Mi | Not Set | 1 Gi |
| Nuxt Marketing | 2-6 (Autoscaled) | 500m | 250 Mi | Not Set | 2 Gi |
| Expo Attendee | 2 | 250m | 250 Mi | Not Set | 1 Gi |
Once the remaining SEO pages have been moved from Nuxt Public to Nuxt Marketing, all Nuxt 2 apps will be deployed statically as SPAs and served alongside the expo web export in a single nginx container for simplicity.
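As a concrete example, the API row of the table above translates into a container `resources` block like this (the container name and image are illustrative):

```yaml
# Container fragment matching the API row: 500m CPU / 3Gi memory requested,
# no CPU limit, 4Gi memory limit.
containers:
  - name: api                          # illustrative name
    image: gcr.io/example/api:latest   # illustrative image
    resources:
      requests:
        cpu: 500m
        memory: 3Gi
      limits:
        memory: 4Gi   # CPU limit intentionally not set
```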
Scalability and Reliability Features
Our GKE architecture includes several key features to ensure our applications scale efficiently and run reliably with high availability.
Autoscaling
To handle fluctuating traffic demands automatically, we use the Horizontal Pod Autoscaler (HPA) for our key front-facing services. The HPA automatically increases or decreases the number of running pods based on CPU load.
- The `api` deployment scales between 3 and 10 pods.
- The `nuxt-public` deployment (a Nuxt 2 application) scales between 2 and 4 pods.
- The `nuxt3-marketing` deployment scales between 2 and 6 pods.
- For all autoscaled services, a new pod is added when the average CPU utilization across all pods exceeds 90%.
Other services, like `nuxt-caterer` and `nuxt-admin`, run with a fixed replica count of 2 for redundancy. The `sidekiq` replica count is intentionally kept at 1 to prevent its cron jobs from being scheduled multiple times.
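The `api` scaling policy described above corresponds to a HorizontalPodAutoscaler roughly like this (the metadata name is illustrative; the replica bounds and the 90% CPU target come from this document):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api                 # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 90   # add pods above 90% average CPU
```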
Health Checks and Reliability
We have implemented several mechanisms to ensure our services are robust and deployments are seamless.
- Readiness Probes: Every application deployment has a `readinessProbe` configured. This probe checks whether an application is ready to accept traffic before the service routes requests to it. This is crucial for achieving zero-downtime deployments, as it prevents traffic from being sent to a pod that is still starting up.
- Graceful Shutdown for Sidekiq: The `sidekiq` deployment is configured for graceful termination. It has a `terminationGracePeriodSeconds` of 60 seconds and a `preStop` lifecycle hook that quiets the workers. This ensures any long-running background jobs can finish safely before the pod is shut down during a deployment or node event, which is critical for data integrity.
- Pod Affinity: The `nuxt3-marketing` deployment is configured with a pod affinity rule. This rule ensures that its pods are scheduled in the same availability zone as pods from our `api` deployment, potentially reducing network latency between these two closely related services.
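The Sidekiq termination settings might be sketched as follows. The grace period matches this document; the container details are illustrative, and the quiet command is an assumption (recent Sidekiq versions quiet workers on a TSTP signal, after which in-flight jobs can finish before SIGTERM arrives):

```yaml
# Pod spec fragment for the sidekiq deployment.
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: sidekiq                      # illustrative name
      image: gcr.io/example/api:latest   # illustrative image
      lifecycle:
        preStop:
          exec:
            # Assumption: quiet Sidekiq via SIGTSTP so it stops picking
            # up new jobs while finishing the ones already running.
            command: ["sh", "-c", "kill -TSTP 1"]
```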
Deployment and Automation
Our deployment process is designed to be consistent, repeatable, and fully automated.
- Helm for Deployments: We use Helm to package and manage our Kubernetes applications. All our Kubernetes manifest files (`deployment.yaml`, `service.yaml`, etc.) are Helm templates.
- Environment Configuration: Instead of maintaining separate sets of manifests for each environment, we use a single set of templates. Environment-specific configurations (like replica counts, domain names, and resource limits) are passed in via values files (`values.production.yaml`, `values.qa.yaml`). This approach ensures that our application is deployed in the exact same way across all environments, giving us high confidence that changes tested in staging will work in production.
- CI/CD Pipeline: Deployments are fully automated via Semaphore CI. We do not make manual changes using the GKE dashboard; it is used for monitoring purposes only.
- Disaster Recovery: While we rely on Semaphore, it is not a hard dependency. In the event of an outage, deployments can be run manually by an engineer with the appropriate permissions using the same Helm commands and scripts from their local machine.
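The single-template approach can be sketched with a hypothetical values file and the template lines that consume it (all keys and values here are illustrative, not our actual configuration):

```yaml
# values.production.yaml (illustrative keys)
api:
  minReplicas: 3
  maxReplicas: 10
domain: www.catercow.com
---
# A template under templates/ would then reference these values;
# the Helm expressions are shown as comments to keep this valid YAML:
#   minReplicas: {{ .Values.api.minReplicas }}
#   maxReplicas: {{ .Values.api.maxReplicas }}
#   host: {{ .Values.domain }}
```

A deploy then selects the environment purely by values file, e.g. `helm upgrade --install app ./chart -f values.production.yaml`, which is what makes manual disaster-recovery deploys identical to the CI ones.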