Google Kubernetes Engine (GKE)
This document provides a detailed overview of our application hosting architecture on Google Kubernetes Engine (GKE). It covers our rationale for choosing GKE, the cluster setup, networking, application configuration, and deployment strategies.
History and Rationale
Before 2020, our application was deployed on managed AWS EC2 hosts using the Capistrano gem and the Passenger web server. As part of a major modernization effort, we migrated to a container-based workflow using Docker and Kubernetes.
The primary goals of this migration were to achieve:
- Cloud Independence: Avoid vendor lock-in and create a more portable infrastructure.
- Scalability: Easily handle fluctuations in traffic and load.
- Reliability: Ensure high availability and fault tolerance for our services.
Kubernetes inherently provides these benefits. Scalability is achieved through horizontal pod autoscaling, which automatically adjusts the number of running application instances based on resource utilization, and node autoscaling, which adjusts the size of the cluster itself. Reliability is built-in through features like self-healing (restarting failed containers), rolling updates for zero-downtime deployments, and sophisticated health checks.
We chose GKE as our managed Kubernetes provider because Kubernetes originated at Google, so GKE offers first-class support and deep integration with the Google Cloud ecosystem. This allows us to focus more on our application and less on managing the underlying infrastructure.
GKE Cluster Architecture
All our environments, from QA to Production, run within a single GKE cluster to simplify management. The cluster is configured with high availability and cost-effectiveness in mind.
- Region and Zones: We currently operate in the `us-east1` region, with our nodes distributed across 3 availability zones for high availability. We may evaluate a multi-region setup in the future if necessary.
- Node Pools: We utilize two distinct node pools:
  - `production-general`: A dedicated pool for our production workloads, ensuring they have isolated, stable resources.
  - `preemptible-pool`: A cost-effective pool using preemptible VMs for all our QA and staging environments.
- Node Configuration:
  - Machine Type: `e2-standard-4` (4 vCPU, 16 GB memory).
  - Image Type: Container-Optimized OS with `containerd`, a secure and efficient operating system designed for running containers.
  - Node Autoscaling: Enabled for both node pools, allowing the cluster to automatically add or remove nodes based on overall resource demand.
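GKE labels every node with its pool name via the `cloud.google.com/gke-nodepool` label, so pinning a workload to a pool only requires a `nodeSelector`. A minimal sketch (the deployment name is illustrative; only the `nodeSelector` reflects our actual setup):

```yaml
# Deployment fragment pinning pods to the production node pool.
# GKE applies the cloud.google.com/gke-nodepool label automatically.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api          # illustrative name
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: production-general
```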
Networking and Load Balancing
We leverage GKE Ingress for all external load balancing. This approach provides several key benefits out-of-the-box:
- Managed Services: We get easy access to powerful Google Cloud services like Cloud CDN for caching static assets closer to our users, and Cloud Armor for DDoS protection and a web application firewall.
- Automated SSL: GKE automatically provisions and renews SSL certificates for our domains (`www.catercow.com`, `qa.catercow.com`) via the `ManagedCertificate` resource.
- Container-Native Routing: The Ingress uses container-native load balancing, routing traffic directly to the appropriate service pods based on the URL path (`/api`, `/caterer`, `/customer/`, etc.) defined in `ingress.yaml`. This eliminates the need for an extra proxy layer, simplifying the architecture and reducing latency. All HTTP traffic is also automatically redirected to HTTPS.
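The path-based rules in `ingress.yaml` can be sketched roughly as follows. The hostnames and domains match this document, but the resource names, service names, and ports are illustrative assumptions:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main-ingress            # illustrative name
  annotations:
    # Attach the Google-managed certificate defined below.
    networking.gke.io/managed-certificates: main-cert
spec:
  rules:
    - host: www.catercow.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api        # service name and port are illustrative
                port:
                  number: 80
          - path: /caterer
            pathType: Prefix
            backend:
              service:
                name: nuxt-caterer
                port:
                  number: 80
---
# The ManagedCertificate resource covering our domains.
apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: main-cert
spec:
  domains:
    - www.catercow.com
    - qa.catercow.com
```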
The `service.yaml` file defines `NodePort` services for our applications, which the Ingress then exposes externally according to its routing rules.
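A `NodePort` service of the kind defined in `service.yaml` might look like this; the name, label, and ports are illustrative, while the `cloud.google.com/neg` annotation is how GKE enables container-native load balancing via network endpoint groups:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api                      # illustrative name
  annotations:
    # Route Ingress traffic directly to pod IPs (container-native LB).
    cloud.google.com/neg: '{"ingress": true}'
spec:
  type: NodePort
  selector:
    app: api                     # illustrative label
  ports:
    - port: 80
      targetPort: 3000           # illustrative container port
```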
Application Deployment on GKE
Our platform is composed of several microservices, each deployed as a separate workload in Kubernetes.
Production Resource Allocation
Below are the CPU and memory resources allocated to our main production services. All workloads are scheduled to run on the production-general node pool. We currently provision more resources than necessary, as the cost is minimal, and it provides a large buffer for traffic spikes.
| Service | Replica Count (Min-Max) | CPU Request | Memory Request | CPU Limit | Memory Limit |
|---|---|---|---|---|---|
| API | 3-10 (Autoscaled) | 500m | 3 Gi | Not Set | 4 Gi |
| Sidekiq | 1 | 1 | 2 Gi | 3 | 8 Gi |
| Nuxt Public | 2-4 (Autoscaled) | 500m | 250 Mi | Not Set | 1 Gi |
| Nuxt Caterer | 2 | 500m | 250 Mi | Not Set | 1 Gi |
| Nuxt Admin | 2 | 500m | 250 Mi | Not Set | 1 Gi |
| Nuxt Marketing | 2-6 (Autoscaled) | 500m | 250 Mi | Not Set | 2 Gi |
| Expo Attendee | 2 | 250m | 250 Mi | Not Set | 1 Gi |
Once the remaining SEO pages have been moved from Nuxt Public to Nuxt Marketing, all Nuxt 2 apps will be deployed statically as SPAs and served alongside the expo web export in a single nginx container for simplicity.
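As a concrete example, the API row of the table above translates into a container `resources` block like this (the container name and image are illustrative):

```yaml
# Container fragment matching the API row: 500m CPU / 3Gi memory requested,
# no CPU limit, 4Gi memory limit.
containers:
  - name: api                          # illustrative name
    image: gcr.io/example/api:latest   # illustrative image
    resources:
      requests:
        cpu: 500m
        memory: 3Gi
      limits:
        memory: 4Gi   # CPU limit intentionally not set
```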
Scalability and Reliability Features
Our GKE architecture includes several key features to ensure our applications scale efficiently and run reliably with high availability.
Autoscaling
To handle fluctuating traffic demands automatically, we use the Horizontal Pod Autoscaler (HPA) for our key front-facing services. The HPA automatically increases or decreases the number of running pods based on CPU load.
- The `api` deployment scales between 3 and 10 pods.
- The `nuxt-public` deployment (a Nuxt 2 application) scales between 2 and 4 pods.
- The `nuxt3-marketing` deployment scales between 2 and 6 pods.
- For all autoscaled services, a new pod is added when the average CPU utilization across all pods exceeds 90%.
Other services, like `nuxt-caterer` and `nuxt-admin`, run with a fixed replica count of 2 for redundancy. The `sidekiq` replica count is intentionally kept at 1 to prevent its cron jobs from being scheduled multiple times.
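The `api` scaling policy described above corresponds to a HorizontalPodAutoscaler roughly like this (the metadata name is illustrative; the replica bounds and the 90% CPU target come from this document):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api                 # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 90   # add pods above 90% average CPU
```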
Health Checks and Reliability
We have implemented several mechanisms to ensure our services are robust and deployments are seamless.
- Readiness Probes: Every application deployment has a `readinessProbe` configured. This probe checks whether an application is ready to accept traffic before the service routes requests to it. This is crucial for achieving zero-downtime deployments, as it prevents traffic from being sent to a pod that is still starting up.
- Graceful Shutdown for Sidekiq: The `sidekiq` deployment is configured for graceful termination. It has a `terminationGracePeriodSeconds` of 60 seconds and a `preStop` lifecycle hook that quiets the workers. This ensures any long-running background jobs can finish safely before the pod is shut down during a deployment or node event, which is critical for data integrity.
- Pod Affinity: The `nuxt3-marketing` deployment is configured with a pod affinity rule. This rule ensures that its pods are scheduled in the same availability zone as pods from our `api` deployment, potentially reducing network latency between these two closely related services.
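The Sidekiq termination settings might be sketched as follows. The grace period matches this document; the container details are illustrative, and the quiet command is an assumption (recent Sidekiq versions quiet workers on a TSTP signal, after which in-flight jobs can finish before SIGTERM arrives):

```yaml
# Pod spec fragment for the sidekiq deployment.
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: sidekiq                      # illustrative name
      image: gcr.io/example/api:latest   # illustrative image
      lifecycle:
        preStop:
          exec:
            # Assumption: quiet Sidekiq via SIGTSTP so it stops picking
            # up new jobs while finishing the ones already running.
            command: ["sh", "-c", "kill -TSTP 1"]
```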
Deployment and Automation
Our deployment process is designed to be consistent, repeatable, and fully automated.
- Helm for Deployments: We use Helm to package and manage our Kubernetes applications. All our Kubernetes manifest files (`deployment.yaml`, `service.yaml`, etc.) are Helm templates.
- Environment Configuration: Instead of maintaining separate sets of manifests for each environment, we use a single set of templates. Environment-specific configurations (like replica counts, domain names, and resource limits) are passed in via values files (`values.production.yaml`, `values.qa.yaml`). This approach ensures that our application is deployed in the exact same way across all environments, giving us high confidence that changes tested in staging will work in production.
- CI/CD Pipeline: Deployments are fully automated via Semaphore CI. We do not make manual changes using the GKE dashboard; it is used for monitoring purposes only.
- Disaster Recovery: While we rely on Semaphore, it is not a hard dependency. In the event of an outage, deployments can be run manually by an engineer with the appropriate permissions using the same Helm commands and scripts from their local machine.
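The single-template approach can be sketched with a hypothetical values file and the template lines that consume it (all keys and values here are illustrative, not our actual configuration):

```yaml
# values.production.yaml (illustrative keys)
api:
  minReplicas: 3
  maxReplicas: 10
domain: www.catercow.com
---
# A template under templates/ would then reference these values;
# the Helm expressions are shown as comments to keep this valid YAML:
#   minReplicas: {{ .Values.api.minReplicas }}
#   maxReplicas: {{ .Values.api.maxReplicas }}
#   host: {{ .Values.domain }}
```

A deploy then selects the environment purely by values file, e.g. `helm upgrade --install app ./chart -f values.production.yaml`, which is what makes manual disaster-recovery deploys identical to the CI ones.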