Cloud Architecture Frameworks

A comprehensive guide to cloud architecture frameworks and best practices across AWS, Azure, and Google Cloud Platform.


Purpose

Cloud architecture frameworks provide structured approaches to:

  • Design scalable, resilient, and secure cloud systems
  • Make informed architectural trade-offs
  • Leverage cloud provider best practices
  • Ensure operational excellence
  • Optimize costs

Well-Architected Frameworks

AWS, Azure, and Google Cloud each publish an architecture framework ("Well-Architected" in AWS and Azure terminology) organized around a set of core design pillars.

AWS Well-Architected Framework

Link: AWS Well-Architected Framework

Six Pillars:

1. Operational Excellence

Focus: Run and monitor systems to deliver business value and continually improve processes.

Key Practices:

  • Infrastructure as Code (IaC)
  • Frequent, small, reversible changes
  • Anticipate failure and learn from operational events
  • Runbooks and playbooks for operations

AWS Services: CloudFormation, Systems Manager, CloudWatch, X-Ray


2. Security

Focus: Protect information, systems, and assets while delivering business value.

Key Practices:

  • Implement strong identity foundation (IAM, least privilege)
  • Enable traceability (CloudTrail, Config, CloudWatch Logs)
  • Apply security at all layers
  • Automate security best practices
  • Protect data in transit and at rest
  • Prepare for security events

AWS Services: IAM, KMS, GuardDuty, Security Hub, WAF, Shield


3. Reliability

Focus: Ensure workloads perform intended functions correctly and consistently.

Key Practices:

  • Automatic recovery from failure
  • Test recovery procedures
  • Scale horizontally
  • Stop guessing capacity (use auto-scaling)
  • Manage change through automation

AWS Services: Auto Scaling, Multi-AZ deployments, Route 53, Elastic Load Balancing


4. Performance Efficiency

Focus: Use computing resources efficiently to meet requirements and maintain efficiency as demand changes.

Key Practices:

  • Democratize advanced technologies
  • Go global in minutes
  • Use serverless architectures
  • Experiment more often
  • Consider mechanical sympathy

AWS Services: Lambda, EC2 instance types, EBS volume types, CloudFront


5. Cost Optimization

Focus: Avoid unnecessary costs.

Key Practices:

  • Implement cloud financial management
  • Adopt a consumption model
  • Measure overall efficiency
  • Stop spending on undifferentiated heavy lifting
  • Analyze and attribute expenditure

AWS Services: Cost Explorer, Budgets, Reserved Instances, Savings Plans, S3 Intelligent-Tiering


6. Sustainability

Focus: Minimize environmental impact of cloud workloads.

Key Practices:

  • Understand your impact
  • Establish sustainability goals
  • Maximize utilization
  • Anticipate and adopt new, more efficient hardware and software
  • Use managed services
  • Reduce downstream impact

AWS Services: EC2 Auto Scaling, Graviton processors, S3 Intelligent-Tiering


Microsoft Azure Well-Architected Framework

Link: Azure Well-Architected Framework

Five Pillars:

1. Reliability

Focus: Ensure application can recover from failures and continue to function.

Key Practices:

  • Define availability and recovery targets (SLA, RTO, RPO)
  • Build redundancy and resilience
  • Design for scaling
  • Test disaster recovery

Azure Services: Availability Zones, Availability Sets, Azure Site Recovery, Traffic Manager


2. Security

Focus: Protect applications and data from threats.

Key Practices:

  • Plan security readiness
  • Design to protect confidentiality, integrity, availability
  • Embed security in all layers
  • Maintain governance and compliance

Azure Services: Microsoft Entra ID, Key Vault, Defender for Cloud, Azure Policy


3. Cost Optimization

Focus: Manage costs to maximize value delivered.

Key Practices:

  • Develop cost-management discipline
  • Design with cost-efficiency in mind
  • Optimize over time
  • Use monitoring and analytics

Azure Services: Cost Management, Advisor, Reserved Instances, Azure Hybrid Benefit


4. Operational Excellence

Focus: Keep application running in production reliably.

Key Practices:

  • Embrace DevOps culture
  • Establish development standards
  • Evolve operations with observability
  • Automate operations tasks

Azure Services: Azure Monitor, Application Insights, Azure Automation, Azure DevOps


5. Performance Efficiency

Focus: Adapt to changes in load efficiently.

Key Practices:

  • Define performance targets
  • Design for scalability
  • Optimize code, data, and infrastructure
  • Continuously monitor and optimize

Azure Services: Azure Monitor, Application Insights, Virtual Machine Scale Sets, Azure CDN


Google Cloud Architecture Framework

Link: Google Cloud Architecture Framework

Five Pillars:

1. Operational Excellence

Focus: Efficiently deploy, operate, monitor, and manage cloud workloads.

Key Practices:

  • Design for DevOps and SRE
  • Implement comprehensive monitoring and observability
  • Release and deploy with velocity and safety
  • Provision infrastructure with configuration management

GCP Services: Cloud Monitoring, Cloud Logging, Cloud Trace, Deployment Manager


2. Security, Privacy, and Compliance

Focus: Maximize security, ensure privacy, maintain compliance.

Key Practices:

  • Design with security in mind
  • Protect data in transit and at rest
  • Implement strong identity and access management
  • Log and monitor all access
  • Ensure compliance with regulations

GCP Services: Cloud IAM, Cloud KMS, Security Command Center, VPC Service Controls


3. Reliability

Focus: Design systems that are resilient and highly available.

Key Practices:

  • Design for high availability
  • Design for scale and growth
  • Design for resilient and durable data storage
  • Implement disaster recovery

GCP Services: Regional and multi-regional resources, Cloud Load Balancing, Cloud SQL, Cloud Storage


4. Cost Optimization

Focus: Maximize business value while minimizing costs.

Key Practices:

  • Plan for cost optimization from the start
  • Manage costs proactively
  • Optimize resource usage
  • Use committed use discounts and sustained use discounts

GCP Services: Cloud Billing, Recommender, Committed Use Discounts


5. Performance Optimization

Focus: Allocate and manage resources to meet performance requirements.

Key Practices:

  • Design for performance from the start
  • Monitor and measure performance
  • Optimize compute, storage, and network resources
  • Use caching and content delivery networks

GCP Services: Cloud CDN, Cloud Memorystore, Custom Machine Types, Premium Network Tier


Cloud Migration Strategies - The 6 R's

When migrating to the cloud, organisations typically follow one of six strategies:

1. Rehost ("Lift and Shift")

Description: Move applications to cloud without changes.

When to Use:

  • Quick migration required
  • Minimal business disruption needed
  • Skills gap in cloud-native development

Pros:

  • Fast migration
  • Low risk
  • Minimal changes

Cons:

  • Doesn't leverage cloud benefits
  • Higher long-term costs
  • Technical debt carried forward

Example: Move on-premises VM to EC2/Azure VM/Compute Engine with minimal modification.


2. Replatform ("Lift, Tinker, and Shift")

Description: Make minimal cloud optimizations without changing core architecture.

When to Use:

  • Want some cloud benefits without major redesign
  • Opportunity for easy optimizations exists

Pros:

  • Moderate cloud benefits
  • Relatively low risk
  • Faster than full refactor

Cons:

  • Partial cloud benefit realization
  • May require future refactoring

Example: Migrate database from on-premises SQL Server to Azure SQL Database (PaaS) instead of SQL on VM (IaaS).


3. Repurchase ("Drop and Shop")

Description: Replace existing application with cloud-native SaaS alternative.

When to Use:

  • SaaS alternative available and suitable
  • Want to exit custom software maintenance
  • Licensing costs high

Pros:

  • No infrastructure management
  • Automatic updates
  • Pay-as-you-go pricing

Cons:

  • Vendor lock-in
  • Data migration complexity
  • Customization limitations

Example: Replace on-premises Exchange with Microsoft 365, or on-premises CRM with Salesforce.


4. Refactor / Re-architect

Description: Redesign application to be cloud-native.

When to Use:

  • Need to add features, scale, performance
  • Want to maximize cloud benefits
  • Existing architecture has limitations

Pros:

  • Maximum cloud benefit
  • Improved scalability and resilience
  • Cost optimization opportunities

Cons:

  • Time-consuming
  • High cost upfront
  • Requires cloud-native skills

Example: Break monolithic application into microservices, use serverless (Lambda/Functions), containerize with Kubernetes.


5. Retire

Description: Decommission applications no longer needed.

When to Use:

  • Application redundant or unused
  • Functionality replaced by other systems
  • Cost of migration exceeds value

Pros:

  • Reduces complexity
  • Eliminates maintenance costs
  • Reduces attack surface

Example: Identify and shut down unused legacy applications discovered during migration assessment.


6. Retain (Revisit)

Description: Keep application on-premises for now.

When to Use:

  • Application requires major refactoring
  • Regulatory or compliance constraints
  • Not ready for cloud migration

Pros:

  • Defer decision to better time
  • Focus resources on high-value migrations
  • Avoid rushing critical systems

Example: Keep core banking system on-premises until cloud-ready replacement available.


Cloud-Native Architecture Principles

12-Factor App Methodology

Source: Originally created by Heroku, now widely adopted for cloud-native applications.

Link: 12factor.net

The 12 Factors:

  1. Codebase: One codebase tracked in version control, many deploys
  2. Dependencies: Explicitly declare and isolate dependencies
  3. Config: Store config in the environment (not in code)
  4. Backing Services: Treat backing services as attached resources
  5. Build, Release, Run: Strictly separate build and run stages
  6. Processes: Execute the app as one or more stateless processes
  7. Port Binding: Export services via port binding
  8. Concurrency: Scale out via the process model
  9. Disposability: Maximize robustness with fast startup and graceful shutdown
  10. Dev/Prod Parity: Keep development, staging, and production as similar as possible
  11. Logs: Treat logs as event streams
  12. Admin Processes: Run admin/management tasks as one-off processes
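
As a minimal sketch of factor 3 (Config), the snippet below reads settings from the environment instead of hardcoding them. The variable names `DATABASE_URL` and `LOG_LEVEL` and the `load_config` helper are hypothetical, chosen only for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    database_url: str
    log_level: str


def load_config(env=None):
    """Build a Config from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    # Required setting: fail fast at startup if it is missing.
    if "DATABASE_URL" not in env:
        raise RuntimeError("DATABASE_URL is not set")
    # Optional setting with a default.
    return Config(
        database_url=env["DATABASE_URL"],
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

The same build can then run unchanged in development, staging, and production, with only the environment differing between deploys.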

Microservices Architecture

Definition: An architectural style that structures an application as a collection of loosely coupled services.

Characteristics:

  • Services are small and focused on single business capability
  • Independently deployable
  • Organized around business capabilities
  • Decentralized governance and data management
  • Failure isolation

Benefits:

  • Independent scaling
  • Technology diversity
  • Fault isolation
  • Easier deployment and updates

Challenges:

  • Distributed system complexity
  • Inter-service communication overhead
  • Data consistency challenges
  • Testing complexity

When to Use: Large, complex applications requiring independent scaling and deployment of components.


Serverless Architecture

Definition: A cloud execution model in which the provider manages the infrastructure and runs code in response to events.

Characteristics:

  • No server management
  • Event-driven execution
  • Pay-per-execution pricing
  • Automatic scaling

Use Cases:

  • Event processing (file uploads, database changes)
  • APIs and web applications (via API Gateway)
  • Stream processing
  • Scheduled tasks (cron jobs)
  • Data transformation

AWS Services: Lambda, API Gateway, EventBridge, Step Functions

Azure Services: Functions, Logic Apps, Event Grid

GCP Services: Cloud Functions, Cloud Run, Eventarc

Benefits:

  • No infrastructure management
  • Cost-efficient for variable workloads
  • Automatic scaling
  • Built-in high availability

Challenges:

  • Cold start latency
  • Execution time limits
  • Vendor lock-in
  • Debugging complexity

Cloud Design Patterns

Resilience Patterns

Circuit Breaker

Problem: When a dependent service fails, callers keep sending it requests and the failure cascades.

Solution: Monitor calls for failures; once a threshold is exceeded, the circuit "opens" and subsequent requests fail fast. After a timeout, let a trial request through to check whether the service has recovered.

Implementation: AWS App Mesh, Azure Service Fabric, Spring Cloud Circuit Breaker
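
The open / fail-fast / trial-request cycle can be sketched in a few lines. This is an illustrative, single-threaded sketch, not how any of the products above implement it; the class name, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are all assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure counter.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outbound request, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is whatever function talks to the dependency, and treat `CircuitOpenError` as a signal to fall back or degrade.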


Retry with Exponential Backoff

Problem: Transient failures cause operations to fail.

Solution: Retry failed operations with increasing delays between retries.

Example: 1st retry after 1s, 2nd after 2s, 3rd after 4s, etc.
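
A minimal sketch of the 1s, 2s, 4s schedule above. The function name and parameters are illustrative assumptions; the optional jitter (a random wait up to the computed delay) is a common addition that keeps many clients from retrying in lockstep. Real code would usually catch only error types known to be transient rather than every `Exception`.

```python
import random
import time


def retry_with_backoff(func, max_retries=3, base_delay=1.0, jitter=False, sleep=time.sleep):
    """Call func; after failure number n (counting from 0) wait base_delay * 2**n."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            if jitter:
                # "Full jitter": pick a random wait up to the computed delay.
                delay = random.uniform(0, delay)
            sleep(delay)
```

The `sleep` parameter exists so tests can substitute a recorder instead of actually waiting.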


Bulkhead

Problem: Failure in one component exhausts resources for entire application.

Solution: Isolate resources (connection pools, threads) per service to contain failures.
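
One way to sketch this isolation is a per-dependency cap on concurrent calls, so a slow dependency can only tie up its own slots. The `Bulkhead` class and its `max_concurrent` parameter are hypothetical names; the sketch rejects excess calls immediately rather than queuing them.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a compartment has no free slots."""


class Bulkhead:
    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, func, *args, **kwargs):
        # Non-blocking acquire: reject immediately rather than queue up.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no capacity left in this compartment")
        try:
            return func(*args, **kwargs)
        finally:
            self._slots.release()
```

Giving each downstream service its own `Bulkhead` instance means exhausting one compartment leaves the others untouched.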


Data Management Patterns

Database per Service (Microservices)

Problem: Shared database creates tight coupling between services.

Solution: Each microservice has its own database schema/instance.

Trade-off: Improved isolation vs. data consistency challenges.


Event Sourcing

Problem: Storing only the current state loses the history of changes that produced it.

Solution: Store all changes as sequence of events rather than current state.

Benefits: Complete audit trail, event replay, temporal queries
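
A small sketch of the idea, using a hypothetical bank-account domain: state is never stored directly but rebuilt by replaying events, and replaying only a prefix of the log answers "what was the balance at that point?". The event kinds and function names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # "deposited" or "withdrawn"
    amount: int


def apply_event(balance, event):
    """Return the new balance after one event."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def replay(events, upto=None):
    """Rebuild state by folding over the log; upto limits how many events are replayed."""
    balance = 0
    for event in events[:upto]:
        balance = apply_event(balance, event)
    return balance
```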


CQRS (Command Query Responsibility Segregation)

Problem: Same model for reads and writes causes complexity.

Solution: Separate models for reading data (queries) and updating data (commands).

When to Use: Complex domains with different read/write patterns.
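
The separation can be sketched with two small in-memory classes: a command side that validates and records changes, and a query side that keeps a denormalised view shaped for reads. All names are hypothetical, and the read model is updated inline here for brevity; in practice the projection is often asynchronous.

```python
class CustomerSpendReadModel:
    """Query side: a view optimised for one question."""

    def __init__(self):
        self._spend = {}

    def on_order_placed(self, customer, total):
        self._spend[customer] = self._spend.get(customer, 0) + total

    def total_spend(self, customer):
        return self._spend.get(customer, 0)


class OrderWriteModel:
    """Command side: validates and records changes."""

    def __init__(self, read_model):
        self._orders = {}
        self._read_model = read_model

    def place_order(self, order_id, customer, total):
        if order_id in self._orders:
            raise ValueError("duplicate order")
        self._orders[order_id] = {"customer": customer, "total": total}
        # Project the change into the read side.
        self._read_model.on_order_placed(customer, total)
```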


Scalability Patterns

Auto Scaling

Problem: Manual capacity management is slow and tends to over- or under-provision.

Solution: Automatically adjust resources based on demand metrics.

Types:

  • Horizontal (add/remove instances)
  • Vertical (increase/decrease instance size)
  • Predictive (ML-based forecasting)
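
For horizontal scaling on a demand metric, a simple proportional rule can be sketched as below: scale the instance count by the ratio of the observed metric to its target, then clamp to configured bounds. This is an illustrative formula, not any provider's exact algorithm; the function name and parameters are assumptions.

```python
import math


def desired_capacity(current, metric_value, target_value, min_size=1, max_size=10):
    """Proportional scaling rule: keep metric_value near target_value."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    wanted = math.ceil(current * metric_value / target_value)
    # Never go below the floor or above the ceiling.
    return max(min_size, min(max_size, wanted))
```

For example, 4 instances at 90% average CPU against a 60% target gives `desired_capacity(4, 90, 60)`, which asks for 6 instances.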

Load Balancing

Problem: A single instance cannot absorb all traffic and becomes a single point of failure.

Solution: Distribute traffic across multiple instances.

Types:

  • Application Load Balancer (Layer 7 - HTTP/HTTPS)
  • Network Load Balancer (Layer 4 - TCP/UDP)
  • Global Load Balancer (multi-region)
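
Whatever the layer, the core of a load balancer is a selection policy. A minimal sketch of round-robin selection that skips unhealthy backends follows; the class and method names are hypothetical, and managed load balancers add health probing, connection draining, and other policies on top.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in turn, skipping any marked unhealthy."""

    def __init__(self, backends):
        self._backends = list(backends)
        if not self._backends:
            raise ValueError("at least one backend is required")
        self._cycle = itertools.cycle(self._backends)
        self._healthy = set(self._backends)

    def mark_down(self, backend):
        self._healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self._backends:
            self._healthy.add(backend)

    def next_backend(self):
        # Look at each backend at most once per call.
        for _ in range(len(self._backends)):
            candidate = next(self._cycle)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy backends available")
```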

Caching

Problem: Repeated reads of the same data load backend systems and slow response times.

Solution: Serve frequently read data from a cache in front of the backend.

Strategies:

  • Cache-Aside: Application reads from cache, loads from DB on miss
  • Read-Through: Cache loads data automatically on miss
  • Write-Through: Write to the cache and the DB synchronously, as part of the same operation
  • Write-Behind: Write to cache, async write to DB

Services: Redis (ElastiCache, Azure Cache, Memorystore), CloudFront/CDN
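
The cache-aside strategy can be sketched with an in-memory dictionary standing in for Redis. The `load_from_db` callable, the TTL value, and the class name are assumptions for illustration; with a real cache the same read path applies, with `get`/`set` calls to the cache service in place of the dictionary.

```python
import time


class CacheAside:
    """Cache-aside: the application, not the cache, talks to the database."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # hypothetical callable: key -> value
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}               # key -> (value, expires_at)

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]            # cache hit
        value = self._load(key)        # cache miss or expired: go to the database
        self._cache[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        """Call after writing to the database so stale data is not served."""
        self._cache.pop(key, None)
```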


Security Patterns

Secrets Management

Problem: Hardcoded credentials in code create security risk.

Solution: Store secrets in dedicated secret management service.

Services: AWS Secrets Manager, Azure Key Vault, GCP Secret Manager


Defense in Depth

Problem: Single security layer insufficient.

Solution: Multiple layers of security controls.

Layers:

  1. Perimeter (firewall, DDoS protection)
  2. Network (VPC, security groups, NACLs)
  3. Compute (OS hardening, EDR)
  4. Application (WAF, input validation)
  5. Data (encryption at rest and in transit)
  6. Identity (IAM, MFA)

Multi-Cloud and Hybrid Cloud Architecture

Multi-Cloud Strategy

Definition: Using multiple cloud providers for different workloads or redundancy.

Reasons:

  • Avoid vendor lock-in
  • Leverage best-of-breed services
  • Geographic compliance requirements
  • Business continuity (provider failure mitigation)

Challenges:

  • Increased complexity
  • Skills gap (multiple platforms)
  • Data transfer costs
  • Inconsistent security controls

Best Practices:

  • Use cloud-agnostic tools (Terraform, Kubernetes)
  • Centralized identity management (federated SSO)
  • Unified monitoring and logging
  • Consistent security policies

Hybrid Cloud Architecture

Definition: Combining on-premises infrastructure with cloud resources.

Use Cases:

  • Gradual cloud migration
  • Data sovereignty requirements
  • Low-latency requirements
  • Legacy system dependencies

Connectivity Options:

  • VPN: Encrypted connection over internet
  • Dedicated Connection: AWS Direct Connect, Azure ExpressRoute, GCP Interconnect
  • SD-WAN: Software-defined WAN for multi-site connectivity

Challenges:

  • Network latency and bandwidth
  • Identity synchronization
  • Data consistency
  • Compliance complexity

Cloud Networking Architecture

Network Segmentation

VPC/VNet Design:

  • Separate VPCs/VNets per environment (dev, test, prod)
  • Separate VPCs/VNets per application or business unit
  • Use subnets to separate tiers (web, app, data)

Subnet Strategy:

  • Public Subnets: Internet-facing resources (load balancers, NAT gateways)
  • Private Subnets: Application servers, databases
  • DMZ/Perimeter Subnets: Security appliances, bastion hosts
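
As a rough sketch of carving a VPC/VNet address range into per-tier subnets, Python's standard `ipaddress` module can do the arithmetic. The CIDR range, tier names, and the `plan_subnets` helper are illustrative assumptions; actual subnets would be created through the provider's tooling or IaC.

```python
import ipaddress
import itertools


def plan_subnets(vpc_cidr, tiers, new_prefix=24):
    """Assign one equally sized subnet to each tier, in order."""
    network = ipaddress.ip_network(vpc_cidr)
    blocks = list(itertools.islice(network.subnets(new_prefix=new_prefix), len(tiers)))
    if len(blocks) < len(tiers):
        raise ValueError("address range too small for the requested tiers")
    return {tier: str(block) for tier, block in zip(tiers, blocks)}
```

For example, `plan_subnets("10.0.0.0/16", ["public", "private", "dmz"])` assigns `10.0.0.0/24`, `10.0.1.0/24`, and `10.0.2.0/24` respectively.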

Hub-and-Spoke Topology

Description: Central hub VPC/VNet connected to multiple spoke VPCs/VNets.

Benefits:

  • Centralized security controls (firewall, IDS/IPS)
  • Shared services (DNS, directory services)
  • Simplified management

Use Cases: Enterprises with multiple applications/business units.

AWS Implementation: Transit Gateway

Azure Implementation: Virtual WAN, VNet peering

GCP Implementation: VPC Network Peering, Cloud Interconnect


Cloud Storage Architecture

Storage Tiers and Lifecycle

AWS S3 Storage Classes:

  • S3 Standard: Frequently accessed data
  • S3 Intelligent-Tiering: Automatic tiering based on access patterns
  • S3 Standard-IA: Infrequently accessed data (monthly access)
  • S3 One Zone-IA: Infrequent access, single AZ
  • S3 Glacier Instant Retrieval: Archive, millisecond retrieval
  • S3 Glacier Flexible Retrieval: Archive, minutes-hours retrieval
  • S3 Glacier Deep Archive: Long-term archive, retrieval within 12 hours

Azure Blob Storage Tiers:

  • Hot: Frequently accessed data
  • Cool: Infrequently accessed, 30-day minimum
  • Cold: Rarely accessed, 90-day minimum
  • Archive: Long-term archive, hours retrieval

GCP Storage Classes:

  • Standard: Frequently accessed
  • Nearline: Monthly access
  • Coldline: Quarterly access
  • Archive: Annual access

Best Practice: Implement lifecycle policies to automatically transition data to lower-cost tiers.
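
The decision a lifecycle policy encodes can be sketched as a simple age-based rule. The example uses the GCP class names listed above; the 30, 90, and 365-day thresholds are assumptions read off the "monthly / quarterly / annual" descriptions, and real policies are configured in the provider rather than in application code.

```python
def storage_class_for(days_since_last_access):
    """Pick a storage class from how long ago the object was last accessed."""
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    if days_since_last_access >= 365:
        return "Archive"
    if days_since_last_access >= 90:
        return "Coldline"
    if days_since_last_access >= 30:
        return "Nearline"
    return "Standard"
```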


Cloud Database Architecture

Database Selection Guide

| Workload Type | AWS | Azure | GCP | When to Use |
|---|---|---|---|---|
| Relational (OLTP) | RDS, Aurora | Azure SQL, PostgreSQL | Cloud SQL | Structured data, ACID transactions |
| NoSQL (Document) | DocumentDB | Cosmos DB | Firestore | Flexible schema, JSON documents |
| NoSQL (Key-Value) | DynamoDB | Table Storage, Cosmos DB | Bigtable, Firestore | Simple lookups, session storage |
| NoSQL (Wide Column) | Keyspaces (Cassandra) | Cosmos DB (Cassandra) | Bigtable | Time-series, IoT, high throughput |
| Graph | Neptune | Cosmos DB (Gremlin) | - | Relationships, social networks |
| In-Memory | ElastiCache (Redis/Memcached) | Cache for Redis | Memorystore | Caching, real-time analytics |
| Data Warehouse | Redshift | Synapse Analytics | BigQuery | Analytics, OLAP, BI |

Database Scaling Strategies

Vertical Scaling (Scale Up):

  • Increase instance size (CPU, RAM)
  • Simpler, but bounded by the largest available instance
  • Usually requires a restart or brief downtime

Horizontal Scaling (Scale Out):

  • Add read replicas (read-heavy workloads)
  • Sharding (partition data across instances)
  • More complex, but scales well beyond a single instance
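
Sharding needs a deterministic rule that maps each record to an instance. A minimal sketch of hash-based routing follows; the function name and the choice of SHA-256 are illustrative assumptions. Note that plain modulo routing remaps most keys when the shard count changes, which is why consistent hashing is often preferred in practice.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a string key to a shard index in [0, shard_count)."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    # Use a stable hash; Python's built-in hash() varies between processes.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```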

Multi-Region Replication:

  • Low latency for global users
  • Disaster recovery
  • Increased cost and complexity


Cost Optimization Strategies

Right-Sizing

  • Analyze resource utilization
  • Select appropriate instance types
  • Use burstable instances (T-series) for workloads with a low baseline and occasional bursts

Reserved Capacity

  • Reserved Instances (1 or 3-year commitment): Up to 72% savings versus on-demand pricing (AWS's published maximum)
  • Savings Plans: Flexible commitment-based discounts
  • Spot Instances: Up to 90% savings for interruptible workloads

Auto-Scaling

  • Scale down during off-hours
  • Use scheduled scaling for predictable patterns
  • Use target tracking for dynamic scaling

Storage Optimization

  • Implement lifecycle policies
  • Use appropriate storage classes
  • Delete unused snapshots and old backups
  • Enable S3 Intelligent-Tiering

Network Optimization

  • Minimize cross-region data transfer
  • Use CloudFront/CDN to reduce origin requests
  • Use VPC endpoints to avoid NAT gateway costs

Quick Selection Guide

| Organisation Profile | Recommended Cloud Strategy |
|---|---|
| Startup | Single cloud (AWS/Azure/GCP), serverless where possible, managed services |
| SMB | Single cloud, mix of IaaS and PaaS, gradual cloud-native adoption |
| Enterprise (single cloud) | Well-Architected Framework adherence, landing zone, centralized governance |
| Enterprise (multi-cloud) | Cloud-agnostic tools (Terraform, Kubernetes), unified security/monitoring |
| Regulated (financial, healthcare) | Hybrid cloud, data residency controls, compliance-focused architecture |
| Global SaaS provider | Multi-region, global load balancing, CDN, microservices |

Common Cloud Architecture Mistakes

  1. Over-architecting initially: Start simple, evolve architecture
  2. Ignoring costs: No cost monitoring or optimization
  3. Single point of failure: No redundancy or multi-AZ deployment
  4. Lift-and-shift without optimization: Missing cloud benefits
  5. No disaster recovery plan: Assuming cloud provider handles everything
  6. Poor network design: Inadequate segmentation or overly complex routing
  7. Inadequate monitoring: No observability into system health
  8. Vendor lock-in without intention: Using proprietary services without considering portability
  9. Security as afterthought: Not designing security from the start
  10. No tagging strategy: Unable to track costs or resources by project/owner