☁️ Cloud · December 28, 2024

Scalable Architectures on AWS: Design Patterns and Best Practices

Learn how to design and implement highly scalable architectures on AWS using proven design patterns and managed services.

Building scalable architectures on AWS requires a deep understanding of the available services and the application of proven design patterns. This guide covers practical strategies for building systems that grow with your demand.

Principles of Scalable Architecture

Horizontal vs. Vertical Scaling

Vertical Scaling (Scale Up):

  • Adds resources (CPU, memory) to a single instance
  • Limited by the largest available instance size
  • Usually requires downtime during the upgrade
  • Suitable for monolithic applications

Horizontal Scaling (Scale Out):

  • Adds more instances
  • Theoretically unlimited
  • Capacity can be added without downtime
  • Requires a distributed architecture

Fundamental Design Patterns

Stateless Applications:

# Example of a stateless application: state lives in RDS and ElastiCache, not in the pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: myapp:latest
        ports:
        - containerPort: 8080
        env:
        - name: DB_HOST
          value: "rds-endpoint.amazonaws.com"
        - name: REDIS_HOST
          value: "elasticache-endpoint.amazonaws.com"
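
One way to read the manifest above: no replica keeps session data of its own, so any of the three pods can serve any request. The following is a minimal sketch of that idea in Python. The `SessionStore` interface, the `InMemoryStore` class, and `handle_request` are hypothetical names introduced for illustration; the in-memory class only stands in for a shared external store such as the Redis endpoint in `REDIS_HOST`.

```python
from typing import Dict, Optional, Protocol


class SessionStore(Protocol):
    """Hypothetical interface for an external session store (e.g. Redis)."""

    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Stand-in for the shared store; a real deployment would use a network service."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def handle_request(store: SessionStore, session_id: str) -> int:
    """Count visits per session without keeping any state in the process."""
    visits = int(store.get(session_id) or 0) + 1
    store.set(session_id, str(visits))
    return visits


# Two "replicas" are simply two calls that share the same external store.
shared = InMemoryStore()
first = handle_request(shared, "abc")   # could be served by replica A
second = handle_request(shared, "abc")  # could be served by replica B
```

The read-then-write in `handle_request` is not atomic; with a real shared store, an atomic increment would likely be the safer choice under concurrent requests.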

Scalable Compute Services

Auto Scaling Groups

Basic configuration:

{
  "AutoScalingGroupName": "web-app-asg",
  "MinSize": 2,
  "MaxSize": 20,
  "DesiredCapacity": 4,
  "LaunchTemplate": {
    "LaunchTemplateName": "web-app-template",
    "Version": "$Latest"
  },
  "VPCZoneIdentifier": "subnet-12345678,subnet-87654321",
  "TargetGroupARNs": [
    "arn:aws:elasticloadbalancing:region:account:targetgroup/web-app-tg/1234567890123456"
  ],
  "HealthCheckType": "ELB",
  "HealthCheckGracePeriod": 300
}

Scaling policy:

{
  "AutoScalingGroupName": "web-app-asg",
  "PolicyName": "cpu-target-tracking",
  "PolicyType": "TargetTrackingScaling",
  "EstimatedInstanceWarmup": 300,
  "TargetTrackingConfiguration": {
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    }
  }
}
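
As a rough mental model, target tracking sizes the group in proportion to how far the metric is from its target and then clamps the result between `MinSize` and `MaxSize`. The sketch below illustrates that proportional estimate with the values from the configuration above. It is an approximation for intuition only, not the exact algorithm AWS runs, and `estimate_desired_capacity` is a hypothetical helper name.

```python
import math


def estimate_desired_capacity(current: int, metric: float, target: float,
                              min_size: int, max_size: int) -> int:
    """Proportional estimate: scale capacity by metric/target, then clamp to the group limits."""
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current * metric / target)
    return max(min_size, min(max_size, raw))


# 4 instances at 90% average CPU with a 70% target: the group grows.
scaled_out = estimate_desired_capacity(4, 90.0, 70.0, min_size=2, max_size=20)

# 4 instances at 10% average CPU: the estimate falls below MinSize, so MinSize wins.
scaled_in = estimate_desired_capacity(4, 10.0, 70.0, min_size=2, max_size=20)
```

In practice the service also applies warm-up periods and scales in more conservatively than it scales out, so real behavior will differ from this estimate.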

Elastic Load Balancing

Application Load Balancer (ALB):

# CloudFormation template
Resources:
  ApplicationLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Name: web-app-alb
      Scheme: internet-facing
      Type: application
      Subnets:
        - !Ref PublicSubnet1
        - !Ref PublicSubnet2
      SecurityGroups:
        - !Ref ALBSecurityGroup

  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Name: web-app-tg
      Port: 80
      Protocol: HTTP
      VpcId: !Ref VPC
      HealthCheckPath: /health
      HealthCheckIntervalSeconds: 30
      HealthyThresholdCount: 2
      UnhealthyThresholdCount: 5

Container Orchestration

Amazon ECS with Fargate:

{
  "family": "web-app-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::account:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "web-app",
      "image": "account.dkr.ecr.region.amazonaws.com/web-app:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}

ECS service definition (Auto Scaling target):

{
  "serviceName": "web-app-service",
  "cluster": "production-cluster",
  "taskDefinition": "web-app-task:1",
  "desiredCount": 4,
  "launchType": "FARGATE",
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": ["subnet-12345678", "subnet-87654321"],
      "securityGroups": ["sg-12345678"],
      "assignPublicIp": "DISABLED"
    }
  },
  "loadBalancers": [
    {
      "targetGroupArn": "arn:aws:elasticloadbalancing:region:account:targetgroup/web-app-tg/1234567890123456",
      "containerName": "web-app",
      "containerPort": 8080
    }
  ]
}
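
The service definition above fixes `desiredCount` at 4; scaling for ECS services is configured separately through Application Auto Scaling. The sketch below builds the request parameters for the boto3 `application-autoscaling` client calls `register_scalable_target` and `put_scaling_policy`. The capacity bounds, the policy name, the target value, and the cooldowns are assumptions chosen for illustration, and the actual API calls are left as comments so the block runs without AWS credentials.

```python
from typing import Any, Dict

CLUSTER = "production-cluster"
SERVICE = "web-app-service"
RESOURCE_ID = f"service/{CLUSTER}/{SERVICE}"


def scalable_target_params(min_tasks: int, max_tasks: int) -> Dict[str, Any]:
    """Parameters for register_scalable_target: which resource scales, and within what bounds."""
    return {
        "ServiceNamespace": "ecs",
        "ResourceId": RESOURCE_ID,
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }


def cpu_policy_params(target_cpu: float) -> Dict[str, Any]:
    """Parameters for put_scaling_policy: track average service CPU around target_cpu."""
    return {
        "PolicyName": "web-app-cpu-target",  # hypothetical policy name
        "ServiceNamespace": "ecs",
        "ResourceId": RESOURCE_ID,
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,   # assumed values, in seconds
            "ScaleInCooldown": 300,
        },
    }


target = scalable_target_params(2, 20)
policy = cpu_policy_params(70.0)

# To apply (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**target)
#   client.put_scaling_policy(**policy)
```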

Scalable Databases

Amazon RDS with Read Replicas

Multi-AZ configuration:

Resources:
  DatabaseInstance:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceIdentifier: production-db
      DBInstanceClass: db.r5.xlarge
      Engine: postgres
      EngineVersion: "13.7"
      AllocatedStorage: 100
      StorageType: gp2
      StorageEncrypted: true
      MultiAZ: true
      VPCSecurityGroups:
        - !Ref DatabaseSecurityGroup
      DBSubnetGroupName: !Ref DatabaseSubnetGroup
      BackupRetentionPeriod: 7
      PreferredBackupWindow: "03:00-04:00"
      PreferredMaintenanceWindow: "sun:04:00-sun:05:00"

  ReadReplica1:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceIdentifier: production-db-read-1
      DBInstanceClass: db.r5.large
      SourceDBInstanceIdentifier: !Ref DatabaseInstance
      PubliclyAccessible: false

Amazon DynamoDB

On-demand configuration (capacity scales automatically):

import boto3

dynamodb = boto3.client('dynamodb')

# Create the table with on-demand billing
table_response = dynamodb.create_table(
    TableName='UserSessions',
    KeySchema=[
        {
            'AttributeName': 'user_id',
            'KeyType': 'HASH'
        },
        {
            'AttributeName': 'session_id',
            'KeyType': 'RANGE'
        }
    ],
    AttributeDefinitions=[
        {
            'AttributeName': 'user_id',
            'AttributeType': 'S'
        },
        {
            'AttributeName': 'session_id',
            'AttributeType': 'S'
        }
    ],
    BillingMode='PAY_PER_REQUEST',
    StreamSpecification={
        'StreamEnabled': True,
        'StreamViewType': 'NEW_AND_OLD_IMAGES'
    }
)

Caching and Performance

Amazon ElastiCache

Redis Cluster:

Resources:
  RedisSubnetGroup:
    Type: AWS::ElastiCache::SubnetGroup
    Properties:
      Description: Subnet group for Redis cluster
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2

  RedisCluster:
    Type: AWS::ElastiCache::ReplicationGroup
    Properties:
      ReplicationGroupId: production-redis
      ReplicationGroupDescription: Production Redis cluster
      NumCacheClusters: 3
      Engine: redis
      CacheNodeType: cache.r5.large
      CacheSubnetGroupName: !Ref RedisSubnetGroup
      SecurityGroupIds:
        - !Ref RedisSecurityGroup
      AtRestEncryptionEnabled: true
      TransitEncryptionEnabled: true
      AutomaticFailoverEnabled: true
      MultiAZEnabled: true

Amazon CloudFront

CDN distribution:

{
  "DistributionConfig": {
    "CallerReference": "production-cdn-2024",
    "Comment": "Production CDN distribution",
    "DefaultRootObject": "index.html",
    "Origins": [
      {
        "Id": "S3-production-static",
        "DomainName": "production-static.s3.amazonaws.com",
        "S3OriginConfig": {
          "OriginAccessIdentity": "origin-access-identity/cloudfront/E1234567890123"
        }
      },
      {
        "Id": "ALB-production-api",
        "DomainName": "api.production.com",
        "CustomOriginConfig": {
          "HTTPPort": 80,
          "HTTPSPort": 443,
          "OriginProtocolPolicy": "https-only"
        }
      }
    ],
    "DefaultCacheBehavior": {
      "TargetOriginId": "S3-production-static",
      "ViewerProtocolPolicy": "redirect-to-https",
      "CachePolicyId": "658327ea-f89d-4fab-a63d-7e88639e58f6",
      "Compress": true
    },
    "CacheBehaviors": [
      {
        "PathPattern": "/api/*",
        "TargetOriginId": "ALB-production-api",
        "ViewerProtocolPolicy": "https-only",
        "CachePolicyId": "4135ea2d-6df8-44a3-9df3-4b5a84be39ad",
        "OriginRequestPolicyId": "88a5eaf4-2fd4-4709-b370-b4c650ea3fcf"
      }
    ],
    "Enabled": true,
    "PriceClass": "PriceClass_All"
  }
}

Serverless Architectures

AWS Lambda with API Gateway

Lambda function:

import json
import boto3
from decimal import Decimal

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Users')

def lambda_handler(event, context):
    try:
        # Extract the user ID from the API Gateway proxy event
        user_id = event['pathParameters']['user_id']
        
        # Look up the user in DynamoDB
        response = table.get_item(
            Key={'user_id': user_id}
        )
        
        if 'Item' in response:
            # DynamoDB returns numbers as Decimal; convert them so the item is JSON-serializable
            item = json.loads(json.dumps(response['Item'], default=decimal_default))
            
            return {
                'statusCode': 200,
                'headers': {
                    'Content-Type': 'application/json',
                    'Access-Control-Allow-Origin': '*'
                },
                'body': json.dumps(item)
            }
        else:
            return {
                'statusCode': 404,
                'headers': {
                    'Content-Type': 'application/json',
                    'Access-Control-Allow-Origin': '*'
                },
                'body': json.dumps({'error': 'User not found'})
            }
            
    except Exception as e:
        return {
            'statusCode': 500,
            'headers': {
                'Content-Type': 'application/json',
                'Access-Control-Allow-Origin': '*'
            },
            'body': json.dumps({'error': str(e)})
        }

def decimal_default(obj):
    if isinstance(obj, Decimal):
        return float(obj)
    raise TypeError

API Gateway:

Resources:
  ApiGateway:
    Type: AWS::ApiGateway::RestApi
    Properties:
      Name: UserAPI
      Description: API for user management
      EndpointConfiguration:
        Types:
          - REGIONAL

  UsersResource:
    Type: AWS::ApiGateway::Resource
    Properties:
      RestApiId: !Ref ApiGateway
      ParentId: !GetAtt ApiGateway.RootResourceId
      PathPart: users

  UserResource:
    Type: AWS::ApiGateway::Resource
    Properties:
      RestApiId: !Ref ApiGateway
      ParentId: !Ref UsersResource
      PathPart: "{user_id}"

  GetUserMethod:
    Type: AWS::ApiGateway::Method
    Properties:
      RestApiId: !Ref ApiGateway
      ResourceId: !Ref UserResource
      HttpMethod: GET
      AuthorizationType: NONE
      Integration:
        Type: AWS_PROXY
        IntegrationHttpMethod: POST
        Uri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${GetUserFunction.Arn}/invocations"

Monitoring and Observability

CloudWatch Metrics and Alarms

Custom metrics:

import boto3
from datetime import datetime, timezone

cloudwatch = boto3.client('cloudwatch')

def put_custom_metric(metric_name, value, unit='Count', namespace='MyApp'):
    cloudwatch.put_metric_data(
        Namespace=namespace,
        MetricData=[
            {
                'MetricName': metric_name,
                'Value': value,
                'Unit': unit,
                # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated since Python 3.12)
                'Timestamp': datetime.now(timezone.utc),
                'Dimensions': [
                    {
                        'Name': 'Environment',
                        'Value': 'Production'
                    },
                    {
                        'Name': 'Service',
                        'Value': 'UserService'
                    }
                ]
            }
        ]
    )

# Example usage
put_custom_metric('UserRegistrations', 1)
put_custom_metric('ResponseTime', 150, 'Milliseconds')

CloudWatch alarms:

Resources:
  HighCPUAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: HighCPUUtilization
      AlarmDescription: Alarm when CPU exceeds 80%
      MetricName: CPUUtilization
      Namespace: AWS/EC2
      Statistic: Average
      Period: 300
      EvaluationPeriods: 2
      Threshold: 80
      ComparisonOperator: GreaterThanThreshold
      Dimensions:
        - Name: AutoScalingGroupName
          Value: !Ref AutoScalingGroup
      AlarmActions:
        - !Ref SNSTopic

  DatabaseConnectionsAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: HighDatabaseConnections
      AlarmDescription: Alarm when average DB connections exceed 80
      MetricName: DatabaseConnections
      Namespace: AWS/RDS
      Statistic: Average
      Period: 300
      EvaluationPeriods: 2
      Threshold: 80
      ComparisonOperator: GreaterThanThreshold
      Dimensions:
        - Name: DBInstanceIdentifier
          Value: !Ref DatabaseInstance

AWS X-Ray for Tracing

Application instrumentation:

from aws_xray_sdk.core import xray_recorder
from aws_xray_sdk.core import patch_all
import requests

# Patch supported libraries automatically
patch_all()

@xray_recorder.capture('process_user_request')
def process_user_request(user_id):
    # Create a subsegment for the database operation
    subsegment = xray_recorder.begin_subsegment('database_query')
    try:
        # Simulated database query (get_user_from_database is defined elsewhere)
        user_data = get_user_from_database(user_id)
        subsegment.put_metadata('user_id', user_id)
        subsegment.put_annotation('user_type', user_data.get('type'))
    except Exception as e:
        subsegment.add_exception(e)
        raise
    finally:
        xray_recorder.end_subsegment()
    
    # Create a subsegment for the external call
    subsegment = xray_recorder.begin_subsegment('external_api_call')
    try:
        response = requests.get(f'https://api.external.com/users/{user_id}')
        subsegment.put_metadata('response_status', response.status_code)
    except Exception as e:
        subsegment.add_exception(e)
        raise
    finally:
        xray_recorder.end_subsegment()
    
    return user_data

Cost Optimization

Reserved Instances and Savings Plans

Usage analysis:

import boto3
from datetime import datetime, timedelta

ce = boto3.client('ce')

def analyze_ec2_usage():
    end_date = datetime.now().strftime('%Y-%m-%d')
    start_date = (datetime.now() - timedelta(days=30)).strftime('%Y-%m-%d')
    
    response = ce.get_dimension_values(
        TimePeriod={
            'Start': start_date,
            'End': end_date
        },
        Dimension='SERVICE',
        Context='COST_AND_USAGE'
    )
    
    # Break down costs by service
    cost_response = ce.get_cost_and_usage(
        TimePeriod={
            'Start': start_date,
            'End': end_date
        },
        Granularity='MONTHLY',
        Metrics=['BlendedCost'],
        GroupBy=[
            {
                'Type': 'DIMENSION',
                'Key': 'SERVICE'
            }
        ]
    )
    
    return cost_response

# Reserved Instance recommendations
def get_ri_recommendations():
    response = ce.get_reservation_purchase_recommendation(
        Service='Amazon Elastic Compute Cloud - Compute',
        LookbackPeriodInDays='SIXTY_DAYS',
        TermInYears='ONE_YEAR',
        PaymentOption='NO_UPFRONT'
    )
    
    return response['Recommendations']

Lifecycle Policies

S3 Lifecycle:

{
  "Rules": [
    {
      "ID": "LogsLifecycle",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "logs/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {
        "Days": 2555
      }
    }
  ]
}
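
To make the rule above concrete, the sketch below computes which storage class an object under `logs/` would be in at a given age. It assumes objects start in `STANDARD` and treats the day thresholds as exact, whereas S3 applies lifecycle actions asynchronously, so real transitions can lag. `storage_class_at` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple

# (minimum age in days, storage class), taken from the rule above.
TRANSITIONS: List[Tuple[int, str]] = [
    (30, "STANDARD_IA"),
    (90, "GLACIER"),
    (365, "DEEP_ARCHIVE"),
]
EXPIRATION_DAYS = 2555  # roughly seven years


def storage_class_at(age_days: int) -> Optional[str]:
    """Return the storage class for an object of this age, or None once it has expired."""
    if age_days >= EXPIRATION_DAYS:
        return None
    current = "STANDARD"  # assumed initial class
    for threshold, storage_class in TRANSITIONS:
        if age_days >= threshold:
            current = storage_class
    return current


fresh = storage_class_at(10)
archived = storage_class_at(400)
expired = storage_class_at(3000)
```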

Disaster Recovery

Multi-Region Architecture

Cross-Region Replication:

Resources:
  # Primary region resources
  PrimaryDatabase:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceIdentifier: primary-db
      BackupRetentionPeriod: 7
      DeletionProtection: true

  # Cross-region read replica
  SecondaryDatabase:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceIdentifier: secondary-db
      SourceDBInstanceIdentifier: !Sub 
        - "arn:aws:rds:${PrimaryRegion}:${AWS::AccountId}:db:${PrimaryDBId}"
        - PrimaryRegion: us-east-1
          PrimaryDBId: !Ref PrimaryDatabase
      PubliclyAccessible: false

  # S3 Cross-Region Replication
  ReplicationRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: s3.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: ReplicationPolicy
          PolicyDocument:
            Statement:
              - Effect: Allow
                Action:
                  - s3:GetObjectVersionForReplication
                  - s3:GetObjectVersionAcl
                Resource: !Sub "arn:aws:s3:::${SourceBucket}/*"
              - Effect: Allow
                Action:
                  - s3:ReplicateObject
                  - s3:ReplicateDelete
                Resource: !Sub "arn:aws:s3:::${DestinationBucket}/*"

Conclusion

Building scalable architectures on AWS requires a holistic approach that considers not only the capacity to grow but also cost, security, observability, and disaster recovery. Applying these design patterns and best practices consistently leads to robust, efficient systems.

Next Steps

  • Implement Infrastructure as Code with Terraform or CDK
  • Set up CI/CD pipelines for automated deployments
  • Explore event-driven architectures with EventBridge
  • Consider chaos engineering to test resilience