Scalable Architectures on AWS: Design Patterns and Best Practices
Learn to design and implement highly scalable architectures on AWS using proven design patterns and managed services.
Building scalable architectures on AWS requires a solid understanding of the available services and the application of proven design patterns. This guide covers practical strategies for building systems that grow with demand.
Principles of Scalable Architecture
Horizontal vs. Vertical Scalability
Vertical Scalability (Scale Up):
- Add resources (CPU, memory) to a single instance
- Capped by the largest available instance size
- Usually involves downtime, because resizing means stopping or restarting the instance
- Suited to monolithic applications
Horizontal Scalability (Scale Out):
- Add more instances
- No hard ceiling in principle, although service quotas and shared dependencies such as the database set practical limits
- Instances can be added or removed behind a load balancer without downtime
- Requires a distributed architecture
Fundamental Design Patterns
Stateless Applications:
# Example of a stateless application: no session data lives in the pod,
# so any replica can serve any request
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-app
          image: myapp:latest
          ports:
            - containerPort: 8080
          env:
            # State is kept in external services, not in the container
            - name: DB_HOST
              value: "rds-endpoint.amazonaws.com"
            - name: REDIS_HOST
              value: "elasticache-endpoint.amazonaws.com"
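To make the stateless idea concrete, here is a minimal Python sketch of request handlers that keep session data in a shared store instead of process memory. The `SessionStore` class is a hypothetical in-memory stand-in for an external store such as Redis, and the handler names are illustrative rather than part of any framework.

```python
import os
import uuid


class SessionStore:
    """In-memory stand-in for an external store such as Redis (hypothetical)."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, payload):
        self._data[session_id] = dict(payload)

    def load(self, session_id):
        return self._data.get(session_id)


def handle_login(store, user_id):
    """Create a session in the shared store and return its ID."""
    session_id = str(uuid.uuid4())
    store.save(session_id, {"user_id": user_id})
    return session_id


def handle_request(store, session_id):
    """Any replica can serve this call because state comes from the store."""
    session = store.load(session_id)
    if session is None:
        return {"status": 401, "body": "unknown session"}
    return {"status": 200, "body": f"hello {session['user_id']}"}


# REDIS_HOST mirrors the variable in the manifest; the default is illustrative
redis_host = os.environ.get("REDIS_HOST", "localhost")

shared_store = SessionStore()
sid = handle_login(shared_store, "alice")
print(handle_request(shared_store, sid))
```

Because the handlers receive the store as an argument and hold no state of their own, replicas can be added or removed without losing sessions.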
Scalable Compute Services
Auto Scaling Groups
Basic configuration (input for aws autoscaling create-auto-scaling-group --cli-input-json; note that VPCZoneIdentifier is a comma-separated string of subnet IDs, not a list):
{
  "AutoScalingGroupName": "web-app-asg",
  "MinSize": 2,
  "MaxSize": 20,
  "DesiredCapacity": 4,
  "LaunchTemplate": {
    "LaunchTemplateName": "web-app-template",
    "Version": "$Latest"
  },
  "VPCZoneIdentifier": "subnet-12345678,subnet-87654321",
  "TargetGroupARNs": [
    "arn:aws:elasticloadbalancing:region:account:targetgroup/web-app-tg/1234567890123456"
  ],
  "HealthCheckType": "ELB",
  "HealthCheckGracePeriod": 300
}
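Scaling policy (target tracking keeps average CPU near 70% and scales both out and in; EC2 Auto Scaling takes an instance warm-up time at the policy level rather than scale-out and scale-in cooldowns):
{
  "AutoScalingGroupName": "web-app-asg",
  "PolicyName": "cpu-target-tracking",
  "PolicyType": "TargetTrackingScaling",
  "EstimatedInstanceWarmup": 300,
  "TargetTrackingConfiguration": {
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "DisableScaleIn": false
  }
}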
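A rough way to reason about a 70% CPU target is a proportional estimate of the capacity needed to bring the metric back to that target. The sketch below is a simplified model under that assumption, not the exact algorithm AWS uses; the function name and sample numbers are illustrative.

```python
import math


def estimate_desired_capacity(current_capacity, current_cpu, target_cpu,
                              min_size, max_size):
    """Estimate the capacity that brings average CPU back to the target.

    Simplified proportional model (an assumption, not the AWS algorithm).
    """
    raw = math.ceil(current_capacity * current_cpu / target_cpu)
    # The group never leaves the MinSize..MaxSize range
    return max(min_size, min(max_size, raw))


# 4 instances at 90% CPU with a 70% target
print(estimate_desired_capacity(4, 90.0, 70.0, 2, 20))    # 6
# 4 instances at 35% CPU: scale in, but not below MinSize
print(estimate_desired_capacity(4, 35.0, 70.0, 2, 20))    # 2
# A large spike is capped at MaxSize
print(estimate_desired_capacity(10, 200.0, 70.0, 2, 20))  # 20
```

The clamping step is why MinSize and MaxSize deserve as much thought as the target value: they bound both cost and the worst-case capacity shortfall.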
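Elastic Load Balancing
Application Load Balancer (ALB):
# CloudFormation template
# VPC, subnets and ALBSecurityGroup are assumed to be defined elsewhere in the template
Resources:
  ApplicationLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Name: web-app-alb
      Scheme: internet-facing
      Type: application
      Subnets:
        - !Ref PublicSubnet1
        - !Ref PublicSubnet2
      SecurityGroups:
        - !Ref ALBSecurityGroup
  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Name: web-app-tg
      Port: 80
      Protocol: HTTP
      VpcId: !Ref VPC
      HealthCheckPath: /health
      HealthCheckIntervalSeconds: 30
      HealthyThresholdCount: 2
      UnhealthyThresholdCount: 5
  # A listener is needed to forward traffic from the ALB to the target group
  HTTPListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref ApplicationLoadBalancer
      Port: 80
      Protocol: HTTP
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref TargetGroup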
Container Orchestration
Amazon ECS with Fargate:
{
  "family": "web-app-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::account:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "web-app",
      "image": "account.dkr.ecr.region.amazonaws.com/web-app:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/web-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
ECS service definition (Service Auto Scaling is attached to it separately, through Application Auto Scaling):
{
  "serviceName": "web-app-service",
  "cluster": "production-cluster",
  "taskDefinition": "web-app-task:1",
  "desiredCount": 4,
  "launchType": "FARGATE",
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": ["subnet-12345678", "subnet-87654321"],
      "securityGroups": ["sg-12345678"],
      "assignPublicIp": "DISABLED"
    }
  },
  "loadBalancers": [
    {
      "targetGroupArn": "arn:aws:elasticloadbalancing:region:account:targetgroup/web-app-tg/1234567890123456",
      "containerName": "web-app",
      "containerPort": 8080
    }
  ]
}
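The service definition itself carries no scaling rules. One way to add them is through the Application Auto Scaling API, sketched below with boto3. The capacity limits, cooldowns, and policy name are assumptions chosen for illustration; the request builders are plain functions so they can be inspected without AWS access.

```python
def scalable_target_args(cluster, service, min_tasks, max_tasks):
    """Arguments for register_scalable_target on an ECS service."""
    return {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }


def cpu_policy_args(cluster, service, target_cpu):
    """Arguments for a target tracking policy on average service CPU."""
    return {
        "PolicyName": f"{service}-cpu-target-tracking",  # illustrative name
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,   # assumed values, tune per workload
            "ScaleInCooldown": 300,
        },
    }


def apply_scaling(cluster, service, min_tasks=2, max_tasks=20, target_cpu=70.0):
    """Register the service as a scalable target and attach the policy."""
    import boto3  # imported lazily so the builders work without AWS access

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(
        **scalable_target_args(cluster, service, min_tasks, max_tasks)
    )
    client.put_scaling_policy(**cpu_policy_args(cluster, service, target_cpu))


target = scalable_target_args("production-cluster", "web-app-service", 2, 20)
print(target["ResourceId"])
```

Calling `apply_scaling("production-cluster", "web-app-service")` with valid credentials would register the target and attach the policy; a shorter scale-out cooldown than scale-in cooldown is a common choice to react quickly to load while avoiding flapping.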
Scalable Databases
Amazon RDS with Read Replicas
Multi-AZ configuration with a read replica:
Resources:
  DatabaseInstance:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceIdentifier: production-db
      DBInstanceClass: db.r5.xlarge
      Engine: postgres
      EngineVersion: "13.7"
      # MasterUsername and the master password settings are omitted for brevity
      AllocatedStorage: 100
      StorageType: gp2
      StorageEncrypted: true
      # Synchronous standby in another Availability Zone for failover
      MultiAZ: true
      VPCSecurityGroups:
        - !Ref DatabaseSecurityGroup
      DBSubnetGroupName: !Ref DatabaseSubnetGroup
      BackupRetentionPeriod: 7
      PreferredBackupWindow: "03:00-04:00"
      PreferredMaintenanceWindow: "sun:04:00-sun:05:00"
  # Asynchronous replica that offloads read-only traffic from the primary
  ReadReplica1:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceIdentifier: production-db-read-1
      DBInstanceClass: db.r5.large
      SourceDBInstanceIdentifier: !Ref DatabaseInstance
      PubliclyAccessible: false
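Read replicas only help if the application actually sends reads to them. The following sketch shows one simple routing approach: writes go to the primary and reads rotate across replicas. The `EndpointRouter` class and the hostnames are hypothetical, and the SELECT check is deliberately naive.

```python
import itertools


class EndpointRouter:
    """Route writes to the primary and spread reads over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica is configured
        self._read_cycle = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Naive classification: anything starting with SELECT is a read
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._read_cycle) if is_read else self.primary


router = EndpointRouter(
    "primary.example.internal",                                   # hypothetical
    ["replica-1.example.internal", "replica-2.example.internal"],  # hypothetical
)
print(router.endpoint_for("SELECT * FROM users"))
print(router.endpoint_for("SELECT * FROM orders"))
print(router.endpoint_for("INSERT INTO users VALUES (1)"))
```

Replication is asynchronous, so a read issued right after a write may not see it on a replica; flows that need read-your-writes consistency, or statements such as SELECT ... FOR UPDATE, should go to the primary.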
Amazon DynamoDB
On-demand capacity configuration:
import boto3

dynamodb = boto3.client('dynamodb')

# Create the table in on-demand mode; the API value for it is PAY_PER_REQUEST
table_response = dynamodb.create_table(
    TableName='UserSessions',
    KeySchema=[
        {
            'AttributeName': 'user_id',
            'KeyType': 'HASH'    # partition key
        },
        {
            'AttributeName': 'session_id',
            'KeyType': 'RANGE'   # sort key
        }
    ],
    AttributeDefinitions=[
        {
            'AttributeName': 'user_id',
            'AttributeType': 'S'
        },
        {
            'AttributeName': 'session_id',
            'AttributeType': 'S'
        }
    ],
    BillingMode='PAY_PER_REQUEST',
    StreamSpecification={
        'StreamEnabled': True,
        'StreamViewType': 'NEW_AND_OLD_IMAGES'
    }
)
Caching and Performance
Amazon ElastiCache
Redis replication group (one primary and two replicas):
Resources:
  RedisSubnetGroup:
    Type: AWS::ElastiCache::SubnetGroup
    Properties:
      Description: Subnet group for Redis cluster
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
  RedisCluster:
    Type: AWS::ElastiCache::ReplicationGroup
    Properties:
      ReplicationGroupId: production-redis
      ReplicationGroupDescription: Production Redis cluster
      NumCacheClusters: 3
      Engine: redis
      CacheNodeType: cache.r5.large
      CacheSubnetGroupName: !Ref RedisSubnetGroup
      SecurityGroupIds:
        - !Ref RedisSecurityGroup
      AtRestEncryptionEnabled: true
      TransitEncryptionEnabled: true
      AutomaticFailoverEnabled: true
      MultiAZEnabled: true
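A common way to use a cache like this from application code is the cache-aside pattern: check the cache first, fall back to the database on a miss, then populate the cache with a TTL. The sketch below uses a hypothetical in-memory `TTLCache` in place of a Redis client, and `fake_db` stands in for a real query.

```python
import time


class TTLCache:
    """In-memory stand-in for Redis GET/SETEX (illustration only)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._items[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, time.monotonic() + ttl_seconds)


def get_user(cache, load_from_db, user_id, ttl_seconds=300):
    """Cache-aside read: try the cache, then the database, then fill the cache."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)
    cache.set(key, value, ttl_seconds)
    return value


db_calls = []


def fake_db(user_id):
    # Hypothetical database lookup that records how often it is called
    db_calls.append(user_id)
    return {"user_id": user_id, "name": "Alice"}


cache = TTLCache()
first = get_user(cache, fake_db, "42")
second = get_user(cache, fake_db, "42")
print(len(db_calls))  # 1: the second read is served from the cache
```

The TTL bounds how stale a cached entry can get; writes that must be visible immediately should also delete or overwrite the corresponding key.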
Amazon CloudFront
CDN distribution (simplified: the CloudFront API wraps lists such as Origins in Quantity/Items objects; static content uses the managed CachingOptimized cache policy, while /api/* uses CachingDisabled with the managed AllViewer origin request policy):
{
  "DistributionConfig": {
    "CallerReference": "production-cdn-2024",
    "Comment": "Production CDN distribution",
    "DefaultRootObject": "index.html",
    "Origins": [
      {
        "Id": "S3-production-static",
        "DomainName": "production-static.s3.amazonaws.com",
        "S3OriginConfig": {
          "OriginAccessIdentity": "origin-access-identity/cloudfront/E1234567890123"
        }
      },
      {
        "Id": "ALB-production-api",
        "DomainName": "api.production.com",
        "CustomOriginConfig": {
          "HTTPPort": 80,
          "HTTPSPort": 443,
          "OriginProtocolPolicy": "https-only"
        }
      }
    ],
    "DefaultCacheBehavior": {
      "TargetOriginId": "S3-production-static",
      "ViewerProtocolPolicy": "redirect-to-https",
      "CachePolicyId": "658327ea-f89d-4fab-a63d-7e88639e58f6",
      "Compress": true
    },
    "CacheBehaviors": [
      {
        "PathPattern": "/api/*",
        "TargetOriginId": "ALB-production-api",
        "ViewerProtocolPolicy": "https-only",
        "CachePolicyId": "4135ea2d-6df8-44a3-9df3-4b5a84be39ad",
        "OriginRequestPolicyId": "216adef6-5c7f-47e4-b989-5eb1eb5ef833"
      }
    ],
    "Enabled": true,
    "PriceClass": "PriceClass_All"
  }
}
Serverless Architectures
AWS Lambda with API Gateway
Lambda function:
import json
import boto3
from decimal import Decimal

# Created once per execution environment and reused across invocations
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Users')

# Headers shared by every response
RESPONSE_HEADERS = {
    'Content-Type': 'application/json',
    'Access-Control-Allow-Origin': '*'
}


def decimal_default(obj):
    # DynamoDB returns numbers as Decimal, which json cannot serialize directly
    if isinstance(obj, Decimal):
        return float(obj)
    raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')


def build_response(status_code, payload):
    return {
        'statusCode': status_code,
        'headers': RESPONSE_HEADERS,
        'body': json.dumps(payload, default=decimal_default)
    }


def lambda_handler(event, context):
    try:
        # Extract the user ID from the path parameters
        user_id = event['pathParameters']['user_id']
        # Look up the user in DynamoDB
        response = table.get_item(Key={'user_id': user_id})
        if 'Item' in response:
            return build_response(200, response['Item'])
        return build_response(404, {'error': 'User not found'})
    except Exception as e:
        # Log the details and return a generic message so internals are not exposed
        print(f'Unhandled error: {e}')
        return build_response(500, {'error': 'Internal server error'})
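Converting every Decimal to float, as `decimal_default` does, turns whole numbers such as counters into values like 7.0 in the JSON output. A small variation, sketched here as an optional alternative, keeps integral values as integers. The function name and the sample item are illustrative.

```python
import json
from decimal import Decimal


def decimal_to_number(obj):
    """json.dumps fallback: whole numbers stay ints, others become floats."""
    if isinstance(obj, Decimal):
        return int(obj) if obj == obj.to_integral_value() else float(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Shape of an item as the boto3 resource API would return it (sample data)
item = {"user_id": "42", "login_count": Decimal("7"), "score": Decimal("9.5")}
body = json.dumps(item, default=decimal_to_number)
print(body)  # {"user_id": "42", "login_count": 7, "score": 9.5}
```

Floats still lose precision for very large or very precise values, so fields such as monetary amounts may be safer serialized as strings.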
API Gateway:
# GetUserFunction (the Lambda function) is assumed to be defined elsewhere in the
# template, along with an AWS::Lambda::Permission that lets API Gateway invoke it
Resources:
  ApiGateway:
    Type: AWS::ApiGateway::RestApi
    Properties:
      Name: UserAPI
      Description: API for user management
      EndpointConfiguration:
        Types:
          - REGIONAL
  UsersResource:
    Type: AWS::ApiGateway::Resource
    Properties:
      RestApiId: !Ref ApiGateway
      ParentId: !GetAtt ApiGateway.RootResourceId
      PathPart: users
  UserResource:
    Type: AWS::ApiGateway::Resource
    Properties:
      RestApiId: !Ref ApiGateway
      ParentId: !Ref UsersResource
      PathPart: "{user_id}"
  GetUserMethod:
    Type: AWS::ApiGateway::Method
    Properties:
      RestApiId: !Ref ApiGateway
      ResourceId: !Ref UserResource
      HttpMethod: GET
      AuthorizationType: NONE
      Integration:
        Type: AWS_PROXY
        # Lambda proxy integrations always use POST, whatever the client method
        IntegrationHttpMethod: POST
        Uri: !Sub "arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${GetUserFunction.Arn}/invocations"
Monitoring and Observability
CloudWatch Metrics and Alarms
Custom metrics:
import boto3
from datetime import datetime, timezone

cloudwatch = boto3.client('cloudwatch')


def put_custom_metric(metric_name, value, unit='Count', namespace='MyApp'):
    # Publish a single data point tagged with environment and service dimensions
    cloudwatch.put_metric_data(
        Namespace=namespace,
        MetricData=[
            {
                'MetricName': metric_name,
                'Value': value,
                'Unit': unit,
                # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
                'Timestamp': datetime.now(timezone.utc),
                'Dimensions': [
                    {
                        'Name': 'Environment',
                        'Value': 'Production'
                    },
                    {
                        'Name': 'Service',
                        'Value': 'UserService'
                    }
                ]
            }
        ]
    )


# Usage example
put_custom_metric('UserRegistrations', 1)
put_custom_metric('ResponseTime', 150, 'Milliseconds')
CloudWatch alarms:
# AutoScalingGroup, SNSTopic and DatabaseInstance are assumed to be defined elsewhere
Resources:
  HighCPUAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: HighCPUUtilization
      AlarmDescription: Alarm when average CPU exceeds 80% for two consecutive 5-minute periods
      MetricName: CPUUtilization
      Namespace: AWS/EC2
      Statistic: Average
      Period: 300
      EvaluationPeriods: 2
      Threshold: 80
      ComparisonOperator: GreaterThanThreshold
      Dimensions:
        - Name: AutoScalingGroupName
          Value: !Ref AutoScalingGroup
      AlarmActions:
        - !Ref SNSTopic
  DatabaseConnectionsAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: HighDatabaseConnections
      AlarmDescription: Alarm when average DB connections exceed 80
      MetricName: DatabaseConnections
      Namespace: AWS/RDS
      Statistic: Average
      Period: 300
      EvaluationPeriods: 2
      # DatabaseConnections is an absolute count, not a percentage; pick a
      # threshold relative to the max_connections of the instance
      Threshold: 80
      ComparisonOperator: GreaterThanThreshold
      Dimensions:
        - Name: DBInstanceIdentifier
          Value: !Ref DatabaseInstance
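With Period: 300 and EvaluationPeriods: 2, an alarm fires only after the metric stays above the threshold for two consecutive 5-minute datapoints, roughly 10 minutes. The sketch below models that rule in a simplified form; it ignores DatapointsToAlarm and missing-data treatment, and the function name is illustrative.

```python
def alarm_state(datapoints, threshold, evaluation_periods):
    """Return 'ALARM' when the last N datapoints all exceed the threshold.

    Simplified model of a GreaterThanThreshold alarm (an assumption: no
    DatapointsToAlarm setting and no missing-data handling).
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


# One datapoint per 5-minute period of average CPU
print(alarm_state([60, 85, 90], threshold=80, evaluation_periods=2))  # ALARM
print(alarm_state([60, 95, 70], threshold=80, evaluation_periods=2))  # OK
print(alarm_state([95], threshold=80, evaluation_periods=2))          # INSUFFICIENT_DATA
```

A single spike does not trigger the alarm, which is the point of using more than one evaluation period: it trades a few minutes of detection delay for fewer false positives.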
AWS X-Ray for Tracing
Application instrumentation:
import requests
from aws_xray_sdk.core import patch_all, xray_recorder

# Patch supported libraries so their calls are traced automatically
patch_all()


def get_user_from_database(user_id):
    # Placeholder for the real database query (hypothetical helper)
    return {'user_id': user_id, 'type': 'standard'}


@xray_recorder.capture('process_user_request')
def process_user_request(user_id):
    # Subsegment for the database operation; the context manager records
    # any exception and always closes the subsegment
    with xray_recorder.in_subsegment('database_query') as subsegment:
        user_data = get_user_from_database(user_id)
        subsegment.put_metadata('user_id', user_id)
        subsegment.put_annotation('user_type', user_data.get('type'))

    # Subsegment for the external API call
    with xray_recorder.in_subsegment('external_api_call') as subsegment:
        response = requests.get(
            f'https://api.external.com/users/{user_id}',
            timeout=5  # avoid hanging on a slow dependency
        )
        subsegment.put_metadata('response_status', response.status_code)

    return user_data
Cost Optimization
Reserved Instances and Savings Plans
Usage analysis:
import boto3
from datetime import datetime, timedelta

ce = boto3.client('ce')


def analyze_ec2_usage():
    # Cost Explorer treats End as exclusive, so this covers the previous 30 days
    end_date = datetime.now().strftime('%Y-%m-%d')
    start_date = (datetime.now() - timedelta(days=30)).strftime('%Y-%m-%d')

    # Break costs down by service
    cost_response = ce.get_cost_and_usage(
        TimePeriod={
            'Start': start_date,
            'End': end_date
        },
        Granularity='MONTHLY',
        Metrics=['BlendedCost'],
        GroupBy=[
            {
                'Type': 'DIMENSION',
                'Key': 'SERVICE'
            }
        ]
    )
    return cost_response


# Reserved Instance recommendations
def get_ri_recommendations():
    response = ce.get_reservation_purchase_recommendation(
        # Cost Explorer expects the full service name for EC2
        Service='Amazon Elastic Compute Cloud - Compute',
        LookbackPeriodInDays='SIXTY_DAYS',
        TermInYears='ONE_YEAR',
        PaymentOption='NO_UPFRONT'
    )
    return response['Recommendations']
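The response from `get_cost_and_usage` is nested by time period and group, with amounts returned as strings. A small helper like the one below, a sketch over that response shape, can turn it into a ranked list of services. The sample data is made up for illustration.

```python
from collections import defaultdict


def total_cost_by_service(cost_response):
    """Sum BlendedCost per service across all returned time periods."""
    totals = defaultdict(float)
    for period in cost_response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            # Amounts arrive as strings in the API response
            totals[service] += float(group["Metrics"]["BlendedCost"]["Amount"])
    # Most expensive services first
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


# Made-up sample in the shape of a get_cost_and_usage response
sample = {
    "ResultsByTime": [
        {"Groups": [
            {"Keys": ["Amazon Simple Storage Service"],
             "Metrics": {"BlendedCost": {"Amount": "10.0", "Unit": "USD"}}},
            {"Keys": ["Amazon Elastic Compute Cloud - Compute"],
             "Metrics": {"BlendedCost": {"Amount": "100.5", "Unit": "USD"}}},
        ]},
        {"Groups": [
            {"Keys": ["Amazon Simple Storage Service"],
             "Metrics": {"BlendedCost": {"Amount": "5.5", "Unit": "USD"}}},
            {"Keys": ["Amazon Elastic Compute Cloud - Compute"],
             "Metrics": {"BlendedCost": {"Amount": "20.25", "Unit": "USD"}}},
        ]},
    ]
}

for service, cost in total_cost_by_service(sample):
    print(f"{service}: {cost:.2f}")
```

A 30-day window with MONTHLY granularity usually spans two calendar months, which is why the helper sums across periods instead of reading only the first one.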
Lifecycle Policies
S3 lifecycle (logs move to cheaper storage classes as they age and are deleted after 2555 days, about seven years):
{
  "Rules": [
    {
      "ID": "LogsLifecycle",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "logs/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {
        "Days": 2555
      }
    }
  ]
}
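To check what a rule like this implies for a given object, it can help to model it as a function of object age. The sketch below mirrors the thresholds in the policy; it is a simplification, since S3 applies lifecycle rules asynchronously and actual transitions can lag the nominal day.

```python
# Thresholds copied from the lifecycle rule: (minimum age in days, storage class)
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER"), (365, "DEEP_ARCHIVE")]
EXPIRATION_DAYS = 2555


def storage_class_at(age_days):
    """Return the storage class the rule targets for an object of this age."""
    if age_days >= EXPIRATION_DAYS:
        return "EXPIRED"
    current = "STANDARD"
    for min_age, storage_class in TRANSITIONS:
        if age_days >= min_age:
            current = storage_class
    return current


for age in (0, 45, 100, 400, 3000):
    print(age, storage_class_at(age))
```

A model like this is also a convenient place to plug in per-class prices when estimating how much a lifecycle rule saves.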
Disaster Recovery
Multi-Region Architecture
Cross-Region Replication:
Resources:
  # Primary region resources (instance class, engine and credentials omitted for brevity)
  PrimaryDatabase:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceIdentifier: primary-db
      BackupRetentionPeriod: 7
      DeletionProtection: true
  # Cross-region read replica. A CloudFormation stack lives in a single region,
  # so in practice this resource belongs in a separate stack deployed to the
  # secondary region and references the primary instance by ARN
  SecondaryDatabase:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceIdentifier: secondary-db
      SourceDBInstanceIdentifier: !Sub
        - "arn:aws:rds:${PrimaryRegion}:${AWS::AccountId}:db:${PrimaryDBId}"
        - PrimaryRegion: us-east-1
          PrimaryDBId: primary-db
      PubliclyAccessible: false
  # S3 Cross-Region Replication
  # SourceBucket and DestinationBucket are assumed to resolve to bucket names
  ReplicationRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: s3.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: ReplicationPolicy
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - s3:GetReplicationConfiguration
                  - s3:ListBucket
                Resource: !Sub "arn:aws:s3:::${SourceBucket}"
              - Effect: Allow
                Action:
                  - s3:GetObjectVersionForReplication
                  - s3:GetObjectVersionAcl
                Resource: !Sub "arn:aws:s3:::${SourceBucket}/*"
              - Effect: Allow
                Action:
                  - s3:ReplicateObject
                  - s3:ReplicateDelete
                Resource: !Sub "arn:aws:s3:::${DestinationBucket}/*"
Conclusion
Building scalable architectures on AWS requires a holistic approach that weighs not only capacity for growth but also cost, security, observability, and disaster recovery. Applying these design patterns and best practices consistently leads to robust, efficient systems.
Next Steps
- Implement Infrastructure as Code with Terraform or the AWS CDK
- Set up CI/CD pipelines for automated deployments
- Explore event-driven architectures with EventBridge
- Consider chaos engineering to test resilience