Monitoring
Observability and monitoring in production.
Logs
Django Logging
# config/settings/production.py
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '{levelname} {asctime} {module} {process:d} {thread:d} {message}',
            'style': '{',
        },
    },
    'handlers': {
        'file': {
            'level': 'INFO',
            'class': 'logging.handlers.RotatingFileHandler',
            'filename': '/var/log/hymsplat/django.log',
            'maxBytes': 10485760,  # 10 MB
            'backupCount': 5,
            'formatter': 'verbose',
        },
        'error_file': {
            'level': 'ERROR',
            'class': 'logging.handlers.RotatingFileHandler',
            'filename': '/var/log/hymsplat/error.log',
            'maxBytes': 10485760,  # 10 MB
            'backupCount': 5,
            'formatter': 'verbose',
        },
    },
    'root': {
        'handlers': ['file'],
        'level': 'INFO',
    },
    'loggers': {
        'django': {
            'handlers': ['file', 'error_file'],
            'level': 'INFO',
            # The root logger also writes to 'file', so propagating would
            # record every Django message in django.log twice.
            'propagate': False,
        },
    },
}
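To try the handler levels locally before deploying, the following sketch loads a trimmed-down copy of the configuration with the standard library alone. It is an illustration under stated assumptions, not the project's settings: `build_logging_config` and `demo` are hypothetical helpers, and a temporary directory stands in for /var/log/hymsplat.

```python
import logging
import logging.config
import tempfile
from pathlib import Path


def build_logging_config(log_dir: Path) -> dict:
    """Return a reduced LOGGING dict that writes errors under log_dir."""
    return {
        'version': 1,
        'disable_existing_loggers': False,
        'formatters': {
            'verbose': {
                'format': '{levelname} {asctime} {module} {message}',
                'style': '{',
            },
        },
        'handlers': {
            'error_file': {
                'level': 'ERROR',
                'class': 'logging.handlers.RotatingFileHandler',
                'filename': str(log_dir / 'error.log'),
                'maxBytes': 10485760,
                'backupCount': 5,
                'formatter': 'verbose',
            },
        },
        'loggers': {
            'django': {
                'handlers': ['error_file'],
                'level': 'INFO',
                'propagate': False,
            },
        },
    }


def demo() -> str:
    """Log one INFO and one ERROR record; return the contents of error.log."""
    log_dir = Path(tempfile.mkdtemp())  # assumption: stand-in for /var/log/hymsplat
    logging.config.dictConfig(build_logging_config(log_dir))
    logger = logging.getLogger('django.request')
    logger.info('request served')          # below the handler level, so skipped
    logger.error('database unreachable')   # reaches error.log
    logging.shutdown()                     # flush and close the file handler
    return (log_dir / 'error.log').read_text()
```

Calling `demo()` should return a single line that starts with ERROR, which confirms that the handler's own level filters records even when the logger accepts INFO.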
Gunicorn Logs
# gunicorn.conf.py
errorlog = "/var/log/gunicorn/error.log"
accesslog = "/var/log/gunicorn/access.log"
loglevel = "info"
Nginx Logs
Metrics
Django Prometheus
# config/settings/production.py
INSTALLED_APPS += ['django_prometheus']

MIDDLEWARE = [
    'django_prometheus.middleware.PrometheusBeforeMiddleware',
    # ... other middleware ...
    'django_prometheus.middleware.PrometheusAfterMiddleware',
]
Available Metrics
- Request count
- Request latency
- Response status codes
- Database queries
- Cache hits/misses
Alerts
Sentry
# config/settings/production.py
import sentry_sdk
from sentry_sdk.integrations.django import DjangoIntegration

sentry_sdk.init(
    dsn=env('SENTRY_DSN'),
    integrations=[DjangoIntegration()],
    traces_sample_rate=0.1,  # trace 10% of transactions
    send_default_pii=True,  # attach user data (PII) to events
)
Recommended Alerts
| Alert | Condition | Action |
|---|---|---|
| Error Rate | > 1% over 5 min | Investigate |
| Latency | p99 > 2 s | Optimize |
| 5xx | > 10 in 1 min | Investigate |
| Disk | > 80% used | Clean up or expand |
| Memory | > 90% used | Investigate for leaks |
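The thresholds in the table can be expressed as simple predicates over a metrics snapshot. The sketch below is only an illustration of that idea: the `Snapshot` fields, `RULES`, and `firing_alerts` are hypothetical names, and a real deployment would more likely define these rules in the alerting tool itself.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical metrics snapshot; field names are illustrative."""
    error_rate_pct: float    # % of failed requests over the last 5 minutes
    latency_p99_s: float     # p99 latency in seconds
    http_5xx_last_min: int   # 5xx responses in the last minute
    disk_used_pct: float     # % of disk in use
    memory_used_pct: float   # % of memory in use


# (alert name, condition, suggested action), one entry per table row
RULES = [
    ('Error Rate', lambda s: s.error_rate_pct > 1, 'Investigate'),
    ('Latency', lambda s: s.latency_p99_s > 2, 'Optimize'),
    ('5xx', lambda s: s.http_5xx_last_min > 10, 'Investigate'),
    ('Disk', lambda s: s.disk_used_pct > 80, 'Clean up or expand'),
    ('Memory', lambda s: s.memory_used_pct > 90, 'Investigate for leaks'),
]


def firing_alerts(snapshot: Snapshot) -> list:
    """Return (name, action) for every rule whose condition holds."""
    return [(name, action) for name, cond, action in RULES if cond(snapshot)]
```

All comparisons are strict, so a value exactly at a threshold (for example 90% memory) does not fire.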
Health Checks
Endpoint
# apps/core/views.py
import redis
import requests
from django.conf import settings
from django.db import connection
from django.http import JsonResponse


def health_check(request):
    """Health check endpoint."""
    status = {'status': 'healthy'}

    # Database
    try:
        with connection.cursor() as cursor:
            cursor.execute('SELECT 1')
        status['database'] = 'ok'
    except Exception as e:
        status['database'] = f'error: {e}'
        status['status'] = 'unhealthy'

    # TypeSense (assumes TYPESENSE_HOST is defined in the Django settings)
    try:
        resp = requests.get(f'{settings.TYPESENSE_HOST}/health', timeout=2)
        status['typesense'] = 'ok' if resp.ok else 'error'
    except Exception:
        status['typesense'] = 'error'

    # Redis (assumes REDIS_URL is defined in the Django settings)
    try:
        r = redis.from_url(settings.REDIS_URL)
        r.ping()
        status['redis'] = 'ok'
    except Exception:
        status['redis'] = 'error'

    # Only a database failure marks the service unhealthy (HTTP 503);
    # TypeSense and Redis failures are reported but still return 200.
    http_status = 200 if status['status'] == 'healthy' else 503
    return JsonResponse(status, status=http_status)
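In the view above, each dependency repeats the same try/except pattern, and only the database failure changes the HTTP status. One way to make that policy explicit is to pass the checks as callables along with the set of critical ones. This is a framework-free sketch under that assumption; `run_checks` and its parameters are hypothetical and not part of the project.

```python
from typing import Callable, Dict, Set, Tuple


def run_checks(
    checks: Dict[str, Callable[[], None]],
    critical: Set[str],
) -> Tuple[Dict[str, str], int]:
    """Run every check; a check passes if it returns without raising.

    Returns the JSON-ready payload and the HTTP status code. Only a failing
    check listed in `critical` turns the overall status unhealthy (503).
    """
    payload = {'status': 'healthy'}
    for name, check in checks.items():
        try:
            check()
            payload[name] = 'ok'
        except Exception:
            payload[name] = 'error'
            if name in critical:
                payload['status'] = 'unhealthy'
    http_status = 200 if payload['status'] == 'healthy' else 503
    return payload, http_status
```

A view could then build the dictionary of checks (database, TypeSense, Redis) and return `JsonResponse(payload, status=http_status)`; adding a dependency to `critical` is the only change needed to make its failure return 503.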
Nginx Health
Dashboards
Grafana
Recommended metrics:
- Request Rate: requests per second
- Error Rate: percentage of requests that fail
- Latency: p50, p95, p99
- Active Users: active sessions
- Search Latency: search response time
Example Dashboard
{
  "panels": [
    {
      "title": "Request Rate",
      "type": "graph",
      "targets": [
        {
          "expr": "rate(django_http_requests_total[5m])"
        }
      ]
    },
    {
      "title": "Error Rate",
      "type": "singlestat",
      "targets": [
        {
          "expr": "sum(rate(django_http_requests_total{status=~\"5..\"}[5m])) / sum(rate(django_http_requests_total[5m])) * 100"
        }
      ]
    }
  ]
}
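The Error Rate expression divides the per-second rate of 5xx responses by the per-second rate of all responses and scales the result to a percentage. The arithmetic can be sketched in a few lines. This is a simplification for illustration: Prometheus's `rate()` also handles counter resets and extrapolates to the window edges, which the hypothetical helpers below do not.

```python
def per_second_rate(first: float, last: float, window_s: float) -> float:
    """Average per-second increase of a counter across the window."""
    return (last - first) / window_s


def error_rate_pct(
    total_first: float,
    total_last: float,
    errors_first: float,
    errors_last: float,
    window_s: float = 300.0,  # 5 minutes, matching the [5m] range
) -> float:
    """Share of requests that returned 5xx during the window, in percent."""
    total = per_second_rate(total_first, total_last, window_s)
    errors = per_second_rate(errors_first, errors_last, window_s)
    # PromQL would yield NaN when there is no traffic; this sketch returns 0.
    return 0.0 if total == 0 else errors / total * 100
```

For example, if the total counter grows from 1000 to 4000 and the 5xx counter from 10 to 40 within five minutes, the rates are 10 and 0.1 requests per second, so the panel would show about 1%.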
Useful Commands
# Follow logs in real time
sudo journalctl -u hymsplat -f
# Recent errors
sudo tail -100 /var/log/hymsplat/error.log
# Nginx access
sudo tail -f /var/log/nginx/hymsplat.access.log | grep -v health
# Celery workers
poetry run celery -A config inspect active
# Redis stats
redis-cli info
Troubleshooting
High Response Time
- Check slow queries: pg_stat_statements
- Check the cache: redis-cli info stats
- Check CPU/memory: htop
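When checking the cache, the most telling numbers in the `redis-cli info stats` output are `keyspace_hits` and `keyspace_misses`. The sketch below parses that output and computes the hit ratio. The helper names are hypothetical, and the input is expected to be the raw text printed by the command.

```python
from typing import Dict, Optional


def parse_redis_info(text: str) -> Dict[str, str]:
    """Parse INFO output: "key:value" lines, with "# Section" headers skipped."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, _, value = line.partition(':')
        fields[key] = value
    return fields


def cache_hit_ratio(text: str) -> Optional[float]:
    """Return hits / (hits + misses), or None if there were no lookups."""
    fields = parse_redis_info(text)
    hits = int(fields.get('keyspace_hits', 0))
    misses = int(fields.get('keyspace_misses', 0))
    lookups = hits + misses
    return hits / lookups if lookups else None
```

A low ratio suggests that slow responses come from cache misses falling through to the database rather than from Redis itself.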
500 Errors
- Check the logs: /var/log/hymsplat/error.log
- Check Sentry
- Check connections: DB, Redis, TypeSense
High Memory Usage
- Check workers: ps aux | grep gunicorn
- Check cache size
- Restart workers: sudo systemctl restart hymsplat
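To see which worker is holding the most memory before restarting, the `ps aux` output can be ranked by resident set size. This is a rough sketch that assumes the Linux procps column layout (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND, with RSS in KiB); `gunicorn_rss_kib` is a hypothetical helper.

```python
from typing import List, Tuple


def gunicorn_rss_kib(ps_output: str) -> List[Tuple[int, int]]:
    """Return (pid, rss_kib) for each gunicorn process, largest first."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        cols = line.split(None, 10)  # COMMAND is the 11th column and may contain spaces
        if len(cols) < 11 or 'gunicorn' not in cols[10]:
            continue
        rows.append((int(cols[1]), int(cols[5])))  # PID and RSS columns
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Feed it the plain `ps aux` output rather than the grep-filtered one, since the filter drops the header row that the function skips. A single worker far above the others points to a leak in a specific request path rather than general load.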