Plan: Batch & Async Processing Feature
1. Current State
- 40+ scheduled jobs across the payment, game, NFT, leaderboard, and maintenance domains.
- 25+ JMS listeners with dedicated queues.
  - ProgrammaticEndpointRegistration automatically registers per-machine/setting queues.
  - JMSListenerRecoveryScheduler automatically recovers failed listeners.
- 4 Firebase listeners for realtime events.
- Gift aggregation pipeline with Redis buffering.
2. Known Issues
| # | Issue | Severity |
|---|---|---|
| 1 | Queue lag is not monitored | High |
| 2 | No alerting on the DLQ | High |
| 3 | Firebase listener health cannot be measured | Medium |
| 4 | No metrics for job execution time | Medium |
| 5 | Dynamic queues are not updated automatically when a machine is added at runtime | Medium |
3. Improvement Plan
Phase 1: Observability (high priority)
- Task 1-1: Integrate ActiveMQ metrics with Spring Boot Actuator + Prometheus
- Task 1-2: Add DLQ alerting that fires when messages are present in the DLQ
- Task 1-3: Add a scheduled health check for the Firebase listeners
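One possible shape for the Task 1-3 health check is a staleness tracker: each listener records the time of its last event, and a scheduled job flags any listener that has been silent for too long. The sketch below is plain Java and assumes names that are not in this plan (`ListenerHealthTracker`, the listener names, the five-minute threshold). It also assumes that silence implies a problem, which only holds if the listeners receive regular traffic or a heartbeat.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: remembers when each listener last reported activity. */
class ListenerHealthTracker {
    private final Map<String, Instant> lastSeen = new ConcurrentHashMap<>();
    private final Duration maxSilence;

    ListenerHealthTracker(Duration maxSilence) {
        this.maxSilence = maxSilence;
    }

    /** Call from the listener callback (or a heartbeat) whenever an event arrives. */
    void recordEvent(String listenerName, Instant at) {
        lastSeen.put(listenerName, at);
    }

    /** Returns the sorted names of listeners silent for longer than maxSilence. */
    List<String> findStale(Instant now) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : lastSeen.entrySet()) {
            Duration silence = Duration.between(entry.getValue(), now);
            if (silence.compareTo(maxSilence) > 0) {
                stale.add(entry.getKey());
            }
        }
        Collections.sort(stale);
        return stale;
    }
}
```

A scheduled job could call `findStale(Instant.now())` and raise an alert or trigger a reconnect for each returned name.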
Phase 2: Stability
- Task 2-1: Review and standardize the DLQ strategy for all queues
- Task 2-2: Test that JMSListenerRecoveryScheduler works correctly
- Task 2-3: Update dynamic queues when an admin creates a new machine/setting
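For Task 2-3, one way to think about the runtime update is as a reconciliation step: compare the queues that should exist for the current machines against the queues that are actually registered, then register the difference. The sketch below is a plain-Java illustration only. `QueueReconciler` and the `machine.<id>` naming scheme are hypothetical, and it does not show how ProgrammaticEndpointRegistration registers a queue.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that diffs expected queues against registered ones. */
class QueueReconciler {

    /** Assumed naming scheme; the real queue names are not given in this plan. */
    static String queueNameFor(String machineId) {
        return "machine." + machineId;
    }

    /** Queues that should exist for the given machines but are not registered yet. */
    static Set<String> missingQueues(Set<String> machineIds, Set<String> registeredQueues) {
        Set<String> missing = new TreeSet<>();
        for (String id : machineIds) {
            String queue = queueNameFor(id);
            if (!registeredQueues.contains(queue)) {
                missing.add(queue);
            }
        }
        return missing;
    }

    /** Registered queues with no matching machine (candidates for cleanup). */
    static Set<String> orphanedQueues(Set<String> machineIds, Set<String> registeredQueues) {
        Set<String> expected = new TreeSet<>();
        for (String id : machineIds) {
            expected.add(queueNameFor(id));
        }
        Set<String> orphaned = new TreeSet<>(registeredQueues);
        orphaned.removeAll(expected);
        return orphaned;
    }
}
```

Running the same diff on a schedule would also give a simple check that the queue count matches the machine count.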
Phase 3: Performance
- Task 3-1: Profile job execution times (Spring Batch metrics)
- Task 3-2: Tune concurrency for each queue type based on production workload
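As a rough illustration of what Task 3-1 needs to capture, the sketch below wraps a job in a timer and keeps a run count, mean, and maximum per job name. It is plain Java with hypothetical names (`JobTimer`, `Stats`). In the real service a metrics library such as Micrometer would likely take the place of the hand-rolled map, so treat this as a description of the data to collect rather than the intended implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical wrapper that records how long each named job takes. */
class JobTimer {

    /** Aggregated timings for one job name. */
    static final class Stats {
        private long runs;
        private long totalNanos;
        private long maxNanos;

        synchronized void add(long nanos) {
            runs++;
            totalNanos += nanos;
            maxNanos = Math.max(maxNanos, nanos);
        }

        synchronized long runs() { return runs; }
        synchronized long maxNanos() { return maxNanos; }
        synchronized double meanMillis() {
            return runs == 0 ? 0.0 : totalNanos / 1e6 / runs;
        }
    }

    private final Map<String, Stats> stats = new ConcurrentHashMap<>();
    private final LongSupplier nanoClock;

    JobTimer() {
        this(System::nanoTime);
    }

    /** The clock is injectable so the timer can be tested deterministically. */
    JobTimer(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    /** Runs the job and records its duration, even if the job throws. */
    void time(String jobName, Runnable job) {
        long start = nanoClock.getAsLong();
        try {
            job.run();
        } finally {
            long elapsed = nanoClock.getAsLong() - start;
            stats.computeIfAbsent(jobName, k -> new Stats()).add(elapsed);
        }
    }

    /** Returns the stats for a job, or null if it has never run. */
    Stats statsFor(String jobName) {
        return stats.get(jobName);
    }
}
```

The per-job mean and maximum gathered this way are the kind of input Task 3-2 would need when choosing concurrency settings.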
Phase 4: Testing
- Task 4-1: Test dynamic queue registration (startup)
- Task 4-2: Test listener recovery (kill listener → check recovery)
- Task 4-3: Load test gift aggregation pipeline
4. Timeline
gantt
title Batch & Async - Improvement Plan
dateFormat YYYY-MM-DD
section Phase 1 Observability
Task 1-1 ActiveMQ metrics :crit, t1_1, 2025-01-01, 2d
Task 1-2 DLQ alerting :crit, t1_2, after t1_1, 1d
Task 1-3 Firebase health :t1_3, after t1_1, 2d
section Phase 2 Stability
Task 2-1 DLQ strategy :t2_1, after t1_2, 2d
Task 2-2 Recovery test :t2_2, after t2_1, 1d
Task 2-3 Dynamic update :t2_3, after t2_2, 3d
section Phase 3 Performance
Task 3-1 Job metrics :t3_1, after t1_3, 2d
Task 3-2 Concurrency tune :t3_2, after t3_1, 2d
section Phase 4 Testing
Task 4-1 Dynamic queue :t4_1, after t2_3, 2d
Task 4-2 Recovery test :t4_2, after t4_1, 1d
Task 4-3 Gift load test :t4_3, after t4_2, 3d
5. Risks
| Risk | Impact | Mitigation |
|---|---|---|
| Sudden queue backlog spike | Processing delay | Queue lag alert + scale consumers |
| Job overlap | Double processing | fixedDelay + distributed lock (ShedLock) |
| Firebase listener crash | Events lost | Health check + auto-reconnect |
| Not enough dynamic queues | Rooms stuck | Monitor queue count vs machine count |
| DLQ accumulation | Silent errors | DLQ monitoring + alert |
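The job-overlap mitigation relies on a lock with a lease: a job may run only if no other holder's lease is still valid, and a crashed holder cannot block the job forever because its lease expires. The sketch below shows that idea in a single process. It is not ShedLock's API; `LeaseLock` is a hypothetical stand-in, and a real distributed lock would keep the lease in shared storage rather than an in-memory map.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a lease-based distributed lock. */
class LeaseLock {
    private final Map<String, Instant> lockedUntil = new HashMap<>();

    /**
     * Tries to take the named lock until now + lease.
     * Returns false if another holder's lease is still valid.
     */
    synchronized boolean tryAcquire(String name, Instant now, Duration lease) {
        Instant until = lockedUntil.get(name);
        if (until != null && now.isBefore(until)) {
            return false;
        }
        lockedUntil.put(name, now.plus(lease));
        return true;
    }

    /** Releases the lock early, for example when the job finishes normally. */
    synchronized void release(String name) {
        lockedUntil.remove(name);
    }
}
```

The lease length plays a role similar to ShedLock's `lockAtMostFor` setting: it should exceed the longest expected run of the job so that a healthy run is never overlapped.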