Incremental Synchronization System
Overview
The CDP implements an automated incremental synchronization system that keeps order data up to date. Each run captures only new and updated orders rather than reloading the full history, which keeps resource usage low.
Architecture
Components
- Smart Sync Worker (Railway Service)
  - Repository: https://github.com/NomadaDigital01/nerdistan-worker
  - Service: Runs independently on Railway
  - Schedule: Automated execution every hour
- Incremental Sync Manager
  - Location: scripts/incremental_sync.py
  - Purpose: Captures new and updated orders
  - Batch size: 500 orders per execution
- Database Tables
  - cdp.order_events: Main order storage
  - cdp.tenants: Tenant configuration and status
  - cdp.sync_checkpoints: Progress tracking
Synchronization Schedule
Daily Incremental Sync
- Frequency: Every 6 hours
- Times: 00:00, 06:00, 12:00, 18:00 (Argentina timezone)
- Scope: Last 24 hours of data
- Duration: ~5-10 minutes per tenant
Historical Sync
- Trigger: Automatic, when gaps are detected
- Batch size: 7 days at a time
- Priority: Lower than daily sync
- Checkpoints: Resumable after interruptions
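The 7-day batching and checkpoint resume described above can be sketched as a window generator. This is a minimal illustration, not the worker's actual code: the function name `historical_windows` and the idea of storing the checkpoint as a single date are assumptions.

```python
from datetime import date, timedelta
from typing import Iterator, Optional, Tuple


def historical_windows(
    go_live_date: date,
    today: date,
    checkpoint: Optional[date] = None,
    batch_days: int = 7,
) -> Iterator[Tuple[date, date]]:
    """Yield half-open [start, end) windows of at most batch_days days.

    Hypothetical helper: if a checkpoint exists, processing resumes
    from it instead of restarting at go_live_date.
    """
    start = checkpoint or go_live_date
    while start < today:
        end = min(start + timedelta(days=batch_days), today)
        yield start, end
        start = end  # the end of this window becomes the next checkpoint


# Usage: a fresh backfill versus one resumed after an interruption.
fresh = list(historical_windows(date(2025, 1, 1), date(2025, 1, 20)))
resumed = list(
    historical_windows(date(2025, 1, 1), date(2025, 1, 20), checkpoint=date(2025, 1, 15))
)
```

Half-open windows avoid processing the boundary day twice when a run resumes from the stored checkpoint.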
Full Sync
- Frequency: Weekly (Sundays at 2 AM)
- Scope: Complete 30-day refresh
- Purpose: Data integrity verification
Configuration
Tenant Settings
-- Each tenant has these sync configurations
sync_enabled: boolean -- Enable/disable sync
sync_frequency_hours: integer -- Hours between syncs (typically 6)
last_sync_at: timestamp -- Last successful sync
go_live_date: date -- Start date for historical data
initial_sync_completed: boolean -- Full sync status
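As a rough illustration of how these settings could drive scheduling, the sketch below decides whether a tenant is due for a sync. The `TenantSyncConfig` class and `is_sync_due` function are hypothetical names; only the three field names come from the settings listed above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TenantSyncConfig:
    """Hypothetical in-memory view of a tenant's sync settings."""
    sync_enabled: bool
    sync_frequency_hours: int
    last_sync_at: Optional[datetime]  # None if the tenant has never synced


def is_sync_due(cfg: TenantSyncConfig, now: datetime) -> bool:
    """Return True when the tenant should be synced on this run."""
    if not cfg.sync_enabled:
        return False
    if cfg.last_sync_at is None:
        return True  # assumption: a never-synced tenant is always due
    return now - cfg.last_sync_at >= timedelta(hours=cfg.sync_frequency_hours)


# Usage: a tenant on a 6-hour frequency, last synced 7 hours ago.
now = datetime(2025, 9, 20, 12, 0)
due = is_sync_due(TenantSyncConfig(True, 6, now - timedelta(hours=7)), now)
```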
Current Active Tenants
| Tenant | Sync Frequency | Last Sync Status | Daily Orders |
|---|---|---|---|
| PetBaar | 6 hours | ✅ Active | ~40-50 |
| Seven Sport | 6 hours | ✅ Active | ~55-60 |
| Chelsea | 6 hours | ✅ Active | ~60-65 |
| Mundo Juguete | 6 hours | ✅ Active | ~55-60 |
| Kangoo Pet Food | 6 hours | ✅ Active | ~40-45 |
| Celada SA | 6 hours | ✅ Active | ~60-65 |
| Ferreira | 6 hours | ✅ Active | ~60-65 |
| Digital Farma | 6 hours | ✅ Active | ~40-45 |
| Zapatos Net | 6 hours | ✅ Active | ~15-20 |
| Bercovich SA | 6 hours | ✅ Active | ~10-15 |
| Essential | 6 hours | ✅ Active | ~60-65 |
Sync Process Flow
graph TD
A[Smart Worker Starts] --> B{Check Time}
B -->|Business Hours| C[Daily Sync]
B -->|Off Hours| D[Historical Sync]
B -->|Sunday 2AM| E[Weekly Full Sync]
C --> F[Get Recent Orders]
D --> G[Process Historical Gaps]
E --> H[Complete Refresh]
F --> I[Update CDP Tables]
G --> I
H --> I
I --> J[Update Checkpoints]
J --> K[Update last_sync_at]
K --> L[Complete]
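The "Check Time" branch in the flow above could look roughly like the following. Several details are assumptions rather than documented behavior: the business-hours window (08:00 to 20:00), the fixed UTC-3 offset used for Argentina time, the rule that the Sunday 2 AM check takes priority, and the name `choose_sync_mode`.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Argentina time modeled as a fixed UTC-3 offset.
ART = timezone(timedelta(hours=-3))

# Assumption: the source does not define "business hours".
BUSINESS_START_HOUR = 8
BUSINESS_END_HOUR = 20


def choose_sync_mode(now: datetime) -> str:
    """Map a timezone-aware timestamp to 'full', 'daily', or 'historical'."""
    local = now.astimezone(ART)
    # weekday() == 6 is Sunday; the weekly full sync wins over other modes.
    if local.weekday() == 6 and local.hour == 2:
        return "full"
    if BUSINESS_START_HOUR <= local.hour < BUSINESS_END_HOUR:
        return "daily"
    return "historical"


# Usage: the hourly worker would call this once at the start of each run.
mode = choose_sync_mode(datetime(2025, 9, 22, 10, 0, tzinfo=ART))
```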
Data Volume Statistics
Daily Incremental Load
- Average: ~1,000 orders/day across all tenants
- Peak: ~1,500 orders/day (weekdays)
- Low: ~700 orders/day (weekends)
Current Database Size
- Total Orders: 155,777
- Unique Customers: 30,272
- Products: 49,831
- Date Range: January 2023 - Present
Monitoring
Key Metrics
- Sync Status Check
SELECT
tenant_name,
last_sync_at,
CASE
WHEN last_sync_at > NOW() - INTERVAL '6 hours' THEN '✅ Current'
WHEN last_sync_at > NOW() - INTERVAL '12 hours' THEN '⚠️ Behind'
ELSE '❌ Stale'
END as status
FROM cdp.tenants
WHERE is_active = true;
- Daily Order Growth
SELECT
DATE(order_date) as date,
COUNT(*) as new_orders
FROM cdp.order_events
WHERE order_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY DATE(order_date)
ORDER BY date DESC;
- Sync Gaps Detection
SELECT
tenant_name,
MAX(order_date) as last_order,
CURRENT_DATE - DATE(MAX(order_date)) as days_behind
FROM cdp.tenants t
JOIN cdp.order_events o ON t.tenant_id = o.tenant_id
GROUP BY tenant_name
HAVING CURRENT_DATE - DATE(MAX(order_date)) > 1;
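The gap-detection query can also be mirrored in application code, for example inside an alerting script. The sketch below assumes the rows have already been fetched as `(tenant_name, order_date)` pairs; the function name `tenants_behind` and the sample tenant names are hypothetical.

```python
from datetime import date
from typing import Dict, Iterable, Tuple


def tenants_behind(
    orders: Iterable[Tuple[str, date]],
    today: date,
    max_days: int = 1,
) -> Dict[str, int]:
    """Return {tenant_name: days_behind} for tenants whose latest
    order is more than max_days old, like the HAVING clause above."""
    latest: Dict[str, date] = {}
    for tenant_name, order_date in orders:
        if tenant_name not in latest or order_date > latest[tenant_name]:
            latest[tenant_name] = order_date
    return {
        name: (today - last).days
        for name, last in latest.items()
        if (today - last).days > max_days
    }


# Usage with made-up rows: tenant_b has no orders in the last five days.
rows = [
    ("tenant_a", date(2025, 9, 18)),
    ("tenant_a", date(2025, 9, 19)),
    ("tenant_b", date(2025, 9, 15)),
]
behind = tenants_behind(rows, today=date(2025, 9, 20))
```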
Troubleshooting
Common Issues
Orders Not Updating
- Check last_sync_at in the cdp.tenants table
- Verify sync_enabled = true
- Check Railway worker logs:
railway logs --service smart-sync-worker
Duplicate Orders
- System checks for existing order_id before insert
- Multi-policy orders handled with sales_channel field
- Only 3 duplicates detected in 155K orders (0.002%)
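One way to picture the duplicate check is a filter keyed on `order_id` plus `sales_channel`. Treating that pair as the uniqueness key is an assumption drawn from the two bullets above, and `filter_new_orders` is a hypothetical helper, not the system's actual insert path.

```python
from typing import Dict, Iterable, List, Set, Tuple

OrderKey = Tuple[str, str]  # (order_id, sales_channel), an assumed composite key


def filter_new_orders(
    incoming: Iterable[Dict[str, str]],
    existing_keys: Set[OrderKey],
) -> List[Dict[str, str]]:
    """Drop orders whose key is already stored or repeated in this batch."""
    seen = set(existing_keys)  # copy so the caller's set is not mutated
    fresh: List[Dict[str, str]] = []
    for order in incoming:
        key = (order["order_id"], order["sales_channel"])
        if key in seen:
            continue
        seen.add(key)
        fresh.append(order)
    return fresh


# Usage: the same order_id on two sales channels counts as two rows.
stored = {("1001", "1")}
batch = [
    {"order_id": "1001", "sales_channel": "1"},  # already stored
    {"order_id": "1001", "sales_channel": "2"},  # multi-policy order
    {"order_id": "1002", "sales_channel": "1"},
    {"order_id": "1002", "sales_channel": "1"},  # repeated within the batch
]
to_insert = filter_new_orders(batch, stored)
```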
Missing Historical Data
- Verify go_live_date is set correctly
- Check initial_sync_completed status
- Review checkpoint table for gaps
Manual Sync Trigger
# Force full sync for specific tenant
SYNC_MODE=full python scripts/smart_sync_hybrid_inverse.py --tenant-id 20
# Run incremental sync manually
python scripts/incremental_sync.py
# Check sync progress
python check_sync_progress.py
Performance Optimization
Best Practices
- Batch Processing: 50 orders per page for quick responses
- Checkpoint System: Resume from last successful point
- Off-peak Historical: Heavy processing during night hours
- Connection Pooling: Reuse database connections
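The paging and checkpoint practices above can be combined in a small loop. This is a sketch under stated assumptions: `fetch_page` stands in for whatever client call returns one page of orders, the 500-order cap is taken from the Incremental Sync Manager's batch size, and storing the next page number as the checkpoint is a simplification.

```python
from typing import Callable, Dict, List, Tuple


def fetch_batch(
    fetch_page: Callable[[int, int], List[Dict]],
    start_page: int = 1,
    page_size: int = 50,
    max_orders: int = 500,
) -> Tuple[List[Dict], int]:
    """Collect pages until max_orders is reached or a short page appears.

    Returns the orders plus the next page number, which a caller could
    persist as a checkpoint and pass back as start_page later.
    """
    orders: List[Dict] = []
    page = start_page
    while len(orders) < max_orders:
        rows = fetch_page(page, page_size)  # hypothetical API call
        orders.extend(rows)
        page += 1
        if len(rows) < page_size:
            break  # a short page means there is no more data for now
    return orders, page


# Usage with an in-memory stand-in for the remote API.
data = [{"order_id": str(i)} for i in range(120)]
fake_page = lambda page, size: data[(page - 1) * size : page * size]
first, next_page = fetch_batch(fake_page, max_orders=100)
rest, final_page = fetch_batch(fake_page, start_page=next_page)
```

Small pages keep each request fast, while the per-execution cap bounds how long a single run can take.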
Resource Usage
- CPU: ~10-15% during sync
- Memory: ~200-300MB per worker
- Network: ~50-100 requests/minute to VTEX API
- Database: ~500-1000 inserts/minute
Integration with CDP
After synchronization completes:
- Real-time Events trigger customer profile updates
- RFM Segmentation recalculates automatically
- CLV Predictions update for affected customers
- Journey Stages progress based on new activity
Future Enhancements
- Real-time webhook integration
- Parallel tenant processing
- Automatic retry mechanism
- Data quality validation
- Sync performance dashboard
Last updated: September 20, 2025