Skill v1.0.0
version: "1.0.0"
name: test-data-management
description: "Strategic test data generation, management, and privacy compliance. Use when creating test data, handling PII, ensuring GDPR/CCPA compliance, or scaling data generation for realistic testing scenarios."
category: specialized-testing
priority: high
tokenEstimate: 1000
agents: [qe-test-data-architect, qe-test-executor, qe-security-scanner]
implementation_status: optimized
optimization_version: 1.0
last_optimized: 2025-12-02
dependencies: []
quick_reference_card: true
tags: [test-data, faker, synthetic, gdpr, pii, anonymization, factories]
trust_tier: 3
validation:
  schema_path: schemas/output.json
  validator_path: scripts/validate-config.json
  eval_path: evals/test-data-management.yaml
Test Data Management
<default_to_action> When creating or managing test data:
- NEVER use production PII directly
- GENERATE synthetic data with faker libraries
- ANONYMIZE production data if used (mask, hash)
- ISOLATE test data (transactions, per-test cleanup)
- SCALE with batch generation (10k+ records/sec)
Quick Data Strategy:
- Unit tests: Minimal data (just enough)
- Integration: Realistic data (full complexity)
- Performance: Volume data (10k+ records)
Critical Success Factors:
- 40% of test failures stem from inadequate data
- GDPR fines reach up to €20M or 4% of global annual turnover, whichever is higher
- Never store production PII in test environments
</default_to_action>
Quick Reference Card
When to Use
- Creating test datasets
- Handling sensitive data
- Performance testing with volume
- GDPR/CCPA compliance
Data Strategies
| Type | When | Size |
|---|---|---|
| Minimal | Unit tests | 1-10 records |
| Realistic | Integration | 100-1000 records |
| Volume | Performance | 10k+ records |
| Edge cases | Boundary testing | Targeted |
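The sizing guidance in the table can be sketched as a small, dependency-free factory. This is a minimal illustration, not the skill's own generator: `mulberry32`, `DATASET_SIZES`, `makeUser`, and `makeUsers` are hypothetical names, and the concrete sizes (5, 500, 10000) are assumed values picked from within the ranges in the table. A seeded random source keeps the generated data reproducible between runs.

```javascript
// Small seeded PRNG (mulberry32) so generated data is reproducible across runs.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// ASSUMPTION: example sizes chosen from the ranges in the Data Strategies table.
const DATASET_SIZES = { minimal: 5, realistic: 500, volume: 10000 };

const FIRST_NAMES = ['Ada', 'Grace', 'Alan', 'Edsger', 'Barbara'];

// Build one synthetic user. `overrides` lets a test pin only the fields it cares about.
function makeUser(id, rand, overrides = {}) {
  const firstName = FIRST_NAMES[Math.floor(rand() * FIRST_NAMES.length)];
  return {
    id,
    firstName,
    email: `user-${id}@example.com`, // reserved example domain, never a real address
    age: 18 + Math.floor(rand() * 73), // 18..90 inclusive
    ...overrides,
  };
}

// Build a whole dataset for a named strategy ('minimal', 'realistic', 'volume').
function makeUsers(strategy, seed = 1) {
  const count = DATASET_SIZES[strategy];
  if (count === undefined) throw new Error(`Unknown strategy: ${strategy}`);
  const rand = mulberry32(seed);
  return Array.from({ length: count }, (_, i) => makeUser(i + 1, rand));
}
```

A unit test might call `makeUsers('minimal')`, while a performance run would call `makeUsers('volume')`. Edge cases are usually better served by `makeUser` with explicit overrides than by random generation.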
Data Anonymization
import { faker } from '@faker-js/faker';

// Masking
function maskEmail(email) {
  const [user, domain] = email.split('@');
  return `${user[0]}***@${domain}`;
}
// john@example.com → j***@example.com

function maskCreditCard(cc) {
  return `****-****-****-${cc.slice(-4)}`;
}
// 4242424242424242 → ****-****-****-4242

// Anonymize production data (prodUsers is assumed to be loaded elsewhere)
const anonymizedUsers = prodUsers.map(user => ({
  id: user.id, // Keep ID for relationships
  email: `user-${user.id}@example.com`, // Fake email
  firstName: faker.person.firstName(), // Generated
  phone: null, // Remove PII
  createdAt: user.createdAt // Keep non-PII
}));
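When the same PII value appears in several tables, a keyed hash is one common way to replace it with a stable token so that joins still work. The sketch below assumes Node.js and its built-in `crypto` module. `pseudonymize`, `pseudonymizeUser`, and the `PSEUDONYM_KEY` environment variable are hypothetical names introduced for illustration, not part of any library mentioned here.

```javascript
const { createHmac } = require('node:crypto');

// ASSUMPTION: the real key would come from a secret store, never from source control.
const PSEUDONYM_KEY = process.env.PSEUDONYM_KEY || 'test-only-key';

// Deterministically map a PII value to an opaque 16-character hex token.
// The same input and key always yield the same token, so references stay consistent.
function pseudonymize(value, key = PSEUDONYM_KEY) {
  const normalized = String(value).trim().toLowerCase();
  return createHmac('sha256', key).update(normalized).digest('hex').slice(0, 16);
}

// Replace the email with a token on a reserved domain and drop the phone number.
function pseudonymizeUser(user) {
  return {
    ...user,
    email: `${pseudonymize(user.email)}@example.com`,
    phone: null,
  };
}
```

Pseudonymized data generally still counts as personal data under GDPR, because anyone holding the key can re-link tokens to known inputs. The key therefore needs the same protection as the source data.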
Database Transaction Isolation
// Best practice: use transactions for cleanup
beforeEach(async () => {
  await db.beginTransaction();
});

afterEach(async () => {
  await db.rollbackTransaction(); // Auto cleanup!
});

test('user registration', async () => {
  const user = await userService.register({
    email: 'test@example.com'
  });
  expect(user.id).toBeDefined();
  // Automatic rollback after test - no cleanup needed
});
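The rollback pattern works because the database restores its pre-test state regardless of what the test wrote. The following sketch imitates that behaviour with a hypothetical in-memory store so the guarantee can be seen without a real database. `InMemoryDb` and `withRollback` are illustrative names, and a real driver's transaction API will differ.

```javascript
// Hypothetical in-memory stand-in for a database connection, for illustration only.
class InMemoryDb {
  constructor() {
    this.rows = [];
    this.snapshot = null;
  }

  beginTransaction() {
    if (this.snapshot !== null) throw new Error('Transaction already open');
    // Copy every row so later writes cannot leak into the saved state.
    this.snapshot = this.rows.map(row => ({ ...row }));
  }

  insert(row) {
    const saved = { id: this.rows.length + 1, ...row };
    this.rows.push(saved);
    return saved;
  }

  rollbackTransaction() {
    if (this.snapshot === null) throw new Error('No open transaction');
    this.rows = this.snapshot;
    this.snapshot = null;
  }
}

// Run a test body inside a transaction that is always rolled back,
// even when the body throws or rejects.
async function withRollback(db, body) {
  db.beginTransaction();
  try {
    return await body(db);
  } finally {
    db.rollbackTransaction();
  }
}
```

Wrapping the rollback in `finally` matters: a failing assertion would otherwise skip the cleanup and leave data behind for the next test.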
Agent-Driven Data Generation
// High-speed generation with constraints
await Task("Generate Test Data", {
  schema: 'ecommerce',
  count: { users: 10000, products: 500, orders: 5000 },
  preserveReferentialIntegrity: true,
  constraints: {
    age: { min: 18, max: 90 },
    roles: ['customer', 'admin']
  }
}, "qe-test-data-architect");

// GDPR-compliant anonymization
await Task("Anonymize Production Data", {
  source: 'production-snapshot',
  piiFields: ['email', 'phone', 'ssn'],
  method: 'pseudonymization',
  retainStructure: true
}, "qe-test-data-architect");
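To make `preserveReferentialIntegrity` and `constraints` concrete, here is a plain JavaScript sketch of what such a request asks for. It does not show how qe-test-data-architect works internally. `generateEcommerceData` and the row shapes (`sku`, `priceCents`, `quantity`) are assumptions made for illustration.

```javascript
// Generate users, products, and orders so that every order references existing rows.
// `rand` is any function returning a float in [0, 1); Math.random is the default.
function generateEcommerceData(count, constraints, rand = Math.random) {
  const { users, products, orders } = count;
  if (orders > 0 && (users === 0 || products === 0)) {
    throw new Error('Orders need at least one user and one product to reference');
  }

  const pick = (items) => items[Math.floor(rand() * items.length)];
  const intBetween = (min, max) => min + Math.floor(rand() * (max - min + 1));

  const userRows = Array.from({ length: users }, (_, i) => ({
    id: i + 1,
    email: `user-${i + 1}@example.com`,
    age: intBetween(constraints.age.min, constraints.age.max),
    role: pick(constraints.roles),
  }));

  const productRows = Array.from({ length: products }, (_, i) => ({
    id: i + 1,
    sku: `SKU-${String(i + 1).padStart(5, '0')}`,
    priceCents: intBetween(100, 100000),
  }));

  // Foreign keys are drawn only from ids that were actually generated above.
  const orderRows = Array.from({ length: orders }, (_, i) => ({
    id: i + 1,
    userId: pick(userRows).id,
    productId: pick(productRows).id,
    quantity: intBetween(1, 5),
  }));

  return { users: userRows, products: productRows, orders: orderRows };
}
```

Generating parent tables first and sampling child keys from them is the simplest way to guarantee integrity. Inserting in the same order (users, products, then orders) avoids foreign-key violations at load time.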
Agent Coordination Hints
Memory Namespace
aqe/test-data-management/
├── schemas/* - Data schemas
├── generators/* - Generator configs
├── anonymization/* - PII handling rules
└── fixtures/* - Reusable fixtures
Fleet Coordination
const dataFleet = await FleetManager.coordinate({
  strategy: 'test-data-generation',
  agents: [
    'qe-test-data-architect', // Generate data
    'qe-test-executor', // Execute with data
    'qe-security-scanner' // Validate no PII exposure
  ],
  topology: 'sequential'
});
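The final step in this sequence, validating that no PII is exposed, can be approximated with a simple pattern scan over generated records. This is a rough sketch rather than the behaviour of qe-security-scanner: `findPii`, `PII_PATTERNS`, and `SAFE_EMAIL_DOMAINS` are hypothetical, and the two patterns (email addresses, US-style SSNs) are far from exhaustive. Addresses on the reserved example.com, example.org, and example.net domains are treated as safe.

```javascript
// ASSUMPTION: illustrative detection rules; real scanners need broader, locale-aware patterns.
const PII_PATTERNS = {
  email: /[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})/g,
  ssn: /\b\d{3}-\d{2}-\d{4}\b/,
};

// Domains reserved for documentation and testing, so they cannot belong to real people.
const SAFE_EMAIL_DOMAINS = new Set(['example.com', 'example.org', 'example.net']);

// Return one finding per suspicious value: { index, field, type }.
function findPii(records) {
  const findings = [];
  records.forEach((record, index) => {
    for (const [field, value] of Object.entries(record)) {
      if (typeof value !== 'string') continue;
      for (const match of value.matchAll(PII_PATTERNS.email)) {
        if (!SAFE_EMAIL_DOMAINS.has(match[1].toLowerCase())) {
          findings.push({ index, field, type: 'email' });
        }
      }
      if (PII_PATTERNS.ssn.test(value)) {
        findings.push({ index, field, type: 'ssn' });
      }
    }
  });
  return findings;
}
```

A check like this could run as a gate after generation or anonymization, failing the pipeline when `findPii(dataset).length > 0`.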
Related Skills
- database-testing - Schema and integrity testing
- compliance-testing - GDPR/CCPA compliance
- performance-testing - Volume data for perf tests
Remember
Never use production PII directly. Always use synthetic data or properly anonymized production snapshots.
With Agents: qe-test-data-architect generates 10k+ records/sec with realistic patterns, relationships, and constraints. Agents ensure GDPR/CCPA compliance automatically and eliminate test data bottlenecks.