Skill v1.0.0
Trusted Publisher · 100/100 · google-gemini/gemini-managed-agents-templates/data-explorer
──Details
Published May 19, 2026 at 10:30 PM
Content Hash sha256:f0783cdb93dd674c...
Git SHA
──Files
Files (1 file, 2.0 KB)
SKILL.md · 2.0 KB · active
SKILL.md · 65 lines · 2.0 KB
version: "1.0.0"
name: data-explorer
description: >-
  General-purpose data profiling and exploration. Use when first encountering
  any dataset to understand its structure, quality, and analysis potential.
Data explorer skill
Profile any tabular dataset (CSV, JSON, Parquet) and produce a structured summary the other skills can consume.
Workflow
- Scan workspace: list all data files in the workspace directory.
- Load and profile each file:
  - Row count, column count
  - Column names, data types, null counts, unique counts
  - Basic statistics (min, max, mean, median, std for numerics)
  - Value counts for categorical columns (top 10)
  - Correlation matrix for numeric columns
- Assess data quality:
  - Missing value percentage per column
  - Potential data type issues (e.g., numbers stored as strings)
  - Duplicate row detection
  - Outlier detection (IQR method)
- Output a structured profile as JSON for downstream skills.
- Recommend analysis directions based on what you found.
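The profiling and quality steps above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function names (`scan_workspace`, `profile_frame`, `assess_quality`), the recursive directory search, the extra `stats`, `correlation`, `outliers`, and `numeric_as_string` keys, and the 1.5 × IQR fence are all assumptions introduced here.

```python
from pathlib import Path

import pandas as pd

DATA_SUFFIXES = {".csv", ".json", ".parquet"}
IQR_MULTIPLIER = 1.5  # assumption: conventional Tukey fence; the skill only says "IQR method"


def scan_workspace(root: str) -> list:
    """List data files under root (searching recursively is an assumption)."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in DATA_SUFFIXES
    )


def profile_frame(df: pd.DataFrame) -> dict:
    """Structure and basic statistics for one DataFrame."""
    numeric = df.select_dtypes(include="number")
    return {
        "rows": int(len(df)),
        "columns": int(df.shape[1]),
        "schema": [
            {
                "name": str(col),
                "dtype": str(df[col].dtype),
                "nulls": int(df[col].isna().sum()),
                "unique": int(df[col].nunique()),
            }
            for col in df.columns
        ],
        # min, max, mean, median, std for each numeric column
        "stats": {
            str(col): {
                "min": float(numeric[col].min()),
                "max": float(numeric[col].max()),
                "mean": float(numeric[col].mean()),
                "median": float(numeric[col].median()),
                "std": float(numeric[col].std()),
            }
            for col in numeric.columns
        },
        "correlation": numeric.corr().round(4).to_dict(),
    }


def assess_quality(df: pd.DataFrame) -> dict:
    """Missing values, duplicates, IQR outliers, and numbers stored as text."""
    missing = df.isna().mean()  # fraction of nulls per column
    outliers = {}
    for col in df.select_dtypes(include="number").columns:
        s = df[col].dropna()
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        iqr = q3 - q1
        mask = (s < q1 - IQR_MULTIPLIER * iqr) | (s > q3 + IQR_MULTIPLIER * iqr)
        outliers[str(col)] = int(mask.sum())
    numeric_as_string = []
    for col in df.columns:
        s = df[col].dropna()
        is_text = pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s)
        # Flag text columns whose every non-null value parses as a number.
        if is_text and len(s) > 0 and pd.to_numeric(s, errors="coerce").notna().all():
            numeric_as_string.append(str(col))
    return {
        "missing_pct": {str(c): round(float(v), 4) for c, v in missing.items() if v > 0},
        "duplicates": int(df.duplicated().sum()),
        "outliers": outliers,
        "numeric_as_string": numeric_as_string,
    }
```

The explicit `int(...)` and `float(...)` casts are deliberate: pandas returns NumPy scalars, which the standard `json` module cannot serialize directly.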
Output format
json
{
  "files": [
    {
      "filename": "customers.csv",
      "rows": 91,
      "columns": 7,
      "schema": [
        {"name": "CustomerID", "dtype": "object", "nulls": 0, "unique": 91},
        {"name": "CompanyName", "dtype": "object", "nulls": 0, "unique": 91}
      ],
      "quality": {
        "missing_pct": {"Region": 0.60},
        "duplicates": 0
      },
      "recommendations": [
        "CustomerID is a unique string identifier",
        "Region column has a high missing percentage (60%)",
        "Can be joined with orders.csv on CustomerID to analyze customer behavior"
      ]
    }
  ]
}
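One way to emit a document in this shape is sketched below. The helper names `file_entry` and `build_report` are hypothetical, and the example assumes `missing_pct` holds a fraction between 0 and 1 (as the `0.60` value above suggests) and lists only columns that have missing values.

```python
import json

import pandas as pd


def file_entry(filename: str, df: pd.DataFrame, recommendations=None) -> dict:
    """Build one element of the "files" array for a loaded DataFrame."""
    missing = df.isna().mean()  # fraction of nulls per column
    return {
        "filename": filename,
        "rows": int(len(df)),
        "columns": int(df.shape[1]),
        "schema": [
            {
                "name": str(col),
                "dtype": str(df[col].dtype),
                "nulls": int(df[col].isna().sum()),
                "unique": int(df[col].nunique()),
            }
            for col in df.columns
        ],
        "quality": {
            # Assumption: only columns with at least one missing value are listed.
            "missing_pct": {
                str(c): round(float(v), 2) for c, v in missing.items() if v > 0
            },
            "duplicates": int(df.duplicated().sum()),
        },
        "recommendations": list(recommendations or []),
    }


def build_report(frames: dict) -> str:
    """Serialize a {filename: DataFrame} mapping to the JSON profile."""
    entries = [file_entry(name, df) for name, df in frames.items()]
    return json.dumps({"files": entries}, indent=2)
```

Calling `build_report({"customers.csv": df})` returns a JSON string that a downstream skill could parse with `json.loads`.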
Key rules
- Never assume a specific dataset. Profile whatever is present.
- If no data files are found, inform the user and ask them to upload.
- Use `pandas` for profiling. It is pre-installed in the sandbox.
- Use `select_dtypes(include=["object", "str"])` for categorical columns.
- For large files (>100K rows), profile a sample first and note the sampling.
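The last two rules might look like this in practice. This is a sketch under stated assumptions: the helper names, the 10,000-row sample size, and the fixed random seed are not part of the skill. The `TypeError` fallback is there because some pandas releases reject `"str"` in `select_dtypes` and accept only `"object"`.

```python
import pandas as pd

LARGE_FILE_ROWS = 100_000  # threshold taken from the rule above
SAMPLE_ROWS = 10_000       # assumption: the skill does not specify a sample size


def categorical_columns(df: pd.DataFrame) -> list:
    """Select text-like columns, following the skill's select_dtypes rule."""
    try:
        selected = df.select_dtypes(include=["object", "str"])
    except TypeError:
        # Some pandas versions do not accept "str" here; use object only.
        selected = df.select_dtypes(include=["object"])
    return list(selected.columns)


def top_values(df: pd.DataFrame, k: int = 10) -> dict:
    """Top-k value counts for each categorical column."""
    return {
        str(col): {str(v): int(n) for v, n in df[col].value_counts().head(k).items()}
        for col in categorical_columns(df)
    }


def sample_if_large(df: pd.DataFrame, threshold: int = LARGE_FILE_ROWS,
                    n: int = SAMPLE_ROWS, seed: int = 0):
    """Return (frame_to_profile, note); the note records whether sampling happened."""
    total = int(len(df))
    if total <= threshold:
        return df, {"sampled": False, "total_rows": total}
    sample = df.sample(n=min(n, total), random_state=seed)
    return sample, {"sampled": True, "total_rows": total, "sample_rows": int(len(sample))}
```

Returning the note alongside the sample keeps the "note the sampling" requirement explicit, so the caller can attach it to the file's profile.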