Skill v1.0.0

Trusted Publisher · 100/100

google-gemini/gemini-managed-agents-templates/data-explorer

──Details

Published May 19, 2026 at 10:30 PM

Content Hash sha256:f0783cdb93dd674c...

──Files

Files (1 file, 2.0 KB)

SKILL.md · 2.0 KB · active

SKILL.md · 65 lines · 2.0 KB

version: "1.0.0"
name: data-explorer
description: >-
  General-purpose data profiling and exploration. Use when first encountering
  any dataset to understand its structure, quality, and analysis potential.

Data explorer skill

Profile any tabular dataset (CSV, JSON, or Parquet) and produce a structured summary that other skills can consume.

Workflow

1. Scan workspace: list all data files in the workspace directory.
2. Load and profile each file:
   - Row count, column count
   - Column names, data types, null counts, unique counts
   - Basic statistics (min, max, mean, median, std for numerics)
   - Value counts for categorical columns (top 10)
   - Correlation matrix for numeric columns
3. Assess data quality:
   - Missing value percentage per column
   - Potential data type issues (e.g., numbers stored as strings)
   - Duplicate row detection
   - Outlier detection (IQR method)
4. Output a structured profile as JSON for downstream skills.
5. Recommend analysis directions based on what you found.
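
The profiling and quality steps above could be sketched in pandas roughly as follows. This is a minimal illustration, not the skill's actual implementation: `profile_frame` and the extra keys `stats`, `top_values`, `correlation`, `type_issues`, and `outliers` are hypothetical names, and the 1.5 multiplier is the conventional IQR fence rather than something the skill specifies. Categorical columns are picked with per-column dtype checks here instead of `select_dtypes`, on the assumption that this behaves the same across pandas versions.

```python
import pandas as pd


def profile_frame(df: pd.DataFrame, top_n: int = 10) -> dict:
    """Profile one DataFrame (hypothetical helper, not part of the skill)."""
    numeric = df.select_dtypes(include="number")
    # Assumption: object- or string-typed columns are the categorical ones.
    categorical = [
        col for col in df.columns
        if pd.api.types.is_object_dtype(df[col])
        or pd.api.types.is_string_dtype(df[col])
    ]

    schema = [
        {
            "name": str(col),
            "dtype": str(df[col].dtype),
            "nulls": int(df[col].isna().sum()),
            "unique": int(df[col].nunique()),
        }
        for col in df.columns
    ]

    stats, outliers = {}, {}
    for col in numeric.columns:
        s = numeric[col].dropna()
        if s.empty:
            continue
        stats[col] = {
            "min": float(s.min()), "max": float(s.max()),
            "mean": float(s.mean()), "median": float(s.median()),
            "std": float(s.std()),
        }
        # IQR method: count values beyond 1.5 * IQR from the quartiles.
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        iqr = q3 - q1
        mask = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)
        outliers[col] = int(mask.sum())

    # Top-N value counts per categorical column (nulls are excluded).
    top_values = {
        col: {str(k): int(v) for k, v in df[col].value_counts().head(top_n).items()}
        for col in categorical
    }

    # Numbers stored as strings: every non-null value parses as a number.
    type_issues = []
    for col in categorical:
        non_null = df[col].dropna()
        if len(non_null) and pd.to_numeric(non_null, errors="coerce").notna().all():
            type_issues.append(col)

    missing = df.isna().mean()  # fraction of nulls per column
    return {
        "rows": int(df.shape[0]),
        "columns": int(df.shape[1]),
        "schema": schema,
        "stats": stats,
        "top_values": top_values,
        "correlation": numeric.corr().to_dict() if numeric.shape[1] >= 2 else {},
        "quality": {
            "missing_pct": {c: round(float(f), 4) for c, f in missing.items() if f > 0},
            "duplicates": int(df.duplicated().sum()),
            "type_issues": type_issues,
            "outliers": outliers,
        },
    }
```

Calling `profile_frame(pd.read_csv(path))` for each file found in the workspace would give one entry per file. Note that `missing_pct` is reported as a fraction between 0 and 1, and only for columns that have at least one null.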

Output format

json

{
  "files": [
    {
      "filename": "customers.csv",
      "rows": 91,
      "columns": 7,
      "schema": [
        {"name": "CustomerID", "dtype": "object", "nulls": 0, "unique": 91},
        {"name": "CompanyName", "dtype": "object", "nulls": 0, "unique": 91}
      ],
      "quality": {
        "missing_pct": {"Region": 0.60},
        "duplicates": 0
      },
      "recommendations": [
        "CustomerID is a unique string identifier",
        "Region column has a high missing percentage (60%)",
        "Can be joined with orders.csv on CustomerID to analyze customer behavior"
      ]
    }
  ]
}
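
To illustrate how a downstream skill might consume this format, here is a small sketch that loads the profile and lists columns with a high share of missing values. The helper name `high_missing_columns` and the 0.5 threshold are assumptions made for this example. The embedded JSON is the example above, trimmed to the fields the sketch reads. It treats `missing_pct` values as fractions, which matches the example (0.60 is reported as 60%).

```python
import json

# The example profile from this section, abbreviated to the fields read below.
EXAMPLE_PROFILE = """
{
  "files": [
    {
      "filename": "customers.csv",
      "rows": 91,
      "columns": 7,
      "quality": {"missing_pct": {"Region": 0.60}, "duplicates": 0}
    }
  ]
}
"""


def high_missing_columns(profile: dict, threshold: float = 0.5) -> list:
    """Return (filename, column, fraction) for columns at or above threshold.

    Hypothetical helper; the 0.5 default is an illustrative choice.
    """
    flagged = []
    for entry in profile.get("files", []):
        missing = entry.get("quality", {}).get("missing_pct", {})
        for column, fraction in missing.items():
            if fraction >= threshold:
                flagged.append((entry["filename"], column, fraction))
    return flagged


profile = json.loads(EXAMPLE_PROFILE)
for filename, column, fraction in high_missing_columns(profile):
    print(f"{filename}: {column} is {fraction:.0%} missing")
# prints: customers.csv: Region is 60% missing
```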

Key rules

Never assume a specific dataset. Profile whatever is present.
If no data files are found, inform the user and ask them to upload.
Use pandas for profiling. It is pre-installed in the sandbox.
Use select_dtypes(include=["object", "str"]) for categorical columns.
For large files (>100K rows), profile a sample first and note the sampling.
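
A rough sketch of how the workspace scan, the empty-workspace rule, and the large-file rule might fit together is shown below. All function names, the `DATA_SUFFIXES` set, and the message wording are hypothetical. The sampling here simply keeps the first rows of a CSV, which is only one possible interpretation of "profile a sample first"; a random sample may be preferable when row order is meaningful.

```python
from pathlib import Path

import pandas as pd

DATA_SUFFIXES = {".csv", ".json", ".parquet"}  # formats named by the skill
LARGE_ROW_THRESHOLD = 100_000


def find_data_files(workspace) -> list:
    """List data files under the workspace, sorted for stable output."""
    root = Path(workspace)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in DATA_SUFFIXES
    )


def load_csv_sample(path, threshold: int = LARGE_ROW_THRESHOLD):
    """Read at most `threshold` rows and report whether the file was cut."""
    # Read one extra row so a file with exactly `threshold` rows is not flagged.
    df = pd.read_csv(path, nrows=threshold + 1)
    sampled = len(df) > threshold
    if sampled:
        df = df.iloc[:threshold]
    # The note travels with the profile so readers know a sample was used.
    return df, {"sampled": sampled, "rows_profiled": len(df)}


def explore(workspace) -> str:
    """Return a status message; ask for an upload when nothing is found."""
    files = find_data_files(workspace)
    if not files:
        return "No data files found. Please upload a CSV, JSON, or Parquet file."
    return "Found: " + ", ".join(p.name for p in files)
```

The `sampled` flag returned by `load_csv_sample` can be copied into the output profile so that downstream skills know the statistics describe a subset of the rows.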
