Skill v1.0.0
Trusted Publisher · 100/100 · google-gemini/gemini-managed-agents-templates/scanner
──Details
Published: May 19, 2026 at 10:30 PM
Content Hash: sha256:e924cff68e453911...
Git SHA
──Files
Files (1 file, 1.4 KB)
SKILL.md · 1.4 KB · active
SKILL.md · 31 lines · 1.4 KB
version: "1.0.0"
name: scanner
description: Scans a website deeply, converting HTML pages to markdown, respecting robots.txt, and updating the snapshots log.
Scanner Skill
Use this skill to scan and analyze all relevant pages under a target website domain to build or refresh a local customer support corpus.
Embedded Script
bash
python skills/scanner/scripts/scan.py <URL> [--force]
Arguments
| Argument | Description |
|---|---|
| `<URL>` | The start/seed URL of the website to analyze (e.g. https://example.com) |
| `--force` | Force scanning and bypass the 24-hour cache check |
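
The two arguments and the 24-hour cache check can be sketched as follows. This is a minimal illustration, not the contents of `scan.py`: the helper names `build_parser` and `should_scan` are hypothetical, and it assumes the last scan time is available as a timezone-aware `datetime` (the actual layout of the snapshot record is not documented here).

```python
import argparse
from datetime import datetime, timedelta, timezone
from typing import Optional

# The 24-hour window mentioned in the argument table.
CACHE_TTL = timedelta(hours=24)


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring `scan.py <URL> [--force]`."""
    parser = argparse.ArgumentParser(prog="scan.py")
    parser.add_argument("url", metavar="URL",
                        help="start/seed URL of the website to analyze")
    parser.add_argument("--force", action="store_true",
                        help="force scanning and bypass the 24-hour cache check")
    return parser


def should_scan(last_scanned: Optional[datetime], force: bool,
                now: Optional[datetime] = None) -> bool:
    """Return True when a scan should run.

    Assumption: `last_scanned` is None when the site has never been scanned.
    """
    if force or last_scanned is None:
        return True
    now = now or datetime.now(timezone.utc)
    return now - last_scanned >= CACHE_TTL


# Example: a site scanned one hour ago is skipped unless --force is given.
args = build_parser().parse_args(["https://example.com", "--force"])
one_hour_ago = datetime.now(timezone.utc) - timedelta(hours=1)
print(should_scan(one_hour_ago, args.force))
```

Without `--force`, the same call returns `False` until 24 hours have passed since `last_scanned`.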
Features
- Robots.txt Compliance: Checks `robots.txt` before parsing any page and aborts the scan if the site restricts access. Only scan sites that permit it.
- Domain-Locked Recursive Scanning: Follows only links within the same domain/subdomain, so the crawl does not spill over to other websites.
- HTML to Markdown: Converts HTML structure into clean, readable Markdown text suitable for LLM document matching.
- Caching & Snapshot Maintenance: Creates or updates `.agents/workspace/snapshots.json` with the page mapping and a timestamp.
- Corpus Directory Index: Automatically generates `.agents/workspace/pages/index.md`, which lists and describes every file in a structured, clickable table so agents can locate matching topics immediately.
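
The robots.txt check and the domain lock described above can be sketched with the Python standard library. This is an illustration under stated assumptions rather than the skill's actual code: `is_allowed`, `is_in_scope`, and the `scanner-sketch` user agent are hypothetical names, and "same domain/subdomain" is read here as "the seed host or any subdomain of it".

```python
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "scanner-sketch"  # hypothetical user agent, not taken from scan.py


def is_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Check a URL against already-downloaded robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def is_in_scope(seed_url: str, link: str) -> bool:
    """Return True if `link` stays on the seed host or one of its subdomains.

    Assumption: subdomains of the seed host count as in scope.
    """
    seed_host = (urlparse(seed_url).hostname or "").lower()
    parsed = urlparse(urljoin(seed_url, link))  # resolve relative links
    if parsed.scheme not in ("http", "https"):
        return False  # skip mailto:, javascript:, etc.
    host = (parsed.hostname or "").lower()
    return bool(seed_host) and (host == seed_host or host.endswith("." + seed_host))


robots = "User-agent: *\nDisallow: /private/\n"
seed = "https://example.com"
for link in ["/docs/faq", "/private/notes", "https://other.org/page"]:
    target = urljoin(seed, link)
    print(link, is_in_scope(seed, link), is_allowed(robots, target))
```

Matching on `"." + seed_host` rather than a bare suffix keeps look-alike hosts such as `notexample.com` out of scope.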