Skill v1.0.1
ratacat/claude-skills/ebook-extractor (current, automated scan: 100/100)
Published: June 3, 2026 at 12:59 PM
Content Hash: sha256:c8d1daf08db63e98...
Git SHA: d4b471d05dda
Bump Type: patch
SKILL.md · 67 lines · 1.9 KB
version: "1.0.1"
name: ebook-extractor
description: Use when user wants to extract text from ebooks (EPUB, MOBI, PDF). Use for converting ebooks to plain text for analysis, processing, or reading. Handles all common ebook formats.
Ebook Text Extractor
Overview
Extract plain text from EPUB, MOBI, and PDF files using Python scripts. No LLM calls - pure text extraction.
Supported Formats
| Format | Tool Used | Notes |
|---|---|---|
| EPUB | ebooklib + BeautifulSoup | Direct parsing, preserves structure |
| MOBI | Calibre ebook-convert | Converts to EPUB first, then extracts |
| PDF | PyMuPDF (fitz) | Fast, handles most PDFs well |
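The scripts themselves are not reproduced in this file, so the following is only a rough sketch of what each row of the table could look like in Python. It assumes `ebooklib`, `beautifulsoup4`, and PyMuPDF are installed and that Calibre's `ebook-convert` is on `PATH`. The function names are hypothetical and are not taken from the skill's scripts.

```python
import subprocess
import tempfile
from pathlib import Path


def extract_epub_text(path):
    """Concatenate the text of every XHTML document inside an EPUB."""
    # Third-party imports stay local so this module loads without them.
    import ebooklib
    from bs4 import BeautifulSoup
    from ebooklib import epub

    book = epub.read_epub(path)
    parts = []
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        soup = BeautifulSoup(item.get_content(), "html.parser")
        parts.append(soup.get_text(separator="\n", strip=True))
    return "\n\n".join(parts)


def ebook_convert_command(src, dst):
    """Build the Calibre command line: ebook-convert INPUT OUTPUT."""
    return ["ebook-convert", str(src), str(dst)]


def extract_mobi_text(path):
    """Convert MOBI to EPUB in a temporary directory, then reuse the EPUB path."""
    with tempfile.TemporaryDirectory() as tmp:
        epub_path = Path(tmp) / (Path(path).stem + ".epub")
        subprocess.run(
            ebook_convert_command(path, epub_path),
            check=True,           # raise if Calibre reports a failure
            capture_output=True,  # keep Calibre's log out of the extracted text
        )
        return extract_epub_text(str(epub_path))


def extract_pdf_text(path):
    """Join the text layer of every page in a PDF."""
    import fitz  # PyMuPDF

    doc = fitz.open(path)
    try:
        return "\n".join(page.get_text() for page in doc)
    finally:
        doc.close()
```

Converting MOBI through a temporary EPUB mirrors the "converts to EPUB first" note in the table and lets a single code path handle both formats.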
Usage
Unified extractor (auto-detects format):
```bash
python3 ~/.claude/skills/ebook-extractor/scripts/extract.py /path/to/book.epub
python3 ~/.claude/skills/ebook-extractor/scripts/extract.py /path/to/book.mobi
python3 ~/.claude/skills/ebook-extractor/scripts/extract.py /path/to/book.pdf
```
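This file does not say how auto-detection works. One plausible approach, shown here purely as an assumption, is to dispatch on the file extension. The names `SUFFIX_TO_FORMAT` and `detect_format` are hypothetical.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to a format name.
SUFFIX_TO_FORMAT = {".epub": "epub", ".mobi": "mobi", ".pdf": "pdf"}


def detect_format(path):
    """Return 'epub', 'mobi', or 'pdf' based on the (case-insensitive) suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUFFIX_TO_FORMAT:
        shown = suffix or "(no extension)"
        raise ValueError(f"Unsupported ebook format: {shown}")
    return SUFFIX_TO_FORMAT[suffix]
```

Extension matching is cheap but trusts the filename. A stricter variant could inspect the first bytes of the file instead, since PDFs start with `%PDF` and EPUBs are ZIP archives.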
Output options:
```bash
# To stdout (default)
python3 scripts/extract.py book.epub

# To file
python3 scripts/extract.py book.epub -o output.txt
python3 scripts/extract.py book.epub > output.txt
```
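For readers wiring up something similar, here is a minimal sketch of how a positional input plus an optional `-o` flag could be handled with `argparse`. Only `-o` appears in the usage above. The `--output` long form and the helper names are assumptions.

```python
import argparse
import sys


def build_parser():
    """Command line with one input path and an optional output file."""
    parser = argparse.ArgumentParser(description="Extract plain text from an ebook.")
    parser.add_argument("input", help="path to an EPUB, MOBI, or PDF file")
    # '--output' is an assumed long form; the usage above only shows '-o'.
    parser.add_argument("-o", "--output", default=None,
                        help="write text to this file instead of stdout")
    return parser


def write_text(text, output=None):
    """Write to stdout by default, or to a UTF-8 file when a path is given."""
    if output is None:
        sys.stdout.write(text)
    else:
        with open(output, "w", encoding="utf-8") as handle:
            handle.write(text)
```

Defaulting to stdout keeps the script composable with shell redirection and pipes, which is why both `-o output.txt` and `> output.txt` can produce the same file.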
Format-specific scripts:
```bash
python3 scripts/extract_epub.py book.epub
python3 scripts/extract_mobi.py book.mobi
python3 scripts/extract_pdf.py book.pdf
```
Setup
```bash
# One-command setup (installs all dependencies)
~/.claude/skills/ebook-extractor/setup.sh

# Or manually:
pip install -r ~/.claude/skills/ebook-extractor/requirements.txt
brew install calibre  # macOS, for MOBI support
```
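To confirm that setup worked, a quick check along these lines may help. The contents of `requirements.txt` are not shown in this file, so the module names below are inferred from the Supported Formats table (`ebooklib`, `bs4` for BeautifulSoup, `fitz` for PyMuPDF) and should be treated as assumptions, as should the helper name.

```python
import importlib.util
import shutil

# Import names inferred from the Supported Formats table (assumption).
REQUIRED_MODULES = ("ebooklib", "bs4", "fitz")
# Calibre's converter, needed only for MOBI input.
REQUIRED_TOOLS = ("ebook-convert",)


def missing_dependencies(modules=REQUIRED_MODULES, tools=REQUIRED_TOOLS):
    """Return the names of Python modules and CLI tools that cannot be found."""
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    missing += [name for name in tools if shutil.which(name) is None]
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("All dependencies found." if not problems else "Missing: " + ", ".join(problems))
```

The check looks modules up without importing them, so it stays fast and has no side effects.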
Script Location
~/.claude/skills/ebook-extractor/scripts/
Common Issues
| Problem | Solution |
|---|---|
| Missing package | Run setup.sh or pip install -r requirements.txt |
| MOBI fails | Ensure Calibre is installed: brew install calibre |
| PDF garbled | Some PDFs are image-based; OCR needed (not supported) |
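Since OCR is not supported, it can be useful to flag image-based PDFs early rather than return near-empty output. The heuristic below is a sketch and is not part of the skill. It assumes you already have one extracted string per page, and the function name and thresholds are arbitrary choices.

```python
def looks_image_based(page_texts, min_chars_per_page=20, max_empty_ratio=0.8):
    """Guess whether a PDF is scanned: True when most pages yielded almost no text.

    page_texts: list of strings, one per page, as returned by a text extractor.
    The thresholds are illustrative and may need tuning for real documents.
    """
    if not page_texts:
        return False  # nothing to judge
    nearly_empty = sum(1 for text in page_texts if len(text.strip()) < min_chars_per_page)
    return nearly_empty / len(page_texts) >= max_empty_ratio
```

A caller could use this to print a warning that the file probably needs OCR with an external tool before extraction will produce useful text.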