---
version: "1.0.1"
name: ai-avatar-video
description: "Create AI avatar and talking head videos with OmniHuman, Fabric, PixVerse via inference.sh CLI. Models: OmniHuman 1.5, OmniHuman 1.0, Fabric 1.0, PixVerse Lipsync. Capabilities: audio-driven avatars, lipsync videos, talking head generation, virtual presenters. Use for: AI presenters, explainer videos, virtual influencers, dubbing, marketing videos. Triggers: ai avatar, talking head, lipsync, avatar video, virtual presenter, ai spokesperson, audio driven video, heygen alternative, synthesia alternative, talking avatar, lip sync, video avatar, ai presenter, digital human"
allowed-tools: Bash(infsh *)
---
# AI Avatar & Talking Head Videos

Create AI avatars and talking head videos via the inference.sh CLI.
[Image: AI Avatar & Talking Head Videos]
## Quick Start

```sh
# Install the CLI and authenticate
curl -fsSL https://cli.inference.sh | sh && infsh login

# Create an avatar video from an image + audio
infsh app run bytedance/omnihuman-1-5 --input '{"image_url": "https://portrait.jpg", "audio_url": "https://speech.mp3"}'
```
## Available Models

| Model | App ID | Best For |
|---|---|---|
| OmniHuman 1.5 | `bytedance/omnihuman-1-5` | Multi-character, best quality |
| OmniHuman 1.0 | `bytedance/omnihuman-1-0` | Single character |
| Fabric 1.0 | `falai/fabric-1-0` | Image talks with lipsync |
| PixVerse Lipsync | `falai/pixverse-lipsync` | Highly realistic lipsync |
## Search Avatar Apps

```sh
infsh app list --search "omnihuman"
infsh app list --search "lipsync"
infsh app list --search "fabric"
```
## Examples

### OmniHuman 1.5 (Multi-Character)

```sh
infsh app run bytedance/omnihuman-1-5 --input '{"image_url": "https://portrait.jpg", "audio_url": "https://speech.mp3"}'
```
Supports specifying which character to drive in multi-person images.
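The exact input field for selecting a character is not documented here; the `character_index` name below is purely a guess for illustration, so verify the real parameter name against the app's input schema before using it. This sketch only confirms the extended payload is well-formed JSON:

```sh
# "character_index" is HYPOTHETICAL -- check the app's actual input schema
INPUT='{"image_url": "https://two-people.jpg", "audio_url": "https://speech.mp3", "character_index": 1}'

# Sanity-check that the payload parses before passing it to --input
echo "$INPUT" | python3 -m json.tool > /dev/null && echo "valid JSON"
```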
### Fabric 1.0 (Image Talks)

```sh
infsh app run falai/fabric-1-0 --input '{"image_url": "https://face.jpg", "audio_url": "https://audio.mp3"}'
```
### PixVerse Lipsync

```sh
infsh app run falai/pixverse-lipsync --input '{"image_url": "https://portrait.jpg", "audio_url": "https://speech.mp3"}'
```
Generates highly realistic lipsync from any audio.
## Full Workflow: TTS + Avatar

```sh
# 1. Generate speech from text
infsh app run infsh/kokoro-tts --input '{"text": "Welcome to our product demo. Today I will show you..."}' > speech.json

# 2. Create an avatar video with the speech
infsh app run bytedance/omnihuman-1-5 --input '{"image_url": "https://presenter-photo.jpg", "audio_url": "<audio-url-from-step-1>"}'
```
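Step 2 needs the audio URL from step 1's JSON output. A minimal sketch of the glue, assuming the output nests the URL under `output.audio.url` (an assumption about the shape; inspect your actual `speech.json` for the real field names):

```sh
# ASSUMED output shape -- replace with what your speech.json actually contains
cat > speech.json <<'EOF'
{"output": {"audio": {"url": "https://cdn.example.com/speech.mp3"}}}
EOF

# Pull the URL out with python3 (avoids a jq dependency)
AUDIO_URL=$(python3 -c 'import json,sys; print(json.load(sys.stdin)["output"]["audio"]["url"])' < speech.json)
echo "$AUDIO_URL"

# Plug it into step 2:
# infsh app run bytedance/omnihuman-1-5 --input "{\"image_url\": \"https://presenter-photo.jpg\", \"audio_url\": \"$AUDIO_URL\"}"
```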
## Full Workflow: Dub Video in Another Language

```sh
# 1. Transcribe the original video
infsh app run infsh/fast-whisper-large-v3 --input '{"audio_url": "https://video.mp4"}' > transcript.json

# 2. Translate the text (manually or with an LLM)

# 3. Generate speech in the new language
infsh app run infsh/kokoro-tts --input '{"text": "<translated-text>"}' > new_speech.json

# 4. Lipsync the original video with the new audio
infsh app run infsh/latentsync-1-6 --input '{"video_url": "https://original-video.mp4", "audio_url": "<new-audio-url>"}'
```
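Translated text often contains quotes, apostrophes, or non-ASCII characters that break a hand-assembled `--input` string. A sketch of building step 3's payload with `json.dumps` so escaping is handled for you (the sample text is illustrative):

```sh
# Text with an apostrophe, quotes, and accents that would break naive interpolation
TRANSLATED_TEXT="Bienvenue à la démo. Elle s'appelle \"Avatar\"."

# Let python3 produce a correctly escaped JSON payload
INPUT_JSON=$(python3 -c 'import json,sys; print(json.dumps({"text": sys.argv[1]}))' "$TRANSLATED_TEXT")
echo "$INPUT_JSON"

# Then run step 3 with the safe payload:
# infsh app run infsh/kokoro-tts --input "$INPUT_JSON" > new_speech.json
```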
## Use Cases

- **Marketing**: Product demos with an AI presenter
- **Education**: Course videos and explainers
- **Localization**: Dub content in multiple languages
- **Social Media**: A consistent virtual influencer persona
- **Corporate**: Training videos and announcements
## Tips
- Use high-quality portrait photos (front-facing, good lighting)
- Audio should be clear with minimal background noise
- OmniHuman 1.5 supports multiple people in one image
- LatentSync is best for syncing existing videos to new audio
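The input checks above can be partly automated before spending a run. A small sketch using file-extension heuristics only; the accepted formats are an assumption, not the apps' official validation:

```sh
# Cheap pre-flight sanity check on image/audio inputs before submitting a job
check_inputs() {
  case "$1" in
    *.jpg|*.jpeg|*.png) ;;
    *) echo "warn: expected a jpg/jpeg/png image, got: $1" >&2; return 1 ;;
  esac
  case "$2" in
    *.mp3|*.wav|*.m4a) ;;
    *) echo "warn: expected an mp3/wav/m4a audio file, got: $2" >&2; return 1 ;;
  esac
  echo ok
}

check_inputs "https://portrait.jpg" "https://speech.mp3"   # prints "ok"
```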
## Related Skills

```sh
# Full platform skill (all 150+ apps)
npx skills add inferencesh/skills@inference-sh

# Text-to-speech (generate audio for avatars)
npx skills add inferencesh/skills@text-to-speech

# Speech-to-text (transcribe for dubbing)
npx skills add inferencesh/skills@speech-to-text

# Video generation
npx skills add inferencesh/skills@ai-video-generation

# Image generation (create avatar images)
npx skills add inferencesh/skills@ai-image-generation
```
Browse all video apps: `infsh app list --category video`
## Documentation
- Running Apps - How to run apps via CLI
- Content Pipeline Example - Building media workflows
- Streaming Results - Real-time progress updates