version: "1.0.0"
name: alayarenderer-generative-world
description: AI coding agent skill for AlayaRenderer, a generative world rendering framework with inverse rendering (RGB→G-buffers) and game editing (G-buffers+text→stylized video) using fine-tuned video diffusion models.
triggers:
- use AlayaRenderer to render a scene
- run inverse renderer on video
- game editing with G-buffers
- stylize video with text prompt using AlayaRenderer
- extract albedo normal depth from video
- set up AlayaRenderer generative world renderer
- fine-tune diffusion renderer for G-buffers
- run Wan2.1 game editing inference
AlayaRenderer — Generative World Renderer
Skill by ara.so — Daily 2026 Skills collection.
AlayaRenderer is a two-stage framework for high-quality video rendering:
- Inverse Renderer (RGB → G-buffers): Extracts albedo, normal, depth, roughness, and metallic maps from RGB video using a fine-tuned Cosmos-Transfer1-DiffusionRenderer 7B model.
- Game Editing (G-buffers + Text → Stylized RGB): Synthesizes photorealistic, stylized RGB video from G-buffer inputs using a fine-tuned Wan2.1 1.3B model via DiffSynth-Studio.
Installation
Clone the Repository
    git clone --recurse-submodules https://github.com/ShandaAI/AlayaRenderer.git
    cd AlayaRenderer
Important: Use --recurse-submodules. DiffSynth-Studio is a git submodule required for Game Editing.
Two Separate Conda Environments (Recommended)
The two models have conflicting dependencies. Use separate environments:
    # Environment 1: Inverse Renderer
    conda create -n inverse_renderer python=3.10 -y
    conda activate inverse_renderer
    cd inverse_renderer
    # Follow inverse_renderer/ instructions for Cosmos-Transfer1 setup

    # Environment 2: Game Editing
    conda create -n game_editing python=3.10 -y
    conda activate game_editing
    cd ../game_editing  # back out of inverse_renderer/ first
    # Follow DiffSynth-Studio setup instructions
Model Weights
| Model | Base Model | Size | HuggingFace Link |
|---|---|---|---|
| Inverse Renderer | Cosmos-Transfer1-DiffusionRenderer 7B | ~7B params | Brian9999/world_inverse_renderer |
| Game Editing | Wan2.1 1.3B | ~1.3B params | Brian9999/stylerenderer |
Download and Place Weights
    # Inverse Renderer: replace the base checkpoint
    huggingface-cli download Brian9999/world_inverse_renderer \
        --local-dir inverse_renderer/checkpoints/Diffusion_Renderer_Inverse_Cosmos_7B

    # Game Editing: place in the game_editing models directory
    mkdir -p game_editing/models/train/Wan2.1-T2V-1.3B_gbuffer
    huggingface-cli download Brian9999/stylerenderer \
        --local-dir game_editing/models/train/Wan2.1-T2V-1.3B_gbuffer
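The same two downloads can be scripted from Python. This is a minimal sketch, assuming the `huggingface_hub` package (which provides `huggingface-cli`) is installed and that the script runs from the repository root. `WEIGHTS` and `download_weights` are illustrative names, not part of AlayaRenderer.

```python
from pathlib import Path

# Repo IDs and target directories are copied from the commands above.
WEIGHTS = {
    "Brian9999/world_inverse_renderer":
        "inverse_renderer/checkpoints/Diffusion_Renderer_Inverse_Cosmos_7B",
    "Brian9999/stylerenderer":
        "game_editing/models/train/Wan2.1-T2V-1.3B_gbuffer",
}


def download_weights(repo_root="."):
    """Download both checkpoints into the AlayaRenderer checkout at repo_root."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    targets = []
    for repo_id, rel_dir in WEIGHTS.items():
        target = Path(repo_root) / rel_dir
        target.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(target))
        targets.append(target)
    return targets
```

Calling `download_weights()` fetches both repositories, so expect a large transfer for the 7B checkpoint.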
Inverse Renderer Usage
The inverse renderer decomposes an RGB video into 5 G-buffer channels: albedo, normal, depth, roughness, metallic.
Setup
    cd inverse_renderer
    # Follow Cosmos-Transfer1-DiffusionRenderer environment setup
    # Ensure checkpoint is at:
    # inverse_renderer/checkpoints/Diffusion_Renderer_Inverse_Cosmos_7B/
Inference
Refer to the inverse_renderer/ subdirectory for the full inference script. The general pattern follows Cosmos-Transfer1-DiffusionRenderer conventions:
    # inverse_renderer/run_inverse.py (typical pattern)
    import torch
    from pathlib import Path

    # Input: path to RGB video
    input_video = "path/to/rgb_video.mp4"
    output_dir = "outputs/gbuffers/"

    # The model outputs 5 synchronized channels:
    # - albedo (diffuse color)
    # - normal (surface orientation)
    # - depth (scene geometry)
    # - roughness (surface roughness)
    # - metallic (metallic property)
Game Editing Usage
Quick Start — CLI Inference
    cd game_editing
    CUDA_VISIBLE_DEVICES=0 python \
        examples/wanvideo/model_inference/inference_gbuffer_caption.py \
        --checkpoint models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors \
        --gpu 0 \
        --style snowy_winter \
        --prompt "the scene is set in a frozen, snow-covered environment under cold, pale winter light with falling snowflakes, creating a silent and ethereal winter wonderland atmosphere." \
        --gbuffer_dir test_dataset \
        --save_dir outputs/ \
        --num_frames 81 \
        --height 480 \
        --width 832
CLI Parameters
| Parameter | Description | Example |
|---|---|---|
| --checkpoint | Path to fine-tuned .safetensors weights | models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors |
| --gpu | GPU device index | 0 |
| --style | Named style preset | snowy_winter, rainy, night, sunset |
| --prompt | Text description of target lighting/atmosphere | See examples below |
| --gbuffer_dir | Directory containing G-buffer input frames/video | test_dataset |
| --save_dir | Output directory for rendered video | outputs/ |
| --num_frames | Number of frames to generate (must be 8n+1) | 81 |
| --height | Output height in pixels | 480 |
| --width | Output width in pixels | 832 |
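When the CLI is driven from a script, it can help to assemble the argument list in one place. The sketch below only uses the flags from the table above; `build_command` is a hypothetical helper, and the defaults mirror the Quick Start values rather than anything the script itself defines.

```python
import shlex

SCRIPT = "examples/wanvideo/model_inference/inference_gbuffer_caption.py"


def build_command(style, prompt, gbuffer_dir, save_dir,
                  checkpoint="models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors",
                  gpu=0, num_frames=81, height=480, width=832):
    """Return the argv list for one game-editing run (to be run from game_editing/)."""
    return [
        "python", SCRIPT,
        "--checkpoint", checkpoint,
        "--gpu", str(gpu),
        "--style", style,
        "--prompt", prompt,
        "--gbuffer_dir", gbuffer_dir,
        "--save_dir", save_dir,
        "--num_frames", str(num_frames),
        "--height", str(height),
        "--width", str(width),
    ]


cmd = build_command("sunset", "warm golden hour lighting with long shadows",
                    "test_dataset", "outputs/")
# shlex.join quotes the prompt so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, cwd="game_editing", check=True)` avoids shell quoting issues with long prompts.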
G-buffer Directory Structure
    test_dataset/
    ├── albedo/
    │   ├── frame_0000.png
    │   ├── frame_0001.png
    │   └── ...
    ├── normal/
    │   ├── frame_0000.png
    │   └── ...
    ├── depth/
    │   ├── frame_0000.png
    │   └── ...
    ├── roughness/
    │   ├── frame_0000.png
    │   └── ...
    └── metallic/
        ├── frame_0000.png
        └── ...
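A quick pre-flight check of this layout can catch a missing channel before a long inference run. The following is a minimal sketch: `check_gbuffer_dir` is a hypothetical helper, the `frame_*.png` pattern is taken from the tree above, and the requirement that all channels have the same frame count is an assumption based on the channels being synchronized.

```python
from pathlib import Path

CHANNELS = ("albedo", "normal", "depth", "roughness", "metallic")


def check_gbuffer_dir(root):
    """Return a list of problems found under root; an empty list means it looks fine."""
    root = Path(root)
    problems = []
    counts = {}
    for channel in CHANNELS:
        channel_dir = root / channel
        if not channel_dir.is_dir():
            problems.append(f"missing directory: {channel_dir}")
            continue
        counts[channel] = len(list(channel_dir.glob("frame_*.png")))
        if counts[channel] == 0:
            problems.append(f"no frame_*.png files in {channel_dir}")
    # Assumption: every channel should contain the same number of frames.
    if len(set(counts.values())) > 1:
        problems.append(f"frame counts differ across channels: {counts}")
    return problems
```

For example, `check_gbuffer_dir("game_editing/test_dataset")` returns an empty list when all five channel folders exist with matching frame counts.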
Style Prompt Examples
    # Cyberpunk night scene
    --style night \
    --prompt "neon-lit urban environment at night with rain-slicked streets reflecting colorful neon signs, creating a cyberpunk noir atmosphere"

    # Golden hour / sunset
    --style sunset \
    --prompt "warm golden hour lighting with long shadows and a glowing amber sky, soft cinematic atmosphere"

    # Rainy urban
    --style rainy \
    --prompt "overcast rainy day with wet surfaces, soft diffuse lighting, and atmospheric fog creating a moody cinematic look"

    # Fantasy / stylized
    --style fantasy \
    --prompt "magical forest environment with bioluminescent plants, ethereal blue-green lighting, and mystical particle effects"

    # Foggy morning
    --style foggy \
    --prompt "early morning dense fog with soft diffused light creating a mysterious and quiet atmosphere"
Multi-GPU Inference
    # Run on specific GPU
    CUDA_VISIBLE_DEVICES=1 python \
        examples/wanvideo/model_inference/inference_gbuffer_caption.py \
        --checkpoint models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors \
        --gpu 1 \
        --style rainy \
        --prompt "heavy rainfall with dark storm clouds and dramatic lightning in the distance" \
        --gbuffer_dir my_gbuffers \
        --save_dir outputs/rainy_scene \
        --num_frames 81 --height 480 --width 832
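To render several styles at once, one process per GPU can be launched with the same pattern. The sketch below only plans the jobs and does not start them. `plan_jobs` is a hypothetical helper, the round-robin assignment is one possible policy, and the output directory naming follows the `outputs/rainy_scene` example above.

```python
import os

SCRIPT = "examples/wanvideo/model_inference/inference_gbuffer_caption.py"
CHECKPOINT = "models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors"


def plan_jobs(styles, gpu_ids, gbuffer_dir="my_gbuffers"):
    """Assign (style, prompt) pairs to GPUs round-robin; returns (env, argv) pairs."""
    jobs = []
    for i, (style, prompt) in enumerate(styles.items()):
        gpu = gpu_ids[i % len(gpu_ids)]
        # Mirrors the command above: the same index is passed to both
        # CUDA_VISIBLE_DEVICES and --gpu.
        env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)}
        argv = ["python", SCRIPT, "--checkpoint", CHECKPOINT, "--gpu", str(gpu),
                "--style", style, "--prompt", prompt,
                "--gbuffer_dir", gbuffer_dir, "--save_dir", f"outputs/{style}_scene",
                "--num_frames", "81", "--height", "480", "--width", "832"]
        jobs.append((env, argv))
    return jobs
```

Each pair can be started with `subprocess.Popen(argv, env=env, cwd="game_editing")`. One caveat: `CUDA_VISIBLE_DEVICES=1` exposes only that physical GPU to the process, so if the script treats `--gpu` as an in-process device index it may need to be `0`. Check how the script uses the flag before relying on this.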
Full Pipeline: RGB Video → Stylized Output
    # Step 1: Extract G-buffers from RGB video (Inverse Renderer env)
    conda activate inverse_renderer
    cd inverse_renderer
    python run_inverse.py \
        --input path/to/gameplay_video.mp4 \
        --output_dir ../game_editing/test_dataset/

    # Step 2: Apply game editing style (Game Editing env)
    conda activate game_editing
    cd ../game_editing
    CUDA_VISIBLE_DEVICES=0 python \
        examples/wanvideo/model_inference/inference_gbuffer_caption.py \
        --checkpoint models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors \
        --gpu 0 \
        --style snowy_winter \
        --prompt "frozen tundra with blizzard conditions, pale blue-white lighting and drifting snow" \
        --gbuffer_dir test_dataset \
        --save_dir outputs/final_render \
        --num_frames 81 --height 480 --width 832
Online Demos
| Demo | URL |
|---|---|
| Game Editing Demo | https://huggingface.co/spaces/Brian9999/game-editing |
| Project Page | https://alaya-studio.github.io/renderer/ |
Dataset Overview
The AlayaRenderer dataset (release pending) features:
- 4M+ frames at 720p / 30 FPS
- 6 synchronized channels: RGB + albedo, normal, depth, metallic, roughness
- 40 hours from Cyberpunk 2077 and Black Myth: Wukong
- Average clip length: 8 minutes, up to 53 minutes continuous
- Weather variants: sunny, rainy, foggy, night, sunset
- Motion blur variant via sub-frame interpolation
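The headline figures above are mutually consistent, as a short calculation shows. The clip count below is only an estimate implied by the stated averages, not a number from the dataset description.

```python
FPS = 30            # frames per second, from the list above
TOTAL_HOURS = 40    # total footage
AVG_CLIP_MIN = 8    # average clip length in minutes

total_frames = TOTAL_HOURS * 3600 * FPS
approx_clips = TOTAL_HOURS * 60 // AVG_CLIP_MIN
frames_per_avg_clip = AVG_CLIP_MIN * 60 * FPS

print(total_frames)         # 4320000, consistent with "4M+ frames"
print(approx_clips)         # 300
print(frames_per_avg_clip)  # 14400
```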
Architecture Summary
    RGB Video Input
           │
           ▼
    ┌─────────────────────────────────────┐
    │          Inverse Renderer           │
    │  (Cosmos-Transfer1 7B fine-tuned)   │
    │  RGB → [albedo, normal, depth,      │
    │         roughness, metallic]        │
    └─────────────────┬───────────────────┘
                      │ G-buffers
                      ▼
    ┌─────────────────────────────────────┐
    │            Game Editing             │
    │      (Wan2.1 1.3B fine-tuned)       │
    │      G-buffers + Text Prompt        │
    │        → Stylized RGB Video         │
    └─────────────────────────────────────┘
Troubleshooting
Submodule not found / DiffSynth-Studio missing
    # If cloned without --recurse-submodules:
    git submodule update --init --recursive
CUDA Out of Memory
- Reduce --num_frames (try 41 instead of 81)
- Reduce resolution: --height 320 --width 576
- Ensure no other processes are using the GPU (nvidia-smi lists them), and pin the run to a free device, e.g. CUDA_VISIBLE_DEVICES=0
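To get a rough sense of how much the first two suggestions shrink the workload, compare the number of output pixels per clip. This is only a proxy: the model weights take a fixed amount of memory and attention cost can grow faster than linearly, so actual savings will differ. `pixel_volume` is an illustrative helper.

```python
def pixel_volume(num_frames, height, width):
    """Number of output pixels per channel across the whole clip."""
    return num_frames * height * width


default = pixel_volume(81, 480, 832)
reduced = pixel_volume(41, 320, 576)

print(default)                      # 32348160
print(reduced)                      # 7557120
print(round(reduced / default, 3))  # 0.234
```

Applying both reductions together cuts the pixel volume to under a quarter of the default settings.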
num_frames must follow 8n+1 pattern
Valid values: 9, 17, 25, 33, 41, 49, 57, 65, 73, 81
    # Valid
    --num_frames 81   # 8*10 + 1 ✓
    --num_frames 41   # 8*5 + 1 ✓

    # Invalid
    --num_frames 80   # ✗
    --num_frames 60   # ✗
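The 8n+1 rule is easy to check in code before launching a run. In this sketch both function names are hypothetical, and the lower bound of 9 simply follows the list of valid values above.

```python
def is_valid_num_frames(n):
    """True if n has the form 8k + 1 for some integer k >= 1."""
    return n >= 9 and n % 8 == 1


def floor_to_valid_num_frames(n):
    """Largest value of the form 8k + 1 that does not exceed n (minimum 9)."""
    return max(9, (n - 1) // 8 * 8 + 1)


print(is_valid_num_frames(81))         # True
print(is_valid_num_frames(80))         # False
print(floor_to_valid_num_frames(80))   # 73
print(floor_to_valid_num_frames(60))   # 57
```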
Checkpoint not found
    # Verify checkpoint placement
    ls game_editing/models/train/Wan2.1-T2V-1.3B_gbuffer/model.safetensors
    ls inverse_renderer/checkpoints/Diffusion_Renderer_Inverse_Cosmos_7B/
Version conflicts between models
Always use the two separate conda environments (inverse_renderer and game_editing). Do not install both models' dependencies in one environment.
Citation
    @article{huang2026generativeworldrenderer,
      title={Generative World Renderer},
      author={Zheng-Hui Huang and Zhixiang Wang and Jiaming Tan and Ruihan Yu and Yidan Zhang and Bo Zheng and Yu-Lun Liu and Yung-Yu Chuang and Kaipeng Zhang},
      journal={arXiv preprint arXiv:2604.02329},
      year={2026}
    }