version: "1.0.1"
name: gcp-cloud-run
description: Specialized skill for building production-ready serverless applications on GCP. Covers Cloud Run services (containerized), Cloud Run Functions (event-driven), cold start optimization, and event-driven architecture with Pub/Sub.
risk: unknown
source: vibeship-spawner-skills (Apache 2.0)
date_added: 2026-02-27
GCP Cloud Run
Specialized skill for building production-ready serverless applications on GCP. Covers Cloud Run services (containerized), Cloud Run Functions (event-driven), cold start optimization, and event-driven architecture with Pub/Sub.
Principles
- Cloud Run for containers, Functions for simple event handlers
- Optimize for cold starts with startup CPU boost and min instances
- Set concurrency based on workload (start with the default of 80, then adjust)
- Memory includes /tmp filesystem - plan accordingly
- Use VPC Connector only when needed (adds latency)
- Containers should start fast and be stateless
- Handle signals gracefully for clean shutdown
Patterns
Cloud Run Service Pattern
Containerized web service on Cloud Run
When to use: Web applications and APIs, Need any runtime or library, Complex services with multiple endpoints, Stateless containerized workloads

    # Dockerfile - Multi-stage build for smaller image
    FROM node:20-slim AS builder
    WORKDIR /app
    COPY package*.json ./
    # --omit=dev replaces the deprecated --only=production
    RUN npm ci --omit=dev

    FROM node:20-slim
    WORKDIR /app
    # Copy only production dependencies
    COPY --from=builder /app/node_modules ./node_modules
    COPY src ./src
    COPY package.json ./
    # Cloud Run uses PORT env variable
    ENV PORT=8080
    EXPOSE 8080
    # Run as non-root user
    USER node
    CMD ["node", "src/index.js"]
    // src/index.js
    const express = require('express');
    const app = express();

    app.use(express.json());

    // Health check endpoint
    app.get('/health', (req, res) => {
      res.status(200).send('OK');
    });

    // API routes (getItem is assumed to be defined elsewhere in the app)
    app.get('/api/items/:id', async (req, res) => {
      try {
        const item = await getItem(req.params.id);
        res.json(item);
      } catch (error) {
        console.error('Error:', error);
        res.status(500).json({ error: 'Internal server error' });
      }
    });

    // Graceful shutdown
    process.on('SIGTERM', () => {
      console.log('SIGTERM received, shutting down gracefully');
      server.close(() => {
        console.log('Server closed');
        process.exit(0);
      });
    });

    const PORT = process.env.PORT || 8080;
    const server = app.listen(PORT, () => {
      console.log(`Server listening on port ${PORT}`);
    });
    # cloudbuild.yaml
    steps:
      # Build the container image
      - name: 'gcr.io/cloud-builders/docker'
        args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-service:$COMMIT_SHA', '.']
      # Push the container image
      - name: 'gcr.io/cloud-builders/docker'
        args: ['push', 'gcr.io/$PROJECT_ID/my-service:$COMMIT_SHA']
      # Deploy to Cloud Run
      - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
        entrypoint: gcloud
        args:
          - 'run'
          - 'deploy'
          - 'my-service'
          - '--image=gcr.io/$PROJECT_ID/my-service:$COMMIT_SHA'
          - '--region=us-central1'
          - '--platform=managed'
          - '--allow-unauthenticated'
          - '--memory=512Mi'
          - '--cpu=1'
          - '--min-instances=1'
          - '--max-instances=100'
          - '--concurrency=80'
          - '--cpu-boost'
    images:
      - 'gcr.io/$PROJECT_ID/my-service:$COMMIT_SHA'
Structure
    project/
    ├── Dockerfile
    ├── .dockerignore
    ├── src/
    │   ├── index.js
    │   └── routes/
    ├── package.json
    └── cloudbuild.yaml

gcloud Deploy
Direct gcloud deployment

    gcloud run deploy my-service \
      --source . \
      --region us-central1 \
      --allow-unauthenticated \
      --memory 512Mi \
      --cpu 1 \
      --min-instances 1 \
      --max-instances 100 \
      --concurrency 80 \
      --cpu-boost
Cloud Run Functions Pattern
Event-driven functions (formerly Cloud Functions)
When to use: Simple event handlers, Pub/Sub message processing, Cloud Storage triggers, HTTP webhooks

    // HTTP Function
    // index.js
    const functions = require('@google-cloud/functions-framework');

    functions.http('helloHttp', (req, res) => {
      const name = req.query.name || req.body.name || 'World';
      res.send(`Hello, ${name}!`);
    });

    // Pub/Sub Function
    const functions = require('@google-cloud/functions-framework');

    functions.cloudEvent('processPubSub', (cloudEvent) => {
      // Decode Pub/Sub message
      const message = cloudEvent.data.message;
      const data = message.data
        ? JSON.parse(Buffer.from(message.data, 'base64').toString())
        : {};
      console.log('Received message:', data);
      // Process message
      processMessage(data);
    });

    // Cloud Storage Function
    const functions = require('@google-cloud/functions-framework');

    functions.cloudEvent('processStorageEvent', async (cloudEvent) => {
      const file = cloudEvent.data;
      console.log(`Event: ${cloudEvent.type}`);
      console.log(`Bucket: ${file.bucket}`);
      console.log(`File: ${file.name}`);
      if (cloudEvent.type === 'google.cloud.storage.object.v1.finalized') {
        await processUploadedFile(file.bucket, file.name);
      }
    });

    # Deploy HTTP function
    gcloud functions deploy hello-http \
      --gen2 \
      --runtime nodejs20 \
      --trigger-http \
      --allow-unauthenticated \
      --region us-central1

    # Deploy Pub/Sub function
    gcloud functions deploy process-messages \
      --gen2 \
      --runtime nodejs20 \
      --trigger-topic my-topic \
      --region us-central1

    # Deploy Cloud Storage function
    gcloud functions deploy process-uploads \
      --gen2 \
      --runtime nodejs20 \
      --trigger-event-filters="type=google.cloud.storage.object.v1.finalized" \
      --trigger-event-filters="bucket=my-bucket" \
      --region us-central1
Cold Start Optimization Pattern
Minimize cold start latency for Cloud Run
When to use: Latency-sensitive applications, User-facing APIs, High-traffic services
1. Enable Startup CPU Boost

    gcloud run deploy my-service \
      --cpu-boost \
      --region us-central1

2. Set Minimum Instances

    gcloud run deploy my-service \
      --min-instances 1 \
      --region us-central1
3. Optimize Container Image
    # Use distroless for minimal image
    FROM node:20-slim AS builder
    WORKDIR /app
    COPY package*.json ./
    # --omit=dev replaces the deprecated --only=production
    RUN npm ci --omit=dev

    FROM gcr.io/distroless/nodejs20-debian12
    WORKDIR /app
    COPY --from=builder /app/node_modules ./node_modules
    COPY src ./src
    # The distroless Node.js image already uses node as its entrypoint
    CMD ["src/index.js"]
4. Lazy Initialize Heavy Dependencies
    // Lazy load heavy libraries
    let bigQueryClient = null;

    function getBigQueryClient() {
      if (!bigQueryClient) {
        const { BigQuery } = require('@google-cloud/bigquery');
        bigQueryClient = new BigQuery();
      }
      return bigQueryClient;
    }

    // Only initialize when needed
    app.get('/api/analytics', async (req, res) => {
      const client = getBigQueryClient();
      const results = await client.query({...});
      res.json(results);
    });
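5. Increase CPU and Memory

    # More CPU shortens startup. Cloud Run configures CPU and memory separately,
    # so raise --cpu directly (and --memory if the application needs it)
    gcloud run deploy my-service \
      --memory 1Gi \
      --cpu 2 \
      --region us-central1

Optimization impact
- Startup CPU boost: Up to roughly 50% faster cold starts; actual gains vary by runtime and workload
- Min instances: Keeps warm instances ready, avoiding cold starts until traffic exceeds their capacity
- Distroless image: Smaller attack surface, faster pull
- Lazy init: Defers heavy loading to first request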
Concurrency Configuration Pattern
Proper concurrency settings for Cloud Run
When to use: Need to optimize instance utilization, Handle traffic spikes efficiently, Reduce cold starts
Understanding Concurrency

    # Default concurrency is 80
    # Adjust based on your workload

    # For I/O-bound workloads (most web apps)
    gcloud run deploy my-service \
      --concurrency 80 \
      --cpu 1

    # For CPU-bound workloads
    gcloud run deploy my-service \
      --concurrency 1 \
      --cpu 1

    # For memory-intensive workloads
    gcloud run deploy my-service \
      --concurrency 10 \
      --memory 2Gi
Node.js Concurrency
    // Node.js is single-threaded but handles I/O concurrently
    // Use async/await for all I/O operations

    // GOOD - async I/O
    app.get('/api/data', async (req, res) => {
      const [users, products] = await Promise.all([
        fetchUsers(),
        fetchProducts()
      ]);
      res.json({ users, products });
    });

    // BAD - blocking operation
    app.get('/api/compute', (req, res) => {
      const result = heavyCpuOperation(); // Blocks other requests!
      res.json(result);
    });
Python Concurrency with Gunicorn
    FROM python:3.11-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    # 4 workers for concurrency
    CMD exec gunicorn --bind :$PORT --workers 4 --threads 2 main:app

    # main.py
    from flask import Flask

    app = Flask(__name__)

    @app.route('/api/data')
    def get_data():
        return {'status': 'ok'}
Concurrency guidelines
- Concurrency=1: Only for CPU-bound or thread-unsafe code
- Concurrency=8-20: Memory-intensive workloads
- Concurrency=80: Default, good for I/O-bound
- Concurrency=1000: Maximum, for very lightweight handlers
Pub/Sub Integration Pattern
Event-driven processing with Cloud Pub/Sub
When to use: Asynchronous message processing, Decoupled microservices, Event-driven architecture
Push Subscription to Cloud Run

    # Create topic
    gcloud pubsub topics create orders

    # Create push subscription to Cloud Run
    gcloud pubsub subscriptions create orders-push \
      --topic orders \
      --push-endpoint https://my-service-xxx.run.app/pubsub \
      --ack-deadline 600
    // Handle Pub/Sub push messages
    const express = require('express');
    const app = express();

    app.use(express.json());

    app.post('/pubsub', async (req, res) => {
      // Check the envelope shape (this does not authenticate the sender)
      if (!req.body.message) {
        return res.status(400).send('Invalid Pub/Sub message');
      }
      try {
        // Decode message data
        const message = req.body.message;
        const data = message.data
          ? JSON.parse(Buffer.from(message.data, 'base64').toString())
          : {};
        console.log('Processing order:', data);
        await processOrder(data);
        // Return 200 to acknowledge
        res.status(200).send('OK');
      } catch (error) {
        console.error('Processing failed:', error);
        // Return 500 to trigger retry
        res.status(500).send('Processing failed');
      }
    });
Publishing Messages
    const { PubSub } = require('@google-cloud/pubsub');
    const pubsub = new PubSub();

    async function publishOrder(order) {
      const topic = pubsub.topic('orders');
      const messageBuffer = Buffer.from(JSON.stringify(order));
      const messageId = await topic.publishMessage({
        data: messageBuffer,
        attributes: {
          type: 'order_created',
          priority: 'high'
        }
      });
      console.log(`Published message ${messageId}`);
      return messageId;
    }

Dead Letter Queue

    # Create DLQ topic
    gcloud pubsub topics create orders-dlq

    # Update subscription with DLQ
    gcloud pubsub subscriptions update orders-push \
      --dead-letter-topic orders-dlq \
      --max-delivery-attempts 5
Cloud SQL Connection Pattern
Connect Cloud Run to Cloud SQL securely
When to use: Need relational database, Migrating existing applications, Complex queries and transactions

    # Deploy with Cloud SQL connection
    # DB_PASS is read by the application code but is not set here;
    # supply it from Secret Manager rather than as a plain environment variable
    gcloud run deploy my-service \
      --add-cloudsql-instances PROJECT:REGION:INSTANCE \
      --set-env-vars INSTANCE_CONNECTION_NAME="PROJECT:REGION:INSTANCE",DB_NAME="mydb",DB_USER="myuser"
    // Using Unix socket connection
    const { Pool } = require('pg');

    const pool = new Pool({
      user: process.env.DB_USER,
      password: process.env.DB_PASS,
      database: process.env.DB_NAME,
      // Cloud SQL connector uses Unix socket
      host: `/cloudsql/${process.env.INSTANCE_CONNECTION_NAME}`,
      max: 5, // Connection pool size
      idleTimeoutMillis: 30000,
      connectionTimeoutMillis: 10000,
    });

    app.get('/api/users', async (req, res) => {
      const client = await pool.connect();
      try {
        const result = await client.query('SELECT * FROM users LIMIT 100');
        res.json(result.rows);
      } finally {
        client.release();
      }
    });

    # Python with SQLAlchemy
    import os
    from sqlalchemy import create_engine

    def get_engine():
        instance_connection_name = os.environ["INSTANCE_CONNECTION_NAME"]
        db_user = os.environ["DB_USER"]
        db_pass = os.environ["DB_PASS"]
        db_name = os.environ["DB_NAME"]
        engine = create_engine(
            f"postgresql+pg8000://{db_user}:{db_pass}@/{db_name}",
            connect_args={
                "unix_sock": f"/cloudsql/{instance_connection_name}/.s.PGSQL.5432"
            },
            pool_size=5,
            max_overflow=2,
            pool_timeout=30,
            pool_recycle=1800,
        )
        return engine
Best practices
- Use connection pooling (max 5-10 connections per instance)
- Set appropriate idle timeouts
- Handle connection errors gracefully
- Consider the Cloud SQL Auth Proxy for local development
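The per-instance pool limit matters because every Cloud Run instance opens its own pool, so the worst case grows with the instance count. A minimal sketch of that arithmetic follows. The function names are illustrative, and `db_max_connections` is a hypothetical limit you would look up for your own database tier.

```python
def total_connections(max_instances: int, pool_size: int, max_overflow: int = 0) -> int:
    """Worst-case connection count if every instance fills its pool."""
    return max_instances * (pool_size + max_overflow)


def fits_database(max_instances: int, pool_size: int, max_overflow: int,
                  db_max_connections: int) -> bool:
    """True when the worst case stays within the database's connection limit."""
    return total_connections(max_instances, pool_size, max_overflow) <= db_max_connections


# 100 instances, each with 5 pooled + 2 overflow connections
print(total_connections(100, 5, 2))    # 700
# Assumed limit of 500 connections (hypothetical value)
print(fits_database(100, 5, 2, 500))   # False
```

If the worst case does not fit, lower `--max-instances`, shrink the pool, or raise the database limit.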
Secret Manager Integration
Securely manage secrets in Cloud Run
When to use: API keys, database passwords, Service account keys, Any sensitive configuration

    # Create secret
    echo -n "my-secret-value" | gcloud secrets create my-secret --data-file=-

    # Mount as environment variable
    gcloud run deploy my-service \
      --update-secrets=API_KEY=my-secret:latest

    # Mount as file volume
    gcloud run deploy my-service \
      --update-secrets=/secrets/api-key=my-secret:latest
    // Access mounted as environment variable
    const apiKeyFromEnv = process.env.API_KEY;

    // Access mounted as file
    const fs = require('fs');
    const apiKeyFromFile = fs.readFileSync('/secrets/api-key', 'utf8');

    // Access via Secret Manager API (when not mounted)
    const { SecretManagerServiceClient } = require('@google-cloud/secret-manager');
    const client = new SecretManagerServiceClient();
    // Assumption: you set GOOGLE_CLOUD_PROJECT yourself; it is not guaranteed to exist
    const projectId = process.env.GOOGLE_CLOUD_PROJECT;

    async function getSecret(name) {
      const [version] = await client.accessSecretVersion({
        name: `projects/${projectId}/secrets/${name}/versions/latest`
      });
      return version.payload.data.toString();
    }
Sharp Edges
/tmp Filesystem Counts Against Memory
Severity: HIGH
Situation: Writing files to /tmp directory in Cloud Run
Symptoms: Container killed with OOM error. Memory usage spikes unexpectedly. File operations cause container restarts. "Container memory limit exceeded" in logs.
Why this breaks: Cloud Run uses an in-memory filesystem for /tmp. Any files written to /tmp consume memory from your container's allocation.
Common scenarios:
- Downloading files temporarily
- Creating temp processing files
- Libraries caching to /tmp
- Large log buffers
A 512MB container that downloads a 200MB file to /tmp only has ~300MB left for the application.
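The budget above can be checked with simple arithmetic. This is a minimal sketch; `remaining_memory_mib` and its `baseline_mib` parameter (memory the process already uses before any /tmp writes) are illustrative names, not part of any Cloud Run API.

```python
def remaining_memory_mib(limit_mib: int, tmp_usage_mib: int, baseline_mib: int = 0) -> int:
    """Memory left for request handling after /tmp files and baseline usage."""
    return max(limit_mib - tmp_usage_mib - baseline_mib, 0)


# 512 MiB container with a 200 MiB file in /tmp
print(remaining_memory_mib(512, 200))        # 312
# Same container if the runtime itself already holds 150 MiB (assumed figure)
print(remaining_memory_mib(512, 200, 150))   # 162
```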
Recommended fix:
Calculate memory including /tmp usage
    # cloudbuild.yaml
    steps:
      - name: 'gcr.io/cloud-builders/gcloud'
        args:
          - 'run'
          - 'deploy'
          - 'my-service'
          - '--memory=1Gi'  # Include /tmp overhead
          - '--image=gcr.io/$PROJECT_ID/my-service'
Stream instead of buffering
    from google.cloud import storage

    # BAD - buffers entire file in /tmp
    def process_large_file(bucket_name, blob_name):
        bucket = storage.Client().bucket(bucket_name)
        blob = bucket.blob(blob_name)
        blob.download_to_filename('/tmp/large_file')
        with open('/tmp/large_file', 'rb') as f:
            process(f.read())

    # GOOD - stream processing
    def process_large_file(bucket_name, blob_name):
        bucket = storage.Client().bucket(bucket_name)
        blob = bucket.blob(blob_name)
        with blob.open('rb') as f:
            for chunk in iter(lambda: f.read(8192), b''):
                process_chunk(chunk)
Use Cloud Storage for large files
    from google.cloud import storage

    def process_with_gcs(bucket_name, input_name, output_name):
        client = storage.Client()
        bucket = client.bucket(bucket_name)
        # Process directly to/from GCS
        input_blob = bucket.blob(input_name)
        output_blob = bucket.blob(output_name)
        with input_blob.open('rb') as reader:
            with output_blob.open('wb') as writer:
                for chunk in iter(lambda: reader.read(65536), b''):
                    processed = transform(chunk)
                    writer.write(processed)
Monitor memory usage
    import psutil
    import logging

    def log_memory():
        memory = psutil.virtual_memory()
        logging.info(
            f"Memory: {memory.percent}% used, "
            f"{memory.available / 1024 / 1024:.0f}MB available"
        )
Concurrency=1 Causes Scaling Bottlenecks
Severity: HIGH
Situation: Setting concurrency to 1 for request isolation
Symptoms: Auto-scaling creates many container instances. High latency during traffic spikes. Increased cold starts. Higher costs from more instances.
Why this breaks: Setting concurrency to 1 means each container handles only one request at a time. During traffic spikes:
- 100 concurrent requests = 100 container instances
- Each instance has cold start overhead
- More instances = higher costs
- Scaling takes time, requests queue up
This should only be used when:
- Processing is truly single-threaded
- Memory-heavy per-request processing
- Using thread-unsafe libraries
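The scaling effect described above is a ceiling division. This is a minimal sketch with an illustrative function name, and it ignores autoscaler headroom and CPU-based scaling.

```python
import math


def instances_needed(concurrent_requests: int, concurrency: int) -> int:
    """Minimum instances required to serve the given number of in-flight requests."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return math.ceil(concurrent_requests / concurrency)


print(instances_needed(100, 1))    # 100
print(instances_needed(100, 80))   # 2
```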
Recommended fix:
Set appropriate concurrency
    # For I/O-bound workloads (most web apps)
    gcloud run deploy my-service \
      --concurrency=80 \
      --max-instances=100

    # For CPU-bound workloads
    gcloud run deploy my-service \
      --concurrency=4 \
      --cpu=2

    # Only use 1 when absolutely necessary
    # Be prepared for many instances
    gcloud run deploy my-service \
      --concurrency=1 \
      --max-instances=1000
Node.js - use async properly
    // With high concurrency, ensure async operations
    const express = require('express');
    const app = express();

    app.get('/api/data', async (req, res) => {
      // All I/O should be async
      const data = await fetchFromDatabase();
      const enriched = await enrichData(data);
      res.json(enriched);
    });
    // Concurrency 80+ is safe for async I/O workloads
Python - use async framework
    from fastapi import FastAPI
    import asyncio
    import httpx

    app = FastAPI()

    @app.get("/api/data")
    async def get_data():
        # Async I/O allows high concurrency
        async with httpx.AsyncClient() as client:
            response = await client.get("https://api.example.com/data")
            return response.json()
    # Concurrency 80+ safe with async framework
Calculate concurrency
    concurrency = memory_limit / per_request_memory

    Example:
    - 512MB container
    - 20MB per request overhead
    - Safe concurrency: ~25
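The formula above can be written as a small helper. This is a sketch under stated assumptions. The `baseline_mib` parameter (memory used before any request arrives) is an addition not in the original formula, and the per-request figure has to come from your own measurements.

```python
def safe_concurrency(memory_limit_mib: int, per_request_mib: int, baseline_mib: int = 0) -> int:
    """Largest concurrency whose per-request memory fits in the container."""
    if per_request_mib <= 0:
        raise ValueError("per_request_mib must be positive")
    usable = memory_limit_mib - baseline_mib
    if usable <= 0:
        raise ValueError("baseline usage leaves no memory for requests")
    # Floor division rounds down, so the result never over-commits memory
    return max(1, usable // per_request_mib)


print(safe_concurrency(512, 20))        # 25
print(safe_concurrency(512, 20, 112))   # 20
```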
CPU Throttled When Not Handling Requests
Severity: HIGH
Situation: Running background tasks or processing between requests
Symptoms: Background tasks run extremely slowly. Scheduled work doesn't complete. Metrics collection fails. Connection keep-alive breaks.
Why this breaks: By default, Cloud Run throttles CPU to near-zero when not actively handling a request. This is "CPU only during requests" mode.
Affected operations:
- Background threads
- Connection pool maintenance
- Metrics/telemetry emission
- Scheduled tasks within container
- Cleanup operations after response
Recommended fix:
Enable CPU always allocated
    # CPU allocated even outside requests
    gcloud run deploy my-service \
      --no-cpu-throttling \
      --min-instances=1
    # Note: This increases costs but enables background work

Use startup CPU boost for initialization

    # Boost CPU during cold start only
    # --cpu-throttling is the default: CPU is throttled outside of requests
    gcloud run deploy my-service \
      --cpu-boost \
      --cpu-throttling
Move background work to Cloud Tasks
    from google.cloud import tasks_v2
    import json

    def create_background_task(payload):
        client = tasks_v2.CloudTasksClient()
        parent = client.queue_path("my-project", "us-central1", "my-queue")
        task = {
            "http_request": {
                "http_method": tasks_v2.HttpMethod.POST,
                "url": "https://my-service.run.app/process",
                "body": json.dumps(payload).encode(),
                "headers": {"Content-Type": "application/json"}
            }
        }
        client.create_task(parent=parent, task=task)

    # Handle response immediately, background via Cloud Tasks
    @app.post("/api/order")
    async def create_order(order: Order):
        order_id = await save_order(order)
        # Queue background processing
        create_background_task({"order_id": order_id})
        return {"order_id": order_id, "status": "processing"}
Use Pub/Sub for async processing
    # Move heavy processing to separate service
    steps:
      # Main service - responds quickly
      - name: 'gcr.io/cloud-builders/gcloud'
        args: ['run', 'deploy', 'api-service', '--cpu-throttling']
      # Worker service - processes messages
      - name: 'gcr.io/cloud-builders/gcloud'
        args: ['run', 'deploy', 'worker-service', '--no-cpu-throttling', '--min-instances=1']
VPC Connector 10-Minute Idle Timeout
Severity: MEDIUM
Situation: Cloud Run service connecting to VPC resources
Symptoms: Connection errors after period of inactivity. "Connection reset" or "Connection refused" errors. Sporadic failures to VPC resources. Database connections drop unexpectedly.
Why this breaks: Cloud Run's VPC connector has a 10-minute idle timeout on connections. If a connection is idle for 10 minutes, it's silently closed.
Affects:
- Database connection pools
- Redis connections
- Internal API connections
- Any persistent VPC connection
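One way to avoid using a silently closed connection is to track idle time and reconnect before the timeout is reached. The class below is a minimal sketch of that idea. `IdleAwareConnection` is a hypothetical helper, `connect` is any zero-argument factory you supply, and the 300-second threshold is an assumed safety margin below the 10-minute limit.

```python
import time


class IdleAwareConnection:
    """Reopens the underlying connection once it has been idle too long."""

    def __init__(self, connect, max_idle_seconds=300.0, clock=time.monotonic):
        self._connect = connect          # zero-argument factory returning a connection
        self._max_idle = max_idle_seconds
        self._clock = clock              # injectable so the logic can be tested
        self._conn = None
        self._last_used = 0.0

    def get(self):
        now = self._clock()
        idle = now - self._last_used
        if self._conn is None or idle >= self._max_idle:
            # Close the stale connection if it exposes close(), then reconnect
            close = getattr(self._conn, "close", None)
            if callable(close):
                close()
            self._conn = self._connect()
        self._last_used = now
        return self._conn
```

Connection pools with recycle and pre-ping options provide the same protection with less code and are usually preferable when available.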
Recommended fix:
Configure connection pool with keep-alive
    # SQLAlchemy with connection recycling
    from sqlalchemy import create_engine

    engine = create_engine(
        DATABASE_URL,
        pool_size=5,
        max_overflow=2,
        pool_recycle=300,    # Recycle connections every 5 minutes
        pool_pre_ping=True   # Validate connection before use
    )

TCP keep-alive for custom connections

    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
Redis with connection validation
    import socket
    import redis

    pool = redis.ConnectionPool(
        host=REDIS_HOST,
        port=6379,
        socket_keepalive=True,
        socket_keepalive_options={
            socket.TCP_KEEPIDLE: 60,
            socket.TCP_KEEPINTVL: 60,
            socket.TCP_KEEPCNT: 5
        },
        health_check_interval=30
    )
    client = redis.Redis(connection_pool=pool)
Use the Cloud SQL Python Connector

    # Use Cloud SQL connector which handles reconnection
    # requirements.txt
    cloud-sql-python-connector[pg8000]

    from google.cloud.sql.connector import Connector
    import sqlalchemy

    connector = Connector()

    def getconn():
        return connector.connect(
            "project:region:instance",
            "pg8000",
            user="user",
            password="password",
            db="database"
        )

    engine = sqlalchemy.create_engine(
        "postgresql+pg8000://",
        creator=getconn
    )
Container Startup Timeout (4 minutes max)
Severity: HIGH
Situation: Deploying containers with slow initialization
Symptoms: Deployment fails with "Container failed to start". Service never becomes healthy. "Revision failed to become ready" errors. Works locally but fails on Cloud Run.
Why this breaks: Cloud Run expects your container to start listening on PORT within 4 minutes (240 seconds). If it doesn't, the instance is killed.
Common causes:
- Heavy framework initialization (ML models, etc.)
- Waiting for external dependencies at startup
- Large dependency loading
- Database migrations on startup
Recommended fix:
Enable startup CPU boost
    gcloud run deploy my-service \
      --cpu-boost
Lazy initialization
    from functools import lru_cache
    from fastapi import FastAPI

    app = FastAPI()

    # Don't load at import time
    @lru_cache(maxsize=1)
    def get_model():
        # Load on first request, not at startup; lru_cache keeps the single instance
        return load_heavy_model()

    @app.get("/predict")
    async def predict(data: dict):
        model = get_model()  # Loads on first call only
        return model.predict(data)
    # Startup is fast - model loads on first request
Start listening immediately
    import asyncio
    from fastapi import FastAPI, HTTPException
    import uvicorn

    app = FastAPI()

    # Global state for async initialization
    initialized = asyncio.Event()

    @app.on_event("startup")
    async def startup():
        # Start background initialization
        asyncio.create_task(async_init())

    async def async_init():
        # Heavy initialization happens after server starts
        await load_models()
        await warm_up_connections()
        initialized.set()

    @app.get("/ready")
    async def ready():
        if not initialized.is_set():
            raise HTTPException(503, "Still initializing")
        return {"status": "ready"}

    @app.get("/health")
    async def health():
        # Always respond - health check passes
        return {"status": "healthy"}
Use multi-stage builds
    # Build stage - slow
    FROM python:3.11 AS builder
    WORKDIR /app
    COPY requirements.txt .
    RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

    # Runtime stage - fast startup
    FROM python:3.11-slim
    WORKDIR /app
    COPY --from=builder /wheels /wheels
    RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
    COPY . .
    CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]
Run migrations separately
    # Don't migrate on startup - use Cloud Build
    steps:
      # Run migrations first
      - name: 'gcr.io/cloud-builders/gcloud'
        entrypoint: 'bash'
        args:
          - '-c'
          - |
            gcloud run jobs execute migrate-job --wait
      # Then deploy
      - name: 'gcr.io/cloud-builders/gcloud'
        args: ['run', 'deploy', 'my-service', ...]
Second Generation Execution Environment Differences
Severity: MEDIUM
Situation: Migrating to or using Cloud Run second-gen execution environment
Symptoms: Network behavior changes. Different syscall support. File system behavior differences. Container behaves differently than in first-gen.
Why this breaks: Cloud Run's second-generation execution environment provides full Linux compatibility instead of the gVisor-based sandbox used by the first generation, so it has different characteristics:
- More Linux syscalls supported
- Full /proc and /sys access
- Different network stack
- No automatic HTTPS redirect
- Different tmp filesystem behavior
Recommended fix:
Explicitly set execution environment
    # First generation (legacy)
    gcloud run deploy my-service \
      --execution-environment=gen1

    # Second generation (recommended for most)
    gcloud run deploy my-service \
      --execution-environment=gen2
Handle network differences
    # Second-gen doesn't auto-redirect HTTP to HTTPS
    from fastapi import FastAPI, Request
    from fastapi.responses import RedirectResponse

    app = FastAPI()

    @app.middleware("http")
    async def redirect_https(request: Request, call_next):
        # Check X-Forwarded-Proto header
        if request.headers.get("X-Forwarded-Proto") == "http":
            url = request.url.replace(scheme="https")
            return RedirectResponse(url, status_code=301)
        return await call_next(request)
GPU access (second-gen only)
    # GPUs only available in second-gen
    gcloud run deploy ml-service \
      --execution-environment=gen2 \
      --gpu=1 \
      --gpu-type=nvidia-l4
Check execution environment
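    import os

    def get_execution_environment():
        # Heuristic only: the first-gen sandbox is gVisor-based, so a gVisor
        # marker suggests gen1. The marker is not guaranteed to be present,
        # so treat a 'gen2' result as an assumption.
        try:
            with open('/proc/version', 'r') as f:
                version = f.read()
            if 'gVisor' in version:
                return 'gen1'
        except OSError:
            pass
        return 'gen2'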
Request Timeout Configuration Mismatch
Severity: MEDIUM
Situation: Long-running requests or background processing
Symptoms: Requests terminated before completion. 504 Gateway Timeout errors. Processing stops unexpectedly. Inconsistent timeout behavior.
Why this breaks: Cloud Run has multiple timeout configurations that must align:
- Request timeout (default 300s, maximum 3600s, i.e. 60 minutes)
- Client timeout
- Downstream service timeouts
- Load balancer timeout (for external access)
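A quick consistency check can catch misaligned values before deployment. The sketch below assumes a simple rule: outer layers (client, load balancer) should wait at least as long as the Cloud Run request timeout, and downstream calls should finish within it. The function name and the rule itself are illustrative assumptions, not a Cloud Run requirement.

```python
def check_timeouts(client_s, load_balancer_s, request_s, downstream_s):
    """Return a list of human-readable problems; an empty list means aligned."""
    problems = []
    if downstream_s > request_s:
        problems.append("downstream timeout exceeds the Cloud Run request timeout")
    if load_balancer_s < request_s:
        problems.append("load balancer times out before the Cloud Run request timeout")
    if client_s < request_s:
        problems.append("client times out before the Cloud Run request timeout")
    return problems


# Request timeout raised to 900s, but client and load balancer left at 30s
for problem in check_timeouts(client_s=30, load_balancer_s=30, request_s=900, downstream_s=60):
    print(problem)
```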
Recommended fix:
Set consistent timeouts
    # Increase request timeout (max 3600s for HTTP)
    # 900 seconds = 15 minutes
    gcloud run deploy my-service \
      --timeout=900
Handle long-running with webhooks
    from fastapi import FastAPI, BackgroundTasks
    import httpx

    app = FastAPI()

    @app.post("/process")
    async def process(data: dict, background_tasks: BackgroundTasks):
        task_id = create_task_id()
        # Start background processing
        # Note: work that continues after the response is CPU-throttled
        # unless the service runs with CPU always allocated
        background_tasks.add_task(
            long_running_process,
            task_id,
            data,
            data.get("callback_url")
        )
        # Return immediately
        return {"task_id": task_id, "status": "processing"}

    async def long_running_process(task_id, data, callback_url):
        result = await heavy_computation(data)
        # Callback when done
        if callback_url:
            async with httpx.AsyncClient() as client:
                await client.post(callback_url, json={
                    "task_id": task_id,
                    "result": result
                })
Use Cloud Tasks for reliable long-running
    import json
    from google.cloud import tasks_v2

    def create_long_running_task(data):
        client = tasks_v2.CloudTasksClient()
        # PROJECT and REGION are assumed to be defined elsewhere
        parent = client.queue_path(PROJECT, REGION, "long-tasks")
        task = {
            "http_request": {
                "http_method": tasks_v2.HttpMethod.POST,
                "url": "https://worker.run.app/process",
                "body": json.dumps(data).encode(),
                "headers": {"Content-Type": "application/json"}
            },
            "dispatch_deadline": {"seconds": 1800}  # 30 min
        }
        return client.create_task(parent=parent, task=task)
Streaming for long responses
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse

    app = FastAPI()

    @app.get("/large-report")
    async def large_report():
        async def generate():
            for chunk in process_large_data():
                yield chunk
        return StreamingResponse(generate(), media_type="text/plain")
Validation Checks
Hardcoded GCP Credentials
Severity: ERROR
GCP credentials must never be hardcoded in source code
Message: Hardcoded GCP service account credentials. Use Secret Manager or Workload Identity.
GCP API Key in Source Code
Severity: ERROR
API keys should use Secret Manager
Message: Hardcoded GCP API key. Use Secret Manager.
Credentials JSON File in Repository
Severity: ERROR
Service account JSON files should not be in source control
Message: Credentials file detected. Add to .gitignore and use Secret Manager.
Running as Root User
Severity: WARNING
Containers should not run as root for security
Message: Dockerfile runs as root. Add USER directive for security.
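A check like this can be approximated by scanning the final build stage for a `USER` directive. The function below is a rough sketch, not the actual rule implementation. `runs_as_root` is a hypothetical name, and the check cannot see a non-root user already configured by the base image.

```python
def runs_as_root(dockerfile_text: str) -> bool:
    """True if the final stage has no USER directive or sets USER to root."""
    user = None
    for line in dockerfile_text.splitlines():
        parts = line.strip().split()
        if not parts:
            continue
        keyword = parts[0].upper()
        if keyword == "FROM":
            # Each stage starts over with the base image's default user
            user = None
        elif keyword == "USER" and len(parts) > 1:
            user = parts[1]
    # USER may be given as name, uid, or name:group
    return user is None or user.split(":")[0] in ("root", "0")


dockerfile = """
FROM node:20-slim AS builder
RUN npm ci
FROM node:20-slim
USER node
CMD ["node", "src/index.js"]
"""
print(runs_as_root(dockerfile))   # False
```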
Missing Health Check in Dockerfile
Severity: INFO
Cloud Run uses HTTP health checks, Dockerfile HEALTHCHECK is optional
Message: No HEALTHCHECK in Dockerfile. Cloud Run uses its own health checks.
Hardcoded Port in Application
Severity: WARNING
Port should come from PORT environment variable
Message: Hardcoded port. Use PORT environment variable for Cloud Run.
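Reading the port from the environment takes only a few lines. This is a minimal sketch; `get_port` is an illustrative helper, and the 8080 fallback is assumed here for local runs.

```python
import os


def get_port(default: int = 8080, environ=os.environ) -> int:
    """Return the port from the PORT variable, falling back to a default."""
    raw = environ.get("PORT", "").strip()
    if not raw:
        return default
    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port


print(get_port(environ={"PORT": "9090"}))   # 9090
print(get_port(environ={}))                 # 8080
```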
Large File Writes to /tmp
Severity: WARNING
/tmp uses container memory, large writes can cause OOM
Message: /tmp writes consume memory. Consider Cloud Storage for large files.
Synchronous File Operations
Severity: WARNING
Sync file ops block the event loop in async apps
Message: Synchronous file operations. Use async versions for better concurrency.
Global Mutable State
Severity: WARNING
Global state issues with concurrent requests
Message: Global mutable state may cause issues with concurrent requests.
Thread-Unsafe Singleton Pattern
Severity: WARNING
Singletons need thread safety for concurrency > 1
Message: Singleton pattern - ensure thread safety if using concurrency > 1.
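When one instance serves several requests at once, a lazily created singleton needs a lock so that concurrent first calls do not build it twice. The class below is a minimal sketch using double-checked locking. `LazySingleton` is a hypothetical helper, and `factory` is whatever expensive constructor you need to guard.

```python
import threading


class LazySingleton:
    """Creates a value once, even when get() is called from many threads."""

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._instance = None
        self._created = False   # separate flag so a factory may return None

    def get(self):
        if not self._created:            # fast path without taking the lock
            with self._lock:
                if not self._created:    # re-check after acquiring the lock
                    self._instance = self._factory()
                    self._created = True
        return self._instance
```

Usage is `client = LazySingleton(make_client)` at module level, then `client.get()` inside request handlers, where `make_client` stands in for your own constructor.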
Collaboration
Delegation Triggers
- user needs AWS serverless -> aws-serverless (Lambda, API Gateway, SAM)
- user needs Azure containers -> azure-functions (Azure Container Apps, Functions)
- user needs database design -> postgres-wizard (Cloud SQL design, AlloyDB)
- user needs authentication -> auth-specialist (Firebase Auth, Identity Platform)
- user needs AI integration -> llm-architect (Vertex AI, Cloud Run + LLM)
- user needs workflow orchestration -> workflow-automation (Cloud Workflows, Eventarc)
When to Use
Use this skill when the request clearly matches the capabilities and patterns described above.
Limitations
- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.