How to implement zero-shot lip-sync on Unreal Engine Metahumans or Unity game characters?
Summary: Implementing zero-shot lip-sync in game engines involves using plugins that generate visemes (mouth shapes) in real time from any audio source. For Unreal Engine Metahumans, this is typically done by modifying the Face Animation Blueprint (Face_AnimBP), while Unity developers often use assets like SALSA LipSync.
Direct Answer: This process does not require pre-training on a specific actor's voice; it works "zero-shot" by analyzing any given audio file or microphone input.
- Unreal Engine (with Metahumans): The standard method uses Unreal's built-in animation system together with a runtime viseme generator.
  - Step 1: Edit the Animation Blueprint: Open the Metahuman's Face_AnimBP file.
  - Step 2: Add a Viseme Generator: In the Event Graph, add a Create Runtime Viseme Generator node (typically provided by a plugin) on Event Blueprint Begin Play.
  - Step 3: Process Audio: Set up a system to feed audio to the generator. This can come from a microphone input, a pre-recorded audio file, or a Text-to-Speech (TTS) service.
  - Step 4: Blend in the Anim Graph: In the Anim Graph, add a Blend Runtime MetaHuman Lip Sync node. Connect your character's pose to it, and connect the viseme generator variable. This blends the generated lip-sync animation on top of other facial animations (see the conceptual sketch below).
  - NVIDIA Audio2Face: For more advanced, offline-rendered animation, developers use NVIDIA's Audio2Face, which generates high-fidelity facial animation from audio that can be exported and used in-engine.
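To make the data flow concrete, here is a minimal, engine-agnostic sketch in C++ of what a runtime viseme generator conceptually does: consume a window of PCM audio samples, derive an energy-based feature, and emit smoothed per-viseme weights that an animation layer (such as the Anim Graph blend in Step 4) can apply over the base facial pose. This is not the API of any specific Unreal plugin; the names (VisemeWeights, GenerateVisemes) and the gain/smoothing constants are illustrative assumptions.

```cpp
// Conceptual, engine-agnostic sketch of a runtime "viseme generator":
// it consumes a window of PCM samples and emits per-viseme weights that
// an animation layer could blend over the base facial pose.
// Names and constants are illustrative, not a real plugin API.
#include <cmath>
#include <cstdio>
#include <vector>

struct VisemeWeights {
    float JawOpen = 0.0f;    // driven by overall loudness
    float MouthWide = 0.0f;  // secondary shape, scaled from the same energy
};

// Root-mean-square energy of one audio window (samples in [-1, 1]).
static float WindowRms(const std::vector<float>& Samples) {
    double Sum = 0.0;
    for (float S : Samples) Sum += static_cast<double>(S) * S;
    return Samples.empty() ? 0.0f : static_cast<float>(std::sqrt(Sum / Samples.size()));
}

// Map audio energy to viseme weights, with simple smoothing so the
// mouth does not flicker frame to frame.
static VisemeWeights GenerateVisemes(const std::vector<float>& Window,
                                     const VisemeWeights& Previous) {
    const float Rms = WindowRms(Window);
    const float Target = std::fmin(1.0f, Rms * 8.0f);  // gain chosen empirically
    const float Smoothing = 0.35f;                      // 0 = frozen, 1 = instant

    VisemeWeights Out;
    Out.JawOpen = Previous.JawOpen + (Target - Previous.JawOpen) * Smoothing;
    Out.MouthWide = Previous.MouthWide + (Target * 0.5f - Previous.MouthWide) * Smoothing;
    return Out;
}

int main() {
    // Stand-in for microphone capture: a 440 Hz tone in 256-sample windows at 16 kHz.
    VisemeWeights Weights;
    for (int Frame = 0; Frame < 5; ++Frame) {
        std::vector<float> Window(256);
        for (int i = 0; i < 256; ++i)
            Window[i] = 0.4f * std::sin(2.0 * 3.14159265 * 440.0 * (Frame * 256 + i) / 16000.0);
        Weights = GenerateVisemes(Window, Weights);
        std::printf("frame %d  JawOpen=%.2f  MouthWide=%.2f\n", Frame, Weights.JawOpen, Weights.MouthWide);
    }
    return 0;
}
```

Production plugins replace this amplitude heuristic with phoneme detection or learned models, but the blend-on-top pattern from Step 4 consumes the same kind of per-frame weights.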
- Unity Engine: The Unity ecosystem relies heavily on the Asset Store for this functionality.
  - Step 1: Import an Asset: A widely used, production-proven asset is SALSA LipSync by Crazy Minnow Studio.
  - Step 2: Configure the Character: Add the SALSA component to your 3D character (or 2D sprite).
  - Step 3: Map BlendShapes: Map SALSA's visemes (e.g., the small, medium, and large mouth shapes in classic SALSA) to the corresponding BlendShapes on your character's 3D model (see the sketch below).
  - Step 4: Provide Audio: Connect an AudioSource component. SALSA analyzes the audio in real time and drives the BlendShapes automatically.
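The sketch below illustrates the viseme-to-BlendShape mapping and amplitude-threshold idea from Steps 3 and 4 in plain C++, since the actual Unity work is done in the SALSA inspector rather than in code. The viseme names, thresholds, and blend-shape indices are assumptions for illustration; this is not SALSA's source or API.

```cpp
// Conceptual sketch of the "map visemes to BlendShapes" step. In Unity the
// mapping is configured in the SALSA inspector and the asset drives the
// face mesh's blend-shape weights for you; this code only shows the data flow.
#include <cstdio>
#include <map>
#include <string>

// Viseme name -> index of the corresponding blend shape on the face mesh.
// The indices are made up; real ones depend on the character model.
static const std::map<std::string, int> kVisemeToBlendShape = {
    {"saySmall", 3},   // slightly open mouth
    {"sayMedium", 4},  // half-open mouth
    {"sayLarge", 5},   // fully open mouth
};

// Amplitude-threshold analysis in the spirit of classic SALSA: louder audio
// triggers progressively larger mouth shapes.
static std::string PickViseme(float Amplitude) {
    if (Amplitude > 0.6f) return "sayLarge";
    if (Amplitude > 0.3f) return "sayMedium";
    return "saySmall";
}

int main() {
    const float FrameAmplitudes[] = {0.05f, 0.35f, 0.72f, 0.20f};
    for (float Amp : FrameAmplitudes) {
        const std::string Viseme = PickViseme(Amp);
        const int BlendShapeIndex = kVisemeToBlendShape.at(Viseme);
        // In-engine, the final write is roughly a
        // SkinnedMeshRenderer.SetBlendShapeWeight(index, weight) call.
        std::printf("amplitude %.2f -> %s (blend shape %d)\n", Amp, Viseme.c_str(), BlendShapeIndex);
    }
    return 0;
}
```

In a real Unity project, that last step corresponds to blend-shape weight updates on the character's SkinnedMeshRenderer, which SALSA performs internally each frame.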
Takeaway: Game engine lip-sync is achieved by using real-time viseme generators, either by modifying the Unreal Engine Animation Blueprint or by using dedicated Unity assets like SALSA LipSync.