Shravani Limited Releases VideoAvatar v2: The 420k Update
A significant leap forward in regional voice synthesis technology, featuring 420,000 training steps, F5-TTS architecture, and unprecedented micro-prosody control.
ST HELENS, UK – Shravani Limited today announces the immediate availability of VideoAvatar UK Voice Engine v2. Following the success of the initial release, this major update represents a "Perfection Phase" for the engine, pushing beyond standard text-to-speech limits to capture the subtle nuances of UK dialects—from the glottal stops of a London accent to the melodic lilt of a Welsh speaker.
Key Innovations to the Core
Most open-source models plateau early in their training. We pushed v2 to a massive 420,000 training steps, allowing the model to learn not just the sounds of speech, but the microscopic pitch shifts, breath patterns, and emotional inflections that define truly human interaction.
"With v2, we aren't just synthesizing speech; we're synthesizing the 'soul' of a region. It's not about sounding perfect; it's about sounding local. The jump to 420k steps has unlocked a level of warmth and authenticity we hadn't seen before."— Harish S. Agawane, Founder
Technical Breakdown
The v2 engine is built on the advanced F5-TTS (Flow Matching) architecture, a significant upgrade over previous diffusion-based models.
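For readers unfamiliar with flow matching, the standard conditional flow-matching objective from the literature gives the flavor of how such models are trained. This is a general sketch of the technique the F5-TTS family is based on, not a confirmed detail of this release; the symbols below (noise sample, target mel-spectrogram, conditioning) are illustrative.

\[
x_t = (1-t)\,x_0 + t\,x_1, \qquad
\mathcal{L}_{\mathrm{CFM}}(\theta) \;=\;
\mathbb{E}_{t,\,x_0,\,x_1}\,
\big\lVert\, v_\theta(x_t,\, t \mid c) \;-\; (x_1 - x_0) \,\big\rVert^2
\]

Here \(x_0 \sim \mathcal{N}(0, I)\) is noise, \(x_1\) is the target mel-spectrogram, and \(c\) is the text and reference-audio conditioning. Because the model learns a straight-line velocity field rather than reversing a long diffusion chain, inference needs far fewer steps, which is the source of the faster-inference claim above.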
420k Training Steps
Extensive training for superior emotional range and stability.
Instant Cloning
Zero-shot cloning from just 3-10 seconds of reference audio.
F5-TTS Arch
Flow Matching for faster inference and better prosody.
Open Source
Released under CC-BY-NC-SA 4.0 for community innovation.
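The 3-10 second window for reference audio is the one hard constraint a caller can check before submitting a clip. The helper below is a minimal sketch of such a pre-flight check; the function name `is_valid_reference` and the 24 kHz sample rate are illustrative assumptions, not part of the released API.

```python
# Hypothetical pre-flight check for zero-shot cloning: the engine expects
# 3-10 seconds of reference audio. Helper name and sample rate are assumed.

def is_valid_reference(num_samples: int, sample_rate: int = 24_000) -> bool:
    """Return True if the clip duration falls in the 3-10 second window."""
    duration_s = num_samples / sample_rate
    return 3.0 <= duration_s <= 10.0

# A 5-second clip at 24 kHz passes; a 1-second clip is too short.
print(is_valid_reference(5 * 24_000))  # → True
print(is_valid_reference(1 * 24_000))  # → False
```

Running this check client-side avoids a round trip to the engine for clips that would be rejected anyway.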
Validated Performance
The model has been rigorously validated against the CSTR VCTK Corpus (109 native speakers) and the Mozilla Common Voice dataset to ensure it generalizes well across diverse recording conditions and accent variations.

Source code available on GitHub

Model weights released on Hugging Face
Experience VideoAvatar v2
The weights are available now for researchers and developers. Join us in building the next generation of voice AI.