On Monday, Tencent, the Chinese internet giant known for its video gaming empire and chat app WeChat, unveiled a new version of its open-source video generation model DynamiCrafter on GitHub. It's a reminder that some of China's biggest tech firms have been quietly ramping up efforts to make a dent in the text- and image-to-video space.
Like other generative video tools on the market, DynamiCrafter uses the diffusion method to turn captions and still images into seconds-long videos. Inspired by the natural phenomenon of diffusion in physics, diffusion models in machine learning can transform simple data into more complex and realistic data, similar to how particles move from an area of high concentration to one of low concentration.
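To make the physics analogy concrete, here is a minimal sketch of the "forward" half of a diffusion process: a toy signal is gradually mixed with Gaussian noise until almost none of the original remains. A generative model is then trained to reverse such steps, starting from noise and working back toward realistic data. This is an illustration only, not DynamiCrafter's code; the function names, the four-value "image" and the noise schedule are all hypothetical.

```python
import math
import random

def forward_diffusion(x0, betas, rng):
    """Gradually mix Gaussian noise into x0; returns every intermediate state."""
    states = [list(x0)]
    x = list(x0)
    for beta in betas:
        # Each step keeps sqrt(1 - beta) of the current signal
        # and adds sqrt(beta) worth of fresh Gaussian noise.
        x = [math.sqrt(1.0 - beta) * v + math.sqrt(beta) * rng.gauss(0.0, 1.0)
             for v in x]
        states.append(x)
    return states

def signal_fraction(betas):
    """Fraction of the original signal amplitude left after all steps."""
    return math.sqrt(math.prod(1.0 - b for b in betas))

rng = random.Random(0)                       # fixed seed for repeatability
x0 = [1.0, -1.0, 0.5, 0.0]                   # a toy "image" of four pixel values
betas = [0.02 * (i + 1) for i in range(20)]  # hypothetical noise schedule
states = forward_diffusion(x0, betas, rng)
remaining = signal_fraction(betas)           # well under 10% of the signal survives
```

After the 20 steps, the last entry in `states` is close to pure noise. Video diffusion models apply the same idea to stacks of frames and learn the reverse direction, guided by a caption or a still image.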
The second-generation DynamiCrafter churns out videos at a resolution of 640×1024 pixels, an upgrade from the 320×512 videos of its initial release in October. An academic paper published by the team behind DynamiCrafter notes that its technology differs from competitors' in that it broadens the applicability of image animation techniques to “more general visual content.”
“The key idea is to utilize the motion prior of text-to-video diffusion models by incorporating the image into the generative process as guidance,” the paper says. “Traditional” techniques, by comparison, “mainly focus on animating natural scenes with stochastic dynamics (e.g. clouds and fluid) or domain-specific motions (e.g. human hair or body motions).”
In a demo (see below) comparing DynamiCrafter, Stable Video Diffusion (launched in November) and the recently hyped Pika Labs, the Tencent model's result looks slightly more animated than the others. Inevitably, the samples chosen would favor DynamiCrafter, and after my first few attempts, none of the models left the impression that AI will be able to produce full-fledged movies anytime soon.
Nonetheless, hopes are high that generative video will be the next focal point in the AI race, following the boom in generative text and images. Startups and tech incumbents are therefore expected to pour resources into the field. China is no exception. In addition to Tencent, TikTok's parent ByteDance, Baidu and Alibaba have each released their own video diffusion models.
ByteDance's MagicVideo and Baidu's UniVG have both posted demos on GitHub, though neither is available to the public. Like Tencent, Alibaba has made its video generation model VGen open source, a strategy that has become increasingly popular among Chinese tech firms hoping to reach the global developer community.