Keyframer: Apple’s new AI editor for generating animations from text input

Researchers at Apple have published a new paper detailing Keyframer, a prototype AI-powered animation tool. According to the paper, Keyframer uses OpenAI’s GPT-4 model to generate animated illustrations from static 2D images based on a user’s text prompt. The tool also offers multiple editing modes, letting users edit the generated animation directly or prompt the AI to refine it in a specific way.


According to the research paper, Keyframer builds on large language models’ (LLMs) ability to generate code from text prompts. It takes a Scalable Vector Graphics (SVG) file, a 2D image format, as input and generates CSS code that animates the image as the user requests. CSS describes how elements should appear and move on screen, and SVG is used because such images can be scaled and resized without any loss in quality.
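To make that workflow concrete, here is a minimal sketch in Python of the kind of pairing the paper describes: an SVG input plus LLM-generated CSS that animates it. The specific SVG, the CSS rule and the inject_animation helper are illustrative assumptions, not code from Keyframer.

```python
# Minimal sketch of the idea above: an SVG image paired with LLM-generated CSS
# so the image animates in a browser. The SVG, the CSS and inject_animation
# are illustrative assumptions, not Keyframer's actual code.

SVG_INPUT = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <circle id="dot" cx="50" cy="50" r="10" fill="teal"/>
</svg>"""

# The kind of CSS an LLM might return for a prompt like "make the dot pulse":
# a selector applying an animation plus the @keyframes rule it references.
GENERATED_CSS = """
#dot {
  animation: pulse 1.5s ease-in-out infinite;
  transform-box: fill-box;
  transform-origin: center;
}
@keyframes pulse {
  0%   { transform: scale(1); }
  50%  { transform: scale(1.6); }
  100% { transform: scale(1); }
}
"""

def inject_animation(svg: str, css: str) -> str:
    """Embed the generated CSS in a <style> block inside the SVG."""
    style_block = f"<style>{css}</style>"
    # Insert the style block right after the opening <svg ...> tag.
    head, sep, tail = svg.partition(">")
    return head + sep + style_block + tail

if __name__ == "__main__":
    animated_svg = inject_animation(SVG_INPUT, GENERATED_CSS)
    print(animated_svg)  # Save as .svg and open in a browser to see the pulse.
```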


For editing the generated animations, Apple said the tool provides two modes: a Code Editor, where the user can edit the CSS code directly, and a Properties Editor, which offers a “dynamically created UI-layer” for editing CSS properties. The Properties Editor is designed for users who are less familiar with coding, and Apple said it is modelled after UI elements from established graphic editing software such as Adobe Illustrator.
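As a rough idea of how a dynamically created UI layer could be derived from generated code, the sketch below parses flat CSS rules into selector-to-property mappings that a property editor could surface as editable fields. This is purely an illustrative assumption about the concept, not Apple’s implementation.

```python
import re

# Illustrative sketch only (not Apple's implementation): pull editable
# declarations out of generated CSS so a dynamically built UI layer could
# expose each property as a field. Handles flat rule blocks only, for brevity.

def extract_properties(css: str) -> dict[str, dict[str, str]]:
    """Map each selector to its {property: value} declarations."""
    properties: dict[str, dict[str, str]] = {}
    for selector, body in re.findall(r"([^{}@]+)\{([^{}]*)\}", css):
        declarations = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                name, value = declaration.split(":", 1)
                declarations[name.strip()] = value.strip()
        if declarations:
            properties[selector.strip()] = declarations
    return properties

if __name__ == "__main__":
    sample = "#dot { animation-duration: 1.5s; fill: teal; }"
    print(extract_properties(sample))
    # -> {'#dot': {'animation-duration': '1.5s', 'fill': 'teal'}}
```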


Keyframer is not publicly available and is still in the early stages of development. So far, the tool has only been tested on simple animation tasks such as generating loading sequences and visualising data; generating complex animations from simple text prompts is not yet possible.


Earlier this month, Apple published another research paper describing its MLLM-Guided Image Editing (MGIE) model, which can edit an image using text prompts. MGIE handles a wide range of editing scenarios, from simple colour adjustments to more complex object manipulations.

The MGIE model pairs a multimodal large language model, which expands the user’s request into “concise expressive instructions”, with a diffusion model that uses those instructions to edit the input image. According to the research paper, this approach lets MGIE resolve ambiguous user commands and still produce the desired output.
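The two-stage flow can be sketched conceptually as follows; expand_instruction, run_diffusion_edit and the model objects are hypothetical placeholders for illustration, not MGIE’s actual API.

```python
# Conceptual sketch of the two-stage flow described above.
# expand_instruction, run_diffusion_edit and the model objects are
# hypothetical placeholders, not the API of Apple's open-source MGIE code.

def expand_instruction(mllm, image, user_request: str) -> str:
    """Stage 1: the multimodal LLM turns a terse or ambiguous request
    into a concise, expressive editing instruction."""
    return mllm.generate(image=image, prompt=user_request)

def run_diffusion_edit(diffusion_model, image, instruction: str):
    """Stage 2: the diffusion model edits the input image, conditioned
    on the expanded instruction."""
    return diffusion_model.edit(image=image, instruction=instruction)

def edit_image(mllm, diffusion_model, image, user_request: str):
    """End-to-end pipeline: expand the request, then apply the edit."""
    instruction = expand_instruction(mllm, image, user_request)
    return run_diffusion_edit(diffusion_model, image, instruction)

# Example (hypothetical): a vague request such as "make it look healthier"
# would first be expanded into explicit instructions (e.g. add fresh
# vegetables, brighten colours) before the diffusion edit is applied.
```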


MGIE is available as an open-source project on GitHub, with code, data and pre-trained models available for download.

Originally appeared on: TheSpuzz
