

Meta has introduced SAM Audio, a new open-source AI model in its Segment Anything Model (SAM) family. The model is designed to identify, separate, and isolate specific sounds from complex audio mixtures. SAM Audio can perform audio editing using text prompts, visual cues, or time stamps, automating tasks that traditionally required specialised tools and manual effort. It is released under Meta’s SAM Licence, allowing both research and commercial use.
According to Meta, SAM Audio is a unified audio AI model that understands three types of prompts. With text prompts, users can describe sounds such as “background noise” or “music.” Visual prompts allow users to click on an object or person in a video to isolate the sound coming from that source. Time-based prompts let users mark a specific section of the audio timeline to target and extract a particular sound, making precise audio separation easier and faster.
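The three prompt modes described above can be pictured as a simple dispatch over prompt types. The sketch below is purely illustrative Python: the class names, fields, and routing function are assumptions made for this article, not SAM Audio's actual API.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical prompt representations -- SAM Audio's real interface may differ.

@dataclass
class TextPrompt:
    description: str          # e.g. "background noise" or "music"

@dataclass
class VisualPrompt:
    frame_index: int          # video frame the user clicked on
    x: float                  # click coordinates, normalised to 0..1
    y: float

@dataclass
class TimePrompt:
    start_s: float            # start of the marked region, in seconds
    end_s: float              # end of the marked region, in seconds

Prompt = Union[TextPrompt, VisualPrompt, TimePrompt]

def describe(prompt: Prompt) -> str:
    """Route a prompt to the separation mode it would trigger (illustrative)."""
    if isinstance(prompt, TextPrompt):
        return f"text-guided separation: {prompt.description!r}"
    if isinstance(prompt, VisualPrompt):
        return f"visually guided separation at frame {prompt.frame_index}"
    return f"time-based separation over {prompt.start_s}-{prompt.end_s}s"
```

For example, `describe(TextPrompt("music"))` selects the text-guided mode, while `describe(TimePrompt(3.0, 7.5))` targets a marked stretch of the timeline.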
SAM Audio is available for download via Meta’s website, GitHub, and Hugging Face, and can also be tested online through the Segment Anything Playground. Technically, it is a generative separation model: given a prompt, it produces a target stem containing the requested sound and a residual stem containing everything else, using diffusion-based generation over encoded audio. Early testing suggests the model is fast and efficient, offering a powerful new approach to AI-driven audio editing.
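The target/residual split means the two stems together account for the whole mixture. A minimal, purely illustrative sketch of that relationship follows; the helper name and the plain-list audio representation are assumptions for clarity, as the real model operates on encoded audio rather than raw sample lists.

```python
def split_stems(mixture, target_estimate):
    """Given a mixture and an estimated target stem, the residual stem
    is simply whatever remains, so the two stems sum back to the mixture."""
    residual = [m - t for m, t in zip(mixture, target_estimate)]
    return target_estimate, residual

# Toy example: isolating a "target" sound from a three-sample mixture.
mixture = [0.5, -0.25, 0.125]
target, residual = split_stems(mixture, [0.25, -0.125, 0.0])
reconstructed = [t + r for t, r in zip(target, residual)]  # equals mixture
```

This additivity is what makes prompt-driven editing practical: removing a sound amounts to keeping the residual stem, and isolating it amounts to keeping the target stem.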












