

AI startup DeepSeek has introduced an open-source model called DeepSeek-OCR, designed to change how artificial intelligence understands and processes written text. Released on Monday, the new approach changes how large language models (LLMs) handle long documents and complex text inputs.
Instead of analyzing text line by line, DeepSeek-OCR converts plain text into 2D pixel maps, essentially turning words into images. This lets AI models “see” text rather than just “read” it, a process the company calls Context Optical Compression. The technique compresses large amounts of text into compact “vision tokens,” which take up far less space and are faster to process than traditional text tokens.
For instance, a 1,000-word article could be represented with just 100 vision tokens. This makes the system more efficient, reduces memory load, and allows AI to handle much larger and more detailed documents without losing context. The model works by capturing an image of a document, using a custom vision encoder to break it into patches, and then generating condensed visual representations. A decoder then interprets these representations to reconstruct the meaning.
AI expert Andrej Karpathy, co-founder of OpenAI, praised the project, calling it a major step forward in efficiency and reasoning. He also noted its potential to eventually remove the need for tokenizers altogether.
DeepSeek-OCR is currently available on GitHub, where it gained over 6,700 stars within a day of release. It is open-source under the MIT license, making it free for both academic and commercial use.
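To make the numbers above concrete, here is a minimal Python sketch of the arithmetic behind the compression claim and the patch step. It is an illustration only, not DeepSeek's code: the function names, the 1024 x 1024 page size, and the 16-pixel patch size are all assumptions chosen for the example, and the sketch treats the article's 1,000 words as 1,000 units of text.

```python
import math


def compression_ratio(text_units: int, vision_tokens: int) -> float:
    """Return how many units of text each vision token stands in for."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_units / vision_tokens


def patch_count(width: int, height: int, patch: int) -> int:
    """Return the number of square patches needed to tile a page image.

    Partial patches at the right and bottom edges are counted as full
    patches, as if the image were padded.
    """
    if patch <= 0:
        raise ValueError("patch must be positive")
    return math.ceil(width / patch) * math.ceil(height / patch)


# The article's example: 1,000 words represented by 100 vision tokens.
ratio = compression_ratio(1000, 100)

# Hypothetical page image and patch size (assumed, not from the article).
patches = patch_count(1024, 1024, 16)

print(f"compression ratio: {ratio:.0f}x")
print(f"patches before condensing: {patches}")
```

Under these assumed sizes the page is cut into far more patches than the 100 vision tokens in the article's example, which is why the encoder has to condense the patches before handing them to the decoder.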












