Artificial intelligence keeps surprising us. Just when we thought large language models (LLMs) were all about reading and writing text, new research shows they can also learn directly from images — even from the raw pixels that make up a picture.
A recent study called DeepSeek-OCR takes this idea further. On the surface it is an OCR system, a super-smart version of the scanners that turn printed pages into digital files. But the more interesting claim runs the other way: long stretches of text can be rendered as an image and handed to the model as a compact set of vision tokens, an approach the authors call contexts optical compression. That raises an exciting question: could future AI models skip words entirely and just “think in pixels”?
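To get a feel for why that is appealing, here is a back-of-the-envelope sketch in Python. It is purely illustrative and not code from the paper: the characters-per-token heuristic, the 16-pixel patch size, and the 16x token-merging factor are all assumptions picked for the example.

    # Illustrative only: compare how many tokens a page of text costs when fed
    # to a model as text tokens vs. as vision-patch tokens after the page is
    # rendered to an image. The constants are assumptions, not DeepSeek-OCR figures.

    def text_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
        # Rough text-token count using the common ~4 characters-per-token heuristic.
        return int(len(text) / chars_per_token)

    def vision_token_estimate(width_px: int, height_px: int,
                              patch_px: int = 16, merge_factor: int = 16) -> int:
        # Patch tokens a ViT-style encoder would produce, then an assumed 16x merge.
        raw_patches = (width_px // patch_px) * (height_px // patch_px)
        return raw_patches // merge_factor

    page_text = "lorem ipsum " * 250                              # ~3,000 characters of stand-in text
    print("text tokens:   ~", text_token_estimate(page_text))     # ~750
    print("vision tokens: ~", vision_token_estimate(640, 640))    # ~100

The exact numbers don't matter; the point is that a rendered page can, in principle, reach the model as far fewer tokens than the raw text would need.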
This idea builds on a trend known as multimodal AI, where systems handle more than one kind of input, for example both pictures and text. OpenAI’s GPT-4o, released in May 2024, was already doing this, accepting text, images, and audio in a single model, which gives it far more context to reason with.
But there’s another reason researchers are looking for change: cost. Training and running huge AI models takes enormous computing power. A McKinsey report in June 2024 found that AI training costs have been growing by about 20 percent each year. To keep progress affordable, scientists are exploring compression techniques — ways to make models smaller and faster without losing smarts.
One interesting example is ChunkLLM, a lightweight, pluggable framework that speeds up long-text inference by breaking the context into small, meaningful chunks. Instead of wasting compute re-reading everything for every new token, it learns which chunks deserve attention and focuses there, a clever shortcut that saves time and memory (a rough sketch follows below).
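Here is a minimal sketch of that chunk-and-select idea, again just an illustration rather than ChunkLLM's actual mechanism: the chunk size, the top-k cutoff, and the scoring rule (a chunk's average key against the current query) are assumptions made for the example.

    # Illustrative chunk-then-select attention: split the context into fixed-size
    # chunks, score each chunk against the current query, and only attend over
    # the tokens in the top-k chunks instead of the whole sequence.
    import numpy as np

    def select_chunk_tokens(keys, query, chunk_size=64, top_k=4):
        # keys:  (seq_len, dim) key vectors for the whole context
        # query: (dim,) query vector for the token being generated
        seq_len = keys.shape[0]
        n_chunks = (seq_len + chunk_size - 1) // chunk_size
        scores = []
        for c in range(n_chunks):
            chunk = keys[c * chunk_size:(c + 1) * chunk_size]
            # Score a chunk by how well its average key matches the query.
            scores.append(float(chunk.mean(axis=0) @ query))
        best = sorted(np.argsort(scores)[-top_k:])          # indices of the best chunks
        token_ids = [i for c in best
                     for i in range(c * chunk_size, min((c + 1) * chunk_size, seq_len))]
        return token_ids

    rng = np.random.default_rng(0)
    K = rng.standard_normal((4096, 128))        # 4,096 context tokens
    q = rng.standard_normal(128)
    kept = select_chunk_tokens(K, q)
    print(f"attending over {len(kept)} of {K.shape[0]} tokens")  # 256 of 4096

Attention cost scales with how many tokens you look at, so trimming 4,096 tokens down to 256 before the expensive step is where the savings come from.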
It’s a pattern we’ve seen before. In the early days of the semiconductor industry, engineers used scan compression to test chips faster and cheaper while keeping performance high. Now, AI researchers are doing something similar: compressing how models learn and think.
From compressed circuits to compressed thoughts, the goal stays the same — do more with less. And maybe, just maybe, the next big leap in AI won’t come from more data, but from smarter ways of seeing and thinking.
REFERENCES
Haoran Wei, Yaofeng Sun, Yukun Li. DeepSeek-OCR: Contexts Optical Compression. arXiv:2510.18234 [cs.CV], 2025. https://doi.org/10.48550/arXiv.2510.18234
Haojie Ouyang, Jianwei Lv, Lei Ren, Chen Wei, Xiaojie Wang, Fangxiang Feng. ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference. arXiv:2510.02361 [cs.CL], 2025. https://doi.org/10.48550/arXiv.2510.02361