
olmOCR-7B: An Efficient Open-Source Model Built for Document Extraction

AI Hot Topics Daily
Date: 2026-07-04
Analysis


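Document extraction remains a hard problem in AI pipelines: the content of PDFs and scanned images looks simple, yet recovering it as clean plain text often goes wrong. The release of olmOCR-7B marks a significant step forward here. It is built on Qwen2-VL-7B-Instruct and fine-tuned on a dataset of 250,000 pages, with one core goal: efficiently converting PDFs and document images into clean, structured plain text. The sections below look at what the model does well, then compare it with popular existing tools to show how large its practical advantage is.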


olmOCR-7B: A Leading Model for PDF-to-Text Conversion and Document Extraction

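Language models are trained, run, and served on plain text, so text quality directly determines the end result. Noisy text destabilizes training, degrades model performance, and can even produce garbled responses to user requests. The catch is that a great deal of valuable data does not live in clean web formats; it is locked inside electronic documents such as PDFs. PDF was designed to render content on fixed-size pages, not to preserve logical text structure, which makes it very hard to parse: character encodings, position information, and formatting marks are interleaved, and correctly recovering headings, paragraphs, tables, and equations in reading order is difficult.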

1. Overview of olmOCR's Core Capabilities

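To tackle this problem, the research team released olmOCR, a high-performance toolkit dedicated to converting PDFs and document images into clean, structured plain text. What sets it apart?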

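  • Strong performance: fine-tuned on a 250,000-page dataset drawn from a wide variety of PDFs, including born-digital documents and scans of public-domain books. This broad coverage underpins its extraction accuracy.
  • Cost efficiency: processing one million PDF pages with the olmOCR toolkit costs only about $190, roughly 1/32 of the cost of batch-processing the same pages with the GPT-4o API. That gap is large enough to make many teams rethink API-based approaches.
  • Markdown output: the output is Markdown, which is easy to parse and process. The model handles equations, tables, and handwriting, and follows the correct reading order even on complex multi-column layouts.
  • Works out of the box: optimized for the SGLang and vLLM inference engines, it scales efficiently from a single GPU to hundreds and ships with heuristics for common parsing failures and metadata errors.
  • Fully open: built on Qwen2-VL-7B-Instruct, with the model weights, fine-tuning dataset, and training and inference code all open source.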
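
To make the cost claim concrete, the short Python sketch below scales the two figures quoted above (about $190 per million pages, and roughly 1/32 of the GPT-4o batch cost) to an arbitrary page count. The GPT-4o figure it prints is only what the stated ratio implies, not a quoted price, and the function and constant names are illustrative.

```python
# Figures quoted in the article (both approximate).
OLMOCR_USD_PER_MILLION_PAGES = 190.0
GPT4O_TO_OLMOCR_COST_RATIO = 32  # olmOCR is "about 1/32" of GPT-4o batch cost


def estimate_costs(pages: int) -> dict:
    """Scale the article's per-million-page figures to `pages` pages."""
    olmocr_usd = OLMOCR_USD_PER_MILLION_PAGES * pages / 1_000_000
    # Implied by the stated ratio only; not an official GPT-4o price.
    gpt4o_implied_usd = olmocr_usd * GPT4O_TO_OLMOCR_COST_RATIO
    return {
        "olmocr_usd": olmocr_usd,
        "gpt4o_implied_usd": gpt4o_implied_usd,
        "savings_usd": gpt4o_implied_usd - olmocr_usd,
    }


if __name__ == "__main__":
    for pages in (10_000, 1_000_000):
        print(pages, estimate_costs(pages))
```

Actual costs depend on GPU pricing and throughput, so treat the output as a rough planning number.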

Next, we compare olmOCR directly with several other popular document extraction tools to see how large the difference is in practice.

2. olmOCR vs. Other Document Extraction Tools: Side-by-Side Comparison

The comparison uses sample documents and focuses on the key differences in output quality. Each tool's output is listed under its name.

2.1 Handwritten Letter Recognition

olmOCR

Executive Mansion,

Washington City,

January 15th, 1864

Major General Hitchcock, Commissioner of Exchanges, is authorized and directed to offer Brigadier General Trimble, now a prisoner of war in Fort McHenry, in exchange for Major White, who is held as a prisoner at Richmond. He is also directed to send forward the offer of exchange by Henry M. Warfield, Esq. of Baltimore, under a flag of truce, and give him a pass to City Point.

Abraham Lincoln

Marker

necuhve Mansion Vastington amany layor Seneral Hitchcocks Commissioner of Cachanges, is anthonged and directed to offer Bingadier General Trin prisoner of war in Fort Inctienny, in exchange now w Major White, who is held as a preises at Richmond Ite is also directed to vand forwards the offer of exchange by Stenny in. Warfield, Eag. of Baltimore, under aflag 11 mice, and give him apass to tity Point. Abrakan Sincolus 

GOT OCR 2.0

43571
Bachington City
January 10th 1864.
Major General Architect, Commissioner of aivachangera
is authorized and directed by ffeed Bngader General Trelmble,
new a firemen of war in Fert nchery in exchange for
Mayor White, who held a a firemen at Hillmannd.
He is aker conducted by end forward the offer of exchange
by Henry in. Warfield, Lag. of Balthmore, under a flag
of three, and five him afaies to City Bink.
Abraham Lincoln

MinerU

No text detected.

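2.2 Math Textbook Text Extraction

A calculus textbook page containing equations, used to test how well each tool reproduces mathematical formulas.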

olmOCR

3.4 EXERCISES

For the following exercises, the given functions represent the position of a particle tra veling along a horizontal line.

a. Find the velocity and acceleration functions.

b. Determine the time intervals when the object is slowing down or speeding up.

150. ( s(t) = 2t^3 - 3t^2 - 12t + 8 )

151. ( s(t) = 2t^3 - 15t^2 + 36t - 10 )

152. ( s(t) = \frac{t}{1 + t^2} )

153. A rocket is fired vertically upward from the ground. The distance s in feet that the rocket tra vels from the ground after t seconds is given by ( s(t) = -16t^2 + 560t ).

a. Find the velocity of the rocket 3 seconds after being fired.

b. Find the acceleration of the rocket 3 seconds after being fired.

154. A ball is thrown downward with a speed of 8 ft/s from the top of a 64-foot-tall building. After t seconds, its height above the ground is given by ( s(t) = -16t^2 - 8t + 64 ).

a. Determine how long it takes for the ball to hit the ground.

b. Determine the velocity of the ball when it hits the ground.

155. The position function ( s(t) = t^2 - 3t - 4 ) represents the position of the back of a car backing out of a driveway and then driving in a straight line, where s is in feet and t is in seconds. In this case, ( s(t) = 0 ) represents the time at which the back of the car is at the garage door, so ( s(0) = -4 ) is the starting position of the car, 4 feet inside the garage.

a. Determine the velocity of the car when ( s(t) = 0 ).

b. Determine the velocity of the car when ( s(t) = 14 ).

156. The position of a hummingbird flying along a straight line in t seconds is given by ( s(t) = 3t^3 - 7t ) meters.

a. Determine the velocity of the bird at ( t = 1 ) sec.

b. Determine the acceleration of the bird at ( t = 1 ) sec.

c. Determine the acceleration of the bird when the velocity equals 0.

157. A potato is launched vertically upward with an initial velocity of 100 ft/s from a potato gun at the top of an 85-foot-tall building. The distance in feet that the potato tra vels from the ground after t seconds is given by ( s(t) = -16t^2 + 100t + 85 ).

a. Find the velocity of the potato after 0.5 s and 5.75 s.

b. Find the speed of the potato at 0.5 s and 5.75 s.

c. Determine when the potato reaches its maximum height.

d. Find the acceleration of the potato at 0.5 s and 1.5 s.

e. Determine how long the potato is in the air.

f. Determine the velocity of the potato upon hitting the ground.

158. The position function ( s(t) = t^3 - 8t ) gives the position in miles of a freight train where east is the positive direction and t is measured in hours.

a. Determine the direction the train is tra veling when ( s(t) = 0 ).

b. Determine the direction the train is tra veling when ( a(t) = 0 ).

c. Determine the time intervals when the train is slowing down or speeding up.

159. The following graph shows the position ( y = s(t) ) of an object moving along a straight line.

a. Use the graph of the position function to determine the time intervals when the velocity is positive, negative, or zero.

b. Sketch the graph of the velocity function.

c. Use the graph of the velocity function to determine the time intervals when the acceleration is positive, negative, or zero.

d. Determine the time intervals when the object is speeding up or slowing down.

Marker

- a. Determine the direction the train is tra veling when *s*(*t*) = 0.
- b. Determine the direction the train is tra veling when *a*(*t*) = 0.
- c. Determine the time intervals when the train is slowing down or speeding up.

159. The following graph shows the position *y* = *s*(*t*) of an object moving along a straight line.

![](_page_0_Figure_34.jpeg)

- negative, or zero. b. Sketch the graph of the velocity function.
- c. Use the graph of the velocity function to determine the time intervals when the acceleration is positive, negative, or zero.
- d. Determine the time intervals when the object is speeding up or slowing down.

GOT OCR 2.0

Chapter 3 | Derivatives
273
3.4 EXERCISES
For the following exercises, the given functions represent
the position of a particle tra veling along a horizontal line.
a.
Find the velocity and acceleration functions.
b.
Determine the time intervals when the object is
slowing down or speeding up.
150.
s(t) = 2t3 −3t2 −12t + 8
151.
s(t) = 2t3 −15t2 + 36t −10
152.
s(t) =
t
1 + t2
153.
A rocket is fired vertically upward from the ground.
The distance s in feet that the rocket tra vels from the
ground after t seconds is given by s(t) = −16t2 + 560t.
a.
Find the velocity of the rocket 3 seconds after being
fired.
b.
Find the acceleration of the rocket 3 seconds after
being fired.
154.
A ball is thrown downward with a speed of 8 ft/
s from the top of a 64-foot-tall building. After t seconds,
its height above the ground is given by s(t) = −16t2 −8t + 64.
a.
Determine how long it takes for the ball to hit the
ground.
b.
Determine the velocity of the ball when it hits the
ground.
155.
The position function s(t) = t2 −3t −4 represents
the position of the back of a car backing out of a driveway
and then driving in a straight line, where s is in feet and
t is in seconds. In this case, s(t) = 0 represents the time
at which the back of the car is at the garage door, so
s(0) = −4 is the starting position of the car, 4 feet inside
the garage.
a.
Determine the velocity of the car when s(t) = 0.
b.
Determine the velocity of the car when s(t) = 14.
156.
The position of a hummingbird flying along a straight
line in t seconds is given by s(t) = 3t3 −7t
2
2
2
...
2
2
2
a.
Use the graph of the position function to determine
the time intervals when the velocity is positive,
negative, or zero.
b.
Sketch the graph of the velocity function.
c.
Use the graph of the velocity function to determine
the time intervals when the acceleration is positive,
negative, or zero.
d.
Determine the time intervals when the object is
speeding up or slowing down.
157.
A potato is launched vertically upward with an initial
velocity of 100 ft/s from a potato gun at the top of an
85-foot-tall building. The distance in feet that the potato
tra vels from the ground after t seconds is given by
s(t) = −16t2 + 100t + 85.
a.
Find the velocity of the potato after 0.5 s and
5.75 s.
b.
Find the speed of the potato at 0.5 s and 5.75 s.
c.
Determine when the potato reaches its maximum
height.
d.
Find the acceleration of the potato at 0.5 s and 1.5
s.
e.
Determine how long the potato is in the air.
f.
Determine the velocity of the potato upon hitting
the ground.
158.
The position function s(t) = t3 −8t gives the
position in miles of a freight train where east is the positive
direction and t is measured in hours.
a.
Determine the direction the train is tra veling when
s(t) = 0.
b.
Determine the direction the train is tra veling when
a(t) = 0.
c.
Determine the time intervals when the train is
slowing down or speeding up.
159.
The following graph shows the position y = s(t) of
an object moving along a straight line.
155.
The position of a hummingbird flying along a straight
line in t seconds is given by s(t) = 3t3 −7t
2
2 3
3
3
.....
125.5
126
126.5

MinerU

a. Determine the direction the train is tra veling when $s(t)=0$ .  
b. Determine the direction the train is tra veling when $a(t)=0$ .
c. Determine the time intervals when the train is slowing down or speeding up.

159. The following graph shows the position $y=s(t)$ of an object moving along a straight line.

a. Use the graph of the position function to determine the time intervals when the velocity is positive, negative, or zero.
b. Sketch the graph of the velocity function.
c. Use the graph of the velocity function to determine the time intervals when the acceleration is positive, negative, or zero.
d. Determine the time intervals when the object is speeding up or slowing down.

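2.3 Historical Document Restoration

An old historical document with faded text and poor lighting, used to test each model's robustness to noise.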

olmOCR

Christians beha ving themselves like Mahomedans.

4. The natives soon had reason to suspect the viceroy's sincerity in his expressions of regret at the proceedings of which they complained. For about this time the Dominican friars, under pretence of building a convent, erected a fortress on the island of Solor, which, as soon as finished, the viceroy garrisoned with a strong force. The natives very naturally felt indignant at this additional encroachment, and took every opportunity to attack the garrison. The monks, forgetful of their peaceable profession, took an active part in these skirmishes, and many of them fell sword in hand.

The Mahomedan faith has been appropriately entitled, The religion of the sword; and with equal propriety may we so designate the religion of these belligerent friars. The Portuguese writers give an account of one of their missionaries, Fernando Vinagre, who was as prompt in the field of battle as at the baptismal font. This man, though a secular priest, undertook the command of a squadron that was sent to the assistance of the rajah of Tidore, on which occasion he is said to ha ve acted in the twofold capacity of a great commander, and a great apostle, at one time appearing in armour, at another in a surplice; and even occasionally, baptizing the converts of his sword without putting off his armour, but covering it with his ecclesiastical vest. In this crusade he had two

Marker

## **IN INDIA *** BOOK TI. S69
Christians beha ving themselves like Ma borne- a. dans.3 . extquotedblleft5/0-
*t>.*

The natives soon had reason to suspect the viceroy, viceroy's sincerity in his expressions of regret at the proceedings of which they complained. extquotedblleft n. extquotedblleft' For about this time the Dominican friars, under pretence of building a. convent, erected a fortress on the island of Sol or, which, as soon as finished, the viceroy garrisoned with a strong force. The natives' very naturally felt indig-S nant at this additional encroachment, and took every opportunity to attack the garrison. The monks, forgetful/ of their peaceable profession, took an active part in these skirmishes, and many of tbg.tr fell sword in hand.

The i'lfinomedan faith has been appropriately entitled., extquoteleft The religion of the sword extquoteright,; and with equal propriety may we so designate the re- . i'gv.m of these belligerent friars. The Portugu writers give an account of one of their extquoteleft missionaries, extquoteright Fernando Vinagre, who was as prompt in the field of battle as at the baptismal font. This man, though a secular priest, undertook the command of a squadron that was I sent to the assistance of the rajah of Tidore,4 on which occasion he is said to ha ve acted in the twofold capacity of a great commander, and a great apostle, at one time appearing in armour, ; at another in a surplice; and even occasionally, baptizing the converts of his sword without putting off his armour, but covering it with his ecclesiastical vest. In this crusade5 he had two
> 3 Geddes History, &c., pp. 24---27. Pudet hae c opprobria nobis Vel dici potuisse.
> 4 Called extquoteleft T a d u ra extquoteright or extquoteleft D a c o, extquoteright an island in the Indian Ocean, one of the Moluccas
> 5 extquoteleft These extquoteleft a la D ra g o o n extquoteright conversions. extquoteright Geddes' History, p. 27.

GOT OCR 2.0

 IN INDIA:  BOOK U 269 Christians beha ving themselves like Mahome-  1670.  4. The natives son had reason to suspect the Viceroy' s vice roy' s sincerity in his expressions of regret in s in e eri ty at the proceedings of which they complained.  fl it ars.  For about this time the Dominican f mars, under pre ten ce of building a convent, erected a for-  tress on the island of Sol or, which, as soon as finished, the vice roy garrisoned with a strong force. The natives very naturally felt indig-  nant at this additional encroachment, and took every opportunity to attack the garrison. The monks, forgetful of their peaceable profession,  took an active part in these skirmishes, and many of the n fell sword in hand.  The Mh on med an faith has been appropriately entitled. The religion of the sword; and with e ral Tropriety may we so designate the re-  gian of these belligerent friars. The Port u-  gue s writers give an account of one of their mission are s, Fer endo Vina gre, who was as prompt in the fe ld of battle as at the baptismal font. This man, though a secular priest, un-  der took the command of a squadron that was sent to the assistance of the rajah of Tidore, on which occasion he is said to ha ve acted in the twofold capacity of a great commander, and a great apostle, at one time appearing in armour,  at another in a surplice; and even occasionally,  baptizing the converts of his sword without put-  ting off his armour, but covering it with his ecclesiastical vest. In this crusade he had two 3 Ged des History, & c. , pp. 24-27.  P ude th aec opp rob ria nobis Vel die ipo tui sse.  Called Tadur u or Daco, an island in the Indian Ocean,  one of the Mol ucc as These a laDra goon conversions. Ged des History, p. 27.

MinerU

 ININDIASY BOOKU
Christians bcha ving.themselves like Mahome dans.3

4.The natives soon had reason to suspect ihe viceroy's sincerity in his expressions of regret at the proceedings of which they complained. For about this time the Dominican friars,under pretenceof building a convent,erected a for tress on the island of Solorwhich,as soon as finishedthe viceroy garrisoned with a strong force. The natives very naturally felt indig nant at this additional encroachment, and took every pportunity to attack the garrison.The monks,forgetful of their peaceable profession took an activa part in these skirmishes, and many of tbein feil sword in hand.

TheMahornedan faithhas been appropriately ntitled.The religion of the swordand with equal propriety may we so designate the region of these belligerent friars.The Portugueswriters give an account of one of their missionarzes,femando Vinagre,who was as prompt in the field of battle as at the baptismal font. This man, though a secular priest, undertook the command of a squadron that was sent to the assistance of the rajah of Tidore,4 on which occasion he is said to ha ve acted in the twofold capacity of a great commander, and a great apostle, at one time appearing in armour, at another in a surplice;and even occasionally baptizing the converts of his sword without put ting off his armour, but covering it with his ecclesiastical vest.In this crusadehe had two

3. How olmOCR Was Built

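Training olmOCR first required solving the problem of obtaining high-quality training data. The research team developed a technique called "document anchoring", which, put simply, makes full use of the text and metadata already embedded in a PDF file to improve extraction quality.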

Figure 1: Example of how document anchoring works on a typical page

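The method extracts the relevant image positions and text blocks, then concatenates them and inserts them into the model prompt. When a vision language model (VLM) is prompted for a plain-text version of the document, it draws on both the anchored text and the rasterized image of the page.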
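
The idea can be sketched in a few lines of Python. This is a minimal illustration only, not the olmOCR implementation. The `TextBlock` and `ImageBox` types, the bracketed coordinate format, the character budget, and the prompt wording are all assumptions made for the example. In a real pipeline the blocks would come from a PDF parser and the prompt would be sent to the VLM together with the rendered page image.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    x: float  # left edge of the block, in PDF points (hypothetical convention)
    y: float  # baseline of the block, in PDF points
    text: str


@dataclass
class ImageBox:
    x0: float
    y0: float
    x1: float
    y1: float


def build_anchor_text(width, height, blocks, images, max_chars=4000):
    """Concatenate image positions and text blocks into one anchor string."""
    lines = [f"Page dimensions: {width:.1f}x{height:.1f}"]
    for img in images:
        lines.append(f"[Image {img.x0:.0f}x{img.y0:.0f} to {img.x1:.0f}x{img.y1:.0f}]")
    for block in blocks:
        lines.append(f"[{block.x:.0f}x{block.y:.0f}]{block.text}")

    # Keep whole lines until the (assumed) character budget is used up.
    kept, used = [], 0
    for line in lines:
        if used + len(line) + 1 > max_chars:
            break
        kept.append(line)
        used += len(line) + 1
    return "\n".join(kept)


def build_prompt(anchor_text):
    """Wrap the anchor text in an illustrative instruction for the VLM."""
    return (
        "Below is the image of one page of a document, together with raw text "
        "extracted from the PDF. Return the plain-text version of the page.\n"
        "RAW_TEXT_START\n" + anchor_text + "\nRAW_TEXT_END"
    )


if __name__ == "__main__":
    blocks = [TextBlock(72, 700, "Introduction"), TextBlock(72, 680, "PDFs are hard to parse.")]
    images = [ImageBox(100, 300, 500, 600)]
    print(build_prompt(build_anchor_text(612, 792, blocks, images)))
```

Passing positions along with the text gives the model layout hints, such as which block sits above which, that the raw character stream alone does not carry.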

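Using document anchoring, the team labeled 250,000 pages with GPT-4o. The data came from publicly available PDFs crawled from the web and from public-domain books scanned by the Internet Archive. The mix is diverse: 60% academic papers, 12% brochures, 11% legal documents, 6% diagrams, 5% slideshows, and 4% other.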
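
For a feel of the scale behind those percentages, the Python sketch below converts the listed shares into approximate page counts. The shares as listed add up to 98%, so the counts are rough and leave about 5,000 of the 250,000 pages unassigned. The variable and function names are illustrative.

```python
TOTAL_PAGES = 250_000

# Shares exactly as listed in the article, in percent.
SHARES_PERCENT = {
    "academic papers": 60,
    "brochures": 12,
    "legal documents": 11,
    "diagrams": 6,
    "slideshows": 5,
    "other": 4,
}


def approx_page_counts(total_pages, shares_percent):
    """Convert percentage shares into approximate page counts."""
    return {name: total_pages * pct // 100 for name, pct in shares_percent.items()}


if __name__ == "__main__":
    counts = approx_page_counts(TOTAL_PAGES, SHARES_PERCENT)
    for name, pages in counts.items():
        print(f"{name:16s} ~{pages:,} pages")
    print("listed shares sum to", sum(SHARES_PERCENT.values()), "percent")
```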

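To train the model itself, the team fine-tuned a Qwen2-VL-7B-Instruct checkpoint and carefully optimized a large-scale batch inference pipeline. With SGLang, olmOCR converts one million PDF pages for only $190, about 1/32 of the cost of the GPT-4o API. Beyond the large cost reduction, olmOCR also outperformed other popular OCR tools in human evaluation.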

Figure 2: ELO ratings of olmOCR compared with other popular tools

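For evaluation, the team compared olmOCR's output with that of Marker, MinerU, and GOT-OCR 2.0. They collected pairwise judgments from 11 researchers, sampling from 2,017 PDFs and obtaining 452 meaningful comparisons, and computed ELO ratings from them. olmOCR scored above 1800, well ahead of every competitor. In head-to-head comparisons, olmOCR was preferred over Marker 61.3% of the time, over GOT-OCR 58.6% of the time, and over MinerU 71.4% of the time, which reflects its ability to produce clean, well-structured text.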
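
For readers unfamiliar with how pairwise judgments become a rating, here is a minimal Python sketch of the standard Elo formulas. The article does not give the K-factor, the starting rating, or the treatment of ties, so the values below (K = 32, two-player updates) are assumptions. The "implied gap" is the idealized rating difference that matches a given head-to-head win rate, and it need not match the numbers in Figure 2.

```python
import math


def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, score_a, k=32.0):
    """Apply one pairwise judgment; score_a is 1.0 if A won, 0.0 if B won."""
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    return rating_a + delta, rating_b - delta


def implied_gap(win_rate):
    """Rating difference at which the Elo model predicts `win_rate`."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


if __name__ == "__main__":
    # Head-to-head preference rates for olmOCR quoted in the article.
    for tool, rate in {"Marker": 0.613, "GOT-OCR": 0.586, "MinerU": 0.714}.items():
        print(f"olmOCR vs {tool}: ~{implied_gap(rate):.0f} rating points")
```

A full leaderboard would run `update` over all 452 comparisons, typically in shuffled order and averaged over several passes, since sequential Elo is order dependent.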

More evaluation details are available in the technical report.

4. How to Get olmOCR

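The first olmOCR release includes a demo, model weights, the fine-tuning dataset, a short technical report, and, most importantly, an efficient inference pipeline.

Visit the GitHub repository to install olmOCR and browse the documentation. On a machine with a GPU, just run the following command: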

python -m olmocr.pipeline ./localworkspace --pdfs tests/gnarly_pdfs/horribleocr.pdf
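
To run the same command over a folder of PDFs, a small Python wrapper such as the sketch below can help. It uses only the invocation shown above (a workspace path followed by `--pdfs` and a file) and starts one process per PDF, so it does not assume the CLI accepts several paths at once. The helper names and the `dry_run` switch are hypothetical, and launching the pipeline once per file is likely slower than a native batch mode if one exists.

```python
import subprocess
import sys
from pathlib import Path


def build_command(workspace, pdf):
    """Return the argv for one pipeline run, mirroring the command above."""
    return [sys.executable, "-m", "olmocr.pipeline", workspace, "--pdfs", pdf]


def convert_all(workspace, pdf_dir, dry_run=True):
    """Build (and optionally run) one pipeline command per PDF in pdf_dir."""
    commands = [
        build_command(workspace, str(path))
        for path in sorted(Path(pdf_dir).glob("*.pdf"))
    ]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # show what would be executed
        else:
            subprocess.run(cmd, check=True)  # raises if the pipeline fails
    return commands


if __name__ == "__main__":
    convert_all("./localworkspace", "tests/gnarly_pdfs", dry_run=True)
```

With `dry_run=True` nothing is executed, which makes it easy to check the generated commands before committing GPU time.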

The team plans to release more quantitative benchmarks soon to help develop better PDF extraction models and evaluate their performance.

Original article: https://olmocr.allenai.org/blog

Source: https://www.53ai.com/news/finetuning/2025032721587.html