Summary

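Abstract: as the name suggests, an abstract summarizes the whole paper. The usual form is to first pose the problem (Introduction), then briefly describe related work (Related work), then state the method proposed to solve the problem (Method), and finally report the experimental conclusions (Experiments). Some of these parts can be mentioned only in passing or left out of the abstract entirely, but Method and Experiments are the soul of the paper and must be covered.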
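As a rough illustration of this structure, the sketch below assembles an abstract from its parts and enforces only the rule stated above: Method and Experiments are required, while the problem statement and related work are optional. The class name `AbstractDraft` and its field names are hypothetical, chosen for this example only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AbstractDraft:
    """Hypothetical container for the parts of an abstract."""

    method: str                         # required: what is proposed
    experiments: str                    # required: what the results show
    problem: Optional[str] = None       # optional: the problem being posed
    related_work: Optional[str] = None  # optional: brief related work

    def render(self) -> str:
        # Method and Experiments must always be present.
        if not self.method.strip() or not self.experiments.strip():
            raise ValueError("Method and Experiments must both be present")
        # Keep the conventional order and skip any part that was omitted.
        ordered = [self.problem, self.related_work, self.method, self.experiments]
        return " ".join(part.strip() for part in ordered if part and part.strip())


draft = AbstractDraft(
    problem="Y is hard.",
    method="We propose X.",
    experiments="X improves Y.",
)
print(draft.render())
```

Leaving out `related_work` here mirrors the point that some parts may be skipped, whereas an empty `method` or `experiments` raises an error.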

Case analysis

Below are some well-written abstracts, analyzed one by one, with the goal of abstracting a fixed template from them.

Posing the problem

Previous scene text detection methods have progressed substantially over the past years. However, limited by the receptive field of CNNs and the simple representations like rectangle bounding box or quadrangle adopted to describe text, previous methods may fall short when dealing with more challenging text instances, such as extremely long text and arbitrarily shaped text.

Proposing the solution

To address these two problems, we present a novel text detector namely LOMO, which localizes the text progressively for multiple times (or in other words, LOok More than Once).

Method details

LOMO consists of a direct regressor (DR), an iterative refinement module (IRM) and a shape expression module (SEM). At first, text proposals in the form of quadrangle are generated by DR branch. Next, IRM progressively perceives the entire long text by iterative refinement based on the extracted feature blocks of preliminary proposals. Finally, a SEM is introduced to reconstruct more precise representation of irregular text by considering the geometry properties of text instance, including text region, text center line and border offsets.

Experimental results

The state-of-the-art results on several public benchmarks including ICDAR2017-RCTW, SCUT-CTW1500, Total-Text, ICDAR2015 and ICDAR17-MLT confirm the striking robustness and effectiveness of LOMO.

Posing the problem

Previous approaches for scene text detection have already achieved promising performance across various benchmarks. However, they usually fall short when dealing with challenging scenarios, even when equipped with deep neural network models, because the overall performance is determined by the interplay of multiple stages and components in the pipelines.

Proposing the solution

In this work, we propose a simple yet powerful pipeline that yields fast and accurate text detection in natural scenes.

Method details

The pipeline directly predicts words or text lines of arbitrary orientations and quadrilateral shapes in full images, eliminating unnecessary intermediate steps (e.g., candidate aggregation and word partitioning), with a single neural network. The simplicity of our pipeline allows concentrating efforts on designing loss functions and neural network architecture.

Experimental results

Experiments on standard datasets including ICDAR 2015, COCO-Text and MSRA-TD500 demonstrate that the proposed algorithm significantly outperforms state-of-the-art methods in terms of both accuracy and efficiency. On the ICDAR 2015 dataset, the proposed algorithm achieves an F-score of 0.7829 at 13.2fps at 720p resolution.

Comments

scenarios vs. scenes: the former leans toward individual cases or situations, while the latter leans toward the scene or setting itself.