SqueezedText: A real-time scene text recognition by binary convolutional encoder-decoder network

Zichuan Liu, Yixing Li, Fengbo Ren, Wang Ling Goh, Hao Yu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

9 Citations (Scopus)

Abstract

This paper proposes a new approach to real-time scene text recognition that combines a novel binary convolutional encoder-decoder network (B-CEDNet) with a bidirectional recurrent neural network (Bi-RNN). The B-CEDNet serves as a visual front-end that performs detailed character detection, and the back-end Bi-RNN performs character-level sequential correction and classification based on learned contextual knowledge. The front-end B-CEDNet processes multiple regions containing characters in a single forward pass and is trained under binary constraints, which compresses it significantly. This yields both a substantial inference run-time speedup and a reduction in memory usage. Because character detection is handled by the front-end, the back-end Bi-RNN only needs to process a low-dimensional feature sequence carrying the category and spatial information of the extracted characters. Trained on over 1,000,000 synthetic scene text images, the B-CEDNet achieves a recall of 0.86, a precision of 0.88, and an F-score of 0.87 on ICDAR-03 and ICDAR-13. With correction and classification by the Bi-RNN, the proposed system achieves state-of-the-art accuracy while requiring less than 1 ms of inference run-time. The processing flow is implemented on a GPU with a small network size of 1.01 MB for the B-CEDNet and 3.23 MB for the Bi-RNN, making it much faster and smaller than existing solutions.
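The reported detection figures can be cross-checked with a short calculation. The sketch below assumes the F-score in the abstract is the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not state which variant is used, and the function name `f_score` is only illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1; the source does not say so.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the B-CEDNet.
value = f_score(precision=0.88, recall=0.86)
print(round(value, 2))  # prints 0.87
```

Under that assumption, the three reported numbers are mutually consistent: 2 × 0.88 × 0.86 / (0.88 + 0.86) ≈ 0.87.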

Original language: English (US)
Title of host publication: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Publisher: AAAI press
Pages: 7194-7201
Number of pages: 8
ISBN (Electronic): 9781577358008
State: Published - Jan 1 2018
Event: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 - New Orleans, United States
Duration: Feb 2 2018 to Feb 7 2018

Fingerprint

Recurrent neural networks
Data storage equipment
Processing

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Liu, Z., Li, Y., Ren, F., Goh, W. L., & Yu, H. (2018). SqueezedText: A real-time scene text recognition by binary convolutional encoder-decoder network. In 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 (pp. 7194-7201). AAAI press.

@inproceedings{fd38bf76f898407abdf07a765cdec757,
title = "SqueezedText: A real-time scene text recognition by binary convolutional encoder-decoder network",
author = "Zichuan Liu and Yixing Li and Fengbo Ren and Goh, {Wang Ling} and Hao Yu",
year = "2018",
month = "1",
day = "1",
language = "English (US)",
pages = "7194--7201",
booktitle = "32nd AAAI Conference on Artificial Intelligence, AAAI 2018",
publisher = "AAAI press",

}

TY - GEN

T1 - SqueezedText

T2 - A real-time scene text recognition by binary convolutional encoder-decoder network

AU - Liu, Zichuan

AU - Li, Yixing

AU - Ren, Fengbo

AU - Goh, Wang Ling

AU - Yu, Hao

PY - 2018/1/1

Y1 - 2018/1/1

UR - http://www.scopus.com/inward/record.url?scp=85051479641&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051479641&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85051479641

SP - 7194

EP - 7201

BT - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018

PB - AAAI press

ER -