SqueezedText: A real-time scene text recognition by binary convolutional encoder-decoder network

Zichuan Liu; Yixing Li; Fengbo Ren; Wang Ling Goh; Hao Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A new approach for real-time scene text recognition is proposed in this paper. It combines a novel binary convolutional encoder-decoder network (B-CEDNet) with a bidirectional recurrent neural network (Bi-RNN). The B-CEDNet is engaged as a visual front-end to provide elaborated character detection, and a back-end Bi-RNN performs character-level sequential correction and classification based on learned contextual knowledge. The front-end B-CEDNet can process multiple regions containing characters using a one-off forward operation, and is trained under binary constraints with significant compression. Hence it leads to both a remarkable inference run-time speedup and a reduction in memory usage. With the elaborated character detection, the back-end Bi-RNN merely processes a low-dimensional feature sequence with category and spatial information of extracted characters for sequence correction and classification. By training with over 1,000,000 synthetic scene text images, the B-CEDNet achieves a recall rate of 0.86, precision of 0.88 and F-score of 0.87 on ICDAR-03 and ICDAR-13. With the correction and classification by Bi-RNN, the proposed real-time scene text recognition achieves state-of-the-art accuracy while consuming less than 1 ms of inference run-time. The processing flow is realized on GPU with a small network size of 1.01 MB for B-CEDNet and 3.23 MB for Bi-RNN, which is much faster and smaller than the existing solutions.
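
The reported detection figures can be cross-checked with a few lines of Python. This is only a sketch: it assumes the abstract's "F-score" is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly, and the function name `f_score` is illustrative rather than taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract for B-CEDNet on ICDAR-03 and ICDAR-13.
f1 = f_score(precision=0.88, recall=0.86)
print(round(f1, 2))  # prints 0.87
```

Under that assumption, the precision of 0.88 and recall of 0.86 are consistent with the stated F-score of 0.87.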

Original language	English (US)
Title of host publication	32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Publisher	AAAI press
Pages	7194-7201
Number of pages	8
ISBN (Electronic)	9781577358008
State	Published - 2018
Event	32nd AAAI Conference on Artificial Intelligence, AAAI 2018 - New Orleans, United States. Duration: Feb 2 2018 → Feb 7 2018

Publication series

Name	32nd AAAI Conference on Artificial Intelligence, AAAI 2018

Other

Other	32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Country/Territory	United States
City	New Orleans
Period	2/2/18 → 2/7/18

ASJC Scopus subject areas

Artificial Intelligence

Cite this

SqueezedText: A real-time scene text recognition by binary convolutional encoder-decoder network. / Liu, Zichuan; Li, Yixing; Ren, Fengbo et al.
32nd AAAI Conference on Artificial Intelligence, AAAI 2018. AAAI press, 2018. p. 7194-7201 (32nd AAAI Conference on Artificial Intelligence, AAAI 2018).

Liu, Z, Li, Y, Ren, F, Goh, WL & Yu, H 2018, SqueezedText: A real-time scene text recognition by binary convolutional encoder-decoder network. in 32nd AAAI Conference on Artificial Intelligence, AAAI 2018. 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, AAAI press, pp. 7194-7201, 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, New Orleans, United States, 2/2/18.

@inproceedings{fd38bf76f898407abdf07a765cdec757,

title = "SqueezedText: A real-time scene text recognition by binary convolutional encoder-decoder network",

abstract = "A new approach for real-time scene text recognition is proposed in this paper. It combines a novel binary convolutional encoder-decoder network (B-CEDNet) with a bidirectional recurrent neural network (Bi-RNN). The B-CEDNet is engaged as a visual front-end to provide elaborated character detection, and a back-end Bi-RNN performs character-level sequential correction and classification based on learned contextual knowledge. The front-end B-CEDNet can process multiple regions containing characters using a one-off forward operation, and is trained under binary constraints with significant compression. Hence it leads to both a remarkable inference run-time speedup and a reduction in memory usage. With the elaborated character detection, the back-end Bi-RNN merely processes a low-dimensional feature sequence with category and spatial information of extracted characters for sequence correction and classification. By training with over 1,000,000 synthetic scene text images, the B-CEDNet achieves a recall rate of 0.86, precision of 0.88 and F-score of 0.87 on ICDAR-03 and ICDAR-13. With the correction and classification by Bi-RNN, the proposed real-time scene text recognition achieves state-of-the-art accuracy while consuming less than 1 ms of inference run-time. The processing flow is realized on GPU with a small network size of 1.01 MB for B-CEDNet and 3.23 MB for Bi-RNN, which is much faster and smaller than the existing solutions.",

author = "Zichuan Liu and Yixing Li and Fengbo Ren and Goh, {Wang Ling} and Hao Yu",

note = "Funding Information: The work by Arizona State University is supported by Cisco Research Center (CG#594589). Publisher Copyright: Copyright {\textcopyright} 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 ; Conference date: 02-02-2018 Through 07-02-2018",

year = "2018",

language = "English (US)",

series = "32nd AAAI Conference on Artificial Intelligence, AAAI 2018",

publisher = "AAAI press",

pages = "7194--7201",

booktitle = "32nd AAAI Conference on Artificial Intelligence, AAAI 2018",

}

TY - GEN

T1 - SqueezedText

T2 - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018

AU - Liu, Zichuan

AU - Li, Yixing

AU - Ren, Fengbo

AU - Goh, Wang Ling

AU - Yu, Hao

N1 - Funding Information: The work by Arizona State University is supported by Cisco Research Center (CG#594589). Publisher Copyright: Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

PY - 2018

Y1 - 2018

AB - A new approach for real-time scene text recognition is proposed in this paper. It combines a novel binary convolutional encoder-decoder network (B-CEDNet) with a bidirectional recurrent neural network (Bi-RNN). The B-CEDNet is engaged as a visual front-end to provide elaborated character detection, and a back-end Bi-RNN performs character-level sequential correction and classification based on learned contextual knowledge. The front-end B-CEDNet can process multiple regions containing characters using a one-off forward operation, and is trained under binary constraints with significant compression. Hence it leads to both a remarkable inference run-time speedup and a reduction in memory usage. With the elaborated character detection, the back-end Bi-RNN merely processes a low-dimensional feature sequence with category and spatial information of extracted characters for sequence correction and classification. By training with over 1,000,000 synthetic scene text images, the B-CEDNet achieves a recall rate of 0.86, precision of 0.88 and F-score of 0.87 on ICDAR-03 and ICDAR-13. With the correction and classification by Bi-RNN, the proposed real-time scene text recognition achieves state-of-the-art accuracy while consuming less than 1 ms of inference run-time. The processing flow is realized on GPU with a small network size of 1.01 MB for B-CEDNet and 3.23 MB for Bi-RNN, which is much faster and smaller than the existing solutions.

UR - http://www.scopus.com/inward/record.url?scp=85051479641&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051479641&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85051479641

T3 - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018

SP - 7194

EP - 7201

BT - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018

PB - AAAI press

Y2 - 2 February 2018 through 7 February 2018

ER -
