An adaptive n-gram transformer for multi-scale scene text recognition
Yan X, Fang Z, Jin Y (2023)
Knowledge-Based Systems: 110964.
Journal Article
Published | English
Download
No files have been uploaded. Publication record only.
Authors
Yan, Xueming; Fang, Zhihang; Jin, Yaochu
Abstract / Note
While vision transformers have been highly successful in improving performance on image-based tasks, little work has applied transformers to scene text recognition, owing to the complex visual appearance of multi-scale text. To fill this gap, this paper proposes an adaptive n-gram transformer for multi-scale scene text recognition (ANT-STR). ANT-STR uses an adaptive n-gram embedding that automatically determines the optimal size of each image patch, so as to fully exploit the potential semantic correlations between neighboring visual patches, which is essential for extracting features from multi-scale scene text. On top of this adaptive n-gram embedding, ANT-STR introduces a patch-based n-gram attention mechanism that further processes the feature maps of multi-scale text. In addition, the loss function is revised to account for both multi-scale character-based identification and contextual coherence scoring. Comparative studies are conducted on five widely used benchmark datasets and on a new multi-scale scene text dataset collected from tourism scenes in Indonesia. The experimental results show that ANT-STR considerably outperforms state-of-the-art methods, especially on complex multi-scale scene text.
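To make the idea of an n-gram patch embedding more concrete, the following is a minimal, hypothetical sketch; it is not the ANT-STR embedding, whose details are not given in this record. It assumes the image patches have already been embedded and flattened into a 1-D sequence (for example, along the reading direction), replaces each patch vector by the mean over a centred window of n neighbouring patches, and approximates "adaptive" size selection with a per-patch softmax over a few candidate window sizes. The function names `ngram_embed` and `adaptive_ngram_embed` and the scoring vector `score_vec` are illustrative assumptions standing in for learned parameters.

```python
import numpy as np


def ngram_embed(patches: np.ndarray, n: int) -> np.ndarray:
    """Hypothetical n-gram embedding: replace each patch vector (row) by the
    mean over a centred window of n patches. n must be odd; windows are
    clipped at the start and end of the sequence."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    num = patches.shape[0]
    half = n // 2
    out = np.empty_like(patches, dtype=float)
    for i in range(num):
        # Neighbouring patches i-half .. i+half that actually exist.
        window = patches[max(0, i - half): min(num, i + half + 1)]
        out[i] = window.mean(axis=0)
    return out


def adaptive_ngram_embed(patches: np.ndarray, sizes, score_vec: np.ndarray):
    """Hypothetical soft selection among several n-gram sizes per patch.

    score_vec stands in for a learned parameter: each candidate embedding is
    scored by a dot product, and a softmax over the sizes gives per-patch
    weights. Returns the weighted embedding and the weights (sizes x patches).
    """
    cands = np.stack([ngram_embed(patches, n) for n in sizes])  # (S, num, dim)
    scores = cands @ score_vec                                   # (S, num)
    scores = scores - scores.max(axis=0, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)                # softmax over sizes
    fused = (weights[..., None] * cands).sum(axis=0)             # (num, dim)
    return fused, weights


# Usage: six 2-D patch embeddings, candidate window sizes 1, 3 and 5.
patches = np.arange(12, dtype=float).reshape(6, 2)
fused, weights = adaptive_ngram_embed(patches, (1, 3, 5), np.zeros(2))
print(fused.shape, weights.shape)
```

With an all-zero `score_vec`, every window size receives weight 1/3, so the result is simply the average of the three candidate embeddings; a learned scorer would instead shift weight towards the window size that best fits each patch. A soft weighting is used here only because it keeps the selection differentiable; it is a swapped-in technique, not a claim about how ANT-STR chooses patch sizes.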
Publication year
2023
Journal title
Knowledge-Based Systems
Article no.
110964
ISSN
0950-7051
Page URI
https://pub.uni-bielefeld.de/record/2982971
Cite
Yan X, Fang Z, Jin Y. An adaptive n-gram transformer for multi-scale scene text recognition. Knowledge-Based Systems. 2023: 110964.
Yan, X., Fang, Z., & Jin, Y. (2023). An adaptive n-gram transformer for multi-scale scene text recognition. Knowledge-Based Systems, 110964. https://doi.org/10.1016/j.knosys.2023.110964
Yan, Xueming, Fang, Zhihang, and Jin, Yaochu. 2023. “An adaptive n-gram transformer for multi-scale scene text recognition”. Knowledge-Based Systems: 110964.
Yan, X., Fang, Z., and Jin, Y. (2023). An adaptive n-gram transformer for multi-scale scene text recognition. Knowledge-Based Systems: 110964.
Yan, X., Fang, Z., & Jin, Y., 2023. An adaptive n-gram transformer for multi-scale scene text recognition. Knowledge-Based Systems, 110964.
X. Yan, Z. Fang, and Y. Jin, “An adaptive n-gram transformer for multi-scale scene text recognition”, Knowledge-Based Systems, 2023, 110964.
Yan, X., Fang, Z., Jin, Y.: An adaptive n-gram transformer for multi-scale scene text recognition. Knowledge-Based Systems. 110964 (2023).
Yan, Xueming, Fang, Zhihang, and Jin, Yaochu. “An adaptive n-gram transformer for multi-scale scene text recognition”. Knowledge-Based Systems (2023): 110964.