A Hierarchical Cluster Tree Approach Leveraging Delaunay Triangulation
DOI:
https://doi.org/10.18662/brain/14.3/482
Keywords:
hierarchical clustering, document image layout analysis, Delaunay triangulation, cluster tree formation, layout element segmentation, advanced document image processing
Abstract
This research introduces a robust technique for structuring document page images hierarchically using Delaunay triangulation. Central to the approach is the formation of a cluster tree, which captures the page's content by exploiting the arrangement of layout elements and the distances between them. Applying the technique, we segment the page into distinct clusters comprising images, titles, and paragraphs. The resulting hierarchy, built on the cluster tree, gives a stable representation of the document layout that supports faster document comprehension and examination.
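The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each layout element is reduced to a centroid, builds the Delaunay edges with a brute-force empty-circumcircle test (adequate only for small inputs), and merges elements along those edges from shortest to longest, which amounts to single-linkage clustering. The names `delaunay_edges`, `cluster_tree`, `cut`, the sample `centroids`, and the distance threshold are all hypothetical.

```python
from itertools import combinations
from math import dist


def orientation(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of triangle abc."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # Multiplying by the orientation makes the test independent of vertex order.
    return det * orientation(a, b, c) > 0


def delaunay_edges(points):
    """Brute-force Delaunay edges: keep triangles whose circumcircle is empty."""
    n = len(points)
    edges = set()
    for i, j, k in combinations(range(n), 3):
        a, b, c = points[i], points[j], points[k]
        if orientation(a, b, c) == 0:
            continue  # collinear triple, not a triangle
        others = (points[m] for m in range(n) if m not in (i, j, k))
        if not any(in_circumcircle(a, b, c, p) for p in others):
            edges.update([(i, j), (i, k), (j, k)])
    return edges


def cluster_tree(points):
    """Merge elements along Delaunay edges, shortest first (assumed linkage rule).

    Leaves are point indices; internal nodes are (left, right, merge_distance).
    Assumes at least three points that are not all collinear.
    """
    parent = list(range(len(points)))
    subtree = list(range(len(points)))  # subtree[root] = tree of that cluster

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    weighted = sorted((dist(points[i], points[j]), i, j)
                      for i, j in delaunay_edges(points))
    for length, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            subtree[ri] = (subtree[ri], subtree[rj], length)
    return subtree[find(0)]


def leaves(tree):
    """All point indices below a tree node."""
    if isinstance(tree, int):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def cut(tree, threshold):
    """Flat clusters obtained by splitting merges longer than the threshold."""
    if isinstance(tree, int):
        return [[tree]]
    left, right, distance = tree
    if distance > threshold:
        return cut(left, threshold) + cut(right, threshold)
    return [sorted(leaves(tree))]


# Hypothetical centroids: a title block, a paragraph, and an image region.
centroids = [(10, 5), (14, 6), (12, 9),
             (11, 40), (15, 42), (13, 46),
             (60, 41), (64, 44), (62, 39)]
tree = cluster_tree(centroids)
print(cut(tree, 10.0))
```

Cutting the same tree at different thresholds yields coarser or finer groupings, which is one way a single hierarchy can describe a page at several levels. For real pages a library triangulation would replace the brute-force routine, and the linkage rule would have to follow the paper itself.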
License
Copyright (c) 2023 The Authors & LUMEN Publishing House
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
BRAIN. Broad Research in Artificial Intelligence and Neuroscience Journal has an Attribution-NonCommercial-NoDerivs (CC BY-NC-ND) license.