Neurosymbolic Approach to Processing of Educational Texts for Educational Standard Compliance Analysis

Authors

N. A. Prokopyev, M. I. Solnyshkina, V. D. Solovyev

DOI:

https://doi.org/10.14529/jsfi250310

Keywords:

topic modeling, keyword extraction, symbolic NLP, large language model, textbook analysis

Abstract

This article presents a neurosymbolic approach to analyzing the alignment between textbook content and educational standards. The study addresses the problem of assessing terminological coherence by evaluating a corpus of textbooks against the Russian Federal State Educational Standard. We employ a hybrid methodology that combines classical symbolic NLP methods for topic modeling (keyword extraction and term alignment) with qualitative analysis, by modern large language models, of items not found algorithmically. Experimental results on a corpus of five 7th-grade Physics textbooks and the corresponding educational standard show that the symbolic methods achieve a mean coverage of 71% of the standard's topics across all textbooks. Applying a large language model (ChatGPT 5) for the qualitative analysis recovered 51% of the keywords initially missed by the symbolic methods. The findings are relevant for researchers in educational and computational linguistics, curriculum developers, and textbook authors. The proposed pipeline offers a scalable tool for automating the analysis of educational content compliance, reducing the workload of manual expert assessment. This work contributes to the development of AI-assisted methodologies for educational standard alignment and textbook quality control.
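For readers who want a concrete picture of the symbolic stage of such a pipeline, the sketch below shows one plausible realization, not the authors' implementation. It extracts keyphrases with KeyBERT (Grootendorst, 2020, cited below) and computes the share of standard terms they cover; the multilingual embedding model, the exact-match normalization, and the top_n value are illustrative assumptions, and the article's own term alignment is more elaborate.

from keybert import KeyBERT

def standard_coverage(textbook_text: str, standard_terms: list[str]) -> float:
    """Return the share of standard terms found among extracted keyphrases."""
    # A multilingual embedding model is assumed here because the corpus is Russian.
    kw_model = KeyBERT(model="paraphrase-multilingual-MiniLM-L12-v2")
    extracted = kw_model.extract_keywords(
        textbook_text,
        keyphrase_ngram_range=(1, 2),  # uni- and bigram candidate terms
        stop_words=None,               # the default English stop list does not apply
        top_n=200,                     # an illustrative, tunable choice
    )
    keyphrases = {phrase.lower() for phrase, _score in extracted}
    # Naive exact matching; a real term-alignment step would handle inflection.
    matched = [t for t in standard_terms if t.lower() in keyphrases]
    return len(matched) / len(standard_terms) if standard_terms else 0.0

Terms left unmatched by this symbolic stage would then be handed to an LLM for qualitative checking, mirroring the two-stage design described in the abstract.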

References

Alier, M., Casañ, M.J., Filvà, D.A.: Smart Learning Applications: Leveraging LLMs for Contextualized and Ethical Educational Technology. In: Gonçalves, J.A.d.C., Lima, J.L.S.d.M., Coelho, J.P., et al. (eds.) Proceedings of TEEM 2023. pp. 190–199. Springer Nature Singapore, Singapore (2024). https://doi.org/10.1007/978-981-97-1814-6_18

Baddour, M., Paquelet, S., Rollier, P., et al.: Phenotypes Extraction from Text: Analysis and Perspective in the LLM Era. In: 2024 IEEE 12th International Conference on Intelligent Systems (IS). pp. 1–8 (2024). https://doi.org/10.1109/IS61756.2024.10705235

Chataut, S., Do, T., Gurung, B.D.S., et al.: Comparative Study of Domain Driven Terms Extraction Using Large Language Models (2024), https://arxiv.org/abs/2404.02330

Esfahani, M.N.: Content Analysis of Textbooks via Natural Language Processing. American Journal of Education and Practice 8(4), 36–54 (2024). https://doi.org/10.47672/ajep.2252

Foisy, L.O.M., Proulx, E., Cadieux, H., et al.: Prompting the Machine: Introducing an LLM Data Extraction Method for Social Scientists. Social Science Computer Review (2025). https://doi.org/10.1177/08944393251344865

Galushko, I.N.: Historical documents classification using BERT: LLM and historical domain. Perm University Herald. History 2(69), 147–158 (Jun 2025). https://doi.org/10.17072/2219-3111-2025-2-147-158

Gilbert, M., Crutchfield, A., Luo, B., et al.: Using a Large Language Model (LLM) for Automated Extraction of Discrete Elements from Clinical Notes for Creation of Cancer Databases. International Journal of Radiation Oncology*Biology*Physics 120(2, Supplement), e625 (2024). https://doi.org/10.1016/j.ijrobp.2024.07.1375

Glazkova, A.V., Morozov, D.A., Vorobeva, M.S., et al.: Keyword Generation for Russian-Language Scientific Texts Using the mT5 Model. Automatic Control and Computer Sciences 58(7), 995–1002 (Dec 2024). https://doi.org/10.3103/S014641162470041X

Glazkova, A., Morozov, D., Garipov, T.: Key Algorithms for Keyphrase Generation: Instruction-Based LLMs for Russian Scientific Keyphrases. In: Panchenko, A., Gubanov, D., Khachay, M., et al. (eds.) Analysis of Images, Social Networks and Texts. pp. 107–119. Springer Nature Switzerland, Cham (2025). https://doi.org/10.1007/978-3-031-88036-0_5

Grootendorst, M.: KeyBERT: Minimal Keyword Extraction with BERT. https://www.maartengrootendorst.com/blog/keybert (2020), accessed: 2025-08-29

Jo, T.: Keyword Extraction. In: Text Mining: Concepts, Implementation, and Big Data Challenge, pp. 421–443. Springer Nature Switzerland, Cham (2024). https://doi.org/10.1007/978-3-031-75976-5_20

Kapoor, S., Gil, A., Bhaduri, S., et al.: Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling (2024), https://arxiv.org/abs/2409.15626

Kuhn, J.: Computational text analysis within the Humanities: How to combine working practices from the contributing fields? Language Resources and Evaluation 53(4), 565–602 (Dec 2019). https://doi.org/10.1007/s10579-019-09459-3

Liu, J., Shang, Z., Ke, W., et al.: LLM-Guided Semantic-Aware Clustering for Topic Modeling. In: Che, W., Nabende, J., Shutova, E., et al. (eds.) Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 18420–18435. Association for Computational Linguistics, Vienna, Austria (Jul 2025). https://doi.org/10.18653/v1/2025.acl-long.902

Maragheh, R.Y., Fang, C., Irugu, C.C., et al.: LLM-TAKE: Theme-Aware Keyword Extraction Using Large Language Models. In: 2023 IEEE International Conference on Big Data (BigData). pp. 4318–4324 (2023). https://doi.org/10.1109/BigData59044.2023.10386476

Monakhov, S.I., Turchanenko, V.V., Cherdakov, D.N.: Terminology use in school textbooks: corpus analysis. RUDN Journal of Language Studies, Semiotics and Semantics 14(3), 437–456 (2023). https://doi.org/10.18413/2313-8912-2023-9-1-0-3

Mu, Y., Dong, C., Bontcheva, K., et al.: Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling. In: Calzolari, N., Kan, M.Y., Hoste, V., et al. (eds.) Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). pp. 10160–10171. ELRA and ICCL, Torino, Italia (May 2024), https://aclanthology.org/2024.lrec-main.887/

Papagiannopoulou, E., Tsoumakas, G.: A review of keyphrase extraction. WIREs Data Mining and Knowledge Discovery 10(2), e1339 (2020). https://doi.org/10.1002/widm.1339

Peng, R., Liu, K., Yang, P., et al.: Embedding-based Retrieval with LLM for Effective Agriculture Information Extracting from Unstructured Data (2023), https://arxiv.org/abs/2308.03107

Priyanshu, A., Vijay, S.: AdaptKeyBERT: An Attention-Based approach towards Few-Shot & Zero-Shot Domain Adaptation of KeyBERT (2022), https://arxiv.org/abs/2211.07499

Sakhovskiy, A., Tutubalina, E., Solovyev, V., et al.: Topic Modeling as a Method of Educational Text Structuring. In: 2020 13th International Conference on Developments in eSystems Engineering (DeSE). pp. 399–405 (2020). https://doi.org/10.1109/DeSE51703.2020.9450232

Solnyshkina, M.I., Solovyev, V.D., Gafiyatova, E.V., et al.: Text complexity as interdisciplinary problem. Issues of Cognitive Linguistics (1), 18–39 (2022). https://doi.org/10.20916/1812-3228-2022-1-18-39

Solovyev, V., Solnyshkina, M., Tutubalina, E.: Topic Modeling for Text Structure Assessment: The case of Russian Academic Texts. Journal of Language and Education 9(3), 143–158 (Sep 2023). https://doi.org/10.17323/jle.2023.16604

Song, M., Jiang, H., Shi, S., et al.: Is ChatGPT A Good Keyphrase Generator? A Preliminary Study (2023), https://arxiv.org/abs/2303.13001

Steiss, J., Tate, T., Graham, S., et al.: Comparing the quality of human and ChatGPT feedback of students’ writing. Learning and Instruction 91, 101894 (2024). https://doi.org/10.1016/j.learninstruc.2024.101894

Sukying, A., Barrot, J.S.: Friend or Foe? Investigating the Alignment of English Language Teaching (ELT) Textbooks with the National English Curriculum Standards. The Asia-Pacific Education Researcher 34(2), 793–801 (Apr 2025). https://doi.org/10.1007/s40299-024-00896-5

Tunyan, E.G., Sazikov, R.S., Kharlamov, S.A.: Automatic Extraction of Keywords and Summaries for Knowledge Base Population. Russian Journal of Cybernetics 6(2), 108–113 (Jun 2025), https://en.jcyb.ru/nisii_tech/article/view/413

Vanin, A., Bolshev, V., Panfilova, A.: Applying LLM and Topic Modelling in Psychotherapeutic Contexts (2024), https://arxiv.org/abs/2412.17449

Wu, X., Nguyen, T., Luu, A.T.: A survey on neural topic models: methods, applications, and challenges. Artificial Intelligence Review 57(2), 18 (Jan 2024). https://doi.org/10.1007/s10462-023-10661-7

Yang, J.: Integrated application of LLM model and knowledge graph in medical text mining and knowledge extraction. Social Medicine and Health Management 5(2), 56–62 (2024). https://doi.org/10.23977/socmhm.2024.050208

Yang, X., Zhao, H., Phung, D., et al.: LLM Reading Tea Leaves: Automatically Evaluating Topic Models with Large Language Models. Transactions of the Association for Computational Linguistics 13, 357–375 (2025). https://doi.org/10.1162/tacl_a_00744

Zagatti, F.R., Lucrédio, D., Caseli, H.d.M.: Unsupervised Statistical Keyword Extraction Pipeline: Is LLM All You Need? In: Paes, A., Verri, F.A.N. (eds.) Intelligent Systems. pp. 460–475. Springer Nature Switzerland, Cham (2025). https://doi.org/10.1007/978-3-031-79032-4_32

Zhang, T., Kishore, V., Wu, F., et al.: BERTScore: Evaluating Text Generation with BERT. In: 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net (2020), https://openreview.net/forum?id=SkeHuCVFDr

Published

2025-12-25

How to Cite

Prokopyev, N. A., Solnyshkina, M. I., & Solovyev, V. D. (2025). Neurosymbolic Approach to Processing of Educational Texts for Educational Standard Compliance Analysis. Supercomputing Frontiers and Innovations, 12(3), 141–156. https://doi.org/10.14529/jsfi250310