AVANTES: New solutions in software development to reduce the gap between natural and programming languages

Program
Program for Development of Projects in the Field of Artificial Intelligence

Article author and Principal Investigator of the Project:
Dr. Boško Nikolić
Full Professor, School of Electrical Engineering, University of Belgrade


Believing that two different worlds of human activity, with two different ways of thinking and working, can be combined, we brought together the methodologies of the technical and social sciences and launched the AVANTES project (Advancing Novel Textual Similarity-based Solutions in Software Development). The University of Belgrade School of Electrical Engineering, which launched its Software Engineering studies 20 years ago, joined forces with the Faculty of Philology, one of our oldest higher education institutions dealing with the Serbian language. Our goal was to reduce the gap between natural and programming languages and to bring software development and the writing of program code closer to the way we communicate in natural languages such as Serbian and English. The Science Fund of the Republic of Serbia recognized the importance of the project and funded it under the Program for Development of Projects in the Field of Artificial Intelligence.

Work in modern software companies is dynamic and changeable. Developers change teams or companies fairly frequently, and a newly hired developer often joins a team without being familiar with the software development and design practices used there. It also happens that two teams or two companies merge during the development of a software product and adopt the code written by one of them. In such cases, existing program code may be used inadequately. There is also the practical problem of copyright infringement, when someone else’s code is used in a software product without a license and without the owner’s knowledge. All of these examples call for recognizing the similarity between different pieces of program code.

Within our project, we developed a new software tool that recognizes similarity on the basis of both the code and its comments, and that helps identify similar parts of code, models, and databases during software development. This minimizes the effort of maintaining software systems, reusing tests, and correcting similar errors. We identified and defined the relationship between natural-language descriptions of code, models, and databases and the software components they describe. We also found similarities between the natural and programming languages in which these elements are defined.
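To make the idea of combining code similarity with comment similarity concrete, here is a minimal sketch in Python. It is not the AVANTES tool. The function names, the Jaccard token-overlap measure, and the `comment_weight` parameter are all assumptions chosen for illustration, and the comment splitting is deliberately simplified.

```python
import re


def split_code_and_comments(source: str) -> tuple[str, str]:
    """Separate '#' line comments from code.

    Simplification (assumption): a '#' inside a string literal is
    treated as the start of a comment.
    """
    code_parts, comment_parts = [], []
    for line in source.splitlines():
        code, _, comment = line.partition("#")
        code_parts.append(code)
        comment_parts.append(comment)
    return "\n".join(code_parts), "\n".join(comment_parts)


def tokens(text: str) -> set[str]:
    """Return the set of lowercased identifier-like tokens in the text."""
    return set(re.findall(r"[A-Za-z_]\w*", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def fragment_similarity(src_a: str, src_b: str, comment_weight: float = 0.5) -> float:
    """Weighted mix of code-token overlap and comment-token overlap."""
    code_a, comments_a = split_code_and_comments(src_a)
    code_b, comments_b = split_code_and_comments(src_b)
    code_sim = jaccard(tokens(code_a), tokens(code_b))
    comment_sim = jaccard(tokens(comments_a), tokens(comments_b))
    return (1 - comment_weight) * code_sim + comment_weight * comment_sim


# Hypothetical fragments that differ in one identifier and one comment word.
first = "total = price * qty  # compute order total"
second = "sum_total = price * qty  # compute the order total"
print(fragment_similarity(first, second))
```

Token overlap only captures shared vocabulary. Recognizing fragments that are worded differently but mean the same thing requires the semantic methods described in the next paragraph.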

Our tools build on Natural Language Processing (NLP), a currently very popular field of artificial intelligence that studies methods for the computer processing and interpretation of text written in natural languages. Central to our work are semantic problems in NLP, that is, tasks aimed at correctly understanding the meaning of texts. We were particularly interested in cross-level semantic similarity, where two texts of different lengths carry the same meaning. We studied this problem in Serbian and English, in two domains: newspaper articles and comments in program code. The second problem we addressed was semantic code search, which aims to find code that semantically corresponds to a natural language query.
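The shape of a code search task can be sketched as ranking code snippets by how well their natural-language descriptions match a query. The Python example below is only a lexical baseline under stated assumptions: it compares word counts with cosine similarity, whereas truly semantic search would rely on learned text representations. The snippet names and descriptions in `corpus` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercased words; the pattern also matches non-ASCII letters
    such as Serbian diacritics."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, corpus: dict[str, str]) -> list[tuple[str, float]]:
    """Rank snippet names by similarity of their description to the query."""
    query_vec = bag_of_words(query)
    scored = [
        (name, cosine(query_vec, bag_of_words(description)))
        for name, description in corpus.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical corpus: snippet name -> comment describing the snippet.
corpus = {
    "read_lines": "read a text file and return its lines",
    "sort_users": "sort a list of users by age",
    "send_mail": "send an email message to a recipient",
}

for name, score in search("sort users by their age", corpus):
    print(f"{name}: {score:.2f}")
```

A baseline like this fails when the query and the comment use different words for the same idea, which is exactly the gap that semantic similarity models are meant to close.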

During the project, we adhered to the principle of openness and public availability of scientific results. The results were published in leading open-access international journals and presented at international conferences, and are thus available to the wider international research community. We would also like to underline that the project created the first models and publicly available datasets for semantic similarity of texts of different lengths and for semantic code search in the Serbian language.

Our project can have an impact on a wide range of software development stakeholders. It introduced innovations in natural language processing technologies for the Serbian language. The topics we researched, such as semantic similarity of texts of different lengths and semantic code search, are highly relevant to new-generation software development methods and are very promising research areas. The following activities, which resulted directly or indirectly from the project, support this.

Our laboratory, the Belgrade Data Innovation Hub, which among other things presented the results and datasets of the AVANTES project, this year received a silver plaque from the European Big Data Value Association (BDVA), with the support of the European Commission. The plaque recognizes contributions to the European data community in seven areas: infrastructure and technological development; services; projects and applications; impact on the ecosystem; business strategy and sustainability; opportunities within the European data federation; and ethics.

The COMtext.SR project aims to develop a basic set of resources and tools for the automatic processing of texts in the Serbian language, for both the Ekavian and Ijekavian dialects, which will be publicly available under a license that allows their use for any purpose, including commercial use. It focuses on text domains not yet covered by publicly available academic or commercial resources and tools for the Serbian language, such as legal-administrative, financial, and medical texts. The project is implemented by a consortium of our Innovation Centre and the ReLDI Language Data Centre, with the financial support of domestic and foreign companies and foundations. Its first goal, improving text search in legal-administrative documents, was achieved in 2023.

The well-known German company Henkel recognized the new opportunities offered by our research, and we established a long-term business cooperation under which we provide consulting services for automating the integration of the company’s business software systems.

The support of the Science Fund has proven invaluable. The positive decision, along with the evaluations and detailed comments of international experts, gave us know-how and guidelines for our research, as well as experience in defining and presenting future projects. The project was a special incentive for the youngest members of the team, who have since defended two doctorates and submitted four doctoral topics.

Project Budget:
EUR 198,261

Scientific and Research Organizations:

School of Electrical Engineering, University of Belgrade
Innovation Centre of the School of Electrical Engineering
Faculty of Philology, University of Belgrade

Project team members:

Dr. Zaharije Radivojević, Associate Professor, School of Electrical Engineering, University of Belgrade
Dr. Dražen Drašković, Associate Professor, School of Electrical Engineering, University of Belgrade
Dr. Vuk Batanović, Researcher, Innovation Centre of the School of Electrical Engineering, University of Belgrade
MA Vladimir Jocović, Teaching Assistant, School of Electrical Engineering, University of Belgrade
MA Tamara Šekularac, Teaching Assistant, School of Electrical Engineering, University of Belgrade
MA Marko Mićović, Teaching Assistant, School of Electrical Engineering, University of Belgrade
MA Uroš Radenković, Teaching Assistant, School of Electrical Engineering, University of Belgrade
MA Jelica Cincović, Teaching Assistant, School of Electrical Engineering, University of Belgrade
MA Adrian Milaković, Teaching Assistant, School of Electrical Engineering, University of Belgrade
MA Dušan Stojković, Teaching Assistant, School of Electrical Engineering, University of Belgrade
MA Aleksa Srbljanović, Teaching Assistant, School of Electrical Engineering, University of Belgrade
Dr. Maja Miličević Petrović, Associate Professor, Department of Interpreting and Translation, University of Bologna
Dr. Radoslava Trnavac, Associate Professor, Faculty of Philology, University of Belgrade
Dr. Tanja Samardžić, Researcher, Institute of Computational Linguistics, University of Zurich
Dr. Borko Kovačević, Associate Professor, Faculty of Philology, University of Belgrade