IIT-Madras develops AI models for processing text in 11 Indian regional languages

The project is a joint initiative with AI4Bharat


TED NewsDesk, CHENNAI. Faculty at the Indian Institute of Technology Madras (IIT-M) have developed Artificial Intelligence (AI) models and datasets to process text in 11 Indian languages. The work is a joint initiative with “AI4Bharat,” a platform for building AI solutions for problems relevant to the country, a release from IIT-M said on Tuesday.

The open-source models and datasets are available for download free of cost.

“The multilingual AI models and datasets developed through this initiative will provide the essential building blocks to students, faculty, start-ups and industry to work on Indian language tools and push the frontiers of technology,” the release said.

Researchers from the institute and AI4Bharat released AI models and datasets for the following languages: Tamil, Hindi, Malayalam, Telugu, Kannada, Punjabi, Bengali, Odia, Assamese, Gujarati and Marathi.

As India moves towards a digital economy, Indian languages must find a space online, according to Mitesh M Khapra, Assistant Professor in the Department of Computer Science and Engineering, IIT-M. “This requires a lot of innovation in creating input tools, datasets, and AI models for Indian languages,” he said.

“For example, imagine a learner who posts a question on an e-learning platform in Tamil or Hindi or any of the numerous other Indian regional languages. There is a need for tools that can automatically process such questions written in Indian languages and classify them into specific topics,” he added. Such tools were already available for English and other foreign languages but not for Indian ones.

AI4Bharat is an initiative by Khapra and Pratyush Kumar, Assistant Professor in the Department of Computer Science and Engineering, IIT Madras. It aims to provide solutions to India-specific issues in a community-driven, open-source manner, the release added. Kumar said that the initiative “is one of the few attempts in academia” to develop and publicly release large-scale multilingual AI models containing millions of parameters trained on billions of tokens from 11 Indian languages, completely free and open-source.

Initiatives and innovative solutions like these not only aim to solve India-centric issues but also mark the strides taken in the field of AI in the country.

 

Sources: Hindustan Times, Indian Express