Endangered languages of India in cyberspace



SEVERAL scholars predict that almost 90% of the world’s languages will disappear by the end of the 21st century, thereby reducing the number from about 7000 to 700 or so. As is well known, at least 96% of the living languages that are either official languages of a nation or a dominant language of a region are spoken by only 4% of its people. Based on the criteria developed by UNESCO,1 2471 languages are endangered,2 of which 197 are spoken in India, making it the country with the most endangered languages in the world. One might wonder how these languages will survive in this era of advanced information and communications technology, given that English and other high-resourced languages dominate the scene. Though this situation is typical of many countries, our focus here is only on such languages spoken in India.

The dynamics underlying language endangerment in India are more complex than those prevailing in the rest of the world. Of the total population, 96.71% speak one of the 22 scheduled languages as their mother tongue while 3.29% speak ‘other’ tongues.3 It is a small part of India, namely the North East, that houses most of the non-scheduled languages (at least 62 out of 99), without which the linguistic diversity of the country would be half of what it is now.

There are many languages labelled as ‘tribal’4 languages in India. Except for Bodo and Santali, none of them are scheduled languages. According to the Census of India (2011), about 8.6% of India’s population is tribal, and these communities speak most of these small languages. They are concentrated in three distinct zones: (i) North East India; (ii) Central and East India comprising Madhya Pradesh, Bihar, Andhra Pradesh, Telangana, Jharkhand, Chhattisgarh, Orissa and West Bengal; and (iii) Southern India. All these tribes (henceforth referred to as indigenous communities) speak a variety of languages that are low-resourced, less-explored, under-described and less-documented. Most of these languages do not have a writing system. Some of them still have thousands of speakers while others have only a handful; Ruga, spoken in Meghalaya, for instance, is left with only three fluent speakers.


Numerical strength cannot be the only criterion to determine whether a language is endangered or not, because there are at least 49 languages in UNESCO’s list that have more than 100,000 speakers – 15 of them endangered and 34 unsafe.5 Only in a few cases is language loss due to a small number of speakers, which of course does matter because if we lose these speakers, their cultures, along with their languages, will be lost. Intergenerational transmission of language and culture does not seem to take place in many communities, mainly because they no longer live in their traditional lands and, further, the younger generation tends to assimilate into the dominant culture and language of the region.

Thus, the languages spoken by these smaller communities are disappearing from the globe at a faster rate than ever before in the history of mankind due to various other causes such as economic globalization, urbanization, language shift towards the dominant language(s) of the nation or region, and several other sociocultural, political and linguistic factors that prevail in India. This has a direct impact on the intangible cultural heritage and identity of these communities. Endangerment of these languages leads to language loss, which will eventually lead to an irreparable loss of indigenous wisdom and knowledge.

Despite globalization and the dominant role played by the major languages of the world, the advances in information and communication technology provide ample opportunities to make these languages visible in cyberspace. This, in turn, will certainly strengthen the efforts being made by government, non-governmental organizations (NGOs), academicians, language enthusiasts, community people, and many others to protect, preserve and promote these languages in all possible ways.


Documenting linguistic and cultural knowledge of the smaller languages of India is not a new thing. Christian missionaries began this task much earlier. Dictionaries and grammar sketches were published in many of the lesser-known languages of the country. Sir George Grierson, an Irish linguist and civil servant, in the Linguistic Survey of India (1903-28) documented 364 languages and dialects spoken in British India.6 Thus, the term ‘endangered’ might be relatively new but the concept has existed for a very long time.

The Government of India, through its several language policies, began documenting tribal and border languages through the Central Institute of Indian Languages (CIIL) in the 1970s, and over the past five decades the institute has produced several bilingual and multilingual dictionaries, grammars, primers, pictorial glossaries and other materials for the development and promotion of the country’s tribal languages.


In addition to this, the then Ministry of Human Resource Development (MHRD), Government of India also started the Scheme for Protection and Preservation of Endangered Languages housed at the CIIL in 2013 with a goal of producing dictionaries, grammars and ethnolinguistic sketches. The MHRD also set up several Centres for Endangered Languages in many central universities through the University Grants Commission (UGC). The Ministry of Home Affairs (MHA) too through its Language Division in Kolkata began projects such as ‘Mother Tongue Survey of India’ and ‘New Linguistic Survey of India’ to document linguistic knowledge of several tribal languages of the nation. Besides this, several PhD theses, grammars and dictionaries are being produced on endangered, tribal and lesser-known languages by individual researchers, universities and other academicians across the globe.

Recently, with the help of digital tools and advances in data management systems, multifunctional documentation has become the norm in India for endangered languages. The deliverables of the above-mentioned projects and of research by individuals and educational institutions are mainly: (i) to build lexical knowledge resources such as dictionaries, pictorial glossaries and encyclopedias of endangered languages intended for linguists, native speakers and language learners; (ii) to build grammars for researchers and for community users; and (iii) to initiate capacity-building for safeguarding endangered languages of minority linguistic groups through literacy, lifelong learning and education, and to share knowledge with like-minded organizations, agencies and institutions.

Due to a lack of technological and linguistic resources, the outcomes of most of these projects and research have not been fully digitized or made easily accessible to anyone with an internet connection. The publications are mainly in print, and only recently have many institutions made efforts to digitize the available books, dictionaries and other print materials, for instance, the Bharatavani7 Project of CIIL and the Shodhganga8 project of UGC, among others.


Technologies are seen as one of the most effective tools for the maintenance, revitalization and development of a language. Use of technology was, in fact, considered by UNESCO as one of the factors for assessing the vitality and endangerment of languages. The last two decades have seen the presence of many endangered languages in cyberspace all over the world. Still, there seems to be a digital divide in that not all these languages get digital inclusion. In the digital world, many of the endangered languages of India are almost ‘extinct’. They do not find any visibility, be it in social networking, internet banking, e-commerce, healthcare facilities or education. This will certainly deprive the younger generation of their language and identity, as they will be forced to shift to the dominant language and culture of the region that is technologically rich.

Low-resourced, severely and critically endangered languages face multifaceted challenges in embracing and using digital technologies. The scenario in India is very bleak. First, there is a lack of resources to make these languages ‘technology ready’. Little drops of water make a mighty ocean! It is essential for linguists, anthropologists, community speakers and software giants to team up and carry out digital data pooling, that is, digitization of the existing recorded data on and in these endangered languages that is lying in many institutions all over the country. The use of artificial intelligence techniques speeds up many tasks such as transcription of recorded data. Such activities are popular in several countries; Australia, for instance, has partnered with Google and uses open-source AI platforms like TensorFlow that save millions of human transcribing hours. Microsoft has come up with AI-based translators such as Microsoft Translator Hub that do text-to-speech translations in run time. Such initiatives are yet to catch on in India.


The lack of orthography is another serious bottleneck. This is related to literacy. In order to enter cyberspace, the unwritten languages should first be reduced to ‘writing’. There is an urgent need to develop orthographies. But this is not an easy task either. Pappuswamy9 discusses the issues and challenges in devising effective orthographies for unwritten languages. Without orthography, developing resources for such languages has a low priority.

Most of the technologically rich languages have interactive dictionaries and corpora freely available on the world wide web. Such resources are slowly coming up for the endangered languages too. The impact of globalization causes these ‘small’ languages of India to disappear faster than linguists can build machine readable corpora and other lexical resources. Until orthographies are devised and put in use, their oral traditions, literatures and histories could be used to create interactive digital resources. Talking dictionaries and spoken corpora could be produced for use by children and adults in the community and loaded with additional linguistic and anthropological information for use in academia.
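A talking dictionary of the kind suggested above can start from a very simple record structure. The following is only a minimal sketch under stated assumptions: the class names `TalkingEntry` and `TalkingDictionary`, the field names, the file path and the placeholder headword are all hypothetical illustrations, not part of any existing project or tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TalkingEntry:
    """One entry: a word, its meaning, and a recording of a fluent speaker."""
    headword: str      # form in whatever working transcription is in use
    gloss: str         # meaning in a language of wider communication
    audio_path: str    # hypothetical path to the spoken recording
    notes: List[str] = field(default_factory=list)  # linguistic or cultural notes


class TalkingDictionary:
    """In-memory store that can be searched by headword or by gloss."""

    def __init__(self) -> None:
        self._entries: Dict[str, TalkingEntry] = {}

    def add(self, entry: TalkingEntry) -> None:
        # Keys are lower-cased so that lookup ignores capitalization.
        self._entries[entry.headword.lower()] = entry

    def lookup(self, headword: str) -> Optional[TalkingEntry]:
        return self._entries.get(headword.lower())

    def search_gloss(self, word: str) -> List[TalkingEntry]:
        # Return every entry whose gloss contains the given word.
        target = word.lower()
        return [e for e in self._entries.values()
                if target in e.gloss.lower().split()]


# Placeholder data only; no real language material is shown here.
dictionary = TalkingDictionary()
dictionary.add(TalkingEntry("HEADWORD-1", "water", "audio/headword-1.wav",
                            notes=["placeholder ethnographic note"]))
entry = dictionary.lookup("headword-1")
print(entry.audio_path)
```

Because each entry points to a recording rather than depending on a spelling, a structure of this kind could be used before an orthography is settled, and the notes field leaves room for the additional linguistic and anthropological information mentioned above.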


On the one hand, it may seem that digital and social media networks are major players in wiping out small languages. On the other hand, technology is a super saviour for safeguarding these languages. Several communities that have access to mobile phones and other handheld or larger electronic gadgets try to exploit technology. Many mobile apps allow them to simply record audio and video messages in their mother tongues. Even speakers of languages that do not have a writing system use the Roman script, or the script of the language of wider communication of their region, to communicate with their families and other community members in their mother tongue. The script might not be theirs, but they are happy to use their language on social media.

Yet another problem is that many in the community are technologically illiterate. Nevertheless, it is important to get the elders to share knowledge and wisdom of their language and culture and social life with the younger ones through video chats that could be automatically recorded by the corpora developers.

Another major issue is access to technology. There are several communities that live without electricity, for instance, Mnar in Jirang village of Ri Bhoi district of Meghalaya; others live in places with poor internet connectivity, and still others are geographically and socially isolated from mainland India (for instance, Andamanese and Nicobarese). Without basic amenities, the digital world is just a dream for many communities. Nonetheless, it is important to create digital space and digital resources for their future use.

Digitization does not mean simply developing digital tools. In order to promote the endangered and dormant languages, two perspectives should be considered: (i) to ensure sharing of knowledge and resources across the community for language maintenance and cultural sustainability, and (ii) to ensure that information is authentic, with all stakeholders, including academicians and community members, working together to eliminate social and power hierarchies. In order to do this effectively, significant planning needs to be done at different levels.


Language technology planning, family language policy and community-based language planning are essential. Language technology planning will open doors for language use in text editors, online dictionaries and cell phones for languages spoken by the indigenous communities. Virtual keyboards and easy-to-use lexical and corpus-based digital tools should be developed in as many low-resourced languages as possible with the help of the community so that indigenous communities are not deprived of their linguistic rights to use technology like everyone else in the world. MOOCs and other online teaching materials could be created for these languages in a bi/multilingual environment.

The Government of India’s Ek Bharat Shreshtha Bharat initiative announced in the National Education Policy 2020 could be implemented by developing both print and digital materials on the tribal or indigenous languages of India that could be utilized in the fun project/activity on ‘The Languages of India’ in Grades 6-8. Short films and documentaries could also be produced on the culture and society of the indigenous communities and made available as open-source digital resources so that there is considerable amount of visibility in cyberspace.


Taking insights from two distinct areas of research, child language acquisition and language policy, a new framework called Family Language Policy (FLP) evolved almost a decade ago that considers the family as an integral site of language policy.10 Family language policy will certainly foster language conservation and revitalization. Language marks one’s identity. Thus, it is important to be an active user of one’s own native tongue while communicating with kith and kin. Technology can certainly add to this activity.

Mobile apps for day-to-day communication in social media such as Email, WhatsApp, Facebook and Twitter, will not only be the means to get people to use their language but also serve as a dynamic corpus of their languages in cyberspace. Rather than relying on slow or no internet connectivity to share media with fellow community members, Bluetooth and other local wireless networks could be used on mobile phones, tablets, iPads and other portable devices. This would ensure easy distribution of the family and community related ceremonies, festivals and other ‘sharable’ rituals within the family network, and thereby add to the conservation of their languages and cultures.

Community-based language planning11 is generally motivated by the needs and desires of the local communities, oftentimes shaped by the available resources and opportunities. This type of planning, when equipped with technology, will empower these low-resourced languages in several ways. Communities have used several devices since the early days, the most common being community radio stations, which are perhaps the most modest, cost-effective and popular technology, reaching out to thousands of people. These are still popular among many communities in India; for instance, Radio Kotagiri 90.4 MHz allows the Nilgiri indigenous communities of Tamil Nadu to listen and talk in their own languages. This station broadcasts in several local languages such as Toda, Jenu Kurumba, Betta Kurumba, Kota, Baduga, Soliga, Paniya and Kattunaickan.


There are several other resources, such as YouTube channels, Twitter and Google groups, available for languages like Atong, Khasi and Betta Kurumba, among others. These are created primarily by individuals and public access may be restricted. It is crucial for the community to build unique websites for use by members of the linguistic families and subfamilies of the region. Youngsters from the respective communities could be trained to facilitate the upload of multimedia information about the linguistic, cultural and other ethnic knowledge of their societies. Public recordings can be played by anyone, anywhere, at any time, but access to private recordings could be restricted to certain authorized community members, thus maintaining the secrecy of esoteric indigenous knowledge within the community.
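The distinction between public and private recordings can be expressed as a simple access rule. What follows is a minimal sketch, assuming a hypothetical community website that tags each recording as public or private and keeps a list of authorized members; the names `Recording` and `can_play`, and the member identifiers, are invented for illustration.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Recording:
    title: str
    is_public: bool  # False marks restricted, esoteric material


def can_play(recording: Recording, user_id: str, authorized: Set[str]) -> bool:
    """Public recordings are open to all; private ones need authorization."""
    return recording.is_public or user_id in authorized


# Placeholder recordings and member identifiers (hypothetical).
festival = Recording("festival song", is_public=True)
ritual = Recording("restricted ritual narrative", is_public=False)
authorized_members = {"elder-01", "archivist-02"}

print(can_play(festival, "visitor-99", authorized_members))  # True
print(can_play(ritual, "visitor-99", authorized_members))    # False
print(can_play(ritual, "elder-01", authorized_members))      # True
```

In a sketch like this, the community itself would decide who belongs in the authorized list, which keeps control over sensitive knowledge with its keepers rather than with outside developers.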


Besides this, the community should also take an interest in developing resources in both print and digital modes for the benefit of their younger generation. For instance, they can contribute effectively through crowdsourcing to the development of talking dictionaries of cognate indigenous languages, with the goal of promoting multilingualism on the one hand and providing space for the development of their native languages or mother tongues on the other. This could be achieved with the help of fluent speakers, and all the digital materials developed could be used in families and communities to transmit their language across generations.

As Strubell notes, ‘the way people bring up their families – including the language they choose – is not for the authorities to decide.’12 Family is the place where language choices are made. Language maintenance depends on the choices made in the families. Thus, it is in one sense the responsibility of the elders to transmit their languages to the next generation.

In this digital era, the more actively the younger generation engages online, the greater the threat of their worldviews being altered, as they would subconsciously absorb the western outlook on life. This could marginalize indigenous populations around the world by depriving them of their culture and identity.13 In order to prevent this from happening, it is essential to develop digital tools in the native tongues of the indigenous communities and focus on capacity building in this area by seeking support from the families and communities who are the keepers of the languages.


To sum up, technology can be used to safeguard, revitalize and promote endangered languages. Despite the several issues and challenges, it is imperative to find concrete and multifaceted measures and means to document, digitize and disseminate indigenous knowledge using technology before it is too late. The existing language policies of the nation should be revised to accommodate language technology planning and to support family language policies and community-based language planning. The decade 2022-2032 has been declared the Decade of Indigenous Languages by UNESCO. Governments, NGOs, communities, researchers, language enthusiasts, software developers and other computer professionals should come together and undertake the huge responsibility of sharing the digital space with the indigenous communities to showcase their languages and cultures.

This would undoubtedly help accelerate the technological empowerment of these languages by 2032 which in turn will empower the communities as well, thereby eliminating social, political and economic inequalities existing in the modern world. It would not just reverse the language shift among indigenous populations but also help in sharing their invaluable indigenous knowledge with the rest of the world.



1. UNESCO Ad Hoc Expert Group on Endangered Languages, Language Vitality and Endangerment, UNESCO, 2003.

2. Christopher Moseley (ed.), Atlas of the World’s Languages in Danger (3rd edn.). UNESCO Publishing, Paris, 2010. Retrieved 9 September 2018 from http://www.unesco.org/culture/en/endangeredlanguages/atlas

3. Census of India 2011, Office of the Registrar General, Ministry of Home Affairs, GoI. Retrieved 9 September 2019 from http://www.censusindia.gov.in

4. The Government of India has recognized some of the tribes as notified tribes and the languages of these indigenous communities are referred to as ‘tribal languages’ in the Constitution (Article 342).

5. Umarani Pappuswamy, ‘Language Endangerment, Ethnic Minorities and Identity Crisis in the Indian Subcontinent’, Science and Culture (‘Ciência e Cultura’) 71(4), São Paulo, October-December 2019, pp. 29-38. Online version ISSN 2317-6660.

6. George Grierson, Linguistic Survey of India (compiled and edited). Office of the Superintendent of Government Printing, Calcutta, 1903-28.

7. See https://bharatavani.in/ for details.

8. https://shodhganga.inflibnet.ac.in/ is a huge digital repository of Indian theses.

9. Umarani Pappuswamy, ‘Issues and Challenges in the Search of Effective Orthography for Unwritten Languages of North-east India’ in Anvita Abbi (ed.), Unwritten Languages of India. Sahitya Akademi, Delhi, 2017, pp. 117-159.

10. K.A. King, L. Fogle and A. Logan-Terry, ‘Family Language Policy’, Language and Linguistics Compass 2(5), September 2008, pp. 907-922.

11. M. Paul Lewis and Gary Simons, Sustaining Language Use: Perspectives on Community-based Language Development. SIL International, Dallas, 2016.

12. M. Strubell, ‘Catalan a Decade Later’ in J.A. Fishman (ed.), Can Threatened Languages Be Saved? Reversing Language Shift, Revisited: A 21st Century Perspective. Multilingual Matters, Clevedon, UK, 2001, p. 268.

13. Bushra Ebadi, ‘Technology Alone Can’t Preserve Endangered Languages’, 2018. Retrieved 5 May 2021 from https://www.cigionline.org/articles/technology-alone-cant-preserve-endangered-languages