Introduction to the Newly Established Discipline of Language Data Science and Applications

In 2020, the Institute of Corpus Studies and Applications of Shanghai International Studies University established a new discipline, Language Data Science and Applications, which recruits master's and doctoral students. Responding to the New Liberal Arts Development Initiative of the Ministry of Education in China and embracing interdisciplinarity, the new discipline aims to train professionals in the field of language intelligence.

Language Data Science and Applications draws on several disciplines, including Information Science, Statistics, Linguistics and Translation. It studies the types, states and attributes of language data in order to reveal the laws behind human language and language behavior, and to explore applications of language data in intelligent education and artificial intelligence. Building on corpora and databases, the discipline carries out language-data-driven research in language, translation, intelligent education and related areas of artificial intelligence. In doing so it integrates data science with research in linguistics, translation studies, intelligent education and language intelligence, with the goals of explaining the nature of language and translation and promoting the use of language data in intelligent education and language intelligence. The main research directions are language data and language studies, language data and translation studies, language data and intelligent education, and language data and artificial intelligence.

Language data and language studies:

Building on quantitative research on language, this direction combines multivariate statistics and visualization methods to study semantics, morphology, phonetics, lexicology, syntax and discourse, and to describe linguistic regularities formally. The specific research fields are corpus linguistics, statistical linguistics, quantitative linguistics, and computational linguistics.

Language data and translation studies:

This direction focuses mainly on corpus-based translation studies, digital humanities and translation studies, and the construction of language databases and corpora.

Language data and intelligent education:

This direction applies mining and analysis techniques for educational big data to explore the research and application of language data in AI-enabled education, supporting evidence-based decision-making and the implementation of intelligent education.

Language data and artificial intelligence:

This direction focuses on language intelligence, machine translation, deep learning and related fields. Drawing on massive corpus data and the information-processing mechanisms of artificial intelligence, it promotes cooperation between industry and research in language intelligence by integrating linguistics with artificial intelligence.