First International Forum on Language Data Science and Applications

Forum dates: 11-13 November 2022 (Shanghai time, China)

Online conference


Organizing Committee

  • Organizing Chair
    HU Kaibao, Shanghai International Studies University, China

  • Co-Chairs
    Yukio Tono, Tokyo University of Foreign Studies, Japan
    CUI Feng, Nanyang Technological University, Singapore

Program Committee

David Machin, Shanghai International Studies University, China

HAN Ziman, Shanghai International Studies University, China

HONG Huaqing, Shanghai International Studies University, China

LEI Lei, Shanghai International Studies University, China

Local Organizing Committee

  • Co-Chairs
    GENG Qiang, Shanghai International Studies University, China
    Muhammad Afzaal, Shanghai International Studies University, China

  • Members
    LIU Huan, Shanghai International Studies University, China
    ZHANG Kai, Shanghai International Studies University, China
    ZHOU Zhifei, Shanghai International Studies University, China
    WANG Yao, Shanghai International Studies University, China

Keynote & Invited Speakers

Yukio Tono

Title: What contribution the CEFR and the CEFR-J can make to develop a comprehensive learning system for ELT in Japan


In this talk, I will first report on previous work in the CEFR-J project, showing how corpora have been used to identify criterial language features that characterize given CEFR levels. One of the aims of the CEFR-J project is to produce so-called "Reference Level Descriptions" (RLDs), whose purpose is to identify the lexical, grammatical and textual features representing each of the CEFR-J levels. We aimed to develop a valid method of profiling CEFR levels using both coursebook corpora as input and learner corpora as output. I will then present the major pedagogical linguistic resources we have developed to support the implementation of the CEFR-J, and discuss their implications for designing a comprehensive learning system for ELT based on the CEFR and the CEFR-J.


Yukio Tono (PhD, Lancaster University) is a professor of corpus linguistics at Tokyo University of Foreign Studies, Japan. His research interests include learner corpus research, statistical modelling of L2 acquisition processes, corpus applications for foreign language learning, and the integration of corpus approaches with CEFR-based research. He serves on the editorial boards of several international journals (Applied Corpus Linguistics, Corpora, International Journal of Lexicography, English Teaching & Learning) and a book series (Studies in Corpus Linguistics). His publications include Corpus-based Language Studies (with Tony McEnery and Richard Xiao, 2006, Routledge), Developmental and Crosslinguistic Perspectives in Learner Corpus Research (edited with Yuji Kawaguchi and Makoto Minegishi, 2012, John Benjamins), and A Frequency Dictionary of Japanese (with Makiro Yamazaki and Kikuo Maekawa, 2006, Routledge).

Gwen Bouvier

Title: Corpus linguistics studies and critical discourse analysis (CDA)


Chinese scholarship has seen a growing interest in combining corpus linguistics with critical discourse analysis (CDA). There is indeed great potential to complement the strengths of quantitative corpus linguistics with a more qualitative approach such as CDA. This may also be important for bringing research more into line with the expectations of a wider range of leading international journals.

In this presentation, I look at how this could be carried out as a two-layered project.

We start by looking at some of the reasons why corpus research may be seen as less attractive by international journals in the field of discourse studies. Often this is because research projects treat data collection as an end in itself, rather than being driven by a clear need to answer a research question grounded in the literature and in society. "Findings" therefore tend to take the form of reports on patterns found in the corpus, rather than clear accounts of how a gap in the literature is being filled.

We then consider how this can be rectified. How can we best use corpus analysis as part of a discourse analysis that better aligns with the needs of international journals? I will share some examples from my own work applying CDA to social media data. The examples come from two cases, both involving hashtags on the Chinese social media platform Weibo. In the first, mothers seek out and share information and guidance on parenting. In the second, women exchange knowledge about health and fitness, sharing workouts and expressing ideas about their lives and identities.

I will talk through the steps I took in the research process, up to producing the published paper. We will consider how corpus analysis can best be used in this process, in combination with CDA.

Bio of the Speaker:

Gwen Bouvier (PhD, University of Wales) is a Distinguished Professor at the Institute of Corpus Studies and Applications, Shanghai International Studies University. Her main research interests are digital communication and civic debate on social media. Professor Bouvier's publications draw on critical discourse analysis, multimodality based on social semiotics, and online ethnography. She is the Associate Editor of Social Semiotics. Her latest publications include the book Qualitative Research Using Social Media (Routledge, 2022) and the articles "Where Neoliberalism shapes Confucian notions of child rearing: influencers, experts and discourses of intensive parenting on Chinese Weibo" (Discourse, Context and Media, 2022; winner of the Editor's Choice Award) and "What gets lost in Twitter 'cancel culture' hashtags? Calling out racists reveals some limitations of social justice campaigns" (Discourse & Society, 2021).

CUI Feng (Nanyang Technological University, Singapore)

Title: Language, Translation, and Society: Observations and Reflections on Translating in Singapore


The Bilingual Policy has been a cornerstone of Singapore's education system since the 1980s; under it, students are required to learn both English and a Mother Tongue Language based on their ethnicity. Yet mistranslations remain commonplace in this bilingual society. This talk reflects on the bilingual policy and the current state of translation in Singapore by exploring various case studies, the Citizen Translators' Project, and the recently developed SG Translate Machine Translation (MT) engine.

Dr. CUI Feng is a Senior Lecturer and a Ph.D. Supervisor in the Chinese Department at Nanyang Technological University (NTU), Singapore. He is currently the Deputy Director of the Master of Arts in Translation and Interpretation (MTI) program at NTU. Dr. Cui is also an Honorary Professor of the School of Languages and Communication Studies at Beijing Jiaotong University and an Honorary Research Associate of the Research Centre for Translation at the Chinese University of Hong Kong. His research focuses on translation history in China, translation theories, 20th-century Chinese literature, and comparative literature. Dr. Cui has published more than 40 journal papers and book chapters, including papers in SSCI, A&HCI, CSSCI, and THCI journals. His two monographs are Translation, Literature, and Politics: Using World Literature as an Example (1953-1966) (Nanjing University Press, 2019) and The Brief History of Translation Thought in China (Nankai University Press, 2021). He edited Han Suyin: Literature, Politics and Translation, a special issue of the Journal of Postcolonial Writing (an A&HCI journal), published by Routledge in 2021, and the book Medio-translatology: Concepts and Applications, published by Springer in 2022.