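Hongzhi Xu (许洪志), Assistant Researcher, Institute of Corpus Studies and Applications, Shanghai International Studies University
Email: hxu@shisu.edu.cn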
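Education
2011 – 2015 Ph.D. in Linguistics, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University
2005 – 2008 Master's in Software Engineering, School of Software, Tsinghua University
2000 – 2004 Bachelor's in Computer Science, Department of Computer Science and Technology, Chengdu University of Technology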
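Work Experience
2019.10 – present Assistant Researcher, Institute of Corpus Studies and Applications, Shanghai International Studies University
2015 – 2019 Postdoctoral Researcher (computational morphology), Department of Computer and Information Science, University of Pennsylvania
2014 – 2015 Research Assistant, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University
2008 – 2011 Assistant Researcher (information extraction, sentiment analysis), NEC Laboratories China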
Projects
2015 – 2019 LORELEI (LOw REsource Languages for Emergent Incidents: low-resource natural language processing and emergent incident detection)
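Major Projects and Resources
2020 Annotated corpus of Chinese multiword expressions: test data for the 2020 Chinese multiword expression identification task (PARSEME VMWE Shared Tasks).
2020 Unsupervised morphological analysis system V2.0 (Xu et al., 2020). Download: https://github.com/xuhongzhi/ParaMA2
2015 – 2018 Unsupervised morphological analysis system (Xu et al., 2018). Download: https://github.com/xuhongzhi/ParaMA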
Academic Service
Program committee member for international conferences and journals: COLING 2020; PACLIC 2019; ACL Workshop on Designing Meaning Representations (DMR) 2019; Chinese Lexical Semantics Workshop (CLSW) 2015 – 2018; IALP (International Association of Asian Language Processing) 2016 – 2018; reviewer for the journal Lingua Sinica, 2014 and 2016.
Publications
Monographs
Hongzhi Xu. (To Appear). Chinese Aspectual System: Theory and Computation. Springer.
Data Resources
Karl Neergaard, Hongzhi Xu, Chu-Ren Huang. Database of Word Level Statistics - Mandarin LDC2020L01. Philadelphia: Linguistic Data Consortium, 2020. ISBN: 1-58563-914-1.
Journal Articles
Karl David Neergaard, Hongzhi Xu, James Sneed German, Chu-Ren Huang. Database of Word-Level Statistics for Mandarin Chinese. Behavior Research Methods. (Final Review). SSCI
Hongzhi Xu, Menghan Jiang, Jingxia Lin, Chu-Ren Huang. 2020. Light Verb Variations and Varieties of Mandarin Chinese: Comparable Corpus Driven Approaches to Grammatical Variations. Corpus Linguistics and Linguistic Theory. https://doi.org/10.1515/cllt-2019-0049. SSCI
Hongzhi Xu. 2019. The Experiential Aspect of Mandarin Chinese (-guo): Semantics and Pragmatics. Lingua, https://doi.org/10.1016/j.lingua.2019.102714. SSCI
Hongzhi Xu, Chunping Li, Li Li, and Hongyu Shi. 2015. Accelerating the Training Process of Support Vector Machines by Random Partition. International Journal of Computer Theory and Engineering. 7(1): 29-33. EI
Conference Papers
Hongzhi Xu, Jordan Kodner, Mitch Marcus, Charles Yang. 2020. Unsupervised Learning of Language Morphology by Exploring Language Typology. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL).
Justin Mott, Ann Bies, Stephanie Strassel, Jordan Kodner, Caitlin Richter, Hongzhi Xu, Mitch Marcus. 2020. Morphological Segmentation for Low Resource Languages. International Conference on Language Resources and Evaluation (LREC). Spain.
Hongzhi Xu, Mitchell Marcus, Charles Yang, and Lyle Ungar. 2018. Unsupervised Morphology Learning with Statistical Paradigms. In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018). pp. 44–54. Santa Fe, New Mexico, USA. (Area Chair recommended paper award)
Karl Neergaard, Hongzhi Xu, and Chu-Ren Huang. 2016. Database of Mandarin Neighborhood Statistics. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). pp. 23–28. Portorož, Slovenia.
Hongzhi Xu, Enrico Santus, Anna Laszlo and Chu-Ren Huang. 2015. LLT-PolyU: Identifying Sentiment Intensity in Ironic Tweets. SemEval 2015 Task 11: Sentiment Analysis of Figurative Language in Twitter, co-located with the North American Chapter of the Association for Computational Linguistics (NAACL 2015). Denver, Colorado, USA.
Qingqing Zhao, Chu-Ren Huang, and Hongzhi Xu. 2015. Auditory Synaesthesia and Near Synonyms: A Corpus-Based Analysis of sheng1 and yin1 in Mandarin Chinese. In Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation (PACLIC 2015). pp. 315–322. Shanghai, China.
Piyoros Tungthamthiti, Enrico Santus, Hongzhi Xu, Chu-Ren Huang and Kiyoaki Shirai. 2015. Sentiment Analyzer with Rich Features for Ironic and Sarcastic Tweets. In Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation (PACLIC 2015). pp. 178–187. Shanghai, China.
Hongzhi Xu, Dingxu Shi and Chu-Ren Huang. 2015. A New Categorization Framework for Chinese Adverbs. The 16th Chinese Lexical Semantics Workshop (CLSW 2015), LNAI. Beijing, China.
Hongzhi Xu and Chu-Ren Huang. 2014. Annotate and Identify Modalities, Speech Acts and Finer-Grained Event Types in Chinese Text. COLING Workshop on Lexical and Grammatical Resources for Language Processing. Dublin, Ireland.
Jingxia Lin, Hongzhi Xu, Menghan Jiang and Chu-Ren Huang. 2014. Annotation and Classification of Light Verbs and Light Verb Variations in Mandarin Chinese. COLING Workshop on Lexical and Grammatical Resources for Language Processing. Dublin, Ireland.
Chu-Ren Huang, Jingxia Lin, Menghan Jiang and Hongzhi Xu. 2014. Corpus-based Study and Identification of Mandarin Chinese Light Verb Variations. COLING Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects. Dublin, Ireland.
Hongzhi Xu and Chu-Ren Huang. 2013. A Rule System for Chinese Time Entity Recognition by Comprehensive Linguistic Study. The 6th International Joint Conference on Natural Language Processing (IJCNLP 2013). Nagoya, Japan.
Hongzhi Xu and Chu-Ren Huang. 2013. Primitives of Events and the Semantic Representation. The 6th International Conference on Generative Approaches to the Lexicon (GL 2013). Pisa, Italy.
Hongzhi Xu, Helen Kaiyun Chen, Chu-Ren Huang, Qin Lu, Tin-Shing Chiu, Dingxu Shi. 2012. A Grammar-informed Corpus-based Sentence Database for Linguistic and Computational Studies. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2012). Istanbul, Turkey.
Shan Wang, Chu-Ren Huang and Hongzhi Xu. 2012. Compositionality of NN Compounds: A Case Study on [N1+Artifactual-Type Event Nouns]. In the 26th Pacific Asia Conference on Language, Information and Computation (PACLIC 2012). pp. 70–79. Bali, Indonesia.
Jingxia Lin, Chu-Ren Huang, Huarui Zhang and Hongzhi Xu. 2012. The Headedness of Mandarin Chinese Serial Verb Constructions: A Corpus-Based Study. In the 26th Pacific Asia Conference on Language, Information and Computation (PACLIC 2012). Bali, Indonesia. (Best Paper Award)
Hongzhi Xu, Kai Zhao, Likun Qiu and Changjian Hu. 2010. Expanding Chinese Sentiment Dictionaries from Large Scale Unlabeled Corpus. In Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation (PACLIC 2010). Sendai, Japan.
Hongzhi Xu, Changjian Hu and Guoyang Shen. 2009. Discovery of Dependency Tree Patterns for Relation Extraction. In Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation (PACLIC 2009). Hong Kong, China.
Hongzhi Xu and Chunping Li. 2008. Combining Context Features by Canonical Belief Network for Chinese Part-of-Speech Tagging. The Third International Joint Conference on Natural Language Processing (IJCNLP 2008). Hyderabad, India.
Hongzhi Xu and Chunping Li. 2007. A Novel Term Weighting Scheme for Automated Text Categorization. In Proceedings of the 7th International Conference on Intelligent Systems Design and Applications (ISDA 2007). Rio de Janeiro, Brazil.
Conference Presentations
Ying Liu, Hongzhi Xu. 2019. Clefts in Mandarin: How exhaustive are they? A large-scale corpus and experimental study of shi…(de) sentences. The 12th International Workshop on Theoretical East Asian Linguistics (TEAL-12).
Hongzhi Xu, Jordan Kodner. 2019. Unsupervised Learning of Language Morphology by Exploring Language Typology. Presented at the 4th American International Morphology Meeting (AIMM 2019). Stony Brook, New York, USA.
Hongzhi Xu. 2017. Unsupervised Morphology Learning with Statistical Paradigms. CLUNCH at Penn.
Francesca Strik Lievers, Hongzhi Xu and Ge Xu. 2013. A Methodology for the Extraction of Lexicalized Synaesthesia from Corpora. Presented at the 19th International Congress of Linguistics (ICL 2013). Geneva.
Hongzhi Xu and Chu-Ren Huang. 2013. The Generative Lexicon for Chinese Lexical Semantics: A Case Study on chī (eat). Annual Conference of the International Association of Chinese Linguistics (IACL 2013). Taipei, Taiwan.
Hongzhi Xu and Shan Wang. 2012. Chinese Relative Clause: Descriptive or Restrictive. Annual Conference of the International Association of Chinese Linguistics (IACL 2012). Hong Kong, China.