LIVAC Synchronous Corpus
LIVAC is an uncommon language corpus that has been actively maintained since 1995. Unlike other existing corpora, LIVAC has adopted a rigorous and regular "Windows" approach to processing and filtering large volumes of media texts from representative Chinese speech communities such as Beijing, Hong Kong, Macau, Taipei, Singapore, Shanghai, Guangzhou, and Shenzhen. [1] The contents are therefore deliberately repetitive in most cases, represented by text samples drawn from editorials, local and international news, cross-Taiwan Strait news, and news on finance, sports and entertainment. [2] As of 2023, more than 3 billion characters of news text had been filtered, of which 700 million characters had been processed and analyzed, yielding an expanding Pan-Chinese dictionary of 2.5 million words from Pan-Chinese media. Through rigorous analysis based on computational linguistic methodology, LIVAC has at the same time accumulated detailed and meaningful data on the Chinese language and its various speech communities in the Pan-Chinese context, and the results show considerable and important variation, both long-standing and emerging. [3] [4]
LIVAC Synchronous Corpus | |
---|---|
software | |
Information | |
Inception | July 1995 |
Use | text corpus |
Publication date | July 1995 |
Operating system | cross-platform |
Website | livac.org |
The "Windows" approach is the most representative feature of LIVAC and has enabled Pan-Chinese media texts to be analyzed quantitatively according to attributes such as location, time and subject domain. As a result, various kinds of comparative studies and information technology applications, as well as the development of new and innovative applications, have become possible. [5] [6] In addition, LIVAC has allowed longitudinal developments to be taken into account, facilitating Key Word in Context (KWIC) searches and the comprehensive study of target words, their underlying concepts and their linguistic structures over the past 25 years, according to the variables of place, time and subject mentioned above. Results from the extensive data accumulated in LIVAC have enabled the cultivation of textual databases of proper names, place names, organization names and new words, and of weekly and annual rosters of media figures. Related applications include the establishment of verb and adjective databases, the formulation of sentiment indices, and related opinion mining, to measure and compare the popularity of global media figures in the Chinese media (LIVAC Annual Pan-Chinese Celebrity Rosters, later renamed Pan-Chinese Newsmaker Rosters), [7] [8] [9] and the compilation of new word lexicons (LIVAC Annual Pan-Chinese New Word Rosters). [10] [11] [12] On this basis, the emergence, spread and transformation of new words have been analyzed, and dictionaries of neologisms have been published. [13] [14]
Recent work has focused on the balance between disyllabic words and the growing number of trisyllabic words in the Chinese language, on comparative studies of light verbs in three Chinese speech communities, and on language use as an indicator of epochal change in China. A new version, LIVAC 3.1, was launched in February 2024.
Corpus data processing
- Accessing media texts, manual input, etc.
- Text unification including conversion from simplified to traditional Chinese characters, stored as Big5 and Unicode versions
- Automatic word segmentation
- Automatic alignment of parallel texts
- Manual verification, part-of-speech tagging
- Extraction of words and addition to regional sub-corpora
- Combination of regional sub-corpora to update the LIVAC corpus, and master lexical database
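The segmentation and sub-corpus steps listed above can be illustrated with a minimal sketch. It assumes forward maximum matching against a toy lexicon, which is a common textbook technique and not necessarily the method LIVAC uses; the names `LEXICON`, `segment`, `update_subcorpora` and `master_lexicon` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon; a real system would rely on a large dictionary.
LEXICON = {"香港", "大學", "香港大學", "學生", "北京"}
MAX_WORD_LEN = max(len(w) for w in LEXICON)


def segment(text, lexicon=LEXICON, max_len=MAX_WORD_LEN):
    """Forward maximum matching: at each position take the longest lexicon word."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when no lexicon entry matches.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def update_subcorpora(subcorpora, region, text):
    """Segment a text and add its word counts to the sub-corpus for `region`."""
    subcorpora.setdefault(region, Counter()).update(segment(text))


def master_lexicon(subcorpora):
    """Combine all regional counters into one master frequency table."""
    total = Counter()
    for counts in subcorpora.values():
        total.update(counts)
    return total


subcorpora = {}
update_subcorpora(subcorpora, "Hong Kong", "香港大學學生")
update_subcorpora(subcorpora, "Beijing", "北京學生")
master = master_lexicon(subcorpora)
```

In this sketch each region keeps its own frequency table, so regional comparison stays possible after the tables are merged into the master lexicon. Manual verification and part-of-speech tagging are not modelled here.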
Labeling for data curation
- Categories used include general terms and proper names, such as: general names, surnames, semi-titles; geographical names, organizations and commercial entities, etc.; time, prepositions, locations, etc.; stack-words; loanwords; case-words; numerals, etc.
- Construction of databases of proper names, place names, and specific terms, etc.
- Generation of rosters: "new word rosters", "celebrity or media personality rosters", "place name rosters", compound words and matched words
- Other part-of-speech tagging for sub-databases, such as common nouns, numerals, numeral classifiers, different types of verbs and adjectives, pronouns, adverbs, conjunctions, mood particles, onomatopoeia, interjections, etc.
Applications
- Compilation of Pan-Chinese dictionaries or local dictionaries
- Information technology research, such as predictive Chinese text input for mobile phones, automatic speech-to-text conversion, and opinion mining
- Comparative studies of linguistic and cultural developments in Pan-Chinese regions, especially during an important historical period in modern China
- Language teaching and learning research, and speech-to-text conversion
- Customized services in linguistic research and lexical search for international organizations and government agencies
The applications above are supported by the following functions:
- Word distribution search
- Word search
- Example sentence selection
- Multi-word comparison
- Word cloud
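As a rough illustration of how a word distribution search and a keyword-in-context (KWIC) lookup might work over texts indexed by place, time and topic, here is a minimal sketch. The `Doc` record, the sample data and the functions `window`, `distribution` and `kwic` are all hypothetical; they do not reflect the actual LIVAC interface or data.

```python
from collections import Counter, namedtuple

# Hypothetical record: one segmented text with its "window" attributes.
Doc = namedtuple("Doc", ["place", "year", "topic", "words"])

# Invented sample data, for illustration only.
DOCS = [
    Doc("Hong Kong", 2019, "finance", ["股市", "今日", "上升"]),
    Doc("Beijing", 2019, "finance", ["股市", "下跌"]),
    Doc("Hong Kong", 2020, "sports", ["球隊", "今日", "獲勝"]),
]


def window(docs, place=None, year=None, topic=None):
    """Select documents matching the given place, year and topic (None = any)."""
    return [
        d for d in docs
        if (place is None or d.place == place)
        and (year is None or d.year == year)
        and (topic is None or d.topic == topic)
    ]


def distribution(docs, word):
    """Count occurrences of `word` per place."""
    counts = Counter()
    for d in docs:
        counts[d.place] += d.words.count(word)
    return counts


def kwic(docs, keyword, width=2):
    """Return (left context, keyword, right context) for each hit."""
    hits = []
    for d in docs:
        for i, w in enumerate(d.words):
            if w == keyword:
                left = d.words[max(0, i - width):i]
                right = d.words[i + 1:i + 1 + width]
                hits.append((left, w, right))
    return hits


by_place_2019 = distribution(window(DOCS, year=2019), "股市")
hk_hits = kwic(window(DOCS, place="Hong Kong"), "今日")
```

Filtering first with `window` and then running a search on the result keeps the place, time and topic selection separate from the query itself, so the same search can be repeated across different windows for comparison.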
See also
- British National Corpus
- Oxford English Corpus
- Corpus of Contemporary American English (COCA)
- 語料庫
References
- Tsou, Benjamin; Lai, Tom; Chan, Samuel; and Wang, William S.-Y. (Eds). (1998). Quantitative and Computational Studies on the Chinese Language 《漢語計量與計算研究》. Language Information Sciences Research Centre, City University Press.
- Tsou, B. K., Kwong, O.Y. (Eds). (2015). Linguistic Corpus and Corpus Linguistics in the Chinese Context (Journal of Chinese Linguistics Monograph Series Number 25), Hong Kong: Chinese University Press.
- Tsou, Benjamin. (2004). "Chinese Language Processing at the Dawn of the 21st Century", in C R Huang and W Lenders (eds) Language and Linguistics Monograph Series B: Frontiers in Linguistics I, pp. 189–207. Institute of Linguistics, Academia Sinica.
- Tsou, B. K. (2017). Loanwords in Mandarin Through Other Chinese Dialects. In R. Sybesma, W. Behr, Y. Gu, Z. Handel, C.-T. Huang & J. Myers (Eds.), The Encyclopaedia of Chinese Language and Linguistics (Vol. 2, pp. 641-647). Leiden; Boston: BRILL.
- Tsou, Benjamin, and Kwong, Olivia. (2015). LIVAC as a Monitoring Corpus for Tracking Trends beyond Linguistics. In Tsou, Benjamin, and Kwong, Olivia (eds.), Linguistic Corpus and Corpus Linguistics in the Chinese Context (Journal of Chinese Linguistics Monograph Series No. 25). Hong Kong: The Chinese University Press, pp. 447-471.
- Tsou, Benjamin. (2016). Skipantism Revisited: Along with Neologisms and Terminological Truncation. In Chin, Chi-on Andy and Kwok, Bit-chee and Tsou, Benjamin K. (eds.), Commemorative Essays for Professor Yuen-Ren Chao: Father of Modern Chinese Linguistics. Taiwan: Crane Publishing. pp. 343-357.
- CityU releases 2015 LIVAC Pan-Chinese Media Personality Roster, City University of Hong Kong, Hong Kong, 28 December 2015.
- CityU releases 2016 LIVAC Pan-Chinese Media Personality Roster (archived 2017-07-15 at the Wayback Machine), City University of Hong Kong, Hong Kong, 02 January 2017.
- CityU releases 2019 LIVAC Pan-Chinese Media Personality Roster, City University of Hong Kong, Hong Kong, 07 January 2019.
- CityU releases 2014 Pan-Chinese New Word Rosters, City University of Hong Kong, Hong Kong, 12 February 2015.
- CityU releases 2015 LIVAC Pan-Chinese New Word Rosters, City University of Hong Kong, Hong Kong, 04 February 2016.
- CityU releases 2019 LIVAC Pan-Chinese New Word Rosters, City University of Hong Kong, Hong Kong, 09 January 2019.
- 鄒嘉彥、游汝杰(編)(2007),《21世紀華語新詞語詞典》(簡體字版),上海,復旦大學出版社。
- 鄒嘉彥、游汝杰(編)(2010),《全球華語新詞語詞典》,北京,商務印書館。