ASR Assessment Used in Modern Chinese Language Classroom
Author Note
Abstract
This action research study aims to help meet the developmental needs of Chinese language learners. Recently, research on student engagement, active instruction, and the use of technology in the classroom has increased. However, research on the use of Automatic Speech Recognition (ASR) for Chinese language learning remains scarce. The investigation in this paper reveals both the advantages and the limitations of using Artificial Intelligence (AI)-based ASR in spoken language assessment. The findings indicate that the use of ASR benefits students’ Chinese language learning outcomes. ASR also gives teachers additional assessment methods for their practice of multiple modalities, so that they can further improve student engagement and better address different learning styles. Using data collected by transcribing students’ speech to text, our research supports the claim that ASR increases the frequency of teacher-student interaction while enabling teachers to identify student errors and provide constructive feedback more promptly. This suggests that using ASR as an official summative assessment method in the Chinese language classroom is practical and effective.
Key words: Automatic Speech Recognition (ASR), Chinese language education, Multiple modalities, Assessment.
Introduction
The purpose of this paper is to explore the possibility of assessing students’ oral Chinese ability by analyzing the text output produced by Automatic Speech Recognition (ASR) technology. In our research, ASR technology is used as an assessment instrument for Chinese language learners, and it shows substantial potential for evaluating Chinese pronunciation, including the accuracy of the four tones. It also gives learners the opportunity to self-correct. The latest generation of ASR claims to process more authentic dialogue in man-machine interaction than ever before. Traditionally, spoken test performances were analyzed using a range of measures including grammatical accuracy, complexity, vocabulary, pronunciation, and fluency (Iwashita, 2008). Currently, ASR applications can be used to test speech accuracy and are accessible to anyone who owns an iPhone or another smartphone. They can serve as digital assistants that parse Chinese speech. For example, Apple’s Siri (2011), Cortana on Windows Phone (2014), Google Home (2016), Amazon’s Alexa (Echo Dot), and Samsung’s Bixby (Dec. 10, 2017) are all available in the North American market. These Chinese voice applications are widely accessible to American students, which made this research possible. If an ASR application such as Siri can understand the questions students ask in Chinese and reply with authentic answers, then students can use it to improve their interpersonal and presentational communication skills. In addition, Android users have already been using Google’s voice-recognition technology to send text messages (Vanian, 2017); students simply need to select Chinese as the input language. If Chinese learners can send a teacher their speech transcribed into characters, then their pronunciation score can be derived from the correctness rate of the text message, which depends on the accuracy of their spoken input.
A series of analyses given in this paper demonstrates the feasibility of using intelligent speech recognition technology to make Chinese language learning more effective.
Literature Review
Numerous studies have been conducted on the assessment of foreign language ability, but using AI-based ASR to assess learners’ language development has only become possible in recent years, with the innovation of modern technology and its broad accessibility among the general population. As such, relevant research has been exceptionally limited. There has been discussion about whether flipped classrooms for learning classical Chinese could be supported by a mobile device-assisted learning system (Wang, 2016), and whether integrating teaching strategies into interactive response system (IRS) activities would be effective in facilitating teaching and learning (Wang, 2017). However, these studies reported no quantitative accuracy scores for the assessments they discussed. Recently, the Chinese company iFLYTEK announced the release of six education products using intelligent technologies, including ASR, and the collection of 35 billion samples of learning data (Peng, 2017). This indicates that recent technology has enabled us to use ASR as an assessment instrument in the Chinese language classroom. Moreover, research by the University of Oregon Chinese Flagship found that ASR lacked a methodology for measuring spoken Chinese (Clark, 2009); it could only rely on the traditional four modalities to assess oral output. That research demonstrated both the need for and the possibility of developing an easily adaptable method for assessing oral Chinese. Two decades ago, the use of speech recognition software as an English language oral assessment instrument was addressed by English language scholars (Coniam, 1998). The software could be used in a similar manner for Chinese language assessment. Additionally, the quantitative assessment of learners’ fluency in a second language by means of automatic speech recognition technology has been investigated (Cucchiarini, 2000).
Furthermore, the nature of speaking proficiency in English as a second language has been examined in the context of a larger project to develop the rating scale for a new international test of English for academic purposes (Iwashita, 2008). Our research used the same methodology to measure the test results in a smaller-scale assessment, which may limit the rigor of the results. Nevertheless, it demonstrates the use of ASR technology as an assessment instrument for Chinese language education, which has substantially benefited Chinese teachers in evaluating students’ pronunciation, including the accuracy of the four tones. Oremus (2014) introduced the historical development of ASR and its current advancement. The same method was employed in our study. The students’ various mobile devices can all become engaging tools for assessing their proficiency in spoken Chinese. Snyder et al. (2016) evaluated the effectiveness of using "flipped" instruction in a secondary social studies classroom. The discussion in that article provided insight for curricular decision-making when implementing ASR technology in the Chinese language classroom.
Research Questions and Hypotheses
After examination of the literature related to ASR, several research questions remain to be discussed. For example, how accurate is ASR in measuring speech proficiency, and can it be used as an official summative assessment method? To what degree can students be engaged in this type of assessment, and is it appropriate to use this technology in a high school classroom considering that teenagers are easily distracted by electronic devices?
On the basis of the above research questions, two hypotheses were formulated in our research. The first states that implementing ASR technology in the Chinese language classroom is feasible and effective. The second states that the ASR method plays an important role in learners’ motivation and sustained interest in Chinese study, regardless of concerns about distraction.
Methodology
Participants
This research involved thirteen high school students, six boys and seven girls, ranging from age 14 to 18. Eleven of them were first-year Chinese learners, with Mandarin Chinese level novice low according to the American Council on the Teaching of Foreign Languages (ACTFL) speaking proficiency guidelines; the other two students’ Chinese levels were medium high and advanced mid. Four of them were from an urban school and nine of them were from a suburban school.
The students came from three classes and were naturally organized into three groups, using WeChat, the Apple text message app, and the Siri app, respectively. Students in Group I had studied Chinese for six months and learned nearly 100 sentences; however, they were only able to produce about 30% of these sentences to make their own dialogues without referring back to the text. Students in Group II had also learned Chinese for six months, but in a distance learning setting; although they had also learned 100 sentences, it remained a major challenge for them to reproduce these sentences in real-life communication or to make coherent dialogues. Students in Group III had over six years of part-time study; they were heritage students, and they could use AI voice assistants such as Apple’s Siri or Amazon’s Alexa for more complex speech.
None of the students had prior experience with interactive ASR in Chinese. Before the three practice trials, they received two weeks of instruction on how to use different types of ASR technology. The seven students in Group I were instructed to use a Chinese social media app, WeChat. Two of them used Android phones, four of them used iPhones, and one had no phone, but shared a phone with others. The four students in Group II were instructed to use the Apple text message application. The two students in Group III were instructed to use the Siri application. The demographic characteristics of the students who participated in the study and the number of students assigned to each group are shown in Table 1.
Table 1. ASR Participant Demographics
Category | Subgroup | Number of Students | % of Students in Overall Study |
Experimental method in each group | WeChat app (Group I) | 7 | 53.8% |
Experimental method in each group | Text Message app (Group II) | 4 | 30.8% |
Experimental method in each group | Siri app (Group III) | 2 | 15.4% |
Gender across all three groups | Female | 7 | 53.8% |
Gender across all three groups | Male | 6 | 46.2% |
Procedures of the experiment
Students in all three groups used the latest ASR technology to either convert vocal input to character output, or use man-machine conversation to acquire information.
The Group I students were instructed to interact in a group chat setting, with teacher involvement. They spoke short sentences to their devices which automatically converted their speech to text. They shared their text among the group and compared their accuracy in performing the lesson dialogues.
The Group II students were instructed to interact with the teacher individually. They also used their own devices to automatically convert their speech to text, and then they took pictures of the text via phone screenshot and sent them to the teacher through email as a record of assessment.
The Group III students were instructed to hold everyday conversations with Siri to inquire about classical Chinese poetry, the weather in Beijing, and the closest library in town. In this third model, students used spoken Chinese to request information from Siri.
For all three groups, the teacher provided immediate feedback to students so that they would notice their incorrect pronunciations, and sometimes they even corrected themselves. After each group had completed three trials spanning multiple days, the teacher instructed students to count the number of incorrect words to gauge their improvement. After making corrections, students sent their texts to the teacher for an official summative assessment.
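The counting step described above can be illustrated with a short sketch. It assumes, hypothetically, that a target sentence and its ASR transcript are compared character by character using edit distance; in the study itself the counting was done by hand. The function names and the sample transcript are illustrative only and are not taken from the experiment.

```python
def edit_distance(reference: str, transcript: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the transcript.
    prev = list(range(len(transcript) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(transcript, start=1):
            substitution = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,                  # character missing from transcript
                            curr[j - 1] + 1,              # extra character in transcript
                            prev[j - 1] + substitution))  # match or wrong character
        prev = curr
    return prev[-1]


def incorrectness_rate(reference: str, transcript: str) -> float:
    """Incorrect characters divided by the length of the target sentence."""
    if not reference:
        raise ValueError("reference sentence must not be empty")
    return edit_distance(reference, transcript) / len(reference)


reference = "圖書館在哪兒"   # target sentence listed in Appendix B
transcript = "圖書管在哪兒"  # hypothetical ASR output with one wrong character
print(edit_distance(reference, transcript))                 # 1
print(round(incorrectness_rate(reference, transcript), 3))  # 0.167
```

Edit distance is only one possible way to count errors; it also penalizes characters that are dropped or added, which a simple position-by-position comparison would not handle well.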
Measurements
The measurements covered oral proficiency, listening comprehension, and conversational skills. A set of comparative data was gathered in each of the three trials performed in the experiment. The three trials correspond to the students’ first, second, and third uses of ASR over a 45-day period.
In order to identify whether students would support the use of ASR technology in the Chinese classroom and what instruction they might need, this research included a survey containing six statements related to ASR. Students indicated whether they agreed or disagreed with each statement on a five-level Likert scale, detailed as follows:
1. Strongly disagree
2. Disagree
3. Neither agree nor disagree
4. Agree
5. Strongly agree
The survey was anonymous, and students responded voluntarily through Google Forms.
Results and Discussion
Effectiveness of ASR and Quantitative Assessment
The three trials in our research project presented some interesting preliminary results. Firstly, our research suggests that ASR is an effective way to assess students’ speaking and reading proficiency in both interpretive and presentational communication skills. Secondly, it instantly provides teachers with specific information about imperfections in student pronunciation, so that students receive prompt feedback from the teacher through ASR interaction, which in turn supports their learning. Students also become more attentive to their speech and more motivated to correct themselves before submitting their work. This resonates with other research aims such as “enhancing the learning in conversation courses designed to develop spontaneous second language (L2) oral proficiency” (Miller, 2013). Lastly, this application of ASR technology in the Chinese classroom supplies the rapid quantitative measurement method that has historically been missing from the assessment of oral Chinese. To our knowledge, this is the first time that a teacher has been able to score students’ oral expression from ASR text output. It would be a welcome addition to the Computerized Assessment of Proficiency (CAP), which was designed to measure proficiency in Chinese reading, listening, writing, and speaking based on the underlying principles of the Standards-based Measurement of Proficiency (STAMP) (Clark, 2009).
Our research results supported both of our hypotheses. A statistical analysis and a performance chart were constructed according to the designed test model and the collected data, as shown in Table 2. These results suggest that there is a clear distinction between traditional assessment and ASR assessment. Traditionally, teachers had no quantitative method for evaluating student pronunciation other than their own listening judgment. With ASR, however, the performance of both Group I and Group II was measured quantitatively without relying on the teacher’s listening judgment. The text converted from speech was graded quantitatively according to its correctness rate. The research results and students’ achievements showed that ASR ratings of fluency in speech were reliable and effective. Among the 92 sentences produced by the nine students, the correlation with the correctness rate varied between 0.9 and 0.98, as shown in Figure 1. The accuracy of speech performance showed an upward trend from the first to the last trial.
Table 2. Results of all three trials over a 45-day period using ASR input
Group | Student | First Trial Incorrectness Rate | Second Trial Incorrectness Rate | Third Trial Incorrectness Rate | Total number of incorrect characters per student |
Group I | Student 1 | 0/13 | 0/20 | 0/16 | 0/49 |
Group I | Student 2 | 0/0 | 1/20 (0.05) | 3/18 (0.166) | 4/38 (0.105) |
Group I | Student 3 | 0/33 | 1/20 (0.05) | 0/32 | 1/85 (0.012) |
Group I | Student 4 | 7/39 (0.179) | 2/20 (0.1) | 1/25 (0.04) | 10/84 (0.119) |
Group I | Student 5 | 0/49 | 0/0 | 0/23 | 0/72 |
Group I | Student 6 | 1/11 (0.09) | 0/20 | data not available | 1/33 (0.03) |
Group I | Student 7 | N/A | N/A | N/A | |
Group II | Student 1 | 8/15 (0.53) | 8/29 (0.275) | 0/9 | 16/53 (0.301) |
Group II | Student 2 | 2/29 (0.069) | 1/107 (0.009) | | 3/136 (0.022) |
Group II | Student 3 | 0 | 0 | 0 | 0 |
Group II | Student 4 | 2/10 (0.2) | 0/0 | 0/26 | 2/36 (0.056) |
Groups I & II | Total incorrectness rate during each trial | 20/189 (0.106) | 13/236 (0.055) | 4/123 (0.033) | 37/586 (0.063) |
Group | Student | Chinese Classical Poem | Weather Inquiry | Library Location Inquiry | Correctness Rate | Degree of Involvement |
Group III | Student 1 | 0/4 | 0/2 | 0/1 | 100% | 100% |
Group III | Student 2 | 1/4 | 0/2 | 0/1 | 86% | 100% |
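As a worked illustration of how a pooled incorrectness rate like those in the totals row of Table 2 can be obtained, the sketch below sums incorrect characters and total characters across students before dividing. It uses the second-trial entries of Table 2 that are reported as fractions; entries without a character count (Group I Student 7 and Group II Student 3) are left out, which is an assumption made here for illustration. The function name is hypothetical.

```python
from typing import List, Tuple


def pooled_rate(results: List[Tuple[int, int]]) -> float:
    """Pool (incorrect, total) character counts across students."""
    incorrect = sum(errors for errors, _ in results)
    characters = sum(total for _, total in results)
    if characters == 0:
        raise ValueError("no characters were produced in this trial")
    return incorrect / characters


# Second-trial entries from Table 2 as (incorrect characters, total characters).
second_trial = [
    (0, 20), (1, 20), (1, 20), (2, 20), (0, 0), (0, 20),  # Group I, Students 1-6
    (8, 29), (1, 107), (0, 0),                            # Group II, Students 1, 2, 4
]
print(round(pooled_rate(second_trial), 3))  # 0.055, i.e. 13/236
```

Pooling the counts weights each student by the number of characters produced, so a student who spoke 107 characters influences the result more than one who spoke 20; averaging the per-student rates instead would weight every student equally.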
Figure 1. The Correlation of Trials and Accuracy Rating
Students’ Motivation and Engaging Learning Environment
The experimental results showed that, with the assistance of ASR, students with less than one year of Chinese study could convert relatively complex speech into text, a performance that, according to the teacher’s observation, surpassed that of students who had studied Chinese for years without ASR assistance in the classroom. This can become a positive motivation for students starting a one-year language-study program to use practical spoken language in real-life communication.
Table 3. Likert Scale Survey Result of Using ASR in Chinese Classroom
Six Survey Questions (number of students selecting each Likert level) | 1 | 2 | 3 | 4 | 5 |
Q1: I like to use ASR because it is fun to use | 1 | 2 | 1 | 4 | 5 |
Q2: I like to use ASR because it is easy to use | 1 | 2 | 3 | 4 | 4 |
Q3: I communicate more frequently with the teacher by using ASR | 3 | 3 | 2 | 1 | 4 |
Q4: It improved my Chinese pronunciation | 3 | 1 | 2 | 1 | 6 |
Q5: It increased my interest of studying foreign language | 3 | 3 | 1 | 2 | 4 |
Q6: I will use ASR more often in learning Chinese | 4 | 0 | 0 | 3 | 6 |
The survey results showed the degree of enthusiasm of participants in using ASR in their Chinese language classroom (see Table 3). The results supported the beliefs we held prior to the experiment: using mobile devices in the classroom is easier and more efficient than using computers, and ASR enables teachers to provide instant feedback and thus increases teacher-student interaction. According to the survey, students enjoyed using ASR to submit their oral assignments, and the teacher could easily measure their performance accurately and quantitatively. ASR applied in the classroom enhanced student engagement with human-machine conversation.
Moreover, over 65% of students responded that ASR is fun and easy to use and that they are engaged when using it. 69% of students responded that they want to use ASR more often in learning Chinese. In Group III, after the ASR method was introduced to the two students who had learned Chinese part-time for over six years, they voluntarily asked Siri numerous questions in Chinese. This served the purpose of this research well: creating an engaging learning environment and stimulating students’ interest in practicing the Chinese language.
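The 69% figure can be reproduced from the Q6 row of Table 3 with a short calculation. The sketch below assumes that "agreement" means selecting level 4 or 5 on the Likert scale; the function name is illustrative.

```python
from typing import Sequence


def agreement_share(counts: Sequence[int]) -> float:
    """Share of respondents choosing level 4 or 5, given counts for levels 1-5."""
    if len(counts) != 5:
        raise ValueError("expected one count per Likert level (1-5)")
    total = sum(counts)
    if total == 0:
        raise ValueError("no responses recorded")
    return (counts[3] + counts[4]) / total


q6 = [4, 0, 0, 3, 6]  # Q6: I will use ASR more often in learning Chinese
print(f"{agreement_share(q6):.1%}")  # 69.2%
```

The same function can be applied to the other rows of Table 3 to compare agreement across the six statements.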
Concerns and Considerations
Some concerns arose during the research. Firstly, the amount of time allotted to ASR in the classroom must be carefully planned in order to minimize the distraction to which teenagers are commonly prone. Secondly, although mobile devices have become widely accessible, what if students do not have one? Can we use alternatives such as Chromebooks, iPads, or computers? Will differences between teaching platforms cause classroom time management issues? Thirdly, as reflected by their responses to survey question #4, students were not sure whether ASR could help them improve their pronunciation. ASR technologies are intelligent enough to correct phonetic errors automatically. Even if a student’s pronunciation of the four tones is not sufficiently accurate, ASR may still be able to interpret the speech and produce correct text output. This may not be a positive influence, because some students may stop striving for more accurate pronunciation in the long run. Another potential negative effect of using ASR is that it may decrease students’ desire to improve their writing proficiency.
A Surprising Result
One thing that surprised us was that the responses to survey question #3 (“I communicate more frequently with the teacher by using ASR”) were evenly spread, meaning that there was no strong agreement among students on this statement. In reality, however, ASR did increase the quality and quantity of the students’ interactions with the teacher. Throughout the experiment, students actively participated in all exercises and assessments. They were highly motivated, and some even took the initiative to communicate in Chinese, which had not been the case before the ASR exercises. In the past, these students would use Chinese to communicate only when they were required to. During and after this experiment, they began using Chinese to respond whenever they could, even to prompts in English.
Conclusions
Our research indicates that ASR can support real-time evaluation, error identification, and self-correction of L2 learners’ speech. In this experiment, 27 authentic sentences were used 92 times among ten students in Group I and Group II (one student did not participate in the three trials because of surgery, but showed high performance in a separate evaluation using ASR afterwards). The three trials were spread over a 45-day period to demonstrate the correlation between trials and correctness rate. Improvement in pronunciation was detected through the positive relationship between the number of attempts and the resulting accuracy. Using ASR as an assessment tool to engage students in developing communicative proficiency in the target language was also successful. The goal of creating an engaging learning environment, and thereby increasing students’ interest in learning Chinese through ASR assessment in the Chinese language classroom, has therefore been achieved. At present, iFLYTEK’s ASR technology has been applied in 94% of primary and middle schools in Singapore (Guanchazhe, 2017). In the United States, we tested ASR on the mobile phones used by American secondary school students to increase the effectiveness of Chinese language study. In addition, we can continue to apply ASR technology in the classroom to assist language study through authentic conversation with voice input. In the past, students who did not know certain Chinese characters had very limited resources for doing research on their own; with the new ASR technology, they can hold simple conversations with Siri or other ASR applications to do their own research. This will be particularly helpful for first-year learners with limited Chinese proficiency. Looking ahead to future studies, voice input technology is expected to become considerably better developed.
To foster 21st-century students, we educators have the obligation of embedding the latest technology in our classrooms, so that we may improve the effectiveness of teaching and learning in the AI-voice command era.
References
Clark, M. (2009). Chinese Computerized Assessment of Proficiency (CAP). CASLS Technical
Report 2010-1. Retrieved from https://casls.uoregon.edu/cap/TechReport/Chinese.pdf
Coniam, D. (1998). The Use of Speech Recognition Software as an English Language Oral
Assessment Instrument: An Exploratory Study. CALICO Journal, 15(4), 7-23. Retrieved from https://www.jstor.org/stable/24147601?seq=1#page_scan_tab_contents
Cucchiarini, C., Strik, H., & Boves, L. (2000). Quantitative assessment of second language
learners’ fluency by means of automatic speech recognition technology. The Journal of
the Acoustical Society of America, 107, 989. Retrieved from
http://asa.scitation.org/doi/abs/10.1121/1.428279
Guanchazhe (Observer). (2017). iFLYTEK announced the release of six education products
using intelligent technologies, including ASR, and the collection of 35 billion samples
of learning data.
Retrieved from http://tech.163.com/17/0309/11/CF36M23100097U7T.html
iFLYTEK official web. (2018) Retrieved from
http://www.iflytek.com/en/content/details_10_1681.html
Iwashita, N., Brown, A., & McNamara, T. (2008). Assessed Levels of Second Language Speaking Proficiency: How Distinct? Oxford University Press 2008. Retrieved from https://eclass.uoa.gr/modules/document/file.php/ENL264/testing%20speaking.pdf
Miller, J. S. (2013). Improving oral proficiency by raising metacognitive awareness with
recordings. In J. Levis & K. LeVelle (Eds.). Proceedings of the 4th Pronunciation in
Second Language Learning and Teaching Conference. Aug. 2012. 101-111.
Retrieved from https://apling.engl.iastate.edu/alt-content/uploads/2015/05/PSLLT_4th_Proceedings_2012.pdf
Oremus, W. (2014). I Didn’t Type This Article. Retrieved from
Peng, Y. (2017). iFLYTEK announced the release of six education products using intelligent
technologies, including ASR, and the collection of 35 billion samples of learning data.
Retrieved from https://www.jiemodui.com/N/85948.html
Ren, B. (2017, November 11). From Siri to IDF: Speech recognition is changing who?
Autohome (Car Home). Retrieved from
https://www.autohome.com.cn/user/201711/909060.html
Snyder, C., Besozzi, D., Lawrence, P., & Oppenlander, J. (2016). Is Flipping Worth the Fuss: A
Mixed Methods Case Study of Screencasting in the Social Studies Classroom. American
Secondary Education, 45(1).
Snyder, C., Lawrence, M. P. & Besozzi, D. (2014). Cast from the Past: Using Screencasting in
the Social Studies Classroom. The Social Studies, DOI: 10.1080/00377996.2014.951472
Link: http://dx.doi.org/10.1080/00377996.2014.951472
Vanian, J. (2017). Google Challenges Apple's Siri in Dictating Messages. Fortune February 23,
2017. Retrieved from http://fortune.com/2017/02/23/google-iphone-keyboard-voice/
Appendix A
The Response Percentage of Students’ Survey
Appendix B
Examples of Chinese Sentences, Dialogue, Classical Poem Used in the Experiment
我叫…, 很高興認識你。
你在哪兒工作?
圖書館在哪兒?
這個周末你想做什麽?看電影還是去跳舞?
請進,請進,快進來!
我來介紹一下。這是我的同學……
你想喝點什麽?咖啡還是茶?
周末你忙嗎?我想請你去看電影。
什麽電影?
美國電影。
7:30可以。
登鸛雀樓
(唐) 王之渙
白日依山盡,
黃河入海流。
欲窮千里目,
更上一層樓。