by Yu Jingsong
Software and Microelectronics School of Peking University, Beijing, China
At the end of 2006, the Software and Microelectronics School of Peking University established a new Master's program in computer-aided translation (CAT), which covers translation education as well as the theory and practice of CAT tools. We want our students to adopt modern information technologies actively in their professional work, because in our view taking a few IT courses, such as basic programming, natural language processing and information publishing technologies, is not enough. The ideal pedagogy should immerse students in a technical learning environment in all their translation courses and translation practice throughout their studies.
The Computer Aided Translator Training Platform (CATTP) is an online learning management system with an emphasis on translation education. It integrates translation memory techniques, corpus linguistics methods, electronic dictionaries, termbases and a knowledge base of translation cases into a single system. We have also implemented several specialized tools that use these online reference resources for teaching and learning activities. The toolkit includes an online translation assignment tool, a manual and automatic error annotation tool, statistical error reports (e.g. class-wide or personal periodic comparisons), a student peer review module, a discussion board/wiki tool for group translation projects, an online translation quiz tool and more. Customized for translation studies, these tools support and extend traditional translation curriculum objectives and engage students in collaborative, interactive learning, whether in distance education or as supplements to classroom courses.
2. INFRASTRUCTURE OF THE CATTP SYSTEM
The CATTP enriches current translation pedagogy and substantially changes the traditional shape of translation education. To obtain a platform that is easy and fast to develop on, with a strong code base for continuous progress, we chose Sakai, a well-known open-source course management system, as our starting point. The CATTP inherits the functions of the mature Sakai system, and the original Sakai modules, e.g. the grade book or student enrollment modules, remain available as administrators or instructors require. We develop our modules and plug-ins in strict conformity with Sakai's coding conventions and modify Sakai itself as little as possible, so that when Sakai is upgraded we can upgrade the CATTP easily and quickly.
Most of the online language resources are stored in a backend database, together with an authentication module and information retrieval tools that perform full-text and metadata search; we call this component the language server. Equipped with efficient statistical analysis functions, the language server works on the translation case knowledge base, the learner corpus and other learning materials, while the Sakai modules we designed connect the language server to the Sakai learning environment and handle the user interface.
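As a rough illustration of combining full-text and metadata search over stored resources, the following Python sketch builds a small in-memory inverted index. It is not the language server's implementation: the class and field names are hypothetical, authentication is omitted, and the simple regex tokenizer would need a proper word segmenter (or character n-grams) for Chinese text.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)


class LanguageServerIndex:
    """Hypothetical toy index combining full-text and metadata search."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)  # token -> ids of documents containing it

    @staticmethod
    def tokenize(text):
        # Lower-cased \w+ runs; unsegmented Chinese would need a segmenter instead.
        return re.findall(r"\w+", text.lower())

    def add(self, doc):
        self.docs[doc.doc_id] = doc
        for token in self.tokenize(doc.text):
            self.postings[token].add(doc.doc_id)

    def search(self, query, **filters):
        tokens = self.tokenize(query)
        if not tokens:
            return []
        # AND semantics: a document must contain every query token.
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        # Metadata filter: every keyword argument must match exactly.
        return sorted(
            doc_id for doc_id in hits
            if all(self.docs[doc_id].metadata.get(k) == v for k, v in filters.items())
        )


index = LanguageServerIndex()
index.add(Document("c1", "Translating idioms in business letters", {"language_pair": "en-zh"}))
index.add(Document("c2", "Idioms in legal texts", {"language_pair": "en-fr"}))
print(index.search("idioms", language_pair="en-zh"))  # ['c1']
```

Without filters, the same query returns both cases; adding a metadata filter narrows the full-text hits.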
Standardized XML data exchange protocols and web services technologies make the system architecture very flexible: either the user interface modules or the backend database can be replaced or upgraded freely without affecting the rest of the system.
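This kind of decoupling rests on both sides depending only on a shared service contract. The minimal Python sketch below assumes a hypothetical `CaseRepository` contract (it is not the CATTP's actual web service interface); the frontend function `render_case` works unchanged with either backend.

```python
from typing import Protocol


class CaseRepository(Protocol):
    """Hypothetical service contract shared by frontend and backend."""

    def get_case(self, case_id: str) -> str: ...
    def put_case(self, case_id: str, xml: str) -> None: ...


class InMemoryRepository:
    """One backend: keeps cases in a dict."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get_case(self, case_id: str) -> str:
        return self._store[case_id]

    def put_case(self, case_id: str, xml: str) -> None:
        self._store[case_id] = xml


class AuditedRepository:
    """A drop-in replacement: same contract, but records every call."""

    def __init__(self, inner: CaseRepository) -> None:
        self._inner = inner
        self.log: list[tuple[str, str]] = []

    def get_case(self, case_id: str) -> str:
        self.log.append(("get", case_id))
        return self._inner.get_case(case_id)

    def put_case(self, case_id: str, xml: str) -> None:
        self.log.append(("put", case_id))
        self._inner.put_case(case_id, xml)


def render_case(repo: CaseRepository, case_id: str) -> str:
    # Frontend code depends only on the contract, not on a concrete backend.
    return f"<div class='case'>{repo.get_case(case_id)}</div>"


backend = InMemoryRepository()
backend.put_case("c1", "<case/>")
audited = AuditedRepository(backend)
print(render_case(backend, "c1"))
print(render_case(audited, "c1"))  # same output, different backend
```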
We note that the continually expanding language resources require our web, corpus and database systems to respond to users quickly. The two-tier infrastructure of the CATTP, a backend language server plus a frontend built on the mature open-source Sakai course management system, makes it possible to scale up with the system load and the volume of data resources. As long as the web services interface remains unchanged, we can always choose the most suitable technology to optimize the performance of each component.
3. THE KNOWLEDGEBASE OF TRANSLATION CASES
New translation teachers often find it very difficult to choose proper materials and use them effectively, even when they are already professional translators, and they have to learn from experienced translation instructors. Knowledge of translation teaching therefore needs to be disseminated within the institution and across the community of translation educators. In the CATTP system we have tried to create a convenient Web 2.0-style collaborative working environment for translation instructors.
First, translation cases are stored in the knowledge base with a set of descriptive metadata so that they can be retrieved and displayed later. Using a flexible web-based tool, any instructor can create translation cases, which are then transformed into a special XML format and stored in the knowledge base.
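The XML format of a translation case is not specified here, so the following Python sketch only shows the general idea of round-tripping a case and its metadata through XML with the standard library's `xml.etree.ElementTree`. The element names `translationCase`, `metadata`, `field`, `source` and `target` are invented for illustration.

```python
import xml.etree.ElementTree as ET


def case_to_xml(case_id, metadata, source, target):
    """Serialize a translation case (hypothetical element names) to an XML string."""
    root = ET.Element("translationCase", id=case_id)
    meta = ET.SubElement(root, "metadata")
    for name, value in metadata.items():
        ET.SubElement(meta, "field", name=name).text = value
    ET.SubElement(root, "source").text = source
    ET.SubElement(root, "target").text = target
    return ET.tostring(root, encoding="unicode")


def xml_to_case(xml_text):
    """Parse the XML produced by case_to_xml back into a plain dict."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "metadata": {f.get("name"): f.text for f in root.find("metadata")},
        "source": root.findtext("source"),
        "target": root.findtext("target"),
    }


xml_text = case_to_xml("case-001", {"creator": "Li", "language_pair": "en-zh"},
                       "Time is money.", "时间就是金钱。")
print(xml_text)
```

Keeping metadata as generic name/value fields lets new descriptors be added without changing the schema, at the cost of weaker validation.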
Reusing a translation case is easy. Instructors can browse the cases in the knowledge base or use a search engine to find the cases they need for a class. Once they decide to use a case, they can print a hard copy as a handout or automatically create a hyperlink with a short summary in the CATTP system. When students click the hyperlink, a window pops up showing the designated parts of the translation case.
The copyright holder can set the reuse scope of translation cases and other learning materials to private, limited or public. For example, if an instructor writes a translation case and sets it as limited, his colleagues in the institute will be able to see it. “We provide, we share and we enjoy the fruits.” Any instructor can reuse non-private translation cases in class to improve both their teaching practice and their students' performance.
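A visibility check for the three sharing levels might look like the Python sketch below. It assumes that "limited" means visible to users from the owner's institution, as in the example above; the `User` and `Resource` types are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    PRIVATE = "private"  # only the owner
    LIMITED = "limited"  # assumed: users from the owner's institution
    PUBLIC = "public"    # every user of the system


@dataclass(frozen=True)
class User:
    name: str
    institution: str


@dataclass(frozen=True)
class Resource:
    owner: User
    visibility: Visibility


def can_view(user, resource):
    """Return True if `user` may see and reuse `resource`."""
    if user == resource.owner or resource.visibility is Visibility.PUBLIC:
        return True
    if resource.visibility is Visibility.LIMITED:
        return user.institution == resource.owner.institution
    return False  # PRIVATE and not the owner


alice, bob = User("alice", "PKU"), User("bob", "PKU")
carol = User("carol", "Another University")
case = Resource(owner=alice, visibility=Visibility.LIMITED)
print(can_view(bob, case), can_view(carol, case))  # True False
```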
We have also defined a set of metadata to facilitate reuse of the knowledge base. Generally, the more specific the metadata, the more useful it is; however, detailed metadata may impose an excessive burden on instructors. Beyond some very common fields (creator, source, language pair, copyright, style and so on), the tagging technology used by Web 2.0 sites (www.flickr.com, for example) is a preferable choice. Instructors assign free-form terms or keywords to a case in particular fields, e.g. grammar points, and other instructors and students can add further tags if they wish. All tags are counted to generate a “tag cloud” or “tag list”, which helps users choose appropriate words as tags and find the hot topics in translation study.
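Generating a tag cloud amounts to counting tags and mapping counts to display sizes. The Python sketch below uses `collections.Counter` with a simple linear size scale; the normalization (trimming and lower-casing) and the 1 to 5 size range are illustrative assumptions.

```python
from collections import Counter


def tag_cloud(tags_by_case, top_n=10, min_size=1, max_size=5):
    """Return (tag, count, size) triples for the most frequent tags.

    Sizes are scaled linearly between min_size and max_size (an arbitrary choice).
    """
    counts = Counter(
        tag.strip().lower()  # normalize so "Idiom" and "idiom" merge
        for tags in tags_by_case.values()
        for tag in tags
    )
    top = counts.most_common(top_n)
    if not top:
        return []
    highest, lowest = top[0][1], top[-1][1]
    spread = (highest - lowest) or 1  # avoid division by zero when counts tie
    return [
        (tag, n, min_size + round((n - lowest) * (max_size - min_size) / spread))
        for tag, n in top
    ]


tags = {"c1": ["idiom", "grammar"], "c2": ["Idiom", "passive voice"], "c3": ["idiom", "grammar"]}
print(tag_cloud(tags))
# [('idiom', 3, 5), ('grammar', 2, 3), ('passive voice', 1, 1)]
```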
The translation cases in the knowledge base are not simply stored and retrieved: all instructors can continually modify and improve them as they use them. The modification history and each user's contributions are recorded, and the administrator or other instructors can roll back any modification to a previous version. When instructors use a translation case online, the students' responses, comments and discussions are recorded and attached to the case. In short, we have created a wiki-like translation case knowledge base in the CATTP system. We encourage instructors to use the CATTP as a teaching aid in their translation classes, and we are convinced that it will help them considerably.
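A wiki-like case that records every revision with its author, and implements rollback by appending a copy of an earlier revision, could be modeled as in the Python sketch below. It is a simplified illustration with hypothetical names, not the CATTP's storage model.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    author: str
    text: str


@dataclass
class WikiCase:
    """Translation case whose edits are kept as an append-only revision history."""
    revisions: list = field(default_factory=list)

    def edit(self, author, text):
        self.revisions.append(Revision(author, text))

    @property
    def current(self):
        return self.revisions[-1].text if self.revisions else ""

    def revert(self, author, version):
        # Rolling back appends a copy of the old revision, so no history is lost.
        self.edit(author, self.revisions[version].text)

    def contributors(self):
        return sorted({rev.author for rev in self.revisions})


case = WikiCase()
case.edit("wang", "Draft analysis of the passage")
case.edit("li", "Revised analysis with a new example")
case.revert("admin", 0)
print(case.current)  # Draft analysis of the passage
```

Because a rollback is itself a new revision, the record of who changed what remains complete.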
4. FLEXIBLE LEARNING ENVIRONMENT
We have developed several innovative online teaching tools that greatly support collaborative study. The CATTP system provides practical modules for the diverse requirements of translation study, including an online translation assignment tool, a group translation project tool, a student peer review tool and a discussion board/wiki tool. Unlike the corresponding Sakai tools, these modules support at least one language pair and can connect to and communicate with the backend CATTP language server.
Because the CATTP system is based on the Sakai course management system and we did not remove any of the original teaching tools, those tools can still be used in online translation courses. In fact, an instructor who wants to run a complete online translation course smoothly on the CATTP must use some of them; the grade book, for example, is a must-have tool.
Used wisely, these tools enable some very useful teaching patterns for translation education. For example, an assignment can be given two due dates. Before the first, students submit their own translations to the CATTP drop box; each student then receives two or three randomly selected pieces of other students' work for proofreading. Students annotate the errors they find and write comments using the same online tools as their instructor. After the second due date, the instructor and teaching assistants grade both the original assignments and the students' peer reviews.
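One way to distribute the peer-review work described above, so that nobody reviews their own translation and every submission receives the same number of reviews, is to shuffle the class and let each student review the next k students in circular order. The Python sketch below shows that scheme; it is an assumption, not necessarily how the CATTP assigns work, and `k` stands for the two or three pieces of work mentioned above.

```python
import random


def assign_peer_reviews(students, k=2, seed=None):
    """Map each student to k classmates whose submissions they will review.

    After a random shuffle, student i reviews students i+1 .. i+k (circularly),
    so no one reviews their own work and every submission gets exactly k reviews.
    """
    if k >= len(students):
        raise ValueError("k must be smaller than the number of students")
    order = list(students)
    random.Random(seed).shuffle(order)  # seeded for reproducible assignments
    n = len(order)
    return {order[i]: [order[(i + j) % n] for j in range(1, k + 1)] for i in range(n)}


students = ["an", "bo", "chen", "du", "fang"]
plan = assign_peer_reviews(students, k=2, seed=7)
for reviewer, authors in plan.items():
    print(reviewer, "reviews", authors)
```

The circular pattern trades some randomness for a guaranteed even workload, which a purely random draw would not give.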
With the help of the CATTP system, this procedure can greatly stimulate students' creative thinking and maximize the value of a single assignment. Nevertheless, it does not place much additional burden on instructors, because the errors annotated by students and instructors are automatically compared and highlighted to assist the instructor's work. Error rates are also calculated and recorded for each student. When students get their work back, all translated sentences and errors are displayed clearly, color-coded by annotation source and error type.
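The automatic comparison of student and instructor annotations can be approximated with set operations, as in the Python sketch below. Each annotation is assumed to be a hypothetical (sentence_id, start, end, error_type) tuple and only exact matches count as agreement; a real comparison would likely need to tolerate overlapping spans. The error rate is defined here, by assumption, as errors per translated sentence.

```python
def compare_annotations(student, instructor):
    """Split error annotations into agreed, missed and extra groups.

    Each annotation is a hypothetical (sentence_id, start, end, error_type) tuple;
    only exact matches count as agreement in this simplified sketch.
    """
    student, instructor = set(student), set(instructor)
    return {
        "agreed": sorted(student & instructor),
        "missed": sorted(instructor - student),  # found by the instructor only
        "extra": sorted(student - instructor),   # flagged by the student only
    }


def error_rate(errors, sentence_count):
    """Errors per translated sentence (an assumed definition)."""
    return len(errors) / sentence_count if sentence_count else 0.0


instructor = [(1, 0, 5, "omission"), (2, 3, 8, "wrong_term")]
student = [(1, 0, 5, "omission"), (3, 0, 2, "grammar")]
result = compare_annotations(student, instructor)
print(result["missed"])                           # [(2, 3, 8, 'wrong_term')]
print(error_rate(instructor, sentence_count=4))   # 0.5
```

The three groups map naturally onto the color coding by annotation source: agreed, instructor-only and student-only errors.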
By working on a single assignment, students can learn several different correct methods and techniques for translating the same sentence, identify more of the mistakes they are likely to make and take precautions accordingly. We believe this is an efficient and quick way to train students to become professional translators, and one that would be almost impossible to put into practice without the CATTP system.
Another example is the translation wiki tool. This fully customized wiki plug-in for Sakai lets anyone enter a source-language passage and contribute multiple target-language translations along with discussions and comments. As in ordinary wiki software, anyone can add a new translation unit or modify existing versions on a wiki page, and discuss the strengths and weaknesses of others' work.
Comparative analysis is an important research skill in translation study. With the CATTP system, instructors can add this wiki tool to their course sites, post original passages on a topic for the students to work on, and then monitor each student's participation and contribution.
The translation wiki tool can also serve as an assignment tool for group projects, giving student groups an open space to contribute and share their ideas. We find it much more effective than face-to-face group meetings and other communication methods such as email or instant messaging (AOL, MSN and others). At the due date, the wiki pages contain the final results and all intermediate versions remain in the history section, so instructors can trace each student's contribution.
We are now developing automatic alignment and comparison utilities for these teaching tools to help students and instructors identify the differences among versions more accurately and quickly.
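Python's standard `difflib` module already provides a basic comparison primitive of this kind. The sketch below reports word-level changes between two versions of a translation. It is only a starting point, not the utility under development; Chinese text would be compared with `list(text)` (character by character) or after word segmentation rather than with `split()`.

```python
import difflib


def word_diff(old, new):
    """List (operation, old_words, new_words) for each changed region."""
    a, b = old.split(), new.split()
    changes = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if op != "equal":  # skip unchanged runs; keep replace/delete/insert
            changes.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


print(word_diff("the cat sat on the mat", "the dog sat on a mat"))
# [('replace', 'cat', 'dog'), ('replace', 'the', 'a')]
```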
5. THE EMBEDDED TRANSLATOR LEARNERS’ CORPORA IN CATTP
After assignments and quizzes have been graded by the teachers, these materials, together with the annotations of translation errors, comments and alignment information, are transferred to the backend language server to build a scalable translation learner corpus. Instructors can then conduct further analysis at the personal, class, institution or even public level. The language server supports full-text information retrieval, sentence similarity calculation and statistical analysis on parallel corpora of roughly 1 to 100 million sentence pairs, i.e. gigabytes of text.
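The sentence similarity measure used by the language server is not specified. As one common, segmentation-free option that works for Chinese as well as English, the Python sketch below computes the Dice coefficient over character bigrams, 2|A ∩ B| / (|A| + |B|) on n-gram multisets.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Character n-grams, ignoring whitespace (convenient for unsegmented Chinese)."""
    text = "".join(text.split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def dice_similarity(s1, s2, n=2):
    """Dice coefficient over character n-gram multisets, from 0.0 to 1.0."""
    a, b = Counter(char_ngrams(s1, n)), Counter(char_ngrams(s2, n))
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    overlap = sum((a & b).values())  # multiset intersection
    return 2 * overlap / total


print(dice_similarity("我爱北京", "我爱上海"))  # one shared bigram out of six
```

At corpus scale, scoring every pair is too slow; an index over n-grams would normally be used to pick candidates first.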
As mentioned above, we have developed an online WYSIWYG annotation and corpus viewing tool for instructors, teaching assistants and students. We have also designed new translation-error tagging schemas for several target languages, especially Chinese. The schemas emphasize ease of use for human annotators while meeting the requirements of automatic statistical analysis, and they can be fine-tuned for different source languages.
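A machine-checkable tagging schema can be as simple as a mapping from coarse categories to allowed fine-grained tags, with every annotation validated against it. The Python sketch below illustrates this; the categories are invented placeholders, not the CATTP schemas.

```python
from dataclasses import dataclass

# Hypothetical two-level schema: coarse category -> allowed fine-grained tags.
ERROR_SCHEMA = {
    "meaning": {"mistranslation", "omission", "addition"},
    "language": {"grammar", "word_choice", "punctuation"},
    "terminology": {"wrong_term", "inconsistent_term"},
}


def is_valid(category, subtype):
    return subtype in ERROR_SCHEMA.get(category, set())


@dataclass(frozen=True)
class ErrorTag:
    category: str
    subtype: str

    def __post_init__(self):
        # Rejecting unknown tags keeps annotations consistent enough to count automatically.
        if not is_valid(self.category, self.subtype):
            raise ValueError(f"unknown error tag: {self.category}/{self.subtype}")

    @property
    def label(self):
        return f"{self.category}/{self.subtype}"


print(ErrorTag("meaning", "omission").label)  # meaning/omission
```

Tuning for a particular source language could then amount to adding or removing subtypes in the mapping.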
The analysis and full-text search modules let instructors and students investigate the corpus to find translation examples or to generate charts of the distribution of error types, incorrect words or incorrect grammar patterns. Such analyses have become popular among translation researchers in recent years; the CATTP, however, enables every authorized user to carry out this research on real-time assignment materials in support of their own work.
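Error-distribution charts start from simple tallies. Assuming annotations are available as (student, error_type) pairs, which is a simplification of the annotated corpus, the Python sketch below produces class-level or per-student counts and converts them to percentages ready for plotting.

```python
from collections import Counter, defaultdict


def error_distribution(annotations, level="class"):
    """Tally error types class-wide or per student.

    `annotations` is assumed to be a list of (student, error_type) pairs.
    """
    if level == "class":
        return Counter(error_type for _, error_type in annotations)
    if level == "personal":
        per_student = defaultdict(Counter)
        for student, error_type in annotations:
            per_student[student][error_type] += 1
        return dict(per_student)
    raise ValueError("level must be 'class' or 'personal'")


def as_percentages(counts):
    """Convert raw counts to percentages rounded to one decimal place."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()} if total else {}


data = [("li", "omission"), ("li", "grammar"), ("wang", "grammar"), ("wang", "grammar")]
print(as_percentages(error_distribution(data)))  # {'omission': 25.0, 'grammar': 75.0}
```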
When mathematical methods and computer algorithms are aptly applied to translation studies, they inspire new thinking and the evolution of pedagogy. In the CATTP system, the dynamically growing, large-scale annotated corpora are even more useful than static corpora: the longer the system is used, the more it yields. We also believe that progress in natural language processing will help us extract more useful information to guide translation teaching and learning.
6. THE ELECTRONIC RESOURCES FOR STUDENT TRANSLATORS
We have tried to simulate a professional translator's workbench in cyberspace. The customized Sakai course management system connects to the backend language server to provide translation memories, parallel corpora, integrated electronic dictionaries and project-based termbank services to instructors and students. When students open the CATTP website, they can consult these all-in-one resources for all kinds of learning activities.
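Translation memory lookup is typically a fuzzy match of a new segment against stored source segments. The Python sketch below scores matches with `difflib.SequenceMatcher.ratio()`; this scoring and the 0.7 threshold are illustrative choices, not how the CATTP or any commercial CAT tool computes match percentages.

```python
import difflib


def tm_lookup(segment, memory, threshold=0.7):
    """Fuzzy-match a segment against a translation memory {source: target}.

    Returns (score, source, target) tuples, best first. The difflib ratio and the
    0.7 threshold are illustrative stand-ins for a CAT tool's match scoring.
    """
    matches = []
    for source, target in memory.items():
        score = difflib.SequenceMatcher(None, segment, source).ratio()
        if score >= threshold:
            matches.append((round(score, 2), source, target))
    return sorted(matches, key=lambda m: m[0], reverse=True)


memory = {
    "Please sign the contract.": "请签署合同。",
    "The weather is nice today.": "今天天气很好。",
}
print(tm_lookup("Please sign this contract.", memory))
# [(0.94, 'Please sign the contract.', '请签署合同。')]
```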
Electronic resources are the new-style reference materials of modern professional translators, and students must become familiar with them. In fact, this was a major reason for developing the CATTP system and using it as the core teaching infrastructure of the Computer Aided Translation Master's program in our institute.
We could have separated the electronic reference resources from the CATTP system, which might have made them somewhat easier to develop or integrate; embedding them, however, allows instructors to control and guide how their students learn to use them.
The CATTP system also encourages students to build up their own electronic resources. For example, students in business translation courses create their own business termbanks and translation memories from their required readings and even from their assignments. These valuable, well-tailored databases can be taken with the students into their future careers, so learning and practice integrate seamlessly in the educational process.
7. IMPROVING THE TECHNICAL COMPETENCE OF STUDENT TRANSLATORS
Since practical translation and interpreting skills are hard to acquire in the classroom, one major benefit of using the CATTP system in translation education is to improve the learning process and help students become qualified professional translators, especially for large-scale language service projects. To this end, mastery of modern information technologies for everyday translation and localization work is increasingly important.
However, if we simply teach students how to operate commercial computer-aided translation software, it is almost impossible for them to understand the principles behind the user interface of a CAT system. We expect our students to take these new information technologies to heart and make the utmost use of them to improve their translation performance. Students who want only to learn a particular tool would probably be better served by a short commercial training course for that software than by our Master's program.
CAT software is generally easier to use for a software engineer than for a translator. Translators sometimes want new features to speed up their work, but unfortunately they find it hard to tell software developers what they really need, or they may not realize that a function they envision cannot be implemented. There is a wide gap between these two groups, and one goal of our CAT Master's program is to bridge it.
CAT software relies on a group of fast-developing technologies. Unless we teach basic and some advanced natural language processing theory along with sufficient practical information technology, students will find it difficult to handle new-generation CAT tools and translation management systems, even if they are at home with the current versions.
We expect students to understand fully how IT engineers think and to facilitate their work by creatively employing new technologies. Even if they cannot design new systems themselves, they should be able to write a viable specification for an ideal system so that others can help them build it. That is why we guide instructors and students to study translation in a technical environment that combines corpus analysis tools, electronic resources and collaboration support. As a result, when they finally begin their real-world jobs, they will be accustomed to translating with electronic tools and to working online as active members of a language service team.
This paper has briefly introduced the CATTP system, which integrates many technologies into translation education. Instructors, however, should not use technology for its own sake; they should find ways to use the CATTP system to enhance instruction and improve students' learning. Technology integration should focus not on the particular technology in use at present, but on the student activities that apply modern technologies to prepare qualified professionals for modern language services.
We are now committed to developing and testing the complete CATTP system, although some key technologies and modules are already used in our teaching activities. We expect a complete system, with all tools working together, to multiply the effect of the separate modules. The first beta version will be online in several months.
After the CATTP system has been fully tested in the Software and Microelectronics School of Peking University, it will be opened to other translation studies institutions for public evaluation.
Becoming proficient in both information technology and translation study involves a learning curve, but once the skills are learned and suitable activities have been identified, located or developed, instructors and students will be reluctant to give them up. We hope the CATTP system will benefit more institutions and help move translation education forward.
[1] Armstrong S., Baudrion P., Kübler N., Popescu-Belis A., Secară A., Thomas M. & Volanschi A. (2006). “Annotation for a Learner Translation Corpus”. TaLC 2006 (7th Conference on Teaching and Language Corpora), Paris, France.
[2] Castagnoli S., Ciobanu D., Kübler N., Kunz K. & Volanschi A. (2006). “Designing a Learner Translator Corpus for Training Purposes”. Presented at TaLC 2006, Paris, France.
[3] Doyle M. S. (2003). “Translation Pedagogy and Assessment: Adopting ATA's Framework for Standard Error Marking”. The ATA Chronicle, Nov/Dec.
[4] Hodász G., Grőbler T. & Kis B. (2004). “Translation Memory as a Robust Example-based Translation System”. The Ninth European Association for Machine Translation Workshop (EAMT-04), Malta.
[5] ISLE (2000). The International Standards for Language Engineering (ISLE) Classification of Machine Translation Evaluations. Marina del Rey, CA: USC Information Sciences Institute.
[6] Johnson D., Johnson R. & Smith K. (1991). Active Learning: Cooperation in the College Classroom. Edina, MN: Interaction Book Co.
[7] Neubert A. (2004). “Case Studies in Translation: The Study of Translation Cases”. Across Languages and Cultures 5(1): 5-21.
[8] Popescu-Belis A. & Estrella P. (2007). “Generating Usable Formats for Metadata and Annotations in a Large Meeting Corpus”. Proceedings of ACL 2007 (45th Annual Meeting of the Association for Computational Linguistics), Interactive Poster and Demonstration Sessions, Prague, Czech Republic, 93-96.
[9] Secară A. (2005). “Translation Evaluation: A State of the Art Survey”. eCoLoRe/MeLLANGE Workshop Proceedings, 39-44.
[10] eCoLoTrain project. http://ecolotrain.uni-saarland.de/index.php?id=702&L=1
[11] MeLLANGE project. http://mellange.eila.jussieu.fr/
[12] Sakai project. http://www.sakaiproject.org/