A Study of Neural Machine Translation from Chinese to Urdu

DOI: https://doi.org/10.32629/jai.v2i4.82

Zeeshan Khan

Abstract

Machine Translation (MT) translates text or speech from a source language into a target language. Word-for-word substitution alone does not produce an accurate translation, because whole expressions must be identified and matched with their closest counterparts in the target language. Neural Machine Translation (NMT) is one of the most widely used machine translation methods and has made great progress in recent years, especially for non-universal languages. However, translation software between local languages and other foreign languages remains limited and needs improvement. In this paper, Chinese is translated into Urdu using the Open Neural Machine Translation (OpenNMT) deep learning toolkit. First, a Chinese-to-Urdu parallel sentence dataset comprising seven million sentences was established. Next, a model was trained on this dataset with OpenNMT. Finally, the output translations were compared with the desired (reference) translations using the BLEU score.
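The final evaluation step can be illustrated with a minimal sketch of a sentence-level BLEU score. This is not the evaluation script used in the paper: it assumes whitespace tokenization, a single reference per sentence, uniform weights over 1- to 4-grams, and no smoothing. The function names `ngrams` and `sentence_bleu` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference.

    Assumption: both strings are already segmented so that splitting on
    whitespace yields meaningful tokens.
    """
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            # Without smoothing, any zero precision gives a score of zero.
            return 0.0
        log_precision_sum += math.log(clipped / total) / max_n

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum)


if __name__ == "__main__":
    hypothesis = "the model translates this sentence well"
    reference = "the model translates this sentence very well"
    print(round(sentence_bleu(hypothesis, reference), 4))
```

In practice BLEU is usually reported at corpus level, aggregating n-gram counts over all test sentences before computing precisions, so scores from this per-sentence sketch are not directly comparable with published figures.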

Keywords

Machine Translation; Neural Machine Translation; Non-Universal Languages; Chinese; Urdu; Deep Learning

Copyright © 2020 Zeeshan Khan

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
