paper/conference_101719.tex: 46 additions & 16 deletions
@@ -46,7 +46,7 @@
46
46
\maketitle
47
47
48
48
\begin{abstract}
49
-
All widely used and useful programming languages have a common problem. They restrict entry on the basis of knowledge of the English language. The lack of knowledge of English poses a major hurdle to many newcomers who do not have the resources, in terms of time and money, to learn the language. Furthermore, studies back up the fact that learning is better when it’s done in the person’s local language. Therefore, we propose a language wrapper built on top of the Python programming language which can be directly used in the native Urdu language. This eliminates the need for any intermediate language as well. In the future, we aim to scale the language to encapsulate more languages to increase the availability of programming.
49
+
%All widely used and useful programming languages have a common problem. They restrict entry on the basis of knowledge of the English language. The lack of knowledge of English poses a major hurdle to many newcomers who do not have the resources, in terms of time and money, to learn the language. Furthermore, studies back up the fact that learning is better when it’s done in the person’s local language. Therefore, we propose a language wrapper built on top of the Python programming language which can be directly used in the native Urdu language. This eliminates the need for any intermediate language as well. In the future, we aim to scale the language to encapsulate more languages to increase the availability of programming.
50
50
\end{abstract}
51
51
52
52
\begin{IEEEkeywords}
@@ -55,15 +55,41 @@
55
55
56
56
\section{Introduction}
57
57
58
-
The first step to better understanding is communication. We use language to communicate concepts and produce thoughts. When teaching and educating, it is preferable to speak in the learner's native language to facilitate the understanding process.
58
+
%The first step to better understanding is communication. We use language to communicate concepts and produce thoughts. When teaching and educating, it is preferable to speak in the learner's native language to facilitate the understanding process.
59
59
60
-
Computer science is a field dominated by the English language. Beginners must be acquainted with English to understand basic logical constructs. Without a real-life translator, learning programming concepts just by looking at English words is often impossible.
60
+
%Computer science is a field dominated by the English language. Beginners must be acquainted with English to understand basic logical constructs. Without a real-life translator, learning programming concepts just by looking at English words is often impossible.
61
61
62
-
Therefore, there is a need for formal translation of programming constructs into other languages and a usable framework for learners to program in their native language.
62
+
%Therefore, there is a need for formal translation of programming constructs into other languages and a usable framework for learners to program in their native language.
63
63
64
64
\section{Related Work}
65
65
66
-
Studies have shown that students face problems learning the syntax and rules of a programming language initially [1]. However, learning in a local language has a significant positive impact on learning outcomes [2]. This supports the idea that programming should not be restricted by a language barrier.
66
+
- High-level programming languages have been a staple for accelerating software and algorithm development since the mid-20th century, and have predominantly been in English.
67
+
68
+
- K-12 students learn better in their local language. Start with general education, then move to computer-related education.
- A Framework for the Localization of Programming Languages
78
+
79
+
...
80
+
81
+
- There have been multiple attempts at non-English (needs a better term, maybe monolingual) programming languages (Russian, Kalaam, chronological order)
82
+
- Issues?
83
+
84
+
- There have been multiple attempts at multilingual programming languages (Hedy, chronological order)
85
+
- Bigger Issues?
86
+
87
+
- There have been multiple attempts at localizing existing programming languages (PseuToPy, UrduScript, Chinese Python)
88
+
- Biggest Issues?
89
+
- Scalability to different languages
90
+
91
+
92
+
% Studies have shown that students face problems learning the syntax and rules of a programming language initially [1]. However, learning in a local language has a significant positive impact on learning outcomes [2]. This supports the idea that programming should not be restricted by a language barrier.
Our framework allows easy support for UniversalPython plugins in existing Python IDEs. It acts as a bridge, translating Urdu code into English before passing it to the IDE.
83
109
84
-
\section{Approach}
85
-
86
-
\subsection{Making the Language}
110
+
\section{Design of UniversalPython}
87
111
88
112
We propose a framework, shown in Fig.~1, which is a wrapper around the Python engine.
89
113
114
+
Throughout this paper, we use \textbf{Urdu}, the national language of Pakistan and a right-to-left language, as the example translation language.
115
+
116
+
The user writes code in Urdu, for example:
117
+
118
+
90
119
This is passed to ``UniversalPython'', which first translates the code to English using lexical analysis and parsing with the PLY library.
91
120
To do this, it first loads the Urdu dictionary, a YAML file that maps each Urdu word to the corresponding Python keyword. Here is an example of such a dictionary:
92
121
In PLY, we can reserve keywords so that the library tokenizes them automatically. We set a grammar rule which, whenever a reserved keyword (i.e. a word that is present as a key in the language dictionary) is tokenized, looks up the token in the language dictionary (key) and replaces it with the corresponding English keyword (value).
93
122
We also set a grammar rule to ignore all content within double or single quotes (i.e. strings and docstrings) and all content in comments (which start with a \#). We achieve this using regular expressions.
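The two rules above (keyword replacement, and skipping strings and comments) can be illustrated with a minimal sketch. It uses Python's standard \texttt{re} module as a stand-in for PLY, so it is not the UniversalPython implementation. The dictionary entries and the names \texttt{URDU\_TO\_PYTHON}, \texttt{TOKEN\_RE} and \texttt{translate} are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical dictionary entries, for illustration only. In the framework
# described here, the mappings are loaded from a YAML language file.
URDU_TO_PYTHON = {
    "اگر": "if",
    "ورنہ": "else",
    "لکھو": "print",
}

# Strings (including triple-quoted docstrings) and comments are listed first,
# so their contents are consumed whole and never inspected for keywords.
STRING = (
    r'"""[\s\S]*?"""|'
    r"'''[\s\S]*?'''|"
    r'"(?:\\.|[^"\\\n])*"|'
    r"'(?:\\.|[^'\\\n])*'"
)
COMMENT = r"#[^\n]*"
WORD = r"\w+"  # \w is Unicode-aware for str patterns, so it matches Urdu words

TOKEN_RE = re.compile(
    f"(?P<string>{STRING})|(?P<comment>{COMMENT})|(?P<word>{WORD})"
)


def translate(source, mapping=URDU_TO_PYTHON):
    """Replace dictionary words with Python keywords, leaving the rest as-is."""

    def replace(match):
        if match.lastgroup == "word":
            word = match.group()
            return mapping.get(word, word)  # unknown words stay untouched
        return match.group()  # strings and comments pass through unchanged

    return TOKEN_RE.sub(replace, source)
```

Any character that matches none of the three alternatives (operators, brackets, whitespace) is left in place by \texttt{re.sub}, which mirrors the goal of preserving the original structure of the code.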
94
123
Urdu digits have their own Unicode code points: ۰ (ASCII 0) is at code point 1776 (U+06F0), while ۹ (ASCII 9) is at 1785 (U+06F9).
95
124
96
-
Keeping this in mind, we define a Regular Expression
97
-
which detects any character whose code point lies in the range 1776--1785, inclusive. These are the Urdu digits. We then use the same language dictionary to translate them into their ASCII equivalents. For example, ۵ becomes 5, ۹۰ becomes 90, ۱۰ becomes 10, and ۲۰۲۲ becomes 2022.
125
+
With this in mind, we define a regular expression that matches any character whose code point lies in the range 1776--1785, inclusive. These are the Urdu digits. We then use the same language dictionary to translate them into their ASCII equivalents. For example, ۵ becomes 5, ۹۰ becomes 90, ۱۰ becomes 10, and ۲۰۲۲ becomes 2022.
98
126
Furthermore, we replace the Urdu full stop and comma with the English period ( . ) and comma ( , ), as these characters differ between Urdu and English.
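The digit and punctuation handling can be sketched as a single character-level translation table. This is an illustrative sketch, not the framework's code: it uses \texttt{str.translate} instead of a regular expression plus dictionary lookup, and it assumes that the punctuation in question is the Urdu full stop (U+06D4) and the Arabic comma (U+060C). The names \texttt{TABLE} and \texttt{normalize} are hypothetical.

```python
# Urdu digits occupy code points 1776 (U+06F0) through 1785 (U+06F9),
# so subtracting 1776 from the code point yields the digit's value.
URDU_DIGITS = {code: str(code - 1776) for code in range(1776, 1786)}

# Assumption: these are the two punctuation marks that get replaced.
URDU_PUNCTUATION = {
    0x06D4: ".",  # Urdu (Arabic) full stop
    0x060C: ",",  # Arabic comma
}

TABLE = {**URDU_DIGITS, **URDU_PUNCTUATION}


def normalize(text):
    """Map Urdu digits and punctuation to their ASCII equivalents."""
    return text.translate(TABLE)
```

Applied to a whole source file, this sketch would also rewrite digits inside strings and comments. A fuller pipeline would presumably apply it only to tokens outside those regions.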
99
127
100
128
The rest of the code remains untouched. Whether it is a symbol such as :, ( or ), or even input that would cause the lexer to fail, we pass it through so that the original structure of the code is preserved as much as possible. Such symbols and errors need no translation anyway: they are meant to be handled by Python, not UniversalPython. Returning to our initial example code, our engine detects and replaces it with print, is replaced with if,
101
129
is replaced with else, and all Urdu digits are replaced with ASCII digits. Hence the translated code would be:
102
130
103
131
The above code is plain Python, which can now be executed directly by the Python engine. The Urdu variable name poses no problem, because Python 3.0 and later accept Unicode identifiers. The code is therefore passed on to the Python engine, which produces some response: output from print statements the user wrote, interpreter warnings, and/or errors. This response is passed back up to UniversalPython, where it is tokenized again so that keywords in any error messages can be translated. In our example, since there are no errors, the response is output as-is. So the response would be:
104
132
105
-
To demonstrate the ease by which plugins can be made for UniversalPython, we made a wrapper for the IPython kernel in which we imported UniversalPython as a package, and processed the code (i.e. translated it from Urdu to English) before it was passed onto IPython. This way, we overrode the
106
-
107
-
functions and achieved a working kernel for UniversalPython. It works line-by-line while maintaining the program memory. This also gave us a visual interface for the language to test it thoroughly.
133
+
To demonstrate the ease with which plugins can be made for UniversalPython, we wrote a wrapper for the IPython kernel in which we imported UniversalPython as a package and processed the code (i.e. translated it from Urdu to English) before it was passed on to IPython. In this way, we overrode the functions and achieved a working kernel for UniversalPython. It works line by line while maintaining the program memory. This also gave us a visual interface with which to test the language thoroughly.
108
134
109
135
The Urdu dictionary, a YAML file containing mappings from Urdu to English keywords, is used for translation. The PLY library allows reserving keywords for automatic tokenization and replacement with their English equivalents. Additionally, grammar rules are set to ignore content within quotes and comments, and to translate Urdu digits (which have their own Unicode code points).
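The overall flow described in this section (translate, hand the result to the Python engine, collect its output) can be sketched as follows. This is a simplified sketch under stated assumptions: \texttt{to\_python} is a naive whole-word replacement standing in for the PLY-based translator, the single dictionary entry is hypothetical, and the translation of error messages back into Urdu is omitted.

```python
import contextlib
import io
import re

# Hypothetical Urdu-to-Python entry, for illustration only.
KEYWORDS = {"لکھو": "print"}


def to_python(source):
    """Naive stand-in for the translation step: whole-word replacement."""
    return re.sub(r"\w+", lambda m: KEYWORDS.get(m.group(), m.group()), source)


def run(source):
    """Translate Urdu-keyword code, execute it, and return what it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # Fresh globals for each call; a kernel-style wrapper would instead
        # reuse one namespace so that program memory persists across lines.
        exec(to_python(source), {})
    return buffer.getvalue()
```

Because the translated text is ordinary Python, identifiers written in Urdu pass through \texttt{exec} unchanged, in line with the Unicode support noted earlier in this section.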
110
136
111
137
\section{Experimentation}
112
138
113
-
We propose the following requirements, which UniversalPython should meet in order to be considered effective:
139
+
We propose the following metrics, which UniversalPython should meet in order to be considered effective:
114
140
115
141
1) Programs that work in Python should be reproducible in UniversalPython, and vice versa.
116
142
117
143
2) UniversalPython should operate at a reasonable speed, one that at least does not disrupt the programmer.
118
144
119
-
3) Converting Plugins made for Python, into Urdu
120
-
Python, should be fairly easy.
145
+
3) UniversalPython should be able to translate from one non-English language to another.
146
+
147
+
4) A benchmark should be made against other existing multilingual and non-English monolingual languages.
148
+
149
+
5) A user experience test should be conducted to find out user acceptability towards a language in their native tongue.
121
150
122
151
\subsection{Benchmarks with Python}\label{AA}
123
152
@@ -159,6 +188,7 @@ \section*{Acknowledgment}
159
188
I would like to acknowledge my mentor, Dr Omer Beg, for always guiding me to the right path during my Bachelor's, professional life, and Master's degree.
160
189
161
190
\begin{thebibliography}{00}
191
+
162
192
\bibitem{b1} G. Eason, B. Noble, and I. N. Sneddon, ``On certain integrals of Lipschitz-Hankel type involving products of Bessel functions,'' Phil. Trans. Roy. Soc. London, vol. A247, pp. 529--551, April 1955.
163
193
\bibitem{b2} J. Clerk Maxwell, A Treatise on Electricity and Magnetism, 3rd ed., vol. 2. Oxford: Clarendon, 1892, pp. 68--73.
164
194
\bibitem{b3} I. S. Jacobs and C. P. Bean, ``Fine particles, thin films and exchange anisotropy,'' in Magnetism, vol. III, G. T. Rado and H. Suhl, Eds. New York: Academic, 1963, pp. 271--350.