Commit 1502e6d

committed
chore: update paper outline
1 parent 0a6f700 commit 1502e6d

5 files changed

+62
-21
lines changed

paper/conference_101719.aux

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6,8 +6,7 @@
66
\@writefile{toc}{\contentsline {section}{\numberline {II}Related Work}{1}{}\protected@file@percent }
77
\@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {II-A}}Multilingual/Localized Programming Languages}{1}{}\protected@file@percent }
88
\@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {II-B}}Multilingual Programming Environments}{1}{}\protected@file@percent }
9-
\@writefile{toc}{\contentsline {section}{\numberline {III}Approach}{1}{}\protected@file@percent }
10-
\@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {III-A}}Making the Language}{1}{}\protected@file@percent }
9+
\@writefile{toc}{\contentsline {section}{\numberline {III}Design of UniversalPython}{1}{}\protected@file@percent }
1110
\@writefile{toc}{\contentsline {section}{\numberline {IV}Experimentation}{2}{}\protected@file@percent }
1211
\@writefile{toc}{\contentsline {subsection}{\numberline {\mbox {IV-A}}Benchmarks with Python}{2}{}\protected@file@percent }
1312
\newlabel{AA}{{\mbox {IV-A}}{2}{}{}{}}

paper/conference_101719.log

Lines changed: 15 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
This is XeTeX, Version 3.141592653-2.6-0.999996 (TeX Live 2024) (preloaded format=xelatex 2024.11.28) 7 DEC 2024 12:51
1+
This is XeTeX, Version 3.141592653-2.6-0.999996 (TeX Live 2024) (preloaded format=xelatex 2024.11.28) 7 DEC 2024 16:32
22
entering extended mode
33
restricted \write18 enabled.
44
file:line:error style messages enabled.
@@ -544,6 +544,10 @@ LaTeX Font Info: Trying to load font information for U+msb on input line 46.
544544
(/usr/local/texlive/2024basic/texmf-dist/tex/latex/amsfonts/umsb.fd
545545
File: umsb.fd 2013/01/14 v3.01 AMS symbols B
546546
)
547+
Underfull \hbox (badness 3058) in paragraph at lines 114--115
548+
[][]\TU/ptm/bx/it/10 Throughout this paper, we take the example of[]
549+
[]
550+
547551
Missing character: There is no ۰ (U+06F0) in font [lmroman10-regular]:mapping=tex-text;!
548552
Missing character: There is no ۹ (U+06F9) in font [lmroman10-regular]:mapping=tex-text;!
549553
[1
@@ -562,7 +566,15 @@ Missing character: There is no ۲ (U+06F2) in font [lmroman10-regular]:mapping=t
562566
Missing character: There is no 􏰁 (U+10FC01) in font [lmroman10-regular]:mapping=tex-text;!
563567
Missing character: There is no 􏰃 (U+10FC03) in font [lmroman10-regular]:mapping=tex-text;!
564568
Missing character: There is no 􏰀 (U+10FC00) in font [lmroman10-regular]:mapping=tex-text;!
565-
[2]
569+
570+
Underfull \hbox (badness 4108) in paragraph at lines 139--140
571+
[][]\TU/ptm/bx/it/10 We propose the following metrics which Univer-[]
572+
[]
573+
574+
575+
Underfull \vbox (badness 2150) has occurred while \output is active []
576+
577+
[2]
566578

567579
** Conference Paper **
568580
Before submitting the final camera ready copy, remember to:
@@ -587,7 +599,7 @@ LaTeX Font Warning: Some font shapes were not available, defaults substituted.
587599
Here is how much of TeX's memory you used:
588600
10058 strings out of 476290
589601
227352 string characters out of 5789827
590-
1949839 words of memory out of 5000000
602+
1947839 words of memory out of 5000000
591603
31899 multiletter control sequences out of 15000+600000
592604
562182 words of font info for 97 fonts, out of 8000000 for 9000
593605
319 hyphenation exceptions out of 8191

paper/conference_101719.pdf

-1.08 KB
Binary file not shown.

paper/conference_101719.synctex.gz

-482 Bytes
Binary file not shown.

paper/conference_101719.tex

Lines changed: 46 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -46,7 +46,7 @@
4646
\maketitle
4747

4848
\begin{abstract}
49-
All widely used and useful programming languages have a common problem. They restrict entry on the basis of knowledge of the English language. The lack of knowledge of English poses a major hurdle to many newcomers who do not have the resources, in terms of time and money, to learn the language. Furthermore, studies back up the fact that learning is better when it’s done in the person’s local language. Therefore, we propose a language wrapper built on top of the Python programming language which can be directly used in the native Urdu language. This eliminates the need for any intermediate language as well. In the future, we aim to scale the language to encapsulate more languages to increase the availability of programming.
49+
% All widely used and useful programming languages have a common problem. They restrict entry on the basis of knowledge of the English language. The lack of knowledge of English poses a major hurdle to many newcomers who do not have the resources, in terms of time and money, to learn the language. Furthermore, studies back up the fact that learning is better when it’s done in the person’s local language. Therefore, we propose a language wrapper built on top of the Python programming language which can be directly used in the native Urdu language. This eliminates the need for any intermediate language as well. In the future, we aim to scale the language to encapsulate more languages to increase the availability of programming.
5050
\end{abstract}
5151

5252
\begin{IEEEkeywords}
@@ -55,15 +55,41 @@
5555

5656
\section{Introduction}
5757

58-
The first step to better understanding is communication. We use language to communicate concepts and produce thoughts. When teaching and educating, it is preferable to speak in the learner's native language to facilitate the understanding process.
58+
% The first step to better understanding is communication. We use language to communicate concepts and produce thoughts. When teaching and educating, it is preferable to speak in the learner's native language to facilitate the understanding process.
5959

60-
Computer science is a field dominated by the English language. Beginners must be acquainted with English to understand basic logical constructs. Without a real-life translator, learning programming concepts just by looking at English words is often impossible.
60+
% Computer science is a field dominated by the English language. Beginners must be acquainted with English to understand basic logical constructs. Without a real-life translator, learning programming concepts just by looking at English words is often impossible.
6161

62-
Therefore, there is a need for formal translation of programming constructs into other languages and a usable framework for learners to program in their native language.
62+
% Therefore, there is a need for formal translation of programming constructs into other languages and a usable framework for learners to program in their native language.
6363

6464
\section{Related Work}
6565

66-
Studies have shown that students face problems learning the syntax and rules of a programming language initially [1]. However, learning in a local language has a significant positive impact on learning outcomes [2]. This supports the idea that programming should not be restricted by a language barrier.
66+
- High-level programming languages have been central to accelerating software and algorithm development since the mid-20th century, and have predominantly been English-based.
67+
68+
- Students in K-12 are able to learn better in their localized language. Start with general education, then with computer-related education.
69+
70+
- https://unesdoc.unesco.org/ark:/48223/pf0000161121
71+
% - https://www.taylorfrancis.com/chapters/edit/10.4324/9781315779560-12/finding-space-non-dominant-languages-education-language-policy-medium-instruction-timor-leste-2000%E2%80%932012-kerry-taylor-leech
72+
% - https://www.researchgate.net/publication/316884125_Learning_to_Code_in_Localized_Programming_Languages
73+
74+
75+
76+
- https://dl.acm.org/doi/abs/10.1145/3051457.3051464
77+
- A Framework for the Localization of Programming Languages
78+
79+
...
80+
81+
- There have been multiple attempts at non-English (needs a better term, maybe monolingual) programming languages (Russian, Kalaam, chronological order)
82+
- Issues?
83+
84+
- There have been multiple attempts at multilingual programming languages (Hedy, chronological order)
85+
- Bigger Issues?
86+
87+
- There have been multiple attempts at localizing existing programming languages (PsueToPy, UrduScript, Chinese Python)
88+
- Biggest Issues?
89+
- Scalability to different languages
90+
91+
92+
% Studies have shown that students face problems learning the syntax and rules of a programming language initially [1]. However, learning in a local language has a significant positive impact on learning outcomes [2]. This supports the idea that programming should not be restricted by a language barrier.
6793

6894
\subsection{Multilingual/Localized Programming Languages}
6995

@@ -81,43 +107,46 @@ \subsection{Multilingual Programming Environments}
81107

82108
Our framework allows easy support for UniversalPython plugins in existing Python IDEs. It acts as a bridge, translating Urdu code into English before passing it to the IDE.
83109

84-
\section{Approach}
85-
86-
\subsection{Making the Language}
110+
\section{Design of UniversalPython}
87111

88112
We propose a framework as shown in “Fig. 1“, which is a wrapper around the Python Engine. The user writes code in Urdu, for example:
89113

114+
Throughout this paper, we use \textbf{Urdu}, the national language of Pakistan and a right-to-left language, as the example source language for translation.
115+
116+
The user writes code in Urdu, for example:
117+
118+
90119
This is passed to “UniversalPython”, which first translates the code to English using Lexical Analysis and Parsing with the PLY library.
91120
To do this, it first loads the Urdu dictionary, a YAML file that maps each Urdu word to its corresponding Python keyword. Here is an example of such a dictionary:
92121
In PLY, we have the option to reserve some keywords so that the library automatically tokenizes them. We set a Grammar Rule which, whenever a reserved keyword (i.e. a word which is present as a key in a language dictionary) is tokenized, simply looks for the token in the language dictionary (key) and replaces it with the corresponding English keyword (value).
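As a rough illustration of the replacement rule described above, the sketch below swaps each dictionary key for its English keyword. It is not the PLY-based implementation: it uses Python's standard `re` module instead, and the Urdu entries in `URDU_TO_PYTHON` as well as the name `translate_keywords` are hypothetical stand-ins for the real YAML dictionary and grammar rule.

```python
import re

# Hypothetical Urdu-to-Python keyword mapping (illustrative only;
# the real mapping would be loaded from the YAML language dictionary).
URDU_TO_PYTHON = {
    "اگر": "if",
    "ورنہ": "else",
    "چھاپو": "print",
}

# \w+ matches Unicode word characters in Python 3, so Urdu words
# are tokenized the same way as ASCII identifiers.
WORD = re.compile(r"\w+")


def translate_keywords(source, mapping=URDU_TO_PYTHON):
    """Replace every word found in the mapping; leave all other text as-is."""
    return WORD.sub(lambda m: mapping.get(m.group(0), m.group(0)), source)
```

Words that are not keys in the mapping, including Urdu variable names, pass through unchanged, which matches the behaviour described for non-reserved tokens.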
93122
We also set a Grammar Rule to ignore all content within double quotes or single quotes (i.e. strings and docstring) and to ignore content in comments (which start with a \#). We achieve this using Regular Expressions.
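One way to picture the rule above is a single regular expression whose first alternative consumes string literals and comments whole, so that only words outside them are looked up. This is a minimal sketch using the standard `re` module rather than PLY; the pattern and the name `translate_outside_literals` are assumptions for illustration and do not cover every Python string form (for example, prefixed or unterminated strings).

```python
import re

# Alternatives that must be copied through untouched.
SKIP = "|".join([
    r'"""[\s\S]*?"""',        # triple double-quoted strings / docstrings
    r"'''[\s\S]*?'''",        # triple single-quoted strings / docstrings
    r'"(?:\\.|[^"\\\n])*"',   # double-quoted strings
    r"'(?:\\.|[^'\\\n])*'",   # single-quoted strings
    r"#[^\n]*",               # comments up to end of line
])

# The skip group is tried first, so words inside literals never reach
# the word group.
TOKEN = re.compile(rf"(?P<skip>{SKIP})|(?P<word>\w+)")


def translate_outside_literals(source, mapping):
    """Translate mapped words, leaving strings and comments unchanged."""
    def replace(match):
        if match.group("skip") is not None:
            return match.group(0)
        word = match.group("word")
        return mapping.get(word, word)

    return TOKEN.sub(replace, source)
```

Here `mapping` is any dictionary from source-language words to Python keywords, such as the one loaded from the language dictionary.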
94123
Urdu digits occupy a contiguous range of Unicode code points: ۰ (Western 0) is at 1776 (U+06F0), while ۹ (Western 9) is at 1785 (U+06F9).
95124

96-
Keeping this in mind, we define a Regular Expression
97-
which detects any symbols equal to or between this range, 1776 - 1785. These are the Urdu digits. We then use the same language dictionary to translate these numbers into roman numerals. For example, ۵ becomes 5, ۹۰ becomes 90, ۱۰ becomes 10, ۲۰۲۲ becomes 2022.
125+
Keeping this in mind, we define a regular expression that matches any character whose code point lies in the range 1776--1785, i.e. the Urdu digits. We then use the same language dictionary to translate these digits into their Western (ASCII) equivalents. For example, ۵ becomes 5, ۹۰ becomes 90, ۱۰ becomes 10, ۲۰۲۲ becomes 2022.
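The digit step can be sketched as follows. The regular expression covers exactly the code points 1776 to 1785 (U+06F0 to U+06F9); the lookup table `DIGIT_MAP` is built in code here as a stand-in for the digit entries of the language dictionary, and the names are assumptions for illustration.

```python
import re

# U+06F0..U+06F9 (decimal 1776..1785): the digits used in Urdu.
URDU_DIGIT = re.compile("[\u06F0-\u06F9]")

# Stand-in for the digit entries of the language dictionary.
DIGIT_MAP = {chr(0x06F0 + i): str(i) for i in range(10)}


def translate_digits(source):
    """Replace each Urdu digit with its ASCII equivalent, one character at a time."""
    return URDU_DIGIT.sub(lambda m: DIGIT_MAP[m.group(0)], source)
```

Because the substitution works per character, multi-digit numbers keep their digit order without any extra handling.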
98126
Furthermore, we also replace all periods ( . ) and commas ( , ), as these look different in Urdu than they do in English.
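A minimal sketch of the punctuation step, assuming the characters in question are the Urdu full stop (U+06D4) and the Arabic comma (U+060C); the paper does not name the exact code points, so both entries and the function name are assumptions.

```python
# Assumed mapping: Urdu full stop and Arabic comma to their ASCII forms.
PUNCTUATION = str.maketrans({"\u06D4": ".", "\u060C": ","})


def translate_punctuation(source):
    """Replace Urdu-style periods and commas with ASCII '.' and ','."""
    return source.translate(PUNCTUATION)
```

In a full pipeline this replacement would have to respect the same string and comment exclusions as keyword translation, otherwise punctuation inside literals would be altered too.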
99127

100128
The rest of the code remains untouched. Whether it is a symbol,like :, ( or ), etc, or even if it causes the lexer to crash, we ignore it so we can preserve the original structure of the code as much as possible. It is not required to translate such symbols and errors anyway: They are meant to be handled by Python, not UniversalPython. Back to our initial example code, our engine detects and replaces it with print , 􏰁􏰃 is replaced with if ,
101129
is replaced with else , and all Urdu digits are replaced with Western (ASCII) digits. Hence the translated code would be:
102130

103131
The above code is essentially vanilla Python code which can now simply be executed by the Python engine. The Urdu variable name 􏰀 does not present any issue to Python, as from Python 3.0 onwards, Unicode is fully supported. So the above code is passed on to the Python engine. The Python engine outputs some response; it can be some print statements which the user entered, compiler/interpreter warnings, and/or errors. This response is passed up to UniversalPython, where again it is tokenized to replace keywords in case of error messages. In our example, since there are no errors, it simply outputs the response as-is. So the response would be:
104132

105-
To demonstrate the ease by which plugins can be made for UniversalPython, we made a wrapper for the IPython kernel in which we imported UniversalPython as a package, and processed the code (i.e. translated it from Urdu to English) before it was passed onto IPython. This way, we overrode the
106-
107-
functions and achieved a working kernel for UniversalPython. It works line-by-line while maintaining the program memory. This also gave us a visual interface for the language to test it thoroughly.
133+
To demonstrate the ease with which plugins can be made for UniversalPython, we made a wrapper for the IPython kernel in which we imported UniversalPython as a package and processed the code (i.e. translated it from Urdu to English) before it was passed on to IPython. This way, we overrode the functions and achieved a working kernel for UniversalPython. It works line-by-line while maintaining the program memory. This also gave us a visual interface with which to test the language thoroughly.
108134

109135
The Urdu dictionary, a YAML file containing mappings from Urdu to English keywords, is used for translation. The PLY library allows reserving keywords for automatic tokenization and replacement with their English equivalents. Additionally, grammar rules are set to ignore content within quotes and comments, and to translate Urdu digits (which occupy a contiguous Unicode range).
110136

111137
\section{Experimentation}
112138

113-
We propose the following requirements which UniversalPython should meet in order than it is considered effective:
139+
We propose the following metrics, which UniversalPython should meet in order to be considered effective:
114140

115141
1) Programs which work in Python should be recreatable in UniversalPython, and vice versa.
116142

117143
2) UniversalPython should operate at a reasonable speed, one that at least does not disturb the programmer.
118144

119-
3) Converting Plugins made for Python, into Urdu
120-
Python, should be fairly easy.
145+
3) UniversalPython should be able to translate from one non-English language to another.
146+
147+
4) A benchmark should be made against other existing multilingual and non-English monolingual languages.
148+
149+
5) A user experience test should be conducted to measure how well users accept a programming language in their native tongue.
121150

122151
\subsection{Benchmarks with Python}\label{AA}
123152

@@ -159,6 +188,7 @@ \section*{Acknowledgment}
159188
I would like to acknowledge my mentor, Dr Omer Beg, for always guiding me to the right path during my Bachelor's, professional life, and Master's degree.
160189

161190
\begin{thebibliography}{00}
191+
162192
\bibitem{b1} G. Eason, B. Noble, and I. N. Sneddon, ``On certain integrals of Lipschitz-Hankel type involving products of Bessel functions,'' Phil. Trans. Roy. Soc. London, vol. A247, pp. 529--551, April 1955.
163193
\bibitem{b2} J. Clerk Maxwell, A Treatise on Electricity and Magnetism, 3rd ed., vol. 2. Oxford: Clarendon, 1892, pp.68--73.
164194
\bibitem{b3} I. S. Jacobs and C. P. Bean, ``Fine particles, thin films and exchange anisotropy,'' in Magnetism, vol. III, G. T. Rado and H. Suhl, Eds. New York: Academic, 1963, pp. 271--350.
