DB_NAME: Myanmar (Burmese) Name Romanization with Alignment on Grapheme-Level

DB_CREATOR: University of Computer Studies, Yangon (UCSY)

DB_LICENSE: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
            https://creativecommons.org/licenses/by-nc-sa/4.0/

DB_CONTENTS:

myanmarname.txt contains 2,335 Myanmar names with their corresponding Romanization.
Each line has the following format:

    [original Myanmar name] ||| [original Romanization] ||| [aligned Myanmar/Latin graphemes]

The alignment is completely monotonic and one-to-one.

The syllables of each Myanmar name are segmented into tokens of onset, rhyme, and tone,
with @ marks inserted as "dummy vowels" for grapheme-level alignment.
The Romanization of each Myanmar name is segmented and aligned to the corresponding Myanmar tokens.
An @ mark on the Romanization side indicates that the corresponding Myanmar token is silent.

The segmentation and @-insertion for Myanmar names are performed by deterministic rules.
The attached my-ort.py script can be used to segment Myanmar names and to recover the original names from their segmented form.

Usage of my-ort.py:

    my-ort.py seg < [original Myanmar name] > [segmented Myanmar name]
    my-ort.py rec < [segmented Myanmar name] > [original Myanmar name]

Examples:

    echo "ဒေါ်ဝင်းပပ" | my-ort.py seg | sed 's/ /, /g'
    ဒ, ေါ်, ဝ, င်, း, ပ, @, ပ, @

    echo "ဒ ေါ် ဝ င် း ပ @ ပ @" | my-ort.py rec
    ဒေါ်ဝင်းပပ

The corresponding line of the example in myanmarname.txt is:

    ဒေါ်ဝင်းပပ ||| Daw Win Pa Pa ||| ဒ/D ေါ်/aw ဝ/W င်/in း/@ ပ/P @/a ပ/P @/a

my-ort.py converts between the first column and the Myanmar tokens in the third column.

my-ort.py has been tested under Python 2.6 and 2.7.
For Python 3.x, please refer to the source code for details.
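
The three-column line format described above can be read with a short script. The following is a minimal sketch, not part of the distributed files: it is written for Python 3, and the names parse_line, latin_side, and SILENT are hypothetical helpers introduced only for illustration. It assumes that fields are separated by "|||", that aligned tokens are separated by whitespace, and that each token has the form Myanmar/Latin. It does not reimplement the segmentation rules of my-ort.py.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch for Python 3. parse_line, latin_side, and SILENT are
# hypothetical names, not part of my-ort.py.

SILENT = "@"  # dummy vowel on the Myanmar side, silent token on the Latin side


def parse_line(line):
    """Split one line of myanmarname.txt into (name, romanization, pairs)."""
    fields = [field.strip() for field in line.split("|||")]
    if len(fields) != 3:
        raise ValueError("expected 3 fields separated by '|||', got %d" % len(fields))
    name, romanization, aligned = fields
    pairs = []
    for token in aligned.split():
        # Each aligned token is assumed to be "<Myanmar grapheme>/<Latin grapheme>".
        myanmar, sep, latin = token.rpartition("/")
        if not sep:
            raise ValueError("aligned token without '/': %r" % token)
        pairs.append((myanmar, latin))
    return name, romanization, pairs


def latin_side(pairs):
    """Concatenate the Latin graphemes, skipping the silent '@' marks."""
    return "".join(latin for _, latin in pairs if latin != SILENT)


example = "ဒေါ်ဝင်းပပ ||| Daw Win Pa Pa ||| ဒ/D ေါ်/aw ဝ/W င်/in း/@ ပ/P @/a ပ/P @/a"
name, romanization, pairs = parse_line(example)
print(len(pairs))  # number of aligned grapheme pairs: 9
print(latin_side(pairs) == romanization.replace(" ", ""))  # True for this example
```

For the example line, the Latin graphemes with the @ marks removed concatenate to the Romanization without its spaces. Whether this holds for every line of myanmarname.txt is not stated here, so it is best treated as a sanity check rather than a guarantee. To recover the original Myanmar name from its tokens, use my-ort.py rec rather than plain concatenation.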