DB_NAME: 
Myanmar (Burmese) Name Romanization with Alignment on Grapheme-Level


DB_CREATOR:
University of Computer Studies, Yangon (UCSY)


DB_LICENSE:
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
https://creativecommons.org/licenses/by-nc-sa/4.0/


DB_CONTENTS:

myanmarname.txt contains 2,335 Myanmar names with their corresponding Romanizations.

The data are organized in the following format.
[original Myanmar name] ||| [original Romanization] ||| [aligned Myanmar/Latin graphemes]
The alignment is completely monotonic and one-to-one.
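
The record format above can be read with a few lines of Python. The following is a minimal sketch, not part of the distributed scripts: the function names parse_line and load_names are hypothetical, the file is assumed to be UTF-8 encoded with one record per line, and the sketch targets Python 3.

```python
import io


def parse_line(line):
    """Split one record into (name, romanization, pairs).

    pairs is a list of (myanmar_token, latin_token) tuples taken from
    the third column, in their original order.
    """
    fields = [f.strip() for f in line.split("|||")]
    if len(fields) != 3:
        raise ValueError("expected 3 fields separated by '|||', got %d" % len(fields))
    name, roman, aligned = fields
    pairs = []
    for item in aligned.split():
        # Each aligned item has the form [Myanmar token]/[Latin token].
        my_tok, sep, lat_tok = item.partition("/")
        if not sep:
            raise ValueError("aligned item without '/': %r" % item)
        pairs.append((my_tok, lat_tok))
    return name, roman, pairs


def load_names(path="myanmarname.txt"):
    """Parse every non-empty line of the data file (UTF-8 is an assumption)."""
    with io.open(path, encoding="utf-8") as handle:
        return [parse_line(line) for line in handle if line.strip()]
```

Because the alignment is one-to-one, each item in the third column yields exactly one (Myanmar, Latin) pair, so no further grouping is needed.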

The syllables in each Myanmar name are segmented into onset, rhyme, and tone tokens, and @ marks are inserted as "dummy vowels" to allow grapheme-level alignment. The Romanization of each name is segmented and aligned to the corresponding Myanmar tokens. An @ mark on the Romanization side indicates that the corresponding Myanmar token is silent.

The segmentation and @-insertion for Myanmar names are performed by deterministic rules.
The attached my-ort.py script can be used to segment Myanmar names and to recover the original names from segmented ones.

Usage of my-ort.py:

my-ort.py seg < [original Myanmar name] > [segmented Myanmar name]
my-ort.py rec < [segmented Myanmar name] > [original Myanmar name]

Examples:

echo "ဒေါ်ဝင်းပပ" | my-ort.py seg | sed 's/ /, /g'
ဒ, ေါ်, ဝ, င်, း, ပ, @, ပ, @

echo "ဒ ေါ် ဝ င် း ပ @ ပ @" | my-ort.py rec
ဒေါ်ဝင်းပပ

The line in myanmarname.txt corresponding to this example is:
ဒေါ်ဝင်းပပ ||| Daw Win Pa Pa ||| ဒ/D ေါ်/aw ဝ/W င်/in း/@ ပ/P @/a ပ/P @/a
my-ort.py converts between the first column and the Myanmar tokens in the third column.
my-ort.py has been tested under Python 2.6 and 2.7. For Python 3.x, please refer to the source code for details.
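
The @ convention can be checked against the example line above. The following minimal Python 3 sketch drops the @ marks from each side of the third column and concatenates the remaining tokens. For the example shown, this reproduces the first column on the Myanmar side and the Romanization without spaces on the Latin side. Whether this holds for every line of myanmarname.txt is an assumption, and this sketch is not a replacement for my-ort.py, whose rules may do more than remove @ marks. The names strip_placeholders and check_record are hypothetical.

```python
PLACEHOLDER = "@"


def strip_placeholders(tokens):
    """Concatenate tokens, skipping the @ placeholder."""
    return "".join(t for t in tokens if t != PLACEHOLDER)


def check_record(line):
    """Return (myanmar_ok, latin_ok) for one record.

    myanmar_ok: the Myanmar tokens without @ rebuild the first column.
    latin_ok:   the Latin tokens without @ rebuild the second column
                once its spaces are removed (assumed convention).
    """
    name, roman, aligned = [f.strip() for f in line.split("|||")]
    my_tokens = []
    lat_tokens = []
    for item in aligned.split():
        my_tok, _, lat_tok = item.partition("/")
        my_tokens.append(my_tok)
        lat_tokens.append(lat_tok)
    myanmar_ok = strip_placeholders(my_tokens) == name
    latin_ok = strip_placeholders(lat_tokens) == roman.replace(" ", "")
    return myanmar_ok, latin_ok
```

A record for which either value is False is worth inspecting by hand before concluding that the data are inconsistent, since the check itself rests on the assumptions stated above.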