Localization of PO and MO files

PO and MO files are the file formats of the Gettext library, which is widely used in free software. Besides the standard implementation for C and C++, the library has implementations for many other programming languages: PHP, Python, Perl, Pascal, Java and others. The description of the Gettext library is available at:

http://www.gnu.org/software/gettext/manual/gettext.html

Strings in PO and MO files are stored as lists of entries. Each entry contains the fields msgid (the original string) and msgstr (the translation). The first entry has an empty msgid; its msgstr holds the file header. The header is a set of fields, and its Content-Type field contains the name of the file's encoding.
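As an illustration of the header layout, the following minimal sketch splits a header msgstr into fields and reads the encoding name from Content-Type. The function names and the sample header are made up for this example; this is not how Radialix or Gettext implement it.

```python
def parse_po_header(msgstr: str) -> dict:
    """Split a header msgstr into a {field name: value} dictionary."""
    fields = {}
    for line in msgstr.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that do not look like "Name: value"
            fields[name.strip()] = value.strip()
    return fields


def header_charset(fields: dict, fallback: str = "ascii") -> str:
    """Return the charset named in Content-Type, or the fallback if absent."""
    content_type = fields.get("Content-Type", "")
    for part in content_type.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key.lower() == "charset":
            return value.strip()
    return fallback


# Hypothetical header text, as it looks once the msgstr lines are joined.
sample_header = (
    "Project-Id-Version: demo 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: 8bit\n"
)
fields = parse_po_header(sample_header)
charset = header_charset(fields)
print(charset)
```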

PO files are text files; MO files are binary. MO files are produced by compiling PO files. An MO file contains a subset of the fields of the PO file: flags, comments, links and obsolete entries are not stored. PO files do not carry a BOM (Byte Order Mark) signature at the beginning of the file; therefore, the UTF-16 and UTF-32 encodings are not supported in PO files or, consequently, in MO files. The file header does not include information on the language; therefore, in Radialix the msgid and msgstr languages must be set on the Source Resources tab of the file properties dialog.

Plurals

Support for plurals is the primary difference between the Gettext library and other localization tools. Plural forms make the text read more naturally. For example, instead of the string:

"%d day(s) ago"

you could specify two strings:

"%d day ago"
"%d days ago"

The first string is for the singular and the second for the plural. While English has only one plural form, other languages may have several. For example, Russian uses three forms:

"%d день назад"  (1, 21, 31, etc.)
"%d дня назад"   (2..4, 22..24, 32..34, etc.)
"%d дней назад"  (other numbers)

Under certain conditions, the singular string may have no %d specifier at all:
"один день назад"

For the original language, the Gettext library supports only languages with a single plural form. In the vast majority of cases, that is English. The singular string is specified in the msgid field and the plural one in msgid_plural. In the translation, the string variants are specified in msgstr fields with an index:

msgid "%d day ago"
msgid_plural "%d days ago"
msgstr[0] "%d день назад"  
msgstr[1] "%d дня назад"   
msgstr[2] "%d дней назад"  

In the application, the translation variant is selected by the ngettext function of the Gettext library. As one of its parameters, the program passes the number that will replace the %d specifier in the string; this number is conventionally denoted n. To calculate the index of the msgstr string, ngettext reads the formula for the target language from the Plural-Forms field in the header of the PO (MO) file. The given n is substituted into the formula and the formula is evaluated. The result is the index of the msgstr field from which the translation is taken. Here is what the Plural-Forms field normally looks like for Russian:

"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%"

where nplurals is the number of plural forms and plural is the formula for calculating the index.
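The Plural-Forms line quoted above breaks off after "n%". The sketch below therefore uses the Russian rule as it is listed in the GNU gettext manual, transcribed by hand into Python, to show how the index selects a msgstr. The function names are hypothetical and the code only mimics what ngettext does; it is not the library's implementation.

```python
def russian_plural_index(n: int) -> int:
    """Python transcription of the C expression
    n%10==1 && n%100!=11 ? 0
      : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2
    """
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


# msgstr[0], msgstr[1] and msgstr[2] from the entry shown earlier.
MSGSTR = ["%d день назад", "%d дня назад", "%d дней назад"]


def translate_days_ago(n: int) -> str:
    """Pick the msgstr by its calculated index, then substitute n for %d."""
    return MSGSTR[russian_plural_index(n)] % n


for n in (1, 2, 5, 11, 21, 22):
    print(translate_days_ago(n))
```

Note that 11 falls into the third form even though it ends in 1; this is what the n%100!=11 part of the formula is for.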

Format Specifiers

Another distinctive feature of PO files is support for several format specifier types per string. The specifier types are listed in the flags field. The field may name several types at once, which means the string can be used simultaneously by applications written in different programming languages, provided those languages use similar specifier formats. During validation, the string must be checked against every format listed.
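A check against several formats at once could look roughly like the sketch below. It assumes the entry carries the Gettext flags c-format and python-brace-format, and it uses deliberately simplified regular expressions; the real printf and brace grammars are richer, and the function names are invented for this example.

```python
import re

# Simplified specifier patterns, keyed by Gettext flag name.
SPECIFIER_PATTERNS = {
    "c-format": re.compile(
        r"%(?:\d+\$)?[-+ #0]*\d*(?:\.\d+)?"
        r"(?:hh|h|ll|l|L|z|j|t)?[diouxXeEfgGcsp]"
    ),
    "python-brace-format": re.compile(r"\{[^{}]*\}"),
}


def extract_specifiers(flag: str, text: str) -> list:
    """Return the sorted specifiers of one format type found in text."""
    if flag == "c-format":
        text = text.replace("%%", "")  # drop escaped percent signs
    return sorted(SPECIFIER_PATTERNS[flag].findall(text))


def check_specifiers(msgid: str, msgstr: str, flags: list) -> list:
    """Return the flags whose specifiers differ between msgid and msgstr."""
    failed = []
    for flag in flags:
        if flag not in SPECIFIER_PATTERNS:
            continue  # format type not covered by this sketch
        if extract_specifiers(flag, msgid) != extract_specifiers(flag, msgstr):
            failed.append(flag)
    return failed


flags = ["c-format", "python-brace-format"]
print(check_specifiers("%d of {total}", "%s из {total}", flags))
```

The comparison ignores the order of specifiers, which is lenient for unnumbered printf arguments; a stricter check would compare them position by position.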

Support for PO and MO files in Radialix

Radialix supports all of the Gettext file features described above.

Radialix implements:

  • Formula editor for calculating the index of the msgstr string
  • String editor for plurals
  • String validation for plurals
  • Checking string specifiers for compliance with several formats simultaneously
  • Automatic filling of file header with data from project information
  • Displaying links and comments from PO files

String Extraction Settings

Settings for extracting strings from PO and MO files are displayed on the Source Resources tab in the file properties dialog.

POSourceSettings

Radialix lets you choose which strings are extracted by means of the msgid and msgstr selection options. If both fields are selected, the translation is imported when the resources are extracted and when new strings are added during a resource update.

Radialix can detect the file encoding automatically by reading the file header: select the <Auto> item in the Encoding list. The detected encoding name appears in the same list after the <Auto> tag. The result of the string conversion is shown in the Preview grid, and the number of errors is shown under Character conversion errors.
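The idea of a conversion preview with an error count can be sketched as follows. This is only an illustration under the assumption that an error means a byte sequence that cannot be decoded in the chosen encoding; the function name is hypothetical and Radialix may count errors differently.

```python
def preview_decode(raw: bytes, encoding: str):
    """Decode raw bytes; return the text and the number of undecodable spots."""
    text = raw.decode(encoding, errors="replace")
    # errors="replace" inserts U+FFFD wherever the bytes could not be decoded.
    return text, text.count("\ufffd")


raw = "день".encode("cp1251")        # sample bytes in a Cyrillic code page
good_text, good_errors = preview_decode(raw, "cp1251")
bad_text, bad_errors = preview_decode(raw, "utf-8")
print(good_text, good_errors, bad_errors)
```

A text that legitimately contains U+FFFD would inflate the count, so a production check would need a stricter decoder.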

The initial string extraction settings can be made in the new project properties dialog, in the Parsers>Gettext Files section of the Project>Default Project Properties menu.

Target File Settings

Radialix can create both MO and PO files; the format is chosen with the Target Build Action item on the Target Settings tab of the file properties dialog.

POTargetSettings

In the target settings you can enable automatic filling of the header with data from the project information (the Options tab), and configure the plural forms and the file encoding (the Plurals and Encodings tabs respectively). The Byte order option is used only when creating MO files. The <Default> value means the byte order of the original file if it is an MO file, and the LittleEndian order otherwise.
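The byte order of an existing MO file can be recognized from its first four bytes, which hold the magic number 0x950412de defined by the GNU MO format. The sketch below applies the <Default> rule described above; the function names are hypothetical and the code is not taken from Radialix.

```python
import struct

MO_MAGIC = 0x950412DE  # magic number defined by the GNU MO file format


def mo_byte_order(data: bytes) -> str:
    """Return 'little' or 'big' depending on how the magic number is stored."""
    if len(data) < 4:
        raise ValueError("too short to be an MO file")
    if struct.unpack("<I", data[:4])[0] == MO_MAGIC:
        return "little"
    if struct.unpack(">I", data[:4])[0] == MO_MAGIC:
        return "big"
    raise ValueError("not an MO file: bad magic number")


def default_byte_order(source: bytes, source_is_mo: bool) -> str:
    """Resolve <Default>: keep the source MO order, else use little-endian."""
    return mo_byte_order(source) if source_is_mo else "little"


big_endian_mo = struct.pack(">I", MO_MAGIC)  # only the first 4 bytes matter here
print(default_byte_order(big_endian_mo, source_is_mo=True))
print(default_byte_order(b'msgid ""', source_is_mo=False))
```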

The plural form settings (the number of forms and the formula for calculating the string index) are set in the Language-Formula grid. To obtain the settings for the required language, Radialix searches the grid starting from the first entry. When a language match is found, or when an entry is tagged <Any Language>, the search stops and the program uses the parameters of that entry. If no suitable entry is found, an error message is displayed when the target files are created.

POPluralList

You can add data to the grid by using the Insert button or by entering the data in the last row of the grid, in place of the Select Language label. To move entries up or down, use the mouse or the Move Up and Move Down buttons. Holding down the Ctrl key while dragging an entry creates a duplicate of it. During the search, a language without a specified country matches any country; such an entry should therefore be placed below the entries for the same language with a country specified. Likewise, the entry with the <Any Language> tag must be placed at the end of the grid, because entries below it are never reached by the search. The switch in the Language column excludes an entry from the search. To edit an entry, double-click the required grid cell. The plural form settings can be edited in the plural formula editor.
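The first-match search over the grid can be sketched as below. The language tag format ("pt-BR"), the GridEntry type and the function names are assumptions made for this example; only the search order, the country rule, the switch and the <Any Language> fallback come from the description above.

```python
from typing import NamedTuple, Optional

ANY_LANGUAGE = "<Any Language>"


class GridEntry(NamedTuple):
    language: str         # e.g. "pt", "pt-BR" or ANY_LANGUAGE
    nplurals: int
    formula: str
    enabled: bool = True  # the switch in the Language column


def entry_matches(entry_language: str, target: str) -> bool:
    """Check one grid entry against the target language tag."""
    if entry_language == ANY_LANGUAGE:
        return True
    entry_lang, _, entry_country = entry_language.partition("-")
    target_lang, _, target_country = target.partition("-")
    if entry_lang.lower() != target_lang.lower():
        return False
    # An entry without a country matches every country of that language.
    return not entry_country or entry_country.lower() == target_country.lower()


def find_plural_settings(grid, target: str) -> Optional[GridEntry]:
    """Return the first enabled entry that matches, scanning from the top."""
    for entry in grid:
        if entry.enabled and entry_matches(entry.language, target):
            return entry
    return None  # the caller would report an error in this case


grid = [
    GridEntry("pt-BR", 2, "n > 1"),
    GridEntry("pt", 2, "n != 1"),
    GridEntry(ANY_LANGUAGE, 2, "n != 1"),
]
print(find_plural_settings(grid, "pt-BR").formula)
print(find_plural_settings(grid, "pt-PT").formula)
```

If the "pt" entry were placed above "pt-BR", the Brazilian entry would never be reached, which is why the more specific entries go first.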

For new files, the encoding is also set in a grid with Language as the first column, and Radialix looks up the encoding in the same way. The <Default> value stands for the encoding of the original file if that file does not use UTF-8; otherwise it stands for the default encoding for the target language.

EncodingList

Just as with the string extraction settings, the initial setting values for the new files can be edited in the new project properties dialog, in the Parsers>Gettext Files section of the Project>Default Project Properties menu.

Plural Formula Editor

The plural formula editor is used to set the number of forms and the formula for calculating the index of the msgstr string. The formula is a C expression with a single integer parameter n, the numeric argument that is substituted into the string. For example, in the strings

msgstr[0] "%d день назад"  
msgstr[1] "%d дня назад"   
msgstr[2] "%d дней назад"  

the value of n is substituted for the %d specifier. Here is the result:

"1 день назад" (21, 31, 41 etc.)
"2 дня назад"  (3, 4, 22..24, 32..34 etc.)
"5 дней назад" (0, 6, 7, 8 etc.)

The formula calculation result appears in the Preview grid. The Index column is the formula calculation result, the n column is the value of the argument. The grid contains the calculation results for n from 0 through 1000. Information on formula compilation and index calculation errors is shown in the messages.
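A comparable preview can be produced outside the editor. The sketch below relies on c2py, an undocumented helper in CPython's gettext module that turns a plural expression into a Python function; its availability is an assumption, and the Russian formula is the one given in the GNU gettext manual. The code tabulates n from 0 through 1000 and flags indexes that fall outside the declared number of forms.

```python
import gettext

# Russian rule as listed in the GNU gettext manual.
formula = ("n%10==1 && n%100!=11 ? 0 : "
           "n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2")
nplurals = 3

# c2py is undocumented; it compiles the C expression into a function of n.
plural_index = gettext.c2py(formula)

# One (n, index) row per value of n, as in a preview table.
preview = [(n, plural_index(n)) for n in range(0, 1001)]

# An index outside 0..nplurals-1 would point at a msgstr that does not exist.
out_of_range = [n for n, index in preview if not 0 <= index < nplurals]

# Collect the first five values of n for every index.
samples = {}
for n, index in preview:
    rows = samples.setdefault(index, [])
    if len(rows) < 5:
        rows.append(n)

for index in sorted(samples):
    print(index, samples[index])
```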

POPluralDialog

Editing Strings in PO and MO Files

In PO and MO files, strings appear in a grid of strings in the ENTRIES resource; obsolete strings appear in the OBSOLETE resource. The OBSOLETE strings are not stored in MO files and by default have the Read-only attribute.

The comments and the reference count are displayed in the respective columns of the grid. To view a reference, double-click the cell in the Reference column.

Strings that have plural forms are displayed as substrings of the same string and are separated with the zero character \0x00 from one another. Such strings can be edited right in the cells of the strings grid.

Plural_String

It is more convenient, however, to do this in the string editor, which is opened by pressing the F2 key or by selecting the corresponding command on the popup menu. In the editor, the translation is entered in the String column of the Translation grid. The # column displays the index of the msgstr string, and the n column shows the value of the string selection parameter. Additional editor commands (copying a string, inserting characters, etc.) are available on the popup menu.

PluralStringEditor

Just as for regular strings, inserting hot-key markers, preserving the first and last characters of the string, and automatic translation are supported for plural strings. During automatic translation, the program determines the index of the msgstr string that corresponds to the singular and inserts the translation of the msgid into it. Into the remaining strings it inserts the automatic translation of the msgid_plural string. All forms of the string are stored in the translation memory as a single entry consisting of substrings separated with the zero character \0x00.
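The filling rule can be sketched as follows. It assumes that the singular slot is the index the plural formula yields for n = 1, which the text above does not state explicitly; the function names are hypothetical.

```python
def fill_plural_translation(singular_tr, plural_tr, nplurals, plural_index):
    """Put the msgid translation into the singular slot and the
    msgid_plural translation into every other slot."""
    singular_slot = plural_index(1)  # assumption: the singular is the n = 1 form
    forms = [plural_tr] * nplurals
    forms[singular_slot] = singular_tr
    return forms


def to_memory_entry(forms):
    """Join the forms into one entry separated by the zero character."""
    return "\x00".join(forms)


def russian_index(n):
    """Russian rule from the GNU gettext manual, written in Python."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


forms = fill_plural_translation("%d день назад", "%d дней назад", 3, russian_index)
entry = to_memory_entry(forms)
print(forms)
```

For Russian, the second form receives the same text as the third, so it still has to be corrected by hand to "%d дня назад".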

Plural Validation

The option for checking the number of plurals is available in the project validation settings in the Strings/Text section. This option is always enabled and is not available for editing.

PluralFormChecker

The validation is carried out in compliance with the number of plurals specified in the target file settings, in the file properties.
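In terms of the zero-separated representation described earlier, the check amounts to counting substrings. A minimal sketch, with a hypothetical function name and message text:

```python
def check_plural_count(translation: str, nplurals: int):
    """Return an error message if the number of forms is wrong, else None."""
    forms = translation.split("\x00")
    if len(forms) != nplurals:
        return f"expected {nplurals} plural forms, found {len(forms)}"
    return None


complete = "%d день назад\x00%d дня назад\x00%d дней назад"
incomplete = "%d день назад\x00%d дней назад"
print(check_plural_count(complete, 3))
print(check_plural_count(incomplete, 3))
```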