Working with external markup tables

This chapter provides the information required to work with external markup tables. It describes the format of external markup tables so that you can modify them or create new ones. The user exit mechanism of markup tables and its entry points are described to allow for customized processing of documents at different stages. Finally, a parser application programming interface provides some of OpenTM2’s internal functions to expand the possibilities of user exits.

The contents of external markup tables are described in terms of the SGML syntax. You should be familiar with SGML to modify or create markup tables. For a complete description of SGML, refer to ISO 8879, Information Processing – Text and Office Systems – Standard Generalized Markup Language (SGML).


Creating new markup tables

You can create your own markup table by exporting an existing markup table in external SGML format, modifying it with any text editor, and importing it back into OpenTM2 under a different name. Markup tables need to be available in an SGML-based format to be imported into OpenTM2. Notice that an exported markup table contains only the nondefault entries.

To become familiar with the content of markup tables you might want to export a markup table and study it before you create a new markup table. See Exporting a markup table for details.

When you have exported one of the markup tables provided by OpenTM2, the second line might contain the tag <SEGMENTEXIT>userexit</SEGMENTEXIT>, where userexit is the name of the dynamic-link library (DLL) containing the user exit code. This tag is only required if a user exit is to be used. For more information, refer to Creating user exits for markup tables.

Layout and content of a markup table

The general layout and content of a markup table are as follows:

  • A markup table must begin with a <TAGTABLE> tag and end with a </TAGTABLE> tag.
  • Following the <TAGTABLE> tag are header tags that are descriptive or of general purpose for the markup table. These header tags do not declare individual markup data. You can use them to give the markup table a name and a description, to specify a character set for conversion, or to specify substitution characters. Header tags in a markup table are optional. See Table 9 for a list of allowed header tags and a detailed description.
An example of a header tag in a markup table is <DESCRNAME>descriptive name</DESCRNAME>, which lets you specify a name for the markup table that is different from its file name.
  • Next, a list of markup tag definitions follows. These definitions are the core of a markup table. Each definition describes a specific formatting tag, for example, a header tag, or a soft line feed. The definition always includes the name of the markup tag, and either its length or the delimiting characters. A markup tag definition can include further information, for example, whether the text associated with a markup tag needs to be translated. See Table 10 for a list of allowed tags to define a markup tag in detail.
A single markup tag definition always starts with the start tag <TAG> and ends with the corresponding end tag </TAG>. An example of a markup tag definition is:
<TAG>
 <STRING>[soft line feed]</STRING>
 <LENGTH>16</LENGTH>
 <TYPE>STNEUTRAL</TYPE>
 <SEGINFO>SEGNEUTRAL</SEGINFO>
</TAG>
which defines the markup of a soft line feed. The keyword [soft line feed] is defined as <STRING>[soft line feed]</STRING> and has a length of 16 characters. <TYPE>STNEUTRAL</TYPE> specifies that this markup tag has no influence on segmenting, and <SEGINFO>SEGNEUTRAL</SEGINFO> specifies that this markup tag does not influence the segmenting status.
  • Markup tags often have attributes that specify additional characteristics. For example, a markup tag for tables and figures in a document might use a width attribute to specify the width of the element. You need to define all attributes of a markup language in your markup table as well. The definition of attributes is similar to the definition of markup tags, except that each attribute definition is enclosed between the <ATTRIBUTE> and </ATTRIBUTE> tags. See Table 10 for a list of allowed tags to define an attribute in detail.
An example of an attribute definition is:
<ATTRIBUTE>
<STRING>WIDTH=%</STRING>
<ENDDELIM>' .\r\n'</ENDDELIM>
</ATTRIBUTE>
which defines the markup of a WIDTH attribute. Here, you will notice that the keyword WIDTH is supposed to be delimited by one of four delimiting characters, as opposed to the previous example, where an explicit length is specified.

In summary, a markup table has the following layout:

<TAGTABLE>
Header tags, as required
<TAG>
markup tag definition
</TAG>
<TAG>
markup tag definition
</TAG>
<ATTRIBUTE>
attribute definition (optional)
</ATTRIBUTE>
⋮
<ATTRIBUTE>
attribute definition (optional)
</ATTRIBUTE>
</TAGTABLE>

Notice that all entries use the SGML syntax. All SGML tags must be enclosed in “<” and “>”. Every entry has a start tag and an end tag.

Your markup table can contain up to 1000 entries.

Each markup tag or attribute definition must specify at least STRING and ENDDELIM, or STRING and LENGTH.

After you have edited the markup table, you can import it into OpenTM2. If you import it into an existing markup table, this table is overwritten.
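
The layout rules above can be checked mechanically before an import. The following Python sketch is not part of OpenTM2 and does not reproduce its import logic. It assumes that a markup table is plain text with non-nested <TAG> and <ATTRIBUTE> entries, that the limit of 1000 entries counts tags and attributes together, and that a missing LENGTH or a LENGTH of 0 means "no length specified". The names check_markup_table, ENTRY_RE, FIELD_RE, and MAX_ENTRIES are made up for this illustration.

```python
import re

# Hypothetical helpers for illustration only; not an OpenTM2 API.
ENTRY_RE = re.compile(r"<(TAG|ATTRIBUTE)>(.*?)</\1>", re.DOTALL)
FIELD_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)
MAX_ENTRIES = 1000  # assumption: tags and attributes are counted together


def check_markup_table(text):
    """Return a list of problems found in an external markup table."""
    problems = []
    stripped = text.strip()
    if not (stripped.startswith("<TAGTABLE>") and stripped.endswith("</TAGTABLE>")):
        problems.append("table must begin with <TAGTABLE> and end with </TAGTABLE>")

    entries = ENTRY_RE.findall(text)
    if len(entries) > MAX_ENTRIES:
        problems.append(f"{len(entries)} entries, but at most {MAX_ENTRIES} are allowed")

    for number, (kind, body) in enumerate(entries, start=1):
        # Collect the SGML fields of this entry, for example STRING or LENGTH.
        fields = dict(FIELD_RE.findall(body))
        if "STRING" not in fields:
            problems.append(f"entry {number} ({kind}): STRING is missing")
        has_delim = bool(fields.get("ENDDELIM"))
        has_length = fields.get("LENGTH", "0").strip() not in ("", "0")
        if not (has_delim or has_length):
            problems.append(f"entry {number} ({kind}): needs ENDDELIM or a nonzero LENGTH")
    return problems


table = """<TAGTABLE>
<TAG>
 <STRING>[break]</STRING>
 <LENGTH>7</LENGTH>
</TAG>
<ATTRIBUTE>
 <STRING>WIDTH=%</STRING>
</ATTRIBUTE>
</TAGTABLE>"""

for problem in check_markup_table(table):
    print(problem)
```

In this sample, the WIDTH attribute is reported because it has neither an end delimiter nor a length.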

Substitution characters in a markup table

Your markup tag and attribute definitions in a markup table might require that you specify variable parts. An example is the definition of the WIDTH attribute in the previous section (<STRING>WIDTH=%</STRING>). Because a document can contain any value for the WIDTH attribute, the percentage sign % is used as a substitution character.

You can use the following two substitution characters in a markup table:

  • The percentage character (%) substitutes any number of characters.
  • The question mark (?) substitutes a single character.

The substitution characters do not distinguish between numeric and alphabetic characters.

Note that these substitution characters can be redefined in the markup table header.
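
To make the effect of the two substitution characters concrete, the following Python sketch translates a STRING pattern into a regular expression. It is only a model of the behavior described above, not OpenTM2 code: the function name matches is hypothetical, the multiple-character substitution is assumed to also match zero characters, and the end of a tag is in practice determined by ENDDELIM or LENGTH rather than by a whole-string match.

```python
import re


def matches(pattern, text, multi="%", single="?"):
    """Return True if text as a whole fits the STRING pattern.

    multi and single correspond to the MULTSUBST and SINGLESUBST header
    tags; the defaults are the documented default characters.
    """
    parts = []
    for ch in pattern:
        if ch == multi:
            parts.append(".*")   # any number of characters (assumed: zero or more)
        elif ch == single:
            parts.append(".")    # exactly one character, numeric or alphabetic
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.fullmatch("".join(parts), text, re.DOTALL) is not None


print(matches("[b%", "[bold"))             # True
print(matches("[Heading ?", "[Heading 3"))  # True
print(matches("[Heading ?", "[Heading 12")) # False: ? stands for one character
print(matches("a#c", "abbbc", multi="#"))  # True: redefined substitution character
```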

SGML tags for markup table header

The following table contains the definition of the SGML tags that you can use in a markup table header.

Table 9. SGML tags for markup table header

SGML tag Definition
DESCRIPTION Specifies a markup table description, which is shown in the “Markup Table Properties” window and the “Markup Table List” window.
DESCRNAME Specifies a descriptive name for this markup table. For example, the specification of <DESCRNAME>ASCII</DESCRNAME> in the markup table EQFASCII would give it the name ASCII. If nothing is specified, the file name of the markup table is used.
CHARSET Specifies the character set to be used for import and export of documents that use this markup table. The documents will be converted using the selected character set without the need to do the conversion in a user exit. Specify one of the following character sets:

  • ASCII
  • ANSI
  • UTF8
  • UNICODE
SINGLESUBST Specifies the substitution character to use for single character substitution. The default character is ?.
MULTSUBST Specifies the substitution character to use for multiple character substitution. The default character is %.
USEUNICODE Specifies whether segmented source and target files in subdirectories SSOURCE and STARGET are stored in Unicode UTF-16 format. Specify one of the following:

  • YES
  • NO This is the default.
REFLOW Specifies whether line breaks (CRLF) may be changed during translation. EQFMRI is an example of a markup where REFLOW is specified and set to NO. Specify one of the following:

  • YES This is the default.
  • NO
SEGMENTEXIT Contains the name of the user exit, if the markup table uses one.

SGML tags for markup tags and markup attributes

The following table contains the definition of the SGML tags that you can use to define markup tags and markup attributes in a markup table.
Table 10. SGML tags for markup tags and markup attributes

SGML tag Definition
STRING Specifies the name of the markup tag or markup attribute. The specification of STRING is required for an entry in the markup table.
ENDDELIM Specifies one character as end delimiter of the markup tag or markup attribute, if it has any. You can enter more than one end delimiter. OpenTM2 checks for all possible string combinations to determine the end of the tag or attribute. A string as end delimiter is not possible. When a tag or attribute has an end delimiter, the specification of its length is omitted or can be set to 0. If a tag or attribute has no end delimiter, its length must be specified. The specification of ENDDELIM is required for an entry in the markup table if LENGTH is not defined.
LENGTH Defines the length of a markup tag or markup attribute. It must be specified only if the length of the tag or attribute cannot be determined by a delimiter specified by ENDDELIM.
COLPOSITION Specifies the column position where the markup tag starts. If a markup tag has no special start position and can occur anywhere in a line, COLPOSITION is omitted or can be set to 0. The default is 0.
TYPE Defines the type of the markup tag. If TYPE is not specified, STDEL is taken as the default. The following types are possible:

  • STDEL
Indicates the start of a new text segment.
  • ENDDEL
Indicates the end of a text segment.
  • SELFC
The markup tag is self-contained, that is, it is a text segment by itself.
  • STNEUTRAL
The markup tag is a start tag, which has no influence on segmenting.
  • ENDNEUTRAL
The markup tag is an end tag, which has no influence on segmenting.
SEGINFO Determines whether the text following the markup tag is to be segmented. If SEGINFO is not specified, SEGNEUTRAL is taken as the default.

  • SEGOFF
Sets segmenting off, that is, no segmentation is done until the next markup tag is found that sets segmenting on again. If two tags follow each other that set segmenting off, it needs two tags that set segmenting on to start segmentation again.
  • SEGON
Sets segmenting on again.
  • SEGNEUTRAL
Does not influence the segmenting status.
  • SEGRESET
Resets the segmenting status to on, even if the segmenting level requires more than one SEGON tag to set segmentation on.
  • PROTECTON
All following text, including segmentation control flags, is protected until a markup tag with PROTECTOFF is encountered.
  • PROTECTOFF
Turns off text protection. The following text is handled using normal segmentation rules.
ASSTEXT Defines types of text following the markup tag. If ASSTEXT is not specified, NOEXPL is taken as the default.

  • TSNL
Text follows on the same or the next line and will be associated with the markup tag.
  • TSL
Text follows on the same line and will be associated with the markup tag.
  • NOEXPL
No special processing for associated text is required.
ADDINFO Specifies whether specific text is to be ignored when segments are aligned during the creation of an Initial Translation Memory:

  • 4 Marks the start of an area to be ignored.
  • 6 Marks the start of an area to be partly ignored. This applies to tags containing a % sign, for example HEADER]%.
  • 8 Marks the end of an area to be ignored.
  • 10 Marks the end of an area to be partly ignored. This applies to tags containing a % sign, for example HEADER %.
CLASSID Specifies how the contents of STRING are handled. The only class is CLS_HEAD. This means that the text specified for STRING becomes an entry of the table of contents that you can display during the translation of a document using the Special go to… dialog.
ATTRINFO Specifies whether a markup tag has attached attributes (YES/NO). NO is the default. If YES is specified, the ATTRIBUTE SGML tag must be used to specify the attributes.
TRANSLATEINFO Specifies whether the segment associated with the markup tag or markup attribute must be translated or not (YES/NO). If TRANSLATEINFO is not specified, NO is taken as the default.
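
The nesting behavior of the SEGINFO values SEGOFF, SEGON, and SEGRESET in Table 10 can be pictured as a counter. The following Python sketch is an interpretation of that description, not OpenTM2 code: the function name segmenting_states is hypothetical, a SEGON without a preceding SEGOFF is assumed to have no effect, and PROTECTON and PROTECTOFF are deliberately not modeled.

```python
def segmenting_states(seginfo_values):
    """Return, for each SEGINFO value in order, whether segmenting is on afterwards."""
    off_depth = 0  # number of SEGOFF tags not yet cancelled by SEGON
    states = []
    for value in seginfo_values:
        if value == "SEGOFF":
            off_depth += 1
        elif value == "SEGON":
            off_depth = max(0, off_depth - 1)  # assumption: never drops below zero
        elif value == "SEGRESET":
            off_depth = 0  # on again, regardless of the nesting level
        elif value != "SEGNEUTRAL":
            raise ValueError(f"SEGINFO value not modeled in this sketch: {value}")
        states.append(off_depth == 0)
    return states


# Two SEGOFF tags need two SEGON tags before segmenting resumes.
print(segmenting_states(["SEGOFF", "SEGOFF", "SEGON", "SEGON"]))
# [False, False, False, True]

# SEGRESET switches segmenting on in one step.
print(segmenting_states(["SEGOFF", "SEGOFF", "SEGRESET"]))
# [False, False, True]
```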

Examples of markup data and corresponding markup tags

If a document contains, for example, [soft line feed] as markup data, it is usually meant as a so-called inline tag, which means that it is contained in the segment. It has no influence on the segmentation of the document. The corresponding markup tag definition in a markup table looks as follows:

<TAG>
 <STRING>[soft line feed]</STRING>
 <LENGTH>16</LENGTH>
 <TYPE>STNEUTRAL</TYPE>
 <SEGINFO>SEGNEUTRAL</SEGINFO>
</TAG>

<STRING>… defines the markup string, and <LENGTH>… specifies its length. Because the length is specified, no ENDDELIM tag is required. <TYPE>STNEUTRAL<… defines that this markup string has no influence on segmentation. All other markup table SGML tags will be set to the default and therefore need not be specified.

Assuming that such a markup tag causes segmentation, it is defined as follows:

<TAG>
 <STRING>[soft line feed]</STRING>
 <LENGTH>16</LENGTH>
 <TYPE>STDEL</TYPE>
 <SEGINFO>SEGNEUTRAL</SEGINFO>
</TAG>

The following table lists some imaginary markup data with a description.

Markup data Definition
[bold] text [/bold] The text following this tag (until the end tag) is printed bold; this tag is part of the segment and has no influence on segmenting.
[Heading x ]text This tag describes a heading; the heading text must follow on the same line; x is the level of heading and goes from 1 to 9; this tag ends the previous segment and starts a new segment.
[page: even] A page break; the following text starts on an even page; this tag always starts on the first column and has no text following in the same line; a blank must separate the attribute even from the tag.
[page: odd] A page break; the following text starts on an odd page; this tag always starts on the first column and has no text following in the same line; a blank must separate the attribute odd from the tag.
[paragraph] A paragraph; this tag ends the previous segment and starts a new segment; the tag occurs at the end of the previous paragraph.
 % Stands for any number of characters. For example, in b%, % stands for the characters old.
[break] Starts a new segment. You use this tag to split an existing segment into two or more segments.
[*%] * indicates the start of a comment and % stands for the comment text.

This markup data would lead to the following markup table definitions. The defaults will not be shown.

Markup definition Explanation
<TAG>
 <STRING>[bold]</STRING>
 <LENGTH>6</LENGTH>
 <TYPE>STNEUTRAL</TYPE>
</TAG>

or

<TAG>
 <STRING>[bold</STRING>
 <ENDDELIM>]</ENDDELIM>
 <TYPE>STNEUTRAL</TYPE>
</TAG>

or

<TAG>
 <STRING>[b%</STRING>
 <ENDDELIM>]</ENDDELIM>
 <TYPE>STNEUTRAL</TYPE>
</TAG>
The markup tag should be part of the segment, therefore STNEUTRAL is used. All three definitions have the same result: you can specify this markup tag by its length or by its end delimiter, and you can also substitute part of the inline tag with %.
<TAG>
 <STRING>[Heading ?</STRING>
 <ENDDELIM>]</ENDDELIM>
 <SEGINFO>SEGRESET</SEGINFO>
 <ASSTEXT>TSL</ASSTEXT>
 <TRANSLATEINFO>YES</TRANSLATEINFO>
</TAG>
Single substitution is used for the heading level; the end of the tag is ]; the heading requires the reset of segmenting with SEGRESET; the text associated with the tag occurs on the same line; the text associated with the tag is translatable.
<TAG>
 <STRING>[page:</STRING>
 <ENDDELIM> </ENDDELIM>
 <ATTRINFO>YES</ATTRINFO>
 <COLPOSITION>1</COLPOSITION>
</TAG>
The markup tag ends with a blank; attributes may follow; the tag always starts at the first column in a line.
<TAG>
 <STRING>[paragraph</STRING>
 <ENDDELIM>]</ENDDELIM>
 <TYPE>ENDDEL</TYPE>
</TAG>

or

<TAG>
 <STRING>[paragraph]</STRING>
 <LENGTH>11</LENGTH>
 <TYPE>ENDDEL</TYPE>
</TAG>
The tag ends with ] or is defined by its length; the tag should end the previous segment, therefore ENDDEL is used.
<ATTRIBUTE>
 <STRING>even</STRING>
 <ENDDELIM>]</ENDDELIM>
</ATTRIBUTE>
This is an attribute; it ends with ].
<ATTRIBUTE> 
 <STRING>odd</STRING>
 <ENDDELIM>]</ENDDELIM>
</ATTRIBUTE>
This is an attribute; it ends with ].
<TAG>
 <STRING>[break]</STRING> 
 <LENGTH>7</LENGTH>
 <TYPE>STDEL</TYPE>
</TAG>
Indicates that a new segment starts.
<TAG>
 <STRING>*%</STRING>
 <ENDDELIM>\r\n</ENDDELIM>
 <COLPOSITION>1</COLPOSITION>
</TAG>
Indicates a comment that ends at the end of the line. COLPOSITION defines that the asterisk is only recognized as the start of a comment if it appears in the first column of a line.

Creating user exits for markup tables

There are document formats that require a user exit for their markup table:

  • Binary documents, for example Microsoft (R) Word for Windows (R) documents
  • Documents that require code page conversion, for example ANSI documents
  • Documents that have a fixed record layout
  • Documents that contain nontranslatable text parts, for example, RTF documents
  • Binary documents like Lotus Notes database files and template files that require context-dependent processing.

OpenTM2 provides two markup tables that are already combined with a user exit:

  • The user exit part of the EQFHTML4 markup table converts the code page and preprocesses JavaScripts to limit segments to 2048 characters. The markup table part controls text segmentation and the recognition of inline tags.
  • The user exit part of the EQFANSI markup table converts the code page, and the markup table part inserts segment breaks after empty lines.

In addition, OpenTM2 provides a user exit that you can use with the appropriate markup table. This user exit is a dynamic-link library (DLL) with predefined entry points. The code for the exit can be written in any programming language that supports PASCAL calling conventions. The include file EQF_API.H contains the definitions required for a user exit written in C.

The user exit is activated using the <SEGMENTEXIT> tag of the markup table (see also Segment exit described in Creating new markup tables).

General user exit entry points

The user exit entry points (their names start with EQF) are called at different stages during the analysis, translation, and export of a document.

  • During the analysis (see Figure 164):
    • EQFPRESEG2 is called before the text is segmented. It can be used to preprocess a document and decide whether text segmentation is done by OpenTM2 after EQFPRESEG2.
    • EQFPOSTSEGW is called after the text is segmented. It can be used to postprocess a document.
    • EQFPOSTTMW is called after Translation Memory matches are processed and terms lists are created. It can be used to modify segments.

Eqfb7s19a.GIF

Figure 164. Analysis of a document using the user exit

 

  • During the translation:
    • EQFCHECKSEGW is called after a segment is translated but before it is saved in the Translation Memory. It can be used to modify a segment.
    • EQFSHOW is called when the user selects the “Show translation” menu item.
  • During the export (see Figure 165):
    • EQFPREUNSEGW is called before OpenTM2 removes the segmentation from a document. It can be used to remove the segmentation itself, or for whatever else is required at this step.
    • EQFPOSTUNSEG2 is called after OpenTM2 (or EQFPREUNSEG2) has removed the segmentation. It can be used, for example, to establish the external document format.
    • Alternatively, EQFPOSTUNSEGW can be called after OpenTM2 (or EQFPREUNSEG2) has removed the segmentation. If the EQFPOSTUNSEGW entry point exists, OpenTM2 uses it regardless of whether EQFPOSTUNSEG2 exists. EQFPOSTUNSEGW requires that the input text is always UTF-16. If the EQFPOSTUNSEGW entry point exists, OpenTM2’s “Undo text segmentation” step outputs a UTF-16 file.

Eqfb7s19b.GIF

Figure 165. Export of a document using the user exit
The following sections describe the individual entry points in detail. Note that entry points from earlier versions of OpenTM2 (without the trailing letter W) are supported, and the calling syntax remains unchanged. However, you should use the entry points as listed in this section. See Compatibility notes concerning Unicode support for details.

EQFPRESEG2

Purpose

EQFPRESEG2 is called during the analysis of a document before the text is segmented. It preprocesses the document, for example converts code pages, and decides whether text segmentation is done by OpenTM2 or EQFPRESEG2 itself. If an error occurs, it can stop the analysis.

Format

Chap22 Figure1.gif

Parameters
  • MarkupTable
The pointer to the name of a markup table.
  • Editor
The pointer to the name of the editor.
  • Path
The pointer to the program path.
  • SourceFile
The pointer to the name of the source file (with full path).
  • Buffer
The pointer to the buffer containing the name of the temporary output file.
  • OutputFlag
The output flag indicating whether the text is to be segmented by EQFPRESEG2 instead of OpenTM2.
  • SliderWindowHandle
The handle of the slider window.
  • ReturnFlag
The pointer to the return flag. If this flag changes to TRUE, the user exit must return immediately.

EQFPRESEGEX

Purpose

EQFPRESEGEX is called during the analysis of a document before the text is segmented. It preprocesses the document, for example converts code pages, and decides whether text segmentation is done by OpenTM2 or EQFPRESEGEX itself. If an error occurs, it can stop the analysis. The EQFPRESEGEX entry point is identical to EQFPRESEG2 except for the additional parameter Analysis handle.

Format

Chap22 Figure2.gif

Parameters
  • MarkupTable
The pointer to the name of a markup table.
  • Editor
The pointer to the name of the editor.
  • Path
The pointer to the program path.
  • SourceFile
The pointer to the name of the source file (with full path).
  • Buffer
The pointer to the buffer containing the name of the temporary output file.
  • OutputFlag
The output flag indicating whether the text is to be segmented by EQFPRESEGEX instead of OpenTM2.
  • SliderWindowHandle
The handle of the slider window.
  • ReturnFlag
The pointer to the return flag. If this flag changes to TRUE, the user exit must return immediately.
  • AnalysisHandle
The analysis handle. This handle is required for the API calls EQFSETTAOPTIONS and EQFGETTAOPTIONS.

EQFPOSTSEGW

Purpose

EQFPOSTSEGW is called during the analysis of a document after the text is segmented. It postprocesses the document, for example adjusts segment boundaries. If an error occurs, it can stop the analysis.

Format

Chap22 Figure3.gif

Parameters
  • MarkupTable
The pointer to the name of a markup table.
  • Editor
The pointer to the name of the editor.
  • Path
The pointer to the program path.
  • SourceFile
The pointer to the name of the source file (with full path).
  • TargetFile
The pointer to the name of the target file.
  • SegmentationTags
The pointer to the tags inserted during text segmentation.
  • SliderWindowHandle
The handle of the slider window.
  • ReturnFlag
The pointer to the return flag. If this flag changes to TRUE, the user exit must return immediately.

EQFPOSTSEGWEX

Purpose

EQFPOSTSEGWEX is called during the analysis of a document after the text is segmented. It postprocesses the document, for example adjusts segment boundaries. If an error occurs, it can stop the analysis. The EQFPOSTSEGWEX entry point is identical to EQFPOSTSEGW except for the additional parameter Analysis handle.

Format

Chap22 Figure4.gif

Parameters
  • MarkupTable
The pointer to the name of a markup table.
  • Editor
The pointer to the name of the editor.
  • Path
The pointer to the program path.
  • SourceFile
The pointer to the name of the source file (with full path).
  • TargetFile
The pointer to the name of the target file.
  • SegmentationTags
The pointer to the tags inserted during text segmentation.
  • SliderWindowHandle
The handle of the slider window.
  • ReturnFlag
The pointer to the return flag. If this flag changes to TRUE, the user exit must return immediately.
  • AnalysisHandle
The analysis handle. This handle is required for the API calls EQFSETTAOPTIONS and EQFGETTAOPTIONS.

EQFPOSTTMW

Purpose

EQFPOSTTMW is called during the analysis of a document after Translation Memory matches have been inserted and terms lists have been created. It is used to modify the segments. If an error occurs, it can stop the analysis.

Format

Chap22 Figure5.gif

Parameters
  • Editor
The pointer to the name of the editor.
  • Path
The pointer to the program path.
  • SegmentedSourceFile
The pointer to the name of the segmented source file.
  • SegmentedTargetFile
The pointer to the name of the segmented target file.
  • SegmentationTags
The pointer to the tags inserted during text segmentation.
  • SourceTargetFlag
The flag indicating if the segmented source differs from the segmented target.
  • SliderWindowHandle
The handle of the slider window.
  • ReturnFlag
The pointer to the return flag. If this flag changes to TRUE, the user exit must return immediately.

EQFCHECKSEGW

Purpose

EQFCHECKSEGW is called during the translation of a document after a segment has been translated but not yet saved in the Translation Memory. It can modify the segment, for example change lowercase characters to uppercase, and prevent the segment from being saved, for example if specific length limits have been exceeded.

EQFCHECKSEGW is also called when exact matches are automatically substituted during the analysis of a document.

Format

Chap22 Figure6.gif

Parameters
  • PreviousSourceSegment
The pointer to the text of the previous source segment.
  • CurrentSourceSegment
The pointer to the text of the current source segment.
  • Translation
The pointer to the translation of the current segment.
  • ModifyFlag
The pointer to the flag that is set when the user exit has modified the translated segment.
  • MessageFlag
The flag indicating whether a message box is shown.
Return code

The return code indicates if the segment can be saved.

EQFSHOW

Purpose

EQFSHOW is called during the translation of a document when the user selects the “Show Translation” menu item. It is up to the user exit to prepare and display the document in a window. The user exit can use the API calls EQFGETNEXTSEG, EQFGETNEXTSEGW, EQFGETPREVSEG, EQFGETPREVSEGW, EQFGETCURSEG, EQFGETCURSEGW and EQFGETINFO to retrieve the document segments and to get other document information.

Format

Chap22 Figure7.gif

Parameters
  • lInfo
A handle to the target document. This handle has to be specified in the API calls for accessing the segment text.
  • hwndParent
The handle of the window which should be specified as parent window for the window displaying the document.
Return code

The user exit should return TRUE if the document could be displayed and FALSE in case of errors.

EQFGETCURSEG

Purpose

EQFGETCURSEG returns a specific segment from the document identified by the lInfo handle. The text of the segment is stored in the buffer pointed to by pBuffer as a zero terminated string. The variable pointed to by pusSegNum contains the number of the requested segment.

Format

Chap22 Figure8.gif

Parameters
  • lInfo
The document handle which has been passed to the user exit as the first parameter of the EQFSHOW entry point.
  • pusSegNum
The pointer to a USHORT variable containing the segment number.
  • pBuffer
The pointer to a buffer for the segment text.
  • pusBufSize
The pointer to a USHORT variable containing the size of the buffer pointed to by pBuffer.
Return code

The function returns zero if successful; otherwise, it returns an error code.

EQFGETCURSEGW

Purpose

EQFGETCURSEGW returns a specific segment from the document identified by the lInfo handle. The text of the segment is stored in the buffer pointed to by pBuffer in UTF-16 encoding and is terminated by 0x0000. The variable pointed to by pulSegNum contains the number of the requested segment.

Format

Chap22 Figure9.gif

Parameters
  • lInfo
The document handle which has been passed to the user exit as the first parameter of the EQFSHOW entry point.
  • pulSegNum
The pointer to a ULONG variable containing the segment number.
  • pBuffer
The pointer to a buffer for the segment text in UTF-16 encoding.
  • pusBufSize
The pointer to a USHORT variable containing the size of the buffer pointed to by pBuffer in number of UTF-16 characters.
Return code

The function returns zero if successful; otherwise, it returns an error code.

EQFGETNEXTSEG

Purpose

EQFGETNEXTSEG returns the next segment from the document identified by the lInfo handle. The text of the segment is stored in the buffer pointed to by pBuffer as a zero-terminated string. The API call increments the segment number automatically.

Format

Chap22 Figure10.gif

Parameters
  • lInfo
The document handle which has been passed to the user exit as the first parameter of the EQFSHOW entry point.
  • pusSegNum
The pointer to a USHORT variable containing the segment number. This variable should be set to 1 before the first call. The segment number is automatically incremented.
  • pBuffer
The pointer to a buffer for the segment text.
  • pusBufSize
The pointer to a USHORT variable containing the size of the buffer pointed to by pBuffer.
Return code

The function returns zero if successful; otherwise, it returns an error code.

EQFGETNEXTSEGW

Purpose

EQFGETNEXTSEGW returns the next segment from the document identified by the lInfo handle. The text of the segment is stored in the buffer pointed to by pBuffer in UTF-16 encoding and is terminated by 0x0000. The API call increments the segment number automatically.

Format

Chap22 Figure11.gif

Parameters
  • lInfo
The document handle which has been passed to the user exit as the first parameter of the EQFSHOW entry point.
  • pulSegNum
The pointer to a ULONG variable containing the segment number. This variable should be set to 1 before the first call. The segment number is automatically incremented.
  • pBuffer
The pointer to a buffer for the segment text in UTF-16 encoding.
  • pusBufSize
The pointer to a USHORT variable containing the size of the buffer in number of UTF-16 characters.
Return code

The function returns zero if successful; otherwise, it returns an error code.

EQFGETPREVSEG

Purpose

EQFGETPREVSEG returns the previous segment from the document identified by the lInfo handle. The text of the segment is stored in the buffer pointed to by pBuffer as a zero-terminated string. The API call decrements the segment number automatically.

Format

Chap22 Figure12.gif

Parameters
  • lInfo
The document handle which has been passed to the user exit as the first parameter of the EQFSHOW entry point.
  • pusSegNum
The pointer to a USHORT variable containing the segment number. The segment number is automatically decremented.
  • pBuffer
The pointer to a buffer for the segment text.
  • pusBufSize
The pointer to a USHORT variable containing the size of the buffer pointed to by pBuffer.
Return code

The function returns zero if successful; otherwise, it returns an error code.

EQFGETPREVSEGW

Purpose

EQFGETPREVSEGW returns the previous segment from the document identified by the lInfo handle. The text of the segment is stored in the buffer pointed to by pBuffer in UTF-16 encoding and is terminated by 0x0000. The API call decrements the segment number automatically.

Format

Chap22 Figure13.gif

Parameters
  • lInfo
The document handle which has been passed to the user exit as the first parameter of the EQFSHOW entry point.
  • pulSegNum
The pointer to a ULONG variable containing the segment number. The segment number is automatically decremented.
  • pBuffer
The pointer to a buffer for the segment text in UTF-16 encoding.
  • pusBufSize
The pointer to a USHORT variable containing the size of the buffer pointed to by pBuffer in number of UTF-16 characters.
Return code

The function returns zero if successful; otherwise, it returns an error code.

EQFBUILDDOCPATH

Purpose

EQFBUILDDOCPATH creates the fully qualified file name for an OpenTM2 document using the folder object name and the document long name. This function can be used to access documents stored in OpenTM2 folders.

Format

Chap22 Figure14.gif

Parameters
  • szFolObjName
The folder object name as returned using EQFGETINFO with the GETINFO_FOLDEROBJECT ID.
  • szDocLongName
The document long name.
  • PathID
The ID of the requested document path. Valid IDs are:
PATHID_SOURCE to build the path to the source document
PATHID_SEGSOURCE to build the path to the segmented source document
PATHID_SEGTARGET to build the path to the segmented target document
PATHID_TARGET to build the path to the target document
  • pchBuffer
The pointer to a buffer receiving the fully qualified document path. The size of this buffer has to be at least 60 bytes.
Return code
  • 0
function completed successfully
  • ERROR_INVALID_PARAMETER
wrong or missing parameter
  • ERROR_PATH_NOT_FOUND
the folder does not exist
  • ERROR_FILE_NOT_FOUND
the document does not exist
Examples

The folder “AnotherTestFolder” contains the document “myTest.HTML”. The folder is located on drive “E:” and has a short name of “ANOTH000.F00”. The document short name is “MYTESTHT.000”. The primary drive of the OpenTM2 installation is “C:”.

EQFBUILDDOCPATH( “C:\EQF\ANOTH000.F00”, “myTest.HTML”, PATHID_SOURCE, szBuffer ) would return “E:\EQF\ANOTH000.F00\SOURCE\MYTESTHT.000” in szBuffer.

EQFGETINFO

Purpose

EQFGETINFO returns specific information on the document currently being processed in the EQFSHOW function of the user exit. This function is used by the user exit to get more information concerning the document and its location.

Format

Chap22 Figure15.gif

Parameters
  • lInfo
The info handle passed to the user exit in the EQFSHOW call.
  • InfoID
The ID of the requested information, valid IDs are:
GETINFO_MARKUP to retrieve the markup table of the document
GETINFO_FOLDEROBJECT to retrieve the object name of the folder containing the document
GETINFO_FOLDERLONGNAME to retrieve the long name (in ASCII) of the folder containing the document
GETINFO_DOCFULLPATH to retrieve the fully qualified path of the document segmented target file
GETINFO_DOCLONGNAME to retrieve the document long name
  • pchBuffer
The pointer to a buffer receiving the requested information. If this parameter is NULL, the size of the requested information is returned using the pusBufSize parameter.
  • pusBufSize
The pointer to a USHORT value containing the buffer size. On return, this value contains the size of the returned information.
Return code
  • 0
function completed successfully
  • ERROR_INVALID_PARAMETER
unknown InfoID or missing parameter
  • ERROR_INVALID_HANDLE
invalid lInfo handle
  • ERROR_NOT_ENOUGH_MEMORY
not enough memory / memory allocation failed
  • ERROR_INSUFFICIENT_BUFFER
buffer is not large enough for the returned information; *pusBufSize contains the required buffer size
Examples

Assume that the document “myTest.HTML”, located in folder “AnotherTestFolder”, is opened using EQFSHOW. The folder is located on drive “E:” and has a short name of “ANOTH000.F00”. The document short name is “MYTESTHT.000”. The primary drive of the OpenTM2 installation is “C:”.

  • usBufSize = sizeof(szBuffer);
EQFGETINFO( lInfo, GETINFO_MARKUP, szBuffer, &usBufSize) would return “IBMHTM32” in szBuffer
  • usBufSize = sizeof(szBuffer);
EQFGETINFO( lInfo, GETINFO_FOLDEROBJECT, szBuffer, &usBufSize) would return “C:\EQF\ANOTH000.F00” in szBuffer
  • usBufSize = sizeof(szBuffer);
EQFGETINFO( lInfo, GETINFO_FOLDERLONGNAME, szBuffer, &usBufSize ) would return “AnotherTestFolder” in szBuffer
  • usBufSize = sizeof(szBuffer);
EQFGETINFO( lInfo, GETINFO_DOCFULLPATH, szBuffer, &usBufSize ) would return “E:\EQF\ANOTH000.F00\STARGET\MYTESTHT.000” in szBuffer
  • usBufSize = sizeof(szBuffer);
EQFGETINFO( lInfo, GETINFO_DOCLONGNAME, szBuffer, &usBufSize ) would return “myTest.HTML” in szBuffer
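
The NULL-buffer behavior described for pchBuffer suggests a two-step calling pattern: query the required size first, then allocate a buffer and fetch the value. The following C sketch illustrates that pattern against a stand-in function, demo_get_info, which is hypothetical and only mimics the contract described above; it is not the OpenTM2 implementation, and the real prototype is the one shown in the Format figure. The numeric error value and the assumption that the reported size includes the terminating zero are also assumptions made for this example.

```c
#include <stdlib.h>
#include <string.h>

typedef unsigned short USHORT;

/* Assumed value for illustration only; use the definition from the real headers. */
#define DEMO_ERROR_INSUFFICIENT_BUFFER 122

/* Hypothetical stand-in that mimics the documented contract:
   a NULL buffer returns the required size through pusBufSize. */
static USHORT demo_get_info(const char *pszValue, char *pchBuffer, USHORT *pusBufSize)
{
    /* Assumption: the reported size includes the terminating zero. */
    USHORT usNeeded = (USHORT)(strlen(pszValue) + 1);

    if (pchBuffer == NULL) {            /* size query only */
        *pusBufSize = usNeeded;
        return 0;
    }
    if (*pusBufSize < usNeeded) {       /* report the required size */
        *pusBufSize = usNeeded;
        return DEMO_ERROR_INSUFFICIENT_BUFFER;
    }
    memcpy(pchBuffer, pszValue, usNeeded);
    *pusBufSize = usNeeded;
    return 0;
}

/* Queries the size first, then allocates exactly that much and fetches
   the value. Returns a malloc'ed string that the caller must free,
   or NULL on failure. */
static char *demo_fetch_info(const char *pszValue)
{
    USHORT usSize = 0;
    char *pchBuffer;

    if (demo_get_info(pszValue, NULL, &usSize) != 0) return NULL;
    pchBuffer = (char *)malloc(usSize);
    if (pchBuffer == NULL) return NULL;
    if (demo_get_info(pszValue, pchBuffer, &usSize) != 0) {
        free(pchBuffer);
        return NULL;
    }
    return pchBuffer;
}
```

Querying the size first avoids guessing a fixed buffer length for values such as long folder names or fully qualified paths.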

EQFPREUNSEGW

Purpose

EQFPREUNSEGW is called during the export of a document before the segmentation tags inserted by OpenTM2 are removed. It decides whether the segmentation tags are removed by OpenTM2 or by EQFPREUNSEGW itself. However, it is normally used to remove the segmentation tags. If an error occurs, it can stop the export.

Format

Chap22 Figure16.gif

Parameters
  • Editor
The pointer to the name of the editor.
  • Path
The pointer to the program path.
  • SegmentedTargetFile
The pointer to the name of the segmented target file (with full path).
  • Buffer
The pointer to the buffer containing the name of the temporary output file.
  • SegmentationTags
The pointer to the tags inserted during text segmentation.
  • OutputFlag
The output flag indicating whether the segmentation tags are removed by EQFPREUNSEGW instead of OpenTM2.
  • SliderWindowHandle
The handle of the slider window.
  • ReturnFlag
The pointer to the return flag. If this flag changes to TRUE, the user exit must return immediately.

EQFPOSTUNSEGW

Purpose

EQFPOSTUNSEGW is called during the export of a document after the segmentation tags have been removed from the text. The text must be in UTF-16. It is normally used to establish the external document format. If an error occurs, it can stop the export.

Format

Chap22 Figure17.gif

Parameters
  • MarkupTable
The pointer to the name of a markup table.
  • Editor
The pointer to the name of the editor.
  • Path
The pointer to the program path (with full path).
  • TargetFile
The pointer to the name of the target file (with full path).
  • SegmentationTags
The pointer to the tags inserted during text segmentation.
  • ReturnFlag
The pointer to the return flag. If this flag changes to TRUE, the user exit must return immediately.

EQFPOSTUNSEG2

Purpose

EQFPOSTUNSEG2 is called during the export of a document after the segmentation tags have been removed from the text. It is normally used to establish the external document format. If an error occurs, it can stop the export.

Format

Chap22 Figure18.gif

Parameters
  • MarkupTable
The pointer to the name of a markup table.
  • Editor
The pointer to the name of the editor.
  • Path
The pointer to the program path (with full path).
  • TargetFile
The pointer to the name of the target file (with full path).
  • SegmentationTags
The pointer to the tags inserted during text segmentation.
  • ReturnFlag
The pointer to the return flag. If this flag changes to TRUE, the user exit must return immediately.

API calls for user exits

This group contains the API calls that markup table user exits can call to access and modify OpenTM2 settings. Currently these are:

  • EQFGETTAOPTIONS
  • EQFSETTAOPTIONS

The following sections describe the individual API calls in detail.

EQFGETTAOPTIONS

Purpose

EQFGETTAOPTIONS can be used by the markup table user exit to retrieve the currently active analysis settings. The settings are returned in an EQFTAOPTIONS structure. The analysis handle used by this call is passed to the user exit by the user exit entry points EQFPRESEGEX and EQFPOSTSEGWEX.

Format

Chap22 Figure19.gif

Parameters
  • AnalysisHandle
The analysis handle passed to the user exit by the entry points EQFPRESEGEX and EQFPOSTSEGWEX.
  • Options
The pointer to an EQFTAOPTIONS structure receiving the currently active analysis settings.

EQFSETTAOPTIONS

Purpose

EQFSETTAOPTIONS can be used by the markup table user exit to change the currently active analysis settings. The settings are passed to the API call in an EQFTAOPTIONS structure. The analysis handle used by this call is passed to the user exit by the user exit entry points EQFPRESEGEX and EQFPOSTSEGWEX.

Format

Chap22 Figure20.gif

Parameters
  • AnalysisHandle
The analysis handle passed to the user exit by the entry points EQFPRESEGEX and EQFPOSTSEGWEX.
  • Options
The pointer to an EQFTAOPTIONS structure containing the analysis settings being modified.

EQFTAOPTIONS

Purpose

The structure EQFTAOPTIONS is used by the API calls EQFSETTAOPTIONS and EQFGETTAOPTIONS to get or set the analysis options.

Fields
  • fAdjustLeadingWS
This flag represents the “Adjust leading whitespace to whitespace of source segment” flag of the GUI.
  • fAdjustTrailingWS
This flag represents the “Adjust trailing whitespace to whitespace of source segment” flag of the GUI.
  • bForFutureUse
Area for future enhancements. Currently not in use.
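
Because the get and set calls both work on the complete structure, a user exit that wants to change a single flag would typically read the active settings, modify that flag, and write the structure back. The following C sketch shows this read-modify-write sequence with stand-ins only: DEMO_TAOPTIONS, DEMO_ANALYSIS, demo_get_ta_options, and demo_set_ta_options are hypothetical names, and the field types and the size of the reserved area are assumptions. The real definitions are those in the OpenTM2 headers and the Format figures.

```c
/* Stand-in for EQFTAOPTIONS: the field names follow the list above,
   the field types and the size of the reserved area are assumptions. */
typedef struct {
    int           fAdjustLeadingWS;
    int           fAdjustTrailingWS;
    unsigned char bForFutureUse[20];
} DEMO_TAOPTIONS;

/* Hypothetical analysis state that plays the role of the analysis handle. */
typedef struct {
    DEMO_TAOPTIONS Options;
} DEMO_ANALYSIS;

/* Stand-in for the "get" call: copies the active settings to the caller. */
static int demo_get_ta_options(const DEMO_ANALYSIS *pAnalysis, DEMO_TAOPTIONS *pOptions)
{
    if (pAnalysis == 0 || pOptions == 0) return 1;
    *pOptions = pAnalysis->Options;
    return 0;
}

/* Stand-in for the "set" call: replaces the active settings. */
static int demo_set_ta_options(DEMO_ANALYSIS *pAnalysis, const DEMO_TAOPTIONS *pOptions)
{
    if (pAnalysis == 0 || pOptions == 0) return 1;
    pAnalysis->Options = *pOptions;
    return 0;
}

/* Reads the active settings, switches on one flag, and writes the
   settings back, so that all other fields keep their current values. */
static int demo_enable_leading_ws(DEMO_ANALYSIS *pAnalysis)
{
    DEMO_TAOPTIONS Options;
    int rc = demo_get_ta_options(pAnalysis, &Options);

    if (rc != 0) return rc;
    Options.fAdjustLeadingWS = 1;
    return demo_set_ta_options(pAnalysis, &Options);
}
```

Starting from the retrieved structure rather than from a zeroed one keeps the other flag and the reserved area unchanged.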

User exit entry points for context-dependent translations

The following user exit entry points support context-dependent translations, where translation proposals and automatic translations not only depend on text matches but also on the type of document containing the text. These entry points are designed to support the translation of Lotus Notes and Domino design elements, such as Notes database files, template files, and application templates. When OpenTM2 imports these documents (using the LOTUSNGD markup table), it maintains context-dependent information about these design elements together with existing translations in the Translation Memory. If the user exit is used by the markup table, OpenTM2 uses the context information and the translation proposals to identify matches on the segments to be translated.

  • EQFGETCONTEXTINFO is called once when a markup table is loaded. It returns information about the number and the names of context strings used in the Translation Memory, and it controls (based on the availability of context information) whether further context information processing is performed.
  • EQFGETSEGCONTEXT is called before a translated segment is saved in the Translation Memory. It gets the context strings from the user exit and passes them to the Translation Memory.
  • EQFUPDATECONTEXT is called subsequently for every segment during the analysis of a document and updates the user exit with the context strings from the Translation Memory for the current segment.
  • EQFCOMPARECONTEXT is called for every segment and compares and ranks a segment’s context information against Translation Memory proposals.

OpenTM2 uses these user exit entry points to support the translation of Lotus Notes forms that contain the Form, Subform, Title, and Subtitle context strings.

EQFGETCONTEXTINFO

Purpose

EQFGETCONTEXTINFO is called once when a new markup table is loaded into the Translation Memory. It returns the number of context strings that are used by this markup and the names of these context strings (for example, Panel ID for MRI markup). If a markup table user exit does not support this entry point, or returns an error code, no further context information processing is performed for this markup table (neither EQFGETSEGCONTEXT, EQFUPDATECONTEXT, nor EQFCOMPARECONTEXT is called).

Format

Chap22 Figure21.gif

Parameters
  • pusNumOfContextStrings
The pointer to a USHORT variable receiving the number of context strings that are used by this markup.
  • pContextNames
The pointer to a UTF-16 buffer for the context names. This buffer has a size of MAX_CONTEXT_LEN (4096) characters. The context names are stored as a list of UTF-16 strings, and the list is terminated by 0x0000. Currently the names are not used. In a later version these names will be used in the translation environment to display the context of a segment.
Return code

The return code indicates whether context information could be returned.
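
The list format used for context names and context strings can be illustrated with a small helper that fills such a buffer. The sketch below assumes that the list consists of consecutive zero-terminated UTF-16 strings followed by one additional 0x0000 that ends the list; this layout is an interpretation of the description above. The helper names and DEMO_MAX_CONTEXT_LEN are hypothetical and not part of OpenTM2.

```c
#include <stddef.h>
#include <stdint.h>

/* Buffer size in UTF-16 characters, taken from the description above. */
#define DEMO_MAX_CONTEXT_LEN 4096

/* Length of a zero-terminated UTF-16 string, in characters. */
static size_t demo_u16len(const uint16_t *psz)
{
    size_t n = 0;
    while (psz[n] != 0) n++;
    return n;
}

/* Writes the given names into pBuffer as consecutive zero-terminated
   strings followed by a final 0x0000 (assumed list layout).
   Returns 0 on success, or 1 if the list does not fit into
   cchBuffer characters. */
static int demo_build_context_list(const uint16_t *const *ppNames, size_t cNames,
                                   uint16_t *pBuffer, size_t cchBuffer)
{
    size_t pos = 0;
    size_t i, j, len;

    for (i = 0; i < cNames; i++) {
        len = demo_u16len(ppNames[i]);
        /* room for the string, its terminator, and the list terminator */
        if (pos + len + 2 > cchBuffer) return 1;
        for (j = 0; j < len; j++) pBuffer[pos++] = ppNames[i][j];
        pBuffer[pos++] = 0;
    }
    if (pos + 1 > cchBuffer) return 1;
    pBuffer[pos] = 0;                   /* list terminator */
    return 0;
}
```

With this layout an empty string cannot be stored as a list element, because it would be indistinguishable from the end of the list.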

EQFGETSEGCONTEXT

Purpose

EQFGETSEGCONTEXT returns the context strings for a given segment and passes them to the Translation Memory functions before a segment is about to be saved in the Translation Memory. This function is used by the editor during the translation. Using the supplied document handle the function can go backward or forward to other segments if necessary (for example, for an MRI markup it is necessary to go back to the segment containing the panel ID).

Format

Chap22 Figure22.gif

Parameters
  • pCurSeg
The pointer to a zero-terminated UTF-16 string containing the text of the current segment.
  • pPrevSeg
The pointer to a zero-terminated UTF-16 string that contains the text of the previous segment (NULL, if there is none).
  • pNextSeg
The pointer to a zero-terminated UTF-16 string that contains the text of the next segment (NULL, if there is none).
  • pContextStrings
The pointer to a UTF-16 buffer for the context strings. This buffer has a size of MAX_CONTEXT_LEN (4096) characters. The context strings are stored as a list of UTF-16 strings, and the list is terminated by 0x0000.
  • hEditor
The handle of type HANDLE, which is required for the EQFGETNEXTSEG and EQFGETPREVSEG functions.
Return code

The return code indicates whether context strings could be returned.

EQFUPDATECONTEXT

Purpose

EQFUPDATECONTEXT is called subsequently during the analysis of a document. If the current segment in the Translation Memory contains context information, this function updates the user exit with the context strings for this segment. The retrieved context strings are used to identify exact context matches with the EQFCOMPARECONTEXT function.

Format

Chap22 Figure23.gif

Parameters
  • pSeg
The pointer to a zero-terminated UTF-16 string containing the text of the current segment.
  • pContextStrings
The pointer to a UTF-16 buffer containing the current context strings and receiving the updated context strings. This buffer has a size of MAX_CONTEXT_LEN (4096) characters. The context strings are stored as a list of UTF-16 strings, and the list is terminated by 0x0000.
Return code

The return code indicates whether context strings could be updated.

EQFCOMPARECONTEXT

Purpose

EQFCOMPARECONTEXT is called for every segment that has an exact text match and context information available. The function compares the context strings of a segment against the context strings of a Translation Memory proposal and ranks the match between 0 and 100. 0 means no context match at all, and 100 means an exact context match.

During an analysis only exact text matches and exact context matches of a segment lead to automatic substitutions. During a translation, the ranks are used to identify the best translation proposals.

Format

Chap22 Figure24.gif

Parameters
  • pContextStrings1
The pointer to a buffer containing the context strings of the current segment. The context strings are stored as a list of UTF-16 strings, and the list is terminated by 0x0000.
  • pContextStrings2
The pointer to a buffer containing the context strings of the proposal. The context strings are stored as a list of UTF-16 strings, and the list is terminated by 0x0000.
  • pusRanking
The pointer to the variable receiving the ranking for the context strings.
Return code

The return code indicates whether context information could be compared.
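
How a user exit ranks two context lists is up to the exit. The following C sketch shows one possible scheme, which is an assumption and not the ranking used by any OpenTM2 markup table: the lists are compared position by position, and the rank is the percentage of positions that hold identical strings. The helper names are hypothetical, and the list layout (zero-terminated UTF-16 strings followed by a final 0x0000) is the interpretation assumed here.

```c
#include <stdint.h>

/* Returns 1 if two zero-terminated UTF-16 strings are identical. */
static int demo_u16equal(const uint16_t *a, const uint16_t *b)
{
    while (*a != 0 && *a == *b) { a++; b++; }
    return *a == *b;
}

/* Advances past one zero-terminated string of a list. */
static const uint16_t *demo_next_string(const uint16_t *p)
{
    while (*p != 0) p++;
    return p + 1;
}

/* Ranks two context lists position by position:
   rank = 100 * identical positions / positions in the longer list.
   Two empty lists are treated as an exact match (a design choice
   made for this sketch). */
static unsigned short demo_rank_context(const uint16_t *pList1, const uint16_t *pList2)
{
    unsigned total = 0;
    unsigned matches = 0;

    while (*pList1 != 0 || *pList2 != 0) {
        total++;
        if (*pList1 != 0 && *pList2 != 0 && demo_u16equal(pList1, pList2)) matches++;
        if (*pList1 != 0) pList1 = demo_next_string(pList1);
        if (*pList2 != 0) pList2 = demo_next_string(pList2);
    }
    return (unsigned short)(total == 0 ? 100 : (100 * matches) / total);
}
```

A rank of 100 is produced only when both lists have the same number of strings and every pair is identical, which corresponds to the exact context match required for automatic substitution during analysis.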

Parser application programming interface

The following functions are internal OpenTM2 parsing functions that are made available to expand the possibilities of user exits. Their main purposes are:

  • To access and modify segmented documents on a segment base.
Documents can be loaded, and their segments can be retrieved and modified. Segments can be converted into an SGML tagged format. Code conversions can be done, and some document properties can be retrieved. Modified documents can be saved.
  • To access and tokenize markup tables to get information about markup tags and property information.
Markup tables can be loaded and tokenized, and the properties of markup tags can be accessed.

Because these are basically parsing functions, their names start with “Pars”. Function names ending with “W” are for Unicode documents, and for markup tables to be used with Unicode documents.

Note that these functions are not called at defined OpenTM2 processing steps (as opposed to the entry points described in General user exit entry points and User exit entry points for context-dependent translations). However, they are well suited to be used in the code of one or more of these entry points. For example, they can be used to create or clean up markup tables. A sample parser that uses these parser API functions can be found in file parssamp.c in directory \eqf\nondde\.

Further details about these functions, like the definition of data types, can be found in file eqfpapi.h in the same directory.

The following sections describe the parser API functions in detail. Where applicable, the parser API functions are enabled for Unicode UTF-16 support.

ParsInitialize

Purpose

ParsInitialize initializes the parser API environment and creates a parser API handle that is to be used in most of the other parser API functions.

Format

Chap22 Figure25.gif

Parameters
Type Parameter Description
HPARSER phParser The pointer to the buffer for the parser API handle.
CHAR pszDocPathName The pointer to the zero-terminated document path name.
Return code

Integer of 0, if the environment is successfully initialized, or an error code.

ParsBuildTempName

Purpose

ParsBuildTempName builds a temporary file name based on the fully qualified file name of the source document.

Format

Chap22 Figure26.gif

Parameters
Type Parameter Description
PSZ pszSourceName The pointer to the zero-terminated fully qualified file name of the source document. The name serves as the model for the temporary file name.
PSZ pszTempName The pointer to the zero-terminated temporary file name. The buffer for the file name should have a size of 128 bytes or more.
Return code

Integer of 0, if the file name is successfully built, or an error code.

ParsLoadSegFile

Purpose

ParsLoadSegFile loads a segmented file into memory.

Format

Chap22 Figure27.gif

Parameters
Type Parameter Description
HPARSER hParser The parser API handle, created by the ParsInitialize function.
CHAR pszFileName The pointer to the zero-terminated fully qualified file name of the document to be loaded into memory.
HPARSSEGFILE phSegFile The pointer to the buffer in memory that receives the segmented file.
Return code

Integer of 0, if the file is successfully loaded, or an error code.

ParsGetSegNum

Purpose

ParsGetSegNum returns the number of segments of the segmented file loaded into memory.

Format

Chap22 Figure28.gif

Parameters
Type Parameter Description
HPARSSEGFILE phSegFile The handle of the segmented file in memory.
LONG plSegCount The pointer to the buffer that receives the number of segments.
Return code

Integer of 0, if the number is successfully retrieved, or an error code.

ParsGetSeg

Purpose

ParsGetSeg gets a segment from the segmented file loaded into memory. If the segment is in Unicode format, use ParsGetSegW.

Format

Chap22 Figure29.gif

Parameters
Type Parameter Description
HPARSSEGFILE hSegFile The handle of the segmented file in memory.
LONG lSegNum The number of the segment to get.
PPARSSEGMENT pSeg The pointer to the buffer that receives the segment data.
Return code

Integer of 0, if the segment is successfully retrieved, or an error code.

ParsGetSegW

Purpose

ParsGetSegW gets a segment from the segmented file loaded into memory. If the segment is not in Unicode format, use ParsGetSeg.

Format

Chap22 Figure30.gif

Parameters
Type Parameter Description
HPARSSEGFILE hSegFile The handle of the segmented file in memory.
LONG lSegNum The number of the segment to get.
PPARSSEGMENTW pSeg The pointer to the buffer that receives the segment data.
Return code

Integer of 0, if the segment is successfully retrieved, or an error code.

ParsUpdateSeg

Purpose

ParsUpdateSeg updates a segment of the segmented file loaded into memory. If the segment is in Unicode format, use ParsUpdateSegW.

Format

Chap22 Figure31.gif

Parameters
Type Parameter Description
HPARSSEGFILE hSegFile The handle of the segmented file in memory.
LONG lSegNum The number of the segment to update.
PPARSSEGMENT pSeg The pointer to the buffer that holds the new segment data.
Return code

Integer of 0, if the segment is successfully updated, or an error code.

ParsUpdateSegW

Purpose

ParsUpdateSegW updates a segment of the segmented file loaded into memory. If the segment is not in Unicode format, use ParsUpdateSeg.

Format

Chap22 Figure32.gif

Parameters
Type Parameter Description
HPARSSEGFILE hSegFile The handle of the segmented file in memory.
LONG lSegNum The number of the segment to update.
PPARSSEGMENTW pSeg The pointer to the buffer that holds the new segment data.
Return code

Integer of 0, if the segment is successfully updated, or an error code.

ParsWriteSegFile

Purpose

ParsWriteSegFile writes the segmented file in memory to an external file.

Format

Chap22 Figure33.gif

Parameters
Type Parameter Description
HPARSSEGFILE hSegFile The handle of the segmented file in memory.
CHAR pszFileName The pointer to the zero-terminated fully qualified file name of the document.
Return code

Integer of 0, if the file is successfully written, or an error code.

ParsMakeSGMLSegment

Purpose

ParsMakeSGMLSegment builds an SGML tagged segment as used in segmented files. If the segment is in Unicode format, use ParsMakeSGMLSegmentW.

Format

Chap22 Figure34.gif

Parameters
Type Parameter Description
HPARSER hParser The parser API handle, created by the ParsInitialize function.
PPARSSEGMENT pSegment The pointer to the buffer that holds the segment data.
CHAR pszBuffer The pointer to the buffer that receives the zero-terminated SGML-tagged segment. The buffer size for the segment should be at least twice the maximum segment size.
INT iBufferSize The size of pszBuffer.
BOOL fSourceFile
  • TRUE
Create SGML for a segmented source file.
  • FALSE
Create SGML for a segmented target file.
Return code

Integer of 0, if the segment is successfully built, or an error code.

ParsMakeSGMLSegmentW

Purpose

ParsMakeSGMLSegmentW builds an SGML tagged segment as used in segmented files. If the segment is not in Unicode format, use ParsMakeSGMLSegment.

Format

Chap22 Figure35.gif

Parameters
Type Parameter Description
HPARSER hParser The parser API handle, created by the ParsInitialize function.
PPARSSEGMENTW pSegment The pointer to the buffer that holds the segment data.
WCHAR* pszBuffer The pointer to the buffer that receives the zero-terminated SGML-tagged segment (in Unicode UTF-16 format). The buffer size for the segment should be at least twice the maximum segment size.
INT iBufferSize The size of pszBuffer.
BOOL fSourceFile
  • TRUE
Create SGML for a segmented source file.
  • FALSE
Create SGML for a segmented target file.
Return code

Integer of 0, if the segment is successfully built, or an error code.

ParsConvert

Purpose

ParsConvert performs an in-place conversion from ASCII to ANSI, or vice versa.

Format

Chap22 Figure36.gif

Parameters
Type Parameter Description
HPARSER hParser The parser API handle, created by the ParsInitialize function.
PARSCONVERSION Conversion The conversion mode:

  • ASCIItoANSI
  • ANSItoASCII
CHAR pszData The pointer to the zero-terminated data to be converted.
USHORT usLen The length of the data to convert.
Return code

Integer of 0, if the conversion is successful, or an error code.

ParsGetDocName

Purpose

ParsGetDocName returns the long document name.

Format

Chap22 Figure37.gif

Parameters
Type Parameter Description
HPARSER hParser The parser API handle, created by the ParsInitialize function.
CHAR pszDocName The pointer to the buffer that receives the zero-terminated long document name. The size of the buffer should be 256 bytes.
Return code

Integer of 0, if the document name is successfully returned, or an error code.

ParsGetDocLang

Purpose

ParsGetDocLang returns the language settings of the current document.

Format

Chap22 Figure38.gif

Parameters
Type Parameter Description
HPARSER hParser The parser API handle, created by the ParsInitialize function.
CHAR pszSourceLang The pointer to the buffer that receives the zero-terminated source language, or NULL. The buffer size should be 40 bytes or more.
CHAR pszTargetLang The pointer to the buffer that receives the zero-terminated target language, or NULL. The buffer size should be 40 bytes or more.
Return code

Integer of 0, if the language settings are successfully returned, or an error code.

ParsSplitSeg

Purpose

ParsSplitSeg splits text data into segments by using OpenTM2’s morphological functions. The function looks for segment breaks in the supplied data by applying the morphology for the document source language. The segment breaks are returned as a list of offsets of the segment breaks within the data. The last element in this list is zero.

If the buffer for this list is too small, the function returns an error and the first element of the list contains the required size of the list (in number of list elements).

If the text data is in Unicode format, use ParsSplitSegW.

Format

Chap22 Figure39.gif

Parameters
Type Parameter Description
HPARSER hParser The parser API handle, created by the ParsInitialize function.
CHAR pszData The pointer to the zero-terminated text data that is to be split into segments.
USHORT usDataLength The length of the text data, as number of characters.
USHORT pusSegBreaks The pointer to the buffer that receives the list of segment breaks.
USHORT usElements The size of the buffer that receives the list of segment breaks, in number of list elements.
Return code

Integer of 0, if the segment is successfully split, or an error code.
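
The zero-terminated list of offsets can be turned into start positions and lengths of the individual segments. The following C sketch shows one way to do this, under two assumptions that are not stated in the description above: each offset marks the position at which the next segment starts, and any text after the last break forms the final segment. The function demo_get_segment is hypothetical and not part of the parser API.

```c
#include <stddef.h>

typedef unsigned short USHORT;

/* Computes start and length of segment number iSeg (0-based) from the
   zero-terminated list of break offsets.
   Assumptions: each offset is the start of the next segment, and the
   text after the last break is the final segment.
   Returns 1 if the segment exists, 0 otherwise. */
static int demo_get_segment(USHORT usDataLength, const USHORT *pusSegBreaks,
                            size_t iSeg, USHORT *pusStart, USHORT *pusLength)
{
    USHORT usStart = 0;
    USHORT usEnd;
    size_t i = 0;

    /* The start of segment iSeg is the offset of break number iSeg - 1. */
    while (i < iSeg) {
        if (pusSegBreaks[i] == 0) return 0;   /* fewer segments than requested */
        usStart = pusSegBreaks[i];
        i++;
    }
    if (usStart >= usDataLength) return 0;    /* nothing left after the last break */

    /* The segment ends at the next break, or at the end of the data. */
    usEnd = (pusSegBreaks[i] != 0) ? pusSegBreaks[i] : usDataLength;
    *pusStart = usStart;
    *pusLength = (USHORT)(usEnd - usStart);
    return 1;
}
```

Because zero ends the list, an offset of zero can never appear as a break, which is consistent with the first segment always starting at the beginning of the data.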

ParsSplitSegW

Purpose

ParsSplitSegW splits text data into segments by using OpenTM2’s morphological functions. The function looks for segment breaks in the supplied data by applying the morphology for the document source language. The segment breaks are returned as a list of offsets of the segment breaks within the data. The last element in this list is zero.

If the buffer for this list is too small, the function returns an error and the first element of the list contains the required size of the list (in number of list elements).

If the text data is not in Unicode format, use ParsSplitSeg.

Format

Chap22 Figure40.gif

Parameters
Type Parameter Description
HPARSER hParser The parser API handle, created by the ParsInitialize function.
WCHAR* pszData The pointer to the zero-terminated text data (in Unicode UTF-16 format) that is to be split into segments.
USHORT usDataLength The length of the text data, as number of UTF-16 characters.
USHORT pusSegBreaks The pointer to the buffer that receives the list of segment breaks.
USHORT usElements The size of the buffer that receives the list of segment breaks, in number of list elements.
Return code

Integer of 0, if the segment is successfully split, or an error code.

ParsFreeSegFile

Purpose

ParsFreeSegFile frees a segmented file from memory.

Format

Chap22 Figure41.gif

Parameters
Type Parameter Description
HPARSSEGFILE hSegFile The handle of the segmented file in memory.
Return code

Integer of 0, if the memory is successfully freed, or an error code.

ParsLoadMarkup

Purpose

ParsLoadMarkup loads a markup table into memory for usage with the ParsTokenize or ParsTokenizeW function. The markup table is loaded from the \eqf\table directory.

Format

Chap22 Figure42.gif

Parameters
Type Parameter Description
HPARSER hParser The parser API handle, created by the ParsInitialize function.
HPARSMARKUP* phMarkup The pointer to the buffer in memory that receives the markup handle.
CHAR pszMarkup The pointer to the zero-terminated markup table name (without path and extension, for example, EQFANSI).
Return code

Integer of 0, if the markup table is successfully loaded, or an error code.

ParsTokenize

Purpose

ParsTokenize looks in the supplied text area for tags of the markup table loaded into memory. The result is a tag token list that can be processed by the ParsGetNextToken function.

If the supplied text area is in Unicode format, use ParsTokenizeW.

Format

Chap22 Figure43.gif

Parameters
Type Parameter Description
HPARSMARKUP hMarkup The markup handle, created by the ParsLoadMarkup function.
CHAR* pszData The pointer to the zero-terminated text area that is to be tokenized.
Return code

Integer of 0, if the markup table is successfully tokenized, or an error code.

ParsTokenizeW

Purpose

ParsTokenizeW looks in the supplied text area for tags of the markup table loaded into memory. The result is a tag token list that can be processed by the ParsGetNextToken function. If the supplied text area is not in Unicode format, use ParsTokenize.

Format

Chap22 Figure44.gif

Parameters
Type Parameter Description
HPARSMARKUP hMarkup The markup handle, created by the ParsLoadMarkup function.
WCHAR* pszData The pointer to the zero-terminated Unicode text area that is to be tokenized.
Return code

Integer of 0, if the markup table is successfully tokenized, or an error code.

ParsGetNextToken

Purpose

ParsGetNextToken returns the next token from the token list created by the ParsTokenize and ParsTokenizeW functions. At the end of the token list a token with a token ID of PARSTOKEN_ENDOFLIST is returned. The PARSTOKEN structure describes the token structure in detail.

Format

Chap22 Figure45.gif

Parameters
Type Parameter Description
HPARSMARKUP hMarkup The markup handle, created by the ParsLoadMarkup function.
PPARSTOKEN pToken The pointer to a PARSTOKEN structure (see The PARSTOKEN structure) that receives the data of the token.
Return code

Integer of 0, if the next token is returned, or an error code.

The PARSTOKEN structure

This structure holds the token information of a token that is returned by the ParsGetNextToken function.

Type Name Usage
INT iTokenID The token ID of the token returned. The token ID represents the position of the tag in the markup table.

  • A token ID of PARSTOKEN_ENDOFLIST represents the end of the tag token list.
  • A token ID of PARSTOKEN_TEXT (text token) represents text which is not recognized as a tag.
INT iStart The start position (in characters, not bytes) of the token in the text area (see … parameter pszData of the ParsTokenize or ParsTokenizeW function).
INT iLength The length of the token (in number of characters, not bytes).
USHORT usFixedID A fixed token ID, or NULL if none is specified for the tag in the markup table.
USHORT usAddInfo Additional tag information, or NULL if none is specified for the tag in the markup table.
USHORT usClassID A Class ID, or NULL if none is specified for the tag in the markup table.
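
A typical use of the token list is a loop that fetches tokens until the end-of-list token is returned and handles text tokens differently from tag tokens. The following C sketch copies only the text tokens of a tokenized text area to an output buffer. It works on stand-ins: DEMO_TOKEN mirrors the fields listed above, demo_get_next_token is a hypothetical replacement for ParsGetNextToken that reads from a prepared array, and the numeric values of the two special token IDs are assumptions (the real values are defined in eqfpapi.h).

```c
#include <stddef.h>

/* Assumed values for illustration; the real ones come from eqfpapi.h. */
#define DEMO_TOKEN_ENDOFLIST (-1)
#define DEMO_TOKEN_TEXT      (-2)

/* Stand-in mirroring the PARSTOKEN fields listed above. */
typedef struct {
    int            iTokenID;
    int            iStart;
    int            iLength;
    unsigned short usFixedID;
    unsigned short usAddInfo;
    unsigned short usClassID;
} DEMO_TOKEN;

/* Hypothetical token source that plays the role of the markup handle. */
typedef struct {
    const DEMO_TOKEN *pTokens;
    size_t            iNext;
} DEMO_TOKENLIST;

/* Stand-in for ParsGetNextToken: hands out the prepared tokens one by one
   and keeps returning the end-of-list token once it is reached. */
static int demo_get_next_token(DEMO_TOKENLIST *pList, DEMO_TOKEN *pToken)
{
    *pToken = pList->pTokens[pList->iNext];
    if (pToken->iTokenID != DEMO_TOKEN_ENDOFLIST) pList->iNext++;
    return 0;
}

/* Copies the text tokens (everything that is not a tag) from pszData to
   pszOut. Returns the number of characters copied, without the terminator. */
static size_t demo_strip_tags(DEMO_TOKENLIST *pList, const char *pszData,
                              char *pszOut, size_t cbOut)
{
    DEMO_TOKEN Token;
    size_t cOut = 0;
    int i;

    if (cbOut == 0) return 0;
    while (demo_get_next_token(pList, &Token) == 0 &&
           Token.iTokenID != DEMO_TOKEN_ENDOFLIST) {
        if (Token.iTokenID != DEMO_TOKEN_TEXT) continue;    /* skip tags */
        for (i = 0; i < Token.iLength && cOut + 1 < cbOut; i++)
            pszOut[cOut++] = pszData[Token.iStart + i];
    }
    pszOut[cOut] = '\0';
    return cOut;
}
```

The loop relies only on iTokenID, iStart, and iLength; a real user exit could additionally inspect usFixedID, usAddInfo, or usClassID to treat specific tags differently.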

ParsFreeMarkup

Purpose

ParsFreeMarkup frees a markup table loaded with the ParsLoadMarkup function from memory.

Format

Chap22 Figure46.gif

Parameters
Type Parameter Description
HPARSMARKUP hMarkup The markup handle, created by the ParsLoadMarkup function.
Return code

Integer of 0, if the markup table is freed from memory, or an error code.

ParsTerminate

Purpose

ParsTerminate terminates the parser API environment.

Format

Chap22 Figure47.gif

Parameters
Type Parameter Description
HPARSER hParser The parser API handle, created by the ParsInitialize function.
Return code

Integer of 0, if the environment is successfully terminated, or an error code.