DynaPDF Manual - Page 453


However, if text must be deleted or replaced, you must make sure that a template is not edited twice
when it is also used by another page or template. This duplicate check is mandatory and must be
applied every time a template is about to be processed.
Whether a page or template contains templates can be determined with GetTemplCount(). These
nested templates can then be opened for editing with EditTemplate(). When finished, the template
must be closed with EndTemplate().
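The duplicate check described above can be sketched independently of the library. The following Python fragment is only an illustration and does not call DynaPDF: the integer handles, the `children` mapping and the `process` callback are hypothetical stand-ins for whatever unique template identifier and editing code an application uses. In a real program the callback would contain the work done between EditTemplate() and EndTemplate().

```python
def visit_templates_once(root_templates, children, process):
    """Call process(handle) exactly once per distinct template handle.

    root_templates: handles used directly by the pages (hypothetical integers).
    children: mapping handle -> list of handles nested in that template.
    Returns the handles in the order in which they were processed.
    """
    seen = set()                      # templates that were already edited
    stack = list(reversed(root_templates))
    order = []
    while stack:
        handle = stack.pop()
        if handle in seen:            # duplicate check: never edit twice
            continue
        seen.add(handle)
        process(handle)
        order.append(handle)
        # Nested templates are visited after their parent.
        stack.extend(reversed(children.get(handle, [])))
    return order


if __name__ == "__main__":
    # Template 3 is used by templates 1 and 2, template 2 also by a page.
    nesting = {1: [2, 3], 2: [3], 3: []}
    print(visit_templates_once([1, 2], nesting, lambda h: None))
```

The set of already processed handles must live as long as the whole document is being edited, not just one page, because the same template can be referenced from several pages.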
Organization of text objects
A text object consists of a transformation matrix and the text. Several other properties are taken from
the current graphics state, such as the font, font size, character spacing, word spacing, and so on.
Text objects use a separate coordinate system which is represented by the text transformation matrix
tm. We call this coordinate system text space. All text properties such as font size, text width, and so
on are calculated in text space. The PDF format also supports several text positioning operators to
decrease the size of a text object. To make the function easier to use, DynaPDF already includes all text
positioning operators in the text transformation matrix tm.
The text coordinate system must be transformed to user space by multiplying the text matrix with
the current transformation matrix cm to enable the calculation of the text position. The combined
matrix must be recalculated each time GetPageText() returns a new text object.
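As an illustration of this multiplication, the sketch below uses the six-number matrix notation (a b c d e f) of the PDF format, in which a point is transformed as x' = a*x + c*y + e and y' = b*x + d*y + f. The function names and the tuple representation are assumptions made for this example; they are not DynaPDF types or functions.

```python
def multiply(m1, m2):
    """Return m1 x m2 for PDF-style matrices (a, b, c, d, e, f).

    The result applies m1 first and m2 second, e.g. multiply(tm, cm)
    maps text space to user space.
    """
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


def transform_point(m, x, y):
    """Transform the point (x, y) with the matrix m."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


if __name__ == "__main__":
    tm = (1, 0, 0, 1, 10, 20)   # text matrix: move the text to (10, 20)
    cm = (2, 0, 0, 2, 5, 5)     # current matrix: scale by 2, then move by (5, 5)
    combined = multiply(tm, cm)
    # The origin of text space is the start position of the text.
    print(transform_point(combined, 0, 0))
```

The order of the operands matters: the text matrix is the left operand, so that a point is first moved within text space and then mapped by the current transformation matrix.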
As mentioned earlier, a content stream is not organized into text lines and the order in which text
objects occur is essentially arbitrary. A text record can occur in two different formats: as an array or
as one coherent text string. The array form enables the definition of kerning between characters in a
compact format since PDF viewers ignore any available kerning information in a font resource. The
strings in a kerning array always lie on the same text line.
The kerning array is also often used to emulate space characters because word spacing does not
work with CID fonts. Most PDF drivers use the same algorithm to format text of single-byte and
multi-byte fonts; that is the reason why space characters are very often emulated with kerning space.
However, it is quite easy to determine whether a space character is emulated at a given position: if the
displacement is larger than half the space width, we can assume that a space character was emulated
at this position. Half the space width should be used because the fonts of documents which emulate
space characters with kerning space often contain no space character. DynaPDF sets a default space
width in this case, which can be too large if a condensed font is used.
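The half space width rule can be sketched as follows. This is a simplified illustration, not DynaPDF code: the kerning array is modelled as a list of hypothetical (displacement, text) pairs, and the displacement is assumed to be a positive gap to the right of the preceding string, measured in the same units as the space width. The sign convention and units of real kerning records may differ and must be checked before the comparison is applied.

```python
def join_kerning_array(records, space_width):
    """Join the strings of a kerning array and restore emulated spaces.

    records: list of (displacement, text) pairs; the displacement is the gap
             in front of the text (assumed positive towards the right).
    space_width: width of a space character in the same units.
    """
    threshold = space_width / 2.0     # half the space width, see above
    parts = []
    for displacement, text in records:
        # No space is emitted in front of the very first string.
        if parts and displacement > threshold:
            parts.append(" ")
        parts.append(text)
    return "".join(parts)


if __name__ == "__main__":
    kerning = [(0.0, "Hello"), (0.3, "World"), (0.05, "!")]
    print(join_kerning_array(kerning, space_width=0.5))
```

Small displacements such as the 0.05 in the example are ordinary kerning and leave the strings joined, while the 0.3 exceeds the threshold of 0.25 and is treated as an emulated space.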
However, the array form is just one possible format to enable kerning between characters. For
several reasons the array form is sometimes not used. Many PDF drivers update the text position
with text positioning operators instead. This technique not only produces much larger content
streams, it also splits text records into separate ones. This complicates the identification of word
boundaries considerably because each record is returned in a separate GetPageText() call. We now need
the coordinates to determine whether the text must be assigned to the same line. If the text is not rotated,
this is not a big deal, but if the coordinate system is rotated or if it contains other transformations,
some further math is required to determine whether a text record must be assigned to the current
