DynaPDF Manual - Page 565

Function Reference
This is the preferred callback function for developing text extraction algorithms. See also Sub string
coordinates for further information.
The arrays Source and Kerning contain the source strings and the translated Unicode strings of a text
record. Both arrays always contain the same number of elements (parameter Count).
The parameter Width represents the width of the entire text record. The Kerning array also provides
the width of each sub record and the displacement vector Advance. Advance is a vector even though only
the x-coordinate is given; the y-coordinate is always zero. Positive values of Advance move the cursor
to the left; negative values move it to the right in a non-rotated coordinate system. The string widths
and the displacement vector are measured in text space.
The displacement vector is often used to apply kerning between two characters, but it can also be
used to emulate spaces or to move the cursor to an arbitrary position on the x-axis of the text line.
Because CID fonts do not support word spacing, spaces are very often emulated with the
displacement vector.
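The source strings are required if the width of a sub string must be calculated. Note that the width
of a sub string cannot be calculated from the Unicode string, because the mapping between character
codes and Unicode is not one-to-one; a ligature glyph, for example, maps to several Unicode characters.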
One or more sub records may contain zero-length strings. In this case, only the displacement vector
Advance must be considered.
DynaPDF is delivered with the examples text_extraction and text_coordinates, which demonstrate
how text extraction algorithms can be developed and how text coordinates must be calculated. One
of these projects should be used as the basis for your own code.
Image Extraction
The following callback functions should be set to extract images:
// Optional
// Optional
TRestoreGraphicState // Optional
// Optional
// Optional
It is possible to extract images without a graphics state. However, the image coordinates and the
visible size in user space cannot be computed in this case. In addition, 1-bit images require special
processing if the image is drawn as an image mask. This is the case if no color table is present and
the member Transparent of the TPDFImage structure is set to true. In this case, black pixels are
rendered in the current fill color. If the application wants to preserve the color in which the image
would be rendered in a PDF viewer, it must convert the image to the color space of the current fill
color (or to a device color space). The current fill color can be converted to a device color space with
ConvColor(). If a conversion to a device color space is preferred, it is usually best to convert the
