DynaPDF Manual - Page 566

Function Reference
The real text width measured in user space can be calculated as follows:
double x1 = 0.0;
double y1 = 0.0;
double x2 = Width; // Width is a parameter of the callback function
double y2 = 0.0;
// Transform the text matrix to user space
TCTM m = MulMatrix(m_GState.Matrix, *Matrix);
Transform(m, x1, y1); // Start point of the text record
Transform(m, x2, y2); // End point of the text record
double realTextWidth = CalcDistance(x1, y1, x2, y2);
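The snippet above relies on helper functions whose definitions are not shown on this page. As a rough, self-contained sketch, the helpers could look like the code below. The names Matrix2D, Concat(), TransformPoint(), Distance() and RealTextWidth() are hypothetical stand-ins, not DynaPDF API. The matrix uses the PDF element order [a b c d e f]. The argument order of MulMatrix() is not documented here, so Concat() states its own order explicitly: it applies the text matrix first and the graphics state matrix second, as in the PDF row-vector convention.

```cpp
#include <cmath>

// Hypothetical affine matrix in PDF order [a b c d e f]:
// x' = a*x + c*y + e,  y' = b*x + d*y + f
struct Matrix2D {
    double a, b, c, d, e, f;
};

// Returns the matrix that applies `first` and then `second`
// (PDF row-vector convention: first x second).
Matrix2D Concat(const Matrix2D& first, const Matrix2D& second) {
    return Matrix2D{
        first.a * second.a + first.b * second.c,
        first.a * second.b + first.b * second.d,
        first.c * second.a + first.d * second.c,
        first.c * second.b + first.d * second.d,
        first.e * second.a + first.f * second.c + second.e,
        first.e * second.b + first.f * second.d + second.f};
}

// Maps the point (x, y) in place through matrix m.
void TransformPoint(const Matrix2D& m, double& x, double& y) {
    const double tx = m.a * x + m.c * y + m.e;
    y = m.b * x + m.d * y + m.f;
    x = tx;
}

// Euclidean distance between two points.
double Distance(double x1, double y1, double x2, double y2) {
    return std::hypot(x2 - x1, y2 - y1);
}

// Length of a text record that runs from (0, 0) to (width, 0) in text space,
// measured after applying the text matrix and then the graphics state matrix.
double RealTextWidth(const Matrix2D& textMatrix, const Matrix2D& gsMatrix, double width) {
    const Matrix2D m = Concat(textMatrix, gsMatrix);
    double x1 = 0.0, y1 = 0.0, x2 = width, y2 = 0.0;
    TransformPoint(m, x1, y1);  // start point of the record
    TransformPoint(m, x2, y2);  // end point of the record
    return Distance(x1, y1, x2, y2);
}
```

For example, a record of width 10 drawn with a 90° rotated text matrix {0, 1, -1, 0, 0, 0} under a graphics state that scales by 2 has a real width of 20. The distance is independent of the rotation, which is why this approach works in rotated coordinate systems.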
The end point of a text record is usually required to determine whether the next record lies on the
same line. An algorithm that constructs text lines in arbitrarily rotated coordinate systems is
provided in the Text Extraction example, which is delivered with all DynaPDF versions.
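The sketch below is not the algorithm from the Text Extraction example. It is a simple hypothetical heuristic that shows how start and end points can be used for a same-line test. The baseline direction comes from the previous record's start and end points. The next record counts as being on the same line if its start point lies within a tolerance of that baseline and does not lie behind the previous record's start. The names Point and SameLine() and the choice of tolerance (for example, a fraction of the font height) are assumptions.

```cpp
#include <cmath>

struct Point {
    double x, y;
};

// Hypothetical heuristic: the next record is on the same line if its start point
// lies close to the baseline through the previous record and not behind its start.
bool SameLine(Point prevStart, Point prevEnd, Point nextStart, double tolerance) {
    const double dx = prevEnd.x - prevStart.x;
    const double dy = prevEnd.y - prevStart.y;
    const double len = std::hypot(dx, dy);
    if (len == 0.0) return false;               // empty record: no baseline direction
    const double ux = dx / len, uy = dy / len;  // unit vector along the baseline
    const double vx = nextStart.x - prevEnd.x;
    const double vy = nextStart.y - prevEnd.y;
    const double perpendicular = std::fabs(ux * vy - uy * vx);  // distance from baseline
    const double along = ux * vx + uy * vy;  // signed advance measured from prevEnd
    return perpendicular <= tolerance && along >= -len;
}
```

Because the test uses only vectors derived from the records themselves, it behaves the same for horizontal, vertical, or arbitrarily rotated lines.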
Character Spacing
As described above, the current character spacing is already included in the text width that is
provided in all text callback functions. However, the value must be stored in the graphics state if the
width of a substring must be computed. Character spacing is measured in unscaled font units. The
required transformation to text space is performed by functions like GetTextWidth().
Word Spacing
Like character spacing, the current word spacing is already considered in the text width that is
provided in all text callback functions. However, word spacing applies to the space character of
simple fonts only.
An application that extracts text from PDF files may want to preserve the original formatting of
the text. In this case, the distance between two words in the same text record must be known, e.g., to
insert a number of spaces that emulates the word spacing.
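One possible way to turn a measured word distance into a number of spaces is to divide the gap by the width of a single space (including word spacing) and round. The sketch below is a hypothetical helper, SpacesForGap(), and the rounding rule is an assumption rather than DynaPDF behavior. It emits at least one space for any positive gap so that separated words never run together.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: number of space characters to emit for the gap between
// two words. gap and spaceWidth must be measured in the same units.
int SpacesForGap(double gap, double spaceWidth) {
    if (gap <= 0.0) return 0;          // words touch or overlap
    if (spaceWidth <= 0.0) return 1;   // no usable reference width
    const long n = std::lround(gap / spaceWidth);
    return static_cast<int>(std::max(1L, n));
}
```

With a space width of 4.5, a gap of 9.0 yields two spaces, while a narrow gap of 1.0 still yields one.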
Note, however, that the current word spacing must be ignored if the font type is ftType0 (the font
type is a parameter of the graphics state and is set with the TSetFont callback function).
Also keep in mind that word and character spacing are measured in unscaled font units. The width of
a space character including word spacing can be calculated with the function GetTextWidth(), which
is part of the font API (in C/C++ the function is named fntGetTextWidth()).
An algorithm that considers word spacing must check whether the source string contains space
characters. If a space is found, the width of the preceding substring must be calculated so that the
start and end points of the word can be determined. Any further consecutive spaces can be skipped,
and the cursor position is advanced to the position after the spaces. Processing continues until the
entire text of the record has been processed.
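The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the record's text is a single-byte string, and the per-character advances in text space (already including character and word spacing) are precomputed and passed in, for example via GetTextWidth() calls. WordSpan and SplitWords() are hypothetical names. The resulting offsets lie on the baseline, so the points (start, 0) and (end, 0) can then be transformed to user space exactly like the record's start and end points.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct WordSpan {
    std::size_t first, last;  // character range [first, last)
    double start, end;        // offsets along the baseline in text space
};

// Hypothetical helper: splits a record's text at spaces. advances[i] is the
// text-space advance of character i, including character and word spacing.
std::vector<WordSpan> SplitWords(const std::string& text, const std::vector<double>& advances) {
    std::vector<WordSpan> words;
    const std::size_t n = std::min(text.size(), advances.size());
    double cursor = 0.0;
    std::size_t i = 0;
    while (i < n) {
        // Skip consecutive spaces and move the cursor behind them.
        while (i < n && text[i] == ' ') cursor += advances[i++];
        if (i == n) break;
        WordSpan w{i, i, cursor, cursor};
        // Accumulate the width of the substring up to the next space.
        while (i < n && text[i] != ' ') cursor += advances[i++];
        w.last = i;
        w.end = cursor;
        words.push_back(w);
    }
    return words;
}
```

For the text "ab  cd" with an advance of 1 per character, the words span [0, 2] and [4, 6]; the two spaces account for the gap between them.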
An algorithm that processes text in this way essentially calculates the start and end coordinates of
every text part that is separated either by spaces or by kerning space.
