Processing certain European scripts such as Russian, Greek, or Czech is more complicated. A common
technique is to convert the original font to a symbol font in order to avoid the usage of a CID font
(multi-byte font), because the PDF format supports only four predefined 8 bit encodings (WinAnsi,
MacRoman, MacExpert, and Symbol). The advantage is that 8 bit strings can be stored in the PDF file,
which results in a smaller file size, and the PDF file remains compatible with Acrobat versions
prior to 4.0, because CID fonts are supported only since PDF 1.3.

The problem is that if the font resource contains neither a ToUnicode CMap nor PostScript character
names, it is no longer possible to convert the text to Unicode. Depending on how a PDF file was
created, the encoding is often not known to the PDF driver either, e.g. when converting PCL or AFP
files to PDF. Such PDF files can be viewed and printed correctly, but it is not possible to extract
human readable strings from them.

How to calculate the absolute string position?

The absolute string position can be calculated from the matrices ctm and tm. Before the string
position can be computed, the matrix tm must be transformed into user space. This is done by
multiplying the matrices ctm and tm into a new matrix:

// Returns the product M2 x M1: M2 is applied first, then M1.
TCTM MulMatrix(const TCTM &M1, const TCTM &M2)
{
   TCTM retval;
   retval.a = M2.a * M1.a + M2.b * M1.c;
   retval.b = M2.a * M1.b + M2.b * M1.d;
   retval.c = M2.c * M1.a + M2.d * M1.c;
   retval.d = M2.c * M1.b + M2.d * M1.d;
   retval.x = M2.x * M1.a + M2.y * M1.c + M1.x;
   retval.y = M2.x * M1.b + M2.y * M1.d + M1.y;
   return retval;
}
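
The formulas in MulMatrix() can be checked by writing both matrices in the 3 x 3 form that PDF uses
for the six values [a b c d x y] and multiplying them. The following is only a sketch of that
derivation; the subscripts 1 and 2 stand for M1 and M2 and are notation introduced here:

```latex
\[
\begin{bmatrix} a_2 & b_2 & 0 \\ c_2 & d_2 & 0 \\ x_2 & y_2 & 1 \end{bmatrix}
\begin{bmatrix} a_1 & b_1 & 0 \\ c_1 & d_1 & 0 \\ x_1 & y_1 & 1 \end{bmatrix}
=
\begin{bmatrix}
a_2 a_1 + b_2 c_1 & a_2 b_1 + b_2 d_1 & 0 \\
c_2 a_1 + d_2 c_1 & c_2 b_1 + d_2 d_1 & 0 \\
x_2 a_1 + y_2 c_1 + x_1 & x_2 b_1 + y_2 d_1 + y_1 & 1
\end{bmatrix}
\]
```

The first two columns of the result correspond, row by row, to the members a, b, c, d, x, and y
that the function assigns.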

The usage is as follows:

TCTM m = MulMatrix(stack.ctm, stack.tm);

Note that the order in which the matrices are multiplied is important; the reversed order would
produce an incorrect result.
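
To illustrate why the order matters, here is a minimal sketch with made-up values: a ctm that
scales by 2 and a tm that translates by (10, 20). The TCTM layout shown is an assumption (six
doubles named as in MulMatrix() above), and DemoOrder() is a hypothetical helper that exists only
for this illustration:

```cpp
#include <cassert>

// Assumed layout of the matrix [a b c d x y]; the real TCTM
// declaration may differ.
struct TCTM { double a, b, c, d, x, y; };

// Same arithmetic as MulMatrix() above: M2 is applied first, then M1.
TCTM MulMatrix(const TCTM &M1, const TCTM &M2)
{
   TCTM retval;
   retval.a = M2.a * M1.a + M2.b * M1.c;
   retval.b = M2.a * M1.b + M2.b * M1.d;
   retval.c = M2.c * M1.a + M2.d * M1.c;
   retval.d = M2.c * M1.b + M2.d * M1.d;
   retval.x = M2.x * M1.a + M2.y * M1.c + M1.x;
   retval.y = M2.x * M1.b + M2.y * M1.d + M1.y;
   return retval;
}

// Hypothetical demo: compares both multiplication orders.
void DemoOrder()
{
   TCTM ctm = {2.0, 0.0, 0.0, 2.0,  0.0,  0.0}; // scale by 2
   TCTM tm  = {1.0, 0.0, 0.0, 1.0, 10.0, 20.0}; // translate by (10, 20)

   // Order used in this section: the translation is scaled by the ctm.
   TCTM right = MulMatrix(ctm, tm);
   assert(right.x == 20.0 && right.y == 40.0);

   // Reversed order: the translation is left unscaled.
   TCTM wrong = MulMatrix(tm, ctm);
   assert(wrong.x == 10.0 && wrong.y == 20.0);
}
```

Both results have the same scaling factors (a = d = 2), so the difference only shows up in the
translation part, which is exactly the part that determines the string position.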

Now we need a function that transforms a point with the matrix:

void Transform(const TCTM &M, double &x, double &y)
{
   double ox = x; // x is overwritten below but still needed to compute y
   x = ox * M.a + y * M.c + M.x;
   y = ox * M.b + y * M.d + M.y;
}
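
As a quick check of Transform(), the following sketch maps the point (10, 0) through a matrix that
rotates by 90 degrees counter-clockwise and then moves the result by (100, 50). The TCTM layout is
again an assumption, and DemoTransform() is a hypothetical helper, not part of any library:

```cpp
#include <cassert>

// Assumed layout of the matrix [a b c d x y]; the real TCTM
// declaration may differ.
struct TCTM { double a, b, c, d, x, y; };

// Same arithmetic as Transform() above.
void Transform(const TCTM &M, double &x, double &y)
{
   double ox = x; // x is overwritten below but still needed to compute y
   x = ox * M.a + y * M.c + M.x;
   y = ox * M.b + y * M.d + M.y;
}

// Hypothetical demo with made-up values.
void DemoTransform()
{
   // [cos sin -sin cos x y] for 90 degrees, translation (100, 50)
   TCTM m = {0.0, 1.0, -1.0, 0.0, 100.0, 50.0};
   double x = 10.0, y = 0.0;
   Transform(m, x, y);
   // The rotation turns (10, 0) into (0, 10); the translation moves
   // that point to (100, 60).
   assert(x == 100.0 && y == 60.0);
}
```

Without the temporary variable ox, the second assignment would use the already transformed x, and
y would come out as 150 instead of 60 in this example.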

The text position can now easily be calculated as follows:

// Get the text matrix in user space

TCTM m = MulMatrix(stack.ctm, stack.tm);

double x = 0.0;

