Some PDF files must be processed with optical character recognition (OCR) before they can be translated. OCR identifies the text in the file and converts it from pixels (an image of the text) into machine-readable characters.
Without OCR, we cannot guarantee an accurate word count for a PDF file. OCR is charged separately, and this charge is added to the cost of the project.
There are two types of PDF files: text-based and image-based.
In principle, text-based PDFs can be analysed and translated without OCR, whereas image-based PDFs cannot. Most PDF files, however, combine both types and contain both text-based and image-based content.
TEXT-BASED PDF FILES
Text-based PDF files can be processed in our translation management system without OCR, and the word count used for pricing can be calculated automatically. However, the file will lose all of its formatting and will be delivered as a plain text file (.txt) instead of a PDF.
The following will change in the delivered file:
- Fonts, font sizes and text styles
- Logos and images
- Headers and footers
- File format (the translation will be delivered as a .txt file)
Text within images is not recognised automatically and will therefore not be translated. If your file contains text in images that you want translated, the file must first be processed with OCR. More information about OCR is given in the section on image-based PDF files below.
IMAGE-BASED PDF FILES
If your document was created by scanning, it is an image-based PDF, which translation software cannot process until it has gone through OCR. OCR converts the images of text into actual characters that translation software can work with.
Please note that source documents often contain both text-based and image-based content. We can provide a quote for OCR, after which the source text can be used in our translation systems. A minimum charge of EUR/USD 80 applies to all OCR services. Please contact our sales team at firstname.lastname@example.org for a quote or more information.
Orders that include images to be translated but have not been sent to our sales team for a quote will be cancelled, as it is technically impossible to complete them without OCR.