We’re trying to convert PDFs to DOCX using Words Cloud, but we’re running into some strange issues.
When content in the PDF has a non-square background (i.e. the background shape has a corner radius), only the background colour/shape ends up in the Word document. Any text that was inside or overlapping the shape is removed.
In addition, for no apparent reason, the document occasionally comes back with all of the fonts changed from Verdana to Times New Roman. Usually it’s absolutely fine, but occasionally bam - all Times New Roman.
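To catch the intermittent font swap automatically, something like the following could be run on each returned file. It is only a rough sketch: it assumes the DOCX stores font names directly on runs in `word/document.xml` (fonts inherited from `word/styles.xml` would not be seen unless that part is added to `parts`), and the helper names `fonts_in_docx` and `has_unexpected_fonts` are made up for illustration.

```python
import re
import zipfile

# Matches font names in run-property attributes such as w:ascii="Verdana".
FONT_ATTR = re.compile(r'w:(?:ascii|hAnsi|cs|eastAsia)="([^"]+)"')


def fonts_in_docx(path_or_file, parts=("word/document.xml",)):
    """Return the set of font names referenced in the given DOCX parts."""
    fonts = set()
    with zipfile.ZipFile(path_or_file) as archive:
        for name in archive.namelist():
            if name in parts:
                xml = archive.read(name).decode("utf-8")
                fonts.update(FONT_ATTR.findall(xml))
    return fonts


def has_unexpected_fonts(path_or_file, expected=("Verdana",)):
    """True if the document references any font outside `expected`."""
    return bool(fonts_in_docx(path_or_file) - set(expected))
```

A check like this could decide whether to retry a conversion, and it would also give a concrete list of which fonts appeared in a bad result.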
The PDFs are generated from HTML using Puppeteer, and we’ve tried a couple of things to work around the shape problem, e.g. replacing <div> elements that have backgrounds with <svg> elements containing a path, but we get the same result. Essentially, the only way we can get text to show up is if the background has square corners.
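In case a minimal reproduction is useful, here is a rough sketch that writes two otherwise identical test pages, one with square corners and one with rounded corners. The markup, colours, radii and file names are placeholders rather than our real templates, and `build_repro_page` / `write_repro_pages` are hypothetical helper names.

```python
from pathlib import Path

# Placeholder markup: a single text box whose only variable is the corner radius.
TEMPLATE = """<!DOCTYPE html>
<html>
  <body style="font-family: Verdana, sans-serif">
    <div style="background: #cde; padding: 12px; border-radius: {radius}px">
      Text inside a box with border-radius {radius}px
    </div>
  </body>
</html>
"""


def build_repro_page(radius):
    """Return the HTML for one test page with the given corner radius."""
    return TEMPLATE.format(radius=radius)


def write_repro_pages(out_dir, radii=(0, 8)):
    """Write one HTML file per radius and return the paths that were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for radius in radii:
        path = out / f"repro-radius-{radius}.html"
        path.write_text(build_repro_page(radius), encoding="utf-8")
        paths.append(path)
    return paths
```

Each page would then be rendered to PDF with the same Puppeteer setup and converted. If only the zero-radius page keeps its text in the DOCX, that isolates the corner radius as the trigger.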