Convert PDF to DOCX - no text is extracted



I need to convert PDF documents to the Word (DOCX) format. Most of these documents are scanned PDFs. I use the following code for the conversion:

public String saveAsDocx() {
    String fn = parent.getNewFileName(file, "DOCX");

    DocSaveOptions saveOption = new DocSaveOptions();
    saveOption.setRecognizeBullets(true);
    // Save the document with the options above (call assumed from the
    // surviving fragment ", saveOption);").
    document.save(fn, saveOption);

    parent.addResult(fn, false);
    this.processedFile.otherFormats.add(new ProcessedFile(fn));
    return fn;
}

Here, document is of type com.aspose.pdf.Document.

The output I get is a Word document in which each page contains only an image of the original PDF page. I understand that this is valid output when the text of the scanned PDF cannot be recognized well. However, when I extract the text from the same PDF document, it works well. For the text extraction I use the following code:

public String saveAsText() throws Exception {
    String fn = parent.getNewFileName(file, "TXT");

    try (Writer writer = Files.newBufferedWriter(Paths.get(fn), StandardCharsets.UTF_8)) {
        for (int i = 0; i < document.getPages().size(); i++) {
            // Page indices are 1-based, hence i + 1.
            com.aspose.pdf.TextAbsorber textAbsorber = new com.aspose.pdf.TextAbsorber();
            document.getPages().get_Item(i + 1).accept(textAbsorber);
            String extractedText = textAbsorber.getText();

            writer.write("\n ----------------- PAGE --------------- \n");
            // Assumed from context: the extracted text follows the separator.
            writer.write(extractedText);
        }
    }

    parent.addResult(fn, false);
    this.processedFile.otherFormats.add(new ProcessedFile(fn));
    return fn;
}

The output of this method is a TXT file that contains the text of the original PDF document, with no errors in it.
So the question is: how can I convert the PDF to DOCX when I know that the text can be extracted well? I can also provide the document if necessary.
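
To back up the claim that the text is extractable on every page, the per-page strings returned by textAbsorber.getText() could be checked for blank pages before attempting the conversion. The following is only a sketch in plain Java: the class PageTextReport and the method pagesWithoutText are hypothetical names and not part of Aspose.PDF, and it assumes the page texts have already been collected into a list in page order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Aspose.PDF.
final class PageTextReport {
    private PageTextReport() {}

    /**
     * Returns the 1-based numbers of pages whose extracted text is
     * null or contains only whitespace.
     */
    static List<Integer> pagesWithoutText(List<String> pageTexts) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            String text = pageTexts.get(i);
            if (text == null || text.trim().isEmpty()) {
                missing.add(i + 1); // report page numbers starting at 1
            }
        }
        return missing;
    }
}
```

For example, passing the texts "Invoice 42", "   ", null and "Total" would report pages 2 and 3 as having no text. An empty result would suggest that every page carries a text layer.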




Thank you for contacting Aspose Support.

This forum is for topics related to Aspose REST APIs. Your query relates to the Aspose native (downloadable) APIs, so please follow this thread for an answer to your query.