Convert PDF to HTML in Node.js Using Aspose Words REST API Issue

I am having an issue using the Node.js version of Aspose Words Cloud. Specifically, I am trying to convert a PDF to HTML and generate a separate HTML file for each page in the PDF. Below is my code:

const htmlSaveOptionsData = new asposeWords.HtmlSaveOptionsData({
    documentSplitCriteria: 'PageBreak',
    fileName: 'converting-to.html',
    saveFormat: 'html'
});
const convertDocToHtmlRequest = new asposeWords.SaveAsRequest({
    name: 'converting-from.pdf',
    saveOptionsData: htmlSaveOptionsData
});

const docToHtmlResponse = await asposeWordsApi.saveAs(convertDocToHtmlRequest);

The issue seems to be with documentSplitCriteria. I get the following error in the response:

Document part file cannot be written. When saving the document either output file name should be specified or custom streams should be provided via DocumentPartSavingCallback

However, I am already passing in an output file name, and the Node.js version does not seem to support the DocumentPartSavingCallback parameter. I see it documented for Java and .NET, but nowhere in the Node.js documentation. In fact, the built-in request serialization does not appear to allow a parameter with this name.

Does the Node.js version of Aspose Words Cloud support splitting the HTML output by page when converting from a PDF? If so, what do I need to change in my request to get it to work?

@jessewilliams

I have converted a sample PDF to HTML with the documentSplitCriteria property using Aspose.Words Cloud SDK for Node.js 22.2 and was unable to reproduce the issue. Please share your input PDF document with us so we can try to replicate the issue and fix it.

I have uploaded the file I am testing with. It is just a very simple PDF with three pages.

Simple Document Pages.pdf (60.3 KB)

@jessewilliams

Thanks for sharing your input document. I have converted the PDF to HTML in Node.js using Aspose.Words Cloud SDK for Node.js 22.2 but was unable to reproduce the issue. Please double-check that you are using the latest version of Aspose.Words Cloud SDK for Node.js. If you still face the issue, please share your exact code here for investigation.

const { WordsApi, SaveAsRequest, SaveOptionsData } = require("asposewordscloud");

const convert = async () => {
    // Please get your Client ID and Client Secret from https://dashboard.aspose.cloud/.
    const wordsApi = new WordsApi("xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx", "xxxxxxxxxxxxxxxxxxxxx");

    const remoteFileName = "Simple Document Pages.pdf";
    const resultFileName = "Simple Document Pages.html";

    const request = new SaveAsRequest({
        name: remoteFileName,
        saveOptionsData: new SaveOptionsData({
            saveFormat: "html",
            fileName: resultFileName,
            documentSplitCriteria: "PageBreak",
        }),
        // folder: remoteFolder,
        // storage: "MyDB_Storage",
    });

    // Any error is propagated to the .catch() handler below.
    const PDFToHtmlResponse = await wordsApi.saveAs(request);
    console.log(PDFToHtmlResponse.body);
};

convert()
    .then(() => {
        console.log("PDF document converted successfully");
    })
    .catch((err) => {
        console.log("Error occurred while converting the PDF document:", err);
    });

Looks like the issue was that I was using HtmlSaveOptionsData instead of the standard SaveOptionsData. I was expecting multiple HTML files to be generated based on the split parameter, but perhaps I misunderstood. Now that the break information is in place, I should be able to parse the file manually and break it into separate files.
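The manual split mentioned above could look roughly like the following sketch. It assumes (unverified) that the exported HTML marks each page break with an element whose inline style contains `page-break-before: always`; inspect the actual output before relying on that. The function names `splitHtmlOnPageBreaks` and `toStandalonePages` are hypothetical, and regex-based splitting is only adequate for simple, well-formed exports.

```javascript
// Hypothetical helpers for splitting one HTML export into per-page documents.
// ASSUMPTION: each page break in the exported HTML is an element whose inline
// style contains "page-break-before: always". Inspect your own output first.
function splitHtmlOnPageBreaks(html) {
  // Work on the <body> content if present; otherwise use the whole string.
  const bodyMatch = html.match(/<body[^>]*>([\s\S]*)<\/body>/i);
  const body = bodyMatch ? bodyMatch[1] : html;

  // Opening tag of any element carrying the assumed page-break style.
  const breakTag = /<[a-z][^>]*style="[^"]*page-break-before:\s*always[^"]*"[^>]*>/gi;

  const pages = [];
  let start = 0;
  for (const m of body.matchAll(breakTag)) {
    // Everything before this break belongs to the previous page.
    pages.push(body.slice(start, m.index));
    start = m.index;
  }
  pages.push(body.slice(start));

  // Drop whitespace-only fragments (e.g. when the body starts with a break).
  return pages.filter((page) => page.trim().length > 0);
}

// Wrap each fragment in a standalone document that reuses the original <head>,
// so embedded stylesheets still apply to every page.
function toStandalonePages(html) {
  const headMatch = html.match(/<head(?:\s[^>]*)?>[\s\S]*?<\/head>/i);
  const head = headMatch ? headMatch[0] : "<head></head>";
  return splitHtmlOnPageBreaks(html).map(
    (page) => `<!DOCTYPE html><html>${head}<body>${page}</body></html>`
  );
}

// Example with made-up markup (not actual Aspose output):
const sample =
  '<html><head><title>t</title></head><body><p>Page 1</p>' +
  '<br style="page-break-before:always; clear:both" /><p>Page 2</p>' +
  '<p style="page-break-before:always">Page 3</p></body></html>';
console.log(toStandalonePages(sample).length); // 3
```

For anything beyond simple documents, a real HTML parser would be more robust than regular expressions.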

Thanks

@jessewilliams

Thanks for your feedback. It is good to know you have managed to resolve the issue. However, you can also use the SplitDocument API method to split the PDF into separate HTML files, as follows:

const { WordsApi, SplitDocumentRequest } = require("asposewordscloud");

const splitPDFtoHTML = async () => {
    // Please get your Client ID and Client Secret from https://dashboard.aspose.cloud/.
    const wordsApi = new WordsApi("xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx", "xxxxxxxxxxxxxxxxxxxxxxxxxx");

    const remoteFileName = "Simple Document Pages.pdf";

    const request = new SplitDocumentRequest({
        name: remoteFileName,
        format: "html",
        // folder: remoteDataFolder,
        // from: 1,
        // to: 2,
    });

    // Any error is propagated to the .catch() handler below.
    const splitPDFResponse = await wordsApi.splitDocument(request);
    console.log(splitPDFResponse);
};

splitPDFtoHTML()
    .then(() => {
        console.log("PDF document split successfully");
    })
    .catch((err) => {
        console.log("Error occurred while splitting the PDF document:", err);
    });

Thanks, this works almost perfectly. Is there a way to combine this with the HTML converter's options so that the images in the resulting HTML are saved as separate files instead of embedded base64 images?

Actually, never mind. The base64 images will work just fine.

Thanks again
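If separate image files are needed after all, one option is to post-process each split HTML page locally instead of changing the API call. The sketch below is a hypothetical helper, `extractBase64Images`, which assumes images are embedded as double-quoted `src="data:image/...;base64,..."` attributes. It decodes each payload with Node's built-in `Buffer` and rewrites the `src` to a generated file name, leaving the actual file writing to the caller.

```javascript
// Hypothetical helper: move embedded base64 images out of an HTML string.
// ASSUMPTION: images use double-quoted data URIs: src="data:image/png;base64,...".
// Returns the rewritten HTML plus { fileName, data } entries; nothing is written to disk.
function extractBase64Images(html, baseName = "image") {
  const images = [];
  const dataUri = /src="data:image\/([a-z0-9.+-]+);base64,([A-Za-z0-9+\/=\s]+)"/gi;

  const rewritten = html.replace(dataUri, (match, subtype, payload) => {
    // Map MIME subtypes such as "png", "jpeg" or "svg+xml" to a file extension.
    const type = subtype.toLowerCase();
    const ext = type === "svg+xml" ? "svg" : type;
    const fileName = `${baseName}-${images.length + 1}.${ext}`;
    // Strip any whitespace/line breaks inside the payload before decoding.
    images.push({ fileName, data: Buffer.from(payload.replace(/\s+/g, ""), "base64") });
    return `src="${fileName}"`;
  });

  return { html: rewritten, images };
}

// Example with a tiny made-up payload:
const imgSample = '<img src="data:image/png;base64,aGVsbG8=">';
const imgResult = extractBase64Images(imgSample, "page-1");
console.log(imgResult.html); // <img src="page-1-1.png">
```

Each `images` entry can then be written next to its HTML page, for example with `fs.writeFileSync(entry.fileName, entry.data)`.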
