Convert PDF to XML using Python

Hi team, I'm trying to convert a PDF to XML in Python using the Aspose.PDF Cloud library (asposepdfcloud). Here is the code I wrote:

import asposepdfcloud
from asposepdfcloud.apis.pdf_api import PdfApi

# Get App key and App SID from https://cloud.aspose.com
pdf_api_client = asposepdfcloud.api_client.ApiClient(
    app_key='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
    app_sid='xxxxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxxxxx')

pdf_api = PdfApi(pdf_api_client)
remote_name = 'sample.pdf'
output_file = 'sample.xml'

# Upload the PDF file to storage
pdf_api.upload_file(remote_name, remote_name)

# Convert the PDF to XML and save it in Aspose default storage
response = pdf_api.put_pdf_in_storage_to_xml(remote_name, output_file)
print(response)

Running the above code produces the following error:
HTTP response body: {"RequestId":"a90c8124-08a2-43fb-9a7e-fc333a394e87","Error":{"Code":"internalError","Message":"Tagged pdf expected. Please use tagged pdf file for converting to xml format or use MobiXml for untagged pdf.","Description":"Operation Failed. Internal error.","DateTime":"2021-09-28T15:39:56.3280544Z","InnerError":null}}

Can anyone help me with this issue?

@naveen1221

As stated in the exception message, you can use the PutPdfInStorageToMobiXml API method for untagged PDF conversion. The PutPdfInStorageToXml API method is used for tagged PDF conversion.
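A minimal sketch of how that choice could be automated in Python, under some assumptions: it tries the tagged-PDF XML conversion first and falls back to MobiXml only when the error text contains "Tagged pdf expected". The helper name convert_pdf_to_xml is hypothetical, and put_pdf_in_storage_to_mobi_xml is assumed to be the Python SDK name for PutPdfInStorageToMobiXml, taking the same (name, out_path) arguments as put_pdf_in_storage_to_xml; please verify both against your installed SDK version.

```python
def convert_pdf_to_xml(pdf_api, remote_name, xml_path, mobixml_path):
    """Hypothetical helper: convert a stored PDF to XML, or to MobiXml if untagged.

    pdf_api is expected to be a PdfApi instance (or any object exposing the
    two conversion methods used below). Returns the path that was written.
    """
    try:
        # Works only for tagged PDFs.
        pdf_api.put_pdf_in_storage_to_xml(remote_name, xml_path)
        return xml_path
    except Exception as exc:
        # The service reports untagged input with this message; re-raise
        # anything else so unrelated failures are not hidden.
        if "Tagged pdf expected" not in str(exc):
            raise

    # Assumed SDK method name for the PutPdfInStorageToMobiXml endpoint.
    pdf_api.put_pdf_in_storage_to_mobi_xml(remote_name, mobixml_path)
    return mobixml_path
```

With the pdf_api object from the question, a call might look like convert_pdf_to_xml(pdf_api, 'sample.pdf', 'sample.xml', 'sample.mobi.xml'); the MobiXml output name is just an illustrative choice.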

Hi Ahmad, can you share a code snippet for converting PDF to XML using Aspose.PDF Cloud? Also, what do you mean by a tagged PDF?

@naveen1221

The code you shared above is correct. Please try it with a tagged PDF; a sample tagged PDF is attached for testing.
4pages.pdf (16.8 KB)

Please check this post for tagged pdf details.

Hi, I used asposewordscloud to convert a DOCX to PDF, and I'm using that same PDF as the input for the XML conversion with Aspose.PDF Cloud. Is it possible to convert it that way?

@naveen1221

Please note that the Aspose.Words Cloud API does not create tagged PDFs.
So, as stated above, use the PutPdfInStorageToXml API method to convert a tagged PDF to XML, and the PutPdfInStorageToMobiXml API method to convert an untagged PDF to MobiXML.