Convert PDF to Markdown in Python

Hi there,

I tried converting a PDF to Markdown, and that seems to work well :slight_smile: The converter I used was this one: Convert PDF to Markdown

When I search for this converter among the cloud API solutions, I cannot find it. We plan to use this converter in a Python project (to convert several thousand PDFs), but we could go with cURL as well.

Please let me know how I can access this API.

Best regards

@lars.peyer

You can sign up at aspose.cloud for credentials and use the Aspose.Words Cloud SDK for Python for PDF to Markdown conversion as follows. Hopefully, it will help you accomplish the task.

# For complete examples and data files, please go to https://github.com/aspose-words-cloud/aspose-words-cloud-python
# Import module
import asposewordscloud
import asposewordscloud.models.requests
from shutil import copyfile

# Please get your Client ID and Secret from https://dashboard.aspose.cloud.
client_id='xxxxxx-xxxx-xxxx-xxxx-xxxxxxxx'
client_secret='xxxxxxxxxxxxxxxxxxxxxxxxxx'

words_api = asposewordscloud.WordsApi(client_id,client_secret)
words_api.api_client.configuration.host='https://api.aspose.cloud'

inputFileName = 'C:/Temp/MergedPdf.pdf'
outputFileName = 'C:/Temp/02_pages.md'

# Convert PDF to MD; the with-block closes the input file after the request completes
with open(inputFileName, 'rb') as document:
    request = asposewordscloud.models.requests.ConvertDocumentRequest(document=document, format='md')
    result = words_api.convert_document(request)

# copyfile expects result to be the path of the converted file
copyfile(result, outputFileName)
print("Result {}".format(result))
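Since the question mentions several thousand PDFs, the single-file conversion above would probably end up inside a loop. The following is only a minimal sketch of such a loop, not part of the Aspose SDK: `convert_folder` and its `convert_one` parameter are hypothetical names, and `convert_one` stands for whatever function performs one conversion (for example, a small wrapper around the `convert_document` call shown above). The sketch skips files that already have an output and records failures instead of stopping.

```python
from pathlib import Path
from typing import Callable, Dict, List


def convert_folder(
    input_dir: Path,
    output_dir: Path,
    convert_one: Callable[[Path, Path], None],
) -> Dict[str, List[Path]]:
    """Convert every *.pdf in input_dir, collecting failures instead of aborting.

    convert_one is a hypothetical callable (pdf_path, md_path) supplied by the
    caller; it is expected to write the Markdown output to md_path.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    report: Dict[str, List[Path]] = {"converted": [], "skipped": [], "failed": []}

    for pdf_path in sorted(input_dir.glob("*.pdf")):
        md_path = output_dir / (pdf_path.stem + ".md")
        if md_path.exists():
            # Output exists from an earlier run, so a restart does not redo work.
            report["skipped"].append(pdf_path)
            continue
        try:
            convert_one(pdf_path, md_path)
            report["converted"].append(pdf_path)
        except Exception:
            # One bad document should not stop the whole batch.
            report["failed"].append(pdf_path)
    return report
```

Calling `convert_folder(Path('C:/Temp/pdfs'), Path('C:/Temp/md'), my_converter)` would return a report whose `failed` list can be retried later; the folder names and `my_converter` are placeholders.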

@tilal.ahmad thanks for your quick reply! This seems to work and converts my document to Markdown.
In the web converter I get a ZIP file with all the images in it. Through the API I get the Markdown with the links to the images, but so far I haven’t figured out how to get the images from the API.

Do you have any hints here?

Cheers
Lars

@lars.peyer

Thanks for your feedback. To accomplish your requirements, you can use the SaveAs API method as follows. It will save the output as a ZIP file that includes the image files as well.

.....
inputFileName = 'C:/Temp/02_pages.pdf'
remoteFileName = '02_pages.pdf'
outputFileName = '02_pages.md'

# Upload the PDF file to storage; the with-block closes the local file afterwards
with open(inputFileName, 'rb') as pdf_file:
    request_upload = asposewordscloud.models.requests.UploadFileRequest(pdf_file, remoteFileName)
    response_upload = words_api.upload_file(request_upload)

# Convert PDF to MD and save the zipped output (Markdown plus images) to storage
save_options = asposewordscloud.SaveOptionsData(save_format='md', file_name=outputFileName, zip_output=True)
request_conversion = asposewordscloud.models.requests.SaveAsRequest(remoteFileName, save_options)
response_conversion = words_api.save_as(request_conversion)
print(response_conversion)
.....
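
Once the zipped output has been fetched from storage to a local file (that download step is not shown in this thread), unpacking it only needs the Python standard library. The sketch below is an illustration under assumptions: `unpack_markdown_zip` is a hypothetical helper, and the archive layout (one or more `.md` files plus image files) is assumed from the description of the web converter's ZIP, not confirmed for the API output.

```python
import zipfile
from pathlib import Path
from typing import Dict, List

# Image extensions to look for; an assumed list, adjust to what the archive contains.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".bmp"}


def unpack_markdown_zip(zip_path: Path, target_dir: Path) -> Dict[str, List[str]]:
    """Extract a converted archive and list its Markdown and image members."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        # Directory entries end with "/" and are not files, so leave them out.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    return {
        "markdown": sorted(n for n in names if Path(n).suffix.lower() == ".md"),
        "images": sorted(n for n in names if Path(n).suffix.lower() in IMAGE_SUFFIXES),
    }
```

The returned member names are relative to `target_dir`, so the image links inside the Markdown should keep working as long as the extracted files stay together.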