How do I modify my Python download code to avoid generating 0-byte PDF files? The rest of the code runs normally

asposer01 · July 31, 2024, 1:37am

How do I modify my Python download code to avoid generating 0-byte PDF files? The rest of the code runs normally.

```python
import os
from asposepdfcloud import PdfApi, models
from asposepdfcloud.api_client import ApiClient

# Replace with your Aspose Cloud App key and App SID
app_key = 'your_app_key'
app_sid = 'your_app_sid'

# Initialize the PdfApi client
pdf_api_client = ApiClient(app_key=app_key, app_sid=app_sid)
pdf_api = PdfApi(pdf_api_client)

def process_pdf_files_in_folder(input_folder, output_folder):
    # Ensure the output folder exists
    os.makedirs(output_folder, exist_ok=True)

    # Iterate over all PDF files in the input folder
    for filename in os.listdir(input_folder):
        if filename.endswith(".pdf"):
            input_file_path = os.path.join(input_folder, filename)
            remote_name = filename
            copied_file = f'processed_{filename}'

            # Upload PDF file to cloud storage
            pdf_api.upload_file(remote_name, input_file_path)

            # Copy the file
            pdf_api.copy_file(remote_name, copied_file)

            # Replace text
            text_replace = models.TextReplace(old_value='Watermark instead', new_value='', regex=True)
            text_replace_list = models.TextReplaceListRequest(text_replaces=[text_replace])
            pdf_api.post_document_text_replace(copied_file, text_replace_list)

            # Download the processed file to the local system
            output_file_path = os.path.join(output_folder, copied_file)

            # Retrieve the file content from the cloud
            response = pdf_api.download_file(copied_file)

            # Open a file stream to write the downloaded content
            with open(output_file_path, 'wb') as file:
                # Write the content to the file
                file.write(response)

            print(f'Processed and saved: {output_file_path}')

# Use specific folder paths
process_pdf_files_in_folder(r'D:\Temp\s1', r'D:\Temp\s2')
```

Dear @kirill.novinskiy,
Can you help me?

kirill.novinskiy · July 31, 2024, 1:30pm

Hi, @asposer01
Does the problem occur only with specific files, or is it random?
How big is the documents folder?
Please share the original document whose output was saved as a 0-byte file.

asposer01 · July 31, 2024, 1:52pm

It's random; I just tested 3 similar files. The file size is 530 KB.

kirill.novinskiy · July 31, 2024, 3:47pm

@asposer01, what type of storage do you use?

asposer01 · July 31, 2024, 3:53pm

My storage information is:

Internal Storage Details

Storage Name : aspose-pdf-cloud-python
Storage Mode : Retain files for one month

kirill.novinskiy · July 31, 2024, 4:52pm

@asposer01, there is an error in your code snippet.

Find these lines:

```python
# Retrieve the file content from the cloud
response = pdf_api.download_file(copied_file)

# Open a file stream to write the downloaded content
with open(output_file_path, 'wb') as file:
    # Write the content to the file
    file.write(response)
```

and change them to:

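```python
# Retrieve the file from the cloud: download_file saves it to a local
# temporary file and returns that file's path, not the file content
download_filepath = pdf_api.download_file(copied_file)
# Move the temporary file to the desired output location
shutil.move(download_filepath, output_file_path)
```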

and add this to the beginning of the script:

```python
import shutil
```
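To make the download step harder to get wrong, the fix can be wrapped in a small helper that moves the temporary file returned by `download_file` into place and fails loudly if the result is empty. This is only a sketch: `save_download` is a hypothetical helper, not part of the Aspose SDK. It assumes `download_file` returns a local temporary file path (which is what the `shutil.move` fix relies on) and, as a precaution, also accepts raw bytes.

```python
import os
import shutil


def save_download(download_result, output_file_path):
    """Persist a download result to output_file_path (hypothetical helper).

    Assumption: download_result is either the path of a temporary file the
    SDK already wrote, or the raw file content as bytes.
    """
    if isinstance(download_result, (bytes, bytearray)):
        # Content was returned directly: write it out in binary mode.
        with open(output_file_path, "wb") as f:
            f.write(download_result)
    elif isinstance(download_result, (str, os.PathLike)) and os.path.isfile(download_result):
        # A temporary file path was returned: move the file into place.
        shutil.move(download_result, output_file_path)
    else:
        raise TypeError(f"Unexpected download result: {download_result!r}")

    # Guard against silently producing empty output files.
    if os.path.getsize(output_file_path) == 0:
        raise ValueError(f"Downloaded file is empty: {output_file_path}")
    return output_file_path
```

Inside the loop from the original script, the download and write lines would then become `save_download(pdf_api.download_file(copied_file), output_file_path)`.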

asposer01 · July 31, 2024, 5:01pm

Following your guidance, the code now runs perfectly, and batch processing PDFs works without any problems. The issue is resolved. Thank you very much.
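After a batch run, a quick scan of the output folder can confirm that no 0-byte PDFs were left behind. This is a minimal sketch using only the standard library; `find_empty_pdfs` is a hypothetical helper name, not an Aspose API.

```python
import os


def find_empty_pdfs(folder):
    """Return the paths of .pdf files in folder whose size is 0 bytes."""
    empty = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Only regular files with a .pdf extension (case-insensitive) are checked.
        if name.lower().endswith(".pdf") and os.path.isfile(path):
            if os.path.getsize(path) == 0:
                empty.append(path)
    return empty
```

For the folders in the original script, calling `find_empty_pdfs(r'D:\Temp\s2')` after processing should return an empty list if every download was saved correctly.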