Convert Word Document to HTML in Ruby and Store in S3 Bucket

Hi!

We are using Aspose to convert Word documents to HTML.
We also created a storage in Aspose that is linked to our S3 bucket; it is called “aspose-docs”.

To convert the Word document to HTML we are using SaveAsOnlineRequest. Here is the code snippet:

options = AsposeWordsCloud::HtmlSaveOptionsData.new(
  SaveFormat: 'html',
  FileName: "#{file_name_without_extension}.html",
  PrettyFormat: true,
  ExportDocumentProperties: true,
  ExportFontResources: true,
  FontsFolder: folder_name,
  ImagesFolder: folder_name,
  OfficeMathOutputMode: 'MathML',
  ResourceFolder: folder_name,
  ZipOutput: true,
  StorageName: 'aspose-docs'
)
save_request = SaveAsOnlineRequest.new(document: file_name, save_options_data: options)
@words_api.save_as_online(save_request)

Here is an example of the values for the variables we are using:
file_name = "test_doc.docx"
file_name_without_extension = "test_doc"
folder_name = "documents/test_doc_docx/"

That folder already exists in S3 and contains the original Word Doc that is “test_doc.docx”
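
For readers following along, here is a minimal sketch of how the three values above relate to each other. The helper name `conversion_names` is hypothetical, and the folder convention (dots in the file name replaced by underscores, under `documents/`) is only inferred from the example values, not a rule stated anywhere in this thread.

```ruby
# Hypothetical helper: derives the values used in the save options from a
# file name. The "documents/<name with dots as underscores>/" layout is an
# assumption based on the example values in the post.
def conversion_names(file_name)
  {
    file_name: file_name,
    file_name_without_extension: File.basename(file_name, '.*'),
    folder_name: "documents/#{file_name.tr('.', '_')}/"
  }
end

names = conversion_names('test_doc.docx')
puts names[:file_name_without_extension] # prints "test_doc"
puts names[:folder_name]                 # prints "documents/test_doc_docx/"
```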

As you can see, in the options we are setting the StorageName field with the name of our storage created in aspose.

I’m assuming that by setting that StorageName, the save_as_online will store the generated files inside that S3 bucket.

Is there a step we missed on our end for that to happen? Or is there another way to get the generated file from the save_as_online response, so that we can then store it in S3 ourselves?

@altose87

Please note that the online methods are used to process files/content from the request body. Please use the SaveAs method to convert files from cloud storage. Please check the following links for more details.

PUT /words/{name}/saveAs: Converts a document in cloud storage to the specified format.

@tilal.ahmad thank you for your answer. It seems to be doing what is expected: I’m getting a response that includes the segment below, which suggests to me that it is registering a file stored in our S3 bucket, in the expected folders.

<AsposeWordsCloud::FileLink:0x00007fcdf52fc310 @href="documents/5d731153_8738_4327_8e32_54c6d9ff06c9_test_doc_docx/5d731153-8738-4327-8e32-54c6d9ff06c9-test_doc.html", @rel="saved">

But when we check that folder in the S3 bucket, the file is not there.

This is the code we are using:

 request_save_options_data = HtmlSaveOptionsData.new(
   FileName: html_file_name_with_folder,
   CssStyleSheetType: 'External',
   CssStyleSheetFileName: '/',
   SaveFormat: 'html'
 )
 request = SaveAsOnlineRequest.new(document: request_document, save_options_data: request_save_options_data)
 result = @words_api.save_as_online(request)

The response above is generated after executing the last line of the code snippet:
result = @words_api.save_as_online(request)

Is there anything else we would need to do in order to get the files automatically uploaded to our S3 bucket?

@altose87

As stated above, please note the online methods do not involve cloud storage for file processing. So if you want to convert files from cloud storage and save the output to cloud storage, then please use the simple SaveAs method.
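
A rough sketch of what the storage-based call might look like, for comparison with the SaveAsOnlineRequest snippets in this thread. It assumes the `aspose_words_cloud` gem, that `SaveAsRequest` accepts `name:`, `save_options_data:`, `folder:` and `storage:`, and that `FileName` is the output path inside the storage; please verify these against the SDK version you have installed. The method name `convert_in_storage` is hypothetical, and `words_api` is the client object the posts above call `@words_api`.

```ruby
# Assumes the aspose_words_cloud gem. The guard only lets this file load
# when the gem is absent; the method itself still needs the gem to work.
begin
  require 'aspose_words_cloud'
rescue LoadError
  warn 'aspose_words_cloud is not installed; convert_in_storage will fail'
end

# Hypothetical wrapper: converts a document that already lives in cloud
# storage and asks the service to write the HTML back to that storage.
def convert_in_storage(words_api, file_name:, folder_name:, storage_name:)
  base_name = File.basename(file_name, '.*')

  options = AsposeWordsCloud::HtmlSaveOptionsData.new(
    SaveFormat: 'html',
    # Assumption: the output path is given relative to the storage root.
    FileName: "#{folder_name}#{base_name}.html"
  )

  request = AsposeWordsCloud::SaveAsRequest.new(
    name: file_name,            # source document name in storage
    save_options_data: options,
    folder: folder_name,        # folder that holds the source document
    storage: storage_name       # e.g. 'aspose-docs' from the first post
  )

  words_api.save_as(request)
end
```

With the example values from the first post, the call would be `convert_in_storage(@words_api, file_name: 'test_doc.docx', folder_name: 'documents/test_doc_docx/', storage_name: 'aspose-docs')`.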


@tilal.ahmad thank you, that worked pretty well.

@altose87

It is good to know that you have managed to resolve the issue. Please feel free to contact us for any further assistance.

Hi @tilal.ahmad, after successfully converting the documents we are facing a new issue. It does not happen with all paragraphs, but with most of them, using the same sample files I’ve shared in other threads:

      <p style="margin-top: 0pt; margin-left: 27pt; margin-bottom: 0pt; font-size: 19pt; ">
        <span style="font-weight: bold">Walmart</span>
        <span style="font-weight: bold; letter-spacing: -0.5pt"> </span>
        <span style="font-weight: bold">Inc.</span>
        <span style="font-weight: bold; letter-spacing: -0.4pt"> </span>
        <span style="font-weight: bold">Takes</span>
        <span style="font-weight: bold; letter-spacing: -0.5pt"> </span>
        <span style="font-weight: bold">on</span>
        <span style="font-weight: bold; letter-spacing: -0.4pt"> </span>
        <span style="font-weight: bold; letter-spacing: -0.1pt">Amazon.com</span>
      </p>

The above HTML renders this text:

Walmart Inc. Takes on Amazon.com

In this HTML, a separate <span> is created for every single space between words. Is there a way to avoid that and just create one <p> or <span> with the whole text?

The document looks very good, but the thing is that we have a feature that allows the users to highlight content from it, and with this new HTML structure that feature is failing.

@altose87

This seems to be the expected result, because when I convert your Word document to HTML using MS Word, it also adds each word as a separate <span>.
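
If the per-word spans get in the way of a highlighting feature, one possible workaround is to post-process the generated HTML. The sketch below is not an Aspose feature; it is a plain-Ruby, regex-based illustration that assumes paragraphs are not nested and that losing per-span styling (bold, letter-spacing) is acceptable. The method name `flatten_paragraph_spans` is hypothetical, and a real HTML parser would be more robust than regular expressions.

```ruby
# Matches opening and closing <span> tags, with or without attributes.
SPAN_TAG = %r{</?span\b[^>]*>}i
# Matches a whole <p>...</p> element, capturing the tags and the inner HTML.
PARAGRAPH = %r{(<p\b[^>]*>)(.*?)(</p>)}mi

# Removes <span> wrappers inside each paragraph and collapses the remaining
# whitespace, so each <p> ends up holding one continuous run of text.
# Note: styling that lived on the spans is dropped.
def flatten_paragraph_spans(html)
  html.gsub(PARAGRAPH) do
    # Copy the captures first; the nested gsub calls overwrite $1..$3.
    open_tag, inner, close_tag = $1, $2, $3
    text = inner.gsub(SPAN_TAG, '').gsub(/\s+/, ' ').strip
    "#{open_tag}#{text}#{close_tag}"
  end
end

sample = <<~HTML
  <p style="margin-top: 0pt; font-size: 19pt; ">
    <span style="font-weight: bold">Walmart</span>
    <span style="font-weight: bold; letter-spacing: -0.5pt"> </span>
    <span style="font-weight: bold">Inc.</span>
  </p>
HTML

puts flatten_paragraph_spans(sample)
# prints: <p style="margin-top: 0pt; font-size: 19pt; ">Walmart Inc.</p>
```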


Hi @tilal.ahmad, I want to ask whether you know of other Aspose users who have had problems with selecting text to create highlights, and whether they resolved it in any way.

I know this question has nothing to do with the library but we haven’t been able to find a solution for this feature yet.

@altose87

Can you please elaborate on what exact issue you are facing while highlighting the text in the current output HTML? We will check how we can help in this scenario.