Convert HTML files to HTML

altose87 · July 14, 2023, 6:31pm

We’ve received a lot of HTML files that we need to import into our Ruby on Rails platform.
In the past we used this approach to convert Word documents to HTML, and it works very well. Now we are using a very similar approach to “convert” these HTML files and upload them to our application. Here is a code snippet:

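```ruby
# Save options: write HTML, with images, fonts and CSS exported as separate files
request_save_options_data = HtmlSaveOptionsData.new(
  FileName: "#{file_name_without_extension}.html",
  ImagesFolder: 'images/',          # folder for extracted images
  SaveFormat: 'html',
  ExportHeadersFootersMode: 'None',
  ExportFontResources: 'true',      # export fonts as separate resources
  FontsFolder: 'fonts/',
  CssStyleSheetFileName: "#{file_name_without_extension}.css",
  CssStyleSheetType: 'External'     # CSS goes to a separate .css file
)
# Convert the uploaded file in folder_name and save the result as HTML
request = SaveAsRequest.new(name: file_name, save_options_data: request_save_options_data, folder: folder_name)
@words_api.save_as(request)
# Read the generated HTML back from our S3 bucket, then run our app-specific helpers
html_file = s3.get_object(bucket: @bucket, key: html_file_name_with_folder)
find_s3_items_and_change_permissions
[set_images_path(html_file), folder_name, "#{file_name_without_extension}.css"]
```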

We are having some issues after that process:

The original HTML files have styles and JavaScript inside the <head> tag. How can we keep those elements during or after the conversion?
We are losing the images. Is there an option in the save options data to download the images from the img src attribute and then upload them? Or is there another approach we should follow?
I’ve tried the same process using the Aspose Free Online Converter, and after the conversion it shows the images correctly, which makes me think we are missing something in our code.
Some elements in the original HTML have data attributes that we need to keep, for instance, tooltip text. Is there an option to keep the data attributes?
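For the first point, one possible workaround that does not depend on any Aspose save option is to copy the `<style>` and `<script>` elements from the original file’s `<head>` into the converted HTML afterwards. Below is a minimal sketch using only Ruby’s standard library; the `HeadAssets` module and its methods are hypothetical names, not part of the Aspose SDK, and regex-based HTML handling is fragile, so a real HTML parser (for example Nokogiri) would be a safer choice in production.

```ruby
# Hypothetical post-processing helpers (not part of the Aspose SDK):
# copy <style> and <script> elements from the original file's <head>
# into the <head> of the converted HTML. Regex-based, so only a sketch.
module HeadAssets
  HEAD_RE  = %r{<head\b[^>]*>(.*?)</head>}mi
  ASSET_RE = %r{<style\b[^>]*>.*?</style>|<script\b[^>]*>.*?</script>}mi

  # Returns the <style>/<script> elements found in the original <head>, in order.
  def self.extract(original_html)
    head = original_html[HEAD_RE, 1]
    return [] unless head

    head.scan(ASSET_RE)
  end

  # Inserts the given elements just before </head> in the converted HTML.
  # Returns the HTML unchanged if there is nothing to add or no </head> tag.
  def self.reinject(converted_html, assets)
    return converted_html if assets.empty?

    # A block (not a replacement string) keeps backslashes in scripts literal.
    converted_html.sub(%r{</head>}i) { "#{assets.join("\n")}\n</head>" }
  end
end
```

Calling `HeadAssets.reinject(converted_html, HeadAssets.extract(original_html))` returns the converted HTML with the original styles and scripts placed just before `</head>`. Because they land after any `<link>` to the external CSS file, the original rules take precedence over the generated stylesheet when specificity is equal.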
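For the images and the data attributes, a similar standard-library sketch can take an inventory of the original HTML before conversion: the absolute URL of every `<img src>` (so the files could be downloaded and uploaded next to the converted output) and the `data-*` attributes of each element that has an `id`. The `HtmlInventory` module is hypothetical, and re-applying the data attributes later assumes the element ids survive the conversion, which would need to be verified against the actual output.

```ruby
require 'uri'

# Hypothetical helpers (not part of the Aspose SDK) that record what the
# conversion may drop, so it can be re-applied afterwards. Regex-based sketch.
module HtmlInventory
  IMG_SRC_RE = /<img\b[^>]*?(?<![\w-])src\s*=\s*(["'])(.*?)\1/mi
  TAG_RE     = /<([a-z][a-z0-9-]*)\b([^>]*)>/mi
  DATA_RE    = /(?<![\w-])(data-[\w-]+)\s*=\s*(["'])(.*?)\2/mi
  ID_RE      = /(?<![\w-])id\s*=\s*(["'])(.*?)\1/mi

  # Absolute, de-duplicated URLs of every <img src>, resolved against
  # base_url (where the original page lives) so relative paths work.
  def self.image_urls(html, base_url)
    urls = html.scan(IMG_SRC_RE).filter_map do |_quote, src|
      next if src.start_with?('data:') # inline images need no download

      # Undo the most common entity found in URLs before resolving.
      URI.join(base_url, src.gsub('&amp;', '&')).to_s
    rescue URI::InvalidURIError
      nil # skip malformed src values instead of failing the whole import
    end
    urls.uniq
  end

  # Maps each element id to its data-* attributes (e.g. tooltip text).
  # Elements without an id are skipped: there is nothing to match them by later.
  def self.data_attributes_by_id(html)
    html.scan(TAG_RE).each_with_object({}) do |(_tag, attrs), result|
      id = attrs[ID_RE, 2]
      next unless id

      data = attrs.scan(DATA_RE).to_h { |name, _quote, value| [name, value] }
      result[id] = data unless data.empty?
    end
  end
end
```

Each URL returned by `HtmlInventory.image_urls(html, page_url)` could then be fetched, for example with `Net::HTTP.get(URI(url))`, and uploaded to the `images/` folder; `page_url` stands for wherever the original file was served from and is an assumption here.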

I’m attaching one of our HTML files to give you more context on what we want to “convert” and what we want to keep. I had to add the .docx extension to upload it, so you just need to remove it.
ff574f1d-3b85-43b0-b9ac-3b01bc362b9b.html.docx (424.8 KB)

tilal.ahmad · July 14, 2023, 7:44pm

@altose87

Thanks for your inquiry. We would appreciate it if you could share your expected sample output with us. It will help us investigate and provide a solution.

altose871 · July 17, 2023, 12:39pm

Hi @tilal.ahmad, thanks for your help with this post.

Attached is a zip file with the expected result, which was generated using the Aspose Free Online Converter.

Output (2).zip (102.6 KB)

Let me know if you need anything else from our end.

tilal.ahmad · July 17, 2023, 3:02pm

@altose871

Thanks for sharing the expected output. We will look into the requirements (WORDSCLOUD-2410) and guide you accordingly.

altose871 · July 24, 2023, 11:39am

Hi @tilal.ahmad, is there any update on this issue?
Thanks in advance!

tilal.ahmad · July 24, 2023, 5:31pm

@altose871

Thanks for your patience. We have scheduled the investigation of this issue and will share our findings with you in a day or two.