We’ve received a lot of HTML files that we need to import into our Ruby on Rails platform.
In the past we used this process to convert Word documents to HTML, and it has worked very well. Now we are using a very similar approach to “convert” these HTML files and upload them to our application. Here is a code snippet:
# Save the uploaded file as HTML, with images, fonts and an external CSS file written alongside it
request_save_options_data = HtmlSaveOptionsData.new(
  FileName: "#{file_name_without_extension}.html",
  ImagesFolder: 'images/',
  SaveFormat: 'html',
  ExportHeadersFootersMode: 'None',
  ExportFontResources: 'true',
  FontsFolder: 'fonts/',
  CssStyleSheetFileName: "#{file_name_without_extension}.css",
  CssStyleSheetType: 'External'
)
request = SaveAsRequest.new(
  name: file_name,
  save_options_data: request_save_options_data,
  folder: folder_name
)
@words_api.save_as(request)
# Read the generated HTML back from the S3 bucket
html_file = s3.get_object(bucket: @bucket, key: html_file_name_with_folder)
find_s3_items_and_change_permissions
[set_images_path(html_file), folder_name, "#{file_name_without_extension}.css"]
We are having some issues after that process:
- The original HTML files have styles and JavaScript inside the <head> tag. How can we keep those elements during or after the conversion?
- We are losing the images. Is there a way in the options data to say that it should download the images from the img src attribute and then upload them? Or is there another approach we should follow? I’ve tried the same process using the Aspose Free Online Converter, and after the conversion the images display correctly, which makes me think we are missing something in our code.
- Some elements in the original HTML have data attributes that we need to keep, for instance tooltip text. Is it possible to add an option to keep the data attributes?
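For the styles and scripts in <head>, one possible workaround (a sketch, not an Aspose.Words option) is to copy the <style> and <script> blocks from the original file into the converted file after the conversion. The helper names head_assets and restore_head_assets are hypothetical, and the regex matching assumes reasonably well-formed markup; a real HTML parser would be more robust.

```ruby
# Hypothetical helpers (not part of the Aspose API): copy <style> and <script>
# blocks from the original document's <head> into the converted document's <head>.
# Assumption: markup is reasonably well-formed; regexes are only a rough stand-in
# for a real HTML parser.
HEAD_PATTERN = %r{<head\b[^>]*>(.*?)</head>}mi
ASSET_PATTERN = %r{<style\b[^>]*>.*?</style>|<script\b[^>]*>.*?</script>}mi

# Returns the <style>/<script> elements found in the <head>, in document order.
def head_assets(html)
  head = html[HEAD_PATTERN, 1]
  head ? head.scan(ASSET_PATTERN) : []
end

# Inserts the original head assets just before the converted file's first </head>.
def restore_head_assets(original_html, converted_html)
  assets = head_assets(original_html)
  return converted_html if assets.empty?

  # Block form so backslashes inside scripts are not treated as back-references.
  converted_html.sub(%r{</head>}i) { "#{assets.join("\n")}\n</head>" }
end
```

If the converted file has no </head> tag, the sketch returns it unchanged.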
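For the images, a hedged alternative to relying on the conversion is to download each absolute http(s) image referenced in an img src attribute and rewrite the src to a local path under images/, mirroring the ImagesFolder option in the snippet. localize_images and its fetch: parameter are hypothetical; the default fetcher uses Net::HTTP.get, which does not follow redirects.

```ruby
require 'net/http'
require 'uri'
require 'digest'

# Hypothetical helper (not an Aspose option): download every absolute http(s)
# image referenced by <img src="..."> and point the src at a local images/ path.
# Assumptions: src values are quoted; relative src values are left unchanged.
IMG_SRC_PATTERN = %r{(<img\b[^>]*?\ssrc\s*=\s*)(["'])(https?://[^"']+)\2}i

# Returns [rewritten_html, { "images/<name>" => image_bytes }].
# Net::HTTP.get does not follow redirects; pass a different fetch: lambda if needed.
def localize_images(html, images_folder: 'images/',
                    fetch: ->(url) { Net::HTTP.get(URI(url)) })
  files = {}
  rewritten = html.gsub(IMG_SRC_PATTERN) do
    prefix, quote, url = Regexp.last_match(1), Regexp.last_match(2), Regexp.last_match(3)
    # Name the local copy after a hash of the URL so repeated images are fetched once.
    name = "#{images_folder}#{Digest::SHA256.hexdigest(url)[0, 12]}#{File.extname(URI(url).path)}"
    files[name] ||= fetch.call(url)
    "#{prefix}#{quote}#{name}#{quote}"
  end
  [rewritten, files]
end
```

Each entry in the returned hash would still need to be uploaded (for example with put_object on the same S3 client) before the rewritten HTML is saved.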
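To see exactly which data attributes (such as tooltip text) a conversion drops, a rough diagnostic could compare the data-* attributes in the original and converted files. The method names below are hypothetical, and the regex only handles quoted attribute values.

```ruby
# Hypothetical diagnostic: list data-* attributes (e.g. tooltip text) that appear
# in the original HTML but not in the converted HTML.
# Assumption: attribute values are single- or double-quoted.
DATA_ATTR_PATTERN = /\s(data-[\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/i

# Returns [[attribute_name, value], ...] in document order.
def data_attributes(html)
  html.scan(DATA_ATTR_PATTERN).map { |name, double_q, single_q| [name.downcase, double_q || single_q] }
end

def missing_data_attributes(original_html, converted_html)
  data_attributes(original_html) - data_attributes(converted_html)
end
```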
I’m attaching one of our HTML files to give you more context on what we want to “convert” and what we want to keep. We had to add a .docx extension to upload it, so you just need to remove it.
ff574f1d-3b85-43b0-b9ac-3b01bc362b9b.html.docx (424.8 KB)