Convert PDF to HTML using Aspose.PDF REST API renders incorrect HTML text

Hello,

I use Aspose.PDF Cloud to convert PDF documents to HTML. We found that in one particular situation the rendered HTML is incorrect: the first character of some lines is split off onto its own line. Here is an example.

We have this as text in a PDF (in the PDF, it is centered):

VIVAMUS PRETIUM ULTRICES
morbi accumsan turpis ante, in suscipit lectus venenatis hendrerit
suspendisse eget dictum tortor, nec ultricies odio
nam arcu neque, dictum vel velit sit amet, sagittis bibendum odio. In volutpat ornare mauris
ut interdum libero eu vestibulum feugiat. Nunc et ornare libero, sed euismod lectus
duis ante sem, accumsan vitae eros non, pretium sodales nibh
ut molestie aliquet augue, ut lacinia purus interdum non phasellus
quisque convallis augue vitae luctus.
______________________________
LAOREET
for
Nullam Tristique
Ipsum Luctus
Tortor Venenatis
Diam Dignissim
Gravida Diam

23 November 2015

Sed ligula sem, ullamcorper id est at, sollicitudin aliquet augue. Etiam eget elit dolor. Pellentesque ut ipsum leo. Proin ultrices nulla at scelerisque varius.

I had to obfuscate the text, but the only interesting parts are the line of underscores (____) and the date (23 November 2015).

In the HTML produced, I get this:

VIVAMUS PRETIUM ULTRICES  
morbi accumsan turpis ante, in suscipit lectus venenatis hendrerit  
suspendisse eget dictum tortor, nec ultricies odio  
nam arcu neque, dictum vel velit sit amet, sagittis bibendum odio. In volutpat ornare mauris  
ut interdum libero eu vestibulum feugiat. Nunc et ornare libero, sed euismod lectus  
duis ante sem, accumsan vitae eros non, pretium sodales nibh  
ut molestie aliquet augue, ut lacinia purus interdum non phasellus  
quisque convallis augue vitae luctus.  
_
_____________________________  
LAOREET  
for  
Nullam Tristique  
Ipsum Luctus  
Tortor Venenatis  
Diam Dignissim  
2
3 November 2015  

Sed ligula sem, ullamcorper id est at, sollicitudin aliquet augue. Etiam eget elit dolor. Pellentesque ut ipsum leo. Proin ultrices nulla at scelerisque varius.  


To improve readability, I adjusted the indentation and highlighted the underscore line and the date in green/red.

As you can see, the conversion isolates the first character of these two lines. When I open the HTML file, the first underscore (_) sits on its own line above the rest of the underscores, and likewise the 2 sits one line above "3 November 2015".

We also noticed that the line holding each isolated character (_ and 2) does not end with trailing whitespace, whereas every other line, none of which has an isolated character, does.
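
Until the conversion itself is fixed, the symptom could in principle be repaired on the client side after conversion. The following is only a minimal sketch, assuming the HTML output has already been reduced to a list of text lines; the function name `merge_isolated_first_chars` is hypothetical and is not part of any Aspose API. It relies on the observation above: an isolated character is a line of exactly one character with no trailing whitespace. A legitimate one-character line would also be merged, so this is a heuristic, not a general fix.

```python
from typing import List, Optional


def merge_isolated_first_chars(lines: List[str]) -> List[str]:
    """Rejoin a lone leading character with the next non-blank line.

    Hypothetical helper: a line counts as "isolated" when it is exactly
    one character long (no trailing whitespace), as seen in the output.
    """
    merged: List[str] = []
    pending: Optional[str] = None  # isolated character awaiting its line

    for line in lines:
        if not line.strip():
            # Drop blank lines that sit between the isolated character
            # and the rest of its line; keep all other blank lines.
            if pending is None:
                merged.append(line)
            continue
        if pending is not None:
            merged.append(pending + line)
            pending = None
        elif len(line) == 1:
            pending = line
        else:
            merged.append(line)

    # A trailing isolated character with nothing after it is kept as-is.
    if pending is not None:
        merged.append(pending)
    return merged


if __name__ == "__main__":
    sample = ["Diam Dignissim  ", "", "2", "", "3 November 2015  "]
    for text in merge_isolated_first_chars(sample):
        print(repr(text))
```

Applied to the date lines from the example, the lone `2` and `3 November 2015` are joined back into a single line, while ordinary lines pass through unchanged.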


Are you aware of this issue? Is there a workaround? Will it be fixed?

Thanks

Hi,


Thanks for contacting support.

We are testing the scenario in our environment and will get back to you soon. We apologize for the inconvenience.

Hello,

Could you reproduce the issue? Any news on this?

Thanks.
Matt

Hi,


Sorry for the delayed response.

To be certain about the exact issue you are facing, can you please share the source PDF files with which you encountered it, so that we can test the scenario in our environment? We apologize for the delay and the inconvenience.

Hello,

I sent you an email with the file attached.

Should you need anything else, please let me know.

Best regards,
Matt

Hi Matt,


Thanks for sharing the resource files.

I have received the resource file and am testing the scenario in our environment. We will update you on the status soon.

Hi Matt,

Sorry, we were not able to reproduce the issue on our end. The text you mentioned in your post is not present in the PDF, so it is difficult to identify the issue. Can you please check the attached output HTML and highlight the issue if you see it?

Best Regards,