Free Support Forum - aspose.cloud

Issues converting docx to html using Aspose.words in asp.net mvc application

I am converting the attached docx document to HtmlFixed, which retains all the formatting and styling.

The conversion itself works well. The problem is that I want to upload each image and shape to a cloud server and bind the URI of the uploaded image to the respective <img src="uri"> tag while converting to HTML.

While converting to HtmlFixed, Aspose creates a folder and keeps all the images, CSS and fonts within that folder. But not all images are saved there: shapes, diagrams and charts are missing. Where can I find those image types?

Also, custom shapes, diagrams and charts do not have the HasImage property set to true, so these images cannot be uploaded to the server. Is there any way around this?

Also, how do I bind the URIs of all types of images to the respective <img src="uri"> tags while converting to HtmlFixed, and load all the CSS and fonts?

In brief, my points of concern are:
1. How to iterate through the docx and upload each image to the server sequentially, whether it is a diagram, shape, chart or screenshot, and keep the image URI. Remember that the HasImage property is not true for custom shapes, diagrams and charts, so such images cannot be uploaded to the server. Is there any way around this?

2. How to bind the URI of each image to the corresponding <img> tag within the HTML document while converting docx to HtmlFixed.

3. How to load the CSS and fonts while loading the HtmlFixed document within the .NET application.

P.S.: I have attached a sample document with all the possible scenarios and issues that I am facing with Aspose.Words. Please refer to the attached sample document and provide code to resolve these issues.

Thanks,
Raju Kumar

Hi Raju,


Thanks for your inquiry. The Aspose.Words support team will review your requirements as soon as possible, sorry for any delay.

Thanks,
Adam

No resolution yet !!!

I am still waiting for my query to be answered !!!

Hi Raju,

You can use the code from http://www.aspose.com/community/forums/permalink/535974/535974/showthread.aspx#535974 to upload the images and then use the uploaded images. FileFormatUtil.ImageTypeToExtension(shape.ImageData.ImageType) will give you the file extension for each image type.

The following code can be used to save any shape as an image.

ShapeRenderer r = shape.GetShapeRenderer();

// Define custom options which control how the image is rendered. Render the shape to the EMF vector format.
ImageSaveOptions imageOptions = new ImageSaveOptions(SaveFormat.Emf)
{
    Scale = 1.5f
};

// Save the rendered image to disk.
r.Save(dataDir + "TestFile.RenderToDisk Out.emf", imageOptions);

Best Regards,

This doesn't solve any of the problems. Did you try it with the attached document?

To add some more detail to the query: while converting a docx containing shapes, diagrams and charts to HtmlFixed, it generates "svg" images. Is there any way to extract the "svg" from the HTML document, upload it to the cloud server as an image and bind the URI to the respective <img src> tag?

Again, my points of concern are:

1. How to iterate through the docx and upload each image to the server sequentially, whether it is a diagram, shape, chart or screenshot, and keep the image URI. Remember that the HasImage property is not true for custom shapes, diagrams and charts, so such images cannot be uploaded to the server. Is there any way around this?

2. How to bind the URI of each image to the corresponding <img> tag within the HTML document while converting docx to HtmlFixed.

3. How to load the CSS and fonts while loading the HtmlFixed document within the .NET application.

Please read my query properly; none of my questions has been answered. Could you please ask a senior person, or someone with expertise in Aspose.Words, to look into the issue?


Hi Raju,

We are working on the example using your document and will share it with you soon. Sorry for the inconvenience.

Best Regards,

Any update ??

Hi Raju,

We will share the example with you today. Sorry for the inconvenience.

Best Regards,

Hi Raju,

Sorry for the delay. There are some limitations when using Aspose.Words for .NET to convert to the HtmlFixed format: it does not support a CurrentShape property in ResourceSavingArgs, and it does not convert SVG to raster images during Word to HtmlFixed conversion. We need both features to accomplish the task using Aspose.Words only.

Two new issues to support the above-mentioned features have been logged in our issue tracking system as WORDSNET-11693 and WORDSNET-11694 respectively. We will keep you updated on these issues in this thread.

However, one solution is available if you use Aspose.Words together with Aspose.Pdf. Aspose.Words can convert Word to PDF, and then Aspose.Pdf can convert the PDF to HTML. The following code converts a PDF to fixed-layout HTML and saves SVGs as raster images during conversion.

// Note: Document and HtmlSaveOptions here are the Aspose.Pdf types, not the Aspose.Words ones.
Document pdf = new Document("c:/pdftest/out.pdf");
string outHtmlFile = @"c:\pdftest\Out_FileConverted.html";

// Create HtmlSaveOptions that produce fixed-layout HTML with raster images embedded into PNG page backgrounds.
HtmlSaveOptions saveOptions = new HtmlSaveOptions();
saveOptions.FixedLayout = true;
saveOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;

pdf.Save(outHtmlFile, saveOptions);

To save images to a remote server and update the image URLs, you can check all the topics related to PDF to HTML conversion, especially http://www.aspose.com/docs/display/pdfnet/PDF+to+HTML+-+Save+Output+to+a+Stream+Object and http://www.aspose.com/docs/display/pdfnet/PDF+to+HTML+-+Specify+Prefix+for+Images

Best Regards,

You may use my code as a starting point for the fix:


using System;
using System.Collections.Generic;
using System.Linq;
using System.Web.Mvc;
using Aspose.Words;
using Aspose.Words.Saving;
using System.IO;
using Aspose.Words.Drawing;
using Microsoft.WindowsAzure.Storage.Blob;
using AsposeWords.Models;
using HtmlAgilityPack;

namespace AsposeWords.Controllers
{
    public class HomeController : Controller
    {
        /// <summary>
        /// Picks the docx file ("Aspose.docx") from the Documents folder, then
        /// extracts, transforms, converts and loads it into an HTML document.
        /// </summary>
        /// <returns>A simple view; the converted HTML file is created in the Documents folder within the application.</returns>
        public ActionResult Index()
        {
            var path = Path.Combine(Server.MapPath("~/Documents/"), "Aspose.docx");
            Document docObj = new Document(path);
            List<string> imgUris = new List<string>();
            NodeCollection shapes = docObj.GetChildNodes(NodeType.Shape, true);
            foreach (Shape shape in shapes)
            {
                if (shape.HasImage || shape.CanHaveImage)
                {
                    string imgUrl = UploadImageToServer(shape);
                    imgUris.Add(imgUrl);
                    // Tag the shape with the name of the uploaded file.
                    string imgName = Path.GetFileName(imgUrl);
                    shape.AlternativeText = imgName;
                }
            }
            string newURL = Path.ChangeExtension(path, ".html");

            /****** Save to normal HTML with resolved shape and diagram issue, but the regular expression problem will remain *****/
            // HtmlSaveOptions so = new HtmlSaveOptions();
            // so.SaveFormat = SaveFormat.Html;
            // so.ImageSavingCallback = new HandleImageSaving();
            // docObj.Save(newURL, so);
            /****** End *****/

            docObj.Save(newURL, SaveFormat.HtmlFixed);

            var assetFolder = Path.Combine(Server.MapPath("~/Documents/"), Path.GetFileNameWithoutExtension(path));

            List<string> linkedFiles = new List<string>();
            DirectoryInfo dirInfo = new DirectoryInfo(assetFolder);

            if (dirInfo.Exists)
            {
                Dictionary<string, string> woffUriDict = new Dictionary<string, string>();
                string uri = "";

                var woffFiles = dirInfo.GetFiles().Where(f => f.Extension == ".woff").ToList();
                var cssFiles = dirInfo.GetFiles().Where(f => f.Extension == ".css").ToList();

                // Upload the fonts first so their URIs can be written into fontFaces.css.
                foreach (var item in woffFiles)
                {
                    uri = UploadCSSandWOFF(item.FullName);
                    linkedFiles.Add(uri);
                    woffUriDict.Add(item.Name, uri);
                }

                foreach (var item in cssFiles)
                {
                    if (item.Name == "fontFaces.css")
                    {
                        // Point each font reference at its uploaded URI before uploading the stylesheet.
                        string input = System.IO.File.ReadAllText(item.FullName);
                        foreach (var data in woffFiles)
                        {
                            input = input.Replace(data.Name, woffUriDict[data.Name]);
                        }
                        System.IO.File.WriteAllText(item.FullName, input);
                    }

                    uri = UploadCSSandWOFF(item.FullName);
                    linkedFiles.Add(uri);
                }
            }

            HtmlDocument doc = new HtmlDocument();
            doc.Load(newURL);

            HtmlNode head = doc.DocumentNode.SelectSingleNode("/html/head");

            // Link every uploaded stylesheet from the document head.
            foreach (var item in linkedFiles)
            {
                if (Path.GetExtension(item) == ".css")
                {
                    HtmlNode link = doc.CreateElement("link");
                    head.AppendChild(link);
                    link.SetAttributeValue("rel", "stylesheet");
                    link.SetAttributeValue("type", "text/css");
                    link.SetAttributeValue("href", item);
                }
            }

            // Replace image sources in document order; this only works when every <img> maps to exactly one uploaded shape.
            var nodes = doc.DocumentNode.SelectNodes("//img[@src]");
            if (nodes != null && nodes.Count == imgUris.Count)
            {
                for (int i = 0; i < nodes.Count; i++)
                {
                    nodes[i].SetAttributeValue("src", imgUris[i]);
                }
            }
            doc.Save(newURL);
            if (System.IO.Directory.Exists(assetFolder))
            {
                System.IO.Directory.Delete(assetFolder, true);
            }
            return View();
        }

        /// <summary>
        /// Uploads CSS and WOFF files to the blob (Azure cloud).
        /// "containerdell" is a static blob container; in production this will be dynamic as per the partnerCode.
        /// </summary>
        /// <param name="fileName">File name to upload.</param>
        /// <returns>URI of the uploaded file.</returns>
        private string UploadCSSandWOFF(string fileName)
        {
            string strBlobContainer = "containerdell";
            CloudBlockBlob cloudUri = Azure.UploadToBlob(Guid.NewGuid().ToString("N") + Path.GetExtension(fileName), fileName, strBlobContainer);
            return cloudUri.Uri.ToString();
        }

        /// <summary>
        /// Uploads an image/shape to the blob (Azure cloud) and returns the URI.
        /// </summary>
        /// <param name="s">Individual shape/image content.</param>
        /// <returns>URI of the uploaded file.</returns>
        private string UploadImageToServer(Shape s)
        {
            string strBlobContainer = "containerdell";
            string imageFileName;
            if (s.HasImage && s.ImageData.ImageType != ImageType.Unknown)
            {
                // The shape carries image data of a known type: save the original bytes.
                imageFileName = Server.MapPath("~/Documents/" + Guid.NewGuid().ToString("N") + s.Name + FileFormatUtil.ImageTypeToExtension(s.ImageData.ImageType));
                s.ImageData.Save(imageFileName);
            }
            else
            {
                // Custom shapes, diagrams and charts have no image data: render them to PNG instead.
                imageFileName = Server.MapPath("~/Documents/" + Guid.NewGuid().ToString("N") + s.Name + ".png");
                ImageSaveOptions opt = new ImageSaveOptions(SaveFormat.Png);
                s.GetShapeRenderer().Save(imageFileName, opt);
            }

            CloudBlockBlob blockBlobSOPDocImage = Azure.UploadToBlob(Path.GetFileName(imageFileName), imageFileName, strBlobContainer);

            // Remove the temporary local copy once it has been uploaded.
            if (System.IO.File.Exists(imageFileName))
            {
                System.IO.File.Delete(imageFileName);
            }
            return blockBlobSOPDocImage.Uri.ToString();
        }

        /// <summary>
        /// For performing operations on the image currently being saved or processed.
        /// It also keeps the image out of the local folder.
        /// </summary>
        public class HandleImageSaving : IImageSavingCallback
        {
            void IImageSavingCallback.ImageSaving(ImageSavingArgs e)
            {
                // Redirect the image to an in-memory stream so no file is written locally.
                e.ImageStream = new MemoryStream();
                e.KeepImageStreamOpen = false;
            }
        }
    }
}
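
For anyone adapting the code above: the fontFaces.css step replaces font file names one after another, which can go wrong when one name is a prefix of another (for example "font.woff" and "font.woff2"). Below is a minimal sketch of a single-pass alternative using only the .NET base library. FontCssRewriter and Rewrite are hypothetical names introduced here for illustration; they are not part of Aspose.Words or of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not an Aspose API): rewrites local font file names
// in a stylesheet to the URIs they were uploaded to.
public static class FontCssRewriter
{
    public static string Rewrite(string css, IDictionary<string, string> fontUris)
    {
        if (fontUris.Count == 0)
        {
            return css;
        }

        // Longest names first so "font.woff2" is not cut short by "font.woff".
        string pattern = string.Join("|", fontUris.Keys
            .OrderByDescending(name => name.Length)
            .Select(Regex.Escape));

        // Single pass: text inserted by one replacement is never scanned again.
        return Regex.Replace(css, pattern, match => fontUris[match.Value]);
    }
}
```

In the controller this would be called with the contents of fontFaces.css and the woffUriDict dictionary, and the result written back before the stylesheet is uploaded.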

Hi Raju,

Thanks for the code. We will use it to confirm if the issues have been fixed.

Best Regards,

Hi Raju,

You can use the following code to upload all or specific resources to cloud storage and update each resource URI to its new location while saving to the HtmlFixed format. Set HtmlFixedSaveOptions.ExportEmbeddedSvg to false so that the IResourceSavingCallback.ResourceSaving method is also called for all shapes (which are rendered as SVG).

var fileName = @"E:\Aspose\Defects\11694\Aspose test document.docx";
var outFileName = @"E:\Aspose\Defects\11694\out.html";

Document doc = new Document(fileName);

var so = new HtmlFixedSaveOptions();
so.ExportEmbeddedSvg = false;

var callback = new ResourceSavingCallback();
so.ResourceSavingCallback = callback;

doc.Save(outFileName, so);
callback.UploadImagesToServer();

internal class ResourceSavingCallback : IResourceSavingCallback
{
    // Image streams captured during saving, keyed by resource file name.
    private readonly Dictionary<string, Stream> mImages;

    public ResourceSavingCallback()
    {
        mImages = new Dictionary<string, Stream>();
    }

    public void ResourceSaving(ResourceSavingArgs args)
    {
        if (args.ResourceFileName.EndsWith(".png") || args.ResourceFileName.EndsWith(".jpeg")) // TODO any other image types
        {
            // Placeholder server address; the HTML will reference the image at this URI.
            args.ResourceFileUri = "http://www.myserver.com/" + args.ResourceFileName;

            // Capture the image in memory instead of writing it to the resources folder.
            var stream = new MemoryStream();
            args.ResourceStream = stream;
            args.KeepResourceStreamOpen = true;
            mImages.Add(args.ResourceFileName, stream);
        }
    }

    public void UploadImagesToServer()
    {
        foreach (var pair in mImages)
        {
            // TODO upload images
            // Debug lives in System.Diagnostics.
            Debug.WriteLine(pair.Key);
            Debug.WriteLine(pair.Value.Length);
        }
    }
}
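
The "TODO upload images" step above could be filled in along the following lines. This is only a sketch under stated assumptions: BufferedResourceUploader and UploadAll are hypothetical names, and the upload delegate stands in for whatever storage client is actually used (it is assumed to return the public URI of the stored resource). The one detail worth noting is that a MemoryStream that has just been written to is positioned at its end, so it needs to be rewound before it is read.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not an Aspose API): uploads streams that were
// buffered in memory during saving and returns the resulting URIs.
public static class BufferedResourceUploader
{
    public static Dictionary<string, string> UploadAll(
        IDictionary<string, Stream> resources,
        Func<string, Stream, string> upload)
    {
        var uris = new Dictionary<string, string>();
        foreach (var pair in resources)
        {
            // Dispose each buffer once its upload has finished.
            using (Stream stream = pair.Value)
            {
                // Rewind: after writing, the position sits at the end of the data.
                stream.Position = 0;
                uris[pair.Key] = upload(pair.Key, stream);
            }
        }
        return uris;
    }
}
```

Inside UploadImagesToServer this would be called as BufferedResourceUploader.UploadAll(mImages, (name, stream) => ...), with the lambda wrapping the real upload call.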

Best Regards,

The issues you have found earlier (filed as WORDSNET-11693) have been fixed in this .NET update and this Java update.


This message was posted using Notification2Forum from Downloads module by aspose.notifier.

The issues you have found earlier (filed as WORDSNET-11694) have been fixed in this .NET update and this Java update.



Moved here: https://forum.aspose.com/t/issues-converting-docx-to-html-using-aspose-words-in-asp-net-mvc-application/46749