Free Support Forum - aspose.cloud

Extract images from a Word document, store them in the cloud, and pass the URLs to the src attribute when converting to HTML

I have a requirement to convert a Word document to HTML. The images need to be uploaded to the cloud first, and the HTTP URL of each image then needs to be placed in the corresponding src attribute of the HTML file.

How can Aspose.Words help to achieve this?
I've bought the license, but I'm stuck on implementing the solution described above.

Any assistance would be of great help.
Thanks!

Hi Raju,


Thanks for your inquiry. I believe you can achieve this by using the following code:

// Requires: using System.IO; using Aspose.Words; using Aspose.Words.Drawing; using Aspose.Words.Saving;

// Load the document that contains the images.
Document doc = new Document(MyDir + @"input.doc");

// Iterate through all Shape nodes in the document.
foreach (Shape s in doc.GetChildNodes(NodeType.Shape, true))
{
    // Only shapes that hold an image are of interest.
    if (s.HasImage)
    {
        // Upload the image to the cloud and retrieve its new URL.
        // UploadImageToServer is your own helper; it is not part of Aspose.Words.
        string imgUrl = UploadImageToServer(s);
        string imgName = Path.GetFileName(imgUrl);

        // Temporarily store the name of the image in a suitable Shape property.
        s.AlternativeText = imgName;
    }
}

HtmlSaveOptions so = new HtmlSaveOptions(SaveFormat.Html);

// Set the cloud server's base URI as the prefix for image references.
so.ImagesFolderAlias = "http://www.yourservername.com/images/";

// Create and pass the object that implements the image-saving callback.
so.ImageSavingCallback = new HandleImageSaving();

// Save in HTML format.
doc.Save(MyDir + @"out.html", so);

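public class HandleImageSaving : IImageSavingCallback
{
    void IImageSavingCallback.ImageSaving(ImageSavingArgs e)
    {
        // Use the name stored earlier as the image file name in the HTML output.
        e.ImageFileName = e.CurrentShape.AlternativeText;

        // Clear the temporary value again.
        e.CurrentShape.AlternativeText = string.Empty;

        // The image has already been uploaded, so write the local copy to a throwaway stream.
        e.ImageStream = new MemoryStream();
        e.KeepImageStreamOpen = false;
    }
}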
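
The UploadImageToServer helper called in the loop is not shown above, since it depends on your storage provider. As a rough sketch only: the version below assumes the server accepts a plain HTTP PUT at the same base address used for ImagesFolderAlias. The ImageUploader class name, the PUT endpoint and the GUID-based file naming are all hypothetical, so adapt them to whatever upload API your cloud storage offers.

```csharp
using System;
using System.IO;
using System.Net.Http;
using Aspose.Words;
using Aspose.Words.Drawing;

public static class ImageUploader
{
    // Assumption: same base address as the one assigned to ImagesFolderAlias.
    private const string BaseUri = "http://www.yourservername.com/images/";

    private static readonly HttpClient Client = new HttpClient();

    // Uploads the image held by the shape and returns its public URL.
    public static string UploadImageToServer(Shape shape)
    {
        // Choose a file extension that matches the stored image format.
        string extension = FileFormatUtil.ImageTypeToExtension(shape.ImageData.ImageType);

        // Hypothetical naming scheme: a unique name per image.
        string fileName = Guid.NewGuid().ToString("N") + extension;

        using (var stream = new MemoryStream())
        {
            // Write the image bytes into memory and rewind for reading.
            shape.ImageData.Save(stream);
            stream.Position = 0;

            // Assumption: the server stores whatever is PUT at BaseUri + fileName.
            using (var content = new StreamContent(stream))
            {
                HttpResponseMessage response =
                    Client.PutAsync(BaseUri + fileName, content).GetAwaiter().GetResult();
                response.EnsureSuccessStatusCode();
            }
        }

        return BaseUri + fileName;
    }
}
```

With this in place, the call in the loop would become ImageUploader.UploadImageToServer(s), or you can move the method into the class that contains the conversion code.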


I hope this helps.

Best regards,

Shape s in doc.GetChildNodes(NodeType.Shape, true)

This doesn't recognize images in a DOCX file.
Is there any other class that can be used to extract images from a DOCX file?

It should also recognize the src attribute when the document is converted to HTML, so that each image's cloud URI can be placed in the corresponding src attribute.

Thanks.

Hi Raju,

Thanks for your inquiry. An image can be represented by either a Shape or a DrawingML object. You can try something like the following to get references to the collections of all images in the document:

NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
// Newer Microsoft Word documents (such as DOCX) may contain a different type of image container called DrawingML.
// Repeat the process to extract these if they are present in the loaded document.
NodeCollection dmlShapes = doc.GetChildNodes(NodeType.DrawingML, true);
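
To illustrate how both collections might be processed, here is a minimal sketch that exports every image to a local folder, from where it can then be uploaded. It assumes an Aspose.Words release that still exposes the DrawingML class and NodeType.DrawingML; as far as I know, later releases report such images as ordinary Shape nodes, in which case the second loop is not needed. The ImageExporter class name, the folder parameter and the file naming are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using Aspose.Words;
using Aspose.Words.Drawing;

public static class ImageExporter
{
    // Saves every image in the document to 'folder' and returns the file paths.
    public static List<string> ExportImages(Document doc, string folder)
    {
        var files = new List<string>();
        int index = 0;

        // Images stored in Shape nodes.
        foreach (Shape shape in doc.GetChildNodes(NodeType.Shape, true))
        {
            if (!shape.HasImage)
                continue;

            string extension = FileFormatUtil.ImageTypeToExtension(shape.ImageData.ImageType);
            string path = Path.Combine(folder, "image" + index++ + extension);
            shape.ImageData.Save(path);
            files.Add(path);
        }

        // Images stored in DrawingML nodes.
        // Assumption: this class exists in the Aspose.Words version in use.
        foreach (DrawingML dml in doc.GetChildNodes(NodeType.DrawingML, true))
        {
            if (!dml.HasImage)
                continue;

            string extension = FileFormatUtil.ImageTypeToExtension(dml.ImageData.ImageType);
            string path = Path.Combine(folder, "image" + index++ + extension);
            dml.ImageData.Save(path);
            files.Add(path);
        }

        return files;
    }
}
```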

Best regards,