Questions about document redlining performance

I initially spoke with your sales team regarding licensing and pricing, and they kindly directed me to the forums for more detailed technical and performance-related questions. I’m therefore providing the requested technical details below so you can better advise on performance, scalability, and limitations.

I already asked this question on the Aspose forums and was redirected here, so I hope you can answer it.


1. Product Version

We are currently evaluating:

  • [Aspose.Words / GroupDocs.Comparison] Cloud API (latest available version)
  • We are also considering the SDK / self-hosted deployment option for higher performance and fewer timeout constraints.

Please let us know if there is a specific recommended version for large-document workloads.


2. Programming Environment & Language

Our application stack is:

  • Frontend: Next.js (TypeScript)
  • Backend: Node.js / TypeScript API layer
  • Document processing is handled server-side.

3. Evaluated Options

  • For cloud usage, we would use your hosted API offering.
  • For self-hosted / SDK usage, we would be open to any deployment approach as long as it can run on Linux.

4. Requirements Summary (Key Points)

Document Comparison (Redlining)

Must support:

  • DOCX, XLSX, PDF comparison
  • Output for DOCX comparison must be a Microsoft Word Track Changes document (not only a rendered PDF diff)

Document Conversion

Must support:

  • DOCX → PDF
  • XLSX → PDF

TOC / Field Code Update (TOC Template)

A critical requirement is:

  • We generate the TOC in DOCX using a TOC template XML / Word field structure.
  • Currently the TOC is not automatically updated correctly in our pipeline.
  • We need the service to recalculate Word field codes / update fields programmatically so that the TOC is populated correctly without requiring the user to manually open Word/LibreOffice and update the TOC.

5. Performance & File Size Questions

During initial testing with larger DOCX documents, we observed that:

  • Cloud comparison requests timed out after approximately 20 minutes

Could you clarify:

  • Maximum supported file sizes (Cloud vs SDK)
  • Whether the Cloud timeout limit is configurable/extendable
  • Recommended best practices for processing large regulatory-style documents
  • Whether async/background job processing is available to avoid request timeouts

The larger the documents we can process reliably, the better.


6. Security & Safety Considerations (SDK)

Since we may process user-uploaded documents, security is an important consideration for us.

Could you clarify:

  • Does the SDK include built-in safeguards against malicious or malformed documents?
  • How are potentially dangerous elements handled (e.g., embedded macros, external references, scripts, malformed XML, zip bombs)?
  • Are there configurable limits for:
    • Memory usage
    • CPU usage
    • Maximum document structure depth
    • Maximum embedded object size
  • Are documents fully parsed in memory, or is streaming supported?
  • Do you provide any sandboxing recommendations for Linux deployments?

For example, if a user uploads a maliciously crafted DOCX file designed to exhaust resources or exploit the parser, what protections are in place within the SDK?


Thanks again — we’d be happy to schedule a technical call if that’s the fastest way to confirm these requirements.

Best regards,
Risto-Matti
Senior Software Engineer

@ristomattip

Thanks for the report. I have not confirmed this yet and can only do so after reproducing it on our side. I need a bit more information to diagnose the performance issue in Aspose.Words when comparing and converting DOCX, XLSX, and PDF files.

  • What is the typical size (in MB or page count) of the DOCX/XLSX/PDF files you need to compare and convert, and the maximum size you anticipate handling?
  • Which programming language/SDK would you use for a self‑hosted deployment (e.g., .NET, Java, Python), and what are the target Linux distribution and resource limits (CPU, RAM) for that environment?
  • Do you require the comparison service to run asynchronously (background job/queue) to avoid HTTP time‑outs, and if so, what maximum processing time do you expect per document?
  • Are there any specific security or compliance standards (e.g., GDPR, ISO 27001) that the solution must meet, or particular sandboxing constraints you plan to enforce?

A support engineer will review your answers and follow up in this thread.

Once I have the missing details, I’ll try to reproduce this. If I can confirm it, I’ll log it internally and follow up here.


Hello,

Thank you for the follow-up questions — please see our clarifications below.


1. Typical and Maximum File Sizes

During testing, I performed a comparison using two DOCX files of approximately 170 MB each.

In our real-world usage:

  • Typical file sizes may range from a few megabytes up to several hundred megabytes
  • We may occasionally handle documents around 200 MB
  • Larger documents are possible, and the higher the supported limits, the better for us

These are typically large regulatory-style documents with complex structure, embedded content, and tracked revisions.

Could you clarify what the practical maximum supported file sizes are for:

  • Aspose.Words Cloud comparison
  • Self-hosted SDK comparison

And whether there are recommended configurations for handling files in the ~150–250 MB range?


2. Self-Hosted Deployment Setup

We have not yet set up a dedicated comparison server, so we are flexible regarding:

  • Programming language (.NET, Java, etc.)
  • Linux distribution
  • CPU/RAM allocation

Our key question is:

Which setup would you recommend for optimal performance and stability when handling very large document comparisons?

For example:

  • Is .NET preferred over Java for memory efficiency?
  • Are there recommended minimum RAM/CPU specifications per comparison job?
  • Do you have reference architectures for high-load or large-document workloads?

3. Asynchronous Processing & Time Expectations

Yes, asynchronous/background processing would be preferred if needed to avoid HTTP timeouts.

For very large documents, processing may take significant time — that is acceptable to us, provided it is reliable and predictable.

That said:

  • Faster processing is obviously preferred
  • We would like guidance on realistic processing time expectations for ~150–200 MB DOCX comparisons
  • Is there a recommended maximum processing time per document?

4. Security & Compliance Requirements

Security and compliance are important for us.

We need clarity on:

  • Whether Aspose Cloud meets specific compliance standards (e.g., GDPR alignment, ISO 27001, SOC 2, etc.)
  • Data handling and retention policies
  • Whether documents are stored temporarily and for how long
  • Encryption in transit and at rest

Additionally, and this part was not fully addressed previously, for the self-hosted SDK:

  • What safeguards exist against malicious or malformed documents?
  • How are zip bombs, malformed XML, embedded macros, or resource exhaustion attacks handled?
  • Are there configurable resource limits (memory usage, recursion depth, object size)?
  • Do you provide sandboxing or hardening recommendations for Linux environments?

Since we will be processing user-uploaded documents, this is a critical decision factor for us.


5. Outstanding Clarification

In previous responses, some of these SDK-related security and performance questions were redirected without concrete answers.

Because we are evaluating both Cloud and on-prem options, it is important for us to receive clear technical guidance for both models before making an architectural decision.

We would appreciate more detailed clarification on:

  • Practical performance limits
  • Recommended production setup
  • Built-in security mechanisms in the SDK

Thank you again — we are looking forward to your detailed guidance so we can proceed with the evaluation.

Best regards,
Risto-Matti

I will contact the development team, and we will create a comprehensive guide that answers all the questions you asked.

Sorry for the delayed response. Please find the answers to your questions below:

Both the Aspose.Words Cloud service and the Docker Hub version of the API have a 1 GB request body size limit. Technically, this means it does not matter which version you plan to use — the limits are the same.
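
A client-side pre-flight check can reject an oversized upload before any bytes are sent. The TypeScript sketch below is illustrative only. The 1 GB figure is the limit stated above, but whether it is counted as 10^9 or 2^30 bytes is not stated, so the smaller value is assumed. `checkRequestSize` is a hypothetical helper, not part of any Aspose SDK, and it ignores multipart/encoding overhead.

```typescript
// Assumption: the 1 GB request body limit is read conservatively as 10^9 bytes.
const MAX_REQUEST_BYTES = 1_000_000_000;
const MB = 1_000_000;

interface SizeCheck {
  ok: boolean;           // true when the combined payload fits under the limit
  totalBytes: number;    // sum of all file sizes in the request
  headroomBytes: number; // negative when the limit is exceeded
}

// Hypothetical helper: sums the sizes of all files that would travel in one
// request body and compares the total with the limit.
function checkRequestSize(
  fileSizesBytes: number[],
  limit: number = MAX_REQUEST_BYTES
): SizeCheck {
  const totalBytes = fileSizesBytes.reduce((sum, n) => sum + n, 0);
  return { ok: totalBytes <= limit, totalBytes, headroomBytes: limit - totalBytes };
}

// Two 170 MB documents sent together in a single request.
console.log(checkRequestSize([170 * MB, 170 * MB]));
```

If the documents are uploaded to storage one at a time instead, the same check would apply to each file separately.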

Please also note that files around 200 MB may take a significant amount of time to load from and save to storage. The current request timeout is set to 60 minutes. If the network connection is unstable or slow, this could potentially cause issues.

The self-hosted version is based on our Docker Hub API image.

We test our Docker image on Ubuntu-based servers, so it is a fully supported and stable solution in that environment.

It is difficult to provide exact CPU/RAM requirements because large files can consume substantial memory depending on their internal structure. For example, even a 50 MB file may require up to 1 GB of RAM when fully loaded into memory. Therefore, we recommend performing performance testing with your specific documents and expected workload.

As a starting point, you may consider an 8-core CPU with 8 GB of RAM and then monitor resource usage to determine whether CPU or memory is the primary bottleneck.
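
For a first capacity estimate before load testing, the example above (a 50 MB file needing up to 1 GB of RAM) can be turned into a rough ratio of about 20x. The sketch below extrapolates linearly from that single data point, which is an assumption: as noted, real usage depends on the internal structure of each document. The function names and the 1 GB reserve for the OS are hypothetical.

```typescript
// Assumption: peak RAM scales linearly with file size at the ratio implied
// by the example above (50 MB on disk -> up to about 1000 MB in memory).
const ASSUMED_RAM_RATIO = 20;

// Rough peak memory (in MB) for one comparison, which loads two documents.
// The memory needed for the result document is not modelled here.
function estimateComparisonRamMb(
  fileAMb: number,
  fileBMb: number,
  ratio: number = ASSUMED_RAM_RATIO
): number {
  return (fileAMb + fileBMb) * ratio;
}

// Number of such jobs that fit on a host at once, after setting aside a
// reserve (assumed 1024 MB) for the OS and the service itself.
function maxConcurrentJobs(
  hostRamMb: number,
  jobRamMb: number,
  reserveMb: number = 1024
): number {
  return Math.max(0, Math.floor((hostRamMb - reserveMb) / jobRamMb));
}

const jobRamMb = estimateComparisonRamMb(170, 170);
console.log(jobRamMb, maxConcurrentJobs(8 * 1024, jobRamMb));
```

Under these assumptions, comparing two 170 MB files would need roughly 6.8 GB, so an 8 GB host could run one such job at a time, and a pair of 200 MB files might not fit at all. Treat these numbers as a reason to measure, not as a sizing guarantee.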

Regarding client SDKs, you can use whichever language best fits your stack — Python, Ruby, .NET, or Java. I would suggest starting with Python or .NET.

Asynchronous operations are supported both at the API and SDK levels. The implementation depends on how you design and execute these operations in your application.
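
On the application side, one possible design is to run comparisons as promises with a cap on how many are in flight, so that several large jobs do not exhaust memory at the same time. The sketch below is a generic TypeScript pattern, not an Aspose API: `runWithLimit` and `fakeCompare` are hypothetical names, and the stand-in job would be replaced by your own client call.

```typescript
// A job is any function that returns a promise, e.g. one that calls a
// comparison endpoint and resolves with the location of the result.
type Job<T> = () => Promise<T>;

// Runs jobs with at most `limit` in flight; results keep the input order.
async function runWithLimit<T>(jobs: Job<T>[], limit: number): Promise<T[]> {
  if (limit < 1) throw new RangeError("limit must be at least 1");
  const results: T[] = new Array(jobs.length);
  let next = 0;

  // Each worker repeatedly takes the next unclaimed job until none remain.
  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const index = next++; // claimed synchronously, so no two workers share a job
      results[index] = await jobs[index]();
    }
  }

  const workers = Array.from({ length: Math.min(limit, jobs.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Stand-in for a real comparison call (assumption: replace with your client code).
const fakeCompare = (name: string, ms: number): Job<string> => () =>
  new Promise((resolve) => setTimeout(() => resolve(`${name}: done`), ms));

runWithLimit([fakeCompare("a", 30), fakeCompare("b", 10), fakeCompare("c", 20)], 2)
  .then((out) => console.log(out));
```

If any job rejects, the returned promise rejects as well; retry or per-job error handling would need to be added inside the job functions.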

Additionally, we are currently developing a queue-based processing feature. Once released, it will allow you to submit requests without waiting for immediate processing and retrieve the results once they are ready.

If you plan to use our internal storage, please note that the default file retention period is up to one month. You can configure a shorter retention period if needed, and it is also possible to define a regex-based mask to preserve specific files or folders.

Regarding compliance standards, our API is not currently certified against major compliance frameworks. If this is a critical requirement for your organization, it may be better to consider the on-premise version of Aspose.Words instead.

Based on your requirements, I believe the on-premise option would be the better choice, particularly from a security and compliance standpoint.

Using storage within your own network should also provide better performance compared to external services. Additionally, running your own comparison server gives you full control over CPU and RAM allocation and overall resource management.