Questions about document redlining performance

I initially spoke with your sales team regarding licensing and pricing, and they kindly directed me to the forums for more detailed technical and performance-related questions. I’m therefore providing the requested technical details below so you can better advise on performance, scalability, and limitations.

I already asked this question in the Aspose forums and was redirected here, so I would really appreciate it if you could answer my questions.


1. Product Version

We are currently evaluating:

  • [Aspose.Words / GroupDocs.Comparison] Cloud API (latest available version)
  • We are also considering the SDK / self-hosted deployment option for higher performance and fewer timeout constraints.

Please let us know if there is a specific recommended version for large-document workloads.


2. Programming Environment & Language

Our application stack is:

  • Frontend: Next.js (TypeScript)
  • Backend: Node.js / TypeScript API layer
  • Document processing is handled server-side.

3. Evaluated Options

  • For cloud usage, we would use your hosted API offering.
  • For self-hosted / SDK usage, we would be open to any deployment approach as long as it can run on Linux.

4. Requirements Summary (Key Points)

Document Comparison (Redlining)

Must support:

  • DOCX, XLSX, PDF comparison
  • Output for DOCX comparison must be a Microsoft Word Track Changes document (not only a rendered PDF diff)

Document Conversion

Must support:

  • DOCX → PDF
  • XLSX → PDF

TOC / Field Code Update (TOC Template)

A critical requirement is:

  • We generate the TOC in DOCX using a TOC template XML / Word field structure.
  • Currently the TOC is not automatically updated correctly in our pipeline.
  • We need the service to recalculate Word field codes / update fields programmatically so that the TOC is populated correctly without requiring the user to manually open Word/LibreOffice and update the TOC.

5. Performance & File Size Questions

During initial testing with larger DOCX documents, we observed that:

  • Cloud comparison requests timed out after approximately 20 minutes

Could you clarify:

  • Maximum supported file sizes (Cloud vs SDK)
  • Whether the Cloud timeout limit is configurable/extendable
  • Recommended best practices for processing large regulatory-style documents
  • Whether async/background job processing is available to avoid request timeouts

The larger the documents we can process reliably, the better.
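To make the async/background question concrete, this is roughly the submit-and-poll pattern we have in mind on our side. It is only a minimal sketch: `JobQueue`, `submit`, and `status` are hypothetical names of ours and not part of any Aspose or GroupDocs API, and the in-memory map stands in for whatever persistent queue a real deployment would use.

```typescript
// Hypothetical names throughout; nothing here is an Aspose or GroupDocs API.
type JobState<T> =
  | { status: "pending" }
  | { status: "done"; result: T }
  | { status: "failed"; error: string };

class JobQueue<T> {
  // In-memory only: jobs are lost on restart (an assumption made for brevity).
  private readonly jobs = new Map<string, JobState<T>>();
  private nextId = 1;

  // Registers the job, starts the work in the background, and returns at once
  // so the HTTP request that triggered it can respond with the job id.
  submit(work: () => Promise<T>): string {
    const id = `job-${this.nextId++}`;
    this.jobs.set(id, { status: "pending" });
    Promise.resolve()
      .then(work)
      .then(
        (result) => {
          this.jobs.set(id, { status: "done", result });
        },
        (err: unknown) => {
          this.jobs.set(id, { status: "failed", error: String(err) });
        },
      );
    return id;
  }

  // Cheap lookup that a polling endpoint could call.
  status(id: string): JobState<T> | undefined {
    return this.jobs.get(id);
  }
}
```

A client would call `submit` once, then poll `status(id)` until it is no longer `"pending"`. If the Cloud API already offers an equivalent (job id plus status endpoint or webhook), we would prefer to use that instead.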


6. Security & Safety Considerations (SDK)

Since we may process user-uploaded documents, security is an important consideration for us.

Could you clarify:

  • Does the SDK include built-in safeguards against malicious or malformed documents?
  • How are potentially dangerous elements handled (e.g., embedded macros, external references, scripts, malformed XML, zip bombs)?
  • Are there configurable limits for:
    • Memory usage
    • CPU usage
    • Maximum document structure depth
    • Maximum embedded object size
  • Are documents fully parsed in memory, or is streaming supported?
  • Do you provide any sandboxing recommendations for Linux deployments?

For example, if a user uploads a maliciously crafted DOCX file designed to exhaust resources or exploit the parser, what protections are in place within the SDK?
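As an illustration of the kind of pre-parse guard we would otherwise have to build ourselves: DOCX and XLSX files are ZIP containers, so the declared sizes in the ZIP central directory can be inspected before the file is handed to any parser. The sketch below is ours, with hypothetical function names and caller-chosen limits. It assumes a non-ZIP64 archive held fully in memory, and it trusts the declared sizes, which a hostile file can falsify, so it cannot replace limits enforced during actual decompression.

```typescript
interface ZipSizeSummary {
  entries: number;
  compressedBytes: number;
  declaredUncompressedBytes: number;
}

const EOCD_SIGNATURE = 0x06054b50; // End of Central Directory record
const CENTRAL_HEADER_SIGNATURE = 0x02014b50; // Central directory file header

// Sums the sizes declared in the central directory without inflating anything.
function summarizeZipSizes(bytes: Uint8Array): ZipSizeSummary {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // The EOCD record is 22 bytes plus an optional comment of up to 65535 bytes,
  // so scan backwards from the end for its signature.
  const lowest = Math.max(0, bytes.byteLength - 22 - 0xffff);
  let eocd = -1;
  for (let i = bytes.byteLength - 22; i >= lowest; i--) {
    if (view.getUint32(i, true) === EOCD_SIGNATURE) {
      eocd = i;
      break;
    }
  }
  if (eocd < 0) throw new Error("not a ZIP archive: EOCD record not found");

  const entries = view.getUint16(eocd + 10, true);
  let offset = view.getUint32(eocd + 16, true);
  let compressedBytes = 0;
  let declaredUncompressedBytes = 0;
  for (let n = 0; n < entries; n++) {
    if (
      offset + 46 > bytes.byteLength ||
      view.getUint32(offset, true) !== CENTRAL_HEADER_SIGNATURE
    ) {
      throw new Error("corrupt central directory");
    }
    compressedBytes += view.getUint32(offset + 20, true);
    declaredUncompressedBytes += view.getUint32(offset + 24, true);
    const nameLen = view.getUint16(offset + 28, true);
    const extraLen = view.getUint16(offset + 30, true);
    const commentLen = view.getUint16(offset + 32, true);
    offset += 46 + nameLen + extraLen + commentLen;
  }
  return { entries, compressedBytes, declaredUncompressedBytes };
}

// Rejects archives whose declared expansion exceeds the caller's limits.
function exceedsLimits(s: ZipSizeSummary, maxBytes: number, maxRatio: number): boolean {
  const ratio =
    s.compressedBytes === 0
      ? (s.declaredUncompressedBytes > 0 ? Infinity : 0)
      : s.declaredUncompressedBytes / s.compressedBytes;
  return s.declaredUncompressedBytes > maxBytes || ratio > maxRatio;
}
```

What we would like to understand is whether the SDK performs checks of this kind internally, and whether it also caps the actual decompressed size while reading.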


Thanks again — we’d be happy to schedule a technical call if that’s the fastest way to confirm these requirements.

Best regards,
Risto-Matti
Senior Software Engineer

@ristomattip

Thanks for the report. I have not confirmed this yet and can only confirm it after reproducing on our side. I need a bit more information to diagnose the performance issue in Aspose.Words when comparing and converting DOCX, XLSX, and PDF documents.

  • What is the typical size (in MB or page count) of the DOCX/XLSX/PDF files you need to compare and convert, and the maximum size you anticipate handling?
  • Which programming language/SDK would you use for a self‑hosted deployment (e.g., .NET, Java, Python), and what are the target Linux distribution and resource limits (CPU, RAM) for that environment?
  • Do you require the comparison service to run asynchronously (background job/queue) to avoid HTTP time‑outs, and if so, what maximum processing time do you expect per document?
  • Are there any specific security or compliance standards (e.g., GDPR, ISO 27001) that the solution must meet, or particular sandboxing constraints you plan to enforce?

A support engineer will review your answers and follow up in this thread.

Once I have the missing details, I’ll try to reproduce this. If I can confirm it, I’ll log it internally and follow up here.


Hello,

Thank you for the follow-up questions — please see our clarifications below.


1. Typical and Maximum File Sizes

During testing, I performed a comparison using two DOCX files of approximately 170 MB each.

In our real-world usage:

  • Typical file sizes may range from a few megabytes up to several hundred megabytes
  • We may occasionally handle documents around 200 MB
  • Larger documents are possible, and the higher the supported limits, the better for us

These are typically large regulatory-style documents with complex structure, embedded content, and tracked revisions.

Could you clarify what the practical maximum supported file sizes are for:

  • Aspose.Words Cloud comparison
  • Self-hosted SDK comparison

And whether there are recommended configurations for handling files in the ~150–250 MB range?


2. Self-Hosted Deployment Setup

We have not yet set up a dedicated comparison server, so we are flexible regarding:

  • Programming language (.NET, Java, etc.)
  • Linux distribution
  • CPU/RAM allocation

Our key question is:

Which setup would you recommend for optimal performance and stability when handling very large document comparisons?

For example:

  • Is .NET preferred over Java for memory efficiency?
  • Are there recommended minimum RAM/CPU specifications per comparison job?
  • Do you have reference architectures for high-load or large-document workloads?

3. Asynchronous Processing & Time Expectations

Yes, asynchronous/background processing would be preferred if needed to avoid HTTP timeouts.

For very large documents, processing may take significant time — that is acceptable to us, provided it is reliable and predictable.

That said:

  • Faster processing is obviously preferred
  • We would like guidance on realistic processing time expectations for ~150–200 MB DOCX comparisons
  • Is there a recommended maximum processing time per document?
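For context, this is how we would expect to enforce a per-document time budget around a comparison job on our side. It is a sketch under assumptions: `withTimeBudget` is a hypothetical helper of ours, and the budget value is whatever you would recommend. Note that the timer can only fire if the comparison does not block the Node.js event loop, so a CPU-bound SDK call would have to run in a worker or child process for this to be effective.

```typescript
// Hypothetical helper: races the work against a timer and signals cancellation.
async function withTimeBudget<T>(
  work: (signal: AbortSignal) => Promise<T>,
  budgetMs: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      // Cancellation is cooperative: the work must observe the signal itself.
      controller.abort();
      const err = new Error(`comparison exceeded its ${budgetMs} ms budget`);
      err.name = "TimeBudgetExceeded";
      reject(err);
    }, budgetMs);
  });

  try {
    return await Promise.race([work(controller.signal), timeout]);
  } finally {
    // Always clear the timer so a finished job does not keep the process alive.
    clearTimeout(timer);
  }
}
```

Usage would look like `await withTimeBudget((signal) => runComparison(a, b, signal), budgetMs)`, where `runComparison` is a placeholder for whatever call actually performs the comparison.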

4. Security & Compliance Requirements

Security and compliance are important for us.

We need clarity on:

  • Whether Aspose Cloud meets specific compliance standards (e.g., GDPR alignment, ISO 27001, SOC 2, etc.)
  • Data handling and retention policies
  • Whether documents are stored temporarily and for how long
  • Encryption in transit and at rest

Additionally, and this part was not fully addressed previously, for the self-hosted SDK:

  • What safeguards exist against malicious or malformed documents?
  • How are zip bombs, malformed XML, embedded macros, or resource exhaustion attacks handled?
  • Are there configurable resource limits (memory usage, recursion depth, object size)?
  • Do you provide sandboxing or hardening recommendations for Linux environments?

Since we will be processing user-uploaded documents, this is a critical decision factor for us.


5. Outstanding Clarification

In previous responses, some of these SDK-related security and performance questions were redirected without concrete answers.

Because we are evaluating both Cloud and on-prem options, it is important for us to receive clear technical guidance for both models before making an architectural decision.

We would appreciate more detailed clarification on:

  • Practical performance limits
  • Recommended production setup
  • Built-in security mechanisms in the SDK

Thank you again — we are looking forward to your detailed guidance so we can proceed with the evaluation.

Best regards,
Risto-Matti