Deleted page breaks on split document

LegalUp · June 9, 2025, 7:06pm

Hello,

We are using the Split Document API, and it deletes page breaks. In addition, section breaks sometimes end up in the wrong place in the text (moved from one clause to another).

We split documents like this:

const splitRequest = new SplitDocumentOnlineRequest({
  document: requestDocument,
  format: "docx",
  destFileName: "SplitDocument.docx",
  from: 1,
  to: 5,
});

The main problem, as we see it, is that the API returns each page as a separate document: we receive an array of documents, one per page. We see two possible fixes.

Ideally, there would be a flag that returns a single new document containing exactly the pages we asked for (via the from and to parameters), without any changes to styles, structure, etc. For example:

const splitRequest = new SplitDocumentOnlineRequest({
  document: requestDocument,
  format: "docx",
  destFileName: "SplitDocument.docx",
  from: 1,
  to: 5,
  oneDocument: true // <- new property
});

If the first option is not possible, could you preserve page breaks, section breaks, and other breaks in the Word documents the API returns? They should stay correct and keep working when the parts are merged back with AppendDocumentOnlineRequest and appendAllEntriesToOneSection: true.

Professionalize.Discourse · June 9, 2025, 7:06pm

@LegalUp

It seems you are experiencing issues with the split document API, specifically regarding the loss of page breaks and incorrect placement of section breaks when splitting documents. Here are a couple of suggestions to address your concerns:

Single Document Output: You mentioned the need for a flag that would allow the API to return a single document containing the specified pages without altering the styles or structure. While this feature is not currently available, it would be a valuable enhancement to propose to the development team. You could suggest adding a property like oneDocument: true to the request, which would ensure that the output maintains the original formatting and structure.
Preserving Breaks: If the first option is not feasible, you could request that the API retains page breaks, section breaks, and other formatting elements in the final documents. This would help ensure that when you use the AppendDocumentOnlineRequest with the appendAllEntriesToOneSection: true option, the breaks are correctly handled and positioned.

For now, I recommend submitting a feature request to the Aspose team regarding the single document output option, as well as the preservation of breaks during the split process. This feedback can help improve the API for all users.

If you have further questions or need assistance with specific code implementations, feel free to ask!

yaroslaw.ekimov · June 10, 2025, 4:53am

The behavior you want already exists, but in a different API method.

The Range API lets you save any specific range of a document to a new document. The range can be given by page, for example page0 to page1:end to save the first two pages, or by node ID, for example id.0.0.0 to id.0.0.2 to save just a couple of paragraphs.
You can try it first on the API reference page, and if the result suits your requirements, use the SDK to run the same scenario.
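The page identifiers in the reply suggest a simple mapping from a 1-based from/to pair (as used by SplitDocumentOnlineRequest) to Range API identifiers: page0 starts the first page and page1:end closes the second. A minimal sketch, assuming that pattern (zero-based pageN for the start, pageN:end for the end) holds for any page; pageRangeIdentifiers is a hypothetical helper, not part of the SDK:

```typescript
// Hypothetical helper, not part of the Aspose SDK.
// Converts a 1-based, inclusive page range into Range API identifiers,
// ASSUMING the "page0" / "page1:end" pattern described in the reply.
function pageRangeIdentifiers(from: number, to: number): { start: string; end: string } {
  if (!Number.isInteger(from) || !Number.isInteger(to) || from < 1 || to < from) {
    throw new RangeError(`invalid page range ${from}-${to}`);
  }
  return {
    start: `page${from - 1}`, // zero-based first page of the range
    end: `page${to - 1}:end`, // end of the zero-based last page
  };
}

// First two pages, matching the example in the reply.
console.log(pageRangeIdentifiers(1, 2)); // { start: 'page0', end: 'page1:end' }
```

The two strings would then be passed as the range start and end identifiers of the Range API request; check the SDK reference for the exact request class and field names before wiring this in.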

LegalUp · June 26, 2025, 12:44pm

@yaroslaw.ekimov Thank you for the link; it's exactly what we needed. But there is still one small thing I can't fix. We are trying to split a big document into smaller ones, using empty pages as delimiters. For this we use:

    // First, split document to TXT format to analyze empty pages
    const splitRequestTxtFileName = `${timestamp}_split.txt`;
    const splitRequestResult = await wordsApi.splitDocumentOnline(
      new SplitDocumentOnlineRequest({
        document: bufferToStream(requestDocumentBuffer),
        format: 'txt',
        destFileName: splitRequestTxtFileName,
      }),
    );

    const pagesArray = new Array(splitRequestResult.body.document.size);

    // Process each page to identify empty ones
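    for (const [name, doc] of splitRequestResult.body.document) {
      // getPageNumber (not shown) must return a zero-based page index here,
      // because the loop below converts indices with i + 1
      pagesArray[getPageNumber(name)] = doc.toString().trim();
    }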

    // Find ranges of non-empty pages (using actual page numbers, not array indices)
    const splitsArray = [];
    let splitStart = null;
    
    pagesArray.forEach((page, i) => {
      const actualPageNumber = i + 1; // Convert to 1-based page numbering
      
      if (page && page.length > 0) {
        // Non-empty page
        if (splitStart === null) {
          splitStart = actualPageNumber;
        }
      } else {
        // Empty page
        if (splitStart !== null) {
          // End the current range
          splitsArray.push([splitStart, actualPageNumber - 1]);
          splitStart = null;
        }
      }
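    });

    // Close the last range when the document does not end with an empty page
    if (splitStart !== null) {
      splitsArray.push([splitStart, pagesArray.length]);
    }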

But pages in TXT and DOCX are different: if we calculate the page ranges from the TXT output and then split the DOCX by these ranges, the resulting documents are wrong. Could you tell us how to find empty pages and split by them correctly? Maybe there is a flag that makes TXT and DOCX paginate the same way, or some way to find empty pages via the API?
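The snippet above relies on a getPageNumber helper that isn't shown, and on `new Array(size)`, which creates an array of holes. Array.prototype.forEach skips holes, so if getPageNumber returns 1-based numbers, the first slot is never visited and every actualPageNumber comes out shifted by one. A minimal sketch of both pieces, assuming the split output names end with a 1-based page number right before the extension (a hypothetical naming pattern; check the names the API actually returns):

```typescript
// Hypothetical getPageNumber. ASSUMPTION: names look like "..._page3.txt",
// i.e. a 1-based page number directly before the file extension.
function getPageNumber(name: string): number {
  const match = /(\d+)\.[^.]+$/.exec(name);
  if (match === null) {
    throw new Error(`no page number found in "${name}"`);
  }
  // Zero-based index, so that `i + 1` in the range loop is the 1-based page
  return Number(match[1]) - 1;
}

// Builds a dense array of trimmed page texts ordered by page number.
// Every slot starts as "", so forEach visits all pages, and an index
// outside the expected range throws instead of silently growing the array.
function pagesInOrder(docs: Map<string, { toString(): string }>): string[] {
  const pages: string[] = Array.from({ length: docs.size }, () => "");
  docs.forEach((doc, name) => {
    const index = getPageNumber(name);
    if (index < 0 || index >= pages.length) {
      throw new RangeError(`page index ${index} out of range for "${name}"`);
    }
    pages[index] = doc.toString().trim();
  });
  return pages;
}
```

A Buffer satisfies `{ toString(): string }`, so the map from the split response could be passed in directly. This only makes the TXT bookkeeping consistent; it does not address the TXT/DOCX pagination difference itself.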

yaroslaw.ekimov · June 27, 2025, 3:07am

I will check the implementation to see whether this is currently possible.

yaroslaw.ekimov · June 30, 2025, 7:42am

Pagination in the TXT and DOCX formats can differ, so we can't guarantee that page 1 in DOCX matches page 1 in TXT exactly.
I will link a new issue for analysis and possible implementation.

yaroslaw.ekimov · June 30, 2025, 7:42am

@LegalUp
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSCLOUD-3064

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

LegalUp · July 18, 2025, 7:42pm

Hello,

Could you let us know when this might be done, and also notify us once it is released?

LegalUp · July 18, 2025, 7:46pm

I would also ask you to check a bug. For example, we split a document to search for empty pages, and the split result has 10 pages. But when we then split it into start-5 and 6-end, the final documents have not 5 and 4 pages but 6 and 5, because the split documents contain less text for some reason. As a result, our cutting goes wrong: sometimes we cut less than needed, sometimes more.
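Until the issue is resolved, one way to catch a bad cut early is to compare each requested range with the page count of the part that actually came back. A minimal sketch; findPageCountMismatches is a hypothetical name, and how the actual page counts are obtained (for example, by splitting each returned part again and counting the files) is left open:

```typescript
// Hypothetical check, not part of the Aspose SDK.
interface Mismatch {
  range: [number, number];
  expected: number;
  actual: number;
}

// Compares each requested 1-based, inclusive page range with the page count
// of the document produced for it and reports the ranges that differ.
function findPageCountMismatches(
  ranges: [number, number][],
  actualCounts: number[],
): Mismatch[] {
  if (ranges.length !== actualCounts.length) {
    throw new Error("need exactly one actual page count per range");
  }
  const mismatches: Mismatch[] = [];
  ranges.forEach(([from, to], i) => {
    const expected = to - from + 1; // inclusive range, e.g. 1-5 -> 5 pages
    if (actualCounts[i] !== expected) {
      mismatches.push({ range: [from, to], expected, actual: actualCounts[i] });
    }
  });
  return mismatches;
}

// Second part came back one page too long: reports only range [6, 10].
console.log(findPageCountMismatches([[1, 5], [6, 10]], [5, 6]));
```

A non-empty result means the TXT-derived boundaries no longer line up with the DOCX pagination, so the cut can be flagged for review instead of being used as is.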

yaroslaw.ekimov · July 19, 2025, 2:53am

The issue is in the queue for analysis and implementation.