PHP: Search Text in MS Word and PDF documents

Hi tilal,

I hope you are doing well.

I am now working on searching for text in files (DOC/DOCX/PDF). How can I show the result as a PDF? Could you please let me know what I need to do to fetch the data and show it to the user?
Thanks in advance.
Below is my code:
$requestDocument = 'srch_1698842354.pdf';
$searchRequest = new SearchRequest(
    $requestDocument, $search_text, NULL, NULL, NULL, NULL, NULL);
$rt_data = $wordsApi->search($searchRequest);

Hi @tilal.ahmad

I’m getting the result below. How can I turn this data into a PDF to view it?

Aspose\Words\Model\SearchResponse Object
(
    [container:protected] => Array
    (
        [request_id] => Root=1-65424939-38408ba87eeb11a71ed2b23a
        [searching_pattern] => Ravi
        [search_results] => Aspose\Words\Model\SearchResultsCollection Object
        (
            [container:protected] => Array
            (
                [link] =>
                [results_list] => Array
                (
                    [0] => Aspose\Words\Model\SearchResult Object
                    (
                        [container:protected] => Array
                        (
                            [range_start] => Aspose\Words\Model\DocumentPosition Object
                            (
                                [container:protected] => Array
                                (
                                    [node] => Aspose\Words\Model\RunLink Object
                                    (
                                        [container:protected] => Array
                                        (
                                            [link] =>
                                            [node_id] => 0.0.0.0
                                            [text] => My Name is Raviraj Kumar.
                                        )
                                    )
                                    [offset] => 11
                                )
                            )
                            [range_end] => Aspose\Words\Model\DocumentPosition Object
                            (
                                [container:protected] => Array
                                (
                                    [node] => Aspose\Words\Model\RunLink Object
                                    (
                                        [container:protected] => Array
                                        (
                                            [link] =>
                                            [node_id] => 0.0.0.0
                                            [text] => My Name is Raviraj Kumar.
                                        )
                                    )
                                    [offset] => 15
                                )
                            )
                        )
                    )
                )
            )
        )
    )
)

@ravirajwebmail

I am afraid I did not understand your requirements. Please share your input document and expected output. You can create your output document with MS Word.

Sorry for the confusion. On the Search Text in PDF Documents | Look for Specific Text in PDF page, we select the files and enter the search phrase, and after clicking the “View Result” button it shows the PDF with the search phrases highlighted.
I would like to do the same on my end. The script above works, but I am not able to highlight the search phrase, and I do not know how to generate the PDF.

Steps:

  1. Open Search Text in PDF Documents | Look for Specific Text in PDF
  2. Select PDF, enter search phrase
  3. Click on “SEARCH” button
  4. Click on “View Result”

Please help me with the script: how can I show the result as in step 4? Thanks
I have created a staging site here: https://scarywords.ostwork.com/

Test_PDF_1.pdf (21.5 KB)

Screenshot 2023-11-02 104416.png (88.6 KB)

@ravirajwebmail

Thanks for the additional information. You can accomplish your requirements in two steps. First, find the rectangle coordinates of your text in the PDF document, and then annotate the text. Please check the following API methods to add the required annotations to a PDF document.

Step 1: Get the text position

GET /pdf/{name}/text Read document text.

curl -X GET "https://api.aspose.cloud/v3.0/pdf/02_pages.pdf/text?format=origami&splitRects=true&folder=Temp&LLX=0&LLY=0&URX=0&URY=0" \
  -H "accept: application/json" \
  -H "authorization: Bearer [Access_Token]" \
  -H "x-aspose-client: Containerize.Swagger"
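For a PHP script, the same Step 1 request URL can be assembled with http_build_query(). This is a minimal sketch: it only builds the URL, copying the base address and query parameters from the curl command above; sending it with the Authorization header is left to your HTTP client, and buildTextUrl() is a hypothetical helper name.

```php
<?php
// Sketch: build the Step 1 URL (GET /pdf/{name}/text) from a query array.
// Base URL and parameters are copied from the curl example; buildTextUrl()
// is a hypothetical helper, not part of the Aspose SDK.
function buildTextUrl(string $name, array $query): string
{
    $base = 'https://api.aspose.cloud/v3.0/pdf/' . rawurlencode($name) . '/text';
    // http_build_query() would encode a PHP bool as 1/0, so 'true' is passed as a string.
    return $base . '?' . http_build_query($query);
}

$url = buildTextUrl('02_pages.pdf', [
    'format'     => 'origami',
    'splitRects' => 'true',
    'folder'     => 'Temp',
    'LLX' => 0, 'LLY' => 0, 'URX' => 0, 'URY' => 0,
]);
echo $url, PHP_EOL;
```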

Step 2: Add Annotation to the specified rectangle location

POST /pdf/{name}/pages/{pageNumber}/annotations/highlight Add document page highlight annotations.

POST /pdf/{name}/pages/{pageNumber}/annotations/strikeout Add document page StrikeOut annotations.

POST /pdf/{name}/pages/{pageNumber}/annotations/redaction Add document page redaction annotations.

curl -X POST "https://api.aspose.cloud/v3.0/pdf/02_pages.pdf/pages/1/annotations/highlight?folder=Temp" \
  -H "accept: application/json" \
  -H "authorization: Bearer [JWT_Access_Token]" \
  -H "Content-Type: application/json" \
  -H "x-aspose-client: Containerize.Swagger" \
  -d '[ { "Color": { "A": 255, "R": 255, "G": 255, "B": 0 }, "Name": "highlight_annot", "Rect": { "LLX": 259.27580539703365, "LLY": 743.4707997894287, "URX": 332.26148873138425, "URY": 765.5148007965088 }, "PageIndex": 1, "ZIndex": 0, "Subject": "subject", "Title": "test", "RichText": "rich text" } ]'
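If you call the highlight endpoint from PHP, the request body can be generated with json_encode() instead of being written by hand. A minimal sketch, assuming the rectangle comes from the Step 1 response; the field names and values are copied from the curl example above, and highlightBody() is a hypothetical helper.

```php
<?php
// Sketch: build the JSON body for POST .../pages/{pageNumber}/annotations/highlight.
// Field names mirror the curl example; highlightBody() is hypothetical.
function highlightBody(array $rect, int $pageIndex): string
{
    $annotation = [
        'Color'     => ['A' => 255, 'R' => 255, 'G' => 255, 'B' => 0], // opaque yellow
        'Name'      => 'highlight_annot',
        'Rect'      => $rect, // LLX/LLY = lower-left corner, URX/URY = upper-right corner
        'PageIndex' => $pageIndex,
        'ZIndex'    => 0,
        'Subject'   => 'subject',
        'Title'     => 'test',
        'RichText'  => 'rich text',
    ];
    // The endpoint takes an array of annotations, hence the outer [ ].
    return json_encode([$annotation], JSON_THROW_ON_ERROR);
}

$body = highlightBody(
    ['LLX' => 259.27580539703365, 'LLY' => 743.4707997894287,
     'URX' => 332.26148873138425, 'URY' => 765.5148007965088],
    1
);
echo $body, PHP_EOL;
```

One annotation per found rectangle can be added to the same array, so all matches on a page are highlighted in one request.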

Thanks @tilal.ahmad for the above. How can we do the same for a Word document? Can you please help? Thanks

@ravirajwebmail

A Word document consists of sections, paragraphs, and runs (text). So to highlight text in a Word document, first you need to find the run in the paragraph and then update its HighlightColor property using the UpdateRunFont API method.
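Over plain REST (for example with cURL), the UpdateRunFont call is a request against the run's path with a Font body. The sketch below only builds that request; it rests on two assumptions you should verify in the Aspose.Words Cloud API reference: that the route is PUT /words/{name}/{paragraphPath}/runs/{index}/font, and that HighlightColor serializes as an XmlColor object with a "Web" field (the PascalCase naming matches the sanitized search output in this thread). runFontRequest() is a hypothetical helper, and the file name, paragraph path and index are illustrative.

```php
<?php
// Sketch under assumptions: route PUT /words/{name}/{paragraphPath}/runs/{index}/font
// and a Font body with HighlightColor => { Web: ... }. Verify both before use.
// runFontRequest() is hypothetical; it builds the request but does not send it.
function runFontRequest(string $name, string $paragraphPath, int $runIndex, string $webColor): array
{
    $path = '/words/' . rawurlencode($name) . '/' . $paragraphPath
          . '/runs/' . $runIndex . '/font';
    $body = json_encode(['HighlightColor' => ['Web' => $webColor]], JSON_THROW_ON_ERROR);
    return ['method' => 'PUT', 'path' => $path, 'body' => $body];
}

$req = runFontRequest('source.docx', 'sections/0/paragraphs/0', 0, 'Yellow');
print_r($req);
```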

Hi @tilal.ahmad,
I want to search for two keywords in a PDF in a single query. Below is my code, but it’s not working.
$name = 'Test_PDF_03th_Nov_2023.pdf';
$res = $pdfApi->getText($name, 0, 0, 0, 0, ['Ravi', 'Twist']);
Please tell me how I can search for two strings in a single query. Thanks

https://reference.aspose.cloud/pdf/#/Text/GetText
On this page it works, but the code above does not.
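A common way to match several keywords in one query is to fold them into a single regular-expression alternation such as Ravi|Twist. This is a sketch that only builds and locally checks the pattern; it assumes the search parameter of the endpoint you call is interpreted as a regular expression, which should be confirmed in that endpoint's reference. keywordPattern() is a hypothetical helper, and the sample sentence is illustrative.

```php
<?php
// Sketch: combine several keywords into one alternation pattern.
// Assumes the API treats the search string as a regex; keywordPattern() is hypothetical.
function keywordPattern(array $keywords): string
{
    // preg_quote() escapes regex metacharacters (., (, + ...) inside each keyword.
    $escaped = array_map(fn (string $k): string => preg_quote($k, '/'), $keywords);
    return implode('|', $escaped);
}

$pattern = keywordPattern(['Ravi', 'Twist']);
echo $pattern, PHP_EOL; // Ravi|Twist

// Local check against sample text; preg_* functions need delimiters, the API does not.
preg_match_all('/' . $pattern . '/', 'Ravi read Oliver Twist.', $m);
print_r($m[0]);
```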

@ravirajwebmail

We have noticed the GetText API method in Aspose.PDF Cloud SDK for PHP is not working as expected. So I logged a ticket (PDFCLOUD-3877) for further investigation and rectification. We will notify you as soon as the issue is resolved.

Hi @tilal.ahmad,
Hope you are doing well.
I need your input urgently.
I need to update only certain characters of a run. How can I do that?
Below are the code and the response data:

$requestDocument = "f2_1699257961.docx";
$searchRequest = new SearchRequest(
    $requestDocument, 'ravi', NULL, NULL, NULL, NULL, NULL);
$res = $wordsApi->search($searchRequest);
$res = ObjectSerializer::sanitizeForSerialization($res);
echo "<pre>";
print_r($res);

stdClass Object
(
    [SearchingPattern] => ravi
    [SearchResults] => stdClass Object
    (
        [ResultsList] => Array
        (
            [0] => stdClass Object
            (
                [RangeStart] => stdClass Object
                (
                    [Node] => stdClass Object
                    (
                        [Text] =>  vestibulum gravida, 
                        [NodeId] => 0.0.12.6
                    )
                    [Offset] => 13
                )
                [RangeEnd] => stdClass Object
                (
                    [Node] => stdClass Object
                    (
                        [Text] =>  vestibulum gravida, 
                        [NodeId] => 0.0.12.6
                    )
                    [Offset] => 17
                )
            )
            [1] => stdClass Object
            (
                [RangeStart] => stdClass Object
                (
                    [Node] => stdClass Object
                    (
                        [Text] => . In gravida et 
                        [NodeId] => 0.0.20.158
                    )
                    [Offset] => 6
                )
                [RangeEnd] => stdClass Object
                (
                    [Node] => stdClass Object
                    (
                        [Text] => . In gravida et 
                        [NodeId] => 0.0.20.158
                    )
                    [Offset] => 10
                )
            )
        )
    )
    [RequestId] => Root=1-654901f1-579db83c3d8f645043ef1c92
)
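The RangeStart/RangeEnd offsets in this response are enough to cut each run's text into the part before the match, the match itself, and the part after it, which is the information needed to recolor only the matched characters. A minimal sketch: property names are taken from the output above, and it assumes both positions point into the same run (true for both results shown) and that RangeEnd.Offset is exclusive. substr() counts bytes, so it is only reliable for ASCII text. listMatches() is a hypothetical helper.

```php
<?php
// Sketch: flatten the sanitized search response into before/match/after pieces.
// Assumes RangeStart and RangeEnd are in the same run; listMatches() is hypothetical.
function listMatches(object $response): array
{
    $matches = [];
    foreach ($response->SearchResults->ResultsList as $result) {
        $text  = $result->RangeStart->Node->Text;
        $start = $result->RangeStart->Offset;
        $end   = $result->RangeEnd->Offset; // exclusive end offset
        $matches[] = [
            'nodeId' => $result->RangeStart->Node->NodeId,
            'before' => substr($text, 0, $start),
            'match'  => substr($text, $start, $end - $start),
            'after'  => substr($text, $end),
        ];
    }
    return $matches;
}

// Stand-in for the response, rebuilt from result [1] above.
$json = '{"SearchResults":{"ResultsList":[{"RangeStart":{"Node":{"Text":". In gravida et ","NodeId":"0.0.20.158"},"Offset":6},"RangeEnd":{"Node":{"Text":". In gravida et ","NodeId":"0.0.20.158"},"Offset":10}}]}}';
$matches = listMatches(json_decode($json));
print_r($matches);
```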

I’m using the code below to update the runs, but it changes the color of the whole run, not only specific text such as “ravi”.

$reqXmlColorDto = new XmlColor(array("alpha" => 255, "web" => "red"));
$requestFontDto = new Font(array("bold" => true, "color" => $reqXmlColorDto));
//$runUpdateReqData = new UpdateRunFontRequest($requestDocument, $paragraph_path, $index, $requestFontDto, null, null, null, null, null, $desDocument, null, null);
foreach ($res->SearchResults->ResultsList as $key => $uData)
{
    $mD = $uData->RangeStart->Node->NodeId;
    $md = explode('.', $mD);

    $runUpdateReqData = new UpdateRunFontRequest($requestDocument, 'sections/'.$md[1].'/paragraphs/'.$md[2], $md[3], $requestFontDto, null, null, null, null, null, null, null, null);
    $wordsApi->updateRunFont($runUpdateReqData);
}

Please let me know how I can update only part of a run's text using cURL. Thanks

@ravirajwebmail

Please share your input, output and expected output documents with us. We will look into the issue and guide you accordingly.

Hi @tilal.ahmad,

Below is the sentence. I just want to color the text “vira” yellow in it. How can I achieve that? Thanks
Sentence: “My Name is Raviraj Kumar”
File type: docx

@ravirajwebmail

I have noticed the reported issue and logged a ticket (WORDSCLOUD-2529) for further investigation and rectification. We will keep you updated about the issue resolution progress in this forum thread.

Thanks @tilal.ahmad.
I hope I will get a resolution soon. Thanks


Hi @tilal.ahmad, I have noticed that issue WORDSCLOUD-2529 (Status: Closed) is closed. Where can I find the issue resolution link? Thanks

@ravirajwebmail

In reference to WORDSCLOUD-2529, please note that a run is the smallest unit of formatting in a Word document: all characters in a run share the same formatting. If you change the style of only some characters, Word splits them into a separate run and assigns the style to that run.

Please check the sample code; you can follow the workflow in your code accordingly. Hopefully, it will help you accomplish your requirements.

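 // Assumes `api` is an authenticated WordsApi instance (Aspose.Words Cloud SDK for .NET)
 // and the usings System.IO, System.Linq and System.Text.RegularExpressions.
 var localName = "source.docx";
 var remoteName = "source.docx";
 var resultName = "result.docx";

 var paragraphPath = "sections/0/paragraphs/0";
 var runPath = "0.0.0.0";
 var runIndex = 0;
 var phrase = "vira";

 // upload the file (the stream is disposed after the upload)
 using (var uploadStream = new FileStream(localName, FileMode.Open)) {
   await api.UploadFile(new UploadFileRequest {
     FileContent = uploadStream,
     Path = remoteName,
   });
 }

 // get the run in the paragraph
 var runResponse = await api.GetRun(new GetRunRequest {
   Name = remoteName,
   ParagraphPath = paragraphPath,
   Index = runIndex
 });

 // remove the run
 await api.DeleteRun(new DeleteRunRequest {
   Name = remoteName,
   ParagraphPath = paragraphPath,
   Index = runIndex
 });

 // split the run text around the phrase; the capture group keeps the phrase as its own piece.
 // Pieces are inserted in reverse order because each one goes before the same node.
 var splitted = Regex.Split(runResponse.Run.Text, $"({Regex.Escape(phrase)})")
   .Where(piece => piece.Length > 0)
   .Reverse();

 foreach (var strValue in splitted) {
   var insertRequest = new InsertRunRequest {
     Name = remoteName,
     ParagraphPath = paragraphPath,
     InsertBeforeNode = runPath,
     Run = new RunInsert {
       Text = strValue
     }
   };

   await api.InsertRun(insertRequest);

   // the piece just inserted is now at runIndex; restyle it only if it is the phrase
   if (strValue.Equals(phrase)) {
     await api.UpdateRunFont(new UpdateRunFontRequest {
       Name = remoteName,
       ParagraphPath = paragraphPath,
       Index = runIndex,
       // Color sets the text color; set HighlightColor instead for a marker-style highlight
       FontDto = new Font {
         Color = new XmlColor {
           Web = "Red"
         }
       }
     });
   }
 }

 // download the result (FileMode.Create overwrites any existing file completely)
 using (var fs = new FileStream(resultName, FileMode.Create)) {
   var downloaded = await api.DownloadFile(new DownloadFileRequest {
     Path = remoteName
   });
   await downloaded.CopyToAsync(fs);
 }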
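For a PHP project, the splitting step of this C# sample maps onto preg_split() with PREG_SPLIT_DELIM_CAPTURE, which, like Regex.Split with a capture group, keeps the phrase itself as a separate piece. This is a sketch of that one step only: each piece would then be sent to InsertRun, and only the piece equal to the phrase to UpdateRunFont. splitRunText() is a hypothetical helper.

```php
<?php
// Sketch: PHP counterpart of Regex.Split(text, "(phrase)") from the C# sample.
// splitRunText() is hypothetical; it does not call the Aspose API.
function splitRunText(string $text, string $phrase): array
{
    $pattern = '/(' . preg_quote($phrase, '/') . ')/';
    // DELIM_CAPTURE keeps the phrase; NO_EMPTY drops empty pieces at the edges.
    return preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
}

$pieces = splitRunText('My Name is Raviraj Kumar.', 'vira');
print_r($pieces);

foreach ($pieces as $piece) {
    // In the real workflow every piece becomes its own run; only "vira" gets a new font.
    echo ($piece === 'vira' ? '[restyle] ' : '[keep]    ') . $piece, PHP_EOL;
}
```

As in the C# sample, inserting the pieces before the same node requires iterating them in reverse (array_reverse($pieces)) so they end up in the original order.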

The issues you have found earlier (filed as WORDSCLOUD-2529) have been fixed in this update. This message was posted using Bugs notification tool by Ivanov_John