Streaming Filtered Text

You might want to receive filtered text in small chunks rather than all at once. Streaming the output has the following advantages:

  • You can process the output data in parallel with filtering the rest of the text, which can reduce the total time it takes to filter and process the text.
  • You can choose to stop filtering when the application has all the text it needs, which can save valuable resources. This approach is called partial filtering.

You can read filtered text from the stream returned by the Text() method of your Document object. File Content Extraction processes only as much of the document as it needs to produce the text you read, so you can stop reading before the end of the stream and avoid filtering the rest of the document.

using (Document myDoc = session.Open(inputFilePathOrStream))
{
    try
    {
        using (FileStream fs = File.Create(outputFilePath))
        using (StreamWriter sw = new StreamWriter(fs))
        using (DocumentStreamReader dsr = myDoc.Text())
        {
            while (!dsr.EndOfStream)
            {
                // Read filtered text one line at a time
                var filterOutput = dsr.ReadLine();
                
                Console.WriteLine(filterOutput);
                sw.WriteLine(filterOutput);
                
                // Here the program can check whether a stop condition
                // is met and break out of the loop to stop filtering
            }
        }
    }
    catch (KeyViewException e)
    {
        Console.WriteLine(e.Message);
    }
}
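
The stop condition mentioned in the loop comment could be sketched as follows. This is a minimal illustration rather than part of the File Content Extraction API: the PartialFilterSketch class and the ReadUntil helper are hypothetical names, and a standard .NET StringReader stands in for the filtered-text stream so that the sketch runs on its own. The example above only shows that DocumentStreamReader exposes EndOfStream and ReadLine; whether it can be passed directly as a TextReader is an assumption to verify against the API reference.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PartialFilterSketch
{
    // Hypothetical helper: reads lines from any TextReader until stopWhen
    // returns true for a line, or until the reader runs out of text.
    // The line that satisfies the condition is included in the result.
    public static List<string> ReadUntil(TextReader reader, Func<string, bool> stopWhen)
    {
        var lines = new List<string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
            if (stopWhen(line))
            {
                // Stop reading here; the remaining text is never requested
                break;
            }
        }
        return lines;
    }

    public static void Main()
    {
        // A StringReader stands in for the filtered-text stream (assumption)
        const string sample = "Title\nAbstract\nKeywords: a, b\nBody text\nAppendix";
        using (var reader = new StringReader(sample))
        {
            var lines = ReadUntil(
                reader,
                l => l.StartsWith("Keywords:", StringComparison.Ordinal));

            // Prints: Title | Abstract | Keywords: a, b
            Console.WriteLine(string.Join(" | ", lines));
        }
    }
}
```

In this sketch the caller stops asking for text once it has seen the line it needs. With a real filtered-text stream, the text after that point would not need to be filtered.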