Streaming Filtered Text
Sometimes, you might want to get filtered text in small chunks instead of all at once. This approach has the following advantages:
- You can process the output data in parallel with filtering the rest of the text. Overlapping the two steps can reduce the total time it takes to filter and process the text.
- You can choose to stop filtering when the application has all the text it needs, which can save valuable resources. This approach is called partial filtering.
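The first advantage can be pictured as a producer/consumer pipeline: one task reads chunks of filtered text as they become available, while another processes the chunks already read. The following is a minimal sketch of that pattern, not part of the File Content Extraction API. It assumes C# 9 or later (top-level statements) and uses a StringReader as a hypothetical stand-in for the filtered text stream; the queue capacity and the "processing" step (summing line lengths) are placeholders.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical stand-in for a stream of filtered text.
TextReader source = new StringReader("alpha\nbeta\ngamma");

// Bounded queue: the producer blocks if the consumer falls behind.
var queue = new BlockingCollection<string>(boundedCapacity: 16);
int processed = 0;

// Producer: read chunks (lines here) as they become available.
Task producer = Task.Run(() =>
{
    try
    {
        while (true)
        {
            var line = source.ReadLine();
            if (line == null) break;   // end of the text
            queue.Add(line);
        }
    }
    finally
    {
        queue.CompleteAdding();        // lets the consumer loop finish
    }
});

// Consumer: process each chunk while the producer keeps reading.
Task consumer = Task.Run(() =>
{
    foreach (string line in queue.GetConsumingEnumerable())
    {
        processed += line.Length;      // placeholder for real processing
    }
});

Task.WaitAll(producer, consumer);
Console.WriteLine(processed);          // prints 14
```

A bounded queue is used so that reading cannot run arbitrarily far ahead of processing, which keeps memory use predictable for large documents.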
You can read filtered text from a stream provided by the Text() method on your Document object. This allows you to stop processing before the end of the stream. File Content Extraction only processes as much of the document as it needs to output the text you read.
using (Document myDoc = session.Open(inputFilePathOrStream))
{
    try
    {
        using (FileStream fs = File.Create(outputFilePath))
        using (StreamWriter sw = new StreamWriter(fs))
        using (DocumentStreamReader dsr = myDoc.Text())
        {
            while (!dsr.EndOfStream)
            {
                // Read filtered text one line at a time
                var filterOutput = dsr.ReadLine();
                Console.WriteLine(filterOutput);
                sw.WriteLine(filterOutput);
                // Here the program can check to see if a certain
                // condition is met, and stop filtering
            }
        }
    }
    catch (KeyViewException e)
    {
        Console.WriteLine(e.Message);
    }
}
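The stop condition mentioned in the comments above could look something like the following. This is a minimal sketch of partial filtering, written against TextReader so that it runs on its own: ReadUntil and the marker string are hypothetical names introduced for illustration, and the StringReader stands in for the reader returned by Text(). Whether DocumentStreamReader can be passed where a TextReader is expected is an assumption here, suggested only by the ReadLine() and EndOfStream members used above. The sketch assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of the File Content Extraction API):
// reads lines until one contains the marker, then stops reading.
static List<string> ReadUntil(TextReader reader, string marker)
{
    var lines = new List<string>();
    while (true)
    {
        var line = reader.ReadLine();
        if (line == null) break;          // end of the text
        lines.Add(line);
        if (line.Contains(marker)) break; // condition met: stop early
    }
    return lines;
}

// Stand-in for the filtered text stream (assumption for illustration).
using (var reader = new StringReader("Title\nSummary: results\nBody 1\nBody 2"))
{
    List<string> head = ReadUntil(reader, "Summary");
    Console.WriteLine(head.Count);        // prints 2
}
```

Because the loop exits as soon as the condition is met, the remaining lines are never requested from the reader, which is what allows filtering to stop early.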