Streaming Filtered Text

You might want to retrieve filtered text in small chunks rather than all at once. Streaming the text in this way has the following advantages:

  • You can process text that has already been filtered while the rest of the document is still being filtered. Overlapping the two steps can reduce the total time it takes to filter and process the text.
  • You can stop filtering as soon as the application has all the text it needs, so that no resources are spent on text you never use. This approach is called partial filtering.
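
As a rough illustration of the first point, the following minimal sketch hands each line to a worker thread as soon as it arrives, so that handling one line overlaps with reading the next. It uses only the Python standard library. The names fake_filtered_lines and process_while_filtering are hypothetical and are not part of File Content Extraction; the generator simply stands in for a stream of filtered text.

```python
import queue
import threading

def fake_filtered_lines(count):
    """Hypothetical stand-in for a filtered text stream; yields one line at a time."""
    for i in range(count):
        yield f"line {i}\n"

def process_while_filtering(lines, handle, max_pending=8):
    """Pass lines to a worker thread as they arrive, so handling overlaps with reading."""
    pending = queue.Queue(maxsize=max_pending)  # bounded, so the reader cannot run far ahead
    done = object()  # sentinel that marks the end of the stream
    results = []
    errors = []

    def worker():
        while True:
            item = pending.get()
            if item is done:
                break
            try:
                results.append(handle(item))
            except Exception as exc:
                # Keep draining the queue so the reader never blocks on a dead worker
                errors.append(exc)

    thread = threading.Thread(target=worker)
    thread.start()
    try:
        for line in lines:
            pending.put(line)
    finally:
        pending.put(done)  # always release the worker, even if reading fails
        thread.join()
    if errors:
        raise errors[0]
    return results

lengths = process_while_filtering(fake_filtered_lines(5), len)
print(lengths)
```

A single worker and a first-in, first-out queue keep the results in stream order. In Python, a thread like this is most likely to help when the handling step waits on I/O, such as writing to a database or calling a remote service.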

You can read filtered text from a stream provided by the text property of your Document object. File Content Extraction processes only as much of the document as it needs to output the text you read, so you can stop before the end of the stream.

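# Assumes kv is the module that defines KeyViewError, and that session
# and file_path were created earlier.
with session.open(file_path) as doc:
    try:
        # Each iteration reads the next chunk of filtered text from the stream
        for line in doc.text:
            # Handle the line of text
            print(line)
    except kv.KeyViewError as e:
        print(e)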
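
Partial filtering follows the same pattern, with a break once you have enough text. The sketch below is a minimal illustration under stated assumptions: read_until and fake_stream are hypothetical names, and the generator stands in for doc.text so that the example runs without File Content Extraction. It records each line it generates, which shows that lines after the stopping point are never requested.

```python
def read_until(lines, max_chars):
    """Collect lines from a lazy stream until at least max_chars characters are gathered."""
    collected = []
    total = 0
    for line in lines:
        collected.append(line)
        total += len(line)
        if total >= max_chars:
            break  # stop pulling from the stream; the rest is never requested
    return "".join(collected)

def fake_stream(produced):
    """Hypothetical stand-in for doc.text; records each line it actually generates."""
    for i in range(1000):
        line = f"line {i}\n"
        produced.append(line)
        yield line

produced = []
preview = read_until(fake_stream(produced), max_chars=20)
print(len(produced))  # 3: only three of the 1000 possible lines were generated
```

With a real document, the same loop would presumably run over doc.text inside the with block shown above, so that the document is closed as soon as the loop exits.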