Streaming Filtered Text
Sometimes, you might want to get filtered text in small chunks instead of all at once. This approach has the following advantages:
- You can process the output data in parallel with filtering the rest of the text. Parallel processing can reduce the total time it takes to filter and process the text.
- You can choose to stop filtering when the application has all the text it needs, which can save valuable resources. This approach is called partial filtering.
You can read filtered text from a stream provided by the text property on your Document object. This allows you to stop processing before the end of the stream. File Content Extraction only processes as much of the document as it needs to output the text you read.
with session.open(file_path) as doc:
    try:
        for line in doc.text:
            # Handle the line of text
            print(line)
    except kv.KeyViewError as e:
        print(e)
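To illustrate partial filtering, the following is a minimal sketch of stopping early once the application has enough text. It does not call the SDK: fake_text_stream is a hypothetical stand-in for doc.text, and read_until and max_chars are illustrative names, not part of the API. The sketch assumes only that the stream yields lines of text when iterated, as in the example above.

```python
from typing import Iterable, Iterator, List


def read_until(lines: Iterable[str], max_chars: int) -> List[str]:
    """Collect lines until at least max_chars characters have been read."""
    collected: List[str] = []
    total = 0
    for line in lines:
        collected.append(line)
        total += len(line)
        if total >= max_chars:
            # Stop here: the rest of the stream is never requested.
            break
    return collected


def fake_text_stream(counter: List[int]) -> Iterator[str]:
    """Hypothetical stand-in for doc.text; counts the lines it produces."""
    for i in range(1000):
        counter[0] += 1
        yield f"line {i}\n"


produced = [0]
first_lines = read_until(fake_text_stream(produced), max_chars=20)
print(len(first_lines), produced[0])
```

In this sketch the stand-in stream produces only as many lines as the loop consumes, which mirrors the behavior described above. With a real document, you would pass doc.text in place of the stand-in and keep the KeyViewError handling around the call.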