Package org.apache.poi.extractor
Class POIOLE2TextExtractor
java.lang.Object
org.apache.poi.extractor.POITextExtractor
org.apache.poi.extractor.POIOLE2TextExtractor
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
EventBasedExcelExtractor, ExcelExtractor, HPSFPropertiesExtractor, OutlookTextExtactor, PowerPointExtractor, PublisherTextExtractor, VisioTextExtractor, Word6Extractor, WordExtractor
Common parent for OLE2-based text extractors of POI documents, such as .doc and .xls files.
You will typically find the implementation of a given format's text extractor under org.apache.poi.[format].extractor.
-
Field Summary
Fields
document: The POIDocument that's open
-
Constructor Summary
Constructors
protected POIOLE2TextExtractor(POIOLE2TextExtractor otherExtractor): Creates a new text extractor, using the same document as another text extractor.
POIOLE2TextExtractor(POIDocument document): Creates a new text extractor for the given document.
-
Method Summary
Methods
getDocSummaryInformation(): Returns the document information metadata for the document.
getDocument(): Returns the underlying POIDocument.
getMetadataTextExtractor(): Returns an HPSF-powered text extractor for the document properties metadata, such as title and author.
getRoot(): Returns the underlying DirectoryEntry of this document.
getSummaryInformation(): Returns the summary information metadata for the document.
Methods inherited from class org.apache.poi.extractor.POITextExtractor
close, getText, setFilesystem
-
Field Details
-
document
The POIDocument that's open
-
-
Constructor Details
-
POIOLE2TextExtractor
Creates a new text extractor for the given document.
- Parameters:
document
- The POIDocument to use in this extractor.
-
POIOLE2TextExtractor
Creates a new text extractor, using the same document as another text extractor. Normally only used by properties extractors.
- Parameters:
otherExtractor
- the extractor whose document is to be used
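The relationship between the two constructors can be sketched without the POI library. The example below is a minimal illustration, not POI code: DemoDocument, DemoExtractor, DemoBodyExtractor and DemoPropertiesExtractor are hypothetical stand-ins invented here. They mimic the shape described above, where one constructor wraps a document directly and a protected one reuses the document already held by another extractor, which is how a properties extractor would typically be built.

```java
import java.io.Closeable;

/** Stand-in sketch; none of these types are real POI classes. */
final class SharedDocumentSketch {

    /** Hypothetical stand-in for an opened document. */
    static final class DemoDocument {
        final String title;
        final String body;

        DemoDocument(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    /** Plays the role of the common parent: it holds the document reference. */
    abstract static class DemoExtractor implements Closeable {
        protected final DemoDocument document;

        /** Wraps a document directly. */
        DemoExtractor(DemoDocument document) {
            this.document = document;
        }

        /** Reuses the document already held by another extractor. */
        protected DemoExtractor(DemoExtractor other) {
            this.document = other.document;
        }

        abstract String getText();

        DemoDocument getDocument() {
            return document;
        }

        /** Builds a properties extractor over the same document. */
        DemoExtractor getMetadataTextExtractor() {
            return new DemoPropertiesExtractor(this);
        }

        @Override
        public void close() {
            // Nothing to release in this sketch.
        }
    }

    /** Extracts the body text. */
    static final class DemoBodyExtractor extends DemoExtractor {
        DemoBodyExtractor(DemoDocument document) {
            super(document);
        }

        @Override
        String getText() {
            return document.body;
        }
    }

    /** Extracts metadata; created from another extractor, not from a document. */
    static final class DemoPropertiesExtractor extends DemoExtractor {
        DemoPropertiesExtractor(DemoExtractor other) {
            super(other);
        }

        @Override
        String getText() {
            return "Title = " + document.title;
        }
    }

    public static void main(String[] args) {
        DemoDocument doc = new DemoDocument("Q3 report", "Revenue grew.");
        try (DemoExtractor text = new DemoBodyExtractor(doc);
             DemoExtractor meta = text.getMetadataTextExtractor()) {
            System.out.println(text.getText());
            System.out.println(meta.getText());
            // Both extractors point at the same document instance.
            System.out.println(meta.getDocument() == text.getDocument());
        }
    }
}
```

Sharing the reference instead of re-opening the document means the metadata extractor sees exactly the state the text extractor sees. Whether the real classes manage ownership and closing the same way is not stated on this page.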
-
-
Method Details
-
getDocSummaryInformation
Returns the document information metadata for the document.
- Returns:
- The Document Summary Information or null if it could not be read for this document.
-
getSummaryInformation
Returns the summary information metadata for the document.
- Returns:
- The Summary information for the document or null if it could not be read for this document.
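Because both metadata getters may return null, callers generally need a guard before reading individual properties. The following is a minimal sketch of one such guard. DemoSummary and titleOrDefault are hypothetical names introduced for illustration and are not part of POI. The same Optional-based pattern should apply to whatever object the real getter returns.

```java
import java.util.Optional;

/** Stand-in sketch; DemoSummary is not a POI class. */
final class NullableSummarySketch {

    /** Hypothetical stand-in for a summary-information object. */
    static final class DemoSummary {
        private final String title;

        DemoSummary(String title) {
            this.title = title;
        }

        String getTitle() {
            return title;
        }
    }

    /** Returns the title, or the fallback when the summary or its title is missing. */
    static String titleOrDefault(DemoSummary summary, String fallback) {
        return Optional.ofNullable(summary)
                .map(DemoSummary::getTitle)
                .orElse(fallback);
    }

    public static void main(String[] args) {
        // A summary that was read successfully.
        System.out.println(titleOrDefault(new DemoSummary("Q3 report"), "(untitled)"));
        // A summary that could not be read, represented as null.
        System.out.println(titleOrDefault(null, "(untitled)"));
    }
}
```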
-
getMetadataTextExtractor
Returns an HPSF-powered text extractor for the document properties metadata, such as title and author.
- Specified by:
getMetadataTextExtractor in class POITextExtractor
- Returns:
- an instance of POITextExtractor that can extract metadata.
-
getRoot
Returns the underlying DirectoryEntry of this document.
- Returns:
- the DirectoryEntry that is associated with the POIDocument of this extractor.
-
getDocument
Returns the underlying POIDocument.
- Specified by:
getDocument in class POITextExtractor
- Returns:
- the underlying POIDocument
-