You don't want to implement an IFilter to parse an Office 2007 docx. You want to use the IFilter objects Microsoft has already written, so that you can read the contents of a docx file.
Then you use standard IFilter mechanisms to parse the file contents:
procedure TForm1.ProcessFile(filename: string);
var
  filter: IFilter;
  hr: HRESULT;
  chunk: PSTAT_CHUNK;
  // attr: FULLPROPSPEC;
  flags: ULONG;
  c: Cardinal;
  buffer: WideString;
begin
  Log('Processing "'+filename+'"');

  Log('Calling LoadIFilter');
  filter := LoadIFilter(filename);
  if filter = nil then
  begin
    Log('filter is null; leaving');
    Exit;
  end;

  try
    Log('Calling filter.Init(IFILTER_INIT_INDEXING_ONLY)');
    hr := filter.Init(IFILTER_INIT_INDEXING_ONLY, 0, nil, flags);
    OleCheck(hr);
    Log('Init returned successfully, looking for chunks...');

    while True do
    begin
      New(chunk);
      try
        // Any failure code ends the loop, including the normal end-of-chunks code
        hr := filter.GetChunk(chunk);
        if Failed(hr) then
        begin
          Log('No more chunks: '+IntToHex(hr, 8)+' ('+GetChunkHResultToStr(hr)+')');
          Break;
        end;

        Log('== Got chunk. ChunkType='+IntToStr(chunk.flags)+' (1=text, 2=value) ==');

        if (chunk.flags and CHUNK_TEXT) = CHUNK_TEXT then
        begin
          // On input c is the buffer size in characters; on output, the count actually read
          c := 2048;
          SetLength(buffer, c);
          hr := filter.GetText(c, PWideChar(buffer));
          if Succeeded(hr) then
          begin
            Log('=== Got text ===');
            SetLength(buffer, c);
            Log(buffer);

            // Keep reading until GetText fails, which means the chunk has no more text
            while Succeeded(hr) do
            begin
              c := 2048;
              SetLength(buffer, c);
              hr := filter.GetText(c, PWideChar(buffer));
              if Succeeded(hr) then
              begin
                SetLength(buffer, c);
                Log('==== Really long chunk, here''s the next 2048 characters ====');
                Log(buffer);
              end;
            end;
          end
          else
          begin
            Log('Could not get text from chunk: '+IntToHex(hr, 8)+' ('+GetChunkHResultToStr(hr)+')');
            Log(' It might be a "Value" chunk, meaning i should call filter.GetValue rather than filter.GetText. But i''m too lazy');
          end;
        end
        else if (chunk.flags and CHUNK_VALUE) = CHUNK_VALUE then
        begin
          Log('This is a "VALUE" chunk. i''m not going to read anything out of it cause it''s too hard :(');
        end
        else
          Log('Unknown chunk type');
      finally
        Dispose(chunk);
      end;
    end; //end while true getting chunks
  finally
    filter := nil;
  end;
end;
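The procedure above calls GetChunkHResultToStr, which is not shown. A minimal sketch of what such a helper might look like follows; the function body and the fallback string are my own assumptions, and the FILTER_* values are the ones I know from the Windows SDK header filterr.h, so check them against your SDK copy before relying on them.

```pascal
// Hypothetical helper: the original answer references GetChunkHResultToStr
// but does not include it. The values below are the FILTER_* codes as
// declared in the Windows SDK header filterr.h (verify against your SDK).
const
  FILTER_E_END_OF_CHUNKS         = HRESULT($80041700);
  FILTER_E_NO_MORE_TEXT          = HRESULT($80041701);
  FILTER_E_NO_MORE_VALUES        = HRESULT($80041702);
  FILTER_E_ACCESS                = HRESULT($80041703);
  FILTER_E_NO_TEXT               = HRESULT($80041705);
  FILTER_E_NO_VALUES             = HRESULT($80041706);
  FILTER_E_EMBEDDING_UNAVAILABLE = HRESULT($80041707);
  FILTER_E_LINK_UNAVAILABLE      = HRESULT($80041708);
  FILTER_S_LAST_TEXT             = HRESULT($00041709);
  FILTER_S_LAST_VALUES           = HRESULT($0004170A);
  FILTER_E_PASSWORD              = HRESULT($8004170B);
  FILTER_E_UNKNOWNFORMAT         = HRESULT($8004170C);

// Maps an HRESULT returned by IFilter.GetChunk / GetText to its symbolic name.
// Anything not in the list above yields a generic fallback string.
function GetChunkHResultToStr(hr: HRESULT): string;
begin
  if hr = FILTER_E_END_OF_CHUNKS then Result := 'FILTER_E_END_OF_CHUNKS'
  else if hr = FILTER_E_NO_MORE_TEXT then Result := 'FILTER_E_NO_MORE_TEXT'
  else if hr = FILTER_E_NO_MORE_VALUES then Result := 'FILTER_E_NO_MORE_VALUES'
  else if hr = FILTER_E_ACCESS then Result := 'FILTER_E_ACCESS'
  else if hr = FILTER_E_NO_TEXT then Result := 'FILTER_E_NO_TEXT'
  else if hr = FILTER_E_NO_VALUES then Result := 'FILTER_E_NO_VALUES'
  else if hr = FILTER_E_EMBEDDING_UNAVAILABLE then Result := 'FILTER_E_EMBEDDING_UNAVAILABLE'
  else if hr = FILTER_E_LINK_UNAVAILABLE then Result := 'FILTER_E_LINK_UNAVAILABLE'
  else if hr = FILTER_S_LAST_TEXT then Result := 'FILTER_S_LAST_TEXT'
  else if hr = FILTER_S_LAST_VALUES then Result := 'FILTER_S_LAST_VALUES'
  else if hr = FILTER_E_PASSWORD then Result := 'FILTER_E_PASSWORD'
  else if hr = FILTER_E_UNKNOWNFORMAT then Result := 'FILTER_E_UNKNOWNFORMAT'
  else Result := 'unrecognized HRESULT';
end;
```

On a clean run the "No more chunks" log line should show FILTER_E_END_OF_CHUNKS, which is the normal end-of-document code. Any other name there suggests the loop stopped early for a different reason.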
The LoadIFilter call above is a thin wrapper around the function Windows already provides to load the IFilter for a specified filename:
function TForm1.LoadIFilter(const filename: WideString): IFilter;
var
  hr: HRESULT;
  unk: IUnknown;
begin
  hr := ntQuery.LoadIFilter(PWideChar(filename), nil, unk);
  OleCheck(hr);

  Result := unk as IFilter;
end;
If you search the old Borland/CodeGear newsgroups, you may find references to an IFilter implementation by "Soluciones Vulcano", which in turn points to develop.shorterpath.com, a site that still seems to exist. Beyond that, I have never seen any other implementation component, and I have not yet managed to look at this one myself.