magnolia-lucene indexer application
It's a standalone application that can be configured to index pages from different Magnolia instances.
config/indexer.xml
<Repository config="./config/repository_config_test.xml" id="website" logName="site-1 indexer" interval="3600000" indexDir="./index" domain="http://localhost:8082">
</Repository>
config is the path to a repository configuration file, in the same format as the one in a default Magnolia instance (WEB-INF/config/config.xml)
id is the ID of a repository as defined in that configuration file
logName is the name used by log4j (any string)
interval is the indexing interval in milliseconds (3600000 in the example is one hour)
indexDir is the root directory in which the Lucene index will be created
domain is the base URL of the site that will be indexed recursively
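To make the attributes above concrete, here is a minimal sketch that reads a single Repository element with the JDK's DOM parser and converts interval into a duration. The class and method names (IndexerConfigSketch, parseRoot, interval) are hypothetical and not part of the indexer. Parsing the element on its own is also an assumption, since the layout of the rest of indexer.xml is not described here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper, not part of the indexer: reads one <Repository> element.
class IndexerConfigSketch {

    /** Parses an XML snippet and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** Interprets the interval attribute as a number of milliseconds. */
    static Duration interval(Element repository) {
        return Duration.ofMillis(Long.parseLong(repository.getAttribute("interval")));
    }

    public static void main(String[] args) {
        // Same attributes as the example element shown above.
        String xml = "<Repository config=\"./config/repository_config_test.xml\" id=\"website\""
                + " logName=\"site-1 indexer\" interval=\"3600000\" indexDir=\"./index\""
                + " domain=\"http://localhost:8082\"></Repository>";
        Element repository = parseRoot(xml);
        System.out.println("repository id: " + repository.getAttribute("id"));
        System.out.println("index dir:     " + repository.getAttribute("indexDir"));
        System.out.println("interval:      " + interval(repository).toMinutes() + " minutes");
    }
}
```

Converting the value to a Duration makes it easy to check that a configured interval is what you intended before starting the indexer.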
How it works
It reads the repository as configured and streams all pages one by one, using plain HTTP calls to the specified domain.
Each indexed document has two fields, handle and data: handle is the same as the Magnolia path, and data holds the HTML (or whatever else the server returns).
You can program your templates so that, when a request comes from this indexer, they return plain text or a page without navigation, etc.
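The flow described above can be sketched roughly as follows. This illustrates the idea only and is not the indexer's actual code. The class name, the ".html" extension, the sample handle, the UTF-8 decoding and the User-Agent value are all assumptions. In particular, how the indexer identifies itself to your templates is not documented here, so the header is a placeholder for whatever marker you choose to check.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of "fetch each page over HTTP, keep handle and data".
class PageFetchSketch {

    // Assumed marker a template could test for; not a documented indexer value.
    static final String INDEXER_USER_AGENT = "magnolia-lucene-indexer";

    /** Joins the configured domain and a Magnolia handle into a page URL. */
    static String pageUrl(String domain, String handle, String extension) {
        String base = domain.endsWith("/") ? domain.substring(0, domain.length() - 1) : domain;
        String path = handle.startsWith("/") ? handle : "/" + handle;
        return base + path + extension;
    }

    /** Downloads the response body as text (UTF-8 is assumed here). */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("User-Agent", INDEXER_USER_AGENT);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    /** Builds the two fields stored per document: handle and data. */
    static Map<String, String> toDocument(String handle, String body) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("handle", handle);
        doc.put("data", body);
        return doc;
    }

    public static void main(String[] args) {
        String handle = "/products/overview"; // hypothetical page handle
        String url = pageUrl("http://localhost:8082", handle, ".html");
        System.out.println("would fetch: " + url);
        // A real run would call fetch(url); a fixed body keeps this demo offline.
        Map<String, String> doc = toDocument(handle, "<html>...</html>");
        System.out.println("fields: " + doc.keySet());
    }
}
```

On the template side, checking the same marker (here the assumed User-Agent value) would be one way to decide whether to return a stripped-down page to the indexer.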
How to use from your search template
The index created is a plain Lucene index: use the Lucene API in your template and point it at the same index directory (indexDir).
In its previous incarnation on JspWiki, this page was last edited on Feb 9, 2007 10:25:51 AM