Hounder features
Designed for flexibility
- Works as a complete solution (from crawling the web to providing a search interface).
- Works as a complement to existing solutions (feeding a content system or indexing a data stream).
- Crawls specific sites in depth, or the web at large, finding relevant pages and classifying them automatically.
- Extensible: add custom modules to the indexer, crawler and searcher.
Installation
- GUI and command-line installer.
- Configuration wizard for the most common use cases.
Integration
- Searcher supports XML-RPC, RMI and OpenSearch.
- Indexer supports XML-RPC and RMI.
Indexing
- Document processing pipeline built from modules, which are implemented as plugins: create and add your own.
- Existing modules include spam filtering, adding certain fields, logging, etc.
- Manage when and how index updates are submitted to the searcher.
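
To illustrate the module pipeline idea, here is a minimal sketch in Python. It is not Hounder's plugin API: the names Document, drop_spam, add_length_field and run_pipeline are hypothetical, and the blocked-word list is made up for the example. The sketch assumes each module receives a document and either returns a (possibly modified) document or discards it.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical document type: a plain mapping from field name to value.
Document = Dict[str, object]
# A module takes a document and returns it (possibly changed) or None to drop it.
Module = Callable[[Document], Optional[Document]]


def drop_spam(doc: Document) -> Optional[Document]:
    """Filtering module: discard documents containing a blocked word."""
    blocked = {"casino", "pills"}  # illustrative list, not from Hounder
    words = set(str(doc.get("text", "")).lower().split())
    return None if words & blocked else doc


def add_length_field(doc: Document) -> Optional[Document]:
    """Enrichment module: add a 'length' field with the text length."""
    enriched = dict(doc)
    enriched["length"] = len(str(doc.get("text", "")))
    return enriched


def run_pipeline(doc: Document, modules: Iterable[Module]) -> Optional[Document]:
    """Pass the document through each module in order; stop if one drops it."""
    current: Optional[Document] = doc
    for module in modules:
        current = module(current)
        if current is None:
            return None
    return current
```

Adding functionality then amounts to writing one more function and appending it to the list passed to run_pipeline.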
Crawler
- Bayesian filter to determine if a page is of interest or to which category it belongs.
- Politeness (avoids overloading the sites it crawls).
- Detects changes in page content and adapts the recrawl frequency accordingly.
- Document processing pipeline built from modules, which are implemented as plugins: create and add your own.
- Existing modules include whitelisting, blacklisting, boosting, classifying, caching, indexing, etc.
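
As an illustration of how a Bayesian filter can decide whether a page is of interest, or which category it belongs to, the following is a minimal naive Bayes sketch in Python. It is a generic textbook version with add-one smoothing, not Hounder's implementation; the class name NaiveBayesFilter, its methods and the whitespace tokenization are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesFilter:
    """Multinomial naive Bayes over whitespace-separated words (illustrative)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> training documents
        self.vocabulary = set()                  # every word seen in training

    def train(self, text, category):
        """Record one labeled page."""
        words = text.lower().split()
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable category, or None if nothing was trained."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            # Log prior plus log likelihood of each word, with add-one smoothing
            # so that unseen words do not zero out the whole probability.
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log(
                    (counts[word] + 1) / (total_words + len(self.vocabulary))
                )
            if score > best_score:
                best_category, best_score = category, score
        return best_category
```

With two categories such as "relevant" and "irrelevant" this acts as an interest filter; with more categories the same code acts as a classifier.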
Search results
- Snippet generation.
- Results grouping.
- Boosting.
Queries
- Define fields for your documents and search on the fields you want.
- Operators: Or, And, Not.
- Phrase recognition.
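
The query features above (fields, the Or, And and Not operators, and phrases) can be sketched with a small inverted index in Python. This is a conceptual illustration only, not Hounder's query syntax or internals; the functions build_index, term and phrase and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each (field, word) pair to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, text in fields.items():
            for word in text.lower().split():
                index[(field, word)].add(doc_id)
    return index


def term(index, field, word):
    """Ids of documents whose given field contains the word."""
    return index.get((field, word.lower()), set())


def phrase(docs, field, words):
    """Ids of documents whose field contains the words consecutively."""
    wanted = words.lower().split()
    n = len(wanted)
    hits = set()
    for doc_id, fields in docs.items():
        tokens = fields.get(field, "").lower().split()
        if any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)):
            hits.add(doc_id)
    return hits


# Made-up sample documents with two fields each.
DOCS = {
    1: {"title": "open source crawler", "body": "crawls the web politely"},
    2: {"title": "open source indexer", "body": "indexes a data stream"},
    3: {"title": "source of open data", "body": "a crawler for catalogs"},
}
INDEX = build_index(DOCS)

# And: intersection. Or: union. Not: set difference.
and_hits = term(INDEX, "title", "open") & term(INDEX, "title", "crawler")
or_hits = term(INDEX, "title", "crawler") | term(INDEX, "title", "indexer")
not_hits = term(INDEX, "title", "open") - term(INDEX, "title", "indexer")
phrase_hits = phrase(DOCS, "title", "open source")
```

Document 3 contains both "open" and "source" in its title but not next to each other, so it matches the individual terms and not the phrase.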
Performance
- Results caching.
- Smart query execution and queue size management.
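
To show what results caching buys, here is a minimal least-recently-used cache sketch in Python. The eviction policy, the class name ResultsCache and the search_fn parameter are assumptions for illustration; the source does not say which caching strategy Hounder uses.

```python
from collections import OrderedDict


class ResultsCache:
    """Wrap a search function and remember results for recent queries."""

    def __init__(self, search_fn, capacity=128):
        self._search_fn = search_fn      # hypothetical backend: query -> results
        self._capacity = capacity
        self._entries = OrderedDict()    # query -> results, oldest first
        self.hits = 0
        self.misses = 0

    def search(self, query):
        if query in self._entries:
            # Cache hit: mark the query as most recently used.
            self._entries.move_to_end(query)
            self.hits += 1
            return self._entries[query]
        # Cache miss: run the real search and store the result.
        self.misses += 1
        results = self._search_fn(query)
        self._entries[query] = results
        if len(self._entries) > self._capacity:
            # Evict the least recently used query.
            self._entries.popitem(last=False)
        return results
```

Repeated queries are answered from memory without reaching the backend, which is where the performance gain comes from. A real deployment would also need to invalidate entries when the index is updated.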
Monitoring and controlling
- Monitor and control all nodes of Hounder with the clustering web application.