I. Requirements
Ranked Retrieval: apply BM25 to rank the retrieved search results.
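As a rough illustration of the ranking requirement, here is a minimal in-memory BM25 scorer in Java. It is a sketch under stated assumptions, not a reference implementation: documents are assumed to be already tokenised into lists of terms, k1 = 1.2 and b = 0.75 are conventional defaults rather than values given in the requirements, and the IDF uses the variant log(1 + (N - n + 0.5) / (n + 0.5)), which keeps term weights non-negative. The class name Bm25Scorer and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal BM25 scorer over an in-memory, pre-tokenised collection. */
class Bm25Scorer {
    private final double k1;   // term-frequency saturation (assumed default 1.2)
    private final double b;    // length-normalisation strength (assumed default 0.75)
    private final List<List<String>> docs;
    private final Map<String, Integer> docFreq = new HashMap<>(); // n(t): #docs containing t
    private final double avgDocLen;

    Bm25Scorer(List<List<String>> docs, double k1, double b) {
        this.docs = docs;
        this.k1 = k1;
        this.b = b;
        long totalLen = 0;
        for (List<String> doc : docs) {
            totalLen += doc.size();
            // count each term at most once per document
            for (String term : new HashSet<>(doc)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        this.avgDocLen = docs.isEmpty() ? 0.0 : (double) totalLen / docs.size();
    }

    /** IDF variant log(1 + (N - n + 0.5) / (n + 0.5)); never negative. */
    double idf(String term) {
        int n = docFreq.getOrDefault(term, 0);
        int bigN = docs.size();
        return Math.log(1.0 + (bigN - n + 0.5) / (n + 0.5));
    }

    /** BM25 score of one document for a bag-of-words query. */
    double score(List<String> query, int docId) {
        List<String> doc = docs.get(docId);
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1, Integer::sum);
        }
        double lenRatio = avgDocLen > 0 ? doc.size() / avgDocLen : 1.0;
        double norm = k1 * (1 - b + b * lenRatio); // document-length normalisation
        double s = 0.0;
        for (String term : query) {
            int f = tf.getOrDefault(term, 0);
            if (f > 0) {
                s += idf(term) * f * (k1 + 1) / (f + norm);
            }
        }
        return s;
    }

    /** Document ids ordered by descending BM25 score (stable for ties). */
    List<Integer> rank(List<String> query) {
        double[] scores = new double[docs.size()];
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            scores[i] = score(query, i);
            ids.add(i);
        }
        ids.sort((x, y) -> Double.compare(scores[y], scores[x]));
        return ids;
    }
}
```

Scoring every document is fine for a small collection; a real system would iterate over the inverted-index postings of the query terms instead, so that only documents containing at least one query term are touched.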
Advanced features:
Implementing any one of the following is sufficient:
1. Automatic query expansion (pseudo-relevance feedback): Extend your ranked
retrieval approach with automatic query expansion (for example, the Okapi Term
Selection Value approach, or another approach).
Such an approach should include at least two parameters that can be set at
runtime: R, the number of top-ranked documents that are assumed to be relevant,
and E, the number of terms that should be appended to the original query.
2. Manual relevance feedback: Extend your ranked retrieval approach with manual
relevance feedback.
Such approaches must include an interactive feedback phase, where a user is presented with a list of documents in response to an initial query, and is given the
opportunity to mark certain documents as relevant (and, optionally, to mark some
as non-relevant).
The initial query then needs to be updated, for example using Rocchio's approach.
Finally, the updated query is run to retrieve the final answer list.
3. Phrase search: Extend your ranked retrieval implementation to support phrase
queries (for example, the query "cold fusion" would return all documents (and
only those documents) that contain the term cold followed directly by the term
fusion).
4. Diversity ranking: Extend your ranked retrieval approach to include a diversity
component. For example, for an ambiguous query such as java, a diversity approach
would seek to ensure that the top answers include documents that cover various
interpretations of the query (the island, the coffee, the programming language).
An example of a diversity approach is Maximal Marginal Relevance (MMR).
5. Disk-based inverted index construction: Modern collections are often too
large to fit into main memory. Constructing an inverted index therefore requires
various efficiency considerations.
6. Document summaries: Extend your search system to produce short document
summaries with each answer item returned in a search results list.
You must implement at least two summary creation approaches. One of these should
be based on query-biased information, taking the user's query into account. The
second should include evidence other than the user's current query (for example, a
very simple choice would be to simply return the first chunk of a document).
7. Other advanced feature: If you wish to propose another advanced IR feature,
you are welcome to do so. However, you must send an email to the lecturer to
discuss this, and to agree on the scope of what such a feature needs to cover, and
how it should be evaluated.
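For option 1 (automatic query expansion), the R and E parameters can be sketched in Java as follows. The sketch treats the top R documents of an initial ranking as relevant and scores each candidate term by the offer weight r(t) * RW(t), where r(t) is the number of those R documents containing the term and RW(t) is the Robertson/Sparck Jones relevance weight with 0.5 smoothing. This is one common formulation of Okapi-style term selection; the exact TSV formula expected by the assignment may differ. QueryExpander and expand are hypothetical names, documents are assumed to be pre-tokenised, and ranking ids are assumed to be distinct.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical pseudo-relevance feedback helper (offer-weight term selection). */
class QueryExpander {
    /**
     * @param docs    pre-tokenised collection
     * @param ranking document ids from the initial run, best first (assumed distinct)
     * @param query   original query terms
     * @param r       R: number of top-ranked documents assumed relevant
     * @param e       E: number of expansion terms to append
     */
    static List<String> expand(List<List<String>> docs, List<Integer> ranking,
                               List<String> query, int r, int e) {
        int bigN = docs.size();
        int bigR = Math.min(r, ranking.size());

        // n(t): number of documents in the whole collection containing t
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            for (String t : new HashSet<>(doc)) docFreq.merge(t, 1, Integer::sum);
        }
        // r(t): number of pseudo-relevant (top-R) documents containing t
        Map<String, Integer> relFreq = new HashMap<>();
        for (int i = 0; i < bigR; i++) {
            for (String t : new HashSet<>(docs.get(ranking.get(i)))) relFreq.merge(t, 1, Integer::sum);
        }

        Set<String> original = new HashSet<>(query);
        Map<String, Double> offer = new HashMap<>();
        for (Map.Entry<String, Integer> entry : relFreq.entrySet()) {
            String t = entry.getKey();
            if (original.contains(t)) continue; // only propose new terms
            double rt = entry.getValue();
            double nt = docFreq.get(t);
            // Robertson/Sparck Jones relevance weight with 0.5 smoothing
            double rw = Math.log(((rt + 0.5) * (bigN - nt - bigR + rt + 0.5))
                    / ((nt - rt + 0.5) * (bigR - rt + 0.5)));
            offer.put(t, rt * rw); // offer weight = r(t) * RW(t)
        }

        List<String> candidates = new ArrayList<>(offer.keySet());
        candidates.sort((a, c) -> {
            int cmp = Double.compare(offer.get(c), offer.get(a)); // highest weight first
            return cmp != 0 ? cmp : a.compareTo(c);               // deterministic tie-break
        });
        List<String> expanded = new ArrayList<>(query);
        expanded.addAll(candidates.subList(0, Math.min(e, candidates.size())));
        return expanded;
    }
}
```

The expanded query would then be run through the BM25 ranker again to produce the final answer list. Terms already in the original query are skipped, so E counts only genuinely new terms.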
II. Required skills
Proficiency in Java
III. Reference works
Existing code must not be used as a reference.
IV. Working arrangement
Remote