A Review Of A Journal Article On Search Engine Operations (Part 1)

This post is a review of journal articles that discuss various subjects on search engine operations.
Lewandowski (2006) discusses Web search engines, mainly the challenges in indexing the World Wide Web, user behavior, and the ranking factors used by these engines. He divided these ranking factors into query-dependent and query-independent factors, the latter of which have become increasingly important in recent years. The potential of these factors is limited, especially for those based on the widely used link popularity measures. He concluded his article with an overview of factors that should be considered when determining the quality of Web search engines.
He stated that the first challenge in indexing pages on the World Wide Web is the size of the search engine’s database, which is shown by the number of pages indexed after crawling. He further explained that size does not necessarily reflect the overall quality of the engine: an ideal search engine should know all the pages of the Web, but some content, such as duplicates or spam pages, should not be indexed.
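To illustrate why index size alone says little about quality, here is a minimal sketch of filtering exact duplicates before indexing. It is not a method described by Lewandowski (2006); the function names `content_fingerprint` and `filter_duplicates` are hypothetical, and the approach (hashing normalized text) only catches pages whose text is identical after lowercasing and whitespace cleanup. Real engines are likely to use more elaborate near-duplicate detection.

```python
import hashlib


def content_fingerprint(text):
    """Return a hash of the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_duplicates(pages):
    """Keep the URL of the first page seen for each distinct fingerprint.

    `pages` is an iterable of (url, text) pairs; this input shape is an
    assumption made for the example.
    """
    seen = set()
    unique_urls = []
    for url, text in pages:
        fingerprint = content_fingerprint(text)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique_urls.append(url)
    return unique_urls


crawled = [
    ("https://example.com/a", "Search  engines index the Web."),
    ("https://example.com/b", "search engines index the web."),
    ("https://example.com/c", "A completely different page."),
]
print(filter_duplicates(crawled))
```

In this example three pages were crawled, but only two are worth indexing, so the raw page count overstates the useful size of the index.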
Secondly, another challenge according to him is keeping search engines’ databases up to date. He explained that content on the Web changes very fast, so new or updated pages should be indexed as quickly as possible. Search engines face problems in keeping up to date with the entire Web, and because of its enormous size and the different update cycles of individual websites, adequate crawling strategies are needed.
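One way to picture a crawling strategy that respects different update cycles is to revisit pages in order of how stale they are likely to be. The sketch below is an assumption for illustration only, not a strategy taken from the article: the `Page` fields, the `staleness` measure (time since the last crawl divided by an estimated change interval), and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    last_crawled: float      # hour at which the page was last crawled
    change_interval: float   # estimated hours between content changes


def staleness(page, now):
    """Estimate how many updates may have been missed since the last crawl."""
    return (now - page.last_crawled) / page.change_interval


def recrawl_order(pages, now):
    """Return pages sorted so that the most stale page is crawled first."""
    return sorted(pages, key=lambda page: staleness(page, now), reverse=True)


pages = [
    Page("https://example.com/archive", last_crawled=0.0, change_interval=720.0),
    Page("https://example.com/news", last_crawled=10.0, change_interval=1.0),
    Page("https://example.com/blog", last_crawled=0.0, change_interval=24.0),
]

for page in recrawl_order(pages, now=12.0):
    print(page.url)
```

Here the news page is revisited first even though it was crawled most recently, because it changes far more often than the blog or the archive.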
The third challenge is the problem posed by Web content. He argued that documents on the Web are written in many different languages, that many different file types are used, and that search engines today index not only documents written in HTML but also PDF, Word, and other Office files. Each file format poses certain difficulties for the search engines, and all file formats have to be considered in the overall ranking.
Another challenge is the Invisible Web. Lewandowski (2006) states that “The Invisible Web is defined as the part of the Web that search engines do not index. This may be due to technical reasons or barriers made by website owners, e.g. password protection or robots exclusions.”
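The robots exclusions mentioned in the quote can be illustrated with Python's standard `urllib.robotparser` module. This is a minimal sketch: the robots.txt content, the `ExampleBot` user agent, and the example.com URLs are made up for the example, and a real crawler would fetch the file from the website instead of using an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every crawler from the /private/ section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt rules that are already available as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

for url in (
    "https://example.com/index.html",
    "https://example.com/private/report.html",
):
    allowed = parser.can_fetch("ExampleBot", url)
    print(url, "allowed" if allowed else "excluded")
```

A crawler that honors these rules never requests the excluded page, so that page stays out of the index and becomes part of the Invisible Web.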
Lastly, according to Lewandowski (2006), spam is another major challenge, as search engines have to filter out spam pages to keep their indexes clean.
The behavior of search engine users varies considerably. Research reported by Lewandowski (2006) showed that most search engine users are not sophisticated in their use of search engines. Most users don’t know advanced search techniques, and a large percentage of those who do know them seldom use them. He further said that most users don’t go beyond the first page of results.
In his article, Lewandowski (2006) discussed the ranking factors of search engines. He classified all ranking factors into two major categories: query-dependent factors and query-independent factors. According to him, query-dependent factors are factors that are related in one way or another to the user’s search. They include word document frequency, search term distance, search term order, the position of the query terms within the document, metatags, and emphasis on terms within the document. Query-independent factors are used to determine document quality regardless of the query. According to Lewandowski (2006), such factors include link popularity, directory hierarchy, number of incoming links, click popularity, how up to date the page is, document length, file format, and size of the website.
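The difference between the two categories can be sketched in a few lines of code. This is a toy illustration under stated assumptions, not the ranking formula of any real search engine or of the article: the query-dependent signal is a plain term-frequency ratio, the query-independent signal is a logarithm of the number of incoming links, and multiplying the two in `combined_score` is an arbitrary choice. All function names and sample documents are hypothetical.

```python
import math


def query_dependent_score(query_terms, doc_words):
    """Share of the document's words that match a query term."""
    if not doc_words:
        return 0.0
    wanted = set(query_terms)
    matches = sum(1 for word in doc_words if word in wanted)
    return matches / len(doc_words)


def query_independent_score(incoming_links):
    """Dampened link popularity; it does not depend on the query at all."""
    return math.log1p(incoming_links)


def combined_score(query, text, incoming_links):
    """Boost relevant documents by popularity; irrelevant ones stay at zero."""
    relevance = query_dependent_score(query.lower().split(), text.lower().split())
    return relevance * (1.0 + query_independent_score(incoming_links))


def rank(query, documents):
    """Return document ids ordered from highest to lowest combined score.

    `documents` maps an id to a (text, incoming_links) pair.
    """
    return sorted(
        documents,
        key=lambda doc_id: combined_score(query, *documents[doc_id]),
        reverse=True,
    )


documents = {
    "a": ("search engine ranking factors", 0),
    "b": ("cooking recipes for dinner", 1000),
    "c": ("ranking factors explained simply", 10),
}
print(rank("ranking factors", documents))
```

Documents "a" and "c" match the query equally well, so the query-independent signal decides between them, while the heavily linked but irrelevant document "b" ends up last.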
Lastly, Lewandowski (2006) discussed certain critical factors that determine the quality of a search engine. According to him, these factors include index quality, which is the aggregate of database size, indexing depth, how up to date the index is, and low indexing bias, and advanced search features, a parameter that is not commonly used but is quite useful in determining the quality of a search engine.
Web search engines apply a variety of ranking signals to achieve user satisfaction, i.e., results pages that provide the best-possible results to the user. Have a nice day.
