This post is a review of journal articles that discuss various subjects on search engine operations.
Lewandowski (2006) discussed Web search engines, focusing on the challenges of indexing
the World Wide Web, user behavior, and the ranking factors used by
these engines. He divided these ranking factors into query-dependent and
query-independent factors, the latter of which have become increasingly
important in recent years. He noted that what these factors can achieve
is limited, especially for those based on the widely used link
popularity measures. He concluded his article with an overview of
factors that should be considered when judging the quality of Web search engines.
The first challenge in indexing the World Wide Web, he stated, is the size
of the search engine's database, measured by the number of pages indexed
after crawling. He explained that size alone does not reflect the overall
quality of the engine: an ideal search engine would know every page on
the Web, but some content, such as duplicates or spam pages, should not
be indexed.
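As a rough illustration of keeping duplicates out of an index, here is a minimal Python sketch. It is not taken from Lewandowski's article: the function names, the whitespace and case normalization, and the example URLs are all assumptions, and it only catches exact duplicates, not near-duplicates or spam.

```python
import hashlib

def fingerprint(text):
    # Assumption: two pages count as duplicates if their text matches
    # after lowercasing and collapsing whitespace.
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def drop_duplicates(pages):
    """Return the URLs to index, keeping the first copy of each text."""
    seen = set()
    unique = []
    for url, text in pages:
        fp = fingerprint(text)
        if fp not in seen:
            seen.add(fp)
            unique.append(url)
    return unique

# Hypothetical crawled pages: (url, extracted text)
crawled = [
    ("https://example.com/a", "Web search engines rank pages."),
    ("https://example.com/a?ref=1", "Web  search engines RANK pages."),
    ("https://example.com/b", "Crawling strategies differ."),
]
to_index = drop_duplicates(crawled)
```

Here the second URL is skipped because its text is the same as the first once normalized, so the index stays smaller without losing any content.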
The second challenge,
according to him, is keeping search engines’ databases up to date.
Content on the Web changes very quickly,
and therefore new or updated pages should be indexed as soon as
possible. Search engines struggle to keep up with the
entire Web, and because of its enormous size and the different update
cycles of individual websites, adequate crawling strategies are needed.
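One simple way to think about a crawling strategy is to revisit first the pages that are most overdue relative to how often they change. The sketch below is only an illustration under that assumption. The article does not prescribe this method, and the function name, the hour-based timestamps, and the example pages are hypothetical.

```python
import heapq

def recrawl_order(pages, now):
    """Order page URLs from most to least overdue for a recrawl.

    pages: list of (url, last_crawled, change_interval), all in hours.
    A page is treated as more overdue the larger
    (now - last_crawled) / change_interval is.
    """
    heap = []
    for url, last_crawled, interval in pages:
        overdue = (now - last_crawled) / interval
        # heapq is a min-heap, so negate to pop the most overdue page first.
        heapq.heappush(heap, (-overdue, url))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]

# Hypothetical pages with different update cycles
pages = [
    ("docs.example/manual", 0, 720),  # changes about monthly, crawled 100 hours ago
    ("news.example/front", 90, 1),    # changes about hourly, crawled 10 hours ago
    ("blog.example/post", 52, 24),    # changes about daily, crawled 48 hours ago
]
order = recrawl_order(pages, now=100)
```

With these numbers the news page comes first, even though it was crawled most recently, because its update cycle is so short.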
Third
is the problem posed by Web content. He argued that documents on the Web
are written in many different languages and use many
different file types, and that search engines today
index not only documents written in HTML but also PDF, Word, and other
Office files. Each file format presents its own difficulties for the
search engines, and all file formats have to be
considered in the overall ranking.
The fourth challenge is the Invisible Web. Lewandowski (2006) states that “The Invisible Web is defined
as the part of the Web that search engines do not index. This may be
due to technical reasons or barriers made by website owners, e.g.
password protection or robots exclusions.”
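To make the "robots exclusions" part of that quotation concrete, the following Python sketch uses the standard library's `urllib.robotparser` to check whether a crawler may fetch a URL. The robots.txt content, the `ExampleBot` user agent, and the URLs are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt published by a website owner
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

def is_indexable(url, user_agent="ExampleBot"):
    # A well-behaved crawler skips URLs that the site owner has disallowed.
    return parser.can_fetch(user_agent, url)

allowed = is_indexable("https://example.com/articles/search.html")
blocked = is_indexable("https://example.com/private/report.html")
```

Pages under `/private/` would never reach the index of a crawler that honors these rules, which is one way content ends up in the Invisible Web.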
Lastly, according to
Lewandowski (2006), spam is another major challenge for search engines,
which have to filter out spam pages to keep their
indexes clean.
The behavior of search engine users varies
considerably. Lewandowski (2006) reported that most
search engine users are not sophisticated in their use of
search engines. Most users do not know advanced search techniques, and
many of those who do seldom use
them. He further said that most users do not look beyond the first
page of results.
In his article, Lewandowski (2006) also discussed
the ranking factors of search engines. He classified all ranking factors
into two major categories: query-dependent factors and
query-independent factors. According to him, query-dependent factors
are those that relate in some way to the user's query.
They include word document frequency, search term
distance, search term order, position of the query terms, metatags,
position of the search terms within the document, and emphasis on terms
within the document, among others. Query-independent factors are used to determine
document quality regardless of the query. According to Lewandowski
(2006), such factors include link popularity, directory hierarchy,
number of incoming links, click popularity, how up to date the page is,
document length, file format, and size of the website.
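The difference between the two categories can be sketched in a few lines of Python. This is a toy model and not the scoring used by any real search engine or described in the article: term frequency stands in for the query-dependent factors, a dampened count of incoming links stands in for the query-independent ones, and the `weight` parameter, function names, and example pages are assumptions.

```python
import math

def query_dependent_score(query, text):
    """Share of the document's words that match a query term."""
    words = text.lower().split()
    terms = set(query.lower().split())
    if not words:
        return 0.0
    return sum(1 for w in words if w in terms) / len(words)

def query_independent_score(incoming_links):
    """Dampened incoming-link count; it never looks at the query."""
    return math.log1p(incoming_links)

def rank(query, pages, weight=0.5):
    """Rank matching pages by a weighted mix of both kinds of score.

    pages: dict mapping url -> (text, incoming_links).
    weight: share given to the query-independent score (assumed value).
    """
    scored = []
    for url, (text, links) in pages.items():
        dependent = query_dependent_score(query, text)
        if dependent == 0:
            continue  # pages that do not match the query are not ranked
        score = (1 - weight) * dependent + weight * query_independent_score(links)
        scored.append((score, url))
    return [url for score, url in sorted(scored, reverse=True)]

# Hypothetical pages: url -> (text, number of incoming links)
pages = {
    "a.example": ("search engine ranking factors explained", 2),
    "b.example": ("ranking factors ranking factors", 0),
    "c.example": ("cooking recipes for pasta", 50),
}
mixed = rank("ranking factors", pages)
text_only = rank("ranking factors", pages, weight=0.0)
```

With `weight=0.0` the keyword-heavy page `b.example` ranks first. Once incoming links are taken into account, `a.example` overtakes it, while `c.example` never appears because it does not match the query at all.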
Lastly,
Lewandowski (2006) discussed certain critical factors that determine the
quality of a search engine. According to him, these factors include
index quality, which is the aggregate of database size, indexing depth,
how up to date the index is, and low indexing bias, among others; and advanced search
features, which are not commonly used but are quite useful in
determining the quality of a search engine.
Web search engines apply a
variety of ranking signals to achieve user satisfaction, i.e., results
pages that provide the best-possible results to the user. Have a nice day.
A Review Of Journal Article On Search Engine Operations (Part 1)