A Review Of Journal Article On Search Engine Operations (Part 2)

Continuing from the previous post, Lewandowski (2012), in his research on web search engine credibility, discussed the issue of delivering highly credible search results to search engine users. He further emphasized the criteria by which search engines decide upon including documents in their indices. These criteria include:
·        Text-based matching: matching queries against documents to find those that fulfill the query.
·        Popularity: measured by clicks, links that lead to the page, etc.
·        Freshness: how new and up to date the document is.
·        Locality: knowing the user’s location is paramount in giving useful results.
·        Personalisation: giving results based on the user’s search habits.
He argued that popularity lies at the heart of these systems.
Search engines use a number of ranking algorithms to order the web pages they have indexed.
Chandra, Suaib & Beg (2015) outlined and briefly discussed Google's search algorithm updates against web spam, which include PageRank, Panda and Penguin, among many others. According to them, PageRank counts the number and quality of links to a page to calculate a rough estimate of a website's global importance. The underlying assumption is that important websites are more likely to receive a high number of links from other websites. They noted that Google's search engine was initially based on PageRank together with signals such as page title, anchor text and links. Chandra, Suaib & Beg (2015) further stated that Google currently uses more than 200 signals to rank web pages and to combat web spam. Google also uses a huge amount of usage data (query logs, browser logs, ad-click logs, etc.) to interpret the complex intent behind cryptic queries and to provide relevant results to the end user.
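To make the link-counting idea concrete, here is a minimal sketch of the classic PageRank iteration in Python. It is only an illustration, not Google's production algorithm: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count and the four-page `toy_web` graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from a link graph.

    links: dict mapping each page to the list of pages it links to.
    damping and iterations are illustrative defaults, not values from the article.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with equal importance for every page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: page C is linked to by A, B and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page C collects links from three other pages and ends up with the highest score, while page D, which nobody links to, ends up with the lowest. That matches the intuition that heavily linked pages are treated as more important.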
In their research, they explained that the Panda update aimed to lower the rank of low-quality websites and raised the ranking of news and social networking sites. Panda is a filter that down-ranks sites with thin content, content farms, doorway pages, affiliate websites, sites with a high ads-to-content ratio and a number of other quality issues. The Panda update affects the ranking of an entire website rather than an individual page. It includes new signals such as data about the sites that users blocked, either directly from the search engine results page or via the Chrome browser.
Another important algorithm update is the Penguin update, which is purely a web spam update. It adjusts a number of spam factors, including keyword stuffing, in-links coming from spam pages and anchor text/link relevance. Penguin detects over-optimization of tags and internal links, bad neighborhoods, bad ownership, etc.
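Google has not published how Penguin measures keyword stuffing, so the following is only a toy heuristic to show what the term means in practice: it flags text in which a single word makes up an unusually large share of all words. The function names `keyword_density` and `looks_stuffed` and the 20% threshold are assumptions made for this sketch.

```python
import re
from collections import Counter


def keyword_density(text, top_n=3):
    """Return the top_n most frequent words with their share of all words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    return [(word, count / total) for word, count in counts.most_common(top_n)]


def looks_stuffed(text, threshold=0.2):
    """Flag text whose most frequent word exceeds the (hypothetical) threshold."""
    density = keyword_density(text, top_n=1)
    return bool(density) and density[0][1] > threshold


stuffed = "cheap shoes cheap shoes buy cheap shoes cheap cheap"
natural = "Search engines rank pages using many different signals and usage data"
print(looks_stuffed(stuffed), looks_stuffed(natural))
```

A real spam filter would need to ignore common words such as "the" and would combine many more signals; this sketch only shows the basic idea of measuring repetition.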
