A Whole New World, Part 1: What is the Semantic Web?
I have a dream for the Web in which computers become capable of analyzing all the data on the Web
- Tim Berners-Lee, Inventor of the World Wide Web, 1999
At the end of 2016, WooRank, along with almost every other SEO website, published a list of trends in search marketing to look forward to in the new year. If you read through some of these posts you’ll start to notice a common thread among the topics: conversational search, search intent, natural language processing and artificial intelligence (or machine learning). Basically, what we’re saying here is that robots are getting really smart and, in Google’s case, are starting to get really good at figuring out context.
The result of these advancements is something called the "semantic web".
The Internet of Yesteryear
Before semantic markup came onto the scene, the internet was a bunch of different files that were connected by links. (It isn’t really like a series of tubes, but some references just can’t not be made.) These links connected files to each other, allowing easy reference and navigation. These files are usually web pages, but could also be PDFs, jpegs, videos or some other type of file. That’s the basic structure of the internet.
Search engines work by using these links to travel from site to site, and page to page, crawling these pages and storing them in their databases, known as "indexes." When crawling and indexing pages, search engines read their code to figure out what’s an image, what’s a title, what’s a subhead, what’s a video and what’s normal body copy. This information is also stored in the index and used to determine the relevance to a user’s search query.
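To make the crawl-and-index loop a bit more concrete, here is a minimal sketch in Python. It is a toy, not how any real search engine is built: the PAGES dictionary is a made-up, in-memory stand-in for the web, and the names PageParser and crawl are purely illustrative. It follows links from page to page and stores each page's title and outgoing links in an index.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web" (URL -> HTML). A real crawler would fetch pages over HTTP.
PAGES = {
    "/home": '<title>Home</title><a href="/about">About</a><a href="/blog">Blog</a>',
    "/about": '<title>About us</title><a href="/home">Home</a>',
    "/blog": '<title>Blog</title><a href="/about">About</a><a href="/missing">Gone</a>',
}


class PageParser(HTMLParser):
    """Reads a page's code and notes what is a title and what is a link."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start):
    """Breadth-first crawl from `start`; returns an index of URL -> page facts."""
    index = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in index or url not in PAGES:
            continue  # already indexed, or the link leads nowhere
        parser = PageParser()
        parser.feed(PAGES[url])
        index[url] = {"title": parser.title, "links": parser.links}
        queue.extend(parser.links)  # follow the links to discover more pages
    return index


index = crawl("/home")
for url, facts in index.items():
    print(url, facts)
```

Starting from a single page, the crawler reaches every page that is linked from somewhere, which is why links mattered so much to the old web.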
Of course, relevance is not the sole determining factor when it comes to showing a search result, so search engines also look at the links pointing at a site and use their various elements (link text and linking domain, among others) to calculate a site’s authority and popularity. The linking domain and link text act as clues about which words sites and pages are authorities on.
So this brings us to the idea of keywords, which, along with links, you’re probably very familiar with. Search engines relied on key words (information scientists are a clever bunch when it comes to naming) or phrases that matched the code and content to words in the search query. Search engines would then determine relevance to the query based on how often those key words appeared on the page. They would then use links pointing to the page and site to measure that page’s accuracy and authority. That’s a pretty simple view of how it worked, but we’re about to blow that all up so it’ll do.
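As a rough illustration of that keyword-counting idea, the Python sketch below scores a page by how often the query's words appear on it. The function names, the sample pages and the scoring rule are all assumptions made for this example; real engines used far more elaborate formulas and combined them with link signals.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_score(query, page_text):
    """Sum how many times each query word appears on the page."""
    counts = Counter(tokenize(page_text))
    return sum(counts[word] for word in tokenize(query))


# Hypothetical pages, invented for this example.
pages = {
    "guide": "Beat the game with this video game guide. Game tips for every level.",
    "recipes": "Quick dinner recipes for busy weeknights.",
}

query = "video game guide"
ranked = sorted(pages, key=lambda name: keyword_score(query, pages[name]), reverse=True)
print(ranked)
```

Notice that the score knows nothing about what the words mean. It only counts matches, which is exactly the weakness the semantic web sets out to address.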
The Rise of the Semantic Web
That system worked pretty well but fell short when people were looking for more precise information, specifically answers to questions. To use Gary Illyes’ famous example, take someone looking for a guide to beating a video game without cheat codes. Google saw "without" as a stop word, meaning it ignored it, so “how to beat a video game without cheat codes” turned into “how to beat a video game cheat codes”, which is the exact opposite of what the original query was trying to find.
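Here is a small Python sketch of that failure. The stop-word list is hypothetical and deliberately tiny (it is certainly not Google's actual list), but it shows how two queries with opposite intent collapse into the same string once the stop words are thrown away.

```python
# Hypothetical stop-word list, just for illustration; not Google's actual list.
STOP_WORDS = {"with", "without"}


def strip_stop_words(query):
    """Drop stop words from a query, the way a purely keyword-based engine might."""
    kept = [word for word in query.lower().split() if word not in STOP_WORDS]
    return " ".join(kept)


wants_no_cheats = strip_stop_words("how to beat a video game without cheat codes")
wants_cheats = strip_stop_words("how to beat a video game with cheat codes")

print(wants_no_cheats)                  # how to beat a video game cheat codes
print(wants_no_cheats == wants_cheats)  # True
```

Once both queries look identical, the engine has no way to tell which searcher wanted the cheat codes and which one wanted to avoid them.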
So in comes the semantic web. What is the semantic web?
Well, since we’re talking about semantics, we’ll start by throwing out a bunch of definitions. First, what does "semantics" mean?
Primarily the linguistic, and also philosophical, study of meaning — in language, programming languages, formal logics, and semiotics. It focuses on the relationship between signifiers — like words, phrases, signs, and symbols — and what they stand for, their denotation.
Or, to put it in the succinct words of Merriam-Webster, semantics is "the study of meanings."
So, when that study of meaning is applied to web content, we get the semantic web:
The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries
What does that mean?
To put it simply, the semantic web is a way to connect ideas, also known as entities, and pieces of data, not just files and web pages. These connections allow programs, like search engines, to explore beyond the words on a page to the ideas and concepts behind them. Take this simple sentence for example:
I was born in Michigan and I am a resident of Brussels.
Before the semantic web, there wasn’t a way to help search engines connect the words in that sentence to their meaning: who am I (not to get too Cartesian here), what is Michigan and what is Brussels?
Now, thanks to the semantic web, there’s a better way to do that: structured data (also known as semantic markup). Help bots parse this sentence with a few bits of HTML:
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Greg Snow-Wasserman</span> was born in
  <span property="birthPlace" typeof="Place" resource="https://www.wikidata.org/wiki/Q1166">
    <span property="name">Michigan</span>
  </span>
  and is a resident of
  <span property="homeLocation" typeof="Place" resource="https://www.wikidata.org/wiki/Q240">
    <span property="name">Brussels</span>
  </span>
</div>
Now that basic sentence, which both humans and machines could read, but only humans could truly understand, is meaningful to both parties.
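To get a feel for what a bot can pull out of markup like this, here is a simplified Python sketch that walks a cleaned-up copy of the snippet and turns it into (subject, property, value) statements. It is an assumption-heavy toy rather than a conforming RDFa processor: the class name RdfaSketch and the "_:item1" placeholder IDs are invented for this example, it only looks at the typeof, property and resource (or href) attributes, and it assumes every tag is explicitly closed.

```python
from html.parser import HTMLParser

# Cleaned-up copy of the sentence's markup.
MARKUP = """
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Greg Snow-Wasserman</span> was born in
  <span property="birthPlace" typeof="Place" resource="https://www.wikidata.org/wiki/Q1166">
    <span property="name">Michigan</span>
  </span>
  and is a resident of
  <span property="homeLocation" typeof="Place" resource="https://www.wikidata.org/wiki/Q240">
    <span property="name">Brussels</span>
  </span>
</div>
"""


class RdfaSketch(HTMLParser):
    """Toy extractor: collects (subject, property, value) statements."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.subjects = []   # stack of the things currently being described
        self.open_tags = []  # per open tag: (started_new_subject, literal_property)
        self.counter = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop, typ = attrs.get("property"), attrs.get("typeof")
        new_subject, literal_prop = False, None
        if typ:
            # A typed element describes a new thing; use its URL if it has one.
            self.counter += 1
            subject = attrs.get("resource") or attrs.get("href") or f"_:item{self.counter}"
            if prop and self.subjects:
                self.triples.append((self.subjects[-1], prop, subject))
            self.triples.append((subject, "type", typ))
            self.subjects.append(subject)
            new_subject = True
        elif prop:
            # An untyped property takes the element's text as its value.
            literal_prop = prop
        self.open_tags.append((new_subject, literal_prop))

    def handle_data(self, data):
        text = data.strip()
        if text and self.subjects and self.open_tags and self.open_tags[-1][1]:
            self.triples.append((self.subjects[-1], self.open_tags[-1][1], text))

    def handle_endtag(self, tag):
        if self.open_tags:
            new_subject, _ = self.open_tags.pop()
            if new_subject:
                self.subjects.pop()


parser = RdfaSketch()
parser.feed(MARKUP)
triples = parser.triples
for triple in triples:
    print(triple)
```

The plain words "was born in" and "is a resident of" are skipped entirely. What the machine keeps is the structure: a Person with a name, a birthPlace and a homeLocation, each place identified by a URL rather than by a string of letters.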
In other words, the semantic web shifts from a web of linked pages to a web of linked data, representing the meanings, ideas and concepts behind that data.
So, to recap, here’s how the old web connected ideas via linked web pages:
And here’s how the semantic web views ideas via linked data:
How the Semantic Web Changed Search
As humans, we don’t see all that semantic markup that gives meaning to the words on the page. But it had, and is continuing to have, a huge impact on search engines, bringing about what’s known as semantic search.
Hummingbird: Google as an Answer Engine
In September 2013, Google announced they had been running a whole new algorithm update for a month. That update, known as Hummingbird (because it was designed to be "fast and precise"), gave Google the ability to apply the semantic web to queries and its search results.
Hummingbird analyzes the semantics of a search query to determine the intent behind the query. To go back to our example query, "how to beat a video game without cheat codes", Hummingbird no longer sees "without" as a stop word, and can now figure out that the user is actually looking for game walkthroughs or other strategy guides, not how to enter god mode.
Hummingbird, because it’s precise, then finds pieces of content that fulfill that intent and delivers them to the user. Notice the use of the word "content" there; it’s a pretty significant change from the old system that just delivered the page it thought was most relevant to the query’s keywords. Want to see it in action? Check out the SERP for the query “what is link juice”.
Google was able to interpret the meaning behind my query (finding the definition of the term "link juice") and then find a single paragraph on a single web page that answered my question.
RankBrain Powers the Answer Engine
Our goal is to build a personal Google for each and every user
- Sundar Pichai, Google's CEO
So how exactly does Google figure out search intent and which individual pieces of content fulfill those goals? It uses RankBrain, its machine learning and artificial intelligence system. Many people (including us, because sometimes it’s just easier) refer to RankBrain as an algorithm, but that’s not really accurate.
Even though we know RankBrain is really important when ranking search results, ranking isn’t its primary function. It’s actually more of an interpretation algorithm. It’s the part of Hummingbird that figures out what "without" in that video game query means, and then connects it to a page that provides a walkthrough. Or, it’s what reads the query “how big is the planet” and figures out that I want to know the circumference of the planet Earth.
RankBrain also helps decide whether to show me that answer in miles or kilometers. The example above is from a search in the United States using the AdWords Ad Preview and Diagnosis tool. Below is a search on my actual browser here in Brussels:
As you can probably guess, or as you’ve likely already seen, the rise of the semantic web can have a pretty profound impact on search engine optimization and digital marketing in general. SEO is, in some ways, becoming much more complex, while in other ways maybe a bit simpler. However, one thing is for sure: the semantic web is here to stay. So stay tuned for the next part in our semantic web series to learn how to welcome our new robot overlords!