Semantic Enrichment refers in general terms to the technologies and practices used to add semantic metadata to content.
SPARQL is an RDF query language (a semantic query language for databases) able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium, and is recognized as one of the key technologies of the semantic web [1].
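To make querying RDF more concrete, the sketch below matches a SPARQL-style basic graph pattern against a handful of in-memory triples. A basic graph pattern is a set of triple patterns that share variables, like the `WHERE` clauses later in this document. This is a toy illustration under simplifying assumptions, not how a real SPARQL engine evaluates queries. The function names and the abbreviated `dbr:` resources in the sample data are hypothetical.

```python
# Toy evaluator for a SPARQL-style basic graph pattern over in-memory triples.
# Illustrative only: real SPARQL engines use indexes and query planning.
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_variable(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for pattern_term, value in zip(pattern, triple):
        if is_variable(pattern_term):
            if extended.get(pattern_term, value) != value:
                return None  # variable already bound to a different value
            extended[pattern_term] = value
        elif pattern_term != value:
            return None  # constant term does not match
    return extended


def evaluate_bgp(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Return every binding that satisfies all triple patterns simultaneously."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in triples:
                extended = match_triple(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


triples = [
    ("dbr:Steve_Jobs", "foaf:name", "Steve Jobs"),
    ("dbr:Steve_Jobs", "dbo:birthPlace", "dbr:California"),
    ("dbr:Steve_Wozniak", "foaf:name", "Steve Wozniak"),
]
patterns = [
    ("?person", "dbo:birthPlace", "?birthPlace"),
    ("?person", "foaf:name", "?name"),
]
print(evaluate_bgp(patterns, triples))
```

Only the first resource is returned, because the sample data has no `dbo:birthPlace` triple for the second one. In the same way, a SPARQL query returns only resources that have every property its pattern requires.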
DBpedia is a crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects. This structured information resembles an open knowledge graph (OKG) that is available to everyone on the Web [2].
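DBpedia identifies each resource with a URI built from the title of the Wikipedia article it was extracted from, with spaces replaced by underscores. The outputs later in this document show this pattern (for example `http://dbpedia.org/resource/Steve_Jobs`). The helpers below sketch that mapping in both directions. They are hypothetical conveniences that ignore percent-encoding and other special cases, so for some titles the URI they produce may not match DBpedia's exactly.

```python
# Hypothetical helpers: map between Wikipedia-style titles and DBpedia resource URIs.
# Simplification: only spaces/underscores are handled; no percent-encoding is applied.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def resource_uri(title: str) -> str:
    """Build a DBpedia resource URI from an article title."""
    return DBPEDIA_RESOURCE + title.strip().replace(" ", "_")


def resource_label(uri: str) -> str:
    """Recover a readable label from a DBpedia resource URI."""
    if not uri.startswith(DBPEDIA_RESOURCE):
        raise ValueError(f"not a DBpedia resource URI: {uri}")
    return uri[len(DBPEDIA_RESOURCE):].replace("_", " ")


print(resource_uri("John Wick: Chapter 2"))  # http://dbpedia.org/resource/John_Wick:_Chapter_2
print(resource_label("http://dbpedia.org/resource/California"))  # California
```

`resource_label` can turn URI-valued fields, such as `birthPlace` in the person query's output, into plain text.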
In this document, you can see how to enrich entities of the following types: person, country, place (such as a city), company, and movie.
# Load SPARQL libraries
from SPARQLWrapper import SPARQLWrapper, JSON
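# Return the entity name with single and double quotes removed, so it can be
# embedded safely in the quoted string literal of a SPARQL query
def get_clean_entity(text):
    return text.replace("'", "").replace('"', '')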
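Stripping quotes keeps the generated query well-formed, but it also alters names that legitimately contain them: a title with an apostrophe would no longer match. The sketch below takes the alternative approach of escaping these characters, using the escape sequences allowed in SPARQL 1.1 string literals (`\t`, `\b`, `\n`, `\r`, `\f`, `\"`, `\'`, `\\`). `escape_sparql_string` is a hypothetical helper. It does not neutralise wildcard characters such as `%` that the `like` filters below interpret.

```python
# Hypothetical alternative to get_clean_entity: escape characters instead of deleting them.
# Escape sequences follow the ECHAR rule of the SPARQL 1.1 grammar.
SPARQL_ESCAPES = {
    "\\": "\\\\",
    '"': '\\"',
    "'": "\\'",
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
    "\b": "\\b",
    "\f": "\\f",
}


def escape_sparql_string(text: str) -> str:
    """Escape `text` so it can sit inside a quoted SPARQL string literal."""
    return "".join(SPARQL_ESCAPES.get(char, char) for char in text)


print(escape_sparql_string("Guns N' Roses"))  # Guns N\' Roses
print(escape_sparql_string('The "Boss"'))     # The \"Boss\"
```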
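# Run a SPARQL query against the public DBpedia endpoint and return the result bindings
def exec_sparql_query(query, verbose=False):
    entry_point = "https://dbpedia.org/sparql"
    # Prefixes prepended to every query; rdf:, rdfs:, foaf: and xsd: are
    # already predefined on DBpedia's endpoint
    header = """
PREFIX dbo:<http://dbpedia.org/ontology/>
PREFIX geo:<http://www.georss.org/georss/>
"""
    try:
        query = header + query
        if verbose:
            print(query)
        sparql = SPARQLWrapper(entry_point)
        sparql.setQuery(query)
        sparql.setReturnFormat(JSON)
        result = sparql.query().convert()
        # Keep only the list of solutions (one dict per result row)
        if result and 'results' in result:
            result = result['results']['bindings']
        return result
    except Exception:
        # Network errors, timeouts or malformed queries yield None
        return None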
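`exec_sparql_query` returns the `bindings` list of the SPARQL JSON results format. This is one dict per solution, mapping each variable to an RDF term object with `type` and `value` keys. Literals may also carry an `xml:lang` or `datatype` key, as the outputs below show. When only the raw values are needed, a small hypothetical helper such as `flatten_bindings` can reduce each row to `{variable: value}`. It also maps the `None` returned on errors to an empty list.

```python
from typing import Any, Dict, List, Optional


def flatten_bindings(bindings: Optional[List[Dict[str, Dict[str, Any]]]]) -> List[Dict[str, str]]:
    """Keep only the 'value' of each bound variable in each result row.

    Variables left unbound in a row are simply absent from that row's dict.
    """
    if not bindings:
        return []  # covers None (query failed) and an empty result set
    return [{var: term["value"] for var, term in row.items()} for row in bindings]


sample = [{
    "name": {"type": "literal", "xml:lang": "en", "value": "Steve Jobs"},
    "person": {"type": "uri", "value": "http://dbpedia.org/resource/Steve_Jobs"},
}]
print(flatten_bindings(sample))
```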
# Query DBpedia for information about people whose name contains person_name
def query_person(person_name):
    person_name = get_clean_entity(person_name)
    # `like` is a Virtuoso-specific extension (not standard SPARQL) supported by
    # DBpedia's endpoint; % matches any sequence of characters
    query = """
    SELECT (SAMPLE (?name) AS ?name) (SAMPLE (?birthPlace) AS ?birthPlace) (SAMPLE (?birthDate) AS ?birthDate)
        (SAMPLE (?person) AS ?person) (SAMPLE(?description) AS ?description)
    WHERE {
        ?person dbo:birthPlace ?birthPlace.
        ?person dbo:birthDate ?birthDate.
        ?person foaf:name ?name.
        ?person rdfs:comment ?description.
        FILTER (?name like "%""" + person_name + """%"^^xsd:char).
        FILTER (langMatches(lang(?description), "en")).
    }
    GROUP BY ?person
    ORDER BY ?person
    """
    # Run query against DBpedia
    return exec_sparql_query(query)
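The query groups the matching solutions by `?person` and uses `SAMPLE` to keep one value per variable. As a result, a person with several names, birth places or descriptions still produces a single row. The sketch below emulates that behaviour on plain Python rows and is an illustration only. SPARQL allows `SAMPLE` to return any value from the group, and picking the first one seen is just one permissible choice. The rows contain hypothetical, abbreviated data.

```python
from typing import Dict, List


def group_and_sample(rows: List[Dict[str, str]], key: str) -> List[Dict[str, str]]:
    """Emulate GROUP BY ?key with SAMPLE() on every other variable.

    Keeps the first value seen for each variable within each group, which is
    one of the results SPARQL's SAMPLE is allowed to produce.
    """
    groups: Dict[str, Dict[str, str]] = {}
    for row in rows:
        sampled = groups.setdefault(row[key], {})
        for var, value in row.items():
            sampled.setdefault(var, value)
    return list(groups.values())


rows = [
    {"person": "dbr:Steve_Jobs", "birthPlace": "dbr:California"},
    {"person": "dbr:Steve_Jobs", "birthPlace": "dbr:San_Francisco"},
    {"person": "dbr:Example_Person", "birthPlace": "dbr:Example_Place"},
]
for row in group_and_sample(rows, "person"):
    print(row)
```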
# Ask DBpedia for the person 'Steve Jobs'
person_name = 'Steve Jobs'
result = query_person(person_name)
result[0]
{'name': {'type': 'literal', 'xml:lang': 'en', 'value': 'Steve Jobs'}, 'birthPlace': {'type': 'uri', 'value': 'http://dbpedia.org/resource/California'}, 'birthDate': {'type': 'typed-literal', 'datatype': 'http://www.w3.org/2001/XMLSchema#date', 'value': '1955-02-24'}, 'person': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Steve_Jobs'}, 'description': {'type': 'literal', 'xml:lang': 'en', 'value': "Steven Paul Jobs (; February 24, 1955 – October 5, 2011) was an American business magnate, industrial designer, investor, and media proprietor. He was the chairman, chief executive officer (CEO), and co-founder of Apple Inc., the chairman and majority shareholder of Pixar, a member of The Walt Disney Company's board of directors following its acquisition of Pixar, and the founder, chairman, and CEO of NeXT. Jobs is widely recognized as a pioneer of the personal computer revolution of the 1970s and 1980s, along with Apple co-founder Steve Wozniak."}}
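In the output above, every value arrives as a string, with its XSD datatype recorded separately; for example, `birthDate` is an `xsd:date`. The sketch below converts a single term into a Python value for the datatypes that appear in this document's outputs (`xsd:date`, `xsd:nonNegativeInteger`, `xsd:gYear`), plus `xsd:integer`. Anything else, including URIs and plain literals, is returned unchanged. `to_python` is a hypothetical helper. It checks the `datatype` key rather than `type`, because this endpoint labels such terms `typed-literal`, whereas the SPARQL 1.1 JSON results format uses `literal`.

```python
from datetime import date
from typing import Any, Dict

XSD = "http://www.w3.org/2001/XMLSchema#"
INTEGER_TYPES = {XSD + "integer", XSD + "nonNegativeInteger", XSD + "gYear"}


def to_python(term: Dict[str, str]) -> Any:
    """Convert one SPARQL JSON result term into a Python value where possible."""
    value = term["value"]
    datatype = term.get("datatype")
    try:
        if datatype == XSD + "date":
            return date.fromisoformat(value)  # e.g. '1955-02-24'
        if datatype in INTEGER_TYPES:
            return int(value)  # gYear treated as a plain year number
    except ValueError:
        pass  # unexpected lexical form (e.g. a timezone suffix): keep the string
    return value


birth_date = {"type": "typed-literal", "datatype": XSD + "date", "value": "1955-02-24"}
print(to_python(birth_date))  # 1955-02-24
```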
# Query DBpedia for countries whose English label contains the given name,
# sorted by population, largest first
def query_country(country):
    country = get_clean_entity(country)
    query = """
    SELECT (SAMPLE(?name) AS ?name) (SAMPLE(?poblation) AS ?poblation) (SAMPLE(?geoloc) AS ?geoloc)
        (SAMPLE(?place) AS ?place) (SAMPLE(?description) AS ?description)
    WHERE {
        ?place rdf:type dbo:Country.
        ?place dbo:abstract ?description.
        ?place rdfs:label ?name.
        ?place dbo:populationTotal ?poblation.
        ?place geo:point ?geoloc.
        FILTER (langMatches(lang(?name),"en")).
        FILTER (langMatches(lang(?description),"en")).
        FILTER (?name like "%""" + country + """%"^^xsd:char).
    }
    GROUP BY ?place
    ORDER BY DESC(?poblation)
    """
    # Run query against DBpedia
    return exec_sparql_query(query)
# Ask DBpedia for the country 'Malaysia'
country_name = 'Malaysia'
result = query_country(country_name)
result[0]
{'name': {'type': 'literal', 'xml:lang': 'en', 'value': 'Malaysia'}, 'poblation': {'type': 'typed-literal', 'datatype': 'http://www.w3.org/2001/XMLSchema#nonNegativeInteger', 'value': '32730000'}, 'geoloc': {'type': 'literal', 'value': '2.5 112.5'}, 'place': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Malaysia'}, 'description': {'type': 'literal', 'xml:lang': 'en', 'value': "Malaysia ( () mə-LAY-zee-ə, -\u2060zhə; Malay: [məlejsiə]) is a country in Southeast Asia. The federal constitutional monarchy consists of thirteen states and three federal territories, separated by the South China Sea into two regions, Peninsular Malaysia and Borneo's East Malaysia. Peninsular Malaysia shares a land and maritime border with Thailand and maritime borders with Singapore, Vietnam, and Indonesia. East Malaysia shares land and maritime borders with Brunei and Indonesia and a maritime border with the Philippines and Vietnam. Kuala Lumpur is the national capital and largest city while Putrajaya is the seat of the federal government. With a population of over 32 million, Malaysia is the world's 43th-most populous country. The southernmost point of continental Eurasia is in Tanjung Piai. In the tropics, Malaysia is one of 17 megadiverse countries, home to a number of endemic species. Malaysia has its origins in the Malay kingdoms which, from the 18th century, became subject to the British Empire, along with the British Straits Settlements protectorate. Peninsular Malaysia was unified as the Malayan Union in 1946. Malaya was restructured as the Federation of Malaya in 1948 and achieved independence on 31 August 1957. Malaya united with North Borneo, Sarawak, and Singapore on 16 September 1963 to become Malaysia. In 1965, Singapore was expelled from the federation. The country is multi-ethnic and multi-cultural. About half the population is ethnically Malay, with large minorities of Chinese, Indians, and indigenous peoples. 
While recognising Islam as the country's established religion, the constitution grants freedom of religion to non-Muslims. The government is closely modelled on the Westminster parliamentary system and the legal system is based on common law. The head of state is an elected monarch, known as Yang di-Pertuan Agong, chosen from the hereditary rulers of the nine Malay states every five years. The head of government is the Prime Minister. After independence, the Malaysian GDP grew at an average of 6.5% per annum for almost 50 years. The economy has traditionally been fuelled by its natural resources but is expanding in the sectors of science, tourism, commerce and medical tourism. Malaysia has a newly industrialised market economy, ranked third-largest in Southeast Asia and 33rd-largest in the world. It is a founding member of ASEAN, EAS, OIC and a member of APEC, the Commonwealth and the Non-Aligned Movement."}}
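The `geoloc` value comes from the GeoRSS `point` property, which is bound to the `geo:` prefix in `exec_sparql_query`. It is a single string holding latitude and longitude in that order, separated by a space (`'2.5 112.5'` above). The sketch below shows a hypothetical helper that turns this string into numbers and applies a basic range check.

```python
from typing import Tuple


def parse_georss_point(point: str) -> Tuple[float, float]:
    """Parse a GeoRSS 'lat lon' string into a (latitude, longitude) tuple."""
    parts = point.split()
    if len(parts) != 2:
        raise ValueError(f"expected 'lat lon', got {point!r}")
    lat, lon = float(parts[0]), float(parts[1])
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"coordinates out of range: {point!r}")
    return lat, lon


print(parse_georss_point("2.5 112.5"))  # (2.5, 112.5)
```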
# Query DBpedia for populated places, such as cities, whose English label contains
# the given name (any dbo:Place with a population and coordinates), largest first
def query_place(place):
    place = get_clean_entity(place)
    query = """
    SELECT (SAMPLE(?name) AS ?name) (SAMPLE(?poblation) AS ?poblation) (SAMPLE(?geoloc) AS ?geoloc)
        (SAMPLE(?place) AS ?place) (SAMPLE(?description) AS ?description)
    WHERE {
        ?place rdf:type dbo:Place.
        ?place dbo:abstract ?description.
        ?place rdfs:label ?name.
        ?place dbo:populationTotal ?poblation.
        ?place geo:point ?geoloc.
        FILTER (langMatches(lang(?name),"en")).
        FILTER (langMatches(lang(?description),"en")).
        FILTER (?name like "%""" + place + """%"^^xsd:char).
    }
    GROUP BY ?place
    ORDER BY DESC(?poblation)
    """
    # Run query against DBpedia
    return exec_sparql_query(query)
# Ask DBpedia for the place 'Houston'
city_name = 'Houston'
result = query_place(city_name)
result[0]
{'name': {'type': 'literal', 'xml:lang': 'en', 'value': 'Houston'}, 'poblation': {'type': 'typed-literal', 'datatype': 'http://www.w3.org/2001/XMLSchema#nonNegativeInteger', 'value': '2100263'}, 'geoloc': {'type': 'literal', 'value': '29.762777777777778 -95.38305555555556'}, 'place': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Houston'}, 'description': {'type': 'literal', 'xml:lang': 'en', 'value': 'Houston ( () HEW-stən) is the most populous city in the U.S. state of Texas, fourth most populous city in the United States, most populous city in the Southern United States, as well as the sixth most populous in North America, with an estimated 2019 population of 2,320,268. Located in Southeast Texas near Galveston Bay and the Gulf of Mexico, it is the seat of Harris County and the principal city of the Greater Houston metropolitan area, which is the fifth most populous metropolitan statistical area in the United States and the second most populous in Texas after the Dallas-Fort Worth metroplex, with a population of 6,997,384 in 2018. Comprising a total area of 637.4 square miles (1,651 km2), Houston is the eighth most expansive city in the United States (including consolidated city-counties). It is the largest city in the United States by total area, whose government is not consolidated with that of a county, parish or borough. Though primarily in Harris County, small portions of the city extend into Fort Bend and Montgomery counties, bordering other principal communities of Greater Houston such as Sugar Land and The Woodlands. The city of Houston was founded by land investors on August 30, 1836, at the confluence of Buffalo Bayou and White Oak Bayou (a point now known as Allen\'s Landing) and incorporated as a city on June 5, 1837. The city is named after former General Sam Houston, who was president of the Republic of Texas and had won Texas\' independence from Mexico at the Battle of San Jacinto 25 miles (40 km) east of Allen\'s Landing. 
After briefly serving as the capital of the Texas Republic in the late 1830s, Houston grew steadily into a regional trading center for the remainder of the 19th century. The arrival of the 20th century saw a convergence of economic factors which fueled rapid growth in Houston, including a burgeoning port and railroad industry, the decline of Galveston as Texas\' primary port following a devastating 1900 hurricane, the subsequent construction of the Houston Ship Channel, and the Texas oil boom. In the mid-20th century, Houston\'s economy diversified as it became home to the Texas Medical Center—the world\'s largest concentration of healthcare and research institutions—and NASA\'s Johnson Space Center, where the Mission Control Center is located. Houston\'s economy since the late 19th century has a broad industrial base in energy, manufacturing, aeronautics, and transportation. Leading in healthcare sectors and building oilfield equipment, Houston has the second most Fortune 500 headquarters of any U.S. municipality within its city limits (after New York City). The Port of Houston ranks first in the United States in international waterborne tonnage handled and second in total cargo tonnage handled. Nicknamed the "Bayou City" "Space City", "H-Town", and "the 713", Houston has become a global city, with strengths in culture, medicine, and research. The city has a population from various ethnic and religious backgrounds and a large and growing international community. Houston is the most diverse metropolitan area in Texas and has been described as the most racially and ethnically diverse major metropolis in the U.S. It is home to many cultural institutions and exhibits, which attract more than 7 million visitors a year to the Museum District. Houston has an active visual and performing arts scene in the Theater District and offers year-round resident companies in all major performing arts.'}}
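`query_country` and `query_place` differ only in the class they require (`dbo:Country` versus `dbo:Place`). The sketch below shows one way to share a single query template between them. `build_populated_place_query` is hypothetical and has not been tested against the live endpoint. It makes two further changes:

- It replaces Virtuoso's `like` extension with the standard SPARQL 1.1 functions `CONTAINS`, `LCASE` and `STR`.
- It gives the inner variables names that differ from the `SAMPLE(...) AS` aliases, because standard SPARQL does not allow an alias to reuse a variable already in scope.

The result keys match those of the original functions, so the string could be passed to `exec_sparql_query` as before, for example `exec_sparql_query(build_populated_place_query("dbo:Country", "Malaysia"))`. A `CONTAINS` scan may be slower than `like` on the public endpoint.

```python
def build_populated_place_query(rdf_class: str, name: str) -> str:
    """Hypothetical shared template for query_country / query_place.

    `rdf_class` is a prefixed class such as 'dbo:Country' or 'dbo:Place'.
    The dbo: and geo: prefixes are expected to be added by exec_sparql_query.
    """
    # Escape backslashes and double quotes so `name` stays inside the literal
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT ?place (SAMPLE(?label) AS ?name) (SAMPLE(?population) AS ?poblation)
       (SAMPLE(?point) AS ?geoloc) (SAMPLE(?abstract) AS ?description)
WHERE {{
    ?place rdf:type {rdf_class} .
    ?place dbo:abstract ?abstract .
    ?place rdfs:label ?label .
    ?place dbo:populationTotal ?population .
    ?place geo:point ?point .
    FILTER (langMatches(lang(?label), "en"))
    FILTER (langMatches(lang(?abstract), "en"))
    FILTER (CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))
}}
GROUP BY ?place
ORDER BY DESC(?poblation)
"""


print(build_populated_place_query("dbo:Place", "Houston"))
```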
# Query DBpedia for companies (dbo:Company) whose English name contains the given text
def query_company(company):
    company = get_clean_entity(company)
    query = """
    SELECT (SAMPLE (?company) AS ?company) (SAMPLE(?name) AS ?name) (SAMPLE(?industry) AS ?industry)
        (SAMPLE (?foundingYear) AS ?foundingYear) (SAMPLE(?description) AS ?description)
    WHERE {
        ?company rdf:type dbo:Company.
        ?company foaf:name ?name.
        ?company dbo:industry ?industry.
        ?company dbo:foundingYear ?foundingYear.
        ?company dbo:abstract ?description.
        FILTER (?name like "%"""+ company + """%"^^xsd:char).
        FILTER (langMatches(lang(?name),"en")).
    }
    GROUP BY ?company
    ORDER BY ?company
    """
    # Run query against DBpedia
    return exec_sparql_query(query)
# Ask DBpedia for the company 'Amazon Books'
company_name = 'Amazon Books'
result = query_company(company_name)
result[0]
{'company': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Amazon_Books'}, 'name': {'type': 'literal', 'xml:lang': 'en', 'value': 'Amazon Books'}, 'industry': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Retail'}, 'foundingYear': {'type': 'typed-literal', 'datatype': 'http://www.w3.org/2001/XMLSchema#gYear', 'value': '2015'}, 'description': {'type': 'literal', 'xml:lang': 'en', 'value': 'Amazon Books is a chain of retail bookstores owned by online retailer Amazon. The first store opened on November 2, 2015, in Seattle, Washington. As of 2018, Amazon Books has a total of seventeen stores, with plans to expand to more locations.'}}
# Query DBpedia for films whose name contains the given text
def query_movie(movie):
    # Reassign the cleaned name so it is the value used in the query below
    movie = get_clean_entity(movie)
    # Grouping by both ?movie and ?actor returns one row per film and cast member
    query = """
    SELECT (SAMPLE(?name) AS ?name) (SAMPLE(?movie) AS ?movie) (SAMPLE(?actor) AS ?actor)
        (SAMPLE(?director) AS ?director) (SAMPLE(?description) AS ?description)
    WHERE {
        ?movie rdf:type dbo:Film.
        ?movie foaf:name ?name.
        ?movie dbo:starring ?actor.
        ?movie dbo:director ?director.
        ?movie dbo:abstract ?description.
        FILTER (?name like "%"""+ movie +"""%"^^xsd:char ).
        FILTER (langMatches(lang(?description),"en")).
    }
    GROUP BY ?movie ?actor
    ORDER BY ?movie
    """
    # Run query against DBpedia
    return exec_sparql_query(query)
# Ask DBpedia for movies whose name contains 'John Wick'
movie_name = 'John Wick'
result = query_movie(movie_name)
result[0]
{'name': {'type': 'literal', 'xml:lang': 'en', 'value': 'John Wick: Chapter 2'}, 'movie': {'type': 'uri', 'value': 'http://dbpedia.org/resource/John_Wick:_Chapter_2'}, 'actor': {'type': 'uri', 'value': 'http://dbpedia.org/resource/John_Leguizamo'}, 'director': {'type': 'uri', 'value': 'http://dbpedia.org/resource/Chad_Stahelski'}, 'description': {'type': 'literal', 'xml:lang': 'en', 'value': 'John Wick: Chapter 2 (also known as simply John Wick 2) is a 2017 American neo-noir action-thriller film directed by Chad Stahelski and written by Derek Kolstad. It is the second installment in the John Wick film series, and the sequel to the 2014 film John Wick. It stars Keanu Reeves, Common, Laurence Fishburne, Riccardo Scamarcio, Ruby Rose, John Leguizamo, and Ian McShane. The plot follows hitman John Wick (Reeves), who goes on the run after a bounty is placed on him. Principal photography began on October 26, 2015, in New York City. The film premiered in Los Angeles on January 30, 2017, and was theatrically released in the United States on February 10, 2017. It was acclaimed by critics, with praise for the action sequences, direction, editing, visual style and the performances of the cast, particularly Reeves. The film grossed $200 million worldwide against its $40 million budget, over twice the $85 million gross of the original film. A sequel, titled John Wick: Chapter 3 – Parabellum, was released on May 17, 2019.'}}
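Because the movie query groups by both `?movie` and `?actor`, it returns one row per film and cast member. `SAMPLE(?actor)` therefore just repeats each row's actor, and `result[0]` shows only one of them. One option, sketched below with a hypothetical `actors_by_movie` helper, is to collect each film's actors on the Python side, for example `actors_by_movie(query_movie('John Wick') or [])`. Another is to group by `?movie` alone and aggregate in SPARQL with `(GROUP_CONCAT(?actor; separator=", ") AS ?actors)`. The sample rows are abbreviated and illustrative.

```python
from typing import Dict, List


def actors_by_movie(rows: List[Dict[str, Dict[str, str]]]) -> Dict[str, List[str]]:
    """Collect the distinct actor URIs of each movie, preserving row order."""
    cast: Dict[str, List[str]] = {}
    for row in rows:
        actors = cast.setdefault(row["movie"]["value"], [])
        actor = row["actor"]["value"]
        if actor not in actors:
            actors.append(actor)
    return cast


rows = [
    {"movie": {"type": "uri", "value": "http://dbpedia.org/resource/John_Wick:_Chapter_2"},
     "actor": {"type": "uri", "value": "http://dbpedia.org/resource/John_Leguizamo"}},
    {"movie": {"type": "uri", "value": "http://dbpedia.org/resource/John_Wick:_Chapter_2"},
     "actor": {"type": "uri", "value": "http://dbpedia.org/resource/Keanu_Reeves"}},
]
print(actors_by_movie(rows))
```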