This is the specification of the RDF mapping of the Wikibase Lexeme data model. It is based on the Wikibase RDF dump format. Unless stated otherwise, prefixes are those defined by the Wikibase RDF dump format. Where relevant, it reuses the lemon model of the W3C Ontolex community group.
Lexeme
Example:
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
wd:L64723 a wikibase:Lexeme , ontolex:LexicalEntry ;
    # lemma
    wikibase:lemma "hard"@en ;
    rdfs:label "hard"@en ;
    # language
    dct:language wd:Q1860 ;
    # lexical category
    wikibase:lexicalCategory wd:Q34698 ;
    # statements
    wdt:P2 wd:Q3 ;
    wdt:P7 "value1" , "value2" ;
    p:P2 wds:Q3-4cc1f2d1-490e-c9c7-4560-46c3cce05bb7 ;
    p:P7 wds:Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37 ,
         wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c ;
    # forms
    ontolex:lexicalForm wd:L64723-F1 ;
    # senses
    ontolex:sense wd:L64723-S1 .
Comments:
- Classes
- The lexeme concept of Wikibase aligns well with ontolex:LexicalEntry. The class wikibase:Lexeme is also used, for consistency with wikibase:Item and wikibase:Property.
- Lemma
- We use the custom property wikibase:lemma. The closest lemon relation is ontolex:canonicalForm, but its range is ontolex:Form, so it cannot point directly at a literal. Using wikibase:lemma instead of the generic rdfs:label that items use (and possibly also schema:name and skos:prefLabel) has two advantages: lexemes do not appear in existing SPARQL queries that rely on rdfs:label, and lexemes can be queried by lemma with a single triple pattern.
- Language
- We use the Dublin Core language property (dct:language), just like the lemon examples. We do not reuse schema:inLanguage directly because it is already used in the Wikibase sitelink representation, where its range is a BCP 47 language code. Emitting schema:inLanguage as a derived value holding the BCP 47 code of the language, when one exists, is planned but not implemented yet.
- Lexical category
- We use our own wikibase:lexicalCategory property in order to avoid a slight abuse of lexinfo:partOfSpeech, a property of the lexinfo lemon extension that is restricted to parts of speech.
- Statements
- For consistency and simplicity we use the same schema as for items and properties.
- Forms
- The relation between Lexemes and Forms uses the ontolex:lexicalForm relation. See the Form section for the representation of forms.
- Senses
- The relation between Lexemes and Senses uses the ontolex:sense relation. See the Sense section for the representation of senses.
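The single-triple-pattern lookup mentioned under Lemma can be sketched in Python as a small query builder. This is a minimal illustration, not part of the mapping: the name lemma_query is hypothetical, the language-tag check is a loose assumption rather than a BCP 47 validator, and the query assumes that the target SPARQL endpoint already declares the wikibase: prefix (otherwise a PREFIX line has to be prepended).

```python
import re

# Loose shape check for a language tag such as "en" or "en-gb" (an assumption
# for this sketch; it is not a full BCP 47 validator).
_LANG_TAG = re.compile(r"[A-Za-z]+(?:-[A-Za-z0-9]+)*")


def lemma_query(lemma: str, language_tag: str) -> str:
    """Build a SPARQL query that selects lexemes by exact lemma and language."""
    if _LANG_TAG.fullmatch(language_tag) is None:
        raise ValueError(f"unexpected language tag: {language_tag!r}")
    # Escape backslashes first, then double quotes. Other characters that are
    # special in SPARQL literals (such as newlines) are not handled here.
    escaped = lemma.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?lexeme WHERE {\n"
        f'  ?lexeme wikibase:lemma "{escaped}"@{language_tag} .\n'
        "}"
    )


print(lemma_query("hard", "en"))
```

The WHERE clause contains exactly one triple pattern, and only lexemes can match it because no other entity type carries wikibase:lemma.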
Form
Example:
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
wd:L64723-F1 a wikibase:Form , ontolex:Form ;
    # representation
    ontolex:representation "hard"@en ;
    rdfs:label "hard"@en ;
    # grammatical features
    wikibase:grammaticalFeature wd:Q1234 , wd:Q2345 ;
    # statements
    wdt:P2 wd:Q3 ;
    wdt:P7 "value1" , "value2" ;
    p:P2 wds:Q3-4cc1f2d1-490e-c9c7-4560-46c3cce05bb7 ;
    p:P7 wds:Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37 ,
         wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c .
Comments:
- Classes
- The form concept of Wikibase aligns with ontolex:Form. The additional class wikibase:Form is also used.
- Representation
- We use the ontolex:representation relation from lemon. We do not use its subproperty ontolex:writtenRep, so that representations in phonetic variants of languages remain possible, even though the lemon specification recommends against using ontolex:representation directly. rdfs:label is also emitted for interoperability reasons.
- Grammatical Features
- We use a custom property wikibase:grammaticalFeature because lemon has no such relation with ontolex:Form as its domain.
- Statements
- For consistency and simplicity we use the same schema as for items and properties.
Sense
Example:
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
wd:L64723-S1 a wikibase:Sense , ontolex:LexicalSense ;
    # gloss
    skos:definition "presenting difficulty"@en ;
    rdfs:label "presenting difficulty"@en ;
    # statements
    wdt:P2 wd:Q3 ;
    wdt:P7 "value1" , "value2" ;
    p:P2 wds:Q3-4cc1f2d1-490e-c9c7-4560-46c3cce05bb7 ;
    p:P7 wds:Q3-24bf3704-4c5d-083a-9b59-1881f82b6b37 ,
         wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c .
Comments:
- Classes
- The sense concept of Wikibase aligns with ontolex:LexicalSense. The additional class wikibase:Sense is also used.
- Gloss
- We use skos:definition for the gloss, following lemon usage. rdfs:label is also emitted for interoperability reasons, even though a gloss is not really a label.
- Statements
- For consistency and simplicity we use the same schema as for items and properties.
Data node
Example:
wdata:L64723 schema:version "59"^^xsd:integer ;
    schema:dateModified "2015-03-18T22:38:36Z"^^xsd:dateTime ;
    a schema:Dataset ;
    schema:about wd:L64723 .
For each Lexeme, a data node should be returned whose URI mirrors the Lexeme ID: wdata:L1 for the Lexeme wd:L1. It should use the same schema as the data nodes of Wikibase items and properties. It could also provide some statistics based on page properties, just as for items.
Note: There is no specific data node for forms and senses because the granularity of data nodes is the data container (wiki page). It is not a strong limitation because it is easy to retrieve the data node of the Lexeme they belong to with the property path schema:about/ontolex:lexicalForm or schema:about/ontolex:sense.
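The relation between an entity and its data node can be sketched in Python as follows. This is only an illustration: the helper name data_node_for is hypothetical, and the ID shapes it accepts (L64723, L64723-F1, L64723-S1) are an assumption inferred from the examples in this document rather than a formal grammar.

```python
import re

# Assumed ID shapes, inferred from the examples in this document:
# "L64723" (lexeme), "L64723-F1" (form), "L64723-S1" (sense).
_ENTITY_ID = re.compile(r"(L\d+)(?:-[FS]\d+)?")


def data_node_for(entity_id: str) -> str:
    """Return the prefixed name of the data node that covers entity_id.

    Forms and senses have no data node of their own, so they resolve to the
    data node of the lexeme they belong to.
    """
    match = _ENTITY_ID.fullmatch(entity_id)
    if match is None:
        raise ValueError(f"not a lexeme, form or sense ID: {entity_id!r}")
    return "wdata:" + match.group(1)


for entity in ("L64723", "L64723-F1", "L64723-S1"):
    print(entity, "->", data_node_for(entity))
```

Inside a SPARQL query the same lookup needs no string handling, since the property paths given above already connect the data node to its forms and senses.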
Wikidata Query Service
Wikidata Query Service does not provide the following features (mostly for performance reasons):
- The wikibase:Lexeme, wikibase:Form and wikibase:Sense classes.
- The rdfs:label relations (more specific equivalents exist for lexemes, forms and senses).
- Just as for items and properties, the data node is integrated within the wd: node.
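Since rdfs:label is not available in the query service, a query has to use the specific property of each entity type instead. The mapping below only restates the properties given in the Lexeme, Form and Sense sections. The names TEXT_PROPERTY and text_pattern are hypothetical helpers introduced for this sketch.

```python
# Property carrying the human-readable text of each entity type, as given in
# the Lexeme, Form and Sense sections of this document.
TEXT_PROPERTY = {
    "lexeme": "wikibase:lemma",
    "form": "ontolex:representation",
    "sense": "skos:definition",
}


def text_pattern(entity_type: str, subject: str = "?entity", value: str = "?text") -> str:
    """Return the triple pattern to use in place of an rdfs:label pattern."""
    try:
        prop = TEXT_PROPERTY[entity_type]
    except KeyError:
        raise ValueError(f"unknown entity type: {entity_type!r}") from None
    return f"{subject} {prop} {value} ."


for kind in TEXT_PROPERTY:
    print(text_pattern(kind))
```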