Wikidata:Lexicographical data/Documentation/Lexeme languages
The language to which a lexeme belongs is a reference to a Wikidata item for a language.
For most languages, this is a straightforward determination: English (Q1860), Thai (Q9217), Manchu (Q33638), and Gun (Q3111668) are just four of the many possibilities, since they have supported language codes en, th, mnc, and guw.
Some languages, however, now require that particular language items be used for their lexemes. This page lists some of those choices; more information about them may be found on the documentation pages for those languages.
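For the straightforward cases, the choice amounts to a lookup from a supported language code to its language item. The following is a minimal sketch, not an official tool: the names `LANGUAGE_ITEM_BY_CODE`, `language_item_for_code`, and `minimal_lexeme` are hypothetical, and the record returned by `minimal_lexeme` is a simplified illustration rather than the full lexeme data format.

```python
# Codes and items are the four examples named in the text above.
LANGUAGE_ITEM_BY_CODE = {
    "en": "Q1860",      # English
    "th": "Q9217",      # Thai
    "mnc": "Q33638",    # Manchu
    "guw": "Q3111668",  # Gun
}


def language_item_for_code(code):
    """Return the language item for a known code, or None if it is not listed."""
    return LANGUAGE_ITEM_BY_CODE.get(code)


def minimal_lexeme(lemma, code):
    """Build a simplified, hypothetical lexeme record for illustration only."""
    item = language_item_for_code(code)
    if item is None:
        raise KeyError(f"no language item listed for code {code!r}")
    return {
        "language": item,  # the lexeme language is an item reference
        "lemmas": {code: {"language": code, "value": lemma}},
    }
```

The table is deliberately limited to the examples given here; the languages discussed in the sections below need more care than a one-to-one lookup.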
Enlarged scopes of existing language items
- Turkish (Q256) encompasses the language spoken in Turkey both before and after the introduction of Latin script in 1928; items referring to 'Ottoman Turkish' should not be used in lexemes as languages (although they may be used in variety of lexeme, form or sense (P7481) statements on lexemes and senses).
- Punjabi (Q58635) encompasses the language spoken in Punjab on both sides of the Radcliffe line, rather than merely east of the line as is implied by the composition of pa.wikipedia.org; items referring to a 'Western Punjabi' should not be used in lexemes at all. There are some language varieties which form a continuum with Punjabi which are modeled separately, such as Saraiki (Q33902) and Hindko (Q382273), following lines drawn in the references cited on these lexemes. Pothohari(-Pahari) or Mirpuri is a variety of Punjabi and not an exceptionally divergent one at that; accordingly, separate treatment is not warranted.
Reduced scopes of existing language items
- Arabic (Q13955) is used exclusively for lexemes in Modern Standard Arabic (Q56467); lexemes from individual varieties should use the respective items for those varieties, from Moroccan Darija (Q56426) in the west to Gulf Arabic (Q56385) in the east.
- Vietnamese (Q9199) is used exclusively with Latin-script lemmata and form representations; lemmata and representations using Chinese characters belong on separate lexemes with language chữ Nôm (Q875344).
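The script-based split for Vietnamese can be checked mechanically. The sketch below is an illustration under stated assumptions, not an established validation rule: the function names are hypothetical, Han characters are detected through their Unicode character names, and a lemma containing any Han character is treated as belonging on a chữ Nôm lexeme (how to handle mixed-script lemmata is not specified above).

```python
import unicodedata

VIETNAMESE = "Q9199"  # Latin-script lemmata and form representations
CHU_NOM = "Q875344"   # lemmata and representations using Chinese characters

# Unicode name prefixes that identify Han ideographs.
_HAN_NAME_PREFIXES = ("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH")


def contains_han(text):
    """Return True if any character in text is a Han ideograph."""
    return any(
        unicodedata.name(ch, "").startswith(_HAN_NAME_PREFIXES)
        for ch in text
    )


def vietnamese_language_item(lemma):
    """Suggest the lexeme language item for a Vietnamese lemma.

    Assumption: any Han character means the lemma goes on a chữ Nôm lexeme.
    """
    return CHU_NOM if contains_han(lemma) else VIETNAMESE
```

For example, `vietnamese_language_item("người")` suggests Q9199, while a lemma written with Chinese characters suggests Q875344.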
Uses of less-expected language items
- Modern Greek (Q36510) is used instead of Greek (Q9129) for lexemes in modern demotic Greek, to distinguish from Ancient Greek (Q35497) being applied to lexemes used in ancient times.
- Hindustani (Q11051) is used instead of Hindi (Q1568) or Urdu (Q1617).
- Standard Mandarin (Q727694) is used instead of Mandarin (Q9192) (and especially instead of Chinese (Q7850)) for lexemes in the language used officially by the governments in Beijing and Taipei.
- New Persian (Q56356571) is used instead of Persian (Q9168) or Tajik (Q9260); lexeme senses used in a particular country (whether Iran for Persian, Afghanistan for Dari, or Tajikistan for Tajik) may be marked with location of sense usage (P6084) statements.
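The substitutions in this list can be summarized as a lookup from the less suitable item to the one used for lexemes. This is a hedged sketch: `PREFERRED_LANGUAGE_ITEM` and `preferred_language_item` are hypothetical names, and the mapping is a hint for a human editor rather than an automatic rewrite. A lexeme tagged Greek (Q9129) may actually belong under Ancient Greek (Q35497), for instance, and one tagged Chinese (Q7850) may belong to a variety other than Standard Mandarin.

```python
# Less-expected language items, keyed by the item they replace.
PREFERRED_LANGUAGE_ITEM = {
    "Q9129": "Q36510",     # Greek -> Modern Greek (modern demotic lexemes only)
    "Q1568": "Q11051",     # Hindi -> Hindustani
    "Q1617": "Q11051",     # Urdu -> Hindustani
    "Q9192": "Q727694",    # Mandarin -> Standard Mandarin
    "Q7850": "Q727694",    # Chinese -> Standard Mandarin (official language only)
    "Q9168": "Q56356571",  # Persian -> New Persian
    "Q9260": "Q56356571",  # Tajik -> New Persian
}


def preferred_language_item(item_id):
    """Return the suggested lexeme language item, or item_id if none is listed."""
    return PREFERRED_LANGUAGE_ITEM.get(item_id, item_id)
```

Country-specific usage, such as a New Persian sense used only in Tajikistan, is not captured by this lookup; as noted above, it is recorded with location of sense usage (P6084) statements on the sense.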
Unresolved problem areas
- The language spoken by the major parties to the Yugoslav Wars (Q242352)
- The language spoken by both sides of the Konfrontasi