Wikidata:Property proposal/ConLang Code Registry code


ConLang Code Registry code


   Under discussion
Description: 3-letter identifier for a language defined in the ConLang Code Registry, using codes reserved for private use in ISO 639-3
Represents: ConLang Code Registry (Q107713633)
Data type: String
Domain: languages
Example 1: Black Speech (Q686210) → qbs
Example 2: Ithkuil (Q35846) → qit
Example 3: Idiom Neutral (Q35847) → qin
Source: https://www.kreativekorp.com/clcr/
Planned use: assign value to relevant languages
Number of IDs in source: 255
Expected completeness: eventually complete (Q21873974)
Robot and gadget jobs: Not that I know of

Motivation


Currently, many conlangs (constructed languages) that don't have official ISO 639-3 codes have an ISO 639-3 code (P220) value listed anyway, for a code issued by the ConLang Code Registry. These statements generally take the form: Black Speech (Q686210) → ISO 639-3 code (P220) → qbs, with the qualifier issued by (P2378) → ConLang Code Registry (Q107713633). The problem is that this is wrong: the ISO 639-3 standard defines the codes qaa-qtz as "reserved for private use", which means that anyone can use them for any purpose and still be compliant with the standard. The fact that the ConLang Code Registry has issued codes in this range does not mean that these conlangs have actually received ISO 639-3 codes, though; it just means that a completely separate organisation has decided to issue codes which happen to fall in ISO 639-3's private use range.

This is obviously intentional, as it allows them to be used as language codes without conflicting with the official ISO standard, but that doesn't mean we should have statements which claim they're actually part of the standard. Simply saying that the code is "issued by the ConLang Code Registry" doesn't make it clear that the ConLang Code Registry is not entitled to issue ISO 639-3 codes, and therefore obscures the fact that the code isn't actually part of the standard in the first place (i.e. that the statement is wrong). Outside of conlang communities, these codes see almost no use, and they may conflict with other privately-issued codes used in other communities.

To get around the issue, I'd like to propose a separate property for the ConLang Code Registry, so that these codes can be listed without polluting the data for those who only want information on official ISO 639-3 codes. – The preceding unsigned comment was added by Theknightwho (talk • contribs) at 12:47, August 16, 2024 (UTC).
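
As a minimal sketch of the range described above (not part of the proposal itself), the following Python checks whether a 3-letter code falls in the private-use range qaa-qtz and counts how many codes that range contains. The names is_private_use and PRIVATE_USE_COUNT are hypothetical, chosen only for illustration.

```python
import string


def is_private_use(code: str) -> bool:
    """Return True if code lies in the ISO 639-3 private-use range qaa-qtz."""
    return (
        len(code) == 3
        and code[0] == "q"
        and "a" <= code[1] <= "t"              # second letter: a through t
        and code[2] in string.ascii_lowercase  # third letter: a through z
    )


# Size of the range, counted by enumerating every q?? combination.
PRIVATE_USE_COUNT = sum(
    1
    for second in string.ascii_lowercase
    for third in string.ascii_lowercase
    if is_private_use("q" + second + third)
)

if __name__ == "__main__":
    for example in ("qbs", "qit", "qin", "eng"):
        print(example, is_private_use(example))
    print("codes in range:", PRIVATE_USE_COUNT)
```

A check like this only says that a code is in the private-use range; it says nothing about which body, if any, has assigned a meaning to it.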

Discussion

  •  Support This makes sense to me. Are there really 255 of these? ArthurPSmith (talk) 19:24, 16 August 2024 (UTC)[reply]
    @ArthurPSmith Yes - 252 active codes and 3 retired ones according to https://www.kreativekorp.com/clcr/. There are 520 codes in the private use range (qaa-qtz), which is the theoretical maximum, and growth is quite uneven, with some years seeing very few codes added (2 added in 2021), while others see many (32 added in 2023). Codes are retired if they're replaced by genuine ISO 639-3 codes. Theknightwho (talk) 21:18, 16 August 2024 (UTC)[reply]
  •  Support --Lewis Hulbert (talk) 11:16, 19 August 2024 (UTC)[reply]
  •  Oppose I don't think this would be a good idea.
    It's not a coincidence that the ConLang Code Registry (CLCR) codes are in ISO 639-3's private-use range: the whole point is that they are private-use ISO 639-3 codes for use as ISO 639-3 codes.
    ISO 639-3 provides the set of private-use codes for people to use how they want, which means that users assign their own meanings. It's true that other places may assign conflicting meanings, but that's why those statements all have a qualifier saying who gives it that meaning. Conflicts shouldn't be a problem either: we can use constraints like single-best-value constraint (Q52060874) and/or add separator (P4155) to the constraint.
    The existing statements shouldn't pollute the data if you're using it correctly. The items with private-use codes also have a no value statement set to preferred rank to indicate that they have no official code, and if you're looking at all statements, you need to take ranks and qualifiers into account for the data to be meaningful anyway.
    The real issue here is: How should we model usage of private-use codes? This isn't limited to CLCR or even ISO 639-3 (e.g. XK is widely used as the ISO 3166-1 code for Kosovo, Qaag is used by Unicode and CLDR as the ISO 15924 code for Zawgyi, U+F8D0 is the Unicode codepoint normally used for the Klingon letter "a"), so I don't think it makes sense to have a property specifically for CLCR's private-use ISO 639-3 codes. We should have a model we can apply to private-use codes in general instead.
    - Nikki (talk) 09:23, 28 August 2024 (UTC)[reply]
    @Nikki These are not ISO 639 codes, though - these are codes assigned by another body which have been intentionally designed to match up with ISO 639's private use range. It's not a question of qualifying the codes or setting no value as a preferred rank - it is factually wrong to say that they are ISO 639 codes, and they shouldn't be listed there at all. Theknightwho (talk) 22:09, 5 September 2024 (UTC)[reply]
    @Nikki, any changes in your opinion based on the response? Regards, ZI Jony (Talk) 18:25, 16 September 2024 (UTC)[reply]
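
The point in the discussion about taking ranks and qualifiers into account can be illustrated with a small Python sketch. It assumes a simplified, hypothetical Statement class rather than the real Wikidata JSON format, and the sample data is made up to match the modelling described in the thread (a preferred-rank "no value" statement alongside a normal-rank private-use code qualified with issued by (P2378)). The selection rule used here, preferred-rank statements if any exist and otherwise normal-rank ones, is the usual "best rank" reading of Wikidata ranks.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Statement:
    """Hypothetical, simplified stand-in for a Wikidata statement."""
    value: Optional[str]       # None stands in for a "no value" statement
    rank: str = "normal"       # "preferred", "normal" or "deprecated"
    qualifiers: dict = field(default_factory=dict)


def best_statements(statements):
    """Keep preferred-rank statements if any exist, otherwise normal-rank ones."""
    preferred = [s for s in statements if s.rank == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s.rank == "normal"]


# Illustrative ISO 639-3 code (P220) statements for Black Speech (Q686210).
black_speech_p220 = [
    Statement(value=None, rank="preferred"),
    Statement(value="qbs", qualifiers={"P2378": "Q107713633"}),
]

# A consumer that respects ranks sees only the "no value" statement.
official_codes = [
    s.value for s in best_statements(black_speech_p220) if s.value is not None
]

# A consumer that ignores ranks picks up the private-use code as well.
all_codes = [s.value for s in black_speech_p220 if s.value is not None]

if __name__ == "__main__":
    print("rank-aware:", official_codes)
    print("rank-blind:", all_codes)
```

The two result lists show where the two positions in the thread diverge: a rank-aware reader does not see qbs as an official code, while a reader that takes every P220 value at face value does.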