Wikidata:Requests for comment/Unifying GO activities and enzyme articles
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- no consensus --Emu (talk) 12:02, 3 February 2024 (UTC)
An editor has requested the community to provide input on "Unifying GO activities and enzyme articles" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.
If you have an opinion regarding this issue, feel free to comment below. Thank you!
THIS RFC IS CLOSED. Please do NOT vote or add comments.
Most (but not all) binary duplicates on EC enzyme number (P591) are caused by different bots creating separate entries for the same enzyme activity. One comes from the various Wikipedia bots that create items for flat articles (such as en:Thymidine-triphosphatase, EC 3.6.1.39, Thymidine-triphosphatase (Q7799624)). The other comes from a huge import from Gene Ontology (GO) at some point in the history of Wikidata (thymidine-triphosphatase activity (Q22320779)). These describe the same thing and should be merged across the whole site. In fake bot-speak that would be:
- For items with Gene Ontology ID (P686) and P591, and known to have a P591 duplicate:
  - If the P591 value ends with ".-", leave it for now.
  - If the item is a subclass of an item with the same P591, leave it.
  - Find the duplicate with instance of (P31) enzyme (Q8047) and merge with it.
  - If the merge fails, add instance of (P31) Wikimedia duplicated page (Q17362920) to it.
Before moving on to the bot request, however, I figured it would be a good idea to RFC this, since it is expected to be a huge move. And bot requests need justification anyway. --Artoria2e5 (talk) 15:01, 12 June 2019 (UTC)
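The "fake bot-speak" rules in the proposal can be sketched as a small Python function. This is only an illustration under stated assumptions: items are modelled as in-memory records rather than fetched from Wikidata, and `try_merge` is a hypothetical callback (for example a wrapper around the Wikibase `wbmergeitems` API module) that returns whether the merge succeeded. The names `Item` and `plan_merges` are made up for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple

ENZYME = "Q8047"              # enzyme
DUPLICATE_PAGE = "Q17362920"  # Wikimedia duplicated page


@dataclass
class Item:
    # Hypothetical in-memory model of a Wikidata item, not the Wikibase JSON format.
    qid: str
    ec_number: str                                        # P591 value
    go_id: Optional[str] = None                           # P686 value, if any
    instance_of: Set[str] = field(default_factory=set)    # P31 values
    subclass_of: Set[str] = field(default_factory=set)    # P279 values


def plan_merges(
    items: List[Item], try_merge: Callable[[str, str], bool]
) -> List[Tuple[str, str, Optional[str]]]:
    """Apply the proposed rules; return (action, source QID, target QID) tuples."""
    by_ec = defaultdict(list)
    for item in items:
        by_ec[item.ec_number].append(item)

    actions = []
    for ec, group in by_ec.items():
        if len(group) < 2 or ec.endswith(".-"):
            continue  # no P591 duplicate, or incomplete EC number: leave for now
        same_ec = {i.qid for i in group}
        for item in group:
            if item.go_id is None:
                continue  # only GO-derived items are merge sources
            if item.subclass_of & same_ec:
                continue  # subclass of an item with the same P591: leave it
            targets = [i for i in group if i.qid != item.qid and ENZYME in i.instance_of]
            if not targets:
                continue  # no "instance of enzyme" duplicate to merge into
            if try_merge(item.qid, targets[0].qid):
                actions.append(("merge", item.qid, targets[0].qid))
            else:
                item.instance_of.add(DUPLICATE_PAGE)  # flag as Wikimedia duplicated page
                actions.append(("flag_duplicate", item.qid, None))
    return actions
```

Grouping by the P591 value first means the "known to have a P591 duplicate" condition is just a group size check, and the `.-` rule can skip a whole group at once.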
Doc James
Bluerasberry
Gambo7
Daniel Mietchen
Andrew Su
Andrux
Pavel Dušek
Mvolz
User:Jtuom
Chris Mungall
ChristianKl
Gstupp
Sintakso
علاء
Adert
CFCF
Jtuom
Drchriswilliams
Okkn
CAPTAIN RAJU
LeadSongDog
Ozzie10aaaa
Marsupium
Netha Hussain
Abhijeet Safai
Seppi333
Shani Evenstein
Csisc
TiagoLubiana
ZI Jony
Antoine2711
JustScienceJS
Scossin
Josegustavomartins
Zeromonk
The Anome
Kasyap
JMagalhães
Ameer Fauri
CorraleH
- Comment Unrelated, but ontologically, enzyme items are usually subclasses of enzyme (Q8047), which are in turn subclasses, not instances, of protein (Q8054). See for example User:TomT0m/Classification for an explanation, or deoxyribodipyrimidine photo-lyase (Q424241), which uses subclass of (P279), not instance of (P31). Did you mean subclass of (P279) instead of instance of (P31) in your proposal? author TomT0m / talk page 15:52, 17 June 2019 (UTC)
- Support. The same issue already exists for Wikidata items of drugs and drug classes. I am working on a project to adjust the Wikidata biomedical taxonomy. All contributors are invited to join the project. --Csisc (talk) 13:23, 19 June 2019 (UTC)
- Support (originally Comment) Conceptually I'm very supportive of this plan, but would like to see what an example merged item would look like in practice. I just want to make sure I understand how this would impact User:ProteinBoxBot, which I help manage. @Artoria2e5:, can you perform one merge as an example? (I'd do it myself, but I'm not confident I wouldn't mess it up or do it differently from how you envision.) Best, Andrew Su (talk) 18:59, 20 June 2019 (UTC)
- I have done... enough of them to make me feel annoyed about this issue. D-alanine-D-alanine ligase activity (Q21199314) is an example, although admittedly I am not yet sure about the ontology/subclass issue. --Artoria2e5 (talk) 18:52, 22 June 2019 (UTC)
- Technically, the things being classified are usually concrete objects of the real world: for example, my car or my stomach are examples of the « car » or « stomach » concepts. « My stomach » is then unambiguously an instance of « stomach », and could never be a subclass. « instance of » statements should be read as « is an example of », and « subclass of » statements as « are examples of ». For example, « human stomach(s) » are examples of « stomach(s) », hence « subclass of » is the right property. Here, enzymes are examples of proteins, so the right property is « subclass of ». There is an exception to this rule, the so-called « meta-classes ». I tried to summarize these in User:TomT0m/Classification. See also en:Is-a (on the ambiguity of the « is a » relationship, which is why there are both instance of and subclass of) and/or en:type-token distinction for a philosophical perspective, and en:metaclass (semantic web) for metaclasses. author TomT0m / talk page 21:09, 22 June 2019 (UTC)
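The distinction described above can be illustrated with a toy graph in Python. This is a sketch on hypothetical string labels, not Wikidata data: "instance of" links a concrete thing to a class, "subclass of" links a class to a broader class, and membership propagates up the subclass chain. The dictionaries and function names are made up for the illustration.

```python
from typing import Dict, Set

# Toy graph (hypothetical labels): P31 "instance of" links a concrete thing to a class,
# P279 "subclass of" links a class to a broader class.
INSTANCE_OF: Dict[str, Set[str]] = {"my stomach": {"human stomach"}}
SUBCLASS_OF: Dict[str, Set[str]] = {
    "human stomach": {"stomach"},
    "deoxyribodipyrimidine photo-lyase": {"enzyme"},
    "enzyme": {"protein"},
}


def superclasses(cls: str) -> Set[str]:
    """cls plus every class reachable through subclass-of edges (transitive)."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def is_instance(thing: str, cls: str) -> bool:
    """True if thing is an example of cls, directly or through a subclass chain."""
    return any(cls in superclasses(c) for c in INSTANCE_OF.get(thing, ()))


def is_subclass(cls: str, broader: str) -> bool:
    """True if every example of cls is also an example of broader."""
    return broader in superclasses(cls)
```

In this model « my stomach » is an instance of « stomach » via « human stomach », while « human stomach » and « enzyme » are only subclasses, never instances, of « stomach » and « protein ».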
- Thanks for the example, Artoria2e5. Yup, that looks fine from the ProteinBoxBot perspective. I think the reference to "imported from Wikipedia" is not necessary when we have a more authoritative source already there, but not a big deal either. Best, Andrew Su (talk) 19:10, 24 June 2019 (UTC)
- Support. Please make more of these fixes. --SCIdude (talk) 13:50, 29 July 2019 (UTC)
@Artoria2e5: What happens next? Are you still prepared to merge these entries? The issue keeps getting in the way of my editing. --SCIdude (talk) 07:36, 29 August 2019 (UTC)
- Comment Please also add the qualifier "mapping relation type" → "exact match" to the merged entries. This would make it possible to distinguish them from items of specific proteins, and we could then remove the distinctiveness criterion. --SCIdude (talk) 07:48, 29 August 2019 (UTC)
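A minimal Python sketch of how such a qualifier could be used to tell the merged concept items apart from specific proteins that share the same P591 value. It assumes a simplified statement model keyed by the qualifier and value labels quoted above, not the real Wikibase JSON, and the QIDs in any usage are hypothetical; `Statement` and `split_by_mapping` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Statement:
    # Simplified model: qualifiers are stored by label, not by property/item ID.
    prop: str                                                  # e.g. "P591" (EC enzyme number)
    value: str
    qualifiers: Dict[str, str] = field(default_factory=dict)  # qualifier label -> value label


def split_by_mapping(items: Dict[str, List[Statement]], ec: str) -> Tuple[List[str], List[str]]:
    """Split items carrying P591 = ec into (exact-match concept items, other items)."""
    exact, other = [], []
    for qid, statements in items.items():
        ec_statements = [s for s in statements if s.prop == "P591" and s.value == ec]
        if not ec_statements:
            continue  # item does not carry this EC number at all
        if any(s.qualifiers.get("mapping relation type") == "exact match" for s in ec_statements):
            exact.append(qid)
        else:
            other.append(qid)  # e.g. an item for a specific protein with this activity
    return exact, other
```

With the qualifier in place, items sharing an EC number no longer need to be unique: only one item per EC number is expected in the "exact" list.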
Final comment: I see. There is no way to get a list of pairs to merge except via manual inspection, and then you can merge manually as well... --SCIdude (talk) 09:55, 2 September 2019 (UTC)
- Oppose. Actually, I'm now against this. The concepts "enzyme family" and "molecular function" are too different. The strongest counterargument is that the activity item has an "exact match" statement, which the merge would falsify. Also, proteins "have" the activity/function through molecular function (P680), and the proposal would merge the subject and object of such a statement: how, for example, would we state that "the xyz group of methyltransferases has methyltransferase activity"? The problems mentioned, such as items being instances of enzyme, should be dealt with separately from merging, by making them instances of enzyme family. Finally, note that WD should reflect reality (through imported data), not Wikipedia article design, which has its own demands. --SCIdude (talk) 16:10, 2 September 2019 (UTC)
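The objection about merging the subject and object of a molecular function (P680) statement can be shown with a toy triple list in Python. This assumes a deliberately simplified merge that rewrites every reference to the merged item (real Wikidata merges leave a redirect instead), and the item labels used with it are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject item, property, object item)


def merge_items(triples: List[Triple], source: str, target: str) -> List[Triple]:
    """Simplified merge: every reference to source is rewritten to target."""
    def rename(node: str) -> str:
        return target if node == source else node

    return [(rename(s), p, rename(o)) for s, p, o in triples]


def self_referencing(triples: List[Triple]) -> List[Triple]:
    """Statements whose subject and object have become the same item."""
    return [t for t in triples if t[0] == t[2]]
```

For example, merging a hypothetical "methyltransferase (enzyme family)" item into "methyltransferase activity" turns the statement "(family) P680 (activity)" into "(activity) P680 (activity)", which is exactly the collapse described in the comment above.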
- Oppose. An enzyme is not an "activity", and WD does not store data about instances of these things, but about their classes. MrProperLawAndOrder (talk) 01:45, 15 May 2020 (UTC)
Info I propose to close this RfC as no consensus after 31 January 2024. Please comment if you don’t agree. --Emu (talk) 20:17, 13 January 2024 (UTC)