
Special:Linksearch should de/encode internationalized domain names
Closed, Resolved. Public. 8 Estimated Story Points.

Description

http://xn--kbenhavn-54a.eu is the encoded (Punycode, ASCII-only) version of the internationalized domain name http://københavn.eu . Steps to reproduce:

  1. Place a link to either of the above domains on any wiki page.
  2. Search for the other one in Special:Linksearch.

The link created in Step 1 is not returned. Special:Linksearch should find links for all combinations of a de/encoded search string and a de/encoded link.
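
A minimal sketch of the expected behaviour, assuming Python's built-in `idna` codec (which implements IDNA 2003) as a stand-in for whatever conversion MediaWiki would actually use. The helper name `to_unicode_host` is hypothetical. Normalizing both the stored link and the search string to one form makes all four combinations compare equal:

```python
def to_unicode_host(host: str) -> str:
    """Return the Unicode form of a hostname, whether it arrives encoded or not."""
    # Hostnames are case-insensitive, so lowercase first.
    # str.encode("idna") converts Unicode labels to xn-- labels and leaves
    # ASCII labels as they are; bytes.decode("idna") then converts any
    # xn-- labels back to Unicode.
    return host.lower().encode("idna").decode("idna")


stored_link = "xn--kbenhavn-54a.eu"   # link as placed on the wiki page
search_string = "københavn.eu"        # what the user types into the search

# Compare the normalized forms instead of the raw strings.
print(to_unicode_host(stored_link) == to_unicode_host(search_string))
```

Normalizing to the encoded form instead would work equally well for the comparison; the choice only affects what ends up stored and displayed.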

Version: 1.27.0-wmf.17 (rMWd511973bb2ba) 03:00, 18 March 2016


Event Timeline

Change 320721 had a related patch set uploaded (by Legoktm):
Parser: normalize internationalized domain names

https://gerrit.wikimedia.org/r/320721

My patch normalizes the externallinks table to the Unicode version, but requires that you use the Unicode version on Special:LinkSearch. Is that acceptable?

  • It's necessary, but not sufficient -- one could imagine someone placing a link like xn--kbenhavn-54a.eu (e.g. https://en.wikipedia.org/?diff=748790713). It violates the principle of least astonishment that when one copies the link into Special:Linksearch -- the usual workflow -- no results are returned. Those not technically inclined wouldn't know what to search for, and those who are have to go through the step of encoding the domain name using some external tool.
  • Are there enough links of the form xn--kbenhavn-54a.eu existing in our databases to justify a maintenance script? (My gut feeling is no.)
  • Does this patch fix T130483 as a side effect?
  • It's necessary, but not sufficient -- one could imagine someone placing a link like xn--kbenhavn-54a.eu (e.g. https://en.wikipedia.org/?diff=748790713). It violates the principle of least astonishment that when one copies the link into Special:Linksearch -- the usual workflow -- no results are returned. Those not technically inclined wouldn't know what to search for, and those who are have to go through the step of encoding the domain name using some external tool.

Alright, I'll work on that then.

  • Are there enough links of the form xn--kbenhavn-54a.eu existing in our databases to justify a maintenance script? (My gut feeling is no.)

No idea, but we can either wait for pages to be naturally purged or use refreshLinks.php. But since you don't think so, we can just let it happen naturally.

  • Does this patch fix T130483 as a side effect?

Kind of. It normalizes everything to the Unicode form, so regexes that are in Unicode will get matched against both decoded and encoded domains.

  • Does this patch fix T130483 as a side effect?

Kind of. It normalizes everything to the Unicode form, so regexes that are in Unicode will get matched against both decoded and encoded domains.

I would consider that sufficient, as long as it is documented somewhere.
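
For documentation purposes, a rough sketch of why a regex written in Unicode ends up matching both forms once links are normalized first. It assumes Python's built-in `idna` codec (IDNA 2003) rather than MediaWiki's actual code; `to_unicode_host`, `host_matches`, and the pattern itself are hypothetical names and values chosen for illustration:

```python
import re


def to_unicode_host(host: str) -> str:
    # Lowercase, then round-trip through the idna codec so that xn-- labels
    # come back as Unicode and Unicode labels stay Unicode.
    return host.lower().encode("idna").decode("idna")


# A rule written in Unicode, as a user would naturally type it.
pattern = re.compile(r"københavn\.eu$")


def host_matches(host: str) -> bool:
    # Match against the normalized host, never the raw string.
    return pattern.search(to_unicode_host(host)) is not None


print(host_matches("københavn.eu"))
print(host_matches("xn--kbenhavn-54a.eu"))
```

A regex written against the xn-- form would not match after this normalization, which is the caveat that needs documenting.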

Change 322729 had a related patch set uploaded (by Anomie):
Use new externallinks.el_index_60 field

https://gerrit.wikimedia.org/r/322729

kaldari set the point value for this task to 8. Jun 5 2017, 10:34 PM

@MaxSem: Note that https://gerrit.wikimedia.org/r/322729 would solve this task. Current status of that change is that the schema change in T153182 needs doing, then https://gerrit.wikimedia.org/r/#/c/322728/ needs merging, then a maintenance script needs running.

MaxSem removed MaxSem as the assignee of this task. Jun 6 2017, 7:50 PM
MaxSem removed a project: Community-Tech-Sprint.
MaxSem subscribed.

Okay, leaving this to avoid stepping on your toes.

Change 320721 abandoned by Legoktm:
Parser: normalize internationalized domain names

Reason:
Anomie's patch will take care of this once the schema change is done.

https://gerrit.wikimedia.org/r/320721

Change 322729 merged by jenkins-bot:
[mediawiki/core@master] Use new externallinks.el_index_60 field

https://gerrit.wikimedia.org/r/322729

Newly-created IDN links will now be translated.

Existing IDN links may not be found correctly for all searches until T209373: Run maintenance/refreshExternallinksIndex.php on all wikis is complete.