PageHTMLHandler currently makes an HTTP request to RESTbase to fetch Parsoid HTML. It should instead call Parsoid directly. The critical aspect to consider is caching. The REST endpoint handled by PageHTMLHandler is not yet used for any public functionality, so cache capacity is not yet a problem. However, the intent is to move us towards eventually using the core REST API as a backend for VisualEditor. For that to be feasible, we need a caching strategy.
Scope of this task: Make PageHTMLHandler use ParserCache and call Parsoid directly. Wrap Parsoid output in a ParserOutput object. Note that this may cause a lot of data to be added to ParserCache if something starts to use the PageHTMLHandler endpoint heavily.
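The intended flow is essentially cache-aside: look in the cache first, otherwise render, wrap, and store. A minimal language-neutral sketch in Python follows. All names here (`ParserOutputSketch`, `ParserCacheSketch`, `get_page_html`, the `render` callable, and the `(page_id, rev_id)` key) are hypothetical stand-ins for illustration, not MediaWiki's actual classes or signatures.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ParserOutputSketch:
    """Hypothetical stand-in for ParserOutput: HTML plus optional metadata."""
    html: str
    metadata: Dict[str, str] = field(default_factory=dict)


class ParserCacheSketch:
    """Hypothetical in-memory stand-in for ParserCache.

    Assumption: entries are keyed by (page_id, rev_id).
    """

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, int], ParserOutputSketch] = {}

    def get(self, page_id: int, rev_id: int) -> Optional[ParserOutputSketch]:
        return self._store.get((page_id, rev_id))

    def save(self, page_id: int, rev_id: int, output: ParserOutputSketch) -> None:
        self._store[(page_id, rev_id)] = output


def get_page_html(
    cache: ParserCacheSketch,
    render: Callable[[int, int], str],  # stands in for a direct Parsoid call
    page_id: int,
    rev_id: int,
) -> ParserOutputSketch:
    """Return cached output if present; otherwise render, wrap, and store it."""
    cached = cache.get(page_id, rev_id)
    if cached is not None:
        return cached
    # Cache miss: call the renderer directly and wrap the resulting HTML.
    output = ParserOutputSketch(html=render(page_id, rev_id))
    cache.save(page_id, rev_id, output)
    return output
```

Every cache miss writes a new entry, which is why heavy use of the endpoint would grow ParserCache.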
Later tasks to consider:
- Pre-generating Parsoid output after edits and populating the relevant key in the ParserCache.
- Putting more information into the ParserOutput generated from Parsoid output, eventually creating parity with the old parser's output, so Parsoid can drive secondary data updates (in particular, LinksUpdate).
- Create an API in core that can be used by VisualEditor, with full parity to RESTbase. As a first step, it can simply be a proxy to RESTbase for functionality still missing in core. RESTbase can then be made internal.
- Slowly shift cache capacity from RESTbase into ParserCache: Add a mode to RESTbase that will return HTML only if it's already cached. In core, when content is not in ParserCache, ask RESTbase for that content before trying to regenerate. If it's in RESTbase, fetch it from there, store it in ParserCache, and delete it from RESTbase.
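The gradual cache shift in the last item could look roughly like the Python sketch below. `RestbaseStub`, `get_if_cached`, `delete`, `fetch_html`, and the string cache key are all hypothetical names, and a plain dict stands in for ParserCache. Storing freshly regenerated HTML in ParserCache is also an assumption, made to match the scope of this task.

```python
from typing import Callable, Dict, Optional


class RestbaseStub:
    """Hypothetical stand-in for RESTbase running in a 'cached only' mode."""

    def __init__(self, cached: Dict[str, str]) -> None:
        self._cached = dict(cached)

    def get_if_cached(self, key: str) -> Optional[str]:
        # Returns HTML only if it is already stored; never triggers a render.
        return self._cached.get(key)

    def delete(self, key: str) -> None:
        self._cached.pop(key, None)


def fetch_html(
    key: str,
    parser_cache: Dict[str, str],  # plain dict standing in for ParserCache
    restbase: RestbaseStub,
    regenerate: Callable[[str], str],  # stands in for a direct Parsoid call
) -> str:
    """Serve from ParserCache, else migrate from RESTbase, else regenerate."""
    html = parser_cache.get(key)
    if html is not None:
        return html

    # Not in ParserCache: ask RESTbase before regenerating.
    html = restbase.get_if_cached(key)
    if html is not None:
        # Migrate the entry: store it in ParserCache, then drop it from RESTbase.
        parser_cache[key] = html
        restbase.delete(key)
        return html

    # Not cached anywhere: regenerate and store (assumed behaviour).
    html = regenerate(key)
    parser_cache[key] = html
    return html
```

Each entry moves at most once, on its first read after the switch, so capacity shifts in proportion to actual traffic and needs no bulk copy.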