Saturday, August 26, 2023

HTML arxiv papers, semantic tagging, and formal verification

arxiv recently started experimenting with converting .tex files to HTML: https://info.arxiv.org/about/accessible_HTML.html

The first official announcement was in February 2022:
https://blog.arxiv.org/2022/02/21/arxiv-articles-as-responsive-web-pages/

The HTML from .tex effort is a collaboration between Deyan Ginev (a student of Dr. Kohlhase) and Bruce Miller (at NIST - https://www.nist.gov/people/bruce-r-miller). Kohlhase's group (https://kwarc.info/research/) focuses on semantic enrichment of Latex. Bruce provided the software to convert Latex.

The reason for this Latex to HTML conversion is because it's the first step for enabling semantic enrichment of content on the arxiv. There's immediate benefit for arxiv being able to support HTML, which I suspect is why arxiv cooperated with Kohlhase's group.

In the long term I see a need to connect semantic tagging (the focus of Kohlhase's group) with formal verification (e.g., derivations-as-proofs using Lean). The formally verified math expressions need to be tied to use in narrative text (e.g., arxiv papers). For example, if I'm referring to "x" in a publication, is that the same "x" specified in a Lean-based proof? One way of answering is to use tags like

<unique_variable_id=42>x</unique_variable_id>
 
in the narrative text document, and then have Lean reference the ID 42 in derivations. There are more conventional uses of tags like
<properties name_of_variable="distance" dimension_of_variable="length">x</properties>
but those tags don't address the question of whether two variables refer to the same concept.

Summary

I predict that the conversion of arxiv .tex files to HTML will enable semantic tagging. This will intersect with the challenge of "how do I relate the use of variables and expressions across multiple proofs in Lean?"

No comments:

Post a Comment