Saturday, May 30, 2020

Literature review: using arXiv as a corpus for analysis

"Towards Machine-assisted Meta-Studies: The Hubble Constant"
https://arxiv.org/pdf/1902.00027.pdf
"an approach for automatic extraction of measured values from the astrophysical literature, using the Hubble constant for our pilot study. Our rules-based model – a classical technique in natural language processing – has successfully extracted 298 measurements of the Hubble constant, with uncertainties, from the 208,541 available arXiv astrophysics papers."
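To make the "rules-based" idea concrete, here is a minimal sketch of one such rule: a single regular expression that pulls a value and a symmetric uncertainty out of text such as "H_0 = 73.2 \pm 1.3". This is only an illustration and not the paper's actual rule set; the pattern, the name H0_PATTERN, and the function extract_h0 are all assumptions made for this example.

```python
import re

# Hypothetical rule (not taken from the paper): match "H0", "H_0" or "H_{0}",
# an equals sign, a value, a plus/minus marker, and a symmetric uncertainty.
H0_PATTERN = re.compile(
    r"H_?\{?0\}?\s*=\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?:±|\\pm|\+/-)\s*"
    r"(?P<error>\d+(?:\.\d+)?)"
)


def extract_h0(text):
    """Return a list of (value, uncertainty) pairs found in the text."""
    return [
        (float(m.group("value")), float(m.group("error")))
        for m in H0_PATTERN.finditer(text)
    ]


sample = r"We find H_0 = 73.2 \pm 1.3 km/s/Mpc, compared with H0 = 67.4 ± 0.5."
measurements = extract_h0(sample)
```

A single pattern like this would miss asymmetric errors, values stated in prose, and measurements reported in tables, which is presumably why a real system needs many more rules.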


"Scienceography: the study of how science is written" (2012)
https://arxiv.org/abs/1202.2638
https://arxiv.org/pdf/1202.2638.pdf
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.488.6970&rep=rep1&type=pdf
Focused on characterizing arXiv submissions: separates out packages, comments, authors, and figures in the .tex source
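As a rough sketch of what separating packages and comments out of a .tex file can look like, the snippet below uses two regular expressions. It is my own illustration, not the paper's code; the names USEPACKAGE, COMMENT, and split_tex are hypothetical, and it ignores edge cases such as verbatim environments or a percent sign preceded by an escaped backslash.

```python
import re

# Assumed patterns for illustration only.
# \usepackage[options]{pkg1,pkg2} -> captures the brace contents.
USEPACKAGE = re.compile(r"\\usepackage(?:\[[^\]]*\])?\{([^}]*)\}")
# A "%" that is not escaped as "\%" starts a comment running to end of line.
COMMENT = re.compile(r"(?<!\\)%(.*)")


def split_tex(source):
    """Return (packages, comments) found in a LaTeX source string."""
    packages, comments = [], []
    for line in source.splitlines():
        match = COMMENT.search(line)
        if match:
            comments.append(match.group(1).strip())
            line = line[:match.start()]  # drop the comment before further parsing
        for pkg in USEPACKAGE.finditer(line):
            packages.extend(name.strip() for name in pkg.group(1).split(","))
    return packages, comments


example = "\n".join([
    r"\documentclass{article}",
    r"\usepackage[utf8]{inputenc}",
    r"\usepackage{amsmath, graphicx} % math and figures",
    r"50\% of papers % TODO check",
    r"% \usepackage{hidden}",
])
packages, comments = split_tex(example)
```

Stripping the comment before looking for \usepackage means a commented-out package is counted as a comment rather than as a package in use.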

"Transforming the arχiv to XML" (2008)
https://link.springer.com/chapter/10.1007%2F978-3-540-85110-3_46
Kohlhase

"An Architecture for Recovering Meaning in a LaTeX to OMDoc Conversion" (2009)
https://pdfs.semanticscholar.org/6647/612d3b61102a589db63a7ad9ac243901a9d8.pdf
Undergrad thesis by a student of Kohlhase; describes a processing pipeline for converting arXiv to OMDoc using LaTeXML

"Delineating Fields Using Mathematical Jargon"
https://www.aclweb.org/anthology/W16-1508.pdf

"On the Use of ArXiv as a Dataset" (2019)
https://arxiv.org/abs/1905.00075
primarily characterization of arXiv

"Plagiarism Detection in arXiv"
https://arxiv.org/pdf/cs/0702012.pdf

