20140927 syntax := ABNF
An issue I've previously encountered is how to store the information associated with the graph. I started with plain text, and then moved to XML (the current state). The content stored in the XML is Latex. Latex renders well, but lacks context (meaning). I am now investigating Content MathML as a stricter (but less widely adopted) syntax. Although Content MathML is better, it doesn't entirely capture the use I intend for this project.
The core of the project centers on the following two conventions:
<statement> = <left hand side> <relation> <right hand side>
<statement> --> <inference rule> --> <statement>
The relation is typically equality or inequality, though it could include union and other operators.
The inference rule acts on one or more statements.
The statement is the relation between two mathematical expressions
This is outside the scope of Content MathML, XML, and Latex. I think the appropriate capture is to use something like Augmented Backus-Naur Form (ABNF). Here are a few possible rules:
derivation = 1*2statement inferenceRule statement; some derivations involve multiple input statements, here we allow 1 or 2
statement = mathExpression relation mathExpression
mathExpression = *ALPHA *DIGIT *SP *operator *"\" *"." *"," *"'"
relation = ( "=" / "<=" / "<" / ">" / ">=" / "U" )
operator = ( "+" / "-" / "/" / "*" / "^" / "!" / ( "[" "]" ) / ( "(" ")" ) / ( "{" "}" ) / "|" )
Looking at
the code, I've already adopted the following rules
infrule_name = 1*ALPHA
statement_punid = 10DIGIT
statement_tunid = 7DIGIT
symbol_punid = 15DIGIT
From my notes, it appears I first figured out the requirement for the statement label restrictions on 20110406.
20140527 commercial projects
In the past week I've been made aware of 3 commercial efforts similar to this project
SymboLab and Formula-Database are similar, in that both are about search of equations. In addition to search, SymboLab provides derivations and plot. It's not clear whether these are dynamic or hand-coded. SymboLab appears to be more mathematics oriented, whereas Formula-Database has descriptions of the physics and symbol definitions. EquationMap is about user-generated graphs of derivations. My project has a poorer user interface, and the objective is grander (capture all of physics derivations).
None of these three are direct competition with this open source project, though there is overlap with subtasks from each project. I don't see a clear use case for any of the three -- if it were that useful as a product, I would have expected someone else to already be working in this area.
I'm not clear on why there aren't other open source projects in this area. Proofwiki.org is not graph-based, and PlanetPhysics disappeared.
20140527 task prioritization
Task prioritization:
- navigable interface. The current graph is large enough that intuitive navigation is an issue which needs to be addressed. The graph is currently rendered as a static PNG file. Rendering the graph in a web browser might be more accessible (allowing interactive navigation), and would generate a new set of issues to be addressed (i.e., reading XML databases into a new data structure). By "navigable interface" I am distinguishing from an interface for entering new data to be stored in the database
- ability to high-light subsections of the total graph related to a specific derivation
- ability to high-light the symbol use graph within the statement graph
- add content (E=mc^2, Maxwell's equations)
- fully check graph using CAS (currently SymPy)
- user input through CLI
- user input through web browser GUI
To create a navigable interface, my first guess is to start by finding an example of an HTML5/Javascript directed graph with png/svg files as nodes. This is because I think it's probably unrealistic to render the Latex statements in a browser with a dynamic graph
As far as intuitive interface, I like Google Maps and its ability to zoom in/out when visualizing large spatial data. EquationMap is a good start but isn't open source.
20140526 computer algebra system (CAS) inputs
In reviewing candidate Computer Algebra systems, I still think SymPy is the best option compared to the other free and open source CASs
https://en.wikipedia.org/wiki/List_of_computer_algebra_systems
Currently I expect the user to provide two inputs for each statement: Latex for visual rendering, and the SymPy equivalent for checking with the CAS. This is redundant, since SymPy can render Latex.
The reason I let the user supply Latex is because as a physicist I think in terms of Latex (not SymPy). I treat the addition of the SymPy equivalent as a second step
I have a fundamental conflict with (1) wanting the input format to be easy (i.e., Latex) and (2) wanting the content to be checkable by a CAS (i.e., MathML). I don't think there is a an easy-to-input format which is also easily checkable by a CAS.
Does the database need to be checked by a CAS?
There are two reasons one builds this database:
- as a notepad for current research, possibly to be used in a publication. Assumptions: dynamic; written to by only a few people; it may contain mistakes. Thus, it would be helpful to be checked by a CAS
- to store relations between all accumulated knowledge. Assumptions: static database; written to by many people; read by many people. Thus, mistakes are likely to be found whether or not there is a CAS
20140510 graph vs relational database
I need to better understand the difference between Graph vs relational database vs object
- https://en.wikipedia.org/wiki/Graph_database
- https://en.wikipedia.org/wiki/Relational_databases