Monday, April 6, 2020

data in JSON does not scale to multiple users

In version 7 of the Physics Derivation Graph I realized that I could use Python's Pickle format to serialize the data stored in memory without having to decide what storage format (CSV, XML, SQLite) is best.  That insight lead to use of JSON because everything needed fits in dictionaries and lists.

The use of Pickle and then then JSON enabled development of many features, so it was a worthwhile investment. However, some operations are not well suited to the nested dictionaries and lists. A set of tables might be better for some operations. Converting from the current dictionaries and lists to tables would be a big rewrite, so I haven't started that yet.

If I move away from JSON for storage, the current candidates are Redis and PostgreSQL and SQLite3.

Use of a relational database would require a significant rewrite since most of the functions in the PDG rely on the nested dictionaries.

As a potentially easier transition, Redis has a plugin that supports JSON:
https://redislabs.com/blog/redis-as-a-json-store/
https://redislabs.com/redis-best-practices/data-storage-patterns/json-storage/
However, I'm not comfortable with the PDG being dependent on a plugin.

If I go with a relational database, I'll need to choose which one.
MySQL or PostgreSQL versus SQLite
https://stackoverflow.com/a/5102105/1164295
"SQLite can support multiple users at once. It does however lock the whole database when writing, so if you have lots of concurrent writes it is not the database you want (usually the time the database is locked is a few milliseconds - so for most uses this does not matter)."
https://www.sqlite.org/whentouse.html
"Any site that gets fewer than 100K hits/day should work fine with SQLite."
To improve concurrency, reads can happen without blocking writes: https://www.sqlite.org/wal.html

No comments:

Post a Comment