Technical Documentation

Summary

This documentation describes the internal data formats and processes of Receipts Space, so that the data can be understood and processed outside of the app. The architecture is designed in the spirit of “Local First” and “File Over App”.

File format - Sync

Log

Changes to data are written to a special continuous “log”. Each installation writes its own log, and installations are distinguished by a unique clientId. The entries in each log are numbered consecutively with whole numbers, starting with 0. For technical reasons (avoiding conflicts, integrity checks, and simpler handling by external file sync services), each entry is stored as a separate file in the file system. These files are distributed across individual directories of 1000 files each.

Changes

The changes to the local database are collected as a list and combined into a transaction (transactions). Each individual database object is uniquely identified by the value of _id. The integer value of _v acts as a Lamport clock for the Last-Write-Wins (LWW) method, which ensures that the database is eventually consistent when it is restored. During sync, a new value is applied only if its _v is greater than the _v of the value already found in the database. If there is a conflict, the entry with the higher timestamp wins. As a result, the order in which the change entries are applied is irrelevant, which is the defining property of a CRDT.
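The rule above can be sketched in a few lines of Node.js. This is a minimal illustration, not the app's actual implementation: the function names applyChange and applyAll are hypothetical, the sketch assumes that a change replaces the stored object as a whole, and it keeps the stored object when both _v values are equal, because the text does not specify that case. The second title is invented for the demo.

```javascript
// Minimal Last-Write-Wins sketch (hypothetical helper names).
// Assumption: a change replaces the stored object as a whole.
function applyChange(db, change) {
  const current = db.get(change._id);
  // Apply only if the object is unknown or the incoming _v is strictly greater.
  if (current === undefined || change._v > current._v) {
    db.set(change._id, change);
  }
  return db;
}

// Applies a list of changes to an empty in-memory database (a Map keyed by _id).
function applyAll(changes) {
  const db = new Map();
  for (const change of changes) {
    applyChange(db, change);
  }
  return db;
}

const older = {
  _id: "47855a70de36449f821d40b45f8c170a",
  _type: "category",
  _v: 1,
  title: "Telekommunikation",
};
// Invented follow-up change with a higher _v.
const newer = { ...older, _v: 2, title: "Telecommunications" };

// The result is the same regardless of the order of application.
const forward = applyAll([older, newer]);
const reversed = applyAll([newer, older]);
console.log(forward.get(older._id).title);
console.log(reversed.get(older._id).title);
```

Both calls end with the object whose _v is 2, which is what makes the order of the log entries irrelevant.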

File format

In its simplest form, the file is stored as a JSONL file. The first line is the header. Everything after the first line break, taken as raw bytes, is the content.

The header contains the following check elements:

  • s: size of the content in bytes.
  • c: SHA-256 checksum of the content, encoded as Base64 with the URL-safe alphabet (- instead of + and _ instead of /) and without padding.
  • t: time of creation as a Unix timestamp in seconds.
  • v: version number of the format. It is currently 1; the field allows for later changes to the format.

Example:

{"s":1473,"v":1,"c":"QHqyEU4WJOFsnxitlmsXFmpCXV2kZCCctzvO50_3IgM","t":1732311704}
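As an illustration of how the check elements can be used, the following Node.js sketch splits a file into header and content and verifies s and c. The function names checksum, readEntry and buildEntry are hypothetical. buildEntry exists only to produce demo input, and the content used here is the sample change from this document, not the content that belongs to the example header above.

```javascript
const crypto = require("crypto");

// SHA-256 as Base64 with the URL-safe alphabet and without padding,
// as described for the header field "c".
function checksum(content) {
  return crypto
    .createHash("sha256")
    .update(content)
    .digest("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// Splits a file buffer at the first line break and checks "v", "s" and "c".
function readEntry(fileBuffer) {
  const lineBreak = fileBuffer.indexOf(0x0a); // first "\n"
  if (lineBreak === -1) throw new Error("missing header line");
  const header = JSON.parse(fileBuffer.subarray(0, lineBreak).toString("utf8"));
  const content = fileBuffer.subarray(lineBreak + 1); // raw bytes after the header
  if (header.v !== 1) throw new Error("unsupported format version: " + header.v);
  if (header.s !== content.length) throw new Error("size mismatch");
  if (header.c !== checksum(content)) throw new Error("checksum mismatch");
  return { header, content };
}

// Hypothetical helper for the demo only: builds a file as the text describes it.
function buildEntry(content, t) {
  const header = { s: content.length, v: 1, c: checksum(content), t };
  return Buffer.concat([Buffer.from(JSON.stringify(header) + "\n"), content]);
}

const demoContent = Buffer.from(
  '{"_id":"47855a70de36449f821d40b45f8c170a","_type":"category","_v":1,"title":"Telekommunikation"}\n'
);
const demoFile = buildEntry(demoContent, 1732311704);
const entry = readEntry(demoFile);
console.log(entry.header.s === demoContent.length);
```

Working on the raw bytes rather than on a decoded string matters here, because s and c refer to the binary representation of the content.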

The individual changes follow, one JSON object per line, as defined by JSONL (see “Changes” above). Example:

{"_id":"47855a70de36449f821d40b45f8c170a","_type":"category","_v":1,"title":"Telekommunikation"}
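Reading the changes out of the content part could look like the sketch below. The function name parseChanges is hypothetical, and skipping empty lines is an assumption made so that a trailing line break does not cause an error. The second change in the demo is invented.

```javascript
// Parses the content part (everything after the header line) into change objects.
// Assumption: empty lines, such as one caused by a trailing line break, are skipped.
function parseChanges(content) {
  return content
    .toString("utf8")
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

const sampleContent = Buffer.from(
  [
    '{"_id":"47855a70de36449f821d40b45f8c170a","_type":"category","_v":1,"title":"Telekommunikation"}',
    // Invented second change for the demo.
    '{"_id":"47855a70de36449f821d40b45f8c170a","_type":"category","_v":2,"title":"Telecommunications"}',
    "",
  ].join("\n")
);

const changes = parseChanges(sampleContent);
console.log(changes.length);
console.log(changes[0].title);
```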

File attachments

Files are stored outside the database as assets (assets). These are also stored per client with a consecutive index, analogous to the transactions. Database entries refer to assets with a special URL that contains the following data:

  1. cid: The client ID.
  2. idx: The index within the asset repository.

Metadata is not stored separately but is part of the reference, which in turn allows the consistency of the data to be checked. The reference carries the following additional information:

  1. checksum: SHA-256 encoded as Base64.
  2. size: Size in bytes.
  3. type: MIME type.
  4. name: File name.

The link to the asset is encoded as a URL:

asset:///cid/idx/name?t=type&s=size&d=checksum

Example:

asset:///1EH7BEtuL9xOz5aTpEyI4K/466/unnamed?s=6284&t=application%2Fpdf&d=JAd0HmXcSIVVdYMmDBjfVZeTvAyXQ94GmjA6CwSwOYU
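A reference of this form can be taken apart with the standard URL class that ships with Node.js. The sketch below is only an illustration: the function name parseAssetUrl and the field names of the returned object are hypothetical, and the handling of file names that contain further path segments is an assumption.

```javascript
// Splits an asset reference into its parts (hypothetical helper).
function parseAssetUrl(reference) {
  const url = new URL(reference);
  if (url.protocol !== "asset:") throw new Error("not an asset reference");
  // The path has the form /cid/idx/name.
  const [cid, idx, ...nameParts] = url.pathname.split("/").slice(1);
  return {
    cid,
    idx: Number.parseInt(idx, 10),
    // Assumption: anything after idx belongs to the (percent-encoded) file name.
    name: decodeURIComponent(nameParts.join("/")),
    type: url.searchParams.get("t"), // MIME type, already percent-decoded
    size: Number.parseInt(url.searchParams.get("s"), 10),
    checksum: url.searchParams.get("d"),
  };
}

const asset = parseAssetUrl(
  "asset:///1EH7BEtuL9xOz5aTpEyI4K/466/unnamed?s=6284&t=application%2Fpdf&d=JAd0HmXcSIVVdYMmDBjfVZeTvAyXQ94GmjA6CwSwOYU"
);
console.log(asset);
```

The order of the query parameters does not matter, which is why the pattern and the example above can list them differently.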

Special data types

  • A reference to another record is stored as the _id of the target entry. The name of the property usually corresponds to the _type of the target object.
  • A date without time is stored as an integer in the format YYYYMMDD. December 1, 2024 is therefore represented as 20241201.
  • Amounts are represented as floating-point numbers, e.g. 1.23.
  • Timestamps are usually accurate to the second.
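The integer date format can be converted with plain arithmetic. The following is a small sketch with hypothetical helper names (parseDateInt, toIsoDate, toDateInt). It performs no validation of month or day ranges.

```javascript
// Splits a YYYYMMDD integer into its components.
function parseDateInt(value) {
  return {
    year: Math.floor(value / 10000),
    month: Math.floor(value / 100) % 100,
    day: value % 100,
  };
}

// Formats a YYYYMMDD integer as an ISO 8601 calendar date string.
function toIsoDate(value) {
  const { year, month, day } = parseDateInt(value);
  const pad = (n, width) => String(n).padStart(width, "0");
  return pad(year, 4) + "-" + pad(month, 2) + "-" + pad(day, 2);
}

// Builds a YYYYMMDD integer from year, month and day.
function toDateInt(year, month, day) {
  return year * 10000 + month * 100 + day;
}

console.log(toIsoDate(20241201)); // prints "2024-12-01"
console.log(toDateInt(2024, 12, 1)); // prints 20241201
```

A practical property of this format is that integer comparison gives the same order as chronological comparison.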

Example

A Node.js project that reads the Receipts Sync file, creates the database, and writes all data (JSON) and assets to the export folder:

Download Export Sample