Monthly Archives: January 2016

Streaming JSON with Flask

Update March 2021

This post still gets a fair few hits, so I want to preface it by saying that I wrote it some time ago and probably wouldn’t do the same thing today. The main problem is that I ignored the client! Sending a single whole JSON object in this way means the client has to reassemble the whole thing in memory anyway, which somewhat defeats the point, unless you have fat clients and very thin servers (or processing is slow and you have a very short socket timeout). You might have a valid use-case for a hack like this, but if I were to solve this problem again, I’d send newline-delimited JSON instead (with metadata in headers or on a separate endpoint if necessary).
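A minimal sketch of that newline-delimited approach, assuming a standard Flask app (the endpoint name is hypothetical; releases and to_dict() are as in the post below):

import json

from flask import Flask, Response

app = Flask(__name__)

@app.route("/releases")
def stream_releases():
    # releases = <SQLAlchemy query object>
    def generate():
        for r in releases:
            # one complete JSON object per line; the client can
            # parse each line as it arrives instead of buffering
            # the whole response body
            yield json.dumps(r.to_dict()) + "\n"
    # metadata (e.g. a total count) can go in response headers
    # or on a separate endpoint, as suggested above
    return Response(generate(), mimetype="application/x-ndjson")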

I have a SQLAlchemy query (able to behave as an iterator) which could return a large result set. The first version of the code was very simple. Release objects have a to_dict() function which returns a dictionary, so I append each one to a list and jsonify the result:

from flask import jsonify

# inside a Flask view function:
# releases = <SQLAlchemy query object>

output = []
for r in releases:
    output.append(r.to_dict())

return jsonify(releases=output), 200

(context on github)

This result set could potentially grow to the point where fitting it in memory would be impractical – with only a thousand releases there is already a significant lag before we start getting results.
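The full post goes on to build a streaming version. The gist of the technique – my reconstruction, not the post’s actual code – is to hand Flask a generator that yields the JSON envelope and one serialized release at a time:

import json

from flask import Response, stream_with_context

# releases = <SQLAlchemy query object>

def generate():
    yield '{"releases": ['
    first = True
    for r in releases:
        # emit one serialized release at a time, with commas
        # between items, so the full list is never built in memory
        yield ('' if first else ',') + json.dumps(r.to_dict())
        first = False
    yield ']}'

# stream_with_context keeps the request context alive while the
# generator runs, which the database session may depend on
return Response(stream_with_context(generate()),
                content_type='application/json'), 200

As the 2021 update above points out, this only helps the server: the client still receives one large JSON document and has to buffer it all to parse it.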

Archival Storage Part 1: The Problems

All of us have data which has value beyond our own lives. My parents’ generation have little record of their childhoods, other than the occasional photo album, but what few records there are, are cherished. My own childhood was well preserved, thanks to the efforts of my mother. My brothers and I each have a stack of photo albums, with dates and milestones meticulously documented.

Today, we are generating a massive amount of data. While the majority of it will not be of interest to future generations, I believe preserving a small, selective record of it, akin to the photo albums my mother created, would be immensely valuable to my relatives and descendants – think of your great-grandparents’ jewellery, a photo album of your childhood that your parents created, or your ancestors’ immigration papers.

Modern technology allows us to document our lives in vivid detail; the problem, however, is that the data is transient by nature. For example, this blog runs on a Linode server – if I die, the bill doesn’t get paid and Linode deletes it. If Linode goes away, I have to be there to move the blog to a new server. If Flickr goes away, my online photos are lost. If Facebook goes away, all that history is lost. Laptops and computers are replaced regularly, and the backups created by previous machines may not be readable by future ones unless we carry all the data over each time.

In Part 1 of this series (this article) I document the problems with common backup solutions for archival storage, with reference to my own set-up. In Part 2, I’ll detail my “internet research” into optical BD-R media and how it addresses these problems, and in Part 3 I’ll cover checksums and managing data for archival storage (links will be added when done).

Part 1 is fairly technical, so if you just want safe long-term storage, install and configure CrashPlan and skip to Part 2.
