We should've stopped--but let's keep going and see what happens.

Thoughts on Streaming Signal Services, Pt. 1

So, let’s think about the next interesting form of data processing for a bit.

We’ve done RDBMSes to death; lots of very clever people have done excellent work to make it easy to query related data. NoSQL exists for data that needs to scale horizontally with varying degrees of ease, and also for certain exotic kinds of query (think something like Neo4j and graph DBs).

So, we’re going to be dealing with a lot of real-time events and data, right? All the loud money is touting “Internet of Things” and whatnot: massive sensor networks creating lots and lots of bytes of data and streaming it into some system to digest and analyze.

These data streams can be visualized as flowing from a sensor, to a recorder, to a transform network, to a display device. There are roughly two workflows you want to support here: real-time transformation and visualization, and batch processing that rifles through all the available data for trends.
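
To make that flow a bit more concrete, here is a rough Python sketch of the sensor, recorder, transform, display chain, with the recorder's log doubling as the input for a batch pass. Every name here (`sensor`, `recorder`, `transform`, `display`, `batch_trend`) is made up for illustration, and the moving average and first-to-last "trend" are just stand-ins for whatever real transforms and analyses you would run.

```python
from statistics import mean


def sensor(readings):
    """Hypothetical sensor: yields (timestamp, value) pairs."""
    for t, v in enumerate(readings):
        yield (t, v)


def recorder(stream, log):
    """Persist every event to `log`, then pass it downstream unchanged."""
    for event in stream:
        log.append(event)
        yield event


def transform(stream, window=3):
    """Real-time step: moving average over the last `window` values."""
    recent = []
    for t, v in stream:
        recent.append(v)
        recent = recent[-window:]
        yield (t, mean(recent))


def display(stream):
    """Stand-in for a display device: formats each event as a line of text."""
    return [f"t={t} avg={avg:.2f}" for t, avg in stream]


def batch_trend(log):
    """Batch step: change between the first and last recorded values."""
    return log[-1][1] - log[0][1]


log = []
lines = display(transform(recorder(sensor([1.0, 2.0, 6.0, 4.0]), log)))
trend = batch_trend(log)
```

The real-time path never touches the log, and the batch path never touches the live stream; the recorder is the only piece they share.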

A common problem is that the same conceptual data source often moves across several sensors. The usual mistake (at least, it’s a mistake in my opinion) is to lump the “what data source generated this data” problem together with the “this sensor generated this data, from some source” problem.

Really, you just want a very simple service to pull records corresponding to one or more sensors, and then a separate service for saying “Oh, this data source? Yeah, it was tracked by thus-and-such sensors at the requested time…go bug them”.
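
As a sketch of that split, the two services might look something like the following in Python. This is only one way to slice it, under some loud assumptions: everything lives in memory, timestamps are plain numbers, and all the names (`SensorRecordService`, `SourceTrackingService`, `Record`, `Sighting`, the `truck-7` and `gate-*` identifiers) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    sensor_id: str
    timestamp: float
    value: float


class SensorRecordService:
    """Knows only about sensors: stores raw records and returns them by time range."""

    def __init__(self):
        self._records = defaultdict(list)

    def append(self, record):
        self._records[record.sensor_id].append(record)

    def query(self, sensor_ids, start, end):
        hits = [
            r
            for sid in sensor_ids
            for r in self._records.get(sid, [])
            if start <= r.timestamp <= end
        ]
        return sorted(hits, key=lambda r: r.timestamp)


@dataclass(frozen=True)
class Sighting:
    sensor_id: str
    start: float
    end: float


class SourceTrackingService:
    """Knows only which sensors were tracking a source, and when. Holds no records."""

    def __init__(self):
        self._sightings = defaultdict(list)

    def track(self, source_id, sensor_id, start, end):
        self._sightings[source_id].append(Sighting(sensor_id, start, end))

    def sensors_for(self, source_id, start, end):
        # Any sighting whose interval overlaps [start, end] counts.
        return sorted(
            {
                s.sensor_id
                for s in self._sightings.get(source_id, [])
                if s.start <= end and s.end >= start
            }
        )


# Hypothetical usage: ask who tracked the source, then go bug those sensors.
records = SensorRecordService()
tracking = SourceTrackingService()
tracking.track("truck-7", "gate-a", 0, 10)
tracking.track("truck-7", "gate-b", 10, 20)
records.append(Record("gate-a", 5, 1.0))
records.append(Record("gate-b", 15, 2.0))
records.append(Record("gate-c", 15, 9.9))

sensors = tracking.sensors_for("truck-7", 12, 20)
hits = records.query(sensors, 12, 20)
```

The point of keeping them apart is that the record service never needs to know what a “source” is, and the tracking service never needs to hold a single reading.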