I was interested to read your discourse on the development of your time series database. Time series databases have been an interest of mine since about 1997, when I needed a database to store data from about 3600 sensors. I cast around and found a system designed to store sound signatures in a SQL database built specially for the job. This system was called Tekbase (later renamed Metrica for legal reasons). It ran on UNIX and was fantastic. Unfortunately, the company was bought by a telecoms giant to record data from cellular phone networks, and they cut off general scientific/engineering users like me.
Some years later I became interested in columnar databases, as opposed to the normal row-based relational databases such as MySQL or SQL Server. I first came across these through an interest in APL, the third oldest but still the most unusual computer language. This language has a long history of using so-called inverted tables to hold data. One of its most famous proponents developed several offshoot languages based on the original ideas of APL, and the latest of these is the language K. From K they developed a columnar database called KDB, which is used to store the massive ticker output from the investment markets and lets data analysts analyse this data in seconds rather than the 8 to 10 hours needed with conventional data mining techniques. This is the same problem as sensor data as a time series, but with thousands of data points per second from multiple sources that have to be recorded against time. It is probably the fastest time series database in the world, and investment banks and hedge funds pay a fortune for it.
Columnar databases are coming into fashion to replace hypercubes in business intelligence systems because they can retrieve the data many, many times faster and they do not need a massive reconfiguration if an extra field has to be added to the table.
HP bought Vertica, SQL Server added columnstore indexes, and Oracle has one whose name I can’t recall. JSoftware has developed one called Jdb, and Dyalog APL has developed an early prototype columnar database. This is a long preamble to asking whether you have considered using a columnar database?
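To make the row versus column distinction concrete, here is a minimal sketch in plain Python. It is only an illustration of the general idea of an inverted table, not how KDB, Vertica or any other product actually stores data; the field names, sample readings and the `add_field_columns` helper are made up for the example.

```python
# Row-oriented layout: one record (dict) per reading.
rows = [
    {"time": 0, "sensor": "s1", "value": 20.0},
    {"time": 60, "sensor": "s1", "value": 21.0},
    {"time": 120, "sensor": "s1", "value": 22.0},
    {"time": 180, "sensor": "s1", "value": 23.0},
]

# Column-oriented layout ("inverted table"): one list per field.
columns = {
    "time": [0, 60, 120, 180],
    "sensor": ["s1", "s1", "s1", "s1"],
    "value": [20.0, 21.0, 22.0, 23.0],
}


def mean_rows(table):
    """Average the 'value' field; every whole record has to be visited."""
    return sum(record["value"] for record in table) / len(table)


def mean_columns(table):
    """Average the 'value' field; only that one column is read."""
    values = table["value"]
    return sum(values) / len(values)


def add_field_columns(table, name, default):
    """Add a field by appending one new column; existing columns are untouched."""
    length = len(next(iter(table.values())))
    table[name] = [default] * length


add_field_columns(columns, "unit", "degC")
print(mean_rows(rows), mean_columns(columns))
```

Both layouts give the same answer, but the columnar version reads a single contiguous list for the aggregate, and adding the hypothetical `unit` field does not rewrite any existing data.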
Have you any thoughts about, and/or experience with, InfluxDB?
I looked at Influx last night. So far I have not worked out exactly how the database is structured to achieve the speed necessary for time series work. Another very interesting system is MapD. This is a massively parallel DBMS that uses multiple CUDA GPUs to do the processing instead of CPUs. The speeds are amazing. A desktop PC costing $5000 with 4 GPUs can perform the job as fast as a supercomputer (of course, the supercomputer is more general purpose). You can even spin up a virtual machine on the Google cloud with up to 8 GPUs and run MapD on that.
Did you get a chance to read this?
I read the article on InfluxDB’s website concerning the “Storage Engine” last night. Then today I saw your reply with the link to the other article and the accompanying video. I am very impressed by the direction that InfluxDB has taken. Firstly, it definitively states that it is using a columnar database, which fits with my own feeling that this is the way to go, and then it takes the whole issue of what makes a time series special to a new level. I was thinking that a columnar database was the solution, but it is only part of the solution. The reason is that I had not really understood the extreme write and read speeds needed today. Twenty years ago, when I installed Tekbase to record data from 3600 sensors on 200 test rigs, the rate was only 1.7 million writes per day. InfluxDB is aiming to support rates orders of magnitude higher than that. I am very impressed.
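For a sense of scale, the Tekbase figures above can be turned into per-second rates with a few lines of Python. This is just back-of-envelope arithmetic and assumes the writes were spread evenly over the day and over the sensors, which may not have been the case in practice.

```python
SENSORS = 3600
WRITES_PER_DAY = 1_700_000
SECONDS_PER_DAY = 24 * 60 * 60

# Average write rate across the whole installation.
writes_per_second = WRITES_PER_DAY / SECONDS_PER_DAY

# Average number of writes each sensor contributes per day.
writes_per_sensor_per_day = WRITES_PER_DAY / SENSORS

# Average gap between two writes from the same sensor.
seconds_between_writes = SECONDS_PER_DAY / writes_per_sensor_per_day

print(f"{writes_per_second:.1f} writes/s overall")
print(f"{writes_per_sensor_per_day:.0f} writes per sensor per day")
print(f"one write per sensor roughly every {seconds_between_writes:.0f} s")
```

Under those assumptions the whole system averaged only about 20 writes per second, or one reading per sensor every three minutes or so.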
It would be nice to see emonCMS also working with InfluxDB. I think I will give this a try in the next few weeks.
The link is to Grafana’s Live Demo page. You can modify any of the Dashboards on the page.
The “Big Dashboard” has lots of widgets to play with.