InfluxDB as a new timeseries engine

Please comment on supporting this ‘new’ engine. Do you think it could help EmonCMS evolve?

InfluxDB is a time series database built from the ground up to handle high write and query loads. It is the second piece of the TICK stack. InfluxDB is meant to be used as a backing store for any use case involving large amounts of timestamped data, including DevOps monitoring, application metrics, IoT sensor data, and real-time analytics.


I realize that you are the one asking the question… but as a programmer, what advantages do you see in using InfluxDB over the existing engines?
What would the benefits be from a user’s perspective?

Paul

My concern would be that we’re still seeing migration issues from the MySQL days, so changing again could (I don’t say would) give rise to another round of the same.

There’s a long-established rule that says ‘The best tool for the job is the one you’re familiar with.’ Applying that indiscriminately clearly stifles innovation; all I’m doing is advising caution, because of the support issues in either migrating existing users or maintaining two systems. If the perceived advantages outweigh those, then I’m OK with that.

For users with 64-bit systems, it’s available as an Ubuntu/Debian package, as well as packages for Red Hat/CentOS, OS X, standalone use and Docker.

Installation is easy: it’s a single file with no external dependencies (as is Grafana).

For example, the command sudo dpkg -i influxdb_0.13.0_amd64.deb
is all that’s needed to install the current (as of this writing) Debian version.
It’s easy to use, i.e. the write and query APIs are easy to understand.
The query language is SQL-like, similar to what you would use with MySQL.
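To give a feel for how simple the write and query APIs are, here is a minimal Python sketch that formats a point in InfluxDB’s line protocol (measurement, optional tags, fields, timestamp) and builds the HTTP /write and /query URLs on the default port 8086. The database name emoncms, the measurement power and the tag node are made-up examples, and escaping of special characters is deliberately left out.

```python
from urllib.parse import urlencode


def to_line_protocol(measurement, tags, fields, timestamp_s):
    """Format one point as InfluxDB line protocol:
    measurement[,tag=value...] field=value[,field=value...] timestamp

    Simplified sketch: assumes names and tag values contain no spaces,
    commas or equals signs (so no escaping) and that every field is a float.
    Tags are sorted by key, which InfluxDB recommends for write performance.
    """
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={float(v)}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_s}"


def write_url(host, db):
    # precision=s: timestamps in the request body are seconds, not nanoseconds
    return f"http://{host}:8086/write?" + urlencode({"db": db, "precision": "s"})


def query_url(host, db, influxql):
    # The query endpoint takes an InfluxQL statement in the q parameter
    return f"http://{host}:8086/query?" + urlencode({"db": db, "q": influxql})


print(to_line_protocol("power", {"node": "emontx"}, {"watts": 412}, 1465839830))
# power,node=emontx watts=412.0 1465839830
print(write_url("localhost", "emoncms"))
print(query_url("localhost", "emoncms", 'SELECT mean("watts") FROM "power" WHERE time > now() - 1h'))
```

A write is one such line per point in the body of a POST to the write URL; a SELECT can be sent as a GET on the query URL.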

ARM as well as X64 processors are supported, which means it will run on a Raspberry Pi.
Performance on a Pi 2, while not blazing fast, is definitely not bad.

32-bit x86 CPU versions aren’t produced by the developers, and although InfluxDB is easy to compile from source code, that’s something most users won’t want to deal with, so it is likely to be a show-stopper for many.

FWIW, here’s an example of InfluxDB and Grafana: tppg.mooo.com
It’s running on a Virtual Private Server, so performance is very good. I’ve run a copy of InfluxDB/Grafana on an Intel Atom N270-based nettop with 1GB of RAM with good results.


I’ll try to address all my concerns and explain why I’m suggesting InfluxDB.
The need for a better engine is for applications at scale; for running on a Raspberry Pi I do recommend MySQL with Redis low-write mode.
PHPTimeSeries is fine as a proof of concept, but it should never be seriously used in any professional application where reliability and performance are required, because it’s based on PHP, a scripting language that is not very performant, and threads can abort at any time without regard for critical phases like file writes.
My concerns relate to the above and to the fact that changes happen from time to time to address issues specific to a particular engine. The solutions should be agnostic of the engine selection.
InfluxDB is a database built specifically for time data; it’s built in C (edit: Go, and it is compiled), so it’s faster than the emoncms implementations in PHP.
It’s maintained by others and is part of a larger project, so we can expect it to evolve and bugs to get fixed, similarly to MySQL.
I personally have little free time right now to write the new engine module.
But I invite whoever wants to begin that development to do so, and I can help with questions. The engine sources are well documented, and the MySQL engine source is a simple starting template: just keep the function names and replace the code to use InfluxDB instead of MySQL.
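As a rough illustration of "keep the function names, replace the code", here is a Python sketch (emoncms itself is PHP) of two backends behind one interface. The method names create, post and lastvalue are hypothetical stand-ins loosely modelled on engine functions, not the exact emoncms signatures; check the MySQL engine source for the real ones. The InfluxDB variant only queues line-protocol strings in a hypothetical outbox list instead of sending them over HTTP.

```python
from abc import ABC, abstractmethod


class FeedEngine(ABC):
    """Hypothetical engine interface: every backend keeps the same method names."""

    @abstractmethod
    def create(self, feed_id): ...

    @abstractmethod
    def post(self, feed_id, timestamp, value): ...

    @abstractmethod
    def lastvalue(self, feed_id): ...


class MemoryEngine(FeedEngine):
    """Stand-in for an existing backend such as the MySQL engine."""

    def __init__(self):
        self.feeds = {}

    def create(self, feed_id):
        self.feeds[feed_id] = []

    def post(self, feed_id, timestamp, value):
        self.feeds[feed_id].append((timestamp, value))

    def lastvalue(self, feed_id):
        rows = self.feeds[feed_id]
        return rows[-1] if rows else None


class InfluxLineEngine(FeedEngine):
    """Same method names; the bodies are swapped to produce InfluxDB line protocol.
    Lines are queued in self.outbox instead of being POSTed to /write."""

    def __init__(self):
        self.outbox = []
        self.last = {}

    def create(self, feed_id):
        # InfluxDB creates a measurement on its first write, so only local state is set up.
        self.last[feed_id] = None

    def post(self, feed_id, timestamp, value):
        self.outbox.append(f"feed_{feed_id} value={float(value)} {timestamp}")
        self.last[feed_id] = (timestamp, value)

    def lastvalue(self, feed_id):
        # A real engine would run SELECT last("value") FROM "feed_N"; cached here instead.
        return self.last[feed_id]


for engine in (MemoryEngine(), InfluxLineEngine()):
    engine.create(1)
    engine.post(1, 1465839830, 412)
    print(type(engine).__name__, engine.lastvalue(1))
```

Code calling the engine never changes; only the class it is handed does.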


Sounds like a good argument to me.
Especially the fact that it’s maintained by others, which reduces the support demands.
I see that there is already a node-red-contrib node to write & query data from InfluxDB, which would enable greater integration, but ultimately it’s down to @TrystanLea & @glyn.hudson to decide, and I welcome their thoughts.

Paul

Probably a moot point, but according to

it’s written in Go

Personally, I am happy with the PHPFina and PHPTimeSeries engines; they have been stable for some time now and have allowed emoncms.org to scale a long way from the earlier engines, MySQL and Timestore. I also think both the simplicity and the lower-level understanding of the engines are useful for long-term maintenance. The language the engine is written in doesn’t necessarily make a huge difference, as the bottleneck is IO. Chris Davis, who wrote Graphite, wrote an interesting piece about this here: The Architecture of Open Source Applications: Graphite
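The point about IO can be made concrete with a fixed-interval layout like PHPFina’s: the value for a timestamp sits at a computable offset, so a read is a single seek whatever language performs it. The sketch below assumes 4-byte little-endian floats, NaN marking empty slots, and start_time/interval taken from the feed’s metadata; treat it as an illustration of the idea, not a byte-exact description of PHPFina.

```python
import math
import struct


def phpfina_offset(timestamp, start_time, interval):
    """Byte offset of the slot holding `timestamp` in a fixed-interval file
    of 4-byte floats (assumed layout: slot 0 starts at start_time)."""
    slot = (timestamp - start_time) // interval
    return slot * 4


def read_value(data: bytes, timestamp, start_time, interval):
    """Return the stored value, or None if out of range or the slot is empty (NaN)."""
    off = phpfina_offset(timestamp, start_time, interval)
    if off < 0 or off + 4 > len(data):
        return None
    (value,) = struct.unpack_from("<f", data, off)
    return None if math.isnan(value) else value


# Three 10-second slots starting at t=1000; the middle one was never written.
data = struct.pack("<3f", 1.5, float("nan"), 3.25)
print(read_value(data, 1025, 1000, 10))  # 3.25
print(read_value(data, 1010, 1000, 10))  # None
```

On disk the same arithmetic becomes one seek plus a 4-byte read, which is why the interpreter’s speed matters less than the storage access pattern.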

I’m aware PHPFina and PHPTimeSeries are not everyone’s cup of tea, and if you’re familiar with MySQL you may prefer to use it or another engine; I don’t want to stop people using what they prefer. If you’re interested in feed engines, I would really recommend spending a bit of time understanding the internals of PHPFina and PHPTimeSeries, and it may also be useful to read the timeseries engine development documentation I wrote a while back, here: https://github.com/openenergymonitor/documentation/tree/master/BuildingBlocks/TimeSeries

Disclaimer: This thread is to make all of us aware of the new technologies that keep appearing; I’m not forcing its adoption.

InfluxDB is still at an early stage, but it seems very much aimed at being ‘the’ database for recording time data.

I agree with Trystan that it’s not a problem right now, as the PHP implementations suffice, but I frown on having a database built on top of PHP, for the reasons given.

The effort spent calculating averages in PHP, and adding parameters to the API as a way of patching gaps in specific engines, is not a good thing.
InfluxDB, by contrast, returns averages, fills intervals, groups by time, downsamples, etc. on its own, along with many other tricks that we are trying to do in emoncms code with performance penalties because of PHP. See Query Language | InfluxData Documentation Archive
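For comparison, here is roughly what one InfluxQL GROUP BY time(...) fill(null) query asks the server to do, written out as application code. The measurement and field names in the query string are examples. Buckets are aligned to multiples of the interval since the epoch, which as far as I know matches InfluxDB’s default alignment; this is a sketch of the idea, not InfluxDB’s implementation.

```python
from collections import defaultdict

# The server-side equivalent in InfluxQL (measurement and field names are examples):
QUERY = 'SELECT mean("value") FROM "power" WHERE time > now() - 1d GROUP BY time(10m) fill(null)'


def group_by_time_mean(points, start, end, interval):
    """Average (timestamp, value) points into fixed buckets over [start, end).

    Buckets are aligned to multiples of `interval` since the epoch; a bucket
    with no points is reported as None, like fill(null).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, v in points:
        if start <= t < end:
            bucket = t // interval * interval
            sums[bucket] += v
            counts[bucket] += 1
    first = start // interval * interval
    return [
        (b, sums[b] / counts[b] if counts[b] else None)
        for b in range(first, end, interval)
    ]


print(group_by_time_mean([(0, 10), (30, 20), (70, 40)], 0, 180, 60))
# [(0, 15.0), (60, 40.0), (120, None)]
```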

BTW, as of today I think the other thread about evolving dashboards towards predefined component sizes is much more interesting for the future of emoncms.

I would personally love to see InfluxDB implemented, especially as a remote logging option. InfluxDB is nice in that it can handle the aggregation of old data easily. For example, it can automatically keep kWh data every 5 seconds for a day, every minute for a week, and then hourly for data that is older than a week. I personally use it for logging data from my weather stations, CPU load, memory, disk, etc. I’m using OpenHAB and have been using the InfluxDB persistence module to store my HVAC settings and temperatures. I even have my meat smoker log to it when I’m using it. I use InfluxDB at work as well for lots of time series data. It’s fantastic for keeping data. Paired with Grafana, I have begun creating some great dashboards to monitor my home.
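In InfluxDB 1.x this kind of tiering is normally set up with retention policies plus continuous queries. The Python sketch below only generates the InfluxQL statements for the tiers described (raw data for a day, 1-minute means for a week, hourly means kept indefinitely). The database home, the measurement power, the policy names and the value field are assumptions for illustration; for a cumulative kWh counter, last() would be a more fitting aggregate than mean().

```python
def retention_statements(db, measurement, tiers):
    """Build InfluxQL for tiered retention (sketch).

    tiers: list of (policy_name, duration, resolution). The first tier receives
    raw writes (resolution None) and becomes the DEFAULT policy; each later tier
    is filled by a continuous query averaging the tier before it.
    """
    stmts = []
    for i, (name, duration, _) in enumerate(tiers):
        default = " DEFAULT" if i == 0 else ""
        stmts.append(
            f'CREATE RETENTION POLICY "{name}" ON "{db}" '
            f"DURATION {duration} REPLICATION 1{default}"
        )
    for (src, _, _), (dst, _, res) in zip(tiers, tiers[1:]):
        stmts.append(
            f'CREATE CONTINUOUS QUERY "cq_{dst}" ON "{db}" BEGIN '
            f'SELECT mean("value") AS "value" INTO "{dst}"."{measurement}" '
            f'FROM "{src}"."{measurement}" GROUP BY time({res}) END'
        )
    return stmts


TIERS = [
    ("one_day", "1d", None),    # raw 5-second points, written by the client
    ("one_week", "7d", "1m"),   # 1-minute means
    ("forever", "INF", "1h"),   # hourly means, never expired
]

for stmt in retention_statements("home", "power", TIERS):
    print(stmt)
```

The old raw points then expire on their own, while the continuous queries keep the coarser copies up to date.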

And multiple-user-account friendly too.

Hi Bill,

I had to take a fresh look at building a time-series database when designing the SD card datalog for my IotaWatt energy monitor. I haven’t gone back to look at the specifics of the current model used in eMonCMS, but I think my solution might have some of the virtues of InfluxDB with respect to data storage. I don’t have a query tool yet.

The crux of it is that everything is logged as double-precision value × hours, along with total hours. With standard 8-byte double precision, this is enough to extract the average data value between any two entries, or to report the total kWh of a high-power circuit over many years. In that respect it’s something like the eMon kWh feeds. But by dividing the difference in time-weighted values by the difference in time (hours), you can also get the average value (like watts, volts, current) for that period. So there is no need for separate power and kWh files; they are the same. Either can be extracted by reading only the records for the intervals needed. It will also support almost any other type of data, such as temperature or even on/off events.
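Here is my reading of the value-hours scheme as a small Python sketch: each entry carries running totals of value × hours and of hours, so the average over any span is a difference of totals divided by a difference of hours, and the accumulated energy is just the first difference. The Entry type and the function names are hypothetical; this is not IotaWatt’s actual code or record format.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    value_hours: float  # running total of value * hours (Wh for a power channel)
    hours: float        # running total of hours logged


def log_interval(prev, value, hours):
    """Append-only update: the new entry adds value*hours and hours to the totals."""
    return Entry(prev.value_hours + value * hours, prev.hours + hours)


def average_between(a, b):
    """Time-weighted average value (e.g. watts) between entries a and b."""
    return (b.value_hours - a.value_hours) / (b.hours - a.hours)


def total_between(a, b):
    """Accumulated value-hours (e.g. Wh) between entries a and b."""
    return b.value_hours - a.value_hours


e0 = Entry(0.0, 0.0)
e1 = log_interval(e0, 1000.0, 1.0)  # one hour at 1000 W
e2 = log_interval(e1, 500.0, 1.0)   # one hour at 500 W
print(average_between(e0, e2), total_between(e0, e2))  # 750.0 1500.0
```

Both the power view and the energy view come from the same two entries, which is why separate power and kWh files are unnecessary.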

The file only contains active measurements, so if there is a gap in the data, there are no file entries. Posting just means adding a record to the end. It turns out that searching for individual entries is simple and fast.
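Because the log is append-only and time-ordered, finding an entry is a binary search even when there are gaps, costing O(log n) record reads. The sketch below assumes, purely for illustration, fixed 256-byte records that each begin with a little-endian 32-bit UNIX timestamp; the actual IotaLog layout may differ.

```python
import struct

RECORD_SIZE = 256          # bytes per entry (the IotaLog entry size quoted in this post)
TS = struct.Struct("<I")   # assumed: each record begins with a uint32 UNIX timestamp


def find_record(log: bytes, t):
    """Binary search an append-only log of fixed-size, time-ordered records.

    Returns the index of the last record with timestamp <= t, or None if
    t is earlier than the first record. Gaps in time cost nothing extra.
    """
    lo, hi = 0, len(log) // RECORD_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        (ts,) = TS.unpack_from(log, mid * RECORD_SIZE)
        if ts <= t:
            lo = mid + 1
        else:
            hi = mid
    return lo - 1 if lo > 0 else None


# Five records with a gap of almost an hour between the third and fourth.
log = b"".join(TS.pack(ts).ljust(RECORD_SIZE, b"\0") for ts in (0, 5, 10, 3600, 3605))
print(find_record(log, 12), find_record(log, 3600))  # 2 3
```

Against a file rather than an in-memory buffer, each probe becomes a seek to mid * RECORD_SIZE followed by a 4-byte read.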

The downside is that a single entry for the fifteen-channel IotaLog is 256 bytes. That includes two data items per channel: voltage and frequency for voltage channels, power and current for current channels. That’s about 1.6 GB per year at 5-second sampling. Storage cost and availability seem to be staying well ahead of that, as a single 16 GB SD card could hold almost ten years’ worth of entries.
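The storage figures can be checked quickly. This sketch assumes a 365-day year and a card capacity quoted in decimal gigabytes (10^9 bytes), as card sizes are usually marketed.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000 s (365-day year assumed)
SAMPLE_PERIOD_S = 5
ENTRY_BYTES = 256                    # one fifteen-channel entry, per the post
CARD_BYTES = 16e9                    # 16 GB card, decimal gigabytes assumed

entries_per_year = SECONDS_PER_YEAR // SAMPLE_PERIOD_S   # 6,307,200 entries
bytes_per_year = entries_per_year * ENTRY_BYTES          # 1,614,643,200 bytes
years_per_card = CARD_BYTES / bytes_per_year             # just under 10

print(f"{bytes_per_year / 1e9:.2f} GB per year, {years_per_card:.1f} years per card")
# 1.61 GB per year, 9.9 years per card
```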

Another potential downside for a data repository is that it’s not practical to allow changing historical data; you can only move forward. In my case, at the device level, I don’t think that’s a problem unless you happen to own a DeLorean with a flux capacitor.

I would be very pleased to see emonCMS working with InfluxDB! IMHO it’s probably the best timeseries database available. Is it possible to add it to emonCMS without the need to change the code of graph, dashboard and apps?


I know I’m only new here, but I have been looking at adopting the OEM hardware in conjunction with other smart home devices, and Home Assistant.

It was while looking over the components for Home Assistant that I came across its integration with InfluxDB. Since then I have installed InfluxDB with Grafana with some test sensor data and have been blown away by the ease of setup, use and speed.

I am totally convinced that InfluxDB will now be my DB for time-based data for the smart home I am slowly pulling together.
I’m with the others that it would be killer to be able to send the Emon data directly to InfluxDB for dashboarding in Grafana.

I could be wrong, but I believe I could do this in a roundabout way through a Home Assistant EmonCMS sensor; it would also be nice to be able to record the data directly.
It would be nice to have InfluxDB as an option, not a replacement.

I’ve started to integrate InfluxDB into emoncms: GitHub - rexometer/emoncms at influx
It’s at quite an early stage, but I would love to work on improving it together with you, @davidski.
Of course, any comments are welcome.