OpenEnergyMonitor Community

Data Viewer questions

Couple of queries on the Data Viewer.

  • Is there any documentation to read that explains the Data Viewer, (things like ‘skip missing’ and ‘Limit Interval’)?

  • Why does changing the interval sometimes imply there is missing data?

  • I do not get a list of feeds on the left as one of the videos shows.

Secondly, is there any way to get the additional data available on the web page (such as min/max/mean) returned via the API? It would be really useful if it could be.

I am assuming here that this is a standard module as there is nothing to indicate the data viewer requires an additional module to be installed.

Server Information
Emoncms Version 9.8.28 : 2018.01.27
Modules Administration : EventProcesses : Feed : Input : CoreProcess : Schedule : Time : User : Visualisation
Server OS Linux 4.9.0-4-amd64
Host DietPi-Emoncms DietPi-Emoncms (127.0.1.1)
Date 2018-04-10 14:35:16 BST
Uptime 14:35:16 up 8 days, 1:58, 1 user, load average: 0.01, 0.00, 0.00
HTTP Server lighttpd/1.4.45 HTTP/1.1 CGI/1.1 80
MySQL Version 5.5.5-10.1.26-MariaDB-0+deb9u1
Host localhost (127.0.0.1)
Date 2018-04-10 14:35:16 (UTC +01:00)
Stats Uptime: 698245 Threads: 2 Questions: 152137 Slow queries: 0 Opens: 38 Flush tables: 1 Open tables: 32 Queries per second avg: 0.217
Redis Version 3.2.6
Host localhost:6379 (127.0.0.1)
Size 182 keys (861.01K)
Uptime 8 days
MQTT Version 1.4.14
Host localhost:1883 (127.0.0.1)
Memory RAM Used: 46.85% Total: 987.36 MB Used: 462.55 MB Free: 524.81 MB
Swap Used: 0.77% Total: 1.04 GB Used: 8.13 MB Free: 1.03 GB
Disk Mount Stats
/ Used: 70.35% Total: 7.75 GB Used: 5.45 GB Free: 2.28 GB
PHP Version 7.0.27-0+deb9u1 (Zend Version 3.0.0)
Modules apc v5.1.3 : apcu v5.1.8 : calendar v7.0.27-0+deb9u1 : cgi-fcgi : Core v7.0.27-0+deb9u1 : ctype v7.0.27-0+deb9u1 : curl v7.0.27-0+deb9u1 : date v7.0.27-0+deb9u1 : dio v0.1.0 : dom v20031129 : exif v7.0.27-0+deb9u1 : fileinfo v1.0.5 : filter v7.0.27-0+deb9u1 : ftp v7.0.27-0+deb9u1 : gd v7.0.27-0+deb9u1 : gettext v7.0.27-0+deb9u1 : hash v1.0 : iconv v7.0.27-0+deb9u1 : igbinary v2.0.1 : json v1.4.0 : libxml v7.0.27-0+deb9u1 : mbstring v7.0.27-0+deb9u1 : mcrypt v7.0.27-0+deb9u1 : mosquitto v0.4.0 : mysqli v7.0.27-0+deb9u1 : mysqlnd : openssl v7.0.27-0+deb9u1 : pcre v7.0.27-0+deb9u1 : PDO v7.0.27-0+deb9u1 : pdo_mysql v7.0.27-0+deb9u1 : Phar v2.0.2 : posix v7.0.27-0+deb9u1 : readline v7.0.27-0+deb9u1 : redis v3.1.1 : Reflection v7.0.27-0+deb9u1 : session v7.0.27-0+deb9u1 : shmop v7.0.27-0+deb9u1 : SimpleXML v7.0.27-0+deb9u1 : sockets v7.0.27-0+deb9u1 : SPL v7.0.27-0+deb9u1 : standard v7.0.27-0+deb9u1 : sysvmsg v7.0.27-0+deb9u1 : sysvsem v7.0.27-0+deb9u1 : sysvshm v7.0.27-0+deb9u1 : tokenizer v7.0.27-0+deb9u1 : wddx v7.0.27-0+deb9u1 : xml v7.0.27-0+deb9u1 : xmlreader v7.0.27-0+deb9u1 : xmlwriter v7.0.27-0+deb9u1 : xsl v7.0.27-0+deb9u1 : Zend OPcache v7.0.27-0+deb9u1 : zip v1.13.5 : zlib v7.0.27-0+deb9u1 :

Client Information
HTTP Browser Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0
Screen Resolution 1920 x 1080
Window Size 1601 x 906

Not to my knowledge. I seem to recall @TrystanLea explaining what they were when they got added, but I have just done a quick search and didn’t come up with anything.

Because not all of the datapoints are displayed: only so many datapoints are shown, depending on browser window size, screen resolution, total period queried, the selected interval and of course the feed interval. The API call looks for a datapoint close to each target datapoint, and if there is no data at that point it may show as missing. Shifting either the start or end times can therefore produce a completely different result set or graphed profile for essentially the same period.
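As a rough illustration (not the actual emoncms code): pick evenly spaced target timestamps, take the nearest stored datapoint for each, and return null when nothing lies close enough. Shifting the start time shifts every target, which is why near-identical periods can graph differently:

```python
def sample(stored, start, end, npoints, tol):
    """stored: dict of timestamp -> value (a sparse feed).
    Returns [target_time, value_or_None] for evenly spaced targets."""
    step = (end - start) / npoints
    out = []
    for i in range(npoints):
        t = start + i * step
        nearest = min(stored, key=lambda ts: abs(ts - t))  # closest stored point
        out.append([t, stored[nearest] if abs(nearest - t) <= tol else None])
    return out
```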

I think you are looking at the “dataviewer” visualisation rather than the “graph” module; it’s the graph module that gives you the feed list on the left.

There is a setting in the emoncms settings.php to use the “graph” module as the default “dataviewer” when clicked from the feeds page:

https://github.com/emoncms/emoncms/blob/master/default.settings.php#L133

If you set that to graph/ you will be taken to a different page when you click the eye icon next to a feed on the feeds page.

BUT! For that to work you need to install the graph module; I don’t see it as an installed module in your server info.

No, there are no other clues beyond the comment in settings.php.

Indeed it would. As far as I can tell only the average is available. From the feed API docs:

Returns feed data between start time and end time at the interval specified. Each datapoint is the average (mean) for the period starting at the datapoint timestamp.

https://emoncms.org/feed/average.json?id=0&start=UNIXTIME_MILLISECONDS&end=UNIXTIME_MILLISECONDS&interval=3600

Is there any way of extracting the raw data via the API, i.e. give me all the recorded data points between times X and Y?

Hi,

Assuming you are logged in, you can build the URL yourself as follows (the placeholders are what you have to change):

your_emoncms_URL/feed/data.json?id=your_feed_id&start=timestamp_in_milliseconds&end=timestamp_in_milliseconds&interval=interval_in_seconds&skipmissing=0_or_1&limitinterval=0_or_1

Beware of:

  • Only 8928 datapoints can be returned per request. The number of datapoints you are requesting is determined by the interval and the start and end times
  • skipmissing: only taken into account for fixed-interval feeds (PHPFina)
  • limitinterval: if set to 1, ensures that the interval you are requesting is not shorter than the feed interval
  • This will return a very large JSON string
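To avoid hand-assembling the query string, the URL can be built programmatically. A small Python sketch (the base URL and feed id are placeholders):

```python
from urllib.parse import urlencode

def feed_data_url(base, feed_id, start_s, end_s, interval,
                  skipmissing=0, limitinterval=0, apikey=None):
    """Build a feed/data.json URL; start/end are given in unix seconds
    and sent as milliseconds, as the API expects."""
    params = {"id": feed_id, "start": start_s * 1000, "end": end_s * 1000,
              "interval": interval, "skipmissing": skipmissing,
              "limitinterval": limitinterval}
    if apikey:
        params["apikey"] = apikey  # needed when not logged in via the browser
    return base.rstrip("/") + "/feed/data.json?" + urlencode(params)
```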

Sorry Brian, I missed or forgot your post.

@cagabi has provided the info, if you haven’t found it already. The graph module and data viewer give priority to the start and end times and calculate the interval; you can override that and fix the interval.

Likewise with the feed APIs: you can set the interval, but if the start and end times mean there are more than 8928 datapoints to return, it will just return an error message.

If you are using a self-hosted emoncms, you can edit that limit. Otherwise, to get every datapoint in a feed you would need to start at the beginning and add 89280 secs to calculate the end time (assuming a 10s PHPFina feed), then use that end time as the next call’s start time and add 89280 secs again. Basically, with a 10s PHPFina feed you can only download just over a day’s worth of data with each request, but it should be straightforward to automate with a script and append each new reply to a single file.
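That chunked download can be sketched as follows. This is an untested illustration: the base URL, API key and feed id are placeholders for your own install, and the per-request limit is the 8928 figure quoted in this thread.

```python
import json
from urllib.request import urlopen

MAX_POINTS = 8928  # emoncms per-request datapoint limit

def chunk_ranges(start, end, interval, max_points=MAX_POINTS):
    """Split [start, end) into request windows that stay under the
    datapoint limit, e.g. 89280 s wide for a 10 s feed."""
    chunk = max_points * interval
    while start < end:
        yield start, min(start + chunk, end)
        start += chunk

def download_feed(base, apikey, feed_id, start, end, interval):
    """Fetch every datapoint, one chunk per request, into one list."""
    data = []
    for t1, t2 in chunk_ranges(start, end, interval):
        url = (f"{base}/feed/data.json?id={feed_id}&start={t1*1000}"
               f"&end={t2*1000}&interval={interval}&skipmissing=0"
               f"&limitinterval=1&apikey={apikey}")
        data.extend(json.load(urlopen(url)))  # append each reply
    return data
```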

Am I right in thinking that if I specify a short interval, say 1 second, and limitinterval = 1, then it will simply return every data point available?

I’m really not understanding the effect of this setting on the data returned.

The reason for this is that I think I am having issues where the data is coming in at between 11-12 second intervals, so I keep getting dropped inputs. I’d just like to know exactly what has been recorded in the feed (and when) so I can work it backwards.

Yes. If your feed interval is 5s (the minimum allowed in the dropdown when creating a feed) or anything higher, and in your API call you set “interval” to anything lower than that (like your 1s) and “limitinterval” to 1, then you will get all the datapoints (at the feed’s interval); there will not be any “empty” datapoints. If you set limitinterval = 0 you will get datapoints with a null value wherever there is no data for them.
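In other words, a rough sketch (not the actual emoncms code) of what limitinterval does to the requested interval:

```python
def effective_interval(requested, feed_interval, limitinterval):
    """With limitinterval=1 you can never request finer than the feed stores."""
    if limitinterval and requested < feed_interval:
        return feed_interval  # clamp up to the feed's own interval
    return requested
```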

This needs a bit of explanation, so here is how fixed-interval feeds work:

  • For each feed we have a file storing metadata (feed_id.meta); it keeps the timestamp of when the feed was created and the interval.
  • The values are stored in another file (feed_id.dat)
    • Each byte in the file is a feed value
    • Say the interval is 1m: then the 1st byte of the file holds the value of the first sample (at the creation date), the 2nd byte the value of the sample 1 minute later, the 3rd byte the value of sample 3, which is 2 minutes after creation, and so on
  • Imagine the following situation:
    • when the first 10 samples have arrived, our file is 10 bytes long and has data for those first 10 minutes
    • for some reason the next 10 samples get lost, so they aren’t ingested by emonCMS. We have lost 10 minutes of data and nothing has been written to the file
    • the next sample arrives. This is minute 21, but in our file we still only have the first 10 samples
    • because emonCMS knows the starting date of the file, it calculates that this new sample has to be stored in byte 21
    • emonCMS opens the file, realizes that it’s only 10 bytes long, inserts 10 null values for the missing samples and stores sample 21
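The steps above can be sketched in Python (a toy stand-in, using a list for the .dat file and 0-based positions rather than the 1-based counting in the bullets):

```python
def append_value(store, meta_start, interval, timestamp, value):
    """store: list standing in for feed_id.dat; meta_start and interval
    stand in for feed_id.meta. Positions are 0-based here."""
    pos = (timestamp - meta_start) // interval  # slot computed from start time
    while len(store) < pos:
        store.append(None)                      # null-pad the missing samples
    if pos == len(store):
        store.append(value)                     # write at the calculated slot
```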

And now, to answer your question, when you request the data via API with skipmissing = 0, you get something like:

[[timestamp_1,5],[timestamp_2,7],[timestamp_3,null],[timestamp_4,6]]

As you see, the 3rd sample got lost and its value is null.

If you request the data via API with skipmissing = 1, you would get something like:

[[timestamp_1,5],[timestamp_2,7],[timestamp_4,6]]

Effectively you have skipped the missing value.
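Client-side, skipmissing=1 is equivalent to filtering the nulls out yourself:

```python
def skip_missing(datapoints):
    """Drop [timestamp, null] entries, as skipmissing=1 does server-side."""
    return [dp for dp in datapoints if dp[1] is not None]
```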


Great explanation - thanks. Can it go into the documentation somewhere please :smile:

So if I have a feed running at 10s, but the input is actually coming in at 11 seconds, does it record the feed time or the input time?

What if the input is running at 9s and the feed is 10s?

I suppose this is about understanding the relationship between the inputs and the feeds. I also think this is to do with understanding the different feed types and the impact that has as well.

Less comprehensive, but it’s kind of there already:
https://learn.openenergymonitor.org/electricity-monitoring/timeseries/Fixed-interval

Uhmm, I have had to look at the code

Quick answer: it stores the feed timestamp

Longer answer: as the timestamp is not stored in the file, emonCMS calculates where the position should be in the feed file, rounding down to the lowest position.

To make it clear:

input timestamp | feed timestamp
              0 |              0
             11 |             10
             22 |             20
             33 |             30
             44 |             40
             55 |             50
             66 |             60
             77 |             70
             88 |             80
             99 |             90
            110 |            110
            121 |            120
As you can see, there will be one missing point at 100s.
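The table can be reproduced with the rounding rule (floor the input timestamp to the feed interval); a small sketch:

```python
def feed_slot(t, interval=10):
    """Round an input timestamp down to its feed position."""
    return (t // interval) * interval

slots = [feed_slot(t) for t in range(0, 122, 11)]  # 11 s inputs, 10 s feed
# slot 100 is never written, hence the missing point
```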

Doing the same as above:

input timestamp | feed timestamp
              0 |              0
              9 |              0
             18 |             10
             27 |             20
             36 |             30
             45 |             40
             54 |             50
             63 |             60
             72 |             70
             81 |             80
             90 |             90
             99 |             90
            108 |            100

As you can see, the feed slots with:

  • timestamp 0 will have the value of the input with timestamp 9s; the input with timestamp 0 is lost forever!
  • the same for the slot with timestamp 90s: it will hold the value of the input with timestamp 99s, and the input with timestamp 90 is gone
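Running the same rounding with 9 s inputs reproduces the collisions in the table:

```python
def feed_slot(t, interval=10):
    """Round an input timestamp down to its feed position."""
    return (t // interval) * interval

inputs = list(range(0, 109, 9))        # 0, 9, 18, ... 108
slots = [feed_slot(t) for t in inputs]
# slots 0 and 90 are each hit twice, so one input of each pair is lost
```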

Very revealing for me as well :wink:

Well, all of this only applies to fixed-interval feeds; the non-fixed-interval ones do log the timestamp, so this problem is avoided. But resist the temptation to abandon fixed-interval feeds; from the same link as above:

There are two main advantages of this approach versus the variable interval approach.

  • The first advantage is that if we want to read a datapoint at a given time we don’t need to search for the datapoint as we can calculate its position in the file directly. This reduces the amount of reads required to fetch a datapoint from up to 30 reads down to 1 giving a significant performance improvement.
  • The second advantage is that in a time series where the data is highly reliable, the disk space used can be as little as half that used by a variable interval engine, due to not needing to record a timestamp for each datapoint.

What I gather from the code is that when the input changes it forces a check to see if the input should go into the feed store. Is that right? Does this also mean there is a maximum jitter of the feed interval (i.e. an input could be recorded as being at a time ± the feed interval)?

Am I also correct in thinking that if you want to ensure you get every input, running the feed quicker than the input frequency ensures that you record every input, even though you get apparent blanks?

As an aside, am I right in thinking that once a feed has not been used for 115 days, it can never be used again (if over the padding limit it just exits)?

My pleasure :slight_smile:

I think it might actually be first come first served: when the 0 or the 90 are presented, the slots are available and they get persisted; when the 9 and 99 get processed, the slots are taken and the data doesn’t get persisted.

I’ve never checked, but I suppose this might actually vary across installs. I know the buffered write prevents any values being overwritten once persisted, but I suppose they do remain in the buffer until the buffered write occurs, so it might even depend on where in the “buffered-write cycle” they are processed. I wonder if Redis plays any part in this too?

Correct!

All the emonTx and emonTH sketches are set up to post slightly under the interval so that the data “appears” to be correct, i.e. there are no null points, by erring on the side of caution and posting at 9.9 secs rather than 10s. But that means a percentage of the data is lost to ensure the graphs “look nice”.

Correct again: that 115 days is for a 10s feed; an emonTH 60s feed will go 690 days before hitting the limit.

What do you mean by ‘persisted’?

Looking at the code (line 226), it seems there is no check to see whether the $pos, i.e. the position calculated (line 181) on the basis of the incoming $timestamp (a reused parameter, which I find irritating), is the same as the $last_pos from the meta file. Therefore any earlier use of that position will be overwritten by later data.

Written to disk: no longer just in RAM, Redis or a buffer, but actually written to disk.

Looks like that is right too.

When posting using the buffered write though, the usual PHPFina “post()” function isn’t used.

When the input process calls feed->insert_data, it checks whether redisbuffer is enabled (feed_model.php L814) and uses redisbuffer->post() rather than phpfina->post() for low-write installs.

In that instance post() just adds the details to a buffer; then, periodically, the bufferedwrite service calls redisbuffer->process_buffers(), which in turn calls redisbuffer->process_feed_buffer(), which (and I’m a little hazy here) I believe will ordinarily call phpfina->post_bulk_prepare(). That appears to check if ($pos > $last_pos) and only writes the data if it is later than the last recorded data (not equal to).

Further down that code, the “else” says “if data is in past, its not supported, …”.

The code is pretty complex so I wouldn’t be at all surprised if I’ve got something not quite right.

A lot of it revolves around what $last_pos actually is. I had arrived at the conclusion that the minus one in the $last_pos = $meta->npoints - 1; lines is just to derive an index from a count, e.g. a count of 3, minus 1, gives us an index of 2 (0, 1, 2 is a count of 3).
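As a rough Python stand-in for that logic (not the actual PHP; a list plays the role of the feed file and its length the role of $meta->npoints):

```python
def post_bulk_prepare(store, pos, value):
    """Write only if pos is strictly beyond the last written index."""
    last_pos = len(store) - 1        # count minus one gives the last index
    if pos > last_pos:
        while len(store) < pos:
            store.append(None)       # pad any gap with nulls
        store.append(value)
        return True
    return False                     # "if data is in past, its not supported"
```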

Ah. No, redisbuffer isn’t enabled in this instance and feedwriter is not running.

Don’t you just love it when there are two bits of code that are, to all intents and purposes, doing the same thing, but live in different places and have subtle differences because of how they have evolved over time…

I note that the phpfina->post_bulk_prepare() version does not have a padding limit.

Yes I’d come to the same conclusion.

I suppose the academic question is: should both methods of inserting data into the feeds be consistent? I suspect the answer is yes.

Way OT now! However, at least I have a good idea of what I am looking at, data-wise, now!

I totally agree with that, I think I prefer the redisbuffer solution:

  • “if data is in past, its not supported, could call update”. It makes complete sense that to change an already existing value you have to update it
  • Max padding: I can’t see a reason for having it. It can also be very annoying not to know why a feed is not updating; at the very least there should be something in the Feeds view to tell the user, in the same way that it says Active or Inactive. But that doesn’t make sense either, as the behaviour depends on whether the redisbuffer is used or not. I would get rid of it

@pb66, what do you think? It wouldn’t be much of a change to the code in phpfina->post(). I’m happy to do it and submit a pull request.

From a purist perspective, I’d like to see the common code from both functions put into a single function/procedure, so you can be assured both methods react in the same manner and any changes are made common by design.


I also think post and update are different things; if you look at a power-to-kWh process, it uses “update” not “post”, as it is designed to keep overwriting the last value.

It’s something that is used on emoncms.org to manage abandoned feeds. Originally the emoncms master branch was used for emoncms.org too and the low-write was a separate branch; we managed to combine the low-write and “normal” versions into one branch with different “modes”, to reduce the number of versions and unify the ways of using emoncms. BUT the emoncmsorg repo then popped up with another version of the code, so that was a wasted effort.

I no longer know why the max-padding is still in the main repo since emoncms.org has its own repo, but IMO even if it were to stay, it should be a setting, not hardcoded. The default setting could be max_padding = 0 and the code could be:

if ($max_padding) {
    if ($padding > floor($max_padding / $meta->interval)) {
        $this->log->warn("post() padding max block size exceeded id=$id, $padding dp");
        return false;
    }
}

This would mean a max padding of 0 switches max padding off, and the $max_padding setting would be in seconds, not datapoints, so that all feeds hit the limit after the same period of inactivity, e.g. a max_padding of 7776000 would give a limit of 90 days regardless of the interval used.
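As a quick check of the arithmetic behind that proposal (the function name here is made up for illustration):

```python
def padding_limit_datapoints(max_padding_seconds, interval):
    """floor(max_padding / interval), as in the snippet above: the same
    seconds value yields the same inactivity period for any feed."""
    return max_padding_seconds // interval
```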

However, this feature might become redundant even on emoncms.org: you cannot really charge “per feed” and then block the user from posting to that feed, no matter how long it’s been since they last used it.

For the record, I think the max datapoint limit also needs to be user-defined instead of hardcoded; the number “8928” is hardcoded 8 times in emoncms just for this one limit. What does “8928” relate to? 1 day and 48 mins for an emonTx 10s power feed, or 6 days 4 hours and 48 mins for an emonTH 60s temperature reading; it seems a bit random.

While we’re at it, the “8928” limit should ideally be applied after “skip missing”, so that you can get 8928 valid datapoints rather than 8928 intervals’ worth of data with all the nulls removed, as the latter can result in very few datapoints if the data is patchy.
