Compression with EmonHubEmoncmsHTTPInterfacer?

I’ve just started to use the EmonHubEmoncmsHTTPInterfacer interfacer on a local Pi to send readings to a remote, private emoncms site. I notice that it is sending the data as an HTTP GET request with the data just specified in the URL itself:

2018-12-15 17:15:18,648 INFO     emoncmsorg sending: https://example.com/input/bulk.json?apikey=E-M-O-N-C-M-S-A-P-I-K-E-Y&data=[[1544894094.433681,21,2.992,0,20.900000000000002,49.400000000000006,17.6,-48],[1544894101.370502,5,0.045,0,0,0],[1544894102.083283,22,2.954,22,40.400000000000006,-55,-33],[1544894105.405525,21,2.992,0,20.900000000000002,49.400000000000006,17.7,-50],[1544894116.451357,21,2.992,0,20.900000000000002,49.400000000000006,17.7,-45]]&sentat=1544894118
2018-12-15 17:15:18,795 DEBUG    emoncmsorg acknowledged receipt with 'ok' from https://example.com

I intend to use this feature for sensors in a remote location with only a limited internet connection, so I want to avoid using unnecessary data. I imagine sensor readings would be easily compressed, so is this something that is possible here? Perhaps with an HTTP POST request containing zipped data instead?

Looks like this is not available yet, but it seems simple enough to add by editing both EmonHubEmoncmsHTTPInterfacer.py in emonhub and input_methods.php in emoncms. I guess it would be possible to implement gzip compression, since this is a standard HTTP mechanism that should be available on both the Pi and a PHP server.

Is there any plan to add such a feature? I could try implementing it and submit pull requests if no one is planning to work on this.

Hello @sean, sorry the logging isn’t clear: the content is actually sent in the POST body (see EmonHubEmoncmsHTTPInterfacer.py on the emon-pi branch of openenergymonitor/emonhub on GitHub), but it is still not compressed, as you suggest.

Thank you for the suggestion, and I appreciate your offer to implement it; that would be great. It’s not in our development plan as far as I know. I’d be interested to hear what kind of compression level you can achieve.

I might give it a go. I noticed that the emonpi SD card uses git to get the code - is it simple enough to check out my own branch on the Pi and modify it?

Not sure about compression levels yet, but I expect it will depend on how long a gap you leave between readings, and how much the data varies. It seems this is a specialist topic (including for NASA, I guess for spacecraft telemetry), so perhaps once the groundwork is in place we can investigate fancier algorithms than gzip.

Compressing the above URL with gzip yields about a 50% saving - 223 bytes vs 420 bytes.
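For anyone who wants to reproduce this kind of figure, here is a minimal sketch using Python's standard zlib module. The payload is copied from the data field in the log line at the top of the thread; exact byte counts will vary with the data:

```python
import zlib

# The data field from the bulk.json log line earlier in the thread.
payload = (
    "[[1544894094.433681,21,2.992,0,20.900000000000002,49.400000000000006,17.6,-48],"
    "[1544894101.370502,5,0.045,0,0,0],"
    "[1544894102.083283,22,2.954,22,40.400000000000006,-55,-33],"
    "[1544894105.405525,21,2.992,0,20.900000000000002,49.400000000000006,17.7,-50],"
    "[1544894116.451357,21,2.992,0,20.900000000000002,49.400000000000006,17.7,-45]]"
).encode()

compressed = zlib.compress(payload)
ratio = 100 * len(compressed) // len(payload)
print(len(payload), len(compressed), "%d%%" % ratio)
```

Note that zlib.compress produces a zlib-wrapped deflate stream, which is a few bytes smaller than a full gzip container (gzip adds a longer header and trailer), so figures from the two will differ slightly.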

If you compress the HTTP POST data, the receiving web server should automatically decompress the stream, making it transparent to the receiving code.

You should be able to make the change with a couple of lines of code:

post_body = zlib.compress( post_body )
reply = self._send_post(post_url, post_body)

You will also need to send the correct HTTP header: Content-Encoding: deflate to match zlib.compress output (zlib produces zlib-wrapped deflate, not gzip; use the gzip module if you want Content-Encoding: gzip).
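A sketch of what that suggestion might look like with the standard library (Python 3's urllib.request here). The URL is a placeholder, and as later posts in this thread found, most servers will not transparently decompress a request body, so treat this as the idea rather than a working recipe:

```python
import urllib.request
import zlib

post_url = "https://example.com/input/bulk.json"  # placeholder URL
post_body = b"data=[[1544894094.4,21,2.992,0]]&sentat=1544894118"

# Compress the body before attaching it to the request.
compressed = zlib.compress(post_body)
request = urllib.request.Request(post_url, data=compressed, method="POST")
# zlib.compress emits zlib-wrapped deflate, i.e. HTTP's "deflate" coding;
# use the gzip module instead if the server expects Content-Encoding: gzip.
request.add_header("Content-Encoding", "deflate")
# reply = urllib.request.urlopen(request)  # not executed: the URL is a placeholder
```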

Ah, I didn’t realise that the target server decompresses data if you specify the right header, but in hindsight that makes perfect sense. Thanks for the tip!

I’ll give this a go soon.

Still wondering if there is an easy way to check out my own branch on the emonSD image - is it as simple as git checkout -b new-branch, or would that break something?

If you fork the existing repository into your own GitHub account, you can then check out that version, overwriting the local one.

I’m no github/git expert so others will have to indicate the command line to use!

I think the pre-built SD card image has a local checkout of the official GitHub repository. I can, as you say, check out my own version, but I wonder whether there are any init scripts etc. that need to be run when this code is checked out, for example.

Basically I’m asking for a guide to developing on the emonPi :slight_smile:

Ok, @TrystanLea, I have made two pull requests to add compression support:

emoncms #1143
emonhub #70

I’ve tested this and it works nicely: using a 5 minute interval, the data payload size I send (approximately 50 readings) is reduced from 2741 to 1172 characters (a 57% reduction). I expect the compression factor will increase with longer intervals.

That was quick @sean

I assume the HTTP Header didn’t work then?

Nice work @sean! I will run this on a system here to test, and modify it to be disabled by default for now; we could re-evaluate enabling it by default later.

No, it turns out that web servers generally don’t recognise gzipped POST data and decompress it. I also found conflicting arguments about whether it’s even allowed to set Content-Encoding headers on requests (rather than only on responses), so I avoided this by instead specifying a request flag.

Great! Please go ahead and push the changed default setting to the PR. Let me know how you get on with the tests.

BTW, in general the Python code for the interfacers could do with a little refactoring, if you don’t mind me saying so. I wanted to make the compression flag common to all interfacers (i.e. implement it in EmonHubInterfacer), so that compression could be used not just by EmonHubEmoncmsHTTPInterfacer but by others too if there was ever a need. This proved difficult because of the way the requests are built: the HTTP POST body is built directly in EmonHubEmoncmsHTTPInterfacer and passed to EmonHubInterfacer, which builds the rest of the request.

This means I would have to either compress the whole POST body in EmonHubInterfacer (since it receives the fully constructed body as input), or split the body back into key/value pairs, compress the sensor readings directly, then recombine them into a string again. The former would confuse the target PHP server, which expects well-formed key/value pairs in the POST body, and the latter is too hacky an approach, needlessly splitting a string into a dict and back again.

I think it would be better if the EmonHubInterfacer class handled all of the building of the HTTP request, including creating a post body key -> value mapping. The good news is that there is a great library for this - requests - which is even recommended in the Python documentation. It makes building HTTP requests pretty simple, and ensures headers, encoding, etc. are correct for what you want to send. All you have to do is provide it with a dict mapping keys to values for the post body. This means the individual subclass interfacers like EmonHubEmoncmsHTTPInterfacer can provide the key/value map to EmonHubInterfacer, and the code there can decide whether to compress any of it or not. This would allow the compression code to be used for other types of interfacer in the future.
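As a sketch of that shape (the function name and parameters here are hypothetical, not the actual emonhub API), an interfacer would only need to supply a dict, and requests takes care of the encoding and headers:

```python
import requests

def build_bulk_request(post_url, data_string, sentat, apikey):
    """Build (without sending) a bulk.json POST from a key -> value mapping."""
    body = {"data": data_string, "sentat": sentat, "apikey": apikey}
    # .prepare() urlencodes the dict and sets Content-Type for us; shared
    # code at this level could decide whether to compress body["data"] first.
    return requests.Request("POST", post_url, data=body).prepare()

prepared = build_bulk_request(
    "https://example.com/input/bulk.json",  # placeholder URL
    "[[1545219379.9,10,166]]", 1545219380, "APIKEY")
```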

More honest feedback: the configuration handling is a bit clunky. I recommend looking at the standard configparser library, which supports hierarchical settings, allowing you to specify defaults while letting the user override individual settings. There are also a few things that technically work but are not considered best practice in the Python world, like using 1 and 0 as flags (better to use True/False), calling str.__len__ instead of just len(), etc.
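For illustration, a minimal example of the hierarchical defaults configparser gives you (the section and option names here are made up, not emonhub's actual settings):

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[DEFAULT]
compress = false
timeout = 60

[emoncmsorg]
compress = true
""")

# A per-section value overrides the default...
print(config.getboolean("emoncmsorg", "compress"))  # True
# ...while unset options fall back to [DEFAULT].
print(config.getint("emoncmsorg", "timeout"))       # 60
```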


Looks like it’s working nicely here :slight_smile:

I had to put str() round the setting to get it to work at first, but after that it worked fine:

self._log.info("Setting " + self.name + " compress: " + str(setting))

In order to test, I added additional logging to check the compression ratio and verify that the compression was running. The URL-encoded output was also confusing, so I changed the log to show the content prior to URL encoding, as you suggested.

I found that with a single item in the buffer the compressed size is larger than the uncompressed size, so I have added a check to use compression only when it actually reduces the data size:

    # Construct post body
    post_body_data = {"data": data_string, "sentat": sentat}

    if self._settings['compress']:
        # Compress data and encode as a hex string (Python 2 style).
        compressed = zlib.compress(post_body_data["data"]).encode("hex")
        compression_ratio = 100 * len(compressed) / len(post_body_data["data"])

        # Only use the compressed form if it is actually smaller.
        if compression_ratio < 100:
            post_body_data["data"] = compressed
            # Set the flag so the server knows to decompress.
            post_body_data["c"] = 1

        # Log compression ratio
        self._log.info("compressed data size " + str(compression_ratio) + "% of original")

    post_body = urllib.urlencode(post_body_data)

    # apikey is shown as a placeholder in the log for security
    self._log.info("sending url:" + post_url + "E-M-O-N-C-M-S-A-P-I-K-E-Y, body data:")
    self._log.info(post_body_data)

The result in emonhub.log:

Buffer size: 1

2018-12-19 11:36:20,240 DEBUG    emoncmsorg Buffer size: 1
2018-12-19 11:36:20,244 INFO     emoncmsorg compressed data size 162% of original
2018-12-19 11:36:20,245 INFO     emoncmsorg sending url:http://192.168.0.132/emoncms/input/bulk.json?apikey=E-M-O-N-C-M-S-A-P-I-K-E-Y, body data:
2018-12-19 11:36:20,247 INFO     emoncmsorg {'data': '[[1545219379.938908,10,166,928,0,0,252.49,21.75,300,300,77281,41181,0,0,0,-55]]', 'sentat': 1545219380}

Buffer size: 5

2018-12-19 11:36:50,354 DEBUG    emoncmsorg Buffer size: 5
2018-12-19 11:36:50,356 INFO     emoncmsorg compressed data size 77% of original
2018-12-19 11:36:50,357 INFO     emoncmsorg sending url:http://192.168.0.132/emoncms/input/bulk.json?apikey=E-M-O-N-C-M-S-A-P-I-K-E-Y, body data:
2018-12-19 11:36:50,358 INFO     emoncmsorg {'c': 1, 'data': '789c75d0eb0dc3200c04e0855ccb3edb606689b2ff1a7589449236e5f1e704a70fb64dc3033a2c07e768dd9554489bd34092d444803de51c2028f7201399bb7724c855d3e605a157c44eab79184b58c209568b5de4bb0db7c8c885b57ab49afcda54c666185846d3656cfad7d59e5c2e51ef02e470e1c725e4c1760beb28e72d296c299b5c7b3f4a54f5a1d4fac9762aed4169533996d2f7fd0dccec50be', 'sentat': 1545219410}

Buffer size: 166

2018-12-19 11:56:23,625 DEBUG    emoncmsorg Buffer size: 166
2018-12-19 11:56:23,629 INFO     emoncmsorg compressed data size 43% of original

Would you be happy with these changes?

@pb66 and I had a really long discussion on proposed emonhub development earlier in the year here: https://community.openenergymonitor.org/t/emonhub-development/6432/44 Unfortunately neither of us has had time to pursue it further since that discussion, due to focus on other parts of the project. It would be great to get your input on it; if you have the time to read that thread and post a response sometime, that would be much appreciated.


Excellent - nice feature! I figure the main benefit will come when sending large chunks of data with the interval set to a higher threshold - surely in 30 minutes’ worth of data there will be plenty of repeated sequences that can be compressed to a greater extent.

Also worth noting that one can control the compression level zlib uses with an additional argument - perhaps this could be exposed as a user setting too. The default is 6, which is usually reasonable, but at the expense of extra computing power it can be pushed higher to get better ratios.
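A quick sketch of what that looks like on the zlib side; the payload is made up (a single reading repeated 50 times), so the absolute numbers are illustrative only:

```python
import zlib

# Made-up, highly repetitive payload: one reading repeated 50 times.
payload = ("[[1545219379.9,10,166,928,0,0,252.49,21.75,300]]" * 50).encode()

# zlib accepts a compression level from 1 (fastest) to 9 (smallest);
# the default corresponds to level 6.
sizes = {level: len(zlib.compress(payload, level)) for level in (1, 6, 9)}
print(sizes)
```

Exposing this in emonhub.conf would then just mean passing the configured integer as the second argument to zlib.compress.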

Sounds good - I will read through the thread, and maybe over the holidays I can submit a pull request or two to clean up some of the niggling issues.


Microsoft IIS and Apache certainly can automatically decode, for instance.

Rest assured, I would have preferred the approach you suggest, but it seems it would create lots of edge cases for differently configured HTTP servers. This automatic decompression behaviour is not part of the standard, only a feature of particular web servers. From the page you link:

However, the fact that the client is the first to send, means that there is no way for the server to signal its (in)capability to accept gzip encoding. Even the fact that it’s Apache and previously served up compressed content doesn’t guarantee the fact that it can handle it, since the input and output filters are two separate things.

That means that the emoncms PHP code would still have to try to identify whether the content had been automatically inflated or not. Even then, it would have to manually populate the $_REQUEST key/value superglobals for the benefit of the rest of emoncms, because PHP normally does this for you when the POST body is uncompressed. And because emoncms copies these superglobals into its own data structure quite early in the request, this repopulation would have to happen even earlier, not necessarily where I put the code. All of this made things more difficult than doing the inflation directly in the PHP code.
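For reference, the receiving side then just has to mirror the emonhub encoding: check the flag, hex-decode, and inflate. Sketched here in Python (the actual implementation in the PR is in PHP; the function below is a hypothetical illustration, not the emoncms code):

```python
import zlib

def decode_bulk_data(fields):
    """Sketch: undo the emonhub-side encoding when the 'c' flag is set.

    `fields` is the decoded POST body as a dict of strings; this is a
    hypothetical helper, not the actual emoncms code.
    """
    data = fields["data"]
    if fields.get("c") == "1":
        # Hex string back to bytes, then inflate the zlib stream.
        data = zlib.decompress(bytes.fromhex(data)).decode()
    return data
```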

@sean @TrystanLea

Might it be an idea to update the API documentation with the optional parameter? I think this could be used by anyone doing a bulk update of data.

[edit]
I’d also favour using a full name rather than just ‘c’, for clarity (or offer either, case-insensitive).