
Blips in MS5611 sensor data #23

Open
hpmax opened this issue Jul 30, 2020 · 79 comments

@hpmax
Contributor

hpmax commented Jul 30, 2020

When I turn on my OpenVario, I see a lot of +/- blips on the vario every few seconds, with a magnitude in the neighborhood of 0.5 m/s.

I decided to record the data out of the MS5611s and found that upon boot up, I'd see a spike high (of about 35-40 Pascals) followed by an immediate low spike of perhaps 10-15 Pascals. The high pulse lasted about 2 samples, and the low pulse lasted perhaps 10 samples.

Interestingly, the effect occurred on BOTH MS5611s simultaneously, and it repeated every 92-94 cycles (it varied).

I killed the daemon and restarted it, and the blips mostly or completely went away. I recorded twice, once for over 20 minutes and never saw a blip, once for over a minute and only saw one blip. This seems almost completely repeatable for me.

Blue trace is the restart, the other 3 were what it does on bootup.
Attachment: blips.zip
[image: blips]

@mihu-ov
Member

mihu-ov commented Jul 31, 2020

These blips (and incorrect pressure reading spikes) are an unsolved mystery and further investigation would be more than welcome. One known issue is RF noise from the VHF transmitter but this seems to be something different.

@hpmax
Contributor Author

hpmax commented Jul 31, 2020

@mihu-ov My glider has a Dittel FSG-2T but the unit was off. It is also equipped with a Stratux (which has no transmitter), and a SoftRF unit, both of which were on. The SoftRF is capable of transmitting at 900-ish MHz, although it's a pretty low power output. I also have multiple switching converters going from 12V to 5V to, among other things, power the SoftRF and Stratux... But their state should have been unaffected by stopping and restarting the daemon, so this seems unlikely.

So you are saying this is a known issue? I'm surprised more hasn't been done about this; it seems like a pretty significant problem. The blips do get through the Kalman filter and are quite noticeable.

@mihu-ov
Member

mihu-ov commented Jul 31, 2020

In my setup I have no issue with blips and only minimal effects when transmitting on VHF. In one of our club gliders (a DG300) there is an occasional blip, and the vario goes wild when transmitting. Your traces are probably an extreme case - that's great for investigating the problem (RF issues or anything else).

@hpmax
Contributor Author

hpmax commented Jul 31, 2020

Again, I see absolutely no reason to believe this is RF (at least from outside the OV) related. My VHF transceiver was off at the time. Those waveforms are 100% repeatable. When I start the OS, the red, orange, and purple traces are what happen. You can see it is regular. They are occurring every 92 to 94 samples. When I kill the process and restart the blips either mostly or completely go away. I do not see how an external device could be impacting that.

[image: blips2]

Blue is TE, Orange is static... out of the same data file.

@mihu-ov
Member

mihu-ov commented Jul 31, 2020

I agree, I don't believe your case is RF induced.
PX4/PX4-Autopilot#7311 describes a similar effect
https://px4.github.io/Firmware-Doxygen/d6/d17/ms5611_8cpp_source.html is the code used in PX4
Maybe a comparison of what they do differently would make sense?
Is there any difference in the timing of sensor reads during the blip? What is the sensor temperature readout at each sample?

@hpmax
Contributor Author

hpmax commented Jul 31, 2020

There are some signs that point to this, but others that don't:

  1. If the problem were an inadequate wait between conversion and readout, it would likely affect both sensors. It does.
  2. If a temperature conversion is corrupted, that temperature data is used for something like 10 pressure samples before it's re-sampled. That's about the length of the blip.

None of this explains why only temperature conversions are corrupted, why it's periodic, or why restarting makes the problem go away.

But here's the other thing... Even if the data is somehow corrupted, why is it consistently increasing for a couple of samples and then consistently decreasing for more samples? I mean, okay, maybe we have a bad temperature conversion, which produces a bad compensation value, so the next pressure reads are consistently high, and the corrupted temperature compensation keeps decreasing them until the next temperature conversion... but there are multiple high samples prior to multiple low samples, and the shape isn't very consistent.

Anyway, I'll try a few experiments:

  1. Modify the code to dump out the measurement counter and the temperature.
  2. Increase the usleep time to, say, 20 ms.
  3. Record the time between conversion and read.

@hpmax
Contributor Author

hpmax commented Aug 1, 2020

Okay, here's what I found:

I tried everything listed above and found nothing of use. Increasing the usleep time may have decreased the magnitude of the problem. The problem occurred at random meas_counter values and was not tied to a temperature. I tried adding a usleep at the beginning, and tried resetting in the middle of the run. And while the time between conversion and read was not terribly consistent, it was always longer than the requested usleep value. In other words, that whole exploration was a bust.

But I did find out something very interesting.

If sensord is started with: systemctl start sensord.service (or restart) it has the blips
If sensord is started with: /opt/bin/sensord -c /opt/conf/sensord.conf it does NOT have the blips.

This seems 100% repeatable. Doesn't matter if the service is disabled or enabled, doesn't matter if it's started, stopped, started again, or restarted, or whatever. Similarly, when I run it from the command line, it appears to run clean.

If I edit the /lib/systemd/system/sensord.service to change it from Type = forking to Type = simple, and change the command line to /opt/bin/sensord -f -c /opt/conf/sensord.conf it does NOT have the blips.

I disabled the service and added "/opt/bin/sensord -c /opt/conf/sensord.conf" to the ovmenu-ng.sh script, and it appears that is a workaround.

So something about systemctl type=forking is causing the problem. My solution is clearly a hack work-around, but hopefully someone who knows Linux better than me might have some ideas.
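For reference, the unit-file change described above (Type=simple plus the -f flag, in /lib/systemd/system/sensord.service) would look something like this; only the two lines mentioned in the comments change, the rest of the unit stays as shipped:

```ini
[Service]
Type=simple
ExecStart=/opt/bin/sensord -f -c /opt/conf/sensord.conf
```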

@kedder
Member

kedder commented Aug 1, 2020

Is sensord printing anything to stdout periodically?

@hpmax
Contributor Author

hpmax commented Aug 1, 2020

I'm not sure what you mean. I didn't pipe stdout to anything, and if it's run through systemctl I won't see stdout. I don't recall seeing anything when running it from the command line, and I wouldn't expect to unless I ran it in -f mode, which I didn't try. What were you expecting/looking for?

@kedder
Member

kedder commented Aug 1, 2020

Just trying to think up anything that might be different between running from systemd and console.

@hpmax
Contributor Author

hpmax commented Aug 1, 2020

I don't know. I was talking to a friend who is way more Linux-knowledgeable than me, and he was stumped too, but thought maybe it's run with different privileges or environment variables or something. But since it only fails as "type=forking", that doesn't make much sense... And I'm completely stumped as to why I'm the only person who seems to be having this problem.

@hpmax
Contributor Author

hpmax commented Aug 1, 2020

If you can think of something specific to try, I can certainly perform an experiment, but I went through everything obvious. I still have no idea what is causing the erroneous readings let alone why systemctl has anything to do with it.

@hpmax
Contributor Author

hpmax commented Aug 2, 2020

I figured something out...

The foreground code writes all the log data to stdout which is the terminal:
// open console again, but as file_pointer
fp_console = stdout;
stderr = stdout;
setbuf(fp_console, NULL);
setbuf(stderr, NULL);

// close the standard file descriptors
close(STDIN_FILENO);
//close(STDOUT_FILENO);
close(STDERR_FILENO);

The daemonizing code writes all the log data to sensord.log:
// close the standard file descriptors
close(STDIN_FILENO);
close(STDOUT_FILENO);
close(STDERR_FILENO);

// open file for log output
fp_console = fopen("sensord.log", "w+");
stderr = fp_console;
setbuf(fp_console, NULL);

I tried replacing the open file for log output from the daemonizing version to the foreground version and then ran it as a daemon through systemctl (type=forking) and... it looks like it worked.
[image: blipsnew]


Unfortunately, I also noticed a different kind of blip that I hadn't seen before. It only affects the static sensor, always seems to occur on both meas_counter==6 and 26 (and only those two), and it's much larger. In fact, the result almost seems random, although more often than not it's over 123000 and gets ignored by the pressure_read routine.

@mihu-ov
Member

mihu-ov commented Aug 3, 2020

Wow, great investigation, thank you! Ignoring obvious outliers was a rough hack but finding the root cause would be much better. There is some interesting reading on the effects of sample rate jitter on https://www.freesteel.co.uk/wpblog/2014/11/26/ms5611-altitude-sensor-very-sensitive-to-own-working-temperature/ and also http://blueflyvario.blogspot.com/2012/06/latency-and-timing.html

@hpmax
Contributor Author

hpmax commented Aug 3, 2020

I didn't get much out of the BlueFly blog. A lot of it seems to be him noticing that filters can cause latency; stuff like that is well understood and manageable. The Freesteel one, on the other hand, was definitely interesting. That doesn't look exactly like what I am seeing, but it's definitely similar. When I was looking at timing data I was looking for a short time, not an irregular time. All the times are irregular (although typically by less than 1 ms), and by looking at the data I've already collected I can't see an obvious pattern (but the data I have is time from conversion to read, not from conversion to conversion).

I think others have pointed out that the way we are accessing the sensors isn't great. It's very difficult to precisely control timing on a non-real-time system, and to do this properly the conversions would have to be timed in hardware. Despite that, I think we could improve timing control if that were necessary.

One obvious thing would be to monitor the time between samples and adjust usleep to better target a 12.5 ms interval. In other words, check the time immediately after waking up from usleep, then check the time right before going to usleep, and set the usleep time to the target interval minus the time spent awake.

Another possibility would be to run this as nice -10. If timing is getting screwed up by disk access (as might be implied by the fact that not writing to the log seemed to fix the problem), I'm not sure that changing the nice level will help, since the disk access might have higher priority than even nice -19 (or whatever the maximum un-niceness is).

What really bothers me is that I seem to be the only person who is affected. It's hard to believe... I'm using the Stefly hardware and the stock 20193 release and seeing the problem.

@hpmax
Contributor Author

hpmax commented Aug 3, 2020

Alright, I'm out of ideas at this point.

I tried nice -15; it didn't make much if any difference.

The glitches that I described two comments ago were caused by the code change, which is also what fixed the blips.

It seems these things are finicky. As a workaround, I am successfully running it with systemctl as Type=simple with the -f command line option. No glitch, no blips, and it's only a minor change (an edit to /lib/systemd/system/sensord.service) and it's good to go. That's how I plan to do it going forward.

I did notice we are recalculating the sensitivity and offset (the MS5611 internals) each time we read the pressure. Since these are uniquely tied to data that is calculated after a temperature read, it makes more sense to calculate them after each temperature read. This shouldn't change the results; we're just repeating calculations that don't need to be repeated.

I still would like to experiment with higher sampling rates (and averaging) and improving the accuracy of the timing. If we sampled on each 12.5 ms clock, in theory that would reduce the standard deviation of the error by a factor of 2 with no major cost. Another question is: is there a good reason why we are putting out updates every 50 ms? Does a vario actually benefit from being that fast, or would it be worth trading off some response rate for better filtering? (Admittedly, I don't have a good handle on the response rate of the Kalman filter.)

@kedder
Member

kedder commented Aug 3, 2020

If removing logging removes the blips, I'd suspect filesystem write cache flushes for this effect. Just for fun, try to redirect log to /var/volatile, e.g. like this:

ln -sf /var/volatile/log/sensord.log /sensord.log

and restart the device. Will it make any difference?

@kedder
Member

kedder commented Aug 3, 2020

Also, what is the content of that log when you see the blips? Is sensord writing to that log while the blips are happening?

@hpmax
Contributor Author

hpmax commented Aug 3, 2020

It's not clear to me that removing logging fixes it. But I can definitely try the symlink next time.

I don't think it's writing anything to sensord.log... just a bunch of stuff when it's kicked off. There's a -r command line option (or something like that) which records the output; that's what I was using. I modified the string -r puts out. I can't tell that the -r write is affecting things. I don't think much of anything is being written to sensord.log (/home/root/.xcsoar/logs/sensord.log).

In general I have not tried correlating sensor data to vario output. But I have no reason to believe the blips in the raw data weren't causing the vario blips. When the sensor is "quiet" there are no vario blips.

@iglesiasmarianod
Contributor

Hi guys, I had a similar issue some time ago. Periodically one of my pressure sensors spiked the vario. Not 5 m/s, but a solid 2 or 3 m/s every 3 or 4 seconds. In my case it was a bad solder joint.
Removed both sensors, resoldered them and my problem went away.
Perhaps a resolder of both sensors might help you too.
Another thing I would try is removing them one by one to see if one of them is faulty.

@hpmax
Contributor Author

hpmax commented Aug 4, 2020

Thanks for chiming in. Where did you get the hardware (and when)?

The blips I was seeing on the vario were about 0.5 m/s tops. I assume you didn't get to the point that you looked at the data as closely as I did (i.e. could confirm the simultaneous blips on static/TE and that your blips looked like my blips)...

@iglesiasmarianod
Contributor

You're welcome!
I've built my own boards following the schematics. This was around November/December 2019.
I had three assembled boards at that time and only one did that strange vario twitch.
I tested the boards with i2cdetect only.
I didn't dig deeper because the other boards worked OK with the same mainboard and same software image.

@hpmax
Contributor Author

hpmax commented Aug 4, 2020

I was using the stefly hardware, which I presume was assembled in a factory, so it seems unlikely there was a soldering issue. I was thinking if you had ordered from Stefan that perhaps there might have been a bad batch.

@hpmax
Contributor Author

hpmax commented Aug 4, 2020

I recorded the NMEA output and it was a bit disturbing when I looked at the graphs:

[image: firste]

This is a recording of the POV,E statement (i.e. vario output).

I have no idea what that hash is from 6500 to 10000.
There are also 19 spikes to over 4 m/s. I don't actually understand this. I don't recall hearing any of the spikes while recording the data... and for that matter, I wouldn't have expected this much data. I would expect 20 per second, and that implies I recorded over an hour, and I think I recorded for about 15 minutes max. So... wuuuhat?

After 70000 it gets noisier. At some point toward the end (which I assume is that portion) I restarted the daemon and turned on temperature readings (see my next PR). Now, I did something a bit stupid here: I initiated the temperature conversion AFTER initiating the MS5611 conversions. That's probably a no-no given the comments made on the Freesteel website. I tried a different variant which initiated the temperature conversion before the pressure conversions and...

std deviation with no temp conversions: ~0.019
std deviation with temp conversion (initiated before pressure): ~0.0315
std deviation with temp conversion (initiated after pressure): ~0.034

As you might expect, the standard deviation of temperature increases from 0.06 degrees C when initiated before pressure to 0.11 degrees after.

And yes, when looking at the data in the second run, which was only 4582 samples (4 minutes?), there was still a single blip -- with a magnitude of about +/- 0.15 mbar in the static (this of course is impacted by the IIR filter).

I'm thinking all these things are dumping noise into ground and it is sensitive to the quality of the ground.

Right now, we are performing both MS5611 conversions simultaneously, reading them out, and then waiting. There's no reason why we can't interleave the pressure readings.

The more I am looking at this, the more it feels like the hardware should be redesigned.

@iglesiasmarianod
Contributor

@hpmax I assume you recorded a sensord log file and are plotting it with Excel, right?
I've recorded a log file some time ago and uploaded it here:

Openvario/variod#15

It is nearly at the bottom of the thread. Perhaps you can analyze my file and see if my device also has these spikes.
Or just tell me what kind of log you need and I'll see how to get it and send it to you.

@hpmax
Contributor Author

hpmax commented Aug 4, 2020

All the plots were done in Octave (the GNU counterpart of Matlab). I probably could have used Excel or LibreOffice, but I like Octave.

The first few plots were from data out of sensord, using the equivalent of the -r command line option (i.e. raw and unfiltered). The last plot was from "Debug" of the NMEA stream from XCSoar, with a Perl script to strip out unnecessary values.

I'm about half done with a rewrite that will poll the sensors more frequently, more regularly, and fully interleaved.

I really hope I'm not trying to chase down a hardware problem in software. I don't understand how others don't seem to be running into this problem.

@iglesiasmarianod
Contributor

Great! Thanks!
My log is a raw sensord -r dump. It has around 3000 samples.
I just loaded it in Excel and saw two spikes with a shift in the mean value.
This is how it looks.

[image]

@hpmax
Contributor Author

hpmax commented Aug 4, 2020

So that doesn't look like the "blip" to me, but it does look like it might be a "spike". The blips were on the order of 0.3-0.5 and always started with a positive component; I never saw one that large. But the spikes that were happening more irregularly do look kind of like that.

Also interestingly, yours looks like it jumps downward afterward.

@hpmax
Contributor Author

hpmax commented Aug 16, 2020

I made a few variants of the software to test different sampling strategies. Running the daemon from the command line, I used:
/opt/bin/sensord -c /opt/conf/sensord.conf -s -r /home/root/testdata

I've never seen the chips below 30C, so the second-order compensation doesn't do squat, but I see little reason not to use it just in case. I'm not even sure why this is an option to turn on; it should just be on. There's virtually no penalty when you don't need it, and it's well worth it when you do.

I took a moving average (+/- 32 samples) and subtracted that out from the original data to try to remove the sensor drift.

My three sampling strategies were:

  1. What we do now. Convert both static and TE, wait 12.5 ms, read out, then wait another 37.5 ms, record values, and then wait 12.5 ms and repeat the whole cycle.
  2. Alternate between (reading out static, starting a conversion on TE) and (reading out TE, starting a conversion on static). Every 40 cycles I do a simultaneous temperature conversion on both sensors, but other than that, one sensor is converting and the other is sitting idle in each time slot.
  3. Do a conversion on both sensors simultaneously in each time slot, except every 40th slot, in which you do a temperature conversion.

There should be a benefit to running the sampling algorithm faster. Assuming the noise is uncorrelated and IID, the standard deviation will go down with the square root of the number of samples included. In other words, if you take 4000 samples and average them out into 1000 bins of 4, the standard deviation should be half what it was if you only had 1000 samples taken at 1/4th the rate.

When I took the standard deviation of the static and TE sensors, I got:

  1. TE = 1.3915 Pascals, Static = 1.4394
  2. TE = 1.4267 Pascals, Static = 2.1465
  3. TE = 2.0200 Pascals, Static = 2.0301

Side note: These numbers were calculated from "digoutP", "sens", and "off" in Octave. Sensord's p number has a slightly higher standard deviation (0.02 - 0.03 Pascals). I assume this is quantization error from doing all-integer calculations.

So now I tried averaging (2 samples in the case of #2, and 4 in the case of #3):
2) TE = 1.0876 Pascals, Static = 1.8383 (expect 1.0088 and 1.5178 if uncorrelated)
3) TE = 1.5074 Pascals, Static = 1.5023 (expect 1.01 and 1.0151 if uncorrelated)

Conclusions:

Clearly, the noise is NOT uncorrelated. Looking at the data, the results with a ~1.4 STD seemed to show an interference pattern that looked a little like envelope modulation. The ~2.1 STD showed what looked like a triangular waveform superimposed on it. It was very distinct and fairly large. It is bizarre to me that the two MS5611s behaved differently in sampling variant #2.

I did not see any of the previously described behaviors this time. I collected 92800 samples on variant #1, 28300 samples on variant #2, and 103700 samples on variant #3.

Worst cases were (based on subtracting the moving average):

  1. TE = 6.230769/-5.923077 Pascals, Static = 5.846154/-6.446154 Pascals
  2. TE = 7.769231/-5.538462 Pascals, Static = 8.492308/-6.615385 Pascals
  3. TE = 11.40000/-7.400000 Pascals, Static = 10.569231/-7.661538 Pascals

I'll also point out something interesting about the AMS5915. While datalogging, I have sometimes seen absurd dynamic pressures, -8 billion or something like that. I have to assume this is some sort of numeric nonsense, since the sensor is limited to 50 mbar (5000 Pascals). So I was trying to get to the bottom of it and checked over the ams5915.c code. It appears to be implementing things correctly, but it seems like they could have been implemented more intelligently. I wish I could catch it outputting something truly absurd, but it did output something odd: my dynamic pressure is -0.3318 with an output word of 1551. The datasheet provides the formula (1551 - 1638) * 50 / 13107 = Pressure, which is exactly what is calculated -- except that 1638 is supposed to be 0 pressure. Kinda weird. I guess it's possible it works slightly below 0 and this is just some sort of sensor error/drift, but it has been pretty consistent down there.

I have a few more experiments on tap. I'm going to make the following variants:

  1. A variant on #1 in which static and TE are both sampled every 4 slots, but with a 2-slot offset between them.
  2. A variant on the one above, but with the order of sampling reversed.
  3. A variant on #2 in which the order of sampling is reversed.
  4. A variant on #3 in which the number of slots before temp sampling is increased.

If anyone else has suggestions on experiments, I'm all ears.

@mihu-ov
Member

mihu-ov commented Aug 16, 2020

Thank you for putting such an effort into getting the best data out of our sensors! Frankly, I don't understand all the details yet, but here is a first try at interpreting your data:

  • With hardware and software "as is" (your case 1), the standard deviation of both sensors is ~1.4 Pa (0.014 mbar). At MSL this would equal about 0.11 m difference in altitude and is perfectly in line with reports from other people who have tested the MS5611 (mostly for UAV projects).
  • Resolution acc. to the MS5611 datasheet is 0.065 / 0.042 / 0.027 / 0.018 / 0.012 mbar at oversampling ratios of 256 / 512 / 1024 / 2048 / 4096. IIRC we use 4096x oversampling, and the SD you measure is in the same order of magnitude as the sensor resolution. Please correct me if I am wrong, but acc. to https://www.sml-inc.com/uncertainy.htm the minimum SD possible with this resolution would be 0.012 mbar / sqrt(3) = 0.007 mbar (0.7 Pa). Again, the 1.4 Pa SD you measured (2x the theoretical minimum) seems quite OK to me?
  • Comparing your measured SD with the graph you showed before (#23 (comment)) also fits quite well.
  • Reading between the lines I assume the blips that started this topic are gone? How did you get rid of "Unfortunately, I also noticed a different kind of blip that I hadn't seen before. It only affects the static sensor, always seems to occur on both meas_counter==6 and 26 (and only those two) and its much larger."?
  • For the AMS5915: You have a negative offset on your sensor, same as most AMS5915 I have tested. There's nothing wrong with that, and the datasheet formula already accounts for the possibility of negative offsets by shifting the nominal zero point to 1638 digits.
  • AMS5915 blips: "I have sometimes seen absurd dynamic pressures -8 billion or something like that." Can you test whether this is a bad sensor reading or some issue from sensord integer calculations?

So my current summary would be - please correct me if I am wrong:

  • Both MS5611 perform as expected with current hardware and software if we can avoid the blips
  • Avoiding blips will require software changes (please summarize!)
  • AMS5915 blips happen sometimes (probably not frequently?). Need to find out if this is bad raw data from sensor or issues from integer calculations

@hpmax
Contributor Author

hpmax commented Aug 20, 2020

It might be overkill, but I figure... future expansion. It's hard for me to gauge how real-time a task audio output is, for instance.

The next question is do we change the sensors...

If we replace the MS5611s, the best choice seems to be the LPS22HH. It might be better, I'm not sure.
The best replacement for the MPU9150 seems to be the LSM6DSM, however that does not have a magnetometer. Possible magnetometer choices include the MMC5983A, LIS2MDL, and AK09940. The MMC is allegedly the best of the bunch. These would definitely be better than the MPU9150 or 9250.

I am fairly happy with the AMS5915, and don't see many other good choices on this one, but I also don't see where you can buy them.

@hpmax
Contributor Author

hpmax commented Aug 21, 2020

Okay, I've had it running for several hours and am relatively happy with the results. I think it probably could still use some improvement, and it definitely could use a critical eye, but here goes:

First, I changed the ms5611 routines slightly. The sensitivity and offset calculations were moved into the read_temp routine instead of read_pressure. It really makes no sense to have them in read_pressure: the sens/off variables should only change as a result of a change in temperature, so why not calculate them after a temperature read rather than after a pressure read? Second, I separated out the read_pressure and compensation routines (i.e. there is now an ms5611_calculate_pressure). Third, the sens/off calculation in read_temp is not always performed; it depends on the value of another parameter passed into it.

As for the pressure_measurement_handler:

I added a new variable called "glitch", normally 0. If glitch is 0, I record the raw temperature data (D2, sens, and off) for each sensor into an 8-slot circular buffer, based on the last read value.

If too long a delay (>3 ms beyond anticipated) is detected, I take the average of all values in the circular buffer (if glitch was already 0 when detected) and increment glitch by (4 + (glitch magnitude / 50 ms)).

Measure the pressure of one sensor and the temperature of the other, alternating each time through based on meas_counter & 1.

if (glitch > 0) {
    if (--glitch == 0) glitch = 1; // decrement glitch each time through, but don't go below 1
    if (last_read_D2 > avg_D2 - 15) glitch = 0;
    else last_read_D1 += (last_read_D2 - avg_D2) * K;
}

Calculate the pressure value from D1, using either the last sens/off based on D2 for that sensor, or the average sens/off for that sensor if glitch > 0.

If glitch < 2, process the calculated pressure; otherwise discard it.

So basically the algorithm is: if a glitch is detected, save the last known good D2 (raw temp) values. Keep doing what you're doing, but compensate the D1 value by the difference in D2 between current and last known good, times a constant. When reading the temperature out, you are just doing it to get the D2 value, not to update sens/off. Also, depending on how big the glitch is, we throw out the first couple of pressure samples (not passing them through to the filter), since the impact of the glitch is proportional to both its size and the time since the glitch.

This is running at nearly twice the speed of the previous program, so we now need to adjust the way we are filtering. In the case of pitot and static, there are three obvious ways:

  1. Before passing to the next filter, decimate: Y(i) = (X(2i) + X(2i-1))/2
  2. Before passing to the next filter, do an FIR: Y(i) = (X(i) + X(i-1))/2
  3. Adjust the IIR to do Y(i) = Y(i-1)*7/8 + X(i)/8 instead of Y(i-1)*3/4 + X(i)/4

The third option produces slightly better results, although I suspect it has a slightly longer time constant than the others. I'm inclined to go with option #3 because in many ways it's the simplest.

Like the IIR, the Kalman filter can just be run more often, although some tweaking of the gain may be needed.

Standard deviation pre-filter is about 0.86, post IIR filter about 0.28 Pascals. Worst case is about +/- 6 Pascals pre-filter and +/- 1.4 Pascals post-filter.

Some tweaking of the nmea_message handler may be needed for the new meas_handler; deciding when to dump out data would be good.

Anyway, this is the basic algorithm. Unfortunately, the devil is in the details, and it's easy to miss something and use old data when you're not supposed to, or vice versa. The actual code is uglier than the pseudo code. I'm not sure my glitch discard is working correctly, and I'm also not sure what K is supposed to be. I picked 0.297 (30/101) based on the data I collected before. I'm concerned that it may be sensor-specific and/or temperature-specific. I.e., there are two compensation routines (TE_D1 += (Stat_D2 - Avg_Stat_D2) * K1) and (Stat_D1 += (TE_D2 - Avg_TE_D2) * K2); if the K values are different, it means I will need a calibration routine to determine the correct K values. Probably not a bad idea either way.

Another question I just thought of: perhaps we don't want to just read the temperature value and use it directly, but rather run it through a rolling average (FIR filter) or an IIR filter. It might be worth checking whether this reduces noise a little. I presume there is as much noise in the temperature measurement as in the pressure measurement, so filtering the temperature measurement might actually work well with this. Plus, it fits this scheme nicely (it actually makes things simpler than my current method). I guess I should try that.
Comments?

@hpmax
Copy link
Contributor Author

hpmax commented Aug 22, 2020

I think I am close to done messing with it...

histogram

Standard deviation of Kalman output (TE pressure) improved from 0.02 m/s to 0.0144 m/s.
Standard deviation of IIR output (static pressure) improved from 0.55 pascals to 0.29 pascals.
Standard deviation of IIR output (pitot pressure) improved from 0.64 pascals to 0.54 pascals.
Standard deviation of (unfiltered) TE pressure improved from 1.394 pascals to 1.2312 pascals.
Standard deviation of (unfiltered) static pressure improved from 1.435 pascals to 1.2576 pascals.
Standard deviation of (unfiltered) dynamic pressure improved from 1.24 pascals to 1.113 pascals (no idea why).
2-point averaging improves unfiltered dynamic pressure from 1.113 to 0.966 pascals.
4-point averaging improves unfiltered dynamic pressure from 1.113 to 0.82 pascals (this is effectively equivalent to the same data rate as the baseline).

Note: Averaging (on either the input or output) improves filtered data only slightly, so there isn't much point to averaging if you are filtering.

The new program outputs pressures twice as often as the old one; however, the time constants should be close to the same. I'm not sure about the Kalman filter, since I don't understand its dynamics. It produces twice as much data as it used to, but since all output is filtered, I may just decimate the data. The Kalman filter takes < 4 seconds to settle to less than .05 m/s.

In several hours of testing, I've seen only two "blips". One of them I suspect was me closing up the trailer. The other... I don't know what it was... Pressure dropped about 4-5 Pascals and came back up, in a triangular shape over the course of a few seconds. It looked nothing like the timing jitter noise I had seen, and was perfectly correlated between static and TE.

I'm still working on getting a new Kalman filter. The new filter will use all three sensors to filter all three pressures simultaneously. That means, in theory, you will only need two sensors (albeit I'm not sure that's actually true, since there is some drift between the sensors).

@hpmax
Copy link
Contributor Author

hpmax commented Aug 24, 2020

I have posted a forked version (hpmax/sensord). I am hoping it will have substantially better noise performance but otherwise function like the current baseline. Please try it out!

@iglesiasmarianod
Copy link
Contributor

Hi guys. This is the port I was thinking might be used for the hardware sensor driver.
I believe it was probably intended for another SD card.
We have 3.3V there. The ESP32 runs on 3.3V, so I think we don't need the MAX3232 and we can use TTL directly.
I'll try to set up UART6 on those pins and see what happens.

image

@hpmax
Copy link
Contributor Author

hpmax commented Aug 25, 2020

Before we go too far off the deep-end of a new sensordboard, please try my new software. I think it's a significant improvement over the old one... and I think I can squeeze out more improvements (which I think may be pretty significant).

  1. Right now, my compensation scheme is to simply multiply by .297. One thing I'd like to do is change sensorcal, to continuously read off the sensor and periodically throw in delays to induce the glitches. I'd then take repeated measurements and perform linear regressions, not just first order, but second, or third, to see if I can better dial in the compensation and do it as part of the calibration routine.

  2. I am hoping to roll out a new Kalman filter which I think will be better able to reject the glitches, as well as allowing it to work without TE. (Note that having a second MS5611 is vital to my current compensation scheme.) Incorporating both sensors should in theory further reduce noise, since we now essentially have twice as much data to work with and the noise between them should be completely uncorrelated. In other words, I'm hopeful we could reduce standard deviations another 40% or so.

However, if we do decide to go to new hardware, I think to make it worth the effort we should go all out and ensure we are not only using pressure measurements but also accelerometer, gyro, magnetometer, and GPS to do a full IMU. This will require significant software effort. It may require software changes to XCSoar to communicate IAS, TAS, compass direction, altitude, etc., and/or two-way communication with XCSoar. It will raise serious questions... Should the board be external? Should it have the audio in it, or a signal to drive a remote display (or multiple remote displays, or multiple XCSoars -- imagine having two XCSoar units, one in front and one in back, with a common sensor box feeding both)?

I personally like the idea of moving it outside of OV, and connecting it via USB. I'm not sure if ESP32 supports USB though, and I'd like to see it have multiple USBs, where it could drive multiple XCSoars, as well as be able to be directly connected to a GPS. Using an external GPS receiver would give the option of upgrading as newer versions come out with better performance.

We should also decide on which specific sensors to use: MS5611 vs LPS22HH, for instance? MS4515DO vs AMS5915? LSM6DSM and MMC5983A for the acc/gyro/magnetometer?

This is a big undertaking. Is everyone in?

@iglesiasmarianod
Copy link
Contributor

iglesiasmarianod commented Aug 25, 2020 via email

@hpmax
Copy link
Contributor Author

hpmax commented Aug 25, 2020

I think the MPU9150 could probably be used without much impact on what I did. Although I do not have interrupt-driven behavior, my usleep()s are now semi-intelligent, waiting only long enough to get to when the program needs to wake up again. That means even if it spends 3 ms calculating stuff, the program should compensate.
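The deadline-based sleep idea can be sketched like this (names and the cycle period are mine, chosen for illustration; the point is that compute time between samples doesn't accumulate as timing error):

```python
import time

CYCLE = 0.0125  # hypothetical sample period in seconds

def sleep_until(deadline):
    # Sleep only for the time remaining to the deadline, so milliseconds
    # spent computing since the last wakeup don't push the schedule back.
    remaining = deadline - time.monotonic()
    if remaining > 0:
        time.sleep(remaining)
    return deadline + CYCLE  # next deadline on a fixed-rate schedule
```

Because the next deadline is derived from the previous deadline rather than from "now", a slow iteration shortens the following sleep instead of delaying every subsequent sample.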

John Wells did have a variant with the MPU9150. It's worth pointing out that the MPU9150 is pretty old. They aren't being sold any more; the 9250 (which is also end-of-life) has significantly better performance, and the LSM6DSM is much better still. However, I think it's JUST an AHRS. In other words, the 9150 isn't aware of what's going on with the pressure sensors or vice versa, and neither is aware of what's going on with GPS. These are really pretty critical to integrate together. In fact, in theory, they should be integrated at a lower level with the GPS raw data. I think some GPS units can do that. Again, that would require some serious expertise, which I'm not sure any of us has. My friend has some, but he's not going to be willing to devote that much time to it.

I really don't have a good feel for how much we are loading down the Cubieboard. In theory, I wish we could start again with a Raspberry Pi. Because, you know, it's actively developed, it's far newer, and it incorporates WiFi and BT. Of course, it doesn't have the LVDS interface and it probably consumes more power.

I am not clear what that board you were showing was. It doesn't appear to be either the Cubieboard or the Adapterboard (at least not the one I got from Stefan). I am confused, though. Are you proposing to stick an ESP32 on the Adapterboard and then use that to interface to the current Sensorboard?

@iglesiasmarianod
Copy link
Contributor

That's the idea: connecting the IDC cable to the ESP32 and the ESP32 to the Cubieboard with another IDC cable.
With a driver inside the ESP32 we can control sensor timing the way we want and feed the Cubieboard preprocessed data. If the Cubieboard writes to disk or has a higher priority task, it doesn't affect sensor timing.

The adapterboard I posted is the design you can download from Openvario.org.

I wanted to change the I2C pins to UART pins on the IDC connector, but that would lead us to lose the ADC converter (voltage) that is soldered to the adapterboard. Using that free port would allow us to have an internal UART and still have voltage readings.
I didn't know Stefan's boards were that different.

I think that enabling functions one by one can be beneficial too. We can tackle the full IMU step by step.
If we have magnetic heading we can calculate wind inside XCSoar without turning or changing course.
It is a good feature to have if you fly the wave.

@hpmax
Copy link
Contributor Author

hpmax commented Aug 25, 2020

I thought ESP32 was Timo's idea. However, I don't really see the benefit of an adapter-adapter-board. It makes the whole contraption more mechanically complicated. Now you have to figure out a way to mount another board and route more cables... And yes, it's another board, because even if you don't need 3.3V (you already have it), you'll still need two IDC cables and two mounting points.

Personally, I think it would be better to just redesign the sensorboard. Obviously, you'd want to design it in such a way that you could re-use the manifold block. The only real pricey part is the AMS5915 or MS4515DO; the rest are probably less than thirty dollars total. You'd still need a custom board regardless. Plus, changing the sensorboard allows fewer mechanical changes, a more straightforward construction, and the opportunity to upgrade to newer sensors that aren't EOL. It is pretty frustrating how much of this project is EOL. My audio amplifier apparently died, and the one in there is an AS1702, which appears to be EOL. I found one company selling it, and they have 25 left. To make things more complicated, like I said, I don't think Stefan's boards match what you posted. I don't see those holes on the board. I do see an unused header mounted between the IDC connector and the Cubieboard connector, but I think it has smaller-pitch holes than the ones you show.

Why not just bite the bullet, and do a proper redesign with modern hardware and get all this done in one go?

Imho, the big task here is not the hardware, it's the software. It's going to be a complete rewrite of sensord, possibly of variod, and there will likely be changes to xcsoar, and all new software in the ESP32. And that is the case whether we use the existing hardware + ESP32 condom or not

@iglesiasmarianod
Copy link
Contributor

iglesiasmarianod commented Aug 25, 2020

Yes, Timo proposed ESP32, connected outside on an IGC RJ45 connector.
I'm trying to get it inside the case so I don't lose an RJ45 and have a tidier installation.
Don't have a strong opinion on staying with this hardware.
I've upgraded audio, connectors and IMU of my sensorboard myself because I couldn't find the components.
Audio died on me a couple of times because I connected it with power and shorted the output.
If we can upgrade let's do it!

@hpmax
Copy link
Contributor Author

hpmax commented Aug 25, 2020

I'm not in favor of doing a stop-gap update. Like I said, the cost to spin a new board isn't that high in the grand scheme of things, although assembling isn't that fun. I'd like to use JLCPCB but they have a limited selection of parts. :(

I think my software changes, along with the other potential improvements that I'd like to roll out within the next few weeks, should be more than adequate to justify not monkeying with the hardware unless we are doing a thorough update. And that would require someone to commit to working on all the software. Preferably someone who has real experience with IMUs and Kalman filters.

Also, that brings up another interesting thing. I think Stefan's audio amp is on a daughter board attached to the adapter board; it's not on the sensor board. He's also using a linear amp. I don't know if it would be better to switch to a Class D amplifier.

I wish there wasn't so much forking on the hardware. It's also worth pointing out that if we modify the hardware (whether with the ESP32 adapter or a full new sensorboard), we're going to wind up with two software versions which are incompatible with one another.

@hpmax
Copy link
Contributor Author

hpmax commented Aug 27, 2020

I have committed a new update into hpmax/sensord

The new change is the addition of a program called "compdata". Compdata logs data off the MS5611s and then writes it to disk. It records the data in memory and then, when finished (after about 1.5 hours), dumps it all to disk in two text files. If someone wants to download this, compile it, and run it for me, that'd be great. I'd like to collect data on setups other than my own. Also, mine will possibly be out of commission for the next 1-2 weeks.

Please try to keep the air around the sensors as still as possible. Long term drift isn't going to bother me, but short term fluctuations because of movement or something may disturb it.

@iglesiasmarianod
Copy link
Contributor

Hi @hpmax. Tried to compile and run compdata to gather information but make fails.
Is it possible you have a typo here in the makefile?

all: sensord sensorcal cal

error message reads:

make: *** No rule to make target 'cal', needed by 'all'. Stop.

@hpmax
Copy link
Contributor Author

hpmax commented Aug 31, 2020

Not sure what happened there, I may not have pushed all the files or something. Try again. I also included a pre-compiled compdata which should work.

Unfortunately, my OV is down for repairs so I can't test it, I'm hoping to get it fixed Monday or Tuesday.

@iglesiasmarianod
Copy link
Contributor

@hpmax I used your .bin file. The typo in the makefile is still there and I couldn't compile the code.
Here are both text files with data. Hope it helps. If you need another run let me know.

tep_data.txt
static_data.txt

@hpmax
Copy link
Contributor Author

hpmax commented Sep 4, 2020

@iglesiasmarianod Thanks for the data. Unfortunately, I don't have my development or testing setups available to me at all, hopefully, I should be back to "normal" in a week or so.

I did analyze your data... and if you want to, you can too, data format is:

The first row is C data (ie stuff stored in the EEPROM of the ms5611), you can ignore it for now, the remainder of the data looks like:
TepD1F TepD1 TepD2F TepD2 StaticD1F StaticD1 StaticD2F StaticD2

Where F stands for filtered. Filtering stops as soon as the glitch starts, so the F values will be frozen at the start of each glitch and not change again until the next glitch. What I'm looking for is a relationship between (D2F-D2) on one sensor and (D1F-D1) on the other sensor. In other words, if we assume that D2 should stay constant during the glitch (D2 is proportional to temperature), how much does D1 change relative to the change in D2, which we know to be bogus.

It looks like every sensor is different, but not very different. It's possible that the differences are proportional to the C values, but I haven't looked into that yet. As a result, we could probably use a default value and allow people to calibrate if they wish to improve performance. It's possible that, if the relationship to the C values were understood, the default values could be improved, but I doubt it would help if you cal'ed the individual sensors anyway. There is a temperature relationship.

At the start of your run, it appears the linear/scale constant is -.32923, whereas at the end of your run it was -.31188 (and your sensor clearly heated up). I had measured 0.31 on mine. The other sensor went from -.32975 to -.30881. There is also a scalar term that varies from 37 to 18.24 on the first sensor, and from 46.09 to 19.15 on the second. I am hoping this range is adequate to extrapolate. From what I can tell, adding a second-order term doesn't help much.
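One way to extract that linear constant from the logged columns is a least-squares slope through the origin of the glitch deltas. This is a sketch of the idea, not the actual compdata analysis; column names follow the format described above:

```python
def fit_k(d1, d1f, d2, d2f):
    # Fit dD1 ≈ K * dD2, where the deltas are (unfiltered - frozen filtered)
    # during a glitch, with one sensor's D1 against the other's D2.
    num = den = 0.0
    for a, af, b, bf in zip(d1, d1f, d2, d2f):
        num += (a - af) * (b - bf)
        den += (b - bf) ** 2
    return num / den
```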

@hpmax
Copy link
Contributor Author

hpmax commented Sep 4, 2020

filtered

Brown trace is StaticD1-StaticD1f.
Blue trace is StaticD1-StaticD1f + (TepD2-TepD2f) * (TepD2f * K1 + K2) + (TepD2f * K3 + K4).
Previously I was using StaticD1-StaticD1f + (TepD2-TepD2f) * -0.31.

For some perspective, the brown trace has a mean of -400 and a std deviation of 445. By comparison, the compensated data has a mean of 1 and a std deviation of 75. Target for both is 0. Note that these are both in raw D1 values. D1 is about 8.7e6; if we assume 1:1 linearity, that means 1 Pascal is 87 in D1 space.

Also, note that I only captured data during glitches, so this is not an accurate reflection of what the average and std deviation truly would look like in actual usage.

@mihu-ov
Copy link
Member

mihu-ov commented Sep 6, 2020

Just thinking, could the amplifier on the sensorboard cause issues due to switching noise? Any change in sensor data if power and input signal is removed from the amp?

@hpmax
Copy link
Contributor Author

hpmax commented Sep 7, 2020

@mihu-ov I'm using Stefan's hardware. On it, the amplifier is on a daughter board off the adapter board and isn't on the sensor board. It is also a Class AB amplifier, which shouldn't throw a lot of noise out. Mine failed, and I am in the process of replacing it with a Class D, because I'm hoping it will be more robust.

That said, those spikes you see on the brown trace were deliberately injected by adjusting timing delays. The reason the amplitude is different is because I randomized the delay. The longer the delay, the bigger the spike. Obviously you can see there is some correlation between the orange trace and the blue trace, but I did my best to compensate.

There is noise in the D1 data, but I think we have to expect it. The question is how much is caused by electrical noise. I really don't know. At this point I can't tell that it's a real problem, or that @iglesiasmarianod's data is worse than mine. Right now I'm only recording data during timing glitches. Perhaps we could try recording data under normal circumstances and see how much difference there is.

Looking at the schematic, I'm actually fairly happy with it, although I might throw in a 10uF cap on the input to all four regulators, and an extra 1uF on the output, next to the .1uF. However... looking at the BOM, I don't have a lot of confidence in the capacitors. Unfortunately, with MLCC capacitors, it's important to know about the specific capacitor you are using, because some have very substantial capacitance loss over the working voltage, and there can be wide variation in self-resonant frequency and ESR as well. Some lose capacitance with frequency, up until they gain it back again when they go self-resonant.

@hpmax
Copy link
Contributor Author

hpmax commented Sep 12, 2020

Just to let you guys know, I bought some LPS22HHs, ESP32-WROOM32Ds, and LSM6DSMs to play around with. I'll try to design a board and see how good the results are compared to what we have now. I'll also want to get MMC5983A and MS4515DO (pin-compatible equivalent to the AMS5915, but claims somewhat better performance -- price appears to be slightly better and it is more available in the US, but it's still expensive and not widely available) but unfortunately Newark didn't have those.

I'm also building up an amplifier based on the MAX98304 to see how bad it is from an EMI perspective.

@hpmax
Copy link
Contributor Author

hpmax commented Oct 19, 2020

I'm basically done with a new software update. There is nothing Earth-shattering here, but I have added configurable quadratic compensation, and the compensation for the two sensors can be sensor-specific (i.e. 2 sets of coefficients). I don't think the quadratic (versus linear) is that important, but it's basically free, so why not? Configurable individual coefficients are a big deal though. One of my sensors was using a linear term of -.33, whereas the other was -.29. And swapping the coefficients made the jitter noise readily apparent when graphing the data, whereas using the correct ones it was MUCH harder to spot.
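A sketch of what per-sensor quadratic compensation could look like (the coefficient layout and names are my guess; only the linear terms of about -.33 and -.29 come from this thread):

```python
def compensate_d1_quadratic(d1, dd2, c2, c1, c0):
    # dd2 is the spurious D2 shift on the companion sensor during a glitch.
    # A quadratic in dd2 replaces the single linear constant K used earlier;
    # with c2 = c0 = 0 this reduces to the old linear scheme.
    return d1 + c2 * dd2 * dd2 + c1 * dd2 + c0
```

Each sensor would get its own (c2, c1, c0) triple from the configuration file, which is what makes swapped coefficients visible as extra jitter.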

I have significantly enhanced compdata, so it will now run tests and calculate a set of coefficients and write them into the configuration file (note it just appends them to the end -- so you could end up running it 5x and getting 5 different coefficient sets). It's actually a pretty slick setup, and I think we want to include this in the ov-menu.

Now the problem... It always gets within 10%, but it doesn't always get better than that, and 10% isn't that great. So you may still want to hand tweak this and I am trying to figure out how to make this easy for the average user. I was thinking of adding in a feature "-nj" which deliberately adds noise jitter. You can then use "-r" to record data and plot it in Matlab/Octave/Excel/whatever. If you are seeing high jitter blips you want to increase the magnitude of the linear term. If you see low jitter blips you want to decrease the magnitude of the linear term. Perhaps a binary search could be done, although I'm not sure how you would handle the quadratic and constant terms.

The other question is how much do the compensation coefficients drift with temperature/pressure/time/whatever.

Anyone got any ideas here?

@hpmax
Copy link
Contributor Author

hpmax commented Oct 19, 2020

I'm starting to re-think this. After tuning it up as well as I could get it, I let it run overnight and looked at the results. It started out real good, but over time it got worse. A lot worse. It's possible it's temperature sensitive, but I can't find much evidence that's the case. I'm still working on that, but I'm not expecting to see a clear correlation. And I've looked at the temperature of both sensors, as well as the change in the ratiometric difference between the sensors (and there is some over time, but it's pretty subtle).

One thought I've had is dynamically adjusting the compensation. In other words, after each detected jitter event, I analyze the data generated during the event and re-calculate the coefficients to use for the next event. But of course, I don't know if the frequency or magnitude of the jitter affects this. Magnitude presumably doesn't have a huge impact, because that would show up in the quadratic term, and the compensation is almost entirely in the linear term. One problem I see is that I am deliberately creating timing jitter to measure the results. This can be done with intentionally large amounts, which gives nice clear results; using smaller amounts of jitter might result in a worse fit, and unfortunately in actual use I don't have control over how large or frequent the timing error is, and I certainly don't want to deliberately induce it.

@hpmax
Copy link
Contributor Author

hpmax commented Oct 30, 2020

I'm working on a new release that should be done in the next couple days unless something happens.

First some definitions: my current method of measuring noise is to look at the difference between the current sample and the previous one, ignoring the first 1000 samples during which the filter is settling. When I talk about filtering something, in general I am referring to an IIR filter in which Y(k) = (Y(k-1)*7 + X(k))/8.
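That noise metric, written out as a sketch (names mine):

```python
import statistics

def step_noise(samples, settle=1000):
    # Std deviation of successive differences, ignoring the first `settle`
    # samples while the filter settles -- the metric defined above.
    s = samples[settle:]
    return statistics.stdev(b - a for a, b in zip(s, s[1:]))
```

Note that a steady ramp scores zero on this metric (constant successive differences), so it measures sample-to-sample jitter rather than drift.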

The new software implements the following changes:

  1. Currently, resolution is .01 mbar (which is roughly 8 cm at sea level). I have increased resolution by a factor of 16. This doesn't improve accuracy and generally may seem kind of silly, but it did reduce the std deviation of noise in the filtered output by ~3%, and it's practically free.

  2. Current code uses the last (non-glitched) D2 (temperature) value to calculate the coefficients to measure pressure. New code uses the filtered D2 values. Improvement is ~1-2% in std deviation of noise, and it is free.

  3. For reasons I don't understand and haven't looked into, my algorithm of looking for timing errors of > 2 ms still misses some significant, but not huge, glitches. I am now detecting glitches based on both timing error and a cycle-to-cycle drop of more than 300 D2 counts on the temp sensor. This doesn't have a huge impact on std deviation, but my worst-case noise has easily been halved by doing this. It is possible other people's sensors are different.

  4. Previously, when a glitch was detected, it would go into glitch compensation mode, and only leave after a pre-programmed amount of time AND after the new temperature reading was within 15 counts of the filtered temperature reading (note that the filtered temperature reading isn't updated in glitch mode, so it's the filtered output immediately prior to entering glitch mode). The new version changes this substantially:

First, when a glitch is encountered, the program attempts to estimate (overestimate) the length of the glitch based on its size (in D2 minus filtered D2) and time constants that have been observed. When a glitch is first observed, it looks for the maximum change in temperature vs filtered temperature over the first few cycles, then conservatively calculates how long it thinks the glitch will take to return to normal. It exits glitch mode after the timer elapses (in cycles, not seconds) OR the temperature reading is within 15 counts of the filtered temperature -- note that it can't exit glitch compensation mode during the first few cycles while it's observing the glitch magnitude, as I've seen it do weird things in the first or second cycle after the glitch.

Secondly, there is an additional counter, which is essentially a watchdog counter. Glitch mode is re-triggerable, and I worry that, in theory, it could stay in glitch mode forever, with the filtered temp value remaining stationary while the actual temperature moves. As such, if it stays in glitch mode for too long, it will now exit glitch mode and update the filtered temperature values to the current temperature value as a fail-safe. This may produce some temporary ugliness in the output values, but it should resolve in a fraction of a second and then it will be over.

I believe the problem in the previous post was a rapid change in temperature which caused it to never exit glitch mode. Once it fails to leave, the filtered value becomes more and more wrong, and any future glitches screw up the compensation badly. My first fix likely goes a long way to addressing this by guaranteeing it gets out after a fixed period of time. But in theory, it's still possible that it could get stuck if it keeps getting re-triggered. The watchdog addresses this unlikely possibility.

  5. I have very occasionally seen totally bogus values (usually very low) that cannot possibly be right. I don't know the origin of this, but I have established a filter that rejects any value that is more than 100k counts from the previous value. This filter may need to be improved.

  6. I have updated the compensation code so that it uses quadratic compensation, not linear. I'm not sure this makes much difference, but again, it costs almost nothing. The compdata program that I wrote now analyzes the data it takes and writes the quadratic coefficients into the configuration file. It produces two sets of coefficients, one for compensating the TE sensor and one for the static -- and on mine, they are about 10% different.
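The glitch-mode exit logic described in point 4, with the watchdog, might look roughly like this (only the 15-count exit band comes from the text; the cycle counts and names are hypothetical):

```python
EXIT_BAND = 15       # counts: exit band from the description above
OBSERVE_CYCLES = 3   # hypothetical: cycles spent sizing the glitch first
WATCHDOG_MAX = 500   # hypothetical: hard cap on cycles in glitch mode

def glitch_step(in_glitch, cycles_in, timer, d2, d2_filtered):
    """One cycle of the re-triggerable glitch mode. Returns
    (in_glitch, cycles_in, timer, resync), where resync means the watchdog
    fired and the frozen filtered temperature should be snapped to d2."""
    if not in_glitch:
        return (False, 0, 0, False)
    cycles_in += 1
    timer -= 1
    if cycles_in >= WATCHDOG_MAX:
        return (False, 0, 0, True)   # fail-safe: force exit and resync
    if cycles_in > OBSERVE_CYCLES and (timer <= 0 or
                                       abs(d2 - d2_filtered) <= EXIT_BAND):
        return (False, 0, 0, False)  # normal exit
    return (True, cycles_in, timer, False)
```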

I've been running the new code nearly continuously for the past 24 hours and have been very happy with what I'm seeing:

47.5% has less than 1 count sample to sample variation.
79.1% has less than 2 counts sample to sample variation.
93.9% has less than 3 counts sample to sample variation.
98.7% has less than 4 counts sample to sample variation.
99.8% has less than 5 counts sample to sample variation.
99.98% has less than 6 counts sample to sample variation.
99.998% has less than 7 counts sample to sample variation.
Worst case is about 8.3125 counts, standard deviation is 1.6192 (1 count = .01 mbar, ~ 8 cm at sea level)

Filtering with the (7*Y(k-1)+X(k))/8 IIR filter we get:
49.9% has less than 0.1 counts sample to sample variation.
82.2% has less than 0.2 counts sample to sample variation.
95.7% has less than 0.3 counts sample to sample variation.
99.3% has less than 0.4 counts sample to sample variation.
99.92% has less than 0.5 counts sample to sample variation.
99.993% has less than 0.6 counts sample to sample variation.
Worst case is 0.82133 counts, standard deviation is 0.14861

Unfiltered:
dist
Filtered:
fdist

The future:

I have no plans for additional software updates to address the glitches (except maybe point 5) ... however after this, my next software update will probably add the temperature sensor (DS18B20) back in.

More long term (yeah, I know, I've been promising it for a while): before the end of the year I'm hoping to have new Kalman filter code. The Kalman filter code should be a major improvement. First, the noise in the two sensors appears to be almost completely uncorrelated, which means that, along with the standard deviation data collected by compdata on the two sensors, both sensors can be used to estimate both static and TE pressure. This should decrease noise by nearly 40%. But it also means that if you don't have a TE probe, you should plumb the static line to both the static and TE ports -- and this will be an option in the configuration. It'll be interesting to see which gives better results. I also plan to use the glider's polar to further improve the estimate...


NOTE: Comparing the standard deviation to the previous standard deviations I listed, it seems something got worse... But in reality, it's just how the data was presented. The data shown above was based on taking the standard deviation of the delta between subsequent samples. Previously, I had taken the standard deviation of the data minus a +/- 32 point moving average. I went back and recalculated the data using the old method and got the following:

61.8% has less than 1 count difference from average.
92.0% has less than 2 counts difference from average.
99.1% has less than 3 counts difference from average.
99.95% has less than 4 counts difference from average.
99.998% has less than 5 counts difference from average.
100% has less than 6 counts difference from average.
Worst case is about 6.2048 counts, standard deviation is 1.144.

I also ran it a second time with twice as much data and std deviation further decreased to 1.12 and the rest of the distribution improved as well. So the new code in fact appears to be about 10% better than my August 22nd numbers, which in turn were better than my August 16th numbers.

@hpmax
Copy link
Contributor Author

hpmax commented Nov 5, 2020

Bad news... I hooked up a vacuum pump and dropped the pressure to about half an atmosphere (about 9500 feet?) and the coefficients changed. I guess, maybe that shouldn't have been a shock. So now I have to figure out the compensation relationship as a function of pressure. I'm hoping it's something simple like compensation = measured compensation * (D1 / measured D1). This also calls into question whether a changing D2 will impact that as well. I guess that's next.

Ughh...

@hpmax
Copy link
Contributor Author

hpmax commented Nov 6, 2020

As luck would have it (or maybe not), my vacuum source had a slow leak -- which means I got a nice gradient through my data collection. Looking at the data (visually, not exhaustively)... D2 (raw temp data) glitches seem pretty consistent over pressure (haven't checked over temperature).

As expected, D1 glitches weren't consistent. I tried: (D1-D1filtered) * max(D1 filtered)/D1 filtered

This helped, but it under compensated the data. That is, at low pressure, the glitches were still less than they were at normal pressure.

Then I tried: (D1-D1 filtered) * max (Pressure) / Pressure

This OVER compensated the data. That is, at low pressure glitches were now LARGER than they were at normal pressure.

Then I tried: (D1-D1 filtered ) * max (D1 filtered - X) / (D1 filtered - X) for different values of X.

On the static sensor, X = 3.2e6 gave pretty consistent results (at least visually). But on the TE sensor, I think it was closer to 3.5e6.

Then I tried: (D1 - D1f) * max(Pressure + X) / (Pressure + X)

I thought X = 23000 gave pretty consistent results on both the static and TE sensors.
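As a side-by-side check, the four scalings above can be sketched like this. The glitch sizes, D1f values, and pressures here are invented for illustration only, not taken from my logs:

```python
import numpy as np

# Hypothetical glitch peaks (D1 - D1f, in counts) at normal and low pressure;
# all numbers here are made up for illustration.
glitch   = np.array([1500.0, 800.0])     # glitch at ~100 kPa vs ~50 kPa
d1f      = np.array([8.40e6, 7.90e6])    # filtered D1 at each pressure
pressure = np.array([100e3, 50e3])       # Pa

# Each candidate scales the low-pressure glitch up toward the
# normal-pressure reference (labels per the findings above):
a = glitch * d1f.max() / d1f                              # under-compensates
b = glitch * pressure.max() / pressure                    # over-compensates
c = glitch * (d1f.max() - 3.2e6) / (d1f - 3.2e6)          # X offset on D1f
d = glitch * (pressure.max() + 23e3) / (pressure + 23e3)  # X offset on pressure
```

With these toy numbers you can see how much more strongly each choice of denominator rescales the same raw low-pressure glitch (a rescales it least, b most, with c and d in between); which scaling is physically correct is exactly what the real data has to decide.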

There are a few problems here...

  1. The data I collected is surprisingly hard to analyze. The gradient is nice for getting samples at all different pressures, but, particularly at the low end, the leak was fast enough that the pressure trend could clearly be seen during a glitch. At its worst I was probably "sinking" at about 1 foot per second. This makes it nearly impossible to properly fit the data for compensation coefficients without also compensating for the gradient.

  2. I have no idea where the 23,000 comes from. It was a TLAR ("that looks about right") magic number. The good news is that it looks about right for both of my sensors -- and the compensation is almost certainly better than nothing. The bad news is I'd like to have a better idea where this number came from, or some good way to generate it (I'd prefer to derive it from other known values).

  3. I'm not sure if it's necessary to try the same thing with temperature, it probably is. But I'm not sure what a safe way would be to get good data.

Looking at the temperature data I saw blatantly obvious undulations. I was thinking... maybe this was the effect of the glitch: while it's glitching, the temperature was dropping. But the peaks were about 25 minutes apart, so I suspect this was the furnace turning on and off in my house. If we assume there was maybe a 1 C swing from on to off, and that was about 3200 counts... Let's say the max temperature difference we'll ever see is 80 C (it's hard to imagine external temp being more than +/- 40 C, and the internal temp, while potentially higher, should have less swing than that). That puts the max delta in D2 at about 256k, which means (8.84e6 + 256e3) / 8.84e6 gives a max swing of about 2.9%. So... maybe temperature compensation is unnecessary?
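Spelling out that back-of-the-envelope arithmetic (the 3200 counts per degree C and the 8.84e6 nominal D2 reading are the values observed above):

```python
counts_per_degc = 3200        # ~1 C furnace swing was ~3200 D2 counts
max_swing_degc  = 80          # assume at most +/- 40 C either way
d2_nominal      = 8.84e6      # typical raw D2 reading

max_delta_d2 = counts_per_degc * max_swing_degc   # 256,000 counts
relative_swing = max_delta_d2 / d2_nominal
print(f"max D2 swing: {relative_swing:.1%}")      # -> max D2 swing: 2.9%
```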

Anyone got any rabbits to pull out of hats here?

@hpmax
Copy link
Contributor Author

hpmax commented Nov 8, 2020

Okay, I think I am getting pretty good results with:

Compensation = ((D2 - D2f)^2 * K0 + (D2 - D2f) * K1 + K2) * (Pressure + K3)
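As a sketch in code (the function name and argument layout are mine; K0 through K3 are the per-sensor fit coefficients):

```python
def glitch_compensation(d2, d2f, pressure, k0, k1, k2, k3):
    """Estimated D1 glitch size as a function of the concurrent D2
    (raw temperature) glitch and the current pressure.

    d2, d2f  : raw and low-pass-filtered D2 (counts)
    pressure : current pressure (Pa)
    k0..k2   : quadratic fit of glitch size vs. the D2 delta
    k3       : per-sensor pressure offset (e.g. 15365 or 17322)
    """
    dd2 = d2 - d2f
    return (dd2 * dd2 * k0 + dd2 * k1 + k2) * (pressure + k3)
```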

I think I have compdata dialed in to give accurate K0, K1, and K2 values. In fact, as a result of a bug, I was writing out the TE and static compensation values swapped. When I looked at the data, I could clearly see noise blips. After realizing from the code that they were swapped, I switched them around in the config file, re-ran, and could no longer see clear noise blips -- even though the two sets of coefficients were only 10% apart. I'm very happy about that.

The question is what to do about K3... On my sensors, the values I am using are 15365 and 17322. If someone didn't want to measure K3, I'd suggest just using 16343.5 as a starting point. How critical is that difference... Well, let's take a look. At 44 kPa, that's an error of about 1.66% between using the average value and the correct value. On the other hand, I have a sample size of 2 here.
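That error figure is easy to reproduce: since the compensation scales with (Pressure + K3), the relative error from using the averaged K3 instead of a measured one is just the ratio of the two offset pressures, and it is largest at the lowest pressure:

```python
p = 44_000.0            # Pa, lowest pressure tested
k3_measured = 15_365.0  # one sensor's fitted K3
k3_average  = 16_343.5  # midpoint of 15365 and 17322

error = (p + k3_average) / (p + k3_measured) - 1.0
print(f"compensation error at 44 kPa: {error:.2%}")  # about 1.65%
```

With the other sensor's K3 (17322) the error comes out around 1.60%, so either way it lands in the ballpark quoted above.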

The way I determined K3 was to connect the static and TE lines (I should have connected pitot as well but didn't -- don't be an idiot and repeat my mistake) to a hand vacuum pump and drop the pressure to some low level (44 kPa). I then started collecting data with intentional glitches (300 ms). The vacuum setup wasn't perfect and did leak, which allowed pressure to slowly rise back to ambient.

The data of interest was D1 - D1f, and the D1f-based pressure. I filtered it such that I only kept samples where (D1(i) - D1f(i)) - (D1(i-1) - D1f(i-1)) > 1500 * Pressure / 100e3. In other words, I was only looking at the peaks of the glitches, and only the intentional glitches that I injected. I then separated the data into three bins: low pressure (< 50 kPa), medium pressure (50 kPa < P < 95 kPa), and normal pressure (> 95 kPa). I took the mean of (D1 - D1f) / (Pressure + X) over the filtered data in each bin, for all X values between 0 and 40e3, and then found the X such that the low and high bins had the same mean, double-checking against the medium bin.

If I were to automate it, I'd likely do something similar but create more bins (variable based on how many data points I have, too few data points could cause the means to be unreliable), and then find X that minimizes the standard deviation of the means across the bins. Would anyone use such a tool, does anyone care?
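A minimal sketch of what that tool could look like -- bin the glitch peaks by pressure and grid-search for the X that minimizes the spread of the per-bin means. The function name, bin count, and grid step are my assumptions, not the actual tool:

```python
import numpy as np

def fit_k3(glitch, pressure, n_bins=10, x_grid=np.arange(0.0, 40e3, 100.0)):
    """Grid-search the X that makes glitch / (pressure + X) most
    consistent across pressure bins.

    glitch   : D1 - D1f at detected glitch peaks (counts)
    pressure : pressure (Pa) at each glitch peak
    """
    edges = np.linspace(pressure.min(), pressure.max(), n_bins + 1)
    bin_idx = np.clip(np.digitize(pressure, edges) - 1, 0, n_bins - 1)
    best_x, best_spread = x_grid[0], np.inf
    for x in x_grid:
        normalized = glitch / (pressure + x)
        means = [normalized[bin_idx == b].mean()
                 for b in range(n_bins) if np.any(bin_idx == b)]
        spread = np.std(means)  # spread of per-bin means; 0 = perfectly consistent
        if spread < best_spread:
            best_x, best_spread = x, spread
    return best_x

# Synthetic sanity check: glitches that scale exactly as (pressure + 23e3)
rng = np.random.default_rng(0)
p = rng.uniform(44e3, 100e3, 500)
g = 0.01 * (p + 23e3)
print(fit_k3(g, p))   # -> 23000.0
```

Empty bins are skipped so the means stay reliable; with real data you would also want the variable bin count mentioned above, shrinking n_bins when few glitch peaks were captured.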
