Noise removal algorithm for Grove DHT Pro Sensor

Hi @cluckers, @royhills

My promise

As I promised a few days ago, I’ve come up with a more portable version of the noise removal algorithm for the DHT sensor.
Here’s a link to our Grove DHT Pro sensor.
Here’s a link to the 1st topic that raised the issue.
Here’s a link to the 2nd topic that raised the issue.

The problem

The problem is that the Grove DHT sensor is not reliable when it’s used for long periods of time.
As users have reported, there’s a 0.5-2% chance of getting a bad reading: an extreme value that sometimes doesn’t even fall within the sensor’s working range.

It’s quite normal for a sensor to give a bad reading now and then, and statistically the chance of encountering at least one erroneous value increases the longer the sensor runs.
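
To put a rough number on that: if each reading is assumed to go bad independently with probability p (an assumption on my side; real sensor errors may be correlated), the chance of seeing at least one bad reading in n readings is 1 - (1 - p)^n. Here’s a minimal sketch; the helper name is made up for illustration and is not part of the library.

```python
# Hypothetical helper, not part of the library: probability of seeing at
# least one bad reading in n reads, assuming every read goes bad
# independently with probability p.
def chance_of_bad_reading(p, n):
    return 1.0 - (1.0 - p) ** n

# per-reading error rates reported by users: 0.5% and 2%
for p in (0.005, 0.02):
    for n in (10, 100, 1000):
        chance = chance_of_bad_reading(p, n)
        print("p = {:.3f}, n = {:4d} -> {:.1%}".format(p, n, chance))
```

Under that independence assumption, even the lower 0.5% rate makes a bad value more likely than not after roughly 140 readings.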

Here’s a visual representation of data points collected over time from a Grove DHT sensor - @royhills’s contribution.

As you can see, there are many moments when the sensor’s readings are far off the chart.
Totally unreliable.

The solution

So, we have to come up with a solution to attenuate/filter those outlier values.
My approach is to use a basic statistical model for removing the outlier values.

The idea is to capture a certain number of data points (values) and then filter them by excluding any value that falls outside a certain threshold.
The threshold is simply calculated:

threshold = mean_average +- standard_deviation * factor

As you can see, we calculate the mean of the captured data and its standard deviation, and then apply a factor that makes the threshold more restrictive or more permissive.
The factor can be user-defined; by default it’s set to factor = 2.
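
As a small worked example of the threshold, here’s a sketch that uses only Python’s standard library. The function name and the sample readings are made up for illustration. statistics.pstdev is the population standard deviation, which is what numpy.std computes by default.

```python
import statistics

# Hypothetical helper, not part of the library: returns the lower and
# upper bounds (mean -/+ factor * standard deviation) for a list of values.
def threshold_bounds(values, factor=2):
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return mean - factor * std, mean + factor * std

# made-up temperature readings with one obvious outlier (150.0)
readings = [22.0, 22.1, 21.9, 22.0, 22.2, 21.8, 22.0, 22.1, 21.9, 150.0]

lower, upper = threshold_bounds(readings)
# keep only the values strictly inside the bounds
kept = [value for value in readings if lower < value < upper]

print("bounds: {:.1f} .. {:.1f}".format(lower, upper))
print("kept {} of {} readings".format(len(kept), len(readings)))
```

Note that the outlier itself drags the mean (to about 34.8 here instead of 22) and inflates the standard deviation, so the bounds are fairly wide. That’s one reason to capture more values per window.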

Here’s a visual representation of data points collected over time from a Grove DHT sensor - still @royhills’s contribution.

The code

Here’s the library you have to include.
Please take a look at the folder structure down below and see how you have to use it.

As an everyday user, you shouldn’t modify this library’s implementation.

import threading
import numpy
import datetime
import math
from grovepi import dht
import time

# after a list of numerical values is provided
# the function returns a list with the outlier(or extreme) values removed
# make the std_factor_threshold bigger so that filtering becomes less strict
# and make the std_factor_threshold smaller to get the opposite
def statisticalNoiseReduction(values, std_factor_threshold = 2):
	if len(values) == 0:
		return []

	mean = numpy.mean(values)
	standard_deviation = numpy.std(values)

	# just return if we only got constant values
	if standard_deviation == 0:
		return values

	# keep only the values above the lower threshold (mean - factor * standard deviation)
	filtered_values = [element for element in values if element > mean - std_factor_threshold * standard_deviation]
	# then keep only the values below the upper threshold (mean + factor * standard deviation)
	filtered_values = [element for element in filtered_values if element < mean + std_factor_threshold * standard_deviation]

	return filtered_values

# class for the Grove DHT sensor
# it was designed so that on a separate thread the values from the DHT sensor are read
# on the same separate thread, the filtering process takes place
class Dht(threading.Thread):
	# refresh_period specifies for how long data is captured before it's filtered
	def __init__(self, pin = 4, refresh_period = 10.0, debugging = False):
		super(Dht, self).__init__(name = "DHT filtering")
		self.pin = pin
		self.refresh_period = refresh_period
		self.debugging = debugging
		self.event_stopper = threading.Event()

		self.blue_sensor = 0
		self.white_sensor = 1
		self.filtering_aggresiveness = 2
		self.callbackfunc = None
		self.sensor_type = self.blue_sensor

		self.lock = threading.Lock()

		self.filtered_temperature = []
		self.filtered_humidity = []

	# refresh_period specifies for how long data is captured before it's filtered
	def setRefreshPeriod(self, time):
		self.refresh_period = time

	# sets the digital port
	def setDhtPin(self, pin):
		self.pin = pin

	# use the white sensor module
	def setAsWhiteSensor(self):
		self.sensor_type = self.white_sensor

	# use the blue sensor module
	def setAsBlueSensor(self):
		self.sensor_type = self.blue_sensor

	# removes the processed data from the buffer
	def clearBuffer(self):
		self.filtered_humidity = []
		self.filtered_temperature = []

	# the bigger the parameter, the less strict is the filtering process
	# it's also vice-versa
	def setFilteringAggresiveness(self, filtering_aggresiveness = 2):
		self.filtering_aggresiveness = filtering_aggresiveness

	# whenever there's new data processed
	# a callback takes place
	# arguments can also be sent
	def setCallbackFunction(self, callbackfunc, *args):
		self.callbackfunc = callbackfunc
		self.args = args

	# stops the current thread from running
	def stop(self):
		self.event_stopper.set()

	# replaces the need to custom-create code for outputting logs/data
	# print(dhtObject) can be used instead
	def __str__(self):
		string = ""
		if len(self.filtered_humidity) > 0:
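			# reconstructed line: timestamp plus the most recent filtered values
			string = '[{}][temperature = {:.01f}][humidity = {:.01f}]'.format(
				datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
				self.filtered_temperature[-1],
				self.filtered_humidity[-1])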

		return string

	# returns a tuple with the (temperature, humidity) format
	# if there's nothing in the buffer, then it returns (None, None)
	def feedMe(self):
		# hold the lock so the buffers can't change while we read from them
		with self.lock:
			if self.length() > 0:
				temp = self.filtered_temperature.pop()
				hum = self.filtered_humidity.pop()
				return (temp, hum)
		return (None, None)

	# returns the length of the buffer
	# the buffer is filled with filtered data
	def length(self):
		length = len(self.filtered_humidity)
		return length

	# you mustn't call this function from the user-program
	# this one is called by threading.Thread's start function
	def run(self):
		values = []

		# while we haven't called stop function
		while not self.event_stopper.is_set():
			counter = 0

			# while we haven't done a cycle (period)
			while counter < self.refresh_period and not self.event_stopper.is_set():
				temp = None
				humidity = None

				# read data
				try:
					[temp, humidity] = dht(self.pin, self.sensor_type)

					# check for NaN errors
					if math.isnan(temp) is False and math.isnan(humidity) is False:
						new_entry = {"temp" : temp, "hum" : humidity}
						values.append(new_entry)
					else:
						raise RuntimeWarning("[dht sensor][we've caught a NaN]")

					counter += 1

				# in case we have an I2C error
				except IOError:
					if self.debugging is True:
						print("[dht sensor][we've got an IO error]")

				# intended to catch NaN errors
				except RuntimeWarning as error:
					if self.debugging is True:
						print(str(error))

				finally:
					# the DHT can be read once a second
					time.sleep(1)

			if len(values) > 0:
				# remove outliers
				temp = numpy.mean(statisticalNoiseReduction([x["temp"] for x in values], self.filtering_aggresiveness))
				humidity = numpy.mean(statisticalNoiseReduction([x["hum"] for x in values], self.filtering_aggresiveness))

				# insert into the filtered buffer
				with self.lock:
					self.filtered_temperature.append(temp)
					self.filtered_humidity.append(humidity)

				# if we have set a callback then call that function w/ its parameters
				if self.callbackfunc is not None:
					self.callbackfunc(*self.args)

			# reset the values for the next iteration/period
			values = []

		if self.debugging is True:
			print("[dht sensor][called for joining thread]")

And here’s an example program for you @cluckers - you can now use the library relatively simply.
This is the most basic program.

#!/usr/bin/env python3
from grove_dht import Dht
import signal
import sys

# Don't forget to run it with Python 3 !!
# Don't forget to run it with Python 3 !!
# Don't forget to run it with Python 3 !!

# Please read the source file(s) for more explanations
# Source file(s) are more comprehensive

dht = Dht()

def signal_handler(signal, frame):
    global dht
    # stop the DHT thread so the program can exit
    dht.stop()

def callbackFunc():
    global dht
    # print the latest filtered values (uses Dht.__str__)
    print(dht)

def Main():
    print("[program is running][please wait]")

    global dht
    digital_port = 4

    # set the digital port for the DHT sensor
    dht.setDhtPin(digital_port)
    # using the blue kind of sensor
    # there's also the white one which can be set by calling [dht.setAsWhiteSensor()] function
    dht.setAsBlueSensor()
    # specifies for how long we record data before we filter it
    # it's better to have larger periods of time,
    # because the statistical algorithm has a vaster pool of values
    # (the library default is used here; pick your own value)
    dht.setRefreshPeriod(10.0)
    # the bigger the filtering factor (as in the filtering aggresiveness)
    # the less strict the algorithm is when it comes to filtering
    # and vice-versa
    # the factor must be greater than 0
    # it's recommended to leave its default value unless there is a good reason not to
    dht.setFilteringAggresiveness(2)
    # every time the Dht object loads new filtered data inside the buffer
    # a callback follows
    dht.setCallbackFunction(callbackFunc)

    # start the thread for gathering data
    dht.start()

    # if you want to stop the thread just
    # call dht.stop() and you're done

if __name__ == "__main__":
    signal.signal(signal.SIGINT, signal_handler)
    Main()

Here’s the folder structure you should have:

Now, this example program is really basic.
The library is capable of much more.
I encourage anyone to do a code review of this library.
Maybe there’s someone who can add other features to it and, by doing so, make it more practical.
Maybe there’s someone who’s thinking of transforming this library into an add-on, so that any sensor can use it.

If there’s anything unclear or if I made a mistake somewhere here, please let me know about it.
Here’s a link to the repo folder - you’ll see there’s another interesting example program in it.

A README is also an interesting idea.

Don’t forget to use Python 3!

Thank you!


Hi Robert,

Many thanks for your effort.
I tried your example, but it returns errors because I am using Python 2.7 within Geany.
When I change the Python build settings in Geany, another error tells me that it cannot find ‘grovepi’.
I will look further into it when I have a bit more time.
Anyway, I will need to port my other Python programs to Python 3 as well…

Many thanks,


Hi @cluckers,

I’ve edited the initial post.
Please take a look at it again and inform me of how it goes for you.

Also, please use Python 3, as with Python 2 you won’t be able to exit the program.
Also, when I wrote this library, I used the Python 3 documentation.

Thank you!