# Hello World

Welcome to Hexo! This is your very first post. Check the documentation for more info. If you run into any problems when using Hexo, you can find the answer in the troubleshooting guide, or you can ask me on GitHub.

# Python Implementation of Hampel Filter

## Introduction

The Hampel filter is used to detect outliers in time series data. I wanted to use it to detect outliers in data collected from wearable sensors. However, the only implementation I could find [1] is written in Python using pandas' `rolling()` and `apply()` functions, and it becomes slow as the input time series gets longer. I therefore reimplemented the filter using NumPy for faster execution. The execution times of the two implementations, the pandas-based implementation [1] and my NumPy-based implementation, are compared at the end of this article.
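For context, here is a minimal sketch of a Hampel filter built on pandas' `rolling()` and `apply()`. It is not the code from [1]. The function name `hampel_pandas`, its defaults, and the choice to estimate the local standard deviation as 1.4826 × the median absolute deviation (MAD) are all illustrative assumptions. The per-window `apply()` call is the part that adds Python-level overhead for every sample.

```python
import numpy as np
import pandas as pd

K = 1.4826  # assumed scale factor turning the MAD into a standard-deviation estimate


def hampel_pandas(x, window_size=5, n_sigmas=3.0):
    """Hampel filter on centred pandas rolling windows (window_size should be odd).

    Returns (filtered values, boolean outlier mask). Samples near the edges,
    whose window is incomplete, get NaN statistics and are never flagged.
    """
    s = pd.Series(x, dtype=float)
    rolling = s.rolling(window=window_size, center=True)
    # Local median of each centred window
    med = rolling.median()
    # Local MAD: apply() calls this Python function once per window,
    # which adds Python-level overhead for every sample
    mad = rolling.apply(lambda w: np.median(np.abs(w - np.median(w))), raw=True)
    sigma = K * mad
    # Comparisons involving NaN evaluate to False, so edge samples are kept
    outliers = (s - med).abs() > n_sigmas * sigma
    # Keep the original value unless it was flagged, else use the local median
    filtered = s.where(~outliers, med)
    return filtered.to_numpy(), outliers.to_numpy()
```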

My implementation is available here.

## Hampel Filter

The Hampel filter works as follows [2]:

For a given sample of data, $x_s$, the algorithm:

- Centers a window of odd length at the current sample.
- Computes the local median, $m_i$, and standard deviation, $\sigma_i$, over the current window of data.
- Compares the absolute deviation of the current sample from the median with $n_\sigma \times \sigma_i$, where $n_\sigma$ is the threshold value. If $|x_s - m_i| > n_\sigma \times \sigma_i$, the filter identifies the current sample, $x_s$, as an outlier and replaces it with the median value, $m_i$.
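The steps above can be sketched in vectorised NumPy as follows. This is not the author's implementation. The names `hampel_numpy`, `half_window` and `n_sigmas` are hypothetical, $\sigma_i$ is assumed to be estimated as 1.4826 × the window's MAD, and the first and last `half_window` samples are assumed to be left unchanged. `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

K = 1.4826  # assumed scale factor turning the MAD into a standard-deviation estimate


def hampel_numpy(x, half_window=2, n_sigmas=3.0):
    """Hampel filter over windows of length 2 * half_window + 1.

    Returns (filtered copy of x, indices of samples flagged as outliers).
    The first and last `half_window` samples are left unchanged (assumed policy).
    """
    x = np.asarray(x, dtype=float)
    k = half_window
    if len(x) < 2 * k + 1:
        # Too short for even one full window: nothing to test
        return x.copy(), np.array([], dtype=int)
    # Row j is the window centred on sample j + k: shape (len(x) - 2k, 2k + 1)
    windows = sliding_window_view(x, 2 * k + 1)
    # Local median m_i of every window in one call
    med = np.median(windows, axis=1)
    # Robust sigma_i: scaled median absolute deviation of each window
    sigma = K * np.median(np.abs(windows - med[:, None]), axis=1)
    # Test |x_s - m_i| > n_sigma * sigma_i for every window centre
    centres = x[k:len(x) - k]
    is_outlier = np.abs(centres - med) > n_sigmas * sigma
    # Replace flagged samples with their local median
    filtered = x.copy()
    idx = np.flatnonzero(is_outlier) + k
    filtered[idx] = med[is_outlier]
    return filtered, idx
```

The median and MAD are computed for all windows in a few array operations instead of one Python call per window. The trade-off is memory: `windows - med[:, None]` allocates an array of shape $(n - 2k) \times (2k + 1)$.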

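In short, the algorithm slides a window over the series, computes the local median and standard deviation in each window, and treats a sample as an outlier if its absolute deviation from the local median, $|x_s - m_i|$, exceeds the threshold $n_\sigma \times \sigma_i$. Note that $|x_s - m_i|$ is not the Median Absolute Deviation (MAD). The MAD is the median of the absolute deviations of all samples in the window, and it is commonly used to estimate $\sigma_i$ robustly. You can control the sensitivity of the filter by changing $n_\sigma$: a larger value flags fewer samples. The window size is also configurable.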
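Written out for a window of $2k+1$ samples centred on the current sample $x_s = x_i$, and assuming that $\sigma_i$ is estimated from the MAD of the window (the convention used, for example, by MATLAB's `hampel`), the test reads:

$$
\begin{aligned}
m_i &= \operatorname{median}\left(x_{i-k}, \dots, x_{i+k}\right), \\
\sigma_i &= \kappa \cdot \operatorname{median}\left(|x_{i-k} - m_i|, \dots, |x_{i+k} - m_i|\right), \qquad \kappa \approx 1.4826, \\
x_s \text{ is an outlier} &\iff |x_s - m_i| > n_\sigma \times \sigma_i .
\end{aligned}
$$

For example, take $k = 2$, $n_\sigma = 3$, and the window $(3, 4, 50, 6, 7)$ centred on $x_s = 50$. Then $m_i = 6$, the absolute deviations are $(3, 2, 44, 0, 1)$, the MAD is $2$, and $\sigma_i \approx 1.4826 \times 2 \approx 2.97$. Since $|50 - 6| = 44 > 3 \times 2.97 \approx 8.90$, the sample is flagged and replaced by $6$.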

## Performance Comparison

Figure 1 compares the execution times as the input length varies from $10$ to $10^6$ samples. The horizontal axis is the length of the input signal (number of samples), and the vertical axis is the execution time. With the pandas-based implementation, the execution time increases in proportion to the number of samples. The execution time of the NumPy-based implementation, in contrast, stays roughly flat up to about $n = 10^3$. Beyond that point, the execution time of both grows linearly with the number of samples, but the NumPy-based implementation is about $10^3$ times faster.

Figure 1. Comparison of execution times
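The scaling behaviour described in this section can be probed with a small timing harness. This sketch does not reproduce Figure 1 or either implementation being compared. Instead, it times a rolling median computed with one Python call per window (hypothetical `rolling_median_loop`, standing in for the per-window cost of `rolling().apply()`) against the same median computed over a `sliding_window_view` in a single call. The sizes, window length, and repeat count are arbitrary assumptions, and absolute times depend on the machine.

```python
import time

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_median_loop(x, w):
    # One Python-level call per window (similar cost profile to rolling().apply())
    return np.array([np.median(x[i:i + w]) for i in range(len(x) - w + 1)])


def rolling_median_vectorised(x, w):
    # All windows as one 2-D view, reduced in a single call
    return np.median(sliding_window_view(x, w), axis=1)


def best_time(fn, *args, repeats=3):
    # Best-of-N wall-clock time, which reduces noise from other processes
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best


def benchmark(sizes=(10, 100, 1_000, 10_000), w=5, seed=0):
    # Returns (n, loop seconds, vectorised seconds) for each input length
    rng = np.random.default_rng(seed)
    rows = []
    for n in sizes:
        x = rng.standard_normal(n)
        rows.append((n, best_time(rolling_median_loop, x, w),
                     best_time(rolling_median_vectorised, x, w)))
    return rows


if __name__ == "__main__":
    for n, t_loop, t_vec in benchmark():
        ratio = t_loop / max(t_vec, 1e-12)
        print(f"n={n:>6}  loop={t_loop:.5f}s  vectorised={t_vec:.5f}s  ratio={ratio:.0f}x")
```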

# 1000+ Cited Sensor-Based Human Activity Recognition Publications

## 1000+ cited Sensor-based HAR Publications

I have compiled a list, in Notion, of publications on sensor-based human activity recognition and ubiquitous computing that have been cited more than 1000 times. It starts with "Some computer science issues in ubiquitous computing" and "Hot topics-ubiquitous computing", both written by Mark Weiser in 1993, and runs through "Deep Learning for Sensor-based Activity Recognition: A Survey" by Jindong Wang et al. (2019). Please let me know if I have missed any papers that belong on the list. You can find the list at the following link.

1000+ cited Sensor-based Human Activity Recognition Publications
