ENH: O(logW) Hampel filter implementation

Hello everyone,

I would like to draw your attention to this Pull Request, which introduces an optimized implementation of the sliding-window Hampel filter.

The Problem with Existing Solutions

To the best of my knowledge, all existing implementations available online require a full selection step taking linear time, O(W), over the absolute deviations every time the local median value shifts. Because a moving median dynamically alters all absolute distances within the active window, tracking the exact Median Absolute Deviation (MAD) efficiently has microfilm traditionally been a major bottleneck.

Proposed Architecture

The suggested solution introduces two additional dual-heap structures (rank trackers) alongside the standard running median filter.

  • By buffering these structures with positive/negative infinity values and maintaining a fixed rank offset between them, we can dynamically slice out exactly half of the lowest absolute deviations at any given moment.

  • When the window slides, instead of rebuilding the structures from scratch, we can update the internal boundaries on the fly, essentially shifting ranks by systematically replacing the infinity bounds. This reduces a global functional update to a localized heap adjustment taking logarithmic time, O(log W).

Performance Realities

In terms of benchmarks, this approach does run about 5 times slower than a highly optimized, pure rolling median filter, which is expected given the multi-heap orchestration. However, because it completely breaks the linear dependency on the window size W, I believe it stands as the fastest bit-exact rolling MAD algorithm currently available.

I would love to get your feedback on the architecture and discuss its integration!

Best regards,
GGKOGAN

Following up on my previous post, I would like to share some additional context from signal processing literature regarding the practical value of this filter, as well as how it is supported across other established scientific computing platforms.

Academic Context and Significance

The Hampel filter is recognized in robust statistics as an exceptionally resilient tool for outlier identification, particularly in environments dominated by impulsive, non-Gaussian noise where traditional mean and standard deviation metrics break down (Hampel, 1974).

In applied fields, the filter is heavily utilized for preprocessing high-frequency data streams. For instance, academic literature extensively documents its use in cleaning GNSS and satellite telemetry data, filtering artifacts from biomedical signals like ECGs, and protecting industrial wireless sensor networks from telemetry spikes. Because it preserves structural signal trends while eliminating severe anomalies, it is often a preferred foundational step in data-cleaning pipelines.

Implementations in Other Computational Ecosystems

Given its utility, native and optimized versions of the Hampel filter are standard offerings in several major scientific computing environments: