A short while ago, I asked a bunch of people for long-term touchpad usage data (specifically: evemu recordings). I currently have 25 sets of data, the shortest of which has 9422 events, the longest of which has 987746 events. I requested that evemu-record was to be run in the background while people use their touchpad normally. Thus the data is quite messy, it contains taps, two-finger scrolling, edge scrolling, palm touches, etc. It's also raw data from the touchpad, not processed by libinput. Some care has to be taken with analysis, especially since it is weighted towards long recordings. In other words, the user with 987k events has a higher influence than the user with 9k events. So the data is useful for looking for patterns that can be independently verified with other data later. But it's also useful for disproving hypothesis, i.e. "we cannot do $foo because some users' events show $bla".
One of the things I've looked into was tapping. In libinput, a tap has two properties: a time threshold and a movement threshold. If the finger is held down longer than 180ms or it moves more than 3mm it is not a tap. These numbers are either taken from synaptics or just guesswork (both, probably). The need for a time-based threshold is obvious: we don't know whether the user is tapping until we see the finger up event. Only if that doesn't happen within a given time we know the user simply put the finger down. The movement threshold is required because small movements occur while tapping, caused by the finger really moving (e.g. when tapping shortly before/after a pointer motion) or by the finger center moving (as the finger flattens under pressure, the center may move a bit). Either way, these thresholds delay real pointer movement, making the pointer less reactive than it could be. So it's in our interest to have these thresholds low to get reactive pointer movement but as high as necessary to have reliable tap detection.
General data analysis
Let's look at the (messy) data. I wrote a script to calculate the time delta and movement distance for every single-touch sequence, i.e. anything with two or more fingers down was ignored. The script used a range of 250ms and 6mm of movement, discarding any sequences outside those thresholds. I also ignored anything in the left-most or right-most 10% because it's likely that anything that looks like a tap is a palm interaction . I ran the script against those files where the users reported that they use tapping (10 users) which gave me 6800 tap sequences. Note that the ranges are purposely larger than libinput's to detect if there was a significant amount of attempted taps that exceed the current thresholds and would be misdetected as non-taps.
Let's have a look at the results. First, a simple picture that merely prints the start location of each tap, normalised to the width/height of the touchpad. As you can see, taps are primarily clustered around the center but can really occur anywhere on the touchpad. This means any attempt at detecting taps by location would be unreliable.
The analysis of time is more interesting: There are spikes around the 50ms mark with quite a few outliers going towards 100ms forming what looks like a narrow normal distribution curve. The data points are overlaid with markers for the mean , the 50 percentile, the 90 percentile and the 95 percentile . And the data says: 95% of events fall below 116ms. That's something to go on.
The analysis of distance is similar: Most of the tap sequences have little to no movement, with 50% falling below 0.2mm of movement. Again the data points are overlaid with markers for the mean, the 50 percentile, the 90 percentile and the 95 percentile. And the data says: 95% of events fall below 1.8mm. Again, something to go on.
Now let's combine the two. Below is a graph mapping times and distances from touch sequences. In general, the longer the time, the longer the more movement we get but most of the data is in the bottom left. Since doing percentiles is tricky on 2 axes, I mapped the respective axes individually. The biggest rectangle is the 95th percentile for time and distance, the number below shows how many data points actually fall into this rectangle. Looks promising, we still have a vast majority of touchpoints fall into the respective 95 percentiles though the numbers are slightly lower than the individual axes suggest.
So based on the above, we learned one thing: it would not be reliable to detect taps based on their location. But we also suspect two things now: we can reduce the timeout and movement threshold without sacrificing a lot of reliability.
Verification of findings
Based on the above, our hypothesis is: we can reduce the timeout to 116ms and the threshold to 1.8mm while still having a 93% detection reliability. This is the most conservative reading, based on the extended thresholds.
To verify this, we needed to collect tap data from multiple users in a standardised and reproducible way. We wrote a basic website that displays 5 circles (see the screenshot below) on a canvas and asked a bunch of co-workers in two different offices  to tap them. While doing so, evemu-record was running in the background to capture the touchpad interactions. The touchpad was the one from a Lenovo T450 in both cases.
We asked each user three questions: "do you know what tapping/tap-to-click is?", "do you have tapping enabled" and "do you use it?". The answers are listed below:
- Do you know what tapping is? 33 yes, 12 no
- Do you have tapping enabled? 19 yes, 26 no
- Do you use tapping? 10 yes, 35 no
Ok, let's look at the data sets, same scripts as above:
What does this mean for users?
Based on the above, the libinput thresholds will be reduced to 100ms and 1.3mm. Let's see how we go with this and then we can increase it in the future if misdetection is higher than expected. Patches will on the wayland-devel list shortly.
For users that don't have tapping enabled, this will not change anything. All users who have tapping enabled will see a more responsive cursor on small movements as the time and distance thresholds have been significantly reduced. Some users may see a drop in tap detection rate. This is hopefully a subconscious enough effect that those users learn to tap faster or with less movement. If not, we have to look at it separately and see how we can deal with that.
If you find any issues with the analysis above, please let me know.
 These scripts analyse raw touchpad data, they don't benefit from
libinput's palm detection
 Note: mean != average, the mean is less affected by strong outliers. look it up, it's worth knowing
 X percentile means X% of events fall below this value
 The Brisbane and Paris offices. No separate analysis was done, so it is unknown whether close proximity to baguettes has an effect to tap behaviour
 i.e. the effect of users learning how to use a system that doesn't work well out-of-the-box. This may result in e.g. quicker taps from those that are familiar with the system vs those that don't.