A suitable paradigm to understand a semantic of digital data is inspired by a human, who exploits the pairwise similarity of unknown objects with those already known for object learning and classifying. The basic task of the digital data processing based on similarity is an efficient similarity search, i.e. efficient identification of data objects similar to a given query object.
The metric space has been dominating for decades as a model of a pairwise similarity of complex data objects. It can describe the perceived similarity of data objects with sufficient veracity, and its artificial properties facilitate efficient searching by generic techniques based on solid mathematical principles.
Current metric searching exploits both, analytical properties of space, and objects relationships that are not guaranteed. Hundreds of approaches with diverse applicability, contributions, and theoretical foundations have been developed in past decades to support metric searching efficiency, which is still running behind the current data explosion.
We aim at the basics of the whole research, as we observed that all triplets of objects in current metric similarity models form ordinary triangles in Euclidean space with strongly limited sizes of angles. All angles in triangles are not from 0\deg to 180\deg as have been always assumed, but for instance, from 20\deg to 80\deg. Such information motivated us to enhance an ordinary rule of triangle inequality with information about a range of permitted angles in triangles. Thus, if angles are between 20\deg and 80\deg, we can limit the length of each side in a triangle by three restrictions:
- c <= (a + b) * 0,74
- c >= |a - b| * 1,53
- c >= (a + b) * 0,17
Our analytical findings open new possibilities to reveal a context in metric similarity searching, that has been described by the world community just experimentally. Improvements in mathematical foundations of the research are valuable in particular as they improve certainty about performance and whole behaviour of developed techniques in diverse environments. Also, they are revealing promising directions for related research based on solid theorems and explanations.
References and Where to Learn More
Full paper: https://is.muni.cz/publication/1686161/2020-SISAP-Accelerating-metric-filtering.pdf
Video presentation (25min): https://is.muni.cz/publication/1686161/2020-SISAP-Accelerating-metric-filtering-video.mp4
How to Cite This Paper
MÍČ, Vladimír a Pavel ZEZULA. Accelerating Metric Filtering by Improving Bounds on Estimated Distances. In Shin'ichi Satoh, Lucia Vadicamo, Arthur Zimek, Fabio Carrara, Ilaria Bartolini, Martin Aumüller, Björn Þór Jónsson, Rasmus Pagh. Similarity Search and Applications: 13th International Conference, SISAP 2020, Copenhagen, Denmark, September 30 - October 2, 2020, Proceedings. Cham: Springer, 2020. s. 3-17. ISBN 978-3-030-60935-1. doi:10.1007/978-3-030-60936-8_1.