Researchers from Masaryk University Improved the Basic Paradigm for Efficient Similarity Search

The article "Accelerating Metric Filtering by Improving Bounds on Estimated Distances" of dr. Vladimír Míč and prof. Pavel Zezula received the Best Paper Award at the 13th International Conference on Similarity Search and Applications, SISAP 2020 in Copenhagen, Denmark. We enhance a definition of metric space, which is the most often used generic model of the pairwise similarity of complex objects, such as multimedia. The novel definition includes a small amount of information about the addressed data and allows to improve many contemporary techniques for efficient data processing.

2 Feb 2021

A suitable paradigm to understand a semantic of digital data is inspired by a human, who exploits the pairwise similarity of unknown objects with those already known for object learning and classifying. The basic task of the digital data processing based on similarity is an efficient similarity search, i.e. efficient identification of data objects similar to a given query object.

The metric space has been dominating for decades as a model of a pairwise similarity of complex data objects. It can describe the perceived similarity of data objects with sufficient veracity, and its artificial properties facilitate efficient searching by generic techniques based on solid mathematical principles.

Current metric searching exploits both, analytical properties of space, and objects relationships that are not guaranteed. Hundreds of approaches with diverse applicability, contributions, and theoretical foundations have been developed in past decades to support metric searching efficiency, which is still running behind the current data explosion.

We aim at the basics of the whole research, as we observed that all triplets of objects in current metric similarity models form ordinary triangles in Euclidean space with strongly limited sizes of angles. All angles in triangles are not from 0\deg to 180\deg as have been always assumed, but for instance, from 20\deg to 80\deg. Such information motivated us to enhance an ordinary rule of triangle inequality with information about a range of permitted angles in triangles. Thus, if angles are between 20\deg and 80\deg, we can limit the length of each side in a triangle by three restrictions:

  • c <= (a + b) * 0,74
  • c >= |a - b| * 1,53
  • c >= (a + b) * 0,17

Our analytical findings open new possibilities to reveal a context in metric similarity searching, that has been described by the world community just experimentally. Improvements in mathematical foundations of the research are valuable in particular as they improve certainty about performance and whole behaviour of developed techniques in diverse environments. Also, they are revealing promising directions for related research based on solid theorems and explanations.

References and Where to Learn More 

Full paper:  
Video presentation (25min):  

How to Cite This Paper  

MÍČ, Vladimír a Pavel ZEZULA. Accelerating Metric Filtering by Improving Bounds on Estimated Distances. In Shin'ichi Satoh, Lucia Vadicamo, Arthur Zimek, Fabio Carrara, Ilaria Bartolini, Martin Aumüller, Björn Þór Jónsson, Rasmus Pagh. Similarity Search and Applications: 13th International Conference, SISAP 2020, Copenhagen, Denmark, September 30 - October 2, 2020, Proceedings. Cham: Springer, 2020. s. 3-17. ISBN 978-3-030-60935-1. doi:10.1007/978-3-030-60936-8_1.

More articles

All articles

You are running an old browser version. We recommend updating your browser to its latest version.

More info