Доцент кафедры компьютерных технологий Миланского Университета Паоло Болди расскажет о современном подходе к определению мер центральности на примере компьютерных и социальных сетей.
Научный семинар пройдёт 23 сентября в московском офисе Яндекса в 19:00.
Начало регистрации в 18:30.
Семинар пройдет на английском языке.
Подробные тезисы можно посмотреть по ссылке: http://events.yandex.ru/events/science-seminars/Boldi-23Sep/
Research
A Modern View of Centrality Measures
Given a social network, which of its nodes are more central? This question has been asked many times in sociology, psychology and computer science, and a whole plethora of centrality measures have been proposed to account for the importance of the nodes of a network. Also, many modern IR problems call for a mixture of classical text-retrieval methods with graph mining techniques, especially within the area of social-network search, where node centrality can play a crucial role.
But what do existing centrality measures actually measure? To what extent do they agree? What is their meaning when they disagree? And further: which of them can be computed or approximated reasonably on the large (huge) graphs under observation?
My talk is twofold. On one hand, I will try to give a comprehensive and mathematically coherent account of the most important centrality measures from the literature, and present for the first time a set of scientifically well-grounded methodologies to establish whether a measure is actually doing what it has been designed for. I propose to assume an axiomatic approach, wherein I suggest some simple, basic properties a centrality measure should have, and I corroborate my findings with some anecdotal results on publicly available medium-to-large web and social networks. I validate the results by examining each measure under the lens of information retrieval, leveraging state-of-the-art knowledge in the discipline to measure the effectiveness of the various indices in locating web pages that are relevant to a query.
On the other hand, I approach the problem of computing a large family of centralities (known as «geometric centralities") on very large graphs accessed in a semi-streaming fashion, with a very small amount of memory (in the order of a dozen bytes) per node available in core memory. I leverage the newly discovered algorithms based on HyperLogLog counters, making it possible to approximate a number of geometric centralities at a very high speed and with high accuracy. While the application of similar algorithms for the approximation of some measures (e.g., closeness) is not new, our exploitation of HyperLogLog counters reduces exponentially the memory footprint, paving the way for in-core processing of networks with a hundred billion nodes using «just» 2 TiB of RAM. Moreover, the computations I describe are inherently parallelizable, and scale linearly with the number of available cores.
Paolo Boldi
Paolo Boldi obtained his PhD in Computer Science at the University of Milan, where he is currently Associate Professor. His research interests have touched many different topics in theoretical and applied computer science, such as domain theory, non-classical computability theory, distributed computability, anonymous networks, sense of direction, and self-stabilizing systems. Recently, his works have focused on problems related to the World Wide Web, a field where his research has also produced software tools used by many people working in the same area.