Text Summarization

Information overload is a phenomenon in which a person is unable to make decisions because of the overabundance of information related to the issue at hand. There is, after all, a limit to the amount of information a human can process, and with the advent of the internet this limit has been tested more and more. In particular, recommendation systems are increasingly prevalent in both social networks and information retrieval systems: instead of waiting for a user to input a query, the system already holds a preconceived notion of who the user is and tailors results and push notifications to that profile. In this context, there is a need for systems that condense information and present it to the user. Text Summarization is an NLP task that consists of condensing the information contained in one or more documents in order to produce a shorter text that is readable, coherent and contains all relevant information from the source(s). Recently, Large Language Models have been employed in this task for their ability to understand and manipulate text.
The group is currently investigating the application of text summarization technologies to specific domains, such as biomedical scientific articles and legal documents, in order to create systems that can reduce research times for experts in these fields. Furthermore, the group is researching validation metrics for the generated summaries, to make sure that what is produced is faithful to the sources and contains all relevant information.
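As an illustration only, the idea of a validation metric can be sketched with a simple unigram-overlap score in the spirit of ROUGE-1. This is not the metric under study by the group, which is not detailed here; the function names `tokenize` and `unigram_overlap` are hypothetical. Precision indicates how much of the summary is supported by the reference text, while recall indicates how much of the reference content the summary covers.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_overlap(summary, reference):
    """Return (precision, recall, f1) of unigram overlap.

    Illustrative ROUGE-1-style score; counts are clipped so that a word
    repeated in the summary is only credited as often as it appears in
    the reference.
    """
    summary_counts = Counter(tokenize(summary))
    reference_counts = Counter(tokenize(reference))
    overlap = sum((summary_counts & reference_counts).values())

    precision = overlap / sum(summary_counts.values()) if summary_counts else 0.0
    recall = overlap / sum(reference_counts.values()) if reference_counts else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    print(unigram_overlap("the cat sat", "the cat sat on the mat"))
```

A lexical score of this kind cannot detect a fluent summary that contradicts its source, which is one reason why dedicated validation metrics remain an open research question.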

Social Media Analysis

A full ninety percent of all the data in the world has been generated over the last two years. The major sources of this large amount of data are the various social media platforms and other digital sources such as smartphones. This volume of data is a vast untapped resource that could help researchers solve a variety of problems. Our definition of social media ranges from platforms such as Twitter and YouTube to scientific communities such as DBLP, where the underlying social link among entities may arise from co-authorship or citations. Our research is interdisciplinary and often draws on social science principles and statistical techniques. We are investigating problems such as community patterns, information dissemination and the impact of tweets on the financial market, to name a few.
In one recently completed work, we analyzed interest-based community patterns on several different kinds of large, real social network datasets. The focus is on how interest-based community patterns differ from conventional network communities, which are defined by the structural properties of the network. In a different line of work, we are studying the impact of tweets on the financial market by analysing a large set of tweet volumes over a long period (more than two years) for a total of 1883 companies. In another work, we focus on the problem of resolving user profiles across social networks, that is, identifying whether two user profiles from two different networks, with different user IDs or nicknames, belong to the same user. This problem is particularly relevant to digital forensics and criminal investigations.

Data Integration in Healthcare

Today we are more connected than ever. We rely on all kinds of smart equipment to communicate, to work and for leisure activities. This radical change is also affecting our health. We adopt more and more devices that can monitor our health and produce data streams about it with regularity and precision. For example, we use a smartwatch to record our physical activity or a smartphone to track calorie consumption. These applications are transforming the medicine and healthcare sector. The big challenge is how to integrate and interpret this data, which comes from various heterogeneous social and health databases.
The objectives of our group are manifold: they include processing patients' medical data to identify the best therapies and monitoring the cognitive performance of individuals in order to detect early cognitive impairment, to name a few. To achieve this, we are analysing a large set of anonymous data from an entire population that goes far beyond the single pathology (environmental exposure, diet and physical habits, previous analyses).
We are collaborating with the staff of the S. Orsola Hospital and Maggiore Hospital in Bologna, the G.B. Morgagni - L. Pierantoni Hospital in Forlì and with the neuropsychiatry group of the Santa Maria Nuova Hospital in Reggio Emilia.

Predictive Models

Nowadays businesses have to be more profitable, react quickly and provide better solutions to decision makers, with less human effort and at a lower cost. Achieving this requires an effective knowledge creation and management process. This has become feasible because many organizations now collect large amounts of data and subsequently mine information from it. The methods used to mine this information range from simple statistical rules to sophisticated machine learning algorithms. The group is currently investigating problems in the healthcare domain, where it is desirable to predict functional decline in elderly people. Our research targets subjects who are starting to experience limitations in their day-to-day life and are at risk of frailty.

Watermarking and Fingerprinting in Social Media

The proliferation of mobile devices is contributing to a massive production of digital content, such as images, videos and text. At the same time, social media offer users many platforms for sharing it. This digital content meets several kinds of user needs, in both professional activities and social interactions; however, it is important to protect the intellectual property rights of the authors. Digital watermarking has become crucially important for authentication and copyright protection. Moreover, devices usually leave a characteristic fingerprint in most of the digital content they produce. This fingerprint can be considered a kind of implicit watermark through which authorship can be proven. The group is currently investigating these problems in the domain of social media, where sharing is one of the most noticeable behaviours and where common internet users spend most of their time.

Temporal Information Retrieval

Traditional information retrieval systems and search engines fail to exploit the temporal information of documents. Temporal Information Retrieval (TIR) has been a topic of great interest in recent years. Its purpose is to improve the retrieval of documents by exploiting their temporal information, making it possible to position queries and documents on a timeline in accordance with their temporal features.
Our research on TIR focuses on temporal information at the content level of documents: temporal expressions in natural language are extracted and normalized to describe the temporal scope of documents and queries. The resulting TIR model ranks results by combining traditional IR metrics with newly defined temporal metrics.
This research also addresses the Text Categorization task: by extracting temporal features from the set of temporal expressions and applying Machine Learning models, we reach greater classification accuracy on standard document collections.
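The pipeline described above (extract temporal expressions, normalize them, combine a temporal metric with a traditional one) can be sketched as follows. This is a minimal illustration under strong assumptions, not the group's actual model: only explicit four-digit years and year ranges are recognized, each is normalized to an inclusive `(start, end)` interval, the temporal metric is a simple interval overlap ratio, and the two scores are mixed linearly with a hypothetical weight `alpha`. All function names are illustrative.

```python
import re

# Assumption: only years 1000-2099, written as "1939", "1939-1945" or "1939 to 1945".
YEAR_RANGE = re.compile(r"\b(1\d{3}|20\d{2})\s*(?:-|to)\s*(1\d{3}|20\d{2})\b")
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")


def extract_intervals(text):
    """Extract year mentions and normalize them to (start, end) intervals."""
    intervals, range_spans = [], []
    for match in YEAR_RANGE.finditer(text):
        a, b = int(match.group(1)), int(match.group(2))
        intervals.append((min(a, b), max(a, b)))
        range_spans.append(match.span())
    for match in YEAR.finditer(text):
        # Skip single years that were already consumed as part of a range.
        if not any(s <= match.start() < e for s, e in range_spans):
            year = int(match.group(1))
            intervals.append((year, year))
    return intervals


def interval_similarity(a, b):
    """Overlap ratio (shared years / covered years) of two intervals."""
    shared = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if shared <= 0:
        return 0.0
    covered = max(a[1], b[1]) - min(a[0], b[0]) + 1
    return shared / covered


def temporal_score(query_intervals, doc_intervals):
    """Best overlap between any query interval and any document interval."""
    if not query_intervals or not doc_intervals:
        return 0.0
    return max(interval_similarity(q, d)
               for q in query_intervals for d in doc_intervals)


def combined_score(text_score, temp_score, alpha=0.7):
    """Linear mix of a traditional IR score and a temporal score."""
    return alpha * text_score + (1 - alpha) * temp_score


if __name__ == "__main__":
    doc = "The war lasted from 1939 to 1945; the treaty was signed in 1947."
    query = "events 1940-1942"
    t = temporal_score(extract_intervals(query), extract_intervals(doc))
    print(combined_score(text_score=0.5, temp_score=t))
```

The `text_score` argument stands in for any traditional relevance score (for instance one produced by a term-weighting model) and is assumed to be scaled to the same range as the temporal score. Relative expressions such as "last week" would require a proper temporal tagger and are outside the scope of this sketch.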

In closely related research, we are analysing the temporal dimension (Time Quantification) of large document collections. With current text mining techniques, it has become possible to measure society's attention and focus when it comes to remembering past events or projecting topics into the future. Moreover, analysing the temporal dimension of a variety of document collections allows us to better understand their peculiarities and to use this knowledge to improve existing TIR models. Temporal mentions in a text strongly depend on the document's context and scope. In history books, for example, many mentions refer to periods far from the present year. By contrast, in social networks like Twitter, where information is produced and consumed day by day, temporal mentions mostly refer to the last few hours. Hence, in order to study the incidence of time in text, we are currently analysing document collections from different contexts: Wikipedia, Twitter, blogs, search engine queries and news articles.
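A rough way to picture this kind of analysis is to count how far the years mentioned in a text lie from a reference year. The sketch below is only illustrative and works at year granularity, so it cannot capture the hour-level mentions typical of Twitter; the function names and the choice of the reference year are assumptions, not part of the group's method.

```python
import re
from collections import Counter

# Assumption: temporal mentions are approximated by explicit years 1000-2099.
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")


def year_offsets(text, reference_year):
    """Count mentions by signed distance (in years) from the reference year."""
    return Counter(int(year) - reference_year for year in YEAR.findall(text))


def mean_abs_offset(offsets):
    """Average absolute distance of the mentions from the reference year."""
    total = sum(offsets.values())
    if total == 0:
        return 0.0
    return sum(abs(offset) * count for offset, count in offsets.items()) / total


if __name__ == "__main__":
    history = "In 1815 Napoleon lost at Waterloo; by 1821 he had died."
    recent = "See you in 2021, after 2020 ends."
    print(mean_abs_offset(year_offsets(history, 2020)))
    print(mean_abs_offset(year_offsets(recent, 2020)))
```

Comparing such distributions across collections gives a first, coarse indication of how strongly each collection is oriented towards the distant past, the present or the future.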