Publications
Decision trees are among the most popular supervised models due to their interpretability and a knowledge representation that resembles human reasoning. Commonly used decision tree induction algorithms are based on greedy top-down strategies. Although these approaches are known to be efficient heuristics, the resulting trees are only locally optimal and tend to have overly complex structures. On the other hand, optimal decision tree algorithms attempt to create an entire decision tree at once to achieve global optimality. We place our proposal between these approaches by designing a generative model for decision trees. Our method first learns a latent decision tree space through a variational architecture using pre-trained decision tree models. Then, it adopts a genetic procedure to explore this latent space and find a compact decision tree with good predictive performance. We compare our proposal against classical tree induction methods, optimal approaches, and ensemble models. The results show that our proposal can generate accurate and shallow, i.e., interpretable, decision trees.
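Below is a minimal sketch of the genetic exploration step described above. It is not the paper's architecture: the decode function here is a stand-in that maps a latent point to ordinary tree hyperparameters, so only the accuracy-versus-depth search loop over a latent space is illustrated.

# Minimal sketch: genetic search over a latent space, scoring decoded trees
# by validation accuracy minus a depth penalty (favoring shallow trees).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

def decode(z):
    # Stand-in decoder (assumption): latent coordinates -> tree hyperparameters.
    depth = int(np.clip(abs(z[0]) * 10, 1, 12))
    leaf = int(np.clip(abs(z[1]) * 50, 1, 50))
    return DecisionTreeClassifier(max_depth=depth, min_samples_leaf=leaf,
                                  random_state=0).fit(X_tr, y_tr)

def fitness(z):
    tree = decode(z)
    return tree.score(X_va, y_va) - 0.01 * tree.get_depth()

population = rng.normal(size=(20, 2))
for _ in range(15):
    scores = np.array([fitness(z) for z in population])
    parents = population[np.argsort(scores)[-10:]]                    # selection
    children = parents + rng.normal(scale=0.2, size=parents.shape)    # mutation
    population = np.vstack([parents, children])

best = max(population, key=fitness)
print("best fitness:", round(fitness(best), 3), "depth:", decode(best).get_depth())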
Authorship Analysis, also known as stylometry, has been an essential aspect of Natural Language Processing (NLP) for a long time. Likewise, the recent advancement of Large Language Models (LLMs) has made authorship analysis increasingly crucial for distinguishing between human-written and AI-generated texts. However, these authorship analysis tasks have primarily focused on written texts, not considering spoken texts. Thus, we introduce the largest benchmark for spoken texts - HANSEN (Human ANd ai Spoken tExt beNchmark). HANSEN encompasses meticulous curation of existing speech datasets accompanied by transcripts, alongside the creation of novel AI-generated spoken text datasets. Together, it comprises 17 human datasets and AI-generated spoken texts created using 3 prominent LLMs: ChatGPT, PaLM2, and Vicuna13B. To evaluate and demonstrate the utility of HANSEN, we perform Authorship Attribution (AA) & Author Verification (AV) on the human-spoken datasets and conduct Human vs. AI text detection using state-of-the-art (SOTA) models. While SOTA methods, such as character n-grams or Transformer-based models, exhibit AA & AV performance on human-spoken datasets similar to written ones, there is much room for improvement in AI-generated spoken text detection. The HANSEN benchmark is available at: https://huggingface.co/datasets/HANSEN-REPO/HANSEN
@inproceedings{tripto-etal-2023-hansen,
title = "{HANSEN}: Human and {AI} Spoken Text Benchmark for Authorship Analysis",
author = "Tripto, Nafis and
Uchendu, Adaku and
Le, Thai and
Setzu, Mattia and
Giannotti, Fosca and
Lee, Dongwon",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-emnlp.916",
doi = "10.18653/v1/2023.findings-emnlp.916",
pages = "13706--13724",
}
10.18653/v1/2023.findings-emnlp.916
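A hedged sketch of loading the benchmark from the Hugging Face Hub follows; the configuration and split handling are assumptions, so consult the dataset card at the URL above for the actual names.

# Loading the HANSEN benchmark (sketch; configuration/split names may differ).
from datasets import load_dataset

dataset = load_dataset("HANSEN-REPO/HANSEN")  # pass a config name if the repo requires one
print(dataset)
first_split = next(iter(dataset.values()))    # avoid assuming a "train" split exists
print(first_split[0])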
A growing body of interdisciplinary literature indicates that human decision-making processes can be enhanced by Artificial Intelligence (AI). Nevertheless, the use of AI in critical domains has also raised significant concerns regarding its final users, those affected by the undertaken decisions, and the broader society. Consequently, recent studies are shifting their focus towards the development of human-centered frameworks that facilitate a synergistic human-machine collaboration while upholding ethical and legal standards. In this work, we present a taxonomy of hybrid decision-making systems that classifies them according to the type of interaction occurring between human and artificial intelligence. Furthermore, we identify gaps in the current body of literature and suggest potential directions for future research.
Decision Trees (DTs) are accessible, interpretable, and well-performing classification models. A plethora of variants with increasing expressiveness has been proposed in the last 40 years. We contrast the two families of univariate DTs, whose split functions partition data through axis-parallel hyperplanes, and multivariate DTs, whose splits instead partition data through oblique hyperplanes. The latter include the former, hence multivariate DTs are in principle more powerful. Surprisingly enough, however, univariate DTs consistently show comparable performance in the literature. We analyze the reasons behind this, both on synthetic data and on a large benchmark of real datasets. Our research questions test whether removing correlation among features as a pre-processing step has an impact on the relative performance of univariate vs. multivariate DTs.
@article{setzu2023correlation,
title={Correlation and Unintended Biases on Univariate and Multivariate Decision Trees},
author={Setzu, Mattia and Ruggieri, Salvatore},
journal={arXiv preprint arXiv:2312.01884},
year={2023}
}
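The sketch below illustrates the kind of comparison the research questions point to, not the paper's experimental protocol: the same univariate tree is evaluated on the original features and on PCA-decorrelated features.

# Probe whether decorrelating features changes univariate decision-tree accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=5, random_state=0)
decorrelated = make_pipeline(PCA(whiten=True),
                             DecisionTreeClassifier(max_depth=5, random_state=0))

print("original features:    ", cross_val_score(tree, X, y, cv=5).mean().round(3))
print("decorrelated features:", cross_val_score(decorrelated, X, y, cv=5).mean().round(3))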
Language Models (LMs) have been shown to inherit undesired stereotypes that might hurt minorities and underrepresented groups if such systems were to be integrated into real-world applications without careful fairness auditing. This paper proposes FairBelief, an analytical approach to capture and assess beliefs, i.e., propositions that an LM may embed with different degrees of confidence and that covertly influence its predictions. With FairBelief, we leverage prompting to study the behavior of several state-of-the-art LMs across previously neglected axes, such as model scale and prediction likelihood, evaluating predictions on a fairness dataset specifically designed to measure the hurtfulness of LM outputs. Finally, we conclude with an in-depth qualitative assessment of the beliefs held by the models. We apply FairBelief to English LMs, revealing that, although these architectures achieve high performance on diverse natural language processing tasks, they exhibit hurtful beliefs about specific genders. Interestingly, training procedure and dataset, model scale, and architecture induce beliefs of different degrees of hurtfulness.
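A minimal sketch of prompt-based belief probing in this spirit (not the FairBelief code) is shown below; the template and model are placeholders chosen for illustration.

# Probe a masked LM with a template and inspect completions and their likelihoods.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("The woman worked as a [MASK].", top_k=5):
    print(f"{pred['token_str']:>12}  p={pred['score']:.3f}")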
While a substantial amount of work has recently been devoted to enhancing the performance of computational Authorship Identification (AId) systems, little to no attention has been paid to endowing AId systems with the ability to explain the reasons behind their predictions. This lack substantially hinders the practical employment of AId methodologies, since the predictions returned by such systems are hardly useful unless they are supported by suitable explanations. In this paper, we explore the applicability of existing general-purpose eXplainable Artificial Intelligence (XAI) techniques to AId, with a special focus on explanations addressed to scholars working in cultural heritage. In particular, we assess the relative merits of three different types of XAI techniques (feature ranking, probing, factual and counterfactual selection) on three different AId tasks (authorship attribution, authorship verification, same-authorship verification) by running experiments on real AId data. Our analysis shows that, while these techniques make important first steps towards explainable Authorship Identification, more work remains to be done in order to provide tools that can be profitably integrated into the workflows of scholars.
@article{setzu2023explainable,
title={Explainable Authorship Identification in Cultural Heritage Applications: Analysis of a New Perspective},
author={Setzu, Mattia and Corbara, Silvia and Monreale, Anna and Moreo, Alejandro and Sebastiani, Fabrizio},
journal={arXiv preprint arXiv:2311.02237},
year={2023}
}
https://doi.org/10.48550/arXiv.2311.02237
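The sketch below illustrates one of the examined technique families, feature ranking, on a toy authorship-attribution setup with character n-grams and a linear model; the texts are placeholders rather than the AId data used in the paper.

# Rank character n-gram features by their weight for one author class.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["the old man and the sea", "call me ishmael",
         "it was the best of times", "a tale of two cities"]
authors = ["A", "B", "C", "C"]

vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 3))
X = vec.fit_transform(texts)
clf = LogisticRegression(max_iter=1000).fit(X, authors)

names = np.array(vec.get_feature_names_out())
top = np.argsort(np.abs(clf.coef_[0]))[::-1][:10]   # top features for class "A"
print(list(zip(names[top], clf.coef_[0][top].round(3))))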
Transformer-based models are used to solve a variety of Natural Language Processing tasks. Still, these models are opaque and poorly understandable for their users. Current approaches to explainability focus on token importance, in which the explanation consists of a set of tokens relevant to the prediction, and natural language explanations, in which the explanation is a generated piece of text. The latter are usually learned by design, with models trained end-to-end to provide both a prediction and an explanation, or rely on powerful external text generators to do the heavy lifting for them. In this paper we present TriplEx, an explainability algorithm for Transformer-based models fine-tuned on Natural Language Inference, Semantic Text Similarity, or Text Classification tasks. TriplEx explains Transformer-based models by extracting a set of facts from the input data, subsuming them by abstraction, and generating a set of weighted triples as an explanation.
@inproceedings{DBLP:conf/cogmi/SetzuMM21,
author = {Mattia Setzu and
Anna Monreale and
Pasquale Minervini},
title = {TRIPLEx: Triple Extraction for Explanation},
booktitle = {CogMI},
pages = {44--53},
publisher = {IEEE},
year = {2021}
}
10.1109/CogMI52975.2021.00015
Artificial Intelligence (AI) has come to prominence as one of the major components of our society, with applications in most aspects of our lives. In this field, complex and highly nonlinear machine learning models such as ensemble models, deep neural networks, and Support Vector Machines have consistently shown remarkable accuracy in solving complex tasks. Although accurate, AI models are often “black boxes” that we are unable to understand. Relying on these models has a multifaceted impact and raises significant concerns about their transparency. Applications in sensitive and critical domains are a strong motivational factor in trying to understand the behavior of black boxes. We propose to address this issue by providing an interpretable layer on top of black box models by aggregating “local” explanations.
We present GLocalX, a “local-first” model-agnostic explanation method. Starting from local explanations expressed in the form of local decision rules, GLocalX iteratively generalizes them into global explanations by hierarchically aggregating them. Our goal is to learn accurate yet simple interpretable models to emulate the given black box and, if possible, replace it entirely. We validate GLocalX in a set of experiments in standard and constrained settings with limited or no access to either data or local explanations. Experiments show that GLocalX is able to accurately emulate several models with simple and small models, reaching state-of-the-art performance against natively global solutions. Our findings show how it is often possible to achieve a high level of both accuracy and comprehensibility of classification models, even in complex domains with high-dimensional data, without necessarily trading one property for the other. This is a key requirement for trustworthy AI, necessary for adoption in high-stakes decision-making applications.
@article{DBLP:journals/ai/SetzuGMTPG21,
author = {Mattia Setzu and
Riccardo Guidotti and
Anna Monreale and
Franco Turini and
Dino Pedreschi and
Fosca Giannotti},
title = {GLocalX - From Local to Global Explanations of Black Box {AI} Models},
journal = {Artif. Intell.},
volume = {294},
pages = {103457},
year = {2021}
}
10.1016/j.artint.2021.103457
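The toy sketch below conveys only the local-to-global intuition, hierarchically generalizing same-label rules into fewer, broader ones; it does not reproduce GLocalX's actual merge operators or its fidelity-driven acceptance criteria.

# Local rules as interval conditions; same-label rules are merged by widening
# the intervals they share, until no further merge is possible.
from dataclasses import dataclass

@dataclass
class Rule:
    intervals: dict   # feature name -> (low, high)
    label: int

def generalize(a, b):
    # Keep only features both rules constrain, widening each interval to cover both.
    shared = a.intervals.keys() & b.intervals.keys()
    merged = {f: (min(a.intervals[f][0], b.intervals[f][0]),
                  max(a.intervals[f][1], b.intervals[f][1])) for f in shared}
    return Rule(merged, a.label)

def aggregate(rules):
    merged_any = True
    while merged_any:
        merged_any = False
        for i in range(len(rules)):
            for j in range(i + 1, len(rules)):
                if rules[i].label == rules[j].label:
                    new = generalize(rules[i], rules[j])
                    rules = [r for k, r in enumerate(rules) if k not in (i, j)] + [new]
                    merged_any = True
                    break
            if merged_any:
                break
    return rules

local = [Rule({"age": (30, 40), "income": (20, 50)}, 1),
         Rule({"age": (35, 55)}, 1),
         Rule({"age": (0, 25)}, 0)]
print(aggregate(local))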
Artificial Intelligence systems often adopt machine learning models encoding complex algorithms with potentially unknown behavior. As the application of these “black box” models grows, it is our responsibility to understand their inner working and formulate them in human-understandable explanations. To this end, we propose a rule-based model-agnostic explanation method that follows a local-to-global schema: it generalizes a global explanation summarizing the decision logic of a black box starting from the local explanations of single predicted instances. We define a scoring system based on a rule relevance score to extract global explanations from a set of local explanations in the form of decision rules. Experiments on several datasets and black boxes show the stability, and low complexity of the global explanations provided by the proposed solution in comparison with baselines and state-of-the-art global explainers.
@inproceedings{DBLP:conf/pkdd/SetzuGMT19,
author = {Mattia Setzu and
Riccardo Guidotti and
Anna Monreale and
Franco Turini},
title = {Global Explanations with Local Scoring},
booktitle = {{PKDD/ECML} Workshops (1)},
series = {Communications in Computer and Information Science},
volume = {1167},
pages = {159--171},
publisher = {Springer},
year = {2019}
}
10.1007/978-3-030-43823-4_14
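Below is a sketch of the rule-scoring idea: rank candidate local rules by coverage times precision on held-out data and keep the best ones as the global explanation. The paper's exact relevance score is not reproduced here.

# Score rules (interval conditions plus a predicted label) on a validation set.
import numpy as np

def applies(rule, x):
    return all(low <= x[f] <= high for f, (low, high) in rule["conditions"].items())

def relevance(rule, X, y):
    hits = np.array([applies(rule, x) for x in X])
    if not hits.any():
        return 0.0
    coverage = hits.mean()
    precision = (y[hits] == rule["label"]).mean()
    return coverage * precision

X_val = [{"age": 22, "income": 15}, {"age": 45, "income": 60}, {"age": 38, "income": 30}]
y_val = np.array([0, 1, 1])
rules = [{"conditions": {"age": (30, 60)}, "label": 1},
         {"conditions": {"age": (0, 25)}, "label": 0},
         {"conditions": {"income": (0, 100)}, "label": 1}]

ranked = sorted(rules, key=lambda r: relevance(r, X_val, y_val), reverse=True)
for r in ranked:
    print(r["conditions"], r["label"], round(relevance(r, X_val, y_val), 2))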
We introduce a framework to extract and parse Java source code, serialize it into RDF triples by applying an appropriate ontology, and then analyze the resulting structured code information using standard SPARQL queries. We present our experiments on a sample of 134 Java repositories collected from GitHub, obtaining 17 million triples about methods, input and output types, comments, and other source code information. Experiments also address the scalability of the framework. We finally provide examples of the level of expressivity that can be achieved with SPARQL by using our proposed ontology and semantic technologies.
@inproceedings{DBLP:conf/semco/SetzuA16,
author = {Mattia Setzu and
Maurizio Atzori},
title = {SPARQL Queries over Source Code},
booktitle = {ICSC},
pages = {104--106},
publisher = {IEEE Computer Society},
year = {2016}
}
10.1109/ICSC.2016.65
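The sketch below shows the querying pattern with rdflib on a few hand-made triples; the code: vocabulary is a placeholder, not the ontology proposed in the paper.

# Serialize a couple of source-code facts as RDF and query them with SPARQL.
from rdflib import Graph, Literal, Namespace, RDF

CODE = Namespace("http://example.org/code#")
g = Graph()
g.add((CODE.parseInt, RDF.type, CODE.Method))
g.add((CODE.parseInt, CODE.returnType, Literal("int")))
g.add((CODE.parseInt, CODE.comment, Literal("Parses a string into an integer.")))

query = """
PREFIX code: <http://example.org/code#>
SELECT ?method ?ret WHERE {
    ?method a code:Method ;
            code:returnType ?ret .
}
"""
for method, ret in g.query(query):
    print(method, ret)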
Education
Post-doc
Research Fellowship
Research Grant
Ph.D. in Computer Science
Master's Degree in Computer Science
Bachelor's Degree in Computer Science
Software
Decision Tree induction library: now with Multivariate (Oblique) Trees!
A small wrapper library for AutoML on tabular datasets.
A cutting-edge library for tabular dataset loading and preprocessing.
Implementation of the "FairBelief - Assessing Harmful Beliefs in Large Language Models" paper.
Implementation of the "Learning from Polyhedra" IJCAI paper.
Implementation of the TriplEx paper.
Implementation of the GLocalX paper.