Citation Bias in Citation Recommendation Systems
How does citation bias impact fairness in research publications?
The current pace of research and technological development has led to an exponential increase in the number of research publications. In this digital information age, researchers face the challenging problem of finding the right citations for the document they are about to publish. Citations are written to back up research claims and to give proper credit to past and state-of-the-art research. With the growing number of publications every year, citation selection is becoming a daunting task. Beyond academic research, there is also demand for finding relevant citations in non-research documents such as encyclopedic articles, grant proposals, patents, news articles, and blog posts.
Automatic citation recommender systems have been developed to assist both novice and experienced researchers in finding relevant documents to cite. Recommendation systems are popular machine learning applications that help solve the user's paralyzing problem of too many choices, and citation recommendation systems apply the same idea to recommending citations for a particular text. It is important to select the right gold labels when developing such a system: if we are not careful, we may end up amplifying citation bias. In this post, we will look at the types of algorithmic bias that need to be considered while developing these systems. How do citations influence fairness in academic research publishing?
With the advent of artificial intelligence systems, we are witnessing many new forms of algorithmic bias. AI systems inherit bias, and the principal effect of bias is unfairness. Bias simply means that an individual or system, consciously or unconsciously, does not give one particular idea or thing the same opportunity as another. Bias has roots in our cognitive system. Algorithmic bias, however, results from historical, cultural, and social patterns inherited from the data, as well as from a poor understanding of models due to technical limitations, unintended applications, lack of context, and poor design.
The most prevalent algorithmic bias is popularity bias, commonly observed in social media content distribution. A simple example: YouTubers who already have the most views get recommended more and more, which brings them even more views. Awareness of a bias helps to mitigate or reduce its impact.
Popularity, Citation and Reporting Bias in Research Publications
Based on my research experience, a simple example of popularity bias (which leads to citation bias in the research community) is that an already cited paper gets cited even more. Who is cited and who is not is often determined by the popularity of the author, which in turn depends on their location, affiliation, and field of research. This effect has been amplified by algorithmic bias in search and recommendation engines, as the toy simulation below illustrates.
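To make that rich-get-richer loop concrete, here is a minimal toy simulation of preferential attachment (not drawn from any real citation data; all numbers are illustrative): each new citation is assigned with probability proportional to current citation counts, so a small early advantage compounds.

```python
import random

random.seed(0)

# 100 papers, each starting with a single citation (illustrative numbers).
citations = [1] * 100

# Simulate 10,000 new citations; each one picks a paper with probability
# proportional to how many citations it already has (preferential attachment).
for _ in range(10_000):
    paper = random.choices(range(len(citations)), weights=citations, k=1)[0]
    citations[paper] += 1

citations.sort(reverse=True)
top_share = sum(citations[:10]) / sum(citations)
print(f"The 10 most-cited papers hold {top_share:.0%} of all citations")
```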
Publication/reporting bias: This is an age-old problem. It simply means that researchers tend to publish only positive and significant results. There are many reasons why publication or reporting bias happens, but whatever the reason, its effect is to feed citation bias.
Citation bias: This is the tendency to cite positive or significant results that support one's own research (driven by confirmation bias) more than negative ones. The problem is compounded because citations are often harvested from the reference lists of the most frequently cited papers.
The unfair consequences are:
- Publication bias has been found to be far more widespread in medical and scientific studies than in any other area [1]. It has led to overestimated drug treatment benefits, underestimated harms, and wasted research, since negative results go unpublished even though they can be useful [2], and journals are biased toward publishing only positive outcomes [3]. Researchers also have a greater tendency to publish positive results in order to obtain or maintain grants [4].
- Citation bias leads to citation distortion: on the strength of a few popular papers in a field, repeated citation can amplify a claim into accepted fact or authority. This sometimes happens when researchers copy citations from a reference list without reading the original paper to check the validity of the claim. In one study of citation bias, the observed distortions included a claim without actual supporting data and a citation to a journal paper that did not exist.
- Another implication of citation bias is that papers written by women tend to be cited less frequently than those written by men [5]. Women are underrepresented in most fields, and on top of that, implicit bias against women is observed in how men cite articles.
- In general, people tend to cite papers by authors in their professional networks, a pattern known as inter-citation or cross-citation. Thus, people who are in the same social or networking circle as top researchers may get a citation bump [6].
- Editorial bias refers to editors' acceptance or rejection decisions being influenced by the author's background and geographical origin. This leads to another unfair consequence: researchers publishing from lesser-known geographical origins have greater difficulty getting their papers accepted [7].
How can recommender systems influence and amplify citation bias?
What happens to a machine learning model that is trained on biased data? The most popular papers are ranked highest by citation metrics and therefore appear on the first page of search results; they then collect even more citations, which can carry publication bias with them. A recommendation system built on a fixed set of publication data dominated by popular articles, instead of eliminating citation bias, might introduce additional bias towards specific articles.
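As a sketch of that feedback loop (a hypothetical popularity-only ranker with made-up citation counts, not a description of any real system), consider a recommender that ranks candidates purely by citation count: whatever it surfaces gets read and cited, so the already-popular papers pull further ahead while the rest are never shown.

```python
# Hypothetical popularity-only ranker and the feedback loop it creates.
# Paper IDs and citation counts below are made up for illustration.

def recommend(citation_counts, k=3):
    """Return the k most-cited paper IDs (ranking by popularity alone)."""
    return sorted(citation_counts, key=citation_counts.get, reverse=True)[:k]

counts = {"P1": 120, "P2": 95, "P3": 40, "P4": 12, "P5": 3, "P6": 1}

# Each round, the recommended papers are the ones researchers see first,
# so they each pick up a new citation; unrecommended papers gain nothing.
for _ in range(5):
    for paper in recommend(counts):
        counts[paper] += 1

print(counts)  # P1-P3 keep climbing; P5 and P6 are never surfaced at all
```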
In summary
There is now awareness of popularity bias in research and in recommendation systems, so we need to make more of an effort to quantify and mitigate it when building a citation recommender system. There is a trade-off between quality, variety, and popularity. To maintain equitable choice and equal opportunity, more work is required to quantify citation bias. The impact of new technologies is unforeseeable, and we cannot predict how they will change the world. Now is the time to ask more questions so that we build fair citation recommendation technology.
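As a concrete starting point for that quantification, here is a minimal sketch (with made-up citation counts, under the simplifying assumption that citation count stands in for popularity) of two common measurements: the average popularity of recommended papers relative to the corpus average, and the Gini coefficient of the corpus citation distribution.

```python
# Two simple popularity-bias measurements; all numbers are placeholders.

def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal, 1 = fully concentrated)."""
    values = sorted(values)
    n = len(values)
    weighted_sum = sum((i + 1) * v for i, v in enumerate(values))
    return (2 * weighted_sum) / (n * sum(values)) - (n + 1) / n

corpus_citations = [1, 2, 2, 3, 5, 8, 12, 40, 95, 120]   # whole corpus
recommended_citations = [40, 95, 120]                     # what the system surfaced

lift = (sum(recommended_citations) / len(recommended_citations)) / (
    sum(corpus_citations) / len(corpus_citations)
)
print(f"Popularity lift of recommendations: {lift:.1f}x")
print(f"Gini of corpus citation counts: {gini(corpus_citations):.2f}")
```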
Thank you Suhas Pai for your suggestions and help in editing.