Data Visualisation: a discipline to make sense of scientific production

article_24Febbraio_2014

Data visualisation has become fashionable over the last few years: many newspapers, such as the New York Times, use it prominently in their articles, political parties and companies apply it as a communication tool, and scientific articles regularly include it as part of their argumentation and analysis. Nevertheless, in many contexts data visualisation is still perceived as a “design tool” used to present and communicate data in a fancy way, more closely related to word processing than to research per se. In fact, data visualisation has become a solid discipline that seeks to understand how our brain processes images and understands information, as well as the best ways to represent and interact with that information in order to comprehend it better and transform it into insight.

Data visualisation is a discipline that involves the development of static or interactive graphical representations of quantitative or qualitative data to amplify cognition. In other words, it is the generation of visualisations that make extensive use of our pre-attentive capabilities, that is, the unconscious processing that our brain performs every time it sees something, to better understand a set of data or concepts.

As for why it is trendy: we live immersed in a world of data, and we need new tools to cope with it. Every day we produce tons of data, consciously or even unconsciously, in the form of spatially referenced (geotagged) pictures or statuses in social networks, bank transactions, phone calls, website visits, etc. This also applies to academia, where new publications appear constantly in journals and conferences, dealing with different topics and involving a variety of authors affiliated with different institutions. It is really difficult to get a big picture of what is happening behind this data with mere numbers and statistics. For instance, new tools should address classic problems such as finding which topics are hot in a certain research field, who the main authors are and which papers are the most relevant. This is just one example that illustrates the vast amount of data that exists at the level of a single researcher. But we can scale those problems up to research groups, institutions and even countries developing their research programmes. For them, it is important to spot new research topics, hidden connections to other fields, communities of authors that tend to work together, or the missing profile that would increase the quality of a research team. With all this in mind, it is easy to understand that methodologies involving data visualisation can help to get that big picture and assist the decision-making process in academia.

In this post, we will show some examples of visualisations that seek to understand scientific production better.

Understanding scientific production through visualisations

Many techniques for understanding scientific production go beyond the simple use of statistical visualisations such as bar charts, line charts or scatter plots. In fact, the many relations that can be extracted from scientific publications make them suitable to be mapped as networks. Generally speaking, networks are node-link diagrams where nodes represent items such as papers, authors, institutions, keywords or topics, connected through links that arise from their implicit or explicit relations. Explicit connections are links that exist physically in the documents, such as citations, whereas implicit connections are those that arise from examining similar contents (e.g., two papers sharing the same keywords) and other inner synergies among academic organisations and publications (e.g., two authors could be linked if they publish in the same conferences).
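To make the distinction between explicit and implicit links concrete, here is a minimal sketch in Python. The paper identifiers, keywords and citations are made up for illustration, and the two helper functions are hypothetical names rather than part of any tool mentioned in this post.

```python
from itertools import combinations

# Hypothetical toy records: identifiers, keywords and citations are invented.
papers = {
    "p1": {"keywords": {"networks", "citations"}, "cites": ["p2"]},
    "p2": {"keywords": {"networks", "layout"}, "cites": []},
    "p3": {"keywords": {"layout", "maps"}, "cites": ["p2"]},
}


def explicit_links(papers):
    """Directed links (citing paper, cited paper) taken from the citation lists."""
    return sorted((src, dst) for src, rec in papers.items() for dst in rec["cites"])


def implicit_links(papers):
    """Undirected links between every pair of papers sharing at least one keyword."""
    links = []
    for a, b in combinations(sorted(papers), 2):
        shared = papers[a]["keywords"] & papers[b]["keywords"]
        if shared:
            links.append((a, b, sorted(shared)))
    return links


print(explicit_links(papers))
print(implicit_links(papers))
```

In this toy dataset, p1 and p3 are never linked by shared keywords, yet both cite p2, so the two kinds of link already give slightly different pictures of the same three papers.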

The following visualisation is an example of a network that accumulates the citations that exist from one discipline to another.

Network generated with data from Thomson Scientific’s 2004 Journal Citation Reports. Link: http://www.eigenfactor.org/map/maps.htm

Orange circles represent fields, with larger circles indicating a larger field size as measured, while blue arrows represent citation flows between fields. The darker the colour, the more relevant the field is or the higher the number of citations between fields. The visualisation reveals a very important relation among the fields of Molecular and Cell Biology and Medicine and stresses the connections between Physics and Chemistry.
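The idea of accumulating citations from one field to another can be sketched as follows. This is only an illustration under stated assumptions: the journal identifiers, their field assignments and the citation pairs are invented, within-field citations are simply ignored, and this is not the method used to build the map above.

```python
from collections import Counter

# Hypothetical journal-to-field assignment and journal-level citation pairs.
field_of = {"j1": "Physics", "j2": "Chemistry", "j3": "Medicine", "j4": "Physics"}
citations = [("j1", "j2"), ("j4", "j2"), ("j2", "j1"), ("j3", "j3"), ("j1", "j4")]


def field_flows(citations, field_of):
    """Count citations from each citing field to each cited field.

    Citations that stay inside one field are skipped (an assumption of this
    sketch), so only flows between different fields are counted.
    """
    flows = Counter()
    for citing, cited in citations:
        src, dst = field_of[citing], field_of[cited]
        if src != dst:
            flows[(src, dst)] += 1
    return flows


for (src, dst), weight in sorted(field_flows(citations, field_of).items()):
    print(f"{src} -> {dst}: {weight}")
```

Each counted pair corresponds to one arrow in a field-level network, and the count could drive the colour or thickness of that arrow.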

As stated before, the same approach can be applied to authors, as done in 20 Years of Four HCI Conferences: A Visual Exploration (Henry, Goodell, Elmqvist and Fekete 2007). There, the authors visualised both a citation network, which shows how authors cite each other, and a co-authorship network, which reveals which authors tend to work together, in the most important Human-Computer Interaction conferences.

Citations (above) and Co-authorship (below) networks in the most important Human Computer Interaction conferences (Henry, Goodell, Elmqvist and Fekete 2007)

In these networks, each node represents a researcher, and its size is related to that researcher's number of publications in the field. It is interesting to see that, although both networks are built from the same dataset, they are completely different and lead to very different interpretations and conclusions. The takeaway from this representation is that, at a glance, one can identify relevant communities whose meaning depends on how the network has been built.
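A small sketch can show why the two networks differ even though they come from the same records. The author names and papers below are invented, the function names are hypothetical, and self-citations are ignored as a simplifying assumption; this is not the procedure followed by Henry et al.

```python
from collections import Counter
from itertools import combinations

# Hypothetical dataset: each paper lists its authors and the papers it cites.
papers = {
    "p1": {"authors": ["Ana", "Ben"], "cites": []},
    "p2": {"authors": ["Ana", "Cleo"], "cites": ["p1"]},
    "p3": {"authors": ["Dan"], "cites": ["p1", "p2"]},
}


def coauthorship_edges(papers):
    """Undirected edges: two authors are linked once per paper they co-sign."""
    edges = Counter()
    for rec in papers.values():
        for a, b in combinations(sorted(set(rec["authors"])), 2):
            edges[(a, b)] += 1
    return edges


def author_citation_edges(papers):
    """Directed edges: citing author -> cited author, ignoring self-citations."""
    edges = Counter()
    for rec in papers.values():
        for cited_id in rec["cites"]:
            for citing in rec["authors"]:
                for cited in papers[cited_id]["authors"]:
                    if citing != cited:
                        edges[(citing, cited)] += 1
    return edges


def publication_counts(papers):
    """Number of papers per author, usable as node size."""
    return Counter(a for rec in papers.values() for a in rec["authors"])


print(dict(coauthorship_edges(papers)))
print(dict(author_citation_edges(papers)))
print(dict(publication_counts(papers)))
```

In this toy example Dan has no co-authors, so he is an isolated node in the co-authorship network, while in the citation network he is connected to every other author.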

Nevertheless, other approaches to visualising networks do exist. PivotPaths (Dörk, Henry, Ramos and Dumais 2012), which provides a more complex example, depicts richer networks based upon three types of nodes: authors, papers and topics.

PivotPaths (Dörk, Henry, Ramos and Dumais 2012) representing a tripartite graph (a graph with three different types of nodes) with authors, topics and papers

The most important feature of this representation is that it has been designed “for casual traversal of collections in an aesthetically pleasing manner that encourages exploration and serendipitous discoveries”. This is a very important property, as promoting user engagement with visualisations improves their understanding of the underlying data.

Another example worth looking at is Paperscape, a system that, rather than accumulating information or providing ways to navigate through a subset of the data, represents the whole universe of the 896,570 scientific papers currently available at arXiv, an archive for electronic preprints of papers in the fields of Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics. In this case, the underlying network of papers places papers from the same field closer together, allowing for the discovery of communities based on the main topics in those domains.

paperscape

The system incorporates zooming and panning interaction to navigate the space and to get detailed information of the papers that each point in the map represents, but most importantly it lets users look at the citations of a publication. As can be seen in the following image, this approach makes it easier to understand how far apart the citations of a paper are, helping users to understand if it has crossed the border to other subtopics.

A very relevant paper in the field has citations from different parts of the “universe”

Data visualisation approaches can be as limitless as the creativity of their designers and developers. Other network visualisations can therefore emerge that use completely different representations from those we have seen so far and that sit on the line between art and science. This is the case of Citeology (Matejka, Grossman and Fitzmaurice 2012), a tool that depicts the relationships between research publications from over 30 years of the CHI/UIST Human-Computer Interaction conferences through their citations. Citeology organises papers by year and sorts them with the most often-cited papers in the middle. A single click on any of them reveals both its references and its citations, allowing for the discovery of relevant papers.

 

An example of the references and citations of a paper in Citeology

As can be seen in the image, this visualisation provides context about the impact of the paper, showing how much of the state of the art it covers and how much it has influenced new research. As an important takeaway, this visualisation shows one way to combine linked information with time.
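The two ideas described for Citeology, separating a paper's references from its citations and placing the most-cited papers in the middle of each year, can be sketched as below. The data and function names are hypothetical, and the ordering routine is just one simple way to centre the highest values, not the layout algorithm of the actual tool.

```python
# Hypothetical records: publication year and the papers each one cites.
papers = {
    "a": {"year": 2000, "cites": []},
    "b": {"year": 2005, "cites": ["a"]},
    "c": {"year": 2005, "cites": ["a"]},
    "d": {"year": 2010, "cites": ["b", "a"]},
}


def references(paper_id, papers):
    """Earlier work that the given paper cites."""
    return sorted(papers[paper_id]["cites"])


def citations(paper_id, papers):
    """Later work that cites the given paper."""
    return sorted(pid for pid, rec in papers.items() if paper_id in rec["cites"])


def most_cited_in_middle(ids, cited_count):
    """Order ids so that the most cited ones end up in the centre of the list.

    Papers are ranked by descending citation count and then dealt out
    alternately to the right and left of the centre.
    """
    ranked = sorted(ids, key=lambda pid: (-cited_count[pid], pid))
    left, right = [], []
    for i, pid in enumerate(ranked):
        (right if i % 2 == 0 else left).append(pid)
    return left[::-1] + right


print(references("b", papers), citations("b", papers))
print(most_cited_in_middle("vwxyz", {"v": 2, "w": 1, "x": 5, "y": 3, "z": 0}))
```

Selecting a paper then amounts to drawing links to its `references` on one side of the timeline and to its `citations` on the other.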

Conclusion

Data visualisation is becoming a key discipline to support the generation of insight from large and complex datasets. In academia, the wide use of visualisation techniques such as network visualisations can help us to discover and understand new research trends, key persons and institutions, but even more importantly, they can help us to understand how knowledge is being generated and transferred.

SIRIS Academic strongly believes in the need to use data visualisation as a tool to help us to think better and to take transdisciplinary decisions using the expertise and backgrounds of the different members of our team.

REFERENCES

Henry, N., Goodell, H., Elmqvist, N. and Fekete, J. D., 2007. 20 Years of Four HCI Conferences: A Visual Exploration. International Journal of Human-Computer Interaction, 23(3), pp. 239-285.

Dörk, M., Henry, N., Ramos, G. and Dumais, S., 2012. PivotPaths: Strolling through Faceted Information Spaces. TVCG: IEEE Transactions on Visualization and Computer Graphics (Proceedings InfoVis 2012), 18(12), pp. 2709-2718.

Matejka, J., Grossman, T. and Fitzmaurice, G., 2012. Citeology: visualising paper genealogy. CHI’12 Extended Abstracts on Human Factors.

*Note: This article gives the views of the author, and not the position of SIRIS Lab, nor of SIRIS Academic.