Posted by Nodus Labs | April 6, 2022
In this case study, we demonstrate how to use text network analysis to analyze EU grant opportunities. We will use the graph to identify the most relevant topics in the existing grant proposals. We will also show how network visualization can reveal structural gaps: topics that could be connected but do not yet appear in the same grant proposals. You can use those gaps to propose an innovative and interesting application to the funding bodies.
We will use data scraped from the EU electronic grant system (via Kaggle). It contains the titles, the descriptions, and the dates of the grant opportunities, which we can use to build the text graph. To analyze this data, we will use the InfraNodus text network visualization and analysis system.
Selecting the Data for Analysis
We want to analyze the existing opportunities along several dimensions, such as the category of each grant and the time when the opportunity becomes available. Therefore, we need to check that the CSV file we get contains all of this data.
We download the file from Kaggle. As it is 24 MB, we need to split it into about 10 parts so that every file is no larger than about 3 MB, which is within InfraNodus’ limit. The limit exists because a graph built from too much data becomes unreadable, so it is useful to import the data in chunks. We can use the CSVSplit tool to split the file. The first file will contain the most recent grant opportunities, which is what we’re interested in.
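If you prefer to split the file yourself rather than use a dedicated tool, the same step can be sketched in Python using only the standard library. This is a minimal sketch, not the method used in this case study: the function name `split_csv_text` and the `max_bytes` parameter are made up for illustration, and the 3 MB figure is simply the limit mentioned above.

```python
import csv
import io


def _to_line(row):
    """Serialize one CSV row back to text, with proper quoting."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(row)
    return out.getvalue()


def split_csv_text(csv_text, max_bytes=3 * 1024 * 1024):
    """Split CSV text into chunks of at most max_bytes (UTF-8 encoded).

    The header row is repeated at the top of every chunk so that each
    chunk is a valid CSV file on its own. A single row that is larger
    than max_bytes still ends up in its own (oversized) chunk.
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header_line = _to_line(next(reader))
    header_size = len(header_line.encode("utf-8"))

    chunks = []
    current, size = [header_line], header_size
    for row in reader:
        line = _to_line(row)
        line_size = len(line.encode("utf-8"))
        # Start a new chunk if adding this row would exceed the limit.
        if len(current) > 1 and size + line_size > max_bytes:
            chunks.append("".join(current))
            current, size = [header_line], header_size
        current.append(line)
        size += line_size

    if len(current) > 1:
        chunks.append("".join(current))
    return chunks
```

To use it on a real file, read the file into a string with `open(path, newline="", encoding="utf-8").read()`, call `split_csv_text`, and write each returned chunk to its own file. Because the rows keep their original order, the first chunk holds the rows that come first in the source file.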
When we upload the file to InfraNodus, we see a preview of its structure. We need to click the columns we want to import. In our case, for the first iteration, we just want the titles:
We also want to be able to filter our data by grant type, so let’s use the field Type to categorize our data. The distinct values found in this field will be used to tag each row from the column title that we selected earlier. We can then filter the graph to show only the statements of one type.
Finally, we can choose the column start to add a timestamp to each statement, so we can observe how the grant opportunities evolved over time.
Then we click import, and the data we selected will be visualized as a graph.
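To make the column selection concrete, here is a rough offline equivalent of what the import step does with the three columns: the title becomes the statement text, Type becomes a tag, and start becomes a timestamp. This is only a sketch of the idea, not InfraNodus code. The function names and the dictionary layout are assumptions, and the timestamp is kept as a raw string because the date format in the Kaggle file is not described here.

```python
import csv
import io


def rows_to_statements(csv_text, text_col="title", tag_col="Type", time_col="start"):
    """Turn CSV rows into statements with a tag and a timestamp.

    The column names default to the ones selected in the walkthrough;
    the output structure is a hypothetical one chosen for illustration.
    """
    statements = []
    for row in csv.DictReader(io.StringIO(csv_text, newline="")):
        text = (row.get(text_col) or "").strip()
        if not text:
            continue  # skip rows without a title
        statements.append({
            "text": text,
            "tag": (row.get(tag_col) or "").strip(),
            "timestamp": (row.get(time_col) or "").strip(),  # raw string, not parsed
        })
    return statements


def filter_by_tag(statements, tag):
    """Keep only the statements of one grant type."""
    return [s for s in statements if s["tag"] == tag]
```

Calling `filter_by_tag(statements, some_type)` mirrors filtering the graph to show only one grant type, and sorting the statements by their timestamp is a starting point for looking at how the opportunities change over time.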
Reading the Text Network Visualization
Now that we have imported the file, the data from the title column is visualized as a graph. The words (lemmas) are the nodes, and their co-occurrences are the connections between them. The more influential nodes, which have a higher betweenness centrality, are shown bigger on the graph. The words that tend to co-occur form clusters and are marked as belonging to the same community (shown in the same color). As a result, we have this image:
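The idea of a co-occurrence graph with betweenness centrality can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the InfraNodus implementation: words are only lowercased rather than lemmatized, the stopword list is a tiny made-up one, the `window` size is an assumption, and community detection is left out. Betweenness is computed with Brandes' algorithm for unweighted graphs.

```python
import re
from collections import deque

# Tiny illustrative stopword list (an assumption, not the one InfraNodus uses).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def build_graph(titles, window=2):
    """Build an undirected co-occurrence graph as {word: set(neighbours)}.

    Two words are linked when they appear within `window` positions of
    each other in the same title (window=2 links adjacent words only).
    """
    adj = {}
    for title in titles:
        words = [w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS]
        for i, w in enumerate(words):
            adj.setdefault(w, set())
            for other in words[i + 1:i + window]:
                if other != w:
                    adj[w].add(other)
                    adj.setdefault(other, set()).add(w)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:                     # accumulate back from the farthest nodes
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each pair of endpoints is counted twice in an undirected graph.
    return {v: c / 2 for v, c in bc.items()}


# Made-up titles, only to show the call sequence.
graph = build_graph(["Energy system", "System innovation"])
scores = betweenness(graph)
print(sorted(scores, key=scores.get, reverse=True))
```

Words that sit on many shortest paths between other words get the highest scores, which is why they act as the junctions between topics and are drawn bigger on the graph.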
The graph, together with the Analytics panel, gives us the following information:
- The most influential nodes (shown bigger on the graph) are: technology, partnership, energy, system, innovation, climate, battery. These are the most relevant topics for all the grants. We should make sure our application addresses at least some of them.
- The main topical groups are:
a) climate health risk
b) energy system innovative
c) partnership battery cooperation
d) solution base smart
We can decipher what these topics mean pretty easily, but we can also click the “show categories” button under them and use IBM Watson’s AI to categorize them using a general taxonomy:
We can see that the main topics are:
a) health and fitness / disease: health risks posed by climate change
b) energy: innovative energy systems
c) electric vehicles: batteries and other energy sources (lots of stress on cooperation here)
d) business software: smart solutions
Therefore, if we were to write an application, we could focus on one of those topics (e.g. innovative energy systems and car batteries, making sure we propose it as a partnership among many stakeholders). We could also make a connection between them and propose something new — that could potentially interest the funding bodies as an innovative idea.
Finding the Structural Gaps and Innovative Opportunities
The graph above shows the main topics that are present in the European Union’s grant opportunities, but what if we’re interested in the gaps between them?
Addressing such gaps can lead us to new, interesting, and innovative ideas that bridge topics that have not been linked before.
In order to discover those gaps, we can use InfraNodus’ Insight feature. It identifies the distinct topics inside the graph that could be better connected and highlights them on the graph, as well as the gap between them. We can then use OpenAI’s GPT-3 system (the Insight generation tool in InfraNodus) to generate research questions in relation to this gap:
In this case, the system recommends linking the two topics (partnership battery) and (solution base) and thinking about how we can build high-resilience green buildings that could withstand changing environments and cycles. Not a bad idea for an EU grant application!
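One simple way to think about a structural gap is as a pair of topical clusters with few or no links between them. The sketch below illustrates that intuition only. It is a toy heuristic, not the algorithm behind the Insight feature, which is not described in this article. The cluster labels reuse the four topical groups listed earlier, while the edges between words are made up for illustration and are not taken from the grant data.

```python
from itertools import combinations


def least_connected_pair(edges, community_of):
    """Return the pair of communities with the fewest links between them.

    edges: iterable of (word, word) pairs
    community_of: {word: community label}
    """
    labels = sorted(set(community_of.values()))
    counts = {pair: 0 for pair in combinations(labels, 2)}
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        if ca != cb:
            counts[tuple(sorted((ca, cb)))] += 1
    gap = min(counts, key=counts.get)
    return gap, counts


# Topical groups from the analysis above.
community_of = {
    "climate": "a", "health": "a", "risk": "a",
    "energy": "b", "system": "b", "innovative": "b",
    "partnership": "c", "battery": "c", "cooperation": "c",
    "solution": "d", "base": "d", "smart": "d",
}

# Hypothetical edges, invented only to demonstrate the function.
edges = [
    ("climate", "health"), ("health", "risk"),
    ("energy", "system"), ("system", "innovative"),
    ("partnership", "battery"), ("battery", "cooperation"),
    ("solution", "base"), ("base", "smart"),
    ("climate", "energy"), ("risk", "system"),
    ("energy", "battery"), ("risk", "partnership"),
    ("system", "solution"), ("innovative", "smart"),
    ("health", "smart"),
]

gap, counts = least_connected_pair(edges, community_of)
print(gap, counts[gap])
```

With these made-up edges, the groups containing partnership and battery on one side and solution and base on the other share no links, so they come out as the candidate gap. On real data you would also want to account for cluster size and for how far apart the clusters are in the graph.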
If you would like to try it out on this data, get the raw CSV file from Kaggle and try it at www.infranodus.com.