Knowledge Base Text Analysis with NLP

Many organizations find that their knowledge base has become bloated or poorly organized. The ontology that was created a few years ago is no longer relevant, so it becomes hard to find content, let alone categorize it. This hurts not only the quality of customer support but also search engine optimization, as it becomes increasingly hard to get the support portal to the top of Google search results.

In this case study, we will demonstrate how text analysis, NLP, and the latest AI algorithms can be used to improve any knowledge base, get an overview of its content, and understand how to develop it further. We will also demonstrate how this approach can be used to improve search engine rankings for target keywords.

We will be using our own Nodus Labs knowledge base, hosted by Zendesk, one of the most popular knowledge base CMS providers. Our goals are to

1) Retrieve the main topics to get an overview

2) Categorize it in a better way so that it’s easier to navigate

3) See what content is missing to create more relevant content for our users

4) Improve SEO to increase acquisition via Google search

We will be updating this article with Google Analytics data that will show whether this approach succeeded in attracting more readers and whether the time / engagement increased.

Update (19 February 2023): As we predicted, restructuring our Zendesk support portal based on the text analysis below improved its search engine rankings. Here’s a before-and-after comparison from Google Analytics:

A change in referrals from search engines to Nodus Labs support portal after it was optimized using InfraNodus: 50% more sessions, faster resolution time.

We also saw a 20% increase in category page views, which shows that the readers’ navigation of the support portal became much more structured.

Exporting Knowledge Base Content from Zendesk

First, we need to export Zendesk content in order to analyze it.

In order to do that, we wrote an open-source public Python script, which is available in our repo: https://github.com/noduslabs/zendesk_export

If you would like to do the same, just follow the instructions on the README.md page.

You can also use this script to back up your knowledge base, as it can include the images and the category structure.

For our purposes, we export only the text files in MD format. They will be saved in the backups/yyyy-mm-dd folder.
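Once the export has run, the MD files can be gathered into a single text corpus for analysis. The following is a minimal sketch, assuming only that the export folder contains `.md` files (possibly in subfolders). The function names `load_markdown_articles` and `combine_articles` are hypothetical helpers for illustration and are not part of the zendesk_export script.

```python
from pathlib import Path


def load_markdown_articles(backup_dir):
    """Return {relative_path: text} for every .md file under backup_dir."""
    root = Path(backup_dir)
    articles = {}
    # rglob walks subfolders too, so a category folder structure is preserved in the keys
    for path in sorted(root.rglob("*.md")):
        key = path.relative_to(root).as_posix()
        articles[key] = path.read_text(encoding="utf-8")
    return articles


def combine_articles(articles):
    """Join all articles into one text, separated by blank lines."""
    return "\n\n".join(text.strip() for text in articles.values())
```

For example, `load_markdown_articles("backups/yyyy-mm-dd")` (with the placeholder replaced by the actual date folder created by your export) returns a dictionary that can be inspected article by article or passed to `combine_articles` to get one text.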

Knowledge Base Content Analysis

The next step is to load those MD files into InfraNodus, a text analysis tool that can retrieve the main topics and keywords from our knowledge base, as well as the connections between the different pages.

If you’d like to learn more about the specifics of importing the data, please read the help article on the Nodus Labs Support Portal on importing Zendesk data to InfraNodus for text analysis.

To make it easier, you can import just the content, without linking the page structure. In the example below, we imported the content, the pages, and the links between them, but you can choose to see only the concepts in the graph view dialogue (1).

This is the result we get. The graph shows us the main concepts contained in our knowledge base. The Analytics panel shows the main topical groups and keywords:

As we can see, the main keywords (Analytics > Most Influential Elements) are:

graph, node, topic, infranodus, idea, add, text

Together, these words basically describe how you can add ideas and text to InfraNodus’ graph to get the topics.
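To give a rough feel for how keywords can be ranked from a text network, here is a simplified sketch. It is not InfraNodus' algorithm, which this article does not describe. Instead, it swaps in a plain word co-occurrence network and ranks words by weighted degree (how strongly each word is linked to its neighbours). The stopword list, the window size, and all function names are assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption; real lists are much longer)
STOPWORDS = {"the", "a", "an", "to", "and", "of", "in", "is", "you",
             "can", "your", "it", "for", "on", "how", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic words, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cooccurrence_edges(tokens, window=4):
    """Count undirected links between words that fall inside the same window."""
    edges = Counter()
    for i, word in enumerate(tokens):
        # Link the word to the next (window - 1) words that follow it
        for other in tokens[i + 1 : i + window]:
            if other != word:
                edges[tuple(sorted((word, other)))] += 1
    return edges


def top_keywords(text, n=5, window=4):
    """Rank words by weighted degree: the sum of the weights of their links."""
    degree = Counter()
    for (a, b), weight in cooccurrence_edges(tokenize(text), window).items():
        degree[a] += weight
        degree[b] += weight
    return [word for word, _ in degree.most_common(n)]
```

Calling `top_keywords(corpus, n=7)` on the combined knowledge base text gives a first approximation of the most connected words. The ranking will differ from the one shown in the InfraNodus Analytics panel.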

One immediate takeaway is that the word “graph” is used too often. That is not ideal for our search engine optimization, as nobody searches for “graph” to find this type of content. However, our readers may use it, so we will keep the word but also try to use synonyms.

For instance, using InfraNodus’ own Keyword Research tool, we can see that the word “graph” is used in the context of charts and mathematical functions, which is not what InfraNodus is about:

Keyword research shows that the word “graph” is used in the context of online calculators, chart makers, plot generators, but not for text analysis. So we should rethink its use in the knowledge base and in the interface at large, perhaps.

Then we can also see the main categories for the topics of the knowledge base:

1. Graph Node

2. InfraNodus Import

3. AI Ideation

4. Cluster Gaps

5. Relation Panel

6. Text Network Discourse

Loosely, we can say they are talking about
a) graphs and nodes (technical terms from network science),
b) various import functions (important for InfraNodus),
c) AI ideation (a trending topic and an important functionality),
d) structural gaps between clusters (the special sauce of InfraNodus),
e) data panels (explaining how to use the tool),
f) text network discourse (text analysis).

In combination with the previous insight we can say that the content is perhaps a bit too technical, focusing on graph and network analysis rather than text analysis. We might want to shift that. Otherwise, all the important topics like AI ideation and structural gap detection are there.

Actionable insights:

  1. talk less about “graphs” and use less technical, more SEO-friendly terms
  2. talk more about “text analysis” and various text analysis techniques and approaches
  3. keep a good coverage of AI ideation and structural gap insights (already exists)
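To track whether these insights are being acted on, it can help to measure how often a term appears across the exported articles. The sketch below is one possible way to do that, assuming the articles are available as a dictionary of `{file_name: text}`. The function name `term_usage` is hypothetical, and the plural handling (an optional trailing "s") is a deliberate simplification.

```python
import re


def term_usage(articles, terms):
    """For each term, return (total occurrences, number of articles containing it)."""
    usage = {}
    for term in terms:
        # Match whole words or phrases, allowing a simple plural ("graph" / "graphs")
        pattern = re.compile(r"\b" + re.escape(term.lower()) + r"s?\b")
        counts = [len(pattern.findall(text.lower())) for text in articles.values()]
        usage[term] = (sum(counts), sum(1 for c in counts if c))
    return usage
```

Running `term_usage(articles, ["graph", "text analysis"])` before and after a round of edits shows whether the balance between the two terms is shifting in the intended direction.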

Modifying the Structure of the Knowledge Base

The current structure of our knowledge base is the following:

  1. How to Use InfraNodus
    – Core Workflows
    – Discovering Information
    – Finding a Niche
    – Developing an Idea
    – Cognitive Reconfiguration
    – Presenting Ideas and Exporting Graphs
    – Importing External Data
    – Social Network Analysis
    – FAQ
  2. Tools and Methodologies
    – AI-Augmented Writing and Thinking
    – Ideation and Brainstorming
    – Personal Knowledge Management
    – SEO Keyword Research
  3. Network and Graph Concepts
    – Essential Network Concepts
    – Measures of Influence
    – Network Structure Measures
    – Structural Gaps
    – Graph Theory Applications
  4. Subscriptions and Payments
  5. Case Studies

If we visualize it as a graph with the InfraNodus > Add a New Text app, we see the following structure:

We can see right away that while the structure talks about SEO research, there is not much SEO content in the actual knowledge base (actionable insight: add more of it).

At the same time, the structure doesn’t reflect the Text Analysis and Structural Gap categories that are present in the content.

Actionable insights:

  1. talk more about SEO research in the knowledge base
  2. add categories on text analysis and structural gap detection
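The comparison between the category structure and the content topics can be approximated with a simple set difference. This is a crude sketch that only matches exact words (so "gap" and "gaps" count as different), whereas the analysis above was done visually in InfraNodus. The function names and the small stopword set are assumptions for illustration.

```python
import re

# Words too generic to count as topics (an illustrative assumption)
GENERIC = {"and", "of", "to", "the", "a", "an", "how", "use"}


def words(names):
    """Collect the distinct lowercase words used in a list of names."""
    found = set()
    for name in names:
        found.update(w for w in re.findall(r"[a-z]+", name.lower()) if w not in GENERIC)
    return found


def compare_structure_to_content(category_names, content_topics):
    """Show which words appear only in the categories or only in the content topics."""
    structure_terms = words(category_names)
    content_terms = words(content_topics)
    return {
        "only_in_structure": sorted(structure_terms - content_terms),
        "only_in_content": sorted(content_terms - structure_terms),
    }
```

For instance, passing the old category names (such as "SEO Keyword Research") as `category_names` and the topical groups found in the content (such as "Text Network Discourse") as `content_topics` lists the words that each side is missing.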

Therefore, the new category structure proposed at this point of analysis is:

  1. How to Use InfraNodus

    – Introduction: Starting to Use InfraNodus
    Add a New Text and Explore a Graph
    Getting an Overview of a Topic with the Google App
    Using InfraNodus x GPT-3 AI: a Basic Workflow Tutorial

    – Adding a Text and AI Ideation
    Using the Text Editor to Add and Edit Content (to check!)
    Live AI Ideation Workflow: Develop Ideas using GPT-3 and Text Visualization
    Explore a Topic using GPT-3 AI and Text Data Network Visualization
    Delete a Statement from a Graph
    How to Write a Text using OpenAI’s GPT-3 as a Conversational Partner

    – Text Analysis
    – Analyze an Existing Discourse: Network Science and Text Mining – use this Case Study: Text Mining and Topic Modeling but extend
    How to Analyze a Book with Visual Text Mining and Networks
    How to Make a Visual Summary of an Article
    Generate a Summary for a Book or an Article with GPT-3 AI and Text Network Analysis

    – Importing Files and External Data
    How to Import a CSV / Excel Spreadsheet Data
    Adding a Text File or a PDF Document
    How to Visualize Google Search Results
    Visualize the Tweets from a Twitter List
    Sentiment Analysis using Amazon Product Review Data
    Live Graph Updates: RSS, Twitter, Google Search
    Import Scientific Papers, Visualize the Scientific Discourse, Perform Literature Review
    How to Scrape Data from Any Web Page
    How to Scrape the Content of a URL / Website Page Behind a Paywall

    – Mind Mapping and Visual Ideation
    Generate a Mind Map from any Text
    How to Generate a Word Cloud with a Context
    Convert Your Mind Maps into Text to Gain a Different Perspective
    How to Develop Ideas with Networks
    Workflow: Network Thinking and Mindmapping for Ideation and Brainstorming
    Brainstorming and Writing using the Network Thinking Approach

    – Using the Graph Interface
    How to Read and Interpret Text Network Graphs
    How to Merge Nodes into Topics and Unlock Merged Nodes
    How to Search and Find the Content in Your Graphs
    How to Add the Nodes and Edges into the Graph Manually
    Delete the Nodes / Words and Add them to the Stopwords List
    Search and Find Relevant Parts of Text using a Network Graph
    How to see all the nodes’ labels and words on the graph?
    How to Rename a Node / Word in the Graph
    Dynamic Graph: Filtering a Certain Time Span

    – Text Categorization
    Automatic Text Categorization with InfraNodus
    How to Add Tags to your Graph Data and Statements
    Filter the Graph and Statements by Tags / Categories
    Text Classification and Taxonomy using Topic Categories

    – Language Settings
    How to Change Your Language Setting
    In Your Own Language: Lemmatization and Stopwords Removal
    How to Disable Stopwords Removal
    Combining Words and / or Nodes: Named Entities in a Graph
    How to Automatically Translate CSV File Data

    – Organizing Your Workspace
    Open or Find an Existing Graph
    How to Rename a Graph or a Context
    How to Add Your Graph into Favorites?
    Save and Retrieve Meta Information about Your Graphs
    Delete a Whole Text Graph
    Top Graphs Synthesis

    – Comparative Text Analysis
    How to Compare Text Statements with Different Tag Categories
    How to Compare Text Graphs to Find the Similarities and Differences

    – Saving Your Work
    How to Use Project Notes to Interpret Existing Content
    How to Save the Current Version of the Graph?
    How to Save Your Text Graph Analytics

    – Export the Mind Network
    Export your Graphs, Text Data and Analytics Results
    How to Export the Topical Clusters for Further Statistical Analysis or Machine Learning Models
    How to Share your Graph and Data: from Private to Public

    – Showcase Your Work
    🎥 How to Watch the Dynamic Evolution of a Graph?
    How to Change the Appearance and the Settings for Your Graphs
    Export High Resolution (SVG) Graph Images and Post-Production
    How to Embed Your Graphs to Other Sites

    – Troubleshooting
    My Graph Doesn’t Show Up Correctly (no nodes or strange symbols)
    Lemmatization Techniques and Word Endings Cut Offs for Spanish, Portuguese, Indonesian
    Problem: My CSV / MD / PDF / Text Files are Not Recognized
    Cannot Log In — How to Fix Login Issues

  2. Advanced Network Thinking
    – Cognitive Variability Framework
    Cognitive Variability: InfraNodus Thinking Dynamics Sensor
    Mind Viral Immunity
    Conversational Chatbot based on Text Network Analysis and GPT-3 AI

    – Discourse Structure Analysis
    Measure the Urgency of a Discourse Using Its Network Structure
    Case Study: Measure Diversity of a Discourse

    – Reveal Non-Obvious
    Revealing the Non-Obvious: Graph Exploration Workflow
    A Recommender System for Thinking and Insight Generation

    – Discovering Niches
    Finding Opportunity within the Attention-Knowledge Gap

    – Finding Relations Between Ideas
    What is in a Relation: How to Measure the Importance of Phrases and Bigrams in a Discourse

    – Discovering Gaps in Thinking
    Identifying Structural Gaps in a Discourse
    How to Find the Gaps in a Discourse to Generate New Ideas (with a little help from GPT-3:)

    – Network Art
    Making Music with Network Graphs: Auditory Feedback on Your Research Process via MIDI
  3. Network Science
    – Essential Concepts
    – Measures of Influence
    – Network Structure
    – Structural Gaps
    – Graph Theory Applications
  4. Subscriptions & Payments
    – Registration
    – Managing Subscriptions
    – Cancellations and Refunds
  5. Personal Knowledge Management

    – Personal Knowledge Management
    PKM Workflow: AI-generated Insights for Your Obsidian / LogSeq Knowledge Graph
    Visualize Your Evernote Notes to Get an Overview and Discover the New Idea
    How to Import and Visualize Your Roam Research, Obsidian and Zettelkasten Markdown Format Notes
    How does the [[backlink]] syntax work in InfraNodus and what’s its difference from LogSeq / Obsidian?
    How to Add [Wiki-Links] / [[Square Brackets]] / Tags / Categories into the Graph
    How Are Backlinks from Roam Research / Obsidian / Logseq Converted into a Network Graph
    Obsidian vs Roam Research vs LogSeq vs RemNote
    How to Export a Public Part of a Private Knowledge Graph in any PKM System

    – Creative Thinking
    🎥 Case Study: Creative Thinking using the Insight Recommender System
    Case Study: Generate Insight using Text Network Analysis

  6. Marketing, SEO, and Consulting

    – SEO
    How to Use SEO Keyword Research and GPT-3 AI to Generate an Outline for a Blog Article or Product Description
    SERP: Study the Context Around Any Search Query
    Google Keyword Suggestions: How to Know What People Search For
    SEO: Using Text Network Analysis to Find a Content Niche
    SEO: How to Analyze a Website’s Content and Extract the Top Keyword Phrases
    SEO: How to Compare Two Web Pages and Find What’s Missing
    Generate Top SEO Topics for Your Website from Google Search Console Keywords

    – Marketing
    🎥 How to Discover a New Market Niche
    Case Study: Sentiment Analysis of Customer Feedback
    Zendesk Knowledge Base SEO Optimization and Text Analysis

    – Business Development
    Crunchbase Data Visualization and Discourse Analysis

    – Social Network Analysis
    Import and Visualize the Social Network on Twitter for a Search Term or a Topic
    CrunchBase Analysis: Investors’ Social Networks

    – Media Study and News Analysis
    Case Study: Analyze the News Discourse with Graphs


As can be seen, the topical clusters on text mining, discourse (gap) analysis, AI ideation, and SEO are now much more prominent. These are the topics we wanted to highlight in the knowledge base. We will wait 2 weeks for Google to re-index the support portal and report the SEO results here.

Also, the structure of the knowledge base became more granular, so it is now easier to find the content relevant to each specific use case.

It will also make it easier to add new content, as we can now easily see which content is sufficiently covered (e.g. SEO) and which content is missing (e.g. AI Ideation and getting insight from PKM graphs).
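One simple way to check coverage is to count the articles listed under each subcategory of the outline. The sketch below assumes the outline follows the convention used in this article: numbered lines for top-level categories, lines starting with "–" for subcategories, and all other non-empty lines for article titles. The function name `articles_per_category` is hypothetical.

```python
import re


def articles_per_category(outline):
    """Count article titles listed under each "–" subcategory of a text outline."""
    counts = {}
    current = None
    for raw_line in outline.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if re.match(r"\d+\.\s", line):
            # A numbered top-level category closes the previous subcategory
            current = None
        elif line.startswith("–"):
            current = line.lstrip("– ").strip()
            counts[current] = 0
        elif current is not None:
            counts[current] += 1
    return counts
```

Subcategories with a count of zero or one are natural candidates for new articles.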
