Building an Interactive Graph Visualization

import ast
import os
import re

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("C:/datasources/WTKvectorsearch.csv", sep='|')
dfkey = pd.read_csv("C:/datasources/WTKrelatedmap.csv", sep='|')

# Rename columns
df.rename(columns={"url": "WTKlink", "Links": "ArticleSource"}, inplace=True)

# Format tags: "['a', 'b']" becomes "a|b"
df['tags'] = df['tags'].astype(str).str.strip('[]')
for old, new in [(',', '|'), ("'", ''), (' ', '')]:
    df['tags'] = df['tags'].str.replace(old, new, regex=False)

# Ensure Priority column is numeric
df['Priority'] = pd.to_numeric(df['Priority'], errors='coerce')

# Sort by Priority and reset index
df_sorted = df.sort_values(by='Priority', ascending=False).reset_index(drop=True)

# Create a list of top 20 ArticleId values
top = df_sorted['ArticleId'].head(20).tolist()

# Copy the slice so later edits do not touch df_sorted
df = df_sorted.head(20).copy()
df.insert(loc=1, column='Type', value="Primary", allow_duplicates=True)

# Select rows from dfkey where ArticleId is in the top list
dfkey = dfkey[dfkey['ArticleId'].isin(top)].copy()

# Convert Related column from string representation of list to actual list
# (ast.literal_eval only parses literals, so it is safer than eval)
dfkey['Related'] = dfkey['Related'].apply(ast.literal_eval)

# Expand the Related column into one row per (ArticleId, Related) pair
dfkey_expanded = dfkey.explode('Related').reset_index(drop=True)
dfkey_expanded = dfkey_expanded.drop_duplicates()

# Get all unique values from ArticleId and Related columns
unique_values = pd.concat([dfkey_expanded['ArticleId'], dfkey_expanded['Related']]).unique()

# Convert to list
unique_values_list = unique_values.tolist()
dftemp = df_sorted[df_sorted['ArticleId'].isin(unique_values_list)].copy()
dftemp.insert(loc=1, column='Type', value="Related", allow_duplicates=True)

# Merge df and dftemp, prioritizing df values
merged_df = pd.concat([df, dftemp], ignore_index=True)

# Sort by 'Type' to prioritize 'Primary' over 'Related'
merged_df = merged_df.sort_values(by='Type', ascending=True)

# Drop duplicates, keeping the first occurrence (Primary)
merged_df = merged_df.drop_duplicates(subset='ArticleId', keep='first').reset_index(drop=True)

# Rename columns for Kumu
dfkey_expanded.rename(columns={"ArticleId": "From", "Related": "To"}, inplace=True)
merged_df.rename(columns={"Priority": "Weight"}, inplace=True)

# Select relevant columns (copy so the assignments below are safe)
final = merged_df[['ArticleId', 'Type', 'Title', 'tags', 'PublicationDate', 'Publication',
                   'Summary', 'ArticleSource', 'wtkURL', 'Weight']].copy()

# Subtract 699 from each value in the 'Weight' column
final['Weight'] = final['Weight'] - 699

# Use qcut to scale data into simplified integer values (0-11 range)
final['Weight'] = pd.qcut(final['Weight'], 12, labels=False, duplicates='drop')
final = final[final['Weight'].notna()].copy()
final['Weight'] = final['Weight'].astype(int)

# Plot to confirm binning success
# final['Weight'].value_counts(sort=True).plot.bar()

# Strip HTML from summaries for display
final['Summary'] = final['Summary'].str.replace(r'<[^<>]*>', '', regex=True)

# Rename the ID column for Kumu
final.rename(columns={"ArticleId": "Label"}, inplace=True)

# Write both sheets; the context manager saves the file on exit
# (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter("C:\\datasources\\topstoriesmap.xlsx") as writer:
    final.to_excel(writer, sheet_name='Sheet1', index=False)
    dfkey_expanded.to_excel(writer, sheet_name='Sheet2', index=False)
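
The two least obvious steps in the script above are turning the Related lists into one row per connection and squeezing Priority into a handful of integer weights. Here is a minimal sketch of both on made-up data. The article IDs, priority values, and the choice of 3 bins are assumptions for illustration, not values from my real CSV files.

```python
import ast

import pandas as pd

# Toy stand-in for the related-articles map (hypothetical IDs).
# 'Related' arrives as a string that looks like a Python list.
related = pd.DataFrame({
    "ArticleId": [101, 102],
    "Related": ["[102, 103]", "[101]"],
})

# Parse each string into a real list, then emit one row per (From, To) pair.
related["Related"] = related["Related"].apply(ast.literal_eval)
edges = (
    related.explode("Related")
    .rename(columns={"ArticleId": "From", "Related": "To"})
    .drop_duplicates()
    .reset_index(drop=True)
)

# Toy priorities (hypothetical). qcut assigns each value to a quantile bin,
# and labels=False returns the bin number instead of an interval.
priority = pd.Series([700, 710, 725, 760, 800, 950])
weight = pd.qcut(priority, 3, labels=False, duplicates="drop")

print(edges)
print(weight.tolist())
```

Because qcut bins by rank rather than by raw value, one article with a huge priority score ends up in the top bin without stretching the scale for everything else.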

Reflections

When I woke up today, I was determined to add this feature to my app. Initially I was considering computing the graph directly in the page using D3. Maybe I'll end up doing that eventually. I'm not thrilled about embedded iframes, but nearly everything else about Kumu is great and relatively easy, so I went with that.

It was a bit of a heavy lift to get all of this done in one long day. But the result of that labor is absolutely cool. I've spent a long time imagining a webpage like the one I've created. Now that it's made, I'm feeling like I did something personally significant, even if not that many people end up using it.
