The above image was made with Stable Diffusion using the prompt 'A colorful network graph with a faded urban landscape in the background.'
My new conspiracy news search app got a major upgrade today. The page now features an embedded interactive graph visualization. Each of WantToKnow.info's 20 highest-rated stories is shown along with the 10 articles from the archive most closely related to it. I'm satisfied with how it turned out.
To make this graph, I started with two CSV files containing the data I needed. I combined these, did a bunch of pandas work, and created a two-sheet Excel file describing the graph: one sheet for the articles and one for the links between them. I then uploaded this Excel file to Kumu, where I was able to use a variation of CSS to style the graph's display. Here's the Python script I wrote to build that file:
import ast
import matplotlib.pyplot as plt  # only needed for the optional bar plot further down
import pandas as pd
# Load the article data and the related-articles map (both pipe-delimited)
df = pd.read_csv("C:/datasources/WTKvectorsearch.csv", sep='|')
dfkey = pd.read_csv("C:/datasources/WTKrelatedmap.csv", sep='|')
# Rename columns
df.rename(columns={"url": "WTKlink", "Links": "ArticleSource"}, inplace=True)
# Format tags: "['a', 'b']" becomes "a|b"
df['tags'] = df['tags'].astype(str).str.strip('[]')
df['tags'] = df['tags'].str.replace(',', '|', regex=False)
df['tags'] = df['tags'].str.replace('\'', '', regex=False)
df['tags'] = df['tags'].str.replace(' ', '', regex=False)
# Ensure Priority column is numeric
df['Priority'] = pd.to_numeric(df['Priority'], errors='coerce')
# Sort by Priority and reset index
df_sorted = df.sort_values(by='Priority', ascending=False).reset_index(drop=True)
# Create a list of top 20 ArticleId values
top = df_sorted['ArticleId'].head(20).tolist()
# Copy the top 20 rows so that adding the Type column does not touch df_sorted
df = df_sorted.head(20).copy()
df.insert(loc=1, column='Type', value="Primary", allow_duplicates=True)
# Select rows from dfkey where ArticleId is in the top list
dfkey = dfkey[dfkey['ArticleId'].isin(top)].copy()
# Convert Related column from string representation of list to actual list
# (ast.literal_eval only parses literals, so it is safer than eval on file contents)
dfkey['Related'] = dfkey['Related'].apply(ast.literal_eval)
# Expand the Related column
dfkey_expanded = dfkey.explode('Related').reset_index(drop=True)
dfkey_expanded = dfkey_expanded.drop_duplicates()
# Get all unique values from ArticleId and Related columns
unique_values = pd.concat([dfkey_expanded['ArticleId'], dfkey_expanded['Related']]).unique()
# Convert to list
unique_values_list = unique_values.tolist()
dftemp = df_sorted[df_sorted['ArticleId'].isin(unique_values_list)]
dftemp.insert(loc=1, column='Type', value="Related", allow_duplicates=True)
# Merge df and dftemp, prioritizing df values
merged_df = pd.concat([df, dftemp], ignore_index=True)
# Sort by 'Type' to prioritize 'Primary' over 'Related'
merged_df = merged_df.sort_values(by='Type', ascending=True)
# Drop duplicates, keeping the first occurrence (Primary)
merged_df = merged_df.drop_duplicates(subset='ArticleId', keep='first').reset_index(drop=True)
# Rename columns
dfkey_expanded.rename(columns={"ArticleId": "From"}, inplace=True)
dfkey_expanded.rename(columns={"Related": "To"}, inplace=True)
merged_df.rename(columns={"Priority": "Weight"}, inplace=True)
# Select relevant columns
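# Note: 'wtkURL' must already exist in the source CSV; the 'url' column was renamed to 'WTKlink' above
# Copy so the Weight and Summary assignments below modify an independent frame, not a view of merged_df
final = merged_df[['ArticleId', 'Type', 'Title', 'tags', 'PublicationDate', 'Publication', 'Summary', 'ArticleSource', 'wtkURL', 'Weight']].copy()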
# Subtract 699 from each value in the 'Weight' column
final['Weight'] = final['Weight'] - 699
# Use qcut to scale data into simplified integer values (0-11 range)
final['Weight'] = pd.qcut(final['Weight'], 12, labels=False, duplicates='drop')
final = final[final['Weight'].notna()]
final['Weight'] = final['Weight'].astype(int)
# Plot to confirm binning success
#final['Weight'].value_counts(sort=True).plot.bar()
# Strip HTML tags from summaries for display
final['Summary'] = final['Summary'].str.replace(r'<[^<>]*>', '', regex=True)
# Rename the ID column for Kumu (dfkey_expanded already has its From and To columns)
final = final.rename(columns={"ArticleId": "Label"})
# Sheet1 holds the articles, Sheet2 the links between them; the context manager saves the file on exit
with pd.ExcelWriter("C:\\datasources\\topstoriesmap.xlsx") as writer:
    final.to_excel(writer, sheet_name='Sheet1', index=False)
    dfkey_expanded.to_excel(writer, sheet_name='Sheet2', index=False)
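To see what the main steps in that script do without the real CSV files, here is a minimal sketch on made-up data. The four article rows, their IDs, the `articles`, `related`, and `edges` names, and the two-bin split are all hypothetical. It only mirrors the tag cleanup, the expansion of related-article lists into one link per row, the quantile binning, and the HTML stripping, so treat it as an illustration rather than the exact pipeline.

```python
import ast
import pandas as pd

# Hypothetical stand-ins for the two CSV files (all values are made up)
articles = pd.DataFrame({
    "ArticleId": [1, 2, 3, 4],
    "tags": ["['Media', 'Health']", "['Energy']", "['Media']", "['Health', 'Energy']"],
    "Priority": [900, 850, 700, 720],
    "Summary": ["<p>Hi <b>there</b></p>", "Plain text", "<i>Note</i>", "<br>End"],
})
related = pd.DataFrame({
    "ArticleId": [1, 2],
    "Related": ["[3, 4]", "[4]"],  # lists stored as strings, as in the CSV
})

# Tag cleanup: "['Media', 'Health']" -> "Media|Health"
articles["tags"] = (
    articles["tags"]
    .str.strip("[]")
    .str.replace("'", "", regex=False)
    .str.replace(" ", "", regex=False)
    .str.replace(",", "|", regex=False)
)

# Parse each list string, then explode so every related article gets its own row
related["Related"] = related["Related"].apply(ast.literal_eval)
edges = (
    related.explode("Related")
    .rename(columns={"ArticleId": "From", "Related": "To"})
    .drop_duplicates()
    .reset_index(drop=True)
)

# Quantile binning: 2 bins here instead of 12 because there are only 4 rows
articles["Weight"] = pd.qcut(articles["Priority"], 2, labels=False, duplicates="drop")

# Strip HTML tags from the summaries with the same regex as the script
articles["Summary"] = articles["Summary"].str.replace(r"<[^<>]*>", "", regex=True)

print(articles)
print(edges)
```

One thing this makes visible: `qcut` bins by rank, not by absolute value, so subtracting a constant from every score before binning leaves the bins unchanged.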
When I woke up today, I was determined to add this feature to my app. Initially I was considering computing the graph directly in the page using D3. Maybe I'll end up doing that eventually. I'm not thrilled about embedded iframes, but nearly everything else about Kumu is great and relatively easy, so I went with that.
It was a bit of a heavy lift to get all of this done in one long day. But the result of that labor is absolutely cool. I've spent a long time imagining a webpage like the one I've created. Now that it's made, I'm feeling like I did something personally significant, even if not that many people end up using it.
Read Free Mind Gazette on Substack