Interactive 3D clustering of 1,001 British Columbian voices on artificial intelligence
Welcome to the BC AI Survey Data Storytelling Project!
This interactive dashboard presents semantic visualizations of open-ended responses from 1,001 British Columbians about artificial intelligence. Each 3D scatter plot represents survey responses as points in semantic space, where similar responses cluster together naturally.
How to explore: Click on any visualization below to dive into the 3D interactive plots. Rotate, zoom, and hover over points to discover the authentic voices and perspectives of BC residents.
New in this version: Every question page now also includes an AI-generated roundtable dialogue between cluster "representatives" (synthetic voices derived from thematic groupings of responses), followed by an automatically generated synthesis summary.
3D t-SNE landscapes of response embeddings let you visually browse the conceptual terrain of what people said.
Open any question below to explore its 3D semantic map.
A generated conversation between cluster personas lets you hear contrasting themes as if the ideas were debating each other.
Scroll below the 3D plot on a question page to access its dialogue module.
The project blends unsupervised semantic mapping with generative narrative synthesis. Below is the end-to-end flow from raw free text to interactive 3D maps and voiced roundtable dialogues; illustrative code sketches for the main stages follow the list.
1. Ingestion & Cleaning: Open-ended survey answers are de-duplicated by exact text (frequency retained), and basic filters remove empty/placeholder values.
2. Semantic Embeddings: Each unique response is embedded into a 3,072-dimensional vector (OpenAI text-embedding-3-large) capturing contextual meaning.
3. Dimensionality Reduction: 3D t-SNE (adaptive perplexity) projects embeddings while preserving local semantic neighborhoods for visual exploration.
4. Automatic Clustering: K-Means is run over the 3D projection for k ∈ [2, 30]; the k with the highest silhouette score is selected to represent emergent themes (no manual tuning).
5. Frequency & Sizing: Identical responses are aggregated; marker size scales by cube-root(frequency) so common sentiments stand out without overwhelming rare but distinct ideas.
6. Color Mapping: A reproducible discrete palette assigns stable colors per cluster; the color legend carries through to both the visualization and the dialogue UI.
7. Cluster "Vibe" Extraction: For each cluster, a one-sentence tonal/style descriptor is LLM-generated once and cached (e.g., "pragmatic, solutions-focused empathy").
8. Dialogue Transcript Generation: An LLM simulates a turn-based exchange in which each cluster speaks in its persona, referencing representative response content. A final Synthesis line summarizes points of convergence and tension.
9. Text-to-Speech (Per Line): Each dialogue line is rendered to audio via distinct vetted voices (e.g., alloy, echo, fable, onyx…), with the vibe passed as system instructions (not spoken aloud).
10. Playback Assembly: A manifest (audio_manifest.json) indexes lines, voices, cluster labels, and file paths. The front-end module streams sequential audio, highlights the active line, and supports keyboard control, speed adjustment, preloading, and auto-scroll.
11. Interactive Exploration: Users rotate and zoom the 3D semantic maps, inspect original wording via hover, then scroll to the roundtable to "hear" thematic perspectives converse, linking quantitative structure with qualitative nuance.
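To make the pipeline concrete, the sketches below show how each stage might be implemented in Python. They are illustrative only: column names, prompts, model parameters, and file layouts are assumptions, not the project's published code. First, step 1's exact-text de-duplication with frequency retained, here using pandas (the "response" column and placeholder list are hypothetical):

```python
import pandas as pd

PLACEHOLDERS = {"", "n/a", "na", "none", "no comment"}  # assumed filter list

def clean_and_dedupe(df: pd.DataFrame) -> pd.DataFrame:
    """Drop empty/placeholder answers, then collapse exact duplicates,
    keeping a frequency count per unique response."""
    text = df["response"].fillna("").str.strip()
    mask = ~text.str.lower().isin(PLACEHOLDERS)
    return (
        text[mask]
        .value_counts()               # exact-text aggregation
        .rename_axis("response")
        .reset_index(name="frequency")
    )
```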
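Step 2's embedding call, assuming the official openai Python client (v1+); the batch size is an illustrative choice:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_responses(texts: list[str], batch_size: int = 256) -> list[list[float]]:
    """Embed each unique response into a 3,072-dimensional vector."""
    vectors: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        resp = client.embeddings.create(
            model="text-embedding-3-large",
            input=texts[i : i + batch_size],
        )
        vectors.extend(d.embedding for d in resp.data)
    return vectors
```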
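Step 3's projection with scikit-learn. "Adaptive perplexity" is interpreted here as scaling perplexity with the number of unique responses; the project's exact rule may differ:

```python
import numpy as np
from sklearn.manifold import TSNE

def project_3d(embeddings: np.ndarray, random_state: int = 42) -> np.ndarray:
    n = embeddings.shape[0]
    # t-SNE requires perplexity < n_samples; clamp into a sensible range.
    perplexity = min(30.0, max(5.0, n / 50), (n - 1) / 3)
    tsne = TSNE(
        n_components=3,
        perplexity=perplexity,
        init="pca",
        random_state=random_state,
    )
    return tsne.fit_transform(embeddings)  # shape (n, 3)
```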
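Step 4's automatic cluster-count selection, sweeping k over [2, 30] and keeping the highest silhouette score:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def best_kmeans(points_3d: np.ndarray, k_min: int = 2, k_max: int = 30):
    """Return (fitted KMeans, silhouette) for the best k; no manual tuning."""
    best_model, best_score = None, -1.0
    for k in range(k_min, min(k_max, len(points_3d) - 1) + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(points_3d)
        score = silhouette_score(points_3d, km.labels_)
        if score > best_score:
            best_model, best_score = km, score
    return best_model, best_score
```

Note that, per step 4, clustering runs over the 3D projection rather than the original 3,072-dimensional embeddings, so the chosen themes match what viewers actually see in the plot.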
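Steps 5, 6, and 11 combined: cube-root marker sizing, a stable discrete palette, and a hoverable Plotly 3D scatter. The DataFrame columns are hypothetical:

```python
import numpy as np
import pandas as pd
import plotly.express as px

def make_figure(df: pd.DataFrame):
    """df columns assumed: x, y, z, response, frequency, cluster."""
    df = df.copy()
    df["cluster"] = df["cluster"].astype(str)  # discrete, not continuous, colors
    df["size"] = np.cbrt(df["frequency"])      # cube-root scaling (step 5)
    return px.scatter_3d(
        df,
        x="x", y="y", z="z",
        color="cluster",
        size="size",
        color_discrete_sequence=px.colors.qualitative.Safe,  # stable palette (step 6)
        hover_data={"response": True, "frequency": True,
                    "x": False, "y": False, "z": False, "size": False},
    )

# e.g., make_figure(df).write_html("question_01.html")  # step 11 artifact
```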
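Steps 7 and 8 both reduce to prompted LLM calls; here is a sketch of the vibe extraction (the dialogue transcript follows the same pattern with a longer, multi-persona prompt). The model name and prompt wording are assumptions:

```python
from openai import OpenAI

client = OpenAI()

def cluster_vibe(examples: list[str]) -> str:
    """One-sentence tonal/style descriptor for a cluster (step 7)."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[
            {"role": "system",
             "content": "In one sentence, describe the tone and style of these "
                        "survey responses (e.g., 'pragmatic, solutions-focused "
                        "empathy')."},
            {"role": "user", "content": "\n".join(examples[:10])},
        ],
    )
    return resp.choices[0].message.content.strip()
```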
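Steps 9 and 10, assuming a speech model that accepts a per-request instructions field (e.g., OpenAI's gpt-4o-mini-tts); the manifest schema shown is illustrative rather than the project's exact audio_manifest.json format:

```python
import json
from pathlib import Path
from openai import OpenAI

client = OpenAI()
VOICES = ["alloy", "echo", "fable", "onyx"]  # one vetted voice per cluster

def render_dialogue(lines, vibes, out_dir="audio"):
    """lines: [(cluster_id, text), ...]; vibes: {cluster_id: vibe sentence}."""
    Path(out_dir).mkdir(exist_ok=True)
    manifest = []
    for i, (cluster, text) in enumerate(lines):
        voice = VOICES[cluster % len(VOICES)]
        path = f"{out_dir}/line_{i:03d}.mp3"
        speech = client.audio.speech.create(
            model="gpt-4o-mini-tts",      # assumed: a model accepting instructions
            voice=voice,
            input=text,                   # the words that are spoken
            instructions=vibes[cluster],  # tonal guidance, not spoken aloud
        )
        speech.write_to_file(path)
        manifest.append({"line": i, "cluster": cluster,
                         "voice": voice, "path": path})
    Path("audio_manifest.json").write_text(json.dumps(manifest, indent=2))
```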
Reproducibility & Efficiency: Cluster vibes and transcripts are cached to avoid repeated LLM cost; the visualization export and dialogue assets share a consolidated clustered_responses.json for downstream reuse.
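A minimal sketch of that caching pattern, keying each LLM output on a hash of its inputs so reruns skip the API call (the file layout and hashing scheme are assumptions):

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".cache")

def cached_llm_call(key_material: str, generate):
    """Return a cached LLM result, calling `generate()` only on a miss."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(key_material.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = generate()  # e.g., a chat.completions call for a vibe or transcript
    path.write_text(json.dumps(result))
    return result
```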