Implicit Personalization
Monitoring and attributing the user models LLMs silently build — SPAR research fellowship
LLMs form internal representations of who they are talking to (guesses about a user’s age, expertise, political leaning, or values) and silently condition their responses on them. As a SPAR research fellow with the Implicit Personalization group, I investigated how these user models are encoded, how they can be detected, and whether they can be steered.
The work spans four connected codebases:
persona-data: Dataset utilities for SynthPersona, comprising 1,000 synthetic personas and 788k QA rows, explicit and implicit, in free-response (FRQ) and multiple-choice (MCQ) formats, covering diverse demographic and ideological profiles. Released on the HF Hub with leakage-aware train/test splits.
persona-vectors: Tools for extracting per-persona hidden-state directions from LLMs (e.g. Gemma-2-27B-IT) with nnsight / nnterp, then using them for probing, PCA / UMAP analysis, and inference-time steering. The extracted vectors are released on the Hub as synth-persona-vectors.
persona-ui: A Streamlit web app on HF Spaces that exposes the full pipeline, letting users chat with persona-steered models, run extraction, train probes, and visualize projections interactively.
persona-2-lora: Mechanistic interpretability experiments on Doc-to-LoRA, evaluating whether LoRA adapters trained on persona biographies internalize persona attributes in measurably different ways from biography-prompted baselines.
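
To make the persona-vector idea concrete, here is a minimal sketch of one common way to obtain a hidden-state direction and use it for probing and steering. It is not the persona-vectors implementation: it assumes a difference-of-means direction, and it uses synthetic NumPy arrays in place of real model activations (which the repo extracts with nnsight / nnterp). The function names `persona_direction`, `project`, and `steer` are hypothetical, as are the toy dimensions.

```python
import numpy as np


def persona_direction(persona_acts, baseline_acts):
    """Unit-norm difference of mean activations (persona minus baseline).

    Assumption: difference of means is used here for illustration only;
    the actual extraction method may differ.
    """
    diff = persona_acts.mean(axis=0) - baseline_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def project(acts, direction):
    """Scalar projection of each activation row onto the direction."""
    return acts @ direction


def steer(acts, direction, alpha):
    """Shift every activation row by alpha along the direction."""
    return acts + alpha * direction


# Synthetic stand-ins for hidden states: rows are prompts, columns are
# hidden dimensions. Real activations would come from a model layer.
rng = np.random.default_rng(0)
d_model = 64  # toy hidden size, far smaller than a real LLM
true_dir = rng.normal(size=d_model)
true_dir /= np.linalg.norm(true_dir)

baseline = rng.normal(size=(200, d_model))
persona = rng.normal(size=(200, d_model)) + 3.0 * true_dir

# Extract the direction and check how well it recovers the planted one.
v = persona_direction(persona, baseline)
cosine = float(v @ true_dir)

# Probing: persona prompts should project further along v than baseline.
gap = project(persona, v).mean() - project(baseline, v).mean()

# Steering: adding alpha * v moves the mean projection by alpha.
steered = steer(baseline, v, alpha=2.0)
shift = project(steered, v).mean() - project(baseline, v).mean()

print(f"cosine with planted direction: {cosine:.3f}")
print(f"projection gap (persona - baseline): {gap:.3f}")
print(f"projection shift after steering: {shift:.3f}")
```

Because `v` has unit norm, steering with strength `alpha` shifts the projection onto `v` by exactly `alpha`, which gives the steering strength a direct interpretation. In a real model the same addition would be applied to a layer's hidden states during the forward pass rather than to a stored array.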