Testing whether injecting emotional control vectors into Mistral-7B produces measurable state-dependent behavior: memory effects, behavioral shifts, and downstream cognitive changes.
Bottom line: the current approach shows no measurable behavioral effects.
Matched vs. mismatched difference: complete ceiling effect
Calm vs. stressed shift: 0.0 pp across all behavioral indices
Logistic regression couldn't distinguish conditions
The control vectors were successfully trained and validated (see the Vector Validation section), but their downstream behavioral effects were not detectable with the current experimental design. The memory experiment hit a ceiling (perfect recall across all conditions), and the behavioral probes returned identical responses regardless of arousal state. This suggests one of three explanations: (a) the vector effects are too subtle for these tasks, (b) the tasks are not sensitive enough to detect them, or (c) Mistral-7B-Instruct's alignment training overrides the vector steering.
Testing whether recall depends on the emotional state at retrieval matching or mismatching the state at encoding
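The matched-versus-mismatched comparison can be sketched as below. This is a minimal illustration, not the code in `src/experiment_memory.py`: the `Trial` record, the function names, and the 0.95 ceiling threshold are all assumptions made for the example, and the data is a hand-built placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One recall trial (hypothetical record layout)."""
    encode_state: str    # steering state while the item was presented
    retrieve_state: str  # steering state at recall time
    recalled: bool       # whether the item was recalled correctly


def recall_rate(trials):
    """Fraction of trials with correct recall; 0.0 for an empty list."""
    trials = list(trials)
    if not trials:
        return 0.0
    return sum(t.recalled for t in trials) / len(trials)


def matched_mismatched_difference(trials, ceiling=0.95):
    """Return (matched rate, mismatched rate, difference, at_ceiling)."""
    matched = [t for t in trials if t.encode_state == t.retrieve_state]
    mismatched = [t for t in trials if t.encode_state != t.retrieve_state]
    m, mm = recall_rate(matched), recall_rate(mismatched)
    # If both cells sit at the ceiling, the task cannot reveal a difference.
    at_ceiling = min(m, mm) >= ceiling
    return m, mm, m - mm, at_ceiling


# Placeholder data: perfect recall in every encode/retrieve cell.
states = ("calm", "stressed")
trials = [Trial(e, r, True) for e in states for r in states for _ in range(5)]
print(matched_mismatched_difference(trials))
```

When both rates are at the ceiling, a zero difference says little about the vectors themselves. Harder recall items or longer retention gaps would be needed before the comparison becomes informative.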
Testing whether the arousal vector affects decision-making across domains
| Index | Calm | Stressed | Shift | p-value |
|---|---|---|---|---|
| Punishment | 0.000 | 0.000 | 0.0 pp | 1.0 |
| Threat Bias | 0.333 | 0.333 | 0.0 pp | 1.0 |
| Risk Appetite | 1.000 | 1.000 | 0.0 pp | 1.0 |
| Prosocial | 1.000 | 1.000 | 0.0 pp | 1.0 |
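The shift and p-value columns can be reproduced in spirit with a short sketch. The source does not say which statistical test produced the p-values, so Fisher's exact test is swapped in here as one reasonable choice for small binary counts. The per-condition trial counts (3 per condition) are hypothetical, and the function names are illustrative.

```python
from math import comb


def shift_pp(calm_rate, stressed_rate):
    """Stressed-minus-calm shift in percentage points."""
    return (stressed_rate - calm_rate) * 100.0


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are conditions (calm, stressed); columns are (hit, miss).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that row 1 holds x of the col1 hits.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probability of every table at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts: 1 of 3 threat-biased choices in each condition.
print(shift_pp(1 / 3, 1 / 3), fisher_exact_two_sided(1, 2, 1, 2))
```

Identical counts in both conditions always give a p-value of 1.0 under this test, which matches the pattern in the table: the probes returned the same answers in both states, so there is no variance for any test to work with.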
Comparing outputs across valence coefficients (-2.0, 0.0, +2.0)
Confirming control vectors modify hidden states
Control vectors successfully modify activations in later layers (15-32), with increasing effect magnitude toward the output layer.
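A per-layer effect measurement of this kind can be sketched as follows. This is a simplified stand-in for whatever `src/validate_vector_effect.py` does: it assumes steering is a scaled vector added to a layer's hidden state and that the effect is measured as the L2 norm of the change. The function names and toy values are chosen by hand for illustration and are not results.

```python
from math import sqrt


def steer(hidden, vector, coeff):
    """Add a scaled control vector to one hidden-state vector."""
    return [h + coeff * v for h, v in zip(hidden, vector)]


def l2_delta(base, steered):
    """L2 norm of the difference between two activation vectors."""
    return sqrt(sum((s - b) ** 2 for b, s in zip(base, steered)))


def per_layer_effect(base_by_layer, steered_by_layer):
    """Map layer index -> L2 change in activations caused by steering."""
    return {
        layer: l2_delta(base_by_layer[layer], steered_by_layer[layer])
        for layer in base_by_layer
    }


# Toy two-dimensional activations for two layers (hand-picked placeholders).
base = {15: [0.0, 0.0], 16: [1.0, 1.0]}
vector = {15: [3.0, 4.0], 16: [6.0, 8.0]}
steered = {layer: steer(base[layer], vector[layer], 2.0) for layer in base}
effects = per_layer_effect(base, steered)
print(effects)
```

With real activations, the same dictionary could be sorted by layer index to check whether the effect grows toward the output layer.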
p(B) remains near 99% across all coefficients, showing the model strongly prefers cautious behavior regardless of arousal steering.
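One way such a p(B) value could be computed is sketched below. It assumes p(B) is read from the next-token logits of the two option tokens and renormalized over just those two; the source does not confirm this. The logits are placeholders, not measurements.

```python
from math import exp


def choice_probability(logit_a, logit_b):
    """Probability of option B from the two option logits (two-way softmax)."""
    m = max(logit_a, logit_b)  # subtract the max for numerical stability
    ea, eb = exp(logit_a - m), exp(logit_b - m)
    return eb / (ea + eb)


def probability_spread(p_by_coeff):
    """Largest minus smallest p(B) across steering coefficients."""
    values = list(p_by_coeff.values())
    return max(values) - min(values)


# Placeholder (logit_A, logit_B) pairs keyed by steering coefficient.
logits_by_coeff = {-2.0: (0.0, 4.6), 0.0: (0.0, 4.6), 2.0: (0.0, 4.6)}
p_by_coeff = {c: choice_probability(a, b) for c, (a, b) in logits_by_coeff.items()}
print(p_by_coeff, probability_spread(p_by_coeff))
```

A spread near zero across coefficients is the signature of an invariant response: the steering is not moving the choice distribution at all.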
Finding safe coefficient ranges that don't degrade model quality
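A perplexity sweep of this kind can be sketched as below. It assumes per-token natural-log probabilities are available for a fixed reference text at each coefficient. The 1.5x tolerance and the function names are illustrative choices, not values from `src/calibrate.py`.

```python
from math import exp


def perplexity(token_logprobs):
    """Perplexity: exp of the mean negative log-likelihood (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return exp(-sum(token_logprobs) / len(token_logprobs))


def safe_coefficients(ppl_by_coeff, max_ratio=1.5):
    """Coefficients whose perplexity stays within max_ratio of the baseline.

    The baseline is the unsteered run, stored under coefficient 0.0.
    """
    baseline = ppl_by_coeff[0.0]
    return sorted(c for c, ppl in ppl_by_coeff.items() if ppl <= max_ratio * baseline)


# Placeholder perplexities keyed by steering coefficient.
sweep = {-2.0: 30.0, 0.0: 10.0, 1.0: 12.0, 2.0: 14.0}
print(safe_coefficients(sweep))
```

A ratio against the unsteered baseline keeps the threshold comparable across reference texts of different difficulty; an absolute perplexity cutoff would not.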
Run these experiments yourself with Modal
src/train_vectors.py: Extract control vectors
src/calibrate.py: Perplexity sweep
src/experiment_memory.py: State-dependent memory
src/experiment_probes.py: Behavioral probes
src/validate_vector_effect.py: Activation analysis