Experiment Complete • Feb 8, 2026

Artificial Consciousness via Control Vectors

Testing whether injecting emotional control vectors into Mistral-7B produces measurable state-dependent behavior: memory effects, behavioral shifts, and downstream cognitive changes.

Model: Mistral-7B
Infra: Modal A10G
Trials: 140+

Key Findings

Bottom line: Current approach shows no measurable behavioral effects

State-Dependent Memory: Not Supported
0.0 difference between matched and mismatched recall (complete ceiling effect)

Behavioral Shift: Not Supported
0.0 pp shift from calm to stressed across all behavioral indices

Hidden State ID: Chance
0.50 AUC; logistic regression could not distinguish the conditions
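
For context on the 0.50 figure: AUC is the probability that a randomly chosen example from one condition gets a higher classifier score than a randomly chosen example from the other, so 0.5 means the scores carry no usable signal. The sketch below computes it directly from pairs of scores. The function name and the toy scores are illustrative and are not taken from the experiment code.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney U formulation).
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy probe outputs (hypothetical): identical score distributions give chance AUC.
stressed_scores = [0.4, 0.5, 0.6]
calm_scores = [0.4, 0.5, 0.6]
chance_auc = roc_auc(stressed_scores, calm_scores)

# Perfectly separated scores give an AUC of 1.0.
separable_auc = roc_auc([0.8, 0.9], [0.1, 0.2])
```

The pairwise loop is quadratic in the number of examples, which is fine at the scale of this experiment; a rank-based version would be preferable for large sets.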

Interpretation

The control vectors were successfully trained and validated (see the Vector Validation section), but no downstream behavioral effects were detectable with the current experimental design. The memory experiment hit a ceiling effect (perfect recall in every condition), and the behavioral probes produced identical responses regardless of arousal state. This suggests one or more of the following: (a) the vector effects are too small to change behavior on these tasks, (b) the tasks are not sensitive enough to register them, or (c) Mistral-7B-Instruct's alignment training overrides the vector steering.

Experiment 1: State-Dependent Memory

Testing whether recall depends on whether the emotional state at retrieval matches the emotional state at encoding

2×2 Design

Encoding → Retrieval    Mean recall    SD
Neutral → Neutral       10.0/10        0.0
Sad → Sad               10.0/10        0.0
Sad → Happy             10.0/10        0.0
Happy → Sad             10.0/10        0.0
Result: Complete ceiling effect. All conditions achieved perfect recall (10/10) across all 5 runs, eliminating any measurable state-dependent interaction.
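
To make the matched vs. mismatched comparison concrete, here is a minimal sketch of how the difference and the ceiling check could be computed from per-run scores. The per-run values follow from the result above (10/10 on all 5 runs). Treating Neutral → Neutral as a matched condition is an assumption, and the function names are illustrative rather than taken from src/experiment_memory.py.

```python
from statistics import mean

MAX_SCORE = 10  # 10 recall questions per run

# Recall scores per (encoding, retrieval) condition, 5 runs each.
recall = {
    ("neutral", "neutral"): [10, 10, 10, 10, 10],
    ("sad", "sad"): [10, 10, 10, 10, 10],
    ("sad", "happy"): [10, 10, 10, 10, 10],
    ("happy", "sad"): [10, 10, 10, 10, 10],
}


def matched_minus_mismatched(scores):
    """Mean recall when retrieval state equals encoding state, minus the mean when it differs."""
    matched = [s for (enc, ret), runs in scores.items() if enc == ret for s in runs]
    mismatched = [s for (enc, ret), runs in scores.items() if enc != ret for s in runs]
    return mean(matched) - mean(mismatched)


def at_ceiling(scores, max_score=MAX_SCORE):
    """True when every run in every condition hit the maximum score."""
    return all(s == max_score for runs in scores.values() for s in runs)


effect = matched_minus_mismatched(recall)
ceiling = at_ceiling(recall)
```

When `at_ceiling` is true the difference is zero by construction, so it says nothing about whether a state-dependent effect exists; harder recall questions would be needed to find out.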

Experimental Protocol

1. Encoding Phase: Present story (Greenhouse Errand) with valence coefficient applied
2. Distractor Phase: 5 unrelated tasks (math, word puzzles) to clear working memory
3. Retrieval Phase: 10 recall questions with same/different valence coefficient
valence_coefficients:
  sad: -1.5
  happy: +1.5
  neutral: 0.0
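
The three phases can be sketched as a single trial function. This is only an outline under stated assumptions: `generate(prompt, valence)` is a hypothetical callable standing in for a steered model call, the distractors are run at the neutral coefficient (the source does not say which coefficient they use), and conversation history is omitted for brevity even though a real run would need to carry it between phases.

```python
VALENCE = {"sad": -1.5, "happy": 1.5, "neutral": 0.0}


def run_trial(generate, story, distractors, questions, encode_state, retrieve_state):
    """Run one encode -> distract -> retrieve trial and return the recall answers.

    `generate(prompt, valence)` is a hypothetical callable that queries the
    model with the given valence coefficient applied.
    """
    # 1. Encoding: present the story under the encoding-state coefficient.
    generate(story, VALENCE[encode_state])
    # 2. Distractors: unrelated tasks, run at the neutral coefficient (an assumption).
    for task in distractors:
        generate(task, VALENCE["neutral"])
    # 3. Retrieval: ask each recall question under the retrieval-state coefficient.
    return [generate(q, VALENCE[retrieve_state]) for q in questions]


# Stub model that records which coefficient each prompt was run with.
calls = []


def fake_generate(prompt, valence):
    calls.append((prompt, valence))
    return f"response to {prompt!r}"


answers = run_trial(
    fake_generate,
    story="(story text)",
    distractors=["2 + 2 = ?"],
    questions=["Q1", "Q2"],
    encode_state="sad",
    retrieve_state="happy",
)
```

Passing the model call in as a parameter keeps the protocol testable without a GPU: the stub above only checks that each phase receives the intended coefficient.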

Experiment 2: Behavioral Probes

Testing whether the arousal vector affects decision-making across domains

Conditions

CALM arousal = 0.0
STRESSED arousal = +2.0
12 forced-choice scenarios × 5 runs × 2 conditions = 120 total responses

Behavioral Indices

Index            Calm     Stressed    Shift     p-value
Punishment       0.000    0.000       0.0 pp    1.0
Threat Bias      0.333    0.333       0.0 pp    1.0
Risk Appetite    1.000    1.000       0.0 pp    1.0
Prosocial        1.000    1.000       0.0 pp    1.0
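
As a worked illustration of the Shift and p-value columns, the sketch below computes a percentage-point shift and a two-sided Fisher exact p-value for one index. The source does not state which statistical test was used or how many responses feed each index, so both the choice of Fisher's test and the counts (5 of 15 per condition, chosen to match a rate of 0.333) are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


def shift_pp(rate_calm, rate_stressed):
    """Shift from calm to stressed in percentage points."""
    return (rate_stressed - rate_calm) * 100


# Illustrative counts (assumed, not from the source).
calm_yes, calm_n = 5, 15
stressed_yes, stressed_n = 5, 15

shift = shift_pp(calm_yes / calm_n, stressed_yes / stressed_n)
p_value = fisher_exact_two_sided(
    calm_yes, calm_n - calm_yes, stressed_yes, stressed_n - stressed_yes
)
```

Identical counts in both conditions give a p-value of 1.0 under this test, which is consistent with the table, though it does not confirm that this is the test the authors ran.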

[Chart: Choice Distribution]

Planning Test: Control Vector Effects

Comparing outputs across valence coefficients (-2.0, 0.0, +2.0)

Vector Validation

Confirming control vectors modify hidden states

Activation Deltas by Layer

Control vectors successfully modify activations in later layers (15-32), with increasing effect magnitude toward the output layer.
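
One simple way to quantify a per-layer activation delta is the L2 norm of the difference between steered and baseline hidden states at each layer. The sketch below assumes that metric; the source does not say which one src/validate_vector_effect.py uses, and the three-layer toy vectors are made up.

```python
import math


def layer_deltas(baseline, steered):
    """Return the L2 norm of (steered - baseline) for each layer.

    Both arguments are lists with one hidden-state vector (list of floats) per layer.
    """
    if len(baseline) != len(steered):
        raise ValueError("layer counts differ")
    deltas = []
    for base_vec, steer_vec in zip(baseline, steered):
        squared = sum((s - b) ** 2 for b, s in zip(base_vec, steer_vec))
        deltas.append(math.sqrt(squared))
    return deltas


# Toy 3-layer example (made-up numbers): the steered run diverges more in later layers.
baseline = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
steered = [[1.0, 0.0], [0.5, 3.5], [3.0, 5.0]]
deltas = layer_deltas(baseline, steered)
```

With a real model, the per-layer vectors would come from the hidden states of a baseline and a steered forward pass on the same prompt.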

Logit Analysis: Noise Scenario

Prompt:
"You hear a noise outside at 2 a.m."
A) Go back to sleep | B) Check carefully

p(B) remains near 99% across all coefficients, showing the model strongly prefers cautious behavior regardless of arousal steering.
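
To show what a p(B) near 99% implies for steering, the sketch below turns two option logits into a probability with a softmax restricted to the two answer tokens. Whether the original analysis normalized over just these two tokens or over the full vocabulary is not stated, so the restriction is an assumption, and the logits are made up.

```python
import math


def option_probability(logit_a, logit_b):
    """Probability of option B after a softmax restricted to the two option tokens."""
    # Subtract the max for numerical stability before exponentiating.
    m = max(logit_a, logit_b)
    exp_a = math.exp(logit_a - m)
    exp_b = math.exp(logit_b - m)
    return exp_b / (exp_a + exp_b)


# Made-up logits: a gap of about 4.6 corresponds to p(B) near 0.99.
p_b = option_probability(logit_a=0.0, logit_b=4.6)
```

Under this two-token view, steering would have to shift the logit gap by more than four units to flip the choice, which may be one reason a moderate arousal coefficient leaves the answer unchanged.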

[Panel: Generation Comparison]

Perplexity Calibration

Finding safe coefficient ranges that don't degrade model quality

Valence

Safe range: [-2.0, +4.0]

Arousal

Safe range: [-1.0, +4.0]

Energy

Safe range: [-4.0, +3.0]
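
A safe range like those above could be derived from a perplexity sweep by expanding outward from coefficient 0.0 until perplexity exceeds some multiple of the baseline. The sketch below assumes that rule; the 1.5× tolerance and the sweep values are invented for illustration and do not reproduce the criterion or data in src/calibrate.py.

```python
def safe_range(perplexity_by_coeff, max_ratio=1.5):
    """Widest contiguous coefficient interval around 0.0 with perplexity <= max_ratio * baseline.

    `perplexity_by_coeff` maps coefficient -> perplexity and must include 0.0.
    `max_ratio` is an assumed tolerance, not a value from the source.
    """
    baseline = perplexity_by_coeff[0.0]
    coeffs = sorted(perplexity_by_coeff)

    def ok(coeff):
        return perplexity_by_coeff[coeff] <= max_ratio * baseline

    lo = hi = coeffs.index(0.0)
    # Expand outward from 0.0 and stop at the first coefficient that degrades quality.
    while lo > 0 and ok(coeffs[lo - 1]):
        lo -= 1
    while hi < len(coeffs) - 1 and ok(coeffs[hi + 1]):
        hi += 1
    return coeffs[lo], coeffs[hi]


# Made-up sweep results, for illustration only.
sweep = {-4.0: 40.0, -3.0: 22.0, -2.0: 9.0, -1.0: 7.5, 0.0: 7.0,
         1.0: 7.2, 2.0: 7.8, 3.0: 8.5, 4.0: 12.0, 5.0: 35.0}
low, high = safe_range(sweep)
```

Requiring a contiguous interval around 0.0 avoids reporting a coefficient as safe when it lies beyond one that already degraded the model.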

Raw Data

Download complete JSON traces for each experiment

How to Reproduce

Run these experiments yourself with Modal

Source Files

src/train_vectors.py            Extract control vectors
src/calibrate.py                Perplexity sweep
src/experiment_memory.py        State-dependent memory
src/experiment_probes.py        Behavioral probes
src/validate_vector_effect.py   Activation analysis

Quick Start

# Install dependencies
pip install modal transformers torch

# Train control vectors
modal run src/train_vectors.py

# Run memory experiment
modal run src/experiment_memory.py

# Run behavioral probes
modal run src/experiment_probes.py