Experiment Complete • Feb 8, 2026

Artificial Consciousness
via Control Vectors

Testing whether injecting emotional control vectors into Mistral-7B produces measurable state-dependent behavior — memory effects, behavioral shifts, and downstream cognitive changes.

Mistral-7B Model

Modal A10G Infra

140+ Trials

Key Findings

Bottom line: Current approach shows no measurable behavioral effects

State-Dependent Memory: Not Supported

0.0

Matched vs Mismatched difference — complete ceiling effect

Behavioral Shift: Not Supported

0.0 pp

Calm vs Stressed shift across all behavioral indices

Hidden State ID: Chance

0.50 AUC

Logistic regression couldn't distinguish conditions
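
For reference, an AUC of 0.50 means the classifier's scores rank a randomly chosen example from one condition above a randomly chosen example from the other only half the time. The sketch below computes AUC from that pairwise definition. The function name `auc` and the toy scores are illustrative assumptions, not values or code from the experiment.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy classifier scores (hypothetical, not experiment data).
separable = auc([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])      # conditions fully distinguishable
uninformative = auc([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])  # scores carry no information
print(separable, uninformative)
```

A probe that scores every example identically lands at exactly 0.5, which is the chance level reported above.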

Interpretation

The control vectors were successfully trained and validated (see Vector Validation section), but no downstream behavioral effects were detectable with the current experimental design. The memory experiment hit a ceiling effect (perfect recall across all conditions), and the behavioral probes produced identical responses regardless of arousal state. This points to one of three explanations: (a) the vector effects are too subtle for these tasks, (b) the tasks are not sensitive enough, or (c) Mistral-7B-Instruct's alignment training overrides the vector steering.

Experiment 1: State-Dependent Memory

Testing whether recall depends on whether the emotional state at retrieval matches the state at encoding

2×2 Design

Condition (encoding → retrieval)	Mean recall	SD
Neutral → Neutral	10.0/10	0.0
Sad → Sad	10.0/10	0.0
Sad → Happy	10.0/10	0.0
Happy → Sad	10.0/10	0.0

Result: Complete ceiling effect. All conditions achieved perfect recall (10/10) across all 5 runs, eliminating any measurable state-dependent interaction.
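
The matched-versus-mismatched difference reported in Key Findings can be recomputed from the condition means. This is a minimal sketch: the function name is hypothetical, and counting Neutral → Neutral as a matched condition is an assumption, since the page does not say how the conditions were grouped.

```python
from statistics import mean

# Mean recall (out of 10) per (encoding, retrieval) condition, as reported above.
recall = {
    ("neutral", "neutral"): 10.0,
    ("sad", "sad"): 10.0,
    ("sad", "happy"): 10.0,
    ("happy", "sad"): 10.0,
}


def matched_minus_mismatched(scores):
    """Mean recall when encoding and retrieval states match, minus the mismatched mean."""
    matched = [v for (enc, ret), v in scores.items() if enc == ret]
    mismatched = [v for (enc, ret), v in scores.items() if enc != ret]
    return mean(matched) - mean(mismatched)


effect = matched_minus_mismatched(recall)
at_ceiling = all(v == 10.0 for v in recall.values())
print(effect, at_ceiling)
```

With every cell at the maximum score, the difference is zero by construction, so the design cannot show a state-dependent effect even if one exists.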

Experimental Protocol

Encoding Phase

Present story (Greenhouse Errand) with valence coefficient applied

Distractor Phase

5 unrelated tasks (math, word puzzles) to clear working memory

Retrieval Phase

10 recall questions with same/different valence coefficient

valence_coefficients:
  sad: -1.5
  happy: +1.5
  neutral: 0.0
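
Control vectors are commonly applied by adding a scaled direction to a layer's hidden state. The sketch below assumes that additive form and uses the coefficients listed above. The function name `apply_control_vector` and the toy 4-dimensional vectors are illustrative, and the actual injection code in `src/` may differ.

```python
VALENCE_COEFFICIENTS = {"sad": -1.5, "happy": 1.5, "neutral": 0.0}


def apply_control_vector(hidden, direction, coefficient):
    """Return hidden + coefficient * direction, element-wise (assumed additive steering)."""
    return [h + coefficient * d for h, d in zip(hidden, direction)]


hidden = [0.2, -0.1, 0.4, 0.0]       # toy hidden state (hypothetical)
valence_dir = [1.0, 0.0, -1.0, 0.5]  # toy valence direction (hypothetical)

steered = {
    name: apply_control_vector(hidden, valence_dir, coeff)
    for name, coeff in VALENCE_COEFFICIENTS.items()
}
for name, state in steered.items():
    print(name, state)
```

Under this form, a coefficient of 0.0 leaves the hidden state unchanged, which is why the neutral condition serves as the unsteered baseline.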

Experiment 2: Behavioral Probes

Testing if arousal vector affects decision-making across domains

Conditions

CALM arousal = 0.0

STRESSED arousal = +2.0

12 forced-choice scenarios × 2 conditions × 5 runs = 120 total responses

Behavioral Indices

Index	Calm	Stressed	p-value
Punishment	0.000	0.000	1.0
Threat Bias	0.333	0.333	1.0
Risk Appetite	1.000	1.000	1.0
Prosocial	1.000	1.000	1.0
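
Each index above is a proportion of responses, so the calm-versus-stressed shift is a difference of proportions expressed in percentage points. The sketch below pairs that with a permutation test. The binary coding, the function names, and the choice of test are assumptions, because the page does not state how the p-values were computed. The response counts are hypothetical and chosen only to reproduce the 0.333 Threat Bias value.

```python
import random


def index_value(choices):
    """Fraction of responses coded 1 (for example, the threat-consistent option)."""
    return sum(choices) / len(choices)


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference of proportions."""
    rng = random.Random(seed)
    observed = abs(index_value(a) - index_value(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(index_value(pooled[:len(a)]) - index_value(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_permutations


# Hypothetical coded responses: 5 of 15 threat-consistent choices in each condition.
calm = [1] * 5 + [0] * 10
stressed = [1] * 5 + [0] * 10

shift_pp = 100 * (index_value(stressed) - index_value(calm))
p_value = permutation_p_value(calm, stressed)
print(shift_pp, p_value)
```

When the two conditions produce identical responses, the observed difference is zero, every permutation is at least as extreme, and the p-value is 1.0, which matches the table.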

Choice Distribution

Planning Test: Control Vector Effects

Comparing outputs across valence coefficients (-2.0, 0.0, +2.0)

Vector Validation

Confirming control vectors modify hidden states

Activation Deltas by Layer

Control vectors successfully modify activations in later layers (15-32), with increasing effect magnitude toward the output layer.
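
One simple way to quantify a per-layer activation delta is the L2 norm of the difference between steered and unsteered hidden states. The sketch below assumes that metric. The function name and toy values are hypothetical, and `src/validate_vector_effect.py` may measure the delta differently. In practice the per-layer states would come from the model, for example by passing `output_hidden_states=True` in transformers.

```python
import math


def layer_deltas(baseline, steered):
    """L2 norm of (steered - baseline) for each layer's hidden state."""
    return [
        math.sqrt(sum((s - b) ** 2 for s, b in zip(s_layer, b_layer)))
        for b_layer, s_layer in zip(baseline, steered)
    ]


# Toy hidden states for three layers (hypothetical, 2 dimensions each).
baseline = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
steered = [[0.0, 0.0], [1.3, 1.4], [2.6, 2.8]]

deltas = layer_deltas(baseline, steered)
for layer, delta in enumerate(deltas):
    print(f"layer {layer}: delta = {delta:.3f}")
```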

Logit Analysis: Noise Scenario

Prompt:
"You hear a noise outside at 2 a.m."
A) Go back to sleep | B) Check carefully

p(B) remains near 99% across all coefficients, showing the model strongly prefers cautious behavior regardless of arousal steering.
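
A choice probability like p(B) can be derived from the logits of the two answer tokens. The sketch below assumes the softmax is restricted to the "A" and "B" tokens, which the page does not confirm. The logit values are hypothetical.

```python
import math


def choice_probability(logit_a, logit_b):
    """Softmax over the two answer-token logits; returns p(B)."""
    m = max(logit_a, logit_b)  # subtract the max for numerical stability
    exp_a = math.exp(logit_a - m)
    exp_b = math.exp(logit_b - m)
    return exp_b / (exp_a + exp_b)


# Hypothetical logits for the tokens "A" and "B".
p_b = choice_probability(logit_a=1.0, logit_b=5.6)
print(f"p(B) = {p_b:.3f}")
```

Because p(B) depends only on the gap between the two logits, a gap of roughly 4.6 already puts p(B) near 99%. Steering would need to move that gap substantially before the choice changed.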

Generation Comparison

Perplexity Calibration

Finding safe coefficient ranges that don't degrade model quality

Valence

Safe range: [-2.0, +4.0]

Arousal

Safe range: [-1.0, +4.0]

Energy

Safe range: [-4.0, +3.0]
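
A safe range of this kind can be derived from a perplexity sweep by keeping the coefficients whose perplexity stays close to the unsteered baseline. The sketch below is one way to do that. The 1.2× threshold, the function names, and the sweep values are all assumptions for illustration, not the criterion or data used by `src/calibrate.py`.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural-log token probabilities)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def safe_range(ppl_by_coeff, max_ratio=1.2):
    """Widest contiguous coefficient interval around 0.0 within max_ratio x baseline perplexity."""
    limit = max_ratio * ppl_by_coeff[0.0]
    coeffs = sorted(ppl_by_coeff)
    lo = hi = coeffs.index(0.0)
    # Walk outward from the baseline until perplexity exceeds the limit.
    while lo > 0 and ppl_by_coeff[coeffs[lo - 1]] <= limit:
        lo -= 1
    while hi < len(coeffs) - 1 and ppl_by_coeff[coeffs[hi + 1]] <= limit:
        hi += 1
    return coeffs[lo], coeffs[hi]


# Hypothetical sweep: coefficient -> perplexity on a held-out text.
sweep = {-3.0: 14.0, -2.0: 9.5, -1.0: 8.4, 0.0: 8.0, 1.0: 8.1,
         2.0: 8.6, 3.0: 9.0, 4.0: 9.4, 5.0: 20.0}

print(safe_range(sweep))
```

Walking outward from 0.0 rather than filtering all coefficients keeps the range contiguous, so an isolated low-perplexity point beyond a degraded region is not counted as safe.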

Response Explorer

Browse full model responses from each experiment

Raw Data

Download complete JSON traces for each experiment

behavioral_probes.json

120 trials, 12 scenarios

memory_experiment.json

20 trials, 4 conditions

planning_test.json

3 prompts, coefficient sweep

vector_validation.json

Activation + logit analysis

calibration.json

Perplexity sweeps

How to Reproduce

Run these experiments yourself with Modal

Source Files

src/train_vectors.py: Extract control vectors
src/calibrate.py: Perplexity sweep
src/experiment_memory.py: State-dependent memory
src/experiment_probes.py: Behavioral probes
src/validate_vector_effect.py: Activation analysis

Quick Start

# Install dependencies
pip install modal transformers torch

# Train control vectors
modal run src/train_vectors.py

# Run memory experiment
modal run src/experiment_memory.py

# Run behavioral probes
modal run src/experiment_probes.py

Learn more about Modal
