Phase 2 Testing Concept

CENTRAL QUESTION

Can Vorwerk's brand promise survive without the brand artifact?

Gelinggarantie has always been hardware-bound — the Thermomix on the counter is what makes "success guaranteed" credible. Phase 2 built an actual software artifact in its place. The test is whether that artifact can carry the same promise.

What we built (Phase 2 Weeks 1–4)

The artifact under test, and why each piece matters for Vorwerk's question

Each component is the software equivalent of a mechanism that makes hardware Gelinggarantie credible.

Curated 53-recipe BBQ corpus (cost-tier, equipment, dietary, flavor metadata) — the content layer. The BBQ equivalent of Cookidoo: validated, grounded knowledge the AI draws from. Without it the AI hallucinates; with it, it speaks from a real corpus.

Four behavior files (home / plan-from-filter / plan-from-recipe / assemble) — the flow layer. Translates Thermomix's fixed sequence (select → ingredients → steps → result) into a conversational arc from intent to plan.

Discovery determinism + completeness gate — the reliability layer. Codified question order and completeness thresholds so the AI behaves like an instrument, not a chatbot. What makes the AI trustworthy the way hardware is trustworthy.
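
To make the idea of a fixed question order plus a completeness gate concrete, here is a minimal sketch. The slot names (occasion, guest_count, equipment, dietary) and the choice of which slots are required are assumptions for illustration only; they are not taken from the actual behavior files.

```python
from typing import Optional

# Hypothetical discovery slots, always asked in this order (assumption).
DISCOVERY_ORDER = ["occasion", "guest_count", "equipment", "dietary"]
# Hypothetical gate: these slots must be filled before a plan is assembled.
REQUIRED = {"occasion", "guest_count", "equipment"}


def next_question(answers: dict) -> Optional[str]:
    """Return the first unanswered slot in the fixed order, or None if all are filled."""
    for slot in DISCOVERY_ORDER:
        if answers.get(slot) is None:
            return slot
    return None


def gate_passed(answers: dict) -> bool:
    """Planning may start only when every required slot has a value."""
    return all(answers.get(slot) is not None for slot in REQUIRED)
```

Because the order is data rather than model judgment, two participants who give the same answers are asked the same next question, which is the instrument-like behavior described above.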

Six tool-driven UI primitives + surface-aware state — the constraint layer. Structured tools (present_choice, present_recipe_cards, route_to, …) keep the AI on a safe path at every key decision point.
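
One way to picture the constraint layer is an allowlist that rejects any tool call the agent was not given. The sketch below uses the three tool names mentioned above; the argument names (prompt, options, recipe_ids, surface) are assumptions, and the remaining tools are left out because they are not named here.

```python
# Tool names come from the text; the argument shapes are assumed for illustration.
ALLOWED_TOOLS = {
    "present_choice": {"prompt", "options"},
    "present_recipe_cards": {"recipe_ids"},
    "route_to": {"surface"},
}


def validate_tool_call(name: str, args: dict) -> bool:
    """Accept a call only if the tool is known and its argument names match exactly."""
    expected = ALLOWED_TOOLS.get(name)
    return expected is not None and set(args) == expected
```

A call such as validate_tool_call("route_to", {"surface": "assemble"}) passes, while an unknown tool or a call with missing arguments is rejected before it reaches the UI.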

Built collaboratively: Igor (agent behavior + corpus), Jürgen (tools + design system), MAD (design + UX).

What Phase 2 must answer (unmoderated)

We built a working AI planning agent — not a clickable demo. Does this artifact, in actual self-directed use, produce the response Phase 1 documented in moderated discussion?

Without a researcher in the room. Without hardware on the counter. With Vorwerk delivering only software.

Two strategic questions

(a) Category permission without hardware — can Vorwerk credibly enter BBQ?

(b) Digital Gelinggarantie — can AI companionship deliver the success-guarantee experience that hardware delivered before?

Nested, not parallel: (a) collapses into (b). If the AI companion reliably produces the trust-and-confidence response, Vorwerk has permission — because they're still delivering the core promise, just through a different medium.

Methodology embodies the thesis

Boswell — AI conducting voice interviews with structured insight extraction — is itself an artifact in the category Vorwerk is investing in.

Using AI to do the emotional/relational work of post-experience research IS a demonstration of the capability that motivates the Ember investment in the first place.

The instrument embodies the question.

SECONDARY CLAIM: Category permission at warm-referral distance

Measured indirectly via:

Recruitment chain integrity (does the warm referral chain complete?)
Boswell topic 6 (brand reaction)

Not via direct interrogation. We observe the chain's behavior rather than ask "do you accept Vorwerk in BBQ?"

PRIMARY CLAIM: Digital Gelinggarantie lands

Measured via Boswell topics 1–5:

Experience anchor
Emotional resonance moments
Trust in the plan
Companion character
Coherence across entry kinds

A friends-and-family (F&F) sample is acceptable here: this is a product-effect read, not a market-attitude read.

Sample

MAD F&F: ~10 (Igor + Sebastian + Jürgen + Carla networks)
Vorwerk-provided: ~5 (composition at Vorwerk's discretion)
Target completions: ~12
Overshoot: invite 16–18
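
The overshoot implies a minimum completion rate among invitees. A quick check, treating the approximate figures above as exact for the sake of the arithmetic:

```python
TARGET_COMPLETIONS = 12  # "~12" in the plan, treated as exact here


def required_completion_rate(invited: int, target: int = TARGET_COMPLETIONS) -> float:
    """Share of invitees who must finish for the target to be met."""
    return target / invited


rates = {n: required_completion_rate(n) for n in (16, 18)}
for invited, rate in rates.items():
    print(f"{invited} invited -> {rate:.0%} must complete")
# 16 invited -> 75% must complete
# 18 invited -> 67% must complete
```

In other words, the plan tolerates roughly one in four to one in three invitees dropping out.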

Cohort tag (MAD F&F / Vorwerk insider / Vorwerk-initiated warm referral) captured per participant at Boswell briefing step — enables cohort splits in analysis without dictating Vorwerk's recruitment mix.

Timeline

Wed (next check-in): present to Julius
Week 6: finalize Boswell briefing materials, 1–2 internal test interviews, NDA + email templates
Week 7: pre-screens go out, tests begin (Igor away — Boswell runs autonomously)
Week 8: remaining tests, transcript coding, synthesis
Mid-June: findings delivered to Julius

Any slippage is delivered as a mid-June follow-on rather than gating the early-June close.

Boswell — 8 topics, ~25 min

Experience anchor (3–4 min)
Emotional resonance moments (4–5 min)
Trust in the plan (4–5 min)
Companion character (3 min)
Coherence across entries (2–3 min)
Vorwerk-in-BBQ reaction (3 min)
Use intent (2–3 min)
Open close (1–2 min)
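
As a sanity check on the ~25 min headline, the per-topic ranges above can be summed directly:

```python
# (topic, min minutes, max minutes), copied from the list above
TOPICS = [
    ("Experience anchor", 3, 4),
    ("Emotional resonance moments", 4, 5),
    ("Trust in the plan", 4, 5),
    ("Companion character", 3, 3),
    ("Coherence across entries", 2, 3),
    ("Vorwerk-in-BBQ reaction", 3, 3),
    ("Use intent", 2, 3),
    ("Open close", 1, 2),
]

low = sum(lo for _, lo, _ in TOPICS)
high = sum(hi for _, _, hi in TOPICS)
print(f"{len(TOPICS)} topics, {low}-{high} min")  # 8 topics, 22-28 min
```

The range of 22 to 28 minutes is centered on 25, so the headline figure is consistent with the topic budgets.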

Research-informed, conversational, adaptive. Briefing materials bias question generation toward these eight topics; Boswell conducts adaptively.

Participant flow (per person)

Pre-screen ask → yes/no consent
Invitation email + lightweight NDA + single link
Prototype: 2–3 plans across different entry kinds (~20 min)
Boswell voice interview (~25 min)
Transcript captured; copy sent to participant

Bundled session is the default. Async fallback within 24–48h, with one reminder. Prototype is Vorwerk-branded from the first screen. Explicit "things will break" framing on the landing page.

Analysis

Coding dimensions (pre-specified):

Trust-in-plan language
Companion-character language (relational / functional / absent)
Magic-moment narratives
Brand reaction valence + reasoning
Coherence-across-entries reaction
Use intent and conditions
Spontaneous side observations (pricing, community, hardware)

No pre-committed numerical thresholds. Strength of claim to Julius follows from strength of qualitative pattern observed.

Cohort splits (MAD F&F / Vorwerk insider / Vorwerk-initiated) reported where they reveal a difference.
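
A cohort split could be tallied along the following lines. This is a sketch only: the cohort labels come from the text, but the record shape (one cohort and dimension pair per coded transcript segment) is an assumption, and the counts are meant to help locate where cohorts differ, not to act as a threshold.

```python
from collections import Counter, defaultdict

# Cohort labels from the text; the input record shape is an assumption.
COHORTS = ("MAD F&F", "Vorwerk insider", "Vorwerk-initiated")


def split_by_cohort(coded_segments):
    """Count coded transcript segments per cohort and coding dimension.

    Each segment is a (cohort, dimension) pair produced during coding.
    """
    counts = defaultdict(Counter)
    for cohort, dimension in coded_segments:
        if cohort not in COHORTS:
            raise ValueError(f"unknown cohort tag: {cohort!r}")
        counts[cohort][dimension] += 1
    return {cohort: dict(c) for cohort, c in counts.items()}
```

Rejecting unknown cohort tags early keeps a typo at the briefing step from silently creating a fourth cohort in the analysis.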

Output

Question + strategic frame
Sample composition
Digital Gelinggarantie — pattern, evidence, hedges
Side observations (pricing / community / hardware if spontaneous)
UI breakdowns → MAD as backlog (separate from strategic findings)
Implications for Go/No-Go and the rest of Phase 2

OUT OF SCOPE

UI usability / visual polish → MAD backlog
Willingness-to-pay / pricing tiers
Hardware integration (H6 separate)
Community / social (WS4 emergent only)
Market sizing / addressable market
During-cook / post-event surfaces

Sample limits: product-effect read, NOT market-attitude read. Stated upfront in deliverable.
