Igor Schwarzmann: Phase 2 Testing Concept

CENTRAL QUESTION

Can Vorwerk's brand promise survive without the brand artifact?

Gelinggarantie has always been hardware-bound — the Thermomix on the counter is what makes "success guaranteed" credible. Phase 2 built an actual software artifact in its place. The test is whether that artifact can carry the same promise.

What we built (Phase 2 Weeks 1–4)

The artifact under test, and why each piece matters for Vorwerk's question

Each component is the software equivalent of a mechanism that makes hardware Gelinggarantie credible.

Curated 53-recipe BBQ corpus (cost-tier, equipment, dietary, and flavor metadata): the content layer. The BBQ equivalent of Cookidoo: validated, grounded knowledge the AI draws from. Without it, the AI hallucinates; with it, the AI answers from a real corpus.

Four behavior files (home / plan-from-filter / plan-from-recipe / assemble) — the flow layer. Translates Thermomix's fixed sequence (select → ingredients → steps → result) into a conversational arc from intent to plan.

Discovery determinism + completeness gate — the reliability layer. Codified question order and completeness thresholds so the AI behaves like an instrument, not a chatbot. What makes the AI trustworthy the way hardware is trustworthy.
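A minimal sketch of how a fixed question order plus a completeness gate could be expressed. The slot names, the required set, and the threshold of three are hypothetical illustrations loosely based on the corpus metadata, not the actual contents of the behavior files.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical discovery slots, asked in this fixed order (illustrative names).
QUESTION_ORDER = ["equipment", "dietary", "cost_tier", "flavor"]

# Hypothetical gate: these slots are mandatory, and at least MIN_FILLED
# slots in total must be answered before planning may start.
REQUIRED_SLOTS = {"equipment", "dietary"}
MIN_FILLED = 3


@dataclass
class DiscoveryState:
    answers: dict[str, str] = field(default_factory=dict)

    def next_question(self) -> str | None:
        """Return the first unanswered slot in the codified order, or None."""
        for slot in QUESTION_ORDER:
            if slot not in self.answers:
                return slot
        return None

    def is_complete(self) -> bool:
        """Completeness gate: required slots present and enough slots filled."""
        return REQUIRED_SLOTS.issubset(self.answers) and len(self.answers) >= MIN_FILLED
```

Because the next question is derived from state rather than generated freely, two users who give the same answers see the same sequence, which is the "instrument, not chatbot" behavior in miniature.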

Six tool-driven UI primitives + surface-aware state — the constraint layer. Structured tools (present_choice, present_recipe_cards, route_to, …) keep the AI on a safe path at every key decision point.
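A sketch of the constraint idea, assuming each primitive is a registered handler and the agent can only affect the UI by calling one. The three tool names come from the notes; their argument shapes, the 2 to 5 option limit, and the surface names (borrowed from the four behavior files) are hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical surface names, borrowed from the four behavior files.
KNOWN_SURFACES = {"home", "plan-from-filter", "plan-from-recipe", "assemble"}


def present_choice(prompt: str, options: list[str]) -> dict[str, Any]:
    # Hypothetical constraint: a choice must offer between 2 and 5 options.
    if not 2 <= len(options) <= 5:
        raise ValueError("present_choice expects 2-5 options")
    return {"ui": "choice", "prompt": prompt, "options": options}


def present_recipe_cards(recipe_ids: list[str]) -> dict[str, Any]:
    return {"ui": "recipe_cards", "recipe_ids": recipe_ids}


def route_to(surface: str) -> dict[str, Any]:
    # Only known surfaces are reachable, so the agent cannot invent a destination.
    if surface not in KNOWN_SURFACES:
        raise ValueError(f"unknown surface: {surface}")
    return {"ui": "route", "surface": surface}


# Registry of allowed primitives; the agent can only render through these.
TOOLS: dict[str, Callable[..., dict[str, Any]]] = {
    "present_choice": present_choice,
    "present_recipe_cards": present_recipe_cards,
    "route_to": route_to,
}


def dispatch(tool_name: str, **kwargs: Any) -> dict[str, Any]:
    """Execute a tool call from the agent, rejecting unregistered tools."""
    handler = TOOLS.get(tool_name)
    if handler is None:
        raise KeyError(f"unknown tool: {tool_name}")
    return handler(**kwargs)
```

The design choice is that safety lives in the registry and the validators, not in the model's prompt: an off-path call fails loudly instead of rendering something unexpected.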

Built collaboratively: Igor (agent behavior + corpus), Jürgen (tools + design system), MAD (design + UX).

Phase 2 must answer (unmoderated)

We built a working AI planning agent, not a clickable demo. Does this artifact, in actual self-directed use, produce the trust-and-confidence response that Phase 1 documented in moderated discussion?

Without a researcher in the room. Without hardware on the counter. With Vorwerk delivering only software.

Two strategic questions

(a) Category permission without hardware — can Vorwerk credibly enter BBQ?

(b) Digital Gelinggarantie — can AI companionship deliver the success-guarantee experience that hardware delivered before?

Nested, not parallel: (a) collapses into (b). If the AI companion reliably produces the trust-and-confidence response, Vorwerk has permission — because they're still delivering the core promise, just through a different medium.

Methodology embodies the thesis

Boswell — AI conducting voice interviews with structured insight extraction — is itself an artifact in the category Vorwerk is investing in.

Using AI to do the emotional/relational work of post-experience research IS a demonstration of the capability that motivates the Ember investment in the first place.

The instrument embodies the question.

SECONDARY CLAIM: Category permission at warm-referral distance

Measured indirectly via:

  • Recruitment chain integrity (does the warm referral chain complete?)
  • Boswell topic 6 (brand reaction)

Not via direct interrogation. We observe the chain's behavior rather than ask "do you accept Vorwerk in BBQ?"

PRIMARY CLAIM: Digital Gelinggarantie lands

Measured via Boswell topics 1–5:

  1. Experience anchor
  2. Emotional resonance moments
  3. Trust in the plan
  4. Companion character
  5. Coherence across entry kinds

A friends-and-family (F&F) sample is acceptable here: this is a product-effect read, not a market-attitude read.

Sample

  • MAD F&F: ~10 (Igor + Sebastian + Jürgen + Carla networks)
  • Vorwerk-provided: ~5 (composition at Vorwerk's discretion)
  • Target completions: ~12
  • Overshoot: invite 16–18
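The overshoot implies an assumed completion rate that the notes don't state. A back-of-envelope check: 12 completions from 16 invites is 75%, from 18 invites about 67%. The helper names are hypothetical.

```python
import math

TARGET_COMPLETIONS = 12  # from the notes
INVITE_RANGE = (16, 18)  # from the notes


def implied_completion_rate(invites: int, target: int = TARGET_COMPLETIONS) -> float:
    """Share of invitees who must complete for the target to be met."""
    return target / invites


def invites_needed(rate: float, target: int = TARGET_COMPLETIONS) -> int:
    """Invitations required to reach the target at an assumed completion rate."""
    return math.ceil(target / rate)


for n in INVITE_RANGE:
    print(n, round(implied_completion_rate(n), 2))
```

If completion runs below roughly two-thirds, 18 invites will not reach 12; at an assumed 70% rate, 18 is just enough.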

A cohort tag (MAD F&F / Vorwerk insider / Vorwerk-initiated warm referral) is captured per participant at the Boswell briefing step. This enables cohort splits in analysis without dictating Vorwerk's recruitment mix.
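A minimal sketch of how the per-participant cohort tag could feed cohort splits and the recruitment-chain integrity read. The record shape and field names are hypothetical; only the three cohort labels come from the notes.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Cohort(Enum):
    MAD_FF = "MAD F&F"
    VORWERK_INSIDER = "Vorwerk insider"
    VORWERK_REFERRAL = "Vorwerk-initiated warm referral"


@dataclass(frozen=True)
class Participant:
    participant_id: str  # hypothetical identifier
    cohort: Cohort       # tagged once, at the Boswell briefing step
    completed: bool      # finished prototype + interview


def completion_rate_by_cohort(participants: list[Participant]) -> dict[Cohort, float]:
    """Completions / invitations per cohort; only cohorts with invitees appear."""
    invited = Counter(p.cohort for p in participants)
    done = Counter(p.cohort for p in participants if p.completed)
    return {cohort: done[cohort] / invited[cohort] for cohort in invited}
```

The rate for the Vorwerk-initiated referral cohort is one concrete way to read "does the warm referral chain complete?" without asking participants about it directly.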

Timeline

  • Wed (next check-in): present to Julius
  • Week 6: finalize Boswell briefing materials, 1–2 internal test interviews, NDA + email templates
  • Week 7: pre-screens go out, tests begin (Igor away — Boswell runs autonomously)
  • Week 8: remaining tests, transcript coding, synthesis
  • Mid-June: findings delivered to Julius

Any slippage is absorbed as mid-June follow-on work rather than gating the early-June close.

Boswell — 8 topics, ~25 min

  1. Experience anchor (3–4 min)
  2. Emotional resonance moments (4–5)
  3. Trust in the plan (4–5)
  4. Companion character (3)
  5. Coherence across entries (2–3)
  6. Vorwerk-in-BBQ reaction (3)
  7. Use intent (2–3)
  8. Open close (1–2)
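A quick check that the per-topic allocations add up to the stated ~25 minutes, with single values treated as fixed durations. The dictionary simply restates the list above.

```python
# (min, max) minutes per topic, as listed in the notes.
TOPIC_MINUTES = {
    "Experience anchor": (3, 4),
    "Emotional resonance moments": (4, 5),
    "Trust in the plan": (4, 5),
    "Companion character": (3, 3),
    "Coherence across entries": (2, 3),
    "Vorwerk-in-BBQ reaction": (3, 3),
    "Use intent": (2, 3),
    "Open close": (1, 2),
}


def total_range(topics: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the lower and upper bounds separately."""
    low = sum(lo for lo, _ in topics.values())
    high = sum(hi for _, hi in topics.values())
    return low, high


print(total_range(TOPIC_MINUTES))
```

The ranges sum to 22 to 28 minutes, centered on 25; adding the ~20-minute prototype step, a bundled session runs roughly 42 to 48 minutes.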

Research-informed, conversational, adaptive. Briefing materials bias question generation toward these eight topics; Boswell runs the conversation adaptively.

Participant flow (per person)

  1. Pre-screen ask → yes/no consent
  2. Invitation email + lightweight NDA + single link
  3. Prototype: 2–3 plans across different entry kinds (~20 min)
  4. Boswell voice interview (~25 min)
  5. Transcript captured, copy to participant

A bundled session (prototype, then interview) is the default, with an async fallback of 24–48h and one reminder. The prototype is Vorwerk-branded from the first screen, and the landing page explicitly frames that "things will break."

Analysis

Coding dimensions (pre-specified):

  • Trust-in-plan language
  • Companion-character language (relational / functional / absent)
  • Magic-moment narratives
  • Brand reaction valence + reasoning
  • Coherence-across-entries reaction
  • Use intent and conditions
  • Spontaneous side observations (pricing, community, hardware)

No pre-committed numerical thresholds. Strength of claim to Julius follows from strength of qualitative pattern observed.

Cohort splits (MAD F&F / Vorwerk insider / Vorwerk-initiated) reported where they reveal a difference.

Output

  1. Question + strategic frame
  2. Sample composition
  3. Digital Gelinggarantie — pattern, evidence, hedges
  4. Side observations (pricing / community / hardware if spontaneous)
  5. UI breakdowns → MAD as backlog (separate from strategic findings)
  6. Implications for Go/No-Go and the rest of Phase 2

OUT OF SCOPE

  • UI usability / visual polish → MAD backlog
  • Willingness-to-pay / pricing tiers
  • Hardware integration (H6 separate)
  • Community / social (WS4 emergent only)
  • Market sizing / addressable market
  • During-cook / post-event surfaces

Sample limits: product-effect read, NOT market-attitude read. Stated upfront in the deliverable.
