E-Commerce: A/B Testing

A/B Testing

Data-driven decisions instead of gut feeling: A/B testing shows you with statistical confidence which variant of your shop converts better. We derive hypotheses from real user data, set up tests with methodological rigour, wait for sufficient sample size, and interpret results so you understand what you've learned — not just which number is larger. Every test becomes a building block of your optimisation knowledge.

A/B Testing challenges

Without clean tests, optimisation stays a matter of opinion, and that's exactly what gets expensive. Design changes are made on a hunch, and no one knows afterwards whether they lifted or lowered conversion. Tests are stopped too early and produce unreliable results. And with no documentation, the same hypotheses keep resurfacing.

Design changes to the shop are made based on opinions and gut feeling, and no one knows afterwards whether they improved or worsened conversion.

Tests have been stopped too early in the past as soon as one variant was ahead — and implemented changes turned out to be ineffective or even harmful.

There is no systematic documentation of past tests, so the same hypotheses keep coming up and no cumulative knowledge about users is built.

What matters for A/B Testing

Clean A/B testing lives and dies by the discipline not to stop too early. Ending a test the moment one variant leads runs into the peeking problem: without a pre-calculated sample size and sufficient run time, the results are not statistically reliable. Decisions made on that basis can worsen conversion, even though the number looked good at the moment you stopped.

The quality of the hypothesis matters more than the tool. The testing tool is interchangeable, but a hypothesis cleanly derived from data with a clear expectation yields insight even when the tested variant loses. That is exactly the difference between learning and guessing: a good hypothesis explains the result, a bad one leaves only a number.

Every result is knowledge about your users, even a losing one. If you do not document tests, you lose that knowledge at the next staff change and retest the same hypotheses without noticing. A maintained test log is therefore institutional capital and one of the most valuable resources in the whole CRO process.

Good testing interprets results instead of just announcing the bigger number. The real question is not which variant won but what you learned about your customers' behaviour. That way every test becomes a building block of a cumulative understanding that makes each further hypothesis sharper and each further test more valuable.

Hypothesis Development

An A/B test without a hypothesis is guesswork. We derive test hypotheses from heatmaps, session recordings, funnel analyses, and user interviews. Each hypothesis names an observed problem, a proposed solution, and a measurable expectation. Every test has a clear objective — and you know what you learn, regardless of how it turns out.

Test Setup & Tooling

We set up A/B tests with established testing tools, configure correct audience segmentation, and ensure variants are distributed evenly and consistently. Sample ratio mismatch and other common implementation errors are actively checked before a test starts.
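
A sample ratio mismatch check can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular testing tool: it assumes a two-variant test with a planned 50/50 split, and the function names and the strict threshold of 0.001 are assumptions chosen for the example.

```python
import math


def srm_p_value(visitors_a: int, visitors_b: int, expected_share_a: float = 0.5) -> float:
    """Chi-square goodness-of-fit test (1 degree of freedom) on the traffic split."""
    total = visitors_a + visitors_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((visitors_a - expected_a) ** 2 / expected_a
            + (visitors_b - expected_b) ** 2 / expected_b)
    # For one degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def has_sample_ratio_mismatch(visitors_a: int, visitors_b: int,
                              expected_share_a: float = 0.5,
                              alpha: float = 0.001) -> bool:
    """Flag the test if the observed split is very unlikely under the planned split.

    alpha = 0.001 is an assumed, deliberately strict threshold for this sketch.
    """
    return srm_p_value(visitors_a, visitors_b, expected_share_a) < alpha


# A small imbalance is compatible with chance, a larger one is flagged.
print(has_sample_ratio_mismatch(10_000, 10_100))
print(has_sample_ratio_mismatch(10_000, 10_500))
```

A flagged split usually points to a problem in assignment or tracking, so the test results should not be interpreted until the cause is found.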

Statistical Analysis

A test is finished not when one variant leads, but when the pre-calculated sample size has been reached. We calculate the required sample size before the test starts from the baseline conversion rate, the smallest effect worth detecting, the significance level, and the desired statistical power. We monitor the test continuously and stop only when that sample size is reached. Stopping early inflates the rate of false positive findings.
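
To make the sample size calculation concrete, here is a rough Python sketch using the common normal-approximation formula for comparing two conversion rates. The function name, the default significance level of 5 % and the default power of 80 % are assumptions for illustration, and real testing tools may use a different formula.

```python
import math
from statistics import NormalDist


def required_sample_size_per_variant(baseline_rate: float,
                                     relative_uplift: float,
                                     alpha: float = 0.05,
                                     power: float = 0.8) -> int:
    """Approximate visitors needed per variant for a two-sided test.

    baseline_rate:   current conversion rate, e.g. 0.03 for 3 %
    relative_uplift: smallest relative change worth detecting, e.g. 0.10 for +10 %
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    if not 0 < p1 < 1 or not 0 < p2 < 1 or p1 == p2:
        raise ValueError("rates must lie strictly between 0 and 1 and must differ")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Example: 3 % baseline conversion, detect a relative uplift of 10 %.
print(required_sample_size_per_variant(0.03, 0.10))
```

The example shows why small effects need a lot of traffic: halving the uplift you want to detect roughly quadruples the required sample.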

Learning Documentation

Test results are documented in a central test log: hypothesis, result, statistical significance, and derived action. This knowledge base makes every test result the foundation for future hypotheses. Over months, an institutional understanding of what works for your users develops.
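
What an entry in such a test log might look like can be sketched as a small Python data structure. The field names, the JSON-lines format and the keyword lookup are hypothetical choices for this illustration; the log can just as well live in a spreadsheet or a wiki.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ExperimentLogEntry:
    test_name: str
    observed_problem: str   # what the data showed
    proposed_solution: str  # what the variant changed
    expected_effect: str    # measurable expectation stated before the test
    result: str             # e.g. "win", "loss", "inconclusive"
    p_value: float          # statistical significance of the result
    derived_action: str     # what was decided afterwards


def to_json_line(entry: ExperimentLogEntry) -> str:
    """Serialise one entry so it can be appended to a JSON-lines log file."""
    return json.dumps(asdict(entry), ensure_ascii=False)


def find_earlier_tests(log: list[ExperimentLogEntry], keyword: str) -> list[ExperimentLogEntry]:
    """Simple keyword lookup to spot hypotheses that were already tested."""
    keyword = keyword.lower()
    return [entry for entry in log
            if keyword in entry.observed_problem.lower()
            or keyword in entry.proposed_solution.lower()]


log = [
    ExperimentLogEntry(
        test_name="checkout-trust-badges",
        observed_problem="High abandonment on the payment step",
        proposed_solution="Show trust badges next to the payment button",
        expected_effect="Checkout completion rate rises",
        result="inconclusive",
        p_value=0.41,
        derived_action="Investigate payment method coverage instead",
    ),
]
print(to_json_line(log[0]))
print(len(find_earlier_tests(log, "trust badges")))
```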

Good to know

Early stopping leads to wrong conclusions

Stopping a test as soon as one variant leads runs into the peeking problem. Without a pre-calculated sample size and sufficient runtime, results are not statistically reliable, and decisions based on them can actually worsen conversion.
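
The effect of peeking can be illustrated with a small simulation. The Python sketch below runs A/A tests, in which both variants are identical, so every "significant" result is a false positive. It compares checking after every batch of visitors with checking only once at the end. All parameters (number of simulations, number of looks, a conversion rate of 10 %) are assumptions chosen to keep the example fast.

```python
import math
import random


def z_score(conv_a: int, conv_b: int, n: int) -> float:
    """Two-proportion z statistic for two variants with n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 0.0
    return (conv_b / n - conv_a / n) / se


def simulate_aa_tests(simulations: int = 500, looks: int = 10,
                      visitors_per_look: int = 200, rate: float = 0.1,
                      z_crit: float = 1.96, seed: int = 42) -> tuple[float, float]:
    """Return (false positive rate with peeking, false positive rate at the end only)."""
    rng = random.Random(seed)
    peeking_hits = 0
    final_hits = 0
    for _ in range(simulations):
        conv_a = conv_b = 0
        significant_at_any_look = False
        z = 0.0
        for look in range(1, looks + 1):
            # Both variants convert at the same rate: there is no real effect.
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            z = z_score(conv_a, conv_b, look * visitors_per_look)
            if abs(z) > z_crit:
                significant_at_any_look = True
        peeking_hits += significant_at_any_look
        final_hits += abs(z) > z_crit
    return peeking_hits / simulations, final_hits / simulations


peeking_rate, fixed_rate = simulate_aa_tests()
print(f"false positives with peeking: {peeking_rate:.1%}")
print(f"false positives at the planned end: {fixed_rate:.1%}")
```

Because the last look is shared by both strategies, the peeking rate can never be lower than the fixed-horizon rate in this setup; in practice it tends to be several times higher.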

Hypothesis matters more than tool

The A/B testing tool is interchangeable. The quality of the hypothesis determines the learning value of the test. A hypothesis cleanly derived from data with clear expectations yields insights even when the tested variant loses.

Test log is institutional capital

Every test result — won or lost — is knowledge about your users. Organisations that don't document results lose this knowledge with the next personnel change and retest the same hypotheses. A maintained test log is one of the most valuable resources in the CRO process.

Decisions backed by data

With us you're always at the cutting edge of technology and benefit directly from our developer expertise. Together we analyse your shop, identify the areas with the greatest leverage, and develop tailor-made solutions. Your goals and expectations are at the centre of our work.

  1. Developers, not resellers

    Your shop is built by developers who really understand the code. We pass nothing to subcontractors.

  2. Shopware down to the detail

    Expertise in architecture, API integration and performance, built over hundreds of project hours.

  3. One team, every discipline

    Development, design and marketing come from one team that works without friction at the handoffs.

  4. Built for growth

    We build measurably for conversion, load time and revenue.

  5. Partner, not vendor

    We stay on after launch and keep developing your shop continuously.

Ready for your successful online shop?

Whether it's an improvement or a fresh start: a no-obligation conversation never hurt anyone.

Paul Kalisch
Executive Partner

Frequently asked questions

How much traffic do I need for meaningful A/B tests?
That depends on the element being tested, the expected effect size, and the chosen significance level. As a rough guide, several hundred conversions per month enable tests of clearly measurable effects. For lower traffic, we recommend qualitative methods and best-practice optimisations that deliver results even without large samples.
What do you typically test first?
We prioritise by potential, effort, and confidence. Checkout elements typically have the highest potential because cart abandonment is most costly there. Product page elements with high traffic follow. We never start with small cosmetic changes when larger structural levers haven't been tested yet.
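
One simple way to turn potential, effort, and confidence into a ranking is sketched below in Python. The 1 to 10 scales, the scoring formula and the example ideas are assumptions made purely for illustration, not a fixed method.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    potential: int   # 1 (low) to 10 (high): expected impact on conversion
    effort: int      # 1 (low) to 10 (high): implementation cost
    confidence: int  # 1 (low) to 10 (high): how well the data supports the idea


def priority_score(idea: Idea) -> float:
    """Assumed scoring rule: impact weighted by confidence, divided by effort."""
    return idea.potential * idea.confidence / idea.effort


def prioritise(ideas: list[Idea]) -> list[Idea]:
    return sorted(ideas, key=priority_score, reverse=True)


backlog = [
    Idea("Button colour on category page", potential=2, effort=1, confidence=3),
    Idea("Guest checkout option", potential=9, effort=5, confidence=8),
    Idea("Delivery time on product page", potential=6, effort=2, confidence=6),
]
for idea in prioritise(backlog):
    print(f"{priority_score(idea):5.1f}  {idea.name}")
```
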
What happens if a test shows no statistically significant results?
A null result is still a result: it shows that the test detected no measurable influence of the tested element on conversion. We document this and assess whether the hypothesis was wrong or whether the effect is too small to detect with the available sample size. Both findings are valuable for prioritising the next tests.
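
One way to read a null result is to look at the confidence interval for the difference between the variants, as in the following Python sketch. It uses the normal approximation for two independent conversion rates; the function name and the example numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist


def uplift_confidence_interval(conv_a: int, n_a: int, conv_b: int, n_b: int,
                               confidence: float = 0.95) -> tuple[float, float]:
    """Confidence interval for the absolute difference (variant B minus variant A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Example: 300 vs. 315 conversions with 10,000 visitors per variant.
low, high = uplift_confidence_interval(300, 10_000, 315, 10_000)
print(f"difference in conversion rate: between {low:+.2%} and {high:+.2%}")
```

If the interval includes zero but is still wide, the sample was too small to rule out a relevant effect. If it is narrow around zero, the tested element most likely has little influence.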