The Problem: Guessing What Works
Most merchants change their recommendation settings based on gut feeling and never validate the results. They boost a signal weight, see revenue go up, and credit the change, without knowing whether revenue would have risen anyway due to seasonality, a marketing campaign, or random variation.
SellerZoom's A/B Testing feature lets you run controlled experiments on three dimensions of your recommendation engine: signal weights (the algorithm), widget variants (the presentation), and bundle discounts (the pricing). Traffic is split randomly between control and variant, and the system reports results with statistical confidence.
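What does "statistical confidence" mean in practice? SellerZoom doesn't document its exact method, but a two-proportion z-test is the standard way to compare conversion rates between two arms. Here's a minimal, illustrative sketch (the numbers are invented):

```python
from math import sqrt, erf

def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Compare control (a) vs. variant (b) conversion rates.
    Returns the z statistic and a two-sided p-value."""
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    pooled = (conversions_a + conversions_b) / (n_a + n_b)   # rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))    # normal CDF tail
    return z, p_value

# Illustrative: 500/24,000 control conversions vs. 590/24,000 variant
z, p = two_proportion_z_test(500, 24_000, 590, 24_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # p < 0.05 -> significant at 95% confidence
```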
Three Types of Experiments
Signal Weight Tests compare different recommendation algorithm weights. For example, your current weights might be 40% co-purchase, 30% semantic similarity, 20% margin, 10% intent. You could test a variant with 25% co-purchase, 25% semantic, 35% margin, 15% intent to see if margin-weighted recommendations drive more profit without hurting clicks.
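As a mental model, a weighted blend like this fits in a few lines. The field names below are hypothetical, not SellerZoom's actual configuration format:

```python
# Hypothetical weight sets -- names are illustrative, not SellerZoom's API.
control = {"co_purchase": 0.40, "semantic": 0.30, "margin": 0.20, "intent": 0.10}
variant = {"co_purchase": 0.25, "semantic": 0.25, "margin": 0.35, "intent": 0.15}
assert abs(sum(control.values()) - 1.0) < 1e-9  # weights must sum to 1
assert abs(sum(variant.values()) - 1.0) < 1e-9

def blended_score(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-signal scores (each 0-1) into one recommendation score."""
    return sum(weights[name] * signals[name] for name in weights)
```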
Widget Variant Tests compare different visual presentations. Test carousel vs. grid layout, 3 products vs. 6 products per row, minimal design vs. detailed cards with ratings and descriptions, or sidebar placement vs. inline below-product placement.
Bundle Discount Tests find the optimal discount percentage for your bundles. Test 10% vs. 15% vs. 20% off — the goal is to maximize total bundle revenue, not just conversion rate.
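To see why revenue, not conversion rate, is the right target, compare expected revenue per visitor at each level. The numbers below are invented for illustration:

```python
# Invented numbers: a $100 bundle tested at three discount levels.
bundle_price = 100.0
results = {0.10: 0.030, 0.15: 0.034, 0.20: 0.036}  # discount -> conversion rate

for discount, conv_rate in results.items():
    rev_per_visitor = bundle_price * (1 - discount) * conv_rate
    print(f"{discount:.0%} off: ${rev_per_visitor:.2f} per visitor")
# 10%: $2.70, 15%: $2.89, 20%: $2.88 -> 15% wins even though 20% converts best
```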
Setting It Up: Step by Step
Go to A/B Tests
Click A/B Tests in the sidebar. You'll see any existing experiments and their status (draft, running, completed). Click "New Experiment" to create one.
Choose Experiment Type
Select signal weights, widget variant, or bundle discount. Each type has a different configuration interface — signal weights show sliders for each signal, widget variants show layout options, bundle discounts show percentage inputs.
Configure the Variant
Set up the variant you want to test against your current control. The control is always your existing live configuration. Set the traffic split (default: 50/50); choose 80/20, with the larger share on control, if you want to limit exposure to a risky change.
Launch & Wait
Launch the experiment. SellerZoom assigns each visitor to control or variant and keeps that assignment sticky for the session, so every shopper sees a consistent experience. The experiment runs until it reaches statistical significance, which typically takes 1–3 weeks depending on your traffic volume.
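Persistent assignment is typically done by hashing a stable visitor or session ID into a bucket, so the same shopper always lands on the same arm; the same mechanism handles uneven splits like 80/20. A minimal sketch, not SellerZoom's internals:

```python
import hashlib

def assign(visitor_id: str, experiment_id: str, variant_share: float = 0.5) -> str:
    """Deterministic bucketing: the same inputs always yield the same arm."""
    digest = hashlib.sha256(f"{experiment_id}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform value in [0, 1]
    return "variant" if bucket <= variant_share else "control"

assign("sess_8f3a21", "widget-grid-test", variant_share=0.2)  # 80/20 split
```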
Apply the Winner
When results are significant, SellerZoom declares a winner and shows the confidence level. Click "Apply Winner" to push the winning variant to 100% of traffic. Your old settings are saved for rollback.
Why Testing Beats Tweaking
Controlled experiments isolate the variable. When you change a setting and simply observe a metric move, you can't tell whether the setting caused it. A/B testing compares randomly split groups of visitors over the same time period under the same conditions; the only systematic difference is the variable you're testing.
Small wins compound. An 8% improvement from a widget test, a 5% improvement from signal weight optimization, and a 12% improvement from bundle pricing compound into a 27% total improvement. Without testing, you'd never find these incremental gains.
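Those gains multiply rather than add, which is why the total is 27% and not 25%:

```python
gains = [0.08, 0.05, 0.12]  # widget, signal weights, bundle pricing
total = 1.0
for g in gains:
    total *= 1 + g          # 1.08 * 1.05 * 1.12 = 1.2701
print(f"{total - 1:.0%}")   # 27%
```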
It prevents costly mistakes. A change that seems smart can actually hurt revenue. Testing catches negative results before they impact 100% of your traffic — the 50% on control protects you from bad ideas.
Run one experiment at a time per type. Overlapping experiments on the same dimension (e.g., two simultaneous signal weight tests) will contaminate each other's results. It's fine to run a signal weight test and a widget test simultaneously since they're independent.
Case Study: Fashion Boutique
Velvet & Thread
Background: Velvet & Thread is a women's fashion store on Shopify (2,200 SKUs) with strong traffic but a below-average recommendation click-through rate of 2.1%. They suspected the widget design was the issue — too small, too far below the fold, and showing only 3 products.
Implementation: Ran a widget variant test: Control (3-product carousel below reviews) vs. Variant (6-product grid directly below the main product image). 50/50 split for 2 weeks on 48,000 sessions.
Key insight: The grid format directly below the product image captured attention that the carousel below reviews was missing. Shoppers scrolled right past the old position. The experiment proved this in 12 days — without it, Velvet & Thread would still be debating the change in Slack.
Stop Guessing, Start Testing
Run your first recommendation experiment in 5 minutes. Statistical confidence, not gut feeling.
Get Started Free