Example Project: Developing a Tool to Analyze Experiment Results and Generate Custom Reports

Outline

This comprehensive project will solidify your Python SDK knowledge by guiding you through building a real-world Opal tool. You'll integrate data fetching, data processing, statistical analysis, and report generation.

Project Goal: Build an Opal tool that takes Optimizely experiment IDs, fetches relevant data (simulated or from a mock API), performs custom statistical analysis (e.g., Bayesian A/B testing, frequentist A/B testing calculations), and generates a summarized report (e.g., JSON or a simple HTML snippet).

Scenario: A marketing team wants a quick way to get deeper insights into their Optimizely experiments, beyond the standard dashboard. They need a tool that can calculate custom metrics or perform specific statistical tests not readily available.

Key Steps and Implementation Details

Project Setup:

  • Create a new Python project and virtual environment.
  • Install necessary libraries: optimizely-opal.opal-tools-sdk, fastapi, uvicorn, pandas, numpy, scipy.
  • Create a main.py for your FastAPI app and a src/tools/experiment_analyzer.py for your tool logic.
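A minimal setup sequence might look like this (shell commands; a Unix-style shell is assumed, and the package names are the ones listed above):

python -m venv .venv
source .venv/bin/activate
pip install optimizely-opal.opal-tools-sdk fastapi uvicorn pandas numpy scipy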

Data Input and Tool Definition:

  • Define a Pydantic model for the tool's parameters. This should include:
    • experiment_id: str (The ID of the experiment).
    • control: A nested Pydantic model (or dictionary) with name: str, visitors: int, and conversions: int.
    • variants: A list of such nested models, one per variant group, each with name: str, visitors: int, and conversions: int.
    • alpha: float (Significance level, e.g., 0.05).
  • Use the @tool decorator to define your analyze_experiment_results tool.
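For reference, a request body matching this parameter model might look like the following (all values are illustrative):

{
  "experiment_id": "exp_123",
  "control": {"name": "Control", "visitors": 1000, "conversions": 100},
  "variants": [
    {"name": "Variant A", "visitors": 1000, "conversions": 120}
  ],
  "alpha": 0.05
}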

Simulate Data Fetching (or Mock API):

  • For simplicity, you won't connect to a real Optimizely API in this exercise. Instead, your tool will receive the control and variants metrics directly as parameters.
  • In a real application, this step would involve making authenticated API calls to Optimizely's APIs (e.g., Experimentation REST API) to retrieve raw experiment data.
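For orientation, a real data-fetching step might look roughly like the sketch below. The endpoint path, auth scheme, and response shape are assumptions to verify against the current Optimizely API documentation, and httpx is an extra dependency not in the install list above.

import httpx

async def fetch_experiment_results(experiment_id: str, api_token: str) -> dict:
    # Hypothetical endpoint; verify the path and response shape against
    # the Optimizely Experimentation REST API docs.
    url = f"https://api.optimizely.com/v2/experiments/{experiment_id}/results"
    headers = {"Authorization": f"Bearer {api_token}"}
    async with httpx.AsyncClient() as client:
        response = await client.get(url, headers=headers)
        response.raise_for_status()  # surface 4xx/5xx errors early
        return response.json()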

Statistical Analysis:

  • Implement functions to perform common A/B testing calculations. A good starting point is a Chi-Squared Test for Proportions to determine whether each variant differs significantly from the control.

  • Formula for Chi-Squared (simplified for a 2x2 table comparing the control against one variant; totals are taken over the two groups being compared):

    • Expected conversions for control: (control_visitors * total_conversions) / total_visitors
    • Expected non-conversions for control: (control_visitors * total_non_conversions) / total_visitors
    • Calculate the expected counts for each variant similarly.
    • Chi-squared statistic: sum((observed - expected)^2 / expected) for all cells.
    • Compare the chi-squared statistic to a critical value (from a chi-squared distribution table) or use scipy.stats.chi2_contingency.
  • Example Chi-Squared Calculation (simplified):

from scipy.stats import chi2_contingency

def calculate_chi_squared(control_conversions, control_visitors, variant_conversions, variant_visitors):
    # Create a contingency table
    # Rows: Control, Variant
    # Columns: Conversions, Non-Conversions
    control_non_conversions = control_visitors - control_conversions
    variant_non_conversions = variant_visitors - variant_conversions

    contingency_table = [
        [control_conversions, control_non_conversions],
        [variant_conversions, variant_non_conversions]
    ]

    # Note: chi2_contingency applies Yates' continuity correction by default
    # for 2x2 tables; pass correction=False for the uncorrected statistic.
    chi2, p_value, _, _ = chi2_contingency(contingency_table)
    return chi2, p_value

# Example usage:
# chi2, p_value = calculate_chi_squared(100, 1000, 120, 1000)
# print(f"Chi2: {chi2}, P-value: {p_value}")

Report Generation:

  • The tool should return a structured JSON object containing:
    • experiment_id
    • control_results (conversions, visitors, conversion_rate)
    • variant_results (for each variant: name, conversions, visitors, conversion_rate, statistical_significance_vs_control, p_value_vs_control, uplift_vs_control)
    • overall_conclusion (e.g., "Variant X is statistically significant winner," "No significant difference found.")
  • Consider adding a simple HTML report string as an output field for better readability in some contexts.
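If you add that HTML field, a small helper along these lines could render it (a minimal sketch; build_html_report is a hypothetical name, and the keys mirror the JSON structure described above):

def build_html_report(results: dict) -> str:
    # Hypothetical helper: renders the tool's results dict (see the
    # implementation below) as a small HTML snippet.
    rows = "".join(
        f"<tr><td>{v['name']}</td><td>{v['conversion_rate']:.2%}</td>"
        f"<td>{v['uplift_vs_control_percent']}</td>"
        f"<td>{v['statistical_significance_vs_control']}</td></tr>"
        for v in results["variant_results"]
    )
    return (
        f"<h3>Experiment {results['experiment_id']}</h3>"
        f"<p>{results['overall_conclusion']}</p>"
        "<table><tr><th>Variant</th><th>Conversion rate</th>"
        "<th>Uplift %</th><th>Significance</th></tr>"
        f"{rows}</table>"
    )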

Tool Implementation (src/tools/experiment_analyzer.py):

from optimizely_opal.opal_tools_sdk import tool
from pydantic import BaseModel, Field
from typing import List
from scipy.stats import chi2_contingency
import math

class VariantMetrics(BaseModel):
    name: str = Field(..., description="Name of the variant (e.g., 'Control', 'Variant A').")
    visitors: int = Field(..., description="Number of unique visitors to this variant.", ge=0)
    conversions: int = Field(..., description="Number of conversions for this variant.", ge=0)

class ExperimentAnalysisParams(BaseModel):
    experiment_id: str = Field(..., description="The ID of the Optimizely experiment.")
    control: VariantMetrics = Field(..., description="Metrics for the control group.")
    variants: List[VariantMetrics] = Field(..., description="List of metrics for each variant group.")
    alpha: float = Field(0.05, description="Significance level (alpha) for statistical tests.", ge=0.01, le=0.1)

@tool(name="analyze_experiment_results", description="Performs statistical analysis on Optimizely experiment data.")
async def analyze_experiment_results_tool(params: ExperimentAnalysisParams):
    """
    Analyzes experiment data, calculates conversion rates, uplift, and statistical significance.
    """
    results = {
        "experiment_id": params.experiment_id,
        "control_results": {
            "name": params.control.name,
            "visitors": params.control.visitors,
            "conversions": params.control.conversions,
            "conversion_rate": (params.control.conversions / params.control.visitors) if params.control.visitors > 0 else 0
        },
        "variant_results": [],
        "overall_conclusion": "Analysis completed."
    }

    if params.control.visitors == 0:
        results["overall_conclusion"] = "Control group has no visitors, cannot perform analysis."
        return results

    control_cr = results["control_results"]["conversion_rate"]
    overall_significant_difference = False

    for variant in params.variants:
        variant_cr = (variant.conversions / variant.visitors) if variant.visitors > 0 else 0
        # Uplift is undefined when the control rate is 0: report 0% when the
        # variant also converted nobody, otherwise an infinite uplift.
        if control_cr > 0:
            uplift = ((variant_cr - control_cr) / control_cr) * 100
        else:
            uplift = 0.0 if variant_cr == 0 else float('inf')

        statistical_significance = "N/A"
        p_value = None

        if variant.visitors > 0:  # need visitors for a meaningful test; conversions are already validated as >= 0
            try:
                # Contingency table for Chi-Squared Test
                # Rows: Control, Variant
                # Columns: Conversions, Non-Conversions
                control_non_conversions = params.control.visitors - params.control.conversions
                variant_non_conversions = variant.visitors - variant.conversions

                contingency_table = [
                    [params.control.conversions, control_non_conversions],
                    [variant.conversions, variant_non_conversions]
                ]

                # Perform Chi-Squared test
                chi2, p_value, _, _ = chi2_contingency(contingency_table)

                if p_value < params.alpha:
                    statistical_significance = "Statistically Significant"
                    overall_significant_difference = True
                else:
                    statistical_significance = "Not Statistically Significant"
            except ValueError as e:
                statistical_significance = f"Error in Chi-Squared: {str(e)}"
                p_value = None
            except Exception as e:
                statistical_significance = f"Unexpected error in Chi-Squared: {str(e)}"
                p_value = None

        results["variant_results"].append({
            "name": variant.name,
            "visitors": variant.visitors,
            "conversions": variant.conversions,
            "conversion_rate": variant_cr,
            "uplift_vs_control_percent": round(uplift, 2) if math.isfinite(uplift) else "Infinity",
            "statistical_significance_vs_control": statistical_significance,
            "p_value_vs_control": round(p_value, 4) if p_value is not None else "N/A"
        })

    if overall_significant_difference:
        results["overall_conclusion"] = "One or more variants showed a statistically significant difference from control."
    else:
        results["overall_conclusion"] = "No statistically significant difference found between variants and control."

    return results

# Example of how to integrate this tool into your FastAPI app (main.py)
# from .tools.experiment_analyzer import analyze_experiment_results_tool
# opal_tools_service.register_tool(analyze_experiment_results_tool)
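Expanding on those comments, a minimal main.py might look like this (a sketch: the ToolsService import and the register_tool call mirror the comment above and are assumptions; check the SDK documentation for the exact wiring):

from fastapi import FastAPI
from optimizely_opal.opal_tools_sdk import ToolsService  # assumed name; verify against the SDK

from src.tools.experiment_analyzer import analyze_experiment_results_tool

app = FastAPI()

# Attach the Opal tools service to the FastAPI app and register the tool,
# mirroring the registration comment above.
opal_tools_service = ToolsService(app)
opal_tools_service.register_tool(analyze_experiment_results_tool)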

Testing:

  • Run your FastAPI application locally (uvicorn main:app --reload).
  • Use Postman or Insomnia to send POST requests to http://localhost:8000/tools/analyze_experiment_results with various ExperimentAnalysisParams JSON bodies.
  • Test cases:

    • Control and variant data with clear differences (expect significance).
    • Control and variant data with small differences (expect no significance).
    • Edge cases: zero visitors, zero conversions.
    • Invalid alpha values.
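
As a scripted alternative to Postman, you can exercise the endpoint directly (a sketch; assumes the server is running locally as described above and that the requests library is installed, and the sample numbers are illustrative):

import requests

payload = {
    "experiment_id": "exp_123",
    "control": {"name": "Control", "visitors": 1000, "conversions": 100},
    "variants": [
        {"name": "Variant A", "visitors": 1000, "conversions": 150},  # clear lift
        {"name": "Variant B", "visitors": 1000, "conversions": 104},  # small difference
    ],
    "alpha": 0.05,
}

response = requests.post(
    "http://localhost:8000/tools/analyze_experiment_results",
    json=payload,
    timeout=10,
)
response.raise_for_status()
print(response.json()["overall_conclusion"])
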
By completing this project, you will have developed a robust, data-driven Opal tool using the Python SDK, capable of automating complex backend processes and integrating with diverse data sources. This demonstrates a powerful application of Opal tools for enhancing data analysis workflows within Optimizely.