get_zero_shot_metrics

Download and process ProteinGym zero-shot benchmarking metrics.

This function loads performance metrics for zero-shot models across 217 deep mutational scanning (DMS) assays. The benchmark uses 5 metrics to evaluate how well each model predicts experimental DMS measurements without being trained on the labels of the assay in question.

Metrics included:

1. Spearman's rank correlation coefficient (primary metric)
2. Area Under the ROC Curve (AUC)
3. Matthews Correlation Coefficient (MCC) for bimodal measurements
4. Normalized Discounted Cumulative Gains (NDCG) for identifying top variants
5. Top K Recall (top 10% of DMS values)
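To make two of the metrics above concrete, the following is a minimal standard-library sketch of Spearman's rank correlation and Top K Recall on toy data. It is an illustration only, not ProteinGym's implementation: the function names (`average_ranks`, `spearman`, `top_k_recall`), the tie handling, and the rounding used to pick the top 10% are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 1-based mean rank of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(scores: Sequence[float], dms: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(scores), average_ranks(dms))


def top_k_recall(scores: Sequence[float], dms: Sequence[float], fraction: float = 0.1) -> float:
    """Share of the top `fraction` of variants by DMS value that is also
    in the top `fraction` of variants by model score."""
    k = max(1, int(len(dms) * fraction))
    top_true = set(sorted(range(len(dms)), key=lambda i: dms[i], reverse=True)[:k])
    top_pred = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    return len(top_true & top_pred) / k


# Toy data (hypothetical): 10 variants with measured DMS values.
dms_values = [0.1, 0.4, 0.2, 0.9, 0.5, 0.3, 0.8, 0.7, 0.6, 1.0]
good_scores = [v ** 2 for v in dms_values]  # preserves the ordering
bad_scores = [-v for v in dms_values]       # reverses the ordering

print(spearman(good_scores, dms_values), top_k_recall(good_scores, dms_values))
print(spearman(bad_scores, dms_values), top_k_recall(bad_scores, dms_values))
```

Because both metrics depend only on the ordering of the scores, a model does not need to predict DMS values on the experimental scale to do well on them.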

Parameters:	`cache_dir` (`str`, default: `'.cache'`) – Directory in which downloaded files are cached

Returns:	`Dict[str, DataFrame]` – Dictionary with 5 entries (one per metric), each containing a DataFrame with:

- Rows: 217 DMS assays
- Columns: Model performance scores (79 models in v1.2)
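As a sketch of how the returned dictionary might be consumed, the example below averages each model's score over assays for every metric and ranks the models by one of them. The helper `summarize_metrics`, the dictionary keys (`"Spearman"`, `"AUC"`), and the model and assay names are hypothetical stand-ins; the actual keys and column names should be checked against the object returned by `get_zero_shot_metrics`. The toy dictionary only mimics the documented shape (rows are assays, columns are models), and non-numeric columns, if any, are assumed to be safe to skip.

```python
from typing import Dict

import pandas as pd


def summarize_metrics(metrics: Dict[str, pd.DataFrame], sort_by: str) -> pd.DataFrame:
    """Return one row per model and one column per metric.

    Each cell is the plain mean of that model's score over all assays.
    """
    summary = pd.DataFrame(
        {name: df.mean(numeric_only=True) for name, df in metrics.items()}
    )
    return summary.sort_values(sort_by, ascending=False)


# Toy stand-in for the real return value (hypothetical keys and names).
toy_metrics = {
    "Spearman": pd.DataFrame(
        {"model_a": [0.40, 0.60], "model_b": [0.50, 0.70]},
        index=["assay_1", "assay_2"],
    ),
    "AUC": pd.DataFrame(
        {"model_a": [0.70, 0.80], "model_b": [0.75, 0.65]},
        index=["assay_1", "assay_2"],
    ),
}

summary = summarize_metrics(toy_metrics, sort_by="Spearman")
print(summary)
```

With the real data, the same call would take the dictionary returned by `get_zero_shot_metrics()` in place of `toy_metrics`. Note that this is an unweighted mean over assays; a published leaderboard may aggregate scores differently, so the numbers need not match.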

Source code in proteingympy/make_zeroshot_dms_benchmarks.py

def get_zero_shot_metrics(cache_dir: str = ".cache") -> Dict[str, pd.DataFrame]:
    """
    Download and process ProteinGym zero-shot benchmarking metrics.

    This loads performance metrics for zero-shot models across 217 DMS assays.
    The benchmarking uses 5 metrics to evaluate model performance in predicting
    experimental DMS measurements without training on the specific assay labels.

    Metrics included:
    1. Spearman's rank correlation coefficient (primary metric)
    2. Area Under the ROC Curve (AUC)
    3. Matthews Correlation Coefficient (MCC) for bimodal measurements
    4. Normalized Discounted Cumulative Gains (NDCG) for identifying top variants
    5. Top K Recall (top 10% of DMS values)

    Args:
        cache_dir: Directory to cache downloaded files

    Returns:
        Dictionary with 5 entries (one per metric), each containing a DataFrame with:
        - Rows: 217 DMS assays
        - Columns: Model performance scores (79 models in v1.2)
    """
    os.makedirs(cache_dir, exist_ok=True)

    # Option 1: Load from GitHub (older approach with 62 models)
    # benchmark_data = _load_from_github()

    # Option 2: Load from Zenodo v1.2 (79 models)
    benchmark_data = _load_from_zenodo_v12(cache_dir)

    return benchmark_data