Metadata prediction models infer biological metadata from observed expression data. Given a gene expression profile, the model predicts likely biological characteristics such as cell type, tissue, and disease state.
This is useful when you want to:
Available models:
- gem-1-bulk_predict-metadata: Bulk RNA-seq metadata prediction model
- gem-1-sc_predict-metadata: Single-cell RNA-seq metadata prediction model

Note: These endpoints may require 1-2 minutes of startup time if they have been scaled down. Plan accordingly for interactive use.
Metadata prediction encodes your expression data into the model’s latent space and then uses classifiers to predict the most likely metadata values for each sample. The model returns:
Metadata prediction queries are simpler than queries for other model types, since you only need to provide expression counts:
# Get the example query structure
example_query <- get_example_query(model_id = "gem-1-bulk_predict-metadata")$example_query
# Inspect the query structure
str(example_query)

The query structure includes:
- inputs: A list of count vectors, where each element is a named list with a counts field containing expression values
- seed (optional): Random seed for reproducibility
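If your samples are already held as separate count vectors, the inputs structure can be built programmatically. This is a minimal sketch, assuming each vector is already in the model's gene order; make_inputs() and toy_counts are hypothetical names used for illustration, not part of the client package.

```r
# Hypothetical helper (not part of the client package): wrap each counts
# vector in a named list with a single `counts` field, as the query expects
make_inputs <- function(count_vectors) {
  # Drop any names so inputs stays a plain positional list of samples
  unname(lapply(count_vectors, function(x) list(counts = x)))
}

# Toy vectors for illustration only; real vectors must have the model's full gene length
toy_counts <- list(s1 = c(5L, 0L, 12L), s2 = c(3L, 8L, 0L))
inputs <- make_inputs(toy_counts)
str(inputs)

# With a real query you would then assign:
# query$inputs <- inputs
```

Dropping the list names keeps the inputs positional, which matches the unnamed list(...) used in the examples on this page.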
Here’s a complete example predicting metadata for expression samples:
# Start with example query structure
query <- get_example_query(model_id = "gem-1-bulk_predict-metadata")$example_query
# Replace with your actual expression counts
# Each input should be a list with a counts vector
query$inputs <- list(
list(counts = sample1_counts),
list(counts = sample2_counts),
list(counts = sample3_counts)
)
# Optional: set seed for reproducibility
query$seed <- 42
# Submit the query
result <- predict_query(query, model_id = "gem-1-bulk_predict-metadata")

To predict metadata for a single sample, pass an inputs list with just one element.
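This is a minimal sketch of the single-sample case. sample_counts holds toy placeholder values rather than a real profile, and the API calls are left as comments because they need a live endpoint.

```r
# Toy counts for one sample (illustration only; a real vector must match
# the model's expected gene length and order)
sample_counts <- c(10L, 0L, 7L, 3L)

# A single sample is still passed as a list of inputs, just with one element
single_input <- list(list(counts = sample_counts))
str(single_input)

# With a query from get_example_query(), you would then run:
# query$inputs <- single_input
# result <- predict_query(query, model_id = "gem-1-bulk_predict-metadata")
```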
The inputs field is a list of expression count vectors. Each element should be a named list containing:
- counts: A vector of non-negative integers representing gene expression counts

The results from metadata prediction include several components:
The metadata data frame, result$outputs$metadata, contains the predicted values for each sample.
For categorical metadata fields, the model returns probability distributions over all possible values. These are useful for understanding prediction confidence:
# If probabilities are included in the output
# Access cell type probabilities for first sample
# The exact structure depends on the API response format
# Example: viewing top predicted cell types
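# unlist() flattens a named list of probabilities (if that is what the API returns)
# into a named numeric vector, which sort() requires
cell_type_probs <- unlist(result$outputs$classifier_probs$cell_type[[1]])
head(sort(cell_type_probs, decreasing = TRUE))

Annotate unlabeled samples with predicted metadata: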
# Load your unlabeled samples
unlabeled_counts <- read.csv("unlabeled_samples.csv", row.names = 1)
# Create query
query <- get_example_query(model_id = "gem-1-bulk_predict-metadata")$example_query
# One input per column (sample); rows are assumed to be genes in the model's expected order
query$inputs <- lapply(seq_len(ncol(unlabeled_counts)), function(i) {
list(counts = unlabeled_counts[, i])
})
# Predict metadata
result <- predict_query(query, model_id = "gem-1-bulk_predict-metadata")
# Combine with sample IDs
annotations <- result$outputs$metadata
annotations$sample_id <- colnames(unlabeled_counts)

Validate existing sample labels against predicted metadata:
# Compare predicted vs. provided labels
provided_labels <- c("UBERON:0002107", "UBERON:0002107", "UBERON:0000955", "UBERON:0000955")
predicted_labels <- result$outputs$metadata$tissue_ontology_id
# Identify potential mismatches
mismatches <- which(provided_labels != predicted_labels)
if (length(mismatches) > 0) {
message("Potential mislabeled samples: ", paste(mismatches, collapse = ", "))
}

The counts vector for each sample must match the model's expected number of genes. If the length does not match, the API returns a validation error. Use get_example_query() to see the expected structure.
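Checking lengths locally before submitting can save a round trip to the API. This is a minimal sketch: find_bad_lengths() is a hypothetical helper, and deriving the expected length from the example query's counts vector is an assumption, so the toy value below stands in for it.

```r
# Hypothetical pre-flight check: return the positions of inputs whose counts
# vector does not have the expected length
find_bad_lengths <- function(inputs, expected_len) {
  lens <- vapply(inputs, function(x) length(x$counts), integer(1))
  which(lens != expected_len)
}

# Assumption: the example query's counts vector has the model's full length, e.g.
# expected_len <- length(get_example_query(model_id = "gem-1-bulk_predict-metadata")$example_query$inputs[[1]]$counts)
expected_len <- 4L  # toy value for illustration

toy_inputs <- list(list(counts = c(1L, 2L, 3L, 4L)), list(counts = c(1L, 2L, 3L)))
bad <- find_bad_lengths(toy_inputs, expected_len)
if (length(bad) > 0) {
  message("Inputs with the wrong number of genes: ", paste(bad, collapse = ", "))
}
```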
Ensure your counts are in the gene order the model expects. This order should match what the baseline model expects; you can retrieve it from the gene_order field of any prediction result.
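If your counts vector is named by gene ID, you can reorder it by name to match gene_order. This is a minimal sketch under several assumptions: align_to_gene_order() and the GENE_* IDs are hypothetical, gene_order is assumed to be a character vector of the same IDs you use, and filling genes missing from your data with 0 is a choice you should confirm is appropriate for your data.

```r
# Hypothetical helper: reorder a gene-named counts vector to match gene_order.
# Genes in `counts` but not in gene_order are dropped; genes in gene_order but
# missing from `counts` are filled with 0 (an assumption, not an API rule)
align_to_gene_order <- function(counts, gene_order) {
  aligned <- counts[gene_order]   # index by name; missing genes become NA
  aligned[is.na(aligned)] <- 0L
  unname(aligned)
}

gene_order <- c("GENE_A", "GENE_B", "GENE_C")  # in practice, taken from a result's gene_order field
my_counts <- c(GENE_C = 4L, GENE_A = 9L, GENE_D = 2L)
aligned <- align_to_gene_order(my_counts, gene_order)
print(aligned)
```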
All count values must be non-negative integers. Floats that are whole numbers (like 10.0) are accepted, but negative values cause validation errors.
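These rules can be checked client-side before a query is sent. This is a minimal sketch: validate_counts() is a hypothetical helper, and rejecting fractional values such as 1.5 follows from the non-negative integer requirement above rather than from documented API behavior.

```r
# Hypothetical validator: accept whole-number values (including doubles such as
# 10.0) and stop on non-numeric, missing, negative, or fractional values
validate_counts <- function(counts) {
  if (!is.numeric(counts)) stop("counts must be numeric")
  if (anyNA(counts)) stop("counts contain missing values")
  if (any(counts < 0)) stop("counts contain negative values")
  if (any(counts != round(counts))) stop("counts contain non-integer values")
  counts
}

validate_counts(c(10.0, 0, 3))  # passes: whole-number doubles are accepted
err <- tryCatch(validate_counts(c(5, -1)), error = function(e) conditionMessage(e))
print(err)
```

The values are returned unchanged rather than coerced with as.integer(), since whole-number doubles are already accepted.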