07 LLM Ensembles
research · software
Simulated voting scenarios show that, for a single question, deceptiveness (δ) alone determines ensemble accuracy as the ensemble size approaches infinity, while bewilderment (η) governs the rate of convergence.
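The behavior described above can be reproduced with a small Monte Carlo sketch. The answer model here is an assumption made for illustration and is not taken from the project: with probability η a model guesses uniformly among `n_options` answers, otherwise it gives the specious answer with probability δ and the correct answer with probability 1 − δ. The function names, the plurality-vote rule, and the random tie-breaking are likewise hypothetical choices.

```python
import random
from collections import Counter


def sample_answer(delta, eta, n_options, rng):
    """Draw one model's answer under the assumed toy model.

    Answer 0 is correct, answer 1 is the specious one,
    and answers 2..n_options-1 are other wrong answers.
    """
    if rng.random() < eta:
        # Bewildered: uniform random guess over all options.
        return rng.randrange(n_options)
    # Not bewildered: deceived with probability delta, else correct.
    return 1 if rng.random() < delta else 0


def ensemble_accuracy(delta, eta, size, n_options=10, trials=2000, seed=0):
    """Estimate how often a plurality vote of `size` models is correct."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        votes = Counter(
            sample_answer(delta, eta, n_options, rng) for _ in range(size)
        )
        top = max(votes.values())
        winners = [answer for answer, count in votes.items() if count == top]
        # Break ties uniformly at random (an assumption of this sketch).
        if rng.choice(winners) == 0:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    for eta in (0.2, 0.8):
        for size in (5, 25, 101):
            acc = ensemble_accuracy(delta=0.3, eta=eta, size=size)
            print(f"delta=0.3 eta={eta} size={size} accuracy={acc:.3f}")
```

Under this assumed model, the correct answer outpolls the specious one in the limit exactly when δ < 0.5, whatever the value of η below 1, while a larger η shrinks the margin between the two and so requires a larger ensemble to reach the same accuracy.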

Question difficulty is a function of deceptiveness (δ), a question's tendency to elicit a single specious answer, and bewilderment (η), the degree to which it encourages random guessing. (Illustration generated by ChatGPT.)

An ensemble of 50 models answers multiplication questions of varying difficulty. Notably, trust improves at high voting thresholds.
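A minimal sketch of thresholded voting may help make the three quantities concrete. The definitions used here are assumptions for illustration, since the caption does not define them: the ensemble answers a question only when the most common answer's vote share meets the threshold; yield is the fraction of questions answered, trust is the fraction of answered questions that are correct, and accuracy is the fraction of all questions answered correctly. The function name and the example votes are hypothetical.

```python
from collections import Counter


def threshold_metrics(votes_per_question, truths, threshold):
    """Compute yield, trust, and accuracy under the assumed definitions."""
    answered = 0
    correct = 0
    for votes, truth in zip(votes_per_question, truths):
        # Most common answer and its vote count for this question.
        answer, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= threshold:
            answered += 1
            if answer == truth:
                correct += 1
    n = len(truths)
    return {
        "yield": answered / n,
        # Trust is undefined when the ensemble abstains on every question.
        "trust": correct / answered if answered else None,
        "accuracy": correct / n,
    }


# Made-up votes from five models on three multiplication questions.
example_votes = [
    [56, 56, 56, 56, 54],      # 7 x 8: strong agreement on the right answer
    [72, 72, 72, 63, 64],      # 8 x 9: weaker agreement, still right
    [108, 108, 108, 96, 96],   # 8 x 12: majority agrees on a wrong answer
]
example_truths = [56, 72, 96]

if __name__ == "__main__":
    for threshold in (0.5, 0.8):
        print(threshold, threshold_metrics(example_votes, example_truths, threshold))
```

In this toy data, raising the threshold makes the ensemble abstain on the two questions with weaker agreement, which lowers yield but removes the one wrong answer from the answered set.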

A single model and an ensemble extract salient features from the text of echocardiogram reports. Accuracy, yield, and trust are shown as a function of voting threshold and question distributions for model-extracted left ventricular ejection fraction.
