This analysis compares quality and efficiency metrics across eleven quantized model variants, ranging from IQ1_M to IQ3_XXS. The data is color-coded to highlight the best (green), close to best (yellow), still good overall (orange), and worst values (red).
Key findings:
The most notable metrics for comparison are Mean PPL (perplexity), KLD (Kullback-Leibler divergence), and the size-normalized efficiency ratios at the bottom of the table (1PPL/MB through top P/GB), which relate each quality metric to model size.
Measurement | IQ1_M (mine) | IQ1_M (main) | IQ2_XXS (mine) | IQ2_XXS (main) | IQ2_S (mine) | UD-IQ1_M (unsloth) | Q2_K_L (mine) | Q2_K_L (main) | UD-Q2_K_XL (unsloth) | IQ3_XXS (mine) | IQ3_XXS (main) |
---|---|---|---|---|---|---|---|---|---|---|---|
Size (GB) | 25.32 | 24.57 | 30.17 | 28.56 | 34.34 | 35.4 | 44 | 40.57 | 42.6 | 44.96 | 41.66 |
Mean PPL | 11.81 | 13.79 | 10.55 | 11.66 | 9.85 | 10.30 | 9.02 | 9.88 | 9.31 | 9.266434 | 9.76184 |
KLD | | | | | | | | | | | |
Mean | 0.691 | 0.933 | 0.464 | 0.664 | 0.361 | 0.376 | 0.217 | 0.332 | 0.185 | 0.164 | 0.244 |
Max | 17.819 | 23.806 | 26.647 | 26.761 | 17.597 | 21.264 | 24.180 | 17.556 | 23.286 | 28.166 | 25.849 |
99.9% | 9.912 | 10.822 | 7.897 | 10.029 | 6.693 | 6.995 | 7.129 | 12.766 | 4.213 | 4.232 | 4.964 |
99% | 5.463 | 6.250 | 4.084 | 5.094 | 3.237 | 3.560 | 2.108 | 2.966 | 1.844 | 1.600 | 2.178 |
median | 0.315 | 0.387 | 0.187 | 0.335 | 0.141 | 0.131 | 0.067 | 0.125 | 0.060 | 0.055 | 0.099 |
10% | 0.0053 | 0.0099 | 0.002 | 0.004 | 0.0012 | 0.0012 | 0.0005 | 0.0009 | 0.0004 | 0.0004 | 0.0005 |
5% | 0.00097 | 0.00179 | 0.0003 | 0.00064 | 0.00019 | 0.00018 | 0.00008 | 0.00013 | 0.00005 | 0.00005 | 0.00007 |
1% | 0.00046 | 0.00073 | 0.00011 | 0.00030 | 0.00007 | 0.00007 | 0.00002 | 0.00004 | 0.00001 | 0.00001 | 0.00002 |
Delta probs | | | | | | | | | | | |
Mean | -8.03% | -10.30% | -4.62% | -6.70% | -3.38% | -3.46% | -2.14% | -2.37% | -1.38% | -1.13% | -1.57% |
Max | 99.67% | 98.73% | 99.81% | 99.81% | 99.13% | 98.90% | 99.88% | 99.81% | 99.83% | 99.91% | 99.99% |
99.9% | 77.40% | 79.77% | 76.35% | 75.42% | 75.03% | 76.59% | 69.34% | 75.65% | 69.69% | 65.60% | 71.73% |
99% | 42.37% | 41.40% | 41.62% | 47.11% | 40.65% | 40.56% | 32.34% | 41.89% | 33.46% | 31.38% | 37.88% |
95.00% | 15.79% | 18.51% | 16.32% | 19.86% | 16.95% | 15.56% | 12.41% | 17.30% | 12.83% | 12.71% | 16.04% |
90.00% | 6.59% | 7.56% | 7.69% | 9.05% | 7.62% | 7.33% | 5.92% | 8.86% | 6.43% | 6.50% | 8.23% |
75.00% | 0.16% | 0.13% | 0.44% | 0.35% | 0.54% | 0.51% | 0.53% | 0.89% | 0.70% | 0.70% | 0.86% |
Median | -0.78% | -1.21% | -0.18% | -0.42% | -0.09% | -0.09% | -0.03% | -0.02% | -0.01% | -0.01% | -0.01% |
25.00% | -11.66% | -15.85% | -6.11% | -9.93% | -4.65% | -4.56% | -2.86% | -3.40% | -2.11% | -1.96% | -2.66% |
10.00% | -35.57% | -46.38% | -23.74% | -34.00% | -19.19% | -18.97% | -12.61% | -16.60% | -10.78% | -10.12% | -13.88% |
5.00% | -56.91% | -68.67% | -40.94% | -53.40% | -33.86% | -34.31% | -23.01% | -30.06% | -20.17% | -18.53% | -24.41% |
1.00% | -91.26% | -95.39% | -80.42% | -87.98% | -70.51% | -73.12% | -55.83% | -67.16% | -49.11% | -44.35% | -53.65% |
0.10% | -99.61% | -99.87% | -98.74% | -99.76% | -95.85% | -95.98% | -99.92% | -99.92% | -82.64% | -78.71% | -86.82% |
Minimum | -100.00% | -100.00% | -100.00% | -100.00% | -99.95% | -99.95% | -100.00% | -100.00% | -99.96% | -100.00% | -100.00% |
RMS Δp | 23.63% | 27.63% | 19.13% | 23.06% | 16.86% | 17.16% | 13.55% | 16.31% | 12.16% | 11.30% | 13.69% |
Same top | 68.58% | 62.65% | 74.02% | 67.77% | 76.74% | 77.00% | 82.92% | 77.85% | 83.42% | 84.20% | 80.09% |
1PPL/MB | 3.217 | 2.952 | 3.140 | 3.002 | 2.956 | 2.743 | 2.521 | 2.434 | 2.522 | 2.401 | 2.459 |
1/mean KLD/GB | 0.0571 | 0.0437 | 0.0715 | 0.0528 | 0.0806 | 0.0751 | 0.1045 | 0.0741 | 0.1268 | 0.1357 | 0.0983 |
1/median KLD/GB | 0.2690 | 0.1442 | 0.5076 | 0.2553 | 0.7196 | 0.7434 | 1.6611 | 0.8113 | 1.7799 | 1.9124 | 1.0377 |
1/RMS/GB | 0.1607559878 | 0.147351131 | 0.1733007884 | 0.1518255381 | 0.1724846058 | 0.1645804449 | 0.1677165724 | 0.1516537142 | 0.1936001069 | 0.1981899277 | 0.1753256929 |
top P/GB | 43.54236966 | 40.76096623 | 40.76096623 | 42.1431626 | 44.74616598 | 45.57342992 | 53.06575329 | 52.11571564 | 51.06444189 | 53.94945508 | 52.02427633 |
Mean PPL: lower values indicate better performance.
KLD: lower values indicate better alignment with the reference distribution.
The table below repeats the size-normalized efficiency metrics from the bottom of the main table, with the long values rounded to four decimal places:
Metric | IQ1_M (mine) | IQ1_M (main) | IQ2_XXS (mine) | IQ2_XXS (main) | IQ2_S (mine) | UD-IQ1_M (unsloth) | Q2_K_L (mine) | Q2_K_L (main) | UD-Q2_K_XL (unsloth) | IQ3_XXS (mine) | IQ3_XXS (main) |
---|---|---|---|---|---|---|---|---|---|---|---|
1PPL/MB | 3.217 | 2.952 | 3.140 | 3.002 | 2.956 | 2.743 | 2.521 | 2.434 | 2.522 | 2.401 | 2.459 |
1/mean KLD/GB | 0.0571 | 0.0437 | 0.0715 | 0.0528 | 0.0806 | 0.0751 | 0.1045 | 0.0741 | 0.1268 | 0.1357 | 0.0983 |
1/median KLD/GB | 0.2690 | 0.1442 | 0.5076 | 0.2553 | 0.7196 | 0.7434 | 1.6611 | 0.8113 | 1.7799 | 1.9124 | 1.0377 |
1/RMS/GB | 0.1608 | 0.1474 | 0.1733 | 0.1518 | 0.1725 | 0.1646 | 0.1677 | 0.1517 | 0.1936 | 0.1982 | 0.1753 |
top P/GB | 43.5424 | 40.7610 | 40.7610 | 42.1432 | 44.7462 | 45.5734 | 53.0658 | 52.1157 | 51.0644 | 53.9495 | 52.0243 |
Size (GB): The storage size of the model in gigabytes. Smaller models generally require fewer computational resources.
Mean PPL: Perplexity measures how well a probability model predicts a sample. Lower values indicate better performance and more accurate predictions.
KLD: Kullback-Leibler divergence measures how one probability distribution diverges from a second, reference probability distribution. Lower values indicate better alignment with the reference distribution.
Delta probs: The difference in probabilities between the model and a reference. Values closer to zero indicate better alignment.
RMS Δp: Root mean square of the delta probabilities, providing a single metric for the overall deviation. Lower values are better.
Same top: Percentage of cases where the model predicts the same top token as the reference. Higher values indicate better alignment.
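The quality metrics defined above can be illustrated with a small Python sketch. It is not the tool that produced the table. The function names are hypothetical, the KL direction (reference against test model) is an assumption, and Δp is assumed to be the change in probability assigned to the token that actually came next.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def compare_models(ref_logits, test_logits, targets):
    """Compare a test (e.g. quantized) model with a reference, position by position.

    ref_logits, test_logits: one list of vocabulary logits per token position.
    targets: index of the token that actually came next at each position.
    """
    nll, klds, deltas, same_top = 0.0, [], [], 0
    for ref, test, tgt in zip(ref_logits, test_logits, targets):
        p, q = softmax(ref), softmax(test)
        nll -= math.log(q[tgt])  # negative log-likelihood under the test model
        # KL(P || Q) with P = reference, Q = test (assumed direction)
        klds.append(sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0))
        deltas.append(q[tgt] - p[tgt])  # assumed definition of delta p
        same_top += p.index(max(p)) == q.index(max(q))
    n = len(targets)
    return {
        "ppl": math.exp(nll / n),            # perplexity = exp(mean NLL)
        "mean_kld": sum(klds) / n,
        "mean_delta_p": sum(deltas) / n,
        "rms_delta_p": math.sqrt(sum(d * d for d in deltas) / n),
        "same_top": same_top / n,
    }

# Toy example: two positions, vocabulary of three tokens (made-up logits)
ref = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
test = [[1.8, 1.2, 0.1], [0.4, 1.0, 1.1]]
stats = compare_models(ref, test, targets=[0, 1])
```

The percentile rows in the table (median, 99%, 99.9%, and so on) would come from sorting the per-position KLD and Δp values rather than averaging them; this sketch returns only the summary statistics.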
1PPL/MB: Perplexity efficiency relative to model size. Higher values indicate more efficient perplexity performance per megabyte.
1/mean KLD/GB: KLD efficiency relative to model size. Higher values indicate more efficient KLD performance per gigabyte.
1/median KLD/GB: Alternative metric for KLD efficiency using median values. Higher values indicate better efficiency.
1/RMS/GB: RMS delta probability efficiency relative to model size. Higher values indicate better efficiency.
top P/GB: Same top prediction percentage efficiency relative to model size. Higher values indicate better efficiency.
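As a rough illustration of how such size-normalized ratios can be formed, the sketch below inverts a lower-is-better metric and divides by the size in GB. This reading is an assumption, not a formula given in the source: the helper name `per_gb` is hypothetical, and the factor of 1000 for the 1PPL/MB row is inferred from the table values. It covers only the 1PPL/MB and 1/mean KLD/GB rows, and because the table's exact inputs and rounding are not stated, small differences are expected.

```python
def per_gb(metric_value, size_gb, scale=1.0):
    """Inverse of a lower-is-better metric, normalized by model size in GB."""
    return scale / (metric_value * size_gb)

# Values copied from the table: (size in GB, mean PPL, mean KLD)
rows = {
    "IQ1_M (main)": (24.57, 13.79, 0.933),
    "IQ2_S (mine)": (34.34, 9.85, 0.361),
    "UD-Q2_K_XL (unsloth)": (42.6, 9.31, 0.185),
}

for name, (size_gb, ppl, kld) in rows.items():
    ppl_eff = per_gb(ppl, size_gb, scale=1000.0)  # assumed x1000 scaling
    kld_eff = per_gb(kld, size_gb)
    print(f"{name}: 1PPL/MB ~ {ppl_eff:.3f}, 1/mean KLD/GB ~ {kld_eff:.4f}")
```

For these three columns the results land close to the tabulated values (for example 2.952 and 0.0437 for IQ1_M (main)).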
Based on the data analysis, several important conclusions can be drawn:
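These findings suggest that quantization recipes like those used in the unsloth variants can improve efficiency without sacrificing quality: UD-Q2_K_XL, for example, reaches a mean KLD of 0.185 at 42.6 GB, against 0.332 for Q2_K_L (main) at 40.57 GB. Additionally, higher-bit quantization types (like IQ3_XXS) score better on most of the size-normalized metrics than the 1-bit and 2-bit types: IQ3_XXS (mine) has the best 1/mean KLD/GB, 1/median KLD/GB, 1/RMS/GB, and top P/GB values in the table, although the smallest variants score highest on 1PPL/MB.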