A statistical analysis of known warriors fighting either battle and both battles-- Reno, then Custer-- indicates some 399 Indians who fought Reno, also fought Custer. Based on the same statistical methodology, it would indicate just under 2,100 Indians engaged in the Custer [+Reno] fighting.
If I remember correctly, on occasion there were some posts here that expressed some serious doubt about the validity of Fred's analysis.
In contrast, I am convinced that Fred is mostly correct with this, though there are a few secondary points that in my opinion can be improved on a bit.
The most important thing to point out is that those numbers are entirely anchored on the number of warriors fighting Reno in the valley. Fred's numbers here assume that to be exactly 900. For a different base, the numbers for Custer/both/total warriors change proportionally.
For those who have not yet read Fred's Participants, a short explanation of where those numbers originated (I might have made a few errors in transcription, so the numbers I give here might deviate slightly from the book):
Fred has assembled some 250 names of warriors for whom we can be reasonably sure whether they actively participated in the valley fight, the Custer fight, or both.
Tabulating the numbers for the different actions gives:
........ names...battle_total...relative_to_Reno
Reno ....106......900........... 1
Custer...185.....1572........... 1.75
both......47......399........... 0.44
total....244.....2071........... 2.3
I subjected this list of names to a slightly more sophisticated analysis, and my conclusions for the respective ratios would be (2-sigma confidence interval, assuming a ratio of Cheyenne/Arapaho to Sioux warriors of 1:6):
........ Sioux....Cheyenne/A.....relative_to_Reno
Reno .......75.......23........... 1
Custer.....112.......59........... 1.60+/-0.22
both........35.......13........... 0.48+/-0.07
total......152.......69........... 2.13+/-0.25
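A minimal sketch of how central values like these could be obtained. It assumes each group's sample fractions are combined with weights of 6 (Sioux) to 1 (Cheyenne/Arapaho); that weighting scheme and all function names are illustrative assumptions, not taken from the spreadsheet.

```python
# Name counts per group, copied from the table above.
SIOUX = {"reno": 75, "custer": 112, "both": 35, "total": 152}
CHEYENNE = {"reno": 23, "custer": 59, "both": 13, "total": 69}


def weighted_share(key, sioux_weight=6.0, cheyenne_weight=1.0):
    """Share of all warriors in category `key`, weighting each group's
    sample fraction by its assumed share of the fighting force."""
    sioux = SIOUX[key] / SIOUX["total"]
    cheyenne = CHEYENNE[key] / CHEYENNE["total"]
    return (sioux_weight * sioux + cheyenne_weight * cheyenne) / (
        sioux_weight + cheyenne_weight
    )


def ratios_relative_to_reno(sioux_weight=6.0, cheyenne_weight=1.0):
    """Custer, both and total, each expressed as a multiple of the Reno share."""
    reno = weighted_share("reno", sioux_weight, cheyenne_weight)
    return {
        "custer": weighted_share("custer", sioux_weight, cheyenne_weight) / reno,
        "both": weighted_share("both", sioux_weight, cheyenne_weight) / reno,
        "total": 1.0 / reno,  # the whole sample has share 1
    }


for name, value in ratios_relative_to_reno().items():
    print(f"{name:7s} {value:.2f}")
```

Under that assumption the 1:6 weighting gives about 1.60, 0.48 and 2.13, in line with the central values in the table. The error ranges are a separate matter and are not reproduced here.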
For a range of 900+/-100 warriors fighting Reno this would yield:
Reno: 900+/-100
Custer: 1440+/-260
Total: 1910+/-310
Both: 430+/-80
For a range I consider more plausible, 700+/-200 the corresponding numbers were:
Reno: 700+/-200
Custer: 1120+/-360
Total: 1490+/-460
Both: 330+/-110
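The step from ratios to warrior numbers can be sketched as follows, assuming the error on the ratio and the error on the Reno base are independent and combined in quadrature. The function name is illustrative, and the results only agree with the figures above to within rounding.

```python
import math

# Ratios relative to Reno with their 2-sigma errors, from the table above.
RATIOS = {"Custer": (1.60, 0.22), "Total": (2.13, 0.25), "Both": (0.48, 0.07)}


def scale_by_reno(ratio, ratio_err, reno, reno_err):
    """Multiply a ratio by the Reno base and combine both errors in quadrature."""
    value = ratio * reno
    err = math.sqrt((ratio * reno_err) ** 2 + (reno * ratio_err) ** 2)
    return value, err


for reno, reno_err in ((900, 100), (700, 200)):
    print(f"Reno: {reno}+/-{reno_err}")
    for label, (ratio, ratio_err) in RATIOS.items():
        value, err = scale_by_reno(ratio, ratio_err, reno, reno_err)
        print(f"  {label}: {value:.0f}+/-{err:.0f}")
```

Note how the uncertainty of the Reno base dominates in the 700+/-200 case: the ratio errors contribute comparatively little there.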
On the surface this might appear to be slightly questionable math here, but bear with me for a while and this might make more sense.
--------------------------------------------------------------------
The details
--------------------------------------------------------------------
--------Sampling Bias?------------------------------------------
Now the obvious question is how sure we can be that those 200-something names are indeed a representative random sample of all the warriors fighting on that day, whose number might have been anywhere between 1000 and 2500.
The first obvious selection bias would likely be that those warriors who were in the thick of the fighting, performing deeds of valor and/or getting themselves killed or wounded, will be more likely to get remembered. That's a benign error for our purpose, as for military analysis those actively shaping the fight are far more important than those who hung back in the rear and were effectively "just there" without participating meaningfully. And in any case, I see no good reason to assume that the fraction of frontline fighters vs. back-row hangers-on would be significantly different for the two actions, apart from the last moments of the Custer fight. And if I understand Fred's selection correctly, those just watching or participating in the after-action looting are not included.
A further problem might be that not all tribal divisions were equally attractive to White investigators after the battle. Specifically the reservations of the most numerous divisions at the LBH, Oglala and Hunkpapa, with their "Chiefs" Crazy Horse and Sitting Bull, would have been the primary points of interest, along with the Cheyenne, given their reservation's proximity to the battlefield. Again, this is fortunately a fairly benign error, as an oversampling of the most numerous subgroups will lead to a comparatively small error, and the Cheyenne are distinct enough to treat them separately.
In addition, we have also a good sample number of the third-largest Lakota division at the LBH, the Minneconjou:
Tribe ...... names
Cheyenne ... 67
Oglala ..... 43
Hunkpapa ... 40
Minneconjou. 26
Sans Arc .... 8
Dakota ...... 7
Brule ....... 4
Else: ....... 5
Lakota with unknown affiliation: 21
So we have a good representation of the camp's upper and lower ends, as well as intermediate positions.
And the number of samples per tribe roughly follows what I had concluded from the surrender counts, as ballpark ceiling Lakota numbers for the LBH camp:
Oglala ..... 1300
Hunkpapa ... 1500
Minneconjou. 1000
Sans Arc .... 900
Brule ....... 300
Else: ....... 500
That holds once you take into consideration the reservation/tribe designation problem mentioned earlier (Sans Arc and Minneconjou both got to Cheyenne River, and a few "Minneconjou" might have been considered Sans Arc a decade or two earlier), the likely over-representation of both Oglala and Hunkpapa, and the fact that some 300-400 Hunkpapa stayed in Canada and very likely were not interviewed.
More importantly, comparing the Cheyenne numbers with the Sioux numbers shows that the Cheyenne were markedly less represented on the Reno battlefield than on the Custer battlefield. This is in agreement with contemporary sources, pointing again to the validity of the sample.
And it also suggests that the Cheyenne should indeed be treated separately, as their over-representation in the sample would otherwise skew the results.
And finally, it allows us to probe how robust the results are.
The Cheyenne have by far the most outlying Custer/Reno ratio of more than 5:2, compared to 3:2 for the average of all others.
Fred is weighting the Cheyenne with about 1:2 compared to 1:6 in my calculation, very likely a massive overestimation of their numbers.
Yet Fred's results are not so terribly different from mine, still within my calculated error ranges.
So overall systematic sampling bias errors are likely within the same ballpark as the purely statistical error ranges resulting from the incompleteness of the names list.
For the final numbers, I used a ratio of 1 Cheyenne/Arapaho warrior to 6 Sioux warriors. For ratios between 1:4 and 1:10, which should cover the plausible range, the variation is within less than 5% of my central value.
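The sensitivity to the assumed Cheyenne/Arapaho to Sioux ratio can be probed with a short sweep. This is a sketch under the assumption that each group's sample fractions are simply weighted by its share of the force; the function name is hypothetical and the counts are the deduplicated ones from the table near the top.

```python
# (Reno names, Custer names, all names) per group, from the ratio table.
SIOUX = (75, 112, 152)
CHEYENNE = (23, 59, 69)


def custer_to_reno(sioux_per_cheyenne):
    """Custer/Reno ratio when Sioux outnumber Cheyenne/Arapaho by the given factor."""
    w = float(sioux_per_cheyenne)
    reno = (w * SIOUX[0] / SIOUX[2] + CHEYENNE[0] / CHEYENNE[2]) / (w + 1.0)
    custer = (w * SIOUX[1] / SIOUX[2] + CHEYENNE[1] / CHEYENNE[2]) / (w + 1.0)
    return custer / reno


central = custer_to_reno(6)
deviations = {}
for factor in (4, 5, 6, 8, 10):
    ratio = custer_to_reno(factor)
    deviations[factor] = ratio / central - 1.0
    print(f"1:{factor:<2d} Custer/Reno = {ratio:.2f} ({deviations[factor]:+.1%})")
```

Across 1:4 to 1:10 the Custer/Reno ratio moves by roughly 3% in either direction under this weighting, which is consistent with the "less than 5%" statement.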
-------------Multiple Names?------------------------------------------
Then there is the question of multiple names for the same individual. Some time ago Fred assured me that he was aware of the issue and was fairly confident that he had caught the majority of those.
But after taking some time to compare the names on the list with the biography snippets Fred included in the more exhaustive "Indian participants" list, with the roster of known Indian casualties compiled by Rod Thomas here:
www.littlebighorn.info/Articles/IndianCasualties.pdf
and with the discussion in Hardorff's Hooka Hey!, I'm fairly sure that I was still able to identify a significant number of duplicates, without getting anywhere near finding them all.
So my number of unique Indian names is only 221 compared to Fred's 244.
How many of those 221 might still be duplicates? Here the exhaustive casualty list by Rod Thomas was very helpful. Of the 31 confirmed individual warriors on that list, Fred has 28 accounted for on the 25th (the missing 3 most likely occurred on the 26th), but those 28 individuals appear 47 times on the list, i.e. there are 19 duplicates.
Most of those are Cheyenne: 12 of the 19 entries for the 7 known Cheyenne casualties are duplicates.
In contrast, out of 28 Lakota casualty entries, only 7 are duplicates.
For obvious reasons, those ratios represent an upper limit for the fraction of duplicates on the list, with true values likely much smaller.
To be conservative, I will assume 20% duplicates beyond what would be expected from a random sampling process for the Sioux, 50% duplicates for the Cheyenne/Arapaho for error range calculations.
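The duplicate fractions implied by the casualty entries are quick to check. A minimal sketch using only the counts quoted above; the dictionary layout and names are just for illustration.

```python
# Casualty entries on the names list and how many of them are duplicates.
CASUALTY_ENTRIES = {
    "Cheyenne": {"entries": 19, "duplicates": 12},
    "Lakota": {"entries": 28, "duplicates": 7},
}

# Duplicate share assumed for the error ranges (Cheyenne/Arapaho and Sioux).
ASSUMED = {"Cheyenne": 0.50, "Lakota": 0.20}

upper_limits = {}
for group, counts in CASUALTY_ENTRIES.items():
    upper_limits[group] = counts["duplicates"] / counts["entries"]
    individuals = counts["entries"] - counts["duplicates"]
    print(
        f"{group}: {individuals} individuals in {counts['entries']} entries, "
        f"duplicate share {upper_limits[group]:.0%} (assumed {ASSUMED[group]:.0%})"
    )
```

The casualty-based shares come out at roughly 63% and 25%, so the assumed 50% and 20% sit somewhat below those upper limits.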
----Probability Distribution - Confidence Level - Error Ranges?-------
(For those who might be interested in checking out my results, an excel sheet with the data and calculations will be attached to the post----Gahh, forgot it, and can't edit it in, if someone wants it, please PM me----. Excuse me if I'm not using the correct English mathematical terminology, but anyone who knows this stuff should be able to understand it anyway)
As always, the disclaimer that I'm not that firm in statistics and am always open to constructive criticism, e.g.
"no, you should have used this and this method instead, which gives better results for this and this reason, while still being reasonably uncomplicated"
As already indicated in the text above, we have a not-entirely-random sample of some 200 names, representing a subsample of a larger population of possibly 1000-2500 unique individuals.
I will approximate the problem as several independent binomial distributions for each case of "fought/fought not Custer/Reno/both".
For a more mathematically rigorous analysis the multivariate hypergeometric or multinomial distributions might be preferable, but that would be most likely beyond the scope of our little forum here.
Especially as the standard binomial distribution will be a good approximation in the case of the Sioux, and a not-quite-so-good, but likely erring on the high side approximation for the Cheyenne.
For the binomial distribution we get a standard deviation of
\sigma = \sqrt{n*p*(1-p)},
where n is our sample size, and p the probability for a positive result.
An Example:
With 112 Sioux names known to have fought Custer, out of a total of n=152 Sioux names, we get a probability p=112/152=0.737, yielding a standard deviation of
\sigma = \sqrt{152*0.737*0.263} = 5.43
This means that mathematically, if we could repeatedly draw 152 names out of a box that contained all the names of Sioux warriors who fought at the LBH, then with a probability of about 68% our result would be between 106.57 and 117.43 times "fought Custer". Obviously there are no fractional names, so it's more appropriate to use about 70% probability for a result between 107 and 117 that fought Custer.
(This is not exactly correct. Strictly speaking, the true, unknown value lies within +/-5.43 of our 112 with a probability of 68%)
If you want a higher probability, you need to extend the confidence interval.
For 2 * \sigma you get a probability of about 95%, and that's what I have used.
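The binomial example can be reproduced in a few lines. This is a sketch only; the function and variable names are illustrative. Note that n here is the whole sample of 152 Sioux names, not the 112 positives; putting 112 in its place would shrink the result to about 4.66.

```python
import math


def binomial_sigma(n, p):
    """Standard deviation of the number of positives in n draws with probability p."""
    return math.sqrt(n * p * (1.0 - p))


n_names = 152       # all Sioux names in the sample
fought_custer = 112  # of those, names known to have fought Custer

p = fought_custer / n_names
sigma = binomial_sigma(n_names, p)

for k, coverage in ((1, "about 68%"), (2, "about 95%")):
    low = fought_custer - k * sigma
    high = fought_custer + k * sigma
    print(f"{k}-sigma ({coverage}): {low:.1f} to {high:.1f}")
```

Doubling the interval width is what moves the coverage from roughly 68% to roughly 95%, under the usual normal approximation to the binomial.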
......
Okay, when actually formulating that stuff, it gets lengthier and lengthier, and I suspect nobody is still with me here anyway ... ?
So unless there is explicit demand, I will mostly cut it here
......
With one exception, the approximation used for calculating the compound errors of the final ratios given at the top of the post.
This is the approximation used by pretty much anyone evaluating error propagation for any practical purpose:
\sigma_f =\sqrt{\sum_{i=1}^n (\frac{\partial f}{\partial x_i})^2 {\sigma_i}^2}
for more details and examples:
en.wikipedia.org/wiki/Propagation_of_uncertainty
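For anyone who wants to try the formula without doing the partial derivatives by hand, here is a minimal sketch that estimates them numerically with central differences. It assumes independent input errors; the helper name `propagate` and the step size are illustrative choices, and the example reuses the Custer ratio and the 900+/-100 Reno base from the top of the post.

```python
import math


def propagate(f, values, sigmas, rel_step=1e-6):
    """First-order error propagation for f(*values) with independent errors.

    Each partial derivative is approximated by a central difference.
    """
    total = 0.0
    for i, (x, sigma) in enumerate(zip(values, sigmas)):
        h = abs(x) * rel_step or rel_step  # fall back to rel_step when x == 0
        upper = list(values)
        lower = list(values)
        upper[i] = x + h
        lower[i] = x - h
        dfdx = (f(*upper) - f(*lower)) / (2.0 * h)
        total += (dfdx * sigma) ** 2
    return math.sqrt(total)


def custer_warriors(ratio, reno):
    """Warriors fighting Custer = Custer/Reno ratio times the Reno number."""
    return ratio * reno


error = propagate(custer_warriors, [1.60, 900.0], [0.22, 100.0])
print(f"Custer: {custer_warriors(1.60, 900.0):.0f}+/-{error:.0f}")
```

For a simple product like this the numerical result agrees with the closed form sqrt((ratio*sigma_reno)^2 + (reno*sigma_ratio)^2); the numerical version is mainly useful for more awkward functions such as weighted ratios.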