Statistics and the International Genealogical Index (IGI)

The only statistical study of the IGI that I have seen is Martin Ecclestone's article "The diffusion of English surnames" (The Local Historian, 1989), which examines the 1988 edition of the IGI. What follows is a paraphrase of his illuminating article, which was groundbreaking at the time. Now that the IGI is available on CD-ROM and online, can these latest versions deliver similar, or even better, statistical information?

All statistics in the following are copyright 1989 Martin Ecclestone, who has graciously approved their reproduction here.

Mr Ecclestone wrote this article before the advent of the CD-ROM version of the IGI. A great advantage of the earlier microfiche version was its amenability to statistical analysis. Because the fiche frames are numbered, it was straightforward to count the entries for each surname and the proportion of a county's index that any given surname's entries constitute. Repeating the exercise for each of the 39 English counties allows the geographical distribution of that surname's frequency to be tabulated.
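As a rough sketch of this counting exercise, assuming a surname's entries have already been measured in frames, its share of a county's index is simply its frames divided by the county's total frames. The county totals below come from the 1988 table further down, and the Fuller and Dance frame counts from the examples later in the article; the names COUNTY_FRAMES, SURNAME_FRAMES and surname_share are purely illustrative.

```python
# Illustrative sketch: what share of a county's IGI fiche frames a surname occupies.
# County totals are the "frames" column of the 1988 table; surname frame counts
# are the Fuller (Bedford) and Dance (Worcester) examples quoted in the article.

COUNTY_FRAMES = {"Bedford": 14849, "Worcester": 23546}

# Frames occupied by each surname in each county index (fractions allowed).
SURNAME_FRAMES = {
    "Fuller": {"Bedford": 7.5},
    "Dance": {"Worcester": 22},
}


def surname_share(surname: str) -> dict[str, float]:
    """Return, per county, the surname's frames as a percentage of all frames."""
    return {
        county: 100 * frames / COUNTY_FRAMES[county]
        for county, frames in SURNAME_FRAMES[surname].items()
    }


for name in SURNAME_FRAMES:
    for county, pct in surname_share(name).items():
        print(f"{name} in {county}: {pct:.3f}% of frames")
```

Repeating the calculation with frame counts for all 39 counties would give the tabulated distribution the article describes.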

An obvious drawback is the wide range of dates in the IGI, from 1538 to about 1900; another is the well-known inconsistency of its geographical coverage. Do these weaknesses invalidate any findings? Mr Ecclestone attempts to address both issues.

He tackles the dating objection by constructing a histogram of 2760 dates selected at random from the Index. The resulting graph shows a steady growth in entries from 1538 to a peak in 1837, after which there is a dramatic drop. This occurs because many transcriptions of parish records stop in 1837, when the civil registration records indexed at St Catherine's House begin.
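A minimal sketch of the histogram step, assuming the sampled dates are available as a list of years. The sample below is invented for illustration (Ecclestone's real sample held 2760 dates), and decade_histogram is a hypothetical helper name; the code simply groups the years into decades and prints a text bar for each, so a fall-off after the 1830s would show up as shorter bars.

```python
from collections import Counter


def decade_histogram(years):
    """Group a sample of entry years into decades (e.g. 1837 -> 1830)."""
    return Counter((year // 10) * 10 for year in years)


# Hypothetical sample of IGI entry years; NOT Ecclestone's data.
sample = [1541, 1603, 1688, 1702, 1755, 1761, 1799,
          1812, 1831, 1835, 1836, 1841, 1872]

hist = decade_histogram(sample)
for decade in sorted(hist):
    # One '#' per sampled entry falling in that decade.
    print(f"{decade}s {'#' * hist[decade]}")
```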

The histogram and the following table show that the 1988 IGI entries chiefly represent eighteenth-century England.

County                   Frames  1801 pop./frames  Median date
Bedford                   14849              4.27         1754
Berkshire                 12206              8.95         1754
Bucks                     12470              8.62         1754
Cambridge                  9179              9.73         1746
Cheshire                  12352             15.52         1754
Cornwall                  26719              7.05         1789
Cumberland                18022              6.50         1803
Derby                     21343              7.55         1796
Devon                     42081              8.15         1726
Dorset                     3987             28.92         1774
Durham                    19975              8.03         1758
Essex                     12044             18.80         1734
Gloucester                25905              9.68         1762
Hants/IOW                 21411             10.26         1808
Hereford                   6115             14.59         1777
Hertford                  18015              5.42         1741
Hunts                       991             37.91         1772
Kent                      24875             12.37         1773
Lancashire                83541              8.05         1814
Leicester                 17348              7.50         1774
Lincoln                   37412              5.57         1711
London                   154724              5.29         1770
Norfolk                   20116             13.59         1735
Northants                  5407             24.37         1746
Northumberland            21707              7.24         1767
Nottingham                17267              8.13         1801
Oxford                     8649             12.67         1798
Rutland                    1900              8.61         1741
Shropshire                23931              7.01         1773
Somerset                   8820             31.04         1752
Stafford                  33846              7.07         1780
Suffolk                   21171              9.94         1779
Surrey                    21121             12.74         1810
Sussex                    20815              7.65         1748
Warwick                   39389              5.29         1818
Westmoreland               5705              7.29         1770
Wiltshire                 11613             15.94         1772
Worcester                 23546              5.92         1806
Yorkshire                102989              8.34         1783
ENGLAND                  987574              8.39         1772
WALES + MONMOUTH (1984)   32589             18.01         1820

Note: The number of frames excludes frames with no surname.
Note: The median date is the date for which there are as many earlier dates as later dates in the sample.
For England as a whole, the median date is 1772, the lower quartile date is 1693 and the upper quartile date is 1820. Thus 50% of the IGI entries fall within an interquartile range of 127 years (1693 to 1820).
The earliest median date for a county is Lincolnshire's (1711), whilst the latest is Warwickshire's (1818). The interquartile range for any county can be estimated as the difference between 1888 and its median date.
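The median and quartile dates can be computed from any sample of entry years. The sketch below uses Python's statistics.quantiles (default "exclusive" interpolation, which may differ slightly from however Ecclestone read off his quartiles). The seven-year toy sample is invented so that its quartiles happen to equal the England figures above; date_summary is a hypothetical helper name. The last line applies the article's 1888 rule of thumb for comparison.

```python
import statistics


def date_summary(years):
    """Return lower quartile, median, upper quartile and interquartile range."""
    # statistics.quantiles with n=4 returns the three cut points [Q1, Q2, Q3].
    q1, median, q3 = statistics.quantiles(years, n=4)
    return q1, median, q3, q3 - q1


# Toy sample of entry years (hypothetical, chosen for easy checking).
toy = [1650, 1693, 1720, 1772, 1800, 1820, 1850]
q1, median, q3, iqr = date_summary(toy)
print(f"Q1={q1:.0f} median={median:.0f} Q3={q3:.0f} IQR={iqr:.0f}")

# The article's rule of thumb: a county's IQR is roughly 1888 minus its median.
print("rule-of-thumb IQR:", 1888 - median)
```

For the England figures the rule of thumb gives 116 years against the measured 127, so it is best treated as an approximation.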

IGI County Coverage

Column 3 of the above table is the ratio of each county's 1801 population to its number of IGI frames. This ratio is 8.39 for England as a whole, but varies between 4.27 (Bedfordshire) and 37.91 (Huntingdonshire). High values indicate counties that are under-represented in the IGI relative to their 1801 population; conversely, low values indicate the counties whose registers are the most complete or have been most fully transcribed.
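To illustrate how column 3 is read, the sketch below takes four rows from the table, sorts them by ratio and labels each against the national figure of 8.39. Multiplying frames by the ratio gives back approximately the county's 1801 population, a handy check on the table. The choice of rows, the wording of the labels and the function name coverage_report are illustrative assumptions, not Ecclestone's.

```python
# A few rows from the 1988 table: county -> (IGI frames, 1801 population / frames).
TABLE = {
    "Bedford": (14849, 4.27),
    "Lancashire": (83541, 8.05),
    "Dorset": (3987, 28.92),
    "Hunts": (991, 37.91),
}
ENGLAND_RATIO = 8.39


def coverage_report(table, national=ENGLAND_RATIO):
    """Sort counties from best to worst covered and flag under-representation."""
    rows = []
    for county, (frames, ratio) in sorted(table.items(), key=lambda kv: kv[1][1]):
        # frames x (population / frames) recovers roughly the 1801 population.
        implied_pop = round(frames * ratio)
        # More people per frame than the national figure means fewer IGI entries
        # than the county's population would lead us to expect.
        status = "under-represented" if ratio > national else "better than average"
        rows.append((county, ratio, implied_pop, status))
    return rows


for county, ratio, pop, status in coverage_report(TABLE):
    print(f"{county:<12} ratio {ratio:5.2f}  ~{pop:>7} people in 1801  {status}")
```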

Back Projection

"The tabulated ratios may be used to convert the number of frames containing a particular surname into an estimate of the 1801 population of that surname."

Mr Ecclestone cites the example of the surname Fuller. There are 7.5 frames of Fullers in the Bedfordshire index, so he estimates that there were 32 (7.5 x 4.27) Fullers in the county in 1801. Applying the same method to every county and summing the results gives an estimate of 4275 Fullers for England as a whole.

For my own surname, Dance, there are 22 frames for Worcestershire, which equates to a population of 130 (22 x 5.92) in 1801. I know from the census returns that the actual population in 1851 was 144, so the estimate of 130 is a reasonable one. It is, however, important first to cleanse the IGI data of any duplicates or patron submittals.
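The back-projection arithmetic is a weighted sum: for each county, multiply the surname's frame count by that county's ratio, then add the counties together. A minimal sketch, with back_project as a hypothetical helper and only the two ratios needed for the worked examples; a national total such as the 4275 Fullers would require frame counts for every county, which are not given here.

```python
# Ratios (1801 population / IGI frames) taken from the 1988 table.
RATIOS = {"Bedford": 4.27, "Worcester": 5.92}


def back_project(frames_by_county, ratios=RATIOS):
    """Estimate a surname's 1801 population as the sum of frames x county ratio."""
    return sum(frames * ratios[county] for county, frames in frames_by_county.items())


print(round(back_project({"Bedford": 7.5})))   # Fuller in Bedfordshire: 32
print(round(back_project({"Worcester": 22})))  # Dance in Worcestershire: 130
```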

IGI Births/Marriages/Deaths Coverage

Martin Ecclestone says that "a measure of completeness of the English index is the proportion of births and marriages that are recorded as IGI entries at different periods." He gives the proportion of marriages (derived from random sampling) as:

1540-1599 40%
1600-1699 39%
1700-1799 34%
1800-early 1800s 29%

The IGI counts are then compared with an independent estimate of the number of marriages that actually took place in selected decades. The same procedure is used to compare IGI baptismal records with total births.

Decade    IGI bapt.  IGI marr.  Total births  Total marr.  % bapt. in IGI  % marr. in IGI
1570-9         0.16       0.12         1.135        0.333             14%             36%
1620-9         0.56       0.18         1.517        0.372             37%             48%
1650-9         0.37       0.11         1.445        0.452             26%             24%
1670-9         0.55       0.18         1.471        0.354             37%             51%
1720-9         0.71       0.22         1.754        0.480             40%             46%
1770-9         1.16       0.30         2.409        0.589             48%             51%
1820-9         1.96       0.40         4.770        0.980             41%             41%
Note: baptism, birth and marriage counts are in millions.

The above table summarises his results from seven selected decades. "It demonstrates that births and marriages are more or less equally recorded except during the sixteenth century" and "apart from the Commonwealth period... the IGI is 40% to 50% complete between 1600 and 1837."
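The percentage columns are simply IGI counts divided by the estimated totals. As a check, the sketch below recomputes them from the first five columns; every rounded value agrees with the printed table. The names ROWS, pct and results are illustrative only.

```python
# Rows from the decade table: (decade, IGI baptisms, IGI marriages,
# total births, total marriages), all in the same units as the table.
ROWS = [
    ("1570-9", 0.16, 0.12, 1.135, 0.333),
    ("1620-9", 0.56, 0.18, 1.517, 0.372),
    ("1650-9", 0.37, 0.11, 1.445, 0.452),
    ("1670-9", 0.55, 0.18, 1.471, 0.354),
    ("1720-9", 0.71, 0.22, 1.754, 0.480),
    ("1770-9", 1.16, 0.30, 2.409, 0.589),
    ("1820-9", 1.96, 0.40, 4.770, 0.980),
]


def pct(part, whole):
    """Percentage of events captured by the IGI, rounded to a whole number."""
    return round(100 * part / whole)


results = {decade: (pct(ib, tb), pct(im, tm)) for decade, ib, im, tb, tm in ROWS}
for decade, (bapt, marr) in results.items():
    print(f"{decade}: baptisms {bapt}%, marriages {marr}%")
```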

Mr Ecclestone concludes that, for the 18th century, the IGI contains almost half of all the records it could possibly hold. (This figure needs to be adjusted for individual counties, as shown in the first table.) Although the median date varies for each county, "since surname distributions change rather slowly, it is felt that those which are obtained from the IGI data are probably fair descriptions of the mid-eighteenth century situation."

The article goes on to present case studies of actual surnames and shows how their diffusion can be measured. Overall, it is a fascinating article; if you are interested in the possibilities of the IGI, seek out a copy.

The Local Historian is published by the British Association for Local History (BALH).