I’m working on a problem which means I have to work out whether a given set of astronomical alignments is significant. I have a possible solution, so now I’m testing it on someone else’s data. What I’m doing is treating the data as a binomial distribution. I have a few aims with this technique. First, it has to give reliable results. Next, I have to understand it. Thirdly, and equally importantly, I have to be able to describe it so that archaeologists and historians can follow the argument. If they can’t, then writing it up is pointless. My analysis may not be correct, so I’m putting it up here and submitting it to Carnival of Mathematics and Tangled Bank to see if people think the maths is wrong. I’m also putting it up on Revise and Dissent, where it will get submitted to the History Carnival and Four Stone Hearth to see if it’s intelligible and sounds reasonable to historians and archaeologists.
Roman Camps and their Orientations reconsidered.
There is currently a debate in the pages of the Oxford Journal of Archaeology on the orientations of Roman camps and forts. Richardson (2005:415-426) argues that the orientation of these camps is non-random and relied on some form of astronomical observation. He presents data which he argues supports his case. Recently Peterson (2007:103-108) has argued that this relies on a flawed use of the Chi-squared test. I accept Peterson’s finding that Chi-squared is not a useful method here. However, examining the camps as a binomial distribution would be feasible and would make explicit the archaeological and astronomical assumptions made in the argument.
What is a Roman Camp?
The sites being examined are Roman camps and forts in England. One of the major advantages that the Roman army had over the native opposition when occupying new territory was its organisation. The Roman army was effectively a professional army taking on amateurs. Their camps reflect this organisation. Typically their early camps were a ditch surrounded by a bank in a playing-card shape. They followed a set design. The rationale for this was that if they were attacked by surprise, equipment and people would be in the same place at each camp, minimising the effects of the surprise.
A Roman fort at Wallsend. Photo from Google Earth
The ancient sources give some detail on how to lay out a Roman camp. The main gate should face the enemy, or the line of advance (Vegetius 1.23, Hyginus 56). The rear gate should be on the higher ground to aid surveillance. Sites overlooked by hills were considered a bad idea, as were sites near woodlands which would allow the enemy to sneak up on the camp. The basic layout of the camp could be set up quickly by surveyors using gromae, surveying tools for laying out lines at right angles. Hyginus (chapter 12) states that you set up your groma at the junction at the centre of the camp and lay out your roads to the gates from there.
This would appear to be an efficient method of laying out a camp. Were observations to orientate the camp also part of the method? It doesn’t seem necessary, but Richardson (415, 422-23) provides quotes from ancient sources which suggest this is a plausible hypothesis in some circumstances.
Testing the camps for astronomical significance
To test his hypothesis Richardson has examined plans and measured the angle the long axis of a camp makes to the meridian, which yields an azimuth from true north. The accuracy of these measurements is questionable, not through Richardson’s work but in that they rely on an accurate marking of north. A bigger difficulty is that such measurements do not include declination data. If the local horizon is flat then an alignment will point to a different part of the celestial sphere than if it is mountainous. Finally Richardson notes that some Roman camps, being square, lacked a long axis. He states “…these were allocated to the nearest convenient group.” He then applies the Chi-squared test to his data. This is identified by Peterson as the weak point of Richardson’s analysis, so this should be examined closely.
Richardson’s hypothesis is that there is a deliberate orientation towards the cardinal points. There are 360 degrees in a circle and four cardinal points. A camp can point directly at a cardinal point. It can be within one degree of the target, within two degrees, within three degrees and so on. However, by the time you get to forty-five degrees you’re working closer to the next cardinal point. Richardson therefore says there are forty-five possible probability bins into which a camp can be placed. If there are sixty-seven camps in the sample, you would expect 67/45 = 1.49 camps in each bin, so between one and two. What Richardson finds is that some bins have considerably more. For the Chi-squared test what you do is go to each bin and subtract the expected value from the observed value. You take the answer and square it, and then divide it by the expected value. You do this for all the bins, add up the results and call this final value chi-squared. Confusingly the Greek letter χ (chi) looks like an X, so you’ll often see it written X².
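The recipe above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name chi_squared is my own, and the observed counts are invented for the example. They are not Richardson’s data.

```python
def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

n_camps = 67   # sample size quoted in the text
n_bins = 45    # one bin per degree away from the nearest cardinal point

# If orientation were random, every bin would expect the same share.
expected = [n_camps / n_bins] * n_bins

# HYPOTHETICAL observed counts, made up purely for illustration:
# three crowded bins, ten bins with two camps, the rest with one.
observed = [6, 5, 4] + [2] * 10 + [1] * 32

statistic = chi_squared(observed, expected)
degrees_of_freedom = n_bins - 1  # 44, as in the text

print(statistic, degrees_of_freedom)
```

The value printed is what you would then look up in a table at 44 degrees of freedom.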
Now what does this value tell us? What we need to do is look up the value in a χ² table. Richardson has argued there are 45 different bins, so we look up the χ² value with 44 degrees of freedom. This will give us a range of values. The higher χ² gets, the less probable it is that the result arose by chance. In Richardson’s case he argues that his value of χ² would happen less than one time in a thousand by chance. The problem is how robust these results are.
This is Peterson’s key criticism: he argues that Chi-squared tests cannot reliably work on small samples. Worryingly, he argues that the minimum expected value in any bin should be five, which means the minimum sample size for such a test should be 5×45 = 225 camps. As a general rule this sample size is simply larger than could reasonably be hoped for.
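To make the size of the gap concrete, here is a small sketch of the two sums involved. The function names are hypothetical and the minimum expected count of five is the rule of thumb cited above.

```python
def expected_per_bin(sample_size, n_bins):
    """Expected number of camps per bin if orientations were random."""
    return sample_size / n_bins

def minimum_sample_size(n_bins, min_expected=5):
    """Smallest sample that gives every bin the minimum expected count."""
    return n_bins * min_expected

print(round(expected_per_bin(67, 45), 2))  # 1.49
print(minimum_sample_size(45))             # 225
```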
There may be an alternative method. There is a probability that the camps will or will not satisfy certain criteria. This would appear to be suitable for examination as a binomial distribution. A binomial distribution describes how many times an event occurs in n independent attempts, where the event either will or will not occur on each attempt and n is the sample size.
The easiest example is tossing a coin. It either will or will not land on heads. If you tossed sixty coins at the same time, then on average you would get thirty heads. But not everyone who tried this would get exactly thirty heads. The results would be spread in a roughly normal distribution around thirty: some people would have thirty-one heads, others only twenty-nine. The formulae governing a binomial distribution are simple.
n is the sample size
p is the probability the event will happen, between 0 (absolutely will not happen) and 1 (absolutely will happen).
q is the probability the event will not happen, which will be 1-p
The average result will be np. In our case of sixty coin tosses, the average result will be 60×0.5 = 30 heads.
The standard deviation will be √(npq)
The reason you would want the standard deviation is that it tells you how significant a result might be. If you toss sixty coins in this distribution, about two thirds of the time you will have 30 +/- one standard deviation heads. About ninety-five per cent of the time you will have 30 +/- two standard deviations. Over ninety-nine per cent of the time you will have 30 +/- three standard deviations. Usually in the social sciences people get interested after two standard deviations.
Graph of probable results of tossing 60 coins. Red area is one standard deviation. Red and Orange two standard deviations.
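For anyone who wants to check the coin example, the two formulae can be written out as below. This is a minimal sketch and binomial_summary is just a name chosen for the illustration.

```python
import math

def binomial_summary(n, p):
    """Return the mean (np) and standard deviation (sqrt(npq))."""
    q = 1 - p
    mean = n * p
    sd = math.sqrt(n * p * q)
    return mean, sd

mean, sd = binomial_summary(60, 0.5)
print(mean)          # 30.0
print(round(sd, 2))  # 3.87

# Bands of one, two and three standard deviations around the mean.
for k in (1, 2, 3):
    print(k, round(mean - k * sd, 1), round(mean + k * sd, 1))
```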
This method doesn’t require the probability to be 50:50. Take a standard six-sided die. The probability of throwing a six is one in six. If you throw sixty dice you should expect about ten sixes. Sometimes you’ll have more, other times fewer. Would it be unusual to throw twelve? In this case the formulae can show the answer.
Average = np = 60×1/6 = 10
Standard Deviation = √(npq) = √ (60×1/6×5/6) = 2.886751
So two thirds of the time you should expect to throw between eight and twelve sixes.
Chance of throwing sixes with sixty dice. Red area is one standard deviation. Red and Orange two standard deviations.
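The standard deviation rule is an approximation. As a cross-check, and not something the argument above depends on, the exact binomial probabilities can be added up directly. The helper names here are my own, and math.comb needs Python 3.8 or later.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Exact probability of exactly k successes in n attempts."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least(k, n, p):
    """Exact probability of k or more successes in n attempts."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

n, p = 60, 1 / 6

# Chance of throwing twelve or more sixes with sixty dice.
print(prob_at_least(12, n, p))

# Chance of landing between eight and twelve sixes inclusive.
print(sum(binomial_pmf(k, n, p) for k in range(8, 13)))
```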
The Binomial distribution of Roman camps
What has this got to do with Roman camps? The Roman camps have a probability associated with them. They could line up as they do by chance. So to assign a value to p we need to state openly what the probability is that they align as they do by chance. This is an explicitly archaeological question. To return to the camps, how do you align a camp to north?
You could hold up a plumb line to the North Star and let it drop. If you do this then there’s a problem. Polaris isn’t exactly on the North Celestial Pole. In AD 250 it was at 89° 17′. So as it circles the pole there’s a spread of about one and a half degrees. If you factor in at least some human error then the tightest window you can argue for is two degrees, which makes p 2/360. If you bring in evidence from the historical data then you could point out these camps were built during the day, when Polaris wasn’t visible. This could make the alignments less accurate, and anything +/- five degrees might have been thought to have been aligned north, in which case p is 10/360. Others might argue for other accuracies, but to get a value for p you have to be explicit about which archaeological or historical evidence you’re using. Once you have a value you can plug it in and, if you like, use a few other values too to see how robust your results are.
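Trying a few values of p is easy to do mechanically. The sketch below takes the two windows discussed above and, as an assumption for the sake of the example, applies them to the sample of sixty-seven camps mentioned earlier. The function name is hypothetical.

```python
import math

def chance_of_alignment(window_degrees):
    """Probability that a randomly oriented camp falls inside a
    window of the given total width, out of 360 degrees."""
    return window_degrees / 360

n_camps = 67  # ASSUMPTION: the sample size quoted earlier in the post

# Windows of 2 and 10 degrees, i.e. p = 2/360 and p = 10/360.
for window in (2, 10):
    p = chance_of_alignment(window)
    mean = n_camps * p
    sd = math.sqrt(n_camps * p * (1 - p))
    print(window, round(p, 4), round(mean, 2), round(sd, 2))
```

If the conclusion changes a lot between plausible windows, that is a sign the result rests on the choice of p and not on the data.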
Richardson gives his data in 10° wide blocks between 0° and 180° from north. This gives a camp a 1/18 chance of being in any bin and allows us to plot this probability graph. The average is about four (67×1/18 ≈ 3.7). I’ve shaded to six green and to eight orange. In reality the standard deviation is 1.9, so the shading should stop just short, but his result of six camps facing within ten degrees of north does not look significant enough to be troubled with.
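The same check in code, using only the figures given above (67 camps, a 1/18 chance per bin, six camps observed). Expressing the result as a number of standard deviations from the mean is my own addition for illustration.

```python
import math

n = 67        # camps in the sample
p = 1 / 18    # chance of landing in any one 10-degree bin
observed = 6  # camps within ten degrees of north

mean = n * p
sd = math.sqrt(n * p * (1 - p))

# How many standard deviations the observed count sits above the mean.
z = (observed - mean) / sd

print(round(mean, 1))  # 3.7
print(round(sd, 1))    # 1.9
print(round(z, 1))     # 1.2
```

At roughly 1.2 standard deviations the result falls well short of the two usually asked for.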
His other hypothesis is that the camps were aligned within 10° of the cardinal points. In this instance this makes p equal to 4/18. This is an interesting result. On average we should expect fifteen camps to match. The standard deviation of 3.4 means that 15 +/- 6.8, so between about eight and twenty-two camps, should face cardinal points by chance. There is an element of doubt. If you throw enough tests at the data then you’d expect about one in twenty to have significance at two standard deviations. Nonetheless as a filter to ask if an idea is worth investigating further it is very useful.
Camps aligned to cardinal points.
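Here is a sketch of the cardinal point figures, followed by a rough illustration of the one-in-twenty worry. The second function is my own addition and assumes the tests are independent of each other, which real tests on the same data usually are not.

```python
import math

n = 67      # camps in the sample
p = 4 / 18  # four of the eighteen 10-degree bins touch a cardinal point

mean = n * p
sd = math.sqrt(n * p * (1 - p))
low, high = mean - 2 * sd, mean + 2 * sd

print(round(mean, 1), round(sd, 1))   # 14.9 3.4
print(round(low, 1), round(high, 1))  # 8.1 21.7

def chance_of_false_alarm(n_tests, alpha=0.05):
    """Chance that at least one of n_tests independent tests looks
    significant at level alpha when nothing is really going on."""
    return 1 - (1 - alpha) ** n_tests

print(round(chance_of_false_alarm(20), 2))  # 0.64
```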
Part of the problem is that it’s clear that local considerations, like the presence of a hostile force or the direction of travel, took precedence in planning. With closer survey it would be possible to eliminate some of these known non-astronomical alignments. The easiest category to remove would be those whose alignments match the alignments of local roads, as it seems highly likely that the army would have stopped there while moving along that route. The next step would be to survey the sites and see if other limitations like local topography define the orientations of camps. If pre-defined orientations can be eliminated then it may be possible to show astronomical alignment was used as a guide when there wasn’t a local enemy to face, or a route to follow.
There are a couple of advantages in using a binomial test for statistical significance. One is that it is comparatively simple to use next to Chi-squared. Chi-squared may not be arcane mathematics, but Peterson’s suggestion that help needs to be sought could be made to most archaeologists. Even in cases where Chi-squared is appropriate, if archaeologists cannot follow the argument, then perhaps another method is needed. Hopefully the above method is slightly more transparent.
The other advantage is that it can be applied to smaller samples. If the sample is too small to be statistically significant then this will become apparent. Take for instance tossing two coins. If they both come up heads, is that significant? np states the expected average is one, and two standard deviations would be 1.4, so we would expect 95% of samples to fall between 1 +/- 1.4. Given the maximum result will be 2, we will always be in this range. The test tells us the sample is too small to be helpful.
Also, because the test is about evaluating the probability something will happen, it can be used for non-spatial data. For example, recently there have been claims that the Iron Age peoples of Fiskerton were able to predict lunar eclipses using Saros cycles. This is data distributed in time, but nonetheless it should be possible to use the archaeological record and the data to argue for a value of p for the probability of predicting an eclipse in a Saros cycle within a given timeframe. That would probably be the other example used if I expand this into a full paper.
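The two coin check can be turned into a simple rule of thumb: if the band of two standard deviations either side of the mean covers every possible result, no outcome can ever look significant. The sketch below is one way of writing that down. The function names are hypothetical and the rule is only as good as the two standard deviation convention itself.

```python
import math

def two_sd_band(n, p):
    """Mean minus and plus two standard deviations."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean - 2 * sd, mean + 2 * sd

def sample_too_small(n, p):
    """True if every possible count from 0 to n lies inside the band."""
    low, high = two_sd_band(n, p)
    return low <= 0 and high >= n

print(sample_too_small(2, 0.5))   # True
print(sample_too_small(60, 0.5))  # False
```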
Peterson, J.W.M. 2007. ‘Random Orientation of Roman Camps’. Oxford Journal of Archaeology 26(1). 103-108.
Richardson, A. 2005. ‘The Orientation of Roman Camps and Forts’. Oxford Journal of Archaeology 24(4). 415-426.