Tag Archives: calibration

If you want to be cited, calibrate your agent-based model: A Reply to Chattoe-Brown

By Marijn A. Keijzer

This is a reply to a previous comment, (Chattoe-Brown 2022).

The social simulation literature has called on its proponents to enhance the quality and realism of their contributions through systematic validation and calibration (Flache et al., 2017). Model validation typically refers to assessments of how well the predictions of their agent-based models (ABMs) map onto empirically observed patterns or relationships. Calibration, on the other hand, is the process of enhancing the realism of the model by parametrizing it based on empirical data (Boero & Squazzoni, 2005). We would expect that presenting a validated or calibrated model serves as a signal of model quality, and would thus be a desirable characteristic of a paper describing an ABM.

In a recent contribution to RofASSS, Edmund Chattoe-Brown provocatively argued that model validation does not bear fruit for researchers interested in boosting their citations. In a sample of articles from JASSS published on opinion dynamics he observed that “the sample clearly divides into non-validated research with more citations and validated research with fewer” (Chattoe-Brown, 2022). Well-aware of the bias and limitations of the sample at hand, Chattoe-Brown calls on refutation of his hypothesis. An analysis of the corpus of articles in Web of Science, presented here, could serve that goal.

To test whether there exists an effect of model calibration and/or validation on the citation counts of papers, I compare citation counts of a larger number of original research articles on agent-based models published in the literature. I extracted 11,807 entries from Web of Science by searching for items that contained the phrases “agent-based model”, “agent-based simulation” or “agent-based computational model” in its abstract.[1] I then labeled all items that mention “validate” in its abstract as validated ABMs and those that mention “calibrate” as calibrated ABMs. This measure if rather crude, of course, as descriptions containing phrases like “we calibrated our model” or “others should calibrate our model” are both labeled as calibrated models. However, if mentioning that future research should calibrate or validate the model is not related to citations counts (which I would argue it indeed is not), then this inaccuracy does not introduce bias.

The shares of entries that mention calibration or validation are somewhat small. Overall, just 5.62% of entries mention validation, 3.21% report a calibrated model and 0.65% fall in both categories. The large sample size, however, will still enable the execution of proper statistical analysis and hypothesis testing.

How are mentions of calibration and validation in the abstract related to citation counts at face value? Bivariate analyses show only minor differences, as revealed in Figure 1. In fact, the distribution of citations for validated and non-validated ABMs (panel A) is remarkably similar. Wilcoxon tests with continuity correction—the nonparametric version of the simple t test—corroborate their similarity (W = 3,749,512, p = 0.555). The differences in citations between calibrated and non-calibrated models appear, albeit still small, more pronounced. Calibrated ABMs are cited slightly more often (panel B), as also supported by a bivariate test (W = 1,910,772, p < 0.001).

Picture 1

Figure 1. Distributions of number of citations of all the entries in the dataset for validated (panel A) and calibrated (panel B) ABMs and their averages with standard errors over years (panels C and D)

Age of the paper might be a more important determinant of citation counts, as panels C and D of Figure 1 suggest. Clearly, the age of a paper should be important here, because older papers have had much more opportunity to get cited. In particular, papers younger than 10 years seem to not have matured enough for its citation rates to catch up to older articles. When comparing the citation counts of purely theoretical models with calibrated and validated versions, this covariate should not be missed, because the latter two are typically much younger. In other words, the positive relationship between model calibration/validation and citation counts could be hidden in the bivariate analysis, as model calibration and validation are recent trends in ABM research.

I run a Poisson regression on the number of citations as explained by whether they are validated and calibrated (simultaneously) and whether they are both. The age of the paper is taken into account, as well as the number of references that the paper uses itself (controlling for reciprocity and literature embeddedness, one might say). Finally, the fields in which the papers have been published, as registered by Web of Science, have been added to account for potential differences between fields that explains both citation counts and conventions about model calibration and validation.

Table 1 presents the results from the four models with just the main effects of validation and calibration (model 1), the interaction of validation and calibration (model 2) and the full model with control variables (model 3).

Table 1. Poisson regression on the number of citations

# Citations
(1) (2) (3)
Validated -0.217*** -0.298*** -0.094***
(0.012) (0.014) (0.014)
Calibrated 0.171*** 0.064*** 0.076***
(0.014) (0.016) (0.016)
Validated x Calibrated 0.575*** 0.244***
(0.034) (0.034)
Age 0.154***
Cited references 0.013***
Field included No No Yes
Constant 2.553*** 2.556*** 0.337**
(0.003) (0.003) (0.164)
Observations 11,807 11,807 11,807
AIC 451,560 451,291 301,639
Note: *p<0.1; **p<0.05; ***p<0.01

The results from the analyses clearly suggest a negative effect of model validation and a positive effect of model calibration on the likelihood of being cited. The hypothesis that was so “badly in need of refutation” (Chattoe-Brown, 2022) will remain unrefuted for now. The effect does turn positive, however, when the abstract makes mention of calibration as well. In both the controlled (model 3) and uncontrolled (model 2) analyses, combining the effects of validation and calibration yields a positive coefficient overall.[2]

The controls in model 3 substantially affect the estimates from the three main factors of interest, while remaining in expected directions themselves. The age of a paper indeed helps its citation count, and so does the number of papers the item cites itself. The fields, furthermore, take away from the main effects somewhat, too, but not to a problematic degree. In an additional analysis, I have looked at the relationship between the fields and whether they are more likely to publish calibrated or validated models and found no substantial relationships. Citation counts will differ between fields, however. The papers in our sample are more often cited in, for example, hematology, emergency medicine and thermodynamics. The ABMs in the sample coming from toxicology, dermatology and religion are on the unlucky side of the equation, receiving less citations on average. Finally, I have also looked at papers published in JASSS specifically, due to the interest of Chattoe-Brown and the nature of this outlet. Surprisingly, the same analyses run on the subsample of these papers (N=376) showed a negative relationship between citation counts and model calibration/validation. Does the JASSS readership reveal its taste for artificial societies?

In sum, I find support for the hypothesis of Chattoe-Brown (2022) on the negative relationship between model validation and citations counts for papers presenting ABMs. If you want to be cited, you should not validate your ABM. Calibrated ABMs, on the other hand, are more likely to receive citations. What is more, ABMs that were both calibrated and validated are most the most successful papers in the sample. All conclusions were drawn considering (i.e. controlling for) the effects of age of the paper, the number of papers the paper cited itself, and (citation conventions in) the field in which it was published.

While the patterns explored in this and Chattoe-Brown’s recent contribution are interesting, or even puzzling, they should not distract from the goal of moving towards realistic agent-based simulations of social systems. In my opinion, models that combine rigorous theory with strong empirical foundations are instrumental to the creation of meaningful and purposeful agent-based models. Perhaps the results presented here should just be taken as another sign that citation counts are a weak signal of academic merit at best.

Data, code and supplementary analyses

All data and code used for this analysis, as well as the results from the supplementary analyses described in the text, are available here: https://osf.io/x9r7j/


[1] Note that the hyphen between “agent” and “based” does not affect the retrieved corpus. Both contributions that mention “agent based” and “agent-based” were retrieved.

[2] A small caveat to the analysis of the interaction effect is that the marginal improvement of model 2 upon model 1 is rather small (AIC difference of 269). This is likely (partially) due to the small number of papers that mention both calibration and validation (N=77).


Marijn Keijzer acknowledges IAST funding from the French National Research Agency (ANR) under the Investments for the Future (Investissements d’Avenir) program, grant ANR-17-EURE-0010.


Boero, R., & Squazzoni, F. (2005). Does empirical embeddedness matter? Methodological issues on agent-based models for analytical social science. Journal of Artificial Societies and Social Simulation, 8(4), 1–31. https://www.jasss.org/8/4/6.html

Chattoe-Brown, E. (2022) If You Want To Be Cited, Don’t Validate Your Agent-Based Model: A Tentative Hypothesis Badly In Need of Refutation. Review of Artificial Societies and Social Simulation, 1st Feb 2022. https://rofasss.org/2022/02/01/citing-od-models

Flache, A., Mäs, M., Feliciani, T., Chattoe-Brown, E., Deffuant, G., Huet, S., & Lorenz, J. (2017). Models of social influence: towards the next frontiers. Journal of Artificial Societies and Social Simulation, 20(4). https://doi.org/10.18564/jasss.3521

Keijzer, M. (2022) If you want to be cited, calibrate your agent-based model: Reply to Chattoe-Brown. Review of Artificial Societies and Social Simulation, 9th Mar 2022. https://rofasss.org/2022/03/09/Keijzer-reply-to-Chattoe-Brown


Today We Have Naming Of Parts: A Possible Way Out Of Some Terminological Problems With ABM

By Edmund Chattoe-Brown

Today we have naming of parts. Yesterday,
We had daily cleaning. And tomorrow morning,
We shall have what to do after firing. But to-day,
Today we have naming of parts. Japonica
Glistens like coral in all of the neighbouring gardens,
And today we have naming of parts.
(Naming of Parts, Henry Reed, 1942)

It is not difficult to establish by casual reading that there are almost as many ways of using crucial terms like calibration and validation in ABM as there are actual instances of their use. This creates several damaging problems for scientific progress in the field. Firstly, when two different researchers both say they “validated” their ABMs they may mean different specific scientific activities. This makes it hard for readers to evaluate research generally, particularly if researchers assume that it is obvious what their terms mean (rather than explaining explicitly what they did in their analysis). Secondly, based on this, each researcher may feel that the other has not really validated their ABM but has instead done something to which a different name should more properly be given. This compounds the possible confusion in debate. Thirdly, there is a danger that researchers may rhetorically favour (perhaps unconsciously) uses that, for example, make their research sound more robustly empirical than it actually is. For example, validation is sometimes used to mean consistency with stylised facts (rather than, say, correspondence with a specific time series according to some formal measure). But we often have no way of telling what the status of the presented stylised facts is. Are they an effective summary of what is known in a field? Are they the facts on which most researchers agree or for which the available data presents the clearest picture? (Less reputably, can readers be confident that they were not selected for presentation because of their correspondence?) Fourthly, because these terms are used differently by different researchers it is possible that valuable scientific activities that “should” have agreed labels will “slip down the terminological cracks” (either for the individual or for the ABM community generally). Apart from clear labels avoiding confusion for others, they may help to avoid confusion for you too!

But apart from these problems (and there may be others but these are not the main thrust of my argument here) there is also a potential impasse. There simply doesn’t seem to be any value in arguing about what the “correct” meaning of validation (for example) should be. Because these are merely labels there is no objective way to resolve this issue. Further, even if we undertook to agree the terminology collectively, each individual would tend to argue for their own interpretation without solid grounds (because there are none to be had) and any collective decision would probably therefore be unenforceable. If we decide to invent arbitrary new terminology from scratch we not only run the risk of adding to the existing confusion of terms (rather than reducing it) but it is also quite likely that everyone will find the new terms unhelpful.

Unfortunately, however, we probably cannot do without labels for these scientific activities involved in quality controlling ABMs. If we had to describe everything we did without any technical shorthand, presenting research might well become impossibly unwieldy.

My proposed solution is therefore to invent terms from scratch (so we don’t end up arguing about our different customary usages to no purpose) but to do so on the basis of actual scientific practices reported in published research. For example, we might call the comparison of corresponding real and simulated data (which at least has the endorsement of the much used Gilbert and Troitzsch 2005 – see pp. 15-19 – to be referred to as validation) CORAS – Comparison Of Real And Simulated. Similarly, assigning values to parameters given the assumptions of model “structures” might be called PANV – Parameters Assigned Numerical Values.

It is very important to be clear what the intention is here. Naming cannot solve scientific problems or disagreements. (Indeed, failure to grasp this may well be why our terminology is currently so muddled as people try to get their different positions through “on the nod”.) For example, if we do not believe that correspondence with stylised facts and comparison measures on time series have equivalent scientific status then we will have to agree distinct labels for them and have the debate about their respective value separately. Perhaps the former could be called COSF – Comparison Of Stylised Facts. But it seems plainly easier to describe specific scientific activities accurately and then find labels for them than to have to wade through the existing marsh of ambiguous terminology and try to extract the associated science. An example of a practice which does not seem to have even one generally agreed label (and therefore seems to be neglected in ABM as a practice) is JAMS – Justifying A Model Structure. (Why are your agents adaptive rather than habitual or rational? Why do they mix randomly rather than in social networks?)

Obviously, there still needs to be community agreement for such a convention to be useful (and this may need to be backed institutionally for example by reviewing requirements). But the logic of the approach avoids several existing problems. Firstly, while the labels are useful shorthand, they are not arbitrary. Each can be traced back to a clearly definable scientific practice. Secondly, this approach steers a course between the Scylla of fruitless arguments from current muddled usage and the Charybdis of a novel set of terminology that is equally unhelpful to everybody. (Even if people cannot agree on labels, they knew how they built and evaluated their ABMs so they can choose – or create – new labels accordingly.) Thirdly, the proposed logic is extendable. As we clarify our thinking, we can use it to label (or improve the labels of) any current set of scientific practices. We will do not have to worry that we will run out of plausible words in everyday usage.

Below I suggest some more scientific practices and possible terms for them. (You will see that I have also tried to make the terms as pronounceable and distinct as possible.)

Practice Term
Checking the results of an ABM by building another.[1] CAMWA (Checking A Model With Another).
Checking ABM code behaves as intended (for example by debugging procedures, destructive testing using extreme values and so on). TAMAD (Testing A Model Against Description).
Justifying the structure of the environment in which agents act. JEM (Justifying the Environment of a Model): This is again a process that may pass unnoticed in ABM typically. For example, by assuming that agents only consider ethnic composition, the Schelling Model (Schelling 1969, 1971) does not “allow” locations to be desirable because, for example, they are near good schools. This contradicts what was known empirically well before (see, for example, Rossi 1955) and it isn’t clear whether simply saying that your interest is in an “abstract” model can justify this level of empirical neglect.
Finding out what effect parameter values have on ABM behaviour. EVOPE (Exploring Value Of Parameter Effects).
Exploring the sensitivity of an ABM to structural assumptions not justified empirically (see Chattoe-Brown 2021). ESOSA (Exploring the Sensitivity Of Structural Assumptions).

Clearly this list is incomplete but I think it would be more effective if characterising the scientific practices in existing ABM and naming them distinctively was a collective enterprise.


This research is funded by the project “Towards Realistic Computational Models Of Social Influence Dynamics” (ES/S015159/1) funded by ESRC via ORA Round 5 (PI: Professor Bruce Edmonds, Centre for Policy Modelling, Manchester Metropolitan University: https://gtr.ukri.org/projects?ref=ES%2FS015159%2F1).


[1] It is likely that we will have to invent terms for subcategories of practices which differ in their aims or warranted conclusions. For example, rerunning the code of the original author (CAMWOC – Checking A Model With Original Code), building a new ABM from a formal description like ODD (CAMUS – Checking A Model Using Specification) and building a new ABM from the published description (CAMAP – Checking A Model As Published, see Chattoe-Brown et al. 2021).


Chattoe-Brown, Edmund (2021) ‘Why Questions Like “Do Networks Matter?” Matter to Methodology: How Agent-Based Modelling Makes It Possible to Answer Them’, International Journal of Social Research Methodology, 24(4), pp. 429-442. doi:10.1080/13645579.2020.1801602

Chattoe-Brown, Edmund, Gilbert, Nigel, Robertson, Duncan A. and Watts Christopher (2021) ‘Reproduction as a Means of Evaluating Policy Models: A Case Study of a COVID-19 Simulation’, medRXiv, 23 February. doi:10.1101/2021.01.29.21250743

Gilbert, Nigel and Troitzsch, Klaus G. (2005) Simulation for the Social Scientist, second edition (Maidenhead: Open University Press).

Rossi, Peter H. (1955) Why Families Move: A Study in the Social Psychology of Urban Residential Mobility (Glencoe, IL, Free Press).

Schelling, Thomas C. (1969) ‘Models of Segregation’, American Economic Review, 59(2), May, pp. 488-493. (available at https://www.jstor.org/stable/1823701)

Chattoe-Brown, E. (2022) Today We Have Naming Of Parts: A Possible Way Out Of Some Terminological Problems With ABM. Review of Artificial Societies and Social Simulation, 11th January 2022. https://rofasss.org/2022/01/11/naming-of-parts/


A Forgotten Contribution: Jean-Paul Grémy’s Empirically Informed Simulation of Emerging Attitude/Career Choice Congruence (1974)

By Edmund Chattoe-Brown

Since this is new venture, we need to establish conventions. Since JASSS has been running since 1998 (twenty years!) it is reasonable to argue that something un-cited in JASSS throughout that period has effectively been forgotten by the ABM community. This contribution by Grémy is actually a single chapter in a book otherwise by Boudon (a bibliographical oddity that may have contributed to its neglect. Grémy also appears to have published mostly in French, which may also have had an effect. An English summary of his contribution to simulation might be another useful item for RofASSS.) Boudon gets 6 hits on the JASSS search engine (as of 31.05.18), none of which mention simulation and Gremy gets no hits (as does Grémy: unfortunately it is hard to tell how online search engines “cope with” accents and thus whether this is a “real” result).

Since this book is still readily available as a mass-market paperback, I will not reprise the argument of the simulation here (and its limitations relative to existing ABM methodology could be a future RofASSS contribution). Nonetheless, even approximately empirical modelling in the mid-seventies is worthy of note and the article is early to say other important things (for example about simulation being able to avoid “technical assumptions” – made for solubility rather than realism).

The point of this contribution is to draw attention to an argument that I have only heard twice (and only found once in print) namely that we should look at the form of real data as an initial justification for using ABM at all (please correct me if there are earlier or better examples). Grémy (1974, p. 210) makes the point that initial incongruities between the attitudes that people hold (altruistic versus selfish) and their career choices (counsellor versus corporate raider) can be resolved in either direction as time passes (he knows this because Boudon analysed some data collected by Rosenberg at two points from US university students) as well as remaining unresolved and, as such, cannot readily be explained by some sort of “statistical trend” (that people become more selfish as they get older or more altruistic as they become more educated). He thus hypothesises (reasonably it seems to me) that the data requires a model of some sort of dynamic interaction process that Grémy then simulates, paying some attention to their survey results both in constraining the model and analysing its behaviour.

This seems to me an important methodological practice to rescue from neglect. (It is widely recognised anecdotally that people tend to use the research methods they know and like rather than the ones that are suitable.) Elsewhere (Chattoe-Brown 2014), inspired by this argument, I have shown how even casually accessed attitude change data really looks nothing like the output of the (very popular) Zaller-Deffuant model of opinion change (very roughly, 228 hits in JASSS for Deffuant, 8 for Zaller and 9 for Zaller-Deffuant though hyphens sometimes produce unreliable results for online search engines too.) The attitude of the ABM community to data seems to be rather uncomfortable. Perhaps support in theory and neglect in practice would sum it up (Angus and Hassani-Mahmooei 2015, Table 5 in section 4.5). But if our models can’t even “pass first base” with existing real data (let alone be calibrated and validated) should we be too surprised if what seems plausible to us does not seem plausible to social scientists in substantive domains (and thus diminishes their interest in ABM as a “real method?”) Even if others in the ABM community disagree with my emphasis on data (and I know that they do) I think this is a matter that should be properly debated rather than just left floating about in coffee rooms (as such this is what we intend RofASSS to facilitate). As W. C. Fields is reputed to have said (though actually the phrase appears to have been common currency), we would wish to avoid ABM being just “Another good story ruined by an eyewitness”.


Angus, Simon D. and Hassani-Mahmooei, Behrooz (2015) ‘“Anarchy” Reigns: A Quantitative Analysis of Agent-Based Modelling Publication Practices in JASSS, 2001-2012’, Journal of Artificial Societies and Social Simulation, 18(4):16.

Chattoe-Brown, Edmund (2014) ‘Using Agent Based Modelling to Integrate Data on Attitude Change’, Sociological Research Online, 19(1):16.

Gremy, Jean-Paul (1974) ‘Simulation Techniques’, in Boudon, Raymond, The Logic of Sociological Explanation (Harmondsworth: Penguin), chapter 11:209-227.

Chattoe-Brown, E. (2018) A Forgotten Contribution: Jean-Paul Grémy’s Empirically Informed Simulation of Emerging Attitude/Career Choice Congruence (1974). Review of Artificial Societies and Social Simulation, 1st June 2018. https://rofasss.org/2018/06/01/ecb/