Tag Archives: MarijnKeijzer

If you want to be cited, calibrate your agent-based model: A Reply to Chattoe-Brown

By Marijn A. Keijzer

This is a reply to a previous comment, (Chattoe-Brown 2022).

The social simulation literature has called on its proponents to enhance the quality and realism of their contributions through systematic validation and calibration (Flache et al., 2017). Model validation typically refers to assessments of how well the predictions of their agent-based models (ABMs) map onto empirically observed patterns or relationships. Calibration, on the other hand, is the process of enhancing the realism of the model by parametrizing it based on empirical data (Boero & Squazzoni, 2005). We would expect that presenting a validated or calibrated model serves as a signal of model quality, and would thus be a desirable characteristic of a paper describing an ABM.

In a recent contribution to RofASSS, Edmund Chattoe-Brown provocatively argued that model validation does not bear fruit for researchers interested in boosting their citations. In a sample of articles from JASSS published on opinion dynamics he observed that “the sample clearly divides into non-validated research with more citations and validated research with fewer” (Chattoe-Brown, 2022). Well-aware of the bias and limitations of the sample at hand, Chattoe-Brown calls on refutation of his hypothesis. An analysis of the corpus of articles in Web of Science, presented here, could serve that goal.

To test whether there exists an effect of model calibration and/or validation on the citation counts of papers, I compare citation counts of a larger number of original research articles on agent-based models published in the literature. I extracted 11,807 entries from Web of Science by searching for items that contained the phrases “agent-based model”, “agent-based simulation” or “agent-based computational model” in its abstract.[1] I then labeled all items that mention “validate” in its abstract as validated ABMs and those that mention “calibrate” as calibrated ABMs. This measure if rather crude, of course, as descriptions containing phrases like “we calibrated our model” or “others should calibrate our model” are both labeled as calibrated models. However, if mentioning that future research should calibrate or validate the model is not related to citations counts (which I would argue it indeed is not), then this inaccuracy does not introduce bias.

The shares of entries that mention calibration or validation are somewhat small. Overall, just 5.62% of entries mention validation, 3.21% report a calibrated model and 0.65% fall in both categories. The large sample size, however, will still enable the execution of proper statistical analysis and hypothesis testing.

How are mentions of calibration and validation in the abstract related to citation counts at face value? Bivariate analyses show only minor differences, as revealed in Figure 1. In fact, the distribution of citations for validated and non-validated ABMs (panel A) is remarkably similar. Wilcoxon tests with continuity correction—the nonparametric version of the simple t test—corroborate their similarity (W = 3,749,512, p = 0.555). The differences in citations between calibrated and non-calibrated models appear, albeit still small, more pronounced. Calibrated ABMs are cited slightly more often (panel B), as also supported by a bivariate test (W = 1,910,772, p < 0.001).

Picture 1

Figure 1. Distributions of number of citations of all the entries in the dataset for validated (panel A) and calibrated (panel B) ABMs and their averages with standard errors over years (panels C and D)

Age of the paper might be a more important determinant of citation counts, as panels C and D of Figure 1 suggest. Clearly, the age of a paper should be important here, because older papers have had much more opportunity to get cited. In particular, papers younger than 10 years seem to not have matured enough for its citation rates to catch up to older articles. When comparing the citation counts of purely theoretical models with calibrated and validated versions, this covariate should not be missed, because the latter two are typically much younger. In other words, the positive relationship between model calibration/validation and citation counts could be hidden in the bivariate analysis, as model calibration and validation are recent trends in ABM research.

I run a Poisson regression on the number of citations as explained by whether they are validated and calibrated (simultaneously) and whether they are both. The age of the paper is taken into account, as well as the number of references that the paper uses itself (controlling for reciprocity and literature embeddedness, one might say). Finally, the fields in which the papers have been published, as registered by Web of Science, have been added to account for potential differences between fields that explains both citation counts and conventions about model calibration and validation.

Table 1 presents the results from the four models with just the main effects of validation and calibration (model 1), the interaction of validation and calibration (model 2) and the full model with control variables (model 3).

Table 1. Poisson regression on the number of citations

# Citations
(1) (2) (3)
Validated -0.217*** -0.298*** -0.094***
(0.012) (0.014) (0.014)
Calibrated 0.171*** 0.064*** 0.076***
(0.014) (0.016) (0.016)
Validated x Calibrated 0.575*** 0.244***
(0.034) (0.034)
Age 0.154***
(0.0005)
Cited references 0.013***
(0.0001)
Field included No No Yes
Constant 2.553*** 2.556*** 0.337**
(0.003) (0.003) (0.164)
Observations 11,807 11,807 11,807
AIC 451,560 451,291 301,639
Note: *p<0.1; **p<0.05; ***p<0.01

The results from the analyses clearly suggest a negative effect of model validation and a positive effect of model calibration on the likelihood of being cited. The hypothesis that was so “badly in need of refutation” (Chattoe-Brown, 2022) will remain unrefuted for now. The effect does turn positive, however, when the abstract makes mention of calibration as well. In both the controlled (model 3) and uncontrolled (model 2) analyses, combining the effects of validation and calibration yields a positive coefficient overall.[2]

The controls in model 3 substantially affect the estimates from the three main factors of interest, while remaining in expected directions themselves. The age of a paper indeed helps its citation count, and so does the number of papers the item cites itself. The fields, furthermore, take away from the main effects somewhat, too, but not to a problematic degree. In an additional analysis, I have looked at the relationship between the fields and whether they are more likely to publish calibrated or validated models and found no substantial relationships. Citation counts will differ between fields, however. The papers in our sample are more often cited in, for example, hematology, emergency medicine and thermodynamics. The ABMs in the sample coming from toxicology, dermatology and religion are on the unlucky side of the equation, receiving less citations on average. Finally, I have also looked at papers published in JASSS specifically, due to the interest of Chattoe-Brown and the nature of this outlet. Surprisingly, the same analyses run on the subsample of these papers (N=376) showed a negative relationship between citation counts and model calibration/validation. Does the JASSS readership reveal its taste for artificial societies?

In sum, I find support for the hypothesis of Chattoe-Brown (2022) on the negative relationship between model validation and citations counts for papers presenting ABMs. If you want to be cited, you should not validate your ABM. Calibrated ABMs, on the other hand, are more likely to receive citations. What is more, ABMs that were both calibrated and validated are most the most successful papers in the sample. All conclusions were drawn considering (i.e. controlling for) the effects of age of the paper, the number of papers the paper cited itself, and (citation conventions in) the field in which it was published.

While the patterns explored in this and Chattoe-Brown’s recent contribution are interesting, or even puzzling, they should not distract from the goal of moving towards realistic agent-based simulations of social systems. In my opinion, models that combine rigorous theory with strong empirical foundations are instrumental to the creation of meaningful and purposeful agent-based models. Perhaps the results presented here should just be taken as another sign that citation counts are a weak signal of academic merit at best.

Data, code and supplementary analyses

All data and code used for this analysis, as well as the results from the supplementary analyses described in the text, are available here: https://osf.io/x9r7j/

Notes

[1] Note that the hyphen between “agent” and “based” does not affect the retrieved corpus. Both contributions that mention “agent based” and “agent-based” were retrieved.

[2] A small caveat to the analysis of the interaction effect is that the marginal improvement of model 2 upon model 1 is rather small (AIC difference of 269). This is likely (partially) due to the small number of papers that mention both calibration and validation (N=77).

Acknowledgements

Marijn Keijzer acknowledges IAST funding from the French National Research Agency (ANR) under the Investments for the Future (Investissements d’Avenir) program, grant ANR-17-EURE-0010.

References

Boero, R., & Squazzoni, F. (2005). Does empirical embeddedness matter? Methodological issues on agent-based models for analytical social science. Journal of Artificial Societies and Social Simulation, 8(4), 1–31. https://www.jasss.org/8/4/6.html

Chattoe-Brown, E. (2022) If You Want To Be Cited, Don’t Validate Your Agent-Based Model: A Tentative Hypothesis Badly In Need of Refutation. Review of Artificial Societies and Social Simulation, 1st Feb 2022. https://rofasss.org/2022/02/01/citing-od-models

Flache, A., Mäs, M., Feliciani, T., Chattoe-Brown, E., Deffuant, G., Huet, S., & Lorenz, J. (2017). Models of social influence: towards the next frontiers. Journal of Artificial Societies and Social Simulation, 20(4). https://doi.org/10.18564/jasss.3521


Keijzer, M. (2022) If you want to be cited, calibrate your agent-based model: Reply to Chattoe-Brown. Review of Artificial Societies and Social Simulation, 9th Mar 2022. https://rofasss.org/2022/03/09/Keijzer-reply-to-Chattoe-Brown


 

No one can predict the future: More than a semantic dispute

By Carlos A. de Matos Fernandes and Marijn A. Keijzer

(A contribution to the: JASSS-Covid19-Thread)

Models are pivotal to battle the current COVID-19 crisis. In their call to action, Squazzoni et al. (2020) convincingly put forward how social simulation researchers could and should respond in the short run by posing three challenges for the community among which is a COVID-19 prediction challenge. Although Squazzoni et al. (2020) stress the importance of transparent communication of model assumptions and conditions, we question the liberal use of the word ‘prediction’ for the outcomes of the broad arsenal of models used to mitigate the COVID-19 crisis by ours and other modelling communities. Four key arguments are provided that advocate using expectations derived from scenarios when explaining our models to a wider, possibly non-academic audience.

The current COVID-19 crisis necessitates that we implement life-changing policies that, to a large extent, build upon predictions from complex, quickly adapted, and sometimes poorly understood models. The examples of models spurring the news to produce catchphrase headlines are abundant (Imperial College, AceMod-Australian Census-based Epidemic Model, IndiaSIM, IHME, etc.). And even though most of these models will be useful to assess the comparative effectiveness of interventions in our aim to ‘flatten the curve’, the predictions that disseminate to news media are those of total cases or timing of the inflection point.

The current focus on predictive epidemiological and behavioural models brings back an important discussion about prediction in social systems. “[T]here is a lot of pressure for social scientists to predict” (Edmonds, Polhill & Hales, 2019), and we might add ‘especially nowadays’. But forecasting in human systems is often tricky (Hofman, Sharma & Watts, 2017). Approaches that take well-understood theories and simple mechanisms often fail to grasp the complexity of social systems, yet models that rely on complex supervised machine learning-like approaches may offer misleading levels of confidence (as was elegantly shown recently by Salganik et al., 2020). COVID-19 models appear to be no exception as a recent review concluded that “[…] their performance estimates are likely to be optimistic and misleading” (Wynants et al., 2020, p. 9). Squazzoni et al. describe these pitfalls too (2020: paragraph 3.3). In the crisis at hand, it may even be counter-productive to rely on complex models that combine well-understood mechanisms with many uncertain parameters (Elsenbroich & Badham, 2020).

Considering the level of confidence we can have about predictive models in general, we believe there is an issue with the way predictions are communicated by the community. Scientists often use ‘prediction’ to refer to some outcome of a (statistical) model where they ‘predict’ aspects of the data that are already known, but momentarily set aside. Edmonds et al. (2019: paragraph 2.4) state that “[b]y ‘prediction’, we mean the ability to reliably anticipate well-defined aspects of data that is not currently known to a useful degree of accuracy via computations using the model”. Predictive accuracy, in this case, can then be computed later on, by comparing the prediction to the truth. Scientists know that when talking about predictions of their models, they don’t claim to generalize to situations outside of the narrow scope of their study sample or their artificial society. We are not predicting the future, and wouldn’t claim we could. However, this is wildly different from how ‘prediction’ is commonly understood: As an estimation of some unknown thing in the future. Now that our models quickly disseminate to the general public, we need to be careful with the way we talk about their outcomes.

Predictions in the COVID-19 crisis will remain imperfect. In the current virus outbreak, society cannot afford to rely on the falsification of models for interventions against empirical data. As the virus remains to spread rapidly, our only option is to rely on models as a basis for policy, ceteris paribus. And it is precisely here – at ‘ceteris paribus’ – where the terminology ‘predictions’ miss the mark. All things will not be equal tomorrow, the next day, or the day after that (Van Bavel et al. [2020] note numerous topics that affect managing the COVID-19 pandemic and its impact on society). Policies around the globe are constantly being tweaked, and people’s behaviour changes dramatically as a consequence (Google, 2020). Relying on predictions too much may give a false sense of security.

We propose to avoid using the word ‘prediction’ too much and talk about scenarios or expectations instead where possible. We identify four reasons why you should avoid talking about prediction right now:

  1. Not everyone is acquainted with noise and emergence. Computational Social Scientists generally understand the effects of noise in social systems (Squazzoni et al., 2020: paragraph 1.8). Small behavioural irregularities can be reinforced in complex systems fostering unexpected outcomes. Yet, scientists not acquainted with studying complex social systems may be unfamiliar with the principles we have internalized by now, and put over-confidence in the median outputs of volatile models that enter the scientific sphere as predictions.
  2. Predictions do not convey uncertainty. The general public is usually unacquainted with academic esoteric concepts. For instance, showing a flatten-the-curve scenario generally builds upon mean or median approximation, oftentimes neglecting to include variability of different scenarios. Still, there are numerous other outcomes, building on different parameter values. We fear that by stating a prediction to an undisciplined public, they expect such a thing to occur for certain. If we forecast a sunny day, but there’s rain, people are upset. Talking about scenarios, expectations, and mechanisms may prevent confusion and opposition when the forecast does not occur.
  3. It’s a model, not a reality. The previous argument feeds into the third notion: Be honest about what you model. A model is a model. Even the most richly calibrated model is a model. That is not to say that such models are not informative (we reiterate: models are not a shot in the dark). Still, richly calibrated models based on poor data may be more misleading than less calibrated models (Elsenbroich & Badham, 2020). Empirically calibrated models may provide more confidence at face value, but it lies in the nature of complex systems that small measurement errors in the input data may lead to big deviations in outputs. Models present a scenario for our theoretical reasoning with a given set of parameter values. We can update a model with empirical data to increase reliability but it remains a scenario about a future state given an (often expansive) set of assumptions (recently beautifully visualized by Koerth, Bronner, & Mithani, 2020).
  4. Stop predicting, start communicating. Communication is pivotal during a crisis. An abundance of research shows that communicating clearly and honestly is a best practice during a crisis, generally comforting the general public (e.g., Seeger, 2006). Squazzoni et al. (2020) call for transparent communication. by stating that “[t]he limitations of models and the policy recommendations derived from them have to be openly communicated and transparently addressed”. We are united in our aim to avert the COVID-19 crisis but should be careful that overconfidence doesn’t erode society’s trust in science. Stating unequivocally that we hope – based on expectations – to avert a crisis by implementing some policy, does not preclude altering our course of action when an updated scenario about the future may require us to do so. Modellers should communicate clearly to policy-makers and the general public that this is the role of computational models that are being updated daily.

Squazzoni et al. (2020) set out the agenda for our community in the coming months and it is an important one. Let’s hope that the expectations from the scenarios in our well-informed models will not fall on deaf ears.

References

Edmonds, B., Polhill, G., & Hales, D. (2019). Predicting Social Systems – a Challenge. Review of Artificial Societies and Social Simulation, 4th June 2019. https://rofasss.org/2018/11/04/predicting-social-systems-a-challenge

Edmonds, B., Le Page, C., Bithell, M., Chattoe-Brown, E., Grimm, V., Meyer, R., Montañola-Sales, C., Ormerod, P., Root, H., & Squazzoni, F. (2019). Different Modelling Purposes. Journal of Artificial Societies and Social Simulation, 22(3), 6. <http://jasss.soc.surrey.ac.uk/22/3/6.html> doi: 10.18564/jasss.3993

Elsenbroich, C., & Badham, J. (2020). Focussing on our Strengths. Review of Artificial Societies and Social Simulation, 12th April 2020.
https://rofasss.org/2020/04/12/focussing-on-our-strengths/

Google. (2020). COVID-19 Mobility Reports. https://www.google.com/covid19/mobility/ (Accessed 15th April 2020)

Hofman, J. M., Sharma, A., & Watts, D. J. (2017). Prediction and Explanation in Social Systems. Science, 355, 486–488. doi: 10.1126/science.aal3856

Koerth, M., Bronner, L., & Mithani, J. (2020, March 31). Why It’s So Freaking Hard To Make A Good COVID-19 Model. FiveThirtyEight. https://fivethirtyeight.com/

Salganik, M. J. et al. (2020). Measuring the Predictability of Life Outcomes with a Scientific Mass Collaboration. PNAS. 201915006. doi: 10.1073/pnas.1915006117

Seeger, M. W. (2006). Best Practices in Crisis Communication: An Expert Panel Process, Journal of Applied Communication Research, 34(3), 232-244.  doi: 10.1080/00909880600769944

Squazzoni, F., Polhill, J. G., Edmonds, B., Ahrweiler, P., Antosz, P., Scholz, G., Chappin, É., Borit, M., Verhagen, H., Giardini, F. and Gilbert, N. (2020) Computational Models That Matter During a Global Pandemic Outbreak: A Call to Action. Journal of Artificial Societies and Social Simulation, 23(2):10. <http://jasss.soc.surrey.ac.uk/23/2/10.html>. doi: 10.18564/jasss.4298

Van Bavel, J. J. et al. (2020). Using social and behavioural science to support COVID-19 pandemic response. PsyArXiv. https://doi.org/10.31234/osf.io/y38m9

Wynants. L., et al. (2020). Prediction models for diagnosis and prognosis of COVID-19 infection: systematic review and critical appraisal. BMJ, 369, m1328. doi: 10.1136/bmj.m1328


de Matos Fernandes, C. A. and Keijzer, M. A. (2020) No one can predict the future: More than a semantic dispute. Review of Artificial Societies and Social Simulation, 15th April 2020. https://rofasss.org/2020/04/15/no-one-can-predict-the-future/