
If you want to be cited, calibrate your agent-based model: A Reply to Chattoe-Brown

By Marijn A. Keijzer

This is a reply to a previous comment (Chattoe-Brown 2022).

The social simulation literature has called on its proponents to enhance the quality and realism of their contributions through systematic validation and calibration (Flache et al., 2017). Model validation typically refers to assessing how well the predictions of an agent-based model (ABM) map onto empirically observed patterns or relationships. Calibration, on the other hand, is the process of enhancing the realism of the model by parametrizing it based on empirical data (Boero & Squazzoni, 2005). We would expect that presenting a validated or calibrated model serves as a signal of model quality, and would thus be a desirable characteristic of a paper describing an ABM.

In a recent contribution to RofASSS, Edmund Chattoe-Brown provocatively argued that model validation does not bear fruit for researchers interested in boosting their citations. In a sample of opinion dynamics articles published in JASSS he observed that “the sample clearly divides into non-validated research with more citations and validated research with fewer” (Chattoe-Brown, 2022). Well aware of the bias and limitations of the sample at hand, Chattoe-Brown calls for refutation of his hypothesis. An analysis of the corpus of articles in Web of Science, presented here, can serve that goal.

To test whether there is an effect of model calibration and/or validation on the citation counts of papers, I compare citation counts across a large number of original research articles on agent-based models. I extracted 11,807 entries from Web of Science by searching for items that contain the phrases “agent-based model”, “agent-based simulation” or “agent-based computational model” in their abstract.[1] I then labeled all items that mention “validate” in their abstract as validated ABMs, and those that mention “calibrate” as calibrated ABMs. This measure is rather crude, of course, as abstracts containing phrases like “we calibrated our model” and “others should calibrate our model” are both labeled as calibrated models. However, if merely mentioning that future research should calibrate or validate the model is not related to citation counts (and I would argue that it is not), then this inaccuracy does not introduce bias.
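As a rough sketch of how this labelling step can be implemented (the object and column names below, such as wos and abstract, are hypothetical; the actual code and data are available from the OSF repository linked at the end):

```r
# Minimal sketch of the labelling step, assuming 'wos' is a data frame of
# Web of Science records with an 'abstract' column. The exact matching rule
# used in the analysis may differ; here an item is flagged whenever a word
# starting with "validat" or "calibrat" appears in its abstract.
library(dplyr)

wos <- wos %>%
  mutate(
    validated  = grepl("validat",  abstract, ignore.case = TRUE),
    calibrated = grepl("calibrat", abstract, ignore.case = TRUE)
  )

# Shares of labelled entries
mean(wos$validated)
mean(wos$calibrated)
mean(wos$validated & wos$calibrated)
```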

The shares of entries that mention calibration or validation are rather small. Overall, just 5.62% of entries mention validation, 3.21% report a calibrated model, and 0.65% fall into both categories. The large sample size, however, still permits proper statistical analysis and hypothesis testing.

How are mentions of calibration and validation in the abstract related to citation counts at face value? Bivariate analyses show only minor differences, as revealed in Figure 1. In fact, the distribution of citations for validated and non-validated ABMs (panel A) is remarkably similar. Wilcoxon tests with continuity correction (the nonparametric counterpart of the simple t test) corroborate their similarity (W = 3,749,512, p = 0.555). The differences in citations between calibrated and non-calibrated models are somewhat more pronounced, albeit still small. Calibrated ABMs are cited slightly more often (panel B), as supported by a bivariate test (W = 1,910,772, p < 0.001).
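These bivariate comparisons can be reproduced along the following lines (a minimal sketch with the same hypothetical variable names as above; R's wilcox.test applies a continuity correction by default when the normal approximation is used):

```r
# Wilcoxon rank-sum tests comparing citation counts between groups
wilcox.test(citations ~ validated,  data = wos)   # panel A comparison
wilcox.test(citations ~ calibrated, data = wos)   # panel B comparison
```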


Figure 1. Distributions of number of citations of all the entries in the dataset for validated (panel A) and calibrated (panel B) ABMs and their averages with standard errors over years (panels C and D)

Age of the paper might be a more important determinant of citation counts, as panels C and D of Figure 1 suggest. Clearly, the age of a paper matters here, because older papers have had much more opportunity to be cited. In particular, papers younger than 10 years do not seem to have matured enough for their citation rates to catch up with older articles. When comparing the citation counts of purely theoretical models with those of calibrated and validated ones, this covariate should not be omitted, because the latter two are typically much younger. In other words, a positive relationship between model calibration/validation and citation counts could be hidden in the bivariate analysis, as model calibration and validation are recent trends in ABM research.

I run Poisson regressions of the number of citations on whether the model is validated, whether it is calibrated, and the interaction of the two. The age of the paper is taken into account, as well as the number of references the paper cites itself (controlling for reciprocity and literature embeddedness, one might say). Finally, the fields in which the papers were published, as registered by Web of Science, are added to account for differences between fields that explain both citation counts and conventions about model calibration and validation.
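A sketch of these specifications (variable names are hypothetical; the actual analysis code is in the OSF repository):

```r
# Poisson regressions corresponding to the three columns of Table 1
m1 <- glm(citations ~ validated + calibrated,
          family = poisson, data = wos)
m2 <- glm(citations ~ validated * calibrated,
          family = poisson, data = wos)
m3 <- glm(citations ~ validated * calibrated + age + cited_refs + field,
          family = poisson, data = wos)

AIC(m1, m2, m3)  # model comparison as reported in the bottom row of Table 1
```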

Table 1 presents the results of three models: the main effects of validation and calibration only (model 1), the interaction of validation and calibration added (model 2), and the full model with control variables (model 3).

Table 1. Poisson regression on the number of citations

                              Dependent variable: number of citations
                              (1)           (2)           (3)
Validated                  -0.217***     -0.298***     -0.094***
                            (0.012)       (0.014)       (0.014)
Calibrated                  0.171***      0.064***      0.076***
                            (0.014)       (0.016)       (0.016)
Validated x Calibrated                    0.575***      0.244***
                                          (0.034)       (0.034)
Age                                                     0.154***
                                                        (0.0005)
Cited references                                        0.013***
                                                        (0.0001)
Field included                 No            No            Yes
Constant                    2.553***      2.556***      0.337**
                            (0.003)       (0.003)       (0.164)
Observations                11,807        11,807        11,807
AIC                         451,560       451,291       301,639
Note: *p<0.1; **p<0.05; ***p<0.01

The results from the analyses clearly suggest a negative effect of model validation and a positive effect of model calibration on the likelihood of being cited. The hypothesis that was so “badly in need of refutation” (Chattoe-Brown, 2022) will remain unrefuted for now. The effect does turn positive, however, when the abstract makes mention of calibration as well. In both the controlled (model 3) and uncontrolled (model 2) analyses, combining the effects of validation and calibration yields a positive coefficient overall.[2]
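As a rough back-of-the-envelope reading of Table 1 (ignoring the covariances of the estimates), the combined coefficient for papers that mention both calibration and validation is positive in both specifications:

```r
# Net multiplicative effect on expected citations for papers that are
# both validated and calibrated, using the coefficients from Table 1
exp(-0.298 + 0.064 + 0.575)  # model 2: ~1.41, i.e. about 40% more expected citations
exp(-0.094 + 0.076 + 0.244)  # model 3: ~1.25, i.e. about 25% more expected citations
```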

The controls in model 3 substantially affect the estimates of the three main factors of interest, while themselves pointing in the expected directions. The age of a paper indeed helps its citation count, and so does the number of works the paper cites itself. The field dummies also take away somewhat from the main effects, but not to a problematic degree. In an additional analysis, I looked at whether some fields are more likely to publish calibrated or validated models and found no substantial relationships. Citation counts do differ between fields, however. The papers in the sample are cited more often in, for example, hematology, emergency medicine and thermodynamics, whereas the ABMs coming from toxicology, dermatology and religion are on the unlucky side of the equation, receiving fewer citations on average. Finally, I also looked at papers published in JASSS specifically, given Chattoe-Brown's interest and the nature of this outlet. Surprisingly, the same analyses run on this subsample (N=376) showed a negative relationship between citation counts and model calibration/validation. Does the JASSS readership reveal its taste for artificial societies?

In sum, I find support for the hypothesis of Chattoe-Brown (2022) on the negative relationship between model validation and citation counts for papers presenting ABMs. If you want to be cited, you should not validate your ABM. Calibrated ABMs, on the other hand, are more likely to receive citations. What is more, ABMs that were both calibrated and validated are the most successful papers in the sample. All conclusions were drawn while considering (i.e. controlling for) the age of the paper, the number of works the paper cites itself, and (citation conventions in) the field in which it was published.

While the patterns explored in this and Chattoe-Brown’s recent contribution are interesting, or even puzzling, they should not distract from the goal of moving towards realistic agent-based simulations of social systems. In my opinion, models that combine rigorous theory with strong empirical foundations are instrumental to the creation of meaningful and purposeful agent-based models. Perhaps the results presented here should just be taken as another sign that citation counts are a weak signal of academic merit at best.

Data, code and supplementary analyses

All data and code used for this analysis, as well as the results from the supplementary analyses described in the text, are available here: https://osf.io/x9r7j/

Notes

[1] Note that the hyphen between “agent” and “based” does not affect the retrieved corpus: contributions that mention “agent based” as well as those that mention “agent-based” were retrieved.

[2] A small caveat to the analysis of the interaction effect is that the marginal improvement of model 2 upon model 1 is rather small (AIC difference of 269). This is likely (partially) due to the small number of papers that mention both calibration and validation (N=77).

Acknowledgements

Marijn Keijzer acknowledges IAST funding from the French National Research Agency (ANR) under the Investments for the Future (Investissements d’Avenir) program, grant ANR-17-EURE-0010.

References

Boero, R., & Squazzoni, F. (2005). Does empirical embeddedness matter? Methodological issues on agent-based models for analytical social science. Journal of Artificial Societies and Social Simulation, 8(4), 1–31. https://www.jasss.org/8/4/6.html

Chattoe-Brown, E. (2022) If You Want To Be Cited, Don’t Validate Your Agent-Based Model: A Tentative Hypothesis Badly In Need of Refutation. Review of Artificial Societies and Social Simulation, 1st Feb 2022. https://rofasss.org/2022/02/01/citing-od-models

Flache, A., Mäs, M., Feliciani, T., Chattoe-Brown, E., Deffuant, G., Huet, S., & Lorenz, J. (2017). Models of social influence: towards the next frontiers. Journal of Artificial Societies and Social Simulation, 20(4). https://doi.org/10.18564/jasss.3521


Keijzer, M. (2022) If you want to be cited, calibrate your agent-based model: Reply to Chattoe-Brown. Review of Artificial Societies and Social Simulation, 9th Mar 2022. https://rofasss.org/2022/03/09/Keijzer-reply-to-Chattoe-Brown


 

Response to the review of Edmund Chattoe-Brown of the book “Social Simulations for a Crisis”

By Frank Dignum

This is a reply to a review in JASSS (Chattoe-Brown 2021) of the book (Dignum 2021).

Before responding to some of the specific concerns of Edmund I would like to thank him for the thorough review. I am especially happy with his conclusion that the book is solid enough to make it a valuable contribution to scientific progress in modelling crises. That was the main aim of the book and it seems it has been achieved. I want to reiterate what we already remarked in the book: we do not claim to have the best or only way of developing an Agent-Based Model (ABM) for crises, nor do we claim that our simulations were without limitations. But we do think the book is an extensive foundation from which others can start, either picking up some bits and pieces, deviating from it in specific ways, or extending it in specific ways.

The concerns that are expressed by Edmund are certainly valid. I agree with some of them, but will nuance others. First of all, there is the concern that we seem to abandon the NetLogo implementation and move to Repast. This fact does not make the ABM itself any less valid! In itself it is also an important finding: it is not possible to scale such a complex model in NetLogo beyond around two thousand agents. This is not just a limitation of our particular implementation, but a more general limitation of the platform. It leads to the important challenge of getting more computer scientists involved in developing platforms for social simulations that both support the modelers adequately and provide efficient and scalable implementations.

That the sheer size of the model and its results makes it difficult to trace back the importance and validity of every factor is completely true. We have tried our best to highlight the most important aspects each time, but this leaves open the question of whether we made the right selection of highlighted aspects. As an illustration: we spent two months justifying the results of our simulations of the effectiveness of track-and-tracing apps. We basically concluded that we need much better integrated analysis tools in the simulation platform. NetLogo is geared towards creating one simulation scenario, running the simulation, and analyzing the results based on a few parameters. This is no longer sufficient when we have a model with which we can create many scenarios and many parameters that influence a result. We used R to interpret the flood of data that every scenario produced, but R is not really the most user-friendly tool and is also not specifically meant for analyzing data from social simulations.

Let me jump to the third concern of Edmund and link it to the analysis of the results as well. While we tried to justify the results of our simulation on the effectiveness of the track-and-tracing app, we compared our simulation with an epidemiologically based model. This is described in chapter 12 of the book. Here we encountered a difference in the assumed number of contacts a person has with other persons per day. One can take the results, as quoted by Edmund as well, of 8 or 13 contacts from empirical work and use them in the model. However, the dispute is not about the number of contacts a person has per day, but about what counts as a contact! For the COVID-19 simulations, standing next to a person in the queue of a supermarket for five minutes can count as a contact, while such a contact is not a meaningful contact in the cited literature. Thus, we see that what we take as empirically validated numbers might not at all be the right ones for our purpose. We have tried to justify all parameter values and outcomes in the context for which the simulations were created. We have also done quite some sensitivity analyses, which we did not all report on, simply to keep the volume of the book reasonable. Although we think we did a proper job of justifying all results, that does not mean that one cannot hold different opinions on the values that some parameters should have. It would be very good to check the influence of changes in these parameters on the results. This would also advance scientific insight into the usefulness of complex models like the one we made!

I really think that an ABM crisis response should be institutional. That does not mean that one institution determines the best ABM, but rather that the ABM that is put forward by that institution is the result of a continuous debate among scientists working on ABMs for that type of crisis. For us, one of the more important outcomes of the ASSOCC project is that we really need much better tools to support the types of simulations that are needed in a crisis situation. However, it is very difficult to develop these tools as a single group. A lot of the effort needed is not publishable and thus not valued in an academic environment. I really think that the efforts that have been put into platforms such as NetLogo and Repast are laudable. They have been made possible by some generous grants and institutional support. We argue that this continuous support is also needed in order to be well equipped for the next crisis. But we do not argue that an institution would by definition have the last word on which is the best ABM. In the ideal case it would accumulate all academic efforts, as is done with climate models, but even more restricted models would still be better than a thousand individuals all claiming to have a usable ABM while governments have to react quickly to a crisis.

The final concern of Edmund is about the empirical scale of our simulations. This is completely true! Given the scale and detail of what we can incorporate, we can only simulate some phenomena and certainly not everything around the COVID-19 crisis. We tried to be clear about this limitation. We had discussions about the Unity interface concerning this as well. It is in principle not very difficult to show people walking in the street, taking a car or a bus, etc. However, we decided to show a more abstract representation, just to make clear that our model is not a complete model of a small town functioning in all its aspects. We have very carefully chosen which scenarios we can realistically simulate and from which we can give some insights into reality. Maybe we should also have discussed more explicitly all the scenarios that we did not run, with the reasons why they would be difficult or unrealistic in our ABM. One never likes to discuss all the limitations of one's labor, but it can definitely be very insightful. I have made up for this a little bit by submitting an article to a special issue on predictions with ABMs, in which I explain in more detail what the considerations should be when using a particular ABM to try to predict some state of affairs. Anyone interested to learn more about this can contact me.

To conclude this response to the review, I again express my gratitude for the good and thorough work done. The concerns that were raised are all very valuable to consider. What I have tried to do in this response is to highlight that these concerns should be taken as a call to arms to put effort into social simulation platforms that give better support for creating simulations for a crisis.

References

Dignum, F. (Ed.) (2021) Social Simulation for a Crisis: Results and Lessons from Simulating the COVID-19 Crisis. Springer. DOI:10.1007/978-3-030-76397-8

Chattoe-Brown, E. (2021) A review of “Social Simulation for a Crisis: Results and Lessons from Simulating the COVID-19 Crisis”. Journal of Artificial Societies and Social Simulation, 24(4). https://www.jasss.org/24/4/reviews/1.html


Dignum, F. (2021) Response to the review of Edmund Chattoe-Brown of the book “Social Simulations for a Crisis”. Review of Artificial Societies and Social Simulation, 4th Nov 2021. https://rofasss.org/2021/11/04/dignum-review-response/