
If you want to be cited, calibrate your agent-based model: A Reply to Chattoe-Brown

By Marijn A. Keijzer

This is a reply to a previous comment (Chattoe-Brown 2022).

The social simulation literature has called on its proponents to enhance the quality and realism of their contributions through systematic validation and calibration (Flache et al., 2017). Model validation typically refers to assessing how well the predictions of an agent-based model (ABM) map onto empirically observed patterns or relationships. Calibration, on the other hand, is the process of enhancing the realism of a model by parametrizing it based on empirical data (Boero & Squazzoni, 2005). We would expect that presenting a validated or calibrated model serves as a signal of model quality, and would thus be a desirable characteristic of a paper describing an ABM.

In a recent contribution to RofASSS, Edmund Chattoe-Brown provocatively argued that model validation does not bear fruit for researchers interested in boosting their citations. In a sample of articles on opinion dynamics published in JASSS, he observed that “the sample clearly divides into non-validated research with more citations and validated research with fewer” (Chattoe-Brown, 2022). Well aware of the bias and limitations of the sample at hand, Chattoe-Brown calls for refutation of his hypothesis. An analysis of the corpus of articles in Web of Science, presented here, could serve that goal.

To test whether there exists an effect of model calibration and/or validation on the citation counts of papers, I compare citation counts across a larger number of original research articles on agent-based models. I extracted 11,807 entries from Web of Science by searching for items that contained the phrases “agent-based model”, “agent-based simulation” or “agent-based computational model” in their abstracts.[1] I then labeled all items that mention “validate” in their abstract as validated ABMs and those that mention “calibrate” as calibrated ABMs. This measure is rather crude, of course, as abstracts containing phrases like “we calibrated our model” and “others should calibrate our model” are both labeled as calibrated models. However, if merely mentioning that future research should calibrate or validate the model is not related to citation counts (which I would argue is the case), then this inaccuracy does not introduce bias.
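The labelling step described above can be sketched in a few lines. This is an illustrative reconstruction only: the column names and the toy abstracts below are my own assumptions, not the actual Web of Science export schema, and the substring match deliberately mirrors the crudeness discussed in the text.

```python
import pandas as pd

# Hypothetical stand-in for the Web of Science export; column names are assumed.
records = pd.DataFrame({
    "abstract": [
        "We calibrate an agent-based model of opinion dynamics.",
        "An agent-based simulation; future work should validate it.",
        "A purely theoretical agent-based computational model.",
    ],
    "citations": [12, 3, 40],
})

# Crude keyword labelling, as described in the text: any mention counts,
# whether the authors calibrated/validated the model or merely recommend doing so.
records["validated"] = records["abstract"].str.contains("validat", case=False)
records["calibrated"] = records["abstract"].str.contains("calibrat", case=False)

print(records[["validated", "calibrated"]])
```

Matching on the stems “validat” and “calibrat” also catches inflected forms such as “validated” or “calibrating”.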

The shares of entries that mention calibration or validation are rather small. Overall, just 5.62% of entries mention validation, 3.21% report a calibrated model and 0.65% fall into both categories. The large sample size, however, still enables proper statistical analysis and hypothesis testing.

How are mentions of calibration and validation in the abstract related to citation counts at face value? Bivariate analyses show only minor differences, as revealed in Figure 1. In fact, the distributions of citations for validated and non-validated ABMs (panel A) are remarkably similar. A Wilcoxon test with continuity correction—the nonparametric counterpart of the two-sample t test—fails to detect a difference (W = 3,749,512, p = 0.555). The differences in citations between calibrated and non-calibrated models appear more pronounced, albeit still small. Calibrated ABMs are cited slightly more often (panel B), as also supported by a bivariate test (W = 1,910,772, p < 0.001).
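For readers who want to reproduce this kind of test: R’s two-sample wilcox.test with continuity correction corresponds to the Mann-Whitney U test in SciPy. A minimal sketch, using simulated citation counts in place of the real data (which are available in the OSF repository):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Toy citation counts standing in for the real corpus; purely illustrative.
cites_validated = rng.poisson(lam=20, size=200)
cites_other = rng.poisson(lam=20, size=2000)

# use_continuity=True (the default) applies the continuity correction
# when the asymptotic normal approximation is used.
stat, p = mannwhitneyu(cites_validated, cites_other, method="asymptotic")
print(stat, p)
```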

Figure 1. Distributions of number of citations of all the entries in the dataset for validated (panel A) and calibrated (panel B) ABMs and their averages with standard errors over years (panels C and D)

Age of the paper might be a more important determinant of citation counts, as panels C and D of Figure 1 suggest. Clearly, the age of a paper matters here, because older papers have had much more opportunity to be cited. In particular, papers younger than 10 years seem not to have matured enough for their citation rates to catch up with those of older articles. When comparing the citation counts of purely theoretical models with those of calibrated and validated versions, this covariate must not be omitted, because the latter two are typically much younger. In other words, a positive relationship between model calibration/validation and citation counts could be hidden in the bivariate analysis, as model calibration and validation are recent trends in ABM research.

I run a Poisson regression of the number of citations on whether papers are validated and calibrated (entered simultaneously) and on whether they are both. The age of the paper is taken into account, as well as the number of references the paper itself cites (controlling for reciprocity and literature embeddedness, one might say). Finally, the fields in which the papers were published, as registered by Web of Science, are added to account for potential differences between fields that explain both citation counts and conventions about model calibration and validation.
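The specification of the full model can be sketched as follows. This is a hedged illustration, not the original analysis: the data are simulated, the variable names are my own, and the field fixed effects are omitted for brevity.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
# Simulated stand-in for the corpus: binary validation/calibration flags
# plus the two controls used in the text (paper age, cited references).
df = pd.DataFrame({
    "validated": rng.integers(0, 2, n),
    "calibrated": rng.integers(0, 2, n),
    "age": rng.integers(1, 20, n),
    "cited_refs": rng.integers(5, 80, n),
})
mu = np.exp(0.3 + 0.15 * df["age"] + 0.01 * df["cited_refs"])
df["citations"] = rng.poisson(mu)

# Main effects, their interaction, and the controls (model 3 minus fields).
model = smf.poisson("citations ~ validated * calibrated + age + cited_refs",
                    data=df)
result = model.fit(disp=0)
print(result.params)
```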

Table 1 presents the results from the three models: the main effects of validation and calibration (model 1), the interaction of validation and calibration (model 2) and the full model with control variables (model 3).

Table 1. Poisson regression on the number of citations

                           # Citations
                         (1)         (2)         (3)
Validated               -0.217***   -0.298***   -0.094***
                        (0.012)     (0.014)     (0.014)
Calibrated               0.171***    0.064***    0.076***
                        (0.014)     (0.016)     (0.016)
Validated x Calibrated               0.575***    0.244***
                                    (0.034)     (0.034)
Age                                              0.154***
                                                (0.0005)
Cited references                                 0.013***
                                                (0.0001)
Field included           No          No          Yes
Constant                 2.553***    2.556***    0.337**
                        (0.003)     (0.003)     (0.164)
Observations             11,807      11,807      11,807
AIC                      451,560     451,291     301,639
Note: *p<0.1; **p<0.05; ***p<0.01

The results from the analyses clearly suggest a negative effect of model validation and a positive effect of model calibration on the likelihood of being cited. The hypothesis that was so “badly in need of refutation” (Chattoe-Brown, 2022) will remain unrefuted for now. The effect does turn positive, however, when the abstract mentions calibration as well. In both the uncontrolled (model 2) and controlled (model 3) analyses, combining the coefficients of validation, calibration and their interaction yields a positive effect overall.[2]
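To see why the combined effect is positive, recall that Poisson coefficients are log rate ratios: for papers mentioning both terms, the three coefficients add on the log scale and exponentiate to a multiplicative effect on expected citations. Using the model 3 estimates from Table 1:

```python
import math

# Model 3 coefficients from Table 1 (log scale).
validated, calibrated, interaction = -0.094, 0.076, 0.244

# Combined log effect for papers mentioning both calibration and validation.
combined = validated + calibrated + interaction

# Exponentiating gives the rate ratio relative to papers mentioning neither.
rate_ratio = math.exp(combined)
print(round(combined, 3), round(rate_ratio, 2))  # 0.226 -> about a 25% higher citation rate
```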

The controls in model 3 substantially affect the estimates of the three main factors of interest, while themselves pointing in the expected directions. The age of a paper indeed helps its citation count, as does the number of papers the item itself cites. The fields, furthermore, also take away somewhat from the main effects, but not to a problematic degree. In an additional analysis, I looked at whether some fields are more likely to publish calibrated or validated models and found no substantial relationships. Citation counts do differ between fields, however. The papers in our sample are cited more often in, for example, hematology, emergency medicine and thermodynamics. The ABMs in the sample coming from toxicology, dermatology and religion are on the unlucky side of the equation, receiving fewer citations on average. Finally, I also looked at papers published in JASSS specifically, given Chattoe-Brown’s interest and the nature of this outlet. Surprisingly, the same analyses run on this subsample (N=376) showed a negative relationship between citation counts and model calibration/validation. Does the JASSS readership reveal its taste for artificial societies?

In sum, I find support for Chattoe-Brown’s (2022) hypothesis of a negative relationship between model validation and citation counts for papers presenting ABMs. If you want to be cited, you should not validate your ABM. Calibrated ABMs, on the other hand, are more likely to receive citations. What is more, ABMs that were both calibrated and validated are the most successful papers in the sample. All conclusions were drawn controlling for the age of the paper, the number of papers it cites itself, and (citation conventions in) the field in which it was published.

While the patterns explored in this and Chattoe-Brown’s recent contribution are interesting, or even puzzling, they should not distract from the goal of moving towards realistic agent-based simulations of social systems. In my opinion, models that combine rigorous theory with strong empirical foundations are instrumental to the creation of meaningful and purposeful agent-based models. Perhaps the results presented here should just be taken as another sign that citation counts are a weak signal of academic merit at best.

Data, code and supplementary analyses

All data and code used for this analysis, as well as the results from the supplementary analyses described in the text, are available here: https://osf.io/x9r7j/

Notes

[1] Note that the hyphen between “agent” and “based” does not affect the retrieved corpus. Both contributions that mention “agent based” and “agent-based” were retrieved.

[2] A small caveat to the analysis of the interaction effect is that the marginal improvement of model 2 upon model 1 is rather small (AIC difference of 269). This is likely (partially) due to the small number of papers that mention both calibration and validation (N=77).

Acknowledgements

Marijn Keijzer acknowledges IAST funding from the French National Research Agency (ANR) under the Investments for the Future (Investissements d’Avenir) program, grant ANR-17-EURE-0010.

References

Boero, R., & Squazzoni, F. (2005). Does empirical embeddedness matter? Methodological issues on agent-based models for analytical social science. Journal of Artificial Societies and Social Simulation, 8(4), 1–31. https://www.jasss.org/8/4/6.html

Chattoe-Brown, E. (2022) If You Want To Be Cited, Don’t Validate Your Agent-Based Model: A Tentative Hypothesis Badly In Need of Refutation. Review of Artificial Societies and Social Simulation, 1st Feb 2022. https://rofasss.org/2022/02/01/citing-od-models

Flache, A., Mäs, M., Feliciani, T., Chattoe-Brown, E., Deffuant, G., Huet, S., & Lorenz, J. (2017). Models of social influence: towards the next frontiers. Journal of Artificial Societies and Social Simulation, 20(4). https://doi.org/10.18564/jasss.3521


Keijzer, M. (2022) If you want to be cited, calibrate your agent-based model: Reply to Chattoe-Brown. Review of Artificial Societies and Social Simulation, 9th Mar 2022. https://rofasss.org/2022/03/09/Keijzer-reply-to-Chattoe-Brown


 

The Poverty of Suggestivism – the dangers of “suggests that” modelling

By Bruce Edmonds

Vagueness and refutation

A model[1] is basically composed of two parts (Zeigler 1976, Wartofsky 1979):

  1. A set of entities (such as mathematical equations, logical rules, computer code etc.) which can be used to make some inferences as to the consequences of that set (usually in conjunction with some data and parameter values)
  2. A mapping from this set to what it aims to represent – what the bits mean

Whilst a lot of attention has been paid to the internal rigour of the set of entities and the inferences made from them (1), the mapping to what they represent (2) has often been left implicit or incompletely described – sometimes only indicated by the labels given to its parts. The result is a model that relates only vaguely to its target, suggesting its properties analogically. There is no well-defined way in which the model is to be applied to anything observed; rather, a new mapping is invented each time it is used to think about a particular case. I call this way of modelling “Suggestivism”, because the model merely “suggests” things about what is being modelled.

This is partly a recapitulation of Popper’s critique of vague theories in his book “The Poverty of Historicism” (1957). He characterised such theories as “irrefutable”, because whatever the facts, these theories could be made to fit them. Irrefutability is an indicator of a lack of precise mapping to reality – such vagueness makes refutation very hard. However, it is only an indicator; there may be reasons other than vagueness why a theory cannot be tested – it is the disconnection from well-defined empirical reference that is the issue here.

Some might go as far as suggesting that any model or theory that is not refutable is “unscientific”, but this goes too far, implying a very restricted definition of what ‘science’ is. We need analogies to think about what we are doing and to gain insight into what we are studying, e.g. (Hartmann 1997) – for humans they are unavoidable, ‘baked’ into the way language works (Lakoff 1987). A model might make a set of ideas clear and help map out the consequences of a set of assumptions/structures/processes. Many of these suggestivist models relate to a set of ideas, and it is the ideas that relate to what is observed (albeit informally) (Edmonds 2001). However, such models do not capture anything reliable about what they refer to, and in that sense are not part of the set of established statements and theories that is at the core of science (Arnold 2014).

The dangers of suggestivist modelling

As above, there are valid uses of abstract or theoretical modelling where this is explicitly acknowledged and where no conclusions about observed phenomena are made. So what are the dangers of suggestivist modelling – why am I making such a fuss about it?

Firstly, people often seem to confuse a model used as an analogy – a way of thinking about stuff – with a model that tells us reliably about what we are studying. Thus they give undue weight to the analyses of abstract models that are, in fact, just thought experiments. Making models is a very intimate way of theorising – one spends an extended period of time interacting with one’s model: developing, checking, analysing etc. The result is a particularly strong version of “Kuhnian Spectacles” (Kuhn 1962), causing us to see the world through our model for weeks afterwards. Under this strong influence it is natural to confuse what we can reliably infer about the world with how we are currently perceiving/thinking about it. Good scientists should then pause and wait for this effect to wear off so that they can effectively critique what they have done, its limitations and its implications. However, in the rush to get their work out, modellers often do not do this, resulting in a sloppy set of suggestive interpretations of their modelling.

Secondly, empirical modelling is hard. It is far easier (and, frankly, more fun) to play with non-empirical models. A scientific culture that treats suggestivist modelling as substantial progress and significantly rewards the modellers who do it will effectively divert a lot of modelling effort in this direction. Chattoe-Brown (2018) presented evidence of this in his survey of opinion dynamics models – abstract, suggestivist modelling got far more reward (in terms of citations) than work that tried to relate its model to empirical data in a direct manner. Abstract modelling has a role in science, but if it is easier and more rewarding then the field will become unbalanced. It may give the impression of progress without delivering on that impression. In a more mature science, researchers working on measurement methods (steps from observation to models) and collecting good data are as important as the theorists (Moss 1998).

Thirdly, it is hard to judge suggestivist models. Given that their connection to the modelling target is vague, there cannot be any decisive test of their success. Good modellers should declare the exact purpose of their model, e.g. that it is analogical or merely explores the consequences of theory (Edmonds et al. 2019), but then accept the consequences of this choice – namely, that it excludes making conclusions about the observed world. If the model is a theoretical exploration then the comprehensiveness of the exploration, its scope and the applicability of the model can be judged, but if the model is analogical or illustrative then this is harder. Whilst one model may suggest X, another may suggest the opposite. It is quite easy to fix a model to get the outcomes one wants. Clearly, if a model makes startling suggestions – illustrating totally new ideas or providing a counter-example to widely held assumptions – then it helps science by widening the pool of theories or hypotheses that are considered. However, most suggestivist modelling does not do this.

Fourthly, their sheer flexibility of application causes problems – if one works hard enough, one can invent mappings to a wide range of cases; the limits are only those of our imagination. In effect, having a vague mapping from model to what it models adds huge flexibility, in a similar way to having a large number of free (non-empirical) parameters. This flexibility gives an impression of generality, and many desire simple and general models of complex phenomena. However, the generality is illusory because a different mapping is needed to make the model apply to each case. Given the (1)+(2) definition of a model above, this means that it is in fact a different model for each case – what a model refers to is part of the model. The same flexibility makes such models impossible to refute, since one can always adjust the mapping to save them. The apparent generality and lack of refutation mean that such models hang around in the literature, due to their surface attractiveness.

Finally, these kinds of model are hugely influential beyond the community of modellers to the wider public including policy actors. Narratives that start in abstract models make their way out and can be very influential (Vranckx 1999). Despite the lack of rigorous mapping from model to reality, suggestivist models look impressive, look scientific. For example, very abstract models from the Neo-Classical ‘Chicago School’ of economists supported narratives about the optimal efficiency of markets, leading to a reluctance to regulate them (Krugman 2009). A lack of regulation seemed to be one of the factors behind the 2007/8 economic crash (Baily et al 2008). Modellers may understand that other modellers get over-enthusiastic and over-interpret their models, but others may not. It is the duty of modellers to give an accurate impression of the reliability of any modelling results and not to over-hype them.

How to recognise a suggestivist model

It can be hard to disentangle how empirically vague a model is, because many descriptions of modelling work do not focus on making the mapping to what it represents precise. The reasons for this are various: the modeller might be conflating reality and what is in the model in their mind; the researcher may be new to modelling and not yet have decided what the purpose of their model is; the modeller might be over-keen to establish the importance of their work and so be hyping the motivation and conclusions; they might simply not have got around to thinking enough about the relationship between their model and what it might represent; or they might not have bothered to make the relationship explicit in their description. Whatever the reason, the reader of any description of such work is often left with an archaeological problem: trying to unearth what the relationship might be, based on indirect clues only. The only way to know for certain is to take a case one knows about and try to apply the model to it, but this is a time-consuming process and relies upon having a case with suitable data available. However, there are some indicators, albeit fallible ones, including the following.

  • A relatively simple model is interpreted as explaining a wide range of observed, complex phenomena
  • No data from an observed case study is compared to data from the model (often no data is brought in at all, merely abstract observations) – despite this, conclusions about some observed phenomena are made
  • The purpose of the model is not explicitly declared
  • The language of the paper seems to conflate talking about the model with what is being modelled
  • In the paper there is a sudden abstraction ‘jump’ between the motivation and the description of the model, and back again from the results to their interpretation in terms of that motivation. The abstraction jumps involved are large and justified by some a priori theory or by modelling precedents rather than by evidence.

How to avoid suggestivist modelling

How to avoid the dangers of suggestivist modelling should be clear from the above discussion, but I will make them explicit here.

  • Be clear about the model purpose – that is, what kind of thing the model aims to achieve, which indicates how it should be judged by others (Edmonds et al 2019)
  • Do not make any conclusions about the real world if you have not related the model to any data
  • Do not make any policy conclusions – things that might affect other people’s lives – without at least some independent validation of the model outcomes
  • Document how a model relates (or should relate) to data, the nature of that data and maybe even the process whereby that data should be obtained (Achter et al 2019)
  • Be as explicit as possible about what kinds of phenomena the model applies to – the limits of its scope
  • Keep the language about the model and what is being modelled distinct – for any statement it should be clear whether it is talking about the model or what it models (Edmonds 2020)
  • Highlight any bold assumptions in the specification of the model or describe what empirical foundation there is for them – be honest about these

Conclusion

Models can serve many different purposes (Epstein 2008). This is fine as long as the purpose of each model is made clear, and model results are not interpreted further than their established purpose allows. Research which gives the impression that analogical, illustrative or theoretical modelling can tell us anything reliable about observed complex phenomena is not only sloppy science, but can have a deleterious impact – giving an impression of progress whilst diverting attention from empirically reliable work. Like a bad investment: if it looks too good and too easy to be true, it probably isn’t.

Notes

[1] We often use the word “model” in a lazy way to indicate (1) rather than (1)+(2) in this definition, but a set of entities without any meaning or mapping to anything else is not a model, as it does not represent anything. For example, a random set of equations or program instructions does not make a model.

Acknowledgements

Bruce Edmonds is supported as part of the ESRC-funded, UK part of the “ToRealSim” project, grant number ES/S015159/1.

References

Achter, S., Borit, M., Chattoe-Brown, E., Palaretti, C. & Siebers, P.-O. (2019) Cherchez Le RAT: A Proposed Plan for Augmenting Rigour and Transparency of Data Use in ABM. Review of Artificial Societies and Social Simulation, 4th June 2019. https://rofasss.org/2019/06/04/rat/

Arnold, E. (2014). What’s wrong with social simulations?. The Monist, 97(3), 359-377. DOI:10.5840/monist201497323

Baily, M. N., Litan, R. E., & Johnson, M. S. (2008). The origins of the financial crisis. Fixing Finance Series – Paper 3, The Brookings Institution. https://www.brookings.edu/wp-content/uploads/2016/06/11_origins_crisis_baily_litan.pdf

Chattoe-Brown, E. (2018) What is the earliest example of a social science simulation (that is nonetheless arguably an ABM) and shows real and simulated data in the same figure or table? Review of Artificial Societies and Social Simulation, 11th June 2018. https://rofasss.org/2018/06/11/ecb/

Edmonds, B. (2001) The Use of Models – making MABS actually work. In. Moss, S. and Davidsson, P. (eds.), Multi Agent Based Simulation, Lecture Notes in Artificial Intelligence, 1979:15-32. http://cfpm.org/cpmrep74.html

Edmonds, B. (2020) Basic Modelling Hygiene – keep descriptions about models and what they model clearly distinct. Review of Artificial Societies and Social Simulation, 22nd May 2020. https://rofasss.org/2020/05/22/modelling-hygiene/

Edmonds, B., le Page, C., Bithell, M., Chattoe-Brown, E., Grimm, V., Meyer, R., Montañola-Sales, C., Ormerod, P., Root H. & Squazzoni. F. (2019) Different Modelling Purposes. Journal of Artificial Societies and Social Simulation, 22(3):6. http://jasss.soc.surrey.ac.uk/22/3/6.html.

Epstein, J. M. (2008). Why model?. Journal of artificial societies and social simulation, 11(4), 12. https://jasss.soc.surrey.ac.uk/11/4/12.html

Hartmann, S. (1997): Modelling and the Aims of Science. In: Weingartner, P. et al (ed.) : The Role of Pragmatics in Contemporary Philosophy: Contributions of the Austrian Ludwig Wittgenstein Society. Vol. 5. Wien und Kirchberg: Digi-Buch. pp. 380-385. https://epub.ub.uni-muenchen.de/25393/

Krugman, P. (2009) How Did Economists Get It So Wrong? New York Times, Sept. 2nd 2009. https://www.nytimes.com/2009/09/06/magazine/06Economic-t.html

Kuhn, T.S. (1962) The Structure of Scientific Revolutions. Chicago: University of Chicago Press.

Lakoff, G. (1987) Women, fire, and dangerous things. University of Chicago Press, Chicago.

Morgan, M. S., & Morrison, M. (1999). Models as mediators. Cambridge: Cambridge University Press.

Moss, S. (1998) Social Simulation Models and Reality: Three Approaches. Centre for Policy Modelling  Discussion Paper: CPM-98-35, http://cfpm.org/cpmrep35.html

Popper, K. (1957). The poverty of historicism. Routledge.

Vranckx, An. (1999) Science, Fiction & the Appeal of Complexity. In Aerts, Diederik, Serge Gutwirth, Sonja Smets, and Luk Van Langehove, (eds.) Science, Technology, and Social Change: The Orange Book of “Einstein Meets Magritte.” Brussels: Vrije Universiteit Brussel; Dordrecht: Kluwer., pp. 283–301.

Wartofsky, M. W. (1979). The model muddle: Proposals for an immodest realism. In Models (pp. 1-11). Springer, Dordrecht.

Zeigler, B. P. (1976). Theory of Modeling and Simulation. Wiley Interscience, New York.


Edmonds, B. (2022) The Poverty of Suggestivism – the dangers of "suggests that" modelling. Review of Artificial Societies and Social Simulation, 28th Feb 2022. https://rofasss.org/2022/02/28/poverty-suggestivism


 

If You Want To Be Cited, Don’t Validate Your Agent-Based Model: A Tentative Hypothesis Badly In Need of Refutation

By Edmund Chattoe-Brown

As part of a previous research project, I collected a sample of the Opinion Dynamics (hereafter OD) models published in JASSS that were most highly cited in JASSS. The idea was to understand what styles of OD research were most influential in the journal. In the top 50 on 19.10.21 there were eight such articles. Five were self-contained modelling exercises (Hegselmann and Krause 2002, 58 citations; Deffuant et al. 2002, 35 citations; Salzarulo 2006, 13 citations; Deffuant 2006, 13 citations; and Urbig et al. 2008, 9 citations), two were overviews of OD modelling (Flache et al. 2017, 13 citations and Sobkowicz 2009, 10 citations) and one included an OD example in an article mainly discussing the merits of cellular automata modelling (Hegselmann and Flache 1998, 12 citations). To get into the top 50 on that date, an article needed at least 7 citations.

In parallel, I have been trying to identify Agent-Based Models that are validated (that undergo direct comparison of real and equivalent simulated data). Based on an earlier bibliography (Chattoe-Brown 2020), which I extended to the end of 2021 for JASSS, plus articles described as validated in the highly cited articles listed above, I managed to construct a small and unsystematic sample of validated OD models. (Part of the problem with a systematic sample is that validated models are not readily searchable as a distinct category, and there are too many OD models overall to make reading them all feasible. Also, I suspect, validated models simply remain rare, in line with the larger-scale findings of Dutton and Starbuck (1971, p. 130, table 1) and, discouragingly, much more recently, of Angus and Hassani-Mahmooei (2015, section 4.5, figure 9).)

Obviously, since part of the sample was selected by total number of citations, one cannot make a comparison on that basis, so instead I have used the best possible alternative (given the limitations of the sample) and compared articles on citations per year. The problem here is that attempting validated modelling is relatively new, while older articles inevitably accumulate citations, however slowly. But what I was trying to discover was whether new validated models could be cited at a much higher annual rate without reaching the top 50 (or whether, conversely, older articles could have high enough total citations to get into the top 50 without a particularly impressive annual citation rate). One would hope that, ultimately, validated models would tend to receive more citations than those that were not validated (but see the rather disconcerting related findings of Serra-Garcia and Gneezy 2021). Table 1 shows the results sorted by citations per year.
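The citations-per-year figures in Table 1 are a simple ratio, with the year count defined as in note [2] (inclusive of both the publication year and 2021). A minimal sketch of the computation (the helper function name is my own):

```python
def citations_per_year(citations, published, current=2021):
    # Per note [2]: the year count includes both the publication year and 2021.
    years = current - published + 1
    return citations, years, round(citations / years, 2)

# Hegselmann and Krause (2002): 58 citations over 20 years -> 2.9 per year.
print(citations_per_year(58, 2002))
```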

Article                        Status          JASSS Citations[1]  Years[2]  Citations Per Year
Bernardes et al. 2002          Validated        1                  20        0.05
Bernardes et al. 2001          Validated        2                  21        0.096
Fortunato and Castellano 2007  Validated        2                  15        0.13
Caruso and Castorina 2005      Validated        4                  17        0.24
Chattoe-Brown 2014             Validated        2                   8        0.25
Brousmiche et al. 2016         Validated        2                   6        0.33
Hegselmann and Flache 1998     Non-Validated   12                  24        0.5
Urbig et al. 2008              Non-Validated    9                  14        0.64
Sobkowicz 2009                 Non-Validated   10                  13        0.77
Deffuant 2006                  Non-Validated   13                  16        0.81
Salzarulo 2006                 Non-Validated   13                  16        0.81
Duggins 2017                   Validated        5                   5        1
Deffuant et al. 2002           Non-Validated   35                  20        1.75
Flache et al. 2017             Non-Validated   13                   5        2.6
Hegselmann and Krause 2002     Non-Validated   58                  20        2.9

Table 1. Annual Citation Rates for OD Articles Highly Cited in JASSS (Systematic Sample) and Validated OD Articles in or Cited in JASSS (Unsystematic Sample)

With the notable (and potentially encouraging) exception of Duggins (2017), the most recent validated OD model I have been able to discover in JASSS, the sample clearly divides into non-validated research with more citations and validated research with fewer. The position of Duggins (2017) might suggest greater recent interest in validated OD models. Unfortunately, however, qualitative analysis of the citations suggests that these are not cited as validated models per se (and thus as a potential improvement over non-validated models) but merely as part of general classes of OD model (like those involving social networks or repulsion – moving away from highly discrepant opinions). This tendency to cite validated models without acknowledging that they are validated (and what the implications of that might be) is widespread in the articles I looked at.

Obviously, there is plenty wrong with this analysis. Even looking at citations per annum we are arguably still partially sampling on the dependent variable (articles selected for being widely cited prove to be widely cited!) and the sample of validated OD models is unsystematic (though, in fairness, the challenges of producing a systematic sample are significant[3]). But the aim here is to make a distinctive use of RofASSS as a rapid mode of permanent publication and to think differently about science. If I tried to publish this in a peer-reviewed journal, the amount of labour required to satisfy reviewers about the research design would probably be prohibitive (even if it were possible). As a result, the case to answer about this apparent (and perhaps undesirable) pattern in the data might never see the light of day.

But by publishing quickly in RoFASSS without the filter of peer review I actively want my hypothesis to be rejected or replaced by research based on a better design (and such research may be motivated precisely by my presenting this interesting pattern with all its imperfections). When it comes to scientific progress, the chance to be clearly wrong now could be more useful than the opportunity to be vaguely right at some unknown point in the future.

Acknowledgements

This analysis was funded by the project “Towards Realistic Computational Models Of Social Influence Dynamics” (ES/S015159/1) funded by ESRC via ORA Round 5 (PI: Professor Bruce Edmonds, Centre for Policy Modelling, Manchester Metropolitan University: https://gtr.ukri.org/projects?ref=ES%2FS015159%2F1).

Notes

[1] Note that the validated OD models had their citations counted manually while the high total citation articles had them counted automatically. This may introduce some comparison error but there is no reason to think that either count will be terribly inaccurate.

[2] Including the year of publication and the current year (2021).

[3] Note, however, that there are some checks and balances on sample quality. Highly successful validated OD models would have shown up independently in the top 50. There is thus an upper bound to the impact of the articles I might have missed in manually constructing my “version 1” bibliography. The unsystematic review of 47 articles by Sobkowicz (2009) also checks independently on the absence of validated OD models in JASSS to that date and confirms the rarity of such articles generally. Only four of the articles that he surveys are significantly empirical.

References

Angus, Simon D. and Hassani-Mahmooei, Behrooz (2015) ‘“Anarchy” Reigns: A Quantitative Analysis of Agent-Based Modelling Publication Practices in JASSS, 2001-2012’, Journal of Artificial Societies and Social Simulation, 18(4), October, article 16, <http://jasss.soc.surrey.ac.uk/18/4/16.html>. doi:10.18564/jasss.2952

Bernardes, A. T., Costa, U. M. S., Araujo, A. D. and Stauffer, D. (2001) ‘Damage Spreading, Coarsening Dynamics and Distribution of Political Votes in Sznajd Model on Square Lattice’, International Journal of Modern Physics C: Computational Physics and Physical Computation, 12(2), February, pp. 159-168. doi:10.1142/S0129183101001584

Bernardes, A. T., Stauffer, D. and Kertész, J. (2002) ‘Election Results and the Sznajd Model on Barabasi Network’, The European Physical Journal B: Condensed Matter and Complex Systems, 25(1), January, pp. 123-127. doi:10.1140/e10051-002-0013-y

Brousmiche, Kei-Leo, Kant, Jean-Daniel, Sabouret, Nicolas and Prenot-Guinard, François (2016) ‘From Beliefs to Attitudes: Polias, A Model of Attitude Dynamics Based on Cognitive Modelling and Field Data’, Journal of Artificial Societies and Social Simulation, 19(4), October, article 2, <https://www.jasss.org/19/4/2.html>. doi:10.18564/jasss.3161

Caruso, Filippo and Castorina, Paolo (2005) ‘Opinion Dynamics and Decision of Vote in Bipolar Political Systems’, arXiv > Physics > Physics and Society, 26 March, version 2. doi:10.1142/S0129183105008059

Chattoe-Brown, Edmund (2014) ‘Using Agent Based Modelling to Integrate Data on Attitude Change’, Sociological Research Online, 19(1), February, article 16, <https://www.socresonline.org.uk/19/1/16.html>. doi:10.5153/sro.3315

Chattoe-Brown Edmund (2020) ‘A Bibliography of ABM Research Explicitly Comparing Real and Simulated Data for Validation: Version 1’, CPM Report CPM-20-216, 12 June, <http://cfpm.org/discussionpapers/256>

Deffuant, Guillaume (2006) ‘Comparing Extremism Propagation Patterns in Continuous Opinion Models’, Journal of Artificial Societies and Social Simulation, 9(3), June, article 8, <https://www.jasss.org/9/3/8.html>.

Deffuant, Guillaume, Amblard, Frédéric, Weisbuch, Gérard and Faure, Thierry (2002) ‘How Can Extremism Prevail? A Study Based on the Relative Agreement Interaction Model’, Journal of Artificial Societies and Social Simulation, 5(4), October, article 1, <https://www.jasss.org/5/4/1.html>.

Duggins, Peter (2017) ‘A Psychologically-Motivated Model of Opinion Change with Applications to American Politics’, Journal of Artificial Societies and Social Simulation, 20(1), January, article 13, <http://jasss.soc.surrey.ac.uk/20/1/13.html>. doi:10.18564/jasss.3316

Dutton, John M. and Starbuck, William H. (1971) ‘Computer Simulation Models of Human Behavior: A History of an Intellectual Technology’, IEEE Transactions on Systems, Man, and Cybernetics, SMC-1(2), April, pp. 128-171. doi:10.1109/TSMC.1971.4308269

Flache, Andreas, Mäs, Michael, Feliciani, Thomas, Chattoe-Brown, Edmund, Deffuant, Guillaume, Huet, Sylvie and Lorenz, Jan (2017) ‘Models of Social Influence: Towards the Next Frontiers’, Journal of Artificial Societies and Social Simulation, 20(4), October, article 2, <http://jasss.soc.surrey.ac.uk/20/4/2.html>. doi:10.18564/jasss.3521

Fortunato, Santo and Castellano, Claudio (2007) ‘Scaling and Universality in Proportional Elections’, Physical Review Letters, 99(13), 28 September, article 138701. doi:10.1103/PhysRevLett.99.138701

Hegselmann, Rainer and Flache, Andreas (1998) ‘Understanding Complex Social Dynamics: A Plea For Cellular Automata Based Modelling’, Journal of Artificial Societies and Social Simulation, 1(3), June, article 1, <https://www.jasss.org/1/3/1.html>.

Hegselmann, Rainer and Krause, Ulrich (2002) ‘Opinion Dynamics and Bounded Confidence Models, Analysis, and Simulation’, Journal of Artificial Societies and Social Simulation, 5(3), June, article 2, <http://jasss.soc.surrey.ac.uk/5/3/2.html>.

Salzarulo, Laurent (2006) ‘A Continuous Opinion Dynamics Model Based on the Principle of Meta-Contrast’, Journal of Artificial Societies and Social Simulation, 9(1), January, article 13, <http://jasss.soc.surrey.ac.uk/9/1/13.html>.

Serra-Garcia, Marta and Gneezy, Uri (2021) ‘Nonreplicable Publications are Cited More Than Replicable Ones’, Science Advances, 7, 21 May, article eabd1705. doi:10.1126/sciadv.abd1705

Sobkowicz, Pawel (2009) ‘Modelling Opinion Formation with Physics Tools: Call for Closer Link with Reality’, Journal of Artificial Societies and Social Simulation, 12(1), January, article 11, <http://jasss.soc.surrey.ac.uk/12/1/11.html>.

Urbig, Diemo, Lorenz, Jan and Herzberg, Heiko (2008) ‘Opinion Dynamics: The Effect of the Number of Peers Met at Once’, Journal of Artificial Societies and Social Simulation, 11(2), March, article 4, <http://jasss.soc.surrey.ac.uk/11/2/4.html>.


Chattoe-Brown, E. (2022) If You Want To Be Cited, Don’t Validate Your Agent-Based Model: A Tentative Hypothesis Badly In Need of Refutation. Review of Artificial Societies and Social Simulation, 1st Feb 2022. https://rofasss.org/2022/02/01/citing-od-models


 

Today We Have Naming Of Parts: A Possible Way Out Of Some Terminological Problems With ABM

By Edmund Chattoe-Brown


Today we have naming of parts. Yesterday,
We had daily cleaning. And tomorrow morning,
We shall have what to do after firing. But to-day,
Today we have naming of parts. Japonica
Glistens like coral in all of the neighbouring gardens,
And today we have naming of parts.
(Naming of Parts, Henry Reed, 1942)

It is not difficult to establish by casual reading that there are almost as many ways of using crucial terms like calibration and validation in ABM as there are actual instances of their use. This creates several damaging problems for scientific progress in the field. Firstly, when two different researchers both say they “validated” their ABMs they may mean different specific scientific activities. This makes it hard for readers to evaluate research generally, particularly if researchers assume that it is obvious what their terms mean (rather than explaining explicitly what they did in their analysis). Secondly, based on this, each researcher may feel that the other has not really validated their ABM but has instead done something to which a different name should more properly be given. This compounds the possible confusion in debate. Thirdly, there is a danger that researchers may rhetorically favour (perhaps unconsciously) uses that, for example, make their research sound more robustly empirical than it actually is. For example, validation is sometimes used to mean consistency with stylised facts (rather than, say, correspondence with a specific time series according to some formal measure). But we often have no way of telling what the status of the presented stylised facts is. Are they an effective summary of what is known in a field? Are they the facts on which most researchers agree or for which the available data presents the clearest picture? (Less reputably, can readers be confident that they were not selected for presentation because of their correspondence?) Fourthly, because these terms are used differently by different researchers it is possible that valuable scientific activities that “should” have agreed labels will “slip down the terminological cracks” (either for the individual or for the ABM community generally). Apart from clear labels avoiding confusion for others, they may help to avoid confusion for you too!

But apart from these problems (and there may be others but these are not the main thrust of my argument here) there is also a potential impasse. There simply doesn’t seem to be any value in arguing about what the “correct” meaning of validation (for example) should be. Because these are merely labels there is no objective way to resolve this issue. Further, even if we undertook to agree the terminology collectively, each individual would tend to argue for their own interpretation without solid grounds (because there are none to be had) and any collective decision would probably therefore be unenforceable. If we decide to invent arbitrary new terminology from scratch we not only run the risk of adding to the existing confusion of terms (rather than reducing it) but it is also quite likely that everyone will find the new terms unhelpful.

Unfortunately, however, we probably cannot do without labels for these scientific activities involved in quality controlling ABMs. If we had to describe everything we did without any technical shorthand, presenting research might well become impossibly unwieldy.

My proposed solution is therefore to invent terms from scratch (so we don’t end up arguing about our different customary usages to no purpose) but to do so on the basis of actual scientific practices reported in published research. For example, we might call the comparison of corresponding real and simulated data (which at least has the endorsement of the much used Gilbert and Troitzsch 2005 – see pp. 15-19 – to be referred to as validation) CORAS – Comparison Of Real And Simulated. Similarly, assigning values to parameters given the assumptions of model “structures” might be called PANV – Parameters Assigned Numerical Values.
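To make CORAS concrete: a minimal sketch (in Python; the data and the choice of RMSE as the comparison measure are illustrative assumptions, not taken from any model discussed here) of comparing corresponding real and simulated series might look like this:

```python
# Minimal CORAS sketch: compare a real and a simulated time series
# with root mean squared error. The data here are invented purely
# for illustration.
import math

real      = [10.0, 12.0, 15.0, 19.0, 24.0]   # observed values
simulated = [11.0, 12.5, 14.0, 20.0, 23.0]   # model output

def rmse(observed, predicted):
    """Root mean squared error between two equal-length series."""
    assert len(observed) == len(predicted)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))

print(round(rmse(real, simulated), 3))  # → 0.922
```

Whether RMSE (rather than, say, a correlation or a distributional test) is the right comparison is exactly the kind of question the label CORAS leaves open; the label names the activity, not the measure.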

It is very important to be clear what the intention is here. Naming cannot solve scientific problems or disagreements. (Indeed, failure to grasp this may well be why our terminology is currently so muddled as people try to get their different positions through “on the nod”.) For example, if we do not believe that correspondence with stylised facts and comparison measures on time series have equivalent scientific status then we will have to agree distinct labels for them and have the debate about their respective value separately. Perhaps the former could be called COSF – Comparison Of Stylised Facts. But it seems plainly easier to describe specific scientific activities accurately and then find labels for them than to have to wade through the existing marsh of ambiguous terminology and try to extract the associated science. An example of a practice which does not seem to have even one generally agreed label (and therefore seems to be neglected in ABM as a practice) is JAMS – Justifying A Model Structure. (Why are your agents adaptive rather than habitual or rational? Why do they mix randomly rather than in social networks?)

Obviously, there still needs to be community agreement for such a convention to be useful (and this may need to be backed institutionally for example by reviewing requirements). But the logic of the approach avoids several existing problems. Firstly, while the labels are useful shorthand, they are not arbitrary. Each can be traced back to a clearly definable scientific practice. Secondly, this approach steers a course between the Scylla of fruitless arguments from current muddled usage and the Charybdis of a novel set of terminology that is equally unhelpful to everybody. (Even if people cannot agree on labels, they know how they built and evaluated their ABMs so they can choose – or create – new labels accordingly.) Thirdly, the proposed logic is extendable. As we clarify our thinking, we can use it to label (or improve the labels of) any current set of scientific practices. We do not have to worry that we will run out of plausible words in everyday usage.

Below I suggest some more scientific practices and possible terms for them. (You will see that I have also tried to make the terms as pronounceable and distinct as possible.)

Practice | Term
Checking the results of an ABM by building another.[1] | CAMWA (Checking A Model With Another).
Checking ABM code behaves as intended (for example by debugging procedures, destructive testing using extreme values and so on). | TAMAD (Testing A Model Against Description).
Justifying the structure of the environment in which agents act. | JEM (Justifying the Environment of a Model): This is again a process that may pass unnoticed in ABM typically. For example, by assuming that agents only consider ethnic composition, the Schelling Model (Schelling 1969, 1971) does not “allow” locations to be desirable because, for example, they are near good schools. This contradicts what was known empirically well before (see, for example, Rossi 1955) and it isn’t clear whether simply saying that your interest is in an “abstract” model can justify this level of empirical neglect.
Finding out what effect parameter values have on ABM behaviour. | EVOPE (Exploring Value Of Parameter Effects).
Exploring the sensitivity of an ABM to structural assumptions not justified empirically (see Chattoe-Brown 2021). | ESOSA (Exploring the Sensitivity Of Structural Assumptions).

Clearly this list is incomplete, but I think characterising the scientific practices in existing ABM and naming them distinctively would be more effective as a collective enterprise.
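As an illustration of what TAMAD-style destructive testing with extreme values might involve, here is a minimal sketch (the opinion-update rule is an invented toy, not a model from the literature above):

```python
# TAMAD sketch: check that a toy model routine behaves as described,
# including at extreme parameter values. The update rule below is an
# invented stand-in, not any published ABM.

def update_opinion(opinion, neighbour, mu):
    """Move an opinion toward a neighbour's opinion by a fraction mu in [0, 1]."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return opinion + mu * (neighbour - opinion)

# Destructive tests at the extremes of the parameter range:
assert update_opinion(0.0, 1.0, 0.0) == 0.0   # mu = 0: no movement at all
assert update_opinion(0.0, 1.0, 1.0) == 1.0   # mu = 1: full convergence
assert update_opinion(0.0, 1.0, 0.5) == 0.5   # midpoint behaves as described
try:
    update_opinion(0.0, 1.0, 1.5)             # out-of-range mu must fail loudly
except ValueError:
    pass
else:
    raise AssertionError("out-of-range mu was silently accepted")
```

The point of the label is that this activity (checking code against its description) is distinct from, say, EVOPE (exploring what the parameter does to aggregate behaviour), even though both involve varying mu.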

Acknowledgements

This research is funded by the project “Towards Realistic Computational Models Of Social Influence Dynamics” (ES/S015159/1) funded by ESRC via ORA Round 5 (PI: Professor Bruce Edmonds, Centre for Policy Modelling, Manchester Metropolitan University: https://gtr.ukri.org/projects?ref=ES%2FS015159%2F1).

Notes

[1] It is likely that we will have to invent terms for subcategories of practices which differ in their aims or warranted conclusions. For example, rerunning the code of the original author (CAMWOC – Checking A Model With Original Code), building a new ABM from a formal description like ODD (CAMUS – Checking A Model Using Specification) and building a new ABM from the published description (CAMAP – Checking A Model As Published, see Chattoe-Brown et al. 2021).

References

Chattoe-Brown, Edmund (2021) ‘Why Questions Like “Do Networks Matter?” Matter to Methodology: How Agent-Based Modelling Makes It Possible to Answer Them’, International Journal of Social Research Methodology, 24(4), pp. 429-442. doi:10.1080/13645579.2020.1801602

Chattoe-Brown, Edmund, Gilbert, Nigel, Robertson, Duncan A. and Watts Christopher (2021) ‘Reproduction as a Means of Evaluating Policy Models: A Case Study of a COVID-19 Simulation’, medRXiv, 23 February. doi:10.1101/2021.01.29.21250743

Gilbert, Nigel and Troitzsch, Klaus G. (2005) Simulation for the Social Scientist, second edition (Maidenhead: Open University Press).

Rossi, Peter H. (1955) Why Families Move: A Study in the Social Psychology of Urban Residential Mobility (Glencoe, IL, Free Press).

Schelling, Thomas C. (1969) ‘Models of Segregation’, American Economic Review, 59(2), May, pp. 488-493. (available at https://www.jstor.org/stable/1823701)

Schelling, Thomas C. (1971) ‘Dynamic Models of Segregation’, Journal of Mathematical Sociology, 1(2), pp. 143-186. doi:10.1080/0022250X.1971.9989794


Chattoe-Brown, E. (2022) Today We Have Naming Of Parts: A Possible Way Out Of Some Terminological Problems With ABM. Review of Artificial Societies and Social Simulation, 11th January 2022. https://rofasss.org/2022/01/11/naming-of-parts/


 

Challenges and opportunities in expanding ABM to other fields: the example of psychology

By Dino Carpentras

Centre for Social Issues Research, Department of Psychology, University of Limerick

The loop of isolation

One of the problems discussed during the last public meeting of the European Social Simulation Association (ESSA) at the Social Simulation Conference 2021 was that of reaching communities outside the ABM one. This is a serious problem, as we risk getting trapped in a vicious cycle of isolation.

The cycle can be explained as follows. (a) Many fields are not familiar with ABM methods and standards. As a result, (b) both reviewers and editors struggle to understand and evaluate the quality of an ABM paper. In general, this translates into a higher rejection rate and a much longer time before publication. Consequently, (c) fewer researchers in ABM are willing to send their work to other communities and, in general, fewer ABM works are published in the journals of other communities. Fewer articles using ABM mean that (d) fewer people become aware of ABM, understand its methods and standards, or even consider it an established research method.

Another point to consider is that, as time passes, each field evolves and develops new standards and procedures. Unfortunately, if two fields are not sufficiently aware of each other, the new procedures will appear even more alien to members of the other community, reinforcing the previously discussed cycle. A schematic of this is offered in Figure 1.


Figure 1: Vicious cycle of isolation

The challenge

Of course, a “brute force” solution would be to keep sending articles to journals in different fields until they get published. However, this would be extremely expensive in terms of time, and most researchers would probably not be happy to follow this path.

A more elaborate solution could be framed as “progressively getting to know each other.” This would consist of modellers becoming more familiar with the target community and vice versa. In this way, people from ABM would be able to better understand the jargon, the assumptions, and even what counts as interesting enough to be the main result of a paper in a specific discipline. This would make it easier for members of our community to communicate research results using the language and methods familiar to the other field.

At the same time, researchers in the other field could slowly integrate ABM into their work, showing its potential and making it appear less alien to their peers. All of this would reverse the previously discussed vicious cycle, producing a virtuous one which would bring the two fields closer and closer.

Unfortunately, such a goal cannot be achieved overnight, as it will probably require several events, collaborations, publications, and perhaps several years (or even decades!). As a result, however, our field would become familiar to and recognized by multiple other fields, enormously increasing the scientific impact of our research as well as the number of people working in ABM.

In this short communication, I would like to, firstly, highlight the importance and the challenges of reaching out to other fields and, secondly, show a practical example with the field of psychology. I have chosen this field for no particular reason, besides the fact that I am currently working in the department of psychology. This gave me the opportunity to interact with several researchers in this field.

In the next sections, I will summarize the main points of several informal discussions with these researchers. Specifically, I will try to highlight what they reported to be promising or interesting in ABM and also what felt alien or problematic to them.

Let me also stress that this is not intended to be a complete overview, nor should it be read as a summary of “what every psychologist thinks about ABM.” Instead, it is simply a summary of the discussions I have had so far. What I hope is that this will be at least a little useful to our community for building better connections with other fields.

The elephant in the room

Before moving to the list of comments on ABM I have collected, I want to address one point which appeared almost every time I discussed ABM with psychologists. In fact, it appears almost every time I discuss ABM with people outside our field. This is the problem of experiments and validation.

I know there was recently a massive discussion on the SimSoc mailing list on opinion dynamics and validation, and this discussion will probably continue. Therefore, I am not going to discuss whether all models should be tested, whether a validated model should be considered superior, etc. Indeed, I do not want to discuss at all whether validation should be considered important within our community. Instead, I want to discuss how important it is when interacting with other communities.

Indeed, many other fields give empirical data and validation a key role, having even developed different methods to test the quality of a hypothesis or a model against empirical data (e.g. calculation of the p-value, Krishnaiah 1980). I have also repeatedly experienced disappointment, or even mockery, when I explained to non-ABM people that the model I was describing (e.g. the Deffuant model of opinion dynamics) was not empirically validated. In one case, a person even laughed at me for this.

Unfortunately, many people who are not familiar with ABM end up considering it almost like a “nice exercise,” or even “not a real science.” This could be extremely dangerous for our field. Indeed, if many researchers start thinking of ABM as a lesser science, communication with other fields – as well as obtaining funding for research – will get exponentially harder for our community.

Also, please let me stress again: do not “confuse the message with the messenger.” I am not claiming here that an unvalidated model should be considered inferior, or anything like that. What I am saying is that many people outside our field think in this fashion, and this may eventually turn into a much bigger problem for us.

I will discuss this point further in the conclusion; however, I will not claim that we should get rid of “pure models,” or that every model should be validated. What I will claim is that we should promote more empirical work, as it will allow us to interact more easily with other fields.

Further points

In this section, I have collected (in no particular order) different comments and suggestions I have received from psychologists on the topic of ABM. All of them had at least some experience of working side by side with a researcher developing ABMs.

Also in this case, please remember that these are not my claims, but feedback I received. Furthermore, they should not be read as “what ABM is,” but rather as “how ABM may look to people in another field.”

  1. Some psychologists showed interest in the possibility of having loops in ABMs, which allow for relationships that go beyond simple cause and effect. Indeed, several models in psychology are structured in the form “parameter X influences parameter Y” (where Y cannot in turn influence X to close a loop). While this approach is very common in psychology, many researchers are not satisfied with it, making ABMs a very good opportunity for the development of more realistic models.
  2. Some psychologists said that at first sight ABM looks very interesting. However, the extensive use of equations can confuse, or even scare, people who are not used to them.
  3. Some praised Schelling’s model (Schelling 1971), especially the approach of developing a hypothesis and then using an ABM to falsify it.
  4. Some criticized that it is often not clear what an ABM should be used for, or what such a model “is telling us.”
  5. Similarly, the use of models with a large number of parameters was criticized, as “[these models] can eventually produce any result.”
  6. Another confusion that appeared multiple times was that it is often not clear whether the model should be analysed and interpreted at the individual level (e.g. agents which start from state A often end up in state B) or at the more global level (e.g. distribution A results in distribution B).
  7. Another major complaint was that psychological measures are nominal or ordinal, while many models assume interval-like variables.
  8. Another criticism was that agents often all behave in the same way, without personal differences being included.
  9. In psychology there is a lot of attention on sample size, and on whether it is big enough to produce significant results. Some stressed that in many ABM works it is often not clear whether the sample size (i.e. the number of agents) is sufficient to support the analysis.
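Point 9 can be illustrated with a toy sketch: in any stochastic model, run-to-run variability shrinks as the number of agents grows, so results from small populations may be noise. The “model” below is just independent random binary choices, standing in for a real ABM; it is not any model from the literature discussed here.

```python
# Toy illustration of point 9: variability of an aggregate outcome across
# repeated runs shrinks as the number of agents grows. The "model" is just
# independent coin flips, standing in for any stochastic ABM.
import random
import statistics

def run_model(n_agents, rng):
    """Fraction of agents ending in state B; each agent decides independently."""
    return sum(rng.random() < 0.5 for _ in range(n_agents)) / n_agents

rng = random.Random(42)
small = [run_model(10, rng) for _ in range(200)]    # 10 agents per run
large = [run_model(1000, rng) for _ in range(200)]  # 1000 agents per run

print(round(statistics.stdev(small), 3))  # noisy outcome measure
print(round(statistics.stdev(large), 3))  # much tighter across runs
```

A reader of a 10-agent simulation therefore cannot tell whether a reported difference is substantive or sampling noise, which is exactly the concern psychologists raise about sample size.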

Conclusion

I would like to stress again that these comments are not supposed to represent the thoughts of every psychologist, nor am I suggesting that all the ABM literature should adapt to them or that they are always correct. For example, in my opinion, points 5 and 8 push in opposite directions: one aims at simpler models and the other pushes towards complexity. Similarly, I do not think we should decrease the number of equations in our works to meet point 2. However, I think we should consider this feedback when planning interactions with the psychology community.

As mentioned before, a crucial role in interacting with other communities is played by experiments and validation. Point 6, and especially points 7 and 9, suggest how members of this community often look for 1-to-1 relationships between the agents of simulations and people in the real world.


Figure 2: (left) Empirical ABM acting as a bridge between theoretical ABM and other research fields. (Right) as the relationship between ABM and the other field matures, people become familiar with ABM standards and a direct link to theoretical ABM can be established.

As suggested by someone during the already-mentioned discussion on the SimSoc mailing list, this could be addressed by introducing a new figure (or, equivalently, a new research field) dedicated to empirical work in ABM. Following this solution, theoretical modellers could keep developing models without having to worry about validation, similar to the work carried out by theoretical researchers in physics. At the same time, we would also have a stream of research dedicated to “experimental ABM.” People working on this topic would further explore the connection between models and the empirical world through experiments and validation processes. Of course, the two should not be mutually exclusive, as a researcher (or a piece of research) may still fall into both categories. However, having this distinction may help in giving more space to empirical work.

I believe that the role of experimental ABM could be crucial for developing good interactions between ABM and other communities. Indeed, this type of research could be accepted much more easily by other communities, producing better interactions with ABM. In particular, mentioning experiments and validation could strongly decrease the initial mistrust that many people show when discussing ABM. Furthermore, as ABM develops stronger connections with another field, and our methods and standards become more familiar, we would probably also observe more people from the other community starting to look into more theoretical ABM approaches and what-if scenarios (see Figure 2).

References

Krishnaiah, P. R. (Ed.). (1980). Handbook of Statistics (Vol. 1). Motilal Banarsidass Publishers.

Schelling, T. C. (1971). Dynamic models of segregation. Journal of Mathematical Sociology, 1(2), 143-186.

Edmonds, B. and Moss, S. (2005) From KISS to KIDS – an ‘anti-simplistic’ modelling approach. In P. Davidsson et al. (Eds.): Multi Agent Based Simulation 2004. Springer, Lecture Notes in Artificial Intelligence, 3415:130–144.


Carpentras, D. (2021) Challenges and opportunities in expanding ABM to other fields: the example of psychology. Review of Artificial Societies and Social Simulation, 20th December 2021. https://rofasss.org/2021/12/20/challenges/


 

Benefits of Open Research in Social Simulation: An Early-Career Researcher’s Perspective

By Hyesop Shin

Research Associate at the School of Geographical and Earth Sciences, University of Glasgow, UK

In March 2017, in the first year of my PhD, I attended a talk at the Microsoft Research Lab in Cambridge, UK, about the importance of reproducibility and replicability in science. Inspired by the talk, I moved my research beyond my word processor and hard disk to open repositories and social media. There have been some challenges in learning from other people’s work and replicating it in my project, but I found it more beneficial to share my problems and solutions for other people who may have encountered the same issues.

Having spoken to many early career researchers (ECRs) about the need for open science, specifically whether sharing code is essential, the consensus was that it was not an essential component of their degree. A few answered that they were too embarrassed to share their code online because it was not written well enough. I somewhat empathised with their opinions, but at the same time would insist that open research brings more benefit than shame.

I wrote this short piece to openly discuss the benefits of conducting open research and to suggest some points that ECRs should keep in mind. Throughout, I include some screenshots taken from my PhD work (Shin, 2021). I conclude by inviting personal experiences or other thoughts that might give more insight to the audience.

Benefits of Aiming for an Open Project

I argue here that being transparent and honest about your model development strengthens the credibility of the research. To this end, my thesis shared the original data, scripts with annotations that are downloadable and executable, and wiki pages summarising the outcomes and interpretations (see Figure 1 for examples). This enables scholars and technicians to visit the repository if they are interested in the source code or outcomes. People can also comment if any errors or bugs are identified, or if the model does not execute on their machine, and may suggest alternative ways to tackle the same problem. Even during development, many developers share their work via online repositories (e.g. Github, Gitlab) and social media to ask for advice. Agent-based models are mostly uploaded to CoMSeS.net (previously named OpenABM). All of this can improve the quality of research.


Figure 1 A screenshot of a Github page showing how open platforms can help other people to understand the outcomes step by step

More practically, one can learn new ideas by helping each other. If there is a technical issue that cannot be solved, the problem should not be kept hidden, but rather be opened up and solved together with experts online and offline. Figure 2 is a pragmatic example of posing questions to a wide range of developers on Stackoverflow – an online community of programmers who share and build code. Providing my NetLogo code, I asked how to send an agent group from one location to another. An anonymous person, whose ID was JenB, kindly responded with a new set of code, which helped me structure my code more effectively.


Figure 2 Raising a question about sending agents from one location to another in NetLogo

Another example concerns the errors I encountered while running NetLogo with the R package “nlrx” (Salecker et al., 2019). Here, R was used to submit iterative NetLogo jobs to an HPC (High Performance Computing) cluster to improve execution speed. However, much to my surprise, I received error messages due to the early termination of failed HPC jobs. Not knowing what to do, I posed a question to the developer of the package (see Figure 3) and luckily learned that the R ecosystem stores all assigned objects in RAM, and that even with gigabytes of RAM it struggles to write 96,822 patches over 8,764 ticks to a spreadsheet.

A Stack Overflow thread also kindly informed me that NetLogo has a memory ceiling of 1 GB[i] and keeps each run in memory before it shuts down. Thus, if the model is huge and requires several iterations, the execution speed is likely to decrease after a few iterations. Before I found this information, I could not understand why the model took 1 hour and 20 minutes to finish the first run but struggled to maintain that speed by the twentieth run. Hence, sharing the technical obstacles that occur in the middle of research can save a lot of time, even for those who are contemplating similar research.
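For reference, the FAQ entry in note [i] explains that this ceiling is NetLogo's default Java heap size rather than a hard limit: it can be raised by editing the `-Xmx` option in NetLogo's configuration file. A sketch, assuming a recent desktop release where the file is `NetLogo.cfg` (the exact file name and location vary by NetLogo version and operating system; on macOS the setting lives in `Info.plist` instead):

```
[JVMOptions]
-Xmx4096m
```

Here `-Xmx4096m` raises the ceiling to 4 GB; NetLogo must be restarted for the change to take effect.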


Figure 3 Comments posted on an online repository regarding the memory issue that NetLogo and R encountered

The Future for Open Research

For future quantitative studies in social simulation, this paper suggests that students and early-career researchers should acclimatise themselves to using open-source platforms to conduct sustainable research. Just as clarity, conciseness, and coherence are held up as the important C's of good writing, good programming should take the following points into consideration.

First is clarity and conciseness (C&C). Here, clarity means that scripts should be neatly documented. The computer does not care whether code is messy or neat, only whether it is syntactically correct, but neatness matters when other people attempt to understand the task. If the outcome is the same, it is always better to write clearer and simpler code, both for other people and for future upgrades. Researchers should therefore study other people's work and learn how to code effectively. Another way to maintain clarity is to give new variables descriptive and distinctive names. This might seem to contradict conciseness, but it is important: a common mistake is to assign abstract variable names such as LP1, LP2, …, LP10, which seem clear and concise to the model builder but are much harder for others reviewing the code. Einstein's famous dictum, "Everything should be made as simple as possible, but not simpler," is the phrase model builders should always keep in mind. Hence, instead of LP9, names such as LandPriceIncreaseRate2009 (camel case) or landprice_incrate_2009 (snake case) are far easier for reviewers to understand.
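The naming point can be made concrete with a tiny example (the land-price values and names below are made up for illustration):

```python
# Hard to review: what does LP9 mean without a codebook?
LP9 = 0.031

# The same value with descriptive names, in the two common styles:
LandPriceIncreaseRate2009 = 0.031   # camel case
landprice_incrate_2009 = 0.031      # snake case

# A reviewer can now check a formula at a glance:
land_price_2010 = 100.0 * (1 + landprice_incrate_2009)
```

The computation is identical in all three cases; only the reviewer's effort changes.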

Second is reproducibility and replicability (R&R). To be reproducible and replicable, the script should, first of all, execute without errors on other people's machines, and any known errors or bugs should be reported. It is also useful to document the required libraries and dependencies. This is quite important because different OSs (operating systems) handle package installation differently. For instance, the sf package in R installs slightly differently across OSs: on Windows and macOS it can be installed from the binary package, while on Linux GDAL (to read and write vector and raster data), Proj (which deals with projections), and GEOS (which provides geospatial functions) must be installed separately before the package itself. Finally, it would be very helpful if unit testing were included in the model. While R and Python provide splendid examples in their vignettes, NetLogo offers its Models Library but goes no further than that. Offering unit-testing examples can aid understanding when the whole model is too complicated for others to comprehend. It also signals that the modeller has full control of the model, because without unit tests the verification process becomes error-prone. The good news is that NetLogo has recently released the Beginner's Interactive Dictionary, with friendly explanations, videos, and code examples[ii].
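As an illustration of what unit testing a model component looks like, here is a minimal sketch in Python using the standard unittest module. The helper being tested (a hypothetical function that moves a 1-D coordinate towards a target, not taken from the thesis) stands in for any small, checkable piece of model logic:

```python
import unittest

def step_towards(position, target, speed=1.0):
    """Move `position` towards `target` by at most `speed` per call."""
    delta = target - position
    if abs(delta) <= speed:
        return target  # close enough: arrive exactly, never overshoot
    return position + (speed if delta > 0 else -speed)

class TestStepTowards(unittest.TestCase):
    def test_moves_towards_target(self):
        self.assertEqual(step_towards(0.0, 5.0), 1.0)

    def test_arrives_without_overshooting(self):
        self.assertEqual(step_towards(4.5, 5.0), 5.0)

    def test_moves_in_negative_direction(self):
        self.assertEqual(step_towards(5.0, 0.0), 4.0)

if __name__ == "__main__":
    unittest.main(argv=["step_towards_test"], exit=False)
```

Each test pins down one property of the helper; a reviewer who cannot digest the whole model can still verify that its parts behave as claimed.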

Third is to maintain version control. In terms of sustainability, researchers should be aware of software maintenance. Much programming software relies on libraries and packages built against a particular version. If the platform is upgraded and no longer accepts code written for previous versions, package developers need to keep updating their work to run on the new version. For example, NetLogo 6.0 changed significantly compared to the 5.x versions. The biggest change was the replacement of tasks[iii] by anonymous procedures (Wilensky, 1999). Tasks are no longer primitives; the same jobs are now written with the arrow syntax. For example, code that previously printed each element of a list as foreach [1 2 3] [ show ? ] is now written as foreach [1 2 3] [ x -> show x ]. Models that have not been converted to the new version can be opened read-only but cannot be executed. Geospatial packages in R, such as rgdal and sf, have also struggled whenever a major update was made to the packages themselves or to the R version, due to their many dependencies. Even ArcGIS, a UI (User Interface) application, had issues when it was upgraded from version 9.3 to 10: projects scripted in VBA under 9.3 broke because the new, Python-based version no longer recognised the functions. This is another example of why backward compatibility and deprecation mechanisms are important.
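Breakages like these are much easier to diagnose later if each run records the exact package versions it used. A minimal sketch in Python (the function name and workflow are my own illustration, not from the thesis):

```python
# Sketch: record the versions of the packages a model run depends on, so
# results can be reproduced later even after upstream packages change.
import importlib.metadata
import json

def snapshot_environment(package_names):
    """Return a {package: version} mapping for the named installed packages."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions

# One would save this alongside the model outputs, e.g.:
# with open("env.json", "w") as f:
#     json.dump(snapshot_environment(["numpy", "pandas"]), f, indent=2)
```

The same habit applies outside Python: `sessionInfo()` in R, or noting the NetLogo release in the model's Info tab, serves the same purpose.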

Lastly, for more advanced users, it is also recommended to use a collaborative platform that executes every result from the code under the exact software versions used. One such platform is Code Ocean, which the Nature research team recently chose for peer-reviewing code (Perkel, 2019). The Nature editors and peer reviewers strongly believed that coding has become the norm across many disciplines, and have therefore asserted that the modelling process, including the quality of the data, conciseness, reproducibility, and documentation of the model, should be a requirement. Although the training procedure can be difficult at first, it will lead researchers to conduct themselves with more responsibility.

Looking for Opinions

With the advent of the era of big data and data science, where people collaborate online and a 'sharing is caring' atmosphere has become the norm (Arribas-Bel et al., 2021; Lovelace, 2021), I insist that open research should no longer be optional. However, one may argue that although open research is by far an excellent model that can benefit many of today's projects, certain risks might concern ECRs, such as intellectual property issues, code quality, and technical security. Thus, if you have a different opinion on this issue, or simply wish to add your own experiences from a PhD in social simulation, please add your thoughts in a thread.

Notes

[i] http://ccl.northwestern.edu/netlogo/docs/faq.html#how-big-can-my-model-be-how-many-turtles-patches-procedures-buttons-and-so-on-can-my-model-contain

[ii] https://ccl.northwestern.edu/netlogo/bind/

[iii] Tasks could contain expressions, such as x + y, or operate over lists, such as [1 2 3 4 5]

References

Arribas-Bel, D., Alvanides, S., Batty, M., Crooks, A., See, L., & Wolf, L. (2021). Urban data/code: A new EP-B section. Environment and Planning B: Urban Analytics and City Science, 23998083211059670. https://doi.org/10.1177/23998083211059670

Lovelace, R. (2021). Open source tools for geographic analysis in transport planning. Journal of Geographical Systems, 23(4), 547–578. https://doi.org/10.1007/s10109-020-00342-2

Perkel, J. M. (2019). Make code accessible with these cloud services. Nature, 575(7781), 247. https://doi.org/10.1038/d41586-019-03366-x

Salecker, J., Sciaini, M., Meyer, K. M., & Wiegand, K. (2019). The nlrx r package: A next-generation framework for reproducible NetLogo model analyses. Methods in Ecology and Evolution, 10(11), 1854–1863. https://doi.org/10.1111/2041-210X.13286

Shin, H. (2021). Assessing Health Vulnerability to Air Pollution in Seoul Using an Agent-Based Simulation. University of Cambridge. https://doi.org/10.17863/CAM.65615

Wilensky, U. (1999). NetLogo. Northwestern University: Evanston, IL, USA. https://ccl.northwestern.edu/netlogo/


Shin, H. (2021) Benefits of Open Research in Social Simulation: An Early-Career Researcher’s Perspective. Review of Artificial Societies and Social Simulation, 24th Nov 2021. https://rofasss.org/2021/11/23/benefits-open-research/


 

Reply to Frank Dignum

By Edmund Chattoe-Brown

This is a reply to Frank Dignum’s reply (about Edmund Chattoe-Brown’s review of Frank’s book)

As my academic career continues, I have become more and more interested in the way that people justify their modelling choices, for example, almost every Agent-Based Modeller makes approving noises about validation (in the sense of comparing real and simulated data) but only a handful actually try to do it (Chattoe-Brown 2020). Thus I think two specific statements that Frank makes in his response should be considered carefully:

  1. "… we do not claim that we have the best or only way of developing an Agent-Based Model (ABM) for crises." Firstly, negative claims ("This is not a banana") are not generally helpful in argument. Secondly, readers want to know (or should want to know) what is being claimed and, importantly, how they would decide if it is true "objectively". Given how many models sprang up under COVID, it is clear that what is described here cannot be the only way to do it, but the question is: how do we know you did it "better"? This was also my point about institutionalisation. For me, the big lesson from COVID was how much the automatic response of the ABM community seems to be to go in all directions and build yet more models in a tearing hurry, rather than synthesise them, challenge them or test them empirically. I foresee a problem both with this response and with our possible unwillingness to be self-aware about it. Governments will not want a million "interesting" models to choose from but one where they have externally checkable reasons to trust it, and that involves us changing our mindset (to be more like climate modellers, for example: Bithell & Edmonds 2021). For example, colleagues and I developed a comparison methodology that allowed for the practical difficulties of direct replication (Chattoe-Brown et al. 2021).
  2. The second quotation which amplifies this point is: “But we do think it is an extensive foundation from which others can start, either picking up some bits and pieces, deviating from it in specific ways or extending it in specific ways.” Again, here one has to ask the right question for progress in modelling. On what scientific grounds should people do this? On what grounds should someone reuse this model rather than start their own? Why isn’t the Dignum et al. model built on another “market leader” to set a good example? (My point about programming languages was purely practical not scientific. Frank is right that the model is no less valid because the programming language was changed but a version that is now unsupported seems less useful as a basis for the kind of further development advocated here.)

I am not totally sure I have understood Frank’s point about data so I don’t want to press it but my concern was that, generally, the book did not seem to “tap into” relevant empirical research (and this is a wider problem that models mostly talk about other models). It is true that parameter values can be adjusted arbitrarily in sensitivity analysis but that does not get us any closer to empirically justified parameter values (which would then allow us to attempt validation by the “generative methodology”). Surely it is better to build a model that says something about the data that exists (however imperfect or approximate) than to rely on future data collection or educated guesses. I don’t really have the space to enumerate the times the book said “we did this for simplicity”, “we assumed that” etc. but the cumulative effect is quite noticeable. Again, we need to be aware of the models which use real data in whatever aspects and “take forward” those inputs so they become modelling standards. This has to be a collective and not an individualistic enterprise.

References

Bithell, M. and Edmonds, B. (2021) The Systematic Comparison of Agent-Based Policy Models – It's time we got our act together! Review of Artificial Societies and Social Simulation, 11th May 2021. https://rofasss.org/2021/05/11/SystComp/

Chattoe-Brown, E. (2020) A Bibliography of ABM Research Explicitly Comparing Real and Simulated Data for Validation. Review of Artificial Societies and Social Simulation, 12th June 2020. https://rofasss.org/2020/06/12/abm-validation-bib/

Chattoe-Brown, E. (2021) A review of "Social Simulation for a Crisis: Results and Lessons from Simulating the COVID-19 Crisis". Journal of Artificial Societies and Social Simulation. 24(4). https://www.jasss.org/24/4/reviews/1.html

Chattoe-Brown, E., Gilbert, N., Robertson, D. A., & Watts, C. J. (2021). Reproduction as a Means of Evaluating Policy Models: A Case Study of a COVID-19 Simulation. medRxiv 2021.01.29.21250743; DOI: https://doi.org/10.1101/2021.01.29.21250743

Dignum, F. (2021) Response to the review of Edmund Chattoe-Brown of the book "Social Simulations for a Crisis". Review of Artificial Societies and Social Simulation, 4th Nov 2021. https://rofasss.org/2021/11/04/dignum-review-response/

Dignum, F. (Ed.) (2021) Social Simulation for a Crisis: Results and Lessons from Simulating the COVID-19 Crisis. Springer. DOI:10.1007/978-3-030-76397-8


Chattoe-Brown, E. (2021) Reply to Frank Dignum. Review of Artificial Societies and Social Simulation, 10th November 2021. https://rofasss.org/2021/11/10/reply-to-dignum/


 

Response to the review of Edmund Chattoe-Brown of the book “Social Simulations for a Crisis”

By Frank Dignum

This is a reply to a review in JASSS (Chattoe-Brown 2021) of (Dignum 2021).

Before responding to some of the specific concerns of Edmund I would like to thank him for the thorough review. I am especially happy with his conclusion that the book is solid enough to make it a valuable contribution to scientific progress in modelling crises. That was the main aim of the book and it seems that is achieved. I want to reiterate what we already remarked in the book; we do not claim that we have the best or only way of developing an Agent-Based Model (ABM) for crises. Nor do we claim that our simulations were without limitations. But we do think it is an extensive foundation from which others can start, either picking up some bits and pieces, deviating from it in specific ways or extending it in specific ways.

The concerns that Edmund expresses are certainly valid. I agree with some of them, but will nuance others. First of all, there is the concern that we seem to abandon the NetLogo implementation and move to Repast. This fact does not make the ABM itself any less valid! In itself it is also an important finding: it is not possible to scale such a complex model in NetLogo beyond around two thousand agents. This is not just a limitation of our particular implementation, but a more general limitation of the platform. It leads to the important challenge of getting more computer scientists involved to develop platforms for social simulations that both support modellers adequately and provide efficient, scalable implementations.

That the sheer size of the model and its results makes it difficult to trace back the importance and validity of every factor is completely true. We have tried our best to highlight the most important aspects each time, but this leaves open the question of whether we made the right selection of highlighted aspects. As an illustration, we spent two months justifying the results of our simulations of the effectiveness of track-and-tracing apps. We basically concluded that we need much better integrated analysis tools in the simulation platform. NetLogo is geared towards creating one simulation scenario, running the simulation, and analysing the results based on a few parameters. This is no longer sufficient when we have a model with which we can create many scenarios and have many parameters that influence a result. We used R to interpret the flood of data that was produced with every scenario, but R is not really the most user-friendly tool, nor is it specifically meant for analysing data from social simulations.

Let me jump to the third concern of Edmund and link it to the analysis of the results as well. While we tried to justify the results of our simulation on the effectiveness of the track-and-tracing app, we compared our simulation with an epidemiologically based model. This is described in chapter 12 of the book. Here we encountered a difference in the assumed number of contacts a person has with other persons per day. One can take the results, as quoted by Edmund as well, of 8 or 13 from empirical work and use them in the model. However, the dispute is not about the number of contacts a person has per day, but about what counts as a contact! For the COVID-19 simulations, standing next to a person in a supermarket queue for five minutes can count as a contact, while such a contact is not a meaningful contact in the cited literature. Thus, we see that what we take as empirically validated numbers might not at all be the right ones for our purpose. We have tried to justify all the values of parameters and outcomes in the context for which the simulations were created. We have also done quite a few sensitivity analyses, which we did not all report on, just to keep the volume of the book to a reasonable size. Although we think we did a proper job in justifying all results, that does not mean that others cannot hold different opinions on the values that some parameters should have. It would be very good to check the influence of changes in these parameters on the results. This would also advance scientific insight into the usefulness of complex models like the one we made!

I really think that an ABM crisis response should be institutional. That does not mean that one institution determines the best ABM, but rather that the ABM put forward by that institution is the result of a continuous debate among scientists working on ABMs for that type of crisis. For us, one of the more important outcomes of the ASSOCC project is that we really need much better tools to support the types of simulations that are needed in a crisis situation. However, it is very difficult to develop these tools as a single group. A lot of the effort needed is not publishable and thus not valued in an academic environment. I really think that the efforts that have been put into platforms such as NetLogo and Repast are laudable. They have been made possible by some generous grants and institutional support. We argue that this continuous support is also needed in order to be well equipped for the next crisis. But we do not argue that an institution would by definition have the last word on which is the best ABM. In an ideal case it would accumulate all academic efforts, as is done in the climate models, but even more restricted models would still be better than a thousand individuals all claiming to have a usable ABM while governments have to react quickly to a crisis.

The final concern of Edmund is about the empirical scale of our simulations. This is completely true! Given the scale and the details of what we can incorporate, we can only simulate some phenomena and certainly not everything around the COVID-19 crisis. We tried to be clear about this limitation. We had discussions about the Unity interface concerning this as well. It is in principle not very difficult to show people walking in the street, taking a car or a bus, etc. However, we decided to show a more abstract representation, just to make clear that our model is not a complete model of a small town functioning in all aspects. We have very carefully chosen which scenarios we can realistically simulate and draw some insights into reality from. Maybe we should also have discussed more explicitly all the scenarios that we did not run, with the reasons why they would be difficult or unrealistic in our ABM. One never likes to discuss all the limitations of one's labour, but it can definitely be very insightful. I have made up for this a little bit by submitting an article to a special issue on predictions with ABMs, in which I explain in more detail what the considerations should be when using a particular ABM to try to predict some state of affairs. Anyone interested to learn more about this can contact me.

To conclude this response to the review, I again express my gratitude for the good and thorough work done. The concerns that were raised are all very valuable to consider. What I have tried to do in this response is to highlight that these concerns should be taken as a call to arms to put effort into social simulation platforms that give better support for creating simulations for a crisis.

References

Dignum, F. (Ed.) (2021) Social Simulation for a Crisis: Results and Lessons from Simulating the COVID-19 Crisis. Springer. DOI:10.1007/978-3-030-76397-8

Chattoe-Brown, E. (2021) A review of "Social Simulation for a Crisis: Results and Lessons from Simulating the COVID-19 Crisis". Journal of Artificial Societies and Social Simulation. 24(4). https://www.jasss.org/24/4/reviews/1.html


Dignum, F. (2021) Response to the review of Edmund Chattoe-Brown of the book "Social Simulations for a Crisis". Review of Artificial Societies and Social Simulation, 4th Nov 2021. https://rofasss.org/2021/11/04/dignum-review-response/


 

The Systematic Comparison of Agent-Based Policy Models – It’s time we got our act together!

By Mike Bithell and Bruce Edmonds

Model Intercomparison

The recent Covid crisis has led to a surge of new model development and a renewed interest in the use of models as policy tools. While this is in some senses welcome, the sudden appearance of many new models presents a problem in terms of their assessment, the appropriateness of their application and reconciling any differences in outcome. Even if they appear similar, their underlying assumptions may differ, their initial data might not be the same, policy options may be applied in different ways, stochastic effects explored to a varying extent, and model outputs presented in any number of different forms. As a result, it can be unclear what aspects of variations in output between models are results of mechanistic, parameter or data differences. Any comparison between models is made tricky by differences in experimental design and selection of output measures.

If we wish to do better, we suggest that a more formal approach to making comparisons between models would be helpful. However, it appears that this is not commonly undertaken in most fields in a systematic and persistent way, except for the field of climate change and closely related fields such as pollution transport or economic impact modelling (although efforts are underway to extend such systematic comparison to ecosystem models – Wei et al., 2014; Tittensor et al., 2018). Examining the way in which this is done for climate models may therefore prove instructive.

Model Intercomparison Projects (MIP) in the Climate Community

Formal intercomparison of atmospheric models goes back at least to 1989 (Gates et al., 1999) with the first Atmospheric Model Intercomparison Project (AMIP), initiated by the World Climate Research Programme. By 1999 this had contributions from all significant atmospheric modelling groups, providing standardised time-series of over 30 model variables for one particular historical decade of simulation, with a standard experimental setup. Comparisons of model mean values with available data helped to reveal overall model strengths and weaknesses: no single model was best at simulating all aspects of the atmosphere, and accuracy varied greatly between simulations. The model outputs also formed a reference base for further intercomparison experiments, including targets for model improvement and reduction of systematic errors, as well as a starting point for improved experimental design, software and data management standards, and protocols for communication and model intercomparison. This led to AMIP II and, subsequently, to a series of Coupled Model Intercomparison Projects (CMIP), beginning with CMIP1 in 1996. The latest iteration (CMIP6) is a collection of 23 separate model intercomparison experiments covering the atmosphere, ocean, land surface, geo-engineering, and the paleoclimate. This collection is aimed at the upcoming 2021 IPCC process (AR6). Participating projects go through an endorsement process for inclusion (a process agreed with the modelling groups), based on 10 criteria designed to ensure some degree of coherence between the various models – a further 18 MIPs are also listed as currently active (https://www.wcrp-climate.org/wgcm-cmip/wgcm-cmip6). Groups contribute to a central set of common experiments covering the period 1850 to the near-present. An overview of the whole process can be found in Eyring et al. (2016).

The current structure includes a set of three overarching questions covering the dynamics of the earth system, model systematic biases, and understanding possible future change under uncertainty. Individual MIPs may build on this to address one or more of a set of seven "grand science challenges" associated with the climate. Modelling groups agree to provide outputs in a standard form, obtained from a specified set of experiments under the same design, and to provide standardised documentation to go with their models. Originally (up to CMIP5), outputs were added to a central public repository for further analysis; however, the output grew so large under CMIP6 that the data is now held dispersed over repositories maintained by separate groups.

Other Examples

Two further, more recent examples of collective model development may also be helpful to consider.

Firstly, an informal network collating models across more than 50 research groups has already been generated as a result of the COVID crisis – the Covid Forecast Hub (https://covid19forecasthub.org). This is run by a small number of research groups collaborating with the US Centers for Disease Control and Prevention and is strongly focussed on the epidemiology. Participants are encouraged to submit weekly forecasts, and these are integrated into a data repository and can be visualized on the website – viewers can look at forward projections, along with associated confidence intervals and model evaluation scores, including those for an ensemble of all models. The focus on forecasts in this case arises out of the strong policy drivers of the current crisis, but the main point is that it is possible to immediately view measures of model performance and to compare the different model types: one clear message that rapidly becomes apparent is that many of the forward projections have 95% (and at times, even 50%) confidence intervals for incident deaths that more than span the full range of the past historic data. The benefit of comparing many different models in this case is apparent, as many of the historic single-model projections diverge strongly from the data (and the models most in error are not consistently the same ones over time), although the ensemble mean tends to be better.

As a second example, one could consider the Psychological Science Accelerator (PSA: Moshontz et al 2018, https://psysciacc.org/). This is a collaborative network set up with the aim of addressing the “replication crisis” in psychology: many previously published results in psychology have proved problematic to replicate as a result of small or non-representative sampling or use of experimental designs that do not generalize well or have not been used consistently either within or across studies. The PSA seeks to ensure accumulation of reliable and generalizable evidence in psychological science, based on principles of inclusion, decentralization, openness, transparency and rigour. The existence of this network has, for example, enabled the reinvestigation of previous  experiments but with much larger and less nationally biased samples (e.g. Jones et al 2021).

The Benefits of the Intercomparison Exercises and Collaborative Model Building

More specifically, long-term intercomparison projects help to do the following.

  • Build on past effort. Rather than modellers re-inventing the wheel (or building a new framework) with each new model project, libraries of well-tested and documented models, with data archives, including code and experimental design, would allow researchers to more efficiently work on new problems, building on previous coding effort
  • Aid replication. Focussed long term intercomparison projects centred on model results with consistent standardised data formats would allow new versions of code to be quickly tested against historical archives to check whether expected results could be recovered and where differences might arise, particularly if different modelling languages were being used
  • Help to formalize. While informal code archives can help to illustrate the methods or theoretical foundations of a model, intercomparison projects help to understand which kinds of formal model might be good for particular applications, and which can be expected to produce helpful results for given desired output measures
  • Build credibility. A continuously updated set of model implementations and assessment of their areas of competence and lack thereof (as compared with available datasets) would help to demonstrate the usefulness (or otherwise) of ABM as a way to represent social systems
  • Influence Policy (where appropriate). Formal international policy organisations such as the IPCC or the more recently formed IPBES are effective partly through an underpinning of well tested and consistently updated models. As yet it is difficult to see whether such a body would be appropriate or effective for social systems, as we lack the background of demonstrable accumulated and well tested model results.

Lessons for ABM?

What might we be able to learn from the above, if we attempted to use a similar process to compare ABM policy models?

In the first place, the projects started small and grew over time: it would not be necessary, for example, to cover all possible ABM applications at the outset. On the other hand, the latest CMIP iterations include a wide range of different types of model covering many different aspects of the earth system, so that the breadth of possible model types need not be seen as a barrier.

Secondly, the climate inter-comparison project has been persistent for some 30 years – over this time many models have come and gone, but the history of inter-comparisons allows for an overview of how well these models have performed over time – data from the original AMIP I models is still available on request, supporting assessments concerning  long-term model improvement.

Thirdly, although climate models are complex – implementing a variety of different mechanisms in different ways – they can still be compared by use of standardised outputs, and at least some (although not necessarily all) have been capable of direct comparison with empirical data.

Finally, an agreed experimental design and public archive for documentation and output that is stable over time is needed; this needs to be done via a collective agreement among the modelling groups involved so as to ensure a long-term buy-in from the community as a whole, so that there is a consistent basis for long-term model development, building on past experience.

The need for aligning or reproducing ABMs has long been recognised within the community (Axtell et al. 1996; Edmonds & Hales 2003), but mostly on a one-to-one basis for verifying the specification of models against their implementation, although Hales et al. (2003) discuss a range of possibilities. However, this is far from a situation where many different models of basically the same phenomena are systematically compared – that would be a larger-scale collaboration lasting over a longer time span.

The community has already established a standardised form of documentation in the ODD protocol. Sharing of model code is also becoming routine, and can be easily achieved through COMSES, Github or similar. The sharing of data in a long-term archive may require more investigation. As a starting project, COVID-19 provides an ideal opportunity for setting up such a model inter-comparison project – multiple groups already have running examples, and a shared set of outputs and experiments should be straightforward to agree on. This would potentially form a basis for forward-looking experiments designed to assist with possible future pandemic problems, and a basis on which to build further features into the existing disease-focussed modelling, such as the effects of economic, social and psychological issues.

Additional Challenges for ABMs of Social Phenomena

Nobody supposes that modelling social phenomena is going to have the same set of challenges that climate change models face. Some of the differences include:

  • The availability of good data. Social science is bedevilled by a paucity of the right kind of data. Although an increasing amount of relevant data is being produced, there are commercial, ethical and data protection barriers to accessing it and the data rarely concerns the same set of actors or events.
  • The understanding of micro-level behaviour. Whilst the micro-level understanding of our atmosphere is very well established, that of the behaviour of the most important actors (humans) is not. However, it may be that better data can partially substitute for a generic behavioural model of decision-making.
  • Agreement upon the goals of modelling. Although there will always be considerable variation in terms of what is wanted from a model of any particular social phenomenon, a common core of agreed objectives would help focus any comparison and give confidence via ensembles of projections. Although the MIPs and Covid Forecast Hub are focussed on prediction, empirical explanation may be more important in other areas.
  • The available resources. ABM projects tend to be add-ons to larger endeavours and based around short-term grant funding. The funding for big ABM projects is yet to be established, not having the equivalent of weather forecasting to piggy-back on.
  • Persistence of modelling teams/projects. ABM tends to be quite short-term with each project developing a new model for a new project. This has made it hard to keep good modelling teams together.
  • Deep uncertainty. Whilst the set of possible factors and processes involved in a climate change model is well established, which social mechanisms need to be included in a model of any particular social phenomenon is unknown. For this reason, there is deep disagreement about the assumptions to be made in such models, and outcomes can diverge sharply when a relevant mechanism is left out of a model. Whilst uncertainty in known mechanisms can be quantified, assessing the impact of such deep uncertainty is much harder.
  • The sensitivity of the political context. Even in the case of Climate Change, where the assumptions made are relatively well understood and rest on objective bases, the modelling exercise and its outcomes can be politically contested. In other areas, where the representation of people’s behaviour might be key to model outcomes, this will need even more care (Aodha & Edmonds 2017).

However, some of these problems were solved in the case of Climate Change as a result of the CMIP exercises and the reports they ultimately resulted in. Over time, the development of the models also allowed for a broadening and updating of modelling goals, starting from a relatively narrow initial set of experiments. Ensuring the persistence of individual modelling teams is easier in the context of an internationally recognised comparison project, because resources may be easier to obtain and there is a consistent central focus. The modelling projects became longer-term as individual researchers could establish a career doing just climate change modelling, and the importance of the work was increasingly recognised. An ABM model comparison project might help solve some of these problems as the importance of its work becomes established.

Towards an Initial Proposal

The topic chosen for this project should be something where: (a) there is enough public interest to justify the effort, and (b) a number of models with a similar purpose in mind are being developed. At the current stage, this suggests dynamic models of COVID spread, but there are other possibilities, including transport models (where people go and who they meet) or criminological models (where and when crimes happen).

Whichever ensemble of models is focussed upon, the models should be compared on a standard core, being:

  • Run over the same start and end dates (but not necessarily with the same temporal granularity)
  • Covering the same set of regions or cases
  • Using the same population data (though possibly enhanced with extra data and maybe scaled population sizes)
  • Started from the same initial conditions in terms of the population
  • Outputting an agreed core of measures (but maybe others as well)
  • Checked for agreement against a core set of cases, with agreed data sets
  • Reported in a standard format (though with a discussion section for further/other observations)
  • Well documented and with open-access code
  • Run a minimum number of times with different random seeds
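Such an agreed core could even be captured in machine-readable form so that submissions can be checked automatically. The Python sketch below is purely illustrative: every field name is hypothetical and not part of any agreed standard.

```python
# Hypothetical shared experiment specification for a model inter-comparison
# exercise; all field names are illustrative only, not an agreed standard.
CORE_SPEC = {
    "start_date": "2020-03-01",               # same start date for all models
    "end_date": "2021-03-01",                 # same end date
    "regions": ["region_A", "region_B"],      # same set of regions/cases
    "population_source": "shared_census_v1",  # same population data
    "core_outputs": ["daily_cases", "daily_deaths"],  # agreed core measures
    "min_random_seeds": 10,                   # minimum number of replicate runs
}

def submission_is_valid(submission: dict) -> bool:
    """Check that a team's submission reports every agreed core output
    for every agreed region."""
    return all(
        region in submission
        and all(out in submission[region] for out in CORE_SPEC["core_outputs"])
        for region in CORE_SPEC["regions"]
    )

ok = submission_is_valid({
    "region_A": {"daily_cases": [1, 2], "daily_deaths": [0, 0]},
    "region_B": {"daily_cases": [3, 4], "daily_deaths": [0, 1]},
})
print(ok)  # True
```

A shared, versioned specification of this kind would make it trivial for an archive to reject incomplete submissions before comparison.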

Any modeller/team that had a suitable model and was willing to adhere to the rules would be welcome to participate (commercial, government or academic), and these teams would collectively decide the rules and development and write any reports on the comparisons. Other interested stakeholder groups could be involved, including professional/academic associations, NGOs and government departments, but in a consultative role providing wider critique – it is important that the terms of and reports from the exercise be independent of any particular interest or authority.

Conclusion

We call upon those who think ABMs have the potential to usefully inform policy decisions to work together, in order that the transparency and rigour of our modelling matches our ambition. Whilst model comparison exercises of the kind described are important for any simulation work, particular care needs to be taken when the outcomes can affect people’s lives.

References

Aodha, L. & Edmonds, B. (2017) Some pitfalls to beware when applying models to issues of policy relevance. In Edmonds, B. & Meyer, R. (eds.) Simulating Social Complexity – a handbook, 2nd edition. Springer, 801-822. (A version is at http://cfpm.org/discussionpapers/236)

Axtell, R., Axelrod, R., Epstein, J. M., & Cohen, M. D. (1996). Aligning simulation models: A case study and results. Computational & Mathematical Organization Theory, 1(2), 123-141. https://link.springer.com/article/10.1007%2FBF01299065

Edmonds, B., & Hales, D. (2003). Replication, replication and replication: Some hard lessons from model alignment. Journal of Artificial Societies and Social Simulation, 6(4), 11. http://jasss.soc.surrey.ac.uk/6/4/11.html

Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stevens, B., Stouffer, R. J., & Taylor, K. E. (2016). Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization. Geoscientific Model Development, 9(5), 1937–1958. https://doi.org/10.5194/gmd-9-1937-2016

Gates, W. L., Boyle, J. S., Covey, C., Dease, C. G., Doutriaux, C. M., Drach, R. S., Fiorino, M., Gleckler, P. J., Hnilo, J. J., Marlais, S. M., Phillips, T. J., Potter, G. L., Santer, B. D., Sperber, K. R., Taylor, K. E., & Williams, D. N. (1999). An overview of the results of the Atmospheric Model Intercomparison Project (AMIP I). Bulletin of the American Meteorological Society, 80(1), 29–55. https://doi.org/10.1175/1520-0477(1999)080<0029:AOOTRO>2.0.CO;2

Hales, D., Rouchier, J., & Edmonds, B. (2003). Model-to-model analysis. Journal of Artificial Societies and Social Simulation, 6(4), 5. http://jasss.soc.surrey.ac.uk/6/4/5.html

Jones, B. C., DeBruine, L. M., Flake, J. K., et al. (2021). To which world regions does the valence–dominance model of social perception apply? Nature Human Behaviour, 5, 159–169. https://doi.org/10.1038/s41562-020-01007-2

Moshontz, H., et al. (2018). The Psychological Science Accelerator: Advancing psychology through a distributed collaborative network. Advances in Methods and Practices in Psychological Science, 1(4), 501–515. https://doi.org/10.1177/2515245918797607

Tittensor, D. P., Eddy, T. D., Lotze, H. K., Galbraith, E. D., Cheung, W., Barange, M., Blanchard, J. L., Bopp, L., Bryndum-Buchholz, A., Büchner, M., Bulman, C., Carozza, D. A., Christensen, V., Coll, M., Dunne, J. P., Fernandes, J. A., Fulton, E. A., Hobday, A. J., Huber, V., … Walker, N. D. (2018). A protocol for the intercomparison of marine fishery and ecosystem models: Fish-MIP v1.0. Geoscientific Model Development, 11(4), 1421–1442. https://doi.org/10.5194/gmd-11-1421-2018

Wei, Y., Liu, S., Huntzinger, D. N., Michalak, A. M., Viovy, N., Post, W. M., Schwalm, C. R., Schaefer, K., Jacobson, A. R., Lu, C., Tian, H., Ricciuto, D. M., Cook, R. B., Mao, J., & Shi, X. (2014). The north american carbon program multi-scale synthesis and terrestrial model intercomparison project – Part 2: Environmental driver data. Geoscientific Model Development, 7(6), 2875–2893. https://doi.org/10.5194/gmd-7-2875-2014


Bithell, M. and Edmonds, B. (2021) The Systematic Comparison of Agent-Based Policy Models - It’s time we got our act together!. Review of Artificial Societies and Social Simulation, 11th May 2021. https://rofasss.org/2021/05/11/SystComp/


 

Should the family size be used in COVID-19 vaccine prioritization strategy to prevent variants diffusion? A first investigation using a basic ABM

By Gianfranco Giulioni

Department of Philosophical, Pedagogical and Economic-Quantitative Sciences, University of Chieti-Pescara, Italy

(A contribution to the: JASSS-Covid19-Thread)

At the time of writing, a few countries have made significant progress in vaccinating their populations while many others are still taking their first steps.

Despite the importance of COVID-19’s adverse effects on society, there seems to be too little debate on the best way to progress the vaccination process once front-line healthcare personnel have been immunized.

The strategies adopted in the front-runner countries prioritize people by health fragility and age. The effectiveness of this strategy is supported, for example, by Bubar et al. (2021), who provide results based on a detailed age-stratified Susceptible, Exposed, Infectious, Recovered (SEIR) model.

During the COVID-19 outbreak, the importance of families in the diffusion of the virus was stressed by experts and the media. This observation motivates the present effort, which investigates whether family size can play a role in vaccine prioritization strategies.

This document describes an ABM developed with the intent of analyzing this question. The model is basic, with just the essential features needed to investigate the issue.

As highlighted by Squazzoni et al. (2020), a careful investigation of pandemics requires the cooperation of many scientists from different disciplines. To ease this cooperation, and in the interest of transparency (Barton et al. 2020), the code is made publicly available to allow further development and accurate parameter calibration by those who might be interested (https://github.com/gfgprojects/abseir_family).

The following part of the document will sketch the model functioning and provide some considerations on families’ effects on vaccination strategy.

Brief Model Description

The ABSEIR-family model code is written in Java, taking advantage of the Repast Simphony modeling system (https://repast.github.io/).

Figure 1 gives an overview of the current development state of the model core classes.

Briefly, the code handles the relevant events of a pandemic:

  • the appearance of the first case,
  • the infection diffusion by contacts,
  • the introduction of measures for diffusion limitation such as quarantine,
  • the activation and implementation of the immunization process.

The distinguishing feature of the model is that individuals are grouped in families. This grouping allows considering two different diffusion speeds: fast among family members and slower when contacts involve two individuals from different families.

Figure 1: relationships between the core classes of the ABSEIR-family model and their variables and methods.

It is perhaps worth describing the evolution of an individual’s state to sketch the functioning of the model.

An individual’s dynamics are guided by a variable named infectionAge. In the beginning, all individuals have this variable at zero. At each time step, the program increases the infectionAge of every individual having a non-zero value of this variable.

When an individual has contact with an infectious individual, s/he may or may not contract the infection. If infected, the individual enters the latency period: her/his infectionAge is set to 1 and the variable starts moving ahead with time, but s/he is not yet infectious. Individuals whose infectionAge is greater than the latency period length (ll) become infectious.

At each time step, an infectious individual meets all her/his family members and mof randomly chosen non-family members. S/he passes on the infection with probability pif to family members and pof to non-family members. The infection can be passed on only if the contacted individual’s infectionAge equals zero and s/he is not in quarantine.

The infectious phase ends when the infection is discovered (quarantine) or when the individual recovers, i.e. when the infectionAge exceeds the latency period length plus the infection length parameter (li).

At the present stage of development, the code does not handle adverse post-infection outcomes: all infected individuals in this model recover. The infectionAge is set to a negative value at recovery because recovered individuals stay immune for a while (lr). Similarly, vaccination sets the individual’s infectionAge to a (large) negative value (lv).
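The individual dynamics described above amount to a small state machine driven by infectionAge. The following Python sketch is purely illustrative: the actual ABSEIR-family model is written in Java on Repast Simphony, the parameter values are arbitrary, and the function names are mine (only ll, li, lr and lv follow the text's notation).

```python
# Illustrative sketch of the infectionAge state machine described in the text.
# Parameter values are arbitrary; LL, LI, LR, LV mirror the text's ll, li, lr, lv.
LL, LI = 3, 5      # latency length (ll) and infectious period length (li)
LR, LV = 60, 180   # immunity length after recovery (lr) and vaccination (lv)

def state(infection_age):
    """Classify an individual from the infectionAge counter."""
    if infection_age == 0:
        return "susceptible"
    if infection_age < 0:
        return "immune"              # recovered (-LR) or vaccinated (-LV)
    if infection_age <= LL:
        return "latent"              # infected but not yet infectious
    if infection_age <= LL + LI:
        return "infectious"
    return "recovered"

def step(infection_age):
    """Advance one time step: every non-zero counter moves ahead."""
    if infection_age == 0:
        return 0                     # susceptibles do not age
    if infection_age < 0:
        return infection_age + 1     # immunity wanes back towards zero
    age = infection_age + 1
    return -LR if age > LL + LI else age   # recovery grants temporary immunity
```

Stepping an individual from infectionAge 1 through the latency and infectious periods ends with the counter jumping to -LR, after which it climbs back towards zero (susceptible again), matching the description above.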

At the present state of the pandemic’s evolution, it is perhaps useful to use the model to gain insights into how family size could affect the effectiveness of the vaccination process. This is attempted hereafter.

Highlighting the relevance of family size with an ad-hoc example

The relevance of family size in vaccination strategy can be shown using the following ad-hoc example.

Suppose there are two COVID-free villages (say, village A and village B) whose health authorities are about to start vaccinations to prevent the disease from spreading.

The villages are identical in all other aspects except the family size distribution. Each village has 50 inhabitants, but village A has 10 families with five members each, while village B has two five-member families and 40 singletons. Five vaccine doses arrive in each village each day.

Some additional extreme assumptions are made to make differences straightforward.

First, healthy family members are infected for sure by a member who has contracted the virus. Second, each individual has the same number of contacts (say n) outside the family, and the probability of passing on the virus in external contacts is lower than 1. Symptoms take several days to show up.

Now, the health authorities are about to start the vaccination process and have to decide how to employ the available vaccines.

Intuition would suggest that village B’s health authority should immunize the large families first. Indeed, if case zero arrives at the end of the second vaccination day, the spread of the disease among the population should be limited, because the virus can then be passed on by external contacts only, and the probability of transmitting the virus in external contacts is lower than within the family.

But should this strategy also be used by village A’s health authority?

To answer this question, we compare the family-based vaccination strategy with a random-based one. Under the random-based strategy, we expect one member of each family to be immunized by the end of the second vaccination day. Under the family-based strategy, two families are fully immunized by the end of the second vaccination day. Now, suppose one of the non-immunized citizens gets the virus at the end of day two. It is easy to verify that there will be one more infected individual under the family-based strategy (all five members of the family) than under the random-based strategy (four members, because one of them was immunized beforehand). This in turn implies n additional dangerous external contacts under the family-based strategy.
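The arithmetic of this example for village A can be checked directly. The sketch below is just the hand calculation in code form, under the example's extreme assumptions (within-family transmission is certain, immunized members cannot be infected); all names are illustrative.

```python
# Village A: 10 families of five members; 5 doses/day for 2 days = 10 doses
# administered before case zero appears at the end of day two.
FAMILY_SIZE = 5
DOSES = 5 * 2

# Random-based strategy: the 10 doses immunize one member in each of the
# 10 families, so case zero's family already contains one immune member.
infected_random = FAMILY_SIZE - 1   # 4 infected in case zero's family

# Family-based strategy: the 10 doses fully immunize two families, so case
# zero necessarily falls in a completely unprotected family.
infected_family = FAMILY_SIZE       # all 5 members infected

extra = infected_family - infected_random
print(extra)  # 1 extra infected, hence n extra dangerous external contacts
```

Running the same calculation for village B (two five-member families, 40 singletons) reverses the ranking, which is the point of the example.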

These observations lead us to conclude that a random-based vaccination strategy will slow down the infection dynamics in village A while speeding them up in village B, and that the opposite holds for the family-based immunization strategy.

Some simulation exercises

In this part of the document, the model described above is used to compare further the family-based and random-based vaccination strategies against the appearance of a new case (or variant), in a situation similar to that described in the example but with a more realistic setting.

As one can easily imagine, the family size distribution and the COVID transmission risk within families are crucial to our simulation exercises. It is therefore important to gather real-world information on these phenomena. Fortunately, recent scientific contributions can help.

Several authors point out that the Poisson distribution is a good statistical model of the family size distribution. This distribution is convenient because it is characterized by a single parameter, its average, but it has the drawback of assigning positive probability to the value zero. Recently, Jarosz (2021) confirmed the Poisson distribution’s suitability for modeling family size and showed that shifting it by one unit is a valid way to solve the zero-family-size problem.

Furthermore, data on average family sizes can easily be found in, for example, the OECD family database (http://www.oecd.org/social/family/database.htm).

The current version of the database (updated on 06-12-2016) presents data for 2015, with some exceptions. It shows that the average family size in OECD countries is 2.46, ranging from 3.93 in Mexico to 1.8 in Sweden.
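Following Jarosz's (2021) suggestion, a zero-free family size distribution with a given target average can be obtained by drawing sizes as 1 + Poisson(average − 1). The stdlib-only Python sketch below is illustrative (the model itself loads fixed distributions rather than sampling at run time); the Poisson sampler uses Knuth's classic multiplication algorithm.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication algorithm."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def shifted_family_sizes(n_families, avg_size, rng):
    """Sizes drawn as 1 + Poisson(avg_size - 1): no zero-size families,
    and the expected size is exactly avg_size (Jarosz 2021)."""
    lam = avg_size - 1.0
    return [1 + sample_poisson(lam, rng) for _ in range(n_families)]

rng = random.Random(42)
sizes = shifted_family_sizes(8000, 2.46, rng)  # 2.46 = OECD average family size
print(min(sizes), round(sum(sizes) / len(sizes), 2))
```

The minimum size is never below one, and with a few thousand families the sample average lands close to the 2.46 OECD figure.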

The results in Metlay et al. (2021) guide the choice of the within-family infection parameter: they provide evidence of an overall household infection risk of 10.1%.

The simulation exercises consist of a sensitivity analysis of the parameters with respect to the benchmark parameter set reported hereafter.

The simulation is initialized by loading the family size distribution. Two alternative distributions are used, each tuned to obtain a system with a total number of individuals close to 20,000. The two distributions have different average family sizes (afs) and are shown in figure 2.

Figure 2: the two family size distributions used to initialize the simulation. The figures next to the dots give the frequency of the corresponding size. Black squares relate to the distribution with an average of 2.5; red circles to the distribution with an average of 3.5.

The description of the vaccination strategy provides an opportunity to list other relevant parameters. The immunization center is endowed with nv doses of vaccine at each time step, starting from time tv. At time t0, the state of one individual is changed from susceptible to infected. This subject (case zero) is taken from a family having three susceptibles among its members.

Case zero undergoes the same process as all other following infected individuals described above.

The relevant parameters of the simulations are reported in table 1.

var   description                                        values            reference
ni    number of individuals                              ≅20000
afs   average family size                                2.5; 3.5          OECD
nv    number of vaccine doses available at each time     50; 100; 150
tv    vaccination starting time                          1
t0    case zero appearance time                          10
ll    length of latency                                  3                 Bubar et al. 2021
li    length of infectious period                        5                 Bubar et al. 2021
pif   probability of infecting a family member           0.1               Metlay et al. 2021
pof   probability of infecting a non-family individual   0.01; 0.02; 0.03
mof   number of non-family contacts of an infectious     10

Table 1: relevant parameters of the model.

We are now going to discuss the results of our simulation exercises. We focus particularly on the number of people infected up to a given point in time.

Due to the presence of random elements, each run follows a different trajectory. We limit these effects as much as possible to allow ceteris paribus comparisons. For example, we keep the family size distribution equal across runs by loading the distributions displayed in figure 2 instead of using the run-time random number generator. Likewise, we set the number of non-family contacts (mof) equal for all agents, although the code could set it randomly at each time step. Despite these reductions in randomness, significant differences in the dynamics remain within the same parametrization because of randomness in the network of contacts.

To allow comparisons among parametrizations despite these differing evolutions, we use the cross-sectional distribution of the total number of infected at the end of the infection process (i.e. at time 200).

Figure 3 reports the empirical cumulative distribution function (ecdf) for several parametrizations. To make the figure easy to read, the charts are arranged in a plane with the average family size (afs) on the abscissa and the number of available vaccines (nv) on the ordinate. From the above, two values of afs (2.5 and 3.5) and three values of nv (50, 100 and 150) are considered, so figure 3 is made up of six charts.

Each chart reports the ecdfs corresponding to the three pof levels given in table 1: circles denote ecdfs for pof = 0.01, squares for pof = 0.02 and triangles for pof = 0.03. In the end, choosing a parameter value triplet (afs, nv, pof) identifies two ecdfs: the red one for the random-based and the black one for the family-based vaccination strategy. The family-based vaccination strategy prioritizes families with the highest number of members not yet infected.
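An ecdf of final infected counts, as used in figure 3, is straightforward to compute. The sketch below is generic Python; the run counts are invented for illustration and do not come from the model.

```python
def ecdf(samples):
    """Return F where F(x) is the share of samples less than or equal to x."""
    xs = sorted(samples)
    n = len(xs)
    def F(x):
        return sum(1 for v in xs if v <= x) / n
    return F

# Invented final-infected counts from five runs of each strategy.
random_based = [120, 150, 200, 240, 400]
family_based = [100, 140, 220, 300, 500]

F_rand = ecdf(random_based)
F_fam = ecdf(family_based)

# A higher ecdf value at x means that strategy more often ends with at most
# x infected - the "red line above the black one" reading used in the text.
print(F_rand(200), F_fam(200))  # 0.6 0.4
```

In this invented case the random-based ecdf lies above the family-based one at 200 infected, i.e. the random-based strategy performs better there.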

Figure 3 shows mixed results: the random-based vaccination strategy outperforms the family-based one (the red line is above the black one) for some parameter combinations, while the reverse holds for others. In particular, the random-based strategy tends to dominate the family-based one with larger families (afs = 3.5) at low and high vaccination levels (nv = 50 and 150). The opposite is true with smaller families at the same vaccination levels. The intermediate vaccination level provides exceptions.

Figure 3: empirical cumulative distribution functions for several parametrizations. Each ecdf is built from the number of infected people at period 200 over 100 runs with different random seeds for each parametrization.

It is perhaps useful to highlight how, in the model, the family-based vaccination strategy stops the diffusion of a new wave or variant with significant probability for a smaller average family size at low and high vaccination levels (bottom-left and top-left charts), and for a large average family size at the middle vaccination level (middle-right chart).

A concluding note

At present, the model is very simple and can be improved in several directions. The most useful would probably be the inclusion of family-specific information. Setting up the model with additional information on each family member’s age or health state would allow overcoming the “universal mixing assumption” (Watts et al., 2020) currently in the model. Furthermore, additional prioritization criteria for the vaccination strategy (such as vaccinating the families of the most fragile or elderly) could be compared.

Initializing the model with census data of a local community could give a chance to analyze a more realistic setting in the wake of Pescarmona et al. (2020) and be more useful and understandable to (local) policy makers (Edmonds, 2020).

Developing the model to provide estimates of hospitalization and mortality is another step needed towards a sounder comparison of vaccination strategies.

Vaccinating by families could balance direct protection (vaccinating the highest-risk individuals) and indirect protection, i.e., limiting the probability that the virus reaches the most fragile by vaccinating people with many contacts. It could also have positive economic effects, for example by relaunching family tourism. However, it cannot be implemented at the risk of worsening the pandemic.

The present text aims only to pose a question. Further assessments following the recommendations of Squazzoni et al. (2020) are needed.

References

Barton, C.M. et al. (2020) Call for transparency of COVID-19 models. Science, 368(6490), 482-483. doi:10.1126/science.abb8637

Bubar, K.M. et al. (2021) Model-informed COVID-19 vaccine prioritization strategies by age and serostatus. Science 371, 916–921. doi:10.1126/science.abe6959

Edmonds, B. (2020) What more is needed for truly democratically accountable modelling? Review of Artificial Societies and Social Simulation, 2nd May 2020. https://rofasss.org/2020/05/02/democratically-accountable-modelling/

Jarosz, B. (2021) Poisson Distribution: A Model for Estimating Households by Household Size. Population Research and Policy Review, 40, 149–162. doi:10.1007/s11113-020-09575-x

Metlay, J.P., Haas, J.S., Soltoff, A.E., & Armstrong, K.A. (2021). Household transmission of SARS-CoV-2. JAMA Network Open, 4(2), e210304. doi:10.1001/jamanetworkopen.2021.0304

Pescarmona, G., Terna, P., Acquadro, A., Pescarmona, P., Russo, G., and Terna, S. (2020) How Can ABM Models Become Part of the Policy-Making Process in Times of Emergencies – The S.I.S.A.R. Epidemic Model. Review of Artificial Societies and Social Simulation, 20th Oct 2020. https://rofasss.org/2020/10/20/sisar/

Watts, C.J., Gilbert, N., Robertson, D., Droy, L.T., Ladley, D and Chattoe-Brown, E. (2020) The role of population scale in compartmental models of COVID-19 transmission. Review of Artificial Societies and Social Simulation, 14th August 2020. https://rofasss.org/2020/08/14/role-population-scale/

Squazzoni, F., Polhill, J. G., Edmonds, B., Ahrweiler, P., Antosz, P., Scholz, G., Chappin, É., Borit, M., Verhagen, H., Giardini, F. and Gilbert, N. (2020) Computational Models That Matter During a Global Pandemic Outbreak: A Call to Action. Journal of Artificial Societies and Social Simulation, 23(2):10. <http://jasss.soc.surrey.ac.uk/23/2/10.html>. doi: 10.18564/jasss.4298


Giulioni, G. (2021) Should the family size be used in COVID-19 vaccine prioritization strategy to prevent variants diffusion? A first investigation using a basic ABM. Review of Artificial Societies and Social Simulation, 15th April 2021. https://rofasss.org/2021/04/15/famsize/