Tag Archives: comment

Is The Journal of Artificial Societies and Social Simulation Parochial? What Might That Mean? Why Might It Matter?

By Edmund Chattoe-Brown

Introduction

The Journal of Artificial Societies and Social Simulation (hereafter JASSS) retains a distinctive position amongst journals publishing articles on social simulation and Agent-Based Modelling. Many journals have published a few Agent-Based Models, some have published quite a few but it is hard to name any other journal that predominantly does this and has consistently done so over two decades. Using Web of Science on 25.07.22, there are 5540 hits including the search term <“agent-based model”> anywhere in their text. JASSS does indeed have the most of any single journal with 268 hits (5% of the total to the nearest integer). The basic search returns about 200 distinct journals and about half of these have 10 hits or less. Since this search is arranged by hit count, this means that the unlisted journals have even fewer hits than those listed i. e. less than 7 per journal. This supports the claim that the great majority of journals have very limited engagement with Agent-Based Modelling. Note that the point here is to evidence tendencies effectively and not to claim that this specific search term tells us the precise relative frequency of articles on the subject of Agent-Based Modelling in different journals.

This being so, it seems reasonable – and desirable for other practical reasons like being entirely open access, online and readily searchable – to use JASSS as a sample – though clearly not necessarily a representative sample – of what may be happening in Agent-Based Modelling more generally. This is the case study approach (Yin 2009) where smaller samples may be practically unavoidable to discuss richer or more complex phenomena like the actual structures of arguments rather than something quantitative like, say, the number of sources cited by each article.

This piece is motivated by the scepticism that some reviewers have displayed about such a case study approach focused on JASSS and conclusions drawn from it. It is actually quite strange to have the editors and reviewers of a journal argue against its ability to tell us anything useful about wider Agent-Based Modelling research even as a starting point (particularly since this approach has been used in articles previously published in the journal, see for example, Meyer et al. 2009 and Hauke et al. 2017). Of course, it is a given that different journals have unique editorial policies, distinct reviewer pools and so on. Though this may mean, for example, that journals only irregularly publishing Agent-Based Models are actually less typical because it is more arbitrary who reviews for them and there may therefore be less reviewing skill and consensus about the value of articles involved. Anecdotally, I have found this to be true in medical journals where excellent articles rub shoulders with much more problematic ones in a small overall pool. The point of my argument is not to claim that JASSS can really stand in for ABM research as a whole – which it plainly cannot – but that, if the case study approach is to be accepted at all, JASSS is one of the few journals that successfully qualifies for it on empirically justifiable grounds. Conversely, given the potentially distinctive character of journals and the wide spread of Agent-Based Modelling, attempts at representative sampling may be very challenging in resource terms.

Method and Results

Again, using Web of Science on 04.07.22, I searched for the most highly cited articles containing the string “opinion dynamics”. I am well aware that this will not capture all articles that actually have opinion dynamics as their subject matter but this is not the intention. The intention is to describe a reproducible and measurable procedure correlated with the importance of articles so my results can be checked, criticised and extended. Comparing results based on other search terms would be part of that process. Then I took the first ten distinct journals that could be identified from this set of articles in order of citation count. The idea here was to see what journals had published the most important articles in the field overall – at least as identified by this particular search term – and then follow up their coverage of opinion dynamics generally. In addition, for each journal, I accessed the top 50 most cited articles and then checked how many articles containing the string “opinion dynamics” featured in that top 50. The idea here was to assess the extent to which opinion dynamics articles were important to the impact of a particular journal. Table 1 shows the results of this analysis.

Journal Title “opinion dynamics” Articles in the Top 50 Most Cited Most Highly Cited “opinion dynamics” Article Citations Number of Articles Containing the String “opinion dynamics”
Reviews of Modern Physics 0 2380 1
JASSS 6 1616 64
International Journal of Modern Physics C 4 376 72
Dynamic Games and Applications 1 338 5
Physical Review Letters 0 325 5
Global Challenges 1 272 1
IEEE Transactions on Automatic Control 0 269 38
SIAM Review 0 258 2
Central European Journal of Operations Research 1 241 1
Physica A: Statistical Mechanics and Its Applications 0 231 143

Table 1. The Coverage, Commitment and Importance of Different Journals in Regard to “opinion dynamics”: Top Ten by Citation Count of Most Influential Article.

This list attempts to provide two somewhat separate assessments of a journal with regard to “opinion dynamics”. The first is whether it has a substantial body of articles on the topic: Coverage. The second is whether, by the citation levels of the journal generally, “opinion dynamics” models are important to it: Commitment. These journals have been selected on a third dimension, their ability to contribute at least one very influential article to the literature as a whole: Importance.

The resulting patterns are interesting in several ways. Firstly, JASSS appears unique in this sample in being a clearly social science journal rather than a physical science journal or one dealing with instrumental problems like operations research or automatic control. It is an interesting corollary how many “opinion dynamics” models in a physics journal will have been reviewed by social scientists or modellers with a social science orientation at least. This is part of a wider question about whether, for example, physics journals are mainly interested in these models as formal systems rather than as having likely application to real societies. Secondly, 3 journals out of 10 have only a single “opinion dynamics” article – and a further journal has only 2 – which are nonetheless, extremely highly cited relative to such articles as a whole. It is unclear whether this “only one but what a one” pattern has any wider significance. It should also be noted that the most highly cited article in JASSS is four times more highly cited than the next most cited. Only 4 of these journals out of 10 could really be said to have a usable sample of such articles for case study analysis. Thirdly, only 2 journals out of 10 have a significant number of articles sufficiently important that they appear in the top 50 most cited and 5 journals have no “opinion dynamics” articles in their top 50 most cited at all. This makes the point that a journal can have good coverage of the topic and contain at least one highly cited article without “opinion dynamics” necessarily being a commitment of the journal.

Thus it seems that to be a journal contributing at least one influential article to the field as a whole, to have several articles that are amongst the most cited by that journal and to have a non-trivial number of articles overall is unusual. Only one other journal in the top 10 meets all three criteria (International Journal of Physics C). This result is corroborated in Table 2 which carries out the same analysis for all additional journals containing at least one highly cited “opinion dynamics” article (with an arbitrary cut off of at least 100 citations for that article). There prove to be fourteen such journals in addition to the ten above.

Journal Title “opinion dynamics” Articles in the Top 50 Most Cited Most Highly Cited “opinion dynamics” Article Citations Number of Articles Containing the String “opinion dynamics”
Mathematics of Operations Research 1 215 2
Information Sciences 0 186 14
Physica D: Nonlinear Phenomena 0 182 4
Journal of Complex Networks 1 177 5
Annual Reviews in Control 2 165 4
Information Fusion 0 154 11
IEEE Transactions on Control of Network Systems 3 151 12
Automatica 0 141 32
Public Opinion Quarterly 0 132 5
Physical Review E 0 129 74
SIAM Journal on Control and Optimization 0 127 13
Europhysics Letters 0 116 3
Knowledge-Based Systems 0 112 5
Scientific Reports 0 111 26

Table 2. The Coverage, Commitment and Importance of Different Journals in Regard to “opinion dynamics”: All Remaining Distinct Journals whose most important “opinion dynamics” article receives at least 100 citations.

Table 2 confirms the dominance of physical science journals and those solving instrumental problems as opposed to those evidently dealing with the social sciences: A few terms like complex networks are ambivalent in this regard however. Further it confirms the scarcity of journals that simultaneously contribute at least one influential article to the wider field, have a sensibly sized sample of articles on this topic – so that provisional but nonetheless empirical hypotheses might be derived from a case study – and have “opinion dynamics” articles in their top 50 most cited articles as a sign of the importance of the topic to the journal and its readers. To some extent, however, the latter confirmation is an unavoidable artefact of the sampling strategy. As the most cited article becomes less highly cited, the chance it will appear in the top 50 most cited for a particular journal will almost certainly fall unless the journal is very new or generally not highly cited.

As a third independent check, I again used Web of Science to identify all journals which had – somewhat arbitrarily – at least 30 articles on “opinion dynamics”, giving some sense of their contribution. Only two more journals (see Table 3) not already occurring in the two tables above were identified. Generally, this analysis considers only journal articles and not conference proceedings and book chapter serials whose peer review status is less clear/comparable.

Journal Title “opinion dynamics” Articles in the Top 50 Most Cited Most Highly Cited “opinion dynamics” Article Citations Number of Articles Containing the String “opinion dynamics”
Advances in Complex Systems 5 54 42
Plos One 0 53 32

Table 3. The Coverage, Commitment and Importance of Different Journals: All Journals with at Least 30 “opinion dynamics” hits not already listed in Tables 1 and 2.

This cross check shows that while the additional journals do have sample of articles large enough to form the basis for a case study, they either have not yet contributed a really influential article to the wider field – less than half the number of citations of the journals which qualify for Tables 1 and 2, do not have a high commitment to opinion dynamics – in terms of impact within the journal and among its readers – or both.

Before concluding this analysis, it is worth briefly reflecting on what these three criteria jointly tell us – though other criteria could also be used in further research. By sampling on highly cited articles we focus on journals that have managed to go beyond their core readership and influence the field as a whole. There is a danger that journals that have never done this are merely “talking to themselves” and may therefore form a less effective basis for a case study speaking to the field as a whole. By attending to the number of articles in the top 50 for the journal, we get a sense of whether the topic is central (or only peripheral) to that journal/its readership and, again, journals where the topic is central stand a chance of being better case studies than those where it is peripheral. The criteria for having enough articles is simply a practical one for conducting a meaningful case study. Researchers using different methods may disagree about how many instances you need to draw useful conclusions but there is general agreement that it is more than one!

Analysis and Conclusions

The present article was motivated by an attempt to evaluate the claim that JASSS may be parochial and therefore not constitute a suitable basis for provisional hypotheses generated by case study analysis of its articles. Although the argument presented here is clearly rough and ready – and could be improved on by subsequent researchers – it does not appear to support this claim. JASSS actually seems to be one of very few journals – arguably the only social science one – that simultaneously has made at least one really influential contribution to the wider field of opinion dynamics, has a large enough number of articles on the topic for plausible generalisation and has quite a few such articles in its top 50, which shows the importance of the topic to the journal and its wider readership. Unless one wishes to reject case study analysis altogether, there are – in fact – very few other journals on which it can effectively be done for this topic.

But actually, my main conclusion is a wider reflection on peer reviewing, sampling and scientific progress based on reviewer resistance to the case study approach. There are 1386 articles with the search term “opinion dynamics” in Web of Science as of 25.07.22. It is clearly not realistic for one article – or even one book – to analyse all that content, particularly qualitatively. This being so we have to consider what is practical and adequate to generate hypotheses suitable for publication and further development of research along these lines. Case studies of single journals are not the only strategy but do have a recognised academic tradition in methodology (Brown 2008). We could sample randomly from the population of articles but I have never yet seen such a qualitative analysis based on sampling and it is not clear whether it would be any better received by potential reviewers. (In particular, with many journals each having only a few examples of Agent-Based Models, realistically low sampling rates would leave many journals unrepresented altogether which would be a problem if they had distinctive approaches.)  Most journals – including JASSS – have word limits and this restricts how much you can report. Qualitative analysis is more drawn-out than quantitative analysis which limits this research style further in terms of practical sample sizes. Both reading whole articles for analysis and writing up the resulting conclusions takes more resources of time and word count. As long as one does not claim that a qualitative analysis from JASSS can stand for all Agent-Based Modelling – but is merely a properly grounded hypothesis for further investigation – and shows ones working properly to support that further investigation, it isn’t really clear why that shouldn’t be sufficient for publication. Particularly as I have now shown that JASSS isn’t notably parochial along several potentially relevant dimensions. If a reviewer merely conjectures that your results won’t generalise, isn’t the burden of proof then on them to do the corresponding analysis and publish it? Otherwise the danger is that we are setting conjecture against actual evidence – however imperfect – and this runs the risk of slowing scientific progress by favouring research compatible with traditionally approved perspectives in publication. It might be useful to revisit the everyday idea of burden of proof in assessing the arguments of reviewers. What does it take in terms of evidence and argument (rather than simply power) for a comment by a reviewer to scientifically require an answer? It is a commonplace that a disproved hypothesis is more valuable to science than a mere conjecture or something that cannot be proven one way or another. One reason for this is that scientific procedure illustrates methodological possibility as well as generating actual results. A sample from JASSS may not stand for all research but it shows how a conclusion might ultimately be reached for all research if the resources were available and the administrative constraints of academic publishing could be overcome.

As I have argued previously (Chattoe-Brown 2022), and has now been pleasingly illustrated (Keijzer 2022), this situation may create an important and distinctive role for RofASSS. It may be valuable to get hypotheses, particularly ones that potentially go against the prevailing wisdom, “out there” so they can subsequently be tested more rigorously rather than having to wait until the framer of the hypothesis can meet what may be a counsel of perfection from peer reviewers. Another issue with reviewing is a tendency to say what will not do rather than what will do. This rather the puts the author at the mercy of reviewers during the revision process. RofASSS can also be used to hive off “contextual” analyses – like this one regarding what it might mean for a journal to be parochial – so that they can be developed in outline for the general benefit of the Agent-Based Modelling community – rather than having to add length to specific articles depending on the tastes of particular reviewers.

Finally, as should be obvious, I have only suggested that JASSS is not parochial in regard to articles involving the string “opinion dynamics”. However, I have also illustrated how this kind of analysis could be done systematically for different topics to justify the claim that a particular journal can serve as a reasonable basis for a case study.

Acknowledgements

This analysis was funded by the project “Towards Realistic Computational Models Of Social Influence Dynamics” (ES/S015159/1) funded by ESRC via ORA Round 5.

References

Brown, Patricia Anne (2008) ‘A Review of the Literature on Case Study Research’, Canadian Journal for New Scholars in Education/Revue Canadienne des Jeunes Chercheures et Chercheurs en Éducation, 1(1), July, pp. 1-13, https://journalhosting.ucalgary.ca/index.php/cjnse/article/view/30395.

Chattoe-Brown, E. (2022) ‘If You Want to Be Cited, Don’t Validate Your Agent-Based Model: A Tentative Hypothesis Badly in Need of Refutation’, Review of Artificial Societies and Social Simulation, 1st Feb 2022. https://rofasss.org/2022/02/01/citing-od-models

Hauke, Jonas, Lorscheid, Iris and Meyer, Matthias (2017) ‘Recent Development of Social Simulation as Reflected in JASSS Between 2008 and 2014: A Citation and Co-Citation Analysis’, Journal of Artificial Societies and Social Simulation, 20(1), 5. https://www.jasss.org/20/1/5.html. doi:10.18564/jasss.3238

Keijzer, M. (2022) ‘If You Want to be Cited, Calibrate Your Agent-Based Model: Reply to Chattoe-Brown’, Review of Artificial Societies and Social Simulation, 9th Mar 2022. https://rofasss.org/2022/03/09/Keijzer-reply-to-Chattoe-Brown

Meyer, Matthias, Lorscheid, Iris and Troitzsch, Klaus G. (2009) ‘The Development of Social Simulation as Reflected in the First Ten Years of JASSS: A Citation and Co-Citation Analysis’, Journal of Artificial Societies and Social Simulation, 12(4), 12,. https://www.jasss.org/12/4/12.html.

Yin, R. K. (2009) Case Study Research: Design and Methods, fourth edition (Thousand Oaks, CA: Sage).


Chattoe-Brown, E. (2022) Is The Journal of Artificial Societies and Social Simulation Parochial? What Might That Mean? Why Might It Matter? Review of Artificial Societies and Social Simulation, 10th Sept 2022. https://rofasss.org/2022/09/10/is-the-journal-of-artificial-societies-and-social-simulation-parochial-what-might-that-mean-why-might-it-matter/


 

If you want to be cited, calibrate your agent-based model: A Reply to Chattoe-Brown

By Marijn A. Keijzer

This is a reply to a previous comment, (Chattoe-Brown 2022).

The social simulation literature has called on its proponents to enhance the quality and realism of their contributions through systematic validation and calibration (Flache et al., 2017). Model validation typically refers to assessments of how well the predictions of their agent-based models (ABMs) map onto empirically observed patterns or relationships. Calibration, on the other hand, is the process of enhancing the realism of the model by parametrizing it based on empirical data (Boero & Squazzoni, 2005). We would expect that presenting a validated or calibrated model serves as a signal of model quality, and would thus be a desirable characteristic of a paper describing an ABM.

In a recent contribution to RofASSS, Edmund Chattoe-Brown provocatively argued that model validation does not bear fruit for researchers interested in boosting their citations. In a sample of articles from JASSS published on opinion dynamics he observed that “the sample clearly divides into non-validated research with more citations and validated research with fewer” (Chattoe-Brown, 2022). Well-aware of the bias and limitations of the sample at hand, Chattoe-Brown calls on refutation of his hypothesis. An analysis of the corpus of articles in Web of Science, presented here, could serve that goal.

To test whether there exists an effect of model calibration and/or validation on the citation counts of papers, I compare citation counts of a larger number of original research articles on agent-based models published in the literature. I extracted 11,807 entries from Web of Science by searching for items that contained the phrases “agent-based model”, “agent-based simulation” or “agent-based computational model” in its abstract.[1] I then labeled all items that mention “validate” in its abstract as validated ABMs and those that mention “calibrate” as calibrated ABMs. This measure if rather crude, of course, as descriptions containing phrases like “we calibrated our model” or “others should calibrate our model” are both labeled as calibrated models. However, if mentioning that future research should calibrate or validate the model is not related to citations counts (which I would argue it indeed is not), then this inaccuracy does not introduce bias.

The shares of entries that mention calibration or validation are somewhat small. Overall, just 5.62% of entries mention validation, 3.21% report a calibrated model and 0.65% fall in both categories. The large sample size, however, will still enable the execution of proper statistical analysis and hypothesis testing.

How are mentions of calibration and validation in the abstract related to citation counts at face value? Bivariate analyses show only minor differences, as revealed in Figure 1. In fact, the distribution of citations for validated and non-validated ABMs (panel A) is remarkably similar. Wilcoxon tests with continuity correction—the nonparametric version of the simple t test—corroborate their similarity (W = 3,749,512, p = 0.555). The differences in citations between calibrated and non-calibrated models appear, albeit still small, more pronounced. Calibrated ABMs are cited slightly more often (panel B), as also supported by a bivariate test (W = 1,910,772, p < 0.001).

Picture 1

Figure 1. Distributions of number of citations of all the entries in the dataset for validated (panel A) and calibrated (panel B) ABMs and their averages with standard errors over years (panels C and D)

Age of the paper might be a more important determinant of citation counts, as panels C and D of Figure 1 suggest. Clearly, the age of a paper should be important here, because older papers have had much more opportunity to get cited. In particular, papers younger than 10 years seem to not have matured enough for its citation rates to catch up to older articles. When comparing the citation counts of purely theoretical models with calibrated and validated versions, this covariate should not be missed, because the latter two are typically much younger. In other words, the positive relationship between model calibration/validation and citation counts could be hidden in the bivariate analysis, as model calibration and validation are recent trends in ABM research.

I run a Poisson regression on the number of citations as explained by whether they are validated and calibrated (simultaneously) and whether they are both. The age of the paper is taken into account, as well as the number of references that the paper uses itself (controlling for reciprocity and literature embeddedness, one might say). Finally, the fields in which the papers have been published, as registered by Web of Science, have been added to account for potential differences between fields that explains both citation counts and conventions about model calibration and validation.

Table 1 presents the results from the four models with just the main effects of validation and calibration (model 1), the interaction of validation and calibration (model 2) and the full model with control variables (model 3).

Table 1. Poisson regression on the number of citations

# Citations
(1) (2) (3)
Validated -0.217*** -0.298*** -0.094***
(0.012) (0.014) (0.014)
Calibrated 0.171*** 0.064*** 0.076***
(0.014) (0.016) (0.016)
Validated x Calibrated 0.575*** 0.244***
(0.034) (0.034)
Age 0.154***
(0.0005)
Cited references 0.013***
(0.0001)
Field included No No Yes
Constant 2.553*** 2.556*** 0.337**
(0.003) (0.003) (0.164)
Observations 11,807 11,807 11,807
AIC 451,560 451,291 301,639
Note: *p<0.1; **p<0.05; ***p<0.01

The results from the analyses clearly suggest a negative effect of model validation and a positive effect of model calibration on the likelihood of being cited. The hypothesis that was so “badly in need of refutation” (Chattoe-Brown, 2022) will remain unrefuted for now. The effect does turn positive, however, when the abstract makes mention of calibration as well. In both the controlled (model 3) and uncontrolled (model 2) analyses, combining the effects of validation and calibration yields a positive coefficient overall.[2]

The controls in model 3 substantially affect the estimates from the three main factors of interest, while remaining in expected directions themselves. The age of a paper indeed helps its citation count, and so does the number of papers the item cites itself. The fields, furthermore, take away from the main effects somewhat, too, but not to a problematic degree. In an additional analysis, I have looked at the relationship between the fields and whether they are more likely to publish calibrated or validated models and found no substantial relationships. Citation counts will differ between fields, however. The papers in our sample are more often cited in, for example, hematology, emergency medicine and thermodynamics. The ABMs in the sample coming from toxicology, dermatology and religion are on the unlucky side of the equation, receiving less citations on average. Finally, I have also looked at papers published in JASSS specifically, due to the interest of Chattoe-Brown and the nature of this outlet. Surprisingly, the same analyses run on the subsample of these papers (N=376) showed a negative relationship between citation counts and model calibration/validation. Does the JASSS readership reveal its taste for artificial societies?

In sum, I find support for the hypothesis of Chattoe-Brown (2022) on the negative relationship between model validation and citations counts for papers presenting ABMs. If you want to be cited, you should not validate your ABM. Calibrated ABMs, on the other hand, are more likely to receive citations. What is more, ABMs that were both calibrated and validated are most the most successful papers in the sample. All conclusions were drawn considering (i.e. controlling for) the effects of age of the paper, the number of papers the paper cited itself, and (citation conventions in) the field in which it was published.

While the patterns explored in this and Chattoe-Brown’s recent contribution are interesting, or even puzzling, they should not distract from the goal of moving towards realistic agent-based simulations of social systems. In my opinion, models that combine rigorous theory with strong empirical foundations are instrumental to the creation of meaningful and purposeful agent-based models. Perhaps the results presented here should just be taken as another sign that citation counts are a weak signal of academic merit at best.

Data, code and supplementary analyses

All data and code used for this analysis, as well as the results from the supplementary analyses described in the text, are available here: https://osf.io/x9r7j/

Notes

[1] Note that the hyphen between “agent” and “based” does not affect the retrieved corpus. Both contributions that mention “agent based” and “agent-based” were retrieved.

[2] A small caveat to the analysis of the interaction effect is that the marginal improvement of model 2 upon model 1 is rather small (AIC difference of 269). This is likely (partially) due to the small number of papers that mention both calibration and validation (N=77).

Acknowledgements

Marijn Keijzer acknowledges IAST funding from the French National Research Agency (ANR) under the Investments for the Future (Investissements d’Avenir) program, grant ANR-17-EURE-0010.

References

Boero, R., & Squazzoni, F. (2005). Does empirical embeddedness matter? Methodological issues on agent-based models for analytical social science. Journal of Artificial Societies and Social Simulation, 8(4), 1–31. https://www.jasss.org/8/4/6.html

Chattoe-Brown, E. (2022) If You Want To Be Cited, Don’t Validate Your Agent-Based Model: A Tentative Hypothesis Badly In Need of Refutation. Review of Artificial Societies and Social Simulation, 1st Feb 2022. https://rofasss.org/2022/02/01/citing-od-models

Flache, A., Mäs, M., Feliciani, T., Chattoe-Brown, E., Deffuant, G., Huet, S., & Lorenz, J. (2017). Models of social influence: towards the next frontiers. Journal of Artificial Societies and Social Simulation, 20(4). https://doi.org/10.18564/jasss.3521


Keijzer, M. (2022) If you want to be cited, calibrate your agent-based model: Reply to Chattoe-Brown. Review of Artificial Societies and Social Simulation, 9th Mar 2022. https://rofasss.org/2022/03/09/Keijzer-reply-to-Chattoe-Brown


 

The Poverty of Suggestivism – the dangers of “suggests that” modelling

By Bruce Edmonds

Vagueness and refutation

A model[1] is basically composed of two parts (Zeigler 1976, Wartofsky 1979):

  1. A set of entities (such as mathematical equations, logical rules, computer code etc.) which can be used to make some inferences as to the consequences of that set (usually in conjunction with some data and parameter values)
  2. A mapping from this set to what it aims to represent – what the bits mean

Whilst a lot of attention has been paid to the internal rigour of the set of entities and the inferences that are made from them (1), the mapping to what that represents (2) has often been left as implicit or incompletely described – sometimes only indicated by the labels given to its parts. The result is a model that vaguely relates to its target, suggesting its properties analogically. There is not a well-defined way that the model is to be applied to anything observed, but a new map is invented each time it is used to think about a particular case. I call this way of modelling “Suggestivism”, because the model “suggests” things about what is being modelled.

This is partly a recapitulation of Popper’s critique of vague theories in his book “The Poverty of Historicism” (1957). He characterised such theories as “irrefutable”, because whatever the facts, these theories could be made to fit them. Irrefutability is an indicator of a lack of precise mapping to reality – such vagueness makes refutation very hard. However, it is only an indicator; there may be other reasons than vagueness for it not being possible to test a theory – it is their disconnection from well-defined empirical reference that is the issue here.

Some might go as far as suggesting that any model or theory that is not refutable is “unscientific”, but this goes too far, implying a very restricted definition of what ‘science’ is. We need analogies to think about what we are doing and to gain insight into what we are studying, e.g. (Hartman 1997) – for humans they are unavoidable, ‘baked’ into the way language works (Lakoff 1987). A model might make a set of ideas clear and help map out the consequences of a set of assumptions/structures/processes. Many of these suggestivist models relate to a set of ideas and it is the ideas that relate to what is observed (albeit informally) (Edmonds 2001). However, such models do not capture anything reliable about what they refer to, and in that sense are not part of the set of the established statements and theories that is at the core of science  (Arnold 2014).

The dangers of suggestivist modelling

As above, there are valid uses of abstract or theoretical modelling where this is explicitly acknowledged and where no conclusions about observed phenomena are made. So what are the dangers of suggestivist modelling – why am I making such a fuss about it?

Firstly, that people often seem to confuse a model as an analogy – a way of thinking about stuff – and a model that tells us reliably about what we are studying. Thus they give undue weight to the analyses of abstract models that are, in fact, just thought experiments. Making models is a very intimate way of theorising – one spends an extended period of time interacting with one’s model: developing, checking, analysing etc. The result is a particularly strong version of “Kuhnian Spectacles” (Kuhn 1962) causing us to see the world though our model for weeks after. Under this strong influence it is natural to confuse what we can reliably infer about the world and how we are currently perceiving/thinking about it. Good scientists should then pause and wait for this effect to wear off so that they can effectively critique what they have done, its limitations and what its implications are. However, often in the rush to get their work out, modellers often do not do this, resulting in a sloppy set of suggestive interpretations of their modelling.

Secondly, empirical modelling is hard. It is far easier (and, frankly, more fun) to play with non-empirical models. A scientific culture that treats suggestivist modelling as substantial progress and significantly rewards modellers that do it, will effectively divert a lot of modelling effort in this direction. Chattoe-Brown (2018) displayed evidence of this in his survey of opinion dynamics models – abstract, suggestivist modelling got far more reward (in terms of citations) than those that tried to relate their model to empirical data in a direct manner. Abstract modelling has a role in science, but if it is easier and more rewarding then the field will become unbalanced. It may give the impression of progress but not deliver on this impression. In a more mature science, researchers working on measurement methods (steps from observation to models) and collecting good data are as important as the theorists (Moss 1998).

Thirdly, it is hard to judge suggestivist models. Given their connection to the modelling target is vague there cannot be any decisive test of its success. Good modellers should declare the exact purpose of their model, e.g. that is analogical or merely exploring the consequences of theory (Edmonds et al. 2019), but then accept the consequences of this choice – namely, that it excludes  making conclusions about the observed world. If it is for a theoretical exploration then the comprehensiveness of the exploration, the scope of the exploration and the applicability of the model can be judged, but if the model is analogical or illustrative then this is harder. Whilst one model may suggest X, another may suggest the opposite. It is quite easy to fix a model to get the outcomes one wants. Clearly, if a model makes startling suggestions – illustrating totally new ideas or making a counter-example to widely held assumptions – then this helps science by widening the pool of theories or hypotheses that are considered. However most suggestivist modelling does not do this.

Fourthly, their sheer flexibility of as to application causes problems – if one works hard enough one can invent mappings to a wide range of cases, the limits are only those of our imagination. In effect, having a vague mapping from model to what it models adds in huge flexibility in a similar way to having a large number of free (non-empirical) parameters. This flexibility gives an impression of generality, and many desire simple and general models for complex phenomena. However, this is illusory because a different mapping is needed for each case, to make it apply. Given the above (1)+(2) definition of a model this means that, in fact, it is a different model for each case – what a model refers to, is part of the model. The same flexibility makes such models impossible to refute, since one can just adjust the mapping to save them. The apparent generality and lack of refutation means that such models hang around in the literature, due to their surface attractiveness.

Finally, these kinds of model are hugely influential beyond the community of modellers to the wider public including policy actors. Narratives that start in abstract models make their way out and can be very influential (Vranckx 1999). Despite the lack of rigorous mapping from model to reality, suggestivist models look impressive, look scientific. For example, very abstract models from the Neo-Classical ‘Chicago School’ of economists supported narratives about the optimal efficiency of markets, leading to a reluctance to regulate them (Krugman 2009). A lack of regulation seemed to be one of the factors behind the 2007/8 economic crash (Baily et al 2008). Modellers may understand that other modellers get over-enthusiastic and over-interpret their models, but others may not. It is the duty of modellers to give an accurate impression of the reliability of any modelling results and not to over-hype them.

How to recognise a suggestivist model

It can be hard to detangle how empirically vague a model is, because many descriptions about modelling work do not focus on making the mapping to what it represents precise. The reasons for this are various, for example: the modeller might be conflating reality and what is in the model in their minds, the researcher is new to modelling and has not really decided what the purpose of their model is, the modeller might be over-keen to establish the importance of their work and so is hyping the motivation and conclusions, they might simply not got around to thinking enough about the relationship between their model and what it might represent, or they might not have bothered to make the relationship explicit in their description. Whatever the reason the reader of any description of such work is often left with an archaeological problem: trying to unearth what the relationship might be, based on indirect clues only. The only way to know for certain is to take a case one knows about and try and apply the model to it, but this is a time consuming process and relies upon having a case with suitable data available. However, there are some indicators, albeit fallible ones, including the following.

  • A relatively simple model is interpreted as explaining a wide range of observed, complex phenomena
  • No data from an observed case study is compared to data from the model (often no data is brought in at all, merely abstract observations) – despite this, conclusions about some observed phenomena are made
  • The purpose of the model is not explicitly declared
  • The language of the paper seems to conflate talking about the model with what is being modelled
  • In the paper there are sudden abstraction ‘jumps’ between the motivation and the description of the model and back again to the interpretation of the results in terms of that motivation. The abstraction jumps involved are large and justified by some a priori theory or modelling precedents rather than evidence.

How to avoid suggestivist modelling

How to avoid the dangers of suggestivist modelling should be clear from the above discussion, but I will make them explicit here.

  • Be clear about the model purpose – that is does the model aim to achieve, which indicates how it should be judged by others (Edmonds et al 2019)
  • Do not make any conclusions about the real world if you have not related the model to any data
  • Do not make any policy conclusions – things that might affect other people’s lives – without at least some independent validation of the model outcomes
  • Document how a model relates (or should relate) to data, the nature of that data and maybe even the process whereby that data should be obtained (Achter et al 2019)
  • Be explicit as possible about what kinds of phenomena the model applies to – the limits of its scope
  • Keep the language about the model and what is being modelled distinct – for any statement it should be clear whether it is talking about the model or what it models (Edmonds 2020)
  • Highlight any bold assumptions in the specification of the model or describe what empirical foundation there is for them – be honest about these

Conclusion

Models can serve many different purposes (Epstein 2008). This is fine as long as the purpose of models are always made clear, and model results are not interpreted further than their established purpose allows. Research which gives the impression that analogical, illustrative or theoretical modelling can tell us anything reliable about observed complex phenomena is not only sloppy science, but can have a deleterious impact – giving an impression of progress whilst diverting attention from empirically reliable work. Like a bad investment: if it looks too good and too easy to be true, it probably isn’t.

Notes

[1] We often use the word “model” in a lazy way to indicate (1) rather than (1)+(2) in this definition, but a set of entities without any meaning or mapping to anything else is not a model, as it does not represent anything. For example, a random set of equations or program instructions does not make a model.

Acknowledgements

Bruce Edmonds is supported as part of the ESRC-funded, UK part of the “ToRealSim” project, grant number ES/S015159/1.

References

Achter, S., Borit, M., Chattoe-Brown, E., Palaretti, C. & Siebers, P.-O. (2019) Cherchez Le RAT: A Proposed Plan for Augmenting Rigour and Transparency of Data Use in ABM. Review of Artificial Societies and Social Simulation, 4th June 2019. https://rofasss.org/2019/06/04/rat/

Arnold, E. (2014). What’s wrong with social simulations?. The Monist, 97(3), 359-377. DOI:10.5840/monist201497323

Baily, M. N., Litan, R. E., & Johnson, M. S. (2008). The origins of the financial crisis. Fixing Finance Series – Paper 3, The Brookings Institution. https://www.brookings.edu/wp-content/uploads/2016/06/11_origins_crisis_baily_litan.pdf

Chattoe-Brown, E. (2018) What is the earliest example of a social science simulation (that is nonetheless arguably an ABM) and shows real and simulated data in the same figure or table? Review of Artificial Societies and Social Simulation, 11th June 2018. https://rofasss.org/2018/06/11/ecb/

Edmonds, B. (2001) The Use of Models – making MABS actually work. In. Moss, S. and Davidsson, P. (eds.), Multi Agent Based Simulation, Lecture Notes in Artificial Intelligence, 1979:15-32. http://cfpm.org/cpmrep74.html

Edmonds, B. (2020) Basic Modelling Hygiene – keep descriptions about models and what they model clearly distinct. Review of Artificial Societies and Social Simulation, 22nd May 2020. https://rofasss.org/2020/05/22/modelling-hygiene/

Edmonds, B., le Page, C., Bithell, M., Chattoe-Brown, E., Grimm, V., Meyer, R., Montañola-Sales, C., Ormerod, P., Root H. & Squazzoni. F. (2019) Different Modelling Purposes. Journal of Artificial Societies and Social Simulation, 22(3):6. http://jasss.soc.surrey.ac.uk/22/3/6.html.

Epstein, J. M. (2008). Why model?. Journal of artificial societies and social simulation, 11(4), 12. https://jasss.soc.surrey.ac.uk/11/4/12.html

Hartmann, S. (1997): Modelling and the Aims of Science. In: Weingartner, P. et al (ed.) : The Role of Pragmatics in Contemporary Philosophy: Contributions of the Austrian Ludwig Wittgenstein Society. Vol. 5. Wien und Kirchberg: Digi-Buch. pp. 380-385. https://epub.ub.uni-muenchen.de/25393/

Krugman, P. (2009) How Did Economists Get It So Wrong? New York Times, Sept. 2nd 2009. https://www.nytimes.com/2009/09/06/magazine/06Economic-t.html

Kuhn, T.S. (1962) The Structure of Scientific Revolutions. Chicago: University of Chicago Press.

Lakoff, G. (1987) Women, fire, and dangerous things. University of Chicago Press, Chicago.

Morgan, M. S., & Morrison, M. (1999). Models as mediators. Cambridge: Cambridge University Press.

Moss, S. (1998) Social Simulation Models and Reality: Three Approaches. Centre for Policy Modelling  Discussion Paper: CPM-98-35, http://cfpm.org/cpmrep35.html

Popper, K. (1957). The poverty of historicism. Routledge.

Vranckx, An. (1999) Science, Fiction & the Appeal of Complexity. In Aerts, Diederik, Serge Gutwirth, Sonja Smets, and Luk Van Langehove, (eds.) Science, Technology, and Social Change: The Orange Book of “Einstein Meets Magritte.” Brussels: Vrije Universiteit Brussel; Dordrecht: Kluwer., pp. 283–301.

Wartofsky, M. W. (1979). The model muddle: Proposals for an immodest realism. In Models (pp. 1-11). Springer, Dordrecht.

Zeigler, B. P. (1976). Theory of Modeling and Simulation. Wiley Interscience, New York.


Edmonds, B. (2022) The Poverty of Suggestivism – the dangers of "suggests that" modelling. Review of Artificial Societies and Social Simulation, 28th Feb 2022. https://rofasss.org/2022/02/28/poverty-suggestivism


 

If You Want To Be Cited, Don’t Validate Your Agent-Based Model: A Tentative Hypothesis Badly In Need of Refutation

By Edmund Chattoe-Brown

As part of a previous research project, I collected a sample of the Opinion Dynamics (hereafter OD) models published in JASSS that were most highly cited in JASSS. The idea here was to understand what styles of OD research were most influential in the journal. In the top 50 on 19.10.21 there were eight such articles. Five were self-contained modelling exercises (Hegselmann and Krause 2002, 58 citations, Deffuant et al. 2002, 35 citations, Salzarulo 2006, 13 citations, Deffuant 2006, 13 citations and Urbig et al. 2008, 9 citations), two were overviews of OD modelling (Flache et al. 2017, 13 citations and Sobkowicz 2009, 10 citations) and one included an OD example in an article mainly discussing the merits of cellular automata modelling (Hegselmann and Flache 1998, 12 citations). In order to get in to the top 50 on that date you had to achieve at least 7 citations. In parallel, I have been trying to identify Agent-Based Models that are validated (undergo direct comparison of real and equivalent simulated data). Based on an earlier bibliography (Chattoe-Brown 2020) which I extended to the end of 2021 for JASSS and articles which were described as validated in the highly cited articles listed above, I managed to construct a small and unsystematic sample of validated OD models. (Part of the problem with a systematic sample is that validated models are not readily searchable as a distinct category and there are too many OD models overall to make reading them all feasible. Also, I suspect, validated models just remain rare in line with the larger scale findings of Dutton and Starbuck (1971, p. 130, table 1) and discouragingly, much more recently, Angus and Hassani-Mahmooei (2015, section 4.5, figure 9). Obviously, since part of the sample was selected by total number of citations, one cannot make a comparison on that basis, so instead I have used the best possible alternative (given the limitations of the sample) and compared articles on citations per year. The problem here is that attempting validated modelling is relatively new while older articles inevitably accumulate citations however slowly. But what I was trying to discover was whether new validated models could be cited at a much higher annual rate without reaching the top 50 (or whether, conversely, older articles could have a high enough total citations to get into the top 50 without having a particularly impressive annual citation rate.) One would hope that, ultimately, validated models would tend to receive more citations than those that were not validated (but see the rather disconcerting related findings of Serra-Garcia and Gneezy 2021). Table 1 shows the results sorted by citations per year.

Article Status Number of JASSS Citations[1] Number of Years[2] Citations Per Year
Bernardes et al. 2002 Validated 1 20 0.05
Bernardes et al. 2001 Validated 2 21 0.096
Fortunato and Castellano 2007 Validated 2 15 0.13
Caruso and Castorina 2005 Validated 4 17 0.24
Chattoe-Brown 2014 Validated 2 8 0.25
Brousmiche et al. 2016 Validated 2 6 0.33
Hegselmann and Flache 1998 Non-Validated 12 24 0.5
Urbig et al. 2008 Non-Validated 9 14 0.64
Sobkowicz 2009 Non-Validated 10 13 0.77
Deffuant 2006 Non-Validated 13 16 0.81
Salzarulo 2006 Non-Validated 13 16 0.81
Duggins 2017 Validated 5 5 1
Deffuant et al. 2002 Non-Validated 35 20 1.75
Flache et al. 2017 Non-Validated 13 5 2.6
Hegselmann and Krause 2002 Non-Validated 58 20 2.9

Table 1. Annual Citation Rates for OD Articles Highly Cited in JASSS (Systematic Sample) and Validated OD Articles in or Cited in JASSS (Unsystematic Sample)

With the notable (and potentially encouraging) exception of Duggins (2017), the most recent validated OD model I have been able to discover in JASSS, the sample clearly divides into non-validated research with more citations and validated research with fewer. The position of Duggins (2017) might suggest greater recent interest in validated OD models. Unfortunately, however, qualitative analysis of the citations suggests that these are not cited as validated models per se (and thus as a potential improvement over non-validated models) but merely as part of general classes of OD model (like those involving social networks or repulsion – moving away from highly discrepant opinions). This tendency to cite validated models without acknowledging that they are validated (and what the implications of that might be) is widespread in the articles I looked at.

Obviously, there is plenty wrong with this analysis. Even looking at citations per annum we are arguably still partially sampling on the dependent variable (articles selected for being widely cited prove to be widely cited!) and the sample of validated OD models is unsystematic (though in fairness the challenges of producing a systematic sample are significant.[3]) But the aim here is to make a distinctive use of RoFASSS as a rapid mode of permanent publication and to think differently about science. If I tried to publish this in a peer reviewed journal, the amount of labour required to satisfy reviewers about the research design would probably be prohibitive (even if it were possible). As a result, the case to answer about this apparent (and perhaps undesirable) pattern in data might never see the light of day.

But by publishing quickly in RoFASSS without the filter of peer review I actively want my hypothesis to be rejected or replaced by research based on a better design (and such research may be motivated precisely by my presenting this interesting pattern with all its imperfections). When it comes to scientific progress, the chance to be clearly wrong now could be more useful than the opportunity to be vaguely right at some unknown point in the future.

Acknowledgements

This analysis was funded by the project “Towards Realistic Computational Models Of Social Influence Dynamics” (ES/S015159/1) funded by ESRC via ORA Round 5 (PI: Professor Bruce Edmonds, Centre for Policy Modelling, Manchester Metropolitan University: https://gtr.ukri.org/projects?ref=ES%2FS015159%2F1).

Notes

[1] Note that the validated OD models had their citations counted manually while the high total citation articles had them counted automatically. This may introduce some comparison error but there is no reason to think that either count will be terribly inaccurate.

[2] Including the year of publication and the current year (2021).

[3] Note, however, that there are some checks and balances on sample quality. Highly successful validated OD models would have shown up independently in the top 50. There is thus an upper bound to the impact of the articles I might have missed in manually constructing my “version 1” bibliography. The unsystematic review of 47 articles by Sobkowicz (2009) also checks independently on the absence of validated OD models in JASSS to that date and confirms the rarity of such articles generally. Only four of the articles that he surveys are significantly empirical.

References

Angus, Simon D. and Hassani-Mahmooei, Behrooz (2015) ‘“Anarchy” Reigns: A Quantitative Analysis of Agent-Based Modelling Publication Practices in JASSS, 2001-2012’, Journal of Artificial Societies and Social Simulation, 18(4), October, article 16, <http://jasss.soc.surrey.ac.uk/18/4/16.html>. doi:10.18564/jasss.2952

Bernardes, A. T., Costa, U. M. S., Araujo, A. D. and Stauffer, D. (2001) ‘Damage Spreading, Coarsening Dynamics and Distribution of Political Votes in Sznajd Model on Square Lattice’, International Journal of Modern Physics C: Computational Physics and Physical Computation, 12(2), February, pp. 159-168. doi:10.1140/e10051-002-0013-y

Bernardes, A. T., Stauffer, D. and Kertész, J. (2002) ‘Election Results and the Sznajd Model on Barabasi Network’, The European Physical Journal B: Condensed Matter and Complex Systems, 25(1), January, pp. 123-127. doi:10.1142/S0129183101001584

Brousmiche, Kei-Leo, Kant, Jean-Daniel, Sabouret, Nicolas and Prenot-Guinard, François (2016) ‘From Beliefs to Attitudes: Polias, A Model of Attitude Dynamics Based on Cognitive Modelling and Field Data’, Journal of Artificial Societies and Social Simulation, 19(4), October, article 2, <https://www.jasss.org/19/4/2.html>. doi:10.18564/jasss.3161

Caruso, Filippo and Castorina, Paolo (2005) ‘Opinion Dynamics and Decision of Vote in Bipolar Political Systems’, arXiv > Physics > Physics and Society, 26 March, version 2. doi:10.1142/S0129183105008059

Chattoe-Brown, Edmund (2014) ‘Using Agent Based Modelling to Integrate Data on Attitude Change’, Sociological Research Online, 19(1), February, article 16, <https://www.socresonline.org.uk/19/1/16.html>. doi:0.5153/sro.3315

Chattoe-Brown Edmund (2020) ‘A Bibliography of ABM Research Explicitly Comparing Real and Simulated Data for Validation: Version 1’, CPM Report CPM-20-216, 12 June, <http://cfpm.org/discussionpapers/256>

Deffuant, Guillaume (2006) ‘Comparing Extremism Propagation Patterns in Continuous Opinion Models’, Journal of Artificial Societies and Social Simulation, 9(3), June, article 8, <https://www.jasss.org/9/3/8.html>.

Deffuant, Guillaume, Amblard, Frédéric, Weisbuch, Gérard and Faure, Thierry (2002) ‘How Can Extremism Prevail? A Study Based on the Relative Agreement Interaction Model’, Journal of Artificial Societies and Social Simulation, 5(4), October, article 1, <https://www.jasss.org/5/4/1.html>.

Duggins, Peter (2017) ‘A Psychologically-Motivated Model of Opinion Change with Applications to American Politics’, Journal of Artificial Societies and Social Simulation, 20(1), January, article 13, <http://jasss.soc.surrey.ac.uk/20/1/13.html>. doi:10.18564/jasss.3316

Dutton, John M. and Starbuck, William H. (1971) ‘Computer Simulation Models of Human Behavior: A History of an Intellectual Technology’, IEEE Transactions on Systems, Man, and Cybernetics, SMC-1(2), April, pp. 128-171. doi:10.1109/TSMC.1971.4308269

Flache, Andreas, Mäs, Michael, Feliciani, Thomas, Chattoe-Brown, Edmund, Deffuant, Guillaume, Huet, Sylvie and Lorenz, Jan (2017) ‘Models of Social Influence: Towards the Next Frontiers’, Journal of Artificial Societies and Social Simulation, 20(4), October, article 2, <http://jasss.soc.surrey.ac.uk/20/4/2.html>. doi:10.18564/jasss.3521

Fortunato, Santo and Castellano, Claudio (2007) ‘Scaling and Universality in Proportional Elections’, Physical Review Letters, 99(13), 28 September, article 138701. doi:10.1103/PhysRevLett.99.138701

Hegselmann, Rainer and Flache, Andreas (1998) ‘Understanding Complex Social Dynamics: A Plea For Cellular Automata Based Modelling’, Journal of Artificial Societies and Social Simulation, 1(3), June, article 1, <https://www.jasss.org/1/3/1.html>.

Hegselmann, Rainer and Krause, Ulrich (2002) ‘Opinion Dynamics and Bounded Confidence Models, Analysis, and Simulation’, Journal of Artificial Societies and Social Simulation, 5(3), June, article 2, <http://jasss.soc.surrey.ac.uk/5/3/2.html>.

Salzarulo, Laurent (2006) ‘A Continuous Opinion Dynamics Model Based on the Principle of Meta-Contrast’, Journal of Artificial Societies and Social Simulation, 9(1), January, article 13, <http://jasss.soc.surrey.ac.uk/9/1/13.html>.

Serra-Garcia, Marta and Gneezy, Uri (2021) ‘Nonreplicable Publications are Cited More Than Replicable Ones’, Science Advances, 7, 21 May, article eabd1705. doi:10.1126/sciadv.abd1705

Sobkowicz, Pawel (2009) ‘Modelling Opinion Formation with Physics Tools: Call for Closer Link with Reality’, Journal of Artificial Societies and Social Simulation, 12(1), January, article 11, <http://jasss.soc.surrey.ac.uk/12/1/11.html>.

Urbig, Diemo, Lorenz, Jan and Herzberg, Heiko (2008) ‘Opinion Dynamics: The Effect of the Number of Peers Met at Once’, Journal of Artificial Societies and Social Simulation, 11(2), March, article 4, <http://jasss.soc.surrey.ac.uk/11/2/4.html>.


Chattoe-Brown, E. (2022) If You Want To Be Cited, Don’t Validate Your Agent-Based Model: A Tentative Hypothesis Badly In Need of Refutation. Review of Artificial Societies and Social Simulation, 1st Feb 2022. https://rofasss.org/2022/02/01/citing-od-models


 

Today We Have Naming Of Parts: A Possible Way Out Of Some Terminological Problems With ABM

By Edmund Chattoe-Brown


Today we have naming of parts. Yesterday,
We had daily cleaning. And tomorrow morning,
We shall have what to do after firing. But to-day,
Today we have naming of parts. Japonica
Glistens like coral in all of the neighbouring gardens,
And today we have naming of parts.
(Naming of Parts, Henry Reed, 1942)

It is not difficult to establish by casual reading that there are almost as many ways of using crucial terms like calibration and validation in ABM as there are actual instances of their use. This creates several damaging problems for scientific progress in the field. Firstly, when two different researchers both say they “validated” their ABMs they may mean different specific scientific activities. This makes it hard for readers to evaluate research generally, particularly if researchers assume that it is obvious what their terms mean (rather than explaining explicitly what they did in their analysis). Secondly, based on this, each researcher may feel that the other has not really validated their ABM but has instead done something to which a different name should more properly be given. This compounds the possible confusion in debate. Thirdly, there is a danger that researchers may rhetorically favour (perhaps unconsciously) uses that, for example, make their research sound more robustly empirical than it actually is. For example, validation is sometimes used to mean consistency with stylised facts (rather than, say, correspondence with a specific time series according to some formal measure). But we often have no way of telling what the status of the presented stylised facts is. Are they an effective summary of what is known in a field? Are they the facts on which most researchers agree or for which the available data presents the clearest picture? (Less reputably, can readers be confident that they were not selected for presentation because of their correspondence?) Fourthly, because these terms are used differently by different researchers it is possible that valuable scientific activities that “should” have agreed labels will “slip down the terminological cracks” (either for the individual or for the ABM community generally). Apart from clear labels avoiding confusion for others, they may help to avoid confusion for you too!

But apart from these problems (and there may be others but these are not the main thrust of my argument here) there is also a potential impasse. There simply doesn’t seem to be any value in arguing about what the “correct” meaning of validation (for example) should be. Because these are merely labels there is no objective way to resolve this issue. Further, even if we undertook to agree the terminology collectively, each individual would tend to argue for their own interpretation without solid grounds (because there are none to be had) and any collective decision would probably therefore be unenforceable. If we decide to invent arbitrary new terminology from scratch we not only run the risk of adding to the existing confusion of terms (rather than reducing it) but it is also quite likely that everyone will find the new terms unhelpful.

Unfortunately, however, we probably cannot do without labels for these scientific activities involved in quality controlling ABMs. If we had to describe everything we did without any technical shorthand, presenting research might well become impossibly unwieldy.

My proposed solution is therefore to invent terms from scratch (so we don’t end up arguing about our different customary usages to no purpose) but to do so on the basis of actual scientific practices reported in published research. For example, we might call the comparison of corresponding real and simulated data (which at least has the endorsement of the much used Gilbert and Troitzsch 2005 – see pp. 15-19 – to be referred to as validation) CORAS – Comparison Of Real And Simulated. Similarly, assigning values to parameters given the assumptions of model “structures” might be called PANV – Parameters Assigned Numerical Values.

It is very important to be clear what the intention is here. Naming cannot solve scientific problems or disagreements. (Indeed, failure to grasp this may well be why our terminology is currently so muddled as people try to get their different positions through “on the nod”.) For example, if we do not believe that correspondence with stylised facts and comparison measures on time series have equivalent scientific status then we will have to agree distinct labels for them and have the debate about their respective value separately. Perhaps the former could be called COSF – Comparison Of Stylised Facts. But it seems plainly easier to describe specific scientific activities accurately and then find labels for them than to have to wade through the existing marsh of ambiguous terminology and try to extract the associated science. An example of a practice which does not seem to have even one generally agreed label (and therefore seems to be neglected in ABM as a practice) is JAMS – Justifying A Model Structure. (Why are your agents adaptive rather than habitual or rational? Why do they mix randomly rather than in social networks?)

Obviously, there still needs to be community agreement for such a convention to be useful (and this may need to be backed institutionally for example by reviewing requirements). But the logic of the approach avoids several existing problems. Firstly, while the labels are useful shorthand, they are not arbitrary. Each can be traced back to a clearly definable scientific practice. Secondly, this approach steers a course between the Scylla of fruitless arguments from current muddled usage and the Charybdis of a novel set of terminology that is equally unhelpful to everybody. (Even if people cannot agree on labels, they knew how they built and evaluated their ABMs so they can choose – or create – new labels accordingly.) Thirdly, the proposed logic is extendable. As we clarify our thinking, we can use it to label (or improve the labels of) any current set of scientific practices. We will do not have to worry that we will run out of plausible words in everyday usage.

Below I suggest some more scientific practices and possible terms for them. (You will see that I have also tried to make the terms as pronounceable and distinct as possible.)

Practice Term
Checking the results of an ABM by building another.[1] CAMWA (Checking A Model With Another).
Checking ABM code behaves as intended (for example by debugging procedures, destructive testing using extreme values and so on). TAMAD (Testing A Model Against Description).
Justifying the structure of the environment in which agents act. JEM (Justifying the Environment of a Model): This is again a process that may pass unnoticed in ABM typically. For example, by assuming that agents only consider ethnic composition, the Schelling Model (Schelling 1969, 1971) does not “allow” locations to be desirable because, for example, they are near good schools. This contradicts what was known empirically well before (see, for example, Rossi 1955) and it isn’t clear whether simply saying that your interest is in an “abstract” model can justify this level of empirical neglect.
Finding out what effect parameter values have on ABM behaviour. EVOPE (Exploring Value Of Parameter Effects).
Exploring the sensitivity of an ABM to structural assumptions not justified empirically (see Chattoe-Brown 2021). ESOSA (Exploring the Sensitivity Of Structural Assumptions).

Clearly this list is incomplete but I think it would be more effective if characterising the scientific practices in existing ABM and naming them distinctively was a collective enterprise.

Acknowledgements

This research is funded by the project “Towards Realistic Computational Models Of Social Influence Dynamics” (ES/S015159/1) funded by ESRC via ORA Round 5 (PI: Professor Bruce Edmonds, Centre for Policy Modelling, Manchester Metropolitan University: https://gtr.ukri.org/projects?ref=ES%2FS015159%2F1).

Notes

[1] It is likely that we will have to invent terms for subcategories of practices which differ in their aims or warranted conclusions. For example, rerunning the code of the original author (CAMWOC – Checking A Model With Original Code), building a new ABM from a formal description like ODD (CAMUS – Checking A Model Using Specification) and building a new ABM from the published description (CAMAP – Checking A Model As Published, see Chattoe-Brown et al. 2021).

References

Chattoe-Brown, Edmund (2021) ‘Why Questions Like “Do Networks Matter?” Matter to Methodology: How Agent-Based Modelling Makes It Possible to Answer Them’, International Journal of Social Research Methodology, 24(4), pp. 429-442. doi:10.1080/13645579.2020.1801602

Chattoe-Brown, Edmund, Gilbert, Nigel, Robertson, Duncan A. and Watts Christopher (2021) ‘Reproduction as a Means of Evaluating Policy Models: A Case Study of a COVID-19 Simulation’, medRXiv, 23 February. doi:10.1101/2021.01.29.21250743

Gilbert, Nigel and Troitzsch, Klaus G. (2005) Simulation for the Social Scientist, second edition (Maidenhead: Open University Press).

Rossi, Peter H. (1955) Why Families Move: A Study in the Social Psychology of Urban Residential Mobility (Glencoe, IL, Free Press).

Schelling, Thomas C. (1969) ‘Models of Segregation’, American Economic Review, 59(2), May, pp. 488-493. (available at https://www.jstor.org/stable/1823701)


Chattoe-Brown, E. (2022) Today We Have Naming Of Parts: A Possible Way Out Of Some Terminological Problems With ABM. Review of Artificial Societies and Social Simulation, 11th January 2022. https://rofasss.org/2022/01/11/naming-of-parts/


 

Challenges and opportunities in expanding ABM to other fields: the example of psychology

By Dino Carpentras

Centre for Social Issues Research, Department of Psychology, University of Limerick

The loop of isolation

One of the problems discussed during the last public meeting of the European Social Simulation Association (ESSA) at the Social Simulation Conference 2021 was the problem of reaching different communities outside the ABM one. This is a serious problem as we are risking getting trapped in a vicious cycle of isolation.

The cycle can be explained as follows. (a) Many fields are not familiar with ABM methods and standards. This results in the fact that (b) both reviewers and editors will struggle in understanding and evaluating the quality of an ABM paper. In general, this translates in a higher rejection rate and way longer time before publication. As results (c) fewer researchers in ABM will be willing to send their work to other communities, and, in general, fewer ABM works will be published in journals of other communities. Fewer articles using ABM makes it such that (d) fewer people would be aware of ABM, understand their methods and standards and even consider it an established research method.

Another point to consider is that, as time passes, each field evolves and develops new standards and procedures. Unfortunately, if two fields are not enough aware of each other, the new procedures will appear even more alien to members of the other community reinforcing the previously discussed cycle. A schematic of this is offered in figure 1.

fig1_v2

Figure 1: Vicious cycle of isolation

The challenge

Of course, a “brute force” solution would be to keep sending articles to journals in different fields until they get published. However, this would be extremely expensive in terms of time, and probably most researchers will not be happy of following this path.

A more elaborated solution could be framed as “progressively getting to know each other.” This would consist in modellers getting more familiar with the target community and vice versa. In this way, people from ABM would be able to better understand the jargon, the assumptions and even what is interesting enough to be the main result of a paper in a specific discipline. This would make it easier for members of our community to communicate research results using the language and methods familiar to the other field.

At the same time, researchers in the other field could slowly integrate ABM into their work, showing the potential of ABM and making it appear less alien to their peers. All of this would revert the previously discussed vicious cycle, by producing a virtuous one which would bring the two fields closer and closer.

Unfortunately, such goal cannot be obtained overnight, as it probably will require several events, collaborations, publications and probably several years (or even decades!). However, as result, our field would be familiar to and recognized by multiple other fields, enormously increasing the scientific impact of our research as well as the number of people working in ABM.

In this short communication, I would like to, firstly, highlight the importance and the challenges of reaching out other fields and, secondly, show a practical example with the field of psychology. I have chosen this field for no particular reason, besides the fact that I am currently working in the department of psychology. This gave me the opportunity of interacting with several researchers in this field.

In the next sections, I will summarize the main points of several informal discussions with these researchers. Specifically, I will try to highlight what they reported to be promising or interesting in ABM and also what felt alien or problematic to them.

Let me also stress that this does not want to be a complete overview, nor it should be thought as a summary of “what every psychologist think about ABM.” Instead, this is simply a summary of the discussions I had so far. What I hope, is that this will be at least a little useful to our community for building better connections with other fields.

The elephant in the room

Before moving to the list of comments on ABM I have collected, I want to address one point which appeared almost every time I discussed ABM with psychologists. Actually, it appeared almost every time I discuss ABM with people outside our field. This is the problem of experiments and validation.

I know there was recently a massive discussion on the SimSoc mailing list on opinion dynamics and validation, and this discussion will probably continue. Therefore, I am not going to discuss if all models should be tested, if a validated model should be considered superior, etc. Indeed, I do not want to discuss at all if validation should be considered important within our community. Instead, I want to discuss how important this is while interacting with other communities.

Indeed, many other fields give empirical data and validation a key role, having even developed different methods to test the quality of a hypothesis or a model when comparing it to empirical data (e.g. calculation of p-value, Krishnaiah 1980). Also, I repeatedly experienced disappointment or even mockery when I explained to non-ABM people that the model I was explaining them about was not empirically validated (e.g. the Deffuant model of opinion dynamics). In one single case, I even had a person laughing at me for this.

Unfortunately, many people which are not familiar with ABM end up considering it almost like a “nice exercise,” and even “not a real science.” This could be extremely dangerous for our field. Indeed, if multiple researchers will start thinking of ABM as a lesser science, communication with other fields – as well as obtaining funding for research – would get exponentially harder for our community.

Also, please, let me stress again to not “confuse the message with the messenger.” Here, I am not claiming that an unvalidated model should be considered inferior, or anything like that. What I am saying is that many people outside our field think in a similar fashion and this may eventually turn into a way bigger problem for us.

I will further discuss this point in the conclusion section, however, I will not claim that we should get rid of “pure models,” or that every model should be validated. What I will claim is that we should promote more empirical works as they will allow us to interact more easily with other fields.

Further points

In this section, I have collected (in no particular order) different comments and suggestions I have received from psychologist on the topic ABM. All of them had at least some experience of working side to side with a researcher developing ABMs.

Also in this case, please, remember that this are not my claims, but feedbacks I received. Furthermore, they should not be analysed as “what ABM is,” but more as “how ABM may look like to people in another field.”

  1. Some psychologists showed interest in the possibility of having loops in ABMs, which allow for relationships which go beyond simple cause and effect. Indeed, several models in psychology are structured in the form of “parameter X influences parameter Y” (and Y cannot influence X, forming a loop). While this approach is very common in psychology, many researchers are not satisfied with it, making ABMs are a very good opportunity for the development of more realistic models.
  2. Some psychologists said that at first impact, ABM looks very interesting. However, the extensive use of equations can confuse or even scare people who are not very used to them.
  3. Some praised Schelling’s model (Schelling 1971). Especially the approach of developing a hypothesis and then using an ABM to falsify it.
  4. Some criticized that often is not clear what an ABM should be used for or what such a model “is telling us.”
  5. Similarly, the use of models with a big number of parameters was criticized as “[these models] can eventually produce any result.”
  6. Another confusion that appeared multiple times was that often it is not clear if the model should be analysed and interpreted at the individual level (e.g. agents which start from state A often end up in state B) or at the more global level (e.g. distribution A results in distribution B).
  7. Another major complaint was that psychological measures are nominal or ordinal, while many models suppose interval-like variables.
  8. Another criticism was based on the fact that often agents behave all in the same way without including personal differences.
  9. In psychology there is a lot of attention on the sample size and if this is big enough to produce significant results. Some stressed that in many ABM works it is often not clear if the sample size (i.e. the number of agents) is sufficient for supporting the analysis.

Conclusion

I would like to stress again that these comments are not supposed to represent the thoughts of every psychologist, nor that I am suggesting that all the ABM literature should adapt to them or that they are always correct. For example, to my personal opinion, point 5 and 8 are pushing towards opposite directions; one aiming at simpler models and the other pushing towards complexity. Similarly, I do not think we should decrease the number of equations in our works to meet point 2. However, I think we should consider these feedbacks when planning interactions with the psychology community.

As mentioned before, a crucial role when interacting with other communities is played by experiments and validations. Even points 6 and especially points 7 and 9 suggest how member of this community often try to look for 1-to-1 relationships between agents of simulations and people in the real world.

fig2

Figure 2: (left) Empirical ABM acting as a bridge between theoretical ABM and other research fields. (Right) as the relationship between ABM and the other field matures, people become familiar with ABM standards and a direct link to theoretical ABM can be established.

As suggested by someone during the already mentioned discussion in the SimSoc mailing list, this could be solved by introducing a new figure (or, equivalently, a new research field) dedicated to empirical work in ABM. Following this solution, theoretical modellers could keep developing models without having to worry about validation. This would be similar to the work carried out by theoretical researchers in physics. At the same time, we would have also a stream of research dedicated to “experimental ABM.” People working on this topic will further explore the connection between models and the empirical world through experiments and validation processes. Of course, the two should not be mutually exclusive, as a researcher (or a piece of research) may still fall in both categories. However, having this distinction may help in giving more space to empirical work.

I believe that the role of experimental ABM could be crucial for developing good interactions between ABM and other communities. Indeed, this type of research could be accepted much more easily by other communities, producing better interactions with ABM. Especially, mentioning experiments and validation, could strongly decrease the initial mistrust that many people show when discussing ABM. Furthermore, as ABM develops stronger connections with another field, and our methods and standards become more familiar, we would probably also observe more people from the other community which would start looking into more theoretical ABM approaches and what-if scenarios (see fig 2).

References

Krishnaiah, P. R. (Ed.). (1980). A Hand Book of Statistics (Vol. 1). Motilal Banarsidass Publishe.

Schelling, T. C. (1971). Dynamic models of segregation. Journal of Mathematical Sociology, 1(2), 143-186.

Edmonds, B. and Moss, S. (2005) From KISS to KIDS – an ‘anti-simplistic’ modelling approach. In P. Davidsson et al. (Eds.): Multi Agent Based Simulation 2004. Springer, Lecture Notes in Artificial Intelligence, 3415:130–144.


Carpentras, D. (2020) Challenges and opportunities in expanding ABM to other fields: the example of psychology. Review of Artificial Societies and Social Simulation, 20th December 2021. https://rofasss.org/2021/12/20/challenges/


 

Benefits of Open Research in Social Simulation: An Early-Career Researcher’s Perspective

By Hyesop Shin

Research Associate at the School of Geographical and Earth Sciences, University of Glasgow, UK

In March 2017, in my first year of PhD, I attended a talk at the Microsoft Research Lab in Cambridge UK. It was about the importance of reproducibility and replicability in science. Inspired by the talk, I redesigned my research beyond my word processer and hard disk to open repositories and social media. Through my experience, there have been some challenges to learn other people’s work and replicate them to my project, but I found it more beneficial to share my problem and solutions for other people who may have encountered the same problem.

Having spoken to many early career researchers (ECRs) regarding the need for open science, specifically whether sharing codes is essential, the consensus was that it was not an essential component for their degree. A few answered that they were too embarrassed to share their codes online because their codes were not well coded enough. I somewhat empathised with their opinions, but at the same time, would insist that open research can gain more benefits than shame.

I wrote this short piece to openly discuss the benefits of conducting open research and suggest some points that ECRs should keep in mind. During the writing, there are some screenshots taken from my PhD work (Shin, 2021). I conclude my writing by accommodating personal experiences or other thoughts that might give more insights to the audience.

Benefits of Aiming an Open Project

I argue here that being transparent and honest about your model development strengthens the credibility of the research. In doing so, my thesis shared the original data, the scripts with annotations that are downloadable and executable, and wiki pages to summarise the outcomes and interpretations (see Figure 1 for examples). This evidence enables scholars and technicians to visit the repository if they are interested in the source codes or outcomes. Also, people can comment if any errors or bugs are identified, or the model is not executing on their machine or may suggest alternative ways to tackle the same problem. Even during the development, many developers share their work via online repositories (e.g. Github, Gitlab), and social media to ask for advice. Agent-based models are mostly uploaded on CoMSeS.net (previously named OpenABM). All of this can improve the quality of research.

Picture 1

Figure 1 A screenshot of a Github page showing how open platforms can help other people to understand the outcomes step by step

More practically, one can learn new ideas by helping each other. If there was a technical issue that can’t be solved, the problem should not be kept hidden, but rather be opened and solved together with experts online and offline. Figure 2 is a pragmatic example of posing questions to a wide range of developers on Stackoverflow – an online community of programmers to share and build codes. Providing my NetLogo codes, I asked how to send an agent group from one location to the other. The anonymous person, whose ID was JenB, kindly responded to me with a new set of codes, which helped me structure the codes more effectively.

Picture 2

Figure 2 Raising a question about sending agents from one location to another in NetLogo

Another example was about the errors I had encountered whilst I was running NetLogo with an R package “nlrx” (Salecker et al., 2019). Here, R was used as a compiler to submit iterative NetLogo jobs on the HPC (High Performance Computing) cluster to improve the execution speed. However, much to my surprise, I received error messages due to early terminations of failed HPC jobs. Not knowing what to do, I posed a question to the developer of the package (see Figure 3) and luckily got a response that the R ecosystem stores all the assigned objects in the RAM, but even with gigabytes of RAM, it struggles to write 96,822 patches over 8764 ticks on a spreadsheet.

Stackoverflow has kindly informed that NetLogo has a memory ceiling of 1GB[i] and keeps each run in the memory before it shuts down. Thus, if the model is huge and requires several iterations, then it is more likely that the execution speed will decrease after a few iterations. Before this information was seen, it was not understood why the model took 1 hour 20 minutes to finish the first run but struggled to maintain that speed on the twentieth run. Hence, sharing technical obstacles that occur in the middle of research can save a lot of time even for those who are contemplating similar research.

Picture 3

Figure 3 Comments posted on an online repository regarding the memory issue that NetLogo and R encountered

The Future for Open Research

For future quantitative studies in social simulation, this paper suggests students and researchers in their early careers should acclimatise themselves to using open-source platforms to conduct sustainable research. As clarity, conciseness, and coherence are featured as the important C’s for writing skills, good programming should take into consideration the following points.

First is clarity and conciseness (C&C). Here, clarity means that the scripts should be neatly documented. The computer does not know whether the codes are dirty or neat, it only cares whether it is syntactically correct, but it matters when other people attempt to understand the task. If the outcome produces the same results, it is always better to write clearer and simpler codes for other people and future upgrades. Thus, researchers should refer to other people’s work and learn how to code effectively. Another way to maintain clarity in coding is to keep descriptive and distinctive names for new variables. This statement might seem contradictory to the conciseness issue, but this is important as one of the common mistakes users make is to assign variables with abstract names such as LP1, LP2…LP10, which seems clear and concise for the model builder, but is even harder for the others when reviewing the code. The famous quote from Einstein, “Everything should be made as simple as possible, but not simpler.” is the appropriate phrase that model builders should always keep in mind. Hence, instead of coding LP9, names such as LandPriceIncreaseRate2009 (camel cases) or landprice_incrate_2009 (snake cases) can be more effective for the reviewers to understand the model.

Second is reproducibility and replicability (R&R). To be reproducible and replicable, initially, no errors should occur when others execute the script, and possible errors or bugs should be reported. It will also be more useful to document the libraries and the dependencies required. This is quite important as different OSs (operating systems) have different behaviours to install packages. For instance, the sf package in R has slightly different ways to install the package between OSs where Windows and MacOSX can be installed from the binary package while Linux needs to separately install GDAL (to read and write vector and raster data), Proj (which deals with projection), and GEOS (which provides geospatial functions) prior to the package installation. Finally, it would be very helpful if unit testing is included in the model. While R and Python provide splendid examples in their vignettes, NetLogo remains to offer the library models but goes no further than that. Offering unit testing examples can give a better understanding when the whole model is too complicated for others to comprehend. It can also give the impression that the modeller has full control of the model because without the unit test the verification process becomes error-prone. The good news is that NetLogo has most recently released the Beginner’s Interactive Dictionary with friendly explanations with videos and code examples[ii].

Third is to maintain version control. In terms of sustainability, researchers should be aware of software maintenance. Much programming software relies on libraries and packages that are built on a particular version. If the software is upgraded and no longer accepts the previous versions, then the package developers need to keep updating to run it on a new version. For example, NetLogo 6.0 experienced a significant change compared to versions 5.X. The biggest change was the replacement of tasks[iii] by anonymous procedures (Wilensky, 1999). This means that tasks are no longer primitives but are converted to arrow syntax. For example, if there is a list of [a b c], the previous task is asked to add the first, second, and third element as foreach [a b c] [ ?a+?b+?c ], while the new version does the same job as foreach [a b c][ add_all → a + b + c]. If the models haven’t converted to a new version it can be viewable as a read-only model but can’t be executed. Other geospatial packages in R such as rgdal and sf, have also struggled whenever a major update was made on their own packages or on the R version itself due to a lot of dependencies. Even ArcGIS, a UI (User Interface) software, had issues when they upgraded it from version 9.3 to 10. The projects that were coded under the VBA script in 9.3 were broken because it was not recognised as a correct function in the new version based on Python. This is also another example that backward compatibility and deprecation mechanisms are important.

Lastly, for more advanced users, it is also recommended to use a collaborative platform that executes every result from the codes with the exact version. One of the platforms is Codeocean. The Nature research team has recently chosen the platform to peer-review the codes (Perkel, 2019). The Nature editors and peer-reviewers strongly believed that coding has become a norm across many disciplines, and hence have asserted that the model process including the quality of data, conciseness, reproducibility, and documentation of the model should be placed as a requirement. Although the training procedure can be difficult at first, it will lead researchers to conduct themselves with more responsibility.

Looking for Opinions

With the advent of the era of big data and data science where people collaborate online and the ‘sharing is caring’ atmosphere has become a norm (Arribas-Bel et al., 2021; Lovelace, 2021), I insist that open research should no longer be an option. However, one may argue that although open research is by far an excellent model that can benefit many of today’s projects, there are certain types of risks that might concern ECRs such as intellectual property issues, code quality and technical security. Thus, if you have had different opinions regarding this issue, or simply favour adding your experiences during your PhD in social simulation, please add your thoughts via a thread.

Notes

[i] http://ccl.northwestern.edu/netlogo/docs/faq.html#how-big-can-my-model-be-how-many-turtles-patches-procedures-buttons-and-so-on-can-my-model-contain

[ii] https://ccl.northwestern.edu/netlogo/bind/

[iii] Tasks can be equations, x + y, or a set of lists [1 2 3 4 5]

References

Arribas-Bel, D., Alvanides, S., Batty, M., Crooks, A., See, L., & Wolf, L. (2021). Urban data/code: A new EP-B section. Environment and Planning B: Urban Analytics and City Science, 23998083211059670. https://doi.org/10.1177/23998083211059670

Lovelace, R. (2021). Open source tools for geographic analysis in transport planning. Journal of Geographical Systems, 23(4), 547–578. https://doi.org/10.1007/s10109-020-00342-2

Perkel, J. M. (2019). Make code accessible with these cloud services. Nature, 575(7781), 247. https://doi.org/10.1038/d41586-019-03366-x

Salecker, J., Sciaini, M., Meyer, K. M., & Wiegand, K. (2019). The nlrx r package: A next-generation framework for reproducible NetLogo model analyses. Methods in Ecology and Evolution, 10(11), 1854–1863. https://doi.org/10.1111/2041-210X.13286

Shin, H. (2021). Assessing Health Vulnerability to Air Pollution in Seoul Using an Agent-Based Simulation. University of Cambridge. https://doi.org/https://doi.org/10.17863/CAM.65615

Wilensky, U. (1999). Netlogo. Northwestern University: Evanston, IL, USA. https://ccl.northwestern.edu/netlogo/


Shin, H. (2021) Benefits of Open Research in Social Simulation: An Early-Career Researcher’s Perspective. Review of Artificial Societies and Social Simulation, 24th Nov 2021. https://rofasss.org/2021/11/23/benefits-open-research/


 

Reply to Frank Dignum

By Edmund Chattoe-Brown

This is a reply to Frank Dignum’s reply (about Edmund Chattoe-Brown’s review of Frank’s book)

As my academic career continues, I have become more and more interested in the way that people justify their modelling choices, for example, almost every Agent-Based Modeller makes approving noises about validation (in the sense of comparing real and simulated data) but only a handful actually try to do it (Chattoe-Brown 2020). Thus I think two specific statements that Frank makes in his response should be considered carefully:

  1. … we do not claim that we have the best or only way of developing an Agent-Based Model (ABM) for crises.” Firstly, negative claims (“This is not a banana”) are not generally helpful in argument. Secondly, readers want to know (or should want to know) what is being claimed and, importantly, how they would decide if it is true “objectively”. Given how many models sprang up under COVID it is clear that what is described here cannot be the only way to do it but the question is how do we know you did it “better?” This was also my point about institutionalisation. For me, the big lesson from COVID was how much the automatic response of the ABM community seems to be to go in all directions and build yet more models in a tearing hurry rather than synthesise them, challenge them or test them empirically. I foresee a problem both with this response and our possible unwillingness to be self-aware about it. Governments will not want a million “interesting” models to choose from but one where they have externally checkable reasons to trust it and that involves us changing our mindset (to be more like climate modellers for example, Bithell & Edmonds 2020). For example, colleagues and I developed a comparison methodology that allowed for the practical difficulties of direct replication (Chattoe-Brown et al. 2021).
  2. The second quotation which amplifies this point is: “But we do think it is an extensive foundation from which others can start, either picking up some bits and pieces, deviating from it in specific ways or extending it in specific ways.” Again, here one has to ask the right question for progress in modelling. On what scientific grounds should people do this? On what grounds should someone reuse this model rather than start their own? Why isn’t the Dignum et al. model built on another “market leader” to set a good example? (My point about programming languages was purely practical not scientific. Frank is right that the model is no less valid because the programming language was changed but a version that is now unsupported seems less useful as a basis for the kind of further development advocated here.)

I am not totally sure I have understood Frank’s point about data so I don’t want to press it but my concern was that, generally, the book did not seem to “tap into” relevant empirical research (and this is a wider problem that models mostly talk about other models). It is true that parameter values can be adjusted arbitrarily in sensitivity analysis but that does not get us any closer to empirically justified parameter values (which would then allow us to attempt validation by the “generative methodology”). Surely it is better to build a model that says something about the data that exists (however imperfect or approximate) than to rely on future data collection or educated guesses. I don’t really have the space to enumerate the times the book said “we did this for simplicity”, “we assumed that” etc. but the cumulative effect is quite noticeable. Again, we need to be aware of the models which use real data in whatever aspects and “take forward” those inputs so they become modelling standards. This has to be a collective and not an individualistic enterprise.

References

Bithell, M. and Edmonds, B. (2020) The Systematic Comparison of Agent-Based Policy Models – It’s time we got our act together!. Review of Artificial Societies and Social Simulation, 11th May 2021. https://rofasss.org/2021/05/11/SystComp/

Chattoe-Brown, E. (2020) A Bibliography of ABM Research Explicitly Comparing Real and Simulated Data for Validation. Review of Artificial Societies and Social Simulation, 12th June 2020. https://rofasss.org/2020/06/12/abm-validation-bib/

Chattoe-Brown, E. (2021) A review of “Social Simulation for a Crisis: Results and Lessons from Simulating the COVID-19 Crisis”. Journal of Artificial Society and Social Simulation. 24(4). https://www.jasss.org/24/4/reviews/1.html

Chattoe-Brown, E., Gilbert, N., Robertson, D. A., & Watts, C. J. (2021). Reproduction as a Means of Evaluating Policy Models: A Case Study of a COVID-19 Simulation. medRxiv 2021.01.29.21250743; DOI: https://doi.org/10.1101/2021.01.29.21250743

Dignum, F. (2020) Response to the review of Edmund Chattoe-Brown of the book “Social Simulations for a Crisis”. Review of Artificial Societies and Social Simulation, 4th Nov 2021. https://rofasss.org/2021/11/04/dignum-review-response/

Dignum, F. (Ed.) (2021) Social Simulation for a Crisis: Results and Lessons from Simulating the COVID-19 Crisis. Springer. DOI:10.1007/978-3-030-76397-8


Chattoe-Brown, E. (2021) Reply to Frank Dignum. Review of Artificial Societies and Social Simulation, 10th November 2021. https://rofasss.org/2021/11/10/reply-to-dignum/


 

Response to the review of Edmund Chattoe-Brown of the book “Social Simulations for a Crisis”

By Frank Dignum

This is a reply to a review in JASSS (Chattoe-Brown 2021) of (Dignum 2021).

Before responding to some of the specific concerns of Edmund I would like to thank him for the thorough review. I am especially happy with his conclusion that the book is solid enough to make it a valuable contribution to scientific progress in modelling crises. That was the main aim of the book and it seems that is achieved. I want to reiterate what we already remarked in the book; we do not claim that we have the best or only way of developing an Agent-Based Model (ABM) for crises. Nor do we claim that our simulations were without limitations. But we do think it is an extensive foundation from which others can start, either picking up some bits and pieces, deviating from it in specific ways or extending it in specific ways.

The concerns that are expressed by Edmund are certainly valid. I agree with some of them, but will nuance some others. First of all the concern about the fact that we seem to abandon the NetLogo implementation and move to Repast. This fact does not make the ABM itself any less valid! In itself it is also an important finding. It is not possible to scale such a complex model in NetLogo beyond around two thousand agents. This is not just a limitation of our particular implementation, but a more general limitation of the platform. It leads to the important challenge to get more computer scientists involved to develop platforms for social simulations that both support the modelers adequately and provide efficient and scalable implementations.

That the sheer size of the model and the results make it difficult to trace back the importance and validity of every factor on the results is completely true. We have tried our best to highlight the most important aspects every time. But, this leaves questions as to whether we make the right selection of highlighted aspects. As an illustration to this, we have been busy for two months to justify our results of the simulations of the effectiveness of the track and tracing apps. We basically concluded that we need much better integrated analysis tools in the simulation platform. NetLogo is geared towards creating one simulation scenario, running the simulation and analyzing the results based on a few parameters. This is no longer sufficient when we have a model with which we can create many scenarios and have many parameters that influence a result. We used R now to interpret the flood of data that was produced with every scenario. But, R is not really the most user friendly tool and also not specifically meant for analyzing the data from social simulations.

Let me jump to the third concern of Edmund and link it to the analysis of the results as well. While we tried to justify the results of our simulation on the effectiveness of the track and tracing app we compared our simulation with an epidemiological based model. This is described in chapter 12 of the book. Here we encountered the difference in assumed number of contacts per day a person has with other persons. One can take the results, as quoted by Edmund as well, of 8 or 13 from empirical work and use them in the model. However, the dispute is not about the number of contacts a person has per day, but what counts as a contact! For the COVID-19 simulations standing next to a person in the queue in a supermarket for five minutes can count as a contact, while such a contact is not a meaningful contact in the cited literature. Thus, we see that what we take as empirically validated numbers might not at all be the right ones for our purpose. We have tried to justify all the values of parameters and outcomes in the context for which the simulations were created. We have also done quite some sensitivity analyses, which we did not all report on just to keep the volume of the book to a reasonable size. Although we think we did a proper job in justifying all results, that does not mean that one can have different opinions on the value that some parameters should have. It would be very good to check the influence on the results of changes in these parameters. This would also progress scientific insights in the usefulness of complex models like the one we made!

I really think that an ABM crisis response should be institutional. That does not mean that one institution determines the best ABM, but rather that the ABM that is put forward by that institution is the result of a continuous debate among scientists working on ABM’s for that type of crisis. For us, one of the more important outcomes of the ASSOCC project is that we really need much better tools to support the types of simulations that are needed for a crisis situation. However, it is very difficult to develop these tools as a single group. A lot of the effort needed is not publishable and thus not valued in an academic environment. I really think that the efforts that have been put in platforms such as NetLogo and Repast are laudable. They have been made possible by some generous grants and institutional support. We argue that this continuous support is also needed in order to be well equipped for a next crisis. But we do not argue that an institution would by definition have the last word in which is the best ABM. In an ideal case it would accumulate all academic efforts as is done in the climate models, but even more restricted models would still be better than just having a thousand individuals all claiming to have a useable ABM while governments have to react quickly to a crisis.

The final concern of Edmund is about the empirical scale of our simulations. This is completely true! Given the scale and details of what we can incorporate we can only simulate some phenomena and certainly not everything around the COVID-19 crisis. We tried to be clear about this limitation. We had discussions about the Unity interface concerning this as well. It is in principle not very difficult to show people walking in the street, taking a car or a bus, etc. However, we decided to show a more abstract representation just to make clear that our model is not a complete model of a small town functioning in all aspects. We have very carefully chosen which scenarios we can realistically simulate and give some insights in reality from. Maybe we should also have discussed more explicitly all the scenarios that we did not run with the reasons why they would be difficult or unrealistic in our ABM. One never likes to discuss all the limitations of one’s labor, but it definitely can be very insightful. I have made up for this a little bit by submitting an to a special issue on predictions with ABM in which I explain in more detail, which should be the considerations to use a particular ABM to try to predict some state of affairs. Anyone interested to learn more about this can contact me.

To conclude this response to the review, I again express my gratitude for the good and thorough work done. The concerns that were raised are all very valuable to concern. What I tried to do in this response is to highlight that these concerns should be taken as a call to arms to put effort in social simulation platforms that give better support for creating simulations for a crisis.

References

Dignum, F. (Ed.) (2021) Social Simulation for a Crisis: Results and Lessons from Simulating the COVID-19 Crisis. Springer. DOI:10.1007/978-3-030-76397-8

Chattoe-Brown, E. (2021) A review of “Social Simulation for a Crisis: Results and Lessons from Simulating the COVID-19 Crisis”. Journal of Artificial Society and Social Simulation. 24(4). https://www.jasss.org/24/4/reviews/1.html


Dignum, F. (2020) Response to the review of Edmund Chattoe-Brown of the book “Social Simulations for a Crisis”. Review of Artificial Societies and Social Simulation, 4th Nov 2021. https://rofasss.org/2021/11/04/dignum-review-response/


 

The Systematic Comparison of Agent-Based Policy Models – It’s time we got our act together!

By Mike Bithell and Bruce Edmonds

Model Intercomparison

The recent Covid crisis has led to a surge of new model development and a renewed interest in the use of models as policy tools. While this is in some senses welcome, the sudden appearance of many new models presents a problem in terms of their assessment, the appropriateness of their application and reconciling any differences in outcome. Even if they appear similar, their underlying assumptions may differ, their initial data might not be the same, policy options may be applied in different ways, stochastic effects explored to a varying extent, and model outputs presented in any number of different forms. As a result, it can be unclear what aspects of variations in output between models are results of mechanistic, parameter or data differences. Any comparison between models is made tricky by differences in experimental design and selection of output measures.

If we wish to do better, we suggest that a more formal approach to making comparisons between models would be helpful. However, it appears that this is not commonly undertaken most fields in a systematic and persistent way, except for the field of climate change, and closely related fields such as pollution transport or economic impact modelling (although efforts are underway to extend such systematic comparison to ecosystem models –  Wei et al., 2014, Tittensor et al., 2018⁠). Examining the way in which this is done for climate models may therefore prove instructive.

Model Intercomparison Projects (MIP) in the Climate Community

Formal intercomparison of atmospheric models goes back at least to 1989 (Gates et al., 1999)⁠ with the first atmospheric model inter-comparison project (AMIP), initiated by the World Climate Research Programme. By 1999 this had contributions from all significant atmospheric modelling groups, providing standardised time-series of over 30 model variables for one particular historical decade of simulation, with a standard experimental setup. Comparisons of model mean values with available data helped to reveal overall model strengths and weaknesses: no single model was best at simulation of all aspects of the atmosphere, with accuracy varying greatly between simulations. The model outputs also formed a reference base for further inter-comparison experiments including targets for model improvement and reduction of systematic errors, as well as a starting point for improved experimental design, software and data management standards and protocols for communication and model intercomparison. This led to AMIPII and, subsequently, to a series of Climate model inter-comparison projects (CMIP) beginning with CMIP I in 1996. The latest iteration (CMIP 6) is a collection of 23 separate model intercomparison experiments covering atmosphere, ocean, land surface, geo-engineering, and the paleoclimate. This collection is aimed at the upcoming 2021 IPCC process (AR6). Participating projects go through an endorsement process for inclusion, (a process agreed with modelling groups), based on 10 criteria designed to ensure some degree of coherence between the various models – a further 18 MIPS are also listed as currently active (https://www.wcrp-climate.org/wgcm-cmip/wgcm-cmip6). Groups contribute to a central set of common experiments covering the period 1850 to the near-present. An overview of the whole process can be found in (Eyring et al., 2016).

The current structure includes a set of three overarching questions covering the dynamics of the earth system, model systematic biases and understanding possible future change under uncertainty. Individual MIPS may build on this to address one or more of a set of 7 “grand science challenges” associated with the climate. Modelling groups agree to provide outputs in a standard form, obtained from a specified set of experiments under the same design, and to provide standardised documentation to go with their models. Originally (up to CMIP 5), outputs were then added to a central public repository for further analysis, however the output grew so large under CMIP6 that now the data is held dispersed over repositories maintained by separate groups.

Other Examples

Two further more recent examples of collective model  development may also be helpful to consider.

Firstly, an informal network collating models across more than 50 research groups has already been generated as a result of the COVID crisis –  the Covid Forecast Hub (https://covid19forecasthub.org). This is run by a small number of research groups collaborating with the US Centre for Disease Control and is strongly focussed on the epidemiology. Participants are encouraged to submit weekly forecasts, and these are integrated into a data repository and can be vizualized on the website – viewers can look at forward projections, along with associated confidence intervals and model evaluation scores, including those for an ensemble of all models. The focus on forecasts in this case arises out of the strong policy drivers for the current crisis, but the main point is that it is possible to immediately view measures of model performance and to compare the different model types: one clear message that rapidly becomes apparent is that many of the forward projections have 95% (and at some times, even 50%) confidence intervals for incident deaths that more than span the full range of the past historic data. The benefit of comparing many different models in this case is apparent, as many of the historic single-model projections diverge strongly from the data (and the models most in error are not consistently the same ones over time), although the ensemble mean tends to be better.

As a second example, one could consider the Psychological Science Accelerator (PSA: Moshontz et al 2018, https://psysciacc.org/). This is a collaborative network set up with the aim of addressing the “replication crisis” in psychology: many previously published results in psychology have proved problematic to replicate as a result of small or non-representative sampling or use of experimental designs that do not generalize well or have not been used consistently either within or across studies. The PSA seeks to ensure accumulation of reliable and generalizable evidence in psychological science, based on principles of inclusion, decentralization, openness, transparency and rigour. The existence of this network has, for example, enabled the reinvestigation of previous  experiments but with much larger and less nationally biased samples (e.g. Jones et al 2021).

The Benefits of the Intercomparison Exercises and Collaborative Model Building

More specifically, long-term intercomparison projects help to do the following.

  • Build on past effort. Rather than modellers re-inventing the wheel (or building a new framework) with each new model project, libraries of well-tested and documented models, with data archives, including code and experimental design, would allow researchers to more efficiently work on new problems, building on previous coding effort
  • Aid replication. Focussed long term intercomparison projects centred on model results with consistent standardised data formats would allow new versions of code to be quickly tested against historical archives to check whether expected results could be recovered and where differences might arise, particularly if different modelling languages were being used
  • Help to formalize. While informal code archives can help to illustrate the methods or theoretical foundations of a model, intercomparison projects help to understand which kinds of formal model might be good for particular applications, and which can be expected to produce helpful results for given desired output measures
  • Build credibility. A continuously updated set of model implementations and assessment of their areas of competence and lack thereof (as compared with available datasets) would help to demonstrate the usefulness (or otherwise) of ABM as a way to represent social systems
  • Influence Policy (where appropriate). Formal international policy organisations such as the IPCC or the more recently formed IPBES are effective partly through an underpinning of well tested and consistently updated models. As yet it is difficult to see whether such a body would be appropriate or effective for social systems, as we lack the background of demonstrable accumulated and well tested model results.

Lessons for ABM?

What might we be able to learn from the above, if we attempted to use a similar process to compare ABM policy models?

In the first place, the projects started small and grew over time: it would not be necessary, for example, to cover all possible ABM applications at the outset. On the other hand, the latest CMIP iterations include a wide range of different types of model covering many different aspects of the earth system, so that the breadth of possible model types need not be seen as a barrier.

Secondly, the climate inter-comparison project has been persistent for some 30 years – over this time many models have come and gone, but the history of inter-comparisons allows for an overview of how well these models have performed over time – data from the original AMIP I models is still available on request, supporting assessments concerning  long-term model improvement.

Thirdly, although climate models are complex – implementing a variety of different mechanisms in different ways – they can still be compared by use of standardised outputs, and at least some (although not necessarily all) have been capable of direct comparison with empirical data.

Finally, an agreed experimental design and public archive for documentation and output that is stable over time is needed; this needs to be done via a collective agreement among the modelling groups involved so as to ensure a long-term buy-in from the community as a whole, so that there is a consistent basis for long-term model development, building on past experience.

The need for aligning or reproducing ABMs has long been recognised within the community (Axtell et al. 1996; Edmonds & Hales 2003), but on a one-one basis for verifying the specification of models against their implementation, although (Hales et al. 2003) discusses a range of possibilities. However, this is far from a situation where many different models of basically the same phenomena are systematically compared – this would be a larger scale collaboration lasting over a longer time span.

The community has already established a standardised form of documentation in the ODD protocol. Sharing of model code is also becoming routine, and can be easily achieved through COMSES, Github or similar. The sharing of data in a long-term archive may require more investigation. As a starting project COVID-19 provides an ideal opportunity for setting up such a model inter-comparison project – multiple groups already have running examples, and a shared set of outputs and experiments should be straightforward to agree on. This would potentially form a basis for forward looking experiments designed to assist with possible future pandemic problems, and a basis on which to build further features into the existing disease-focussed modelling, such as the effects of economic, social and psychological issues.

Additional Challenges for ABMs of Social Phenomena

Nobody supposes that modelling social phenomena is going to have the same set of challenges that climate change models face. Some of the differences include:

  • The availability of good data. Social science is bedevilled by a paucity of the right kind of data. Although an increasing amount of relevant data is being produced, there are commercial, ethical and data protection barriers to accessing it and the data rarely concerns the same set of actors or events.
  • The understanding of micro-level behaviour. Whilst the micro-level understanding of our atmosphere is very well established, those of the behaviour of the most important actors (humans) is not. However, it may be that better data might partially substitute for a generic behavioural model of decision-making.
  • Agreement upon the goals of modelling. Although there will always be considerable variation in terms of what is wanted from a model of any particular social phenomena, a common core of agreed objectives will help focus any comparison and give confidence via ensembles of projections. Although the MIPs and Covid Forecast Hub are focussed on prediction, it may be that empirical explanation may be more important in other areas.
  • The available resources. ABM projects tend to be add-ons to larger endeavours and based around short-term grant funding. The funding for big ABM projects is yet to be established, not having the equivalent of weather forecasting to piggy-back on.
  • Persistence of modelling teams/projects. ABM tends to be quite short-term with each project developing a new model for a new project. This has made it hard to keep good modelling teams together.
  • Deep uncertainty. Whilst the set of possible factors and processes involved in a climate change model are well established, which social mechanisms need to be involved in any model of any particular social phenomena is unknown. For this reason, there is deep disagreement about the assumptions to be made in such models, as well as sharp divergence in outcome due to changes brought about by a particular mechanism but not included in a model. Whilst uncertainty in known mechanisms can be quantified, assessing the impact of those due to such deep uncertainty is much harder.
  • The sensitivity of the political context. Even in the case of Climate Change, where the assumptions made are relatively well understood and done on objective bases, the modelling exercise and its outcomes can be politically contested. In other areas, where the representation of people’s behaviour might be key to model outcomes, this will need even more care (Adoha & Edmonds 2017).

However, some of these problems were solved in the case of Climate Change as a result of the CMIP exercises and the reports they ultimately resulted in. Over time the development of the models also allowed for a broadening and updating of modelling goals, starting from a relatively narrow initial set of experiments. Ensuring the persistence of individual modelling teams is easier in the context of an internationally recognised comparison project, because resources may be easier to obtain, and there is a consistent central focus. The modelling projects became longer-term as individual researchers could establish a career doing just climate change modelling and importance of the work increasingly recognised. An ABM modelling comparison project might help solve some of these problems as the importance of its work is established.

Towards an Initial Proposal

The topic chosen for this project should be something where there: (a) is enough public interest to justify the effort, (b) there are a number of models with a similar purpose in mind being developed.  At the current stage, this suggests dynamic models of COVID spread, but there are other possibilities, including: transport models (where people go and who they meet) or criminological models (where and when crimes happen).

Whichever ensemble of models is focussed upon, these models should be compared on a core of standard, with the same:

  • Start and end dates (but not necessarily the same temporal granularity)
  • Covering the same set of regions or cases
  • Using the same population data (though possibly enhanced with extra data and maybe scaled population sizes)
  • With the same initial conditions in terms of the population
  • Outputting a core of agreed measures (but maybe others as well)
  • Checked against their agreement against a core set of cases, with agreed data sets
  • Reported on in a standard format (though with a discussion section for further/other observations)
  • well documented and with code that is open access
  • Run a minimum of times with different random seeds

Any modeller/team that had a suitable model and was willing to adhere to the rules would be welcome to participate (commercial, government or academic) and these teams would collectively decide the rules, development and write any reports on the comparisons. Other interested stakeholder groups could be involved including professional/academic associations, NGOs and government departments but in a consultative role providing wider critique – it is important that the terms and reports from the exercise be independent or any particular interest or authority.

Conclusion

We call upon those who think ABMs have the potential to usefully inform policy decisions to work together, in order that the transparency and rigour of our modelling matches our ambition. Whilst model comparison exercises of the kind described are important for any simulation work, particular care needs to be taken when the outcomes can affect people’s lives.

References

Aodha, L. & Edmonds, B. (2017) Some pitfalls to beware when applying models to issues of policy relevance. In Edmonds, B. & Meyer, R. (eds.) Simulating Social Complexity – a handbook, 2nd edition. Springer, 801-822. (A version is at http://cfpm.org/discussionpapers/236)

Axtell, R., Axelrod, R., Epstein, J. M., & Cohen, M. D. (1996). Aligning simulation models: A case study and results. Computational & Mathematical Organization Theory, 1(2), 123-141. https://link.springer.com/article/10.1007%2FBF01299065

Edmonds, B., & Hales, D. (2003). Replication, replication and replication: Some hard lessons from model alignment. Journal of Artificial Societies and Social Simulation, 6(4), 11. http://jasss.soc.surrey.ac.uk/6/4/11.html

Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stevens, B., Stouffer, R. J., & Taylor, K. E. (2016). Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization. Geoscientific Model Development, 9(5), 1937–1958. https://doi.org/10.5194/gmd-9-1937-2016

Gates, W. L., Boyle, J. S., Covey, C., Dease, C. G., Doutriaux, C. M., Drach, R. S., Fiorino, M., Gleckler, P. J., Hnilo, J. J., Marlais, S. M., Phillips, T. J., Potter, G. L., Santer, B. D., Sperber, K. R., Taylor, K. E., & Williams, D. N. (1999). An Overview of the Results of the Atmospheric Model Intercomparison Project (AMIP I). In Bulletin of the American Meteorological Society (Vol. 80, Issue 1, pp. 29–55). American Meteorological Society. https://doi.org/10.1175/1520-0477(1999)080<0029:AOOTRO>2.0.CO;2

Hales, D., Rouchier, J., & Edmonds, B. (2003). Model-to-model analysis. Journal of Artificial Societies and Social Simulation, 6(4), 5. http://jasss.soc.surrey.ac.uk/6/4/5.html

Jones, B.C., DeBruine, L.M., Flake, J.K. et al. To which world regions does the valence–dominance model of social perception apply?. Nat Hum Behav 5, 159–169 (2021). https://doi.org/10.1038/s41562-020-01007-2

Moshontz, H. + 85 others (2018) The Psychological Science Accelerator: Advancing Psychology Through a Distributed Collaborative Network ,  1(4) 501-515. https://doi.org/10.1177/2515245918797607

Tittensor, D. P., Eddy, T. D., Lotze, H. K., Galbraith, E. D., Cheung, W., Barange, M., Blanchard, J. L., Bopp, L., Bryndum-Buchholz, A., Büchner, M., Bulman, C., Carozza, D. A., Christensen, V., Coll, M., Dunne, J. P., Fernandes, J. A., Fulton, E. A., Hobday, A. J., Huber, V., … Walker, N. D. (2018). A protocol for the intercomparison of marine fishery and ecosystem models: Fish-MIP v1.0. Geoscientific Model Development, 11(4), 1421–1442. https://doi.org/10.5194/gmd-11-1421-2018

Wei, Y., Liu, S., Huntzinger, D. N., Michalak, A. M., Viovy, N., Post, W. M., Schwalm, C. R., Schaefer, K., Jacobson, A. R., Lu, C., Tian, H., Ricciuto, D. M., Cook, R. B., Mao, J., & Shi, X. (2014). The north american carbon program multi-scale synthesis and terrestrial model intercomparison project – Part 2: Environmental driver data. Geoscientific Model Development, 7(6), 2875–2893. https://doi.org/10.5194/gmd-7-2875-2014


Bithell, M. and Edmonds, B. (2020) The Systematic Comparison of Agent-Based Policy Models - It’s time we got our act together!. Review of Artificial Societies and Social Simulation, 11th May 2021. https://rofasss.org/2021/05/11/SystComp/