
Delusional Generality – how models can give a false impression of their applicability even when they lack any empirical foundation

By Bruce Edmonds^1, Dino Carpentras^2, Nick Roxburgh^3, Edmund Chattoe-Brown^4 and Gary Polhill^3

  1. Centre for Policy Modelling, Manchester Metropolitan University
  2. Computational Social Science, ETH Zurich
  3. James Hutton Institute, Aberdeen
  4. University of Leicester

“Hamlet: Do you see yonder cloud that’s almost in shape of a camel?
Polonius: By the mass, and ‘tis like a camel, indeed.
Hamlet: Methinks it is like a weasel.
Polonius: It is backed like a weasel.
Hamlet: Or like a whale?
Polonius: Very like a whale.”

Models and Generality

The essence of a model is that it represents – if it is not a model of something, it is not a model at all (Zeigler 1976, Wartofsky 1979). A random bit of code or set of equations is not a model. The point of a model is that one can use it to infer or understand some aspects of what it represents. However, models can represent a variety of kinds of things in a variety of ways (Edmonds et al. 2019) – a model can represent ideas, correspond to data, or capture aspects of other models, and it can represent each of these in either a vague or a precise manner. To completely understand a model – its construction, properties and working – one needs to understand how it does this mapping. This piece focuses attention on this mapping, rather than on the internal construction of models.

What a model reliably represents may be a single observed situation, but it might satisfactorily represent more than one such situation. The range of situations that the model satisfactorily represents is called the “scope” of the model (what counts as “satisfactory” depends on the purpose for which the model is being used). The more extensive the scope, the more “general” we say the model is. A model that only represents one case has no generality at all and may be more in the nature of a description.

There is a hunger for general accounts of social phenomena (let us call these ‘theories’). However, this hunger is often frustrated by the sheer complexity and ‘messiness’ involved in such phenomena. If every situation we observe is essentially different, then no such theory is possible. However, we hope that this is not the case for the social world and, indeed, informal observation suggests that there is at least some commonality between situations – in other words, that some kind of reliable generalisation about social phenomena might be achievable, however modest (Merton 1968). This piece looks at two kinds of applicability – analogical applicability and empirical applicability – and critiques work that conflates them. Although the expertise of the authors is in the agent-based modelling of social phenomena, and so we restrict our discussion to this, we strongly suspect that our arguments hold for many kinds of modelling across a range of domains.

In the next sections we contrast two uses for models: as analogies (ways of thinking about observed systems) and as representations intended to relate to empirical data in a more precise way. There are, of course, other uses of models, such as exploring theory, which have nothing to do with anything observed.

Models used as analogies

Analogical applicability comes from the flexibility of the human mind in interpreting accounts in terms of different situations. When we encounter a new situation, the account is mapped onto it – the account being used as an analogy for understanding this situation. Such accounts are typically in the form of a narrative, but a model can also be used as an analogy (which is the case we are concerned with here). The flexibility with which this mapping can be constructed means that such an account can be related to a wide range of phenomena. Such analogical mapping can lead to an impression that the account has a wide range of applicability. Analogies are a powerful tool for thinking, since they may give us some insights into otherwise novel situations. There are arguments that analogical thinking is a fundamental aspect of human thought (Hofstadter 1995) and language (Lakoff 2008). We can construct and use analogical mappings so effortlessly that they seem natural to us. The key thing about analogical thinking is that the mapping from the analogy to the situation to which it is applied is re-invented each time – there is no fixed relationship between the analogy and what it might be applied to. We are so good at doing this that we may not be aware of how different the constructed mapping is each time. However, this flexibility comes at a cost: because there is no well-defined relationship with what it applies to, the mapping tends to be more intuitive than precise. An analogy can give insights, but analogical reasoning suggests rather than establishes anything reliably, and one cannot empirically test it (since analogical mappings can be adjusted to avoid falsification). Such “ways of thinking” might be helpful, but equally might be misleading [Note 1].

Just because the content of an analogy might be expressed formally does not change any of this (Edmonds 2018); in fact, formally expressed analogies might give the impression of being empirically applicable, but are often only related to anything observed via ideas – the model relates to some ideas, and the ideas relate to reality (Edmonds 2000). Using models as analogies is a valid use of models, but it is not an empirically reliable one (Edmonds et al. 2019). Arnold (2013) makes a powerful argument that many of the more abstract simulation models are of this variety and simply not relatable to empirically observed cases and data at all – although these give the illusion of wide applicability, that applicability is not empirical. In physics, the ways of thinking about atomic or subatomic entities have changed over time whilst the mathematically expressed, empirically relevant models have not (Hartman 1997). Although Thompson (2022) concentrates on mathematically formulated models, she also distinguishes between well-validated empirical models and those that just encapsulate the expertise/opinion of the modeller. She gives some detailed examples of where the latter kind had disproportionate influence, beyond that of other expertise, just because it was in the form of a model (e.g. on the economic impact of climate change).

An example of an analogical model is described in Axelrod (1984) – a formalised tournament where algorithmically expressed strategies are pitted against each other, playing the iterated prisoner’s dilemma game. It is shown how the ‘tit for tat’ strategy can survive against many other mixes of strategies (static or evolving). In the book, the purpose of the model is to suggest a new way of thinking about the evolution of cooperation. The book claims the idea ‘explains’ many observed phenomena, but this is in an analogical manner – no precise relationship with any observed measurements is described. There is no validation of the model there, nor in the more academic paper that described these results (Axelrod & Hamilton 1981).
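The mechanism at the heart of this tournament can be sketched in a few lines of code. This is our own illustrative reconstruction, not Axelrod’s actual tournament code, though the payoff values are the standard ones he used (T=5, R=3, P=1, S=0):

```python
# Minimal iterated prisoner's dilemma sketch (illustrative only).
# "C" = cooperate, "D" = defect; payoffs are (row player, column player).
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(my_history, their_history):
    # Cooperate first, then copy the opponent's previous move.
    return "C" if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    return "D"

def play(strategy_a, strategy_b, rounds=200):
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

# play(tit_for_tat, tit_for_tat)   -> (600, 600): sustained mutual cooperation
# play(tit_for_tat, always_defect) -> (199, 204): exploited once, then retaliates
```

Note that nothing in such code maps onto any measured quantity in an observed social system – which is precisely why its applicability is analogical rather than empirical.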

Of course, researchers do not usually call their models “analogies” or “analogical” explicitly but tend to use other phrasings that imply a greater importance. An exception is Epstein (2008), where it is explicitly listed as one of the 16 modelling purposes, other than prediction, that he discusses. Here he says such models are “…more than beautiful testaments to the unifying power of models: they are headlights in dark unexplored territory.” (ibid.), thus suggesting their use in thinking about phenomena where we do not already have reliable empirical models. Anything that helps us think about such phenomena could be useful, but that does not mean it is at all reliable. As Herbert Simon said: “Metaphor and analogy can be helpful, or they can be misleading.” (Simon 1962, p. 467).

Another purpose listed in Epstein (2008) is to “Illuminate core dynamics”. After raising the old chestnut that “All models are wrong”, he goes on to justify them on the grounds that “…they capture qualitative behaviors of overarching interest”. This is fine if the models are, in fact, known to be useful as more than vague analogies [Note 2] – that they do, in some sense, approximate observed phenomena – but this is not the case for novel models that have not been empirically tested. The phrase is more insidious than this, because it implies that the dynamics the model illuminates are “core” – some kind of approximation of what is important about the phenomena, allowing future elaborations to refine the representation. This implies a process whereby an initially rough idea is iteratively improved. However, this is premature: without empirical testing we do not know whether what has been abstracted away in the abstract model was essential to the dynamics of the target phenomena – this is just assumed or asserted based on the intuitions of the modeller.

This idea of “core dynamics” leads to some paradoxical situations – where a set of competing models are all deemed to be core. Indeed, the literature has shown how the same phenomenon can be modelled in many contrasting ways. For instance, political polarisation has been modelled using mechanisms of repulsion, bounded confidence, reinforcement, or even just random fluctuations, to name a few (Flache et al. 2017; Banisch & Olbrich 2019; Carpentras et al. 2022). However, it is likely that only a few of these mechanisms contribute substantially to the political polarisation we observe in the real world, so the others are not a real “core dynamic” – but until we have more empirical work we do not know which are core and which are not.
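To make one of these contrasting mechanisms concrete, here is a minimal bounded-confidence sketch in the style of Deffuant-type models. It is an illustrative toy with made-up parameter values, not any of the cited models:

```python
import random

def bounded_confidence(n=100, epsilon=0.2, mu=0.5, steps=20000, seed=1):
    """Bounded-confidence sketch: random pairs of agents interact, but
    only adjust towards each other if their opinions (in [0, 1]) are
    already within epsilon of one another."""
    rng = random.Random(seed)
    opinions = [rng.random() for _ in range(n)]
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        if abs(opinions[i] - opinions[j]) < epsilon:
            shift = mu * (opinions[j] - opinions[i])
            opinions[i] += shift   # move i towards j
            opinions[j] -= shift   # move j towards i
    return opinions

# A narrow epsilon typically leaves separate opinion clusters;
# a wide epsilon tends to produce consensus.
```

The point made in the text applies directly: this mechanism, repulsion, reinforcement and random fluctuation can all generate polarisation-like output, so producing such output tells us little about which mechanism operates in the real world.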

A related problem with analogical models is that, even when relying on parsimony principles [Note 3], it is not possible to decide which model is better. This, combined with the constant production of new models, can make the relevant literature increasingly difficult to navigate as models proliferate without any empirical selection, especially for researchers new to ABM. Furthermore, most analogical models define their object of study in an imprecise manner, so that it is hard to evaluate whether they are even intended to capture elements of any particular observed situation. For example, opinion dynamics models rarely define the type of interaction they represent (e.g. in person vs. online) or even what an opinion is. This has led to cases where even knowledge of facts has been studied as “opinions” (e.g. Chacoma & Zanette 2015).

In summary, analogical models can be a useful tool to start thinking about complex phenomena. However, the danger with them is that they give an impression of progress but result in more confusion than clarity, possibly slowing down scientific progress. Once one has some possible insights, one needs to confront these with empirical data to determine which are worth further investigation.

Models that relate directly to empirical data

An empirical model, in contrast, has a well-defined way of mapping to the phenomena it represents. For example, the variables of the gas laws (volume, temperature and pressure) are measured using standard methods developed over a long period of time; one does not invent a new way of doing this each time the laws are applied. In this case, the ways of measuring these properties have developed alongside the mathematical models of the laws, so that these work reliably under broad (and well-known) conditions and cannot be adjusted at the whim of a modeller. Empirical generality arises when a model applies reliably to many different situations – in the case of the gas laws, to a wide range of materials in gaseous form, to a high degree of accuracy.
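As a minimal illustration of such a fixed mapping, the ideal gas law PV = nRT can be written directly as executable code, with each variable corresponding to a standardised measurement rather than to a mapping reinvented for each case:

```python
# The ideal gas law as a tiny empirical model: every symbol has a
# fixed, standardised operationalisation (pascals, kelvin, cubic
# metres, moles), so its predictions can be checked the same way
# in every application.
R = 8.314  # molar gas constant, J/(mol*K)

def pressure(n_moles, temperature_k, volume_m3):
    return n_moles * R * temperature_k / volume_m3

# One mole at 273.15 K in 0.0224 m^3 gives roughly one atmosphere:
p = pressure(1.0, 273.15, 0.0224)   # ~1.01e5 Pa
```

It is exactly this fixity – anyone measuring the same gas the same way gets inputs and outputs that can falsify the law – that analogical models lack.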

Empirical models can be used for different purposes, including: prediction, explanation and description (Edmonds et al. 2019). Each of these purposes maps the model to empirical data in a different way. With a descriptive model, the mapping is one-way, from empirical data to the model, to justify its different parts. In a predictive model, the initial model setup is determined from known data and the model is then run to obtain its results. These results are then mapped back to what we might expect as a prediction, which can later be compared to empirically measured values to check the model’s validity. An explanatory model supports a complex explanation of some known outcomes in terms of a set of processes, structures and parameter values. When it is shown that the outcomes of such a model sufficiently match those from the observed data, the model represents a complex chain of causation that would result in that data, in terms of the processes, structures and parameter values it comprises. It thus supports an explanation, in terms of the model and its inputs, of what was observed. In each of these three cases the mapping between empirical data and the model happens in a different order, and possibly in a different direction; however, they all depend upon the mapping being well defined.

Cartwright (1983), studying how physics works, distinguished between explanatory and phenomenological laws – the former explain but do not necessarily relate exactly to empirical data, whilst the latter fit the data but do not necessarily explain (like the gas laws, or a line fitted to data using regression). Thus the jobs of theoretical explanation and empirical prediction are done by different models or theories (the explanatory version often being called “theory” and the empirical versions “models”). However, in physics the relationship between the two is itself examined, so that the “bridging laws” between them are well understood, especially in formal terms. In this case, we attribute reliable empirical meaning to the explanatory theories to the extent that the connection to the data is precise, even though it is made via the intermediary of a “phenomenological” model, because both mappings (explanatory↔phenomenological and phenomenological↔empirical data) are precise and well established. The point is that the total mapping from model or theory to empirical data is not subject to interpretation or post-hoc adjustment to improve its fit.

ABMs are often quite complicated and require many parameters or other initialising inputs to be specified before they can be run. If some of these are not empirically determinable (even in principle) then they might be estimated using a process of “calibration”, that is, searching the space of possible initialisations for values for which some measured outcomes of the results match other empirical data. If the model has been separately shown to be empirically reliable, then one could do such a calibration to suggest what these input values might have been. Such a process might establish that the model captures a possible explanation of the fitted outcomes (in terms of the model plus those backward-inferred input values), but this is not a very strong relationship, since many models are very flexible and so could fit a wide range of possible outcomes. The reliability of such a suggested explanation, supported by the model, is only relative to (a) the empirical reliability of any theory or other assumptions the model is built upon, (b) how flexibly the model outcomes can be adjusted to fit the target data and (c) how precise the choice of outcome measures and the fit are. Thus, calibration does not provide strong evidence of the empirical adequacy of an ABM, and any explanation supported by such a procedure is only relative to the ‘wiggle room’ afforded by free parameters and unknown input data, as well as any assumptions used in the making of the model. However, some empirical calibration is better than none and may empirically fix the context in which theoretical exploration occurs – showing that the model is, at least, potentially applicable to the case being considered [Note 4].
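This ‘wiggle room’ can be illustrated with a toy example (the model, its parameter names and the target value are entirely invented for illustration): even an exhaustive grid search typically returns many different parameter combinations that fit the target equally well.

```python
# Toy sketch of calibration-as-search. The "model" stands in for an
# ABM run whose outcome depends (roughly) on the product of two free
# parameters, so many parameter pairs are observationally equivalent.

def toy_model(influence, contact_rate, steps=50):
    x = 0.01                         # initial fraction, e.g. of adopters
    for _ in range(steps):
        x += influence * contact_rate * x * (1 - x)
    return x

def calibrate(target, tolerance=0.05, grid=40):
    """Scan a grid of parameter values, keeping every pair whose
    simulated outcome matches the 'empirical' target."""
    fits = []
    for i in range(grid + 1):
        for j in range(grid + 1):
            a, b = i / grid, j / grid
            if abs(toy_model(a, b) - target) < tolerance:
                fits.append((a, b))
    return fits

fits = calibrate(target=0.6)
# A whole ridge of (influence, contact_rate) pairs reproduces the same
# outcome, so the fit alone cannot say which values are the 'real' ones.
```

This equifinality is precisely why a successful calibration shows only that the model is a *possible* account of the data, not that its fitted parameter values (or mechanisms) are the ones operating in the target system.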

An example of a model that is strongly grounded in empirical data is the “538” model of the US electoral college for presidential elections (Silver 2012). This is not an ABM but more like a micro-simulation. It aggregates the uncertainty from polling data to make probabilistic predictions about the outcomes. The structure of the model comes directly from the rules of the electoral college, the inputs are directly derived from the polling data, and it makes predictions about the results that can be independently checked. It does a very specific but useful job in translating the uncertainty of the polling data into uncertainty about the outcome.
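The logic of such a model can be caricatured in a few lines (the states, vote counts and polling figures below are invented, and the real model is vastly more sophisticated): sample each state’s result from its polling margin and uncertainty, then aggregate electoral votes to turn polling uncertainty into outcome uncertainty.

```python
import random

# Caricature of poll aggregation with a toy 70-vote electoral college.
STATES = {  # electoral votes, candidate A's mean poll margin, std. dev.
    "Alpha": (20, +0.04, 0.03),
    "Beta":  (15, -0.02, 0.04),
    "Gamma": (10, +0.01, 0.05),
    "Delta": (25, -0.05, 0.03),
}
TO_WIN = 36  # majority of the 70 votes in this toy college

def simulate_election(rng):
    votes_a = 0
    for ev, margin, sd in STATES.values():
        if rng.gauss(margin, sd) > 0:   # A carries the state this draw
            votes_a += ev
    return votes_a

def win_probability(n=10000, seed=42):
    rng = random.Random(seed)
    wins = sum(simulate_election(rng) >= TO_WIN for _ in range(n))
    return wins / n
```

Note how every part of this sketch has a fixed empirical mapping: the structure is the (toy) college rules, the inputs are (toy) polling numbers, and the output is a checkable probability – there is nothing left to re-map by analogy.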

Why this matters

If people did not confuse the analogical and empirical cases, there would not be a problem. However, researchers seem to suffer from a variety of “Kuhnian spectacles” (Kuhn 1962) – because they view their target systems through an analogical model, they tend to think that this is how that system actually is, i.e. that the model has not just analogical but also empirical applicability. This is understandable: we use many layers of analogy to navigate our world, and in many everyday cases it is practical to conflate our models with the reality we deal with (when they are very reliable). However, people who claim to be scientists are under an obligation to be more cautious and precise than this, since others might wish to rely upon our theories and models (this is, after all, why they support us in our privileged position). Such caution is not always exercised. There are cases where modellers declare their enterprise a success even after a long period without any empirical backing, making a variety of excuses instead of coming clean about this lack (Arnold 2015).

Another fundamental aspect is that agent-based models can be very interdisciplinary and, because of that, they can also be used by researchers in different fields. However, many fields do not treat models as simple analogies, especially when they provide precise mathematical relationships among variables. This can easily result in confusion, where the analogical applicability of an ABM is interpreted as empirical in another field.

Of course, we may hope that, sometime in the future, our vague or abstract analogical model may be developed into something with proven empirical abilities, but we should not suggest such empirical abilities until they have been established. Furthermore, we should be particularly careful to ensure that non-modellers understand that this possibility is only a hope, and not imply anything otherwise (e.g. imply that the model is likely to have empirical validity). However, we suspect that in many cases this confusion goes beyond optimistic anticipation and that some modellers conflate analogical with empirical applicability, assuming that their model is basically right just because it seems that way to them. This is what we call “delusional generality” – a researcher being under the impression that their model has a wide applicability (or potentially wide applicability) due to the attractiveness of the analogy it presents. In other words, unaware of the unconscious process of re-inventing the mapping to each target system, they imagine (without further justification) that it has some reliable empirical (or potentially empirical) generality at its core [Note 5].

Such confusion can have severe real-world consequences if a model with only analogical validity is assumed to also have some empirical reliability. Thompson (2022) discusses how abstract economic models of the cost of future climate change affected the debate about the need for prevention and mitigation, even though they had no empirical validity. However, agent-based modellers have also made the same mistake, with a slew of completely unvalidated models about COVID affecting public debate about policy (Squazzoni et al. 2020).

Conclusion

All of the above raises the question of how we might achieve reliable models with even a moderate level of empirical generality in the social sciences. This is a tricky question of scientific strategy, which we are not going to answer here [Note 6]. However, we question whether the approach of making “heroic” jumps from phenomena to abstract non-empirical models, on the sole basis of their plausibility to their authors, will be a productive route when the target is complex phenomena, such as socio-cognitive systems (Dignum, Edmonds and Carpentras 2022). Certainly, that route has not yet been empirically demonstrated.

Whatever the best strategy is, there is a lot of theoretical modelling in the field of social simulation that assumes or implies that it is the precursor of empirical applicability, and not a lot of critique about the extent of empirical success achieved. The assumption seems to be that abstract theory is the way to make progress in understanding social phenomena but, as we argue here, this is largely wishful thinking – the hope that such models will turn out to have empirical generality being a delusion. Furthermore, this approach has substantial deleterious effects in terms of encouraging an explosion of analogical models without any process of selection (Edmonds 2010). It seems that the ‘famine’ of theory about social phenomena with any significant level of generality is so severe that many give credence to models they might otherwise reject – constructing their understanding using models built on sand.

Notes

1. There is some debate about the extent to which analogical reasoning works, what kind of insights it results in and under what circumstances (Hofstadter 1995). However, all we need for our purposes is that: (a) it does not reliably produce knowledge, (b) the human mind is exceptionally good at ‘fitting’ analogies to new situations (adjusting the mapping to make it ‘work’ somehow) and (c) due to this ability, analogies can be far more convincing than the analogical reasoning warrants.

2. In pattern-oriented modelling (Grimm et al. 2005), models are related to empirical evidence in a qualitative (pattern-based) manner, for example to some properties of a distribution of numeric outcomes. In this kind of modelling, a precise numerical correspondence is replaced by a set of qualitative correspondences along many different dimensions. Here, the empirical relevance of a model is established on the basis that it would be too hard to simultaneously fit a model to evidence in all these ways, thus ruling out mere fitting as the source of its correspondence with that evidence.

3. So-called “parsimony principles” are a very unreliable means of evaluating competing theories on grounds other than convenience, or that of using limited data to justify the values of parameters (Edmonds 2007).

4. For many models, a vague argument for plausibility is often all that is offered to show that the model is applicable to the cases being discussed. At least calibration demonstrates empirical applicability, rather than simply assuming it.

5. We are applying the principle of charity here, assuming that such conflations are innocent and not deliberate. However, there is increasing pressure from funding agencies to demonstrate ‘real life relevance’, so some of these apparent confusions might be more like ‘spin’ – trying to give an impression of empirical relevance even when this is merely an aspiration, in order to suggest that a model has more significance than has been reliably established.

6. This has been discussed elsewhere, e.g. (Moss & Edmonds 2005).

Acknowledgements

Thanks to all those we have discussed these issues with, including Scott Moss (who was talking about these kinds of issue more than 30 years ago), Eckhart Arnold (who made many useful comments and whose careful examination of the lack of empirical success of some families of model demonstrates our mostly abstract arguments), Sven Banisch and other members of the ESSA special interest group on “Strongly Empirical Modelling”.

References

Arnold, E. (2013). Simulation models of the evolution of cooperation as proofs of logical possibilities. How useful are they? Ethics & Politics, XV(2), pp. 101-138. https://philpapers.org/archive/ARNSMO.pdf

Arnold, E. (2015) How Models Fail – A Critical Look at the History of Computer Simulations of the Evolution of Cooperation. In Misselhorn, C. (Ed.): Collective Agency and Cooperation in Natural and Artificial Systems. Explanation, Implementation and Simulation, Philosophical Studies Series, Springer, pp. 261-279. https://eckhartarnold.de/papers/2015_How_Models_Fail

Axelrod, R. (1984) The Evolution of Cooperation, Basic Books.

Axelrod, R. & Hamilton, W.D. (1981) The evolution of cooperation. Science, 211, 1390-1396. https://www.science.org/doi/abs/10.1126/science.7466396

Banisch, S., & Olbrich, E. (2019). Opinion polarization by learning from social feedback. The Journal of Mathematical Sociology, 43(2), 76-103. https://doi.org/10.1080/0022250X.2018.1517761

Carpentras, D., Maher, P. J., O’Reilly, C., & Quayle, M. (2022). Deriving An Opinion Dynamics Model From Experimental Data. Journal of Artificial Societies and Social Simulation, 25(4). http://doi.org/10.18564/jasss.4947

Cartwright, N. (1983) How the Laws of Physics Lie. Oxford University Press.

Chacoma, A. & Zanette, D. H. (2015). Opinion formation by social influence: From experiments to modelling. PLoS ONE, 10(10), e0140406. https://doi.org/10.1371/journal.pone.0140406

Dignum, F., Edmonds, B. and Carpentras, D. (2022) Socio-Cognitive Systems – A Position Statement. Review of Artificial Societies and Social Simulation, 2nd Apr 2022. https://rofasss.org/2022/04/02/scs

Edmonds, B. (2000). The Use of Models – making MABS actually work. In. S. Moss and P. Davidsson. Multi Agent Based Simulation. Berlin, Springer-Verlag. 1979: 15-32. http://doi.org/10.1007/3-540-44561-7_2

Edmonds, B. (2007) Simplicity is Not Truth-Indicative. In Gershenson, C.et al. (eds.) Philosophy and Complexity. World Scientific, pp. 65-80.

Edmonds, B. (2010) Bootstrapping Knowledge About Social Phenomena Using Simulation Models. Journal of Artificial Societies and Social Simulation, 13(1), 8. http://doi.org/10.18564/jasss.1523

Edmonds, B. (2018) The “formalist fallacy”. Review of Artificial Societies and Social Simulation, 11th June 2018. https://rofasss.org/2018/07/20/be/

Edmonds, B., le Page, C., Bithell, M., Chattoe-Brown, E., Grimm, V., Meyer, R., Montañola-Sales, C., Ormerod, P., Root H. & Squazzoni. F. (2019) Different Modelling Purposes. Journal of Artificial Societies and Social Simulation, 22(3):6. http://doi.org/10.18564/jasss.3993

Epstein, J. M. (2008). Why Model?. Journal of Artificial Societies and Social Simulation, 11(4),12. https://www.jasss.org/11/4/12.html

Flache, A., Mäs, M., Feliciani, T., Chattoe-Brown, E., Deffuant, G., Huet, S. & Lorenz, J. (2017). Models of social influence: Towards the next frontiers. Journal of Artificial Societies and Social Simulation, 20(4), 2. http://doi.org/10.18564/jasss.3521

Grimm, V., Revilla, E., Berger, U., Jeltsch, F., Mooij, W.M., Railsback, S.F., et al. (2005). Pattern-oriented modeling of agent-based complex systems: lessons from ecology. Science, 310 (5750), 987–991. https://www.jstor.org/stable/3842807

Hartman, S. (1997) Modelling and the Aims of Science. 20th International Wittgenstein Symposium, Kirchberg am Wechsel.

Hofstadter, D. (1995) Fluid Concepts and Creative Analogies. Basic Books.

Kuhn, T. S. (1962). The structure of scientific revolutions. University of Chicago Press.

Lakoff, G. (2008). Women, fire, and dangerous things: What categories reveal about the mind. University of Chicago Press.

Merton, R.K. (1968). On the Sociological Theories of the Middle Range. In Classical Sociological Theory, Calhoun, C., Gerteis, J., Moody, J., Pfaff, S. and Virk, I. (Eds), Blackwell, pp. 449–459.

Meyer, R. & Edmonds, B. (2023). The Importance of Dynamic Networks Within a Model of Politics. In: Squazzoni, F. (eds) Advances in Social Simulation. ESSA 2022. Springer Proceedings in Complexity. Springer. (Earlier, open access, version at: https://cfpm.org/discussionpapers/292)

Moss, S. and Edmonds, B. (2005). Towards Good Social Science. Journal of Artificial Societies and Social Simulation, 8(4), 13. https://www.jasss.org/8/4/13.html

Squazzoni, F. et al. (2020) Computational Models That Matter During a Global Pandemic Outbreak: A Call to Action. Journal of Artificial Societies and Social Simulation, 23(2), 10. http://doi.org/10.18564/jasss.4298

Silver, N. (2012) The Signal and the Noise: Why So Many Predictions Fail – But Some Don’t. Penguin.

Simon, H. A. (1962). The architecture of complexity. Proceedings of the American Philosophical Society, 106(6), 467-482. https://www.jstor.org/stable/985254

Thompson, E. (2022). Escape from Model Land: How mathematical models can lead us astray and what we can do about it. Basic Books.

Wartofsky, M. W. (1979). The model muddle: Proposals for an immodest realism. In Models (pp. 1-11). Springer, Dordrecht.

Zeigler, B. P. (1976). Theory of Modeling and Simulation. Wiley Interscience, New York.


Edmonds, B., Carpentras, D., Roxburgh, N., Chattoe-Brown, E. and Polhill, G. (2024) Delusional Generality – how models can give a false impression of their applicability even when they lack any empirical foundation. Review of Artificial Societies and Social Simulation, 7 May 2024. https://rofasss.org/2024/05/06/delusional-generality


© The authors under the Creative Commons’ Attribution-NoDerivs (CC BY-ND) Licence (v4.0)

Discussions on Qualitative & Quantitative Data in the Context of Agent-Based Social Simulation

By Peer-Olaf Siebers, in collaboration with Kwabena Amponsah, James Hey, Edmund Chattoe-Brown and Melania Borit

Motivation

1.1: Some time ago, I had several discussions with my PhD students Kwabena Amponsah and James Hey (we are all computer scientists, with a research interest in multi-agent systems) on the topic of qualitative vs. quantitative data in the context of Agent-Based Social Simulation (ABSS). Our original goal was to better understand the role of qualitative vs. quantitative data in the life cycle of an ABSS study. But as you will see later, we conquered more ground during our discussions.

1.2: The trigger for these discussions came from numerous earlier discussions within the RAT task force (Sebastian Achter, Melania Borit, Edmund Chattoe-Brown, and Peer-Olaf Siebers) on the topic, while we were developing the Rigour and Transparency – Reporting Standard (RAT-RS). The RAT-RS is a tool to improve the documentation of data use in Agent-Based Modelling (Achter et al. 2022). During our RAT-RS discussions we observed that the terms “qualitative data” and “quantitative data” could be interpreted in different ways in different phases of the ABM simulation study life cycle, and we found it difficult to clearly state the definition/role of these different types of data in the different contexts that the individual phases within the life cycle represent. This was aggravated by the competing understandings of the terminology within the different domains (from the social and natural sciences) that form the field of social simulation.

1.3: As the ABSS community is a multi-disciplinary one, often doing interdisciplinary research, we thought that we should share the outcome of our discussions with the community. To demonstrate the different views that exist within the topic area, we asked some of our friends from the social simulation community to comment on our philosophical discussions. We were lucky enough to get our RAT-RS colleagues Edmund Chattoe-Brown and Melania Borit on board, who provided critical feedback and their own view of things. In the following we provide a summary of the overall discussion. Each of the following paragraphs contains a summary of the initial discussion outcomes, representing the computer scientists’ views, followed by some thoughts provided by our two friends from the social simulation community (Borit’s in {} brackets and italic and Chattoe-Brown’s in [] brackets and bold), both commenting on the initial discussion outcomes of the computer scientists. To show the diverse backgrounds of all the contributors, and perhaps to better explain their ways of thinking and their arguments, I have added short biographies of all contributors at the end of this Research Note. To support further (public) discussions I have numbered the individual paragraphs to make it easier to refer back to them.

Terminology

2.1: As a starting point for our discussions I searched the internet for some terminology related to the topic of “data”. Following is a list of initial definitions of relevant terms [1]. First, the terms qualitative data and quantitative data, as defined by the Australian Bureau of Statistics: “Qualitative data are measures of ‘types’ and may be represented by a name, symbol, or a number code. They are data about categorical variables (e.g. what type). Quantitative data are measures of values or counts and are expressed as numbers. They are data about numeric variables (e.g. how many; how much; or how often).” (Australian Bureau of Statistics 2022) [Maybe don’t let a statistics unit define qualitative research? This has a topic that is very alien to us but argues “properly” about the role of different methods (Helitzer-Allen and Kendall 1992). “Proper” qualitative researchers would fiercely dispute this. It is “quantitative imperialism”.].

2.2: What might also help for this discussion is to better understand the terms qualitative data analysis and quantitative data analysis. Qualitative data analysis refers to “the processes and procedures that are used to analyse the data and provide some level of explanation, understanding, or interpretation” (Skinner et al 2021). [This is a much less contentious claim for qualitative data – and makes the discussion of the Australian Bureau of Statistics look like not just a distraction but a really “low grade” source in peer review terms. A very good one is Strauss (1987).] These methods include content analysis, narrative analysis, discourse analysis, framework analysis, and grounded theory and the goal is to identify common patterns. {These data analysis methods connect to different types of qualitative research: phenomenology, ethnography, narrative inquiry, case study research, or grounded theory. The goal of such research is not always to identify patterns – see (Miles and Huberman 1994): e.g., making metaphors, seeing plausibility, making contrasts/comparisons.} [In my opinion some of these alleged methods are just empire building or hot air. Do you actually need them for your argument?] These types of analysis must therefore use qualitative inputs, broadening the definition to include raw text, discourse and conceptual frameworks.

2.3: When it comes to quantitative data analysis “you are expected to turn raw numbers into meaningful data through the application of rational and critical thinking. Quantitative data analysis may include the calculation of frequencies of variables and differences between variables.” (Business Research Methodology 2022) {One does the same in qualitative data analysis – turns raw words (or pictures etc.) into meaningful data through the application of interpretation based on rational and critical thinking. In quantitative data analysis you usually apply mathematical/statistical models to analyse the data.}. While the output of quantitative data analysis can be used directly as input to a simulation model, the output of qualitative data analysis still needs to be translated into behavioural rules to be useful (either manually or through machine learning algorithms). {What is meant by “translated” in this specific context? Do we need this kind of translation only for qualitative data or also for quantitative data? Is there a difference between translation methods of qualitative and quantitative data?} [That seems pretty contentious too. It is what is often done, true, but I don’t think it is a logical requirement. I guess you could train a neural net using “cases” or design some other simple “cognitive architecture” from the data. Would this (Becker 1953), for example, best be modelled as “rules” or as some kind of adaptive process? But of course you have to be careful that “rule” is not defined so broadly that everything is one or it is true by definition. I wonder what the “rules” are in this: Chattoe-Brown (2009).]

2.4: Finally, let’s have a quick look at the difference between “data” and “evidence”. For this, we found the following distinction by Wilkinson (2022) helpful: “… whilst data can exist on its own, even though it is essentially meaningless without context, evidence, on the other hand, has to be evidence of or for something. Evidence only exists when there is an opinion, a viewpoint or an argument.”

Hypothesis

3.1: The RAT-RS divides the simulation life cycle into five phases, in terms of data use: model aim and context, conceptualisation, operationalisation, experimentation, and evaluation (Siebers et al 2019). We started our discussion by considering the following hypothesis: The outcome of qualitative data analysis is only useful for the purpose of conceptualisation and as a basis for producing quantitative data. It does not have any other roles within the ABM simulation study life cycle. {Maybe this hypothesis in itself has to be discussed. Is it so that you use only numbers in the operationalisation phase? One can write NetLogo code directly from qualitative data, without numbers.} [Is this inevitable given the way ABM works? Agents have agency and therefore can decide to do things (and we can only access this by talking to them, probably “open ended”). A statistical pattern – time series or correlation – has no agency and therefore cannot be accessed “qualitatively” – though we also sometimes mean by “qualitative” eyeballing two time series rather than using some formal measure of tracking. I guess that use would >>not<< be relevant here.]

Discussion

4.1: One could argue that qualitative data analysis provides causes for behaviour (and indications about their importance (ranking); perhaps also the likelihood of occurrence) as well as key themes that are important to be considered in a model. All sounds very useful for the conceptual modelling phase. The difficulty might be to model the impact (how do we know we model it correctly and at the right level), if that is not easily translatable into a quantitative value but requires some more (behavioural) mechanistic structures to represent the impact of behaviours. [And, of course, there is a debate in psychology (with some evidence on both sides) about the extent to which people are able to give subjective accounts we can trust (see Hastorf and Cantril 1954).] This might also raise issues when it comes to calibration – how does one calibrate qualitative data? {Triangulation.} One random idea we had was that perhaps fuzzy logic could help with this. More brainstorming and internet research is required to confirm that this idea is feasible and useful. [A more challenging example might be ethnographic observation of a “neighbourhood” in understanding crime. This is not about accessing the cognitive content of agents but may well still contribute to a well specified model. It is interesting how many famous models – Schelling, Zaller-Deffuant – actually have no “real” environment.]

4.2: One could also argue that what starts (or what we refer to initially) as qualitative data always ends up as quantitative data, as whatever comes out of the computer are numbers. {This is not necessarily true. Check the work on qualitative outputs using Grounded Theory by Neumann (2015).} Of course this is a question related to the conceptual viewpoint. [Not convinced. It sounds like all sociology is actually physics because all people are really atoms. Formally, everything in computers is numbers because it has to be but that isn’t the same as saying that data structures or whatever don’t constitute a usable and coherent level of description: We “meet” and “his” opinion changes “mine” and vice versa. Somewhere, that is all binary but you can read the higher level code that you can understand as “social influence” (whatever you may think of the assumptions). Be clear whether this (like the “rules” claim) is a matter of definition – in which case it may not be useful (even if people are atoms we have no idea of how to solve the “atomic physics” behind the Prisoner’s Dilemma) or an empirical one (in which case some models may just prove it false). This (Beltratti et al 1996) contains no “rules” and no “numbers” (except in the trivial sense that all programming does).]

4.3: Also, an algorithm is expressed in code and can only be processed numerically, so it can only deliver quantitative data as output. These can perhaps be translated into qualitative concepts later. A way of doing this via the use of grounded theory is proposed in Neumann and Lotzmann (2016). {This refers to the same idea as my previous comment.} [Maybe it is “safest” to discuss this with rules because everyone knows those are used in ABM. Would it make sense to describe the outcome of a non trivial set of rules – accessed for example like this: Gladwin (1989) – as either “quantitative” or “numbers?”]

4.4: But is it true that data delivered as output is always quantitative? Let’s consider, for example, a consumer marketing scenario, where we define stereotypes (shopping enthusiast; solution demander; service seeker; disinterested shopper; internet shopper) that can change over time during a simulation run (Siebers et al 2010). These stereotypes are defined by likelihoods (likelihood to buy, wait, ask for help, and ask for refund). So, during a simulation run an agent could change its stereotype (e.g. from shopping enthusiast to disinterested shopper), influenced by the opinion of others and their own previous experience. So, at the beginning of the simulation run the agent can have a different stereotype compared to the end. Of course we could enumerate the five different stereotypes, and claim that the outcome is numeric, but the meaning of the outcome would be something qualitative – the stereotype related to that number. To me this would be a qualitative outcome, while the number of people that change from one stereotype to another would be a quantitative outcome. They would come in tandem. {So, maybe the problem is that we don’t yet have the right ways of expressing or visualising qualitative output?} [This is an interesting and grounded example but could it be easily knocked down because everything is “hard coded” and therefore quantifiable? You may go from one shopper type to another – and what happens depends on other assumptions about social influence and so on – but you can’t “invent” your own type. Compare something like El Farol (Arthur 1994) where agents arguably really can “invent” unique strategies (though I grant these are limited to being expressed in a specified “grammar”).]
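As a toy sketch (emphatically not the Siebers et al 2010 model itself: the stereotypes kept, the likelihood values and the switching rule below are all invented for illustration), the tandem of the two kinds of outcome can be made concrete in code. A run yields final stereotype labels (a qualitative outcome) alongside a count of stereotype switches (a quantitative outcome):

```python
# Toy sketch only: invented parameters and an invented switching rule.
import random

STEREOTYPES = {
    # Hypothetical likelihood-to-buy per stereotype (not the paper's values).
    "shopping enthusiast": 0.9,
    "disinterested shopper": 0.1,
}

random.seed(1)
agents = ["shopping enthusiast"] * 5 + ["disinterested shopper"] * 5
switches = 0
for step in range(10):
    for i, stereotype in enumerate(agents):
        # Crude stand-in for "influenced by the opinion of others":
        # occasionally adopt a randomly chosen peer's stereotype.
        peer = random.choice(agents)
        if peer != stereotype and random.random() < 0.2:
            agents[i] = peer
            switches += 1

print(agents)    # qualitative outcome: the final label of each agent
print(switches)  # quantitative outcome: how many stereotype changes occurred
```

The point of the sketch is that the two outputs come in tandem: the labels only mean something in their qualitative context, while the switch count is straightforwardly numeric.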

4.5: In order to define someone’s stereotype we would use numerical values (likelihood = proportion). However, stereotypes refer to nominal data (which refers to data that is used for naming or labelling variables, without any quantitative value). The stereotype itself would be nominal, while the way one would derive the stereotype would be numerical. Figure 1 illustrates a case in which the agent moves from the disinterested stereotype to the enthusiast stereotype. [Is there a potential confusion here between how you tell an agent is a type – parameters in the code just say so – and how you say a real person is a type? Everything you say about the code still sounds “quantitative” because all the “ingredients” are.]

Figure 1: Enthusiastic and Disinterested agent stereotypes

4.6: Let’s consider a second example, related to the same scenario: The dynamics over time to get from an enthusiastic shopper (perhaps via phases) to a disinterested shopper. This is represented as a graph where the x-axis represents time and the y-axis stereotypes (categorical data). If you want to take a quantitative perspective on the outcome you would look at a specific point in time (state of the system) but to take a qualitative perspective of the outcome, you would look at the pattern that the curve represents over the entire simulation runtime. [Although does this shade into the “eyeballing” sense of qualitative rather than the “built from subjective accounts” sense? Another way to think of this issue is to imagine “experts” as a source of data. We might build an ABM based on an expert perception of say, how a crime gang operates. That would be qualitative but not just individual rules: For example, if someone challenges the boss to a fight and loses they die or leave. This means the boss often has no competent potential successors.]

4.7: So, the inputs (parameters, attributes) to get the outcome are numeric, but the outcome itself in the latter case is not. The outcome only makes sense once it’s put into the qualitative context. And then we could say that the simulation produces some qualitative outputs. So, does the fact that data needs to be seen in a context make it evidence, i.e. do we only have quantitative and qualitative evidence on the output side? [Still worried that you may not be happy equating qualitative interview data with qualitative eyeballing of graphs. Mixes up data collection and analysis? And unlike qualitative interviews you don’t have to eyeball time series. But the argument of qualitative research is you can’t find out some things any other way because, to run a survey say, or an experiment, you already have to have a pretty good grasp of the phenomenon.]

4.8: Consider the statement that running a marketing campaign will increase the number of enthusiastic shoppers. This can be seen as qualitative data, as it is descriptive of how the system works rather than providing specific values describing the performance of a system. You could equally express it in algebraic terms, which would make it quantitative data. So, it might be useful to categorise quantitative data to make the outcome easier to understand. [I don’t think this argument is definitely wrong – though I think it may be ambiguous about what “qualitative” means – but I think it really needs stripping down and tightening. I’m not completely convinced as a new reader that I’m getting at the nub of the argument. Maybe just one example in detail and not two in passing?]

Outcome

5.1: How we understand things and how the computer processes things are two different things. So, in fact qualitative data is useful for the conceptualisation and for describing experimentation and evaluation output, and needs to be translated into numerical data or algebraic constructs for the operationalisation. Therefore, we can reject our initial hypothesis, as we found more places where qualitative data can be useful. [Yes, and that might form the basis for a “general” definition of qualitative that was not tied to one part of the research process but you would have to be clear that’s what you were aiming at and not just accidentally blurring two different “senses” of qualitative.]

5.2: At the end of the discussion we picked up the idea of using Fuzzy Logic. Could perhaps fuzzy logic be used to describe qualitative output, as it describes a degree of membership to different categories? An interesting paper to look at in this context would be Sugeno and Yasukawa (1993). Another random idea that was mentioned was whether there is potential in using “fuzzy logic in reverse”, i.e. taking something that is fuzzy, making it crisp for the simulation, and making it fuzzy again for presenting the result. However, we decided to save this topic for another discussion. [Devil will be in the detail. Depends on exactly what assumptions the method makes. Devil’s advocate: What if qualitative research is only needed for specification – not calibration or validation – but it doesn’t follow from this that that use is “really” quantitative? How intellectually unappealing is that situation and why?]
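To make the fuzzy-logic idea slightly more concrete, here is a minimal sketch (our own illustration, not part of the original discussion; the categories reuse the shopper stereotypes from above, but the membership functions and boundaries are invented) of expressing a crisp numeric output as degrees of membership in qualitative categories:

```python
# Toy fuzzification sketch: category boundaries are invented for illustration.
def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzify(likelihood_to_buy):
    """Map a crisp simulation output to degrees of membership in stereotypes."""
    return {
        "disinterested shopper": triangular(likelihood_to_buy, -0.5, 0.0, 0.5),
        "solution demander": triangular(likelihood_to_buy, 0.0, 0.5, 1.0),
        "shopping enthusiast": triangular(likelihood_to_buy, 0.5, 1.0, 1.5),
    }

memberships = fuzzify(0.8)
print(memberships)
```

Here fuzzify(0.8) gives partial membership in both “solution demander” and “shopping enthusiast” rather than a hard label; “fuzzy logic in reverse” would defuzzify such degrees into a crisp value for the simulation and re-fuzzify the results for presentation.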

Conclusion

6.1: The purpose of this Research Note is really to stimulate you to think about, talk about, and share your ideas and opinions on the topic! What we present here is a philosophical impromptu discussion of our individual understanding of the topic, rather than a scientific debate that is backed up by literature. We still thought it worthwhile to share this with you, as you might stumble across similar questions. Also, we don’t think we have found the perfect answers to the questions yet. So we would like to invite you to join the discussion and leave some comments in the chat, stating your point of view on this topic. [Is there a danger in discussing these data types “philosophically”? I don’t know if it is realistic to use examples directly from social simulation but for sure examples can be used from social science generally. So here is a “quantitative” argument from quantitative data: “The view that cultural capital is transmitted from parents to their children is strongly supported in the case of pupils’ cultural activities. This component of pupils’ cultural capital varies by social class, but this variation is entirely mediated by parental cultural capital.” (Sullivan 2001). As well as the obvious “numbers” (social class by a generally agreed scheme) there is also a constructed “measure” of cultural capital based on questions like “how many books do you read in a week?” Here is an example of qualitative data from which you might reason: “I might not get into Westbury cos it’s siblings and how far away you live and I haven’t got any siblings there and I live a little way out so I might have to go on a waiting list … I might go to Sutton Boys’ instead cos all my mates are going there.” (excerpt from Reay 2002).
As long as this was not just a unique response (but was supported by several other interviews) one would add to one’s “theory” of school choice: 1) Awareness of the impact of the selection system (there is no point in applying here whatever I may want) and 2) The role of networks in choice: This might be the best school for me educationally but I won’t go because I will be lonely.]

Biographies of the authors

Peer-Olaf Siebers is an Assistant Professor at the School of Computer Science, University of Nottingham, UK. His main research interest is the application of Computer Simulation and Artificial Intelligence to study human-centric and coupled human-natural systems. He is a strong advocate of Object-Oriented Agent-Based Social Simulation and is advancing its methodological foundations. It is a novel and highly interdisciplinary research field, involving disciplines like Social Science, Economics, Psychology, Geography, Operations Research, and Computer Science.

Kwabena Amponsah is a Research Software Engineer working for the Digital Research Service, University of Nottingham, UK. He completed his PhD in Computer Science at Nottingham in 2019 by developing a framework for evaluating the impact of communication on performance in large-scale distributed urban simulations.

James Hey is a PhD student at the School of Computer Science, University of Nottingham, UK. In his PhD he investigates the topic of surrogate optimisation for resource intensive agent based simulation of domestic energy retrofit uptake with environmentally conscious agents. James holds a Bachelor degree in Economics as well as a Master degree in Computer Science.

Edmund Chattoe-Brown is a lecturer in Sociology, School of Media, Communication and Sociology, University of Leicester, UK. His career has been interdisciplinary (including Politics, Philosophy, Economics, Artificial Intelligence, Medicine, Law and Anthropology), focusing on the value of research methods (particularly Agent-Based Modelling) in generating warranted social knowledge. His aim has been to make models both more usable generally and particularly more empirical (because the most rigorous social scientists tend to be empirical). The results of his interests have been published in 17 different peer reviewed journals across the sciences to date. He was funded by the project “Towards Realistic Computational Models Of Social Influence Dynamics” (ES/S015159/1) by the ESRC via ORA Round 5.

Melania Borit is an interdisciplinary researcher and the leader of the CRAFT Lab – Knowledge Integration and Blue Futures at UiT The Arctic University of Norway. She has a passion for knowledge integration and a wide range of interconnected research interests: social simulation, agent-based modelling; research methodology; Artificial Intelligence ethics; pedagogy and didactics in higher education, games and game-based learning; culture and fisheries management, seafood traceability; critical futures studies.

References

Achter S, Borit M, Chattoe-Brown E, and Siebers PO (2022) RAT-RS: a reporting standard for improving the documentation of data use in agent-based modelling. International Journal of Social Research Methodology, DOI: 10.1080/13645579.2022.2049511

Australian Bureau of Statistics (2022) Statistical Language > Qualitative and Quantitative data. https://www.abs.gov.au/websitedbs/D3310114.nsf/Home/Statistical+Language (last accessed 05/05/2022)

Arthur WB (1994) Inductive reasoning and bounded rationality. The American Economic Review, 84(2), pp.406-411. https://www.jstor.org/stable/pdf/2117868.pdf

Becker HS (1953) Becoming a marihuana user. American Journal of Sociology, 59(3), pp.235-242. https://www.degruyter.com/document/doi/10.7208/9780226339849/pdf

Beltratti A, Margarita S, and Terna P (1996) Neural Networks for Economic and Financial Modelling. International Thomson Computer Press.

Business Research Methodology (2022) Quantitative Data Analysis. https://research-methodology.net/research-methods/data-analysis/quantitative-data-analysis/ (last accessed 05/05/2022)

Chattoe-Brown E (2009) The social transmission of choice: a simulation with applications to hegemonic discourse. Mind & Society, 8(2), pp.193-207. DOI: 10.1007/s11299-009-0060-7

Gladwin CH (1989) Ethnographic Decision Tree Modeling. SAGE Publications.

Hastorf AH and Cantril H (1954) They saw a game; a case study. The Journal of Abnormal and Social Psychology, 49(1), pp.129–134.

Helitzer-Allen DL and Kendall C (1992) Explaining differences between qualitative and quantitative data: a study of chemoprophylaxis during pregnancy. Health Education Quarterly, 19(1), pp.41-54. DOI: 10.1177%2F109019819201900104

Miles MB and Huberman AM (1994) Qualitative Data Analysis: An Expanded Sourcebook. Sage.

Neumann M (2015) Grounded simulation. Journal of Artificial Societies and Social Simulation, 18(1)9. DOI: 10.18564/jasss.2560

Neumann M and Lotzmann U (2016) Simulation and interpretation: a research note on utilizing qualitative research in agent based simulation. International Journal of Swarm Intelligence and Evolutionary Computing, 5(1).

Reay D (2002) Shaun’s Story: Troubling discourses of white working-class masculinities. Gender and Education, 14(3), pp.221-234. DOI: 10.1080/0954025022000010695

Siebers PO, Achter S, Palaretti Bernardo C, Borit M, and Chattoe-Brown E (2019) First steps towards RAT: a protocol for documenting data use in the agent-based modeling process (Extended Abstract). Social Simulation Conference 2019 (SSC 2019), 23-27 Sep, Mainz, Germany.

Siebers PO, Aickelin U, Celia H and Clegg C (2010) Simulating customer experience and word-of-mouth in retail: a case study. Simulation: Transactions of the Society for Modeling and Simulation International, 86(1) pp. 5-30. DOI: 10.1177%2F0037549708101575

Skinner J, Edwards A and Smith AC (2021) Qualitative Research in Sport Management – 2e, p171. Routledge.

Strauss AL (1987) Qualitative Analysis for Social Scientists. Cambridge University Press.

Sugeno M and Yasukawa T (1993) A fuzzy-logic-based approach to qualitative modeling. IEEE Transactions on Fuzzy Systems, 1(1), pp.7-31.

Sullivan A (2001) Cultural capital and educational attainment. Sociology 35(4), pp.893-912. DOI: 10.1017/S0038038501008938

Wilkinson D (2022) What’s the difference between data and evidence? Evidence-based practice. https://oxford-review.com/data-v-evidence/ (last accessed 05/05/2022)


Notes

[1] An updated set of the terminology, defined by the RAT task force in 2022, is available as part of the RAT-RS in Achter et al (2022) Appendix A1.


Peer-Olaf Siebers, Kwabena Amponsah, James Hey, Edmund Chattoe-Brown and Melania Borit (2022) Discussions on Qualitative & Quantitative Data in the Context of Agent-Based Social Simulation. Review of Artificial Societies and Social Simulation, 16th May 2022. https://rofasss.org/2022/05/16/Q&Q-data-in-ABM


© The authors under the Creative Commons’ Attribution-NoDerivs (CC BY-ND) Licence (v4.0)

If you want to be cited, calibrate your agent-based model: A Reply to Chattoe-Brown

By Marijn A. Keijzer

This is a reply to a previous comment, (Chattoe-Brown 2022).

The social simulation literature has called on its proponents to enhance the quality and realism of their contributions through systematic validation and calibration (Flache et al., 2017). Model validation typically refers to assessments of how well the predictions of their agent-based models (ABMs) map onto empirically observed patterns or relationships. Calibration, on the other hand, is the process of enhancing the realism of the model by parametrizing it based on empirical data (Boero & Squazzoni, 2005). We would expect that presenting a validated or calibrated model serves as a signal of model quality, and would thus be a desirable characteristic of a paper describing an ABM.

In a recent contribution to RofASSS, Edmund Chattoe-Brown provocatively argued that model validation does not bear fruit for researchers interested in boosting their citations. In a sample of articles from JASSS published on opinion dynamics he observed that “the sample clearly divides into non-validated research with more citations and validated research with fewer” (Chattoe-Brown, 2022). Well-aware of the bias and limitations of the sample at hand, Chattoe-Brown calls for refutation of his hypothesis. An analysis of the corpus of articles in Web of Science, presented here, could serve that goal.

To test whether there exists an effect of model calibration and/or validation on the citation counts of papers, I compare citation counts of a larger number of original research articles on agent-based models published in the literature. I extracted 11,807 entries from Web of Science by searching for items that contained the phrases “agent-based model”, “agent-based simulation” or “agent-based computational model” in their abstracts.[1] I then labeled all items that mention “validate” in the abstract as validated ABMs and those that mention “calibrate” as calibrated ABMs. This measure is rather crude, of course, as descriptions containing phrases like “we calibrated our model” or “others should calibrate our model” are both labeled as calibrated models. However, if mentioning that future research should calibrate or validate the model is not related to citation counts (which I would argue it indeed is not), then this inaccuracy does not introduce bias.
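A minimal sketch of this labelling step (the abstracts below are invented, and matching on the stems “validat”/“calibrat” is my illustration of the approach rather than the author’s exact procedure; the actual data and code are at https://osf.io/x9r7j/):

```python
# Sketch of the crude keyword labelling described above; abstracts invented.
abstracts = [
    "We present an agent-based model and validate it against survey data.",
    "An agent-based simulation, calibrated on census data.",
    "A purely theoretical agent-based computational model.",
]

def label(abstract):
    """Label by mere mention of the keyword: 'we calibrated our model' and
    'others should calibrate our model' both count as calibrated."""
    text = abstract.lower()
    return {"validated": "validat" in text, "calibrated": "calibrat" in text}

labels = [label(a) for a in abstracts]
print(labels)
```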

The shares of entries that mention calibration or validation are somewhat small. Overall, just 5.62% of entries mention validation, 3.21% report a calibrated model and 0.65% fall in both categories. The large sample size, however, still permits proper statistical analysis and hypothesis testing.

How are mentions of calibration and validation in the abstract related to citation counts at face value? Bivariate analyses show only minor differences, as revealed in Figure 1. In fact, the distribution of citations for validated and non-validated ABMs (panel A) is remarkably similar. Wilcoxon tests with continuity correction—the nonparametric version of the simple t test—corroborate their similarity (W = 3,749,512, p = 0.555). The differences in citations between calibrated and non-calibrated models appear, albeit still small, more pronounced. Calibrated ABMs are cited slightly more often (panel B), as also supported by a bivariate test (W = 1,910,772, p < 0.001).
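For readers who want to run this kind of bivariate comparison themselves: in SciPy the two-sample Wilcoxon rank-sum test with continuity correction is available as the Mann-Whitney U test. A sketch with invented citation counts (the real data are at https://osf.io/x9r7j/):

```python
# Sketch of the bivariate test; the citation counts below are invented.
from scipy.stats import mannwhitneyu

citations_validated = [3, 10, 0, 25, 7, 4, 12, 1]
citations_other = [5, 9, 1, 30, 6, 2, 15, 0]

stat, p = mannwhitneyu(citations_validated, citations_other,
                       use_continuity=True, alternative="two-sided")
print(f"W = {stat}, p = {p:.3f}")
```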


Figure 1. Distributions of number of citations of all the entries in the dataset for validated (panel A) and calibrated (panel B) ABMs and their averages with standard errors over years (panels C and D)

Age of the paper might be a more important determinant of citation counts, as panels C and D of Figure 1 suggest. Clearly, the age of a paper should be important here, because older papers have had much more opportunity to get cited. In particular, papers younger than 10 years seem not to have matured enough for their citation rates to catch up to those of older articles. When comparing the citation counts of purely theoretical models with calibrated and validated versions, this covariate should not be missed, because the latter two are typically much younger. In other words, the positive relationship between model calibration/validation and citation counts could be hidden in the bivariate analysis, as model calibration and validation are recent trends in ABM research.

I run a Poisson regression on the number of citations as explained by whether the model is validated and whether it is calibrated (simultaneously), and by whether it is both. The age of the paper is taken into account, as well as the number of references that the paper cites itself (controlling for reciprocity and literature embeddedness, one might say). Finally, the fields in which the papers have been published, as registered by Web of Science, have been added to account for potential differences between fields that explain both citation counts and conventions about model calibration and validation.

Table 1 presents the results from the three models: just the main effects of validation and calibration (model 1), the interaction of validation and calibration (model 2), and the full model with control variables (model 3).

Table 1. Poisson regression on the number of citations

                             # Citations
                           (1)          (2)          (3)
Validated               -0.217***    -0.298***    -0.094***
                        (0.012)      (0.014)      (0.014)
Calibrated               0.171***     0.064***     0.076***
                        (0.014)      (0.016)      (0.016)
Validated x Calibrated                0.575***     0.244***
                                     (0.034)      (0.034)
Age                                                0.154***
                                                  (0.0005)
Cited references                                   0.013***
                                                  (0.0001)
Field included             No           No           Yes
Constant                 2.553***     2.556***     0.337**
                        (0.003)      (0.003)      (0.164)
Observations            11,807       11,807       11,807
AIC                     451,560      451,291      301,639

Note: *p<0.1; **p<0.05; ***p<0.01

The results from the analyses clearly suggest a negative effect of model validation and a positive effect of model calibration on the likelihood of being cited. The hypothesis that was so “badly in need of refutation” (Chattoe-Brown, 2022) will remain unrefuted for now. The effect does turn positive, however, when the abstract makes mention of calibration as well. In both the controlled (model 3) and uncontrolled (model 2) analyses, combining the effects of validation and calibration yields a positive coefficient overall.[2]

The controls in model 3 substantially affect the estimates from the three main factors of interest, while remaining in expected directions themselves. The age of a paper indeed helps its citation count, and so does the number of papers the item cites itself. The fields, furthermore, take away from the main effects somewhat, too, but not to a problematic degree. In an additional analysis, I have looked at the relationship between the fields and whether they are more likely to publish calibrated or validated models and found no substantial relationships. Citation counts will differ between fields, however. The papers in our sample are more often cited in, for example, hematology, emergency medicine and thermodynamics. The ABMs in the sample coming from toxicology, dermatology and religion are on the unlucky side of the equation, receiving fewer citations on average. Finally, I have also looked at papers published in JASSS specifically, due to the interest of Chattoe-Brown and the nature of this outlet. Surprisingly, the same analyses run on the subsample of these papers (N=376) showed a negative relationship between citation counts and model calibration/validation. Does the JASSS readership reveal its taste for artificial societies?

In sum, I find support for the hypothesis of Chattoe-Brown (2022) on the negative relationship between model validation and citation counts for papers presenting ABMs. If you want to be cited, you should not validate your ABM. Calibrated ABMs, on the other hand, are more likely to receive citations. What is more, ABMs that were both calibrated and validated are the most successful papers in the sample. All conclusions were drawn considering (i.e. controlling for) the effects of the age of the paper, the number of papers the paper cited itself, and (citation conventions in) the field in which it was published.

While the patterns explored in this and Chattoe-Brown’s recent contribution are interesting, or even puzzling, they should not distract from the goal of moving towards realistic agent-based simulations of social systems. In my opinion, models that combine rigorous theory with strong empirical foundations are instrumental to the creation of meaningful and purposeful agent-based models. Perhaps the results presented here should just be taken as another sign that citation counts are a weak signal of academic merit at best.

Data, code and supplementary analyses

All data and code used for this analysis, as well as the results from the supplementary analyses described in the text, are available here: https://osf.io/x9r7j/

Notes

[1] Note that the hyphen between “agent” and “based” does not affect the retrieved corpus. Contributions mentioning either “agent based” or “agent-based” were both retrieved.
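A hyphen-insensitive match of the kind this note describes can be sketched as follows. The pattern is purely illustrative; the actual bibliographic database query syntax may differ:

```python
import re

# Match "agent based" and "agent-based" alike, case-insensitively.
# Illustrative sketch only -- not the query actually submitted to the database.
pattern = re.compile(r"agent[ -]based", re.IGNORECASE)

assert pattern.search("A novel agent-based model of opinion dynamics")
assert pattern.search("We present an agent based simulation")
# A phrase containing "based" but not the compound term does not match:
assert not pattern.search("a model based on agents")
```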

[2] A small caveat to the analysis of the interaction effect is that the marginal improvement of model 2 upon model 1 is rather small (AIC difference of 269). This is likely (partially) due to the small number of papers that mention both calibration and validation (N=77).

Acknowledgements

Marijn Keijzer acknowledges IAST funding from the French National Research Agency (ANR) under the Investments for the Future (Investissements d’Avenir) program, grant ANR-17-EURE-0010.

References

Boero, R., & Squazzoni, F. (2005). Does empirical embeddedness matter? Methodological issues on agent-based models for analytical social science. Journal of Artificial Societies and Social Simulation, 8(4), 1–31. https://www.jasss.org/8/4/6.html

Chattoe-Brown, E. (2022) If You Want To Be Cited, Don’t Validate Your Agent-Based Model: A Tentative Hypothesis Badly In Need of Refutation. Review of Artificial Societies and Social Simulation, 1st Feb 2022. https://rofasss.org/2022/02/01/citing-od-models

Flache, A., Mäs, M., Feliciani, T., Chattoe-Brown, E., Deffuant, G., Huet, S., & Lorenz, J. (2017). Models of social influence: towards the next frontiers. Journal of Artificial Societies and Social Simulation, 20(4). https://doi.org/10.18564/jasss.3521


Keijzer, M. (2022) If you want to be cited, calibrate your agent-based model: Reply to Chattoe-Brown. Review of Artificial Societies and Social Simulation, 9th Mar 2022. https://rofasss.org/2022/03/09/Keijzer-reply-to-Chattoe-Brown


© The authors under the Creative Commons’ Attribution-NoDerivs (CC BY-ND) Licence (v4.0)

If You Want To Be Cited, Don’t Validate Your Agent-Based Model: A Tentative Hypothesis Badly In Need of Refutation

By Edmund Chattoe-Brown

As part of a previous research project, I collected a sample of the Opinion Dynamics (hereafter OD) models published in JASSS that were most highly cited in JASSS. The idea here was to understand what styles of OD research were most influential in the journal. In the top 50 on 19.10.21 there were eight such articles. Five were self-contained modelling exercises (Hegselmann and Krause 2002, 58 citations; Deffuant et al. 2002, 35 citations; Salzarulo 2006, 13 citations; Deffuant 2006, 13 citations; and Urbig et al. 2008, 9 citations), two were overviews of OD modelling (Flache et al. 2017, 13 citations; and Sobkowicz 2009, 10 citations) and one included an OD example in an article mainly discussing the merits of cellular automata modelling (Hegselmann and Flache 1998, 12 citations). In order to get into the top 50 on that date you had to achieve at least 7 citations. In parallel, I have been trying to identify Agent-Based Models that are validated (that undergo direct comparison of real and equivalent simulated data). Based on an earlier bibliography (Chattoe-Brown 2020), which I extended to the end of 2021 for JASSS, and on articles described as validated in the highly cited articles listed above, I managed to construct a small and unsystematic sample of validated OD models. (Part of the problem with a systematic sample is that validated models are not readily searchable as a distinct category and there are too many OD models overall to make reading them all feasible. Also, I suspect, validated models simply remain rare, in line with the larger scale findings of Dutton and Starbuck (1971, p. 130, table 1) and, discouragingly, much more recently, Angus and Hassani-Mahmooei (2015, section 4.5, figure 9).) Obviously, since part of the sample was selected by total number of citations, one cannot make a comparison on that basis, so instead I have used the best possible alternative (given the limitations of the sample) and compared articles on citations per year.
The problem here is that attempting validated modelling is relatively new, while older articles inevitably accumulate citations, however slowly. But what I was trying to discover was whether new validated models could be cited at a much higher annual rate without reaching the top 50 (or whether, conversely, older articles could have a high enough total citation count to get into the top 50 without a particularly impressive annual citation rate). One would hope that, ultimately, validated models would tend to receive more citations than those that were not validated (but see the rather disconcerting related findings of Serra-Garcia and Gneezy 2021). Table 1 shows the results sorted by citations per year.

Article | Status | Number of JASSS Citations[1] | Number of Years[2] | Citations Per Year
Bernardes et al. 2002 | Validated | 1 | 20 | 0.05
Bernardes et al. 2001 | Validated | 2 | 21 | 0.096
Fortunato and Castellano 2007 | Validated | 2 | 15 | 0.13
Caruso and Castorina 2005 | Validated | 4 | 17 | 0.24
Chattoe-Brown 2014 | Validated | 2 | 8 | 0.25
Brousmiche et al. 2016 | Validated | 2 | 6 | 0.33
Hegselmann and Flache 1998 | Non-Validated | 12 | 24 | 0.5
Urbig et al. 2008 | Non-Validated | 9 | 14 | 0.64
Sobkowicz 2009 | Non-Validated | 10 | 13 | 0.77
Deffuant 2006 | Non-Validated | 13 | 16 | 0.81
Salzarulo 2006 | Non-Validated | 13 | 16 | 0.81
Duggins 2017 | Validated | 5 | 5 | 1
Deffuant et al. 2002 | Non-Validated | 35 | 20 | 1.75
Flache et al. 2017 | Non-Validated | 13 | 5 | 2.6
Hegselmann and Krause 2002 | Non-Validated | 58 | 20 | 2.9

Table 1. Annual Citation Rates for OD Articles Highly Cited in JASSS (Systematic Sample) and Validated OD Articles in or Cited in JASSS (Unsystematic Sample)
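The annual rate in Table 1 is simply total JASSS citations divided by the inclusive year span described in note [2] (the year of publication and the current year, 2021, both count). A minimal check of the arithmetic:

```python
# Recomputing the "Citations Per Year" column of Table 1.
# Per note [2], the year count includes both the year of publication and
# the current year (2021), so an article published in 2002 spans 20 years.
CURRENT_YEAR = 2021

def citations_per_year(citations, publication_year):
    years = CURRENT_YEAR - publication_year + 1
    return citations / years

# Spot checks against rows of Table 1:
assert round(citations_per_year(1, 2002), 2) == 0.05   # Bernardes et al. 2002
assert round(citations_per_year(12, 1998), 2) == 0.5   # Hegselmann and Flache 1998
assert round(citations_per_year(58, 2002), 2) == 2.9   # Hegselmann and Krause 2002
```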

With the notable (and potentially encouraging) exception of Duggins (2017), the most recent validated OD model I have been able to discover in JASSS, the sample clearly divides into non-validated research with more citations and validated research with fewer. The position of Duggins (2017) might suggest greater recent interest in validated OD models. Unfortunately, however, qualitative analysis of the citations suggests that these are not cited as validated models per se (and thus as a potential improvement over non-validated models) but merely as part of general classes of OD model (like those involving social networks or repulsion – moving away from highly discrepant opinions). This tendency to cite validated models without acknowledging that they are validated (and what the implications of that might be) is widespread in the articles I looked at.

Obviously, there is plenty wrong with this analysis. Even looking at citations per annum we are arguably still partially sampling on the dependent variable (articles selected for being widely cited prove to be widely cited!) and the sample of validated OD models is unsystematic (though in fairness the challenges of producing a systematic sample are significant.[3]) But the aim here is to make a distinctive use of RoFASSS as a rapid mode of permanent publication and to think differently about science. If I tried to publish this in a peer reviewed journal, the amount of labour required to satisfy reviewers about the research design would probably be prohibitive (even if it were possible). As a result, the case to answer about this apparent (and perhaps undesirable) pattern in data might never see the light of day.

But by publishing quickly in RoFASSS without the filter of peer review I actively want my hypothesis to be rejected or replaced by research based on a better design (and such research may be motivated precisely by my presenting this interesting pattern with all its imperfections). When it comes to scientific progress, the chance to be clearly wrong now could be more useful than the opportunity to be vaguely right at some unknown point in the future.

Acknowledgements

This analysis was funded by the project “Towards Realistic Computational Models Of Social Influence Dynamics” (ES/S015159/1) funded by ESRC via ORA Round 5 (PI: Professor Bruce Edmonds, Centre for Policy Modelling, Manchester Metropolitan University: https://gtr.ukri.org/projects?ref=ES%2FS015159%2F1).

Notes

[1] Note that the validated OD models had their citations counted manually while the high total citation articles had them counted automatically. This may introduce some comparison error but there is no reason to think that either count will be terribly inaccurate.

[2] Including the year of publication and the current year (2021).

[3] Note, however, that there are some checks and balances on sample quality. Highly successful validated OD models would have shown up independently in the top 50. There is thus an upper bound to the impact of the articles I might have missed in manually constructing my “version 1” bibliography. The unsystematic review of 47 articles by Sobkowicz (2009) also checks independently on the absence of validated OD models in JASSS to that date and confirms the rarity of such articles generally. Only four of the articles that he surveys are significantly empirical.

References

Angus, Simon D. and Hassani-Mahmooei, Behrooz (2015) ‘“Anarchy” Reigns: A Quantitative Analysis of Agent-Based Modelling Publication Practices in JASSS, 2001-2012’, Journal of Artificial Societies and Social Simulation, 18(4), October, article 16, <http://jasss.soc.surrey.ac.uk/18/4/16.html>. doi:10.18564/jasss.2952

Bernardes, A. T., Costa, U. M. S., Araujo, A. D. and Stauffer, D. (2001) ‘Damage Spreading, Coarsening Dynamics and Distribution of Political Votes in Sznajd Model on Square Lattice’, International Journal of Modern Physics C: Computational Physics and Physical Computation, 12(2), February, pp. 159-168. doi:10.1142/S0129183101001584

Bernardes, A. T., Stauffer, D. and Kertész, J. (2002) ‘Election Results and the Sznajd Model on Barabasi Network’, The European Physical Journal B: Condensed Matter and Complex Systems, 25(1), January, pp. 123-127. doi:10.1140/e10051-002-0013-y

Brousmiche, Kei-Leo, Kant, Jean-Daniel, Sabouret, Nicolas and Prenot-Guinard, François (2016) ‘From Beliefs to Attitudes: Polias, A Model of Attitude Dynamics Based on Cognitive Modelling and Field Data’, Journal of Artificial Societies and Social Simulation, 19(4), October, article 2, <https://www.jasss.org/19/4/2.html>. doi:10.18564/jasss.3161

Caruso, Filippo and Castorina, Paolo (2005) ‘Opinion Dynamics and Decision of Vote in Bipolar Political Systems’, arXiv > Physics > Physics and Society, 26 March, version 2. doi:10.1142/S0129183105008059

Chattoe-Brown, Edmund (2014) ‘Using Agent Based Modelling to Integrate Data on Attitude Change’, Sociological Research Online, 19(1), February, article 16, <https://www.socresonline.org.uk/19/1/16.html>. doi:10.5153/sro.3315

Chattoe-Brown Edmund (2020) ‘A Bibliography of ABM Research Explicitly Comparing Real and Simulated Data for Validation: Version 1’, CPM Report CPM-20-216, 12 June, <http://cfpm.org/discussionpapers/256>

Deffuant, Guillaume (2006) ‘Comparing Extremism Propagation Patterns in Continuous Opinion Models’, Journal of Artificial Societies and Social Simulation, 9(3), June, article 8, <https://www.jasss.org/9/3/8.html>.

Deffuant, Guillaume, Amblard, Frédéric, Weisbuch, Gérard and Faure, Thierry (2002) ‘How Can Extremism Prevail? A Study Based on the Relative Agreement Interaction Model’, Journal of Artificial Societies and Social Simulation, 5(4), October, article 1, <https://www.jasss.org/5/4/1.html>.

Duggins, Peter (2017) ‘A Psychologically-Motivated Model of Opinion Change with Applications to American Politics’, Journal of Artificial Societies and Social Simulation, 20(1), January, article 13, <http://jasss.soc.surrey.ac.uk/20/1/13.html>. doi:10.18564/jasss.3316

Dutton, John M. and Starbuck, William H. (1971) ‘Computer Simulation Models of Human Behavior: A History of an Intellectual Technology’, IEEE Transactions on Systems, Man, and Cybernetics, SMC-1(2), April, pp. 128-171. doi:10.1109/TSMC.1971.4308269

Flache, Andreas, Mäs, Michael, Feliciani, Thomas, Chattoe-Brown, Edmund, Deffuant, Guillaume, Huet, Sylvie and Lorenz, Jan (2017) ‘Models of Social Influence: Towards the Next Frontiers’, Journal of Artificial Societies and Social Simulation, 20(4), October, article 2, <http://jasss.soc.surrey.ac.uk/20/4/2.html>. doi:10.18564/jasss.3521

Fortunato, Santo and Castellano, Claudio (2007) ‘Scaling and Universality in Proportional Elections’, Physical Review Letters, 99(13), 28 September, article 138701. doi:10.1103/PhysRevLett.99.138701

Hegselmann, Rainer and Flache, Andreas (1998) ‘Understanding Complex Social Dynamics: A Plea For Cellular Automata Based Modelling’, Journal of Artificial Societies and Social Simulation, 1(3), June, article 1, <https://www.jasss.org/1/3/1.html>.

Hegselmann, Rainer and Krause, Ulrich (2002) ‘Opinion Dynamics and Bounded Confidence Models, Analysis, and Simulation’, Journal of Artificial Societies and Social Simulation, 5(3), June, article 2, <http://jasss.soc.surrey.ac.uk/5/3/2.html>.

Salzarulo, Laurent (2006) ‘A Continuous Opinion Dynamics Model Based on the Principle of Meta-Contrast’, Journal of Artificial Societies and Social Simulation, 9(1), January, article 13, <http://jasss.soc.surrey.ac.uk/9/1/13.html>.

Serra-Garcia, Marta and Gneezy, Uri (2021) ‘Nonreplicable Publications are Cited More Than Replicable Ones’, Science Advances, 7, 21 May, article eabd1705. doi:10.1126/sciadv.abd1705

Sobkowicz, Pawel (2009) ‘Modelling Opinion Formation with Physics Tools: Call for Closer Link with Reality’, Journal of Artificial Societies and Social Simulation, 12(1), January, article 11, <http://jasss.soc.surrey.ac.uk/12/1/11.html>.

Urbig, Diemo, Lorenz, Jan and Herzberg, Heiko (2008) ‘Opinion Dynamics: The Effect of the Number of Peers Met at Once’, Journal of Artificial Societies and Social Simulation, 11(2), March, article 4, <http://jasss.soc.surrey.ac.uk/11/2/4.html>.


© The authors under the Creative Commons’ Attribution-NoDerivs (CC BY-ND) Licence (v4.0)

Today We Have Naming Of Parts: A Possible Way Out Of Some Terminological Problems With ABM

By Edmund Chattoe-Brown


Today we have naming of parts. Yesterday,
We had daily cleaning. And tomorrow morning,
We shall have what to do after firing. But to-day,
Today we have naming of parts. Japonica
Glistens like coral in all of the neighbouring gardens,
And today we have naming of parts.
(Naming of Parts, Henry Reed, 1942)

It is not difficult to establish by casual reading that there are almost as many ways of using crucial terms like calibration and validation in ABM as there are actual instances of their use. This creates several damaging problems for scientific progress in the field.

Firstly, when two different researchers both say they “validated” their ABMs they may mean different specific scientific activities. This makes it hard for readers to evaluate research generally, particularly if researchers assume that it is obvious what their terms mean (rather than explaining explicitly what they did in their analysis).

Secondly, based on this, each researcher may feel that the other has not really validated their ABM but has instead done something to which a different name should more properly be given. This compounds the possible confusion in debate.

Thirdly, there is a danger that researchers may rhetorically favour (perhaps unconsciously) uses that, for example, make their research sound more robustly empirical than it actually is. For example, validation is sometimes used to mean consistency with stylised facts (rather than, say, correspondence with a specific time series according to some formal measure). But we often have no way of telling what the status of the presented stylised facts is. Are they an effective summary of what is known in a field? Are they the facts on which most researchers agree or for which the available data presents the clearest picture? (Less reputably, can readers be confident that they were not selected for presentation because of their correspondence?)

Fourthly, because these terms are used differently by different researchers it is possible that valuable scientific activities that “should” have agreed labels will “slip down the terminological cracks” (either for the individual or for the ABM community generally). Apart from clear labels avoiding confusion for others, they may help to avoid confusion for you too!

But apart from these problems (and there may be others but these are not the main thrust of my argument here) there is also a potential impasse. There simply doesn’t seem to be any value in arguing about what the “correct” meaning of validation (for example) should be. Because these are merely labels there is no objective way to resolve this issue. Further, even if we undertook to agree the terminology collectively, each individual would tend to argue for their own interpretation without solid grounds (because there are none to be had) and any collective decision would probably therefore be unenforceable. If we decide to invent arbitrary new terminology from scratch we not only run the risk of adding to the existing confusion of terms (rather than reducing it) but it is also quite likely that everyone will find the new terms unhelpful.

Unfortunately, however, we probably cannot do without labels for these scientific activities involved in quality controlling ABMs. If we had to describe everything we did without any technical shorthand, presenting research might well become impossibly unwieldy.

My proposed solution is therefore to invent terms from scratch (so we don’t end up arguing about our different customary usages to no purpose) but to do so on the basis of actual scientific practices reported in published research. For example, we might call the comparison of corresponding real and simulated data (which at least has the endorsement of the much-used Gilbert and Troitzsch (2005, pp. 15-19) to be referred to as validation) CORAS – Comparison Of Real And Simulated. Similarly, assigning values to parameters given the assumptions of model “structures” might be called PANV – Parameters Assigned Numerical Values.

It is very important to be clear what the intention is here. Naming cannot solve scientific problems or disagreements. (Indeed, failure to grasp this may well be why our terminology is currently so muddled as people try to get their different positions through “on the nod”.) For example, if we do not believe that correspondence with stylised facts and comparison measures on time series have equivalent scientific status then we will have to agree distinct labels for them and have the debate about their respective value separately. Perhaps the former could be called COSF – Comparison Of Stylised Facts. But it seems plainly easier to describe specific scientific activities accurately and then find labels for them than to have to wade through the existing marsh of ambiguous terminology and try to extract the associated science. An example of a practice which does not seem to have even one generally agreed label (and therefore seems to be neglected in ABM as a practice) is JAMS – Justifying A Model Structure. (Why are your agents adaptive rather than habitual or rational? Why do they mix randomly rather than in social networks?)

Obviously, there still needs to be community agreement for such a convention to be useful (and this may need to be backed institutionally, for example by reviewing requirements). But the logic of the approach avoids several existing problems. Firstly, while the labels are useful shorthand, they are not arbitrary. Each can be traced back to a clearly definable scientific practice. Secondly, this approach steers a course between the Scylla of fruitless arguments from current muddled usage and the Charybdis of a novel set of terminology that is equally unhelpful to everybody. (Even if people cannot agree on labels, they know how they built and evaluated their ABMs so they can choose – or create – new labels accordingly.) Thirdly, the proposed logic is extendable. As we clarify our thinking, we can use it to label (or improve the labels of) any current set of scientific practices. We will not have to worry that we will run out of plausible words in everyday usage.

Below I suggest some more scientific practices and possible terms for them. (You will see that I have also tried to make the terms as pronounceable and distinct as possible.)

Practice | Term
Checking the results of an ABM by building another.[1] | CAMWA (Checking A Model With Another).
Checking ABM code behaves as intended (for example by debugging procedures, destructive testing using extreme values and so on). | TAMAD (Testing A Model Against Description).
Justifying the structure of the environment in which agents act. | JEM (Justifying the Environment of a Model): This is again a process that may pass unnoticed in ABM typically. For example, by assuming that agents only consider ethnic composition, the Schelling Model (Schelling 1969, 1971) does not “allow” locations to be desirable because, for example, they are near good schools. This contradicts what was known empirically well before (see, for example, Rossi 1955) and it isn’t clear whether simply saying that your interest is in an “abstract” model can justify this level of empirical neglect.
Finding out what effect parameter values have on ABM behaviour. | EVOPE (Exploring Value Of Parameter Effects).
Exploring the sensitivity of an ABM to structural assumptions not justified empirically (see Chattoe-Brown 2021). | ESOSA (Exploring the Sensitivity Of Structural Assumptions).

Clearly this list is incomplete, but I think it would be more effective if characterising the scientific practices in existing ABM and naming them distinctively were a collective enterprise.

Acknowledgements

This research is funded by the project “Towards Realistic Computational Models Of Social Influence Dynamics” (ES/S015159/1) funded by ESRC via ORA Round 5 (PI: Professor Bruce Edmonds, Centre for Policy Modelling, Manchester Metropolitan University: https://gtr.ukri.org/projects?ref=ES%2FS015159%2F1).

Notes

[1] It is likely that we will have to invent terms for subcategories of practices which differ in their aims or warranted conclusions. For example, rerunning the code of the original author (CAMWOC – Checking A Model With Original Code), building a new ABM from a formal description like ODD (CAMUS – Checking A Model Using Specification) and building a new ABM from the published description (CAMAP – Checking A Model As Published, see Chattoe-Brown et al. 2021).

References

Chattoe-Brown, Edmund (2021) ‘Why Questions Like “Do Networks Matter?” Matter to Methodology: How Agent-Based Modelling Makes It Possible to Answer Them’, International Journal of Social Research Methodology, 24(4), pp. 429-442. doi:10.1080/13645579.2020.1801602

Chattoe-Brown, Edmund, Gilbert, Nigel, Robertson, Duncan A. and Watts Christopher (2021) ‘Reproduction as a Means of Evaluating Policy Models: A Case Study of a COVID-19 Simulation’, medRXiv, 23 February. doi:10.1101/2021.01.29.21250743

Gilbert, Nigel and Troitzsch, Klaus G. (2005) Simulation for the Social Scientist, second edition (Maidenhead: Open University Press).

Rossi, Peter H. (1955) Why Families Move: A Study in the Social Psychology of Urban Residential Mobility (Glencoe, IL, Free Press).

Schelling, Thomas C. (1969) ‘Models of Segregation’, American Economic Review, 59(2), May, pp. 488-493. (available at https://www.jstor.org/stable/1823701)

Schelling, Thomas C. (1971) ‘Dynamic Models of Segregation’, Journal of Mathematical Sociology, 1(2), pp. 143-186. doi:10.1080/0022250X.1971.9989794


Chattoe-Brown, E. (2022) Today We Have Naming Of Parts: A Possible Way Out Of Some Terminological Problems With ABM. Review of Artificial Societies and Social Simulation, 11th January 2022. https://rofasss.org/2022/01/11/naming-of-parts/


© The authors under the Creative Commons’ Attribution-NoDerivs (CC BY-ND) Licence (v4.0)

Reply to Frank Dignum

By Edmund Chattoe-Brown

This is a reply to Frank Dignum’s reply (about Edmund Chattoe-Brown’s review of Frank’s book).

As my academic career continues, I have become more and more interested in the way that people justify their modelling choices, for example, almost every Agent-Based Modeller makes approving noises about validation (in the sense of comparing real and simulated data) but only a handful actually try to do it (Chattoe-Brown 2020). Thus I think two specific statements that Frank makes in his response should be considered carefully:

  1. “… we do not claim that we have the best or only way of developing an Agent-Based Model (ABM) for crises.” Firstly, negative claims (“This is not a banana”) are not generally helpful in argument. Secondly, readers want to know (or should want to know) what is being claimed and, importantly, how they would decide if it is true “objectively”. Given how many models sprang up under COVID it is clear that what is described here cannot be the only way to do it, but the question is: how do we know you did it “better”? This was also my point about institutionalisation. For me, the big lesson from COVID was how much the automatic response of the ABM community seems to be to go in all directions and build yet more models in a tearing hurry rather than synthesise them, challenge them or test them empirically. I foresee a problem both with this response and our possible unwillingness to be self-aware about it. Governments will not want a million “interesting” models to choose from but one where they have externally checkable reasons to trust it, and that involves us changing our mindset (to be more like climate modellers, for example; Bithell & Edmonds 2020). For example, colleagues and I developed a comparison methodology that allowed for the practical difficulties of direct replication (Chattoe-Brown et al. 2021).
  2. The second quotation, which amplifies this point, is: “But we do think it is an extensive foundation from which others can start, either picking up some bits and pieces, deviating from it in specific ways or extending it in specific ways.” Again, here one has to ask the right question for progress in modelling. On what scientific grounds should people do this? On what grounds should someone reuse this model rather than start their own? Why isn’t the Dignum et al. model built on another “market leader” to set a good example? (My point about programming languages was purely practical, not scientific. Frank is right that the model is no less valid because the programming language was changed, but a version that is now unsupported seems less useful as a basis for the kind of further development advocated here.)

I am not totally sure I have understood Frank’s point about data so I don’t want to press it but my concern was that, generally, the book did not seem to “tap into” relevant empirical research (and this is a wider problem that models mostly talk about other models). It is true that parameter values can be adjusted arbitrarily in sensitivity analysis but that does not get us any closer to empirically justified parameter values (which would then allow us to attempt validation by the “generative methodology”). Surely it is better to build a model that says something about the data that exists (however imperfect or approximate) than to rely on future data collection or educated guesses. I don’t really have the space to enumerate the times the book said “we did this for simplicity”, “we assumed that” etc. but the cumulative effect is quite noticeable. Again, we need to be aware of the models which use real data in whatever aspects and “take forward” those inputs so they become modelling standards. This has to be a collective and not an individualistic enterprise.

References

Bithell, M. and Edmonds, B. (2020) The Systematic Comparison of Agent-Based Policy Models – It’s time we got our act together!. Review of Artificial Societies and Social Simulation, 11th May 2021. https://rofasss.org/2021/05/11/SystComp/

Chattoe-Brown, E. (2020) A Bibliography of ABM Research Explicitly Comparing Real and Simulated Data for Validation. Review of Artificial Societies and Social Simulation, 12th June 2020. https://rofasss.org/2020/06/12/abm-validation-bib/

Chattoe-Brown, E. (2021) A review of “Social Simulation for a Crisis: Results and Lessons from Simulating the COVID-19 Crisis”. Journal of Artificial Societies and Social Simulation, 24(4). https://www.jasss.org/24/4/reviews/1.html

Chattoe-Brown, E., Gilbert, N., Robertson, D. A., & Watts, C. J. (2021). Reproduction as a Means of Evaluating Policy Models: A Case Study of a COVID-19 Simulation. medRxiv 2021.01.29.21250743; DOI: https://doi.org/10.1101/2021.01.29.21250743

Dignum, F. (2020) Response to the review of Edmund Chattoe-Brown of the book “Social Simulations for a Crisis”. Review of Artificial Societies and Social Simulation, 4th Nov 2021. https://rofasss.org/2021/11/04/dignum-review-response/

Dignum, F. (Ed.) (2021) Social Simulation for a Crisis: Results and Lessons from Simulating the COVID-19 Crisis. Springer. DOI:10.1007/978-3-030-76397-8


Chattoe-Brown, E. (2021) Reply to Frank Dignum. Review of Artificial Societies and Social Simulation, 10th November 2021. https://rofasss.org/2021/11/10/reply-to-dignum/


© The authors under the Creative Commons’ Attribution-NoDerivs (CC BY-ND) Licence (v4.0)

Response to the review of Edmund Chattoe-Brown of the book “Social Simulations for a Crisis”

By Frank Dignum

This is a reply to a review in JASSS (Chattoe-Brown 2021) of (Dignum 2021).

Before responding to some of the specific concerns of Edmund I would like to thank him for the thorough review. I am especially happy with his conclusion that the book is solid enough to make it a valuable contribution to scientific progress in modelling crises. That was the main aim of the book and it seems that is achieved. I want to reiterate what we already remarked in the book; we do not claim that we have the best or only way of developing an Agent-Based Model (ABM) for crises. Nor do we claim that our simulations were without limitations. But we do think it is an extensive foundation from which others can start, either picking up some bits and pieces, deviating from it in specific ways or extending it in specific ways.

The concerns that are expressed by Edmund are certainly valid. I agree with some of them, but will nuance some others. First of all, there is the concern about the fact that we seem to abandon the NetLogo implementation and move to Repast. This fact does not make the ABM itself any less valid! In itself it is also an important finding. It is not possible to scale such a complex model in NetLogo beyond around two thousand agents. This is not just a limitation of our particular implementation, but a more general limitation of the platform. It leads to the important challenge of getting more computer scientists involved to develop platforms for social simulations that both support the modelers adequately and provide efficient and scalable implementations.

That the sheer size of the model and the results make it difficult to trace back the importance and validity of every factor in the results is completely true. We have tried our best to highlight the most important aspects every time. But this leaves questions as to whether we made the right selection of highlighted aspects. As an illustration, we have been busy for two months justifying our results on the effectiveness of the track and tracing apps. We basically concluded that we need much better integrated analysis tools in the simulation platform. NetLogo is geared towards creating one simulation scenario, running the simulation and analyzing the results based on a few parameters. This is no longer sufficient when we have a model with which we can create many scenarios and have many parameters that influence a result. We used R to interpret the flood of data that was produced with every scenario. But R is not really the most user-friendly tool, and it is also not specifically meant for analyzing the data from social simulations.

Let me jump to Edmund's third concern and link it to the analysis of the results as well. While trying to justify the results of our simulation on the effectiveness of the track and tracing app, we compared our simulation with an epidemiologically based model. This is described in chapter 12 of the book. Here we encountered a difference in the assumed number of contacts a person has with other persons per day. One can take the results, as quoted by Edmund as well, of 8 or 13 from empirical work and use them in the model. However, the dispute is not about the number of contacts a person has per day, but about what counts as a contact! For the COVID-19 simulations, standing next to a person in a supermarket queue for five minutes can count as a contact, while such a contact is not a meaningful contact in the cited literature. Thus, we see that what we take as empirically validated numbers might not at all be the right ones for our purpose. We have tried to justify all the values of parameters and outcomes in the context for which the simulations were created. We have also done quite some sensitivity analyses, not all of which we reported on, just to keep the volume of the book to a reasonable size. Although we think we did a proper job in justifying all results, that does not mean that one cannot have different opinions on the values that some parameters should have. It would be very good to check the influence of changes in these parameters on the results. This would also advance scientific insight into the usefulness of complex models like the one we made!

I really think that an ABM crisis response should be institutional. That does not mean that one institution determines the best ABM, but rather that the ABM put forward by that institution is the result of a continuous debate among scientists working on ABMs for that type of crisis. For us, one of the more important outcomes of the ASSOCC project is that we really need much better tools to support the types of simulations that are needed in a crisis situation. However, it is very difficult to develop these tools as a single group. A lot of the effort needed is not publishable and is thus not valued in an academic environment. I really think that the efforts that have been put into platforms such as NetLogo and Repast are laudable. They have been made possible by some generous grants and institutional support. We argue that this continuous support is also needed in order to be well equipped for the next crisis. But we do not argue that an institution would by definition have the last word on which is the best ABM. In an ideal case it would accumulate all academic efforts, as is done in the climate models, but even more restricted models would still be better than a thousand individuals all claiming to have a usable ABM while governments have to react quickly to a crisis.

Edmund's final concern is about the empirical scale of our simulations. This is completely true! Given the scale and details of what we can incorporate, we can only simulate some phenomena and certainly not everything around the COVID-19 crisis. We tried to be clear about this limitation. We had discussions about the Unity interface concerning this as well. It is in principle not very difficult to show people walking in the street, taking a car or a bus, etc. However, we decided to show a more abstract representation just to make clear that our model is not a complete model of a small town functioning in all its aspects. We have very carefully chosen which scenarios we can realistically simulate and from which we can give some insights into reality. Maybe we should also have discussed more explicitly all the scenarios that we did not run, with the reasons why they would be difficult or unrealistic in our ABM. One never likes to discuss all the limitations of one's labor, but it definitely can be very insightful. I have made up for this a little bit by submitting a paper to a special issue on predictions with ABM, in which I explain in more detail what the considerations should be in using a particular ABM to try to predict some state of affairs. Anyone interested in learning more about this can contact me.

To conclude this response to the review, I again express my gratitude for the good and thorough work done. The concerns that were raised are all very valuable to consider. What I have tried to do in this response is to highlight that these concerns should be taken as a call to arms to put effort into social simulation platforms that give better support for creating simulations for a crisis.

References

Dignum, F. (Ed.) (2021) Social Simulation for a Crisis: Results and Lessons from Simulating the COVID-19 Crisis. Springer. DOI:10.1007/978-3-030-76397-8

Chattoe-Brown, E. (2021) A review of “Social Simulation for a Crisis: Results and Lessons from Simulating the COVID-19 Crisis”. Journal of Artificial Societies and Social Simulation, 24(4). https://www.jasss.org/24/4/reviews/1.html


Dignum, F. (2021) Response to the review of Edmund Chattoe-Brown of the book “Social Simulations for a Crisis”. Review of Artificial Societies and Social Simulation, 4th Nov 2021. https://rofasss.org/2021/11/04/dignum-review-response/


© The authors under the Creative Commons’ Attribution-NoDerivs (CC BY-ND) Licence (v4.0)

Where Now For Experiments In Agent-Based Modelling? Report of a Round Table at SSC2021, held on 22 September 2021


By Dino Carpentras1, Edmund Chattoe-Brown2*, Bruce Edmonds3, Cesar García-Diaz4, Christian Kammler5, Anna Pagani6 and Nanda Wijermans7

*Corresponding author, 1Centre for Social Issues Research, University of Limerick, 2School of Media, Communication and Sociology, University of Leicester, 3Centre for Policy Modelling, Manchester Metropolitan University, 4Department of Business Administration, Pontificia Universidad Javeriana, 5Department of Computing Science, Umeå University, 6Laboratory on Human-Environment Relations in Urban Systems (HERUS), École Polytechnique Fédérale de Lausanne (EPFL), 7Stockholm Resilience Centre, Stockholm University.

Introduction

This round table was convened to advance and improve the use of experimental methods in Agent-Based Modelling, in the hope that both existing and potential users of the method would be able to identify steps towards this aim[i]. The session began with a presentation by Bruce Edmonds (http://cfpm.org/slides/experiments%20and%20ABM.pptx) whose main argument was that the traditional idea of experimentation (controlling extensively for the environment and manipulating variables) was too simplistic to add much to the understanding of the sort of complex systems modelled by ABMs and that we should therefore aim to enhance experiments (for example using richer experimental settings, richer measures of those settings and richer data – like discussions between participants as well as their behaviour). What follows is a summary of the main ideas discussed organised into themed sections.

What Experiments Are

Defining the field of experiments proved to be challenging on two counts. The first was that there are a number of labels for potentially relevant approaches (experiments themselves – for example, Boero et al. 2010, gaming – for example, Tykhonov et al. 2008, serious games – for example, Taillandier et al. 2019, companion/participatory modelling – for example, Ramanath and Gilbert 2004 and web-based gaming – for example, Basole et al. 2013) whose actual content overlap is unclear. Is it the case that a gaming approach is generally more in line with the argument proposed by Edmonds? How can we systematically distinguish the experimental content of a serious game approach from a gaming approach? This seems to be a problem in immature fields where the labels are invented first (often on the basis of a few rather divergent instances) and the methodology has to grow into them. It would be ludicrous if we couldn’t be sure whether a piece of research was survey-based or interview-based (and this would radically devalue the associated labels if it were so.)

The second challenge, which is also more general in Agent-Based Modelling, is that the same labels are used differently by different researchers. It is not productive to argue about which uses are correct, but it is important that the concepts behind the different uses are clear so that a common scheme of labelling might ultimately be agreed. So, for example, “experiment” can be used (and different round table participants had different perspectives on the uses they expected) to mean laboratory experiments (simplified settings with human subjects – again see, for example, Boero et al. 2010), experiments with ABMs (formal experimentation with a model that doesn’t necessarily have any empirical content – for example, Doran 1998) and natural experiments (choice of cases in the real world to, for example, test a theory – see Dinesen 2013).

One approach that may help with this diversity is to start developing possible dimensions of experimentation. One might be degree of control (all the way from very stripped down behavioural laboratory experiments to natural situations where the only control is to select the cases). Another might be data diversity: From pure analysis of ABMs (which need not involve data at all), through laboratory experiments that record only behaviour to ethnographic collection and analysis of diverse data in rich experiments (like companion modelling exercises.) But it is important for progress that the field develops robust concepts that allow meaningful distinctions and does not get distracted into pointless arguments about labelling. Furthermore, we must consider the possible scientific implications of experimentation carried out at different points in the dimension space: For example, what are the relative strengths and limitations of experiments that are more or less controlled or more or less data diverse? Is there a “sweet spot” where the benefit of experiments is greatest to Agent-Based Modelling? If so, what is it and why?

The Philosophy of Experiment

The second challenge is the different beliefs (often associated with different disciplines) about the philosophical underpinnings of experiment such as what we might mean by a cause. In an economic experiment, for example, the objective may be to confirm a universal theory of decision making through displayed behaviour only. (It is decisions described by this theory which are presumed to cause the pattern of observed behaviour.) This will probably not allow the researcher to discover that their basic theory is wrong (people are adaptive not rational after all) or not universal (agents have diverse strategies), or that some respondents simply didn’t understand the experiment (deviations caused by these phenomena may be labelled noise relative to the theory being tested but in fact they are not.)

By contrast, qualitative sociologists believe that subjective accounts (including accounts of participation in the experiment itself) can be made reliable and that they may offer direct accounts of certain kinds of cause: If I say I did something for a certain reason then it is at least possible that I actually did (and that the reason I did it is therefore its cause). It is no more likely that agreement will be reached on these matters in the context of experiments than it has been elsewhere. But Agent-Based Modelling should keep its reputation for open-mindedness by seeing what happens when qualitative data is also collected, not just rejecting that approach out of hand as something that is “not done”. There is no need for Agent-Based Modelling blindly to follow the methodology of any one existing discipline in which experiments are conducted (and these disciplines often disagree vigorously on issues like payment and deception with no evidence on either side, which should also make us cautious about their self-evident correctness.)

Finally, there is a further complication in understanding experiments using analogies with the physical sciences. In understanding the evolution of a river system, for example, one can control/intervene, one can base theories on testable micro mechanisms (like percolation) and one can observe. But there is no equivalent to asking the river what it intends (whether or not we can do this effectively in social science).[ii] It is not totally clear how different kinds of data collection like these might relate to each other in the social sciences, for example, data from subjective accounts, behavioural experiments (which may show different things from what respondents claim) and, for example, brain scans (which sidestep the social altogether.) This relationship between different kinds of data currently seems incompletely explored and conceptualised. (There is a tendency just to look at easy cases like surveys versus interviews.)

The Challenge of Experiments as Practical Research

This is an important area where the actual and potential users of experiments participating in the round table diverged. Potential users wanted clear guidance on the resources, skills and practices involved in doing experimental work (and see similar issues in the behavioural strategy literature, for example, Reypens and Levine 2018). At the most basic level: when does a researcher need to do an experiment (rather than a survey, interviews or observation)? What are the resource requirements in terms of time, facilities and money (laboratory experiments are unusual in often needing specific funding to pay respondents rather than relying on the researcher working for free)? What design decisions need to be made (paying subjects, online or offline, can subjects be deceived)? How should the data be analysed (how should an ABM be validated against experimental data)? And so on.[iii] (There are also pros and cons to specific bits of potentially supporting technology like Amazon Mechanical Turk, Qualtrics and Prolific, which have not yet been documented and systematically compared for the novice with a background in Agent-Based Modelling.) There is much discussion about these matters in the traditional literatures of social sciences that do experiments (see, for example, Kagel and Roth 1995, Levine and Parkinson 1994 and Zelditch 2014) but this has not been summarised and tuned specifically for the needs of Agent-Based Modellers (or published where they are likely to see it).

However, it should not be forgotten that not all research efforts need this integration within the same project, so thinking about the problems that really need it is critical. Nonetheless, triangulation is indeed necessary within research programmes. For instance, in subfields such as strategic management and organisational design, it is uncommon to see an ABM integrated with an experiment as part of the same project (though there are exceptions, such as Vuculescu 2017). Instead, ABMs are typically used to explore “what if” scenarios, build process theories and illuminate potential empirical studies. In this approach, knowledge is accumulated through the triangulation of different methodologies in different projects (see Burton and Obel 2018). Additionally, modelling and experimental efforts are usually led by different specialists – for example, there is a Theoretical Organisational Models Society whose focus is the development of standards for theoretical organisation science.

In a relatively new and small area, all we often have is some examples of good practice (or more contentiously bad practice) of which not everyone is even aware. A preliminary step is thus to see to what extent people know of good practice and are able to agree that it is good (and perhaps why it is good).

Finally, there was a slightly separate discussion about the perspectives of experimental participants themselves. It may be that a general problem with unreal activity is that you know it is unreal (which may lead to problems with ecological validity – Bornstein 1999.) On the other hand, building on the enrichment argument put forward by Edmonds (above), there is at least anecdotal observational evidence that richer and more realistic settings may cause people to get “caught up” and perhaps participate more as they would in reality. Nonetheless, there are practical steps we can take to learn more about these phenomena by augmenting experimental designs. For example, we might conduct interviews (or even group discussions) before and after experiments. This could make the initial biases of participants explicit and allow them to self-evaluate retrospectively the extent to which they got engaged (or perhaps even over-engaged) during the game. The first such questionnaire could be available before attending the experiment, whilst another could be administered right after the game (and perhaps even a third a week later). In addition to practical design solutions, there are also relevant existing literatures that experimental researchers should probably draw on in this area, for example that on systemic design and the associated concept of worldviews. It is fair to say that we do not yet fully understand the issues here, but they clearly matter to the value of experimental data for Agent-Based Modelling.[iv]

Design of Experiments

Something that came across strongly in the round table discussion as argued by existing users of experimental methods was the desirability of either designing experiments directly based on a specific ABM structure (rather than trying to use a stripped down – purely behavioural – experiment) or mixing real and simulated participants in richer experimental settings. In line with the enrichment argument put forward by Edmonds, nobody seemed to be using stripped down experiments to specify, calibrate or validate ABM elements piecemeal. In the examples provided by round table participants, experiments corresponding closely to the ABM (and mixing real and simulated participants) seemed particularly valuable in tackling subjects that existing theory had not yet really nailed down or where it was clear that very little of the data needed for a particular ABM was available. But there was no sense that there is a clearly defined set of research designs with associated purposes on which the potential user can draw. (The possible role of experiments in supporting policy was also mentioned but no conclusions were drawn.)

Extracting Rich Data from Experiments

Traditional experiments are time-consuming to do, so they are frequently optimised to obtain the maximum power and discrimination between factors of interest. In such situations they will often limit their data collection to what is strictly necessary for testing their hypotheses. Furthermore, it seems to be a hangover from behaviourist psychology that one does not use self-reporting on the grounds that it might be biased or simply involve false reconstruction (rationalisation). From the point of view of building or assessing ABMs this approach involves a wasted opportunity. Due to the flexible nature of ABMs there is a need for as many empirical constraints upon modelling as possible. These constraints can come from theory, evidence or abstract principles (such as simplicity) and should not hinder the design of an ABM but rather act as a check on its outcomes. Game-like situations can provide rich data about what is happening, simultaneously capturing decisions on action, the position and state of players, global game outcomes/scores and what players say to each other (see, for example, Janssen et al. 2010, Lindahl et al. 2021). Often, in social science one might have a survey with one set of participants, interviews with others and longitudinal data from yet others – even if these, in fact, involve the same people, the data will usually not indicate this through consistent IDs. When collecting data from a game (and especially from online games) there is a possibility of collecting linked data with consistent IDs – including interviews – that allows for a whole new level of ABM development and checking.

Standards and Institutional Bootstrapping

This is also a wider problem in newer methods like Agent-Based Modelling. How can we foster agreement about what we are doing (which has to build on clear concepts) and institutionalise those agreements into standards for a field (particularly when there is academic competition and pressure to publish)?[v] If certain journals will not publish experiments (or experiments done in certain ways) what can we do about that? JASSS was started because it was so hard to publish ABMs. It has certainly made that easier, but is there a cost through less publication in other journals? See, for example, Squazzoni and Casnici (2013). Would it have been better for the rigour and wider acceptance of Agent-Based Modelling if we had met the standards of other fields rather than setting our own? This strategy, harder in the short term, may also have promoted communication and collaboration better in the long term. If reviewing is arbitrary (reviewers do not seem to have a common view of what makes an experiment legitimate) then can that situation be improved (and in particular how do we best go about that with limited resources)? To some extent, normal individualised academic work may achieve progress here (researchers make proposals, dispute and refine them and their resulting quality ensures at least some individualised adoption by other researchers) but there is often an observable gap in performance: Even though most modellers will endorse the value of data for modelling in principle, most models are still non-empirical in practice (Angus and Hassani-Mahmooei 2015, Figure 9). The jury is still out on the best way to improve reviewer consistency, use the power of peer review to impose better standards (and thus resolve a collective action problem under academic competition[vi]) and so on, but recognising and trying to address these issues is clearly important to the health of experimental methods in Agent-Based Modelling.
Since running experiments in association with ABMs is already challenging, adding the problem of arbitrary reviewer standards makes the publication process even harder. This discourages scientists from following this path and therefore retards this kind of research generally. Again, here, useful resources (like the Psychological Science Accelerator, which facilitates greater experimental rigour by various means) were suggested in discussion as raw material for our own improvements to experiments in Agent-Based Modelling.

Another issue with newer methods such as Agent-Based Modelling is the path to legitimation before the wider scientific community. The need to integrate ABMs with experiments does not necessarily imply that the legitimation of the former is achieved by the latter. Experimental economists, for instance, may still argue that (in the investigation of behaviour and its implications for policy issues) experiments and data analysis alone suffice. They may rightly ask: What is the additional usefulness of an ABM? If an ABM always needs to be justified by an experiment and then validated by a statistical model of its output, then the method might not be essential at all. Orthodox economists skip the Agent-Based Modelling part: They build behavioural experiments, gather (rich) data, run econometric models and make predictions, without the need (at least as they see it) to build any computational representation. Of course, the usefulness of models lies in the premise that they may tell us something that experiments alone cannot (see Knudsen et al. 2019). But progress needs to be made in understanding (and perhaps reconciling) these divergent positions. The social simulation community therefore needs to be clearer about exactly what ABMs can contribute beyond the limitations of an experiment, especially when addressing audiences of non-modellers (Ballard et al. 2021). A model is valuable not only when rigorously validated against data, but also whenever it makes sense of the data in ways that traditional methods cannot.

Where Now?

Researchers usually have more enthusiasm than they have time. In order to make things happen in an academic context it is not enough to have good ideas, people need to sign up and run with them. There are many things that stand a reasonable chance of improving the profile and practice of experiments in Agent-Based Modelling (regular sessions at SSC, systematic reviews, practical guidelines and evaluated case studies, discussion groups, books or journal special issues, training and funding applications that build networks and teams) but to a great extent, what happens will be decided by those who make it happen. The organisers of this round table (Nanda Wijermans and Edmund Chattoe-Brown) are very keen to support and coordinate further activity and this summary of discussions is the first step to promote that. We hope to hear from you.

References

Angus, Simon D. and Hassani-Mahmooei, Behrooz (2015) ‘“Anarchy” Reigns: A Quantitative Analysis of Agent-Based Modelling Publication Practices in JASSS, 2001-2012’, Journal of Artificial Societies and Social Simulation, 18(4), October, article 16, <http://jasss.soc.surrey.ac.uk/18/4/16.html>. doi:10.18564/jasss.2952

Ballard, Timothy, Palada, Hector, Griffin, Mark and Neal, Andrew (2021) ‘An Integrated Approach to Testing Dynamic, Multilevel Theory: Using Computational Models to Connect Theory, Model, and Data’, Organizational Research Methods, 24(2), April, pp. 251-284. doi: 10.1177/1094428119881209

Basole, Rahul C., Bodner, Douglas A. and Rouse, William B. (2013) ‘Healthcare Management Through Organizational Simulation’, Decision Support Systems, 55(2), May, pp. 552-563. doi:10.1016/j.dss.2012.10.012

Boero, Riccardo, Bravo, Giangiacomo, Castellani, Marco and Squazzoni, Flaminio (2010) ‘Why Bother with What Others Tell You? An Experimental Data-Driven Agent-Based Model’, Journal of Artificial Societies and Social Simulation, 13(3), June, article 6, <https://www.jasss.org/13/3/6.html>. doi:10.18564/jasss.1620

Bornstein, Brian H. (1999) ‘The Ecological Validity of Jury Simulations: Is the Jury Still Out?’ Law and Human Behavior, 23(1), February, pp. 75-91. doi:10.1023/A:1022326807441

Burton, Richard M. and Obel, Børge (2018) ‘The Science of Organizational Design: Fit Between Structure and Coordination’, Journal of Organization Design, 7(1), December, article 5. doi:10.1186/s41469-018-0029-2

Derbyshire, James (2020) ‘Answers to Questions on Uncertainty in Geography: Old Lessons and New Scenario Tools’, Environment and Planning A: Economy and Space, 52(4), June, pp. 710-727. doi:10.1177/0308518X19877885

Dinesen, Peter Thisted (2013) ‘Where You Come From or Where You Live? Examining the Cultural and Institutional Explanation of Generalized Trust Using Migration as a Natural Experiment’, European Sociological Review, 29(1), February, pp. 114-128. doi:10.1093/esr/jcr044

Doran, Jim (1998) ‘Simulating Collective Misbelief’, Journal of Artificial Societies and Social Simulation, 1(1), January, article 3, <https://www.jasss.org/1/1/3.html>.

Janssen, Marco A., Holahan, Robert, Lee, Allen and Ostrom, Elinor (2010) ‘Lab Experiments for the Study of Social-Ecological Systems’, Science, 328(5978), 30 April, pp. 613-617. doi:10.1126/science.1183532

Kagel, John H. and Roth, Alvin E. (eds.) (1995) The Handbook of Experimental Economics (Princeton, NJ: Princeton University Press).

Knudsen, Thorbjørn, Levinthal, Daniel A. and Puranam, Phanish (2019) ‘Editorial: A Model is a Model’, Strategy Science, 4(1), March, pp. 1-3. doi:10.1287/stsc.2019.0077

Levine, Gustav and Parkinson, Stanley (1994) Experimental Methods in Psychology (Hillsdale, NJ: Lawrence Erlbaum Associates).

Lindahl, Therese, Janssen, Marco A. and Schill, Caroline (2021) ‘Controlled Behavioural Experiments’, in Biggs, Reinette, de Vos, Alta, Preiser, Rika, Clements, Hayley, Maciejewski, Kristine and Schlüter, Maja (eds.) The Routledge Handbook of Research Methods for Social-Ecological Systems (London: Routledge), pp. 295-306. doi:10.4324/9781003021339-25

Ramanath, Ana Maria and Gilbert, Nigel (2004) ‘The Design of Participatory Agent-Based Social Simulations’, Journal of Artificial Societies and Social Simulation, 7(4), October, article 1, <https://www.jasss.org/7/4/1.html>.

Reypens, Charlotte and Levine, Sheen S. (2018) ‘Behavior in Behavioral Strategy: Capturing, Measuring, Analyzing’, in Behavioral Strategy in Perspective, Advances in Strategic Management Volume 39 (Bingley: Emerald Publishing), pp. 221-246. doi:10.1108/S0742-332220180000039016

Squazzoni, Flaminio and Casnici, Niccolò (2013) ‘Is Social Simulation a Social Science Outstation? A Bibliometric Analysis of the Impact of JASSS’, Journal of Artificial Societies and Social Simulation, 16(1), January, article 10, <http://jasss.soc.surrey.ac.uk/16/1/10.html>. doi:10.18564/jasss.2192

Taillandier, Patrick, Grignard, Arnaud, Marilleau, Nicolas, Philippon, Damien, Huynh, Quang-Nghi, Gaudou, Benoit and Drogoul, Alexis (2019) ‘Participatory Modeling and Simulation with the GAMA Platform’, Journal of Artificial Societies and Social Simulation, 22(2), March, article 3, <https://www.jasss.org/22/2/3.html>. doi:10.18564/jasss.3964

Tykhonov, Dmytro, Jonker, Catholijn, Meijer, Sebastiaan and Verwaart, Tim (2008) ‘Agent-Based Simulation of the Trust and Tracing Game for Supply Chains and Networks’, Journal of Artificial Societies and Social Simulation, 11(3), June, article 1, <https://www.jasss.org/11/3/1.html>.

Vuculescu, Oana (2017) ‘Searching Far Away from the Lamp-Post: An Agent-Based Model’, Strategic Organization, 15(2), May, pp. 242-263. doi:10.1177/1476127016669869

Zelditch, Morris Junior (2007) ‘Laboratory Experiments in Sociology’, in Webster, Murray Junior and Sell, Jane (eds.) Laboratory Experiments in the Social Sciences (New York, NY: Elsevier), pp. 183-197.


Notes

[i] This event was organised (and the resulting article was written) as part of “Towards Realistic Computational Models of Social Influence Dynamics” a project funded through ESRC (ES/S015159/1) by ORA Round 5 and involving Bruce Edmonds (PI) and Edmund Chattoe-Brown (CoI). More about SSC2021 (Social Simulation Conference 2021) can be found at https://ssc2021.uek.krakow.pl

[ii] This issue is actually very challenging for social science more generally. When considering interventions in social systems, knowing and acting might be so deeply intertwined (Derbyshire 2020) that interventions may modify the same behaviours that an experiment is aiming to understand.

[iii] In addition, experiments often require institutional ethics approval (but so do interviews, gaming activities and other sorts of empirical research, of course), something with which non-empirical Agent-Based Modellers may have little experience.

[iv] Chattoe-Brown had interesting personal experience of this. He took part in a simple team gaming exercise about running a computer firm. The team quickly worked out that the game assumed an infinite return to advertising (so you could have a computer magazine consisting entirely of adverts) independent of the actual quality of the product. They thus simultaneously performed very well in the game from the perspective of an external observer but remained deeply sceptical that this was a good lesson to impart about running an actual firm. But since the coordinators never asked the team members for their subjective view, they may have assumed that the simulation was also a success in its didactic mission.

[v] We should also not assume it is best to set our own standards from scratch. It may be valuable to attempt integration with existing approaches, like qualitative validity (https://conjointly.com/kb/qualitative-validity/) particularly when these are already attempting to be multidisciplinary and/or to bridge the gap between, for example, qualitative and quantitative data.

[vi] Although journals also face such a collective action problem at a different level. If they are too exacting relative to their status and existing practice, researchers will simply publish elsewhere.


Dino Carpentras, Edmund Chattoe-Brown, Bruce Edmonds, Cesar García-Diaz, Christian Kammler, Anna Pagani and Nanda Wijermans (2021) Where Now For Experiments In Agent-Based Modelling? Report of a Round Table as Part of SSC2021. Review of Artificial Societies and Social Simulation, 2nd November 2021. https://rofasss.org/2021/11/02/round-table-ssc2021-experiments/


Does It Take Two (And A Creaky Search Engine) To Make An Outstation? Hunting Highly Cited Opinion Dynamics Articles in the Journal of Artificial Societies and Social Simulation (JASSS)

By Edmund Chattoe-Brown

In an important article, Squazzoni and Casnici (2013) raise the issue of how social simulation (as manifested in the Journal of Artificial Societies and Social Simulation – hereafter JASSS – the journal that has probably published more of this kind of research, for longer, than any other) cites and is cited in the wider scientific community. They discuss this in terms of social simulation being a potential “outstation” of social science (but better integrated into physical science and computing). This short note considers the same argument in reverse: as an important site of social simulation research, is it the case that JASSS is effectively representing research done more widely across the sciences?

The method used to investigate this was extremely simple (and could thus easily be extended and replicated). On 28.08.21, using the search term “opinion dynamics” in “all fields”, all sources from Web of Science (www.webofknowledge.com, hereafter WOS) that were flagged as “highly cited” were selected as a sample. For each article (only articles turned out to be highly cited), the title was searched in JASSS and the number of hits recorded. Common sense was applied in this search process to maximise the chances of success. So if a title had two sub-clauses, these were searched jointly as quotations (to avoid the “hits” being very sensitive to the reproduction of punctuation linking the clauses). In addition, the title of the journal in which each article appeared was searched, to give a wider sense of how well the relevant journal is known in JASSS.
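The query-construction step described above can be sketched in a few lines. This is an illustrative reconstruction, not the exact procedure used: the helper name and the rule for what counts as linking punctuation are assumptions.

```python
import re

def title_to_query(title: str) -> str:
    """Split an article title into sub-clauses at linking punctuation
    and quote each part, so that hits do not depend on reproducing
    the punctuation between clauses exactly (e.g. a colon)."""
    # Split on colons, question marks, or a spaced dash linking clauses.
    parts = [p.strip() for p in re.split(r"[:?]|\s[-–—]\s", title) if p.strip()]
    # Search the quoted parts jointly.
    return " ".join(f'"{p}"' for p in parts)

print(title_to_query("Heterophilious Dynamics Enhances Consensus"))
# "Heterophilious Dynamics Enhances Consensus"
print(title_to_query("Opinion Dynamics in Social Networks With "
                     "Hostile Camps: Consensus vs. Polarization"))
# "Opinion Dynamics in Social Networks With Hostile Camps" "Consensus vs. Polarization"
```

A title with no linking punctuation simply becomes a single quoted phrase.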

However, now we come to the issue of the creaky search engine (as well as other limitations of quick and dirty searches). Obviously, searching for the exact title will not find variants of that title with spelling mistakes or attempts to standardise spelling (i.e. changing behavior to behaviour). Further, it turns out that the Google search engine (which JASSS uses) does not deliver the consistency that is often assumed of it (http://jdebp.uk/FGA/google-result-counts-are-a-meaningless-metric.html). For example, when I searched for “SIAM Review” I mostly got 77 hits, rather often 37 hits and very rarely 0 or 1 hits. (PDFs are available for three of these outcomes from the author but the fourth could not be reproduced to be recorded in the time available.) This happened even when one search took place seconds after another, so it is not, for example, a result of substantive changes to the content of JASSS. To deal with this problem I tried to confirm the presence of a particular article by searching jointly for all its co-authors. Mostly this approach gave a similar result (where it did not, this is noted in the table below). In addition, wherever there were a relatively large number of hits for a specific search, some of these were usually not the ones intended. (For example, no hit on the term “global challenges” actually turned out to be for the journal Global Challenges.) JASSS also often gives an oddly inconsistent number of hits for a specific article: it may appear as PDF and HTML as well as in multiple indices, or may occur just once. (This discouraged attempts to go from hits to the specific number of unique articles citing these WOS sources. As it turns out, this additional detail would have added little to the headline result.)
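The instability just described could be checked systematically by repeating the same query and tallying the distribution of reported hit counts. A minimal sketch, with the search function injected as a parameter (any real code for querying the JASSS/Google engine is deliberately left out and stubbed):

```python
from collections import Counter
from typing import Callable

def hit_count_distribution(search: Callable[[str], int], query: str,
                           repeats: int = 20) -> Counter:
    """Run the same query repeatedly and count how often each hit
    count is reported, exposing any engine inconsistency."""
    return Counter(search(query) for _ in range(repeats))

# Illustrative stub standing in for the real (inconsistent) engine;
# the observed "SIAM Review" counts were mostly 77, sometimes 37.
replies = iter([77, 77, 37, 77, 77, 77, 37, 77])
print(hit_count_distribution(lambda q: next(replies), '"SIAM Review"', repeats=8))
# Counter({77: 6, 37: 2})
```

A consistent engine would produce a distribution with a single entry; more than one entry signals the problem reported above.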

The term “opinion dynamics” was chosen somewhat arbitrarily (for reasons connected with other research) and it is not claimed that this term is even close to a definitive way of capturing any models connected with opinion/attitude change. Nonetheless, it is clear that the number of hits and the type of articles reported on WOS (which is curated and quality controlled) are sufficient (and sufficiently relevant) for this to be a serviceable search term to identify a solid field of research in JASSS (and elsewhere). I shall return to this issue.

The results, shown in the table below, are striking on several counts. (All these sources are fully cited in the references at the end of this article.) Most noticeably, JASSS barely cites a set of articles that are very widely cited elsewhere. Because these are highly cited in WOS, this cannot be because they are too new or too inaccessible. The second point is the huge discrepancy in citations for the one article on the WOS list that appears in JASSS itself (Flache et al. 2017). Thirdly, although some of these articles appear in journals that JASSS otherwise does not cite (like Global Challenges and Dynamic Games and Applications), others appear in journals that are known to JASSS and generally cited (like SIAM Review).

| Reference | WOS Citations | Article Title Hits in JASSS | Journal Title Hits in JASSS |
|---|---|---|---|
| Acemoğlu and Ozdaglar (2011) | 301 | 0 (1 based on joint authors) | 2 |
| Motsch and Tadmor (2014) | 214 | 0 | 77 |
| Van Der Linden et al. (2017) | 191 | 0 | 6 (but none for the journal) |
| Acemoğlu et al. (2013) | 186 | 1 | 2 (but 1 article) |
| Proskurnikov et al. (2016) | 165 | 0 | 9 |
| Dong et al. (2017) | 147 | 0 | 48 (but rather few for the journal) |
| Jia et al. (2015) | 118 | 0 | 77 |
| Dong et al. (2018) | 117 | 0 (1 based on joint authors) | 48 (but rather few for the journal) |
| Flache et al. (2017) | 86 | 58 (17 based on joint authors) | N/A |
| Ureña et al. (2019) | 72 | 0 | 6 |
| Bu et al. (2020) | 56 | 0 | 5 |
| Zhang et al. (2020) | 55 | 0 | 33 (but only some of these are for the journal) |
| Xiong et al. (2020) | 28 | 0 | 1 |
| Carrillo et al. (2020) | 13 | 0 | 0 |
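The headline pattern in the table can be confirmed with a minimal tally; the numbers below are transcribed from the article-title-hits column above (taking the plain hit counts, not the joint-author variants):

```python
# Article-title hits in JASSS for the 14 highly cited WOS articles,
# in the row order of the table above.
jasss_hits = [0, 0, 0, 1, 0, 0, 0, 0, 58, 0, 0, 0, 0, 0]
uncited = sum(1 for h in jasss_hits if h == 0)
print(f"{uncited} of {len(jasss_hits)} articles get no title hits in JASSS")
# 12 of 14 articles get no title hits in JASSS
```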

One possible interpretation of this result is simply that none of the most highly cited articles in WOS featuring the term “opinion dynamics” happen to be more than incidentally relevant to the scientific interests of JASSS. On consideration, however, this seems a rather improbable coincidence. Firstly, these articles were chosen exactly because they are highly cited, so we would have to explain how they could be perceived as so useful generally but specifically not in JASSS. Secondly, the same term (“opinion dynamics”) consistently generates 254 hits in JASSS, suggesting that the problem isn’t a lack of overlap in terminology or research interests.

This situation, however, creates a problem for more conclusive explanation. The state of affairs here is not that these articles are being cited and then rejected on scientific grounds given the interests of JASSS (thus providing arguments I could examine). It is that they are barely being cited at all. Unfortunately, it is almost impossible to establish why something is not happening. Perhaps JASSS authors are not aware of these articles to begin with. Perhaps they are aware but do not see the wider scientific value of critiquing them or attempting to engage with their irrelevance in print.

But, given that the problem is non-citation, my concern can be made more persuasive (perhaps as persuasive as it can be given problems of convincingly explaining an absence) by investigating the articles themselves. (My thanks are due to Bruce Edmonds for encouraging me to strengthen the argument in this way.) There are definitely some recurring patterns in this sample. Firstly, a significant proportion of the articles are highly mathematical and therefore (as Agent-Based Modelling often criticises) rely on extreme simplifying assumptions and toy examples. Even here, however, it is not self-evident that such articles should not be cited in JASSS merely because they are mathematical. JASSS has itself published relatively mathematical articles and, if an article contains a mathematical model that could be “agentised” (thus relaxing its extreme assumptions) which is no less empirical than similar models in JASSS (or has particularly interesting behaviours), then it is hard to see why it should not be discussed by at least a few JASSS authors. A clear example of this is provided by Acemoğlu et al. (2013), which argues that existing opinion dynamics models fail to produce the ongoing fluctuations of opinion observed in real data (see, for example, Figures 1-3 in Chattoe-Brown 2014, which also raises concerns about the face validity of popular social simulations of opinion dynamics). In fact, the assumptions of this model could easily be questioned (and real data involves turning points and not just fluctuations) but the point is that JASSS articles are not citing it and rejecting it based on argument but simply not citing it. A model capable of generating ongoing opinion fluctuations (however imperfect) is simply too important to the current state of opinion dynamics research in social simulation not to be considered at all.
Another (though less conclusive) example is Motsch and Tadmor (2014), which presents a model suggesting (counter-intuitively) that interaction based on heterophily can better achieve consensus than interaction based on homophily. Of course, one can reject such an assumption on empirical grounds but JASSS is not currently doing that (and in fact the term heterophily is unknown in the journal except for the title of a cited article).

Secondly, there are also a number of articles which, while not providing important results, seem no less plausible or novel than typical OD articles that are published in JASSS. For example, Jia et al. (2015) add self-appraisal and social power to a standard OD model. Between debates, agents amend the efficacy they believe that they and others have in terms of swaying the outcome and take that into account going forward. Proskurnikov et al. (2016) present the results of a model in which agents can have negative ties with each other (as well as the more usual positive ones) and thus consider the coevolution of positive/negative sentiments and influence (describing what they call hostile camps, i.e. groups with positive ties to each other and negative ties to other groups). This is distinct from the common repulsive effect in OD models, where agents dislike the opinions of others (rather than disliking the others themselves).

Finally, both Dong et al. (2017) and Zhang et al. (2020) reach for the idea (through modelling) that experts and leaders in OD models may not just be randomly scattered through the population as types but may exist because of formal organisations or accidents of social structure: This particular agent is either deliberately appointed to have more influence or happens to have it because of their network position.

On a completely different tack, two articles (Dong et al. 2018 and Acemoğlu and Ozdaglar 2011) are literature reviews or syntheses on relevant topics and it is hard to see how such broad-ranging articles could have so little value to OD research in JASSS.

It will be admitted that some of the articles in the sample are hard to evaluate with certainty. Mathematical approaches often seem to be more interested in generating mathematics than in justifying its likely value. This is particularly problematic when combined with a suggestion that the product of the research may be instrumental algorithms (designed to get things done) rather than descriptive ones (designed to understand social behaviour). An example of this is the several articles which talk about achieving consensus without really explaining whether this is a technical goal (for example, in a neural network) or a social phenomenon and, if the latter, whether this places constraints on what is legitimate: you can reach consensus by debate but not by shooting dissenters!

But as well as specific ideas in specific models, this sample of articles also suggests a different emphasis from that currently found within JASSS OD research. For example, there is much more interest in deliberately achieving consensus (and the corresponding hazards of manipulation or misinformation impeding that). Reading these articles collectively gives a sense that JASSS OD models are very much liberal democratic: agents honestly express their views (or at most are somewhat reticent to protect themselves). They decently expect the will of the people to prevail. They do not lie strategically to sway the influential, spread rumours to discredit the opinions of opponents or flood the debate with bots. Again, this darker vision is no more right a priori than the liberal democratic one, but JASSS should at least be engaging with articles modelling (or providing data on – see Van Der Linden et al. 2017) such phenomena in an OD context. (Although misinformation is mentioned in some OD articles in JASSS, it does not seem to be modelled. There also seems to be another surprising glitch in the search engine, which considers the term “fake news” to be a hit for misinformation!) This also puts a new slant on an ongoing challenge in OD research: identifying a plausible relationship between fact and opinion. Is misinformation a different field of research (on the grounds that opinions can never be factually wrong) or is it possible for the misinformed to develop mis-opinions? (Those that they would change if what they knew changed.) Is it really the case that Brexiteers, for example, are completely indifferent to the economic consequences which will reveal themselves, or did they simply have mistaken beliefs about how high those costs might turn out to be, which will cause them to regret their decision at some later stage?

Thus to sum up, while some of the articles in the sample can be dismissed as either irrelevant to JASSS or having a potential relevance that is hard to establish, the majority cannot reasonably be regarded in this way (and a few are clearly important to the existing state of OD research.) While we cannot explain why these articles are not in fact cited, we can thus call into question one possible (Panglossian) explanation for the observed pattern (that they are not cited because they have nothing to contribute).

Apart from the striking nature of the result and its obvious implication (if social simulators want to be cited more widely they need to make sure they are also citing the work of others appropriately) this study has two wider (related) implications for practice.

Firstly, systematic literature reviewing (see, for example, Hansen et al. 2019 – not published in JASSS) needs to be better enforced in social simulation: “systematic literature review” gets just 7 hits in JASSS. It is not enough to cite just what you happen to have read or models that resemble your own; you need to be citing what the community might otherwise not be aware of or what challenges your own model assumptions. (Although, in my judgement, key assumptions of Acemoğlu et al. 2013 are implausible, I don’t think that I could justify non-subjectively that they are any more implausible than those of the Zaller-Deffuant model – Malarz et al. 2011 – given the huge awareness discrepancy which the two models manifest in social simulation.)

Secondly, we need to rethink the nature of literature reviewing as part of progressive research. I have used “opinion dynamics” here not because it is the perfect term to identify all models of opinion and attitude change but because it throws up enough hits to show that this term is widely used in social simulation. Because I have clearly stated my search term, others can critique it and extend my analysis using other relevant terms like “opinion change” or “consensus formation”. A literature review that is just a bunch of arbitrary stuff cannot be critiqued or improved systematically (rather than nit-picked for specific omissions, as reviewers often do, and even then the critique cannot tell what should have been included if there are no clearly stated search criteria). It should not be possible for JASSS (and the social simulation community it represents) simply to disregard an article whose implications for OD are as potentially important as those of Acemoğlu et al. (2013). Even if this article turned out to be completely wrong-headed, we need to have enough awareness of it to be able to say why before setting it aside. (Interestingly, the one citation it does receive in JASSS can be summarised as “there are some other models broadly like this” with no detailed discussion at all – and thus no clear statement of how the model presented in the citing article adds to previous models – but uninformative citation is a separate problem.)

Acknowledgements

This article was written as part of “Towards Realistic Computational Models of Social Influence Dynamics”, a project funded through ESRC (ES/S015159/1) by ORA Round 5.

References

Acemoğlu, Daron and Ozdaglar, Asuman (2011) ‘Opinion Dynamics and Learning in Social Networks’, Dynamic Games and Applications, 1(1), March, pp. 3-49. doi:10.1007/s13235-010-0004-1

Acemoğlu, Daron, Como, Giacomo, Fagnani, Fabio and Ozdaglar, Asuman (2013) ‘Opinion Fluctuations and Disagreement in Social Networks’, Mathematics of Operations Research, 38(1), February, pp. 1-27. doi:10.1287/moor.1120.0570

Bu, Zhan, Li, Hui-Jia, Zhang, Chengcui, Cao, Jie, Li, Aihua and Shi, Yong (2020) ‘Graph K-Means Based on Leader Identification, Dynamic Game, and Opinion Dynamics’, IEEE Transactions on Knowledge and Data Engineering, 32(7), July, pp. 1348-1361. doi:10.1109/TKDE.2019.2903712

Carrillo, J. A., Gvalani, R. S., Pavliotis, G. A. and Schlichting, A. (2020) ‘Long-Time Behaviour and Phase Transitions for the Mckean–Vlasov Equation on the Torus’, Archive for Rational Mechanics and Analysis, 235(1), January, pp. 635-690. doi:10.1007/s00205-019-01430-4

Chattoe-Brown, Edmund (2014) ‘Using Agent Based Modelling to Integrate Data on Attitude Change’, Sociological Research Online, 19(1), February, article 16, <http://www.socresonline.org.uk/19/1/16.html>. doi:10.5153/sro.3315

Dong, Yucheng, Ding, Zhaogang, Martínez, Luis and Herrera, Francisco (2017) ‘Managing Consensus Based on Leadership in Opinion Dynamics’, Information Sciences, 397-398, August, pp. 187-205. doi:10.1016/j.ins.2017.02.052

Dong, Yucheng, Zhan, Min, Kou, Gang, Ding, Zhaogang and Liang, Haiming (2018) ‘A Survey on the Fusion Process in Opinion Dynamics’, Information Fusion, 43, September, pp. 57-65. doi:10.1016/j.inffus.2017.11.009

Flache, Andreas, Mäs, Michael, Feliciani, Thomas, Chattoe-Brown, Edmund, Deffuant, Guillaume, Huet, Sylvie and Lorenz, Jan (2017) ‘Models of Social Influence: Towards the Next Frontiers’, Journal of Artificial Societies and Social Simulation, 20(4), October, article 2, <http://jasss.soc.surrey.ac.uk/20/4/2.html>. doi:10.18564/jasss.3521

Hansen, Paula, Liu, Xin and Morrison, Gregory M. (2019) ‘Agent-Based Modelling and Socio-Technical Energy Transitions: A Systematic Literature Review’, Energy Research and Social Science, 49, March, pp. 41-52. doi:10.1016/j.erss.2018.10.021

Jia, Peng, MirTabatabaei, Anahita, Friedkin, Noah E. and Bullo, Francesco (2015) ‘Opinion Dynamics and the Evolution of Social Power in Influence Networks’, SIAM Review, 57(3), pp. 367-397. doi:10.1137/130913250

Malarz, Krzysztof, Gronek, Piotr and Kulakowski, Krzysztof (2011) ‘Zaller-Deffuant Model of Mass Opinion’, Journal of Artificial Societies and Social Simulation, 14(1), 2, <https://www.jasss.org/14/1/2.html>. doi:10.18564/jasss.1719

Motsch, Sebastien and Tadmor, Eitan (2014) ‘Heterophilious Dynamics Enhances Consensus’, SIAM Review, 56(4), pp. 577-621. doi:10.1137/120901866

Proskurnikov, Anton V., Matveev, Alexey S. and Cao, Ming (2016) ‘Opinion Dynamics in Social Networks With Hostile Camps: Consensus vs. Polarization’, IEEE Transactions on Automatic Control, 61(6), June, pp. 1524-1536. doi:10.1109/TAC.2015.2471655

Squazzoni, Flaminio and Casnici, Niccolò (2013) ‘Is Social Simulation a Social Science Outstation? A Bibliometric Analysis of the Impact of JASSS’, Journal of Artificial Societies and Social Simulation, 16(1), 10, <http://jasss.soc.surrey.ac.uk/16/1/10.html>. doi:10.18564/jasss.2192

Ureña, Raquel, Chiclana, Francisco, Melançon, Guy and Herrera-Viedma, Enrique (2019) ‘A Social Network Based Approach for Consensus Achievement in Multiperson Decision Making’, Information Fusion, 47, May, pp. 72-87. doi:10.1016/j.inffus.2018.07.006

Van Der Linden, Sander, Leiserowitz, Anthony, Rosenthal, Seth and Maibach, Edward (2017) ‘Inoculating the Public against Misinformation about Climate Change’, Global Challenges, 1(2), 27 February, article 1600008. doi:10.1002/gch2.201600008

Xiong, Fei, Wang, Ximeng, Pan, Shirui, Yang, Hong, Wang, Haishuai and Zhang, Chengqi (2020) ‘Social Recommendation With Evolutionary Opinion Dynamics’, IEEE Transactions on Systems, Man, and Cybernetics: Systems, 50(10), October, pp. 3804-3816. doi:10.1109/TSMC.2018.2854000

Zhang, Zhen, Gao, Yuan and Li, Zhuolin (2020) ‘Consensus Reaching for Social Network Group Decision Making by Considering Leadership and Bounded Confidence’, Knowledge-Based Systems, 204, 27 September, article 106240. doi:10.1016/j.knosys.2020.106240


Chattoe-Brown, E. (2021) Does It Take Two (And A Creaky Search Engine) To Make An Outstation? Hunting Highly Cited Opinion Dynamics Articles in the Journal of Artificial Societies and Social Simulation (JASSS). Review of Artificial Societies and Social Simulation, 19th August 2021. https://rofasss.org/2021/08/19/outstation/