In recent decades, population censuses have become complex engines of human–machine interaction. The digitisation of public statistics has sparked new methods, objects of concern and arenas of measurement through which official numbers maintain their relevance before society (Ruppert and Scheel 2021). Still, in many countries, population censuses continue to rely on questionnaires filled out through in-person encounters between enumerators and citizens. Here, too, disquiets about the legitimacy, cost and timeliness of public numbers have led to the incorporation of computer-assisted tools into data collection and processing protocols (Bruno et al. 2020; Jhamba et al. 2020; Mrkić 2020). Crucially, these interconnected and real-time information-generating systems are crafting new spaces for professional and political practice through the production and everyday management of numbers.
Some of these new procedures aim explicitly at controlling data quality and avoiding fraudulent data collection behaviour. Fraudulent practices can entail a range of grey-zone activities by which data operators attempt to forge data entries without – or in some cases beyond – the responses given by their informants, in hopes of expediting collection and obtaining greater – or faster – financial compensation. With the arrival of computerised technologies to collect, process and circulate information, fraudulent behaviour has taken on new forms. As census-takers now leave behind extensive digital footprints of their daily movements, data operators have gained additional tools to monitor progress through novel practices of framing, locating and addressing ‘possible enumerator fraud attempts’ (UN 2019: 96).
Brazil pioneered computer-assisted personal interviewing (CAPI) technologies for data collection in 2007 during its decennial agricultural census.1 In its 2010 population census round, the Brazilian Institute for Geography and Statistics (IBGE) – one of the oldest and most respected national statistics organisations in the world, in charge of producing and processing Brazilian official statistics – deployed over 220,000 smartphones (called ‘DMCs’, the Portuguese acronym for ‘mobile collection devices’) equipped with global positioning system (GPS) receivers for data collection and supervision. That census also trailblazed SIGC, an integrated system for management and control used at data collection stations across the country to monitor the wireless transmission of field data and generate analytics through ‘indicators’ capable of improving management, automating operation flows and ‘evaluating the enumerators’ work quality’ (Bianchini 2011) in real time. Fine-tuned and perfected over the last thirteen years, these interconnected systems were now ready to run in Brazil's 2022 population census.
In this article, we interrogate the ethical work enacted by indicators as political technologies of everyday data production and governance. We second extant understandings of censuses as collections of material semiotic practices with and around numbers (Verran 2012), through which ideas of futurity are made amenable to intervention. Yet we do not locate data politics mainly in expert operations of tabulating, encoding, calibrating, sieving and classifying (Anderson 2015; Darrow 2002; Emigh et al. 2016; Lanata-Briones et al. 2022; Mezey 2003; Ruppert 2007; Ruppert and Scheel 2021; Scott 1998; Thorvaldsen 2017). Instead, we follow traces of data politics in everyday affective negotiations around the work of fraud-detecting algorithms. We ask: what kinds of truths surface as these machine infrastructures meet the human infrastructure of data collection, and how do indicators come to index competing modes of veridiction?
To answer these questions, we focus on a pivotal moment in Brazil's recent statistical history. Since 2018, a combination of far-right politics and neoliberal restructuring, aggravated by the pandemic, has robbed IBGE of the financial resources necessary to implement the 2020 census, leading to cuts in the questionnaire, successive postponements and new forms of expert number activism (Kopper 2023). The public discussion of the implications of a ‘statistical blackout’ (Barbosa and Szwako 2019) created a unique empirical and methodological window to conduct participant observation and in-depth interviews with IBGE experts at the national and regional levels between 2021 and 2022.
Through this ongoing empirical work, we make two theoretical interventions. First, we contend that a pivotal dimension of data politics within the census machinery lies in the imaginative and moral work performed by data controllers and field operators – whereby they seek to stabilise, qualify and amass knowledge. To explore how, in practice, indicators are imbued with meaning and the power to anticipate the future, we worked closely with IBGE professionals at one of their largest regional subunits in Southern Brazil, following the training of data operators immediately before the beginning of data collection. Taking the bureaucratic form of meetings, these activities ‘are key sites through which social, political, temporal, spatial, and material circumstances are constituted and transformed’ (Brown et al. 2017). At first glance, population censuses seem to befit Bruno Latour's (1987) notion of a ‘centre of calculation’, in so far as they expand their networks through iterative movements of validation and dissemination from the centre towards the peripheries of the statistical system (Jöns 2011). We challenge this assumption by looking into the situated workings of indicators, the fissures and noise (Larkin 2008) that emerge with their displacement and deployment along the implementation chain, and the differential affordances and capacities for action that fraud-detecting indicators come to embody in such bureaucratic settings (Hull 2012).
Second, as stakeholders use resources to stabilise definitions within the bureaucratic form of training events on the fringes of centres of calculation, indicators appear less as univocal future-modelling instruments that standardise at a distance (Porter 1995). Instead, they index present-day struggles around terms and distributions that become, themselves, the basis for a politics of futurity whereby the future as an affective-technological temporality is apprehended, measured and acted upon in the present (Valentine and Hassoun 2019). Drawing on Michel Foucault's political history of truth-telling, we label these competing visions of reality encoding and the ‘different games of truth and falsehood’ (2014: 20) they enact as modes of veridiction. Within the hierarchical labour structure of training activities, we identify two competing forms of trust and truth-telling, which we label probabilistic and performative. Paraphrasing Ian Hacking's (1990) landmark notion of the ‘taming of chance’, our ethnography shows that while census trainers seek to tame temporarily hired fieldworkers through the translation of official enumeration practices and the probabilistic work of indicators, temporarily hired fieldworkers tame indicators when they see them as hemming in the inscription of reality. Much like audits – which create a form of ‘accountability at a distance’ that recasts (and is recast by) the behaviour of individuals and organisations and their understandings of accountability, responsibility and ethics (Strathern 2000) – fraud-detecting indicators also enact new meanings of data quality control that recast (and are recast by) the contingent ethical work of operators on the ground.
The article consists of four parts. The first part discusses the technopolitical and bureaucratic infrastructure of information dissemination whereby concepts and processes travel from the centre to the peripheries of the census operation. The second and third parts flesh out emic understandings of the technical and political workings of fraud-detecting algorithms through the ethnographic description of training activities. We then show how these competing visions of data enable a politics of futurity that is built on imaginaries of data production, organisation and purification. The fourth part concludes our study by discussing how an ethnography of fraud-detecting algorithms illuminates broader technical and political questions around the ethical work sustaining population censuses in the current data moment, where (digitised) public numbers vie for space alongside private, personal and post-factual modes of veridiction.
Passing on Concepts
We first met Marisa in May 2022, a few months before the start of data collection, when she was hustling to prepare the first session of training activities at the regional headquarters of IBGE in Southern Brazil. Marisa holds a bachelor's degree in social science and has worked as a permanent staff member for eighteen years. As a technologist, she processes cartographic changes in the sectorial grids of census tracts and handles local updates to the National Database of Addresses for Statistical Purposes (CNEFE). Yet, as is common in the months preceding census data collection, most staff members are redirected to work on specific preparatory tasks. Celso, the local census coordinator, had designated Marisa, João and Adair to pass on (repassar) information they had first received from higher-ranked staff who had attended the initial training at IBGE's national headquarters in Rio de Janeiro the month before.
The fourteen-day-long intensive workshop was designed to educate sub-area coordinators – or simply ‘coordinators’2 (as they were addressed) – to act as proxies in further delegating information to field supervisors and enumerators in subsequent training activities. They were hired in 2019 when Brazil's population census was still slated to occur the year after. However, as IBGE struggled with successive deferments, these temporary functionaries were integrated into staff activities. They were deployed in other surveys – including the institute's well-regarded household sample surveys (PNAD) and, during the pandemic, its newly designed COVID-PNAD – which gave them critical operational skills.
Known locally as repasses, such training activities are a critical nexus in the census implementation chain, not least because they entail complex translations across the various operating links, from statisticians to on-the-ground field and data-processing staff (UNECA 1971). In our conversations with IBGE's former presidents and experts, we were recurrently presented with statements about the need for a noise-free communication of ‘concepts’ from the directive instances where variables are debated, questionnaires designed, and informational networks implemented, in Rio de Janeiro, to the nitty-gritty terrains of data collection. Such an intricate, hierarchical and centralised knowledge transmission chain was intended to keep local operators ‘faithful’ to the concepts designed by the sede (‘headquarters’), we were told by Marisa and João while accompanying them and other staff to lunch at the beginning of the second week of activities.
Passing on ‘concepts’ while preserving their meaning across the vertical chain of command was indeed the central goal of repasses. Coordinators played a critical role in this process, as they needed to capture the ‘conceptual essence’ of the census and then travel to other parts of the state to train supervisors, who would then train small groups of enumerators – in a downward spiral that would continue throughout data collection. Owing to successive budgetary cuts, IBGE was forced, for the first time in a Brazilian population census, to recruit temporary operatives to work as sub-area coordinators, the highest managerial level below the regional census coordinators. We witnessed senior staff object to this decision many times during fieldwork. One person questioned: ‘Are they actually getting the concepts right? You know, they don't have the institutional commitment we have, and after their three-year contracts end, we are left holding the bag’.
Such fears of conceptual ‘distortion’ along the transmission chain were partly justified due to the introduction of new digital tools and controls called ‘indicators’, which were designed to improve data quality and make statistical systems more internally and externally responsive. While indicators stemmed from efforts at decentralising decision-making and creating two-way exchanges with regional offices, Marisa feared that they also heightened the responsibilities of field operators and placed additional burdens on an already overloaded staff,3 all the while making training an extensive, complex and taxing operation. ‘Some concepts are difficult to assimilate over such a short period for us trainers, let alone for temporary staff’, she told us in an interview.
At the heart of field operations, and monitored by sub-area coordinators and collection station managers, are the supervisors. Recruited as temporary workers with only a minimum of high-school education, they manage about fifteen enumerators each and have some of the highest workloads within the census. During the pre-collection phase, they train enumerators and perform updates to the address database through visits to their associated census tracts. During data collection, they oversee the georeferenced trajectory of enumerators and perform periodic field revisits to authenticate enumerators’ data. All these tasks are executed through personal computers, DMCs and tablets wirelessly integrated into SIGC. They thus require a thorough understanding of the technical and human infrastructures behind data collection. Still, their limited training and experience, together with paltry wages, raise concerns about labour shortages and data quality, ultimately putting the very viability of the census at risk.
In Marisa's words:
[The supervisor] is the most demanded person... and the key link [peça-chave]. We convey in our training that she is everything. She will pass on misguided information to the enumerator if she doesn't properly grasp the concepts and workflow during training. She needs to know everything. She is the first to arrive at the tract to settle in and will take the enumerator by hand throughout the process. From our systems, we monitor whether the supervisor assimilated the concept [well] by checking the real-time coordinates captured by her mobile device. Watching the little doll [sic] move on the screen, you can already tell if she recorded data from the field or home, followed the right trajectory, and so on.
Here, the technology afforded by SIGC enables everyday controls to counterbalance the structural shortcomings of low-qualified, underpaid supervisors. Through its built-in indicators, SIGC helps cast statistical data as actionable, valuable and ready for further treatment by IBGE's IT Department – the ‘brains of our operation’, according to Marisa. Ultimately, this contributes to obscuring the various forms of labour precarity of on-the-ground data operators. ‘When everything is based on a hierarchy of processes’, indicators become ‘important beacons [balizas] to guide the gaze of supervisors with little effective training. Since IBGE doesn't have the staffing to track this manually, it helps us see the problem and where it lies’. Crucially, Marisa's words reveal that indicators presuppose the existence of an interlinked system of checks and balances that spreads from enumerators all the way to coordinators, thanks to their capacity to translate information uniformly. ‘If you stop to think that the entire census structure hinges on some 230,000 temporary operatives, that's maddening! We need something that gives us more consistency [on the data collected] and doesn't let [errors] pass’. Yet, as we shall now learn, new noise and fissures are prone to crack open at each junction of this human–machine continuum. The following sections describe how concept translation in repasses gives way to multiple forms of thinking about and affectively acting with data.
Figuring out Indicators
Entering the second week of training, João, Adair and Marisa seemed drained. They had to master hundreds of PowerPoint slides heavily loaded with pre-scripted text and instruction videos. While Marisa tried to infuse her words with enthusiasm, João and Adair stayed closer to the text, reading entire passages non-stop and with little voice inflection. ‘You don't want to go off on a tangent much because you risk losing the concept’, one of them later explained. The sheer amount of content assigned to each eight-hour-long training day was truly astonishing.
‘Our aim today is to identify the managerial indicators at our disposal to associate them with the reality found in the field, to guarantee the quality of our data’, Adair read from the scripted slide. ‘We will learn to read indicators that pinpoint errors in data collection and even frauds. We will also learn how to respond to the messages they generate’, João further explained.
Following another slide, Adair then asked the audience what they thought an indicator was. After some silence, we heard overlapping voices suggesting that it could be ‘a tendency’, a ‘portrait’ and ‘a form of measurement’. Adair proceeded to read IBGE's formal definition without agreeing or disagreeing with the opinions presented: ‘[An] indicator is a methodological resource that informs something about an aspect of... reality’. Presented in quotation marks on the slide and attributed to IBGE's renowned demographer Paulo Jannuzzi (2004), this somewhat vague definition was further contextualised by João: ‘It helps us, here at the headquarters, to assess the data and control what happens during collection. IBGE sets these parameters in advance so that the information collected in the field matches an expected pattern’.
Adair gave further insight into the workings of indicators. To collect data, enumerators are furnished with DMCs: smartphones running a proprietary operating system developed by IBGE (Bianchini 2011) and continually perfected by the institution's Information and Communication Technology (ICT) experts for software stability and practical use. Once an enumerator concludes a questionnaire, the device transmits the information to IBGE's central server through an encrypted mobile internet connection. As this process takes place in the background of data collection, SIGC starts generating real-time statistics that measure the degree to which the collected data matches or deviates from the expected data. This algorithm predicts normal intervals based on pre-existing aggregated data for the territorial unit where the data was collected. Being the brains of the census implementation, SIGC is operated by all field operators save enumerators, albeit with different degrees of clearance and access. When deviations occur, SIGC generates warning signs that must be periodically checked and attended to by supervisors to identify and correct potential issues, which can range from errors in concept application to low productivity to outright fraudulent behaviour.
‘There is a parameter, which the enumerator won't know – and this is crucial! – against which her work is gauged’, Adair explained. ‘The classic example is the variable number of residents. If the enumerator is catching one or two persons per household, and her average is at 1.4, this will likely be flagged by SIGC because in that tract a higher number of residents is expected. In fact, the average typically ranges between 2.9 and 3.1 [persons per household]’. Conversely, if an enumerator records households with six, then four individuals, the indicator, number of residents higher than expected, will be flagged. ‘This is not a problem; it's only natural at the beginning’. To ease concerns, João explained that indicators are generated continuously from the first ‘transmission’, which is how the process of sending information from DMCs to the central servers is known. ‘Now, if it happens before closing a census tract, and the number is significantly higher than expected, it would be interesting to check how the supervision requests were handled’.
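The mechanism Adair describes can be pictured as a simple interval check: a running average computed from the enumerator's transmissions is compared against a pre-set ‘normal’ range for the tract. The sketch below is ours, not IBGE's – the function name is invented and the expected range is borrowed from Adair's example, since the actual parameters, as the trainers insisted, remain undisclosed.

```python
from statistics import mean

# Hypothetical expected interval for average residents per occupied household
# in a given tract, derived from pre-existing aggregated data. The 2.9-3.1
# range is the one Adair cites; the real parameters are not public.
EXPECTED_RANGE = (2.9, 3.1)

def residents_indicator(residents_per_household: list[int]) -> str | None:
    """Flag when the enumerator's running average leaves the expected interval."""
    if not residents_per_household:
        return None
    avg = mean(residents_per_household)
    low, high = EXPECTED_RANGE
    if avg < low:
        return f"number of residents lower than expected (average {avg:.1f})"
    if avg > high:
        return f"number of residents higher than expected (average {avg:.1f})"
    return None  # within the interval: no indicator is raised

# Adair's example: one or two persons per household averages 1.4 -> flagged.
print(residents_indicator([1, 2, 1, 2, 1]))
# Households of six and four persons average 5.0 -> flagged as higher than expected.
print(residents_indicator([6, 4]))
```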
A supervisor's job is mainly about actively monitoring the data reports generated by SIGC on their tablets. This includes handling ‘supervision requests’: system-generated protocols to revisit a sample of households to confirm key questionnaire parameters, such as the number of residents, age and gender, to match that data with the data initially collected by the enumerator. In case of divergences, supervisors must request enumerators to return to those households to redo the interview. Sometimes, such a verification process can take several rounds of revisits and re-interviews until all the data is matched and the enumerator is cleared to close the census tract.
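The verification round can thus be summarised as a record-matching loop. The following is a hypothetical rendering with invented field names – the article specifies only that the number of residents, age and gender are compared:

```python
# Hypothetical rendering of a supervision request: the key parameters recorded
# during a revisit are matched against the enumerator's original questionnaire.
KEY_PARAMETERS = ("n_residents", "ages", "genders")

def diverges(original: dict, revisit: dict) -> bool:
    """True when any key parameter differs between the two records."""
    return any(original.get(k) != revisit.get(k) for k in KEY_PARAMETERS)

original = {"n_residents": 3, "ages": [34, 31, 7], "genders": ["F", "M", "F"]}
revisit = {"n_residents": 4, "ages": [34, 31, 7, 1], "genders": ["F", "M", "F", "M"]}

if diverges(original, revisit):
    # In SIGC terms: the supervisor requests a re-interview; the cycle repeats
    # over further revisits until all data is matched and the tract can close.
    print("divergence found: request re-interview")
```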
The appearance of an indicator requires a fast turnaround from the supervisor to prevent wasted time and resources. Still, trainers stressed that an indicator does not necessarily imply an error of procedure or conceptual interpretation. ‘We don't want enumerators to think that we are questioning the quality of their work. We need to be very careful. Because if an enumerator believes that they are doing everything right, they may feel aggressively confronted by the supervisor’. ‘Did they do this to you?’ – a question from the audience cut through Adair's narrative. ‘Well, it wasn't aggressive, but I was absolutely sure that my data was correct. And in the end, it was’. Still, Adair explained that this should not be considered the norm, as it occurred during pre-testing, when seasoned IBGE staff went to the field to assess, for a limited number of households, whether the framing of questions was yielding the expected responses. ‘It is part of the supervisor's job to make the enumerator redo an interview in case of inconsistent data. But in practice, it is difficult to convince an enumerator that they will need to return to the field. This means a revisit, and they don't get paid for it’. Unlike supervisors and coordinators, who receive fixed monthly salaries, enumerators are remunerated exclusively based on productivity. ‘We must ensure that the enumerators who are doing well continue to work for us. They are a scarce resource [eles são recurso escasso]’.
Once supervisors locate and address breakdowns, they must return to SIGC and describe how they handled data discrepancies. All the registered information then enters a revision cycle by staff, such as Marisa, appointed specifically for data quality control during collection. ‘The system won't accept basic justifications such as, “the collected data has been confirmed, check!”’, Adair laughingly explained. Carrying on with his example of the indicator flagging a lower-than-expected number of residents, he suggested that supervisors describe the entire procedural chain in the following manner:
Supervisor spoke with enumerator.
Enumerator informed [me of] not having seen any abnormality.
Supervisor performed field verification of unit 65, checked list of residents, and found numbers matching those of the census-taker.
‘We need to see that something was done about the indicator so that the divergence in expected data can be properly accounted for’, Adair stated. ‘There is no need to be afraid,’ João, again, tried to appease some apprehensive glances from the audience. ‘We've all been through this: even Celso [the local census coordinator] did; it's bound to happen when you transmit data all the time’. João and Adair then conveyed that indicators are prone to come and go as SIGC is updated with more information and that most indicators would eventually ‘resolve themselves’ and disappear. In the next segment of the PowerPoint presentation, Adair went over some of the most common indicators, their possible meanings and the best ways to handle them. He referred the audience to specific pages in the supervisor's instruction manual, a hefty book containing detailed descriptions of roles, error codes and procedure orientations for field operations:
Total number of people surveyed per collection day higher or lower than expected.
Average number of residents per permanently occupied private household higher or lower than expected.
Proportion of persons reporting month and year of birth lower than expected.
‘There is no mystery with this one’, Adair said, breaking the monotone. ‘It means that enumerators are only reporting presumed age. Sometimes the head of household doesn't remember everyone's age. We have seen cases where they even forgot to report on some household members. We recommend asking for individual ID cards whenever possible to confirm age, or returning the next day to collect the missing information’.
Proportion of included addresses higher than expected.
Adair proceeded to explain the meaning of this indicator. ‘Sometimes the pre-list of addresses is completely disorganised. This makes fieldwork difficult. It's much easier to include new addresses as you go and later exclude those for which you did not get an interview’. This approach, however, poses a problem for IBGE, as all existing addresses contain ‘staples’ that allow them to be randomly selected for participation in IBGE's other surveys. Replacing an existing address with a new entry, instead of simply fixing the cadastral issues, removes the staple. This triggers a spiral of statistical distortions in the randomised sampling of IBGE's other surveys, which rely on the address database updated only once every ten years, during census data collection.
João then changed his tone: ‘Moreover, this indicator may also be a sign of fraud’. Every time a new address was included, he explained, the system randomly attributed a short-form or long-form questionnaire to it. ‘If the enumerator is bored, tired, or lazy, they might intentionally exclude an existing address randomly selected for a long-form interview and include the same address again and again until it is assigned a short-form interview’. The consequences would be, again, far-reaching. Not only would that eliminate a critical piece of information – a household's sampling staple – but it would also cause the appearance of another indicator, number of long-form interviews lower than expected. Crucially, then, as would soon become clear, coordinators and supervisors needed to master soft skills to sieve through the technical language of indicators and ‘read’ in them vestiges of potentially fraudulent behaviour.
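The distortion João warns about can be illustrated with a toy simulation. Under the invented assumption that one in ten newly included addresses draws a long-form questionnaire (the actual sampling rate is ours to guess), repeatedly deleting and re-including an address until a short form comes up both inflates the count of inclusions and starves the tract of long-form interviews – precisely the two indicators the trainers say will be flagged:

```python
import random

# Toy simulation of the fraud pattern João describes. The 10% long-form
# probability is an illustrative assumption, not IBGE's actual sampling rate.
LONG_FORM_PROB = 0.10

def assign_form() -> str:
    """Randomly assign a questionnaire type to a newly included address."""
    return "long" if random.random() < LONG_FORM_PROB else "short"

def reinclude_until_short(max_tries: int = 20) -> tuple[str, int]:
    """Delete and re-include the same address until it draws a short form."""
    for attempt in range(1, max_tries + 1):
        if assign_form() == "short":
            return "short", attempt
    return "long", max_tries

random.seed(0)
honest = [assign_form() for _ in range(1000)]
gamed = [reinclude_until_short() for _ in range(1000)]

print("honest long-form share:", honest.count("long") / 1000)              # ~0.10
print("gamed long-form share:", sum(f == "long" for f, _ in gamed) / 1000)  # ~0.00
print("gamed address inclusions:", sum(a for _, a in gamed))                # ~1111 vs 1000
```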
Questioning the Indicator
Adair split the audience into four groups and asked each to unpack a different indicator, reflecting on possible reasons for their activation and procedures to fix them. Short presentations and collective discussions followed group brainstorming. As the spokesperson for the first group presented their thoughts on the indicator proportion of vacant households higher than expected, someone in the audience raised a question that seemed to be in the minds of many coordinators: ‘We understand that the software will display some warning message on the screen, but we were wondering what this proportion is and what it is based on. Starting from what proportion of vacant properties will it be triggered? There must be an algorithm behind the programme!’
João and Adair seemed surprised by the audience's sudden surge of interest: ‘I understand’, Adair proceeded to answer. ‘But it's best not to disclose what this parameter is. Otherwise, it will generate a kind of anxiety’. He reasoned that if the normal range of vacant units per hundred households were known beforehand to supervisors and enumerators, then they would want to remain within that acceptable margin in order not to trigger the indicator and to maximise individual performance – at the cost of data quality. ‘I get that’, the spokesperson responded, ‘but thinking as a supervisor, I would want to understand what the number is to better assess the reality in front of me’. The person in the audience conceded that the indicator could be flagging conceptual discrepancies or even fraudulent behaviour. However, they also described two scenarios in which data accuracy was hindered not by human error but by SIGC's incommensurability with reality itself. First, a new real estate development increases housing stock to the point where it outpaces demand, leaving behind a higher-than-expected proportion of vacant households. Second, a neighbourhood undergoing structural urban transformations triggers property depreciation and emigration, likewise leaving behind a higher-than-expected proportion of vacant households. Given these different explanations for the indicator, the group reasoned that the supervisor should gather as much information on the area as possible by going to the field or talking to seasoned IBGE staff. If human error is ascertained, the group suggested that the supervisor have a one-to-one talk with the enumerator and propose going to the field together to validate the collected data.
‘Thinking that it is important not to offend [melindrar] the enumerator, we decided to leave the option of requesting a revisit to the household indicated in the manual as last. In our opinion, it is a harsher approach, [useful only] in case the situation is not solved [by any other means]’. ‘But it is important to stick to the manual’, Adair intervened. ‘It says that if the supervisor detects that the household was not vacant, but occupied, for example, then they need to request that the enumerator go back and redo all the households they marked as vacant’. ‘The reasoning being: if you catch one [error], there is a chance that other [households] might be in the same situation’, João added.
A similar concern surfaced when the following group debated the indicator average number of residents per occupied private household higher or lower than expected. ‘What is the concept behind this indicator?’, the spokesperson for the group proceeded to ask the audience rhetorically. ‘It is a sign [indício] of improper inclusion of residents in the household’, she answered. Here, too, the possible explanations the group came up with to justify the indicator's appearance went beyond human error. ‘We need to make sure the enumerator understands what constitutes a dweller, but we cannot forget that sometimes there can be structural changes to the neighbourhoods that affect the average number of residents, such as the arrival or departure of collective dwellings like nursing homes’. ‘I want to comment on this’, another person from the same group interjected. ‘Everything depends on when the data in SIGC was collected. If we are talking about data from 2010 [the year of the last census], a lot can have changed, including the entire profile of residents from a particular neighbourhood. These things can happen’. ‘That's exactly our point!’, the spokesperson from the previous group interjected. ‘How are SIGC's averages calibrated? Is it specifically for that census tract, based on the previous census [of 2010], or is it average for the city, state or even the country?’
People in the audience continued to speculate about the algorithmic infrastructure behind the indicator for several minutes. João and Adair explained again that the parameters informing the appearance of an indicator could vary according to the area and variable. However, in all cases, the data feeding the indicator will be the most up-to-date and fine-grained available. ‘Inside SIGC, the supervisor will see the indicator and how much the collected data deviates from it’, João stressed. ‘Ok, but we are questioning the indicator’, the spokesperson for group two interjected. ‘Where does it come from? As was said, we can have an out-of-date indicator that prevents us from seeing what is new in reality’.
Another audience member added: ‘As a supervisor, what you will see in SIGC's report is a picture of the current reality. You understand? SIGC indicates that something must be wrong. But maybe it isn't’. As questions continued to arise, Adair proposed taking them to Celso to obtain more precise answers. ‘An indicator should prompt you to observe and verify; it doesn't necessarily mean the data is wrong’, João tried again. ‘It is only meant to give you a parameter to gauge the reality in front of you and compare it against the existing data’.
Coordinators, however, continued to hypothesise scenarios in which the reality encountered in the field completely outpaced even the most comprehensive algorithm. At some point, João changed his tone and attempted another explanation for the existence and legitimacy of an indicator: ‘You know, indicators were first created to avoid situations in which the enumerator closes census tracts without giving us the guarantee that the data collected has the quality necessary to substantiate the reliability of the census’. Here, ‘closing a tract’ means that the enumerator has finalised data collection for all addresses and followed the formal procedures to declare households unoccupied where applicable. Importantly, only after closing a tract on their DMC are enumerators reimbursed for the interviews performed in that area and allowed to work on the following tract.
The first spokesperson then wrapped up his understanding of the debate:
Earlier, it caught my attention when Adair said it's important that we don't know what's behind an indicator. Again, I think the supervisor should know what these parameters are. First, I don't believe these are actual indicators. To me, an indicator means something else. These are simple error messages filtered through some sieve. I think it is a failure that the manual doesn't tell us more about the parameters behind the indicators. We talk so much about planning, but in the end we really don't know what those indicators are telling us.
Before calling for a break, Adair reaffirmed that having ‘the exact information, that number you need to reach’, would be counterproductive. ‘If you know you are working within a certain [error] margin, you will always try to accommodate [yourself]. It becomes about landing inside the margin and not about the data’.
Taming Enumerators, Taming the Indicator
The contentious interpretations that arose around the workings of indicators illuminate important aspects of the datafication of Brazil's current population census. In their travails as IBGE training staff, João and Adair described the work of supervisors as entailing a continuous process of monitoring and testing reality that would ultimately lead to more accurately calibrated indicators. ‘If you find that a newly built development is triggering the indicator, then write that on your SIGC justification’, João advised the audience. As supervisors tame enumerators, they generate feedback loops between the complexity of reality and the accrued data crystallised in the indicator. Here, supervisors appear as key nodes of veridiction between IBGE's concepts and their translation into everyday practice. Learning about the existence of a new building in a previously unoccupied area means, then, producing new human infrastructure capable of redrawing the link between the empirical and the informational so that, ultimately, the indicator – and its complex, institutional and epistemological apparatus – remains as much as possible a mirror of reality.
By contrast, coordinators were more concerned with the kinds of enumerators – and adverse enumeration practices – that indicators were yielding in the long run. ‘Finding out whether there is a new construction is crucial before you rub the enumerator the wrong way when their data might not even be wrong’, one coordinator in the audience stated. Here, gathering empirical information served to tame the indicator rather than the enumerator. This comes through, for example, in the vernacular critique that recasts indicators as ‘error messages’ and thus strips IBGE's terminology of its aura of scientificity. Underlying such an insurgent move is an effort to bring indicators out of the protected domains of conceptual precision and into the muddy and unpredictable terrain of data collection – so that they can serve the on-the-ground human infrastructure that produces the census. Here, the supervisor is seen as generating knowledge against the indicator: honing productive working sociabilities, building alternative mechanisms of trust and generating numbers more attuned to perceived reality.
Let us now look in more detail into these two competing modes of establishing the veridiction of an indicator. While, in both cases, the indicator exists as a mediator between IBGE's conceptual apparatus and the contingency of reality, the concrete work of ‘quality control’ it performs varies greatly.
In the first mode, which we term probabilistic, truth is established through iterative cycles of data validation and controlled comparison with older, accrued versions of reality, which work as the ultimate gauge to assess present-day transformations encountered in the field. Here, one need not know all new aspects of reality to gain a statistically relevant depiction of its complexity. By its very nature as a data-amassing technology, the indicator is thought to already contain the relevant range of empirical variation. Like the perennial character of statistical institutions behind numbers, indicators are designed to reflect incremental rather than disruptive changes. This is not to say that IBGE staff work with atemporal, definite or reified conceptions of ‘reality’ and ‘indicators’. Following Alain Desrosières’ (1998) point about the double-edged nature of statistics, we argue that, for these practitioners, an indicator presents itself as both real and constructed. It is real in so far as it is designed to reflect reality through the production of ‘scientific facts’ that contain claims to objectivity and universality that allow an indicator to work as a ‘technology of distance’ (Porter 1995). At the same time, an indicator is also constructed in so far as such claims must be continuously enforced through coordinated institutional, political and social conventions and practices. Here, indicators are baked through years of amassed institutional, conceptual and bureaucratic labour and thus carry gravitational weight in shaping the all-too-fleeting realities that enumerators encounter on the ground.
In this sense, indicators possess many of the mediating properties of scientific models. Mary Morgan and Margaret Morrison contend that for models to work as ‘critical instruments of modern science’ (1999: 10), they require a degree of autonomy from both theory and data that enables them to function as multipurpose connecting devices between things. Similarly, in the pedagogical work of IBGE staff, indicators appear as quantification tools in so far as they yield enduring representations of different aspects of the world, systematise them into a metrological language that is (partially) independent of the phenomena it represents and provide an operational grid for practitioners to ‘read’ and gauge reality. In this process, they are cast as mediating devices between the theory consecrated in IBGE's concepts and the world presented in data collection. Thus, in the day-to-day practice of in-house data operators, indicators are envisaged to function much like Gaussian probability theory: not as a set of universal laws derived from the unmediated empirical realm but as specific instruments carefully designed to accurately reflect and consistently predict outcomes under controlled conditions (Hacking 1990).
Here, the probabilistic causation of accrued numbers is supposed to provide a framework of action and representation for newcomers to data collection. ‘Good’ indicators are meant to guide field operators through their perusal of reality as much as they are meant to set the conditions for assessing ‘good’ or ‘proper’ data collection behaviour. The figure of ‘fraud’ conjures the image of a breakdown in the regime of veridiction imposed by a good indicator. It translates a technical expectation – the imperative of harvesting reliable data – into a moral exhortation – the imperative of making the gathered data reliable and thereby usable in future chains of data validation through controlled behavioural practices. Ultimately, then, a good indicator presupposes a befitting behaviour by the data collector for it to retain its probabilistic and conventional nature and its ability to anticipate trends realistically into the future.
Against the envisaged stability of the probabilistic mode, another mode of veridiction, which we call performative, emerges from the experiences of temporarily employed coordinators, supervisors and enumerators. Whereas for IBGE staff, good indicators were expected to ‘disappear by themselves’ into the background of data collection, for operators on the ground, their very algorithmic nature needed to be rendered visible and interrogated. Here, truth – what was referenced by trainers and trainees as ‘data quality control’ – is established not via real-time feedback loops between the empirical and the conceptual but by taking heed of the contextual and human elements as they present themselves in the everyday contingency of data collection. Indicators, we are reminded by the coordinators’ critique, are nothing but a curated and normalised collection of sedimented data points in time and space that were purged of their human components so as to normalise future human behaviour.
From this perspective, to keep their heuristic validity, indicators must be drawn back into the messiness of the everyday from which they originate: the critical points where the human attaches to the machine. This is because the ‘real world out there’, with its complexity and dynamic transformations, is seen as fundamentally irreducible to the gridlock abstractions of indicators and their probabilistic reasoning. While indicators can only process a limited amount of information in their modelling to conserve their capacity to predict future regularities, empirical worlds are incommensurable and potentially infinite in the amount of information they produce. They can sometimes even disprove the predictive capabilities of indicators and render them outpaced – hence coordinators spoke of the need to tame indicators.
The notions of partiality and contingency that indicators expunge are the basis for new, situated and human-driven claims of veridiction. On the one hand, these claims embody a native critique of data politics (Bigo et al. 2019; Ruppert et al. 2017) – first, by questioning the representational and informational nature of indicators; second, by positing the need to attend to data collection as a social and political practice. Yet they also seem to be exposing the ‘method assemblages’ (Knox and Nafus 2018; Law 2004) sustaining this quantification tool, that is, the material and technological arrangements that enable the indicator to perform its work of purifying, categorising, standardising and ultimately estimating the range of ‘plausible’ realities.
Here, taming the indicator means considering how the human infrastructure of census implementation and the encounters it generates between data collectors and respondents shape alternative sites of veridiction and data quality control. Rather than a search for ‘fraudulent behaviour’, breakdown instances spark questions about ‘bad indicators’ and the need for the human infrastructure to be preserved and to overwrite the machine compound. For as much as an indicator can contain different perspectives of reality, it cannot account for the different realities that data practices continually perform (Mol 2002). Thus, the ethical and quasi-invisible work of field operators foregrounds the bottom-up epistemologies and insurgent methodologies that emerge out of the situatedness of data collection.
Conclusion
At their disposal, IBGE staff have a range of disciplinary measures to counteract fraud – from face-to-face conversations, to payment withholding, to the ultimate dismissal (desligamento) of enumerators. Just as fines against refusing respondents are intended to fashion conscious and data-responsive citizens, fraud indictments are meant to shape data-responsive enumerators, constructed through relations of data production, accumulation, processing and circulation (Chandler and Fuchs 2019; Isin and Ruppert 2020).
While this article does not explore how the new digitised tools for data control are being taken up in the everyday data collection practice of operators, our ethnography of training events has identified the affective modulations whereby temporarily hired census coordinators and IBGE training staff negotiate reality and thus become data subjects. In their interactions within the bureaucratic form of the meeting, we mapped tensions between probabilistic and performative modes of producing and relating to information. While the former foregrounds reality as resulting from technical and algorithmic ordering, the latter emphasises the ethical work flourishing from data encounters as bottom-up sites for reality validation.
Census training events such as the ones we covered in this article are privileged empirical and methodological windows into the making of these competing modes of veridiction. At training events, new relations of trust are performed at both the human and techno-epistemological levels. On the one hand, a training event is an act of repasse, of passing on, translating and repurposing knowledge: it allows us to see how concepts and methods ‘become alive’ as they trickle down from IBGE's centres of calculation into the fringes of these information systems. On the other, a training event is a negotiated act of rethinking the mechanisms of trust and veridiction that underlie data collection and knowledge production. Critically, the labour asymmetries at the core of the census enterprise are pivotal in linking these scales and moulding the various visions of what constitutes proper data and data quality controls. As trainees build mutual trust and domesticate top-down information, they recast extant notions of what makes a good indicator and, ultimately, what the imaginable future of data, collection and policy might resemble in a country like Brazil.
Thus, the training activities of IBGE's latest population census illuminate how quantification tools take form as attempts to both shape reality (what scholars have identified as ‘data politics’ and ‘method assemblages’) and imagine anew the boundaries of such reality (through qualified and situated ‘data practices’). Combining insights from quantification and data studies sheds new light on the intersections between informational systems and human practices, allowing us to see emerging spaces for number politics that take us beyond the political and social renderings of data (Douglas-Jones et al. 2021; Guyer et al. 2010) into the spaces where data and reality are co-constructed via subjective and techno-moral logics.
The case of the fraud-detecting indicator and the regimes of veridiction around it showcases that ideals of futurity are actively designed through techno-institutional cultures and apparatuses but also through seemingly mundane everyday acts of making top-down conceptual frameworks function in practice. To gain a deeper understanding of how quantification tools are imbued with and shape futurity, we must pay attention to intermediary discourse formations – such as training activities – and actors, including coordinators, supervisors and enumerators. Their structurally precarious position and imaginative actions matter for grasping how tools and numbers are infused with life, veridiction and politico-economic relevance.
Acknowledgements
This article was written as part of the InfoCitizen project at the Institute of Development Policy, University of Antwerp, with funding from ERC Starting Grant no. 101076030. Preliminary research was carried out as part of the Future Data project at the Université Libre de Bruxelles with funding from a Marie Skłodowska-Curie grant (agreement no. 801505). We profited from the discussions at the International Symposium on Quantified Futures, held at the Laboratoire d'Anthropologie des Mondes Contemporains, Université Libre de Bruxelles, in September 2022. We thank our interlocutors at IBGE for their exchanges and collaboration throughout our fieldwork.
Notes
1. The agricultural census is a nation-wide operation designed to capture data on all farms and farmers – including aspects of land, livestock, labour force, animal housing and manure management, and rural development measures.
2. A sub-area comprises several census tracts. Large metropolitan cities like São Paulo, Rio de Janeiro and Porto Alegre are divided into several sub-areas.
3. Like other national statistics organisations in the world affected by New Public Management, IBGE has seen significant reductions in its permanent staff over recent decades, generating expertise imbalances and institutional tensions.
References
Anderson, M. J. 2015. The American Census: A Social History. New Haven, CT: Yale University Press.
Barbosa, R. and J. Szwako. 2019. ‘Por Que Há uma Grave Ameaça de Apagão Estatístico no Brasil’ [Why there is a serious threat of statistical blackout in Brazil]. Nexo Jornal, 21 April. https://www.nexojornal.com.br/ensaio/2019/Por-que-há-uma-grave-ameaça-de-apagão-estatístico-no-Brasil.
Bianchini, Z. 2011. ‘The 2010 Brazilian Population Census: Innovations and Impacts in Data Collection’. Paper presented at the 58th World Congress of the International Statistical Institute, Dublin, 21–26 August.
Bigo, D., E. Isin and E. Ruppert (eds). 2019. Data Politics: Worlds, Subjects, Rights. London: Routledge.
Brown, H., A. Reed and T. Yarrow. 2017. ‘Introduction: Towards an Ethnography of Meeting’. Journal of the Royal Anthropological Institute 23 (S1): 10–26. https://doi.org/10.1111/1467-9655.12591.
Bruno, M., F. Grassia, ... and A. Girma. 2020. ‘Census Metadata Driven Data Collection Monitoring: The Ethiopian Experience’. Statistical Journal of the IAOS 36 (1): 67–76. https://doi.org/10.3233/SJI-190582.
Chandler, D. and C. Fuchs. 2019. ‘Introduction’. In D. Chandler and C. Fuchs (eds), Digital Objects, Digital Subjects: Interdisciplinary Perspectives on Capitalism, Labour and Politics in the Age of Big Data. London: University of Westminster Press, 1–20.
Darrow, D. 2002. ‘Census as a Technology of Empire’. Ab Imperio 4: 145–176. https://doi.org/10.1353/imp.2002.0132.
Desrosières, A. 1998. The Politics of Large Numbers: A History of Statistical Reasoning. Cambridge, MA: Harvard University Press.
Douglas-Jones, R., A. Walford and N. Seaver. 2021. ‘Introduction: Towards an Anthropology of Data’. Journal of the Royal Anthropological Institute 27 (S1): 9–25. https://doi.org/10.1111/1467-9655.13477.
Emigh, R., D. Riley and P. Ahmed. 2016. Changes in Censuses from Imperialist to Welfare States: How Societies and States Count. London: Palgrave Macmillan.
Foucault, M. 2014. Wrong-Doing, Truth-Telling: The Function of Avowal in Justice. Chicago: University of Chicago Press.
Guyer, J., N. Khan, ... and H. Verran. 2010. ‘Introduction: Number as Inventive Frontier’. Anthropological Theory 10 (1–2): 36–61. https://doi.org/10.1177/1463499610365388.
Hacking, I. 1990. The Taming of Chance. Cambridge: Cambridge University Press.
Hull, M. S. 2012. ‘Documents and Bureaucracy’. Annual Review of Anthropology 41: 251–267. https://doi.org/10.1146/annurev.anthro.012809.104953.
Isin, E. and E. Ruppert. 2020. Being Digital Citizens. Washington, DC: Rowman and Littlefield.
Jannuzzi, P. 2004. Indicadores Sociais no Brasil: Conceitos, Fontes de Dados e Aplicações [Social indicators in Brazil: Concepts, data sources, and applications]. Campinas: Editora Alínea.
Jhamba, T., S. Juran and M. Jones. 2020. ‘UNFPA Strategy for the 2020 Round of Population and Housing Censuses (2015–2024)’. Statistical Journal of the IAOS 36: 43–50. https://doi.org/10.3233/SJI-190600.
Jöns, H. 2011. ‘Centre of Calculation’. In J. Agnew and D. Livingstone (eds), The SAGE Handbook of Geographical Knowledge. London: SAGE, 158–170.
Knox, H. and D. Nafus. 2018. Ethnography for a Data-Saturated World. Manchester: Manchester University Press.
Kopper, M. 2023. ‘“Our Census Is the Informational Infrastructure of the Country”: Mobilizing for Data through Technopolitical Controversies in Brazil's 2020 Population Census’. Revue Brésil 23. https://doi.org/10.4000/bresils.14425.
Lanata-Briones, C., A. Estefane and C. Daniel. 2022. Socio-Political Histories of Latin American Statistics. London: Springer Nature.
Larkin, B. 2008. Signal and Noise: Media, Infrastructure, and Urban Culture in Nigeria. Durham, NC: Duke University Press.
Latour, B. 1987. Science in Action: How to Follow Scientists and Engineers through Society. Cambridge, MA: Harvard University Press.
Law, J. 2004. After Method: Mess in Social Science Research. London: Routledge.
Mezey, N. 2003. ‘Erasure and Recognition: The Census Race and the National Imagination’. Georgetown Law Faculty Publications 196: 1701–1768. https://scholarship.law.georgetown.edu/facpub/196.
Mol, A. 2002. The Body Multiple: Ontology in Medical Practice. Durham, NC: Duke University Press.
Morgan, M. and M. Morrison (eds). 1999. Models as Mediators: Perspectives on Natural and Social Science. Cambridge: Cambridge University Press.
Mrkić, S. 2020. ‘The 2020 Round of Population and Housing Censuses: An Overview’. Statistical Journal of the IAOS 36: 35–42. https://doi.org/10.3233/SJI-190574.
Porter, T. M. 1995. Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. Princeton, NJ: Princeton University Press.
Ruppert, E. 2007. ‘Producing Population’. CRESC Working Paper Series 37: 2–32. https://core.ac.uk/display/16467044.
Ruppert, E., E. Isin and D. Bigo. 2017. ‘Data Politics’. Big Data and Society 4 (2): 1–7. https://doi.org/10.1177/2053951717717749.
Ruppert, E. and S. Scheel. 2021. Data Practices. Cambridge, MA: MIT Press.
Scott, J. 1998. Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed. New Haven, CT: Yale University Press.
Strathern, M. (ed.). 2000. Audit Cultures: Anthropological Studies in Accountability, Ethics, and the Academy. London: Routledge.
Thorvaldsen, G. 2017. Censuses and Census Takers: A Global History. New York: Routledge.
UN (United Nations). 2019. Guidelines on the Use of Electronic Data Collection Technologies in Population and Housing Censuses. New York: United Nations Statistics Division. https://unstats.un.org/unsd/demographic/standmeth/handbooks/guideline-edct-census-v1.pdf.
UNECA. 1971. ‘Staff Training Manual for Population and Housing Census’. Addis Ababa: United Nations Economic Commission for Africa. June. https://hdl.handle.net/10855/18916.
Valentine, D. and A. Hassoun. 2019. ‘Uncommon Futures’. Annual Review of Anthropology 48: 243–260. https://doi.org/10.1146/annurev-anthro-102218-011435.
Verran, H. 2012. ‘Number’. In C. Lury and N. Wakeford (eds), Inventive Methods: The Happening of the Social. London: Routledge, 110–124.