Small communities are at risk from Big Data errors

Good policymaking relies on good data, and ensuring that data are meaningful and of high quality is a constant challenge in a fast-evolving information ecosystem.

The quantity of data has increased, real-time data have become ubiquitous and data sources have become more varied, notably as administrative data are harnessed. Furthermore, artificial intelligence (AI) – despite its biases and limitations – is making these data easier to access, organize and interpret.

In this context, it has been suggested that policymaking can respond to changing social and economic situations more rapidly and accurately.

This may be the case. However, it does not adequately address a perennial limit of all data, big or small, that continues to impact policy: however much data are available and however well they are compiled, the information they convey declines in quality as one focuses on smaller communities.

This does not hinder the development of policy, which rests upon the identification of social issues and trends, but it does hinder its targeting and delivery, which rest upon accurate and reliable measurement.

Mistargeting of programs for smaller communities risks breeding discontent and mistrust if perceived as arbitrary.

Solutions exist, such as combining data with local knowledge at the community (e.g. local council) and regional (e.g. regional universities or research centres) levels. This hybrid assessment model can lead to better-designed policies for smaller communities.

While this approach presents more complexities for administrators than simply applying key performance indicators (KPIs), projections and data-driven profiles, it ultimately helps by reinforcing the perception that all communities matter.

Why are small community data unreliable?

Even with well-collected and fine-grained data, each particular measurement is prone to potential errors due to data entry issues, coding, misunderstanding or mismanagement. In many instances, these are the unavoidable pitfalls of all measurement.

For larger communities or for data aggregated across many small communities, the values and trends derived from imperfect measurement have small relative errors. Random errors tend to cancel out and unbiased errors do not tilt data one way or another. Good policies can be devised on their basis.
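The cancelling-out of random errors can be sketched with a little arithmetic. The following is a minimal illustration, not a model of any actual census process: it assumes each resident is independently missed or double-counted with the same small probability (the hypothetical `per_person_error_rate`), so that errors are unbiased. Under that assumption, the typical error grows only with the square root of the population, and therefore shrinks as a share of it.

```python
import math


def relative_error(population: int, per_person_error_rate: float = 0.01) -> float:
    """Typical (standard deviation) count error as a share of the true population.

    Illustrative assumption: each resident is independently missed (-1) or
    double-counted (+1), each with probability per_person_error_rate, so the
    errors are unbiased and have variance 2 * rate per person.
    """
    variance = 2 * per_person_error_rate * population
    return math.sqrt(variance) / population


# Hypothetical community sizes, chosen only to show the trend.
for n in (500, 10_000, 1_000_000):
    print(f"population {n:>9,}: typical relative error {relative_error(n):.2%}")
```

The error rate of 0.01 is arbitrary; whatever value is chosen, the relative error under these assumptions is always largest for the smallest community.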

So, what is the problem?

Even in a world of big and timely data where policies are targeted based on thresholds such as income or population cutoffs, or performance indicators such as growth rates, some communities that should be included will get left out. Conversely, some communities that should not be included will be. Whether the cutoff is met can be driven by measurement error rather than by policy-relevant factors.
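To make the cutoff problem concrete, here is a rough sketch of how often a community could land on the wrong side of a threshold purely through measurement error. It assumes the measured value equals the true value plus normally distributed noise; the cutoff of 10,000 and the error size of 100 are hypothetical figures chosen for illustration, as are the function names.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_wrong_side(true_value: float, cutoff: float, error_sd: float) -> float:
    """Probability that the measured value falls on the opposite side of the
    cutoff from the true value.

    Illustrative assumption: measured = true + Normal(0, error_sd).
    """
    p_measured_below = normal_cdf((cutoff - true_value) / error_sd)
    if true_value >= cutoff:
        # Truly at or above the cutoff, but measured below it.
        return p_measured_below
    # Truly below the cutoff, but measured at or above it.
    return 1.0 - p_measured_below


CUTOFF = 10_000   # hypothetical population threshold
ERROR_SD = 100    # hypothetical typical measurement error

for true_pop in (9_700, 9_900, 9_990, 10_050):
    p = prob_wrong_side(true_pop, CUTOFF, ERROR_SD)
    print(f"true population {true_pop:,}: chance of misclassification {p:.1%}")
```

Under these assumptions, the closer a community sits to the cutoff relative to the size of the error, the closer its classification comes to a coin toss.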

One example is the Canada Mortgage and Housing Corp.’s housing accelerator fund, launched in the summer of 2023 to “encourage initiatives that increase housing supply and promote the development of affordable, inclusive and diverse communities that are low-carbon and climate-resilient.”

The program has two streams: one for large urban areas and one for rural, Northern and Indigenous communities of fewer than 10,000 people. To apply, the local government “must calculate their own [housing] projections based on reasonable assumptions and data sources, including Statistics Canada and/or its own administrative data (…) projections should be based on a three-year period ending September 1, 2026.”

To be authoritative, these projections usually rely upon Statistics Canada population projections combined with the expected components of population change. For small municipal governments, data availability and quality make this condition almost unattainable. There are four key reasons:

Measurement error of the baseline population

In small communities, even the baseline population is uncertain. For example, the 2011 census indicated that the population of Rose Blanche-Harbour Le Cou, N.L., had dropped to 118 from 547 in 2006. This was a manifest error that Statistics Canada eventually recognized as such, but it could not be corrected in the census. It is likely that more modest errors, of, say, 50 or 100 people, are common but go unnoticed. Yet such errors can have a significant impact on targeting program spending and growth.

Absence of local projections

Statistics Canada produces demographic projections at the level of urban agglomerations (CAs and CMAs) and census divisions (counties). A fast-growing municipality may be located in a declining census division due to an influx from rural areas. This scenario can trigger significant housing supply issues that cannot be documented because there are no authoritative projections for the municipality.

Lack of capacity

Gathering, evaluating and transforming data into projections requires time and skill. Small municipal governments often lack the resources to fulfil data requirements and highlight or even detect relevant data issues.

Intersectionality

An increasing number of policies seek to respond to issues of equity and diversity. They often require information on sub-populations and on how they will be affected. This compounds the measurement and capacity issues facing small communities: to protect individual privacy, the data that are collected and released may not provide a full portrait of these sub-populations.

The limits of data: do all communities matter?

In an era of data triumphalism marked by clarion calls for evidence-based policy and quantifiable indicators to justify much-needed programs, too little attention is paid to basic data quality.

The effects of data-driven error are not equal. The consequences are predominantly felt in smaller, more vulnerable and marginalized communities.

The central issue is not limited to equity and fairness in the development and application of programs. People in communities most at risk of getting the short end of the stick when it comes to resource distribution via data-driven programs stand to lose trust in government because they come to believe they simply don’t matter.

Richard Shearmur

Richard Shearmur is a professor at McGill University’s school of urban planning. His research focuses on regional development, innovation by firms in peripheral regions and innovation in municipal organizations.

Jamie Ward

Jamie Ward manages the regional analytics lab at Memorial University's Harris Centre.

You are welcome to republish this Policy Options article online or in print periodicals, under a Creative Commons/No Derivatives licence.