Coo-incidence: What London Pigeons Can Teach Us About Data Engineering
What do London pigeons have to do with data engineering, anyway? It sounds like the start of a bad joke.
Pigeons can teach us a lot about data engineering. They constantly process data to navigate a complex, ever-changing environment – sometimes without their toes! To the untrained eye, they might look like vermin, spreading filth to otherwise clean spaces. But in my opinion, they offer a fascinating analogy for the world of data.
Signal Amidst Scraps: The Trafalgar Square Scenario

The iconic Trafalgar Square, no longer dominated by its former pigeon flocks, is now probably better known for its sculptures, like this whimsical ice cream cone, complete with a fly and a drone. Feeding pigeons at Trafalgar Square has been banned since 2003, meaning our feathered friends have had to find new places to eat!
The remaining pigeons must cope with a very different landscape, one that was once bountiful with seed. We can think of this as a signal-to-noise problem: how do we find value in noisy data?
- Data Filtering:
One of the easiest ways to speed up SQL queries is to reduce the volume of data you need to process. A data lake can store a virtually unlimited amount of data, but we rarely need to process everything stored, only what is relevant to consume. Pigeons generally have an instinct for what is and what isn't food. As a data engineer, you will understand the scope of the data you need and can apply filtering to eliminate the obvious noise (sketched in the first example after this list).
- Cleaning and Pre-processing:
Working with data scientists is one of the most rewarding parts of data engineering for me; I've learned a lot about pre-processing and data cleansing from them. Essentially, it boils down to checking whether records are fit for consumption. If not, business rules can be applied to salvage them, and records that are completely unsalvageable or anomalous should be discarded. It's the same with pigeons: they might peck at something to test its texture or smell to see if it's edible (second sketch below).
- Data Verification:
“Dirty data” costs UK businesses hundreds of millions in revenue. Data verification helps ensure that data is correct, and can be achieved through double-entry, proofreading, sampling, and reconciliation against known data sources. Employing these checks avoids ingesting bad data – much as pigeons avoid inedible items – preventing costly mistakes (third sketch below).
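To make the filtering point concrete, here is a minimal sketch using DuckDB as the query engine; the engine choice, the file layout, and the column names are all assumptions for illustration.

```python
# Minimal filtering sketch. DuckDB, the file layout, and the column names
# are assumptions -- any SQL engine over a data lake works the same way.
import duckdb

# Filter (and project) at the scan itself, so only the relevant slice of
# the lake is ever read, rather than the full history of every column.
relevant = duckdb.sql("""
    SELECT order_id, amount, created_at
    FROM read_parquet('lake/orders/*.parquet')
    WHERE created_at >= DATE '2024-01-01'  -- drop the obvious noise early
      AND status = 'COMPLETED'
""").df()
```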
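Next, a sketch of the cleaning step using pandas; the business rules and thresholds here are invented for illustration.

```python
# Cleaning sketch with pandas; the rules and thresholds are illustrative.
import pandas as pd

records = pd.DataFrame({
    "amount":   [12.50, -3.00, None, 2_000_000.00],
    "currency": ["GBP", "GBP", "GBP", "gbp"],
})

# Business rule: normalise what is salvageable...
records["currency"] = records["currency"].str.upper()

# ...and discard what is not: nulls and clearly anomalous amounts.
fit_for_consumption = records[
    records["amount"].notna() & records["amount"].between(0, 100_000)
]
```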
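Finally, a reconciliation-style verification check; the totals and the tolerance are assumed numbers, not real figures.

```python
# Reconciliation sketch: compare what landed against a trusted source.
# The totals and the 0.5% tolerance are assumptions for illustration.
source_total = 1_204_337.50   # figure reported by the upstream system
loaded_total = 1_204_102.75   # what actually landed in the lake

drift = abs(source_total - loaded_total) / source_total
if drift > 0.005:
    raise ValueError(f"Reconciliation failed: {drift:.2%} drift from source")
print(f"Reconciliation passed ({drift:.3%} drift)")
```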
A Pigeon's Memory for a Good Crumb: Metadata Management and Cataloguing

London pigeons always seem to remember where they found food before. That's why we see them in the same places: stations and parks, which have lots of foot traffic.
The more I think about this, the more I believe pigeons have some form of information schema for finding crumbs. If we put enough pigeons in a room, we could build a pretty good data catalogue from all the metadata they've collected!
What is Metadata?
Metadata appears in a lot of places. For example, even this post has metadata!
Metadata is simply data about the data; imagine it as a pigeon’s internal note about a particularly promising crumb! It’s not just about the food – it’s the details that make it memorable: location (a park bench next to an oak tree), type (a sugary blueberry muffin), and size (is it worth the effort?). This data “about the crumb” is what guides the pigeon back.
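If the pigeon kept its notes in code, they might look something like this; a playful sketch, with field names invented purely for the analogy.

```python
# A playful sketch of the pigeon's "crumb metadata" -- the same shape of
# record a data catalogue keeps about a dataset. All fields are invented.
from dataclasses import dataclass

@dataclass
class CrumbMetadata:
    location: str   # where it lives (park bench next to an oak tree)
    kind: str       # what it is (a sugary blueberry muffin)
    size: str       # is it worth the effort?

muffin = CrumbMetadata(
    location="park bench next to an oak tree",
    kind="sugary blueberry muffin",
    size="worth the flight",
)
```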
What is Metadata Management?
Metadata management is about organising the data about the data; metadata without management is as useful as a flock of birds without a migration pattern... Through metadata management, we can enhance our understanding of data assets, driving better utilisation and consumption. You can see it as an overarching discipline, strategy, and set of processes for handling metadata.
What is a Data Catalogue?
A data catalogue is a comprehensive inventory of all data assets in an organisation. This enables metadata management, data discovery, collaboration, governance and compliance. In my opinion, data catalogues should also inspire as much excitement as the Argos catalogue at Christmas time did when I was a kid.
Transfer Speeds: Pigeons vs Pipelines

In 2009, a South African company showed that a carrier pigeon with a 4GB memory stick could deliver data over a roughly 60-mile route faster than their ADSL connection could upload it. Jeff Geerling has since created his own test inspired by this, with one pigeon that can carry 3TB of data!
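A quick back-of-the-envelope comparison shows why the stunt works; the one-hour flight time and the 100 Mbps uplink below are assumptions for illustration, not figures from either test.

```python
# Back-of-the-envelope throughput of a 3TB pigeon. The one-hour flight
# time and the 100 Mbps uplink are assumptions for illustration.
payload_bits = 3 * 10**12 * 8        # 3 TB (decimal) expressed in bits
flight_seconds = 60 * 60             # assume a one-hour flight

pigeon_bps = payload_bits / flight_seconds
print(f"Pigeon: ~{pigeon_bps / 1e9:.1f} Gbps effective")  # ~6.7 Gbps

uplink_bps = 100 * 10**6             # assumed 100 Mbps upload link
print(f"That's ~{pigeon_bps / uplink_bps:.0f}x the uplink's throughput")
# The catch: latency is the whole flight -- nothing arrives until it lands.
```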
However, even Geerling's souped-up, data-laden pigeon highlights a fundamental truth: the limitations of physical media for large-scale, continuous data movement in today's digital landscape. Imagine trying to populate a data lake with petabytes of information using even an army of 3TB-carrying birds – the logistics, latency, and sheer volume would be insurmountable.
Thankfully, instead of relying on avian couriers, modern data engineering leverages high-bandwidth networks, sophisticated ETL/ELT pipelines, and cloud-based services to move and transform massive datasets efficiently and reliably. These technologies enable the seamless flow of data from diverse sources into data lakes, often in near real-time.
Data Touchdown: The Enduring Lessons Of London Pigeons
Who would have thought that the ubiquitous London pigeon could offer guidance to data engineers? Yet, through the lens of their daily struggles to find sustenance and navigate their surroundings, we uncover fundamental concepts relevant to our field. From discerning signal from noise to the importance of verification and the insights hidden within seemingly random distributions, the lessons learned from these urban survivors serve as a compelling reminder that valuable principles can be found in the most unexpected of places.