Capture-recapture model

A capture-recapture model is a technique for estimating the size of an unknown population by capturing and tagging a sample, releasing it, and then capturing a second sample to see how many tagged individuals reappear.

In the article "How many Mechanical Turk workers are there?", Panos Ipeirotis explains a simple version of a capture-recapture model as follows:

The simplest possible technique is the following:

  • Capture/marking phase: Capture n1 animals, mark them, and release them back.

  • Recapture phase: A few days later, capture n2 animals. Assuming there are N animals overall, n1/N of them are marked. So, for each of the n2 captured animals, the probability that the animal is marked is n1/N (from the capture/marking phase).

  • Calculation: On expectation, we expect to see n2 × n1/N marked animals in the recapture phase. (Notice that we do not know N.) So, if we actually see m marked animals during the recapture phase, we set m = n2 × n1/N and we get the estimate that:

N = (n1 × n2) / m
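
The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name estimate_population and the numbers in the example are hypothetical and do not come from the article. The formula itself is commonly known as the Lincoln-Petersen estimator.

```python
def estimate_population(n1: int, n2: int, m: int) -> float:
    """Estimate the population size N as n1 * n2 / m.

    n1: number of individuals captured and marked in the first phase
    n2: number of individuals captured in the recapture phase
    m:  number of recaptured individuals that carry a mark
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("n1 and n2 must be positive")
    if m <= 0:
        # With no marked recaptures the ratio n1 * n2 / m is undefined.
        raise ValueError("no marked individuals recaptured; estimate is undefined")
    if m > min(n1, n2):
        raise ValueError("m cannot exceed n1 or n2")
    return n1 * n2 / m


# Hypothetical example: mark 100 animals, later capture 80, of which 20 are marked.
estimate = estimate_population(n1=100, n2=80, m=20)
print(estimate)  # 400.0
```

In this made-up example a quarter of the recaptured sample is marked (20 of 80), so the 100 marked animals are taken to be about a quarter of the population, which gives 400.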

He adds that this basic version of a capture-recapture model makes the following assumptions, and the estimate N can be inaccurate when these assumptions are violated:

  • Assumption of no arrivals / departures (“closed population”): The vanilla capture-recapture scheme assumes that there are no arrivals or departures of workers between the capture and recapture phase.

  • Assumption of no selection bias (“equal catchability”): The vanilla capture-recapture scheme assumes that every worker in the population is equally likely to be captured.
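
To see how a violated assumption can distort the estimate, the following rough simulation compares a population with equal catchability against one where half of the individuals are five times as likely to be captured. Everything here is an assumption made for illustration: the population size of 1000, the sample sizes, the weights, and the helper names weighted_sample and simulate_estimate are hypothetical and not taken from the article.

```python
import random
import statistics


def weighted_sample(weights, k, rng):
    """Sample k distinct indices, favoring indices with larger weights.

    Uses random keys u ** (1 / w) and keeps the k largest, a standard way
    to do weighted sampling without replacement.
    """
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    keyed.sort(reverse=True)
    return [i for _, i in keyed[:k]]


def simulate_estimate(weights, n1, n2, rng):
    """Run one capture/recapture round and return n1 * n2 / m."""
    marked = set(weighted_sample(weights, n1, rng))
    recaptured = weighted_sample(weights, n2, rng)
    m = sum(1 for i in recaptured if i in marked)
    return n1 * n2 / m if m > 0 else float("inf")


rng = random.Random(0)  # fixed seed so the run is repeatable
true_n, n1, n2, trials = 1000, 200, 200, 200

equal = [1.0] * true_n                                    # equal catchability
skewed = [5.0] * (true_n // 2) + [1.0] * (true_n // 2)    # half are easier to catch

median_equal = statistics.median(
    simulate_estimate(equal, n1, n2, rng) for _ in range(trials))
median_skewed = statistics.median(
    simulate_estimate(skewed, n1, n2, rng) for _ in range(trials))

print("true N:", true_n)
print("median estimate, equal catchability:", median_equal)
print("median estimate, unequal catchability:", median_skewed)
```

Under these assumptions the easy-to-catch individuals tend to show up in both phases, so more marked individuals are recaptured than the model expects and the estimate falls below the true population size. A similar experiment that adds or removes individuals between the two phases could be used to probe the closed-population assumption.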