get_imm_resid - data issues  #17

@jpycroft

@jdebacker @rickecon

This issue has been copied over from jpycroft#2 to have it on the main repository.

A number of data issues arose while writing the get_imm_resid function in demographics.py, so I'm opening this issue to keep a record of them and to let others comment on the decisions taken.

The production of net immigration rates, imm_rates, went through the following steps:

  1. Try to use Eurostat immigration rates directly. This did not work because Eurostat no longer publishes immigration by age for the UK (presumably a post-Brexit change). For future reference, the by-age data are still there for EU countries.

  2. Return to the OG-USA style of backing out the immigration rates from the population totals, as done in get_imm_resid in demographics.py in OG-USA-Calibration. This works well for most ages, but not for:
    a. age 0, newborns (see point 3).
    b. the oldest ages, especially age 90+ (see point 4).

  3. Adjust newborn values:
    The OG-USA methodology uses fert_rates and applies them to the 2015, 2016 and 2017 populations to obtain newborns for 2016, 2017 and 2018. The problem is that the fert_rates for 2018 are much lower than the 2015 rates, so the calculated newborns in 2016 are more than 40,000 below the actual number. The standard get_imm_resid allocates this shortfall to net immigration of babies, leading to a net immigration rate of 7%, while the actual rate is closer to 0.7%.
    Instead of approximating, I downloaded the actual numbers of newborns in 2015, 2016 and 2017 from Eurostat. These then become the "newborn" array, from which imm_rates[0] is calculated.

  4. The over 90s:
    The over 90s data is not consistent. The standard methodology suggests that imm_rates for some ages over 90 rise above 5%, hitting 19% for age 99. This is vanishingly unlikely to be accurate. The overall population and mortality numbers are not consistent (even when one downloads the full data for all years). Any errors in the data are amplified by the small denominators, e.g. there are fewer than 10,000 people aged 99.
    To fix this, I have replaced the over 90s values with the average value for ages 80 to 89. This allows for some continued migration of the over 90s, but by using the data for ages 80 to 89, the errors are smoothed out and the denominators used for the calculation are much larger.

  5. Moving average smoothing:
    The above adjustments lead to much improved imm_rates. However, there are still a number of spikes in the data, which are unlikely to contain real long-term information. Therefore a simple three-year moving average is applied.
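
For anyone following along, steps 2 to 5 can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in demographics.py. The function name `imm_resid_sketch`, its arguments and the synthetic inputs are all hypothetical. The residual formula is a simplified version for a single pair of years, and the moving average is assumed to run across adjacent ages with the two endpoint ages left unsmoothed.

```python
import numpy as np


def imm_resid_sketch(pop_t, pop_t1, mort_rates, newborns):
    """Hypothetical sketch of residual net immigration rates by age.

    pop_t, pop_t1 : population by single year of age (index = age)
                    in two consecutive years.
    mort_rates    : mortality rate by age.
    newborns      : actual number of births feeding the age-0 cohort
                    of pop_t1 (assumed to come from published data
                    rather than from fert_rates).
    """
    pop_t = np.asarray(pop_t, dtype=float)
    pop_t1 = np.asarray(pop_t1, dtype=float)
    mort_rates = np.asarray(mort_rates, dtype=float)

    imm_rates = np.zeros_like(pop_t)

    # Age 0: whatever the actual births do not explain is attributed
    # to net immigration of babies.
    imm_rates[0] = (pop_t1[0] - newborns) / pop_t[0]

    # Ages 1+: observed population minus the survivors of the cohort
    # one year younger, relative to last year's population at that age.
    survivors = (1.0 - mort_rates[:-1]) * pop_t[:-1]
    imm_rates[1:] = (pop_t1[1:] - survivors) / pop_t[1:]

    # Over 90s: replace the noisy values with the mean for ages 80-89.
    imm_rates[90:] = imm_rates[80:90].mean()

    # Three-point centred moving average across ages (assumption:
    # the first and last ages keep their unsmoothed values).
    smoothed = imm_rates.copy()
    smoothed[1:-1] = np.convolve(imm_rates, np.ones(3) / 3, mode="valid")
    return smoothed


# Synthetic inputs for illustration only (not Eurostat data):
# 100 single-year ages, flat population, 1% mortality, and a true
# net immigration rate of 0.5% at every age.
n_ages = 100
pop_t = np.full(n_ages, 1000.0)
mort_rates = np.full(n_ages, 0.01)
pop_t1 = np.full(n_ages, 995.0)   # 990 survivors + 5 net immigrants
pop_t1[0] = 1005.0                # 1000 births + 5 net immigrants
rates = imm_resid_sketch(pop_t, pop_t1, mort_rates, newborns=1000.0)
```

The issue describes three year pairs (2015 to 2018), so one way to extend the sketch would be to compute the residuals for each pair and average them before applying the over 90s replacement and the smoothing.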
