Generating a Fully Synthetic Human Services Dataset

Current Information

In 2022, staff at the Urban Institute partnered with the Allegheny County Department of Human Services (DHS) and the Western Pennsylvania Regional Data Center (WPRDC) to pilot synthetic data generation at the local level and to better understand the unique challenges that state and local governments might face in generating synthetic data. Each record in the synthetic dataset represents a simulated individual who received at least one service from the Allegheny County DHS in 2021. The synthetic data were designed so that records, when aggregated by service, are representative of the original data. Read more here about synthetic data.
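
The design goal that records aggregated by service represent the original data can be illustrated with a small check. The following is a minimal sketch only, not the method used by the project team: the records, service names, and the functions `service_shares` and `max_share_gap` are hypothetical and were made up for illustration. It compares each service's share of records in a toy "confidential" dataset against a toy "synthetic" one.

```python
from collections import Counter

# Hypothetical toy records: (person_id, service). These values are made up
# for illustration; they are not real or synthetic Allegheny County DHS data.
confidential = [
    (1, "child_welfare"), (2, "behavioral_health"), (3, "behavioral_health"),
    (4, "aging"), (5, "housing"), (6, "behavioral_health"),
]
synthetic = [
    (101, "behavioral_health"), (102, "child_welfare"), (103, "aging"),
    (104, "behavioral_health"), (105, "housing"), (106, "behavioral_health"),
]


def service_shares(records):
    """Return each service's share of all records."""
    counts = Counter(service for _, service in records)
    total = sum(counts.values())
    return {service: n / total for service, n in counts.items()}


def max_share_gap(a, b):
    """Largest absolute difference in service share between two datasets."""
    shares_a, shares_b = service_shares(a), service_shares(b)
    services = set(shares_a) | set(shares_b)
    return max(abs(shares_a.get(s, 0.0) - shares_b.get(s, 0.0)) for s in services)


gap = max_share_gap(confidential, synthetic)
print(f"largest gap in service share: {gap:.3f}")
```

A small gap suggests the synthetic records preserve service-level totals even though no synthetic individual corresponds to a real person. A real evaluation would likely compare many more statistics than this single measure.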

Why create a synthetic dataset?

The Department of Human Services (DHS) in Allegheny County, Pennsylvania, serves one in five residents of the county every year through child welfare services, behavioral health services, aging services, developmental support services, homeless and housing supports, and family strengthening and youth supports. In the process, DHS collects data about these services and the people who use them. These data are integrated at the individual level to allow for better care coordination, operational improvements, and program evaluation. Because the dataset is sensitive, it cannot be widely shared at the individual level. Synthetic data are used in its place, which allows the data to be shared publicly and helps stakeholders, including researchers, service providers, and members of the public, better understand these populations.