Characterizing the Food Retail Environment: Impact of Count, Type, and Geospatial Error in 2 Secondary Data Sources

Commercial listings of food retail outlets are increasingly used by community members and food policy councils, and in multilevel intervention research, to identify areas with limited access to healthier food. This study quantified the amount of count, type, and geospatial error in 2 commercial data sources.

InfoUSA and Dun and Bradstreet listings were compared with a validated field census, and validity statistics were calculated.

Considering only completeness, Dun and Bradstreet data undercounted existing supermarkets and grocery stores by 24%, and InfoUSA data by 29%. When the accuracy of outlet type assignment was also considered, the undercount error increased to 42% and 39%, respectively. Marked overcount existed as well, and only 43% of existing supermarkets were correctly identified with respect to presence, outlet type, and location.
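
To illustrate how undercount and overcount figures of this kind can be derived, the following is a minimal sketch, not the study's actual procedure. It assumes that outlets in the commercial listing have already been matched to the field census and share a common identifier, and that the validity statistics are sensitivity and positive predictive value (the abstract does not name the statistics used). The function name `validity_stats` and the toy outlet identifiers are hypothetical.

```python
def validity_stats(census_ids, listed_ids):
    """Compare a commercial listing against a field census (the reference).

    Assumption: both inputs are iterables of outlet identifiers that have
    already been matched, so equal IDs mean the same physical outlet.
    """
    census = set(census_ids)
    listed = set(listed_ids)
    if not census or not listed:
        raise ValueError("both the census and the listing must be non-empty")

    matched = census & listed  # outlets present in both sources

    # Share of census outlets that the listing captured.
    sensitivity = len(matched) / len(census)
    # Share of listed outlets that the census confirmed.
    ppv = len(matched) / len(listed)

    return {
        "sensitivity": sensitivity,
        "ppv": ppv,
        "undercount": 1 - sensitivity,  # existing outlets the listing missed
        "overcount": 1 - ppv,           # listed outlets not found in the field
    }


# Toy data (hypothetical): 4 outlets found in the field, 5 in the listing,
# 3 of which match.
example = validity_stats(
    census_ids=["A", "B", "C", "D"],
    listed_ids=["A", "B", "C", "E", "F"],
)
```

Under this sketch, type error could be added by matching on an (identifier, outlet type) pair instead of the identifier alone, so that an outlet listed under the wrong type counts as both a miss and a false listing. A geospatial criterion would need a distance threshold, which the abstract does not specify.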

Conclusions and Implications
Relying exclusively on secondary data to characterize the food environment will result in substantial error. Although extensive data cleaning can offset some error, verification of outlets with a field census is still the method of choice.