error: unknown/unsupported geography heirarchy when querying Census tracts in R

I'm following the censusapi package's guide on retrieving Census tract-level data. Here's my code.
getCensus(
  name = "acs/acs5/cprofile",
  vintage = 2018,
  key = Sys.getenv("CENSUS_API"),
  vars = c("NAME", "CP03_2014_2018_062E"),
  region = "tract:*",
  regionin = "state:12+county:033"
)
But when I run this code I get this error.
Error in apiCheck(req) :
The Census Bureau returned the following error message:
error: unknown/unsupported geography heirarchy

A similar query does work, returning a county-wide number.
getCensus(
  name = "acs/acs5/cprofile",
  vintage = 2018,
  key = Sys.getenv("CENSUS_API"),
  vars = c("NAME", "CP03_2014_2018_062E"),
  region = "county:033",
  # region = "county",                 # to get all counties in FL
  # region = "congressional district", # to get all congressional districts in FL
  regionin = "state:12"
)
# state county NAME CP03_2014_2018_062E
#1 12 033 Escambia County, Florida 49286
It's also possible to get the values for all Florida counties or congressional districts with the alternate filters above.
But unfortunately I don't think it's possible to get tract-level detail for this particular query.
https://api.census.gov/data/2018/acs/acs5/cprofile.html
https://api.census.gov/data/2018/acs/acs5/cprofile/examples.html
Judging by the help at those links, this survey doesn't appear to be available at the tract level; the supported geography levels for the region parameter are listed on the endpoint's geography page. (Also, as @ThetaFC noted in the comments, it's possible to query this list directly using listCensusMetadata(name = "acs/acs5/cprofile", vintage = 2018, type = "geography").)
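That metadata query, written out (a sketch; assumes the censusapi package is installed and a valid key is not needed for metadata calls):

```r
library(censusapi)

# List the geography levels this endpoint supports for the region parameter
geos <- listCensusMetadata(
  name = "acs/acs5/cprofile",
  vintage = 2018,
  type = "geography"
)
geos$name  # e.g. "state", "county", "congressional district" -- no "tract"
```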

Related

How to get US county name from Address, city and state in R?

I have a dataset of around 10,000 rows with Address, City, State and Zipcode values, but no lat/long coordinates. I would like to retrieve the county name without it taking a large amount of time. I have tried library(tidygeocoder), but it takes around 14 seconds for 100 values and gives a 'time-out' error when I put in the entire dataset. Plus, it outputs a FIPS code, which I have to join against to get the actual county name. Reproducible example:
library(tidygeocoder)
library(dplyr)

df <- tidygeocoder::louisville[, 1:4]
county_fips <- data.frame(
  fips = c("111", "112"),
  county = c("Jefferson", "Montgomery")
)
geocoded <- df %>%
  geocode(street = street, city = city, state = state,
          method = 'census', full_results = TRUE,
          api_options = list(census_return_type = 'geographies'))
df$fips <- geocoded$county_fips
df_new <- merge(x = df, y = county_fips, by = "fips", all.x = TRUE)
You can use a public dataset that links city and/or zipcode to county. I found these websites with such data:
https://www.unitedstateszipcodes.org/zip-code-database
https://simplemaps.com/data/us-cities
You can then do a left join on the linking column (presumably city or zipcode but will depend on the dataset):
df = merge(x=df, y=public_dataset, by="City", all.x=T)
If performance is an issue, you can select just the county and linking columns from the public data set before you do the merge.
public_dataset = public_dataset %>% select(County, City)
The slow performance is due to tidygeocoder's use of the Census Bureau's API to match data. Asking the API to match thousands of addresses is the slowdown, and I'm not aware of a way to avoid it entirely.
However, we can at least pare down the number of addresses you send to the API. If that number gets low enough, the code should run.
The ZIP Code Tabulation Area (ZCTA) to county relationship file shows how ZIP codes map to county names (and FIPS codes). A pipe-delimited ("|") file with a description of the data can be found on the Bureau's website.
Counting the number of times a ZIP code shows up tells us whether it spans multiple counties. If the frequency == 1, you can translate the ZIP code straight to its county.
ZCTA <- read.delim("tab20_zcta520_county20_natl.txt", sep="|")
n_occur <- data.frame(table(ZCTA$GEOID_ZCTA5_20))
head(n_occur, 10)
   Var1 Freq
1   601    2
2   602    2
3   603    2
4   606    3
5   610    4
6   611    1
7   612    3
8   616    1
9   617    2
10  622    1
In these results, addresses with ZIP codes 00611 and 00622 can be mapped to their counties without sending the addresses through the API. If your addresses are mostly urban, you may be lucky: urban ZIP codes tend to be small in area and typically don't span multiple counties.
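A sketch of that pare-down step. The ZCTA data and column names below are toy stand-ins for the real relationship file, and the addresses frame is assumed to have a zip column:

```r
# Toy stand-in for the ZCTA-to-county relationship file
ZCTA <- data.frame(
  GEOID_ZCTA5_20 = c("00611", "00622", "00601", "00601"),
  county         = c("County A", "County B", "County B", "County C"),
  stringsAsFactors = FALSE
)

# ZIP codes that map to exactly one county
n_occur <- data.frame(table(ZCTA$GEOID_ZCTA5_20))
unique_zips <- n_occur$Var1[n_occur$Freq == 1]

addresses <- data.frame(zip = c("00611", "00601"), stringsAsFactors = FALSE)

# Resolve single-county ZIPs locally; only the rest go to the geocoding API
resolved  <- merge(addresses[addresses$zip %in% unique_zips, , drop = FALSE],
                   ZCTA, by.x = "zip", by.y = "GEOID_ZCTA5_20")
needs_api <- addresses[!addresses$zip %in% unique_zips, , drop = FALSE]
```

Only needs_api would then be passed to geocode(), which should cut the runtime roughly in proportion to the share of single-county ZIPs in your data.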

Separating geographical data strings in R

I'm working with QECW data from BLS and would like to make the geographical data included more useful. I want to split the column "area_title" into different columns - one with the area's name, one with the level of the area, and one with the state.
I got a good start using separate:
qecw <- qecw %>% separate(area_title, c("county", "geography level", "state"))
The problem is that there's a variety of ways the geographical data are arranged into strings that makes them not uniform enough to cleanly separate. The area_title column includes names in formats that separate pretty cleanly, like:
area_title
Alabama -- Statewide
Autauga County, Alabama
which splits pretty well into
county geography level state
Alabama Statewide NA
Autauga County Alabama
but this breaks down for cases like:
area_title
Aleutians West Census Area, Alaska
Chattanooga-Cleveland-Dalton TN-GA-AL CSA
U.S. Combined statistical Areas, combined
as well as any states, counties or other place names that have more than one word.
I can go case-by-case to fix these, but I would appreciate a more efficient solution.
The exact data I'm using is "2019.q1-q3 10 10 Total, all industries," available at the link under "Current year quarterly data grouped by industry".
Thanks!
So far I came up with this:
I can get a place name by selecting a substring of area_title with everything to the left of the first comma:
qecw <- qecw %>% mutate(location = sub(",.*","", qecw$area_title))
Then I have a series of nested if_else statements to create a location type:
qecw <- qecw %>%
  mutate(`Location Type` =
    if_else(str_detect(area_title, "Statewide"), "State",
    if_else(str_detect(area_title, "County"), "County",
    if_else(str_detect(area_title, "CSA"), "CSA",
    if_else(str_detect(area_title, "MSA"), "MSA",
    if_else(str_detect(area_title, "MicroSA"), "MicroSA",
    if_else(str_detect(area_title, "Undefined"), "Undefined",
            "other")))))))
This isn't a complete answer; I think I'm still missing some location types, and I haven't come up with a good way to extract state names yet.
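For the missing state piece, one option (a sketch, not tied to the exact QCEW strings) is to take everything after the last comma when a comma is present, and return NA otherwise; titles without a comma, like "Chattanooga-Cleveland-Dalton TN-GA-AL CSA", would still need separate handling:

```r
# Extract the state portion: everything after the last comma, trimmed.
# Titles without a comma (statewide rows, CSA/MSA titles) return NA.
extract_state <- function(x) {
  ifelse(grepl(",", x),
         trimws(sub(".*,", "", x)),
         NA_character_)
}

extract_state("Autauga County, Alabama")             # "Alabama"
extract_state("Aleutians West Census Area, Alaska")  # "Alaska"
extract_state("Alabama -- Statewide")                # NA
```

Because the regex is greedy, sub(".*,", "", x) strips through the last comma, so multi-word county names with embedded commas are not a problem.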

getCensus Hawaii City Populations

I'm looking to gather populations for Hawaiian cities and am puzzled how to collect it using the censusapi getCensus() function.
census_api_key(key='YOURKEYHERE')
newpopvars <- listCensusMetadata(name = "2017/pep/population", type = "variables")
usapops <- getCensus(
  name = "pep/population",
  vintage = 2017,
  vars = c(newpopvars$name),
  region = "place:*"
)
usapops <- usapops[which(usapops$DATE_ == 10), ]
state <- grepl("Hawaii", usapops$GEONAME)
cities <- data.frame()
for (i in seq_along(state)) {
  if (state[i]) {
    cities <- rbind(cities, usapops[i, ])
  }
}
This returns only two cities but certainly there are more than that in Hawaii. What am I doing wrong?
There is only one place (Census summary level 160) in Hawaii which is large enough to be included in the 1-year American Community Survey release: "Urban Honolulu" (GeoID 1571550). The 1-year release only includes places with 65,000+ population. I assume similar controls apply to the Population Estimates program -- I couldn't find it stated directly, but the section header on the page for Population Estimates downloads for cities and towns says "Places of 50,000 or More" -- the second most populated CDP in Hawaii is East Honolulu, which had only 47,868 in the 2013-2017 ACS release.
If you use the ACS 5-year data release, you'll find 151 places at summary level 160.
It looks as though you should change pep/population to acs/acs5 in your getCensus call. I don't know the specific variables for the API, but if you just want total population for places, use the ACS B01003 table, which has a single column with that value.
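That suggestion might look like the following (an untested sketch; assumes a valid key in CENSUS_API and that B01003_001E is the total-population estimate column; 15 is Hawaii's state FIPS code):

```r
library(censusapi)

# Total population for every Census place in Hawaii, ACS 5-year
hi_places <- getCensus(
  name = "acs/acs5",
  vintage = 2017,
  key = Sys.getenv("CENSUS_API"),
  vars = c("NAME", "B01003_001E"),
  region = "place:*",
  regionin = "state:15"
)
head(hi_places[order(-as.numeric(hi_places$B01003_001E)), ])
```

Filtering by regionin = "state:15" also avoids the grepl("Hawaii", ...) pass over every place in the country.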

Obtain State Name from Google Trends Interest by City

Suppose you inquire the following:
gtrends("google", geo="US")$interest_by_city
This returns how many searches for the term "google" occurred across cities in the US. However, it does not provide any information regarding which state each city belongs to.
I have tried merging this data set with several others including city and state names. Given that the same city name can be present in many states, it is unclear to me how to identify which city was the one Google Trends provided data for.
I provide below a more detailed MWE.
library(gtrendsR)
library(USAboundaries)

data1 <- gtrends("google", geo = "US")$interest_by_city
data1$city <- data1$location
data2 <- us_cities(map_date = NULL)
data3 <- merge(data1, data2, by = "city")
And this yields the following problem:
city state
Alexandria Louisiana
Alexandria Indiana
Alexandria Kentucky
Alexandria Virginia
Alexandria Minnesota
making it difficult to know which "Alexandria" Google Trends provided the data for.
Any hints in how to identify the state of each city would be much appreciated.
One way around this is to collect the cities per state and then just rbind the respective data frames. You could first make a vector of state codes like so
states <- paste0("US-",state.abb)
I then just used purrr for its map and reduce functionality to create a single frame
data <- purrr::reduce(
  purrr::map(states, function(x) {
    gtrends("google", geo = x)$interest_by_city
  }),
  rbind
)
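If you also want the state attached to each row after stacking, you can tag it inside the map step (a sketch; assumes each per-state call returns a data frame):

```r
# Fetch per state, add a state column, and row-bind into one frame
data <- purrr::map_dfr(states, function(x) {
  res <- gtrends("google", geo = x)$interest_by_city
  res$state <- x  # e.g. "US-AL"
  res
})
```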

Census API did not provide data for selected endyear

Looking to pull in 2014 ACS data released recently through the acs package. Used the following basic query:
# Set the geo marker for all TN counties
geo <- geo.make(state = "TN", county = "*")
# Fetch Total Population for all TN counties
acs.fetch(endyear = 2014, span = 5, geography = geo, table.number = "B01003")
Output (shortened) is what I would expect to see for the 2010-2014 Total Population table:
ACS DATA:
2010 -- 2014 ;
Estimates w/90% confidence intervals;
for different intervals, see confint()
B01003_001
Anderson County, Tennessee 75346 +/- 0
Bedford County, Tennessee 45660 +/- 0
Benton County, Tennessee 16345 +/- 0
But I also get this Warning, which is odd since the values for my acs.fetch match if I do a look-up in the ACS FactFinder website:
Warning messages:
1: In acs.fetch(endyear = 2014, span = 5, geography = geo, table.number = "B01003") :
As of the date of this version of the acs package
Census API did not provides data for selected endyear
2: In acs.fetch(endyear = endyear, span = span, geography = geography[[1]], :
As of the date of this version of the acs package
Census API did not provides data for selected endyear
Am I misunderstanding something here? How can I be seeing the correct values, but the Warning Messages are telling me the Census API is not providing data for my parameters? Thank you.
From the developer, Ezra Glenn (eglenn@mit.edu):
The above is essentially correct: the data is getting fetched just fine. The warning is outdated, from a time before the 2014 data was available. (Technically it's not an error -- just a warning message to possibly explain what went wrong if the data did not get fetched. In this case, it can be safely ignored. I'll remove this in the next version.)
