Census API did not provide data for selected endyear - r

I'm looking to pull in the recently released 2014 ACS data through the acs package. I used the following basic query:
library(acs)
# Set the geo marker for all TN counties
geo <- geo.make(state = "TN", county = "*")
# Fetch Total Population for all TN counties
acs.fetch(endyear = 2014, span = 5, geography = geo, table.number = "B01003")
Output (shortened) is what I would expect to see for the 2010-2014 Total Population table:
ACS DATA:
2010 -- 2014 ;
Estimates w/90% confidence intervals;
for different intervals, see confint()
                            B01003_001
Anderson County, Tennessee  75346 +/- 0
Bedford County, Tennessee   45660 +/- 0
Benton County, Tennessee    16345 +/- 0
But I also get this Warning, which is odd since the values for my acs.fetch match if I do a look-up in the ACS FactFinder website:
Warning messages:
1: In acs.fetch(endyear = 2014, span = 5, geography = geo, table.number = "B01003") :
As of the date of this version of the acs package
Census API did not provides data for selected endyear
2: In acs.fetch(endyear = endyear, span = span, geography = geography[[1]], :
As of the date of this version of the acs package
Census API did not provides data for selected endyear
Am I misunderstanding something here? How can I be seeing the correct values, but the Warning Messages are telling me the Census API is not providing data for my parameters? Thank you.

From the developer, Ezra Glenn (eglenn@mit.edu):
The above is essentially correct: the data is getting fetched just fine. The warning is outdated, from a time before the 2014 data was available. (Technically it's not an error -- just a warning message to possibly explain what went wrong if the data did not get fetched. In this case, it can be safely ignored. I'll remove this in the next version.)
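If the stale warning is noisy, one option is to silence only that message and let any other warning through. The following is a minimal sketch, not something the acs package provides: muffle_matching() is a hypothetical helper, and fake_fetch() is a stand-in that mimics acs.fetch() by returning values while raising the same warning text.

```r
# Hypothetical helper: evaluate `expr`, silencing only warnings whose
# message contains `pattern`; every other warning still surfaces.
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Stand-in for acs.fetch(): returns data and emits the outdated warning.
fake_fetch <- function() {
  warning("As of the date of this version of the acs package\n",
          "Census API did not provides data for selected endyear")
  c(Anderson = 75346, Bedford = 45660, Benton = 16345)
}

pop <- muffle_matching(fake_fetch(), "selected endyear")
print(pop)

# With the real package the call would look like this (not run here):
# muffle_matching(
#   acs.fetch(endyear = 2014, span = 5, geography = geo, table.number = "B01003"),
#   "selected endyear"
# )
```

Matching on the message text keeps genuine problems (for example a failed download) visible, which a blanket suppressWarnings() would hide.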

Related

How to get US county name from Address, city and state in R?

I have a dataset of around 10000 rows. I have the Address, City, State and Zipcode values. I do not have lat/long coordinates. I would like to retrieve the county name without taking a large amount of time. I have tried library(tidygeocoder), but it takes around 14 seconds for 100 values and gives a 'time-out' error when I put in the entire dataset. Plus, it outputs a FIPS code, which I have to join to get the actual county name. Reproducible example:
library(tidygeocoder)
library(dplyr)
df <- tidygeocoder::louisville[,1:4]
county_fips <- data.frame(fips = c("111", "112"),
                          county = c("Jefferson", "Montgomery"))
geocoded <- df %>% geocode(street = street, city = city, state = state,
                           method = 'census', full_results = TRUE,
                           api_options = list(census_return_type = 'geographies'))
df$fips <- geocoded$county_fips
df_new <- merge(x=df, y=county_fips, by="fips", all.x = T)
You can use a public dataset that links city and/or zipcode to county. I found these websites with such data:
https://www.unitedstateszipcodes.org/zip-code-database
https://simplemaps.com/data/us-cities
You can then do a left join on the linking column (presumably city or zipcode but will depend on the dataset):
df = merge(x=df, y=public_dataset, by="City", all.x=T)
If performance is an issue, you can select just the county and linking columns from the public data set before you do the merge.
public_dataset = public_dataset %>% select(County, City)
The slow performance is due to tidygeocoder's use of the Census Bureau's API to match data. Asking the API to match thousands of addresses is what slows things down, and I'm not aware of a different way to do this.
However, we can at least pare down the number of addresses that you are putting into the API. Maybe if we get that number low enough the code will run.
The ZIP Code Tabulation Area (ZCTA) relationship file shows the relationships between ZIP Codes and county names (as well as FIPS codes). A "|"-delimited file with a description of the data can be found on the Bureau's website.
Counting the number of times a ZIP code shows up tells us if a ZIP code spans multiple counties. If the frequency == 1, then you can freely translate the ZIP code to the county.
ZCTA <- read.delim("tab20_zcta520_county20_natl.txt", sep="|")
n_occur <- data.frame(table(ZCTA$GEOID_ZCTA5_20))
head(n_occur, 10)
   Var1 Freq
1   601    2
2   602    2
3   603    2
4   606    3
5   610    4
6   611    1
7   612    3
8   616    1
9   617    2
10  622    1
In these results, addresses with ZIP codes that appear only once, such as 00611, 00616 and 00622, can be mapped to the corresponding counties without sending the addresses through the API. If your addresses are very urban, then you may be lucky in that the ZIP codes are small area-wise and may not typically span multiple counties.
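The pre-filtering idea can be sketched as follows. This is only an illustration on toy data: the county column name NAMELSAD_COUNTY_20 is an assumption about the relationship file's layout, and the county labels and the addresses data frame are made up for the example.

```r
# Toy stand-in for the ZCTA-to-county relationship file.
# NAMELSAD_COUNTY_20 is an assumed column name; county labels are placeholders.
zcta <- data.frame(
  GEOID_ZCTA5_20 = c(601, 601, 606, 606, 606, 611, 622),
  NAMELSAD_COUNTY_20 = paste("County", LETTERS[1:7]),
  stringsAsFactors = FALSE
)

# ZIP codes that occur exactly once map to a single county.
n_occur <- table(zcta$GEOID_ZCTA5_20)
unique_zips <- as.numeric(names(n_occur)[n_occur == 1])
lookup <- zcta[zcta$GEOID_ZCTA5_20 %in% unique_zips, ]

# Hypothetical address table with a ZIP column.
addresses <- data.frame(id = 1:3, zip = c(611, 601, 622))

# Left join: rows with an unambiguous ZIP get their county directly.
matched <- merge(addresses, lookup,
                 by.x = "zip", by.y = "GEOID_ZCTA5_20", all.x = TRUE)

# Only the rows still missing a county need to go through the geocoder.
needs_api <- matched[is.na(matched$NAMELSAD_COUNTY_20), ]
print(needs_api)
```

Only needs_api would then be passed to geocode(), which should shrink the request when most ZIP codes sit inside a single county.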

getSymbols returning inaccurate data

For some reason getSymbols is returning inaccurate data for the symbol below. For example, in the High column the price jumps from 35 to 3515 between February 2021 and 16 March 2021.
library(quantmod)
d <- as.data.frame(
  getSymbols(
    paste0("USHAMART", ".NS"),
    from = "2000-01-01",
    periodicity = "weekly",
    return.class = "zoo",
    env = NULL
  )
)
Not sure if this is occurring for other symbols, but I suspect it may be.
Crappy data on Yahoo Finance. What a surprise. You get "free" data, and this is an example of the errors that can occur. It looks like you need to divide the numbers that are in the thousands by 100 to get the "correct" data.
See the data disclaimers on Yahoo:
All data provided on Yahoo Finance is provided for informational
purposes only, and is not intended for trading or investing purposes.
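The divide-by-100 workaround suggested above could be sketched like this. The function name, the threshold of 1000 and the sample prices are assumptions for illustration; the approach only makes sense if the true price never legitimately reaches the threshold.

```r
# Hypothetical fix-up: treat any value at or above `threshold` as a
# mis-scaled quote and divide it by `factor`; leave other values alone.
rescale_outliers <- function(x, threshold = 1000, factor = 100) {
  ifelse(x >= threshold, x / factor, x)
}

# Made-up weekly highs showing the kind of jump described in the question.
high <- c(34.2, 35, 3515, 3620, 36.5)
fixed <- rescale_outliers(high)
print(fixed)
```

On real data it is worth checking a few corrected rows against another source before trusting the result.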

error: unknown/unsupported geography heirarchy, when querying Census tracts

I'm following the censusapi guide on retrieving Census tract-level data. Here's my code.
getCensus(
  name = "acs/acs5/cprofile",
  vintage = 2018,
  key = Sys.getenv("CENSUS_API"),
  vars = c("NAME","CP03_2014_2018_062E"),
  region = "tract:*",
  regionin = "state:12+county:033"
)
But when I run this code I get this error.
Error in apiCheck(req) :
The Census Bureau returned the following error message:
error: unknown/unsupported geography heirarchy
This similar query seemed to work to give a county-wide number.
getCensus(
  name = "acs/acs5/cprofile",
  vintage = 2018,
  key = Sys.getenv("CENSUS_API"),
  vars = c("NAME","CP03_2014_2018_062E"),
  region = "county:033",
  # region = "county", # to get all counties in FL
  # region = "congressional district", # to get all congressional districts in FL
  regionin = "state:12"
)
#   state county                     NAME CP03_2014_2018_062E
# 1    12    033 Escambia County, Florida               49286
It's also possible to get the values for all Florida counties or congressional districts with the alternate filters above.
But unfortunately I don't think it's possible to get tract-level detail for this particular query.
https://api.census.gov/data/2018/acs/acs5/cprofile.html
https://api.census.gov/data/2018/acs/acs5/cprofile/examples.html
Judging by the help at those links, it doesn't look like this survey is available at the tract level. Here are the listed geographical levels for the region parameter from here. (Also, as @ThetaFC noted in the comments, it's possible to query this list directly using listCensusMetadata(name = "acs/acs5/cprofile", vintage = 2018, type = "geography").)
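Checking for a geography level before querying might look like the sketch below. It runs on a toy stand-in rather than a live API call: the assumption is that the metadata data frame lists geography levels in a column called name, and the rows shown are only the levels mentioned in this thread, not the endpoint's full list. supports_level() is a hypothetical helper.

```r
# Toy stand-in for the result of
# listCensusMetadata(name = "acs/acs5/cprofile", vintage = 2018, type = "geography").
# Assumption: geography levels appear in a column named `name`.
geo_meta <- data.frame(
  name = c("state", "county", "congressional district"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: is `level` one of the geographies the endpoint lists?
supports_level <- function(geo_meta, level) {
  level %in% geo_meta$name
}

has_county <- supports_level(geo_meta, "county")
has_tract <- supports_level(geo_meta, "tract")
print(c(county = has_county, tract = has_tract))
```

With a real key, geo_meta would come from the listCensusMetadata() call quoted above, and a FALSE for "tract" would explain the unsupported-hierarchy error before any data request is made.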

getCensus Hawaii City Populations

I'm looking to gather populations for Hawaiian cities and am puzzled how to collect it using the censusapi getCensus() function.
census_api_key(key='YOURKEYHERE')
newpopvars <- listCensusMetadata(name = "2017/pep/population", type = "variables")
usapops <- getCensus(name = "pep/population",
                     vintage = 2017,
                     vars = c(newpopvars$name),
                     region = "place:*")
usapops <- usapops[which(usapops$DATE_==10),]
state <- grepl("Hawaii", usapops$GEONAME)
cities <- data.frame()
for (i in seq(1,length(state))) {
  if (state[i] == TRUE) {
    cities <- rbind(cities,usapops[i,])
  }
}
This returns only two cities but certainly there are more than that in Hawaii. What am I doing wrong?
There is only one place (Census summary level 160) in Hawaii which is large enough to be included in the 1-year American Community Survey release: "Urban Honolulu" (GeoID 1571550). The 1-year release only includes places with 65,000+ population. I assume similar controls apply to the Population Estimates program -- I couldn't find it stated directly, but the section header on the page for Population Estimates downloads for cities and towns says "Places of 50,000 or More" -- the second most populated CDP in Hawaii is East Honolulu, which had only 47,868 in the 2013-2017 ACS release.
If you use the ACS 5-year data release, you'll find 151 places at summary level 160.
It looks as though you should change pep/population to acs/acs5 in your getCensus call. I don't know the specific variables for the API, but if you just want total population for places, use the ACS B01003 table, which has a single column with that value.
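A rough sketch of the suggested switch to the ACS 5-year data follows. Several things here are assumptions rather than anything stated above: B01003_001E is taken to be the API name of the single estimate column in table B01003, state:15 is taken to be Hawaii (consistent with the GeoID 1571550 quoted earlier), and the API key is assumed to live in the CENSUS_KEY environment variable. The request is only sent when the package and a key are available.

```r
# Arguments for getCensus(); variable name and state code are assumptions
# described in the text above.
hawaii_place_query <- list(
  name = "acs/acs5",
  vintage = 2017,
  vars = c("NAME", "B01003_001E"),
  region = "place:*",
  regionin = "state:15"
)

# Only call the API when censusapi is installed and a key is configured.
if (requireNamespace("censusapi", quietly = TRUE) &&
    nzchar(Sys.getenv("CENSUS_KEY"))) {
  hawaii_places <- do.call(censusapi::getCensus, hawaii_place_query)
  print(head(hawaii_places))
}
```

Restricting the request with regionin avoids downloading every place in the country and filtering on the name afterwards.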

I'm getting these programming errors in R - attempt to apply non-function, adding class "factor" to an invalid object

I'm a newbie to R programming. I have a CSV file that contains items by country, life expectancy and region, and I have to do the following:
1. List the number of countries region-wise and draw a bar chart
2. Draw a boxplot for each region
3. Cluster countries based on life expectancy using the k-means algorithm
4. Name the countries that have the min and max life expectancy
input.csv
Country,LifeExpectancy,Region
India,60,Asia
Srilanka,62,Asia
Myanmar,61,Asia
USA,65,America
Canada,65,America
UK,68,Europe
Belgium,67,Europe
Germany,69,Europe
Switzerland,70,Europe
France,68,Europe
What I did:
1.
mydata <- read.table("input.csv", header=TRUE, sep=",")
barplot(mydata$ncol(Region))
and I get the error: Error in barplot(mydata$ncol(Region)) : attempt to apply non-function
2.
boxplot(LifeExpectancy~Region,mydata=data) ##This is correct
3. I have no idea how to do this!
4.
min(mydata$LifeExpectancy);max(mydata$LifeExpectancy) ##This is correct
As I pointed out in my comments, this question is really multiple questions, and does not reflect the title. In future, please try to keep questions manageable and discrete. I'm not going to attempt to answer your third point (about K-means clustering) here. Search SO and I'm sure you will find some relevant questions/answers.
Regarding your other questions, have a careful look at the following. If you don't understand what a particular function is doing, refer to ?function_name (e.g. ?tapply), and for further enlightenment, run nested code from the inside out (e.g. for foo(bar(baz(x))), you could examine baz(x), then bar(baz(x)), and finally foo(bar(baz(x)))). This is an easy way to get a handle on what's going on, and is also useful when debugging code that produces errors.
d <- read.csv(text='Country,LifeExpectancy,Region
India,60,Asia
Srilanka,62,Asia
Myanmar,61,Asia
USA,65,America
Canada,65,America
UK,68,Europe
Belgium,67,Europe
Germany,69,Europe
Switzerland,70,Europe
France,68,Europe', header=TRUE)
barplot(with(d, tapply(Country, Region, length)), cex.names=0.8,
        ylab='No. of countries', xlab='Region', las=1)
boxplot(LifeExpectancy ~ Region, data=d, las=1,
        xlab='Region', ylab='Life expectancy')
d$Country[which.min(d$LifeExpectancy)]
# [1] India
# Levels: Belgium Canada France Germany India Myanmar Srilanka Switzerland UK USA
d$Country[which.max(d$LifeExpectancy)]
# [1] Switzerland
# Levels: Belgium Canada France Germany India Myanmar Srilanka Switzerland UK USA
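The inside-out approach described above can be tried on the barplot call itself: evaluate the inner tapply() expression first and look at what it hands to barplot(). This sketch rebuilds just the two columns it needs so that it runs on its own.

```r
# Rebuild the Country and Region columns from the question's data.
d <- data.frame(
  Country = c("India", "Srilanka", "Myanmar", "USA", "Canada",
              "UK", "Belgium", "Germany", "Switzerland", "France"),
  Region = c(rep("Asia", 3), rep("America", 2), rep("Europe", 5)),
  stringsAsFactors = FALSE
)

# Innermost step: count the countries within each region.
counts <- with(d, tapply(Country, Region, length))
print(counts)
# America    Asia  Europe
#       2       3       5

# Outer step (not run here): this named vector of counts is what gets plotted.
# barplot(counts, ylab = "No. of countries", xlab = "Region", las = 1)
```

Seeing that the inner expression is a named vector of counts makes it clear why barplot() draws one bar per region.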
