For my Bachelor's thesis I need to pull Google Trends data for several brands in different countries.
As I am totally new to R, a friend of mine helped me write the code for a loop that does this automatically.
After a while the error
data must be a data frame, or other object coercible by fortify(), not a list
appears and the loop stops. When checking against the Google Trends page itself, I found out that there is not enough data to support the request.
My question now would be: is it possible to continue the loop regardless of the error and just "skip" the request responsible for it?
I already looked around in other threads, but try() appears not to work here, or I did it wrong.
I also changed low_search_volume = FALSE (the default) to TRUE, but that didn't change anything.
library(gtrendsR)

for (row in 1:nrow(my_data)) {
  country_code <- as.character(my_data[row, "Country_Code"])
  query <- as.character(my_data[row, "Brand"])
  # request the interest-over-time series for this brand/country pair
  trend <- gtrends(
    query,
    geo = country_code,
    category = 68,
    low_search_volume = TRUE,
    time = "all"
  )
  plot(trend)
  export <- trend[["interest_over_time"]]
  filepath <- paste(
    "C:\\Users\\konst\\Desktop\\Uni\\Bachelorarbeit\\R\\Ganzer Datensatz\\",
    query, "_", country_code,
    ".csv",
    sep = ""
  )
  write.csv(export, filepath)
}
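For reference, one common pattern for this is to wrap the failing steps in tryCatch() so a failed request returns NULL and the loop moves on. A minimal sketch, assuming the same my_data layout as above; the extra NULL check covers the case where gtrends() succeeds but returns no interest_over_time data, which is what makes plot() fail:

library(gtrendsR)

for (row in 1:nrow(my_data)) {
  country_code <- as.character(my_data[row, "Country_Code"])
  query <- as.character(my_data[row, "Brand"])
  # return NULL instead of stopping the loop if the request errors
  trend <- tryCatch(
    gtrends(query, geo = country_code, category = 68,
            low_search_volume = TRUE, time = "all"),
    error = function(e) NULL
  )
  if (is.null(trend) || is.null(trend[["interest_over_time"]])) {
    message("Skipping ", query, " / ", country_code, ": no data returned")
    next  # skip this request and continue with the next row
  }
  plot(trend)
  write.csv(trend[["interest_over_time"]],
            paste0("C:\\Users\\konst\\Desktop\\Uni\\Bachelorarbeit\\R\\",
                   "Ganzer Datensatz\\", query, "_", country_code, ".csv"))
}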
To reproduce the error, use the following list:

Brand      Country_Code
Gucci      MA
Gucci      US
allsaints  MA
allsaints  US

The allsaints MA request should produce the error; therefore, the allsaints US request will not be processed.
Thank you all in advance for your assistance.
Best wishes from Hamburg, Germany
I'm having trouble accessing the Energy Information Administration's API through R (https://www.eia.gov/opendata/).
On my office computer, if I try the link in a browser it works and the data shows up (the full URL: https://api.eia.gov/series/?series_id=PET.MCREXUS1.M&api_key=e122a1411ca0ac941eb192ede51feebe&out=json).
I am also successfully connected to Bloomberg's API through R, so R is able to access the network.
Since the API is working and not blocked by my company's firewall, and R is in fact able to connect to the Internet, I have no clue what's going wrong.
The script works fine on my home computer, but on my office computer it is unsuccessful. So I gather it is a network issue, but if somebody could point me in any direction as to what the problem might be, I would be grateful (my IT department couldn't help).
library(XML)
api.key = "e122a1411ca0ac941eb192ede51feebe"
series.id = "PET.MCREXUS1.M"
my.url = paste("http://api.eia.gov/series?series_id=", series.id,
               "&api_key=", api.key, "&out=xml", sep = "")
doc = xmlParse(file = my.url, isURL = TRUE)  # yields error
Error msg:
Error: 1: No such file or directory2: failed to load external entity "http://api.eia.gov/series?series_id=PET.MCREXUS1.M&api_key=e122a1411ca0ac941eb192ede51feebe&out=json"
I tried some other methods like read_xml() from the xml2 package, but this gives a "could not resolve host" error.
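One direction worth checking on an office network is whether traffic has to go through a proxy. A minimal sketch with httr, where proxy.example.com and 8080 are placeholder values to get from IT:

library(httr)
# route all subsequent httr requests through the corporate proxy (placeholders below)
set_config(use_proxy(url = "http://proxy.example.com", port = 8080))
res <- GET(my.url)  # my.url as constructed in the script above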
To get XML, change the &out parameter in your URL to xml:
my.url = paste("http://api.eia.gov/series?series_id=", series.id,
               "&api_key=", api.key, "&out=xml", sep = "")
res <- httr::GET(my.url)
xml2::read_xml(res)
Or:
res <- httr::GET(my.url)
XML::xmlParse(httr::content(res, "text"))
or this:
xml2::read_xml(httr::content(res, "text"))
Otherwise, with the URL as in the post (i.e. &out=json):
res <- httr::GET(my.url)
jsonlite::fromJSON(httr::content(res, "text"))
Please note that this answer simply provides a way to get the data; whether it is in the desired form is opinion-based and up to whoever is processing it.
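To go one step further and pull the observations out of the JSON result, something like this may work (a sketch only; the $series$data path is an assumption about the EIA v1 response layout):

res <- httr::GET(my.url)  # with &out=json
j <- jsonlite::fromJSON(httr::content(res, "text"))
# under the assumed layout, each series carries its (date, value) pairs in a list column
head(j$series$data[[1]])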
If it does not have to be XML output, you can also use the new eia package. (Disclaimer: I'm the author.)
Using your example:
remotes::install_github("leonawicz/eia")
library(eia)
x <- eia_series("PET.MCREXUS1.M")
This assumes your key is set globally (e.g., in .Renviron or previously in your R session with eia_set_key). But you can also pass it directly to the function call above by adding key = "yourkeyhere".
The result returned is a tidyverse-style data frame, one row per series ID, including a data list column that contains the data frame for each time series (it can be unnested with tidyr::unnest if desired, as sketched below).
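A minimal sketch of that unnesting step (tidyr is assumed to be installed; unnest() is not part of eia itself):

library(eia)
library(tidyr)

x <- eia_series("PET.MCREXUS1.M")  # assumes the key is already set
d <- unnest(x, data)  # one row per observation instead of one row per series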
Alternatively, if you set the argument tidy = FALSE, it will return the list result of jsonlite::fromJSON without the "tidy" processing.
Finally, if you set tidy = NA, no processing is done at all and you get the original JSON string output for those who intend to pass the raw output to other canned code or software. The package does not provide XML output, however.
There are more comprehensive examples and vignettes at the eia package website I created.
Problem
I am given a long list of specific variable codes for the DP05 table, in the Census Bureau format. For instance:
target_dp05_vars = c(perc_white  = "HC03_VC53",
                     perc_black  = "HC03_VC55",
                     perc_native = "HC03_VC56")
Since tidycensus uses its own variable naming convention, I can't use the above easily. How do I easily crosswalk to the tidycensus definition?
Temporary solution
In the meantime, I've downloaded the Bureau file manually and eliminated rows with HC02 and HC04 prefixes to match with tidycensus, creating an internal crosswalk (it's at least positionally correct), but it's tedious.
I'd love to just feed those HCs as a named vector into get_acs() and perhaps just specify the table as DP05.
tidycensus doesn't use its own variable naming convention - it uses variable IDs as specified by the Census API. For example, see https://api.census.gov/data/2017/acs/acs5/profile/variables.html, which is accessible in R with:
library(tidycensus)
dp17 <- load_variables(2017, "acs5/profile", cache = TRUE)
The IDs you've provided appear to be FactFinder codes.
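As a sketch of crosswalking by label text instead (the grepl() pattern is just an example, not a vetted mapping):

library(tidycensus)

dp17 <- load_variables(2017, "acs5/profile", cache = TRUE)
# browse candidate 'percent white' variables by their labels
subset(dp17, grepl("White", label, ignore.case = TRUE))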
If you want the full DP05 table in one tidycensus call, you can do the following (e.g. for counties in New York) with tidycensus 0.9:
dp05 <- get_acs(geography = "county",
                table = "DP05",
                state = "NY")
Mappings of variable IDs to their meanings are in turn available with load_variables().
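For instance, a minimal sketch of attaching those labels to the table pull (the join columns variable and name reflect tidycensus output; treat the join itself as an assumption):

library(tidycensus)
library(dplyr)

dp17 <- load_variables(2017, "acs5/profile", cache = TRUE)
dp05 <- get_acs(geography = "county", table = "DP05", state = "NY")
dp05_labeled <- left_join(dp05, dp17, by = c("variable" = "name"))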
Note: I am getting intermittent server errors with these calls from the API, which may be due to the government shutdown. If it doesn't work at first, try again.
I am trying to batch geocode a group of addresses through the US Census Geocoder: http://geocoding.geo.census.gov/geocoder/
I have found this question:
Posting to and Receiving data from API using httr in R
and Hadley's suggestion there works perfectly to send my data frame to the API and get the geocoded addresses back. The problem I am running into is how to get the returned data back into a data frame. I would have commented on his response there, but since this is a new account I am not able to comment yet.
So my code is as follows:
library(httr)

req <- POST("http://geocoding.geo.census.gov/geocoder/geographies/addressbatch",
            body = list(
              addressFile = upload_file("mydata.csv"),
              benchmark = "Public_AR_Census2010",
              vintage = "Census2010_Census2010"
            ),
            encode = "multipart",
            verbose())
stop_for_status(req)
content(req)
When I run content(req), I get data that looks like this:
"946\",\"123 MY STREET, ANYTOWN, TX,
99999\",\"Match\",\"Non_Exact\",\"123 MY STREET, ANYTOWN, TX,
99999\",\"-75.43486,80.423775\",\"95495654\",\"L\",\"99\",\"999\",\"021999\",\"3
005\"\n\"333\",\"456 MY STREET, ANYTOWN, TX,
99999\",\"Match\",\"Exact\",\"456 MY STREET, ANYTOWN, TX,
99999\",\"-75.38545,80.383747\",\"6546542\",\"R\",\"99\",\"999\",\"021999\",\"3002\"\n\
I've tried using the jsonlite approach mentioned here: Successfully coercing paginated JSON object to R dataframe
as well as googling httr/content-to-data-frame, and haven't had any luck. The closest I have come to getting what I want is using
cat(content(req, "text"), "\n")
which prints results that look like a CSV I could use as a data frame:
"476","123 MY STREET, ANYTOWN, TX, 99999","Match","Exact",
"123 MY STREET, ANYTOWN, TX,
99999","-75.438644,80.426025","654651321","L","99","999","0219999","3013"
But I was also unable to find any help on getting the results of cat() into a data frame, as I believe that function only prints its arguments.
When I use a browser and upload a CSV, I get a CSV back that has the following columns:
RowID, Address, Match, MatchType, MatchedAddress, Lat, Long, StreetSide, State, County, Tract, Block
I would prefer to do this all through R, so my end result needs to be a data frame with those columns. The data is there in content(req); I just haven't figured out how to get it into a data frame.
Thanks for the help!
Use textConnection() to make it a one-liner:
df <- read.csv(textConnection(content(req, 'text')), header = FALSE)
(The geocoder does not return a header row, so header = FALSE keeps the first record from being read as column names.)
Perhaps now, over 6 months later, this question has been resolved. But in case others have the same issue:
The problem is that your column list has two separate headers for the coordinates, which actually come back as a single quoted field, and it is missing a header for the extra field that follows them. And you can't use the ones provided by the Census Bureau, because they do not provide a complete header row for all variables. First send the output to a CSV file:
cat(content(req, "text"), file="reqoutput.csv")
Then read it back in as a dataframe, providing your own header row:
reqdata <- read.csv(file = "reqoutput.csv", skip = 1,
                    col.names = c('RowID', 'Address', 'Match', 'MatchType',
                                  'MatchedAddress', 'LongLat', 'thing',
                                  'Streetside', 'State', 'County', 'Tract',
                                  'Block'))
In your example output, note that the Census Bureau provides the coordinates as one field in double quotes, and it's longitude followed by latitude.
After the coordinates there is a nine-digit string of numbers; I don't know what that is, so I called it 'thing'.
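A minimal sketch of splitting that combined field into numeric columns afterwards (LongLat being the name chosen in the read.csv call above):

# split the "longitude,latitude" field into two numeric columns
coords <- do.call(rbind, strsplit(as.character(reqdata$LongLat), ",", fixed = TRUE))
reqdata$Longitude <- as.numeric(coords[, 1])
reqdata$Latitude  <- as.numeric(coords[, 2])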
I'd like to join a set of related ways which together give a district's boundary.
I tried the following but got stuck:
require(osmar)
require(XML)

# a set of OpenStreetMap ways (lines) related as given by a relation
# (if connected, these ways represent the boundary of a political
# district in Tyrol/Austria)
myxml <- xmlParse("http://api.openstreetmap.org/api/0.6/relation/85647")

# extract the way ids from the corresponding xml nodes:
els <- getNodeSet(myxml, "//member[@ref]")
ways <- as.numeric(sapply(els, function(el) xmlGetAttr(el, "ref")))

# now I try to get one of those ways as an osmar object and plot it,
# which throws an error:
plot_ways(get_osm(way(ways[1])))
Apparently there's a bounding box missing, but I don't know how to assign it to this sort of object. If I get this problem resolved, I'd like to make one polygon out of the lines/ways.
The author of the package was so kind as to provide info that was missing from the current documentation:
the argument all = TRUE to get_osm() was simply missing; with all = TRUE, all related elements are retrieved.
To get the desired district boundary, the following code applies:
District_Boundary <- get_osm(relation(85647), all = TRUE)
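From here, a minimal sketch of checking the result and moving toward a single polygon (as_sp() is osmar's converter to sp objects; treating "lines" as the right intermediate step is an assumption):

library(osmar)

District_Boundary <- get_osm(relation(85647), all = TRUE)
plot_ways(District_Boundary)  # the member ways now come with their coordinates

# convert the ways to sp lines as a step toward building one polygon
boundary_lines <- as_sp(District_Boundary, "lines")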