How can I extract the zip code from the ggmap output? - r

structure(list(trip_count = 1:10, pickup_longitude = c(-73.964096,
-73.989037, -73.934998, -73.93409, -73.998222, -74.004478, -73.994881,
-73.955917, -73.993607, -73.948265), pickup_latitude = c(40.764141,
40.760208, 40.746693, 40.715908, 40.750809, 40.741501, 40.74033,
40.776054, 40.758625, 40.778515)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
So I am trying to recode the longitude and latitude data into NYC ZIP codes.
With revgeocode from ggmap I already get some results:
res <- revgeocode(c(test$pickup_longitude[1],test$pickup_latitude[1]), output = "all")
However, I can apply this procedure to only one location at a time. Is there a way I can get the results for every location at once, like
res <- revgeocode(c(test$pickup_longitude,test$pickup_latitude), output = "all")
How can I extract the ZIP code from the ggmap results so that I can create a new column in the data frame with the corresponding ZIP code? Somehow the ZIP code is stored in a very strange way (see picture). How can I access this information in the console?
The solution does not have to use ggmap; maybe there is a different approach to get the ZIP codes from the longitude and latitude data?
Thanks!

I don't have an API key, but try something like the following:
library(dplyr)

# my_func is a stand-in that mimics revgeocode's deeply nested output;
# swap in revgeocode once you have an API key.
my_func <- function(longlat, output) {
  list(list(list(c(vector("list", 7), list(12345L)))))
}

test %>%
  rowwise() %>%
  mutate(zip = list(my_func(c(pickup_longitude, pickup_latitude), output = "all"))) %>%
  mutate(zip = zip[[1]][[1]],
         zip = zip[[8]])
Replace my_func with revgeocode. You'll have to figure out exactly how to pick out the ZIP code from the output, but you can try something like the above.
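To then pull the ZIP out of the real thing: I can't verify ggmap's exact return shape here, but with output = "all" the result should follow the Google Maps Geocoding response, where the ZIP sits in an address_components entry whose types contain "postal_code". A hedged sketch (get_zip is a hypothetical helper name, not a ggmap function):
library(ggmap)
library(purrr)
# Hypothetical helper: pulls the postal code out of one reverse-geocoding
# response, assuming the usual Google response shape:
# res$results[[1]]$address_components, ZIP where types include "postal_code".
get_zip <- function(lon, lat) {
  res <- revgeocode(c(lon, lat), output = "all")
  comps <- res$results[[1]]$address_components
  zip <- keep(comps, ~ "postal_code" %in% .x$types)
  if (length(zip) == 0) NA_character_ else zip[[1]]$long_name
}
# One API call per row; fine for 10 rows, mind rate limits for larger data.
test$zip <- map2_chr(test$pickup_longitude, test$pickup_latitude, get_zip)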

Related

Prediction on time series analysis using ARIMA in R

I am new to programming and am attempting to create a prediction model for multiple articles.
Unfortunately, using Excel or similar software is not possible for this task. Therefore, I have installed RStudio to solve this problem. My goal is to make an 18-month prediction for each article in my dataset using an ARIMA model.
However, I am currently facing an issue with the format of my data frame. Specifically, I am unsure of how my CSV should be structured to be read by my code.
I have attached an image of my current dataset in CSV format: https://i.stack.imgur.com/AQJx1.png
Here is my dput(sales_data):
structure(list(X.Article.1.Article.2.Article.3 = c("janv-19;42;49;55", "févr-19;56;58;38", "mars-19;55;59;76")), class = "data.frame", row.names = c(NA, -3L))
I have also provided the code I have constructed so far with the help of blogs and websites:
library(forecast)
library(reshape2)
sales_data <- read.csv("sales_data.csv", header = TRUE)
sales_data_long <- reshape2::melt(sales_data, id.vars = "Code Article")
for (i in 1:nrow(sales_data_long)) {
  sales_data_article <- subset(sales_data_long, sales_data_long$`Code Article` == sales_data_long[i, "Code Article"])
  sales_ts <- ts(sales_data_article$value, start = c(2010, 6), frequency = 12)
  arima_fit <- auto.arima(sales_ts) # the original line was cut off at "auto"; auto.arima() is presumably what was meant
  arima_forecast <- forecast(arima_fit, h = 18)
  print(arima_forecast)
  cat("Article:", sales_data_long[i, "Code Article"], "\n") # print() cannot take two arguments like this
}
With this code, RStudio gives me the following error: "Error: id variables not found in data: Code Article"
Currently, I am not interested in generating any plots or outputs. My main focus is on identifying the appropriate format for my data.
Do I need to modify my CSV file and separate each column using "," or ";"? Or can I keep my data in its current format and make adjustments in the code instead?
Added the dput output, as per jrcalabrese's request.
Swapped to tidyr, the replacement for reshape2.
Used pivot_longer.
This no longer throws the error that reshape2::melt was giving.
It doesn't matter so much what the CSV structure is; your structure was fine.
Hope this helps! :-)
library(tidyr)
sales_data <- structure(list(var1 = c("Article 1", "Article 2", "Article 3"),
                             `janv-19` = c(42, 56, 55),
                             `fev-19` = c(49, 58, 59),
                             `mars-19` = c(55, 38, 76)),
                        row.names = c(NA, 3L), class = "data.frame")
sales_data_long <- sales_data |> pivot_longer(!var1,
                                              names_to = "month",
                                              values_to = "count")
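From there, the per-article loop the question aims at could look roughly like this. This is only a sketch: the three months in the sample are far too few for auto.arima() to give a sensible 18-month forecast, and start = c(2019, 1) simply assumes the series begins at "janv-19".
library(forecast)
for (article in unique(sales_data_long$var1)) {
  counts <- sales_data_long$count[sales_data_long$var1 == article]
  sales_ts <- ts(counts, start = c(2019, 1), frequency = 12) # assumed start date
  arima_fit <- auto.arima(sales_ts)
  print(forecast(arima_fit, h = 18))
  cat("Article:", article, "\n")
}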

R Highcharter map from customized shapefile

I am having trouble importing and joining a GeoJSON map to some data using the highcharter library. I am trying to use a slimmed-down version of an sf dataset that I got using the tidycensus package, which I then uploaded to https://mapshaper.org/ to reduce the size of the file by thinning out the polygons. After thinning, I exported as GeoJSON and imported into R.
Here is an example. First I download the data using tidycensus and create two datasets, one for geometry and one for the attribute of interest, here median family income. Then I export the geometry data so that I can feed it into mapshaper for reduction.
# start with an example for one state
library(tidycensus)
library(dplyr)
library(sf)

## pull geometry data for one state
md_data <- get_acs(geography = "tract",
                   state = "MD",
                   variables = "B19113_001",
                   geometry = TRUE,
                   key = Sys.getenv("CENSUS_API_KEY"))

# data set of just GEOID and median family income for use in mapping
md_mfi <- as.data.frame(md_data) %>%
  mutate(median_family_income = case_when(is.na(estimate) ~ 0,
                                          TRUE ~ estimate)) %>%
  select(GEOID, median_family_income)

# slim down to just the GEOID and the geometry data
md_tracts <- md_data %>%
  select(GEOID, geometry)

st_write(md_tracts, "U:/M1JPW00/GeoSpatial/census_tracts/acs_carto_2016/md_carto_tracts.shp")
After reformatting in mapshaper, I import it back into R:
md_map_json <- jsonlite::fromJSON(txt = "FILEPATH/md_carto_tracts.json",simplifyVector = FALSE)
md_map_json <- geojsonio::as.json(md_map_json)
And then I try to build a map based on an example from the highcharter docs here:
> class(md_map_json)
[1] "json" "geo_json"
> head(md_mfi)
GEOID median_family_income
1 24001000100 54375
2 24001000200 57174
3 24001000300 48362
4 24001000400 52038
5 24001000500 46174
6 24001000600 49784
highchart(type = "map") %>%
  hc_add_series(mapData = md_map_json,
                data = list_parse(md_mfi),
                joinBy = "GEOID",
                value = "median_family_income",
                name = "Median Family Income")
The map actually renders and the census tracts are colored solid blue, but the series data doesn't seem to join successfully, whether or not I use list_parse.
I had the same problem, asked here: Make a choropleth from a non-highmap-collection map. Nobody responded (I know!), so I finally got to a solution that I think should work for you too:
# Work with the map you get until this step:
md_map_json <- jsonlite::fromJSON(txt = "FILEPATH/md_carto_tracts.json", simplifyVector = FALSE)

# This part is unnecessary:
# md_map_json <- geojsonio::as.json(md_map_json)

# Then, write your map like this:
highchart() %>%
  hc_add_series_map(md_map_json, md_mfi, value = "median_family_income", joinBy = "GEOID")
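If the polygons still render in a single solid colour after the join, adding a colour axis usually makes the gradient visible. hc_colorAxis is part of highcharter; the colours below are just placeholder choices:
highchart() %>%
  hc_add_series_map(md_map_json, md_mfi,
                    value = "median_family_income",
                    joinBy = "GEOID") %>%
  hc_colorAxis(minColor = "#E6F2FF", maxColor = "#003366") # example colour range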

Iterating through values in R

I'm new-ish to R and am having some trouble iterating through values.
For context: I have data on 60 people over time, and each person has his/her own dataset in a folder (I received the data with id #s 00 to 59). For each person, there are 2 values I need: time of response and picture response given (a number from 1 to 16). I need to convert this data from wide to long format for each person, and then eventually append all of the datasets together.
My problem is that I'm having trouble writing a loop that will do this for each person (i.e. each dataset). Here's the code I have so far:
pam[x] <- fromJSON(file = "PAM_u[x].json")
pam[x]df <- as.data.frame(pam[x])

# Creating long dataframe for times
pam[x]_long_times <- gather(
  select(pam[x]df, starts_with("resp")),
  key = "time",
  value = "resp_times"
)

# Creating long dataframe for pic_nums (affect response)
pam[x]_long_pics <- gather(
  select(pam[x]df, starts_with("pic")),
  key = "picture",
  value = "pic_num"
)

# Combining the two long dataframes so that I have one df per person
pam[x]_long_fin <- bind_cols(pam[x]_long_times, pam[x]_long_pics) %>%
  select(resp_times, pic_num) %>%
  add_column(id = [x], .before = 1)
If you replace [x] in the above code with a person's id# (e.g. 00), the code will run and will give me the dataframe I want for that person. Any advice on how to do this so I can get all 60 people done?
Thanks!
EDIT
So, using library(jsonlite) rather than library(rjson) set up the files in the format I needed without having to do all of the manipulation. Thanks all for the responses, but the solution was apparently much easier than I'd thought.
I don't know the structure of your JSON files, but if they all sit in one folder, try this:
library(jsonlite)

# setup - read files
json_folder <- "U:/test/" # adjust your folder here
files <- list.files(path = json_folder, pattern = "\\.json$", full.names = TRUE) # full.names so the paths resolve from any working directory

# import data
pam <- NULL
pam_df <- NULL
for (i in seq_along(files)) {
  pam[[i]] <- fromJSON(files[i]) # jsonlite's fromJSON takes the path as its first argument, not file =
  pam_df[[i]] <- as.data.frame(pam[[i]])
}
This reads all JSON files in the folder, building a list of length 60.
Then you sequence along that vector and read each file.
I assume at the end you can do bind_rows, or add your own code inside the for loop. But remember to set the data frames to NULL before the loop starts, e.g. pam_long_pics <- NULL (see the sketch below).
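The combining step could be as simple as this (a sketch, assuming each per-person data frame ends up with the same columns):
library(dplyr)
# Stack the 60 per-person data frames into one long data frame.
pam_all <- bind_rows(pam_df)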
Hope that helped? Let me know.
Something along these lines could work:
# library("tidyverse")
# library("jsonlite")
file_list <- list.files(pattern = "*.json", full.names = TRUE)

Data_raw <- tibble(File_name = file_list) %>%
  mutate(File_contents = map(File_name, fromJSON)) %>% # this should result in a nested tibble
  mutate(File_contents = map(File_contents, as_tibble))

Data_raw %>%
  mutate(Long_times = map(File_contents, ~ gather(.x, key = "time", value = "resp_times", starts_with("resp"))), # note the .x: gather needs the data as its first argument
         Long_pics = map(File_contents, ~ gather(.x, key = "picture", value = "pic_num", starts_with("pic")))) %>%
  unnest(Long_times, Long_pics) %>%
  select(File_name, resp_times, pic_num)
EDIT: you may or may not need to include as_tibble() after reading in the JSON files, depending on what your data looks like.

Tmap Error - replacement has [x] rows, data has [y]

Short version: when executing the following command qtm(countries, "freq") I get the following error message:
Error in `$<-.data.frame`(`*tmp*`, "SHAPE_AREAS", value = c(652270.070308042, : replacement has 177 rows, data has 210
Disclaimer: I have already checked other answers like this one or this one as well as this explanation that states that usually this error comes from misspelling objects, but could not find an answer to my problem.
Reproducible code:
library(rgdal)
library(dplyr)
library(tmap)

# Load GeoJSON file with countries.
countries = readOGR(dsn = "https://gist.githubusercontent.com/ccamara/fc26d8bb7e777488b446fbaad1e6ea63/raw/a6f69b6c3b4a75b02858e966b9d36c85982cbd32/countries.geojson")

# Load dataframe.
df = read.csv("https://gist.githubusercontent.com/ccamara/fc26d8bb7e777488b446fbaad1e6ea63/raw/754ea37e4aba1b7ed88eaebd2c75fd4afcc54c51/sample-dataframe.csv")

countries@data = left_join(countries@data, df, by = c("iso_a2" = "country_code"))

qtm(countries, "freq")
Your error is in the data - the code works fine.
What is happening right now is:
1) you attempt a 1:1 match,
2) but your .csv data contains several rows per id to match,
3) so the left join multiplies each left-hand row by all of its matches on the right-hand side.
To avoid this issue you have to aggregate your data one more time, like:
library(dplyr)
df_unique = df %>%
  group_by(country_code, country_name) %>%
  summarize(total = sum(total), freq = sum(freq))

# after that you should be fine - as long as just adding up the data is okay
countries@data = left_join(countries@data, df_unique, # join the aggregated df_unique, not the original df
                           by = c("iso_a2" = "country_code"))

qtm(countries, "freq")
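To confirm that duplicated keys really are the culprit, you can count rows per id before joining (a quick check, assuming df as loaded in the question):
library(dplyr)
# Any country_code appearing more than once will multiply rows in the join.
df %>% count(country_code) %>% filter(n > 1)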

Need help writing data from a table in R for unique values using a loop

Trying to figure out why, when I run this code, all the information from the columns is being written to the first file only. What I want is for only the data unique to each MO number to be written out. I believe the problem is in the third line, but I am not sure how to divide the data by each unique number.
Thanks for the help,
for (i in 1:nrow(MOs_InterestDF1)) {
  MO = MOs_InterestDF1[i, 1]
  df = MOs_Interest[MOs_Interest$MO_NUMBER == MO, c("ITEM_NUMBER", "OPER_NO", "OPER_DESC", "STDRUNHRS", "ACTRUNHRS", "Difference", "Sum")]
  submit.df <- data.frame(df)
  filename = paste("Variance", "Report", MO, ".csv", sep = "")
  write.csv(submit.df, file = filename, row.names = FALSE)
}
If you are trying to write out a separate csv for each unique MO number, then something like this may work to accomplish that.
unique.mos <- unique(MOs_Interest$MO_NUMBER)
for (mo in unique.mos) {
  submit.df <- MOs_Interest[MOs_Interest$MO_NUMBER == mo, c("ITEM_NUMBER", "OPER_NO", "OPER_DESC", "STDRUNHRS", "ACTRUNHRS", "Difference", "Sum")]
  filename <- paste("Variance", "Report", mo, ".csv", sep = "")
  write.csv(submit.df, file = filename, row.names = FALSE)
}
It's hard to answer fully without example data (what are the columns of MOs_InterestDF1?) but I think your issue is in the df line. Are you trying to subset the dataframe to only the data matching the MO? If so, try which as in df = MOs_Interest[which(MOs_Interest$MO_NUMBER == MO),].
I wasn't sure if you actually had two separate dfs (MOs_Interest and MOs_InterestDF1); if not, make sure the df line points to the correct data frame.
I tried to create some simplified sample data:
MOs_InterestDF1 <- data.frame("MO_NUMBER" = c(1, 2, 3), "Item_No" = c(142, 423, 214), "Desc" = c("Plate", "Book", "Table"))
for (i in 1:nrow(MOs_InterestDF1)) {
  MO = MOs_InterestDF1[i, 1]
  mydf = data.frame(MOs_InterestDF1[which(MOs_InterestDF1$MO_NUMBER == MO), ])
  filename = paste("This is number ", MO, ".csv", sep = "")
  write.csv(mydf, file = filename, row.names = FALSE)
}
This wrote out three different csv files, each with exactly one row of data. For example, "This is number 1.csv" had the following data:
MO_NUMBER Item_No Desc
1 142 Plate
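As a variation on the same idea, base R's split() gives one data frame per unique MO number without any manual indexing (a sketch on the same sample data):
# split() yields one data frame per unique MO_NUMBER; write each one out.
for (piece in split(MOs_InterestDF1, MOs_InterestDF1$MO_NUMBER)) {
  filename <- paste0("This is number ", piece$MO_NUMBER[1], ".csv")
  write.csv(piece, file = filename, row.names = FALSE)
}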