Write data frame to multiple CSVs in R

I have the data frame below, which contains information about different states.
long=c(-106.61291,-106.61291,-106.61291,-81.97224,-81.97224,-81.97224,-84.4277,-84.4277,-84.4277)
lat=c(35.04333,35.04333,35.04333,33.37378,33.37378,33.37378,33.64073,33.64073,33.64073)
city=c("Albuquerque","Albuquerque","Albuquerque","Augusta","Augusta","Augusta","Atlanta","Atlanta","Atlanta")
date=c("2017-08-22","2017-08-23","2017-09-24","2017-09-28","2017-10-24","2017-09-22","2017-11-12","2017-010-14","2017-09-03")
value=c(12,10.8,18.3,12.4,43,21,12,32.1,14)
df<-data.frame(long,lat,city,date,value)
Problem: I want to write each city's information to its own CSV, and each CSV should look like the output below.
Final output:
Albuquerque.csv
long lat city date value
1 -106.6129 35.04333 Albuquerque 2017-08-22 12.0
2 -106.6129 35.04333 Albuquerque 2017-08-23 10.8
3 -106.6129 35.04333 Albuquerque 2017-09-24 18.3
Augusta.csv
long lat city date value
1 -81.97224 33.37378 Augusta 2017-09-28 12.4
2 -81.97224 33.37378 Augusta 2017-10-24 43.0
3 -81.97224 33.37378 Augusta 2017-09-22 21.0
Atlanta.csv
long lat city date value
1 -84.4277 33.64073 Atlanta 2017-11-12 12.0
2 -84.4277 33.64073 Atlanta 2017-010-14 32.1
3 -84.4277 33.64073 Atlanta 2017-09-03 14.0
Thanks in advance!

# Split the data frame by city
split_df <- split(df, list(df$city))
# Write out a separate CSV for each city
for (city in names(split_df)) {
  write.csv(split_df[[city]], paste0(city, ".csv"))
}

long=c(-106.61291,-106.61291,-106.61291,-81.97224,-81.97224,-81.97224,-84.4277,-84.4277,-84.4277)
lat=c(35.04333,35.04333,35.04333,33.37378,33.37378,33.37378,33.64073,33.64073,33.64073)
city=c("Albuquerque","Albuquerque","Albuquerque","Augusta","Augusta","Augusta","Atlanta","Atlanta","Atlanta")
date=c("2017-08-22","2017-08-23","2017-09-24","2017-09-28","2017-10-24","2017-09-22","2017-11-12","2017-010-14","2017-09-03")
value=c(12,10.8,18.3,12.4,43,21,12,32.1,14)
df<-data.frame(long,lat,city,date,value)
dflist <- split(df , f = df$city)
sapply(names(dflist),
       function(x) write.csv(dflist[[x]], file = paste(x, "csv", sep = ".")))

There are a few different ways of doing this, but a very quick approach that handles all your cities at once is to take advantage of the apply family of functions in base R, specifically lapply.
long=c(-106.61291,-106.61291,-106.61291,-81.97224,-81.97224,-81.97224,-84.4277,-84.4277,-84.4277)
lat=c(35.04333,35.04333,35.04333,33.37378,33.37378,33.37378,33.64073,33.64073,33.64073)
city=c("Albuquerque","Albuquerque","Albuquerque","Augusta","Augusta","Augusta","Atlanta","Atlanta","Atlanta")
date=c("2017-08-22","2017-08-23","2017-09-24","2017-09-28","2017-10-24","2017-09-22","2017-11-12","2017-010-14","2017-09-03")
value=c(12,10.8,18.3,12.4,43,21,12,32.1,14)
df<-data.frame(long,lat,city,date,value)
# Create a convenience function to subset the data for one city and export it to CSV
split_into_csv <- function(x) {
  tmp <- df[df$city == x, ]
  write.csv(tmp, file = paste0(x, ".csv"))
}
# Apply split_into_csv over the city names with lapply.
# Note: levels(df$city) only works while city is a factor (the default before
# R 4.0); unique(df$city) covers both factor and character columns.
lapply(unique(df$city), split_into_csv)
# Check the output in the working directory
dir()
[1] "Albuquerque.csv" "Atlanta.csv" "Augusta.csv"

You can do this with base R or with the dplyr package.
dplyr way (load dplyr first so the %>% pipe is available):
library(dplyr)
dplyr::filter(df, city == 'Albuquerque') %>% write.csv(file = 'Albuquerque.csv', row.names = FALSE)
dplyr::filter(df, city == 'Augusta') %>% write.csv(file = 'Augusta.csv', row.names = FALSE)
dplyr::filter(df, city == 'Atlanta') %>% write.csv(file = 'Atlanta.csv', row.names = FALSE)
base R:
write.csv(df[df$city == 'Albuquerque', ], file = 'Albuquerque.csv', row.names = FALSE)
write.csv(df[df$city == 'Augusta', ], file = 'Augusta.csv', row.names = FALSE)
write.csv(df[df$city == 'Atlanta', ], file = 'Atlanta.csv', row.names = FALSE)
You can use a for loop if you start getting more cities:
for (city in c('Albuquerque', 'Augusta', 'Atlanta')) {
  write.csv(df[df$city == city, ], file = paste0(city, '.csv'), row.names = FALSE)
}
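As a side note, if you are already in the tidyverse, purrr::iwalk offers a compact variant of the split-then-write pattern from the answers above. This is a sketch assuming the purrr package is installed; it is not from the original answers:
library(purrr)

# split() returns a named list (one data frame per city); iwalk() passes each
# element as .x and its name as .y, so the city name becomes the file name.
df %>%
  split(.$city) %>%
  iwalk(~ write.csv(.x, paste0(.y, ".csv"), row.names = FALSE))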

Related

R - Merging two dataframes by text

I have two datasets which I want to merge:
df1 <- data.frame( title =
c("residence mozart",
"les hesperides auteuil mirabeau",
"chaillot",
"jouvenet",
"retraite dosne"))
df2 <- data.frame(title = c("terrasses mozart", "chaillot",
"villa jules janin", "retraites dosne"))
And I would like to have something like this:
1 residence mozart NA (or terrasses mozart)
2 les hesperides auteuil mirabeau NA
3 chaillot chaillot
4 jouvenet NA
5 retraite dosne retraites dosne
Here is what I did:
x = data.frame(title_df2 = matrix(ncol = 1, nrow = nrow(df1)))
for (i in nbr){
  x[i, ] <- grep(df1$title[i], df2$title, value = T)
}
It does not work at all! Even though grep(df1$title[3], df2$title, value = T) works and returns "chaillot"!
If I understand correctly
df1 <- data.frame( title =
c("residence mozart",
"les hesperides auteuil mirabeau",
"chaillot",
"jouvenet",
"retraite dosne"))
df2 <- data.frame(title = c("terrasses mozart", "chaillot",
"villa jules janin", "retraites dosne"))
library(dplyr)
library(fuzzyjoin)
stringdist_left_join(x = df1, y = df2, method = "jw", distance_col = "d") %>%
  filter(d < 0.25) %>%
  right_join(df1, by = c("title.x" = "title"))
#> Joining by: "title"
#> title.x title.y d
#> 1 residence mozart terrasses mozart 0.23863636
#> 2 chaillot chaillot 0.00000000
#> 3 retraite dosne retraites dosne 0.09206349
#> 4 les hesperides auteuil mirabeau <NA> NA
#> 5 jouvenet <NA> NA
Created on 2021-04-19 by the reprex package (v2.0.0)
The issue is that grep returns a vector of length 0 when there is no match.
grep('a', 'hello', value = TRUE)
#character(0)
If we want to make use of the same for loop, make an adjustment in the code to return NA wherever there is no match:
nbr <- seq_len(nrow(df1))
for (i in nbr){
  x[i, ] <- c(grep(df1$title[i], df2$title, value = TRUE), NA_character_)[1]
}
Output:
x
# title_df2
#1 <NA>
#2 <NA>
#3 chaillot
#4 <NA>
#5 <NA>
You could use agrep, which does approximate (fuzzy) string matching:
a <- Vectorize(agrep, "pattern")(df1$title, df2$title, value = TRUE)
is.na(a) <- lengths(a) == 0
cbind(df1, df2_title = unlist(a, use.names = FALSE))
title df2_title
1 residence mozart <NA>
2 les hesperides auteuil mirabeau <NA>
3 chaillot chaillot
4 jouvenet <NA>
5 retraite dosne retraites dosne
To achieve your goal, you need a match on each word of the strings within df1's title.
As used in your example, grep returns a match only when the entire df1 title occurs in a df2 title.
To get around that, you'll need to grep on the individual words of df1 that may also be contained in df2. This can be achieved with an "or" condition over the full words contained in each string.
nbr <- 1:nrow(x)
for (i in nbr){
  # Build a regex that checks whether one of the words contained in the df1
  # title is also in df2; the \\b ... \\b escapes make sure there is a full
  # match on the single word.
  pattern <- paste("\\b", unlist(strsplit(as.character(df1$title[i]), " ")),
                   "\\b", collapse = "|", sep = "")
  # grep on the constructed regex expression
  fitInDataFrame <- grep(pattern, as.character(df2$title), value = T)
  x[i, ] <- ifelse(length(fitInDataFrame) == 0, NA, fitInDataFrame)
}
Here is the output:
> x
title_df2
1 terrasses mozart
2 <NA>
3 chaillot
4 <NA>
5 retraites dosne
You can do a left_join(df1, df2, by = c('title' = 'title'), keep = TRUE), specifying keep = TRUE so it doesn't drop df2's join column.
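A minimal sketch of that join (keep = TRUE for an equality join needs a reasonably recent dplyr; since this is an exact match, only "chaillot" pairs up):
library(dplyr)

# keep = TRUE retains the join column from both sides as title.x / title.y
left_join(df1, df2, by = c("title" = "title"), keep = TRUE)
#>                           title.x  title.y
#> 1                residence mozart     <NA>
#> 2 les hesperides auteuil mirabeau     <NA>
#> 3                        chaillot chaillot
#> 4                        jouvenet     <NA>
#> 5                  retraite dosne     <NA>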
Or, for this particular case, you could do this:
df1$newcol <- ifelse(df1$title %in% df2$title, df1$title, NA)
This adds a new column to df1, filled by going through each title in df1 and checking whether that title is in df2: if so, that title is written in the new column; if not, NA is written in that row. You could choose to put something else there instead, like:
df1$newcol <- ifelse(df1$title %in% df2$title, 'Title in DF2', 'Not in DF2')

Detect string pattern in dataframe and conditionally fill another in R

I have a dataframe containing text and numeric references, and a vector of words that may appear in the text. What I want is to check for every instance in which a word from words_df appears in text_df$text, and record the word from words_df and the numeric reference from text_df$ref in a new dataframe (edge_df).
text_df <- data.frame(text = c("John went to the shops",
                               "Sarita hates apples",
                               "Wendy doesn't care about this"),
                      ref = c("13.5", "1.9.9", "20.1"))
words_df <- data.frame(word = c("shops", "John", "apples", "Wendy", "this"))
edge_df <- data.frame(ref = NA, word = NA)
The output should look like this:
> edge_df
ref word
1 13.5 shops
2 13.5 John
3 1.9.9 apples
4 20.1 Wendy
5 20.1 this
It isn't very elegant but I thought a for-loop would work, where each word is checked against the text using stringr::str_detect, and if the result is TRUE it would record the word and ref:
for (i in 1:nrow(text_df)) {
  for (j in 1:nrow(words_df)) {
    if (str_detect(text_df$text[i], words_df$word[j]) == TRUE) {
      edge_df$ref <- text_df$ref[i]
      edge_df$word <- words_df$word[j]
    }
  }
}
This did not work, and neither have several variations on this loop. If possible I would rather not use a loop at all, as the dataframes I'm working with have around 1000 rows each and it takes far too long to loop through them. Any fixes to the loop are much appreciated, and bonus points/props if you can do it without a loop at all.
Thank you!
Here is an option with str_extract_all and unnest. We extract the words from the 'text' column into a list column and use unnest to expand the rows.
library(dplyr)
library(stringr)
library(tidyr)
text_df %>%
  transmute(ref, word = str_extract_all(text,
                                        str_c(words_df$word, collapse = "|"))) %>%
  unnest(c(word))
# A tibble: 5 x 2
# ref word
# <chr> <chr>
#1 13.5 John
#2 13.5 shops
#3 1.9.9 apples
#4 20.1 Wendy
#5 20.1 this
Try this tidyverse approach. The key for your issue: you can reshape your data to long format by separating each word in the sentences, then use left_join(). Here is the code (using the data you provided):
library(tidyverse)
#Data
text_df <- data.frame(text = c("John went to the shops", "Sarita hates apples", "Wendy doesn't care about this"),
ref = c("13.5", "1.9.9", "20.1"),stringsAsFactors = F)
words_df <- data.frame(word = c("shops", "John", "apples", "Wendy", "this"),stringsAsFactors = F)
#Join
words_df %>%
  left_join(text_df %>%
              separate_rows(text, sep = ' ') %>%
              rename(word = text))
Output:
word ref
1 shops 13.5
2 John 13.5
3 apples 1.9.9
4 Wendy 20.1
5 this 20.1
Here is a base R option
u <- lapply(text_df$text, function(x)
  words_df$word[sapply(words_df$word, function(y) grepl(y, x))])
edge_df <- data.frame(ref = rep(text_df$ref, lengths(u)), word = unlist(u))
which gives
ref word
1 13.5 shops
2 13.5 John
3 1.9.9 apples
4 20.1 Wendy
5 20.1 this
library(data.table)
words_df <- data.frame(word = c("shops", "John", "apples", "Wendy", "this"))
text_df <- data.frame(text = c("John went to the shops",
"Sarita hates apples", "Wendy doesn't care about this"),
ref = c("13.5", "1.9.9", "20.1"))
setDT(words_df)
setDT(text_df)
First we get our words vector ready.
wordvec <- paste0(words_df[,word],collapse="|")
Now all that is left to do is to check each row for all the words in wordvec:
## > text_df[,.(word=unlist(regmatches(text,gregexpr(wordvec,text)))),ref]
## ref word
## 1: 13.5 John
## 2: 13.5 shops
## 3: 1.9.9 apples
## 4: 20.1 Wendy
## 5: 20.1 this
The functions regmatches and gregexpr return a list containing all the words that match the pattern wordvec.
> regmatches("John went to the shops",gregexpr(wordvec,"John went to the shops"))
##[[1]]
##[1] "John" "shops"
Warning: to format the output quickly I am over-relying on the ref variable and treating its values as ids. If that is not the case, it is best to create an id column and use it in addition to ref. For instance:
text_df[, id := 1:.N][, .(word = unlist(regmatches(text,
                          gregexpr(wordvec, text)))), .(id, ref)]

How to write a function in R to download files and gather the data?

Here are the URLs I have saved (Gapminder spreadsheets), along with the variable names saved in a vector.
if(!file.exists("./data")) {dir.create("./data")}
fileUrls <- c("https://docs.google.com/spreadsheet/pub?key=0AkBd6lyS3EmpdHo5S0J6ekhVOF9QaVhod05QSGV4T3c&output=xlsx",
"https://docs.google.com/spreadsheet/pub?key=phAwcNAVuyj2tPLxKvvnNPA&output=xlsx",
"https://docs.google.com/spreadsheet/pub?key=phAwcNAVuyj0XOoBL_n5tAQ&output=xlsx")
var_names <- c("GDP","life_expectancy", "population")
I want to fill in the function get_clean to download and read in the Excel file from the URL provided, and then put the data in a column with the variable name specified in var_name.
get_clean <- function(url_in, var_name){
}
I can do it as separate code, but I don't know how to write it as a function. Such as:
library(readxl)
library(tidyr)
library(dplyr)
life_expect_url <- fileUrls[[2]]
download.file(life_expect_url, destfile = "./data/tmp.xlsx", mode = "wb")
life_expect <- read_excel("./data/tmp.xlsx")
# change the name of the first variable to country
names(life_expect)[[1]] <- "country"
life_expect <- life_expect %>%
  gather(key = "year",
         value = !!var_names[[2]],
         -country,
         na.rm = TRUE,
         convert = TRUE)
head(life_expect, n = 5)
pop_url <- fileUrls[[3]]
download.file(pop_url, destfile = "./data/tmp.xlsx", mode = "wb")
pop <- read_excel("./data/tmp.xlsx")
# change the name of the first variable to country
names(pop)[[1]] <- "country"
pop <- pop %>%
  gather(key = "year",
         value = !!var_names[[3]],
         -country,
         na.rm = TRUE,
         convert = TRUE)
head(pop, n = 5)
I tried this:
get_clean <- function(url_in, var_name){
  download.file(url_in, destfile = "./data/tmp.xlsx", mode = "wb")
  a <- read_excel("./data/tmp.xlsx")
  names(a)[[1]] <- "country"
  a <- a %>%
    gather(key = "year",
           value = !!var_name,
           -country,
           na.rm = TRUE,
           convert = TRUE)
  a
}
out1 <- get_clean(fileUrls[1],var_names[1])
head(out1)
Is that right? Should I use a for loop?
The result should be like this:
## # A tibble: 6 x 3
## country year GDP
## <chr> <dbl> <dbl>
## 1 Algeria 1960 1280.3848
## 2 Argentina 1960 5251.8768
## 3 Australia 1960 9407.6851
## 4 Austria 1960 7434.1837
## 5 Bahamas 1960 11926.4610
This way, the files are downloaded as temporary files and the final result is a list containing the three datasets.
fileUrls <- c("https://docs.google.com/spreadsheet/pub?key=0AkBd6lyS3EmpdHo5S0J6ekhVOF9QaVhod05QSGV4T3c&output=xlsx",
"https://docs.google.com/spreadsheet/pub?key=phAwcNAVuyj2tPLxKvvnNPA&output=xlsx",
"https://docs.google.com/spreadsheet/pub?key=phAwcNAVuyj0XOoBL_n5tAQ&output=xlsx")
var_names <- c("GDP","life_expectancy", "population")
get_clean <- function(fileUrl, var_name){
  # One temp path per URL; calling tempfile() once and rep()-ing it would
  # give every download the same path.
  tmpfile <- replicate(length(fileUrl), tempfile(fileext = ".xlsx"))
  lapply(seq_along(fileUrl), function(x) {
    link <- fileUrl[x]
    download.file(link, tmpfile[x], mode = "wb")
    file <- readxl::read_excel(tmpfile[x])
    names(file)[1] <- "country"
    file <- file %>%
      # use the var_name argument, not the global var_names
      tidyr::gather(year, !!rlang::sym(var_name[x]), -country,
                    na.rm = TRUE, convert = TRUE)
    file
  })
}
l <- get_clean(fileUrls, var_names)
l[[1]]
[[1]]
# A tibble: 7,988 x 3
country year GDP
<chr> <dbl> <dbl>
1 Algeria 1960 1280.
2 Argentina 1960 5252.
3 Australia 1960 9408.
4 Austria 1960 7434.
5 Bahamas 1960 11926.
6 Bangladesh 1960 255.
7 Barbados 1960 3397.
8 Belgium 1960 7455.
9 Belize 1960 950.
10 Benin 1960 257.
# … with 7,978 more rows
If you want to keep files stored in a specific folder after download, you just need to change the part that builds the path:
get_clean <- function(fileUrl, var_name){
  # build one file path per variable name inside ./data
  filepath <- paste0("./data/", var_name, ".xlsx")
  lapply(seq_along(fileUrl), function(x) {
    link <- fileUrl[x]
    download.file(link, filepath[x], mode = "wb")
    file <- readxl::read_excel(filepath[x])
    names(file)[1] <- "country"
    file <- file %>%
      tidyr::gather(year, !!rlang::sym(var_name[x]), -country,
                    na.rm = TRUE, convert = TRUE)
    file
  })
}
l <- get_clean(fileUrls, var_names)
l[[1]]
Yes, you have to make a loop, but I suggest not using a for loop. Instead, use lapply: it is clean, fast (if correctly built), and does not clutter your environment and RAM with objects created inside the loop.

How to extract cells from an R dataframe and add them as new rows

I have the following dataframe in R:
Names X_1 X_2 X_3 X_4
Name Sagar II Booster
Location India No Discharge Open
Depth 19.5 start End
DOC 3.2 FPL 64
Qunatity 234 SPL 50
Now I want to extract certain cells and their corresponding values in the next cell.
My desired dataframe would be
Names Values
Name Sagar II
Location India
Discharge Open
Depth 19.5
DOC 3.2
FPL 64
SPL 50
How can I do it in R?
A solution from base R.
# Create example data frame
dt <- read.table(text = "Names X_1 X_2 X_3 X_4
Name Sagar II Booster
Location India No Discharge Open
Depth 19.5 start End
DOC 3.2 FPL 64
Qunatity 234 SPL 50",
stringsAsFactors = FALSE, header = TRUE, fill = TRUE)
# A list of target keys
target_key <- c("Name", "Location", "Discharge", "Depth", "DOC", "FPL", "SPL")
# A function to extract a value based on a key and create a new data frame
extract_fun <- function(key, df = dt){
  # locate the row and column where the key sits, then take the cell to its right
  Row <- which(apply(df, 1, function(x) key %in% x))
  Col <- which(apply(df, 2, function(x) key %in% x))
  df2 <- data.frame(Names = key, Values = df[Row, Col + 1],
                    stringsAsFactors = FALSE)
  df2$Values <- as.character(df2$Values)
  return(df2)
}
# Apply the extract_fun
ext_list <- lapply(target_key, extract_fun)
# Combine all data frames
dt_final <- do.call(rbind, ext_list)
dt_final
Names Values
1 Name Sagar
2 Location India
3 Discharge Open
4 Depth 19.5
5 DOC 3.2
6 FPL 64
7 SPL 50
Might not be the most efficient, but works for your example:
library(dplyr)
key_value = function(extraction){
  temp = matrix(NA, nrow = length(extraction), ncol = 2)
  temp[, 1] = extraction
  for (ii in 1:nrow(temp)) {
    index = df %>%
      as.matrix %>%
      {which(. == extraction[ii], arr.ind = TRUE)}
    temp[ii, 2] = index %>% {df[.[1], .[2] + 1]}
  }
  return(data.frame(Names = temp[, 1], Values = temp[, 2]))
}
Result:
> vec = c("Name", "Location", "Discharge", "Depth", "DOC", "FPL", "SPL")
> key_value(vec)
Names Values
1 Name SagarII
2 Location India
3 Discharge Open
4 Depth 19.5
5 DOC 3.2
6 FPL 64
7 SPL 50
Data:
df = read.table(text = "Names X_1 X_2 X_3 X_4
Name SagarII Booster NA NA
Location India No Discharge Open
Depth 19.5 start End NA
DOC 3.2 FPL 64 NA
Qunatity 234 SPL 50 NA", header = TRUE, stringsAsFactors = FALSE)

Create a vector function to clean address data for Houston Crime Data

There are good tutorials for mapping Houston crime data, but no easy examples of how to clean the raw data provided by HPD.
https://github.com/hadley/ggplot2/wiki/Crime-in-Downtown-Houston,-Texas-:-Combining-ggplot2-and-Google-Maps
d <- structure(list(BlockRange = c("5400-5499", "3700-3799", "2200-2299",
"1000-1099", "1200-1299", "UNK", "1900-1999", "500-599", "1200-1299"
), StreetName = c("BELL", "BELL", "BELL", "BELL", "BELL", "BELL",
"BELL", "BELL", "BELL"), Date = c("4/28/2015", "4/11/2015", "4/26/2015",
"4/9/2015", "4/9/2015", "4/21/2015", "4/26/2015", "4/26/2015",
"4/17/2015")), row.names = c(60L, 75L, 88L, 4972L, 4990L, 5096L,
5098L, 5099L, 5155L), class = "data.frame", .Names = c("BlockRange",
"StreetName", "Date"))
This will return the lat and lon:
x <- gGeoCode("1950 Bell St, Houston, TX")
#[1] 29.74800 -95.35926
However, what is needed is a function that will geocode an entire data frame and add columns for lon and lat.
Example of a selection of the finished data.
structure(list(address = c("9650 marlive ln", "4750 telephone rd",
"5050 wickview ln", "1050 ashland st", "8350 canyon", "9350 rowan ln",
"2550 southmore blvd", "6350 rupley cir", "5050 georgi ln", "10750 briar forest dr"
), lon = c(-95.4373883, -95.2988769, -95.455864, -95.4033373,
-95.3779081, -95.5483009, -95.3733977, -95.3156032, -95.4665841,
-95.565934), lat = c(29.6779015, 29.6917121, 29.5992174, 29.7902425,
29.6706341, 29.7022336, 29.7198936, 29.6902746, 29.8297359, 29.747596
)), row.names = 82729:82738, class = "data.frame", .Names = c("address",
"lon", "lat"))
Here are the functions for geocoding:
library(RCurl)
library(RJSONIO)
library(dplyr)
library(gdata)
construct.geocode.url <- function(address, return.call = "json", sensor = "false") {
  root <- "http://maps.google.com/maps/api/geocode/"
  u <- paste(root, return.call, "?address=", address, "&sensor=", sensor, sep = "")
  return(URLencode(u))
}

gGeoCode <- function(address, verbose = FALSE) {
  if (verbose) cat(address, "\n")
  u <- construct.geocode.url(address)
  doc <- getURL(u)
  x <- fromJSON(doc, simplify = FALSE)
  if (x$status == "OK") {
    lat <- x$results[[1]]$geometry$location$lat
    lng <- x$results[[1]]$geometry$location$lng
    return(c(lat, lng))
  } else {
    return(c(NA, NA))
  }
}
How can we write a function, using dplyr or another method, that adds three more columns with the output [address, lon, lat]? I.e., something like this (pseudocode):
data.frame <- mutate(d,
                     address = paste(ConvertBlockRange(BlockRange), StreetName, "Houston, TX"),
                     lat = gGeoCode(address)[1],
                     lon = gGeoCode(address)[2])
This is the blocking point of the question:
# function to convert "2200-2299" to the integer 2250, i.e. find the middle of the block
library(stringr)
ConvertBlockRange <- function(blockRange){
  # split on "-" and average; the + .5 lands on the middle of the block
  m <- unlist(str_split(blockRange, "-"))
  m2 <- mean(c(as.numeric(m[1]), as.numeric(m[2]))) + .5
  m2
}
You can calculate the mean block range by splitting the range and averaging:
e.g.
x <- '5400-5499'
mean(as.numeric(strsplit(x, '-')[[1]])) # 5449.5
To scale it up, we can use separate from the tidyr package. This does some cool things like automagically putting the min/max of BlockRange into new columns and converting the types from string to numeric (convert=T, type.convert=as.numeric). I filter out the "UNK" addresses first - you will have to handle them separately.
library(dplyr)
library(tidyr)
d %>%
  filter(BlockRange != "UNK") %>%
  # this is a df with blockmin & blockmax
  separate(BlockRange, c("blockmin", "blockmax"), sep = "-",
           convert=T, type.convert=as.numeric, remove=FALSE) %>%
  # calc average (round down) and address
  mutate(block = floor((blockmin + blockmax)/2),
         address = paste(block, StreetName))
# BlockRange blockmin blockmax StreetName Date block address
# 1 5400-5499 5400 5499 BELL 4/28/2015 5449 5449 BELL
# 2 3700-3799 3700 3799 BELL 4/11/2015 3749 3749 BELL
# 3 2200-2299 2200 2299 BELL 4/26/2015 2249 2249 BELL
# 4 1000-1099 1000 1099 BELL 4/9/2015 1049 1049 BELL
# 5 1200-1299 1200 1299 BELL 4/9/2015 1249 1249 BELL
# 6 1900-1999 1900 1999 BELL 4/26/2015 1949 1949 BELL
# 7 500-599 500 599 BELL 4/26/2015 549 549 BELL
# 8 1200-1299 1200 1299 BELL 4/17/2015 1249 1249 BELL
Then you could %>% group_by(address) to get unique addresses and geocode (though I'd think about how to restrict maximum number of requests etc here).
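A hedged sketch of that deduplicate-then-geocode step, assuming d2 is the data frame carrying the address column built above; the distinct() call and the Sys.sleep() throttle are illustrative additions, not part of the original answer:
library(dplyr)

# Geocode each unique address once, pausing between requests so we stay
# well under the geocoding API's rate limit.
unique_addresses <- d2 %>% distinct(address)

coords <- lapply(unique_addresses$address, function(a) {
  Sys.sleep(0.2)   # simple request throttling
  gGeoCode(a)      # returns c(lat, lng), or c(NA, NA) on failure
})
The resulting list can then be joined back to the full data by address.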
With regards to adding your output lat and lon columns all at once, I don't think dplyr does this yet (see this feature request).
If you really want to use the dplyr syntax here, your best bet is to change gGeoCode so that it is vectorised, e.g.
gGeoCode2 <- function(addresses) {
  x <- data.frame(t(sapply(addresses[[1]], gGeoCode)), row.names = NULL)
  names(x) <- c('lat', 'lng')
  x
}
d2 %>%
  select(address) %>%
  gGeoCode2 %>%
  bind_cols(d2, .)
but really really I think you should skip the dplyr sugar for this particular step and do a manual loop and cbind the result, which gives you greater control over request limiting.
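A minimal sketch of that manual loop, again assuming d2 carries the address column; the pause length is an arbitrary illustrative choice:
# Geocode row by row, then cbind the coordinates back onto the data frame;
# a plain loop makes throttling and error handling straightforward.
n <- nrow(d2)
lat <- numeric(n)
lon <- numeric(n)

for (i in seq_len(n)) {
  res <- gGeoCode(d2$address[i])  # c(lat, lng), or c(NA, NA) if the lookup fails
  lat[i] <- res[1]
  lon[i] <- res[2]
  Sys.sleep(0.2)                  # throttle requests between calls
}

d2 <- cbind(d2, lat = lat, lon = lon)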
