How to loop through a list of cities and get the temperature for a given date with 'weatherData' in R

I have the following df
city <- data.frame(City = c("London", "Liverpool", "Manchester","London", "Liverpool", "Manchester"),
Date = c("2016-08-05","2016-08-09","2016-08-10", "2016-09-05","2016-09-09","2016-09-10"))
I want to loop through it and get weather data for each city$City on the date in city$Date, so that the result looks like this:
city <- data.frame(City = c("London", "Liverpool", "Manchester","London", "Liverpool", "Manchester"),
Date = c("2016-08-05","2016-08-09","2016-08-10", "2016-09-05","2016-09-09","2016-09-10"),
Mean_TemperatureC = c("15","14","13","14","11","14"))
Currently I am using weatherData to get weather data with the following function:
df <- getWeatherForDate("BOS", "2016-08-01")
Can someone help?

Here is a possibility:
library(weatherData)
temp <- as.matrix(city)
codes <- sapply(1:nrow(city), function(x) getStationCode(city[x,1], "GB")[[1]])
station <- sub("^[[:print:]]+\\s([A-Z]{4})\\s[[:print:]]+", "\\1", codes)
temp[, 1] <- station
temperature <- sapply(1:nrow(temp), function(x) {getWeatherForDate(temp[x, 1], temp[x, 2])$Mean_TemperatureC})
city2 <- setNames(cbind(city, temperature), c(colnames(city), "Mean_TemperatureC"))
# City Date Mean_TemperatureC
# 1 London 2016-08-05 14
# 2 Liverpool 2016-08-09 14
# 3 Manchester 2016-08-10 13
# 4 London 2016-09-05 20
# 5 Liverpool 2016-09-09 18
# 6 Manchester 2016-09-10 13
The first step is to look up the station code of each city with getStationCode and extract the four-letter code from the result with sub. We then build the vector of mean temperatures and, finally, create the data.frame city2 with the correct column names.
It is necessary to look up the station codes because some city names (like Liverpool) exist in more than one country (Canada and the UK in this case). I checked the results for Liverpool on the Weather Underground website, and they are correct.
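The code-extraction step can be tried without calling the weather service. This is a minimal sketch that applies the same sub pattern to two made-up station lines; the layout of these lines is an assumption for illustration, and the text that getStationCode actually returns may look different.

```r
# Hypothetical station description lines (layout assumed, not real API output)
codes <- c("LONDON HEATHROW EGLL 51 28N",
           "LIVERPOOL AIRPORT EGGP 53 20N")

# Same pattern as in the answer: capture a token of exactly four capital
# letters that has whitespace on both sides
station <- sub("^[[:print:]]+\\s([A-Z]{4})\\s[[:print:]]+", "\\1", codes)
station
```

If a line contained no such four-letter token, sub would return the line unchanged, so it is worth checking the extracted codes before passing them to getWeatherForDate.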


How to remove values in a column based on other column values equaling the column values above it?

I am currently coding in R and merged two dataframes so that all the information is in one place, but I don't want the "Cost" column to be repeated multiple times (it was duplicated because the last 3 columns have unique values). I want the cost 100 to appear only in the first row, and the value should be removed in every other row where the columns "State", "Market", "Date", and "Cost" are the same as in the row above. I attached what the dataframe looks like and what I want it to be changed to. Thank you!
What it currently looks like
What it should look like
You can use indexing, as in this example:
name_of_your_dataset[nrow_init:nrow_fin, ncol] <- NA
In your case, assuming your dataset is named 'data':
data[2:4,4]<- NA
Here is a solution using duplicated with your dataframe (df):
  State  Market       Date Cost    Word format               Type
1    AZ Phoenix 10-20-2020  100   HELLO     AM     Sports related
2    AZ Phoenix 10-21-2020  100 GOODBYE     PM Non Sports related
3    AZ Phoenix 10-22-2020  100     YES     FM            Country
4    AZ Phoenix 10-23-2020  100    NONE     CM               Rock
Set duplicates to NA:
df$Cost[duplicated(df$Cost)] <- NA
  State  Market       Date Cost    Word format               Type
1    AZ Phoenix 10-20-2020  100   HELLO     AM     Sports related
2    AZ Phoenix 10-21-2020   NA GOODBYE     PM Non Sports related
3    AZ Phoenix 10-22-2020   NA     YES     FM            Country
4    AZ Phoenix 10-23-2020   NA    NONE     CM               Rock
The Date column is different in every row, so I think you want to replace duplicated Cost values within each combination of State and Market.
library(dplyr)
df <- df %>%
  group_by(State, Market) %>%
  mutate(Cost = replace(Cost, duplicated(Cost), NA)) %>%
  ungroup()
# State Market Date Cost Word format Type
# <chr> <chr> <chr> <dbl> <chr> <chr> <chr>
#1 AZ Phoenix 10-20-2020 100 HELLO AM Sports related
#2 AZ Phoenix 10-21-2020 NA GOODBYE PM Non Sports related
#3 AZ Phoenix 10-22-2020 NA YES FM Country
#4 AZ Phoenix 10-23-2020 NA NONE CM Rock
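The same per-group idea can be sketched in base R by calling duplicated on the key columns together instead of on Cost alone. The two CA/Fresno rows are hypothetical and were added only to show that a second market keeps its own first Cost value.

```r
# Example data: the AZ rows follow the question, the CA rows are made up
df <- data.frame(
  State  = c("AZ", "AZ", "AZ", "CA", "CA"),
  Market = c("Phoenix", "Phoenix", "Phoenix", "Fresno", "Fresno"),
  Date   = c("10-20-2020", "10-21-2020", "10-22-2020",
             "10-20-2020", "10-21-2020"),
  Cost   = c(100, 100, 100, 100, 100),
  stringsAsFactors = FALSE
)

# A row is a repeat when its State/Market/Cost combination was seen earlier
is_repeat <- duplicated(df[c("State", "Market", "Cost")])

# Blank out Cost on the repeated rows only
df$Cost[is_repeat] <- NA
df
```

Calling duplicated(df$Cost) on its own would also blank the first Fresno row, because it ignores State and Market.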
It is easier to help if you provide data in a reproducible format
df <- structure(list(State = c("AZ", "AZ", "AZ", "AZ"), Market = c("Phoenix",
"Phoenix", "Phoenix", "Phoenix"), Date = c("10-20-2020", "10-21-2020",
"10-22-2020", "10-23-2020"), Cost = c(100, 100, 100, 100), Word = c("HELLO",
"GOODBYE", "YES", "NONE"), format = c("AM", "PM", "FM", "CM"),
Type = c("Sports related", "Non Sports related", "Country",
"Rock")), row.names = c(NA, -4L), class = "data.frame")

Keep specific rows of a data frame based on word sequence in R

I have a dataframe (df) like this. What I want to do is to go through the values for each ID and if there are two strings starting with the same word, I want to compare them to keep distinct values.
df <- data.frame(id = c(1,1,2,3,3,4,4,4,4,5),
value = c('australia', 'australia sydney', 'brazil',
'australia', 'usa', 'australia sydney', 'australia sydney randwick', 'australia', 'australia sydney circular quay', 'australia sydney'))
I want to get the first words to compare them and if they are different keep both but if they are the same go to the second words to compare them and so on...
so like for ID 1 I want to keep the row with the value 'australia sydney' and for Id 4 I want to keep both 'australia sydney circular quay', 'australia sydney randwick'.
For this example I need to get rows 2:5, 7, 9,10
Based on your edit, you can check within groups if any entry matches the start of any other entry and remove entries that do:
library(dplyr)
library(purrr)
library(stringr)
df %>%
  group_by(id) %>%
  filter(!map_lgl(seq_along(value), ~ any(if (length(value) == 1) FALSE else str_detect(value[-.x], paste0("^", value[.x])))))
# A tibble: 7 x 2
# Groups: id, value [7]
id value
<dbl> <chr>
1 1 australia sydney
2 2 brazil
3 3 australia
4 3 usa
5 4 australia sydney randwick
6 4 australia sydney circular quay
7 5 australia sydney
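For comparison, here is a base R sketch of the same filter. It swaps the regular expression for startsWith, which compares literal prefixes, so values that contain regex special characters need no escaping. The helper name is_prefix_of_other is made up for this example.

```r
df <- data.frame(
  id = c(1, 1, 2, 3, 3, 4, 4, 4, 4, 5),
  value = c("australia", "australia sydney", "brazil",
            "australia", "usa", "australia sydney",
            "australia sydney randwick", "australia",
            "australia sydney circular quay", "australia sydney"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where an entry is the start of another entry
# in the same vector
is_prefix_of_other <- function(v) {
  vapply(seq_along(v),
         function(i) any(startsWith(v[-i], v[i])),
         logical(1))
}

# Apply the helper within each id, then restore the original row order
drop <- unsplit(lapply(split(df$value, df$id), is_prefix_of_other), df$id)
result <- df[!drop, ]
result
```

Like the answer above, this compares character prefixes rather than whole words, and two identical values within one id would both be dropped.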

If/Else statement in R

I have two dataframes in R:
city      price  bedroom
San Jose  2000   1
Barstow   1000   1
NA        1500   1
Code to recreate:
data = data.frame(city = c('San Jose', 'Barstow'), price = c(2000,1000, 1500), bedroom = c(1,1,1))
Name      Density
San Jose  5358
Barstow   547
Code to recreate:
population_density = data.frame(Name=c('San Jose', 'Barstow'), Density=c(5358, 547));
I want to create an additional column named city_type in the data dataset based on a condition: if the city's population density is above 1000, it's urban; if it is lower than 1000, it's a suburb; and if the city is NA, it's NA.
city      price  bedroom  city_type
San Jose  2000   1        Urban
Barstow   1000   1        Suburb
NA        1500   1        NA
I am using a for loop for conditional flow:
for (row in 1:length(data)) {
  if (is.na(data[row, 'city'])) {
    data[row, 'city_type'] = NA
  } else if (population[population$Name == data[row, 'city'], ]$Density >= 1000) {
    data[row, 'city_type'] = 'Urban'
  } else {
    data[row, 'city_type'] = 'Suburb'
  }
}
The for loop runs with no error in my original dataset with over 20000 observations; however, it yields a lot of wrong results (it yields NA for the most part).
What has gone wrong here and how can I do better to achieve my desired result?
I have become quite a fan of dplyr pipelines for this type of join/filter/mutate workflow. So here is my suggestion:
# I had to add that extra "NA" there, did you not? Hm...
data <- data.frame(city = c('San Jose', 'Barstow', NA), price = c(2000,1000, 500), bedroom = c(1,1,1))
population <- data.frame(Name=c('San Jose', 'Barstow'), Density=c(5358, 547));
library(dplyr)
data %>%
  # join the two dataframes by matching up the city name columns
  left_join(population, by = c("city" = "Name")) %>%
  # add your new column based on the desired condition
  mutate(city_type = ifelse(Density >= 1000, "Urban", "Suburb"))
city price bedroom Density city_type
1 San Jose 2000 1 5358 Urban
2 Barstow 1000 1 547 Suburb
3 <NA> 500 1 NA <NA>
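As for what went wrong in the loop: a likely cause is the loop bound. length() of a data frame returns the number of columns, not rows, so 1:length(data) only visits the first few rows and the rest stay NA. Below is a sketch of the loop with the bound changed to seq_len(nrow(data)); the last two rows of data are hypothetical and added so that the row count differs from the column count.

```r
data <- data.frame(
  city    = c("San Jose", "Barstow", NA, "Barstow", "San Jose"),
  price   = c(2000, 1000, 1500, 900, 2100),  # last two rows are made up
  bedroom = c(1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)
population_density <- data.frame(
  Name    = c("San Jose", "Barstow"),
  Density = c(5358, 547),
  stringsAsFactors = FALSE
)

n_cols <- length(data)  # number of columns (3)
n_rows <- nrow(data)    # number of rows (5)

data$city_type <- NA_character_
for (row in seq_len(nrow(data))) {
  city <- data[row, "city"]
  if (is.na(city)) next                      # NA city stays NA
  density <- population_density$Density[population_density$Name == city]
  if (length(density) == 0) next             # city missing from the lookup
  data[row, "city_type"] <- if (density >= 1000) "Urban" else "Suburb"
}
data
```

The vectorised join is still the better fit for 20000 rows; the loop is shown only to isolate the bound.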
Alternatively, use ifelse to create city_type in population_density, then use match to copy it into data. The result:
city price bedroom city_type
1 San Jose 2000 1 Urban
2 Barstow 1000 1 Suburb
3 <NA> 1500 1 <NA>
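A minimal sketch of the ifelse plus match approach described above, in base R. The threshold of 1000 comes from the question and the object names follow the question's code.

```r
data <- data.frame(
  city    = c("San Jose", "Barstow", NA),
  price   = c(2000, 1000, 1500),
  bedroom = c(1, 1, 1),
  stringsAsFactors = FALSE
)
population_density <- data.frame(
  Name    = c("San Jose", "Barstow"),
  Density = c(5358, 547),
  stringsAsFactors = FALSE
)

# Classify each city once, in the lookup table
population_density$city_type <- ifelse(population_density$Density >= 1000,
                                       "Urban", "Suburb")

# match() gives the lookup row for every city in data (NA when not found),
# and indexing with NA yields NA, so missing cities stay NA
idx <- match(data$city, population_density$Name)
data$city_type <- population_density$city_type[idx]
data
```

Because the classification happens on the small lookup table, the cost on the large table is a single match call.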

R make new data frame from current one

I'm trying to calculate the best goal differentials in the group stage of the 2014 world cup.
football <- read.csv(
  file = "",
  header = TRUE,
  strip.white = TRUE
)
football <- head(football, n = 48L)
football[which(max(abs(football$home_score - football$away_score)) == abs(football$home_score - football$away_score)),]
Results in
home home_continent home_score away away_continent away_score result
4 Cameroon Africa 0 Croatia Europe 4 l
7 Spain Europe 1 Netherlands Europe 5 l
37 Germany Europe 4 Portugal Europe 0 w
So those are the games with the highest goal differential, but now I need to make a new data frame that has a team name and abs(football$home_score - football$away_score)
football$score_diff <- abs(football$home_score - football$away_score)
football$winner <- ifelse(football$home_score > football$away_score, as.character(football$home),
ifelse(football$result == "d", NA, as.character(football$away)))
You could save some typing this way. First compute the score differences and the winners. When result is w, the home team is the winner, so you do not have to look at the scores at all. Once you have added the score difference and the winner, you can keep the rows where the score difference equals max().
library(dplyr)
mydf <- read.csv(file="",
header = TRUE, strip.white = TRUE)
mydf <- head(mydf,n = 48L)
mutate(mydf, scorediff = abs(home_score - away_score),
winner = ifelse(result == "w", as.character(home),
ifelse(result == "l", as.character(away), "draw"))) %>%
filter(scorediff == max(scorediff))
# home home_continent home_score away away_continent away_score result scorediff winner
#1 Cameroon Africa 0 Croatia Europe 4 l 4 Croatia
#2 Spain Europe 1 Netherlands Europe 5 l 4 Netherlands
#3 Germany Europe 4 Portugal Europe 0 w 4 Germany
Here is another option that creates the "winner" column without ifelse, using row/column indexes. The numeric column index is created by matching the result column against its possible values (match(football$result, c('w', 'l', 'd'))), and the row index is just 1:nrow(football). Subset the "football" dataset to the columns 'home' and 'away', and cbind it with an additional column 'draw' of NAs so that the 'd' elements in "result" become NA.
football$score_diff <- abs(football$home_score - football$away_score)
football$winner <- cbind(football[c('home', 'away')],draw=NA)[
cbind(1:nrow(football), match(football$result, c('w', 'l', 'd')))]
football[with(football, score_diff==max(score_diff)),]
# home home_continent home_score away away_continent away_score result
#60 Brazil South America 1 Germany Europe 7 l
# score_diff winner
#60 6 Germany
If the dataset is very big, you could speed up the match by using chmatch from library(data.table)
chmatch(as.character(football$result), c('w', 'l', 'd'))
NOTE: I used the full dataset in the link
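The row/column indexing trick is easier to see on a few rows. This sketch uses three results quoted above plus one hypothetical drawn match (TeamX against TeamY) and builds the candidate matrix explicitly with cbind.

```r
football <- data.frame(
  home   = c("Cameroon", "Spain", "Germany", "TeamX"),  # TeamX/TeamY are made up
  away   = c("Croatia", "Netherlands", "Portugal", "TeamY"),
  result = c("l", "l", "w", "d"),
  stringsAsFactors = FALSE
)

# Character matrix with one candidate per outcome: home win, away win, draw
candidates <- cbind(home = football$home, away = football$away, draw = NA)

# Two-column index matrix: row number, and column 1/2/3 for w/l/d
idx <- cbind(seq_len(nrow(football)),
             match(football$result, c("w", "l", "d")))

# Indexing a matrix with a two-column matrix picks one cell per row
winner <- candidates[idx]
winner
```

Each row of idx names exactly one cell, so the lookup is a single vectorised operation with no branching.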

Reading names with special characters using R

I've an excel (xlsx) table and in the column "PLAYERS" European players have an asterisk in their names and South Americans don't. Something like this
Is there any way I can use R (or excel itself) to split this dataset into one with Europeans (with asterisk) and another one with South Americans? Of course, the data set contains other columns like "SALARY", "SCORED GOALS", "OFFSITE", "AGE" etc. etc. etc.
You could check whether there's an "*" in the player's name and write "European" or "South American" in a new column. If you want, you could then split the data frame into a list with two data.frames, one with Europeans and the other with South Americans:
df <- data.frame(PLAYERS = c("Neymar", "*Ronaldo*", "Messi"), SALARY = 5:7)
df
#    PLAYERS SALARY
#1    Neymar      5
#2 *Ronaldo*      6
#3     Messi      7
# check if there's a * in the PLAYERS column
df$Location <- ifelse(grepl("\\*", df$PLAYERS), "European", "South American")
df
#    PLAYERS SALARY       Location
#1    Neymar      5 South American
#2 *Ronaldo*      6       European
#3     Messi      7 South American
#split the data based on location:
dflist <- split(df, df$Location)
dflist
#$European
#    PLAYERS SALARY Location
#2 *Ronaldo*      6 European
#
#$`South American`
#  PLAYERS SALARY       Location
#1  Neymar      5 South American
#3   Messi      7 South American
Now you can access each list element (which is a data.frame) by typing
dflist[["European"]] # or "South American" instead
#    PLAYERS SALARY Location
#2 *Ronaldo*      6 European
You can split this specific column and name the resulting list with split and setNames
dat <- structure(list(PLAYERS = structure(c(6L, 1L, 5L, 7L, 2L, 4L, 3L),
    .Label = c("*Bale*", "*Benzema*", "DiMaria", "*Iniesta*",
    "Messi", "Neymar", "*Ronaldo*"), class = "factor")),
    .Names = "PLAYERS", class = "data.frame", row.names = c(NA,-7L))
# split() orders the groups FALSE, TRUE, so the names without an asterisk come first
setNames(split(dat, grepl("[*]", dat$PLAYERS)), nm = c("SoAm", "Euro"))
# $SoAm
#   PLAYERS
# 1  Neymar
# 3   Messi
# 7 DiMaria
#
# $Euro
#     PLAYERS
# 2    *Bale*
# 4 *Ronaldo*
# 5 *Benzema*
# 6 *Iniesta*
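Both answers above escape the asterisk ("\\*" or "[*]") because * is a quantifier in regular expressions. A small sketch of the equivalent ways to test for a literal asterisk follows. The final step, stripping the asterisks from the names once the groups are known, is an optional extra that the question did not ask for.

```r
players <- c("Neymar", "*Ronaldo*", "Messi")

# Three equivalent ways to look for a literal asterisk
has_star_escaped <- grepl("\\*", players)            # backslash escape
has_star_class   <- grepl("[*]", players)            # character class
has_star_fixed   <- grepl("*", players, fixed = TRUE) # no regex at all

# Optional: remove the markers once the players have been classified
clean_names <- gsub("*", "", players, fixed = TRUE)
clean_names
```

With fixed = TRUE the pattern is taken as a plain string, which is the least error-prone choice when the marker is a regex metacharacter.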
Create a PivotTable from your source data with PLAYERS for ROWS. Filter with Label Filters, Contains... ~* and click on Grand Total. Return to PT, select Does Not Contain... and click on Grand Total again.
