Use lapply on a condition in R

It's easier to explain what I want to do if you look at the code first, but essentially I think I want to use lapply on a condition, and I wasn't able to get it to work.
library("tidyverse")
names <- rep(c("City A", "City B"), each = 11)
year <- rep(c(2010:2020), times = 2)
col_1 <- c(1, 17, 34, 788, 3, 4, 78, 98, 650, 45, 20,
           23, 45, 56, 877, 54, 12, 109, 167, 12, 19, 908)
col_2 <- c(3, 4, 23, 433, 2, 45, 34, 123, 98, 76, 342,
           760, 123, 145, 892, 23, 5, 90, 40, 12, 67, 98)
df <- as.data.frame(cbind(names, year, col_1, col_2))
df <- df %>%
  mutate(col_1 = as.numeric(col_1),
         col_2 = as.numeric(col_2))
I want every numeric column to be rounded to a multiple of three with plyr::round_any (i.e. round_any(x, 3)), but only for rows where the year is 2018 or later.
What I tried is this:
df_2018 <- df %>%
  filter(year >= 2018)
df <- df %>%
  filter(!(year >= 2018))
df_2018[, c(3:4)] <- lapply(df_2018[, c(3:4)], plyr::round_any, 3)
df <- rbind(df, df_2018)
In reality there are about 50 numeric columns and a huge number of rows. What I tried works, but I would like to achieve it with less and cleaner code.
I am new to lapply and I failed when trying to combine it with ifelse, because I don't want it to change my year column.
Thank you to everyone who takes the time out of their day to look at this :)

Using dplyr::across and if_else you could do:
library(dplyr)
df |>
  mutate(across(-c(names, year), ~ if_else(year >= 2018, plyr::round_any(.x, 3), .x)))
#> names year col_1 col_2
#> 1 City A 2010 1 3
#> 2 City A 2011 17 4
#> 3 City A 2012 34 23
#> 4 City A 2013 788 433
#> 5 City A 2014 3 2
#> 6 City A 2015 4 45
#> 7 City A 2016 78 34
#> 8 City A 2017 98 123
#> 9 City A 2018 651 99
#> 10 City A 2019 45 75
#> 11 City A 2020 21 342
#> 12 City B 2010 23 760
#> 13 City B 2011 45 123
#> 14 City B 2012 56 145
#> 15 City B 2013 877 892
#> 16 City B 2014 54 23
#> 17 City B 2015 12 5
#> 18 City B 2016 109 90
#> 19 City B 2017 167 40
#> 20 City B 2018 12 12
#> 21 City B 2019 18 66
#> 22 City B 2020 909 99
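Since the real data has about 50 numeric columns, a variant that selects columns by type rather than by name may be more convenient. This is only a sketch, not part of the original answer, and it assumes year is stored as a numeric column (so it is excluded via !year and can be compared with 2018 directly):
library(dplyr)

df |>
  mutate(across(where(is.numeric) & !year,
                ~ if_else(year >= 2018, plyr::round_any(.x, 3), .x)))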

Using data.table:
library(data.table)

# columns to transform: every column named col_<number>
cols <- grep("^col_[0-9]+$", names(df), value = TRUE)
# for rows from 2018 on, round those columns to the nearest multiple of 3
setDT(df)[year >= 2018, (cols) := round(.SD / 3) * 3, .SDcols = cols]
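For comparison with the lapply attempt in the question, here is a base-R sketch (not from the original answers) that indexes the rows once instead of splitting and re-binding. It assumes df is still a plain data.frame (i.e. before the setDT() call above) and that year can be compared with 2018:
# numeric columns except year, and the rows that need rounding
num_cols <- setdiff(names(df)[sapply(df, is.numeric)], "year")
rows <- df$year >= 2018
# round only those rows/columns to the nearest multiple of 3
df[rows, num_cols] <- lapply(df[rows, num_cols], plyr::round_any, accuracy = 3)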

Related

Creating a matrix from a three-column data frame in R [duplicate]

I have a data frame with three columns where each row is unique:
df1
# state val_1 season
# 1 NY 3 winter
# 2 NY 10 spring
# 3 NY 24 summer
# 4 BOS 14 winter
# 5 BOS 26 spring
# 6 BOS 19 summer
# 7 WASH 99 winter
# 8 WASH 66 spring
# 9 WASH 42 summer
I want to create a matrix with the state names for rows and the seasons for columns with val_1 as the values. I have previously used:
library(reshape2)
df <- acast(df1, state ~ season, value.var='val_1')
In the past this created the desired matrix, with each state name appearing once. But for some reason, when I use acast or dcast now, it defaults to the length function and gives 1's for the values. Can anyone recommend a solution?
data
state <- c('NY', 'NY', 'NY', 'BOS', 'BOS', 'BOS', 'WASH', 'WASH', 'WASH')
val_1 <- c(3, 10, 24, 14, 26, 19, 99, 66, 42)
season <- c('winter', 'spring', 'summer', 'winter', 'spring', 'summer',
            'winter', 'spring', 'summer')
df1 <- data.frame(state, val_1, season)
You may specify fun.aggregate= explicitly:
library(reshape2)
acast(df1, state~season, value.var = 'val_1', fun.aggregate=sum)
# spring summer winter
# BOS 26 19 14
# NY 10 24 3
# WASH 66 42 99
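The length default described in the question appears when the casting formula has duplicate combinations. A minimal sketch (the extra NY/winter row and its value 5 are made up for illustration) that reproduces it with the df1 defined above:
library(reshape2)

# duplicate one state/season combination, then cast without fun.aggregate
df_dup <- rbind(df1, data.frame(state = "NY", val_1 = 5, season = "winter"))
# acast now warns that the aggregation function is missing and defaults to length,
# so the NY/winter cell becomes a count (2) instead of a value
acast(df_dup, state ~ season, value.var = "val_1")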
This also works
library(reshape2)
state = c('NY', 'NY', 'NY', 'BOS', 'BOS', 'BOS', 'WASH', 'WASH', 'WASH')
val_1 = c(3, 10, 24, 14, 26, 19, 99, 66, 42)
season = c('winter', 'spring', 'summer', 'winter', 'spring', 'summer', 'winter', 'spring', 'summer')
df1 = data.frame(state, val_1, season)
dcast(df1, state~season, value.var = 'val_1')
#> state spring summer winter
#> 1 BOS 26 19 14
#> 2 NY 10 24 3
#> 3 WASH 66 42 99
Created on 2022-04-08 by the reprex package (v2.0.1)

Creating a new column in R conditioned on the values in a different column and different row

I'm trying to figure out how to create a new column in an R dataframe whose values are based on the values in another column, but in a different row. My data is as follows:
player <- c('Tim Duncan', 'Lebron James', 'Kobe Bryant', 'Paul Pierce',
            'Tim Duncan', 'Lebron James', 'Kobe Bryant', 'Paul Pierce',
            'Tim Duncan', 'Lebron James', 'Kobe Bryant', 'Paul Pierce')
t <- c(3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1)
min_per_game <- c(30, 36, 34, 33, 31, 36, 34, 32, 29, 35, 32, 36)
pts_per_36_min <- c(19, 28, 27, 24, 22, 27, 25, 28, 23, 28, 29, 29)
df <- data.frame(player, t, min_per_game, pts_per_36_min)
What I want to do is create a new column called "pts_per_game". For each row, R should look at the value in the 't' column, find the row with the same value in the 'player' column but a 't' value that is smaller by 1, and fill "pts_per_game" using data from that row (specifically min_per_game / 36 * pts_per_36_min).
For example, in the first row of this dataframe the 'player' value is "Tim Duncan" and the 't' value is 3. I want R to find the row where player == "Tim Duncan" and t == 2, compute (min_per_game / 36) * pts_per_36_min from that row, and put the result in the first row (where player is Tim Duncan and t is 3) in a new column called "pts_per_game". I want this done for every row of the dataframe, with the understanding that rows with the lowest value of t (1, in this case) cannot have a "pts_per_game" value computed and should therefore receive NA. Can anyone help me figure out how to do this?
You may try using dplyr::lead
library(dplyr)
df %>%
  arrange(player, desc(t)) %>%
  group_by(player) %>%
  mutate(pts_per_game = lead(min_per_game) / 36 * lead(pts_per_36_min))
player t min_per_game pts_per_36_min pts_per_game
<chr> <dbl> <dbl> <dbl> <dbl>
1 Kobe Bryant 3 34 27 23.6
2 Kobe Bryant 2 34 25 25.8
3 Kobe Bryant 1 32 29 NA
4 Lebron James 3 36 28 27
5 Lebron James 2 36 27 27.2
6 Lebron James 1 35 28 NA
7 Paul Pierce 3 33 24 24.9
8 Paul Pierce 2 32 28 29
9 Paul Pierce 1 36 29 NA
10 Tim Duncan 3 30 19 18.9
11 Tim Duncan 2 31 22 18.5
12 Tim Duncan 1 29 23 NA
This also works
data.frame(player, t, min_per_game, pts_per_36_min) %>%
  arrange(player, desc(t)) %>%
  dplyr::group_by(player) %>%
  dplyr::mutate(pts_per_game = dplyr::lead(min_per_game) / 36 * dplyr::lead(pts_per_36_min))
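For completeness, here is a sketch (not from the original answers; the prev_* column names are made up) of the explicit row lookup the question describes: join each row to the same player's row with t one lower and compute pts_per_game from the looked-up columns. Rows with the lowest t get no match in the join and therefore end up with NA, as required.
library(dplyr)

# shift t up by one so that each row pairs with the same player's previous period
prev <- df %>%
  transmute(player, t = t + 1,
            prev_min = min_per_game, prev_pts36 = pts_per_36_min)

df %>%
  left_join(prev, by = c("player", "t")) %>%
  mutate(pts_per_game = prev_min / 36 * prev_pts36)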

Is there a short way in R to find the min/max range based on the data below?

I have two datasets. The first one is like this:
code | name
115 | A
120 | B
125 | A
130 | C
140 | A
The second one is like this:
code | Year
115 | 2015
140 | 2020
120 | 2017
130 | 2019
125 | 2011
Based on the column "code", I want to find the range of Year for each name like this:
code | Year | Range
115 | 2015 | 9
140 | 2020 | 9
120 | 2017 | 0
130 | 2019 | 0
125 | 2011 | 9
Here, 9 is 2020 - 2011, the range of years for name A.
My goal is to do this with as few loops as possible so that it runs fast on a large amount of data.
I think you want to first merge the dataframes:
data <- merge(df1, df2, by = 'code')
Then you want the range of the Year column, grouped by name:
library(dplyr)
data %>% group_by(name) %>% mutate(Range=diff(range(Year)))
This can all be done in a single call:
library(dplyr)
merge(df1, df2, by = 'code') %>%
  group_by(name) %>%
  mutate(Range = diff(range(Year)))
code Year name Range
<dbl> <dbl> <chr> <dbl>
1 115 2015 A 9
2 140 2020 A 9
3 120 2017 B 0
4 130 2019 C 0
5 125 2011 A 9
left_join the dataframes by code, group_by name, then use max and min:
library(dplyr)
library(tibble)

df <- tribble(
  ~code, ~name,
  115, "A",
  120, "B",
  125, "A",
  130, "C",
  140, "A")

df1 <- tribble(
  ~code, ~Year,
  115, 2015,
  140, 2020,
  120, 2017,
  130, 2019,
  125, 2011)

df2 <- df1 %>%
  left_join(df, by = "code") %>%
  group_by(name) %>%
  mutate(Range = max(Year) - min(Year)) %>%
  select(-name)
df2
Output:
code Year name Range
<dbl> <dbl> <chr> <dbl>
1 115 2015 A 9
2 140 2020 A 9
3 120 2017 B 0
4 130 2019 C 0
5 125 2011 A 9
Note that name still appears in the printed result even after select(-name): it is a grouping variable, and select() keeps grouping columns.
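For larger data, a data.table sketch (not part of the original answers), using the df (code/name) and df1 (code/Year) tibbles defined above, could look like this:
library(data.table)

dt <- as.data.table(merge(df1, df, by = "code"))   # columns: code, Year, name
dt[, Range := max(Year) - min(Year), by = name]    # per-name range, added by reference
dt[, .(code, Year, Range)]                         # match the requested output columns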

Fill NA values in one data table with observed values from a second data table in R

I can't believe I'm having this much trouble finding a solution to this problem: I have two data tables with identical rows and columns that look like this:
Country <- c("FRA", "FRA", "DEU", "DEU", "CHE", "CHE")
Year <- c(2010, 2020, 2010, 2020, 2010, 2020)
acctm <- c(20, 30, 10, NA, 20, NA)
acctf <- c(20, NA, 15, NA, 40, NA)
dt1 <- data.table(Country, Year, acctm, acctf)
Country Year acctm acctf
1 FRA 2010 20 20
2 FRA 2020 30 NA
3 DEU 2010 10 15
4 DEU 2020 NA NA
5 CHE 2010 20 40
6 CHE 2020 NA NA
Country <- c("FRA", "FRA", "DEU", "DEU", "CHE", "CHE")
Year <- c(2010, 2020, 2010, 2020, 2010, 2020)
acctm <- c(1, 1, 1, 60, 1, 70)
acctf <- c(1, 60, 1, 80, 1, 100)
dt2 <- data.table(Country, Year, acctm, acctf)
Country Year acctm acctf
1 FRA 2010 1 1
2 FRA 2020 1 60
3 DEU 2010 1 1
4 DEU 2020 60 80
5 CHE 2010 1 1
6 CHE 2020 70 100
I need to create a new data table that replaces NA values in dt1 with values for the corresponding country/year/variable match from dt2, yielding a table that looks like this:
Country Year acctm acctf
1 FRA 2010 20 20
2 FRA 2020 30 60
3 DEU 2010 10 15
4 DEU 2020 60 80
5 CHE 2010 20 40
6 CHE 2020 70 100
We can do this with a join on the 'Country', 'Year' columns
library(data.table)

nm1 <- names(dt1)[3:4]     # value columns to patch: acctm, acctf
nm2 <- paste0("i.", nm1)   # the same columns from dt2 inside the join (i. prefix)
dt3 <- copy(dt1)
dt3[dt2, (nm1) := Map(function(x, y) fifelse(is.na(x), y, x),
                      mget(nm1), mget(nm2)),
    on = .(Country, Year)]
dt3
# Country Year acctm acctf
#1: FRA 2010 20 20
#2: FRA 2020 30 60
#3: DEU 2010 10 15
#4: DEU 2020 60 80
#5: CHE 2010 20 40
#6: CHE 2020 70 100
Or, to make this compact, use fcoalesce from data.table (as suggested in the comments by @IceCreamToucan):
dt3[dt2, (nm1) := Map(fcoalesce, mget(nm1), mget(nm2)), on = .(Country, Year)]
If the datasets have the same dimensions and the same values for 'Country' and 'Year', another option is
library(purrr)
library(dplyr)

list(dt1[, .(acctm, acctf)], dt2[, .(acctm, acctf)]) %>%
  reduce(coalesce) %>%
  bind_cols(dt1[, .(Country, Year)], .)
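If a recent dplyr (>= 1.0) is available, dplyr::rows_patch is another option worth knowing about: it replaces only the NA values of x with the values of y matched by key. A sketch, not part of the original answers:
library(dplyr)

# NA cells in dt1 are filled from the Country/Year-matched rows of dt2
rows_patch(as.data.frame(dt1), as.data.frame(dt2), by = c("Country", "Year"))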

How to fill values using a conditional row in R?

I have the following dataframe (reproduced in the data block below).
I want to add a column 'Constant Vol': for each Seg, its value should be that Seg's 'Vol' in the year 2006, repeated across all years. The result should look like the output shown below.
Using dplyr, we can group_by Seg and get the corresponding Vol where Year = 2006
library(dplyr)
df %>%
  group_by(Seg) %>%
  mutate(Constnt_Vol = Vol[Year == 2006])
# Seg Year Vol Constnt_Vol
# <fct> <int> <dbl> <dbl>
#1 Agri 2006 23 23
#2 Agri 2007 29 23
#3 Agri 2008 16 23
#4 Agri 2009 31 23
#5 Auto 2006 12 12
#6 Auto 2007 34 12
#7 Auto 2008 45 12
#8 Auto 2009 32 12
and in data.table that would be
library(data.table)
setDT(df)[, Constnt_Vol := Vol[Year == 2006], Seg]
This assumes you have only one row with Year == 2006 in each Seg; if there are multiple, we can use which.max to get the first one: Vol[which.max(Year == 2006)].
data
df <- data.frame(Seg = rep(c("Agri", "Auto"), each = 4),
                 Year = 2006:2009,
                 Vol = c(23, 29, 16, 31, 12, 34, 45, 32))
We can use
library(dplyr)
df %>%
  group_by(Seg) %>%
  mutate(Constnt_Vol = Vol[match(2006, Year)])
data
df <- data.frame(Seg = rep(c("Agri", "Auto"), each = 4),
                 Year = 2006:2009,
                 Vol = c(23, 29, 16, 31, 12, 34, 45, 32))
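A base-R sketch of the same lookup (not from the original answers), using the df defined in the data blocks above; like the answers, it assumes exactly one Year == 2006 row per Seg:
base_2006 <- df[df$Year == 2006, ]                             # one 2006 row per Seg
df$Constnt_Vol <- base_2006$Vol[match(df$Seg, base_2006$Seg)]  # look up each row's Seg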
