Mutate multiple columns using a loop - r

I have a data frame (df) like this:
Class_A Class_B
78 50
40 60
30 70
The result I want is
Class_A Class_B RankClass_A RankClass_B
78 50 1 3
40 60 2 2
30 70 3 1
I can create two or more columns one at a time with the mutate function. However, when I put it in a loop to create more columns, the code does not work.
Here is my code:
label<-c('RankClass_A',"RankClass_B")
for (i in 1:2){
for (k in 1:2){
mutate(df,label[i]=dense_rank(desc(df[k])
}
}

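We can use across() inside mutate() to create the rank columns. This is the current form of the older mutate_all(funs(Rank = rank(-.))) idiom; funs() is deprecated in current dplyr.
library(dplyr)
df %>%
mutate(across(everything(), ~ rank(-.x), .names = "{.col}_Rank"))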
# Class_A Class_B Class_A_Rank Class_B_Rank
#1 78 50 1 3
#2 40 60 2 2
#3 30 70 3 1
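If you specifically want the loop from the question, two things break the original attempt: label[i] = cannot be used as an argument name inside mutate(), and the result of mutate() is never assigned back to df. The nested i/k loops would also pair every label with every column. A minimal sketch, assuming dplyr 1.0 or later (for the .data pronoun and := name injection); cols is an illustrative name, and this version produces the RankClass_A / RankClass_B names from the question:

```r
library(dplyr)

df <- data.frame(Class_A = c(78, 40, 30), Class_B = c(50, 60, 70))

cols  <- c("Class_A", "Class_B")
label <- paste0("Rank", cols)   # "RankClass_A" "RankClass_B"

# One loop over positions: label[i] is paired with cols[i].
# `!!label[i] :=` turns the string into the new column name, and
# .data[[cols[i]]] selects the source column by its name.
for (i in seq_along(cols)) {
  df <- df %>%
    mutate(!!label[i] := dense_rank(desc(.data[[cols[i]]])))
}
df
```

Note that dense_rank() gives tied values the same rank with no gaps, while rank() in the across() version averages ties by default.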

Related

Find the maximum between two rows, column-wise

I'm new to R and trying to figure out how to find the maximum of each pair of consecutive values in a single column.
Example data:
t min most max
---------------
1 10 20 40
2 5 10 30
3 14 28 60
4 40 75 150
Result I'm looking for:
t min most max
---------------
1 10 20 40
2 14 28 60
3 40 75 150
I have tried using rowwise(), but it doesn't work. I am getting the row-wise maximum using:
df$new <-pmax(df$min, df$most, df$max)
df
which gives me the maximum value for the entire row.
t min most max new
-------------------
1 10 20 40 40
2 5 10 30 30
3 14 28 60 60
4 40 75 150 150
Thanks in advance.
You can do this by applying pmax to the vector and to a copy of itself shifted by one position, so each value is compared with the next one. Putting it in a nice little helper function:
adj_max = function(x) {
pmax(x[-1], x[-length(x)])
}
as.data.frame(lapply(your_data, adj_max))
# or with dplyr (>= 1.1.0), where reframe() replaces summarize() for results with more than one row
library(dplyr)
your_data %>%
reframe(across(everything(), adj_max))
Reproducible demo:
x = c(10, 5, 14, 40)
adj_max(x)
# [1] 10 14 40
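Applied to the full example table, note that lapply(your_data, adj_max) also runs over the t column, giving t = 2, 3, 4 rather than the 1, 2, 3 shown in the desired result. A sketch that applies adj_max only to the value columns and renumbers t, assuming your_data holds the example data from the question (value_cols is an illustrative name):

```r
adj_max <- function(x) {
  # compare each element with the one after it
  pmax(x[-1], x[-length(x)])
}

your_data <- data.frame(t = 1:4,
                        min = c(10, 5, 14, 40),
                        most = c(20, 10, 28, 75),
                        max = c(40, 30, 60, 150))

value_cols <- c("min", "most", "max")
result <- as.data.frame(lapply(your_data[value_cols], adj_max))
# n rows give n - 1 adjacent pairs; number them 1, 2, ... instead of taking pmax of t
result <- cbind(t = seq_len(nrow(result)), result)
result
```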

Using a function and mapply in R to create new columns that sum other columns

Suppose I have a data frame, df, and I want to create a new column called "c" by adding two existing columns, "a" and "b". I would simply run the following code:
df$c <- df$a + df$b
But I also want to do this for many other pairs of columns. Why doesn't my code below work?
# Reproducible data:
martial_arts <- data.frame(gym_branch=c("downtown_a", "downtown_b", "uptown", "island"),
day_boxing=c(5,30,25,10),day_muaythai=c(34,18,20,30),
day_bjj=c(0,0,0,0),day_judo=c(10,0,5,0),
evening_boxing=c(50,45,32,40), evening_muaythai=c(50,50,45,50),
evening_bjj=c(60,60,55,40), evening_judo=c(25,15,30,0))
# Creating a list of the new column names of the columns that need to be added to the martial_arts dataframe:
pattern<-c("_boxing","_muaythai","_bjj","_judo")
d<- expand.grid(paste0("martial_arts$total",pattern))
# Creating lists of the columns that will be added to each other:
e<- names(martial_arts %>% select(day_boxing:day_judo))
f<- names(martial_arts %>% select(evening_boxing:evening_judo))
# Writing a function and using mapply:
kick_him <- function(d,e,f){d <- rowSums(martial_arts[ , c(e, f)], na.rm=T)}
mapply(kick_him,d,e,f)
Now, mapply produces the correct results in terms of the addition:
> mapply(kick_him,d,e,f)
Var1 <NA> <NA> <NA>
[1,] 55 84 60 35
[2,] 75 68 60 15
[3,] 57 65 55 35
[4,] 50 80 40 0
But it doesn't add the new columns to the martial_arts data frame. In effect, the function should do the following:
martial_arts$total_boxing <- martial_arts$day_boxing + martial_arts$evening_boxing
...
...
martial_arts$total_judo <- martial_arts$day_judo + martial_arts$evening_judo
and add four new total columns to martial_arts.
So what am I doing wrong?
The assignment is the problem. Assigning to d inside the function only changes a local copy, so nothing is written back to martial_arts. The new column names should also be plain strings such as "total_boxing" (not "martial_arts$total_boxing"), and the assignment should happen on the left-hand side of Map/mapply. Because the OP already stored the "martial_arts$" prefix in the Var1 column of d, we strip that prefix with sub() and then assign:
kick_him <- function(e,f){rowSums(martial_arts[ , c(e, f)], na.rm=TRUE)}
martial_arts[sub(".*\\$", "", d$Var1)] <- Map(kick_him, e, f)
Check the dataset now:
> martial_arts
gym_branch day_boxing day_muaythai day_bjj day_judo evening_boxing evening_muaythai evening_bjj evening_judo total_boxing total_muaythai total_bjj total_judo
1 downtown_a 5 34 0 10 50 50 60 25 55 84 60 35
2 downtown_b 30 18 0 0 45 50 60 15 75 68 60 15
3 uptown 25 20 0 5 32 45 55 30 57 65 55 35
4 island 10 30 0 0 40 50 40 0 50 80 40 0
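A variant that avoids building "martial_arts$..." strings altogether: derive every column name from the sport suffix, let the function only return the row sums, and do the assignment on the left-hand side. This is a sketch assuming the day_/evening_ naming shown in the question; sports and sum_day_evening are illustrative names:

```r
martial_arts <- data.frame(
  gym_branch = c("downtown_a", "downtown_b", "uptown", "island"),
  day_boxing = c(5, 30, 25, 10), day_muaythai = c(34, 18, 20, 30),
  day_bjj = c(0, 0, 0, 0), day_judo = c(10, 0, 5, 0),
  evening_boxing = c(50, 45, 32, 40), evening_muaythai = c(50, 50, 45, 50),
  evening_bjj = c(60, 60, 55, 40), evening_judo = c(25, 15, 30, 0)
)

sports <- c("boxing", "muaythai", "bjj", "judo")

# Return the row-wise sum of day_<sport> and evening_<sport>;
# the function itself does not modify martial_arts.
sum_day_evening <- function(sport) {
  rowSums(martial_arts[, paste0(c("day_", "evening_"), sport)], na.rm = TRUE)
}

# The assignment happens here, outside the function, so the columns are kept.
martial_arts[paste0("total_", sports)] <- lapply(sports, sum_day_evening)
martial_arts
```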

Assigning a unique ID to records based on differences between values in consecutive rows using a loop in R

This is my data frame (df):
Time <- c("16:04:56", "16:04:59", "16:05:02", "16:05:04", "16:05:11", "16:05:13", "16:07:59", "16:08:09", "16:09:03", "16:09:51", "16:11:10")
Distance <- c(45,38,156,157,37,159,79,79,78,160,78)
df <-as.data.frame(cbind(Time,Distance)); df
Time Distance
16:04:56 45
16:04:59 38
16:05:02 156
16:05:04 157
16:05:11 37
16:05:13 159
16:07:59 79
16:08:09 79
16:09:03 78
16:09:51 160
16:11:10 78
I need to assign an ID to each record based on two conditions:
If the absolute difference in Time between two consecutive rows is more than 1 minute, and
If the absolute difference in Distance between two consecutive rows is more than 10.
A new ID should be assigned only when both conditions are satisfied.
The result should look like this:
Time Distance ID
16:04:56 45 1
16:04:59 38 1
16:05:02 156 1
16:05:04 157 1
16:05:11 37 1
16:05:13 159 1
16:07:59 79 2
16:08:09 79 2
16:09:03 78 2
16:09:51 160 2
16:11:10 78 3
Thanks to all who contribute any thoughts.
Convert the Time column to POSIXct, take the difference between consecutive rows for both Time and Distance, and increment the ID with cumsum() every time both differences exceed their thresholds.
library(dplyr)
df %>%
mutate(Time1 = as.POSIXct(Time, format = '%T'),
ID = cumsum(
abs(difftime(Time1, lag(Time1, default = first(Time1)), units = 'mins')) > 1 &
abs(Distance - lag(Distance, default = first(Distance))) > 10) + 1) %>%
select(-Time1)
# Time Distance ID
#1 16:04:56 45 1
#2 16:04:59 38 1
#3 16:05:02 156 1
#4 16:05:04 157 1
#5 16:05:11 37 1
#6 16:05:13 159 1
#7 16:07:59 79 2
#8 16:08:09 79 2
#9 16:09:03 78 2
#10 16:09:51 160 2
#11 16:11:10 78 3
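Data used (data.frame() keeps Distance numeric, whereas as.data.frame(cbind(...)) turns every column into character):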
df <-data.frame(Time,Distance)
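The same logic in base R, for readers not using dplyr: diff() gives the change from the previous row, a leading 0 ensures the first row never starts a new group, and cumsum() over the logical "new group" flags produces the IDs. This sketch assumes all times fall on the same day (they are parsed without a date) and uses UTC to avoid daylight-saving shifts; dt_min, d_dist and new_group are illustrative names:

```r
Time <- c("16:04:56", "16:04:59", "16:05:02", "16:05:04", "16:05:11", "16:05:13",
          "16:07:59", "16:08:09", "16:09:03", "16:09:51", "16:11:10")
Distance <- c(45, 38, 156, 157, 37, 159, 79, 79, 78, 160, 78)
df <- data.frame(Time, Distance)

# Parse the clock times; the same date is attached to all of them, so it cancels out in diff()
secs <- as.numeric(as.POSIXct(df$Time, format = "%H:%M:%S", tz = "UTC"))

dt_min <- c(0, diff(secs)) / 60      # minutes since the previous row
d_dist <- c(0, diff(df$Distance))    # distance change from the previous row

# A new ID starts only when both thresholds are exceeded
new_group <- abs(dt_min) > 1 & abs(d_dist) > 10
df$ID <- cumsum(new_group) + 1
df
```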

Sorting rows in a data frame by column values

I'm having difficulty sorting rows by a particular column.
The values are out of order, for example:
METHOD VAL1 VAL2 VAL3
1-A 10 2 15
10-B 11 5 15
11-c 23 45 65
2-F 4 65 67
3-T 4 56 11
and I need them in this order:
METHOD VAL1 VAL2 VAL3
1-A 10 2 15
2-F 4 65 67
3-T 4 56 11
10-B 11 5 15
11-c 23 45 65
The sort order should follow the numeric prefix of the METHOD column (so 10-B comes after 3-T). I've tried to arrange it in many ways but without success.
I have solved this issue, but there is another issue in the same code. On its own, the following line works, but inside a function it causes a problem:
a1 <- a1[order(as.numeric(gsub("-.*", "", a1$varname))),]
My function is as follows:
t1<- doTable1(AE_subset$Disp_code,AE_subset$FY,"DisposalMethod",thresh = 0.02,testvar = AE_subset$Attendance,fun="sum")
doTable1<- function(var1,var2,varname,testvar=NULL,fun=NULL,inc=TRUE,thresh=0.02) {
if (is.null(fun)) {
a1<- as.data.frame.matrix(table(var1,var2))
} else {
a1<- as.data.frame.matrix(tapply(testvar,list(var1,var2),FUN=fun,na.rm=TRUE))
}
a1<- rownames_to_column(a1,var=varname)
a1$FY3PR<- a1$FY3*proRata
if (!is.null(fun))
if (fun=="mean")
a1$FY3PR<- a1$FY3
a1 <- a1[order(as.numeric(gsub("-.*", "", a1$varname))),] # dataframe is not updating here
a1 <- a1 %>% replace(., is.na(.), 0)
a1 <- rbind(a1,c("Total",as.numeric(colSums(a1[,2:4]))))
return(a1)
}
Put simply, it returns an empty data frame.
Can anyone identify why this function fails at the order() call?
You can use gsub to strip everything from the hyphen onward, convert the remaining number with as.numeric, and order by it:
df[order(as.numeric(gsub("-.*", "", df$METHOD))),]
METHOD VAL1 VAL2 VAL3
1 1-A 10 2 15
4 2-F 4 65 67
5 3-T 4 56 11
2 10-B 11 5 15
3 11-c 23 45 65
With dplyr you can do:
library(dplyr)
dat %>% # we create a new column based on METHOD
mutate(met_num =as.numeric(gsub("\\D", "", METHOD)) ) %>% # gets only the number part
arrange(met_num) %>% # we arrange just by the number part of METHOD
select(-met_num) # removes that new column
METHOD VAL1 VAL2 VAL3
1 1-A 10 2 15
2 2-F 4 65 67
3 3-T 4 56 11
4 10-B 11 5 15
5 11-c 23 45 65
Data used:
tt <- "METHOD VAL1 VAL2 VAL3
1-A 10 2 15
10-B 11 5 15
11-c 23 45 65
2-F 4 65 67
3-T 4 56 11"
dat <- read.table(text = tt, header = TRUE)
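As for why the same line fails inside doTable1(): a1$varname does not look up the column whose name is stored in varname. The $ operator takes its right-hand side literally, so it searches for a column called "varname", finds none and returns NULL. gsub() and as.numeric() then produce an empty vector, order() returns integer(0), and a1[integer(0), ] is a data frame with zero rows, which matches the empty result described in the question. Using a1[[varname]] evaluates the variable instead. A minimal reproduction, where sort_by_prefix is a hypothetical helper standing in for the relevant line of doTable1():

```r
sort_by_prefix <- function(a1, varname) {
  # a1[[varname]] uses the column whose name is stored in varname;
  # a1$varname would look for a column literally named "varname".
  a1[order(as.numeric(gsub("-.*", "", a1[[varname]]))), ]
}

tt <- "METHOD VAL1 VAL2 VAL3
1-A 10 2 15
10-B 11 5 15
11-c 23 45 65
2-F 4 65 67
3-T 4 56 11"
dat <- read.table(text = tt, header = TRUE)

# What happens with $: there is no column called "varname", so the result has 0 rows
broken <- dat[order(as.numeric(gsub("-.*", "", dat$varname))), ]
nrow(broken)

sorted <- sort_by_prefix(dat, "METHOD")
sorted
```

In doTable1() itself, the corresponding change would be a1[[varname]] in place of a1$varname on the order() line.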

Summing values after every third position in data frame in R

I am new to R. I have a data frame like the following:
>df=data.frame(Id=c("Entry_1","Entry_1","Entry_1","Entry_2","Entry_2","Entry_2","Entry_3","Entry_4","Entry_4","Entry_4","Entry_4"),Start=c(20,20,20,37,37,37,68,10,10,10,10),End=c(50,50,50,78,78,78,200,94,94,94,94),Pos=c(14,34,21,50,18,70,101,35,2,56,67),Hits=c(12,34,17,89,45,87,1,5,6,3,26))
Id Start End Pos Hits
Entry_1 20 50 14 12
Entry_1 20 50 34 34
Entry_1 20 50 21 17
Entry_2 37 78 50 89
Entry_2 37 78 18 45
Entry_2 37 78 70 87
Entry_3 68 200 101 1
Entry_4 10 94 35 5
Entry_4 10 94 2 6
Entry_4 10 94 56 3
Entry_4 10 94 67 26
For each entry I would like to scan the data frame in 3 different modes. For example, for Entry_1, mode_1 = seq(20,50,3), mode_2 = seq(21,50,3) and mode_3 = seq(22,50,3). I would like to sum all the values in column "Hits" whose corresponding value in column "Pos" falls in mode_1, mode_2 or mode_3, and generate a data frame like the following:
Id Mode_1 Mode_2 Mode_3
Entry_1 0 17 34
Entry_2 87 89 0
Entry_3 1 0 0
Entry_4 26 8 0
I tried the following code:
mode_1=0
mode_2=0
mode_3=0
mode_1_sum=0
mode_2_sum=0
mode_3_sum=0
for(i in dim(df)[1])
{
if(df$Pos[i] %in% seq(df$Start[i],df$End[i],3))
{
mode_1_sum=mode_1_sum+df$Hits[i]
print(mode_1_sum)
}
mode_1=mode_1_sum+counts
print(mode_1)
ifelse(df$Pos[i] %in% seq(df$Start[i]+1,df$End[i],3))
{
mode_2_sum=mode_2_sum+df$Hits[i]
print(mode_2_sum)
}
mode_2_sum=mode_2_sum+counts
print(mode_2)
ifelse(df$Pos[i] %in% seq(df$Start[i]+2,df$End[i],3))
{
mode_3_sum=mode_3_sum+df$Hits[i]
print(mode_3_sum)
}
mode_3_sum=mode_3_sum+counts
print(mode_3_sum)
}
But the code above only prints 26. Can anyone guide me on how to generate my desired output? I can provide more details if needed. Thanks in advance.
It's not an elegant solution, but it works.
m <- 3 # Number of modes you want
foo <- ((df$Pos - df$Start)%%m + 1) * (df$Start <= df$Pos) * (df$End >= df$Pos)
tab <- matrix(0,nrow(df),m)
for(i in 1:m) tab[foo==i,i] <- df$Hits[foo==i]
aggregate(tab,list(df$Id),FUN=sum)
# Group.1 V1 V2 V3
# 1 Entry_1 0 17 34
# 2 Entry_2 87 89 0
# 3 Entry_3 1 0 0
# 4 Entry_4 26 8 0
-- EXPLANATION --
First, we check which values of df$Pos lie between df$Start and df$End, inclusive (seq(Start, End, 3) can contain both endpoints). Each comparison gives 1 if TRUE and 0 if FALSE. Next, we take the difference between df$Pos and df$Start modulo 3, which gives a vector of 0s, 1s and 2s, and add 1 to get the mode number. Multiplying these together keeps the correct mode for values inside the interval and turns values outside the interval into 0.
Next, we create an empty matrix that will contain the values. Then, we use a for-loop to fill in the matrix. Finally, we aggregate the matrix.
I tried looking for a quicker solution, but the main problem I cannot work around is the varying intervals for each row.
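As for why the loop in the question only prints 26: for(i in dim(df)[1]) iterates over the single value dim(df)[1], which is 11, so only the last row (Hits = 26) is ever examined. In addition, counts is never defined, and ifelse() is a vectorised function that needs yes and no arguments, not a conditional statement followed by a braced block. A sketch of the loop the question was aiming for, keeping its seq()-based definition of the modes; it assumes End >= Start + 2 on every row (otherwise seq() with a positive step fails), and modes and ids are illustrative names:

```r
df <- data.frame(Id = c("Entry_1", "Entry_1", "Entry_1", "Entry_2", "Entry_2", "Entry_2",
                        "Entry_3", "Entry_4", "Entry_4", "Entry_4", "Entry_4"),
                 Start = c(20, 20, 20, 37, 37, 37, 68, 10, 10, 10, 10),
                 End = c(50, 50, 50, 78, 78, 78, 200, 94, 94, 94, 94),
                 Pos = c(14, 34, 21, 50, 18, 70, 101, 35, 2, 56, 67),
                 Hits = c(12, 34, 17, 89, 45, 87, 1, 5, 6, 3, 26))

ids <- unique(as.character(df$Id))
# One row per Id, one column per mode, all starting at 0
modes <- matrix(0, nrow = length(ids), ncol = 3,
                dimnames = list(ids, c("Mode_1", "Mode_2", "Mode_3")))

for (i in seq_len(nrow(df))) {   # every row, not just the number dim(df)[1]
  id <- as.character(df$Id[i])
  for (m in 1:3) {
    # mode m is seq(Start + m - 1, End, 3), as defined in the question
    if (df$Pos[i] %in% seq(df$Start[i] + m - 1, df$End[i], 3)) {
      modes[id, m] <- modes[id, m] + df$Hits[i]
    }
  }
}
modes
```

This is slower than the vectorised answer above because it builds a sequence for every row and mode, but it follows the question's original structure.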
