Replace row values by max values in the group [closed] - r

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 5 years ago.
I have a data frame that looks like this:
a <- c(10,NA,30,40,NA,60,70,80,90,90,80,90,10,40)
b <- c("l","k","l","l","k","l","l","l","k","k","l","l","k","l")
c <- c(1,1,1,2,2,2,2,2,3,3,3,4,4,4)
df <- data.frame(a, b, c)
I want to group the data frame by columns 'b' and 'c', then replace each value in column 'a' with the maximum of its group. For example, the 1st and 2nd values of column 'a' would be replaced by 30. Here is my code:
library(dplyr)
df %>% group_by(b, c) %>% mutate(a = max(a, na.rm = TRUE))
The other values are replaced by the group maximum, but the NA values are not. I don't know why mutate replaces NA with -Inf. Here is the result I get with my code:
a <- c(30,-Inf,30,80,-Inf,80,80,80,90,90,90,90,10,90)
But I want it like this:
a <- c(30,30,30,80,80,80,80,80,90,90,90,90,10,90)

Assuming your data are:
Tuong_df <- data.frame(
c(10,NA,30,40,NA,60,70,80,90,90,80,90,10,40),
c("l","l","l","l","l","l","l","l","k","k","k","k","k","k"),
c(1,1,1,2,2,2,2,2,3,3,3,4,4,4))
names(Tuong_df) <- c("Var1","Var2","Var3")
You have to run the following code:
library(dplyr)
Tuong_df_mod <- Tuong_df %>%
  group_by(Var2, Var3) %>%
  mutate(Var1 = max(Var1, na.rm = TRUE))
For future questions, it would be better to post reproducible code.
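As a side note on where the -Inf comes from: with the grouping vector from the question, the groups (k, 1) and (k, 2) contain only NA, and max() with na.rm = TRUE on an all-NA vector returns -Inf with a warning. Below is a minimal base-R sketch, assuming you would rather get NA for such groups. The helper name safe_max is hypothetical, and ave() stands in here for the group_by/mutate pipeline.

```r
# Data from the question, with the letters quoted
a <- c(10, NA, 30, 40, NA, 60, 70, 80, 90, 90, 80, 90, 10, 40)
b <- c("l", "k", "l", "l", "k", "l", "l", "l", "k", "k", "l", "l", "k", "l")
c <- c(1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4)

# Once the NAs are removed nothing is left to compare, so max() gives -Inf
all_na_max <- suppressWarnings(max(NA_real_, na.rm = TRUE))

# Hypothetical helper: group maximum that returns NA for all-NA groups
safe_max <- function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

# ave() applies FUN within each combination of b and c
a_max <- ave(a, b, c, FUN = safe_max)
a_max
```

The same helper could be called inside mutate() in place of max(a, na.rm = TRUE).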

Related

mean() returns NaN for subset of dataframe [closed]

Closed 3 years ago.
I'm getting a NaN result for mean(group2$score). I'm trying to create a function to list descriptive statistics for my data.
printf <- function (...) {
cat(sprintf(...))
}
data <- data.frame(
  id = 1:200,
  group = c(replicate(100, 1), replicate(100, 2)),
  score = rnorm(200, mean = 100, sd = 15)
)
descriptives <- function (data) {
group1 <- data[data$group <= 100, ]
group2 <- data[data$group >= 101, ]
printf("group 1 mean: %.2f\n", mean(group1$score))
printf("group 2 mean: %.2f\n", mean(group2$score)) #this is where the NaN gets printed
}
descriptives(data)
Sorry, just realised that I used $group when I was meant to use $id. Always easier to see mistakes after posting in a public space!
This usually means that there is no data in that subset.
mean(numeric(0))
[1] NaN
This is definitely the case with your data as the group variable is either 1 or 2 and you are trying to split them on a value of 100.
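A short sketch of the point above, using similar simulated data (the seed is arbitrary and only there to make the run repeatable). It shows the empty subset producing NaN, then a split on the group label that works.

```r
set.seed(42)  # arbitrary seed, an assumption for repeatability
data <- data.frame(
  id = 1:200,
  group = rep(c(1, 2), each = 100),
  score = rnorm(200, mean = 100, sd = 15)
)

# group only takes the values 1 and 2, so this subset has no rows
empty <- data[data$group >= 101, ]
nrow(empty)         # 0
mean(empty$score)   # NaN

# Splitting on the group label (or on id, as the asker intended) works
group1 <- data[data$group == 1, ]
group2 <- data[data$group == 2, ]
mean(group2$score)
```

Checking nrow() of each subset before summarising is a cheap way to catch this kind of mistake.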

R: how to sample rows with custom frequencies [closed]

Closed 5 years ago.
I have a data frame in R that has two columns, one with last names, the other with the frequency of each last name. I would like to randomly select last names according to the frequency values, which lie between 0 and 1.
So far I have tried using the sample function, but it doesn't allow for specific frequencies for each value. Not sure if this is possible :/
df1 <- data.frame(names = c("John","Mary"),freq=c(0.2,0.8))
df1
# names freq
# 1 John 0.2
# 2 Mary 0.8
set.seed(1)
sample100 <- sample(
  x = df1$names,
  size = 100,
  replace = TRUE,
  prob = df1$freq)
table(sample100)
# sample100
# John Mary
# 17 83
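To sample whole rows rather than just the names, one option is to sample row indices with the same prob argument and then subset. This is a sketch assuming the df1 from above; the names idx and sampled are made up for illustration. Note that sample() does not require the weights to sum to one.

```r
df1 <- data.frame(names = c("John", "Mary"), freq = c(0.2, 0.8))

set.seed(1)
# Draw row indices with replacement, weighted by freq
idx <- sample(nrow(df1), size = 100, replace = TRUE, prob = df1$freq)
sampled <- df1[idx, ]

# Observed shares should be roughly 0.2 and 0.8
prop.table(table(sampled$names))
```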

R: sort rows, query them and add results as column [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have an R data frame with dimensions 32 x 11. For each row I would like to determine the highest, the second highest, and the third highest value and add these values as extra columns to the initial data frame (giving 32 x 14). Many thanks in advance!
library(car)
data(mtcars)
mtcars
First, create a function that returns the nth highest value of a vector. Then make a copy of the data frame, since the row-wise highest values could change as you add the new columns. Finally, apply the function with apply and MARGIN = 1 to operate row-wise. I'm not sure what would happen if there are NAs in the data; I haven't tested it.
Something like this:
nth_highest <- function(x, n) sort(x, decreasing = TRUE)[n]
tmp <- mtcars
mtcars$highest <- apply(tmp, 1, function(x) nth_highest(x, 1))
mtcars$second_highest <- apply(tmp, 1, function(x) nth_highest(x, 2))
mtcars$third_highest <- apply(tmp, 1, function(x) nth_highest(x, 3))
rm(tmp)
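On the untested NA question: sort() drops NA values by default (na.last = NA), so nth_highest ignores them and returns NA only when fewer than n non-missing values are available. The sketch below checks that on a small vector and also shows a variant that sorts each row once. The names top3 and mtcars_top are made up for illustration.

```r
nth_highest <- function(x, n) sort(x, decreasing = TRUE)[n]

x <- c(3, NA, 7, 1)
nth_highest(x, 1)  # 7, the NA is dropped before indexing
nth_highest(x, 4)  # NA, only three non-missing values exist

# Variant: sort each row once and keep the first three values
top3 <- t(apply(mtcars, 1, function(x) unname(sort(x, decreasing = TRUE)[1:3])))
colnames(top3) <- c("highest", "second_highest", "third_highest")

# Bind to a copy so the original mtcars stays untouched
mtcars_top <- cbind(mtcars, top3)
dim(mtcars_top)
```

Sorting once per row avoids repeating the same sort three times, which matters little for 32 rows but keeps the code shorter.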

R data table select rows to update column [closed]

Closed 4 years ago.
I am trying to update a column for a selection of rows based on the value of the same column. This column contains the characters '1', '2', ..., '10', '11'. What I want to do is combine the 11 categories into 3. My code looks like this:
library(data.table)
DT <- data.table(col = as.character(1:11))
DT[col %in% c('1','2','3'), col := '3']
DT[col %in% c('4','5','6','7','8'), col := '2']
DT[col %in% c('9','10','11'), col := '1']
Weirdly, the last line doesn't work: the '10' and '11' are not updated. When I change c to list (code below), it seems to work, but I don't know why.
DT[col %in% list('9','10','11'), col := '1']
Any help will be much appreciated.
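The question was closed as not reproducible, so the following does not diagnose the original problem. It is only a sketch of a one-pass recode with a named lookup vector, which avoids any chance of a later update touching values written by an earlier one. The names lookup and new_col are assumptions for illustration, and plain base R is used here instead of data.table.

```r
col <- as.character(1:11)

# Map each old category (the names) to its new one (the values)
lookup <- setNames(
  c(rep("3", 3), rep("2", 5), rep("1", 3)),
  as.character(1:11)
)

# Index by name to recode every value in a single step
new_col <- unname(lookup[col])
new_col
```

With a data.table, the same vector could presumably be applied in one assignment, for example DT[, col := lookup[col]].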

WeiRd: R does not find value but it's just there [closed]

Closed 2 years ago.
Trying to merge two data frames, using a variable called hash_id. For some reason R does not recognize the hash-id's in one of the data frames, while it does so in the other.
I have checked and I just don't get it. See below how I checked:
> head(df1[46],1) # so I take the first 'hash-id' from df1
# hash_id
# 1 abab123123
> which(df2 == "abab123123", arr.ind=TRUE) # here it shows that row 6847 contains a match
# row col
# [1,] 6847 32
> which(df1 == "abab123123", arr.ind=TRUE) # and here there is NO matching value!
# row col
#
One possibility is trailing or leading spaces in the concerned columns for one of the datasets. You could do:
library(stringr)
df1[, "hash_id"] <- str_trim(df1[,"hash_id"])
df2[, "hash_id"] <- str_trim(df2[, "hash_id"])
which(df1[, "hash_id"]=="abab123123", arr.ind=TRUE)
which(df2[, "hash_id"]=="abab123123", arr.ind=TRUE)
Another way would be to use grepl:
grepl("\\babab123123\\b", df1[,"hash_id"])
grepl("\\babab123123\\b", df2[,"hash_id"])
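To check whether stray whitespace is really the cause before changing the data, you could compare string lengths before and after trimming. This is a base-R sketch with made-up data (the trailing space in df1 is an assumption); trimws() is the base counterpart of str_trim().

```r
# Hypothetical data: the id in df1 carries a trailing space
df1 <- data.frame(hash_id = "abab123123 ", stringsAsFactors = FALSE)
df2 <- data.frame(hash_id = "abab123123", stringsAsFactors = FALSE)

# Exact comparison finds nothing in the untrimmed column
before <- which(df1$hash_id == "abab123123")
before  # integer(0)

# Values whose length changes when trimmed contain extra whitespace
has_padding <- nchar(df1$hash_id) != nchar(trimws(df1$hash_id))

# After trimming, the ids match and merge() finds the row
df1$hash_id <- trimws(df1$hash_id)
merged <- merge(df1, df2, by = "hash_id")
nrow(merged)
```

If has_padding is FALSE everywhere, whitespace is not the culprit and other causes, such as differing case or column types, are worth checking.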
