I want to select all columns whose names start with one of the following four prefixes: CB, LB, LW, CW, but exclude any columns whose names contain the string "con".
My current approach is:
tester <- df_ans[,names(df_ans) %in% colnames(df_ans)[grepl("^(LW|LB|CW|CB)[A-Z_0-9]*",colnames(df_ans))]]
tester <- tester[,names(tester) %in% colnames(tester)[!grepl("con",colnames(tester))]]
Is there a better / more efficient way to do this in a library like dplyr?
We can use matches
df %>%
select(matches("^(CB|LB|LW|CW)"), -matches("con"))
# CB1 LB2 CW3 LW20
#1 3 9 6 1
#2 3 3 4 5
#3 7 7 7 7
#4 5 8 7 2
#5 6 3 3 3
df <- as.data.frame(matrix(sample(1:9, 10 * 5, replace = TRUE),
ncol = 10, dimnames = list(NULL, c("CB1", "LB2", "CW3", "WC1",
"LW20", "conifer", "hercon", "other", "other2", "other3"))))
Try this:
nms <- names(df_ans)
df_ans[ grepl("^(LW|LB|CW|CB)", nms) & !grepl("con", nms) ]
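As a self-contained check of the two-pattern idea, here is a minimal sketch on a small made-up data frame. The data and the column name CBcon are assumptions invented for illustration. It keeps the columns that match the prefix pattern and drops any that contain "con", including one that starts with a valid prefix.

```r
# Hypothetical toy data; CBcon is invented to show that a column with a
# valid prefix is still dropped when its name contains "con"
df_ans <- data.frame(CB1 = 1:2, LB2 = 3:4, CW3 = 5:6, WC1 = 7:8,
                     LW20 = 9:10, conifer = 1:2, CBcon = 3:4)

nms <- names(df_ans)

# TRUE only for names with a wanted prefix that do not contain "con"
keep <- grepl("^(LW|LB|CW|CB)", nms) & !grepl("con", nms, fixed = TRUE)

# drop = FALSE keeps a data frame even if only one column survives
tester <- df_ans[, keep, drop = FALSE]
names(tester)
```

Building the logical vector once and subsetting once avoids the intermediate data frame used in the two-step approach from the question.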
I was examining the code below:
DF = data.frame('A' = 1:3, 'B' =2:4)
Condition = 'A'
fn1 = function(x) x + 3
fn2 = function(x) x + 5
DF %>% mutate('aa' = 3:5) %>%
  {if (Condition == 'A') {
    bb = . %>% mutate('A1' = fn1(A), 'B1' = fn1(B))
  } else {
    bb = . %>% mutate('A1' = fn2(A), 'B1' = fn2(B))
  }
  }
Basically, I have two similar functions, fn1 and fn2, and I want to use one of them depending on some condition.
The above implementation throws the error below:
Functional sequence with the following components:
1. mutate(., A1 = fn1(A), B1 = fn1(B))
Use 'functions' to extract the individual functions.
Can you please help me write the pipe sequence properly so that the above code executes?
We could use across within mutate
DF %>%
mutate(aa = 3:5, across(c(A, B), ~ if(Condition == 'A') fn1(.)
else fn2(.), .names = "{.col}1"))
A B aa A1 B1
1 1 2 3 4 5
2 2 3 4 5 6
3 3 4 5 6 7
Also, an option is to get the functions in a list and convert the logical vector to numeric index for subsetting
DF %>%
mutate(aa = 3:5,
across(c(A, B), ~ list(fn2, fn1)[[1 + (Condition == 'A')]](.),
.names = "{.col}1"))
A B aa A1 B1
1 1 2 3 4 5
2 2 3 4 5 6
3 3 4 5 6 7
Based on the comments, if we need a custom name for the new columns, create a named vector and replace with str_replace_all
nm1 <- setNames(c("XXX", "YYY"), names(DF)[1:2])
DF %>%
mutate(aa = 3:5,
across(c(A, B), ~ list(fn2, fn1)[[1 + (Condition == 'A')]](.),
.names = "{str_replace_all(.col, nm1)}"))
A B aa XXX YYY
1 1 2 3 4 5
2 2 3 4 5 6
3 3 4 5 6 7
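A note on why the original attempt prints a functional sequence: in magrittr, a pipeline whose left-hand side is the bare dot (. %>% ...) defines a function instead of evaluating anything. One way around this is to pick the function before building the new columns. The sketch below uses base R's transform() so that it runs without extra packages. The helper name fn is an assumption introduced here for illustration.

```r
DF <- data.frame(A = 1:3, B = 2:4)
Condition <- "A"
fn1 <- function(x) x + 3
fn2 <- function(x) x + 5

# Choose the function once, outside the data manipulation step
# (fn is a hypothetical helper name, not part of the original code)
fn <- if (Condition == "A") fn1 else fn2

# Add the constant column and the two derived columns in one call
res <- transform(DF, aa = 3:5, A1 = fn(A), B1 = fn(B))
res
```

The same fn should work unchanged inside a dplyr mutate() call, since it is an ordinary function.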
I have a df where one variable is an integer. I'd like to split this column into its individual digits. See my example below:
Group Number
A 456
B 3
C 18
Group Number Digit1 Digit2 Digit3
A 456 4 5 6
B 3 3 NA NA
C 18 1 8 NA
We can use read.fwf from base R. First find the maximum number of characters (nchar) in the 'Number' column (mx). Then convert 'Number' to character (as.character), read it with 'widths' set to 1 repeated mx times, and assign the output to the new 'Digit' columns in the data:
mx <- max(nchar(df1$Number))
df1[paste0("Digit", seq_len(mx))] <- read.fwf(textConnection(
as.character(df1$Number)), widths = rep(1, mx))
# Group Number Digit1 Digit2 Digit3
#1 A 456 4 5 6
#2 B 3 3 NA NA
#3 C 18 1 8 NA
df1 <- structure(list(Group = c("A", "B", "C"), Number = c(456L, 3L,
18L)), class = "data.frame", row.names = c(NA, -3L))
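For comparison, the same split can be sketched without a text connection: split each number into characters, pad every piece to the longest length with NA, and bind the result as integer columns. This is only a sketch of the idea. The names chars, digits and out are assumptions introduced here.

```r
df1 <- data.frame(Group = c("A", "B", "C"), Number = c(456L, 3L, 18L))

# One character vector of digits per row
chars <- strsplit(as.character(df1$Number), "")
mx <- max(lengths(chars))

# Indexing past the end of a vector yields NA, which pads short numbers
padded <- lapply(chars, function(x) as.integer(x[seq_len(mx)]))

# Fill row by row so each number occupies one row of the matrix
digits <- matrix(unlist(padded), ncol = mx, byrow = TRUE)
colnames(digits) <- paste0("Digit", seq_len(mx))

out <- cbind(df1, digits)
out
```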
Another base R option (I think @akrun's approach using read.fwf is much simpler)
cbind(df1, with(df1, type.convert(
  `colnames<-`(do.call(rbind, lapply(
    strsplit(as.character(Number), ""),
    `length<-`, max(nchar(Number))
  )), paste0("Digit", seq(max(nchar(Number))))),
  as.is = TRUE
)))
which gives
Group Number Digit1 Digit2 Digit3
1 A 456 4 5 6
2 B 3 3 NA NA
3 C 18 1 8 NA
Using splitstackshape::cSplit
splitstackshape::cSplit(df, 'Number', sep = '', stripWhite = FALSE, drop = FALSE)
# Group Number Number_1 Number_2 Number_3
#1: A 456 4 5 6
#2: B 3 3 NA NA
#3: C 18 1 8 NA
Thanks to an accident that led to an inspiration from @ThomasIsCoding, I realized I could use max to get the maximum number of characters across rows and pass it to map2, which saves a few lines of code.
df %>%
rowwise() %>%
mutate(map2_dfc(Number, 1:max(nchar(Number)), ~ str_sub(.x, .y, .y))) %>%
unnest(cols = !c(Group, Number)) %>%
rename_with(~ str_replace(., "\\.\\.\\.", "Digit"), .cols = !c(Group, Number)) %>%
mutate(across(!c(Group, Number), as.numeric, na.rm = TRUE))
# A tibble: 3 x 5
Group Number Digit1 Digit2 Digit3
<chr> <dbl> <dbl> <dbl> <dbl>
1 A 456 4 5 6
2 B 3 3 NA NA
3 C 18 1 8 NA
df <- tribble(
~Group, ~Number,
"A", 456,
"B", 3,
"C", 18
)
Two base R methods:
no_cols <- max(nchar(as.character(df1$Number)))
# Using `strsplit()`:
cbind(df1, setNames(data.frame(do.call(rbind,
  lapply(strsplit(as.character(df1$Number), ""),
    function(x) {
      length(x) <- no_cols  # pad with NA up to the widest number
      x
    }
  ))), paste0("Digit", seq_len(no_cols))))
# Using `regmatches()` and `gregexpr()`:
cbind(df1, setNames(data.frame(do.call(rbind,
  lapply(regmatches(df1$Number, gregexpr("\\d", df1$Number)),
    function(x) {
      length(x) <- no_cols  # pad with NA up to the widest number
      x
    }
  ))), paste0("Digit", seq_len(no_cols))))
I have data like this:
Name Rating
Tom 3
Tom 4
Tom 2
Johnson 5
Johnson 7
But I'd like it so each unique name is instead a column, with the ratings below, in each row. How can I approach this?
Here is a good way of doing it
x <- data.frame(c("Tom", "Tom", "Tom", "Johnson", "Johnson"), c(3,4,2,5,7))
colnames(x) <- c("Name", "Rating")
n <- unique(x[,1])
m <- max(table(x[,1]))
c <- data.frame(matrix(, ncol = length(n), nrow = m))
for (i in 1:length(n)) {
  l <- x[which(x[,1] == n[i]), 2]
  l2 <- rep("", m - length(l))
  c[,i] <- c(l, l2)
}
colnames(c) <- n
Tom Johnson
1 3 5
2 4 7
3 2
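The loop above can also be written without explicit indexing. The following is a minimal sketch, assuming NA padding is acceptable in place of empty strings: split the ratings by name, extend every piece to the longest length, and combine them. The names by_name and wide are introduced here for illustration.

```r
x <- data.frame(Name = c("Tom", "Tom", "Tom", "Johnson", "Johnson"),
                Rating = c(3, 4, 2, 5, 7))

# Fixing the factor levels keeps the columns in order of first appearance
by_name <- split(x$Rating, factor(x$Name, levels = unique(x$Name)))

# Longest group decides the number of rows
m <- max(lengths(by_name))

# `length<-` pads each vector with NA, so columns stay numeric
wide <- as.data.frame(lapply(by_name, `length<-`, m))
wide
```

Padding with NA rather than "" keeps the ratings numeric, which matters if further calculations are done on the columns.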
Here is a way using the CRAN package reshape2, which provides dcast.
d <- dcast(mydata, Rating ~ Name, value.var = "Rating")[-1]
# Johnson Tom
#1 NA 2
#2 NA 3
#3 NA 4
#4 5 NA
#5 7 NA
As you can see, there are too many NA values in this result. One way of getting rid of them could be:
d <- lapply(d, function(x) x[!is.na(x)])
n <- max(sapply(d, length))
d <- do.call(cbind.data.frame, lapply(d, function(x) c(x, rep(NA, n - length(x)))))
# Johnson Tom
#1 5 2
#2 7 3
#3 NA 4
Well, this does the job but introduces some NAs.
Edit: Replace the NAs with some other Rating.
mydata1<-mydata %>%
mutate(Name=as.factor(Name)) %>%
melt(id.var="Name") %>%
dcast(variable+value~Name) %>%
select(-value) %>%
rename(Name=variable) %>%
select(-Name) # drop the helper column; it does not appear in the output below
mydata1 %>%
mutate(Johnson=as.factor(Johnson),Tom=as.factor(Tom)) %>%
mutate(Johnson=fct_explicit_na(Johnson,na_level = "No Rating"),
Tom=fct_explicit_na(Tom,na_level = "No Rating"))
Johnson Tom
1 No Rating 2
2 No Rating 3
3 No Rating 4
4 5 No Rating
5 7 No Rating
I have a data frame where column "A" has 6 distinct values. Column "B" has float values. By using dplyr, I can group by column "A" and find mean of column "B" of each group as follows:
mydf %>% group_by(A) %>% summarize(Mean = mean(B, na.rm=TRUE))
My ultimate aim is to find the rows in each group whose "B" values are higher than the group average. How can I achieve this (using base R or dplyr)?
A simple alternative with base R ave would be
df[df$b > ave(df$b, df$a) , ]
# a b
#4 1 4
#5 1 5
#9 2 9
#10 2 10
The default FUN for ave is mean, so there is no need to pass it explicitly. If there are NA values present in b, modify it to
df[df$b > ave(df$b, df$a, FUN = function(x) mean(x,na.rm = TRUE)) , ]
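To see why this works, it helps to look at what ave returns: a vector as long as the input, where each element is the mean of its own group. The sketch below spells out the intermediate step on the example data used in this answer. The names grp_mean and above are introduced here for illustration.

```r
df <- data.frame(a = rep(c(1, 2), each = 5), b = 1:10)

# One group mean per row, aligned with the rows of df
grp_mean <- ave(df$b, df$a)

# Row-wise comparison against the row's own group mean
above <- df[df$b > grp_mean, ]
above
```

Because the means are already aligned with the rows, no merge or join is needed before comparing.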
Another solution with subset and ave, as suggested by @Onyambu:
subset(df, b > ave(b, a))
# a b
#4 1 4
#5 1 5
#9 2 9
#10 2 10
df <- data.frame(a = rep(c(1, 2), each = 5), b = 1:10)
# a b
#1 1 1
#2 1 2
#3 1 3
#4 1 4
#5 1 5
#6 2 6
#7 2 7
#8 2 8
#9 2 9
#10 2 10
You can just group and then filter:
mydf %>%
group_by(A) %>%
filter(B > mean(B, na.rm = TRUE)) %>%
ungroup()
Using base R, I would go for this, although it is not as elegant as dplyr.
mean.df <- aggregate(mydf$b, by =list(a = mydf$a), FUN = mean)
names(mean.df)[2] <- "mean"
mydf <- merge(mydf, mean.df, by = "a")
# Rows whose values are higher than mean
new.df <- subset(mydf, b > mean, select = -mean)
I like working with data.table, so a data.table solution would be:
library(data.table)
mydt <- data.table(mydf)
mydt[, mean := mean(b), by = a]
new.dt <- mydt[b > mean, -c("mean"), with = FALSE]
Another way to do it using base R and tapply:
mydf = cbind.data.frame(A=sample(6,20,rep=T),B=runif(20))
mydf.ave = tapply(mydf$B,mydf$A,mean)
newdf = mydf[mydf$B > mydf.ave[as.character(mydf$A)],]
(Thus the one-liner would be: mydf[mydf$B > tapply(mydf$B,mydf$A,mean)[as.character(mydf$A)],])
I would need to expand on this question: convert data frame of counts to proportions in R
I need to calculate proportion by one condition and retain the information of the dataset.
Reproducible example:
ID <- rep(c(1,2,3), each=3)
trial <- rep("a", 9)
variable1 <- sample(1:10, 9)
variable2 <- sample(1:10, 9)
variable3 <- sample(1:10, 9)
condition <- rep(c("i","j","k"), 3)
dat <- data.frame(cbind(ID, trial,variable1,variable2,variable3,condition))
For each variable I would like to have the proportion by the ID (i.e. 3 times)
Ideally the new variables would be stored in the same data frame, e.g. as dat$variable1_p
I know how to do the trick by a series of for loops but I would like to learn how to use the apply function. Also to be able to expand it to more conditions if necessary.
We can use adply from the plyr package:
adply(dat, 1, function(x)
c('variable1_p' = x$variable1 / sum(dat[x$ID == dat$ID,]$variable1)))
# ID trial variable1 variable2 variable3 condition variable1_p
# 1 1 a 3 5 4 i 0.20000000
# 2 1 a 8 9 9 j 0.53333333
# 3 1 a 4 4 8 k 0.26666667
# 4 2 a 7 10 5 i 0.50000000
# 5 2 a 6 8 10 j 0.42857143
# 6 2 a 1 1 7 k 0.07142857
# 7 3 a 10 6 3 i 0.47619048
# 8 3 a 9 7 6 j 0.42857143
# 9 3 a 2 3 2 k 0.09523810
Another option is to use dplyr, which would handle cases where there is more than one row per condition per ID:
dat %>%
group_by(ID, condition) %>%
mutate(sum_v1_cond = sum(variable1)) %>%
ungroup() %>%
group_by(ID) %>%
mutate(variable1_p = sum_v1_cond / sum(variable1)) %>%
ungroup()
Edit - here's a full solution for variable1, variable2, and variable3:
adply(dat, 1, function(x)
c('variable1_p' = x$variable1 / sum(dat[x$ID == dat$ID,]$variable1),
'variable2_p' = x$variable2 / sum(dat[x$ID == dat$ID,]$variable2),
'variable3_p' = x$variable3 / sum(dat[x$ID == dat$ID,]$variable3)))
ID <- rep(c(1,2,3), each=3)
trial <- rep("a", 9)
variable1 <- sample(1:10, 9)
variable2 <- sample(1:10, 9)
variable3 <- sample(1:10, 9)
condition <- rep(c("i","j","k"), 3)
dat <- data.frame(ID, trial,variable1,variable2,variable3,condition,
stringsAsFactors = FALSE)
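Since the question asks for an apply-style approach that extends to more variables, here is a minimal base R sketch. It assumes the numeric columns are truly numeric, as in the data.frame() call above, and uses fixed values copied from the example output instead of sample() so the result is reproducible. The name vars is introduced here for illustration.

```r
dat <- data.frame(
  ID = rep(c(1, 2, 3), each = 3),
  trial = rep("a", 9),
  variable1 = c(3, 8, 4, 7, 6, 1, 10, 9, 2),
  variable2 = c(5, 9, 4, 10, 8, 1, 6, 7, 3),
  variable3 = c(4, 9, 8, 5, 10, 7, 3, 6, 2),
  condition = rep(c("i", "j", "k"), 3),
  stringsAsFactors = FALSE
)

# Columns to convert; extend this vector to cover more variables
vars <- c("variable1", "variable2", "variable3")

# ave(..., FUN = sum) gives each row the total of its ID group,
# so dividing by it yields the within-ID proportion
dat[paste0(vars, "_p")] <- lapply(dat[vars], function(v) {
  v / ave(v, dat$ID, FUN = sum)
})
dat
```

Grouping by more than one condition should only require passing further grouping vectors to ave, for example ave(v, dat$ID, dat$trial, FUN = sum).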