Let's say I have a dataset with 6 columns, and I want to replace the string "likes_comment" with "number_likes" in every column name that contains it.
Example data:
da = data.frame(likes_comment_1 = c(1,2,3,4), likes_comment_2 = c(2,2,3,1), likes_comment_3=c(5,2,3,1), quotes_comment1=c(2,1,3,4), quotes_comment2_=c(3,5,7,1), quotes_comment3=c(2,3,1,2))
da
likes_comment_1 likes_comment_2 likes_comment_3 quotes_comment1 quotes_comment2_ quotes_comment3
1 1 2 5 2 3 2
2 2 2 2 1 5 3
3 3 3 3 3 7 1
4 4 1 1 4 1 2
Target data:
target_da = data.frame(number_likes_1 = c(1,2,3,4), number_likes_2 = c(2,2,3,1), number_likes_3=c(5,2,3,1), quotes_comment1=c(2,1,3,4), quotes_comment2_=c(3,5,7,1), quotes_comment3=c(2,3,1,2))
target_da
number_likes_1 number_likes_2 number_likes_3 quotes_comment1 quotes_comment2_ quotes_comment3
1 1 2 5 2 3 2
2 2 2 2 1 5 3
3 3 3 3 3 7 1
4 4 1 1 4 1 2
How can I do this?
You can use rename_with:
library(dplyr)
library(stringr)
da %>% rename_with(~str_replace(., 'likes_comment', 'number_likes'))
# number_likes_1 number_likes_2 number_likes_3 quotes_comment1
#1 1 2 5 2
#2 2 2 2 1
#3 3 3 3 3
#4 4 1 1 4
# quotes_comment2_ quotes_comment3
#1 3 2
#2 5 3
#3 7 1
#4 1 2
Use sub:
names(da) <- sub("likes_comment_(\\d+)", "number_likes_\\1", names(da))
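A variant of the same idea, assuming only names that start with "likes_comment" should change: anchoring the pattern with ^ leaves every other name alone. The column quotes_likes_comment below is a hypothetical extra column, added only to show the effect of the anchor.

```r
da <- data.frame(likes_comment_1 = c(1, 2, 3, 4), likes_comment_2 = c(2, 2, 3, 1),
                 quotes_comment1 = c(2, 1, 3, 4),
                 # hypothetical column, only here to show what ^ protects
                 quotes_likes_comment = c(0, 0, 0, 0))
# ^ anchors the match at the start of each name, so only a leading "likes_comment" is replaced
names(da) <- sub("^likes_comment", "number_likes", names(da))
names(da)
```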
Does this work?
names(da)[grepl('likes_comment',names(da))] <- gsub('likes_comment','number_likes',names(da)[grepl('likes_comment',names(da))])
da
number_likes_1 number_likes_2 number_likes_3 quotes_comment1 quotes_comment2_ quotes_comment3
1 1 2 5 2 3 2
2 2 2 2 1 5 3
3 3 3 3 3 7 1
4 4 1 1 4 1 2
Since the question is tagged with data.table:
library(data.table)
setnames(da, sub('likes_comment', 'number_likes', names(da), fixed = TRUE))
I have a dataframe like this.
data <- data.frame(Condition = c(1,1,2,3,1,1,2,2,2,3,1,1,2,3,3))
I want to create a new variable Sequence that increments each time Condition starts again from 1.
So the new dataframe would look like this.
Thanks in advance for the help!
data <- data.frame(Condition = c(1,1,2,3,1,1,2,2,2,3,1,1,2,3,3),
Sequence = c(1,1,1,1,2,2,2,2,2,2,3,3,3,3,3))
base R
data$Sequence2 <- cumsum(c(TRUE, data$Condition[-1] == 1 & data$Condition[-nrow(data)] != 1))
data
# Condition Sequence Sequence2
# 1 1 1 1
# 2 1 1 1
# 3 2 1 1
# 4 3 1 1
# 5 1 2 2
# 6 1 2 2
# 7 2 2 2
# 8 2 2 2
# 9 2 2 2
# 10 3 2 2
# 11 1 3 3
# 12 1 3 3
# 13 2 3 3
# 14 3 3 3
# 15 3 3 3
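To see why the base R line works: the logical vector is TRUE on the first row and on every row where a 1 follows a non-1, and cumsum() turns those run starts into a running group number. A minimal sketch wrapping it in a hypothetical helper run_id(), which also handles a run that begins with a single 1:

```r
# Hypothetical helper: group number that increases each time a run of 1s begins
run_id <- function(cond) {
  # TRUE on row 1 and on every row where a 1 follows a non-1
  starts <- c(TRUE, cond[-1] == 1 & cond[-length(cond)] != 1)
  cumsum(starts)
}

run_id(c(1, 1, 2, 3, 1, 1, 2, 2, 2, 3, 1, 1, 2, 3, 3))
# 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3
run_id(c(1, 2, 3, 1, 2, 3))
# 1 1 1 2 2 2
```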
dplyr
library(dplyr)
data %>%
mutate(
Sequence2 = cumsum(Condition == 1 & lag(Condition != 1, default = TRUE))
)
# Condition Sequence Sequence2
# 1 1 1 1
# 2 1 1 1
# 3 2 1 1
# 4 3 1 1
# 5 1 2 2
# 6 1 2 2
# 7 2 2 2
# 8 2 2 2
# 9 2 2 2
# 10 3 2 2
# 11 1 3 3
# 12 1 3 3
# 13 2 3 3
# 14 3 3 3
# 15 3 3 3
This took a while, but I finally found this solution:
library(dplyr)
data %>%
group_by(Sequence = cumsum(
ifelse(Condition==1, lead(Condition)+1, Condition)
- Condition==1)
)
Condition Sequence
<dbl> <int>
1 1 1
2 1 1
3 2 1
4 3 1
5 1 2
6 1 2
7 2 2
8 2 2
9 2 2
10 3 2
11 1 3
12 1 3
13 2 3
14 3 3
15 3 3
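A quick check of the expression above, as a sketch: since - binds tighter than ==, the expression is TRUE only where a 1 is immediately followed by another 1. It therefore gives the right answer here because every run in the example starts with 1, 1, but it does not count a run that starts with a single 1. To keep the sketch runnable without dplyr, lead() is emulated by a hypothetical helper lead_base().

```r
# Hypothetical stand-in for dplyr::lead() on a plain vector
lead_base <- function(x) c(x[-1], NA)

lead_rule <- function(Condition) {
  # Same expression as in the answer; parsed as (ifelse(...) - Condition) == 1
  cumsum(ifelse(Condition == 1, lead_base(Condition) + 1, Condition) - Condition == 1)
}

lead_rule(c(1, 1, 2, 3, 1, 1, 2, 2, 2, 3, 1, 1, 2, 3, 3))
# 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3
lead_rule(c(1, 2, 3, 1, 2, 3))
# 0 0 0 0 0 0  (runs starting with a single 1 are missed)
```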
I've got variables in a dataset like this:
dat1 <- read.table(header=TRUE, text="
comp_T1_01 comp_T1_02 comp_T1_03 res_T1_01 res_T1_02 res_T1_03 res_T1_04
1 1 2 1 5 5 5
2 1 3 3 4 4 1
3 1 3 1 3 2 2
4 2 5 5 3 2 2
5 1 4 1 2 1 3
")
I would like to erase the "T1" from all the variable names at once. As I have over 100 variables, renaming them with "colnames" would be a bit too complicated.
Is there a command that can do that?
Thank you!
You can use sub:
names(dat1) <- sub('_T1', '', names(dat1))
dat1
# comp_01 comp_02 comp_03 res_01 res_02 res_03 res_04
#1 1 1 2 1 5 5 5
#2 2 1 3 3 4 4 1
#3 3 1 3 1 3 2 2
#4 4 2 5 5 3 2 2
#5 5 1 4 1 2 1 3
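One thing to watch, assuming some names could also contain T10, T11, and so on: the pattern '_T1' matches the start of '_T10' too. The name comp_T10_01 below is hypothetical, used only to show the pitfall; requiring an underscore after T1 avoids it.

```r
# "comp_T10_01" is a hypothetical name, included only to show the pitfall
nm <- c("comp_T1_01", "res_T1_04", "comp_T10_01")
sub("_T1", "", nm)     # also hits "_T10": "comp_01" "res_04" "comp0_01"
sub("_T1_", "_", nm)   # only "_T1" followed by "_": "comp_01" "res_04" "comp_T10_01"
```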
In dplyr, you can use rename_with:
library(dplyr)
dat1 %>% rename_with(~sub('_T1', '', .))
We can use str_remove
library(dplyr)
library(stringr)
dat1 %>%
rename_all(~ str_remove(., '_T1'))
This may also work:
names(dat1) <- gsub(x = names(dat1), pattern = "\\_T1", replacement = "")
dat1
comp_01 comp_02 comp_03 res_01 res_02 res_03 res_04
1 1 1 2 1 5 5 5
2 2 1 3 3 4 4 1
3 3 1 3 1 3 2 2
4 4 2 5 5 3 2 2
5 5 1 4 1 2 1 3
I’m trying to create a data frame in R that looks like this:
X Y Z
3 1 1
3 1 2
3 1 3
3 2 1
3 2 2
3 2 3
4 1 1
4 1 2
4 1 3
4 2 1
...
So column Z counts up to 3; when it reaches 3, column Y increments by 1 and Z counts up to 3 again. Once Y has gone through its values, X increments by 1 and the process starts again.
You could use expand.grid + rev
rev(expand.grid(z = 1:3, y = 1:2, x = 3:4))
x y z
1 3 1 1
2 3 1 2
3 3 1 3
4 3 2 1
5 3 2 2
6 3 2 3
7 4 1 1
8 4 1 2
9 4 1 3
10 4 2 1
11 4 2 2
12 4 2 3
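The rev() is needed because expand.grid() varies its first argument fastest. A sketch that keeps the question's upper-case names and selects the columns in the desired order by name instead of reversing them:

```r
# expand.grid() cycles its first argument fastest, so Z (listed first) runs 1:3 within each Y
g <- expand.grid(Z = 1:3, Y = 1:2, X = 3:4)
# Put the columns in X, Y, Z order by name rather than with rev()
res <- g[, c("X", "Y", "Z")]
head(res, 4)
```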
An option is to use tidyr::crossing().
In your case:
library(tidyr)
crossing(X = 3:4,
Y = 1:2,
Z = 1:3)
Or, in base R with rep():
data.frame(X = rep(3:4, each = 6, times = 1),
Y = rep(1:2, each = 3, times = 2),
Z = rep(1:3, each = 1, times = 4))
Here is another base R solution, in addition to the expand.grid approach by @Onyambu.
The advantage of the code below is that you only need to put all the vectors into the list lst and pass it to the function f:
f <- function(lst) data.frame(mapply(function(p,n) rep(p,each=n),lst, prod(lengths(lst))/cumprod(lengths(lst))))
lst<- list(x = 3:4,y = 1:2,z = 1:3)
res <- f(lst)
such that
> res
x y z
1 3 1 1
2 3 1 2
3 3 1 3
4 3 2 1
5 3 2 2
6 3 2 3
7 4 1 1
8 4 1 2
9 4 1 3
10 4 2 1
11 4 2 2
12 4 2 3
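The numbers f() computes can also be spelled out directly: each value of a column is repeated (each) by the product of the lengths of the columns after it, and the whole block is repeated (times) by the product of the lengths of the columns before it. A sketch of that, with grid_rows() as a hypothetical name, which builds every column at full length instead of relying on data.frame() to recycle the shorter ones:

```r
# Hypothetical helper: every combination, last element of lst varying fastest
grid_rows <- function(lst) {
  n <- lengths(lst)
  total <- prod(n)
  each  <- total / cumprod(n)   # product of the lengths of later columns
  times <- cumprod(n) / n       # product of the lengths of earlier columns
  cols <- Map(function(v, e, t) rep(v, each = e, times = t), lst, each, times)
  as.data.frame(cols)
}

res <- grid_rows(list(x = 3:4, y = 1:2, z = 1:3))
res
```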
A data.table solution for completeness:
data.table::CJ(x = 3:4, y = 1:2, z = 1:3)
x y z
1: 3 1 1
2: 3 1 2
3: 3 1 3
4: 3 2 1
5: 3 2 2
6: 3 2 3
7: 4 1 1
8: 4 1 2
9: 4 1 3
10: 4 2 1
11: 4 2 2
12: 4 2 3
I have done an experiment in which participants solved a task in pairs, with another participant. Each participant then received a score for how well they did the task. Pairs have gone through different numbers of trials.
I have a data frame similar to the one below:
participant <- c(1,1,2,2,3,3,3,4,4,4,5,6)
pair <- c(1,1,1,1,2,2,2,2,2,2,3,3)
trial <- c(1,2,1,2,1,2,3,1,2,3,1,1)
score <- c(2,3,6,3,4,7,3,1,8,5,4,3)
data <- data.frame(participant, pair, trial, score)
participant pair trial score
1 1 1 2
1 1 2 3
2 1 1 6
2 1 2 3
3 2 1 4
3 2 2 7
3 2 3 3
4 2 1 1
4 2 2 8
4 2 3 5
5 3 1 4
6 3 1 3
I would like to add a new column to the data frame, where each participant gets the numeric difference between their own score and the other participant's score within each trial.
Does someone have an idea about how one might do that?
It should end up looking something like this:
participant pair trial score difference
1 1 1 2 4
1 1 2 3 0
2 1 1 6 4
2 1 2 3 0
3 2 1 4 3
3 2 2 7 1
3 2 3 3 2
4 2 1 1 3
4 2 2 8 1
4 2 3 5 2
5 3 1 4 1
6 3 1 3 1
Here's a solution that involves first reordering data such that each sequential pair of rows corresponds to a single pair within a single trial. This allows us to make a single call to diff() to extract the differences:
data <- data[order(data$trial,data$pair,data$participant),];
data$diff <- rep(diff(data$score)[c(T,F)],each=2L)*c(-1L,1L);
data;
## participant pair trial score diff
## 1 1 1 1 2 -4
## 3 2 1 1 6 4
## 5 3 2 1 4 3
## 8 4 2 1 1 -3
## 11 5 3 1 4 1
## 12 6 3 1 3 -1
## 2 1 1 2 3 0
## 4 2 1 2 3 0
## 6 3 2 2 7 -1
## 9 4 2 2 8 1
## 7 3 2 3 3 -2
## 10 4 2 3 5 2
I assumed you wanted the sign to capture the direction of the difference. So, for instance, if a participant has a score 4 points below the other participant in the same trial-pair, then I assumed you would want -4. If you want all-positive values, you can remove the multiplication by c(-1L,1L) and add a call to abs():
data$diff <- rep(abs(diff(data$score)[c(T,F)]),each=2L);
data;
## participant pair trial score diff
## 1 1 1 1 2 4
## 3 2 1 1 6 4
## 5 3 2 1 4 3
## 8 4 2 1 1 3
## 11 5 3 1 4 1
## 12 6 3 1 3 1
## 2 1 1 2 3 0
## 4 2 1 2 3 0
## 6 3 2 2 7 1
## 9 4 2 2 8 1
## 7 3 2 3 3 2
## 10 4 2 3 5 2
Here's a solution built around ave() that doesn't require reordering the whole data.frame first:
data$diff <- ave(data$score,data$trial,data$pair,FUN=function(x) abs(diff(x)));
data;
## participant pair trial score diff
## 1 1 1 1 2 4
## 2 1 1 2 3 0
## 3 2 1 1 6 4
## 4 2 1 2 3 0
## 5 3 2 1 4 3
## 6 3 2 2 7 1
## 7 3 2 3 3 2
## 8 4 2 1 1 3
## 9 4 2 2 8 1
## 10 4 2 3 5 2
## 11 5 3 1 4 1
## 12 6 3 1 3 1
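If the sign of the difference should be kept without reordering the data first, one option, assuming every trial-pair group has exactly two rows, is to subtract the partner's score obtained with ave() and FUN = rev (the same trick used below for the other participant's score):

```r
participant <- c(1,1,2,2,3,3,3,4,4,4,5,6)
pair <- c(1,1,1,1,2,2,2,2,2,2,3,3)
trial <- c(1,2,1,2,1,2,3,1,2,3,1,1)
score <- c(2,3,6,3,4,7,3,1,8,5,4,3)
data <- data.frame(participant, pair, trial, score)

# Within each two-row trial-pair group, rev() swaps the scores, giving the partner's score
other <- ave(data$score, data$trial, data$pair, FUN = rev)
# Own score minus partner's score, keeping the sign and the original row order
data$diff <- data$score - other
data$diff
```

The values match the signed differences from the reordering approach, but they stay in the original row order.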
Here's how you can get the score of the other participant in the same trial-pair:
data$other <- ave(data$score,data$trial,data$pair,FUN=rev);
data;
## participant pair trial score other
## 1 1 1 1 2 6
## 2 1 1 2 3 3
## 3 2 1 1 6 2
## 4 2 1 2 3 3
## 5 3 2 1 4 1
## 6 3 2 2 7 8
## 7 3 2 3 3 5
## 8 4 2 1 1 4
## 9 4 2 2 8 7
## 10 4 2 3 5 3
## 11 5 3 1 4 3
## 12 6 3 1 3 4
Or, assuming the data.frame has been reordered as per the initial solution:
data$other <- c(rbind(data$score[c(F,T)],data$score[c(T,F)]));
data;
## participant pair trial score other
## 1 1 1 1 2 6
## 3 2 1 1 6 2
## 5 3 2 1 4 1
## 8 4 2 1 1 4
## 11 5 3 1 4 3
## 12 6 3 1 3 4
## 2 1 1 2 3 3
## 4 2 1 2 3 3
## 6 3 2 2 7 8
## 9 4 2 2 8 7
## 7 3 2 3 3 5
## 10 4 2 3 5 3
Alternative, using matrix() instead of rbind():
data$other <- c(matrix(data$score,2L)[2:1,]);
data;
## participant pair trial score other
## 1 1 1 1 2 6
## 3 2 1 1 6 2
## 5 3 2 1 4 1
## 8 4 2 1 1 4
## 11 5 3 1 4 3
## 12 6 3 1 3 4
## 2 1 1 2 3 3
## 4 2 1 2 3 3
## 6 3 2 2 7 8
## 9 4 2 2 8 7
## 7 3 2 3 3 5
## 10 4 2 3 5 3
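To unpack the matrix() line, a small sketch with the first four reordered scores (2, 6, 4, 1): matrix() fills column by column, so each column holds one trial-pair, and indexing the rows with 2:1 swaps the two partners before c() flattens the result back in the same column order.

```r
# Scores already ordered so that each consecutive pair of values is one trial-pair
s <- c(2, 6, 4, 1)
m <- matrix(s, 2L)   # filled column-wise: column 1 is (2, 6), column 2 is (4, 1)
m[2:1, ]             # rows swapped: each partner's score moves into the other's slot
c(m[2:1, ])          # 6 2 1 4
```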
Here is an option using data.table:
library(data.table)
setDT(data)[,difference := abs(diff(score)), by = .(pair, trial)]
data
# participant pair trial score difference
# 1: 1 1 1 2 4
# 2: 1 1 2 3 0
# 3: 2 1 1 6 4
# 4: 2 1 2 3 0
# 5: 3 2 1 4 3
# 6: 3 2 2 7 1
# 7: 3 2 3 3 2
# 8: 4 2 1 1 3
# 9: 4 2 2 8 1
#10: 4 2 3 5 2
#11: 5 3 1 4 1
#12: 6 3 1 3 1
A slightly faster option would be:
setDT(data)[, difference := abs((score - shift(score))[2]) , by = .(pair, trial)]
If we need the score of the other participant in the pair:
data[, other:= rev(score) , by = .(pair, trial)]
data
# participant pair trial score difference other
# 1: 1 1 1 2 4 6
# 2: 1 1 2 3 0 3
# 3: 2 1 1 6 4 2
# 4: 2 1 2 3 0 3
# 5: 3 2 1 4 3 1
# 6: 3 2 2 7 1 8
# 7: 3 2 3 3 2 5
# 8: 4 2 1 1 3 4
# 9: 4 2 2 8 1 7
#10: 4 2 3 5 2 3
#11: 5 3 1 4 1 3
#12: 6 3 1 3 1 4
Or using dplyr:
library(dplyr)
data %>%
group_by(pair, trial) %>%
mutate(difference = abs(diff(score)))
# participant pair trial score difference
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 1 1 2 4
#2 1 1 2 3 0
#3 2 1 1 6 4
#4 2 1 2 3 0
#5 3 2 1 4 3
#6 3 2 2 7 1
#7 3 2 3 3 2
#8 4 2 1 1 3
#9 4 2 2 8 1
#10 4 2 3 5 2
#11 5 3 1 4 1
#12 6 3 1 3 1
I'm sure this has been asked before but for the life of me I can't figure out what to search for!
I have the following data:
x y
1 3
1 3
1 3
1 2
1 2
2 2
2 4
3 4
3 4
And I would like to output a running count that resets every time either x or y changes value.
x y o
1 3 1
1 3 2
1 3 3
1 2 1
1 2 2
2 2 1
2 4 1
3 4 1
3 4 2
Try something like
df<-read.table(header=T,text="x y
1 3
1 3
1 3
1 2
1 2
2 2
2 4
3 4
3 4")
cbind(df,o=sequence(rle(paste(df$x,df$y))$lengths))
x y o
1 1 3 1
2 1 3 2
3 1 3 3
4 1 2 1
5 1 2 2
6 2 2 1
7 2 4 1
8 3 4 1
9 3 4 2
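A sketch of the same run-based count without pasting x and y into strings, assuming both columns are numeric so that diff() applies: a new run starts on the first row and wherever x or y differs from the previous row.

```r
df <- read.table(header = TRUE, text = "x y
1 3
1 3
1 3
1 2
1 2
2 2
2 4
3 4
3 4")

# TRUE on row 1 and wherever x or y differs from the previous row
new_run <- c(TRUE, diff(df$x) != 0 | diff(df$y) != 0)
run <- cumsum(new_run)
# Number the rows within each run
df$o <- ave(seq_along(run), run, FUN = seq_along)
df$o
```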
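After seeing @ttmaccer's answer, I see my first attempt with ave was wrong, and this is perhaps what is needed (dat holds the question's data). Note that ave() groups by each (x, y) combination rather than by consecutive runs, so a combination that reappeared later in the data would continue its count instead of restarting:
> dat$o <- ave(dat$y, list(dat$y, dat$x), FUN=seq_along)
# seq_along avoids the warning FUN=seq gives for single-row groups (e.g. seq(4) is 1:4, not 1)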
> dat
x y o
1 1 3 1
2 1 3 2
3 1 3 3
4 1 2 1
5 1 2 2
6 2 2 1
7 2 4 1
8 3 4 1
9 3 4 2