Creating a new column using a for/nested loop in R

I'm just getting started with R and need some help understanding how to apply for/nested loops.
StudyID<-c(1:5)
SubjectID<-c(1:5)
df<-data.frame(StudyID=rep(StudyID, each=5), SubjectID=rep(SubjectID, each=1))
How can I create a new column called ID that combines StudyID and SubjectID into a unique ID?
For this data, the unique ID should run from 1 to 25.
The final data should look like this:
UniqueID<- c(1:25)
df<-cbind(df,UniqueID)
View(df)
Is there a faster, more efficient way than looping?
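Since the question asks specifically about loops, here is a minimal sketch of the nested-loop version for comparison (the loop variables `s`, `p` and the counter `id` are illustrative names). It works, but every inner iteration compares against all rows of the data frame, so the vectorised answers below scale much better. If some StudyID/SubjectID combinations are missing from the data, the IDs will still be unique but will have gaps.

```r
StudyID <- 1:5
SubjectID <- 1:5
df <- data.frame(StudyID = rep(StudyID, each = 5),
                 SubjectID = rep(SubjectID, times = 5))

# Preallocate the new column, then fill it in a nested loop:
# the outer loop walks over studies, the inner loop over subjects.
df$ID <- NA_integer_
id <- 0L
for (s in unique(df$StudyID)) {
  for (p in unique(df$SubjectID)) {
    id <- id + 1L
    # Give every row with this StudyID/SubjectID pair the current counter
    df$ID[df$StudyID == s & df$SubjectID == p] <- id
  }
}
head(df)
```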

Using the dplyr package, you could do:
library(dplyr)
df$Id <- df %>% group_by(StudyID, SubjectID) %>% group_indices()
This returns:
#StudyID SubjectID Id
# 1 1 1
# 1 2 2
# 1 3 3
# 1 4 4
# 1 5 5
# 2 1 6
# 2 2 7
# 2 3 8
# 2 4 9
# 2 5 10
# 3 1 11
# 3 2 12
# 3 3 13
# 3 4 14
# 3 5 15
# 4 1 16
# 4 2 17
# 4 3 18
# 4 4 19
# 4 5 20
# 5 1 21
# 5 2 22
# 5 3 23
# 5 4 24
# 5 5 25

Another way to do this in base R, without loading any package (assuming the data frame is sorted by the two columns):
StudyID<-c(1:5)
SubjectID<-c(1:5)
df<-data.frame(StudyID=rep(StudyID, each=5), SubjectID=rep(SubjectID, each=1))
df$uniqueID <- cumsum(!duplicated(df[1:2]))
Or you can use this solution from the comments, which I prefer over the first one (pasting only the two ID columns into the key):
df$uniqueID <- as.numeric(factor(do.call(paste, df[c("StudyID", "SubjectID")])))
The output would be:
> print(df, row.names = FALSE)
#StudyID SubjectID uniqueID
# 1 1 1
# 1 2 2
# 1 3 3
# 1 4 4
# 1 5 5
# 2 1 6
# 2 2 7
# 2 3 8
# 2 4 9
# 2 5 10
# 3 1 11
# 3 2 12
# 3 3 13
# 3 4 14
# 3 5 15
# 4 1 16
# 4 2 17
# 4 3 18
# 4 4 19
# 4 5 20
# 5 1 21
# 5 2 22
# 5 3 23
# 5 4 24
# 5 5 25
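Another base R option, sketched here as an alternative, numbers each StudyID/SubjectID pair in order of first appearance using `match()` and `unique()`. Unlike `cumsum(!duplicated(...))` it does not need the rows to be sorted, and unlike `factor()` on pasted strings it does not depend on how the pasted keys sort alphabetically (for example "10 1" sorts before "2 1"). The names `key`, `shuffled` and `key2` are illustrative.

```r
StudyID <- 1:5
SubjectID <- 1:5
df <- data.frame(StudyID = rep(StudyID, each = 5),
                 SubjectID = rep(SubjectID, times = 5))

# One string key per row; the separator keeps e.g. (1, 12) and (11, 2) apart
key <- paste(df$StudyID, df$SubjectID, sep = "_")

# Position of each key among the distinct keys, in order of first appearance
df$uniqueID <- match(key, unique(key))

# Works on unsorted rows too: equal pairs always get the same ID
shuffled <- df[c(25, 1, 25, 7), c("StudyID", "SubjectID")]
key2 <- paste(shuffled$StudyID, shuffled$SubjectID, sep = "_")
match(key2, unique(key2))   # 1 2 1 3
```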

You could use interaction() in base R:
df$uniqueID <- with(df, as.integer(interaction(StudyID,SubjectID)))
For example (this example better illustrates what you are after):
set.seed(10)
df <- data.frame(StudyID=sample(5,10,replace = T), SubjectID=rep(1:5,times=2))
df$uniqueID <- with(df, as.integer(interaction(StudyID,SubjectID)))
# StudyID SubjectID uniqueID
# 1 3 1 3
# 2 2 2 6
# 3 3 3 11
# 4 4 4 16
# 5 1 5 17
# 6 2 1 2
# 7 2 2 6
# 8 2 3 10
# 9 4 4 16
# 10 3 5 19
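Note that by default `interaction()` lets the first factor vary fastest, which is why StudyID 3 with SubjectID 1 gets ID 3 above. If you want IDs grouped by StudyID first (so the question's data gets 1 to 25 in row order), a small sketch using the `lex.order` argument of `interaction()` is below; `id_default` and `id_lex` are illustrative names.

```r
StudyID <- 1:5
SubjectID <- 1:5
df <- data.frame(StudyID = rep(StudyID, each = 5),
                 SubjectID = rep(SubjectID, times = 5))

# Default: levels are 1.1, 2.1, 3.1, ..., 1.2, ... so StudyID varies fastest
id_default <- with(df, as.integer(interaction(StudyID, SubjectID)))

# lex.order = TRUE: levels are 1.1, 1.2, 1.3, ... so StudyID varies slowest
id_lex <- with(df, as.integer(interaction(StudyID, SubjectID, lex.order = TRUE)))

head(data.frame(df, id_default, id_lex))
```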

Related

Join columns recursively in R

Hello, I have a data frame with 245 columns. I want to add up sets of columns to generate new columns, and I am trying to do it in a loop. Here is a small example:
cl1<-sample(1:4,10,replace=TRUE)
cl2<-sample(1:4,10,replace=TRUE)
cl3<-sample(1:4,10,replace=TRUE)
cl4<-sample(1:4,10,replace=TRUE)
cl5<-sample(1:4,10,replace=TRUE)
cl6<-sample(1:4,10,replace=TRUE)
dat<-data.frame(cl1,cl2,cl3,cl4,cl5,cl6)
My intention is to add column 1 to columns 3 and 5, and likewise column 2 to columns 4 and 6, so that in the end I obtain a data frame with two columns.
It should give me something like that.
I have written the following code:
revisar <- function(a){
  todos = list()
  i=1
  j=3
  l=5
  k=1
  while(i<=2){
    cl<-a[,i]
    cl2<-a[,j]
    cl3<-a[,l]
    cl[is.na(cl)] <- 0
    cl2[is.na(cl2)] <- 0
    cl3[is.na(cl3)] <- 0
    colu<-cl+cl2+cl3
    col<-cbind(colu,colu)
    i<-i+1
    j<-j+1
    l<-l+1
    k<-k+1
  }
  return(col)
}
It turns out that it only returns the second sum (columns 2, 4 and 6) repeated twice, and I need to repeat the same thing to combine all 245 columns.
I would like to know what is wrong in this example.
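The problem is that `col <- cbind(colu, colu)` rebuilds `col` from scratch on every pass of the loop, so when the loop ends `col` only holds the last sum (columns 2, 4 and 6) twice; `todos` and `k` are never used. A minimal sketch of a fix that keeps the original loop structure: preallocate the result once and write each sum into its own column. The `s1`/`s2` column names are illustrative.

```r
set.seed(42)
dat <- data.frame(cl1 = sample(1:4, 10, replace = TRUE),
                  cl2 = sample(1:4, 10, replace = TRUE),
                  cl3 = sample(1:4, 10, replace = TRUE),
                  cl4 = sample(1:4, 10, replace = TRUE),
                  cl5 = sample(1:4, 10, replace = TRUE),
                  cl6 = sample(1:4, 10, replace = TRUE))

revisar <- function(a) {
  # Preallocate the result once, outside the loop
  col <- matrix(0, nrow = nrow(a), ncol = 2,
                dimnames = list(NULL, c("s1", "s2")))
  i <- 1; j <- 3; l <- 5
  while (i <= 2) {
    cl  <- a[, i]; cl[is.na(cl)] <- 0
    cl2 <- a[, j]; cl2[is.na(cl2)] <- 0
    cl3 <- a[, l]; cl3[is.na(cl3)] <- 0
    # Store this sum in column i instead of overwriting `col`
    col[, i] <- cl + cl2 + cl3
    i <- i + 1; j <- j + 1; l <- l + 1
  }
  col
}

revisar(dat)
```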
base R
Literal programming:
with(dat, data.frame(s1 = cl1+cl3+cl5, s2 = cl2+cl4+cl6))
# s1 s2
# 1 7 11
# 2 7 7
# 3 4 11
# 4 4 10
# 5 9 8
# 6 12 5
# 7 7 6
# 8 7 10
# 9 4 9
# 10 6 5
Programmatically,
L <- list(s1 = c(1,3,5), s2 = c(2,4,6))
out <- data.frame(lapply(L, function(z) rowSums(dat[, z])))
out
# s1 s2
# 1 7 11
# 2 7 7
# 3 4 11
# 4 4 10
# 5 9 8
# 6 12 5
# 7 7 6
# 8 7 10
# 9 4 9
# 10 6 5
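For the 245-column case, one way to avoid listing indices by hand is to give every column a group label and let `split.default()` split the columns by that label. This sketch assumes the same interleaved pattern as the example (odd-numbered columns go to group 1, even-numbered columns to group 2); `n_groups` and `grp` are hypothetical names, so adapt the grouping vector to however your columns should actually be combined. `na.rm = TRUE` mirrors the question's treatment of NA as 0.

```r
set.seed(42)
dat <- as.data.frame(matrix(sample(1:4, 60, replace = TRUE), nrow = 10,
                            dimnames = list(NULL, paste0("cl", 1:6))))
dat[2, "cl3"] <- NA  # an NA, to show it is treated like 0

n_groups <- 2
# Group label for every column: 1, 2, 1, 2, ... (cl1, cl3, cl5 -> 1; cl2, cl4, cl6 -> 2)
grp <- rep_len(seq_len(n_groups), ncol(dat))

# split.default() splits the *columns* of the data frame into one data frame per group
out <- sapply(split.default(dat, grp), rowSums, na.rm = TRUE)
colnames(out) <- paste0("s", seq_len(n_groups))
out
```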
dplyr
library(dplyr)
dat %>%
transmute(
s1 = rowSums(cbind(cl1, cl3, cl5)),
s2 = rowSums(cbind(cl2, cl4, cl6))
)
or programmatically using purrr:
purrr::map_dfc(L, ~ rowSums(dat[, .]))
Data
set.seed(42)
# your `dat` above
Here is an alternative general approach:
Here we sum all odd-numbered columns into s1 and
all even-numbered columns into s2:
library(dplyr)
dat %>%
rowwise() %>%
mutate(s1 = sum(c_across(seq(1,ncol(dat),2)), na.rm = TRUE),
s2 = sum(c_across(seq(2,ncol(dat),2)), na.rm = TRUE))
cl1 cl2 cl3 cl4 cl5 cl6 s1 s2
<int> <int> <int> <int> <int> <int> <int> <int>
1 1 1 3 2 3 2 7 5
2 2 4 1 4 2 3 5 11
3 2 2 2 2 1 3 5 7
4 2 4 4 3 1 4 7 11
5 2 4 4 3 2 2 8 9
6 3 3 3 2 2 2 8 7
7 2 1 1 2 1 4 4 7
8 2 4 1 3 2 3 5 10
9 3 1 1 2 3 4 7 7
10 2 4 1 3 4 4 7 11

Identify whenever values repeat in R

I have a data frame like this:
data <- data.frame(Condition = c(1,1,2,3,1,1,2,2,2,3,1,1,2,3,3))
I want to create a new variable, Sequence, that identifies each time Condition starts again from 1.
The new data frame should look like this:
Thanks in advance for the help!
data <- data.frame(Condition = c(1,1,2,3,1,1,2,2,2,3,1,1,2,3,3),
Sequence = c(1,1,1,1,2,2,2,2,2,2,3,3,3,3,3))
base R
data$Sequence2 <- cumsum(c(TRUE, data$Condition[-1] == 1 & data$Condition[-nrow(data)] != 1))
data
# Condition Sequence Sequence2
# 1 1 1 1
# 2 1 1 1
# 3 2 1 1
# 4 3 1 1
# 5 1 2 2
# 6 1 2 2
# 7 2 2 2
# 8 2 2 2
# 9 2 2 2
# 10 3 2 2
# 11 1 3 3
# 12 1 3 3
# 13 2 3 3
# 14 3 3 3
# 15 3 3 3
dplyr
library(dplyr)
data %>%
mutate(
Sequence2 = cumsum(Condition == 1 & lag(Condition != 1, default = TRUE))
)
# Condition Sequence Sequence2
# 1 1 1 1
# 2 1 1 1
# 3 2 1 1
# 4 3 1 1
# 5 1 2 2
# 6 1 2 2
# 7 2 2 2
# 8 2 2 2
# 9 2 2 2
# 10 3 2 2
# 11 1 3 3
# 12 1 3 3
# 13 2 3 3
# 14 3 3 3
# 15 3 3 3
This took a while, but I finally found this solution:
library(dplyr)
data %>%
group_by(Sequence = cumsum(
ifelse(Condition==1, lead(Condition)+1, Condition)
- Condition==1)
)
Condition Sequence
<dbl> <int>
1 1 1
2 1 1
3 2 1
4 3 1
5 1 2
6 1 2
7 2 2
8 2 2
9 2 2
10 3 2
11 1 3
12 1 3
13 2 3
14 3 3
15 3 3
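A caution about the solution just above: the flag `ifelse(Condition==1, lead(Condition)+1, Condition) - Condition==1` parses as `(... - Condition) == 1`, which is TRUE exactly when a row is 1 and the next row is also 1. It therefore gives the right answer only when every run of 1s is exactly two rows long, as happens to be the case in this data. The sketch below translates that flag into base R (with `dplyr::lead()` replaced by shifting the vector by one; the function names are illustrative) and compares it with the run-start approach from the base R answer on a vector with runs of other lengths.

```r
# Flag from the group_by() solution, with lead() replaced by a base R shift
# (next value, NA for the last row)
seq_from_lead <- function(cond) {
  next_cond <- c(cond[-1], NA)
  cumsum(ifelse(cond == 1, next_cond + 1, cond) - cond == 1)
}

# Run-start approach: a new sequence starts wherever a 1 follows something other than 1
seq_from_runs <- function(cond) {
  cumsum(c(TRUE, cond[-1] == 1 & cond[-length(cond)] != 1))
}

cond <- c(1, 1, 2, 3, 1, 1, 2, 2, 2, 3, 1, 1, 2, 3, 3)
seq_from_lead(cond)   # matches the expected Sequence for this data
seq_from_runs(cond)

cond2 <- c(1, 2, 3, 1, 1, 1, 2)  # runs of 1s with lengths 1 and 3
seq_from_lead(cond2)  # 0 0 0 1 2 2 2 (wrong)
seq_from_runs(cond2)  # 1 1 1 2 2 2 2
```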

Looping over all possible column combinations with some constraints

xx is sample data. It contains the variables dep1, dep2, dep3, bet1, bet2 and bet3. I want to select all possible two-column combinations, but not those whose names share the same prefix (ignoring the number). In this example there are
9 such combinations: {dep1:bet1, dep1:bet2, dep1:bet3, dep2:bet1, ...}
Below is the code I want to run for every combination (I have done it for just one). In the last line
I added code to keep track of which variables were included in the calculations. I hope the example
makes this clear. Any help is appreciated!
library(dplyr)
library(tidyr)
xx<-data.frame(id=1:10,
category=c(rep("A",5),rep("B",5)),
dep1=sample(1:5,10,replace = TRUE),
dep2=sample(1:5,10,replace = TRUE),
dep3=sample(1:5,10,replace = TRUE),
bet1=sample(1:5,10,replace = TRUE),
bet2=sample(1:5,10,replace = TRUE),
bet3=sample(1:5,10,replace = TRUE))
xx%>%select(2,dep1,bet1)%>%
mutate(vdep=if_else(dep1>3,1,0),
vbet=if_else(bet1>3,1,0))%>%
group_by(category)%>%
summarise(vdep=mean(vdep),
vbet=mean(vbet))%>%ungroup()%>%
gather(variable,value,-category)%>%
mutate(variable=as.factor(variable))%>%
unite(variable,category,col = "new")%>%
spread(new,value)%>%
mutate(first="dep1",second="bet1")
If I understand you correctly, something like the following should do it:
# the data
xx<-data.frame(id=1:10,
category=c(rep("A",5),rep("B",5)),
dep1=sample(1:5,10,replace = T),
dep2=sample(1:5,10,replace = T),
dep3=sample(1:5,10,replace = T),
bet1=sample(1:5,10,replace = T),
bet2=sample(1:5,10,replace = T),
bet3=sample(1:5,10,replace = T))
# Getting the column names with "dep" or "bet"
cols = names(xx)[grepl("dep|bet", names(xx))]
deps = cols[grepl("dep", cols)]
bets = cols[grepl("bet", cols)]
# Getting all possible combinations of these columns
comb = expand.grid(deps, bets)
comb
# Var1 Var2
# 1 dep1 bet1
# 2 dep2 bet1
# 3 dep3 bet1
# 4 dep1 bet2
# 5 dep2 bet2
# 6 dep3 bet2
# 7 dep1 bet3
# 8 dep2 bet3
# 9 dep3 bet3
# Transposing the dataframe containing these combinations, so that
# we can directly use sapply / lapply on the columns later
comb = data.frame(t(comb), stringsAsFactors = FALSE)
# For each combination, subset the dataframe xx
result = sapply(comb, function(x){
xx[, x]
}, simplify = FALSE)
result
# $X1
# dep1 bet1
# 1 1 5
# 2 1 5
# 3 2 2
# 4 2 2
# 5 1 5
# 6 3 3
# 7 1 1
# 8 2 2
# 9 3 2
# 10 1 5
#
# $X2
# dep2 bet1
# 1 1 5
# 2 2 5
# 3 4 2
# 4 5 2
# 5 1 5
# 6 5 3
# 7 2 1
# 8 1 2
# 9 4 2
# 10 4 5
#
# $X3
# dep3 bet1
# 1 3 5
# 2 2 5
# 3 4 2
# 4 3 2
# 5 3 5
# 6 2 3
# 7 1 1
# 8 4 2
# 9 5 2
# 10 5 5
#
# $X4
# dep1 bet2
# 1 1 5
# 2 1 1
# 3 2 1
# 4 2 2
# 5 1 2
# 6 3 2
# 7 1 3
# 8 2 3
# 9 3 5
# 10 1 1
#
# $X5
# dep2 bet2
# 1 1 5
# 2 2 1
# 3 4 1
# 4 5 2
# 5 1 2
# 6 5 2
# 7 2 3
# 8 1 3
# 9 4 5
# 10 4 1
#
# $X6
# dep3 bet2
# 1 3 5
# 2 2 1
# 3 4 1
# 4 3 2
# 5 3 2
# 6 2 2
# 7 1 3
# 8 4 3
# 9 5 5
# 10 5 1
#
# $X7
# dep1 bet3
# 1 1 3
# 2 1 2
# 3 2 5
# 4 2 1
# 5 1 3
# 6 3 2
# 7 1 4
# 8 2 1
# 9 3 1
# 10 1 3
#
# $X8
# dep2 bet3
# 1 1 3
# 2 2 2
# 3 4 5
# 4 5 1
# 5 1 3
# 6 5 2
# 7 2 4
# 8 1 1
# 9 4 1
# 10 4 3
#
# $X9
# dep3 bet3
# 1 3 3
# 2 2 2
# 3 4 5
# 4 3 1
# 5 3 3
# 6 2 2
# 7 1 4
# 8 4 1
# 9 5 1
# 10 5 3
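The answer above only subsets each pair. To actually run the question's calculation for every pair (share of values greater than 3 per category, reshaped to one row, plus `first`/`second` labels), one base R sketch is below. It mirrors the intent of the dplyr/tidyr pipeline rather than reusing it; `summarise_pair` is a hypothetical helper, and the column names follow the `vbet_A`, `vdep_A`, ... pattern that the question's `unite()` + `spread()` steps produce. The data is random, so the numbers will differ from run to run without a seed.

```r
set.seed(1)
xx <- data.frame(id = 1:10,
                 category = c(rep("A", 5), rep("B", 5)),
                 dep1 = sample(1:5, 10, replace = TRUE),
                 dep2 = sample(1:5, 10, replace = TRUE),
                 dep3 = sample(1:5, 10, replace = TRUE),
                 bet1 = sample(1:5, 10, replace = TRUE),
                 bet2 = sample(1:5, 10, replace = TRUE),
                 bet3 = sample(1:5, 10, replace = TRUE))

deps <- grep("^dep", names(xx), value = TRUE)
bets <- grep("^bet", names(xx), value = TRUE)
# All 9 dep/bet pairs, as plain character columns
comb <- expand.grid(first = deps, second = bets, stringsAsFactors = FALSE)

# Hypothetical helper: one wide row of per-category shares of values > 3
summarise_pair <- function(first, second) {
  vdep <- tapply(xx[[first]] > 3, xx$category, mean)
  vbet <- tapply(xx[[second]] > 3, xx$category, mean)
  vals <- c(as.vector(vbet), as.vector(vdep))
  names(vals) <- c(paste0("vbet_", names(vbet)), paste0("vdep_", names(vdep)))
  out <- as.data.frame(as.list(vals))
  out$first <- first    # which variables went into this row
  out$second <- second
  out
}

result <- do.call(rbind, unname(Map(summarise_pair, comb$first, comb$second)))
result
```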

How do I create a tibble that has a column of tibbles or data.frames?

# A tibble: 12 x 3
x y z
<dbl> <int> <int>
1 1 1 2
2 3 2 3
3 2 3 4
4 3 4 5
5 2 5 6
6 4 6 7
7 5 7 8
8 2 8 9
9 1 9 10
10 1 10 11
11 3 11 12
12 4 12 13
The above is named df. It is one of ten tibbles that I want to store inside:
t <- tibble(tbl=vector("list", 10))
If I do this:
t$tbl[1] <- df
or
t[1, 1] <- df
I get this warning message:
Warning message:
In t$tbl[1] <- df :
number of items to replace is not a multiple of replacement length
and the output of
t$tbl[1]
is
[[1]]
[1] 1 3 2 3 2 4 5 2 1 1 3 4
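The `tbl` column is a list, so this is ordinary list assignment: `t$tbl[1] <- df` tries to put the three columns of `df` into one slot, which triggers the warning and keeps only the first column, `x`. Use `[[<-` to store the whole tibble as a single element, or wrap it in `list()` when using `[<-`. A minimal sketch, rebuilding `df` from the printed output and assuming the tibble package is installed:

```r
library(tibble)

# The 12 x 3 tibble from the question
df <- tibble(x = c(1, 3, 2, 3, 2, 4, 5, 2, 1, 1, 3, 4), y = 1:12, z = 2:13)

# A list-column with 10 empty slots
t <- tibble(tbl = vector("list", 10))

# `[[<-` replaces one list element with the whole tibble
t$tbl[[1]] <- df
# `[<-` also works, but the replacement must itself be a list of length 1
t$tbl[2] <- list(df)

t$tbl[[1]]
```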

Remove the (i+1)th term if it recurs

Say we have the following data
A <- c(1,2,2,2,3,4,8,6,6,1,2,3,4)
B <- c(1,2,3,4,5,1,2,3,4,5,1,2,3)
data <- data.frame(A,B)
How would one write a function so that, if the value of A in position i+1 is the same as in position i, the repeated row is removed?
The output should therefore look like this:
data.frame(A = c(1,2,3,4,8,6,1,2,3,4), B = c(1,2,5,1,2,3,5,1,2,3))
My best guess would be to use a for loop, but I have no experience with those.
You can try
data[c(TRUE, data[-1,1]!= data[-nrow(data), 1]),]
Another option, dplyr-esque:
library(dplyr)
dat1 <- data.frame(A=c(1,2,2,2,3,4,8,6,6,1,2,3,4),
B=c(1,2,3,4,5,1,2,3,4,5,1,2,3))
# row_number() == 1 keeps the first row, where lag(A) is NA
dat1 %>% filter(row_number() == 1 | A != lag(A))
## A B
## 1 1 1
## 2 2 2
## 3 3 5
## 4 4 1
## 5 8 2
## 6 6 3
## 7 1 5
## 8 2 1
## 9 3 2
## 10 4 3
Using diff(), which computes the pairwise differences with a lag of 1:
data[c( TRUE, diff(data[,1]) != 0), ]
output:
A B
1 1 1
2 2 2
5 3 5
6 4 1
7 8 2
8 6 3
10 1 5
11 2 1
12 3 2
13 4 3
Using rle():
A <- c(1,2,2,2,3,4,8,6,6,1,2,3,4)
B <- c(1,2,3,4,5,1,2,3,4,5,1,2,3)
data <- data.frame(A,B)
X <- rle(data$A)
Y <- cumsum(c(1, X$lengths[-length(X$lengths)]))
View(data[Y, ])
row.names A B
1 1 1 1
2 2 2 2
3 5 3 5
4 6 4 1
5 7 8 2
6 8 6 3
7 10 1 5
8 11 2 1
9 12 3 2
10 13 4 3
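Since the question mentions a for loop, here is a minimal sketch of that version wrapped in a function, for comparison with the vectorised answers above. It keeps the first row and every row whose value differs from the previous one. `drop_repeats_loop` and its `col` argument are illustrative names, and it assumes the column contains no NAs.

```r
drop_repeats_loop <- function(data, col = "A") {
  x <- data[[col]]
  keep <- logical(length(x))          # all FALSE to start
  if (length(x) > 0) keep[1] <- TRUE  # the first row is always kept
  for (i in seq_along(x)[-1]) {
    # keep row i only if it differs from row i - 1
    keep[i] <- x[i] != x[i - 1]
  }
  data[keep, , drop = FALSE]
}

A <- c(1,2,2,2,3,4,8,6,6,1,2,3,4)
B <- c(1,2,3,4,5,1,2,3,4,5,1,2,3)
data <- data.frame(A, B)
drop_repeats_loop(data)
```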
