Rewriting a variable using a for loop in R

I've got a column in my dataset that contains the values 0, 1 and 2. The 2s are a weird leftover from some previous transformation, and I need to convert them to 1. I've written a simple loop to do this:
for (i in my.cl.accept$enroll){
if (i==2){
i=1
}
}
However, this doesn't change the actual contents of the data frame. ifelse() doesn't work, because I don't need to change the other digits at all, just the number 2.
I've been using R a little more after coming from Python. What simple thing am I misunderstanding here?

Let's generate a sample data set:
set.seed(10)
DF <- data.frame(
a=1:10,
b=sample(0:2,10,replace=TRUE))
DF
Now, replace every entry corresponding to 2 with 1:
DF$b[DF$b==2] <- 1
DF
Note: This is a vectorized method, and it will generally be much faster than looping over the elements one by one.

Dunno whether this is what you want?
> A<- 1:10
> B<- c(rep(0,5), rep(1,3), rep(2,2))
> data <- data.frame(A,B)
> data
    A B
1   1 0
2   2 0
3   3 0
4   4 0
5   5 0
6   6 1
7   7 1
8   8 1
9   9 2
10 10 2
> data[data$B==2,]$B <- 1
> data
    A B
1   1 0
2   2 0
3   3 0
4   4 0
5   5 0
6   6 1
7   7 1
8   8 1
9   9 1
10 10 1

Are you sure you're using ifelse correctly? It actually does allow you to only change one value to another. Here's an example:
> x <- sample(c(0, 1, 2), 10, TRUE)
> x
## [1] 2 1 1 0 2 2 0 0 2 1
> ifelse(x == 2, 1, x)
## [1] 1 1 1 0 1 1 0 0 1 1
For future reference, your good old-fashioned for loop should go something like this...
for (i in seq_along(my.cl.accept$enroll)) {
  # Assign back into the data frame by index; no else branch is needed
  if (my.cl.accept$enroll[i] == 2) {
    my.cl.accept$enroll[i] <- 1
  }
}
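A likely reason the loop in the question has no effect: in R, `for (i in x)` gives `i` the value of each element in turn, so assigning to `i` never writes back to `x`. The sketch below illustrates this on a made-up vector (`enroll` is a hypothetical stand-in for `my.cl.accept$enroll`).

```r
# Hypothetical stand-in for my.cl.accept$enroll
enroll <- c(0, 2, 1, 2)

# Assigning to the loop variable only changes the local value of i
for (i in enroll) {
  if (i == 2) {
    i <- 1
  }
}
unchanged <- enroll   # still contains the 2s

# Indexing into the vector writes the change back
fixed <- enroll
fixed[fixed == 2] <- 1
```

After running this, `unchanged` still holds the original values, while `fixed` has every 2 replaced by 1.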

Related

How to iterate over a column and assign value based on another column without a for loop?

I am trying to efficiently assign a value to a column, based on another column, but without a for loop as this takes too long.
I'm doing something like this: If the reference column value is greater than a certain random number, I assign 1 to the new column. Otherwise, assign 0. Can't figure out the best way to do this without a loop. I tried dplyr and case_when, but that wasn't iterating over each row.
Thanks!
for (i in 1:nrow(data)) {
if (data$value[i] > runif(1, 0, 1.7)) {
temp$newValue[i] <- 1
} else{
temp$newValue[i] <- 0
}
}
c0=data.frame(c(1,4,6,3,7,3),c(2,8,2,4,9,4))
names(c0)=c("A","B")
c0$C=ifelse(c0[,"A"]>runif(1,0,1.7),1,0)
c0
I'm not so sure if I understand you well. Please comment if I have any misunderstanding.
A B C
1 2 0
4 8 1
6 2 1
3 4 1
7 9 1
3 4 1
Here is how I use A to generate C
Does this solve your problem?
DATA:
set.seed(1)
df <- data.frame(
refcol = rnorm(10)
)
randvalue <- 0
SOLUTION:
df$newcol <- ifelse(df$refcol > randvalue, 1, 0)
RESULT:
df
refcol newcol
1 0.2352207 1
2 -0.3307359 0
3 -0.3116238 0
4 -2.3023457 0
5 -0.1708760 0
6 0.1402782 1
7 -1.4974267 0
8 -1.0101884 0
9 -0.9484756 0
10 -0.4939622 0
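Both answers above compare every row against a single threshold. The loop in the question instead draws a fresh random number for each row. If that per-row behaviour is what is wanted, one possible sketch is to draw a whole vector of thresholds at once. The data values and the seed below are made up for illustration; only `value`, `newValue` and the `runif(…, 0, 1.7)` range come from the question.

```r
set.seed(42)   # arbitrary seed, for reproducibility only
data <- data.frame(value = c(0.2, 1.5, 0.9, 1.8, 0.1))   # made-up values

# One uniform draw per row, compared element-wise with the reference column
thresholds <- runif(nrow(data), 0, 1.7)
data$newValue <- as.integer(data$value > thresholds)
```

Because `runif(n, ...)` produces its draws in sequence, this should match the row-by-row loop when both start from the same seed.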

R diagonal matrix error

I have the following type of dataframe
A B C D
1 0 1 10
0 2 1 15
1 1 0 11
I would like the following output
A B C D
1 0 1 10
1 1 0 11
0 2 1 15
I have tried this code
require(permute)
z <- apply(permute::allPerms(1:nrow(DF)), 1, function(x){
mat <- as.matrix(DF,2:ncol(DF)])
if(all(diag(mat[x,]) == rep(1,nrow(DF)))){
return(df[x,])} })
I am unable to get the desired output.
(Link for the above code- Arrange data frame in a specific way)
I request someone to guide me. The dataframe is a small sample but I have a huge one with a similar structure.
The following will work so long as there is at least one 1 in every relevant column. It's deterministic, so it will always find the first 1 and swap it with the number in the diagonal position, and there is no combinatorial explosion. Perhaps someone can find a more elegant (or vectorised) solution?
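fn <- function(colm, i) {
  i1 <- match(1, colm)   # position of the first 1 in the column
  colm[i1] <- colm[i]    # move the current diagonal entry to that position
  colm[i] <- 1           # put the 1 on the diagonal
  return(colm)
}
for (i in 1:nrow(DF)) {
  DF[, i] <- fn(DF[, i], i)
}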
EDIT
Although this answer was accepted (so I cannot delete it), on rereading it I don't think it does quite what you asked.
The following code should fix this answer:
DF<-read.table(text="A B C D
13 0 0 1
1 0 1 10
0 2 1 15
1 1 0 11", header=TRUE)
rem<-1:nrow(DF)
for(i in 1:nrow(DF))
{
temp<-DF[i,]
any1<-intersect(rem, which(DF[,i]==1))
best1<-which.min(rowSums(DF[any1,]==1))
firsti<-any1[best1]
DF[i,]<-DF[firsti,]
DF[firsti,]<-temp
rem<-setdiff(rem, i)
}
DF
   A B C  D
1  1 0 1 10
2  1 1 0 11
3  0 2 1 15
4 13 0 0  1
My apologies for the confusion.

R: cumulative sum with conditions [duplicate]

I have a vector of numbers in a data.frame such as below.
df <- data.frame(a = c(1,2,3,4,2,3,4,5,8,9,10,1,2,1))
I need to create a new column which gives a running count of entries that are greater than their predecessor. The resulting column vector should be this:
0,1,2,3,0,1,2,3,4,5,6,0,1,0
My attempt is to create a "flag" column of diffs to mark when the values are greater.
df$flag <- c(0,diff(df$a)>0)
> df$flag
0 1 1 1 0 1 1 1 1 1 1 0 1 0
Then I can apply some dplyr group/sum magic to almost get the right answer, except that the sum doesn't reset when flag == 0:
df %>% group_by(flag) %>% mutate(run=cumsum(flag))
    a flag run
1   1    0   0
2   2    1   1
3   3    1   2
4   4    1   3
5   2    0   0
6   3    1   4
7   4    1   5
8   5    1   6
9   8    1   7
10  9    1   8
11 10    1   9
12  1    0   0
13  2    1  10
14  1    0   0
I don't want to have to resort to a for() loop because I have several of these running sums to compute with several hundred thousand rows in a data.frame.
Here's one way with ave:
ave(df$a, cumsum(c(FALSE, diff(df$a) < 0)), FUN=seq_along) - 1
[1] 0 1 2 3 0 1 2 3 4 5 6 0 1 0
We get a running count grouped by diff(df$a) < 0, which marks the positions in the vector that are less than their predecessors. We prepend FALSE with c(FALSE, ...) to account for the first position, which has no predecessor. The cumulative sum of that logical vector creates a grouping index that increases by one at every drop. The function ave applies a function within each group of that index; we use seq_along for a running count. Since seq_along starts at 1, we subtract one, ave(...) - 1, to start from zero.
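To make the grouping index in the ave approach concrete, here is a small sketch that breaks the one-liner into its intermediate steps. The names `drops`, `grp` and `run` are only illustrative.

```r
a <- c(1, 2, 3, 4, 2, 3, 4, 5, 8, 9, 10, 1, 2, 1)

drops <- c(FALSE, diff(a) < 0)   # TRUE where a value is smaller than its predecessor
grp   <- cumsum(drops)           # group id that increases by one at each drop
# grp: 0 0 0 0 1 1 1 1 1 1 1 2 2 3

run <- ave(a, grp, FUN = seq_along) - 1   # position within each group, starting at 0
# run: 0 1 2 3 0 1 2 3 4 5 6 0 1 0
```

Note that a repeated value (a tie with its predecessor) does not start a new group under `diff(a) < 0`, so the count would keep increasing across ties. Whether that is acceptable depends on the data.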
A similar approach using dplyr:
library(dplyr)
df %>%
group_by(cumsum(c(FALSE, diff(a) < 0))) %>%
mutate(row_number() - 1)
You don't need dplyr:
fun <- function(x) {
test <- diff(x) > 0
y <- cumsum(test)
c(0, y - cummax(y * !test))
}
fun(df$a)
[1] 0 1 2 3 0 1 2 3 4 5 6 0 1 0
a <- c(1,2,3,4,2,3,4,5,8,9,10,1,2,1)
f <- c(0, diff(a)>0)
ifelse(f, cumsum(f), f)
That is the version without reset.
With reset:
unlist(tapply(f, cumsum(c(0, diff(a) < 0)), cumsum))

Cumulative sum for positive numbers only [duplicate]

I have this vector:
x = c(1,1,1,1,1,0,1,0,0,0,1,1)
I want to compute a cumulative sum over the positive numbers only, resetting at every zero. I should get the following vector in return:
xc = c(1,2,3,4,5,0,1,0,0,0,1,2)
How could I do it?
I've tried cumsum(x), but that computes the cumulative sum over all values and gives:
cumsum(x)
[1] 1 2 3 4 5 5 6 6 6 6 7 8
One option is
x1 <- inverse.rle(within.list(rle(x), values[!!values] <-
(cumsum(values))[!!values]))
x[x1!=0] <- ave(x[x1!=0], x1[x1!=0], FUN=seq_along)
x
#[1] 1 2 3 4 5 0 1 0 0 0 1 2
Or a one-line code would be
x[x>0] <- with(rle(x), sequence(lengths[!!values]))
x
#[1] 1 2 3 4 5 0 1 0 0 0 1 2
Here's a possible solution using data.table v >= 1.9.5 and its new rleid function:
library(data.table)
as.data.table(x)[, cumsum(x), rleid(x)]$V1
## [1] 1 2 3 4 5 0 1 0 0 0 1 2
Base R, one-line solution with Map and Reduce:
> Reduce('c', Map(function(u,v) if(v==0) rep(0,u) else 1:u, rle(x)$lengths, rle(x)$values))
[1] 1 2 3 4 5 0 1 0 0 0 1 2
Or:
unlist(Map(function(u,v) if(v==0) rep(0,u) else 1:u, rle(x)$lengths, rle(x)$values))
x <- c(1,1,1,1,1,0,1,0,0,0,1,1)
cumsum_ <- function(x) {
  r <- rle(x)
  # Split x into its runs of equal values, then take the cumulative sum within each run
  s <- split(x, rep(seq_along(r$values), r$lengths))
  return(unlist(lapply(s, cumsum), use.names = FALSE))
}
(xc <- cumsum_(x))
# [1] 1 2 3 4 5 0 1 0 0 0 1 2
I don't know much R, but I have written a small piece of code in Python. The logic is the same in any language. I hope this helps.
x = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1]
tot = 0
for i in range(len(x)):
    if x[i] != 0:
        # Extend the running total while values are non-zero
        tot = tot + x[i]
        x[i] = tot
    else:
        # A zero resets the running total
        tot = 0
print(x)
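x <- c(1,1,1,1,1,0,1,0,0,0,1,1)
skumulowana <- function(x) {
  dl <- length(x)
  xx <- numeric(dl + 1)   # xx[1] is a leading 0, so xx[i] holds the previous total
  for (i in seq_len(dl)) {
    # A zero resets the running total; otherwise add the current value
    xx[i + 1] <- if (x[i] == 0) 0 else xx[i] + x[i]
  }
  wynik <- xx[-1]         # drop the leading 0
  return(wynik)
}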
skumulowana(x)
## [1] 1 2 3 4 5 0 1 0 0 0 1 2
Try this one-liner...
Reduce(function(x,y) (x+y)*(y!=0), x, accumulate=T)
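The one-liner above works because the accumulated value is multiplied by `(y != 0)`, which is 0 whenever the current element is zero, so the total resets there. A more explicit sketch of the same idea follows; the helper name `step` is just for illustration.

```r
x <- c(1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1)

# acc is the running total so far, cur is the current element
step <- function(acc, cur) {
  if (cur == 0) 0 else acc + cur   # a zero resets the total
}

xc <- Reduce(step, x, accumulate = TRUE)
# xc: 1 2 3 4 5 0 1 0 0 0 1 2
```

With `accumulate = TRUE`, `Reduce` returns every intermediate total rather than only the final one, which is what turns the fold into a running sum.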
split and lapply version:
x <- c(1,1,1,1,1,0,1,0,0,0,1,1)
unlist(lapply(split(x, cumsum(x==0)), cumsum))
step by step:
a <- split(x, cumsum(x==0)) # divides x into pieces where each 0 starts a new piece
b <- lapply(a, cumsum) # calculates cumsum in each piece
unlist(b) # rejoins the pieces
Result has useless names but is otherwise what you wanted:
# 01 02 03 04 05 11 12  2  3 41 42 43
#  1  2  3  4  5  0  1  0  0  0  1  2
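If the names are unwanted, `unlist()` has a `use.names` argument that drops them (wrapping the result in `unname()` would be another option). A minimal sketch of the split and lapply version without names:

```r
x <- c(1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1)

# Each 0 starts a new piece; cumsum then runs within each piece
pieces <- split(x, cumsum(x == 0))
xc <- unlist(lapply(pieces, cumsum), use.names = FALSE)
# xc: 1 2 3 4 5 0 1 0 0 0 1 2
```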
Here is another base R solution using aggregate. The idea is to make a data frame with x and a new column named x.1 by which we can apply aggregate functions (cumsum in this case):
x <- c(1,1,1,1,1,0,1,0,0,0,1,1)
r <- rle(x)
df <- data.frame(x,
x.1=unlist(sapply(1:length(r$lengths), function(i) rep(i, r$lengths[i]))))
# df
# x x.1
# 1 1 1
# 2 1 1
# 3 1 1
# 4 1 1
# 5 1 1
# 6 0 2
# 7 1 3
# 8 0 4
# 9 0 4
# 10 0 4
# 11 1 5
# 12 1 5
agg <- aggregate(df$x~df$x.1, df, cumsum)
as.vector(unlist(agg$`df$x`))
# [1] 1 2 3 4 5 0 1 0 0 0 1 2
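The run identifier used as x.1 above can also be built directly with `rep()`, and `ave()` can then apply `cumsum` within each run without an intermediate data frame. This is only a sketch of an alternative, not part of the answer above; `run_id` is an illustrative name.

```r
x <- c(1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1)
r <- rle(x)

# One id per run of equal values, repeated for the length of that run
run_id <- rep(seq_along(r$lengths), r$lengths)
# run_id: 1 1 1 1 1 2 3 4 4 4 5 5

xc <- ave(x, run_id, FUN = cumsum)
# xc: 1 2 3 4 5 0 1 0 0 0 1 2
```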
