I have a vector like this
v <- c(0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0)
I now want to generate a second vector that counts backwards until it hits a 1, then starts over.
The result here would be
r <- c(6,5,4,3,2,1,8,7,6,5,4,3,2,1,4,3,2,1,0)
The last zero should be kept as it is.
I tried something like this but cannot get it to work:
lv <- c(1, which(v == 1))
res <- c()
for(i in 1:(length(lv)-1)) {
  res <- c(res, rev(lv[i]:lv[i+1]))
}
We can use ave, creating groups with cumsum(v == 1) and counting in reverse within each group. We then reassign 1 to the positions where v is 1. Note that with this version the trailing element comes out as 2 rather than 0.
new_seq <- ave(v, cumsum(v==1), FUN = function(x) rev(seq_along(x))) + 1
new_seq[v == 1] <- 1
new_seq
#[1] 6 5 4 3 2 1 8 7 6 5 4 3 2 1 4 3 2 1 2
Update
To keep everything after the last 1 as it is, we can do:
#Make groups
indx <- cumsum(v==1)
#Create reverse sequential counting in each group
new_seq <- ave(v, indx, FUN = function(x) rev(seq_along(x))) + 1
#Keep everything after last 1 as it is
new_seq[which.max(indx) : length(v)] <- v[which.max(indx) : length(v)]
#Set the positions where v is 1 back to 1
new_seq[v == 1] <- 1
new_seq
#[1] 6 5 4 3 2 1 8 7 6 5 4 3 2 1 4 3 2 1 0
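As an alternative sketch (not part of the answer above), the groups can be built from the right, so that each 1 closes its own group. Everything after the last 1 then falls into group 0 and can simply be copied back from v. The names grp and res are only illustrative.

```r
v <- c(0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0)

# Number the groups from the right: each 1 is the last element of its group,
# and anything after the final 1 ends up in group 0
grp <- rev(cumsum(rev(v == 1)))

# Count down to 1 within each group
res <- ave(v, grp, FUN = function(x) rev(seq_along(x)))

# Keep the elements after the last 1 as they were
res[grp == 0] <- v[grp == 0]
res
#[1] 6 5 4 3 2 1 8 7 6 5 4 3 2 1 4 3 2 1 0
```

Because the 1 sits at the end of each group, no `+ 1` correction or reassignment of the 1s is needed here.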
Related
I have a dataframe that stores adjacency relations. I want to divide the numbers into different groups according to this dataframe. The dataframe is as follows:
df = data.frame(from=c(1,1,2,2,2,3,3,3,4,4,4,5,5), to=c(1,3,2,3,4,1,2,3,2,4,5,4,5))
df
   from to
1     1  1
2     1  3
3     2  2
4     2  3
5     2  4
6     3  1
7     3  2
8     3  3
9     4  2
10    4  4
11    4  5
12    5  4
13    5  5
In the above dataframe, number 1 is linked with numbers 1 and 3, and number 2 is linked with numbers 2, 3 and 4, so number 1 cannot be in the same group as number 3, and number 2 cannot be in the same group as number 3 or number 4. In the end, the groups can be c(1, 2, 5) and c(3, 4).
How can I program this?
First replace the values of to with NA when from and to are equal.
df2 <- transform(df, to = replace(to, from == to, NA))
Then fold over the rows with Reduce, appending a row only if its from value has not yet appeared in the to column of the rows kept so far.
Reduce(function(x, y) {
  if(y$from %in% x$to) x else rbind(x, y)
}, split(df2, 1:nrow(df2)))
#    from to
# 1     1 NA
# 2     1  3
# 3     2 NA
# 4     2  3
# 5     2  4
# 12    5  4
# 13    5 NA
Finally, you can extract the unique elements of each column to get the two groups.
The overall pipeline is:
df |>
  transform(to = replace(to, from == to, NA)) |>
  (\(dat) split(dat, 1:nrow(dat)))() |>
  Reduce(f = \(x, y) if(y$from %in% x$to) x else rbind(x, y))
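The final extraction step could look like the following sketch. The names kept, group1 and group2 are illustrative, and the result is only checked against this example data, not for adjacency tables in general.

```r
df <- data.frame(from = c(1,1,2,2,2,3,3,3,4,4,4,5,5),
                 to   = c(1,3,2,3,4,1,2,3,2,4,5,4,5))

# Self-links carry no information, so blank them out
df2 <- transform(df, to = replace(to, from == to, NA))

# Keep a row only if its `from` has not appeared in `to` of the rows kept so far
kept <- Reduce(function(x, y) if (y$from %in% x$to) x else rbind(x, y),
               split(df2, 1:nrow(df2)))

# The remaining `from` values form one group, the non-missing `to` values the other
group1 <- unique(kept$from)
group2 <- unique(kept$to[!is.na(kept$to)])
group1
#[1] 1 2 5
group2
#[1] 3 4
```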
Darren Tsai's answer solves this problem, but it has some flaws.
The following is a rather clumsy solution:
df = data.frame(from=c(1,1,2,2,2,3,3,3,4,4,4,5,5), to=c(1,3,2,3,4,1,2,3,2,4,5,4,5))
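# list the neighbours ("to" values) of each "from" node
df.list = lapply(split(df, df$from), function(x) {
  x$to
})
# start with every node in group 1 (assumes the nodes are numbered 1..n)
group.idx = rep(1, length(unique(df$from)))
for (i in seq_along(df.list)) {
  df.vec <- df.list[[i]]
  curr.group = group.idx[i]
  # neighbours of node i, excluding the self-link
  remain.vec = setdiff(df.vec, i)
  for (j in remain.vec) {
    # a neighbour that shares the current group is moved to the next group
    if (group.idx[j] == curr.group) {
      group.idx[j] = curr.group + 1
    }
  }
}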
group.idx
[1] 1 1 2 2 1
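To turn this index vector into the groups themselves and to check that no linked pair shares a group, something like the following sketch can be used. Here group.idx is simply copied from the output above, and groups, links and conflict are illustrative names.

```r
df <- data.frame(from = c(1,1,2,2,2,3,3,3,4,4,4,5,5),
                 to   = c(1,3,2,3,4,1,2,3,2,4,5,4,5))
group.idx <- c(1, 1, 2, 2, 1)   # result printed above

# Node numbers collected by group label
groups <- split(seq_along(group.idx), group.idx)
groups
# $`1`
# [1] 1 2 5
#
# $`2`
# [1] 3 4

# Sanity check: ignoring self-links, no link should join two nodes of the same group
links <- df[df$from != df$to, ]
conflict <- group.idx[links$from] == group.idx[links$to]
any(conflict)
#[1] FALSE
```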
I want to count the rows of a data frame according to how many zeros and NAs they contain, for example the number of rows that have a zero in exactly one column, and so on.
The code for the data frame is below; the counts are needed for columns M1 to M5.
The desired output for zeros and NAs is shown at the link below:
https://imgur.com/y9qeyhV
id <- 1:9
M1 <- c(0,NA,1,0,0,NA,NA,1,7)
M2 <- c(NA,NA,0,NA,0,NA,NA,1,7)
M3 <- c(1,NA,0,0,0,1,NA,1,7)
M4 <- c(0,NA,0,3,0,NA,NA,1,7)
M5 <- c(5,0,0,NA,0,0,NA,0,NA)
data <- cbind(id,M1,M2,M3,M4,M5)
data <- as.data.frame(data)
Try this
table(rowSums(is.na(data)))
# 0 1 2 3 4 5
# 3 2 1 1 1 1
table(factor(rowSums(data == 0, na.rm = TRUE), levels = 0:5))
# 0 1 2 3 4 5
# 2 3 2 0 1 1
You can also pass the code above to data.frame() or as.data.frame() to get a data.frame object like the one in your expected output.
For NA:
data.frame(table(rowSums(is.na(data[startsWith(names(data),"M")]))))
  Var1 Freq
1    0    3
2    1    2
3    2    1
4    3    1
5    4    1
6    5    1
For zeros:
data.frame(table(factor(rowSums(data[startsWith(names(data), "M")] == 0, na.rm = TRUE), levels = 0:5)))
  Var1 Freq
1    0    2
2    1    3
3    2    2
4    3    0
5    4    1
6    5    1
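If both counts are wanted side by side, the two tables can be combined into one data frame. This is only a sketch: the layout of the desired output is not visible here, so the column names count, rows_with_NA and rows_with_zero are assumptions.

```r
id <- 1:9
M1 <- c(0,NA,1,0,0,NA,NA,1,7)
M2 <- c(NA,NA,0,NA,0,NA,NA,1,7)
M3 <- c(1,NA,0,0,0,1,NA,1,7)
M4 <- c(0,NA,0,3,0,NA,NA,1,7)
M5 <- c(5,0,0,NA,0,0,NA,0,NA)
data <- data.frame(id, M1, M2, M3, M4, M5)

# Restrict to the M columns, then count NAs and zeros per row
m <- data[startsWith(names(data), "M")]
n_na   <- rowSums(is.na(m))
n_zero <- rowSums(m == 0, na.rm = TRUE)

# Tabulate over the fixed levels 0..5 so that empty counts show up as 0
summary_tab <- data.frame(
  count          = 0:5,
  rows_with_NA   = as.vector(table(factor(n_na,   levels = 0:5))),
  rows_with_zero = as.vector(table(factor(n_zero, levels = 0:5)))
)
summary_tab
```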
apply(data, 1, function(x) length(x[is.na(x)]))
This will give you a vector. Each element corresponds to a row and its value is the number of NA elements in that row.
My solution is kind of complicated, but it gives the desired output using apply functions:
myFun <- function(data, count, fun) {
  applyFun <- function(x) {
    length(which(
      apply(data, 1, function(y) length(which(fun(y))) == x)
    ))
  }
  sapply(count, applyFun)
}
myFun(data, 0:5, is.na)
myFun(data, 0:5, function(x) x == 0)
(You made a mistake in your example: two rows have no zeroes in any column: rows 7 and 9.)
Here is a for loop option that counts the NAs and zeros in each row and then uses dplyr::count to summarize the frequency of each value.
library(dplyr)
data$CountNA <- NA
for (i in 1:nrow(data)) {
  # exclude the CountNA column itself
  data[i, "CountNA"] <- length(which(is.na(data[i, 1:(ncol(data) - 1)])))
}
count(data, CountNA)
data$CountZero <- NA
for (i in 1:nrow(data)) {
  # exclude the CountNA and CountZero columns
  data[i, "CountZero"] <- length(which(data[i, 1:(ncol(data) - 2)] == 0))
}
count(data, CountZero)
I would like to find numbers greater than the previous number by 5 and remove them.
For example, starting with the list below:
list <- c(1,1,15,1,4,2,3,1,20,1,3,2)
Resulting in the list below:
list <- c(1,1,1,4,2,3,1,1,3,2)
This removed 15 and 20 from the original list.
We can use diff:
list[c(TRUE, diff(list) <= 5)]
#[1] 1 1 1 4 2 3 1 1 3 2
Other options could be:
list[c(TRUE, tail(list, -1) - head(list, -1) <= 5)]
list[list - dplyr::lag(list, default = list[1]) <= 5]
list[list - data.table::shift(list, fill = list[1]) <= 5]
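If, instead of removing these values, we want to replace them with the mean of the neighbouring values, we can set them to NA and then use na.approx, which interpolates linearly. The condition is > 5 so that it is the exact complement of the <= 5 used above.
list[c(FALSE, diff(list) > 5)] <- NA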
zoo::na.approx(list)
#[1] 1 1 1 1 4 2 3 1 1 1 3 2
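A base-R sketch of the same interpolation, for cases where adding zoo is not an option. It uses stats::approx in place of zoo::na.approx and assumes that linear interpolation over positions is what is wanted; the names ok and filled are illustrative.

```r
x <- c(1,1,15,1,4,2,3,1,20,1,3,2)

# Mark values that exceed their predecessor by more than 5
x[c(FALSE, diff(x) > 5)] <- NA

# Interpolate linearly over the positions of the remaining values
ok <- !is.na(x)
filled <- approx(which(ok), x[ok], xout = seq_along(x))$y
filled
#[1] 1 1 1 1 4 2 3 1 1 1 3 2
```

With the default rule = 1, approx returns NA for positions outside the range of known values, so a marked value at the very end of the vector would stay NA.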
Using base R:
x <- c(1,1,15,1,4,2,3,1,20,1,3,2)
# pad with x[1] so the first element is always kept; a negative difference is already < 5
x[diff(c(x[1], x)) < 5]
#[1] 1 1 1 4 2 3 1 1 3 2
I have a series of batch records that are labeled sequentially. Sometimes batches overlap.
x <- c("1","1","1/2","2","3","4","5/4","5")
> data.frame(x)
    x
1   1
2   1
3 1/2
4   2
5   3
6   4
7 5/4
8   5
I want to find the set of batches that are not overlapping and label those periods. Batch "1/2" includes both "1" and "2" so it is not unique. When batch = "3" that is not contained in any previous batches, so it starts a new period. I'm having difficulty dealing with the combined batches, otherwise this would be straightforward. The result of this would be:
    x period
1   1      1
2   1      1
3 1/2      1
4   2      1
5   3      2
6   4      3
7 5/4      3
8   5      3
My experience is in more functional programming paradigms, so I know the way I did this is very un-R. I'm looking for the way to do this in R that is clean and simple. Any help is appreciated.
Here's my un-R code that works, but is super clunky and not extensible.
x <- c("1","1","1/2","2","3","4","5/4","5")
p <- 1 #period number
temp <- NULL #temp variable for storing cases of x (batches)
temp[1] <- x[1]
period <- NULL
rl <- 0 #length to repeat period
for (i in 1:length(x)){
  #check for "/", split and add to temp
  if (grepl("/", x[i])){
    z <- strsplit(x[i], "/") #split character
    z <- unlist(z) #convert to vector
    temp <- c(temp, z, x[i]) #add to temp vector for comparison
  }
  #check if x in temp
  if(x[i] %in% temp){
    temp <- append(temp, x[i]) #add to search vector
    rl <- rl + 1 #increase length
  } else {
    period <- append(period, rep(p, rl)) #add to period vector
    p <- p + 1 #increase period count
    temp <- NULL #reset
    rl <- 1 #reset
  }
}
#add last batch
rl <- length(x) - length(period)
period <- append(period, rep(p,rl))
df <- data.frame(x,period)
> df
    x period
1   1      1
2   1      1
3 1/2      1
4   2      1
5   3      2
6   4      3
7 5/4      3
8   5      3
R has functional paradigm influences, so you can solve this with Map and Reduce. Note that this solution follows your approach in unioning seen values. A simpler approach is possible if you assume batch numbers are consecutive, as they are in your example.
x <- c("1","1","1/2","2","3","4","5/4","5")
s <- strsplit(x, "/")
r <- Reduce(union, s, init = list(), accumulate = TRUE)
p <- cumsum(Map(function(x, y) length(intersect(x, y)) == 0, s, r[-length(r)]))
data.frame(x,period=p)
    x period
1   1      1
2   1      1
3 1/2      1
4   2      1
5   3      2
6   4      3
7 5/4      3
8   5      3
What this does is first calculate a cumulative union of seen values. Then, it maps across this to determine the places where none of the current values have been seen before. (Alternatively, this second step could be included within the reduce, but this would be wordier without support for destructuring.) The cumulative sum provides the "period" numbers based on the number of times the intersections have come up empty.
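To make the two steps visible, here is a sketch that keeps the intermediate values. It swaps in character(0) as the initial value and mapply instead of Map so that the intermediate results are plain vectors; seen, seen_before and is_new are illustrative names.

```r
x <- c("1","1","1/2","2","3","4","5/4","5")
s <- strsplit(x, "/")

# Step 1: cumulative union of all batch numbers seen so far.
# seen[[1]] is the empty starting set, seen[[i + 1]] covers rows 1..i.
seen <- Reduce(union, s, init = character(0), accumulate = TRUE)

# Step 2: for each row, compare its batches with what was seen BEFORE that row
seen_before <- seen[-length(seen)]
is_new <- mapply(function(cur, prev) length(intersect(cur, prev)) == 0,
                 s, seen_before)
is_new
#[1]  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE

# Each TRUE starts a new period
cumsum(is_new)
#[1] 1 1 1 1 2 3 3 3
```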
If you do make the assumption that the batch numbers are consecutive, then you can do the following instead:
x <- c("1","1","1/2","2","3","4","5/4","5")
s<-strsplit(x,"/")
n<-mapply(function(x) range(as.numeric(x)),s)
p<-cumsum(c(1,n[1,-1]>n[2,-ncol(n)]))
data.frame(x,period=p)
This gives the same result (not repeated here).
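The same idea can be spelled out with named vectors instead of a matrix. This is a sketch that assumes the batch labels are numeric and non-decreasing, as in the example; lo, hi and starts_new are illustrative names.

```r
x <- c("1","1","1/2","2","3","4","5/4","5")
s <- strsplit(x, "/")

# Smallest and largest batch number in each row
lo <- vapply(s, function(b) min(as.numeric(b)), numeric(1))
hi <- vapply(s, function(b) max(as.numeric(b)), numeric(1))

# A row starts a new period when its smallest batch is larger than
# the largest batch of the previous row; the first row always starts one
starts_new <- c(TRUE, lo[-1] > hi[-length(hi)])
period <- cumsum(starts_new)
data.frame(x, period)
```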
A little bit shorter:
x <- c("1","1","1/2","2","3","4","5/4","5")
x <- data.frame(x = x, period = -1, stringsAsFactors = FALSE)
period <- 0
prevBatch <- -1
for (i in 1:nrow(x)) {
  # split combined batches and compare them as numbers, not as strings
  spl <- as.numeric(unlist(strsplit(x$x[i], "/")))
  currentBatch <- min(spl)
  if (currentBatch < prevBatch) stop("Error in sequence")
  if (currentBatch > prevBatch) period <- period + 1
  x$period[i] <- period
  prevBatch <- max(spl)
}
x
Here's a twist on the original that uses tidyr to split the data into two columns so it's easier to use:
# sample data
x <- c("1","1","1/2","2","3","4","5/4","5")
df <- data.frame(x)
library(tidyr)
# separate x into two columns, with second NA if only one number
df <- separate(df, x, c('x1', 'x2'), sep = '/', remove = FALSE, convert = TRUE)
Now df looks like:
> df
    x x1 x2
1   1  1 NA
2   1  1 NA
3 1/2  1  2
4   2  2 NA
5   3  3 NA
6   4  4 NA
7 5/4  5  4
8   5  5 NA
Now the loop can be a lot simpler:
period <- 1
for(i in 1:nrow(df)){
  period <- c(period,
              # test if either x1 or x2 of row i are in any x1 or x2 above it
              ifelse(any(df[i, 2:3] %in% unlist(df[1:(i-1),2:3])),
                     period[i], # if so, repeat the terminal value
                     period[i] + 1)) # else append the terminal value + 1
}
# rebuild df with x and period, which loses its extra initializing value here
df <- data.frame(x = df$x, period = period[2:length(period)])
The resulting df:
> df
    x period
1   1      1
2   1      1
3 1/2      1
4   2      1
5   3      2
6   4      3
7 5/4      3
8   5      3
I have the following code that I execute using a for loop. Is there a way to accomplish the same without a for loop?
first_list <- c(1,2,3, rep(1,5), rep(2,5), rep(3,5), rep(4,5))
print(first_list)
 [1] 1 2 3 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4
breaks <- c(rep(1,3), rep(5,4))
values <- vector()
i <- 1
prev <- 1
for (n in breaks){
values[i] <- sum(first_list[prev:sum(breaks[1:i])])
i <- i + 1
prev <- prev + n
}
print(values)
[1] 1 2 3 5 10 15 20
The purpose of the loop is to keep the first three elements of the list as they are, then append the sums of the next four blocks of five elements.
You can use tapply for a grouped operation:
tapply(first_list, rep(1:length(breaks), breaks), sum)
or, preferably, using data.table:
library(data.table)
data.table(first_list, id=rep(1:length(breaks), breaks))[, sum(first_list), id]$V1
If you have to perform it on your data as in your original post:
setDT(mydata)
mydata[, id:=rep(1:length(breaks), breaks),][, sum(Freq), by=id]
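For reference, a sketch of what the grouping vector looks like, together with a base-R variant that avoids grouping altogether by differencing the cumulative sum at the block ends. The names id, by_group, ends and by_cumsum are illustrative.

```r
first_list <- c(1,2,3, rep(1,5), rep(2,5), rep(3,5), rep(4,5))
breaks <- c(rep(1,3), rep(5,4))

# Block label for every element: 1, 2, 3, then 4 to 7 repeated five times each
id <- rep(seq_along(breaks), breaks)

# Grouped sum, as in the tapply call above
by_group <- as.vector(tapply(first_list, id, sum))

# Alternative: the cumulative sum at each block end, differenced
ends <- cumsum(breaks)
by_cumsum <- diff(c(0, cumsum(first_list)[ends]))

by_group
#[1]  1  2  3  5 10 15 20
by_cumsum
#[1]  1  2  3  5 10 15 20
```

Both variants assume that sum(breaks) equals length(first_list).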