I have vectors in R containing a lot of zeros and a few non-zero numbers. Each vector starts with a non-zero number.
For example <1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0>
I would like to set all of the zeros equal to the most recent non-zero number.
I.e. this vector would become <1,1,1,1,1,1,2,2,2,2,2,2,4,4,4,4>
I need to do this for about 100 vectors containing around 6 million entries each. Currently I am using a for loop:
for(k in 1:length(vector)){
  if(vector[k] == 0){
    vector[k] <- vector[k-1]
  }
}
Is there a more efficient way to do this?
Thanks!
One option would be to replace those 0s with NA, then use zoo::na.locf:
x <- c(1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)
x[x == 0] <- NA
zoo::na.locf(x) ## you possibly need: `install.packages("zoo")`
# [1] 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4
Thanks to Richard for showing me how to use replace,
zoo::na.locf(replace(x, x == 0, NA))
You could try this:
k <- c(1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)
k[which(k != 0)[cumsum(k != 0)]]
Here is another case, where the non-zero values are not increasing and so cummax would not be appropriate:
k <- c(1,0,0,0,0,0,2,0,0,0,0,0,1,0,0,0)
k[which(k != 0)[cumsum(k != 0)]]
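To make the cummax remark concrete, here is a small base R sketch comparing the two approaches on the second vector. The names k2, with_cummax and with_index are only illustrative.

```r
# Non-zero values that are NOT increasing: 1, 2, then 1 again
k2 <- c(1,0,0,0,0,0,2,0,0,0,0,0,1,0,0,0)

# cummax carries the running maximum forward, so the final run stays at 2
with_cummax <- cummax(k2)
# 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2

# The index-based approach carries the most recent non-zero value forward
with_index <- k2[which(k2 != 0)[cumsum(k2 != 0)]]
# 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1
```

cummax only gives the right answer when every non-zero value is at least as large as the one before it.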
Logic:
I keep track of the indices of the non-zero elements with which(k != 0). Let's call this vector x, so x = c(1, 7, 13).
Next I "sample" this new vector. How? From k I create a vector that increments every time there is a non-zero element, cumsum(k != 0). Let's call this vector y, so y = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3).
I "sample" from vector x with x[y], i.e. I take the first element of x 6 times, then the second element 6 times, and the third element 4 times. Let's call this vector z, so z = c(1, 1, 1, 1, 1, 1, 7, 7, 7, 7, 7, 7, 13, 13, 13, 13).
Finally I "sample" from vector k with k[z], i.e. I take the 1st element 6 times, then the 7th element 6 times, then the 13th element 4 times.
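The same steps can be run one at a time, which may make the one-liner easier to follow. This is just the logic above unrolled, using the first example vector:

```r
k <- c(1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)

x <- which(k != 0)    # positions of the non-zero entries: 1 7 13
y <- cumsum(k != 0)   # which non-zero "block" each position belongs to
z <- x[y]             # position of the most recent non-zero entry
filled <- k[z]        # value found at that position

filled
# 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4
```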
Adding to 李哲源's answer:
If the leading NAs need to be replaced with the nearest non-NA value, and the other NAs with the last non-NA value, the code can be:
x <- c(0,0,1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)
zoo::na.locf(zoo::na.locf(replace(x, x == 0, NA),na.rm=FALSE),fromLast=TRUE)
# you possibly need: `install.packages("zoo")`
# [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4
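If you would rather not depend on zoo, the same result can probably be had in base R by combining this with the which/cumsum idea. The helper name fill_zeros is made up for this sketch, and it assumes the vector contains no NA values.

```r
# Hypothetical helper: carry the last non-zero value forward, and let any
# leading zeros borrow the first non-zero value instead.
fill_zeros <- function(v) {
  nz <- which(v != 0)              # positions of the non-zero entries
  if (length(nz) == 0) return(v)   # nothing to fill from
  idx <- cumsum(v != 0)            # 0 for leading zeros, then 1, 2, ...
  idx[idx == 0] <- 1               # leading zeros point at the first non-zero
  v[nz[idx]]
}

x <- c(0,0,1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)
fill_zeros(x)
# 1 1 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4
```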
I am trying to run a logical or statement across many columns in data.table but I am having trouble coming up with the code. My columns have a pattern like the one shown in the table below. I could use a regular logical vector if needed, but I was wondering if I could figure out a way to iterate across a1, a2, a3, etc. as my actual dataset has many "a" type columns.
Thanks in advance.
library(data.table)
x <- data.table(a1 = c(1, 4, 5, 6), a2 = c(2, 4, 1, 10), z = c(9, 10, 12, 12))
# this works but does not work for lots of a1, a2, a3 colnames
# because code is too long and unwieldy
x[a1 == 1 | a2 == 1 , b:= 1]
# this is broken and returns the following error
x[colnames(x)[grep("a", names(x))] == 1, b := 1]
Error in `[.data.table`(x, colnames(x)[grep("a", names(x))] == 1, `:=`(b, :
i evaluates to a logical vector length 2 but there are 4 rows. Recycling of logical i is no longer allowed as it hides more bugs than is worth the rare convenience. Explicitly use rep(...,length=.N) if you really need to recycle.
Output looks like below:
a1 a2 z b
1: 1 2 9 1
2: 4 4 10 NA
3: 5 1 12 1
4: 6 10 12 NA
Try using a mask:
x$b <- 0
x[rowSums(ifelse(x[, list(a1, a2)] == 1, 1, 0)) > 0, b := 1]
Now imagine you have 100 a columns and they are the first 100 columns in your data table. Then you can select the columns using:
x[rowSums(ifelse(x[, c(1:100)] == 1, 1, 0)) > 0, b := 1]
ifelse(x[, list(a1, a2)] == 1, 1, 0) returns a matrix that has the value 1 wherever there is a 1 in the a columns and 0 elsewhere. rowSums then sums across each row; if a row's sum is > 0, there was a 1 in at least one of the a columns of that row, so I select those rows and set b to 1.
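If the a columns are not conveniently the first N columns, one option is to pick them by name pattern. The sketch below uses a plain data.frame so that it runs without data.table. The names df, a_cols and hit are illustrative, and the regular expression assumes the columns are named "a" followed by digits.

```r
df <- data.frame(a1 = c(1, 4, 5, 6), a2 = c(2, 4, 1, 10), z = c(9, 10, 12, 12))

# Select every column named a1, a2, a3, ... (assumed naming pattern)
a_cols <- grep("^a[0-9]+$", names(df), value = TRUE)

# TRUE for rows where at least one of the selected columns equals 1
hit <- unname(rowSums(df[a_cols] == 1) > 0)

# Mirror the output in the question: 1 where there is a hit, NA otherwise
df$b <- ifelse(hit, 1, NA)
df$b
# 1 NA 1 NA
```

A logical vector like hit should also be usable as the i argument of a data.table, along the lines of x[hit, b := 1], though that part is not run here.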
Hello, I need to find the position of the first negative number in each row of a matrix.
I've tried with match and apply, but it only shows the first:
z<-matrix(c(-3,2,-1,3,2,-2,3,-4,-1),ncol=3)
k<-z<0
h<-apply(k,1,function(x) match(TRUE,k))
i want it to show [1,3,2]
but it shows only the first match of the entire matrix [1,1,1]
We can use max.col with ties.method = "first" to get the position of the first negative number in each row:
max.col(z < 0, ties.method = "first")
#[1] 1 3 1
With apply you could do
apply(z < 0, 1, which.max)
Both of these approaches require at least one negative number in the row, or else they return the first index. To avoid that, we can check with rowSums whether there is at least one negative number in the row and then use max.col. Rows with no negative value then get 0.
(rowSums(z < 0) > 0) * max.col(z < 0, ties.method = "first")
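As a quick check of the guard, here is a sketch on a matrix whose second row has no negative entries. The names z2, neg and first_neg are just for illustration.

```r
# Row 2 is (2, 2, 0): no negative values
z2 <- matrix(c(-3, 2, -1, 3, 2, -2, 3, 0, -1), ncol = 3)
neg <- z2 < 0

# Without the guard, the all-FALSE row falls back to column 1
max.col(neg, ties.method = "first")
# 1 1 1

# With the guard, rows without a negative value become 0
first_neg <- (rowSums(neg) > 0) * max.col(neg, ties.method = "first")
first_neg
# 1 0 1
```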
An alternative that modifies the OP's attempt:
apply(z<0, 1, function(x) which(x)[1])
# [1] 1 3 1
This has the benefit that when there is no negative value in a row, it returns NA rather than 1.
For example,
z2 <- structure(c(-3, 2, -1, 3, 2, -2, 3, 0, -1), .Dim = c(3L, 3L))
apply(z2<0, 1, function(x) which(x)[1])
[1] 1 NA 1
Edit
It is a bit faster to use the match function:
apply(z<0, 1, function(x) match(TRUE, x))
# [1] 1 3 1
Does anyone know how to remove continuously repeating values? Not just repeating values with unique() function.
So for example, I want:
0,0,0,0,1,1,1,2,2,2,3,3,3,3,2,2,1,2
to become
0,1,2,3,2,1,2
and not just
0,1,2,3
Is there a word to describe this? I'm sure that the solution is out there somewhere and I just can't find it because I don't know the word for it.
Keep a value when its difference from the previous value is not zero (and keep the first one):
x <- c(0,0,0,0,1,1,1,2,2,2,3,3,3,3,2,2,1,2)
x[c(1, diff(x)) != 0]
# [1] 0 1 2 3 2 1 2
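diff only works on numeric input. For character vectors, a similar trick is to compare each element with its predecessor directly. A small sketch, assuming a non-empty vector with no NA values (s and keep are illustrative names):

```r
s <- c("a", "a", "b", "b", "b", "a", "c", "c")

# Always keep the first element, then keep any element that differs
# from the one immediately before it
keep <- c(TRUE, s[-1] != s[-length(s)])

s[keep]
# "a" "b" "a" "c"
```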
v <- c(0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 2, 2, 1, 2)
rle(v)$values
Output:
[1] 0 1 2 3 2 1 2
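rle also records how long each run was, which is handy if you later need the original vector back. A short sketch of what it returns for the same input:

```r
v <- c(0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 2, 2, 1, 2)
r <- rle(v)

r$values    # the collapsed sequence: 0 1 2 3 2 1 2
r$lengths   # how many times each value repeated: 4 3 3 4 2 1 1

# inverse.rle() rebuilds the original vector from the run-length encoding
restored <- inverse.rle(r)
identical(restored, v)
```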
I need a function, which checks for the frequency of values per row in a df, then checks whether one of the values appears 6 or more times, and if so, displays this value in a new column. If not, writes "nope" in the same new column instead.
In the example below: The values in the rows are either 1, 2, or 3. So if one of the values 1,2,or3 appears 6 or more times per row, whichever value that is (1,2,or3) has to appear in a new column. If none of the values appear 6 or more times per row, the value in that same new column should be "nope".
example
Try applying the table function for each row using
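make_count_col <- function(x) {
  # Tabulate each row separately, so that rows which are missing some
  # of the values do not break the result
  x$newcolumn <- apply(x, 1, function(row) {
    cnt <- table(row)
    # "nope" unless the most frequent value appears at least 6 times
    if (max(cnt) < 6) "nope" else names(cnt)[which.max(cnt)]
  })
  x
}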
Your example replicated
x <- as.data.frame(matrix(c(1, 2, 1, 2, 2, 2, 2, 2, 3,
2, 3, 1, 1, 3, 2, 1, 1, 3), nrow = 2, byrow = T))
colnames(x) <- paste0('svo', 1:9)
make_count_col(x)
  svo1 svo2 svo3 svo4 svo5 svo6 svo7 svo8 svo9 newcolumn
1    1    2    1    2    2    2    2    2    3         2
2    2    3    1    1    3    2    1    1    3      nope
Given a vector:
eg.:
a = c(1, 2, 2, 4, 5, 3, 5, 3, 2, 1, 5, 3)
Using a[a%in%a[duplicated(a)]] I can remove values that are not duplicated. However, this only removes values that are present exactly once.
How would I go about removing all values that are present fewer than three times (or some other threshold, in other situations)?
The expected result would be:
2 2 5 3 5 3 2 5 3
with 1 and 4 removed, as they are only present twice and once, respectively.
You can do this in one line with the ave function:
a[ave(a, a, FUN=length) >= 3]
# [1] 2 2 5 3 5 3 2 5 3
The call to ave(a, a, FUN=length) returns, for each element a[i] in vector a, the total number of times a[i] appears within a. Then you can subset a, limiting to the indices where the total number of times is 3 or more.
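To see the intermediate step, and to make the threshold adjustable for "other situations", something like the following could work. The function name keep_frequent and its min_count argument are invented for this sketch.

```r
a <- c(1, 2, 2, 4, 5, 3, 5, 3, 2, 1, 5, 3)

# For each element, the number of times its value occurs in a
counts <- ave(a, a, FUN = length)
counts
# 2 3 3 1 3 3 3 3 3 2 3 3

# Hypothetical wrapper: keep only values occurring at least min_count times
keep_frequent <- function(v, min_count) {
  v[ave(v, v, FUN = length) >= min_count]
}

keep_frequent(a, 3)
# 2 2 5 3 5 3 2 5 3
```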
Reasonably straightforward (longer than using ave but possibly more comprehensible):
x <- c(1,2,2,4,5,3,5,3,2,1,5,3)
tt <- table(x) ## tabulate
## find relevant values
ttr <- as.numeric(names(tt)[tt>=3])
x[x %in% ttr] ## subset