Count values of same kind in a row and omit interruptions - r

My actual problem is that I want to count the lengths of runs of identical values in my vector, for example:
v <- c(1,1,1,1,2,1,1,3,3,3,1,1,2,2,2)
Additionally, I want to omit all interruptions of length 1.
How can I get the following result?
1,1,1,1,1,1,1,3,3,3,1,1,2,2,2
Note that the single 2 should now turn into a 1, so that with
v_new <- c(1,1,1,1,1,1,1,3,3,3,1,1,2,2,2)
rle(v_new)
lengths: int [1:4] 7 3 2 3
values : num [1:4] 1 3 1 2
Thanks,
Mike

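v <- c(1,1,1,1,2,1,1,3,3,3,1,1,2,2,2)
# positions of single-value local peaks (higher than both neighbours)
local.peak <- which(diff(sign(diff(v))) == -2) + 1
# overwrite each peak with the value to its left
v[local.peak] <- v[local.peak - 1]
v
# [1] 1 1 1 1 1 1 1 3 3 3 1 1 2 2 2
The local peak detection is taken from "Finding local maxima and minima". Note that it only catches single values that are larger than both neighbours; a single-value dip such as 3,3,1,3,3 is left unchanged.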

Below is a little function that replaces values that occur not more than one time in a row with either the value to the left or right of it.
Your input
v <- c(1,1,1,1,2,1,1,3,3,3,1,1,2,2,2)
fun(v)
# [1] 1 1 1 1 1 1 1 3 3 3 1 1 2 2 2
Modified input
v <- c(1,1,1,1,2,4,4,3,3,3,1,1,2,2,2)
#                ^ ^
Usage
fun(v, align = "right")
# [1] 1 1 1 1 4 4 4 3 3 3 1 1 2 2 2
Default is left aligned
fun(v)
# [1] 1 1 1 1 1 4 4 3 3 3 1 1 2 2 2
function
fun <- function(x, align = c("left", "right")) {
  align <- match.arg(align)
  rle_x <- rle(x)
  # mark runs of length 1 as NA so approx() fills them from a neighbour
  rle_x$values <- with(rle_x, replace(values, lengths == 1, NA))
  switch(align,
    left = approx(inverse.rle(rle_x), xout = seq_along(x), method = "constant", f = 0)$y,
    right = approx(inverse.rle(rle_x), xout = seq_along(x), method = "constant", f = 1)$y)
}
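The same idea can also be sketched without approx(), working directly on the run-length encoding. This is only a minimal sketch of the left-aligned case: the name smooth_single_runs is made up for illustration, and it assumes a single value at the very start of the vector should be left untouched because it has no left neighbour.

```r
# Hypothetical helper (not from the answers above): replace every run of
# length 1 with the value of the run to its left, then rebuild the vector.
smooth_single_runs <- function(x) {
  r <- rle(x)
  single <- which(r$lengths == 1)
  single <- single[single > 1]  # a leading single value has no left neighbour
  for (i in single) {
    r$values[i] <- r$values[i - 1]
  }
  inverse.rle(r)
}

v <- c(1,1,1,1,2,1,1,3,3,3,1,1,2,2,2)
v_new <- smooth_single_runs(v)
rle(v_new)$lengths
```

Because the loop runs left to right, several single values in a row all end up with the value of the last longer run before them.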

Related

Create dummy-column based on another columns

Let's say I have this dataset
> example <- data.frame(a = 1:10, b = 10:1, c = 1:5 )
I want to create a new variable d that is 1 when at least one of the variables a, b, c contains the value 1, 2 or 3, and 0 otherwise.
d should look like this:
d <- c(1, 1, 1, 0, 0, 1, 1, 1, 1, 1)
Thanks in advance.
You can use rowSums to count, for each row, how often 1, 2 or 3 appears, test whether that count is positive, and wrap the result in as.integer to convert it to 0 and 1, i.e.
as.integer(rowSums(example == 1 | example == 2 | example == 3) > 0)
#[1] 1 1 1 0 0 1 1 1 1 1
This works for any number of variables:
example <- data.frame(a = 1:10, b = 10:1, c = 1:5 )
x <- c(1, 2, 3)
# test each column against x first, then OR the logical vectors together
as.integer(Reduce(`|`, lapply(example, `%in%`, x)))
With the dplyr package:
library(dplyr)
x <- 1:3
example %>% mutate(d = as.integer(a %in% x | b %in% x | c %in% x))
Two other possibilities which work with any number of columns:
#option 1
example$d <- +(rowSums(sapply(example, `%in%`, 1:3)) > 0)
#option 2
library(matrixStats)
example$d <- rowMaxs(+(sapply(example, `%in%`, 1:3)))
which both give:
> example
    a  b c d
1   1 10 1 1
2   2  9 2 1
3   3  8 3 1
4   4  7 4 0
5   5  6 5 0
6   6  5 1 1
7   7  4 2 1
8   8  3 3 1
9   9  2 4 1
10 10  1 5 1
You can also do this with apply (although it is a little slow).
Logic: any checks whether a 1, 2 or 3 is present; apply runs that check on each row. Finally, adding +0 converts the logical result to numeric (you may use as.numeric instead if you want to be more explicit).
d <- apply(example,1 ,function(x)any(x==1|x==2|x==3))+0
To restrict the check to certain columns, select them first:
d <- apply(example[,c("a","b","c")], 1, function(x)any(x==1|x==2|x==3))+0
This gives you control over which columns to include or ignore.
Output:
> d
[1] 1 1 1 0 0 1 1 1 1 1
general solution:
example %>%
sapply(function(i)i %in% x) %>% apply(1,any) %>% as.integer
#[1] 1 1 1 0 0 1 1 1 1 1
Try this method, which checks whether any column contains at least one element of x.
x<-c(1,2,3)
example$d<-as.numeric(example$a %in% x | example$b %in% x | example$c %in% x)
example
    a  b c d
1   1 10 1 1
2   2  9 2 1
3   3  8 3 1
4   4  7 4 0
5   5  6 5 0
6   6  5 1 1
7   7  4 2 1
8   8  3 3 1
9   9  2 4 1
10 10  1 5 1

function for, if and three conditions. and there are some NA in my data set

Hello everyone!
Here is my code:
f1 <- function(x){
  if(x > 10){
    z = 1
  } else if (x < 10 && x >0) {
    z=2
  } else {
    z=3}
  return(z)
}
r <- c(NA,NA,NA,NA,1:96)
s <- numeric(100)
for(i in 1:length(s)){
  s[i] <- f1(r)
}
ERROR:
Error in if (x > 10) { : missing value where TRUE/FALSE needed
In addition: Warning message:
In if (x > 10) { :
the condition has length > 1 and only the first element will be used
What I need:
I have a data set 'r'. I need a function or loop that classifies each value of the data set and stores the result in 's'.
's' should look something like this:
3 3 3 3 2 2 2 2 2 2 2 2 2 3 1 1 1 1 1 1....
You can either replace the NAs beforehand, i.e.
r[is.na(r)] <- -1
ifelse(r > 10, 1, ifelse(r <10 & r > 0, 2, 3))
#[1] 3 3 3 3 2 2 2 2 2 2 2 2 2 3 1 1 1 1 1 1 1 1 1 1...
#or include !is.na(r) in every condition, i.e.
ifelse(r > 10 & !is.na(r), 1, ifelse(r <10 & r > 0 & !is.na(r), 2, 3))
#[1] 3 3 3 3 2 2 2 2 2 2 2 2 2 3 1 1 1 1 1 1 1 1 1 1...
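If the loop-style function is preferred, the original code fails for two reasons: f1(r) passes the whole vector instead of the single element r[i], and if() cannot decide on NA. A minimal sketch of a corrected version follows; it assumes, as in the desired output, that NA should be mapped to 3.

```r
# Scalar classifier: 1 if x > 10, 2 if 0 < x < 10, otherwise 3.
# Assumption: NA is treated like "otherwise" and mapped to 3.
f1 <- function(x) {
  if (is.na(x)) {
    return(3)
  }
  if (x > 10) {
    1
  } else if (x < 10 && x > 0) {
    2
  } else {
    3
  }
}

r <- c(NA, NA, NA, NA, 1:96)
# vapply calls f1 once per element, which replaces the explicit for loop
s <- vapply(r, f1, numeric(1))
head(s, 16)
```

The vectorised ifelse version above is usually the more idiomatic choice; this variant only shows how the element-wise function can be repaired.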

R: Iterative deletion of rows with group criteria

I'm trying to delete rows iteratively if they meet two criteria:
the slope column is < 0
the row has the maximum Lfd within its Ring group
Ring <- c(1, 1, 1, 1, 2, 2, 2, 2)
Lfd <- c(1:4, 1:4)
slope <- c(2, 2, -1, -2, 2, -1, 2, -2)
test <- data.frame(Ring, Lfd, slope)
  Ring Lfd slope
1    1   1     2
2    1   2     2
3    1   3    -1
4    1   4    -2
5    2   1     2
6    2   2    -1
7    2   3     2
8    2   4    -2
After the first iteration it should look like
  Ring Lfd slope
1    1   1     2
2    1   2     2
3    1   3    -1
5    2   1     2
6    2   2    -1
7    2   3     2
And after the second like
  Ring Lfd slope
1    1   1     2
2    1   2     2
5    2   1     2
6    2   2    -1
7    2   3     2
I already tried without iteration:
test_out <- test %>%
  group_by(Ring) %>%
  filter(Lfd != which.max(Lfd) & (slope > 0)) %>%
  ungroup
And with iteration:
del.high.neg <- function(x) {
  success <- FALSE
  while (!success) {
    test_out <- test %>%
      group_by(Ring) %>%
      filter(Lfd == which.max(Lfd)) %>%
      select(Ring, Lfd, slope) %>%
      ungroup
    Index <- test_out[test_out$slope < 0, ]
    test_out <- test_out[!(test_out$Ring %in% Index),]
    success <- Index == NULL
  }
  return(x)
}
I think this is what you want - it will delete every negative row from the end of the data, until it hits your first positive value:
library(dplyr)
test %>% group_by(Ring) %>%
  mutate(row = row_number()) %>%
  filter(row <= max(which(slope > 0)))
Source: local data frame [5 x 4]
Groups: Ring [2]
   Ring   Lfd slope   row
  (dbl) (int) (dbl) (int)
1     1     1     2     1
2     1     2     2     2
3     2     1     2     1
4     2     2    -1     2
5     2     3     2     3
You can add a select(-row) if you'd like the row column gone too.
I think you are saying that you want to delete all the rows that have a negative slope and have Lfd that is greater than or equal to the row with the maximum value of Lfd and a non-negative slope. If you want to do that within Ring, you can use the following:
library(plyr)
testmax <- ddply(test,.(Ring),summarize,maxLfd = max(Lfd[slope>=0]))
test1 <- merge(test,testmax)
test_out <- test1[!(test1$Lfd>=test1$maxLfd & test1$slope<0),-4]
test_out
# Ring Lfd slope
# 1 1 1 2
# 2 1 2 2
# 5 2 1 2
# 6 2 2 -1
# 7 2 3 2
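For comparison, the iterative deletion described in the question can also be written literally as a loop in base R. This is only a sketch under the assumption that "max of Lfd within Ring" means the row with the largest remaining Lfd in each group; the function name drop_trailing_negative is made up for illustration.

```r
Ring  <- c(1, 1, 1, 1, 2, 2, 2, 2)
Lfd   <- c(1:4, 1:4)
slope <- c(2, 2, -1, -2, 2, -1, 2, -2)
test  <- data.frame(Ring, Lfd, slope)

# Hypothetical helper: repeatedly drop, per Ring, the row with the largest
# remaining Lfd as long as that row has a negative slope.
drop_trailing_negative <- function(df) {
  repeat {
    # TRUE for the row holding the current maximum Lfd of its Ring group
    is_group_max <- df$Lfd == ave(df$Lfd, df$Ring, FUN = max)
    drop <- is_group_max & df$slope < 0
    if (!any(drop)) break
    df <- df[!drop, , drop = FALSE]
  }
  df
}

test_out <- drop_trailing_negative(test)
test_out
```

On the example data the loop removes rows 4 and 8 in the first pass and row 3 in the second, which leaves the same five rows as the two answers above.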

R function for counting how often a value falls below a particular value

This is proving to be a monster for me, as I have zero experience with R scripts. I have a data frame with 57 columns and 30 rows of data.
Here is what I am trying to do:
1) Go to each column.
2) Count the number of times 2/3/4/5/6/7/8/9 consecutive values are less than -1.
3) Print the result to a text file.
4) Repeat steps 2 and 3 for the second column, and so on.
I looked around, including at this Stack Overflow question:
check number of times consecutive value appear based on a certain criteria
This is one column of my data:
data<-c(-0.996,-1.111,-0.638,0.047,0.694,1.901,2.863,2.611,2.56,2.016,0.929,-0.153,-0.617,-0.143
0.199,0.556,0.353,-0.638,0.347,0.045,-0.829,-0.882,-1.143,-0.869,0.619,0.923,-0.474,0.227
0.394,0.789,1.962,1.132,0.1,-0.278,-0.303,-0.606,-0.705,-0.858,-0.723,-0.081,1.206,2.329
1.863,2.1,1.547,2.026,0.015,-0.441,-0.371,-0.304,-0.668,-0.953,-1.256,-1.185,-0.891,-0.569
0.485,0.421,-0.004,0.024,-0.39,-0.58,-1.178,-1.101,-0.882,0.01,0.052,-0.166,-1.703,-1.048
-0.718,-0.036,-0.561,-0.08,0.272,-0.041,-0.811,-0.929,-0.853,-1.047,0.431,0.576,0.642,1.62
2.324,1.251,1.384,0.195,-0.081,-0.335,-0.176,1.089,-0.602,-1.134,-1.356,-1.203,-0.795,-0.752
-0.692,-0.813,-1.172,-0.387,-0.079,-0.374,-0.157,0.263,0.313,0.975,2.298,1.71,0.229,-0.313
-0.779,-1.12,-1.102,-1.01,-0.86,-1.118,-1.211,-1.081,-1.156,-0.972)
When I run the following code:
for (col in 1:ncol(data)) {
runs <- rle(data[,col])
print(runs$lengths[which(runs$values < -1)])
}
It gives me this:
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
It has counted the number of values < -1 but not the runs. Is there something that I am doing wrong here?
(massive edit)
Fixed data vector (was missing commas):
data <- c(-0.996,-1.111,-0.638,0.047,0.694,1.901,2.863,2.611,2.56,2.016,0.929,-0.153,-0.617,-0.143,
0.199,0.556,0.353,-0.638,0.347,0.045,-0.829,-0.882,-1.143,-0.869,0.619,0.923,-0.474,0.227,
0.394,0.789,1.962,1.132,0.1,-0.278,-0.303,-0.606,-0.705,-0.858,-0.723,-0.081,1.206,2.329,
1.863,2.1,1.547,2.026,0.015,-0.441,-0.371,-0.304,-0.668,-0.953,-1.256,-1.185,-0.891,-0.569,
0.485,0.421,-0.004,0.024,-0.39,-0.58,-1.178,-1.101,-0.882,0.01,0.052,-0.166,-1.703,-1.048,
-0.718,-0.036,-0.561,-0.08,0.272,-0.041,-0.811,-0.929,-0.853,-1.047,0.431,0.576,0.642,1.62,
2.324,1.251,1.384,0.195,-0.081,-0.335,-0.176,1.089,-0.602,-1.134,-1.356,-1.203,-0.795,-0.752,
-0.692,-0.813,-1.172,-0.387,-0.079,-0.374,-0.157,0.263,0.313,0.975,2.298,1.71,0.229,-0.313,
-0.779,-1.12,-1.102,-1.01,-0.86,-1.118,-1.211,-1.081,-1.156,-0.972)
Doing data < -1 gives you a logical vector, and we can count runs of TRUE & FALSE:
runs <- rle(data < -1)
print(runs)
## Run Length Encoding
## lengths: int [1:21] 1 1 20 1 29 2 8 2 4 2 ...
## values : logi [1:21] FALSE TRUE FALSE TRUE FALSE TRUE ...
Then extract the length of only the TRUE runs:
print(runs$lengths[which(runs$values)])
## [1] 1 1 2 2 2 1 3 1 3 4
and, iterate over columns of a data frame as previously shown:
# make a data frame from sampled versions of data
set.seed(1492) # repeatable
df <- data.frame(V1=data,
                 V2=sample(data, length(data), replace=TRUE),
                 V3=sample(data, length(data), replace=TRUE),
                 V4=sample(data, length(data), replace=TRUE))
# do the extraction
for (col in 1:ncol(df)) {
  runs <- rle(df[, col] < -1)
  print(runs$lengths[which(runs$values)])
}
## [1] 1 1 2 2 2 1 3 1 3 4
## [1] 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
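To get from the run lengths to the counts per run length (2 to 9) and a text file, as asked in steps 2 and 3, something like the following sketch may help. The helper name count_runs_below, its parameters, the toy data and the temporary output file are all assumptions made for illustration; the sketch also assumes the columns contain no NA values.

```r
# Hypothetical helper: tally runs of consecutive values below `threshold`,
# giving one count per run length listed in `run_lengths`.
# Runs shorter or longer than the listed lengths are ignored.
count_runs_below <- function(x, threshold = -1, run_lengths = 2:9) {
  runs   <- rle(x < threshold)
  below  <- runs$lengths[which(runs$values)]
  counts <- table(factor(below, levels = run_lengths))
  setNames(as.integer(counts), run_lengths)
}

# Small toy data frame standing in for the real 57-column data
toy <- data.frame(V1 = c(-2, -2, 0, -3, -1.5, -1.2, 0, -4))
toy$V2 <- rev(toy$V1)

# One column of counts per data column, one row per run length
run_counts <- sapply(toy, count_runs_below)

# Write the result as a tab-separated text file (temporary file here)
out_file <- tempfile(fileext = ".txt")
write.table(run_counts, file = out_file, quote = FALSE, sep = "\t")
```

Using a factor with fixed levels keeps run lengths that never occur in the table with a count of zero, so every column yields a vector of the same length.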

Calculating ratio of values with specific labels in data.table

I have a data.table and I need to add an additional column holding the ratio between the number of rows with l == 1 and the number with l == 2 within the same cID. My code computes the ratio, but the result is reduced to one row per group. I need the full table with the ratio repeated on every row. Any suggestions? Thanks in advance!
x           y           l cID
0.03588851  0.081635056 1 1
0.952514891 0.82677373  1 1
0.722920691 0.687278396 1 1
0.772207687 0.743329599 2 1
0.682710551 0.946685728 1 2
0.795816439 0.024320077 2 2
0.50788885  0.106910923 2 2
0.145871035 0.802771467 2 2
0.092942384 0.335054397 1 3
0.439765866 0.199329139 1 4
To reproduce:
library(data.table)
x = c(0.03588851,0.952514891,0.722920691,0.772207687,0.682710551,0.795816439,0.50788885,0.145871035,0.092942384,0.439765866)
y = c(0.081635056,0.82677373,0.687278396,0.743329599,0.946685728,0.024320077,0.106910923,0.802771467,0.335054397,0.199329139)
l = c(1,1,1,2,1,2,2,2,1,1)
cID = c(1,1,1,1,2,2,2,2,3,4)
dt <- data.table(x,y,l,cID)
dt[,sum(l == 1)/sum(l == 2), by = cID]
I need to obtain the ratio column that looks like this
x           y           l cID ratio
0.03588851  0.081635056 1 1   3
0.952514891 0.82677373  1 1   3
0.722920691 0.687278396 1 1   3
0.772207687 0.743329599 2 1   3
0.682710551 0.946685728 1 2   0.333333333
0.795816439 0.024320077 2 2   0.333333333
0.50788885  0.106910923 2 2   0.333333333
0.145871035 0.802771467 2 2   0.333333333
0.092942384 0.335054397 1 3   Inf
0.439765866 0.199329139 1 4   Inf
You were pretty close. Try this:
dt[, ratio := sum(l == 1) / sum(l == 2), by = cID]
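The difference is that := adds the column by reference and repeats the group value on every row instead of collapsing each group to one row. For readers without data.table at hand, roughly the same group-wise broadcast can be sketched in base R with ave(); the data frame name df below is an assumption for illustration.

```r
l   <- c(1, 1, 1, 2, 1, 2, 2, 2, 1, 1)
cID <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 4)
df  <- data.frame(l, cID)

# ave() applies the function per cID group and recycles the single
# result back onto every row of that group
df$ratio <- ave(df$l, df$cID, FUN = function(g) sum(g == 1) / sum(g == 2))
df
```

Groups without any l == 2 rows divide by zero and therefore get Inf, as in the desired output shown in the question.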

Resources