I have a list of two items. I would like to set a condition using the if statement. The condition needs to contain the all statement as well.
In other words,
Suppose I have the following list:
library(VineCopula)
x <- BiCop(0,0)
y <- BiCop(0,0)
z <- list(x, y)
I would like a condition that says: if every z[[i]]$tau lies within a specific range (for example, between 0 and 0.15), then z must be set to zero.
Here is my code. Note that the length of my list is not fixed: I use two elements here, but my real data has more than two.
for (i in seq_len(m)){
if (all( 0 <= z[[i]]$tau =< 0.15))
z <- 0
}
How to do this in R?
Extract the tau element from each list element, wrap the logical condition in all() so that it returns a single TRUE/FALSE, and use that in if. If it is TRUE, loop over z and set each tau to 0:
tau1 <- sapply(z, "[[", "tau")
i1 <- all(tau1 >= 0 & tau1 <= 0.15)
if(i1) {
z <- lapply(z, function(x) {x$tau <- 0; x})
}
The syntax 0 <= z[[i]]$tau =< 0.15 is mathematical notation, not valid R: the two comparisons have to be written as separate expressions joined with & (and the operator is <=, not =<). Also, because the check covers all the list elements, a for-loop version needs two loops: the first checks whether every element meets the condition, and the second does the assignment if the first returned TRUE, i.e.
i1 <- TRUE
# first loop: check the condition on every element
for (i in seq_along(z)) {
  i1 <- i1 & z[[i]]$tau >= 0 & z[[i]]$tau <= 0.15
}
# second loop: assign only if every element passed the check
if (i1) {
  for (i in seq_along(z)) {
    z[[i]]$tau <- 0
  }
}
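For readers without VineCopula installed, the same check-then-assign logic can be sketched with plain lists standing in for the BiCop objects. The helper name zero_tau_if_all_in_range, its lower/upper arguments, and the sample data are hypothetical and only serve the illustration:

```r
# Hypothetical helper: set every tau to 0 only if ALL taus lie in [lower, upper].
# Plain lists with a `tau` element stand in for BiCop objects (an assumption).
zero_tau_if_all_in_range <- function(z, lower = 0, upper = 0.15) {
  # pull out one numeric tau per list element
  taus <- vapply(z, function(el) el$tau, numeric(1))
  # all() collapses the element-wise comparison to a single TRUE/FALSE
  if (all(taus >= lower & taus <= upper)) {
    z <- lapply(z, function(el) { el$tau <- 0; el })
  }
  z
}

inside <- list(list(tau = 0.05), list(tau = 0.10), list(tau = 0.15))
mixed  <- list(list(tau = 0.05), list(tau = 0.40))

res_inside <- zero_tau_if_all_in_range(inside)  # every tau becomes 0
res_mixed  <- zero_tau_if_all_in_range(mixed)   # returned unchanged
```

Using vapply with numeric(1) rather than sapply makes the function fail loudly if an element has no single numeric tau.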
Consider the below MWE that splits a distance matrix and attempts to compare partitions:
set.seed(1234) # set random seed for reproducibility
# generate random normal variates
x <- rnorm(5)
y <- rnorm(5)
df <- data.frame(x, y) # merge vectors into dataframe
d <- dist(x) # generate distance matrix
splt <- split(d, 1:5) # split the 10 distances into 5 partitions of 2 values each
# compare partitions
for (i in 1:length(splt)) {
for (j in 1:length(splt)) {
if(splt[[i]] != splt[[j]]) {
a <- length(which(splt[[i]] >= min(splt[[j]]))) / length(splt[[i]])
b <- length(which(splt[[j]] <= max(splt[[i]]))) / length(splt[[j]])
}
}
}
# Error in if (splt[[i]] != splt[[j]]) { : the condition has length > 1
The above for loop should compare all unique pairs of partitions (i.e., (1, 2), (1, 3), ..., (4, 5)). However, the condition has length greater than 1, which raises the error.
The result for comparing partition 1 (splt[[1]]) and partition 2 (splt[[2]]), for instance, should be a = b = 1.
a <- length(which(splt[[1]] >= min(splt[[2]]))) / length(splt[[1]])
b <- length(which(splt[[2]] <= max(splt[[1]]))) / length(splt[[2]])
I know the solution is to instead use ifelse() but there is no else within the nested loop.
Any ideas on how to proceed?
Is your problem the error message? That is, why R does not like your comparison splt[[i]] != splt[[j]]? The reason is that we get a vector of comparisons rather than a single TRUE/FALSE:
> splt[[1]] != splt[[2]]
[1] TRUE TRUE
If I understand you correctly, splt[[i]] is equal to splt[[j]] if all entries are equal and different otherwise. If so, change the comparison to be !(all(splt[[i]] == splt[[j]])).
The full loop looks like this:
for (i in 1:length(splt)) {
for (j in 1:length(splt)) {
if (!(all(splt[[i]] == splt[[j]]))) {
a <- length(which(splt[[i]] >= min(splt[[j]]))) / length(splt[[i]])
b <- length(which(splt[[j]] <= max(splt[[i]]))) / length(splt[[j]])
}
}
}
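Note that the double loop above visits every ordered pair, including (j, i) after (i, j), and it overwrites a and b on each pass. If only the unique pairs are wanted and the results should be kept, one possible sketch uses combn to enumerate the pairs. The variable names pairs and res are made up here, and as.numeric() is applied to the dist object so that split works on a plain vector:

```r
set.seed(1234)
d <- as.numeric(dist(rnorm(5)))     # the 10 pairwise distances as a plain vector
splt <- split(d, 1:5)               # 5 partitions of 2 values each

# each column of `pairs` is one unique pair (i, j) with i < j
pairs <- combn(seq_along(splt), 2)

res <- t(apply(pairs, 2, function(p) {
  si <- splt[[p[1]]]
  sj <- splt[[p[2]]]
  c(i = p[1], j = p[2],
    a = mean(si >= min(sj)),        # same as length(which(...)) / length(si)
    b = mean(sj <= max(si)))
}))
```

Each row of res then holds one pair together with its a and b, so nothing is overwritten between iterations.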
I'd like to build a function that takes a vector as an argument and returns the largest number of consecutive zeros. For instance:
a <- c(0,0,1,1,0)
b <- c(0,1,3,10,0,0,0)
x <- count_max_consecutive_zeros(a)
y <- count_max_consecutive_zeros(b)
Should result in x=2 and y=3. I could go for the obvious solution and make a loop:
count_max_consecutive_zeros <- function(x) {
  max_count <- 0
  current_count <- 0
  for (i in seq_along(x)) {
    if (x[i] == 0) {
      current_count <- current_count + 1
      # update the maximum while the run grows, so a trailing run is counted too
      if (current_count > max_count) {
        max_count <- current_count
      }
    } else {
      current_count <- 0
    }
  }
  max_count
}
This solution is fine for short vectors. However, I'll have to use this function thousands of times on vectors that are tens of thousands of entries long, so I'm afraid I'll run into performance issues. Is there a vectorized function equivalent to count_max_consecutive_zeros?
You can use rle and max to count the largest number of consecutive zeros:
x <- rle(a==0)
max(x$lengths[x$values == TRUE])
#[1] 2
An option with rleid
library(data.table)
max(tapply(a[a==0], rleid(a)[a == 0], FUN = length))
#[1] 2
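The rle approach can be wrapped into the function the question asks for. The sketch below is one way to do that; it assumes the input contains no missing values, and it returns 0 for a vector without any zeros (where a bare max() on an empty selection would give -Inf with a warning):

```r
# Sketch of a vectorized count_max_consecutive_zeros built on rle().
# Assumes x has no NA values.
count_max_consecutive_zeros <- function(x) {
  runs <- rle(x == 0)                      # runs of TRUE (zero) / FALSE (non-zero)
  zero_runs <- runs$lengths[runs$values]   # keep only the lengths of the zero runs
  if (length(zero_runs) == 0) 0L else max(zero_runs)
}

a <- c(0, 0, 1, 1, 0)
b <- c(0, 1, 3, 10, 0, 0, 0)
x <- count_max_consecutive_zeros(a)  # 2
y <- count_max_consecutive_zeros(b)  # 3
```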
I've got the following problem from my lecturer:
Make a vector which, for each time y takes the value 5, shows the
cumulated sum of x values from the preceding position where y took the
value 5. (Hint: use the commands cumsum() and diff().)
I'm not really sure what format the resulting vector is supposed to have, so I propose two functions that give the two different answers I can think of. Furthermore, I don't understand how I'm supposed to use diff(), so I used a for-loop to tackle the problem instead:
# Creates random vectors
x <- round(runif(100,0,10))
y <- round(runif(100,0,10))
# X_ and Y_ are arbitrary vectors of length n; value is the value searched for in Y_ (5 in the exercise).
# index stores the positions at which Y_ equals value.
# Returns the running sums within each segment: (x_1, x_1 + x_2, ..., x_1 + ... + x_index[1], x_(index[1]+1), ...)
test.cumsum <- function(X_, Y_, value) {
  index <- which(Y_ == value)
  for (i in 1:length(index)) {
    if (i == 1) {
      z <- cumsum(X_[1:index[1]])   # use the argument X_, not the global x
    }
    if (i == length(index)) {
      break
    } else {
      z <- c(z, cumsum(X_[(index[i] + 1):index[i + 1]]))
    }
  }
  z
}
# Returns one total per segment: (x_1 + ... + x_index[1], x_(index[1]+1) + ... + x_index[2], ...)
test.sum <- function(X_, Y_, value) {
  index <- which(Y_ == value)
  for (i in 1:length(index)) {
    if (i == 1) {
      z <- sum(X_[1:index[1]])   # use the argument X_, not the global x
    }
    if (i == length(index)) {
      break
    } else {
      z <- c(z, sum(X_[(index[i] + 1):index[i + 1]]))
    }
  }
  z
}
Is there any easier way to do this?
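One possible reading of the hint, sketched below, is to take the running total cumsum(x) at the positions where y equals 5 and then difference those totals with diff(). This assumes the intended result is one sum per segment (the same output as test.sum above), with the x value at each matching position included in the segment that ends there. The helper name segment_sums and the small example vectors are made up for the illustration:

```r
# Hypothetical helper: sum of x between successive positions where y == value
segment_sums <- function(x, y, value = 5) {
  idx <- which(y == value)        # positions where y takes the value
  totals <- cumsum(x)[idx]        # running total of x at those positions
  diff(c(0, totals))              # differences of the totals = per-segment sums
}

x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, 5, 1, 5, 5, 2)
s <- segment_sums(x, y)           # 3 7 5, i.e. 1+2, 3+4, 5
```

If y never takes the value, idx is empty and the function returns an empty numeric vector rather than failing.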
I'm running a simple ifelse function
f <- function(x) {
ifelse(x==shift(x), x + 0.001* sd(x, na.rm = TRUE), x)
}
where shift is from the data.table package. Applied to each column of a data frame (using apply), it lets me change any value that is exactly the same as the previous one. The problem is that the ifelse function returns a result whose length is equal to the length of the test. In this case, the length is that of shift(x) and not of x. Therefore I end up with the first element (or the last, if using type = "lead" instead of the default "lag") of each column turned into NA.
Here is a MWE:
library(data.table) # provides shift()
a <- c(1,2,2,3,4,5,6)
b <- c(4,5,6,7,8,8,9)
data <- data.frame(cbind(a,b))
f <- function(x) {
ifelse(x==shift(x), x + 0.001* sd(x, na.rm = TRUE), x)
}
apply(data, 2, f)
Therefore I thought I could change the ifelse function itself. I've made a few attempts to change the length.out, but I haven't succeeded yet. This is its source:
function (test, yes, no)
{
if (is.atomic(test)) {
if (typeof(test) != "logical")
storage.mode(test) <- "logical"
if (length(test) == 1 && is.null(attributes(test))) {
if (is.na(test))
return(NA)
else if (test) {
if (length(yes) == 1 && is.null(attributes(yes)))
return(yes)
}
else if (length(no) == 1 && is.null(attributes(no)))
return(no)
}
}
else test <- if (isS4(test))
methods::as(test, "logical")
else as.logical(test)
ans <- test
ok <- !(nas <- is.na(test))
if (any(test[ok]))
ans[test & ok] <- rep(yes, length.out = length(ans))[test &
ok]
if (any(!test[ok]))
ans[!test & ok] <- rep(no, length.out = length(ans))[!test &
ok]
ans[nas] <- NA
ans
}
EDIT
My original code was:
copy <- copy(data)
for (j in 1: ncol(copy)) {
for (i in 2: nrow(copy)) {
if (copy[i,j] == copy[i-1,j] & !is.na(copy[i,j]) & !is.na(copy[i-1,j])) {
copy[i,j] <- copy[i-1,j] + (0.0001*sd(copy[,j], na.rm = T))
}
}
}
This works, and it also deals with values repeated several times in a row, but it can be slow on large matrices.
The goal was to get to a quicker, vectorised method using a function and apply.
As you mention, your approach leads to an NA in the first element of the vector returned by f. The first element has no previous element to be compared with, so we would like it to stay unchanged.
A straightforward approach is to do just that. Apologies, it does not answer your title question, although it does solve your problem.
f <- function(x) {
# storing the output of ifelse in a variable
out <- ifelse(x==shift(x), x + 0.001* sd(x, na.rm = TRUE), x)
# changing the first element of `out` into first element of x
out[1] <- x[1]
# returning `out` -- in a R function,
# the last thing evaluated is returned
out
}
Note that this will not properly handle elements repeated more than twice in a row (e.g. c(1,2,2,2,3)). Also, it changes all your elements by the same amount. So in c(1,2,2,1,2,2), both repeated twos will be changed to the same value. This may or may not be something you want.
You could hack something (a comment suggests ?rle), but I suggest changing the way you randomize your data, if this makes sense with your particular data.
Instead of adding 0.001*sd, maybe you could add Gaussian noise with a small standard deviation? This depends on your application, obviously.
f <- function(x) {
# adding gaussian noise with small sd to repeated values
# storing the output in a variable `out`
out <- ifelse(x==shift(x),
x + rnorm(length(x), mean=0,
sd=0.01*sd(x, na.rm = TRUE)),
x)
# changing the first element of `out` into first element of x
out[1] <- x[1]
# returning `out` -- in a R function,
# the last thing evaluated is returned
out
}
It depends on what your purpose is for getting rid of exactly duplicated values.
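As a side note on where the NA comes from: shift(x) has the same length as x, but its first element is NA, so the comparison x == shift(x) is NA there and ifelse passes that NA through. A base R sketch that avoids the problem builds the "same as previous" flag directly and treats the first element, and any comparison involving NA, as "not a repeat". The name same_as_prev is made up, and the function assumes a non-empty numeric vector:

```r
# Sketch: nudge values that equal their predecessor, leaving the first element alone.
# Assumes x is a non-empty numeric vector.
f <- function(x) {
  # compare x[2..n] with x[1..n-1]; the first element has no predecessor
  same_as_prev <- c(FALSE, x[-1] == x[-length(x)])
  same_as_prev[is.na(same_as_prev)] <- FALSE   # comparisons with NA count as "different"
  ifelse(same_as_prev, x + 0.001 * sd(x, na.rm = TRUE), x)
}

data <- data.frame(a = c(1, 2, 2, 3, 4, 5, 6),
                   b = c(4, 5, 6, 7, 8, 8, 9))
out <- apply(data, 2, f)
```

Like the versions above, this shifts every repeat by the same amount, so a run of three equal values still ends with two equal values.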
I am not sure I quite understand how to use an if and else statement within a for loop. My code looks like this:
> X
[1] 1 0 1 1 1
A=0
for (i in 1:5){
if (X[i]=1)
A=A+1
}
else{
A=A
}
For example, in this case there are 4 '1's in the vector X. For every 1 in the vector I want to add 1 to the value of A, so A should equal 4.
As others have mentioned: = is for assignment (similar to <-) and == is to compare for equality.
Your final loop should be:
for(i in X){
if (i == 1) A <- A + 1
}
You don't need the else clause because it doesn't really do anything, just slows down the loop slightly.
R is vectorized though so use that fact. It is more idiomatic to say:
sum(X)
if you want to count them all up or look at apply and lapply for more complex situations, depending on the context of what you are doing.
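Since sum(X) only gives the count when X contains nothing but 0s and 1s, a slightly more general sketch compares first and sums the resulting logical vector (TRUE counts as 1):

```r
X <- c(1, 0, 1, 1, 1)

# X == 1 is a logical vector; summing it counts the TRUE entries
A <- sum(X == 1)   # 4
```

This gives the same A as the loop, and it still works if X holds values other than 0 and 1.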
X <- c(1, 0, 1, 1, 1)
A <- 0
for (i in X){
if (i == 1)
A <- A + 1
else A <- A + 0
}
This is what you are after:
for (i in 1:5){
if (X[i]==1){
A=A+1
}else{
A=A
}
}
Notice that the double = sign in X[i]==1 gives you a logical test (does X[i] equal 1?), whereas the single = sign in your example is the assignment operator.
The comparison should be (X[i]==1), and the way you are using the if statement is fine. But your braces do not match up (there is a closing brace after the if without an opening one), which is messing up your code.
It should look like this:
X <- c(1,0,1,1,1)
A=0
for (i in 1:5){
if (X[i]==1)
A=A+1
else
A=A
}
or like this
X <- c(1,0,1,1,1)
A=0
for (i in 1:5){
if (X[i]==1){
A=A+1
}
else{
A=A
}
}
If you have just one instruction within an if statement you don't need the braces, but if two or more instructions should run for a certain condition, the braces are necessary.
The indentation is there only to help the visualisation.
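One detail worth knowing about the brace-free and "else on its own line" forms: they parse because they sit inside the braces of the for loop. At the top level of a script or at the console, an else that starts a new line is a syntax error, because R treats the if as already complete. The sketch below checks this with parse() on code held in strings; the variable names are made up:

```r
# The same if/else, once at top level and once wrapped in braces
top_level <- "if (TRUE) 1\nelse 2"
in_braces <- "{\n  if (TRUE) 1\n  else 2\n}"
same_line <- "if (TRUE) {\n  1\n} else {\n  2\n}"

# try() captures the parse error instead of stopping the script
top_level_fails  <- inherits(try(parse(text = top_level), silent = TRUE), "try-error")
in_braces_parses <- !inherits(try(parse(text = in_braces), silent = TRUE), "try-error")
same_line_parses <- !inherits(try(parse(text = same_line), silent = TRUE), "try-error")
```

Putting else on the same line as the closing brace of the if block, as in same_line, works in both situations.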