I'm running a simple ifelse function:
f <- function(x) {
ifelse(x==shift(x), x + 0.001* sd(x, na.rm = TRUE), x)
}
where shift is from the data.table package. For each column of a data frame (using apply), it changes any value that is exactly the same as the previous one. The problem is that ifelse returns a vector whose length equals the length of the test. In this case, the length is the one of shift(x) and not x. Therefore the first element of each column (or the last, if using type = "lead" instead of the default "lag") ends up as NA.
Here is an MWE:
library(data.table)  # provides shift()
a <- c(1,2,2,3,4,5,6)
b <- c(4,5,6,7,8,8,9)
data <- data.frame(cbind(a,b))
f <- function(x) {
ifelse(x==shift(x), x + 0.001* sd(x, na.rm = TRUE), x)
}
apply(data, 2, f)
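A hedged aside on where the NA actually comes from: ifelse() returns a vector as long as test, and shift(x) already has the same length as x, so the lengths match. The NA appears because the first element of shift(x) is its fill value, NA, which makes the first comparison NA. The base R sketch below reproduces this without data.table; lag1 is a hypothetical stand-in for shift(x) with the default type = "lag".

```r
x <- c(1, 2, 2, 3)
# hypothetical stand-in for data.table::shift(x): previous value, NA in first position
lag1 <- c(NA, x[-length(x)])
test <- x == lag1          # NA FALSE TRUE FALSE: any comparison with NA is NA
out <- ifelse(test, x + 0.5, x)
length(out) == length(x)   # TRUE: the output already has the length of x
out                        # NA 2.0 2.5 3.0: the NA comes from the NA in test
```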
Therefore I thought I could change the ifelse function itself. I've made a few attempts at changing the length.out in its source (below), but I haven't succeeded yet:
function (test, yes, no)
{
    if (is.atomic(test)) {
        if (typeof(test) != "logical")
            storage.mode(test) <- "logical"
        if (length(test) == 1 && is.null(attributes(test))) {
            if (is.na(test))
                return(NA)
            else if (test) {
                if (length(yes) == 1 && is.null(attributes(yes)))
                    return(yes)
            }
            else if (length(no) == 1 && is.null(attributes(no)))
                return(no)
        }
    }
    else test <- if (isS4(test))
        methods::as(test, "logical")
    else as.logical(test)
    ans <- test
    ok <- !(nas <- is.na(test))
    if (any(test[ok]))
        ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]
    if (any(!test[ok]))
        ans[!test & ok] <- rep(no, length.out = length(ans))[!test & ok]
    ans[nas] <- NA
    ans
}
EDIT
My original code was:
copy <- copy(data)
for (j in 1:ncol(copy)) {
    for (i in 2:nrow(copy)) {
        if (copy[i,j] == copy[i-1,j] & !is.na(copy[i,j]) & !is.na(copy[i-1,j])) {
            copy[i,j] <- copy[i-1,j] + (0.0001*sd(copy[,j], na.rm = T))
        }
    }
}
However, it may run slowly on large matrices. It does handle values repeated more than twice.
The goal was to get a vectorised, faster method using a function and apply.
As you mention, your approach leads to an NA in the first element of the vector returned by f. The first element has no previous value to compare against, so it cannot be a repeat, and we would like to leave it unchanged.
A straightforward approach is to do just that. Apologies: it does not answer the question in your title, although it does solve your problem.
f <- function(x) {
# storing the output of ifelse in a variable
out <- ifelse(x==shift(x), x + 0.001* sd(x, na.rm = TRUE), x)
# changing the first element of `out` into first element of x
out[1] <- x[1]
# returning `out` -- in an R function,
# the last thing evaluated is returned
out
}
Note that this will not properly handle elements repeated more than twice (e.g. c(1,2,2,2,3)). Also, it changes all repeated elements by the same amount, so in c(1,2,2,1,2,2) both second twos are changed in exactly the same way. This may or may not be what you want.
You could hack something together (a comment suggests ?rle), but I suggest changing the way you randomize your data, if that makes sense for your particular data.
Instead of adding 0.001*sd, maybe you could add Gaussian noise with a small standard deviation proportional to sd(x)? Obviously, this depends on your application.
f <- function(x) {
# adding gaussian noise with small sd to repeated values
# storing the output in a variable `out`
out <- ifelse(x==shift(x),
x + rnorm(length(x), mean=0,
sd=0.01*sd(x, na.rm = TRUE)),
x)
# changing the first element of `out` into first element of x
out[1] <- x[1]
# returning `out` -- in an R function,
# the last thing evaluated is returned
out
}
Which approach is right depends on why you want to get rid of exact duplicates.
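A minimal sketch of the ?rle idea mentioned above, in base R only (f_rle and its scale argument are hypothetical names). Within each run of identical consecutive values, the k-th repeat gets (k - 1) times a small step added, so the first value of each column is never touched and no NA appears. Note that this differs slightly from the question's loop, which turns c(2,2,2) into c(2, 2+d, 2); here it becomes c(2, 2+d, 2+2d).

```r
f_rle <- function(x, scale = 0.001) {
  step <- scale * sd(x, na.rm = TRUE)
  runs <- rle(x)                        # lengths of runs of identical consecutive values
  offset <- sequence(runs$lengths) - 1  # 0 for the first value of each run, then 1, 2, ...
  x + offset * step
}

a <- c(1, 2, 2, 3, 4, 5, 6)
b <- c(4, 5, 6, 7, 8, 8, 9)
data <- data.frame(a, b)
apply(data, 2, f_rle)  # first row unchanged, no NA
```

Because rle() and sequence() work on whole vectors, this stays vectorised; the step size is still an arbitrary choice you may want to tune.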
I'm trying to write a function that takes an input vector, v, and returns a logical vector of the same length, whose elements indicate whether the corresponding element of v says the arrestee is homeless. The function should loop over the elements of v and use the homeless vector to flag those elements.
I keep getting the error "object 'n' not found".
I've tried changing the ith variable to flag_homeless, without success.
flag_homeless <- function(v)
n <- length(v)
homeless <- rep(FALSE, n)
for (i in 1:n) {
if (v[i] == "No Permanent Address") {
homeless[i] <- TRUE }
}
return(homeless)
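Add another pair of braces around the function body. Without them, the body of flag_homeless is only its first expression, n <- length(v); the remaining lines run at the top level, where n does not exist, which is why you get "object 'n' not found".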
flag_homeless <- function(v) {
n <- length(v)
homeless <- rep(FALSE, n)
for (i in 1:n) {
if (v[i] == "No Permanent Address") {
homeless[i] <- TRUE }
}
return(homeless)
}
Because R is vectorized, you usually don't need to use an explicit loop. Here is a toy example with a binary A/B test analogous to your requirement:
v <- sample(c('A', 'B'), 100, replace = TRUE, prob = c(0.3, 0.7))
test <- v == 'B'
sum(test)
[1] 71
In your case, you would compare against "No Permanent Address" instead of 'B'.
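As a sketch of that vectorised version (assuming v is a character vector; the string is the one from the question), %in% returns FALSE rather than NA for missing entries, so an NA in v does not trigger the "missing value where TRUE/FALSE needed" error that the loop's if would raise.

```r
# Vectorised: one comparison over the whole vector, no loop and no preallocation
flag_homeless <- function(v) {
  v %in% "No Permanent Address"
}

flag_homeless(c("No Permanent Address", "123 Main St", NA))
# [1]  TRUE FALSE FALSE
```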
I have a list of two items. I would like to set a condition using an if statement, and the condition needs to use all() as well.
Suppose I have the following list:
library(VineCopula)
x <- BiCop(0,0)
y <- BiCop(0,0)
z <- list(x, y)
I would like a condition saying that if all the z[[i]]$tau values lie between two specific values (0 and 0.15 in my code), then z must be set to zero.
Here is my code (note that my list can have any length; I fixed it to two elements here, but my real data has more than two):
for (i in seq_len(m)){
if (all( 0 <= z[[i]]$tau =< 0.15))
z <- 0
}
How to do this in R?
Extract the tau element from each list item, wrap the logical condition in all() so that it returns a single TRUE/FALSE, use that in if, and then loop over z to set the tau elements to 0:
tau1 <- sapply(z, "[[", "tau")
i1 <- all(tau1 >= 0 & tau1 <= 0.15)
if(i1) {
z <- lapply(z, function(x) {x$tau <- 0; x})
}
The syntax 0 <= z[[i]]$tau =< 0.15 looks like mathematical notation, but it is not valid R: =< is not an operator (it should be <=), and a double comparison must be written as two expressions joined with &. Also, because the check applies to all the list elements, with for loops we need two of them: the first checks whether all elements meet the condition, and the second does the assignment (if the first returned TRUE), i.e.
i1 <- TRUE
# // first loop
for(i in seq_along(z)) {
i1 <- i1 & z[[i]]$tau >= 0 & z[[i]]$tau <= 0.15
}
# // second loop
if(i1) {
for(i in seq_along(z)) {
z[[i]]$tau <- 0
}
}
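Since VineCopula may not be installed everywhere, here is a self-contained sketch of the same idea in which plain lists with a tau field stand in for the BiCop objects (an assumption for illustration); reset_tau_if_all_in_range is a hypothetical helper name. vapply() is used instead of sapply() because it guarantees a numeric vector, whatever the length of z.

```r
reset_tau_if_all_in_range <- function(z, lower = 0, upper = 0.15) {
  # one tau per list element, always returned as a numeric vector
  tau <- vapply(z, function(el) el$tau, numeric(1))
  # all() collapses the element-wise test into the single TRUE/FALSE that if() needs
  if (all(tau >= lower & tau <= upper)) {
    z <- lapply(z, function(el) { el$tau <- 0; el })
  }
  z
}

# hypothetical stand-ins for BiCop objects: lists that only carry `tau`
z <- list(list(tau = 0.05), list(tau = 0.10), list(tau = 0.12))
z <- reset_tau_if_all_in_range(z)
```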
All I want to do is implement the solution given here (the Python one) in R.
I'm not very used to debugging in RStudio, but even after trying, I still can't figure out why my code does not work. Basically (with the example input provided), the function runs over all the numbers and then gets stuck in a sort of infinite loop (or infinite recursion). Can someone please point me in the right direction?
subset_sum <- function(numbers, target, partial = numeric(0)){
s <- sum(partial,na.rm = TRUE)
# check if the partial sum equals to target
if (s == target){
cat("sum(",partial,")","=",target)
}
else if (s >= target) {
return() # if we reach the number why bother to continue
}
else {
for(i in 1:length(numbers)){
n <- numbers[i]
remaining <- numbers[i+1:length(numbers)]
subset_sum(remaining, target, partial = append(partial,n))
}
}
}
subset_sum(c(3,9,8,4,5,7,10),15)
When not run in debug mode it gives me these errors:
Error: node stack overflow
Error during wrapup: node stack overflow
Here's a recursive implementation in R:
subset_sum = function(numbers,target,partial=0){
if(any(is.na(partial))) return()
s = sum(partial)
if(s == target) print(sprintf("sum(%s)=%s",paste(partial[-1],collapse="+"),target))
if(s > target) return()
for( i in seq_along(numbers)){
n = numbers[i]
remaining = numbers[(i+1):length(numbers)]
subset_sum(remaining,target,c(partial,n))
}
}
Compared with the Python version, I had to add one extra check in R for the case where i+1 > length(numbers), which returns an NA.
> subset_sum(c(3,9,8,4,5,7,10),15)
[1] "sum(3+8+4)=15"
[1] "sum(3+5+7)=15"
[1] "sum(8+7)=15"
[1] "sum(5+10)=15"
I think (but I'm not sure) that your issue was the nested if/else if logic in a recursive function. Interestingly, when I put if(i+1 > length(numbers)) return() inside the for loop, that broke the functionality and I didn't get all the answers; the returns need to be outside the recursion.
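For reference, two details in the question's code explain the node stack overflow: `:` binds more tightly than `+`, so numbers[i+1:length(numbers)] means numbers[i + (1:length(numbers))], which is padded with NA and never gets shorter; and sum(partial, na.rm = TRUE) hides those NAs, so the partial sum stops growing and the recursion never ends. In the version above, numbers[(i+1):length(numbers)] is c(NA, last element) when i is the last index, so the last element can be picked twice (with c(1, 5) and target 10 it prints 5+5). The sketch below (subset_sum2 is a hypothetical name) uses numbers[-seq_len(i)] instead, which is simply empty at the end, and returns the matches rather than printing them. Like the Python original, it assumes positive numbers.

```r
subset_sum2 <- function(numbers, target, partial = numeric(0)) {
  s <- sum(partial)
  found <- character(0)
  # record the current subset if it hits the target
  if (length(partial) > 0 && s == target) {
    found <- paste0(paste(partial, collapse = "+"), "=", target)
  }
  # with positive numbers, adding more elements can only overshoot
  if (s >= target) return(found)
  for (i in seq_along(numbers)) {
    # drop the first i elements: empty (never NA) when i is the last index,
    # so no element can be picked twice
    remaining <- numbers[-seq_len(i)]
    found <- c(found, subset_sum2(remaining, target, c(partial, numbers[i])))
  }
  found
}

subset_sum2(c(3, 9, 8, 4, 5, 7, 10), 15)
```

For the example input this returns the same four sums as above, as strings such as "3+8+4=15".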
This is not a recursive function, but it takes advantage of R's ability to handle matrix/array data. Some output is shown in the comments starting with #.
v <- c(3,9,8,4,5,7,10)
v <- sort(v)
# [1] 3 4 5 7 8 9 10
target <- 15
# we don't need to check more than 4 numbers, since 3+4+5+7 (the four smallest numbers) is already greater than 15
mincombs <- min(which(cumsum(v) > target))
# [1] 4
Combs <- combn(v, mincombs) # make combinations of numbers
ans <- mapply(function(x,y) ifelse(y > 0, paste0(paste0(Combs[1:y,x], collapse="+"), "=", target), NA), 1:ncol(Combs), apply(Combs, 2, function(I) which(cumsum(I) == target)))
ans <- unlist(ans[lengths(ans) > 0])
# [1] "3+4+8=15" "3+4+8=15" "3+5+7=15" "3+5+7=15" "3+5+7=15" "7+8=15"
In a function
myfun <- function(V, target) {
V <- sort(V)
mincombs <- min(which(cumsum(V) > target))
Combs <- combn(V, mincombs)
ans <- mapply(function(x,y) ifelse(y > 0, paste0(paste0(Combs[1:y,x], collapse="+"), "=", target), NA), 1:ncol(Combs), apply(Combs, 2, function(I) which(cumsum(I) == target)))
ans <- unlist(ans[lengths(ans) > 0])
return(ans)
}
myfun(V = c(3,9,8,4,5,7,10), target = 15)
myfun(V = c(3,9,8,4,5,7,10,12,4,32),target = 20)
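Note that the output above repeats some sums and misses 5+10=15: a subset is only found if it is a prefix of some sorted 4-element combination, and because 10 is the largest value, no such combination begins with 5, 10. A hedged alternative in the same non-recursive spirit enumerates each subset size k separately with combn() and keeps the columns whose sum equals the target. all_subset_sums is a hypothetical name, and positive numbers are assumed so that the size bound holds.

```r
all_subset_sums <- function(v, target) {
  v <- sort(v)
  # with positive numbers, a k-subset sums to at least the k smallest values,
  # so no subset larger than max_k can reach the target
  max_k <- sum(cumsum(v) <= target)
  out <- character(0)
  for (k in seq_len(max_k)) {
    idx <- combn(seq_along(v), k)                # one index combination per column
    vals <- matrix(v[as.vector(idx)], nrow = k)  # the corresponding values
    hits <- vals[, colSums(vals) == target, drop = FALSE]
    if (ncol(hits) > 0) {
      out <- c(out, apply(hits, 2, function(col) paste0(paste(col, collapse = "+"), "=", target)))
    }
  }
  unique(out)  # duplicated input values can produce identical strings
}

all_subset_sums(c(3, 9, 8, 4, 5, 7, 10), 15)
```

Indexing through combn(seq_along(v), k) rather than combn(v, k) sidesteps combn's special case where a single positive number n is treated as seq_len(n).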
I wrote the following code to find all the prime numbers in a randomly generated data set. Sadly, something seems to have gone wrong; could anybody help me?
set.seed(20171106)
n <- 10000
num <- sample(1:100000,n,replace=TRUE)
findPrime <- function(x){
apple<-c()
n<-length(x)
for(i in n){
if(any(x[i]%%(1:(x[i]-1))!=0)) apple <-c(apple,x[i])
}
return(apple)
}
To get results, type:
findPrime(num)
This is the warning message:
Warning message:
In if (x[i]%%(1:(x[i] - 1)) == 0) apple <- c(apple, x[i]) :
the condition has length > 1 and only the first element will be used
So how can I fix the problem?
An if statement only accepts a single logical value, but in your code it seems to receive a whole vector. I have rewritten your function using an ifelse expression wrapped inside sapply.
I hope this works for you.
findPrime <- function(x = 0){
primes <- c()
# Prime finder
primes <- sapply(X = x,FUN = function(x) {
ifelse(any(x %% (1:(x - 1)) != 0), T, F)}
)
# Select primes
primes <- num[primes]
return(primes)
}
findPrime(num)
I spotted another silly mistake: inside the function, use x instead of num in the "Select primes" step, and invert the F, T outcomes (the divisors should also start at 2 rather than 1). It should look like this:
findPrime <- function(x = 0){
primes <- c()
# Prime finder
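primes <- sapply(X = x,FUN = function(x) {
# x is prime if x >= 2 and no divisor from 2 to floor(sqrt(x)) divides it evenly
ifelse(x < 2 || any(x %% seq_len(floor(sqrt(x)))[-1] == 0), F, T)}
)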
# Select primes
primes <- x[primes]
return(primes)
}
I have just tried it and it works fine.
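Use the gmp package, whose isprime() function returns 0 for composite numbers, 1 for numbers that are probably prime and 2 for numbers that are definitely prime (for small numbers like these, primes come back as 2), and then subset the data on that result.
Say you have a vector a = c(1:10):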
a = c(1:10)
b = gmp::isprime(a)
c = cbind(a,b)
c = as.data.frame(c)
c = c[c$b==2,]
a1 = c$a
a1
In your code, for(i in n) should be for(i in 1:n); as written, the loop only visits the last element.
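As another base R option (a swapped-in technique, not taken from the answers above), a sieve of Eratosthenes marks every prime up to max(x) once, so each value in num is then checked with a single lookup instead of a trial division per number. prime_sieve and find_primes are hypothetical names, and the sketch assumes x holds positive whole numbers, as sample(1:100000, ...) produces.

```r
# is_prime[k] is TRUE exactly when k is prime, for k in 1..n
prime_sieve <- function(n) {
  is_prime <- rep(TRUE, n)
  is_prime[1] <- FALSE
  if (n >= 4) {
    for (p in 2:floor(sqrt(n))) {
      if (is_prime[p]) {
        # multiples below p^2 were already crossed out by smaller primes
        is_prime[seq(p * p, n, by = p)] <- FALSE
      }
    }
  }
  is_prime
}

find_primes <- function(x) {
  is_prime <- prime_sieve(max(x))
  x[is_prime[x]]  # lookup by value: keeps duplicates and the original order
}

set.seed(20171106)
num <- sample(1:100000, 10000, replace = TRUE)
primes <- find_primes(num)
```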
In this piece of code:
q7 <- function (x) {
if (is.numeric (x) == FALSE) {stop ("Input is non-numeric.")}
if (all (x > 0) == FALSE) {stop ("Input has negative values.")}
sum <- 0
while (sum <= 100) {
if (x[1] > 50) {next}
else if (x[1] %% 2 == 0) {sum <- sum + (x[1] / 2)}
else {sum <- sum + x[1]}
x <- x [x[2]:length (x)]
}
}
Used for example as q7 (c(10,20,30,40,50,60,70,80,90,100)), I get this error:
Error in if (x[1] > 50) { : missing value where TRUE/FALSE needed
I think the condition here couldn't be simpler, and I must be missing some R heuristic, because in the console it obviously works:
> x <- c(10,20,30,40,50,60,70,80,90,100)
> x[1] > 50
[1] FALSE
So simple.
What is going on?
A quick diagnosis with debug() shows that the problem is with
x <- x[x[2]:length(x)]
After the first iteration, x[2] is 20, so this evaluates x[20:10], which is NA everywhere except the last element (x[10], which is 100). In the next iteration, checking x[1] > 50 is really NA > 50, which gives NA, hence the error.
As you confirmed, you simply want to remove the first value of x. So why not use
x <- x[-1]
Changing x[2] to just 2 will do it.
x <- x [2:length (x)]
Because of the next statement you will also end up with an infinite loop: with the input above, sum has only reached 75 when x[1] becomes 60, and next jumps straight back to the while condition without shortening x, so the same check repeats forever. Maybe you wanted to use break.
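If the intent was to skip values above 50 rather than stop on them (an assumption; the question doesn't say), a for loop gives next that meaning: it moves on to the following element instead of re-testing the same one. The sketch below (q7_sketch is a hypothetical name) also stops once the running total exceeds 100, as the while condition did, ends cleanly when x runs out, and returns the total, which the original function never did.

```r
q7_sketch <- function(x) {
  if (!is.numeric(x)) stop("Input is non-numeric.")
  if (!all(x > 0)) stop("Input has negative values.")
  total <- 0  # named `total` to avoid masking base::sum
  for (value in x) {
    if (total > 100) break  # same stopping rule as while (sum <= 100)
    if (value > 50) next    # in a for loop, next moves on to the following element
    total <- total + if (value %% 2 == 0) value / 2 else value
  }
  total
}

q7_sketch(c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100))  # 75
```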