Replace Numeric values with NA - r

I am writing a function that takes in some vector and checks if it is numeric. If it is false, it will replace the numeric values with NA in the else statement. I have tried using the is.numeric() function in many ways, without much luck. Any help would be appreciated!
test <- function(x){
if(is.numeric(x) == TRUE){
mean.x <- mean(x)
vectorlist <- list(mean.x)
}
else
return(vectorlist)
}
x <- c("a", 1, 2)
test(x)

It sounds like you're looking for a function roughly like this:
test <- function(x) {
  if (is.numeric(x)) {
    return(mean(x))
  } else {
    x[!is.na(as.numeric(x))] <- NA
    return(x)
  }
}
x <- c("a", 1, 2)
test(x)
Note that if (is.numeric(x)) is sufficient; you don't need the == TRUE comparison in the if clause.
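As a minimal sketch of what the else branch does, the same replacement can be written as a standalone helper. The name replace_numeric_with_na is hypothetical, and suppressWarnings() is added here only to silence the coercion warning that as.numeric() raises for "a":

```r
# Hypothetical helper: blank out the entries that can be parsed as numbers.
replace_numeric_with_na <- function(x) {
  # as.numeric() returns NA (with a warning) for strings that are not numbers
  parsed <- suppressWarnings(as.numeric(x))
  # keep only the entries that failed to parse; the rest become NA
  x[!is.na(parsed)] <- NA
  x
}

x <- c("a", 1, 2)
is.numeric(x)   # FALSE: c() coerced every element to character
result <- replace_numeric_with_na(x)
result          # "a" NA NA
```

Because c("a", 1, 2) is a character vector, is.numeric(x) is FALSE for the whole vector, which is why the element-wise check has to go through as.numeric().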

Related

function assignment error in a vector(R language)

I need to find the number 35 in x and assign a function to it.
Then call the function.
code:
x <- 1:100
z <- 0
z[x == 35] <- function() { # error here
print("hello")
}
z <- max(z, na.rm=TRUE) # remove all NA in vector
z() # run it
error:
Error in z[x == 35] <- function() { :
incompatible types (from closure to double) in subassignment type fix
Thanks!!
Not sure quite why you'd want to do this, but...
x <- 1:100
z <- list()
z[[which(x == 35)]] <- function() {
  print("hello")
}
z[[which(x == 35)]]() # run it
[1] "hello"
The key is to use a list, not a vector.
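A small sketch of the difference, assuming nothing beyond base R: an atomic vector can only hold one basic type, so assigning a closure into it fails, while a list can hold functions. The names handlers and greet are made up for illustration.

```r
# Assigning a function into a numeric vector fails; capture the message.
z <- 0
msg <- tryCatch({
  z[1] <- function() "hello"
  "no error"
}, error = function(e) conditionMessage(e))

# A list element can hold any R object, including a function.
handlers <- list()
handlers[["greet"]] <- function() "hello"
greeting <- handlers[["greet"]]()   # call the stored function
greeting
```

Naming the list elements is one way to avoid creating a list with 34 empty slots just to reach position 35.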
Do you mean something like this?
z <- function(x) {
  print("hello")
  max(x, na.rm=TRUE) # remove all NA in vector
}
for (x in 1:100) {
  if (x == 35) {
    z(x)
  }
}

Unknown usages of functions in source code of predict.naiveBayes method in e1071 package

I don't understand how the predict.naiveBayes method works, given that it contains two apparently misspelled function calls: isnumeric[attribs[v]] and islogical[attribs[v]].
In my opinion, these should be is.numeric(attribs[v]) and is.logical(attribs[v]), respectively.
Code below:
...
L <- sapply(1:nrow(newdata), function(i) {
ndata <- newdata[i, ]
L <- log(object$apriori) + apply(log(sapply(seq_along(attribs),
function(v) {
nd <- ndata[attribs[v]]
if (is.na(nd)) rep(1, length(object$apriori)) else {
prob <- if (isnumeric[attribs[v]]) {
msd <- object$tables[[v]]
msd[, 2][msd[, 2] <= eps] <- threshold
dnorm(nd, msd[, 1], msd[, 2])
} else object$tables[[v]][, nd + islogical[attribs[v]]]
prob[prob <= eps] <- threshold
prob
}
})), 1, sum)
if (type == "class")
L
else {
## Numerically unstable:
## L <- exp(L)
## L / sum(L)
## instead, we use:
sapply(L, function(lp) {
1/sum(exp(L - lp))
})
}
})
...
Everything works fine when I use the naive Bayes classifier from the package, which is rather strange given these inconsistencies. Can anyone clear up my doubts?
Just two lines above your code excerpt there is basically what you expect:
isnumeric <- sapply(newdata, is.numeric)
islogical <- sapply(newdata, is.logical)
That is, isnumeric and islogical are not functions; they are validly defined logical vectors, which is why they are indexed with square brackets.
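To make this concrete, here is a toy sketch with made-up data. The data frame and the choice of attribs as plain column positions are assumptions for illustration, not the package's actual values:

```r
# Toy data: one numeric, one logical, and one other column.
newdata <- data.frame(age = c(21, 35),
                      smoker = c(TRUE, FALSE),
                      city = c("A", "B"))

# Same two definitions as in the answer: one logical flag per column.
isnumeric <- sapply(newdata, is.numeric)
islogical <- sapply(newdata, is.logical)

# Assumed here: attribs holds column positions.
attribs <- seq_along(newdata)

isnumeric[attribs[1]]   # looks up the flag for the first column
islogical[attribs[2]]   # looks up the flag for the second column
```

The square brackets are ordinary vector indexing, so the result is a single TRUE or FALSE that the if statement can branch on.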

Change length.out in ifelse function

I'm running a simple ifelse function
f <- function(x) {
ifelse(x==shift(x), x + 0.001* sd(x, na.rm = TRUE), x)
}
where shift is from the data.table package.
This lets me change, for each column in a data frame (using apply), any value that is exactly the same as the previous one. The problem is that the ifelse function returns a result whose length is equal to the length of the test. In this case, that is the length of shift(x) and not x. Therefore I end up with the first element (or the last, if using type = "lead" instead of the default "lag") of each column turned into NA.
Here is an MWE:
a <- c(1,2,2,3,4,5,6)
b <- c(4,5,6,7,8,8,9)
data <- data.frame(cbind(a,b))
f <- function(x) {
ifelse(x==shift(x), x + 0.001* sd(x, na.rm = TRUE), x)
}
apply(data, 2, f)
Therefore I thought I could change the ifelse function itself. I've made a few attempts to change the length.out, but I haven't succeeded yet:
function (test, yes, no)
{
if (is.atomic(test)) {
if (typeof(test) != "logical")
storage.mode(test) <- "logical"
if (length(test) == 1 && is.null(attributes(test))) {
if (is.na(test))
return(NA)
else if (test) {
if (length(yes) == 1 && is.null(attributes(yes)))
return(yes)
}
else if (length(no) == 1 && is.null(attributes(no)))
return(no)
}
}
else test <- if (isS4(test))
methods::as(test, "logical")
else as.logical(test)
ans <- test
ok <- !(nas <- is.na(test))
if (any(test[ok]))
ans[test & ok] <- rep(yes, length.out = length(ans))[test &
ok]
if (any(!test[ok]))
ans[!test & ok] <- rep(no, length.out = length(ans))[!test &
ok]
ans[nas] <- NA
ans
}
EDIT
My original code was:
copy <- copy(data)
for (j in 1: ncol(copy)) {
for (i in 2: nrow(copy)) {
if (copy[i,j] == copy[i-1,j] & !is.na(copy[i,j]) & !is.na(copy[i-1,j])) {
copy[i,j] <- copy[i-1,j] + (0.0001*sd(copy[,j], na.rm = T))
}
}
}
but it may run slowly on large matrices. It does deal with multiple repetitions.
The goal was to get to a vectorised, quicker method using a function and apply.
As you mention, your approach leads to an NA in the first element of the vector returned by f. This first element has no previous element to compare with, so we would like to leave it unchanged.
A straightforward approach is to do just that. Apologies: it does not answer your title question, although it does solve your problem.
f <- function(x) {
  # storing the output of ifelse in a variable
  out <- ifelse(x==shift(x), x + 0.001* sd(x, na.rm = TRUE), x)
  # changing the first element of `out` into first element of x
  out[1] <- x[1]
  # returning `out` -- in a R function,
  # the last thing evaluated is returned
  out
}
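One way to see where the NA comes from, sketched in base R so that it runs without data.table: the test has the same length as x, but its first element is NA because the lagged vector starts with NA, and ifelse returns NA wherever the test is NA. Here c(NA, x[-length(x)]) is assumed to stand in for shift(x) with its default lag of one, and f_base is a hypothetical name.

```r
# Base-R variant: treat "no previous element" as "not a repeat".
f_base <- function(x) {
  lagged <- c(NA, x[-length(x)])            # previous value, NA for the first
  same_as_prev <- !is.na(lagged) & x == lagged
  ifelse(same_as_prev, x + 0.001 * sd(x, na.rm = TRUE), x)
}

a <- c(1, 2, 2, 3, 4, 5, 6)

# The raw comparison is as long as `a`, but starts with NA.
test_raw <- a == c(NA, a[-length(a)])
length(test_raw) == length(a)   # TRUE
is.na(test_raw[1])              # TRUE

out <- f_base(a)
out
```

Turning the NA in the test into FALSE has the same effect as resetting the first element afterwards.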
Note that this will not properly handle elements repeated more than twice (e.g. c(1,2,2,2,3)). Also, this will change all your elements the same way. So in c(1,2,2,1,2,2), all the second twos will be changed the same way. This may or may not be something you want.
You could hack something (a comment suggests ?rle), but I suggest changing the way you randomize your data, if this makes sense with your particular data.
Instead of adding 0.001*sd, maybe you could add Gaussian noise with this standard deviation? This obviously depends on your application.
f <- function(x) {
  # adding gaussian noise with small sd to repeated values
  # storing the output in a variable `out`
  out <- ifelse(x==shift(x),
                x + rnorm(length(x), mean=0,
                          sd=0.01*sd(x, na.rm = TRUE)),
                x)
  # changing the first element of `out` into first element of x
  out[1] <- x[1]
  # returning `out` -- in a R function,
  # the last thing evaluated is returned
  out
}
It depends on your purpose in getting rid of exact duplicate values.
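For the ?rle hint mentioned above, one possible reading is sketched below: each value is shifted by its position within its run of consecutive repeats, so the second, third, and later copies all end up different. The function name bump_repeats and its step argument are hypothetical, and this is only one of several ways to break ties.

```r
# Hypothetical helper: offset each repeat by its position within the run.
bump_repeats <- function(x, step = 0.001 * sd(x, na.rm = TRUE)) {
  runs <- rle(x)                           # lengths of consecutive equal values
  offset <- sequence(runs$lengths) - 1     # 0 for the first of a run, then 1, 2, ...
  x + offset * step
}

res <- bump_repeats(c(1, 2, 2, 2, 3))
res
```

A shifted value could in principle collide with a neighbouring value if the data are very finely spaced, so the result is worth checking with duplicated() on real data.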

function not filling empty vector in R

I'm trying to get the vector simulated_results to take the values returned by "simulation", which produces a vector of varying length depending on the iteration.
Initially I had this code, which works but is very slow:
simulated_results<-NULL
while(as.numeric(Sys.time())-start<duration){
simulated_results <- cbind(simulated_results,simulation(J,4* (length(J)^2),0.0007,duration,start))
}
Because it's so slow, I modified it:
start<-as.numeric(Sys.time())
duration<-10
simulated_results<-NULL
simulated_results <- cbind(simulated_results,
replicate(n=10000,expr=(while(as.numeric(Sys.time())-start<duration)
{simulation(J,4*(length(J)^2),0.0007,duration,start)})))
With the new code, my problem is that although everything runs, the results of simulation are not passed to simulated_results. Instead, simulated_results just ends up as a column vector of NULL values.
I get no error messages.
I would greatly appreciate any help!
For reference, the simulation code is:
iter<-as.numeric(Sys.getenv("PBS_ARRAY_INDEX"))
if(iter <= 40){J<-1:500
}else if(iter <= 80){J<-1:1500
}else if(iter <= 120){J<-1:2500
}else if(iter <= 160){J<-1:5000}
set.seed(iter)
simulation <- function(J,gens,v=0.1,duration,start){
species_richness <- function(J){
a <- table(J)
return(NROW(a))
}
start<-as.numeric(Sys.time())
species_richness_output <- rep(NA,gens)
for(rep in 1:gens){
if (as.numeric(Sys.time())-start<duration){
index1 <- sample(1:length(J),1)
if(runif(1,0,1) < v){
J[index1] <- (rep+100)
}
else{
index2 <- sample(1:length(J),1)
while(index1==index2) {
index2 <- sample(1:length(J),1)
}
J[index1] <- J[index2]
}
species_richness_output[rep] <- species_richness(J)} else break
}
species_abundance <- function(J){
a <- table(J)
return(a)
}
abuntable <- species_abundance(J)
octaves <- function(abuntable)
{
oct<-rep(0,floor(log2(length(J))+1))
for(i in 1:length(abuntable)){
oct2 <- floor(log2(abuntable[i])+1)
oct[oct2] <- oct[oct2]+1
}
return(oct)
}
octaves(abuntable)
}
I agree with @Nathan G, but something did catch my attention: You are trying to cbind two things that cannot be bound together, since they have different dimensions. We don't know what kind of data type your simulation function returns, but it clearly is not NULL. Consider this:
df1 <- NULL
df2 <- data.frame(x = 1:10, y = 11:20)
cbind(df1, df2)
cbind(df2, df1)
Both cbind statements give errors. Do you get an error? If this is what's going on, you should initialize simulated_results not as NULL but as an empty version of whatever the function simulation returns.
EDIT
iter = 10
set.seed(iter)
J <- 1:1500
# critical to preallocate the list size for speed
res <- vector("list", iter)
for (i in 1: iter) {
res[[i]] <- simulation(J,4* (length(J)^2),0.0007,duration = 10,start)
}
str(res)
res[[1]]
Now I don't think I'm using this quite the way you ultimately intend, but perhaps this will give you enough to get to what you actually want.
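As a side note on why the replicate() version in the question collects NULL values: in R, a while loop evaluates to NULL, whatever its body computes, so wrapping the loop in replicate() stores that NULL each time. The sketch below shows this, and then one way to collect varying-length results under a time budget. fake_simulation is a hypothetical stand-in for the real simulation function, and the 0.2 second budget is arbitrary.

```r
# A while loop's own value is NULL, even when its body computes something.
i <- 0
loop_value <- while (i < 3) {
  i <- i + 1
}
is.null(loop_value)   # TRUE

# Hypothetical stand-in that returns a vector of varying length.
fake_simulation <- function() sample(10, sample(1:5, 1))

# Collect results in a list until the time budget runs out.
budget <- 0.2
start <- as.numeric(Sys.time())
results <- list()
while (as.numeric(Sys.time()) - start < budget) {
  results[[length(results) + 1]] <- fake_simulation()
}
length(results)
```

A list is used because the results differ in length, which cbind cannot combine without recycling or padding.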

while loop construction for use with `apply`

I have a data.frame df like this:
df <- data.frame (x=1:5,y=1:5)
I want to use an apply function row-wise to this data frame, where I check a condition that is dependent on both the x and y and then change the x and y elements until they meet my condition. In this example if x and y don't add up to 8 I keep picking new random numbers for them and try again.
I figured an apply function using a while loop would be best. So I tried the following:
checkchange <- function(x) while(x[1] + x[2] < 8)
{
x[1] <- sample(5,1)
x[2] <- sample(5,1)
return(cbind(x[1],x[2]))
}
I would then plan on doing this:
newdf <- apply(df,1,checkchange)
This doesn't work. Should I be using repeat and a break, or do I need to specify a return value more clearly? Help with the while loop syntax would be greatly appreciated.
You are missing the curly braces around your function body.
This works for me:
checkchange <- function(x)
{
  while((x[1] + x[2]) < 8)
  {
    x[1] <- sample(5,1)
    x[2] <- sample(5,1)
  }
  return(cbind(x[1],x[2]))
}
As @Nico pointed out, the function will work with additional braces.
checkchange <- function(x) {
  while (sum(x) < 8) {
    # no need to sample from 1:5 if the sum has to be at least 8
    x <- sample(3:5, 2, TRUE)
  }
  return(x)
}
The output of apply needs to be transposed to match the original arrangement of the data.
t(apply(df, 1, checkchange))
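A short sketch of why the transpose is needed: when the function applied to each row returns a vector of length two, apply stacks those vectors as columns, so the raw result has two rows and one column per original row. The seed and the name newdf are only for illustration.

```r
df <- data.frame(x = 1:5, y = 1:5)

checkchange <- function(x) {
  while (sum(x) < 8) {
    x <- sample(3:5, 2, TRUE)
  }
  x
}

set.seed(1)                        # arbitrary seed, for repeatability only
raw <- apply(df, 1, checkchange)
dim(raw)                           # 2 rows, 5 columns

# Transpose and restore the data frame layout and column names.
newdf <- as.data.frame(t(raw))
names(newdf) <- names(df)
newdf
```

After the transpose, each row of newdf corresponds to a row of df again.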
By the way, you don't need a loop for the function:
checkchange <- function(x) {
  if (sum(x) < 8) {
    x[1] <- sample(3:5, 1)
    x[2] <- ifelse(x[1] == 3, 5, sample(seq(8 - x[1], 5), 1))
  }
  return(x)
}
