Replace integer(0) by NA - r

I have a function that I apply to a column and that puts its results in another column. It sometimes gives me integer(0) as output, so my output column will be something like:
45
64
integer(0)
78
How can I detect these integer(0)'s and replace them by NA? Is there something like is.na() that will detect them?
Edit: Ok I think I have a reproducible example:
df1 <-data.frame(c("267119002","257051033",NA,"267098003","267099020","267047006"))
names(df1)[1]<-"ID"
df2 <-data.frame(c("257051033","267098003","267119002","267047006","267099020"))
names(df2)[1]<-"ID"
df2$vals <-c(11,22,33,44,55)
fetcher <-function(x){
y <- df2$vals[which(match(df2$ID,x)==TRUE)]
return(y)
}
sapply(df1$ID,function(x) fetcher(x))
The output from this sapply is the source of the problem.
> str(sapply(df1$ID,function(x) fetcher(x)))
List of 6
$ : num 33
$ : num 11
$ : num(0)
$ : num 22
$ : num 55
$ : num 44
I don't want this to be a list - I want a vector, and instead of num(0) I want NA (note in this toy data it gives num(0) - in my real data it gives integer(0)).

Here's a way to (a) replace integer(0) with NA and (b) transform the list into a vector.
# a regular data frame
> dat <- data.frame(x = 1:4)
# add a list including integer(0) as a column
> dat$col <- list(45,
+ 64,
+ integer(0),
+ 78)
> str(dat)
'data.frame': 4 obs. of 2 variables:
$ x : int 1 2 3 4
$ col:List of 4
..$ : num 45
..$ : num 64
..$ : int(0)
..$ : num 78
# find zero-length values
> idx <- !(sapply(dat$col, length))
# replace these values with NA
> dat$col[idx] <- NA
# transform list to vector
> dat$col <- unlist(dat$col)
# now the data frame contains vector columns only
> str(dat)
'data.frame': 4 obs. of 2 variables:
$ x : int 1 2 3 4
$ col: num 45 64 NA 78
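As a variation on the same idea, base R's lengths() (available since R 3.2.0) returns the length of every list element in one call. This is only a minimal sketch, assuming every non-empty element has length one so that unlist() yields exactly one value per row; the names col and vec are illustrative:

```r
# a list containing a zero-length element, as in the example above
col <- list(45, 64, integer(0), 78)

# lengths() returns the length of each element; replace the empty ones with NA
col[lengths(col) == 0] <- NA

# flatten the list to an atomic vector
vec <- unlist(col)
print(vec)
```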

It is best to handle this inside your function. I'll call it myFunctionForApply here, but it stands for your current function. Before returning, check the length of the result and return NA if it is 0:
myFunctionForApply <- function(x, ...) {
# Do your processing
# Let's say it ends up in variable 'ret':
if (length(ret) == 0)
return(NA)
return(ret)
}
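Applied to the fetcher from the question, the length check might look like the sketch below. It also shows a vectorised lookup with match(), which returns NA for unmatched IDs on its own; that second variant is a suggested alternative, not something taken from the answers above. The names fetcher_na, res1 and res2 are illustrative, and the sketch assumes each ID occurs at most once in df2.

```r
# data from the question (stringsAsFactors = FALSE keeps IDs as character)
df1 <- data.frame(ID = c("267119002", "257051033", NA, "267098003",
                         "267099020", "267047006"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("257051033", "267098003", "267119002",
                         "267047006", "267099020"),
                  vals = c(11, 22, 33, 44, 55),
                  stringsAsFactors = FALSE)

# variant 1: return NA from inside the function when nothing matches
fetcher_na <- function(x) {
  y <- df2$vals[which(df2$ID == x)]
  if (length(y) == 0) return(NA_real_)
  y
}
# vapply() guarantees a numeric vector rather than a list
res1 <- vapply(df1$ID, fetcher_na, numeric(1), USE.NAMES = FALSE)

# variant 2: match() is vectorised and yields NA for IDs it cannot find
res2 <- df2$vals[match(df1$ID, df2$ID)]

print(res1)
print(res2)
```

Both variants give a plain numeric vector, so the result can be assigned directly to a data frame column.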

Related

Apply na.locf to multiple datasets

I have multiple datasets (e.g. data01, data02, ...). In all these datasets, I want to apply na.locf to var1 and create a new variable 'var2' from the locf-applied 'var1'. I tried using the following code:
L=list(data01,data02)
for (i in L){i$var2 <- na.locf(i$var1)}
However, when I try to read the locf column using code:
head(data01$var2)
The result given is NULL.
There are a few problems:
- In the question, i is a copy of each data frame, so L is not changed. Index into L to ensure that it is the data frame in L that is changed.
- Use na.locf0 or, equivalently, na.locf(..., na.rm = FALSE) to ensure that the output is the same length as the input.
- The data01 and data02 in L are copies of data01 and data02, and modifying one does not modify the other. That is why you get NULL.
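The first and third points can be checked with a small base R sketch. The data frames d1 and d2 and the placeholder column var2 are made up for illustration, and no zoo functions are involved here:

```r
d1 <- data.frame(var1 = c(1, NA, 3))
d2 <- data.frame(var1 = c(NA, 5, NA))
L <- list(d1, d2)

# i is a copy of each element, so L is left unchanged
for (i in L) i$var2 <- 0
changed_by_copy <- "var2" %in% names(L[[1]])

# indexing into L modifies the data frames stored in the list
for (k in seq_along(L)) L[[k]]$var2 <- 0
changed_by_index <- "var2" %in% names(L[[1]])

# d1 itself is a separate object and still has no var2 column
original_untouched <- is.null(d1$var2)

print(c(changed_by_copy, changed_by_index, original_untouched))
```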
Using the built-in BOD data frame to construct sample input:
library(zoo)
# construct sample input
BOD1 <- BOD2 <- BOD
BOD1$Time[c(1, 3)] <- BOD2$Time[c(3, 5)] <- NA
L <- list(BOD1, BOD2)
for(i in seq_along(L)) L[[i]]$Time2 <- na.locf0(L[[i]]$Time)
giving:
str(L)
List of 2
$ :'data.frame': 6 obs. of 3 variables:
..$ Time : num [1:6] NA 2 NA 4 5 7
..$ demand: num [1:6] 8.3 10.3 19 16 15.6 19.8
..$ Time2 : num [1:6] NA 2 2 4 5 7
..- attr(*, "reference")= chr "A1.4, p. 270"
$ :'data.frame': 6 obs. of 3 variables:
..$ Time : num [1:6] 1 2 NA 4 NA 7
..$ demand: num [1:6] 8.3 10.3 19 16 15.6 19.8
..$ Time2 : num [1:6] 1 2 2 4 4 7
..- attr(*, "reference")= chr "A1.4, p. 270"
Any of these would also work and instead of modifying L produce a new list:
L2 <- lapply(L, function(x) { x$Time2 <- na.locf0(x$Time); x })
L3 <- lapply(L, transform, Time2 = na.locf0(Time))
If your aim is to modify BOD1 and BOD2 as opposed to creating a list with the modified BOD1 and BOD2 then the following would do that (although it is usually better to organize objects in a list if you intend to iterate over them rather than leave them loose in the global environment).
nms <- c("BOD1", "BOD2")
for(nm in nms) assign(nm, transform(get(nm), Time2 = na.locf0(Time)))
or
nms <- c("BOD1", "BOD2")
for(nm in nms) .GlobalEnv[[nm]]$Time2 <- na.locf0(.GlobalEnv[[nm]]$Time)
or other variations.

Extract multiple objects from list in R

I have some output from the vegan function specaccum. It is a list of 8 objects of varying lengths;
> str(SPECIES)
List of 8
$ call : language specaccum(comm = PRETEND.DATA, method = "rarefaction")
$ method : chr "rarefaction"
$ sites : num [1:5] 1 2 3 4 5
$ richness : num [1:5] 20.9 34.5 42.8 47.4 50
$ sd : num [1:5] 1.51 2.02 1.87 1.35 0
$ perm : NULL
$ individuals: num [1:5] 25 50 75 100 125
$ freq : num [1:50] 1 2 3 2 4 3 3 3 4 2 ...
- attr(*, "class")= chr "specaccum"
I want to extract three of the lists ('richness', 'sd' and 'individuals') and convert them to columns in a data frame. I have developed a workaround;
SPECIES.rich <- data.frame(SPECIES[["richness"]])
SPECIES.sd <- data.frame(SPECIES[["sd"]])
SPECIES.individuals <- data.frame(SPECIES[["individuals"]])
SPECIES.df <- cbind(SPECIES.rich, SPECIES.sd, SPECIES.individuals)
But this seems clumsy and protracted. I wonder if anyone could suggest a neater solution? (Should I be looking at something with lapply??) Thanks!
Example data to generate the specaccum output;
set.seed(100)
PRETEND.DATA <- matrix(sample(0:1, 250, replace = TRUE), 5, 50)
library(vegan)
SPECIES <- specaccum(PRETEND.DATA, method = "rarefaction")
We can put the names in a character vector and extract those elements in one step:
SPECIES.df <- data.frame(SPECIES[c("richness", "sd", "individuals")])
Another alternative, similar to akrun's, is:
ctoc1 = as.data.frame(cbind(SPECIES$richness, SPECIES$sd, SPECIES$individuals))
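Please note that the data.frame() approach raises an error if the column lengths differ and cannot be recycled to a common length; cbind() recycles shorter vectors instead, with a warning when the lengths are not multiples of each other.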
e.g.: SPECIES.df <- data.frame(SPECIES[c( "sd", "freq")])
Error in data.frame(richness = c(20.5549865665613, 33.5688503093388, 41.4708434700877, :
arguments imply differing number of rows:7, 47
If so, you can pad the shorter element with NAs using length():
length(SPECIES$sd) <- 47 # this will add NAs to increase the column length.
SPECIES.df <- data.frame(SPECIES[c("sd", "freq")])
SPECIES.df # data frame with 2 columns and 47 rows.
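The extraction and the padding can be tried without vegan on a plain list. In this sketch, acc is a hypothetical stand-in for the specaccum result (the richness, sd and individuals values are copied from the str() output in the question; the freq values are made up), and padded.df is an illustrative name:

```r
# mock list standing in for the specaccum output
acc <- list(method = "rarefaction",
            richness = c(20.9, 34.5, 42.8, 47.4, 50),
            sd = c(1.51, 2.02, 1.87, 1.35, 0),
            individuals = c(25, 50, 75, 100, 125),
            freq = rep(1:2, times = 25))   # made-up values, length 50

# select several elements by name and convert them in one step
acc.df <- data.frame(acc[c("richness", "sd", "individuals")])
str(acc.df)

# pad elements of different lengths with NA up to the longest one
keep <- acc[c("sd", "freq")]
n <- max(lengths(keep))
padded.df <- data.frame(lapply(keep, `length<-`, n))
str(padded.df)
```

Padding every element to the longest length makes the NA filling explicit, instead of relying on data.frame() either recycling the shorter vector or stopping with an error.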

Why does mutate change the variable type?

activity <- mutate(
activity, steps = ifelse(is.na(steps), lookup_mean(interval), steps))
The "steps" variable changes from an int to a list. I want it to stay an "int" so I can aggregate it (aggregate is failing because it is a list type).
Before:
> str(activity)
'data.frame': 17568 obs. of 3 variables:
$ steps : int NA NA NA NA NA NA NA NA NA NA ...
$ date : Factor w/ 61 levels "2012-10-01","2012-10-02",..: 1 1 1 1 1 1 1 1 1 1 ...
$ interval: int 0 5 10 15 20 25 30 35 40 45 ...
After:
> str(activity)
'data.frame': 17568 obs. of 3 variables:
$ steps :List of 17568
..$ : num 1.72
..$ : num 1.72
Lookup mean is defined here:
lookup_mean <- function(i) {
return(filter(daily_activity_pattern, interval == 0) %>% select(steps))
}
The problem is that lookup_mean returns a data frame (which is a list), so R casts each value in activity$steps to a list. lookup_mean should return the steps column as a plain vector instead:
lookup_mean <- function(i) {
interval <- filter(daily_activity_pattern, interval == 0) %>% select(steps)
return(interval$steps)
}
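The type change can be reproduced in base R without dplyr. This is a rough sketch under assumptions: lookup is a made-up stand-in for daily_activity_pattern, and the base subsetting with drop = FALSE plays the role of filter() followed by select(steps):

```r
# made-up lookup table standing in for daily_activity_pattern
lookup <- data.frame(interval = c(0, 5), steps = c(1.72, 0.34))

# a one-column data frame, comparable to what select(steps) returns
sel <- lookup[lookup$interval == 0, "steps", drop = FALSE]
print(is.list(sel))        # a data frame is a list

steps <- c(NA, 7L)

# passing the data frame to ifelse() turns the whole result into a list
as_list <- ifelse(is.na(steps), sel, steps)
str(as_list)

# passing the extracted column keeps the result an atomic vector
as_vec <- ifelse(is.na(steps), sel$steps, steps)
str(as_vec)
```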

Error: all entries of 'x' must be nonnegative and finite in fisher.test

I am running a Fisher Exact test on some contingency matrix in R. However, using this code:
for (class in 1:5) {
for (test in c("amp", "del")) {
prefisher <- read.table("prefisher.txt", sep="\t", row.names=1)
for (gene in rownames(prefisher)) {
genemat <- matrix(prefisher[gene,], ncol=2)
print(genemat)
result <- fisher.test(genemat)
write(paste(gene, result$estimate, result$p.value, sep = "\t"), "")
}
}
}
I am getting the following error:
[,1] [,2]
[1,] 1 0
[2,] 101 287
Error in fisher.test(genemat): all entries of 'x' must be nonnegative and finite
As you can see, the matrix genemat is nonnegative and finite.
str(genemat) returns:
List of 4
$ : int 1
$ : int 101
$ : int 0
$ : int 287
- attr(*, "dim")= int [1:2] 2 2
What am I doing wrong?
Thanks
You use matrix on a one-row data.frame, which results in a list with dimension attribute (i.e., a special kind of matrix). That's not what you intended. Use unlist to make the data.frame row an atomic vector first:
DF <- data.frame(a = 1, b = 101, c = 0, d = 287)
m <- matrix(DF, 2)
str(m)
# List of 4
# $ : num 1
# $ : num 101
# $ : num 0
# $ : num 287
# - attr(*, "dim")= int [1:2] 2 2
fisher.test(m)
#Error in fisher.test(m) :
# all entries of 'x' must be nonnegative and finite
m <- matrix(unlist(DF), 2)
fisher.test(m)
#no error
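Carried over to the loop from the question, the fix might look like the following sketch. The inline data frame replaces prefisher.txt, and its row names and counts are invented for illustration:

```r
# invented stand-in for the table read from prefisher.txt
prefisher <- data.frame(a = c(1, 3), b = c(101, 90),
                        c = c(0, 5), d = c(287, 250),
                        row.names = c("geneA", "geneB"))

for (gene in rownames(prefisher)) {
  # unlist() turns the one-row data frame into an atomic vector first
  genemat <- matrix(unlist(prefisher[gene, ]), ncol = 2)
  result <- fisher.test(genemat)
  cat(gene, result$estimate, result$p.value, sep = "\t")
  cat("\n")
}

# without unlist() the matrix is a list with a dim attribute
listmat <- matrix(prefisher["geneA", ], ncol = 2)
print(is.list(listmat))
```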

daply: Correct results, but confusing structure

I have a data.frame mydf, that contains data from 27 subjects. There are two predictors, congruent (2 levels) and offset (5 levels), so overall there are 10 conditions. Each of the 27 subjects was tested 20 times under each condition, resulting in a total of 10*27*20 = 5400 observations. RT is the response variable. The structure looks like this:
> str(mydf)
'data.frame': 5400 obs. of 4 variables:
$ subject : Factor w/ 27 levels "1","2","3","5",..: 1 1 1 1 1 1 1 1 1 1 ...
$ congruent: logi TRUE FALSE FALSE TRUE FALSE TRUE ...
$ offset : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 5 5 1 2 5 5 2 2 3 5 ...
$ RT : int 330 343 457 436 302 311 595 330 338 374 ...
I've used daply() to calculate the mean RT of each subject in each of the 10 conditions:
myarray <- daply(mydf, .(subject, congruent, offset), summarize, mean = mean(RT))
The result looks just the way I wanted, i.e. a 3d-array; so to speak 5 tables (one for each offset condition) that show the mean of each subject in the congruent=FALSE vs. the congruent=TRUE condition.
However if I check the structure of myarray, I get a confusing output:
List of 270
$ : num 417
$ : num 393
$ : num 364
$ : num 399
$ : num 374
...
# and so on
...
[list output truncated]
- attr(*, "dim")= int [1:3] 27 2 5
- attr(*, "dimnames")=List of 3
..$ subject : chr [1:27] "1" "2" "3" "5" ...
..$ congruent: chr [1:2] "FALSE" "TRUE"
..$ offset : chr [1:5] "1" "2" "3" "4" ...
This looks totally different from the structure of the prototypical ozone array from the plyr package, even though it's a very similar format (3 dimensions, only numerical values).
I want to compute some further summarizing information on this array, by means of aaply. Precisely, I want to calculate the difference between the congruent and the incongruent means for each subject and offset.
However, even the most basic application of aaply(), such as aaply(myarray,2,mean), returns nonsense output:
FALSE TRUE
NA NA
Warning messages:
1: In mean.default(piece, ...) :
argument is not numeric or logical: returning NA
2: In mean.default(piece, ...) :
argument is not numeric or logical: returning NA
I have no idea why the daply() function returns such weirdly structured output and thereby prevents any further use of aaply. Any help is appreciated; I frankly admit that I have hardly any experience with the plyr package.
Since you haven't included your data it's hard to know for sure, but I tried to make a dummy set off your str(). You can do what you want (I'm guessing) with two uses of ddply. First the means, then the difference of the means.
library(plyr)
#Make dummy data
mydf <- data.frame(subject = rep(1:5, each = 150),
congruent = rep(c(TRUE, FALSE), each = 75),
offset = rep(1:5, each = 15), RT = sample(300:500, 750, replace = TRUE))
#Make means
mydf.mean <- ddply(mydf, .(subject, congruent, offset), summarise, mean.RT = mean(RT))
#Calculate difference between congruent and incongruent
mydf.diff <- ddply(mydf.mean, .(subject, offset), summarise, diff.mean = diff(mean.RT))
head(mydf.diff)
# subject offset diff.mean
# 1 1 1 39.133333
# 2 1 2 9.200000
# 3 1 3 20.933333
# 4 1 4 -1.533333
# 5 1 5 -34.266667
# 6 2 1 -2.800000
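If a plain numeric 3d array is wanted, as in the question, base R's tapply() is one option. This is a sketch of an alternative that does not come from the answer above; the dummy data mirrors the answer's layout, and the names arr and cong_diff are illustrative:

```r
set.seed(1)
# dummy data: 5 subjects x 2 congruency levels x 5 offsets, 15 trials each
mydf <- data.frame(subject = rep(1:5, each = 150),
                   congruent = rep(rep(c(TRUE, FALSE), each = 75), times = 5),
                   offset = rep(rep(1:5, each = 15), times = 10),
                   RT = sample(300:500, 750, replace = TRUE))

# mean RT per cell as a numeric array with dims subject x congruent x offset
arr <- with(mydf, tapply(RT,
                         list(subject = subject,
                              congruent = congruent,
                              offset = offset),
                         mean))
str(arr)

# congruent minus incongruent mean for each subject and offset
cong_diff <- arr[, "TRUE", ] - arr[, "FALSE", ]
print(dim(cong_diff))
```

Because arr is an ordinary numeric array, functions such as apply() or mean() work on it directly.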
