I have several bins in my data frame.
[1] "bin1" "bin2" "bin3" "bin4" "bin5" "bin6"
I have a bin number, and would like to exclude everything EXCEPT that bin and the previous bin. If bin=1, I would like to exclude everything except bin1 (bin0 does not exist).
To produce a vector of names of bins to exclude later from my data frame, I produce:
BinsToDelete <- ifelse(i>1, paste("bin",1:6,sep="")[-((i-1):i)],paste("bin",1:6,sep="")[-i])
For ease of understanding
> i=3
> paste("bin",1:6,sep="")[-((i-1):i)]
[1] "bin1" "bin4" "bin5" "bin6"
> paste("bin",1:6,sep="")[-i]
[1] "bin1" "bin2" "bin4" "bin5" "bin6"
Weirdly an ifelse statement produces this:
> i=3
> BinsToDelete <- ifelse(i==1, paste("bin",1:6,sep="")[-i],paste("bin",1:6,sep="")[-((i-1):i)])
> BinsToDelete
[1] "bin1"
What happened there?
A normal if-else statement gives the desired results:
> if(i==1){
BinsToDelete <- paste("bin",1:6,sep="")[-i]
} else { BinsToDelete <- paste("bin",1:6,sep="")[-((i-1):i)]}
> BinsToDelete
[1] "bin1" "bin4" "bin5" "bin6"
Thanks for helping me understand how ifelse arrives at this result.
From ?ifelse
Value:
A vector of the same length and attributes (including dimensions
and ‘"class"’) as ‘test’ and data values from the values of ‘yes’
or ‘no’.
In your case:
> i <- 3
> length(i)
[1] 1
So you get a length-1 output.
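To make the length rule concrete, here is a minimal sketch using the bins from the question. The names `bins`, `short` and `bins_to_delete` are only illustrative. It relies on the fact that `if`/`else` is an expression in R, so its value can be assigned directly.

```r
i <- 3
bins <- paste0("bin", 1:6)

# ifelse() is vectorised over `test`: a length-1 test gives a length-1 result,
# so only the first element of the selected branch survives.
short <- ifelse(i == 1, bins[-i], bins[-((i - 1):i)])
print(short)

# if/else evaluates exactly one branch and returns it whole.
bins_to_delete <- if (i == 1) bins[-i] else bins[-((i - 1):i)]
print(bins_to_delete)
```

As a rule of thumb, `ifelse()` suits element-wise choices over a vector, while a scalar condition that picks between two whole objects is better served by `if`/`else`.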
I generally avoid ifelse when possible because I think the resulting code is aesthetically unpleasing. The fact that it removes class attributes and makes handling factors, Dates and date-times difficult is a further reason to avoid it. The grep function is designed to return a vector suitable for indexed selection:
> x <- paste0("bin", 1:6)
> z=3
> grep( paste0("bin",(z-1):z, collapse="|") , x)
[1] 2 3
> x[ grep( paste0("bin",(z-1):z, collapse="|") , x)]
[1] "bin2" "bin3"
> z=1
> x[ grep( paste0("bin",(z-1):z, collapse="|") , x)]
[1] "bin1"
My understanding is that the dplyr if_else function addresses some of those issues.
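The point about ifelse dropping class attributes can be seen with a small base-R sketch. The dates and the cutoff here are made up for illustration. The second half shows one possible workaround, assigning by logical index, which keeps the Date class.

```r
d <- as.Date(c("2020-01-01", "2021-06-15"))
cutoff <- as.Date("2020-12-31")

# ifelse() builds its result from the logical `test`, so the Date class is lost.
res <- ifelse(d > cutoff, d, as.Date(NA))
print(class(d))
print(class(res))

# Workaround sketch: copy the vector and overwrite by logical index instead.
res2 <- d
res2[!(d > cutoff)] <- NA
print(class(res2))
```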
I am implementing a replacement for the subset operator in an S3 class. I followed the advice on
How to define the subset operators for a S4 class?
However, I have a specific problem: how do I distinguish in R code whether someone wrote x[i] or x[i,]? In both cases, the variable j just comes back missing.
setOldClass("myclass")
'[.myclass' <- function(x, i, j, ..., drop=TRUE) {
print(missing(j))
return(invisible(NULL))
}
And as a result I get:
x <- structure(list(), class="myclass")
> x[i]
[1] TRUE
> x[i,]
[1] TRUE
> x[i,j]
[1] FALSE
I don't see how to distinguish between the two. I assume the internal C code does it by looking at the length of the argument pairlist, but is there a way to do the same in native R?
Thanks!
From alexis_laz's comment:
See, perhaps, how [.data.frame handles arguments and nargs()
Inside the function, call nargs() to see how many arguments were supplied, including empty ones.
> myfunc = function(i, j, ...) {
+ nargs()
+ }
>
> myfunc()
[1] 0
> myfunc(, )
[1] 2
> myfunc(, , )
[1] 3
> myfunc(1)
[1] 1
> myfunc(1, )
[1] 2
> myfunc(, 1)
[1] 2
> myfunc(1, 1)
[1] 2
This should be enough to help you figure out which arguments were passed in the same fashion as [.data.frame.
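Applied to the method from the question, the idea might look like the sketch below. The return strings and the `n_index` variable are purely illustrative. The subtraction for `drop` is an assumption about how you want a named `drop =` argument to be counted.

```r
`[.myclass` <- function(x, i, j, ..., drop = TRUE) {
  # nargs() counts x plus every index slot, including empty ones such as
  # the one after the comma in x[i, ]. Remove x and a named `drop` if given.
  n_index <- nargs() - 1L - (!missing(drop))
  if (n_index <= 1L) "vector-style: x[i]" else "matrix-style: x[i, ] or x[i, j]"
}

x <- structure(list(), class = "myclass")
i <- 1
a <- x[i]
b <- x[i, ]
d <- x[i, drop = FALSE]
print(c(a, b, d))
```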
I have a list of 310 data.frames, mrns[[i]], that I am subsetting based on the value of a factor, mrns[[i]]$ar.cat. I am able to use subset on them all in a way that those data.frames that don't match the condition are left with 0 observations, but I would like the code to just remove these data.frames rather than leave them in the new list as empty.
My code is:
arlow <- lapply(mrns, function(x) subset(x, x$ar.cat[1] == "Arousals Index: LOW"))
Which gives me:
length(arlow)
[1] 310
When I see the contents of the arlow list, I see this for the data.frames that don't meet the condition:
[[98]]
[1] raw.Number raw.Reading_Status raw.Month raw.Day raw.Year
[6] raw.Hour raw.Minute raw.Systolic raw.Diastolic raw.MAP
[11] raw.PP raw.HR raw.Event_Code raw.Edit_Status raw.Diary_Activity
[16] na.strings raw.facility raw.lastname raw.firstname raw.id
[21] raw.hookup raw.datetime raw.mrn unis ar.value
[26] ar.cat ID avg.hr.prhr avg.sys.prhr avg.dias.prhr
[31] avg.map.prhr avg.pp.prhr time time_60 raw.Minutee
<0 rows> (or 0-length row.names)
Let's say that the x$ar.cat[1] == "Arousals Index: LOW" condition is only met in 180 of my 310 mrns[[i]] data.frames, I would want the result of length(arlow) to equal 180.
Anyone have any suggestions on how to remove those data.frames not matching the condition?
Thanks!
How about this:
arlow <- Filter(function(y) nrow(y) > 0, lapply(mrns, function(x) subset(x, x$ar.cat[1] == "Arousals Index: LOW")))
First apply the filter you already have, then keep only the data frames that still contain rows.
So you want to remove the NULLs from arlow?
Try:
arlow <- arlow[!sapply(arlow, is.null)]
As in:
lst <- list(data.frame(x=1:10,y=rnorm(10)), NULL, data.frame(x=1:10,y=rnorm(10)))
length(lst)
# [1] 3
result <- lst[!sapply(lst, is.null)]
length(result)
# [1] 2
Here's another way:
result <- Filter(Negate(is.null), lst)
length(result)
# [1] 2
edit: Actually, my answer does not make much sense. I did not do the subsetting in each data frame that you want. I still think which() is useful to subset without NA and NULL values, though.
mrns[which(sapply(seq_along(mrns), function(x) mrns[[x]]$ar.cat[1] == "Arousals Index: LOW"))]
This solution tests whether the category (ar.cat) has the value "Arousals Index: LOW" for each data frame in your list of data frames. The resulting vector should have 310 elements, where elements that met the condition are TRUE.
Now we use which() to get the indices of the true values. These indices should ignore any NULL or NA values that occur in the vector we produced.
As a last step we subset the list of data frames with the indices we want.
Thanks everyone for your responses! I added the following code, and it gave me what I was looking for.
> arlow <- arlow[sapply(arlow, function(x) dim(x)[1]) > 0]
> length(arlow)
[1] 103
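For comparison, the test can also be done once per data frame and used to index the list directly, so that empty data frames are never created. This is a minimal sketch on a made-up three-element `mrns`. The column `v` and the variable `keep` are hypothetical. `isTRUE()` guards against an NA when a data frame has no rows.

```r
# Toy stand-in for the real list of 310 data frames.
mrns <- list(
  data.frame(ar.cat = rep("Arousals Index: LOW", 3), v = 1:3),
  data.frame(ar.cat = rep("Arousals Index: HIGH", 2), v = 1:2),
  data.frame(ar.cat = rep("Arousals Index: LOW", 4), v = 1:4)
)

# One TRUE/FALSE per data frame, based on the first value of ar.cat.
keep <- vapply(
  mrns,
  function(x) isTRUE(x$ar.cat[1] == "Arousals Index: LOW"),
  logical(1)
)

arlow <- mrns[keep]
print(length(arlow))
```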
I want to compare two vectors but it is not working, kindly tell me how two vectors can be compared:
x <- c(1,2,3,4)
y <- c(5,6,7,8)
if (x==y) print("same") else print("different")
Using all() works here.
> all(x==y)
[1] FALSE
> y1=c(5,6,7,8)
> all(y==y1)
[1] TRUE
EDIT
The best approach is to use isTRUE(all.equal(x,y)), which avoids the recycling problem.
Recycling:
> x=c(5,6,5,6)
> y=c(5,6)
> all(x==y)
[1] TRUE
A better way, with the original x and y restored:
> x=c(1,2,3,4)
> y=c(5,6,7,8)
> isTRUE(all.equal(x,y))
[1] FALSE
> isTRUE(all.equal(y,y1))
[1] TRUE
> x=c(5,6,5,6)
> y=c(5,6)
> isTRUE(all.equal(x,y))
[1] FALSE
When it comes to array comparison, all and any are your friends. If you mean an unordered collection of values rather than a geometric vector, you should also sort both sides before comparing:
> all(sort(x)==sort(y))
Try:
x <- c(1,2,3,4)
y <- c(5,6,7,8)
if(identical(x,y)) print("identical") else print("not identical")
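The approaches in this thread differ in how strict they are. The sketch below puts them side by side. The variables `a` and `b` are an added floating-point example, not part of the original question.

```r
x <- c(1, 2, 3, 4)
y <- c(5, 6, 7, 8)

# `==` compares element by element and returns one logical per element,
# which is why it cannot be used directly as an if() condition.
elementwise <- x == y
print(elementwise)

# identical() returns a single TRUE/FALSE and requires an exact match.
same_exact <- identical(x, y)

# all.equal() allows a small numeric tolerance; wrap it in isTRUE()
# because it returns a description of the differences when they exist.
a <- 0.1 + 0.2
b <- 0.3
exact_float <- identical(a, b)
near_float <- isTRUE(all.equal(a, b))
print(c(same_exact, exact_float, near_float))
```

Which one fits depends on whether tiny floating-point differences should count as "different".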
Let's say I have a vector where I've set a few attributes:
vec <- sample(50:100,1000, replace=TRUE)
attr(vec, "someattr") <- "Hello World"
When I subset the vector, the attributes are dropped. For example:
tmp.vec <- vec[which(vec > 80)]
attributes(tmp.vec) # Now NULL
Is there a way to subset and persist attributes without having to save them to another temporary object?
Bonus: Where would one find documentation of this behaviour?
I would write a method for [ or subset() (depending on how you are subsetting) and arrange for it to preserve the attributes. That also requires adding a "class" attribute to your vector so that method dispatch occurs.
vec <- 1:10
attr(vec, "someattr") <- "Hello World"
class(vec) <- "foo"
At this point, subsetting removes attributes:
> vec[1:5]
[1] 1 2 3 4 5
If we add a method [.foo we can preserve the attributes:
`[.foo` <- function(x, i, ...) {
attrs <- attributes(x)
out <- unclass(x)
out <- out[i]
attributes(out) <- attrs
out
}
Now the attributes are preserved:
> vec[1:5]
[1] 1 2 3 4 5
attr(,"someattr")
[1] "Hello World"
attr(,"class")
[1] "foo"
And the answer to the bonus question:
From ?"[" in the details section:
Subsetting (except by an empty index) will drop all attributes except names, dim and dimnames.
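If adding a class is not desirable, a plain helper function is another option. The sketch below uses a hypothetical helper named `subset_keep_attrs`. It lets `[` handle names, dim and dimnames as usual and copies only the remaining attributes across, so that names stay aligned with the subsetted values.

```r
# Hypothetical helper: subset x by i and carry over custom attributes.
subset_keep_attrs <- function(x, i) {
  out <- x[i]
  extra <- attributes(x)
  # names, dim and dimnames are already handled by `[` itself.
  extra <- extra[setdiff(names(extra), c("names", "dim", "dimnames"))]
  attributes(out) <- c(attributes(out), extra)
  out
}

vec <- c(a = 55, b = 90, c = 85, d = 60)
attr(vec, "someattr") <- "Hello World"

tmp <- subset_keep_attrs(vec, vec > 80)
print(attributes(tmp))
```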
Thanks to a similar answer to my question by @G. Grothendieck, you can use collapse::fsubset.
library(collapse)
#tmp_vec <- fsubset(vec, vec > 80)
tmp_vec <- sbt(vec, vec > 80) # Shortcut for fsubset
attributes(tmp_vec)
# $someattr
# [1] "Hello World"
I managed to do the following:
stuff <- c("banana_fruit","apple_fruit","coin","key","crap")
fruits <- stuff[stuff %in% grep("fruit",stuff,value=TRUE)]
but I can't select the not-so-healthy stuff. The usual ideas, like
no_fruit <- stuff[stuff %not in% grep("fruit",stuff,value=TRUE)]
#or
no_fruit <- stuff[-c(stuff %in% grep("fruit",stuff,value=TRUE))]
don't work. The latter just ignores the "-"
> stuff[grep("fruit",stuff)]
[1] "banana_fruit" "apple_fruit"
> stuff[-grep("fruit",stuff)]
[1] "coin" "key" "crap"
You can only use negative subscripts with numeric/integer vectors, not logical ones, because:
> -TRUE
[1] -1
If you want to negate a logical vector, use !:
> !TRUE
[1] FALSE
As Joshua mentioned: you can't use - to negate your logical index; use ! instead.
stuff[!(stuff %in% grep("fruit",stuff,value=TRUE))]
See also the stringr package for this kind of thing.
library(stringr)
stuff[!str_detect(stuff, "fruit")]
There is also a parameter called 'invert' in grep that does essentially what you're looking for:
> stuff <- c("banana_fruit","apple_fruit","coin","key","crap")
> fruits <- stuff[stuff %in% grep("fruit",stuff,value=TRUE)]
> fruits
[1] "banana_fruit" "apple_fruit"
> grep("fruit", stuff, value = TRUE)
[1] "banana_fruit" "apple_fruit"
> grep("fruit", stuff, value = TRUE, invert = TRUE)
[1] "coin" "key" "crap"
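One more base-R variant, sketched here, is to use grepl(), which returns a logical vector that can be negated with `!`. It also avoids a pitfall of the negative-index form: when nothing matches, `-grep(...)` is an empty index and selects nothing. The pattern "xyz" is just an arbitrary non-matching example.

```r
stuff <- c("banana_fruit", "apple_fruit", "coin", "key", "crap")

# Logical index: TRUE where the pattern matches, negated with `!`.
is_fruit <- grepl("fruit", stuff)
no_fruit <- stuff[!is_fruit]
print(no_fruit)

# Pitfall: with no matches, grep() returns integer(0), and a negated
# empty index selects nothing rather than everything.
none_neg <- stuff[-grep("xyz", stuff)]
none_lgl <- stuff[!grepl("xyz", stuff)]
print(length(none_neg))
print(length(none_lgl))
```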