Understanding output of the seq function - r

When using the seq function, I get the following outputs:
>seq(1,4)
1 2 3 4
and this retrieves the second element from the sequence
>seq(1,4) [2]
2
These two I understand. However, I don't understand why the following yields four NA values
>seq(1,4) [NA]
NA NA NA NA
But the example below does not produce four values; it returns just one NA
>seq(1,4) ["ABC"]
NA
Why is this happening?

What is important here is that NA is logical:
class(NA)
## [1] "logical"
and logical indices are always recycled to the length of the vector being subset:
seq(1, 4)[c(TRUE, FALSE)]
## [1] 1 3
If you use an integer NA then this won't happen:
seq(1, 4)[NA_integer_]
## [1] NA

I don't think it has anything to do with the seq function. If you subset any vector using NA, you get back a vector of NAs of the same length.
a <- c(1, 2)
a[NA]
## [1] NA NA
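The "ABC" case from the question can be explained the same way: the type of the index decides how it is interpreted. Below is a minimal sketch comparing the three index types; the variable names and the `named` vector are made up for illustration.

```r
x <- seq(1, 4)

by_logical   <- x[NA]           # logical NA, recycled to length(x)
by_integer   <- x[NA_integer_]  # a single integer index, so a single result
by_character <- x["ABC"]        # looked up in names(x); x has no names

print(by_logical)    # [1] NA NA NA NA
print(by_integer)    # [1] NA
print(by_character)  # [1] NA

# With names present, the same character index finds an element.
# `named` is an illustrative vector, not part of the question.
named <- c(A = 1, B = 2, ABC = 3)
print(named["ABC"])
```

A character index is never recycled; each string is matched against the names, and an unmatched name yields one NA.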

Related

Subsetting rows from a data frame in R using []

To subset rows from a data frame, inserting the condition in the first part of [ , ] seems to be the reference method, and inserting this condition inside "which()" seems to be useless.
However, in the presence of missing data, why is the first method not working, while the "which method" does, as in the following example?
df <- data.frame(var1=c(1,2,3,NA,NA), var2=c(4,0,5,2,3), var3=c(1,2,3,0,6))
testvar1<-df[df$var1==3,]
testvar1.which<-df[which(df$var1==3),]
testvar1
     var1 var2 var3
3       3    5    3
NA     NA   NA   NA
NA.1   NA   NA   NA
testvar1.which
  var1 var2 var3
3    3    5    3
The simple answer is that which suppresses NA values by default, whereas a straightforward logical test will return a vector of the same length as the input with NA preserved. Compare:
df$var1 == 3
#> [1] FALSE FALSE TRUE NA NA
which(df$var1 == 3)
#> [1] 3
If you subset the data frame with the first result, the first two rows are dropped as expected (because they correspond to FALSE) and the third row is kept because it is TRUE, which is also expected. The last two rows are where the confusion comes in. If you subset a data frame with an NA, you don't get a NULL result, you get an NA result, which is different. The two rows at the bottom are NA rows, which you get if you subset a data frame with NA values.
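As a rough sketch of alternatives to which(), the same rows can be selected by making the condition NA-free before subsetting. The data frame is the one from the question; the result names (`keep`, `by_mask`, `by_in`, `by_subset`) are chosen here for illustration.

```r
df <- data.frame(var1 = c(1, 2, 3, NA, NA),
                 var2 = c(4, 0, 5, 2, 3),
                 var3 = c(1, 2, 3, 0, 6))

# Turn NA into FALSE explicitly before using the logical vector as an index.
keep <- !is.na(df$var1) & df$var1 == 3
by_mask <- df[keep, ]

# %in% never returns NA, so it can be used directly.
by_in <- df[df$var1 %in% 3, ]

# subset() treats NA in the condition as FALSE.
by_subset <- subset(df, var1 == 3)

print(by_mask)
```

Each of these returns only row 3, matching the which() result.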

NA Remove to calculation

I have a problem with NA values. The columns in my dataset from Excel do not all have the same length, so the shorter ones show NA. When I calculate the similarity index with the Psicalc function in the RInSp package, every row containing an NA value is deleted.
B F
4 7
5 6
6 8
7 5
NA 4
NA 3
NA 2
Do you know how to handle the NA values, or remove them, without deleting entire rows and without affecting the package? Also, when I call import.RinSP I get this message:
In if (class(filename) == "character") { :
the condition has length > 1 and only the first element will be used
Thank you so much
Many R functions (especially in base R) have an na.rm argument, which is FALSE by default. That means if you omit this argument and your data contains NA, your calculation will result in NA. To ignore the NA values in the calculation, include the na.rm argument and set it to TRUE.
Example:
x <- c(4,5,6,7,NA,NA)
mean(x) # Oops!
[1] NA
mean(x, na.rm=TRUE)
[1] 5.5
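Applied to the two columns from the question, a small sketch might look like the following. It only illustrates base R behaviour and makes no claim about how RInSp itself treats NA; the name `dat` and the derived variables are assumptions for this example.

```r
# The B and F columns from the question, typed in by hand.
dat <- data.frame(B = c(4, 5, 6, 7, NA, NA, NA),
                  F = c(7, 6, 8, 5, 4, 3, 2))

means_default <- colMeans(dat)                # B is NA, F is 5
means_na_rm   <- colMeans(dat, na.rm = TRUE)  # B is 5.5, F is 5

# na.omit() works row-wise: any row with an NA is dropped entirely.
complete_rows <- nrow(na.omit(dat))           # 4

# Counting non-missing values per column keeps every row in place.
n_per_column <- sapply(dat, function(col) sum(!is.na(col)))

print(means_na_rm)
print(n_per_column)
```

The contrast between na.rm (per calculation) and na.omit (per row) is the reason whole rows can disappear when a function removes incomplete cases.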

ifelse r - x and y lengths differ

I'm trying to use an ifelse on an array called "OutComes" but it's giving me some trouble.
> PersonNumber Risk_Factor OC_Death OnsetAge Clinical CS_Death Cure AC_Death
>[1,] 1 1 99.69098 NA NA NA NA NA
>[2,] 2 1 60.68009 NA NA NA NA NA
>[3,] 3 0 88.67483 NA NA NA NA NA
>[4,] 4 0 87.60846 NA NA NA NA NA
>[5,] 5 0 78.23118 NA NA NA NA NA
Now I will try to use an apply to analyse this table's Risk_Factor Column and apply one of two functions to replace the OnsetAge column's NA's.
I've been using an apply function -
apply(OutComes, 1, function(x) ifelse(OutComes[,"Risk_Factor"] == 1,
HighOnsetFunction(x), OnsetFunction(x)))
However, this won't work, because the ifelse itself doesn't work. The error is:
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
I'm not sure what's going on in this ifelse or what the x and y lengths are.
There is a mistake in your apply function. You are applying a function with argument x (one row of OutComes), but then within ifelse you use the vector OutComes[,"Risk_Factor"], which is a whole column of the original matrix, not a single number. One simple solution is to do
apply(OutComes, 1, function(x) ifelse(x["Risk_Factor"] == 1,
HighOnsetFunction(x), OnsetFunction(x)))
But when dealing with a scalar, there is no real need to use ifelse, so it may be more efficient to write
apply(OutComes, 1, function(x) if (x["Risk_Factor"] == 1) HighOnsetFunction(x) else OnsetFunction(x))
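Here is a self-contained sketch of the row-wise version. HighOnsetFunction and OnsetFunction are not shown in the question, so the two definitions below are hypothetical stand-ins that return fixed values, and the matrix is a cut-down version of OutComes.

```r
# Hypothetical stand-ins: the asker's real functions are not shown.
HighOnsetFunction <- function(row) 40
OnsetFunction     <- function(row) 60

# Reduced version of the OutComes matrix from the question.
OutComes <- cbind(PersonNumber = 1:5,
                  Risk_Factor  = c(1, 1, 0, 0, 0),
                  OnsetAge     = NA_real_)

# x is one row (a named numeric vector), so x["Risk_Factor"] is a single value
# and a plain if/else is enough.
OutComes[, "OnsetAge"] <- apply(OutComes, 1, function(x) {
  if (x["Risk_Factor"] == 1) HighOnsetFunction(x) else OnsetFunction(x)
})

print(OutComes)
```

Because the condition is a single value per row, apply returns one number per row, which can be written straight back into the OnsetAge column.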

Difference between intersect and match in R

I am trying to understand the difference between match and intersect in R. Both return the same output in a different format. Are there any functional differences between both?
match(names(set1), names(set2))
# [1] NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 11
intersect(names(set1), names(set2))
# [1] "Year" "ID"
match(a, b) returns an integer vector of length(a), with the i-th element giving the first position j such that a[i] == b[j]. NA is produced by default when there is no match (you can customize this with the nomatch argument).
If you want to get the same result as intersect(a, b), use either of the following:
b[na.omit(match(a, b))]
a[na.omit(match(b, a))]
Example
a <- 1:5
b <- 2:6
b[na.omit(match(a, b))]
# [1] 2 3 4 5
a[na.omit(match(b, a))]
# [1] 2 3 4 5
I just wanted to know if there are any other differences between the two. I was able to understand the results myself.
Then let's read the source code:
intersect
#function (x, y)
#{
# y <- as.vector(y)
# unique(y[match(as.vector(x), y, 0L)])
#}
It turns out that intersect is written in terms of match!
Haha, looks like I forgot the unique on the outside. Also, by setting nomatch = 0L we can get rid of na.omit. Well, R core is more efficient than my guess.
Follow-up
We could also use
a[a %in% b] ## need a `unique`, too
b[b %in% a] ## need a `unique`, too
However, have a read on ?match. In "Details" we can see how "%in%" is defined:
"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0
So, yes, everything is written using match.
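The role of unique becomes visible once the inputs contain duplicates. The vectors below are made up for illustration; the sketch compares the three approaches discussed above.

```r
# Illustrative inputs with repeated values.
a <- c(1, 2, 2, 3, 7)
b <- c(2, 3, 3, 4)

positions <- match(a, b)   # NA 1 1 2 NA: first position of each a[i] in b

via_intersect <- intersect(a, b)
via_match     <- unique(b[match(a, b, nomatch = 0L)])  # index 0 selects nothing
via_in        <- unique(a[a %in% b])

print(positions)
print(via_intersect)   # 2 3
print(via_match)       # 2 3
print(via_in)          # 2 3
```

Without unique, both b[match(a, b, nomatch = 0L)] and a[a %in% b] would return 2 2 3 for these inputs.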

Indexing integer vector with NA

I have problems understanding this. I have an integer vector of length 5:
x <- 1:5
If I index it with a single NA, the result is of length 5:
x[NA]
# [1] NA NA NA NA NA
My first idea was that R checks whether 1-5 is NA but
x <- c(NA, 2, 4)
x[NA]
# [1] NA NA NA
So this cannot be the explanation. My second guess is that x[NA] is indexing, but then I do not understand:
Why this gives me five NAs.
What NA as an index means. x[1] gives you the first value, but what should be the result of x[NA]?
Compare your code:
> x <- 1:5; x[NA]
[1] NA NA NA NA NA
with
> x <- 1:5; x[NA_integer_]
[1] NA
In the first case, NA is of type logical (as class(NA) shows), whereas in the second it is an integer. From ?"[" you can see that when i is logical, it is recycled to the length of x:
For [-indexing only: i, j, ... can be logical vectors, indicating
elements/slices to select. Such vectors are recycled if necessary to
match the corresponding extent. i, j, ... can also be negative
integers, indicating elements/slices to leave out of the selection.
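The recycling described in that passage can be checked directly. This is a small sketch; the result names are chosen here for illustration.

```r
x <- 1:5

# A single logical NA behaves like a full-length vector of logical NAs.
same_as_full <- identical(x[NA], x[rep(NA, length(x))])

# A length-2 logical index is recycled to TRUE NA TRUE NA TRUE.
mixed <- x[c(TRUE, NA)]

# Integer indices are positions and are not recycled: one result per index.
by_position <- x[c(NA_integer_, 2L)]

print(same_as_full)  # TRUE
print(mixed)         # 1 NA 3 NA 5
print(by_position)   # NA 2
```

An NA in a logical index means "unknown whether to select this element", so the result at that position is NA rather than a dropped value.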
