I have a data frame with numeric, integer and string columns. I would like to check which columns are integers, so I do:
raw<-read.csv('./rawcorpus.csv',head=F)
ints<-sapply(raw,is.integer)
However, this gives me all FALSE, so I have to make a small change:
nums<-sapply(raw,is.numeric)
ints2<-sapply(raw[,nums],function(col){return(!(sum(col%%1)==0))})
The second case works fine. My question is: what does the is.integer function actually check?
By default, R stores all numbers as double-precision floating-point values, i.e. as numeric. Three useful functions, class, typeof and storage.mode, will tell you how a value is stored. Try:
x <- 1
class(x)
typeof(x)
storage.mode(x)
If you want x to be the integer 1, use the suffix L:
x <- 1L
class(x)
typeof(x)
storage.mode(x)
Or, you can cast numeric to integers by:
x <- as.integer(1)
class(x)
typeof(x)
storage.mode(x)
The is.integer function checks whether the storage mode is integer or not. Compare
is.integer(1)
is.integer(1L)
You should be aware that some functions actually return numeric, even if you expect them to return integer. These include round, floor and ceiling, as well as the modulo operator %% when it is applied to doubles.
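A quick way to check this, as a minimal sketch with arbitrary values: rounding a double still gives a double, and only an explicit as.integer() call changes the storage type. Note that %% applied to two integer operands does stay integer.

```r
# Rounding a double returns a double, even though the value is whole
r <- round(2.7)
typeof(r)         # "double"
is.integer(r)     # FALSE

# The modulo of doubles is a double; the modulo of integers stays integer
typeof(5 %% 2)    # "double"
typeof(5L %% 2L)  # "integer"

# An explicit conversion changes the storage type
r_int <- as.integer(r)
is.integer(r_int) # TRUE
```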
From R documentation:
is.integer(x) does not test if x contains integer numbers! For that, use round, as in the function is.wholenumber(x) in the examples.
So is.integer(x) tests how x is stored, not which values it holds: it returns TRUE only when x has type integer. A vector of whole numbers that is stored as double, such as c(1, 2, 3), gives FALSE.
Hope that helps
Source: https://stat.ethz.ch/R-manual/R-devel/library/base/html/integer.html
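Applied to the data frame from the question, the documentation's suggestion could look roughly like this. It is only a sketch: the data frame below is made up to stand in for the CSV, and is.wholenumber is adapted from the examples on the linked help page.

```r
# Hypothetical data standing in for the CSV from the question
raw <- data.frame(
  a = c(1, 2, 3),        # whole numbers stored as double
  b = c(1.5, 2, 3),      # contains a fractional value
  c = c("x", "y", "z"),  # strings
  d = 1:3,               # stored as integer
  stringsAsFactors = FALSE
)

# Tolerance-based whole-number test, adapted from the ?integer examples
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5) {
  abs(x - round(x)) < tol
}

# Which columns are stored as integer? Only d.
stored_as_int <- sapply(raw, is.integer)

# Which numeric columns hold only whole numbers? a and d.
nums <- sapply(raw, is.numeric)
whole <- sapply(raw[, nums, drop = FALSE],
                function(col) all(is.wholenumber(col)))
```

Here is.integer answers "how is the column stored?", while is.wholenumber answers "are the values whole?".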
Suppose I want to do something like:
mask_values <- function(x, mask) ifelse(mask, x, NA)
The purpose of this function is to take a vector and replace some of its values with NA based on the value of mask. However, this function doesn't guarantee that the return type is always the same as the input x. For example:
date_vec <- rep(lubridate::today(), 10)
my_mask <- rep(c(TRUE, FALSE), length.out = 10)
class(mask_values(date_vec, my_mask))
which yields "numeric" rather than the desired "Date". So I try switching to dplyr::if_else, which is supposed to preserve types:
mask_values <- function(x, mask) dplyr::if_else(mask, x, NA)
class(mask_values(date_vec, my_mask))
However, if_else also requires the input types to be the same as each other, and NA has type "logical", which means I get this error:
Error: `false` must be a `Date` object, not a logical vector.
So it seems that if I want to use if_else in order to preserve the input type, I need to be able to obtain an NA value with the same class as the input. Is there a reliable way to do this for any class? One possibility seems to be x[NA], but I'm not sure if that is a universal solution or if it just happens to work with the examples that I've tested. You can assume that the only classes that matter are "vector-like" classes for which NA values exist, such as Date and POSIXct, as well as all the basic R data types (logical, character, numeric, etc.).
Alternatively, is there another way to implement my mask_values function such that the return value always has the same type as x?
I recommend avoiding ifelse whenever possible. It is quite inefficient and as you have seen also quirky regarding what it returns (although that is well documented). I rarely use it and, if I do use it, only for interactive use and not programmatically.
The canonical and safe way of setting values to NA in base R is is.na<-. (Note that it supports logical and positional indexing. mask could also be a numeric vector.)
mask_values <- function(x, mask) {
  is.na(x) <- mask
  x
}
#or simply this:
#mask_values <- `is.na<-`
#i.e., `is.na<-` is already what you want.
class(mask_values(date_vec, my_mask))
#[1] "Date"
Alternatively, you can also use simple subset-assignment. A bare NA is a logical value (typed variants such as NA_real_ also exist, and NA can be coerced to other types). If you assign a logical value into a vector of any other type, it is coerced to that vector's type, because "logical" is the lowest type in R's coercion hierarchy.
mask_values <- function(x, mask) {
  x[mask] <- NA
  x
}
class(mask_values(date_vec, my_mask))
#[1] "Date"
Btw., this subset-assignment is how the is.na<-.default method is defined.
I prefer doing subset-assignment explicitly in my code but occasionally the convenience function replace can be useful.
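For completeness, here is a sketch of the replace variant. Two assumptions are mine: Sys.Date() stands in for lubridate::today() so the example only needs base R, and the mask is negated so that values are kept where mask is TRUE, matching the ifelse(mask, x, NA) semantics from the question (the versions above set NA where mask is TRUE).

```r
date_vec <- rep(Sys.Date(), 10)   # base R stand-in for lubridate::today()
my_mask <- rep(c(TRUE, FALSE), length.out = 10)

# Keep x where mask is TRUE and set NA elsewhere
mask_values <- function(x, mask) replace(x, !mask, NA)

res <- mask_values(date_vec, my_mask)
class(res)    # "Date"
is.na(res)    # TRUE wherever my_mask is FALSE

# Indexing with NA_integer_ yields a single NA that keeps the class of x
class(date_vec[NA_integer_])  # "Date"
```

replace is a thin wrapper around the same subset-assignment, so the class of x is preserved in the same way.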
Why does dim return NULL for lists in general?
Also, why are vectors not considered 1D matrices, given that dim returns NULL for them as well?
Instead of m*n, why is dim NULL in the case of lists?
numbers<-c(1,2,1,1,2,3,1,1,1)
dim(numbers)<-c(3,3)
dim(lapply(numbers[1,], sum))
# NULL
Since the list has 3 elements, why is it not 3*1?
The default dim() function returns the dim attribute of an object, which needs to be a vector of integer values. Lists can have dim attributes, though they usually don't. For example,
x <- array(as.list(1:10), c(5, 2))
dim(x)
# [1] 5 2
typeof(x)
# [1] "list"
typeof(x[1,1])
# [1] "list"
Note that dim() is internally a generic function, so you can define S3 methods for it. For example,
x <- structure(1, class = "foobar")
dim.foobar <- function(x) 1:3
dim(x)
# [1] 1 2 3
For dataframes, the dim.data.frame method is called.
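Coming back to the example in the question, a short sketch shows that the list returned by lapply simply has no dim attribute, and that one can be attached by hand if a 3*1 shape is wanted:

```r
numbers <- c(1, 2, 1, 1, 2, 3, 1, 1, 1)
dim(numbers) <- c(3, 3)

res <- lapply(numbers[1, ], sum)
dim(res)      # NULL: lapply returns a plain list with no dim attribute
length(res)   # 3

dim(res) <- c(3, 1)   # attach a dim attribute explicitly
dim(res)      # 3 1
typeof(res)   # "list"
```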
You asked why vectors aren't considered to be 1D matrices. In R notation, matrices always have 2 dimensions; things with other numbers of dimensions are arrays. The reason that vectors aren't considered to be 1D arrays is that there's not enough gained from it. Matrices are vectors with dimension, so if vectors were arrays, you'd have two dimensions to keep track of.
However, when multiplying a matrix times a vector it's useful for vectors to be considered to be either row or column matrices, i.e. arrays with two dimensions where one dimension is 1. R automatically treats vectors in the appropriate way when this is done, following the usual rules from linear algebra.
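Both points can be checked with a small sketch (the values are arbitrary): a plain vector has no dim, a one-dimensional array does but is not a matrix, and %*% treats a plain vector as a column or a row depending on which side it appears on.

```r
v <- c(1, 1, 1)
dim(v)                  # NULL: a plain vector has no dim attribute

a <- array(v, dim = 3)  # a one-dimensional array
dim(a)                  # 3
is.matrix(a)            # FALSE: matrices have exactly two dimensions

m <- matrix(1:9, nrow = 3)
col_result <- m %*% v   # v is treated as a 3 x 1 column
row_result <- v %*% m   # v is treated as a 1 x 3 row
dim(col_result)         # 3 1
dim(row_result)         # 1 3
```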
I have an integer vector that I expected I could treat as a numeric vector:
> class(pf$age)
[1] "integer"
> is.numeric(pf$age)
[1] TRUE
However, when I try to use it to calculate a correlation, I get an error:
> cor.test(x = "age", y = "friend_count", data = pf)
Error in cor.test.default(x = "age", y = "friend_count", data = pf) :
'x' must be a numeric vector
None of my best guesses at alternate syntax work either: http://pastie.org/9595290
What's going on?
Edit:
The following syntax works:
> x = pf$age
> y = pf$friend_count
> cor.test(x, y, data = pf, method="pearson", alternative="greater")
However, I don't understand why I can't specify x and y in the function (as you can with other R functions like ggplot). What is the difference between ggplot and cor.test?
You don't refer to variables using character strings like that in a function call. You want to pass numeric vectors to the x and y arguments. You passed length 1 character vectors:
> is.numeric("age")
[1] FALSE
> is.character("age")
[1] TRUE
Hence you were asking cor.test() to compute the correlation between the strings "age" and "friend_count".
You also mixed up the formula method of cor.test() with the default one. You supply a formula and a data object or you supply arguments x and y. You can't mix and match.
Two solutions are:
with(pf, cor.test(x = age, y = friend_count))
cor.test( ~ age + friend_count, data = pf)
The first uses the default method, but we allow ourselves to refer to the variables in pf directly by using with(). The second uses the formula method.
As to your question in the title; yes, integer vectors are considered numeric in R:
> int <- c(1L, 2L)
> is.integer(int)
[1] TRUE
> is.numeric(int)
[1] TRUE
Do note @Joshua Ulrich's point in the comment below. Technically integers are slightly different to numerics in R as Joshua shows. However this difference need not concern users most of the time as R can convert/use these as needed. It does matter in some places, such as .C() calls for example.
You can use 'get' with strings to get data:
age = pf$age
friend_count = pf$friend_count
or:
attach(pf)
Then the following should work:
cor.test(x = get("age"), y = get("friend_count"))
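If the column names really are held as strings, another option that avoids attach and get is to index the data frame with [[ or to build the formula from the names. This is a sketch only: the pf below is made-up data, and x_name and y_name are hypothetical variable names.

```r
# Hypothetical data; the real pf is not shown in the question
pf <- data.frame(
  age = c(18, 22, 25, 31, 40, 47, 53, 60),
  friend_count = c(310, 280, 240, 200, 150, 160, 90, 70)
)
x_name <- "age"
y_name <- "friend_count"

# Default method: extract the columns by name with [[
res_default <- cor.test(pf[[x_name]], pf[[y_name]])

# Formula method: reformulate() builds ~ age + friend_count from the strings
res_formula <- cor.test(reformulate(c(x_name, y_name)), data = pf)

res_default$estimate
```

Both calls compute the same Pearson correlation; they differ only in how the columns are looked up.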
I am downloading data from FRED with getSymbols. This creates an xts class object with the data attribute set to type integer for the series that I am downloading. I want these data to be of type/class double.
What is an idiomatic way of doing this?
getSymbols("GDPMC1", src = 'FRED', auto.assign = TRUE)
growthRate <-
  function (x) {
    stopifnot(length(x) > 1)
    (x[2:length(x)] - x[-length(x)] )/ x[-length(x)]
  }
stopifnot(growthRate(c(2,3,4)) == c(0.5 , 1/3 ))
realGDPGrowthRate <- growthRate(GDPMC1) ### zeros due to integer math
You can change the storage mode for GDPMC1 to "double" via:
storage.mode(GDPMC1) <- "double"
But that won't solve your problem because the issue isn't integer arithmetic. The issue is that xts/zoo align objects by index before performing any Ops methods (arithmetic, logical operations, etc), so your growthRate function will never work correctly on xts/zoo objects.
You can use quantmod's Delt function instead of writing your own.
realGDPGrowthRate <- Delt(GDPMC1)
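To see that integer storage is not what produces the zeros, here is a sketch with a plain integer vector (no xts involved, values taken from the test in the question): / always returns a double in R, so the growth rates come out correctly even before the storage mode is changed.

```r
x <- c(2L, 3L, 4L)
typeof(x)              # "integer"
typeof(x[2] / x[1])    # "double": / never performs integer division

# Growth rate on a plain vector, written with diff()
rate <- diff(x) / head(x, -1)
rate                   # 0.5 and 1/3, not zeros

# Changing the storage mode works, but is not needed for the division
storage.mode(x) <- "double"
typeof(x)              # "double"
```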
I have dataframe dih_y2. These two lines give me a warning:
> memb = dih_y2$MemberID[1:10]
> dih_col = which(dih_y2$MemberID == memb)
Warning message:
In dih_y2$MemberID == memb :
longer object length is not a multiple of shorter object length
Why?
You don't give a reproducible example but your warning message tells you exactly what the problem is.
memb only has a length of 10. I'm guessing the length of dih_y2$MemberID isn't a multiple of 10. When using ==, R spits out a warning if it isn't a multiple to let you know that it's probably not doing what you're expecting it to do. == does element-wise checking for equality. I suspect what you want to do is find which of the elements of dih_y2$MemberID are also in the vector memb. To do this you would want to use the %in% operator.
dih_col <- which(dih_y2$MemberID %in% memb)
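The difference between the two operators can be seen in a small sketch, where ids is a made-up vector standing in for dih_y2$MemberID:

```r
ids <- c(3, 1, 2, 3, 5, 1, 4)   # hypothetical MemberID column
memb <- ids[1:3]

# == compares position by position, recycling memb; it warns here
# because 7 is not a multiple of 3 (suppressed to keep the output clean)
by_position <- suppressWarnings(which(ids == memb))
by_position     # 1 2 3 4

# %in% asks, for each element of ids, whether it occurs anywhere in memb
by_membership <- which(ids %in% memb)
by_membership   # 1 2 3 4 6
```

The sixth element (a repeated 1) is missed by == because it is compared against the recycled value 2, not against the whole of memb.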
When you perform a boolean comparison between two vectors in R, the "expectation" is that both vectors are of the same length, so that R can compare each corresponding element in turn.
R has a much loved (or hated) feature called recycling, whereby in many circumstances if you try to do something where R would normally expect objects to be of the same length, it will automatically extend, or recycle, the shorter object to force both objects to be of the same length.
If the longer object is a multiple of the shorter, this amounts to simply repeating the shorter object several times. Oftentimes R programmers will take advantage of this to do things more compactly and with less typing.
But if they are not multiples, R will worry that you may have made a mistake, and perhaps didn't mean to perform that comparison, hence the warning.
Explore this yourself with the following code:
> x <- 1:3
> y <- c(1,2,4)
> x == y
[1] TRUE TRUE FALSE
> y1 <- c(y,y)
> x == y1
[1] TRUE TRUE FALSE TRUE TRUE FALSE
> y2 <- c(y,2)
> x == y2
[1] TRUE TRUE FALSE FALSE
Warning message:
In x == y2 :
longer object length is not a multiple of shorter object length
I had a similar issue, and using the %in% operator instead of the == (equality) operator was the solution:
# %in%
Hope it helps.
I had a similar problem but it had to do with the structure and class of the object. I would check how dih_y2$MemberID is formatted.