I have an xts object, i.e. a time series of the "outstanding shares" of a company, ordered by date.
I want to multiply the "outstanding shares" series by a factor of 7 to account for a stock split.
> outstanding_shares_xts <- shares_xts1[,1]
> adjusted <- outstanding_shares_xts*7
Error: Non-numeric argument for binary operator.
The ts "oustanding_shares_xts" is a column of integers.
Does anyone has an idea??
My guess is that they may look like integers but are in fact not.
Sleuthing:
I initially thought it could be [ vs. [[ column subsetting, since tibble(a=1:2)[,1] does not produce an integer vector (it produces a single-column tibble), but tibble(a=1:2)[,1] * 7 still works.
Then I thought it could be due to factors, but it's a different error:
data.frame(a=factor(1:2))[,1]*7
# Warning in Ops.factor(data.frame(a = factor(1:2))[, 1], 7) :
# '*' not meaningful for factors
# [1] NA NA
One possibility is that you have character values that look like integers.
dat <- data.frame(a=as.character(1:2))
dat
# a
# 1 1
# 2 2
dat[,1]*7
# Error in dat[, 1] * 7 : non-numeric argument to binary operator
Try converting that column to integer, something like
str(dat)
# 'data.frame': 2 obs. of 1 variable:
# $ a: chr "1" "2"
dat$a <- as.integer(dat$a)
str(dat)
# 'data.frame': 2 obs. of 1 variable:
# $ a: int 1 2
dat[,1]*7
# [1] 7 14
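For the xts object in the question specifically, here is a hedged sketch (it assumes the xts package and the shares_xts1/outstanding_shares_xts objects from the question) of how to check what the column actually holds and rebuild it as numeric:
library(xts)
# If this prints "character", the shares only look like integers
storage.mode(outstanding_shares_xts)
head(coredata(outstanding_shares_xts))
# Rebuild the series with numeric core data, then the multiplication works
outstanding_shares_num <- xts(as.numeric(coredata(outstanding_shares_xts)),
                              order.by = index(outstanding_shares_xts))
adjusted <- outstanding_shares_num * 7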
Related
I have a data frame with a column that I want to use to join with another data frame. The column contains numbers stored as strings, as well as other strings, such as the following:
x<-data.frame(referenceNumber=c("80937828","gdy","12267133","72679267","72479267"))
How can I convert the numbers stored as strings to numeric and replace the non-numeric strings with zeros/NULL?
I tried x %>% mutate_if(is.character, as.numeric)
But it returns the following error:
"Error in UseMethod("tbl_vars") :
no applicable method for 'tbl_vars' applied to an object of class "character""
We could try just using as.numeric, which would assign NA to any non-numeric entry in the vector. Then, we can selectively replace the NA values with zero:
x <- c("80937828","gdy","12267133","72679267","72479267")
output <- as.numeric(x)
output[is.na(output)] <- 0
output
[1] 80937828 0 12267133 72679267 72479267
Edit based on the comment by #Sotos: If the column/vector is actually factor, then it would have to be cast to character in order for my answer above to work.
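A minimal sketch of that factor case (x_fac here is a hypothetical factor version of the column): convert to character first, then to numeric:
x_fac <- factor(c("80937828", "gdy", "12267133"))
as.numeric(x_fac)                       # returns level codes, not the numbers
output <- as.numeric(as.character(x_fac))
output[is.na(output)] <- 0
output
# [1] 80937828        0 12267133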
I'd check for NAs in an ifelse construction:
x<-data.frame(referenceNumber=c("80937828","gdy","12267133","72679267","72479267"), stringsAsFactors = F)
x$referenceNumber <- ifelse(!is.na(as.numeric(x$referenceNumber)), x$referenceNumber, 0)
This only works if your strings are not factors. Otherwise you need to add as.character first.
The error is probably because referenceNumber is a factor. With stringsAsFactors = FALSE the column is character and mutate_if works:
x<-data.frame(referenceNumber=c("80937828","gdy","12267133","72679267","72479267"), stringsAsFactors=F)
str(x)
#'data.frame': 5 obs. of 1 variable:
# $ referenceNumber: chr "80937828" "gdy" "12267133" "72679267" ...
xx<-x %>% mutate_if(is.character,as.numeric)
#Warning message:
#In evalq(as.numeric(referenceNumber), <environment>) :
# NAs introduced by coercion
xx
# referenceNumber
#1 80937828
#2 NA
#3 12267133
#4 72679267
#5 72479267
str(xx)
#'data.frame': 5 obs. of 1 variable:
# $ referenceNumber: num 80937828 NA 12267133 72679267 72479267
I'm curious about data frame behavior from read.csv for the purposes of doing some data integrity checks, so we can fail early in some algorithm work we're doing. Is it true that the default behavior when loading a data frame from a CSV file will only recognize as factors those columns holding character data? In other words, can anything else also be recognized as a factor by default? I'm guessing not, but the documentation I'm looking at only speaks of the relation of character data to factors and no other types, which makes me wary that I may be making the converse error.
R- data.frame Documentation
stringsAsFactors
logical: should character vectors be converted to factors? The
‘factory-fresh’ default is TRUE, but this can be changed by setting options(stringsAsFactors = FALSE).
Basically the check I'm intending will go something like
if (any(sapply(myCsvDataFrame, class) == "factor")) {
  stop("DataIntegrityError--dataframe contains character data")
}
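For reference, a small sketch of that check catching a factor column (myCsvDataFrame is the name used above; stringsAsFactors is set explicitly so the example behaves the same on R >= 4.0, where the default changed):
myCsvDataFrame <- read.csv(text = "id,label
1,a
2,b", stringsAsFactors = TRUE)
if (any(sapply(myCsvDataFrame, class) == "factor")) {
  stop("DataIntegrityError--dataframe contains character data")
}
# Error: DataIntegrityError--dataframe contains character data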
Further documentation seems to support my guess:
Unless colClasses is specified, all columns are read as character
columns and then converted using type.convert to logical, integer,
numeric, complex or (depending on as.is) factor as appropriate. Quotes
are (by default) interpreted in all fields, so a column of values like
"42" will result in an integer column.
So this explains more of the behavior
as.is the default behavior of read.table is to convert character
variables (which are not converted to logical, numeric or complex) to
factors. The variable as.is controls the conversion of columns not
otherwise specified by colClasses. Its value is either a vector of
logicals (values are recycled if necessary), or a vector of numeric or
character indices which specify which columns should not be converted
to factors.
Note: to suppress all conversions including those of numeric columns,
set colClasses = "character".
Note that as.is is specified per column (not per variable) and so
includes the column of row names (if any) and any columns to be
skipped.
What I'm taking away from all this is that R first loads everything as character (which makes sense in the CSV context, since a CSV is just a flat text file), then attempts to coerce/convert each column to a logical, integer, numeric, or complex type; only the columns where that conversion fails are left as character data, which are subsequently stored as factors and become what we see in the resulting data frame.
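That conversion step can be seen in isolation with utils::type.convert, which the documentation above says read.table uses; the as.is argument is what the stringsAsFactors decision ultimately feeds into. A small sketch:
# type.convert() is the coercion step read.table applies to each column
type.convert(c("1", "2", "3"), as.is = TRUE)      # integer 1 2 3
type.convert(c("1.5", "2.5"), as.is = TRUE)       # numeric 1.5 2.5
type.convert(c("TRUE", "FALSE"), as.is = TRUE)    # logical TRUE FALSE
type.convert(c("1", "2", "x"), as.is = TRUE)      # character "1" "2" "x"
type.convert(c("1", "2", "x"), as.is = FALSE)     # factor with levels 1, 2, x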
Building on Richard Scriven's comment, read.table (and its wrapper functions) can create a data.frame with five types of columns:
Logical
Integer
Numeric
Character, or factor (depending on the stringsAsFactors argument/option)
Complex
Here's a simple example showing these five types of data being read in:
str(read.csv(text = "a,b,c,d,e
TRUE,1,4.0,a,1i
FALSE,2,5.5,b,2i
TRUE,3,6.0,c,3i", header = TRUE))
# 'data.frame': 3 obs. of 5 variables:
# $ a: logi TRUE FALSE TRUE
# $ b: int 1 2 3
# $ c: num 4 5.5 6
# $ d: Factor w/ 3 levels "a","b","c": 1 2 3
# $ e: cplx 0+1i 0+2i 0+3i
Note how the fourth column is a character column, which is read in as a factor. Each column is read in as a character vector and coerced to a specific class using either the colClasses argument or automated type checking via type.convert (as you highlight in your question).
This means that everything is a character, unless R can detect that it is something else. If stringsAsFactors = TRUE, then those columns are returned as factors.
This should be pretty intuitive except that, as Richard Scriven points out, you can sometimes get caught when type.convert cannot quite figure out a column. Here are some examples, all of which are typos or the result of poorly formed columns:
Mixing logical representations (expect logical, get factor):
str(read.csv(text = "a
TRUE
FALSE
1
0", header = TRUE))
# 'data.frame': 4 obs. of 1 variable:
# $ a: Factor w/ 4 levels "0","1","FALSE",..: 4 3 2 1
Character string in an otherwise numeric column (expect integer, get factor):
str(read.csv(text = "a
1
2
3
4a", header = TRUE))
# 'data.frame': 4 obs. of 1 variable:
# $ a: Factor w/ 4 levels "1","2",..: 1 2 3 4
Another example of character string in a numeric column (expect numeric, get factor):
str(read.csv(text = "a
1.1
2.1
3.1
4.x", header = TRUE))
# 'data.frame': 4 obs. of 1 variable:
# $ a: Factor w/ 4 levels "1.1","2.1",..: 1 2 3 4
Saying there isn't a header when there actually is (expect integer, get factor):
str(read.csv(text = "a
1
2
3
4a", header = FALSE))
# 'data.frame': 5 obs. of 1 variable:
# $ V1: Factor w/ 5 levels "1","2",..: 5 1 2 3 4
Accidental spaces in numeric values (expect numeric, get factor):
str(read.csv(text = "a
1
2
3 .4", header = FALSE))
# 'data.frame': 3 obs. of 1 variable:
# $ a: Factor w/ 3 levels "1","2","3 . 4",..: 1 2 3
In R 3.1.0, one could also end up with a factor column if reading in a numeric column would have resulted in a loss of precision (because the column contained too many decimal places to represent in R). This behavior is now controlled by the numerals argument to read.table:
# default behavior (expect numeric, get numeric)
str(read.csv(text = "a
1.1
2.2
3.123456789123456789", header = TRUE, numerals = "allow.loss"))
# 'data.frame': 3 obs. of 1 variable:
# $ a: num 1.1 2.2 3.12
# "no.loss" argument (expect numeric, get factor)
str(read.csv(text = "a
1.1
2.2
3.123456789123456789", header = TRUE, numerals = "no.loss"))
# 'data.frame': 3 obs. of 1 variable:
# $ a: Factor w/ 3 levels " 1.1",..: 1 2 3
There are probably some other situations that would result in receiving a factor column, but all of them are going to be due to malformed files or inappropriately used arguments to read.table.
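If one of these malformed files does slip through, a hedged sketch for spotting and repairing a column that came back as a factor (it reuses the "4.x" example above and sets stringsAsFactors explicitly):
dat <- read.csv(text = "a
1.1
2.1
3.1
4.x", header = TRUE, stringsAsFactors = TRUE)
sapply(dat, is.factor)     # which columns came back as factors?
#    a
# TRUE
# Convert via character, not directly: as.numeric() on a factor returns codes
dat$a <- as.numeric(as.character(dat$a))
dat$a
# [1] 1.1 2.1 3.1  NA    (the malformed "4.x" becomes NA, with a warning)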
I have a character data frame in R which has NaNs in it. I need to remove any row with a NaN and then convert it to a numeric data frame.
If I just do as.numeric on the data frame, I run into the following error:
Error: (list) object cannot be coerced to type 'double'
As @thijs van den bergh points out,
dat <- data.frame(x=c("NaN","2"),y=c("NaN","3"),stringsAsFactors=FALSE)
dat <- as.data.frame(sapply(dat, as.numeric)) #<- sapply is here
dat[complete.cases(dat), ]
# x y
#2 2 3
Is one way to do this.
Your error comes from trying to make a data.frame numeric. The sapply option I show is instead making each column vector numeric.
Note that data.frames are not numeric or character, but rather are a list which can be all numeric columns, all character columns, or a mix of these or other types (e.g.: Date/logical).
dat <- data.frame(x=c("NaN","2"),y=c("NaN","3"),stringsAsFactors=FALSE)
is.list(dat)
# [1] TRUE
The example data just has two character columns:
> str(dat)
'data.frame': 2 obs. of 2 variables:
$ x: chr "NaN" "2"
$ y: chr "NaN" "3
...which you could add a numeric column to like so:
> dat$num.example <- c(6.2,3.8)
> dat
x y num.example
1 NaN NaN 6.2
2 2 3 3.8
> str(dat)
'data.frame': 2 obs. of 3 variables:
$ x : chr "NaN" "2"
$ y : chr "NaN" "3"
$ num.example: num 6.2 3.8
So, when you try to do as.numeric, R gets confused because it does not know how to convert this list object, which may contain multiple types. user1317221_G's answer uses the ?sapply function, which applies a function to the individual elements of an object. You could alternatively use ?lapply, which is a very similar function (read more on the *apply functions here - R Grouping functions: sapply vs. lapply vs. apply vs. tapply vs. by vs. aggregate).
I.e. - in this case, to each column of your data.frame, you can apply the as.numeric function, like so:
data.frame(lapply(dat,as.numeric))
The lapply call is wrapped in a data.frame to make sure the output is a data.frame and not a list. That is, running:
lapply(dat,as.numeric)
will give you:
> lapply(dat,as.numeric)
$x
[1] NaN 2
$y
[1] NaN 3
$num.example
[1] 6.2 3.8
While:
data.frame(lapply(dat,as.numeric))
will give you:
> data.frame(lapply(dat,as.numeric))
x y num.example
1 NaN NaN 6.2
2 2 3 3.8
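Putting the two steps from the original question together (same dat as above), a minimal sketch: convert every column, then keep only the rows with no NaN/NA:
dat <- data.frame(x = c("NaN", "2"), y = c("NaN", "3"), stringsAsFactors = FALSE)
dat_num <- data.frame(lapply(dat, as.numeric))
dat_num[complete.cases(dat_num), ]
#   x y
# 2 2 3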
A very unexpected behavior of the otherwise useful data.frame in R arises from its keeping character columns as factors. This causes many problems if it is not taken into account. For example, consider the following code:
foo=data.frame(name=c("c","a"),value=1:2)
#   name value
# 1    c     1
# 2    a     2
bar=matrix(1:6,nrow=3)
rownames(bar)=c("a","b","c")
# [,1] [,2]
# a 1 4
# b 2 5
# c 3 6
Then what do you expect from running bar[foo$name,]? It should return the rows of bar that are named according to foo$name, i.e. rows 'c' and 'a'. But the result is different:
bar[foo$name,]
# [,1] [,2]
# b 2 5
# a 1 4
The reason is that foo$name is not a character vector but a factor, so the matrix is indexed by the factor's underlying integer codes.
foo$name
# [1] c a
# Levels: a c
To have the expected behavior, I manually convert it to character vector:
foo$name = as.character(foo$name)
bar[foo$name,]
# [,1] [,2]
# c 3 6
# a 1 4
But the problem is that we can easily forget to do this and end up with hidden bugs in our code. Is there any better solution?
This is a feature and R is working as documented. This can be dealt with generally in a few ways:
use the argument stringsAsFactors = FALSE in the call to data.frame(). See ?data.frame
if you detest this behaviour so, set the option globally via
options(stringsAsFactors = FALSE)
(as noted by #JoshuaUlrich in comments) a third option is to wrap character variables in I(....). This alters the class of the object being assigned to the data frame component to include "AsIs". In general this shouldn't be a problem as the object inherits (in this case) the class "character" so should work as before.
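A small sketch of that third option, using the foo/bar objects from the question:
foo <- data.frame(name = I(c("c", "a")), value = 1:2)
str(foo$name)
# 'AsIs' chr [1:2] "c" "a"
bar[foo$name, ]   # now indexes the matrix by row name, as originally expected
#   [,1] [,2]
# c    3    6
# a    1    4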
You can check what the default for stringsAsFactors is on the currently running R process via:
> default.stringsAsFactors()
[1] TRUE
The issue is slightly wider than data.frame() in scope as this also affects read.table(). In that function, as well as the two options above, you can also tell R what all the classes of the variables are via argument colClasses and R will respect that, e.g.
> tmp <- read.table(text = '"Var1","Var2"
+ "A","B"
+ "C","C"
+ "B","D"', header = TRUE, colClasses = rep("character", 2), sep = ",")
> str(tmp)
'data.frame': 3 obs. of 2 variables:
$ Var1: chr "A" "C" "B"
$ Var2: chr "B" "C" "D"
In the example data below, author and title are automatically converted to factors (unless you add the argument stringsAsFactors = FALSE when you create the data). What if we forget to change the default setting and don't want to set the option globally?
Some code I found somewhere (most likely SO) uses sapply() to identify factors and convert them to strings.
dat = data.frame(title = c("title1", "title2", "title3"),
author = c("author1", "author2", "author3"),
customerID = c(1, 2, 1))
# > str(dat)
# 'data.frame': 3 obs. of 3 variables:
# $ title : Factor w/ 3 levels "title1","title2",..: 1 2 3
# $ author : Factor w/ 3 levels "author1","author2",..: 1 2 3
# $ customerID: num 1 2 1
dat[sapply(dat, is.factor)] = lapply(dat[sapply(dat, is.factor)], as.character)
# > str(dat)
# 'data.frame': 3 obs. of 3 variables:
# $ title : chr "title1" "title2" "title3"
# $ author : chr "author1" "author2" "author3"
# $ customerID: num 1 2 1
I assume this would be faster than re-reading the dataset with the stringsAsFactors = FALSE argument, but I have never tested it.
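As an alternative to the sapply()/lapply() line above, a hedged tidyverse sketch (it assumes dplyr >= 1.0, which provides across() and where()):
library(dplyr)
# Same conversion as the base-R version above: every factor column to character
dat <- dat %>% mutate(across(where(is.factor), as.character))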
I have a numeric vector (future_prices) in my case. I use a date vector from another object (here: pred_commodity_prices$futuredays) to create numbers for the months. After that I use cbind to bind the months to the numeric vector. However, what happened is that the numeric vector became non-numeric. Do you know what the reason for this is? When I use as.numeric(future_prices) I get strange values. What could be an alternative? Thanks
head(future_prices)
pred_peak_month_3a pred_peak_quarter_3a
1 68.33907 62.37888
2 68.08553 62.32658
is.numeric(future_prices)
[1] TRUE
> month = format(as.POSIXlt.date(pred_commodity_prices$futuredays), "%m")
> future_prices <- cbind (future_prices, month)
> head(future_prices)
pred_peak_month_3a pred_peak_quarter_3a month
1 "68.3390747063745" "62.3788824938719" "01"
is.numeric(future_prices)
[1] FALSE
The reason is that cbind returns a matrix, and a matrix can only hold one data type. You could use a data.frame instead:
n <- 1:10
b <- LETTERS[1:10]
m <- cbind(n,b)
str(m)
chr [1:10, 1:2] "1" "2" "3" "4" "5" "6" "7" "8" "9" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "n" "b"
d <- data.frame(n,b)
str(d)
'data.frame': 10 obs. of 2 variables:
$ n: int 1 2 3 4 5 6 7 8 9 10
$ b: Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
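Applied to the question's objects (future_prices and month come from the question, so this is a sketch rather than something tested on the original data), build a data.frame instead of calling cbind so the price columns stay numeric:
# month is a character vector like "01", so convert it explicitly if you want a number
future_prices_df <- data.frame(future_prices, month = as.integer(month))
str(future_prices_df)   # price columns remain numeric, month is integer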
See ?format. The format function returns:
An object of similar structure to ‘x’ containing character
representations of the elements of the first argument ‘x’ in a
common format, and in the current locale's encoding.
from ?cbind, cbind returns
... a matrix combining the ‘...’ arguments
column-wise or row-wise. (Exception: if there are no inputs or
all the inputs are ‘NULL’, the value is ‘NULL’.)
and all elements of a matrix must be of the same class, so everything is coerced to character.
F.Y.I.
When a column is a factor, simply/directly using as.numeric will change the values in that column, because it returns the underlying level codes. The proper way is:
data.frame[,2] <- as.numeric(as.character(data.frame[,2]))
Find more details: Converting values to numeric, stack overflow
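To illustrate why the as.character() step matters, a small sketch (f is a hypothetical factor of numeric strings):
f <- factor(c("10", "20", "30"))
as.numeric(f)                  # wrong: returns the level codes 1 2 3
as.numeric(as.character(f))    # right: 10 20 30
as.numeric(levels(f))[f]       # equivalent, and a bit faster for long factors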