R: numeric vector becoming non-numeric after cbind of dates

I have a numeric vector (future_prices). I use a date vector from another object (here: pred_commodity_prices$futuredays) to create numbers for the months. After that I use cbind to bind the months to the numeric vector. However, what happens is that the numeric vector becomes non-numeric. Do you know what the reason for this is? When I use as.numeric(future_prices) I get strange values. What could be an alternative? Thanks
head(future_prices)
pred_peak_month_3a pred_peak_quarter_3a
1 68.33907 62.37888
2 68.08553 62.32658
is.numeric(future_prices)
[1] TRUE
> month = format(as.POSIXlt.date(pred_commodity_prices$futuredays), "%m")
> future_prices <- cbind (future_prices, month)
> head(future_prices)
pred_peak_month_3a pred_peak_quarter_3a month
1 "68.3390747063745" "62.3788824938719" "01"
is.numeric(future_prices)
[1] FALSE

The reason is that cbind returns a matrix, and a matrix can only hold one data type. You could use a data.frame instead:
n <- 1:10
b <- LETTERS[1:10]
m <- cbind(n,b)
str(m)
chr [1:10, 1:2] "1" "2" "3" "4" "5" "6" "7" "8" "9" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "n" "b"
d <- data.frame(n,b)
str(d)
'data.frame': 10 obs. of 2 variables:
$ n: int 1 2 3 4 5 6 7 8 9 10
$ b: Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
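Applied to the question, a minimal sketch might look like the following. The values and the futuredays vector are hypothetical stand-ins for the asker's objects, and the name month_label is made up for illustration. Either convert the month to a number before calling cbind, so the matrix stays numeric, or build a data.frame so that each column keeps its own type.

```r
# Hypothetical stand-ins for the objects in the question
future_prices <- cbind(pred_peak_month_3a   = c(68.33907, 68.08553),
                       pred_peak_quarter_3a = c(62.37888, 62.32658))
futuredays <- as.Date(c("2013-01-01", "2013-02-01"))

# Option 1: format() returns character, so convert the month to integer
# before binding; the resulting matrix is still numeric
month <- as.integer(format(futuredays, "%m"))
prices_matrix <- cbind(future_prices, month)
is.numeric(prices_matrix)

# Option 2: a data.frame can hold numeric prices next to a character month
prices_df <- data.frame(future_prices,
                        month_label = format(futuredays, "%m"),
                        stringsAsFactors = FALSE)
str(prices_df)
```

Option 1 is enough when the month is only needed as a number; option 2 is the safer choice once columns of different types have to live side by side.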

See ?format. The format function returns:
An object of similar structure to ‘x’ containing character
representations of the elements of the first argument ‘x’ in a
common format, and in the current locale's encoding.
from ?cbind, cbind returns
... a matrix combining the ‘...’ arguments
column-wise or row-wise. (Exception: if there are no inputs or
all the inputs are ‘NULL’, the value is ‘NULL’.)
and all elements of a matrix must be of the same class, so everything is coerced to character.

F.Y.I.
When a column is a factor, calling as.numeric on it directly returns the internal integer codes of the factor levels, not the original values. The proper way is to convert to character first (df stands for your data frame):
df[,2] <- as.numeric(as.character(df[,2]))
Find more details: Converting values to numeric, stack overflow
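A small illustration of the difference, using a made-up factor f:

```r
# A factor whose labels look like numbers (made-up example data)
f <- factor(c("10", "20", "10", "5"))

# Direct conversion returns the level codes, not the labels
codes <- as.numeric(f)

# Going through character recovers the original values
values <- as.numeric(as.character(f))

# An equivalent idiom that converts each level only once
values2 <- as.numeric(levels(f))[f]

print(codes)   # 1 2 1 3
print(values)  # 10 20 10 5
```

The codes follow the sorted order of the levels ("10", "20", "5"), which is why they bear no relation to the numeric values.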

Related

#R #Non-numeric argument for binary operator #xts object * integer

I have an xts object: a time series of the "outstanding shares" of a company, ordered by date.
I want to multiply the time series of "outstanding shares" by a factor of 7 in order to account for a stock split.
> outstanding_shares_xts <- shares_xts1[,1]
> adjusted <- outstanding_shares_xts*7
Error: Non-numeric argument for binary operator.
The time series "outstanding_shares_xts" is a column of integers.
Does anyone have an idea?
My guess is that they may look like integers but are in fact not.
Sleuthing:
I initially thought it could be [-vs-[[ column subsetting, since tibble(a=1:2)[,1] does not produce an integer vector (it produces a single-column tibble), but tibble(a=1:2)[,1] * 7 still works.
Then I thought it could be due to factors, but it's a different error:
data.frame(a=factor(1:2))[,1]*7
# Warning in Ops.factor(data.frame(a = factor(1:2))[, 1], 7) :
# '*' not meaningful for factors
# [1] NA NA
One possibility is that you have character values that look like integers.
dat <- data.frame(a=as.character(1:2))
dat
# a
# 1 1
# 2 2
dat[,1]*7
# Error in dat[, 1] * 7 : non-numeric argument to binary operator
Try converting that column to integer, something like
str(dat)
# 'data.frame': 2 obs. of 1 variable:
# $ a: chr "1" "2"
dat$a <- as.integer(dat$a)
str(dat)
# 'data.frame': 2 obs. of 1 variable:
# $ a: int 1 2
dat[,1]*7
# [1] 7 14
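Since the data in an xts object is stored as a matrix, the same check can be sketched on a plain character matrix. This is an illustration with made-up values in base R, not the asker's data; the assumption is that the xts object was built from character input, in which case changing the storage mode is one way to repair it.

```r
# Made-up single-column character matrix standing in for the xts data
shares <- matrix(c("100", "120", "150"), ncol = 1,
                 dimnames = list(NULL, "outstanding"))
typeof(shares)  # "character", even though the values print like integers

# Arithmetic on a character matrix fails; capture the error message
result <- tryCatch(shares * 7, error = function(e) conditionMessage(e))

# Convert the underlying storage to numeric, then multiply
storage.mode(shares) <- "numeric"
adjusted <- shares * 7
adjusted
```

Running typeof() or str() on the real object is the quickest way to confirm whether the values are character in the first place.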

R- expand.grid given a data.frame of parameter names and sequence definitions

I have a data.frame that arbitrarily defines parameter names and sequence boundaries:
dfParameterValues <- data.frame(ParameterName = character(), seqFrom = integer(), seqTo = integer(), seqBy = integer())
row1 <- data.frame(ParameterName = "parameterA", seqFrom = 1, seqTo = 2, seqBy = 1)
row2 <- data.frame(ParameterName = "parameterB", seqFrom = 5, seqTo = 7, seqBy = 1)
row3 <- data.frame(ParameterName = "parameterC", seqFrom = 10, seqTo = 11, seqBy = 1)
dfParameterValues <- rbind(dfParameterValues, row1)
dfParameterValues <- rbind(dfParameterValues, row2)
dfParameterValues <- rbind(dfParameterValues, row3)
I would like to use this approach to create a grid of c parameter columns based on the number of unique ParameterNames that contain r rows of all possible combinations of the sequences given by seqFrom, seqTo, and seqBy. The result would therefore look somewhat like this or should have a content like the following:
ParameterA ParameterB ParameterC
1 5 10
1 5 11
1 6 10
1 6 11
1 7 10
1 7 11
2 5 10
2 5 11
2 6 10
2 6 11
2 7 10
2 7 11
Edit: Note that the parameter names and their numbers are not known in advance. The data.frame comes from elsewhere so I cannot use the standard static expand.grid approach and need something like a flexible function that creates the expanded grid based on any dataframe with the columns ParameterName, seqFrom, seqTo, seqBy.
I've been playing around with for loops (which is bad to begin with) and it hasn't led me to any elegant ideas. I can't seem to find a way to come up with the result by using tidyr without constructing the sequences separately first, either. Do you have any elegant approaches?
Bonus kudos for extending this to include not only numerical sequences, but vectors/sets of characters / other factors, too.
Many thanks!
Going off CPak's answer, you could use
my_table <- expand.grid(apply(dfParameterValues, 1, function(x) seq(as.numeric(x['seqFrom']), as.numeric(x['seqTo']), as.numeric(x['seqBy']))))
names(my_table) <- c("ParameterA", "ParameterB", "ParameterC")
my_table <- my_table[order(my_table$ParameterA, my_table$ParameterB), ]
@smanski's answer is technically correct (and should arguably be accepted since it motivated this), but it is also a good example of when to be careful when using apply with data.frames. In this case, the frame contains at least one column that is character, so all columns are converted, resulting in the need to use as.numeric. The safer alternative is to only pull the columns needed, such as either of:
expand.grid(apply(dfParameterValues[,-1], 1,
function(x) seq(x['seqFrom'], x['seqTo'], x['seqBy']) ))
expand.grid(apply(dfParameterValues[,c("seqFrom","seqTo","seqBy")], 1,
function(x) seq(x['seqFrom'], x['seqTo'], x['seqBy']) ))
I prefer the second, because it only pulls what it needs and therefore what it "knows" should be numeric. (I find explicit is often safer.)
The reason this is happening is that apply silently converts the data to a matrix, so to see the effects, try:
str(as.matrix(dfParameterValues))
# chr [1:3, 1:4] "parameterA" "parameterB" "parameterC" " 1" " 5" ...
# - attr(*, "dimnames")=List of 2
# ..$ : chr [1:3] "1" "2" "3"
# ..$ : chr [1:4] "ParameterName" "seqFrom" "seqTo" "seqBy"
str(as.matrix(dfParameterValues[c("seqFrom","seqTo","seqBy")]))
# num [1:3, 1:3] 1 5 10 2 7 11 1 1 1
# - attr(*, "dimnames")=List of 2
# ..$ : chr [1:3] "1" "2" "3"
# ..$ : chr [1:3] "seqFrom" "seqTo" "seqBy"
(Note the chr on the first and the num on the second.)
Neither one preserves the parameter names. To do that, just sandwich the call with setNames:
setNames(
expand.grid(apply(dfParameterValues[,c("seqFrom","seqTo","seqBy")], 1,
function(x) seq(x['seqFrom'], x['seqTo'], x['seqBy']) )),
dfParameterValues$ParameterName)
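One caveat with the apply approach is that apply returns a matrix rather than a list whenever all sequences happen to have the same length. A sketch that sidesteps the matrix conversion entirely uses Map over the three numeric columns; this is an alternative technique, not the method from the answers above, and the data frame is rebuilt here only to keep the example self-contained.

```r
# Rebuild the example input from the question
dfParameterValues <- data.frame(
  ParameterName = c("parameterA", "parameterB", "parameterC"),
  seqFrom = c(1, 5, 10),
  seqTo   = c(2, 7, 11),
  seqBy   = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# Map() calls seq(from, to, by) once per row and always returns a list,
# so the columns are never coerced to a common type
seqs <- Map(seq,
            dfParameterValues$seqFrom,
            dfParameterValues$seqTo,
            dfParameterValues$seqBy)
names(seqs) <- dfParameterValues$ParameterName

# expand.grid() accepts a named list and uses the names as column names
grid <- expand.grid(seqs)
grid
```

Because the sequences are held in a list, the same pattern extends to character vectors or factors: any list element that expand.grid accepts can be mixed in.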

Setting variable attributes via subsetting a dataframe

I want to set an attribute ("full.name") of certain variables in a data frame by subsetting the dataframe and iterating over a character vector. I tried two solutions but neither works (varsToPrint is a character vector containing the variables, questionLabels is a character vector containing the labels of questions):
Sample data:
jtiPrint <- data.frame(question1 = seq(5), question2 = seq(5), question3=seq(5))
questionLabels <- c("question1Label", "question2Label")
varsToPrint <- c("question1", "question2")
Solution 1:
attrApply <- function(var, label) {
`<-`(attr(var, "full.name"), label)
}
mapply(attrApply, jtiPrint[varsToPrint], questionLabels)
Solution 2:
i <- 1
for (var in jtiPrint[varsToPrint]) {
attr(var, "full.name") <- questionLabels[i]
i <- i + 1
}
Desired output (for e.g. variable 1):
attr(jtiPrint$question1, "full.name")
[1] "question1Label"
The problem in solution 2 seems to be that R sets the attribute on a new object containing only one variable (the indexed variable). However, I don't understand why solution 1 does not work. Any ideas how to fix either of these two ways?
Solution 1 :
The function is 'attr<-' not '<-'(attr...), also you need to set SIMPLIFY=FALSE (otherwise a matrix is returned instead of a list) and then call as.data.frame :
attrApply <- function(var, label) {
`attr<-`(var, "full.name", label)
}
df <- as.data.frame(mapply(attrApply,jtiPrint[varsToPrint],questionLabels,SIMPLIFY = FALSE))
> str(df)
'data.frame': 5 obs. of 2 variables:
$ question1: atomic 1 2 3 4 5
..- attr(*, "full.name")= chr "question1Label"
$ question2: atomic 1 2 3 4 5
..- attr(*, "full.name")= chr "question2Label"
Solution 2 :
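You need to set the attribute on the column of the data.frame itself; your loop sets it on copies of the columns. Indexing by name rather than by position also keeps this correct when the variables in varsToPrint are not the first columns:
for (i in seq_along(varsToPrint)) {
attr(jtiPrint[[varsToPrint[i]]], "full.name") <- questionLabels[i]
}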
> str(jtiPrint)
'data.frame': 5 obs. of 3 variables:
$ question1: atomic 1 2 3 4 5
..- attr(*, "full.name")= chr "question1Label"
$ question2: atomic 1 2 3 4 5
..- attr(*, "full.name")= chr "question2Label"
$ question3: int 1 2 3 4 5
Anyway, note that the two approaches lead to a different result. In fact the mapply solution returns a subset of the previous data.frame (so no column 3) while the second approach modifies the existing jtiPrint data.frame.
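The two ideas can also be combined: a sketch that uses Map to set the attributes and assigns the result back into the selected columns, so that the original data frame is modified and the untouched columns are kept. The helper set_label is a name introduced here for illustration only.

```r
# Sample data from the question
jtiPrint <- data.frame(question1 = seq(5), question2 = seq(5), question3 = seq(5))
questionLabels <- c("question1Label", "question2Label")
varsToPrint <- c("question1", "question2")

# Hypothetical helper: return the column with the attribute attached
set_label <- function(col, label) {
  attr(col, "full.name") <- label
  col
}

# Map() returns a list of modified columns; assigning it back by name
# replaces only the selected columns of jtiPrint
jtiPrint[varsToPrint] <- Map(set_label, jtiPrint[varsToPrint], questionLabels)

attr(jtiPrint$question1, "full.name")
attr(jtiPrint$question3, "full.name")  # NULL, column was not touched
```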

Accessing particular cells within dataframes organized into list in R in a "vectorized" way

This is my first question here, sorry for possible mistakes.
I have got a list of dataframes, "tt", after I streamed in a JSON file.
Some of the dataframes are empty, some have a predefined structure. Here is an example:
> str(tt)
List of 2
$ :'data.frame': 0 obs. of 0 variables
$ :'data.frame': 2 obs. of 2 variables:
..$ key : chr [1:2] "issue_id" "letter_id"
..$ value: chr [1:2] "43" "223663"
> tt
[[1]]
data frame with 0 columns and 0 rows
[[2]]
key value
1 issue_id 43
2 letter_id 223663
I would like to get a column (e.g. named "t") with issue_id's out of "tt" structure, so that
t[1] = NA (or NULL)
t[2] = 43
I can do it accessing dataframes as a list elements like this
> tt[[1]][1,2]
NULL
> tt[[2]][1,2]
[1] "43"
How can I do this in a "vectorized" way? tried different things with no success like
> t <- tt[[]][1,2]
Error in tt[[]] : invalid subscript type 'symbol'
> t <- tt[][1,2]
Error in tt[][1, 2] : incorrect number of dimensions
> t <- tt[[]][1][2]
Error in tt[[]] : invalid subscript type 'symbol'
> t <- tt[][1][2]
> t
[[1]]
NULL
It should be something very simple I guess
We can use lapply to loop over the list. Elements that are NULL or have zero rows are skipped (they yield NULL), and the 'value' is extracted from the other elements.
lapply(tt, function(x) if (!(is.null(x) || !nrow(x))) with(x, value[key == "issue_id"]))
As @MikeRSpencer mentioned in the comments, if we need to extract the first 'value', sapply can be used:
sapply(tt, function(x) if (is.null(x) || !nrow(x)) NA_character_ else x$value[1])
Returning NA for the empty elements lets sapply simplify the result to a character vector; if NULL were returned for them instead, the result would stay a list.
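A self-contained sketch of the same idea with vapply, which guarantees a character vector with one entry per list element. The list tt is rebuilt from the str() output in the question, and get_issue_id is a hypothetical helper name.

```r
# Rebuild the example list from the question
tt <- list(
  data.frame(),
  data.frame(key = c("issue_id", "letter_id"),
             value = c("43", "223663"),
             stringsAsFactors = FALSE)
)

# Hypothetical helper: the issue_id of one element, or NA if unavailable
get_issue_id <- function(x) {
  if (is.null(x) || nrow(x) == 0) return(NA_character_)
  hit <- x$value[x$key == "issue_id"]
  if (length(hit) == 0) NA_character_ else hit[1]
}

# vapply() checks that every result is a single character value
issue_ids <- vapply(tt, get_issue_id, character(1))
issue_ids
```

Looking the value up by key rather than by row position also covers data frames in which issue_id is not the first row.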

Converting from a character to a numeric data frame

I have a character data frame in R which has NaNs in it. I need to remove any row with a NaN and then convert it to a numeric data frame.
If I just do as.numeric on the data frame, I run into the following
Error: (list) object cannot be coerced to type 'double'
As @thijs van den bergh points you to,
dat <- data.frame(x=c("NaN","2"),y=c("NaN","3"),stringsAsFactors=FALSE)
dat <- as.data.frame(sapply(dat, as.numeric)) #<- sapply is here
dat[complete.cases(dat), ]
# x y
#2 2 3
is one way to do this.
Your error comes from trying to make a data.frame numeric. The sapply option I show is instead making each column vector numeric.
Note that data.frames are not numeric or character, but rather are a list which can be all numeric columns, all character columns, or a mix of these or other types (e.g.: Date/logical).
dat <- data.frame(x=c("NaN","2"),y=c("NaN","3"),stringsAsFactors=FALSE)
is.list(dat)
# [1] TRUE
The example data just has two character columns:
> str(dat)
'data.frame': 2 obs. of 2 variables:
$ x: chr "NaN" "2"
$ y: chr "NaN" "3"
...which you could add a numeric column to like so:
> dat$num.example <- c(6.2,3.8)
> dat
x y num.example
1 NaN NaN 6.2
2 2 3 3.8
> str(dat)
'data.frame': 2 obs. of 3 variables:
$ x : chr "NaN" "2"
$ y : chr "NaN" "3"
$ num.example: num 6.2 3.8
So, when you call as.numeric on the whole data.frame, it fails because R cannot coerce this list object, which may hold several types, to a single numeric vector. user1317221_G's answer uses the ?sapply function, which can be used to apply a function to the individual items of an object. You could alternatively use ?lapply, which is a very similar function (read more on the *apply functions here - R Grouping functions: sapply vs. lapply vs. apply. vs. tapply vs. by vs. aggregate )
I.e. - in this case, to each column of your data.frame, you can apply the as.numeric function, like so:
data.frame(lapply(dat,as.numeric))
The lapply call is wrapped in a data.frame to make sure the output is a data.frame and not a list. That is, running:
lapply(dat,as.numeric)
will give you:
> lapply(dat,as.numeric)
$x
[1] NaN 2
$y
[1] NaN 3
$num.example
[1] 6.2 3.8
While:
data.frame(lapply(dat,as.numeric))
will give you:
> data.frame(lapply(dat,as.numeric))
x y num.example
1 NaN NaN 6.2
2 2 3 3.8
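Putting both steps of the question together, a minimal sketch on made-up data that first converts every column and then drops the rows containing NaN. The dat[] <- lapply(...) form is used here so that the result stays a data.frame without an extra wrapper call.

```r
# Made-up character data frame containing "NaN" strings
dat <- data.frame(x = c("NaN", "2", "4"),
                  y = c("NaN", "3", "5"),
                  stringsAsFactors = FALSE)

# Convert each column; assigning into dat[] keeps the data.frame structure
dat[] <- lapply(dat, as.numeric)

# complete.cases() treats NaN like NA, so this drops the first row
clean <- dat[complete.cases(dat), , drop = FALSE]
str(clean)
```

Converting before filtering matters here: in the character version "NaN" is an ordinary string, so complete.cases would not flag it.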
