Trying to find daily difference for field in a dataframe

I have this.
head(df2)
Close Group
2007-01-03 17.625 S
2007-01-04 17.645 B
2007-01-05 17.570 B
2007-01-08 17.505 B
2007-01-09 17.430 B
2007-01-10 17.375 S
I am trying to find the daily change of 'Close'.
I tried this: dailychange <- diff(df2$Close)
That failed with the error 'non-numeric argument to binary operator'. This is a time series, but I don't think that matters.
str(df2)
‘zoo’ series from 2007-01-03 to 2018-07-27
Data: chr [1:2913, 1:2] "17.625" "17.645" "17.570" "17.505" "17.430" "17.375" "17.905" "17.950" "18.110" "18.145" ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:2913] "2007-01-03" "2007-01-04" "2007-01-05" "2007-01-08" ...
..$ : chr [1:2] "Close" "Group"
Index: Date[1:2913], format: "2007-01-03" "2007-01-04" "2007-01-05" "2007-01-08" "2007-01-09" "2007-01-10" "2007-01-11" "2007-01-12" ...

From the documentation of ts:
a vector or matrix of the observed time-series values. A data frame
will be coerced to a numeric matrix via data.matrix
However, a time series stores its data as a matrix, so a single character column turns every column into character. The same behaviour can be seen in a plain matrix that contains a character value, and it is what str(df2) shows: the data part is chr.
Either keep your data as a data.frame, or use as.numeric inside your diff statement.
dailychange <- diff(as.numeric(df2$Close))
dailychange
[1] 0.020 -0.075 -0.065 -0.075 -0.055
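The coercion described above can be reproduced without any time-series package. This is a minimal sketch in base R; the matrix m is a hypothetical stand-in for the data part of df2, not the asker's actual object:

```r
# Hypothetical stand-in for the data part of df2: mixing a numeric
# and a character column in one matrix coerces everything to character.
m <- cbind(Close = c(17.625, 17.645, 17.570),
           Group = c("S", "B", "B"))
is.character(m)        # TRUE: even Close is stored as text

# Convert the column back to numeric before differencing.
dailychange <- diff(as.numeric(m[, "Close"]))
dailychange
```

The same as.numeric() wrapper is what the answer applies to df2$Close.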

The error arises because the Close column is not numeric, and diff() only works on numeric data. Check the type of the Close column (for example with str(df2)) and convert it if it is stored as character.

Related

After converting to numeric still not in numeric format in R [duplicate]

This question already has answers here:
Selecting only numeric columns from a data frame
I have an issue in converting data into the numeric format.
str(DfFilter)
output
'data.frame': 32 obs. of 5 variables:
$ InstanceType : chr " c1.xlarge" " c1.xlarge" " c1.xlarge" " c1.xlarge" ...
$ ProductDescription: chr " Linux/UNIX" " Linux/UNIX" " Linux/UNIX" " Linux/UNIX" ...
$ SpotPrice : num 0.052 0.0739 0.0747 0.0751 0.0755 ...
$ ymd_hms(Timestamp): POSIXct, format: "2021-05-16 06:26:40" "2021-05-16 00:58:55" "2021-05-16 06:46:50" ...
$ Timestamp : 'times' num 06:26:40 00:58:55 06:46:50 14:17:55 19:07:09 ...
..- attr(*, "format")= chr "h:m:s"
But when I check whether the data is numeric as follows:
is.numeric(DfFilter)
[1] FALSE
Why is that? Kindly help me understand this issue. Thanks in advance.

With the purrr package, and based on the comments:
library(purrr)  # also provides the %>% pipe
DfModel <- DfFilter %>%
  purrr::keep(.p = is.numeric)
This keeps only the numeric variables.

Filter with is.numeric could be used to get only numeric columns.
Filter(is.numeric, DfFilter)
# a c
#1 1 2.2
Another way to keep only the numeric columns of a data.frame is to subset with [ using the result of is.numeric applied through sapply:
DfFilter[sapply(DfFilter, is.numeric)]
# a c
#1 1 2.2
Example dataset:
DfFilter <- data.frame(a=1, b="b", c=2.2)
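As for why is.numeric(DfFilter) returns FALSE in the question: is.numeric tests the object it is given, and a data frame as a whole is a list of columns rather than a numeric vector, so the check has to be applied per column. A short sketch using the example dataset above:

```r
DfFilter <- data.frame(a = 1, b = "b", c = 2.2)

is.numeric(DfFilter)                          # FALSE: the data frame itself is a list
numeric_cols <- sapply(DfFilter, is.numeric)  # one TRUE/FALSE per column
numeric_cols                                  # a and c are TRUE, b is FALSE
```

The logical vector numeric_cols is what the subsetting answers use to pick the numeric columns.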

Adding name to the first column in r

My dataset is missing a name for the first column (it contains dates).
I tried colnames(managers)[1] <- "date", but it renamed the second column instead.
> #load data
> data(managers)
> colnames(managers)[1] <- "date"
> View(head(managers,10))
> str(managers)
An ‘xts’ object on 1996-01-31/2006-12-31 containing:
Data: num [1:132, 1:10] 0.0074 0.0193 0.0155 -0.0091 0.0076 -0.0039 -0.0231 0.0395 0.0147 0.0288 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:10] "date" "HAM2" "HAM3" "HAM4" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
NULL

'managers' is an xts object, and the dates are its index rather than a column:
library(PerformanceAnalytics)
index(managers)
#[1] "1996-01-31" "1996-02-29" "1996-03-31" "1996-04-30" "1996-05-31" "1996-06-30" ...
The columns of the dataset are
colnames(managers)
#[1] "HAM1" "HAM2" "HAM3" "HAM4" "HAM5" "HAM6" "EDHEC LS EQ" "SP500 TR" "US 10Y TR" "US 3m TR"
If we want to convert it to a data.frame, use fortify.zoo, which turns the index into a regular first column:
library(zoo)
managers1 <- fortify.zoo(managers)
colnames(managers1)[1] <- 'date'
Or specify the names in fortify.zoo
managers1 <- fortify.zoo(managers, names = "date")
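To see why the rename in the question hit a data column, here is a small sketch with a made-up two-row zoo series (z and its values are hypothetical, and the zoo package is assumed to be installed). The dates are not part of colnames() until the object is converted:

```r
library(zoo)

# Hypothetical two-row series; the dates are the index, not a column.
z <- zoo(cbind(HAM1 = c(0.0074, 0.0193), HAM2 = c(0.0100, 0.0155)),
         order.by = as.Date(c("1996-01-31", "1996-02-29")))
colnames(z)            # only the data columns, no date column

df <- fortify.zoo(z)   # the index becomes the first column of a data.frame
names(df)[1] <- "date"
names(df)
```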

Extract Value Labels from a Stata file loaded with Haven (Value Labels not Variable Labels)

I am trying to get a list of the value labels from a data.frame I loaded with haven. My variables are stored as haven_labelled and I know that the value labels are there because when I run str() they are listed as an attribute.
str( x$tranwork )
'haven_labelled' num [1:498381] NA NA NA NA NA NA NA NA NA NA ...
- attr(*, "label")= chr "Means of transportation to work"
- attr(*, "format.stata")= chr "%24.0g"
- attr(*, "labels")= Named num [1:19] 0 10 11 12 13 14 15 20 30 31 ...
..- attr(*, "names")= chr [1:19] "N/A " "Auto, truck, or van" "Auto" "Driver" ...
There seem to be a lot of good ways to get the variable label, but I can't figure out how to get the value labels. See "Variable labels in the R package Haven with SPSS" or "Convenient way to access variables label after importing Stata data with haven".
I have tried converting variables to factors, and
attr( x$tranwork , "label" )
[1] "Means of transportation to work"
> attr( x$tranwork , "names" )
NULL
Essentially, I would like to see the labels associated with x$tranwork, 1 through 19.

There are a few ways to get the value labels.
With the labelled package:
library(labelled)
names(val_labels(x$tranwork))
With the sjlabelled package:
sjlabelled::get_labels(x$tranwork)
With base:
names(attr(x$tranwork, "labels"))
If you want to see the value labels along with the values, then use:
labelled::val_labels(x$tranwork)
or
attr(x$tranwork, "labels")

Another way is to convert the vector into its labels, using
x$tranwork_labels <- labelled::to_character(x$tranwork)
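The base approach can be tried without haven installed. This is a sketch only: tranwork below is a hypothetical hand-built vector with a "labels" attribute shaped like the one in the str() output, not real imported data:

```r
# Hypothetical vector carrying the same kind of "labels" attribute
# that haven attaches: a named numeric vector mapping value -> label.
tranwork <- structure(c(10, 11, NA, 0),
                      labels = c("N/A" = 0, "Auto, truck, or van" = 10, "Auto" = 11))

labs <- attr(tranwork, "labels")
names(labs)            # the value labels themselves
names(tranwork)        # NULL: the names sit on the attribute, not on the vector

# Map each observed value to its label (NA stays NA).
value_labels <- names(labs)[match(tranwork, labs)]
value_labels
```

This also shows why attr(x$tranwork, "names") returned NULL in the question: the names belong to the "labels" attribute, so they have to be read from there.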

Subtracting r objects of class 'times'

I have two objects of class 'times' generated using chron that I am trying to compare. On the surface they look identical:
> str(x)
Class 'times' atomic [1:6] 0.04444 0.05417 0.05486 0.00208 0.01111 ...
..- attr(*, "format")= chr "h:m:s"
> str(y)
Class 'times' atomic [1:6] 0.04444 0.05417 0.05486 0.00208 0.01111 ...
..- attr(*, "format")= chr "h:m:s"
So I expected that x - y = 0 or x==y would return TRUE, but this is not the case:
> x-y
[1] -6.245005e-17 -2.775558e-17 -2.775558e-17 7.372575e-18 -7.112366e-17 0.000000e+00
> x==y
[1] FALSE FALSE FALSE FALSE FALSE TRUE
Any idea what is going on, or how I can compare the two? I already tried converting to POSIXct and that works, but before comparing I have operations to do on the data frame columns this data comes from (adding and subtracting), which can't be done with POSIXct. It also requires extra steps, and this is meant to be a quick check to see if there are any discrepancies in the data.
I guess I can use as.character(x)==as.character(y), and it works, but there has to be a more elegant way of doing this...
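Differences of the order of 1e-17 are typical of floating-point rounding: 'times' values are stored as fractions of a day, so two values that print identically can still differ in the last bits. One common approach is to compare within a tolerance instead of with ==. The sketch below uses plain numeric vectors with made-up values as a stand-in; for 'times' objects, the assumption is that they are passed through as.numeric() first:

```r
# Made-up values: 0.1 + 0.2 and 0.3 print alike but differ in the last bits.
x <- c(0.1 + 0.2, 0.5)
y <- c(0.3, 0.5)

x == y                           # exact comparison is tripped up by rounding
same <- abs(x - y) < 1e-9        # element-wise comparison within a tolerance
same
isTRUE(all.equal(x, y))          # a single TRUE/FALSE for the whole vector
```

The tolerance 1e-9 is an arbitrary choice for illustration; as a fraction of a day it is far below one second, so genuinely different times would still be flagged.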

R dataframe define column names at creation

I get monthly price value for the two assets below from Yahoo:
if(!require("tseries") | !require(its) ) { install.packages(c("tseries", 'its')); require("tseries"); require(its) }
startDate <- as.Date("2000-01-01", format="%Y-%m-%d")
MSFT.prices = get.hist.quote(instrument="msft", start= startDate,
quote="AdjClose", provider="yahoo", origin="1970-01-01",
compression="m", retclass="its")
SP500.prices = get.hist.quote(instrument="^gspc", start=startDate,
quote="AdjClose", provider="yahoo", origin="1970-01-01",
compression="m", retclass="its")
I want to put these two into a single data frame with specified column names (Pandas allows this now, which is a bit ironic since it took the data.frame concept from R). As below, I assign names to the two time series:
MSFTSP500.prices <- data.frame(msft = MSFT.prices, sp500= SP500.prices )
However, this does not preserve the column names (msft, sp500) I assigned. I need to define the column names in a separate line of code:
colnames(MSFTSP500.prices) <- c("msft", "sp500")
I tried to put colnames and col.names inside the data.frame() call but it doesn't work. How can I define column names while creating the data frame?
I found ?data.frame very unhelpful...

The code fails with an error message saying that as.its is not available, so I added the missing code to the question. Once you issue the missing require() call, you can use str to see what sort of object get.hist.quote actually returns. It is neither a data frame nor a zoo object, although it resembles a zoo object in many ways:
> str(SP500.prices)
Formal class 'its' [package "its"] with 2 slots
..@ .Data: num [1:180, 1] 1394 1366 1499 1452 1421 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:180] "2000-01-02" "2000-01-31" "2000-02-29" "2000-04-02" ...
.. .. ..$ : chr "AdjClose"
..@ dates: POSIXct[1:180], format: "2000-01-02 16:00:00" "2000-01-31 16:00:00" ...
If you run cbind on those two objects you get a regular matrix with dimnames:
> str(cbind(SP500.prices, MSFT.prices) )
num [1:180, 1:2] 1394 1366 1499 1452 1421 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:180] "2000-01-02" "2000-01-31" "2000-02-29" "2000-04-02" ...
..$ : chr [1:2] "AdjClose" "AdjClose"
You will still need to change the column names, since there does not seem to be a cbind.its that lets you assign column names. I would caution against using the data.frame method, since the resulting object might be confusing in its behavior:
> str( MSFTSP500.prices )
'data.frame': 180 obs. of 2 variables:
$ AdjClose :Formal class 'AsIs', 'its' [package ""] with 1 slot
.. ..@ .S3Class: chr "AsIs" "its"
$ AdjClose.1:Formal class 'AsIs', 'its' [package ""] with 1 slot
.. ..@ .S3Class: chr "AsIs" "its"
The columns are still S4 objects. I suppose that might be useful if you were going to pass them to other its-methods but could be confusing otherwise. This might be what you were shooting for:
> MSFTSP500.prices <- data.frame(msft = as.vector(MSFT.prices),
sp500= as.vector(SP500.prices) ,
row.names= as.character(MSFT.prices@dates) )
> str( MSFTSP500.prices )
'data.frame': 180 obs. of 2 variables:
$ msft : num 35.1 32 38.1 25 22.4 ...
$ sp500: num 1394 1366 1499 1452 1421 ...
> head(rownames(MSFTSP500.prices))
[1] "2000-01-02 16:00:00" "2000-01-31 16:00:00" "2000-02-29 16:00:00"
[4] "2000-04-02 17:00:00" "2000-04-30 17:00:00" "2000-05-31 17:00:00"

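MSFT.prices is a time-series object (an its object here, given retclass="its", which behaves much like zoo). It acts as a data-frame-alike with its own column name, and that name gets transferred to the new data frame. Compare
tmp <- data.frame(a=1:10)
b <- data.frame(lost=tmp)
which loses the name given in the call: the column ends up as a, not lost.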
If you do
MSFTSP500.prices <- data.frame(msft = as.vector(MSFT.prices),
sp500=as.vector(SP500.prices))
then you will get the colnames you want (though you won't get zoo-specific behaviours). Not sure why you object to renaming columns in a second command, though.
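The naming behaviour both answers describe can be checked in base R. In this sketch, prices is a hypothetical one-column matrix standing in for a price series that carries its own column name; it is not an actual its or zoo object:

```r
# Hypothetical one-column matrix standing in for a price series that
# carries its own column name.
prices <- matrix(c(35.1, 32.0, 38.1), ncol = 1,
                 dimnames = list(NULL, "AdjClose"))

kept_own  <- names(data.frame(msft = prices))             # the matrix's own name wins
kept_call <- names(data.frame(msft = as.vector(prices)))  # the name given in the call is kept
kept_own
kept_call
```

as.vector() strips the dimnames, which is why the name supplied in the data.frame() call survives in the second case.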
