Let's consider following data frame for reproducible example:
df <- data.frame(
"Date" =
c(
"2009-11-02", "2009-11-03", "2009-11-04", "2009-11-05", "2009-11-06",
"2009-11-09", "2009-11-10", "2009-11-12", "2009-11-13", "2009-11-16",
"2009-11-17", "2009-11-18", "2009-11-19", "2009-11-20"
),
"Open" = c(
64.97971, 64.64817, 63.88567, 64.34973, 67.16770, 67.63186,
69.48868, 68.95794, 70.08527, 72.47256,
72.53886, 72.73724, 71.07980, 69.75345
),
"High" = c(
65.47689, 65.14544, 65.44378, 66.96887, 68.75883, 69.62065, 70.81439, 73.26807,
71.07980, 73.13536,
73.26807, 72.93625, 71.87532, 72.27345
),
"Low" = c(
63.98508, 62.75843, 63.71976, 64.34973, 65.47689, 66.96887, 68.36125, 68.95794,
69.28966, 72.00803,
72.00803, 71.14620, 69.68705, 69.75345
),
"Close" = c(
64.64817, 62.85784, 65.21174, 66.96887, 65.70910, 69.62065, 70.81439, 71.94172, 70.61537, 72.53886,
72.80355, 71.60999, 69.68705, 70.94709
)
)
This is some data frame in format OHLC (Open, High, Close, Low).
Now let's change this data frame into xts object:
df <- as.xts(df, order.by = as.Date(df[, 1]))
And now I want to apply to.period function e.g.:
to.period(df, period = "days", k =3)
I obtain error:
'to.period(df, period = "days", k = 3)':unsupported type
I read about this error and the source of it lies in the definition of xts object. Becuase xts object is a matrix every variable should have same type. The problem is here, becuase for example column "Open" is created by numeric values and first column is filled with values in date format. This is the reasoning why as.xts() converts everything to strings, as most common data format. However, even if I know the justification why it doens't work - I have no idea how can I made to.period work. Do you have any idea how it can be solved ?
The issue was that the first column was not removed while constructing the xts
df <- as.xts(df[-1], order.by = as.Date(df[, 1]))
to.period(df, period = "days", k =3)
# df.Open df.High df.Low df.Close
#2009-11-03 64.97971 65.47689 62.75843 62.85784
#2009-11-06 63.88567 68.75883 63.71976 65.70910
#2009-11-09 67.63186 69.62065 66.96887 69.62065
#2009-11-12 69.48868 73.26807 68.36125 71.94172
#2009-11-13 70.08527 71.07980 69.28966 70.61537
#2009-11-18 72.47256 73.26807 71.14620 71.60999
#2009-11-20 71.07980 72.27345 69.68705 70.94709
Without removing the first column i.e. a character class column ('Date'), the xts converts the whole data into a character class as it is also a matrix and matrix can have only single class
str(as.xts(df, order.by = as.Date(df[, 1])))
An ‘xts’ object on 2009-11-02/2009-11-20 containing:
Data: chr [1:14, 1:5] "2009-11-02" "2009-11-03" "2009-11-04" "2009-11-05" "2009-11-06" "2009-11-09" "2009-11-10" "2009-11-12" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] "Date" "Open" "High" "Low" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
NULL
Related
I have an excel file containing 3 columns of data against a column of time at one hour interval. I tried to convert the data into a zoo object. But everytime i tried to that there is an error that says "In zoo(y, order.by = index(x), ...) : some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique".
> datos_meterologicos <- read_excel(datos, sheet = "Precip")
> idx <- as.Date(datos_meterologicos$Fecha)
> date.matrix <- as.data.frame(datos_meterologicos[,-1])
> date.xts <- as.xts(date.matrix,order.by=idx)
> date.zoo <- as.zoo(date.xts)
Warning message:
In zoo(y, order.by = index(x), ...) :
some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
I looked up some of the solutions from other case with the same conflict that I Have, so I tried the next code
datos_meterologicos$Fecha <- read.zoo(datos_meterologicos, FUN=as.POSIXct, format = "%Y/%m/%d %H:%M", tz="UTC"). But I get the same error.
The data is right here https://docs.google.com/spreadsheets/d/1oV2uk5LIL9aFy3Eepw0WkIWucI3_GgkV/edit?usp=sharing&ouid=115562552506837112131&rtpof=true&sd=true
You are transforming the your datetime values into a date with as.Date. You need to add the time as well otherwise you have 24 values of 1 day instead of the day and the hours. Using as.POSIXct will preserve your times.
idx <- as.POSIXct(datos_meterologicos$Fecha)
# rest of your code...
In R, I am trying to read a file that has a timestamp, and update the timestamp based on the condition of another field. The below code works with no problem:
t <- data.frame(user = as.character(c("bshelton#email1.com", "lwong#email1.com")),
last_update = rep(as.POSIXlt(Sys.time(), tz = "America/Los_Angeles"), 2))
Sys.sleep(5)
t$last_update <- as.POSIXlt(ifelse(t$user == "bshelton#email1.com", Sys.time(), t$last_update), origin = "1970-01-01")
print(t)
The problem is when I read an existing file and try to dynamically change an as.POSIXlt value. The following code is producing the error that accompanies it in the code block afterwards:
t <- data.frame(user = as.character(c("bshelton#email1.com", "lwong2#email1.com")),
last_update = rep(as.POSIXlt(Sys.time(), tz = "America/Los_Angeles"), 2))
write.csv(t, "so_question.csv", row.names = FALSE)
t <- read.csv("so_question.csv")
t$last_update <- as.POSIXlt(t$last_update)
Sys.sleep(5)
t$last_update <- as.POSIXlt(ifelse(t$user == "bshelton#email1.com", Sys.time(), t$last_update), origin = "1970-01-01")
Error in as.POSIXlt.default(ifelse(t$user == "bshelton#email1.com", Sys.time(), :
do not know how to convert 'ifelse(t$user == "bshelton#email1.com", Sys.time(), t$last_update)' to class “POSIXlt”
In addition: Warning message:
In ans[!test & ok] <- rep(no, length.out = length(ans))[!test & :
number of items to replace is not a multiple of replacement length
The first case is curiously working only because you don't have what you think—those datetimes are in fact POSIXct, not POSIXlt:
last_update <- rep(as.POSIXlt(Sys.time(), tz = "America/Los_Angeles"), 2)
str(last_update)
#> POSIXlt[1:2], format: "2019-07-28 20:52:10" "2019-07-28 20:52:10"
t <- data.frame(user = as.character(c("bshelton#email1.com", "lwong#email1.com")),
last_update = last_update)
str(t)
#> 'data.frame': 2 obs. of 2 variables:
#> $ user : Factor w/ 2 levels "bshelton#email1.com",..: 1 2
#> $ last_update: POSIXct, format: "2019-07-28 20:52:10" "2019-07-28 20:52:10"
If you dig into ?data.frame, it says
data.frame converts each of its arguments to a data frame by calling as.data.frame(optional = TRUE). As that is a generic function, methods can be written to change the behaviour of arguments according to their classes: R comes with many such methods. Character variables passed to data.frame are converted to factor columns unless protected by I or argument stringsAsFactors is false. If a list or data frame or matrix is passed to data.frame it is as if each component or column had been passed as a separate argument (except for matrices protected by I).
This is what's happening: as.data.frame.POSIXlt in fact converts to POSIXct:
now <- Sys.time()
str(now)
#> POSIXct[1:1], format: "2019-07-28 22:50:12"
str(data.frame(time = now))
#> 'data.frame': 1 obs. of 1 variable:
#> $ time: POSIXct, format: "2019-07-28 22:50:12"
as.data.frame.POSIXlt
#> function (x, row.names = NULL, optional = FALSE, ...)
#> {
#> value <- as.data.frame.POSIXct(as.POSIXct(x), row.names,
#> optional, ...)
#> if (!optional)
#> names(value) <- deparse(substitute(x))[[1L]]
#> value
#> }
#> <bytecode: 0x7fc938a11060>
#> <environment: namespace:base>
More immediately, since Sys.time() returns a POSIXct object, ifelse(t$user == "bshelton#email1.com", Sys.time(), t$last_update) in the second case is getting a POSIXct object for one observation and POSIXlt for the other. The POSIXlt object's class attribute is dropped by ifelse revealing the list underneath, which ifelse then doesn't know how to turn into a vector together with the unclassed POSIXct object (which is just a number).
The solution here, then, is to follow the hint data.frame is giving you and use POSIXct instead of POSIXlt.
If you really want to make it work with POSIXlt, you can iterate over the conditions and POSIXlt vector with Map with if/else (which maintain attributes including class, but only handle scalar conditions) and coerce the resulting list back to a vector with do.call(c, ...):
t <- data.frame(user = as.character(c("bshelton#email1.com", "lwong#email1.com")),
last_update = rep(as.POSIXlt(Sys.time(), tz = "America/Los_Angeles"), 2))
t$last_update <- as.POSIXlt(t$last_update)
t$last_update <- do.call(c, Map(
function(condition, last_update){
if (condition) {
as.POSIXlt(Sys.time() + 5)
} else {
last_update
}
},
condition = t$user == "bshelton#email1.com",
last_update = t$last_update
))
t
#> user last_update
#> 1 bshelton#email1.com 2019-07-28 23:11:04
#> 2 lwong#email1.com 2019-07-28 23:10:59
...but frankly that's a little silly. Just use POSIXct instead, and your life will be better.
I am trying to remove all NA values from two columns in a matrix and make sure that neither column has a value that the other doesn't.
code:
data <- dget(file)
dependent <- data[,"chroma"]
independent <- data[,"mass..Pantheria."]
names(independent) <- names(dependent) <- rownames(data)
for (name in rownames(data)) {
if(is.na(dependent[name])) {
independent$name <- NULL
dependent$name <- NULL
}
if(is.na(independent[name])) {
independent$name <- NULL
dependent$name <- NULL
}
}
print(dput(independent))
print(dput(dependent))
I am brand new to R and am trying to perform this task with a for loop. However, when I delete a section by assigning NULL I receive the following warning:
1: In independent$Aeretes_melanopterus <- NULL : Coercing LHS to a list
2: In dependent$name <- NULL : Coercing LHS to a list
No elements are deleted and independent and dependent retain all their original rows.
file (input):
structure(list(chroma = c(7.443501276, 10.96156313, 13.2987235,
17.58110922, 13.4991105), mass..Pantheria. = c(NA, 126.57, NA,
160.42, 250.57)), .Names = c("chroma", "mass..Pantheria."), class = "data.frame", row.names = c("Aeretes_melanopterus",
"Ammospermophilus_harrisii", "Ammospermophilus_insularis", "Ammospermophilus_nelsoni",
"Atlantoxerus_getulus"))
chroma mass..Pantheria.
Aeretes_melanopterus 7.443501 NA
Ammospermophilus_harrisii 10.961563 126.57
Ammospermophilus_insularis 13.298723 NA
Ammospermophilus_nelsoni 17.581109 160.42
Atlantoxerus_getulus 13.499111 250.57
desired output:
structure(list(chroma = c(10.96156313, 17.58110922, 13.4991105
), mass..Pantheria. = c(126.57, 160.42, 250.57)), .Names = c("chroma",
"mass..Pantheria."), class = "data.frame", row.names = c("Ammospermophilus_harrisii",
"Ammospermophilus_nelsoni", "Atlantoxerus_getulus"))
chroma mass..Pantheria.
Ammospermophilus_harrisii 10.96156 126.57
Ammospermophilus_nelsoni 17.58111 160.42
Atlantoxerus_getulus 13.49911 250.57
structure(c(126.57, 160.42, 250.57), .Names = c("Ammospermophilus_harrisii",
"Ammospermophilus_nelsoni", "Atlantoxerus_getulus"))
Ammospermophilus_harrisii Ammospermophilus_nelsoni Atlantoxerus_getulus
126.57 160.42 250.57
structure(c(10.96156313, 17.58110922, 13.4991105), .Names = c("Ammospermophilus_harrisii",
"Ammospermophilus_nelsoni", "Atlantoxerus_getulus"))
Ammospermophilus_harrisii Ammospermophilus_nelsoni Atlantoxerus_getulus
10.96156 17.58111 13.49911
Looks like you want to omit rows from your data where chroma or mass..Pantheria are NA. Here's a quick way to do it:
data = data[!is.na(data$chroma) & !is.na(data$mass..Pantheria.), ]
I'm not sure why you are breaking independent and dependent out separately, but after filtering out bad observations is a good time to do it.
Since those are your only two columns, this is equivalent to omitting rows from your data frame that have any NA values, so you can use a shortcut like this:
data = na.omit(data)
If you want to keep a "pristine" copy of your raw data, simply change the name of the result:
data_no_na = na.omit(data)
# or
data = data[!is.na(data$chroma) & !is.na(data$mass..Pantheria.), ]
As to what's wrong with your code, $ is used for extracting columns from a data frame, but you're trying to use it for a named vector (since you've already extracted the columns), which doesn't work. Even then, $ only works with a literal string, you can't use it with a variable. For data frames, you need to use brackets to extract columns stored in variables. For example, the built-in mtcars data has a column called "mpg":
# these work:
mtcars$mpg
mtcars[, "mpg"]
my_col = "mpg"
mtcars[, my_col]
mtcars$my_col ## does not work, need to use brackets!
You can never use $ with row names in a data frame, only column names.
I am trying my best at a simple event study in R, with some data retrieved from the Wharton Research Data Service (WRDS). I am not completely new to R, but I would describe my expertise level as intermediate. So, here is the problem. I am using the eventstudies package and one of the steps is converting the physical dates to event time frame dates with the phys2eventtime(..) function. This function takes multiple arguments:
z : time series data for which event frame is to be generated. In the form of an xts object.
Events : it is a data frame with two columns: unit and when. unit has column name of which response is to measured on the event date, while when has the event date.
Width : width corresponds to the number of days on each side of the event date. For a given width, if there is any NA in the event window then the last observation is carried forward.
The authors of the package have provided an example for the xts object (StockPriceReturns) and for Events (SplitDates). This looks like the following:
> data(StockPriceReturns)
> data(SplitDates)
> head(SplitDates)
unit when
5 BHEL 2011-10-03
6 Bharti.Airtel 2009-07-24
8 Cipla 2004-05-11
9 Coal.India 2010-02-16
10 Dr.Reddy 2001-10-10
11 HDFC.Bank 2011-07-14
> head(StockPriceReturns)
Mahindra.&.Mahindra
2000-04-03 -8.3381609
2000-04-04 0.5923550
2000-04-05 6.8097616
2000-04-06 -0.9448889
2000-04-07 7.6843828
2000-04-10 4.1220462
2000-04-11 -1.9078480
2000-04-12 -8.3286900
2000-04-13 -3.8876847
2000-04-17 -8.2886060
So I have constructed my data in the same way, an xts object (DS_xts) and a data.frame (cDS) with the columns "unit" and "when". This is how it looks:
> head(DS_xts)
61241
2011-01-03 0.024247
2011-01-04 0.039307
2011-01-05 0.010589
2011-01-06 -0.022172
2011-01-07 0.018057
2011-01-10 0.041488
> head(cDS)
unit when
1 11754 2012-01-05
2 10104 2012-01-24
3 61241 2012-01-31
4 13928 2012-02-07
5 14656 2012-02-08
6 60097 2012-02-14
These are similar in my opinion, but how it looks does not tell the whole story. I am quite certain that my problem is in how I have constructed these two objects. Below is my R code:
#install.packages("eventstudies")
library("eventstudies")
DS = read.csv("ReturnData.csv")
cDS = read.csv("EventData.csv")
#Calculate Abnormal Returns
DS$AR = DS$RET - DS$VWRETD
#Clean up and let only necessary columns remain
DS = DS[, c("PERMNO", "DATE", "AR")]
cDS = cDS[, c("PERMNO", "DATE")]
#Generate correct date format according to R's as.Date
for (i in 1:nrow(DS)) {
DS$DATE[i] = format(as.Date(toString(DS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
for (i in 1:nrow(cDS)) {
cDS$DATE[i] = format(as.Date(toString(cDS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
#Rename cDS columns according to phys2eventtime format
colnames(cDS)[1] = "unit"
colnames(cDS)[2] = "when"
#Create list of unique PERMNO's
PERMNO <- unique(DS$PERMNO)
for (i in 1:length(PERMNO)) {
#Subset based on PERMNO
DStmp <- DS[DS$PERMNO == PERMNO[i], ]
#Remove PERMNO column and rename AR to PERMNO
DStmp <- DStmp[, c("DATE", "AR")]
colnames(DStmp)[2] = as.character(PERMNO[i])
dates <- as.Date(DStmp$DATE)
DStmp <- DStmp[, -c(1)]
#Create a temporary XTS object
DStmp_xts <- xts(DStmp, order.by = dates)
#If first iteration, just create new variable, otherwise merge
if (i == 1) {
DS_xts <- DStmp_xts
} else {
DS_xts <- merge(DS_xts, DStmp_xts, all = TRUE)
}
}
#Renaming columns for matching
colnames(DS_xts) <- c(PERMNO)
#Making sure classes are the same
cDS$unit <- as.character(cDS$unit)
eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
So, if I run phys2eventtime(..) it returns:
> eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
Error in if ((location <= 1) | (location >= length(x))) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In findInterval(when, index(x)) : NAs introduced by coercion
I have looked at the original function (it is available at their GitHub, can't use more than two links yet) to figure out this error, but I ran out of ideas how to debug it. I hope someone can help me sort it out. As a final note, I have also looked at another (magnificent) answer related to this R package (question: "format a zoo object with “dimnames”=List of 2"), but it wasn't enough to help me solve it (or I couldn't yet comprehend it).
Here is the link for the two CSV files if you would like to reproduce my error (or solve it!).
When using lattice to plot values against hourly timestamps, I've found there is an annoying timezone shift from UTC to local time in the graph's x-axis labels. Though this example uses lubridate, the issue occurs when using POSIXct directly. For example:
library(lattice)
library(lubridate)
foo <- data.frame(t = seq(ymd_hms("2015-01-01 00:00:00"),
ymd_hms("2015-01-02 00:00:00"),
by = "hour"),
y = 1:25)
head(foo)
xyplot(y~t, foo) # time axis is behind by 5 hours (EST = UTC-5)
One solution is to specify the timezone explicitly:
tz(foo$t) <- "" # or tz(foo$t) <- "EST"
head(foo)
xyplot(y~t, foo) # time axis now agrees
Are there other ways to get lattice to plot directly in UTC without modifying the timezone of the data? Perhaps using the scales = list(x = list(format = ...)) argument? I can imagine situations where it would be bad to change the data's timezone, specifically when dealing with daylight saving events.
A brute force approach would be to temporarily overwrite your system's timezone with the timezone from your data, and reset it to its previous value once the plot has been created:
tzn = Sys.getenv("TZ")
Sys.setenv(TZ = tz(foo$t))
xyplot(
y ~ t
, data = foo
)
Sys.setenv(TZ = tzn)
This is a bug; the process of combining per-panel limits into an overall limit (even if there's only one panel) loses attributes, including the tz attribute.
p <- xyplot(y ~ t, foo)
str(attributes(foo$t))
# List of 2
# $ class: chr [1:2] "POSIXct" "POSIXt"
# $ tzone: chr "UTC"
str(attributes(p$x.limits))
# List of 1
# $ class: chr [1:2] "POSIXct" "POSIXt"
Fixed by setting
attributes(p$x.limits)$tzone <- "UTC"
Other workarounds are
xyplot(y ~ t, foo, xlim = extendrange(foo$t))
xyplot(y ~ t, foo, scales = "free")