I am running some analysis code in RStudio, and on my Mac I get an error that prevents me from continuing, but when I run the same code on a PC I only get a warning, which allows me to continue with the analysis.
convert.dates <- function(data, columns, format="%Y-%m-%d %H:%M:%S") {
if (length(columns) == 1) {
posdates <- data.frame(strptime(data[, columns], format))
names(posdates) <- columns
} else
posdates <- as.data.frame(apply(data[, columns], 2, strptime, format))
data <- data[, -match(columns, names(data))]
data.frame(data, posdates)
}
recdat$datetime <- as.character(strptime(recdat$datetime, "%Y:%m:%d %H:%M:%S"))
recdat <- convert.dates(recdat, "datetime")
Mac error code: Error in strptime(data[, columns], format) : input string is too long
PC warning: row names were found from a short variable and were discarded
Why might these be different and how can I ultimately avoid receiving the error?
`structure(list(placeID = c("SHPP9", "SHPP9", "MAPP4", "MAPP4",
"MAPP4", "MAPP4", "HAPP15", "HAPP15", "HAPP15", "DEPP2"), fileID = c("IMG_0086_grid_compass.JPG",
"IMG_0089_grid_compass.JPG", "IMG_0146_grid_compass.JPG", "IMG_0149_grid_compass.JPG",
"IMG_0152_grid_compass.JPG", "IMG_0155_grid_compass.JPG", "IMG_2671_grid_compass.JPG",
"IMG_2673_grid_compass.JPG", "IMG_2676_grid_compass.JPG", "IMG_0008_grid_compass.JPG"
), datetime = c("2019:06:01 02:59:22", "2019:06:01 02:59:23",
"2019:06:06 00:45:32", "2019:06:06 00:45:33", "2019:06:06 00:45:49",
"2019:06:06 00:45:50", "2019:06:07 23:05:46", "2019:06:07 23:05:47",
"2019:06:07 23:05:48", "2019:06:10 22:29:47"), time = c(0.782634725415124,
0.78270744746729, 31.6146031824166, 31.6146759044688, 31.6158394573035,
31.6159121793556, 43.7456595925075, 43.7457323145597, 43.7458050366119,
62.438208603419), species = c("Civet", "Civet", "Civet", "Civet",
"Civet", "Civet", "Civet", "Civet", "Civet", "Civet"), absangle = c(0.233516621633944,
0.22839855680029, 0.0168248958289925, 0.112940296923411, 0.00228507404113898,
0.0993839863232428, 0.221094165650959, 0.180608750801481, 0.12502377891255,
0.269755048548634), distance = c(6.25093914350113, 5.90579526441211,
3.35416246724451, 3.5986715069128, 3.74569894202013, 3.2687423704556,
8.63358410851277, 9.12459363127179, 9.03891669576689, 3.70247933884298
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))`
It's unclear which values cause the issues - the ones you posted don't. Try looking at something like table(nchar(recdat$datetime)) to find any odd values first.
FWIW, a few comments: you can simply convert the columns in place, e.g.:
convert.dates <- function(data, columns, format="%Y-%m-%d %H:%M:%S") {
for (col in columns) data[[col]] <- strptime(data[[col]], format)
data
}
Also, dealing with POSIXlt is highly inefficient, so I'd replace all of the above with
recdat$datetime <- as.POSIXct(recdat$datetime, format = "%Y:%m:%d %H:%M:%S")
which is much easier to compute with (you can add seconds directly, etc.).
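The two suggestions above (check for odd values first, then convert with as.POSIXct) can be sketched together. This is a minimal illustration on made-up data, not the asker's real table: the third value is deliberately malformed, and tz = "UTC" is an assumption chosen only to keep the result independent of the machine's locale.

```r
# Hypothetical stand-in for recdat; the last entry is intentionally not a date
recdat <- data.frame(
  datetime = c("2019:06:01 02:59:22", "2019:06:06 00:45:32", "not a date"),
  stringsAsFactors = FALSE
)

# Values with an unusual length stand out in this table
print(table(nchar(recdat$datetime)))

# With an explicit format, strings that cannot be parsed become NA
parsed <- as.POSIXct(recdat$datetime, format = "%Y:%m:%d %H:%M:%S", tz = "UTC")

# List the offending raw strings before overwriting the column
bad_values <- recdat$datetime[is.na(parsed)]
print(bad_values)

recdat$datetime <- parsed
```

Note that format is passed by name; the second positional argument of as.POSIXct is tz, so an unnamed format string would be taken as a time zone.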
I'm trying to visualise data of the following form:
       date volaEUROSTOXX    volaSA volaKENYA25    volaNAM volaNIGERIA
1 10feb2012    0.29844454 0.1675901 0.007862087 0.12084170  0.10247617
2 17feb2012    0.31811157 0.2260064 0.157017220 0.33648935  0.22584127
3 24feb2012    0.30013672 0.1039974 0.083863921 0.11694768  0.16388161
To do so, I first converted the date (stored as a character in the original data frame) into a date format, which works just fine:
vola$date <- as.Date(vola$date)
str(vola$date)
Date[1:543], format: "2012-02-10" "2012-02-17" "2012-02-24" "2012-03-02" "2012-03-09"
However, if I now try to graph my data by using the chart.TimeSeries command, I get the following:
chart.TimeSeries(volatility_annul_stringdate,lwd=2,auto.grid=F,ylab="Annualized Log Volatility",xlab="Time",
main="Log Volatility",lty=1,
legend.loc="topright")
Error in if (class(x) == "numeric") { : the condition has length > 1
I tried:
Converting my date variable (in the date format) further into a time series object:
vola$date <- ts(vola$date, frequency=52, start=c(2012,9)) #returned same error from above
Converting the whole data set using the xts command:
vol.xts <- xts(vola, order.by= vola$date, unique = TRUE ) # which then returned:
order.by requires an appropriate time-based object
#even though date is a time-series
What am I doing wrong? I am rather new to RStudio.. I really want to use the chart.TimeSeries command. Can someone help me?
Thanks in advance!
My MRE:
library(PerformanceAnalytics)
vola <- structure(list(date_2 = c("2012-02-10", "2012-02-17", "2012-02-24",
"2012-03-02"), volaEUROSTOXX = c(0.298444539308548, 0.318111568689346,
0.300136715173721, 0.299697518348694), volaKENYA25 = c(0.00786208733916283,
0.157017216086388, 0.0838639214634895, 0.152377054095268), volaNAM = c(0.120841704308987,
0.336489349603653, 0.116947680711746, 0.157027021050453), volaNIGERIA = c(0.102476172149181,
0.225841268897057, 0.163881614804268, 0.317349642515182), volaSA = c(0.167590111494064,
0.226006388664246, 0.103997424244881, 0.193037077784538), date = structure(c(1328832000,
1329436800, 1330041600, 1330646400), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"))
vola <- subset(vola, select = -c(date))
vola$date_2 <- as.Date(vola$date_2)
chart.TimeSeries(vola,lwd=2,auto.grid=F,ylab="Annualized Log Volatility",xlab="Time",
main="Log Volatility",lty=1,
legend.loc="topright")
#This returns the above mentioned error message.
#Thus, I tried the following:
vola$date_2 <- ts(vola$date_2, frequency=52, start=c(2012,9))
chart.TimeSeries(vola,lwd=2,auto.grid=F,ylab="Annualized Log Volatility",xlab="Time",
main="Log Volatility",lty=1,
legend.loc="topright")
#Which returned a different error (as described above)
#And I tried:
vol.xts <- xts(vola, order.by= vola$date_2, unique = TRUE )
#This also returned an error message.
#My intention was to then run:
#chart.TimeSeries(vol.xts,lwd=2,auto.grid=F,ylab="Annualized Log Volatility",xlab="Time",
#main="Log Volatility",lty=1,
#legend.loc="topright")
The documentation of PerformanceAnalytics::chart.TimeSeries is a bit vague. The issue is that when you pass a data frame, the dates have to be set as its row names. To this end I first converted your data (which is a tibble) to a data.frame. Afterwards I added the dates as row names and dropped the date column:
library(PerformanceAnalytics)
vola <- as.data.frame(vola)
vola <- subset(vola, select = -c(date))
row.names(vola) <- as.Date(vola$date_2)
vola$date_2 <- NULL
chart.TimeSeries(vola,
lwd = 2, auto.grid = F, ylab = "Annualized Log Volatility", xlab = "Time",
main = "Log Volatility", lty = 1,
legend.loc = "topright"
)
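As an alternative to row names, the data can be turned into an xts object, which is what the question was attempting. The sketch below assumes the xts package is installed (PerformanceAnalytics depends on it) and uses a small stand-in table with rounded values. The two things that differ from the question's attempt are that order.by receives a plain Date vector rather than a ts object, and that the date column is left out of the data so the values stay numeric.

```r
library(xts)  # assumption: xts is installed

# Small stand-in for the question's data (values rounded for brevity)
vola <- data.frame(
  date_2 = c("2012-02-10", "2012-02-17", "2012-02-24"),
  volaEUROSTOXX = c(0.298, 0.318, 0.300),
  volaSA = c(0.168, 0.226, 0.104),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns as data; the dates go into order.by
value_cols <- setdiff(names(vola), "date_2")
vol_xts <- xts(vola[, value_cols], order.by = as.Date(vola$date_2))

# vol_xts could then be passed to chart.TimeSeries() in place of the data frame
print(vol_xts)
```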
I have a sample data frame q below that contains three dates in ddmmyy form in q$test:
test
1 210376
2 141292
3 280280
I want to create a new covariate q$new that calculates the date difference from q$test to today.
I tried
q$new <- as.numeric(difftime(as.Date(q$test,format='%d/%m/%y'), as.Date(Sys.Date()), unit="weeks"))
But I receive an error message
Error in q$new <- as.numeric(difftime(as.Date(q$test, format =
"%d/%m/%y"), : object of type 'closure' is not subsettable
Do you have any idea what's wrong? Or do you have another solution?
q <- structure(list(test = c(210376L, 141292L, 280280L)), class = "data.frame", row.names = c(NA,
-3L))
You could do
as.numeric(difftime(Sys.Date(), as.Date(as.character(q$test), "%d%m%y"), units = "weeks"))
#[1] 2257.286 1384.143 2051.714
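A few pointers:
1) Sys.Date() already returns an object of class "Date", so there is no need for as.Date there.
2) as.Date was expecting a character string as input, hence q$test is wrapped in as.character.
3) format in as.Date describes the format of the input, not the output we want. In your case you used the format "%d/%m/%y", whereas the format you had was "%d%m%y".
4) The error itself ("object of type 'closure' is not subsettable") means that q was not a data frame when the line ran: without a q of your own in the workspace, q refers to R's built-in q() function (quit), which cannot be subset with $.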
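One caveat when dates like these are stored as integers: a day below 10 loses its leading zero (1 March 1980 becomes 10380), and the string no longer matches "%d%m%y". The sketch below pads to six digits before parsing. The third value and the fixed reference date are assumptions made for illustration; the reference date replaces Sys.Date() only so that the result does not change from day to day.

```r
# Hypothetical sample; the third value has lost its leading zero
q_dates <- data.frame(test = c(210376L, 141292L, 10380L))

# Restore the leading zero so every string has exactly six digits
padded <- sprintf("%06d", q_dates$test)

# format describes the input layout: day, month, two-digit year, no separators
parsed <- as.Date(padded, format = "%d%m%y")

# Fixed reference date for a reproducible result; use Sys.Date() in practice
reference <- as.Date("2020-01-01")
q_dates$weeks <- as.numeric(difftime(reference, parsed, units = "weeks"))

print(data.frame(padded, parsed, weeks = q_dates$weeks))
```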
I'm working with minute data of NASDAQ, it has the index "2015-07-13 12:05:00 EST". I adjusted the system time with Sys.setenv(TZ = 'EST').
I want to program a simple buy/hold/sell strategy, therefore I create a vector of flat positions as a foundation.
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ))
Then I want to apply a constraint, that in a certain time window, positions are bound to be flat, which in my case means equal to 1.
pos_flat["T13:41/T14:00"] <- 1
And this returns the error:
"Error in as.POSIXlt.POSIXct(.POSIXct(.index(x)), tz = indexTZ(x)) :invalid 'tz' value".
I also get this error doing other calculations, I just used this example because it is easy and shows the problem.
As extra information:
> Sys.timezone
function (location = TRUE)
{
tz <- Sys.getenv("TZ", names = FALSE)
if (nzchar(tz))
return(tz)
if (location)
return(.Internal(tzone_name()))
z <- as.POSIXlt(Sys.time())
zz <- attr(z, "tzone")
if (length(zz) == 3L)
zz[2L + z$isdst]
else zz[1L]
}
<bytecode: 0x03648ff4>
<environment: namespace:base>
I don't understand the problem with the tz value... Any ideas?
The source of your "invalid 'tz' value" error is that R doesn't accept a vector such as tz = df$var; tz has to be a single character string. If you set tz = 'America/New_York' or some other single character value, then it will work.
A better answer (instead of using force_tz below) for converting UTC times to various timezones based on location; it is also simpler and better than looping through or using a nested ifelse. I subset and change tz based on a timezone column (which my data already has; if yours doesn't, you can create it). Just make sure you account for all timezones in your data:
(unique(df$timezone))
df$datetime2[df$timezone == 'America/New_York'] <- format(df$datetime, tz="America/New_York")[df$timezone == 'America/New_York']
df$datetime2[df$timezone == 'America/Chicago'] <- format(df$datetime, tz="America/Chicago")[df$timezone == 'America/Chicago']
df$datetime2[df$timezone == 'America/Denver'] <- format(df$datetime, tz="America/Denver")[df$timezone == 'America/Denver']
df$datetime2[df$timezone == 'America/Los_Angeles'] <- format(df$datetime, tz="America/Los_Angeles")[df$timezone == 'America/Los_Angeles']
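The four assignments above follow one pattern, so they can be written compactly by iterating over the distinct zones (not over rows), which also covers any zone that is present in the data. This is a sketch on a two-row stand-in table; the column names datetime, timezone and datetime2 follow the snippet above, and the sample values are assumptions.

```r
# Stand-in data: UTC timestamps plus the zone each row should be shown in
df <- data.frame(
  datetime = as.POSIXct(c("2015-12-12 13:34:56", "2015-12-14 16:23:32"), tz = "UTC"),
  timezone = c("America/Los_Angeles", "America/New_York"),
  stringsAsFactors = FALSE
)

df$datetime2 <- NA_character_

# format() takes a single tz string, so handle one distinct zone at a time
for (zone in unique(df$timezone)) {
  rows <- df$timezone == zone
  df$datetime2[rows] <- format(df$datetime[rows], tz = zone, usetz = TRUE)
}

print(df)
```

The result is a character column, as in the original snippet; a single POSIXct vector cannot carry a different time zone per element.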
Previous solution: Converting to Local Time in R - Vector of Timezones
require(lubridate)
require(dplyr)
df = data.frame(timestring = c("2015-12-12 13:34:56", "2015-12-14 16:23:32"), localzone = c("America/Los_Angeles", "America/New_York"), stringsAsFactors = F)
df$moment = as.POSIXct(df$timestring, format="%Y-%m-%d %H:%M:%S", tz="UTC")
df = df %>% rowwise() %>% mutate(localtime = force_tz(moment, localzone))
df
You are getting errors because "EST" is not a valid timezone specification. It's an abbreviation that's often used when printing and displaying timezones.
The index is printed as "2015-07-13 12:05:00 EST" because "EST" probably represents Eastern Standard Time in the United States. If you want to set the TZ environment variable to that timezone, you should use Sys.setenv() with Country/City notation:
Sys.setenv(TZ = "America/New_York")
You can also set the timezone in the xts constructor:
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ), tzone = "America/New_York")
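To illustrate the difference between a Country/City zone name and the abbreviation that gets printed, the base R sketch below formats a summer and a winter timestamp in the same zone. The January timestamp is made up for the comparison, and the example assumes the system's time zone database includes "America/New_York".

```r
# A July and a January timestamp in the same Country/City zone
july <- as.POSIXct("2015-07-13 12:05:00", tz = "America/New_York")
january <- as.POSIXct("2015-01-13 12:05:00", tz = "America/New_York")

# %Z prints the abbreviation in effect, %z the offset from UTC
print(format(july, "%Y-%m-%d %H:%M:%S %Z (%z)"))
print(format(january, "%Y-%m-%d %H:%M:%S %Z (%z)"))
```

The zone name stays the same while the printed abbreviation switches between EDT and EST, so the abbreviation shown next to a timestamp is not necessarily a name that can be passed back in as TZ.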
Your error occurs because of a misinterpretation of the time object. You need to have UNIX timestamps in order to use something like
pos_flat["T13:41/T14:00"] <- 1
Try a conversion of your indices by doing something like this:
index(NASDAQ) <- as.POSIXct(strptime(index(NASDAQ), "%Y-%m-%d %H:%M:%S"))
As you want to use EST, you have to change your environment variables (if you are not living in EST timezone). So all in all, this should work:
Sys.setenv(TZ = 'EST')
#load stuff
#...
index(NASDAQ) <- as.POSIXct(strptime(index(NASDAQ), "%Y-%m-%d %H:%M:%S"))
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ))
pos_flat["T13:41/T14:00"] <- 1
For further information, have a look at the POSIXct and POSIXlt structures in R.
Best regards
I am importing a csv file into R, creating a 3x3 dataframe, and attempting to convert the dataframe to an xts object. But I get error message "do not match the length of object".
#DATSB <- fread("C:/Temp/GoogleDrive/R/temp.csv", select = c("DateTime","Last","Volume"))
#that results in following dput() output:
DATSB <- structure(list(DateTime = c("3/28/2016 20:37", "3/28/2016 20:36","3/28/2016 20:35"), Last = c(1221.7, 1221.8, 1221.9), Volume = c(14L,2L, 22L)), .Names = c("DateTime", "Last", "Volume"), row.names = c(NA,3L), class = "data.frame")
setDF(DATSB)
DATSB$DateTime <- strptime(DATSB$DateTime, format = "%m/%d/%Y %H:%M")
DATSBxts <- as.xts(DATSB[, -2], order.by = as.Date(DATSB$DateTime, "%Y/%m/%d %H:%M"))
         DateTime   Last Volume
1 3/28/2016 20:37 1221.7     14
2 3/28/2016 20:36 1221.8      2
3 3/28/2016 20:35 1221.9     22
Exact error message is "Error in as.matrix.data.frame(x) :
dims [product 12] do not match the length of object [14]"
Somehow the root of the problem is the column Volume. Without that column, it works. Unfortunately can't figure it out. Thanks for your help!
There was a typo here: DATSB[, -2] drops the Last column instead of the DateTime column. Correcting it to DATSB[, -1] works fine. The general pattern for xts is:
xts(data[,-date_column], order.by = data[,date_column])
Also, coredata(DATSBxts) and index(DATSBxts) are helpful functions.
DATSBxts <- xts(DATSB[, -1], order.by = DATSB[, 1], dateFormat = "%Y/%m/%d %H:%M:%S")
rev(DATSBxts)
DATSBxts
# Last Volume
#2016-03-28 20:35:00 1221.9 22
#2016-03-28 20:36:00 1221.8 2
#2016-03-28 20:37:00 1221.7 14
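The same pattern can be written so that the timestamp never has to live in the data frame as a POSIXlt column. The sketch below assumes the xts package is installed, rebuilds the three sample rows from the question, parses the timestamps into a separate POSIXct vector, and selects the data columns by name. tz = "UTC" is an assumption made only to keep the result independent of the local time zone.

```r
library(xts)  # assumption: xts is installed

DATSB <- data.frame(
  DateTime = c("3/28/2016 20:37", "3/28/2016 20:36", "3/28/2016 20:35"),
  Last = c(1221.7, 1221.8, 1221.9),
  Volume = c(14L, 2L, 22L),
  stringsAsFactors = FALSE
)

# Parse once into POSIXct and keep it separate from the numeric data
stamp <- as.POSIXct(DATSB$DateTime, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Only numeric columns go into the xts data; xts sorts the rows by time
DATSBxts <- xts(DATSB[, c("Last", "Volume")], order.by = stamp)

print(DATSBxts)
print(index(DATSBxts))     # the time index
print(coredata(DATSBxts))  # the underlying numeric matrix
```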
I'm measuring a physiological variable with a millisecond timestamp on a number of patients. For each patient I want to apply a factor to a subset of the timestamped rows describing their posture at that exact moment.
I've tried creating the following function, which works fine when describing the first posture. When trying to apply the next "posture-factor," the previously registered posture is deleted.
TestPatient <- data.frame(Time=seq(c(ISOdatetime(2011,12,22,12,00,00)), by = "sec", length.out = 100),Value=rnorm(100, 9, 3))
patientpositionslice <- function(patient,positiontype,timestart,timestop) {
patient$Position[
format(patient$Time, "%Y-%m-%d %H:%M:%S") >= timestart &
format(patient$Time, "%Y-%m-%d %H:%M:%S") < timestop] <- positiontype
patient
}
TestPatientNew <- patientpositionslice(TestPatient,"Horizontal","2011-12-22 12:00:05","2011-12-22 12:00:10")
TestPatientNew <- patientpositionslice(TestPatient,"Vertical","2011-12-22 12:00:15","2011-12-22 12:00:20")
How do I modify the function so I can apply it repeatedly on the same patient with different postures such as "Horizontal", "Vertical", "Sitting" etc.?
Here's your solution. Probably there are more elegant ways but this is mine ;)
TestPatient <- data.frame(Time=seq(c(ISOdatetime(2011,12,22,12,00,00)), by = "sec", length.out = 100),Value=rnorm(100, 9, 3))
#Included column with position
TestPatient$position <- NA
patientpositionslice <- function(patient,positiontype,timestart,timestop) {
#changed the test to ifelse() function
new<-ifelse(
format(patient$Time, "%Y-%m-%d %H:%M:%S") >= timestart &
format(patient$Time, "%Y-%m-%d %H:%M:%S") < timestop , positiontype, patient$position)
patient$position <- new
patient
}
TestPatientNew <- patientpositionslice(TestPatient,"Horizontal","2011-12-22 12:00:05","2011-12-22 12:00:10")
#For repeated insertion use the previous object
TestPatientNew <- patientpositionslice(TestPatientNew ,"Vertical","2011-12-22 12:00:15","2011-12-22 12:00:20")
I commented the changes. I hope this is what you wanted; if not, just correct me.
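A variant of the same idea, offered as a sketch rather than a replacement: it compares the timestamps as POSIXct values instead of formatted strings and creates the position column on first use. The function name label_posture and the tz argument are hypothetical; tz = "UTC" is assumed here so that the window boundaries and the data are interpreted in the same zone.

```r
TestPatient <- data.frame(
  Time = seq(ISOdatetime(2011, 12, 22, 12, 0, 0, tz = "UTC"), by = "sec", length.out = 100),
  Value = rnorm(100, 9, 3)
)

# Hypothetical helper: labels rows with start <= Time < stop, keeping earlier labels
label_posture <- function(patient, posture, start, stop, tz = "UTC") {
  # Create the column once so repeated calls do not wipe previous labels
  if (!"position" %in% names(patient)) patient$position <- NA_character_
  in_window <- patient$Time >= as.POSIXct(start, tz = tz) &
    patient$Time < as.POSIXct(stop, tz = tz)
  patient$position[in_window] <- posture
  patient
}

# Feed the result of each call into the next one
labelled <- label_posture(TestPatient, "Horizontal", "2011-12-22 12:00:05", "2011-12-22 12:00:10")
labelled <- label_posture(labelled, "Vertical", "2011-12-22 12:00:15", "2011-12-22 12:00:20")

print(table(labelled$position, useNA = "ifany"))
```

As in the answer above, the key point is that the second call receives the output of the first, so both labels survive.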