Error: Column named '1640233800' cannot be found in 'df' - r

I'm using R and the moveVis package to do some movement visualization, and I'm having trouble converting a data.frame to a moveStack using df2move.
Below is the CSV from which I import the data using read.csv:
trackId,x,y,time,x1,x2,optional,sensor,timestamps
A34,19.00094708496841,72.8264388198447,2021-12-23 10:00:00,19.00094708496841,72.8264388198447,FALSE,unknown,2021-12-23 10:00:00
A34,18.986663359819435,72.84012881354482,2021-12-23 10:02:00,18.986663359819435,72.84012881354482,FALSE,unknown,2021-12-23 10:02:00
raw_data <- read.csv("mdata2.csv", header = TRUE)
m <- df2move(raw_data, proj = "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs", x = "x1", y = "x2", time = as.POSIXct(raw_data$timestamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"), track_id = "trackId")
Running the code above gives this error:
Error: Column named '1640233800' cannot be found in 'df'

The problem is with your time argument. Check that the timestamp format in your CSV matches the format string you specify in your code; if they do not match, the conversion fails.
If you edited the file in Excel, note that Excel reformats timestamps to its own default, so you may need to change that first. You can change the format in Excel by selecting the timestamp values and pressing Ctrl + 1.
Convert the column first, then pass its name to df2move:
raw_data$timestamps <- as.POSIXct(raw_data$timestamps, format = "%Y-%m-%d %H:%M", tz = "UTC")
m <- df2move(raw_data, proj = "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs", x = "x1", y = "x2", time = "timestamps", track_id = "trackId")

You have to pass a character string (the name of a column) as time in the df2move function. Therefore, you have to do the conversion before calling the function (as @Vishal A. suggested as well). However, the conversion of the timestamps to class POSIXct was not correct, so NAs were introduced. See the solution:
raw_data <- structure(list(trackId = c("vipin", "vipin"), x = c(72.8409492130316, 72.8363572715711), y = c(18.9968003664781, 18.9958569245008), time = c("2021-12-23 10:00:00", "2021-12-23 10:02:00"), x1 = c(72.8409492130316, 72.8363572715711), x2 = c(18.9968003664781, 18.9958569245008 ), optional = c(FALSE, FALSE), sensor = c("unknown", "unknown" ), timestamps = structure(c(NA_real_, NA_real_), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA, -2L), class = "data.frame")
raw_data$timestamps <- as.POSIXct(raw_data$time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
m <- moveVis::df2move(raw_data, proj = "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs", x = "x1", y = "x2", time = "timestamps", track_id = "trackId")
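The number in the error message looks like a Unix epoch value, which fits the explanation that a POSIXct vector ended up where a column name was expected. A minimal base-R sketch, using only the value from the error message and the two timestamps from the question (the variable names are illustrative), decodes that number and checks the parsed column for NAs before it would be handed to df2move:

```r
# Decode the number from the error message as seconds since 1970-01-01 UTC.
epoch_from_error <- 1640233800
decoded <- as.POSIXct(epoch_from_error, origin = "1970-01-01", tz = "UTC")
decoded_text <- format(decoded, "%Y-%m-%d %H:%M:%S")
print(decoded_text)

# Parse the timestamps into a column first, then verify nothing became NA.
raw_data <- data.frame(
  trackId = c("A34", "A34"),
  timestamps = c("2021-12-23 10:00:00", "2021-12-23 10:02:00"),
  stringsAsFactors = FALSE
)
raw_data$timestamps <- as.POSIXct(raw_data$timestamps,
                                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
parsed_ok <- !anyNA(raw_data$timestamps)
print(parsed_ok)
```

If parsed_ok is FALSE, the format string does not match the text in the file, and the column should be fixed before its name is passed as the time argument.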

Related

POSIXct objects and time zones

I have a date-time data table imported from Excel, with the date/time column in a number format (e.g., 43596.22). I used the following code to convert the number to a date-time format with the UTC time zone:
info_dt1$Date_time<-convertToDateTime(info_dt1$date_time, origin = "1900-01-01",tx="UTC")
I am using the foverlaps function from data.table to merge this data table with another data table by date and time. When I first ran the following code:
info_dt3 = foverlaps(info_dt2, info_access3, by.x=c("Date_time", "dummy"), nomatch=NA)[, dummy := NULL]
I got an error message stating that the two date-time fields had different time zones, even though the time zone for the other data table was also specified as UTC.
I used the attr function to set the date-time columns of both data tables to UTC:
#make sure all date/times have same time zone
attr(info_access2$Start_time, "tzone") <- "UTC"
attr(info_access2$End_time, "tzone") <- "UTC"
attr(info_dt1$Date_time, "tzone") <- "UTC"
When I do this, the times in the info_dt1 data table move forward 4 hours and the resulting merge is off. I would like to know what I am doing incorrectly when setting the format and time zone of both data tables, so that the merge works correctly.
Some example data and code:
#first data table reduced example
info_dt1<-
structure(list(date_time = c(NA, 43596.2284722222, 43596.2285069444,
43596.2285416667, 43596.2285763889, 43596.2286111111, 43596.2286458333,
43596.2286805556, 43596.2287152778, 43596.22875), Temp = c(NA,
22.75, 22.66, 22.57, 22.49, 22.37, 22.28, 22.16, 22.08, 21.99
), Depth = c(NA, 0.19, 0.27, 0.7, 0.27, 0.27, 0.27, 0.19, 0.19,
0.19), Angle = c(NA, -3, -4, -3, -1, 1, -1, -2, 1, -6)), .Names = c("date_time",
"Temp", "Depth", "Angle"), row.names = c(NA, 10L), class = "data.frame")
#convert date time to POSIXct
info_dt1$Date_time<-convertToDateTime(info_dt1$date_time, origin = "1900-01-01",tx="UTC")
#second example data set
info_access2<-
structure(list(Tow = 201905001:201905010, Start_time = structure(c(1557554271,
1557564948, 1557569853, 1557573081, 1557577149, 1557582317, 1557586050,
1557588636, 1557590697, 1557593679), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), End_time = structure(c(1557555117, 1557565710,
1557570765, 1557573846, 1557577974, 1557583210, 1557586797, 1557589428,
1557591441, 1557594511), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
time_interval = structure(c(846, 762, 912, 765, 825, 893,
747, 792, 744, 832), start = structure(c(1557554271, 1557564948,
1557569853, 1557573081, 1557577149, 1557582317, 1557586050,
1557588636, 1557590697, 1557593679), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), tzone = "UTC", class = structure("Interval", package = "lubridate"))), .Names = c("Tow",
"Start_time", "End_time", "time_interval"), row.names = c(NA,
10L), class = "data.frame")
library(data.table)
#make info_dt2 and info_access2 data.tables
info_access3<-as.data.table(info_access2)
info_dt2<-as.data.table(info_dt1)
#remove NA from info_dt2
info_dt2<-info_dt2[complete.cases(info_dt2),]
#set dummy column for info_dt2
info_dt2[, dummy := Date_time]
#define setkey for info_access2
setkey(info_access3, Start_time, End_time)
#if I run the code like this I get the error message about different time zones
#use foverlaps to merge info_access3 and info_dt2
info_dt3 = foverlaps(info_dt2, info_access3, by.x=c("Date_time", "dummy"), nomatch=NA)[, dummy := NULL]
#if I run this chunk of code the times in info_dt1 are moved forward 4 hours
#make sure all date/times have same time zone
attr(info_access2$Start_time, "tzone") <- "UTC"
attr(info_access2$End_time, "tzone") <- "UTC"
attr(info_dt1$Date_time, "tzone") <- "UTC"
#make info_dt2 and info_access2 data.tables
info_access3<-as.data.table(info_access2)
info_dt2<-as.data.table(info_dt1)
#remove NA from info_dt2
info_dt2<-info_dt2[complete.cases(info_dt2),]
#but the foverlaps to merge info_access2 and info_dt2 doesn't give an error message
info_dt3 = foverlaps(info_dt2, info_access3, by.x=c("Date_time", "dummy"), nomatch=NA)[, dummy := NULL]
You can use lubridate::force_tz() to change a timestamp which had an inaccurate timezone when it was read in:
lubridate::force_tz(Sys.time(), "UTC")
#[1] "2019-06-25 14:04:32 UTC"
This changes the underlying timestamp (the stored double), whereas merely altering the tzone attribute only changes how the same instant is displayed.
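The difference between the two approaches can be sketched in base R, without lubridate. The time zone America/New_York below is only an assumption chosen because it is 4 hours behind UTC in May; the asker's actual local zone is not stated. Re-parsing the wall-clock text in UTC is used here as a stand-in for what force_tz does:

```r
# A timestamp that was read in with a local (non-UTC) time zone.
x <- as.POSIXct("2019-05-11 05:29:00", tz = "America/New_York")

# Option 1: change only the tzone attribute. Same instant, new display.
y <- x
attr(y, "tzone") <- "UTC"
same_instant <- as.numeric(y) == as.numeric(x)
shown_in_utc <- format(y, "%H:%M")

# Option 2: keep the wall-clock time and reinterpret it as UTC.
# The stored number of seconds changes.
z <- as.POSIXct(format(x, "%Y-%m-%d %H:%M:%S"), tz = "UTC")
shift_seconds <- as.numeric(z) - as.numeric(x)

print(same_instant)
print(shown_in_utc)
print(shift_seconds)
```

Option 1 reproduces the 4-hour jump in the displayed time that the question describes, while option 2 keeps the clock reading and moves the instant instead.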

Calculate sunrise function that works with dataframe with dplyr::mutate?

I am having trouble with a function I wrote when I try to apply it to a dataframe with mutate to create a new column.
I want to add a column to a dataframe that calculates the sunrise/sunset time for all rows based on existing columns for Latitude, Longitude and Date. The sunrise/sunset calculation is derived from the "sunriset" function from the maptools package.
Below is my function:
library(maptools)
library(tidyverse)
sunrise.set2 <- function (lat, long, date, timezone = "UTC", direction = c("sunrise", "sunset"), num.days = 1)
{
lat.long <- matrix(c(long, lat), nrow = 1)
day <- as.POSIXct(date, tz = timezone)
sequence <- seq(from = day, length.out = num.days, by = "days")
sunrise <- sunriset(lat.long, sequence, direction = "sunrise",
POSIXct = TRUE)
sunset <- sunriset(lat.long, sequence, direction = "sunset",
POSIXct = TRUE)
ss <- data.frame(sunrise, sunset)
ss <- ss[, -c(1, 3)]
colnames(ss) <- c("sunrise", "sunset")
if (direction == "sunrise") {
return(ss[1,1])
} else {
return(ss[1,2])
}
}
When I run the function for a single input I get the expected output:
sunrise.set2(41.2, -73.2, "2018-12-09 07:34:0", timezone="EST",
direction = "sunset", num.days = 1)
[1] "2018-12-09 16:23:46 EST"
However, when I try to do this on a dataframe object to mutate in a new column like so:
df <- df %>%
mutate(set = sunrise.set2(Latitude, Longitude, LocalDateTime, timezone="UTC", num.days = 1, direction = "sunset"))
I get the following error:
Error in mutate_impl(.data, dots) :
Evaluation error: 'from' must be of length 1.
The dput of my df is below. I suspect I'm not doing something right in order to properly vectorize my function but I'm not sure what.
Thanks
dput(df):
structure(list(Latitude = c(20.666, 20.676, 20.686, 20.696, 20.706,
20.716, 20.726, 20.736, 20.746, 20.756, 20.766, 20.776), Longitude = c(-156.449,
-156.459, -156.469, -156.479, -156.489, -156.499, -156.509, -156.519,
-156.529, -156.539, -156.549, -156.559), LocalDateTime = structure(c(1534318440,
1534404840, 1534491240, 1534577640, 1534664040, 1534750440, 1534836840,
1534923240, 1535009640, 1535096040, 1535182440, 1535268840), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), .Names = c("Latitude", "Longitude",
"LocalDateTime"), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"), spec = structure(list(cols = structure(list(
Latitude = structure(list(), class = c("collector_double",
"collector")), Longitude = structure(list(), class = c("collector_double",
"collector")), LocalDateTime = structure(list(format = "%m/%d/%Y %H:%M"), .Names = "format", class = c("collector_datetime",
"collector"))), .Names = c("Latitude", "Longitude", "LocalDateTime"
)), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
The problem is indeed that your function, as it is now, is not vectorized: it breaks if you give it more than one value. A workaround (as Suliman suggested) is using rowwise() or a variant of apply, but that would give your function a lot of unnecessary work.
So it is better to make it vectorized, as maptools::sunriset is also vectorized. First suggestion: debug or rewrite it with vectors as input, and then you easily see the lines where something unexpected happens. Let's go through it line by line; I've commented out your lines where I replace them with something else:
library(maptools)
library(tidyverse)
# sunrise.set2 <- function (lat, long, date, timezone = "UTC", direction = c("sunrise", "sunset"), num.days = 1)
# Why an argument saying how many days? You have the length of your dates
sunrise.set2 <- function (lat, long, date, timezone = "UTC", direction = c("sunrise", "sunset"))
{
#lat.long <- matrix(c(long, lat), nrow = 1)
# one row per point: column 1 is longitude, column 2 is latitude
lat.long <- cbind(long, lat)
day <- as.POSIXct(date, tz = timezone)
# sequence <- seq(from = day, length.out = num.days, by = "days") # Your days object is fine
sunrise <- sunriset(lat.long, day, direction = "sunrise",
POSIXct = TRUE)
sunset <- sunriset(lat.long, day, direction = "sunset",
POSIXct = TRUE)
# I've replaced sequence with day here
ss <- data.frame(sunrise, sunset)
ss <- ss[, -c(1, 3)]
colnames(ss) <- c("sunrise", "sunset")
if (direction == "sunrise") {
#return(ss[1,1])
return(ss[,1])
} else {
#return(ss[1,2])
return(ss[,2])
}
}
But looking at your function, I think there is still a lot of extra work being done that doesn't serve any purpose.
You're calculating both sunrise and sunset, only to use one of them. You can just pass your direction argument on to sunriset, without even looking at it.
Is it useful to ask for a separate date and timezone? When your users give you a POSIXt object, the timezone is included. And it's nice if you can input a string as a date, but that only works if it's in the right format. To keep it simple, I'd just ask for a POSIXct as input (which is what your example data.frame contains).
Why are you making a data.frame and assigning names before returning? As soon as you subset, it all gets dropped again.
Which means your function can be a lot shorter:
sunrise.set2 <- function(lat, lon, date, direction = c("sunrise", "sunset")) {
# reduce the default c("sunrise", "sunset") to a single value
direction <- match.arg(direction)
lat.long <- cbind(lon, lat)
sunriset(lat.long, date, direction=direction, POSIXct.out=TRUE)[,2]
}
If you have no control over your input you might need to add some checks, but usually I find it most useful to keep focused on just the thing you want to accomplish.
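The two lines that break vectorization can be checked in isolation, without maptools. The sketch below uses three of the coordinate pairs from the question and three made-up dates; it only illustrates the shape problem and the seq() error, not the sunrise calculation itself:

```r
lat  <- c(20.666, 20.676, 20.686)
long <- c(-156.449, -156.459, -156.469)
day  <- as.POSIXct(c("2018-08-15", "2018-08-16", "2018-08-17"), tz = "UTC")

# matrix(..., nrow = 1) squeezes all coordinates into a single row,
# while cbind() gives one (long, lat) row per point.
one_row   <- matrix(c(long, lat), nrow = 1)
per_point <- cbind(long, lat)
print(dim(one_row))
print(dim(per_point))

# seq() only accepts a single starting date, so a vector of dates fails.
seq_result <- tryCatch(
  seq(from = day, length.out = 1, by = "days"),
  error = function(e) e
)
seq_failed <- inherits(seq_result, "error")
print(seq_failed)
```

Both problems go away once the coordinates are bound column-wise and the date vector is passed on directly, which is what the rewritten function does.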

how to simply plot similar dates of different years in one plot

I have a dataframe with dates. Here are the first 3 rows with dput:
df.cv <- structure(list(ds = structure(c(1448064000, 1448150400, 1448236800
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), y = c(10.4885204292416,
10.456538985014, 10.4264986311659), yhat = c(10.4851491194439,
10.282089547027, 10.4354960430083), yhat_lower = c(10.4169914076864,
10.2162549984153, 10.368531352493), yhat_upper = c(10.5506038959764,
10.3556867861042, 10.5093092789713), cutoff = structure(c(1447977600,
1447977600, 1447977600), class = c("POSIXct", "POSIXt"), tzone = "UTC")),.Names = c("ds",
"y", "yhat", "yhat_lower", "yhat_upper", "cutoff"), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
I'm trying to plot the data with ggplot + geom_line so that matching day/month combinations from different years share one plot. So, for example, I want the y-value of 2016-01-01 to appear at the same x-value as that of 2017-01-01. I found a way to do this, but it seems to be a very complex workaround:
library(tidyverse)
library(lubridate)
p <- df.cv %>%
mutate(jaar = as.factor(year(ds))) %>%
mutate(x = as_date(as.POSIXct(
ifelse(jaar==2016, ds + years(1), ds),
origin = "1970-01-01")))
ggplot(p %>% filter(jaar!=2015), aes(x=x, group=jaar, color=jaar)) +
geom_line(aes(y=y))
It works, but as you can see I first have to extract the year, then use an ifelse to add one year to only the 2016 dates, convert back to POSIXct (supplying an origin) because ifelse strips the class, and finally drop the time of day with as_date.
Isn't there a simpler, more elegant way to do this?
Use year<- to replace the year with any fixed leap year:
p <- df.cv %>%
mutate(jaar = as.factor(year(ds)),
x = `year<-`(as_date(ds), 2000))
ggplot(p, aes(x = x, y = y, color = jaar)) +
geom_line()
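The same idea can be sketched in base R without lubridate, as a variant of the answer rather than the answer's own code: format() writes a fixed leap year in front of the month and day, and as.Date() parses the result. The three dates below are made up for illustration:

```r
ds <- as.POSIXct(c("2016-01-01", "2017-01-01", "2016-02-29"), tz = "UTC")

p <- data.frame(
  ds   = ds,
  jaar = factor(format(ds, "%Y")),         # original year, used for colour
  x    = as.Date(format(ds, "2000-%m-%d")) # same day and month in year 2000
)
print(p)
```

Using a leap year such as 2000 matters because 29 February then still maps to a valid date.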

R: "Unambiguous format" error when reading with read.zoo and converting with "as.POSIXct"

I am trying to read some data with read.zoo, but I always get this error:
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
The code is:
df_zoo <- read.zoo("mydata.csv",
header = TRUE,
sep = ";",
index = 1:2,
FUN = paste,
FUN2 = as.POSIXct,
format = "%d.%m.%Y %H:%M:%S",
tz = "UTC",
dec = ",")
The first lines of the log are
DATE;TIME_UTC;#1;#2;#3
14.06.2016;12:15:11;TRUE;TRUE;43,2
14.06.2016;12:15:12;TRUE;TRUE;43,3
14.06.2016;12:15:13;TRUE;TRUE;43,3
...
I could change the date format in the CSV, but it should be doable in R, and I don't want to have to change it for every further CSV. Also,
as.POSIXct(paste("14.06.2016","12:15:11"), format = "%d.%m.%Y %H:%M:%S", tz = "UTC")
is working just fine:
[1] "2016-06-14 12:15:11 UTC"
I don't see the problem.
1) read.zoo will automatically paste multi-column indexes together so this will work and does not use FUN= at all. Note that zoo represents the data as a matrix and so in this case the logicals will be coerced to numeric.
library(zoo)
read.zoo("mydata.csv", check.names = FALSE,
header = TRUE, sep = ";", comment = "", dec = ",",
index = 1:2, format = "%d.%m.%Y %H:%M:%S", tz = "")
2) read.zoo can also read data frames so this would work and takes advantage of the fact that many of the needed arguments are default arguments of read.csv2:
d <- read.csv2("mydata.csv", check.names = FALSE)
read.zoo(d, index = 1:2, format = "%d.%m.%Y %H:%M:%S", tz = "")
You only need one FUN argument: a function that pastes the first and second columns together and converts the result:
FUN = function(d,t) as.POSIXct(paste(d,t),format = "%d.%m.%Y %H:%M:%S",tz = "UTC")
So to import your file as zoo object:
df_zoo <- read.zoo("mydata.csv",
header = TRUE,
sep = ";",
index = 1:2,
FUN = function(d,t) as.POSIXct(paste(d,t),format = "%d.%m.%Y %H:%M:%S",tz = "UTC"),
dec = ",")
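The paste-then-parse step that these answers rely on can be tried in base R before zoo is involved. The sketch below reads the three sample rows from the question from an inline string (the csv_text variable is only for illustration) and builds the index by hand:

```r
csv_text <- "DATE;TIME_UTC;#1;#2;#3
14.06.2016;12:15:11;TRUE;TRUE;43,2
14.06.2016;12:15:12;TRUE;TRUE;43,3
14.06.2016;12:15:13;TRUE;TRUE;43,3"

# read.csv2 already defaults to sep = ";" and dec = ","
d <- read.csv2(text = csv_text, check.names = FALSE)

# Paste date and time together, then parse with an explicit format.
idx <- as.POSIXct(paste(d$DATE, d$TIME_UTC),
                  format = "%d.%m.%Y %H:%M:%S", tz = "UTC")
print(idx)
print(d[["#3"]])
```

If idx contains no NA values here, the same format string should be safe to hand to read.zoo.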

How to convert matrix with tick data into xts?

I have the following data, which I am trying to convert into xts format:
> dput(data)
structure(list(50370788L, 50370777L, 50370694L, 50370620L, 50370504L,
620639L, 620639L, 592639L, 592639L, 592639L, "2015-10-24",
"2015-10-24", "2015-09-04", "2015-09-04", "2015-09-04", structure(list(
id = 12544L, symbol = "GBSN", title = "Great Basin Scientific, Inc."), .Names = c("id",
"symbol", "title"), class = "data.frame", row.names = 1L),
structure(list(id = 12544L, symbol = "GBSN", title = "Great Basin Scientific, Inc."), .Names = c("id",
"symbol", "title"), class = "data.frame", row.names = 1L),
structure(list(id = 12544L, symbol = "GBSN", title = "Great Basin Scientific, Inc."), .Names = c("id",
"symbol", "title"), class = "data.frame", row.names = 1L),
structure(list(id = 12544L, symbol = "GBSN", title = "Great Basin Scientific, Inc."), .Names = c("id",
"symbol", "title"), class = "data.frame", row.names = 1L),
structure(list(id = 12544L, symbol = "GBSN", title = "Great Basin Scientific, Inc."), .Names = c("id",
"symbol", "title"), class = "data.frame", row.names = 1L),
"$GBSN Still sticking with my prediction of FDA coming sometime in March..",
"$GBSN Last time I check NASDAQ gave them till sometime in April to get it together or else they'll see pink. Correct me if in wrong?",
"$GBSN time for retailers to get knocked out of the ring with a 25 to 30 % gain",
"$GBSN market cap will end up around 65 million not enough to comply rs takes it to 21 dollars pps 26$ by august",
"$GBSN shorts are going to attack the sell off"), .Dim = c(5L,
5L), .Dimnames = list(c("2016-02-28 16:59:53", "2016-02-28 16:58:58",
"2016-02-28 16:51:36", "2016-02-28 16:46:09", "2016-02-28 16:34:34"
), c("GBSN.Message_ID", "GBSN.User_ID", "GBSN.User_Join_Date",
"GBSN.Message_Symbols", "GBSN.Message_Body")))
I have been trying to use:
Message_series <- xts(zoo(data, format='%Y-%m-%d %H:%M:%S'))
but I get this error:
Error in zoo(data, format = "%Y-%m-%d %H:%M:%S") :
unused argument (format = "%Y-%m-%d %H:%M:%S")
Your matrix is not tidy. Look at the fourth column (data[,4]). zoo, and hence xts, does not support such complicated objects, only a simple matrix in which all elements have the same type.
The first and second columns are OK, but they inherit the list properties of the matrix, so the conversion is not entirely straightforward.
data.mat <- matrix(as.numeric(data[,1:2]), ncol = 2)
colnames(data.mat) <- colnames(data)[1:2]
xts(data.mat, order.by = as.POSIXct(rownames(data)))
The join date can be converted to a number and included:
data.mat <- cbind(data.mat, as.numeric(as.Date(as.character(data[,3]))))
colnames(data.mat) <- colnames(data)[1:3]
data.xts <- xts(data.mat, order.by = as.POSIXct(rownames(data)))
and converted back to a date (the origin is needed in older versions of R):
as.Date(coredata(data.xts['2016-02-28 16:59:53',3]), origin = "1970-01-01")
You can encode the variables id, symbol and title from Message_Symbols in the same way.
I recommend storing Message_Body in a separate object (e.g. a data.frame).
Based on the column names of data, it appears that all of your data is, or could be, of character type. However, data[,4] (GBSN.Message_Symbols) contains a list of data frames, not an atomic vector, so we have to flatten it using rbind. apply is then used to convert each column to a character vector and combine them into a character matrix. The xts object is formed by converting the rownames to POSIX date/time values and using them as the index. The code would look like this:
# flatten list data in column 4 to a data frame
mat4 <- do.call(rbind, data[,4])
# convert all data to character type
data.mat <- apply(cbind(data[,-4], mat4), 2, as.character)
# create xts time series
data.xts <- xts(data.mat, order.by = as.POSIXct(rownames(data)))
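What "a matrix with list properties" means can be seen on a small stand-in object in base R. The two-row matrix below is made up to mimic the structure of data (the column names id and join_date are illustrative); it shows why each column has to be converted to an atomic vector before zoo or xts can use it:

```r
# A matrix whose cells are list elements, like the dput() in the question.
m <- matrix(
  list(50370788L, 50370777L, "2015-10-24", "2015-09-04"),
  nrow = 2,
  dimnames = list(c("2016-02-28 16:59:53", "2016-02-28 16:58:58"),
                  c("id", "join_date"))
)
cell_type <- typeof(m)   # "list", not "integer" or "character"

# Convert each column to an atomic vector of the right type.
ids       <- as.numeric(m[, "id"])
join_date <- as.Date(as.character(m[, "join_date"]))

# The row names hold the timestamps that would become the index.
idx <- as.POSIXct(rownames(m), tz = "UTC")

print(cell_type)
print(ids)
print(join_date)
print(idx)
```

Once every column is atomic and of a single type, the values can be combined into an ordinary matrix and passed to xts together with idx as order.by.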
