Calculate sunrise function that works with dataframe with dplyr::mutate? - r

I am having trouble with a function I wrote when I try to use it inside mutate() to add a new column to a dataframe.
I want to add a column to a dataframe that calculates the sunrise/sunset time for all rows based on existing columns for Latitude, Longitude and Date. The sunrise/sunset calculation is derived from the "sunriset" function from the maptools package.
Below is my function:
library(maptools)
library(tidyverse)
sunrise.set2 <- function (lat, long, date, timezone = "UTC", direction = c("sunrise", "sunset"), num.days = 1)
{
lat.long <- matrix(c(long, lat), nrow = 1)
day <- as.POSIXct(date, tz = timezone)
sequence <- seq(from = day, length.out = num.days, by = "days")
sunrise <- sunriset(lat.long, sequence, direction = "sunrise",
POSIXct = TRUE)
sunset <- sunriset(lat.long, sequence, direction = "sunset",
POSIXct = TRUE)
ss <- data.frame(sunrise, sunset)
ss <- ss[, -c(1, 3)]
colnames(ss) <- c("sunrise", "sunset")
if (direction == "sunrise") {
return(ss[1,1])
} else {
return(ss[1,2])
}
}
When I run the function for a single input I get the expected output:
sunrise.set2(41.2, -73.2, "2018-12-09 07:34:0", timezone="EST",
direction = "sunset", num.days = 1)
[1] "2018-12-09 16:23:46 EST"
However, when I try to do this on a dataframe object to mutate in a new column like so:
df <- df %>%
mutate(set = sunrise.set2(Latitude, Longitude, LocalDateTime, timezone="UTC", num.days = 1, direction = "sunset"))
I get the following error:
Error in mutate_impl(.data, dots) :
Evaluation error: 'from' must be of length 1.
The dput of my df is below. I suspect I'm not doing something right in order to properly vectorize my function but I'm not sure what.
Thanks
dput(df):
structure(list(Latitude = c(20.666, 20.676, 20.686, 20.696, 20.706,
20.716, 20.726, 20.736, 20.746, 20.756, 20.766, 20.776), Longitude = c(-156.449,
-156.459, -156.469, -156.479, -156.489, -156.499, -156.509, -156.519,
-156.529, -156.539, -156.549, -156.559), LocalDateTime = structure(c(1534318440,
1534404840, 1534491240, 1534577640, 1534664040, 1534750440, 1534836840,
1534923240, 1535009640, 1535096040, 1535182440, 1535268840), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), .Names = c("Latitude", "Longitude",
"LocalDateTime"), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"), spec = structure(list(cols = structure(list(
Latitude = structure(list(), class = c("collector_double",
"collector")), Longitude = structure(list(), class = c("collector_double",
"collector")), LocalDateTime = structure(list(format = "%m/%d/%Y %H:%M"), .Names = "format", class = c("collector_datetime",
"collector"))), .Names = c("Latitude", "Longitude", "LocalDateTime"
)), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

The problem is indeed that your function, as it stands, is not vectorized: it breaks if you give it more than one value. A workaround (as Suliman suggested) is using rowwise() or a variant of apply, but that would give your function a lot of unnecessary work.
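To see where the error message comes from: seq() on date-times accepts only a single start value, so as soon as mutate() hands the function a whole column, the seq() call inside it fails. A minimal base-R sketch (the two timestamps are made up for illustration):

```r
# Two illustrative timestamps standing in for a LocalDateTime column
days <- as.POSIXct(c("2018-08-15 07:34:00", "2018-08-16 07:34:00"), tz = "UTC")

# A single start value works
one <- seq(from = days[1], length.out = 1, by = "days")

# A vector of start values raises the error reported in the question;
# tryCatch() captures the message instead of stopping the script
msg <- tryCatch(
  seq(from = days, length.out = 1, by = "days"),
  error = function(e) conditionMessage(e)
)
print(msg)
```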
So it is better to make it vectorized, as maptools::sunriset is also vectorized. First suggestion: debug or rewrite it with vectors as input, and then you easily see the lines where something unexpected happens. Let's go through it line by line; I've commented out your lines where I replace them with something else:
library(maptools)
library(tidyverse)
# sunrise.set2 <- function (lat, long, date, timezone = "UTC", direction = c("sunrise", "sunset"), num.days = 1)
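sunrise.set2 <- function (lat, long, date, timezone = "UTC", direction = c("sunrise", "sunset"))
# Why an argument saying how many days? You have the length of your dates
{
direction <- match.arg(direction) # resolve to a single value so the if() below is safe
#lat.long <- matrix(c(long, lat), nrow = 1)
lat.long <- cbind(long, lat) # one row per observation, longitude first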
day <- as.POSIXct(date, tz = timezone)
# sequence <- seq(from = day, length.out = num.days, by = "days") # Your days object is fine
sunrise <- sunriset(lat.long, day, direction = "sunrise",
POSIXct = TRUE)
sunset <- sunriset(lat.long, day, direction = "sunset",
POSIXct = TRUE)
# I've replaced sequence with day here
ss <- data.frame(sunrise, sunset)
ss <- ss[, -c(1, 3)]
colnames(ss) <- c("sunrise", "sunset")
if (direction == "sunrise") {
#return(ss[1,1])
return(ss[,1])
} else {
#return(ss[1,2])
return(ss[,2])
}
}
But looking at your function, I think there is still a lot of extra work done that doesn't serve any purpose.
You're calculating both sunrise and sunset, only to use one of them. And you can just pass on your direction argument, without even looking at it.
Is it useful to ask for a separate date and timezone? When your users give you a POSIXt object, the timezone is included. And it's nice if you can input a string as a date, but that only works if it's in the right format. To keep it simple, I'd just ask for a POSIXct as input (which is what your example data.frame contains).
Why are you making a data.frame and assigning names before returning? As soon as you're subsetting, it all gets dropped again.
Which means your function can be a lot shorter:
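sunrise.set2 <- function(lat, lon, date, direction = c("sunrise", "sunset")) {
direction <- match.arg(direction) # a single value; "sunrise" when the default is used
lat.long <- cbind(lon, lat) # one row per observation, longitude first
sunriset(lat.long, date, direction=direction, POSIXct.out=TRUE)[,2] # column 2 holds the time
}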
If you have no control over your input you might need to add some checks, but usually I find it most useful to keep focused on just the thing you want to accomplish.
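As an illustration of what such checks might look like, here is a small sketch in base R. The helper name check_sun_inputs and the specific rules (equal lengths, coordinate ranges) are assumptions made for this example; they are not part of maptools.

```r
# Hypothetical helper, not part of maptools: validates the inputs
# and returns the resolved direction.
check_sun_inputs <- function(lat, lon, date, direction = c("sunrise", "sunset")) {
  # Errors on anything other than the two allowed choices
  direction <- match.arg(direction)
  stopifnot(
    is.numeric(lat), is.numeric(lon),
    inherits(date, "POSIXct"),          # require a date-time, not a string
    length(lat) == length(lon),
    length(lat) == length(date)
  )
  # Assumed plausibility limits for coordinates in degrees
  stopifnot(all(abs(lat) <= 90), all(abs(lon) <= 180))
  direction
}

d <- as.POSIXct("2018-08-15 07:34:00", tz = "UTC")
print(check_sun_inputs(20.666, -156.449, d, direction = "sunset"))
```

Calling the helper at the top of the function keeps the actual calculation as short as before.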

Related

Error: Column named '1640233800' cannot be found in 'df'

I'm using R and moveVis package of R to do some movement visualization. Below is the csv from where I import the data using read.csv
I'm having trouble converting the data.frame to moveStack using df2move
trackId,x,y,time,x1,x2,optional,sensor,timestamps
A34,19.00094708496841,72.8264388198447,2021-12-23 10:00:00,19.00094708496841,72.8264388198447,FALSE,unknown,2021-12-23 10:00:00
A34,18.986663359819435,72.84012881354482,2021-12-23 10:02:00,18.986663359819435,72.84012881354482,FALSE,unknown,2021-12-23 10:02:00
raw_data <- read.csv("mdata2.csv", header = TRUE)
m <- df2move(raw_data, proj = "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs", x = "x1", y = "x2", time = as.POSIXct(raw_data$timestamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"), track_id = "trackId")
Getting this error on running above code
Error: Column named '1640233800' cannot be found in 'df'
The problem is with your time argument. The time format in your dataset and the format you specify in your code do not match, which is why you are getting an error.
If you are using Excel, note that it reformats timestamps to its own default, so you may need to change that first.
So please check the format in your csv against what you specify in your code. You can change the format in Excel by selecting the timestamp values and pressing Ctrl + 1.
All you need is this:
raw_data$timestamps <- as.POSIXct(raw_data$timestamps, format = "%Y-%m-%d %H:%M", tz = "UTC")
m <- df2move(raw_data, proj = "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs", x = "x1", y = "x2", time = "timestamps", track_id = "trackId")
You have to pass a character string (the column name) as time in df2move. Therefore, you have to do the conversion before calling the function (as @Vishal A. suggested as well). However, the conversion of the timestamps to class POSIXct was not correct, so NAs were introduced. See the solution:
raw_data <- structure(list(trackId = c("vipin", "vipin"), x = c(72.8409492130316, 72.8363572715711), y = c(18.9968003664781, 18.9958569245008), time = c("2021-12-23 10:00:00", "2021-12-23 10:02:00"), x1 = c(72.8409492130316, 72.8363572715711), x2 = c(18.9968003664781, 18.9958569245008 ), optional = c(FALSE, FALSE), sensor = c("unknown", "unknown" ), timestamps = structure(c(NA_real_, NA_real_), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA, -2L), class = "data.frame")
raw_data$timestamps <- as.POSIXct(raw_data$time, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
m <- moveVis::df2move(raw_data, proj = "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs", x = "x1", y = "x2", time = "timestamps", track_id = "trackId")
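A quick way to catch this before calling df2move is to look for NAs after parsing. The sketch below uses base R only; the mismatched format string is a made-up example of a format that does not fit the data:

```r
stamp <- "2021-12-23 10:00:00"

# Matching format: parses to a valid POSIXct
good <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Mismatched format (day/month/year with slashes): parsing fails and yields NA
bad <- as.POSIXct(stamp, format = "%d/%m/%Y %H:%M", tz = "UTC")

print(good)
print(is.na(bad))
```

Running anyNA() on the converted column gives the same check for a whole data frame.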

Casting data correctly in R using the grep function

I'm trying to reshape my data based on the value in a particular column (i.e. "up" and "down"). The Up and Down rows are not in the same order in the data frame, so I'm having difficulty "casting" the data into the right shape.
I've tried using the cast function to shift the data, but I can't get the answers to work in a consistent (i.e. accurate) fashion.
This is my input:
input = structure(list(X = 1:6, Report = c("Sales.csv", "Sales.csv",
"Sales.csv", "Sales.csv", "Sales.csv", "Sales.csv"), Shock = c("Currencies.USD_Up",
"Currencies.USD_Down", "Currencies.AUD_Up", "Currencies.AUD_Down",
"Currencies.EUR_Down", "Currencies.EUR_Up"), Result = c(-519375.9816,
-7388851.423, -42950.77683, -667.367063, -12819532.15, -138054.0061
), FX = c("USD", "USD", "AUD", "AUD", "EUR", "EUR")), class = "data.frame", row.names = c(NA,
-6L))
and this is my preferred output:
output = structure(list(X = 1:3, Report = c("Sales.csv", "Sales.csv",
"Sales.csv"), Shock = c("Currencies.USD", "Currencies.AUD", "Currencies.EUR"
), Currency = c("USD", "AUD", "EUR"), Up = c(-519375.9816, -42950.77683,
-138054.0061), Down = c(-7388851.423, -667.367063, -12819532.15
)), class = "data.frame", row.names = c(NA, -3L))
Because the EUR data in the input is in a different order, I can't seem to make the data shape correctly. I've tried using the grep function to order this, but I can't make this work. Can anyone suggest a better way?
This is a tidyverse approach to do it:
library(dplyr)
library(tidyr)
library(tibble)
input %>%
as_tibble() %>%
separate(Shock, c("Shock", "tmp"), sep = "_") %>%
rename(Currency = FX) %>%
select(-X) %>%
spread(tmp, Result) %>%
mutate(X = row_number()) %>%
select(X, Report, Shock, Currency, Up, Down)
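In newer versions of tidyr, spread() has been superseded by pivot_wider(). Here is a sketch of the same reshaping with pivot_wider(), assuming tidyr 1.0.0 or later. The input data frame from the question is repeated so the block runs on its own, and the intermediate column name Direction is arbitrary:

```r
library(dplyr)
library(tidyr)

# Input data from the question
input <- data.frame(
  X = 1:6,
  Report = rep("Sales.csv", 6),
  Shock = c("Currencies.USD_Up", "Currencies.USD_Down", "Currencies.AUD_Up",
            "Currencies.AUD_Down", "Currencies.EUR_Down", "Currencies.EUR_Up"),
  Result = c(-519375.9816, -7388851.423, -42950.77683,
             -667.367063, -12819532.15, -138054.0061),
  FX = c("USD", "USD", "AUD", "AUD", "EUR", "EUR")
)

out <- input %>%
  # Split "Currencies.USD_Up" into "Currencies.USD" and "Up"
  separate(Shock, c("Shock", "Direction"), sep = "_") %>%
  select(-X) %>%
  rename(Currency = FX) %>%
  # One column per direction; the row order of Up/Down in the input does not matter
  pivot_wider(names_from = Direction, values_from = Result) %>%
  mutate(X = row_number()) %>%
  select(X, Report, Shock, Currency, Up, Down)

print(out)
```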

how to simply plot similar dates of different years in one plot

I have a dataframe with dates. Here are the first 3 rows with dput:
df.cv <- structure(list(ds = structure(c(1448064000, 1448150400, 1448236800
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), y = c(10.4885204292416,
10.456538985014, 10.4264986311659), yhat = c(10.4851491194439,
10.282089547027, 10.4354960430083), yhat_lower = c(10.4169914076864,
10.2162549984153, 10.368531352493), yhat_upper = c(10.5506038959764,
10.3556867861042, 10.5093092789713), cutoff = structure(c(1447977600,
1447977600, 1447977600), class = c("POSIXct", "POSIXt"), tzone = "UTC")),.Names = c("ds",
"y", "yhat", "yhat_lower", "yhat_upper", "cutoff"), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
I'm trying to plot the data with ggplot + geom_line from similar day/month combinations in one plot. So, for example, I want the y-value of 2016-01-01 to appear on the same x-value as 2017-01-01. I found a way to do this, but it seems to be a very complex workaround:
library(tidyverse)
library(lubridate)
p <- df.cv %>%
mutate(jaar = as.factor(year(ds))) %>%
mutate(x = as_date(as.POSIXct(
ifelse(jaar==2016, ds + years(1), ds),
origin = "1970-01-01")))
ggplot(p %>% filter(jaar!=2015), aes(x=x, group=jaar, color=jaar)) +
geom_line(aes(y=y))
It works, but as you can see I first have to extract the year, then use an ifelse to add one year to only the 2016 dates, convert back into POSIXct while supplying an origin because ifelse strips the class, and finally remove the time of day with as_date.
Isn't there a simpler, more elegant way to do this?
Use year<- to replace the year with any fixed leap year:
p <- df.cv %>%
mutate(jaar = as.factor(year(ds)),
x = `year<-`(as_date(ds), 2000))
ggplot(p, aes(x = x, y = y, color = jaar)) +
geom_line()
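Why a leap year? With a non-leap target year, 29 February has nowhere to go. The sketch below illustrates this in base R (a swapped-in technique, not the lubridate call from the answer) by rebuilding each date with a fixed year via format(); the three dates are made up:

```r
# Illustrative dates, including a leap day
ds <- as.POSIXct(c("2015-11-21", "2016-02-29", "2017-02-28"), tz = "UTC")

# Replace the year by a fixed leap year: every day/month combination is valid
leap <- as.Date(format(ds, "2000-%m-%d"), format = "%Y-%m-%d")

# Replace the year by a non-leap year: 29 February cannot be parsed and becomes NA
nonleap <- as.Date(format(ds, "2001-%m-%d"), format = "%Y-%m-%d")

print(leap)
print(nonleap)
```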

Using literal month names with year in ramcharts

Here is my code to generate a barplot using rAmCharts:
library(rAmCharts)
amBarplot(x = "month", y = "value", data = dataset,
dataDateFormat = "MM/YYYY", minPeriod = "MM",
show_values = FALSE, labelRotation = -90, depth = 0.1)
However, is there a way to use month names & year in my x axis? I am trying to use MMM-YY formats.
Sample dataset,
structure(list(value = c(11544, 9588, 9411, 10365, 11154, 12688
), month = c("05/2012", "06/2012", "07/2012", "08/2012", "09/2012",
"10/2012")), .Names = c("value", "month"), row.names = c(NA,
6L), class = "data.frame")
Thanks.
It appears that rAmCharts doesn't expose AmCharts' dateFormats setting in the categoryAxis, so you have to access it through the init event and create your own dateFormats array with a modified format string for the MM period. I'm not very experienced with R, but here's how I managed to make it work using R 3.4.2 and rAmCharts 2.1.5:
chart <- amBarplot( ... settings omitted ... )
addListener(.Object = chart,
name = 'init',
expression = paste(
"function(e) {",
"e.chart.categoryAxis.dateFormats = ",
'[{"period":"fff","format":"JJ:NN:SS"},{"period":"ss","format":"JJ:NN:SS"},',
'{"period":"mm","format":"JJ:NN"},{"period":"hh","format":"JJ:NN"},{"period":"DD","format":"MMM DD"},',
'{"period":"WW","format":"MMM DD"},',
'{"period":"MM","format":"MMM-YY"},', # add YY to the default MM format
'{"period":"YYYY","format":"YYYY"}]; ',
'e.chart.validateData();',
"}")
)
Here is a different solution:
library(rAmCharts)
dataset <- structure(list(value = c(11544, 9588, 9411, 10365, 11154, 12688
), month = c("05/2012", "06/2012", "07/2012", "08/2012", "09/2012",
"10/2012")), .Names = c("value", "month"), row.names = c(NA,
6L), class = "data.frame")
dataset$month <- as.character(
format(
as.Date(paste0("01/",dataset$month), "%d/%m/%Y"),
"%B %Y"))
amBarplot(x = "month", y = "value", data = dataset,
show_values = FALSE, labelRotation = -90, depth = 0.1)
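If the abbreviated MMM-YY style from the question is preferred over full month names, the same conversion can use "%b-%y" instead of "%B %Y". A small base-R sketch; note that month names depend on the session locale, and that the resulting labels are plain character strings rather than dates:

```r
month <- c("05/2012", "06/2012", "07/2012")

# Prepend a day so the strings can be parsed as full dates
as_dates <- as.Date(paste0("01/", month), format = "%d/%m/%Y")

# %b = abbreviated month name (locale dependent), %y = two-digit year
labels <- format(as_dates, "%b-%y")
print(labels)
```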

R - count of items in line chart: match DateTime to count of items

I have a dataframe with the following structure:
df <- structure(list(Name = structure(1:9, .Label = c("task 1", "task 2",
"task 3", "task 4", "task 5", "task 6", "task 7", "task 8", "task 9"
), class = "factor"), Start = structure(c(1479799800, 1479800100,
1479800400, 1479800700, 1479801000, 1479801300, 1479801600, 1479801900,
1479802200), class = c("POSIXct", "POSIXt"), tzone = ""), End = structure(c(1479801072,
1479800892, 1479801492, 1479802092, 1479802692, 1479803292, 1479803892,
1479804492, 1479805092), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("Name",
"Start", "End"), row.names = c(NA, -9L), class = "data.frame")
Now I want to count the items in column "Name" over time. They all have start and end datetimes, which are formatted as POSIXct.
With help of this solution here on SO I was able to do so (or at least I think I was) with following code:
library(data.table)
setDT(df)
dates = seq(min(df$Start), max(df$End), by = "min")
lookup = data.table(Start = dates, End = dates, key = c("Start", "End"))
ans = foverlaps(df, lookup, type = "any", which = TRUE)
library(ggplot2)
ggplot(ans[, .N, by = yid], aes(x = yid, y = N)) + geom_line()
Problem now:
How do I match my DateTime-scale to those integer values on the x-axis? Or is there a faster and better solution to solve my problem?
I tried to use x = as.POSIXct(yid, format = "%Y-%m-%dT%H:%M:%S", origin = min(df$Start)) within the aes of the ggplot(). But that didn't work.
EDIT:
When using the solution for this problem, I face another. Minutes where there is no count are drawn with the count of the latest countable item in the plot. This is why we have to merge (left join) the table with the counts (ans) again with a complete sequence of all datetimes and put a 0 for every NA. That way we get explicit values for every necessary data point.
Like this:
# The part we use to count and match the right times
df1 <- ans[, .N, by = yid] %>%
mutate(time = min(df$Start) + minutes(yid - 1)) # yid is 1-based: yid 1 is the first minute
# The part where we use the sequence from the beginning for a LEFT JOIN with the counting dataframe
df2 <- data.frame(time = dates)
dt <- merge(x = df2, y = df1, by = "time", all.x = TRUE)
dt[is.na(dt)] <- 0
In the tidyverse framework, this is a slightly different task:
1. Generate the same dates variable you have.
2. Construct a data frame with all tasks and all times (cartesian join).
3. Filter out the rows that are not in the interval for each task.
4. Add up the tasks that remain for each minute.
5. Plot.
That looks something like this --
library(tidyverse)
library(lubridate)
dates = seq(min(df$Start), max(df$End), by = "min")
df %>%
mutate(key = 1) %>%
left_join(tibble(key = 1, times = dates)) %>% # tibble() replaces the deprecated data_frame()
mutate(include = times %within% interval(Start, End)) %>%
filter(include) %>%
group_by(times) %>%
summarise(count = n()) %>%
ggplot(aes(times, count)) +
geom_line()
#> Joining, by = "key"
If you need it to be faster, it will almost certainly be faster using your original data.table code.
Consider this.
library(data.table)
setDT(df)
dates = seq(min(df$Start), max(df$End), by = "min")
lookup = data.table(Start = dates, End = dates, key = c("Start", "End"))
ans = foverlaps(df, lookup, type = "any", which = TRUE)
ans[, .N, by = yid] %>%
mutate(time = min(df$Start) + minutes(yid - 1)) %>% # yid is 1-based: yid 1 is the first minute
ggplot(aes(time, N)) +
geom_line()
Now we use data.table to calculate the overlap, and then index time off the starting minute. Once we add a new column with the times, we can plot.
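As a cross-check that does not depend on data.table or the tidyverse, the same per-minute count can be computed in base R. This is a simple sketch on a made-up two-task data frame (tasks), not the approach from the answers above. It is slower on large data, but it returns an explicit 0 for minutes without any running task, which is the issue raised in the question's edit:

```r
# Made-up example: two tasks with a gap between them
tasks <- data.frame(
  Name  = c("task 1", "task 2"),
  Start = as.POSIXct(c("2016-11-22 08:30:00", "2016-11-22 08:35:00"), tz = "UTC"),
  End   = as.POSIXct(c("2016-11-22 08:32:00", "2016-11-22 08:36:00"), tz = "UTC")
)

# One entry per minute between the first start and the last end
dates <- seq(min(tasks$Start), max(tasks$End), by = "min")

# For each minute, count the tasks whose interval contains it
counts <- vapply(
  seq_along(dates),
  function(i) sum(tasks$Start <= dates[i] & tasks$End >= dates[i]),
  integer(1)
)

result <- data.frame(time = dates, N = counts)
print(result)
```

The time column is already POSIXct, so result can be passed straight to ggplot(aes(time, N)) + geom_line().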
