Finding maximum or minimum date value for each individual

I have a dataframe in a wide format in R, denoting different visit dates for each individual (visitdate1, visitdate2, visitdate3, etc.). I'm trying to find the latest date for each individual and save it as a new column, but this doesn't seem to be working.
I checked the class of the dataframe and each visitdate is already recognized as a Date, so I don't know why the code is not working.
This is the code I tried:
df1$latestdate <- pmax(as_date(df1$visitdate1), as_date(df1$visitdate2),
as_date(df1$visitdate3))
The error I'm getting is the following:
Error in as.Date.default(x, ...) :
do not know how to convert 'x' to class “Date”
The problem is that I'm asking R to find the maximum date value per row, not to convert any date (as it's already a date).
However, even when I leave as_date out of the code, I get this error:
replacement has 0 rows, data has 120.
Any insight that might help? Thanks in advance! Btw, I'm new to R. :)

Below I provide an example, kind of guessing what your data looks like. pmax may not be the best thing for this.
DATES = seq(as.Date('2011-01-01'),as.Date('2017-01-01'),"months")
df = data.frame(id=1:10,
visitdate1 = sample(DATES,10),
visitdate2 = sample(DATES,10),
visitdate3 = sample(DATES,10)
)
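# Columns over which to take the row-wise maximum
COLUMNS = c("visitdate1","visitdate2","visitdate3")
# apply() coerces the Date columns to character, so convert the result back to Date
df$latestdate = as.Date(apply(df[,COLUMNS],1,max))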
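For comparison, pmax itself also works on Date vectors, so a sketch along the following lines keeps the Date class without going through apply. The toy data frame and its values are assumptions made for illustration. The errors in the question (as_date failing, then a replacement with 0 rows) would be consistent with df1$visitdate1 and the other columns evaluating to NULL, for example because the real column names are spelled differently, but that is only a guess, which is why the sketch checks the names first.

```r
# Hypothetical data: three visit dates per individual, one of them missing
df <- data.frame(
  id = 1:3,
  visitdate1 = as.Date(c("2015-01-10", "2016-03-05", "2014-07-01")),
  visitdate2 = as.Date(c("2015-06-20", "2016-01-15", NA)),
  visitdate3 = as.Date(c("2015-02-01", "2016-09-30", "2014-12-25"))
)
COLUMNS <- c("visitdate1", "visitdate2", "visitdate3")

# Check that every expected column really exists before using it
stopifnot(all(COLUMNS %in% names(df)))

# pmax/pmin compare element-wise across the vectors;
# na.rm = TRUE skips missing visits instead of returning NA
df$latestdate   <- do.call(pmax, c(as.list(df[COLUMNS]), list(na.rm = TRUE)))
df$earliestdate <- do.call(pmin, c(as.list(df[COLUMNS]), list(na.rm = TRUE)))

class(df$latestdate)  # "Date"
```

Using do.call with a vector of column names means the same line works whether there are three visit columns or thirty.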

Related

Executing getSunlightTimes function in R with data frame?

I am hoping that you can support me with the use of the getSunlightTimes function. I have a pixel level data frame ("latlon2") with latitude ("lat"), longitude ("lon"), and one date ("date") in format YYYY-MM-DD. The data covers the continental US, and I also have a state code variable in the data frame.
To obtain the date variable as a date class variable, I executed:
latlon2$date=as.Date(latlon2$d2003s)
I am trying to use the getSunlightTimes to identify the time of sunrise and sunset for each pixel on the designated date. However, I am having a hard time getting the function to work. There is not a lot of information on this command beyond R's help guides, so I am hoping some of you have worked with it and can offer your suggestions based on my approach so far.
First I tried using the getSunlightTimes function designating each latitude/longitude/date column in my data frame
sunrise2003CET=getSunlightTimes(date="latlon2$date", lat="latlon2$lat", lon="latlon2$lon", tz="CET", keep = c("sunrise", "sunset"))
R returns the error:
Error in getSunlightTimes(date = "latlon2$date2", lat = "latlon2$lat",
: date must to be a Date object (class Date)
What's frustrating about this is that when I look at class(latlon2$date) R verifies that the column is a "Date" class!
Next, I tried designating the data frame only:
sunrise2003CET=getSunlightTimes(data="latlon2", tz="CET", keep = c("sunrise", "sunset"))
R returns the error:
Error in .buildData(date = date, lat = lat, lon = lon, data = data) :
all(c("date", "lat", "lon") %in% colnames(data)) is not TRUE
This seems odd because I named the columns in the dataframe "date", "lat", "lon", but perhaps the error is due to the fact that there are other variables in the data frame (such as state code).
I am trying to perform this task for several dates across 15 years (and four time zones), so any suggestions on how to get this running, and also running efficiently, are much appreciated!
Thank you so much!
Colette

The problem is with the quotes. When you write
sunrise2003CET=getSunlightTimes(date="latlon2$date",
lat="latlon2$lat",
lon="latlon2$lon",
tz="CET",
keep = c("sunrise", "sunset"))
you shouldn't put the expressions for the date, lat and lon arguments in quotes, because then R will see them as strings. (You could try class("latlon2$date") to see this.) Just write it as
sunrise2003CET=getSunlightTimes(date=latlon2$date,
lat=latlon2$lat,
lon=latlon2$lon,
tz="CET",
keep = c("sunrise", "sunset"))
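The effect of the quotes can be checked in base R without the sunlight function at all. The sketch below uses a small made-up stand-in for latlon2 (the values are assumptions). It also shows why the second attempt with data="latlon2" fails: a quoted name is a string, and a string has no column names, so the check for "date", "lat" and "lon" cannot pass.

```r
# Hypothetical stand-in for the latlon2 data frame from the question
latlon2 <- data.frame(
  date  = as.Date(c("2003-06-01", "2003-06-01")),
  lat   = c(40.7, 34.1),
  lon   = c(-74.0, -118.2),
  state = c("NY", "CA")
)

quoted   <- "latlon2$date"  # just a piece of text
unquoted <- latlon2$date    # the actual column

class(quoted)    # "character"
class(unquoted)  # "Date"

# A quoted data frame name is also only a string, so it has no columns
colnames("latlon2")  # NULL

# Pass the object itself; keeping only the needed columns avoids
# surprises from extra variables such as the state code
coords <- latlon2[, c("date", "lat", "lon")]
```

Passing data = coords (unquoted) would be the analogous fix for the second attempt. Whether the function accepts whole vectors for lat and lon, or expects multiple coordinates to go through data, is worth checking in its help page.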

Integers change its values generating time series from dataframe in R

I have a list with dataframes inside it like this:
x = data.frame("city" = c("Madrid","Madrid","Madrid","Madrid"),
"date" = c('2018-11-01','2018-11-02','2018-11-03','2018-11-04'),
"visits" = c(100,200,80,38), "temp"=c(20,10,17,16))
list_of_cities= split(x, x$city) #In my original df there are a lot of cities
Then, to create a time series object (ts), I follow the next process:
library(dplyr) # provides select()
madrid_data = select(list_of_cities[['Madrid']],date,visits,temp)
madrid = ts(madrid_data[,2:3], start = c(2018,305), frequency = 365)
In this example, the problem I have does not arise. However, with my original dataframe the integer values come out changed in the resulting ts object.
How could I solve it? Thank you very much in advance

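The problem comes from the column type integer64 (provided by the bit64 package). Most likely ts does not handle that class and misreads the stored values. Converting the column from integer64 to numeric before building the time series solves it: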
x$visits = as.numeric(x$visits)

csv to frequency polygon using R or python

I have a result.csv file which contains information in the following format:
date,tweets
2015-06-15,tweet
2015-06-15,tweet
2015-06-12,tweet
2015-06-11,tweet
2015-06-11,tweet
2015-06-11,tweet
2015-06-08,tweet
2015-06-08,tweet
I want to plot a frequency polygon with the dates on the x axis and the number of entries for each date on the y axis.
I have tried the following code:
pf<-read.csv("result.csv")
library(ggplot2)
qplot(datetime, data =pf, geom = "freqpoly")
but it shows the following error :
geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
Can anyone tell me how to solve this problem? I am totally new to R, so any kind of guidance will be of great help to me.

Your issue is that you are trying to treat datetime as continuous, but it has been imported as a factor (discrete/categorical). Let's convert it to a Date object and then things should work:
pf$datetime = as.Date(pf$datetime)
qplot(datetime, data =pf, geom = "freqpoly")

Based on your code, I assume that the result.csv has a header: datetime, atweet. By default, read.csv takes the first line of the CSV file as column names. That means you will be able to access the two columns with pf$datetime and pf$atweet.
If you look at the documentation of read.csv, you will find that stringsAsFactors = default.stringsAsFactors(), which was TRUE in versions of R before 4.0.0. That is, the strings from CSV files are converted to factors. (From R 4.0.0 onward the default is FALSE.)
Now, even if you change the value of stringsAsFactors, you still get the same error. That is because ggplot does not know how to order the dates, as it does not recognize the strings as dates.
To transform the strings into actual date-time values, you can use strptime.
Here is the working example:
pf<-read.csv("result.csv", stringsAsFactors=FALSE)
library(ggplot2)
qplot(strptime(pf$datetime, "%Y-%m-%d"), data=pf, geom='freqpoly')
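What the frequency polygon ultimately shows here is the number of rows per date, so it can help to compute those counts explicitly. Below is a minimal base R sketch, assuming the column is called date as in the sample file; the inline text stands in for result.csv and the variable names are made up for illustration.

```r
# Inline copy of the sample data; in practice this would be read.csv("result.csv")
csv_text <- "date,tweets
2015-06-15,tweet
2015-06-15,tweet
2015-06-12,tweet
2015-06-11,tweet
2015-06-11,tweet
2015-06-11,tweet
2015-06-08,tweet
2015-06-08,tweet"
pf <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Convert the text column to Date so the x axis is ordered in time
pf$date <- as.Date(pf$date, format = "%Y-%m-%d")

# Count the entries for each date
counts <- as.data.frame(table(date = pf$date), stringsAsFactors = FALSE)
counts$date <- as.Date(counts$date)

# Draw the counts as points joined by lines
plot(counts$date, counts$Freq, type = "b",
     xlab = "date", ylab = "number of tweets")
```

Having the counts in their own data frame also makes it easy to inspect them before plotting.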

Reading CSV file in R and formatting dates and time while reading and avoiding missing values marked as?

I am trying to read a CSV file in R. How can I format the dates and times while reading, and handle missing values that are marked as "?"? The data I load after reading should be clean.
I tried something like
data <- read.csv("Data.txt")
It worked, but the dates and times were as is.
Also how can I extract a subset of data from specific data range?
For this I tried something like
subdata <- subset(data,
Date== 01/02/2007 & Date==02/02/2007,
select = Date:Sub_metering_3)
I get error Error in eval(expr, envir, enclos) : object 'Date' not found
Date is the first column.

The functions read.csv() and read.table() are not set up to do detailed fancy conversion of things like dates that can have many formats. When these functions don't automatically do what's wanted, I find it best to read the data in as text and then convert variables after the fact.
data <- read.csv("Data.txt",colClasses="character",na.strings="?")
data$FixedDate <- as.Date(data$Date,format="%Y/%m/%d")
or whatever your date format is. The variable FixedDate will then be of type Date and you can use equality and other conditions to subset.
Also, in your example code you are putting 01/02/2007 as bare code, which results in dividing 1 by 2 and then by 2007 yielding 0.0002491281, rather than inserting a meaningful date. Consider as.Date("2007-01-02") instead.
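Putting the pieces together, here is a small sketch of reading, converting and then subsetting by a date range. The inline text stands in for Data.txt, and the column names and the dd/mm/yyyy format are assumptions, so adjust the format string to the real file. Note that a condition of the form Date == A & Date == B can never be true for a single row; a range needs >= and <=.

```r
# Inline stand-in for Data.txt (hypothetical rows)
csv_text <- "Date,Time,Sub_metering_3
31/01/2007,23:59:00,17
01/02/2007,00:00:00,?
01/02/2007,00:01:00,18
02/02/2007,00:00:00,16
03/02/2007,00:00:00,15"

# Read everything as text and treat "?" as a missing value
data <- read.csv(text = csv_text, colClasses = "character", na.strings = "?")

# Convert the columns after the fact
data$FixedDate <- as.Date(data$Date, format = "%d/%m/%Y")
data$Sub_metering_3 <- as.numeric(data$Sub_metering_3)

# Keep the rows whose date falls inside the range, both ends included
subdata <- subset(data,
                  FixedDate >= as.Date("2007-02-01") &
                    FixedDate <= as.Date("2007-02-02"))
```

Rows with a missing date would be dropped by subset, since a comparison against NA is not TRUE.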

In R cannot use AdjustedSharpeRatio() from 'Performance Analytics'

I am having trouble using the function AdjustedSharpeRatio() from the package PerformanceAnalytics. The following code sample in R 3.0.0:
library(PerformanceAnalytics)
logrets = array(dim=c(3,2),c(1,2,3,4,5,6))
weights = c(0.4,0.6)
AdjustedSharpeRatio(rowSums(weights*logrets),0.01)
gives the following error:
Error in checkData(R) :
The data cannot be converted into a time series. If you are trying to pass in
names from a data object with one column, you should use the form 'data[rows,
columns, drop = FALSE]'. Rownames should have standard date formats, such as
'1985-03-15'.
Replacing the last line with zoo gives the same error:
AdjustedSharpeRatio(zoo(rowSums(weights*logrets)),0.01)
Am I missing something obvious ?

Hmm... not too sure what you are trying to achieve with the logrets and weights objects there, but if logrets are already in percentages, then maybe something like this:
AdjustedSharpeRatio(xts(rowSums(weights*logrets)/100,Sys.Date()-(c(3:1)*365)), Rf=0.01)

This might work:
a <- rowSums(weights*logrets)
names(a) <- c('1985-03-15', '1985-03-16', '1985-03-17')
AdjustedSharpeRatio(a,0.01)
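On the logrets and weights objects themselves: weights*logrets recycles the two weights down the columns of the 3x2 matrix rather than applying one weight per column. If the intent is a weighted sum per row (an assumption about what the asker wants), a matrix product does that. The sketch uses only base R, and the names recycled and weighted are made up for illustration.

```r
logrets <- array(dim = c(3, 2), c(1, 2, 3, 4, 5, 6))
weights <- c(0.4, 0.6)

# Element-wise product: the weights are recycled in column-major order,
# so they alternate down each column instead of matching the columns
recycled <- rowSums(weights * logrets)

# Matrix product: column 1 times 0.4 plus column 2 times 0.6, for each row
weighted <- as.vector(logrets %*% weights)

# The two results differ in the second row
rbind(recycled, weighted)
```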
