persnr date
411223-6213 2011-01-19
420211-6911 2012-01-19
420604-7716 2007-09-01
430404-8558 2011-09-01
431030-7030 2011-09-01
440127-0055 2012-09-01
I want to create a new column indicating whether the 10th digit of persnr is odd or even.
The new column should be true or false depending on whether the 10th digit of persnr is odd or even: odd=true, even=false.
I would also like to create another column for 'date', so that, for example, 2011-09-01 is fall and in the new column fall=true,
while 2012-01-19 is spring and in the new column spring=false.
This is certainly basic, but I am new to R and may not be going about it the right way.
You can try substr. I'm not sure whether you count the - character as well. If you do,
v1 <- as.numeric(substr(df1$persnr,10,10))
Otherwise, replace 10 with 11, as in @nico's post.
df1$newCol <- as.logical(v1%%2)
I would prefer to keep it as a logical column, but if you need to change it to 'odd' or 'even':
df1$newCol <- c('even', 'odd')[df1$newCol+1L]
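If it is unclear whether the hyphen counts toward the position, one option is to drop every non-digit first and then take the 10th character. This is only a minimal sketch, assuming persnr is a character vector in the format shown in the question; the names digits, tenth and is_odd are just for illustration.

```r
persnr <- c("411223-6213", "420211-6911", "420604-7716")

# Remove everything that is not a digit, so position 10 is the 10th digit
digits <- gsub("[^0-9]", "", persnr)
tenth <- as.integer(substr(digits, 10, 10))

# TRUE for odd, FALSE for even
is_odd <- tenth %% 2 == 1
is_odd
# [1]  TRUE  TRUE FALSE
```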
# Generate the data
my.data <- data.frame(
persnr=c("411223-6213", "420211-6911",
"420604-7716", "430404-8558",
"431030-7030", "440127-0055"),
date = c("2011-01-19", "2012-01-19",
"2007-09-01", "2011-09-01",
"2011-09-01", "2012-09-01"))
# Get the 10th digit of persnr using substr, then check the remainder
# of its division by 2 to determine if it is odd or even
# Note that I get the 11th char as there is a - in the middle of the number
digit.10 <- substr(my.data$persnr, 11, 11)
my.data$evenOdd <- ifelse(as.integer(digit.10)%%2, "odd", "even")
my.data$evenOdd <- factor(my.data$evenOdd, levels=c("odd", "even"))
To find the season of each date:
# Get month and day, ignore year
month.day <- strftime(my.data$date, format="%m-%d")
# Now check which season we're in -- ASSUMING NORTHERN HEMISPHERE, change if needed
# Also note that the dates of solstices and equinoxes vary from year to year, so
# this is approximate...
# Set everyone to winter
my.data$season <- "Winter"
# Find indices for the other seasons
spring <- which(month.day >= "03-21" & month.day < "06-21")
summer <- which(month.day >= "06-21" & month.day < "09-21")
fall <- which(month.day >= "09-21" & month.day < "12-21")
my.data$season[spring] <- "Spring"
my.data$season[summer] <- "Summer"
my.data$season[fall] <- "Fall"
my.data$season <- factor(my.data$season, levels =
c("Spring", "Summer", "Fall", "Winter"))
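If "fall" and "spring" in the question refer to semesters rather than astronomical seasons (2011-09-01 counts as fall there), a plain logical column may be enough. A rough sketch, assuming that July through December counts as fall; that cut-off is my assumption, not something stated in the question.

```r
dates <- as.Date(c("2011-01-19", "2012-01-19", "2007-09-01", "2011-09-01"))

# Month as an integer 1..12
month <- as.integer(format(dates, "%m"))

# Assumed rule: months 7-12 are the fall term, months 1-6 the spring term
fall <- month >= 7
fall
# [1] FALSE FALSE  TRUE  TRUE
```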
For a time series analysis of over 1000 rasters in a raster stack, I need the date of each layer. The data are almost weekly, and the file names have the structure
"... 1981036 .... tif"
The zero separates year and week.
I need something like: "1981-36"
but I always get the error
Error in charToDate (x): character string is not in a standard unambiguous format
library(sp)
library(lubridate)
library(raster)
library(zoo)
raster_path <- ".../AVHRR_All"
all_raster <- list.files(raster_path,full.names = TRUE,pattern = "\\.tif$")
all_raster
brings me:
all_raster
".../VHP.G04.C07.NC.P1981036.SM.SMN.Andes.tif"
".../VHP.G04.C07.NC.P1981037.SM.SMN.Andes.tif"
".../VHP.G04.C07.NC.P1981038.SM.SMN.Andes.tif"
…
To get the year and the associated week, I have used the following code:
timeline <- data.frame(
year= as.numeric(substr(basename(all_raster), start = 17, stop = 17+3)),
week= as.numeric(substr(basename(all_raster), 21, 21+2))
)
timeline
brings me:
timeline
year week
1 1981 35
2 1981 36
3 1981 37
4 1981 38
…
But I need something like "1981-35" to be able to plot my time series later.
I tried this:
timeline$week <- as.Date(paste0(timeline$year, "%Y")) + week(timeline$week -1, "%U")
and got the error: Error in charToDate(x) : character string is not in a standard unambiguous format
I also tried this:
fileDates <- as.POSIXct(substr((all_raster),17,23), format="%y0%U")
and got the same error.
Until someone posts a better way to do this, you could try:
x <- c(".../VHP.G04.C07.NC.P1981036.SM.SMN.Andes.tif", ".../VHP.G04.C07.NC.P1981037.SM.SMN.Andes.tif",
".../VHP.G04.C07.NC.P1981038.SM.SMN.Andes.tif")
xx <- substr(x, 21, 27)
library(lubridate)
# Split by position instead of on "0": splitting on "0" breaks for
# years such as 1990 and for weeks such as 040
dates <- sapply(xx,function(x) {
year <- substr(x, 1, 4)
week <- as.numeric(substr(x, 5, 7))
start_date <- as.Date(paste0(year,'-01-01'))
date <- start_date+weeks(week)
#note here: OP asked for beginning of week.
#There's some ambiguity here, the above is end-of-week;
#uncomment here for beginning of week, just subtracted 6 days.
#I think this might yield inconsistent results, especially year-boundaries
#hence suggestion to use end of week. See below for possible solution
#date <- start_date+weeks(week)-days(6)
return (as.character(date))
})
newdates <- as.POSIXct(dates)
format(newdates, "%Y-%W")
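If only a "year-week" label is needed, it can also be built straight from the file names without going through a date class. The sketch below reuses the character positions from the question (17 to 20 for the year, 21 to 23 for the week) and works on the base names, so the truncated directory part does not matter. The approximate date assumes that week 1 starts on January 1st, which is my assumption about the product, not something confirmed in the question.

```r
files <- c("VHP.G04.C07.NC.P1981036.SM.SMN.Andes.tif",
           "VHP.G04.C07.NC.P1981037.SM.SMN.Andes.tif",
           "VHP.G04.C07.NC.P1981038.SM.SMN.Andes.tif")

# Fixed positions within the base name, as in the question
year <- as.integer(substr(files, 17, 20))
week <- as.integer(substr(files, 21, 23))

# Label in the requested "YYYY-WW" form
label <- sprintf("%d-%02d", year, week)
label
# [1] "1981-36" "1981-37" "1981-38"

# Approximate start date of each week (assumes week 1 starts on Jan 1)
approx_date <- as.Date(paste0(year, "-01-01")) + 7 * (week - 1)
```

The Date vector is what you would hand to a plotting or time series function; the label is only for display.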
Thanks to @Soren, who posted this answer here: Get the month from the week of the year
You can do it if you specify that Monday is weekday 1 with %u:
w <- c(35,36,37,38)
y <- c(1981,1981,1981,1981)
s <- c(1,1,1,1)
df <- data.frame(y,w,s)
df$d <- paste(as.character(df$y), as.character(df$w),as.character(df$s), sep=".")
df$date <- as.Date(df$d, "%Y.%U.%u")
# So here we have variable date as date if you need that for later.
class(df$date)
#[1] "Date"
# If you want it to look like Y-W, you can do the final formatting:
df$date <- format(df$date, "%Y-%U")
# y w s d date
# 1 1981 35 1 1981.35.1 1981-35
# 2 1981 36 1 1981.36.1 1981-36
# 3 1981 37 1 1981.37.1 1981-37
# 4 1981 38 1 1981.38.1 1981-38
# NB: though it looks correct, the resulting df$date is actually a character:
class(df$date)
#[1] "character"
Alternatively, you could do the same by setting the Sunday as 0 with %w.
I have a large data set that collects multiple data points each day from people over multiple days. My R dataset has participants' responses and the timestamp for their response. I want to recode the timestamp to reflect which order prompt they responded to. So basically, I want to assign a value to the timestamp based on a range of time. So if on Monday, a response falls between 10:00 and 10:30, I want the value to be 1. If a response falls between 12:15 and 12:45, I want the value to be 2. If a response falls between 2:20 and 2:50, I want the value to be 3.
BUT I need that code to work only for Monday's data. For Tuesday's data, the timestamp ranges changes. For example, if a Tuesday response falls between 9:10 and 9:40, that value should be 1. And so on.
I can't for the life of me figure out how to do this with an if else statement. When I write a time into R, it thinks I'm writing code for a series of values (10 through 30) rather than a time (10:30).
Example of what I have:
Example of what I want: (see the new Prompt column)
So for 10/11/15 I want Prompt 1 to fall between 11:15:00 and 11:45:00, but for 11/11/15 I want Prompt 1 to be different--between 12:00:00 and 12:30:00
If you want to work with times and dates, the POSIXlt class is helpful. If your timestamps are
stored as strings, the first step is to convert them into POSIXlt. You can use "strptime" for this, e.g.
> t <- strptime("2015-01-01 12:18",format="%Y-%m-%d %H:%M")
> t
[1] "2015-01-01 12:18:00 CET"
> class(t)
[1] "POSIXlt" "POSIXt"
The following function "timerange" assigns a time range number to such a POSIXlt object:
R <- list( Sun = list(),
Mon = list( c("10:00","10:30"), c("12:15","12:40"), c("13:15","13:40") ),
Tue = list( c( "9:10", "9:40"), c("11:00","11:30"), c("13:15","13:40") ),
Wed = list( c("10:00","10:30"), c("12:15","12:40"), c("13:15","13:40") ),
Thu = list( c("10:00","10:30"), c("12:15","12:40"), c("13:15","13:40") ),
Fri = list( c("10:00","10:30"), c("12:15","12:40"), c("13:15","13:40") ),
Sat = list( c("10:00","10:30"), c("12:15","12:40"), c("13:15","13:40") ) )
timerange <- function(t)
{
s <- unlist(strsplit(strftime(t,format="%Y-%m-%d %H:%M:%S %w")," "))
w <- as.numeric(s[3]) + 1
n <- sapply(R[[w]], function(x){ strptime(paste(s[1]," ",x,":00",sep=""),
format="%Y-%m-%d %H:%M:%S")})
return( which(sapply(n,function(x){ t-x[1]>=0 & t-x[2]<=0})) )
}
"R" is the list of all time ranges. You can change it as you like.
"strftime" is the counterpart to "strptime", i.e. it converts the POSIXlt object "t" into
a string of a desired format. This string is then split into the date part, the time part,
and the day of the week. The latter is used to pick the appropriate sublist in "R".
Then "strptime" is used to create a list of pairs of POSIXlt objects. The time part comes from the
appropriate sublist of "R", and the date part comes from "t". Each such pair represents a time interval.
Then the time range number is the index of the time interval which contains "t".
Some examples:
> t <- strptime("2015-01-01 12:18",format="%Y-%m-%d %H:%M")
> timerange(t)
[1] 2
> t <- strptime("2015-01-05 10:01",format="%Y-%m-%d %H:%M")
> timerange(t)
[1] 1
> t <- strptime("05.01.2015 13:25",format="%d.%m.%Y %H:%M")
> timerange(t)
[1] 3
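The same lookup can also be kept in a data frame with one row per time window, which may be easier to maintain than a nested list. This is only a sketch: the windows table, the helper to_minutes and the function prompt_for are hypothetical names, the windows are taken from the question's Monday and Tuesday examples, and the function handles one timestamp at a time.

```r
# One row per window: day of week, start, end, and the prompt number
windows <- data.frame(
  day    = c("Mon", "Mon", "Tue"),
  start  = c("10:00", "12:15", "09:10"),
  end    = c("10:30", "12:45", "09:40"),
  prompt = c(1, 2, 1),
  stringsAsFactors = FALSE
)

# Convert "HH:MM" strings to minutes since midnight
to_minutes <- function(hhmm) {
  parts <- strsplit(hhmm, ":", fixed = TRUE)
  vapply(parts, function(p) as.integer(p[1]) * 60L + as.integer(p[2]), integer(1))
}

# Return the prompt number for a single timestamp, or NA if no window matches
prompt_for <- function(ts) {
  lt <- as.POSIXlt(ts)
  # wday is 0 (Sunday) to 6 (Saturday), independent of the locale
  day <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")[lt$wday + 1]
  mins <- lt$hour * 60L + lt$min
  hit <- windows$day == day &
    to_minutes(windows$start) <= mins &
    mins <= to_minutes(windows$end)
  if (any(hit)) windows$prompt[which(hit)[1]] else NA_real_
}

prompt_for(as.POSIXct("2015-08-10 12:29:14", tz = "UTC"))
# [1] 2
```

Comparing minutes since midnight avoids building a POSIXlt object for every window boundary.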
I have a simpler solution using days, hours and minutes and your (manual) filters which you can use as a function.
Check my simple example:
library(lubridate)
# example dataset
dt = data.frame(responce = 1:3,
date = c("2015-08-10 10:15:34","2015-08-10 12:29:14","2015-08-11 09:12:18"),
stringsAsFactors = F)
dt
# responce date
# 1 1 2015-08-10 10:15:34
# 2 2 2015-08-10 12:29:14
# 3 3 2015-08-11 09:12:18
# transform to date and obtain day, hour and minutes
dt$date = ymd_hms(dt$date)
dt$day = wday(dt$date, label=T)
dt$hour = hour(dt$date)
dt$minute = minute(dt$date)
dt
# responce date day hour minute
# 1 1 2015-08-10 10:15:34 Mon 10 15
# 2 2 2015-08-10 12:29:14 Mon 12 29
# 3 3 2015-08-11 09:12:18 Tues 9 12
# create a column with an arbitrary value to start with and also double check in the end
dt$value = -1
# conditions for Monday
dt$value[dt$day=="Mon" & dt$hour==10 & dt$minute >= 0 & dt$minute <=30] = 1
dt$value[dt$day=="Mon" & dt$hour==12 & dt$minute >= 15 & dt$minute <=45] = 2
dt$value[dt$day=="Mon" & dt$hour==14 & dt$minute >= 20 & dt$minute <=50] = 3
# conditions for Tuesday
dt$value[dt$day=="Tues" & dt$hour==9 & dt$minute >= 10 & dt$minute <=40] = 1
dt
# responce date day hour minute value
# 1 1 2015-08-10 10:15:34 Mon 10 15 1
# 2 2 2015-08-10 12:29:14 Mon 12 29 2
# 3 3 2015-08-11 09:12:18 Tues 9 12 1
# double check all your rows matched (you have no -1 values)
dt[dt$value == -1, ]
# [1] responce date     day      hour     minute   value   
# <0 rows> (or 0-length row.names)
I ended up using some of both of those answers.
library(lubridate)
#change data to POSIXct class
data$StartDate <- dmy(as.character(data$StartDate))
data$EndDate <- dmy(as.character(data$EndDate))
data$StartTime2 <- hms(as.character(data$StartTime))
data$EndTime2 <- hms(as.character(data$EndTime))
I didn't have to do both, but I did anyway. I created an additional variable because changing it makes it look funny.
#check me out
class(data$StartDate)
#[1] "POSIXct" "POSIXt"
class(data$StartTime2)
#[1] "Period"
#attr(,"package")
#[1] "lubridate"
Based off the second comment I then did:
data$day = wday(data$StartDate, label=T)
data$hour = hour(data$StartTime2)
data$minute = minute(data$StartTime2)
# create a column with an arbitrary value to start with and also double check in the end
data$prompt = -1
# conditions for Tuesday (10/11/2015)
data$prompt[data$day=="Tues" & data$hour==11 & data$minute >= 10 & data$minute <=40] = 1
data$prompt[data$day=="Tues" & data$hour==13 & data$minute >= 35 & data$minute <=59] = 2
data$prompt[data$day=="Tues" & data$hour==16 & data$minute >= 15 & data$minute <=45] = 3
And so on. I know I have to fix the prompt 2 for this day because it goes into hour 14, but that's to play with next. Thanks for your help!
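For a window that crosses an hour boundary, one way around the hour == ... test is to compare minutes since midnight instead. A small sketch with made-up values; the 13:35 to 14:05 window is hypothetical, since the real end time of prompt 2 is not given above.

```r
# Made-up response times
hour   <- c(13, 13, 14, 14)
minute <- c(30, 40,  2, 10)

# Minutes since midnight for each response
mins <- hour * 60 + minute

# Hypothetical window for prompt 2: 13:35 to 14:05
in_window <- mins >= 13 * 60 + 35 & mins <= 14 * 60 + 5
in_window
# [1] FALSE  TRUE  TRUE FALSE
```

The same comparison can replace the hour and minute conditions in the assignments above, e.g. data$prompt[data$day == "Tues" & in_window] = 2, with mins computed from data$hour and data$minute.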
I currently have a column "Month" and a column "DayWeek" with the month and day of the week written out. Using the code below I can get a column with a 1 for each Wednesday in Feb, May, Aug and Nov. I'm struggling to find a way to get a column with 1s just for the first Wednesday of each of the 4 months I just mentioned. Any ideas, or do I have to create a loop for it?
testPrices$Rebalance <- ifelse((testPrices$Month=="February" & testPrices$DayWeek == "Wednesday"),1,
ifelse((testPrices$Month=="May" & testPrices$DayWeek == "Wednesday"),1,
ifelse((testPrices$Month=="August" & testPrices$DayWeek == "Wednesday"),1,
ifelse((testPrices$Month=="November" & testPrices$DayWeek == "Wednesday"),1,0))))
Well, without a reproducible example, I couldn't come up with a complete solution, but here is a way to generate the first Wednesday date of each month. In this example, I start at 1 JAN 2013 and go out 36 months, but you can figure out what's appropriate for you. Then, you can check against the first Wednesday vector produced here to see if your dates are members of the first Wednesday of the month group and assign a 1, if so.
# I chose this as an origin
orig <- "2013-01-01"
# generate vector of 1st date of the month for 36 months
d <- seq(as.Date(orig), length=36, by="1 month")
# Use that to make a list of the first 7 dates of each month
d <- lapply(d, function(x) as.Date(seq(1:7),origin=x)-1)
# Look through the list for Wednesdays only,
# and concatenate them into a vector
do.call('c', lapply(d, function(x) x[strftime(x,"%A")=="Wednesday"]))
Output:
[1] "2013-01-02" "2013-02-06" "2013-03-06" "2013-04-03" "2013-05-01" "2013-06-05" "2013-07-03"
[8] "2013-08-07" "2013-09-04" "2013-10-02" "2013-11-06" "2013-12-04" "2014-01-01" "2014-02-05"
[15] "2014-03-05" "2014-04-02" "2014-05-07" "2014-06-04" "2014-07-02" "2014-08-06" "2014-09-03"
[22] "2014-10-01" "2014-11-05" "2014-12-03" "2015-01-07" "2015-02-04" "2015-03-04" "2015-04-01"
[29] "2015-05-06" "2015-06-03" "2015-07-01" "2015-08-05" "2015-09-02" "2015-10-07" "2015-11-04"
[36] "2015-12-02"
Note: I adapted this code from answers found here and here.
I created a sample dataset to work with like this (Thanks @Frank!):
orig <- "2013-01-01"
d <- data.frame(date=seq(as.Date(orig), length=1000, by='1 day'))
d$Month <- months(d$date)
d$DayWeek <- weekdays(d$date)
d$DayMonth <- as.numeric(format(d$date, '%d'))
From a data frame like this, you can extract the first Wednesday of specific months using subset, like this:
subset(d, Month %in% c('January', 'February') & DayWeek == 'Wednesday' & DayMonth < 8)
This takes advantage of the fact that the first Wednesday of a month always has a day number between 1 and 7, and there is precisely one such Wednesday per month. You could do the same for the 2nd, 3rd, or 4th Wednesday by changing the condition accordingly, for example DayMonth > 7 & DayMonth < 15 for the 2nd.
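Put together as a 0/1 Rebalance column for February, May, August and November, the idea might look like the sketch below. It uses the numeric fields of POSIXlt (wday, mday, mon) rather than month and weekday names, so it does not depend on the locale; the one-year date range is just sample data.

```r
dates <- seq(as.Date("2013-01-01"), by = "1 day", length.out = 365)
lt <- as.POSIXlt(dates)

is_wed          <- lt$wday == 3                        # 0 = Sunday, so 3 = Wednesday
is_first_week   <- lt$mday <= 7                        # day of month 1..7
is_target_month <- (lt$mon + 1) %in% c(2, 5, 8, 11)    # mon is 0-based

rebalance <- as.integer(is_wed & is_first_week & is_target_month)
dates[rebalance == 1]
# [1] "2013-02-06" "2013-05-01" "2013-08-07" "2013-11-06"
```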
This works for me in R:
# Setting up the first inner while-loop controller, the start of the next water year
NextH2OYear <- as.POSIXlt(firstDate)
NextH2OYear$year <- NextH2OYear$year + 1
NextH2OYear<-as.Date(NextH2OYear)
But this doesn't:
# Setting up the first inner while-loop controller, the start of the next water month
NextH2OMonth <- as.POSIXlt(firstDate)
NextH2OMonth$mon <- NextH2OMonth$mon + 1
NextH2OMonth <- as.Date(NextH2OMonth)
I get this error:
Error in as.Date.POSIXlt(NextH2OMonth) :
zero length component in non-empty POSIXlt structure
Any ideas why? I need to systematically add one year (for one loop) and one month (for another loop) and am comparing the resulting changed variables to values with a class of Date, which is why they are being converted back using as.Date().
Thanks,
Tom
Edit:
Below is the entire section of code. I am using RStudio (version 0.97.306). The code below is a function that is passed an array of two columns, Date (class Date) and discharge data (class numeric), which are used to calculate the monthly averages. So firstDate and lastDate are of class Date and are determined from the passed array. This code is adapted from successful code that calculates the yearly averages. There may be one or two things I still need to change over, but I cannot error-check the later parts because of the early errors I get in my use of POSIXlt. Here is the code:
MonthlyAvgDischarge<-function(values){
#determining the number of values - i.e. the number of rows
dataCount <- nrow(values)
# Determining first and last dates
firstDate <- (values[1,1])
lastDate <- (values[dataCount,1])
# Setting up vectors for results
WaterMonths <- numeric(0)
class(WaterMonths) <- "Date"
numDays <- numeric(0)
MonthlyAvg <- numeric(0)
# while loop variables
loopDate1 <- firstDate
loopDate2 <- firstDate
# Setting up the first inner while-loop controller, the start of the next water month
NextH2OMonth <- as.POSIXlt(firstDate)
NextH2OMonth$mon <- NextH2OMonth$mon + 1
NextH2OMonth <- as.Date(NextH2OMonth)
# Variables used in the loops
dayCounter <- 0
dischargeTotal <- 0
dischargeCounter <- 1
resultsCounter <- 1
loopCounter <- 0
skipcount <- 0
# Outer while-loop, controls the progression from one year to another
while(loopDate1 <= lastDate)
{
# Inner while-loop controls adding up the discharge for each water year
# and keeps track of day count
while(loopDate2 < NextH2OMonth)
{
if(is.na(values[resultsCounter,2]))
{
# Skip this date
loopDate2 <- loopDate2 + 1
# Skip this value
resultsCounter <- resultsCounter + 1
#Skipped counter
skipcount<-skipcount+1
} else{
# Adding up discharge
dischargeTotal <- dischargeTotal + values[resultsCounter,2]
}
# Adding a day
loopDate2 <- loopDate2 + 1
#Keeping track of days
dayCounter <- dayCounter + 1
# Keeping track of Dicharge position
resultsCounter <- resultsCounter + 1
}
# Adding the results/water years/number of days into the vectors
WaterMonths <- c(WaterMonths, as.Date(loopDate2, format="%mm/%Y"))
numDays <- c(numDays, dayCounter)
MonthlyAvg <- c(MonthlyAvg, round((dischargeTotal/dayCounter), digits=0))
# Resetting the left hand side variables of the while-loops
loopDate1 <- NextH2OMonth
loopDate2 <- NextH2OMonth
# Resetting the right hand side variable of the inner while-loop
# moving it one year forward in time to the next water year
NextH2OMonth <- as.POSIXlt(NextH2OMonth)
NextH2OMonth$year <- NextH2OMonth$Month + 1
NextH2OMonth<-as.Date(NextH2OMonth)
# Resettting vraiables that need to be reset
dayCounter <- 0
dischargeTotal <- 0
loopCounter <- loopCounter + 1
}
WaterMonths <- format(WaterMonthss, format="%mm/%Y")
# Uncomment the line below and return AvgAnnualDailyAvg if you want the water years also
# AvgAnnDailyAvg <- data.frame(WaterYears, numDays, YearlyDailyAvg)
return((MonthlyAvg))
}
The same error occurs in regular R. When doing it line by line, it's not a problem; when running it as a script, it is.
Plain R
seq(Sys.Date(), length = 2, by = "month")[2]
seq(Sys.Date(), length = 2, by = "year")[2]
Note that this works with POSIXlt too, e.g.
seq(as.POSIXlt(Sys.Date()), length = 2, by = "month")[2]
mondate.
library(mondate)
now <- mondate(Sys.Date())
now + 1 # date in one month
now + 12 # date in 12 months
mondate is a bit smarter about cases like mondate("2013-01-31") + 1, which gives the last day of February, whereas seq(as.Date("2013-01-31"), length = 2, by = "month")[2] gives March 3rd.
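If you want the end-of-month behaviour without an extra package, something along these lines should work in plain R. It is only a sketch: add_months is a hypothetical helper, not a base R function, and it handles a single Date at a time.

```r
# Add n months to a single Date, clamping to the last day of the target month
add_months <- function(d, n) {
  lt <- as.POSIXlt(d)
  # First day of the starting month
  first <- as.Date(sprintf("%04d-%02d-01", lt$year + 1900L, lt$mon + 1L))
  # First day of the target month, and of the month after it
  target_first <- seq(first, by = paste(n, "months"), length.out = 2)[2]
  next_first   <- seq(target_first, by = "1 month", length.out = 2)[2]
  # Number of days in the target month
  last_day <- as.integer(format(next_first - 1, "%d"))
  target_first + min(lt$mday, last_day) - 1
}

add_months(as.Date("2013-01-31"), 1)
# [1] "2013-02-28"
add_months(as.Date("2013-02-13"), 1)
# [1] "2013-03-13"
```

Working from the first of the month avoids the rollover, because the 1st exists in every month.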
yearmon If you don't really need the day part then yearmon may be preferable:
library(zoo)
now.ym <- as.yearmon(Sys.Date())
now.ym + 1/12 # add one month
now.ym + 1 # add one year
Here is how you can add 1 month to a date in R, using package lubridate:
library(lubridate)
x <- as.POSIXlt("2010-01-31 01:00:00")
month(x) <- month(x) + 1
> x
[1] "2010-03-03 01:00:00 PST"
(Note that the addition rolled over to March 3rd, as the 31st of February doesn't exist.)
Can you perhaps provide a reproducible example? What's in firstDate, and what version of R are you using? I do this kind of manipulation of POSIXlt dates quite often and it seems to work:
Sys.Date()
# [1] "2013-02-13"
date = as.POSIXlt(Sys.Date())
date$mon = date$mon + 1
as.Date(date)
# [1] "2013-03-13"
I struggle mightily with dates in R and could do this pretty easily in SPSS, but I would love to stay within R for my project.
I have a date column in my data frame and want to remove the year completely in order to leave the month and day. Here is a peek at my original data.
> head(ds$date)
[1] "2003-10-09" "2003-10-11" "2003-10-13" "2003-10-15" "2003-10-18" "2003-10-20"
> class((ds$date))
[1] "Date"
I want it to be:
> head(ds$date)
[1] "10-09" "10-11" "10-13" "10-15" "10-18" "10-20"
> class((ds$date))
[1] "Date"
If possible, I would love to set the first date to be October 1st instead of January 1st.
Any help you can provide will be greatly appreciated.
EDIT: I felt like I should add some context. I want to plot an NHL player's performance over the course of a season which starts in October and ends in April. To add to this, I would like to facet the plots by each season which is a separate column in my data frame. Because I want to compare cumulative performance over the course of the season, I believe that I need to remove the year portion, but maybe I don't; as I indicated, I struggle with dates in R. What I am looking to accomplish is a plot that compares cumulative performance over relative dates by season and have the x-axis start in October and end in April.
> d = as.Date("2003-10-09", format="%Y-%m-%d")
> format(d, "%m-%d")
[1] "10-09"
Is this what you are looking for?
library(ggplot2)
library(reshape2) # provides melt()
## make up data for two seasons a and b
a = as.Date("2010/10/1")
b = as.Date("2011/10/1")
a.date <- seq(a, by='1 week', length=28)
b.date <- seq(b, by='1 week', length=28)
## make up some score data
a.score <- abs(trunc(rnorm(28, mean = 10, sd = 5)))
b.score <- abs(trunc(rnorm(28, mean = 10, sd = 5)))
## create a data frame
df <- data.frame(a.date, b.date, a.score, b.score)
df
## Since I am using ggplot, I had better create a "long-format" data frame
df.molt <- melt(df, measure.vars = c("a.score", "b.score"))
levels(df.molt$variable) <- c("First season", "Second season")
df.molt
Then, I am using ggplot2 for plotting the data:
## plot it
ggplot(aes(y = value, x = a.date), data = df.molt) + geom_point() +
geom_line() + facet_wrap(~variable, ncol = 1) +
scale_x_date("Date", date_labels = "%m-%d")
If you want to modify the x-axis (e.g., display format), then you'll probably be interested in scale_x_date.
You have to remember Date is a numeric format, representing the number of days passed since the "origin" of the internal date counting :
> str(Date)
Class 'Date' num [1:10] 14245 14360 14475 14590 14705 ...
This is the same as in Excel, if you want a reference. Hence the solution with format is perfectly valid.
Now if you want to set the first date of a year as October 1st, you can construct some year index like this :
redefine.year <- function(x,start="10-1"){
year <- as.numeric(strftime(x,"%Y"))
yearstart <- as.Date(paste(year,start,sep="-"))
year + (x >= yearstart) - min(year) + 1
}
Testing code :
Start <- as.Date("2009-1-1")
Stop <- as.Date("2011-11-1")
Date <- seq(Start,Stop,length.out=10)
data.frame( Date=as.character(Date),
year=redefine.year(Date))
gives
Date year
1 2009-01-01 1
2 2009-04-25 1
3 2009-08-18 1
4 2009-12-11 2
5 2010-04-05 2
6 2010-07-29 2
7 2010-11-21 3
8 2011-03-16 3
9 2011-07-09 3
10 2011-11-01 4
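For the plotting part of the question, a numeric "day of season" may be a more convenient x-axis than a month-day string, because it sorts correctly from October through April and lines the seasons up with each other. A rough sketch in the same spirit as redefine.year; season_day is a hypothetical helper, and October 1st as the season start is taken from the question.

```r
# Day number within the season, counting the season's first day as day 1
season_day <- function(x, start_month = 10L) {
  lt <- as.POSIXlt(x)
  year <- lt$year + 1900L
  # Dates before the start month belong to the season that began the previous year
  season_start_year <- ifelse(lt$mon + 1L >= start_month, year, year - 1L)
  season_start <- as.Date(sprintf("%d-%02d-01", season_start_year, start_month))
  as.integer(x - season_start) + 1L
}

season_day(as.Date(c("2003-10-09", "2004-01-01")))
# [1]  9 93
```

Plotting against this index and faceting by the season column gives an x-axis that runs from October to April in every panel.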