I have a csv that contains "Period", which are quarters, and "Percent". After reading the data into R, the "Period" column is "chr" and "Percent" column is "num". I want to change the quarter values to dates, so:
for (i in 1:length(sloos_tighten$Period)) {
sloos_tighten$Period[i] <- paste("Q", substring(sloos_tighten$Period[i], 6), "/", substring(sloos_tighten$Period[i], 1, 4), sep = "")
sloos_tighten$Period[i] <- as.Date(as.yearqtr(sloos_tighten$Period[i], format = "Q%q/%Y"))
}
where the first line in the for-loop changes the format of the quarter to be readable by as.yearqtr, and the second line changes the quarter to a date. The first line works as intended, but the second line converts the date to a four-digit number. I think this is because "Period" is of type "chr", but I don't know how to change it to date. I have tried to create a new column with type date, but I cannot find any resource online that explains it. Any help is appreciated. Thanks in advance.
> dput(head(sloos_tighten, 10))
structure(list(Period = c("1990:2", "1990:3", "1990:4", "1991:1",
"1991:2", "1991:3", "1991:4", "1992:1", "1992:2", "1992:3"),
`Large and medium` = c(54.4, 46.7, 54.2, 38.6, 20, 18.6,
16.7, 10, 3.5, -3.4), Small = c(52.7, 33.9, 40.7, 31.6, 6.9,
8.8, 7, 0, -7.1, -1.7)), row.names = c(NA, 10L), class = "data.frame")
^What the data looks like after import
The for loop itself is fine in principle, but there are two problems here:
1. There is a class problem: if $Period is a character vector, then assigning a Date into one of its elements coerces the Date to character. The coercion drops the Date class, so what gets stored is the underlying day count as a string, which is the four-digit number you are seeing. This happens because in an R data.frame, with few exceptions, all values in a column must be the same type: a column is (almost always) a vector, and R treats vectors as homogeneous.
You can get around this by pre-allocating a vector of type Date and assigning it piecemeal:
newdate <- rep(Sys.Date()[NA], nrow(sloos_tighten)) # just to get the class right
for (i in 1:length(sloos_tighten$Period)) {
tmp <- paste("Q", substring(sloos_tighten$Period[i], 6), "/", substring(sloos_tighten$Period[i], 1, 4), sep = "")
newdate[i] <- as.Date(as.yearqtr(tmp, format = "Q%q/%Y"))
}
(But please, don't use this code, look at #2 below first.)
2. Not a problem per se, but an inefficiency: R is good at operating on a whole vector at once. If you reassign all of $Period in one step, the code is both faster and simpler.
# as.Date() alone does not understand %q, so as.yearqtr() (from zoo) is still needed
sloos_tighten$Period <-
  as.Date(
    as.yearqtr(
      paste0(substring(sloos_tighten$Period, 6),
             "/", substring(sloos_tighten$Period, 1, 4)),
      format = "%q/%Y"))
This switches from paste(.., sep="") to paste0, a convenience function. It also drops the leading "Q": we don't keep it around, so there is no need to add it (other than perhaps to make the code more declarative). Finally, it converts the whole vector of strings at once.
(This is taking the data sight-unseen, so untested.)
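To see the coercion from point 1 in isolation, and as a zoo-free alternative to point 2, here is a minimal base R sketch. It assumes every Period value has the form "YYYY:Q" as in the dput output, and it maps each quarter to the first day of that quarter. The variable names x, period, yr, qtr and dates are made up for illustration.

```r
# Point 1 in isolation: assigning a Date into a character vector
x <- c("1990:2", "1990:3")
x[1] <- as.Date("1990-04-01")
class(x)   # still "character"
x[1]       # "7395", the number of days since 1970-01-01 stored as a string

# Vectorized conversion without zoo (assumes the "YYYY:Q" layout)
period <- c("1990:2", "1990:3", "1990:4", "1991:1")
yr  <- as.integer(substring(period, 1, 4))   # "1990" -> 1990
qtr <- as.integer(substring(period, 6))      # "2"    -> 2
# First month of quarter q is 3 * (q - 1) + 1, i.e. 1, 4, 7 or 10
dates <- as.Date(sprintf("%d-%02d-01", yr, 3L * (qtr - 1L) + 1L))
dates
```

The result is a proper Date vector, so it can replace the whole column in one assignment, for example sloos_tighten$Period <- dates once dates has been built from the real column.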
I want to convert from chr to Date format.
I have this value representing year-week:
2020-53
I've tried this:
mutate(semana=as_date(year_week,format="%Y-%U"))
but I get the same date, 2020-01-18, for every row in the dataset.
I also tried
mutate(semana=strptime(year_week, "%Y-%U"))
getting the same result
Any ideas? Thanks.
I think I've got something that does the job.
library(tidyverse)
library(lubridate)
# Set up table like example in post
trybble <- tibble(year_week = c("2020-53", rep("2021-01", 5)),
country = c("UK", "FR", "GER", "ITA", "SPA", "UK"))
# Function to go into mutate with given info of year and week
y_wsetter <- function(fixme, yeargoal, weekgoal) {
lubridate::year(fixme) <- yeargoal
lubridate::week(fixme) <- weekgoal
return(fixme)
}
# Making a random date so col gets set right
rando <- make_datetime(year = 2021, month = 1, day = 1)
# Show time
trybble <- trybble %>%
add_column(semana = rando) %>% # Set up col of dates to fix
mutate(yerr = substr(year_week, 1, 4)) %>% # Get year as chr
mutate(week = substr(year_week, 6, 7)) %>% # Get week as chr
mutate(semana2 = y_wsetter(semana,
as.numeric(yerr),
as.numeric(week))) %>% # fixed dates
select(-c(yerr, week, semana))
Notes:
If you somehow plug in a week greater than 53, lubridate doesn't mind, and goes forward a year.
I really struggled to get mutate to play nicely without writing my own function y_wsetter. In my experience with mutates with multiple inputs, or where I'm changing a "property" of a value instead of the whole value itself, I need to probably write a function. I'm using the lubridate package to change just the year or week based on your year_week column, so this is one such situation where a quick function helps mutate out.
I was having a weird time when I tried setting rando to Sys.Date(), so I manually set it to something using make_datetime. YMMV
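A year and a week number alone do not identify a single day, so some weekday has to be chosen. As a possible alternative, here is a base R sketch that does the arithmetic by hand. It assumes the year_week values are ISO 8601 weeks (the question does not say which convention the data uses) and returns the Monday of each week. The function name iso_week_start is hypothetical.

```r
# Monday of ISO week `wk` in ISO year `yr`, for strings like "2020-53".
# Assumption: weeks follow ISO 8601, where 4 January always lies in week 1.
iso_week_start <- function(year_week) {
  yr <- as.integer(substr(year_week, 1, 4))
  wk <- as.integer(substr(year_week, 6, 7))
  jan4 <- as.Date(sprintf("%d-01-04", yr))
  # "%u" formats the ISO weekday: 1 = Monday, ..., 7 = Sunday
  wday <- as.integer(format(jan4, "%u"))
  week1_monday <- jan4 - (wday - 1L)
  week1_monday + 7L * (wk - 1L)
}

iso_week_start(c("2020-53", "2021-01"))
# "2020-12-28" "2021-01-04"
```

Because the function is vectorized, it can be used directly inside mutate, for example mutate(semana = iso_week_start(year_week)).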
I'm importing a large dataset from a txt file. It has multiple date variables that are read in as numeric values such as 20190101. Is there a way to assign a date format as part of the import? There is no header in the file, so I'm assigning names and widths; sample code below.
df <- read_fwf("file name",
fwf_cols(id = 8,
update_date = 8,
name = 35),
skip = 0)
Or is there a way to convert multiple values in one statement vs one at a time?
df$update_date <- as.Date(as.character(df$update_date), "%Y%m%d")
Here is a way to convert multiple columns to Dates in one statement (assuming a yyyymmdd layout). Here we target all columns whose names end with "date".
library(dplyr)
df <- data.frame(update_date = c(20190101, 20190102, 20190103),
end_date = c(20200101, 20200102, 20200103))
df %>% mutate_at(vars(ends_with("date")), ~as.Date(as.character(.x),format="%Y%m%d"))
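You might similarly use
df %>% mutate_at(vars(starts_with("date")), ~as.Date(as.character(.x), format = "%Y%m%d"))
or
df %>% mutate_at(vars(update_date, end_date), ~as.Date(as.character(.x), format = "%Y%m%d"))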
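For comparison, the same conversion can be sketched in base R without dplyr. This assumes, as above, that the date columns hold yyyymmdd numbers and that their names end in "date"; the id column and the date_cols variable are added here only for illustration.

```r
# Toy data in the same shape as the example above (id is illustrative only)
df <- data.frame(id = 1:3,
                 update_date = c(20190101, 20190102, 20190103),
                 end_date = c(20200101, 20200102, 20200103))

# Pick the columns whose names end in "date"
date_cols <- grep("date$", names(df), value = TRUE)

# Convert each selected column; the other columns are left untouched
df[date_cols] <- lapply(df[date_cols],
                        function(x) as.Date(as.character(x), format = "%Y%m%d"))
str(df)
```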
I am currently working on a project and I need some help. I want to predict the length of flight delays, using a statistical model. The data set does not contain the length of flight delays, but it can be calculated from the actual and scheduled departure times.
I will include a link if you want the whole dataset:
https://drive.google.com/file/d/11BXmJCB5UGEIRmVkM-yxPb_dHeD2CgXa/view?usp=sharing
I then ran the following code
Delays <- read.table("FlightDelays.csv", header=T, sep=",")
DepatureTime <- strptime(formatC(Delays$deptime, width = 4, format = "d", flag = "0"), "%H%M")
ScheduleTime <- strptime(formatC(Delays$schedtime, width = 4, format = "d", flag = "0"), "%H%M")
DelayTime <- as.numeric(difftime(DepatureTime, ScheduleTime))/60
DelayData <- data.frame(DelayTime, Delays)
The above code allowed me to get the delay time in minutes
For those of you who do not want to obtain the whole dataset I will now include a small example of some observations of the form
structure(list(schedtime = c(1455, 1640, 1245, 1715, 1039 , 2120), deptime = c(1455, 1640, 1245, 1709, 1035, 0010)), .Names = c("schedtime", "deptime"), row.names = c(NA, 6L), class = "data.frame")
If you run the code from the beginning on this sample, the delay for the 6th observation comes out as -1270 minutes rather than 170 minutes. I believe this is because strptime assumes both times fall on the same day and doesn't recognise that the delay pushed the departure into the early hours of the following day.
How can I get the code to recognise that a delay will sometimes push the departure time into the following day?
Thank you for any help
Using lubridate:
library(lubridate)
ScheduleTime <- as_datetime(formatC(Delays$schedtime, width = 4, format = "d", flag = "0"),format="%H%M")
DepatureTime <- as_datetime(formatC(Delays$deptime, width = 4, format = "d", flag = "0"),format="%H%M") + hours(ifelse(Delays$deptime < Delays$schedtime & Delays$schedtime > 2000,24,0))
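# Ask for minutes explicitly; dividing an auto-unit difftime by 60 is only right when the unit happens to be seconds
DelayTime <- as.numeric(difftime(DepatureTime, ScheduleTime, units = "mins"))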
DelayData <- data.frame(DelayTime, Delays)
The problem is that you have to decide when a deptime smaller than schedtime means the flight left on the following day, and when it simply means the flight left early. The code above treats it as a next-day departure only when the scheduled time is after 20:00. I don't see a general way around that.
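The same rule can also be sketched in base R with plain arithmetic on the HHMM values, without parsing times at all. The 20:00 cutoff is the same heuristic as above and remains an assumption about the data; the function names hhmm_to_minutes and delay_minutes are hypothetical.

```r
# Convert an HHMM number such as 1715 to minutes after midnight
hhmm_to_minutes <- function(x) (x %/% 100) * 60 + (x %% 100)

# Delay in minutes; a departure "before" a late-evening schedule is treated
# as a departure on the following day (heuristic, controlled by `cutoff`)
delay_minutes <- function(schedtime, deptime, cutoff = 2000) {
  delay <- hhmm_to_minutes(deptime) - hhmm_to_minutes(schedtime)
  rolled <- deptime < schedtime & schedtime > cutoff
  delay + ifelse(rolled, 24 * 60, 0)
}

# The six sample observations from the question
Delays <- data.frame(schedtime = c(1455, 1640, 1245, 1715, 1039, 2120),
                     deptime   = c(1455, 1640, 1245, 1709, 1035, 10))
delay_minutes(Delays$schedtime, Delays$deptime)
# 0 0 0 -6 -4 170
```

Flights 4 and 5 keep their small negative delays (they left early), while flight 6 rolls over to the next day and comes out as 170 minutes.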
I have data in the format:
['12,Dec,2014, 02,15,28,31,37,04,06', '9,Dec,2014, 01,03,31,42,46,04,11',...]
I am trying to convert the str(date component) into date format using:
new_data =''
for line in date_data:
line = datetime.datetime.strptime(str(line), "%d,%b,%Y")
new_data = new_data + line
print(new_data)
The routine at least recognises the date part, but it can do nothing with the numbers. How can I overcome this problem? I have tried adding % directives for as many characters as follow the date, without success. I have never used the time module before.
What I want to achieve is to associate each number with the date it appears on. I am trying to teach myself how to parse text files, by the way.
If the date is separated from the numbers by a comma followed by a space, then you could use line.split(', ', 1) to split the line into two parts.
Then you could call datetime.datetime.strptime to parse the date.
import datetime as DT
date_data = ['12,Dec,2014, 02,15,28,31,37,04,06', '9,Dec,2014, 01,03,31,42,46,04,11']
for line in date_data:
    # Split once on ', ' so part[0] is the date and part[1] is the numbers
    part = line.split(', ', 1)
    date = DT.datetime.strptime(part[0], '%d,%b,%Y').date()
    # list(...) is needed in Python 3, where map returns a lazy iterator
    numbers = list(map(int, part[1].split(',')))
    print(date, numbers)
yields (in Python 3)
2014-12-12 [2, 15, 28, 31, 37, 4, 6]
2014-12-09 [1, 3, 31, 42, 46, 4, 11]
I have data with one date column and 10 other columns.
The date column has the format 199010, i.e. yyyymm.
It seems that zoo/xts requires the date to include the day.
Is there any way to address this issue?
Here is my data:
structure(list(Date = 198901:198905, NoDur = c(5.66, -1.44, 5.51,
5.68, 5.32)), .Names = c("Date", "NoDur"), class = "data.frame", row.names = c(NA,
5L))
data<-read.zoo("C:/***/data_port.csv",sep=",",format="%Y%m",header=TRUE,index.column=1,colClasses=c("character",rep("numeric",1)))
The code has these problems:
the data is space separated but the code specifies that it is comma separated
the data does not describe dates since there is no day but the code is using the default of dates
the data is not provided in reproducible form. Note how one can simply copy the data and code below and paste it into R without any additional work.
Try this:
Lines <- "Date NoDur
198901 5.66
198902 -1.44
198903 5.51
198904 5.68
198905 5.32
"
library(zoo)
read.zoo(text = Lines, format = "%Y%m", FUN = as.yearmon, header = TRUE,
colClasses = c("character", NA))
The above converts the index to "yearmon" class, which probably makes most sense here, but it would alternatively be possible to convert it to "Date" class by using FUN = function(x, format) as.Date(as.yearmon(x, format)) in place of the FUN argument above.
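If only a plain Date vector is needed and zoo is not involved, one common workaround is to append a day before parsing. The sketch below is base R only and assumes the first of the month is an acceptable stand-in for the missing day; the variable names yyyymm and dates are illustrative.

```r
# yyyymm values as in the question's Date column
yyyymm <- 198901:198905

# Append "01" so every value becomes a full yyyymmdd string, then parse
dates <- as.Date(paste0(yyyymm, "01"), format = "%Y%m%d")
dates
# "1989-01-01" "1989-02-01" "1989-03-01" "1989-04-01" "1989-05-01"
```

The resulting vector could then be used as the index of a zoo or xts object, though the "yearmon" class shown above describes monthly data more faithfully.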