I am getting an odd error when I try to extract the year from a date object.
Here is a dput of my dates:
structure(list(date = structure(c(15706, 15707, 15708, 15709,
15710, 15711), class = "Date")), .Names = "date", row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
When I pipe the data frame into lubridate::year(date), I get the following error:
Error in year(., date) : unused argument (date)
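This does not work in a pipe because of the order of evaluation: %>% inserts the left-hand side as the first argument, so the call becomes lubridate::year(., date) and date is an unused extra argument.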
dates %>%
lubridate::year(date)
Error in lubridate::year(., date) : unused argument (date)
Either we need to pull the column and then apply the function
dates %>%
pull(date) %>%
lubridate::year(.)
Or another way is to use the function within {}
dates %>%
{lubridate::year(.$date)}
#[1] 2013 2013 2013 2013 2013 2013
Or use the standard way of creating column by using mutate
dates %>%
mutate(year = lubridate::year(date))
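The three fixes above can be checked against each other with a small sketch. It rebuilds the six dates from the question with as.Date() on the day counts (an assumption that is equivalent to the dput, not the asker's own object) and confirms that each route returns the same years.

```r
library(dplyr)
library(lubridate)

# Same six dates as in the question, given as days since 1970-01-01
dates <- tibble(date = as.Date(15706:15711, origin = "1970-01-01"))

# Pulling the column first passes only the Date vector to year()
via_pull <- dates %>% pull(date) %>% year()

# Braces stop the pipe from inserting the data frame as the first argument
via_braces <- dates %>% { year(.$date) }

# mutate() evaluates year(date) inside the data frame
via_mutate <- dates %>% mutate(year = year(date))

identical(via_pull, via_braces)
identical(via_pull, via_mutate$year)
```

Use pull() or the braces when you only need the vector of years; use mutate() when the year should stay attached to the rest of the data.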
Having named your object data, I'm assuming this is what you did:
data %>%
year(date)
That also didn't work for me. You can try this:
year(data$date)
Alternatively, you could use magrittr operator %$%:
library(magrittr)
dates %$%
year(date)
I have the dataframe below and I want to display it as DT::datatable()
library(DT)
library(lubridate)
eventex<-structure(list(registration_type = c("Start", "Stopp"), timestamp = c("25.11.2022 13:19:42",
"31.10.2022 14:19:13")), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
eventex$timestamp<-as.POSIXct(paste0(eventex$timestamp), format="%d.%m.%Y %H:%M:%S")
datatable(eventex)
why is it displayed like this?
People's definition of "normal" may vary, especially when ambiguous (and irrational) conventions such as commonly used in the US are assumed. There is a formatDate function that can be applied to your datetime column. (The options are not well documented in the R package):
DT:::DateMethods
[1] "toDateString" "toISOString" "toLocaleDateString" "toLocaleString" "toLocaleTimeString"
[6] "toString" "toTimeString" "toUTCString"
datatable(eventex) %>% formatDate(~timestamp, "toLocaleString" )
Further information and worked examples can be found here: https://rstudio.github.io/DT/functions.html
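As an alternative to formatDate, and not part of the answer above, the timestamp can be turned into text in R before the table is built. This is only a sketch: the timestamp_label column name and the tz = "UTC" setting are assumptions made to keep the output reproducible.

```r
library(DT)

eventex <- data.frame(
  registration_type = c("Start", "Stopp"),
  timestamp = c("25.11.2022 13:19:42", "31.10.2022 14:19:13")
)

# Parse with an explicit time zone (assumed UTC here) so the result does not
# depend on the machine's locale settings
eventex$timestamp <- as.POSIXct(eventex$timestamp,
                                format = "%d.%m.%Y %H:%M:%S", tz = "UTC")

# Hypothetical helper column: the display text is fixed in R, not in the browser
eventex$timestamp_label <- format(eventex$timestamp, "%Y-%m-%d %H:%M:%S")

datatable(eventex[, c("registration_type", "timestamp_label")])
```

Because the label is plain text, the table sorts it alphabetically. With a year-month-day layout that order matches chronological order.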
I would like to transform EncounterDate_G column to be in the format of Year-month-day eg: 2021-1-24. I tried messing around with the mdy and ymd functions as part of lubridate but I'm not having much luck.
library(dplyr)
library(tidyr)
library(lubridate)
gwlfullflattened2 <-
structure(list(V1_G = 1:20, mrn_G = 100:119, Full.name_G = c("Aintzane Eilert",
"Dervila Muriel", "Hermes Ingólfr", "Yordana Hadley", "Talaat Archembald",
"Erato Hozan", "Abram Eli", "Drahoslava Gottfrid", "Itxaro Csenge",
"Isokrates Linas", "Yejide Calixto", "Bohuslav Fedlimid", "Siva Jerneja",
"Mae Albie", "Rodolfo Slavomír", "Neptune Mahesh", "Madhavi Luka",
"Lexia Lành", "Marnie Urien", "Hovsep Tase"), date_of_birth_G = c("1/1/1990",
"1/1/1991", "1/1/1992", "1/1/1993", "1/1/1994", "1/1/1995", "1/1/1996",
"1/1/1997", "1/1/1998", "1/1/1999", "1/1/2000", "1/1/2001", "1/1/2002",
"1/1/2003", "1/1/2004", "1/1/2005", "1/1/2006", "1/1/2007", "1/1/2008",
"1/1/2009"), EncounterDate_G = c("1/5/2016", "1/4/2021", "1/21/2021",
"5/25/2021", "5/19/2021", "5/17/2021", "12/2/2021", "12/1/2021",
"1/5/2016", "1/4/2021", "1/21/2021", "5/25/2021", "5/19/2021",
"5/17/2021", "12/2/2021", "12/1/2021", "1/5/2016", "1/4/2021",
"1/21/2021", "5/25/2021")), row.names = c(NA, -20L), class = c("data.table",
"data.frame"))
gwlfullflattened22 <-
gwlfullflattened2 %>%
mutate(across(c(EncounterDate_G), mdy())) %>%
mutate(across(c(EncounterDate_G), as_datetime)) %>%
as_tibble()
The error I get is
Error: Problem with mutate() input ..1.
i ..1 = across(c(EncounterDate_G), mdy()).
x Problem with across() input .fns.
i .fns must be NULL, a function, a formula, or a list of functions/formulas.
You can convert the strings to Date objects with as.Date(vector, format).
A simple version:
gwlfullflattened2$EncounterDate_G <- as.Date(gwlfullflattened2$EncounterDate_G, format = "%m/%d/%Y")
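If you prefer to stay with across(), the error in the question comes from writing mdy() with parentheses. That calls the function immediately instead of handing the function itself to across(). A minimal sketch on a two-row stand-in for the data (the small data frame is made up for illustration):

```r
library(dplyr)
library(lubridate)

# Two-row stand-in for gwlfullflattened2 with the same column names
df <- data.frame(
  date_of_birth_G = c("1/1/1990", "1/1/1991"),
  EncounterDate_G = c("1/5/2016", "12/2/2021")
)

# Pass the function itself (mdy), not the result of calling it (mdy())
out <- df %>%
  mutate(across(c(EncounterDate_G, date_of_birth_G), mdy))

class(out$EncounterDate_G)
```

Date columns print as year-month-day with zero-padded month and day. If the unpadded form from the question is really required, that is a formatting step applied to the Date afterwards.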
I have the following dataframe/tibble sample:
structure(list(name = c("Contents.Key", "Contents.LastModified",
"Contents.ETag", "Contents.Size", "Contents.Owner", "Contents.StorageClass",
"Contents.Bucket", "Contents.Key", "Contents.LastModified", "Contents.ETag"
), value = c("2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0_0e94e664-4d5e-4646-b2b9-1937398cfaed_2019-01-01-07-54-46-064",
"2019-01-01T07:54:47.000Z", "\"378d04496cb27d93e1c37e1511a79ec7\"",
"24187", "e7c0d260939d15d18866126da3376642e2d4497f18ed762b608ed2307778bdf1",
"STANDARD", "vfevvv-edrfvevevev-streamed-data", "2019/01/01/07/556662_cba3a4fc-cb8f-4150-859f-5f21a38373d0_33a8ba28-245c-490b-99b2-254507431d47_2019-01-01-07-54-56-755",
"2019-01-01T07:54:57.000Z", "\"df8cc7082e0cc991aa24542e2576277b\""
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
I want to spread the name column using the tidyr::spread() function, but I don't get the desired result:
df %>% tidyr::spread(key = name, value = value)
I get an error:
Error: Duplicate identifiers for rows:...
I also tried the melt function, with the same result.
I connected to S3 using the aws.s3::get_bucket() function and am trying to convert the result to a dataframe. I am aware there is an aws.s3::get_bucket_df() function which should do this, but it doesn't work (you may look at my relevant question).
After getting the bucket list, I unlisted it and ran the enframe command.
Please advise.
You can add a row-number column first so that every row has a unique identifier. This introduces NAs, which you will then have to deal with.
df %>%
mutate(RN=row_number()) %>%
group_by(RN) %>%
spread(name,value)
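A variation that gives one row per S3 object instead of one row per original line is to number the records rather than the rows. The sketch below assumes that every record starts with a Contents.Key entry, which holds in the sample but is not guaranteed by the question. The record column name and the shortened values are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Shortened stand-in for the sample data (values abbreviated for readability)
df <- tibble(
  name = c("Contents.Key", "Contents.LastModified", "Contents.ETag", "Contents.Size",
           "Contents.Key", "Contents.LastModified", "Contents.ETag"),
  value = c("key-1", "2019-01-01T07:54:47.000Z", "etag-1", "24187",
            "key-2", "2019-01-01T07:54:57.000Z", "etag-2")
)

wide <- df %>%
  # Assumption: a new record begins at each Contents.Key row
  mutate(record = cumsum(name == "Contents.Key")) %>%
  pivot_wider(names_from = name, values_from = value)

wide
```

Fields missing from a record, such as Contents.Size in the truncated second record, come out as NA.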
I have times recorded in mm:ss format, where the minutes value can be greater than 59, and I have parsed them as chr. I need to sort my observations in descending order, so I first created a time2 variable with the ms function and used arrange on it. However, arranging doesn't work and the values in the second column are completely mixed up.
library(tidyverse)
library(lubridate)
test <- structure(list(time = c("00:58", "07:30", "08:07", "15:45", "19:30",
"24:30", "30:05", "35:46", "42:23", "45:30", "48:08", "52:01",
"63:45", "67:42", "80:12", "86:36", "87:51", "04:27", "09:34",
"12:33", "18:03", "20:28", "21:39", "23:31", "24:02", "26:28",
"31:13", "43:03", "44:00", "45:38")), .Names = "time", row.names = c(NA,
-30L), class = c("tbl_df", "tbl", "data.frame"))
test %>% mutate(time2 = ms(time)) %>% arrange(time2) %>% View()
How can I arrange these times?
I think it would be easier to just put the time in a single unit and then arrange(). Try this:
test %>% mutate(time_in_seconds = minute(ms(time) )*60 + second(ms(time))) %>%
arrange(desc(time_in_seconds)) %>%
View()
seconds_to_period(minute(ms(test$time))*60 + second(ms(test$time))) # to get right format (with hours)
This is a known limitation with arrange. dplyr does not support S4 objects: https://github.com/tidyverse/lubridate/issues/515
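A shorter route to the same numeric sort key is lubridate's period_to_seconds(), which turns the whole Period into seconds in one call. A sketch on four of the times from the question (the secs column name is an assumption):

```r
library(dplyr)
library(lubridate)

# A few of the times from the question, including minutes above 59
test <- tibble(time = c("00:58", "63:45", "07:30", "87:51"))

sorted <- test %>%
  # Sort on a plain numeric column instead of the S4 Period object
  mutate(secs = period_to_seconds(ms(time))) %>%
  arrange(desc(secs))

sorted
```

Sorting on a plain numeric column avoids the arrange limitation with Period objects described above.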
I have some water quality sample data.
> dput(GrowingArealog90s[1:10,])
structure(list(SampleDate = structure(c(6948, 6949, 6950, 7516,
7517, 7782, 7783, 7784, 8092, 8106), class = "Date"), Flog90 = c(1.51851393987789,
1.48970743802793, 1.81243963000062, 0.273575501327576, 0.874218895695207,
1.89762709129044, 1.44012088794774, 0.301029995663981, 1.23603370361931,
0.301029995663981)), .Names = c("SampleDate", "Flog90"), class = c("tbl_df",
"data.frame"), row.names = c(NA, -10L))
This data is collected monthly, although some months are missed over the 25 year period.
I know there is so much help out there for converting dates to different formats but I have not been able to figure this out. I want to create a time series with just a month/year format, so that I can do things like decompose the data by month and run seasonal kendalls and such. I have tried so many different ways of converting my date to the desired format that I have completely confused myself. I don't care about the exact format as long as it is recognized month/year.
I also need to fill in the missing months with NAs.
I tried uploading the "SampleDate" column in a numeric format, "yyyymm". I could then merge that data frame with another that contained all the dates I need.
GA90 <- merge(Dates, GrowingArealog90s, by.x = "Date", by.y = "Date", all.x = TRUE)
However, when I converted the resulting data frame to a time series it would not recognize the 12 month frequency.
GA90ts <- as.ts(GA90, frequency(12))
> GA90ts
Time Series:
Start = 1
End = 324
Frequency = 1
Any help with this is appreciated.
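Here's how to do it with zoo. You'll get a warning because several samples fall in the same month, which makes the yearmon index non-unique, but it's OK for now. You'll get a series indexed by month and year.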
series <-structure(list(SampleDate = structure(c(6948, 6949, 6950, 7516,
7517, 7782, 7783, 7784, 8092, 8106), class = "Date"), Flog90 = c(1.51851393987789,
1.48970743802793, 1.81243963000062, 0.273575501327576, 0.874218895695207,
1.89762709129044, 1.44012088794774, 0.301029995663981, 1.23603370361931,
0.301029995663981)), .Names = c("SampleDate", "Flog90"), class = c("tbl_df",
"data.frame"), row.names = c(NA, -10L))
library(zoo)
series <-as.data.frame(series) #to drop dplyr class
series.zoo <-zoo(series[,-1,drop=FALSE],as.yearmon(series[,1]))
Best practice would be to keep your series with the actual dates and use as.yearmon only when you actually need to make calculations or aggregate.zoo by month and year.
The following is a matter of taste, but I've dealt with a lot of time series and I think zoo is superior to ts and xts. Much more flexible.
Now, to fill in missing values, you have to create a vector of dates. Here, I'm using a zoo object with actual dates. I then use na.locf, which is "last observation carry forward". You could also look at na.approx.
series.zoo <-zoo(series[,-1,drop=FALSE],(series[,1]))
my.seq <-seq.Date(min(series[,1]), max(series[,1]),by="month")
merged <-merge.zoo(series.zoo,zoo(,my.seq))
na.locf(merged)
UPDATE
With aggregate.
GrowingArealog90s <-structure(list(SampleDate = structure(c(6948, 6949, 6950, 7516,
7517, 7782, 7783, 7784, 8092, 8106), class = "Date"), Flog90 = c(1.51851393987789,
1.48970743802793, 1.81243963000062, 0.273575501327576, 0.874218895695207,
1.89762709129044, 1.44012088794774, 0.301029995663981, 1.23603370361931,
0.301029995663981)), .Names = c("SampleDate", "Flog90"), class = c("tbl_df",
"data.frame"), row.names = c(NA, -10L))
library(zoo);library(xts)
GrowingArealog90s <-as.data.frame(GrowingArealog90s) #to remove dplyr format
GrowingArealog90s.zoo <-zoo(GrowingArealog90s[,-1,drop=FALSE],as.Date(GrowingArealog90s[,1]))
#First aggregate by month. I chose to get the mean per month
GrowingArealog90s.agg <-aggregate(GrowingArealog90s.zoo, as.yearmon, mean) #replace mean with last to get last reading of the month
#Then create a sequence of months and merge it
my.seq <-seq.Date(first(GrowingArealog90s[,1]), last(GrowingArealog90s[,1]),by="month")
merged <-merge.zoo(GrowingArealog90s.agg ,zoo(,as.yearmon(my.seq)))
na.locf(merged)
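On the original problem of the series reporting Frequency = 1: as far as I can tell, frequency(12) in the question is a function call that queries the frequency of the number 12 rather than setting one, so nothing tells as.ts that the data are monthly. A base R sketch of one way to get a monthly ts with NAs for missing months, using the first five samples from the question with rounded values (the intermediate names are assumptions):

```r
# First five samples from the question, values rounded for brevity
samples <- data.frame(
  SampleDate = as.Date(c(6948, 6949, 6950, 7516, 7517), origin = "1970-01-01"),
  Flog90 = c(1.52, 1.49, 1.81, 0.27, 0.87)
)

# One value per month: here the monthly mean
samples$month <- format(samples$SampleDate, "%Y-%m")
monthly <- aggregate(Flog90 ~ month, data = samples, FUN = mean)

# Every month between the first and last sample, so gaps become NA after the merge
first_month <- as.Date(format(min(samples$SampleDate), "%Y-%m-01"))
last_month <- as.Date(format(max(samples$SampleDate), "%Y-%m-01"))
all_months <- data.frame(
  month = format(seq(first_month, last_month, by = "month"), "%Y-%m")
)
full <- merge(all_months, monthly, by = "month", all.x = TRUE)
full <- full[order(full$month), ]

# frequency is set by name, and start is given as c(year, month)
start_ym <- as.numeric(c(format(first_month, "%Y"), format(first_month, "%m")))
GA90ts <- ts(full$Flog90, start = start_ym, frequency = 12)

frequency(GA90ts)
```

Functions such as decompose() need the gaps filled first, for example with na.locf or na.approx as described above.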