I have a data frame with a "month" column holding the month number, but the months that have no data are missing from it entirely.
I would like every month of the year to be present, with 0 filled in for the months that have no data.
Here is an example of my data frame:
(It has a "yearmonth" column (2015 jui - 2022 mai), a "month" column, a "year" column and an "n" column.)
Can you help me? Thanks in advance!
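If you have set your locale to French (using Sys.setlocale()), this will output the completed data frame with the month names in French, using complete() from tidyr (loaded with the tidyverse). The output shown below was generated in an English locale, hence the English month abbreviations: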
# copy of the data -----------------------
df <- structure(list(yearmonth = c("2015 jui", "2015 aoû", "2015 sep",
"2015 nov", "2015 déc", "2016 fév", "2016 mar", "2016 avr",
"2016 jul", "2016 aoû", "2016 oct", "2016 nov", "2017 fév",
"2017 jui", "2017 jul", "2017 aoû", "2017 nov", "2017 déc",
"2018 avr", "2018 mai", "2018 aoû", "2018 sep", "2019 jan",
"2019 fév", "2019 mar", "2019 jul", "2019 aoû", "2019 sep"),
month = c(6, 8, 9, 11, 12, 2, 3, 4, 7, 8, 10, 11, 2, 6, 7,
8, 11, 12, 4, 5, 8, 9, 1, 2, 3, 7, 8, 9), year = c(2015,
2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2017, 2017, 2017, 2017, 2017, 2017, 2018, 2018, 2018,
2018, 2019, 2019, 2019, 2019, 2019, 2019), n = c(1, 2, 1,
1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1,
1, 3, 3, 1, 1, 1)), row.names = c(NA, -28L), class = "data.frame")
# answer ----------------------------------
library(tidyverse)
df %>%
  complete(year = 2015:2019, month = 1:12, fill = list(n = 0)) %>%
  mutate(date = as.Date(paste(year, month, "01", sep = "-")),
         yearmonth = format(date, format = "%Y %b")) %>%
  filter(between(date, as.Date("2015-06-01"), as.Date("2022-05-01"))) %>%
  select(-date)
#> # A tibble: 55 × 4
#> year month yearmonth n
#> <dbl> <dbl> <chr> <dbl>
#> 1 2015 6 2015 Jun 1
#> 2 2015 7 2015 Jul 0
#> 3 2015 8 2015 Aug 2
#> 4 2015 9 2015 Sep 1
#> 5 2015 10 2015 Oct 0
#> 6 2015 11 2015 Nov 1
#> 7 2015 12 2015 Dec 1
#> 8 2016 1 2016 Jan 0
#> 9 2016 2 2016 Feb 1
#> 10 2016 3 2016 Mar 1
#> # … with 45 more rows
Created on 2022-06-19 by the reprex package (v2.0.1)
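For readers who prefer to avoid the tidyverse, the same completion can be sketched in base R with expand.grid() and merge(). This is only an illustration: df_small is a hypothetical three-row subset of the question's data, and the date range is shortened to match it.

```r
# Hypothetical three-row subset of the question's data (assumption, for brevity)
df_small <- data.frame(
  year  = c(2015, 2015, 2016),
  month = c(6, 8, 2),
  n     = c(1, 2, 1)
)

# Every year/month combination that should appear in the result
grid <- expand.grid(month = 1:12, year = 2015:2016)

# Left join: combinations without data get NA in n
full <- merge(grid, df_small, by = c("year", "month"), all.x = TRUE)

# Replace the NAs with 0 and sort chronologically
full$n[is.na(full$n)] <- 0
full <- full[order(full$year, full$month), ]

# Keep only the observed range (June 2015 to February 2016 in this subset)
full$date <- as.Date(sprintf("%d-%02d-01", full$year, full$month))
full <- full[full$date >= as.Date("2015-06-01") &
             full$date <= as.Date("2016-02-01"), ]
```

The join leaves NA wherever a month had no row, which is why the explicit replacement with 0 is needed; complete() does the same thing through its fill argument.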
I want to reshape my Date column so that it is ordered month-wise. Currently the first 12 rows of the Date column contain all the months of the year 2001, each with its year, and the following rows repeat this pattern for every year up to 2017.
I want all Januaries of all years to come first in the Date column, then all Februaries, then all Marches, and so on, up to the Decembers of all years (2001-2017).
I need a format like this:
1 2001 1
2 2002 1
3 2003 1
4 2004 1
5 2005 1
6 2006 1
7 2007 1
8 2008 1
9 2009 1
10 2010 1
11 2011 1
12 2012 1
13 2013 1
14 2014 1
15 2015 1
16 2016 1
17 2017 1
18 2001 2
19 2002 2
20 2003 2
21 2004 2
22 2005 2
23 2006 2
24 2007 2
This is my dataframe I generated using dput()
#load packages that might be required
> library(tidyverse)
> library(xts)
> # Read NDVI
> NDVI <- read.csv("processed//NDVI.csv")
> dput(NDVI)
structure(list(Date = c("Jan 1, 2001", "Feb 1, 2001", "Mar 1, 2001",
"Apr 1, 2001", "May 1, 2001", "Jun 1, 2001", "Jul 1, 2001", "Aug 1, 2001",
"Sep 1, 2001", "Oct 1, 2001", "Nov 1, 2001", "Dec 1, 2001", "Jan 1, 2002",
"Feb 1, 2002", "Mar 1, 2002", "Apr 1, 2002", "May 1, 2002", "Jun 1, 2002",
"Jul 1, 2002", "Aug 1, 2002", "Sep 1, 2002", "Oct 1, 2002", "Nov 1, 2002",
"Dec 1, 2002", "Jan 1, 2003", "Feb 1, 2003", "Mar 1, 2003", "Apr 1, 2003",
"May 1, 2003", "Jun 1, 2003", "Jul 1, 2003", "Aug 1, 2003", "Sep 1, 2003",
"Oct 1, 2003", "Nov 1, 2003", "Dec 1, 2003", "Jan 1, 2004", "Feb 1, 2004",
"Mar 1, 2004", "Apr 1, 2004", "May 1, 2004", "Jun 1, 2004", "Jul 1, 2004",
"Aug 1, 2004", "Sep 1, 2004", "Oct 1, 2004", "Nov 1, 2004", "Dec 1, 2004",
"Jan 1, 2005", "Feb 1, 2005", "Mar 1, 2005", "Apr 1, 2005", "May 1, 2005",
"Jun 1, 2005", "Jul 1, 2005", "Aug 1, 2005", "Sep 1, 2005", "Oct 1, 2005",
"Nov 1, 2005", "Dec 1, 2005", "Jan 1, 2006", "Feb 1, 2006", "Mar 1, 2006",
"Apr 1, 2006", "May 1, 2006", "Jun 1, 2006", "Jul 1, 2006", "Aug 1, 2006",
"Sep 1, 2006", "Oct 1, 2006", "Nov 1, 2006", "Dec 1, 2006", "Jan 1, 2007",
"Feb 1, 2007", "Mar 1, 2007", "Apr 1, 2007", "May 1, 2007", "Jun 1, 2007",
"Jul 1, 2007", "Aug 1, 2007", "Sep 1, 2007", "Oct 1, 2007", "Nov 1, 2007",
"Dec 1, 2007", "Jan 1, 2008", "Feb 1, 2008", "Mar 1, 2008", "Apr 1, 2008",
"May 1, 2008", "Jun 1, 2008", "Jul 1, 2008", "Aug 1, 2008", "Sep 1, 2008",
"Oct 1, 2008", "Nov 1, 2008", "Dec 1, 2008", "Jan 1, 2009", "Feb 1, 2009",
"Mar 1, 2009", "Apr 1, 2009", "May 1, 2009", "Jun 1, 2009", "Jul 1, 2009",
"Aug 1, 2009", "Sep 1, 2009", "Oct 1, 2009", "Nov 1, 2009", "Dec 1, 2009",
"Jan 1, 2010", "Feb 1, 2010", "Mar 1, 2010", "Apr 1, 2010", "May 1, 2010",
"Jun 1, 2010", "Jul 1, 2010", "Aug 1, 2010", "Sep 1, 2010", "Oct 1, 2010",
"Nov 1, 2010", "Dec 1, 2010", "Jan 1, 2011", "Feb 1, 2011", "Mar 1, 2011",
"Apr 1, 2011", "May 1, 2011", "Jun 1, 2011", "Jul 1, 2011", "Aug 1, 2011",
"Sep 1, 2011", "Oct 1, 2011", "Nov 1, 2011", "Dec 1, 2011", "Jan 1, 2012",
"Feb 1, 2012", "Mar 1, 2012", "Apr 1, 2012", "May 1, 2012", "Jun 1, 2012",
"Jul 1, 2012", "Aug 1, 2012", "Sep 1, 2012", "Oct 1, 2012", "Nov 1, 2012",
"Dec 1, 2012", "Jan 1, 2013", "Feb 1, 2013", "Mar 1, 2013", "Apr 1, 2013",
"May 1, 2013", "Jun 1, 2013", "Jul 1, 2013", "Aug 1, 2013", "Sep 1, 2013",
"Oct 1, 2013", "Nov 1, 2013", "Dec 1, 2013", "Jan 1, 2014", "Feb 1, 2014",
"Mar 1, 2014", "Apr 1, 2014", "May 1, 2014", "Jun 1, 2014", "Jul 1, 2014",
"Aug 1, 2014", "Sep 1, 2014", "Oct 1, 2014", "Nov 1, 2014", "Dec 1, 2014",
"Jan 1, 2015", "Feb 1, 2015", "Mar 1, 2015", "Apr 1, 2015", "May 1, 2015",
"Jun 1, 2015", "Jul 1, 2015", "Aug 1, 2015", "Sep 1, 2015", "Oct 1, 2015",
"Nov 1, 2015", "Dec 1, 2015", "Jan 1, 2016", "Feb 1, 2016", "Mar 1, 2016",
"Apr 1, 2016", "May 1, 2016", "Jun 1, 2016", "Jul 1, 2016", "Aug 1, 2016",
"Sep 1, 2016", "Oct 1, 2016", "Nov 1, 2016", "Dec 1, 2016", "Jan 1, 2017",
"Feb 1, 2017", "Mar 1, 2017", "Apr 1, 2017", "May 1, 2017", "Jun 1, 2017",
"Jul 1, 2017", "Aug 1, 2017", "Sep 1, 2017", "Oct 1, 2017", "Nov 1, 2017",
"Dec 1, 2017"), NDVI = c(0.722, 0.418, 0.435, 0.568, 0.616, 0.75,
0.62, 0.516, 0.389, 0.484, 0.65, 0.7, 0.533, 0.327, 0.279, 0.43,
0.701, 0.797, 0.748, 0.569, 0.597, 0.662, 0.794, 0.706, 0.537,
0.338, 0.404, 0.461, 0.759, 0.76, 0.578, 0.408, 0.56, 0.62, 0.684,
0.785, 0.539, 0.442, 0.307, 0.461, 0.593, 0.739, 0.674, 0.495,
0.603, 0.712, 0.776, 0.665, 0.522, 0.333, 0.398, 0.28, 0.713,
0.746, 0.757, 0.549, 0.711, 0.685, 0.693, 0.716, 0.617, 0.476,
0.407, 0.325, 0.555, 0.757, 0.622, 0.544, 0.452, 0.586, 0.761,
0.649, 0.635, 0.472, 0.351, 0.402, 0.662, 0.757, 0.712, 0.69,
0.618, 0.632, 0.797, 0.778, 0.543, 0.401, 0.393, 0.556, 0.68,
0.784, 0.659, 0.654, 0.412, 0.653, 0.694, 0.727, 0.715, 0.55,
0.641, 0.443, 0.709, 0.818, 0.646, 0.59, 0.39, 0.562, 0.67, 0.56,
0.476, 0.312, 0.349, 0.405, 0.705, 0.766, 0.665, 0.577, 0.614,
0.518, 0.718, 0.779, 0.792, 0.547, 0.585, 0.373, 0.605, 0.715,
0.779, 0.758, 0.751, 0.77, 0.752, 0.805, 0.521, 0.471, 0.561,
0.595, 0.688, 0.74, 0.77, 0.69, 0.704, 0.613, 0.653, 0.803, 0.63,
0.472, 0.589, 0.512, 0.562, 0.699, 0.814, 0.637, 0.395, 0.462,
0.419, 0.742, 0.498, 0.35, 0.308, 0.321, 0.597, 0.677, 0.729,
0.526, 0.394, 0.567, 0.806, 0.752, 0.703, 0.495, 0.453, 0.48,
0.593, 0.731, 0.799, 0.619, 0.403, 0.396, 0.556, 0.743, 0.717,
0.504, 0.408, 0.356, 0.215, 0.68, 0.487, 0.594, 0.392, 0.566,
0.547, 0.632, 0.604, 0.528, 0.424, 0.627, 0.448, 0.788, 0.764,
0.592, 0.517, 0.274, 0.577, 0.754)), class = "data.frame", row.names = c(NA,
-204L))
library(dplyr)
library(tidyr)
library(lubridate)
# NDVI is the data frame from the question
NDVI %>%
  mutate(Date = mdy(Date)) %>%
  separate(Date, into = c("year", "month", "day"), remove = FALSE) %>%
  arrange(month, day, year)
# Date year month day NDVI
# 1 2001-01-01 2001 01 01 0.722
# 2 2002-01-01 2002 01 01 0.533
# 3 2003-01-01 2003 01 01 0.537
# 4 2004-01-01 2004 01 01 0.539
# 5 2005-01-01 2005 01 01 0.522
# 6 2006-01-01 2006 01 01 0.617
# ...
This is not generally called a "reshaping" operation - this is sorting/ordering/arranging rows. Reshaping/pivoting is when something that is a column becomes a row, or vice versa.
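The same ordering can also be sketched in base R with order(). The example below is an illustration under stated assumptions: ndvi_small is a hypothetical four-row subset of the question's data, and the month is looked up in the built-in month.abb vector so that the result does not depend on the session's locale.

```r
# Hypothetical four-row subset of the question's data (assumption, for brevity)
ndvi_small <- data.frame(
  Date = c("Jan 1, 2001", "Feb 1, 2001", "Jan 1, 2002", "Feb 1, 2002"),
  NDVI = c(0.722, 0.418, 0.533, 0.327),
  stringsAsFactors = FALSE
)

# Month number from the three-letter English abbreviation at the start
month <- match(substr(ndvi_small$Date, 1, 3), month.abb)

# Year from the digits after the last comma
year <- as.integer(sub(".*, ", "", ndvi_small$Date))

# Sort by month first, then by year within each month
sorted <- ndvi_small[order(month, year), ]
```

Here sorted lists both Januaries first (2001, then 2002), followed by both Februaries.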
I'm trying to order the groups in ascending order of the 'first' time point, but I can't seem to override the alphabetical ordering:
ggplot(total, aes(fill = reorder(timepoint, sumofpct), y = sumofpct, x = CountryCode)) +
  geom_bar(position = "dodge", stat = "identity") +
  scale_fill_manual(values = c("violetred3", "steelblue")) +
  geom_text(aes(label = sumofpct), hjust = 1, color = "black", size = 5, position = position_dodge(width = 0.9)) +
  xlab("") + ylab("Survey response (%)") +
  theme_classic() +
  theme(axis.text = element_text(size = 25), axis.title = element_text(size = 20), legend.text = element_text(size = 25)) +
  coord_flip()
Is there a way to order this so that it is in ascending order of 'first'?
Data structure
structure(list(CountryCode = c("AUS", "CAN", "DEU", "DNK", "ESP",
"FRA", "ITA", "JPN", "KOR", "NHL", "NOR", "SGP", "SWE", "UK",
"AUS", "CAN", "DEU", "DNK", "ESP", "FRA", "ITA", "JPN", "KOR",
"NHL", "NOR", "SGP", "SWE", "UK"), Date = c("Mar 30 - Apr 05 (2010)",
"Mar 30 - Apr 05 (2010)", "Mar 30 - Apr 05 (2010)", "Apr 06 - Apr 12 (2010)",
"Mar 30 - Apr 05 (2010)", "Mar 30 - Apr 05 (2010)", "Mar 30 - Apr 05 (2010)",
"Mar 30 - Apr 05 (2010)", "Mar 30 - Apr 05 (2010)", "Apr 06 - Apr 12 (2010)",
"Apr 06 - Apr 12 (2010)", "Mar 30 - Apr 05 (2010)", "Mar 30 - Apr 05 (2010)",
"Mar 30 - Apr 05 (2010)", "Mar 22 - Mar 28 (2000)", "Mar 22 - Mar 28 (2000)",
"Apr 05 - Apr 11 (2000)", "Mar 22 - Mar 28 (2000)", "Apr 05 - Apr 11 (2000)",
"Apr 05 - Apr 11 (2000)", "Apr 05 - Apr 11 (2000)", "Mar 29 - Apr 04 (2000)",
"Mar 22 - Mar 28 (2000)", "Feb 08 - Feb 14 (2000)", "Mar 22 - Mar 28 (2000)",
"Mar 22 - Mar 28 (2000)", "Apr 05 - Apr 11 (2000)", "Apr 05 - Apr 11 (2000)"
), sumofpct = c(94, 95, 92, 90, 96, 95, 97, 83, 95, 89, 92, 91,
91, 96, 89, 95, 90, 89, 95, 93, 95, 84, 94, 85, 91, 86, 88, 93
), timepoint = c("first", "first", "first", "first", "first",
"first", "first", "first", "first", "first", "first", "first",
"first", "first", "last", "last", "last", "last", "last", "last",
"last", "last", "last", "last", "last", "last", "last", "last"
)), row.names = c(NA, -28L), groups = structure(list(CountryCode = c("AUS",
"CAN", "DEU", "DNK", "ESP", "FRA", "ITA", "JPN", "KOR", "NHL",
"NOR", "SGP", "SWE", "UK"), .rows = structure(list(c(1L, 15L),
c(2L, 16L), c(3L, 17L), c(4L, 18L), c(5L, 19L), c(6L, 20L
), c(7L, 21L), c(8L, 22L), c(9L, 23L), c(10L, 24L), c(11L,
25L), c(12L, 26L), c(13L, 27L), c(14L, 28L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 14L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
Thanks!
You can try the following (total is the data frame from the question):
# reorder(..., max) sorts the countries by their larger value across both time points
ggplot(total, aes(x = reorder(CountryCode, sumofpct, max), y = sumofpct,
                  fill = factor(timepoint, levels = c("last", "first")))) +
  geom_col(position = "dodge") +  # geom_col() is shorthand for geom_bar(stat = "identity")
  geom_text(aes(label = sumofpct), hjust = 1, color = "black", size = 5,
            position = position_dodge(width = 0.9)) +
  scale_fill_manual("timepoint", values = c("violetred3", "steelblue"),
                    guide = guide_legend(reverse = TRUE)) +
  coord_flip()
Instead of reorder() you can use forcats::fct_reorder2(CountryCode, timepoint == "first", -sumofpct), which orders the countries by their sumofpct value at the "first" time point.
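If you would rather make the ordering explicit, one option is to compute the factor levels yourself from the "first" rows. The sketch below uses total_small, a hypothetical three-country subset of the question's data, and only base R; the variable names first_rows and country_order are introduced here for illustration.

```r
# Hypothetical three-country subset of the question's data (assumption)
total_small <- data.frame(
  CountryCode = c("AUS", "CAN", "JPN", "AUS", "CAN", "JPN"),
  sumofpct    = c(94, 95, 83, 89, 95, 84),
  timepoint   = c("first", "first", "first", "last", "last", "last"),
  stringsAsFactors = FALSE
)

# Rows for the "first" time point only
first_rows <- total_small[total_small$timepoint == "first", ]

# Country codes sorted by ascending sumofpct at "first"
country_order <- first_rows$CountryCode[order(first_rows$sumofpct)]

# Fix the level order so ggplot2 no longer falls back to alphabetical order
total_small$CountryCode <- factor(total_small$CountryCode,
                                  levels = country_order)

levels(total_small$CountryCode)
# "JPN" "AUS" "CAN"
```

The resulting factor can be mapped to x in the ggplot() call in place of reorder(CountryCode, sumofpct, max). With coord_flip(), the first level is drawn at the bottom of the axis.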
I usually face problems with dataframes that have many columns.
For example, trying to calculate the Mann-Kendall test:
library(tidyverse)
library(Kendall)
df <- tibble::tribble(
~Season, ~Luxembourg, ~Germany, ~France, ~Russia, ~USA, ~Spain,
"Summer 2000", 29, 88, NA, NA, 31, 10,
"Summer 2001", 134, 36, 23, NA, 37, 4,
"Summer 2002", 22, 9, 10, NA, 7, 3,
"Summer 2003", 40, 11, 19, NA, 16, 6,
"Summer 2004", 74, 19, 26, NA, 27, 9,
"Summer 2005", 16, 8, 14, NA, 8, 4,
"Summer 2006", 191, 22, 32, NA, 32, 16,
"Summer 2007", 15, 7, 13, NA, 14, 4,
"Summer 2008", 43, 48, 22, NA, 27, 14,
"Summer 2009", 16, 35, 14, NA, 7, 9,
"Summer 2010", 29, 8, 25, 18, 11, 7,
"Summer 2011", 10, 4, 6, 12, 3, 3,
"Summer 2012", 55, 33, 11, 17, 46, 23,
"Summer 2013", 62, 57, 23, 70, 45, 21,
"Summer 2014", 65, 7, 6, 21, 12, 7,
"Summer 2015", 74, 11, 43, 110, 21, 11,
"Summer 2016", 85, 35, 79, 89, 30, 20,
"Summer 2017", NA, 11, 12, 14, 10, 6,
"Summer 2018", NA, 22, 11, 30, 18, 12,
"Summer 2019", NA, 20, 11, 25, NA, 14
)
Mk_Luxembourg <- MannKendall(df$Luxembourg)
Mk_Germany <- MannKendall(df$Germany)
Mk_France <- MannKendall(df$France)
Mk_Russia <- MannKendall(df$Russia)
Mk_USA <- MannKendall(df$USA)
Mk_Spain <- MannKendall(df$Spain)
Mk_results <- as.matrix(bind_rows(unlist(Mk_Luxembourg),
unlist(Mk_Germany),
unlist(Mk_France),
unlist(Mk_Russia),
unlist(Mk_USA),
unlist(Mk_Spain)))
row.names(Mk_results) <- c("Mk_Luxembourg", "Mk_Germany", "Mk_France", "Mk_Russia", "Mk_USA", "Mk_Spain" )
Mk_results
#> country tau sl S D varS
#> 1 France -0.113 0.527 -19 168. 810.
#> 2 Germany -0.0697 0.696 -13 186. 942.
#> 3 Luxembourg 0.171 0.364 23 134. 586.
#> 4 Russia 0.244 0.371 11 45 125
#> 5 Spain 0.280 0.0965 52 186. 941.
#> 6 USA -0.0529 0.779 -9 170. 815
This becomes very tedious when instead of 6 I have 70 variables.
Is there a simpler way to call all the variables at once?
You can use something like the following:
data.frame(
stringsAsFactors = FALSE,
Season = c("Summer 2000","Summer 2001",
"Summer 2002","Summer 2003","Summer 2004","Summer 2005",
"Summer 2006","Summer 2007","Summer 2008",
"Summer 2009","Summer 2010","Summer 2011","Summer 2012",
"Summer 2013","Summer 2014","Summer 2015","Summer 2016",
"Summer 2017","Summer 2018","Summer 2019"),
Luxembourg = c(29,134,22,40,74,16,
191,15,43,16,29,10,55,62,65,74,85,
NA,NA,NA),
Germany = c(88,36,9,11,19,8,22,
7,48,35,8,4,33,57,7,11,35,11,22,
20),
France = c(NA,23,10,19,26,14,
32,13,22,14,25,6,11,23,6,43,79,12,
11,11),
Russia = c(NA,NA,NA,NA,NA,NA,NA,NA,
NA,NA,18,12,17,70,21,110,89,14,30,
25),
USA = c(31,37,7,16,27,8,32,
14,27,7,11,3,46,45,12,21,30,10,
18,NA),
Spain = c(10,4,3,6,9,4,16,
4,14,9,7,3,23,21,7,11,20,6,12,14)
) -> df
library(Kendall)
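# Apply the MannKendall function to every column except Season
# (named mk_list to avoid masking base R's try() function)
mk_list <- lapply(df[-1], MannKendall)
# Convert the list of results to a data frame, one row per country
do.call(rbind.data.frame, mk_list)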
# tau sl S D varS
# Luxembourg 0.17101437 0.36358595 23 134.4916 586.3333
# Germany -0.06971737 0.69586229 -13 186.4672 942.3333
# France -0.11311328 0.52717423 -19 167.9732 810.3333
# Russia 0.24444444 0.37109327 11 45.0000 125.0000
# USA -0.05294209 0.77930272 -9 169.9971 815.0000
# Spain 0.27963457 0.09646106 52 185.9570 941.3333
This seems to be a case where the traditional base R syntax (do.call, rbind, lapply...) looks better than the tidyverse's:
library(tidyverse)
library(Kendall)
# tidyverse: reshape to long format, run the test per country, then widen again
df %>%
  pivot_longer(-Season, names_to = "country") %>%
  group_by(country) %>%
  summarise(x = list(enframe(unlist(MannKendall(value))))) %>%
  unnest(x) %>%
  pivot_wider(names_from = name, values_from = value)
# traditional
do.call(rbind.data.frame, lapply(df[-1], MannKendall))
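When each column should yield a result of a guaranteed shape, vapply() is a stricter variant of the lapply() pattern. The sketch below is an illustration only. To stay free of package dependencies it swaps MannKendall() for a hypothetical helper, trend_tau, which computes Kendall's tau between a series and its time index with base R's cor(). It does not reproduce the p-value or the other statistics returned by the Kendall package. The two columns are copied from the question's data.

```r
# Two columns copied from the question's data
df_small <- data.frame(
  Season = paste("Summer", 2000:2019),
  Russia = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
             18, 12, 17, 70, 21, 110, 89, 14, 30, 25),
  Spain  = c(10, 4, 3, 6, 9, 4, 16, 4, 14, 9,
             7, 3, 23, 21, 7, 11, 20, 6, 12, 14),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for MannKendall(): Kendall's tau of the series
# against its time index, dropping missing observations
trend_tau <- function(x) {
  cor(x, seq_along(x), method = "kendall", use = "complete.obs")
}

# vapply() raises an error if any column does not return a single number
tau <- vapply(df_small[-1], trend_tau, numeric(1))

# One row per country
tau_table <- data.frame(country = names(tau), tau = unname(tau))
```

For Russia this gives 11/45, about 0.244, which agrees with the tau reported above. The same call works unchanged with 70 columns, since df_small[-1] selects every column except Season.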
I have a data.frame with a single column of date-time strings.
Sample data can be built with the following code:
df<-structure(list(LSAF..Date.acted.On. = structure(c(33L, 18L, 7L,
32L, 51L, 47L, 2L, 45L, 38L, 15L), .Label = c("Aug 13, 2019, 10:44:07 AM GMT-04:00",
"Aug 13, 2019, 10:45:47 AM GMT-04:00", "Aug 13, 2019, 10:47:20 AM GMT-04:00",
"Aug 13, 2019, 10:53:29 AM GMT-04:00", "Aug 13, 2019, 10:54:54 AM GMT-04:00",
"Aug 13, 2019, 10:57:11 AM GMT-04:00", "Aug 13, 2019, 10:58:13 AM GMT-04:00",
"Feb 13, 2020, 1:13:07 PM GMT-05:00", "Feb 13, 2020, 1:17:07 PM GMT-05:00",
"Feb 22, 2019, 11:13:51 AM GMT-05:00", "Jan 28, 2020, 11:19:19 AM GMT-05:00",
"Jan 28, 2020, 11:21:28 AM GMT-05:00", "Jan 7, 2020, 9:59:28 AM GMT-05:00",
"Jul 13, 2020, 9:17:01 PM GMT-04:00", "Jul 15, 2020, 9:08:54 PM GMT-04:00",
"Jul 18, 2019, 11:15:26 AM GMT-04:00", "Jul 23, 2020, 10:50:32 AM GMT-04:00",
"Mar 5, 2020, 10:00:07 AM GMT-05:00", "Mar 5, 2020, 10:01:22 AM GMT-05:00",
"Mar 5, 2020, 9:56:36 AM GMT-05:00", "Mar 5, 2020, 9:58:31 AM GMT-05:00",
"Mar 6, 2020, 1:17:17 PM GMT-05:00", "May 13, 2019, 11:07:52 AM GMT-04:00",
"May 14, 2019, 2:32:40 PM GMT-04:00", "May 16, 2019, 3:46:08 PM GMT-04:00",
"May 16, 2019, 4:23:27 PM GMT-04:00", "May 16, 2019, 4:25:40 PM GMT-04:00",
"May 16, 2019, 4:29:12 PM GMT-04:00", "May 16, 2019, 4:40:58 PM GMT-04:00",
"May 21, 2019, 2:22:07 PM GMT-04:00", "Nov 13, 2020, 4:14:55 PM GMT-05:00",
"Nov 13, 2020, 4:21:06 PM GMT-05:00", "Nov 13, 2020, 4:24:07 PM GMT-05:00",
"Nov 13, 2020, 9:10:19 AM GMT-05:00", "Nov 24, 2020, 10:51:52 AM GMT-05:00",
"Oct 15, 2020, 3:46:52 PM GMT-04:00", "Oct 15, 2020, 3:48:35 PM GMT-04:00",
"Oct 17, 2019, 9:09:20 PM GMT-04:00", "Oct 18, 2019, 11:15:51 AM GMT-04:00",
"Oct 20, 2020, 1:28:57 PM GMT-04:00", "Oct 26, 2020, 1:52:26 PM GMT-04:00",
"Oct 26, 2020, 2:15:11 PM GMT-04:00", "Oct 29, 2019, 1:55:53 PM GMT-04:00",
"Oct 9, 2019, 10:39:10 AM GMT-04:00", "Oct 9, 2019, 10:45:14 AM GMT-04:00",
"Sep 11, 2020, 2:40:34 PM GMT-04:00", "Sep 19, 2019, 2:45:46 PM GMT-04:00",
"Sep 23, 2019, 2:24:01 PM GMT-04:00", "Sep 9, 2019, 10:16:01 AM GMT-04:00",
"Sep 9, 2019, 10:22:23 AM GMT-04:00", "Sep 9, 2019, 10:25:03 AM GMT-04:00",
"Sep 9, 2019, 10:32:44 AM GMT-04:00"), class = "factor")), row.names = c(NA,
10L), class = "data.frame")
How can I modify the data so it can be used as a date variable?
Thanks.
You can use lubridate's mdy_hms(). As the output below shows, it applies the GMT offset and returns the times in UTC:
df$date_time <- lubridate::mdy_hms(df$LSAF..Date.acted.On.)
df$date <- as.Date(df$date_time)
head(df)
#                 LSAF..Date.acted.On.           date_time       date
#1  Nov 13, 2020, 4:24:07 PM GMT-05:00 2020-11-13 21:24:07 2020-11-13
#2  Mar 5, 2020, 10:00:07 AM GMT-05:00 2020-03-05 15:00:07 2020-03-05
#3 Aug 13, 2019, 10:58:13 AM GMT-04:00 2019-08-13 14:58:13 2019-08-13
#4  Nov 13, 2020, 4:21:06 PM GMT-05:00 2020-11-13 21:21:06 2020-11-13
#5  Sep 9, 2019, 10:25:03 AM GMT-04:00 2019-09-09 14:25:03 2019-09-09
#6  Sep 19, 2019, 2:45:46 PM GMT-04:00 2019-09-19 18:45:46 2019-09-19
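You could try the following, which keeps only the date as written in the string and ignores the time and the GMT offset: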
as.Date(df$LSAF..Date.acted.On.,
format = "%b %d, %Y, %I:%M:%S %p")
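The two approaches can disagree for times close to midnight, because one converts to UTC before taking the date and the other does not. The base R sketch below illustrates this with two strings taken from the question's data. It assumes an English or C locale, since %b and %p are locale-dependent, and it rewrites the offset into the "-0500" form that %z reads on input; the names x_clean, utc_date and local_date are introduced here for illustration.

```r
# Two strings taken from the question's data
x <- c("Nov 13, 2020, 4:24:07 PM GMT-05:00",
       "Oct 17, 2019, 9:09:20 PM GMT-04:00")

# Turn "GMT-05:00" into "-0500", the form that %z expects
x_clean <- sub("GMT([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)

# Parse date, time and offset; the result is expressed in UTC
date_time <- as.POSIXct(x_clean,
                        format = "%b %d, %Y, %I:%M:%S %p %z",
                        tz = "UTC")

# Date after conversion to UTC
utc_date <- as.Date(date_time)

# Date exactly as written in the string (time and offset ignored)
local_date <- as.Date(x, format = "%b %d, %Y, %I:%M:%S %p")
```

For the second string, utc_date is 2019-10-18 while local_date is 2019-10-17, so it is worth deciding which of the two you actually need.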