Merging two datasets of different lengths - r

I'm trying to merge two datasets I have.
df1:
day  month  year  lon   lat   month-year
3    5      2009  5.7   53.9  May 2009
8    9      2004  6.9   52.6  Sep 2004
15   9      2004  3.8   50.4  Sep 2004
5    5      2009  2.7   51.2  May 2009
28   7      2005  14.8  62.4  Jul 2005
18   9      2004  5.1   52.5  Sep 2004
df2:
nao-value  sign      month-year
-2.1       Negative  Sep 2004
1.3        Positive  Jul 2005
-1.1       Negative  May 2009
I want to merge these to add the NAO value for each month and year to the occurrence data, meaning I want the NAO value for each specific month repeated for all registrations of that month in the occurrence data.
The problem is that I cannot get the NAO values to line up with the matching dates in the occurrence data. The values are either just repeated without being aligned to the right date (with the date columns returned as month-year.x and month-year.y), or they come back as NA.
I have tried a few different approaches:
df3 <- merge(df1, df2, by="month-year")
df3 <- merge(cbind(df1, X=rownames(df1)), cbind(df2, variable=rownames(df2)))
df3 <- merge(df1,df2, by ="month-year", all.x = TRUE,all.y=TRUE, sort = FALSE)
df3 <- merge(df1, df2, by=intersect(df1$month-year(df1), df2$month-year(df2)))
But none of those give the result I want.
Edit to include dput:
dput(head(df1, 10)) :
structure(list(Day = c(29, 2, 14, 31, 16, 7, 25, 12, 21, 22),
Month = c(7, 7, 7, 8, 8, 7, 8, 6, 6, 9), Year = c(2010, 2015,
2010, 2018, 2016, 2018, 2019, 2004, 2015, 2019), Lon = c(-6.155014,
-5.820868, -5.509842, -5.495277, -5.469389, -5.469389, -5.469389,
-5.466995, -5.461942, -5.457127), Lat = c(59.09478, 59.125228,
57.959196, 57.96022, 57.986825, 57.986825, 57.986825, 57.874527,
57.95972, 58.07697), Date = c("Jul 2010", "Jul 2015", "Jul 2010",
"Aug 2018", "Aug 2016", "Jul 2018", "Aug 2019", "Jun 2004",
"Jun 2015", "Sep 2019")), row.names = c(NA, -10L), class =
c("tbl_df",
"tbl", "data.frame"))
dput(head(df2, 10)) :
structure(list(NAO = c(1.04, 1.41, 1.46, 2, -1.53, -0.02, 0.53,
0.97, 1.06, 0.23), Sign = c("Positive", "Positive", "Positive",
"Positive", "Negative", "Negative", "Positive", "Positive", "Positive",
"Positive"), Date = c("jan 1990", "feb 1990", "mar 1990", "apr 1990",
"mai 1990", "jun 1990", "jul 1990", "aug 1990", "sep 1990", "okt 1990"
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

merge() is case-sensitive, and the Date values in the two data frames use different cases ("Jul 2010" vs. "jul 1990"). Make the case the same in both data frames and then perform the merge. Try:
result <- merge(transform(df1, Date = tolower(Date)), df2, by = 'Date')

Using tidyverse
library(dplyr)
df1 %>%
  mutate(Date = tolower(Date)) %>%
  inner_join(df2, by = 'Date')
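One thing lowercasing alone may not cover: the df2 sample in the question contains labels such as "mai 1990" and "okt 1990", which will not match "May" or "Oct" in df1 even after tolower(). A minimal sketch of one way around that is to join on numeric month and year instead of the text label. The occ rows and the month_lookup vector below are assumptions made up for illustration (the lookup assumes Norwegian-style abbreviations); only the two NAO rows are copied from the question's dput output.

```r
# Hypothetical occurrence rows (NOT from the question), with numeric Month/Year
occ <- data.frame(
  Day   = c(3, 17, 9),
  Month = c(5, 5, 10),
  Year  = c(1990, 1990, 1990),
  Date  = c("May 1990", "May 1990", "Oct 1990")
)

# Two NAO rows copied from the question's dput output
nao <- data.frame(
  NAO  = c(-1.53, 0.23),
  Sign = c("Negative", "Positive"),
  Date = c("mai 1990", "okt 1990")
)

# Assumed lookup: lower-case month abbreviation -> month number
month_lookup <- c(jan = 1, feb = 2, mar = 3, apr = 4, mai = 5, jun = 6,
                  jul = 7, aug = 8, sep = 9, okt = 10, nov = 11, des = 12)

# Split "mai 1990" into its month label and year, then convert both to numbers
parts <- strsplit(nao$Date, " ")
nao$Month <- unname(month_lookup[vapply(parts, `[`, character(1), 1)])
nao$Year  <- as.numeric(vapply(parts, `[`, character(1), 2))

# Left join: every occurrence row keeps its place and gets that month's NAO
merged <- merge(occ, nao[c("Month", "Year", "NAO", "Sign")],
                by = c("Month", "Year"), all.x = TRUE)
merged
```

With all.x = TRUE, occurrence rows without a matching NAO month are kept and get NA, which makes unmatched months easy to spot.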

Related

How to add missing months to a column of a data frame?

I have a data frame with a "month" column, but it only contains the months that have data; the other months of the year are missing because they have 0 observations.
I would like to have all the months of the year and put 0 for the months that have no data.
Here is an example of my data frame:
(It has a "yearmonth" column (2015 jui - 2022 mai), a "month" column, a "year" column and an "n" column.)
Can you help me? Thanks in advance!
If you have set your locale to French (using Sys.setlocale()), this will output the completed data frame with the month names in French, using complete() from the tidyverse (tidyr):
# copy of the data -----------------------
df <- structure(list(yearmonth = c("2015 jui", "2015 aoû", "2015 sep",
"2015 nov", "2015 déc", "2016 fév", "2016 mar", "2016 avr",
"2016 jul", "2016 aoû", "2016 oct", "2016 nov", "2017 fév",
"2017 jui", "2017 jul", "2017 aoû", "2017 nov", "2017 déc",
"2018 avr", "2018 mai", "2018 aoû", "2018 sep", "2019 jan",
"2019 fév", "2019 mar", "2019 jul", "2019 aoû", "2019 sep"),
month = c(6, 8, 9, 11, 12, 2, 3, 4, 7, 8, 10, 11, 2, 6, 7,
8, 11, 12, 4, 5, 8, 9, 1, 2, 3, 7, 8, 9), year = c(2015,
2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016, 2016, 2016,
2016, 2017, 2017, 2017, 2017, 2017, 2017, 2018, 2018, 2018,
2018, 2019, 2019, 2019, 2019, 2019, 2019), n = c(1, 2, 1,
1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1,
1, 3, 3, 1, 1, 1)), row.names = c(NA, -28L), class = "data.frame")
# answer ----------------------------------
library(tidyverse)
df %>%
  complete(year = 2015:2019, month = 1:12, fill = list(n = 0)) %>%
  mutate(date = as.Date(paste(year, month, "01", sep = "-")),
         yearmonth = format(date, format = "%Y %b")) %>%
  filter(between(date, as.Date("2015-06-01"), as.Date("2022-05-01"))) %>%
  select(-date)
#> # A tibble: 55 × 4
#>     year month yearmonth     n
#>    <dbl> <dbl> <chr>     <dbl>
#>  1  2015     6 2015 Jun      1
#>  2  2015     7 2015 Jul      0
#>  3  2015     8 2015 Aug      2
#>  4  2015     9 2015 Sep      1
#>  5  2015    10 2015 Oct      0
#>  6  2015    11 2015 Nov      1
#>  7  2015    12 2015 Dec      1
#>  8  2016     1 2016 Jan      0
#>  9  2016     2 2016 Feb      1
#> 10  2016     3 2016 Mar      1
#> # … with 45 more rows
Created on 2022-06-19 by the reprex package (v2.0.1)
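For comparison, the same fill-in can probably be done in base R by merging the counts onto a full year/month grid. This is only a sketch, not part of the answer above: the counts rows are the 2015 rows from the question's data, while the names grid and full are made up here.

```r
# The 2015 rows from the question's data
counts <- data.frame(
  year  = c(2015, 2015, 2015, 2015, 2015),
  month = c(6, 8, 9, 11, 12),
  n     = c(1, 2, 1, 1, 1)
)

# Every year/month combination that should exist (here: all of 2015)
grid <- expand.grid(month = 1:12, year = 2015)

# Left join the counts onto the grid; months without data come back as NA
full <- merge(grid, counts, by = c("year", "month"), all.x = TRUE)

# Replace the NA counts with 0 and restore chronological order
full$n[is.na(full$n)] <- 0
full <- full[order(full$year, full$month), ]
full
```

To cover several years, pass a range such as year = 2015:2022 to expand.grid() and filter the start and end months afterwards.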

How can I reshape date column in a dataframe?

I want to reorder my Date column month by month. Currently, the first 12 rows of the Date column contain all the months of 2001 together with the year itself. The following rows follow the same pattern, up to the year 2017.
I want to rearrange the data as follows: all Januaries of all years in the first rows of the Date column, then all Februaries in the next rows, then the Marches of all years, and so on until December of all years (2001-2017).
I need a format like this:
1 2001 1
2 2002 1
3 2003 1
4 2004 1
5 2005 1
6 2006 1
7 2007 1
8 2008 1
9 2009 1
10 2010 1
11 2011 1
12 2012 1
13 2013 1
14 2014 1
15 2015 1
16 2016 1
17 2017 1
18 2001 2
19 2002 2
20 2003 2
21 2004 2
22 2005 2
23 2006 2
24 2007 2
This is my data frame, generated using dput():
# load packages that might be required
library(tidyverse)
library(xts)
# Read NDVI
NDVI <- read.csv("processed//NDVI.csv")
dput(NDVI)
structure(list(Date = c("Jan 1, 2001", "Feb 1, 2001", "Mar 1, 2001",
"Apr 1, 2001", "May 1, 2001", "Jun 1, 2001", "Jul 1, 2001", "Aug 1, 2001",
"Sep 1, 2001", "Oct 1, 2001", "Nov 1, 2001", "Dec 1, 2001", "Jan 1, 2002",
"Feb 1, 2002", "Mar 1, 2002", "Apr 1, 2002", "May 1, 2002", "Jun 1, 2002",
"Jul 1, 2002", "Aug 1, 2002", "Sep 1, 2002", "Oct 1, 2002", "Nov 1, 2002",
"Dec 1, 2002", "Jan 1, 2003", "Feb 1, 2003", "Mar 1, 2003", "Apr 1, 2003",
"May 1, 2003", "Jun 1, 2003", "Jul 1, 2003", "Aug 1, 2003", "Sep 1, 2003",
"Oct 1, 2003", "Nov 1, 2003", "Dec 1, 2003", "Jan 1, 2004", "Feb 1, 2004",
"Mar 1, 2004", "Apr 1, 2004", "May 1, 2004", "Jun 1, 2004", "Jul 1, 2004",
"Aug 1, 2004", "Sep 1, 2004", "Oct 1, 2004", "Nov 1, 2004", "Dec 1, 2004",
"Jan 1, 2005", "Feb 1, 2005", "Mar 1, 2005", "Apr 1, 2005", "May 1, 2005",
"Jun 1, 2005", "Jul 1, 2005", "Aug 1, 2005", "Sep 1, 2005", "Oct 1, 2005",
"Nov 1, 2005", "Dec 1, 2005", "Jan 1, 2006", "Feb 1, 2006", "Mar 1, 2006",
"Apr 1, 2006", "May 1, 2006", "Jun 1, 2006", "Jul 1, 2006", "Aug 1, 2006",
"Sep 1, 2006", "Oct 1, 2006", "Nov 1, 2006", "Dec 1, 2006", "Jan 1, 2007",
"Feb 1, 2007", "Mar 1, 2007", "Apr 1, 2007", "May 1, 2007", "Jun 1, 2007",
"Jul 1, 2007", "Aug 1, 2007", "Sep 1, 2007", "Oct 1, 2007", "Nov 1, 2007",
"Dec 1, 2007", "Jan 1, 2008", "Feb 1, 2008", "Mar 1, 2008", "Apr 1, 2008",
"May 1, 2008", "Jun 1, 2008", "Jul 1, 2008", "Aug 1, 2008", "Sep 1, 2008",
"Oct 1, 2008", "Nov 1, 2008", "Dec 1, 2008", "Jan 1, 2009", "Feb 1, 2009",
"Mar 1, 2009", "Apr 1, 2009", "May 1, 2009", "Jun 1, 2009", "Jul 1, 2009",
"Aug 1, 2009", "Sep 1, 2009", "Oct 1, 2009", "Nov 1, 2009", "Dec 1, 2009",
"Jan 1, 2010", "Feb 1, 2010", "Mar 1, 2010", "Apr 1, 2010", "May 1, 2010",
"Jun 1, 2010", "Jul 1, 2010", "Aug 1, 2010", "Sep 1, 2010", "Oct 1, 2010",
"Nov 1, 2010", "Dec 1, 2010", "Jan 1, 2011", "Feb 1, 2011", "Mar 1, 2011",
"Apr 1, 2011", "May 1, 2011", "Jun 1, 2011", "Jul 1, 2011", "Aug 1, 2011",
"Sep 1, 2011", "Oct 1, 2011", "Nov 1, 2011", "Dec 1, 2011", "Jan 1, 2012",
"Feb 1, 2012", "Mar 1, 2012", "Apr 1, 2012", "May 1, 2012", "Jun 1, 2012",
"Jul 1, 2012", "Aug 1, 2012", "Sep 1, 2012", "Oct 1, 2012", "Nov 1, 2012",
"Dec 1, 2012", "Jan 1, 2013", "Feb 1, 2013", "Mar 1, 2013", "Apr 1, 2013",
"May 1, 2013", "Jun 1, 2013", "Jul 1, 2013", "Aug 1, 2013", "Sep 1, 2013",
"Oct 1, 2013", "Nov 1, 2013", "Dec 1, 2013", "Jan 1, 2014", "Feb 1, 2014",
"Mar 1, 2014", "Apr 1, 2014", "May 1, 2014", "Jun 1, 2014", "Jul 1, 2014",
"Aug 1, 2014", "Sep 1, 2014", "Oct 1, 2014", "Nov 1, 2014", "Dec 1, 2014",
"Jan 1, 2015", "Feb 1, 2015", "Mar 1, 2015", "Apr 1, 2015", "May 1, 2015",
"Jun 1, 2015", "Jul 1, 2015", "Aug 1, 2015", "Sep 1, 2015", "Oct 1, 2015",
"Nov 1, 2015", "Dec 1, 2015", "Jan 1, 2016", "Feb 1, 2016", "Mar 1, 2016",
"Apr 1, 2016", "May 1, 2016", "Jun 1, 2016", "Jul 1, 2016", "Aug 1, 2016",
"Sep 1, 2016", "Oct 1, 2016", "Nov 1, 2016", "Dec 1, 2016", "Jan 1, 2017",
"Feb 1, 2017", "Mar 1, 2017", "Apr 1, 2017", "May 1, 2017", "Jun 1, 2017",
"Jul 1, 2017", "Aug 1, 2017", "Sep 1, 2017", "Oct 1, 2017", "Nov 1, 2017",
"Dec 1, 2017"), NDVI = c(0.722, 0.418, 0.435, 0.568, 0.616, 0.75,
0.62, 0.516, 0.389, 0.484, 0.65, 0.7, 0.533, 0.327, 0.279, 0.43,
0.701, 0.797, 0.748, 0.569, 0.597, 0.662, 0.794, 0.706, 0.537,
0.338, 0.404, 0.461, 0.759, 0.76, 0.578, 0.408, 0.56, 0.62, 0.684,
0.785, 0.539, 0.442, 0.307, 0.461, 0.593, 0.739, 0.674, 0.495,
0.603, 0.712, 0.776, 0.665, 0.522, 0.333, 0.398, 0.28, 0.713,
0.746, 0.757, 0.549, 0.711, 0.685, 0.693, 0.716, 0.617, 0.476,
0.407, 0.325, 0.555, 0.757, 0.622, 0.544, 0.452, 0.586, 0.761,
0.649, 0.635, 0.472, 0.351, 0.402, 0.662, 0.757, 0.712, 0.69,
0.618, 0.632, 0.797, 0.778, 0.543, 0.401, 0.393, 0.556, 0.68,
0.784, 0.659, 0.654, 0.412, 0.653, 0.694, 0.727, 0.715, 0.55,
0.641, 0.443, 0.709, 0.818, 0.646, 0.59, 0.39, 0.562, 0.67, 0.56,
0.476, 0.312, 0.349, 0.405, 0.705, 0.766, 0.665, 0.577, 0.614,
0.518, 0.718, 0.779, 0.792, 0.547, 0.585, 0.373, 0.605, 0.715,
0.779, 0.758, 0.751, 0.77, 0.752, 0.805, 0.521, 0.471, 0.561,
0.595, 0.688, 0.74, 0.77, 0.69, 0.704, 0.613, 0.653, 0.803, 0.63,
0.472, 0.589, 0.512, 0.562, 0.699, 0.814, 0.637, 0.395, 0.462,
0.419, 0.742, 0.498, 0.35, 0.308, 0.321, 0.597, 0.677, 0.729,
0.526, 0.394, 0.567, 0.806, 0.752, 0.703, 0.495, 0.453, 0.48,
0.593, 0.731, 0.799, 0.619, 0.403, 0.396, 0.556, 0.743, 0.717,
0.504, 0.408, 0.356, 0.215, 0.68, 0.487, 0.594, 0.392, 0.566,
0.547, 0.632, 0.604, 0.528, 0.424, 0.627, 0.448, 0.788, 0.764,
0.592, 0.517, 0.274, 0.577, 0.754)), class = "data.frame", row.names = c(NA,
-204L))
library(dplyr)
library(tidyr)
library(lubridate)
# df is the NDVI data frame from the question
df <- NDVI
df %>%
  mutate(Date = mdy(Date)) %>%
  separate(Date, into = c("year", "month", "day"), remove = FALSE) %>%
  arrange(month, day, year)
#         Date year month day  NDVI
# 1 2001-01-01 2001    01  01 0.722
# 2 2002-01-01 2002    01  01 0.533
# 3 2003-01-01 2003    01  01 0.537
# 4 2004-01-01 2004    01  01 0.539
# 5 2005-01-01 2005    01  01 0.522
# 6 2006-01-01 2006    01  01 0.617
# ...
This is not generally called a "reshaping" operation - this is sorting/ordering/arranging rows. Reshaping/pivoting is when something that is a column becomes a row, or vice versa.
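If you would rather avoid the extra packages, the same ordering can be sketched in base R with order(). The four rows below are copied from the question's data; the helper columns month and year are names chosen here. It assumes the Date strings always start with an English three-letter month abbreviation and end with ", YYYY".

```r
# Four rows copied from the question's NDVI data
ndvi <- data.frame(
  Date = c("Jan 1, 2001", "Feb 1, 2001", "Jan 1, 2002", "Feb 1, 2002"),
  NDVI = c(0.722, 0.418, 0.533, 0.327)
)

# Month number from the first three letters (month.abb is built into R)
ndvi$month <- match(substr(ndvi$Date, 1, 3), month.abb)

# Year from whatever follows the last ", "
ndvi$year <- as.integer(sub(".*, ", "", ndvi$Date))

# Sort by month first, then by year within each month
sorted <- ndvi[order(ndvi$month, ndvi$year), ]
sorted
```

Because month.abb is a fixed English vector, this does not depend on the session locale, unlike parsing with the "%b" format.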

Ordering grouped bars using ggplot

I'm trying to order the groups in ascending order of the 'first' time point, but I can't seem to override the alphabetical ordering:
ggplot(total, aes(fill=reorder(timepoint, sumofpct), y=sumofpct, x=CountryCode)) +
  geom_bar(position="dodge", stat="identity") +
  scale_fill_manual(values = c("violetred3",'steelblue')) +
  geom_text(aes(label=sumofpct), hjust=1, color='black',size=5, position=position_dodge(width=0.9)) +
  xlab('') + ylab('Survey response (%)') +
  theme_classic() +
  theme(axis.text = element_text(size=25),axis.title=element_text(size=20),legend.text = element_text(size = 25)) +
  coord_flip()
Is there a way to order this so that it is in ascending order of 'first'?
Data structure
structure(list(CountryCode = c("AUS", "CAN", "DEU", "DNK", "ESP",
"FRA", "ITA", "JPN", "KOR", "NHL", "NOR", "SGP", "SWE", "UK",
"AUS", "CAN", "DEU", "DNK", "ESP", "FRA", "ITA", "JPN", "KOR",
"NHL", "NOR", "SGP", "SWE", "UK"), Date = c("Mar 30 - Apr 05 (2010)",
"Mar 30 - Apr 05 (2010)", "Mar 30 - Apr 05 (2010)", "Apr 06 - Apr 12 (2010)",
"Mar 30 - Apr 05 (2010)", "Mar 30 - Apr 05 (2010)", "Mar 30 - Apr 05 (2010)",
"Mar 30 - Apr 05 (2010)", "Mar 30 - Apr 05 (2010)", "Apr 06 - Apr 12 (2010)",
"Apr 06 - Apr 12 (2010)", "Mar 30 - Apr 05 (2010)", "Mar 30 - Apr 05 (2010)",
"Mar 30 - Apr 05 (2010)", "Mar 22 - Mar 28 (2000)", "Mar 22 - Mar 28 (2000)",
"Apr 05 - Apr 11 (2000)", "Mar 22 - Mar 28 (2000)", "Apr 05 - Apr 11 (2000)",
"Apr 05 - Apr 11 (2000)", "Apr 05 - Apr 11 (2000)", "Mar 29 - Apr 04 (2000)",
"Mar 22 - Mar 28 (2000)", "Feb 08 - Feb 14 (2000)", "Mar 22 - Mar 28 (2000)",
"Mar 22 - Mar 28 (2000)", "Apr 05 - Apr 11 (2000)", "Apr 05 - Apr 11 (2000)"
), sumofpct = c(94, 95, 92, 90, 96, 95, 97, 83, 95, 89, 92, 91,
91, 96, 89, 95, 90, 89, 95, 93, 95, 84, 94, 85, 91, 86, 88, 93
), timepoint = c("first", "first", "first", "first", "first",
"first", "first", "first", "first", "first", "first", "first",
"first", "first", "last", "last", "last", "last", "last", "last",
"last", "last", "last", "last", "last", "last", "last", "last"
)), row.names = c(NA, -28L), groups = structure(list(CountryCode = c("AUS",
"CAN", "DEU", "DNK", "ESP", "FRA", "ITA", "JPN", "KOR", "NHL",
"NOR", "SGP", "SWE", "UK"), .rows = structure(list(c(1L, 15L),
c(2L, 16L), c(3L, 17L), c(4L, 18L), c(5L, 19L), c(6L, 20L
), c(7L, 21L), c(8L, 22L), c(9L, 23L), c(10L, 24L), c(11L,
25L), c(12L, 26L), c(13L, 27L), c(14L, 28L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 14L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
Thanks!
You can try:
ggplot(df, aes(x=reorder(CountryCode, sumofpct, max), y=sumofpct, fill=factor(timepoint, levels = c("last", "first")))) +
  geom_col(position = "dodge") + # recommended over geom_bar
  geom_text(aes(label=sumofpct), hjust=1, color='black',size=5, position=position_dodge(width=0.9)) +
  scale_fill_manual("timepoint", values = c("violetred3",'steelblue'), guide = guide_legend(reverse = TRUE)) +
  coord_flip()
Instead of reorder() you can use forcats::fct_reorder2(CountryCode, timepoint=="first", -sumofpct).
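Another option, sketched here under the assumption that every country has exactly one 'first' row, is to set the factor levels of CountryCode explicitly before plotting. The six rows are copied from the question's data; first_rows and country_order are names chosen for this illustration.

```r
# Six rows copied from the question's data (three countries, two time points)
total <- data.frame(
  CountryCode = c("AUS", "JPN", "ITA", "AUS", "JPN", "ITA"),
  sumofpct    = c(94, 83, 97, 89, 84, 95),
  timepoint   = c("first", "first", "first", "last", "last", "last")
)

# Keep only the 'first' rows and sort the countries by their value there
first_rows <- total[total$timepoint == "first", ]
country_order <- first_rows$CountryCode[order(first_rows$sumofpct)]

# Fix the level order so it no longer falls back to alphabetical
total$CountryCode <- factor(total$CountryCode, levels = country_order)
levels(total$CountryCode)
```

ggplot2 draws a discrete axis in factor-level order, so mapping this CountryCode to x should give bars in ascending order of the 'first' value.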

apply a function across columns

I usually face problems with dataframes that have many columns.
For example, trying to calculate the Mann-Kendall test:
library(tidyverse)
library(Kendall)
df <- tibble::tribble(
~Season, ~Luxembourg, ~Germany, ~France, ~Russia, ~USA, ~Spain,
"Summer 2000", 29, 88, NA, NA, 31, 10,
"Summer 2001", 134, 36, 23, NA, 37, 4,
"Summer 2002", 22, 9, 10, NA, 7, 3,
"Summer 2003", 40, 11, 19, NA, 16, 6,
"Summer 2004", 74, 19, 26, NA, 27, 9,
"Summer 2005", 16, 8, 14, NA, 8, 4,
"Summer 2006", 191, 22, 32, NA, 32, 16,
"Summer 2007", 15, 7, 13, NA, 14, 4,
"Summer 2008", 43, 48, 22, NA, 27, 14,
"Summer 2009", 16, 35, 14, NA, 7, 9,
"Summer 2010", 29, 8, 25, 18, 11, 7,
"Summer 2011", 10, 4, 6, 12, 3, 3,
"Summer 2012", 55, 33, 11, 17, 46, 23,
"Summer 2013", 62, 57, 23, 70, 45, 21,
"Summer 2014", 65, 7, 6, 21, 12, 7,
"Summer 2015", 74, 11, 43, 110, 21, 11,
"Summer 2016", 85, 35, 79, 89, 30, 20,
"Summer 2017", NA, 11, 12, 14, 10, 6,
"Summer 2018", NA, 22, 11, 30, 18, 12,
"Summer 2019", NA, 20, 11, 25, NA, 14
)
Mk_Luxembourg <- MannKendall(df$Luxembourg)
Mk_Germany <- MannKendall(df$Germany)
Mk_France <- MannKendall(df$France)
Mk_Russia <- MannKendall(df$Russia)
Mk_USA <- MannKendall(df$USA)
Mk_Spain <- MannKendall(df$Spain)
Mk_results <- as.matrix(bind_rows(unlist(Mk_Luxembourg),
unlist(Mk_Germany),
unlist(Mk_France),
unlist(Mk_Russia),
unlist(Mk_USA),
unlist(Mk_Spain)))
row.names(Mk_results) <- c("Mk_Luxembourg", "Mk_Germany", "Mk_France", "Mk_Russia", "Mk_USA", "Mk_Spain" )
Mk_results
#> country tau sl S D varS
#> 1 France -0.113 0.527 -19 168. 810.
#> 2 Germany -0.0697 0.696 -13 186. 942.
#> 3 Luxembourg 0.171 0.364 23 134. 586.
#> 4 Russia 0.244 0.371 11 45 125
#> 5 Spain 0.280 0.0965 52 186. 941.
#> 6 USA -0.0529 0.779 -9 170. 815
This becomes very tedious when instead of 6 I have 70 variables.
Is there a simpler way to call all the variables at once?
You can use something like the following:
data.frame(
stringsAsFactors = FALSE,
Season = c("Summer 2000","Summer 2001",
"Summer 2002","Summer 2003","Summer 2004","Summer 2005",
"Summer 2006","Summer 2007","Summer 2008",
"Summer 2009","Summer 2010","Summer 2011","Summer 2012",
"Summer 2013","Summer 2014","Summer 2015","Summer 2016",
"Summer 2017","Summer 2018","Summer 2019"),
Luxembourg = c(29,134,22,40,74,16,
191,15,43,16,29,10,55,62,65,74,85,
NA,NA,NA),
Germany = c(88,36,9,11,19,8,22,
7,48,35,8,4,33,57,7,11,35,11,22,
20),
France = c(NA,23,10,19,26,14,
32,13,22,14,25,6,11,23,6,43,79,12,
11,11),
Russia = c(NA,NA,NA,NA,NA,NA,NA,NA,
NA,NA,18,12,17,70,21,110,89,14,30,
25),
USA = c(31,37,7,16,27,8,32,
14,27,7,11,3,46,45,12,21,30,10,
18,NA),
Spain = c(10,4,3,6,9,4,16,
4,14,9,7,3,23,21,7,11,20,6,12,14)
) -> df
library(Kendall)
# Apply the MannKendall function across all columns except Season
mk_list <- lapply(df[-1], MannKendall)
# Convert the list to a data frame
do.call(rbind.data.frame, mk_list)
#                    tau         sl   S        D     varS
# Luxembourg  0.17101437 0.36358595  23 134.4916 586.3333
# Germany    -0.06971737 0.69586229 -13 186.4672 942.3333
# France     -0.11311328 0.52717423 -19 167.9732 810.3333
# Russia      0.24444444 0.37109327  11  45.0000 125.0000
# USA        -0.05294209 0.77930272  -9 169.9971 815.0000
# Spain       0.27963457 0.09646106  52 185.9570 941.3333
This seems to be a case where the traditional (cbind, apply...) syntax looks better than the tidyverse's:
library(Kendall)
library(tidyverse)
# tidyverse
df %>%
  pivot_longer(-Season, names_to = "country") %>%
  group_by(country) %>%
  summarise(x = list(enframe(unlist(MannKendall(value))))) %>%
  unnest(x) %>%
  pivot_wider(names_from = name, values_from = value)
# traditional
do.call(rbind.data.frame, lapply(df[-1], MannKendall))
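The same "apply over every column, then bind the results" pattern also works with a small wrapper function when you only need a few fields. The sketch below deliberately swaps in base R's cor.test(method = "kendall") against the row index instead of Kendall::MannKendall, so that it runs without extra packages; it is not claimed to reproduce MannKendall's output. The wrapper kendall_trend is a name made up here, and the values are the first six seasons of Germany and Spain from the question.

```r
# First six seasons of two columns from the question's data
df <- data.frame(
  Season  = paste("Summer", 2000:2005),
  Germany = c(88, 36, 9, 11, 19, 8),
  Spain   = c(10, 4, 3, 6, 9, 4)
)

# Hypothetical wrapper: Kendall correlation of a series against its time index
kendall_trend <- function(x) {
  ok <- !is.na(x)  # drop missing seasons before testing
  res <- cor.test(seq_along(x)[ok], x[ok], method = "kendall", exact = FALSE)
  c(tau = unname(res$estimate), p_value = res$p.value)
}

# One call covers every column except Season; t() puts countries in rows
results <- t(sapply(df[-1], kendall_trend))
results
```

Because the wrapper returns a named numeric vector of fixed length, sapply() simplifies the results to a matrix directly, however many columns the data frame has.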

how to create a date variable from a long string [duplicate]

I have a data.frame that looks like this:
Sample data can be built using this code:
df<-structure(list(LSAF..Date.acted.On. = structure(c(33L, 18L, 7L,
32L, 51L, 47L, 2L, 45L, 38L, 15L), .Label = c("Aug 13, 2019, 10:44:07 AM GMT-04:00",
"Aug 13, 2019, 10:45:47 AM GMT-04:00", "Aug 13, 2019, 10:47:20 AM GMT-04:00",
"Aug 13, 2019, 10:53:29 AM GMT-04:00", "Aug 13, 2019, 10:54:54 AM GMT-04:00",
"Aug 13, 2019, 10:57:11 AM GMT-04:00", "Aug 13, 2019, 10:58:13 AM GMT-04:00",
"Feb 13, 2020, 1:13:07 PM GMT-05:00", "Feb 13, 2020, 1:17:07 PM GMT-05:00",
"Feb 22, 2019, 11:13:51 AM GMT-05:00", "Jan 28, 2020, 11:19:19 AM GMT-05:00",
"Jan 28, 2020, 11:21:28 AM GMT-05:00", "Jan 7, 2020, 9:59:28 AM GMT-05:00",
"Jul 13, 2020, 9:17:01 PM GMT-04:00", "Jul 15, 2020, 9:08:54 PM GMT-04:00",
"Jul 18, 2019, 11:15:26 AM GMT-04:00", "Jul 23, 2020, 10:50:32 AM GMT-04:00",
"Mar 5, 2020, 10:00:07 AM GMT-05:00", "Mar 5, 2020, 10:01:22 AM GMT-05:00",
"Mar 5, 2020, 9:56:36 AM GMT-05:00", "Mar 5, 2020, 9:58:31 AM GMT-05:00",
"Mar 6, 2020, 1:17:17 PM GMT-05:00", "May 13, 2019, 11:07:52 AM GMT-04:00",
"May 14, 2019, 2:32:40 PM GMT-04:00", "May 16, 2019, 3:46:08 PM GMT-04:00",
"May 16, 2019, 4:23:27 PM GMT-04:00", "May 16, 2019, 4:25:40 PM GMT-04:00",
"May 16, 2019, 4:29:12 PM GMT-04:00", "May 16, 2019, 4:40:58 PM GMT-04:00",
"May 21, 2019, 2:22:07 PM GMT-04:00", "Nov 13, 2020, 4:14:55 PM GMT-05:00",
"Nov 13, 2020, 4:21:06 PM GMT-05:00", "Nov 13, 2020, 4:24:07 PM GMT-05:00",
"Nov 13, 2020, 9:10:19 AM GMT-05:00", "Nov 24, 2020, 10:51:52 AM GMT-05:00",
"Oct 15, 2020, 3:46:52 PM GMT-04:00", "Oct 15, 2020, 3:48:35 PM GMT-04:00",
"Oct 17, 2019, 9:09:20 PM GMT-04:00", "Oct 18, 2019, 11:15:51 AM GMT-04:00",
"Oct 20, 2020, 1:28:57 PM GMT-04:00", "Oct 26, 2020, 1:52:26 PM GMT-04:00",
"Oct 26, 2020, 2:15:11 PM GMT-04:00", "Oct 29, 2019, 1:55:53 PM GMT-04:00",
"Oct 9, 2019, 10:39:10 AM GMT-04:00", "Oct 9, 2019, 10:45:14 AM GMT-04:00",
"Sep 11, 2020, 2:40:34 PM GMT-04:00", "Sep 19, 2019, 2:45:46 PM GMT-04:00",
"Sep 23, 2019, 2:24:01 PM GMT-04:00", "Sep 9, 2019, 10:16:01 AM GMT-04:00",
"Sep 9, 2019, 10:22:23 AM GMT-04:00", "Sep 9, 2019, 10:25:03 AM GMT-04:00",
"Sep 9, 2019, 10:32:44 AM GMT-04:00"), class = "factor")), row.names = c(NA,
10L), class = "data.frame")
How can I modify the data so it can be used as a date variable?
Thanks.
You can use lubridate's mdy_hms
df$date_time <- lubridate::mdy_hms(df$LSAF..Date.acted.On.)
df$date <- as.Date(df$date_time)
head(df)
#                 LSAF..Date.acted.On.           date_time       date
#1  Nov 13, 2020, 4:24:07 PM GMT-05:00 2020-11-13 21:24:07 2020-11-13
#2  Mar 5, 2020, 10:00:07 AM GMT-05:00 2020-03-05 15:00:07 2020-03-05
#3 Aug 13, 2019, 10:58:13 AM GMT-04:00 2019-08-13 14:58:13 2019-08-13
#4  Nov 13, 2020, 4:21:06 PM GMT-05:00 2020-11-13 21:21:06 2020-11-13
#5  Sep 9, 2019, 10:25:03 AM GMT-04:00 2019-09-09 14:25:03 2019-09-09
#6  Sep 19, 2019, 2:45:46 PM GMT-04:00 2019-09-19 18:45:46 2019-09-19
You could try
as.Date(df$LSAF..Date.acted.On.,
format = "%b %d, %Y, %I:%M:%S %p")
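Note that the as.Date() call above drops the time and the GMT offset. If you need the full timestamp in base R, one possible sketch is to rewrite the "GMT-05:00" suffix into the "-0500" form that the "%z" format understands and then parse with as.POSIXct(). The two strings are copied from the question; the variable names are made up here, and the Sys.setlocale() call is an assumption to make "%b" and "%p" read English month names and AM/PM.

```r
# Assumption: force English month names and AM/PM for parsing
invisible(Sys.setlocale("LC_TIME", "C"))

# Two values copied from the question's data
x <- c("Nov 13, 2020, 4:24:07 PM GMT-05:00",
       "Aug 13, 2019, 10:58:13 AM GMT-04:00")

# Turn the trailing "GMT-05:00" into "-0500" so that %z can read it
clean <- sub("GMT([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)

# Parse date, 12-hour time, AM/PM marker and UTC offset; store the result in UTC
date_time <- as.POSIXct(clean, format = "%b %d, %Y, %I:%M:%S %p %z", tz = "UTC")
date_only <- as.Date(date_time)

format(date_time, "%Y-%m-%d %H:%M:%S")
```

For these two rows the UTC timestamps agree with the mdy_hms() output shown earlier (21:24:07 and 14:58:13).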

Resources