I want to reshape my Date column in the same months-wise. For example, the first 12 rows in the Date column of the dataframe contains a character of all the months in the year 2001 and the respective year itself. The next rows are the same, until the year 2017.
Therefore. I want to reshape my data like the following. For example, I want all Januaries of the all years to be in the first rows of Date column, then all februaries in the next rows, Marches of the all years and so on, until December of all years (2001-2017)
I need like this format:
1 2001 1
2 2002 1
3 2003 1
4 2004 1
5 2005 1
6 2006 1
7 2007 1
8 2008 1
9 2009 1
10 2010 1
11 2011 1
12 2012 1
13 2013 1
14 2014 1
15 2015 1
16 2016 1
17 2017 1
18 2001 2
19 2002 2
20 2003 2
21 2004 2
22 2005 2
23 2006 2
24 2007 2
This is my dataframe I generated using dput()
#load packages that might be required
> library(tidyverse)
> library(xts)
> # Read NDVI
> NDVI <- read.csv("processed//NDVI.csv")
> dput(NDVI)
structure(list(Date = c("Jan 1, 2001", "Feb 1, 2001", "Mar 1, 2001",
"Apr 1, 2001", "May 1, 2001", "Jun 1, 2001", "Jul 1, 2001", "Aug 1, 2001",
"Sep 1, 2001", "Oct 1, 2001", "Nov 1, 2001", "Dec 1, 2001", "Jan 1, 2002",
"Feb 1, 2002", "Mar 1, 2002", "Apr 1, 2002", "May 1, 2002", "Jun 1, 2002",
"Jul 1, 2002", "Aug 1, 2002", "Sep 1, 2002", "Oct 1, 2002", "Nov 1, 2002",
"Dec 1, 2002", "Jan 1, 2003", "Feb 1, 2003", "Mar 1, 2003", "Apr 1, 2003",
"May 1, 2003", "Jun 1, 2003", "Jul 1, 2003", "Aug 1, 2003", "Sep 1, 2003",
"Oct 1, 2003", "Nov 1, 2003", "Dec 1, 2003", "Jan 1, 2004", "Feb 1, 2004",
"Mar 1, 2004", "Apr 1, 2004", "May 1, 2004", "Jun 1, 2004", "Jul 1, 2004",
"Aug 1, 2004", "Sep 1, 2004", "Oct 1, 2004", "Nov 1, 2004", "Dec 1, 2004",
"Jan 1, 2005", "Feb 1, 2005", "Mar 1, 2005", "Apr 1, 2005", "May 1, 2005",
"Jun 1, 2005", "Jul 1, 2005", "Aug 1, 2005", "Sep 1, 2005", "Oct 1, 2005",
"Nov 1, 2005", "Dec 1, 2005", "Jan 1, 2006", "Feb 1, 2006", "Mar 1, 2006",
"Apr 1, 2006", "May 1, 2006", "Jun 1, 2006", "Jul 1, 2006", "Aug 1, 2006",
"Sep 1, 2006", "Oct 1, 2006", "Nov 1, 2006", "Dec 1, 2006", "Jan 1, 2007",
"Feb 1, 2007", "Mar 1, 2007", "Apr 1, 2007", "May 1, 2007", "Jun 1, 2007",
"Jul 1, 2007", "Aug 1, 2007", "Sep 1, 2007", "Oct 1, 2007", "Nov 1, 2007",
"Dec 1, 2007", "Jan 1, 2008", "Feb 1, 2008", "Mar 1, 2008", "Apr 1, 2008",
"May 1, 2008", "Jun 1, 2008", "Jul 1, 2008", "Aug 1, 2008", "Sep 1, 2008",
"Oct 1, 2008", "Nov 1, 2008", "Dec 1, 2008", "Jan 1, 2009", "Feb 1, 2009",
"Mar 1, 2009", "Apr 1, 2009", "May 1, 2009", "Jun 1, 2009", "Jul 1, 2009",
"Aug 1, 2009", "Sep 1, 2009", "Oct 1, 2009", "Nov 1, 2009", "Dec 1, 2009",
"Jan 1, 2010", "Feb 1, 2010", "Mar 1, 2010", "Apr 1, 2010", "May 1, 2010",
"Jun 1, 2010", "Jul 1, 2010", "Aug 1, 2010", "Sep 1, 2010", "Oct 1, 2010",
"Nov 1, 2010", "Dec 1, 2010", "Jan 1, 2011", "Feb 1, 2011", "Mar 1, 2011",
"Apr 1, 2011", "May 1, 2011", "Jun 1, 2011", "Jul 1, 2011", "Aug 1, 2011",
"Sep 1, 2011", "Oct 1, 2011", "Nov 1, 2011", "Dec 1, 2011", "Jan 1, 2012",
"Feb 1, 2012", "Mar 1, 2012", "Apr 1, 2012", "May 1, 2012", "Jun 1, 2012",
"Jul 1, 2012", "Aug 1, 2012", "Sep 1, 2012", "Oct 1, 2012", "Nov 1, 2012",
"Dec 1, 2012", "Jan 1, 2013", "Feb 1, 2013", "Mar 1, 2013", "Apr 1, 2013",
"May 1, 2013", "Jun 1, 2013", "Jul 1, 2013", "Aug 1, 2013", "Sep 1, 2013",
"Oct 1, 2013", "Nov 1, 2013", "Dec 1, 2013", "Jan 1, 2014", "Feb 1, 2014",
"Mar 1, 2014", "Apr 1, 2014", "May 1, 2014", "Jun 1, 2014", "Jul 1, 2014",
"Aug 1, 2014", "Sep 1, 2014", "Oct 1, 2014", "Nov 1, 2014", "Dec 1, 2014",
"Jan 1, 2015", "Feb 1, 2015", "Mar 1, 2015", "Apr 1, 2015", "May 1, 2015",
"Jun 1, 2015", "Jul 1, 2015", "Aug 1, 2015", "Sep 1, 2015", "Oct 1, 2015",
"Nov 1, 2015", "Dec 1, 2015", "Jan 1, 2016", "Feb 1, 2016", "Mar 1, 2016",
"Apr 1, 2016", "May 1, 2016", "Jun 1, 2016", "Jul 1, 2016", "Aug 1, 2016",
"Sep 1, 2016", "Oct 1, 2016", "Nov 1, 2016", "Dec 1, 2016", "Jan 1, 2017",
"Feb 1, 2017", "Mar 1, 2017", "Apr 1, 2017", "May 1, 2017", "Jun 1, 2017",
"Jul 1, 2017", "Aug 1, 2017", "Sep 1, 2017", "Oct 1, 2017", "Nov 1, 2017",
"Dec 1, 2017"), NDVI = c(0.722, 0.418, 0.435, 0.568, 0.616, 0.75,
0.62, 0.516, 0.389, 0.484, 0.65, 0.7, 0.533, 0.327, 0.279, 0.43,
0.701, 0.797, 0.748, 0.569, 0.597, 0.662, 0.794, 0.706, 0.537,
0.338, 0.404, 0.461, 0.759, 0.76, 0.578, 0.408, 0.56, 0.62, 0.684,
0.785, 0.539, 0.442, 0.307, 0.461, 0.593, 0.739, 0.674, 0.495,
0.603, 0.712, 0.776, 0.665, 0.522, 0.333, 0.398, 0.28, 0.713,
0.746, 0.757, 0.549, 0.711, 0.685, 0.693, 0.716, 0.617, 0.476,
0.407, 0.325, 0.555, 0.757, 0.622, 0.544, 0.452, 0.586, 0.761,
0.649, 0.635, 0.472, 0.351, 0.402, 0.662, 0.757, 0.712, 0.69,
0.618, 0.632, 0.797, 0.778, 0.543, 0.401, 0.393, 0.556, 0.68,
0.784, 0.659, 0.654, 0.412, 0.653, 0.694, 0.727, 0.715, 0.55,
0.641, 0.443, 0.709, 0.818, 0.646, 0.59, 0.39, 0.562, 0.67, 0.56,
0.476, 0.312, 0.349, 0.405, 0.705, 0.766, 0.665, 0.577, 0.614,
0.518, 0.718, 0.779, 0.792, 0.547, 0.585, 0.373, 0.605, 0.715,
0.779, 0.758, 0.751, 0.77, 0.752, 0.805, 0.521, 0.471, 0.561,
0.595, 0.688, 0.74, 0.77, 0.69, 0.704, 0.613, 0.653, 0.803, 0.63,
0.472, 0.589, 0.512, 0.562, 0.699, 0.814, 0.637, 0.395, 0.462,
0.419, 0.742, 0.498, 0.35, 0.308, 0.321, 0.597, 0.677, 0.729,
0.526, 0.394, 0.567, 0.806, 0.752, 0.703, 0.495, 0.453, 0.48,
0.593, 0.731, 0.799, 0.619, 0.403, 0.396, 0.556, 0.743, 0.717,
0.504, 0.408, 0.356, 0.215, 0.68, 0.487, 0.594, 0.392, 0.566,
0.547, 0.632, 0.604, 0.528, 0.424, 0.627, 0.448, 0.788, 0.764,
0.592, 0.517, 0.274, 0.577, 0.754)), class = "data.frame", row.names = c(NA,
-204L))
library(dplyr)
library(tidyr)
library(lubridate)
df %>%
mutate(Date = mdy(Date)) %>%
separate(Date, into = c("year", "month", "day"), remove = FALSE) %>%
arrange(month, day, year)
# Date year month day NDVI
# 1 2001-01-01 2001 01 01 0.722
# 2 2002-01-01 2002 01 01 0.533
# 3 2003-01-01 2003 01 01 0.537
# 4 2004-01-01 2004 01 01 0.539
# 5 2005-01-01 2005 01 01 0.522
# 6 2006-01-01 2006 01 01 0.617
# ...
This is not generally called a "reshaping" operation - this is sorting/ordering/arranging rows. Reshaping/pivoting is when something that is a column becomes a row, or vice versa.
I'm trying to merge two datasets I have.
df1:
day
month
year
lon
lat
month-year
3
5
2009
5.7
53.9
May 2009
8
9
2004
6.9
52.6
Sep 2004
15
9
2004
3.8
50.4
Sep 2004
5
5
2009
2.7
51.2
May 2009
28
7
2005
14.8
62.4
Jul 2005
18
9
2004
5.1
52.5
Sep 2004
df2:
nao-value
sign
month-year
- 2.1
Negative
Sep 2004
1.3
Positive
Jul 2005
- 1.1
Negative
May 2009
I want to merge this to add the NAO value for each month and year in the occurrence data, meaning i want the NAO value for each specific month repeated for all registrations of that month in the occurrence data.
Problem is I cannot get the NAO values to line up where it should by the occurrence data, its either placed just repetitive and not aligned with the date it should, given as month-year.x and month-year.y ,or it is given back as NA value.
I have tried a few different approaches:
df3 <- merge(df1, df2, by="month-year")
df3 <- merge(cbind(df1, X=rownames(df1)), cbind(df2, variable=rownames(df2)))
df3 <- merge(df1,df2, by ="month-year", all.x = TRUE,all.y=TRUE, sort = FALSE)
df3 <- merge(df1, df2, by=intersect(df1$month-year(df1), df2$month-year(df2)))
But not of those give the result I desire.
Edit to include dput:
dput(head(df1, 10)) :
structure(list(Day = c(29, 2, 14, 31, 16, 7, 25, 12, 21, 22),
Month = c(7, 7, 7, 8, 8, 7, 8, 6, 6, 9), Year = c(2010, 2015,
2010, 2018, 2016, 2018, 2019, 2004, 2015, 2019), Lon = c(-6.155014,
-5.820868, -5.509842, -5.495277, -5.469389, -5.469389, -5.469389,
-5.466995, -5.461942, -5.457127), Lat = c(59.09478, 59.125228,
57.959196, 57.96022, 57.986825, 57.986825, 57.986825, 57.874527,
57.95972, 58.07697), Date = c("Jul 2010", "Jul 2015", "Jul 2010",
"Aug 2018", "Aug 2016", "Jul 2018", "Aug 2019", "Jun 2004",
"Jun 2015", "Sep 2019")), row.names = c(NA, -10L), class =
c("tbl_df",
"tbl", "data.frame"))
dput(head(df2, 10)) :
structure(list(NAO = c(1.04, 1.41, 1.46, 2, -1.53, -0.02, 0.53,
0.97, 1.06, 0.23), Sign = c("Positive", "Positive", "Positive",
"Positive", "Negative", "Negative", "Positive", "Positive",
"Positive",
"Positive"), Date = c("jan 1990", "feb 1990", "mar 1990", "apr 1990",
"mai 1990", "jun 1990", "jul 1990", "aug 1990", "sep 1990", "okt
1990"
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
merge function is case sensitive. You have different cases in two dataframes that you are merging. Make the case in both the dataframe same and then perform the merge. Try -
result <- merge(transform(df1, Date = tolower(Date)), df2, by = 'Date')
Using tidyverse
library(dplyr)
df1 %>%
mutate(Date = tolower(Date)) %>%
inner_join(df2, by = 'Date')
I have the following dates.
c("Aug 08, 2017", "Aug 09, 2017", "Aug 11, 2017", "Aug 11, 2017",
"Aug 10, 2017", "Sep 22, 2017", "Aug 11, 2017", "Aug 11, 2017",
"Aug 11, 2017", "Aug 14, 2017", "Aug 16, 2017", "Aug 16, 2017",
"Aug 18, 2017", "Aug 18, 2017", "Aug 18, 2017", "Sep 20, 2017",
"Aug 22, 2017", "Sep 20, 2017", "Sep 14, 2017", "Sep 25, 2017"
)
I am trying to convert them into the following format:
structure(c(17386, 17387, 17388, 17389, 17392, 17393, 17394,
17395, 17396, 17399, 17400, 17401, 17402, 17403, 17406, 17407,
17408, 17409, 17410, 17414), class = "Date")
Which looks like;
[1] "2017-08-08" "2017-08-09" "2017-08-10" "2017-08-11" "2017-08-14" "2017-08-15" "2017-08-16" "2017-08-17"
[9] "2017-08-18" "2017-08-21" "2017-08-22" "2017-08-23" "2017-08-24" "2017-08-25" "2017-08-28" "2017-08-29"
[17] "2017-08-30" "2017-08-31" "2017-09-01" "2017-09-05"
How can I convert characters to date format?
EDIT:
I run the following;
> C=c("Aug 08, 2017", "Aug 09, 2017", "Aug 11, 2017", "Aug 11, 2017",
+ "Aug 10, 2017", "Sep 22, 2017", "Aug 11, 2017", "Aug 11, 2017",
+ "Aug 11, 2017", "Aug 14, 2017", "Aug 16, 2017", "Aug 16, 2017",
+ "Aug 18, 2017", "Aug 18, 2017", "Aug 18, 2017", "Sep 20, 2017",
+ "Aug 22, 2017", "Sep 20, 2017", "Sep 14, 2017", "Sep 25, 2017"
+ )
> as.numeric(as.Date(C,format='%B %d, %Y'))
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> as.Date(C,format='%B %d, %Y')
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
EDIT:
The following also does not work;
date <- gsub(",", "", date)
date <- gsub(" ", "-", date)
date
as.numeric(as.Date(date, format='%b %d, %Y'))
EDIT:
The following seems to work;
mdy(C)
Giving;
[1] "2017-08-08" "2017-08-09" "2017-08-11" "2017-08-11" "2017-08-10" "2017-09-22" "2017-08-11" "2017-08-11"
[9] "2017-08-11" "2017-08-14" "2017-08-16" "2017-08-16" "2017-08-18" "2017-08-18" "2017-08-18" "2017-09-20"
[17] "2017-08-22" "2017-09-20" "2017-09-14" "2017-09-25"
You need using the format in as.Date
as.numeric(as.Date(C,format='%B %d, %Y'))
[1] 17386 17387 17389 17389 17388 17431 17389 17389 17389 17392 17394 17394 17396 17396 17396 17429 17400 17429 17423 17434
as.Date(C,format='%B %d, %Y')
[1] "2017-08-08" "2017-08-09" "2017-08-11" "2017-08-11" "2017-08-10" "2017-09-22" "2017-08-11" "2017-08-11" "2017-08-11" "2017-08-14" "2017-08-16"
[12] "2017-08-16" "2017-08-18" "2017-08-18" "2017-08-18" "2017-09-20" "2017-08-22" "2017-09-20" "2017-09-14" "2017-09-25"
Dinput:
C=c("Aug 08, 2017", "Aug 09, 2017", "Aug 11, 2017", "Aug 11, 2017",
"Aug 10, 2017", "Sep 22, 2017", "Aug 11, 2017", "Aug 11, 2017",
"Aug 11, 2017", "Aug 14, 2017", "Aug 16, 2017", "Aug 16, 2017",
"Aug 18, 2017", "Aug 18, 2017", "Aug 18, 2017", "Sep 20, 2017",
"Aug 22, 2017", "Sep 20, 2017", "Sep 14, 2017", "Sep 25, 2017"
)