I'm confused about the way the paste() function is behaving. I have a dplyr table with the following columns:
Year Month DayofMonth
2001 May 21
2001 May 22
2001 June 9
2001 March 4
I'd like to combine these into a single column called "Date". I figured I'd use the command:
df2 = mutate(df, Date = paste(c(Year, Month, DayofMonth), sep = "-",))
Unfortunately, this seems to concatenate every element in Year, then every element in Month, then every element in DayofMonth so the result looks something like this:
2001-2001-2001-2001 ... May-May-June-March ... 21-22-9-4
How should I modify my command so that the paste function iterates over each row individually?
P.S. This is part of a Data Camp course and as such I am running commands through whatever version of R they've got on their server.
Currently c() concatenates all three columns into one long vector before paste() ever sees them. Take c() out of your paste() call to paste the columns together element by element.
mutate(df, Date = paste(Year, Month, DayofMonth, sep = "-"))
# Year Month DayofMonth Date
# 1 2001 May 21 2001-May-21
# 2 2001 May 22 2001-May-22
# 3 2001 June 9 2001-June-9
# 4 2001 March 4 2001-March-4
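To see why c() causes the problem, here is a small base-R sketch. The data frame below is a hand-built copy of the table in the question, not the asker's actual data. It compares the length of the stacked vector with the row-wise paste:

```r
# Hand-built copy of the example table (an assumption, for illustration only)
df <- data.frame(
  Year = c(2001, 2001, 2001, 2001),
  Month = c("May", "May", "June", "March"),
  DayofMonth = c(21, 22, 9, 4),
  stringsAsFactors = FALSE
)

# c() stacks the three columns end to end: 3 columns x 4 rows = 12 elements
stacked <- c(df$Year, df$Month, df$DayofMonth)
length(stacked)

# Passing the columns as separate arguments pastes them position by position
by_row <- paste(df$Year, df$Month, df$DayofMonth, sep = "-")
length(by_row)
by_row
```

Because paste() is vectorised over its arguments, no explicit row-by-row iteration is needed.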
I'm trying to visualize some bird data, however after grouping by month, the resulting output is out of order from the original data. It is in order for December, January, February, and March in the original, but after manipulating it results in December, February, January, March.
Any ideas how I can fix this or sort the rows?
This is the code:
BirdDataTimeClean <- BirdDataTimes %>%
group_by(Date) %>%
summarise(Gulls=sum(Gulls), Terns=sum(Terns), Sandpipers=sum(Sandpipers),
Plovers=sum(Plovers), Pelicans=sum(Pelicans), Oystercatchers=sum(Oystercatchers),
Egrets=sum(Egrets), PeregrineFalcon=sum(Peregrine_Falcon), BlackPhoebe=sum(Black_Phoebe),
Raven=sum(Common_Raven))
BirdDataTimeClean2 <- BirdDataTimeClean %>%
pivot_longer(!Date, names_to = "Species", values_to = "Count")
You haven't shared any workable data, but I run into this often when reading from CSV, because all dates come in as character and therefore sort alphabetically.
As suggested, convert the date column to Date class using the lubridate package or base as.Date(); arrange() and group_by() in dplyr will then order the rows chronologically.
Example with toy data:
library(data.table)
birds <- data.table(dates = c("2020-Feb-20","2020-Jan-20","2020-Dec-20","2020-Apr-20"),
species = c('Gulls','Terns','Gulls','Sandpiper'),
Counts = c(20,30,40,50)
)
str(birds) will show that dates is character (and I have deliberately not kept the rows in order).
Use lubridate to convert the dates:
birds$dates%>%lubridate::ymd() will change the column to the Date data type
birds$dates%>%ymd()%>%str()
Date[1:4], format: "2020-02-20" "2020-01-20" "2020-12-20" "2020-04-20"
Save it with birds$dates <- ymd(birds$dates), or do the conversion in your pipeline as follows.
Now simply do the dplyr analysis:
birds%>%group_by(Months= ymd(dates))%>%
summarise(N=n()
,Species_Count = sum(Counts)
)%>%arrange(Months)
will give
# A tibble: 4 x 3
Months N Species_Count
<date> <int> <dbl>
1 2020-01-20 1 30
2 2020-02-20 1 20
3 2020-04-20 1 50
4 2020-12-20 1 40
However, if you want Apr, Jan, etc. instead of month numbers and apply format() with a format string, the dates become character again and sort alphabetically. I would suggest you keep your data as Date and format it only when presenting output to others, or, if you are using DT or other data tables, check their output formatting options. That way your original data stays intact and users see what they want.
This will make it character:
birds%>%group_by(Months= as.character.Date(dates))%>%
summarise(N=n()
,Species_Count = sum(Counts)
)%>%arrange(Months)
# A tibble: 4 x 3
Months N Species_Count
<chr> <int> <dbl>
1 2020-Apr-20 1 50
2 2020-Dec-20 1 40
3 2020-Feb-20 1 20
4 2020-Jan-20 1 30
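A minimal base-R sketch of the same idea, assuming month-labelled strings like the toy data above. The variable names are illustrative and not from the original post. It maps each month abbreviation to a number with the built-in month.abb constant (which avoids depending on the session locale), builds real Date values, and sorts on those while keeping the original labels for display:

```r
dates_chr <- c("2020-Feb-20", "2020-Jan-20", "2020-Dec-20", "2020-Apr-20")

# Alphabetical sorting of character dates puts Apr and Dec before Feb and Jan
alphabetical <- sort(dates_chr)

# Split "YYYY-Mon-DD" into parts and turn the month label into a number
parts <- strsplit(dates_chr, "-", fixed = TRUE)
yr  <- as.integer(sapply(parts, `[`, 1))
mon <- match(sapply(parts, `[`, 2), month.abb)
dy  <- as.integer(sapply(parts, `[`, 3))

# Build real Date values and order the original labels chronologically
dates <- as.Date(sprintf("%04d-%02d-%02d", yr, mon, dy))
chronological <- dates_chr[order(dates)]
chronological
```

The same ordering trick carries over to a data frame: sort or group on the Date column and keep the label column only for printing and plotting.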
I'm getting NA values when I try to replace the month number with the month name using the code below:
total_trips_v2$month <- ordered(total_trips_v2$month, levels=c("Jul","Aug","Sep","Oct", "Nov","Dec","Jan", "Feb", "Mar","Apr","May","Jun"))
I'm working with a big data set where the month column is of character type and the months are numbered '06', '07' and so on, starting with 06.
I'm not quite sure what the ordered() function in my code really does. I saw it somewhere and used it. I tried to look up code for replacing specific values in rows, but it looked very confusing.
Can anyone help me out with this?
Working with data types can be confusing at times, but it helps you with what you want to achieve. Thus, make sure you understand how to move from type to type!
There are some "helpers" built into R for working with months and month names.
Below we have a "character" vector in our data frame, i.e. df$month.
The helper vectors in R are month.name (full month names) and month.abb (abbreviated month names).
You can index a vector by calling the element of the vector at the n-th position.
Thus, month.abb[6] will return "Jun".
We use this to coerce the month to "numeric" and then recode it with the abbreviated names.
# simulating some data
df <- data.frame(month = c("06","06","07","09","01","02"))
# test index month name
month.abb[6]
# check what happens to our column vector - for this we coerce the 06,07, etc. to numbers!
month.abb[as.numeric(df$month)]
# now assign the result
df$month_abb <- month.abb[as.numeric(df$month)]
This yields:
df
month month_abb
1 06 Jun
2 06 Jun
3 07 Jul
4 09 Sep
5 01 Jan
6 02 Feb
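As for why the original ordered() call returned NA: a value that does not appear in levels becomes NA, and "06" is not one of "Jul", "Aug", and so on. The sketch below is one possible way to get both the names and the July-to-June ordering from the question. The sample vector and the name fiscal_order are assumptions made for illustration:

```r
month_chr <- c("06", "07", "09", "01", "02")

# Values not found in `levels` become NA, which is what the question ran into
bad <- ordered(month_chr, levels = c("Jul", "Aug", "Sep"))

# Hypothetical display order: July first, June last
fiscal_order <- c(7:12, 1:6)

# `levels` lists the values as stored ("07", "08", ...),
# `labels` supplies the names to show in their place
good <- factor(
  month_chr,
  levels = sprintf("%02d", fiscal_order),
  labels = month.abb[fiscal_order],
  ordered = TRUE
)
good
sort(good)
```

Sorting or plotting the resulting ordered factor then follows the level order rather than the alphabet.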
The lubridate package can also help you extract certain components of datetime objects, such as month number or name.
Here, I have made some sample dates:
tibble(
date = c('2021-01-01', '2021-02-01', '2021-03-01')
) %>%
{. ->> my_dates}
my_dates
# # A tibble: 3 x 1
# date
# <chr>
# 2021-01-01
# 2021-02-01
# 2021-03-01
The first thing we need to do is convert these character-formatted values to date-formatted values. We use lubridate::ymd() to do this:
my_dates %>%
mutate(
date = ymd(date)
) %>%
{. ->> my_dates_formatted}
my_dates_formatted
# # A tibble: 3 x 1
# date
# <date>
# 2021-01-01
# 2021-02-01
# 2021-03-01
Note that the format printed under the column name (date) has changed from <chr> to <date>.
Now that the dates are in <date> format, we can pull out different components using lubridate::month(). See ?month for more details.
my_dates_formatted %>%
mutate(
month_num = month(date),
month_name_abb = month(date, label = TRUE),
month_name_full = month(date, label = TRUE, abbr = FALSE)
)
# # A tibble: 3 x 4
# date month_num month_name_abb month_name_full
# <date> <dbl> <ord> <ord>
# 2021-01-01 1 Jan January
# 2021-02-01 2 Feb February
# 2021-03-01 3 Mar March
See my answer to your other question here, but when working with dates in R, it is good to leave them in the default YYYY-MM-DD format. This generally makes calculations and manipulations more straightforward. The month names as shown above can be good for making labels, for example when making figures and labelling data points or axes.
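If lubridate is not available, a rough base-R equivalent can be sketched as follows. This is an alternative to, not a description of, what lubridate::month() does internally. It reads the month index from POSIXlt and looks up the names in the built-in month.abb and month.name constants:

```r
dates <- as.Date(c("2021-01-01", "2021-02-01", "2021-03-01"))

# POSIXlt stores months as 0..11, so add 1 to get 1..12
month_num <- as.POSIXlt(dates)$mon + 1

# An ordered factor keeps calendar order when sorting or plotting
month_name_abb <- factor(month.abb[month_num], levels = month.abb, ordered = TRUE)
month_name_full <- month.name[month_num]

data.frame(date = dates, month_num, month_name_abb, month_name_full)
```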
I have a data frame in R with a couple of variables; right now I'm concerned with two of them, Title and Date. Here is a short sample that resembles the real data frame:
Title Date
Veterans, Sacrame 1997
Action Newsmaker 2005
New Tri-Cable 1990 mar
EFEST June 16, 1987 28494
The Inhuman Perception: what we do 1999 june
New Tri-Cable 2003 july/august
Interviews Concerning His/her 1991-1992
Festival EFEST June 6, 1997 83443
Intervention of the people Undated
What I want is to create a new variable, year, that contains only the year (no day, month, or anything else).
I can extract a year from a proper date format or from text that always has the same layout, but here it's different because the titles are complicated and differ in length and wording from row to row. I'm wondering whether there is an easy way to create the 'year' variable in RStudio. Where the Date column is in some sort of date format I can extract the year from it. However, in some rows the date looks like 83443; there I can see the year in the title, but I can't extract it manually because the dataset is huge.
Use mdy to convert to Date class and then year to extract the year.
library(lubridate)
year(mdy(dat1$Title, quiet = TRUE))
## [1] NA NA NA 1987 NA NA NA 1997 NA
Note
The data in reproducible form:
Lines <- "Title Date
Veterans, Sacrame 1997
Action Newsmaker 2005
New Tri-Cable 1990 mar
EFEST June 16, 1987 28494
The Inhuman Perception: what we do 1999 june
New Tri-Cable 2003 july/august
Interviews Concerning His/her 1991-1992
Festival EFEST June 6, 1997 83443
Intervention of the people Undated"
L <- readLines(textConnection(Lines))
dat1 <- read.csv(text = sub(" +", ";", trimws(L)), sep = ";")
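The mdy() approach above only fills in rows whose Title contains a full date. As a complementary sketch, a regular expression can pull the first four-digit year out of either column. The helper name extract_year and the assumption that every year lies between 1900 and 2099 are mine, and for a range such as 1991-1992 this keeps only the first year:

```r
# Columns rebuilt by hand from the sample in the question
title <- c("Veterans, Sacrame", "Action Newsmaker", "New Tri-Cable",
           "EFEST June 16, 1987", "The Inhuman Perception: what we do",
           "New Tri-Cable", "Interviews Concerning His/her",
           "Festival EFEST June 6, 1997", "Intervention of the people")
date_col <- c("1997", "2005", "1990 mar", "28494", "1999 june",
              "2003 july/august", "1991-1992", "83443", "Undated")

# Hypothetical helper: first standalone 4-digit year (1900-2099), else NA
extract_year <- function(x) {
  hit <- regexpr("\\b(19|20)[0-9]{2}\\b", x)
  out <- rep(NA_integer_, length(x))
  out[hit != -1] <- as.integer(regmatches(x, hit))
  out
}

# Prefer the Date column; fall back to the Title where Date has no year
from_date <- extract_year(date_col)
from_title <- extract_year(title)
year <- ifelse(is.na(from_date), from_title, from_date)
year
```

The word boundaries (\\b) are what keep five-digit values such as 28494 from being mistaken for a year.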
I have three variables: Year, Month, and Day. How can I merge them into one variable ("Date") so that the variable is represented as such:
yyyy-mm-dd
Thanks in advance and best regards!
How do you merge three variables into one variable?
Consider two methods:
Old school
With dplyr, lubridate, and data frames
And consider the data types. You can have:
Number or character
Date or POSIXct final type
Old School Method
The old school method is straightforward. I assume you are using vectors or lists and don't know data frames yet. Let's take your data, force it to a standardized, unambiguous format, and concatenate the data.
> y <- 2012:2015
> y
[1] 2012 2013 2014 2015
> m <- 1:4
> m
[1] 1 2 3 4
> d <- 10:13
> d
[1] 10 11 12 13
Use as.numeric if you want to be safe and convert everything to the same format before concatenation. If you get any NA values you will need to handle them with the is.na function and provide a default value.
Use paste with the sep separator value set to your delimiter, in this case, the hyphen.
> paste(y,m,d, sep = '-')
[1] "2012-1-10" "2013-2-11" "2014-3-12" "2015-4-13"
Dataframe / Dplyr / Lubridate Way
> df <- data.frame(year = y, mon = m, day = d)
> df
year mon day
1 2012 1 10
2 2013 2 11
3 2014 3 12
4 2015 4 13
Below I do the following:
Take the df object
Create a new variable named Date
Concatenate the numeric columns year, mon, and day with a - separator
Convert the resulting character strings into Date format with ymd()
> library(dplyr)
> library(lubridate)
> df %>%
mutate(Date = ymd(
paste(year, mon, day, sep = '-')
)
)
year mon day Date
1 2012 1 10 2012-01-10
2 2013 2 11 2013-02-11
3 2014 3 12 2014-03-12
4 2015 4 13 2015-04-13
Below we create year-month-day character strings, yyyy-mm-dd character strings (similar except one digit month and day are zero padded out to 2 digits) and Date class. The last one prints out as yyyy-mm-dd and can be manipulated in ways that character strings can't, for example adding one to a Date class object gives the next day.
First we set up some sample input:
year <- c(2017, 2015, 2014)
month <- c(3, 1, 10)
day <- c(15, 9, 25)
Convert to year-month-day character string. This is not quite yyyy-mm-dd, since 1-digit months and days are not zero padded to 2 digits:
paste(year, month, day, sep = "-")
## [1] "2017-3-15" "2015-1-9" "2014-10-25"
Convert to Date class. It prints on the console as yyyy-mm-dd. Two alternatives:
as.Date(paste(year, month, day, sep = "-"))
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
as.Date(ISOdate(year, month, day))
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
Convert to character string yyyy-mm-dd. In this case 1-digit months and days are zero padded out to 2 characters. Two alternatives:
as.character(as.Date(paste(year, month, day, sep = "-")))
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
sprintf("%d-%02d-%02d", year, month, day)
## [1] "2017-03-15" "2015-01-09" "2014-10-25"
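To illustrate what Date class allows that character strings do not, here is a short sketch using the same sample input. The variable names are illustrative. It shows day arithmetic and how ISOdate() reports an impossible calendar date as NA:

```r
year <- c(2017, 2015, 2014)
month <- c(3, 1, 10)
day <- c(15, 9, 25)

dates <- as.Date(sprintf("%d-%02d-%02d", year, month, day))

# Adding 1 to a Date gives the next calendar day
next_day <- dates + 1
next_day

# Subtracting two Dates gives the number of days between them
gap_days <- as.numeric(dates[1] - dates[2])
gap_days

# A date that does not exist (30 February) comes back as NA from ISOdate()
impossible <- as.Date(ISOdate(2017, 2, 30))
is.na(impossible)
```

None of these operations would work on the output of paste() alone, which is one reason to convert to Date before doing any further processing.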
I need to convert a date (m/d/y format) into 3 separate columns on which I hope to run an algorithm (I'm trying to convert my dates into Julian Day Numbers). I saw this suggestion, made to another user, for separating data into multiple columns using Oracle. I'm using R and am thoroughly stuck on how to code this appropriately. Would A1, A2, ... represent my new column headings, and how would the format differ for the "update set" section?
update <tablename> set A1 = substr(ORIG, 1, 4),
A2 = substr(ORIG, 5, 6),
A3 = substr(ORIG, 11, 6),
A4 = substr(ORIG, 17, 5);
I'm trying hard to improve my skills in R but cannot figure this one...any help is much appreciated. Thanks in advance... :)
I use the format() method for Date objects to pull apart dates in R. Using Dirk's datetxt, here is how I would go about breaking up a date into its constituent parts:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
datetxt <- as.Date(datetxt)
df <- data.frame(date = datetxt,
year = as.numeric(format(datetxt, format = "%Y")),
month = as.numeric(format(datetxt, format = "%m")),
day = as.numeric(format(datetxt, format = "%d")))
Which gives:
> df
date year month day
1 2010-01-02 2010 1 2
2 2010-02-03 2010 2 3
3 2010-09-10 2010 9 10
Note what several others have said; you can get the Julian dates without splitting out the various date components. I added this answer to show how you could do the breaking apart if you needed it for something else.
Given a text variable x, like this:
> x
[1] "10/3/2001"
then:
> as.Date(x,"%m/%d/%Y")
[1] "2001-10-03"
converts it to a date object. Then, if you need it:
> julian(as.Date(x,"%m/%d/%Y"))
[1] 11598
attr(,"origin")
[1] "1970-01-01"
gives you a Julian date (relative to 1970-01-01).
Don't try the substring thing...
See help(as.Date) for more.
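Note that "Julian" is used loosely here: julian() counts days from an origin (1970-01-01 by default), which differs from the astronomical Julian Day Number mentioned in the question. A hedged sketch of the distinction follows. The helper to_jdn is hypothetical and relies on the fact that 1970-01-01 corresponds to Julian Day Number 2440588:

```r
x <- "10/3/2001"
d <- as.Date(x, format = "%m/%d/%Y")

# Days since 1970-01-01, as returned by julian()
days_since_epoch <- as.numeric(julian(d))

# Day of the year (1..366), another thing people sometimes call a "Julian date"
day_of_year <- as.integer(format(d, "%j"))

# Hypothetical helper: astronomical Julian Day Number of a calendar date
to_jdn <- function(date) as.numeric(date) + 2440588

to_jdn(d)
to_jdn(as.Date("2000-01-01"))
```

Which of the three values is wanted depends on the algorithm that consumes it, so it is worth checking its documentation before picking one.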
Quick ones:
Julian date converters already exist in base R, see eg help(julian).
One approach may be to parse the date as a POSIXlt and to then read off the components. Other date / time classes and packages will work too but there is something to be said for base R.
Parsing dates as string is almost always a bad approach.
Here is an example:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
dates <- as.Date(datetxt) ## you could examine these as well
plt <- as.POSIXlt(dates) ## now as POSIXlt types
plt[["year"]] + 1900 ## years are with offset 1900
#[1] 2010 2010 2010
plt[["mon"]] + 1 ## and months are on the 0 .. 11 interval
#[1] 1 2 9
plt[["mday"]]
#[1] 2 3 10
df <- data.frame(year=plt[["year"]] + 1900,
month=plt[["mon"]] + 1, day=plt[["mday"]])
df
# year month day
#1 2010 1 2
#2 2010 2 3
#3 2010 9 10
And of course
julian(dates)
#[1] 14611 14643 14862
#attr(,"origin")
#[1] "1970-01-01"
To convert a date (m/d/y format) into 3 separate columns, consider the df,
df <- data.frame(date = c("01-02-18", "02-20-18", "03-23-18"))
df
date
1 01-02-18
2 02-20-18
3 03-23-18
Convert to date format
df$date <- as.Date(df$date, format="%m-%d-%y")
df
date
1 2018-01-02
2 2018-02-20
3 2018-03-23
To get three separate columns with year, month and day,
library(lubridate)
df$year <- year(ymd(df$date))
df$month <- month(ymd(df$date))
df$day <- day(ymd(df$date))
df
date year month day
1 2018-01-02 2018 1 2
2 2018-02-20 2018 2 20
3 2018-03-23 2018 3 23
Hope this helps.
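One detail worth checking with two-digit years is how %y picks the century. The sketch below uses made-up values and assumes the rule documented in ?strptime, where 00 to 68 map to 20xx and 69 to 99 map to 19xx. It also reads the parts off with base R only:

```r
short_dates <- c("01-02-18", "02-20-70")

# %y expands two-digit years: "18" becomes 2018, "70" becomes 1970
parsed <- as.Date(short_dates, format = "%m-%d-%y")
parsed

# Base-R way to read off the components, via POSIXlt
lt <- as.POSIXlt(parsed)
parts <- data.frame(
  date = parsed,
  year = lt$year + 1900,  # years are stored as an offset from 1900
  month = lt$mon + 1,     # months are stored as 0..11
  day = lt$mday
)
parts
```

If the data contain birth dates or other values from before 1969, a four-digit year in the source is the safer choice.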
Hi Gavin: another way [using your idea] is:
The data-frame we will use is oilstocks which contains a variety of variables related to the changes over time of the oil and gas stocks.
The variables are:
colnames(stocks)
"bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC"
"emMN" "emMN.1" "chdate" "chV" "cbO" "chC" "chMN" "chMX"
One of the first things to do is change the emdate field, which is an integer vector, into a date vector.
realdate<-as.Date(emdate,format="%m/%d/%Y")
Next we want to split emdate column into three separate columns representing month, day and year using the idea supplied by you.
> dfdate <- data.frame(date=realdate)
year=as.numeric (format(realdate,"%Y"))
month=as.numeric (format(realdate,"%m"))
day=as.numeric (format(realdate,"%d"))
ls() will include the individual vectors, day, month, year and dfdate.
Now merge the dfdate, day, month, year into the original data-frame [stocks].
ostocks<-cbind(dfdate,day,month,year,stocks)
colnames(ostocks)
"date" "day" "month" "year" "bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC" "emMN" "emMX" "chdate" "chV"
"cbO" "chC" "chMN" "chMX"
Similar results and I also have date, day, month, year as separate vectors outside of the df.
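The same result can be sketched without leaving loose vectors in the workspace by building the new columns inside the cbind() call. The two-row stocks data frame below is invented purely for illustration and assumes emdate is stored as m/d/Y text:

```r
# Invented stand-in for the real stocks data (an assumption)
stocks <- data.frame(
  emdate = c("01/15/2010", "02/16/2010"),
  emV = c(100, 120),
  stringsAsFactors = FALSE
)

realdate <- as.Date(stocks$emdate, format = "%m/%d/%Y")

# Name the new columns directly in cbind() instead of creating
# separate day, month and year vectors first
ostocks <- cbind(
  date = realdate,
  day = as.numeric(format(realdate, "%d")),
  month = as.numeric(format(realdate, "%m")),
  year = as.numeric(format(realdate, "%Y")),
  stocks
)
colnames(ostocks)
```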