How to use group_by without ordering alphabetically?

I'm trying to visualize some bird data, however after grouping by month, the resulting output is out of order from the original data. It is in order for December, January, February, and March in the original, but after manipulating it results in December, February, January, March.
Any ideas how I can fix this or sort the rows?
This is the code:
BirdDataTimeClean <- BirdDataTimes %>%
group_by(Date) %>%
summarise(Gulls=sum(Gulls), Terns=sum(Terns), Sandpipers=sum(Sandpipers),
Plovers=sum(Plovers), Pelicans=sum(Pelicans), Oystercatchers=sum(Oystercatchers),
Egrets=sum(Egrets), PeregrineFalcon=sum(Peregrine_Falcon), BlackPhoebe=sum(Black_Phoebe),
Raven=sum(Common_Raven))
BirdDataTimeClean2 <- BirdDataTimeClean %>%
pivot_longer(!Date, names_to = "Species", values_to = "Count")

You haven't shared any workable data, but I run into this often when reading from CSV, because all dates and other data come in as character.
As suggested, convert the date column to the Date type using the lubridate package or base as.Date(); after that, arrange() in dplyr (or even group_by() alone) will order it correctly.
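For the base R route, a minimal sketch might look like the following. It assumes dates written like "2020-Feb-20" and an English locale (the %b format code matches month abbreviations in the session's locale); the vector name dates_chr is made up for illustration.

```r
# Hypothetical character dates, deliberately out of order
dates_chr <- c("2020-Feb-20", "2020-Jan-20", "2020-Dec-20", "2020-Apr-20")

# %Y = 4-digit year, %b = abbreviated month name (locale-dependent), %d = day
dates <- as.Date(dates_chr, format = "%Y-%b-%d")

class(dates)                 # now "Date" rather than "character"
sorted_dates <- sort(dates)  # chronological order, not alphabetical
sorted_dates
```

Once the column is a real Date, sorting and grouping follow the calendar rather than the alphabet.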
Example with toy data:
library(data.table); library(dplyr); library(lubridate)
birds <- data.table(dates = c("2020-Feb-20","2020-Jan-20","2020-Dec-20","2020-Apr-20"),
species = c('Gulls','Terns','Gulls','Sandpiper'),
Counts = c(20,30,40,50))
str(birds) will show that dates is character (and I have deliberately not kept the rows in order).
Convert the dates using lubridate:
birds$dates%>%lubridate::ymd() will change them to the Date data type
birds$dates%>%ymd()%>%str()
Date[1:4], format: "2020-02-20" "2020-01-20" "2020-12-20" "2020-04-20"
Save it with birds$dates <- ymd(birds$dates), or do it in your pipeline as follows.
Now simply do the dplyr analysis:
birds%>%group_by(Months= ymd(dates))%>%
summarise(N=n()
,Species_Count = sum(Counts)
)%>%arrange(Months)
will give
# A tibble: 4 x 3
Months N Species_Count
<date> <int> <dbl>
1 2020-01-20 1 30
2 2020-02-20 1 20
3 2020-04-20 1 50
4 2020-12-20 1 40
However, if you want month names such as Apr or Jan instead of numbers and reformat the dates (for example with format()), they become "character" again. I would suggest keeping your data as Date and formatting it only when presenting output to others; if you use DT or another table package, check its output formatting options. That way your original data stays intact and users see what they want.
This will make it character:
birds%>%group_by(Months= as.character.Date(dates))%>%
summarise(N=n()
,Species_Count = sum(Counts)
)%>%arrange(Months)
# A tibble: 4 x 3
Months N Species_Count
<chr> <int> <dbl>
1 2020-Apr-20 1 50
2 2020-Dec-20 1 40
3 2020-Feb-20 1 20
4 2020-Jan-20 1 30
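If the grouping column has to stay as month names, as in the question, another option (a different technique from the date conversion above) is to turn it into a factor whose levels follow the order of first appearance. This is only a sketch: the data frame bird_counts and its values are made up, and it assumes the original rows are already in the desired order.

```r
library(dplyr)

# Hypothetical data: month labels stored as character, in survey order
bird_counts <- data.frame(
  Date  = c("December", "December", "January", "February", "March"),
  Gulls = c(5, 3, 7, 2, 4)
)

monthly <- bird_counts %>%
  # unique() keeps first-appearance order, so the factor levels follow the data
  mutate(Date = factor(Date, levels = unique(Date))) %>%
  group_by(Date) %>%
  summarise(Gulls = sum(Gulls))

monthly
```

group_by() orders groups by factor level rather than alphabetically, so the summary keeps December, January, February, March, and the same order carries through pivot_longer() and into plot axes.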

Related

Easiest way to convert a data.frame to a time series object in R

I need to read a data series stored in a .csv in R and analyze it using the library TSstudio. This data series consists of two columns, the first one stores the date, the second one stores a floating point value measured daily. As straightforward as it could get.
So I first read the csv as a data.frame:
a_data_frame <- read.csv("some_data.csv", sep=";", dec = ",", col.names=c("date", "value"))
head(a_data_frame)
A data.frame: 6 × 2
date value
<chr> <dbl>
1 04/06/1986 0.065041
2 05/06/1986 0.067397
3 06/06/1986 0.066740
4 09/06/1986 0.068247
5 10/06/1986 0.067041
6 11/06/1986 0.066740
The values in the first column are of type char, so I convert them to date thanks to the library lubridate:
library(lubridate)
a_data_frame$date <- dmy(a_data_frame$date)
head(a_data_frame)
A data.frame: 6 × 2
date value
<date> <dbl>
1 1986-06-04 0.065041
2 1986-06-05 0.067397
3 1986-06-06 0.066740
4 1986-06-09 0.068247
5 1986-06-10 0.067041
6 1986-06-11 0.066740
Here comes my headache. When I try to convert the data.frame to time series, I get a matrix of type character instead:
a_time_series <- as.ts(a_data_frame)
head(a_time_series)
A matrix: 6 × 2 of type chr
date value
1986-06-04 0.065041
1986-06-05 0.067397
1986-06-06 0.066740
1986-06-09 0.068247
1986-06-10 0.067041
1986-06-11 0.066740
Is there any other way to convert a data.frame to a ts object?
Assuming some_data.csv is generated reproducibly as in the Note below, read it into a zoo series and then use as.ts. That gives a daily series with NAs for the missing days, with time measured as the number of days since the Epoch. That may or may not be the ts object you want, but the question did not specify it further. Also see this answer.
library(zoo)
z <- read.csv.zoo("some_data.csv", format = "%d/%m/%Y")
tt <- as.ts(z); tt
## Time Series:
## Start = 5998
## End = 6005
## Frequency = 1
## [1] 0.065041 0.067397 0.066740 NA NA 0.068247 0.067041 0.066740
Note
Lines <- "date,value
04/06/1986,0.065041
05/06/1986,0.067397
06/06/1986,0.066740
09/06/1986,0.068247
10/06/1986,0.067041
11/06/1986,0.066740"
cat(Lines, file = "some_data.csv")
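To make the "NA for missing days, time in days since the Epoch" behaviour concrete, here is a rough base R sketch that builds a similar ts by hand. It is not how zoo does it internally; it just assumes the date column has already been converted to Date (as in the question) and pads the gaps with merge(). The names all_days and filled are made up.

```r
# Same six observations as in the question, with dates already parsed
a_data_frame <- data.frame(
  date  = as.Date(c("1986-06-04", "1986-06-05", "1986-06-06",
                    "1986-06-09", "1986-06-10", "1986-06-11")),
  value = c(0.065041, 0.067397, 0.066740, 0.068247, 0.067041, 0.066740)
)

# Every calendar day between the first and last observation
all_days <- data.frame(
  date = seq(min(a_data_frame$date), max(a_data_frame$date), by = "day")
)

# Left join: days without an observation get NA in 'value'
filled <- merge(all_days, a_data_frame, by = "date", all.x = TRUE)

# Build the ts from the numeric column only; the time index is the
# number of days since 1970-01-01
tt <- ts(filled$value,
         start = as.numeric(min(a_data_frame$date)),
         frequency = 1)
tt
```

Passing only the numeric column to ts() avoids the character matrix from the question, which appears because as.ts() coerces the whole data frame (dates included) to a single matrix type.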

Trying to convert month number to month name in a date set

I'm getting NA values when I try to replace the month number with the month name using the code below:
total_trips_v2$month <- ordered(total_trips_v2$month, levels=c("Jul","Aug","Sep","Oct", "Nov","Dec","Jan", "Feb", "Mar","Apr","May","Jun"))
I'm working with a big data set where the month column is of char data type and the months are numbered '06', '07' and so on, starting with 06.
I'm not quite sure what the ordered function in the code I used really does. I saw it somewhere and used it. I tried to look up code to replace specific values in rows, but it looked very confusing.
Can anyone help me out with this?
Working with data types can be confusing at times, but it helps you with what you want to achieve. Thus, make sure you understand how to move from type to type!
There are some "helpers" built into R to work with months and months' names.
Below we have a "character" vector in our data frame, i.e. df$month.
The helper vectors in R are month.name (full month names) and month.abb (abbreviated month names).
You can index a vector by calling the element of the vector at the n-th position.
Thus, month.abb[6] will return "Jun".
We use this to coerce the month to "numeric" and then recode it with the abbreviated names.
# simulating some data
df <- data.frame(month = c("06","06","07","09","01","02"))
# test index month name
month.abb[6]
# check what happens to our column vector - for this we coerce the 06,07, etc. to numbers!
month.abb[as.numeric(df$month)]
# now assign the result
df$month_abb <- month.abb[as.numeric(df$month)]
This yields:
df
month month_abb
1 06 Jun
2 06 Jun
3 07 Jul
4 09 Sep
5 01 Jan
6 02 Feb
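As for ordered() from the question: it builds an ordered factor, and any value that is not listed in levels becomes NA. Since "06" is not one of "Jul", "Aug", ..., every row turned into NA. A sketch that combines the lookup above with the July-to-June ordering from the question could look like this (the name fiscal_levels is made up):

```r
# Same simulated data as above
df <- data.frame(month = c("06", "06", "07", "09", "01", "02"))

# Step 1: recode the character month numbers to abbreviated names
df$month_abb <- month.abb[as.integer(df$month)]

# Step 2: only now apply the custom ordering, because the values
# ("Jun", "Jul", ...) actually match the levels
fiscal_levels <- c("Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
                   "Jan", "Feb", "Mar", "Apr", "May", "Jun")
df$month_ord <- factor(df$month_abb, levels = fiscal_levels, ordered = TRUE)

# Sorting now follows the July-to-June order rather than the alphabet
sort(df$month_ord)
```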
The lubridate package can also help you extract certain components of datetime objects, such as month number or name.
Here, I have made some sample dates:
tibble(
date = c('2021-01-01', '2021-02-01', '2021-03-01')
) %>%
{. ->> my_dates}
my_dates
# # A tibble: 3 x 1
# date
# <chr>
# 2021-01-01
# 2021-02-01
# 2021-03-01
The first thing we need to do is convert these character-formatted values to date-formatted values. We use lubridate::ymd() to do this:
my_dates %>%
mutate(
date = ymd(date)
) %>%
{. ->> my_dates_formatted}
my_dates_formatted
# # A tibble: 3 x 1
# date
# <date>
# 2021-01-01
# 2021-02-01
# 2021-03-01
Note that the format printed under the column name (date) has changed from <chr> to <date>.
Now that the dates are in <date> format, we can pull out different components using lubridate::month(). See ?month for more details.
my_dates_formatted %>%
mutate(
month_num = month(date),
month_name_abb = month(date, label = TRUE),
month_name_full = month(date, label = TRUE, abbr = FALSE)
)
# # A tibble: 3 x 4
# date month_num month_name_abb month_name_full
# <date> <dbl> <ord> <ord>
# 2021-01-01 1 Jan January
# 2021-02-01 2 Feb February
# 2021-03-01 3 Mar March
See my answer to your other question here, but when working with dates in R, it is good to leave them in the default YYYY-MM-DD format. This generally makes calculations and manipulations more straightforward. The month names as shown above can be good for making labels, for example when making figures and labelling data points or axes.
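If adding lubridate is not an option, roughly the same columns can be built in base R. This is a sketch rather than a drop-in replacement: it uses the built-in month.abb and month.name constants, so the labels are always English regardless of locale.

```r
dates <- as.Date(c("2021-01-01", "2021-02-01", "2021-03-01"))

# POSIXlt stores the month as 0..11, hence the + 1
month_num <- as.POSIXlt(dates)$mon + 1

# Ordered factors, so that sorting and plotting follow the calendar
month_name_abb  <- factor(month.abb[month_num],  levels = month.abb,  ordered = TRUE)
month_name_full <- factor(month.name[month_num], levels = month.name, ordered = TRUE)

data.frame(date = dates, month_num, month_name_abb, month_name_full)
```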

Can't convert chr to numeric in R-studio

I get NAs when I try to convert into numeric values (see below).
I'm supposed to turn these annual data frames into monthly ones. To do this I need to make the numbers numeric, but I get NAs when I try. Does anyone know why?
When you unlist() the data frame, it turns it into a vector. Here are a couple of lines of the data that I can see from your post (with shorter variable names).
TBS <- tibble::tibble(
desc = c("1934-01", "1934-02"),
rate = c("0.72", "0.6")
)
unlist(TBS)
# desc1 desc2 rate1 rate2
# "1934-01" "1934-02" "0.72" "0.6"
When you do as.numeric() on that vector, it turns the dates into missing values (NA), because strings like "1934-01" cannot be parsed as numbers. I think that's what the output above in your RStudio window shows us.
as.numeric(unlist(TBS))
# [1] NA NA 0.72 0.60
You're probably better off just fixing the variables in place in the data frame, like this:
library(zoo)
library(lubridate)
library(dplyr)
TBS <- TBS %>%
mutate(desc = as.yearmon(desc),
year = year(desc),
rate = as.numeric(rate))
TBS
# A tibble: 2 x 3
# desc rate year
# <yearmon> <dbl> <dbl>
# 1 Jan 1934 0.72 1934
# 2 Feb 1934 0.6 1934
Then you could do whatever you need (e.g., average) over the years. If it was just a straight average, you could do.
TBS %>%
group_by(year) %>%
summarise(mean_rate = mean(rate))
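The same idea also works without extra packages. The sketch below is a base R variant under a couple of assumptions: the desc strings always look like "YYYY-MM", and a third made-up row is added so that there is more than one year to average over.

```r
# Hypothetical data in the same shape as above (the 1935 row is invented)
TBS <- data.frame(
  desc = c("1934-01", "1934-02", "1935-01"),
  rate = c("0.72", "0.6", "0.5"),
  stringsAsFactors = FALSE
)

# Convert each column on its own instead of unlist()-ing the whole frame
TBS$rate <- as.numeric(TBS$rate)
TBS$date <- as.Date(paste0(TBS$desc, "-01"))   # first day of each month
TBS$year <- as.integer(format(TBS$date, "%Y"))

# Mean rate per year
yearly <- aggregate(rate ~ year, data = TBS, FUN = mean)
yearly
```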

R aggregate a dataframe by hours from a date with time field

I'm relatively new to R but I am very familiar with Excel and T-SQL.
I have a simple dataset that has a date with time and a numeric value associated with it. What I'd like to do is summarize the numeric values by hour of the day. I've found a couple of resources for working with time types in R, but I was hoping to find a solution similar to what is offered in Excel (where I can call a function, pass in my date/time data, and have it return the hour of the day).
Any suggestions would be appreciated - thanks!
library(readr)
library(dplyr)
library(lubridate)
df <- read_delim('DateTime|Value
3/14/2015 12:00:00|23
3/14/2015 13:00:00|24
3/15/2015 12:00:00|22
3/15/2015 13:00:00|40',"|")
df %>%
mutate(hour_of_day = hour(as.POSIXct(strptime(DateTime, "%m/%d/%Y %H:%M:%S")))) %>%
group_by(hour_of_day) %>%
summarise(meanValue = mean(Value))
Breakdown:
Convert the DateTime column (character) into a date-time, then use hour() from lubridate to pull out just the hour value and put it into a new column named hour_of_day.
> df %>%
mutate(hour_of_day = hour(as.POSIXct(strptime(DateTime, "%m/%d/%Y %H:%M:%S"))))
Source: local data frame [4 x 3]
DateTime Value hour_of_day
1 3/14/2015 12:00:00 23 12
2 3/14/2015 13:00:00 24 13
3 3/15/2015 12:00:00 22 12
4 3/15/2015 13:00:00 40 13
The group_by(hour_of_day) call sets the groups over which mean(Value) is computed in the summarise(...) call.
This gives the result:
hour_of_day meanValue
1 12 22.5
2 13 32.0
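For comparison, here is a base R sketch of the same aggregation, closer to the "call a function and get the hour back" style from Excel. It assumes the timestamps can be treated as UTC; the tz argument is set explicitly so the result does not depend on the machine's time zone.

```r
df <- data.frame(
  DateTime = c("3/14/2015 12:00:00", "3/14/2015 13:00:00",
               "3/15/2015 12:00:00", "3/15/2015 13:00:00"),
  Value = c(23, 24, 22, 40)
)

# Parse the character column into a date-time
parsed <- as.POSIXct(df$DateTime, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# format(..., "%H") returns the hour as text; convert it to an integer
df$hour_of_day <- as.integer(format(parsed, "%H"))

# Mean of Value within each hour of the day
hourly <- aggregate(Value ~ hour_of_day, data = df, FUN = mean)
hourly
```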

Split date data (m/d/y) into 3 separate columns

I need to convert dates (m/d/y format) into 3 separate columns on which I hope to run an algorithm. (I'm trying to convert my dates into Julian Day Numbers.) I saw this suggestion for another user for separating data out into multiple columns using Oracle. I'm using R and am thoroughly stuck on how to code this appropriately. Would A1, A2... represent my new column headings, and what would the format difference be with the "update set" section?
update <tablename> set A1 = substr(ORIG, 1, 4),
A2 = substr(ORIG, 5, 6),
A3 = substr(ORIG, 11, 6),
A4 = substr(ORIG, 17, 5);
I'm trying hard to improve my skills in R but cannot figure this one...any help is much appreciated. Thanks in advance... :)
I use the format() method for Date objects to pull apart dates in R. Using Dirk's datetext, here is how I would go about breaking up a date into its constituent parts:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
datetxt <- as.Date(datetxt)
df <- data.frame(date = datetxt,
year = as.numeric(format(datetxt, format = "%Y")),
month = as.numeric(format(datetxt, format = "%m")),
day = as.numeric(format(datetxt, format = "%d")))
Which gives:
> df
date year month day
1 2010-01-02 2010 1 2
2 2010-02-03 2010 2 3
3 2010-09-10 2010 9 10
Note what several others have said; you can get the Julian dates without splitting out the various date components. I added this answer to show how you could do the breaking apart if you needed it for something else.
Given a text variable x, like this:
> x
[1] "10/3/2001"
then:
> as.Date(x,"%m/%d/%Y")
[1] "2001-10-03"
converts it to a date object. Then, if you need it:
> julian(as.Date(x,"%m/%d/%Y"))
[1] 11598
attr(,"origin")
[1] "1970-01-01"
gives you a Julian date (relative to 1970-01-01).
Don't try the substring thing...
See help(as.Date) for more.
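Note that julian() counts days from 1970-01-01, which is not the astronomical Julian Day Number the question mentions. If that is what is needed, one way to get it is to add a fixed offset, since 1970-01-01 corresponds to Julian Day Number 2440588. A small sketch (the constant name jdn_of_epoch is made up):

```r
x <- "10/3/2001"
d <- as.Date(x, format = "%m/%d/%Y")

# A Date is stored as the number of days since 1970-01-01
days_since_epoch <- as.numeric(d)

# Julian Day Number of 1970-01-01
jdn_of_epoch <- 2440588

jdn <- days_since_epoch + jdn_of_epoch
jdn
```

A quick sanity check is 2000-01-01, whose Julian Day Number is 2451545.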
Quick ones:
Julian date converters already exist in base R, see eg help(julian).
One approach may be to parse the date as a POSIXlt and to then read off the components. Other date / time classes and packages will work too but there is something to be said for base R.
Parsing dates as strings is almost always a bad approach.
Here is an example:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
dates <- as.Date(datetxt) ## you could examine these as well
plt <- as.POSIXlt(dates) ## now as POSIXlt types
plt[["year"]] + 1900 ## years are with offset 1900
#[1] 2010 2010 2010
plt[["mon"]] + 1 ## and months are on the 0 .. 11 interval
#[1] 1 2 9
plt[["mday"]]
#[1] 2 3 10
df <- data.frame(year=plt[["year"]] + 1900,
month=plt[["mon"]] + 1, day=plt[["mday"]])
df
# year month day
#1 2010 1 2
#2 2010 2 3
#3 2010 9 10
And of course
julian(dates)
#[1] 14611 14643 14862
#attr(,"origin")
#[1] "1970-01-01"
To convert dates (m/d/y format) into 3 separate columns, consider this df:
df <- data.frame(date = c("01-02-18", "02-20-18", "03-23-18"))
df
date
1 01-02-18
2 02-20-18
3 03-23-18
Convert to date format
df$date <- as.Date(df$date, format="%m-%d-%y")
df
date
1 2018-01-02
2 2018-02-20
3 2018-03-23
To get three separate columns with year, month and day:
library(lubridate)
df$year <- year(ymd(df$date))
df$month <- month(ymd(df$date))
df$day <- day(ymd(df$date))
df
date year month day
1 2018-01-02 2018 1 2
2 2018-02-20 2018 2 20
3 2018-03-23 2018 3 23
Hope this helps.
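One thing to watch with two-digit years: on input, %y maps 00 to 68 onto 20xx and 69 to 99 onto 19xx, as described in ?strptime. The sketch below uses made-up dates to show the cutoff; if the data contains older dates, the century may need to be fixed after parsing.

```r
# Hypothetical m-d-y strings with two-digit years on both sides of the cutoff
two_digit <- c("01-02-18", "01-02-68", "01-02-69")

parsed <- as.Date(two_digit, format = "%m-%d-%y")

# Inspect which century each year ended up in
data.frame(input = two_digit, parsed = parsed, year = format(parsed, "%Y"))
```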
Hi Gavin: another way [using your idea] is:
The data-frame we will use is oilstocks which contains a variety of variables related to the changes over time of the oil and gas stocks.
The variables are:
colnames(stocks)
"bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC"
"emMN" "emMN.1" "chdate" "chV" "cbO" "chC" "chMN" "chMX"
One of the first things to do is change the emdate field, which is an integer vector, into a date vector.
realdate<-as.Date(emdate,format="%m/%d/%Y")
Next we want to split emdate column into three separate columns representing month, day and year using the idea supplied by you.
> dfdate <- data.frame(date=realdate)
year=as.numeric (format(realdate,"%Y"))
month=as.numeric (format(realdate,"%m"))
day=as.numeric (format(realdate,"%d"))
ls() will include the individual vectors, day, month, year and dfdate.
Now merge the dfdate, day, month, year into the original data-frame [stocks].
ostocks<-cbind(dfdate,day,month,year,stocks)
colnames(ostocks)
"date" "day" "month" "year" "bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC" "emMN" "emMX" "chdate" "chV"
"cbO" "chC" "chMN" "chMX"
Similar results and I also have date, day, month, year as separate vectors outside of the df.

Resources