Find the Mean of a Value by Year (not the whole date) - r

I'll preface this by saying I'm very much a self taught beginner with R.
I have a very large data set looking at biological data.
I want to find the mean of the variable "Shoot.Density" for each year, but my dates are entered as "%d/%m/%y". My usual approach therefore splits the data by each individual date rather than by year, e.g.:
tapply(df$Shoot.Density, list(df$Date), mean)
Any help would be much appreciated. I am also happy to paste in a section of my data, but I'm not sure how.

If your Date column is of class Date, you can use format to extract the year as a grouping variable:
tapply(df$Shoot.Density, list(format(df$Date, '%Y')), mean)
If it is stored as a character string in the format %d/%m/%y, you can extract the two-digit year with the substr function:
tapply(df$Shoot.Density, list(substr(df$Date,7,8)), mean)
You can also do this with dplyr:
library(dplyr)
df %>%
  group_by(years = format(Date, '%Y')) %>%
  summarise(means = mean(Shoot.Density))
Another way to do this is with the year function of the data.table package:
library(data.table)
setDT(df)[, mean(Shoot.Density), by = year(Date)]
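As a minimal, self-contained sketch of the character-date case (the four rows are made-up illustration data, not taken from the question), you can convert the column once with as.Date and then group on the extracted year:

```r
# Hypothetical example data; only the column names follow the question
df <- data.frame(
  Date = c("01/06/15", "15/07/15", "03/06/16", "20/08/16"),
  Shoot.Density = c(10, 20, 30, 50)
)

# Parse the dd/mm/yy strings into Date objects
df$Date <- as.Date(df$Date, format = "%d/%m/%y")

# Group by the four-digit year and take the mean of each group
yearly_means <- tapply(df$Shoot.Density, format(df$Date, "%Y"), mean)
yearly_means
```

Converting to Date first tends to be more robust than substr, because it does not depend on every string having exactly the same width.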

Related

Add a value to a column on a different row based on similarities in R

I have the following dataframe in R:
I am trying to make an extra column called "Opposition" which I would like to be the other team that has the same Date, Venue and inverse of the Margin. My expected output is:
Does anyone know how to achieve this in R? Im fairly new and cant quite work it out. Thanks!
We could reverse the Team A values within each group defined by Venue, Date and the absolute value of Margin.
library(dplyr)
df %>%
  group_by(Venue, Date, tmp = abs(Margin)) %>%
  mutate(Opposition = rev(`Team A`)) %>%
  # ungroup first; otherwise select() keeps the grouping column tmp
  ungroup() %>%
  select(-tmp) -> result
result
Suppose that your dataframe is named dataset; then
dataset %>%
  left_join(dataset %>% mutate(Margin = -Margin) %>%
              rename(Opposition = `Team A`),
            # join on the match identifiers and the negated margin
            by = c("Date", "Venue", "Margin")
  )
will do the trick.
Please note that the dplyr package is required for the mutate(), rename() and left_join() functions. The %>% pipe operator comes from the magrittr package, but dplyr re-exports it, so loading dplyr alone is enough. Loading the tidyverse package also works.
Using data.table
library(data.table)
setDT(df)[, Opposition := `Team A`[.N:1], .(Venue, Date, tmp = abs(Margin))]
A base R option with ave + rev may help
transform(
df,
  Opposition = ave(`Team A`, Date, Venue, abs(Margin), FUN = rev)
)
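Since the question's data did not survive, here is a small base R sketch on made-up matches. The column name Team and all values are hypothetical. It assumes each match appears as exactly two rows that share Date and Venue and have opposite margins:

```r
# Hypothetical fixture list: two rows per match, one per team
matches <- data.frame(
  Team   = c("A", "B", "C", "D"),
  Date   = c("2021-03-01", "2021-03-01", "2021-03-08", "2021-03-08"),
  Venue  = c("V1", "V1", "V2", "V2"),
  Margin = c(10, -10, -5, 5)
)

# Within each (Date, Venue, |Margin|) group, reverse the team names so that
# each row receives the name of the other team in the same match
matches$Opposition <- ave(
  matches$Team, matches$Date, matches$Venue, abs(matches$Margin),
  FUN = rev
)
matches
```

If two different matches could share the same date, venue and absolute margin, the groups would contain more than two rows and this approach would need an extra match identifier.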

Is there an R function to summarize individual level data by country and year?

I am trying to make country-level (by year) summaries of a long-form aggregated dataset that has individual-level data. I have tried using dplyr to summarize the average of the variable I am interested in to create a new dataset. However... there appears to be something wrong with my group_by because the answer is only one observation that appears to be the mean of every observation.
data named: "finaldata.giniE",
country variable: "iso3c",
year variable: "date",
individual-level variable of interest: "Ladder.Life.Present"
Note: there are more variables in my data-- could this be an issue?
country_summmary <- finaldata.giniE %>%
select(iso3c, date, Ladder.Life.Present) %>%
group_by(iso3c, date) %>%
summarize(averaged.M = mean(Ladder.Life.Present))
country_summmary
My output appears like this:
> country_summmary
averaged.M
1 5.505455
Thank you!
I changed something and added your suggested code to the front, and it worked. Here is the code that worked:
library(dplyr)
country_summary <- finaldata.gini %>%
group_by(iso3c, date) %>%
select(Ladder.Life.Present) %>%
summarise_each(funs(mean))
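One common cause of a single-row result like the one in the question is that plyr was loaded after dplyr, so that plyr's summarize masks dplyr's and ignores the grouping. Whether that happened here cannot be confirmed from the post. A base R sketch that sidesteps the issue, using made-up rows under the question's column names:

```r
# Hypothetical individual-level data; the values are invented for illustration
finaldata <- data.frame(
  iso3c = c("USA", "USA", "USA", "FRA", "FRA"),
  date = c(2015, 2015, 2016, 2015, 2015),
  Ladder.Life.Present = c(6, 8, 7, 5, 6)
)

# One row per country-year combination, holding the mean of the variable
country_summary <- aggregate(
  Ladder.Life.Present ~ iso3c + date,
  data = finaldata,
  FUN = mean
)
country_summary
```

With dplyr, calling dplyr::summarise() with the explicit namespace prefix avoids a masked function in the same way.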

Calculating yearly return from daily return data

I have imported daily return data for ADSK via a downloaded Yahoo finance .csv file.
ADSKcsv <- read.csv("ADSK.csv", TRUE)
I have converted the .csv file to a data frame
class(ADSKcsv)
I have selected the two relevant columns that I want to work with and sought to take the mean of all daily returns for each year. I do not know how to do this.
aggregate(Close~Date, ADSK, mean)
The above code yields a mean calculation for each date. My objective is to calculate YoY return from this data, first converting daily returns to yearly returns, then using yearly returns to calculate year-over-year returns. I'd appreciate any help.
May I suggest an easier approach?
library(tidyquant)
ADSK_yearly_returns_tbl <- tq_get("ADSK") %>%
tq_transmute(select = close,
mutate_fun = periodReturn,
period = "yearly")
ADSK_yearly_returns_tbl
If you run the above code, it will download the historical prices for a symbol of interest (ADSK in this case) and then calculate the yearly return. An added bonus of this workflow is that you can swap in any symbol of interest without manually downloading and reading files. It also saves you the extra step of calculating the average daily return.
You can extract the year from the date and then aggregate. read.csv reads the Date column as text, so convert it with as.Date before calling format.
This can be done in base R:
aggregate(Close~year, transform(ADSKcsv, year = format(as.Date(Date), '%Y')), mean)
dplyr
library(dplyr)
ADSKcsv %>%
  group_by(year = format(as.Date(Date), '%Y')) %>%
  #Or using lubridate's year function
  #group_by(year = lubridate::year(as.Date(Date))) %>%
  summarise(Close = mean(Close))
Or data.table
library(data.table)
setDT(ADSKcsv)[, .(Close = mean(Close)), by = .(year = format(as.Date(Date), '%Y'))]
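Note that the mean closing price per year is not itself a return. One possible way to get year-over-year returns, sketched here in base R on made-up prices (the data frame prices and its values are hypothetical), is to take the last close of each year and compare consecutive years:

```r
# Hypothetical closing prices; a real Yahoo Finance file has one row per trading day
prices <- data.frame(
  Date = as.Date(c("2018-06-29", "2018-12-31", "2019-06-28",
                   "2019-12-31", "2020-12-31")),
  Close = c(100, 110, 120, 132, 165)
)

# Make sure rows are in chronological order before picking year-end values
prices <- prices[order(prices$Date), ]
year <- format(prices$Date, "%Y")

# Last available close within each year
year_end <- tapply(prices$Close, year, function(x) x[length(x)])

# Year-over-year return: this year's close relative to last year's close
yoy <- year_end[-1] / year_end[-length(year_end)] - 1
yoy
```

The first year has no previous year-end to compare against, so the result has one element fewer than the number of years.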

object Timestamp not found when sort_by [duplicate]

I need to sort a data frame by date in R. The dates are all in the form of "dd/mm/yyyy". The dates are in the 3rd column. The column header is V3. I have seen how to sort a data frame by column and I have seen how to convert the string into a date value. I can't combine the two in order to sort the data frame by date.
Assuming your data frame is named d,
d[order(as.Date(d$V3, format="%d/%m/%Y")),]
Read my blog post, Sorting a data frame by the contents of a column, if that doesn't make sense.
Nowadays, the lubridate and dplyr packages offer a convenient way to do this.
lubridate contains a number of functions that make parsing dates into POSIXct or Date objects easy. Here we use dmy which automatically parses dates in Day, Month, Year formats. Once your data is in a date format, you can sort it with dplyr::arrange (or any other ordering function) as desired:
d$V3 <- lubridate::dmy(d$V3)
dplyr::arrange(d, V3)
If you want to sort dates in descending order, note that the minus sign does not work with Dates. However, you can get the same effect with a general-purpose function, rev(), combined with order():
#init data
DF <- data.frame(ID=c('ID3', 'ID2','ID1'), end=c('4/1/09 12:00', '6/1/10 14:20', '1/1/11 11:10'))
#change order (the format string assumes day/month/2-digit year; adjust it if your data is month/day)
out <- DF[rev(order(as.Date(DF$end, format = "%d/%m/%y"))),]
Hope it helped.
You can use order() to sort date data.
# Sort date ascending order
d[order(as.Date(d$V3, format = "%d/%m/%Y")),]
# Sort date descending order
d[rev(order(as.Date(d$V3, format = "%d/%m/%Y"))),]
Hope this helps,
Link to my quora answer https://qr.ae/TWngCe
Thanks
If your data is already sorted from newest to oldest and you just want to flip it to oldest to newest in R, you can simply reverse the row order:
dataframe <- dataframe[nrow(dataframe):1,]
It has saved me from exporting to and from Excel just to sort Yahoo Finance data.
The only way I found to work with hours, given a US-format source (mm/dd/yyyy HH:MM:SS AM/PM, matching the format string used here):
df_dataSet$time <- as.POSIXct( df_dataSet$time , format = "%m/%d/%Y %I:%M:%S %p" , tz = "GMT")
class(df_dataSet$time)
df_dataSet <- df_dataSet[order(df_dataSet$time), ]
You could also use arrange from the dplyr library.
The following snippet will modify your original date string to a date object, and order by it. This is a good approach, as you store a date as a date, not just a string of characters.
dates <- dates %>%
mutate(date = as.Date(date, "%d/%m/%Y")) %>%
arrange(date)
If you want to keep the column as a character string and only order by the parsed date (usually an inferior option), you can do this:
dates <- dates %>%
  arrange(as.Date(date, "%d/%m/%Y"))
If you have a dataset named daily_data:
daily_data <- daily_data[order(as.Date(daily_data$date, format="%d/%m/%Y")),]
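To see the date-based ordering end to end, here is a small base R sketch on made-up rows. The data frame d and its values are hypothetical, and only the column name V3 and the dd/mm/yyyy format follow the question. It uses the decreasing argument of order() as an alternative to rev() for the descending case:

```r
# Hypothetical data frame with dd/mm/yyyy strings in the third column
d <- data.frame(
  V1 = 1:3,
  V2 = c("a", "b", "c"),
  V3 = c("15/03/2021", "02/11/2020", "28/01/2021")
)

# Parse once, then reuse the parsed dates for both orderings
parsed <- as.Date(d$V3, format = "%d/%m/%Y")

asc  <- d[order(parsed), ]                     # oldest first
desc <- d[order(parsed, decreasing = TRUE), ]  # newest first
asc
desc
```

Sorting the raw strings instead would put "02/11/2020" first only by coincidence, because character sorting compares the day digits before the year.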

R - converting Character strings into date with only Year/Month

Goal: Plot a time series.
Problem: X-axis data is of course viewed as a character and I'm having trouble converting the character into a date.
new.df <- df %>%
group_by(Month, Year) %>%
summarise(n = n())
new.df <- new.df %>%
unite(Date, Month, Year, sep = "/") %>%
mutate(Total = cumsum(n))
So, I end up with a data frame looking like this:
Date n Group Total
8/2010 1 1 1
9/2010 414 1 415
etc
I'm trying to convert the Date column into a Date format. The column is a character class. So, I tried doing
new.df$Date <- as.Date(New.Patients$Date, "%m/%Y")
However, when I do that, it replaces the entire Date column into NA's.
I'm not sure if this is because my single-digit month dates do not have 0's in front or not. I did the unite() function just because I thought it may make it easier, but it might not.
I originally created the Year/Month variable with the lubridate package but I wasn't sure I could incorporate that here. Bonus points if someone can show me how.
I would appreciate any help or guidance. I'm sure it's not that hard I am just having a major brain fart at the moment.
You can try like this:
library(zoo) # for yearmon
new.df$Date <- as.yearmon(New.Patients$Date, format="%m/%Y")
But if you really need the result to be of class Date, then I guess you have to supply a day (e.g. 01), as @lukeA suggested in a comment.
My issue, as pointed out by lukeA in the comments, was that the as.Date function requires a day to be somewhere within the character string.
Therefore, pasting "01" (or any other day that is valid in every month, i.e. 01 to 28) to the front of each date, and adding %d to the format string, fixed the issue.
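A minimal sketch of that fix in base R, using the two month/year strings shown in the question (the variable names month_year and dates are made up for illustration):

```r
# Month/year strings as produced by unite(); single-digit months are fine
month_year <- c("8/2010", "9/2010")

# Prepend a day so that as.Date has a complete date to parse
dates <- as.Date(paste0("01/", month_year), format = "%d/%m/%Y")
dates
class(dates)
```

Each value then represents the first day of its month, which is usually good enough for the x-axis of a monthly time series.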
