Correlation of multiple values across years and strata - r

I have a dataframe that looks like this:
Year Strata Value1 Value2
1999 1 44268 0.8725
1999 2 46009 1.4550
1999 3 27715 3.1100
2000 1 24015 1.5800
2000 2 55601 1.5400
2000 3 15765 3.3200
I want to find out whether Value1 is correlated with Value2 across years and strata. The real dataframe has many more years than this.
Year needs to be the repeated measure, and the analysis needs to be blocked by strata.
How is this done using R? Do you need to use aov()?
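A minimal sketch of two simple starting points, using only base R and the six rows shown in the question. It is not a full repeated-measures analysis: it does not model year as a repeated measure, for which a mixed-effects model would be one option. The object names `r_by_stratum` and `fit` are made up for illustration.

```r
# Sample data copied from the question
df <- data.frame(
  Year   = rep(c(1999, 2000), each = 3),
  Strata = rep(1:3, times = 2),
  Value1 = c(44268, 46009, 27715, 24015, 55601, 15765),
  Value2 = c(0.8725, 1.4550, 3.1100, 1.5800, 1.5400, 3.3200)
)

# Pearson correlation of Value1 and Value2 within each stratum
r_by_stratum <- sapply(split(df, df$Strata),
                       function(d) cor(d$Value1, d$Value2))

# Linear model with Strata as a blocking factor: the Value1 coefficient
# describes the association after removing stratum-level differences
fit <- lm(Value2 ~ Value1 + factor(Strata), data = df)
summary(fit)
```

With only two years per stratum, as in the sample, each within-stratum correlation is trivially +1 or -1, so this only becomes informative on the full dataset.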

Related

In r, how do I add rows together to get totals for a specific set of variables [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 1 year ago.
My goal is to have a list of how much FDI China sent to each country per year. At the moment I have a list of individual projects that looks like this
Year Country Amount
2001 Angola 6000000
2001 Angola 8000000
2001 Angola 5.0E7
I want to sum it so it looks like this.
Year Country Amount
2001 Angola 6.4E7
How do I merge the rows and add the totals to get nice country-year data? I can't find an R command that does this precise thing.
library(tidyverse)
I copied the data table and read your dataframe into R using:
df <- clipr::read_clip_tbl(clipr::read_clip())
I like using dplyr to solve this question:
df2 <- as.data.frame(df %>% group_by(Country,Year) %>% summarize(Amount=sum(Amount)))
# A tibble: 1 x 3
# Groups: Country [1]
Country Year Amount
<chr> <int> <dbl>
1 Angola 2001 64000000
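For comparison, the same totals can probably be obtained without extra packages. This is a sketch using base R's `aggregate()` on the three rows from the question; the name `totals` is made up for illustration.

```r
# Project-level rows copied from the question
df <- data.frame(
  Year    = c(2001, 2001, 2001),
  Country = c("Angola", "Angola", "Angola"),
  Amount  = c(6000000, 8000000, 5.0E7)
)

# One row per Year/Country combination, with Amount summed
totals <- aggregate(Amount ~ Year + Country, data = df, FUN = sum)
totals
```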

How to sum a variable by group but do not aggregate the data frame in R? [duplicate]

This question already has answers here:
Count number of rows per group and add result to original data frame
(11 answers)
Calculate group mean, sum, or other summary stats. and assign column to original data
(4 answers)
Closed 4 years ago.
Although I have found a lot of ways to calculate the sum of a variable by group, all of these approaches end up creating a new data set that collapses the duplicate cases.
To be more precise, if I have a data frame:
id year
1 2010
1 2015
1 2017
2 2011
2 2017
3 2015
and I want to count the number of times I have the same ID by the different years, there are a lot of ways (using aggregate, tapply, dplyr, sqldf etc) which use a "group by" kind of functionality that in the end will give something like:
id count
1 3
2 2
3 1
I haven't managed to find a way to calculate the same thing but keep my original data frame, in order to obtain:
id year count
1 2010 3
1 2015 3
1 2017 3
2 2011 2
2 2017 2
3 2015 1
and therefore do not aggregate my double cases.
Has somebody already figured out how to do this?
Thank you in advance
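One possible approach, sketched with base R's `ave()`, which applies a function per group and returns a vector as long as the input, so the original rows are kept. The data frame below is the example from the question.

```r
# Example data from the question
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3),
  year = c(2010, 2015, 2017, 2011, 2017, 2015)
)

# For each row, the number of rows sharing the same id
df$count <- ave(df$id, df$id, FUN = length)
df
```

With dplyr, `add_count(df, id)` should give a similar result, with the count in a column named `n`.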

Aggregates by group and including counts across rows [duplicate]

This question already has answers here:
Apply several summary functions (sum, mean, etc.) on several variables by group in one call
(7 answers)
Closed 6 years ago.
I have this data frame:
YEAR NATION VOTE
2015 NOR 1
2015 USA 0
2015 CAN 1
2015 RUS 1
2014 USA 1
2014 USA 1
2014 USA 0
2014 NOR 1
2014 NOR 0
2014 CAN 1
...and it goes on and on with more years, nations and votes. VOTE is binary, yes (1) or no (0). I am trying to code an output table that aggregates by year and nation, and that also reports the total number of votes for each nation (the count of 0s and 1s) together with the number of 1s, like the table sketched below (sumVOTES being the total number of votes for that nation that year, i.e. the count of all 1s and 0s):
YEAR NATION VOTE-1 sumVOTES %-1s
2015 USA 8 17 47.1
2015 NOR 7 13 53.8
2015 CAN 3 11 27.2
2014 etc.
etc.
You are not providing your data.frame in a reproducible manner.
But this should work...
library(data.table)
# assuming 'df' is your data.frame
setDT(df)[, .('VOTE-1' = sum(VOTE==1),
'sumVOTES' = .N,
'%-1s' = 1e2*sum(VOTE==1)/.N),
by = .(YEAR, NATION)]
setDT converts data.frame to data.table by reference.
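If data.table is not available, a base R sketch along the same lines is shown below, using the ten rows from the question. The column names `ones`, `total` and `pct` are made up for illustration.

```r
# Rows copied from the question
votes <- data.frame(
  YEAR   = c(2015, 2015, 2015, 2015, 2014, 2014, 2014, 2014, 2014, 2014),
  NATION = c("NOR", "USA", "CAN", "RUS", "USA", "USA", "USA", "NOR", "NOR", "CAN"),
  VOTE   = c(1, 0, 1, 1, 1, 1, 0, 1, 0, 1)
)

# Several summaries per YEAR/NATION group in one call
out <- aggregate(VOTE ~ YEAR + NATION, data = votes,
                 FUN = function(v) c(ones  = sum(v == 1),
                                     total = length(v),
                                     pct   = 100 * mean(v == 1)))

# aggregate() stores the three summaries as a matrix column;
# flatten it into ordinary columns VOTE.ones, VOTE.total, VOTE.pct
out <- do.call(data.frame, out)
out
```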

R - Bootstrap by several column criteria

So what I have is data of cod weights at different ages. This data is taken at several locations over time.
What I would like to create is "weight at age", basically a mean value of weights at a certain age. I want to do this for each location in each year.
However, the ages are not sampled the same way (all old fish caught are measured, while younger fish are subsampled), so I can't just take a normal average. Instead, I would like to bootstrap the samples.
The bootstrap should draw 5 random weight values at a given age, take their mean, repeat this 1000 times, and then average the means. Values should be drawn with replacement. This should be done for each age in every AreaCode for every year. Dependent factors: Year-location-Age.
So here's an example of what my data could look like.
df <- data.frame( Year= rep(c(2000:2008),2), AreaCode = c("39G4", "38G5","40G5"), Age = c(0:8), IndWgt = c(rnorm(18, mean=5, sd=3)))
> df
Year AreaCode Age IndWgt
1 2000 39G4 0 7.317489899
2 2001 38G5 1 7.846606144
3 2002 40G5 2 0.009212455
4 2003 39G4 3 6.498688035
5 2004 38G5 4 3.121134937
6 2005 40G5 5 11.283096043
7 2006 39G4 6 0.258404136
8 2007 38G5 7 6.689780137
9 2008 40G5 8 10.180511929
10 2000 39G4 0 5.972879108
11 2001 38G5 1 1.872273650
12 2002 40G5 2 5.552962065
13 2003 39G4 3 4.897882549
14 2004 38G5 4 5.649438631
15 2005 40G5 5 4.525012587
16 2006 39G4 6 2.985615831
17 2007 38G5 7 8.042884181
18 2008 40G5 8 5.847629941
AreaCode contains the different locations, in reality I have 85 different levels. The time series stretches 1991-2013, the ages 0-15. IndWgt contain the weight. My whole data frame has a row length of 185726.
Also, not every age exists for every location and every year. I don't know if this would be a problem; I mention it so that the script isn't based on references to specific row numbers. There are some NA values in the weight column, but I could just remove them beforehand.
I was thinking that I maybe should use replicate, and apply or another plyr function. I've tried to understand the boot function but I don't really know if I would write my arguments under statistics, and in that case how. So yeah, basically I have no idea.
I would be thankful for any help I can get!
How about this with plyr? I think from the question you wanted to bootstrap only the "young" fish weights and use actual means for the older ones. If not, just keep the bootstrap branch and drop the condition.
require(plyr)
#cod<-read.csv("cod.csv",header=T) #I loaded your data from csv
bootstrap<-function(Age,IndWgt){
  # Age is constant within each Year/AreaCode/Age group, so test its first value
  if(Age[1]>2){
    mean(IndWgt) # old fish: plain mean
  }else{
    mean(replicate(1000,mean(sample(IndWgt,5,replace = TRUE)))) # young fish: mean of 1000 bootstrap means
  }
}
ddply(cod,.(Year,AreaCode,Age),summarize,boot_mean=bootstrap(Age,IndWgt))
Year AreaCode Age boot_mean
1 2000 39G4 0 6.650294
2 2001 38G5 1 4.863024
3 2002 40G5 2 2.724541
4 2003 39G4 3 5.698285
5 2004 38G5 4 4.385287
6 2005 40G5 5 7.904054
7 2006 39G4 6 1.622010
8 2007 38G5 7 7.366332
9 2008 40G5 8 8.014071
PS: If you want to sample all ages in the same way, no need for the function, just:
ddply(cod,.(Year,AreaCode,Age),
summarize,
boot_mean=mean(replicate(1000,mean(sample(IndWgt,5,replace = TRUE)))))
Since you don't provide enough code, it's too hard (lazy) for me to test it properly. You should get your first step using the following code. If you wrap this into replicate, you should get your end result that you can average.
part.result <- aggregate(IndWgt ~ Year + AreaCode + Age, data = df, FUN = function(x) {
  # one bootstrap draw: mean of 5 weights sampled with replacement
  get.em <- sample(x, size = 5, replace = TRUE)
  mean(get.em)
})
To handle any missing combination of year/age/location, you could probably add an if statement checking for NULL/NA and producing a warning and/or skipping the iteration.
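A base R sketch of how the replicate step might be wrapped around the aggregate call, using the example data frame from the question (the seed is added only for reproducibility). The helper name `boot_mean` and its arguments `n_draw` and `n_rep` are made up for illustration. It samples by index because `sample(x, ...)` treats a single number `x` as `1:x`, which would give wrong results for groups with only one fish.

```r
set.seed(1)
df <- data.frame(Year = rep(2000:2008, 2),
                 AreaCode = c("39G4", "38G5", "40G5"),
                 Age = 0:8,
                 IndWgt = rnorm(18, mean = 5, sd = 3))

# Hypothetical helper: average of n_rep bootstrap means,
# each computed from n_draw weights drawn with replacement
boot_mean <- function(x, n_draw = 5, n_rep = 1000) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  mean(replicate(n_rep,
                 mean(x[sample.int(length(x), n_draw, replace = TRUE)])))
}

# One bootstrapped weight-at-age per Year/AreaCode/Age combination present
waa <- aggregate(IndWgt ~ Year + AreaCode + Age, data = df, FUN = boot_mean)
waa
```

Because `aggregate()` only returns combinations that occur in the data, missing year/age/location combinations are simply absent from the result.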

R Table data with a grouping command

This seems like a very simple problem, but I can't seem to sort it out. I have searched this forum, and the topics listed below come close but don't do exactly what I need. I have count data over several years. I want to obtain frequencies of the count value by year. It seems I need a table function with a grouping option, but I haven't found the proper syntax.
Data:
count year
1 15 1957
2 6 1957
3 23 1957
4 23 1957
5 2 1957
6 28 1980
7 15 1980
8 32 1980
9 18 1981
thank you in advance!
Counting the number of elements with the values of x in a vector
grouping data splitted by frequencies
Aggregate data in R
You're looking for the table function. Note that column names are case-sensitive, so use them as they appear in your data. Something like:
with(yourdata, table(year, count))
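A small runnable sketch with the nine rows from the question, assuming the columns are named `count` and `year` as printed there. The object names `freq` and `freq_long` are made up for illustration.

```r
# Data copied from the question
yourdata <- data.frame(
  count = c(15, 6, 23, 23, 2, 28, 15, 32, 18),
  year  = c(1957, 1957, 1957, 1957, 1957, 1980, 1980, 1980, 1981)
)

# Rows are years, columns are distinct count values, cells are frequencies
freq <- with(yourdata, table(year, count))
freq

# Long format: one row per year/count pair with its frequency
freq_long <- as.data.frame(freq)
```

Dimension names of a table are character strings, so a single cell is looked up as `freq["1957", "23"]`.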
