The data I am working with is from eBird, and I am looking to sort species occurrences by both name and year. There are over 30k individual observations, each with its own count of birds. From the raw data posted below, on Jan 1, 2021 someone observed 2 Cooper's Hawks, and so on.
Raw looks like this:
specificName indivualCount eventDate year
Cooper's Hawk 1 (1/1/2018) 2018
Cooper's Hawk 1 (1/1/2020) 2020
Cooper's Hawk 2 (1/1/2021) 2021
Ideally, I would be able to group all the Cooper's Hawk records by specificName and the year they were observed and sum the individual counts. That way I can make statistical comparisons between the numbers of birds observed in 2018, 2019, 2020, and 2021.
I created a separate column for the year:
year <- as.POSIXct(ebird.df$eventDate, format = "%m/%d/%Y")
ebird.df$year <- as.numeric(format(year, "%Y"))
Then I aggregated with the following:
aggdata <- aggregate(ebird.df$individualCount , by = list( ebird.df$specificname, ebird.df$year ), FUN = sum)
There are hundreds of bird species, so Cooper's Hawk starts on the 115th row and the output looks like this:
Group.1 Group.2 x
115 2018 Cooper's Hawk 86
116 2019 Cooper's Hawk 152
117 2020 Cooper's Hawk 221
118 2021 Cooper's Hawk 116
My question is: how do I get the data into a table that looks like the following?
Species Name 2018 2019 2020 2021
Cooper's Hawk 86 152 221 116
I eventually want to run some basic ecology stats on the data using vegan, but one problem at a time, I guess.
Thanks!
There are errors in the data and code in the question, so we used the code and reproducible data given in the Note at the end.
Now, using xtabs we get an xtabs table directly from ebird.df like this. No packages are used.
xtabs(individualCount ~ specificName + year, ebird.df)
## year
## specificName 2018 2020 2021
## Cooper's Hawk 1 1 2
Optionally convert it to a data.frame:
xtabs(individualCount ~ specificName + year, ebird.df) |>
as.data.frame.matrix()
## 2018 2020 2021
## Cooper's Hawk 1 1 2
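If you also want the species as a regular column (the Species Name column in your desired layout) rather than as row names, one way is the following sketch, which builds on the xtabs result above; the object name wide is just for illustration:
wide <- as.data.frame.matrix(xtabs(individualCount ~ specificName + year, ebird.df))
wide <- data.frame(`Species Name` = rownames(wide), wide, check.names = FALSE)
rownames(wide) <- NULL
wide
##    Species Name 2018 2020 2021
## 1 Cooper's Hawk    1    1    2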
Although we did not need aggdata, if you need it for some other reason it can be computed using the formula method of aggregate like this:
aggregate(individualCount ~ specificName + year, ebird.df, sum)
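And if you prefer to pivot that long aggdata result into the one-row-per-species layout yourself, base R's reshape can do it. A sketch, assuming the formula call above:
aggdata <- aggregate(individualCount ~ specificName + year, ebird.df, sum)
reshape(aggdata, idvar = "specificName", timevar = "year", direction = "wide")
## the value columns come out named individualCount.2018, individualCount.2020, ...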
Note
Lines <- "specificName,individualCount,eventDate,year
\"Cooper's Hawk\",1,(1/1/2018),2018
\"Cooper's Hawk\",1,(1/1/2020),2020
\"Cooper's Hawk\",2,(1/1/2021),2021"
ebird.df <- read.csv(text = Lines, strip.white = TRUE)
I have data like this (about credit rating and default):
credit rating  Normal  Default
1st grade         220        0
2nd grade         737        3
3rd grade         680        7
4th grade          73        3
5th grade           6        0
I ran Fisher's exact test in R.
First, I saved the data in an object named chisq.
Second, I ran fisher.test using this code:
fisher.test(chisq,hybrid=TRUE)
Then I got a p-value of 0.02791.
But my colleague ran the same test in SAS and got a p-value of 0.0503.
I can't understand why the test results differ between R and SAS.
Please help.
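For reference, this is roughly how the data and call look in R (a sketch reconstructed from the table above; fisher.test needs the counts as a matrix):
# 5 x 2 table of counts from the table above: rows = credit grade, columns = Normal / Default
chisq <- matrix(c(220, 0,
                  737, 3,
                  680, 7,
                   73, 3,
                    6, 0),
                ncol = 2, byrow = TRUE,
                dimnames = list(c("1st grade", "2nd grade", "3rd grade", "4th grade", "5th grade"),
                                c("Normal", "Default")))
# For tables larger than 2 x 2, hybrid = TRUE uses an approximation to the exact
# probabilities, so the p-value can differ from a fully exact computation (hybrid = FALSE)
fisher.test(chisq, hybrid = TRUE)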
I am trying to export a summary of categorical variables through R Markdown. The output of the summary is in Czech with diacritics, but R isn't encoding it correctly.
Example
summary(data$Et6_d1q_ii)
3 a vĂce hodin 1 - 2 hodiny mĂ©nÄ› neĹľ 1 hodinu žádnĂ˝ NA's
113 240 196 111 6932
Is there any way to set the encoding globally so the output is readable? I wasn't able to find it anywhere.
Thank you!
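One global setting that is often involved here is the session locale. A sketch, assuming the source data are UTF-8 and the session runs on Windows (the file name is hypothetical):
# Use a Czech locale for character handling; on Linux/macOS the name would be "cs_CZ.UTF-8"
Sys.setlocale("LC_CTYPE", "Czech")
# Declare the encoding explicitly when reading the data
data <- read.csv("data.csv", fileEncoding = "UTF-8")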
I have two datasets, and I need to merge specific points from them into a third matrix which I will create.
I am trying to create a matrix with stock returns for all the companies in my dataset.
My dataset of the companies (referencedata) looks like this:
Company PERMNO earlengage
A 45643 6/7/2011
B 86743 9/12/2012
C 75423 3/4/2011
D 95345 2/11/2011
......
My dataset of the stock returns (datastock) looks like this:
PERMNO date returns
11456 1/3/2011 3.4%
11456 1/4/2011 5.4%
11456 1/5/2011 0.5%
11456 1/6/2011 1.2%
11456 1/7/2011 0.7%
......
I need to use the PERMNO code in referencedata as an identifier to locate the company I am looking for in datastock. At the same time, I need to use earlengage in referencedata to find the matching date in datastock and then select the 250 return data points prior to that day.
I want to put all these 250 data points for each stock in one matrix (250 rows for the returns and n columns, one per stock).
I am struggling to replicate the equivalent of Excel's VLOOKUP function. The output matrix would look like this:
PERMNO date returns
45643 1/3/2011 3.4%
45643 1/4/2011 5.4%
45643 1/5/2011 0.5%
......
45643 6/7/2011 1.2%
(this is the earlengage date)
Any help would be much appreciated.
The way I see it, you are trying to solve two problems in one shot. The first is merging, and the other is taking the last 250 data points and converting them into a matrix. I'd approach this in the simplest way possible by going through the rows one by one rather than trying to solve it with a single function.
# Convert the date columns so that sorting and comparison are chronological
datastock$date <- as.Date(datastock$date, format = "%m/%d/%Y")
referencedata$earlengage <- as.Date(referencedata$earlengage, format = "%m/%d/%Y")
# Sorting so that we can take the bottom 250 rows to find the latest data
datastock <- datastock[order(datastock$date), ]
dataMatrix <- NULL
for (i in 1:nrow(referencedata))
{
  # Returns for this company strictly before its earlengage date
  single_stock_data <- subset(datastock, PERMNO == referencedata$PERMNO[i] &
                                         date < referencedata$earlengage[i])
  # Keep the 250 most recent of those returns, one column per company
  dataMatrix <- cbind(dataMatrix, tail(single_stock_data$returns, 250))
}
I haven't tested the code but this should work.
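If it helps, you can also label the columns so each return series is identifiable (a small follow-up sketch, assuming referencedata keeps the Company column shown above):
# Name each column of the matrix after the company it belongs to
colnames(dataMatrix) <- referencedata$Company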
So I have a dataframe with about 500,000 obs that looks like this:
ID MonthYear Group
123 200811 Blue
345 201102 Red
678 201110 Blue
910 201303 Green
I would like to convert this to a panel that counts the number of occurrences for each group in each month. So it would look like this:
MonthYear Group Count
200801 Blue 521
200802 Blue 400
....
200801 Red 521
200802 Red 600
....
I guess it doesn't need to look exactly like that, but just some way to turn this into a useful panel. Aggregate doesn't seem to be sufficient in and of itself.
aggregate(dfrm$ID, dfrm[,c("MonthYear","Group")], length)
If you want to reverse the grouping, just reverse the order of the columns in the by argument.
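As a rough sketch on made-up rows mirroring your columns (the object name dfrm and the toy values are just for illustration), the formula interface gives the same counts and lets you rename the result column:
dfrm <- data.frame(
  ID = c(123, 345, 678, 910),
  MonthYear = c(200811, 201102, 201110, 201303),
  Group = c("Blue", "Red", "Blue", "Green")
)
# length() counts rows per MonthYear-Group combination; setNames renames the count column
setNames(aggregate(ID ~ MonthYear + Group, data = dfrm, FUN = length),
         c("MonthYear", "Group", "Count"))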