R: How to find specific summary statistics from a dataset?

I want to find the median/mean/range GDP of a specific Region from my dataset.
For the summary of all Regions (Africa, Asia, Europe etc.) I put:
summary(data$GDP, na.rm=TRUE)
This displays the summary statistics of GDP across all regions. However, I want the summary statistics for a single region only, e.g. Africa's mean, median and quartiles, or Europe's. Those are the region names as they appear in the dataset, so those would be used.
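One possible approach is to subset the GDP values by region before summarising. The sketch below assumes the region names are stored in a column called Region (the question does not name that column) and uses a small made-up data frame in place of the real one.

```r
# Toy data; the column name Region and all values are assumptions for illustration
data <- data.frame(
  Region = c("Africa", "Africa", "Africa", "Europe", "Europe", "Asia"),
  GDP    = c(10, 20, NA, 40, 60, 30)
)

# GDP values for one region only
africa_gdp <- data$GDP[data$Region == "Africa"]

# Full summary for that region (summary() reports NAs separately)
summary(africa_gdp)

# Individual statistics; na.rm = TRUE is needed here
africa_mean   <- mean(africa_gdp, na.rm = TRUE)
africa_median <- median(africa_gdp, na.rm = TRUE)
africa_range  <- range(africa_gdp, na.rm = TRUE)

# All regions at once, one summary per region
tapply(data$GDP, data$Region, summary)

# Or a single statistic per region as a data frame (rows with NA are dropped)
by_region <- aggregate(GDP ~ Region, data = data, FUN = mean)
```

Replacing "Africa" with "Europe" gives the same statistics for Europe.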

Related

Binomial GLM function for specific values vs all other values

I am doing a study on the specific needs of kinship caregivers. I want to look at county vs. needs to see whether certain counties have more significant needs than other counties. This will hopefully be used to allocate funding or shape policy. My variables are need (one column for each need) and county (one variable with counties 1-39, 99). Some counties have only a few participants, so I only want to compare counties with >50 respondents (6, 17, 27, 31, 32), e.g. county "6" vs. the other counties. I was using a similar function for my other calculations, which had binomial 1/0 outcomes, so I want to figure out what to use for this comparison.
model1 <-glm(need_1 ~ factor(county), family="binomial", data=Clean.Data)
coef(summary(model1))
exp(coef(model1))
exp(confint.default(model1))
I have also tried
model1 <-glm(need_1 ~ factor(county==6), family="binomial", data=Clean.Data)
coef(summary(model1))
exp(coef(model1))
exp(confint.default(model1))
I really want to use a binomial model of the chosen county vs. the others. I would like to make the comparisons only between the counties listed above, not the smaller counties.
Best,
Adrienne
With the model that uses factor(county), the exponentiated coefficients range from 0 to Inf, which I was not expecting.
I also tried to create a new binary variable using:
Clean.Data$Clark <- ifelse(Clean.Data$live == 6,1,0)
but this is not quite what I want, because it compares county 6 to all other counties rather than only to the ones that have at least 50 respondents.
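A minimal sketch of one way to restrict the comparison: subset the data to the five larger counties first, then fit the model on that subset. The data here are simulated, and the column names county and need_1 follow the glm() calls above. Odds ratios of 0 or Inf often appear when a county has very few respondents and all of them share the same outcome, so dropping the small counties may also help with that, although this is only a guess about the real data.

```r
# Simulated stand-in for Clean.Data; values are made up for illustration
set.seed(1)
Clean.Data <- data.frame(
  county = sample(c(6, 17, 27, 31, 32, 99), 600, replace = TRUE),
  need_1 = rbinom(600, 1, 0.4)
)

# Keep only the counties with enough respondents
big_counties <- c(6, 17, 27, 31, 32)
sub <- subset(Clean.Data, county %in% big_counties)

# Option 1: county 6 vs. the other four large counties combined
sub$county6 <- as.integer(sub$county == 6)
model_6 <- glm(need_1 ~ county6, family = binomial, data = sub)
or_6 <- exp(coef(model_6))            # odds ratio for county 6 vs. the rest
ci_6 <- exp(confint.default(model_6)) # Wald confidence intervals

# Option 2: each large county compared with county 6 as the reference level
sub$county_f <- relevel(factor(sub$county), ref = "6")
model_all <- glm(need_1 ~ county_f, family = binomial, data = sub)
or_all <- exp(coef(model_all))
```

Option 1 answers "county 6 vs. the others"; Option 2 gives a separate odds ratio for each of the other four counties relative to county 6.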

How to calculate Shannon diversity by factor levels in r

I am a newbie with R. I have a large dataset (66M obs) with pixel temperature data for 4 water bodies (REF, LMB, OTH, FP) at hourly time steps (6am, 7am, 8am...), with several NA values representing blank pixels. I want to calculate a proxy for temperature heterogeneity/diversity for each water body at each time, using Shannon diversity or a similar index. So far I have managed to calculate basic stats using a script I found online, but I am not sure how to apply more specific diversity indexes.
My data looks like:
First column Temp, second Time, third water
My code:
DF<-read.csv("DF_total.csv",stringsAsFactors = T)
levels(DF$water)
[1]"OTH" "LMB" "REF" "FP"
levels(DF$time)
NULL
source("group_by_summary_stats.R")  # [**]
summary<-group_by_summary_stats(DF, Temp ,water ,time)
[**] source found online
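Shannon diversity is defined on category proportions, H = -sum(p * log(p)), so a continuous temperature has to be binned first. The sketch below is one way to do that in base R; the bin width, the helper name shannon and the toy data are all assumptions, and the result depends on the bins chosen.

```r
# Toy data with the same three columns as DF; values are made up
DF <- data.frame(
  Temp  = c(10.2, 10.4, 15.1, 20.3, 12.0, 12.1, 12.2, NA),
  time  = rep("6am", 8),
  water = c("REF", "REF", "REF", "REF", "LMB", "LMB", "LMB", "LMB")
)

# Hypothetical helper: Shannon index of temperatures after binning
shannon <- function(x, breaks) {
  x <- x[!is.na(x)]                                   # ignore blank pixels
  counts <- table(cut(x, breaks = breaks, include.lowest = TRUE))
  p <- counts[counts > 0] / sum(counts)               # proportions per occupied bin
  -sum(p * log(p))
}

# Assumed bins of 5 degrees; use the same breaks for every group
breaks <- seq(10, 25, by = 5)

# One index per water body and time step (column Temp now holds the index)
H <- aggregate(Temp ~ water + time, data = DF, FUN = shannon, breaks = breaks)
```

Values outside the range of breaks are dropped by cut(), so the breaks should cover the full temperature range. For 66M rows a grouped approach from a package such as data.table may be faster, but the same helper could be reused.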

Regression model for analyzing data in multiple countries and over several years (simpsons paradox in multiple regression)?

Let's say I have this data for the years 1990-2020, for 20 different countries:
(the variables are just made up)
Dependent variable:
Foreign direct investment inflow (FDII)
Independent variables:
Poverty rate (PR)
Government subsidies (GS)
Tax rates (TS)
I had an exercise in class, and the method demonstrated there was simply to fit a multiple regression model (i.e. in R, lm(FDII ~PR+GS+TS)). However, it seems this method would miss the relationships between the variables within each country. What if the relationship between the variables is negative/positive at the country level, but reverses when the countries are combined?
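A small simulation can illustrate the concern. In the sketch below (all numbers are invented, and only PR is used as a predictor for brevity), the relationship between PR and FDII is negative within every country, yet the pooled regression shows a positive slope. Adding country as a factor, i.e. country fixed effects, is one common way to recover the within-country relationship; a mixed model with random country effects would be another option.

```r
set.seed(42)

# 20 countries observed yearly from 1990 to 2020 (simulated panel)
panel <- expand.grid(country = 1:20, year = 1990:2020)

# Countries differ in their typical poverty rate
panel$PR <- panel$country + rnorm(nrow(panel))

# Within a country FDII falls with PR (slope -1),
# but countries with a higher baseline PR also have a higher baseline FDII
panel$FDII <- 3 * panel$country - 1 * panel$PR + rnorm(nrow(panel), sd = 0.5)

# Pooled model ignores the country structure
pooled <- lm(FDII ~ PR, data = panel)

# Country fixed effects: one intercept per country
within <- lm(FDII ~ PR + factor(country), data = panel)

slope_pooled <- coef(pooled)["PR"]  # positive in this simulation
slope_within <- coef(within)["PR"]  # close to the true within-country slope of -1
```

GS and TS would be added to both formulas in the same way as PR.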

Prop.test with errors on the population proportions

I want to compare two population proportions with a chi-squared test, but my population proportions come with error bars (e.g. [10,11] people in a population of [100,101]). I don't think the function prop.test accommodates that, so I'm looking for another function that will do the job.
For instance, let's say that my populations are:
pop 1: [195, 198] people in a group of [215,218] people
pop 2: [101, 102] people in a group of [188,189]
I'm looking for a function that will calculate the chi-squared.
Thank you!
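I am not aware of a base R function that takes interval-valued counts directly. As a rough sensitivity check rather than a formal method, one could run prop.test on every integer combination inside the stated intervals and look at the range of the resulting statistics. Note that prop.test applies a continuity correction by default.

```r
# Every integer combination within the stated intervals
grid <- expand.grid(x1 = 195:198, n1 = 215:218,
                    x2 = 101:102, n2 = 188:189)

# Chi-squared statistic and p-value for each combination
res <- mapply(function(x1, n1, x2, n2) {
  test <- prop.test(x = c(x1, x2), n = c(n1, n2))
  c(chisq = unname(test$statistic), p = test$p.value)
}, grid$x1, grid$n1, grid$x2, grid$n2)

grid$chisq <- res["chisq", ]
grid$p     <- res["p", ]

# Smallest and largest statistic over all combinations
chisq_range <- range(grid$chisq)
```

If the conclusion is the same at both ends of the range, the uncertainty in the counts does not change the result of the test.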

Fitting a poisson GLM in R with an aggregated count data

I have a dataset of the number of stranded turtles reported at a variety of locations along the Queensland coast of Australia. What I would like to find out is the number of stranded turtles that are NOT reported at each of these locations. In order to estimate that number, I have collected data on the frequency with which a turtle is reported to a stranding location; i.e. how often is a single turtle stranding reported more than one time at about 20 points along the coast? So I have count data which indicates the number of turtles that are reported to a stranding location one time, two times, or three or more times. Ultimately I would like to relate these data to covariates such as local population density and distance to the nearest road, in order to predict the "zero reporting" incidence for the rest of the coastal areas as well.
My data should look something like this, then:
loc<-c("A","B","C")
rep1<-c(51,24,10)
rep2<-c(4,8,3)
rep3ormore<-c(2,1,0)
pop<-c(50,1000,100)
turtle <- data.frame(loc, rep1, rep2, rep3ormore, pop)
There are other possible covariates, but I'll keep it simple for now! I think this should be able to be done using a Poisson distribution, but I'm having trouble wrapping my head around how to do it.
Additionally, in certain instances I don't have exact numbers for the turtles that have been reported, but instead I have categories; 4-6, 7-10, >10, etc. If there's a way to model that possibility, that would be great as well!
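One way to think about the "zero reporting" count is as a zero-truncated Poisson problem: assume the number of reports per stranded turtle is Poisson with mean lambda, and that turtles with zero reports are simply never seen. This is an assumption, not something the data confirm. The sketch below fits lambda separately for each location by maximum likelihood, treating "3 or more" as a censored category; the function names are hypothetical and covariates such as pop are not modelled here.

```r
loc <- c("A", "B", "C")
rep1 <- c(51, 24, 10)
rep2 <- c(4, 8, 3)
rep3ormore <- c(2, 1, 0)

# Negative log-likelihood of a zero-truncated Poisson with a censored top category
negloglik <- function(lambda, n1, n2, n3plus) {
  p0 <- dpois(0, lambda)
  p1 <- dpois(1, lambda) / (1 - p0)                    # P(1 report | at least 1)
  p2 <- dpois(2, lambda) / (1 - p0)                    # P(2 reports | at least 1)
  p3 <- ppois(2, lambda, lower.tail = FALSE) / (1 - p0) # P(3+ reports | at least 1)
  -(n1 * log(p1) + n2 * log(p2) + n3plus * log(p3))
}

# Fit one location and estimate the number of turtles never reported
fit_one <- function(n1, n2, n3plus) {
  lambda <- optimize(negloglik, interval = c(1e-6, 10),
                     n1 = n1, n2 = n2, n3plus = n3plus)$minimum
  n_obs <- n1 + n2 + n3plus
  # Expected zeros = observed total * P(0) / P(at least 1)
  c(lambda = lambda, unreported = n_obs * exp(-lambda) / (1 - exp(-lambda)))
}

est <- t(mapply(fit_one, rep1, rep2, rep3ormore))
rownames(est) <- loc
```

The same likelihood idea extends to other grouped categories (4-6, 7-10, >10) by summing Poisson probabilities over each category. Relating lambda to covariates would need a regression version of this model, which is not shown.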
