I want to know if there is a significant difference in the concentration of a blood biomarker between 2 populations (population 1 = healthy individuals, population 2 = sick individuals). I need to control for the factor 'region'.
My issue is that the distribution of population 2 is not normal (the data are right-censored at the upper detection limit of the lab device), as shown on these plots:
[plots of the biomarker distribution in each population]
With a normal distribution I would use this model in R:
library(emmeans)
m <- glm(blood.biomarker ~ status * region, data = f, family = gaussian) # status = healthy or sick; status * region expands to status + region + status:region
summary(m)
emmeans(m, list(pairwise ~ status), adjust = "tukey")
I am a bit confused regarding the model or the glm family I should use in this case.
I also have a similar situation but with 3 groups (one group has a normal distribution and two groups have censored distributions). How should I deal with this?
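Since the values pile up at the upper detection limit, one common option is a censored (Tobit-style) Gaussian regression. A minimal sketch with survival::survreg, assuming an upper limit UL and treating values at the limit as right-censored (UL and the observed flag are assumptions, not part of the original data):
library(survival)
library(emmeans)
## 1 = fully observed, 0 = censored at the upper detection limit
f$observed <- as.numeric(f$blood.biomarker < UL)
m_cens <- survreg(Surv(blood.biomarker, observed) ~ status * region,
                  data = f, dist = "gaussian")
summary(m_cens)
## emmeans also accepts survreg objects, so the pairwise contrasts carry over
emmeans(m_cens, list(pairwise ~ status), adjust = "tukey")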
---
I have the distribution of a parameter (natural gas mixture composition) expressed in percent. How can I test such data for distribution parameters (it should be a gamma, normal, or lognormal distribution) and generate random compositions based on those parameters in R?
This might be a better question for CrossValidated, but:
it is not generally a good idea to choose from among a range of possible distributions according to goodness of fit. Instead, you should choose according to the qualitative characteristics of your data, something like this:
[decision chart for choosing a distribution by data type; not reproduced here]
Frustratingly, this chart doesn't actually have the best choice for your data (composition, continuous, bounded between 0 and 1 [or 0 and 100]), which is a Beta distribution (although there are technical issues if you have values of exactly 0 or 100 in your sample).
In R:
## some arbitrary data
z <- c(2,8,40,45,56,58,70,89)
## fit (beta values must be in (0,1), not (0,100), so divide by 100)
(m <- MASS::fitdistr(z/100,"beta",start=list(shape1=1,shape2=1)))
## sample 1000 new values
z_new <- 100*rbeta(n=1000, shape1=m$estimate["shape1"],
                   shape2=m$estimate["shape2"])
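To eyeball the fit, you can overlay the fitted Beta density on a histogram of the rescaled data (a quick sketch reusing the fit m from above):
## histogram of the data on the (0,1) scale, with the fitted density on top
hist(z/100, freq = FALSE, xlim = c(0, 1), main = "Fitted Beta density")
curve(dbeta(x, m$estimate["shape1"], m$estimate["shape2"]), add = TRUE, col = "red")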
---
With respect to Bayesian curve fitting, eq. (1.68) of Bishop, Pattern Recognition and Machine Learning:
How is the following result derived:
p(t | x, X, T) = int_{w} p(t | x, w) p(w | X, T) dw
(X and T stand for the training inputs and targets, written as bold x and bold t in the book.)
Let's just consider a simpler case using the law of total probability.
If w1, w2 are disjoint events that together cover the whole sample space, then
p(A) = p(A|w1) p(w1) + p(A|w2) p(w2)
we can extend this to any number of items
p(A) = sum_{wi} p(A|wi) p(wi)
or indeed take the limit
p(A) = int_{w} p(A|w) p(w) dw
We can condition A on another event B that the w's might depend on; if A is independent of B given w, then
p(A|B) = int_{w} p(A|w) p(w|B) dw
or additionally on an event C which the w's do not depend on:
p(A|B,C) = int_{w} p(A|w,C) p(w|B) dw
which is just your formula with A = t, C = x, B = (X, T), and w the model weights.
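As a quick numerical sanity check of the discrete version in R (the numbers are made up):
p_w  <- c(0.2, 0.3, 0.5)   # p(wi) for a partition w1, w2, w3
p_Aw <- c(0.9, 0.4, 0.1)   # p(A | wi)
sum(p_Aw * p_w)            # p(A) by the law of total probability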
---
I am trying to do a logistic regression in R with weights, but I don't really know how it works. When I apply weights, something weird happens and all the fitted values appear at 1, but I don't see why. (Also, how can I fit a line through the points?)
I am trying to calculate a correlation coefficient between the observed and predicted values. I am also aiming for a plot with "fra" on the y-axis ranging from 0 to 1, temp on the x-axis, the fra values in the plot, and a line for the regression (something like this example: http://imgur.com/FWevi36).
Thanks!
What I have so far (made-up code):
#Dataframe
temp=c(1,1,2,2,3,4,4,5,5,6,6,7,7,8,8)
fra=c(0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.2,0.2,0.3,0.1,0.3,0.4,0.0,0.5)
bin=c(0,0,0,0,0,0,1,1,1,1,1,1,1,0,1)
test1 <- data.frame(temp, bin, fra)
#Overview
plot(test1$temp, test1$bin)
plot(test1$fra)
boxplot(test1$temp ~ test1$bin, horizontal=TRUE)
#Logistic Regression without weight
glmt1 <- glm(bin ~ temp, data=test1, family=binomial)
coefficients(summary(glmt1))
fit1 <- fitted(glmt1)
#plot
plot(test1$temp, fit1, ylim=range(0,1))
#draw the fitted curve: sort by temp so the line goes through the points left to right
lines(sort(test1$temp), fitted(glmt1)[order(test1$temp)], col="red")
#with weights
glmt2 <- glm(bin ~ temp, data=test1, family=binomial, weights=fra)
coefficients(summary(glmt2))
fit2 <- fitted(glmt2)
plot(test1$temp, fit2, ylim=range(0,1))
You are only giving a positive weight to cases where bin == 1: every row with fra > 0 also has bin == 1, so the weighted fit sees no variation left in the response. That means your model always predicts 1 no matter what the value of test1$temp. (R also warns about "non-integer #successes" here, because binomial weights are meant to be counts of trials.)
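To get a smooth regression line like in your linked example, predict from the unweighted model on a fine grid of temp values (a sketch; the grid size is arbitrary):
newd <- data.frame(temp = seq(min(test1$temp), max(test1$temp), length.out = 100))
plot(test1$temp, test1$fra, ylim = c(0, 1), xlab = "temp", ylab = "fra")
lines(newd$temp, predict(glmt1, newdata = newd, type = "response"), col = "red")
## correlation between observed and fitted values
cor(test1$bin, fitted(glmt1))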
---
I am looking for a package in R that can help me calculate the posterior probability of an event. Is there any?
Alright, I am working on a data set like this:
age education grade pass
group1 primary 50 no
group2 tertiary 20 no
group1 secondary 70 yes
group2 secondary 67 yes
group1 secondary 55 yes
group1 secondary 49 no
group1 secondary 76 yes
I have a prior probability of a student passing the exam of 0.6. Now I need to get the posterior probability that a student passes given his age group, education level, and grade.
I know I should first get P(age=group1 | pass=yes) * P(education=primary | pass=yes) * P(grade>50 | pass=yes).
But this should be done for each case (row), and I have a data set with 1000 rows.
So I thought a function could help me with this!
There are many packages/functions available. You should check out the Bayesian inference Task View.
Also, if your prior isn't a full distribution and is only a point estimate of a probability, you're probably not actually doing Bayesian inference, just using Bayes' rule in a frequentist framework. Very different. But now we're getting into CrossValidated territory.
This is the answer for just one predictor (education) and the outcome (pass):
# prior probability of passing is 0.6; the complement is the prior for failing
prior_yes <- 0.6
prior_no  <- 1 - prior_yes
prior <- c(prior_no, prior_yes)   # order matches the table rows ("no", "yes")
# get contingency table of pass by education for mydata
edu_table <- with(mydata, table(pass, education))
# get the row sums (total counts per pass level)
tots <- apply(edu_table, 1, sum)
# create matrices of 0's for the likelihoods and posteriors
ppn  <- edu_table*0
post <- edu_table*0
# use a loop to get the conditional probabilities & posterior probabilities
for(i in 1:nrow(edu_table)){
  for(j in 1:ncol(edu_table)){   # was hard-coded 1:4; use the actual number of education levels
    ppn[i,j]  <- edu_table[i,j]/tots[i]
    post[i,j] <- prior[i]*ppn[i,j]/(prior[1]*ppn[1,j]+prior[2]*ppn[2,j])
  }
}
ppn   # probability of education = j given pass = i
post  # posterior probability of pass = i given education = j
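For all predictors at once, a naive Bayes classifier does exactly this multiplication for you. A sketch with e1071::naiveBayes (assuming your data frame is called mydata; note it estimates the class priors from the data rather than taking 0.6 as given, and it treats the numeric grade with a Gaussian likelihood):
library(e1071)
nb <- naiveBayes(pass ~ age + education + grade, data = mydata)
predict(nb, mydata, type = "raw")   # posterior probabilities for every row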
---
I have a dataset here with latitude, longitude, and salinity for an area. I have these data for three different cases: the first case is normal flow conditions, the second is high flow, and the third is water level rise.
I want to understand how I can use these data to do some kind of analysis.
My data set is uploaded on https://www.dropbox.com/s/285iuyv6bugm48p/dataanalysisforthreetimes.csv
Some of the things that come to mind are:
Find the increase or decrease in salinity under each condition, or even a pattern.
Mean salinity under different conditions
The code that I used to start in R is as follows:
mydata <- read.csv("dataanalysisforthreetimes.csv")
head(mydata)
library(reshape2)
data1 <- melt(mydata, id=c("Lat","Long"))
Would you suggest whether I can fit any linear model to my data? Any suggested techniques are highly appreciated.
I want to use R to do the analysis. Can you suggest any reading as well?
mean salinity for all three conditions:
data1 <- melt(mydata,id=c("Lat","Long"))
aggregate(value ~ variable, mean, data=data1)
# variable value
#1 Highflow 4.039384
#2 Levelrise 32.238867
#3 Normal 21.153334
That is how you get the mean for your conditions. As for linear models, you are probably best off googling "linear models with spatial autocorrelation in R" to get you started.
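To see the increase or decrease at each location, one option is to compare the conditions point by point. A sketch, assuming the wide data has one salinity column per condition named as in the aggregate output above (Normal, Highflow, Levelrise):
mydata$d_high <- mydata$Highflow  - mydata$Normal   # change under high flow
mydata$d_rise <- mydata$Levelrise - mydata$Normal   # change under water level rise
summary(mydata[c("d_high", "d_rise")])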