We were given two data frames to import.
One contains gene expression data for 17 patients (non-numerical).
The second contains their patient IDs and their treatment groups.
These data sets first have to be combined, and then we have to calculate the mean expression value for each treatment group.
I'm struggling to work out how to calculate the mean and associate it with a particular treatment group.
Apologies if this does not make sense.
# Gene expression data (includes a ProbeSetID column)
patients <- read.table("GSE4922-GPL96_log2Mas5Sc500-N17.tab", sep = "\t", header = TRUE)
patients
patients$ProbeSetID
# Patient IDs and treatment groups
patientpID <- read.table("Patient-Groups-N17.tab", sep = "\t", header = TRUE)
patientpID
patientpID$PatientID
mergeddata <- merge(patientpID, patients)
# My attempts so far (these do not give the group means):
# grouping(TreatmentGroup)
# sum(avg_pID = mean("ProbeSetID"))
This is what I have so far, but I still need to find the mean expression of each probe set (ProbeSetID) and then group it by treatment group.
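A minimal base-R sketch of the combine-then-average step. It assumes (not confirmed by the files) that the expression table has a ProbeSetID column plus one column per patient whose names match PatientID in the group table, and that the group table has PatientID and TreatmentGroup columns. The toy values are made up:

```r
# Hypothetical toy data standing in for the two .tab files
patients <- data.frame(
  ProbeSetID = c("1000_at", "1001_at"),
  P1 = c(5, 7), P2 = c(6, 8), P3 = c(9, 4)
)
patientpID <- data.frame(
  PatientID      = c("P1", "P2", "P3"),
  TreatmentGroup = c("A", "A", "B")
)

# Reshape to long format: one row per probe set x patient
# (unlist() stacks the patient columns one after another)
long <- data.frame(
  ProbeSetID = rep(patients$ProbeSetID, times = ncol(patients) - 1),
  PatientID  = rep(names(patients)[-1], each = nrow(patients)),
  Expression = unlist(patients[-1], use.names = FALSE)
)

# Attach each patient's treatment group
merged <- merge(long, patientpID, by = "PatientID")

# Mean expression of each probe set within each treatment group
means <- aggregate(Expression ~ ProbeSetID + TreatmentGroup, data = merged, FUN = mean)
print(means)
```

Reshaping to long format first means merge() has a real common key (PatientID); merging the wide tables directly has no shared column to join on.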
To get matched sets via propensity score matching (the MatchIt package with method = "full"), I need to adapt my call for a longitudinal data frame. Every case has several observations, but only the first observation per patient should be included in the matching. So the matching should be based on each patient's first observation, but my later analysis should include the complete dataset for each patient, with all observations.
Does anyone have an idea how to achieve this?
I tried using a subset of the data (first observation per patient), but I wasn't able to get the matching results into the full data set (all observations per patient) using match.data().
Thanks in advance,
Simon (desperately writing his master's thesis)
My understanding is that you want to create matches at just the first time point but have those matches identified for each unit at all time points. Fortunately, this is pretty straightforward: perform the matching at the first time point, then merge the matched dataset with the full dataset. Here is how this might look. Suppose your original long dataset is d, with an ID column id and a time column time.
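library(MatchIt)
# Match using only each unit's first observation
m <- matchit(treat ~ X1 + X2, data = subset(d, time == 1), method = "full")
md1 <- match.data(m)
# Copy subclass and weights to every row with the same id
d <- merge(d, md1[c("id", "subclass", "weights")], by = "id", all.x = TRUE)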
Your new dataset should have two new columns, subclass and weights, which contain the matching subclass and matching weight for each unit. Rows with identical IDs (i.e., rows corresponding to the same unit at multiple time points) will have the same value of subclass and weight.
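To see why every row of the same unit ends up with the same subclass and weights, here is a small sketch of the merge step on its own. It uses a made-up long dataset and a hand-built data frame standing in for match.data() output, so MatchIt is not needed to run it:

```r
# Made-up long dataset: 4 units, each observed at time 1 and time 2
d <- data.frame(
  id   = rep(1:4, each = 2),
  time = rep(1:2, times = 4),
  y    = c(10, 12, 20, 21, 30, 33, 40, 41)
)

# Hand-built stand-in for match.data(m) at time == 1 (values are invented)
md1 <- data.frame(
  id       = 1:4,
  subclass = factor(c(1, 1, 2, 2)),
  weights  = c(1, 1, 1, 0.5)
)

# Merge by id: each unit's subclass and weights are copied to all of its rows
d <- merge(d, md1[c("id", "subclass", "weights")], by = "id", all.x = TRUE)
print(d)
```

Because of all.x = TRUE, any unit that does not appear in the matched data (for example, one discarded during matching) keeps its rows with NA in subclass and weights instead of being dropped.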
For each of the 4 genes (each gene is a column), I need to test whether its mean expression is equal for patients with stable and progressive disease, and store the corresponding p-value. Can someone help me, please? I'm using R.
Here is a picture of my data frame:
Suppose this is your data frame:
df = data.frame(y=sample(c("progres.","stable"),100,replace=TRUE),matrix(rnorm(100*4),ncol=4))
colnames(df)[-1] = c("X1000_at","X1001_at","X1002_at","X1003_at")
If you only need the p-values, you can do:
apply(df[,-1],2,function(i)t.test(i ~ df$y)[["p.value"]])
X1000_at X1001_at X1002_at X1003_at
0.14861795 0.11653459 0.01820033 0.41873270
In the above, you iterate through the gene columns, run t.test() between the groups defined by the y column, and keep only the p-value. (Because df is generated randomly without a seed, your numbers will differ.)
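A variant that stores the p-values next to the gene names, sketched on the same kind of random data (set.seed() is added only to make the run reproducible). It uses sapply() over the data frame's columns instead of apply(), which avoids converting df to a matrix; note that t.test() with a formula runs Welch's test by default (var.equal = FALSE):

```r
set.seed(42)  # reproducible made-up data
df <- data.frame(y = sample(c("progres.", "stable"), 100, replace = TRUE),
                 matrix(rnorm(100 * 4), ncol = 4))
colnames(df)[-1] <- c("X1000_at", "X1001_at", "X1002_at", "X1003_at")

# One Welch t-test per gene column, keeping only the p-value
pvals <- sapply(df[-1], function(x) t.test(x ~ df$y)$p.value)

# Store the p-values alongside the gene names
results <- data.frame(gene = names(pvals), p.value = unname(pvals),
                      stringsAsFactors = FALSE)
print(results)
```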
I am trying to extract the value of one variable in an R data frame conditional on the value of another variable, but I can't get it to work.
Basically, I am giving three different doses of vaccine to three groups of animals (5 animals per group, stored as Total) and recording the number of protected animals in each group as Protec. From Protec, I calculate the proportion protected (Prop = Protec/Total) for each dose group. For example:
library(dplyr)
Dose <- c(1, 0.25, 0.0625)  # dose or dilution
Protec <- c(5, 4, 3)        # number of protected animals per group
Total <- rep(5, 3)          # animals per group
df <- data.frame(Dose, Protec, Total)
df <- df %>% mutate(Prop = Protec / Total)
df
My question: what is the log10 of the minimum Dose for which Prop == 1? It can be found with the following code:
X0 <- log10(min(df$Dose[df$Prop == 1])); X0
For the data above, the result should be X0 = 0.
If Protec = c(5,5,3), Prop becomes c(1.0, 1.0, 0.6), and X0 should be -0.60206.
If Protec = c(5,5,5), Prop becomes c(1.0, 1.0, 1.0), and I want X0 = 0.
If Protec = c(5,4,5), Prop becomes c(1.0, 0.8, 1.0), and I also want X0 = 0, because I consider these results unordered and take the highest dose for calculating X0.
I think this requires an if statement, but I don't know how to write the conditions.
Can someone explain how to do this in R? Thank you in advance.
We can use mutate_at to apply the calculation to multiple columns whose names start with 'Protec':
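library(dplyr)
# Divide every column whose name starts with "Protec" by Total
# (mutate_at() is superseded by across() in dplyr >= 1.0.0 but still works)
df1 <- df %>%
  mutate_at(vars(starts_with("Protec")), list(Prop = ~ . / Total))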
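The answer above computes Prop but not X0. Below is a base-R sketch of one reading of the rule in the question, inferred from the four examples (an assumption, not a confirmed definition): order doses from highest to lowest, take the unbroken run of Prop == 1 starting at the highest dose, and use the lowest dose in that run; if every group has Prop == 1, return 0 as requested. The function name compute_x0 is hypothetical, and returning NA when the highest dose is not fully protective is also an assumption, since the question does not cover that case.

```r
# Hypothetical helper implementing one reading of the X0 rule
compute_x0 <- function(dose, prop) {
  ord  <- order(dose, decreasing = TRUE)  # highest dose first
  dose <- dose[ord]
  prop <- prop[ord]
  full <- abs(prop - 1) < 1e-9            # Prop == 1, with a small tolerance
  if (all(full)) return(0)                # every group protected: X0 = 0, as asked
  if (!full[1]) return(NA_real_)          # assumption: undefined if top dose < 100%
  run_len <- which(!full)[1] - 1          # unbroken run of Prop == 1 from the top
  log10(min(dose[seq_len(run_len)]))
}

Dose  <- c(1, 0.25, 0.0625)
Total <- rep(5, 3)
compute_x0(Dose, c(5, 4, 3) / Total)  # 0
compute_x0(Dose, c(5, 5, 3) / Total)  # -0.60206
compute_x0(Dose, c(5, 5, 5) / Total)  # 0
compute_x0(Dose, c(5, 4, 5) / Total)  # 0
```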
I am analysing data from a Delphi study, and I need to create a vector of the frequency of each score (1 to 10) for each stakeholder group (6 groups, 73 participants in total) and each outcome (48 outcomes). The data is in the form:
I would like to create a vector similar to:
# scores: 1, 2, 3, 4, 5, 6, 7, 8, 9
trialists <- c(0, 0, 0, 0, 28.6, 71.4, 0, 0, 0)
Each value is the percentage of a stakeholder group (e.g. trialists) that gave that score for a given outcome. I need to exclude a score of 10, as it represents "unable to answer".
This will result in 48 vectors for each of the 6 stakeholder groups.
Is there an elegant way to do this in R, rather than plodding through the data in Excel and entering it manually?
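Since the data layout isn't shown here, this sketch assumes a long format with one row per participant and outcome, using hypothetical columns group, outcome and score (values made up). It drops score 10, counts scores 1 to 9 for every group and outcome with table(), and converts the counts to percentages within each group and outcome with prop.table():

```r
# Hypothetical long-format data: one row per participant x outcome
delphi <- data.frame(
  group   = rep(c("trialists", "clinicians"), each = 6),
  outcome = rep(c("O1", "O2"), times = 6),
  score   = c(5, 6, 6, 10, 6, 5, 7, 8, 7, 7, 10, 9)
)

# Drop score 10 ("unable to answer")
scored <- subset(delphi, score != 10)

# Count scores 1-9 per group x outcome; fixed factor levels keep zero counts
counts <- table(scored$group, scored$outcome, factor(scored$score, levels = 1:9))

# Percentages within each group x outcome (margins 1 and 2), one decimal place
pct <- round(100 * prop.table(counts, margin = c(1, 2)), 1)

# One score vector: trialists on outcome O1
pct["trialists", "O1", ]
```

The result is a group × outcome × score array, so with the real data a single object holds all 48 vectors for each of the 6 groups, and pct[g, o, ] extracts any one of them.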
I have a data frame with studentnumber <- c(1, 2, 3, ..., n) and schoolnumber <- c(1, 1, 2, 3, 4, 4), so pupil 1 is in school 1, pupil 2 is in school 1, pupil 3 is in school 2, and so on.
I have the socioeconomic status (ses) of each pupil, and I want a new column containing each pupil's SES minus the mean SES of their school. The code for this is apparently:
mydata$meansocialeconomicstatus <- with(mydata, tapply(ses, schoolnumber, mean))
But I get an error: tapply() returns one mean per school rather than one value per pupil, so the new column has fewer rows than the data frame.
My question is: what could I add so that the mean SES is repeated in the new column for every pupil in the same school?
You can use the dplyr package.
library(dplyr)
# Calculate the mean socialeconomicstatus per schoolnumber.
mydata2 <- mydata %>%
  group_by(schoolnumber) %>%
  summarise(meansocialeconomicstatus = mean(ses))
# Join the mean socialeconomicstatus back to the original dataset based on schoolnumber.
mydata <- left_join(mydata, mydata2, by = "schoolnumber")
# Subtract the school mean from each pupil's SES.
mydata$ses_centered <- mydata$ses - mydata$meansocialeconomicstatus
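Alternatively, base R's ave() returns the group mean repeated once per row, which is exactly the per-pupil column the question asks for, so no join is needed. A sketch with made-up data (the ses_centered column name is hypothetical):

```r
# Made-up example data
mydata <- data.frame(
  studentnumber = 1:6,
  schoolnumber  = c(1, 1, 2, 3, 4, 4),
  ses           = c(2, 4, 5, 1, 3, 7)
)

# ave() gives each row the mean of its school, so the lengths match
mydata$meansocialeconomicstatus <- ave(mydata$ses, mydata$schoolnumber, FUN = mean)

# Each pupil's SES minus their school's mean SES
mydata$ses_centered <- mydata$ses - mydata$meansocialeconomicstatus
print(mydata)
```

The dplyr equivalent, also without a join, is mydata %>% group_by(schoolnumber) %>% mutate(ses_centered = ses - mean(ses)) %>% ungroup().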