What's the difference between one-way ANOVA and one-way ANOVA for repeated measures in R?

for example
one-way :
aov.res2 <- aov(mark ~ teacher, data=my_data2)
one-way for repeated measures :
aov(mark ~ teacher + Error(essay/teacher), data=my_data2)
What's the difference between teacher + Error(essay/teacher) and teacher alone?
1. Why do we add Error() after teacher, and what does it mean?
2. Why, inside Error(), do we use essay/teacher rather than essay * teacher?

First, Stack Overflow likely isn't the best place to ask such a theoretical question; other sites lend themselves better to it. Try Cross Validated.
Having studied statistics extensively, I will give you a high level answer and then direct you to look for more details in textbooks or elsewhere online.
Let's make sure we understand what repeated measures data is. An example of such data would be measuring the blood pressure of a patient every day for a week. Hence we have several "repeated" measurements from one subject. If we did this for many patients/subjects, we then have repeated measures data.
Repeated measures data is inherently different from other data because we expect that the data we observe from the same subject, say over time, will be correlated. (Referring to our previous example, we expect that the blood pressure of a patient tomorrow will be related to the blood pressure of that same patient today.) If you have repeated measures data but don't model it as such, you are leaving out important information about how the data might be related within a subject. Modeling the data properly will then give you a more complete and accurate view, particularly in the variance. Said another way, the data collected from one patient does not vary the same way that the data varies between patients.
Hopefully this helps you understand the nuances of the two methods in question. Certainly I have not explicitly detailed the coding syntax, but I hope that this answer will help you understand why they are different. Once you understand the theory better, your questions will likely change and be more specific. Good luck!

I found the answer:
The experimental subject is measured multiple times, so there is a within-subject factor. Within-subject factors are marked with the special Error() form shown above,
where "teacher" is the within-subject factor and "essay" is the ID of the experimental subject.
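To make the two calls concrete, here is a minimal runnable sketch on simulated data. The names mark, teacher, and essay come from the question; the numbers are invented purely for illustration.

```r
set.seed(1)
# Each of 10 essays is marked by each of 3 teachers: repeated measures per essay.
my_data2 <- expand.grid(essay = factor(1:10), teacher = factor(c("A", "B", "C")))
my_data2$mark <- round(rnorm(nrow(my_data2), mean = 70, sd = 5))

# Between-subjects one-way ANOVA: every row is treated as independent.
aov.res2 <- aov(mark ~ teacher, data = my_data2)

# Repeated-measures one-way ANOVA: Error(essay/teacher) tells aov() that
# teacher varies *within* each essay, so essay-to-essay variation is
# separated out into its own error stratum.
aov.rm <- aov(mark ~ teacher + Error(essay/teacher), data = my_data2)
summary(aov.rm)
```

Note that the repeated-measures fit returns an aovlist with one summary per error stratum, rather than the single ANOVA table the first call produces.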

Related

Is a hurdle model suited to analysing data such as the number of trials?

I am analysing data about song recordings of birds. Birds had several recording trials to sing: some of them sang during the first or second trial, some needed more than 10 trials, and some never sang, even after 15 trials or more. Birds that sang were not recorded again. My data contains a binary variable (did or did not sing), the number of trials until singing or until we definitively stopped recording, and the number of song phrases that were produced.
I have 4 groups of birds with different temperature treatments, and I try to see if those treatments impact the propensity to sing. I first focused on the binary variable, but my colleagues suggested to also include the number of trials (how hard it's been to have them sing) and the number of phrases produced (amount of singing behaviour).
They suggested using a hurdle model: first, did the bird sing or not, and then, if it did, how much. I liked this idea very much, but it doesn't take into account the number of trials before singing. I don't really know how to analyse those 3 variables, so I'm asking for advice and help.
I tried:
- including the number of trials as a covariate, but birds in some treatment groups needed significantly more trials to sing than birds in other groups, and I'm afraid it overlaps with the effect of the treatment in the model;
- using the number of trials as the dependent variable, but it seems to me that a hurdle model wouldn't be the most adequate method for this type of data. I see the number of trials more as a succession of opportunities for the bird to sing or not than as one observation at a given point, contrary to the number of phrases the bird sang during a given recording.
I have very little experience with hurdle models and other zero-inflated models, so I have reached an impasse and I would really appreciate your opinion. Thanks in advance!
After asking some collaborators, someone suggested a much better way to analyse this type of data.
I was trying to apply a zero-inflated or zero-altered method, but my data is actually right-censored. I used a survival analysis; I'll briefly explain it here in case someone has the same problem I did:
We use survival analysis when we want to analyse the time until an event occurs within a given period (in health studies, survival within 5 years, for instance). Some individuals are censored because the event didn't happen within the period we study.
I have exactly this type of data: I analyse whether a bird sang or not (event) and how many trials it needed to sing (time), but some birds didn't sing within the time I dedicated to recordings, and those individuals are censored because I don't know how many trials they would have needed to sing.
I hope this can help other people struggling like me with this kind of data, it is not always easy to find an appropriate analysis.
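For readers with similar data, here is a minimal sketch of that setup using the survival package that ships with R. The data frame below is invented: trials plays the role of time, sang is the event indicator (0 = never sang, i.e. right-censored), and the treatment levels are hypothetical.

```r
library(survival)

# Invented example: trials until singing, with censoring at 15 trials.
birds <- data.frame(
  trials    = c(1, 2, 4, 10, 15, 15, 3, 15),
  sang      = c(1, 1, 1, 1,  0,  0,  1, 0),   # 0 = censored (never sang)
  treatment = factor(c("cold", "cold", "warm", "warm",
                       "cold", "warm", "warm", "cold"))
)

# Kaplan-Meier "survival" curves (probability of not yet having sung) per group
km <- survfit(Surv(trials, sang) ~ treatment, data = birds)

# Cox proportional hazards model for the treatment effect on singing propensity
cox <- coxph(Surv(trials, sang) ~ treatment, data = birds)
summary(cox)
```

Surv(trials, sang) is what encodes the censoring: birds with sang = 0 contribute the information that they had not sung by trial 15, without pretending we know when they would have.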

Comparing data from two different samples (social science data)

Dear Stackoverflow community,
I would really appreciate your advice in the following matter.
I would like to compare two different data sets from a social science survey: one group had no treatment, the other group had a treatment. The sociodemographics are broadly similar (gender, professional background), except for age: one group's participants are slightly younger than the other's. I tested different items on Likert scales (ordinal data from 1-5).
At this stage, I am using an unpaired Wilcoxon rank-sum test to compare the two data sets, as I would consider them independent of each other. Do you think this is the right approach here? I looked up a few medical studies that compared different treatment groups, and they used e.g. ANOVAs. Maybe that is a better approach?
I would very much appreciate your support. Many thanks!
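For reference, the unpaired Wilcoxon rank-sum (Mann-Whitney) test described above looks like this in R. The Likert responses here are invented for illustration.

```r
# Invented Likert responses (1-5) from two independent groups
control   <- c(2, 3, 3, 4, 2, 3, 1, 4)
treatment <- c(3, 4, 5, 4, 3, 5, 4, 2)

# Unpaired rank-sum test; exact = FALSE because Likert data produce ties,
# which rules out the exact p-value computation anyway.
res <- wilcox.test(control, treatment, exact = FALSE)
res$p.value
```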

R caret training - but each sample has three separate measurements and I want to use majority vote to predict

I have a very specific datasets with 50 people. Each person has a response (sex) and ~2000 measurements of some biological stuff.
We have three independent replicates from each person, so 3 rows per person.
I can easily use caret and groupKFold() to keep each person in either training or test sets - that works fine.
Then I simply predict each replicate separately (so 3 predictions per person).
I want to use these three predictions together to make a combined prediction per person, either by majority vote and/or some other scheme.
I.e., for each person I take the 3 predictions and predict the response to be the one with the most votes. That's pretty easy to do for the final model, but it should also be used in the tuning step (i.e. in the cross-validation that picks parameter values).
I think I can do that in the summaryFunction=... when calling caret::trainControl() but I would simply like to ask:
Is there a simpler way of doing this?
I have googled around - but I keep failing in finding people with similar problems. And I really hope someone can point me in the right direction.
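As a starting point for the aggregation step itself, a majority-vote helper is a few lines of base R. The function and data below are hypothetical, not part of caret; the same logic would go inside a custom summaryFunction (assumption: ties are broken by whichever class table() lists first).

```r
# Hypothetical helper: most frequent prediction wins.
majority_vote <- function(preds) {
  tab <- table(preds)
  names(tab)[which.max(tab)]   # ties go to the first-listed class
}

# Invented per-replicate predictions: 3 rows per person.
pred <- data.frame(
  person = rep(c("p1", "p2"), each = 3),
  class  = c("M", "M", "F",  "F", "F", "F")
)

# Collapse the three replicate predictions into one per person.
tapply(pred$class, pred$person, majority_vote)
```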

Network model with much more variables than samples

Dear stackoverflow forum,
this is more of a background question and I hope someone finds the time to give me some advice.
Over the last few weeks I have been learning how to create a food web model based on abundances of species (obtained by analysing genomic sequences from several places).
Given that this project was my actual start with this topic (i.e. coding, network modelling), I read a great deal, but could only understand a small part of it. Now I finally got the data, and even after filtering it as much as is defensible there are more than 300 species, but just 27 samples (not all species are present in every sample) and just 1-2 environmental parameters.
My first intention was to produce a food web showing the strength and direction of interactions, because the goal is to gain knowledge about an uncharted biotope. Do you think it is possible to create a statistically reliable food web (with R) from this little information, or at least a co-occurrence network? I have my doubts: for example, working with a robust lm function would force me to restrict the number of species to 27 (the number of samples).
If yes, a hint on how to, or some literature would make my day.
If this is completely the wrong place for this type of question, just tell me and I will delete it, but advice on a better forum would be welcome, maybe stats.stackexchange?
Many thanks in advance
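For the co-occurrence network mentioned above, a very simple (and statistically naive) baseline needs no per-species regressions at all: pairwise rank correlations across samples, thresholded into edges. The matrix below is simulated; with 27 samples, such correlations are noisy, so this is a sketch of the mechanics, not a claim of reliability.

```r
set.seed(42)
# Simulated abundance matrix: 27 samples (rows) x 20 species (columns).
abund <- matrix(rpois(27 * 20, lambda = 3), nrow = 27)
colnames(abund) <- paste0("sp", 1:20)

# Pairwise Spearman correlations between species across samples
cors <- cor(abund, method = "spearman")

# Keep only strong associations as candidate co-occurrence edges
# (the 0.5 cutoff is arbitrary; a real analysis needs multiple-testing control)
edges <- which(abs(cors) > 0.5 & upper.tri(cors), arr.ind = TRUE)
nrow(edges)
```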

prediction and imputation of missing values using a panel data model (R)

I have an unbalanced panel dataset. I created a pooled model and now need to predict and impute the missing values of the dataset. How can this be done?
Here is a screenshot of my data: https://imagizer.imageshack.us/v2/1366x440q90/661/RAH3uh.jpg
Thank you!
First of all, it looks like you have too broad a question here. If you're really asking how you should predict values for your spreadsheet (i.e. cells Z6, AA6, ..., AM22, ...), then yes, you have a HUGE question =]. Just a hint: in future questions, you should be more specific, like: "I have THIS data related to households in Belarus. I've searched for prediction models and tried XPTO1 and XPTO2. How can I decide which one is better?"
So, what I really mean here is that prediction is not a function like SUM that you can apply to your data and be done. Prediction is a whole discipline, with a bunch of methods that should be tested against different cases. For example, to predict the Z6 cell in your data, you should ask yourself what other data can help infer the missing information. In some cases the simple average of the past 5 years will be enough; in others, a lot more should be considered.
I recommend you first take a look at some basic material that covers simple models, like linear models; play with them and try to understand the accuracy of the predictions you obtain. That will eventually solve your problem, or will at least help you ask the community more "answerable" questions.
One last tip: there is a new SO's sister Q&A community that may be more appropriate to ask questions about prediction models: https://datascience.stackexchange.com/
Good luck.
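To make the "start with simple linear models" advice concrete, here is a toy sketch of pooled-model imputation on an invented unbalanced panel (the id/year/y columns are hypothetical stand-ins for the questioner's spreadsheet):

```r
# Invented unbalanced panel: 3 units, 4 years, some values missing.
panel <- data.frame(
  id   = rep(1:3, each = 4),
  year = rep(2001:2004, times = 3),
  y    = c(10, 12, NA, 15,  8, 9, 11, NA,  20, NA, 24, 26)
)

# Pooled linear model with a time trend and unit dummies;
# lm() silently drops the NA rows when fitting.
fit <- lm(y ~ year + factor(id), data = panel)

# Fill the missing cells with the model's predictions.
miss <- is.na(panel$y)
panel$y[miss] <- predict(fit, newdata = panel[miss, ])
```

Whether such a model is adequate depends entirely on the data; the point is only that imputation is a modelling decision, not a spreadsheet function.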
