How to describe cases included in analyses in R?

I'm very new to R and fairly basic with analyses in general. I successfully ran a regression in R, but a lot of my data are missing. That is fine, because R simply drops the observations with missing values and reports the degrees of freedom in the summary. My problem is that I'd like to look more closely at the observations that were actually included in the analysis, and I'm not sure how to do that.
I tried na.omit(), but it produced a dataset with far fewer observations than the regression used, so I think it goes too far.
Basically, I'm trying to get the ages of the respondents who were included in the final analysis, not the ages of the entire sample, many of whom were excluded.
Any advice would be greatly appreciated! Please let me know if you need more information.
Thank you!
Edit: added a screenshot of the data.
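na.omit() on the whole data frame drops a row if any column is missing, including columns that are not in the model, which is likely why it returned fewer rows than the regression used. Below is a minimal sketch with made-up data (the names y, x1, x2, age and income are hypothetical) showing two ways to recover exactly the rows lm() used: the row names kept by model.frame(), or complete.cases() applied only to the variables in the formula.

```r
# Hypothetical data: y, x1, x2 are model variables; age and income are not in the model
dat <- data.frame(
  y      = c(1.2, 2.3, NA, 4.1, 5.0, 6.2, 7.1, NA),
  x1     = c(1, 2, 3, NA, 5, 6, 7, 8),
  x2     = c(0.5, NA, 1.5, 2.0, 1.0, 3.0, 2.2, 4.0),
  age    = c(23, 35, 41, 52, 29, 60, 47, 38),
  income = c(NA, 40, 55, 61, NA, 72, 48, 50)
)

# lm() drops rows with NA in any *model* variable (default na.action = na.omit)
fit <- lm(y ~ x1 + x2, data = dat)
nobs(fit)  # number of observations actually used

# na.omit() on the whole data frame also drops rows where only income is missing
nrow(na.omit(dat))

# Option 1: model.frame() keeps the original row names of the rows lm() used
used <- dat[rownames(model.frame(fit)), ]

# Option 2: check completeness only on the variables that appear in the formula
vars  <- all.vars(formula(fit))
used2 <- dat[complete.cases(dat[, vars]), ]

summary(used$age)  # ages of the included respondents only
```

Option 1 is tied directly to the fitted model, so it stays correct even if you later change the formula; Option 2 is handy when you want the subset before fitting.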

Related

Specification of a mixed model using glmmLasso package

I have a dataset containing repeated measures and quite a lot of variables per observation, so I need a principled way to select explanatory variables. Regularized regression methods seem like a good fit for this problem.
While looking for a solution, I recently came across the glmmLasso package. However, I am having difficulty specifying a model. I found a demo file online, but as an R beginner I had a hard time understanding it.
(demo: https://rdrr.io/cran/glmmLasso/src/demo/glmmLasso-soccer.r)
Since I cannot share the original data, I suggest using the soccer dataset (the same dataset used in the glmmLasso demo file). The variable team repeats across observations and should be treated as a random effect.
# sample data
library(glmmLasso)
data("soccer")
I would appreciate it if you could explain the parameters lambda and family, and how to tune them.
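A minimal sketch of how lambda and family fit together, using the soccer data with a random intercept for team. family gives the distribution and link of the response (points is a count, so a Poisson family with a log link is a natural choice; a continuous response would use gaussian(link = "identity")). lambda is the strength of the lasso penalty: larger values shrink more fixed effects to exactly zero. Here lambda is tuned by refitting over a grid and keeping the value with the lowest BIC, similar in spirit to the linked demo. The chosen covariates, the grid range and step, and BIC as the criterion are assumptions for illustration, not the only valid choices.

```r
library(glmmLasso)
data("soccer")

# team is the grouping variable for the random intercept
soccer$team <- as.factor(soccer$team)

# Put covariates on a common scale so a single lambda penalizes them comparably
covs <- c("transfer.spendings", "ave.unfair.score", "ball.possession",
          "tackles", "ave.attend", "sold.out")
soccer[, covs] <- scale(soccer[, covs])

fml <- points ~ transfer.spendings + ave.unfair.score + ball.possession +
  tackles + ave.attend + sold.out

# family: distribution + link of the response (count outcome -> Poisson, log link)
fam <- poisson(link = log)

# lambda: L1 penalty strength; tune by refitting over a grid (arbitrary range here)
lambdas <- seq(500, 0, by = -50)
bic <- rep(Inf, length(lambdas))
for (j in seq_along(lambdas)) {
  fit <- try(glmmLasso(fml, rnd = list(team = ~1), family = fam,
                       data = soccer, lambda = lambdas[j]), silent = TRUE)
  if (!inherits(fit, "try-error")) bic[j] <- fit$bic  # failed fits keep BIC = Inf
}

# Refit with the lambda that gave the lowest BIC
best_lambda <- lambdas[which.min(bic)]
final <- glmmLasso(fml, rnd = list(team = ~1), family = fam,
                   data = soccer, lambda = best_lambda)
summary(final)
```

A finer grid (the demo steps much more finely) gives a more precise choice at the cost of more fits; cross-validation is an alternative to BIC if prediction is the goal.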

Custom code for a compact letter display from pairwise table output

I would like to write custom code that creates a compact letter display from a pairwise test I have performed.
I have done this successfully with pairwise t-tests (packages exist for this), and I am familiar with the multcomp package and its cld() function for getting compact letter displays from linear models, but neither works for my specific case.
I often work with Kaplan-Meier survival data. After running pairwise_survdiff() (using the survival and survminer packages) to test for statistical differences between groups, I can easily extract a table of all pairwise comparisons and their p-values. I have included an example here (see df below).
When there are many comparisons, working out by hand which groups are different or similar becomes a mess and is prone to human error, especially with many levels. Up to now I have always done it by hand, and I would like to change this.
Could someone help me with code that does this automatically?
Here is a mock data frame df with 10 treatments (named treatment-1 ... treatment-10), filled with p-values. Let's treat anything with p < 0.05 as significant. It would also be very useful if the code let me set a more conservative cutoff for significance (for example, p < 0.01).
Thanks for your help. Again, here is a play data frame:
df <- read.table("https://pastebin.com/raw/ZAKDBjVs", header = T)
While thinking this over, I believe I found an answer myself, using the multcompView and rcompanion packages.
I am posting it anyway because I have seen and heard this question many times. Here is how I solved my problem:
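library(rcompanion)    # fullPTable()
library(multcompView)  # multcompLetters()
# Lower-triangle table of pairwise p-values
df <- read.table("https://pastebin.com/raw/ZAKDBjVs", header = TRUE)
# Expand it to a full symmetric matrix of p-values
PT1 <- fullPTable(df)
# Groups that share a letter are not significantly different at the threshold
multcompLetters(PT1,
                compare = "<",
                threshold = 0.05,
                Letters = letters,
                reversed = FALSE)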
This gives me the desired compact letter display for the groups. The significance threshold can be made more or less conservative by changing the threshold argument (for example, threshold = 0.01).
I am very happy with the result; this had bothered me for a while. I hope it is useful to others.
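To go straight from survival data to letters without pasting a table, the same idea can be sketched with base R plus multcompView. This is a stand-in for pairwise_survdiff(), not its actual code: it runs a log-rank test (survdiff()) on each pair of groups, adjusts the p-values with Benjamini-Hochberg via stats::pairwise.table() (the adjustment method is an assumption; use whichever you used), expands the lower triangle into a full symmetric matrix (the job fullPTable() does), and exposes the cutoff as an argument. The veteran data from the survival package and the helper names logrank_p() and cld_letters() are illustrative choices.

```r
library(survival)      # survdiff(), Surv(), veteran data
library(multcompView)  # multcompLetters()

# Hypothetical helper: unadjusted log-rank p-value for one pair of cell types
logrank_p <- function(data, g1, g2) {
  sub <- droplevels(data[data$celltype %in% c(g1, g2), ])
  fit <- survdiff(Surv(time, status) ~ celltype, data = sub)
  pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
}

lv <- levels(veteran$celltype)
# Lower-triangle matrix of BH-adjusted p-values (same layout as pairwise.htest$p.value)
pt <- pairwise.table(function(i, j) logrank_p(veteran, lv[i], lv[j]),
                     level.names = lv, p.adjust.method = "BH")

# Expand to a full symmetric matrix with 1 on the diagonal
full <- matrix(1, nrow = length(lv), ncol = length(lv), dimnames = list(lv, lv))
for (a in rownames(pt)) {
  for (b in colnames(pt)) {
    if (!is.na(pt[a, b])) {
      full[a, b] <- pt[a, b]
      full[b, a] <- pt[a, b]
    }
  }
}

# Hypothetical helper: letters at a user-chosen significance cutoff
cld_letters <- function(p_full, alpha = 0.05) {
  multcompLetters(p_full, compare = "<", threshold = alpha, Letters = letters)$Letters
}

cld_letters(full, alpha = 0.05)
cld_letters(full, alpha = 0.01)  # more conservative cutoff
```

If you already have a pairwise_survdiff() result, its p.value matrix has the same lower-triangle shape as pt, so it can be expanded and passed to cld_letters() the same way.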

How to run Longitudinal Ordinal Logistic Regression in R

I'm working with a large dataset in which the same patients are followed over multiple months, with an ordered outcome on a severity scale from 1 to 5. I analyzed the first set of patients with a basic ordinal logistic regression using the polr function, but now I want to analyze the association across all time points with a longitudinal ordinal logistic model. So far I haven't found clear documentation, online or on this site, explaining which package to use and how to use it. I am also an R novice, so simple explanations would be incredibly useful. Based on some initial searching, the mixor function might be what I need, though I am not sure how it works. I found it here:
https://cran.r-project.org/web/packages/mixor/vignettes/mixor.pdf
Would appreciate a simple explanation of how to use this function if this is the right one, or would happily take any alternate suggestions with an explanation.
Thank you in advance for your help!
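mixor may well do the job, but another widely used option is clmm() from the ordinal package, a cumulative link mixed model. It is essentially the proportional-odds model that polr() fits, plus a random intercept per patient to account for repeated visits. A minimal sketch on simulated data; the column names patient, month and severity and all simulation settings are hypothetical stand-ins for your data.

```r
library(ordinal)  # clmm()

set.seed(42)
# Hypothetical long-format data: 60 patients, each seen in 6 months
d <- expand.grid(month = 1:6, patient = factor(1:60))
u <- rnorm(60, sd = 1)                          # patient-level random intercepts
latent <- 0.3 * d$month + u[d$patient] + rlogis(nrow(d))
# Cut the latent scale into an ordered 1-5 severity outcome
d$severity <- cut(latent, breaks = c(-Inf, 0, 1, 2, 3, Inf),
                  labels = 1:5, ordered_result = TRUE)

# Ordinal logistic model with a random intercept for each patient
fit <- clmm(severity ~ month + (1 | patient), data = d)
summary(fit)
```

The response must be an ordered factor, and the data must be in long format (one row per patient per month). Other covariates enter the fixed part of the formula just as they did in polr().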

Is repeated-measures ANOVA what I am looking for?

I'm studying the NDVI (normalized difference vegetation index) behaviour of several soils and cultivars. My database has 33 acquisition days, 17 kinds of soil and 4 different cultivars. I have arranged it in two different layouts, shown in the attachments, and I run into errors with both.
My first question is: is repeated-measures ANOVA the correct way to analyze my data? I want to see whether there are differences in behaviour between the cultivars and between the soils. I ran an ANOVA for each day and found statistically significant differences on every day, but the per-day results are not very informative, because I want to study the behaviour over the whole year.
The second question is: how can I perform it? I've tried several tutorials, but I got unexpected errors or couldn't complete the analysis.
Last but not least: I'm coding in RStudio.
Any help is appreciated. I'm still new to statistics but really interested in improving!
horizontal database
vertical database
I believe you can use ANOVA, but as always, you have to decide whether that is really what you're looking for. Since this is a platform for programming questions, I'll write code that should work for the vertical version. However, since I don't have your data, I can't know for sure (for future reference, dput(data) produces easily importable code for those trying to answer you).
summary(aov(suolo ~ CV, data = data))
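Note that the one-liner above puts suolo on the left-hand side, while NDVI is presumably the response. Because each plot is measured on every acquisition date, a repeated-measures ANOVA can be sketched with aov() and an Error() term that tells R which rows belong to the same plot. This sketch simulates a smaller version of the vertical (long) layout, assuming one row per plot per day, with hypothetical columns NDVI, day and plot (plot here identifies each soil-by-cultivar combination) next to suolo and CV. The CV:day and day:suolo interactions test whether cultivars or soils behave differently over time.

```r
set.seed(1)
# Hypothetical long ("vertical") layout: one row per soil-by-cultivar plot per day
d <- expand.grid(day   = factor(1:10),
                 suolo = factor(paste0("S", 1:5)),
                 CV    = factor(paste0("CV", 1:4)))
d$plot <- interaction(d$suolo, d$CV)  # hypothetical plot ID
d$NDVI <- 0.4 + 0.02 * as.integer(d$day) + 0.05 * as.integer(d$CV) +
  rnorm(nrow(d), sd = 0.03)            # made-up NDVI values

# Error(plot): rows sharing a plot are repeated measures of the same unit
fit <- aov(NDVI ~ CV * day + suolo * day + Error(plot), data = d)
summary(fit)
```

With 33 dates, the sphericity assumption behind repeated-measures ANOVA is a real concern, so a mixed model (for example with nlme::lme or lme4::lmer) is often a safer choice for the full dataset.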

Example R source code for multiple linear regression with looping through geographies & products?

Pardon the newbie question; I started learning R a couple of weeks ago (but intend to use it actively from now on). I could use some help if you already have a working example.
To estimate own-price elasticity coefficients for each of our products (~100) in each of our states, I want to run a multiple regression of Units on a set of independent variables. That part is straightforward. However, I would like R to cycle through EACH product within a state, THEN move on to the next state in the data file and start again with the first product, repeating the cycle.
I have attached an example of what I'm trying to accomplish. At the end, I would also like R to export the regression results (coefficients, p-values, t-statistics and summaries) to a separate worksheet.
Does anyone have a similar example? I'm comfortable enough to read source code and modify it to fit my needs, but not yet comfortable writing it from scratch. And, alas, I am tired of copying and pasting into Minitab/Excel (which is what I've been using so far) to run regressions 1,000 times.
Appreciate any help you could offer!
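A minimal base-R sketch of the loop: split the data by state and product, fit one lm() per group, and stack the coefficient tables (estimate, standard error, t statistic, p-value) into one data frame that is written to a CSV file Excel can open. The column names (state, product, Units, Price, Promo), the simulated data, and the log-log specification (where the log(Price) coefficient is read as the own-price elasticity) are assumptions for illustration.

```r
set.seed(123)
# Hypothetical data: 2 states x 3 products x 24 weeks
sales <- expand.grid(week = 1:24, product = paste0("P", 1:3),
                     state = c("CA", "TX"), stringsAsFactors = FALSE)
sales$Price <- runif(nrow(sales), 5, 15)
sales$Promo <- as.integer(sales$week %% 4 == 0)  # hypothetical promo every 4th week
sales$Units <- exp(5 - 1.2 * log(sales$Price) + 0.3 * sales$Promo +
                     rnorm(nrow(sales), sd = 0.1))

# Fit one regression for one state/product subset; return a tidy coefficient table
fit_one <- function(df) {
  s  <- summary(lm(log(Units) ~ log(Price) + Promo, data = df))
  ct <- s$coefficients
  data.frame(state = df$state[1], product = df$product[1],
             term = rownames(ct), estimate = ct[, "Estimate"],
             std_error = ct[, "Std. Error"], t_stat = ct[, "t value"],
             p_value = ct[, "Pr(>|t|)"], r_squared = s$r.squared,
             row.names = NULL)
}

# One data frame per state/product combination, then one lm() per piece
groups  <- split(sales, list(sales$state, sales$product), drop = TRUE)
results <- do.call(rbind, lapply(groups, fit_one))
results <- results[order(results$state, results$product), ]
rownames(results) <- NULL

# Write to CSV (opens in Excel); change the path as needed
out_file <- file.path(tempdir(), "elasticities.csv")
write.csv(results, out_file, row.names = FALSE)
```

split() plus lapply() replaces an explicit nested loop; since each regression is independent, the fitting order does not matter, and the final sort restores a state-then-product order. To get a real .xlsx worksheet instead of a CSV, a package such as writexl (write_xlsx()) or openxlsx can write the same results data frame.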
