I'm trying to create descriptive statistics in a "publishable" html format.
Let's take the mtcars data and assume I want to create a table that gives me the usual descriptive statistics for Miles/(US) gallon, Gross horsepower, Weight (1000 lbs), and 1/4 mile time for both automatic and manual cars.
I can get a rough version of what I am looking for by using psych::describeBy:
library(tidyverse)
library(psych)
#Descriptive statistics
data("mtcars")
df <- mtcars|>
select(1,4,6,7,9)
describeBy(df, group = mtcars$am, fast=TRUE)
However, I am trying to create this in a format that is close to what you would find in journal articles and also can be exported as html. Anyone got any suggestions? I tried to use stargazer but struggled to get results for both groups in one table.
Thanks!
Here is a minimal summary table for mtcars, grouped by am, built with gtsummary:
library(tidyverse)
data(mtcars)
as_tibble(mtcars)
library(gtsummary)
mtcars %>% tbl_summary(by = am)
gtsummary lets you customize the table in various ways (variable labels, which statistics are shown, and so on).
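As a rough sketch of what those modifications could look like for the four variables in the question: the labels and the "mean (SD)" format below are my own choices, and the table is written to a temporary file only for illustration. It assumes the gt package is installed alongside gtsummary.

```r
library(gtsummary)

# Keep only the variables from the question; am is 0 = automatic, 1 = manual
cars <- mtcars[, c("mpg", "hp", "wt", "qsec", "am")]
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

tbl <- tbl_summary(
  cars,
  by = am,
  # Show mean (SD) instead of the default median (IQR)
  statistic = all_continuous() ~ "{mean} ({sd})",
  label = list(
    mpg  ~ "Miles/(US) gallon",
    hp   ~ "Gross horsepower",
    wt   ~ "Weight (1000 lbs)",
    qsec ~ "1/4 mile time"
  )
)

# Export as HTML via gt (the file location here is just a temp file)
html_file <- tempfile(fileext = ".html")
gt::gtsave(as_gt(tbl), html_file)
```

Replace the temporary file with a real path such as "descriptives.html" to keep the output.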
I'm looking for an R package that lets me compute Goodman and Kruskal's gamma correlations within each subject. I have 2 variables with 16 items each, which I would like to correlate per subject.
So far I have used the Hmisc package and its rcorr.cens() function. However, the function computes one correlation across all subjects, and I failed to adapt the code to get a correlation for each subject. This is what I have tried so far:
```
Gamma_correlation <- dataframe %>%
  group_by(Subject) %>%
  rcorr.cens(dataframe$Variable_1,
             dataframe$Variable_2,
             outx = TRUE)[2]
```
You could possibly achieve this by dropping the dataframe$ prefix and keeping only the variable names, so that the columns are resolved within each group. Hmisc also seems to mask dplyr's summarize function, so put the rcorr.cens call inside dplyr::summarize().
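Put together, that suggestion might look like the sketch below. The data are made up (3 subjects, 16 items each) just so the code runs; the column names follow the question. The second element of the rcorr.cens() result is Dxy, which with outx = TRUE is a gamma-type rank correlation.

```r
library(dplyr)
library(Hmisc)  # loaded after dplyr, so Hmisc::summarize masks dplyr::summarize

set.seed(42)
# Hypothetical data: 3 subjects, 16 items each, two ordinal ratings per item
dataframe <- data.frame(
  Subject    = rep(1:3, each = 16),
  Variable_1 = sample(1:5, 48, replace = TRUE),
  Variable_2 = sample(1:5, 48, replace = TRUE)
)

Gamma_correlation <- dataframe %>%
  group_by(Subject) %>%
  # Call dplyr's summarize explicitly and use bare column names,
  # so each call only sees the rows of the current subject
  dplyr::summarize(
    gamma = as.numeric(rcorr.cens(Variable_1, Variable_2, outx = TRUE)[2]),
    .groups = "drop"
  )

Gamma_correlation
```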
I am using the code below to get correlations between my dependent variable and a questionnaire response (for different levels of different conditions).
BREAK %>%
group_by(condition, valence) %>%
summarize(COR=cor(rt, positive_focused_cognitiveER)) %>%
ungroup()
It gives me the correlations and their directions (+/-).
I would like to know, however, if those correlations are significant.
Is there a way to simply add a line to the code I already have to get the p-values?
Or another easy code? (I don't need fancy stuff, just the numbers)
The only fitting post I found for my problem was "Getting p values for groupwise correlation using the dplyr package", but the answer did not help me.
Thanks in advance for any tips! :)
You can compute p-values with stats::cor.test:
BREAK %>%
group_by(condition, valence) %>%
summarize(COR = stats::cor.test(rt, positive_focused_cognitiveER)$estimate,
pval = stats::cor.test(rt, positive_focused_cognitiveER)$p.value
) %>%
ungroup()
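The code above runs cor.test twice per group. If that bothers you, one possible variant stores the test object once in a list column and then pulls out both numbers. This is only a sketch on mtcars (grouped by am and vs, correlating mpg with wt), since I don't have the BREAK data.

```r
library(dplyr)

res <- mtcars %>%
  group_by(am, vs) %>%
  # Run the test once per group and keep the whole htest object
  summarize(test = list(stats::cor.test(mpg, wt)), .groups = "drop") %>%
  mutate(
    COR  = sapply(test, function(t) unname(t$estimate)),
    pval = sapply(test, function(t) t$p.value)
  ) %>%
  select(-test)

res
```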
This is a complete re-edit of my original question.
Let's assume I'm working on RT data gathered in a repeated-measures experiment. As part of my usual routine I always transform RT to natural logarithms and then compute a Z score for each RT within each participant, adjusting for trial number. This is typically done with a simple regression in SPSS syntax:
split file by subject.
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT rtLN
/METHOD=ENTER trial
/SAVE ZRESID.
split file off.
To reproduce the same procedure in R, first generate the data:
#load libraries
library(dplyr); library(magrittr)
#generate data
ob<-c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,3)
ob<-factor(ob)
trial<-c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6)
rt<-c(300,305,290,315,320,320,350,355,330,365,370,370,560,565,570,575,560,570)
cond<-c("first","first","first","snd","snd","snd","first","first","first","snd","snd","snd","first","first","first","snd","snd","snd")
#Following variable is what I would get after using SPSS code
ZreSPSS<-c(0.4207,0.44871,-1.7779,0.47787,0.47958,-0.04897,0.45954,0.45487,-1.7962,0.43034,0.41075,0.0407,-0.6037,0.0113,0.61928,1.22038,-1.32533,0.07806)
sym<-data.frame(ob, trial, rt, cond, ZreSPSS)
I could apply a formula (a blend of Mark's and Daniel's solutions) to compute residuals from an lm(log(rt)~trial) regression, but for some reason group_by is not working here:
sym %<>%
group_by (ob) %>%
mutate(z=residuals(lm(log(rt)~trial)),
obM=mean(rt), obSd=sd(rt), zRev=z*obSd+obM)
Resulting values clearly show that grouping hasn't kicked in.
Any idea why it didn't work out?
Using dplyr and magrittr, you should be able to calculate z-scores within each individual with this code (it splits the data into the groups you specify, then calculates within each group).
experiment %<>%
group_by(subject) %>%
mutate(rtLN = log(rt)
, ZRE1 = scale(rtLN))
You should then be able to use that in your model. However, one thing that may help your shift to R thinking is that you can likely build your model directly, instead of having to make all of these columns ahead of time. For example, using lme4 to treat subject as a random effect:
library(lme4)
withRandVar <-
lmer(log(rt) ~ cond + (1|as.factor(subject))
, data = experiment)
Then, the residuals should already be on the correct scale. Further, if you use the z-scores, you probably should be plotting on that scale. I am not actually sure what running with the z-scores as the response gains you -- it seems like you would lose information about the degree of difference between the groups.
That is, if the groups are tight, but the difference between them varies by subject, a z-score may always show them as a similar number of z-scores away. Imagine, for example, that you have two subjects, one scores (1,1,1) on condition A and (3,3,3) on condition B, and a second subject that scores (1,1,1) and (5,5,5) -- both will give z-scores of (-.9,-.9,-.9) vs (.9,.9,.9) -- losing the information that the difference between A and B is larger in subject 2.
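The two-subject example can be checked directly in base R. The vectors below are just the made-up scores from the paragraph above:

```r
# Subject 1: condition A = (1,1,1), condition B = (3,3,3)
s1 <- c(1, 1, 1, 3, 3, 3)
# Subject 2: condition A = (1,1,1), condition B = (5,5,5)
s2 <- c(1, 1, 1, 5, 5, 5)

# scale() returns a one-column matrix, so flatten it to a plain vector
z1 <- as.numeric(scale(s1))
z2 <- as.numeric(scale(s2))

round(z1, 2)
round(z2, 2)

# The z-scores are identical, although the raw A-B difference is 2 vs 4
all.equal(z1, z2)
```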
If, however, you really want to convert back, you can probably use this to store the subject means and sds, then multiply the residuals by subjSD and add subjMean.
experiment %<>%
group_by(subject) %>%
mutate(rtLN = log(rt)
, ZRE1 = scale(rtLN)
, subjMean = mean(rtLN)
, subjSD = sd(rtLN))
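To illustrate the back-conversion, here is a small sketch with invented data (two subjects, three trials each). The column rtLN_back is a name I made up; it should simply reproduce rtLN. Note that scale() returns a matrix, so I wrap it in as.numeric() to get an ordinary column.

```r
library(dplyr)

# Hypothetical data, only for illustration
experiment <- data.frame(
  subject = rep(1:2, each = 3),
  rt      = c(300, 305, 290, 350, 355, 330)
)

back <- experiment %>%
  group_by(subject) %>%
  mutate(
    rtLN      = log(rt),
    ZRE1      = as.numeric(scale(rtLN)),   # within-subject z-score
    subjMean  = mean(rtLN),
    subjSD    = sd(rtLN),
    rtLN_back = ZRE1 * subjSD + subjMean   # undo the standardization
  ) %>%
  ungroup()

all.equal(back$rtLN, back$rtLN_back)
```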
mylm <- lm(x~y)
rstandard(mylm)
This returns the standardized residuals of the fitted model. To bind these to a variable you can do:
zresid <- rstandard(mylm)
EXAMPLE:
a <- rnorm(10, mean = 10)
b <- rnorm(10, mean = 10)
mylm <- lm(a~b)
mylm.zresid<-rstandard(mylm)
See also:
summary(mylm)
and
mylm$coefficients
mylm$fitted.values
mylm$xlevels
mylm$residuals
mylm$assign
mylm$call
mylm$effects
mylm$qr
mylm$terms
mylm$rank
mylm$df.residual
mylm$model
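One caveat if the goal is to match the SPSS ZRESID column from the question: rstandard() also adjusts each residual for its leverage, so it will not give exactly the same numbers. Judging from the ZreSPSS values posted in the question, plain residuals divided by the residual standard error, computed separately per subject, seem to reproduce them. The sketch below rebuilds the question's data; z_resid is a helper name I made up. Calling dplyr::mutate explicitly rules out masking by another package as a cause of the grouping being ignored, though that is only a guess, nothing in the thread confirms it.

```r
library(dplyr)

# Data from the question
sym <- data.frame(
  ob    = factor(rep(1:3, each = 6)),
  trial = rep(1:6, times = 3),
  rt    = c(300, 305, 290, 315, 320, 320,
            350, 355, 330, 365, 370, 370,
            560, 565, 570, 575, 560, 570),
  ZreSPSS = c(0.4207, 0.44871, -1.7779, 0.47787, 0.47958, -0.04897,
              0.45954, 0.45487, -1.7962, 0.43034, 0.41075, 0.0407,
              -0.6037, 0.0113, 0.61928, 1.22038, -1.32533, 0.07806)
)

# Hypothetical helper: residuals divided by the residual standard error
z_resid <- function(y, x) {
  fit <- lm(y ~ x)
  as.numeric(residuals(fit) / sigma(fit))
}

sym <- sym %>%
  group_by(ob) %>%
  dplyr::mutate(z = z_resid(log(rt), trial)) %>%
  ungroup()

# Largest absolute difference from the SPSS values
max(abs(sym$z - sym$ZreSPSS))
```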
This sounds like a trivial one, but some research didn't come up with an elegant solution:
I have a dataframe structured with a categorical variable (GROUP) and a continuous read-out variable (bloodpressure).
How can I make a simple box plot showing the mean for each group with its standard deviation?
There are multiple groups (A, B, C, D). How can I perform an ANOVA post-hoc analysis within the dataframe? How does it work with the Mann-Whitney U test? Can I mark the significance level in the bar plot?
How can I streamline this operation for multiple continuous variables (dia_bloodpressure, sys_bloodpressure, mean_bloodpressure) and sink() the output into different files (named after the variable)?
After some research I came across the agricolae package. It provides multiple group comparisons. The resulting objects can be passed to a decent plotting function for groupwise bar graphs +/- SD or SEM. Unfortunately, there is no way to add markers of significance between groups in the plots.
After some more programming in R, I stumbled over another nice package suitable for medical research: psych.
Regarding the question above, describe() and describeBy() give a statistical overview of a dataframe, with describeBy() splitting it by a grouping variable.
The function error.bars.by() is an advanced plotting function for group means with error bars (confidence intervals by default; sd = TRUE switches to standard deviations).
The package offers many functions on covariate analysis, which are useful in psychological research but might also help for medical and marketing research.
A possible code snippet:
library(psych)
x<-c(1,2,3,4,5,6,7,8,9,NA)
y<-c(2,3,NA,3,4,NA,2,3,NA,2)
group<-rep((factor(LETTERS[1:2])),5)
df<-data.frame(x,y,group)
df
by(df$x,df$group,summary)
by(df$x,df$group,mean)
sd(df$x) #result: NA
sd(df$x, na.rm=TRUE) #result: 2.738613
v <- c("x", "y") # or, equivalently:
v <- colnames(df)[1:2]
sapply(v, function(i) tapply(df[[i]], df$group, sd, na.rm=TRUE))
describeBy(df$x, df$group)
error.bars.by(df$x, df$group, bars=TRUE)
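For the ANOVA, post-hoc and sink() parts of the question, base R alone may be enough. The sketch below uses simulated blood-pressure data (the numbers are invented, the column names come from the question) and writes one text file per variable into a temporary directory. The Mann-Whitney U test is the same as the Wilcoxon rank-sum test, so pairwise.wilcox.test() covers the pairwise comparisons.

```r
set.seed(1)
# Simulated data, for illustration only
bp <- data.frame(
  GROUP             = factor(rep(LETTERS[1:4], each = 10)),
  dia_bloodpressure = rnorm(40, mean = 80, sd = 8),
  sys_bloodpressure = rnorm(40, mean = 125, sd = 12)
)

vars    <- c("dia_bloodpressure", "sys_bloodpressure")
out_dir <- tempdir()   # swap in a real folder to keep the files
results <- list()

for (v in vars) {
  # One-way ANOVA of the current variable by GROUP
  fit <- aov(reformulate("GROUP", response = v), data = bp)
  tuk <- TukeyHSD(fit)   # post-hoc pairwise comparisons
  # Pairwise Mann-Whitney U (Wilcoxon rank-sum) tests, Holm-adjusted
  wil <- pairwise.wilcox.test(bp[[v]], bp$GROUP, p.adjust.method = "holm")
  results[[v]] <- list(tukey = tuk, wilcox = wil)

  # Send the printed output to a file named after the variable
  out_file <- file.path(out_dir, paste0(v, ".txt"))
  sink(out_file)
  print(summary(fit))
  print(tuk)
  print(wil)
  sink()
}
```

The results list keeps the test objects around in case you want to extract p-values for annotating a plot by hand.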