I'm new to R and I need to use the rmcorr package to calculate repeated measures correlations (https://cran.r-project.org/web/packages/rmcorr/rmcorr.pdf). I have 5 participants and 12 variables. rmcorr returns a list with important values such as r, p, df and CI. It does not support p-value adjustment, so I will need to extract the p-values from that list to control the false discovery rate. The syntax for rmcorr is as follows:
rmcorr(subjectID, variable1, variable2, mydata)
I'm hoping I can automate the correlations by feeding in the 12 variables (column names). I'd then need to extract the r and p values from each of the resulting objects and build two matrices (one of r values and one of p values):
var1x2$p |
var1x3$p | var2x3$p |
var1x4$p | var2x4$p | var3x4$p | etc
I'd then run the P values into p.adjust and end up with an adjusted matrix:
var1x2$p.adjust |
var1x3$p.adjust | var2x3$p.adjust |
var1x4$p.adjust | var2x4$p.adjust | var3x4$p.adjust | etc
Is something like this possible? Sorry if the syntax is wrong, I am very new to R.
Sample data
structure(list(subjectID = c(1, 1, 1, 1, 1, 1), DSTSpeed = c(5.4225,
6.8532, 5.6649, 5.6137, 6.5338, 6.9774), DSTError = c(0.060606,
0.11111, 0.032258, 0.0625, 0.068966, 0.11538), CRTSpeed = c(0.46195,
0.5066, 0.53191, 0.48758, 0.50286, 0.47727), CRTError = c(0.017241,
0.034483, 0, 0, 0.033898, 0.016949), KSS = c(4L, 4L, 8L, 8L,
8L, 6L), SIQPhys = c(1.4, 2, 2.8, 3.6, 3.4, 2.2), SIQCog = c(4,
3.8, 4.2, 4.2, 5.2, 3.8), TotalSleep = c(7.66416666666667, 7.49611111111111,
7.28944444444444, 7.78611111111111, 7.46916666666667, 12.8872222222222
), SleepEfficiency = c(0.85775, 0.75881, 0.69097, 0.80629, 0.84559,
0.73939), ProportionSWS = c(0.063709, 0.31109, 0.2135, 0.2107,
0.46937, 0.2988), EDA = c(0.77086, 1.4112, 1.5735, 2.168, 1.0156,
1.7074), WakingEDA = c(0.031424, 0.020836, 0.022987, 0.022799,
0.020879, 0.28959), temp = c(34.904, 35.414, 35.056, 35.248,
35.39, 35.105), WakingTemp = c(35.999, 35.636, 35.749, 35.336,
35.66, NA)), row.names = c(NA, -6L), class = c("data.table",
"data.frame"))
We added a new function to the rmcorr package (>=0.5.0) for calculating correlations among all specified pairs. Here's an example of using p.adjust() with output from the new function:
install.packages("rmcorr")
library(rmcorr)
dist_rmc_mat <- rmcorr_mat(participant = Subject,
                           variables = c("Blindwalk Away",
                                         "Blindwalk Toward",
                                         "Triangulated BW",
                                         "Verbal",
                                         "Visual matching"),
                           dataset = twedt_dist_measures,
                           CI.level = 0.95)
#Third component: Summary
dist_rmc_mat$summary
#p-values only
dist_rmc_mat$summary$p.vals
#Vector of original, unadjusted p-values for all 10 comparisons
p.vals <- dist_rmc_mat$summary$p.vals
p.vals.bonferroni <- p.adjust(p.vals,
                              method = "bonferroni",
                              n = length(p.vals))
p.vals.fdr <- p.adjust(p.vals,
                       method = "fdr",
                       n = length(p.vals))
#All p-values together
all.pvals <- cbind(p.vals, p.vals.bonferroni, p.vals.fdr)
colnames(all.pvals) <- c("Unadjusted", "Bonferroni", "fdr")
round(all.pvals, digits = 5)
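To get from a vector of adjusted p-values back to the triangular layout the question asks for, something like the following base R sketch may help. The variable names and p-values here are made up for illustration; in practice they would come from the pairs and p-values in the rmcorr output. Cells are filled by matching names rather than by position, so the sketch makes no assumption about the order in which the pairs are reported.

```r
# Hypothetical variable names and unadjusted p-values, one per pair.
vars  <- c("A", "B", "C", "D")
pairs <- t(combn(vars, 2))   # 6 rows: A-B, A-C, A-D, B-C, B-D, C-D
p_raw <- c(0.001, 0.020, 0.300, 0.040, 0.500, 0.010)

# Benjamini-Hochberg (FDR) adjustment over all pairs at once.
p_fdr <- p.adjust(p_raw, method = "fdr")

# Fill the lower triangle of a square matrix, locating each cell by name.
fill_lower <- function(values) {
  m <- matrix(NA_real_, length(vars), length(vars),
              dimnames = list(vars, vars))
  m[cbind(match(pairs[, 2], vars), match(pairs[, 1], vars))] <- values
  m
}

p_mat     <- fill_lower(p_raw)
p_fdr_mat <- fill_lower(p_fdr)
round(p_fdr_mat, 3)
```

The same helper could be reused for the r values, giving the two matrices described in the question.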
I have two datasets whose values are arranged in bins. I want to compare the per-bin means of dataset1 and dataset2 and visualize them in a line plot, or by any other suitable method. I am new to this kind of analysis, so any suggestion would be very helpful.
In the actual data both datasets have the same bins (bin1 to bin200), but the number of rows varies (300-200), so I give a small sample dataset below. I want to use a bootstrap: draw a random sample of, for example, 100 rows from each dataset and take the mean across all bins of dataset1 and dataset2. I have two reasons for bootstrapping. The datasets have the same bins but different numbers of rows, which may affect the means. Outliers (extremely low and high values across the bins) may also distort the result.
Any suggestions on how I can do this in R? I am a newbie to R and still learning, so please help.
dataset1=structure(list(genenames = c("data1", "data2", "data3", "data4", "data5", "data6"),
bin1 = c(0,20,9,0,2,0),
bin2 = c(5,20,8,30,10,0),
bin3 = c(0,0,1,1,3,0),
bin4 =c(6, 20, 10, 5, 0, 1),
bin5 =c(10,15,30,10,9, 4)),
class = "data.frame", row.names = c(NA, -6L))
dataset2=structure(list(genenames = c("data10", "data11", "data12", "data13", "data14", "data15"),
bin1 = c(0,30,0,0,20,0),
bin2 = c(0,0,8,10,20,0),
bin3 = c(0,10,19,15,3,10),
bin4 =c(30, 0, 0, 25, 0, 20),
bin5 =c(0,5,0,20,30, 29)),
class = "data.frame", row.names = c(NA, -6L))
dataset1_mean=colMeans(dataset1[,-1])
dataset2_mean=colMeans(dataset2[,-1])
Is there a statistical method to remove these outliers, or is there any problem with using the bootstrap here? Please let me know.
Thank you
Here is one way: After some data wrangling you could use boxplot and mark the mean with a red point:
library(dplyr)
library(ggplot2)
library(tidyr)
dataset1 <- dataset1 %>%
  mutate(df = "df1")
dataset2 <- dataset2 %>%
  mutate(df = "df2")
bind_rows(dataset1, dataset2) %>%
  pivot_longer(
    cols = starts_with("bin"),
    names_to = "name",
    values_to = "value"
  ) %>%
  ggplot(aes(df, value)) +
  geom_boxplot() +
  stat_summary(fun = mean,
               geom = "point",
               shape = 20,
               size = 4,
               color = "red",
               position = position_dodge2(width = 0.7, preserve = "single"))
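For the bootstrap part of the question, here is a minimal base R sketch, assuming rows are resampled with replacement and a percentile interval is acceptable. The function name boot_bin_means and its arguments n_draw and n_boot are hypothetical names for this example, not part of any package.

```r
set.seed(42)  # reproducible resampling

# Sample data from the question (first column holds the row labels).
dataset1 <- data.frame(
  genenames = paste0("data", 1:6),
  bin1 = c(0, 20, 9, 0, 2, 0),
  bin2 = c(5, 20, 8, 30, 10, 0),
  bin3 = c(0, 0, 1, 1, 3, 0),
  bin4 = c(6, 20, 10, 5, 0, 1),
  bin5 = c(10, 15, 30, 10, 9, 4)
)

# Resample rows with replacement and record the per-bin mean each time.
boot_bin_means <- function(d, n_draw = nrow(d), n_boot = 1000) {
  bins <- as.matrix(d[, grepl("^bin", names(d))])
  reps <- replicate(n_boot, {
    rows <- sample(nrow(bins), n_draw, replace = TRUE)
    colMeans(bins[rows, , drop = FALSE])
  })  # matrix: one row per bin, one column per bootstrap replicate
  data.frame(
    bin   = colnames(bins),
    mean  = rowMeans(reps),
    lower = apply(reps, 1, quantile, probs = 0.025),
    upper = apply(reps, 1, quantile, probs = 0.975),
    row.names = NULL
  )
}

boot1 <- boot_bin_means(dataset1)
boot1
```

Running the same function on dataset2 (with n_draw = 100 on the real data, as in the question) gives a second table, and the two mean columns can then be drawn as lines over the bins with the intervals as bands. Note that the bootstrap does not remove outliers; it only shows how much they make the bin means vary.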
I'm trying to create a stacked bar graph showing body composition. I have a table/data set (I don't know the correct term) that looks like this:
structure(list(data.Date = structure(1:7, .Label = c("2021-03-06",
"2021-03-07", "2021-03-08", "2021-03-09", "2021-03-10", "2021-03-11",
"2021-03-12"), class = "factor"), total_bf = c(19.6612, 18.2182,
19.6803, 21.7047, 18.126, 19.7, 19.1424), total_muscle = c(41.5948,
43.043, 42.1578, 42.1866, 43.4017, 42.2, 42.2728), other = c(37.544,
38.8388, 38.0619, 38.0087, 39.1723, 38.1, 38.2848)), class = "data.frame", row.names = c(NA,
-7L))
Each column is a weight in kilograms. Together they add up to the total body weight of the subject. What I want is a stacked bar graph where each bar represents a date and each bar is split by total_bf, total_muscle and other. All of the guides and Q&As I've seen don't seem to apply to my situation. Maybe this is because I am new but nothing I've tried has worked yet.
An example of what I'm trying to achieve:
The only difference is that on my graph blue would be body fat (total_bf), green would be other and red would be muscle (total_muscle).
You can convert data from the wide format to the long format using the tidyr::pivot_longer() function:
library(ggplot2)
df <- structure(list(
data.Date = structure(
1:7,
.Label = c("2021-03-06", "2021-03-07", "2021-03-08", "2021-03-09",
"2021-03-10", "2021-03-11", "2021-03-12"), class = "factor"),
total_bf = c(19.6612, 18.2182, 19.6803, 21.7047, 18.126, 19.7, 19.1424),
total_muscle = c(41.5948, 43.043, 42.1578, 42.1866, 43.4017, 42.2, 42.2728),
other = c(37.544, 38.8388, 38.0619, 38.0087, 39.1723, 38.1, 38.2848)
), class = "data.frame", row.names = c(NA, -7L))
long <- tidyr::pivot_longer(df, -data.Date)
Then using ggplot2, the defaults already make a stacked bar chart, so you just need to specify x, y and fill aesthetics.
ggplot(long, aes(data.Date, value, fill = name)) +
geom_col()
Since your date is encoded as a factor, if you want to encode it as a real date you can convert it as follows:
long$date <- as.Date(strptime(as.character(long$data.Date), format = "%Y-%m-%d"))
ggplot(long, aes(date, value, fill = name)) +
geom_col()
Created on 2021-03-12 by the reprex package (v0.3.0)
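For comparison, base R's barplot() can draw the same kind of stacked chart from a matrix with one row per component. This is only a sketch of an alternative route; the colors follow the question (blue for body fat, green for other, red for muscle), and the axis and legend labels are my own choice.

```r
# Same data as in the question, with the dates kept as plain strings.
df <- data.frame(
  data.Date = c("2021-03-06", "2021-03-07", "2021-03-08", "2021-03-09",
                "2021-03-10", "2021-03-11", "2021-03-12"),
  total_bf = c(19.6612, 18.2182, 19.6803, 21.7047, 18.126, 19.7, 19.1424),
  total_muscle = c(41.5948, 43.043, 42.1578, 42.1866, 43.4017, 42.2, 42.2728),
  other = c(37.544, 38.8388, 38.0619, 38.0087, 39.1723, 38.1, 38.2848)
)

# barplot() stacks the rows of a matrix and draws one bar per column,
# so transpose to get components in rows and dates in columns.
heights <- t(as.matrix(df[, c("total_bf", "other", "total_muscle")]))

mids <- barplot(
  heights,
  names.arg = df$data.Date,
  col = c("blue", "green", "red"),
  ylab = "Weight (kg)",
  legend.text = c("Body fat", "Other", "Muscle")
)
```

With the ggplot2 version, adding scale_fill_manual(values = c(total_bf = "blue", other = "green", total_muscle = "red")) should give the same color mapping.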
I have a mosaic plot that looks like
but I need to show the proportions of Countries relative to roles, i.e. flip the chart. Is it possible to do without transposing the table?
thanks.
You can play with the argument sort, which determines the order in which the variables are split, and dir, which sets the split direction (horizontal vs. vertical). For example, both of these split on Roles first and then show the conditional proportions of Countries given Roles (either horizontally or vertically):
tab <- structure(c(12, 14, 23, 12, 26, 13), .Dim = c(3L, 2L),
.Dimnames = structure(list(
Countries = c("American", "European", "Japanese"),
Roles = c("student", "staff")),
.Names = c("Countries", "Roles")), class = "table")
mosaicplot(tab, sort = 2:1, dir = c("h", "v"))
mosaicplot(tab, sort = 2:1, dir = c("v", "h"))
Note that the mosaic() function in package vcd also comes with a formula-based interface and more display options.
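To check the numbers behind the flipped plot, prop.table() gives the conditional proportions of Countries given Roles directly. This is a small sketch that rebuilds the same table with matrix() instead of structure().

```r
tab <- as.table(matrix(
  c(12, 14, 23, 12, 26, 13), nrow = 3,
  dimnames = list(Countries = c("American", "European", "Japanese"),
                  Roles = c("student", "staff"))
))

# Proportions of Countries within each Role: every column sums to 1.
cond <- prop.table(tab, margin = 2)
round(cond, 3)
```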
I have a zoo object that looks like this:
z <- structure(c(6, 11, 3.6, 8.4, 8.9, 0, NA, 0.5, 7, NA, 9, NA),
.Dim = c(6L, 2L), .Dimnames = list(NULL, c("2234", "2234.1")), index = structure(c(-17746, -17745, -17744, -17743, -17742, -17741), class = "Date"),
class = "zoo")
I tried to use lattice to plot both columns at the same time in 2 different panels:
xyplot(z)
This gives me the same x axis for both panels but different ylim. I want them to have the same ylim so I tried xyplot(z, ylim=range(z[,1])) it didn't do anything, so after reading "Plot zoo Series with Lattice" I tried trellis.focus("panel", 2,1,ylim=range(z[,1])) also without any luck...
This is probably an easy thing to do but I am finding the lattice package very hard to use (at least to start with). Can anyone help?
Thanks!
Try xyplot(z, ylim=range(z, na.rm=TRUE)).
There are two things:
na.rm=TRUE makes range work properly when the data contain NAs.
range(z) instead of range(z[,1]) covers the range of all the data, not just one column.
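A quick base R check of the first point, using the values from the question as a plain matrix (so zoo is not needed for this part):

```r
# Same numbers as the zoo object in the question, two columns of six values.
m <- matrix(c(6, 11, 3.6, 8.4, 8.9, 0, NA, 0.5, 7, NA, 9, NA), ncol = 2)

range(m)                # NA NA, because the second column contains NAs
range(m, na.rm = TRUE)  # 0 11, usable as ylim
```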
require(lattice)
require(zoo)
z <- zoo(cbind(a=1:4,b=11:14), Sys.Date()+(1:4)*10)
xyplot(z, ylim=range(z, na.rm=TRUE))
Note: R version 2.13.0, zoo_1.6-5, lattice_0.19-26
xyplot.zoo accepts most xyplot arguments so:
xyplot(z, scales = list(y = list(relation = "same")))
or this variation:
xyplot(z, scales = list(y = list(relation = "same", alternating = FALSE)))