Dealing with ties in the agricolae Kruskal test in R

I am running a Kruskal-Wallis test on some non-normal data with the kruskal function from the agricolae package. For some variables, every group has exactly the same value. The function does not handle this well and fails with the error Error in if (s) { : missing value where TRUE/FALSE needed. At first I thought this was because all the values were 0, but when I set them all to the same large number as a test, the same error appears. Because I call the function in a loop, the loop stops there and nothing after the first tied variable is evaluated.
Obviously there is no point running statistics on these variables, as there will be no difference between groups, but I am using the information generated by agricolae::kruskal to produce a summary table and I need these variables included. I would prefer to keep using this package as it gives me a lot of valuable information. Is there anything I can do to make it run through the tied variables?
dput(example)
structure(list(TREATMENT = c("A", "A", "A", "B", "B", "C", "C",
"C", "D", "D"), W = c(0, 1.6941524646937, 1.524431531984, 0.959282869723864,
1.45273122733115, 0, 1.57479386520925, 0.421759202661462, 1.34235435984449,
1.52131484305823), X = c(0, 0.663872820198758, 0.202935807030853,
0.836223346381214, 0.750767193777965, 1.18128574225979, 2.03622986392828,
3.56466682539425, 0.919751117364462, 0.917347336682722), Y = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Z = c(2.1477548118197, 2.0111754022729,
3.14642815196242, 4.46967452127494, 1.53715421615569, 2.36274861406182,
2.33262528044302, 2.50970456594739, 2.96088598025103, 2.22841740590261
)), class = "data.frame", row.names = c(NA, 10L), .Names = c("TREATMENT",
"W", "X", "Y", "Z"))
library(agricolae)
example<-as.data.frame(example)
for(i in 2:(ncol(example))){
  # refer to the grouping column explicitly so the loop does not depend on attach()
  krusk <- kruskal(example[,i], example$TREATMENT, group=TRUE)
  print(krusk)
}

for(i in 2:(ncol(example))){
  # only run the test when the column is not constant (variance above zero)
  if(var(example[,i]) > 0){
    krusk <- kruskal(example[,i], example$TREATMENT, group=TRUE)
    print(krusk)
  }
}
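The loop above skips constant columns, so they never reach the summary table. A minimal sketch of one way to keep them is below: it records a placeholder row with NA results whenever a variable has no variation. To stay runnable without extra packages it calls base R's kruskal.test as a stand-in for agricolae::kruskal, and the helper name summarise_kruskal, the note column, and the shortened data values are assumptions made for illustration.

```r
# Shortened stand-in for the question's data (values rounded for readability).
example <- data.frame(
  TREATMENT = c("A", "A", "A", "B", "B", "C", "C", "C", "D", "D"),
  W = c(0, 1.69, 1.52, 0.96, 1.45, 0, 1.57, 0.42, 1.34, 1.52),
  Y = rep(0, 10)
)

# Hypothetical helper: one summary row per variable, NA when the test is undefined.
summarise_kruskal <- function(values, groups) {
  if (length(unique(values)) < 2) {
    # Constant column: no test is run, but the variable still gets a row.
    return(data.frame(statistic = NA_real_, p.value = NA_real_,
                      note = "all values identical"))
  }
  # Base R test used here as a stand-in for agricolae::kruskal.
  res <- kruskal.test(values, factor(groups))
  data.frame(statistic = unname(res$statistic), p.value = res$p.value, note = "")
}

vars <- setdiff(names(example), "TREATMENT")
rows <- lapply(vars, function(v) summarise_kruskal(example[[v]], example$TREATMENT))
summary_table <- cbind(variable = vars, do.call(rbind, rows))
print(summary_table)
```

The same guard could wrap the agricolae call instead, filling in whichever fields of its result the summary table needs.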

Bayesian Network Meta-Analysis (gemtc) - Specifying the order of comparisons

I'm working on a Bayesian Network Meta-Analysis using the gemtc package on a dataset similar to the following:
df <- data.frame(study = c("A", "A", "B", "B", "C", "C", "D", "D", "E", "E", "F", "F",
"G", "G", "H", "H", "I", "I", "J", "J", "K", "K"),
treatment = c("A", "B", "B", "C", "B", "C", "A", "B", "B",
"C", "B", "C", "A", "B", "B", "C", "B", "C",
"A", "C", "B", "C"),
responders = c(1, 5, 0, 0, 3, 1, 0, 2, 0, 2, 0, 2, 0,
0, 1, 2, 0, 0, 2, 9, 1, 1),
sampleSize = c(32, 33, 30, 30, 18, 20, 15, 15, 20,
20, 30, 30, 36, 32, 15, 15, 23, 22, 24, 23, 18, 16))
While I have been able to set up the network model and run the analysis just fine, I have been struggling with specifying the order in which I would like the treatments to be compared in the node-splitting consistency analysis. For example, I want the odds ratios and 95% credible intervals to be calculated using the "B" treatment as the reference group when comparing "B" with "A" and "C" as the reference group when comparing "A" with "C" and "B" with "C". Below is the code I have tried:
library(gemtc)
library(rjags)
# Create mtc.network element to be used in modeling ------
network <- mtc.network(data.ab = df)
# Compile model ------
network.mod <- mtc.model(network,
linearModel = "random", # random effects model
n.chain = 4) # 4 Markov chains
# Assess network consistency using nodesplit method ------
nodesplit <- mtc.nodesplit(network.mod,
linearModel = "random", # random effects model
n.adapt = 5000, # burn-in iterations
n.iter = 100000, # actual simulation iterations
thin = 10) # extract values of every 10th iteration
summary(nodesplit) # High p-values indicate consistent results
plot(summary(nodesplit))
My results provide ORs (95% CrIs) for:
"A" vs. "C"
"B" vs. "C"
"B" vs. "A"
I have created a separate data frame specifying that I want "A" vs. "B" comparisons via:
# Specify desired comparisons ------
comparisons = data.frame(t1 = "A", t2 = "B")
# Assess network consistency using nodesplit method, adding comparisons argument ------
nodesplit <- mtc.nodesplit(network.mod,
comparisons = comparisons,
linearModel = "random", # random effects model
n.adapt = 5000, # burn-in iterations
n.iter = 100000, # actual simulation iterations
thin = 10) # extract values of every 10th iteration
summary(nodesplit) # High p-values indicate consistent results
But I still get "B" vs. "A" results. I have also tried specifying t1 = "B", t2 = "A", and I get the same results. Any assistance with this would be greatly appreciated. Thanks in advance.
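Independently of how gemtc orders the comparison, a reported effect can be reversed by hand, because the log odds ratio of "A" vs. "B" is the negative of the log odds ratio of "B" vs. "A". The sketch below shows that arithmetic in base R only. It is not a gemtc feature, the numbers are made up for illustration, and the helper name reverse_comparison is hypothetical.

```r
# Illustrative (made-up) posterior summary of the log odds ratio, "B" vs. "A".
log_or_b_vs_a <- c(lower = -0.40, median = 0.35, upper = 1.10)

# Hypothetical helper: flip the direction of a comparison on the log scale.
# Negating the values reverses their order, so lower and upper bounds swap.
reverse_comparison <- function(log_or) {
  c(lower  = -log_or[["upper"]],
    median = -log_or[["median"]],
    upper  = -log_or[["lower"]])
}

log_or_a_vs_b <- reverse_comparison(log_or_b_vs_a)

# Back-transform to the odds ratio scale.
or_a_vs_b <- exp(log_or_a_vs_b)
print(round(or_a_vs_b, 2))
```

If the values at hand are already odds ratios rather than log odds ratios, the equivalent step is to take reciprocals and swap the interval bounds.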

overlay the histograms by using the fill parameter

I would like to create a graphic showing how often each type of event is responsible for reducing each species.
In total I have 9 species and 8 events. I would like the events to appear as differently filled groups of bars (fill), with the species on the x-axis, as in the picture below.
I wrote the following script, but I get this error message:
Error: StatBin requires a continuous x variable: the x variable is discrete. Perhaps you want stat="count"?
Would anyone have any suggestions on how to correct the script?
Thank you very much in advance.
library(ggplot2)
event <- factor(Dataset, levels = c("A", "B", "C", "D", "E", "F", "G", "H"))
ggplot(Dataset) +
geom_histogram(aes(x=specie, fill=event),
colour="grey50", alpha=0.5, position="identity")
Data:
Dataset <- structure(list(specie = structure(1:9, .Label = c("Hipp_amph",
"Hipp_eq", "Phil_mont", "Pota_larv", "Red_aru", "Sylv_grim",
"Sync_caf", "Trag_oryx", "Trag_scri"), class = "factor"), A = c(2.97029703,
0, 13.86138614, 12.87128713, 0, 17.82178218, 2.97029703, 0, 0.99009901
), B = c(0, 7.920792079, 55.44554455, 51.48514851, 33.66336634,
27.72277228, 33.66336634, 15.84158416, 62.37623762), C = c(0,
5.940594059, 0.99009901, 8.910891089, 2.97029703, 0, 10.89108911,
4.95049505, 21.78217822), D = c(0, 0, 0, 0.99009901, 0, 0, 0,
0, 0), E = c(16.83168317, 28.71287129, 74.25742574, 100, 40.59405941,
32.67326733, 89.10891089, 27.72277228, 86.13861386), F = c(6.930693069,
0, 10.89108911, 42.57425743, 0, 0, 7.920792079, 0, 2.97029703
), G = c(0, 0, 0, 0.99009901, 0, 0, 0, 0, 0), H = c(0, 4.95049505,
1.98019802, 1.98019802, 15.84158416, 0, 19.8019802, 0, 1.98019802
)), .Names = c("specie", "A", "B", "C", "D", "E", "F", "G", "H"
), class = "data.frame", row.names = c(NA, -9L))
The problem is that you are mapping a factor/character variable to the x axis, while geom_histogram needs a continuous (numeric) x.
You could try the code below with your data frame and facet by specie (a trellis plot); the alternative is to give up filling the bars by event (A, B, etc.) and map specie to fill instead.
In either case, the data first has to be gathered into long format so that event and value can be passed to aes.
library(tidyverse)
Dataset <- Dataset %>% gather(event, value, 2:9)
ggplot(Dataset) +
geom_histogram(aes(x=value, fill=event), colour="grey50", alpha=0.5, position="identity") +
facet_wrap(~ specie)
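The faceted histograms above show the distribution of values, which is not quite the grouped bar chart described in the question (species on the x axis, one bar per event). A minimal sketch of that alternative follows, assuming the values are bar heights to be drawn as-is. It uses a three-species, three-event subset of the question's data with rounded values, reshapes with base R, and draws with geom_col(position = "dodge") only if ggplot2 is installed.

```r
# Subset of the question's data (three species, three events, values rounded).
Dataset <- data.frame(
  specie = factor(c("Hipp_amph", "Hipp_eq", "Phil_mont")),
  A = c(2.97, 0, 13.86),
  B = c(0, 7.92, 55.45),
  E = c(16.83, 28.71, 74.26)
)

# Reshape to long format with base R: one row per specie/event pair.
events <- c("A", "B", "E")
long <- data.frame(
  specie = rep(Dataset$specie, times = length(events)),
  event  = rep(events, each = nrow(Dataset)),
  value  = unlist(Dataset[events], use.names = FALSE)
)

# Grouped bars: the heights come from the data, so use geom_col, not geom_histogram.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = specie, y = value, fill = event)) +
    geom_col(position = "dodge", colour = "grey50")
  print(p)
}
```

With the full data set, events would be all eight column names, A to H.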

rgl segments3d to connect 3d scatter points in order to plot a skeleton

I am working with motion capture data, and wish to plot two skeletons in 3D (motion capture data obtained from two different systems).
I have managed to plot and label the joints, but I can't figure out how to connect the joints with lines.
A short explanation of the abbreviations used in the sample dataset below:
RA and LA (Right and Left Ankle)
RK and LK (Right and Left Knee)
RH and LH (Right and Left Hip)
CG (Center of Gravity)
Simplified data set:
df <- data.frame(
Joint = c("LA", "RA", "LK", "RK", "LH", "RH", "CG", "LA", "RA", "LK", "RK", "LH", "RH", "CG"),
system = c("A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B"),
x = c(0, 10, 0, 10, 0, 10, 5, 0, 10, 0, 10, 0, 10, 5),
y = c(0,0,0,0,0,0,0, 20,20,20,20,20,20,20),
z = c(0, 0, 20, 20, 40, 40, 50, 0, 0, 20, 20, 40, 40, 50))
My code so far to plot and label the joints from the two systems:
library(rgl)
with(df, plot3d(x, y, z, type="s", col = as.numeric(factor(system))))
with(df, text3d(x, y, z, text = Joint, adj = 2))
Can you help me connect the joints?
Use the segments3d function to draw line segments. It takes the usual x, y, z coordinates, and joins pairs of points. So you'll need to work out which joints are joined, and plot segments between those joints.
If the joints are always in the order you gave, it would go something like this:
segs <- c(1, 3, 2, 4, 3, 5, 4, 6, 5, 7, 6, 7)
segments3d(df[segs, 3:5])
(This just does the system A segments.)
Edited to add: In response to the first comment: You will need to tell R that ankles connect to knees, etc, but you can do that. For example:
segs <- c()
for (s in unique(df$system)) {
  # left ankle to left knee
  seg <- with(df, c(which(system == s & Joint == "LA"),
                    which(system == s & Joint == "LK")))
  if (length(seg) == 2)
    segs <- c(segs, seg)
  # left knee to center of gravity
  seg <- with(df, c(which(system == s & Joint == "LK"),
                    which(system == s & Joint == "CG")))
  if (length(seg) == 2)
    segs <- c(segs, seg)
  # etc for the other side
}
segments3d(df[segs, 3:5])
This could all be compressed if you have the connections arranged in an R object somehow. I'll leave that to you to work out.
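One way to arrange the connections in an R object is a two-column matrix of joint names, one row per bone. The sketch below is a possible compression of the loop above. The bone list is an assumed skeleton topology (ankle to knee, knee to hip, hip to center of gravity) and should be adjusted to the real marker set. The helper name segment_rows is hypothetical.

```r
df <- data.frame(
  Joint  = rep(c("LA", "RA", "LK", "RK", "LH", "RH", "CG"), times = 2),
  system = rep(c("A", "B"), each = 7),
  x = rep(c(0, 10, 0, 10, 0, 10, 5), times = 2),
  y = rep(c(0, 20), each = 7),
  z = rep(c(0, 0, 20, 20, 40, 40, 50), times = 2)
)

# Assumed topology: each row names the two joints at the ends of one bone.
bones <- rbind(c("LA", "LK"), c("LK", "LH"), c("LH", "CG"),
               c("RA", "RK"), c("RK", "RH"), c("RH", "CG"))

# Hypothetical helper: row indices of df, ordered as start/end pairs per bone.
segment_rows <- function(data, bones) {
  idx <- integer(0)
  for (s in unique(data$system)) {
    rows <- which(data$system == s)
    from <- rows[match(bones[, 1], data$Joint[rows])]
    to   <- rows[match(bones[, 2], data$Joint[rows])]
    ok   <- !is.na(from) & !is.na(to)  # skip bones with a missing joint
    # Interleave start and end so each consecutive pair is one segment.
    idx <- c(idx, as.vector(rbind(from[ok], to[ok])))
  }
  idx
}

segs <- segment_rows(df, bones)
print(matrix(df$Joint[segs], ncol = 2, byrow = TRUE))

# With rgl loaded and the scatter plot open, the skeleton lines would be drawn by:
# segments3d(df[segs, c("x", "y", "z")])
```

Keeping the topology in one matrix means a new bone is one extra row rather than another block of which() calls.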

Chart with built in groupby and secondary Y %s in r

Thanks for this wonderful community and expert responses. This is my first question on stackoverflow. I did research but couldn't find what I am trying to do.
How can I write efficient R code that creates a chart with a secondary Y axis and also groups the data to get total counts for a given variable? I want the grouping to happen within the code rather than having to create a separate data frame for every variable that I want to plot on X.
I have thousands of rows and hundreds of columns in an R data frame. My sample data (20 x 5) looks like this:
tv = c(0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0)
pr1 =c("AA", "AB", "ZH", "AA", "ZA", "AB", "ZA", "ZA", "AA", "AA", "ZA", "AA", "ZG", "AA", "ZF", "AB", "AA", "AB", "AA", "AA")
pr2 =c("B", "F", "F", "J", "E", "E", "J", "B", "J", "F", "B", "B", "J", "B", "F", "J", "B", "F", "B", "E")
pr3 =c(13, 13, 25, 13, 13, 13, 13, 1, 13, 13, 13, 13, 25, 13, 25, 1, 13, 13, 13, 13)
sample_data = data.frame("SN"= c(1:20),"Target_Vbl"=tv,Predictor_1=pr1,Predictor_2=pr2,Predictor_3=pr3)
From this sample data, I can create the chart I am looking for in Excel, but I am lost when it comes to plotting it in R. I want to re-use the code for any other predictor variable, but my Y axes will always remain the same: the primary Y is the total count of Target_Vbl and the secondary Y is the % of ones for each category of the predictor variable plotted on the X axis.
The chart should look like the one below, currently plotted for Predictor_1 (drawn in Excel).
Edit - After trying the plotrix
Continuing with sample_data, I created summary data to use with the plotrix package. (Thanks lawyeR.) twoord.plot takes me closer to what I am looking for, but there are a few discrepancies:
1. I am not getting bars for tc (the total count per Predictor_1 level) on the left Y axis. I tried specifying "bar" in the "type" option but it did not work.
2. The X axis labels don't show the values from the data but default to numbers. They should show "AA", "AB", "ZA" etc., not 1, 2, 3...
3. Is there a way to make the overall process more concise? I feel my code is crude at best. Any pointers would be helpful.
library(sqldf)
smry = sqldf("Select Predictor_1, count(Target_Vbl) as tc, sum(Target_Vbl)
as conv from sample_data Group by Predictor_1")
smry$ratio = round((smry$conv/smry$tc),2)
library(plotrix)
twoord.plot(smry$Predictor_1, smry$tc,
smry$Predictor_1, smry$ratio,
type= c("l", "l"), lcol=3,rcol=4,do.first="plot_bg(\"gray\")")
The graph now looks like this: (image: output of twoord.plot)
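A base R sketch of the grouping and the two-axis chart described above is given here as one possible approach, not as a plotrix solution. The function names summarise_predictor and plot_predictor are hypothetical. The percentage line is rescaled onto the count axis, so bars and line share one plot region and the right-hand axis is only relabelled in percent.

```r
tv  <- c(0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0)
pr1 <- c("AA", "AB", "ZH", "AA", "ZA", "AB", "ZA", "ZA", "AA", "AA",
         "ZA", "AA", "ZG", "AA", "ZF", "AB", "AA", "AB", "AA", "AA")
sample_data <- data.frame(Target_Vbl = tv, Predictor_1 = pr1)

# Hypothetical helper: count and share of ones per level of any predictor column.
summarise_predictor <- function(data, predictor, target = "Target_Vbl") {
  tc   <- tapply(data[[target]], data[[predictor]], length)
  conv <- tapply(data[[target]], data[[predictor]], sum)
  data.frame(level = names(tc), tc = as.vector(tc), conv = as.vector(conv),
             ratio = round(as.vector(conv / tc), 2))
}

# Hypothetical helper: bars for counts (left axis), line for % of ones (right axis).
plot_predictor <- function(smry) {
  op <- par(mar = c(5, 4, 4, 4) + 0.1)  # leave room for the right-hand axis
  on.exit(par(op))
  top  <- max(smry$tc)
  mids <- barplot(smry$tc, names.arg = smry$level, ylab = "Total count",
                  ylim = c(0, top * 1.1))
  # Rescale the 0..1 ratio to the count axis, then label the right axis in percent.
  lines(mids, smry$ratio * top, type = "b", col = 4)
  axis(4, at = seq(0, 1, 0.25) * top, labels = paste0(seq(0, 100, 25), "%"))
  mtext("% of ones", side = 4, line = 2.5)
}

smry <- summarise_predictor(sample_data, "Predictor_1")
print(smry)
plot_predictor(smry)
```

Re-using it for another column only needs a different predictor name, for example summarise_predictor(sample_data, "Predictor_2") on the full sample data.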

Conditional displaying values in R

I'd like to see which values have a particular entry issue, but I can't get it to work.
For instance, I need to print the values from column "c" for rows where "b" has a given value, say b == 0.
Finally, I need to add a new string to those for which the condition is true.
df<- structure(list(a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7,
11.43, 11.41, 10.48512, 11.19), b = c(2, 3, 2, 0, 0, 0, 1, 2,
4, 0), c = c("q", "c", "v", "f", "", "e", "e", "v", "a", "c")), .Names = c("a",
"b", "c"), row.names = c(NA, -10L), class = "data.frame")
I tried this without success:
if(df[b]==0){
print(df$c)
}
if((df[b]==0)&(df[c]=="v")){
df[c] <-paste("2")
}
Thanks for helping.
The correct syntax is like df[rows, columns], so you could try:
df[df$b==0, "c"]
You can accomplish changing values using ifelse:
df$c <- ifelse(df$b==0 & df$c=="v", paste(df$c, 2, sep=""), df$c)
Does this help?
rows <- which(df$b==0)
if (length(rows)>0) {
print(df$c[rows])
df$c[rows] <- paste(df$c[rows],'2')
## maybe you wanted to have:
# df$c[rows] <- '2'
}
There are several ways to subset data in R, for example:
df$c[df$b == 0]
df[df$b == 0, "c"]
subset(df, b == 0, c)
with(df, c[b == 0])
# ...
To conditionally add another column (here: TRUE/FALSE):
df$e <- FALSE; df$e[df$b == 0] <- TRUE
df <- transform(df, e = ifelse(b == 0, TRUE, FALSE))
df <- within(df, e <- ifelse(b == 0, TRUE, FALSE))
# ...
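The attempt in the question fails because if() expects a single TRUE or FALSE, while df$b == 0 yields one logical value per row. A short sketch of the vectorised pattern on the question's data follows. The column name c_flagged and the choice to append "2" are assumptions made for illustration.

```r
df <- data.frame(
  a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7, 11.43, 11.41, 10.48512, 11.19),
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  c = c("q", "c", "v", "f", "", "e", "e", "v", "a", "c"),
  stringsAsFactors = FALSE
)

# One logical value per row; this vector cannot be used directly inside if().
is_zero <- df$b == 0

# Print the values of c for the rows where b is 0.
print(df$c[is_zero])

# Append "2" only where the condition holds, keeping the other values unchanged.
# The result goes into a new (hypothetical) column so the original c is preserved.
df$c_flagged <- ifelse(is_zero, paste0(df$c, "2"), df$c)
print(df[is_zero, c("b", "c", "c_flagged")])
```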
