CrossTable and Loop do not like each other - r

I am trying to generate a set of cross tables for 70 variables. But no matter what I did, R kept returning the function call itself instead of a table. I tried moving substitute after CrossTable, but R seemed to have trouble using list(i=as.name(x)).
library(gmodels)
Independent_List <- colnames(Comorbidity)[1:70]
Comorbidity_Table <- lapply(Independent_List, function(x) {
substitute(CrossTable(i ,
Comorbidity$sleep,
prop.c = TRUE,
prop.r = FALSE,
prop.t = FALSE,
prop.chisq = FALSE,
data =Comorbidity),
list(i=as.name(x)))
})
lapply(Comorbidity_Table, summary)
[[1]]
Length Class Mode
8 call call
[[2]]
Length Class Mode
8 call call
[[3]]
Length Class Mode
8 call call
The goal is to build tables with the cell counts and column percentages and merge them with my looped glm results.

I ended up using a much simpler method to solve this problem:
Tables <- lapply(Table_Data[, 1:11], function(x){table(x, Table_Data$TSD,exclude = NA)})
Prop_Tabs <- lapply(Tables[1:11], function(x){prop.table(x,2)})
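
For anyone hitting the same wall: substitute() only builds an unevaluated call, which is why summary() reported "call" for every element. Below is a minimal sketch of that behaviour and of the table/prop.table approach, using base R only. The data frame toy and its column names are made up for illustration and stand in for Comorbidity / Table_Data.

```r
# Hypothetical toy data standing in for Comorbidity / Table_Data
toy <- data.frame(
  diabetes = c("yes", "no", "no", "yes", "no", "yes"),
  asthma   = c("no", "no", "yes", "yes", "no", NA),
  sleep    = c("poor", "good", "good", "poor", "poor", "good"),
  stringsAsFactors = FALSE
)
predictors <- c("diabetes", "asthma")

# substitute() returns an unevaluated call; nothing has been tabulated yet
unevaluated <- substitute(table(i, sleep), list(i = as.name("diabetes")))
class(unevaluated)

# eval() actually runs the call, looking up the column names inside toy
evaluated <- eval(unevaluated, toy)

# Simpler: pass the columns themselves to lapply(), no substitute() needed
counts    <- lapply(toy[predictors], function(x) table(x, toy$sleep, exclude = NA))
col_props <- lapply(counts, prop.table, margin = 2)  # column percentages
```

Both counts and col_props are named lists, one element per predictor, so they can be matched by name against a list of glm results.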

Related

Using an R function to hash values produces a repeating value across rows

I'm using the following query:
let
Source = {1..5},
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), {"Numbers"}, null, ExtraValues.Error),
#"Added Custom" = Table.AddColumn(#"Converted to Table", "Letters", each Character.FromNumber([Numbers] + 64)),
#"Run R script" = R.Execute("# 'dataset' holds the input data for this script#(lf)#(lf)library(""digest"")#(lf)#(lf)dataset$SuffixedLetters <- paste(dataset$Letters, ""_suffix"")#(lf)dataset$HashedLetters <- digest(dataset$Letters, ""md5"", serialize = TRUE)#(lf)output<-dataset",[dataset=#"Added Custom"]),
output = #"Run R script"{[Name="output"]}[Value]
in
output
which leads to the resulting table:
And here is the R script with better formatting:
# 'dataset' holds the input data for this script
library("digest")
dataset$SuffixedLetters <- paste(dataset$Letters, "_suffix")
dataset$HashedLetters <- digest(dataset$Letters, "md5", serialize = TRUE)
output<-dataset
The 'paste' function appears to iterate over the rows and resolve on each row with the new input. But the 'digest' function appears to return only the hash of the first value and repeat it across all rows.
I don't know why the two functions behave differently. Can anyone advise how to get the 'HashedLetters' column to resolve using the value from each row instead of just the initial one?
Use:
dataset$HashedLetters <- sapply(dataset$Letters, digest, algo = "md5", serialize = TRUE)
digest works on a whole object at a time, not individual elements of a vector.
vec <- letters[1:3]
digest::digest(vec, algo="md5", serialize=TRUE)
# [1] "38ce1fe9e19a222505e693e8bdd8aeec"
sapply(vec, digest::digest, algo="md5", serialize=TRUE)
# a b c
# "127a2ec00989b9f7faf671ed470be7f8" "ddf100612805359cd81fdc5ce3b9fbba" "6e7a8c1c098e8817e3df3fd1b21149d1"
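
The repeated value in the column comes from R's recycling: a function that returns one value for the whole vector has that single value copied into every row. Here is a small base-R sketch of the same effect. whole_object() is a made-up stand-in for digest() (it just collapses its input into one string), so it runs without the digest package.

```r
dataset <- data.frame(Letters = c("A", "B", "C"), stringsAsFactors = FALSE)

# Hypothetical stand-in for digest(): one result for the entire input object
whole_object <- function(x) paste(x, collapse = "")

# Called on the whole column: a single value comes back and is recycled
dataset$Recycled <- whole_object(dataset$Letters)

# Called once per element: one value per row
dataset$PerRow <- vapply(dataset$Letters, whole_object, character(1),
                         USE.NAMES = FALSE)

dataset
```

vapply() is used here instead of sapply() only because it checks that every result is a single character string; either works for this purpose.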

Error in discretizeDF.supervised(formula, data, method = disc.method) :data needs to be a data.frame

I am using arulesCBA on a dataset of words with a class attribute, polarity, which is either positive or negative. First, I convert the words to numeric values using the as.numeric function. After that, I discretize the columns using this code:
trans.disc <- as.data.frame(lapply(df[2:75], function(x) discretize(x, categories=9)))
In this step, I get warnings that say: parameter categories is deprecated. Use breaks instead! Also, the default method is now frequency!
The next step is adding the polarity column:
trans.disc$polarity <- df$polarity
In the last step, I try to build the classifier:
classifier <- CBA(trans.disc, "polarity", supp = 0.05, conf=0.9)
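In this phase, there is an error message that says: Error in discretizeDF.supervised(formula, data, method = disc.method) : data needs to be a data.frame
It looks like you have the arguments for CBA mixed up: the first argument must be a formula (for example polarity ~ .) and the second the data frame, so trans.disc should be passed as data. The man page ?CBA says: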
CBA(
formula,
data,
pruning = "M1",
parameter = NULL,
control = NULL,
balanceSupport = FALSE,
disc.method = "mdlp",
verbose = FALSE,
...
)

Loop in R through variable names with values as endings and create new variables from the result

I have 24 variables called empl_1 to empl_24 (e.g. empl_2, empl_3, ...).
I would like to write a loop in R that takes the values 1-24 and puts them in the respective places, so that the corresponding variables are either called or created with i = 1-24. The sample below shows what I would like to have within the loop (e.g. ye1 - ye24, ipw_atet_1 - ipw_atet_24 and so on).
ye1_ipw <- empl$empl_1[insample==1]
ipw_atet_1 <- treatweight(y=ye1_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_1
ipw_atet_1$se
ye2_ipw <- empl$empl_2[insample==1]
ipw_atet_2 <- treatweight(y=ye2_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_2
ipw_atet_2$se
ye3_ipw <- empl$empl_3[insample==1]
ipw_atet_3 <- treatweight(y=ye3_ipw, d=treat_ipw, x=x1_ipw, ATET =TRUE, trim=0.05, boot = 2)
ipw_atet_3
ipw_atet_3$se
Coming from a Stata environment, I tried:
for (i in seq_anlong(empl_list)){
ye[i]_ipw <- empl$empl_[i][insample==1]
ipw_atet_[i]<-treatweight(y=ye[i]_ipw, d=treat_ipw, x=x1_ipw, ATET=TRUE, trim=0.05, boot =2
}
However this does not work at all. Do you have any idea how to approach this problem by writing a nice loop? Thank you so much for your help =)
You can try lapply:
result <- lapply(empl[paste0('empl_', 1:24)], function(x)
treatweight(y = x[insample==1], d = treat_ipw,
x = x1_ipw, ATET = TRUE, trim = 0.05, boot = 2))
result is a list that stores the output for all 24 variables in a single object, which is easier to manage and process than 24 separate variables.
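
To show how such a list is used afterwards, here is a minimal sketch that runs without the package providing treatweight. toy_estimator() is a hypothetical stand-in that returns a list with effect and se elements, and the data are simulated; only the lapply pattern and the way elements are pulled out of the list are the point.

```r
set.seed(1)
# Simulated stand-ins for the real objects
empl <- data.frame(empl_1 = rnorm(10), empl_2 = rnorm(10), empl_3 = rnorm(10))
insample <- rep(c(1, 0), 5)

# Hypothetical replacement for treatweight(): returns a list with $effect and $se
toy_estimator <- function(y) {
  list(effect = mean(y), se = sd(y) / sqrt(length(y)))
}

# One list element per empl_* column, named after the column
result <- lapply(empl[paste0("empl_", 1:3)], function(x) {
  toy_estimator(x[insample == 1])
})

# Equivalent of ipw_atet_2 and ipw_atet_2$se
result$empl_2
result$empl_2$se

# All standard errors at once, as a named numeric vector
se_all <- sapply(result, function(r) r$se)
```

Because the list keeps the column names, result[["empl_7"]] or result[[7]] replaces what would have been ipw_atet_7 in the hand-written version.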

Shiny : R: wilcox.test : change "value" -Output

I am working on a tiny shiny app that allows the user to upload some data and run some statistics on it. There are two things I would like to change in the output.
The data line of the printed test result is not informative. I would rather have the output read: data: Group1 and Group2
I would also like the output to include n, the number of subjects (here: 109, 114, 115).
This is the code that produces the current output:
statsPaired <- function(boolean_dt1, boolean_dt2){
for (i in 4: (length(boolean_dt1))){
print(wilcox.test( (unlist(boolean_dt1[,i, with = FALSE])), unlist(boolean_dt2[,i, with = FALSE]) , paired = T, exact= FALSE))
cat("\n")
}
}
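
One possible approach, sketched with made-up numbers: wilcox.test() returns an "htest" object, which is an ordinary list, and its data.name element is the text printed after "data:". That element can be overwritten before printing. The printed htest output does not include the sample size, so the sketch reports n on a separate line. group1 and group2 are hypothetical vectors, not the uploaded data from the app.

```r
# Hypothetical paired measurements for two groups
group1 <- c(5.1, 6.3, 4.8, 7.2, 5.9, 6.6, 5.4, 6.1)
group2 <- c(5.6, 6.0, 5.5, 7.9, 6.4, 6.2, 6.3, 6.8)

res <- wilcox.test(group1, group2, paired = TRUE, exact = FALSE)

# Overwrite the label that print() shows after "data:"
res$data.name <- "Group1 and Group2"
print(res)

# Report the number of complete pairs separately
n_pairs <- sum(complete.cases(group1, group2))
cat("n =", n_pairs, "\n")
```

Inside the loop in statsPaired, the same two steps would go between the call to wilcox.test and the print.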

Mann-Whitney-Wilcoxon test in R giving Error

I am trying to run a Mann-Whitney test across large data set. Here is an excerpt of my input:
GeneID GeneID-2 GeneName TSS-ID Locus-ID TAp73fTfTAAdEmp TAp73fTfTFAdEmp TAp73fTfTJAdEmp TAp73fTfTAAdCre TAp73fTfTFAdCre TAp73fTfTJAdCre
ENSMUSG00000028180 ENSMUSG00000028180 Zranb2 TSS1050,TSS17719,TSS52367,TSS53246,TSS72833,TSS73222 3:157534159-157548390 11.32013333 11.66344 11.87956667 13.01974667 14.70944667 10.94043867
ENSMUSG00000028184 ENSMUSG00000028184 Lphn2 TSS23298,TSS2403,TSS74519 3:148815585-148989316 15.0983 15.09572 14.03578667 17.00742667 17.90735333 14.69675333
ENSMUSG00000028187 ENSMUSG00000028187 Rpf1 TSS66485 3:146506347-146521423 12.34542667 14.11470667 10.493766 14.57954 11.93746667 11.07405867
ENSMUSG00000028189 ENSMUSG00000028189 Ctbs TSS36674,TSS72417 3:146450469-146465849 1.288003867 1.435658 1.959620667 1.427768 1.502116667 1.243928267
ENSMUSG00000020755 ENSMUSG00000020755 Sap30bp TSS14892,TSS218,TSS54781,TSS58430 11:115933281-115966725 31.91070667 31.68585333 26.86939333 39.05116667 30.62916667 27.22893333
ENSMUSG00000020752 ENSMUSG00000020752 Recql5 TSS26689,TSS42686,TSS60902,TSS75513,TSS9111 11:115892594-115933477 10.55415467 9.373216667 8.315984 7.255579333 7.022178 8.553787333
ENSMUSG00000020758 ENSMUSG00000020758 Itgb4 TSS23937,TSS28540,TSS29211,TSS34600,TSS36953,TSS4070,TSS6591,TSS68296 11:115974708-116008412 130.2124 117.3862 129.323 134.1108667 134.8743333 165.3330667
ENSMUSG00000069833 ENSMUSG00000069833 Ahnak TSS54612 19:8989283-9076919 116.3223333 135.2628 130.1286 147.045 142.8164 127.2352
ENSMUSG00000033863 ENSMUSG00000033863 Klf9 TSS87300 19:23141225-23166911 23.23418667 27.46006 26.56143333 21.09004667 18.47022 16.63767333
ENSMUSG00000069835 ENSMUSG00000069835 Sat2 TSS71535,TSS9615 11:69622023-69623870 0.975045133 0.886760067 1.593631333 1.469496 1.2373384 1.292182733
ENSMUSG00000028233 ENSMUSG00000028233 Tgs1 TSS24151,TSS28446,TSS50213,TSS68499,TSS79096 4:3574874-3616619 4.221024667 4.212087333 4.160574 5.113266667 6.917347333 5.22148
ENSMUSG00000028232 ENSMUSG00000028232 Tmem68 TSS12134,TSS25773,TSS25778,TSS49743,TSS7797 4:3549040-3574853 4.048868 3.906129333 6.024607333 4.613682 6.292972 4.287184
I wrote the same script for a t-test and it worked. However, replacing t.test with wilcox.test gives me this error:
Error in wilcox.test.default(x[i, 1:3], x[i, 4:6], var.equal = TRUE) :
'x' must be numeric
My code is:
library(preprocessCore)
err <-file("err.Rout", open="wt")
sink(err, type="message")
x <- read.table("Data.txt", row.names=1, header=TRUE, sep="\t", na.strings="NA")
x<-x[,5:ncol(x)]
p<-matrix(0,nrow(x),3)
for (i in 1:nrow(x)) {
  myTest <- try(wilcox.test(x[i,1:3], x[i,4:6], var.equal=TRUE))
  if (inherits(myTest, "try-error")) {
    p[i,2] = 1
  } else {
    p[i,2] = myTest$p.value
    num = rowMeans(x[i,1:3], na.rm = FALSE)
    den = rowMeans(x[i,4:6], na.rm = FALSE)
    ratio = num/den
    p[i,1] = ratio
  }
}
p[,3] = p.adjust(p[,2], method="none")
colnames(p) <- c("FoldChange", "p-value", "Adjusted-p")
write.table(p, file = "tmpPval-fold.txt", append = FALSE, quote = FALSE, sep = "\t", row.names = FALSE, col.names = TRUE)
sink()
I'd appreciate your help in this matter. As I said, it worked perfectly when I used t.test instead of wilcox.test.
There are (at least) two problems with your code at the moment, and one of them is the cause of that error. The object returned by x[i,1:3] is a data.frame, which is a list and fails the is.numeric test inside wilcox.test. Try coercing:
wilcox.test(as.numeric(x[1,(1:3)]), as.numeric(x[1,(4:6)]), var.equal=TRUE)
But what-the-F is var.equal doing in a call to a non-parametric test that makes no assumption of equal variance? (What is actually happening is that it gets ignored.) And how do you expect to get useful information from a test when you are only comparing 3 items with 3 items? That is never going to be "significant" or even particularly informative. I doubt that a t.test could be informative when it is 3 vs 3, but a non-parametric test that is based on the ordering of values is going to be even less likely to give a statistical signal of "significance".
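
Both points can be checked on a small made-up data frame (the values and column names below are hypothetical, not taken from the question). With 3 vs 3 values and no ties, the exact two-sided test cannot produce a p-value below 2/choose(6, 3) = 0.1, which is what the first row, with completely separated groups, returns.

```r
# Hypothetical expression values: 3 control and 3 treated columns
x <- data.frame(ctl1 = c(1.1, 5.0), ctl2 = c(1.2, 4.0), ctl3 = c(1.3, 6.0),
                trt1 = c(2.1, 5.5), trt2 = c(2.2, 4.5), trt3 = c(2.3, 6.5))

# A one-row slice of a data.frame is still a data.frame (a list), not numeric
row_is_numeric      <- is.numeric(x[1, 1:3])
unlisted_is_numeric <- is.numeric(unlist(x[1, 1:3]))

# Row-wise test after coercing each slice to a numeric vector
p_values <- vapply(seq_len(nrow(x)), function(i) {
  wilcox.test(unlist(x[i, 1:3]), unlist(x[i, 4:6]))$p.value
}, numeric(1))

# Smallest two-sided exact p-value possible with 3 vs 3 observations
min_possible <- 2 / choose(6, 3)
```

unlist() is used here in place of as.numeric(); for a one-row slice of numeric columns either coercion gives wilcox.test a plain numeric vector.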
