Avoiding for loop, Naming Example - r

I would like to avoid using a for loop in the following example. The goal is to repeat a string vector multiple times, with a different suffix appended on each repetition. Is that possible?
str2D = mtcars
Vector = c(10,20)
Dimen = dim( str2D )
nn = c()
for ( i in Dimen[2]*(1:length(Vector)) ){
nn[ (i+1-Dimen[2]): i ] = rep(paste("|d",Vector[i/Dimen[2]],sep=""), Dimen[2] )
}
Name = paste( rep(names(str2D) , length(Vector) ),nn,sep="")
The correct result for the "Name" vector is the following:
"mpg|d10" "cyl|d10" "disp|d10" "hp|d10" "drat|d10" "wt|d10" "qsec|d10" "vs|d10" "am|d10" "gear|d10" "carb|d10" "mpg|d20" "cyl|d20" "disp|d20" "hp|d20" "drat|d20" "wt|d20" "qsec|d20" "vs|d20" "am|d20" "gear|d20" "carb|d20"
Thank you

I don't quite understand the end goal here but at least this achieves your desired output without a loop:
Name <- paste0(paste(names(mtcars)), "|d", rep(1:2, each = length(names(mtcars))), "0")
> Name
[1] "mpg|d10" "cyl|d10" "disp|d10" "hp|d10" "drat|d10" "wt|d10" "qsec|d10"
[8] "vs|d10" "am|d10" "gear|d10" "carb|d10" "mpg|d20" "cyl|d20" "disp|d20"
[15] "hp|d20" "drat|d20" "wt|d20" "qsec|d20" "vs|d20" "am|d20" "gear|d20"
[22] "carb|d20"
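A slightly more general sketch, assuming the suffixes should come from the question's Vector instead of being hard-coded as 1:2 plus a trailing "0". It relies on paste0 recycling the shorter vector of column names against the repeated suffixes:

```r
str2D  <- mtcars
Vector <- c(10, 20)

# Repeat each suffix value once per column; paste0() recycles
# names(str2D) so that every name is paired with every suffix.
Name <- paste0(names(str2D), "|d", rep(Vector, each = ncol(str2D)))
```

Adding more values to Vector extends the result without any further changes.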

Related

Unable to do homog.test in a loop

I have a problem looping through columns with this command. When printing "i", the variable name appears, but it does not substitute it in the formula. The error suggested that I can't use a variable. Any suggestions?
for (i in colnames(NMDStrokeHx)[3:14]){
print(i)
print(homog.test(i ~ AM25, data = NMDStrokeHx, method = "Levene"))
}
output:
[1] "ANCOWATO"
Error in homog.test(i ~ AM25, data = NMDStrokeHx, method = "Levene") :
The name of response variable does not match the variable names in the data.
these are the column names of the data:
> colnames(NMDStrokeHx)[3:14]
[1] "ANCOWATO" "ANMSETOT" "ANAFTOT" "ANBNTTOT" "ANDELCOR" "ANWM2TOT" "ANFULVR1" "ANVRTCOR" "ANTMASEC"
[10] "ANTMBSEC" "ANSDMTOT" "ADCDRSTG"
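In i ~ AM25 the formula contains the literal symbol i, not the column name stored in it, so the function looks for a variable called "i" in the data. You can use reformulate/as.formula to create a formula object from the string instead.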
for (i in colnames(NMDStrokeHx)[3:14]){
print(i)
print(homog.test(reformulate('AM25', i), data = NMDStrokeHx,method = "Levene"))
}
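A minimal sketch of the same idea on built-in data, so it can be run without the original data frame. Here iris and bartlett.test() are stand-ins chosen for illustration (they are not part of the question); the point is only how a formula is built from a column name held in a string:

```r
# Column names held as character strings, as in the loop above
responses <- colnames(iris)[1:4]

# reformulate(termlabels, response) builds response ~ termlabels
formulas <- lapply(responses, function(i) reformulate("Species", response = i))

# Equivalent construction with as.formula()
f2 <- as.formula(paste(responses[1], "~ Species"))

# Each formula can then be passed to a function with a formula interface
results <- lapply(formulas, function(f) bartlett.test(f, data = iris))
names(results) <- responses
```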

How to coerce stslist.freq to dataframe

I am doing some descriptive sequence analysis using the "TraMineR" library. I want to report my findings via R Markdown in HTML format. For formatting tables I use "kable" and "kableExtra".
To get the frequencies and proportions of the most common sequences I use seqtab(). The result is an stslist.freq object. When I try to coerce it to a data frame, the data frame does not contain any frequencies or proportions.
I tried printing the result of seqtab() and storing the value returned by print. This gives me the data frame I want. However, there are two "problems" with that: (1) I don't understand what is happening here and it seems like a "dirty" trick; (2) the output of the print command also ends up in my final HTML document unless I split the code into multiple chunks and disable the output of that specific chunk.
Here is some code to replicate the problem:
library("TraMineR")
#Data creation
data.long <- data.frame(
id=rep(1:50, each=4),
time = c(0,1,2,3),
status = sample(letters[1:2], 200, replace = TRUE),
weight=rep(runif(50, 0, 1), each=4)
)
#reshape
data.wide <- reshape(data.long, v.names = "status", idvar="id", direction="wide", timevar="time")
#sequence
sequence <- seqdef(data.wide,
var=c("status.0", "status.1", "status.2", "status.3"),
weights=data.wide$weight)
#frequencies of sequences
##doesn't work:
seqtab.df1 <- as.data.frame(seqtab(sequence))
##works:
seqtab.df2 <- print(seqtab(sequence))
I expect the data frame to be the same as the one saved in seqtab.df2, but obtained either without the print command or with the print command running "silently" (no output printed).
Thank you very much for your help and let me know if I forgot something to make answering the question possible!
If you look at the class() of the object returned by seqtab, it has these classes:
class(seqtab(sequence))
# [1] "stslist.freq" "stslist" "data.frame"
So if we look at exactly what happens in the print method for such an object, we get a clue as to what is going on:
TraMineR:::print.stslist.freq
# function (x, digits = 2, width = 1, ...)
# {
# table <- attr(x, "freq")
# print(table, digits = digits, width = width, ...)
# }
# <bytecode: 0x0000000003e831f8>
# <environment: namespace:TraMineR>
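We see that what it's really printing is the "freq" attribute. Because print() returns its argument invisibly, assigning the result of print(seqtab(sequence)) captured that table, which is why the "trick" worked. You can extract the attribute directly and skip the print():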
attr(seqtab(sequence), "freq")
# Freq Percent
# a/3-b/1 4.283261 20.130845
# b/1-a/1-b/2 2.773341 13.034390
# a/2-b/1-a/1 2.141982 10.067073
# a/1-b/1-a/1-b/1 1.880359 8.837476
# a/1-b/2-a/1 1.723489 8.100203
# b/1-a/2-b/1 1.418302 6.665861
# b/2-a/1-b/1 1.365099 6.415813
# a/1-b/3 1.241644 5.835586
# a/1-b/1-a/2 1.164434 5.472710
# a/2-b/2 1.092656 5.135360
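The mechanism can be reproduced without TraMineR. The sketch below uses a hypothetical class "myfreq" and made-up numbers purely for illustration; only the shape of the print method mirrors print.stslist.freq shown above:

```r
# Hypothetical stand-in for an stslist.freq object: a data frame that
# carries its summary table in a "freq" attribute.
x <- structure(
  data.frame(s1 = c("a", "b"), s2 = c("b", "a")),
  freq = data.frame(Freq = c(3, 1), Percent = c(75, 25),
                    row.names = c("a/1-b/1", "b/1-a/1")),
  class = c("myfreq", "data.frame")
)

# Print method with the same structure as print.stslist.freq
print.myfreq <- function(x, ...) {
  table <- attr(x, "freq")
  print(table, ...)  # print() returns its argument invisibly
}

via_print <- print(x)         # prints the table and captures it
via_attr  <- attr(x, "freq")  # same table, nothing printed
same <- identical(via_print, via_attr)
```

Since attr() prints nothing, the extracted table can be passed straight to kable() in an R Markdown chunk without suppressing output.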

order strings according to some characters

I have a vector of strings, each of which contains a number, and I would like to sort this vector according to that number.
MWE:
> str = paste0('N', sample(c(1,2,5,10,11,20), 6, replace = FALSE), 'someotherstring')
> str
[1] "N11someotherstring" "N5someotherstring" "N2someotherstring" "N20someotherstring" "N10someotherstring" "N1someotherstring"
> sort(str)
[1] "N10someotherstring" "N11someotherstring" "N1someotherstring" "N20someotherstring" "N2someotherstring" "N5someotherstring"
while I'd like to have
[1] "N1someotherstring" "N2someotherstring" "N5someotherstring" "N10someotherstring" "N11someotherstring" "N20someotherstring"
I have thought of using something like:
num = sapply(strsplit(str, split = NULL), function(s) {
as.numeric(paste0(head(s, -15)[-1], collapse = ""))
})
str = str[sort(num, index.return=TRUE)$ix]
but I guess there might be something simpler.
There is an easy way to do this via the gtools package:
gtools::mixedsort(str)
#[1] "N1someotherstring" "N2someotherstring" "N5someotherstring" "N10someotherstring" "N11someotherstring" "N20someotherstring"
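If adding a package is not an option, a base R sketch is possible, assuming each string contains exactly one run of digits (as in the example). The input vector below is fixed rather than sampled so that the result is reproducible:

```r
str <- c("N11someotherstring", "N5someotherstring", "N2someotherstring",
         "N20someotherstring", "N10someotherstring", "N1someotherstring")

# Strip every non-digit character, convert to numeric, and order by that
num <- as.numeric(gsub("\\D", "", str))
sorted <- str[order(num)]
```

Strings with several separate numbers would need a more specific pattern, which is where mixedsort is more convenient.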

How to access data saved in an assign construct?

I made a list, loop over it in a for loop, do some calculations on each element, and export the modified data frames to [1] "IAEA_C2_NoStdConditionResiduals1" [2] "IAEA_C2_EAstdResiduals2" etc. When I call View(IAEA_C2_NoStdConditionResiduals1) after the for loop, I get the following error message in the console: Error in print(IAEA_C2_NoStdConditionResiduals1) : object 'IAEA_C2_NoStdConditionResiduals1' not found. But I know the object is there, because RStudio shows it in its Environment view. So the question is: how can I access the data saved by this assign construct for further use?
ResidualList = list(IAEA_C2_NoStdCondition = IAEA_C2_NoStdCondition,
IAEA_C2_EAstd = IAEA_C2_EAstd,
IAEA_C2_STstd = IAEA_C2_STstd,
IAEA_C2_Bothstd = IAEA_C2_Bothstd,
TIRI_I_NoStdCondition = TIRI_I_NoStdCondition,
TIRI_I_EAstd = TIRI_I_EAstd,
TIRI_I_STstd = TIRI_I_STstd,
TIRI_I_Bothstd = TIRI_I_Bothstd
)
C = 8
for(j in 1:C) {
#convert list Variable to string for later usage as Variable Name as unique identifier!!
SubNameString = names(ResidualList)[j]
SubNameString = paste0(SubNameString, "Residuals")
#print(SubNameString)
LoopVar = ResidualList[[j]]
LoopVar[ ,"F_corrected_normed"] = round(LoopVar[ ,"F_corrected_normed"] / mean(LoopVar[ ,"F_corrected_normed"]),
digit = 5
)
LoopVar[ ,"F_corrected_normed_error"] = round(LoopVar[ ,"F_corrected_normed_error"] / mean(LoopVar[ ,"F_corrected_normed_error"]),
digit = 5
)
assign(paste(SubNameString, j), LoopVar)
}
View(IAEA_C2_NoStdConditionResiduals1)
This is not really a problem with assign but with the behavior of the paste function, whose default separator is a space. This will build a variable name with a space in it:
assign(paste(SubNameString, j), LoopVar)
#simple example
> assign(paste("v", 1), "test")
> `v 1`
[1] "test"
... so you need to get its value by putting backticks around its name, so that the parser does not treat the space as a delimiter. See what happens when you type:
`IAEA_C2_NoStdConditionResiduals 1`
... and from here forward, use paste0 to avoid this problem.
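A small sketch of the difference, using made-up names and values (the "Residuals" prefix and the numbers are illustrative only). It also shows a named list as an alternative to assign, which keeps the results together and avoids constructing variable names at all:

```r
results <- list()
for (j in 1:2) {
  spaced <- paste("Residuals", j)   # "Residuals 1": contains a space
  tight  <- paste0("Residuals", j)  # "Residuals1": a syntactic name
  assign(tight, j * 10)             # creates Residuals1, Residuals2
  results[[tight]] <- j * 10        # alternative: store in a named list
}

last_spaced <- spaced
value_by_name <- get("Residuals1")      # look up by string
value_in_list <- results[["Residuals2"]]
```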

How to remove outliers from a list of vectors?

I have this list of vectors :
tdatm.sp=structure(list(X3CO = c(24.88993835, 25.02366257, 24.90308762
), X3CS = c(25.70629883, 25.26747704, 25.1953907), X3CD = c(26.95723343,
26.84725571, 26.2314415), X3CSD = c(36.95250702, 36.040905, 36.90475845
), X5CO = c(25.44123077, 24.97585869, 24.86075592), X5CS = c(25.71570396,
26.10244179, 25.39032555), X5CD = c(27.67508507, 27.18985558,
26.93682098), X5CSD = c(36.26528549, 34.88553238, 33.97910309
), X7CO = c(24.7142601, 24.08443642, 23.97057915), X7CS = c(24.55734444,
24.56562042, 24.7589817), X7CD = c(27.14260101, 26.65704346,
26.49533081), X7CSD = c(33.89881897, 32.91091919, 32.79199219
), X9CO = c(26.86141014, 26.42648888, 25.8350563), X9CS = c(28.17367744,
27.27400589, 26.58813667), X9CD = c(28.88915062, 28.32597542,
28.2713623), X9CSD = c(34.61352158, 35.84189987, 35.80329132)), .Names = c("X3CO",
"X3CS", "X3CD", "X3CSD", "X5CO", "X5CS", "X5CD", "X5CSD", "X7CO",
"X7CS", "X7CD", "X7CSD", "X9CO", "X9CS", "X9CD", "X9CSD"))
> head(tdatm.sp)
$X3CO
[1] 24.88994 25.02366 24.90309
$X3CS
[1] 25.70630 25.26748 25.19539
$X3CD
[1] 26.95723 26.84726 26.23144
$X3CSD
[1] 36.95251 36.04091 36.90476
$X5CO
[1] 25.44123 24.97586 24.86076
$X5CS
[1] 25.71570 26.10244 25.39033
I would like to remove outliers from each individual vector using the Hampel method.
One way I found to do it is :
repoutliers=function(x){ med=median(x); mad=mad(x); x[x>med+3*mad | x<med-3*mad]=NA; return(x)}
lapply(tdatm.sp, repoutliers)
But I was wondering whether it is possible to do this without declaring a new function, directly within lapply. lapply sends each individual vector to the function repoutliers; do you know how to operate on these individual vectors directly within lapply? Let's say I swap repoutliers for the function "replace": could I do the same work by referring to the individual vectors in the arguments of replace (lapply(X, FUN, ...); ... = replace arguments)?
In brief: how can I manipulate the individual vectors that lapply sends to the function, within lapply?
It's really more or less a tomato tomahtoe thing. Doing it all in lapply doesn't get you very far.
lapply( tdatm.sp, function(x){
med=median(x)
mad=mad(x)
x[x>med+3*mad | x<med-3*mad]=NA
return(x)} )
Now lapply is just sending everything to an anonymous function. But if you didn't want the function hanging around afterwards this is handy syntax.
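Since the question mentions replace, here is a hedged sketch of the same rule written with it. The replacement index depends on each vector, so it cannot be passed as a fixed extra argument through lapply's ...; it still has to be computed inside an anonymous function. The input list is a small made-up example, not the question's data:

```r
# Hypothetical input: one vector with an obvious outlier, one without
vals <- list(a = c(1, 2, 3, 4, 100), b = c(5, 5.1, 4.9))

# abs(x - median(x)) > 3 * mad(x) is the same condition as
# x > med + 3 * mad | x < med - 3 * mad in the code above
cleaned <- lapply(vals, function(x) {
  replace(x, abs(x - median(x)) > 3 * mad(x), NA)
})
```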
