Let's say I have the following loop:
test <- mtcars
target_var <- c("mpg", "wt")
group_var <- c("gear", "carb")
library(tidyverse)
for (i in target_var) {
  for (j in group_var) {
    print(prop.table(table(test[, i], test[, j]), 2))
  }
}
When I run it, I see four prop.tables reported, each with a heading of [[1]] through [[4]]. I can visually match up which prop.table goes with which variables by looking at the code and seeing the order of variables assigned to target_var and group_var.
But let's say I'm looping through a dozen variables or more. Obviously, it's a problem to match up one particular prop.table (listed as, say, "[[14]]") with the actual variables in that table.
Is there a way to print the command R used to generate each particular table?
I am not looking for a progress bar but, instead, something that would list the code above each table printed in the console. For example, the following code would be printed right above the appropriate prop.table:
prop.table(table(test$mpg, test$gear), 2)
# actual prop.table result from the first time through the loop here
prop.table(table(test$mpg, test$carb), 2)
# actual prop.table result from the second time through the loop here
This would help me when doing exploratory data analysis.
If I understand correctly, you want the results of your function to be printed "as you go" (i.e. at each iteration of the loop, not at the end)?
You can use cat instead of print (be sure to emit a newline, \n, after each call to cat to avoid a messy console):
for(i in 1:3) {
  results <- mean(rnorm(10))
  cat(results)
  cat("\n")
}
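The same cat idea can be used to label each table with the call that produced it, which is what the question asks for. This is only a sketch: label_call is a made-up helper name, and it rebuilds the text of the call from the variable names rather than capturing what R actually evaluated.

```r
test <- mtcars
target_var <- c("mpg", "wt")
group_var <- c("gear", "carb")

# Hypothetical helper: build the text of the call for one pair of variables
label_call <- function(i, j) {
  paste0("prop.table(table(test$", i, ", test$", j, "), 2)")
}

for (i in target_var) {
  for (j in group_var) {
    # Print the label first, then the table it describes
    cat(label_call(i, j), "\n", sep = "")
    print(prop.table(table(test[[i]], test[[j]]), 2))
    cat("\n")
  }
}
```

Each table then appears directly under a line such as prop.table(table(test$mpg, test$gear), 2), so there is no need to count tables to work out which variables they belong to.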
I made a function systemFail(x), where x is one of the columns in my data frame (df$location).
Now I want to create a new column in my data frame (df$outcome) with the result of this function (Pass/Fail result). I used the line of code below which made the extra column as I wanted. However, annoyingly the result (a long column of Passes and Fails) also shows up in my R Markdown document.
How can I get the extra column in my data frame without getting the result also in my R Markdown document?
df$outcome <- sapply(df$location, systemFail)
It is difficult to answer without knowing the details of the systemFail function. However, from your previous question it seems you are printing the values inside the function. Instead of printing, use return to return the result.
Look at this simple example -
systemFail <- function(x) {
  print(x)
}
res <- systemFail('out')
#[1] "out"
When we use return instead, nothing is printed and the result is available in res.
systemFail <- function(x) {
  return(x)
}
res <- systemFail('out')
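Applied to a data frame column, that could look like the sketch below. The body of systemFail and the contents of df are invented for illustration, since the real function and data are not shown in the question.

```r
# Hypothetical pass/fail rule; the real systemFail is not shown in the question
systemFail <- function(x) {
  if (x %in% c("A", "B")) {
    return("Pass")
  }
  return("Fail")
}

# Made-up example data with a location column
df <- data.frame(location = c("A", "C", "B"), stringsAsFactors = FALSE)

# The function returns its result instead of printing it, and the result
# is assigned, so nothing shows up in the console or the knitted document
df$outcome <- vapply(df$location, systemFail, character(1), USE.NAMES = FALSE)
```

vapply is used here instead of sapply only so that the result is guaranteed to be a character vector; sapply would behave the same way for this example.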
I am trying to concatenate strings using mapply function in R. However, I want one of the strings to be variable in mapply function. I have a snippet of my code below:
strings<-data.frame(x=c("dsf","sdf","sdf"))
strings2<-data.frame(extension=c(".csv",".json",".xml"))
for (i in 1:3)
{
  strings_concat<-mapply(function(string1,string2) paste0(string1,string2),strings$x,strings2$extension[i])%>%
    data.frame()%>%
    unlist()%>%
    data.frame()
  #dosomething with strings_concat
}
But this gives me only the last iteration:
strings_concat
dsf.xml
sdf.xml
sdf.xml
but instead, the desired output is as follows:
strings_concat
dsf.csv
sdf.csv
sdf.csv
dsf.json
sdf.json
sdf.json
dsf.xml
sdf.xml
sdf.xml
At every iteration, I want to combine strings_concat with another data frame and save it. Can anyone tell me if there is an easy way to do this in R?
Perhaps outer is a better option here:
strings_concat <- c(outer(strings$x, strings2$extension, paste0))
strings_concat
#[1] "dsf.csv"  "sdf.csv"  "sdf.csv"  "dsf.json" "sdf.json" "sdf.json"
#[7] "dsf.xml"  "sdf.xml"  "sdf.xml"
You can add it in a data.frame :
df <- data.frame(strings_concat)
If you want to add some additional steps at each iteration you can use lapply :
lapply(strings2$extension, function(x) {
  strings_concat <- paste0(strings$x, x)
  #do something with strings_concat
})
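As a rough sketch of what the per-iteration step might look like, the version below builds one small data frame per extension and stacks them at the end. The extra extension column and the names pieces and results are my own additions for illustration, not something from the question.

```r
strings <- data.frame(x = c("dsf", "sdf", "sdf"), stringsAsFactors = FALSE)
strings2 <- data.frame(extension = c(".csv", ".json", ".xml"),
                       stringsAsFactors = FALSE)

# One data frame per extension; each call returns its piece
pieces <- lapply(strings2$extension, function(ext) {
  data.frame(
    strings_concat = paste0(strings$x, ext),
    extension = ext,  # illustrative extra column
    stringsAsFactors = FALSE
  )
})

# Stack the pieces into a single data frame, in iteration order
results <- do.call(rbind, pieces)
```

Because every iteration returns its piece, nothing is overwritten, and results holds all nine rows in the order shown in the desired output.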
All you should need to do is make sure you are continually augmenting your dataset. So I think this should do the trick:
library(magrittr)  # provides the %>% pipe used below
strings<-data.frame(x=c("dsf","sdf","sdf"))
strings2<-data.frame(extension=c(".csv",".json",".xml"))
# We are going to keep adding things to results
results = NULL
for (i in 1:3)
{
  strings_concat<-mapply(function(string1,string2) paste0(string1,string2),strings$x,strings2$extension[i])%>%
    data.frame()%>%
    unlist()%>%
    data.frame()
  # Here is where we keep adding things to results
  results = rbind(results, strings_concat)
}
print(results)
Caution: I'm not in front of a computer with R, so this code is untested.
I have a list of data frames. I want to use lapply on a specific column of each of those data frames, but I keep getting errors when I try methods from similar answers.
The setup is something like this:
a <- list(*a series of data frames that each have a column named DIM*)
dim_loc <- lapply(1:length(a), function(x){paste0("a[[", x, "]]$DIM")})
Eventually, I'll want to write something like results <- lapply(dim_loc, *some function on the DIMs*)
However, when I try get(dim_loc[[1]]), say, I get an error: Error in get(dim_loc[[1]]) : object 'a[[1]]$DIM' not found
But I can return values from function(a[[1]]$DIM) all day long. It's there.
I've tried working around this by using as.name() in the dim_loc assignment, but that doesn't seem to do the trick either.
I'm curious 1. what's up with get(), and 2. if there's a better solution. I'm constraining myself to the apply family of functions because I want to try to get out of the for-loop habit, and this name-as-list method seems to be preferred based on something like R- how to dynamically name data frames?, but I'd be interested in other, more elegant solutions, too.
I'd say that if you want to modify an object in place you are better off using a for loop, since lapply would require the <<- assignment operator (<- doesn't work inside lapply). Like so:
set.seed(1)
aList <- list(cars = mtcars, iris = iris)
for(i in seq_along(aList)){
  aList[[i]][["newcol"]] <- runif(nrow(aList[[i]]))
}
As opposed to...
invisible(
  lapply(seq_along(aList), function(x){
    aList[[x]][["newcol"]] <<- runif(nrow(aList[[x]]))
  })
)
You have to use invisible(), otherwise lapply would print the output on the console. The <<- assigns the vector runif(...) to the newly created column.
If you want to produce another set of data.frames using lapply then you do:
lapply(seq_along(aList), function(x){
  aList[[x]][["newcol"]] <- runif(nrow(aList[[x]]))
  return(aList[[x]])
})
Also, may I suggest the use of seq_along(list) in lapply and for loops as opposed to 1:length(list) since it avoids unexpected behavior such as:
# zero-length list
seq_along(list()) # prints integer(0)
1:length(list()) # prints 1 0
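On the get() part of the question: get() looks up an object by its name, and "a[[1]]$DIM" is an expression rather than a name, so there is no object called that to find. Here is a minimal sketch that pulls the DIM columns out directly instead of building strings. The two small data frames and the use of sum are placeholders, since the real data and function are not shown.

```r
# Hypothetical stand-in for the list of data frames in the question
a <- list(
  data.frame(DIM = 1:3),
  data.frame(DIM = 4:6)
)

# get() and exists() only resolve plain object names,
# so this string does not name anything
found <- exists("a[[1]]$DIM")

# Extract the DIM column from every data frame in the list
dims <- lapply(a, `[[`, "DIM")

# Apply some function to each DIM column (sum is just a placeholder)
results <- lapply(dims, sum)
```

This stays within the apply family and avoids get() altogether, because the data frames are passed to the function directly.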
I have a question here about the print() inside a for loop.
I have a dataset (gpa) with 2 columns. I am trying to get mean, variance, and standard deviation of values inside the two columns. When I code,
for(x in c(1:2)) {
  mean(gpa[[x]])
  var(gpa[[x]])
  sd(gpa[[x]])
}
I don't get any output. But if I insert print before each of the lines:
for(x in c(1:2)) {
  print(mean(gpa[[x]]))
  print(var(gpa[[x]]))
  print(sd(gpa[[x]]))
}
I do get the desired values.
What is the difference here? Is print really necessary?
The reason is that everything inside the loop gets evaluated, but the values are never returned anywhere. At the top level R auto-prints the value of an expression; inside a for loop it does not. Although print does display the values, I wouldn't advise relying on it, because you can't use these values later: they are written to the console rather than stored in the global environment. Instead, you might want to assign them to something.
For example:
#example data
df <- data.frame(x = 1:10, y = rnorm(10))
#it is good practice to create the output at its final length first:
#this is more efficient in terms of speed and memory, although here it doesn't really matter
ret <- vector("list", NCOL(df))
for(x in seq_len(NCOL(df))){
  ret[[x]] <- c(mean(df[[x]]),
                var(df[[x]]),
                sd(df[[x]]))
}
ret
Though I wouldn't advise that either. Most (if not all) things you can do with a for loop you can also do with the apply family of functions from base R or the map function from purrr. That would look like this:
library(purrr)
map(df, ~c(mean(.), var(.), sd(.)))
#or even save it with names
ret <- map(df, ~c(mean = mean(.), var = var(.), sd = sd(.)))
ret
The apply/map variants are shorter and sometimes faster but, more importantly for me, easier to understand, with less room for error. There are a whole bunch of other arguments for using apply/map, too.
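For completeness, here is roughly what the base R version could look like with vapply. The data are made up (fixed values rather than rnorm, so the result is reproducible), and the name stats is just for illustration.

```r
# Made-up example data
df <- data.frame(x = 1:10, y = c(2, 4, 4, 4, 5, 5, 7, 9, 1, 3))

# One column of results per input column; FUN.VALUE declares that each
# call must return three numbers, and its names become the row names
stats <- vapply(
  df,
  function(col) c(mean = mean(col), var = var(col), sd = sd(col)),
  FUN.VALUE = c(mean = 0, var = 0, sd = 0)
)

stats
```

The result is a numeric matrix with one row per statistic and one column per variable, so a single value can be picked out with, for example, stats["mean", "x"].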
I'm using lapply to loop through a list of dataframes and apply the same set of functions. This works fine when lapply has just one function, but I'm struggling to see how I store/print the output from multiple functions - in that case, I seem to only get output from one 'loop'.
So this:
output <- lapply(dflis,function(lismember) vss(ISEQData,n=9,rotate="oblimin",diagonal=F,fm="ml"))
works, while the following doesn't:
output <- lapply(dflis,function(lismember){
  outputvss <- vss(lismember,n=9,rotate="oblimin",diagonal=F,fm="ml")
  nefa <- (EFA.Comp.Data(Data=lismember, F.Max=9, Graph=T))
})
I think this dummy example is an analogue, so in other words:
nbs <- list(1==1,2==2,3==3,4==4)
nbsout <- lapply(nbs,function(x) length(x))
Gives me something I can access, while I can't see how to store output using the below (e.g. the attempt to use nbsout[[x]][2]):
nbs <- list(1==1,2==2,3==3,4==4)
nbsout <- lapply(nbs,function(x){
  nbsout[[x]][1]<-typeof(x)
  nbsout[[x]][2]<-length(x)
})
I'm using RStudio and will then be printing outputs/knitting html (where it makes sense to display the results from each dataset together, rather than each function-output for each dataset sequentially).
You should return a structure that includes all your outputs. It is better to return a named list. You can also return a data.frame if your outputs all have the same dimensions.
output <- lapply(dflis,function(lismember){
  outputvss <- vss(lismember,n=9,rotate="oblimin",diagonal=F,fm="ml")
  nefa <- (EFA.Comp.Data(Data=lismember, F.Max=9, Graph=T))
  list(outputvss=outputvss,nefa=nefa)
  ## or data.frame(outputvss=outputvss,nefa=nefa)
})
When you return a data.frame from each call, lapply still gives you a list of data.frames. You can then use the classical:
do.call(rbind,output)
to aggregate the result into one big data.frame.
A function should always have an explicit return value, e.g.
output <- lapply(dflis,function(lismember){
  outputvss <- vss(lismember,n=9,rotate="oblimin",diagonal=F,fm="ml")
  nefa <- (EFA.Comp.Data(Data=lismember, F.Max=9, Graph=T))
  #return value:
  list(outputvss, nefa)
})
output is then a list of lists.
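The same idea applied to the dummy nbs example from the question might look like the sketch below. The element names type and len, and the combined data frame, are my own choices for illustration.

```r
nbs <- list(1 == 1, 2 == 2, 3 == 3, 4 == 4)

# Return both results from each call as a named list,
# instead of trying to assign into nbsout from inside the function
nbsout <- lapply(nbs, function(x) {
  list(type = typeof(x), len = length(x))
})

# Access one result for one list member
first_type <- nbsout[[1]]$type

# Optionally stack everything into one data frame, one row per list member
combined <- do.call(rbind, lapply(nbsout, as.data.frame))
```

Keeping the results for each dataset together in one named list also makes it straightforward to print them side by side when knitting.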