I have a dataframe with >100 columns, all of which are INTs.
I have selected a subset of columns that I would like to convert to factors so that I can run an ANOVA, say:
my_variables_list = headers[grep('independent', headers)]
Now I would like to loop over all these variables and factorise:
for (i in my_variables_list) {
df$i = as.factor(df$i)
}
However this doesn't work - no error message is returned, but also no changes are made to the df. Similarly, if I try to run a single line of this it also fails.
df$my_variables_list[10] <- as.factor(df$my_variables_list[10])
The $ operator does not evaluate its argument, so df$i refers to a column literally named i. Use the [] operator to subset your data frame within the for loop instead:
for (i in my_variables_list) {
df[,i] = as.factor(df[,i])
}
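A minimal sketch of the same loop on a made-up data frame, written with `[[ ]]`, which always returns a single column. The data frame `toy` and its column names are assumptions for illustration only, not part of the original question.

```r
# Hypothetical data frame standing in for the real one
toy <- data.frame(independent_a = c(1L, 2L, 1L),
                  independent_b = c(3L, 3L, 4L),
                  outcome       = c(10L, 20L, 30L))

# Names of the columns to convert
cols <- grep("independent", names(toy), value = TRUE)

for (i in cols) {
  # toy[[i]] uses the name stored in i; toy$i would look for a column called "i"
  toy[[i]] <- as.factor(toy[[i]])
}

str(toy)
```

Only the matched columns are converted; `outcome` stays an integer column.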
An example on iris that avoids the loop. We first look for the pattern Petal or Sepal in the column names of iris, then convert those columns to factors with lapply:
my_variables_list = grep('Petal|Sepal', colnames(iris))
iris[, my_variables_list] <- lapply(iris[, my_variables_list], as.factor)
or on your data.frame:
df[,my_variables_list] <- lapply(df[, my_variables_list], as.factor)
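One caveat worth sketching: with the `df[, cols]` form, selecting a single column returns a plain vector, so lapply would then iterate over its elements. List-style indexing, `df[cols]`, always returns a data frame. The example below assumes a copy of iris named `dat` (a name introduced here for illustration) and a pattern that matches exactly one column.

```r
dat <- iris  # work on a copy so iris itself is untouched

# This pattern matches exactly one column
one_col <- grep("Sepal.Length", colnames(dat), fixed = TRUE)

# dat[one_col] is still a data frame, so lapply sees one column, not 150 values
dat[one_col] <- lapply(dat[one_col], as.factor)

sapply(dat, is.factor)
```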
Related
I have around 192 CSVs that I have converted to dataframes. I would like to be able to put the names of each dataframe in a vector and then run a for loop through the vector like so:
for (i in seq_along(vector)){
vector[i] <- f1(vector[i])
}
or just pass through the vector into the function like so: f1(vector).
If the vector is full of integers or strings, I can put the vector through a function and it will work fine. For example:
squared <- function(x) {
return(x*x)
}
This will work with a vector c(1,2,3,4,5) and return c(1,4,9,16,25). Otherwise, I have to make 124 lines of code for each function I want to do.
Your advice would be greatly appreciated, please.
I think the most Rtistic way of doing it would be to have all your dataframes in a list to start with. For instance,
df1 <- mtcars
df2 <- mtcars
df3 <- mtcars
frames <- grep('^df[0-9]+$', ls(), value = TRUE)
frame_list <- lapply(frames, get)
gets you there. Now you can apply whatever function you want to each dataframe in a lapply call. So, for instance, if you wanted to get all the squared values of mpg, you could write
frame_adj <- lapply(frame_list, function(x) x$mpg * x$mpg )
The above gives you all the squared values of mpg from the original dataframes, but does not keep the other columns. If you prefer to keep the other values, simply adjust your function to return the dataframe, e.g.
frames_with_squared_mpg <-
lapply(frame_list, function(x) {
x$mpg_sq <- x$mpg * x$mpg
return(x)})
will get you there.
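As a variation, mget() collects several objects into a named list in one call, so each result can later be looked up by the original data frame name. This is only a sketch: the helper `add_mpg_sq` is a hypothetical name, and the three copies of mtcars stand in for the real data frames.

```r
# Stand-ins for the data frames read from CSV
df1 <- mtcars
df2 <- mtcars
df3 <- mtcars

# mget() returns a list named after the objects it fetched
frame_list <- mget(c("df1", "df2", "df3"))

# Hypothetical helper: add a squared-mpg column and return the whole data frame
add_mpg_sq <- function(x) {
  x$mpg_sq <- x$mpg^2
  x
}

frames_sq <- lapply(frame_list, add_mpg_sq)

names(frames_sq)
head(frames_sq$df2$mpg_sq)
```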
Consider this code
lapply(lst,rowSums)
lst is a list of five data frames. Each data frame has, for example, four columns and ten rows. I want to add the values of the columns in each row, however, I do not want to include the value of column one in the sum.
I can use a for loop and the code below:
lst_sum = list()
for (ii in seq_along(lst))
{
dummy <- lst[[ii]]
dummy <- rowSums(dummy[,seq(2,4,by = 1)])
lst_sum[[ii]] = dummy
}
I would like to use lapply or a similar function because I think the for loop looks ugly and inefficient.
With the help of comments I got this solution:
lapply(lst,function(x) rowSums(x[,seq(2,4,by = 1)]))
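If the goal is simply "every column except the first", negative indexing avoids hard-coding the column range. A small sketch, assuming a made-up `lst` of five identical 10 x 4 data frames:

```r
# Hypothetical input: five data frames with ten rows and four columns each
lst <- replicate(5, as.data.frame(matrix(1:40, nrow = 10)), simplify = FALSE)

# x[, -1] drops the first column; drop = FALSE keeps a data frame
# even if only one column remains
sums <- lapply(lst, function(x) rowSums(x[, -1, drop = FALSE]))

sums[[1]]
```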
I would like to create a vector from each column of the mtcars data frame. I need two solutions: one that uses a loop and, if possible, one that does not.
The desired output should look like this:
vec_1 <- mtcars[,1]
vec_2 <- mtcars[,2]
etc...
I tried to create a loop but I failed. Can you tell me what is wrong with this loop?
vec <- c()
for (i in 1:2){
assign(paste("vec",i,sep="_" <- mtcars[,i][!is.na(mtcars[,i])]
}
I need to remove possible NAs from my data that's why I put it in the example.
Your loop is missing a few brackets, and you should assign each vector in the global environment of your R session, like so:
for (i in 1:2) {
assign(sprintf("vec_%d", i), mtcars[!is.na(mtcars[[i]]), i], envir = .GlobalEnv)
}
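The same result is also possible without an explicit for loop, for example by passing a named list of the columns to list2env(), although keeping the columns together in a list is usually more convenient than creating many separate variables.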
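A sketch of a loop-free variant using base R's list2env(). The intermediate name `cols` is introduced here for illustration; the vec_1, vec_2, ... naming follows the question.

```r
# A data frame is a list of columns, so lapply visits each column once;
# NAs are dropped column by column
cols <- lapply(mtcars, function(x) x[!is.na(x)])

# Name the list elements vec_1, vec_2, ...
names(cols) <- paste0("vec_", seq_along(cols))

# Create one variable per list element in the global environment
invisible(list2env(cols, envir = .GlobalEnv))

head(vec_1)
```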
I have a dataframe in which I occasionally have -1s. I want to replace them with NA. I tried the apply function, but it returns a matrix of characters to me, which is no good:
apply(d,c(1,2), function(x){
if (x == -1){
return (NA)
}else{
return (x)
}
})
I am wrestling with by but I cannot seem to handle it properly. I have got this so far:
s <-by(d,d[,'Q1_I1'], function(x){
for(i in x)
print(i)
})
If I understood correctly, by() passes my data frame into x row by row, and I can iterate through each element of the row with the for loop. I just don't know how to replace the value.
The reason that apply does not work is that it converts a data frame to a matrix and if your data frame has any factors then this will be a character matrix.
You can use lapply instead which will process the data frame one column at a time. This code works:
mydf <- data.frame( x=c(1:10, -1), y=c(-1, 10:1), g=sample(letters,11) )
mydf
mydf[] <- lapply(mydf, function(x) { x[x==-1] <- NA; x})
mydf
As @rawr mentions in the comments, this also works:
mydf[ mydf== -1 ] <- NA
but the documentation (?'[.data.frame') says that this is not recommended because of the conversions involved.
One big question is how the data frame is being created. If you are reading the data using read.table or related functions then you can just specify the na.strings argument and have the conversion done for you as the data is read in.
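A minimal sketch of the na.strings approach. It reads from an inline string instead of a file, and the contents of `csv_text` are made up for illustration.

```r
# Hypothetical file contents, supplied inline via the text argument
csv_text <- "x,y\n1,-1\n-1,5\n3,4"

# Treat both "NA" and "-1" as missing while reading
d <- read.csv(text = csv_text, na.strings = c("NA", "-1"))

d
```

The columns stay numeric because the missing values are handled before the column types are determined.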
You can do this quickly and transparently with the data.table package.
library(data.table)
# convert the standard dataset to a data.table, keeping the row names as a column
mtcars = data.table(mtcars, keep.rownames = TRUE)
# set gear to NA in the rows where gear is 5
mtcars[gear == 5, gear := NA]
mtcars
I would like to build a function that adds many columns of random variables, or the output of some other function, to a data frame. Here I am trying to append them to map data.
library(plyr)
add <- function(name, df){
new.df = mutate(df, name = runif(length(df[,1])))
new.df
}
The function works to add a column of data...
add("e", iris)
iris2<- add("f", iris)
The apply does not work...
I am trying to add 26 columns from the list of letters so that df$a, df$b, df$c are all random vectors.
new <- lapply(letters, add, df = tx)
What is the most efficient way to add columns from a list of column names?
I would like to later loop through all of the column names in another function.
It's not very clear to me what you want to achieve. This adds multiple columns of random numbers to a data.frame:
cbind(iris,
matrix(runif(nrow(iris)*5), ncol=5))
I don't see a reason to use an *apply function.
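If the new columns need the names in `letters`, as the question describes, one possible sketch is to assign a list of vectors to several column names at once. The name `iris_plus` is an assumption introduced here, and iris stands in for the map data.

```r
set.seed(42)  # only to make the random columns reproducible

iris_plus <- iris

# One uniform random vector per letter, assigned to columns a, b, ..., z
iris_plus[letters] <- replicate(length(letters),
                                runif(nrow(iris_plus)),
                                simplify = FALSE)

# The new columns can later be visited by name, e.g. to get each column mean
col_means <- sapply(letters, function(nm) mean(iris_plus[[nm]]))

names(iris_plus)
```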