I want to drop elements from a list so that only the elements with a certain number of columns remain.
This is a dummy example of what I'm trying to do:
#1: define the list
tables = list(mtcars,iris)
for(k in 1:length(tables)) {
# 2: be sure that each element is shaped as dataframe and not matrix
tables[[k]] = as.data.frame(tables[[k]])
# 3: remove elements that have more or less than 5 columns
if(ncol(tables[[k]]) != 5) {
tables <- tables[-k]
}
}
Another option I tried:
#1: define the list
tables = list(mtcars,iris)
for(k in 1:length(tables)) {
# 2: be sure that each element is shaped as dataframe
tables[[k]] = as.data.frame(tables[[k]])
# 3: remove elements that have more or less than 5 columns
if(ncol(tables[[k]]) != 5) {
tables[[-k]] <- NULL
}
}
I'm getting
Error in tables[[k]] : subscript out of bounds.
Is there an alternative and correct approach?
We can use Filter
Filter(function(x) ncol(x)==5, tables)
Or with sapply to create a logical index and subset the list
tables[sapply(tables, ncol)==5]
Or, as @Sotos commented:
tables[lengths(tables)==5]
lengths returns the length of each list element; comparing that to 5 gives a logical vector, which is then used to subset the list. The length of a data.frame is the number of columns it has.
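As for why the loops in the question fail: `1:length(tables)` is evaluated once, before the loop starts, so once an element has been removed a later `k` can point past the end of the shortened list. Below is a minimal sketch of the index-based approach, assuming the same `tables` list as in the question (the names `keep` and `result` are made up for illustration). It also shows why `lengths()` is only a safe shortcut when every element really is a data.frame.

```r
tables <- list(mtcars, iris)

# Build the logical index first, then subset once; the list is never
# modified while it is being iterated over.
keep <- vapply(tables, function(x) ncol(as.data.frame(x)) == 5, logical(1))
result <- tables[keep]
length(result)  # 1 (only iris has 5 columns)

# Caveat: lengths() counts columns only for data.frames.
# For a matrix it counts every cell, so convert first if the list is mixed.
m <- matrix(0, nrow = 2, ncol = 5)
lengths(list(m))                 # 10
lengths(list(as.data.frame(m)))  # 5
```

Using `vapply` instead of `sapply` is a design choice here: it fails loudly if the predicate ever returns something other than a single logical value.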
For a tidyverse option you can use purrr::keep. You define a predicate function: if it returns TRUE the list element is kept, if FALSE it is removed. Here I've written the predicate with the formula shorthand.
library(purrr)
tables <- list(mtcars, iris)
result <- purrr::keep(tables, ~ ncol(.x) == 5)
str(result)
#> List of 1
#> $ :'data.frame': 150 obs. of 5 variables:
#> ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
I am having some trouble running a combination of eval, parse and as.character for a data.table. I basically want to convert a given column of the data table to as.character output of the same column.
library(data.table)
options(datatable.WhenJisSymbolThenCallingScope=TRUE)
# an option that I heard may solve the problem
iris2 <- data.table(iris)
VARS <- colnames(iris)
j <- 1
iris2[,eval(parse(text = paste0(VARS[j])))] # this works fine
iris2[,eval(parse(text = paste0(VARS[j]))) := as.character(eval(parse(text = paste0(VARS[j]))))]
#but this fails
From the looks of it, it appears the eval and parse functions work fine but when it comes to updating the column with := it seems to break. Could someone tell me what the issue is?
We can use data.table's own methods to transform the variables. Specify 'VARS', or a subset such as 'VARS[j]', in .SDcols, apply the function to each column of .SD with lapply (which also covers the multi-column case), and assign (:=) the result to the columns named in 'VARS[j]':
iris2[, VARS[j] := lapply(.SD, as.character) , .SDcols = VARS[j]]
str(iris2)
#Classes ‘data.table’ and 'data.frame': 150 obs. of 5 variables:
#$ Sepal.Length: chr "5.1" "4.9" "4.7" "4.6" ...
#$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1
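For a single column, two other forms avoid eval/parse entirely. This is only a sketch under the assumption that the column name is held in a character variable (`col` and `col2` are hypothetical names); the point is that the left-hand side of `:=` wants column names, not the column's values.

```r
library(data.table)

iris2 <- as.data.table(iris)

# Parentheses tell data.table to treat `col` as a variable holding the name.
col <- "Sepal.Length"
iris2[, (col) := as.character(get(col))]

# set() does the same update by reference without going through `[`.
col2 <- "Sepal.Width"
set(iris2, j = col2, value = as.character(iris2[[col2]]))

sapply(iris2, class)
```

`set()` tends to be the more convenient form inside a loop over many column names.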
I'm new to R and have some trouble understanding how to handle local and global environments. I checked the post on local and global variables, but couldn't figure it out.
If, for example, I would like to make several plots using a function and save them like this:
PlottingFunction <- function(type) {
type <<- mydata %>%
filter(typeVariable==type) %>%
qplot(a,b)
}
lapply(ListOfTypes, PlottingFunction)
This didn't yield the desired result. I tried using the assign() function, but couldn't get it to work either.
I want to save the graphs in the global environment so I can combine them using gridExtra. This might not be the best way to do that, but I think it might be useful to understand this issue nevertheless.
You don't need to assign the plot to a global variable. All plots can be saved in one list.
For this example, I use the iris data set.
library(gridExtra)
library(ggplot2)
library(dplyr)
str(iris)
# 'data.frame': 150 obs. of 5 variables:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
The modified function without assignment:
PlottingFunction <- function(type) {
iris %>%
filter(Species == type) %>%
qplot(Sepal.Length, Sepal.Width, data = .)
}
One figure per Species is created
species <- unique(iris$Species)
# [1] setosa versicolor virginica
# Levels: setosa versicolor virginica
l <- lapply(species, PlottingFunction)
Now, the function do.call can be used to call grid.arrange with the plot objects in the list l.
do.call(grid.arrange, l)
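If the individual plots also need to be reachable by name, the list can be named rather than creating one global variable per plot. A rough sketch, assuming ggplot2 is installed; `plot_species` and `plots` are illustrative names, and `ggplot()` is used here in place of `qplot()`.

```r
library(ggplot2)

# Return the plot from the function instead of assigning it with <<-.
plot_species <- function(type) {
  ggplot(iris[iris$Species == type, ], aes(Sepal.Length, Sepal.Width)) +
    geom_point() +
    ggtitle(type)
}

species <- levels(iris$Species)
plots <- setNames(lapply(species, plot_species), species)

names(plots)            # "setosa" "versicolor" "virginica"
p <- plots[["setosa"]]  # a single plot, looked up by name
```

The named list can still be passed to `do.call()` in the same way as the unnamed one.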
I open my csv file and check the class of each column of my data:
mydataP<-read.csv("Energy_protein2.csv", stringsAsFactors=F)
apply(mydataP, 2, function(i) class(i))
#[1] "numeric"
I add a column and check the class of the data:
mydataP[ ,"ID"] <-rep(c("KOH1", "KOH2", "KOH3", "KON1", "KON2", "KON3", "WTH1", "WTH2", "WTH3","WTN1", "WTN2", "WTN3"), each=2)
apply(mydataP, 2, function(i) class(i))
Here it changes to a "character"
as.numeric(as.factor(mydataP))
#Error in sort.list(y) : 'x' must be atomic for 'sort.list'
#Have you called 'sort' on a list?
as.numeric(as.character(mydataP))
I get a vector with 117 NA
I have no idea what to do now. As soon as I touch the frame it changes to character. Can somebody help me? Thanks
That happens because apply converts your data.frame to a matrix, and a matrix can hold only one type of data. Once a character column is added, every column is coerced to character.
Try this instead:
sapply(mydataP, class)
This is the reason you should normally try to avoid using apply on data.frames.
This behavior is documented in the help file (?apply):
If X is not an array but an object of a class with a non-null dim
value (such as a data frame), apply attempts to coerce it to an array
via as.matrix if it is two-dimensional (e.g., a data frame) or via
as.array.
Here's a reproducible example with the built-in iris dataset:
> apply(iris, 2, function(i) class(i))
#Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# "character" "character" "character" "character" "character"
> sapply(iris, class)
#Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# "numeric" "numeric" "numeric" "numeric" "factor"
> str(iris)
#'data.frame': 150 obs. of 5 variables:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
As you can see, apply converts all columns to the same class.
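The same effect can be reproduced on a tiny data.frame that mimics the situation in the question. This is a sketch with made-up data (`df`, `x`, `y` are hypothetical; only the `ID` values are borrowed from the question).

```r
df <- data.frame(x = c(1.5, 2.5), y = c(3L, 4L))

# All columns are numeric, so as.matrix() gives a numeric matrix.
before <- apply(df, 2, class)

# Adding a character column forces as.matrix() to produce a character matrix.
df$ID <- c("KOH1", "KOH2")
after_apply  <- apply(df, 2, class)   # every entry is "character"
after_sapply <- sapply(df, class)     # "numeric" "integer" "character"

# The original columns were never changed; only apply()'s view of them was.
is.numeric(df$x)  # TRUE
```

So there is nothing to convert back: the numeric columns are still numeric, and `as.numeric()` is only needed on individual columns that are genuinely stored as text.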
I'm sitting with a large dataset and want to get some basic information about my variables, first of all whether they are numeric or factor/ordinal.
I'm working on a function and want to check, one variable at a time, whether it is numeric or a factor.
To make the for loop work I'm using dataset[i] to get to the variable I want.
object<-function(dataset){
n=ncol(dataset)
for(i in 1:n){
variable_name<-names(dataset[i])
factor<-is.factor(dataset[i])
ordered<-is.ordered(dataset[i])
numeric<-is.numeric(dataset[i])
print(list(variable_name,factor,ordered,numeric))
}
}
My problem is that is.numeric() does not seem to work with dataset[i]: all the results are FALSE. It only works with dataset$.
Do you have any idea how to solve this?
Try str(dataset) to get summary information on an object. To solve your problem, though, you need to completely extract the column with double square brackets. Single square bracket subsetting keeps the output as a sub-list (or data.frame) rather than extracting the contents:
str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
is.numeric(iris[1])
[1] FALSE
class(iris[1])
[1] "data.frame"
is.numeric(iris[[1]])
[1] TRUE
Assuming that dataset is something like a data.frame, you can do the following (and avoid the loop):
names = colnames(dataset) # the column names of the data.frame
types = sapply(dataset, class)
Then types gives you the class of each column, e.g. numeric or factor. You can then simply do something like this:
is_factor = types == 'factor'
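Putting the two answers together, the function from the question could be rewritten roughly as follows. This is a sketch, not the asker's code: `describe_columns` is a made-up name, and it returns one summary data.frame instead of printing inside a loop.

```r
# Each vapply() call hands the predicate the extracted column (as [[ would),
# not a one-column data.frame (as [ would).
describe_columns <- function(dataset) {
  data.frame(
    variable   = names(dataset),
    is_factor  = vapply(dataset, is.factor,  logical(1)),
    is_ordered = vapply(dataset, is.ordered, logical(1)),
    is_numeric = vapply(dataset, is.numeric, logical(1)),
    row.names  = NULL
  )
}

info <- describe_columns(iris)
info
```

Testing with `is.factor()` and friends rather than comparing `class()` output also copes with ordered factors, whose class has two elements.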
I want to exclude the column labeled "fldname" from a data frame frm in R. If we know the index i of the column, we can use frm[-i] to exclude the i-th column. Is there a simple way to do the same by specifying the column label, or a list of labels, that I want to exclude?
I worked out a solution (corrected by Fhnuzoag):
frm[names (frm)[names (frm) != c("fldname1","fldname2")]]
frm[names (frm)[!names (frm) %in% c("fldname1","fldname2")]]
The idea is to get the list of wanted names and use them as an index. Above, "fldname1" and "fldname2" are the unwanted fields.
Is there a simple solution built into the language syntax?
Yes, use a combination of negation ! and %in%. For example, using iris:
x <- iris[, !names(iris) %in% c("Sepal.Width", "Sepal.Length")]
str(x)
'data.frame': 150 obs. of 3 variables:
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
I think not. I usually do frm[, setdiff(names(frm), excludelist)].
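The two suggestions can be compared side by side on iris. A small sketch, with `exclude`, `by_in` and `by_setdiff` as illustrative names; `drop = FALSE` is added so the result stays a data.frame even if only one column is left. The last lines show why comparing with `!=` against several names does not work.

```r
exclude <- c("Sepal.Width", "Sepal.Length")

by_in      <- iris[, !names(iris) %in% exclude, drop = FALSE]
by_setdiff <- iris[, setdiff(names(iris), exclude), drop = FALSE]
names(by_in)  # "Petal.Length" "Petal.Width" "Species"

# != compares element by element and recycles `exclude` (with a warning),
# so here no name ever lines up with its own entry.
wrong <- suppressWarnings(names(iris) != exclude)
all(wrong)    # TRUE: no column would be dropped
```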