Display column names that are only factors - r

Is there a way to extract only the names of the columns that are factors? For example, in the iris dataset the last column is a factor, so only Species (the column name, not the entire column) should be extracted.
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
> str(head(iris))
'data.frame': 6 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1

We can use:
names(iris)[sapply(iris, is.factor)]
#[1] "Species"
Or using Filter:
names(Filter(is.factor, iris))
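A type-stable variant of the sapply approach, offered as a sketch rather than a replacement: vapply declares that each result is a single logical value, so the index is always a logical vector, even for a data frame with no columns (where sapply returns an empty list). The name factor_cols is only illustrative.

```r
data(iris)

# vapply() checks that is.factor() returns exactly one logical per column
is_factor_col <- vapply(iris, is.factor, logical(1))

# Keep only the names of the columns flagged as factors
factor_cols <- names(iris)[is_factor_col]
print(factor_cols)
```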

Another solution, using the dplyr package (convenient if you are already using it in your project), is
names(iris %>% select_if(is.factor))
or equivalently (choose the one you like more)
iris %>% select_if(is.factor) %>% names()
Output
# [1] "Species"

Related

Looping Over Categories to Create Individual Regressions

In addendum to my previous question, I want to then create a for loop, each iteration of which would create a regression for each unique code I have created. More specifically, I want to create a regression with only the data that correspond to each unique code. How do I do this?
I have tried Googling "for loops in R" and have failed to find an answer that suits my need to iterate over categories rather than variables.
Here is an example predicting Sepal.Length from Sepal.Width, Petal.Length, and Petal.Width for each of three species using the iris data:
data(iris)
str(iris)
# 'data.frame': 150 obs. of 5 variables:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Now split and lapply:
iris.split <- split(iris[, -5], iris$Species)
iris.lm <- lapply(iris.split, \(x) lm(Sepal.Length~Sepal.Width+Petal.Length+Petal.Width, x))
lapply(iris.lm, summary)
The last line prints summary reports for the regressions for each species.
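Since the question asked for a for loop, here is a minimal sketch of the same per-species fits written as an explicit loop. The names fits and coef.table are illustrative and not part of the answer above; the model formula is the one used there.

```r
data(iris)

# One model per species, stored in a list named by species
fits <- list()
for (sp in levels(iris$Species)) {
  species_rows <- iris[iris$Species == sp, ]
  fits[[sp]] <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
                   data = species_rows)
}

# Collect the coefficients into one matrix: one row per species
coef.table <- t(sapply(fits, coef))
print(coef.table)
```

The loop and the split/lapply version fit the same models; the list form just makes it easy to pull out pieces such as coefficients afterwards.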

Conditional Evaluation in Dplyr

I have a character vector r <- c(). I want to mutate a data frame depending on the length of r.
This works
iris %>% if(length(r) > 0) mutate(Test = 1) else .
This does not work when I expand to add more dplyr verbs
iris %>% if(length(r) > 0) mutate(Test = 1) else . %>% mutate(Test2 = 1)
I am only looking for a dplyr-based solution.
As there are multiple statements, wrap them inside {}
r <- c()
iris %>%
{if(length(r) > 0) {
mutate(., Test = 1)
} else .}
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
...
Testing with length(r) > 0:
r <- 5
iris %>%
{if(length(r) > 0) {
mutate(., Test = 1)
} else .}
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Test
1 5.1 3.5 1.4 0.2 setosa 1
2 4.9 3.0 1.4 0.2 setosa 1
3 4.7 3.2 1.3 0.2 setosa 1
...
However, this can easily be done without the if/else: convert the logical condition to a numeric index by adding 1 (indexing in R starts at 1) and use it to pick from a list holding NULL and 1. If the length is 0, NULL is selected and no column is created.
iris %>%
mutate(Test = list(NULL, 1)[[1 + (length(r) > 0)]])
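The question also asked about adding more verbs after the conditional step. A minimal sketch, assuming dplyr is attached, of how the braced if/else keeps the pipeline going; the wrapper add_columns() is a hypothetical helper used only to try both cases.

```r
library(dplyr)

# Hypothetical wrapper so both cases can be run side by side
add_columns <- function(r) {
  iris %>%
    {
      # The braces make the whole if/else a single pipeline step
      if (length(r) > 0) mutate(., Test = 1) else .
    } %>%
    mutate(Test2 = 1)  # later verbs apply in either case
}

names(add_columns(c()))   # Test is skipped, Test2 is added
names(add_columns("a"))   # both Test and Test2 are added
```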
library(dplyr)
Using an intermediate function provides an alternative solution once it is substituted by an anonymous function
g_if <- function(df, r){
if(length(r)) {
ans <- df %>% mutate(test = 1)
} else {
ans <- df
}
invisible(ans)
}
r <- c()
iris %>% g_if(r) %>% str
#> 'data.frame': 150 obs. of 5 variables:
#> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
r <- c(1)
iris %>% g_if(r) %>% str
#> 'data.frame': 150 obs. of 6 variables:
#> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ test : num 1 1 1 1 1 1 1 1 1 1 ...
Now we can use the same idea with an anonymous function, that is, without explicitly defining the function g_if()
r <- c()
iris %>% {
function(df, cond){
if(length(cond) > 0) {
ans <- df %>% mutate(test = 1)
} else {
ans <- df
}
ans}}(r) %>%
head
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
r <- c(1)
iris %>% {
function(df, cond){
if(length(cond) > 0) {
ans <- df %>% mutate(test = 1)
} else {
ans <- df
}
ans}}(r) %>%
head
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species test
#> 1 5.1 3.5 1.4 0.2 setosa 1
#> 2 4.9 3.0 1.4 0.2 setosa 1
#> 3 4.7 3.2 1.3 0.2 setosa 1
#> 4 4.6 3.1 1.5 0.2 setosa 1
#> 5 5.0 3.6 1.4 0.2 setosa 1
#> 6 5.4 3.9 1.7 0.4 setosa 1
Created on 2021-06-17 by the reprex package (v0.3.0)
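We could use ifelse. Note that, as written, both branches return 1, so the Test column is added whether or not r is empty: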
library(dplyr)
r <- c()
iris %>%
mutate(Test = ifelse(length(r) > 0, 1,1))
Output:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Test
1 5.1 3.5 1.4 0.2 setosa 1
2 4.9 3.0 1.4 0.2 setosa 1
3 4.7 3.2 1.3 0.2 setosa 1
4 4.6 3.1 1.5 0.2 setosa 1
5 5.0 3.6 1.4 0.2 setosa 1
6 5.4 3.9 1.7 0.4 setosa 1
The code below adds the variable if the condition is met. If not, it adds a variable populated with all NA and then removes it (I understand you need the new variable only if the condition is met).
library(dplyr)
r <- c()
iris %>%
mutate(test2=if_else(length(r)>0, 2, NA_real_)) %>% # NA_real_ matches the type of 2
select(where(~ !(all(is.na(.))))) #remove columns with all NAs

Get row by certain value in R

I have this data table, and I'd like to group it by profession (column 'Profissao').
The idea is to average the answers in each column by area of work.
For example:
I need to select every 'Aspeto-A' cell in the rows belonging to the 'Media' job and compute the average over all Media people who answered the form.
data table screenshot
A picture of your data is not as useful as using dput(). Since I can't use your data, I'll use the iris data set that is included with R:
data(iris)
str(iris)
# 'data.frame': 150 obs. of 5 variables:
# $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
# $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
# $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
# $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
# $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
aggregate(.~Species, iris, mean)
# Species Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1 setosa 5.006 3.428 1.462 0.246
# 2 versicolor 5.936 2.770 4.260 1.326
# 3 virginica 6.588 2.974 5.552 2.026
It wasn't that hard after all. Probably there is a better way to do it, but this one is working out.
AspetoA_Selectiom<-data.frame(Profissao=Profissao,aspetoA=aspetoA)
ApetoA_sum <- aggregate(AspetoA_Selectiom$aspetoA, by=list(AspetoA_Selectiom$Profissao), FUN=sum)
AspetoA_length <- aggregate(AspetoA_Selectiom$aspetoA, by=list(AspetoA_Selectiom$Profissao), FUN=length)
AspetoA_AVG<- ApetoA_sum$x / AspetoA_length$x
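The sum and length steps can be folded into a single call by passing mean to aggregate. A minimal sketch on made-up data: the survey data frame, its values and the "Saude" category are assumptions for illustration, and only the column names Profissao and aspetoA and the 'Media' job come from the question.

```r
# Made-up answers; the real data was only posted as a screenshot
survey <- data.frame(
  Profissao = c("Media", "Media", "Saude", "Saude", "Saude"),
  aspetoA   = c(4, 2, 5, 3, 4)
)

# One row per profession, holding the mean of aspetoA
avg_by_job <- aggregate(aspetoA ~ Profissao, data = survey, FUN = mean)
print(avg_by_job)
```

Using `. ~ Profissao` instead, as in the iris answer above, would average every other column at once, provided they are all numeric.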

Error using Levene test in R after a group by [error: is not a numeric variable]

I'm trying to use leveneTest from the "car" library in R with the iris dataset.
The code I have is:
library(tidyverse)
library(car)
iris %>% group_by (Species) %>% leveneTest( Sepal.Length )
From there I'm getting the following error:
Error in leveneTest.default(., Sepal.Length) :
. is not a numeric variable
I don't know how to fix this, since the data types seem to be of the right type:
> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Your question is mainly about R syntax, which is not on topic on CrossValidated. That being said, you can either use the formula interface as in
leveneTest(Sepal.Length ~ Species, data=iris)
or state the data directly as in
leveneTest(y = iris$Sepal.Length, group = iris$Species)
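To see why the test needs one numeric response plus a grouping factor rather than a grouped data frame, here is a base-R sketch of the calculation behind a Levene-type test: a one-way ANOVA on absolute deviations from each group's median (the centre that car's leveneTest uses by default). The names iris_dev and levene_by_hand are illustrative, and this is a sketch of the idea, not car's implementation.

```r
data(iris)

# Absolute deviation of each observation from its own species' median
group_median <- ave(iris$Sepal.Length, iris$Species, FUN = median)
iris_dev <- data.frame(
  abs_dev = abs(iris$Sepal.Length - group_median),
  Species = iris$Species
)

# One-way ANOVA on the deviations; its F test is the Levene-type statistic
levene_by_hand <- anova(lm(abs_dev ~ Species, data = iris_dev))
print(levene_by_hand)
```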

Convert the structure of a column in a data frame to character if it exists, otherwise don't worry

I am creating a function that reads several comma-delimited files into R, mutates them to add a few columns, and then exports them again for use in another model. One particular field in some of the files contains numbers, but the model treats it as a string (with "" around it). R naturally reads this as a numeric field, so I want to convert it to character if the column name exists.
The code within the function I have tried is the following:
with(df, if("name of column" %in% colnames(df)) as.character)
Unfortunately this doesn't work. Thank you in advance!
P.s. I have code further along in my script that will add on the "" to all character fields, so obtaining those is not a problem, just the actual conversion to a character field.
We can write up a simple name checker function:
check_names<- function(df, col_name = "Species"){
if(col_name %in% names(df)){
df[[col_name]]<- as.character(df[[col_name]])
}
df
}
str(check_names(iris,"Species"))
EDIT
The modified function below will work for several columns:
check_names<- function(df, col_names = NULL){
if(all(col_names %in% names(df))){
df[,col_names]<- sapply(df[,col_names],as.character)
}
df
}
Result using the above:
str(check_names(iris,c("Species","Sepal.Length")))
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: chr "5.1" "4.9" "4.7" "4.6" ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : chr "setosa" "setosa" "setosa" "setosa" ...
Original result:
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : chr "setosa" "setosa" "setosa" "setosa" ...
# non existent names
str(check_names(iris,"nope"))
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
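The multi-column version above converts nothing unless every requested name is present. If the files differ in which columns they contain, a variant that converts whichever requested columns exist may fit better. A minimal sketch; the function name to_character_if_present is hypothetical.

```r
# Convert only those requested columns that are actually in df
to_character_if_present <- function(df, col_names) {
  present <- intersect(col_names, names(df))
  if (length(present) > 0) {
    # lapply() keeps one list element per column, so single-bracket
    # assignment works for one column or many
    df[present] <- lapply(df[present], as.character)
  }
  df
}

out <- to_character_if_present(iris, c("Species", "nope"))
str(out)
```

Here "nope" is silently ignored while Species is still converted; a name check with all() would leave both untouched.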
