This question already has answers here:
How to deal with nonstandard column names (white space, punctuation, starts with numbers)
(3 answers)
Closed 2 years ago.
I am trying to convert the built-in matrix state.x77 to a data frame. I expected as.data.frame to rename the column "Life Exp" to "Life.Exp", but when I use select() to pick that column, neither Life.Exp nor Life Exp is found. Am I converting it wrong?
library(dplyr)
library(tidyr)
state.x77 %>% as.data.frame %>% select(Frost,Life.Exp) %>% cor
To reinforce the other answers: as.data.frame converts the state.x77 matrix to a data frame and keeps the original column names. The name Life Exp contains a space, so it is not a syntactic R name; to select that column you must wrap the name in backticks (``). Therefore:
select(Frost, `Life Exp`)
as.data.frame does not change the column names, so the space in "Life Exp" remains. You can select the column by wrapping its name in backticks.
library(dplyr)
state.x77 %>% as.data.frame %>% select(Frost,`Life Exp`) %>% cor
However, data.frame defaults to check.names = TRUE, which replaces the space with a ".", so you can use
state.x77 %>% data.frame %>% select(Frost,Life.Exp) %>% cor
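The difference between the two constructors can be checked directly on the column names. The following is a minimal sketch using only base R; the variable names `kept`, `fixed` and `unchecked` are made up for illustration.

```r
# Column names of the built-in matrix (includes "Life Exp" and "HS Grad")
colnames(state.x77)

# as.data.frame() keeps the matrix column names exactly as they are
kept <- names(as.data.frame(state.x77))

# data.frame() defaults to check.names = TRUE, which passes the names
# through make.names() and turns the space into a dot
fixed <- names(data.frame(state.x77))

# Switching the check off keeps the original names
unchecked <- names(data.frame(state.x77, check.names = FALSE))

make.names("Life Exp")  # "Life.Exp"
```

So either spelling can work, as long as it matches the constructor that was used.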
Try this:
library(dplyr)
library(tidyr)
state.x77 %>% as.data.frame() %>% select(Frost,`Life Exp`) %>% cor()
            Frost Life Exp
Frost    1.000000 0.262068
Life Exp 0.262068 1.000000
This question already has answers here:
How to select non-numeric columns using dplyr::select_if
(3 answers)
Closed 1 year ago.
In my project, I want to extract all the non-numeric columns from my R data frame. Following the linked question, I used the same method and simply negated the is.numeric() function, but it is not working.
This gives all the numeric data:
x<-iris %>% dplyr::select(where(is.numeric))
But this does not work as expected,
x<-iris %>% dplyr::select(where(!is.numeric))
Note: Finally the output data frame should only contain the species column in the iris dataset
The purrr package from the tidyverse does exactly what you want with purrr::keep and purrr::discard.
library(purrr)
x <- iris %>% keep(is.numeric)
With this code, you pass a logical test to keep, and only the columns that pass the test stay.
To reverse that operation and get what you want, use discard, also from purrr:
x <- iris %>% discard(is.numeric)
You can think of discard as keep with !is.numeric.
Alternatively, with dplyr:
x <- iris %>% select_if(~!is.numeric(.))
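The original attempt fails because `!is.numeric` tries to negate the function object itself rather than its result. As a hedged sketch, assuming dplyr 1.0.0 or later (where `where()` is available inside `select()`), the negation can be moved outside `where()` or into a lambda; the names `x1`, `x2` and `x3` are only for illustration.

```r
library(dplyr)

# Negate the whole selection rather than the predicate function
x1 <- iris %>% select(!where(is.numeric))

# Or negate the predicate's result inside a lambda
x2 <- iris %>% select(where(~ !is.numeric(.x)))

# Or build the negated predicate with base R's Negate()
x3 <- iris %>% select(where(Negate(is.numeric)))

names(x1)  # "Species"
```

All three should leave only the Species column of iris.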
I have a data frame with four rows, 23 numeric columns and one text column. I'm trying to normalize all the numeric columns by subtracting the value in the first row.
I've tried getting it to work with mutate_at, but I couldn't figure out a good way to get it to work.
I got it to work by converting to a matrix and converting back to a tibble:
## First, did some preprocessing to get out the group I want
totalNKFoldChange <- filter(signalingFrame,
                            Population == "Total NK") %>% ungroup
totalNKFoldChange_mat <- select(totalNKFoldChange, signalingCols) %>%
  as.matrix()
normedNKFoldChange <- sweep(totalNKFoldChange_mat,
                            2, totalNKFoldChange_mat[1,])
normedNKFoldChange %<>% cbind(Timepoint =
                                levels(totalNKFoldChange$Timepoint)) %>%
  as.tibble %>%
  mutate(Timepoint = factor(Timepoint,
                            levels = levels(totalNKFoldChange$Timepoint)))
I'm so certain there's a nicer way to do it that would be fully dplyr native. Anyone have tips? Thank you!!
If we want to normalize all the numeric columns by subtracting the value in the first row, use mutate_if
library(dplyr)
df1 %>%
mutate_if(is.numeric, list(~ .- first(.)))
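In dplyr 1.0.0 and later, the scoped verbs such as mutate_if are superseded by across(). The sketch below shows the same first-row normalization on a small made-up data frame; `df1`, its columns and its values are hypothetical stand-ins for the asker's data.

```r
library(dplyr)

# Hypothetical stand-in: four rows, one text-like column, two numeric columns
df1 <- tibble(
  Timepoint = factor(c("0h", "1h", "2h", "4h")),
  a = c(1, 3, 6, 10),
  b = c(2, 2, 5, 9)
)

# Subtract each numeric column's first value from the whole column;
# non-numeric columns such as Timepoint are left untouched
normed <- df1 %>%
  mutate(across(where(is.numeric), ~ .x - first(.x)))
```

Here `normed$a` becomes 0, 2, 5, 9 and `normed$b` becomes 0, 0, 3, 7, while Timepoint stays a factor.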
I want to convert the columns whose names start with the word "feature" to character type using dplyr. I tried the code below and a few other variations based on answers from Stack Overflow. Any help would be appreciated. Thanks!
train %>% mutate_if(vars(starts_with("feature")), funs(as.character(.)))
I am trying to improve my usage of dplyr commands.
You need mutate_at instead
library(dplyr)
train %>% mutate_at(vars(starts_with("feature")), as.character)
As @Gregor mentioned, mutate_if is for when the columns are selected based on the data they contain, not on their names.
For example,
iris %>% mutate_if(is.numeric, sqrt)
So the square root is calculated only for the columns whose data is numeric.
If we want to combine multiple vars statements into one, we can use matches:
merchants %>% mutate_at(vars(matches("_id|category_")), as.character)
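With dplyr 1.0.0 or later, both kinds of selection (by name and by content) can be written with across(), which supersedes mutate_at and mutate_if. This is a sketch only; the small `train` data frame below is a hypothetical stand-in for the asker's data.

```r
library(dplyr)

# Hypothetical stand-in for the asker's `train` data
train <- tibble(
  feature_1 = c(1L, 2L),
  feature_2 = c(0.5, 1.5),
  target    = c(10, 20)
)

# Select by name: the equivalent of mutate_at(vars(starts_with("feature")), ...)
converted <- train %>%
  mutate(across(starts_with("feature"), as.character))

# Select by content: the equivalent of mutate_if(is.numeric, sqrt)
rooted <- iris %>%
  mutate(across(where(is.numeric), sqrt))
```

Only the feature columns become character; target stays numeric.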
This question already has answers here:
Mean per group in a data.frame [duplicate]
(8 answers)
Closed 4 years ago.
I am trying to get better at using pipes (%>%) in the dplyr package. As I understand it, the point of a pipe is that the value on its left-hand side becomes the first argument of the function on its right. That is, in this example:
area = rep(c(3:7), 5) + rnorm(5)
the piped form
area %>%
  mean
is equivalent to the ordinary function call
mean(area)
My problem starts when I work with a dataframe. I would like to split the dataframe into a list of dataframes, and then calculate the mean of the area column in each one. But I can't figure out how to refer to the column instead of the whole dataframe.
I know that I can get means by year simply by aggregate(area~ year, df, mean) but I would like to practice pipes instead.
Thank you!
# Dummy data
set.seed(13)
df<-data.frame(year = rep(c(1:5), each = 5),
area = rep(c(3:7), each = 5) + rnorm(1))
# Calculate means.
# None of `mean(df$area)`, `mean("area")` or `mean[area]` works. How to call the column correctly?
df %>%
split(df$year) %>%
mean
This?
library(dplyr)
df %>%
  group_by(year) %>%
  summarise(Mean = mean(area))
We need to extract the column from the list of data.frames in split. One option is to loop through the list with map, and summarise the 'area'.
library(dplyr)
library(purrr)
df %>%
  split(.$year) %>%
  map_df(~ .x %>%
           summarise(area = mean(area)))
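One thing to note about the split-and-map approach is that the result has no column saying which year each mean belongs to. The sketch below, which reuses the question's dummy data, shows two possible ways to keep the group labels; `means_vec` and `means_df` are names chosen here for illustration.

```r
library(dplyr)
library(purrr)

# Dummy data from the question
set.seed(13)
df <- data.frame(year = rep(1:5, each = 5),
                 area = rep(3:7, each = 5) + rnorm(1))

# Named numeric vector: one mean per list element, names are the years
means_vec <- df %>%
  split(.$year) %>%
  map_dbl(~ mean(.x$area))

# Data frame: .id stores the list names in a (character) `year` column
means_df <- df %>%
  split(.$year) %>%
  map_df(~ summarise(.x, area = mean(area)), .id = "year")
```

Both should agree with aggregate(area ~ year, df, mean), the base R call mentioned in the question.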
I have a data.frame:
set.seed(1L)
vector <- data.frame(patient=rep(1:5,each=2),medicine=rep(1:3,length.out=10),prob=runif(10))
I want to get the mean of the "prob" column while grouping by patient. I do this with the following code:
vector %>%
group_by(patient) %>%
summarise(average=mean(prob))
This code works perfectly. However, I need to get the same values without using the word "prob" on the summarise line. I tried the following code, but it gives me a data.frame in which the column "average" contains 5 identical values, which is not what I want:
vector %>%
group_by(patient) %>%
summarise(average=mean(vector[,3]))
PS: To explain why I need this: I have another data frame with many columns with complex names that need to be summarised, so I can't list them one by one in the summarise call. What I want is to pass a vector of column names and calculate the mean of each of those columns grouped by patient.
It appears you want summarise_each
vector %>%
  group_by(patient) %>%
  summarise_each(funs(mean), vars = matches('prob'))
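The attempt with mean(vector[,3]) returns identical values because vector[,3] refers to the whole column of the original data frame, not to the rows of the current group. As a hedged sketch, assuming dplyr 1.0.0 or later, the column names can instead be kept in a character vector and passed through across() with all_of(); the name `cols` is made up for illustration.

```r
library(dplyr)

# Data from the question
set.seed(1L)
vector <- data.frame(patient  = rep(1:5, each = 2),
                     medicine = rep(1:3, length.out = 10),
                     prob     = runif(10))

# Hypothetical vector holding the names of the columns to summarise
cols <- c("prob")

# across() applies mean() to each listed column within each patient group
result <- vector %>%
  group_by(patient) %>%
  summarise(across(all_of(cols), mean))
```

Adding more names to `cols` would summarise those columns as well, without naming any of them inside summarise.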
Using data.table you could do
library(data.table)
setDT(vector)[, lapply(.SD, mean), by = patient, .SDcols = 'prob']