I have a data.frame of cells containing a mix of numbers and characters.
For example
data(iris)
iris$comb<-paste(iris$Sepal.Length,'-',iris$Species)
iris$comb2<-paste(iris$Sepal.Width,'-',iris$Species)
head(iris[,6:7])
comb comb2
1 5.1 - setosa 3.5 - setosa
2 4.9 - setosa 3 - setosa
3 4.7 - setosa 3.2 - setosa
4 4.6 - setosa 3.1 - setosa
5 5 - setosa 3.6 - setosa
6 5.4 - setosa 3.9 - setosa
I want to sort groups of cells based on their numeric value, and I can do this with gtools::mixedsort(). However, I have several columns that need this, and I only want to sort every 3 rows in a column, independently of the rest of the column. The (extremely) long way to do this would be
library(gtools)
mixedsort(iris[1:3,6],decreasing=TRUE)
mixedsort(iris[4:6,6],decreasing=TRUE)
I'm just not sure how to loop through little bunches of cells like this. I would very much appreciate any help.
We create a grouping variable with gl so that every 3 consecutive rows share a group, then use mutate_at to apply the function to the columns of interest within each group.
library(gtools)
library(dplyr)
iris %>%
group_by(grp = as.integer(gl(n(), 3, n()))) %>%
mutate_at(vars(matches("comb")), ~ mixedsort(., decreasing = TRUE)) %>%
ungroup() %>%
select(-grp)
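For comparison, the same block-wise idea can be sketched in base R. This is only an illustration: sort_in_blocks is a hypothetical helper, and it swaps gtools::mixedsort() for a plain ordering on the leading number, assuming every string looks like "<number> - <label>".

```r
# Hypothetical helper (not part of gtools): sort x within consecutive blocks
# of `size` elements, ordering each block by its leading number, largest first.
sort_in_blocks <- function(x, size = 3) {
  block <- ceiling(seq_along(x) / size)      # block ids: 1,1,1,2,2,2,...
  key <- as.numeric(sub(" .*$", "", x))      # assumes "<number> - <label>"
  pieces <- lapply(split(seq_along(x), block), function(i) {
    x[i][order(key[i], decreasing = TRUE)]   # reorder only inside the block
  })
  unlist(pieces, use.names = FALSE)
}

comb <- paste(iris$Sepal.Length[1:6], "-", iris$Species[1:6])
sort_in_blocks(comb)
```

Each block is reordered independently, so rows 1 to 3 never mix with rows 4 to 6.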
I want to rename multiple columns that start with the same string.
However, none of the code I tried changed the column names.
For example this:
df %>% rename_at(vars(matches('^oldname,\\d+$')), ~ str_replace(., 'oldname', 'newname'))
And also this:
df %>% rename_at(vars(starts_with(oldname)), funs(sub(oldname, newname, .))
Do you know of suitable code for this kind of rename?
Thank you!
Take iris as an example: you can use rename_with() to replace the "Petal" prefix of the matching column names with a new string.
head(iris) %>%
rename_with(~ sub("^Petal", "New", .x), starts_with("Petal"))
Sepal.Length Sepal.Width New.Length New.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
You can also use rename_at() in this case, although rename_if(), rename_at(), and rename_all() have been superseded by rename_with().
head(iris) %>%
rename_at(vars(starts_with("Petal")), ~ sub("^Petal", "New", .x))
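If dplyr is not needed, the same prefix replacement can be sketched in base R by editing names() directly. This is a minimal illustration on head(iris), with df as a throwaway copy.

```r
df <- head(iris)

# Replace a leading "Petal" with "New" in every column name; names that do
# not match the pattern are returned unchanged by sub().
names(df) <- sub("^Petal", "New", names(df))
names(df)
```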
Let me clarify I am not looking at mutate_at or mutate(across(..., ...)) type of syntax here. I just want to know how to create several new columns at once inside tidyverse pipe syntax.
Let us assume the case of iris dataset.
I want to create say 10 (or 100 or more) new columns having a criteria like this.
first new column(variable) say V1 is just Petal.Length * 1,
second new col say V2 is Petal.Length * 2
and so on up to, say, V10, which is Petal.Length * 10,
without explicitly writing the name and formula for each of these columns, which would be cumbersome if I wanted to create, say, 100 new columns.
You can use map functions :
library(dplyr)
library(purrr)
df <- iris %>% head
value <- 1:5
bind_cols(df,
map_dfc(value, ~df %>% transmute(!!paste0('col', .x) := Petal.Length * .x)))
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species col1 col2 col3 col4 col5
#1 5.1 3.5 1.4 0.2 setosa 1.4 2.8 4.2 5.6 7.0
#2 4.9 3.0 1.4 0.2 setosa 1.4 2.8 4.2 5.6 7.0
#3 4.7 3.2 1.3 0.2 setosa 1.3 2.6 3.9 5.2 6.5
#4 4.6 3.1 1.5 0.2 setosa 1.5 3.0 4.5 6.0 7.5
#5 5.0 3.6 1.4 0.2 setosa 1.4 2.8 4.2 5.6 7.0
#6 5.4 3.9 1.7 0.4 setosa 1.7 3.4 5.1 6.8 8.5
In base R, this can be done with lapply :
df[paste0('col', value)] <- lapply(value, `*`, df$Petal.Length)
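Another base R option, offered as a sketch rather than as part of the answer above, is outer(), which builds all the multiples in one matrix. The names k, mult and df_wide are illustrative, and the V1 to V10 naming follows the question.

```r
df <- head(iris)
k <- 1:10

# outer() returns a matrix whose j-th column is Petal.Length * k[j]
mult <- outer(df$Petal.Length, k)
colnames(mult) <- paste0("V", k)

# cbind() appends the matrix columns to the data frame as V1..V10
df_wide <- cbind(df, mult)
names(df_wide)
```

Changing k to 1:100 gives 100 new columns without any further code.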
I would like to use the ntile function from dplyr or a similar function on a list of data frames but using a different n for each data frame. My list contains 150 data frames so a manual solution like the one below will not work. How can I rewrite the code below to act on the list of data frames and return the list of data frames with the new column?
library(tidyverse)
iris_list=split(iris,iris$Species)
iris_setosa=iris_list[[1]]
iris_versicolor=iris_list[[2]]
iris_virginica=iris_list[[3]]
iris_setosa$n3=ntile(iris_setosa$Sepal.Length,3)
iris_versicolor$n5=ntile(iris_versicolor$Sepal.Length,5)
iris_virginica$n7=ntile(iris_virginica$Sepal.Length,7)
The final result should be this
final_list=list(iris_setosa,iris_versicolor,iris_virginica)
head(final_list[[1]])
Sepal.Length Sepal.Width Petal.Length Petal.Width Species n3
1 5.1 3.5 1.4 0.2 setosa 2
2 4.9 3.0 1.4 0.2 setosa 1
3 4.7 3.2 1.3 0.2 setosa 1
4 4.6 3.1 1.5 0.2 setosa 1
5 5.0 3.6 1.4 0.2 setosa 2
6 5.4 3.9 1.7 0.4 setosa 3
There are several ways to achieve this, depending on what type of object you want in the end.
One way would be to use base::expand.grid and purrr::pmap like this:
percentiles = list(3,5,7)
iris_list %>%
map("Sepal.Length") %>%
expand.grid(percentiles) %>%
pmap(~ntile(..1,..2))
First, you want only the Sepal.Length variable of all your datasets, so you use purrr::map to get them.
Then, expand.grid creates a dataframe of all combinations of its parameters. Here, with 2 lists of 3 members, it would return a dataframe of 3x3=9 rows: setosa 3, versicolor 3, virginica 3, setosa 5, ...
Finally, pmap iterates over the rows of that data frame and applies ntile, with the first column (from iris_list) as the first argument and the second column (percentiles) as the second argument. Unfortunately, purrr does not handle the names well here, although that seems to be by design.
EDIT:
Your edit is really a different question, so here is another answer:
iris_list %>%
map(~mutate(.x, n3=ntile(Sepal.Length,3),
n5=ntile(Sepal.Length,5), n7=ntile(Sepal.Length,7)))
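If the goal is one bin count per data frame (3 for the first, 5 for the second, 7 for the third), pairing the two lists element by element with purrr::map2 is a possible sketch. The vector n_bins and the single column name bin are assumptions made for illustration; the question used a different column name per data frame.

```r
library(dplyr)
library(purrr)

iris_list <- split(iris, iris$Species)

# Assumed mapping: one bin count per list element, in the same order
n_bins <- c(3, 5, 7)

# map2() walks both inputs in parallel: .x is a data frame, .y its bin count
binned <- map2(iris_list, n_bins,
               ~ mutate(.x, bin = ntile(Sepal.Length, .y)))

# Largest bin per data frame, as a quick check of the pairing
sapply(binned, function(d) max(d$bin))
```

With 150 data frames, n_bins would simply be a vector of length 150; the result stays a named list of data frames.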
I've found a way that works
n_size=data.frame(Species=c("setosa","versicolor","virginica"),size=c(3,5,7))
iris_bin=iris %>% inner_join(n_size,by="Species") %>%
group_by(Species)%>%
mutate(bin=ntile(Sepal.Length,size[1])) %>%
arrange(Species,Sepal.Length,bin)
Summing across columns by listing their names is fairly simple:
iris %>% rowwise() %>% mutate(sum = sum(Sepal.Length, Sepal.Width, Petal.Length))
However, say there are many more columns, and you want to pick up all columns containing "Sepal" without listing them manually. Specifically, I'm looking for a method that works the same way select() in dplyr lets you subset columns with contains(), starts_with(), etc.
There are ways to use mutate_all() + sum() + join() in order to fulfill the same result as this query, but I am more interested in seeing something as close to the solution as the code below:
iris %>% rowwise() %>% mutate(sum = sum(contains(colnames(.), "Sepal")))
If I understand correctly, basically you're trying to do:
library(dplyr)
iris %>% mutate(sum = rowSums(select(., contains("Sepal"))))
First few rows:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1 5.1 3.5 1.4 0.2 setosa 8.6
2 4.9 3.0 1.4 0.2 setosa 7.9
3 4.7 3.2 1.3 0.2 setosa 7.9
4 4.6 3.1 1.5 0.2 setosa 7.7
5 5.0 3.6 1.4 0.2 setosa 8.6
6 5.4 3.9 1.7 0.4 setosa 9.3
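In dplyr 1.0.0 or later, a variant that avoids the `.` placeholder is to pass across() to rowSums(). This is a sketch under that version assumption; res is an illustrative name.

```r
library(dplyr)

# across() returns the selected columns as a data frame, which rowSums()
# then sums row by row
res <- iris %>%
  mutate(sum = rowSums(across(contains("Sepal"))))

head(res)
```

Any tidyselect helper, such as starts_with() or matches(), can be used in place of contains().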
I am trying to create a new variable in a dataset based on the value of an indicator. This is the code I am using:
prac_data <- head(iris,10)
COPY_IND='Y' ##declaring the indicator to be 'Y'
prac_data <- prac_data %>% mutate(New_Var=ifelse(COPY_IND=='Y', Sepal.Length, 'N'))
I get the following output:
Sepal.Length Sepal.Width Petal.Length Petal.Width Species New_Var
1 5.1 3.5 1.4 0.2 setosa 5.1
2 4.9 3.0 1.4 0.2 setosa 5.1
3 4.7 3.2 1.3 0.2 setosa 5.1
4 4.6 3.1 1.5 0.2 setosa 5.1
5 5.0 3.6 1.4 0.2 setosa 5.1
6 5.4 3.9 1.7 0.4 setosa 5.1
7 4.6 3.4 1.4 0.3 setosa 5.1
8 5.0 3.4 1.5 0.2 setosa 5.1
9 4.4 2.9 1.4 0.2 setosa 5.1
10 4.9 3.1 1.5 0.1 setosa 5.1
I actually want to copy the variable 'Sepal.Length' into 'New_Var' for every observation if the indicator (COPY_IND) is Yes ('Y').
If I do the following, I get the desired result:
if (COPY_IND=='Y')
{
prac_data$New_Var <- prac_data$Sepal.Length
} else {prac_data$New_Var <- 'N'}
I just want to understand why R treats the two 'if-else' approaches differently.
Is there a better, more elegant way to do the same?
Thanks in advance!!
Actually, this might be easier to read as an answer.
From ifelse() help: "ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE".
Your test is just a single value, so ifelse() returns a single value, either Sepal.Length[1] or N, which is then duplicated across the whole column.
To make your approach work, add rowwise() so that the test is evaluated once per row:
prac_data <- prac_data %>% rowwise() %>% mutate(New_Var = ifelse(COPY_IND=='Y', Sepal.Length, 'N'))
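The difference in shape can be seen on a small vector, independent of dplyr. This is a minimal sketch; x, flag and flags are illustrative names.

```r
x <- c(5.1, 4.9, 4.7)
flag <- "Y"

# Length-1 test: ifelse() returns one element, taken from the first value of x
ifelse(flag == "Y", x, "N")

# Plain if/else returns the whole object chosen by the branch
if (flag == "Y") x else "N"

# Test with the same length as x: ifelse() picks element by element.
# Mixing numbers with "N" coerces the result to character.
flags <- c("Y", "N", "Y")
ifelse(flags == "Y", x, "N")
```

Inside mutate(), the single value from the first call is recycled down the column, which is why every row showed 5.1.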
COPY_IND is always 'Y' in your case, so the code can be simplified to prac_data$New_Var = prac_data$Sepal.Length. Even if you want to apply ifelse row-wise, it is better to follow the advice in the help page:
Further note that if(test) yes else no is much more efficient and often much preferable to ifelse(test, yes, no) whenever test is a simple true/false result, i.e., when length(test) == 1.
I guess COPY_IND is meant to be a column of the data frame (or a vector of the same length) rather than a single fixed value. In that case, your code gives the right answer, e.g. to keep only the first five numbers:
library(dplyr)
prac_data <- head(iris,10)
prac_data$COPY_IND=c(rep('Y',5),rep('N',5))
#COPY_IND=c(rep('Y',5),rep('N',5)) works too
prac_data <- prac_data %>% mutate(New_Var=ifelse(COPY_IND=='Y', Sepal.Length, 'N'))
This generates the expected column.