Renaming columns according to vector inside pipe - r

I have a data.frame df with columns A and B:
df <- data.frame(A = 1:5, B = 11:15)
There's another data.frame, df2, which I'm building through various calculations and which ends up with generic column names X1 and X2. I cannot control these names directly (because it passes through being a matrix at one point). So it ends up being something like:
mtrx <- matrix(1:10, ncol = 2)
mtrx %>% data.frame()
I would like to rename the columns in df2 to match those of df. I could, of course, do it after I finish building df2 with a simple assignment:
names(df2)<-names(df)
My question is: is there a way to do this directly within the pipe? I can't seem to use dplyr::rename, because its arguments have to be in the form newname = oldname, and I can't seem to vectorize it. The same goes for the data.frame call itself: I can't just give it a vector of column names, as far as I can tell. Is there another option I'm missing? What I'm hoping for is something like
mtrx %>% data.frame() %>% rename(names(df))
but this doesn't work; it gives Error: All arguments must be named.
Cheers!

You can use setNames
mtrx %>%
data.frame() %>%
setNames(., nm = names(df))
# A B
#1 1 6
#2 2 7
#3 3 8
#4 4 9
#5 5 10
Or use purrr's equivalent set_names
mtrx %>%
data.frame() %>%
purrr::set_names(., nm = names(df))
A third option is "names<-"
mtrx %>%
data.frame() %>%
"names<-"(names(df))

We can use rename_all from dplyr (attached by library(tidyverse))
library(tidyverse)
mtrx %>%
as.data.frame %>%
rename_all(~ names(df))
# A B
# 1 1 6
# 2 2 7
# 3 3 8
# 4 4 9
# 5 5 10
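In dplyr 1.0.0 and later, rename_all is superseded by rename_with, which takes a function of the current column names. A minimal sketch of the same pipe, assuming dplyr >= 1.0.0 is installed (the result name df2 is only for illustration):

```r
library(dplyr)

df   <- data.frame(A = 1:5, B = 11:15)
mtrx <- matrix(1:10, ncol = 2)

# rename_with() calls the function on the current column names;
# this one ignores its input and simply returns names(df)
df2 <- mtrx %>%
  as.data.frame() %>%
  rename_with(~ names(df))

names(df2)
```

This only works when names(df) has exactly as many elements as there are columns being renamed.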

Related

R: rowwise nth element ordered_by row values

I have this input:
t <- data.frame(x=c(1,2,8,4), y=c(2,3,4,5), k=c(3,4,5,1))
And want to have the rowwise nth-lowest element of the dataframe ordered by the rowwise values, so that the output is something like this (example for nth_element = 2):
[1] 2 3 5 4
I tried a function like this:
apply(t, 1, nth, n=1, order_by = .)
But this does not work. Two questions:
What should I type in the order_by argument to make this function work?
Which is the best way to summarise rows with an own summary function if I don't want to mention the column names in the rowwise summary function?
Sidenote:
I don't want to mention the column names specifically, I want the function to use all rows in the dataset.
I tried the rownth function from the Rfast package, but it only returns one result. Does anybody know what I am doing wrong?
We can use apply and sort to do this.
d <- data.frame(x=c(1,2,8,4), y=c(2,3,4,5), k=c(3,4,5,1))
nth_lowest <- 2
apply(d, 1, FUN = function(x) sort(x)[nth_lowest])
# [1] 2 3 5 4
Note that I am calling the data d instead of t. t is not a reserved word, but it is already the name of a base R function (matrix transpose), so reusing it for data is confusing.
Not as elegant as @bouncyball's answer, but using dplyr (and tidyr), one possibility is to do:
library(dplyr)
library(tidyr)
t %>% mutate(Row = row_number()) %>%
pivot_longer(-Row, names_to = "Col", values_to = "Val") %>%
group_by(Row) %>%
arrange(Val) %>%
slice(2) %>%
select(Val)
Adding missing grouping variables: `Row`
# A tibble: 4 x 2
# Groups: Row [4]
Row Val
<int> <dbl>
1 1 2
2 2 3
3 3 5
4 4 4
Using Rfast can reduce run time on large inputs, but it works on matrices only, so the data frame has to be converted first.
d <- data.frame(x=c(1,2,8,4), y=c(2,3,4,5), k=c(3,4,5,1))
d<- Rfast::data.frame.to_matrix(d)
nth_lowests <- rep(2, nrow(d))  # one n per row
Rfast::rownth(d,nth_lowests)
# [1] 2 3 5 4
You could also use the parallel version of Rfast::rownth

How to transpose the first rows into new columns in R?

I want to transpose the first two rows into two new columns and keep the rest of the data frame. How do I do it in R?
My original data
A <- c("2012","PL",3,2)
B <- c("2012","PL",6,1)
C <- c("2012","PL",7,4)
DF <- data.frame(A,B,C)
My final data after transpose
V1 <- c("2012","2012")
V2 <- c("PL","PL")
A <- c(3,2)
B <- c(6,1)
C <- c(7,4)
DF <- data.frame(V1,V2,A,B,C)
Where V1 and V2 are the names for new columns and they are created automatically.
Thank you for any assistance.
Base R:
cbind(t(DF[1:2, 1, drop=FALSE]), DF[-(1:2),])
# Warning in data.frame(..., check.names = FALSE) :
# row names were found from a short variable and have been discarded
# 1 2 A B C
# 1 2012 PL 3 6 7
# 2 2012 PL 2 1 4
though I have some concerns about the apparent key property of "2012" and "PL". That is, you start with three instances of each and end with two. Logically it makes sense, though really to me it looks as if you have a matrix of numbers associated with a single "2012","PL", but perhaps that's not how the data is coming to you. (If you can change the format of the data before getting to this point such that you have a matrix and its associated keys, then it might make data munging more direct, declarative, and resistant to bugs.)
Here is an option with slice
library(dplyr)
DF %>%
select(A) %>%
slice(1:2) %>%
t %>%
as.data.frame %>%
bind_cols(DF %>%
slice(-(1:2)))

Using set_names vs. mutate(colnames) when changing data frame column names to lower case

A quick question that I was looking to understand better.
Data:
df1 <- data.frame(COLUMN_1 = letters[1:3], COLUMN_2 = 1:3)
> df1
COLUMN_1 COLUMN_2
1 a 1
2 b 2
3 c 3
Why does this work in setting data frame names to lower case:
df2 <- df1 %>%
set_names(., tolower(names(.)))
> df2
column_1 column_2
1 a 1
2 b 2
3 c 3
But this does not?
df2 <- df1 %>%
mutate( colnames(.) <- tolower(colnames(.)) )
Error: Column `colnames(.) <- tolower(colnames(.))` must be length 3 (the number of rows) or one, not 2
The solution is rename_all. Written with the pipe and with the arguments spelled out explicitly, these two calls are equivalent:
df1 %>% rename_all(tolower)
rename_all(.tbl = df1, .funs = tolower)
mutate operates on the data itself, not the column names, so that's why we're using rename. We use rename_all because you don't want to type out 1 = tolower(1), 2 = tolower(2), ...
What you suggested, df2 <- df1 %>% rename_all(tolower(.)) doesn't work because then you would be trying to feed the whole df1 into the tolower function, which is not what you want.
Another solution, in base R, would be: names(df1) <- tolower(names(df1))
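For newer dplyr versions (1.0.0 and later, where rename_all is superseded), the same idea can be sketched with rename_with, which passes the column names rather than the data to the function:

```r
library(dplyr)

df1 <- data.frame(COLUMN_1 = letters[1:3], COLUMN_2 = 1:3)

# tolower() receives the character vector of column names,
# not the data frame itself
df2 <- df1 %>% rename_with(tolower)

names(df2)
```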

dplyr mutate using character vector of column names

data is a data.frame containing the columns date, a, b, c and d. The last four are numeric.
Y.columns <- c("a")
X.columns <- c("b","c","d")
What I need:
data.mutated <- data %>%
mutate(Y = a, X = b+c+d) %>%
select(date,Y,X)
But I would like to pass the mutate arguments from a character vector.
I tried the following:
Y.string <- paste(Y.columns, collapse='+')
X.string <- paste(X.columns, collapse='+')
data.mutated <- data %>%
mutate(Y = UQ(Y.string), X = UQ(X.string)) %>%
select(date,Y,X)
But it didn't work. Any help is appreciated.
To use tidyeval with UQ, you first need to parse your expression strings into quosures with parse_quosure from rlang (using mtcars as an example, since the OP's question is not reproducible):
Y.columns <- c("cyl")
X.columns <- c("disp","hp","drat")
Y.string <- paste(Y.columns, collapse='+')
X.string <- paste(X.columns, collapse='+')
library(dplyr)
library(rlang)
mtcars %>%
mutate(Y = UQ(parse_quosure(Y.string)),
X = UQ(parse_quosure(X.string))) %>%
select(Y,X)
or with !!:
mtcars %>%
mutate(Y = !!parse_quosure(Y.string),
X = !!parse_quosure(X.string)) %>%
select(Y,X)
Result:
Y X
1 6 273.90
2 6 273.90
3 4 204.85
4 6 371.08
5 8 538.15
6 6 332.76
7 8 608.21
8 4 212.39
9 4 239.72
10 6 294.52
...
Note:
mutate_ is now deprecated, so I think tidyeval with quosures and UQ is the way to go.
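In later rlang releases parse_quosure was itself soft-deprecated (in favour of parse_quo), so the code above may warn or fail depending on the installed version. A hedged sketch of two alternatives, assuming rlang with parse_expr and dplyr >= 1.0.0 for across; the names res1 and res2 are only for illustration:

```r
library(dplyr)
library(rlang)

Y.columns <- c("cyl")
X.columns <- c("disp", "hp", "drat")
Y.string <- paste(Y.columns, collapse = "+")
X.string <- paste(X.columns, collapse = "+")

# Option 1: parse each string into an expression and unquote it with !!
res1 <- mtcars %>%
  mutate(Y = !!parse_expr(Y.string),
         X = !!parse_expr(X.string)) %>%
  select(Y, X)

# Option 2: no string parsing; select the columns by name and sum row-wise
res2 <- mtcars %>%
  mutate(Y = .data[[Y.columns]],
         X = rowSums(across(all_of(X.columns)))) %>%
  select(Y, X)

head(res1, 3)
```

Option 2 avoids evaluating arbitrary strings as code, which is usually the safer choice when the column names come from user input. It covers only the plain-sum case shown here.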

Pull specific rows

Let's say that I have a data frame that looks like this...
City <- c("x","x","y","y","z","z","a","a")
Number <-c(1,2,3,4,5,6,7,8)
mat <- cbind.data.frame(City ,Number)
"City" "Number"
x 1
x 2
y 3
y 4
z 5
z 6
a 7
a 8
Now I want to be able to pull the data for...
list <- c("x","y", "a")
And the desired outcome would look something like this...
x y a
1 3 7
2 4 8
I tried using which(list%in%City) to help find the indices to pull that data from the index but that does not produce the rows that I want.
UPDATE
When using Chris' answer, make sure that the data type of "City" is "chr". Otherwise you will get an error message, as I did originally before converting it with "as.character".
I renamed your variable list to test, because list is a function name. You can do this, using data.table:
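library(data.table)
test <- c("x", "y", "a")  # the vector called list in the question
matdt <- as.data.table(mat)
setkey(matdt, City)  # key on City so that matdt[x, ...] looks up by city
sapply(test, function(x) matdt[x, Number])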
x y a
[1,] 1 3 7
[2,] 2 4 8
You need to pass the City names to the extraction function one by one. In this case sapply will deliver a matrix as you expect, but if there were a varying number of results per city, the returned object would be a list rather than a matrix:
sapply( list, function(city) mat[ mat$City %in% city, "Number"] )
x y a
[1,] 1 3 7
[2,] 2 4 8
Using dplyr and tidyr:
library(dplyr)
library(tidyr)
mat %>%
filter(City %in% c("x", "y", "a")) %>%
group_by(City) %>%
mutate(Index = 1:n()) %>%
spread(City, Number) %>%
select(-Index)
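spread is superseded in current tidyr, so here is a sketch of the same reshaping with pivot_wider, assuming tidyr >= 1.0.0. The vector name test and the result name res are only for illustration:

```r
library(dplyr)
library(tidyr)

mat  <- data.frame(City   = c("x", "x", "y", "y", "z", "z", "a", "a"),
                   Number = c(1, 2, 3, 4, 5, 6, 7, 8))
test <- c("x", "y", "a")

res <- mat %>%
  filter(City %in% test) %>%
  group_by(City) %>%
  mutate(Index = row_number()) %>%  # position of each value within its city
  ungroup() %>%
  pivot_wider(names_from = City, values_from = Number) %>%
  select(-Index)

res
```

The Index column is what lets pivot_wider line up the first value of every city in one row and the second in the next. Without it, the rows cannot be identified uniquely.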
