Why doesn't `[<-` work to reorder data frame columns?

Why doesn't this work?
df <- data.frame(x=1:2, y = 3:4, z = 5:6)
df[] <- df[c("z", "y", "x")]
df
#> x y z
#> 1 5 3 1
#> 2 6 4 2
Notice that the names are in the original order, but the data itself has changed order.
This works just fine:
df <- data.frame(x=1:2, y = 3:4, z = 5:6)
df[c("z", "y", "x")]
#> z y x
#> 1 5 3 1
#> 2 6 4 2

When a replacement (`[<-`) is performed, the values at the indexed positions are replaced, not the names. For example, replacing the first element below does not affect the name of that element:
x <- c(a=1, b=2)
x[1] <- 3
x
a b
3 2
In your data frame you replaced the values in the same way: the values changed but the names stayed the same. To reorder the data frame, skip the replacement form and assign the result of the extraction instead:
df <- df[c("z", "y", "x")]
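A minimal sketch contrasting the two forms on the question's data. The names `reordered` and `df2` are introduced here only for illustration.

```r
df <- data.frame(x = 1:2, y = 3:4, z = 5:6)

# Extraction returns a new data frame whose names travel with the columns.
reordered <- df[c("z", "y", "x")]
names(reordered)

# Replacement with an empty index fills the existing columns by position,
# so the old names stay where they were and only the values move.
df2 <- df
df2[] <- reordered
names(df2)
df2$x
```

Here `df2$x` ends up holding the values that used to be in `z`, which is the behaviour shown in the question.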

Just don't put the `[]` after `df` and it will do what you want:
df <- data.frame(x=1:2, y = 3:4, z = 5:6)
df <- df[c("z", "y", "x")]
df
# z y x
#1 5 3 1
#2 6 4 2
And if your question is about why, Pierre Lafortune's comment is right.
As a side note, I also like to add the comma to separate the dimensions:
df <- df[,c("z", "y", "x")]
I find it more proper.
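One thing to keep in mind with the comma form: the two styles behave the same for several columns, but differ when a single column is selected. A short sketch (variable names are illustrative only):

```r
df <- data.frame(x = 1:2, y = 3:4, z = 5:6)

as_list_index   <- df["z"]                  # list-style: stays a data frame
as_matrix_index <- df[, "z"]                # matrix-style: drops to a vector
kept            <- df[, "z", drop = FALSE]  # matrix-style, data frame preserved
```

So if the column vector is built dynamically and might have length one, `drop = FALSE` (or the list-style form) avoids a surprise.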

Related

Initialise a dataframe where a column references another column

I wonder if there is a way to do:
df <- data.frame(x = 1:3)
df$y = df$x + 5
yielding:
x y
1 1 6
2 2 7
3 3 8
in one line of code where the y column refers to the x column? For example:
data.frame(x = 1:3, y = self$x + 5) # doesn't work
(I won't accept answers that ignore the x column, for example, data.frame(x = 1:3, y = 6:8) :-))

This is possible using tibble() from the tibble package. Credit to @DaveArmstrong from the comments.
library(tibble)
tibble(x = 1:3, y = x + 5)
# A tibble: 3 × 2
x y
<int> <dbl>
1 1 6
2 2 7
3 3 8

Here's a base R method that does not need an external package (e.g. tibble).
We can use outer to add 5 to each element of x, then cbind the result with the data frame.
setNames(data.frame(cbind(1:3, outer(1:3, 5, `+`))), c("x", "y"))
# or to expand your code
setNames(cbind(data.frame(x = 1:3), outer(1:3, 5, `+`)), c("x", "y"))
x y
1 1 6
2 2 7
3 3 8
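Another base R possibility, offered as a sketch rather than as what the answers above intended, is `transform()`, which evaluates its arguments inside the data frame so that `y` can refer to `x`:

```r
# transform() looks up x in the data frame passed as its first argument
df <- transform(data.frame(x = 1:3), y = x + 5)
df
```

Note that `transform()` cannot refer to a column created in the same call, only to columns that already exist in its input.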

Move several chunks of columns dynamically to another position

My data is:
df <- data.frame(a = 1:2,
x = 1:2,
b = 1:2,
y = 3:4,
x_2 = 1:2,
y_2 = 3:4,
c = 1:2,
x_3 = 5:6,
y_3 = 1:2)
I now want to put together the x vars, and the y vars so that the order of columns would be:
a, x, x_2, x_3, b, y, y_2, y_3, c
I thought, I could use tidyverse's relocate function in combination with lapply or map or reduce (?), but it doesn't work out.
E.g. if I do:
move_names <- c("x", "y")
library(tidyverse)
moved_data <- lapply(as.list(move_names), function(x)
{
df <- df |>
relocate(!!!syms(paste0(x, "_", 2:3)),
.after = all_of(x))
}
)
It does the moving for x and y separately, but it returns a list of separate data frames, whereas I just want my original df with the relocated columns.
Update:
I should have been clear that my real data frame has ~500 columns where the to-be-moved columns are all over the place. So providing the full vector of desired column name order won't be feasible.
What I instead have: I have the names of my original columns, i.e. x and y, and I have the names of the to-be-moved columns, i.e. x_2, x_3, y_2, y_3.

In base R:
df[match(c('a', 'x', 'x_2', 'x_3', 'b', 'y', 'y_2', 'y_3', 'c'), names(df))]
#> a x x_2 x_3 b y y_2 y_3 c
#> 1 1 1 1 5 1 3 3 1 1
#> 2 2 2 2 6 2 4 4 2 2

Not sure if it's what you want.
Vector with order of column names
Let's say you have a vector relocate_name that contains the order of your columns:
library(tidyverse)
relocate_name <- c("a", "x", "x_2", "x_3", "b", "y", "y_2", "y_3", "c")
df %>% relocate(any_of(relocate_name))
Vector with prefix of column names
Or if you only have the prefix of the order, let's call it relocate_name2:
relocate_name2 <- c("a", "x", "b", "y", "c")
df %>% relocate(starts_with(relocate_name2))
Group x and y together
Or if you only want to "group" x and y together:
df %>%
relocate(starts_with("x"), .after = "x") %>%
relocate(starts_with("y"), .after = "y")
Output
All of the above produce the same output.
a x x_2 x_3 b y y_2 y_3 c
1 1 1 1 5 1 3 3 1 1
2 2 2 2 6 2 4 4 2 2

library(rlist)
# split based in colname-part before _
L <- split.default(df, f = gsub("(.*)_.*", "\\1", names(df)))
# remove names with an underscore
# this is the new order, it should match the names of list L !!
neworder <- names(df)[!grepl("_", names(df))]
# [1] "a" "x" "b" "y" "c"
# cbind list elements together
ans <- rlist::list.cbind(L[neworder])
# a x.x x.x_2 x.x_3 b y.y y.y_2 y.y_3 c
# 1 1 1 1 5 1 3 3 1 1
# 2 2 2 2 6 2 4 4 2 2
# create tidy names again
names(ans) <- gsub(".*\\.(.*)", "\\1", names(ans))
# a x x_2 x_3 b y y_2 y_3 c
# 1 1 1 1 5 1 3 3 1 1
# 2 2 2 2 6 2 4 4 2 2

OK, this is probably the worst workaround ever and I don't really understand what exactly I'm doing (especially with the <<-), but it does the trick.
My general idea after realizing the problem a bit more with the help of you guys here was to "loop" through both of my x and y names, remove these new _2 and _3 columns from the vector of column names and re-append them after their "base" x and y columns.
search_names <- c("x", "y")
df_names <- names(df)
new_names <- lapply(search_names, function(x)
{
start <- which(df_names == x)
without_new_names <- setdiff(df_names, paste0(x, "_", 2:3))
df_names <<- append(without_new_names, values = paste0(x, "_", 2:3), after = start)
})[[length(search_names)]]
df |>
relocate(any_of(new_names))
a x x_2 x_3 b y y_2 y_3 c
1 1 1 1 5 1 3 3 1 1
2 2 2 2 6 2 4 4 2 2
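The same loop-and-append idea can be sketched without `<<-` by threading the name vector through `Reduce()`. The helper `move_after` is hypothetical, and the `_2`/`_3` suffixes are hard-coded as in the question.

```r
df <- data.frame(a = 1:2, x = 1:2, b = 1:2, y = 3:4,
                 x_2 = 1:2, y_2 = 3:4, c = 1:2, x_3 = 5:6, y_3 = 1:2)

# Hypothetical helper: take a vector of column names and one base name,
# pull out the suffixed columns and re-insert them right after the base.
move_after <- function(nms, base) {
  moved <- paste0(base, "_", 2:3)
  rest  <- setdiff(nms, moved)
  append(rest, moved, after = match(base, rest))
}

# Reduce() feeds the result for "x" into the call for "y".
new_order <- Reduce(move_after, c("x", "y"), names(df))
result <- df[new_order]
```

Because the position of the base column is looked up after the suffixed columns have been removed, this should also work when a `_2` column sits to the left of its base column.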

Concatenate rows and columns

I have a data set like this
x y z
a 5 4
b 1 2
And I want to concatenate the columns and rows:
ay 5
az 4
by 1
bz 2
Thanks

You can use melt and paste, but you will need to make your rownames a variable, i.e.
df$new <- rownames(df)
m_df <- reshape2::melt(df)
rownames(m_df) <- paste0(m_df$new, m_df$variable)
m_df <- m_df[-c(1:2)]
m_df
# value
#ax 5
#bx 1
#ay 4
#by 2
#az 3
#bz 1
After your edit, you don't need to convert the rownames to a variable, so just:
m1_df <- reshape2::melt(df)
m1_df$new <- paste0(m1_df$x, m1_df$variable)
m1_df
# x variable value new
#1 a y 5 ay
#2 b y 1 by
#3 a z 4 az
#4 b z 2 bz
You can then tidy your data frame to the required output.

With dplyr and tidyr:
library(dplyr)
library(tidyr)
df %>%
gather(var, val, -x) %>%
mutate(var=paste0(x, var)) %>%
select(var, val)%>%
arrange(var)
# var val
#1 ay 5
#2 az 4
#3 by 1
#4 bz 2

library(reshape2)
library(dplyr)
library(tibble)
library(stringr)
# Create dataframe
x <- data.frame(x = c(5, 1),
y = c(4, 2),
z = c(3, 1),
row.names = c('a', 'b'))
# Convert rowname to column and melt
x <- tibble::rownames_to_column(x, "rownames") %>%
melt('rownames')
# assign concat columns as rownames
row.names(x) <- str_c(x$rownames, x$variable)
# Select relevant columns only
x <- select(x, value)
# Remove names from dataframe
names(x) <- NULL
x
ax 5
bx 1
ay 4
by 2
az 3
bz 1

Here is another option in base R
stack(setNames(as.list(unlist(df1[-1])), outer(df1$x, names(df1)[-1], paste0)))[2:1]
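For readers who find the one-liner dense, here is a step-by-step base R sketch of the same idea. It assumes `df1` is the question's data with `x` as an ordinary column; the names `value_cols`, `labels` and `result` are illustrative.

```r
df1 <- data.frame(x = c("a", "b"), y = c(5, 1), z = c(4, 2),
                  stringsAsFactors = FALSE)

value_cols <- names(df1)[-1]

# outer() pastes every row label with every column name (a 2 x 2 matrix).
labels <- outer(df1$x, value_cols, paste0)

# as.vector() and unlist() both walk column by column, so labels and
# values stay aligned.
result <- data.frame(var = as.vector(labels),
                     val = unlist(df1[value_cols], use.names = FALSE),
                     stringsAsFactors = FALSE)
result <- result[order(result$var), ]
```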

Performing Difference in Pair of row of data

My Data Frame is:
df:
one two three
a 8 x
a 12 y
b 9 x
b 3 y
and result should be like:
one two
a 4
b 6
Can you please help me?

Here is a base R method using aggregate:
aggregate(two~one, data=df, FUN=function(i) abs(diff(i)))
data
df <- read.table(header = TRUE, text = "one two three
a 8 x
a 12 y
b 9 x
b 3 y")
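An equivalent base R sketch with `tapply()`, in case a named vector is a more convenient intermediate. It assumes, as in the question, exactly two rows per group; with more rows `diff()` returns several values per group and the result would no longer be one number each.

```r
df <- data.frame(one = c("a", "a", "b", "b"),
                 two = c(8, 12, 9, 3),
                 three = c("x", "y", "x", "y"),
                 stringsAsFactors = FALSE)

# One absolute difference per level of `one`
pair_diff <- tapply(df$two, df$one, function(v) abs(diff(v)))

# Back to the two-column layout asked for in the question
result <- data.frame(one = names(pair_diff),
                     two = as.vector(pair_diff),
                     stringsAsFactors = FALSE)
```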

Here is another way to do it using dplyr
library(dplyr)
df <- data.frame(one = factor(c("a", "a", "b", "b")),
two = c(8,12,9,3),
three = factor(c("x", "y", "x", "y")))
answer <- df %>% group_by(one) %>% summarise(two = abs(diff(two)))
answer
Source: local data frame [2 x 2]
one two
(fctr) (dbl)
1 a 4
2 b 6

You can try:
library(data.table)
setDT(df)[, .(two = abs(diff(two))), .(one)]
With plyr package:
library(plyr)
ddply(df, 'one', summarise, two = abs(diff(two)))
one two
1 a 4
2 b 6

Bind data frames on longer identifiers R

I've got two data frames in which the unique identifiers common to both frames differ in the number of observations. I would like to create a dataframe from both in which the observations from each frame are taken if they have more observations for a common identifier. For example:
f1 <- data.frame(x = c("a", "a", "b", "c", "c", "c"), y = c(1,1,2,3,3,3))
f2 <- data.frame(x = c("a","b", "b", "c", "c"), y = c(4,5,5,6,6))
I would like this to generate a merge based on the longer x such that it produces:
x y
a 1
a 1
b 5
b 5
c 3
c 3
c 3
Any and all thoughts would be great.

Here's a solution using split
dd<-rbind(cbind(f1, s="f1"), cbind(f2, s="f2"))
keep<-unsplit(lapply(split(dd$s, dd$x), FUN=function(x) {
y<-table(x)
x == names(y[which.max(y)])
}), dd$x)
dd <- dd[keep,]
Normally I'd prefer to use the ave function here, but because I'm changing data types from a factor to a logical, it wasn't as appropriate, so I basically copied the idea that ave uses and used split.

dplyr solution
library(dplyr)
First we combine the data:
with rbind() and introduce a new variable called ref to know where each observation came from:
both <- rbind( f1, f2 )
both$ref <- rep( c( "f1", "f2" ) , c( nrow(f1), nrow(f2) ) )
then count the observations:
make another new variable that contains how many observations for each ref and x combination:
both_with_counts <- both %>%
group_by( ref ,x ) %>%
mutate( counts = n() )
then filter for the largest count:
both_with_counts %>% group_by( x ) %>% filter( counts==max(counts) )
Note: you could also select only the x and y columns with select(x,y)...
this gives:
## Source: local data frame [7 x 4]
## Groups: x
##
## x y ref counts
## 1 a 1 f1 2
## 2 a 1 f1 2
## 3 c 3 f1 3
## 4 c 3 f1 3
## 5 c 3 f1 3
## 6 b 5 f2 2
## 7 b 5 f2 2
Altogether now...
what_I_want <-
rbind(cbind(f1,ref = "f1"),cbind(f2,ref = "f2")) %>%
group_by(ref,x) %>%
mutate(counts = n()) %>%
group_by( x ) %>%
filter( counts==max(counts) ) %>%
select( x, y )
and thus:
what_I_want
# Source: local data frame [7 x 2]
# Groups: x
#
# x y
# 1 a 1
# 2 a 1
# 3 c 3
# 4 c 3
# 5 c 3
# 6 b 5
# 7 b 5

Not an elegant answer, but it still gives the desired result. Hope this helps.
f1table <- data.frame(table(f1$x))
colnames(f1table) <- c("x","freq")
f1new <- merge(f1,f1table)
f2table <- data.frame(table(f2$x))
colnames(f2table) <- c("x","freq")
f2new <- merge(f2,f2table)
table <- rbind(f1table, f2table)
table <- table[with(table, order(x,-freq)), ]
table <- table[!duplicated(table$x), ]
data <-rbind(f1new, f2new)
merge(data, table, by=c("x","freq"))[,c(1,3)]
x y
1 a 1
2 a 1
3 b 5
4 b 5
5 c 3
6 c 3
7 c 3
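The selection rule can also be written out directly in base R: for each identifier, keep the rows from whichever frame has more of them. This is only a sketch. Sending ties to `f1` is an assumption, since the question does not say what should happen when the counts are equal.

```r
f1 <- data.frame(x = c("a", "a", "b", "c", "c", "c"), y = c(1, 1, 2, 3, 3, 3),
                 stringsAsFactors = FALSE)
f2 <- data.frame(x = c("a", "b", "b", "c", "c"), y = c(4, 5, 5, 6, 6),
                 stringsAsFactors = FALSE)

ids <- sort(union(f1$x, f2$x))

# For each identifier, compare the row counts and keep the longer block.
pick <- lapply(ids, function(id) {
  r1 <- f1[f1$x == id, ]
  r2 <- f2[f2$x == id, ]
  if (nrow(r1) >= nrow(r2)) r1 else r2   # assumption: ties go to f1
})

result <- do.call(rbind, pick)
rownames(result) <- NULL
```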
