Rbind same data.frame with column switching - r

I am not new to R, but I cannot solve this problem: I have a data.frame and want to rbind it with a copy of itself in which the columns are switched. But R does not switch the columns.
Example:
set.seed(13)
df <- data.frame(var1 = sample(5), var2 = sample(5))
> df
var1 var2
1 4 1
2 1 3
3 2 4
4 5 2
5 3 5
> rbind(df, df[,c(2,1)])
var1 var2
1 4 1
2 1 3
3 2 4
4 5 2
5 3 5
6 4 1
7 1 3
8 2 4
9 5 2
10 3 5
As you can see, the columns are not switched (rows 6-10), whereas switching the columns alone works like a charm:
> df[,c(2,1)]
var2 var1
1 1 4
2 3 1
3 4 2
4 2 5
5 5 3
I guess this has something to do with the column names, but I cannot figure out what exactly.
Can anyone help?
Kind regards!

As pointed out by @Henrik, from ?rbind.data.frame: "The rbind data frame method [...] matches columns by name". The swapped copy is therefore realigned by name before it is appended. Rename its columns first:
> rbind(df, setNames(df[,c(2,1)], c("var1", "var2")))
var1 var2
1 4 1
2 1 3
3 2 4
4 5 2
5 3 5
6 1 4
7 3 1
8 4 2
9 2 5
10 5 3
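This also works, because matrices are bound by position, but note that the result is a matrix rather than a data.frame:
> rbind(as.matrix(df), as.matrix(df[,c(2,1)]))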
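Both approaches can be put side by side in a short sketch. The names swapped, out and out_m are only illustrative, and the sampled values depend on the R version's random number generator, so no output is shown:

```r
# Minimal sketch: append a column-swapped copy of df below df itself.
set.seed(13)
df <- data.frame(var1 = sample(5), var2 = sample(5))

# Swap the columns, then reuse the original names so that rbind(),
# which aligns data frame columns by name, keeps the swapped order.
swapped <- df[, c(2, 1)]
names(swapped) <- names(df)
out <- rbind(df, swapped)

# The matrix route binds by position; convert back if a data.frame is needed.
out_m <- as.data.frame(rbind(as.matrix(df), as.matrix(df[, c(2, 1)])))
```

In both results, rows 6-10 of var1 hold the original var2 values.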

Related

Subset rows, excluding special values

I want to subset rows which do not contain special values. For example:
df <- data.frame(a=c(1,2,2,3,4,4),b=c(-9999,2,3,4,5,6),c=c(2,3,4,-9999,2,4))
a b c
1 1 -9999 2
2 2 2 3
3 2 3 4
4 3 4 -9999
5 4 5 2
6 4 6 4
df has many rows and columns. I want to keep only the rows that don't contain -9999. The expected result is what the following code gives:
df[df$a != -9999 & df$b != -9999 & df$c != -9999, ]
a b c
2 2 2 3
3 2 3 4
5 4 5 2
6 4 6 4
When there are too many columns to type out a condition like the one above, how can I subset the rows?
You can try this one:
temp <- which(df == -9999, arr.ind = TRUE)
df[-unique(temp[,1]),]
a b c
2 2 2 3
3 2 3 4
5 4 5 2
6 4 6 4
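One caveat with negative indexing: if no row contains -9999, the index is empty and the subset comes back with zero rows. A sketch of an alternative that avoids this builds a logical mask with rowSums (keep, clean and none are illustrative names, and it assumes the data contain no NA values):

```r
df <- data.frame(a = c(1, 2, 2, 3, 4, 4),
                 b = c(-9999, 2, 3, 4, 5, 6),
                 c = c(2, 3, 4, -9999, 2, 4))

# df == -9999 is a logical matrix; a row is kept when it has no TRUE entries.
keep <- rowSums(df == -9999) == 0
clean <- df[keep, ]

# Pitfall of the negative-index version when nothing matches:
none <- df[2:3, ]                              # rows without any -9999
hits <- which(none == -9999, arr.ind = TRUE)   # empty index matrix
lost <- none[-unique(hits[, 1]), ]             # zero rows instead of two
safe <- none[rowSums(none == -9999) == 0, ]    # both rows are kept
```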

melt the lower half of a symmetric matrix in R

Given that I have a three by three symmetric matrix.
> x<-matrix(1:9,3)
> x[lower.tri(x)] = t(x)[lower.tri(x)]
> x
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 4 5 8
[3,] 7 8 9
Then I use the reshape2 library to convert it to long format.
> library(reshape2)
> x <- melt(x)
> x
Var1 Var2 value
1 1 1 1
2 2 1 4
3 3 1 7
4 1 2 4
5 2 2 5
6 3 2 8
7 1 3 7
8 2 3 8
9 3 3 9
As the upper and lower triangles are identical, I only need half of the result, which would look like this:
Var1 Var2 value
1 1 1
2 1 4
3 1 7
2 2 5
3 2 8
3 3 9
Any elegant approach to do this?
You can set the values in the lower or upper triangle to NA and then melt while dropping missing values. This assumes that the original matrix has no missing values, or that you don't need to keep them in the result if it does:
x[upper.tri(x)] = NA
reshape2::melt(x, na.rm=TRUE)
# Var1 Var2 value
#1 1 1 1
#2 2 1 4
#3 3 1 7
#5 2 2 5
#6 3 2 8
#9 3 3 9
As 'x' has already been reassigned to the melted result, we can sort the first two columns within each row, build a logical index of the non-duplicated pairs, and use it to subset the rows:
x[!duplicated(t(apply(x[1:2], 1, sort))),]
# Var1 Var2 value
#1 1 1 1
#2 2 1 4
#3 3 1 7
#5 2 2 5
#6 3 2 8
#9 3 3 9
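For comparison, here is a base R sketch that produces the same long table without reshape2 by indexing the lower triangle directly. The names idx and long are illustrative, and the column names are chosen to mirror melt's output:

```r
x <- matrix(1:9, 3)
x[lower.tri(x)] <- t(x)[lower.tri(x)]   # make x symmetric, as in the question

# Logical mask of the lower triangle, including the diagonal.
idx <- lower.tri(x, diag = TRUE)

# row() and col() give each cell's indices; the mask selects them column by column.
long <- data.frame(Var1 = row(x)[idx], Var2 = col(x)[idx], value = x[idx])
long
#   Var1 Var2 value
# 1    1    1     1
# 2    2    1     4
# 3    3    1     7
# 4    2    2     5
# 5    3    2     8
# 6    3    3     9
```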

How to reverse a column in R

I have a dataframe as described below. Now I want to reverse the order of column B without hampering the total order of the dataframe. So now the column B has 5,4,3,2,1. I want to change it to 1,2,3,4,5. I don't want to sort as it will hamper the total ordering.
A B C
1 5 6
2 4 8
3 3 5
4 2 5
5 1 3
You can replace just that column:
x$B <- rev(x$B)
On your data:
> x$B <- rev(x$B)
> x
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
transform is also handy for this:
> transform(x, B = rev(B))
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
This doesn't modify x so you need to assign the result to something (perhaps back to x).
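The question does not show how x was created, so the following self-contained sketch rebuilds it from the printed table (an assumption) and shows that transform leaves the original untouched:

```r
# x reconstructed from the table in the question.
x <- data.frame(A = 1:5, B = 5:1, C = c(6, 8, 5, 5, 3))

# transform() returns a modified copy; x itself is unchanged.
y <- transform(x, B = rev(B))

# Assigning to the column changes x in place.
x2 <- x
x2$B <- rev(x2$B)
```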

How to only keep the columns with same names between two data frames?

I have two data frames like the following:
a<-c(1,3,4,5,6,8)
b<-c(2,3,4,2,6,7)
c<-c(2,5,6,3,5,6)
df1<-data.frame(a,b,c)
d<-c(3,4,5,6,7,8)
e<-c(1,2,3,2,1,1)
c<-c(1,3,4,5,6,2)
df2<-data.frame(d,e,c)
> df1
a b c
1 1 2 2
2 3 3 5
3 4 4 6
4 5 2 3
5 6 6 5
6 8 7 6
> df2
d e c
1 3 1 1
2 4 2 3
3 5 3 4
4 6 2 5
5 7 1 6
6 8 1 2
I want to combine the two data frames and only keep the columns with the same names. The final data frame should look like this:
> df3
c1 c2
1 2 1
2 5 3
3 6 4
4 3 5
5 5 6
6 6 2
My real data frames have hundreds of columns, so I need code to do this job. Can anyone help me?
Find out which names belong to both dataframes and then bind them:
eqnames <- names(df1)[names(df1) %in% names(df2)]
df3 <- cbind(df1[eqnames], df2[eqnames])
You can then rename the columns:
names(df3) <- paste0(names(df3), 1:ncol(df3))
Resulting in:
> df3
c1 c2
1 2 1
2 5 3
3 6 4
4 3 5
5 5 6
6 6 2
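With several shared columns, numbering the combined names from 1 to ncol(df3) no longer shows which data frame a column came from. A sketch of one alternative is to suffix each side before binding (shared is an illustrative name, and the suffixes 1 and 2 are a choice, not something the question requires):

```r
df1 <- data.frame(a = c(1, 3, 4, 5, 6, 8),
                  b = c(2, 3, 4, 2, 6, 7),
                  c = c(2, 5, 6, 3, 5, 6))
df2 <- data.frame(d = c(3, 4, 5, 6, 7, 8),
                  e = c(1, 2, 3, 2, 1, 1),
                  c = c(1, 3, 4, 5, 6, 2))

# Column names present in both data frames.
shared <- intersect(names(df1), names(df2))

# Suffix the columns from df1 with 1 and those from df2 with 2, then bind.
df3 <- cbind(setNames(df1[shared], paste0(shared, 1)),
             setNames(df2[shared], paste0(shared, 2)))
```

For the example data this gives the columns c1 and c2, as in the expected result.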

remove (i+1)th term if reoccurring

Say we have the following data
A <- c(1,2,2,2,3,4,8,6,6,1,2,3,4)
B <- c(1,2,3,4,5,1,2,3,4,5,1,2,3)
data <- data.frame(A,B)
How would one write a function so that, for A, if the same value appears again in the (i+1)th position, the repeated row is removed?
Therefore the output should look like
data.frame(c(1,2,3,4,8,6,1,2,3,4), c(1,2,5,1,2,3,5,1,2,3))
My best guess would be to use a for statement; however, I have no experience with these.
You can try
data[c(TRUE, data[-1,1]!= data[-nrow(data), 1]),]
Another option, dplyr-esque:
library(dplyr)
dat1 <- data.frame(A=c(1,2,2,2,3,4,8,6,6,1,2,3,4),
B=c(1,2,3,4,5,1,2,3,4,5,1,2,3))
# keep the first row, then every row whose A differs from the previous row's A
dat1 %>% filter(row_number() == 1 | A != lag(A))
## A B
## 1 1 1
## 2 2 2
## 3 3 5
## 4 4 1
## 5 8 2
## 6 6 3
## 7 1 5
## 8 2 1
## 9 3 2
## 10 4 3
Using diff, which calculates the differences between consecutive elements (lag of 1):
data[c( TRUE, diff(data[,1]) != 0), ]
output:
A B
1 1 1
2 2 2
5 3 5
6 4 1
7 8 2
8 6 3
10 1 5
11 2 1
12 3 2
13 4 3
Using rle:
A <- c(1,2,2,2,3,4,8,6,6,1,2,3,4)
B <- c(1,2,3,4,5,1,2,3,4,5,1,2,3)
data <- data.frame(A,B)
X <- rle(data$A)
Y <- cumsum(c(1, X$lengths[-length(X$lengths)]))
View(data[Y, ])
row.names A B
1 1 1 1
2 2 2 2
3 5 3 5
4 6 4 1
5 7 8 2
6 8 6 3
7 10 1 5
8 11 2 1
9 12 3 2
10 13 4 3
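Since the question asks for a function, the vectorised comparison can be wrapped as a small helper. This is only a sketch: drop_repeats is a hypothetical name, and it assumes the column contains no NA values:

```r
# Drop every row whose value in `col` equals the value in the row just above it.
drop_repeats <- function(d, col) {
  if (nrow(d) < 2) return(d)               # nothing to compare
  v <- d[[col]]
  # Always keep the first row; keep later rows only where the value changes.
  keep <- c(TRUE, v[-1] != v[-length(v)])
  d[keep, , drop = FALSE]
}

A <- c(1, 2, 2, 2, 3, 4, 8, 6, 6, 1, 2, 3, 4)
B <- c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3)
data <- data.frame(A, B)

out <- drop_repeats(data, "A")
```

The result keeps the original row names (1, 2, 5, 6, 7, 8, 10, 11, 12, 13), the same as the diff and rle versions.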
