From a data frame, I need to find the columns whose values are identical to those of another column across all rows, and extract exactly those columns.
I have the following dataframe:
a <- c(1,2,3)
b <- c(2,2,3)
c <- c(4,5,6)
d <- c(1,2,3)
A <- data.frame(a,b,c,d)
> A
a b c d
1 1 2 4 1
2 2 2 5 2
3 3 3 6 3
I would like the following result:
> columnInnerJoin(A)
a d
1 1 1
2 2 2
3 3 3
Or, more specifically:
> columnInnerJoinGiveColumns(A)
a d
We can try with duplicated
res <- A[duplicated(as.list(A))|duplicated(as.list(A), fromLast=TRUE)]
names(res)
#[1] "a" "d"
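The same idea can be wrapped in a small function. This is only a sketch: the helper name shared_columns is made up for illustration, and the data frame is rebuilt here so that it matches the printed table in the question.

```r
# Rebuild the example so it matches the printed table (columns a, b, c, d)
a <- c(1, 2, 3); b <- c(2, 2, 3); c <- c(4, 5, 6); d <- c(1, 2, 3)
A <- data.frame(a, b, c, d)

# Hypothetical helper: names of the columns that have at least one
# identical twin elsewhere in the data frame
shared_columns <- function(df) {
  cols <- as.list(df)
  # duplicated() only flags the later copy, so scan from both ends
  dup <- duplicated(cols) | duplicated(cols, fromLast = TRUE)
  names(df)[dup]
}

shared_columns(A)
# [1] "a" "d"

# Subset the data frame to those columns
A[shared_columns(A)]
```

If several separate groups of identical columns exist, this returns all of them together rather than one group at a time.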
I have a tricky merge that I usually do in Excel via various formulas and I want to automate with R.
I have 2 dataframes, one called inputs looks like this:
id v1 v2 v3
1 A A C
2 B D F
3 T T A
4 A F C
5 F F F
And another called df
id v
1 1
1 2
1 3
2 2
3 1
I would like to combine them based on the id and v values such that I get
id v key
1 1 A
1 2 A
1 3 C
2 2 D
3 1 T
So I'm matching on id and then on one of the columns v1 through v3: in the first example, I match id = 1 and column v1 because the value of v equals 1. In Excel I do this by creatively combining VLOOKUP and HLOOKUP, but I want to make this simpler in R. The example dataframes are simplified versions, as I have more records and the columns go from v1 up to v50.
Thanks!
You could use pivot_longer:
library(tidyr)
library(dplyr)
key %>% pivot_longer(!id,names_prefix='v',names_to = 'v') %>%
mutate(v=as.numeric(v)) %>%
inner_join(df)
Joining, by = c("id", "v")
# A tibble: 5 × 3
id v value
<int> <dbl> <chr>
1 1 1 A
2 1 2 A
3 1 3 C
4 2 2 D
5 3 1 T
Data:
key <- read.table(text="
id v1 v2 v3
1 A A C
2 B D F
3 T T A
4 A F C
5 F F F",header=T)
df <- read.table(text="
id v
1 1
1 2
1 3
2 2
3 1 ",header=T)
You can use a two-column matrix as the index argument to "[", so this is a one-liner. (Note that the data objects are named d1 and d2 here, d1 being the lookup table and d2 the id/v table. I'm opposed to using df as a data object name.)
d1[-1][ data.matrix(d2)] # returns [1] "A" "A" "C" "D" "T"
So full solution is:
cbind( d2, key= d1[-1][ data.matrix(d2)] )
id v key
1 1 1 A
2 1 2 A
3 1 3 C
4 2 2 D
5 3 1 T
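The one-liner above relies on each id being equal to its row number in d1. If that might not hold in the real data, a slightly longer sketch can look the row up with match() first. The names vals and idx are introduced here only for illustration.

```r
# Same example data, named d1 (lookup table) and d2 (id/v pairs)
d1 <- read.table(text = "
id v1 v2 v3
1 A A C
2 B D F
3 T T A
4 A F C
5 F F F", header = TRUE)
d2 <- read.table(text = "
id v
1 1
1 2
1 3
2 2
3 1", header = TRUE)

# Character matrix holding only the v1..v3 columns
vals <- as.matrix(d1[-1])

# Two-column index matrix: row position of each id in d1, column position v
idx <- cbind(match(d2$id, d1$id), d2$v)

# Each row of idx picks exactly one cell of vals
d2$key <- vals[idx]
d2$key
# [1] "A" "A" "C" "D" "T"
```

An id that is missing from d1 would produce NA in the key column rather than an error.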
Try this:
x <- "
id v1 v2 v3
1 A A C
2 B D F
3 T T A
4 A F C
5 F F F
"
y <- "
id v
1 1
1 2
1 3
2 2
3 1
"
df <- read.table(textConnection(x) , header = TRUE)
df2 <- read.table(textConnection(y) , header = TRUE)
key <- c()
for (i in 1:nrow(df2)) {
key <- append(df[df2$id[i],(df2$v[i] + 1L)] , key)
}
df2$key <- rev(key)
df2
># id v key
># 1 1 1 A
># 2 1 2 A
># 3 1 3 C
># 4 2 2 D
># 5 3 1 T
Created on 2022-06-06 by the reprex package (v2.0.1)
mydata <- data.frame(a = 2, b = 3, c = 3)
myvec <- c(2, 9, 1)
I would like to column bind mydata with myvec. I want the final output to look something like this:
> mydata
a b c myvec1 myvec2 myvec3
1 2 3 3 2 9 1
However, if I simply use cbind, I don't get the desired result:
> cbind(mydata, myvec)
a b c myvec
1 2 3 3 2
2 2 3 3 9
3 2 3 3 1
One way is to iterate over the entries in myvec with a for loop. Is there a simpler way?
We can convert to list
cbind(mydata, setNames(as.list(myvec), paste0('myvec', seq_along(myvec))))
# a b c myvec1 myvec2 myvec3
#1 2 3 3 2 9 1
Or another option is
mydata[paste0('myvec', seq_along(myvec))] <- myvec
You could transpose the vector :
cbind(mydata, t(myvec))
# a b c 1 2 3
#1 2 3 3 2 9 1
You can name the columns with setNames or names<-
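A minimal sketch of that naming step, assuming the myvec1, myvec2, ... pattern from the question is what is wanted (the names wide and result are just for illustration):

```r
mydata <- data.frame(a = 2, b = 3, c = 3)
myvec <- c(2, 9, 1)

# t() turns the vector into a 1 x 3 matrix; convert it to a one-row data frame
wide <- as.data.frame(t(myvec))

# Replace the default column names with myvec1, myvec2, myvec3
names(wide) <- paste0("myvec", seq_along(myvec))

result <- cbind(mydata, wide)
names(result)
# [1] "a"      "b"      "c"      "myvec1" "myvec2" "myvec3"
```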
I'm trying to produce all possible row permutations of a data frame (or matrix if that's easier) and have an object returned as a list or array of the data frames/matrices. I've constructed a mock dataframe that has the same dimensions as the one I'm working with.
test.df <- as.data.frame(matrix(1:80,nrow=16,ncol=5))
Edit: changed combinations to permutations
v.df <- data.frame(symbol = c("a", "b", "c"), number = c(1,2,3))
v.df
## symbol number
## 1 a 1
## 2 b 2
## 3 c 3
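library(gtools) # provides permutations()
permutate.rows <- function(df) {
  k <- nrow(df) # number of rows
  # each column of index.df is one permutation of the row indices
  index.df <- as.data.frame(t(permutations(n = k, r = k, v = 1:k)))
  # return one reordered copy of df per permutation
  lapply(index.df, function(idx) df[idx, , drop = FALSE])
}
permutate.rows(v.df)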
gives the list of all permuted dfs:
$V1
symbol number
1 a 1
2 b 2
3 c 3
$V2
symbol number
1 a 1
3 c 3
2 b 2
$V3
symbol number
2 b 2
1 a 1
3 c 3
$V4
symbol number
2 b 2
3 c 3
1 a 1
$V5
symbol number
3 c 3
1 a 1
2 b 2
$V6
symbol number
3 c 3
2 b 2
1 a 1
To apply it to your example, pass your own 16-row data frame instead of the 3-row one.
I shortened the df because 16! = 20922789888000
library(purrr)
library(combinat)
test.df <- as.data.frame(matrix(1:25,nrow=5,ncol=5))
map(permn(1:nrow(test.df)), function(x) test.df[x,])
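Since enumerating every permutation of 16 rows is not practical, one alternative is to draw a few random row permutations instead. Note that this is a different technique (sampling, not full enumeration), and the choice of 3 draws and the name sampled are arbitrary assumptions for the sketch.

```r
test.df <- as.data.frame(matrix(1:80, nrow = 16, ncol = 5))

# Number of distinct row orderings; far too many to enumerate
factorial(nrow(test.df))

# Draw 3 random row permutations; sample(n) returns a random ordering of 1..n
set.seed(1) # only to make the draw reproducible
sampled <- replicate(3, test.df[sample(nrow(test.df)), ], simplify = FALSE)

# Each element is a data frame with the same rows in a shuffled order
length(sampled)
# [1] 3
```

Random draws can repeat an ordering, although with 16 rows that is extremely unlikely.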
I have a data frame as follows:
df <- data.frame(x=c('a,b,c','d,e','f'),y=c(1,2,3))
> df
x y
1 a,b,c 1
2 d,e 2
3 f 3
I can get the flattened df$x like this:
unique(unlist(strsplit(as.character(df$x), ",")))
[1] "a" "b" "c" "d" "e" "f"
What would be the best way to transform my input df into:
x y
a 1
b 1
c 1
d 2
e 2
f 3
Basically flatten df$x and individually assign its corresponding y
If you are working on data.frame, I recommend using tidyr
df <- data.frame(x=c('a,b,c','d,e','f'),y=c(1,2,3),stringsAsFactors = F)
library(tidyr)
df %>%
transform(x= strsplit(x, ",")) %>%
unnest(x)
y x
1 1 a
2 1 b
3 1 c
4 2 d
5 2 e
6 3 f
sapply(unlist(strsplit(as.character(df$x), ",")), function(ss)
df$y[which(grepl(pattern = ss, x = df$x))])
#a b c d e f
#1 1 1 2 2 3
If you want a dataframe
do.call(rbind, lapply(1:NROW(df), function(i)
setNames(data.frame(unlist(strsplit(as.character(df$x[i]), ",")), df$y[i]),
names(df))))
# x y
#1 a 1
#2 b 1
#3 c 1
#4 d 2
#5 e 2
#6 f 3
FWIW, you could also repeat the row indices according to how many elements each x value has:
df <- data.frame(x=c('a,b,c','d,e','f'),y=c(1,2,3),stringsAsFactors = F)
df[,1] <- strsplit(df[,1],",")
cbind(x=unlist(df[,1]),df[rep(1:nrow(df), lengths(df[,1])),-1,F])
# x y
# 1 a 1
# 1.1 b 1
# 1.2 c 1
# 2 d 2
# 2.1 e 2
# 3 f 3
For example, now I get the table
A B C
A 0 4 1
B 2 1 3
C 5 9 6
I would like to order the columns and rows by my own defined order, to achieve
B A C
B 1 2 3
A 4 0 1
C 9 5 6
This can be accomplished in base R. First we make the example data:
# make example data
df.text <- 'A B C
0 4 1
2 1 3
5 9 6'
df <- read.table(text = df.text, header = T)
rownames(df) <- LETTERS[1:3]
A B C
A 0 4 1
B 2 1 3
C 5 9 6
Then we simply re-order the columns and rows by indexing with a vector of row and column names:
# re-order data
defined.order <- c('B', 'A', 'C')
df <- df[, defined.order]
df <- df[defined.order, ]
B A C
B 1 2 3
A 4 0 1
C 9 5 6
If the defined order is given as
defined_order <- c("B", "A", "C")
and the initial table is created by
library(data.table)
# create data first
dt <- fread("
id A B C
A 0 4 1
B 2 1 3
C 5 9 6")
# note that row names are added as own id column
then you could achieve the desired result using data.table as follows:
# change column order
setcolorder(dt, c("id", defined_order))
# change row order: look up each name of defined_order in the id column
dt[match(defined_order, id)]
# id B A C
# 1: B 1 2 3
# 2: A 4 0 1
# 3: C 9 5 6
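For an order that is not simply a swap of two names, it helps to see why the row positions should come from match() rather than order(). The following base R sketch uses the same numbers as the question but a different, made-up order (C, A, B) and a plain matrix m; both are assumptions chosen only for illustration.

```r
# Same table as in the question, stored as a matrix with dimnames
m <- matrix(c(0, 2, 5, 4, 1, 9, 1, 3, 6), nrow = 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Hypothetical custom order
defined_order <- c("C", "A", "B")

# Indexing by name reorders rows and columns in one step
reordered <- m[defined_order, defined_order]
reordered
#   C A B
# C 6 5 9
# A 1 0 4
# B 3 2 1

# match() gives the row positions that produce the defined order
match(defined_order, rownames(m))
# [1] 3 1 2

# order() sorts the names alphabetically, which is a different permutation
rownames(m)[order(defined_order)]
# [1] "B" "C" "A"
```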