Join 3 columns of different lengths in R

I have 3 columns: 2 are the same length and 1 is shorter.
Here are the columns:
column1 <- letters[1:10]
column2 <- letters[1:15]
column3 <- letters[1:15]
I want all 3 columns joined together, with the 5 missing values in column1 filled with NA.
What can I do to achieve this? Would a tibble work?

You can change the length of a vector by assigning to length(); the added positions are filled with NA:
column1 <- letters[1:10]
column2 <- letters[1:15]
length(column1) <- length(column2)
Now
> column1
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" NA NA NA NA NA
We can wrap it in a function:
cbind_dif <- function(x = list()) {
  # Find the maximum length among the vectors
  max_length <- max(lengths(x))
  # Pad each vector with NA up to max_length
  res <- lapply(x, function(v) {
    length(v) <- max_length
    v
  })
  as.data.frame(res)
}
# Example usage:
> cbind_dif(list(column1 = column1, column2 = column2))
column1 column2
1 a a
2 b b
3 c c
4 d d
5 e e
6 f f
7 g g
8 h h
9 i i
10 j j
11 <NA> k
12 <NA> l
13 <NA> m
14 <NA> n
15 <NA> o

n <- max(length(column1), length(column2), length(column3))
data.frame(column1[1:n],column2[1:n],column3[1:n])
column1.1.n. column2.1.n. column3.1.n.
1 a a a
2 b b b
3 c c c
4 d d d
5 e e e
6 f f f
7 g g g
8 h h h
9 i i i
10 j j j
11 <NA> k k
12 <NA> l l
13 <NA> m m
14 <NA> n n
15 <NA> o o
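The same indexing trick can be written with named arguments so the result keeps readable column names. This is a minimal sketch, assuming the three vectors from the question; the name `joined` is only illustrative. Indexing a vector past its end returns NA, which is what pads the shorter column.

```r
column1 <- letters[1:10]
column2 <- letters[1:15]
column3 <- letters[1:15]

# Longest of the three vectors
n <- max(length(column1), length(column2), length(column3))

# Indexing beyond a vector's length yields NA, so column1 is padded
joined <- data.frame(
  column1 = column1[seq_len(n)],
  column2 = column2[seq_len(n)],
  column3 = column3[seq_len(n)],
  stringsAsFactors = FALSE
)
```

Naming the arguments avoids the automatically generated names such as column1.1.n.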

Using cbind.fill from the rowr package you can do it easily. Pass fill = NA so the shorter column is padded with NA rather than recycled:
library(rowr)
new <- cbind.fill(column1, column2, column3, fill = NA)
I hope this helps

library(tibble)
column1 <- letters[1:10]
column2 <- letters[1:15]
column3 <- letters[1:15]
# Pad column1 with NA up to the length of column2 before building the tibble
tibble(a = c(column1, rep(NA, length(column2) - length(column1))), b = column2, c = column3)
# A tibble: 15 × 3
a b c
<chr> <chr> <chr>
1 a a a
2 b b b
3 c c c
4 d d d
5 e e e
6 f f f
7 g g g
8 h h h
9 i i i
10 j j j
11 NA k k
12 NA l l
13 NA m m
14 NA n n
15 NA o o

Related

Getting the length of a list

I am attempting to decipher a list res which has structure as per below:
How would I go about converting this to a 21 (row) by 2 (column) dataframe?
I can do it by manually hard-coding the 21:
data.frame(matrix(unlist(res), nrow=21 ))
However, I would like to use length(res), which unfortunately returns 1.

As it is a list, use [[ to index it and get the matrix, then convert that to a data frame.
data.frame(res[[1]])
Or use unlist with recursive = FALSE
data.frame(unlist(res[[1]], recursive = FALSE))
Using a reproducible example:
res <- list(matrix(letters,ncol = 2))
data.frame(res[[1]])
# X1 X2
#1 a n
#2 b o
#3 c p
#4 d q
#5 e r
#6 f s
#7 g t
#8 h u
#9 i v
#10 j w
#11 k x
#12 l y
#13 m z

You can also use magrittr::extract2:
res %>% magrittr::extract2(1)
## A tibble: 21 x 2
# V1 V2
# <chr> <chr>
# 1 O M
# 2 W S
# 3 C Q
# 4 L C
# 5 M K
# 6 R M
# 7 U Q
# 8 I T
# 9 K J
#10 H V
## … with 11 more rows
Or use purrr::flatten_dfc:
purrr::flatten_dfc(res)
## A tibble: 21 x 2
# V1 V2
# <chr> <chr>
# 1 O M
# 2 W S
# 3 C Q
# 4 L C
# 5 M K
# 6 R M
# 7 U Q
# 8 I T
# 9 K J
#10 H V
## … with 11 more rows
Sample data:
library(tibble)
set.seed(2018)
res <- list(
  as_tibble(matrix(sample(LETTERS, 21 * 2, replace = TRUE), nrow = 21, ncol = 2))
)

R: more efficient solution than this for-loop

I wrote a working for loop, but it's slow over thousands of rows and I'm looking for a more efficient alternative. Thanks in advance!
The task:
If column a matches column b, column d becomes NA.
If column a does not match b, but b matches c, then column e becomes NA.
The for loop:
for (i in 1:nrow(data)) {
if (data$a[i] == data$b[i]) {data$d[i] <- NA}
if (!(data$a[i] == data$b[i]) & data$b[i] == data$c[i])
{data$e[i] <- NA}
}
An example:
a b c d e
F G G 1 10
F G F 5 10
F F F 2 8
Would become:
a b c d e
F G G 1 NA
F G F 5 10
F F F NA 8

If you're concerned about speed and efficiency, I'd recommend data.table (though vectorizing a normal data.frame, as recommended by @parfait, would probably speed things up more than enough).
library(data.table)
DT <- fread("a b c d e
F G G 1 10
F G F 5 10
F F F 2 8")
print(DT)
# a b c d e
# 1: F G G 1 10
# 2: F G F 5 10
# 3: F F F 2 8
DT[a == b, d := NA]
DT[a != b & b == c, e := NA]
print(DT)
# a b c d e
# 1: F G G 1 NA
# 2: F G F 5 10
# 3: F F F NA 8

Suppose df is your data; then the comparisons can be vectorized:
ab <- with(df, a==b)
bc <- with(df, b==c)
df$d[ab] <- NA
df$e[!ab & bc] <- NA
which would result in
# a b c d e
# 1 F G G 1 NA
# 2 F G F 5 10
# 3 F F F NA 8
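For completeness, here is a self-contained version of the vectorized approach that can be run as is. It is a sketch that rebuilds the three example rows from the question by hand; the data frame name df follows the answer above.

```r
# Example rows from the question
df <- data.frame(
  a = c("F", "F", "F"),
  b = c("G", "G", "F"),
  c = c("G", "F", "F"),
  d = c(1, 5, 2),
  e = c(10, 10, 8),
  stringsAsFactors = FALSE
)

# Whole-column comparisons replace the row-by-row loop
ab <- df$a == df$b
bc <- df$b == df$c

df$d[ab] <- NA          # a matches b
df$e[!ab & bc] <- NA    # a differs from b, but b matches c
```

Each comparison runs once over the full column, so the cost no longer grows with a per-row loop in R code.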

We could create a list of quosures and evaluate them:
library(tidyverse)
qs <- setNames(quos(d*NA^(a == b), e*NA^((!(a ==b) & (b == c)))), c("d", "e"))
df1 %>%
mutate(!!! qs)
# a b c d e
#1 F G G 1 NA
#2 F G F 5 10
#3 F F F NA 8

R - Adding a total row in Excel output

I want to add a total row (as in the Excel tables) while writing my data.frame in a worksheet.
Here is my present code (using openxlsx):
writeDataTable(wb=WB, sheet="Data", x=X, withFilter=F, bandedRows=F, firstColumn=T)
X is a data.frame with 8 character variables and 1 numeric variable, so the total row should contain a total only for the numeric column. Ideally I would enable Excel's built-in total row feature when writing the table to the workbook object, as I did with firstColumn, rather than adding a total row manually.
I searched for a solution on StackOverflow and in the official openxlsx documentation, but to no avail. Please suggest solutions using openxlsx.
EDIT:
Adding data sample:
A B C D E F G H I
a b s r t i s 5 j
f d t y d r s 9 s
w s y s u c k 8 f
After Total row:
A B C D E F G H I
a b s r t i s 5 j
f d t y d r s 9 s
w s y s u c k 8 f
na na na na na na na 22 na

library(janitor)
adorn_totals(df, "row")
#> A B C D E F G H I
#> a b s r t i s 5 j
#> f d t y d r s 9 s
#> w s y s u c k 8 f
#> Total - - - - - - 22 -
If you prefer empty space instead of - in the character columns you can specify fill = "" or fill = NA.

Assuming your data is stored in a data.frame called df:
df <- read.table(text =
"A B C D E F G H I
a b s r t i s 5 j
f d t y d r s 9 s
w s y s u c k 8 f",
header = TRUE,
stringsAsFactors = FALSE)
You can create a totals row using lapply():
totals <- lapply(df, function(col) {
  # Sum numeric columns; leave every other column as NA
  if (is.numeric(col)) sum(col) else NA
})
and add it to df using rbind():
df <- rbind(df, totals)
head(df)
A B C D E F G H I
1 a b s r t i s 5 j
2 f d t y d r s 9 s
3 w s y s u c k 8 f
4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> 22 <NA>
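The two steps above can be folded into one helper, so the totals row is appended just before the data is written out. This is a sketch only: add_total_row is a hypothetical name, not an openxlsx function, and it produces a plain extra row rather than Excel's built-in total row feature.

```r
# Hypothetical helper: append a row holding the sum of each numeric column
add_total_row <- function(df) {
  totals <- lapply(df, function(col) {
    if (is.numeric(col)) sum(col, na.rm = TRUE) else NA
  })
  rbind(df, totals)
}

# Sample data from the question
df <- data.frame(
  A = c("a", "f", "w"), B = c("b", "d", "s"), C = c("s", "t", "y"),
  D = c("r", "y", "s"), E = c("t", "d", "u"), F = c("i", "r", "c"),
  G = c("s", "s", "k"), H = c(5, 9, 8), I = c("j", "s", "f"),
  stringsAsFactors = FALSE
)

with_total <- add_total_row(df)
```

The result could then be passed as x to the writeDataTable() call shown in the question.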

Getting the maximum common words in R

I have data of the form:
ID A1 A2 A3 ... A100
1 john max karl ... kevin
2 kevin bosy lary ... rosy
3 karl lary bosy ... hale
.
.
.
10000 isha john lewis ... dave
For each ID, I want to find the one other ID that shares the maximum number of common attribute values (A1, A2, ..., A100) with it.
How can I do this in R?
Edit: Let's call the output a MatchId:
ID MatchId
1 70
2 4000
.
.
10000 3000

I think this gets what you're looking for:
library(dplyr)
# make up some data
set.seed(1492)
bind_rows(lapply(1:15, function(i) {
x <- cbind.data.frame(stringsAsFactors=FALSE, i, t(sample(LETTERS, 10)))
colnames(x) <- c("ID", sprintf("A%d", 1:10))
x
})) -> dat
print(dat)
## Source: local data frame [15 x 11]
##
## ID A1 A2 A3 A4 A5 A6 A7 A8 A9 A10
## 1 1 H F E C B A R J Z N
## 2 2 Q P E M L Z C G V Y
## 3 3 Q J D N B T L K G Z
## 4 4 D Y U F V O I C A W
## 5 5 T Z D I J F R C B S
## 6 6 Q D H U P V O E R N
## 7 7 C L I M E K N S X Z
## 8 8 M J S E N O F Y X I
## 9 9 R H V N M T Q X L S
## 10 10 Q H L Y B W S M P X
## 11 11 M N J K B G S X V R
## 12 12 W X A H Y D N T Q I
## 13 13 K H V J D X Q W A U
## 14 14 M U F H S T W Z O N
## 15 15 G B U Y E L A Q W O
# get commons
bind_rows(lapply(1:15, function(i) {
bind_rows(lapply(setdiff(1:15, i), function(j) {
data.frame(id1=i,
id2=j,
common=length(intersect(c(t(dat[i, 2:11])),
c(t(dat[j, 2:11])))))
}))
})) -> commons
commons %>%
group_by(id1) %>%
top_n(1, common) %>%
filter(row_number()==1) %>%
select(ID=id1, MatchId=id2)
## Source: local data frame [15 x 2]
## Groups: ID
##
## ID MatchId
## 1 1 5
## 2 2 7
## 3 3 5
## 4 4 12
## 5 5 1
## 6 6 9
## 7 7 8
## 8 8 7
## 9 9 10
## 10 10 9
## 11 11 9
## 12 12 13
## 13 13 12
## 14 14 8
## 15 15 2

Using similar data as provided by @hrbrmstr:
set.seed(1492)
dat <- do.call(rbind, lapply(1:15, function(i) {
x <- cbind.data.frame(stringsAsFactors=FALSE, i, t(sample(LETTERS, 10)))
colnames(x) <- c("ID", sprintf("A%d", 1:10))
x
}))
You could achieve the same using base R only:
Res <- sapply(seq_len(nrow(dat)),
function(x) apply(dat[-1], 1,
function(y) length(intersect(dat[x, -1], y))))
diag(Res) <- -1
cbind(dat[1], MatchId = max.col(Res, ties.method = "first"))
# ID MatchId
# 1 1 5
# 2 2 7
# 3 3 5
# 4 4 12
# 5 5 1
# 6 6 9
# 7 7 8
# 8 8 7
# 9 9 10
# 10 10 9
# 11 11 9
# 12 12 13
# 13 13 12
# 14 14 8
# 15 15 2
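The pairwise idea behind both answers can be spelled out step by step on a tiny data set. This is a sketch under assumptions: the four rows are made up from names in the question, and the variable names rows, common and matches are illustrative. It counts shared values for every pair of IDs, rules out self-matches, and picks the first best match per row.

```r
# Small made-up data in the shape of the question
dat <- data.frame(
  ID = 1:4,
  A1 = c("john", "kevin", "karl", "john"),
  A2 = c("max", "bosy", "lary", "max"),
  A3 = c("karl", "lary", "bosy", "dave"),
  stringsAsFactors = FALSE
)

# One character vector of attribute values per ID
rows <- lapply(seq_len(nrow(dat)), function(i) unlist(dat[i, -1], use.names = FALSE))

# common[i, j] = number of values shared by row i and row j
n <- length(rows)
common <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    common[i, j] <- length(intersect(rows[[i]], rows[[j]]))
  }
}

# Exclude self-matches, then take the first column with the row maximum
diag(common) <- -1
matches <- data.frame(ID = dat$ID,
                      MatchId = dat$ID[max.col(common, ties.method = "first")])
```

The full n-by-n matrix is fine for small inputs; with 10000 IDs it holds 10^8 entries, so memory and run time become a real concern.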

If I understand correctly, the requirement is to obtain the maximum number of common attributes for each ID.
A frequency table can be built for each ID with table() inside lapply(), assuming the ID column is unique (if it is not, iterate over unique(df$ID) rather than df$ID). The maximum frequency is taken and, if there is a tie, only the first one is kept. Finally the per-ID results are combined with do.call().
df <- read.table(header = TRUE, text = "
ID A1 A2 A3 A100
1 john max karl kevin
2 kevin bosy lary rosy
3 karl lary bosy hale
10000 isha john lewis dave")
do.call(rbind, lapply(df$ID, function(x) {
tbl <- table(unlist(df[df$ID == x, 2:ncol(df)]))
data.frame(ID = x, MatchId = tbl[tbl == max(tbl)][1])
}))
# ID MatchId
#john 1 1
#kevin 2 1
#karl 3 1
#isha 10000 1

Subsetting character data in R

I have a data frame with several columns of varied character data. I want to find the average of each combination of that character data. I think I'm closing in on a solution, but am having trouble figuring out how to loop over characters. An example bit of data would be like:
Var1 Var2 Var3 M1
a w j 20
a w j 15
a w k 10
a w j 0
b x L 30
b x L 10
b y k 20
b y k 15
c z j 20
c z j 10
c z k 11
c w l 45
a d j 20
a d k 4
a d l 23
a d k 11
And trying to get it in the form of:
P1 P2 P3 Avg
a w j 11.667
a w k 10
a d j 20
a d k 7.5
a d l 23
b x L 20
b y k 17.5
c z j 15
c z k 11
c w l 45
I think the idea is something like:
test <- read.table("clipboard",header=T)
newdata <- subset(test,
Var1=='a'
& Var2=='w'
& Var3=='j',
select=M1
)
row.names(newdata)<-NULL
newdata2 <- as.data.frame(matrix(data=NA,nrow=3,ncol=4))
names(newdata2) <- c("P1","P2","P3","Avg")
newdata2[1,1] <- 'a'
newdata2[1,2] <- 'w'
newdata2[1,3] <- 'j'
newdata2[1,4] <- mean(newdata$M1)
Which works for the first line, but I'm not entirely sure how to automate this to loop over each character combination across the columns. Unless, of course, there's a similar apply-like function to use in this case?

library(dplyr)
newdata2 = summarise(group_by(test,Var1,Var2,Var3),Avg=mean(M1))
And the result:
> newdata2
Source: local data frame [10 x 4]
Groups: Var1, Var2
Var1 Var2 Var3 Avg
1 a d j 20.00000
2 a d k 7.50000
3 a d l 23.00000
4 a w j 11.66667
5 a w k 10.00000
6 b x L 20.00000
7 b y k 17.50000
8 c w l 45.00000
9 c z j 15.00000
10 c z k 11.00000

Using the base aggregate function:
mydata <- read.table(header=TRUE, text="
Var1 Var2 Var3 M1
a w j 20
a w j 15
a w k 10
a w j 0
b x L 30
b x L 10
b y k 20
b y k 15
c z j 20
c z j 10
c z k 11
c w l 45
a d j 20
a d k 4
a d l 23
a d k 11")
aggdata <- aggregate(mydata$M1, by = list(mydata$Var1, mydata$Var2, mydata$Var3), FUN = mean, na.rm = TRUE)
Output:
> aggdata
Group.1 Group.2 Group.3 x
1 a d j 20.00000
2 a w j 11.66667
3 c z j 15.00000
4 a d k 7.50000
5 a w k 10.00000
6 b y k 17.50000
7 c z k 11.00000
8 a d l 23.00000
9 c w l 45.00000
10 b x L 20.00000
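aggregate() also has a formula interface, which keeps the original column names instead of Group.1, Group.2 and x. This is a minimal sketch on a subset of the question's rows; renaming the result column to Avg is an illustrative choice that matches the desired output.

```r
# A few rows from the question's data
mydata <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Var1 Var2 Var3 M1
a w j 20
a w j 15
a w k 10
a w j 0
a d k 4
a d k 11")

# Mean of M1 for every combination of Var1, Var2 and Var3
aggdata <- aggregate(M1 ~ Var1 + Var2 + Var3, data = mydata, FUN = mean)

# Rename the value column to match the desired output
names(aggdata)[names(aggdata) == "M1"] <- "Avg"
```

Only combinations that actually occur in the data appear in the result.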
