I have a vector of variable names and several single-column matrices. I want to create a new matrix by matching/merging the row names of those single-column matrices.
Example:
A vector of variable names:
Complete_names <- c("D","C","A","B")
Several single-column matrices:
Matrix_1 <- matrix(c(1,2,3),3,1)
rownames(Matrix_1) <- c("D","C","B")
Matrix_2 <- matrix(c(4,5,6),3,1)
rownames(Matrix_1) <- c("A","B","C")
Desired output:
Desired_output <- matrix(c(1,2,NA,3,NA,6,4,5),4,2)
rownames(Desired_output) <- c("D","C","A","B")
[,1] [,2]
D 1 NA
C 2 6
A NA 4
B 3 5
I know there are several similar posts, but the answers there do not work perfectly for this case.
The main job can be done with merge, returning a data frame:
merge(Matrix_1, Matrix_2, by = "row.names", all = TRUE)
# Row.names V1.x V1.y
# 1 A NA 4
# 2 B 3 5
# 3 C 2 6
# 4 D 1 NA
Depending on your purposes you may then further modify names or get rid of Row.names.
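For instance, a minimal sketch of turning the merged result back into a matrix (the rows stay in merge's alphabetical order; the v1/v2 column names are arbitrary choices here):
res <- merge(Matrix_1, Matrix_2, by = "row.names", all = TRUE)
rownames(res) <- res$Row.names   # promote the merge key back to row names
res <- as.matrix(res[, -1])      # drop the Row.names column, back to a matrix
colnames(res) <- c("v1", "v2")   # rename as needed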
The answers offered by Julius Vainora and achimneyswallow work well, but to obtain exactly the desired output I want:
temp <- merge(Matrix_1, Matrix_2, by = "row.names", all = TRUE)
temp$Row.names <- factor(temp$Row.names, levels=Complete_names)
temp <- temp[order(temp$Row.names),]
rownames(temp) <- temp[,1]
Desired_output <- as.matrix(temp[,-1])
V1.x V1.y
D 1 NA
C 2 6
A NA 4
B 3 5
Related
How can I simply "paste" two data frames next to each other, filling unequal rows with NAs (e.g., because I want to make a "kable" or something similar)?
df1 <- data.frame(a = c(1,2,3),
b = c(3,4,5))
df2 <- data.frame(a = c(4,5),
b = c(5,6))
# The desired "merge"
a b a b
1 3 4 5
2 4 5 6
3 5 NA NA
Thanks to Ronak Shah, I found an easy answer in the answers to this post: How to cbind or rbind different lengths vectors without repeating the elements of the shorter vectors?
Without having to hack anything together, one can use cbind.na from the qpcR package:
df1 <- data.frame(a = c(1,2,3),
b = c(3,4,5))
df2 <- data.frame(a = c(4,5),
b = c(5,6))
comb <- qpcR:::cbind.na(df1, df2)
As this answer is 4 years old, I wonder if there are more "modern" solutions in popular packages like the tidyverse et al.
In base R you could do:
nr <- max(nrow(df1), nrow(df2))
cbind(df1[1:nr, ], df2[1:nr, ])
# a b a b
# 1 1 3 4 5
# 2 2 4 5 6
# 3 3 5 NA NA
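For a more "modern" flavour, here is a sketch along the same lines with dplyr; the pad() helper is just an illustrative name, relying on the fact that indexing past the last row of a data frame returns NA rows:
library(dplyr)
pad <- function(d, n) d[seq_len(n), , drop = FALSE]  # rows beyond nrow(d) come back as all-NA
nr <- max(nrow(df1), nrow(df2))
bind_cols(pad(df1, nr), pad(df2, nr), .name_repair = "minimal")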
I am in the process of reformatting a few data frames and was wondering if there is a more efficient way to add named columns to data frames than the following:
colnames(df) <- c("c1", "c2")
to rename the current columns and:
df$c3 <- ""
to create a new column.
Is there a way to do this in a quicker manner? I'm trying to add dozens of named columns and this seems like an inefficient way of going through the process.
You can use your method in a shorter way:
cols_2_add <- c("a", "b", "c", "f")
df[, cols_2_add] <- ""
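A quick demonstration on a small (hypothetical) data frame:
df <- data.frame(x = 1:3)
cols_2_add <- c("a", "b", "c", "f")
df[, cols_2_add] <- ""   # all new columns are created in one assignment
df
#   x a b c f
# 1 1
# 2 2
# 3 3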
Additional columns can also be added using merge: merge the existing data frame with one that contains the desired columns and empty values. This is also helpful if you want to create columns of different types (see the second example below).
For example:
# Existing dataframe
df <- data.frame(x=1:3, y=4:6)
# use merge to create the desired columns, say a, b, c, d and e
merge(df, data.frame(a="", b="", c="", d="", e=""))
# Result
# x y a b c d e
#1 1 4
#2 2 5
#3 3 6
# Desired columns of different types
library(dplyr)
bind_rows(df, data.frame(a=character(), b=numeric(), c=double(), d=integer(),
e=as.Date(character()), stringsAsFactors = FALSE))
# x y a b c d e
#1 1 4 <NA> NA NA NA <NA>
#2 2 5 <NA> NA NA NA <NA>
#3 3 6 <NA> NA NA NA <NA>
A simple loop can help here
name_list <- c('a1','b1','c1','d1')
# example df
df <- data.frame(a = runif(3))
# this adds a new column for each name in name_list
for(i in name_list)
{
df[[i]] <- runif(3)
}
# output
a a1 b1 c1 d1
1 0.09227574 0.08225444 0.4889347 0.2232167 0.8718206
2 0.94361151 0.58554887 0.7095412 0.2886408 0.9803941
3 0.22934864 0.73160433 0.6781607 0.7598064 0.4663031
# in the case of data.table, for + set provides a faster version:
library(data.table)
# example df
df <- data.table(a = runif(3))
for(i in name_list)
  set(df, j = i, value = runif(3))
I have two data.frames:
word1=c("a","a","a","a","b","b","b")
word2=c("a","a","a","a","c","c","c")
values1 = c(1,2,3,4,5,6,7)
values2 = c(3,3,0,1,2,3,4)
df1 = data.frame(word1,values1)
df2 = data.frame(word2,values2)
df1:
word1 values1
1 a 1
2 a 2
3 a 3
4 a 4
5 b 5
6 b 6
7 b 7
df2:
word2 values2
1 a 3
2 a 3
3 a 0
4 a 1
5 c 2
6 c 3
7 c 4
I would like to split these data frames by the word column (word1 / word2) and perform two-sample t.tests in R.
For example, the word "a" is in both data frames. What is the t.test between the data frames for the word "a"? I want to do this for all the words that appear in both data frames.
The desired result is a data.frame (result):
word tvalues
1 a 0.4778035
Thanks
Find the words common to both dataframes, then loop over these words, subsetting both dataframes and performing the t.test on the subsets.
E.g.:
df1 <- data.frame(word=sample(letters[1:5], 30, replace=TRUE),
x=rnorm(30))
df2 <- data.frame(word=sample(letters[1:5], 30, replace=TRUE),
x=rnorm(30))
common_words <- sort(intersect(df1$word, df2$word))
setNames(lapply(common_words, function(w) {
t.test(subset(df1, word==w, x), subset(df2, word==w, x))
}), common_words)
This returns a list, where each element is the output of the t.test for one of the common words. setNames just names the list elements so you can see which words they correspond to.
Note I've created new example data here since your example data only have one word in common (a) and so don't really resemble your true problem.
If you just want a matrix of statistics, you can do something like:
t(sapply(common_words, function(w) {
test <- t.test(subset(df1, word==w, x), subset(df2, word==w, x))
c(test$statistic, test$parameter, p=test$p.value,
`2.5%`=test$conf.int[1], `97.5%`=test$conf.int[2])
}))
## t df p 2.5% 97.5%
## a 0.9141839 8.912307 0.38468553 -0.4808054 1.1313220
## b -0.2182582 7.589109 0.83298193 -1.1536056 0.9558315
## c -0.2927253 8.947689 0.77640684 -1.5340097 1.1827691
## d -2.7244728 12.389709 0.01800568 -2.5016301 -0.2826952
## e -0.3683153 7.872407 0.72234501 -1.9404345 1.4072499
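And if, as in the original question, you only want a data.frame of the t statistics, here is a minimal sketch reusing the OP's original column names (word1/values1 and word2/values2):
common_words <- sort(intersect(df1$word1, df2$word2))
data.frame(word = common_words,
           tvalues = sapply(common_words, function(w) {
             t.test(df1$values1[df1$word1 == w], df2$values2[df2$word2 == w])$statistic
           }))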
I want to check that columns are consistent for each ID number (they're supposed to be constants, but there may be some doubt in the data, so I want to double-check).
For example, given the following data frame:
test <- data.frame(ID = c("one","two","three"),
a = c(1,1,1),
b = c(1,1,1),
t = c(NA,1,1),
d = c(2,4,1))
I want to check that the values in columns a, b, t and d are all the same within each row, disregarding missing values. I thought I could do this by counting the unique values in the relevant columns, so that I can then select only the rows where the number of unique values is more than 1. I imagine this is likely not the best way of doing it, but it was the only way I could think of with my limited knowledge.
I found this question here, which seems to be similar to what I want to do:
Find unique values across a row of a data frame
But I am struggling to apply the answers to my data. I have tried this, which didn't do anything (but I've never used a for-loop before, so I've probably done that wrong), although when I run the inside of the function on its own for a single row it does exactly what I hope for:
yeartest <- function(x){
temp <- test[x,2:5]
temp <- as.numeric(temp)
veclength <- length(unique(temp[!is.na(temp)]))
temp2 <- c(temp,veclength)
test[,"thing"] <- NA
test[x,2:6] <- temp2
}
for(i in 1:nrow(test)){
yeartest(i)
}
Then I tried to apply the approach from the accepted answer there:
x <- test
# dups <- function(x) x[!duplicated(x)]
yeartest <- function(x){
# x <- 1
temp <- test[x,2:5]
temp <- as.numeric(temp)
veclength <- length(unique(temp[!is.na(temp)]))
temp2 <- c(temp,veclength)
test[,"thing"] <- NA
test[x,2:6] <- temp2
}
new.df <- t(apply(x, 1, function(x) yeartest(x)))
This gives an error, so it is pretty obvious that I have made a mistake in translating the answer to my data.
Apologies, this must be a really obvious failing on my part; I am very grateful for any help.
Solution: (thank you for the help!)
test$new <- apply(test[,2:5],1,function(r) length(unique(na.omit(r))))
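With that column in place, the rows with any inconsistency (more than one distinct non-missing value) can then be picked out, e.g.:
test[test$new > 1, ]
#    ID a b  t d new
# 1 one 1 1 NA 2   2
# 2 two 1 1  1 4   2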
> df <- data.frame(
a=sample(2,10,replace=TRUE),
b=sample(2,10,replace=TRUE),
c=sample(c("a","b"),10,replace=TRUE),
d=sample(c("a","b"),10,replace=TRUE))
> df[c(3,6,8),1] <- NA
> df
a b c d
1 1 2 a b
2 1 2 a b
3 NA 2 a a
4 2 2 a b
5 1 2 a a
6 NA 1 a b
7 2 1 b b
8 NA 1 a a
9 1 1 b b
10 2 2 b b
> apply(df,1,function(r) length(unique(na.omit(r))))
[1] 3 3 2 4 3 2 4 2 3 3
While merging 3 data.frames using the plyr library, I encounter some columns with the same name but with different values in the different data.frames.
How does do.call(rbind.fill, list) treat this problem: by arithmetic or geometric average?
From the help page for rbind.fill:
Combine data.frames by row, filling in missing columns. rbinds a list of data frames
filling missing columns with NA.
So I'd expect it to fill columns that do not match with NA. It is also not necessary to use do.call() here.
dat1 <- data.frame(a = 1:2, b = 4:5)
dat2 <- data.frame(b = 3:2, c = 8:9)
dat3 <- data.frame(a = 5:6, c = 1:2)
library(plyr)
rbind.fill(dat1, dat2, dat3)
a b c
1 1 4 NA
2 2 5 NA
3 NA 3 8
4 NA 2 9
5 5 NA 1
6 6 NA 2
Are you expecting something different?
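To illustrate the do.call() remark: if your data frames already sit in a list, the call below gives the same NA-filled result as above; there is no averaging of any kind:
dat_list <- list(dat1, dat2, dat3)
do.call(rbind.fill, dat_list)   # same output as rbind.fill(dat1, dat2, dat3)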