I'm trying to use subset to get the values from table2 whose labels also appear as row names of table1:
> ans <- subset(table2, select = rownames(table1))
But I get the following error:
Error in `[.data.frame`(x, r, vars, drop = drop) : undefined columns selected
Given table1
V2
E x
F x
G x
H x
And table2
V1 V2 V3 V4 V5 V6
1 A B C D E F
2 2 5 6 4 6 8
I want to obtain:
E F
6 8
Used this data:
table1 <- structure(list(V2 = structure(c(1L, 1L, 1L, 1L), .Label = "x", class = "factor")), class = "data.frame", row.names = c("E",
"F", "G", "H"))
table2 <- structure(list(X1 = c("A", "2"), X2 = c("B", "5"), X3 = c("C",
"6"), X4 = c("D", "4"), X5 = c("E", "6"), X6 = c("F", "8")), class = "data.frame", row.names = c(NA,
-2L))
Note: This does not work if the columns are factors, because as.character() on a row of factor columns returns the integer codes rather than the labels. I assembled table2 with:
table2 <- data.frame(rbind(as.character(LETTERS[1:6]), c(2, 5, 6, 4, 6, 8)), stringsAsFactors = FALSE)
So, then this works:
ans <- table2[, as.character(table2[1, ]) %in% rownames(table1)]
ans
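For what it's worth, here is a minimal sketch of the same idea that also produces the E/F header from the desired output. It assumes table2 was built with stringsAsFactors = FALSE as above; the names labels and keep are only illustrative.

```r
# Rebuild the example data (character columns, not factors)
table1 <- data.frame(V2 = rep("x", 4), row.names = c("E", "F", "G", "H"))
table2 <- data.frame(rbind(LETTERS[1:6], c(2, 5, 6, 4, 6, 8)),
                     stringsAsFactors = FALSE)

# The first row of table2 holds the labels; pull them out as a plain vector
labels <- unlist(table2[1, ], use.names = FALSE)

# Keep only the columns whose label is a row name of table1
keep <- labels %in% rownames(table1)

# Take the value row and use the matching labels as column names
ans <- setNames(table2[2, keep, drop = FALSE], labels[keep])
ans
```

The values are still character here; wrap them in as.numeric() if numbers are needed.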
I would like to convert a data frame to a matrix in R, as in the following example:
df
row.index column.index matrix element
1 1 A
1 2 B
2 1 C
2 2 D
matrix
A B
C D
Is it possible to do the same with row names? For example:
df
row.name column.name matrix element
X P A
X Q B
Y P C
Y Q D
matrix
P Q
X A B
Y C D
Thanks for help!
We can use tapply
tapply(df$matrixelement, df[1:2], FUN = I)
It would also work for the second dataset
res <- tapply(df1$matrixelement, df1[1:2], FUN = I)
names(dimnames(res)) <- NULL
res
# P Q
#X "A" "B"
#Y "C" "D"
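As an alternative sketch in base R without tapply, the matrix can be pre-allocated with the right dimnames and then filled by matrix indexing. This assumes each row/column pair occurs at most once; the names rows, cols and m are only illustrative.

```r
df1 <- data.frame(row.name = c("X", "X", "Y", "Y"),
                  column.name = c("P", "Q", "P", "Q"),
                  matrixelement = c("A", "B", "C", "D"),
                  stringsAsFactors = FALSE)

rows <- unique(df1$row.name)
cols <- unique(df1$column.name)

# Empty character matrix with the row and column labels as dimnames
m <- matrix(NA_character_, nrow = length(rows), ncol = length(cols),
            dimnames = list(rows, cols))

# A two-column character matrix used as an index is matched against the dimnames
m[cbind(df1$row.name, df1$column.name)] <- df1$matrixelement
m
```

Pairs that are missing from the data frame simply stay NA in the result.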
If we need a data.frame, then dcast can be used
library(reshape2)
dcast(df, row.index ~ column.index, value.var = "matrixelement")
data
df <- structure(list(row.index = c(1L, 1L, 2L, 2L), column.index = c(1L,
2L, 1L, 2L), matrixelement = c("A", "B", "C", "D")), .Names = c("row.index",
"column.index", "matrixelement"), class = "data.frame", row.names = c(NA,
-4L))
df1 <- structure(list(row.name = c("X", "X", "Y", "Y"), column.name = c("P",
"Q", "P", "Q"), matrixelement = c("A", "B", "C", "D")), .Names = c("row.name",
"column.name", "matrixelement"), class = "data.frame", row.names = c(NA,
-4L))
How to add column dynamically in dataframe if list of variable increases. My dataframe:
ID Value
1 list(F="20",B="rt")
2 list(F="20",B="rt",H="ty")
3 list(F="20",B="rt")
4 list(F="20")
Desired output:
ID Value F B H
1 list(F="20",B="rt") 20 rt NA
2 list(F="20",B="rt",H="ty") 20 rt ty
3 list(F="20",B="rt") 20 rt NA
4 list(F="20") 20 NA NA
structure(list(Billing = list(NULL, structure
(list(`EUcust#` = "3",`Cust#` = "5", Com = "I", `Com#` = "6", Add = "Y"), .Names
= c("EUcust#",
"Cust#", "Com", "Com#", "Add"), class
= "data.frame", row.names = 1L))), .Names
= "Billing",
row.names = 8:9, class = "data.frame")
We can use tidyverse
library(tidyverse)
df1 %>%
bind_cols(., map_df(.$Value, ~do.call(cbind.data.frame, .)))
# ID Value F B H
#1 1 20, rt 20 rt <NA>
#2 2 20, rt, ty 20 rt ty
#3 3 20, rt 20 rt <NA>
#4 4 20 20 <NA> <NA>
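If loading the tidyverse is not an option, the same expansion can be sketched in base R. This is only an illustration under the assumption that every list element holds a single string; keys and out are made-up names.

```r
df1 <- data.frame(ID = 1:4)
df1$Value <- list(list(F = "20", B = "rt"),
                  list(F = "20", B = "rt", H = "ty"),
                  list(F = "20", B = "rt"),
                  list(F = "20"))

# Collect every name that occurs in any of the lists
keys <- unique(unlist(lapply(df1$Value, names)))

# Add one column per name, using NA where a list lacks that entry
out <- df1
for (k in keys) {
  out[[k]] <- vapply(df1$Value,
                     function(v) if (is.null(v[[k]])) NA_character_ else v[[k]],
                     character(1))
}
out[c("ID", keys)]
```

Because the set of names is collected from the data, a new key appearing in a later row adds a column without any change to the code.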
data
df1 <- structure(list(ID = 1:4, Value = structure(list(structure(list(
F = "20", B = "rt"), .Names = c("F", "B")), structure(list(
F = "20", B = "rt", H = "ty"), .Names = c("F", "B", "H")),
structure(list(F = "20", B = "rt"), .Names = c("F", "B")),
structure(list(F = "20"), .Names = "F")), class = "AsIs")),
.Names = c("ID",
"Value"), row.names = c(NA, -4L), class = "data.frame")
I have a data frame arranged as follows:
df <- structure(list(name1= c("A","A","B"),
name2 = c("B", "C","C"),
size = c(10,20,30)),.Names=c("name1","name2","size"),
row.names = c("1", "2", "3"), class =("data.frame"))
I would like to add "mirror" observations as follows:
df <- structure(list(name1 = c("A","B","A", "C", "B", "C"),
name2 = c("B", "A","C", "A", "C", "B"),
size = c(10,10,20,20,30,30)),.Names=c("name1","name2","size"),
row.names = c("1", "2", "3", "4", "5", "6"), class =("data.frame"))
Inputs would be much appreciated.
We can do this in two steps: first duplicate each row, then swap name1 and name2 in every second row.
df1 <- df[rep(rownames(df), each = 2),]
df1[c(FALSE, TRUE), 1:2] <- df1[c(FALSE, TRUE), 2:1]
df1
# name1 name2 size
#1 A B 10
#1.1 B A 10
#2 A C 20
#2.1 C A 20
#3 B C 30
#3.1 C B 30
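Alternatively, with data.table we can bind the original rows to a copy with the first two columns swapped. The rows are bound by position, so the swapped columns land under name1 and name2 (the mirrored rows come after the originals rather than interleaved):
library(data.table)
rbindlist(list(df, df[c(2:1, 3)]), use.names = FALSE)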
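For completeness, here is a base R sketch of the mirroring that keeps each mirrored row next to its original, as in the desired output. The names mirror and both are only illustrative.

```r
df <- data.frame(name1 = c("A", "A", "B"),
                 name2 = c("B", "C", "C"),
                 size = c(10, 20, 30),
                 stringsAsFactors = FALSE)

# Swap the two name columns, then restore the original column names
mirror <- setNames(df[c("name2", "name1", "size")], names(df))

# Stack originals and mirrors, then interleave them by original row index
both <- rbind(df, mirror)
both <- both[order(rep(seq_len(nrow(df)), times = 2)), ]
rownames(both) <- NULL
both
```

order() keeps tied elements in their existing order, so each original row comes directly before its mirror.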
I have 19 nested lists generated from a lapply and split operation.
These lists are in the form:
#list1
Var col1 col2 col3
A 2 3 4
B 3 4 5
#list2
Var col1 col2 col3
A 5 6 7
B 5 4 4
......
#list19
Var col1 col2 col3
A 3 6 7
B 7 4 4
I have been able to merge the lists with
merge.all <- function(x, y) merge(x, y, all=TRUE, by="Var")
out <- Reduce(merge.all, DataList)
I am however getting an error due to the similarity in the names of the other columns.
How can I concatenate the name of the list to the variable names so that I get something like this:
Var list1.col1 list1.col2 list1.col3 .......... list19.col3
A 2 3 4 7
B 3 4 5 .......... 4
I'm really sure somebody will come up with a much, much better solution. However, if you're after a quick and dirty solution, this seems to work.
My plan was to simply change the column names prior to merging.
#Sample Data
df1 <- data.frame(Var = c("A","B"), col1 = c(2,3), col2 = c(3,4), col3 = c(4,5))
df2 <- data.frame(Var = c("A","B"), col1 = c(5,5), col2 = c(6,4), col3 = c(7,5))
df19 <- data.frame(Var = c("A","B"), col1 = c(3,7), col2 = c(6,4), col3 = c(7,4))
mylist <- list(df1, df2, df19)
names(mylist) <- c("df1", "df2", "df19") #just manually naming, presumably your list has names
## Change column names by pasting name of dataframe in list with standard column names. - using ugly mix of `lapply` and a `for` loop:
mycolnames <- colnames(df1)
mycolnames1 <- lapply(names(mylist), function(x) paste0(x, mycolnames))
for (i in seq_along(mylist)) {
colnames(mylist[[i]]) <- mycolnames1[[i]]
colnames(mylist[[i]])[1] <- "Var" #put Var back in so you can merge
}
## Merge
merge.all <- function(x, y)
merge(x, y, all=TRUE, by="Var")
out <- Reduce(merge.all, mylist)
out
# Var df1col1 df1col2 df1col3 df2col1 df2col2 df2col3 df19col1 df19col2 df19col3
#1 A 2 3 4 5 6 7 3 6 7
#2 B 3 4 5 5 4 5 7 4 4
There you go - it works but is very ugly.
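A somewhat tidier sketch of the same rename-then-merge plan uses Map to walk over the data frames and their list names together, with a dot separator as in the question. It assumes the list is named and that Var is the only shared key; prefixed is a made-up name.

```r
mylist <- list(
  list1  = data.frame(Var = c("A", "B"), col1 = c(2, 3), col2 = c(3, 4), col3 = c(4, 5),
                      stringsAsFactors = FALSE),
  list2  = data.frame(Var = c("A", "B"), col1 = c(5, 5), col2 = c(6, 4), col3 = c(7, 5),
                      stringsAsFactors = FALSE),
  list19 = data.frame(Var = c("A", "B"), col1 = c(3, 7), col2 = c(6, 4), col3 = c(7, 4),
                      stringsAsFactors = FALSE)
)

# Prefix every column except the merge key with the name of its list element
prefixed <- Map(function(d, nm) {
  other <- names(d) != "Var"
  names(d)[other] <- paste(nm, names(d)[other], sep = ".")
  d
}, mylist, names(mylist))

# Merge pairwise on Var; no column names collide any more
out <- Reduce(function(x, y) merge(x, y, all = TRUE, by = "Var"), prefixed)
names(out)
```

Renaming before merging means merge never has to add its own .x/.y suffixes.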
To make the column names unique, you could use a function that prefixes every column other than the merging variable with the name of its list element.
resetNames <- function(x, byvar = "Var") {
asrl <- as.relistable(lapply(x, names))
allnm <- names(unlist(x, recursive = FALSE))
rpl <- replace(allnm, unlist(asrl) %in% byvar, byvar)
Map(setNames, x, relist(rpl, asrl))
}
Reduce(merge.all, resetNames(dlist))
# Var list1.col1 list1.col2 list1.col3 list2.col1 list2.col2 list2.col4 list3.col1
#1 A 2 3 4 5 6 7 3
#2 B 3 4 5 5 4 4 7
# list3.col2 list3.col3 list4.col1 list4.col2 list4.col3
#1 6 7 3 6 7
#2 4 4 4 5 6
When I run this on your list with an added data frame, there are no warnings. There is also data.table: its merge method does not warn about duplicated column names.
library(data.table)
Reduce(merge.all, lapply(dlist, as.data.table))
Another option is to check the names as the data enters the function, change them there, and then you can avoid the warning. This isn't perfect but it works ok here.
merge.all <- function(x, y) {
m <- match(names(y)[-1], gsub("[.](x|y)$", "", names(x)[-1]), 0L)
names(y)[-1][m] <- paste0(names(y)[-1][m], "DUPE")
merge(x, y, all=TRUE, by="Var")
}
res <- Reduce(merge.all, dlist)
names(res)
# [1] "Var" "col1" "col2" "col3" "col1DUPE.x"
# [6] "col2DUPE.x" "col4" "col1DUPE.y" "col2DUPE.y" "col3DUPE.x"
# [11] "col1DUPE" "col2DUPE" "col3DUPE.y"
where dlist is
dlist <- structure(list(list1 = structure(list(Var = structure(1:2, .Label = c("A",
"B"), class = "factor"), col1 = 2:3, col2 = 3:4, col3 = 4:5), .Names = c("Var",
"col1", "col2", "col3"), class = "data.frame", row.names = c(NA,
-2L)), list2 = structure(list(Var = structure(1:2, .Label = c("A",
"B"), class = "factor"), col1 = c(5L, 5L), col2 = c(6L, 4L),
col4 = c(7L, 4L)), .Names = c("Var", "col1", "col2", "col4"
), class = "data.frame", row.names = c(NA, -2L)), list3 = structure(list(
Var = structure(1:2, .Label = c("A", "B"), class = "factor"),
col1 = c(3L, 7L), col2 = c(6L, 4L), col3 = c(7L, 4L)), .Names = c("Var",
"col1", "col2", "col3"), class = "data.frame", row.names = c(NA,
-2L)), list4 = structure(list(Var = structure(1:2, .Label = c("A",
"B"), class = "factor"), col1 = 3:4, col2 = c(6L, 5L), col3 = c(7L,
6L)), .Names = c("Var", "col1", "col2", "col3"), row.names = c(NA,
-2L), class = "data.frame")), .Names = c("list1", "list2", "list3",
"list4"))
I have some columns of characters as a data frame df:
V1 V2 V3 group
B C - 1
B C C 1
B C C 1
A C A 2
A A A 2
A A A 2
I would like to find out, for each column, whether the intersection of the values in the two groups is empty, and to output the result in, say, a TRUE/FALSE format.
Column 2 is the only column with non-zero intersection which I have checked using:
> is.na(intersect(df[,2][df$group=="1"],df[,2][df$group=="2"]))
[1] FALSE
I was trying to automate this for the three columns V1-V3 using
by(df[,1:3], df$group, function(x) { is.na(intersect(x[df$group=="1"],x[df$group=="2"]))})
but got an error:
Error in `[.data.frame`(x, df$group == "2") : undefined columns selected
Thanks for any suggestions/alternatives!
Try
lapply(df[,1:3], function(x)
is.na(intersect(x[df$group=='1'], x[df$group=='2'])))
Or
Map(function(x,y) is.na(intersect(x,y)),
df[df$group=='1',-4], df[df$group=='2', -4])
If you have many groups,
lapply(df[,1:3], function(x) is.na(Reduce(`intersect`,split(x, df$group))))
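One thing to watch: is.na() on an empty intersection returns logical(0) rather than TRUE or FALSE. If a single TRUE/FALSE per column is wanted, a sketch along these lines tests the length of the intersection instead. The name has_common is made up for illustration.

```r
df <- data.frame(V1 = c("B", "B", "B", "A", "A", "A"),
                 V2 = c("C", "C", "C", "C", "A", "A"),
                 V3 = c("-", "C", "C", "A", "A", "A"),
                 group = c(1L, 1L, 1L, 2L, 2L, 2L),
                 stringsAsFactors = FALSE)

# For each column: split the values by group, intersect across all groups,
# and report whether anything is left
has_common <- vapply(df[c("V1", "V2", "V3")], function(x) {
  length(Reduce(intersect, split(x, df$group))) > 0
}, logical(1))
has_common
```

This returns a named logical vector with one entry per column and works unchanged for more than two groups.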
data
df <- structure(list(V1 = c("B", "B", "B", "A", "A", "A"), V2 = c("C",
"C", "C", "C", "A", "A"), V3 = c("-", "C", "C", "A", "A", "A"
), group = c(1L, 1L, 1L, 2L, 2L, 2L)), .Names = c("V1", "V2",
"V3", "group"), class = "data.frame", row.names = c(NA, -6L))