Converting a data frame to a matrix in R - r

I would like to convert a data frame to a matrix in R, as in the following example:
df
row.index column.index matrix element
1 1 A
1 2 B
2 1 C
2 2 D
matrix
A B
C D
Is it possible to do the same with row names? For example:
df
row.name column.name matrix element
X P A
X Q B
Y P C
Y Q D
matrix
P Q
X A B
Y C D
Thanks for help!

We can use tapply
tapply(df$matrixelement, df[1:2], FUN = I)
It would also work for the second dataset
res <- tapply(df1$matrixelement, df1[1:2], FUN = I)
names(dimnames(res)) <- NULL
res
# P Q
#X "A" "B"
#Y "C" "D"
If we need a data.frame, then dcast can be used
library(reshape2)
dcast(df, row.index ~ column.index, value.var = "matrixelement")
data
df <- structure(list(row.index = c(1L, 1L, 2L, 2L), column.index = c(1L,
2L, 1L, 2L), matrixelement = c("A", "B", "C", "D")), .Names = c("row.index",
"column.index", "matrixelement"), class = "data.frame", row.names = c(NA,
-4L))
df1 <- structure(list(row.name = c("X", "X", "Y", "Y"), column.name = c("P",
"Q", "P", "Q"), matrixelement = c("A", "B", "C", "D")), .Names = c("row.name",
"column.name", "matrixelement"), class = "data.frame", row.names = c(NA,
-4L))
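As an alternative to tapply, the same reshaping can be sketched in base R with matrix indexing: build an empty matrix and fill it using a two-column index. This is only an illustrative sketch; the names m, m1, rn and cn are made up here, and it assumes each (row, column) pair appears once in the data.

```r
# Data from the question, rebuilt so the block is self-contained
df <- data.frame(row.index = c(1L, 1L, 2L, 2L),
                 column.index = c(1L, 2L, 1L, 2L),
                 matrixelement = c("A", "B", "C", "D"),
                 stringsAsFactors = FALSE)

# Integer indices: allocate the matrix, then fill it with a two-column index
m <- matrix(NA_character_,
            nrow = max(df$row.index),
            ncol = max(df$column.index))
m[cbind(df$row.index, df$column.index)] <- df$matrixelement
m

# Named version: a character index matrix is matched against the dimnames
df1 <- data.frame(row.name = c("X", "X", "Y", "Y"),
                  column.name = c("P", "Q", "P", "Q"),
                  matrixelement = c("A", "B", "C", "D"),
                  stringsAsFactors = FALSE)
rn <- unique(df1$row.name)
cn <- unique(df1$column.name)
m1 <- matrix(NA_character_, nrow = length(rn), ncol = length(cn),
             dimnames = list(rn, cn))
m1[cbind(df1$row.name, df1$column.name)] <- df1$matrixelement
m1
```

Any (row, column) pair missing from the data frame is left as NA in the result.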

Related

Verifying whether at least two columns have the same value in a specific row

I have a data frame and I want to check whether all of my variables have unique values in a specific row.
Let's say I want to analyze row D.
My data:
Name F S T
A 1 2 3
B 2 3 4
C 3 4 5
D 4 5 6
> TRUE (because all three variables have unique values)
Second example
Name F S T
A 1 2 3
B 2 3 4
C 3 4 5
D 4 5 4
> FALSE (because F and T have the same value in row D)
In base R do
f1 <- function(dat, ind) {
tmp <- unlist(dat[ind, -1])
length(unique(tmp)) == length(tmp)
}
-testing
> f1(df, 4)
[1] TRUE
> f1(df1, 4)
[1] FALSE
data
df <- structure(list(Name = c("A", "B", "C", "D"), F = 1:4, S = 2:5,
T = 3:6), class = "data.frame", row.names = c(NA, -4L))
df1 <- structure(list(Name = c("A", "B", "C", "D"), F = 1:4, S = 2:5,
T = c(3L, 4L, 5L, 4L)), class = "data.frame", row.names = c(NA,
-4L))
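You can also use dplyr. Note that this checks whether each of the columns F, S and T holds distinct values across all rows, rather than checking a single row:
library(dplyr)
df %>%
summarize_at(c(2:ncol(.)), n_distinct) %>%
summarize(if_all(.fns = ~ .x == nrow(df)))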
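If the row should be picked by its Name rather than by position, the base R idea above could be sketched with anyDuplicated(). The helper name row_is_unique is hypothetical, and the sketch assumes the first column holds the row labels.

```r
# Data from the question, rebuilt so the block is self-contained
df <- data.frame(Name = c("A", "B", "C", "D"),
                 F = 1:4, S = 2:5, T = 3:6,
                 stringsAsFactors = FALSE)
df1 <- data.frame(Name = c("A", "B", "C", "D"),
                  F = 1:4, S = 2:5, T = c(3L, 4L, 5L, 4L),
                  stringsAsFactors = FALSE)

# Hypothetical helper: TRUE when no value repeats within the chosen row
row_is_unique <- function(dat, name) {
  vals <- unlist(dat[dat$Name == name, -1])  # drop the Name column
  anyDuplicated(vals) == 0                   # 0 means no duplicate was found
}

row_is_unique(df, "D")
row_is_unique(df1, "D")
```

anyDuplicated() stops at the first repeated value, so it avoids building the full set of unique values.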

How to collapse rows by identical values in a column

Good evening,
I have a two-column, tab-separated .txt file, like the following:
number letter
1 a
1 b
2 a
2 b
3 b
I would like to collapse rows where the column "number" has identical values, creating a comma-separated value in the corresponding column "letter".
In other words, this should be the output:
number letter
1 a,b
2 a,b
3 b
I have searched the web but did not find a working solution.
Thank you in advance,
Giuseppe
We can use aggregate in base R
aggregate(letter ~ number, df1, FUN = paste, collapse=",")
-output
# number letter
#1 1 a,b
#2 2 a,b
#3 3 b
Or with tidyverse
library(dplyr)
library(stringr)
df1 %>%
group_by(number) %>%
summarise(letter = str_c(letter, collapse=","))
data
df1 <- structure(list(number = c(1L, 1L, 2L, 2L, 3L), letter = c("a",
"b", "a", "b", "b")), class = "data.frame", row.names = c(NA,
-5L))
We can also combine aggregate() with toString:
#Code
newdf <- aggregate(letter~.,df,toString)
Output:
number letter
1 1 a, b
2 2 a, b
3 3 b
Some data:
#Data
df <- structure(list(number = c(1L, 1L, 2L, 2L, 3L), letter = c("a",
"b", "a", "b", "b")), class = "data.frame", row.names = c(NA,
-5L))
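Since the question starts from a tab-separated text file, the whole round trip could be sketched as follows. The file contents are inlined through the text argument so the block runs on its own; with a real file you would pass its path to read.delim() instead. The names txt, dat and out are made up for this sketch.

```r
# Inline stand-in for the tab-separated file described in the question
txt <- "number\tletter\n1\ta\n1\tb\n2\ta\n2\tb\n3\tb"

# read.delim() defaults to sep = "\t" and header = TRUE
dat <- read.delim(text = txt, stringsAsFactors = FALSE)

# Collapse the letters within each number into one comma-separated string
out <- aggregate(letter ~ number, data = dat, FUN = paste, collapse = ",")
out
```

The result can be written back out with write.table(out, sep = "\t", quote = FALSE, row.names = FALSE) if a tab-separated file is needed again.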

subset of R undefined columns

I'm trying to use subset to get values from the union of two tables
> ans<-subset(table2, select=rownames(table1))
But I get the following error:
Error in `[.data.frame`(x, r, vars, drop = drop) : undefined columns selected
Given table1
V2
E x
F x
G x
H x
And table2
V1 V2 V3 V4 V5 V6
1 A B C D E F
2 2 5 6 4 6 8
I want to obtain:
E F
6 8
I used this data:
table1 <- structure(list(V2 = structure(c(1L, 1L, 1L, 1L), .Label = "x", class = "factor")), class = "data.frame", row.names = c("E",
"F", "G", "H"))
table2 <- structure(list(X1 = c("A", "2"), X2 = c("B", "5"), X3 = c("C",
"6"), X4 = c("D", "4"), X5 = c("E", "6"), X6 = c("F", "8")), class = "data.frame", row.names = c(NA,
-2L))
Note: This does not work if the columns are factors. I assembled table2 with:
table2 <- data.frame(rbind(as.character(LETTERS[1:6]), c(2, 5, 6, 4, 6, 8)), stringsAsFactors = FALSE)
So, then this works:
ans <- table2[, as.character(table2[1, ]) %in% rownames(table1)]
ans
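The original error arises because subset(select = ...) looks for columns named after every row name of table1, and names such as "G" and "H" do not exist in table2. One way to get the desired E/F layout, sketched here under the assumption that the first row of table2 really holds the labels, is to promote that row to column names and keep only the names both tables share. The name tidy is made up for this sketch.

```r
# Data rebuilt so the block is self-contained
table1 <- data.frame(V2 = rep("x", 4),
                     row.names = c("E", "F", "G", "H"))
table2 <- data.frame(rbind(LETTERS[1:6], c(2, 5, 6, 4, 6, 8)),
                     stringsAsFactors = FALSE)

# Promote the first row to column names and drop it from the data
tidy <- setNames(table2[-1, , drop = FALSE],
                 as.character(unlist(table2[1, ])))

# Keep only the columns that also appear as row names of table1
keep <- intersect(names(tidy), rownames(table1))
ans <- tidy[, keep, drop = FALSE]
ans
```

The values stay character here because rbind() mixed letters and numbers; as.numeric() can convert them afterwards if needed.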

How to apply a function to factored subgroups in R?

I have some columns of characters as a data frame df:
V1 V2 V3 group
B C - 1
B C C 1
B C C 1
A C A 2
A A A 2
A A A 2
For each column, I would like to find out whether the intersection of its values across the groups is empty, and output the result in, say, a TRUE/FALSE format.
Column 2 is the only column with a non-empty intersection, which I checked using:
> is.na(intersect(df[,2][df$group=="1"],df[,2][df$group=="2"]))
[1] FALSE
I was trying to automate this for the three columns V1-V3 using
by(df[,1:3], df$group, function(x) { is.na(intersect(x[df$group=="1"],x[df$group=="2"]))})
but got an error:
Error in `[.data.frame`(x, df$group == "2") : undefined columns selected
Thanks for any suggestions/alternatives!
Try
lapply(df[,1:3], function(x)
is.na(intersect(x[df$group=='1'], x[df$group=='2'])))
Or
Map(function(x,y) is.na(intersect(x,y)),
df[df$group=='1',-4], df[df$group=='2', -4])
If you have many groups,
lapply(df[,1:3], function(x) is.na(Reduce(`intersect`,split(x, df$group))))
data
df <- structure(list(V1 = c("B", "B", "B", "A", "A", "A"), V2 = c("C",
"C", "C", "C", "A", "A"), V3 = c("-", "C", "C", "A", "A", "A"
), group = c(1L, 1L, 1L, 2L, 2L, 2L)), .Names = c("V1", "V2",
"V3", "group"), class = "data.frame", row.names = c(NA, -6L))
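Note that is.na() on an empty intersection returns logical(0) rather than TRUE, so the calls above do not give one TRUE/FALSE per column. A variant that tests the length of the intersection instead could look like the sketch below; the name has_common is made up here.

```r
# Data from the question, rebuilt so the block is self-contained
df <- data.frame(V1 = c("B", "B", "B", "A", "A", "A"),
                 V2 = c("C", "C", "C", "C", "A", "A"),
                 V3 = c("-", "C", "C", "A", "A", "A"),
                 group = c(1L, 1L, 1L, 2L, 2L, 2L),
                 stringsAsFactors = FALSE)

# TRUE when the groups share at least one value in that column
has_common <- sapply(df[, 1:3], function(x) {
  length(Reduce(intersect, split(x, df$group))) > 0
})
has_common
```

Because Reduce() folds intersect() over all groups, the same code works for more than two groups.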

Count matching instances between two data frames [closed]

I'm a newbie with R and can't find an answer or anything that works.
I've got two data frames that look like this:
Teams
A
B
C
...
and
TCF
A
B
C
C
B
A
...
I need to count how many times each value in the first data frame's column occurs in the second data frame, and return the counts to the first data frame. Thanks in advance!
You could use base R to do this:
sapply(unique(df1$Teams), function(x) sum(df2$TCF %in% x))
#A B C
#2 2 2
Or
setNames(table(match(df2$TCF, unique(df1$Teams))), unique(df1$Teams))
#A B C
#2 2 2
Or using data.table
library(data.table)
setkey(setDT(df1), Teams)
setkey(setDT(df2), TCF)
df2[J(unique(df1$Teams)),.N, by=.EACHI]
# TCF N
#1: A 2
#2: B 2
#3: C 2
data
df1 <- structure(list(Teams = c("A", "B", "C")), .Names = "Teams",
class = "data.frame", row.names = c(NA,-3L))
df2 <- structure(list(TCF = c("A", "B", "C", "C", "B", "A")), .Names = "TCF",
class = "data.frame", row.names = c(NA, -6L))
Would this option be easier on the eyes?
library(dplyr)
df2 %>% count(TCF) %>% filter(TCF %in% unique(df1$Teams))
# Source: local data frame [3 x 2]
# TCF n
# 1 A 2
# 2 B 2
# 3 C 2
Data
df1 <- structure(list(Teams = c("A", "B", "C")), .Names = "Teams", class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(TCF = structure(c(1L, 2L, 3L, 3L, 2L, 1L, 4L,
5L, 5L), .Label = c("A", "B", "C", "X", "Y"), class = "factor")), .Names = "TCF", row.names = c(NA,
-9L), class = "data.frame")
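The question also asks for the counts to be returned to the first data frame. A base R sketch of that step, assuming the values in Teams are unique, is to tabulate TCF against the Teams levels and store the result as a new column. The column name count is made up for this sketch.

```r
# Data rebuilt so the block is self-contained (df2 includes non-matching X, Y)
df1 <- data.frame(Teams = c("A", "B", "C"), stringsAsFactors = FALSE)
df2 <- data.frame(TCF = c("A", "B", "C", "C", "B", "A", "X", "Y", "Y"),
                  stringsAsFactors = FALSE)

# Restricting the levels to df1$Teams turns other values into NA,
# which table() ignores; teams with no match get a count of 0
df1$count <- as.integer(table(factor(df2$TCF, levels = df1$Teams)))
df1
```

Because the levels follow the order of df1$Teams, the counts line up with the rows of df1 without a merge.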
