I have a matrix with binary data representing whether each column field is relevant to each row element. I'm looking to create a two-column data frame listing the name of each field associated with each row. How can I do this in R?
Here is an example of what I'm starting with:
A B C
W 1 1 0
X 0 1 1
Y 1 1 1
Z 0 1 1
And I'm looking to end up with this:
Element | Relevant Field
W|A
W|B
X|B
X|C
Y|A
Y|B
Y|C
Z|B
Z|C
Any hints? Thanks!
If your starting value is a matrix like this
mm <- matrix(c(1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L),
ncol=3, dimnames = list(c("W", "X", "Y", "Z"), c("A", "B", "C")))
You can treat it like a table and unroll the data fairly easily:
subset(as.data.frame(as.table(mm)), Freq>0)
# Var1 Var2 Freq
# 1 W A 1
# 3 Y A 1
# 5 W B 1
# 6 X B 1
# 7 Y B 1
# 8 Z B 1
# 10 X C 1
# 11 Y C 1
# 12 Z C 1
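To match the requested Element / Relevant Field layout exactly, the unrolled table can be renamed, sorted by row name and stripped of the Freq count. A minimal sketch, with mm rebuilt from the question's data; the names long and res are only for illustration:

```r
mm <- matrix(c(1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L),
             ncol = 3, dimnames = list(c("W", "X", "Y", "Z"), c("A", "B", "C")))
# unroll the 0/1 matrix into one row per (row name, column name) pair
long <- as.data.frame(as.table(mm))
long <- long[long$Freq > 0, ]
# keep the two name columns, renamed to match the requested output
res <- data.frame(Element = as.character(long$Var1),
                  Relevant_Field = as.character(long$Var2))
# order by element, then by field, and renumber the rows
res <- res[order(res$Element, res$Relevant_Field), ]
rownames(res) <- NULL
print(res)
```

as.table() walks the matrix column by column, which is why the explicit order() step is needed to group the result by element.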
We can use base R methods (m1 is the 0/1 matrix from the question, e.g. mm above):
data.frame(Element = rep(rownames(m1), each = ncol(m1)),
Relevant_Field = rep(colnames(m1), nrow(m1)))[as.vector(t(m1))!=0,]
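The t(m1) is what makes the rep() labels line up: as.vector() reads a matrix column by column, so transposing it first yields the values row by row (all of W, then X, and so on), which is the same order produced by rep(rownames(m1), each = ncol(m1)) and rep(colnames(m1), nrow(m1)). A small check, with m1 rebuilt from the question's data and paste() used only to make the pairs visible:

```r
m1 <- matrix(c(1L, 0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L),
             ncol = 3, dimnames = list(c("W", "X", "Y", "Z"), c("A", "B", "C")))
# as.vector() reads a matrix column by column ...
by_col <- as.vector(m1)
# ... so transposing first makes it read row by row (all of W, then X, ...)
by_row <- as.vector(t(m1))
# these labels line up with by_row, element for element
labels <- paste(rep(rownames(m1), each = ncol(m1)), rep(colnames(m1), nrow(m1)))
labels[by_row != 0]
```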
Or with CJ
library(data.table)
CJ(Element = row.names(m1), Relevant_Field = colnames(m1))[as.vector(t(m1)!=0)]
# Element Relevant_Field
#1: W A
#2: W B
#3: X B
#4: X C
#5: Y A
#6: Y B
#7: Y C
#8: Z B
#9: Z C
Or, as @Frank suggested, we can melt (using reshape2) to a three-column dataset, convert it to a data.table and remove the 0 values:
library(reshape2)
setDT(melt(m1))[ value == 1 ][, value := NULL][]
Here is another base R method that uses which() and subsetting.
# get the positions of 1s in matrix (row / column) output
posMat <- which(mm==1, arr.ind=TRUE)
# build the data.frame
myDf <- data.frame(rowVals=rownames(mm)[posMat[, 1]],
colVals=colnames(mm)[posMat[, 2]])
Or build other structures:
# matrix
myMat <- cbind(rowVals=rownames(mm)[posMat[, 1]],
colVals=colnames(mm)[posMat[, 2]])
# vector with pipe separator
myVec <- paste(rownames(mm)[posMat[, 1]], colnames(mm)[posMat[, 2]], sep="|")
I have a very simple problem. I am trying to set the value of column X to 0 if Y[n] does not equal Y[n-1]. My issue is that I do not know how to reference a previous row value in R, and then use that value to set the value of another column.
As an example:
Y X
1 5
1 1
2 0
2 2
X[3,2] is 0 because Y[3,1] does not equal Y[2,1].
I basically need to find all instances of this in a large dataset and set the corresponding X value to 0.
data$X <- 0 if data$Y[n] != data$Y[n-1]
Is there a simple solution to this in R? It really feels as though there should be.
Thank you
Similarly to the post from @markus, with dplyr you can do:
library(dplyr)
df %>%
mutate(X = (Y == lag(Y, default = first(Y))) * X)
Y X
1 1 5
2 1 1
3 2 0
4 2 2
Given
Y <- c(1, 1, 2, 2)
X <- c(5, 1, 10, 2)
one option is diff:
X * (c(0, diff(Y)) == 0)
# [1] 5 1 0 2
The idea is to check whether Y[i] - Y[i-1] equals zero, which gives a logical vector that we multiply by X (the leading 0 keeps the first element unchanged).
Another base R option
with(df, X * c(TRUE, !(Y[-1] - Y[-length(Y)])))
#[1] 5 1 0 2
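Or using dplyr (duplicated() flags any earlier occurrence of a value, not just the previous row, so this matches the other answers only when equal Y values are contiguous, as they are here):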
library(dplyr)
df %>%
mutate(X = c(X[1], ((duplicated(Y) * X)[-1])))
# Y X
#1 1 5
#2 1 1
#3 2 0
#4 2 2
data
df <- structure(list(Y = c(1L, 1L, 2L, 2L), X = c(5L, 1L, 0L, 2L)),
class = "data.frame", row.names = c(NA, -4L))
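The lag()/diff() answers and the duplicated() answer agree on this data but not in general. A sketch with a hypothetical Y in which the value 1 reappears after a change:

```r
# Hypothetical data in which the value 1 reappears after a change
Y <- c(1, 2, 1, 1)
X <- c(5, 6, 7, 8)
# diff()/lag() logic: zero X wherever Y differs from the previous element
by_diff <- X * (c(0, diff(Y)) == 0)
# duplicated() logic: keep X wherever Y was seen anywhere earlier
by_dup <- c(X[1], (duplicated(Y) * X)[-1])
by_diff  # 5 0 0 8
by_dup   # 5 0 7 8
```

Only by_diff zeroes the third X, which is what the question asks for; duplicated() treats the returning 1 as a repeat because it has been seen before.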
I need some help on creating a special kind of subtraction.
I have a data table x and I must subtract two columns, say a and b.
However, either column may not exist.
If a column does not exist, its value in the subtraction should be set to zero.
So far, I have approached this problem by trying to define a new subtraction operator, %-%
Thus, for example, if x = data.table(a = 5, b = 2), then a %-% b should be 3, whereas a %-% d should be 5.
I have tried to define this subtraction operator as shown below. However, for some reason, my subtraction always yields zero! Can anyone help me understand what I am doing wrong and how I can correct my code?
library(data.table)
x = data.table(a = floor(10 * runif(5)), b = floor(10 * runif(5)), c =floor(10 * runif(5)))
`%-%` <- function(e1,e2, DT = x){
ifelse(is.numeric(substitute(e1, DT)), e1 <- substitute(e1, DT), e1 <- 0)
ifelse(is.numeric(substitute(e2, DT)), e2 <- substitute(e2, DT), e2 <- 0)
return(e1 - e2)
}
x[, d := a %-% b]
x
x[, d := a %-% d]
x
Many thanks!
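The zero comes from substitute(e1, DT): inside the function, the expression being substituted is the literal symbol e1, and since DT has no element called e1, substitute() returns that symbol unchanged. is.numeric() on a symbol is FALSE, so both operands are set to 0. A minimal reproduction, using a plain list as a stand-in for the data.table and a hypothetical get_col() helper that looks up the argument's name instead:

```r
# Reproduce the check from the question: substitute() sees the symbol e1
check <- function(e1, DT) substitute(e1, DT)
# a plain list standing in for the data.table x
DT <- list(a = c(5, 7), b = c(2, 3))
res <- check(a, DT)
print(res)              # e1: the literal symbol, not the column a
print(is.numeric(res))  # FALSE, so the question's code sets the operand to 0

# Hypothetical helper: capture the argument's name, then look it up in DT
get_col <- function(e1, DT) {
  nm <- deparse(substitute(e1))  # "a" when called as get_col(a, DT)
  if (nm %in% names(DT)) DT[[nm]] else 0
}
get_col(a, DT) - get_col(b, DT)  # 3 4
get_col(a, DT) - get_col(d, DT)  # 5 7, since d does not exist
```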
We can create a function with intersect for passing the column names into .SDcols, then Reduce by subtracting the corresponding rows of each column in .SD (Subset of Data.table)
f1 <- function(dat, .x, .y) intersect(names(dat), c(.x, .y))
x[, d := Reduce('-', .SD), .SDcols = f1(x, 'a', 'b')]
x[, e := Reduce(`-`, .SD), .SDcols = f1(x, 'a', 'f')]
x
# a b c d e
#1: 7 0 8 7 7
#2: 3 6 4 -3 3
#3: 9 9 8 0 9
#4: 3 6 2 -3 3
#5: 0 2 3 -2 0
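One caveat with f1: intersect() returns its result in the order of its first argument, so passing names(dat) first lets the column order of the data, not the order of .x and .y, decide the sign. A sketch comparing that with a variant that passes c(.x, .y) first (both helper names are hypothetical):

```r
library(data.table)
x <- data.table(a = c(7L, 3L, 9L), b = c(0L, 6L, 9L))
# intersect() keeps the order of its FIRST argument
f_data_order <- function(dat, .x, .y) intersect(names(dat), c(.x, .y))
f_arg_order  <- function(dat, .x, .y) intersect(c(.x, .y), names(dat))
f_data_order(x, "b", "a")  # "a" "b": the sign is decided by column order
f_arg_order(x, "b", "a")   # "b" "a": the sign follows the arguments
# ask for b - a with each helper
x[, by_data := Reduce(`-`, .SD), .SDcols = f_data_order(x, "b", "a")]
x[, by_arg  := Reduce(`-`, .SD), .SDcols = f_arg_order(x, "b", "a")]
print(x)
```

Keeping the argument order fixes the sign, but when .x is missing the result is still .y rather than -.y, because Reduce() over a single column returns that column unchanged.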
Or, if we want to keep the OP's function with unquoted arguments, use enquo() to capture each argument as a quosure and quo_name() to convert it back to a string (rlang functions that dplyr re-exports). Then intersect the names with the column names of the dataset and use - in Reduce:
library(dplyr)
`%-%` <- function(e1,e2, DT){
e1 <- quo_name(enquo(e1))
e2 <- quo_name(enquo(e2))
nm1 <- intersect(names(DT), c(e1, e2))
DT[, Reduce(`-`, .SD), .SDcols = nm1]
}
x[, d := `%-%`(a, b, .SD)]
x[, e := `%-%`(a, f, .SD)]
data
x <- structure(list(a = c(7L, 3L, 9L, 3L, 0L), b = c(0L, 6L, 9L, 6L,
2L), c = c(8L, 4L, 8L, 2L, 3L)), .Names = c("a", "b", "c"), row.names = c("1:",
"2:", "3:", "4:", "5:"), class = "data.frame")
setDT(x)
`%-%`=function(a,b){
DT=eval(sys.status()$sys.calls[[2]][[2]])
a=substitute(a)
b=substitute(b)
stopifnot(is.name(a),is.name(b),is.data.table(DT))
a=deparse(a)
b=deparse(b)
d=numeric(nrow(DT))
a=if(!exists(a,DT)) d else get(a,DT)
b=if(!exists(b,DT)) d else get(b,DT)
a-b
}
set.seed(5)
x = data.table(a = floor(10 * runif(5)), b = floor(10 * runif(5)), c =floor(10 * runif(5)))
x
a b c
1: 2 7 2
2: 6 5 4
3: 9 8 3
4: 2 9 5
5: 1 1 2
x[,a%-%b]
[1] -5 1 1 -7 0
x[,a%-%f]# f is treated as a column of zeros since it does not exist:
[1] 2 6 9 2 1
Or you can just do:
x[,c("d","e","f"):=.(a%-%b,a%-%h,g%-%h)]
x
a b c d e f
1: 2 7 2 -5 2 0
2: 6 5 4 1 6 0
3: 9 8 3 1 9 0
4: 2 9 5 -7 2 0
5: 1 1 2 0 1 0
This function is written to work on a data.table only. For example:
setDF(x)[,a%-%b]
Error: is.data.table(DT) is not TRUE
setDT(x)[,a%-%b]
[1] -5 1 1 -7 0
EDIT: This answer respects the order of the operands. (Most of the other answers do not pass this test.)
setDT(x)[,a%-%b]#Column subtract another
[1] -5 1 1 -7 0
setDT(x)[,b%-%a]#Reversing the order
[1] 5 -1 -1 7 0
setDT(x)[,b%-%b]#Column Subtract itself
[1] 0 0 0 0 0
setDT(x)[,a%-%f]#Column subtract a non-existing column
[1] 2 6 9 2 1
setDT(x)[,f%-%a]#a non-existing column subtract an existing column
[1] -2 -6 -9 -2 -1
x[,g%-%f] #subtract two non-existing columns
[1] 0 0 0 0 0
IIUC, you can try this way. We use the exists() function to check whether each column is available in the data.
# helper function: a column that does not exist is treated as zeros
do_sub <- function(df, col1 = 'a', col2 = 'b')
{
  v1 <- if (exists(col1, df)) df[[col1]] else 0
  v2 <- if (exists(col2, df)) df[[col2]] else 0
  return (v1 - v2)
}
# compute new columns (df is the data.table from the question)
df[, d := do_sub(.SD, col1 = 'a', col2 = 'b')]
df[, e := do_sub(.SD, col1 = 'a', col2 = 'f')]
print(df)
a b c d e
1: 7 0 8 7 7
2: 3 6 4 -3 3
3: 9 9 8 0 9
4: 3 6 2 -3 3
5: 0 2 3 -2 0
I'm trying to create a df that has the column number, row number and cell value of the assignment chosen by a constrained linear program (solved with lpSolve). column is the column picked for each row by the LP solution and row is the corresponding row; now I want the actual value from aa, and I'm having a hard time. If you're curious, the time values are gmapsdistance output in seconds.
What i'm getting
> h
column row time.1 time.2 time.3 time.4 time.5
1 1 1 8262 8262 8262 66357 66357
2 1 2 21386 21386 21386 73307 73307
3 1 3 30698 30698 30698 52547 52547
4 2 4 32711 32711 32711 53006 53006
5 2 5 66156 66156 66156 65205 65205
What I want is a single time column holding the value of aa at each row/column pair in h:
> h
column row time
1 1 1 8262
2 1 2 21386
3 1 3 30698
4 2 4 53006
5 2 5 65205
Here is a reproducible example:
library(lpSolve)
aa <- matrix(c(8262, 21386, 30698, 32711, 66156, 66357, 73307, 52547, 53006, 65205),
nrow=5,
ncol=2)
aa
# Solve a binary linear program on aa: each row goes to exactly one column, and each column gets at least 2 and at most 8 rows
gwide <- aa
k <- ncol(gwide)
n <- nrow(gwide)
dir <- "min"
objective.in <- c(gwide)
A <- t(rep(1, k)) %x% diag(n)
B <- diag(k) %x% t(rep(1, n))
const.mat <- rbind(A, B, B)
const.dir <- c(rep("==", n), rep(">=", k), rep("<=", k))
const.rhs <- c(rep(1, n), rep(2, k), rep(8, k))
res <- lp(dir, objective.in, const.mat, const.dir, const.rhs, all.bin = TRUE)
res
# reshape the LP solution into an n x k matrix
soln <- matrix(res$solution, n, k)
soln
column <- apply(soln, 1, which.max)
h <- as.data.frame(column)
h$row = 1:nrow(h)
h$time <- aa[h$row,c(h$column)] #this seems to be where the problem is
h
I thought h$time <- aa[h$row,c(h$column)] would return a new column named "time" with the value from aa based on the row and column from h but that didn't work out so well. I've been racking my brain for hours and have come up with nothing. Any thoughts?
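For reference, the constraint matrices in the question rely on Kronecker products (%x%). Since objective.in = c(gwide) stacks the columns of the cost matrix, A %*% c(X) gives the row sums of a 0/1 assignment X and B %*% c(X) its column sums, which is what the == 1 and the >= 2 / <= 8 constraints bound. A small check with a hypothetical assignment:

```r
n <- 5
k <- 2
A <- t(rep(1, k)) %x% diag(n)  # n x (n*k): sums each row across columns
B <- diag(k) %x% t(rep(1, n))  # k x (n*k): sums each column across rows
# Hypothetical 0/1 assignment: rows 1-3 to column 1, rows 4-5 to column 2
X <- matrix(0, n, k)
X[cbind(1:n, c(1, 1, 1, 2, 2))] <- 1
# c(X) stacks columns, matching how objective.in = c(gwide) is built
as.vector(A %*% c(X))  # row sums: 1 1 1 1 1
as.vector(B %*% c(X))  # column sums: 3 2
```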
One way is to loop through the rows of h and extract the values in aa using the row and column indices from h:
h$time <- apply( h, 1, function(x) aa[x[2], x[1]] )
h
# column row time
# 1 1 1 8262
# 2 1 2 21386
# 3 1 3 30698
# 4 2 4 53006
# 5 2 5 65205
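Looping is not strictly necessary. aa[h$row, h$column] returns every combination of the listed rows and columns (a 5 x 5 matrix here), which is why the question ends up with time.1 to time.5. Indexing aa with a two-column matrix of (row, column) pairs instead picks exactly one cell per pair. A sketch with aa and h rebuilt from the Data section:

```r
aa <- structure(c(8262, 21386, 30698, 32711, 66156, 66357, 73307, 52547,
                  53006, 65205), .Dim = c(5L, 2L))
h <- data.frame(column = c(1L, 1L, 1L, 2L, 2L), row = 1:5)
# aa[h$row, h$column] takes every combination of the row and column indices
dim(aa[h$row, h$column])  # 5 5
# a two-column (row, column) index matrix picks one cell per pair instead
h$time <- aa[cbind(h$row, h$column)]
print(h)
```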
Data:
aa <- structure(c(8262, 21386, 30698, 32711, 66156, 66357, 73307, 52547,
53006, 65205), .Dim = c(5L, 2L))
h <- structure(list(column = c(1L, 1L, 1L, 2L, 2L), row = 1:5), .Names = c("column",
"row"), row.names = c(NA, -5L), class = "data.frame")
I have a binary data set which looks like
a b c d
r1 1 1 0 0
r2 0 1 1 0
r3 1 0 0 1
And a vector
V <- c("a", "c")
I want a command that searches the column names and changes values in those columns, for example changing 1 to A. So the output would be:
a b c d
r1 A 1 0 0
r2 0 1 A 0
r3 A 0 0 1
Here is a vectorized way to do it:
df[names(df) %in% V] <- replace(df[names(df) %in% V], df[names(df) %in% V] == 1, 'A')
#or avoid calling the %in% part 3 times by assigning it, i.e.
i1 <- names(df) %in% V
df[i1] <- replace(df[i1], df[i1] == 1, 'A')
#or a simpler syntax, courtesy of @Cath:
df[, V][df[, V]==1] <- "A"
which gives,
a b c d
r1 A 1 0 0
r2 0 1 A 0
r3 A 0 0 1
A solution with dplyr:
library(dplyr)
V <- c("a", "c")
df %>%
mutate_at(V, ~replace(.x, .x == 1, 'A'))
# a b c d
# r1 A 1 0 0
# r2 0 1 A 0
# r3 A 0 0 1
mutate_at takes a data frame and a vector of column names and applies the specified function to each of those columns.
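In dplyr 1.0.0 and later, mutate_at() is superseded by across() inside mutate(). A sketch of the same replacement in that style, assuming a recent dplyr and rebuilding df from the DATA section:

```r
library(dplyr)
df <- data.frame(a = c(1L, 0L, 1L), b = c(1L, 1L, 0L),
                 c = c(0L, 1L, 0L), d = c(0L, 0L, 1L),
                 row.names = c("r1", "r2", "r3"))
V <- c("a", "c")
# across() applies the function to every column named in V;
# all_of() makes the selection by a character vector explicit
out <- df %>%
  mutate(across(all_of(V), ~ replace(.x, .x == 1, "A")))
print(out)
```

As with mutate_at, the modified columns become character, because "A" cannot be stored in an integer column.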
DATA
df <- structure(list(a = c(1L, 0L, 1L), b = c(1L, 1L, 0L),
c = c(0L, 1L, 0L), d = c(0L, 0L, 1L)),
.Names = c("a", "b", "c", "d"),
class = "data.frame", row.names = c("r1", "r2", "r3"))
If the left-hand side (LHS) and right-hand side (RHS) are of the same type, data.table can update only the selected "cells" in place, i.e., without copying the whole column:
library(data.table)
setDT(df)
for (s in V) df[get(s) == 1L, (s) := 99L] # replacement value is of type integer
df[]
a b c d
1: 99 1 0 0
2: 0 1 99 0
3: 99 0 0 1
To verify that only selected rows in each column are updated, we can check the addresses of each column before and after the update using:
df[, lapply(.SD, address), .SDcols = V]
(In addition, the verbose mode can be switched on by options(datatable.verbose = TRUE).)
If LHS and RHS are of different types, a type conversion is required anyway, so the whole column has to be replaced:
df[, (V) := lapply(.SD, function(x) replace(x, x == 1L, "A")), .SDcols = V]
df
a b c d
1: A 1 0 0
2: 0 1 A 0
3: A 0 0 1
Using address() shows that each of the affected columns has been copied, but only those: the other columns are untouched. This differs from the other answers posted so far, where the whole data frame is copied.
DF = structure(list(a = c(1L, 2L, 5L), b = c(2L, 3L, 3L), c = c(3L, 1L, 2L)), .Names = c("a", "b", "c"), row.names = c(NA, -3L), class = "data.frame")
a b c
1 2 3
2 3 1
5 3 2
How do I create additional columns, each including the names or indices of the columns of the row minimum, middle and maximum as follows?
a b c min middle max
1 2 3 a b c
2 3 1 c a b
5 3 2 c b a
One approach would be to loop through the rows with apply, returning the column names in the indicated order:
cbind(DF, t(apply(DF, 1, function(x) setNames(names(DF)[order(x)],
c("min", "middle", "max")))))
# a b c min middle max
# 1 1 2 3 a b c
# 2 2 3 1 c a b
# 3 5 3 2 c b a
This solution assumes you have exactly three columns (so the middle is the second largest). If that is not the case, you could generalize to any number of columns with the following modification:
cbind(DF, t(apply(DF, 1, function(x) {
ord <- order(x)
setNames(names(DF)[c(ord[1], ord[(length(x)+1)/2], tail(ord, 1))],
c("min", "middle", "max"))
})))
# a b c min middle max
# 1 1 2 3 a b c
# 2 2 3 1 c a b
# 3 5 3 2 c b a
Since the OP mentioned data.table, here is one way with it. Convert the data.frame to a data.table (setDT(DF)), group by row number, unlist each row, order its values, use that order to index the column names, and assign the three new columns (after converting to a list).
library(data.table)
setDT(DF)[, c('min', 'middle', 'max') :=
as.list(names(DF)[order(unlist(.SD))]) ,1:nrow(DF)][]
# a b c min middle max
#1: 1 2 3 a b c
#2: 2 3 1 c a b
#3: 5 3 2 c b a