R: print certain values of a matrix to a CSV file

I have a matrix containing only 1s and 0s. I want to create a CSV file in the following format, in which only the entries equal to 1 are printed:
j1.i1, 1
j1.i2, 1
j2.i2, 1
...
Here j1 should be the name of row 1,
i1 should be the name of column 1,
and so on.
Edit:
M1 = matrix(c(1, 0, 1, 0, 1, 0), nrow=2, ncol=3, byrow = TRUE)
row.names(M1) <- c(100, 101)
colnames(M1) <- c("A", "B", "C")
M1
    A B C
100 1 0 1
101 0 1 0
For this simple example, the output I'm looking for is:
100.A, 1
100.C, 1
101.B, 1
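A minimal base R sketch of one way to produce this output, assuming the matrix has row and column names as in the example. The names idx, keys, lines and out_file are illustrative and not from the question; out_file points to a temporary file here. which(..., arr.ind = TRUE) returns the row and column index of every cell equal to 1.

```r
# Example matrix from the question
M1 <- matrix(c(1, 0, 1, 0, 1, 0), nrow = 2, ncol = 3, byrow = TRUE)
rownames(M1) <- c(100, 101)
colnames(M1) <- c("A", "B", "C")

# Row/column positions of all cells equal to 1
idx <- which(M1 == 1, arr.ind = TRUE)
# which() walks the matrix column by column; sort by row, then column
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

# Build "rowname.colname" keys and the lines to write
keys  <- paste(rownames(M1)[idx[, "row"]], colnames(M1)[idx[, "col"]], sep = ".")
lines <- paste0(keys, ", 1")

# Hypothetical output location; replace with a real path
out_file <- file.path(tempdir(), "ones.csv")
writeLines(lines, out_file)
```

Sorting idx is only needed if the lines should be grouped by row, as in the expected output above.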

Related

How to loop a function over all elements of a vector except one and store the result in separate columns of a data frame

I have a data frame with several columns. I want to run a function [pmax() in this case] over all columns whose name is stored in a vector except one, and store the result in new separate columns. At the end, I would also like to store the names of all new columns in a separate vector. A minimal example would be:
Name <- c("Case 1", "Case 2", "Case 3", "Case 4", "Case 5")
C1 <- c(1, 0, 1, 1, 0)
C2 <- c(0, 1, 1, 1, 0)
C3 <- c(0, 1, 0, 0, 0)
C4 <- c(1, 1, 0, 1, 0)
Data <- data.frame(Name, C1, C2, C3, C4)
var.min <- function(data, col.names){
  new.df <- data
  # This is how I would do it outside a function and without loop:
  new.df$max.def.col.exc.1 <- pmax(new.df$C2, new.df$C3)
  new.df$max.def.col.exc.2 <- pmax(new.df$C1, new.df$C3)
  new.df$max.def.col.exc.3 <- pmax(new.df$C1, new.df$C2)
  new.columns <- c("max.def.col.exc.1", "max.def.col.exc.2", "max.def.col.exc.3")
  return(new.df)
}
new.df <- var.min(Data,
                  col.names = c("C1", "C2", "C3"))
The result should look like:
    Name C1 C2 C3 C4 max.def.col.exc.1 max.def.col.exc.2 max.def.col.exc.3
1 Case 1  1  0  0  1                 0                 1                 1
2 Case 2  0  1  1  1                 1                 1                 1
3 Case 3  1  1  0  0                 1                 1                 1
4 Case 4  1  1  0  1                 1                 1                 1
5 Case 5  0  0  0  0                 0                 0                 0
Anyone with an idea? Many thanks in advance!
Here is a base R solution with combn. It gets all pairwise combinations of the column names and calls a function computing pmax.
Note that the columns vector is given in reverse order, c("C3", "C2", "C1"), so that combn produces the pairs (C3, C2), (C3, C1), (C2, C1), which matches the order of the expected output columns. If the columns vector is c("C1", "C2", "C3"), the order will be different.
Note also that the function is now a one-liner and accepts combinations of any number of columns, 2, 3 or more.
var.min <- function(cols, data) Reduce(pmax, data[cols])
cols <- c("C3", "C2", "C1")
combn(cols, 2, var.min, data = Data)
#     [,1] [,2] [,3]
#[1,]    0    1    1
#[2,]    1    1    1
#[3,]    1    1    1
#[4,]    1    1    1
#[5,]    0    0    0
Now it's just a matter of assigning column names and cbinding with the input data.
tmp <- combn(cols, 2, var.min, data = Data)
colnames(tmp) <- paste0("max.def.col.exc.", seq_along(cols))
Data <- cbind(Data, tmp)
rm(tmp) # final clean-up
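As an alternative, here is a hedged sketch that makes the "all columns except one" step explicit with setdiff and also returns the names of the new columns, which the question asked for. The function name max_excluding and the list layout of its return value are assumptions made for illustration, not part of the answer above.

```r
Data <- data.frame(
  Name = paste("Case", 1:5),
  C1 = c(1, 0, 1, 1, 0),
  C2 = c(0, 1, 1, 1, 0),
  C3 = c(0, 1, 0, 0, 0),
  C4 = c(1, 1, 0, 1, 0)
)

# Hypothetical helper: for each column in col.names, take the parallel
# maximum over all the *other* columns in col.names.
max_excluding <- function(data, col.names) {
  new.columns <- paste0("max.def.col.exc.", seq_along(col.names))
  for (i in seq_along(col.names)) {
    keep <- setdiff(col.names, col.names[i])        # drop the i-th column
    data[[new.columns[i]]] <- Reduce(pmax, data[keep])
  }
  list(data = data, new.columns = new.columns)
}

res <- max_excluding(Data, c("C1", "C2", "C3"))
res$data         # input data plus the three new columns
res$new.columns  # names of the added columns
```

Here the column order follows col.names directly, so c("C1", "C2", "C3") gives exc.1, exc.2, exc.3 in the expected order.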

Transform categorical attribute vector into similarity matrix

I need to transform a categorical attribute vector into a "same attribute matrix" using R.
For example, I have a vector which reports the gender of N people (male = 1, female = 0). I need to convert this vector into an NxN matrix named A (with people's names on rows and columns), where each cell Aij has the value 1 if the two persons (i and j) have the same gender and 0 otherwise.
Here is an example with 3 persons, the first male, the second female, the third male, which gives this vector:
c(1, 0, 1)
I want to transform it into this matrix:
A = matrix( c(1, 0, 1, 0, 1, 0, 1, 0, 1), nrow=3, ncol=3, byrow = TRUE)
As lmo said in a comment, it's impossible to know the structure of your dataset, so what follows is just an example of how it could be done.
First, make up some data.
set.seed(3488) # make the results reproducible
x <- LETTERS[1:5]
y <- sample(0:1, 5, TRUE)
df <- data.frame(x, y)
Now tabulate it according to your needs
A <- outer(df$y, df$y, function(a, b) as.integer(a == b))
dimnames(A) <- list(df$x, df$x)
A
#  A B C D E
#A 1 1 1 0 0
#B 1 1 1 0 0
#C 1 1 1 0 0
#D 0 0 0 1 1
#E 0 0 0 1 1
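Applied to the three-person vector from the question, the same outer() idea can be sketched as follows. The person names P1, P2, P3 are placeholders, since the question does not give any.

```r
gender <- c(1, 0, 1)               # male = 1, female = 0
people <- c("P1", "P2", "P3")      # hypothetical names

# Compare every pair of entries; TRUE/FALSE becomes 1/0
A <- outer(gender, gender, function(a, b) as.integer(a == b))
dimnames(A) <- list(people, people)
A
```

The result has 1 wherever the two persons share the same value, including the diagonal.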

Converting counts to individual observations in r

I have a data set that looks as follows
df <- data.frame( name = c("a", "b", "c"),
judgement1= c(5, 0, NA),
judgement2= c(1, 1, NA),
judgement3= c(2, 1, NA))
I want to reshape the dataframe to look like this
# name judgement1 judgement2 judgement3
# a    1          0          0
# a    1          0          0
# a    1          0          0
# a    1          0          0
# a    1          0          0
# b    1          0          0
# b    0          1          0
# b    0          0          1
And so on. I have seen untable recommended in some other threads, but it does not appear to work with the current version of R. Is there a package that can convert summarised counts into individual observations?
You could try something like this:
df <- data.frame( name = c("a", "b", "c"),
judgement1= c(5, 0, NA),
judgement2= c(1, 1, NA),
judgement3= c(2, 1, NA))
rep.vec <- colSums(df[colnames(df) %in% paste0("judgement", (1:nrow(df)), sep="")], na.rm = TRUE)
want <- data.frame(name=df$name, cbind(diag(nrow(df))))
colnames(want)[-1] <- paste0("judgement", (1:nrow(df)), sep="")
(want <- want[rep(1:nrow(want), rep.vec), ])
I wrote a function that works to give you your desired output:
untabl <- function(df, id.col, count.cols) {
  df[is.na(df)] <- 0                        # replace NAs
  out <- lapply(count.cols, function(x) {   # for each column with counts
    z <- df[rep(1:nrow(df), df[,x]), ]      # replicate rows
    z[, -c(id.col)] <- 0                    # set all other columns to zero
    z[, x] <- 1                             # replace the count values with 1
    z
  })
  out <- do.call(rbind, out)                # combine the list
  out <- out[order(out[,c(id.col)]),]       # reorder (you can change this)
  rownames(out) <- NULL                     # return to simple row numbers
  out
}
untabl(df = df, id.col = 1, count.cols = c(2,3,4))
#   name judgement1 judgement2 judgement3
#1     a          1          0          0
#2     a          1          0          0
#3     a          1          0          0
#4     a          1          0          0
#5     a          1          0          0
#6     a          0          1          0
#7     a          0          0          1
#8     a          0          0          1
#9     b          0          1          0
#10    b          0          0          1
And for your reference, reshape::untable consists of the following code:
function (df, num)
{
  df[rep(1:nrow(df), num), ]
}
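To see what that row-replication idiom does on its own, here is a small sketch with a single count column. The data frame counts and its column n are made up for illustration.

```r
# Hypothetical summarised data: one row per name with a count
counts <- data.frame(name = c("a", "b", "c"), n = c(3, 1, 0))

# Repeat each row index n times, then index the data frame with it
expanded <- counts[rep(seq_len(nrow(counts)), counts$n), ]
rownames(expanded) <- NULL   # drop the 1, 1.1, 1.2, ... row names
expanded
```

Rows with a count of 0 disappear, which is why the function above first replaces NA counts with 0.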

Create a vector of counts

I wanted to create a vector of counts if possible.
For example: I have a vector
x <- c(3, 0, 2, 0, 0)
How can I create a frequency vector for all integers between 0 and 3? Ideally I wanted to get a vector like this:
> 3 0 1 1
which gives me the counts of 0, 1, 2, and 3 respectively.
Much appreciated!
You can do
table(factor(x, levels=0:3))
Simply using table(x) is not enough, because it drops values that do not occur in x (here, 1).
Or with tabulate, which is faster:
tabulate(factor(x, levels = min(x):max(x)))
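A short sketch comparing the two calls on the vector from the question. as.vector() is used only to strip the table names so that a plain count vector remains; the variable names are illustrative.

```r
x <- c(3, 0, 2, 0, 0)

# table() on a factor with explicit levels keeps the zero count for 1
counts_table <- as.vector(table(factor(x, levels = 0:3)))

# tabulate() counts positive integers, so shift the values by 1
counts_tab <- tabulate(x + 1, nbins = 4)

counts_table
counts_tab
```

Both give the counts of 0, 1, 2 and 3 in that order.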
You can do this using rle (I made this in minutes, so sorry if it's not optimized enough).
x = c(3, 0, 2, 0, 0)
r = rle(x)
f = function(x) sum(r$lengths[r$values == x])
s = sapply(FUN = f, X = as.list(0:3))
data.frame(x = 0:3, freq = s)
#> data.frame(x = 0:3, freq = s)
# x freq
#1 0 3
#2 1 0
#3 2 1
#4 3 1
You can just use table():
a <- table(x)
a
#x
#0 2 3
#3 1 1
Then you can subset it:
a[names(a)==0]
#0
#3
Or convert it into a data.frame if you're more comfortable working with that:
u <- as.data.frame(table(x))
u
#  x Freq
#1 0    3
#2 2    1
#3 3    1
Edit 1:
For levels:
a <- as.data.frame(table(factor(x, levels = 0:3)))

Imputing labels based on a comparison of columns

I don't think this question has been asked on this board before. I have two columns of 1s and 0s in a dataframe. Let's call these columns X and Y, respectively. In a comparison of X and Y for any row, one of four combinations is obviously possible:
A: 1, 0
B: 0, 1
C: 1, 1
D: 0, 0
Imagine the dataframe has m columns total, but we're interested only in X and Y. I'd like to write a function that compares only X and Y and then characterizes the particular combination with the corresponding labels A, B, C, or D in a new column (let's call it Z).
So say the data looks like:
X Y
1 1
0 1
0 0
1 1
The function will output:
X Y Z
1 1 C
0 1 B
0 0 D
1 1 C
I imagine this would be trivial but I'm an R newbie. Thanks for any guidance!
We create a key/value lookup dataset with one row per unique (X, Y) combination and then merge it with the input dataset on the 'X' and 'Y' columns. Note that merge sorts the result by the key columns.
merge(df1, keyDat, by = c("X", "Y"), all.x = TRUE)
#  X Y Z
#1 0 0 D
#2 0 1 B
#3 1 1 C
#4 1 1 C
Or to get the output in the same order, use left_join
library(dplyr)
left_join(df1, keyDat)
#Joining by: c("X", "Y")
#  X Y Z
#1 1 1 C
#2 0 1 B
#3 0 0 D
#4 1 1 C
data
keyDat <- data.frame(X = c(1, 0, 1, 0), Y = c(0, 1, 1, 0),
                     Z = c("A", "B", "C", "D"), stringsAsFactors = FALSE)
df1 <- data.frame(X= c(1, 0, 0, 1), Y=c(1, 1, 0, 1))
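If you would rather avoid a join, the same lookup can be sketched in base R with match(), which keeps the original row order. This is an alternative to the answer above, not part of it; the pasted key strings are just one simple way to compare two columns at once.

```r
keyDat <- data.frame(X = c(1, 0, 1, 0), Y = c(0, 1, 1, 0),
                     Z = c("A", "B", "C", "D"), stringsAsFactors = FALSE)
df1 <- data.frame(X = c(1, 0, 0, 1), Y = c(1, 1, 0, 1))

# Position of each (X, Y) pair of df1 within the lookup table
idx <- match(paste(df1$X, df1$Y), paste(keyDat$X, keyDat$Y))

# Pull the matching label; pairs missing from keyDat would get NA
df1$Z <- keyDat$Z[idx]
df1
```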
