Creating a dataframe based on the matching of 2 lists - r

I have two lists (one or more of the columns of these list will be a list as well) and I am matching values between each only where the column names match. Then based on matching, I would like the values either TRUE or FALSE outputted to a DF. I was using a loop in my code and ignoring the fist column which is an ID column. I provided some code to reproduce the example and desired output, but my script is more specific to my data so you can get a better idea of what I'm trying to accomplish.
List1 <- list(ID = 123, L1 = 89, L2 = 2, L3 = c(2.5, 3.5, 4))
List2 <- list(income = c(1,2,3,4,5), L1 = c(0,0,0,89,0), L10 = c(3,3,3,3,3),
L2 = c(2,55,55,55,55), L3 = c(2.5, 8, 8, 4, 8))
Desired results:
DF1
income L1 L10 L2 L3
1 0 0 1 1
2 0 0 0 0
3 0 0 0 0
4 1 0 0 1
5 0 0 0 0
My code:
NBPmatrix <- function(x){
nbp <- matrix(0, nrow = nrow(black), ncol = ncol(black))
colnames(nbp) <- names(black)
index <- nbp[, match(colnames(x), colnames(nbp), nomatch = 0)]
for(i in 2:length(nbp))
nbp[index] <- grepl(black[[i]], x[i], ignore.case = TRUE)
return(nbp)
}

Related

How to loop a function over all elements of a vector except one and store the result in separate columns of a data frame

I have a data frame with several columns. I want to run a function [pmax() in this case] over all columns whose name is stored in a vector except one, and store the result in new separate columns. At the end, I would also like to store the names of all new columns in a separate vector. A minimal example would be:
Name <- c("Case 1", "Case 2", "Case 3", "Case 4", "Case 5")
C1 <- c(1, 0, 1, 1, 0)
C2 <- c(0, 1, 1, 1, 0)
C3 <- c(0, 1, 0, 0, 0)
C4 <- c(1, 1, 0, 1, 0)
Data <- data.frame(Name, C1, C2, C3, C4)
var.min <- function(data, col.names){
new.df <- data
# This is how I would do it outside a function and without loop:
new.df$max.def.col.exc.1 <- pmax(new.df$C2, new.df$C3)
new.df$max.def.col.exc.2 <- pmax(new.df$C1, new.df$C3)
new.df$max.def.col.exc.3 <- pmax(new.df$C1, new.df$C2)
new.columns <- c("max.def.col.exc.1", "max.def.col.exc.2", "max.def.col.exc.3")
return(new.df)
}
new.df <- var.min(Data,
col.names= c("C1", "C2", "C3"))
The result should look like:
Name C1 C2 C3 C4 max.def.col.exc.1 max.def.col.exc.2 max.def.col.exc.3
1 Case 1 1 0 0 1 0 1 1
2 Case 2 0 1 1 1 1 1 1
3 Case 3 1 1 0 0 1 1 1
4 Case 4 1 1 0 1 1 1 1
5 Case 5 0 0 0 0 0 0 0
Anyone with an idea? Many thanks in advance!
Here is a base R solution with combn. It gets all pairwise combinations of the column names and calls a function computing pmax.
Note that the order of the expected output columns is the same as the one output by the code below. If the columns vector is c("C1", "C2", "C3"), the order will be different.
Note also that the function is now a one-liner and accepts combinations of any number of columns, 2, 3 or more.
var.min <- function(cols, data) Reduce(pmax, data[cols])
cols <- c("C3", "C2", "C1")
combn(cols, 2, var.min, data = Data)
# [,1] [,2] [,3]
#[1,] 0 1 1
#[2,] 1 1 1
#[3,] 1 1 1
#[4,] 1 1 1
#[5,] 0 0 0
Now it's just a matter of assigning column names and cbinding with the input data.
tmp <- combn(cols, 2, var.min, data = Data)
colnames(tmp) <- paste0("max.def.col.exc.", seq_along(cols))
Data <- cbind(Data, tmp)
rm(tmp) # final clean-up

Create a boolean column based on data in another column in R

I have a data set and would like to do two things:
Set certain row values in Col A to 0 based on values in Col B
Create a new column with values of either 0 or 1 based on the edited values in Col A
My current approach is shown below - the issue is I occasionally get an error:
Error in `[<-.data.frame`(`*tmp*`, "OCS_dose", value = 0) :
replacement has 1 row, data has 0
As the numbers that I am generating are randomly selected and on certain trials there are no rows to update in Col A based on the numbers in Col B.
Here is an example of my code that causes the error:
pbo_IFNlow_data[pbo_IFNlow_data$OCS_status == 0,]['OCS_dose'] <- 0
OCS_status is either a 0 or 1 that is generated using:
pbo_OCS_status_low <- sample(c(0,1), replace = TRUE,
size = pbo_n_IFNlow, prob=c(1-.863, 0.863))
Therefore on occasion, I have no 0's... In my mind R should then just not try to update anything.
Is there a better way to do what I am trying to do?
Here is a more complete segment of my code:
pbo_OCS_status_low <- sample(c(0,1), replace = TRUE, size = pbo_n_IFNlow, prob=c(1-.863, 0.863)) #on OCS = 1
#OCS dose
pbo_OCS_dose_low <- rtruncnorm(pbo_n_IFNlow, a=0, b=Inf, mean=12.8, sd=8.1)
#IFN boolean flag
pbo_IFN_low <- rep(0, pbo_n_IFNlow)
#SLEDAI score
pbo_SLEDAI_low <- rtruncnorm(pbo_n_IFNlow, a=0, b=Inf, mean=11.1, sd=4.4)
#Response criteria met for SRI score reduction
pbo_SRI_low <- sample(c(0,1), replace = TRUE, size = pbo_n_IFNlow, prob=c(1-0.423, 0.423))
pbo_IFNlow_data <- cbind(IFN_status=pbo_IFN_low,
OCS_status=pbo_OCS_status_low,
OCS_dose=pbo_OCS_dose_low,
SLEDAI=pbo_SLEDAI_low,
SRI_response=pbo_SRI_low)
pbo_IFNlow_data <- data.frame(pbo_IFNlow_data)
#set those off OCS to 0
pbo_IFNlow_data[pbo_IFNlow_data$OCS_status == 0,]['OCS_dose'] <- 0
#stratifcation factor for OCS dosage
pbo_IFNlow_data$OCS_lessthan10 <- "temp"
pbo_IFNlow_data[pbo_IFNlow_data$OCS_dose < 10, ]['OCS_lessthan10'] <- 1
pbo_IFNlow_data[pbo_IFNlow_data$OCS_dose >= 10, ]['OCS_lessthan10'] <- 0
#stratification factor for SLE score
pbo_IFNlow_data$SLE_lessthan10 <- "temp"
pbo_IFNlow_data[pbo_IFNlow_data$SLEDAI < 10, ]['SLE_lessthan10'] <- 1
pbo_IFNlow_data[pbo_IFNlow_data$SLEDAI >= 10, ]['SLE_lessthan10'] <- 0
It would be easier if we can have a minimal reproducible example. If I understand your question correctly, you may want to try ifelse statement in R?
df <- data.frame(colA = seq(1, 10), colB = seq(11, 20))
# Set certain row values in Col A to 0 based on values in Col B
df$colA <- ifelse(df$colB > 15, 0, df$colB)
# Create a new column with values of either 0
# or 1 based on the edited values in Col A
df$colC <- ifelse(df$colA == 0, 1, 0)
print(df)
## colA colB colC
## 1 11 11 0
## 2 12 12 0
## 3 13 13 0
## 4 14 14 0
## 5 15 15 0
## 6 0 16 1
## 7 0 17 1
## 8 0 18 1
## 9 0 19 1
## 10 0 20 1

Transform categorical attribute vector into similarity matrix

I need to transfrom a categorical attribute vector into a "same attribute matrix" using R.
For example I have a vector which reports gender of N people (male = 1, female = 0). I need to convert this vector into a NxN matrix named A (with people names on rows and columns), where each cell Aij has the value of 1 if two persons (i and j) have the same gender and 0 otherwise.
Here is an example with 3 persons, first male, second female, third male, which produce this vector:
c(1, 0, 1)
I want to transform it into this matrix:
A = matrix( c(1, 0, 1, 0, 1, 0, 1, 0, 1), nrow=3, ncol=3, byrow = TRUE)
Like lmo said in acomment it's impossible to know the structure of your dataset so what follows is just an example for you to see how it could be done.
First, make up some data.
set.seed(3488) # make the results reproducible
x <- LETTERS[1:5]
y <- sample(0:1, 5, TRUE)
df <- data.frame(x, y)
Now tabulate it according to your needs
A <- outer(df$y, df$y, function(a, b) as.integer(a == b))
dimnames(A) <- list(df$x, df$x)
A
# A B C D E
#A 1 1 1 0 0
#B 1 1 1 0 0
#C 1 1 1 0 0
#D 0 0 0 1 1
#E 0 0 0 1 1

Converting counts to individual observations in r

I have a data set that looks as follows
df <- data.frame( name = c("a", "b", "c"),
judgement1= c(5, 0, NA),
judgement2= c(1, 1, NA),
judgement3= c(2, 1, NA))
I want to reshape the dataframe to look like this
# name judgement1 judgement2 judgement3
# a 1 0 0
# a 1 0 0
# a 1 0 0
# a 1 0 0
# a 1 0 0
# b 1 0 0
# b 0 1 0
# b 0 0 1
And so on. I have seen that untable is recommended on some other threads, but it does not appear to work with the current version of r. Is there a package that can convert summarised counts into individual observations?
You could try something like this:
df <- data.frame( name = c("a", "b", "c"),
judgement1= c(5, 0, NA),
judgement2= c(1, 1, NA),
judgement3= c(2, 1, NA))
rep.vec <- colSums(df[colnames(df) %in% paste0("judgement", (1:nrow(df)), sep="")], na.rm = TRUE)
want <- data.frame(name=df$name, cbind(diag(nrow(df))))
colnames(want)[-1] <- paste0("judgement", (1:nrow(df)), sep="")
(want <- want[rep(1:nrow(want), rep.vec), ])
I wrote a function that works to give you your desired output:
untabl <- function(df, id.col, count.cols) {
df[is.na(df)] <- 0 # replace NAs
out <- lapply(count.cols, function(x) { # for each column with counts
z <- df[rep(1:nrow(df), df[,x]), ] # replicate rows
z[, -c(id.col)] <- 0 # set all other columns to zero
z[, x] <- 1 # replace the count values with 1
z
})
out <- do.call(rbind, out) # combine the list
out <- out[order(out[,c(id.col)]),] # reorder (you can change this)
rownames(out) <- NULL # return to simple row numbers
out
}
untabl(df = df, id.col = 1, count.cols = c(2,3,4))
# name judgement1 judgement2 judgement3
#1 a 1 0 0
#2 a 1 0 0
#3 a 1 0 0
#4 a 1 0 0
#5 a 1 0 0
#6 a 0 1 0
#7 b 0 1 0
#8 a 0 0 1
#9 a 0 0 1
#10 b 0 0 1
And for your reference, reshape::untable consists of the following code:
function (df, num)
{
df[rep(1:nrow(df), num), ]
}

R: print certain values of a matrix to a csv-file

I have a matrix with 1 and 0 in it. Now I want to create a csv-file with the following syntax where only the values=1 were printed:
j1.i1, 1
j1.i2, 1
j2.i2, 1
...
j1 should be the name of the row 1
i1 should be the name of column 1
and so on...
Edit:
M1 = matrix(c(1, 0, 1, 0, 1, 0), nrow=2, ncol=3, byrow = TRUE)
row.names(M1) <- c(100, 101)
colnames(M1) <- c("A", "B", "C")
M1
A B C
100 1 0 1
101 0 1 0
If we take this easy example the solution i'm looking for is:
100.A, 1
100.C, 1
101.B, 1

Resources