Assign values to specific rows (R)

I have a df of 16k+ items. I want to assign values A, B and C to those items based on their row numbers.
Example:
I have the following df with 10 unique items
df <- data.frame(n = 1:10)
Now I have three separate vectors (A, B, C) that contain the row numbers of the df that should get value A, B or C:
A <- c(3, 9)
B <- c(2, 6, 8)
C <- c(1, 4, 5, 7, 10)
Now I want to add a new category column to the df and assign values A, B and C based on the row numbers in these three vectors. For example, I would like to assign value C to rows 1, 4, 5, 7 and 10 of the df.
I tried experimenting with for loops and if statements to match the values of the vectors with the row numbers of the df, but I didn't succeed. Can anybody help out?

Here is a way to assign the new column.
Create the data frame and a list of vectors:
df <- data.frame(n=1:10)
dat <- list( A=c(3, 9), B=c(2, 6, 8), C=c(1, 4, 5, 7, 10) )
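Put the data in the desired rows. unlist() names each element after its list entry plus a running index (A1, A2, B1, ...), and sub() strips that index so only the category name is left:
df$new[unlist(dat)] <- sub("[0-9].*$","",names(unlist(dat)))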
Result:
df
n new
1 1 C
2 2 B
3 3 A
4 4 C
5 5 C
6 6 B
7 7 C
8 8 B
9 9 A
10 10 C
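A variant of the same idea, sketched assuming dat is the named list defined above: base R's stack() returns the row numbers and their category names as two aligned columns (values and ind), so the labels do not have to be recovered from the unlist() names with a regular expression. That also avoids surprises if a category name itself contains digits (the regex would cut "Group2" down to "Group"). The name idx is just illustrative.

```r
df <- data.frame(n = 1:10)
dat <- list(A = c(3, 9), B = c(2, 6, 8), C = c(1, 4, 5, 7, 10))

# stack() flattens the named list into a data frame with two columns:
# 'values' (the row numbers) and 'ind' (the list name, as a factor)
idx <- stack(dat)

df$new <- NA_character_                        # rows not listed stay NA
df$new[idx$values] <- as.character(idx$ind)
df
```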

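You could iterate over the names of a list and assign each name to the rows indexed by the corresponding vector. Initialise the column first; otherwise the first assignment (rows 3 and 9) produces a vector of length 9, which does not fit the 10-row data frame:
dat <- list(A=A,B=B,C=C)
df$new <- NA_character_
for(i in names(dat)){ df$new[ dat[[i]] ] <- i}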
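With 16k+ rows it may be worth checking the index vectors before running the loop: if a row number appears in two vectors, the loop silently keeps the label of the later one, and rows that appear nowhere stay NA. A minimal sketch, assuming the same dat list; n_rows, overlapping, unassigned and out_of_range are illustrative names, and n_rows would be nrow(df) in practice.

```r
dat <- list(A = c(3, 9), B = c(2, 6, 8), C = c(1, 4, 5, 7, 10))
n_rows <- 10  # in practice: nrow(df)

idx <- unlist(dat, use.names = FALSE)

# Rows listed under more than one category (the loop would keep the last one)
overlapping <- unique(idx[duplicated(idx)])
# Rows that would be left as NA because no vector mentions them
unassigned <- setdiff(seq_len(n_rows), idx)
# Row numbers that do not exist in the data frame
out_of_range <- idx[idx < 1 | idx > n_rows]

# Stop early if any of the three checks finds a problem
stopifnot(length(overlapping) == 0,
          length(unassigned) == 0,
          length(out_of_range) == 0)
```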

Related

Labelling two separate data frames by row and then merging them into one data frame under the same column but with row labels preserved

I have two data sets, let's call them
A: 1, 3, 5, 6, 3
and
B: 2, 4, 7, 9, 8
where the numbers in A and B represent times.
I want to label every number in A as "positive"
and every number in B as "negative".
I then want to merge A and B into a single data frame with one column called "time", keeping the "positive"/"negative" label for each number so that I can plot both groups on a survival plot.
My guess is that you are looking for a data frame like this:
A <- data.frame(time = c(1, 3, 5, 6, 3), status= "positive")
B <- data.frame(time = c(2, 4, 7, 9, 8), status= "negative")
rbind(A, B)
Output:
time status
1 1 positive
2 3 positive
3 5 positive
4 6 positive
5 3 positive
6 2 negative
7 4 negative
8 7 negative
9 9 negative
10 8 negative
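If A and B start out as plain numeric vectors rather than data frames, the same two-column layout can be built in one step with rep(), which repeats each label as many times as its vector has values. A sketch assuming the values from the question; combined is an illustrative name.

```r
A <- c(1, 3, 5, 6, 3)
B <- c(2, 4, 7, 9, 8)

# One "time" column, plus a "status" label repeated once per value
combined <- data.frame(
  time   = c(A, B),
  status = rep(c("positive", "negative"), times = c(length(A), length(B))),
  stringsAsFactors = FALSE
)
combined
```

For the survival plot, status can then serve as the grouping variable, for example survfit(Surv(time) ~ status, data = combined) from the survival package, assuming every time is an observed event (no censoring column is given in the question).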

R data.table - multiply two columns whose components are matrices

I have a data.table with some numeric columns and another column whose entries are matrices. Here is an example:
library(data.table)
dt = data.table(a = c(1,2,3), b = c(-1,4,2))
dt$c = vector("list",3)
for (ind in 1:3){dt$c[[ind]] = round(matrix(10*runif(8), nrow = 4))}
For each row, I want to multiply the numeric vector formed by columns a and b with the corresponding 4 x 2 matrix in c and store the resulting 4 numbers into columns V1, V2, V3 and V4. For instance, for the first row, I would take the 4 x 2 matrix dt$c[[1]], multiply it with the 2 x 1 vector rbind(dt$a[1],dt$b[1]) and assign the resulting 4 numbers into the first row of 4 new columns named V1, V2, V3 and V4.
I am looking for a native data.table way to do this. I am currently looping over all rows, and that is prohibitively slow for my actual problem size. I tried a variety of data.table syntaxes, but I suspect I am missing something fundamental about how column c is internally treated as a list, so I cannot get the matrix multiplication to work.
Any help on this would be greatly appreciated.
We can use Map to multiply each row's matrix in c by the vector formed from the corresponding values of a and b:
dt[, paste0("V", 1:4) := do.call(rbind.data.frame,
Map(function(x, y, z) t(z %*% c(x, y)) , a, b, c))]
dt
# a b c V1 V2 V3 V4
#1: 1 -1 1, 7, 4,10, 9, 7, 4, 1,... -8 0 0 9
#2: 2 4 1, 8, 8, 4, 7, 1, 5,10,... 30 20 36 48
#3: 3 2 6, 6, 9, 5, 0, 0,10, 2,... 18 18 47 19
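If Map is still too slow, the per-row matrix product can be replaced by whole-vector arithmetic: for a 4 x 2 matrix z, z %*% c(x, y) equals x * z[, 1] + y * z[, 2]. The sketch below assumes every matrix is 4 x 2, and uses plain vectors a, b and a list mats as stand-ins for dt$a, dt$b and dt$c so it runs without data.table; col1, col2 and res are illustrative names. It checks itself against the straightforward one-product-per-row result.

```r
set.seed(1)
a <- c(1, 2, 3)
b <- c(-1, 4, 2)
mats <- lapply(1:3, function(i) round(matrix(10 * runif(8), nrow = 4)))

# Reference result: one 4 x 2 %*% 2 x 1 product per row
ref <- t(mapply(function(x, y, z) drop(z %*% c(x, y)), a, b, mats))

# Vectorised: z %*% c(x, y) == x * z[, 1] + y * z[, 2]
col1 <- t(vapply(mats, function(m) m[, 1], numeric(4)))  # n x 4, first columns
col2 <- t(vapply(mats, function(m) m[, 2], numeric(4)))  # n x 4, second columns
res <- a * col1 + b * col2   # a and b recycle down the rows
colnames(res) <- paste0("V", 1:4)
res
```

With the data.table from the question, something like dt[, paste0("V", 1:4) := as.data.frame(res)] should then attach the four columns; treat that as an untested suggestion.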

Matching dataframe columns: one int and another is list

I am trying to create a column in data frame df1 based on a match in another data frame df2, where df1 is much bigger than df2:
df1$val2 <- df2$val2[match(df1$id, df2$IDs)]
This doesn't quite work because the df2$IDs column is a list:
> df2
IDs val2
1 0 1
2 1, 2 2
3 3, 4 3
4 5, 6 4
5 7, 8 5
6 9, 10 6
7 11, 12, 13, 14 7
It only works for the row where the list has a single element (row 1, which str(df2) shows as ..$ : int 0). For all other rows, match(df1$id, df2$IDs) returns NA.
Matching an individual number works fine with double-bracket indexing:
2 %in% df2[[2,'IDs']]
So I either need to modify the df2$IDs column or perform the match operation differently. df1 has many other columns, as does df2, but df2 has far fewer rows.
The case can be reproduced with the following:
IDs <- c("[0]", "[1, 2]", "[3, 4]", "[5, 6]", "[7, 8]", "[9, 10]", "[11, 12, 13, 14]")
val2 <- c(1,2,3,4,5,6,7)
df2 <- data.frame(IDs, val2)
df2$IDs <- lapply(strsplit(as.character(df2$IDs), ','), function (x) as.integer(gsub("\\s|\\[|\\]", "", x)))
id <- floor(runif(100, min=0, max=15))
df1 <- data.frame(id)
str(df1)
str(df2)
df1$val2 <- df2$val2[match(df1$id, df2$IDs)]
List columns are clumsy to work with. If you convert df2 to a plain long format (one row per ID), match() works:
DF2 = with(df2, data.frame(ID = unlist(IDs), val2 = rep(val2, lengths(IDs))))
df1$m = DF2$val2[ match(df1$id, DF2$ID) ]
If you want the list column back just for browsing, it is quick to rebuild:
aggregate(ID ~ ., DF2, list)
val2 ID
1 1 0
2 2 1, 2
3 3 3, 4
4 4 5, 6
5 5 7, 8
6 6 9, 10
7 7 11, 12, 13, 14
FYI, the match approach does not extend naturally to joining on multiple columns, so you may eventually want to learn data.table and its "update join" syntax for this case:
library(data.table)
setDT(df1); setDT(df2)
DT2 = df2[, .(ID = unlist(IDs)), by=setdiff(names(df2), "IDs")]
df1[DT2, on=.(id = ID), v := i.val2 ]
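If you would rather keep df2 as it is (it has many other columns) and stay in base R, another option is to flatten only the IDs and remember which df2 row each one came from; match() then gives a df2 row number that can pull any column. A sketch on a small made-up df1; all_ids, src_row and hit are illustrative names.

```r
df2 <- data.frame(val2 = c(1, 2, 3, 4, 5, 6, 7))
df2$IDs <- list(0L, 1:2, 3:4, 5:6, 7:8, 9:10, 11:14)
df1 <- data.frame(id = c(0, 2, 13, 15))

# One entry per ID, plus the df2 row that ID came from
all_ids <- unlist(df2$IDs)
src_row <- rep(seq_len(nrow(df2)), lengths(df2$IDs))

# Find each df1$id among the flattened IDs, then map back to its df2 row;
# ids that appear nowhere (15 here) give NA
hit <- src_row[match(df1$id, all_ids)]
df1$val2 <- df2$val2[hit]
df1
```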

Conditional statement in R dataframe

I have a data frame df as below.
dput(df)
structure(list(X = c(1, 2, 5, 7, 8), Y = c(3, 5, 8, 7, 2), Z = c(2,
8, 7, 4, 3), R = c(6, 6, 6, 6, 66)), .Names = c("X", "Y", "Z",
"R"), row.names = c(NA, -5L), class = "data.frame")
df
class(df)
I have to modify df in two ways.
First:
for each row, find the minimum among X, Y and Z and replace it with the corresponding value of R.
Second case:
for each row, find the minimum among X, Y, Z and R, replace it with the maximum among X, Y, Z and R, and store the result in a new df.
How can I do that?
I tried ifelse and if/else but could not get what I want.
Any help would be appreciated.
You can create a new dataset "df1" from the first three columns of "df". Negating "df1" (multiplying by -1) turns each row's minimum into its maximum. In the example, the values are unique within each row, so you can use max.col with ties.method = 'first' to get the column index of the maximum of the negated values (i.e. the minimum) in each row. cbind that with 1:nrow(df) to build a "row/column" index matrix, use it to select those elements of "df1" (df1[cbind(...)]) and overwrite them with the "R" column values (<- df$R). Finally, copy the new values back into the original "df" columns (df[1:3]). If a row can contain more than one "minimum" value, use the "loop" method described for the second case.
df1 <- df[1:3]
df1[cbind(1:nrow(df),max.col(-1*df1, 'first'))] <- df$R
df[1:3] <- df1
df
# X Y Z R
#1 6 3 2 6
#2 6 5 8 6
#3 6 8 7 6
#4 7 7 6 6
#5 8 66 3 66
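If rows can contain tied minima, the first case can also be handled without max.col or a loop: compare each cell with its row minimum (from pmin) and overwrite every match with that row's R value. A sketch using the data from the question; cols, row_min and is_min are illustrative names.

```r
df <- data.frame(X = c(1, 2, 5, 7, 8), Y = c(3, 5, 8, 7, 2),
                 Z = c(2, 8, 7, 4, 3), R = c(6, 6, 6, 6, 66))

cols <- c("X", "Y", "Z")
m <- as.matrix(df[cols])
row_min <- do.call(pmin, df[cols])

# TRUE wherever a cell equals its row's minimum (ties give several TRUEs);
# row_min is recycled down each column, so m[i, j] is compared with row_min[i]
is_min <- m == row_min
m[is_min] <- df$R[row(m)[is_min]]
df[cols] <- as.data.frame(m)
df
```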
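For the second case, create a copy of "df" (df2) and get the per-row maximum with pmax ("MaxV"). Then loop over the rows of "df2" (sapply(seq_len(...))), replace the "minimum" values in each row with the corresponding "MaxV" value, transpose the result (t) and assign it back to "df2" (df2[]). Note that this starts from df as modified in the first case, which is what the output below reflects.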
df2 <- df
#only use this if there is only a single "minimum" value per row
#df2[cbind(1:nrow(df), max.col(-1*df2, 'first'))] <-
#  do.call(pmax, df2)
MaxV <- do.call(pmax, df2)
df2[] <- t(sapply(seq_len(nrow(df2)), function(i) {
  x <- unlist(df2[i,])
  # replace every occurrence of the row minimum with the row maximum
  ifelse(x==min(x), MaxV[i], x)}))
df2
# X Y Z R
#1 6 3 6 6
#2 6 8 8 6
#3 8 8 7 8
#4 7 7 7 7
#5 8 66 66 66
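The row loop can likewise be replaced by a vectorised version that also handles ties: mark every cell equal to its row minimum and fill it with that row's maximum. A sketch that assumes the input is df as modified by the first case (matching the output above); row_min, row_max and is_min are illustrative names.

```r
# df as it looks after the first case (the input used in the output above)
df <- data.frame(X = c(6, 6, 6, 7, 8), Y = c(3, 5, 8, 7, 66),
                 Z = c(2, 8, 7, 6, 3), R = c(6, 6, 6, 6, 66))

m <- as.matrix(df)
row_min <- do.call(pmin, df)
row_max <- do.call(pmax, df)

is_min <- m == row_min                 # every occurrence of the row minimum
m[is_min] <- row_max[row(m)[is_min]]   # replace it with that row's maximum
df2 <- as.data.frame(m)
df2
```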

R efficient search in data.frame

I want to find the most efficient way in R to look up a value in a table stored as a data.frame, like this:
x01 x02 x03 x04 x05 x06 x07
1 NA 100 200 300 400 500 600
2 10 1 4 3 6 7 1
3 20 2 5 2 5 8 2
4 30 3 6 1 4 9 8
The values in the first row and the first column are in increasing order. For example, I need to find the value at the intersection of the column with 300 in its first row and the row with 20 in its first column; here that value is 2. Code for this:
coefficient_table_1 <- data.frame(
x01=c(NA, 10, 20, 30),
x02=c(100, 1, 2, 3),
x03=c(200, 4, 5, 6),
x04=c(300, 3, 2, 1),
x05=c(400, 6, 5, 4),
x06=c(500, 7, 8, 9),
x07=c(600, 1, 2, 8)
)
col_value <- 300
row_value <- 20
col <- 0
for(i in 2:ncol(coefficient_table_1)){
if(coefficient_table_1[1,i]==col_value ){
col <- i
}
}
row <- which(coefficient_table_1$x01==row_value)
value <- coefficient_table_1[row, col]
The table can be large, and the search may be run inside a loop. What is the most efficient way to search a data.frame?
Your data is all numeric, so your best course of action is probably to use arrays, rather than data frames.
Since arrays contain data of only a single class (e.g. numeric), many operations are much faster when your data is in array format.
Try this:
x <- as.matrix(coefficient_table_1)
x[which(x[, 1]==row_value), which(x[1, ]==col_value)]
x04
2
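Since the lookup is meant to run inside a loop, it may pay to drop the loop altogether: collect all the (row value, column value) pairs and look them up in one call with matrix indexing. A sketch assuming x is built as above; lookup_values is a hypothetical helper. match() returns the first position of each key, which is enough here because the keys in the first row and column are described as increasing, and so presumably unique.

```r
coefficient_table_1 <- data.frame(
  x01 = c(NA, 10, 20, 30),
  x02 = c(100, 1, 2, 3),
  x03 = c(200, 4, 5, 6),
  x04 = c(300, 3, 2, 1),
  x05 = c(400, 6, 5, 4),
  x06 = c(500, 7, 8, 9),
  x07 = c(600, 1, 2, 8)
)
x <- as.matrix(coefficient_table_1)

# Hypothetical helper: look up many (row_value, col_value) pairs at once.
# match() finds each key in the first column / first row; cbind(i, j)
# then indexes x cell by cell. Keys that are not found give NA.
lookup_values <- function(x, row_values, col_values) {
  i <- match(row_values, x[, 1])
  j <- match(col_values, x[1, ])
  x[cbind(i, j)]
}

lookup_values(x, row_values = c(20, 10, 30), col_values = c(300, 100, 600))
```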
