R dataframe column multiplication with sapply

I need to multiply columns in an R data.frame, choosing the columns by patterns in their names. This is a very elementary task, but I am struggling to make it work with sapply() or a related function. This is what I've tried so far.
df <- data.frame("pA" = sample(1:100), "pB" = sample(1:100), "qA" = sample(1:100), "qB" = sample(1:100))
cols <- c("A","B")
multip <- function(df, col) {
  dfp <- df[which(names(df) %in% paste0("p", col))]
  dfq <- df[which(names(df) %in% paste0("q", col))]
  dfv <- dfp * dfq
  setNames(dfv, paste0("v", col))
}
sapply(df, function(x) multip(x,cols))
I can make it work if I take it apart and drop the function and the sapply call, but that would complicate my work. Is there a way to make this work?

You can use multip directly on 'df'
multip(df, cols)
Or without using multip
Map('*', df[grep('p', names(df))], df[grep('q', names(df))])
The problem with the sapply/lapply call is that it iterates over the columns of df, so the anonymous function receives a single column (a plain vector) at a time, whereas multip expects the whole data frame as its first argument.
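To make the difference concrete, here is a minimal sketch on a small fixed data frame (the values are made up for illustration). It calls multip once on the whole data frame and compares that with the Map approach. The anchored patterns "^p" and "^q" are an assumption added here so that only name prefixes match.

```r
# Small illustrative data frame (values are made up, not from the question)
df <- data.frame(pA = 1:3, pB = 4:6, qA = c(2, 2, 2), qB = c(10, 10, 10))
cols <- c("A", "B")

multip <- function(df, col) {
  dfp <- df[names(df) %in% paste0("p", col)]
  dfq <- df[names(df) %in% paste0("q", col)]
  setNames(dfp * dfq, paste0("v", col))
}

# One call on the whole data frame: multip sees all columns at once
res1 <- multip(df, cols)

# Same idea via Map, pairing the p* and q* columns by position
res2 <- Map(`*`, df[grep("^p", names(df))], df[grep("^q", names(df))])
res2 <- setNames(as.data.frame(res2), paste0("v", cols))

# sapply(df, ...) hands each column to the function as a plain vector
sapply(df, class)
```

Both approaches pair columns by position, so they rely on the p and q columns appearing in the same order.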

Related

How can lapply work with addressing columns as unknown variables?

I have a list of strings named control_for. I have a data frame sampleTable in which some of the columns are named after strings in control_for. And I have a third object, dge_obj (a DGEList object), to which I want to append those columns. What I want to do is use lapply to loop through control_for and, for each string, find the column in sampleTable with the same name and add that column (as a factor) to the DGEList object. For example, doing it manually with just one column looks like this, and it works:
group <- as.factor(sampleTable[,3])
dge_obj$samples$group <- group
And I tried something like this:
lapply(control_for, function(x) {
  x <- as.factor(sampleTable[, x])
  dge_obj$samples$x <- x
})
Which doesn't work. I guess the problem is that R can't recognize addressing columns like this. Can someone help?

Here are two base R ways of doing it. The DGEList object is built as in the example in help("DGEList"), and sampleTable is a mock-up data.frame.
Define a vector common_vars of the names in control_for that are also column names of sampleTable. Then create the new columns.
library(edgeR)
sampleTable <- data.frame(a = 1:4, b = 5:8, no = letters[21:24])
control_for <- c("a", "b")
common_vars <- intersect(control_for, names(sampleTable))
1. for loop
for(x in common_vars){
y <- sampleTable[[x]]
dge_obj$samples[[x]] <- factor(y)
}
2. *apply loop.
# lapply keeps each column a factor; sapply would simplify the result to a character matrix
tmp <- lapply(sampleTable[common_vars], factor)
dge_obj$samples <- cbind(dge_obj$samples, tmp)
This code can be rewritten as a one-liner.
Data
set.seed(2021)
y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
dge_obj <- DGEList(counts=y, group=rep(1:2,each=2))
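The attempt in the question fails for two separate reasons: `$x` always refers to a column literally named "x", and an assignment inside the function passed to lapply only changes a local copy of the object. The sketch below shows both, using a plain list as a hypothetical stand-in for the DGEList object (obj and its lib.size column are assumptions for illustration, so edgeR is not needed to run it).

```r
# Hypothetical stand-in for dge_obj: a list holding a 'samples' data frame
obj <- list(samples = data.frame(lib.size = c(10, 20, 30, 40)))
sampleTable <- data.frame(a = 1:4, b = 5:8)
control_for <- c("a", "b")

# Assignment inside the function modifies only a local copy of obj,
# and $x would create a column literally named "x" in that copy
invisible(lapply(control_for, function(x) {
  obj$samples$x <- factor(sampleTable[[x]])
}))
names(obj$samples)  # the outer obj is unchanged

# [[x]] uses the value of x as the column name; the for loop assigns in place
for (x in control_for) {
  obj$samples[[x]] <- factor(sampleTable[[x]])
}
names(obj$samples)
```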

Iteratively adding a row containing characters and numbers to a dataframe

I have a list containing named elements. I am iterating over the list names, performing the computation for each corresponding element, "encapsulating" the results and the name in a vector and finally adding the vector to a table. The row or vector after each iteration contains a mix of characters and numbers.
The first row is getting added but from the second row onwards there is a problem.
In this example, there is supposed to be one column (first) containing alphanumeric names. All rows after the first one contain NAs.
x <- list(a_1=c(1,2,3), b_2=c(3,4,5), c_3=c(5,1,9))
df <- data.frame()
for(name in names(x))
{
tmp <- x[[name]]
m <- mean(tmp)
s <- sum(tmp)
df <- rbind(df, c(name,m,s))
}
df <- as.data.frame(df)
I know there are possibly more efficient ways, but for the moment this is more intuitive for me because it ensures that each computation is associated with a particular name. There can be several columns and rows, and the names are extremely helpful for joining tables, querying, comparing and so on. They make it easier to trace results back to a particular element in my original list.
Additionally, I would be glad to learn of other ways in which the element names are always retained while transforming.
Thank you!

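You have to set stringsAsFactors = FALSE in rbind. With stringsAsFactors = TRUE (the default before R 4.0.0), the first iteration of the loop converts the string values into factors whose only levels are the values of the first row, so values in later rows that are not among those levels become NA.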
x <- list(a_1=c(1,2,3), b_2=c(3,4,5), c_3=c(5,1,9))
df <- data.frame()
for(name in names(x))
{
tmp <- x[[name]]
m <- mean(tmp)
s <- sum(tmp)
df <- rbind(df, c(name,m,s), stringsAsFactors = FALSE)
}
An easier solution would be to utilize sapply().
x <- list(a_1=c(1,2,3), b_2=c(3,4,5), c_3=c(5,1,9))
df <- data.frame(name = names(x), m = sapply(x, mean), s = sapply(x, sum))
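Note that c(name, m, s) coerces everything to character, so in the loop version the numeric results are stored as text even with stringsAsFactors = FALSE. As a sketch of an alternative that keeps the one-name-per-row logic, each element can be turned into a one-row data frame and the rows bound together at the end (the column names name, m and s follow the sapply answer above):

```r
x <- list(a_1 = c(1, 2, 3), b_2 = c(3, 4, 5), c_3 = c(5, 1, 9))

# One single-row data frame per list element; the name travels with its results
rows <- lapply(names(x), function(name) {
  tmp <- x[[name]]
  data.frame(name = name, m = mean(tmp), s = sum(tmp), stringsAsFactors = FALSE)
})

# Bind all rows at once instead of growing the data frame inside a loop
df <- do.call(rbind, rows)
str(df)
```

Here name stays a character column while m and s stay numeric.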

Repeating calculation with different DF

I have around 10 DFs and would like to perform the following calculations on all of them and then get the output as 10 new DFs.
I have been able to get this to work for 1 DF, but rather than copying the code and changing the names 10 times, I wanted to see whether there is a way to do this for all of them at once. Ideally, I would end up with 1 DF with 10 different columns, but I am happy with anything.
The calculations I am trying to do are:
temp <- merge(x = DF1, y = temp1, by = c("name"), all.x = TRUE)
asset_column <- grep("^Assets_", names(DF1))
return_column <- grep("^Return_", names(DF1))
OutputDF <- stack(colSums(t(t(temp[asset_column]) / colSums(temp[asset_column], na.rm = TRUE)) *
                          US_only[return_column], na.rm = TRUE))
OutputDF['values'] <- OutputDF['values'] / 100

If the same calculations are to be repeated for every data frame, put the data frames in a list (lst1 here), loop over it with lapply, and run the same code with the first dataset (DF1) replaced by the argument x of the anonymous function.
out <- lapply(lst1, function(x) {
  temp <- merge(x, y = temp1, by = c("name"), all.x = TRUE)
  asset_column <- grep("^Assets_", names(x))
  return_column <- grep("^Return_", names(x))
  OutputDF <- stack(colSums(t(t(temp[asset_column]) / colSums(temp[asset_column], na.rm = TRUE)) *
                            US_only[return_column], na.rm = TRUE))
  OutputDF['values'] <- OutputDF['values'] / 100
  OutputDF
})
Here, the output is also a list of data.frames, which can be kept in the list as such or extracted individually with [[.
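As a rough sketch of the surrounding steps, the example below builds the list, applies a function to each element, and then combines the results into one data frame with one column per input. The data frames and the calculation (simple column totals) are toy placeholders, not the asker's real data or formula.

```r
# Toy stand-ins for the real data frames (values are made up)
DF1 <- data.frame(name = c("x", "y"), Assets_a = c(1, 3), Assets_b = c(2, 2))
DF2 <- data.frame(name = c("x", "y"), Assets_a = c(5, 5), Assets_b = c(0, 4))

# Collect the data frames in a named list
lst1 <- list(DF1 = DF1, DF2 = DF2)  # or: mget(paste0("DF", 1:2))

# Placeholder calculation: totals of the Assets_ columns, in long form
out <- lapply(lst1, function(x) {
  asset_column <- grep("^Assets_", names(x))
  stack(colSums(x[asset_column], na.rm = TRUE))
})

# A single result, extracted with [[
out[["DF1"]]

# One wide data frame: a column of values per input data frame
wide <- data.frame(ind = out[[1]]$ind, lapply(out, `[[`, "values"))
wide
```

Naming the list elements is what lets each output column be traced back to its input data frame.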

How can I make a tibble/tbl_df/data_frame from a vector or vectors

I have a name and a vector
my.name <- 'data.values'
my.vec <- 1:5
and I'd like to make a tibble/tbl_df/data_frame with one column that has my.name as the name of that column and my.vec as the values. What I have is
df <- data_frame(placeholder = rep(NA, length(my.vec)))
df[[my.name]] <- my.vec
df[['placeholder']] <- NULL
Which just feels silly. Is there an easier way to do this?
I am also interested in the case where I have multiple vectors and multiple names, e.g.
my.name1 <- 'data.values.day1'
my.name2 <- 'data.values.day2'
my.vec1 <- 1:5
my.vec2 <- 2:6
...

I think the best answer came in a comment.
DirtySockSniffer recommended:
as_data_frame(setNames(list(my.vec), my.name))
which generalizes nicely to the multiple column situation
as_data_frame(setNames(list(my.vec1, my.vec2),
c(my.name1, my.name2)))
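In current versions of tibble, data_frame() and as_data_frame() are deprecated in favour of tibble() and as_tibble(). Below is a sketch of the same idea with as_tibble(). The fallback to a base data.frame when tibble is not installed is an addition for illustration only.

```r
my.name1 <- "data.values.day1"
my.name2 <- "data.values.day2"
my.vec1 <- 1:5
my.vec2 <- 2:6

# Build a named list first; the list names become the column names
cols <- setNames(list(my.vec1, my.vec2), c(my.name1, my.name2))

# Convert to a tibble if the package is available, otherwise to a data.frame
df <- if (requireNamespace("tibble", quietly = TRUE)) {
  tibble::as_tibble(cols)
} else {
  as.data.frame(cols)
}
names(df)
```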

You can create a data_frame first and then set its column names:
my.data <- data_frame(my.vec1, my.vec2, ...)
names(my.data) <- c(my.name1, my.name2, ...) # Names must be in the same order as the vectors

Nested named list to data frame

I have the following named list output from an analysis. The reproducible code is as follows:
list(structure(c(-213.555409754509, -212.033637890131, -212.029474755074,
-211.320398316741, -211.158815833294, -210.470525157849), .Names = c("wasn",
"chappal", "mummyji", "kmph", "flung", "movie")), structure(c(-220.119433774144,
-219.186901747536, -218.743319709963, -218.088361753899, -217.338920075687,
-217.186050877079), .Names = c("crazy", "wired", "skanndtyagi",
"andr", "unveiled", "contraption")))
I want to convert this to a data frame. I have tried unlist to data frame options using reshape2, dplyr and other solutions given for converting a list to a data frame but without much success. The output that I am looking for is something like this:
  Col1    Val1    Col2        Val2
1 wasn    -213.55 crazy       -220.11
2 chappal -212.03 wired       -219.18
3 mummyji -212.02 skanndtyagi -218.74
and so on. The actual output has multiple columns of paired values and runs to many rows. I have already tried the following:
do.call(rbind, lapply(df, data.frame, stringsAsFactors = TRUE))
This works partially: it puts all the character values in one column and the numeric values in a second.
data.frame(Reduce(rbind, df))
This didn't work: it gives the names from the first list and the numbers from both lists as two different rows.
colNames <- unique(unlist(lapply(df, names)))
M <- matrix(0, nrow = length(df), ncol = length(colNames),
dimnames = list(names(df), colNames))
matches <- lapply(df, function(x) match(names(x), colNames))
M[cbind(rep(sequence(nrow(M)), sapply(matches, length)),
unlist(matches))] <- unlist(df)
M
didn't work correctly.
Can someone help?

Since the list elements are all of the same length, you should be able to stack them and then combine them by columns.
Try:
do.call(cbind, lapply(myList, stack))

Here's another way:
as.data.frame( c(col = lapply(x, names), val = lapply(x,unname)) )
How it works. lapply returns a list; two lists combined with c make another list; and a list is easily coerced to a data.frame, since the latter is just a list of vectors having the same length.
Better than coercing to a data.frame is just modifying its class, effectively telling the list "you're a data.frame now":
L = c(col = lapply(x, names), val = lapply(x,unname))
library(data.table)
setDF(L)
The result doesn't need to be assigned anywhere with = or <- because L is modified "in place."
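For the exact layout asked for in the question (Col1 Val1 Col2 Val2), a base R sketch is to build one two-column data frame per list element and bind them side by side. The values below are the first three entries of each vector from the question, rounded to two decimals to keep the example short.

```r
# First three entries of each vector from the question, rounded for brevity
x <- list(
  c(wasn = -213.56, chappal = -212.03, mummyji = -212.03),
  c(crazy = -220.12, wired = -219.19, skanndtyagi = -218.74)
)

# One (name, value) pair of columns per list element, numbered by position
pairs <- lapply(seq_along(x), function(i) {
  setNames(
    data.frame(names(x[[i]]), unname(x[[i]]), stringsAsFactors = FALSE),
    paste0(c("Col", "Val"), i)
  )
})

# Bind the pairs side by side; this requires all elements to have equal length
out <- do.call(cbind, pairs)
out
```

This keeps each name column next to its value column, whereas the c(col = ..., val = ...) approach groups all name columns first.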
