Fast way to add rows to a data frame based on the number of columns


I have the following data frame:
df1 = structure(c(3, 5, 8, 6), .Dim = c(2L, 2L))
I would like to add 3 rows:
(0,0)
(5,0)
(0,8)
That is, I want one all-zero row, plus one row per column that holds that column's maximum in that column and zeros elsewhere. Is there a fast way to do this?

One option is colMaxs from matrixStats: build a diagonal matrix from the column maxima with diag, then rbind it to the original matrix together with a row of zeros.
library(matrixStats)
rbind(df1, 0, diag(colMaxs(df1)))
If it is a data.frame (based on the title)
library(dplyr)
df2 %>%
summarise_all(max) %>%
diag %>%
rbind(df2, 0, .)
data
df2 <- as.data.frame(df1)
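
For readers without matrixStats, the same idea can be sketched in base R. This assumes apply(df1, 2, max) is an acceptable stand-in for colMaxs (it is likely slower on large inputs); the names col_max and out are introduced here for illustration only.

```r
# Base R sketch of the same approach (no extra packages).
df1 <- structure(c(3, 5, 8, 6), .Dim = c(2L, 2L))

# Column maxima; apply() stands in for matrixStats::colMaxs here.
col_max <- apply(df1, 2, max)

# Original rows, then an all-zero row, then one row per column with
# that column's maximum on the diagonal. Passing nrow explicitly keeps
# diag() from treating a single maximum as a matrix size.
out <- rbind(df1, 0, diag(col_max, nrow = length(col_max)))
out
```

The explicit nrow argument only matters for a one-column input, where diag(5) would otherwise build a 5 x 5 identity matrix.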

Related

Unable to apply functions to column names in dplyr

I have a data frame df and I wish to create a new column b that is the smaller of column a and 10 - a. Where a is NA, column b should also be NA in the corresponding row, so column b should be c(1, 3, 1, NA). I tried the following code, but all rows of b are 1. I would like a tidyverse solution.
library(tidyverse)
df <- data.frame(a = c(1, 3, 9, NA))
df2 <- df %>% mutate(b = min(a, 10 - a, na.rm = T))
I guess the issue arises because of how I apply the min function, which is complicated by the presence of NA, but I cannot figure out how to solve it.
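
No answer is recorded here for this question, so the following is only a sketch of one likely fix. min() collapses all of its arguments into a single value, so every row receives the overall minimum (1). Base R's pmin() takes the minimum element by element and keeps NA by default. The example uses base R to stay self-contained; the same pmin() call should also work inside mutate().

```r
df <- data.frame(a = c(1, 3, 9, NA))

# min() returns one number for all inputs combined.
overall <- min(df$a, 10 - df$a, na.rm = TRUE)

# pmin() compares the two vectors element by element and, with the
# default na.rm = FALSE, returns NA wherever either input is NA.
df$b <- pmin(df$a, 10 - df$a)
df
```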

how to replace data frame rows containing column names with column labels in R

consider my labelled df1 below
This is my second dataframe df2
I want to change the item column in df2 so that, if a row contains any of the column names of df1, that string is replaced by the corresponding column label, as in df3 below.
any approach to achieve this is highly appreciated.
library(Hmisc)
library(dplyr)
df1 <- data.frame(low = rep(1,3),
med = rep(2,3),
high = rep(3,3),
other = rep(0,3))
label(df1$low) <- "is it low"
label(df1$med) <- "is it med"
label(df1$high) <- "is it high"
label(df1$other) <- "is it broken"
df2 <- data.frame(item = c("lowYes", "medNo", "high"),
value = c(12, 10, 14))
df3 <- data.frame(item = c("is it low:Yes", "is it med:No", "is it high"),
value = c(12, 10, 14))
library(stringr)
df2$item <- str_replace(df2$item, grep(df2$item, names(df1)), label(df1)) # not for all rows

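Extract the labels from 'df1', append ":" to each, and collapse them into a named vector (unlist). Passing that named vector to str_replace_all replaces every column name found in the 'item' column with its label. trimws with whitespace = ":" then removes the colon left at the end when nothing follows the name.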
library(dplyr)
library(stringr)
library(Hmisc)
keyval <- df1 %>%
summarise(across(everything(), ~ str_c(label(.x), ":"))) %>%
unlist
df3 <- df2 %>%
mutate(item = trimws(str_replace_all(item, keyval), whitespace = ":"))
-output
df3
item value
1 is it low:Yes 12
2 is it med:No 10
3 is it high 14
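
The replacement step can also be sketched without Hmisc or stringr. In the example below the keyval vector is written out by hand as a hypothetical stand-in for the labels extracted from df1; everything else is base R.

```r
# Hypothetical stand-in for the Hmisc labels: names are the column
# names of df1, values are the labels with ":" appended.
keyval <- c(low = "is it low:", med = "is it med:",
            high = "is it high:", other = "is it broken:")

item <- c("lowYes", "medNo", "high")

# Replace each column name found in 'item' by its label.
for (nm in names(keyval)) {
  item <- gsub(nm, keyval[[nm]], item, fixed = TRUE)
}

# Drop the trailing ":" left when nothing followed the name.
item <- sub(":$", "", item)
item
```

Like the named-vector call to str_replace_all, the loop applies the replacements one after another, so a label that itself contained another column name would be rewritten again.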

Conditionally applying factor values from one dataframe to another

I have the following two data frames:
letters <- LETTERS[seq(from = 1, to = 5)]
values <- rnorm(5, mean = 50)
df1 <- data.frame(letters, values)
category <- sample(LETTERS[1:5], 20, replace = TRUE)
numbers <- rnorm(20, mean = 100)
df2 <- data.frame(category, numbers)
I want to create a new column in df2 that takes the value in df2$numbers and subtracts the value in df1$values based on the matching letter.
In other words, if the value for "C" in df1 is 49.2, I want to subtract 49.2 from every row in df2$numbers where df2$category equals "C". Hope that makes sense. Thanks for the help!

With dplyr:
library(dplyr)
# rows are matched on df1$letters == df2$category
df <- full_join(df1, df2, by = c('letters' = 'category')) %>%
  mutate(diff = numbers - values)
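
If the new column should be added to df2 itself, keeping its row order, a lookup with match() is one base R alternative to the join. This is a sketch only; set.seed and the name idx are additions for the example.

```r
set.seed(1)  # only so the example is reproducible
df1 <- data.frame(letters = LETTERS[1:5],
                  values  = rnorm(5, mean = 50))
df2 <- data.frame(category = sample(LETTERS[1:5], 20, replace = TRUE),
                  numbers  = rnorm(20, mean = 100))

# For each row of df2, the row of df1 holding the same letter.
idx <- match(df2$category, df1$letters)

# Subtract the looked-up value; df2 keeps its original row order.
df2$diff <- df2$numbers - df1$values[idx]
head(df2)
```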

Return name of column containing max value, from only certain selected columns in a data.frame

I would like to obtain (in a new column of the data.frame) the name of the column that contains the maximum value, considering only a few selected columns of the data.frame.
Here is an example data.frame
# creating the vectors then the data frame ------
id = c("a", "b", "c", "d")
ignore = c(1000,1000, 1000, 1000)
s1 = c(0,0,0,100)
s2 = c(100,0,0,0)
s3 = c(0,0,50,0)
s4 = c(50,0,50,0)
df1 <- data.frame(id,ignore,s1,s2,s3,s4)
(1) now I want to find the column name of the maximum number in each row, from the columns s1-s4. (i.e. ignore the column called "ignore")
(2) If there is a tie for the maximum, I would like the last (e.g. s4) column name returned.
(3) as an extra favour - if all are 0, I would ideally like NA returned
here is my best attempt so far
df2 <- cbind(df1,do.call(rbind,apply(df1,1,function(x) {data.frame(max.col.name=names(df1)[which.max(x)],stringsAsFactors=FALSE)})))
this returns ignore in each case, and (except for row b) works if I remove this column, and reorder the s1-s4 columns as s4-s1.
How would you approach this?
Many thanks indeed.

We use grep to create an index ('i1') of the columns whose names start with 's' followed by digits. To get, for each row of the subset ('df1[i1]'), the index of the column holding the maximum, we can use max.col with ties.method='last', which resolves ties in favour of the last column. To return NA for rows that contain only 0 values, we take the rowSums, check whether they are 0 (==0), turn those TRUE values into NA (NA^TRUE is NA, NA^FALSE is 1) and multiply the result with the max.col output. The resulting index is then used to extract the column names of the subset.
i1 <- grep('^s\\d+', names(df1))
names(df1)[i1][max.col(df1[i1], 'last')*NA^(rowSums(df1[i1])==0)]
#[1] "s2" NA "s4" "s1"

library(dplyr)
library(tidyr)
df1 = tibble(  # data_frame() is deprecated in favour of tibble()
  id = c("a", "b", "c", "d"),
  ignore = c(1000,1000, 1000, 1000),
  s1 = c(0,0,0,100),
  s2 = c(100,0,0,0),
  s3 = c(0,0,50,0),
  s4 = c(50,0,50,0))
result =
df1 %>%
gather(variable, value, -id, -ignore) %>%
group_by(id) %>%
slice(value %>%
{. == max(.)} %>%
which %>%
last) %>%
ungroup %>%
mutate(variable_fix = ifelse(value == 0,
NA,
variable))

Combining frequency tables in R

I have a vector containing the frequencies of molecules within their respective molecular class for all molecules measured. I also have a vector that contains the per class frequency of significant molecules identified by variable selection. How can I merge these 2 vectors into a data frame and fill in empty frequencies with 0's (in R)?
Here is a workable example:
full = rep(letters[1:4], 4:7)
fullTable = table(full)
sub = rep(letters[1:2], c(2, 4))
subTable = table(sub)
I would like the table to look like:
print(data.frame(Letter=letters[1:4], fullFreq=c(4, 5, 6, 7), subFreq=c(2, 4, 0, 0)))

Try this (I supposed you meant subTable=table(sub) in your last line):
res<-merge(as.data.frame(fullTable),as.data.frame(subTable),by.x=1,by.y=1,all=TRUE)
colnames(res)<-c("Letter","fullFreq","subFreq")
res[is.na(res)]<-0

With the library dplyr
library(dplyr)
full=rep(letters[1:4], 4:7)
sub=rep(letters[1:2], c(2,4))
df <- data.frame(Letter=unique(c(full, sub)))
df <- df %>%
left_join(as.data.frame(table(full)), by=c("Letter"="full")) %>%
left_join(as.data.frame(table(sub)), by=c("Letter"="sub"))
df[is.na(df)] <- 0
df
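
Another way to get the zeros directly, without a merge followed by NA replacement, is to tabulate both vectors against one shared set of factor levels. A minimal base R sketch, with lev and res as illustrative names:

```r
full <- rep(letters[1:4], 4:7)
sub  <- rep(letters[1:2], c(2, 4))

# One shared set of levels, so classes absent from a vector count as 0.
lev <- sort(unique(c(full, sub)))

res <- data.frame(Letter   = lev,
                  fullFreq = as.vector(table(factor(full, levels = lev))),
                  subFreq  = as.vector(table(factor(sub,  levels = lev))))
res
```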
