How to replace all values in selected columns with dplyr - r

I was trying to figure out a way to transform all values in selected columns of my dataset using an equation $$x_i = x_{max} - x_i$$ using dplyr. I'm not sure how to correctly do this for one column, let alone multiple columns. My attempt at mutating 1 column:
df1 <- df %>% mutate(column1 = replace(column1, ., x = max(column1) - x)
My x = max(column1) - x part is not literal, I just want to know how I can implement that equation into all row entries in the column. Furthermore, how can I do this for multiple columns in the same line? Any help is appreciated. Thanks!

If it is to replace all values across multiple columns, loop across the numeric columns and subtract the values from its max value for that column
library(dplyr)
df <- df %>%
mutate(across(where(is.numeric), ~ max(., na.rm = TRUE) - .))

Related

Replace values conditionally over multiple columns with mutate in R

I have a data frame with 56 columns. I need to replace values with NA in 50 columns conditional on the value of the column "positive == 0".
I have figured out how to replace values in a single column conditionally with mutate. I have spent quite some time now to figure out how to apply this mutate function to other columns without having to reproduce the command 50 times - to no avail.
There has to be a simple solution. ;) Can somebody help?
This is the structure of the data:
df <- data.frame(positive = c(1,1,0,0,1),
a = c("x","x","x","y","y"),
b = c("z","z","z","y","z"),
d = c(1,2,3,4,5))
This is my solution to replace values in one column.
df <- df %>%
mutate(a = replace(a, positive == 0, NA))
df %>% mutate(across(c(-positive,-d), ~replace(.,positive == 0, NA)))
[Previous incorrect solution—does replace values with NA, but converts other values to 1. May still be useful to show different ways of selecting columns in the across() call.]
This seems to work:
df %>% mutate(across(c(-positive,-d), ~na_if(positive, 0)))
Here I've assumed you want to mutate all columns except 'positive' and 'd', but if you'd rather specify the columns you could say:
df %>% mutate(across(c('a', 'b'), ~na_if(positive, 0)))
Or, as in the dplyr documentation for the na_if function, you could use a characteristic of the columns you want to specify them:
df %>% mutate(across(where(is.character), ~na_if(positive, 0)))

R new column (variable) that rowSums across lists with NULL values

I have a data.frame that looks like this:
UID<-c(rep(1:25, 2), rep(26:50, 2))
Group<-c(rep(5, 25), rep(20, 25), rep(-18, 25), rep(-80, 25))
Value<-sample(100:5000, 100, replace=TRUE)
df<-data.frame(UID, Group, Value)
But I need the values separated into new rows so I run this:
df<-pivot_wider(df, names_from = Group,
values_from = Value,
values_fill = list(Value = 0))
Which introduces NULL into the dataset. Sorry, could not figure out a way to get an example dataset with NULL values. Note: this is now a tbl_df tbl data.frame
These aren't great variable names so I run this:
colnames(df)[which(names(df) == "20")] <- "pos20"
colnames(df)[which(names(df) == "5")] <- "pos5"
colnames(df)[which(names(df) == "-18")] <- "neg18"
colnames(df)[which(names(df) == "-80")] <- "neg80"
What I want to be able to do is create a new column (variable) that rowSums across columns. So I run this:
df<-df%>%
replace(is.na(.), 0) %>%
mutate(rowTot = rowSums(.[2:5]))
Which of course works on the example dataset but not on the one with NULL values. I have tried converting NULL to NA using df[df== "NULL"] <- NA but the values do not change. I have tried converting the lists to numeric using as.numeric(as.character(unlist(df[[2]]))) but I get an error telling me I have unequal number of rows, which I guess would be expected.
I realize there might be a better process to get my desired end result, so any suggestions to any of this is most appreciated.
EDIT: Here is a link to the actual dataset which will introduce Null values after using pivot_wider. https://drive.google.com/file/d/1YGh-Vjmpmpo8_sFAtGedxzfCiTpYnKZ3/view?usp=sharing
Difficult to answer with confidence without an actual reproducible example where the error occurs but I am going to take a guess.
I think your pivot_wider steps produces list columns (meaning some values are vectors) and that is why you are getting NULL values. Create a unique row for each Group and then use pivot_wider. Also rowSums has na.rm parameter so you don't need replace.
library(dplyr)
df %>%
group_by(temp) %>%
mutate(row = row_number()) %>%
pivot_wider(names_from = temp, values_from = numseeds) %>%
mutate(rowTot = rowSums(.[3:6], na.rm = TRUE))
Please change the column numbers according to your data in rowSums if needed.

R- Populating a dataframe based on another one with conditions

im new on R and i have a data set of 22x252, the 252 have many repeated values on column 1(ID). I made another dataset that has nrows of the unique values (with those values already populated), and i want to populate the rest of the columns based on the other dataset (basically summing all the values that share the same value in column 1.)
Is there a basic function that enables me to do this?
Thanks & Regards
We can use aggregate in base R. Assuming the column name of first column is 'ID' and all other columns are numeric class, we group by 'ID' and get the sum of the rest of the columns in aggregate
aggregate(.~ ID, df1, sum, na.rm = TRUE)
Or with dplyr
library(dplyr)
df1 %>%
group_by(ID) %>%
summarise_at(vars(-group_cols()), sum, na.rm = TRUE)
Or with new version with across
df1 %>%
group_by(ID) %>%
summarise(across(-group_cols(), sum, na.rm = TRUE))

Normalize specified columns in dplyr by value in first row

I have a data frame with four rows, 23 numeric columns and one text column. I'm trying to normalize all the numeric columns by subtracting the value in the first row.
I've tried getting it to work with mutate_at, but I couldn't figure out a good way to get it to work.
I got it to work by converting to a matrix and converting back to a tibble:
## First, did some preprocessing to get out the group I want
totalNKFoldChange <- filter(signalingFrame,
Population == "Total NK") %>% ungroup
totalNKFoldChange_mat <- select(totalNKFoldChange, signalingCols) %>%
as.matrix()
normedNKFoldChange <- sweep(totalNKFoldChange_mat,
2, totalNKFoldChange_mat[1,])
normedNKFoldChange %<>% cbind(Timepoint =
levels(totalNKFoldChange$Timepoint)) %>%
as.tibble %>%
mutate(Timepoint = factor(Timepoint,
levels = levels(totalNKFoldChange$Timepoint)))
I'm so certain there's a nicer way to do it that would be fully dplyr native. Anyone have tips? Thank you!!
If we want to normalize all the numeric columns by subtracting the value in the first row, use mutate_if
library(dplyr)
df1 %>%
mutate_if(is.numeric, list(~ .- first(.)))

How to make a weighted mean inside a summarise_if

I have a dataframe containing a line per company, with different variables (some numeric, others not):
data <- data.frame(id=1:5,
CA = c(1200,1500,1550,200,0),
EBE = c(800,50,654,8555,0),
VA = c(6984,6588,633,355,84),
FBCF = c(35,358,358,1331,86),
name=c("qsdf","xdwfq","qsdf","sqdf","qsdfaz"),
weight = c(1, 5, 10,1 ,1))
I would like to summarise all numeric variables by a weighted sum. If I wanted a simple sum I would do:
data %>% summarise_if(is.numeric,sum)
but I don't see how to define a weighted sum.
I tried:
w.sum <- function(x) {sum(x*weight) %>% return()}
but without any success.
We can use it inside the funs
data %>%
summarise_if(is.numeric, funs(sum(.*weight)))
Note that the above is based on the condition that if the columns are numeric class. Based on the example the 'id' column is numeric, which may not need the summariseation. A better option would be summarise_at to specify the columns of interest
data %>%
summarise_at(names(.)[2:5], funs(sum(.*weight)))

Resources