I have the following issue:
In my data frame (89 columns), 4 columns hold negative values, as you can see in the following image:
![1]: https://i.stack.imgur.com/ZFF0U.png
So I would like to know how I could mutate those specific columns of my data frame to make their values positive (absolute value).
Many thanks
Here's one option:
library(dplyr)
your_data %>%
mutate(across(c("DAYS_BIRTH", "DAYS_EMPLOYED", "DAYS_REGISTRATION", "DAYS_ID_PUBLISH"), abs))
Depending on which columns you want to mutate and which you want to leave, you might be able to use a simpler select helper, like mutate(across(starts_with("DAYS"), abs)), for example.
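To see the select-helper variant in action, here is a minimal sketch on a toy data frame. The values and the ID and AMT_CREDIT columns are made up for illustration; only the DAYS_* column names come from the question.

```r
library(dplyr)

# Hypothetical toy data: only the DAYS_* column names come from the question.
toy <- data.frame(
  ID            = 1:3,
  DAYS_BIRTH    = c(-9461, -16765, -19046),
  DAYS_EMPLOYED = c(-637, -1188, -225),
  AMT_CREDIT    = c(100, -50, 0)
)

# Take the absolute value of every column whose name starts with "DAYS";
# all other columns (including the negative AMT_CREDIT value) are left alone.
toy_abs <- toy %>%
  mutate(across(starts_with("DAYS"), abs))
```

Note that mutate() returns a new data frame, so assign the result back if you want to keep it.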
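A more general solution, which takes the absolute value of every numeric column whose values are all negative:
library(dplyr)
data %>% mutate(across(where(~ is.numeric(.x) && all(.x < 0, na.rm = TRUE)), abs))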
I have a sequence of numeric labels for records that can be shared by a variable number of records per label (labelsequence). I also have the records themselves, but unfortunately for some of the sequence values, all records have been lost (dataframe df). I need to identify when a numeric label from labelsequence does not appear in the label column of df, copy all records within df that are associated with the closest label value that is less than the missing value, and append these to a newly filled-in dataframe, say df2.
I am trying to accomplish this in R (a dplyr answer would be ideal) and have looked at answers to questions about filling in missing rows, such as Fill in missing rows in R and fill missing rows in a dataframe. I have a working solution below, but was wondering if anyone has a better way of doing this.
Take, for instance, this example data:
labelsequence<-data.frame(label=c(1,2,3,4,5,6))
and
df<-data.frame(label=c(1,1,1,1,3,3,4,4,4),
place=c('vermont','kentucky',
'wisconsin','wyoming','nevada',
'california','utah','georgia','kentucky'),
animal=c('wolf','wolf','cougar','cougar','lamb',
'cougar','donkey','lamb','wolf'))
with desired result...
desired_df2<-data.frame(label=c(1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,6,6,6),
place=c('vermont','kentucky',
'wisconsin','wyoming','vermont','kentucky',
'wisconsin','wyoming','nevada',
'california','utah','georgia','kentucky','utah',
'georgia','kentucky','utah','georgia','kentucky'),
animal=c('wolf','wolf','cougar','cougar','wolf',
'wolf','cougar','cougar','lamb','cougar',
'donkey','lamb','wolf','donkey','lamb','wolf',
'donkey','lamb','wolf'))
Is there a better way (be it in code efficiency, flexibility, or resource efficiency) than the following?
df2<- df %>%
full_join(expand.grid(label=unique(df$label),newlabel=labelsequence$label)) %>%
mutate(missing = ifelse(newlabel %in% label,0,1))%>%
filter(label<newlabel)%>%
group_by(newlabel) %>%
filter(label==max(label) & missing ==1) %>%
ungroup()%>%
mutate(label=newlabel,missing=NULL,newlabel=NULL) %>%
bind_rows(df) %>%
arrange(label)
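One possible alternative, offered as a sketch rather than a definitive answer: build a small lookup table that maps every label in labelsequence to the closest label present in df that is less than or equal to it, then join the records onto that lookup. The names present, idx, lookup and src_label are introduced here for illustration, and the relationship argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

labelsequence <- data.frame(label = c(1, 2, 3, 4, 5, 6))
df <- data.frame(
  label  = c(1, 1, 1, 1, 3, 3, 4, 4, 4),
  place  = c("vermont", "kentucky", "wisconsin", "wyoming", "nevada",
             "california", "utah", "georgia", "kentucky"),
  animal = c("wolf", "wolf", "cougar", "cougar", "lamb",
             "cougar", "donkey", "lamb", "wolf")
)

# Labels that actually have records, in increasing order.
present <- sort(unique(df$label))

# For each label in the sequence, the index of the largest present label
# that is <= it. findInterval() returns 0 when nothing qualifies.
idx <- findInterval(labelsequence$label, present)
idx[idx == 0] <- NA  # no earlier label to copy from; dropped by the join

# label = label to fill in, src_label = label whose records get copied.
lookup <- data.frame(label = labelsequence$label, src_label = present[idx])

# Each lookup row pulls in all records of its source label.
# `relationship` assumes dplyr >= 1.1.0; it only silences the
# many-to-many warning.
df2 <- lookup %>%
  inner_join(df, by = c("src_label" = "label"),
             relationship = "many-to-many") %>%
  select(-src_label)
```

Labels that are present in df map to themselves, so their own records are kept without a separate bind_rows() step.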
This may well have an answer elsewhere but I'm having trouble formulating the words of the question to find what I need.
I have two dataframes, A and B, with A having many more rows than B. I want to look up a value from B based on a column of A, and add it to another column of A. Something like:
A$ColumnToAdd + B[ColumnToMatch == A$ColumnToMatch,]$ColumnToAdd
But I get a load of NAs, along with this warning:
Warning in `==.default`: longer object length is not a multiple of shorter object length
I could do it with a messy for-loop but I'm looking for something faster & elegant.
Thanks
If I understood your question correctly, you're looking for a merge or a join, as suggested in the comments.
Here's a simple example for both using dummy data that should fit what you described.
library(tidyverse)
# Some dummy data
ColumnToAdd <- c(1,1,1,1,1,1,1,1)
ColumnToMatch <- c('a','b','b','b','c','a','c','d')
A <- data.frame(ColumnToAdd, ColumnToMatch)
ColumnToAdd <- c(1,2,3,4)
ColumnToMatch <- c('a','b','c','d')
B <- data.frame(ColumnToAdd, ColumnToMatch)
# Example using merge
A %>%
merge(B, by = c("ColumnToMatch")) %>%
mutate(sum = ColumnToAdd.x + ColumnToAdd.y)
# Example using join
A %>%
inner_join(B, by = c("ColumnToMatch")) %>%
mutate(sum = ColumnToAdd.x + ColumnToAdd.y)
The advantages of the dplyr joins over merge are:
rows are kept in their existing order
they are much faster
they tell you which keys you're joining by (if you don't supply them)
they also work with database tables.
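If all you need is the lookup itself, a base R sketch with match() may be enough. It reuses the dummy data from above; idx and sum are names chosen here for illustration. match() returns, for each row of A, the position of the first matching key in B, so the result has one value per row of A and avoids the recycling warning from the original attempt.

```r
# Same dummy data as in the answer above.
A <- data.frame(
  ColumnToAdd   = c(1, 1, 1, 1, 1, 1, 1, 1),
  ColumnToMatch = c("a", "b", "b", "b", "c", "a", "c", "d")
)
B <- data.frame(
  ColumnToAdd   = c(1, 2, 3, 4),
  ColumnToMatch = c("a", "b", "c", "d")
)

# Position in B of each key found in A (NA where A's key is absent from B).
idx <- match(A$ColumnToMatch, B$ColumnToMatch)

# Look up B's value row by row and add it to A's value.
A$sum <- A$ColumnToAdd + B$ColumnToAdd[idx]
```

Keys in A that do not occur in B yield NA in the new column, and this assumes the keys in B are unique.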
Coming from SQL, I would expect to be able to do something like the following in dplyr. Is this possible?
# R
tbl %>% mutate(n = dense_rank(Name, Email))
-- SQL
SELECT Name, Email, DENSE_RANK() OVER (ORDER BY Name, Email) AS n FROM tbl
Also, is there an equivalent for PARTITION BY?
I struggled with this problem, and here is my solution:
If you can't find a function that supports ordering by multiple variables, I suggest concatenating them in priority order, from left to right, using paste().
Below is a code sample:
tbl %>%
mutate(n = dense_rank(paste(Name, Email))) %>%
arrange(Name, Email) %>%
view()
Moreover, I believe group_by is the equivalent of PARTITION BY in SQL.
The shortcoming of this solution is that you can only order by 2 (or more) variables that share the same direction. If you need to order by multiple columns with different directions, say one ascending and one descending, I suggest you try this:
Calculate rank with ties based on more than one variable
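As a sketch of another route that avoids pasting strings together: grouping by both columns and taking cur_group_id() (available since dplyr 1.0.0) numbers the distinct (Name, Email) combinations in sorted key order, which gives the same numbers as a dense rank when both columns ascend. The data below is made up for illustration, and the exact sort order of the keys can depend on your locale.

```r
library(dplyr)

# Hypothetical example data.
tbl <- data.frame(
  Name  = c("bob", "alice", "bob", "alice", "carol"),
  Email = c("b@x.org", "a1@x.org", "b@x.org", "a2@x.org", "c@x.org")
)

# Like DENSE_RANK() OVER (ORDER BY Name, Email): one id per distinct
# (Name, Email) pair, numbered in sorted key order.
ranked <- tbl %>%
  group_by(Name, Email) %>%
  mutate(n = cur_group_id()) %>%
  ungroup()

# Like DENSE_RANK() OVER (PARTITION BY Name ORDER BY Email):
# group_by() plays the role of PARTITION BY.
partitioned <- tbl %>%
  group_by(Name) %>%
  mutate(n = dense_rank(Email)) %>%
  ungroup()
```

Both results keep the original row order; add arrange(Name, Email) if you want them sorted.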
I have two tibbles: the first has more than one row, and the second has exactly one row.
I want to column-bind them, and for this purpose I want the second one to have the same number of rows as the first.
I can do this with the following trick:
for (i in colnames(df2)) {
df1[[i]] <- df2[[i]][1]  # [[ ]] extracts the value; df2[1, i] would return a 1x1 tibble
}
However, this sounds like a workaround to me. Is there a "tidier" way of doing this (I mean, with tidyverse)?
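You can just use cbind(df1, df2); it will recycle the rows of the shorter data.frame to match the number of rows of the longer one (this works when the longer row count is a whole multiple of the shorter, which is always the case for a one-row data frame).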
If you want to use dplyr, you would want a cross join. dplyr 1.1.0 and later provide cross_join(); earlier versions have no cross join, but there
you can create a dummy column on both tables and inner_join on it:
df1 %>%
mutate(dummy_id=1) %>%
inner_join(df2 %>% mutate(dummy_id=1), by = "dummy_id") %>%
mutate(dummy_id=NULL)
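For reference, here is a minimal sketch of the same operation with cross_join(), assuming dplyr 1.1.0 or later is installed. The two tibbles and their column names are made up for illustration.

```r
library(dplyr)

# Hypothetical inputs: df1 has several rows, df2 has exactly one.
df1 <- tibble(id = 1:3, x = c("a", "b", "c"))
df2 <- tibble(y = 10, z = "k")

# Every row of df1 is paired with every row of df2; since df2 has a single
# row, the result has the same number of rows as df1.
combined <- cross_join(df1, df2)
```

Because no key column is involved, there is no dummy column to add and remove afterwards.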