R: how to copy the (last) column name into a new column as values?

In short, I need to create a new column of timestamps taken from another column's name.
I already have this command to select the columns Lat, Long_ and last_col() from the dataset.
I use last_col() because the name of the last column (a date) keeps changing:
data_new <- data %>%
select(Lat, Long_, last_col() )
Results:
"Lat","Long_","5/26/20"
-14.271,-170.132,44
13.4443,144.7937,167
My goal is the following result:
"Lat","Long_","date","Value"
-14.271,-170.132,"5/26/20",44
13.4443,144.7937,"5/26/20",167
Any ideas, please?

We can use mutate
library(dplyr)
data_new %>%
mutate(date = names(.)[3]) %>%
rename(Value = 3) # rename by position, because the date column name changes
A more robust approach, which also works when there is more than one date column, is pivot_longer
library(tidyr)
pivot_longer(data_new, cols = -c(Lat:Long_), names_to = 'date', values_to = 'Value')
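As a minimal end-to-end sketch of the pivot_longer route, the snippet below rebuilds a toy version of the question's data. The extra `5/25/20` column and its values are invented for illustration; only the `5/26/20` values come from the question.

```r
library(dplyr)
library(tidyr)

# Toy data; check.names = FALSE keeps the date-like column names unchanged
data <- data.frame(
  Lat = c(-14.271, 13.4443),
  Long_ = c(-170.132, 144.7937),
  `5/25/20` = c(40, 160),   # hypothetical earlier date column
  `5/26/20` = c(44, 167),
  check.names = FALSE
)

result <- data %>%
  select(Lat, Long_, last_col()) %>%
  # move the name of the last column into "date" and its cells into "Value"
  pivot_longer(cols = last_col(), names_to = "date", values_to = "Value")
```

Because the column is picked with last_col() in both steps, nothing in the pipeline refers to the date by name.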

Related

Want to mutate values of one category with duplicate values from another category in R

I have a Weather dataset with categorical values in a column named "City". In it, the city "Ahmedabad" is missing values entirely for the year 2019. Upon searching, I found that "Hyderabad" has double the number of values for the same year. I would like to extract one set of values from "Hyderabad" and place it in "Ahmedabad".
You could try this approach, using tidyverse and lubridate
library(tidyverse)
library(lubridate)
dt = dt %>% arrange(City, Date)
bind_rows(
dt %>% filter(City!="Hyderabad" | year(Date)!=2019),
dt %>% filter(City=="Hyderabad" & year(Date)==2019) %>%
mutate(City=if_else(row_number()%%2==0,"Ahmedabad", City))
)
Or, using data.table
library(data.table)
dt = setDT(dt)[order(City,Date)]
rbind(
dt[City!="Hyderabad" | year(Date)!=2019],
dt[City=="Hyderabad" & year(Date)==2019][seq(1,.N,2), City:="Ahmedabad"]
)
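Both versions above rely on the duplicated rows alternating strictly after sorting. A slightly more defensive sketch, assuming every 2019 Hyderabad date appears exactly twice, numbers the copies within each City and Date and reassigns only the second copy. The toy data and the Temp column are hypothetical.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: each Hyderabad date occurs twice
dt <- tibble(
  City = c("Delhi", rep("Hyderabad", 4)),
  Date = as.Date(c("2019-01-01", "2019-01-01", "2019-01-01",
                   "2019-01-02", "2019-01-02")),
  Temp = c(20, 25, 25, 26, 26)
)

fixed <- dt %>%
  group_by(City, Date) %>%
  mutate(copy = row_number()) %>%   # 1 for the first copy, 2 for the second
  ungroup() %>%                     # City can only be changed once ungrouped
  mutate(City = if_else(City == "Hyderabad" & year(Date) == 2019 & copy == 2,
                        "Ahmedabad", City)) %>%
  select(-copy)
```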

mutate the new data frame if email and unique ID is duplicate

I have a sample data frame and I want to check whether values are duplicated, adding new columns that flag duplicates with 1 and everything else with 0. I am trying the code below, but it isn't working for me.
df4 <- data.frame(emp_id =c("DEV-2962","KTN_2252","ANA2719","ITI_2624","DEV2698","HRT2921","","KTN2624","DEV2698","ITI2535","DEV2698","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
email = c("akash.dev#abcd.com","rahul.singh#abcd.com","salman.abbas#abcd.com","ram.lal#abcd.com","ram.lal#xyz.com","prabal.garg#xyz.com","sanu.ali#abcd.com","kunal.singh#abcd.com","lakhan.tomar#abcd.com","praveen.thakur#abcd.com","sarman.ali#abcd.com","zuber.khan#dkl.com","giriraj.singh#dkl.com","lokesh.sharma#abcd.com","pooja.pawar#abcd.com","nikita.sharma#abcd.com"))
ID = "emp_id"
Email = "email"
ID <- sym(ID)
Email <- sym(Email)
df4 <- df4 %>% group_by(!!ID) %>%
mutate(Flag=1:n(),`Duplicate_ID`=ifelse(Flag==1,0,1)) %>% select(-Flag)
df4 <- df4 %>% filter(!is.na(!!Email)) %>% group_by(!!Email) %>%
mutate(Flag=1:n(),`Duplicate_email`=ifelse(Flag==1,0,1)) %>% select(-Flag) %>% ungroup(.)
The ID and email columns can have different names in other data frames, so I also want to handle that.
I also want an input parameter that lets the user give the column names for their data frame,
and I will reuse it in my script. Do you have any suggestions for that?
For example, here I am using sym to fix the parameter in the script.
Instead of getting into non-standard evaluation, try across. As far as I can tell from your code, you are trying to assign 0 to the first instance of each value in a column and 1 to all of its duplicates. You can do this with duplicated, so there is no need for group_by, ifelse, etc.
library(dplyr)
ID = "emp_id"
Email = "email"
df4 <- df4 %>%
mutate(across(all_of(c(ID, Email)), ~as.integer(duplicated(.)), .names = 'flag_{.col}'))
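Since the question asks for column names as a user-supplied parameter, the same idea can be wrapped in a small function. This is only a sketch: `flag_duplicates` and the toy data are hypothetical names, not part of the original answer.

```r
library(dplyr)

# Hypothetical helper: flag repeated values in the columns named in `cols`
flag_duplicates <- function(df, cols) {
  df %>%
    mutate(across(all_of(cols),
                  ~ as.integer(duplicated(.x)),   # 0 = first occurrence, 1 = repeat
                  .names = "flag_{.col}"))
}

# Toy data with one repeated id and one repeated email
toy <- data.frame(
  emp_id = c("A1", "B2", "A1"),
  email = c("x@example.com", "y@example.com", "y@example.com")
)

flagged <- flag_duplicates(toy, c("emp_id", "email"))
```

Passing the names as a character vector through all_of() avoids sym and !! entirely, and it fails with an error if a named column does not exist.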

Is there R code for converting from wide to long when you have more than one unique id?

I have the following data that I want to convert from wide to long.
id_1<-c(1,2,2,2)
s02.0<-c(1,1,4,7)
s02.1<-c(2,2,5,8)
s02.2<-c(NA,3,6,NA)
id_2<-c(1,1,2,3)
df1<-data.frame(id_1,s02.0,s02.1,s02.2,id_2)
I would like the following output based on the two unique ids, with a new variable, say n, that gives the position of 's02' within each row:
id_1<-c(1,1,1,2,2,2,2,2,2,2,2,2)
id_2<-c(1,1,1,1,1,1,2,2,2,3,3,3)
s02<-c(1,2,NA,1,2,3,4,5,6,7,8,NA)
n<-c(1,2,3,1,2,3,1,2,3,1,2,3)
df2<-data.frame(id_1,id_2,s02,n)
We can use pivot_longer
library(tidyr)
library(dplyr)
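df1 %>%
pivot_longer(cols = starts_with('s02'), values_to = 's02') %>%
group_by(id_1, id_2) %>%
mutate(n = row_number()) %>% # position within each id pair
ungroup() %>%
select(-name) # drop the column of old names to match the desired output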
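An alternative sketch derives n from the numeric suffix of the column names instead of from row order, using the ".value" sentinel of pivot_longer. It assumes every wide column is named like `s02.<k>` with k starting at 0, as in the question's data.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  id_1 = c(1, 2, 2, 2),
  s02.0 = c(1, 1, 4, 7),
  s02.1 = c(2, 2, 5, 8),
  s02.2 = c(NA, 3, 6, NA),
  id_2 = c(1, 1, 2, 3)
)

long <- df1 %>%
  pivot_longer(
    cols = starts_with("s02"),
    names_to = c(".value", "n"),   # "s02" becomes the value column, the suffix becomes n
    names_sep = "\\."
  ) %>%
  mutate(n = as.integer(n) + 1L)   # suffixes start at 0, the desired n starts at 1
```

This keeps n tied to the original column even if the id pairs were not unique per row.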

Drop a column that was used as 'by' argument in join

I have the following query:
library(dplyr)
FinalQueryDplyr <- PostsWithFavorite %>%
inner_join(Users, by = c("OwnerUserId" = "Id"), keep = FALSE) %>%
select(DisplayName, Age, Location, FavoriteTotal, MostFavoriteQuestion, MostFavoriteQuestionLikes) %>%
select(-c(OwnerUserId)) %>%
arrange(desc(FavoriteTotal))
As you can see, I use the OwnerUserId column as the joining column between 2 data frames.
I want the result data frame to only have other columns, without the OwnerUserId column visible.
Even though I 'deselect' the OwnerUserId column 2 times in said query:
once by not including it in the first select clause
once by explicitly deselecting it with select(-c(OwnerUserId))
It is still visible in the result:
OwnerUserId DisplayName Age Location FavoriteTotal MostFavoriteQuestion MostFavoriteQuestionLikes
How can I get rid of the column that was used as a joining column in dplyr?
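select() always keeps grouping columns, so OwnerUserId probably survives because the data frame is grouped by it. One option is to remove the grouping attribute by converting to a data.frame before dropping the column: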
library(dplyr)
PostsWithFavorite %>%
inner_join(Users, by = c("OwnerUserId" = "Id"), keep = FALSE) %>%
select(DisplayName, Age, Location, FavoriteTotal,
MostFavoriteQuestion, MostFavoriteQuestionLikes) %>%
as.data.frame %>%
select(-c(OwnerUserId)) %>%
arrange(desc(FavoriteTotal))
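If the cause is indeed a leftover grouping (an assumption, since the question does not show how PostsWithFavorite was built), calling ungroup() before select() should have the same effect. The toy tables below are hypothetical stand-ins for the question's data.

```r
library(dplyr)

# Hypothetical stand-ins; posts is grouped by the join key, as we assume the original is
posts <- tibble(OwnerUserId = c(1, 2), FavoriteTotal = c(10, 5)) %>%
  group_by(OwnerUserId)
users <- tibble(Id = c(1, 2), DisplayName = c("ann", "bob"))

res <- posts %>%
  inner_join(users, by = c("OwnerUserId" = "Id")) %>%
  ungroup() %>%                          # drop the grouping so select() can omit the key
  select(DisplayName, FavoriteTotal) %>%
  arrange(desc(FavoriteTotal))
```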

Can I create a data.frame in R from an existing data.frame by assigning a list of col.names?

I have a data.frame where I assign each column.name a vector of variables:
dat1 <- data.frame(a=1:5,b=1:5,c=1:5)
I want to create a new data.frame but instead of assigning each column individually, I want to assign them all at once. For example, if I wanted to rename them all:
dat.new <- data.frame(paste(names(dat1),'1',sep='') = dat1)
This obviously doesn't work. Is there a way to make it work?
I understand I can just rename using names(), but the scenario where this actually seems useful is if combining multiple data sets that share the same col.names (and in which I don't want to simply rbind):
dat1 <- data.frame(a=1:5,b=1:5,c=1:5)
dat2 <- data.frame(a=6:10,b=6:10,c=6:10)
dat.new <- data.frame(paste(names(dat1),'1',sep='') = dat1, paste(names(dat1),'2',sep='') = dat2)
library(dplyr)
library(tidyr)
library(magrittr)
Ok, here's the first part:
# store under a new name so the question's dat2 is left intact for the second part
dat1.new =
dat1 %>%
setNames(names(.) %>%
paste0("1") )
Here's the second part. The reshaping is a bit complex but more flexible, especially if you already have row ids and the data sets have different numbers of rows:
list(dat1, dat2) %>%
bind_rows(.id = "number") %>%
group_by(number) %>%
mutate(id = 1:n()) %>%
gather(variable, value, -number, -id) %>%
unite(new_variable, variable, number) %>%
spread(new_variable, value)
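When both data sets have the same number of rows and are already aligned row by row, a simpler base R sketch may be enough: rename each data frame with setNames and bind the columns side by side.

```r
dat1 <- data.frame(a = 1:5, b = 1:5, c = 1:5)
dat2 <- data.frame(a = 6:10, b = 6:10, c = 6:10)

# Suffix each set of column names, then place the columns next to each other
dat.new <- cbind(
  setNames(dat1, paste0(names(dat1), "1")),
  setNames(dat2, paste0(names(dat2), "2"))
)
```

Unlike the reshaping approach, this does not match rows by an id, so it is only appropriate when row order already lines up.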
