I have the following query:
library(dplyr)
FinalQueryDplyr <- PostsWithFavorite %>%
  inner_join(Users, by = c("OwnerUserId" = "Id"), keep = FALSE) %>%
  select(DisplayName, Age, Location, FavoriteTotal, MostFavoriteQuestion, MostFavoriteQuestionLikes) %>%
  select(-c(OwnerUserId)) %>%
  arrange(desc(FavoriteTotal))
As you can see, I use the OwnerUserId column as the joining column between 2 data frames.
I want the result data frame to only have other columns, without the OwnerUserId column visible.
Even though I 'deselect' the OwnerUserId column 2 times in said query:
once by not including it in the first select clause
once by explicitly deselecting it with select(-c(OwnerUserId))
It is still visible in the result:
OwnerUserId DisplayName Age Location FavoriteTotal MostFavoriteQuestion MostFavoriteQuestionLikes
How can I get rid of the column that was used as a joining column in dplyr?
One option is to remove the grouping attribute by converting to a plain data.frame before dropping the column:
library(dplyr)
PostsWithFavorite %>%
  inner_join(Users, by = c("OwnerUserId" = "Id"), keep = FALSE) %>%
  select(DisplayName, Age, Location, FavoriteTotal,
         MostFavoriteQuestion, MostFavoriteQuestionLikes) %>%
  as.data.frame() %>%
  select(-c(OwnerUserId)) %>%
  arrange(desc(FavoriteTotal))
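The symptom in the question is what dplyr does when the input is a grouped tibble: select() always keeps grouping variables, even when they are excluded. Assuming PostsWithFavorite is grouped by OwnerUserId (the question does not show how it was built, so this is a guess), calling ungroup() before select() should have the same effect as the as.data.frame step. A minimal sketch with made-up stand-in data (posts and users are hypothetical names):

```r
library(dplyr)

# Hypothetical stand-in data; the real PostsWithFavorite and Users are not shown
posts <- data.frame(OwnerUserId = c(1, 2), FavoriteTotal = c(10, 20)) %>%
  group_by(OwnerUserId)
users <- data.frame(Id = c(1, 2), DisplayName = c("ann", "bob"))

result <- posts %>%
  inner_join(users, by = c("OwnerUserId" = "Id")) %>%
  ungroup() %>%                            # drop the grouping so select() can omit the key
  select(DisplayName, FavoriteTotal) %>%
  arrange(desc(FavoriteTotal))

names(result)
```

After ungroup() a single select() is enough; the extra select(-c(OwnerUserId)) is no longer needed.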
I'm using rotate_df from sjmisc to change columns to rows and vice versa:
library(sjmisc)
library(dplyr)
library(janitor) # provides clean_names()
ss557r <- as.data.frame(myfiles557r) %>%
  rename(variable = 1) %>% # change first column name
  select(-ends_with("Player.Name")) %>% # remove duplicated columns
  rotate_df(rn = NULL, cn = TRUE) %>%
  clean_names()
However, this produces the following df:
The first column is "date", but my intended first column would be what's on the left - the row names.
What changes can I make to the code to make that happen?
I have a sample data frame and I want to check whether values are duplicated, adding new columns that hold 1 for a duplicate and 0 otherwise. I tried the code below, but it isn't working for me.
df4 <- data.frame(emp_id =c("DEV-2962","KTN_2252","ANA2719","ITI_2624","DEV2698","HRT2921","","KTN2624","DEV2698","ITI2535","DEV2698","HRT2837","ERV2951","KTN2542","ANA2813","ITI2210"),
email = c("akash.dev#abcd.com","rahul.singh#abcd.com","salman.abbas#abcd.com","ram.lal#abcd.com","ram.lal#xyz.com","prabal.garg#xyz.com","sanu.ali#abcd.com","kunal.singh#abcd.com","lakhan.tomar#abcd.com","praveen.thakur#abcd.com","sarman.ali#abcd.com","zuber.khan#dkl.com","giriraj.singh#dkl.com","lokesh.sharma#abcd.com","pooja.pawar#abcd.com","nikita.sharma#abcd.com"))
ID = "emp_id"
Email = "email"
ID <- sym(ID)
Email <- sym(email)
df4 <- df4 %>% group_by(!!ID) %>%
mutate(Flag=1:n(),`Duplicate_ID`=ifelse(Flag==1,0,1)) %>% select(-Flag)
df4 <- df4 %>% filter(!is.na(!!Email)) %>% group_by(!!Email) %>%
mutate(Flag=1:n(),`Duplicate_email`=ifelse(Flag==1,0,1)) %>% select(-Flag) %>% ungroup(.)
The name and email columns can be named differently in different data frames, so I also want to handle that.
I also want the user to be able to pass the column names for their data frame as input parameters,
which I will then reuse in my script. Do you have any suggestions for that?
Here, for example, I am using sym to set the parameters in the script.
Instead of getting into non-standard evaluation, try across. Also, as far as I can read your code, you are trying to assign 0 to the first instance of each value in a column and 1 to all later duplicates. You can do this with duplicated(), so there is no need for group_by, ifelse, etc.
library(dplyr)
ID = "emp_id"
Email = "email"
df4 <- df4 %>%
  mutate(across(all_of(c(ID, Email)), ~ as.integer(duplicated(.x)), .names = "flag_{.col}"))
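To let the user supply the column names as parameters, the across() call can be wrapped in a small function. The sketch below is only one way to do it: flag_duplicates, the toy data frame and the Duplicate_ prefix are all made up for illustration, and all_of() is used so that the names are read from a character vector.

```r
library(dplyr)

# Hypothetical helper: `cols` is a character vector of column names given by the caller
flag_duplicates <- function(data, cols) {
  data %>%
    mutate(across(all_of(cols),
                  ~ as.integer(duplicated(.x)),   # 0 for first occurrence, 1 for repeats
                  .names = "Duplicate_{.col}"))
}

# Toy data, not the asker's df4
toy <- data.frame(emp_id = c("A1", "B2", "A1"),
                  email  = c("x#abcd.com", "y#abcd.com", "z#abcd.com"))

flagged <- flag_duplicates(toy, c("emp_id", "email"))
flagged
```

Because the column names are ordinary strings, they can come from a config file or user input without any sym() or !! handling.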
In short, I need to create a new column holding the date, taken from the name of another column.
I already have this command to select the columns Lat, Long_ and the last column from the dataset.
I use last_col() because the name of that column (a date) changes:
data_new <- data %>%
select(Lat, Long_, last_col() )
Results:
"Lat","Long_","5/26/20"
-14.271,-170.132,44
13.4443,144.7937,167
My goal is to achieve below results:
"Lat","Long_","date","Value"
-14.271,-170.132,"5/26/20",44
13.4443,144.7937,"5/26/20",167
Any idea please ?
We can use mutate
library(dplyr)
data_new %>%
  mutate(date = names(.)[3]) %>%
  rename(Value = 3) # rename by position, since the date column name changes
If there are more date columns, a more robust approach is pivot_longer:
library(tidyr)
pivot_longer(data_new, cols = -c(Lat:Long_), names_to = 'date', values_to = 'Value')
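Since the question asks for timestamps, the date string can optionally be parsed into a Date after reshaping. The sketch below rebuilds the two sample rows from the question and assumes the column names always follow the month/day/two-digit-year pattern seen in "5/26/20"; that format is an assumption, not something the question states.

```r
library(dplyr)
library(tidyr)

# The two sample rows from the question; check.names = FALSE keeps "5/26/20" as-is
data_new <- data.frame(Lat = c(-14.271, 13.4443),
                       Long_ = c(-170.132, 144.7937),
                       `5/26/20` = c(44, 167),
                       check.names = FALSE)

long <- data_new %>%
  pivot_longer(cols = -c(Lat, Long_), names_to = "date", values_to = "Value") %>%
  mutate(date = as.Date(date, format = "%m/%d/%y"))  # assumed m/d/yy format

long
```

Nothing here refers to the date column by name, so the same code keeps working when the column name changes.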
I need some help with finding a good way to dynamically add columns with counts for different categories that I need to extract from a string.
In my data, I have a column that contains names of categories and counts thereof. The fields can be empty or contain any combination of categories one can think of. Here are some examples:
themes:firstcategory_1;secondcategory_33;thirdcategory_5
themes:secondcategory_33;fourthcategory_2
themes:fifthcategory_1
What I need is a column for each category (should have the category's name) and the count extracted from the strings above. The list of categories is dynamic, so I don't know beforehand which ones exist.
How do I approach this?
This code will get a column for each category with the counts for each row.
library(dplyr)
library(tidyr)
library(stringr)
# Create test dataframe
df <- data.frame(themes = c("firstcategory_1;secondcategory_33;thirdcategory_5", "secondcategory_33;fourthcategory_2","fifthcategory_1"), stringsAsFactors = FALSE)
# Get the number of columns to split values into
cols <- max(str_count(df$themes,";")) + 1
# Get vector of temporary column names
cols <- paste0("col",c(1:cols))
df <- df %>%
  # Add an ID column based on row number
  mutate(ID = row_number()) %>%
  # Separate multiple categories by semicolon
  separate(col = themes, into = cols, sep = ";", fill = "right") %>%
  # Gather categories into a single column (gather_() is deprecated)
  gather("Column", "Value", all_of(cols)) %>%
  # Drop temporary column
  select(-Column) %>%
  # Filter out NA values
  filter(!is.na(Value)) %>%
  # Separate categories from their counts by underscore
  separate(col = Value, into = c("Category", "Count"), sep = "_", fill = "right") %>%
  # Spread categories to create a column for each category, with the count for each ID in that category
  spread(Category, Count)
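The same reshaping can probably be written more compactly with separate_rows() and pivot_wider(), which avoids computing the temporary column names up front. This is a sketch of an alternative, not the approach from the answer above. It assumes, as in the test data, that the "themes:" prefix has already been stripped and that category names contain no underscore.

```r
library(dplyr)
library(tidyr)

# Same test data as above
df <- data.frame(themes = c("firstcategory_1;secondcategory_33;thirdcategory_5",
                            "secondcategory_33;fourthcategory_2",
                            "fifthcategory_1"),
                 stringsAsFactors = FALSE)

wide <- df %>%
  mutate(ID = row_number()) %>%
  # One row per category entry
  separate_rows(themes, sep = ";") %>%
  # Split "name_count"; convert = TRUE turns the counts into integers
  separate(themes, into = c("Category", "Count"), sep = "_", convert = TRUE) %>%
  # One column per category, NA where a row lacks that category
  pivot_wider(names_from = Category, values_from = Count)

wide
```

Unlike the spread() version, the counts end up as integers rather than character strings.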
I have performed an inner_join (or merge) in R.
I want to keep the second ID column, "DI", in the result.
library(dplyr)
ab<-data.frame(ID=(c("PDM.999993856","PDM.999960488")),oi=rep("r",2),stringsAsFactors = FALSE)
to<-data.frame(DI=c("PDM.999993856","PDM.999960488"),kl=rep("foo",2),stringsAsFactors=FALSE)
inner_join(ab,to, by=c("ID"="DI"))
We can try
to %>%
mutate(ID = DI) %>%
inner_join(., ab, by = "ID")
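Another option, assuming dplyr 1.0.0 or later is installed, is the keep argument of the join functions. With keep = TRUE the key columns from both inputs are retained, so no copy of the column has to be made first:

```r
library(dplyr)

ab <- data.frame(ID = c("PDM.999993856", "PDM.999960488"), oi = rep("r", 2),
                 stringsAsFactors = FALSE)
to <- data.frame(DI = c("PDM.999993856", "PDM.999960488"), kl = rep("foo", 2),
                 stringsAsFactors = FALSE)

# keep = TRUE retains both join keys (ID from ab, DI from to) in the output
joined <- inner_join(ab, to, by = c("ID" = "DI"), keep = TRUE)
joined
```

This also keeps ab as the left-hand table, so its columns come first in the result.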