Keeping the first column after rotate_df in R

I'm using rotate_df from sjmisc to change columns to rows and vice versa:
library(sjmisc)
library(dplyr)
library(janitor) # clean_names() comes from janitor
ss557r <- as.data.frame(myfiles557r) %>%
  rename(variable = 1) %>%               # rename the first column
  select(-ends_with("Player.Name")) %>%  # remove duplicated columns
  rotate_df(rn = NULL, cn = TRUE) %>%    # cn = TRUE: first column's values become the new column names
  clean_names()
However, the first column of the resulting data frame is "date", while I want the first column to be what is shown on the left, i.e. the row names.
What changes can I make to the code to make that happen?
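To see what happens, here is a base-R version of the same rotation on a small, hypothetical data frame (the names `wide`, `id`, `A` and `B` are invented for illustration). After transposing, the former column names end up as row names rather than as a column, so they have to be moved into a column explicitly. In sjmisc, the `rn` argument of `rotate_df()` is documented for exactly this: passing a name such as `rn = "id"` instead of `rn = NULL` should add the row names as the first column of the output. The sketch below does the equivalent by hand:

```r
# Hypothetical input: the first column holds what should become the column names
wide <- data.frame(
  variable = c("date", "points"),
  A = c("2021-01-01", "10"),
  B = c("2021-01-02", "12"),
  stringsAsFactors = FALSE
)

# Transpose everything except the first column; t() yields a character matrix
rotated <- as.data.frame(t(wide[, -1]), stringsAsFactors = FALSE)
# Use the first column as the new header (what cn = TRUE does in rotate_df())
names(rotated) <- wide$variable

# The old column names ("A", "B") are now row names; move them into a first column
rotated <- cbind(id = rownames(rotated), rotated, stringsAsFactors = FALSE)
rownames(rotated) <- NULL
rotated
```

Without the `cbind()` step, the only trace of "A" and "B" is the row names, which many downstream functions silently drop.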

Related

How to add a new row at the end of a data frame only if the last actual value of column Z doesn't contain "VALUE1"?

I have a list of several data frames and, as the title says, I would like to add a new row (where column Z is "VALUE1") at the end of each data frame if the last actual value/string (not counting NA values) of column Z doesn't contain "VALUE1". I already have a script that adds a new row at the beginning of the data frame if the first value of column Z doesn't contain "VALUE1", but I can't quite modify it into the new one myself.
The script I'd like to modify looks like this:
library(dplyr)
for (i in seq_along(df)) {
  df[[i]] <- df[[i]] %>%
    filter(!is.na(Z)) %>%
    slice(1) %>%
    mutate(across(col1:col3, ~ 0)) %>%
    filter(!grepl("VALUE1", Z)) %>%
    mutate(Z = "VALUE1") %>%
    bind_rows(., df[[i]])
}
Also, if possible, a short comment on each line explaining what the code does would be very welcome (not necessary, though), for further learning and understanding. Thank you!
It's quite a strange script.
To add a row at the end of each data frame when the last non-NA value of Z does not contain VALUE1, slice with n() instead of 1 and flip the order of the bind_rows() arguments. Try this:
library(dplyr)
for (i in seq_along(df)) {
  df[[i]] <- df[[i]] %>%
    filter(!is.na(Z)) %>%               # keep only rows where Z is not NA
    slice(n()) %>%                      # take the last row (the original took the first); now a 1-row data frame
    mutate(across(col1:col3, ~ 0)) %>%  # set col1 to col3 to zero, since this row becomes the new row
    filter(!grepl("VALUE1", Z)) %>%     # if Z contains VALUE1, this leaves a 0-row data frame
    mutate(Z = "VALUE1") %>%            # set Z to VALUE1 (only matters if 1 row is left)
    bind_rows(df[[i]], .)               # append the new row (. is the pipe placeholder) after the original df[[i]]
}
# If filter(!grepl("VALUE1", Z)) removed the row, bind_rows() appends a 0-row
# data frame, so df[[i]] does not change at all.
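A small, self-contained run of the loop above, using a hypothetical list `dfs` of two data frames (the list name and all values are invented; the columns Z and col1 to col3 follow the question). The first data frame's last non-NA Z is "x", so it should gain a VALUE1 row; the second already ends with VALUE1 and should stay unchanged:

```r
library(dplyr)

# Hypothetical list of two data frames with the question's column names
dfs <- list(
  data.frame(Z = c("VALUE1", "x", NA), col1 = 1:3, col2 = 4:6, col3 = 7:9,
             stringsAsFactors = FALSE),
  data.frame(Z = c("x", "VALUE1", NA), col1 = 1:3, col2 = 4:6, col3 = 7:9,
             stringsAsFactors = FALSE)
)

for (i in seq_along(dfs)) {
  dfs[[i]] <- dfs[[i]] %>%
    filter(!is.na(Z)) %>%               # ignore NA entries in Z
    slice(n()) %>%                      # last non-NA row
    mutate(across(col1:col3, ~ 0)) %>%  # template row with zeros
    filter(!grepl("VALUE1", Z)) %>%     # keep it only if Z lacks "VALUE1"
    mutate(Z = "VALUE1") %>%
    bind_rows(dfs[[i]], .)              # append after all original rows
}

sapply(dfs, nrow)  # the first data frame gains a row, the second does not
```

Note that the new row goes after every original row, including trailing rows where Z is NA; the NA rows are only skipped when deciding whether to add it.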

R: how to copy the (last) column name into a new column as values?

In short, I need to create a new column with timestamps taken from another column's name.
I already have this command to select the following columns from the dataset: Lat, Long_ and last_col().
I use last_col() because the column name (a date) keeps changing.
library(dplyr)
data_new <- data %>%
  select(Lat, Long_, last_col())
Result:
"Lat","Long_","5/26/20"
-14.271,-170.132,44
13.4443,144.7937,167
My goal is to get the following result:
"Lat","Long_","date","Value"
-14.271,-170.132,"5/26/20",44
13.4443,144.7937,"5/26/20",167
Any ideas?
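We can use mutate() to store the third column's name as a value, then rename that column by position (since its name changes) and move date in front of it:
library(dplyr)
data_new %>%
  mutate(date = names(.)[3]) %>%
  rename(Value = 3) %>%
  relocate(date, .before = Value)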
If there are several date columns, a more robust approach is pivot_longer():
library(tidyr)
pivot_longer(data_new, cols = -c(Lat, Long_), names_to = "date", values_to = "Value")
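To illustrate why pivot_longer() scales better, here is a sketch on the two rows from the question plus an invented earlier date column "5/25/20" (its values are hypothetical). `check.names = FALSE` keeps the dates as column names instead of turning them into syntactic names such as X5.26.20:

```r
library(tidyr)

# The question's two rows plus a hypothetical extra date column
data_wide <- data.frame(Lat = c(-14.271, 13.4443), Long_ = c(-170.132, 144.7937),
                        `5/25/20` = c(40, 160), `5/26/20` = c(44, 167),
                        check.names = FALSE)

# Every column except Lat and Long_ is stacked: its name goes to "date",
# its value to "Value"
long <- pivot_longer(data_wide, cols = -c(Lat, Long_),
                     names_to = "date", values_to = "Value")
long
```

Each location now appears once per date, so no code has to change when new date columns arrive.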

Drop a column that was used as the 'by' argument in a join

I have the following query:
library(dplyr)
FinalQueryDplyr <- PostsWithFavorite %>%
  inner_join(Users, by = c("OwnerUserId" = "Id"), keep = FALSE) %>%
  select(DisplayName, Age, Location, FavoriteTotal, MostFavoriteQuestion, MostFavoriteQuestionLikes) %>%
  select(-c(OwnerUserId)) %>%
  arrange(desc(FavoriteTotal))
As you can see, I use the OwnerUserId column as the joining column between the two data frames.
I want the result data frame to contain only the other columns, without the OwnerUserId column.
Even though I 'deselect' the OwnerUserId column twice in this query:
once by not including it in the first select clause
once by explicitly deselecting it with select(-c(OwnerUserId))
It is still visible in the result:
OwnerUserId DisplayName Age Location FavoriteTotal MostFavoriteQuestion MostFavoriteQuestionLikes
How can I get rid of the column that was used as a joining column in dplyr?
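One option is to remove the grouping attribute by converting to a data.frame. The input is presumably grouped by OwnerUserId, and select() always keeps grouping variables, which is why the column reappears; ungroup() has the same effect as the conversion.
library(dplyr)
PostsWithFavorite %>%
  inner_join(Users, by = c("OwnerUserId" = "Id"), keep = FALSE) %>%
  select(DisplayName, Age, Location, FavoriteTotal,
         MostFavoriteQuestion, MostFavoriteQuestionLikes) %>%
  as.data.frame() %>%         # drops the grouping, so OwnerUserId can now be removed
  select(-c(OwnerUserId)) %>%
  arrange(desc(FavoriteTotal))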
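A minimal sketch with hypothetical stand-in tables (`posts` and `users`, with invented values), assuming, as the symptom suggests, that PostsWithFavorite is grouped by OwnerUserId. It shows the grouping column coming back after select() and disappearing once the data is ungrouped:

```r
library(dplyr)

# Hypothetical stand-ins for the question's tables; posts is grouped by
# OwnerUserId, as it would be after group_by(OwnerUserId) without ungroup()
posts <- data.frame(OwnerUserId = c(1, 2), FavoriteTotal = c(5, 7)) %>%
  group_by(OwnerUserId)
users <- data.frame(Id = c(1, 2), DisplayName = c("ann", "bob"))

# The join keeps the grouping of its first argument
joined <- inner_join(posts, users, by = c("OwnerUserId" = "Id"))

# select() on a grouped data frame adds the grouping column back (with a message)
with_key <- joined %>% select(DisplayName, FavoriteTotal)
names(with_key)

# After ungroup(), select() keeps only the requested columns
without_key <- joined %>% ungroup() %>% select(DisplayName, FavoriteTotal)
names(without_key)
```

If the grouping comes from an earlier summarise(), passing `.groups = "drop"` there avoids the problem at its source.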

Select the highest pairs from a complex table

I want to make a new data frame from a selection of rows in a complex table of pairwise comparisons. I want to select the rows such that the two highest values of each pairwise comparison are selected.
Below is an example dataset:
dataframe <- data.frame(X1 = c("OP2413iiia","OP2413iiib","OP2413iiic","OP2645ii_a","OP2645ii_b","OP2645ii_c","OP2645ii_d","OP2645ii_e","OP3088i__a","OP5043___a","OP5043___b","OP5044___a","OP5044___b","OP5044___c","OP5046___a","OP5046___b","OP5046___c","OP5046___d","OP5046___e","OP5047___a","OP5047___b","OP5048___b","OP5048___c","OP5048___d","OP5048___e","OP5048___f","OP5048___g","OP5048___h","OP5049___a","OP5049___b","OP5051DNAa","OP5051DNAb","OP5051DNAc","OP5052DNAa","OP5053DNAa"),
gr1 = c("2","2","2","3","3","3","3","3","3","4","4","4","3","4","2","3","3","3","4","2","4","3","3","3","4","2","4","2","3","3","3","4","2","4","2"),
X2 = c("OP2413iiib","OP2413iiic","OP5046___a","OP2645ii_a","OP2645ii_a","OP2645ii_a","OP2645ii_b","OP2645ii_b","OP5046___a","OP2645ii_b","OP2645ii_c","OP2645ii_c","OP2645ii_c","OP2645ii_c","OP5048___e","OP2645ii_d","OP5046___a","OP2645ii_d","OP2645ii_d","OP2645ii_d","OP2645ii_d","OP2645ii_e","OP5048___e","OP2645ii_e","OP2645ii_e","OP2645ii_e","OP2645ii_e","OP2645ii_e","OP3088i__a","OP3088i__a","OP3088i__a","OP3088i__a","OP3088i__a","OP3088i__a","OP3088i__a"),
gr2 = c("3","3","3","4","4","4","2","2","2","2","4","4","4","4","4","2","2","2","2","2","2","4","4","4","4","4","4","4","3","3","3","3","3","3","3"),
value = c("1.610613e+00","1.609732e+00","8.829263e-04","1.080257e+01","1.111006e+01","1.110978e+01","4.048302e+00","5.610458e+00","5.609584e+00","9.911490e+00","1.078518e+01","1.133728e+01","1.133686e+01","1.738092e+00","9.247411e+00","5.170646e+00","6.074909e+00","6.074287e+00","6.212711e+00","3.769029e+00","5.793390e+00","1.124045e+01","1.163326e+01","1.163293e+01","7.752766e-01","1.008434e+01","1.222854e+00","6.469443e+00","1.610828e+00","1.784774e+00","1.784235e+00","9.434803e+00","4.512563e+00","9.582847e+00","4.309312e+00"))
expected_output_dataframe <- rbind(dataframe[10,],dataframe[34,],dataframe[32,],dataframe[15,],dataframe[3,],dataframe[17,])
Many thanks in advance
Cheers
This method uses dplyr. I created an extra column, gr_pair, to identify the pairwise groups; pmin() and pmax() put the smaller group first, so 2-4 and 4-2 end up in the same group.
library(dplyr)
dataframe %>%
  filter(gr1 != gr2) %>% # comparisons within one group are excluded from your expected output
  mutate(gr_pair = paste(pmin(gr1, gr2), pmax(gr1, gr2), sep = ",")) %>%
  group_by(gr_pair) %>%
  top_n(2, value) # keep the two rows with the highest value in each group (value is character, so this compares text)
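One caveat: in the example data, value (like gr1 and gr2) is stored as character, so top_n() ranks the strings alphabetically rather than numerically, and "9.247411e+00" outranks "1.008434e+01". The expected output in the question appears to follow that text ordering. If a numeric ranking is wanted, convert with as.numeric() first. The sketch below uses three rows copied from the question's data (rows 15, 26 and 10) and slice_max(), which dplyr recommends over the superseded top_n(), to show the difference:

```r
library(dplyr)

# Three rows from the question's data, all in the "2,4" pair
toy <- data.frame(
  gr1   = c("2", "2", "4"),
  gr2   = c("4", "4", "2"),
  value = c("9.247411e+00", "1.008434e+01", "9.911490e+00"),
  stringsAsFactors = FALSE
)

# Ranking the character column compares text, so "9.2..." beats "1.0..."
top_as_text <- toy %>%
  mutate(gr_pair = paste(pmin(gr1, gr2), pmax(gr1, gr2), sep = ",")) %>%
  group_by(gr_pair) %>%
  slice_max(value, n = 2) %>%
  ungroup()

# Converting to numeric first ranks by magnitude
top_as_number <- toy %>%
  mutate(value = as.numeric(value),
         gr_pair = paste(pmin(gr1, gr2), pmax(gr1, gr2), sep = ",")) %>%
  group_by(gr_pair) %>%
  slice_max(value, n = 2) %>%
  ungroup()

top_as_text$value    # keeps 9.911490e+00 and 9.247411e+00
top_as_number$value  # keeps 10.08434 and 9.91149
```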

Dynamically adding columns in R

I need help finding a good way to dynamically add columns with counts for different categories, which I need to extract from a string.
In my data, I have a column that contains names of categories and counts thereof. The fields can be empty or contain any combination of categories one can think of. Here are some examples:
themes:firstcategory_1;secondcategory_33;thirdcategory_5
themes:secondcategory_33;fourthcategory_2
themes:fifthcategory_1
What I need is a column for each category (should have the category's name) and the count extracted from the strings above. The list of categories is dynamic, so I don't know beforehand which ones exist.
How do I approach this?
This code creates a column for each category, holding the count for each row.
library(dplyr)
library(tidyr)
library(stringr)
# Create test data frame (including the "themes:" prefix from the question)
df <- data.frame(themes = c("themes:firstcategory_1;secondcategory_33;thirdcategory_5",
                            "themes:secondcategory_33;fourthcategory_2",
                            "themes:fifthcategory_1"), stringsAsFactors = FALSE)
# Get the number of columns to split values into
n_cols <- max(str_count(df$themes, ";")) + 1
# Get vector of temporary column names
cols <- paste0("col", seq_len(n_cols))
df <- df %>%
  # Add an ID column based on row number and drop the "themes:" label
  mutate(ID = row_number(), themes = str_remove(themes, "^themes:")) %>%
  # Separate multiple categories by semicolon
  separate(col = themes, into = cols, sep = ";", fill = "right") %>%
  # Stack the temporary columns into a single column (replaces the deprecated gather_())
  pivot_longer(all_of(cols), names_to = "Column", values_to = "Value") %>%
  # Drop temporary column
  select(-Column) %>%
  # Filter out NA and empty values
  filter(!is.na(Value), Value != "") %>%
  # Separate categories from their counts by underscore; convert counts to integers
  separate(col = Value, into = c("Category", "Count"), sep = "_", fill = "right", convert = TRUE) %>%
  # Create a column for each category, with the count for each ID (NA where absent)
  pivot_wider(names_from = Category, values_from = Count)
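A shorter variant of the same idea, offered as an alternative rather than as the answer's method: tidyr's separate_rows() splits each string into one row per category token directly, so the temporary col1, col2, ... columns are not needed. As above, the ID column is a helper added for illustration, and the "themes:" label is stripped first:

```r
library(dplyr)
library(tidyr)

# Example strings from the question
df <- data.frame(
  themes = c("themes:firstcategory_1;secondcategory_33;thirdcategory_5",
             "themes:secondcategory_33;fourthcategory_2",
             "themes:fifthcategory_1"),
  stringsAsFactors = FALSE
)

counts <- df %>%
  mutate(ID = row_number(),
         themes = sub("^themes:", "", themes)) %>%  # strip the label
  separate_rows(themes, sep = ";") %>%              # one row per category_count token
  filter(themes != "") %>%                          # drop empty fields, if any
  separate(themes, into = c("Category", "Count"), sep = "_", convert = TRUE) %>%
  pivot_wider(names_from = Category, values_from = Count)  # one column per category

counts
```

New categories simply become new columns, in order of first appearance, so the category list never has to be known in advance. Category names that themselves contain "_" would need a different split rule.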
