Unnesting groups from LSD.test (agricolae) - r

Forgive me if this has been asked before. I am using the following code to create a list of groups, produced with LSD.test (agricolae) and nested by id.
lsd_groups <- dataset %>%
  group_by(id) %>%
  do(lsd_statistics = LSD.test(lm(value ~ book_name + treatment_name, data = .),
                               "treatment_name", alpha = 0.1)$groups) %>%
  unnest()
My problem is that when I unnest the results, I lose the identifiers (treatment names) associated with the means in the grouping.
I know if I were to leave the LSD.test output as a list, I could see the treatment names by running:
lsd_groups$lsd_statistics[[1]]
I could also convert the treatment names, which are stored as row.names, to a column.
I was hoping, though, for a more elegant solution using unnest(). Is there any way to instruct unnest() to keep those row names? Alternatively, is there a way to tell LSD.test to list the treatment names in a column instead of assigning them as row names? Thank you.
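One possible approach, sketched under assumptions: move the row names into a column before the per-id results are combined, using tibble::rownames_to_column(). The fake_groups() helper and the toy dataset below are hypothetical stand-ins so the example runs without agricolae; fake_groups() only returns group means with the treatment names as row names, which is the shape the question describes for LSD.test(...)$groups. Note that group_modify() is swapped in for do() + unnest() here.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for LSD.test(...)$groups: a data frame whose
# treatment names live in the row names rather than in a column.
fake_groups <- function(d) {
  means <- tapply(d$value, d$treatment_name, mean)
  data.frame(value = as.numeric(means), row.names = names(means))
}

# Toy data (assumed shape: one id column, one treatment column, one value)
dataset <- tibble(
  id = rep(c("a", "b"), each = 4),
  treatment_name = rep(c("t1", "t2"), times = 4),
  value = c(1, 2, 3, 4, 5, 6, 7, 8)
)

lsd_groups <- dataset %>%
  group_by(id) %>%
  # Convert row names to a real column inside each group, so nothing
  # is lost when the per-group results are bound together.
  group_modify(~ fake_groups(.x) %>% rownames_to_column("treatment_name")) %>%
  ungroup()
```

With the real data, the fake_groups(.x) call would presumably be replaced by the LSD.test(...)$groups expression from the question, with data = .x.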

Related

Divide by a certain position in R

I have several series, each one indicates the deflator for the GDP for each country. (Data attached down below)
What I want to do is divide every column by its value at the 97th position.
I know this could be pretty simple for you, but I am struggling.
This is my code so far:
d_data <- d_data %>%
  mutate_if(is.numeric, function(x) x/d_data[[97,x]])
So as you can see in the data, from columns 3 to 8 data are numeric.
I think the error is that argument x of the function refers to the column name, while in the d_data, the second argument refers to column position and that is the main issue.
How can I solve this? Thanks in advance!!
Data
The data was too large to post here (745 rows, 8 columns),
so I uploaded the dput(d_data) output here
Use mutate() with across(), since the scoped variants (_if/_at/_all) are superseded. To extract a value by position, use nth():
library(dplyr)
d_data %>%
  mutate(across(where(is.numeric), ~ .x/nth(.x, 97)))
In the OP's code, instead of d_data[[97,x]], it should be x[97], because x here is the column's values, not its name or position:
d_data %>%
  mutate_if(is.numeric, function(x) x/x[97])
If we want to subset the original data column, we have to pass either the column index or the column name. Here, x doesn't refer to a column index or name. With across(), though, we can get the column name with cur_column(), e.g. mtcars %>% summarise(across(everything(), ~ cur_column())), which is not needed in this case.
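To make the rebasing concrete, here is a small sketch on made-up data. The toy tibble and base_row are assumptions for illustration only; base_row = 2 stands in for position 97 in the real data.

```r
library(dplyr)

# Toy data: two non-numeric columns and two numeric series
toy <- tibble(
  country  = c("A", "A", "A"),
  year     = c("2000", "2001", "2002"),
  deflator = c(50, 100, 200),
  other    = c(2, 4, 8)
)

base_row <- 2  # hypothetical base position (97 in the question)

# Divide every numeric column by its own value at base_row
rebased <- toy %>%
  mutate(across(where(is.numeric), ~ .x / nth(.x, base_row)))
```

Each numeric column then equals 1 at the base row, while the character columns are left untouched.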

Changing a Column to an Observation in a Row in R

I am currently struggling to transition one of my columns in my data to a row as an observation. Below is a representative example of what my data looks like:
library(tidyverse)
test_df <- tibble(unit_name=rep("Chungcheongbuk-do"),unit_n=rep(2),
can=c("Cho Bong-am","Lee Seung-man","Lee Si-yeong","Shin Heung-woo"),
pev1=rep(510014),vot1=rep(457815),vv1=rep(445955),
ivv1=rep(11860),cv1=c(25875,386665,23006,10409),
abstention=rep(52199))
As seen above, the abstention column exists at the end of my data frame, and I would like my data to look like the following:
library(tidyverse)
desired_df <- tibble(unit_name=rep("Chungcheongbuk-do"),unit_n=rep(2),
can=c("Cho Bong-am","Lee Seung-man","Lee Si-yeong","Shin Heung-woo","abstention"),
pev1=rep(510014),vot1=rep(457815),vv1=rep(445955),
ivv1=rep(11860),cv1=c(25875,386665,23006,10409,52199))
Here, abstentions are treated like a candidate, in the can column. Thus, the rest of the data is maintained, and the abstention values are their own observation in the cv1 column.
I have tried using pivot_wider, but I am unsure how to use the arguments to get what I want. I have also considered using t() to transpose the column into a row, but I am having a hard time slotting it back into my data. Any help is appreciated! Thanks!
Here's a strategy that will work if you have multiple unit_names:
test_df %>%
  group_split(unit_name) %>%
  map(function(group_data) {
    slice(group_data, 1) %>%
      mutate(can = "abstention", cv1 = abstention) %>%
      add_row(group_data, .) %>%
      select(-abstention)
  }) %>%
  bind_rows()
Basically we split the data up by unit_name, then we grab the first row for each group and move the values around. Append that as a new row to each group, and then re-combine all the groups.
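A possible alternative without splitting, offered as a sketch rather than a drop-in replacement: build the abstention rows once with distinct(), then append them with bind_rows(). It assumes the unit-level columns (pev1, vot1, vv1, ivv1, abstention) are constant within each unit_name, as they are in the example data.

```r
library(dplyr)

test_df <- tibble(unit_name = rep("Chungcheongbuk-do"), unit_n = rep(2),
                  can = c("Cho Bong-am", "Lee Seung-man", "Lee Si-yeong", "Shin Heung-woo"),
                  pev1 = rep(510014), vot1 = rep(457815), vv1 = rep(445955),
                  ivv1 = rep(11860), cv1 = c(25875, 386665, 23006, 10409),
                  abstention = rep(52199))

# One row per unit, with abstention recast as a "candidate"
abstention_rows <- test_df %>%
  distinct(unit_name, unit_n, pev1, vot1, vv1, ivv1, abstention) %>%
  transmute(unit_name, unit_n, can = "abstention",
            pev1, vot1, vv1, ivv1, cv1 = abstention)

result <- test_df %>%
  select(-abstention) %>%
  bind_rows(abstention_rows) %>%
  arrange(unit_name)  # keeps each unit's abstention row after its candidates
```

If the unit-level columns vary within a unit, distinct() would return more than one row per unit, so that assumption is worth checking first.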

How can I copy and append rows of data to fill in missing records in a defined sequence?

I have a sequence of numeric labels for records that can be shared by a variable number of records per label (labelsequence). I also have the records themselves, but unfortunately for some of the sequence values, all records have been lost (dataframe df). I need to identify when a numeric label from labelsequence does not appear in the label column of df, copy all records within df that are associated with the closest label value that is less than the missing value, and append these to a newly filled-in dataframe, say df2.
I am trying to accomplish this in R (a dplyr answer would be ideal). I have looked at answers to questions about filling in missing rows, such as Fill in missing rows in R and fill missing rows in a dataframe, and I have a working solution below, but I was wondering if anyone has a better way of doing this.
Take, for instance, this example data:
labelsequence<-data.frame(label=c(1,2,3,4,5,6))
and
df<-data.frame(label=c(1,1,1,1,3,3,4,4,4),
place=c('vermont','kentucky',
'wisconsin','wyoming','nevada',
'california','utah','georgia','kentucky'),
animal=c('wolf','wolf','cougar','cougar','lamb',
'cougar','donkey','lamb','wolf'))
with desired result...
desired_df2<-data.frame(label=c(1,1,1,1,2,2,2,2,3,3,4,4,4,5,5,5,6,6,6),
place=c('vermont','kentucky',
'wisconsin','wyoming','vermont','kentucky',
'wisconsin','wyoming','nevada',
'california','utah','georgia','kentucky','utah',
'georgia','kentucky','utah','georgia','kentucky'),
animal=c('wolf','wolf','cougar','cougar','wolf',
'wolf','cougar','cougar','lamb','cougar',
'donkey','lamb','wolf','donkey','lamb','wolf',
'donkey','lamb','wolf'))
Is there a better way (be it efficiency of code, flexibility, or resource efficiency) than the following?
df2 <- df %>%
  full_join(expand.grid(label = unique(df$label), newlabel = labelsequence$label)) %>%
  mutate(missing = ifelse(newlabel %in% label, 0, 1)) %>%
  filter(label < newlabel) %>%
  group_by(newlabel) %>%
  filter(label == max(label) & missing == 1) %>%
  ungroup() %>%
  mutate(label = newlabel, missing = NULL, newlabel = NULL) %>%
  bind_rows(df) %>%
  arrange(label)
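One alternative worth considering, as a sketch only: map each label in the sequence to the closest existing label at or below it with findInterval(), then pull in that label's records through a nested join. The helper names (present, idx, lookup, nested) are made up for this example, and nest()/unnest() come from tidyr rather than dplyr.

```r
library(dplyr)
library(tidyr)

labelsequence <- data.frame(label = c(1, 2, 3, 4, 5, 6))
df <- data.frame(label = c(1, 1, 1, 1, 3, 3, 4, 4, 4),
                 place = c("vermont", "kentucky", "wisconsin", "wyoming",
                           "nevada", "california", "utah", "georgia", "kentucky"),
                 animal = c("wolf", "wolf", "cougar", "cougar", "lamb",
                            "cougar", "donkey", "lamb", "wolf"))

# For each sequence label, index of the largest existing label <= it
present <- sort(unique(df$label))
idx <- findInterval(labelsequence$label, present)
idx[idx == 0] <- NA  # no earlier label to copy from; such labels are dropped

lookup <- data.frame(label = labelsequence$label, source = present[idx])

# One row per existing label, with its records held in a list column
nested <- df %>% nest(records = c(place, animal))

df2 <- lookup %>%
  inner_join(nested, by = c("source" = "label")) %>%
  select(label, records) %>%
  unnest(records)
```

On the example data this gives 19 rows in the same order as desired_df2. Labels that are already present map to themselves, so no separate bind_rows(df) step is needed.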

Need Help Incorporating Tidyr's Spread into a Function that Outputs a List of Dataframes with Grouped Counts

library(tidyverse)
Using the sample data at the bottom, I want to find counts of the Gender and FP variables, then spread these variables using tidyr::spread(). I'm attempting to do this by creating a list of dataframes, one for the Gender counts, and one for FP counts. The reason I'm doing this is to eventually cbind both dataframes. However, I'm having trouble incorporating the tidyr::spread into my function.
The function below creates a list of two dataframes with counts for Gender and FP, but the counts are not "spread."
group_by_quo = quos(Gender, FP)
DF2 <- map(group_by_quo, ~DF %>%
  group_by(Code, !!.x) %>%
  summarise(n = n()))
If I add tidyr::spread, it doesn't work. I'm not sure how to incorporate this since each dataframe in the list has a different variable.
group_by_quo = quos(Gender, FP)
DF2 <- map(group_by_quo, ~DF %>%
  group_by(Code, !!.x) %>%
  summarise(n = n())) %>%
  spread(!!.x, n)
Any help would be appreciated!
Sample Code:
Subject<-c("Subject1","Subject2","Subject1","Subject3","Subject3","Subject4","Subject2","Subject1","Subject2","Subject4","Subject3","Subject4")
Code<-c("AAA","BBB","AAA","CCC","CCC","DDD","BBB","AAA","BBB","DDD","CCC","DDD")
Code2<-c("AAA2","BBB2","AAA2","CCC2","CCC2","DDD2","BBB2","AAA2","BBB2","DDD2","CCC2","DDD2")
Gender<-c("Male","Male","Female","Male","Female","Female","Female","Male","Male","Male","Male","Male")
FP<-c("F","P","P","P","F","F","F","F","F","F","F","F")
DF<-tibble(Subject,Code,Code2,Gender,FP)
I think you misplaced the closing parenthesis. This code works for me:
library(tidyverse)
Subject<-c("Subject1","Subject2","Subject1","Subject3","Subject3","Subject4","Subject2","Subject1","Subject2","Subject4","Subject3","Subject4")
Code<-c("AAA","BBB","AAA","CCC","CCC","DDD","BBB","AAA","BBB","DDD","CCC","DDD")
Code2<-c("AAA2","BBB2","AAA2","CCC2","CCC2","DDD2","BBB2","AAA2","BBB2","DDD2","CCC2","DDD2")
Gender<-c("Male","Male","Female","Male","Female","Female","Female","Male","Male","Male","Male","Male")
FP<-c("F","P","P","P","F","F","F","F","F","F","F","F")
DF<-tibble(Subject,Code,Code2,Gender,FP)
group_by_quo <- quos(Gender, FP)
DF2 <- map(group_by_quo,
           ~DF %>%
             group_by(Code, !!.x) %>%
             summarise(n = n()) %>%
             spread(!!.x, n))
This last part is a bit more concise using count:
DF2 <- map(group_by_quo,
           ~DF %>%
             count(Code, !!.x) %>%
             spread(!!.x, n))
And by using count the unnecessary grouping information is removed as well.
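Since spread() is superseded in current tidyr, the same idea can be sketched with pivot_wider(). The example below also joins the two tables on Code, which is one way to get the combined table the question mentions as the end goal (a join instead of cbind(), so rows are matched by key). Only the columns needed here are included, and combined is a name made up for this sketch.

```r
library(dplyr)
library(tidyr)
library(purrr)

DF <- tibble(
  Code   = c("AAA","BBB","AAA","CCC","CCC","DDD","BBB","AAA","BBB","DDD","CCC","DDD"),
  Gender = c("Male","Male","Female","Male","Female","Female","Female","Male","Male","Male","Male","Male"),
  FP     = c("F","P","P","P","F","F","F","F","F","F","F","F")
)

group_by_quo <- quos(Gender, FP)

# One wide count table per grouping variable
DF2 <- map(group_by_quo,
           ~ DF %>%
             count(Code, !!.x) %>%
             pivot_wider(names_from = !!.x, values_from = n))

# Join the tables by Code rather than relying on row order
combined <- reduce(DF2, ~ left_join(.x, .y, by = "Code"))
```

Combinations that never occur (DDD has no "P" rows) come out as NA; pivot_wider() has a values_fill argument if zeros are preferred.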

How to Create Multiple Frequency Tables with Percentages Across Factor Variables using Purrr::map

library(tidyverse)
library(ggmosaic) # for the "happy" dataset
I feel like this should be a somewhat simple thing to achieve, but I'm having difficulty with percentages when using purrr::map together with table(). Using the "happy" dataset, I want to create a list of frequency tables for each factor variable. I would also like to have rounded percentages instead of counts, or both if possible.
I can create frequency percentages for each factor variable separately with the code below.
with(happy,round(prop.table(table(marital)),2))
However I can't seem to get the percentages to work correctly when using table() with purrr::map. The code below doesn't work...
happy%>%select_if(is.factor)%>%map(round(prop.table(table)),2)
The second method I tried was using tidyr::gather, and calculating the percentage with dplyr::mutate and then splitting the data and spreading with tidyr::spread.
TABLE<-happy%>%select_if(is.factor)%>%gather()%>%group_by(key,value)%>%summarise(count=n())%>%mutate(perc=count/sum(count))
However, since there are different factor variables, I would have to split the data by "key" before spreading using purrr::map and tidyr::spread, which came close to producing some useful output except for the repeating "key" values in the rows and the NA's.
TABLE%>%split(TABLE$key)%>%map(~spread(.x,value,perc))
So any help on how to make both of the above methods work would be greatly appreciated...
You can use an anonymous function or a formula to get your first option to work. Here's the formula option.
happy %>%
  select_if(is.factor) %>%
  map(~round(prop.table(table(.x)), 2))
In your second option, removing the NA values and then removing the count variable prior to spreading helps. The order in the result has changed, however.
TABLE = happy %>%
  select_if(is.factor) %>%
  gather() %>%
  filter(!is.na(value)) %>%
  group_by(key, value) %>%
  summarise(count = n()) %>%
  mutate(perc = round(count/sum(count), 2), count = NULL)
TABLE %>%
  split(.$key) %>%
  map(~spread(.x, value, perc))
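For the "both if possible" part of the question, here is a rough sketch that returns counts and rounded percentages side by side for each factor. It uses a small made-up data frame (toy) instead of happy so that it runs without ggmosaic; the column names level, count and perc are arbitrary choices.

```r
library(dplyr)
library(purrr)

# Toy stand-in for the factor columns of the "happy" data
toy <- tibble(
  marital = factor(c("married", "married", "single", "single")),
  sex     = factor(c("f", "m", "m", "m"))
)

freq_tables <- toy %>%
  select(where(is.factor)) %>%
  map(~ {
    counts <- table(.x)  # NA values are dropped by default
    data.frame(
      level = names(counts),
      count = as.vector(counts),
      perc  = as.vector(round(prop.table(counts), 2))
    )
  })
```

The result is a named list with one data frame per factor, e.g. freq_tables$sex. Replacing toy with happy should give the same structure for the real data.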
