I am struggling to move one of the columns in my data into a row, as a new observation. Below is a representative example of what my data looks like:
library(tidyverse)
test_df <- tibble(unit_name=rep("Chungcheongbuk-do"),unit_n=rep(2),
can=c("Cho Bong-am","Lee Seung-man","Lee Si-yeong","Shin Heung-woo"),
pev1=rep(510014),vot1=rep(457815),vv1=rep(445955),
ivv1=rep(11860),cv1=c(25875,386665,23006,10409),
abstention=rep(52199))
As shown above, abstention is the last column of my data frame. I would like my data to look like this instead:
library(tidyverse)
desired_df <- tibble(unit_name=rep("Chungcheongbuk-do"),unit_n=rep(2),
can=c("Cho Bong-am","Lee Seung-man","Lee Si-yeong","Shin Heung-woo","abstention"),
pev1=rep(510014),vot1=rep(457815),vv1=rep(445955),
ivv1=rep(11860),cv1=c(25875,386665,23006,10409,52199))
Here, abstention is treated like a candidate in the can column: the rest of the data is kept, and the abstention count becomes its own observation in the cv1 column.
I have tried using pivot_wider, but I am unsure how to set its arguments to get what I want. I have also considered using t() to transpose the column into a row, but I am having a hard time slotting it back into my data. Any help is appreciated! Thanks!
Here's a strategy that will work if you have multiple unit_name values:
test_df %>%
  group_split(unit_name) %>%
  map(function(group_data) {
    slice(group_data, 1) %>%
      mutate(can = "abstention", cv1 = abstention) %>%
      add_row(group_data, .) %>%
      select(-abstention)
  }) %>%
  bind_rows()
Basically, we split the data by unit_name and take the first row of each group, turning it into the abstention row (can becomes "abstention" and cv1 takes the abstention count). We append that row to its group, drop the abstention column, and then recombine all the groups.
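The same result can be sketched with dplyr's group_modify() instead of the explicit group_split()/map()/bind_rows() steps. This is an alternative technique rather than the answer's code; it assumes dplyr 1.0 or later, and grouping by both unit_name and unit_n assumes each unit_name has a single unit_n.

```r
library(dplyr)
library(tibble)

# Same example data as in the question
test_df <- tibble(
  unit_name = "Chungcheongbuk-do", unit_n = 2,
  can = c("Cho Bong-am", "Lee Seung-man", "Lee Si-yeong", "Shin Heung-woo"),
  pev1 = 510014, vot1 = 457815, vv1 = 445955,
  ivv1 = 11860, cv1 = c(25875, 386665, 23006, 10409),
  abstention = 52199
)

result <- test_df %>%
  # one group per unit; group_modify() drops the grouping columns from .x
  # and adds them back to the front of the result
  group_by(unit_name, unit_n) %>%
  group_modify(~ bind_rows(
    .x,
    # first row of the group, recast as the "abstention" candidate
    .x %>% slice(1) %>% mutate(can = "abstention", cv1 = abstention)
  )) %>%
  ungroup() %>%
  select(-abstention)
```

The groups come back sorted by unit_name and unit_n, with each unit's abstention row after its candidates.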
Related
I have several series, each giving the GDP deflator for one country (data attached below).
What I want to do is divide every column by its value at the 97th position (row 97).
I know this could be pretty simple for you, but I am struggling.
This is my code so far:
d_data <- d_data %>%
mutate_if(is.numeric, function(x) x/d_data[[97,x]])
As you can see in the data, columns 3 to 8 are numeric.
I think the error is that the argument x of the function refers to the column name, whereas in d_data[[97, x]] the second index should be a column position, and that is the main issue.
How can I solve this? Thanks in advance!!
Data
The data was too large to post here (745 rows, 8 columns), so I uploaded the dput(d_data) output here.
Use mutate() with across(), as the scoped _if/_at/_all variants are superseded. Also, to extract a value by position, use nth():
library(dplyr)
d_data %>%
mutate(across(where(is.numeric), ~ .x/nth(.x, 97)))
In the OP's code, instead of d_data[[97, x]], it should be x[97], since x here is the column values themselves:
d_data %>%
mutate_if(is.numeric, function(x) x/x[97])
To subset a column of the original data, we would have to pass either a column index or a column name, and here x is neither. With across(), we can get the current column name with cur_column(), e.g. mtcars %>% summarise(across(everything(), ~ cur_column())), but that is not needed for this case.
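A small sketch of the across()/nth() rebasing on made-up data. The toy data frame, its column names, and base_row = 3 (standing in for row 97) are all hypothetical; only numeric columns are divided, so the text column is left alone.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for d_data: a text column plus two numeric series
toy <- tibble(
  period = c("Q1", "Q2", "Q3", "Q4"),
  s1 = c(2, 4, 8, 16),
  s2 = c(10, 20, 40, 80)
)

base_row <- 3L  # plays the role of row 97 in the question

rebased <- toy %>%
  # divide every numeric column by its own value in row `base_row`
  mutate(across(where(is.numeric), ~ .x / nth(.x, base_row)))
```

Every rebased series equals 1 in the base row; multiply by 100 if you prefer the base period to read 100.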
I have two data frames, df1 and df2, each with two columns; df1 is a subset of df2.
The data frame I want and my two original data frames look like this:
Basically, I want to assign the values in df1 to df2, matching on barcode.
(I found an answer to a similar question that worked, but I couldn't extract the resulting data frame; the code is at the end.)
(I also don't fully understand the code, e.g. why there are two fill() calls, one with .direction = "up" and the other with "down", and I am still rather confused about how %>% chains code together.) It did print the resulting table, but I don't know how to save that result. How do I store it as a data frame, say df3, and use it in further steps?
Or are there other ways of doing this?
By the way, this comes from the Seurat package: I was trying to merge an annotated subset Seurat object, with cell types assigned to clusters in seu@active.ident, into a main Seurat dataset with more cells (with numeric cluster numbers). I want the seurat_main cells that are not in seurat_subset to keep their cluster numbers, and the cells annotated in seurat_subset to keep their cell-type names in seurat_main.
There is probably a quicker way of doing this than what I'm doing now (extracting cluster.ident as a data frame, combining them, and importing it again), but I don't know how.
Thank you for your help.
Here is the code I found in a similar question:
library(dplyr)
library(tidyr)  # fill() comes from tidyr

df1 %>% bind_rows(df2) %>%
  group_by(barcode) %>%
  fill(cluster, .direction = "up") %>%
  fill(cluster, .direction = "down") %>%
  unique() %>%
  filter(row_number() == 1)
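Two notes on the pipeline above: x %>% f(y) is simply f(x, y), and assigning the whole pipeline with <- (e.g. df3 <- df1 %>% ...) stores the result for later use. fill() only replaces NA values, and because df1 is bound first, filter(row_number() == 1) keeps the df1 value whenever a barcode occurs in both. A join-based alternative is sketched below; the toy data is hypothetical, and it assumes cluster is character in both data frames (convert df2's numbers with as.character() first if needed).

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: df2 holds every barcode with a cluster number,
# df1 holds a subset of barcodes with cell-type names.
# cluster is character in both so the two columns can be combined.
df2 <- tibble(
  barcode = c("AAA", "CCC", "GGG", "TTT"),
  cluster = c("0", "1", "1", "2")
)
df1 <- tibble(
  barcode = c("CCC", "TTT"),
  cluster = c("T cell", "B cell")
)

# Storing the pipeline with <- gives a reusable data frame
df3 <- df2 %>%
  # bring in df1's label for matching barcodes (NA where there is no match)
  left_join(df1, by = "barcode", suffix = c("_main", "_subset")) %>%
  # prefer the df1 label, fall back to the original cluster number
  mutate(cluster = coalesce(cluster_subset, cluster_main)) %>%
  select(barcode, cluster)
```

left_join() keeps every row of df2 in its original order, so df3 has one row per barcode in df2.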
Hi, I am trying to take the mean of duplicate sample rows within a data frame. I can compute the mean of all columns across the two rows; however, some of my columns contain text, which results in a lot of NA values. How can I work around this?
If the rows are truly duplicated (as in, all of the values are the same), and assuming you have an ID variable that groups these duplicated rows, then you can simply take the first row for each ID.
Something like this may work:
library(dplyr)
new_data <- duplicated_data %>%
group_by(ID) %>%
slice(1) %>%
ungroup()
Here, duplicated_data is your original dataset, and ID is the ID variable that you use to determine whether a sample is duplicated or not.
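If the duplicated rows are not identical (for example, the numeric measurements differ but the text columns agree), one option is to average the numeric columns and keep the first value of each text column. This is a sketch assuming text columns are constant within each ID; the toy data and its column names are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: two samples, each measured twice
duplicated_data <- tibble(
  ID = c("s1", "s1", "s2", "s2"),
  site = c("north", "north", "south", "south"),
  value = c(1, 3, 10, 20)
)

averaged <- duplicated_data %>%
  group_by(ID) %>%
  summarise(
    # numeric columns: mean of the duplicated rows
    across(where(is.numeric), mean),
    # text columns: keep the first value (grouping column ID is excluded)
    across(where(is.character), first)
  )
```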
I have my data frame below. I want to sum the data as in the first row of the image below (labelled row 38): the total flowering summed over Sections A-D for each date. I also have multiple plots, not just Dry1 but also Dry2, Dry3, etc.
It's so simple to do in my head, but I can't work out how to do it in R.
Essentially I want to do this:
with(dat1, sum(dat1$TotalFlowering[dat1$Date=="1997-07-01" & dat1$Plot=="Dry1"]))
This tells me that the sum of total flowers for sections A, B, C and D in plot Dry1 on 1997-07-01 is 166.
I want to write code that does this for every Date and Plot combination and then puts the result in the data frame.
In the same format as the first row in the image I included :)
Based on your comment, it seems like you just want to add a variable that keeps track of the TotalFlowering sum for every Date and Plot combination. If that's the case, you can add a field like TotalCount below:
library(dplyr)
df %>%
group_by(Date, Plot) %>%
mutate(TotalCount = sum(TotalFlowering)) %>%
ungroup()
Alternatively, if all you want is the sums, you can use dplyr's summarise() like below:
library(dplyr)
df %>%
group_by(Date, Plot) %>%
summarise(TotalCount = sum(TotalFlowering))
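If you instead want the totals as extra rows in the same data frame, like the labelled row in the image, one option is to summarise and then append the result with bind_rows(). This is a sketch on hypothetical toy data; the Section label "A-D" for the total rows is an assumption.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data with the columns used above
df <- tibble(
  Date = "1997-07-01",
  Plot = rep(c("Dry1", "Dry2"), each = 4),
  Section = rep(c("A", "B", "C", "D"), times = 2),
  TotalFlowering = c(40, 50, 30, 46, 1, 2, 3, 4)
)

# One summary row per Date/Plot, labelled as covering all sections
total_rows <- df %>%
  group_by(Date, Plot) %>%
  summarise(TotalFlowering = sum(TotalFlowering), .groups = "drop") %>%
  mutate(Section = "A-D")

# Append the totals to the original rows (columns are matched by name)
df_with_totals <- bind_rows(df, total_rows)
```

.groups = "drop" returns an ungrouped result, so no regrouping message is printed and later steps are not affected by leftover groups.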
I have a data.frame:
set.seed(1L)
vector <- data.frame(patient=rep(1:5,each=2),medicine=rep(1:3,length.out=10),prob=runif(10))
I want to get the mean of the "prob" column while grouping by patient. I do this with the following code:
library(dplyr)

vector %>%
  group_by(patient) %>%
  summarise(average = mean(prob))
This code works perfectly. However, I need to get the same values without writing "prob" in the summarise() call. I tried the following code, but it gives me a data frame in which the "average" column holds 5 identical values (the mean of the whole prob column, since vector[,3] ignores the grouping), which is not what I want:
vector %>%
group_by(patient) %>%
summarise(average=mean(vector[,3]))
P.S. To explain why I need this: I have another data frame with many columns with complex names that need to be summarised, which is why I can't list them one by one in the summarise() call. What I want is to pass a vector there and calculate the mean of each of those columns, grouped by patient.
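It appears you want summarise_each(), but summarise_each() and funs() are deprecated in current dplyr. The equivalent with across() (note that the column is "prob", not "prop"):
vector %>%
  group_by(patient) %>%
  summarise(across(matches("prob"), mean))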
Using data.table, you could do:
library(data.table)
# setDT() converts vector to a data.table in place
setDT(vector)[, lapply(.SD, mean), by = patient, .SDcols = "prob"]
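Since the goal is to pass a vector of column names instead of typing each one, here is a sketch using all_of() inside across(). The vector name cols_to_average and the "average_" prefix are hypothetical choices; with several columns, each listed name gets its own averaged column.

```r
library(dplyr)

set.seed(1L)
vector <- data.frame(patient = rep(1:5, each = 2),
                     medicine = rep(1:3, length.out = 10),
                     prob = runif(10))

# Hypothetical: the names of the columns to summarise, kept in a character vector
cols_to_average <- c("prob")

averages <- vector %>%
  group_by(patient) %>%
  # all_of() selects exactly the columns named in the vector;
  # .names sets the output name for each of them
  summarise(across(all_of(cols_to_average), mean, .names = "average_{.col}"))
```

all_of() throws an error if any listed name is missing, which catches typos such as "prop" early.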