I have several series, each of which is the GDP deflator for one country. (Data attached below.)
What I want to do is divide every column by its value at the 97th position.
I know this could be pretty simple for you, but I am struggling.
This is my code so far:
d_data <- d_data %>%
mutate_if(is.numeric, function(x) x/d_data[[97,x]])
As you can see in the data, columns 3 to 8 are numeric.
I think the error is that the argument x of the function refers to the column name, whereas in d_data[[97,x]] the second index should be the column position, and that is the main issue.
How can I solve this? Thanks in advance!!
Data
The data was too large to paste here (745 rows, 8 columns),
so I uploaded the dput(d_data) output here
Use mutate with across, as the scoped variants (_if/_at/_all) are superseded. Also, to extract a value by position, use nth
library(dplyr)
d_data %>%
mutate(across(where(is.numeric), ~ .x/nth(.x, 97)))
In the OP's code, d_data[[97,x]] should be x[97], because x here is the column's values, not its name or index
d_data %>%
mutate_if(is.numeric, function(x) x/x[97])
To subset a column of the original data, we have to pass either the column index or the column name, and x here is neither. With across, the column name is available through cur_column(), e.g. mtcars %>% summarise(across(everything(), ~ cur_column())), but that is not needed in this case.
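To see the across/nth approach on something small, here is a minimal sketch. The toy tibble and the base_row value are made up for illustration; base_row stands in for position 97 in the real data.

```r
library(dplyr)

# Hypothetical toy data: two numeric series plus a non-numeric key.
toy <- tibble(
  country = c("A", "A", "A", "A"),
  defl_x  = c(50, 100, 200, 400),
  defl_y  = c(10, 20, 40, 80)
)

base_row <- 3  # stands in for position 97 in the original data

# Divide every numeric column by its own value at base_row.
rebased <- toy %>%
  mutate(across(where(is.numeric), ~ .x / nth(.x, base_row)))

rebased$defl_x  # 0.25 0.50 1.00 2.00
```

The non-numeric country column is left alone because where(is.numeric) never selects it.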
Related
I have the following issue:
In my data frame (89 columns), 4 of the columns contain negative values, as you can see in the following image
![1]: https://i.stack.imgur.com/ZFF0U.png
So I would like to know how I could mutate those specific columns of my data frame to make their values positive (absolute value).
Many thanks
Here's one option:
library(dplyr)
your_data %>%
mutate(across(c("DAYS_BIRTH", "DAYS_EMPLOYED", "DAYS_REGISTRATION", "DAYS_ID_PUBLISH"), abs))
Depending on which columns you want to mutate and which you want to leave, you might be able to use a simpler select helper, like mutate(across(starts_with("DAYS"), abs)), for example.
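A small sketch of the select-helper version, assuming made-up values; only the DAYS_* column names come from the question, and AMT_INCOME is a hypothetical column added to show that non-matching columns are left untouched.

```r
library(dplyr)

# Hypothetical toy data; only the DAYS_* names come from the question.
toy <- tibble(
  id            = 1:3,
  DAYS_BIRTH    = c(-9461, -16765, -19046),
  DAYS_EMPLOYED = c(-637, -1188, -225),
  AMT_INCOME    = c(-1, 2, 3)  # does not start with "DAYS", so it is not changed
)

# Take the absolute value of every column whose name starts with "DAYS".
positive <- toy %>%
  mutate(across(starts_with("DAYS"), abs))
```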
A general solution:
library(dplyr)
# only touch numeric columns whose non-missing values are all negative
data %>% mutate(across(where(~ is.numeric(.x) && all(.x < 0, na.rm = TRUE)), abs))
I am currently struggling to transition one of my columns in my data to a row as an observation. Below is a representative example of what my data looks like:
library(tidyverse)
test_df <- tibble(unit_name=rep("Chungcheongbuk-do"),unit_n=rep(2),
can=c("Cho Bong-am","Lee Seung-man","Lee Si-yeong","Shin Heung-woo"),
pev1=rep(510014),vot1=rep(457815),vv1=rep(445955),
ivv1=rep(11860),cv1=c(25875,386665,23006,10409),
abstention=rep(52199))
As seen above, the abstention column exists at the end of my data frame, and I would like my data to look like the following:
library(tidyverse)
desired_df <- tibble(unit_name=rep("Chungcheongbuk-do"),unit_n=rep(2),
can=c("Cho Bong-am","Lee Seung-man","Lee Si-yeong","Shin Heung-woo","abstention"),
pev1=rep(510014),vot1=rep(457815),vv1=rep(445955),
ivv1=rep(11860),cv1=c(25875,386665,23006,10409,52199))
Here, abstentions are treated like a candidate, in the can column. Thus, the rest of the data is maintained, and the abstention values are their own observation in the cv1 column.
I have tried using pivot_wider, but I am unsure how to use the arguments to get what I want. I have also considered t() to transpose the column into a row, but I am also having a hard time slotting it back into my data. Any help is appreciated! Thanks!
Here's a strategy that will work if you have multiple unit_names
test_df %>%
group_split(unit_name) %>%
map( function(group_data) {
slice(group_data, 1) %>%
mutate(can="abstention", cv1=abstention) %>%
add_row(group_data, .) %>%
select(-abstention)
}) %>%
bind_rows()
Basically we split the data up by unit_name, then we grab the first row for each group and move the values around. Append that as a new row to each group, and then re-combine all the groups.
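The same idea can also be sketched without splitting into a list: build one abstention row per unit_name and bind it back on. This is an alternative sketch, not the code above; abst_rows and result are names introduced here for illustration.

```r
library(dplyr)

test_df <- tibble(
  unit_name = "Chungcheongbuk-do", unit_n = 2,
  can = c("Cho Bong-am", "Lee Seung-man", "Lee Si-yeong", "Shin Heung-woo"),
  pev1 = 510014, vot1 = 457815, vv1 = 445955, ivv1 = 11860,
  cv1 = c(25875, 386665, 23006, 10409),
  abstention = 52199
)

# One "abstention" row per unit, built from the first row of each group.
abst_rows <- test_df %>%
  group_by(unit_name) %>%
  slice(1) %>%
  ungroup() %>%
  mutate(can = "abstention", cv1 = abstention)

# Append the new rows, keep each unit's rows together, drop the old column.
result <- bind_rows(test_df, abst_rows) %>%
  arrange(unit_name) %>%
  select(-abstention)
```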
Hi, I am trying to take the mean of duplicate sample rows within a data frame. I can produce the mean of all columns within the two rows, but some of my columns have text within them, which results in a lot of NA. How can I work around this?
If the rows are truly duplicated (as in, all of the values are the same), and assuming you have an ID variable that groups these duplicated rows, then you can simply take the first row for each ID.
Something like this may work:
library(dplyr)
new_data <- duplicated_data %>%
group_by(ID) %>%
slice(1) %>%
ungroup()
Where duplicated_data is your original dataset, and ID is the ID variable that you use to determine whether a sample is duplicated or not.
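If the duplicated rows are not identical and you really do need the mean, one option is to average only the numeric columns and keep the first value of each text column. A minimal sketch, assuming made-up data and column names (site, value1, value2 are hypothetical):

```r
library(dplyr)

# Hypothetical data: sample "s1" was measured twice.
duplicated_data <- tibble(
  ID     = c("s1", "s1", "s2"),
  site   = c("north", "north", "south"),
  value1 = c(1, 3, 10),
  value2 = c(4, 6, 20)
)

averaged <- duplicated_data %>%
  group_by(ID) %>%
  summarise(
    across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),  # average the numbers
    across(where(is.character), first),                   # keep the first text value
    .groups = "drop"
  )
```

This avoids the NA values because mean is never applied to a text column.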
Forgive me if this has been asked before. I am using the following code to create a list of groups, produced with LSD.test (agricolae) and nested by id.
lsd_groups <- dataset %>%
group_by(id) %>%
do(lsd_statistics = LSD.test(lm(value ~ book_name + treatment_name, data=.),
"treatment_name", alpha=0.1)$groups) %>%
unnest()
My problem is that when I unnest the results, I lose the identifiers (treatment names) associated with the means in the grouping.
I know if I were to leave the LSD.test output as a list, I could see the treatment names by running:
lsd_groups$lsd_statistics[[1]]
I could also convert the treatment names, which are stored as row.names, to a column.
I was hoping, though, for a more elegant solution using unnest(). Is there any way to instruct unnest() to keep those row names? Alternatively, is there a way to tell LSD.test to list the treatment names in a column instead of assigning them as row names? Thank you.
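For reference, the row-names-to-column route mentioned above can be done inside the do() call, before unnesting. This is only a sketch: fake_groups is a hypothetical stand-in for LSD.test(...)$groups (it just returns a data frame whose row names are the treatment names), and the toy dataset is made up.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical stand-in for LSD.test(...)$groups: a data frame whose
# row names carry the treatment names.
fake_groups <- function(d) {
  means <- tapply(d$value, d$treatment_name, mean)
  data.frame(value = as.numeric(means), row.names = names(means))
}

dataset <- tibble(
  id = rep(1:2, each = 4),
  treatment_name = rep(c("a", "b"), times = 4),
  value = c(1, 2, 3, 4, 5, 6, 7, 8)
)

lsd_groups <- dataset %>%
  group_by(id) %>%
  # move the row names into a regular column before the result is nested
  do(lsd_statistics = rownames_to_column(fake_groups(.), "treatment_name")) %>%
  ungroup() %>%
  unnest(lsd_statistics)
```

Because the names are an ordinary column by the time unnest() runs, they survive the unnesting.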
I have a data.frame:
set.seed(1L)
vector <- data.frame(patient=rep(1:5,each=2),medicine=rep(1:3,length.out=10),prob=runif(10))
I want to get the mean of the "prob" column while grouping by patient. I do this with the following code:
vector %>%
group_by(patient) %>%
summarise(average=mean(prob))
This code works perfectly. However, I need to get the same values without using the word "prob" on the "summarise" line. I tried the following code, but it gives me a data.frame in which the column "average" is a vector with 5 identical values, which is not what I want:
vector %>%
group_by(patient) %>%
summarise(average=mean(vector[,3]))
PS: to explain why I need this, I have another data frame with multiple columns with complex names that need to be "summarised", which is why I can't put them one by one in the summarise command. What I want is to pass a vector of columns there and calculate the mean of each column grouped by patient.
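It appears you want summarise with across (summarise_each and funs are deprecated). Note that the pattern has to match the column name, 'prob':
vector %>%
group_by(patient) %>%
summarise(across(matches('prob'), mean))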
Using data.table, you could do
library(data.table)
setDT(vector)[, lapply(.SD, mean), by = patient, .SDcols = 'prob']
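Since the goal is to pass a vector of column names, here is a minimal dplyr sketch using all_of. The cols vector is a hypothetical placeholder; with the real data it would hold the complex column names.

```r
library(dplyr)

set.seed(1L)
vector <- data.frame(patient = rep(1:5, each = 2),
                     medicine = rep(1:3, length.out = 10),
                     prob = runif(10))

cols <- c("prob")  # hypothetical: could hold many column names

# Mean of every column named in cols, per patient.
averages <- vector %>%
  group_by(patient) %>%
  summarise(across(all_of(cols), mean), .groups = "drop")
```

all_of() raises an error if a name in cols is missing from the data, which helps catch typos in long column names.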