This question already has answers here:
Remove all rows where length of string is more than n
(4 answers)
Closed 1 year ago.
I'm working with an untidy dataset and want to filter out any object with an ID shorter than 6 digits (these rows contain errors).
I created a new column that calculates the number of characters for each ID, and then I filter for all objects with 6 or more digits, like so:
clean_df <- df %>%
  mutate(chars = nchar(id)) %>%
  filter(chars >= 6)
This is working just fine, but I'm wondering if there's an easier way.
Using str_length() from the stringr package (part of the tidyverse):
library(tidyverse)
clean_df <- df %>%
  filter(str_length(id) >= 6)
If the IDs are numeric (positive integers), you can use log10() instead: a number has at least 6 digits when log10(id) >= 5.
df %>%
  filter(log10(id) >= 5)
You can skip the mutate() step and call nchar() directly inside filter():
df %>%
  filter(nchar(id) >= 6)
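As a quick check of the one-step filter, here is a minimal sketch on a made-up data frame (the `df` contents and the `value` column are hypothetical, and `id` is assumed to be stored as character):

```r
library(dplyr)

# Hypothetical data: IDs stored as character strings
df <- data.frame(
  id = c("123456", "12345", "1234567", "99"),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Keep only rows whose ID has at least 6 characters
clean_df <- df %>%
  filter(nchar(id) >= 6)

clean_df$id
```

If `id` is numeric rather than character, a value such as 100000 can be rendered in scientific notation ("1e+05") when it is converted to text, which would throw off the character count. The log10() comparison does not depend on that conversion.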
This question already has answers here:
Group by multiple columns and sum other multiple columns
(7 answers)
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 2 years ago.
I don't know if the title makes this easy to understand, but basically I want to reduce this to as little code as possible:
data %>%
  group_by(name) %>%
  mutate(
    plataforma.3DS = sum(plataforma.3DS),
    plataforma.PS3 = sum(plataforma.PS3),
    plataforma.PS4 = sum(plataforma.PS4),
    plataforma.PSP = sum(plataforma.PSP),
    plataforma.PSV = sum(plataforma.PSV),
    plataforma.Wii = sum(plataforma.Wii),
    plataforma.WiiU = sum(plataforma.WiiU),
    plataforma.X360 = sum(plataforma.X360),
    plataforma.XOne = sum(plataforma.XOne)
  )
I have some other columns that need the same treatment, so how can I reduce my code? Thanks in advance.
We can select the columns with across(). Note that mutate() keeps every row and replaces each value in a column with that column's group sum.
library(dplyr)
data %>%
  group_by(name) %>%
  mutate(across(starts_with('plataforma'), sum))
If the intention is to return a single sum per group for each column, change mutate to summarise:
data %>%
  group_by(name) %>%
  summarise(across(starts_with('plataforma'), sum), .groups = 'drop')
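To make the difference between the two verbs concrete, here is a small sketch on invented data (the `data` values and the two `plataforma` columns are placeholders, not the asker's dataset):

```r
library(dplyr)

# Hypothetical data with two 'plataforma' columns
data <- data.frame(
  name = c("a", "a", "b"),
  plataforma.PS3 = c(1, 2, 3),
  plataforma.PS4 = c(0, 1, 1)
)

# mutate(): same number of rows, each value replaced by its group sum
mutated <- data %>%
  group_by(name) %>%
  mutate(across(starts_with("plataforma"), sum)) %>%
  ungroup()

# summarise(): collapses to one row per group
summed <- data %>%
  group_by(name) %>%
  summarise(across(starts_with("plataforma"), sum), .groups = "drop")
```

Here `mutated` still has three rows (the two "a" rows both carry the group total), while `summed` has one row per `name`.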
NOTE: The title mentions a row sum, while the code shown in the OP's post computes column sums.
This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 2 years ago.
Here is my data:
For each x1 level, I am trying to duplicate the row a number of times equal to number.class, and I would like the length class to go from Lmin..cm. to Lmax..cm., increasing by 1 with each row. I came up with this code:
test<-A.M %>% filter(x1=="Crenimugil crenilabis")
for (i in 1:test$number.class){test<-test %>% add_row()}
for (i in 1:nrow(test)){test[i,]=test[1,]}
for (i in 1:nrow(test)){test$length.class[i]<-print(i+test$Lmin..cm.)}
test$length.class<-test$length.class-1
which basically works and gives me the expected results: 2
However, this script does not allow me to run this for every species.
Thank you.
Here, we could use uncount from tidyr to replicate the rows, group by 'x1', and build 'length.class' by adding row_number() to 'Lmin..cm.' (minus 1, so that the first row of each group starts at 'Lmin..cm.', matching the expected output in the question):
library(dplyr)
library(tidyr)
A.M %>%
  uncount(number.class) %>%
  group_by(x1) %>%
  mutate(length.class = `Lmin..cm.` + row_number() - 1)
If we need to create a sequence from Lmin..cm. to Lmax..cm., then instead of uncount, we could use map2 to create the sequence and then unnest:
library(purrr)
A.M %>%
  mutate(new = map2(`Lmin..cm.`, `Lmax..cm.`, ~ seq(.x, .y, by = 1))) %>%
  unnest(c(new))
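Since the original data is not shown, here is a minimal sketch of the row-expansion idea on a made-up stand-in for `A.M`. The species names and values are hypothetical, and it assumes `number.class` equals `Lmax..cm. - Lmin..cm. + 1` and that each `x1` value appears in a single row:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for A.M (one row per species)
A.M <- data.frame(
  x1 = c("sp1", "sp2"),
  Lmin..cm. = c(10, 20),
  Lmax..cm. = c(12, 21),
  number.class = c(3, 2)
)

# Repeat each row number.class times, then number the copies within each
# species so that length.class runs from Lmin..cm. upward in steps of 1
expanded <- A.M %>%
  uncount(number.class, .remove = FALSE) %>%
  group_by(x1) %>%
  mutate(length.class = Lmin..cm. + row_number() - 1) %>%
  ungroup()

expanded$length.class
```

Under those assumptions the first species expands to length classes 10, 11, 12 and the second to 20, 21.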
This question already has answers here:
cumsum by group [duplicate]
(2 answers)
Closed 3 years ago.
I have an R dataframe where I need a counter which gives me a fresh new number for a new set of circumstances while also continuing this number (respecting the order of the data).
There are quite a few previous posts on this but none seems to work for my problem. I've tried using combinations of row_counter, ave and rleid and none seems to hit the spot.
id <- c("A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","C","C","C","C","D","D")
marker_new <- c(1,0,0,0,0,1,0,1,0,0,0,0,1,0,1,1,0,1,0,1,0)
counter_result <- c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,4,1,1,2,2,1,1)
df <- data.frame(id,marker_new, counter_result)
df <- df %>%
  group_by(id, marker_new) %>%
  mutate(counter =
           ifelse(marker_new != 0,
                  row_number(),
                  lag(marker_new, lag(marker_new)))) %>%
  ungroup()
I can get to the point using the code above which will give me a fresh number but won't continue this set of numbers down (as in the counter_result i've included).
Any help much appreciated!
Since the marker_new column contains only 1/0 values, we can use cumsum by group (id) to get the counter.
Base R:
df$result <- with(df, ave(marker_new, id, FUN = cumsum))
dplyr:
library(dplyr)
df %>% group_by(id) %>% mutate(result = cumsum(marker_new))
data.table:
library(data.table)
setDT(df)[, result := cumsum(marker_new), by = id]
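As a sanity check, the base R version can be run on the data from the question and compared against the asker's own `counter_result` column. This sketch uses only base R; the comparison line at the end is an addition for illustration:

```r
# Data copied from the question
id <- c("A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","C","C","C","C","D","D")
marker_new <- c(1,0,0,0,0,1,0,1,0,0,0,0,1,0,1,1,0,1,0,1,0)
counter_result <- c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,4,1,1,2,2,1,1)
df <- data.frame(id, marker_new, counter_result)

# Running total of the 1/0 marker, restarted for each id
df$result <- with(df, ave(marker_new, id, FUN = cumsum))

# Does the computed counter match the expected one?
matches <- all(df$result == df$counter_result)
matches  # TRUE
```

The counter increases by one at every row where `marker_new` is 1 and carries forward over the 0 rows, which is exactly what a grouped cumulative sum does.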
This question already has answers here:
Select multiple columns with dplyr::select() with numbers as names
(2 answers)
Closed 6 years ago.
I want to reshape the data and then select a specific column.
data(ChickWeight)
chick <- ChickWeight %>% spread(Time,weight) %>% filter(Diet=="1")
This creates the column names for me, and they are numbers. How can I select the column named "0"? I know that %>% select(3) may work, but I need a solution that selects columns by name when the names are numbers.
Use backticks to select columns whose names are numbers:
data(ChickWeight)
library(dplyr)
library(tidyr)
chick <- ChickWeight %>%
  spread(Time, weight) %>%
  filter(Diet == "1") %>%
  select(`0`)
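For a smaller illustration that does not depend on reshaping, here is a sketch on a made-up wide table (the `wide` data frame is hypothetical). Besides backticks, `select()` also accepts the name as a string, either directly or through `all_of()`:

```r
library(dplyr)

# Hypothetical wide table; check.names = FALSE keeps "0" and "2" as column names
wide <- data.frame(Chick = c(1, 2), `0` = c(42, 40), `2` = c(51, 49),
                   check.names = FALSE)

by_backtick <- wide %>% select(`0`)          # backticks: refer to the name "0"
by_string   <- wide %>% select("0")          # a string is matched by name
by_all_of   <- wide %>% select(all_of("0"))  # explicit character-vector selection
```

A bare `select(0)` is different: an unquoted number is treated as a column position, not a name, which is why the quoting matters here.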