This question already has answers here:
Select multiple columns with dplyr::select() with numbers as names
(2 answers)
Closed 6 years ago.
I want to reshape the data and then select a specific column.
data(ChickWeight)
chick <- ChickWeight %>% spread(Time,weight) %>% filter(Diet=="1")
It creates the column names for me, which are numbers. So how could I select the column that named "0"? I know that %>% select(3) may work, but I need the solution to select columns with their names being number.
Use backticks to select columns with their names being number
data(ChickWeight)
library(dplyr)
library(tidyr)
chick <- ChickWeight %>% spread(Time,weight) %>% filter(Diet==2) %>% select(`0`)
Related
This question already has answers here:
Remove all rows where length of string is more than n
(4 answers)
Closed 1 year ago.
I'm working with an untidy dataset and want to filter out any object with an ID shorter than 6 digits (these rows contain errors).
I created a new column that calculates the number of characters for each ID, and then I filter for all objects with 6 or more digits, like so:
clean_df <- df %>%
mutate(chars = nchar(id)) %>%
filter(chars >= 6)
This is working just fine, but I'm wondering if there's an easier way.
Using str_length() from the stringr package (part of the tidyverse):
library(tidyverse)
clean_df <- df %>%
filter(str_length(id) >= 6)
If id's are numeric, just use log10
df %>%
filter(log10(id)>=5)
You can skip mutate
df %>%
filter(nchar(id) >= 6)
This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 2 years ago.
Here is my data:
For each x1 level, I am trying to duplicate a number of rows equal to number.class and I would like for each row the length class to goes from the Lmin..cm. to Lmax..cm. increasing by 1 for each row.I came up with this code:
test<-A.M %>% filter(x1=="Crenimugil crenilabis")
for (i in 1:test$number.class){test<-test %>% add_row()}
for (i in 1:nrow(test)){test[i,]=test[1,]}
for (i in 1:nrow(test)){test$length.class[i]<-print(i+test$Lmin..cm.)}
test$length.class<-test$length.class-1
which basically works and gives me the expected results: 2
However, this script does not allow me to run this for every species.
Thank you.
Here, we could use uncount from tidyr to replicate the rows, do a group by 'x1' and mutate the 'Lmin..cm' by adding the row_number()
library(dplyr)
library(tidyr)
A.M %>%
uncount(number.class) %>%
group_by(x1) %>%
mutate(`Lmin..cm.` = `Lmin..cm.` + row_number())
If we need to create a sequence from Lmin..cm to Lmax..cm, then instead of uncount, we could use map2 to create the sequence and then unnest
library(purrr)
A.M %>%
mutate(new = map2(`Lmin..cm.`, `Lmax..cm`, ~ seq(.x, .y, by = 1)) %>%
unnest(c(new))
This question already has answers here:
Remove all copies of rows with duplicate values in R [duplicate]
(2 answers)
Closed 3 years ago.
I'm trying to remove all duplicate values based on multiple variable using dplyr. Here's how I do it without dplyr:
dat = data.frame(id=c(1,1,2),date=c(1,1,1))
dat = dat[!(duplicated(dat[c('id','date')]) | duplicated(dat[c('id','date')],fromLast=TRUE)),]
It should only return id number 2.
This can be done with a group_by/filter operation in tidyverse. Grouped by the columns of interest (here used group_by_all as all the columns in the dataset are grouped. Instead can also make use of group_by_at if a selected number of columns are needed)
library(dplyr)
dat %>%
group_by_all() %>%
filter(n()==1)
Or simply group_by
dat %>%
group_by(id, date) %>%
filter(n() == 1)
If the OP intended to use the duplicated function
dat %>%
filter_at(vars(id, date),
any_vars(!(duplicated(.)|duplicated(., fromLast = TRUE))))
# id date
#1 2 1
This question already has answers here:
Lookup value from another column that matches with variable
(3 answers)
Replace values in a dataframe based on lookup table
(8 answers)
Closed 3 years ago.
set.seed(1)
data=data.frame("id"=1:10,
"score"=NA)
data1=data.frame("id"=c(1:3,5,7,9,10),
"score"=sample(50:100,7))
WANT=data.frame("id"=1:10,
"score"=c(83,81,53,NA,59,NA,58,NA,99,67))
I have complete data frame "data" but I do not have values for everybody which is in my second data frame "data1". However for administrative reasons I must use the full data. Basically "WANT" maintains the structure of "data" but fills in the values where they are available.
Here is a simple solution.
set.seed(1)
data=data.frame("id"=1:10,
"score"=NA)
data1=data.frame("id"=c(1:3,5,7,9,10),
"score"=sample(50:100,7))
WANT=data.frame("id"=1:10,
"score"=c(83,81,53,NA,59,NA,58,NA,99,67))
library(tidyverse)
data %>%
select(-score) %>%
left_join(data1)
I may be reaching but maybe you need.
set.seed(1)
data=data.frame("id"=1:10,
"score"=sample(50:100,10))
data1=data.frame("id"=c(1:3,5,7,9,10),
"score"=sample(50:100,7))
WANT=data.frame("id"=1:10,
"score"=c(83,81,53,NA,59,NA,58,NA,99,67))
library(tidyverse)
data %>%
mutate(score1 = score) %>%
select(-score) %>%
left_join(data1) %>%
mutate(score = if_else(is.na(score),
score1,
score)) %>%
select(-score1)
This question already has answers here:
filter for complete cases in data.frame using dplyr (case-wise deletion)
(7 answers)
Closed 5 years ago.
Suppose I have a 27 columns data frame. The first column is the ID, and the rest of columns (A to Z) are just data. I want to take out all the rows whose A to Z columns are NA. How should I do it?
The straightforward way is just
data %>%
filter(!(is.na(A) & is.na(B) .... & is.na(Z)))
Is there a more efficient or easier way to do it?
This question is different from This one because I want to exclude rows whose value are ALL NA, and keep the rows whose value are partially NA.
Using tidyverse:
library(tidyverse)
Load data:
ID <- c(1:8)
Col1<-c(34564,NA,43456,NA,45655,6789,99999,87667)
Col2<-c(34565,43456,55555,NA,65433,22234,NA,98909)
Col3<-c(45673,88789,11123,NA,55676,76566,NA,NA)
mydf <- data_frame(ID,Col1,Col2,Col3)
mydf %>%
slice(which(complete.cases(.)))
Whether you want to preserve selected columns removing rows with all NAs you may run:
mydf %>%
mutate(full_incomplete_cases=rowSums(is.na(.[-1]))) %>%
filter(full_incomplete_cases<length(mydf[,-1])) %>%
select(ID:Col3)