I have a data set with several columns and I'm working on it using R. Most of those columns have missing data, which was coded as the value -200. What I want to do is delete all the rows that have -200 in any of the columns. Is there an easy way to do this other than going through one column at a time? Can I delete all rows that have a value of -200 all at once?
Thank you for your time!
A tidyverse option would be
library(tidyverse)
df %>%
filter_all(all_vars(. != -200))
data
df <- data.frame(v1 = c(-200, 1, 2, 3), v2 = c(1, -200, 2, 4))
You can use rowSums(), i.e.
df[rowSums(df == -200) == 0,]
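If -200 really stands for "missing", another route is to recode it as NA first and then drop incomplete rows. This is only a minimal base R sketch, assuming the same toy df as above and that -200 never occurs as a legitimate measurement; the names df_na and clean are made up for illustration.

```r
# Toy data from the answer above; -200 marks a missing value
df <- data.frame(v1 = c(-200, 1, 2, 3), v2 = c(1, -200, 2, 4))

# Recode the sentinel as NA across the whole data frame
df_na <- df
df_na[df_na == -200] <- NA

# Keep only the rows that have no missing values
clean <- df_na[complete.cases(df_na), ]
```

Recoding to NA also means functions such as mean(x, na.rm = TRUE) will treat those cells as missing later on, rather than as the number -200.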
Related
I have a dataframe df and I wish to create a new column b that is the smaller of column a and 10 - a. Where a is NA, I want column b to return NA in the corresponding row as well. So column b should be c(1, 3, 1, NA). I tried the following code, but all rows of b are 1. I would like a tidyverse solution.
library(tidyverse)
df <- data.frame(a = c(1, 3, 9, NA))
df2 <- df %>% mutate(b = min(a, 10 - a, na.rm = T))
I guess the issue arises because of how I apply the min function, which is complicated by the presence of NA. But I cannot figure out how to solve the issue.
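A likely explanation is that min() collapses all of its arguments into one number (here 1), which mutate then recycles down the column. The element-wise counterpart is pmin(). A minimal sketch, written in base R so it runs without extra packages, using the df from the question:

```r
df <- data.frame(a = c(1, 3, 9, NA))

# pmin() compares the two vectors element by element;
# without na.rm = TRUE an NA in a stays NA in b
df$b <- pmin(df$a, 10 - df$a)

# With dplyr loaded, the same expression works inside mutate():
# df2 <- df %>% mutate(b = pmin(a, 10 - a))
```

Leaving out na.rm is deliberate here, since the question asks for NA to be carried through to b.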
I have thousands of rows in each column. I need to find specific values in column A based on the value of column B, and replace column A with a new value if it is greater than a specific value.
For example, if column B = 1 and the value in column A is greater than 2, then I want to replace that value of A with 2. Rows where column B is not 1 should be left alone.
I've tried this code:
if(dt$B=='1'){
dt <- dt %>% mutate(A = ifelse(A > 2, 2, A))
}
But this does not work. I've tried some other methods as well, but nothing I do works. Please let me know if you can help with this! Thank you.
We can combine both conditions with & inside the test of ifelse
library(dplyr)
dt <- dt %>%
mutate(A = ifelse(A > 2 & B == 1, 2, A))
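The original attempt fails because if() expects a single TRUE or FALSE, while dt$B == '1' is a whole vector. For comparison, the same replacement can be done in base R with a row index. This is only a sketch: the dt below is made-up example data, not the asker's.

```r
# Hypothetical example data standing in for the asker's dt
dt <- data.frame(A = c(1, 5, 3, 4), B = c(1, 1, 2, 1))

# Rows where B is 1 and A exceeds 2; which() drops any NA comparisons
idx <- which(dt$B == 1 & dt$A > 2)

# Cap A at 2 on those rows only
dt$A[idx] <- 2
```

Row 3 keeps A = 3 because its B is 2, so the condition is not met there.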
I'm trying to identify lines with a missing date between two dates.
data.table initial
i want
I want to delete the columns with only "NA" in them (dt_7 and dt_8).
Perhaps you are looking for something like
df <- data.frame(dt_1 = 1:10, dt_2 = c(1, NA, 2, 3, NA, 6:10), dt_3 = rep(NA, 10))
df[, colSums(is.na(df)) != nrow(df), drop = FALSE]
or
df %>% select_if(colSums(is.na(.))!=dim(df)[1])
Sorry, the first option doesn't work for data.tables, but the second one should solve your problem.
It is a follow-up question to this one. What I would like to check is whether any column in a data frame contains the same value (numerical or string) for all rows. For example,
sample <- data.frame(col1=c(1, 1, 1), col2=c("a", "a", "a"), col3=c(12, 15, 22))
The purpose is to inspect each column in a data frame to see which columns do not have an identical entry in every row. How can I do this? In particular, the columns hold both numbers and strings.
My expected output would be a vector containing the column number which has non-identical entries.
We can use apply column-wise (margin = 2) to count the unique values in each column and select the columns whose number of unique values is not equal to 1.
which(apply(sample, 2, function(x) length(unique(x))) != 1)
#col3
# 3
The same can also be done with an sapply (or lapply) call
which(sapply(sample, function(x) length(unique(x))) != 1)
#col3
# 3
A dplyr version could be
library(dplyr)
sample %>%
summarise_all(n_distinct) %>%
select_if(. != 1)
# col3
#1 3
We can use Filter
names(Filter(function(x) length(unique(x)) != 1, sample))
#[1] "col3"
I have the following data frames:
DF <- data.frame(Time=c(1:20))
StartEnd <- data.frame(Start=c(2,6,14,19), End=c(4,10,17,20))
I want to add a column "Activity" to DF that indicates whether the value in the Time column lies within one of the intervals specified in the StartEnd dataframe.
I came up with the following:
mapply(FUN = function(Start,End) ifelse(DF$Time >= Start & DF$Time <= End, 1, 0),
Start=StartEnd$Start, End=StartEnd$End)
This doesn't give me the output I want (it gives me a matrix with four columns), but I would like to get a vector that I can add to DF.
I guess the solution is easy but I'm not seeing it :) Thank you in advance.
EDIT: I'm sure I can use a loop but I'm wondering if there are more elegant solutions.
You can achieve this with
DF$Activity <- sapply(DF$Time, function(x) {
ifelse(sum(ifelse(x >= StartEnd$Start & x <= StartEnd$End, 1, 0)), 1, 0)
})
I hope this helps!
If you're using the tidyverse, I think a good way to go would be with purrr::map2:
# generate a sequence (n, n + 1, etc.) for each StartEnd row
# (map functions return a list; purrr::flatten_int or unlist can
# squash this down to a vector!)
activity_times = map2(StartEnd$Start, StartEnd$End, seq) %>% flatten_int
# then get a new DF column that is TRUE if Time is in activity_times
DF %>% mutate(active = Time %in% activity_times)
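The mapply attempt from the question is also close to a working answer: the four-column matrix it returns only needs to be collapsed row by row. A minimal base R sketch, assuming the DF and StartEnd from the question (the name inside is made up for illustration):

```r
DF <- data.frame(Time = 1:20)
StartEnd <- data.frame(Start = c(2, 6, 14, 19), End = c(4, 10, 17, 20))

# One logical column per interval: TRUE where Time lies inside that interval
inside <- mapply(function(Start, End) DF$Time >= Start & DF$Time <= End,
                 StartEnd$Start, StartEnd$End)

# A time point is active if it falls inside at least one interval
DF$Activity <- as.integer(rowSums(inside) > 0)
```

This keeps everything vectorised and avoids an explicit loop, at the cost of building a rows-by-intervals matrix, which could get large with many intervals.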