This question already has answers here:
Subset / filter rows in a data frame based on a condition in a column
I have this data frame with a set of zero values in two of its columns:
df
Sl_No No_of_Mails No_of_Responses
    1          10               2
    2           0               0
    3          20               0
    4          10              10
    5           0               0
    6           0              NA
    7          10              NA
    8          10               0
I want to remove those rows where No_of_Mails equals zero without disturbing the other column.
I have tried the following code:
row_sub <- apply(df, 1, function(row) all(row != 0))
df[row_sub, ]
This removes every row that contains a 0 anywhere, including rows where only No_of_Responses is 0. I wish to leave that column undisturbed.
I have also tried this
df[df$No_of_Mails][!(apply(df, 1, function(y) any(y == 0))), ]
This deletes all the rows and gives me a table with zero rows.
Just subset the data frame based on the value in the No_of_Mails column:
df[df$No_of_Mails != 0, ]
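The NA values in No_of_Responses are left untouched, since the condition only looks at No_of_Mails. If No_of_Mails itself could ever contain NA (it does not in the data shown), a slightly more defensive variant would be the following minimal sketch, which drops such rows instead of returning NA rows:
df[which(df$No_of_Mails != 0), ]
# or equivalently
subset(df, No_of_Mails != 0)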
This question already has answers here:
How do I extract a single column from a data.frame as a data.frame?
I have a data frame:
  L1 2020 NA
1  1    0  0
2  2    1  0
3  3    1  0
I want to delete the first and last columns, to get a data frame like this:
  2020
1    0
2    1
3    1
I tried:
1) df <- df[, -c(1, ncol(df))]
or
2) df <- subset(df, select = -c(1, ncol(df)))
For both I get this result:
[1] 0 1 1
So I guess it changed the data frame into a vector. How can I delete these columns while keeping the result as a data frame? It is important for me to keep it like this. I don't have this problem when there are more columns; it changes only when a single column is supposed to be left.
After specifying the columns inside the square brackets, add ,drop = FALSE right after them.
The drop argument is TRUE by default, and that default is what you are struggling with.
df <- data.frame(a = 1:10, b = 1:10)
df[, 1]               # R simplifies to a vector via the implicit drop = TRUE default
df[, 1, drop = FALSE] # the data frame structure remains
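Applied to your own example, and assuming a single column really is all that remains, that would look like this minimal sketch:
df <- df[, -c(1, ncol(df)), drop = FALSE]
# df is still a data frame, even with only the 2020 column left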
This question already has answers here:
Calculate the mean of every 13 rows in data frame
I have a large data frame with 3000 columns and 3000 rows, and I need to sum every 5 rows and then every 5 columns, converting my data frame into 600 rows and 600 columns.
My data looks like this:
matches1 matches2 matches3 matches4 matches5 ...
       1        0        1        0        1
       0        1        0        0        1
       0        1        0        1        0
       1        0        1        0        1
       0        1        1        1        1
My expected result looks like this; for the first 5 rows and 5 columns shown above, the single cell is 14:
  w1 ... w600
1 14
We can create an index for aggregating rows and columns using gl() and then use aggregate and split.default to sum rows and columns respectively.
index <- gl(nrow(df) / 5, 5)
# Sum rows
df1 <- aggregate(. ~ index, df, sum)[-1]
# Sum columns
sapply(split.default(df1, index), rowSums)
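As a quick check, here is this approach applied to the 5 x 5 example from the question (a minimal sketch; note that reusing index for the columns only works because the data frame is square):
df <- data.frame(matches1 = c(1, 0, 0, 1, 0),
                 matches2 = c(0, 1, 1, 0, 1),
                 matches3 = c(1, 0, 0, 1, 1),
                 matches4 = c(0, 0, 1, 0, 1),
                 matches5 = c(1, 1, 0, 1, 1))
index <- gl(nrow(df) / 5, 5)
df1 <- aggregate(. ~ index, df, sum)[-1]
sapply(split.default(df1, index), rowSums)
# 1
#14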
We can create a grouping index with %/%
index <- (seq_len(nrow(df)) - 1) %/% 5 + 1
Then, use that in aggregate
aggregate(.~ index, df, sum)[-1]
Or in rowsum
rowsum(df, group = index)
It can also be used for summing the columns
sapply(split.default(df, index), rowSums)
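Putting both steps together on a small 10 x 10 toy data frame (a minimal sketch; the column index is built separately, so this also works when the data is not square):
set.seed(1)
df <- as.data.frame(matrix(rbinom(100, 1, 0.5), nrow = 10))
row_index <- (seq_len(nrow(df)) - 1) %/% 5 + 1
col_index <- (seq_len(ncol(df)) - 1) %/% 5 + 1
# Sum every 5 rows, then every 5 columns: 10 x 10 becomes 2 x 2
step1 <- as.data.frame(rowsum(df, group = row_index))
sapply(split.default(step1, col_index), rowSums)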
This question already has answers here:
Remove columns with zero values from a dataframe
I'm trying to recreate a data frame (DC5_prod) that has hundreds of columns, many of which contain nothing but zeros.
The first column in the data frame is text and the rest are numeric. Is there a way to ignore the first column while simultaneously eliminating the remainder of columns that are composed entirely of zeros?
DC5_Prod
   a b c d e f
1 AK 0 0 0 0 1
2 JI 0 0 0 0 0
The above is a snippet of how it currently stands; the desired output would be:
DC5_Prod
   a f
1 AK 1
2 JI 0
When I attempt to use the solution given for a similar question on the site,
DC5_prod[, colSums(DC5_prod != 0) > 0]
it essentially just returns the first column without removing any of the all-zero columns.
Try this base R approach:
> ind <- sapply(DC5_Prod, function(x) sum(x == 0)) != nrow(DC5_Prod)
> DC5_Prod[, ind]
   a f
1 AK 1
2 JI 0
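If you would rather exclude the text column from the check explicitly, a variant in the spirit of the solution you linked (a minimal sketch) is to keep the first column unconditionally and test only the remaining, numeric columns:
keep <- c(TRUE, colSums(DC5_Prod[-1] != 0) > 0)
DC5_Prod[, keep, drop = FALSE]
#   a f
#1 AK 1
#2 JI 0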
This question already has answers here:
Remove columns from dataframe where some of values are NA
I need a function in R that does the following.
I have a matrix with some data
mydata <- data.frame(matrix(c(1, 2, 3, NA, 2, 3, NA, NA, 2), 3, 3))
mydata
  X1 X2 X3
1  1 NA NA
2  2  2 NA
3  3  3  2
Now I want to check every column of this data frame: if a column contains any NA, store 0 for that column in a vector; if it contains no NA, store 1.
So check column X1: if there is an NA in this column, write 0 to the vector, otherwise write 1. Then check the next column, and so on.
After checking mydata the vector should look like this
(1 0 0)
with
colnames(mydata)[colSums(is.na(mydata)) > 0]
I get the names of the columns that contain NA. But how can I use this to create the vector I want?
We can negate the counts from colSums on the logical matrix (columns with zero NAs become TRUE) and coerce the result to binary with as.integer:
as.integer(!colSums(is.na(mydata)))
#[1] 1 0 0
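An equivalent, arguably more explicit spelling, plus a variant that keeps the column names (a minimal sketch using anyNA):
as.integer(colSums(is.na(mydata)) == 0)
#[1] 1 0 0
sapply(mydata, function(x) as.integer(!anyNA(x)))
#X1 X2 X3
# 1  0  0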
This question already has answers here:
Finding ALL duplicate rows, including "elements with smaller subscripts"
I'd like to create a new data frame column that helps me quickly identify duplicate rows, based on the value of the first column (index). Assuming that my data frame (df) has almost 18000 rows (observations) and the new column is called "unique", I have tried the following, rather unsuccessfully:
df$unique = ifelse(df[row.names(df):1]==df[row.names(df)-1:1], "YES", "NO")
The rationale behind the code is that comparing each cell with the cell in the row above, in the same column, should flag an entry as unique as long as the two values do not match.
My data frame:
index num1 num2
    1   12   12
    1   12   12
    2   14   14
    2   14   14
    2   14   14
    3   18   18
    4   19   19
You can use the duplicated function. Be aware that the first occurrence of a non-unique row is not itself flagged as a duplicate, hence we need it twice, searching from the beginning and from the end.
# Toy data, where the first two rows are identical, the third row is unique
df <- data.frame(a = c(1, 1, 1), b = c(1, 1, 2))
# Find unique rows
df$unique <- !(duplicated(df) | duplicated(df, fromLast = TRUE))
Output:
> df
  a b unique
1 1 1  FALSE
2 1 1  FALSE
3 1 2   TRUE
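Applied to the data frame from the question, and assuming the flag should be based on the index column only and reported as "YES"/"NO" as in your attempt (a minimal sketch):
df <- data.frame(index = c(1, 1, 2, 2, 2, 3, 4),
                 num1  = c(12, 12, 14, 14, 14, 18, 19),
                 num2  = c(12, 12, 14, 14, 14, 18, 19))
dup <- duplicated(df$index) | duplicated(df$index, fromLast = TRUE)
df$unique <- ifelse(dup, "NO", "YES")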