This question already has answers here:
Filter data.frame rows by a logical condition
(9 answers)
Closed 4 years ago.
An example of the data frame I have is:
Index TimeDifference
1
2
3 20
4
5 67
I want to delete all rows where TimeDifference is blank (these are empty strings, NOT NA). The data frame I want is:
Index TimeDifference
3 20
5 67
Thanks
Assuming that TimeDifference is a character column:
df <- data.frame(Index=1:5, TimeDifference=c("","","20","","67"))
Then you can use:
df[-which(df$TimeDifference==""),]
or
df[!(df$TimeDifference==""),]
or
df[df$TimeDifference!="",]
which gives:
Index TimeDifference
3 3 20
5 5 67
df <- df[as.character(df$TimeDifference)!= "" ,]
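The variants above all compare against the empty string. As a minimal sketch, the following filter also drops whitespace-only and NA cells, and shows a pitfall of the -which() form when nothing matches. The whitespace-only and NA values are made-up additions to the example, and treating them as "blank" is an assumption:

```r
# Example data: the question's values plus one whitespace-only and one NA cell
# (both added here for illustration; they are not in the original data).
df <- data.frame(
  Index = 1:7,
  TimeDifference = c("", "", "20", "", "67", "  ", NA),
  stringsAsFactors = FALSE
)

# Keep rows whose value is neither NA nor empty after trimming whitespace.
keep <- !is.na(df$TimeDifference) & trimws(df$TimeDifference) != ""
cleaned <- df[keep, ]
print(cleaned)

# Pitfall: when no row matches, which() returns integer(0), and
# negative indexing with it selects zero rows instead of all rows.
no_match <- df[-which(df$TimeDifference == "no such value"), ]
print(nrow(no_match))

# The logical form keeps the non-matching rows as intended.
# Comparisons with NA yield NA, so NA rows are handled explicitly.
safe <- df[is.na(df$TimeDifference) | df$TimeDifference != "no such value", ]
print(nrow(safe))
```

For that reason the logical forms (the second and third variants) are usually the safer choice when the condition might match no rows.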
This question already has answers here:
Sort (order) data frame rows by multiple columns
(19 answers)
Closed 1 year ago.
I'm beginning with R and I have a question.
I have this:
x <- data.frame(x0=c(1:10), x1=c("z", "a","a","a","a","a","c","b","b","b"))
So basically two columns. I want to sort the data frame alphabetically by x1, moving each entire row,
so that the row 1 - z (both x0 and x1) appears at the end.
I've tried sort(), but I only managed to sort the column x1, not both x0 and x1 together.
Thanks
In base R you can subset and order:
x[order(x$x1),]
x0 x1
2 2 a
3 3 a
4 4 a
5 5 a
6 6 a
8 8 b
9 9 b
10 10 b
7 7 c
1 1 z
With dplyr you use arrange:
library(dplyr)
x %>%
  arrange(x1)
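order() also accepts several keys, which covers the multi-column case that the linked duplicate asks about. A small base R sketch using the question's data; using x0 as the tie-breaker is only an illustrative choice:

```r
x <- data.frame(x0 = 1:10,
                x1 = c("z", "a", "a", "a", "a", "a", "c", "b", "b", "b"))

# Sort by x1 ascending; within equal x1, sort by x0 descending.
# Negating the key works here because x0 is numeric.
sorted <- x[order(x$x1, -x$x0), ]
print(sorted)

# decreasing = TRUE reverses the ordering of the key, so "z" comes first.
reversed <- x[order(x$x1, decreasing = TRUE), ]
print(head(reversed, 3))
```

In both cases order() returns row positions, and indexing with them moves whole rows, which is why x0 and x1 stay paired.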
This question already has answers here:
How do I extract a single column from a data.frame as a data.frame?
(3 answers)
Closed 2 years ago.
I have a data frame:
L1 2020 NA
1 1 0 0
2 2 1 0
3 3 1 0
I want to delete the first and last columns, to get a data frame like this:
2020
1 0
2 1
3 1
I tried:
1)
df <- df[,-c(1,ncol(df))]
or 2)
df <- subset(df, select = -c(1,ncol(df)))
For both I get result:
[1] 0 1 1
So I guess it changed the data frame into a vector. How can I delete these columns and keep the result as a data frame? That is important for me. I don't have this problem when more columns remain; it only happens when a single column is supposed to be left.
After specifying the columns inside the square brackets, add drop=FALSE as an extra argument.
When you select columns with df[, j], drop is TRUE by default, so a result with a single column is simplified to a vector; that default is what you are running into.
df <- data.frame(a=1:10,b=1:10)
df[,1] #R simplifies to a vector via implicit drop=TRUE default
df[,1,drop=FALSE] #dataframe-structure remains
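Applied to the situation in the question, a sketch could look like this. The column names first, year_2020 and last are hypothetical stand-ins for the question's columns:

```r
# Three-column data frame shaped like the one in the question
# (column names are placeholders chosen for this example).
df <- data.frame(first = 1:3, year_2020 = c(0, 1, 1), last = c(0, 0, 0))

# Default behaviour: one remaining column is simplified to a vector.
as_vector <- df[, -c(1, ncol(df))]
print(is.data.frame(as_vector))

# drop = FALSE keeps the data frame structure.
kept <- df[, -c(1, ncol(df)), drop = FALSE]
print(kept)

# List-style indexing (no comma) also returns a data frame.
list_style <- df[-c(1, ncol(df))]
print(is.data.frame(list_style))
```

The list-style form df[j] never simplifies, so it is an alternative when only columns are being selected.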
This question already has answers here:
Aggregate multiple columns at once [duplicate]
(2 answers)
Aggregating rows for multiple columns in R [duplicate]
(3 answers)
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 4 years ago.
I have a large data frame where I have one column (Phylum) that has repeated names and 253 other columns (each with a unique name) that have counts of the Phylum column. I would like to sum the counts within each column that correspond to each Phylum.
This is a simplified version of what my data look like:
Phylum sample1 sample2 sample3 ... sample253
1 P1 2 3 5 5
2 P1 2 2 10 2
3 P2 1 0 0 1
4 P3 10 12 3 1
5 P3 5 7 14 15
I have seen similar questions, but they are for fewer columns, where you can just list the names of the columns you want summed. I don't want to enter 253 unique column names.
I would like my results to look like this
Phylum sample1 sample2 sample3 ... sample253
1 P1 4 5 15 7
2 P2 1 0 0 1
3 P3 15 19 17 16
I would appreciate any help. Sorry for the format of the question, this is my first time asking for help on stackoverflow (rather than sleuthing).
If your starting file looks like this (test.csv):
Phylum,sample1,sample2,sample3,sample253
P1,2,3,5,5
P1,2,2,10,2
P2,1,0,0,1
P3,10,12,3,1
P3,5,7,14,15
Then you can use group_by and summarise_each from dplyr:
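library(tidyverse)  # provides read_csv(), group_by(), summarise_each() and %>%
read_csv('test.csv') %>%
  group_by(Phylum) %>%
  summarise_each(funs(sum))
# summarise_each() and funs() are deprecated in current dplyr;
# summarise(across(everything(), sum)) is the modern equivalent.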
Note that, if you were trying to do this for one column you can simply use summarise:
read_csv('test.csv') %>%
  group_by(Phylum) %>%
  summarise(sum(sample1))
summarise_each is required to run that function (in the above, funs(sum)) on each column.
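For comparison, the same per-group sums can be sketched in base R with aggregate(), which avoids listing the 253 column names by using "." in the formula. The data frame below simply retypes the test.csv rows shown above:

```r
# Same values as test.csv, entered inline so the example is self-contained.
df <- data.frame(
  Phylum    = c("P1", "P1", "P2", "P3", "P3"),
  sample1   = c(2, 2, 1, 10, 5),
  sample2   = c(3, 2, 0, 12, 7),
  sample3   = c(5, 10, 0, 3, 14),
  sample253 = c(5, 2, 1, 1, 15)
)

# "." stands for every column other than Phylum, so all sample columns
# are summed within each Phylum group.
totals <- aggregate(. ~ Phylum, data = df, FUN = sum)
print(totals)
```

One caveat: the formula interface of aggregate() omits rows containing NA by default, so data with missing counts may need extra handling.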
This question already has answers here:
Sum values in a rolling/sliding window
(6 answers)
Running Sum in R data.table [duplicate]
(1 answer)
Closed 4 years ago.
I have a little problem...let's say I have a data.table with one numerical column like:
NR
1
2
3
5
7
10
1
I want to create a new column which is computed in this way:
in row j I want the sum of NR in rows j, j+1 and j+2 (near the end, only the rows that exist). So I want this result:
NR NEW_NR
1 6
2 10
3 15
5 22
7 18
10 11
1 1
Could anyone help me pls?
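One way to sketch the computation described above in base R is shown below. It works on a plain vector rather than a data.table, and the names window and NEW_NR are chosen for this example; the linked duplicates cover data.table-specific approaches:

```r
NR <- c(1, 2, 3, 5, 7, 10, 1)
window <- 3          # number of rows summed: j, j+1, j+2
n <- length(NR)

# For each row j, sum NR from j up to j + window - 1,
# truncating the window at the last row.
NEW_NR <- sapply(seq_len(n), function(j) sum(NR[j:min(j + window - 1, n)]))

result <- data.frame(NR = NR, NEW_NR = NEW_NR)
print(result)
```

The resulting vector can be assigned as a new column of the existing table. For long tables, a vectorised approach is likely faster than this row-by-row sapply().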
This question already has answers here:
Select the row with the maximum value in each group
(19 answers)
Closed 2 years ago.
I have a data frame like this but much longer:
A B
1 0
3 9
7 3
6 2
1 4
2 1
I want to get the maximum value of column A and the value in column B that corresponds with it, regardless of whether it is also the maximum value. So for this data set I would like to get 7 and 3. But if I use:
Max<-apply(df,2,max)
I get 7 and 9.
Thanks for your help!
You want the row at which A has its maximum: df[which.max(df$A), ]
We can also use dplyr (df1 here is the data frame from the question):
library(dplyr)
df1 %>%
  slice(which.max(A))
# A tibble: 1 x 2
# A B
# <int> <int>
#1 7 3
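A self-contained base R sketch of the same idea, using the question's values. Note that which.max() returns only the first position of the maximum, so the tie handling shown at the end is an addition for illustration:

```r
df <- data.frame(A = c(1, 3, 7, 6, 1, 2),
                 B = c(0, 9, 3, 2, 4, 1))

# Row where A is largest; B comes along because the whole row is selected.
top <- df[which.max(df$A), ]
print(top)

# If several rows share the maximum of A, which.max() keeps only the first.
# A logical comparison keeps all of them.
df_ties <- rbind(df, data.frame(A = 7, B = 8))   # extra tied row, made up here
first_only <- df_ties[which.max(df_ties$A), ]
all_ties <- df_ties[df_ties$A == max(df_ties$A), ]
print(all_ties)
```

apply(df, 2, max) gave 7 and 9 because it takes the maximum of each column independently, whereas row selection keeps the values of A and B paired.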