Delete specific rows in a data frame?

I have a data frame that looks like this:
col1 col2 col3
1 0 1 5
2 0 3 0
3 5 4 5
4 5 5 0
5 5 3 7
I want to delete every row that has the value 0 in column col1 and 5 in column col3. How can I accomplish this in R? The expected result is:
col1 col2 col3
2 0 3 0
3 5 4 5
4 5 5 0
5 5 3 7
Thank you.

Assuming your data frame is called df:
df[!(df$col1 == 0 & df$col3 == 5), ]
## col1 col2 col3
## 2 0 3 0
## 3 5 4 5
## 4 5 5 0
## 5 5 3 7
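A self-contained version of the same filter. It rebuilds the example data from the question, assuming the printed values are numeric. It also shows subset() as an alternative. The two differ only when col1 or col3 contain NA: a NA in the logical index makes [ return an all-NA row, while subset() treats NA as FALSE and drops the row.

```r
# Rebuild the example data frame from the question
df <- data.frame(col1 = c(0, 0, 5, 5, 5),
                 col2 = c(1, 3, 4, 5, 3),
                 col3 = c(5, 0, 5, 0, 7))

# TRUE for the rows to drop: col1 is 0 AND col3 is 5
drop <- df$col1 == 0 & df$col3 == 5

# Keep every row that is not flagged
kept <- df[!drop, ]

# subset() gives the same result here; with NAs it drops those rows
# instead of returning an all-NA row
kept2 <- subset(df, !(col1 == 0 & col3 == 5))
```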

Related

Creating a new column based on other columns in a dataframe R

I have a dataframe that looks like this:
df <- data.frame('col1'=c(1,2,2,4,5), 'col2'=c(4,9,3,5,13), 'col3'=c(3,5,8,7,10))
> df
col1 col2 col3
1 1 4 3
2 2 9 5
3 2 3 8
4 4 5 7
5 5 13 10
I want to create a new column that has a value of 1 if at least one of the values in the row is greater than or equal to 8, and a value of 0 if all of the values in the row are less than 8. So the final result would look something like this:
> df
col1 col2 col3 new
1 1 4 3 0
2 2 9 5 1
3 2 3 8 1
4 4 5 7 0
5 5 13 10 1
Thanks!
This works:
df$new <- apply(df, 1, function(x) max(x >= 8))
df
# col1 col2 col3 new
# 1 1 4 3 0
# 2 2 9 5 1
# 3 2 3 8 1
# 4 4 5 7 0
# 5 5 13 10 1
Using rowSums:
df$new <- +(rowSums(df>=8, na.rm=TRUE) > 0); df
col1 col2 col3 new
1 1 4 3 0
2 2 9 5 1
3 2 3 8 1
4 4 5 7 0
5 5 13 10 1
Alternatively, using matrix multiplication:
df$new <- as.numeric(((df >= 8) %*% rep(1, ncol(df))) > 0)
df
col1 col2 col3 new
1 1 4 3 0
2 2 9 5 1
3 2 3 8 1
4 4 5 7 0
5 5 13 10 1
# Or logical column
df$new <- as.vector(((df >= 8) %*% rep(1, ncol(df))) > 0)
df
col1 col2 col3 new
1 1 4 3 FALSE
2 2 9 5 TRUE
3 2 3 8 TRUE
4 4 5 7 FALSE
5 5 13 10 TRUE
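A minimal self-contained sketch of the rowSums approach. It computes the logical matrix before the new column exists, so new is never part of the comparison. It works directly on the data frame and avoids apply(), which goes through as.matrix(). If any column were non-numeric, that conversion would turn the whole matrix into character and make x >= 8 a string comparison.

```r
df <- data.frame(col1 = c(1, 2, 2, 4, 5),
                 col2 = c(4, 9, 3, 5, 13),
                 col3 = c(3, 5, 8, 7, 10))

# Logical matrix: TRUE wherever a cell is >= 8
hits <- df >= 8

# A row qualifies if it has at least one TRUE; as.integer() turns
# TRUE/FALSE into 1/0 and drops the row names
df$new <- as.integer(rowSums(hits) > 0)
```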

Selecting columns in the same order as names from an array containing column names

I have a data frame consisting of, say, 5 columns. I also have an array that contains some of these column names. I want a new data frame consisting of only the columns whose names are present in this array, in the same order as they appear in the array. I am able to get the columns, but not in the same order. Please see the code below:
col1 = c(1,1,1,1,1)
col2 = c(2,2,2,2,2)
col3 = c(3,3,3,3,3)
col4 = c(4,4,4,4,4)
col5 = c(5,5,5,5,5)
df = data.frame(col1,col2,col3,col4,col5)
df
col1 col2 col3 col4 col5
1 1 2 3 4 5
2 1 2 3 4 5
3 1 2 3 4 5
4 1 2 3 4 5
5 1 2 3 4 5
columnsarray = c("col4","col1","col2")
df[which(names(df) %in% columnsarray)]
col1 col2 col4
1 1 2 4
2 1 2 4
3 1 2 4
4 1 2 4
5 1 2 4
As you can see, I listed the column names as col4, col1, col2, but the output columns come out as col1, col2, col4.
If you want the columns in the order of your vector, try this:
df[columnsarray[columnsarray %in% names(df)]]
# col4 col1 col2
# 1 4 1 2
# 2 4 1 2
# 3 4 1 2
# 4 4 1 2
# 5 4 1 2
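Just df[columnsarray] will do, as long as every name in columnsarray exists in df (otherwise you get an "undefined columns selected" error).
You can also use df[, columnsarray], but when length(columnsarray) == 1 you get a vector, not a data frame, unless you add drop = FALSE: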
> str( df[c("col4")] )
'data.frame': 5 obs. of 1 variable:
$ col4: num 4 4 4 4 4
> str( df[,c("col4")] )
num [1:5] 4 4 4 4 4
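A short self-contained sketch combining both points. It uses intersect(), which keeps the order of its first argument and drops names that are not in df, together with drop = FALSE for the single-column case. The name "col9" is a hypothetical missing column added only to show that it is skipped.

```r
df <- data.frame(col1 = rep(1, 5), col2 = rep(2, 5), col3 = rep(3, 5),
                 col4 = rep(4, 5), col5 = rep(5, 5))

# "col9" is hypothetical and does not exist in df
wanted <- c("col4", "col1", "col2", "col9")

# intersect() follows the order of `wanted` and skips absent names
picked <- df[intersect(wanted, names(df))]
names(picked)  # col4, col1, col2

# With a single name, drop = FALSE keeps the result a data frame
one <- df[, "col4", drop = FALSE]
```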

Perform function over groups in columns in R

I am completely new to R and have a question about performing a function over a column.
data <- read.table(text ="group; val
a; 4
a; 24
a; 12
b; 1
a; 2
c; 4
c; 5
b; 6 ", sep=";", header=TRUE, strip.white=TRUE, stringsAsFactors = FALSE)
How could I add data in the following way?
I would like to create two new columns, which I am doing like this:
data$col1 <- 0
data$col2 <- 1
What I now want is to add 2 for each successive group (in order of first appearance) in both new columns, so that group a keeps 0 and 1, group b gets 2 and 3, and group c gets 4 and 5:
group val col1 col2
a 4 0 1
a 24 0 1
a 12 0 1
b 1 2 3
a 2 0 1
c 4 4 5
c 5 4 5
b 6 2 3
How could I do this? I hope I made my example more or less clear.
Try this:
Create an offset that grows by 2 for each group (0, 2, 4, ...):
indx <- c(0, 2 * seq_len(length(unique(data[, 1])) - 1))
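Split the two new columns by group, add each group's offset, and unsplit so every row returns to its original position. Note that split() orders the groups alphabetically (by factor level), which here happens to match their order of appearance: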
data[, 3:4] <- unsplit(Map(`+`, split(data[, 3:4], data[, 1]), indx), data[, 1])
data
# group val col1 col2
# 1 a 4 0 1
# 2 a 24 0 1
# 3 a 12 0 1
# 4 b 1 2 3
# 5 a 2 0 1
# 6 c 4 4 5
# 7 c 5 4 5
# 8 b 6 2 3
Or, starting from the original data (before col1 and col2 are added), you could do:
within(data, {col1 <- 2*(as.numeric(factor(group))-1)
col2 <- col1+1})[,c(1:2,4:3)]
# group val col1 col2
#1 a 4 0 1
#2 a 24 0 1
#3 a 12 0 1
#4 b 1 2 3
#5 a 2 0 1
#6 c 4 4 5
#7 c 5 4 5
#8 b 6 2 3
Using data.table (compute the offset first, then reuse it for col2):
library(data.table)
setDT(data)[, c('col1', 'col2') := {indx <- 2 * (match(group, unique(group)) - 1)
                                     list(indx, indx + 1)}]
data
# group val col1 col2
#1: a 4 0 1
#2: a 24 0 1
#3: a 12 0 1
#4: b 1 2 3
#5: a 2 0 1
#6: c 4 4 5
#7: c 5 4 5
#8: b 6 2 3
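A base R sketch of the same match() idea used in the data.table answer. match(group, unique(group)) numbers the groups by first appearance, so unlike split() it does not rely on the alphabetical order of the group labels coinciding with their order in the data. The data is rebuilt by hand from the question.

```r
data <- data.frame(group = c("a", "a", "a", "b", "a", "c", "c", "b"),
                   val   = c(4, 24, 12, 1, 2, 4, 5, 6),
                   stringsAsFactors = FALSE)

# Group id by first appearance: a -> 1, b -> 2, c -> 3
id <- match(data$group, unique(data$group))

# Each successive group is offset by 2; col2 is always col1 + 1
data$col1 <- 2 * (id - 1)
data$col2 <- data$col1 + 1
```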

Add new column to data frame, taking existing values within range

I was wondering if anyone knows a simple way to create a new column in a data frame, taking data from an existing column, within a certain range.
For example, I have this data frame
range col1
1 5
2 4
3 9
4 5
5 2
6 8
7 9
I would like to create col2 using the data in col1, so that col2 takes the col1 value where range is greater than 3, and 0 otherwise:
range col1 col2
1 5 0
2 4 0
3 9 0
4 5 5
5 2 2
6 8 8
7 9 9
I have tried
data$col2 <- data$col1[which(data$range > 3)]
data$col2 <- subset(data$col1, data$range > 3)
However, both of these produce the error:
replacement has 4 rows, data has 7
Any help greatly appreciated
You can do it even without ifelse here:
data$new <- with(data, (range > 3) * col1)
data
# range col1 new
#1 1 5 0
#2 2 4 0
#3 3 9 0
#4 4 5 5
#5 5 2 2
#6 6 8 8
#7 7 9 9
Try ifelse:
transform(data, col2=ifelse(range >3, col1, 0))
# range col1 col2
#1 1 5 0
#2 2 4 0
#3 3 9 0
#4 4 5 5
#5 5 2 2
#6 6 8 8
#7 7 9 9
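Why the original attempts fail: data$col1[data$range > 3] returns only the 4 matching values, and a 4-element vector cannot fill a 7-row column. A minimal sketch of a third option that keeps the lengths aligned: start from a full column of zeros and overwrite only the matching rows. The data is rebuilt from the question.

```r
data <- data.frame(range = 1:7, col1 = c(5, 4, 9, 5, 2, 8, 9))

# Full-length column of zeros (0 is recycled to all 7 rows)
data$col2 <- 0

# Overwrite only the rows where range > 3; both sides index the
# same rows, so the lengths always match
keep <- data$range > 3
data$col2[keep] <- data$col1[keep]
```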

Filter rows based on multiple column conditions R

Suppose I have a dataset with 100-odd columns, and I need to keep only the rows that meet one condition applied across all of those columns. How do I do this?
Suppose it looks like the data below. I need to keep only the rows where at least one of Col1, Col2, Col3 or Col4 is > 0:
Col1 Col2 Col3 Col4
1 1 3 4
0 0 4 2
4 3 4 3
2 1 0 2
1 2 0 3
0 0 0 0
In the example above, every row except the last one should be kept. I need to store the result in the same data frame as the original. I'm not sure whether I can use lapply to loop through the columns testing > 0, or whether I can use subset. Any help is appreciated.
Can I use column indices and do df <- subset(df, c(2:100) > 0)? This doesn't give me the right result.
Suppose your data.frame is DF; then using [ will do the work for you:
> DF[DF[,1]>0 | DF[,2] >0 | DF[,3] >0 | DF[,4] >0, ]
Col1 Col2 Col3 Col4
1 1 1 3 4
2 0 0 4 2
3 4 3 4 3
4 2 1 0 2
5 1 2 0 3
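If you have hundreds of columns, you can use this alternative approach. It assumes all values are non-negative, so that a row sum of 0 means every value is 0; otherwise use rowSums(DF > 0) > 0:
> DF[rowSums(DF) != 0, ]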
Col1 Col2 Col3 Col4
1 1 1 3 4
2 0 0 4 2
3 4 3 4 3
4 2 1 0 2
5 1 2 0 3
dat <- read.table(header = TRUE, text = "
Col1 Col2 Col3 Col4
1 1 3 4
0 0 4 2
4 3 4 3
2 1 0 2
1 2 0 3
0 0 0 0
")
You can use data.table to automatically accommodate however many columns your data.frame happens to have. Here's one way, though there's probably a more elegant method with data.table:
library(data.table)
dt <- data.table(dat)
dt[rowSums(dt>0)>0]
# Col1 Col2 Col3 Col4
# 1: 1 1 3 4
# 2: 0 0 4 2
# 3: 4 3 4 3
# 4: 2 1 0 2
# 5: 1 2 0 3
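The subset(df, c(2:100) > 0) attempt fails because c(2:100) > 0 tests the column index numbers themselves, not the values in those columns, so it is simply a vector of TRUEs. A base R sketch of the lapply idea from the question: compare each selected column with 0 and OR the results together with Reduce(). The cols vector is an assumption standing in for whatever index range (such as 2:100) you need.

```r
DF <- data.frame(Col1 = c(1, 0, 4, 2, 1, 0),
                 Col2 = c(1, 0, 3, 1, 2, 0),
                 Col3 = c(3, 4, 4, 0, 0, 0),
                 Col4 = c(4, 2, 3, 2, 3, 0))

# Columns to test; here all of them, but any index vector works
cols <- seq_along(DF)

# One logical vector per column, OR-ed together row by row
any_pos <- Reduce(`|`, lapply(DF[cols], function(x) x > 0))

# Equivalent vectorised form using a logical matrix
any_pos2 <- rowSums(DF[cols] > 0) > 0

# Store the result back in the same data frame
DF <- DF[any_pos, ]
```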
