Subset a matrix in R

I have a matrix of bundles and I would like to subset it based on the column sum (a budget) and the first value. If the first value is 0 and I could add the value in and still be under the budget I would like to drop the column.
For example, if my budget is 10 (column sum) and my matrix looks like this:
col1 col2 col3 col4
1 2 2 0 0
2 3 3 3 3
3 0 0 2 0
4 4 0 4 0
I would like the end matrix to look like this because the 0 in col4 row 1 could be included and the column sum would be under 10:
col1 col2 col3
1 2 2 0
2 3 3 3
3 0 0 2
4 4 0 4
My code is currently:
for (i in 1:ncol(df)) {
if (df[1,i]==0) {
df<-df[,which(colSums(df)+2>10)]
}
}
The code is not working because it also removes column 2. I don't think it is considering the if statement when subsetting the matrix.
Thanks.

Similar to the solution by @akrun, but I think the following subsetting approach is already enough:
> df[,head(df,1)!=0 | colSums(df)+2>10]
col1 col2 col3
1 2 2 0
2 3 3 3
3 0 0 2
4 4 0 4
DATA
df <- structure(list(col1 = c(2L, 3L, 0L, 4L), col2 = c(2L, 3L, 0L,
0L), col3 = c(0L, 3L, 2L, 4L), col4 = c(0L, 3L, 0L, 0L)),
class = "data.frame", row.names = c("1",
"2", "3", "4"))

One option is to build the condition from colSums and the value in the first row, then use it to subset the columns. The vectorized colSums should be more efficient than a loop.
bids <- 2
df1[which(!(df1[1,] == 0 & (colSums(df1) + bids) < 10))]
# col1 col2 col3
#1 2 2 0
#2 3 3 3
#3 0 0 2
#4 4 0 4
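Or using a for loop (iterating in reverse, so that deleting a column does not shift the columns still to be checked)
for(i in rev(seq_along(df1))) if(df1[1, i] == 0 && sum(df1[[i]]) + bids < 10) df1[[i]] <- NULL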
data
df1 <- structure(list(col1 = c(2L, 3L, 0L, 4L), col2 = c(2L, 3L, 0L,
0L), col3 = c(0L, 3L, 2L, 4L), col4 = c(0L, 3L, 0L, 0L)),
class = "data.frame", row.names = c("1",
"2", "3", "4"))
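Both answers apply the same rule; spelled out step by step, it might look like the sketch below. The names budget, bid, first_is_zero, affordable and keep are illustrative and not from the original post, and the strict < comparison follows the answers above (whether a total exactly equal to the budget should count is an assumption).

```r
df1 <- data.frame(col1 = c(2, 3, 0, 4), col2 = c(2, 3, 0, 0),
                  col3 = c(0, 3, 2, 4), col4 = c(0, 3, 0, 0))
budget <- 10  # assumed column-sum limit from the question
bid <- 2      # assumed value that could be placed in the first row

# TRUE for columns whose first value is 0
first_is_zero <- unlist(df1[1, ]) == 0
# TRUE for columns that would stay under the budget after adding the bid
affordable <- colSums(df1) + bid < budget

# drop a column only when both conditions hold
keep <- !(first_is_zero & affordable)
result <- df1[, keep, drop = FALSE]
names(result)  # "col1" "col2" "col3"
```

Building the mask once and subsetting once avoids the problem in the question's loop, where the colSums filter is applied to every column as soon as the if condition is true for any one of them.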

Related

running if and else within a for loop across columns

I am attempting to have R read across columns by row and evaluate whether values from two adjacent cells are equal. If the values are equal, I want R to count this occurrence in a new variable. Here is example data (df):
Var1 Var2 Var3
2 3 3
3 3 3
1 2 3
3 2 1
...and I want to get here:
Var1 Var2 Var3 NewVar
2 3 3 1
3 3 3 2
1 2 3 0
3 2 1 0
One example set of code I have tried out is the following:
df$NewVar <- 0
for (i in 1:2){
if (df[i]==df[i+1]){
df$NewVar <- df$NewVar + 1
}
else{
df$NewVar <- df$NewVar
}
}
This particular set of code just returns 0s in the NewVar variable.
Any sort of help would be much appreciated!
Here's a vectorized solution using rowSums:
df$NewVar <- rowSums(df[-1] == df[-ncol(df)])
df
# Var1 Var2 Var3 NewVar
#1 2 3 3 1
#2 3 3 3 2
#3 1 2 3 0
#4 3 2 1 0
data
df <- structure(list(Var1 = c(2L, 3L, 1L, 3L), Var2 = c(3L, 3L, 2L,
2L), Var3 = c(3L, 3L, 3L, 1L)), class = "data.frame", row.names = c(NA,-4L))
We can use Reduce
df$NewVar <- Reduce(`+`, Map(`==`, df[-1], df[-ncol(df)]))
data
df <- structure(list(Var1 = c(2L, 3L, 1L, 3L), Var2 = c(3L, 3L, 2L,
2L), Var3 = c(3L, 3L, 3L, 1L)), class = "data.frame", row.names = c(NA,-4L))
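For comparison with the loop in the question, here is a minimal sketch of a loop that does work: it compares whole columns at once with [[ ]] and accumulates the row-wise counts. The variable name new_var is illustrative.

```r
df <- data.frame(Var1 = c(2L, 3L, 1L, 3L),
                 Var2 = c(3L, 3L, 2L, 2L),
                 Var3 = c(3L, 3L, 3L, 1L))

new_var <- integer(nrow(df))
for (i in seq_len(ncol(df) - 1)) {
  # df[[i]] == df[[i + 1]] is a logical vector with one entry per row;
  # adding it treats TRUE as 1 and FALSE as 0
  new_var <- new_var + (df[[i]] == df[[i + 1]])
}
df$NewVar <- new_var
df$NewVar  # 1 2 0 0
```

The counter is kept outside the data frame until the loop finishes, so the new column is never compared with the original ones.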

Sorting data with some similar words in R

I have a data set with 100 columns; a minimal reproduction of my data is as follows:
df1<-read.table(text="PG1S1AW KOM1S1zo PG2S2AW KOM2S2zo PG3S3AW KOM3S3zo PG4S4AW KOM4S4zo PG5S5AW KOM5S5zo
4 1 2 4 4 3 0 4 0 5
4 4 3 1 3 1 0 3 0 1
2 3 5 3 3 2 1 4 0 2
1 1 1 1 1 3 0 5 0 1
2 5 3 4 4 5 0 1 3 4", header=TRUE)
I want to select the columns starting with KOM or PG whose number is greater than 3, so we need PG4, KOM4 and above. Put simply, the numbers after PG and KOM are the same in each pair, and I want those that are 4 or greater.
The intended output is:
PG4S4AW KOM4S4zo PG5S5AW KOM5S5zo
0 4 0 5
0 3 0 1
1 4 0 2
0 5 0 1
0 1 3 4
I have used the following code, but it does not work for me:
df2<- df1%>% select(contains("KO"))
Thanks for your help.
The pattern is not entirely clear, so this is one interpretation. We create a function (f1) that uses str_extract (from stringr) to extract the one or more digits (\\d+) that follow 'KOM' or (|) 'PG' and converts them to numeric ('v1'); similarly, it extracts the number after the 'S' ('v2'). It then checks whether the two values are equal and whether v1 is greater than 3. The check is wrapped in which, which returns the matching indices and drops any NA produced when str_extract finds no match. Finally, the function is used inside select to pick the columns that follow the pattern.
library(dplyr)
library(stringr)
f1 <- function(nm) {
v1 <- as.numeric(str_extract(nm, "(?<=(KOM|PG))\\d+"))
v2 <- as.numeric(str_extract(nm, "(?<=S)\\d+"))
nm[which((v1 == v2) & (v1 > 3))]
}
df1 %>%
select(f1(names(.)))
# PG4S4AW KOM4S4zo PG5S5AW KOM5S5zo
#1 0 4 0 5
#2 0 3 0 1
#3 1 4 0 2
#4 0 5 0 1
#5 0 1 3 4
data
df1 <- structure(list(PG1S1AW = c(4L, 4L, 2L, 1L, 2L), KOM1S1zo = c(1L,
4L, 3L, 1L, 5L), PG2S2AW = c(2L, 3L, 5L, 1L, 3L), KOM2S2zo = c(4L,
1L, 3L, 1L, 4L), PG3S3AW = c(4L, 3L, 3L, 1L, 4L), KOM3S3zo = c(3L,
1L, 2L, 3L, 5L), PG4S4AW = c(0L, 0L, 1L, 0L, 0L), KOM4S4zo = c(4L,
3L, 4L, 5L, 1L), PG5S5AW = c(0L, 0L, 0L, 0L, 3L), KOM5S5zo = c(5L,
1L, 2L, 1L, 4L)), class = "data.frame", row.names = c(NA, -5L
))
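If adding stringr is not an option, the same two numbers can be pulled out with base R's sub. This is only a sketch: it assumes every name has the form prefix, number, 'S', number, suffix, and the names pattern and selected are illustrative.

```r
nm <- c("PG1S1AW", "KOM1S1zo", "PG3S3AW", "KOM4S4zo", "PG5S5AW")

# group 2 is the number after KOM/PG, group 3 is the number after S
pattern <- "^(KOM|PG)([0-9]+)S([0-9]+).*$"
v1 <- as.numeric(sub(pattern, "\\2", nm))
v2 <- as.numeric(sub(pattern, "\\3", nm))

# keep names where both numbers agree and are greater than 3
selected <- nm[which(v1 == v2 & v1 > 3)]
selected  # "KOM4S4zo" "PG5S5AW"
```

The result can then be used for plain indexing, e.g. df1[selected]. Names that do not match the pattern are returned unchanged by sub, become NA in as.numeric (with a warning), and are dropped by which.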
Given your example data, you could instead just look for the numbers 4 or 5.
df1 %>%
select(matches("4|5"))
#> PG4S4AW KOM4S4zo PG5S5AW KOM5S5zo
#> 1 0 4 0 5
#> 2 0 3 0 1
#> 3 1 4 0 2
#> 4 0 5 0 1
#> 5 0 1 3 4

Replacing values with 'NA' by ID in R

I have data that looks like this
ID v1 v2
1 1 0
2 0 1
3 1 0
3 0 1
4 0 1
I want to replace all values with 'NA' if the ID occurs more than once in the dataframe. The final product should look like this
ID v1 v2
1 1 0
2 0 1
3 NA NA
3 NA NA
4 0 1
I could do this by hand, but I want R to detect all the duplicate cases (in this case two times ID '3') and replace the values with 'NA'.
Thanks for your help!
You could use duplicated() from either end, and then replace.
idx <- duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE)
df[idx, -1] <- NA
which gives
ID v1 v2
1 1 1 0
2 2 0 1
3 3 NA NA
4 3 NA NA
5 4 0 1
This will also work if the duplicated IDs are not next to each other.
Data:
df <- structure(list(ID = c(1L, 2L, 3L, 3L, 4L), v1 = c(1L, 0L, 1L,
0L, 0L), v2 = c(0L, 1L, 0L, 1L, 1L)), .Names = c("ID", "v1",
"v2"), class = "data.frame", row.names = c(NA, -5L))
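To illustrate the point about non-adjacent duplicates, here is a small sketch with the rows deliberately shuffled (the shuffled data is made up for this example):

```r
# ID 3 appears in rows 1 and 4, which are not next to each other
df <- data.frame(ID = c(3, 1, 2, 3, 4),
                 v1 = c(1, 1, 0, 0, 0),
                 v2 = c(0, 0, 1, 1, 1))

# flag every row whose ID occurs more than once, scanning from both ends
idx <- duplicated(df$ID) | duplicated(df$ID, fromLast = TRUE)
df[idx, -1] <- NA
idx  # TRUE FALSE FALSE TRUE FALSE
```

Scanning forward alone would miss the first occurrence of each repeated ID, which is why the fromLast = TRUE pass is combined with it.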
One more option:
df1[df1$ID %in% df1$ID[duplicated(df1$ID)], -1] <- NA
#> df1
# ID v1 v2
#1 1 1 0
#2 2 0 1
#3 3 NA NA
#4 3 NA NA
#5 4 0 1
data
df1 <- structure(list(ID = c(1L, 2L, 3L, 3L, 4L), v1 = c(1L, 0L, 1L,
0L, 0L), v2 = c(0L, 1L, 0L, 1L, 1L)), .Names = c("ID", "v1",
"v2"), class = "data.frame", row.names = c(NA, -5L))
Here is a base R method
# get list of repeated IDs
repeats <- rle(df$ID)$values[rle(df$ID)$lengths > 1]
# set the corresponding variables to NA
df[, -1] <- sapply(df[, -1], function(i) {i[df$ID %in% repeats] <- NA; i})
In the first line, we use rle to extract the repeated IDs. In the second, we use sapply to loop through the non-ID variables and, for each one, replace the values in rows with a repeated ID with NA.
Note that this assumes that the data set is sorted by ID, because rle only detects runs of adjacent equal values. Sorting may be accomplished with the order function (df <- df[order(df$ID),]).
If the dataset is very large, you might break up the first line into two steps to avoid computing the rle twice:
dfRle <- rle(df$ID)
repeats <- dfRle$values[dfRle$lengths > 1]
data
df <- read.table(header=TRUE, text="ID v1 v2
1 1 0
2 0 1
3 1 0
3 0 1
4 0 1")
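The sorting requirement is easy to see on a small made-up vector of IDs; the variable names below are illustrative only.

```r
ids <- c(3L, 1L, 2L, 3L, 4L)  # ID 3 is repeated but not adjacent

# rle only sees runs of adjacent equal values, so nothing is flagged here
r_unsorted <- rle(ids)
repeats_unsorted <- r_unsorted$values[r_unsorted$lengths > 1]
length(repeats_unsorted)  # 0

# after sorting, the two 3s form a run of length 2
r_sorted <- rle(sort(ids))
repeats_sorted <- r_sorted$values[r_sorted$lengths > 1]
repeats_sorted  # 3
```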

R: Search multiple rows&columns to match a list of conditions then add a new column with dichotomous outcomes

Problem: Extraordinarily large dataset with dozens of columns. How to search a list of columns and all the rows within them, and if they match conditions, create a new column that adds a dichotomous variable to the row. Normally would use Excel, but size is too large.
Example
col1 col2 col3 col4
1 2 3 4
1 2 5 6
3 3 3 3
1 1 1 2
2 3 4 1
If any of these columns (col1-4) and any of the rows within match a list of numbers, say List: 1, 2, 3, then add a new column (col5) and add 1 if it matches, 0 if not. Repetition doesn't matter - the value returned is 1 if there is one or more occurrence of any of the list conditions.
Potential solution idea
For i in col1:col4, for j in row1:allrows, ifelse(row=list, col5=1, col5=0), next.
Thanks!
Maybe you need the following, which returns 1 only when a row contains every value in v1
df$col5 <- (apply(df, 1, function(x)
!any(!table(factor(x[x %in% v1], levels=v1)))))+0L
df
# col1 col2 col3 col4 col5
#1 1 2 3 4 1
#2 1 2 5 6 0
#3 3 3 3 3 0
#4 1 1 1 2 0
#5 2 3 4 1 1
data
df <- structure(list(col1 = c(1L, 1L, 3L, 1L, 2L), col2 = c(2L, 2L,
3L, 1L, 3L), col3 = c(3L, 5L, 3L, 1L, 4L), col4 = c(4L, 6L, 3L,
2L, 1L)), .Names = c("col1", "col2", "col3", "col4"), class =
"data.frame", row.names = c(NA, -5L))
v1 <- 1:3
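The question asks for 1 when a row contains at least one of the listed values, whereas the output above shows 1 only for rows that contain all of them. If the "at least one" reading is what is wanted, a sketch could look like this. The sixth row is added here purely so that one row has no match; the names hits and col5_any are illustrative.

```r
df <- data.frame(col1 = c(1, 1, 3, 1, 2, 7),
                 col2 = c(2, 2, 3, 1, 3, 8),
                 col3 = c(3, 5, 3, 1, 4, 9),
                 col4 = c(4, 6, 3, 2, 1, 7))
v1 <- 1:3

# logical matrix: TRUE where a cell holds one of the values in v1
hits <- sapply(df, function(column) column %in% v1)

# 1 if the row has at least one hit, 0 otherwise
col5_any <- as.integer(rowSums(hits) > 0)
col5_any  # 1 1 1 1 1 0
```

Working column by column with sapply and then using rowSums avoids calling a function once per row, which may matter on a very large data set.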

R: create vector from data frame when the needed columns are stored in this data frame

Let's suppose that I have a data frame like:
col1 col2 col3 what_col
1 1 2 5 1
2 4 1 2 2
3 3 1 8 2
4 1 5 3 1
5 4 4 1 3
...
I need to create the vector:
1 1 1 1 1 .....
(what_col stores which column is needed in each row.)
You can try
df[cbind(1:nrow(df), df$what_col)]
#[1] 1 1 1 1 1
data
df <- structure(list(col1 = c(1L, 4L, 3L, 1L, 4L), col2 = c(2L, 1L,
1L, 5L, 4L), col3 = c(5L, 2L, 8L, 3L, 1L), what_col = c(1L, 2L,
2L, 1L, 3L)), .Names = c("col1", "col2", "col3", "what_col"),
class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
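Here's another option (note that a logical matrix index returns the values in column-major order, so the result is in row order only when that happens to coincide, as it does here where every value is 1)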
df[col(df) == df$what_col]
## [1] 1 1 1 1 1
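Because every selected value in the example is 1, the two approaches look identical. A small made-up data frame with distinct values shows how they differ in ordering; all names and numbers below are hypothetical.

```r
df <- data.frame(col1 = c(10, 20, 30),
                 col2 = c(11, 21, 31),
                 what_col = c(2L, 1L, 2L))

# (row, column) index pairs: one value per row, in row order
by_row <- df[cbind(seq_len(nrow(df)), df$what_col)]
by_row  # 11 20 31

# logical mask: the same values, but read off column by column
by_col <- df[col(df) == df$what_col]
by_col  # 20 11 31
```

If the result has to line up with the rows of the data frame, the cbind form is the safer choice.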
