I have a matrix that needs to be filled with values. Every column of the first row gets the same value, and each subsequent row is generated by applying a function to the value in the row above it.
I can do this with a nested for loop. The outer loop goes over the columns and sets the first row of each column to value. The inner loop then fills the remaining rows of that column using fn, which takes the previous row's value as its input.
fn <- function(value){ value + 1 }
myMatrix <- matrix(NA,5,3)
value <- 100
for(col in 1:ncol(myMatrix)){
  myMatrix[1,col]<-value #First row value for all the columns should be the same
  for(row in 2:nrow(myMatrix)){
    #Rest of the row values generated using fn
    myMatrix[row,col] <- fn(myMatrix[row-1,col])
  }
}
myMatrix
I don't want to use a for loop and would specifically like to achieve this with one of R's *apply functions. I tried the following, but it's not working.
fn <- function(value){ value + 1 }
myMatrix2 <- matrix(NA,5,3)
value <- 100
sapply(1:ncol(myMatrix2), function(col){
  myMatrix2[1,col]<-value
  sapply(2:nrow(myMatrix2),function(row){
    fn(myMatrix2[row-1,col])
  })
})
EDIT:
I was able to do it using sapply and the <<- assignment operator to fill the matrix. But is there a cleaner or more efficient way to do it with the *apply family?
fn <- function(value){ value + 1 }
myMatrix2 <- matrix(NA,5,3)
value <- 100
myMatrix2[1,]<-value #first row of the matrix to have the same value
sapply(1:ncol(myMatrix2), function(col){
  sapply(2:nrow(myMatrix2),function(row){
    myMatrix2[row,col] <<- fn(myMatrix2[row-1,col])
  })
})
myMatrix2
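One loop-free option, offered as a sketch rather than the only way, is base R's Reduce() with accumulate = TRUE: starting from init, it applies fn repeatedly and keeps every intermediate value, which is exactly one column of the matrix. sapply() then builds one such column per starting value. The names n_row, n_col and starts are introduced for illustration, and the sketch assumes, as in the question, that fn only needs the previous row's value.

```r
fn <- function(value) { value + 1 }
n_row <- 5
n_col <- 3
value <- 100

# One starting value per column (all equal here, as in the question)
starts <- rep(value, n_col)

# For each start, Reduce() returns c(start, fn(start), fn(fn(start)), ...):
# init plus (n_row - 1) further applications of fn, i.e. n_row values in total.
# The second argument of the reducing function is the sequence element, unused here.
myMatrix3 <- sapply(starts, function(start) {
  Reduce(function(prev, ignored) fn(prev), seq_len(n_row - 1),
         accumulate = TRUE, init = start)
})

myMatrix3
```

Unlike the <<- version, nothing outside the anonymous functions is modified; the matrix is assembled from the values sapply() returns.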
I need to optimize a small piece of code, which can be simplified as follows.
Let's say I have two data frames and I want to obtain a "result" data frame that is a subset of data2 selected by some condition. For each selected row I need to add an identifier that corresponds to the row of the first data frame. This identifier is added to the resulting data frame as a column called "identity".
data=data.frame(a=sample(1:100, 100, replace=TRUE),b=sample(1:100, 100, replace=TRUE) )
data2=data.frame(a=sample(1:100, 100, replace=TRUE),b=sample(1:100, 100, replace=TRUE) )
result=NULL
for(i in 1:nrow(data)){ # I loop on each row of "data"
  # if the difference between the current row and the column "a"
  # of "data2" is bigger than zero we store the values of data2
  boolvect=data[i,"a"]-data2$a>0
  ares=data2[ boolvect,]
  if(nrow(ares)>0){
    # we add an identifier for such event, the identifier is the
    # row number of "data"
    ares$identity=i
    result=rbind(result,ares)
  }
}
I tried to use apply with margin 1. The results are the same but I don't know how to properly deal with the "identity" column.
all_df=apply(data, 1, function(x, data2){
  val=as.numeric(x["a"])
  boolvect=val-data2$a>0
  return(data2[boolvect,])
}, data2=data2)
result2=do.call(rbind, all_df)
Any help please?
To get the identity column, we need to iterate over the row indices of data rather than over the rows themselves.
You can do this using lapply or Map.
result1 <- do.call(rbind, lapply(seq_along(data$a), function(i) {
  boolvect= data$a[i] - data2$a > 0
  if(any(boolvect)) transform(data2[boolvect, ], identity = i)
}))
With Map:
result2 <- do.call(rbind, Map(function(x, y) {
  boolvect = x - data2$a > 0
  if(any(boolvect)) transform(data2[boolvect, ], identity = y)
}, data$a, 1:nrow(data)))
I would use lapply instead of apply and feed in the index of each row for lapply to iterate over. Iterating over the indices is the simplest way for an apply function to "know what row it's on".
all_df=lapply(1:nrow(data), function(x, data, data2){
  boolvect=data[x,"a"]-data2$a>0
  ares=data2[ boolvect,]
  if(nrow(ares)>0){
    ares$identity=x
  }
  return(ares)
}, data =data,data2=data2)
result2=dplyr::bind_rows(all_df)
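For comparison, here is a sketch of a loop-free base R variant that is not taken from the answers above: outer() builds the full logical matrix of comparisons data$a[i] > data2$a[j] in one call, and which(..., arr.ind = TRUE) returns the (row of data, row of data2) index pairs of every match. It assumes nrow(data) * nrow(data2) is small enough to hold in memory; the seed and the name result3 are illustrative.

```r
set.seed(1)
data  <- data.frame(a = sample(1:100, 100, replace = TRUE),
                    b = sample(1:100, 100, replace = TRUE))
data2 <- data.frame(a = sample(1:100, 100, replace = TRUE),
                    b = sample(1:100, 100, replace = TRUE))

# Logical matrix: cell [i, j] is TRUE when data$a[i] - data2$a[j] > 0
hits <- outer(data$a, data2$a, ">")

# Row/column indices of every TRUE cell: column 1 = row of data, column 2 = row of data2
idx <- which(hits, arr.ind = TRUE)

# Sort by row of data first, then by row of data2, to match the loop's output order
idx <- idx[order(idx[, 1], idx[, 2]), , drop = FALSE]

result3 <- data2[idx[, 2], ]
result3$identity <- idx[, 1]
rownames(result3) <- NULL
```

The rows come out grouped by identity and, within each group, in the original order of data2, as in the loop; only the row names differ, which is why they are reset.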
I'm working on a Kaggle Kernel using the FIFA 19 data (https://www.kaggle.com/karangadiya/fifa19) and trying to write a function that adds up the numbers in a column.
The column has values like 88+2 (class: character).
The desired result would be 90 (class: integer).
I tried to write a function to transform several such columns:
add_fun <- function(x){
  a <- strsplit(x, "\\+")
  for (i in 1:length(a)){
    a[[i]] <- as.numeric(a[[i]])
  }
  for (i in 1:length(a)){
    a[[i]] <- a[[i]][1] + a[[i]][2]
  }
  x <- as.numeric(unlist(a))
}
This works fine when I transform each column manually, but the function doesn't return the desired results. Can someone sort this out?
Read the CSV data into df,
then extract the four required columns using
dff <- df[, c("LS","ST", "RS","LW")]
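def_fun <- function(x){
  a <- strsplit(x, '\\+')
  # Sum the parts of every element, e.g. "88+2" -> 90
  # (the original loop over length(a) only ever visited the last element)
  vapply(a, function(parts) sum(as.numeric(parts)), numeric(1))
}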
Then apply the function to the required columns:
for (i in 1:ncol(dff)){
  dff[i] <- apply(dff[i], 1, FUN = def_fun)
}
You can then cbind this data frame with the original one and drop the original columns.
I hope it proves helpful.
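As a hedged alternative to the row-wise apply() loop, the split-and-sum step can be vectorized over a whole column and applied to every column at once with lapply(). The small data frame fifa_demo below is hypothetical and only stands in for the FIFA columns, and add_fun is redefined here; as.integer() is used so the result has the integer class the question asks for.

```r
# Hypothetical stand-in for the FIFA 19 columns (values like "88+2")
fifa_demo <- data.frame(LS = c("88+2", "75+3"),
                        ST = c("88+2", "70+1"),
                        stringsAsFactors = FALSE)

add_fun <- function(x) {
  parts <- strsplit(x, "\\+")
  # Sum the pieces of every string; integer(1) makes vapply return an integer vector
  vapply(parts, function(p) sum(as.integer(p)), integer(1))
}

# Assigning to fifa_demo[] replaces each column but keeps the data frame structure
fifa_demo[] <- lapply(fifa_demo, add_fun)
fifa_demo
```

With the real data, the same line would be applied to the selected columns, e.g. dff[] <- lapply(dff, add_fun).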
I have a large data frame with many negative values in different columns; each negative value should be replaced by its original value * 0.5.
I've tried many R functions, but I can't find a single one that works on the entire data frame.
I would like something like the following (non-working) code:
mydf[] <- replace(mydf[], mydf[] < 0, mydf[]*0.5)
You can simply do:
mydf[mydf<0] <- mydf[mydf<0] * 0.5
If some columns are non-numeric, you may want to apply this only to the numeric ones:
ind <- sapply(mydf, is.numeric)
mydf1 <- mydf[ind]
mydf1[mydf1<0] <- mydf1[mydf1<0] * 0.5
mydf[ind] <- mydf1
You could try using lapply() on the entire data frame, making the replacements on each column in succession.
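df[] <- lapply(df, function(x) {
  ifelse(x < 0, x * 0.5, x)
})
The lapply() (list apply) function is intended for lists, but a data frame is a special type of list, so it works here. Assigning to df[] rather than df keeps the result a data frame instead of turning it into a plain list. This assumes every column is numeric.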
In replace(), the values argument should have the same length as the number of TRUE values in the list (index) argument:
mydf <- replace(mydf, mydf < 0, mydf[mydf < 0] * 0.5)
Another option is set() from data.table, which is very efficient because it updates the columns by reference:
library(data.table)
for(j in seq_along(mydf)){
  i1 <- mydf[[j]] < 0
  set(mydf, i = which(i1), j = j, value = mydf[[j]][i1]*0.5)
}
Sample data:
set.seed(24)
mydf <- as.data.frame(matrix(rnorm(25), 5, 5))
I have a list of tables, list_table_Tanzania, which I want to convert into a list of matrices, list_matrix_Tanzania. I tried the following for loop, but I only get one output.
for (i in 1:length(list_table_Tanzania)) {
  list_matrix_Tanzania<-as.matrix(list_table_Tanzania[[i]], rownames.force = NA)
}
How can I get one matrix for each table in my list?
You are overwriting the value of list_matrix_Tanzania in each iteration.
Try lapply instead:
list_matrix_Tanzania <- lapply(list_table_Tanzania, as.matrix, rownames.force = NA)
Doing it with a for loop is also possible: first initialise your list with list_matrix_Tanzania <- vector("list", length(list_table_Tanzania)), then inside the for loop assign the result of the ith iteration to list_matrix_Tanzania[[i]].
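The pre-allocated for loop described above could look like the following sketch. list_table_Tanzania here is a hypothetical list of two small data frames, since the real tables are not shown.

```r
# Hypothetical stand-in for list_table_Tanzania
list_table_Tanzania <- list(
  data.frame(x = 1:2, y = 3:4),
  data.frame(x = 5:6, y = 7:8)
)

# Pre-allocate one slot per table
list_matrix_Tanzania <- vector("list", length(list_table_Tanzania))

for (i in seq_along(list_table_Tanzania)) {
  # Store each converted table in its own slot instead of overwriting one variable
  list_matrix_Tanzania[[i]] <- as.matrix(list_table_Tanzania[[i]], rownames.force = NA)
}
```

seq_along() is used instead of 1:length() so the loop body is skipped cleanly if the input list is empty.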
Edit: To remove some columns from the lapply result, you can modify it to something like:
list_matrix_Tanzania <- lapply(list_table_Tanzania, function(d) {
  result <- as.matrix(d, rownames.force = NA)
  v <- 1:2 # Replace this with the indices of the columns you wish to remove
  result[, -v, drop = FALSE] # drop = FALSE keeps a matrix even if one column remains
})
I have the following function, taken from the question "R: iterative outliers detection" (this is an updated version):
library(outliers)  # provides chisq.out.test()
dropout<-function(x) {
  outliers <- NULL
  res <- NULL
  if(length(x)<2) return (1)
  vals <- rep.int(1, length(x))
  r <- chisq.out.test(x)
  while (r$p.value<.05 & sum(vals==1)>2) {
    if (grepl("highest",r$alternative)) {
      d <- which.max(ifelse(vals==1,x, NA))
      res <- rbind(list(as.numeric(strsplit(r$alternative," ")[[1]][3]),as.numeric(r$p.value)),fill=TRUE)
    }
    else {
      d <- which.min(ifelse(vals==1, x, NA))
    }
    vals[d] <- r$p.value
    r <- chisq.out.test(x[vals==1])
  }
  return(res)
}
The problem is that in each round I get some missing rows when filling the data.frame.
I want to fill res, but in some iterations it contains missing values.
I have tried everything I could think of, e.g. rbindlist, rbind.fill and rbind (with fill=TRUE), but nothing works.
When I do something like:
res <- c(res,as.numeric(strsplit(r$alternative," ")[[1]][3]),as.numeric(r$p.value))
it works, but it creates two rows for each (V1, V2) group: one whose last column holds the value taken from r$alternative, and a second with the same first two columns but with the p-value in the last column instead.
This is how I'm calling the function on data similar to the data in the question mentioned above:
outliers <- d[, dropout(V3), list(V1, V2)]
and I always get this error: j doesn't evaluate to the same number of columns for each group