I am writing a piece of code that, for each app (variable e), looks up a two-word string (a bi-gram, variable y) in a column of strings in another data frame (test_4$names).
I am trying to use lists instead of rbind to collect each data frame (x) created in the inner loop.
for (e in b) {
  bi_grams_3 <- bi_final[which(bi_final$app == e), c(1, 3)]
  test_4 <- test_1[which(test_1$app == e), c(1, 3, 5)]
  for (c in 1:nrow(bi_grams_3)) {
    y <- as.character(bi_grams_3[c, 1])
    aa <- intersect(
      grep(word(y, 1), test_4$names),
      grep(word(y, 2), test_4$names)
    )
    x <- test_4[aa, ]
    x$root <- y
    dat_list[[c]] <- x
    #x_all <- rbind(x,x_all)
  }
  fin_list[[e]] <- dat_list
  n = n + 1
  message(paste(n, e))
}
However, it does not give me the same result as the rbind version (x_all). I get some extra values in fin_list for every app except the first.
Also, how can I later rbind the results to get a single data frame?
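One possible cause of the extra values is that dat_list is never reset, so entries left over from an earlier app with more bi-grams survive into the next app. Below is a minimal sketch in base R of resetting the inner list for each app and binding everything at the end. It uses made-up toy data and hypothetical names (apps, lookup, roots) in place of the original bi_final and test_1 objects, and a plain grep in place of word().

```r
# Toy stand-ins for the original data (hypothetical values)
apps <- c("app1", "app2")
lookup <- data.frame(
  app   = c("app1", "app1", "app2"),
  names = c("red car", "blue car", "red bus"),
  stringsAsFactors = FALSE
)
roots <- list(app1 = c("red", "blue"), app2 = "red")

fin_list <- list()
for (e in apps) {
  sub <- lookup[lookup$app == e, ]
  dat_list <- vector("list", length(roots[[e]]))  # reset for every app
  for (k in seq_along(roots[[e]])) {
    y <- roots[[e]][k]
    x <- sub[grep(y, sub$names, fixed = TRUE), ]
    x$root <- rep(y, nrow(x))  # rep() also works when x has zero rows
    dat_list[[k]] <- x
  }
  # bind the inner list once per app
  fin_list[[e]] <- do.call(rbind, dat_list)
}

# bind the per-app data frames into one data frame
x_all <- do.call(rbind, fin_list)
```

Binding once at the end with do.call(rbind, ...) avoids growing x_all inside the loop, which is usually the reason for switching to lists in the first place.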
I am trying to simulate changes to a data frame over a series of steps, each of which depends on the previous one. Here is a very simple example to illustrate my problem.
I create a dataframe with two columns:
a=runif(10)
b=runif(10)
data_1=data.frame(a,b)
data_1
a b
1 0.94922669 0.47418098
2 0.26702201 0.79179699
3 0.57398333 0.25158378
4 0.52724079 0.61531202
5 0.03999831 0.95233479
6 0.15171673 0.64564561
7 0.51353129 0.75676464
8 0.60312432 0.85318316
9 0.52900913 0.06297818
10 0.75459362 0.40209925
Then I would like to run n steps, where each step creates a new dataframe at i+1 that is a function (let's call it "whatever") of the dataframe at i: data_2 is a transformation of data_1, data_3 a transformation of data_2, etc.
iterations = function(nsteps) {
  lapply(1:nsteps, function(i) {
    data_i+1 = whatever(data_i)
  })
}
Whatever function I use, I get this error message:
Error in whatever(data_i) : object 'data_i' not found
Can someone help me figure out what I am missing?
See if you can get some inspiration from the following example.
First, a whatever function to be applied to the previous dataframe.
whatever <- function(DF) {
  DF[[2]] <- DF[[2]] * 2
  DF
}
Now the function you want. I have added an extra argument, the dataframe x.
The function starts by creating the list to be returned, data_list. Each member of data_list is the result of applying whatever to the previous member.
iterations <- function(nsteps, x) {
  data_list <- vector("list", length = nsteps)
  data_list[[1]] <- x
  for (i in seq_len(nsteps)[-1]) {
    data_list[[i]] <- whatever(data_list[[i - 1]])
  }
  names(data_list) <- sprintf("data_%d", seq_len(nsteps))
  data_list
}
And apply iterations to an example dataframe.
df1 <- data.frame(A = letters[1:10], X = 1:10)
iterations(10, df1)
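The same chaining can also be sketched with base R's Reduce and accumulate = TRUE, which keeps every intermediate result. The name iterations_reduce is hypothetical, and the sketch assumes nsteps is at least 2 and that the data frame has more than one column.

```r
# Same doubling step as in the example above
whatever <- function(DF) {
  DF[[2]] <- DF[[2]] * 2
  DF
}

# iterations_reduce is a hypothetical name; assumes nsteps >= 2
iterations_reduce <- function(nsteps, x) {
  # Reduce feeds each result back in as the next input;
  # accumulate = TRUE keeps every intermediate data frame
  out <- Reduce(function(prev, i) whatever(prev),
                seq_len(nsteps - 1), accumulate = TRUE, init = x)
  names(out) <- sprintf("data_%d", seq_along(out))
  out
}

df1 <- data.frame(A = letters[1:10], X = 1:10)
res <- iterations_reduce(3, df1)
res$data_3$X  # X after two applications of whatever
```

The explicit for loop is arguably easier to read; Reduce is just a more compact way to express "each step depends on the previous one".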
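You might be looking for a combination of assign, get and paste0. data_i is a literal name, not one built from the value of i, so the previous object has to be fetched by its constructed name as well:
assign(paste0("data_", i + 1), whatever(get(paste0("data_", i))))
Note that assign writes into the current environment, so inside the anonymous function passed to lapply the new object is local to that call.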
I'm trying to create a function that does a pairwise comparison between the values of one column and another, and creates a new vector depending on those values. I cannot work out how to make two of the arguments column names, so that they can be changed and the function reused on another pair of columns.
Specifically, there are four columns of coloured band labels for a parent bird (pbc1...pbc4) and another four for its chick (obc1...obc4). The band columns hold character values such as 'G', 'PG', 'B', etc.
This is the code for the first part of my function, which I will extend to include all pairwise comparisons once I get this running:
colourdistance1 <- function(df, refcoldistdf, pbc, obc){
  n <- length(pbc)
  coldist1 <- rep(NA,n)
  for(i in 1:n){
    if(pbc[i]==obc[i]){
      coldist1[i] <- 0
    } else if(pbc[i]=='M'|obc[i]=='M'){
      coldist1[i] <- NA
    } else if(pbc[i]=='G'& obc[i]=='PG'| obc[i]=='G'& pbc[i]=='PG'){
      coldist1[i] <- refcoldistdf[2,2]
    } else {
      coldist1[i] <- NA
    }
  }
}
p1o1 <- colourdistance1(bd_df, refcoldistdf,pbc = pbc1, obc = obc1)
This call just leaves p1o1 as NULL.
I have also tried:
colourdistance1 <- function(df, refcoldistdf, pbc, obc){
  n <- length(pbc)
  coldist1 <- rep(NA,n)
  for(i in 1:n){
    if(df$pbc[i]==df$obc[i]){
      coldist1[i] <- 0
    } else if(df$pbc[i]=='M'|df$obc[i]=='M'){
      coldist1[i] <- NA
    } else if(df$pbc[i]=='G'& df$obc[i]=='PG'| df$obc[i]=='G'& df$pbc[i]=='PG') {
      coldist1[i] <- refcoldistdf[2,2]
    } else {
      coldist1[i] <- NA
    }
  }
}
But that just gives this error:
Error in if (df$pbc[i] == df$obc[i]) { : argument is of length zero
I have tried all the code outside the function, inserting the column names and index number and df name and it all works. This makes me think I have an issue with the function arguments not connecting to the function code as I intended.
Any help will be appreciated!!
Reproducible test data:
pbc1 <- c('B','W','G','R')
obc1 <- c('Y','W','PG','FP')
pbc2 <- c('W','W','W','M')
obc2 <- c('M','W','R','R')
pbc3 <- c('W','K','FP','K')
obc3 <- c('G','PG','B','PB')
pbc4 <- c('K','K','B','M')
obc4 <- c('K','PG','W','M')
testbanddf <- cbind(pbc1,obc1,pbc2,obc2,pbc3,obc3,pbc4,obc4)
testrefcoldist <- diag(11)
So there are quite a few comments to make, but first, you might try this:
pbc1 <- c('B','W','G','R')
obc1 <- c('Y','W','PG','FP')
pbc2 <- c('W','W','W','M')
obc2 <- c('M','W','R','R')
pbc3 <- c('W','K','FP','K')
obc3 <- c('G','PG','B','PB')
pbc4 <- c('K','K','B','M')
obc4 <- c('K','PG','W','M')
testbanddf <- data.frame(pbc1,obc1,pbc2,obc2,pbc3,obc3,pbc4,obc4)
testrefcoldist <- diag(11)
colourdistance1 <- function(df, refcoldistdf, pbc, obc){
  n <- nrow(df)
  coldist1 <- rep(NA,n)
  pbc <- df[[pbc]]
  obc <- df[[obc]]
  for(i in 1:n){
    if(pbc[i]==obc[i]){
      coldist1[i] <- 0
    } else if(pbc[i]=='M'|obc[i]=='M'){
      coldist1[i] <- NA
    } else if(pbc[i]=='G'& obc[i]=='PG'| obc[i]=='G'& pbc[i]=='PG'){
      coldist1[i] <- refcoldistdf[2,2]
    } else {
      coldist1[i] <- NA
    }
  }
  coldist1
}
colourdistance1(testbanddf, testrefcoldist,pbc = "pbc1", obc = "obc1")
cbind() creates a matrix, not a data frame. You create data frames with the function data.frame().
The simplest way forward is to make the arguments pbc and obc be characters representing the column names.
Referring to data frame columns using $ is useful when working interactively, but isn't so useful (as you discovered) when you are writing functions and don't know the column names in advance. In that case, use [[, which can select columns by name or position.
Your function as written didn't explicitly return coldist1. Its last expression was the for loop, and a for loop evaluates to NULL, which is why p1o1 came back NULL.
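To make the difference between $ and [[ concrete, here is a small sketch using a cut-down, made-up version of the band data. It also shows where the "argument is of length zero" error can come from.

```r
# Cut-down toy version of the band data (values are made up)
df <- data.frame(pbc1 = c("B", "W"), pbc2 = c("W", "W"),
                 obc1 = c("Y", "W"), obc2 = c("M", "W"),
                 stringsAsFactors = FALSE)
pbc <- "pbc1"  # column name stored in a variable

# $ ignores the variable and looks for a column called "pbc";
# none matches exactly (and partial matching is ambiguous here), so it is NULL
by_dollar <- df$pbc

# [[ uses the value of the variable, so it returns the pbc1 column
by_bracket <- df[[pbc]]

is.null(by_dollar)
by_bracket

# Comparing NULL with anything gives a zero-length logical,
# which is what if() rejects with "argument is of length zero"
length(by_dollar[1] == df$obc1[1])
```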
I want to search through a vector for the sequence of strings "hello" "world". When I find this sequence, I want to copy it, including the 10 elements before and after, as a row in a data.frame to which I'll apply further analysis.
My problem: I get an error "new column would leave holes after existing columns". I'm new to coding, so I'm not sure how to manipulate data.frames. Maybe I need to create rows in the loop?
This is what I have:
df = data.frame()
i <- 1
for(n in 1:length(v))
{
  if(v[n] == 'hello' & v[n+1] == 'world')
  {
    df[i,n-11:n+11] <- v[n-10:n+11]
    i <- i+1
  }
}
Thanks!
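Maybe this helps:
# positions where 'hello' is immediately followed by 'world'
indx <- which(v1[-length(v1)] == 'hello' & v1[-1] == 'world')
# for each match, take the window from 10 before to 11 after, clipped to valid indices
lst <- Map(function(x, y) {
  s1 <- seq(x, y)
  v1[s1[s1 > 0 & s1 <= length(v1)]]
}, indx - 10, indx + 11)
# pad shorter windows with NA so every row has the same length
len <- max(sapply(lst, length))
d1 <- as.data.frame(do.call(rbind, lapply(lst, `length<-`, len)))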
data
set.seed(496)
v1 <- sample(c(letters[1:3], 'hello', 'world'), 100, replace=TRUE)
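As a side note on the loop in the question, part of the trouble may be operator precedence: in R the : operator binds more tightly than + and -, so n-10:n+11 is not the window it looks like. A small sketch, with n chosen arbitrarily:

```r
n <- 15

# Parsed as n - (10:n) + 11, so this is a short decreasing vector
wrong <- n-10:n+11

# Parentheses give the intended window from n - 10 to n + 11
right <- (n-10):(n+11)

wrong
right
length(right)  # 22 positions: 10 before, the match, and 11 after
```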
I have the following function taken from R: iterative outliers detection (this is an updated version):
dropout <- function(x) {
  outliers <- NULL
  res <- NULL
  if (length(x) < 2) return(1)
  vals <- rep.int(1, length(x))
  r <- chisq.out.test(x)
  while (r$p.value < .05 & sum(vals == 1) > 2) {
    if (grepl("highest", r$alternative)) {
      d <- which.max(ifelse(vals == 1, x, NA))
      res <- rbind(list(as.numeric(strsplit(r$alternative, " ")[[1]][3]), as.numeric(r$p.value)), fill = TRUE)
    } else {
      d <- which.min(ifelse(vals == 1, x, NA))
    }
    vals[d] <- r$p.value
    r <- chisq.out.test(x[vals == 1])
  }
  return(res)
}
The problem is that in each round it leaves me with some missing rows to fill in the data.frame.
I want to fill res, but in some iterations it contains missing values.
I have tried everything I could think of, e.g. rbindlist, rbind.fill and rbind (with fill=TRUE), but nothing works.
When I do something like:
res <- c(res,as.numeric(strsplit(r$alternative," ")[[1]][3]),as.numeric(r$p.value))
it works, but it creates 2 rows for each set of (V1,V2): one with r$alternative in the last column, and a second with the same first 2 columns but with the p-value in the last column instead.
This is how I'm calling the function, on data similar to that in the question mentioned above:
outliers <- d[, dropout(V3), list(V1, V2)]
and I always get this error: j doesn't evaluate to the same number of columns for each group
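One thing that may contribute to this error is that dropout can return differently shaped results for different groups: 1 for very short groups, NULL when nothing is flagged, and a matrix otherwise. Below is a minimal base R sketch of a function that always returns a data frame with the same two columns, collecting rows in a list and binding them once at the end. collect_flags is a hypothetical stand-in, and its z-score cut-off is only a placeholder for chisq.out.test, not a replacement for it.

```r
# collect_flags is a hypothetical stand-in for dropout(); the "test" here is a
# simple z-score cut-off, not chisq.out.test() from the outliers package
collect_flags <- function(x, cutoff = 2) {
  # empty result with the same columns as a non-empty one
  empty <- data.frame(value = numeric(0), score = numeric(0))
  if (length(x) < 2) return(empty)

  rows <- list()
  keep <- rep(TRUE, length(x))
  repeat {
    if (sum(keep) <= 2) break
    z <- abs(x[keep] - mean(x[keep])) / sd(x[keep])
    if (!is.finite(max(z)) || max(z) < cutoff) break
    d <- which(keep)[which.max(z)]  # position of the flagged value in x
    rows[[length(rows) + 1]] <- data.frame(value = x[d], score = max(z))
    keep[d] <- FALSE
  }
  # bind once at the end; fall back to the empty frame if nothing was flagged
  if (length(rows) == 0) empty else do.call(rbind, rows)
}

flagged <- collect_flags(c(1, 2, 3, 2, 1, 50, 2, 3))
short   <- collect_flags(5)
```

Because every call returns the same columns, even when there are zero rows, the per-group results can be stacked without any filling.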