I'm trying to create a function that will do a pairwise comparison between the values of one column and another, and create a new vector depending on those values. I cannot work out how to make two of the arguments column names, so that they can be changed and the function reused on another set of columns.
The specific situation is that there are four columns of coloured band labels for a parent bird (pbc1...pbc4) and another four for its chick (obc1...obc4). The band columns are character columns with values such as 'G', 'PG', 'B', etc.
This is the code for the first part of my function, which I will extend to include all pairwise comparisons once I get this running:
colourdistance1 <- function(df, refcoldistdf, pbc, obc){
n <- length(pbc)
coldist1 <- rep(NA,n)
for(i in 1:n){
if(pbc[i]==obc[i]){
coldist1[i] <- 0
} else if(pbc[i]=='M'|obc[i]=='M'){
coldist1[i] <- NA
} else if(pbc[i]=='G'& obc[i]=='PG'| obc[i]=='G'& pbc[i]=='PG'){
coldist1[i] <- refcoldistdf[2,2]
} else {
coldist1[i] <- NA
}
}
}
p1o1 <- colourdistance1(bd_df, refcoldistdf,pbc = pbc1, obc = obc1)
This call just returns NULL for the object p1o1.
I have also tried:
colourdistance1 <- function(df, refcoldistdf, pbc, obc){
n <- length(pbc)
coldist1 <- rep(NA,n)
for(i in 1:n){
if(df$pbc[i]==df$obc[i]){
coldist1[i] <- 0
} else if(df$pbc[i]=='M'|df$obc[i]=='M'){
coldist1[i] <- NA
} else if(df$pbc[i]=='G'& df$obc[i]=='PG'| df$obc[i]=='G'& df$pbc[i]=='PG') {
coldist1[i] <- refcoldistdf[2,2]
} else {
coldist1[i] <- NA
}
}
}
But that just gives this error:
Error in if (df$pbc[i] == df$obc[i]) { : argument is of length zero
I have tried all the code outside the function, inserting the column names and index number and df name and it all works. This makes me think I have an issue with the function arguments not connecting to the function code as I intended.
Any help will be appreciated!!
Reproducible test data:
pbc1 <- c('B','W','G','R')
obc1 <- c('Y','W','PG','FP')
pbc2 <- c('W','W','W','M')
obc2 <- c('M','W','R','R')
pbc3 <- c('W','K','FP','K')
obc3 <- c('G','PG','B','PB')
pbc4 <- c('K','K','B','M')
obc4 <- c('K','PG','W','M')
testbanddf <- cbind(pbc1,obc1,pbc2,obc2,pbc3,obc3,pbc4,obc4)
testrefcoldist <- diag(11)
So there are quite a few comments to make, but first, you might try this:
pbc1 <- c('B','W','G','R')
obc1 <- c('Y','W','PG','FP')
pbc2 <- c('W','W','W','M')
obc2 <- c('M','W','R','R')
pbc3 <- c('W','K','FP','K')
obc3 <- c('G','PG','B','PB')
pbc4 <- c('K','K','B','M')
obc4 <- c('K','PG','W','M')
testbanddf <- data.frame(pbc1,obc1,pbc2,obc2,pbc3,obc3,pbc4,obc4)
testrefcoldist <- diag(11)
colourdistance1 <- function(df, refcoldistdf, pbc, obc){
n <- nrow(df)
coldist1 <- rep(NA,n)
pbc <- df[[pbc]]
obc <- df[[obc]]
for(i in 1:n){
if(pbc[i]==obc[i]){
coldist1[i] <- 0
} else if(pbc[i]=='M'|obc[i]=='M'){
coldist1[i] <- NA
} else if(pbc[i]=='G'& obc[i]=='PG'| obc[i]=='G'& pbc[i]=='PG'){
coldist1[i] <- refcoldistdf[2,2]
} else {
coldist1[i] <- NA
}
}
coldist1
}
colourdistance1(testbanddf, testrefcoldist,pbc = "pbc1", obc = "obc1")
cbind() creates a matrix, not a data frame. To create a data frame, use data.frame().
The simplest way forward is to pass pbc and obc as character strings holding the column names.
Referring to data frame columns with $ is handy when working interactively, but it does not help (as you discovered) inside a function where the column names are not known in advance: df$pbc looks for a column literally named "pbc". In that case, use [[, which can select columns by name or position.
Your function as written didn't return coldist1. Its last expression was the for loop, which evaluates to NULL, so that is what the function returned.
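The last two points can be checked in isolation. The following is a minimal sketch, assuming a small stand-in data frame built from the first two test columns; the names `col` and `f` are made up for illustration:

```r
# Small illustrative data frame (values taken from the test data above)
df <- data.frame(pbc1 = c("B", "W", "G", "R"),
                 obc1 = c("Y", "W", "PG", "FP"))

col <- "pbc1"        # column name held in a variable

df$col               # NULL: `$` looks for a column literally named "col"
df[[col]]            # the pbc1 column: `[[` uses the value of `col`

# A function whose last expression is a for loop returns NULL,
# because `for` itself evaluates to NULL
f <- function() {
  for (i in 1:3) i
}
is.null(f())         # TRUE
```

This is why the answer's version looks the columns up with df[[pbc]] and ends with a bare coldist1 line.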
Related
I am working on a project in which I need to filter my list if it has certain values for each of my IDs but unfortunately, it isn't working.
So I have one list with 12 different matrixes with the same columns but diff
library(tidyverse)
trajectory_C <- list()
trajectory_D <- list()
file_list_C <- list.files(pattern=".trajectory_C.csv")
file_list_D <- list.files(pattern=".trajectory_D.csv")
for (i in 1:length(file_list_C)) {
trajectory_C[[i]] <- read.csv(file_list_C[i])
}
for (i in 1:length(file_list_D)) {
trajectory_D[[i]] <- read.csv(file_list_D[i])
}
So my two lists are trajectory_D and trajectory_C. I then created two other lists and I saved the unique values of a certain column called "ID" and added a validation column to it.
unique_ID_C <- list()
unique_ID_D <- list()
for (i in 1:12) {
unique_ID_C[[i]] <- unique(trajectory_C[[i]]["ID"])
unique_ID_D[[i]] <- unique(trajectory_D[[i]]["ID"])
}
for (i in 1:12) {
Turning <- matrix(data=0,nrow = length(unique_ID_C[[i]]), ncol = 1)
unique_ID_C[[i]] <- cbind(unique_ID_C[[i]],Turning)
names(unique_ID_C[[i]]) <- c("ID","Validation")
}
What I want to do now is check whether each of my unique IDs has certain values (28 and 29) in the variable "Segment", for all twelve elements of my list.
for (i in 1:12) {
for (ID in unique_ID_C[[1]]) {
c <- unique(trajectory_C[[i]][trajectory_C[[i]]["ID"] == ID,"Segment"])
unique_ID_C[[i]][unique_ID_C[[i]]["ID"] == ID,2] <- ifelse(any(28 == c) == TRUE & any(29 == c) == TRUE,1,0)
}
}
I am new to programming and this is the first time I am using lists, so this might be my problem.
I want to perform an operation on each data frame in a list. Specifically, I want to run the Kolmogorov–Smirnov (KS) test on one column of each data frame. I am using the code below, but it is not working:
PDF_mean <- matrix(nrow = length(siteNumber), ncol = 4)
PDF_mean <- data.frame(PDF_mean)
names(PDF_mean) <- c("station","normal","gamma","gev")
listDF <- mget(ls(pattern="DSF_moments_"))
length(listDF)
i <- 1
for (i in length(listDF)) {
PDF_mean$station[i] <- siteNumber[i]
PDF_mean$normal[i] <- ks.test(list[i]$mean,"pnorm")$p.value
PDF_mean$gev[i] <- ks.test(list[i]$mean,"pgev")$p.value
PDF_mean$gamma[i] <- ks.test(list[i]$mean,"gamma")$p.value
}
Any help?
The loop should not iterate over length(listDF). Use seq_along(listDF) or 1:length(listDF) instead (seq_along is the more appropriate choice). length(listDF) is a single value, so the loop body runs only once, for the last index, rather than once per element.
for(i in seq_along(listDF)) {
PDF_mean$station[i] <- listDF[[i]]$siteNumber
PDF_mean$normal[i] <- ks.test(listDF[[i]]$mean,"pnorm")$p.value
PDF_mean$gev[i] <- ks.test(listDF[[i]]$mean,"pgev")$p.value
PDF_mean$gamma[i] <- ks.test(listDF[[i]]$mean,"gamma")$p.value
}
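The difference between the two loop headers is easy to see on a toy list. This is a minimal sketch; the three-element `listDF` here is a stand-in for the real list of data frames, and `visited_len` and `visited_seq` are illustrative names:

```r
listDF <- list(a = 1, b = 2, c = 3)   # stand-in for the list of data frames

# Iterating over length(listDF) visits a single index: the last one
visited_len <- c()
for (i in length(listDF)) visited_len <- c(visited_len, i)

# Iterating over seq_along(listDF) visits every index
visited_seq <- c()
for (i in seq_along(listDF)) visited_seq <- c(visited_seq, i)

visited_len   # 3
visited_seq   # 1 2 3

# seq_along() is also safe for an empty list, where 1:length() is not
seq_along(list())   # integer(0), so a loop body never runs
1:length(list())    # 1 0, so a loop body would run twice
```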
My code works well when I type the name of the column directly. But when I loop over a vector of column names and order the data frame by indexing that vector, it does not work properly.
Code below works well:
indexROW <- round(nrow(Home_strategy_Yes) * 0.2)
Home_strategy_Yes_reordered <- Home_strategy_Yes[order(HT.av.points)]
Home_strategy_Yes_reordered$ID <- seq.int(nrow(Home_strategy_Yes_reordered))
value <- Home_strategy_Yes_reordered[indexROW,HT.av.points]
percentageTOfilter <- min(Home_strategy_Yes_reordered[HT.av.points == value,ID]) -1
valueTOfilter <- Home_strategy_Yes_reordered[percentageTOfilter,HT.av.points]
The problem appears when looping over a vector of column names:
columns_setA <- c("HT.av.points","HT_av.PointsTotal")
for (i in 1:length(columns_setA)){
indexROW <- round(nrow(Home_strategy_Yes) * 0.2)
Home_strategy_Yes_reordered <- Home_strategy_Yes[order(columns_setA[i])]
Home_strategy_Yes_reordered$ID <- seq.int(nrow(Home_strategy_Yes_reordered))
value <- Home_strategy_Yes_reordered[indexROW,columns_setA[i]]
percentageTOfilter <- min(Home_strategy_Yes_reordered[columns_setA[i] == value,ID]) -1
valueTOfilter <- Home_strategy_Yes_reordered[percentageTOfilter,columns_setA[i]]
}
The order function does not work inside the loop as it does outside the loop.
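columns_setA[i] is only a character string, so order() sorts that single string rather than the column it names. We may need either get(), or to convert the name to a symbol and evaluate it, inside the loop:
...
percentageTOfilter <- min(Home_strategy_Yes_reordered[get(columns_setA[i]) == value,ID]) -1
...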
After adding get(), the order function works as expected:
for (i in 1:length(columns_setA)){
indexROW <- round(nrow(Home_strategy_Yes) * 0.2)
Home_strategy_Yes_reordered <- Home_strategy_Yes[order(get(columns_setA[i]))]
Home_strategy_Yes_reordered$ID <- seq.int(nrow(Home_strategy_Yes_reordered))
value <- Home_strategy_Yes_reordered[indexROW,get(columns_setA[i])]
percentageTOfilter <- min(Home_strategy_Yes_reordered[get(columns_setA[i]) == value,ID]) -1
valueTOfilter <- Home_strategy_Yes_reordered[percentageTOfilter,get(columns_setA[i])]
}
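The reason get() is needed can be shown without the original data. The sketch below is an assumption-laden illustration: it uses a base R data frame with made-up values instead of the asker's table, and with() to stand in for evaluation inside the table's own scope:

```r
df <- data.frame(HT.av.points = c(3, 1, 2))   # made-up values
col <- "HT.av.points"                         # column name as a string

# order() on the string sorts a vector of length one
order(col)                    # 1

# Looking the column up by name gives the intended ordering
order(df[[col]])              # 2 3 1

# get() turns the string into the object of that name; inside with(),
# that object is the column
with(df, order(get(col)))     # 2 3 1
```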
I'm working on a Kaggle Kernel relating to FIFA 19 data (https://www.kaggle.com/karangadiya/fifa19) and trying to create a function which adds up the numbers in a column.
The column has values like 88+2 (class - character)
The desired result would be 90 (class - integer)
I tried to create a function in order to transform several such columns:
add_fun <- function(x){
a <- strsplit(x, "\\+")
for (i in 1:length(a)){
a[[i]] <- as.numeric(a[[i]])
}
for (i in 1:length(a)){
a[[i]] <- a[[i]][1] + a[[i]][2]
}
x <- as.numeric(unlist(a))
}
This works perfectly fine when I manually transform each column but the function won't return the desired results. Can someone sort this out?
Read the CSV data into df, then extract the four required columns using:
dff <- df[, c("LS","ST", "RS","LW")]
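def_fun <- function(x){
# split each string such as "88+2" on the plus sign
a <- strsplit(as.character(x), '\\+')
# add up the numeric parts of every element, not only the last one
vapply(a, function(parts) sum(as.numeric(parts)), numeric(1))
}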
Then apply the operations on the required columns
for (i in 1: ncol(dff)){
dff[i] <- apply(dff[i], 1, FUN = def_fun)
}
You can cbind this data frame with the original one and drop the original columns.
I hope it proves helpful.
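As an alternative sketch, the conversion can be applied to whole columns at once instead of row by row. The helper name `add_plus` and the sample values are made up for illustration; only the column names LS and ST come from the answer above:

```r
# Hypothetical helper: sum the parts of strings such as "88+2"
add_plus <- function(x) {
  parts <- strsplit(as.character(x), "+", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p)), numeric(1))
}

# Made-up sample in the same shape as the extracted columns
dff <- data.frame(LS = c("88+2", "70+3"),
                  ST = c("90+1", "65+0"),
                  stringsAsFactors = FALSE)

# lapply() runs the helper on each column; dff[] keeps the data frame shape
dff[] <- lapply(dff, add_plus)
dff
```

The result columns are numeric; wrap the helper's return value in as.integer() if integers are required.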
Using the following code, I can print the values iterating each for loop.
for(i in 5:12)
{
for(j in 5:12)
{
for(k in 5:12)
{
for(l in 5:12)
{
cat(i,j,k,l,'\n')
}
}
}
}
Now I want to store the output in a data frame df with four numeric columns (a, b, c, d). The only approach I know is the following code, which has just a single 'for' loop in it.
f3 <- function(n){
df <- data.frame(x = numeric(n), y = numeric(n))
for(i in 1:n){
df$x[i] <- i
df$y[i] <- i
}
df
}
How do I put data into a data frame while using nested for loops? Thank you.
You should try expand.grid:
a <- 5:12
df <- expand.grid(a,a,a,a)
names(df) <- c("a","b","c","d")
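One detail to be aware of: expand.grid varies its first argument fastest, whereas the nested loops vary the innermost variable (l, which becomes column d) fastest. If the row order should match what the loops print, a sketch along these lines may help; naming the arguments in reverse and then reordering the columns is just one way to do it:

```r
a <- 5:12

# Put d first so it varies fastest, like the innermost loop variable,
# then reorder the columns to a, b, c, d
df <- expand.grid(d = a, c = a, b = a, a = a)[, c("a", "b", "c", "d")]

nrow(df)      # 4096 rows, i.e. 8^4 combinations
head(df, 3)   # rows 5 5 5 5, then 5 5 5 6, then 5 5 5 7
```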