I have the following code:
df<- iris
library(svDialogs)
columnFunction <- function (x) {
  column.D <- dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res
  if (!length((column.D))) {
    cat("No column selected\n")
  } else {
    cat("The following columns are chosen:\n")
    print(column.D)
    for (z in column.D) {
      x[[z]] <- NULL #with this part I wanted to delete the above selected columns
    }
  }
}
columnFunction(df)
So how can I address data.frame columns "dynamically", so that x[[z]] <- NULL translates to any of the following:
df$Species <- NULL
df[["Species"]] <- NULL
df[,"Species"] <- NULL
and do that for every selected column in whichever data.frame is passed to the function?
Does anyone know how to achieve something like that? I tried several things, such as paste, sprintf, and deparse, but I didn't get it working. I also tried to address the data.frame as a global variable by using <<-, but that didn't help either (it's the first time I have even heard of it). It looks like I am missing the right way to pass x and z through to the assignment.
If you want to create a function columnFunction that removes columns from a passed data frame df, all you need to do is pass the data frame to the function, return the modified version of df, and replace df with the result:
library(svDialogs)
columnFunction <- function (x) {
  column.D <- dlgList(names(x), multiple = TRUE, title = "Spalten auswaehlen")$res
  if (!length(column.D)) {
    cat("No column selected\n")
  } else {
    cat("The following columns are chosen:\n")
    print(column.D)
    # drop = FALSE keeps x a data frame even if only one column remains
    x <- x[, !names(x) %in% column.D, drop = FALSE]
  }
  return(x)
}
df <- columnFunction(df)
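For testing without the dialog, the same "pass in, modify, return, reassign" pattern can be sketched with the column names given as an argument. The helper name drop_columns_by_name and its cols parameter are made up for this illustration and are not part of the original code:

```r
# Hypothetical helper: same idea as columnFunction, but the columns to drop
# are passed in as an argument instead of being picked in a dialog.
drop_columns_by_name <- function(x, cols) {
  # drop = FALSE keeps the result a data frame even if one column remains
  x[, !names(x) %in% cols, drop = FALSE]
}

df <- iris
# The function works on a copy, so the result has to be assigned back to df.
df <- drop_columns_by_name(df, c("Species", "Petal.Width"))
names(df)
```

Because R passes arguments by value, the reassignment in the last step is what actually changes df.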
I've written the following R code, which prints the row number and column name of each missing value. Is there a way to turn it into a function? This is probably very easy, but I am very new to this. If I were to create a function from the code below, where would I put the "function_name <-"?
for (i in 1:nrow(airbnb)){
  rownum <- i
  #print(rownum)
  for (j in 1:ncol(airbnb)){
    colname <- names(airbnb[,j])
    #airbnb[i,j]
    if(is.na(airbnb[i,j])){
      print(paste("Row Number:",i))
      print(paste("Column Name:",colname))
    }
  }
}
I think this is what you're looking for:
You can name your function whatever you want, here it is called missing_func
The argument name x is arbitrary; you can rename every occurrence of x to something more descriptive, such as df or dataframe:
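missing_func <- function(x){
  # seq_len() avoids looping over c(1, 0) when x has no rows or columns
  for (i in seq_len(nrow(x))){
    for (j in seq_len(ncol(x))){
      # names(x)[j] works for data frames and tibbles alike;
      # names(x[,j]) is NULL for a plain data frame
      colname <- names(x)[j]
      if(is.na(x[i,j])){
        print(paste("Row Number:",i))
        print(paste("Column Name:",colname))
      }
    }
  }
}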
Now to call the function above, you just need to supply a value for x (or whatever you choose)
missing_func(airbnb)
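As an alternative to the nested loops, the positions of the missing values can be collected in one step with which(..., arr.ind = TRUE). This is only a sketch: the function name missing_positions and the toy data frame are invented for the example, and the result is returned as a data frame rather than printed.

```r
# Hypothetical helper; returns the positions instead of printing them.
missing_positions <- function(x) {
  # is.na() on a data frame gives a logical matrix; arr.ind = TRUE turns the
  # TRUE cells into a two-column matrix with columns "row" and "col".
  idx <- which(is.na(x), arr.ind = TRUE)
  data.frame(row = unname(idx[, "row"]),
             column = names(x)[idx[, "col"]])
}

# Toy data (an assumption, standing in for airbnb)
toy <- data.frame(a = c(1, NA, 3), b = c("u", "v", NA))
res <- missing_positions(toy)
res
```

Returning a data frame makes the result reusable, for example to count missing values per column with table(res$column).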
So, I have a function:
complete <- function(directory,id = 1:332 ) {
directory <- list.files(path="......a")
g <- list()
for(i in 1:length(directory)) {
g[[i]] <- read.csv(directory[i],header=TRUE)
}
rbg <- do.call(rbind,g)
rbgr <- na.omit(rbg) #reads files and omits NA's
complete_subset <- subset(rbgr,rbgr$ID %in% id,select = ID)
table.rbgr <- sapply(complete_subset,table)
table.rbd <- data.frame(table.rbgr)
id.table <- c(id)
findla.tb <- cbind (id.table,table.rbd)
names(findla.tb) <- c("id","nob")
print(findla.tb) #creates table with number of observations
}
Basically, when you call it with a specific numeric id (say 4),
you are supposed to get this output:
id nobs
15 328
So, I just need the nobs data to be fed into another function, which measures the correlation between two columns if the nobs value is greater than another, arbitrarily chosen threshold (t). Since nobs is determined by the value of id, I am uncertain how to write a function that takes the output of the other function into account.
I have tried something like this:
corr <- function (directory, t) {
directory <- list.files(path=".......")
g <- list()
for(i in 1:length(directory)) {
g[[i]] <- read.csv(directory[i],header=TRUE)
}
rbg <- do.call(rbind,g)
g.all <- na.omit(rbg) #reads files and removes observations
source(".....complete.R") #sourcing the complete function above
complete("spec",id)
g.allse <- subset(g.all,g.all$ID %in% id,scol )
g.allnit <- subset(g.all,g.all$ID %in% id,nit )
for(g.all$ID %in% id) {
if(id > t) {
cor(g.allse,g.allnit) #calcualte correlation of these two columns if they have similar id
}
}
#basically for each id that matches the ID in g.all function, if the id > t variable, calculate the correlation between columns
}
complete("spec", 3)
cr <- corr("spec", 150)
head(cr)
I have also tried to make the complete function return a data.frame, but it does not work and gives me the following error:
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows
So, I am not sure how to proceed.
First off, a reproducible example always helps in getting your question answered, along with a clear explanation of what your functions do/are supposed to do. We cannot run your example code.
Next, you seem to have an error in your corr function. You make multiple references to id but never actually populate this variable in your example code. So we'll just have to guess at what you need help with.
I think what you are trying to do is:
given an id, call complete with that id
use the nobs from that in your code.
In this case, you need to make sure to store the output of your call to complete, e.g.
comp <- complete('spec', id)
You can access the id value via comp$id and the nobs value via comp$nobs (note that complete as posted names the column "nob", so either rename it there or use comp$nob), so you could do e.g.
if (comp$nobs > t) {
# do stuff e.g.
cor(g.allse, g.allnit)
}
Make sure you store the output of cor somewhere if you wish to actually get it back later.
You will have to fix the problem of id not being defined yourself, because it is unclear what you want that to be.
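To make the idea concrete, here is a minimal sketch that stores the output of one function and uses it in another. Everything in it is an assumption made for illustration: the toy data frame stands in for the combined CSV files, and the names monitors, sulfate, nitrate, count_complete, and corr_above do not come from the original code.

```r
# Toy stand-in for the combined monitor data; column names are assumptions.
monitors <- data.frame(
  ID      = c(1, 1, 1, 2, 2, 2, 2),
  sulfate = c(1.0, 2.0, NA, 1.0, 2.0, 3.0, 5.0),
  nitrate = c(2.0, 4.1, 6.0, 3.0, 2.0, 5.0, 4.0)
)

# Returns (not just prints) a data frame with one row per requested id.
count_complete <- function(data, id) {
  ok <- na.omit(data)
  data.frame(id = id,
             nobs = vapply(id, function(k) sum(ok$ID == k), integer(1)))
}

# Computes a correlation only for ids whose nobs exceeds the threshold.
corr_above <- function(data, threshold) {
  counts <- count_complete(data, unique(data$ID))  # store the output
  keep <- counts$id[counts$nobs > threshold]       # then use it
  ok <- na.omit(data)
  vapply(keep, function(k) {
    rows <- ok[ok$ID == k, ]
    cor(rows$sulfate, rows$nitrate)
  }, numeric(1))
}

counts <- count_complete(monitors, 1:2)
cr <- corr_above(monitors, 2)
```

The key point is that count_complete returns its table, so corr_above can assign it to a variable and filter on the nobs column.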
EDIT: I solved this one on my own. It had nothing to do with the function object assignment; the problem was that I was assigning the results with single brackets "[]" (as to a vector) rather than with double brackets "[[]]" (as to a list element).
here's more reading on the subject: The difference between [] and [[]] notations for accessing the elements of a list or dataframe
I'm trying to filter event data. Depending on what I'm looking at I've got to do the filtering different ways. I've got two functions that I use for filtering (I use them throughout my project, in addition to this instance):
drop_columns <- function(x, ...) {
selectors <- list(...)
return(x[ , -which(names(x) %in% selectors)])
}
filter_by_val <- function(x, col, ...) {
return(x[ which(x[, col] %in% ...), ])
}
Here's the function that chooses which function does the filtering, and then executes it. Note that I'm assigning the function to an object called "filter_method":
filter_playtime_data <- function (key_list, data) {
  filter_method <- NULL
  out_list <- list()
  if(key_list$kind == "games") {
    filter_method <- function(key_list) {
      drop_columns(filter_by_val(data, "GameTitle", key_list), "X")
    }
  } else if (key_list$kind == "skills") {
    filter_method <- function(key_list) {
      filter_by_val(data, "Skill", key_list)
    }
  }
  # Separate data with keys
  out_list["ELA"] <- filter_method(key_list[["ELA"]])
  out_list["MATH"] <- filter_method(key_list[["MATH"]])
  out_list["SCI"] <- filter_method(key_list[["SCI"]])
  return (out_list)
}
I'm trying to filter data based on "skills" (ie. using filter_by_val) and it's not working as expected. I'm feeding in a data.frame and I'm expecting a data.frame to come out, but instead I'm getting a list of indexes, as if the function is only returning this part of my function: -which(names(x) %in% selectors)
When I run this in the debug browser, i.e. filter_method(key_list[["ELA"]]), it works as expected and I get the data frame. But the value held in my output list, out_list[["ELA"]], is the list of indexes. Any idea what's happening?
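The difference between the two bracket forms described in the EDIT can be reproduced on a small scale. The sketch below uses an invented data frame (small_df) and element names; it is meant only to show the behaviour, not the original data:

```r
out_list <- list()
small_df <- data.frame(a = 1:3, b = letters[1:3])

# Single brackets: the data frame is treated as a list of columns, and only
# its first column is stored (R also emits a length-mismatch warning).
out_list["single"] <- small_df

# Double brackets: the whole data frame is stored as one list element.
out_list[["double"]] <- small_df

class(out_list[["single"]])  # "integer"
class(out_list[["double"]])  # "data.frame"
```

If the first column of the real data is an index column such as X, this would be consistent with seeing "a list of indexes" in out_list.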
I found the following piece of code here at stackoverflow:
library(svDialogs)
columnFunction <- function (x) {
column.D <- dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res
if (!length((column.D))) {
cat("No column selected\n")
} else {
cat("The following columns are choosen:\n")
print(column.D)
x <- x[,!names(x) %in% column.D]
}
return(x)
}
df <- columnFunction(df)
So I wanted to use it for my own purposes, but it did not work out as planned.
What I am trying to achieve is to use it in a for loop or with lapply on multiple data.frames. Among other things, I tried:
d.frame1 <- iris
d.frame2 <- cars
l.frames <- c("d.frame1","d.frame2")
for (b in l.frames){
columnFunction(b)
}
but it yields the following error message:
Error in dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res :
$ operator is invalid for atomic vectors
What I additionally need is to loop through that function so that I can iterate over different data.frames.
Last but not least, I would need something like:
for (xyz in l.frames){
xyz <- columnFunction(xyz)
}
to automate the saving step.
Does anyone have an idea how I could loop through that function, or how I could change the function so that it performs all those steps and can be used in a loop?
I'm quite new to R, so perhaps I'm missing something obvious.
lapply was designed for this task:
l.frames <- list(d.frame1, d.frame2)
l.frames <- lapply(l.frames, columnFunction)
If you insist on using a for loop:
for (i in seq_along(l.frames)) l.frames[[i]] <- columnFunction(l.frames[[i]])
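A runnable sketch of the lapply approach is shown below. Since columnFunction opens a dialog, it is replaced here by a stand-in, drop_first_column, which is an assumption made purely so the example runs non-interactively. The list is also given names, which lapply preserves:

```r
# Stand-in for columnFunction so the example runs without a dialog:
# it drops the first column instead of asking the user (an assumption).
drop_first_column <- function(x) x[, -1, drop = FALSE]

l.frames <- list(d.frame1 = iris, d.frame2 = cars)
l.frames <- lapply(l.frames, drop_first_column)

names(l.frames)           # list names are preserved by lapply
names(l.frames$d.frame2)  # "dist"

# If the data frames are only known by name, mget() collects the objects
# themselves into a named list (a character vector of names is not enough).
d.frame1 <- iris
d.frame2 <- cars
by.name <- mget(c("d.frame1", "d.frame2"))
```

The original error came from passing the string "d.frame1" to the function instead of the data frame it names; a list of the objects themselves avoids that.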
I have a list called "scenbase" that contains 40 data frames, which are each 326 rows by 68 columns. I would like to use lapply() to subset the data frames so they only retain rows 33-152. I've written a simple function called trim() (below), and am attempting to apply it to the list of data frames but am getting an error message. The function and my attempt at using it with lapply is below:
trim <- function(i)
{ (i <- i[33:152,]) }
lapply(scenbase, trim)
Error in i[33:152, ] : incorrect number of dimensions
When I try to do the same thing to one of the individual data frames (soil11base.txt) that are included in the list (below), it works as expected:
soil11base.txt <- soil11base.txt[33:152,]
Any idea what I need to do to get the dimensions correct?
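You have two options. You can either
(a) assign the result to a new list: newList = lapply(scenbase, function(x) { x[33:152,,drop=F]} )
(b) use the <<- operator, which assigns your trimmed data in place: lapply(1:length(scenbase), function(x) { scenbase[[x]] <<- scenbase[[x]][33:152,,drop=F]} ).
Inside trim, i is a local copy, so assigning to it never changes scenbase; lapply returns a new list that you have to store. You can either create a new trimmed list or work around that with the <<- operator, which assigns to the first matching variable it finds in successive parent environments. Note that "incorrect number of dimensions" is the error R raises when a two-index subset such as i[33:152, ] is applied to something that does not have two dimensions, so it is worth checking that every element of scenbase really is a data frame.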
Here is some code that reproduces solution (a):
listOfDfs = list()
for(i in 1:10) { listOfDfs[[i]] = data.frame("x"=sample(letters,200,replace=T),"y"=sample(letters,200,replace=T)) }
choppedList = lapply(listOfDfs, function(x) { x[33:152,,drop=F]} )
Here is some code that reproduces solution (b):
listOfDfs = list()
for(i in 1:10) { listOfDfs[[i]] = data.frame("x"=sample(letters,200,replace=T),"y"=sample(letters,200,replace=T)) }
lapply(1:length(listOfDfs), function(x) { listOfDfs[[x]] <<- listOfDfs[[x]][33:152,,drop=F]} )
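A compact variant of solution (a) with an explicit check on the list elements might look like the sketch below. The guard reflects an assumption about where the "incorrect number of dimensions" error comes from; the data is randomly generated for the example:

```r
set.seed(1)
# Three toy data frames with 200 rows each (made-up data)
listOfDfs <- lapply(1:3, function(k) data.frame(x = rnorm(200), y = rnorm(200)))

# Guard: fail early if any element is not a data frame
stopifnot(all(vapply(listOfDfs, is.data.frame, logical(1))))

trim <- function(d) d[33:152, , drop = FALSE]
trimmed <- lapply(listOfDfs, trim)  # lapply returns a new list; store it

vapply(trimmed, nrow, integer(1))  # 120 120 120
```

Running str(scenbase, max.level = 1) on the real list is a quick way to see whether its elements are data frames or something else, such as file names.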