I wrote a simple function that converts an ID number to a name:
change_name = function(x) {
valid_user[match(x, valid_user$id),'name']
}
and applied this function to a data.frame.
The data.frame is named 'ga.screen', and the column is named 'dimension1'.
ga.screen[, 'dimension1'] =sapply(ga.screen[, 'dimension1'], change_name)
It works well.
Next, I want to wrap this code in a function so that it can be applied to other data frames and columns.
readable_user_id = function(data, col) {
data[, col] = sapply(data[, col], change_name)
}
readable_user_id(ga.screen, 'dimension1')
This is exactly the same code, but the second version doesn't work!
Why does this happen? Is it a problem with sapply not working inside a function, or is it that a data.frame can't be passed as a parameter?
Your function should return the modified data. Try:
readable_user_id = function(data, col) {
data[, col] = sapply(data[, col], change_name)
data
}
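To illustrate the point above, here is a minimal sketch with made-up data (the contents of valid_user and ga.screen are assumptions, since the question does not show them). The function only changes its own local copy of data, so the caller has to capture the returned value. The sketch also drops sapply, assuming change_name is applied to a whole column, because match is already vectorized.

```r
# Hypothetical lookup table and data, invented for this example
valid_user <- data.frame(id = c(1, 2, 3),
                         name = c("ann", "bob", "cy"),
                         stringsAsFactors = FALSE)
ga.screen <- data.frame(dimension1 = c(2, 3, 1))

change_name <- function(x) {
  valid_user[match(x, valid_user$id), "name"]
}

readable_user_id <- function(data, col) {
  data[, col] <- change_name(data[, col])  # modifies the local copy only
  data                                     # return the modified copy
}

# The caller must assign the result back
ga.screen <- readable_user_id(ga.screen, "dimension1")
```

Without the final assignment, ga.screen would keep its original numeric IDs.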
Related
I have this function in the dataset ecr, that takes a column of ecr as a string and changes each value appropriately:
ctrc <- function(x) {
(ecr[[x]]-mean(ecr[[x]]))/sd(ecr[[x]])
}
It then prints the values to the console. I would like to assign the results of this function to a column in ecr. However, nothing that I have tried has resulted in this happening. Some examples:
ctrc <- function(x) {
ecr$var2 <- (ecr[[x]]-mean(ecr[[x]]))/sd(ecr[[x]])
}
mctrc <- function(y) {
mutate(ecr, var2=ctrc("y"))
}
How do I get this function to work, and why don't these versions work?
A function returns either the value passed to return() or the value of the last evaluated expression. In this case, that is (ecr[[x]] - mean(ecr[[x]])) / sd(ecr[[x]]). So the variable you want to create can be assigned to a new column as below:
#function defn::
ctrc <- function(x) {
(ecr[[x]]-mean(ecr[[x]]))/sd(ecr[[x]])
}
#assign in var
ecr$newvar=ctrc(oldVarName)
where newvar is the name of the new variable you are trying to create, and
oldVarName holds the name (as a string) of the variable to which you want to apply this function.
As a best practice, include an explicit return statement in the function to make the code easier to read:
ctrc <- function(x) {
y=(ecr[[x]]-mean(ecr[[x]]))/sd(ecr[[x]])
return(y)
}
Unless you have a very strong reason, don't pass column name as input to the function, pass the values instead.
ctrc <- function(x) (x-mean(x))/sd(x)
you can then call it as :
library(dplyr)
ecr %>% mutate(var2 = ctrc(y))
OR for multiple columns
ecr %>% mutate(across(c(y, z), ctrc))
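For comparison, here is a base R sketch of the same idea that needs no packages. The contents of ecr are made up for illustration. It also hints at why the versions in the question failed: an assignment to ecr$var2 inside a function changes only a local copy, and ctrc("y") passes the literal string "y" rather than the column.

```r
# Hypothetical data standing in for the real ecr
ecr <- data.frame(y = c(2, 4, 6, 8))

# Standardize a vector of values: subtract the mean, divide by the sd
ctrc <- function(x) (x - mean(x)) / sd(x)

# Assign the result to a new column outside the function
ecr$var2 <- ctrc(ecr$y)
```

After this, ecr$var2 has mean 0 and standard deviation 1.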
EDIT: I solved this one on my own. It had nothing to do with assigning the function to an object; the problem was that I was assigning the results with single brackets "[]" rather than with double brackets "[[]]".
Here's more reading on the subject: The difference between [] and [[]] notations for accessing the elements of a list or dataframe
I'm trying to filter event data. Depending on what I'm looking at I've got to do the filtering different ways. I've got two functions that I use for filtering (I use them throughout my project, in addition to this instance):
drop_columns <- function(x, ...) {
selectors <- list(...)
return(x[ , -which(names(x) %in% selectors)])
}
filter_by_val <- function(x, col, ...) {
return(x[ which(x[, col] %in% ...), ])
}
Here's the function that chooses which function does the filtering, and then executes it. Note that I'm assigning the function to an object called "filter_method":
filter_playtime_data <- function (key_list, data) {
filter_method <- NULL
out_list <- list()
if(key_list$kind == "games") {
filter_method <- function(key_list) {
drop_columns(filter_by_val(data, "GameTitle", key_list), "X")
}
} else if (key_list$kind == "skills") {
filter_method <- function(key_list) {
filter_by_val(data, "Skill", key_list)
}
}
# Separate data with keys
out_list["ELA"] <- filter_method(key_list[["ELA"]])
out_list["MATH"] <- filter_method(key_list[["MATH"]])
out_list["SCI"] <- filter_method(key_list[["SCI"]])
return (out_list)
}
I'm trying to filter data based on "skills" (i.e. using filter_by_val) and it's not working as expected. I'm feeding in a data.frame and I'm expecting a data.frame to come out, but instead I'm getting a list of indexes, as if the function is only returning this part of my function: -which(names(x) %in% selectors)
When I run this in the debug browser (i.e. filter_method(key_list[["ELA"]])), it works as expected and I get the data frame. But the value held in my output list, out_list[["ELA"]], is the list of indexes. Any idea what's happening?
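The behaviour described in the EDIT can be reproduced with a small sketch. The data frame here is made up. A data frame is itself a list of columns, so assigning it with single brackets tries to spread those columns over the selected slots, and only part of it ends up stored.

```r
df <- data.frame(Skill = c("add", "read"), Score = c(1, 2))  # made-up data

out_single <- list()
out_double <- list()

# Single brackets: the columns of df are matched against one list slot,
# so the whole data frame is not stored (R warns about the length mismatch).
suppressWarnings(out_single["ELA"] <- df)

# Double brackets: the whole data frame is stored as one list element.
out_double[["ELA"]] <- df

is.data.frame(out_single[["ELA"]])  # FALSE
is.data.frame(out_double[["ELA"]])  # TRUE
```

In filter_playtime_data, this would mean writing out_list[["ELA"]] <- filter_method(key_list[["ELA"]]), and likewise for the other keys.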
I have the following code:
df<- iris
library(svDialogs)
columnFunction <- function (x) {
column.D <- dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res
if (!length((column.D))) {
cat("No column selected\n")
} else {
cat("The following columns are choosen:\n")
print(column.D)
for (z in column.D) {
x[[z]] <- NULL #with this part I wanted to delete the above selected columns
}
}
}
columnFunction(df)
So how is it possible to address data.frame columns "dynamically" so: x[[z]] <- NULL should translate to:
df$Species <- NULL
df[["Species"]] <- NULL
df[,"Species"] <- NULL
and that for every selected column in every data.frame chosen for the function.
Does anyone know how to achieve something like that? I tried several things, such as paste, sprintf, and deparse, but I didn't get it working. I also tried to address the data.frame as a global variable by using <<- but that didn't help either. (It's the first time I've even heard of it.) It looks like I'm missing the right method for passing x and z to the variable assignment.
If you want to create a function columnFunction that removes columns from a passed data frame df, all you need to do is pass the data frame to the function, return the modified version of df, and replace df with the result:
library(svDialogs)
columnFunction <- function (x) {
column.D <- dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res
if (!length((column.D))) {
cat("No column selected\n")
} else {
cat("The following columns are choosen:\n")
print(column.D)
x <- x[, !names(x) %in% column.D, drop = FALSE]  # drop = FALSE keeps a data frame even if one column remains
}
return(x)
}
df <- columnFunction(df)
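Because dlgList opens an interactive dialog, the same pattern is easier to try out with a non-interactive stand-in. The helper drop_columns_by_name below is hypothetical and not part of the answer. It takes the column names as an argument instead of asking for them, but it follows the same rule: return the modified data frame and assign it back.

```r
# Hypothetical non-interactive variant: the columns to drop are passed in
drop_columns_by_name <- function(x, cols) {
  # drop = FALSE keeps the result a data frame even with one column left
  x[, !names(x) %in% cols, drop = FALSE]
}

df <- iris
df <- drop_columns_by_name(df, "Species")  # assign the result back to df
names(df)
```

The original iris is untouched; only df loses the Species column.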
I found the following piece of code here at stackoverflow:
library(svDialogs)
columnFunction <- function (x) {
column.D <- dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res
if (!length((column.D))) {
cat("No column selected\n")
} else {
cat("The following columns are choosen:\n")
print(column.D)
x <- x[,!names(x) %in% column.D]
}
return(x)
}
df <- columnFunction(df)
So I wanted to use it for my own purposes, but it did not work out as planned.
What I am trying to achieve is to use it in a for loop or with lapply, so that I can apply it to multiple data.frames. Among other things, I tried:
d.frame1 <- iris
d.frame2 <- cars
l.frames <- c("d.frame1","d.frame2")
for (b in l.frames){
columnFunction(b)
}
but it yields the following error message:
Error in dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res :
$ operator is invalid for atomic vectors
What I need in addition is to be able to loop through that function so that I can iterate over different data.frames.
Last but not least, I would need something like:
for (xyz in l.frames){
xyz <- columnFunction(xyz)
}
to automate the saving step.
Does anyone have any idea how I could loop through that function, or how I could change the function so that it performs all those steps and can be used in a loop?
I'm quite new to R, so perhaps I'm missing something obvious.
lapply was designed for this task:
l.frames <- list(d.frame1, d.frame2)
l.frames <- lapply(l.frames, columnFunction)
If you insist on using a for loop:
for (i in seq_along(l.frames)) l.frames[[i]] <- columnFunction(l.frames[[i]])
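The loop in the question failed because l.frames held the names of the data frames as strings, so columnFunction received "d.frame1" rather than the data frame itself. A sketch of the list-based approach, using a hypothetical non-interactive stand-in for columnFunction so that it runs without a dialog:

```r
d.frame1 <- iris
d.frame2 <- cars

# Hypothetical stand-in for columnFunction: always drops the first column
drop_first_column <- function(x) x[, -1, drop = FALSE]

# Store the data frames themselves (not their names) in a named list
l.frames <- list(d.frame1 = d.frame1, d.frame2 = d.frame2)

# lapply returns a new list holding the modified data frames
l.frames <- lapply(l.frames, drop_first_column)

names(l.frames$d.frame1)
names(l.frames$d.frame2)
```

Naming the list elements keeps each result reachable as l.frames$d.frame1 and so on, which avoids having to write back to separate variables.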
I'm totally new to R, and I've been trying to replace the NA values with the mean value for each column. I've tried a lot of options, but none seems to work. I've tried this one and many similar ones, but I keep getting: argument is not numeric or logical: returning NA.
script<-function() {
for (i in names(data)) {
data[[i]][is.na(data[[i]])] <- mean(data[[i]], na.rm=TRUE);
}
}
But then after a while I thought I'd just count the columns and came up with this:
script<-function() {
for (i in 1:20) {
data[[i]][is.na(data[[i]])] <- mean(data[[i]], na.rm=TRUE);
}
}
which doesn't show any errors, but doesn't seem to work either. When I type in data it's just the same data frame, but unedited. Could anyone help me with this?
The problem is that the assignment happens inside a function, so only a copy of data local to that function is updated; the data object outside the function is left unchanged.
Running the loop at the top level, not within a function, will work as you wish:
for (i in names(data)) {
data[[i]][is.na(data[[i]])] <- mean(data[[i]], na.rm=TRUE);
}
Another approach would be to pass data as an argument and return the modified copy:
imputeMean <-function(data) {
for (i in names(data)) {
data[[i]][is.na(data[[i]])] <- mean(data[[i]], na.rm=TRUE);
}
return(data)
}
# then you can save the result as a new object
updatedData <- imputeMean(data)
Note that for named lists (which data is), [[<- will copy the data on every assignment, so you could get around this by using lapply:
updatedData <- lapply(data, function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))
Feel free to make a function out of this (updated per mnel correction):
data.frame(lapply(data, function(x){replace(x, is.na(x), mean(x,na.rm=T))}))
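One way to turn this into a function is sketched below. The name impute_mean and the example data are made up. As an added assumption beyond the answers above, it restricts the replacement to numeric columns, which is one way to avoid the "argument is not numeric or logical" warning mentioned in the question.

```r
impute_mean <- function(data) {
  # Identify numeric columns; others are left untouched
  numeric_cols <- vapply(data, is.numeric, logical(1))
  data[numeric_cols] <- lapply(data[numeric_cols], function(x) {
    replace(x, is.na(x), mean(x, na.rm = TRUE))
  })
  data  # return the modified copy
}

# Made-up example with one numeric and one character column
example <- data.frame(a = c(1, NA, 3),
                      b = c("x", NA, "z"),
                      stringsAsFactors = FALSE)
result <- impute_mean(example)
result
```

The NA in column a becomes 2, the mean of 1 and 3, while the NA in the character column b is left as it is.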