R - Apply custom function to single column row by row

I have created a custom function and wish to apply it to a single column of a data frame row by row, then assign the result back to the original column.
The custom function is below; it aims to fix the dates read from an Excel file.
format_dates = function(x) {
  x = trimws(x)
  if ( grepl('/', x, fixed=TRUE) ) {
    as.Date(x, tryFormats = c("%d/%m/%Y", "%m/%d/%Y"))
  } else {
    tryCatch(
      { as.Date(as.integer(x), origin='1899-12-30') },
      warning=function(x) { return( NA ) }
    )
  }
}
It is mandatory to do this row by row. I have searched high and low and I have seen many replies using lapply, apply, and sapply but they do not work. As an example, I tried:
df$Child_Date_of_Birth = apply(df$Child_Date_of_Birth, 2, format_dates)
With the result of
Error in apply(df$Child_Date_of_Birth, 2, format_dates) :
dim(X) must have a positive length
This is frustrating, as in Pandas you can simply run
df['Child_Date_of_Birth'] = df['Child_Date_of_Birth'].apply(format_dates)
but in R this becomes the most obscure thing ever??
Is anyone able to enlighten me? I would appreciate it.

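Example data would be helpful. apply fails because df$Child_Date_of_Birth is a plain vector with no dim attribute, while apply expects a matrix or array. I think you can try sapply: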
df$Child_Date_of_Birth <- sapply(df$Child_Date_of_Birth, format_dates)
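One caveat worth checking: sapply simplifies its result to a plain vector, which drops the Date class, so the column ends up holding day counts rather than dates. A minimal sketch of one way around that, assuming the column holds character strings (the sample values are made up for illustration, and format_dates is repeated only so the example runs on its own):

```r
# format_dates as in the question, repeated so the sketch is self-contained
format_dates <- function(x) {
  x <- trimws(x)
  if (grepl("/", x, fixed = TRUE)) {
    as.Date(x, tryFormats = c("%d/%m/%Y", "%m/%d/%Y"))
  } else {
    tryCatch(
      as.Date(as.integer(x), origin = "1899-12-30"),
      warning = function(w) NA
    )
  }
}

# Hypothetical sample data: a d/m/Y string, an Excel serial number, and junk
df <- data.frame(
  Child_Date_of_Birth = c("25/12/2020", " 44000 ", "unknown"),
  stringsAsFactors = FALSE
)

# Call the function once per element and collect days since 1970-01-01
days <- vapply(
  df$Child_Date_of_Birth,
  function(v) as.numeric(format_dates(v)),
  numeric(1),
  USE.NAMES = FALSE
)

# Restore the Date class that simplification would otherwise drop
df$Child_Date_of_Birth <- as.Date(days, origin = "1970-01-01")
print(df)
```

vapply is used here instead of sapply only because it checks that every call returns exactly one number; the conversion back with as.Date is the part that matters.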

Related

Missing value often results in errors in lapply (in R)

I got errors in this code:
FUN = function(files) {
  df_week <- data.table::fread(files)
  # Sun rate
  for (i in 1:nrow(df_week)) {
    # check if df is not NA
    if (!is.na(df_week[i]))
    {
      if (df_week$Sun[i] >= 10) { df_week$Sunr[i] = 5 }
      ....
    }
  }
}
files = list.files(pattern="1_Stas*")
lapply(files, FUN)
Output:
Error in if (!is.na(df_week[i])) { : argument is of length zero
In addition: Warning message:
13 failed to parse.
Why does the if () {} statement give an error?
If the input contains missing values (NA), the output should be NaN or NA, and lapply should continue to the next file in the list.
I have tried it on a single file without lapply or the function, and the output appears in the environment as empty data.
So when I do it one by one, there's no error. When it is done using lapply, there are often problems. Should I use a for loop instead?
Any suggestions to fix it and make lapply continue to the next file when the previous file contains missing values?
Thanks.
Just use a for loop. The apply family is good for quick one-liners when you need to apply a single operation to one dimension of your data. Your code will become:
files = list.files(pattern="1_Stas*")
df_week <- data.table::fread(files) # Make sure length(files) == 1
# Sun rate
for (i in 1:nrow(df_week)) {
  # check if df is not NA
  if (!is.na(df_week[i])) {
    if (df_week$Sun[i] >= 10) {
      df_week$Sunr[i] = 5
    }
    ...
  }
}
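The line containing if(!is.na(df_week[i])) { needs clarification. From context, df_week is two-dimensional (length(dim(df_week)) > 1), so is.na(df_week[i]) returns one value per column rather than the single TRUE or FALSE that if expects. You probably want
if (all(!is.na(df_week[i]))) {
  ...
}
all(!is.na(df_week[i])) returns TRUE when the ith row of df_week contains no NA values.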
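If the per-row test is only there to skip missing values, the loop can be avoided altogether. A minimal sketch on a plain data.frame rather than the data.table returned by fread; the Rain column and all values are hypothetical stand-ins for the file contents:

```r
# Hypothetical data standing in for one of the files
df_week <- data.frame(Sun = c(12, NA, 3, 15), Rain = c(1, 2, NA, 4))

# One logical value per row: TRUE when the row contains no NA at all
row_ok <- complete.cases(df_week)

# Vectorised version of the loop body: rows with missing values keep NA
df_week$Sunr <- NA_real_
df_week$Sunr[row_ok & df_week$Sun >= 10] <- 5

print(df_week)
```

Because row_ok is FALSE wherever Sun is NA, the combined condition never contains NA, so the assignment cannot fail on missing values.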

R - function which has a data frame parameter doesn't work

I wrote a simple function that maps an id number to a name:
change_name = function(x) {
  valid_user[match(x, valid_user$id),'name']
}
and applied this function to a data frame.
The data frame is named 'ga.screen', and the column is named 'dimension1'.
ga.screen[, 'dimension1'] =sapply(ga.screen[, 'dimension1'], change_name)
It works well.
Next, I want to wrap this code in a function that can be applied to various cases.
readable_user_id = function(data, col) {
  data[, col] = sapply(data[, col], change_name)
}
readable_user_id(ga.screen, 'dimension1')
This is exactly the same code, but the latter doesn't work!
Why does this happen? Is it a problem with sapply not working inside a function, or a problem with passing a data.frame as a parameter?
Your function should return the modified data, because the assignment inside the function only changes a local copy of data. Try:
readable_user_id = function(data, col) {
  data[, col] = sapply(data[, col], change_name)
  data
}
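The returned value still has to be assigned back to the original name, since the caller's data frame is left untouched. A small sketch with made-up contents for valid_user and ga.screen (the ids, names, and the hits column are hypothetical):

```r
# Hypothetical lookup table and data
valid_user <- data.frame(id = c(1, 2, 3),
                         name = c("ann", "bob", "cy"),
                         stringsAsFactors = FALSE)
ga.screen <- data.frame(dimension1 = c(3, 1, 2), hits = c(10, 20, 30))

change_name <- function(x) {
  valid_user[match(x, valid_user$id), "name"]
}

readable_user_id <- function(data, col) {
  data[, col] <- sapply(data[, col], change_name)
  data  # return the modified copy
}

# Calling without assignment leaves ga.screen unchanged
invisible(readable_user_id(ga.screen, "dimension1"))
unchanged <- is.numeric(ga.screen$dimension1)

# Assigning the result replaces the original
ga.screen <- readable_user_id(ga.screen, "dimension1")
print(ga.screen)
```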

Delete data frame column within function

I have the following code:
df <- iris
library(svDialogs)
columnFunction <- function (x) {
  column.D <- dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res
  if (!length((column.D))) {
    cat("No column selected\n")
  } else {
    cat("The following columns are choosen:\n")
    print(column.D)
    for (z in column.D) {
      x[[z]] <- NULL #with this part I wanted to delete the above selected columns
    }
  }
}
columnFunction(df)
So how is it possible to address data.frame columns "dynamically", so that x[[z]] <- NULL translates to:
df$Species <- NULL
df[["Species"]] <- NULL
df[,"Species"] <- NULL
and to do that for every selected column in every data.frame passed to the function?
Does anyone know how to achieve something like that? I tried several things, such as paste, sprintf, and deparse, but I didn't get it working. I also tried to address the data.frame as a global variable by using <<-, but that didn't help either. (It's the first time I have even heard of that.) It looks like I am missing the right method for transferring x and z to the variable assignment.
If you want to create a function columnFunction that removes columns from a passed data frame df, all you need to do is pass the data frame to the function, return the modified version of df, and replace df with the result:
library(svDialogs)
columnFunction <- function (x) {
  column.D <- dlgList(names(x), multiple = TRUE, title = "Spalten auswaehlen")$res
  if (!length(column.D)) {
    cat("No column selected\n")
  } else {
    cat("The following columns are chosen:\n")
    print(column.D)
    # drop = FALSE keeps a data frame even when only one column remains
    x <- x[, !names(x) %in% column.D, drop = FALSE]
  }
  return(x)
}
df <- columnFunction(df)
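The same return-and-reassign pattern can be tried without the dialog. The sketch below is a non-interactive variant in which drop_columns is a hypothetical helper that takes the column names as an argument instead of asking for them through dlgList:

```r
# Hypothetical non-interactive helper: column names are passed in directly
drop_columns <- function(x, cols) {
  # drop = FALSE keeps the result a data frame even if one column is left
  x[, !names(x) %in% cols, drop = FALSE]
}

df <- iris
df <- drop_columns(df, c("Species", "Sepal.Length"))
print(names(df))

# Removing all but one column still yields a data frame
one_col <- drop_columns(iris, names(iris)[-1])
print(class(one_col))
```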

Delete data.frame columns and loop through data.frame assignment function

I found the following piece of code here on Stack Overflow:
library(svDialogs)
columnFunction <- function (x) {
  column.D <- dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res
  if (!length((column.D))) {
    cat("No column selected\n")
  } else {
    cat("The following columns are choosen:\n")
    print(column.D)
    x <- x[,!names(x) %in% column.D]
  }
  return(x)
}
df <- columnFunction(df)
So I wanted to use it for my own purposes, but it did not work out as planned.
What I am trying to achieve is to use it in a for loop or with lapply on multiple data frames. Among other things, I tried:
d.frame1 <- iris
d.frame2 <- cars
l.frames <- c("d.frame1","d.frame2")
for (b in l.frames){
  columnFunction(b)
}
but it yields the following error message:
Error in dlgList(names(x), multiple = T, title = "Spalten auswaehlen")$res :
$ operator is invalid for atomic vectors
What I need additionally is to be able to loop through that function so that I can iterate over different data frames.
Last but not least, I would need something like:
for (xyz in l.frames){
  xyz <- columnFunction(xyz)
}
to automate the saving step.
Does anyone have any idea how I could loop through that function, or how I could change the function so that it performs all those steps and can be used in a loop?
I'm quite new to R, so perhaps I'm missing something obvious.
lapply was designed for this task:
l.frames <- list(d.frame1, d.frame2)
l.frames <- lapply(l.frames, columnFunction)
If you insist on using a for loop:
for (i in seq_along(l.frames)) l.frames[[i]] <- columnFunction(l.frames[[i]])
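The loop in the question fails because l.frames holds the names of the data frames as character strings, so the function receives "d.frame1" rather than the data frame itself. A minimal sketch of the list-based approach with a named list, so each result can still be reached by name; drop_first is a hypothetical non-interactive stand-in for columnFunction:

```r
d.frame1 <- iris
d.frame2 <- cars

# A named list of the data frames themselves, not of their names
l.frames <- list(d.frame1 = d.frame1, d.frame2 = d.frame2)

# Hypothetical stand-in for columnFunction: removes the first column
drop_first <- function(x) x[, -1, drop = FALSE]

# lapply returns a new list; assigning it back is the "saving step"
l.frames <- lapply(l.frames, drop_first)

print(names(l.frames$d.frame1))
print(names(l.frames$d.frame2))
```

Keeping the data frames in the list afterwards, rather than copying them back into separate variables, tends to make later loops simpler as well.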

read.csv2.ffdf is importing a numeric (float) variable as factor

I have been working with the ff package for a couple of weeks and it has been working great so far,
but today I realized that a variable that should be numeric is being read as a factor.
The data has about 900k rows and 800 columns, so it's not easy to check that every column gets the class it should...
matff <- read.csv2.ffdf(file = name,encoding = "UTF-8",next.rows=150000,colClasses=NA)
I would like to know why this may be happening and how to fix it.
Thanks.
Your data has some columns that clearly contain text rather than the numeric data you expect.
You can use the transFUN argument to read.csv2.ffdf to solve your decimal problem, for example:
transFUN=function(x){
  x$mycolumn <- as.numeric(gsub(",", ".", as.character(x$mycolumn)))
  x
}
Or use the appropriate read.table arguments.
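As a rough illustration of why a column can come back non-numeric: read.csv2 assumes ';' as the separator and ',' as the decimal mark, so values written with '.' are not parsed as numbers. The sketch below uses base R's read.csv2 on made-up inline text rather than read.csv2.ffdf, on the assumption that the ff reader hands the parsing to the same function:

```r
# Made-up input: column a uses ',' as decimal mark, column b uses '.'
txt <- "a;b\n1,5;2.5\n2,5;3.5\n"

d <- read.csv2(text = txt, stringsAsFactors = TRUE)

a_is_numeric <- is.numeric(d$a)  # parsed with dec = ","
b_was_factor <- is.factor(d$b)   # "2.5" is not a number when dec = ","

# Convert the factor through its labels, not its integer codes
d$b <- as.numeric(as.character(d$b))
print(d)
```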
Now it should work:
# matff <- data.frame(Col=c('a','b','c'),Mix1=c('a','1.2','c'),Mix2=c(1.1,2.1,3),Num1=c('1.2','2.3','3.4'),Num2=c('1,2','2,3','3,4')) # Data example
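func <- function(x) {
  if (!is.numeric(x)) {
    x <- as.character(x)  # works for factors and character vectors alike
    if (!any(grepl('[a-zA-Z]', x))) {
      # no letters: treat as numbers, accepting ',' as the decimal mark
      # (as.real is defunct in current R; as.numeric replaces it)
      x <- as.numeric(gsub(',', '.', x, fixed = TRUE))
    } else {
      x <- factor(x)
    }
  }
  x
}
for (i in seq_len(ncol(matff))) {
  matff[,i] <- func(matff[,i])
}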
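The same idea can be checked on the commented data example above with an ordinary in-memory data.frame. This is only a sketch: to_numeric_if_possible is a hypothetical name, and whether the approach carries over unchanged to an ffdf of this size is an assumption that would need testing.

```r
# The data example from the comment above, built as a plain data.frame
example <- data.frame(
  Col  = c("a", "b", "c"),
  Mix1 = c("a", "1.2", "c"),
  Mix2 = c(1.1, 2.1, 3),
  Num1 = c("1.2", "2.3", "3.4"),
  Num2 = c("1,2", "2,3", "3,4"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: numeric if no value contains a letter, else factor
to_numeric_if_possible <- function(x) {
  if (is.numeric(x)) return(x)
  x <- as.character(x)
  if (any(grepl("[a-zA-Z]", x))) return(factor(x))
  as.numeric(gsub(",", ".", x, fixed = TRUE))
}

# example[] keeps the data.frame structure while replacing every column
example[] <- lapply(example, to_numeric_if_possible)
print(sapply(example, class))
```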
