I'm trying to iterate over a data table and, for each row, calculate an integral with a function f, using two columns, dt$xmin and dt$xmax, as the limits, and have the answer written to a new column dt$integral. I'm currently trying to use something like the code below, without success:
dt$integral <- mapply(f, dt$xmin, dt$xmax)
Any help would be greatly appreciated!
Perhaps you do not need mapply: if f is vectorized, a simple assignment should work, dt$integral <- f(dt$xmin, dt$xmax). You gave no data or example of what you want, but here's what I think could work for you (using data.table):
library(data.table)
dt <- as.data.table(mtcars)
newfunc <- function(a, b){
return(a + log(b) - exp(a/b) + 3.1*a^1.918)
}
# Adding a new column called "newcol"
> head(dt[, newcol := newfunc(wt, mpg/qsec)])
mpg cyl disp hp drat wt qsec vs am gear carb newcol
1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 14.73145
2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 16.30387
3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 11.45233
4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 13.87593
5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 13.78816
6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 -10.84599
For a single new column the above style of variable assignment would work. For multiple new columns and functions, you would need to use a function that returns a list for the new columns. Look up more on assignment using := in data.table.
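If f stands for a numerical integration, direct assignment only works when f is vectorized over its limits, which base R's integrate() is not. A minimal sketch, assuming the goal is to integrate some function between each row's xmin and xmax: the integrand g and the helper integrate_g are hypothetical names (the original post does not define f), and a plain data.frame stands in for the data.table.

```r
# Hypothetical integrand; the original post does not show f.
g <- function(x) x^2

# integrate() expects scalar limits and returns a list,
# so wrap it and keep only the numeric estimate.
integrate_g <- function(lower, upper) {
  integrate(g, lower, upper)$value
}

dt <- data.frame(xmin = c(0, 1, 2), xmax = c(1, 2, 3))

# mapply() calls integrate_g once per (xmin, xmax) pair.
dt$integral <- mapply(integrate_g, dt$xmin, dt$xmax)
dt$integral
# approximately 0.3333 2.3333 6.3333
```

Here mapply is the right tool because each call needs one pair of scalar limits; the same line works unchanged on a data.table.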
Consider the following columns in my dataset:
df$PT : contain strings with repeating pattern. Example:
[1] "60D 0%" "5M 2%" "4 2ND M 5%" ...
df$date : column of dates
[1] "2021-01-18" "2021-01-18" "2021-01-18" ...
I managed to create a function that reads inputs from the columns above, performs operations on them and returns another date (let's call it date2). The function works fine (I tested it by passing its arguments manually):
function1 <- function(PT, date) {
#if/else chain to generate date2 from PT and date
#function returns either (date2) or NA according to if/else conditions
}
So far so good. The problem comes when I try to use sapply to apply function1 to every element of column df$PT and store the output (which I want to be either a single date or NA for every element of df$PT) in df$new_col, such as:
df$new_col <- sapply(df$PT,function1,date=df$date)
But instead of having the expected output in df$new_col in date format as:
date2a
date2b
date2c
date2d
...
I am obtaining only the first output repeated everywhere, and as a number rather than a date:
18705
18705
18705
18705
...
What can be going on and how do I solve it to get the correct calculations of date2 in df$new_col?
Thank you for your help!
Because R is vectorized you can create new df columns directly from existing columns. E.g.:
cars <- mtcars
cars$new <- ifelse(cars$cyl == 6 & cars$mpg > 20, "NewVal", NA)
head(cars)
mpg cyl disp hp drat wt qsec vs am gear carb new
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 NewVal
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 NewVal
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 <NA>
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 NewVal
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 <NA>
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 <NA>
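When the function cannot easily be vectorized, a row-by-row call is still possible. A likely explanation for the output in the question, assuming function1 handles one PT/date pair at a time: date = df$date hands the whole date column to every call, and sapply drops the Date class when it simplifies the result, leaving the underlying day counts. The sketch below uses a hypothetical stand-in for function1 (the original if/else chain is not shown) and restores the class afterwards.

```r
# Hypothetical stand-in for function1; the real if/else chain is not shown.
function1 <- function(PT, date) {
  if (grepl("^60D", PT)) date + 60 else as.Date(NA)
}

df <- data.frame(
  PT = c("60D 0%", "5M 2%", "60D 0%"),
  date = as.Date(c("2021-01-18", "2021-01-19", "2021-01-20"))
)

# Walk the rows by index so each call sees exactly one PT and one date.
# vapply() returns plain numbers (days since 1970-01-01).
raw <- vapply(
  seq_len(nrow(df)),
  function(i) as.numeric(function1(df$PT[i], df$date[i])),
  numeric(1)
)

# Convert the day counts back to Date.
df$new_col <- as.Date(raw, origin = "1970-01-01")
df$new_col
# [1] "2021-03-19" NA           "2021-03-21"
```

The number 18705 from the question is the day count for 2021-03-19, which is consistent with the Date class having been stripped.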
I've found several SO posts on this already but cannot see how to apply to my specific problem.
I have a dataframe with a number of features that I would like to simultaneously mutate. I want to write over them rather than create new features.
E.g. using mtcars, suppose I want to amend am, gear and carb to be 1 if greater than 0 and 0 otherwise, for each of those 3 features. How could I do that?
mtcars %>% mutate_at(vs:carb, funs(???))
I want to apply a custom function of this form ifelse(x > 0, 1, 0) where x is either of the 3 features being worked on.
How can I achieve this?
You need to use vars() for vs:carb to parse, and you use . as a stand-in for the argument in funs:
mtcars %>% mutate_at(vars(vs:carb), funs(ifelse(. > 0, 1, 0)))
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 1 1
# 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 1 1
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 1 1
# 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 1 1
# 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 1 1
# 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 1 1
# ...
This is explained in the ?funs help page:
A list of functions specified by:
Their name, "mean"
The function itself, mean
A call to the function with . as a dummy argument, mean(., na.rm = TRUE)
With this corresponding to the third bullet.
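Note that funs() has been deprecated since dplyr 0.8.0, and scoped verbs such as mutate_at() are superseded by across() in dplyr 1.0.0 and later. A sketch of the same overwrite in the newer style, assuming a recent dplyr is installed (the name result is only for illustration):

```r
library(dplyr)

# across() applies the lambda to each selected column and
# overwrites the column in place; .x is the current column.
result <- mtcars %>%
  mutate(across(vs:carb, ~ ifelse(.x > 0, 1, 0)))

head(result)
```

The formula ~ ifelse(.x > 0, 1, 0) plays the role that funs(ifelse(. > 0, 1, 0)) plays above.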
I want to create a new column in a data.table based on the values of other columns. Using mtcars as an example:
> library(data.table)
> dt <- as.data.table(mtcars)
> head(dt[, newval := cyl + gear])
mpg cyl disp hp drat wt qsec vs am gear carb newval
1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 10
2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 10
3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 8
4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 9
5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 11
6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 9
which works fine, but for even a slightly more complex function, I get warning messages:
simple_func <- function(a, b){
if(a %in% c(4,6) ){
return(a*b)
}else{
return(b/a)
}
}
head(dt[, newval := simple_func(cyl, disp)])
returns:
> head(dt[, newval := simple_func(cyl, disp)])
mpg cyl disp hp drat wt qsec vs am gear carb newval
1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 960
2: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 960
3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 432
4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 1548
5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 2880
6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 1350
Warning message:
In if (a %in% c(4, 6)) { :
the condition has length > 1 and only the first element will be used
The value for row 5 (cyl == 8) is clearly incorrect; the expected value of newval is 45.
The reason is that the function is not evaluated one row at a time but once for the entire column, so if the condition is met for the first row (dt$cyl[1], dt$disp[1]), all other rows have the same formula applied to them.
How do I get around this? I tried using .SDcols but didn't get it right and got other errors instead.
Use ifelse
simple_func <- function(a, b){
ifelse(a %in% c(4,6), a*b, b/a)
}
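A quick check of the ifelse version, sketched on a plain data.frame copy of the first six rows of mtcars (the name cars is only for illustration). ifelse tests every element of the condition, so row 5 now takes the b/a branch. In R 4.2.0 and later, the original if() with a condition of length greater than one is an error rather than a warning.

```r
simple_func <- function(a, b) {
  # Element-wise choice: a*b where cyl is 4 or 6, b/a otherwise.
  ifelse(a %in% c(4, 6), a * b, b / a)
}

cars <- head(mtcars)
cars$newval <- simple_func(cars$cyl, cars$disp)
cars$newval
# [1]  960  960  432 1548   45 1350
```

The same function can be used unchanged inside dt[, newval := simple_func(cyl, disp)].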
I'm trying to create a user-defined function which has as one output a network object that is named similarly to the input dataframe used in the function. Something like this.
node_attributes <- function(i){ #i is dataframe
j <- network(i)
##some other function stuff##
paste(i, 'network', sep = '_') <- j  # not valid R; shows the intent only
}
The idea is to append '_network' to the name of the i variable, which is meant to be a dataframe. So if my original dataframe is foo_bar_data, my output would be: foo_bar_data_network.
It is possible to get the name of input variables with deparse(substitute(argname)).
func <- function(x){
deparse(substitute(x))
}
func(some_object)
## [1] "some_object"
I am not completely sure how you want to use the name of the input, so I used something similar to the answer of @JackStat.
node_attributes <- function(i){
output_name <- paste(deparse(substitute(i)), 'network', sep = '_')
## I simplified this since I don't know what the function network is
j <- i
assign(output_name, j, envir = parent.frame())
}
node_attributes(mtcars)
head(mtcars_network)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
That said I don't really see any reason to code like this. Normally, returning the output from the function is the recommended way.
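A sketch of the return-based alternative, assuming the caller picks the output name. build_network is a hypothetical placeholder for network(), which comes from a package not shown in the question; here it simply returns its input.

```r
# Hypothetical placeholder for network(); returns its input unchanged.
build_network <- function(df) {
  df
}

node_attributes <- function(i) {
  j <- build_network(i)
  j  # return the result instead of assigning it in the caller's environment
}

# The caller chooses the name.
mtcars_network <- node_attributes(mtcars)

# For several inputs, a named list keeps the "<name>_network" convention
# without creating variables dynamically.
inputs <- list(mtcars = mtcars, iris = iris)
networks <- lapply(inputs, node_attributes)
names(networks) <- paste(names(inputs), "network", sep = "_")
names(networks)
# [1] "mtcars_network" "iris_network"
```

Returning the value keeps the function free of side effects, which makes it easier to test and to use inside lapply.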
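You can use assign inside the function, building the name from the argument's name rather than its value (paste0 has no sep argument, so use paste):
j <- network(i)
assign(paste(deparse(substitute(i)), 'network', sep = '_'), j, envir = parent.frame())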
I've been banging my head against this problem and feel certain there must be an efficient way to do this in R that doesn't involve writing a for loop. Any suggestions much appreciated!
I'd like to create a new column in a data frame that contains values from existing columns in the dataframe, but where the column whose value is selected is dynamically specified. An example will help clarify:
> mydata <- head(mtcars)
> mydata
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
> myquery <- c("cyl","cyl","gear","gear","carb", "carb")
At this point, I'd like to know if there's a simple R function that will select the value of column myquery for each row of mydata, in other words:
f(mydata, myquery)
6 6 4 3 2 1
Thanks in advance if anyone knows of a simple and efficient way to write f.
You can index a data.frame with a matrix to achieve that behavior
dd<-head(mtcars)
myquery <- c("cyl","cyl","gear","gear","carb", "carb")
dd[cbind(seq_along(myquery), match(myquery, names(dd)))]
# [1] 6 6 4 3 2 1
The first column of the matrix is the row, the second is the column (note that when using this method there is no comma in the brackets, unlike a normal [,] subset). Here I converted the myquery values to their numeric column indices using match so both columns of the matrix are the same type (as they have to be). You could also have used a character matrix if you used the row names to index the rows. Thus
dd[cbind(rownames(dd), myquery)]
# [1] 6 6 4 3 2 1
also works.
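To match the f(mydata, myquery) call from the question, the numeric-index variant can be wrapped in a small function. This is a sketch; the name f comes from the question, and the column name picked is only for illustration.

```r
# Select, for each row, the value of the column named in `query`.
# Assumes length(query) == nrow(data) and that every name in `query`
# is a column of `data`.
f <- function(data, query) {
  idx <- cbind(seq_along(query), match(query, names(data)))
  data[idx]
}

mydata <- head(mtcars)
myquery <- c("cyl", "cyl", "gear", "gear", "carb", "carb")

mydata$picked <- f(mydata, myquery)
mydata$picked
# [1] 6 6 4 3 2 1
```

Matrix indexing converts the data frame to a matrix first, so with mixed column types the result would be coerced to a common type (for example character).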