Finding maximum of function with additional information

Consider the following very simple function:
easy_function=function(vec,string){
  if (string=='some_string') sum(vec)
  else if (string=='string_some') 3*max(vec)
  else if (string=='some_string_some') mean(c(sum(vec),max(vec))) # c() is needed: mean(a, b) would treat b as the trim argument
}
I want to write another function, find_biggest<-function(vec), that tries every string accepted by easy_function() and returns a list containing:
(1) the string for which the maximum is reached
(2) the value of the maximum.
My work so far
The second item is easy to obtain:
find_biggest<-function(vec){
max(easy_function(vec,'some_string'),easy_function(vec,'string_some'),
easy_function(vec,'some_string_some'))
}
However, I have no idea how to find out for which string the maximum was reached. Could you help me with that?
For example, find_biggest(1:3) should return a list containing:
(1) 'string_some' (the string for which the maximum is reached)
(2) 9 (the maximum)

How about this:
library(tidyverse)
easy_function=function(vec,string){
  if (string=='some_string') sum(vec)
  else if (string=='string_some') 3*max(vec)
  else if (string=='some_string_some') mean(c(sum(vec),max(vec))) # c() is needed: mean(a, b) would treat b as the trim argument
}
find_biggest <- function(vec){
  strings <- c("some_string", "string_some", "some_string_some")
  # evaluate easy_function() once per string, then flatten the results to a vector
  all_vals <- strings %>% map(easy_function, vec = vec) %>% unlist
  list(max_string = strings[which.max(all_vals)],
       max_val = max(all_vals))
}
find_biggest(1:10)
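The same idea can be sketched without the tidyverse. This is only a minimal base-R variant, not the answerer's code: the name find_biggest_base is made up here, and it assumes the third branch of easy_function() is meant to average the sum and the maximum, so the two values are wrapped in c().

```r
# The function from the question, with the two values passed to mean() as one vector.
easy_function <- function(vec, string) {
  if (string == "some_string") sum(vec)
  else if (string == "string_some") 3 * max(vec)
  else if (string == "some_string_some") mean(c(sum(vec), max(vec)))
}

# Hypothetical base-R variant of find_biggest().
find_biggest_base <- function(vec) {
  strings <- c("some_string", "string_some", "some_string_some")
  # vapply() calls easy_function() once per string and checks each result is one number
  vals <- vapply(strings, function(s) easy_function(vec, s), numeric(1))
  best <- which.max(vals)  # position of the first maximum
  list(max_string = strings[best], max_val = unname(vals[best]))
}

res <- find_biggest_base(1:3)
```

For 1:3 the three candidate values are 6, 9 and 4.5, so res holds 'string_some' and 9, which matches the example in the question.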

Related

function with vector R - argument is of length zero

I wrote this function, lockdown_func(beta.hat_func, l).
First problem: I get the error "argument is of length zero".
Second problem: when I compute it without the date indices, it doesn't change the values as it should; the output vector contains the same value at every index.
date= c(seq(from=30, to=165))
beta.hat_func <- c(rep(x = beta.hat, times = 135))
beta.hat <- beta0[which.min(SSE)]
#implement function for modeling
lockdown_func <- function(beta.hat_func,l){
  h=beta.hat_func
  {
    for(i in 1:length(h))
      if(date[i]>60 | date[i]<110){
        beta.hat_func[i]=beta.hat_func[i]*exp(-l*(date[i]-date[i-1]))
      }else{
        beta.hat_func[i]=beta.hat_func[i]
      }
    return(h)
  }
}
lockdown_func(beta.hat_func,0.03)

A few comments:
Did you mean to apply an AND rather than an OR to get the date range between 60 and 110? That would be date[i]>60 && date[i]<110 (it's better to use the double && when you are computing a length-1 logical value).
Because you used OR, the condition is true for every i, including i=1, so date[i-1] refers to date[0], which is a length-0 vector.
You might want something like:
l_dates <- date>60 & date<110 ## single-& here for vectorized operation
beta.hat_func[l_dates] <- beta.hat_func[l_dates]*exp(-l*c(0, diff(date))[l_dates]) ## c(0, ...) lines each difference up with date[i]-date[i-1]
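Put together as a function, the vectorized approach might look like the sketch below. The function name lockdown_sketch and the starting value 0.5 are assumptions made for illustration (the question does not show beta0 or SSE). Unlike the original loop, which returned the untouched copy h, this version returns the modified vector.

```r
# Dates from the question; the starting rate 0.5 is an invented placeholder.
date <- 30:165
beta_start <- rep(0.5, length(date))

lockdown_sketch <- function(beta, date, l) {
  in_lockdown <- date > 60 & date < 110   # vectorized AND over all dates
  step <- c(0, diff(date))                # step[i] is date[i] - date[i - 1]; the first step is set to 0
  beta[in_lockdown] <- beta[in_lockdown] * exp(-l * step[in_lockdown])
  beta                                    # return the modified vector
}

out <- lockdown_sketch(beta_start, date, 0.03)
```

Entries outside the 60 to 110 window are left unchanged, and each entry inside it is scaled once by exp(-l * step).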

Using group_modify to apply function to grouped dataframe

I am trying to apply a function to each group of data in the main dataframe and I decided to use group_modify() (since it returns a dataframe as well). Here is my initial code:
max_conc_fx <- function(df) {
highest_conc <- 0
for (i in 1:nrow(df)) {
curr_time <- df$event_time[i]
within1hr <- filter(df, abs(event_time - curr_time) <= hours(1))
num_buyers <- length(unique(within1hr$userid))
curr_conc <- nrow(within1hr)/num_buyers
if (curr_conc > highest_conc) {
highest_conc <- curr_conc
}
}
mutate(df, highest_conc)
}
conc_data <- group_modify(data, max_conc_fx)
However, I keep getting this error message:
Error in as_group_map_function(.f) :
The function must accept at least two arguments. You can use ... to absorb unused components
After some trial and error, I fixed this by adding the argument "..." to my max_conc_fx() function, which gives the following working code:
max_conc_fx <- function(df, ...) { #df is the rows of data for one shop
highest_conc <- 0
for (i in 1:nrow(df)) {
curr_time <- df$event_time[i]
within1hr <- filter(df, abs(event_time - curr_time) <= hours(1))
num_buyers <- length(unique(within1hr$userid))
curr_conc <- nrow(within1hr)/num_buyers
if (curr_conc > highest_conc) {
highest_conc <- curr_conc
}
}
mutate(df, highest_conc)
}
conc_data <- group_modify(data, max_conc_fx)
Can someone explain to me what the dots are actually for in this case? I understood them to be used for representing an arbitrary number of arguments or for passing additional arguments on to other functions, but I do not see either of these happening here. Do let me know if I am missing something or if you have a better solution for my code.

The dots don't do much in this case, but group_modify() requires them for your function to work. The function you pass is run through a helper, as_group_map_function(), which checks that it has at least two formal arguments and, if it does not, that it has ... to absorb the extra ones:
## dplyr/R/group_map.R (Lines 2-8)
as_group_map_function <- function(.f) {
.f <- rlang::as_function(.f)
if (length(form <- formals(.f)) < 2 && ! "..." %in% names(form)){
stop("The function must accept at least two arguments. You can use ... to absorb unused components")
}
.f
}
I'm not 100% sure why this is done, but from a quick look at the source code there is a point where two arguments and ... are passed to the 'converted' version of your function. (Technically no conversion happens here; it only takes place if you pass a formula instead of a function.) So my best guess is that this is the reason: the function has to cope with at least two arguments, and if it doesn't need them it must have ... to 'absorb' them, otherwise the call would fail.
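To make the two arguments visible, here is a small sketch on invented data (the toy table and its columns are not from the question). As far as I know, group_modify() calls the function with the rows of one group and a one-row tibble holding that group's key, so either form below should be accepted.

```r
library(dplyr)

# Toy data, invented for illustration.
toy <- tibble(shop = c("a", "a", "b"), amount = c(1, 5, 3))

# Two-argument form: `rows` is the data for one group, `key` is a one-row
# tibble holding that group's value of `shop`.
by_two_args <- toy %>%
  group_by(shop) %>%
  group_modify(function(rows, key) tibble(max_amount = max(rows$amount)))

# One argument plus `...`: the key is still passed, but absorbed and ignored.
by_dots <- toy %>%
  group_by(shop) %>%
  group_modify(function(rows, ...) tibble(max_amount = max(rows$amount)))
```

Both calls give one row per shop; the second simply never looks at the key.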

Calculating distance using latitude and longitude error [duplicate]

When working with R I frequently get the error message "subscript out of bounds". For example:
# Load necessary libraries and data
library(igraph)
library(NetData)
data(kracknets, package = "NetData")
# Reduce dataset to nonzero edges
krack_full_nonzero_edges <- subset(krack_full_data_frame, (advice_tie > 0 | friendship_tie > 0 | reports_to_tie > 0))
# Convert to graph data frame
krack_full <- graph.data.frame(krack_full_nonzero_edges)
# Set vertex attributes
for (i in V(krack_full)) {
for (j in names(attributes)) {
krack_full <- set.vertex.attribute(krack_full, j, index=i, attributes[i+1,j])
}
}
# Calculate reachability for each vertex
reachability <- function(g, m) {
reach_mat = matrix(nrow = vcount(g),
ncol = vcount(g))
for (i in 1:vcount(g)) {
reach_mat[i,] = 0
this_node_reach <- subcomponent(g, (i - 1), mode = m)
for (j in 1:(length(this_node_reach))) {
alter = this_node_reach[j] + 1
reach_mat[i, alter] = 1
}
}
return(reach_mat)
}
reach_full_in <- reachability(krack_full, 'in')
reach_full_in
This generates the following error: Error in reach_mat[i, alter] = 1 : subscript out of bounds.
However, my question is not about this particular piece of code (even though it would be helpful to solve that too), but my question is more general:
What is the definition of a subscript-out-of-bounds error? What causes it?
Are there any generic ways of approaching this kind of error?

This happens because you are trying to access an array outside its bounds.
I will show you how you can debug such errors.
I set options(error=recover)
I run reach_full_in <- reachability(krack_full, 'in')
I get :
reach_full_in <- reachability(krack_full, 'in')
Error in reach_mat[i, alter] = 1 : subscript out of bounds
Enter a frame number, or 0 to exit
1: reachability(krack_full, "in")
I enter 1 and I get
Called from: top level
I type ls() to see my current variables
[1] "*tmp*" "alter" "g" "i" "j" "m" "reach_mat" "this_node_reach"
Now I inspect the values and dimensions of my variables:
Browse[1]> i
[1] 1
Browse[1]> j
[1] 21
Browse[1]> alter
[1] 22
Browse[1]> dim(reach_mat)
[1] 21 21
You can see that alter is out of bounds (22 > 21) in the line:
reach_mat[i, alter] = 1
To avoid such errors, I personally do the following:
Use the apply family of functions; they are safer than for loops.
Use seq_along rather than 1:n (which becomes 1:0 when n is 0).
Look for a vectorized solution where possible, to avoid mat[i,j] index access.
EDIT: vectorizing the solution
For example, here I see that you don't use the fact that set.vertex.attribute is vectorized.
You can replace:
# Set vertex attributes
for (i in V(krack_full)) {
for (j in names(attributes)) {
krack_full <- set.vertex.attribute(krack_full, j, index=i, attributes[i+1,j])
}
}
by this:
## set.vertex.attribute is vectorized!
## no need to loop over vertices!
for (attr in names(attributes))
  krack_full <- set.vertex.attribute(krack_full,
                                     attr, value = attributes[,attr])

It just means that either alter > ncol( reach_mat ) or i > nrow( reach_mat ). In other words, your indices exceed the array boundary (i is greater than the number of rows, or alter is greater than the number of columns).
Run the two comparisons above to see which one fails and when.
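As a small illustration of those comparisons, the sketch below reproduces the error on a 3 x 3 matrix. The matrix and index values are invented for the example; only the variable names i and alter are taken from the question.

```r
m <- matrix(0, nrow = 3, ncol = 3)
i <- 1
alter <- 4   # one past the last column

# The two comparisons from the answer, combined into a single check.
in_bounds <- i <= nrow(m) && alter <= ncol(m)

# Attempt the assignment anyway and capture the error message instead of stopping.
msg <- tryCatch({
  m[i, alter] <- 1
  "ok"
}, error = function(e) conditionMessage(e))
```

Here in_bounds is FALSE and msg holds the error text, so the check flags the problem before the assignment is attempted.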

Just an addition to the responses above: another possibility is that you are asking for an object that, for some reason, is no longer available. For example, if you subset by row names or column names, you get this error when the requested row or column is no longer part of the data matrix or data frame.
Solution, as a short version of the responses above: find the last row name or column name that works; the next one requested is the one that could not be found.
If you run parallel code such as "foreach", convert it to a plain for loop first so that you can troubleshoot it.

If this helps anybody, I encountered this while using purrr::map() with a function I wrote, which was something like this:
find_nearby_shops <- function(base_account) {
states_table %>%
filter(state == base_account$state) %>%
left_join(target_locations, by = c('border_states' = 'state')) %>%
mutate(x_latitude = base_account$latitude,
x_longitude = base_account$longitude) %>%
mutate(dist_miles = geosphere::distHaversine(p1 = cbind(longitude, latitude),
p2 = cbind(x_longitude, x_latitude))/1609.344)
}
nearby_shop_numbers <- base_locations %>%
split(f = base_locations$id) %>%
purrr::map_df(find_nearby_shops)
I would sometimes get this error with samples, but most of the time I wouldn't. The root of the problem was that one of the states in the base_locations table (PR) did not exist in states_table, so I had effectively filtered out everything and passed an empty table on to mutate. The moral of the story is that you may have a data issue and not (just) a code problem, so you may need to clean your data.
Thanks to agstudy and zx8754 for their answers above, which helped with the debugging.
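One way to guard against this kind of data gap is to check for an empty result before computing anything on it. The sketch below uses base R and an invented two-row lookup table; lookup_neighbours is a hypothetical helper, not part of the code above.

```r
# Hypothetical lookup table; "PR" is deliberately missing.
states_table <- data.frame(state = c("NY", "NJ"),
                           border_states = c("NJ", "NY"))

lookup_neighbours <- function(state_code) {
  hits <- states_table[states_table$state == state_code, , drop = FALSE]
  if (nrow(hits) == 0) {
    return(NULL)   # nothing matched, so skip the distance calculation entirely
  }
  hits
}

found <- lookup_neighbours("NY")
missing_state <- lookup_neighbours("PR")
```

Returning NULL for the unmatched state lets the caller drop it or report it, rather than passing an empty table further down the pipeline.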

I sometimes encounter the same issue. I can only answer your second bullet, because I am not as expert in R as I am in other languages. I have found that the standard for loop can give unexpected results. Say x = 0:
for (i in 1:x) {
print(i)
}
The output is
[1] 1
[1] 0
Whereas in Python, for example,
for i in range(x):
    print(i)
does nothing. The loop is not entered.
I expected that in R the loop would likewise not be entered if x = 0. However, 1:0 is a valid range of numbers. I have not yet found a good workaround besides wrapping the for loop in an if statement.
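One commonly used alternative is seq_len() (or seq_along() for an existing vector), which yields an empty sequence when the length is 0. The sketch below counts loop iterations for each form; the counter names are made up for the example.

```r
x <- 0

# 1:x counts down from 1 to 0, so the loop body runs twice.
runs_colon <- 0
for (i in 1:x) runs_colon <- runs_colon + 1

# seq_len(x) is empty when x is 0, so the loop body never runs.
runs_seq_len <- 0
for (i in seq_len(x)) runs_seq_len <- runs_seq_len + 1

# seq_along() does the same for an existing, possibly empty, vector.
runs_seq_along <- 0
for (i in seq_along(character(0))) runs_seq_along <- runs_seq_along + 1
```

With these forms no surrounding if statement is needed to skip the loop for empty input.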

This comes from Stanford's free SNA tutorial,
which states that ...
# Reachability can only be computed on one vertex at a time. To
# get graph-wide statistics, change the value of "vertex"
# manually or write a for loop. (Remember that, unlike R objects,
# igraph objects are numbered from 0.)
OK, so whenever you use igraph, the first row/column is 0 rather than 1, while a matrix starts at 1. Any index passed to igraph therefore needs x-1, as in
this_node_reach <- subcomponent(g, (i - 1), mode = m)
but in the alter calculation there is a typo:
alter = this_node_reach[j] + 1
Delete the +1 and it will work.

What did it for me was going back through the code, checking for errors or uncertain changes, and focusing on need-to-have over nice-to-have.

find_element in dataframe in R

I am new to R. I want to define an R function, find_element, that takes a list and a value of any type as its inputs and returns the element of the list that matches the value. Thanks for your help.
find_element <- function(arr, val){
count = 0
for(i in arr){
if (i == val){
print(count)
} else
count = count + 1
print ("No Match")
}
}
e.g.
arr <- 1:10
find_element(arr, 10)
# 10
find_element(arr, 12)
# NULL

Just for educational purposes, please try the following (although this is not recommended practice in R!):
find_element <- function(arr, val) {
count = 1
for (i in arr) {
if (i == val) {
return(count)
} else
count = count + 1
}
return("No Match")
}
This will yield
arr <- 1:10
find_element(arr, 10)
#[1] 10
find_element(arr, 12)
#[1] "No Match"
Please note:
In R, elements of vectors etc. are numbered starting from 1.
You have to use return instead of print to indicate the return value of a function (I know there's a shortcut, but this is for the purpose of education).
The final return must come after the for loop.
Built-in function
Also for educational purposes, please note that Sotos has already shown the R way in his comment:
which(arr == 10)
#[1] 10
which(arr == 12)
#integer(0)
In R, it's almost always better to use the well-documented built-in functions or those from packages. And, yes, try to avoid for loops in R.
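If a function with the question's interface is still wanted, it can be a thin wrapper around a built-in. The sketch below uses match() and returns NULL when nothing is found, mirroring the expected output in the question; the name find_element_builtin is made up here, and it returns the position of the first match, as the loop version above does.

```r
# Hypothetical wrapper around the built-in match().
find_element_builtin <- function(arr, val) {
  idx <- match(val, arr)   # position of the first match, NA if there is none
  if (is.na(idx)) NULL else idx
}

arr <- 1:10
hit <- find_element_builtin(arr, 10)
miss <- find_element_builtin(arr, 12)
```

Unlike which(), match() stops at the first occurrence, so it always returns a single position.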
Learning R online
As pointed out in the (now deleted) answer by engAnt, there are several resources for learning R; https://www.rstudio.com/online-learning/#R lists a number of them.

input array in R

Hello, please find my data and code below. I want to add values to my array based on certain condition checks. If the values are eligible, they should be added to the array; otherwise they should be discarded. However, I am unable to get the required array. Any help would be much appreciated.
>NODE_1
[1]GTTGGCCGAGCCCCAGGACGCGTGGTTGTTGAACCAGATCAGGTCCGGGCTCCACTGCACGTAGTCCTCTTCCCAATTTCCCTTAA
>NODE_2
[1] CCTCCGGCGGCACCACGGTCGGCGAGGCCCTCAACATCCTG GAGCGCACCGACCTGTCCACCGCGGACAAGGCCGGTTACCT
GCACCGCTACATCGAGGCCAGCCGCATCGCGTTCGCGGACC
>NODE_3
[1]GCCCGGCGCCTGGCCGCGGGCGAGTGGGTCGTGGACCTGCGCTCCCGGGTGGCCTTCGCCGCCGGTCACGTCGCCGGG
TCGCTCAACTTCGAGGCCGACGGACAGCT
My code is:
Length <- function(a)
{
b<-list()
for ( i in 1: length(a))
{
b[i]<-which(length(a[i])<30, arr.ind = FALSE, useNames = TRUE)
m<- array(b[i])
}
}
k<- Length(Y)
So what I want to do is add to array b only those entries of Y whose length is less than 30.

You should use nchar() instead of length() to get the number of characters.
And to do it the R way, you can use a logical index: k <- a[nchar(a)<30]
Hope that helps!
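A short sketch of that filter on made-up sequences (the strings below are invented and much shorter than the real data; only the NODE_ naming follows the question):

```r
# Invented example sequences, named like the nodes in the question.
seqs <- c(NODE_1 = "GTTGGCC",
          NODE_2 = strrep("ACGT", 10),   # 40 characters
          NODE_3 = "GCCCGGCGCC")

# length(seqs) counts elements (3); nchar(seqs) counts characters per element.
short <- seqs[nchar(seqs) < 30]
```

Only NODE_1 and NODE_3 are kept, because NODE_2 has 40 characters.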
