javaslang List of Tuple2 to Map transformation - functional-programming

What is the most idiomatic way to transform a Stream<Tuple2<T,U>> into a Map<T,List<U>> with javaslang 2.1.0-alpha?
// initial stream
Stream.of(
    Tuple.of("foo", "x"),
    Tuple.of("foo", "y"),
    Tuple.of("bar", "x"),
    Tuple.of("bar", "y"),
    Tuple.of("bar", "z")
)
should become:
// end result
HashMap.ofEntries(
    Tuple.of("foo", List.of("x", "y")),
    Tuple.of("bar", List.of("x", "y", "z"))
);

@Opal is right, foldLeft is the way to go in order to create a HashMap from Tuples.
In javaslang 2.1.0-alpha we additionally have Multimap to represent Tuples in a Map-like data structure.
// = HashMultimap[List]((foo, x), (foo, y), (bar, x), (bar, y), (bar, z))
Multimap<String, String> map =
    HashMultimap.withSeq().ofEntries(
        Tuple.of("foo", "x"),
        Tuple.of("foo", "y"),
        Tuple.of("bar", "x"),
        Tuple.of("bar", "y"),
        Tuple.of("bar", "z")
    );
// = Some(List(x, y))
map.get("foo");
(See also HashMultimap Javadoc)
Depending on how the map is further processed, this might come in handy.
Disclaimer: I'm the creator of Javaslang

I'm not sure this is the most idiomatic way, but this is a job for foldLeft:
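// Explicit type arguments on empty() so the accumulator is typed as
// HashMap<String, List<String>> (javaslang collections, not java.util).
Stream
    .of(
        Tuple.of("foo", "x"),
        Tuple.of("foo", "y"),
        Tuple.of("bar", "x"),
        Tuple.of("bar", "y"),
        Tuple.of("bar", "z")
    )
    .foldLeft(
        HashMap.<String, List<String>>empty(),
        (map, tuple) ->
            map.put(tuple._1, map.getOrElse(tuple._1, List.<String>empty()).append(tuple._2))
    );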
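For comparison, the same grouping can be sketched with only the JDK's java.util.stream API and no javaslang. This is not the javaslang approach from the answers, just a plain-Java illustration of the fold. The class name GroupPairs and the helper group are made up for this example, and Map.entry assumes Java 9 or later.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GroupPairs {

    // Hypothetical helper: groups (key, value) pairs by key and collects the
    // values of each key into a list, preserving encounter order.
    static <K, V> Map<K, List<V>> group(Stream<Map.Entry<K, V>> pairs) {
        return pairs.collect(Collectors.groupingBy(
                Map.Entry::getKey,
                LinkedHashMap::new, // keeps keys in first-seen order
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        Map<String, List<String>> result = group(Stream.of(
                Map.entry("foo", "x"),
                Map.entry("foo", "y"),
                Map.entry("bar", "x"),
                Map.entry("bar", "y"),
                Map.entry("bar", "z")));
        System.out.println(result); // prints {foo=[x, y], bar=[x, y, z]}
    }
}
```

Unlike the javaslang version, the resulting map and lists are mutable java.util collections, so this only mirrors the shape of the result, not its persistence guarantees.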

Related

Using map and pluck to get values from nested list

I have the following nasty, nested list
Edit: updated to include value_I_dont_want
mylist <- list(
  list(
    nested_1 = list(
      nested_2 = list(
        list(value_I_want = "a", value_I_dont_want = "f"),
        list(value_I_want = "b", value_I_dont_want = "g")
      )
    )
  ),
  list(
    nested_1 = list(
      nested_2 = list(
        list(value_I_want = "c", value_I_dont_want = "h"),
        list(value_I_want = "d", value_I_dont_want = "i"),
        list(value_I_want = "e", value_I_dont_want = "j")
      )
    )
  )
)
And I want to get all the value_I_wants
I know I can use the following code within a for loop
mylist[[x]]$nested_1$nested_2[[y]]$value_I_want
But I want to improve my map skills. I understand how to use map_chr when the list is a single level, but I haven't found many resources on plucking from deeply nested lists. I also know I can use [[ but haven't found good documentation on when this is appropriate.
Any help appreciated!
If we need the 'yay's
library(purrr)
library(dplyr)
map(mylist, ~ .x$nested_1$nested_2 %>% unlist %>% grep("^yay", ., value = TRUE))
Or use pluck to extract the elements based on the key 'value_I_want' after looping over the list with map
map(mylist, ~ .x$nested_1$nested_2 %>%
map(pluck, "value_I_want") )
A more general solution that requires we only know how deeply the desired values are nested:
map(mylist, ~pluck(.,1,1) %>% map(pluck, "value_I_want"))
The second pluck operates on the nesting level set by the first pluck.
This can also work on nested lists that are missing intermediate names, as often found in JSON data pulled from the internet.

Recycle function parameters instead of writing out each time

I have a custom function fit_xgb which takes several parameters:
fit_xgb(train_df = training_data,
test_df = testing_data,
training_features = c("spender", "spend_7d", "spend_30d", "d7_utility_sum", "recent_utility_ratio", "IOS",
"is_publisher_organic", "is_publisher_facebook"),
hyper_param = hyperparam_value,
binary_target = "spender",
regression_target = paste0("spend_", day_m, "d"),
spend_from = paste0("spend_", day_n, "d"),
spend_to = paste0("spend_", day_m, "d"))
I have another custom function fit_rf that takes the same parameters:
fit_rf(train_df = training_data,
test_df = testing_data,
training_features = c("spender", "spend_7d", "spend_30d", "d7_utility_sum", "recent_utility_ratio", "IOS",
"is_publisher_organic", "is_publisher_facebook"),
hyper_param = hyperparam_value,
binary_target = "spender",
regression_target = paste0("spend_", day_m, "d"),
spend_from = paste0("spend_", day_n, "d"),
spend_to = paste0("spend_", day_m, "d"))
Rather than spell out the params each time I call either of these two functions, I'd like to create a single variable that I can call once:
model_function_params <- list(
train_df = training_data,
test_df = testing_data,
training_features = c("spender", "spend_7d", "spend_30d", "d7_utility_sum", "recent_utility_ratio", "IOS",
"is_publisher_organic", "is_publisher_facebook"),
hyper_param = hyperparam_value,
binary_target = "spender",
regression_target = paste0("spend_", day_m, "d"),
spend_from = paste0("spend_", day_n, "d"),
spend_to = paste0("spend_", day_m, "d"))
fit_rf(model_function_params)
fit_xgb(model_function_params)
This does not work. I know I could pass each list component explicitly, e.g.
train_df = model_function_params$train_df
test_df = model_function_params$test_df
etc.
But that is almost going to defeat the purpose of writing less code and keeping my script minimal.
Is there an elegant way of defining the function parameters once, then passing to either fit_rf or fit_xgb without having to specify the parameters twice in my code?

r function list all parameters including ellipses

Is there any way to list all parameters of an R function, including the ellipsis (the additional parameters passed via the three dots)? For example, I want to know the parameters of the "qplot" function. The only way I found is args(qplot), which returns:
> args(qplot)
function (x, y = NULL, ..., data, facets = NULL, margins = FALSE,
geom = "auto", xlim = c(NA, NA), ylim = c(NA, NA), log = "",
main = NULL, xlab = deparse(substitute(x)), ylab = deparse(substitute(y)),
asp = NA, stat = NULL, position = NULL)
But I really want to know which additional parameters can be passed to this function through the three dots, for example the "shape" parameter.
The three-dot ellipsis ... refers to any number of additional function arguments that get processed or passed on within the function body.
For example, in the case of qplot, the function body (which you can see if you execute qplot) reveals that any additional function arguments will be used as additional aesthetic specifications.
The relevant lines are:
arguments <- as.list(match.call()[-1])
env <- parent.frame()
aesthetics <- compact(arguments[.all_aesthetics])
where
.all_aesthetics <- c("adj", "alpha", "angle", "bg", "cex", "col", "color",
"colour", "fg", "fill", "group", "hjust", "label", "linetype", "lower",
"lty", "lwd", "max", "middle", "min", "pch", "radius", "sample", "shape",
"size", "srt", "upper", "vjust", "weight", "width", "x", "xend", "xmax",
"xmin", "xintercept", "y", "yend", "ymax", "ymin", "yintercept", "z")
The definition of .all_aesthetics can be found here.

Automatically split function output (list) into component data.frames

I have a function that yields 2 dataframes. As functions can only return one object, I combined these dataframes into a list. However, I need to work with both dataframes separately. Is there a way to automatically split the list into the component dataframes, or to write the function in a way that both objects are returned separately?
The function:
install.packages("plyr")
require(plyr)
fun.docmerge <- function(x, y, z, crit, typ, doc = checkmerge) {
  mergedat <- paste(deparse(substitute(x)), "+",
                    deparse(substitute(y)), "=", z)
  countdat <- nrow(x)
  check_t1 <- data.frame(mergedat, countdat)
  z1 <- join(x, y, by = crit, type = typ)
  countdat <- nrow(z1)
  check_t2 <- data.frame(mergedat, countdat)
  doc <- rbind(doc, check_t1, check_t2)
  t1 <- list()
  t1[["checkmerge"]] <- doc
  t1[[z]] <- z1
  return(t1)
}
This is the call to the function, saving the result list to the new object results.
results <- fun.docmerge(x = df1, y = df2, z = "df3", crit = c("id"), typ = "left")
The following sample data replicate the problem:
df1 <- structure(list(id = c("XXX1", "XXX2", "XXX3",
"XXX4"), tr.isincode = c("ISIN1", "ISIN2",
"ISIN3", "ISIN4")), .Names = c("id", "isin"
), row.names = c(NA, 4L), class = "data.frame")
df2 <- structure(list(id= c("XXX1", "XXX5"), wrong= c(1L,
1L)), .Names = c("id", "wrong"), row.names = 1:2, class = "data.frame")
checkmerge <- structure(list(mergedat = structure(integer(0), .Label = character(0), class = "factor"),
countdat = numeric(0)), .Names = c("mergedat", "countdat"
), row.names = integer(0), class = "data.frame")
In the example, a list with the dataframes df3 and checkmerge is returned. I need both dataframes separately. I know that I could do it via manual assignment (e.g., checkmerge <- results$checkmerge), but I want to eliminate manual changes as much as possible and am therefore looking for an automated way.

PrevR - could not find function "as.prevR"

I am trying to use PrevR (https://cran.r-project.org/web/packages/prevR/prevR.pdf), and I am at the very beginning.
When I run this code:
install.packages('prevR')
par(ask = TRUE)
col <- c(id = "cluster",
x = "x",
y = "y",
n = "n",
pos = "pos",
c.type = "residence",
wn = "weighted.n",
wpos = "weighted.pos"
)
dhs <- as.prevR(fdhs.clusters,col, fdhs.boundary)
I get this error message:
Error in as.prevR(fdhs.clusters, col, fdhs.boundary) :
could not find function "as.prevR"
I thought the function as.prevR was included in the prevR package I installed. Am I missing something?
Thank you
