creating subvectors from two vectors in clojure - vector

Simple question here. How do I go from having the two vectors [1 2 3] [5 7 9] to
having this: [1 5] [2 7] [3 9]?
I tried this: (map concat [1 2 3] [4 5 6]),
but I get "Don't know how to create ISeq from: java.lang.Long".

Use map vector instead. concat calls seq on each of its arguments, and a plain number such as 1 is not seqable, which is where the ISeq error comes from; map vector simply pairs up the corresponding elements:
(map vector [1 2 3] [5 7 9])
;=> ([1 5] [2 7] [3 9])

Related

How to filter data frame rows in a more decent way in R, given that each cell is a list of lists?

I have been working on a project that analyzes organizational members' data. One approach is to use geocoding to get each member's location data. I have already gathered the relevant information from Google, but some records still cannot be processed properly.
I would like to first filter out the rows that contain nothing useful inside the list. Yet, because the data is a list of lists, I cannot find a proper way to filter them all effectively.
The target column I aim to process:
> family[4]
# A tibble: 5,324 x 1
district
<list>
1 <named list [2]>
2 <named list [2]>
3 <tibble [1 x 2]>
4 <named list [2]>
5 <named list [2]>
6 <tibble [1 x 2]>
7 <named list [2]>
8 <named list [2]>
9 <named list [2]>
10 <named list [2]>
# ... with 5,314 more rows
An example of the structure of a valid output (I hid most of the information because of sensitivity):
> family[4][[1]][[1]]
$results
$results[[1]]
$results[[1]]$address_components
$results[[1]]$address_components[[1]]
$results[[1]]$address_components[[1]]$long_name
[1] "xxxxxxxxxxxxxxxx"
$results[[1]]$address_components[[1]]$short_name
[1] "xxxxxxxxxxxxxxxx"
$results[[1]]$address_components[[1]]$types
$results[[1]]$address_components[[1]]$types[[1]]
[1] "premise"
$results[[1]]$address_components[[2]]
$results[[1]]$address_components[[2]]$long_name
[1] "xxxxxxxxxxxxxxxx"
$results[[1]]$geometry$viewport$northeast$lat
[1] xxxxxxxxxxxxxxxx
$results[[1]]$geometry$viewport$northeast$lng
[1] xxxxxxxxxxxxxxxx
$results[[1]]$geometry$viewport$southwest
$results[[1]]$geometry$viewport$southwest$lat
[1] xxxxxxxxxxxxxxxx
$results[[1]]$geometry$viewport$southwest$lng
[1] xxxxxxxxxxxxxxxx
$results[[2]]$geometry$viewport
$results[[2]]$geometry$viewport$northeast
$results[[2]]$geometry$viewport$northeast$lat
[1] xxx.xx
$results[[2]]$geometry$viewport$northeast$lng
[1] xxx.xx
$results[[2]]$geometry$viewport$southwest
$results[[2]]$geometry$viewport$southwest$lat
[1] xxx.xx
$results[[2]]$geometry$viewport$southwest$lng
[1] xxx.xx
$results[[2]]$place_id
[1] "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
$results[[2]]$types
$results[[2]]$types[[1]]
[1] "establishment"
$results[[2]]$types[[2]]
[1] "point_of_interest"
$status
[1] "OK"
The invalid output that I would like to filter out:
> family[4][[1]][[3]]
# A tibble: 1 x 2
lon lat
<dbl> <dbl>
1 NA NA
Questions:
What code can extract the rows with valid outcomes (i.e. keep the <named list [2]> entries and filter out the <tibble [1 x 2]> entries) from the data frame?
Is there a way to extract only the desired attributes from the list of lists into a new column of the data frame?
For example, the lat and lng data:
$results[[2]]$geometry$viewport$northeast$lat
[1] xxx.xx
$results[[2]]$geometry$viewport$northeast$lng
[1] xxx.xx
Here's a simple MCVE, which is what your question is currently missing, that may solve your problem.
It builds a function that returns a logical vector that is then used to index a list:
x <- list(list(3, 4), list(x = 5, y = 6))  # a minimal stand-in for two of your list entries
dput(x)  # This is what you should use to illustrate your problem.
# list(list(3, 4), list(x = 5, y = 6))
is_named <- function(x) sapply(x, function(z) !is.null(names(z)))
is_named(x)
# [1] FALSE  TRUE
x[is_named(x)]
# [[1]]
# [[1]]$x
# [1] 5
#
# [[1]]$y
# [1] 6
You might need to make this recursive. And you might need to add a test for "list-ness".
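For the tibble shown above, a sketch along the same lines might look like the following. This is an illustration only: it assumes the data frame is family, the list-column is district, that the valid entries are the geocoding-result lists while the invalid ones are the <tibble [1 x 2]> placeholders, and the northeast lat/lng path is just one example attribute.
library(dplyr)
library(purrr)

# keep rows whose district entry is a geocoding-result list,
# dropping the <tibble [1 x 2]> placeholder rows
valid <- family %>%
  filter(map_lgl(district, ~ !inherits(.x, "data.frame")))

# pull one nested attribute out into a regular column; pluck() falls back to
# the .default value whenever part of the path is missing
valid <- valid %>%
  mutate(ne_lat = map_dbl(district, ~ pluck(.x, "results", 1, "geometry",
                                            "viewport", "northeast", "lat",
                                            .default = NA_real_)),
         ne_lng = map_dbl(district, ~ pluck(.x, "results", 1, "geometry",
                                            "viewport", "northeast", "lng",
                                            .default = NA_real_)))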

How to apply the NADA package (censtats) to a nested data frame in R

I'm trying to improve my R code by removing plenty of for loops.
I would like to apply censtats from the NADA package to all of my data, grouped by several factors.
Here is an example of my code (with the for loops) using a simple data set:
Data <- data.frame("A" = c("a","a","a","a","b","b","b","b"),
                   "B" = c("c","c","c","d","c","c","d","d"),
                   "X" = c(2,1,3,1,1,2,1,1),
                   "Y" = c(FALSE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE),
                   "Z" = c(1,1,1,0,1,1,0,0))
Data_calc <- data.frame(  # create an empty data frame to grow at each loop iteration
  "K-M" = numeric(), check.names = FALSE,  # result of censtats
  MLE = numeric(),                         # result of censtats
  ROS = numeric(),                         # result of censtats
  A = factor(),
  B = factor(),
  stringsAsFactors = FALSE)
List_A <- unique(Data$A)
List_B <- unique(Data$B)
for (a in seq_along(List_A)) {
  for (b in seq_along(List_B)) {
    Temp <- subset(Data, A == List_A[a] & B == List_B[b]) # subset by A and B
    if (nrow(Temp) > 1) {   # condition 1 required by censtats
      if (Temp$Z > 0) {     # condition 2 required by censtats
        Temp <- censtats(Temp$X, Temp$Y)
        Temp$myNames <- rownames(Temp)                # formatting the results
        Temp <- spread(Temp[c(2, 4)], myNames, mean)
        Temp$A <- List_A[a]
        Temp$B <- List_B[b]
        Data_calc <- bind_rows(Data_calc, Temp)
      } else {}
    } else {}
  }
}
These are the results we obtain:
> Data_calc
K-M MLE ROS A B
1 2.333333 1.977163 1.991738 a c
2 2.000000 1.369061 2.000000 b c
In order to improve my code, I would like to remove the loops by grouping on the factors using nest().
Data_nest <- nest(group_by(Data, A, B))
> Data_nest
# A tibble: 4 x 3
A B data
<fct> <fct> <list>
1 a c <tibble [3 x 3]>
2 a d <tibble [1 x 3]>
3 b c <tibble [2 x 3]>
4 b d <tibble [2 x 3]>
I'm stuck here: before using censtats I have to apply conditions 1 and 2, but I cannot figure out how to apply the conditions row by row.
Could anyone tell me the best solution (with or without nest) to improve the code? In reality my database has 4 factors and almost 2000 rows containing a list, and the loop method takes a lot of time.
Thanks in advance.
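One possible loop-free direction, offered only as a sketch (it is not tested against NADA output, and it interprets condition 2 as all Z values being positive), is to wrap the per-group logic from the loop in a function and map it over the nested tibbles:
library(dplyr)
library(tidyr)
library(purrr)
library(NADA)

calc_censtats <- function(df) {
  # conditions 1 and 2 from the loop version
  if (nrow(df) < 2 || !all(df$Z > 0)) return(NULL)
  res <- censtats(df$X, df$Y)
  res$myNames <- rownames(res)
  spread(res[c(2, 4)], myNames, mean)   # same reshaping as in the loop version
}

Data_calc <- Data %>%
  group_by(A, B) %>%
  nest() %>%
  mutate(stats = map(data, calc_censtats)) %>%
  select(-data) %>%
  unnest(stats)   # groups where calc_censtats() returned NULL are dropped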

sum by groups in matrix in R

So I am starting to learn R and don't know if there is an easy way to sum every n elements of a matrix along each row, wrapping the window around to the first columns when it runs past the last one, until all the columns have been computed.
[1 4 7]
[2 5 8]
[3 6 9]
so in this case if n=2 the output should be
[5 11 8]
[7 13 10]
[9 15 12]
Is there an efficient way? Thank you!
data:
m <- matrix(1:9, 3, 3)
setting:
n = 2
code:
t(
apply(m, 1, function(x) { zoo::rollsum(c(x,x), n, align = "left")[seq_along(x)] })
)
result:
# [,1] [,2] [,3]
#[1,] 5 11 8
#[2,] 7 13 10
#[3,] 9 15 12
Your homework :-)
Make the next question a clear one, and read about every function I have used: e.g. type ?t, ?apply, etc. into the R console.
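If you would rather avoid the zoo dependency, here is a base-R sketch of the same wrap-around idea (my own illustration, not part of the original answer):
rowroll <- function(m, n) {
  t(apply(m, 1, function(x) {
    # for each starting position, sum the next n elements, wrapping around
    sapply(seq_along(x), function(i) {
      pos <- ((i - 1) + seq_len(n) - 1) %% length(x) + 1
      sum(x[pos])
    })
  }))
}
rowroll(matrix(1:9, 3, 3), 2)
#      [,1] [,2] [,3]
# [1,]    5   11    8
# [2,]    7   13   10
# [3,]    9   15   12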

How to use R purrr::map to create multiple objects of the same kind (such as data frames)

I want to know to what extent it is possible to use purrr's mapping functions to create objects in general, though at the moment and with the example below I'm looking at data frames.
A<-seq(1:5)
B<-seq(6:10)
C<-c("x","y","x","y","x")
dat <- data.frame(A, B, C)
cols <- names(dat)
create_df <- function(x) {
  x <- dat[x]
  return(x)
}
A<-create_df("A")
This will create a data frame called A with column A from dat. I want to create data frames A/B/C, each with one column. I have tried different ways of specifying the .f argument as well as different map functions (map, map2, map_dfc, etc.). My original best guess:
map(.x=cols,~create_df(.x))
Clarification: I am asking for help because all of the specifications of map that I have tried have given an error.
Code that worked:
map(names(dat), ~assign(.x, dat[.x], envir = .GlobalEnv))
This creates A/B/C as data frames and prints to the console (which I don't need but does not bother me for now).
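As a side note (my suggestion, not from the original post), purrr::walk() runs the same assignments purely for their side effects, so nothing is echoed to the console:
library(purrr)
walk(names(dat), ~assign(.x, dat[.x], envir = .GlobalEnv))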
Using the purrr package, I think your custom function is not necessary.
The function hard-codes a reference to the data, which is not optimal (especially if the data doesn't exist in the environment).
To return a list of single-column data frames:
cols<-names(dat)
map(cols, ~dat[.x])
or alternatively: map(names(dat), ~dat[.x])
returns:
[[1]]
# A tibble: 5 x 1
A
<int>
1 1
2 2
3 3
4 4
5 5
[[2]]
# A tibble: 5 x 1
B
<int>
1 1
2 2
3 3
4 4
5 5
[[3]]
# A tibble: 5 x 1
C
<chr>
1 x
2 y
3 x
4 y
5 x
If you want to stick with tidyverse principles, you can store them within a dataframe as a list-column.
dfs <-
data_frame(column = cols) %>%
mutate(data = map(cols, ~dat[.x]))
# A tibble: 3 x 2
column data
<chr> <list>
1 A <tibble [5 x 1]>
2 B <tibble [5 x 1]>
3 C <tibble [5 x 1]>
You can pull out individual data as needed:
B <- dfs$data[[2]]
# A tibble: 5 x 1
B
<int>
1 1
2 2
3 3
4 4
5 5
Along the lines of your original suggestion, here's an alternative function that uses purrr::map within it. I'm not sure how good an idea this is, but maybe it has a use:
create_objects_from_df <- function(dat) {
  map(names(dat), ~assign(.x, dat[.x], envir = .GlobalEnv))
}
create_objects_from_df(dat)
This creates the objects in your global environment, as individual objects with the column names.
We can use split from base R to get a list of one-column data.frames:
lst <- split.default(dat, names(dat))
It is better to keep it in a list, but if the intention is to have multiple objects in the global environment
list2env(lst, envir = .GlobalEnv)
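A quick usage sketch with the dat defined in the question:
lst <- split.default(dat, names(dat))
lst$B                               # each element is a one-column data frame
#   B
# 1 1
# 2 2
# 3 3
# 4 4
# 5 5
list2env(lst, envir = .GlobalEnv)   # now A, B and C exist as separate data frames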

Clojure: recur vs. recursion via fn name

I'm just a beginner in Clojure, and I've been trying the 4clojure.com problems. There I stumbled upon an exercise where I am supposed to write a flatten implementation.
I basically understand the concept of tail call optimization and how recur avoids consuming the stack, as opposed to "normal" recursion (I don't know if there's a proper term for it).
And that's why I don't get what is going on here:
(defn foo1 [x]
  (if (> x 0)
    (do (println x)
        (foo1 (dec x)))))

(defn foo2 [x]
  (if (> x 0)
    (do (println x)
        (recur (dec x)))))
As expected both foo1 and foo2 are the same functionally, but, given a parameter large enough (100000 in my case), I get a stack overflow™ on foo1 while foo2 completes normally.
Now, on to the flatten problem:
(defn flatten1 [ls]
  (mapcat
    #(if (coll? %)
       (flatten1 %)
       (list %))
    ls))

(defn flatten2 [ls]
  (mapcat
    #(if (coll? %)
       (recur %)
       (list %))
    ls))
Test case:
(flatten [1 [2] 3 [4 [5 6 [7] 8]]])
(flatten1 [1 [2] 3 [4 [5 6 [7] 8]]])
(flatten2 [1 [2] 3 [4 [5 6 [7] 8]]])
Expected result: '(1 2 3 4 5 6 7 8)
Well, flatten1 works OK (it's a small input anyway), but flatten2 just hangs indefinitely. Doesn't recur target the recursion point set by the defn? What's the difference (optimization aside) from recursing to the function by name?
By modifying the program a bit you can see the problem:
(ns clj.core
  (:require [tupelo.core :as t])
  (:gen-class))
(t/refer-tupelo)

(defn flatten1 [ls]
  (mapcat
    (fn [it]
      (println "f1: it=" it)
      (if (coll? it)
        (flatten1 it)
        (list it)))
    ls))

(defn flatten2 [ls]
  (mapcat
    (fn [it]
      (println "f2: it=" it)
      (if (coll? it)
        (recur it)
        (list it)))
    ls))
(defn -main
  [& args]
  (newline) (println "main - 1")
  (spyx (flatten [1 [2] 3 [4 [5 6 [7] 8]]]))
  (newline) (println "main - 2")
  (spyx (flatten1 [1 [2] 3 [4 [5 6 [7] 8]]]))
  (newline) (println "main - 3")
  (flatten2 [1 [2] 3 [4 [5 6 [7] 8]]]))
Running the code produces this output:
main - 1
(flatten [1 [2] 3 [4 [5 6 [7] 8]]]) => (1 2 3 4 5 6 7 8)
main - 2
f1: it= 1
f1: it= [2]
f1: it= 2
f1: it= 3
f1: it= [4 [5 6 [7] 8]]
f1: it= 4
f1: it= [5 6 [7] 8]
f1: it= 5
f1: it= 6
f1: it= [7]
f1: it= 7
f1: it= 8
(flatten1 [1 [2] 3 [4 [5 6 [7] 8]]]) => (1 2 3 4 5 6 7 8)
main - 3
f2: it= 1
f2: it= [2]
f2: it= [2]
f2: it= [2]
f2: it= [2]
f2: it= [2]
f2: it= [2]
f2: it= [2]
f2: it= [2]
So you can see it gets stuck on the [2] item, the 2nd element of the input list.
The reason this fails is that the recur statement only jumps back to the innermost function, which is the anonymous #(if ...) form in your original problem, or the (fn [it] ...) form in the 2nd version.
Note that recur can only "jump" to the innermost fn/loop target. You cannot use recur to jump out of your inner anonymous function to reach flatten2. Since recur only re-invokes that inner function, the 1-element collection [2] keeps being passed back to it instead of being handed to flatten2, and you therefore get the infinite loop.
The best advice for any programming is "keep it simple". Plain recursion is simpler than loop/recur for most problems.
On the JVM, each stack frame requires some memory; the per-thread stack size is controlled by the -Xss switch (the total heap by -Xmx), and if you use too many stack frames you will eventually overflow the stack. You should usually be able to count on at least 1000 stack frames being available (you can test what your machine & params allow if you like). So as a rule of thumb, if your recursion depth is 1000 or less, don't worry about using loop/recur.
