I have the following code:
generator <- function(n){
nodes <- c()
distances <- c()
for (main in 1:n){
for (i in 1:n) {
for (j in 1:n){
if (main != i & i != j & main != j & i < j){
nodes <- c(nodes, paste(main, i, j, sep=",", collapse=""))
distances <- c(distances, main+i+j)}}}}
data <- data.frame(nodes, distances)
return(data)}
Running generator(4), I get the following output:
nodes distances
1 1,2,3 6
2 1,2,4 7
3 1,3,4 8
4 2,1,3 6
5 2,1,4 7
6 2,3,4 9
7 3,1,2 6
8 3,1,4 8
9 3,2,4 9
10 4,1,2 7
11 4,1,3 8
12 4,2,3 9
What I would like is for the items in the "nodes" column to be actual vectors of values rather than strings, so that I can compare different triplets of nodes and find their common members. For instance, I would like a$nodes[1][1] to yield 1, or something along those lines, so I can extract every individual value from the node triplets.
I currently have paste(main, i, j, sep=",") and I tried replacing this with c(main, i, j) but what happened is that I got a data frame of 36 rows instead of twelve: each individual node was its own row.
Thank you.
You need to work with list columns to do what you want to do.
I suggest working with tibbles instead of regular data frames; they're much more convenient for this type of work.
If you'd rather stick with base R, you can replace data <- tibble::tibble(nodes, distances) with:
data <- data.frame(distances)
data$nodes <- nodes
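A minimal base R sketch of why the c(main, i, j) attempt produced 36 rows, while a list keeps one element per triplet. The names flat and nested are illustrative only and do not come from the original code.

```r
# c() flattens: appending two triplets yields six separate values
flat <- c()
flat <- c(flat, c(1, 2, 3))
flat <- c(flat, c(1, 2, 4))
length(flat)     # 6

# a list keeps each triplet together as a single element
nested <- list()
nested <- append(nested, list(c(1, 2, 3)))
nested <- append(nested, list(c(1, 2, 4)))
length(nested)   # 2

# base R data frame with a list column, built as described above
data <- data.frame(distances = c(6, 7))
data$nodes <- nested
data$nodes[[2]]  # the second triplet: 1 2 4
```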
new function
generator <- function(n){
nodes <- list() # changed this from `c()` to `list()`
distances <- c()
for (main in 1:n){
for (i in 1:n) {
for (j in 1:n){
if (main != i & i != j & main != j & i < j){
nodes <- append(nodes, list(c(main, i, j))) # changed this to append a list instead of a vector
distances <- c(distances, main+i+j)}}}}
data <- tibble::tibble(nodes, distances) # changed `data.frame` to `tibble`
return(data)}
output
df <- generator(4)
df
# A tibble: 12 x 2
# nodes distances
# <list> <int>
# 1 <int [3]> 6
# 2 <int [3]> 7
# 3 <int [3]> 8
# 4 <int [3]> 6
# 5 <int [3]> 7
# 6 <int [3]> 9
# 7 <int [3]> 6
# 8 <int [3]> 8
# 9 <int [3]> 9
# 10 <int [3]> 7
# 11 <int [3]> 8
# 12 <int [3]> 9
df$nodes[[1]]
# [1] 1 2 3
side note
You're growing a list, which can be slow. If you run into performance issues, try preallocating nodes with nodes <- vector("list", length = max_length) and trimming it at the end with nodes <- nodes[lengths(nodes)!=0].
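The side note could be sketched roughly as follows. The function name generator_prealloc, the counter k, and the upper bound n^3 for max_length are assumptions made for this illustration, and the result is returned as a base R data frame with a list column.

```r
generator_prealloc <- function(n) {
  max_length <- n^3  # assumed upper bound: every (main, i, j) combination
  nodes <- vector("list", length = max_length)
  distances <- integer(max_length)
  k <- 0L            # number of slots filled so far
  for (main in seq_len(n)) {
    for (i in seq_len(n)) {
      for (j in seq_len(n)) {
        # i < j already implies i != j
        if (main != i && main != j && i < j) {
          k <- k + 1L
          nodes[[k]] <- c(main, i, j)
          distances[k] <- main + i + j
        }
      }
    }
  }
  keep <- lengths(nodes) != 0  # trim the unused slots
  data <- data.frame(distances = distances[keep])
  data$nodes <- nodes[keep]
  data
}

df <- generator_prealloc(4)
nrow(df)                                 # 12
intersect(df$nodes[[1]], df$nodes[[6]])  # common members of 1,2,3 and 2,3,4
```

Filling preallocated slots avoids copying the whole list on every append, which is where growing a list inside a loop tends to get slow.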
Related
I'd like to compare the values of all the different list objects in R and then output or record the matching values, together with the indices of the list objects that contain each matching value.
Here is the code:
a <- c(10,23,50,7,3)
b <- c(1,1,2,2,3)
c <- c(33,24,4,10,1)
d <- c(1,4,7,8,3)
y <- c()
z <- c()
r <- list()
table <- data.frame(a,b,c,d)
for (i in 1:5){ for (j in 3:4){
x <- table$b[j]
for (l in which(table$b == x)){
if (abs((table$c[j] - table$a[i])-table$d[l])<10)
{y <- rbind(y, table[j,])}}
z <- y}
y <- c()
r[[i]] <- z}
The output:
> r
[[1]]
a b c d
4 7 2 10 8
41 7 2 10 8
[[2]]
NULL
[[3]]
NULL
[[4]]
a b c d
4 7 2 10 8
41 7 2 10 8
[[5]]
a b c d
3 50 2 4 7
31 50 2 4 7
4 7 2 10 8
41 7 2 10 8
I'd like to record the matching values between list objects and also the indices of the list objects that have these matching values. For example in this instance r[[1]] and r[[4]] have matching vector value 7 2 10 8, I'd like to write in the data frame both the vector and the indices of list object that have this value. How to do this?
EDIT.
I'd like the output to be a vector whose first element would be the matching vectors' 3rd value (c column) and the rest of the elements would be indices of the list objects.
In this case:
10
1
4
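One possible approach, sketched in base R under some assumptions: the list r is rebuilt by hand from the printed output, and each row is turned into a text key so that identical rows can be matched across list elements. The names row_keys, members, shared and result are illustrative. Note that in the printed output r[[5]] also contains the row 7 2 10 8, so this sketch reports index 5 as well as 1 and 4.

```r
# r rebuilt by hand from the printed output above
r <- list(
  data.frame(a = c(7, 7), b = c(2, 2), c = c(10, 10), d = c(8, 8)),
  NULL,
  NULL,
  data.frame(a = c(7, 7), b = c(2, 2), c = c(10, 10), d = c(8, 8)),
  data.frame(a = c(50, 50, 7, 7), b = c(2, 2, 2, 2),
             c = c(4, 4, 10, 10), d = c(7, 7, 8, 8))
)

# one text key per distinct row in each list element (none for NULL)
row_keys <- lapply(r, function(x) {
  if (is.null(x)) return(character(0))
  unique(do.call(paste, c(x, sep = ",")))
})

# for every key, collect the indices of the list elements containing it
idx <- rep(seq_along(row_keys), lengths(row_keys))
members <- split(idx, unlist(row_keys))
shared <- members[lengths(members) > 1]  # keep rows found in 2+ elements

# first element: the value from column c; the rest: the list indices
result <- lapply(names(shared), function(key) {
  c_value <- as.numeric(strsplit(key, ",")[[1]][3])
  c(c_value, shared[[key]])
})
result
```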
I'm trying to improve my R code by removing plenty of for loops.
I would like to apply censtats from the NADA package to all of my data, grouped by several factors.
Here is an example of my code (with the for loops) using a simple database:
Data <- data.frame("A"=c("a","a","a","a","b","b","b","b"), "B" = c("c","c","c","d","c","c","d","d"),"X"=c(2,1,3,1,1,2,1,1), "Y"=c(FALSE,TRUE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE), Z = c(1,1,1,0,1,1,0,0))
Data_calc <- data.frame( #create empty database to increment at each loop
"K-M"=numeric(),check.names=FALSE, #result of censtats
MLE=numeric(), #result of censtats
ROS=numeric(), #result of censtats
A=factor(),
B=factor(),
stringsAsFactors=FALSE)
List_A <- unique(Data$A)
List_B <- unique(Data$B)
for (a in seq_along(List_A)){
for (b in seq_along(List_B)){
Temp <- subset(Data, A == List_A[a] & B == List_B[b]) # subset by A and B
if (nrow(Temp) > 1){ #condition 1 required by censtats
if (all(Temp$Z > 0)) { #condition 2 required by censtats; all() gives if() the single TRUE/FALSE it needs
Temp <- censtats(Temp$X, Temp$Y) #formatting the results
Temp$myNames <- rownames(Temp)
Temp<- spread(Temp[c(2,4)], myNames, mean)
Temp$A <- List_A[a]
Temp$B <- List_B[b]
Data_calc <- bind_rows(Data_calc, Temp)
} else {}
} else {} }}
These are the results we obtain:
> Data_calc
K-M MLE ROS A B
1 2.333333 1.977163 1.991738 a c
2 2.000000 1.369061 2.000000 b c
In order to improve my code, I would like to remove the loops by grouping by the factors using nest().
Data_nest <- nest(group_by(Data, A, B))
> Data_nest
# A tibble: 4 x 3
A B data
<fct> <fct> <list>
1 a c <tibble [3 x 3]>
2 a d <tibble [1 x 3]>
3 b c <tibble [2 x 3]>
4 b d <tibble [2 x 3]>
I'm stuck here: before using censtats I have to apply conditions 1 and 2, but I cannot work out how to apply the conditions row by row.
Could anyone tell me the best solution (with or without nest) to improve the code? In reality my database has 4 factors and almost 2000 rows containing a list, and the loop method takes a lot of time.
Thanks in advance.
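A possible way to apply both conditions per group without explicit loops, sketched in base R. It assumes that condition 2 means every Z value in the group must be positive, which is only one reading of the loop above. The names groups and eligible are illustrative, and the censtats call is left as a comment because it needs the NADA package.

```r
Data <- data.frame(
  A = c("a", "a", "a", "a", "b", "b", "b", "b"),
  B = c("c", "c", "c", "d", "c", "c", "d", "d"),
  X = c(2, 1, 3, 1, 1, 2, 1, 1),
  Y = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE),
  Z = c(1, 1, 1, 0, 1, 1, 0, 0)
)

# one data frame per A/B combination, named like "a.c"
groups <- split(Data, list(Data$A, Data$B), drop = TRUE)

# condition 1: more than one row; condition 2 (assumed): all Z > 0
eligible <- Filter(function(g) nrow(g) > 1 && all(g$Z > 0), groups)
names(eligible)  # "a.c" "b.c"

# censtats could then be applied to each remaining group, for example:
# results <- lapply(eligible, function(g) NADA::censtats(g$X, g$Y))
```

The two groups kept here are the same two combinations that appear in the Data_calc output of the loop version.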
I want to know to what extent it is possible to use purrr's mapping functions to create objects in general, though at the moment and with the example below I'm looking at data frames.
A<-seq(1:5)
B<-seq(6:10)
C<-c("x","y","x","y","x")
dat<-data.frame(A,B,C)
cols<-names(dat)
create_df<-function(x) {
x<- dat[x]
return(x)
}
A<-create_df("A")
This will create a data frame called A with column A from dat. I want to create data frames A/B/C, each with one column. I have tried different ways of specifying the .f argument as well as different map functions (map, map2, map_dfc, etc.). My original best guess:
map(.x=cols,~create_df(.x))
Clarification: I am asking for help because all of the specifications of map that I have tried have given an error.
Code that worked:
map(names(dat), ~assign(.x, dat[.x], envir = .GlobalEnv))
This creates A/B/C as data frames and prints to the console (which I don't need but does not bother me for now).
Using the purrr package, I think your custom function is not necessary.
The function includes a reference to the data, which is not optimal (especially if it doesn't exist in the environment).
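If a helper function is still wanted, one way to avoid the hidden reference to dat is to pass the data in as an argument. This is a base R sketch; the argument names data and col and the object name dfs are illustrative, and lapply is used here in place of map.

```r
dat <- data.frame(A = 1:5, B = 1:5, C = c("x", "y", "x", "y", "x"))

# the data frame is an explicit argument instead of a global lookup
create_df <- function(data, col) {
  data[col]
}

# naming the input vector gives a named list of one-column data frames
cols <- setNames(names(dat), names(dat))
dfs <- lapply(cols, create_df, data = dat)

names(dfs)  # "A" "B" "C"
dfs$B       # a data frame holding only column B
```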
To return a list of single-column data frames:
cols<-names(dat)
map(cols, ~dat[.x])
or alternatively: map(names(dat), ~dat[.x])
returns:
[[1]]
# A tibble: 5 x 1
A
<int>
1 1
2 2
3 3
4 4
5 5
[[2]]
# A tibble: 5 x 1
B
<int>
1 1
2 2
3 3
4 4
5 5
[[3]]
# A tibble: 5 x 1
C
<chr>
1 x
2 y
3 x
4 y
5 x
If you want to stick with tidyverse principles, you can store them within a dataframe as a list-column.
dfs <-
tibble(column = cols) %>% # tibble() replaces the deprecated data_frame()
mutate(data = map(cols, ~dat[.x]))
# A tibble: 3 x 2
column data
<chr> <list>
1 A <tibble [5 x 1]>
2 B <tibble [5 x 1]>
3 C <tibble [5 x 1]>
You can pull out individual data as needed:
B <- dfs$data[[2]]
# A tibble: 5 x 1
B
<int>
1 1
2 2
3 3
4 4
5 5
Along the lines of your original suggestion, here's an alternative function that uses purrr::map within it. I'm not sure how good of an idea this is, but maybe it has a use:
create_objects_from_df <- function(dat) {
map(names(dat), ~assign(.x, dat[.x], envir = .GlobalEnv))
}
create_objects_from_df(dat)
This creates the objects in your global environment, as individual objects with the column names.
We can use split from base R to get a list of one column data.frames
lst <- split.default(dat, names(dat))
It is better to keep it in a list, but if the intention is to have multiple objects in the global environment
list2env(lst, envir = .GlobalEnv)
My data frame
set.seed(1)
df <- data_frame(col1 = c(1:49), col2 = sample(c(0:20), 49, replace = T))
My list
fields <- list(A = c(2:4, 12:16, 24:28, 36:40, 48:49),
B = c(6:10, 18:22, 30:34, 42:46))
I would like to create a new column that contains the name of the (vector) object in fields, which contains the number in df$col1
I have created a conditional for loop over fields:
col1 <- df$col1
for (i in seq_along(col1)) {
if (col1[i] %in% fields[[1]] == T) {
col1[i] <- names(fields)[1]
} else if (col1[i] %in% fields[[2]] == T) {
col1[i] <- names(fields)[2]
}
}
Although this works, and I can then assign the resulting new vector col1 to my data frame, it doesn't seem very efficient to me, especially because I also have lists with more objects.
The reason I want to do this: I would like to use ggplot and dplyr to group and summarise the observations according to their position in my lists (fields, but also other lists). I hope it is clear from my question what I intend to do. Thanks!
EDIT
I have created a more generalised function that contains a nested for-loop
find_object <- function(x, list) {
for (j in 1:length(list)) {
for (i in 1:length(x)) {
if (x[i] %in% list[[j]] == TRUE) {
x[i] <- names(list)[j]
}
}
}
x
}
find_object(col1, fields)
That is more or less what I want - but this is a nested for loop, and I have heard that this is bad... Does anyone have a better solution??
Thanks
A better way is to transform the list to data.frame and then do a join/merge:
library(dplyr)
fields.df <- stack(fields) %>% mutate(ind = as.character(ind))
df %>% left_join(fields.df, by = c('col1' = 'values'))
# col1 col2 ind
# <int> <int> <chr>
# 1 1 5 <NA>
# 2 2 7 A
# 3 3 12 A
# 4 4 19 A
# 5 5 4 <NA>
# 6 6 18 B
# 7 7 19 B
# 8 8 13 B
# 9 9 13 B
# 10 10 1 B
note: I use left_join from dplyr because you are using data_frame. The base R merge should also work.
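The base R route mentioned in the note might look like this. It is a sketch that uses a plain data.frame instead of a tibble; fields_df and merged are illustrative names. Unlike left_join, merge sorts the result by the key column by default.

```r
set.seed(1)
df <- data.frame(col1 = 1:49, col2 = sample(0:20, 49, replace = TRUE))
fields <- list(A = c(2:4, 12:16, 24:28, 36:40, 48:49),
               B = c(6:10, 18:22, 30:34, 42:46))

# stack() turns the named list into two columns: values and ind
fields_df <- stack(fields)

# all.x = TRUE keeps rows of df without a match, like a left join
merged <- merge(df, fields_df, by.x = "col1", by.y = "values", all.x = TRUE)
head(merged)
```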
Another way would be to use match() after creating a data frame with stack().
library(dplyr)
foo <- stack(fields)
mutate(df, whatever = foo$ind[match(df$col1, foo$values)])
col1 col2 whatever
<int> <int> <fctr>
1 1 5 <NA>
2 2 7 A
3 3 12 A
4 4 19 A
5 5 4 <NA>
6 6 18 B
7 7 19 B
8 8 13 B
9 9 13 B
10 10 1 B
Thanks for all the help I got from just reading stuff.
I'm not happy with my R loops when I am only dealing within one data.frame because I have to write down the name of the dataframe over and over again which bloats up my R code.
Here is a silly example:
x<- rep(NA,10)
y <- 1:10
dat <- data.frame(x,y)
for(i in 2:nrow(dat)){
dat$x[i] <- dat$y[i] + dat$y[i-1]
}
So what I want to get rid of is that dat$ -bit. Outside loops this can neatly be done with within(), but I am not exactly sure whether you can actually do that with R. I tried it though:
remove(x,y) # In order to avoid accidental usage of the initial vectors
within(dat,{
for(i in 2:nrow(dat)){
x[i] <- y[i] + y[i-1]
}})
The output looks like this:
x y i
1 NA 1 10
2 3 2 10
3 5 3 10
4 7 4 10
5 9 5 10
6 11 6 10
7 13 7 10
8 15 8 10
9 17 9 10
10 19 10 10
So the loop did actually work, it's just that there is a new magical column.
Does anyone know (1) what is going on here and (2) how to deal elegantly with this kind of loop? (A more complicated example, wrapping within() around a loop that includes several if() statements and calculations, failed, by the way.)
Thanks a lot in advance!
skr
Ben answered your main question, by noting that i is being assigned to by the for loop. You can see that that is so by trying something like this:
> for(j in 1:3) cat("hi\n")
hi
hi
hi
> j
[1] 3
One option is just to remove the unwanted i variable by making its value NULL:
within(dat,{
for(i in 2:nrow(dat)){
x[i] <- y[i] + y[i-1]
}
i <- NULL
})
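A closely related variant is to delete the loop variable with rm() before the block ends, which is the pattern used in the examples on the within() help page. The result name res is illustrative.

```r
dat <- data.frame(x = rep(NA, 10), y = 1:10)

res <- within(dat, {
  for (i in 2:nrow(dat)) {
    x[i] <- y[i] + y[i - 1]
  }
  rm(i)  # drop the loop variable so it does not become a column
})

names(res)  # "x" "y"
```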
Another is to use with() instead of within():
dat$x <- with(dat, {
for(i in 2:nrow(dat)){
x[i] <- y[i] + y[i-1]
}
x
})
Finally, though I realize yours was a toy example, the best solution will very often be to avoid for loops altogether:
d <- data.frame(y=1:10)
within(d, {x = y + c(NA, head(y, -1))})
# y x
# 1 1 NA
# 2 2 3
# 3 3 5
# 4 4 7
# 5 5 9
# 6 6 11
# 7 7 13
# 8 8 15
# 9 9 17
# 10 10 19