I have a long list of lists (> 100k) and need to loop through each list and extract the "id", which I can do easily by calling lapply in a for loop.
Here is an example of the lists:
l1 <- list(id="002e2b45555652749339ab9c34359fb6", key="2", value="xx")
l2 <- list(id="002e2b433226527493jsab9c34353fb6", key="4", value="zz")
l3 <- list(l1, l2)
I do the looping with:
for(i in seq_along(l3)) {
lapply(l3[[i]]$id, function (x) print(x))
}
This prints the id of each list, which is nice.
I ultimately want to construct a matrix / data frame with all the "ids" in rows. What bugs me is that the print in my loop works well, printing out all ids from all the lists, but I cannot bind the rows into a data frame or a matrix. I tried something like the following, which doesn't do what I want (although it doesn't give an error either):
for(i in seq_along(l3)) {
lapply(l3[[i]]$id, function (x) rbind(x))
}
So the desired output should look like this (preferably as a data frame):
[1] "002e2b45555652749339ab9c3400cc52"
[1] "002e2b45555652749339ab9c34040525"
If you want a vector of the IDs you can do
sapply(l3, "[[", "id")
or using tidyverse functions
purrr::map_chr(l3, "id")
No need for loops for stuff like this in R.
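Since the question asks for a data frame, the ID vector can be wrapped directly. A minimal sketch, assuming every inner list has an id element holding a single string; vapply is used instead of sapply so that a missing or non-character id raises an error instead of silently returning a list:

```r
l1 <- list(id = "002e2b45555652749339ab9c34359fb6", key = "2", value = "xx")
l2 <- list(id = "002e2b433226527493jsab9c34353fb6", key = "4", value = "zz")
l3 <- list(l1, l2)

# Extract the "id" field of every inner list; vapply enforces one string per element
ids <- vapply(l3, function(item) item[["id"]], character(1))

# One row per inner list
id_df <- data.frame(id = ids, stringsAsFactors = FALSE)

# If all fields are wanted: turn each inner list into a one-row data frame and stack them
all_df <- do.call(rbind, lapply(l3, as.data.frame, stringsAsFactors = FALSE))
```

For 100k+ inner lists, building one data frame per element before rbind may be slow; the vapply route only touches the id field.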
Using base R:
s <- aggregate(. ~ ind, stack(setNames(l3, seq_along(l3))), identity)
ind values.1 values.2 values.3
1 1 002e2b45555652749339ab9c34359fb6 2 xx
2 2 002e2b433226527493jsab9c34353fb6 4 zz
If you just need the IDs:
s$values[,1]
[1] "002e2b45555652749339ab9c34359fb6" "002e2b433226527493jsab9c34353fb6"
I have 5 lists called "nightclub", "hospital", "bar", "attraction" and "social_facility", each of which contains a data frame called osm_points. I want to create a new list of 5 data frames, named after the original lists, that contain only the 3 columns "osm_id", "name" and "addr.postcode", with no NA values in "addr.postcode". Below is my attempted code. I don't know another way to subset lists without $ (which gives me an error about an atomic vector) or without square brackets. Let me know if you have any advice.
vectors <- c("osm_id","name","addr.postcode")
features <- c("nightclub", "hospital", "bar", "attraction", "social_facility")
datasets <- list()
n <- 0
for (i in features){
n <- n + 1
datasets[[n]] <- paste(i)[["osm_points"]][!is.na(paste(i)[["osm_points"]][["addr.postcode"]]), vectors]
}
I managed to do this for a single list without a for loop (below), but I want to write better code and do it for all of them in one operation. Thanks so much for your help!
nightclub1 <- nightclub$osm_points[!is.na(nightclub$osm_points$addr.postcode), vectors]
Try this code using lapply:
result <- lapply(mget(features), function(x)
x$osm_points[!is.na(x$osm_points$addr.postcode), vectors])
result should be a list of 5 data frames, one for each feature, containing only the columns in vectors and no NA values in addr.postcode.
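To try the lapply answer without real OSM downloads, here is a sketch with small hypothetical stand-ins: nightclub and bar below are made-up objects that only mimic having an osm_points data frame. mget() looks the names up and returns a list named after features, so result keeps the feature names.

```r
# Hypothetical stand-ins for the real OSM query results
nightclub <- list(osm_points = data.frame(
  osm_id = c("1", "2", "3"), name = c("A", "B", "C"),
  addr.postcode = c("E1", NA, "E2"), amenity = "nightclub"))
bar <- list(osm_points = data.frame(
  osm_id = c("4", "5"), name = c("D", "E"),
  addr.postcode = c(NA, "N1"), amenity = "bar"))

vectors  <- c("osm_id", "name", "addr.postcode")
features <- c("nightclub", "bar")

# Keep rows with a postcode and only the wanted columns (amenity is dropped)
result <- lapply(mget(features), function(x)
  x$osm_points[!is.na(x$osm_points$addr.postcode), vectors])
```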
You need to put your lists into a list to have something to iterate over.
As you have it, features is a character vector. In your first iteration i is "nightclub", a string. paste(i) changes nothing, it is still "nightclub". So your code becomes "nightclub"[["osm_points"]]..., but you need nightclub[["osm_points"]].
## make a list of lists
list_of_lists <- mget(features)
## then you can do
for(i in seq_along(list_of_lists)) {
datasets[[i]] <- list_of_lists[[i]][["osm_points"]]...
}
That is, substitute list_of_lists[[i]] wherever you currently have paste(i).
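Put together, the loop might look like the sketch below. hospital and attraction are hypothetical toy objects, not real data; naming datasets afterwards with names(list_of_lists) keeps track of which data frame came from which feature.

```r
# Hypothetical toy objects with an osm_points data frame
hospital <- list(osm_points = data.frame(
  osm_id = c("10", "11"), name = c("H1", "H2"), addr.postcode = c("W1", NA)))
attraction <- list(osm_points = data.frame(
  osm_id = c("20", "21"), name = c("T1", "T2"), addr.postcode = c("SE1", "SE2")))

vectors  <- c("osm_id", "name", "addr.postcode")
features <- c("hospital", "attraction")

list_of_lists <- mget(features)
datasets <- list()
for (i in seq_along(list_of_lists)) {
  pts <- list_of_lists[[i]][["osm_points"]]
  # Keep rows with a postcode, and only the wanted columns
  datasets[[i]] <- pts[!is.na(pts[["addr.postcode"]]), vectors]
}
names(datasets) <- names(list_of_lists)
```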
I have a list of n vectors. I would like to split it into sub-lists, where the number of vectors in each sub-list is different and increases by one from one sub-list to the next. For example, if I have a list with 6 vectors, the first sub-list contains one vector, the second contains 2 vectors, and so on.
Suppose I have the list x as follows:
x <- list(x1=c(1,2,3), x2=c(1,4,3), x3=c(3,4,6), x4=c(4,8,4), x5=c(4,33,4), x6=c(9,6,7))
Then I would like to split it into 3 lists:
list1 = x1
list2 = list(x2, x3)
list3 = list(x4,x5, x6)
I have a similar question (How to splitting a list of vectors to small lists in decreasing order in r), but for decreasing order.
How can I generalize this to an arbitrary number of vectors, for example 10 or 20?
Any ideas?
I'd stick them all in a list of lists:
MyLists <- list()
i <- 1
for (inc in 1:3){
MyLists[[inc]] <- x[i:(i+inc-1)]
i <- i+inc
}
Now MyLists[[1]] is list1, etc.
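The loop above hard-codes three groups. For an arbitrary number of vectors, one option (a sketch, not from the thread) is to build a grouping vector 1, 2, 2, 3, 3, 3, ... and pass it to split(), which works on lists as well as atomic vectors. The number of groups k is the smallest integer with k(k+1)/2 >= n; when n is not a triangular number (6, 10, 15, ...), the last group is simply shorter. split_increasing is a hypothetical helper name.

```r
# Hypothetical helper: split a list into consecutive groups of size 1, 2, 3, ...
split_increasing <- function(x) {
  n <- length(x)
  # Smallest k such that 1 + 2 + ... + k = k * (k + 1) / 2 >= n
  k <- ceiling((sqrt(8 * n + 1) - 1) / 2)
  # Group labels 1, 2, 2, 3, 3, 3, ..., truncated to n elements
  groups <- rep(seq_len(k), seq_len(k))[seq_len(n)]
  # split() keeps the element names; unname() drops the "1", "2", ... group labels
  unname(split(x, groups))
}

x <- list(x1 = c(1, 2, 3), x2 = c(1, 4, 3), x3 = c(3, 4, 6),
          x4 = c(4, 8, 4), x5 = c(4, 33, 4), x6 = c(9, 6, 7))
MyLists <- split_increasing(x)
```

With 10 vectors this gives groups of 1, 2, 3 and 4; with 7 it gives 1, 2, 3 and a final group of 1.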
Building off farnsy's answer: if you need each list as a separate object in the global environment, you could do something like this.
# your starter list
x <- list(x1=c(1,2,3), x2=c(1,4,3), x3=c(3,4,6),
x4=c(4,8,4), x5=c(4,33,4), x6=c(9,6,7))
#using a paste parse eval approach to evaluate a string
i<-1
for(inc in 1:3){
eval(parse(text =
paste0("list", inc, "<-list(",
paste0("x$",names(x)[i:(i+inc-1)],collapse = ","),
")")
))
i <- i+inc
}
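Building the code as a string works, but assign() can create the same variables without parse() and eval(). A sketch under the same assumptions (three groups, the same x); one difference is that x[i:(i + inc - 1)] keeps the element names x1, x2, ..., while the pasted list(x$x1, ...) call drops them.

```r
x <- list(x1 = c(1, 2, 3), x2 = c(1, 4, 3), x3 = c(3, 4, 6),
          x4 = c(4, 8, 4), x5 = c(4, 33, 4), x6 = c(9, 6, 7))

i <- 1
for (inc in 1:3) {
  # Creates list1, list2, list3 in the current environment
  assign(paste0("list", inc), x[i:(i + inc - 1)])
  i <- i + inc
}
```

If you already have MyLists from farnsy's answer, list2env(setNames(MyLists, paste0("list", seq_along(MyLists))), envir = .GlobalEnv) does the same in one call.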
I am trying to transpose a table I have created from a list of lists.
Each nested list has this format:
list(storm_name=NA, storm_level=NA, file_date=NA, file_time=NA,
date=NA, time=NA, actual_or_forecast=NA, lat=NA, long=NA, max_wind=NA,
gusts=NA, eye_speed=NA, eye_location=NA, storm_end=NA)
In short, each inner list has 14 elements.
storm_df <- as.data.frame(matrix(unlist(list1), nrow=length(unlist(list1[1]))))
The code above creates a table with 14 rows x N (number of inner lists) columns, whereas I would like N rows x 14 columns.
Does anyone see what I am doing wrong?
Thanks in advance!
Let's use do.call, rbind, and lapply:
## data
l1 <- list(storm_name=NA, storm_level=NA, file_date=NA, file_time=NA,
date=NA, time=NA, actual_or_forecast=NA, lat=NA, long=NA, max_wind=NA,
gusts=NA, eye_speed=NA, eye_location=NA, storm_end=NA)
big_list <- list(l1, l1, l1)
## make data.frame
do.call('rbind', lapply(big_list, data.frame))
Stepping through this, first we run lapply on big_list, so for each item in big_list, we create a data.frame. Try data.frame(l1) to see the result of each call.
Then we use do.call('rbind', ...): lapply returns a list of data.frames, and we want to "stack" them on top of each other.
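The original matrix() attempt can also be fixed directly: matrix() fills column by column, so with nrow set to 14 each inner list becomes a column. Filling row by row with byrow = TRUE and ncol equal to the number of fields gives N rows x 14 columns. A sketch, assuming every inner list has the same fields in the same order; unlist() coerces everything to one type (mixed fields all become character), and a field that is NULL rather than NA is dropped by unlist(), which shifts the rows. The data.frame route above avoids both issues.

```r
l1 <- list(storm_name=NA, storm_level=NA, file_date=NA, file_time=NA,
           date=NA, time=NA, actual_or_forecast=NA, lat=NA, long=NA, max_wind=NA,
           gusts=NA, eye_speed=NA, eye_location=NA, storm_end=NA)
l2 <- l1
l2$storm_name <- "Storm B"   # a made-up value, just to see the orientation
big_list <- list(l1, l2)

# Fill row by row: one inner list per row, one field per column
m <- matrix(unlist(big_list), ncol = length(big_list[[1]]), byrow = TRUE)
storm_df <- as.data.frame(m, stringsAsFactors = FALSE)
names(storm_df) <- names(big_list[[1]])
```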
I'm trying to apply a very complex function to a list of more than 50 data frames.
For the sake of clarity, let's use a very simple function that lowercases names, and just 3 data frames; my general approach is coded below.
# Data sample. Every column name is different across data frames
quality <- data.frame(FIRST=c(1,5,3,3,2), SECOND=c(3,6,1,5,5))
thickness <- data.frame(THIRD=c(6,0,9,1,2), FOURTH=c(2,7,2,2,1))
distance <- data.frame(ONEMORE=c(0,0,1,5,1), ANOTHER=c(4,1,9,2,3))
# list of dataframes
dfs <- list(quality, thickness, distance)
# a very simple function (just for testing)
# actually a very complex one is used on real data
BetterNames <- function(x) {
names(x) <- tolower(names(x))
x
}
# apply function to data frame list
dfs <- lapply(dfs, BetterNames)
# I know the expected R behaviour is to modify a copy of the object,
# instead of the original object itself. So if you get the names
# you get the original version, not the needed one
names(quality)
[1] "FIRST" "SECOND"
Is there any way to use a function inside a loop or an "apply" call so that it modifies a huge number of data frames in place?
As a result, the modified version should replace the original for every data frame in the (big) list.
I know there's a trick using data.table, but I wonder whether this is possible in base R.
Expected Results:
names(quality)
[1] "first" "second"
I was pointed to this answer: Rename columns in multiple dataframes, R
But it doesn't work for me: I can't use a vector of string names, because my new names are not a fixed list of strings. This is what I tried:
dfs <- c("quality", "thickness", "distance")
for(df in dfs) {
df.tmp <- get(df)
names(df.tmp) <- BetterNames(df)
assign(df, df.tmp)
}
> names(quality)
[1] "quality" NA
Thanks
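I'd use a simple yet effective parse & eval approach.
Let's use a for loop to compose a command that suits your needs (here dfs holds the names of the data frames, not the data frames themselves):
dfs <- c("quality", "thickness", "distance")
for(df in dfs) {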
command <- paste0("names(",df,") <- BetterNames(",df,")")
# print(command)
eval(parse(text=command))
}
names(quality)
[1] "first" "second"
names(thickness)
[1] "third" "fourth"
names(distance)
[1] "onemore" "another"
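If you prefer not to build code as strings, get() and assign() do the same round trip by name. A sketch, assuming dfs holds the names of the data frames as a character vector:

```r
quality   <- data.frame(FIRST = c(1, 5, 3, 3, 2), SECOND = c(3, 6, 1, 5, 5))
thickness <- data.frame(THIRD = c(6, 0, 9, 1, 2), FOURTH = c(2, 7, 2, 2, 1))
distance  <- data.frame(ONEMORE = c(0, 0, 1, 5, 1), ANOTHER = c(4, 1, 9, 2, 3))

BetterNames <- function(x) {
  names(x) <- tolower(names(x))
  x
}

dfs <- c("quality", "thickness", "distance")
for (df in dfs) {
  # Fetch the object by name, transform it, and overwrite the original binding
  assign(df, BetterNames(get(df)))
}
```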
You already have the best case scenario:
Let's add some names to your list:
names(dfs) <- c("quality", "thickness", "distance")
dfs <- lapply(dfs, BetterNames)
dfs[["quality"]]
# first second
# 1 1 3
# 2 5 6
# 3 3 1
# 4 3 5
# 5 2 5
This works great. And all your data is in a list, so if there are other things you want to do to all your data frames it is very easy.
If you are done treating these data frames similarly and really want them back in the global environment to work with individually, you can do it with
list2env(dfs, envir = .GlobalEnv)
I would recommend keeping them in a list, though: in most cases, if you are working with 50 data frames, it is easy to use lapply or for loops on them in a list, but as individual objects you will be copy/pasting code and making mistakes.
I would consider even starting with 50 data frames in your workspace a problem - see How do I make a list of data frames? for recommendations on finding an upstream fix: going straight to a list from the start.
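If the data frames already exist as separate objects, mget() is one way to collect them into a named list in the first place, which skips the manual names() step. A sketch with the question's three data frames; environment() is the current environment, which at the top level is the global environment.

```r
quality   <- data.frame(FIRST = c(1, 5, 3, 3, 2), SECOND = c(3, 6, 1, 5, 5))
thickness <- data.frame(THIRD = c(6, 0, 9, 1, 2), FOURTH = c(2, 7, 2, 2, 1))
distance  <- data.frame(ONEMORE = c(0, 0, 1, 5, 1), ANOTHER = c(4, 1, 9, 2, 3))

BetterNames <- function(x) {
  names(x) <- tolower(names(x))
  x
}

# mget() returns a list named after the objects it looked up
dfs <- mget(c("quality", "thickness", "distance"))
dfs <- lapply(dfs, BetterNames)

# Optional: write the renamed copies back over the originals
invisible(list2env(dfs, envir = environment()))
```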
This is certainly not optimal, and I hope something better comes up, but here goes:
BetterNames <- function(x, y) {
names(x) <- tolower(names(x))
assign(y, x, envir = .GlobalEnv)
}
dfs <- list(quality, thickness, distance)
dfs2 <- c("quality", "thickness", "distance")
mapply(BetterNames, dfs, dfs2)
> names(quality)
[1] "first" "second"
In R, I have a list comprised of objects with an unequal number of elements. For example,
l <- list(a=c(1,2), b=3, c=4)
I have figured out how to find the maximum length of any object:
lmax <- max(unlist(lapply(l,length)))
And also how to identify which objects are not the longest:
notlongest <- unlist(lapply(l,length)) != max(unlist(lapply(l,length)))
What I need to do now: for those objects in the list that are notlongest, repeat their elements up to length lmax and get a new list. That is, for objects b and c, repeat their elements twice so I get a new list that looks something like this:
newl <- list(a=c(1,2), b=c(3,3), c=c(4,4))
I'm sure there is an easy answer with the lapply function but I can't figure it out. Apologies if this question has been asked before. Thank you!
lmax <- max(sapply(l,length))
ll <- lapply(l, function(x) rep(x, length.out = lmax))  # recycle x to exactly lmax elements
ll
$a
[1] 1 2
$b
[1] 3 3
$c
[1] 4 4
From R 3.2.0, lengths(l) can be used in place of sapply(l,length)
lmax <- max(lengths(l))
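When an element's length does not divide lmax evenly, rep_len() (equivalently rep(x, length.out = n)) recycles partially and stops at exactly lmax elements. A sketch with a hypothetical list l2 whose longest element has length 3:

```r
l2 <- list(a = c(1, 2, 3), b = c(5, 6), c = 7)
lmax <- max(lengths(l2))
# rep_len(x, length.out) recycles x until it has exactly length.out elements
padded <- lapply(l2, rep_len, length.out = lmax)
```

Here padded$b becomes 5 6 5 and padded$c becomes 7 7 7.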
The simplest way I can think of is to use R's recycling rule and data.frame to group the lists into a list of equal-length vectors:
dat <- do.call('data.frame', l)
You can operate directly on that structure now, but if you want separate lists again, use sapply to break it back apart:
sapply(dat, list)
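One caveat on the data.frame route: data.frame() only recycles arguments whose length divides the longest one evenly, and otherwise stops with an "arguments imply differing number of rows" error. as.list() is an alternative to sapply(dat, list) for getting a plain named list back. A sketch with the question's l:

```r
l <- list(a = c(1, 2), b = 3, c = 4)
dat <- do.call("data.frame", l)
# as.list() on a data frame returns its columns as a named list
newl <- as.list(dat)

# A length that does not divide the longest one makes data.frame() fail
res <- tryCatch(do.call("data.frame", list(a = 1:3, b = 1:2)),
                error = function(e) conditionMessage(e))
```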