Getting nested elements from a list - r

I am trying to get nested elements from a list. I can extract the elements using: unlist(pull_lists[[i]]$content[[n]]['sha']), however, it seems that I cannot insert them in a nested list. I have extracted a single element of the list in a gist, which creates the reproducible example below. Here is what I have so far:
library("devtools")
pull_lists <- list(source_gist("669dfeccad88cd4348f7"))
sha_list <- list()
for (i in length(pull_lists)){
  for (n in length(pull_lists[[i]]$content)){
    sha_list[i][n] <- unlist(pull_lists[[i]]$content[[n]]['sha'])
  }
}
How can I insert the elements in a nested fashion?

When I download the content, I get a much more complicated structure than you do. For me, it's not pull_lists[[i]]$content, it's pull_lists[[i]]$value$content[[1 or 2]]$parents$sha. Nothing gets populated because there is nothing at pull_lists[[i]]$content to populate from (i.e., n = 0).
I've had to deal with similar data structures before. What I found was that it's much easier to search the naming structure after unlisting rather than to figure out the correct sequence of subsets.
Here's an example:
sha_locations <- grep("sha$", names(unlist(pull_lists[[1]])))
unlist(pull_lists[[1]])[sha_locations]
Replacing the for loop with lapply, this would look like:
sha_list <- lapply(
  pull_lists,
  function(x) unlist(x)[grep("sha$", names(unlist(x)))]
)
Since there are multiple SHAs, and the question only asks for the SHAs at specific positions, you need to extract those SHAs:
sha_list <- sha_list[[1]][names(sha_list[[1]]) == "value.content.sha"]
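To see the name-based search in isolation, here is a minimal sketch on a small hand-built list. The object toy_pulls and its field values are made up for illustration; they only mimic the shape described above and are not the gist's actual contents.

```r
# Hypothetical stand-in for the downloaded structure (not the real gist data)
toy_pulls <- list(
  list(value = list(content = list(
    list(sha = "aaa111", parents = list(sha = "ppp000")),
    list(sha = "bbb222", parents = list(sha = "ppp111"))
  )))
)

# Flatten one element; unlist() joins the nested names with "."
flat <- unlist(toy_pulls[[1]])

# Every entry whose name ends in "sha", including the parents' SHAs
all_shas <- flat[grep("sha$", names(flat))]

# One character vector of content-level SHAs per top-level element
sha_list <- lapply(toy_pulls, function(x) {
  flat_x <- unlist(x)
  unname(flat_x[names(flat_x) == "value.content.sha"])
})
```

The result is a list with one vector per element of toy_pulls, which is usually what "nested" output needs to look like; the exact name to filter on depends on the real structure, so it is worth inspecting names(flat) first.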

Related

Renaming Columns with index with a For Loop in R

I am writing this post to ask for some advice for looping code to rename columns by index.
I have a data set that has scale item columns positioned next to each other. Unfortunately, they are oddly named.
I want to re-name each column in this format: SimRac1, SimRac2, SimRac3.... and so on. I know the location of the columns (Columns number 30 to 37). I know these scale items are ordered in such a way that they can be named and numbered in increased order from left to right.
The code I currently have works, but is not efficient. There are other scales, in different locations, that also need to be renamed in a similar fashion. This would result in dozens of code rows.
See below code.
names(Total)[30] <- "SimRac1"
names(Total)[31] <- "SimRac2"
names(Total)[32] <- "SimRac3"
names(Total)[33] <- "SimRac4"
names(Total)[34] <- "SimRac5"
names(Total)[35] <- "SimRac6"
names(Total)[36] <- "SimRac7"
names(Total)[37] <- "SimRac8"
I want to loop this code so that I only have a chunk of code that does the work.
I was thinking perhaps a "for loop" would help.
Hence, the below code
for (i in Total[,30:37]){
  names(Total)[i] <- "SimRac(1:8)"
}
Unfortunately, this does not work. The code runs without error, but it doesn't do anything.
Please advise.
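In the OP's code, "SimRac(1:8)" is a constant string. Also, for (i in Total[,30:37]) iterates over the columns themselves, so i holds a column's values rather than a column index. To have dynamic names, use paste0.
We do not need a loop here. We can use a vectorized function to create the names, then assign them to a subset of names(Total):
names(Total)[30:37] <- paste0('SimRac', 1:8)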
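Since the question mentions several scales at different positions, the same idea can be wrapped in a small function. This is only a sketch: rename_scale, the toy Total data frame, and the second prefix "OthScale" are hypothetical names introduced for illustration.

```r
# Hypothetical data frame with 8 oddly named columns (V1 ... V8)
Total <- as.data.frame(matrix(0, nrow = 2, ncol = 8))

# Hypothetical helper: rename the columns at positions `cols`
# to prefix1, prefix2, ... in left-to-right order
rename_scale <- function(df, cols, prefix) {
  names(df)[cols] <- paste0(prefix, seq_along(cols))
  df
}

# One call per scale; the result must be assigned back to Total
Total <- rename_scale(Total, 1:4, "SimRac")
Total <- rename_scale(Total, 5:8, "OthScale")
names(Total)
```

Using seq_along(cols) means the numbering always starts at 1 regardless of where the scale sits in the data frame.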

For loop to create multiple empty data frames gives error

I wrote a for loop to create multiple empty data frames, using a vector of names. Even though it seemed really easy at the start, I got an error message: Error in ID_names[i] <- data.frame() : replacement has length zero
To be more specific, I'll provide a reproducible example:
ID_names <- c("Athens","Rome","Barcelona","London","Paris","Madrid")
for(i in 1:length(ID_names)){
  ID_names[i] <- data.frame()
}
Do you have any idea why this is wrong? I would like you not only to provide a solution, but also to explain why this for loop is wrong, so that I can avoid this kind of mistake in the future.
You are trying to store a data frame in a single element of a character vector (ID_names[i]), which is not possible. You might instead create a list of empty data frames and assign names to it, which can be done using replicate.
ID_names <- c("Athens","Rome","Barcelona","London","Paris","Madrid")
list_data <- setNames(replicate(length(ID_names), data.frame()), ID_names)
However, initialising empty data frames like this is rarely useful, and it tends to create more confusion down the road. Depending on your actual use case, there may be better ways to handle this.
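As a hedged variation on the snippet above, passing simplify = FALSE makes it explicit that replicate() should return a list, and the elements can then be reached by name. The replacement values for "Rome" are made up for illustration.

```r
ID_names <- c("Athens", "Rome", "Barcelona", "London", "Paris", "Madrid")

# simplify = FALSE asks replicate() to return a plain list
list_data <- setNames(
  replicate(length(ID_names), data.frame(), simplify = FALSE),
  ID_names
)

# Each element is an empty data frame, reachable by city name
empty_dim <- dim(list_data[["Rome"]])

# A list element can hold a whole data frame, unlike a vector element
# (the values below are invented for this example)
list_data[["Rome"]] <- data.frame(year = 2020, visitors = 10)
```

A list works here because each of its elements can be an arbitrary object, whereas each element of an atomic vector such as ID_names can only hold a single value of the vector's type.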

R loop to create multiple objects from equally as many existing objects

I've tried searching for the answer to this but am having trouble because I'm not sure I'm even searching for the right thing. Basically, I would like to create a loop in R that creates multiple objects, each from a different object. For example, let's say I have 50 existing objects (matrix, data frame, graph, etc.) and they are all named similarly (table1, table2...table50). Now I would like to create 50 new objects, let's say graph1...graph50. I'm having trouble with a loop because I don't know how to work with the names being strings. I've tried the assign function, but it isn't dynamic enough in the assignment argument. I would basically like something like this:
for (i in list(table names)){
graph "i" <- as.network(table "i")
}
I would also like this to work for objects assigned as some function of themselves, i.e. graph "i" <- somefunction(graph "i"), etc.
Additionally, if there is a more efficient way, by all means I'm open to it. It seems like an easy task but I can't figure it out. Right now I'm literally just concatenating the statements in Excel and pasting them into R so it doesn't take too long, but it is a pain. Thank you.
I think you could use a nested loop to do what you're looking for; you could apply whatever transformations you want to each object within the input list and store the results in a new list with matching object names.
in_list <- list(table1 = iris,
                table2 = EuStockMarkets)
out_list <- list()
for(i in 1:length(in_list)){
  for(j in colnames(in_list[[i]])){
    out_list[[ gsub("table", "graph", names(in_list)[i]) ]][[j]] <- summary(in_list[[i]][,j])
  }
}
Hope this helps!
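If the objects already exist in the workspace as table1, table2, and so on, another option is to collect them into a named list with mget() and transform the whole list with lapply(). The sketch below is not part of the answer above; some_function is a placeholder for as.network() or any other transformation, and the two stand-in tables are arbitrary built-in data sets.

```r
# Two stand-in objects; in the question there would be 50 of them
table1 <- iris
table2 <- mtcars

# Build the object names as strings and fetch the objects into a named list,
# looking them up in the current environment
table_names <- paste0("table", 1:2)
tables <- mget(table_names, envir = environment())

# Placeholder transformation (an assumption for this example);
# swap in as.network() or any other function
some_function <- function(x) dim(x)

# Apply it to every object and rename the results graph1, graph2, ...
graphs <- lapply(tables, some_function)
names(graphs) <- sub("table", "graph", names(graphs))

graphs[["graph1"]]
```

Keeping the results in one list, rather than creating 50 separate variables with assign(), also covers the "function of itself" case: graphs <- lapply(graphs, another_function) updates every element in one line.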

How to unnest data and obtain the first element from an array in SparkR?

I am new to SparkR and am trying my first steps in data preparation. The dataset is something of this kind. I was trying to subset and select the significant columns. My question is: how can I select a column from an array element? I tried something like this, which let me select columns by un-nesting the data, but I couldn't unnest and flatten the array to get its first element.
select.col <- SparkR::select(data,c("parsed.nid","parsed.status","parsed.sections.element[0].name"))
I found a way to resolve this issue myself. It can be done in two simple steps.
First, use explode() in SparkR to get all the contents of the list in that column.
Next, use windowPartitionBy() in SparkR to create partitions; from there we can get whatever we need with functions such as row_number(), dense_rank(), or rank(). Here we want the first element of the list, so I have used row_number().
Snippet:
data.select <- SparkR::select(data,c("parsed.nid","parsed.status","parsed.sections"))
names(data.select) <- c("nid","status","sections")
categories <- SparkR::select(data.select,data.select$nid,data.select$status,explode(data.select$sections))
ws <- SparkR::orderBy(SparkR::windowPartitionBy("nid","status","sections"),"nid")
data.final <- SparkR::mutate(categories,row_num = over(row_number(), ws))
##If we want to get the first element of the array.
data.final <- data.final[data.final$row_num==1,]
Please add your suggestions as well.

How do I change column names in list of data frames inside a function?

I know that the question "how to change names in a list of data frames" has been answered multiple times. However, I'm stuck trying to write a function that can take any list as an argument and change all of the column names of all of the data frames in the list. I am working with a large number of .csv files, all of which will have the same 3 column names. I'm importing the files in groups as follows:
# Get a group of drying data data files, remove 1st column
files <- list.files('Mang_Run1', pattern = '*.csv', full = TRUE)
mr1 <- lapply(files, read.csv, skip = 1, header = TRUE, colClasses = c("NULL", NA, NA, NA))
I will have 6 such file groups. If I run the following code on a single list, the names of the columns in each data frame within the specified list will be changed correctly.
for (i in seq_along(mr1)) {
  names(mr1[[i]]) <- c('Date_Time', 'Temp_F', 'RH')
}
However, if I try to generalize the function (see code below) to take any list as an argument, it does not work correctly.
nameChange <- function(ls) {
  for (i in seq_along(ls)) {
    names(ls[[i]]) <- c('Date_Time', 'Temp_F', 'RH')
  }
  return(ls)
}
When I call nameChange on mr1 (list generated from above), it prints the entire contents of the list to the console and does not change the names of the columns in the data frames within the list. I'm clearly missing something fundamental about the inner workings of R here. I've tried the above function with and without return, and have made several modifications to the code, none of which have proven successful. I'd greatly appreciate any help, and would really like to understand the 'why' behind the problem as well. I've had considerable trouble in the past handling functions that take lists as arguments.
Thanks very much in advance for any constructive input.
I think this might be a very simple fix:
First, generalize the function you are using to rename the columns. This only needs to work on one dataframe at a time.
renameFunction <- function(x, someNames) {
  names(x) <- someNames
  return(x)
}
Now we need to define the names we want to change each column name to.
someNames <- c('Date_Time', 'Temp_F', 'RH')
Then we call the new function and apply it to every element of the "mr1" list. The result has to be assigned back to mr1, because lapply returns a new list rather than modifying mr1 in place.
mr1 <- lapply(mr1, renameFunction, someNames)
I may have gotten some of the details wrong with regard to your exact situation, but I've used this method before to solve similar issues. Since you were able to get it to work in the specific case, I'm pretty sure this will generalize readily using lapply.
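On the "why": R functions work on a copy of their arguments, so nameChange renames the columns of its local copy and returns that copy. Called at the console without assignment, the returned list is simply printed and the original is left untouched. A minimal sketch with a toy stand-in for mr1 (the two small data frames are invented for illustration):

```r
# Toy stand-in for mr1: two small data frames with arbitrary column names
mr1 <- list(
  data.frame(a = 1:2, b = 3:4, c = 5:6),
  data.frame(x = 1:2, y = 3:4, z = 5:6)
)

nameChange <- function(ls) {
  for (i in seq_along(ls)) {
    names(ls[[i]]) <- c("Date_Time", "Temp_F", "RH")
  }
  ls
}

# Without assignment, only the function's local copy is renamed
invisible(nameChange(mr1))
unchanged <- names(mr1[[1]])  # still "a" "b" "c"

# Assigning the result back is what actually updates mr1
mr1 <- nameChange(mr1)
names(mr1[[1]])
```

So the original function was already correct; the missing piece was the assignment mr1 <- nameChange(mr1).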
