for-loop code orders my result in binary order - r

In R:
I tried to make a list of data frames ordered by the names of the data frames (p_text_tm_list_1, p_text_tm_list_2, ..., p_text_tm_list_892)
by using a loop (for i in 1:892),
but the result of that code was ordered in a binary-like system (1, 10, 100, 101, ...), as you can see in the second captured console screen.
Why was the result ordered in a binary system?
How can I order the data frames in decimal order?
Thanks for reading.

Here is a way to solve your problem.
First, create the list p_text_top10_list without resorting to assign(). The list is created with its final length so that it does not have to be extended on every iteration, which is inefficient.
p_text_top10_list <- vector("list", length = length(p_text_tm_list))
for (i in seq_along(p_text_tm_list)) {
  p_text_top10_list[[i]] <- head(p_text_tm_list[[i]], 10)
}
Another much simpler way is to use lapply.
p_text_top10_list <- lapply(p_text_tm_list, head, 10)
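That's it. This one-liner does the same as the previous for loop (lapply() additionally keeps any names that p_text_tm_list already has).
Now assign names with zero-padded three-digit numbers so that they sort in the proper order. The order 1, 10, 100, 101, ... is not binary: it is the alphabetical (lexicographic) order of the names as character strings, in which "10" comes before "2" because the character "1" comes before "2".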
names(p_text_top10_list) <- sprintf("p_text_top10_list_%03d", seq_along(p_text_top10_list))
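A small base-R sketch of why the order looks "binary" and how zero padding fixes it. The 12 names below are a hypothetical stand-in for the 892 in the question; functions such as ls(), for example, return object names sorted as character strings in the same way.

```r
# Hypothetical names without zero padding, as paste0() would build them
nm <- paste0("p_text_tm_list_", 1:12)

# Sorting compares the strings character by character, so "10" sorts before "2"
sorted_plain <- sort(nm)
print(sorted_plain)

# With sprintf("%03d") every number has three digits, so string order matches numeric order
padded <- sprintf("p_text_tm_list_%03d", 1:12)
sorted_padded <- sort(padded)
print(sorted_padded)

# Alternative without renaming: reorder by the numeric suffix after the last "_"
suffix <- as.integer(sub(".*_", "", sorted_plain))
numeric_order <- sorted_plain[order(suffix)]
print(numeric_order)
```

With 892 elements three digits ("%03d") are enough; 1,000 or more elements would need "%04d".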

Related

R For loop/map2 that iterates over two lists

I've read [this][1] and [this][2], but am unable to adapt it to my use case. I am trying to use the openxlsx function writeData to write several dataframes I saved in a vector along with the names of the workbook sheets, also saved in a vector:
names_of_worksheets <- c(
"total_suppliers",
"nato_fvey_suppliers",
"us_suppliers",
"sole_sourcing",
"single_sourcing",
"geographic_risk_us_only",
"foreign_dependence",
"exposure_to_non_nato_fvey"
)
names_of_dataframes <- c(
total_suppliers,
nato_fvey_suppliers,
us_suppliers,
sole_sourcing,
single_sourcing,
geographic_risk_us_only,
foreign_dependence,
exposure_to_non_nato_fvey
)
The pseudo code I'd like to write is a for loop (which I think I can do in python) that iterates over two lists/vectors:
for (name_of_worksheet, name_of_dataframe in names_of_worksheets, names_of_data_frames) {
writeData(workbook, name_of_worksheet, name_of_dataframe)
}
That of course doesn't work. I've tried map2 but gotten an error I can't deal with:
map2(names_of_worksheets, names_of_dataframes, writeData, workbook)
Error: Mapped vectors must have consistent lengths:
* `.x` has length 8
* `.y` has length 25
Any thoughts? Thanks!
[1]: Looping over multiple lists with base R
[2]: R Loop Iterating Over Two Lists
I think you might want to use list() instead of c() to gather your data frames.
c() takes all the columns of the data frames and puts them as independent elements in one list (which is why you get the inconsistent-length error: 25 columns in total versus 8 sheet names), while list() keeps each data frame as a single element. map2 should then work for simultaneous iteration.
It looks like writeData takes the workbook as its first argument, and currently your map2 call would pass it as the third argument, so maybe you want to rewrite it as
map2(names_of_worksheets, names_of_dataframes, function(x,y) writeData(workbook, x, y))
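A self-contained sketch of both points, using two small hypothetical data frames instead of the question's eight. It does not load openxlsx, so the writeData() call appears only in a comment; the loop body builds a short description string instead.

```r
# Two hypothetical data frames with 2 and 3 columns
df_a <- data.frame(x = 1:2, y = 3:4)
df_b <- data.frame(p = 1, q = 2, r = 3)

n_c <- length(c(df_a, df_b))     # 5: c() splits the data frames into their columns
n_l <- length(list(df_a, df_b))  # 2: list() keeps each data frame as one element

sheet_names <- c("sheet_a", "sheet_b")
dfs <- list(df_a, df_b)

# Base-R version of the Python-style loop over two vectors: loop over the indices
written <- character(0)
for (i in seq_along(sheet_names)) {
  # with openxlsx this would be: writeData(workbook, sheet_names[i], dfs[[i]])
  written <- c(written, sprintf("%s: %d columns", sheet_names[i], ncol(dfs[[i]])))
}
print(written)

# Map() walks both inputs in parallel, much like purrr::map2()
summaries <- Map(function(sheet, df) paste0(sheet, ": ", nrow(df)), sheet_names, dfs)
print(summaries)
```

Because writeData() is called for its side effect, purrr::walk2() (or a plain for loop) is a natural fit when the return value is not needed.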

R add to a list in a loop, using conditions

I have a data.frame dim = (200,500)
I want to do a shapiro.test on each column of my data frame and append the result to a list. This is what I'm trying:
colstoremove <- list();
for (i in range(dim(I.df.nocov)[2])) {
x <- shapiro.test(I.df.nocov[1:200,i])
colstoremove[[i]] <- x[2]
}
However this is failing. Some pointers? (background is mainly python, not much of an R user)
Consider lapply(): a data frame passed to it is treated as a list of its columns, so the function runs on each column and the returned list has one element per column:
colstoremove <- lapply(I.df.nocov, function(col) shapiro.test(col)[2])
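A sketch of the lapply() idea on a hypothetical 200-row data frame standing in for I.df.nocov. It extracts the p-value with $p.value (a named component of the "htest" object that shapiro.test() returns) and also shows vapply(), which gives a named numeric vector instead of a list. The 0.05 cutoff is only an example.

```r
# Hypothetical stand-in for I.df.nocov: 200 rows, 4 columns
set.seed(123)
df <- data.frame(a = rnorm(200), b = runif(200), c = rexp(200), d = rnorm(200))

# lapply() treats the data frame as a list of columns: one result per column
p_list <- lapply(df, function(col) shapiro.test(col)$p.value)

# vapply() does the same but returns a named numeric vector
p_vec <- vapply(df, function(col) shapiro.test(col)$p.value, numeric(1))

# For example, the names of the columns that look non-normal at the 5% level
non_normal <- names(p_vec)[p_vec < 0.05]
print(non_normal)
```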
Here is what happens in
for (i in range(dim(I.df.nocov)[2]))
For the sake of example, I assume that I.df.nocov contains 100 rows and 5 columns.
dim(I.df.nocov) is the vector of I.df.nocov dimensions, i.e. c(100, 5)
dim(I.df.nocov)[2] is the 2nd dimension of I.df.nocov, i.e. 5
range(x) is a 2-element vector which contains the minimum and maximum values of x. For example, range(c(4,10,1)) is c(1,10). So range(dim(I.df.nocov)[2]) is c(5,5).
Therefore, the loop iterates twice: the first time with i=5, and the second time also with i=5. Not surprising that it fails!
The problem is that R's function range and Python's function with the same name do completely different things. The closest equivalent of Python's range is seq (keep in mind that R counts from 1 and includes the end point). For example, seq(5)=c(1,2,3,4,5), while seq(3,5)=c(3,4,5), and seq(1,10,2)=c(1,3,5,7,9). You may also write 1:n, which is the same as seq(n), and m:n is the same as seq(m,n) (but the priority of ':' is very high, so 1:2*x is interpreted as (1:2)*x).
Generally, if something does not work in R, you should print the subexpressions from the innermost to the outermost. If some subexpression is too big to be printed, use str(x) (str means "structure"). And never assume that functions in Python and R are the same! If there is a function with the same name, it usually does a different thing.
On a side note, instead of dim(I.df.nocov)[2] you could just write ncol(I.df.nocov) (there is also a function nrow).
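A quick base-R check of the points above. n_cols is a hypothetical stand-in for ncol(I.df.nocov); seq_len() is used rather than 1:n because 1:0 counts down to c(1, 0), while seq_len(0) is empty.

```r
n_cols <- 5L  # hypothetical stand-in for ncol(I.df.nocov)

r <- range(n_cols)  # c(5, 5): minimum and maximum of a single value
iterations <- 0
for (i in range(n_cols)) iterations <- iterations + 1  # runs twice, with i = 5 both times

s <- seq_len(n_cols)  # 1 2 3 4 5: the column indices the loop actually needs
p <- 1:2 * 3          # c(3, 6): ':' is evaluated before '*'
q <- 1:(2 * 3)        # 1 2 3 4 5 6
down <- 1:0           # c(1, 0): ':' counts down when n is 0
none <- seq_len(0)    # integer(0): a loop over it never runs
```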

update a vector using assign in R

I am implementing k-means in R.
In a loop, I am initializing several vectors that will be used to store values that belong to a particular cluster, as seen here:
for(i in 1:k){
assign(paste("cluster",i,sep=""),vector())
}
I then want to add to a particular "cluster" vector, depending on the value I get for the variable getIndex. So if getIndex is equal to 2 I want to add the variable minimumDistance to the vector called cluster2. This is what I am attempting to do:
minimumDistance <- min(distanceList)
getIndex <- match(minimumDistance,distanceList)
clusterName <- paste("cluster",getIndex,sep="")
name <- c(name, minimumDistance)
But obviously the above code does not work because in order to append to a vector that I'm naming I need to use assign as I do when I instantiate the vectors. But I do not know how to use assign, when using paste, when also appending to a vector.
I cannot use the index such as vector[i] because I don't know what index of that particular vector I want to add to.
I need to use the vector <- c(vector,newItem) format but I do not know how to do this in R. Or if there is any other option I would greatly, greatly appreciate it. If I were using Python I would simply use paste and then use append but I can't do that in R. Thank you in advance for your help!
You can do something like this:
out <- list()
for (i in 1:nclust) {
  # assign some data (in this case a list) to a cluster
  assign(paste0("N_", i), list(...))
  # here I put all the clusters' data in a list,
  # but you could use a similar statement to do further data manipulation:
  # if you've used a common naming pattern (here "N_<index>") for your objects,
  # you can use get() to retrieve them with the same pattern
  out[[i]] <- get(paste0("N_", i))
}
If you want a more complete code example, emclustr::em_clust_mvn appears to deal with a similar problem.
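For the question as asked (appending to a vector whose name is built with paste()), a minimal sketch pairs get() with assign(); the values of k, getIndex and minimumDistance below are hypothetical. Keeping the clusters in one list indexed by getIndex avoids building variable names at all.

```r
k <- 3                  # hypothetical number of clusters
getIndex <- 2           # hypothetical index returned by match()
minimumDistance <- 0.7  # hypothetical distance to store

# Option 1: separate vectors cluster1, cluster2, ... handled with assign()/get()
for (i in 1:k) {
  assign(paste0("cluster", i), vector())
}
clusterName <- paste0("cluster", getIndex)
# read the current vector, append the value, and store it back under the same name
assign(clusterName, c(get(clusterName), minimumDistance))

# Option 2: one list with an element per cluster
clusters <- vector("list", k)
clusters[[getIndex]] <- c(clusters[[getIndex]], minimumDistance)
```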

How to append all the elements of a list efficiently in R

So, I have a list with elements that are integer vectors of variable length... It can be reproduced like this
N<-1000
x<-list()
for(i in 1:N){
x[[i]]<-1:sample.int(10,1)
}
But in my case I have a list with a million elements, N = 1,000,000.
What I need is to concatenate all the elements of the list into a single vector. I have tried the following methods; all of them are extremely slow and in fact never finish running, because appending in R is really inefficient.
library(abind)
abind(x)
also this:
y<-integer()
for(i in 1:length(x)){
y<-append(y,x[[i]])
}
What do you suggest? The vector will end up having around 10 million values... so maybe R just can't handle this on a normal computer and the only solution is to split the list into several parts?
We can use unlist; if the list elements are named, add use.names=FALSE so that no names are built for the result:
unlist(x, use.names=FALSE)
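A sketch comparing unlist() with a loop that preallocates the result, on a small reproduction of the question's list (N = 1000 here, with a fixed seed). The loop is only there to illustrate why growing a vector with append() is slow: each append() builds a new, longer copy, while here every value is written once into a vector of the final length. lengths() requires R 3.2.0 or later.

```r
set.seed(42)
N <- 1000
# Same kind of list as in the question: integer vectors of length 1 to 10
x <- lapply(seq_len(N), function(i) seq_len(sample.int(10, 1)))

# One call flattens every element into a single integer vector
y <- unlist(x, use.names = FALSE)

# Preallocated loop: compute where each element goes, then fill in place
lens <- lengths(x)
y_loop <- integer(sum(lens))
ends <- cumsum(lens)
starts <- ends - lens + 1L
for (i in seq_along(x)) {
  y_loop[starts[i]:ends[i]] <- x[[i]]
}
```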

Getting elements of a list in R

This is my problem:
There is a predefined list named gamma with three entries: gamma$'2' is a 2x2 matrix, gamma$'3' a 3x3 matrix, and gamma$'4' a 4x4 matrix. I would like to have a function that returns the matrix I need:
GiveMatrix <- function(n) {
gamma.list <- #init the list of matrices
gamma.list$n # return the list entry named n
}
Since n is not a character, the last line does not work. I tried gamma.list$paste(n) and gamma.list$as.character(n), but neither worked. Is there a function that converts n to the right format? Or is there maybe a much better way? I know, I am not really good at R.
You need to use:
gamma.list[[as.character(n)]]
In your example, R is looking for an entry in the list called n. When using [[, the value of n is used, which is what you need.
I've found it!
gamma.list[as.character(n)] is the solution I needed.
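A sketch contrasting the two answers, with a hypothetical gamma.list of identity matrices: [[ returns the matrix itself, while [ returns a list of length one that contains it. Note also that a numeric index selects by position, so gamma.list[[3]] is the third entry (the 4x4 matrix), not the one named "3".

```r
# Hypothetical stand-in for the predefined list: identity matrices named "2", "3", "4"
gamma.list <- list(`2` = diag(2), `3` = diag(3), `4` = diag(4))

GiveMatrix <- function(n) {
  # [[ ]] uses the value of n; as.character() turns it into a lookup by name
  gamma.list[[as.character(n)]]
}

m <- GiveMatrix(3)                  # the 3x3 matrix itself
sub <- gamma.list[as.character(3)]  # a one-element list containing that matrix
by_position <- gamma.list[[3]]      # third entry by position, i.e. the 4x4 matrix
```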
