R equivalent to the MATLAB structure?

Is there an R type equivalent to the Matlab structure type?
I have a few named vectors that I am trying to store in a data frame. Ideally, I would simply access one element of an object and it would return the named vector (like a structure in Matlab). I feel that a data frame is not the right tool, since it can store the values of the named vectors but not their names when the names differ from one vector to the next.
More generally, is it possible to store a bunch of different objects in a single one in R?
Edit: As Joran said I think that list does the job.
l = list()
l$vec1 = namedVector1
l$vec2 = namedVector2
...
If I have a list of names
name1 = 'vec1'
name2 = 'vec2'
is there any way for the interpreter to understand that when I use a variable name like name1, I am not referring to the variable name but to its content? I have tried get(name1) but it does not work.

I could still be wrong about what you're trying to do, but I think this is the best you're going to get in terms of accessing each list element by name:
> l <- list(a = 1:3, b = 1:10)
> ind <- "a"
> l[[ind]]
[1] 1 2 3
Namely, you're going to have to use [[ explicitly.
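To tie this back to the original question, here is a minimal sketch of a list used like a MATLAB struct. The contents of namedVector1 and namedVector2 are made up for illustration; only the variable names come from the question.

```r
# Hypothetical named vectors; their names differ from one vector to the other.
namedVector1 <- c(a = 1, b = 2)
namedVector2 <- c(x = 10, y = 20, z = 30)

# A list keeps each vector intact, including its names.
l <- list(vec1 = namedVector1, vec2 = namedVector2)

# Element names held in character variables.
name1 <- "vec1"
name2 <- "vec2"

l[[name1]]         # same element as l$vec1
names(l[[name2]])  # the names of the second vector survive
```

get(name1) fails here because it looks for a variable called vec1, whereas vec1 only exists as an element of l.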

Related

Dynamically change part of variable name in R

I am trying to automatise some post-hoc analysis, but I will try to explain myself with a metaphor that I believe will illustrate what I am trying to do.
Suppose I have two character vectors: the first holds a list of names and the second a list of adjectives:
list1 <- c("apt", "farm", "basement", "lodge")
list2 <- c("tiny", "noisy")
Let's suppose also I have a data frame with a bunch of data that I have named something like this as they are the results of some previous linear analysis.
> head(df)
  qt[apt_tiny,Intercept] qt[apt_noisy,Intercept] qt[farm_tiny,Intercept]
1               4.196321              -0.4477012              -1.0822793
2               3.231220              -0.4237787              -1.1433449
3               2.304687              -0.3149331              -0.9245896
4               2.768691              -0.1537728              -0.9925387
5               3.771648              -0.1109647              -0.9298861
6               3.370368              -0.2579591              -1.0849262
and so on...
Now, what I am trying to do is make some automatic operations where the strings in the previous lists dynamically change as they go in a for loop. I have made a list with all the distinct combinations and called it distinct. Now I am trying to do something like this:
for (i in 1:nrow(distinct)){
var1[[i]] <- list1[[i]]
var2[[i]] <- list2[[i]]
#this being the insertable name part for the rest of the variables and parts of variable,
#i'll put it inside %var[[i]]% for the sake of the explanation.
%var1[[i]]%_%var2[[i]]%_INT <- df$`qt[%var1[[i]]%_%var2[[i]]%,Intercept]`+ df$`qt[%var1[[i]]%,Intercept]`
}
The difficult thing for me here is %var1[[i]]% is at the same time inside a variable and as the name of a column inside a data frame.
Any help would be much appreciated.
You cannot use $ to extract column values with a character variable. So df$`qt[%var1[[i]]%_%var2[[i]]%,Intercept]` will not work.
Create the name of the column using sprintf and use [[ to extract it. For example, to construct "qt[apt_tiny,Intercept]" as the column name you can do:
i <- 1
sprintf('qt[%s_%s,Intercept]', list1[i], list2[i])
#[1] "qt[apt_tiny,Intercept]"
Now use [[ to subset that column from df
df[[sprintf('qt[%s_%s,Intercept]', list1[i], list2[i])]]
You can do the same for other columns.
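Putting the pieces together, the loop from the question could look roughly like the sketch below. It is only an illustration: the toy df and its values are invented, expand.grid stands in for the asker's distinct table, and the results go into a named list (results, a name introduced here) rather than into dynamically named variables.

```r
# Toy inputs; the column naming pattern follows the question, the numbers are made up.
list1 <- c("apt", "farm")
list2 <- c("tiny", "noisy")
df <- data.frame(
  "qt[apt,Intercept]"        = c(1, 2),
  "qt[farm,Intercept]"       = c(3, 4),
  "qt[apt_tiny,Intercept]"   = c(10, 20),
  "qt[apt_noisy,Intercept]"  = c(30, 40),
  "qt[farm_tiny,Intercept]"  = c(50, 60),
  "qt[farm_noisy,Intercept]" = c(70, 80),
  check.names = FALSE  # keep the bracketed column names as they are
)

# All noun/adjective combinations.
combos <- expand.grid(noun = list1, adj = list2, stringsAsFactors = FALSE)

results <- list()
for (i in seq_len(nrow(combos))) {
  noun <- combos$noun[i]
  adj  <- combos$adj[i]
  pair_col <- sprintf("qt[%s_%s,Intercept]", noun, adj)
  base_col <- sprintf("qt[%s,Intercept]", noun)
  # Store under a constructed name instead of creating a new variable.
  results[[sprintf("%s_%s_INT", noun, adj)]] <- df[[pair_col]] + df[[base_col]]
}

names(results)
```

Each result is then available as results[["apt_tiny_INT"]], results[["farm_noisy_INT"]], and so on.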

How can I specify data structure name from the string output in r programming?

I am coding this in R and worked around it by turning the vector into a list and assigning a value to each element of the list, but is there a more direct, simple approach?
for(i in 1:5){
paste('var',i,sep='')=i
}
I want the values 1:5 to be assigned like this:
var1=1
var2=2
var3=3
var4=4
var5=5
Don’t do this. Use a vector or list instead:
var = 1 : 5
Now you can use var[1] (instead of var1) etc.
Your code doesn’t work because paste creates a character vector, not a variable name.
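If you still want to refer to the values by the names var1 to var5, a named list gives you that without creating five separate variables. A small sketch (vars is a name chosen here for illustration):

```r
# Build the names with paste0 and attach them to the list elements.
vars <- setNames(as.list(1:5), paste0("var", 1:5))

vars$var3         # access by literal name
vars[["var5"]]    # access by a name held in a string

# The name can be computed at run time as well.
i <- 2
vars[[paste0("var", i)]]
```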

R Generic References to Data Frames and Variables

I would like to know how to make a reference to a data frame and variable generic, please. Say I have a data frame named 's' and a variable in that data frame named 'Y'.
Regular R code:
look = s$Y
What I would like to do:
data = s
variable = Y
look = data$variable (which functions the same as look = s$Y)
Any thoughts? The reason I would like to do this is that I have s$Y throughout my code, and later I may want to change s for t (or Y for some other variable), and don't want to have to go through all of my code manually replacing s$Y with t$Y where I need it changed.
Thanks!
This is the reason that the $-operator is considered poor-practice inside function definitions, i.e. it "locks you in" to a particular spelling of a column name. You are not going to do this, however:
variable = Y
Rather you are going to do this:
variable = "Y"
And that is because the first version would have caused the R interpreter to go out and try to find a value for the symbol Y somewhere along what is known as its "search path", which is, roughly speaking, the current environment, its enclosing environments, the global environment, and the attached packages. In the case of the second version, "Y" is its own value and no further searching is needed. With that fundamental confusion corrected, you would now do this:
look <- data[[ variable ]] # although using 'data' as a name is another "poor-practice"
Whereupon R will look for a value of variable and find it in the global environment, returning the character "Y" and delivering a column named "Y" from the dataset s. Column names are not considered first-class objects in R, whereas named dataframes are. The "names" of columns are not true R names (even though they are called colnames). The $-operator is just shorthand for "[[" with a character value. Here's a full transcript to test this:
> s <- data.frame(Y=1:10, X=LETTERS[1:10]); data = s
>
> variable <- "Y"
>
> look1 <- data$Y; look2 <- data[["Y"]]
> identical(look1, look2)
[1] TRUE
The confusion that this "non-standard evaluation" (NSE) shorthand feature of R has caused new users appears to be one of the motivations for the creation of first the ggplot aes function and later the evolution of the dplyr package and the tidyverse bundle of packages. Those packages allow the use of non-quoted names or tokens to refer to column identities.
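For the "change s for t later" part of the question, one possible sketch is a small helper that takes both the data frame and the column name as arguments. get_column and t_alt are hypothetical names introduced here, not something from the question.

```r
# Hypothetical helper: both the data frame and the column name are parameters.
get_column <- function(data, variable) {
  data[[variable]]
}

s     <- data.frame(Y = 1:10, X = LETTERS[1:10])
t_alt <- data.frame(Y = 11:20)  # stand-in for the asker's second data frame

variable <- "Y"
look     <- get_column(s, variable)      # same values as s$Y
look_alt <- get_column(t_alt, variable)  # swap the data frame in one place
```

Changing the data frame or the column then means editing one argument instead of every s$Y in the script.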
In addition to @42-'s answer, you can dynamically reference columns like this:
colName <- "something"
myDataFrame[, colName]
Edit: Since you also asked about dynamically referencing data.frames, @Rich Scriven suggested making a function that takes the data.frame as an argument, which is one working solution. You can also just load the data you need at the top of your script, which is easy to change on the fly if you need:
fileName <- "file1.csv"
data <- read.table(fileName, header = TRUE, sep = ",", stringsAsFactors = FALSE)
As per @42- above, the best choice seems to be the packages referenced. Using a function is close but doesn't seem to allow 'data' and 'variable' to be generic in 'data$variable'.
Thanks everyone!

update a vector using assign in R

I am implementing k-means in R.
In a loop, I am initiating several vectors that will be used to store values that belong to a particular cluster, as seen here:
for(i in 1:k){
assign(paste("cluster",i,sep=""),vector())
}
I then want to add to a particular "cluster" vector, depending on the value I get for the variable getIndex. So if getIndex is equal to 2 I want to add the variable minimumDistance to the vector called cluster2. This is what I am attempting to do:
minimumDistance <- min(distanceList)
getIndex <- match(minimumDistance,distanceList)
clusterName <- paste("cluster",getIndex,sep="")
name <- c(name, minimumDistance)
But obviously the above code does not work because in order to append to a vector that I'm naming I need to use assign as I do when I instantiate the vectors. But I do not know how to use assign, when using paste, when also appending to a vector.
I cannot use the index such as vector[i] because I don't know what index of that particular vector I want to add to.
I need to use the vector <- c(vector,newItem) format but I do not know how to do this in R. Or if there is any other option I would greatly, greatly appreciate it. If I were using Python I would simply use paste and then use append but I can't do that in R. Thank you in advance for your help!
You can do something like this:
out <- list()
for (i in 1:nclust) {
  # assign some data (in this case a list) to a cluster
  assign(paste0("N_", i), list(...))
  # here I put all the clusters data in a list
  # but you could use a similar statement to do further data manipulation
  # ie if you've used a common syntax (here "N_" <index>) to refer to your elements
  # you can use get to retrieve them using the same syntax
  out[[i]] <- get(paste0("N_", i))
}
If you want a more complete code example, emclustr::em_clust_mvn sounds like a similar problem.
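For the append step the question actually asks about, a sketch along these lines may help. It shows a list of clusters indexed by getIndex, followed by the assign/get form for comparison. The distanceList values are made up, and clusters is a name introduced here.

```r
k <- 3

# One list holding all clusters instead of k separate variables.
clusters <- vector("list", k)
names(clusters) <- paste0("cluster", seq_len(k))

distanceList    <- c(4.2, 1.5, 3.1)  # made-up distances
minimumDistance <- min(distanceList)
getIndex        <- match(minimumDistance, distanceList)

# Append to the cluster selected by getIndex.
clusters[[getIndex]] <- c(clusters[[getIndex]], minimumDistance)

# The same append with assign/get, for a separately named variable.
cluster2    <- vector()
clusterName <- paste("cluster", getIndex, sep = "")
assign(clusterName, c(get(clusterName), minimumDistance))
```

The list version avoids building variable names altogether, which is why it is usually the easier one to extend.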

A loop to create a list

I would like to build a list using a loop.
I have a list of variables called:
var1, var2, ... varN
And I would like to easily create a list of length N named listvar with:
unlist(listvar[i])=vari (with i in 1:N)
Is someone inspired ?
The code makes me wonder why the variables var1 … varN exist in the first place: they shouldn’t. Instead, generate the list directly.
That said, you can easily retrieve the value of a variable given by its name using get, or mget for several names at once. This doesn’t even require a loop; you can use R’s vectorised operations.
varnames = paste0('var', 1 : N)
listvar = mget(varnames)
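A self-contained version of the above, with three made-up variables so that it runs as is (N and the values are assumptions for the example):

```r
# Hypothetical existing variables.
var1 <- 10
var2 <- "b"
var3 <- c(TRUE, FALSE)
N <- 3

varnames <- paste0("var", seq_len(N))
listvar  <- mget(varnames)  # looks the names up in the current environment

names(listvar)  # the list elements are named after the variables
listvar[[2]]    # the value of var2
```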
