How can I modify the variable named in a dataframe cell? - r

I'm trying to modify a "variable" variable; that is to say, I wish to modify only that variable whose name matches the text in a cell of a dataframe/matrix.
For example, if matrix1[1,1] == "Rupert", I want to perform an operation on the variable Rupert (say, Rupert <- Rupert + 1). But if matrix1[1,1] == "Paddington", I want to perform the operation on the Paddington variable instead.
I've discovered the assign() function, which allows me to create new variables whose name is that of the text in a matrix, but I haven't been able to figure out how to modify variables in a similar fashion.
Thanks for your attention,
Alistair

Using your example:
var <- matrix1[1,1]
assign(var,get(var)+1)
The get function can be found in the "See also" section of help(assign).
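Below is a self-contained sketch of that answer. The values of Rupert and Paddington and the contents of matrix1 are hypothetical. The helper variable is called target rather than var to avoid confusion with the stats::var() function. The second half shows the named-list alternative that other answers on this page recommend over assign().

```r
# Hypothetical variables and matrix, mirroring the question's example
Rupert <- 10
Paddington <- 20
matrix1 <- matrix(c("Rupert", "Paddington"), nrow = 1)

# get()/assign() in the current environment
target <- matrix1[1, 1]                 # "Rupert"
assign(target, get(target) + 1)         # same effect as Rupert <- Rupert + 1

# List alternative: the names live inside one object, so no assign() is needed
bears <- list(Rupert = 10, Paddington = 20)
target <- matrix1[1, 2]                 # "Paddington"
bears[[target]] <- bears[[target]] + 1
```

With the list version, matrix1 can name any element without new top-level variables being created or overwritten.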

Related

Efficient way of extracting names of a large number of variables in R

It could be a very easy question, given that I am very unfamiliar with R. I know normally one can use deparse(substitute(.)) to extract the name of a variable. However, if I have a long list of variables (let's say it's built without names), how can I extract the name of each variable efficiently? I was thinking about using loops, but the deparse(substitute(.)) method would obviously generate the 'general' variable name we used to denote every item.
Sample code:
countries <- list(austria, belgium, czech, denmark, france, germany, italy, luxemberg, netherlands, poland, swiss)
Suppose I want countryNames to equal list("austria","belgium",...,"swiss"); how should I code this? I tried generating the list with countries <- list(countryA = countryA, countryB = countryB, ...), but it was extremely tedious, and in some cases I might only have an unnamed input list from elsewhere.
countries only holds the values of the individual objects (austria, belgium, etc.), not their names. To keep the names, you need to create a named list when you create countries, which can be done like this:
countries <- list(austria = austria,belgium = belgium....)
However, if this is very tedious you can use tibble::lst which creates the names automatically without explicitly mentioning them.
countries <- tibble::lst(austria,belgium....)
In both cases you can access the names using names(countries).
If the country objects are the only ones loaded in the global environment, we can do this easily with ls() and mget(), which together return a named list of values:
countries <- mget(ls())
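If other objects are also loaded, mget() can be given an explicit character vector of names (or ls(pattern = ...)) instead of all of ls(). The sketch below does that with two hypothetical country objects, then defines a hypothetical base-R helper, named_list(), that names each element after the unevaluated argument, similar in spirit to tibble::lst(). It is an illustration, not tibble's implementation.

```r
# Hypothetical country objects
austria <- 1:3
belgium <- c("a", "b")

# mget() takes a character vector of names and returns a list named after them
countries <- mget(c("austria", "belgium"))
names(countries)

# Hypothetical helper: names each element after the expression passed in.
# Any explicit argument names are overwritten by the deparsed expressions.
named_list <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]   # unevaluated arguments
  labels <- vapply(exprs, function(e) paste(deparse(e), collapse = ""),
                   character(1))
  setNames(list(...), labels)
}
countries2 <- named_list(austria, belgium)
identical(countries, countries2)
```

Both approaches give the same named list here; named_list() avoids typing each name twice, while mget() is handy when the names already exist as strings.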

assignment in R with left side a formula

I have a dataframe created from a loop. The loop examines ordinal regression for a couple dozen outcomes for a given exposure.
At the beginning of the loop, a variable called exposure is defined. Example: exposure <- "MyExposure"
At the end of the routine, I want to actually save the resulting data set I've compiled and to have the name of the saved data object be related to the exposure.
I've had trouble building the left-hand side of the assignment from the variable name.
The name of the new dataframe should be
paste0(exposure,"_imputed_ds")
[1] "MyExposure_imputed_ds"
However, when I try to put this on the left-hand side of an assignment, it fails.
paste0(exposure,"_imputed_ds") <- existing.data.frame
Error in paste0(exposure,"_imputed_ds") <- existing.data.frame : could not find function "paste0<-"
What I wanted was a new data frame named MyExposure_imputed_ds that contained the contents of existing.data.frame.
You can use assign() to set a value for a name you construct with paste0():
assign(paste0('MyExposure', '_imputed_ds'), 5)
Now you have MyExposure_imputed_ds in the environment, with value 5.
I find the use of assign to be generally a warning flag, though! Maybe you want something like this instead...
imputed_ds <- list()
imputed_ds[['MyExposure']] <- 5
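Here is a sketch of that list approach inside a loop like the one in the question. The exposure names and the stand-in data frame are hypothetical. If "save" means writing to disk, saveRDS() takes the file name as a string, so no specially named variable is needed at all; the sketch writes to tempdir() only so it is safe to run.

```r
exposures <- c("MyExposure", "OtherExposure")   # hypothetical exposure names
imputed_ds <- list()

for (exposure in exposures) {
  # stand-in for the data frame the real regression loop compiles
  existing.data.frame <- data.frame(outcome = 1:3, exposure = exposure)

  # keep it in memory under a computed name ...
  imputed_ds[[exposure]] <- existing.data.frame

  # ... and/or write it to disk; the file name is just a string
  saveRDS(existing.data.frame,
          file.path(tempdir(), paste0(exposure, "_imputed_ds.rds")))
}

names(imputed_ds)
restored <- readRDS(file.path(tempdir(), "MyExposure_imputed_ds.rds"))
```

Each result is then reachable as imputed_ds[["MyExposure"]], and the whole collection can be passed to lapply() later.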

R varied length vector or list in variable

I am using R to prepare some data for a D3 visualization. The visualization was created using the following structure (this is a single row from a .csv file that is subsequently converted to JSON in JavaScript).
Joe.Schmoe, joe.schmoe#email.com, Sao Paulo, ["Community01", "Community02", "Community03"], ["workgroup01","workgroup02"]
This is a single row. The headers would be:
Person, Email, Location, Communities, Workgroups
You'll notice that the Communities and Workgroups columns contain lists. Furthermore, these lists will vary in length depending on which Communities and Workgroups each individual is associated with. I recognize that this is probably not best practice with regard to data "tidiness," but it is what this viz is expecting.
So ... in R (which I'm learning), I'm finding it impossible to recreate this structure because, when I try to populate the "communities" or "workgroups" variables, R seems to expect that each variable will be of equal length.
The code I have reads from a data.frame that is a list of the members of a particular community, and adds the name of that community to a column in a master data.frame of all employees. I'm indexing by email address because it is unique. So this particular loop looks at each email address in a data.frame called "commTD" and finds it in a master data.frame called "testr". If it finds it, it looks at the communities variable and either replaces an NA value with the name of the community (in this case "Technical Design"), or, if the vector already exists, appends Technical Design to it:
for(i in commTD$email){
  if(i %in% testr$email){
    tmpList <- testr[which(testr$email ==i) , 'communities']
    if(is.na(tmpList)){
      tmpList <- list(c("Technical Design"))
    }
    else{
      tmpList <- append(tmpList[[1]][1], 'Technical Design')
    }
    testr[which(testr$email ==i) , 'communities'] <- list(tmpList)
  }
}
This works fine for the initial replacement, but if I append a new community to the list, and then try to pass it back into the testr data.frame, I get an error:
Error in `[<-.data.frame`(`*tmp*`, which(testr$email == i), "communities", : replacement has 2 rows, data has 1
You'll note that I'm trying to create a list of vectors, which is just one way I've tried to figure this out. I thought maybe I could force R to see the list as a single object, even though it contains multiple items -- or in this case a vector of multiple items.
Is it simply impossible in R to have variable-length vectors or lists as a single variable in a data frame?
Data frames are by definition a list of vectors of equal length, so an ordinary data.frame() column cannot hold a different number of values in each row.
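That rule applies to each column as a whole, though, and a column may itself be a list (a "list-column") whose elements are character vectors of any length. The sketch below rewrites the question's loop that way. It assumes testr and commTD each have an email column as described; the sample rows are hypothetical. The key change is replacing one list element with [[ ]] instead of assigning a multi-element value into a single cell with [ , ].

```r
# Hypothetical sample data standing in for the question's testr and commTD
testr <- data.frame(email = c("joe.schmoe#email.com", "joe.bloe#email.com"),
                    stringsAsFactors = FALSE)
# A list-column: one character vector per row, starting empty
testr$communities <- rep(list(character(0)), nrow(testr))

commTD <- data.frame(email = c("joe.bloe#email.com", "nobody#email.com"),
                     stringsAsFactors = FALSE)

for (i in commTD$email) {
  idx <- match(i, testr$email)          # NA when the email is not in testr
  if (!is.na(idx)) {
    # [[ ]] replaces a single list element, so the vector's length is irrelevant
    testr$communities[[idx]] <- c(testr$communities[[idx]], "Technical Design")
  }
}

lengths(testr$communities)              # number of communities per person
```

Starting each row with character(0) rather than NA removes the need for the is.na() branch, since c() simply appends to an empty vector.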
You could either use another type of object, such as a data.table, as suggested, or think of your desired output as a list of unequal-length vectors to pass to your JavaScript.
That object would look something like this:
dataList <- list(name = c("Joe.Schmoe", "Joe.Bloe"),
                 email = c("joe.schmoe#email.com", "joe.bloe#email.com"),
                 location = c("Sao Paulo", "London"),
                 Communities = list(c("Community01", "Community02", "Community03"),
                                    c("Community02", "Community05", "Community03")
                 ),
                 Workgroups = list(c("workgroup01","workgroup02"),
                                   c("workgroup01","workgroup03"))
)
Then access each field as you would a data frame column, for output to your JavaScript:
dataList$name
dataList$Communities
etc...
Following Frank's suggestion, if you want to access each entry by its email address, like this:
data_list[["joe.schmoe#email.com"]]
...then build the list with the email addresses as the element names, like so:
data_list = list(`joe.schmoe#email.com`=list(name="Joe",
                                             location="Sao Paulo",
                                             Communities=....),
                 `joe.bloe#email.com`=list(name="Joe", ...))
Then you can avoid the non-R style of using for() loops and start having fun with the lapply() family of functions, which apply a function to every entry in a single call. (See ?lapply for details.)
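As an illustration, here is the question's "append Technical Design for each member of commTD" step written with lapply() over an email-keyed list shaped like the data_list example. The people, communities and the commTD_emails vector are hypothetical.

```r
# Hypothetical data keyed by email address
data_list <- list(
  `joe.schmoe#email.com` = list(name = "Joe.Schmoe", location = "Sao Paulo",
                                Communities = c("Community01", "Community02", "Community03")),
  `joe.bloe#email.com`   = list(name = "Joe.Bloe", location = "London",
                                Communities = c("Community02", "Community05"))
)
commTD_emails <- c("joe.bloe#email.com", "nobody#email.com")  # hypothetical members

# Only touch people who already exist in data_list
hits <- intersect(commTD_emails, names(data_list))
data_list[hits] <- lapply(data_list[hits], function(person) {
  person$Communities <- c(person$Communities, "Technical Design")
  person
})

# One count per person, named by email
vapply(data_list, function(person) length(person$Communities), integer(1))
```

Because each person is a separate list element, their Communities vector can grow without affecting anyone else's.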
Hope it helps.

Accessing the name of a SpatialPolygons object from a list in r

I have created a list of SpatialPolygons objects in R using the code below and wish to run each polygon through a for loop. I would like to access the original name that I assigned to each object so that it can be used within the for loop. This should be really easy, but I can't figure out how to do it with a SpatialPolygons object, as there appears to be no information stored in the object, once it is loaded within the for loop, that links it to this original name. Any help would be great. Thanks!
oblist = c(p1,p2,p3,p4)
for(i in 1:length(oblist)){
obs = oblist[[i]]
obj.nm = #some way to obtain the original object name i.e. p1 for oblist[[1]]
…#etc#
}
Use a list with named components, rather than a vector:
> oblist = list(p1=p1, p2=p2, p3=p3, p4=p4)
> for(i in 1:length(oblist)){
+ print(names(oblist)[i])
+ print(oblist[[i]])
+ }
Note that the name of a variable should rarely be of interest to code. This kind of introspection is discouraged, and very few languages allow it: a variable should not be able to ask what its name is. It's only on rare occasions, like when you call plot(foo,bar) and want the axes to be labelled foo and bar, that you should do it.
Better to have another variable that stores the names of the elements of the objects (and this is how the above code sort of works, by storing their names in the names attribute of a list). This also lets you have names that aren't valid variable names.
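A minimal sketch of the same idea, using character strings as hypothetical stand-ins for the SpatialPolygons objects p1 to p4 (the list handling is the same whatever the element class). It builds the named list with mget() so the names need not be typed twice, and loops over names(oblist) instead of 1:length(oblist), which would misbehave on an empty list because 1:0 is c(1, 0).

```r
# Hypothetical stand-ins for the SpatialPolygons objects p1..p4
p1 <- "polygon 1"; p2 <- "polygon 2"; p3 <- "polygon 3"; p4 <- "polygon 4"

# mget() looks the objects up by name and returns a list named after them
oblist <- mget(paste0("p", 1:4))

results <- character(0)
for (obj.nm in names(oblist)) {       # the loop variable is the original name
  obs <- oblist[[obj.nm]]
  results[obj.nm] <- paste(obj.nm, "->", obs)
}
results
```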

Select a column from a dynamic variable

How can I select the second column of a dynamically named variable?
I create variables of the form "population.USA", "population.Mexico", "population.Canada". Each variable has a column for the year, and another column for the population value. I would like to select the second column from each of these variables during a loop.
I use this syntax:
sprintf("population.%s", country)[, 2]
R returns the error: Error in sprintf("population.%s", country)[, 2] : incorrect number of dimensions
Based on your sequence of questions over the last few minutes, I have two general recommendations for you as you get familiar with R:
Don't use sprintf.
Don't use assign.
Now, obviously, those functions are both useful at times. But you've learned about them too early, before you've mastered some basic stuff about R's data structures. Try to write code without those crutches (for the time being!), as they're just causing you problems.
Rather than creating separate individual variables for each nation's population, place them in a list.
population <- vector("list",3)
names(population) <- c('USA','Mexico','Russia')
Then you can access each using the string representation of the name of each country:
population[['USA']] <- 10000
Or,
region <- 'USA'
population[[region]]
In this example, I've assigned a single value to a list element, but lists will hold any other data type, including matrices or data frames. It will be a lot less typing than using sprintf and assign, and a lot safer and more efficient as well.
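Applied to the original question, a sketch with one data frame per country stored in a single list might look like this. The years and values are arbitrary placeholders, not real population figures.

```r
# Hypothetical data: one data frame per country, stored in a single list
population <- list(
  USA    = data.frame(year = 2000:2002, value = c(1, 2, 3)),
  Mexico = data.frame(year = 2000:2002, value = c(4, 5, 6)),
  Canada = data.frame(year = 2000:2002, value = c(7, 8, 9))
)

# Select the second column of each country's data frame inside a loop
second_cols <- list()
for (country in names(population)) {
  second_cols[[country]] <- population[[country]][, 2]
}

# The same thing without an explicit loop
second_cols2 <- lapply(population, function(d) d[, 2])
```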
See ?get. Here is an example:
> country <- "FOO"
> assign(sprintf("population.%s", country), data.frame(runif(5), runif(5)))
>
> get(sprintf("population.%s", country))[,2]
[1] 0.2241105 0.5640709 0.5945869 0.1830719 0.1895938
It is critically important to look at the object returned by a function if you get an error. It is immediately clear why your example fails if you just look at what it returns:
> sprintf("population.%s", country)
[1] "population.FOO"
At that point it would be immediately clear, if you didn't already know it or hadn't thought to read ?sprintf, that sprintf() returns a string, not the object of that name. Armed with that knowledge, you would have narrowed the problem down to: how do I recall an object from a computed name?
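If separate population.<country> objects already exist, the two answers can be combined: call get() inside the loop, or gather the objects once with mget() (which, like sprintf(), works on a whole vector of names) and then index the resulting list. The objects and values below are hypothetical.

```r
# Hypothetical existing objects named population.<country>
population.USA    <- data.frame(year = 2000:2001, value = c(1, 2))
population.Mexico <- data.frame(year = 2000:2001, value = c(3, 4))
countries <- c("USA", "Mexico")

# Option 1: get() turns the computed name into the object, then select column 2
second_cols <- list()
for (country in countries) {
  second_cols[[country]] <- get(sprintf("population.%s", country))[, 2]
}

# Option 2: collect the objects into one named list, then index it by country
population <- mget(sprintf("population.%s", countries))
names(population) <- countries
population[["Mexico"]][, 2]
```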
