Convert a list of strings to call already set values? - r

Is it possible to convert a list of strings so that it will return the value it's named after?
For example, I have this list of strings that I made with paste:
mylist <- c("nhdata$Credit", "nhdata$Honey", "nhdata$Plants")
mylist
The list I'm working with is a lot bigger (about 35). So is it possible to print these strings in a way that it will actually call the value they are named after?
Appreciate any help, this is my first question on Stack Overflow.

You can use the get function:
temp <- 1:10
get("temp")
In your example, you may do better to use the following, though:
mylist <- c("Credit", "Honey", "Plants")
nhdata[, mylist[1]]
or similarly,
nhdata[[mylist[1]]]
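With about 35 names, the same indexing can be applied to the whole vector at once. The following is a minimal sketch, assuming a made-up `nhdata` (the real data frame is not shown in the question) whose columns match the names in `mylist`:

```r
# Hypothetical stand-in for nhdata; the values are invented for illustration.
nhdata <- data.frame(Credit = c(10, 20), Honey = c(1.5, 2.5), Plants = c(3L, 4L))
mylist <- c("Credit", "Honey", "Plants")

# Select every named column in one step (returns a data frame).
selected <- nhdata[, mylist]

# Or loop over the names and pull each column out as a vector with [[.
column_means <- sapply(mylist, function(nm) mean(nhdata[[nm]]))
```

Here `column_means` is only an example of doing something with each column; any function could take the place of `mean`.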

Related

Paste function to construct existing data frame name and evaluate in R

I am working with a long list of data frames.
Here is a simple hypothetical example of a data frame:
DFrame<-data.frame(c(1,0),c("Yes","No"))
colnames(DFrame)<-c("ColOne","ColTwo")
I am trying to retrieve a specified column of the data frame using paste function.
get(paste("DFrame","$","ColTwo",sep=""))
The get function returns the following error, when trying to retrieve a specified column:
Error in get(paste("DFrame", "$", "ColTwo", sep = "")) : object 'DFrame$ColTwo' not found
When I enter the constructed name of the data frame DFrame$ColTwo it returns the desired output of the second column.
If I reconstruct an example without the '$' sign then I get the desired answer from the get function. For example the code yields 2:
Ans <- 2
get(paste("An","s",sep=""))
[1] 2
I am looking for the same desired outcome, but struggling to get past the error that the object could not be found.
I also attempted using the following format, but the quotation in the column name breaks the paste function:
paste("DFrame","[,"ColTwo"]",sep="")
Thank you very much for the input,
Kind regards
You can do that using the following syntax:
get("DFrame")[,"ColTwo"]
You can use paste() in both of these strings, for example:
get(paste("D", "Frame", sep=""))[,paste("Col", "Two", sep="")]
Edit: Despite someone downvoting this answer without leaving a comment, this does exactly what the original poster asked for. If you feel that it does not or is in some way dangerous, I would encourage you to leave a comment.
Stop trying to use paste and get entirely.
The whole point of having a list (of data frames, say) is that you can reference them using names:
DFrame<-data.frame(c(1,0),c("Yes","No"))
colnames(DFrame)<-c("ColOne","ColTwo")
#A list of data frames
l <- list(DFrame,DFrame)
#The data frames in the list can have names
names(l) <- c("DF1",'DF2')
# Now you just use `[[`
> l[["DF1"]][["ColOne"]]
[1] 1 0
> l[["DF1"]][["ColTwo"]]
[1] Yes No
Levels: No Yes
If you have to, you can use paste to construct the indices passed inside [[.
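As a rough sketch of that last point, the names passed to `[[` can be built with `paste0` (or `paste`) first and then used as ordinary strings. The variable names `df_name` and `col_name` are made up for this example:

```r
DFrame <- data.frame(ColOne = c(1, 0), ColTwo = c("Yes", "No"))
l <- list(DF1 = DFrame, DF2 = DFrame)

# Build the list name and the column name as plain strings.
df_name  <- paste0("DF", 1)
col_name <- paste0("Col", "Two")

# Index with the constructed strings; no get() or eval() is needed.
result <- l[[df_name]][[col_name]]
```

Whether `result` is a character vector or a factor depends on the R version's `stringsAsFactors` default, so convert with `as.character()` if that matters.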

R programming - difference between using lapply and a simple function

I'm not sure that I understand the different outputs in these two scenarios:
(1)
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- strsplit(pioneers, split = ":")
split
(2)
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- lapply(pioneers, strsplit, split = ":")
split
In both cases, the output is a list but I'm not sure when I'd use the one notation (simply applying a function to a vector) or the other (using lapply to loop the function over the vector).
Thanks for the help.
Greg
To me it's to do with how the output is returned. [l]apply stands for list apply - i.e. the output is returned as a list. strsplit already returns a list as, if there were multiple :s in your pioneers vector, it's the only data structure that makes sense - i.e. a list element of each of the 4 elements of the vector and each list element contains a vector of the split string.
So using lapply(x, strsplit, ...) will always return a list inside a list, which you probably don't want in this case.
Using lapply is useful in cases where you expect the result of the function you're applying to be a vector of an undefined or variable length. As strsplit can see this coming already, the use of lapply is redundant, so you should probably know what form you expect/want your answer to be in, and use the appropriate functions to coerce the output into the right data structure.
To make clear, the output of the examples you gave is not the same. One is a list, one is a list of lists. The identical result would be
lapply(pioneers, function(x, split) strsplit(x, split)[[1]], split = ":")
i.e. taking the first list element of the inner list (which is only 1 element anyway) in each case.
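The difference in nesting can be checked directly. This is a small sketch using the question's `pioneers` vector; the names `direct`, `nested` and `flattened` are chosen here only to avoid reusing `split`, which is also the name of a base function:

```r
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")

direct <- strsplit(pioneers, split = ":")          # list of 4 character vectors
nested <- lapply(pioneers, strsplit, split = ":")  # list of 4 lists, each holding 1 vector

first_direct <- direct[[1]]        # a character vector
first_nested <- nested[[1]][[1]]   # the same vector, one level deeper

# Dropping the extra level from each element recovers the direct result.
flattened <- lapply(nested, `[[`, 1)
same <- identical(direct, flattened)
```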

Reorder a string in R using splitstring

I can't figure out what I'm doing wrong. I'm trying to reorder a string, and the easiest way that I could think of doing so was by removing elements and then putting them back in using paste. But I can't figure out how to remove elements. Here's a string:
x <- "the.cow.goes.moo"
But when I use
x <- strsplit(x, '[.]')
resulting in the list "the" "cow" "goes" "moo".
And try to remove the second element using either
x <- x[-2]
or
x <- x[x != "cow"]
I get the exact same list. But when I declare x as
x <- list("the", "cow", "goes", "moo")
then
x <- x[-2]
works!
What's different? What am I doing wrong? Also, is there an easier way to reorder a string?
EDIT: I just realized that what I need is "moo.goes.the.cow", but I need to repeat this same change for a number of other strings. So I need to reorder the elements, and can't actually delete them. How can I do that?
strsplit returns a list object. So each element of the vector x will now be broken out into individual pieces in a list. Lists can be painful to subset in this fashion but it's good to get your head around it early.
In your example, it would be:
x[[1]][-2]
For your update, you can reorder like so:
x[[1]][c(2,1,3,4)] # or whatever order you want.
x[[1]][sample(length(x[[1]]))] # or in random order
Add this line
x<-unlist(x)
x <- x[-2]
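For the edit in the question (turning "the.cow.goes.moo" into "moo.goes.the.cow" for many strings), one possible approach is to split, reorder by index, and paste the pieces back together. This is a sketch only; `reorder_parts` is a hypothetical helper name, and it assumes every string has four dot-separated pieces:

```r
x <- "the.cow.goes.moo"

# Split on a literal dot and take the vector out of the list.
pieces <- strsplit(x, ".", fixed = TRUE)[[1]]

# Reorder by position, then rejoin with dots.
reordered <- paste(pieces[c(4, 3, 1, 2)], collapse = ".")

# Hypothetical helper that applies the same reordering to several strings.
reorder_parts <- function(s, idx) {
  vapply(strsplit(s, ".", fixed = TRUE),
         function(p) paste(p[idx], collapse = "."),
         character(1))
}
many <- reorder_parts(c("the.cow.goes.moo", "a.b.c.d"), c(4, 3, 1, 2))
```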

Processing files in a particular order in R

I have several datafiles, which I need to process in a particular order. The pattern of the names of the files is, e.g. "Ad_10170_75_79.txt".
Currently they are sorted according to the first numbers (which differ in length), see below:
f <- as.matrix (list.files())
f
[1] "Ad_10170_75_79.txt" "Ad_10345_76_79.txt" "Ad_1049_25_79.txt" "Ad_10531_77_79.txt"
But I need them to be sorted by the middle number, like this:
> f
[1] "Ad_1049_25_79.txt" "Ad_10170_75_79.txt" "Ad_10345_76_79.txt" "Ad_10531_77_79.txt"
As I just need the middle number of the filename, I thought the easiest way is, to get rid of the rest of the name and renaming all files. For this I tried using strsplit (plyr).
f2 <- strsplit (f,"_79.txt")
But I'm sure there is a way to sort the files directly, without renaming all files. I tried using sort and to describe the name with regex but without success. This has been a problem for many days, and I spent several hours searching and trying, to solve this presumably easy task. Any help is very much appreciated.
old example dataset:
f <- c("Ad_10170_75_79.txt", "Ad_10345_76_79.txt",
"Ad_1049_25_79.txt", "Ad_10531_77_79.txt")
Thank you for your answers. I think I have to modify my example, because the solution should work for all possible middle numbers, independent of their digits.
new example dataset:
f <- c("Ad_10170_75_79.txt", "Ad_10345_76_79.txt",
"Ad_1049_9_79.txt", "Ad_10531_77_79.txt")
Here's a regex approach.
f[order(as.numeric(gsub('Ad_\\d+_(\\d+)_\\d+\\.txt', '\\1', f)))]
# [1] "Ad_1049_9_79.txt" "Ad_10170_75_79.txt" "Ad_10345_76_79.txt" "Ad_10531_77_79.txt"
Try this:
f[order(as.numeric(unlist(lapply(strsplit(f, "_"), "[[", 3))))]
[1] "Ad_1049_25_79.txt" "Ad_10170_75_79.txt" "Ad_10345_76_79.txt" "Ad_10531_77_79.txt"
First we split by _, then select the third element of every list element, find the order and subset f based on that order.
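The same steps can be written out one at a time, here on the question's new example dataset where one middle number has a single digit. This is a sketch with intermediate names (`parts`, `middle`, `sorted`) introduced only for illustration:

```r
f <- c("Ad_10170_75_79.txt", "Ad_10345_76_79.txt",
       "Ad_1049_9_79.txt", "Ad_10531_77_79.txt")

# Each element splits into: "Ad", first number, middle number, "79.txt".
parts <- strsplit(f, "_")

# Take the third piece of every element and convert it to a number.
middle <- as.numeric(sapply(parts, `[[`, 3))

# Order the file names by that number.
sorted <- f[order(middle)]
```

The conversion with `as.numeric` matters: ordering the pieces as character strings would compare them digit by digit rather than by value.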
I would create a small dataframe containing filenames and their respective extracted indices:
f<- c("Ad_10170_75_79.txt","Ad_10345_76_79.txt","Ad_1049_25_79.txt","Ad_10531_77_79.txt")
f2 <- strsplit (f,"_79.txt")
mydb <- as.data.frame(cbind(f,substr(f2,start=nchar(f2)-1,nchar(f2))))
names(mydb) <- c("filename","index")
library(plyr)
arrange(mydb,index)
Take the first column of this as your filename vector.
ADDENDUM:
If a numeric index is required, simply convert character to numeric:
mydb$index <- as.numeric(mydb$index)

Extracting out numbers in a list from R

I am reading this from a CSV file, and I need to write a function that produces a final data frame. Given a particular entry, I have
x
[1] {2,4,5,11,12}
139 Levels: {1,2,3,4,5,6,7,12,17} ...
I can change it to
x2<-as.character(x)
which gives me
x2
[1] "{2,4,5,11,12}"
How do I extract 2,4,5,11,12 out (a vector with 5 elements)?
I have tried various approaches, like gsub, but to no avail.
Can anyone please help me?
It sounds like you're trying to import a database table that contains arrays. Since R doesn't know about such data structures, it treats them as text.
Try this. I assume the column in question is x. The result will be a list, with each element being the vector of array values for that row in the table.
dat <- read.csv("<file>", stringsAsFactors=FALSE)
dat$x <- strsplit(gsub("\\{(.*)\\}", "\\1", dat$x), ",")
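To see the effect without a CSV file, here is a small sketch on made-up strings in the same format. It strips the braces with a character class instead of a capture group and also converts the pieces to numbers, which the code above leaves as character:

```r
# Invented values for illustration, in the same "{a,b,c}" format.
x <- c("{2,4,5,11,12}", "{1,2,3}")

# Remove the braces, split on commas, and convert each piece to numeric.
inner  <- gsub("[{}]", "", x)
values <- lapply(strsplit(inner, ","), as.numeric)

first_length <- length(values[[1]])
```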
