Using lists in R - r

Sorry for possibly a complete noob question but I have just started programming with R today and I am stuck already.
I am reading some data from a file which is in the format.
3.482373 8.0093238198371388 47.393873
0.32 20.3131 31.313
What I want to do is split each line then deal with each of the individual numbers.
I have imported the stringr package and using
x = str_split(line, " ")
This produces a list which I would like to index but don't know how.
I have learnt that x[[1:2]] gets the second element but that is about it. Ideally I would like something like
x1 = x[1]
x2 = x[2]
x3 = x[3]
But can't find anyway of doing this.
Thanks in advance

By using unlist you will get a vector instead of a list of vectors, and you will then be able to index it directly :
R> unlist(str_split("foo bar baz", " "))
[1] "foo" "bar" "baz"
But maybe you should read your file directly from read.table or one of its variant ?
And if you are beginning with R, you really should read one of the introduction available if you want to understand subsetting, indexing, etc.

you can wrap your call to str_split with unlist to get the behavior you're looking for.

The usual way to get this in would be to import it into a dataframe (a special sort of list). If file name is "fil.dat"" and is in "C:/dir/"
dfrm <- read.table("C:/dir/fil.dat") # resist the temptation to use backslashes
dfrm[2,2] # would give you the second item on the second row.
By default the field separator in R is "white-space" and that seems to be what you have, so you do not need to supply a sep= argument and the read.table function will attempt to import as numeric. To be on the safe side, you might consider forcing that option with colClasses=rep("numeric", 3) because if it encounters a strange item (such as often produced by Excel dumps), you will get a factor variable and will probably not understand how to recover gracefully.

Related

Rename a column with R

I'm trying to rename a specific column in my R script using the colnames function but with no sucess so far.
I'm kinda new around programming so it may be something simple to solve.
Basically, I'm trying to rename a column called Reviewer Overall Notes and name it Nota Final in a data frame called notas with the codes:
colnames(notas$`Reviewer Overall Notes`) <- `Nota Final`
and it returns to me:
> colnames(notas$`Reviewer Overall Notes`) <- `Nota Final`
Error: object 'Nota Final' not found
I also found in [this post][1] a code that goes:
colnames(notas) [13] <- `Nota Final`
But it also return the same message.
What I'm doing wrong?
Ps:. Sorry for any misspeling, English is not my primary language.
You probably want
colnames(notas)[colnames(notas) == "Reviewer Overall Notes"] <- "Nota Final"
(#Whatif's answer shows how you can do this with the numeric index, but probably better practice to do it this way; working with strings rather than column indices makes your code both easier to read [you can see what you're renaming] and more robust [in case the order of columns changes in the future])
Alternatively,
notas <- notas %>% dplyr::rename(`Nota Final` = `Reviewer Overall Notes`)
Here you do use back-ticks, because tidyverse (of which dplyr is a part) prefers its arguments to be passed as symbols rather than strings.
Why using backtick? Use the normal quotation mark.
colnames(notas)[13] <- 'Nota Final'
This seems to matter:
df <- data.frame(a = 1:4)
colnames(df)[1] <- `b`
Error: object 'b' not found
You should not use single or double quotes in naming:
I have learned that we should not use space in names. If there are spaces in names (it works and is called a non-syntactic name: And according to Wickham Hadley's description in Advanced R book this is due to historical reasons:
"You can also create non-syntactic bindings using single or double quotes (e.g. "_abc" <- 1) instead of backticks, but you shouldn’t, because you’ll have to use a different syntax to retrieve the values. The ability to use strings on the left hand side of the assignment arrow is an historical artefact, used before R supported backticks."
To get an overview what syntactic names are use ?make.names:
make.names("Nota Final")
[1] "Nota.Final"

Referencing recently used objects in R

My question refers to redundant code and a problem that I've been having with a lot of my R-Code.
Consider the following:
list_names<-c("putnam","einstein","newton","kant","hume","locke","leibniz")
combined_df_putnam$fu_time<-combined_df_putnam$age*365.25
combined_df_einstein$fu_time<-combined_einstein$age*365.25
combined_df_newton$fu_time<-combined_newton$age*365.25
...
combined_leibniz$fu_time<-combined_leibniz$age*365.25
I am trying to slim-down my code to do something like this:
list_names<-c("putnam","einstein","newton","kant","hume","locke","leibniz")
paste0("combined_df_",list_names[0:7]) <- data.frame("age"=1)
paste0("combined_df_",list_names[0:7]) <- paste0("combined_df_",list_names[0:7])$age*365.25
When I try to do that, I get "target of assignment expands to non-language object".
Basically, I want to create a list that contains descriptors, use that list to create a list of dataframes/lists and use these shortcuts again to do calculations. Right now, I am copy-pasting these assignments and this has led to various mistakes because I failed to replace the "name" from the previous line in some cases.
Any ideas for a solution to my problem would be greatly appreciated!
The central problem is that you are trying to assign a value (or data.frame) to the result of a function.
In paste0("combined_df_",list_names[0:7]) <- data.frame("age"=1), the left-hand-side returns a character vector:
> paste0("combined_df_",list_names[0:7])
[1] "combined_df_putnam" "combined_df_einstein" "combined_df_newton"
[4] "combined_df_kant" "combined_df_hume" "combined_df_locke"
[7] "combined_df_leibniz"
R will not just interpret these strings as variables that should be created and be referenced to. For that, you should look at the function assign.
Similarily, in the code paste0("combined_df_",list_names[0:7])$age*365.25, the paste0 function does not refer to variables, but simply returns a character vector -- for which the $ operator is not accepted.
There are many ways to solve your problem, but I will recommend that you create a function that performs the necessary operations of each data frame. The function should then return the data frame. You can then re-use the function for all 7 philosophers/scientists.

Combining many vectors into one larger vector (in an automated way)

I have a list of identifiers as follows:
url_num <- c('85054655', '85023543', '85001177', '84988480', '84978776', '84952756', '84940316', '84916976', '84901819', '84884081', '84862066', '84848942', '84820189', '84814935', '84808144')
And from each of these I'm creating a unique variable:
for (id in url_num){
assign(paste('test_', id, sep = ""), FUNCTION GOES HERE)
}
This leaves me with my variables which are:
test_8505465, test_85023543, etc, etc
Each of them hold the correct output from the function (I've checked), however my next step is to combine them into one big vector which holds all of these created variables as a seperate element in the vector. This is easy enough via:
c(test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144)
However, as I update the original 'url_num' vector with new identifiers, I'd also have to come down to the above chunk and update this too!
Surely there's a more automated way I can setup the above chunk?
Maybe some sort of concat() function in the original for-loop which just adds each created variable straight into an empty vector right then and there?
So far I've just been trying to list all the variable names and somehow get the output to be in an acceptable format to get thrown straight into the c() function.
for (id in url_num){
cat(as.name(paste('test_', id, ",", sep = "")))
}
...which results in:
test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144,
This is close to the output I'm looking for but because it's using the cat() function it's essentially a print statement and its output can't really get put anywhere. Not to mention I feel like this method I've attempted is wrong to begin with and there must be something simpler I'm missing.
Thanks in advance for any help you guys can give me!
Troy

Trouble interpolating column name for subset in R function

I have (fbodata) 'data.frame': 6181090 obs. of 41 variables:
I want to subset it and save the portion that pertains to a specific subset (like a zip). My approach seems to work when it is not in a function, but I ultimately want to use sapply.
nmakedir <- function(item, ccol) {
snipped a bunch of code that works
trim<- fbodata[ which(paste(ccol)==item),]
trim%>% drop_na(paste(ccol))
trim<- droplevels(trim
save(trim, file = paste(item, "rda", sep="."))
}
The line that doesn't work is one where I creating the subset with which. If i hardcode the line using fbodata$zip instead of paste(ccol) it works fine. Eventually, I plan to call it with something like:
sapply(unique(fbodata$zip),zip, FUN = nmakedir)
I appreciate any clues, I have been on this for a good long while.
A few things going on:
ccol is a string. paste(ccol) is the same string. You never need to call paste with only one argument. (You can use paste to coerce non-strings to strings, but in that case you should use as.character() to be clear.)
Keeping in mind that ccol is a string, what is fbodata$zip? It's a column! What is the equivalent using ccol and brackets? fbodata[[ccol]] or fbodata[, ccol]. You can use either of those interchangeably with fbodata$zip. So, this bad line
fbodata[ which(paste(ccol)==item),]
# should be this:
fbodata[which(fbodata[[ccol]] == item), ]
drop_na, like most dplyr functions, expects (quoting from the help) "bare variable names", not strings. Also from the help, "See Also: drop_na_ for a version that uses regular evaluation and is suitable for programming with". In this case, I don't think you need to do anything more than replace drop_na with drop_na_.
You are missing a right parenthesis on your droplevels command.
There might be more, but this is much as I can see without any sample data. Your sapply call looks funny to me because I thought zip is supposed to be a column name, but when you call sapply(unique(fbodata$zip),zip, FUN = nmakedir) it needs to be an object in your global environment. I would think sapply(unique(fbodata$zip), 'zip', FUN = nmakedir) makes more sense, but without a reproducile example there's no way to know.
It also seems like you're coding your own version of split. I would probably start this off with fbo_split = split(fbodata, fbodata$zip) and then use lapply to drop_na_, droplevels, and save, but maybe your snipped code makes that a less good idea.

Assigning and removing objects in a loop: eval(parse(paste(

I am looking to assign objects in a loop. I've read that some form of eval(parse( is what I need to perform this, but I'm running into errors listing invalid text or no such file or directory. Below is sample code of generally what I'm attempting to do:
x <- array(seq(1,18,by=1),dim=c(3,2,3))
for (i in 1:length(x[1,1,])) {
eval(parse(paste(letters[i],"<-mean(x[,,",i,"])",sep="")
}
And when I'm finished using these objects, I would like to remove them (the actual objects are very large and cause memory problems later on...)
for (i in 1:length(x[1,1,])) eval(parse(paste("rm(",letters[i],")",sep="")))
Both eval(parse(paste( portions of this script return errors for invalid text or no such file or directory. Am I missing something in using eval(parse(? Is there a easier/better way to assign objects in a loop?
That's a pretty disgusting and frustrating way to go about it. Use assign to assign and rm's list argument to remove objects.
> for (i in 1:length(x[1,1,])) {
+ assign(letters[i],mean(x[,,i]))
+ }
> ls()
[1] "a" "b" "c" "i" "x"
> a
[1] 3.5
> b
[1] 9.5
> c
[1] 15.5
> for (i in 1:length(x[1,1,])) {
+ rm(list=letters[i])
+ }
> ls()
[1] "i" "x"
>
Whenever you feel the need to use parse, remember fortune(106):
If the answer is parse() you should
usually rethink the question.
-- Thomas Lumley, R-help (February 2005)
Although it seems there are better ways to handle this, if you really did want to use the "eval(parse(paste(" approach, what you're missing is the text flag.
parse assumes that its first argument is a path to a file which it will then parse. In your case, you don't want it to go reading a file to parse, you want to directly pass it some text to parse. So, your code, rewritten (in what has been called disgusting form above) would be
letters=c('a','b','c')
x <- array(seq(1,18,by=1),dim=c(3,2,3))
for (i in 1:length(x[1,1,])) {
eval(parse(text=paste(letters[i],"<-mean(x[,,",i,"])",sep="")))
}
In addition to not specifying "text=" you're missing a few parentheses on the right side to close your parse and eval statements.
It sounds like your problem has been solved, but for people who reach this page who really do want to use eval(parse(paste, I wanted to clarify.
Very bad idea; you should never use eval or parse in R, unless you perfectly know what you are doing.
Variables can be created using:
name<-"x"
assign(name,3) #Eqiv to x<-3
And removed by:
name<-"x"
rm(list=name)
But in your case, it can be done with simple named vector:
apply(x,3,mean)->v;names(v)<-letters[1:length(v)]
v
v["b"]
#Some operations on v
rm(v)
It is best to avoid using either eval(paste( or assign in this case. Doing either creates many global variables that just cause additional headaches later on.
The best approach is to use existing data structures to store your objects, lists are the most general for these types of cases.
Then you can use the [ls]apply functions to do things with the different elements, usually much quicker than looping through global variables. If you want to save all the objects created, you have just one list to save/load. When it comes time to delete them, you just delete 1 single object and everything is gone (no looping). You can name the elements of the list to refer to them by name later on, or by index.

Resources