How to correctly index an array? - r

Please download the file to your computer and run:
http://freeuploadfiles.com/bb3cwypih2d2
data=read.table("path/to/file", sep="|",quote='',
head=T,blank.lines.skip=T,as.is=T)
ddata=array(data,dim=c(nrow(data),ncol(data)))
ddata[1,1]
I want to extract the first element of the first column. The answer should be AAC.
How do I do that?

Some suggestions to clean your code and make life easier in the long term:
Work with the data in a data.frame, not an array.
Never refer to TRUE as T. TRUE is a reserved word that can never be redefined, whereas T is an ordinary variable that can take any value, including FALSE.
Use the <- symbol for assignment.
Don't use abbreviated argument names. The argument is header, not head; head only works through partial matching, and that might bite you.
Arrays can only contain a single class of object. Thus converting your data to an array will implicitly convert the numeric column to character, which is surely a bad thing.
You then index the data frame like this:
dat <- read.table("nasdaqlisted.txt", sep="|", quote='',
header=TRUE, blank.lines.skip=TRUE, as.is=TRUE)
dat$Symbol[1]
[1] "AAC"
The following alternative ways of indexing also return the same element:
dat[1, "Symbol"]
dat[1, 1]
dat[, 1][1]
dat[["Symbol"]][1]
If you really want to do the foolish thing and convert your data to an array, then use as.matrix():
mdat <- as.matrix(dat)
mdat[1, 1]
Symbol
"AAC"
Disclaimer: I only post this because you asked. Arrays and matrices are powerful and fast, but they are not appropriate for this data.
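The upload link in the question may no longer work, so here is a minimal self-contained sketch that reads a small pipe-delimited excerpt from a string instead of a file. The rows and the column names Name and LotSize are hypothetical stand-ins, not the contents of the original file. The sketch shows the data-frame indexing from the answer. It also shows the character coercion that as.matrix() applies when columns have mixed types.

```r
# Hypothetical excerpt in the same pipe-delimited layout (not the real file)
txt <- "Symbol|Name|LotSize
AAC|Example One|100
AAD|Example Two|100"

dat <- read.table(text = txt, sep = "|", quote = "",
                  header = TRUE, blank.lines.skip = TRUE, as.is = TRUE)

dat$Symbol[1]            # first element of the Symbol column
dat[1, "Symbol"]         # same element, indexed by row and column name
is.numeric(dat$LotSize)  # the numeric column stays numeric in the data frame

# as.matrix() must use one type for every cell, so the numbers become strings
mdat <- as.matrix(dat)
is.character(mdat[, "LotSize"])
```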

Related

"Named tuples" in r

If you load the pracma package into the R console and type
gammainc(2,2)
you get
lowinc uppinc reginc
0.5939942 0.4060058 0.5939942
This looks like some kind of a named tuple or something.
But I can't work out how to extract the number below lowinc, namely 0.5939942. The code (gammainc(2,2))[1] doesn't work; we just get
lowinc
0.5939942
which isn't a number.
How is this done?
As can be checked with str(gammainc(2,2)[1]) and class(gammainc(2,2)[1]), the output mentioned in the question is in fact a number; it is just a named number. The names are stored as an attribute of the vector and are meant to make the output easier to read.
The function unname() can be used to obtain the numerical vector without names:
unname(gammainc(2,2))
#[1] 0.5939942 0.4060058 0.5939942
To select the first entry, one can use:
unname(gammainc(2,2))[1]
#[1] 0.5939942
In this specific case, a clearer version of the same might be:
unname(gammainc(2,2)["lowinc"])
Double brackets strip the names:
gammainc(2,2)[[1]]
gammainc(2,2)[["lowinc"]]
I don't claim it is intuitive or obvious, but it is mentioned in the manual:
For vectors and matrices the [[ forms are rarely used, although they have some slight semantic differences from the [ form (e.g. it drops any names or dimnames attribute, and that partial matching is used for character indices).
The partial matching can be employed like this:
gammainc(2, 2)[["low", exact=FALSE]]
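pracma may not be installed everywhere, so this minimal sketch uses a hand-built named vector instead. The vector res is hypothetical; its values are copied from the printed gammainc(2, 2) output above. The sketch compares single brackets, double brackets, unname(), and partial matching side by side.

```r
# Hypothetical named vector with the same shape as the printed gammainc(2, 2) result
res <- c(lowinc = 0.5939942, uppinc = 0.4060058, reginc = 0.5939942)

res[1]                        # single bracket keeps the name
res[[1]]                      # double bracket drops it
res[["lowinc"]]               # same, selected by name
unname(res["lowinc"])         # same, via unname()
res[["low", exact = FALSE]]   # partial matching on the name
```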
In R, vectors may have a names attribute. For example:
vector <- c(1, 2, 3)
names(vector) <- c("first", "second", "third")
If you print vector, you get output in the same named format:
vector
first second third
1 2 3
To check what type of object a function returns, you can use:
class(your_function())
I hope this helps.

Convert a list of strings to call already set values?

Is it possible to convert a list of strings so that it will return the value it's named after?
For example, I have this list of strings that I made with paste:
mylist <- c("nhdata$Credit", "nhdata$Honey", "nhdata$Plants")
mylist
The list I'm working with is a lot bigger (about 35 elements). Is it possible to evaluate these strings so that they actually return the values they refer to?
I'd appreciate any help; this is my first question on Stack Overflow.
You can use the get function:
temp <- 1:10
get("temp")
In your example, you may do better to use the following, though:
mylist <- c("Credit", "Honey", "Plants")
nhdata[, mylist[1]]
or similarly,
nhdata[[mylist[1]]]
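Here is a minimal sketch of the column-name approach applied to the whole vector at once. The small nhdata data frame is a hypothetical stand-in for the asker's data. get() only resolves object names, so a string such as "nhdata$Credit" cannot be looked up this way. Keeping just the column names and indexing the data frame with them avoids the problem.

```r
# Hypothetical stand-in for the asker's nhdata
nhdata <- data.frame(Credit = c(10, 20), Honey = c(1, 2), Plants = c(5, 7))
mylist <- c("Credit", "Honey", "Plants")

# get() looks up an object by its name; the column is extracted afterwards
get("nhdata")[["Credit"]]

# Extract every listed column at once as a data frame
nhdata[, mylist]

# Or as a named list of vectors, one per column name
cols <- lapply(setNames(mylist, mylist), function(nm) nhdata[[nm]])
cols$Honey
```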

Paste function to construct existing data frame name and evaluate in R

I am working with a long list of data frames.
Here is a simple hypothetical example of a data frame:
DFrame<-data.frame(c(1,0),c("Yes","No"))
colnames(DFrame)<-c("ColOne","ColTwo")
I am trying to retrieve a specified column of the data frame using the paste function.
get(paste("DFrame","$","ColTwo",sep=""))
The get function returns the following error when trying to retrieve the specified column:
Error in get(paste("DFrame", "$", "ColTwo", sep = "")) : object 'DFrame$ColTwo' not found
When I type the constructed expression DFrame$ColTwo directly, it returns the desired second column.
If I reconstruct an example without the '$' sign, then I get the desired answer from the get function. For example, this code yields 2:
Ans <- 2
get(paste("An","s",sep=""))
[1] 2
I am looking for the same desired outcome, but struggling to get past the error that the object could not be found.
I also tried the following format, but the quotation marks around the column name break the paste call:
paste("DFrame","[,"ColTwo"]",sep="")
Thank you very much for the input,
Kind regards
You can do that using the following syntax:
get("DFrame")[,"ColTwo"]
You can use paste() in both of these strings, for example:
get(paste("D", "Frame", sep=""))[,paste("Col", "Two", sep="")]
Edit: Despite someone downvoting this answer without leaving a comment, this does exactly what the original poster asked for. If you feel that it does not or is in some way dangerous, I would encourage you to leave a comment.
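The same idea extends to several data frames that share a naming pattern. This is a minimal sketch; DF1 and DF2 are hypothetical objects. get() resolves one name at a time. mget() resolves a whole vector of names and returns a named list, which leads directly to the list-based approach in the next answer.

```r
# Hypothetical data frames that share a naming pattern
DF1 <- data.frame(ColOne = c(1, 0), ColTwo = c("Yes", "No"), stringsAsFactors = FALSE)
DF2 <- data.frame(ColOne = c(2, 3), ColTwo = c("No", "No"), stringsAsFactors = FALSE)

col <- paste0("Col", "Two")

# get() resolves one object name; the column is extracted afterwards
get(paste0("DF", 1))[[col]]

# mget() resolves several names and returns a named list of data frames
dfs <- mget(paste0("DF", 1:2))
lapply(dfs, `[[`, col)
```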
Stop trying to use paste and get entirely.
The whole point of having a list (of data frames, say) is that you can reference them using names:
DFrame<-data.frame(c(1,0),c("Yes","No"))
colnames(DFrame)<-c("ColOne","ColTwo")
#A list of data frames
l <- list(DFrame,DFrame)
#The data frames in the list can have names
names(l) <- c("DF1",'DF2')
# Now you just use `[[`
> l[["DF1"]][["ColOne"]]
[1] 1 0
> l[["DF1"]][["ColTwo"]]
[1] Yes No
Levels: No Yes
If you have to, you can use paste to construct the indices passed inside [[.
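Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE. On current R, the ColTwo call above therefore prints a character vector ([1] "Yes" "No") with no Levels line. The minimal sketch below sets that argument explicitly so the result does not depend on the R version. It builds both indices with paste0() inside [[; the loop variable names are illustrative only.

```r
DFrame <- data.frame(ColOne = c(1, 0), ColTwo = c("Yes", "No"),
                     stringsAsFactors = FALSE)  # explicit, so output is version-independent

# A named list of data frames
l <- list(DF1 = DFrame, DF2 = DFrame)

# Build both indices with paste0() and pass them to [[
i <- 1
l[[paste0("DF", i)]][[paste0("Col", "Two")]]

# Loop over all data frames by name
for (nm in names(l)) print(l[[nm]][["ColOne"]])
```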

How to check if any words in a list of phrases are contained in a list in R?

I have a data frame with a column called listA, and a separate vector listB. I want to pull out only those rows of the data frame whose listA value matches an entry in listB, so I have:
newData <- mydata[mydata$listA %in% listB,]
However, some entries of listA are in the format "ABC /// DEF", where both ABC and DEF are possible entries in listB.
I want to pull out the rows of the data frame which have a listA for which any of the words match to an entry in listB. So if listB had "ABC" in it, that entry would be included in newData. I found the strsplit function, but things like
strsplit(mydata$listA," ") %in% listB
always returns FALSE, presumably because it's checking if the whole list returned by strsplit is an entry in listB.
match(word_vector, target_vector) allows both arguments to be vectors, which is what you want (note: vectors, not lists). In fact, the %in% operator is defined in terms of match(): x %in% table is match(x, table, nomatch = 0) > 0, as its help page tells you.
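Here is a small sketch of the match()/%in% relationship described in the answer. The vectors x and targets are hypothetical examples.

```r
x <- c("ABC", "DEF", "XYZ")
targets <- c("ABC", "GHI", "DEF")

match(x, targets)                     # positions in targets, NA when absent
x %in% targets                        # logical membership test
match(x, targets, nomatch = 0) > 0    # how %in% is defined
```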
But the stringi package's stri_match_* functions may well do what you want directly; they are all vectorized and are usually much faster than equivalent base R string handling:
stri_match_all, stri_match_all_regex, stri_match_first, stri_match_first_regex, stri_match_last, stri_match_last_regex
Also, you probably won't need an explicit split function, but if you must split, use stringi::stri_split_*() rather than base::strsplit().
Note on performance: avoid splitting strings in R whenever possible. It allocates many small objects, which adds memory and garbage-collection overhead, as gc() will show you. That's yet another reason why stringi is very efficient.
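For comparison, here is a minimal base-R sketch of the row filter the question asks for. It uses strsplit() despite the performance note above, trading some speed for having no package dependency. It assumes " /// " is the only separator inside listA. The mydata and listB values are hypothetical examples in the format described in the question. Each row is kept if any of its words occurs in listB.

```r
# Hypothetical data in the format described in the question
mydata <- data.frame(listA = c("ABC /// DEF", "GHI", "XYZ /// ABC", "JKL"),
                     stringsAsFactors = FALSE)
listB <- c("ABC", "GHI")

# Split each entry into its words; fixed = TRUE treats " /// " literally
words <- strsplit(mydata$listA, " /// ", fixed = TRUE)

# Keep a row if any of its words occurs in listB
keep <- vapply(words, function(w) any(w %in% listB), logical(1))
newData <- mydata[keep, , drop = FALSE]
newData
```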

Extracting out numbers in a list from R

I am reading this from a CSV file, and I need to write a function that produces a final data frame. Given a particular entry, I have
x
[1] {2,4,5,11,12}
139 Levels: {1,2,3,4,5,6,7,12,17} ...
I can change it to
x2 <- as.character(x)
which gives me
x2
[1] "{2,4,5,11,12}"
How do I extract 2, 4, 5, 11, 12 (five elements)?
I have tried various approaches, like gsub, but to no avail.
Can anyone please help me?
It sounds like you're trying to import a database table that contains arrays. Since R doesn't know about such data structures, it treats them as text.
Try this. I assume the column in question is x. The result will be a list, with each element being the vector of array values for that row in the table.
dat <- read.csv("<file>", stringsAsFactors=FALSE)
dat$x <- strsplit(gsub("\\{(.*)\\}", "\\1", dat$x), ",")
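The strsplit() call in the answer returns character vectors. If numbers are needed, as in the question, the pieces can be converted with as.numeric(). The minimal sketch below strips the braces with a simpler character-class pattern. The x_col vector is a hypothetical stand-in for the dat$x column.

```r
# The string from the question, after as.character()
x2 <- "{2,4,5,11,12}"

# Remove the braces, split on commas, and convert the pieces to numbers
vals <- as.numeric(strsplit(gsub("[{}]", "", x2), ",")[[1]])
vals
length(vals)

# For a whole column (hypothetical stand-in for dat$x), one numeric vector per row
x_col <- c("{2,4,5,11,12}", "{1,2,3}")
nums <- lapply(strsplit(gsub("[{}]", "", x_col), ","), as.numeric)
nums
```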
