As I understand it, a vector can be written as a sum of basis vectors, each scaled by one of the vector's components. For example, <1,2,3> = 1<1,0,0> + 2<0,1,0> + 3<0,0,1>, which gives the i, j and k basis vectors for three-dimensional space. So the elements of the vector <1,2,3>, i.e. 1, 2 and 3, are scalar multipliers for the basis vectors.
Given this interpretation, is it possible to write a vector like this, for example:
<{a,b},{c,d},{e,f},{g,h}>
But if we write it like this, how does the operation work? You no longer get scalar multiplication when adding up the basis vectors. The basis vectors here would be <1,0,0,0>...<0,0,0,1>, right? Since four elements imply a four-dimensional space?
A clarification would be much appreciated thanks!!
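For reference, the decomposition described above can be checked numerically. This is only a small sketch in R; the names e1, e2, e3 and v are made up for illustration.

```r
# Standard basis vectors of three-dimensional space (i, j, k)
e1 <- c(1, 0, 0)
e2 <- c(0, 1, 0)
e3 <- c(0, 0, 1)

# The components 1, 2, 3 act as scalar multipliers of the basis vectors
v <- 1 * e1 + 2 * e2 + 3 * e3
v
# [1] 1 2 3
```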
"Alice" is a character vector of length 1. "Bob" is also a character vector of length 1, but it's clearly shorter. At face value, it appears that R's character vectors are made out of something smaller than characters, but if you try to subset them, say "Alice"[1], you'll just get the original vector back. How does R internally make sense of this? What are character vectors actually made of?
You're mistaking vector length for string length.
In R, ordinary values are all vectors, so "Alice" and "Bob" are each a character vector containing one string, whether or not you assign them to a name.
If you want to check the size of each string, use the nchar function:
nchar("Alice")
[1] 5
nchar("Bob")
[1] 3
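To make the distinction concrete, here is a minimal sketch comparing vector length with string length, using only base R functions. The variable name x is just for illustration.

```r
x <- "Alice"
length(x)              # number of elements in the vector: 1
nchar(x)               # number of characters in that element: 5
x[1]                   # subsetting picks vector elements, so this is still "Alice"
substr(x, 1, 1)        # first character of the string: "A"
strsplit(x, "")[[1]]   # one-character strings: "A" "l" "i" "c" "e"
```

So to get at individual characters you use string functions such as substr or strsplit rather than vector subsetting.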
I'm starting to learn R, and I would appreciate some help understanding how R decides the class of different vectors. I initialize vec <- c(1:6), and when I run class(vec) I get 'integer'. Why is it not 'numeric'? I thought integers in R looked like this: 4L
Also with vec2 <- c(1,'a',2,TRUE), why is class(vec2) 'character'? I'm guessing R picks up on the characters and automatically assigns everything else to be characters...so then it actually looks like c('1','a','2','TRUE') am I correct?
Type the following to see the help page of the colon operator:
?`:`
Here is the relevant paragraph:
For numeric arguments, a numeric vector. This will be of type integer
if from is integer-valued and the result is representable in the R
integer type, otherwise of type "double" (aka mode "numeric").
So, in your example c(1:6), the from argument, 1, is integer-valued and the result is representable in R's integer type, so the resulting sequence is of type integer.
By the way, c is not needed to create a vector in this case: 1:6 is already a vector.
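A quick way to see this for yourself is to ask R for the type and class directly. This is a small sketch using only base R functions:

```r
typeof(1:6)              # "integer"
class(1:6)               # "integer"
class(c(1, 2, 3))        # "numeric": plain number literals are doubles
class(4L)                # "integer": the L suffix marks an integer literal
is.numeric(1:6)          # TRUE: integer vectors still count as numeric
identical(c(1:6), 1:6)   # TRUE: wrapping the sequence in c() changes nothing
```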
For the second question, since all the elements of a vector have to be of the same type, R automatically converts them to a common type. In this case, it is possible to convert everything to character, but it is not possible to convert "a" to numeric, so the result is a character vector.
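The coercion can be checked with a short sketch like the one below. The as.numeric call is only there to show what happens in the other direction.

```r
vec2 <- c(1, 'a', 2, TRUE)
vec2
# [1] "1"    "a"    "2"    "TRUE"
class(vec2)              # "character"

# Without a string, the logical value is coerced to a number instead
c(1, 2, TRUE)            # 1 2 1
class(c(1, 2, TRUE))     # "numeric"

# Forcing the other direction fails for "a" and "TRUE": they become NA
suppressWarnings(as.numeric(vec2))   # 1 NA 2 NA
```

So yes, the vector ends up looking like c('1','a','2','TRUE').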
I am trying to match cell phone tower IDs contained in one table with a master table of the locations (in latitude/longitude) of cell phone tower IDs. The format of the IDs in the locations table is different from that in the first table, and I am trying to use agrep() to do a fuzzy match. To give you an example, let's say the ID I am trying to match is:
x <- c("405-800-125-39883")
A sample of IDs located in the locations table:
y <- c("405-810-1802-19883", "405-810-2101-29883", "405-810-1401-31883",
"405-810-5005-49883","125-39883","405-810-660-39883")
I am then using agrep() with different combinations of max.distance:
agrep(x,y,max.distance=0.3,value=TRUE)
This returns:
[1] "405-810-1802-19883" "405-810-2101-29883" "405-810-1401-31883" "405-810-5005-49883"
[5] "405-810-660-39883"
Whereas the value that I am really after is "125-39883"
I have also tried the stringdist_join() function from the fuzzyjoin package, applied to the two data frames while varying max_dist, but with no success. Basically, what I am looking for is a perfect match after the last hyphen, then a match on the number after the second-to-last hyphen, and so on. Is there any way of doing that?
You can vectorize agrep so that all the values of y are used as patterns.
Your aim is to find a whole element of y as a part of x, so the pattern should be y and not x:
names(unlist(Vectorize(agrep)(y,x)))
[1] "125-39883"
Alternatively, we can use adist with the argument partial = TRUE, which does the same kind of partial matching as agrep:
y[which.min(c(adist(y, x, partial = TRUE)))]
[1] "125-39883"
If both x and y are vectors, you would rather use adist instead of agrep, since it returns the whole matrix of distances in one call. Most of agrep's matching options (costs, ignore.case, fixed, useBytes) are also available in adist. Check ?adist for further details.
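As a sketch of the vector case, the same idea can be applied to several IDs at once. The second ID in xs and the names xs, d and best are made up for this example; y is the vector from the question.

```r
# Location IDs from the question
y <- c("405-810-1802-19883", "405-810-2101-29883", "405-810-1401-31883",
       "405-810-5005-49883", "125-39883", "405-810-660-39883")

# IDs to look up; the second one is invented for illustration
xs <- c("405-800-125-39883", "999-405-810-660-39883")

# d[i, j] is the edit distance from y[i] to the closest substring of xs[j]
d <- adist(y, xs, partial = TRUE)

# For each ID in xs, take the candidate in y with the smallest distance
best <- y[apply(d, 2, which.min)]
best
# [1] "125-39883"         "405-810-660-39883"
```

Note that which.min returns the first minimum, so ties between candidates would need separate handling.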
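For the follow-up question in the comments, you can do something like this. Among the candidates tied for the smallest partial distance, it picks the one whose edit transcript ends with the longest run of matches (M):
w <- adist(y, x, partial = TRUE)
z <- setNames(nchar(sub(".*?(M*)$", "\\1", c(attr(adist(y, x, counts = TRUE), "trafos")))), y)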
names(which.max(z[which(min(w)==w)]))
[1] "126-39883"
I'm not sure that I understand the different outputs in these two scenarios:
(1)
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- strsplit(pioneers, split = ":")
split
(2)
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
split <- lapply(pioneers, strsplit, split = ":")
split
In both cases, the output is a list but I'm not sure when I'd use the one notation (simply applying a function to a vector) or the other (using lapply to loop the function over the vector).
Thanks for the help.
Greg
To me, it comes down to how the output is returned. lapply stands for "list apply", i.e. the output is returned as a list. strsplit already returns a list because, if there were multiple :s in an element of your pioneers vector, a list is the only data structure that makes sense: one list element for each of the 4 elements of the vector, each containing a character vector of the split pieces.
So using lapply(x, strsplit, ...) will always return a list inside a list, which you probably don't want in this case.
Using lapply is useful in cases where you expect the result of the function you're applying to be a vector of undefined or variable length. As strsplit already handles this itself, the use of lapply is redundant. You should know what form you expect or want your answer to be in, and use the appropriate functions to coerce the output into the right data structure.
To be clear, the outputs of the two examples you gave are not the same. One is a list, the other is a list of lists. To get the identical result with lapply, you would use
lapply(pioneers, function(x, split) strsplit(x, split)[[1]], split = ":")
i.e. taking the first list element of the inner list (which is only 1 element anyway) in each case.
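The difference in structure is easy to inspect. Here is a minimal sketch; the names flat, nested and unwrapped are just for illustration.

```r
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")

flat   <- strsplit(pioneers, split = ":")           # list of character vectors
nested <- lapply(pioneers, strsplit, split = ":")   # list of one-element lists

flat[[1]]          # "GAUSS" "1777"
nested[[1]]        # a list of length 1 wrapping the same vector
nested[[1]][[1]]   # "GAUSS" "1777"

# Unwrapping the inner list makes the lapply version match the direct call
unwrapped <- lapply(pioneers, function(x) strsplit(x, ":")[[1]])
identical(flat, unwrapped)   # TRUE

# With the flat list, one field can be pulled out of every element
sapply(flat, `[`, 1)   # "GAUSS" "BAYES" "PASCAL" "PEARSON"
```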
I have a vector of strings, all containing a common symbol, let's say "*". I need to delete the * and all characters after it for all vector elements. For example:
In abcd*123 I need to have abcd. The number of characters before and after the * varies.
Thanks for the help.
out <- gsub("\\*.*", "", yourVector)
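Here is a small sketch of how that pattern behaves, with made-up sample data. Since .* runs to the end of the string, there is at most one match per element, so sub gives the same result as gsub here.

```r
yourVector <- c("abcd*123", "x*y*z", "nostar")   # made-up sample data

# \\* matches a literal asterisk; .* then consumes the rest of the string
out <- sub("\\*.*", "", yourVector)
out
# [1] "abcd"   "x"      "nostar"
```

Elements without a * are returned unchanged.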