Numbering elements in a vector - r

I would like to number the elements of a vector, assigning '1' to the smallest element in the vector. I know how to do this, but my solution (code included below) seems overly complex. Is there a much simpler solution?
In my example below there are 5 unique numbers in the vector 'data'. The number 3 is the smallest and should be assigned the number '1'; the number 100 is the largest and should be assigned the number '5'.
The desired solution for the vector 'data' is: c(2,3,4,4,3,1,5).
data <- c(5,8,12,12,8,3,100)
unique.numbers <- sort(unique(data))
numbering <- seq(1:length(unique(data)))
template <- cbind(numbering,unique.numbers)
output <- rep(NA, length(data))
for(i in 1:length(data)) {
  for(j in 1:dim(template)[1]) {
    if(data[i]==template[j,2]) output[i]=j
  }
}
output
Thank you for any advice. I am trying to become more efficient with my programming.
Mark Miller

A more compact version of your program:
dat <- c(5,8,12,12,8,3,100)
dat_sorted <- sort(unique(dat))
match(dat,dat_sorted)
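The same idea can be wrapped in a small reusable function. The name `dense_rank` is hypothetical, not a base R function. One side effect worth knowing: `sort()` drops `NA` by default, so any `NA` in the input comes back as `NA`.

```r
# Hypothetical helper: number each element by its position among the sorted unique values
dense_rank <- function(x) {
  match(x, sort(unique(x)))
}

dat <- c(5, 8, 12, 12, 8, 3, 100)
dense_rank(dat)
# [1] 2 3 4 4 3 1 5
```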

If you're using numeric or integer data, you can use as.numeric(factor()):
dat <- c(5,8,12,12,8,3,100)
as.numeric(factor(dat))
Also, as a side note, you should avoid using data as a variable name in R, since it's already a built-in function.
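As a quick check, this compares the factor codes with the match() approach on the same vector. `as.integer()` is used here because the underlying factor codes are integers; the levels of a numeric vector are ordered numerically, so the codes come out as the desired numbering:

```r
dat <- c(5, 8, 12, 12, 8, 3, 100)
codes <- as.integer(factor(dat))
codes
# [1] 2 3 4 4 3 1 5
identical(codes, match(dat, sort(unique(dat))))
# [1] TRUE
```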

Another possibility is:
> rank(data)
[1] 2.0 3.5 5.5 5.5 3.5 1.0 7.0
See the ties.method argument for how ties are handled.
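Note that none of the ties.method options gives the gap-free numbering asked for (c(2,3,4,4,3,1,5)). For example, "min" lets tied values share the lowest rank, but the ranks after a tie still skip ahead:

```r
dat <- c(5, 8, 12, 12, 8, 3, 100)
rank(dat)                        # ties get the average of their ranks
# [1] 2.0 3.5 5.5 5.5 3.5 1.0 7.0
rank(dat, ties.method = "min")   # ties share the lowest rank, but gaps remain
# [1] 2 3 5 5 3 1 7
```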

Related

Getting a similarity index of a vector of numbers to see how close the numbers are

I am trying to compare two number vectors by how similar their numbers are. For example:
vecA <- c(2.1,2.5,2)
vecB <- c(4,4.5,5.1)
I would like an index value that tells me how similar the numbers within vecA are to each other, i.e. a value of 1 means they are all the same.
My attempt at this is a bit messy. Is there a better and more representative way to do it?
> sum(vecA/max(vecA))/length(vecA)
[1] 0.88
> sum(vecB/max(vecB))/length(vecB)
[1] 0.8888889
Any assistance/input is appreciated.
Use mean() instead of sum()/length():
mean(vecA/max(vecA))
To apply this to multiple vectors, wrap it in a function:
f1 <- function(v1) {
  mean(v1/max(v1))
}
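To score several vectors at once, f1 can be applied over a list, for example with sapply(). The list names below are just labels for the result. Note that mean(v1/max(v1)) is the same as mean(v1)/max(v1):

```r
f1 <- function(v1) {
  mean(v1 / max(v1))  # equivalent to mean(v1) / max(v1)
}

vecA <- c(2.1, 2.5, 2)
vecB <- c(4, 4.5, 5.1)
res <- sapply(list(vecA = vecA, vecB = vecB), f1)
res  # vecA is 0.88, vecB is 8/9 (about 0.889)
```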

Given LSDs and values, output significance letters

I've been looking for library functions to do this, but I'm surprised that I cannot find one.
There are quite a few stats functions in R that do the statistical test and then output a table that includes letters denoting significance groups, for example LSD.test. (See also: an example of how LSDs might be calculated and used to make multiple-comparison letters for a graph.)
There are others. All of the examples I could find tend to work from a model object, and then do their job. However, I already have the LSD values and the means -- and want to work directly with them. I've been looking for a common function that all of these multicomparison methods use to do this final step, but can't find one.
So, this is what I want to do: given the least significant difference (LSD) between values and the mean values themselves:
lsd <- 1.0
vals <- c(2,3,3.5,4,4.2,6.0)
I want to produce output something like:
2 a
3 b
3.5 bc
4 c
4.2 c
6.0 d
where values followed by the same letter are not significantly different, based on the least significant difference value.
Ideally, it would also handle an unordered list of values:
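The rule behind the expected output can be made explicit with a pairwise comparison of all means. This assumes, as the example implies (2 and 3 get different letters), that a difference equal to the LSD counts as significant:

```r
lsd  <- 1.0
vals <- c(2, 3, 3.5, 4, 4.2, 6)
# TRUE where two means differ by less than the LSD (not significantly different)
same <- abs(outer(vals, vals, "-")) < lsd
dimnames(same) <- list(vals, vals)
same
```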
vals <- c(6.0, 2, 3.5, 4.0, 4.2, 3)
producing the output:
6.0 d
2 a
3.5 bc
4.0 c
4.2 c
3 b
I've been thinking that most of these LSD.test and multiple-comparison functions probably use a common base function to put together the letter list, but I have not been able to find it.
Working through the problem, I think this does the trick, but it's pretty ugly...
lsd.letters <- function(vals, lsd) {
  # record the order of the means and sort them
  indx <- order(vals)
  srt <- vals[indx]
  # letters available for the groups
  lts <- letters
  # one (initially empty) letter string per mean
  siglets <- rep("", length(vals))
  # single pass through the sorted means, starting with letter "a"
  itlet <- 1
  for (i in seq_along(srt)) {
    crnt <- srt[i]
    clet <- lts[itlet]
    # indices of this and the following means that lie within the LSD of the current one
    ix <- which(srt[i:length(srt)] < (crnt + lsd)) + i - 1
    for (ix2 in ix) {
      # reset on every pass, so only the last (largest) mean in range decides
      newletter <- 0
      if (length(intersect(unlist(strsplit(siglets[i], "")),
                           unlist(strsplit(siglets[ix2], "")))) == 0) {
        # the current mean shares no letter with this mean, so a new letter is needed
        newletter <- 1
      }
    }
    if (newletter == 1) {
      siglets[ix] <- paste0(siglets[ix], clet)
      itlet <- itlet + 1
    }
  }
  siglets
}
It's ugly, and I am not yet sorting the output (sorting it is easy).
Is there a library function to do this? Or has anyone written a better approach to do this?
Thanks for your help!
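This is not a library function, but here is a more compact sketch of the same idea, under the assumptions stated in the question: a difference equal to or greater than the LSD is significant, there are at most 26 groups, and there are no missing values. The name `lsd_letters` is hypothetical. It uses a swapped-in formulation (each letter marks a maximal run of sorted means whose spread is below the LSD), not necessarily what LSD.test does internally. It also returns the letters in the original, unsorted order:

```r
# Sketch: significance letters from an LSD, assuming a difference >= lsd is significant
lsd_letters <- function(vals, lsd) {
  ord <- order(vals)
  s <- vals[ord]
  n <- length(s)
  # last[i]: index of the largest sorted mean that is within the LSD of s[i]
  last <- vapply(seq_len(n), function(i) max(which(s < s[i] + lsd)), integer(1))
  # the run starting at i is maximal only if it reaches further than the run starting at i - 1
  starts <- which(last > c(0L, last[-n]))
  if (length(starts) > length(letters)) stop("more than 26 groups")
  sig <- character(n)
  for (k in seq_along(starts)) {
    idx <- starts[k]:last[starts[k]]
    sig[idx] <- paste0(sig[idx], letters[k])
  }
  # put the letters back in the original (unsorted) order
  out <- character(n)
  out[ord] <- sig
  data.frame(value = vals, letters = out, stringsAsFactors = FALSE)
}

lsd_letters(c(6.0, 2, 3.5, 4.0, 4.2, 3), lsd = 1.0)
```

Because `last` never decreases along the sorted means, comparing each run with the previous one is enough to tell whether it adds a new group.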

R Matching closest number from columns

I have a list of responses to 7 questions from a survey, each in its own column, and am trying to find the response among the first 6 that is closest (numerically) to the 7th. Some won't be exactly the same, so I want to create a new variable that holds the difference between the closest number in the first 6 and the 7th. The example below would produce 0.
s <- c(1,2,3,4,5,6,3)
s <- t(s)
s <- as.data.frame(s)
s
Any help is deeply appreciated. I apologize for not having attempted code as nothing I have tried has actually gotten close.
How about this?
which.min( abs(s[1, 1:6] - s[1, 7]))
I'm assuming you want it generalized somehow, but you'd need to provide more info for that. Or just run it through a loop :-)
EDIT: added the loop from the comment and changed exactly 2 tiny things.
s <- c(1,2,3,4,5,6,3)
t <- c(1,2,3,4,5,6,7)  # a vector named t; t(s) below still finds the transpose function
p <- c(1,2,3,4,5,6,2)
s <- data.frame(s,t,p)
k <- t(s)              # one row per respondent, questions in columns V1..V7
k <- as.data.frame(k)
k$t <- NA ### need to initialize the column
for(i in seq_len(nrow(k))){
  ## need to refer to each line of k when populating the t column
  k$t[i] <- which.min(abs(unlist(k[i, 1:6]) - k[i, 7]))
}
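The loop can also be avoided. A sketch assuming one row per respondent with the seven answers in columns V1..V7 (the data below are the three example rows, and the new column names are made up). It stores both which of the first six answers is closest and the absolute difference the question asks for:

```r
# Example data: one row per respondent, answers to questions 1-7 in V1..V7
s <- as.data.frame(rbind(c(1, 2, 3, 4, 5, 6, 3),
                         c(1, 2, 3, 4, 5, 6, 7),
                         c(1, 2, 3, 4, 5, 6, 2)))
m <- as.matrix(s)
# |answer k - answer 7| for k = 1..6; the 7th column is recycled down each column
d <- abs(m[, 1:6] - m[, 7])
s$closest_col  <- apply(d, 1, which.min)  # which of the first six is closest
s$closest_diff <- apply(d, 1, min)        # how far it is from the 7th
s
```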

Make a new vector using elements of a list and another vector alternately

I have a list with 20 elements, each containing a vector of 2 numbers. I have also generated a sequence of 20 numbers. Now I would like to construct one long vector that first lists the elements of intervals[[1]] and then newvals[1], then intervals[[2]] and newvals[2], and so on.
Help will be much appreciated. I think the plyr package might be helpful, although I am not sure how to structure it.
s1 <- seq(0, 1, by = 0.05)
intervals <- Map(c, s1[-length(s1)], s1[-1])
intervals[[length(intervals)]][2] <- intervals[[length(intervals)]][2]+0.1
newvals <- seq(1,length(intervals),1)
#### HERE I WOULD LIKE TO HAVE A VECTOR IN THE FOLLOWING PATTERN
####UP TO THE LAST ELEMENT OF THE LIST:
# stringreclass <- c(intervals[[1]], newvals[1], .... , intervals[[20]], newvals[20])
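One base-R way to get that pattern, with no extra package, is to append newvals[i] to intervals[[i]] with Map() and then flatten the result with unlist(). This reuses the question's own setup:

```r
s1 <- seq(0, 1, by = 0.05)
intervals <- Map(c, s1[-length(s1)], s1[-1])
intervals[[length(intervals)]][2] <- intervals[[length(intervals)]][2] + 0.1
newvals <- seq(1, length(intervals), 1)

# c(intervals[[i]], newvals[i]) for every i, then flattened in order
stringreclass <- unlist(Map(c, intervals, newvals))
head(stringreclass, 6)
```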

Assigning output of a function to two variables in R [duplicate]

Closed 10 years ago.
Possible Duplicate:
function with multiple outputs
This seems like an easy question, but I can't figure it out and I haven't had luck in the R manuals I've looked at. I want to find dim(x), but I want to assign dim(x)[1] to a and dim(x)[2] to b in a single line.
I've tried [a b] <- dim(x) and c(a, b) <- dim(x), but neither has worked. Is there a one-line way to do this? It seems like a very basic thing that should be easy to handle.
This may not be as simple a solution as you had wanted, but it gets the job done. It's also a very handy tool in the future, should you need to assign multiple variables at once (and you don't know how many values you have).
Output <- SomeFunction(x)
VariablesList <- letters[1:length(Output)]
for (i in seq(1, length(Output), by = 1)) {
  assign(VariablesList[i], Output[i])
}
Loops aren't the most efficient things in R, but I've used this multiple times. I personally find it especially useful when gathering information from a folder with an unknown number of entries.
EDIT: And in this case, Output could be any length (as long as VariablesList is longer).
EDIT #2: Changed up the VariablesList vector to allow for more values, as Liz suggested.
You can also write your own function that will always make a global a and b. But this isn't advisable:
mydim <- function(x) {
  out <- dim(x)
  a <<- out[1]
  b <<- out[2]
}
The "R" way to do this is to output the results as a list or vector just like the built in function does and access them as needed:
out <- dim(x)
out[1]
out[2]
R has excellent list and vector comprehension that many other languages lack and thus doesn't have this multiple assignment feature. Instead it has a rich set of functions to reach into complex data structures without looping constructs.
There doesn't seem to be a built-in syntax for this. The usual workaround is to add a couple of extra lines:
temp <- dim(x)
a <- temp[1]
b <- temp[2]
It depends on what is in a and b. If they are just numbers, try returning a vector like this (the function is called ab() here, because naming it dim() would mask base::dim()):
ab <- function(x,y)
  return(c(x,y))
ab(1,2)[1]
# [1] 1
ab(1,2)[2]
# [1] 2
If a and b are something else, you might want to return a list:
ab <- function(x,y)
  return(list(item1=x:y,item2=(2*x):(2*y)))
ab(1,2)[[1]]
# [1] 1 2
ab(1,2)[[2]]
# [1] 2 3 4
EDIT:
Try this: x <- c(1,2); names(x) <- c("a","b"), then refer to the values as x["a"] and x["b"].
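Building on the naming idea, base R can also turn a named list into separate variables in one line with list2env(). This is a sketch using an example matrix; it creates a and b in whatever environment environment() returns (the global one when run at top level):

```r
x <- matrix(1:6, nrow = 2)
# Name the two values, then copy them into the current environment as a and b
invisible(list2env(setNames(as.list(dim(x)), c("a", "b")), envir = environment()))
a
# [1] 2
b
# [1] 3
```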
