Create vector by given distribution of values - r

Let's say I have a vector a = (1,3,4).
I want to create a new vector of the integers in the range [1, length(a)], where the i-th integer appears a[i] times.
For the vector a I want to get:
(1,2,2,2,3,3,3,3)
Could you explain how to implement this operation without several messy concatenations?

You can try rep
rep(seq_along(a), a)
#[1] 1 2 2 2 3 3 3 3
data
a <- c(1,3,4)
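To see why this works: seq_along(a) produces the indices 1..length(a), and the times argument of rep repeats the i-th index a[i] times. A small self-contained check, where the names result and counts are only for illustration:

```r
a <- c(1, 3, 4)

# seq_along(a) is 1 2 3; times = a repeats the i-th index a[i] times
result <- rep(seq_along(a), times = a)
print(result)

# Counting how often each integer occurs should recover a
counts <- tabulate(result)
print(counts)
```

Counting the occurrences with tabulate is a quick way to confirm that the new vector follows the requested distribution.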

Related

Looping through items on a list in R

This may be a simple question, but I'm fairly new to R.
What I want to do is perform some kind of addition on the indexes of a list, but once the index passes the maximum value it should wrap around to the first value in the list and start over from there.
for example:
x <-2
data <- c(0,1,2,3,4,5,6,7,8,9,10,11)
data[x]
1
data[x+12]
1
data[x+13]
3
or something functionally equivalent. In the end I want to be able to do something like
v=6
x=8
y=9
z=12
values <- c(v,x,y,z)
data <- c(0,1,2,3,4,5,6,7,8,9,10,11)
set <- c(data[values[1]],data[values[2]], data[values[3]],data[values[4]])
set
5 7 8 11
values <- values + 8
set
1 3 4 7
I've tried some things with addition and subtraction of the length of my list, but it does not work well for the lower numbers.
I hope this was a clear enough explanation,
thanks in advance!
We don't need a loop here, since a vector can be indexed by another vector of length >= 1
data[values]
#[1] 5 7 8 11
NOTE: Both objects are vectors, not lists
If we need to reset the index
values <- values + 8
data[ifelse(values > length(data), values - length(data), values)]
#[1] 1 3 4 7
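A more general way to wrap an index is modular arithmetic, which also copes with shifts larger than one full pass through the vector. This is only a sketch: wrap_index is a hypothetical helper name, not a base R function, and it assumes positive 1-based indices.

```r
data <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
values <- c(6, 8, 9, 12)

# Hypothetical helper: map any positive index into 1..n.
# Subtracting 1 before %% and adding 1 afterwards keeps the result 1-based.
wrap_index <- function(i, n) (i - 1) %% n + 1

# Shift the indices by 8 and wrap them around the end of data
shifted <- wrap_index(values + 8, length(data))
print(shifted)

# Use the wrapped indices to pick the elements
set <- data[shifted]
print(set)
```

Because the helper returns indices rather than elements, the same call works for any contents of data, not just a run of consecutive numbers.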

R: Index to unique vector that returns original

I have a vector v <- c(6,8,5,5,8) of which I can obtain the unique values using
> u <- unique(v)
> u
[1] 6 8 5
Now I need an index i = [2,3,1,1,3] that returns the original vector v when indexed into u.
> u[i]
[1] 6 8 5 5 8
I know such an index can be generated automatically in Matlab (the ci index), but it does not seem to be part of the standard repertoire in R. Is anyone aware of a function that can do this?
The background is that I have several vectors with anonymized IDs that are long character strings:
ids
"PTefkd43fmkl28en==3rnl4"
"cmdREW3rFDS32fDSdd;32FF"
"PTefkd43fmkl28en==3rnl4"
"PTefkd43fmkl28en==3rnl4"
"cmdREW3rFDS32fDSdd;32FF"
To reduce the file size and simplify the code, I want to transform them into integers of the sort
ids
1
2
1
1
2
and found that the index of the unique vector does just this. Since there are many rows, I am hesitant to write a function that loops over each element of the unique vector and wonder whether there is a more efficient way — or a completely different way to transform the character strings into matching integers.
Try with match
df1$ids <- with(df1, match(ids, unique(ids)) )
df1$ids
#[1] 1 2 1 1 2
Or we can convert to factor and coerce to integer
with(df1,as.integer(factor(ids, levels=unique(ids))))
#[1] 1 2 1 1 2
Using u and v: judging by the index i in the OP's post, u must have been sorted
u <- sort(unique(v))
match(v, u)
#[1] 2 3 1 1 3
Or using findInterval. Make sure that 'u' is sorted.
findInterval(v,u)
#[1] 2 3 1 1 3
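For the character IDs from the question, the match approach can be checked end to end. This is a minimal sketch that uses a plain character vector ids instead of a data frame column; u and i are illustrative names.

```r
ids <- c("PTefkd43fmkl28en==3rnl4",
         "cmdREW3rFDS32fDSdd;32FF",
         "PTefkd43fmkl28en==3rnl4",
         "PTefkd43fmkl28en==3rnl4",
         "cmdREW3rFDS32fDSdd;32FF")

# Unique values in order of first appearance
u <- unique(ids)

# Position of each original element within u
i <- match(ids, u)
print(i)

# Indexing u by i reconstructs the original vector
print(identical(u[i], ids))
```

Keeping u around as a lookup table makes it possible to translate the integers back to the original strings later.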

Vectors of different lengths from a `for` cycle in R: merging in a data frame [duplicate]

I have the following elementary issue in R.
I have a for (k in 1:x){...} cycle which produces numerical vectors whose length depends on k.
For each value of k I produce a single numerical vector.
I would like to collect them as rows of a data frame in R, if possible. In other words, I would like to introduce a data frame data s.t.
for (k in 1:x) {
data[k,] <- ...
}
where the dots represent the command producing the vector with length depending on k.
Unfortunately, as far as I know, the length of the rows of a dataframe in R is constant, as it is a list of vectors of equal length. I have already tried to complete each row with a suitable number of zeroes to arrive at a constant length (in this case equal to x). I would like to work "dynamically", instead.
I do not think that this issue is equivalent to merging vectors of different lengths in a dataframe; because of the for cycle, only one vector is known at each step.
Edit
A very easy example of what I mean. For each k, I would like to write the vector whose components are 1,2,...,k and store it as kth row of the dataframe data. In the above setting, I would write
for (k in 1:x) {
data[k,] <- seq(1,k,1)
}
As the length of seq(1,k,1) depends on k the code does not work.
You could consider using ldply from plyr here.
library(plyr)
set.seed(123)
# k holds the length of each result
k <- sample( 5 , 3 , replace = TRUE )
#[1] 2 4 3
# Make a list of vectors, each a sequence from 1:k
ll <- lapply( k , function(x) seq_len(x) )
#[[1]]
#[1] 1 2
#[[2]]
#[1] 1 2 3 4
#[[3]]
#[1] 1 2 3
# take our list and rbind it into a data.frame, filling in missing values with NA
ldply( ll , rbind)
#  1 2  3  4
#1 1 2 NA NA
#2 1 2  3  4
#3 1 2  3 NA
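If you would rather not depend on plyr, a similar result can be sketched in base R by padding every vector with NA up to the longest length and then binding the rows. The names x, rows, max_len and padded are illustrative, and the example uses the 1,2,...,k vectors from the question's edit.

```r
x <- 4

# One vector per iteration; the k-th has length k
rows <- lapply(seq_len(x), function(k) seq_len(k))

# Length of the longest vector
max_len <- max(lengths(rows))

# Assigning a larger length pads the vector with NA
padded <- lapply(rows, function(v) {
  length(v) <- max_len
  v
})

# Bind the padded vectors as rows and convert to a data frame
data <- as.data.frame(do.call(rbind, padded))
print(data)
```

Collecting the vectors in a list first avoids having to know the final number of columns before the loop runs.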

R how to produce a vector trimming another vector by choosing a fixed value of its components

This is an elementary question; I apologize for it.
Let x <- c(1,2,3,4,5). I would like to produce a vector z of length 5 whose components are those of x, except that every component greater than 2 is replaced by 2:
if x[i]>2 then write 2.
The result should look like
z <- c(1,2,2,2,2)
I know that
z <- which(x>2)
gives me
3 4 5
but I cannot find a good way to implement it to arrive at the result.
I thank you all for your support.
EDIT. If instead of considering a vector x I have a matrix M with columns x and y and I want to apply the above trimming to the column x leaving y untouched, how should I proceed?
You can use pmin:
pmin(x, 2)
# [1] 1 2 2 2 2
Alternatively, use logical indexing:
y <- x
y[x>2] <- 2
y
# [1] 1 2 2 2 2
If you have a matrix M with two columns and want to replace only the values greater than 2 in the first column with 2, then do:
M[,1][M[,1]>2] <- 2
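The pmin approach carries over to the matrix case as well. A small sketch, assuming M is built from the question's x and an arbitrary second column y made up for illustration:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(10, 20, 30, 40, 50)  # illustrative second column
M <- cbind(x, y)

# Cap only column x at 2; column y is left untouched
M[, "x"] <- pmin(M[, "x"], 2)
print(M)
```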

Stepwise fill dataframe

I'm using a for-loop to perform operations on specific subsets of my data. At the end of each iteration of the for loop, I have all the values that I need to fill a row of my dataframe.
So far I tried
df=NULL
for(...){
# stuff to calculate
newline=c(allthethingscalculated)
df=rbind(df,newline)
}
This results in the contents of the dataframe not being accessible using '$', because the rows are then atomic vectors.
I also tried appending the values from each iteration to already existing vectors and then, once the for loop ends, creating a dataframe from these vectors using
x<-data.frame(a,b,c,d,...)
but appending the values to the respective vectors didn't work; the values weren't added.
Any ideas on this?
Since my for loop iterates over IDs in my data, I realized I could do something like this:
uids=unique(data$id)
filler=c(1:length(uids))
df=data.frame(uids,filler,filler,filler,filler,filler,filler,filler,filler,filler)
for(i in uids){
...
df[i,]<-newline
}
I used filler to create a dataframe with the correct number of columns and rows so I don't get an error like 'replacement has length of 9, replacement has length of 1'
Is there a better way to do this? Using this approach I still have the values of filler in the respective row that I'd need to remove?
This should work. Can you show us your data?
R) x=data.frame(a=rep(1,3),b=rep(2,3),c=rep(3,3))
R) d=c(4,4,4)
R) rbind(x,d)
  a b c
1 1 2 3
2 1 2 3
3 1 2 3
4 4 4 4
R) cbind(x,d)
  a b c d
1 1 2 3 4
2 1 2 3 4
3 1 2 3 4
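For the loop in the question, one common pattern is to collect a one-row data frame per iteration in a list and bind them once at the end, so no filler columns are needed. This is a sketch under assumptions: the ids and the columns total and half are made-up placeholders for whatever the real loop calculates.

```r
ids <- c(101, 102, 103)  # placeholder IDs

# Preallocate a list with one slot per ID
rows <- vector("list", length(ids))

for (k in seq_along(ids)) {
  # Placeholder calculations standing in for the real per-ID work
  rows[[k]] <- data.frame(id = ids[k],
                          total = ids[k] * 2,
                          half = ids[k] / 2)
}

# Bind all one-row data frames into a single data frame
df <- do.call(rbind, rows)
print(df)

# Columns are accessible with '$' because df is a proper data frame
print(df$total)
```

Building each row as a data frame rather than an atomic vector keeps the column names and types, and binding once at the end avoids repeatedly copying a growing data frame.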
