I have to do the following:
I have a vector, let us say
x <- c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
I have to subset the remainder of a vector after 1, 2, 3, 4 occurred at least once.
So the subset new vector would only include 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1.
I need a relatively easy solution on how to do this. It might be possible to do an if and while loop with breaks, but I am kinda struggling to come up with a solution.
Is there a simple (even mathematical) way to do this in R?
Use sapply to find where each of the predefined numbers first occurs, then drop everything up to and including the latest of those positions.
x[-seq(max(sapply(1:4, function(y) which(x == y)[1])))]
# [1] 4 5 5 3 2 11 1 3 3 4 1
Data
x <- c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
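The same idea can be wrapped in a small function. This is only a minimal sketch and not part of the answer above: the name after_all_seen is hypothetical, it uses match() in place of sapply() with which(), and returning an empty vector when a target never occurs is an assumption about the desired behaviour.

```r
# Hypothetical helper: keep what follows the point where every target has appeared
after_all_seen <- function(x, targets) {
  # Position of the first occurrence of each target (NA if it never occurs)
  first_pos <- match(targets, x)
  # Assumption: if any target is missing, nothing qualifies
  if (anyNA(first_pos)) return(x[0])
  # Drop everything up to and including the latest first occurrence
  tail(x, -max(first_pos))
}

x <- c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
after_all_seen(x, 1:4)
# [1]  4  5  5  3  2 11  1  3  3  4  1
```

Using tail() with a negative count also avoids the edge case where the last target first appears at the very end of x, in which case the result is simply empty.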
You can use run length encoding for this
x = c(1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 3, 2, 11, 1, 3, 3, 4, 1)
encoded = rle(x)
# Pick the first location of 1, 2, 3, and 4
# Then find the max index location
indices = c(which(encoded$values == 1)[1],
which(encoded$values == 2)[1],
which(encoded$values == 3)[1],
which(encoded$values == 4)[1])
index = max(indices)
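# Find the index of x corresponding to your split location:
# cumsum(lengths)[index - 1] + 1 is the first element of that run,
# so adding 2 starts the subset just after it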
reqd_index = cumsum(encoded$lengths)[index-1] + 2
# Print final split value
x[reqd_index:length(x)]
The result is as follows
> x[reqd_index:length(x)]
[1] 4 5 5 3 2 11 1 3 3 4 1
I am trying to generate a polychoric correlation matrix with the R psych package for a 227 x 6 data table which I have called nepr. After importing the data from an Excel spreadsheet, I entered the following code:
nepr=as.data.frame(nepr)
attach(nepr)
library(psych)
out=polychoric(nepr)
neprpoly=out$rho
print(neprpoly,digits=2)
generates the following error message:
>Error in if (any(lower > upper)) stop("lower>upper integration limits"): missing value where TRUE/FALSE needed
>In addition: warning messages:
>1. In polychoric(nepr): The items do not have an equal number of response alternatives, global set to FALSE.
>2. In qnorm(cumsum(rsum)[-length(rsum)]): NaNs produced
I was expecting the code to produce a polychoric correlation matrix based on the data frame nepr, and I don't know how to interpret or act on the error messages I received.
Can anyone suggest what changes I need to make to the code to address the error messages?
A sample of the dataset is as follows:
structure(list(Balance = c(4, 4, 5, 5, 3, 4, 3, 4, 2, 2, 2, 5,
2, 2, 2, 2, 1, 2, 4, 1), Earth = c(4, 5, 5, 5, 5, 5, 5, 4, 4,
4, 4, 5, 3, 4, 4, 2, 5, 4, 5, 5), Plants = c(2, 2, 2, 3, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 2, 2, 4), Modify = c(2, 2, 1,
1, 2, 2, 2, 2, 4, 2, 4, 2, 4, 2, 2, 2, 2, 2, 2, 2), Growth =
c(2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 4, 1, 4, 2, 2, 4, 4, 4, 1, 2),
Mankind = c(2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1,
1, 1, 2)), row.names = c(NA,20L), class = "data.frame")
The data consists of inputs of Likert scale rankings (ranked 1-5) to the items 'Balance', 'Earth', 'Plants', 'Modify', 'Growth', and 'Mankind'. There are no missing values in any cells of the 227 row x 6 item matrix; Balance, Plants, & Growth all contain the values 1-5; Earth contains the values 2-5 (no ranking of 1 recorded); Mankind contains the values 1-4 (no ranking of 5 recorded). When I ran the original data set (before reversing the valence of the last 3 columns) I was able to get a polychoric matrix with no problems even though the data contained the Earth data as it appears in the nepr data set. I assume that it is not uncommon to have similar data sets from surveys where variables do not necessarily contain the full range of response values.
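For anyone reproducing this, a quick base R check shows which response categories each item contains in the 20-row sample above. This is only a diagnostic sketch: the name nepr_sample is made up here, and the check does not by itself explain or fix the error.

```r
# The 20-row sample from the question (nepr_sample is a hypothetical name)
nepr_sample <- data.frame(
  Balance = c(4, 4, 5, 5, 3, 4, 3, 4, 2, 2, 2, 5, 2, 2, 2, 2, 1, 2, 4, 1),
  Earth   = c(4, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 5, 3, 4, 4, 2, 5, 4, 5, 5),
  Plants  = c(2, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 2, 2, 4),
  Modify  = c(2, 2, 1, 1, 2, 2, 2, 2, 4, 2, 4, 2, 4, 2, 2, 2, 2, 2, 2, 2),
  Growth  = c(2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 4, 1, 4, 2, 2, 4, 4, 4, 1, 2),
  Mankind = c(2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, 2)
)

# Distinct response values observed in each item
lapply(nepr_sample, function(item) sort(unique(item)))

# Number of distinct response categories per item
n_levels <- sapply(nepr_sample, function(item) length(unique(item)))
n_levels
```

Running the same two calls on the full 227-row nepr would confirm the ranges described above and show any item whose categories are not consecutive.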
I want to create a vector, but to do that I have to use information from another vector. I guess it's necessary to use a loop, but I'm not sure.
I have the vector
n <- c(2, 4, 2, 4, 4, 2, 3, 4, 2, 3, 5, 10, 2, 5)
and I have to create
rbeta(N-12,i+1,N-i+1)
where i is the ith element of n.
If N is a scalar value then you can use the fact that R is vectorized to get your result with
rbeta(N - 12, n + 1, N - n + 1)
For example:
n <- c(2, 4, 2, 4, 4, 2, 3, 4, 2, 3, 5, 10, 2, 5)
N <- 20
rbeta(N - 12, n + 1, N - n + 1)
#> [1] 0.06464326 0.41683835 0.14648202 0.22730181 0.21056577 0.17171969
#> [7] 0.28686094 0.14333501
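Note that the first argument of rbeta is the number of draws, so the call above returns N - 12 = 8 values and takes its shape parameters from the first 8 elements of n only. If the intent is instead N - 12 draws for every element of n (an assumption about what was wanted), one option is to loop over n with lapply. The name draws and the set.seed call are added here purely for illustration.

```r
n <- c(2, 4, 2, 4, 4, 2, 3, 4, 2, 3, 5, 10, 2, 5)
N <- 20

set.seed(1)  # only to make the sketch reproducible

# One vector of N - 12 beta draws per element of n
draws <- lapply(n, function(i) rbeta(N - 12, i + 1, N - i + 1))

length(draws)   # one list entry per element of n
lengths(draws)  # number of draws in each entry
```

Replacing lapply with sapply would return the same draws as a matrix with one column per element of n.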
I have to write code in R that counts how many times numbers within a vector are repeated one after another.
For example, in the following set of numbers, I would want to count the number of times a number was repeated after itself, such as 2,2 and 4,4, or even repeating after itself 3 times in a row such as 1,1,1 or 3,3,3, not counting the number of times an individual number has occurred throughout the set.
5, 3, 2, 2, 4, 1, 4, 4, 6, 1, 3, 2, 1, 4, 3, 1, 6, 4, 5, 5, 3, 4, 3, 4, 4, 5, 6, 6, 2, 4, 6, 1, 1, 1, 2, 2, 4, 3, 3, 3, 1, 3, 5, 1, 5, 2, 2, 6, 5, 6, 3
You can use rle to find repeated consecutive values. For example,
i1 <- rle(x)
setNames(i1$lengths[i1$lengths > 1], paste0('value:', i1$values[i1$lengths > 1]))
#value:2 value:4 value:5 value:4 value:6 value:1 value:2 value:3 value:2
#      2       2       2       2       2       3       2       3       2
DATA
dput(x)
c(5, 3, 2, 2, 4, 1, 4, 4, 6, 1, 3, 2, 1, 4, 3, 1, 6, 4, 5, 5,
3, 4, 3, 4, 4, 5, 6, 6, 2, 4, 6, 1, 1, 1, 2, 2, 4, 3, 3, 3, 1,
3, 5, 1, 5, 2, 2, 6, 5, 6, 3)
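If a single count is wanted rather than the individual run lengths, the same rle result can be summarised. A minimal sketch, assuming "count" means the number of runs longer than one; the names runs and repeated are introduced here for illustration.

```r
x <- c(5, 3, 2, 2, 4, 1, 4, 4, 6, 1, 3, 2, 1, 4, 3, 1, 6, 4, 5, 5,
       3, 4, 3, 4, 4, 5, 6, 6, 2, 4, 6, 1, 1, 1, 2, 2, 4, 3, 3, 3, 1,
       3, 5, 1, 5, 2, 2, 6, 5, 6, 3)

runs <- rle(x)
repeated <- runs$lengths > 1   # TRUE for runs of two or more equal values

# Total number of repeated runs
sum(repeated)
# [1] 9

# How many repeated runs each value has
table(runs$values[repeated])
```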
I have a vector that looks like this:
c(1,1,1,1,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5..)
I want to get the index of when the element changes, i.e. (1,5,9,...)
I know how to do it with a for loop, but I am looking for a faster way as my vector is very large.
Thanks,
Try
which(c(TRUE,diff(v1)!=0))
Or
match(unique(v1), v1)
Or if the vector is sorted
head(c(1, findInterval(unique(v1), v1)+1),-1)
data
v1 <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
4, 4, 5, 5, 5, 5, 5)
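On this data the diff and match approaches agree. One caveat worth checking on your own data: match(unique(v1), v1) reports only the first position of each distinct value, so it misses a change back to a value seen earlier, whereas the diff version catches every change. The vector v2 below is a made-up example to show the difference.

```r
v1 <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4,
        4, 4, 5, 5, 5, 5, 5)

by_diff  <- which(c(TRUE, diff(v1) != 0))
by_match <- match(unique(v1), v1)
by_diff
# [1]  1  5  9 15 23
by_match
# [1]  1  5  9 15 23

# Hypothetical vector in which the value 1 comes back after a change
v2 <- c(1, 1, 2, 2, 1, 1)
which(c(TRUE, diff(v2) != 0))
# [1] 1 3 5
match(unique(v2), v2)
# [1] 1 3
```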
Another fun approach:
v1 <- c(1, 1, 2, 3, 4, 4, 5, 6, 7, 7, 7, 8)
head(c(1, cumsum(rle(v1)$lengths) + 1), -1)
Or if you have magrittr then it can become
library(magrittr)
v1 %>%
rle %>%
.$lengths %>%
cumsum %>%
add(1) %>%
c(1, .) %>%
head(-1)
Result: 1 3 4 5 7 8 9 12
Might look weird but it's fun to think that through :)
Explanation: cumsum(rle(v1)$lengths) gets you almost all the way there, but it gives you the index where each run ends rather than where the next run starts. That is why we add one to each element, prepend the index 1 for the first run, and remove the last element (which points one past the end of the vector).
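Spelled out one step at a time, with the intermediate values shown as comments (the variable names are made up for this sketch):

```r
v1 <- c(1, 1, 2, 3, 4, 4, 5, 6, 7, 7, 7, 8)

run_lengths <- rle(v1)$lengths       # 2 1 1 2 1 1 3 1
run_ends    <- cumsum(run_lengths)   # 2 3 4 6 7 8 11 12
next_starts <- run_ends + 1          # 3 4 5 7 8 9 12 13

# Prepend 1 for the first run and drop the last entry, which is past the end
run_starts <- head(c(1, next_starts), -1)
run_starts
# [1]  1  3  4  5  7  8  9 12
```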
I have the 2 tables as below
subj <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
gamble <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
ev <- c(4, 5, 6, 4, 5, 6, 4, 5, 6)
table1 <- data.frame(subj, gamble, ev)
subj2 <- c(1, 2, 3)
gamble2 <- c(1, 3, 2)
table2 <- data.frame(subj2, gamble2)
I want to merge the two tables by subject and gamble, keeping only the rows of table 1 whose gamble matches the gamble listed for that subject in table 2. The expected output is as follows:
subj gamble ev
   1      1  4
   2      3  6
   3      2  5
You are looking for merge
merge(table1, table2, by.x=c("subj", "gamble"), by.y=c("subj2", "gamble2"), all=FALSE, sort=TRUE)
edited as per Ananda's helpful observation
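With the data from the question, the call can be run end to end as below. A minimal sketch: the name result is introduced here for illustration, and all = FALSE and sort = TRUE are omitted because they are merge's defaults.

```r
table1 <- data.frame(
  subj   = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  gamble = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  ev     = c(4, 5, 6, 4, 5, 6, 4, 5, 6)
)
table2 <- data.frame(subj2 = c(1, 2, 3), gamble2 = c(1, 3, 2))

# Inner join on the (subject, gamble) pair; key columns keep table1's names
result <- merge(table1, table2,
                by.x = c("subj", "gamble"),
                by.y = c("subj2", "gamble2"))
result
#   subj gamble ev
# 1    1      1  4
# 2    2      3  6
# 3    3      2  5
```

Matching on both columns is what restricts each subject to the single gamble listed for them in table2; merging on gamble alone would pair every subject in table1 with every subject in table2 that shares that gamble.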