R - Counting through a vector and refreshing at a certain value?

Let's say I have a vector
vec <- c(3,0,1,1,0,3,0,1,3,0,0,0,3)
I want to count through this vector, restarting the count at 1 every time the value 3 appears. So the output I want is:
vec out
[1,] 3 1
[2,] 0 2
[3,] 1 3
[4,] 1 4
[5,] 0 5
[6,] 3 1
[7,] 0 2
[8,] 1 3
[9,] 3 1
[10,] 0 2
[11,] 0 3
[12,] 0 4
[13,] 3 1
How would I do this in R, preferably without using loops?

With base R, you can do:
ave(vec, cumsum(vec == 3), FUN = seq_along)
[1] 1 2 3 4 5 1 2 3 1 2 3 4 1
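It may help to look at the grouping vector on its own: cumsum(vec == 3) increases by one at every 3, so each stretch that starts with a 3 gets its own id. Because those ids form consecutive runs, sequence() over the run lengths from rle() should give the same counter without ave. This is a small sketch of my own, not part of the answer above:

```r
vec <- c(3,0,1,1,0,3,0,1,3,0,0,0,3)

# A new group id starts at every 3
grp <- cumsum(vec == 3)
grp
# [1] 1 1 1 1 1 2 2 2 3 3 3 3 4

# Count 1..n within each run of equal ids
out <- sequence(rle(grp)$lengths)
out
# [1] 1 2 3 4 5 1 2 3 1 2 3 4 1
```

If vec does not start with 3, the leading elements get id 0, and both this version and the ave version still count them from 1.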

An option using data.table::rowid:
data.table::rowid(cumsum(vec==3L))

As another idea, we can locate, for each element of vec, the index of the most recent 3:
last3 = cummax((vec == 3) * seq_along(vec))
last3
# [1] 1 1 1 1 1 6 6 6 9 9 9 9 13
Then subtract those indices from each element's own position:
seq_along(vec) - last3 + 1 ## use `pmax(last3, 1)` in place of `last3` if `vec[1] != 3`
# [1] 1 2 3 4 5 1 2 3 1 2 3 4 1
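The cummax idea can be wrapped in a small reusable function. This is a sketch under my own assumptions: the name count_since and its reset argument are hypothetical, the pmax(last3, 1) adjustment from the comment above is built in so the vector need not start with the reset value, and x is assumed to contain no NA values (an NA would propagate through cummax):

```r
# Hypothetical helper: position within the current run, where a new run starts
# at every occurrence of `reset`. Elements before the first `reset` are
# counted from the start of the vector. Assumes x has no NA values.
count_since <- function(x, reset = 3) {
  pos  <- seq_along(x)
  last <- cummax((x == reset) * pos)   # index of the most recent reset, 0 if none yet
  pos - pmax(last, 1) + 1
}

count_since(c(3,0,1,1,0,3,0,1,3,0,0,0,3))
# [1] 1 2 3 4 5 1 2 3 1 2 3 4 1
count_since(c(0,0,3,1))
# [1] 1 2 1 2
```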

Related

If n occurs fewer than 5 times, change to n-1

In R, if I have a df of numbers c(1,1,1,2,3,3,3,3,3,3,4,4,4,5,5), how do I change n to n-1 if n occurs fewer than 5 times? Example input x and output out:
x out
1 1 1
2 1 1
3 1 1
4 2 1
5 3 3
6 3 3
7 3 3
8 3 3
9 3 3
10 3 3
11 4 3
12 4 3
13 4 3
Since 1 is the first value in the column (it will also be the minimum value), it should stay the same. However, if it makes the coding easier, the 1s can change to 0, but the 2 should still change to 1.
EDIT:
How can I repeat this if the changed values now occur fewer than 5 times? For example:
# x out
# [1,] 1 0
# [2,] 1 0
# [3,] 1 0
# [4,] 2 1
# [5,] 3 3
# [6,] 3 3
# [7,] 3 3
# [8,] 3 3
# [9,] 3 3
#[10,] 3 3
#[11,] 4 3
#[12,] 4 3
#[13,] 4 3
#[14,] 5 3
#[15,] 5 3
#[16,] 5 3
#[17,] 6 3
#[18,] 6 3
#[19,] 6 3
#[20,] 7 3
#[21,] 7 3
Using ave:
x <- c(1,1,1,2,3,3,3,3,3,3,4,4,4)
pmax(x - +(ave(x, x, FUN =length) < 5), 1)
#[1] 1 1 1 1 3 3 3 3 3 3 3 3 3
If the same value can appear in several separate runs of x, and each run should be counted on its own, group by run using rle instead:
pmax(x - +(ave(x,with(rle(x),rep(seq_along(values), lengths)),FUN =length) < 5),1)
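The same single-pass rule can also be written without ave, counting occurrences with table(). This is a sketch of my own: the function name decrement_rare and its min_count argument are hypothetical, and instead of clamping with pmax(..., 1) it leaves the minimum value unchanged, as the question asks (the two agree when the minimum is 1). It does not perform the repeated passes asked about in the EDIT:

```r
# Hypothetical helper: subtract 1 from every value that occurs fewer than
# min_count times, except the minimum value, which is kept as-is.
decrement_rare <- function(x, min_count = 5) {
  counts <- table(x)                        # occurrences of each distinct value
  n <- as.vector(counts[as.character(x)])   # look up each element's count
  ifelse(n < min_count & x != min(x), x - 1, x)
}

x <- c(1,1,1,2,3,3,3,3,3,3,4,4,4,5,5)
decrement_rare(x)
# [1] 1 1 1 1 3 3 3 3 3 3 3 3 3 4 4
```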
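You can use rle on the sorted x to find how many times each number occurs, and subtract 1 from the numbers that occur fewer than 5 times.
x <- c(1,1,1,2,3,3,3,3,3,3,4,4,4,5,5)
i <- order(x)                              # permutation that sorts x
y <- rle(x[i])                             # run lengths = count of each value
y$values <- y$values - (y$lengths < 5)     # decrement values seen fewer than 5 times
cbind(x, out = inverse.rle(y)[order(i)])   # expand runs and restore the original order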
# x out
# [1,] 1 0
# [2,] 1 0
# [3,] 1 0
# [4,] 2 1
# [5,] 3 3
# [6,] 3 3
# [7,] 3 3
# [8,] 3 3
# [9,] 3 3
#[10,] 3 3
#[11,] 4 3
#[12,] 4 3
#[13,] 4 3
#[14,] 5 4
#[15,] 5 4
Another solution, using the tidyverse:
library(tidyverse)
x <- c(1,1,1,2,3,3,3,3,3,3,4,4,4,5,5)
df <- tibble(x = x)
df %>%
  group_by(x) %>%
  mutate(n = n()) %>%    # number of occurrences of each value
  ungroup() %>%
  transmute(x,
            out = if_else(x != min(x, na.rm = TRUE) & n < 5, x - 1, x))

Add a vector based on previous vectors

Here is the data that I have:
round<-rep(1:5,4)
players<-rep(1:2, c(10,10))
decs<-sample(1:3,20,replace=TRUE)
game<-rep(rep(1:2,c(5,5)),2)
gamematrix<-cbind(players,game,round,decs)
gamematrix
Here is the output:
players game round decs
[1,] 1 1 1 2
[2,] 1 1 2 2
[3,] 1 1 3 1
[4,] 1 1 4 2
[5,] 1 1 5 1
[6,] 1 2 1 1
[7,] 1 2 2 1
[8,] 1 2 3 2
[9,] 1 2 4 1
[10,] 1 2 5 3
[11,] 2 1 1 2
[12,] 2 1 2 1
[13,] 2 1 3 3
[14,] 2 1 4 3
[15,] 2 1 5 3
[16,] 2 2 1 3
[17,] 2 2 2 2
[18,] 2 2 3 1
[19,] 2 2 4 1
[20,] 2 2 5 2
Now I would like to add another column, "Same Choice", which should be 1 if the same player in the same game makes the same decision in a round as in the previous round, and 0 otherwise. For example, for player 1 the output should be c(0,1,0,0,0,0,1,0,0,0). Any ideas how I can do it?
Thanks!
Here is a data.table answer:
# set seed
set.seed(1234)
# load data
round<-rep(1:5,4)
players<-rep(1:2, c(10,10))
decs<-sample(1:3,20,replace=TRUE)
game<-rep(rep(1:2,c(5,5)),2)
gamematrix<-cbind(players,game,round,decs)
library(data.table)
dt <- data.table(gamematrix)
dt[, .(decs=decs, lag=c(0,head(decs,-1)),
sameDec=as.integer(decs==c(NA,head(decs,-1)))),
by=c("players","game")]
I included the lag term so that you can verify.
@Frank's suggestion to use shift is much cleaner (and probably faster) than my hand-coded lag:
dt[, .(decs=decs, lag=shift(decs, 1),
sameDec=as.integer(decs==shift(decs, 1))),
by=c("players","game")]
The following code will work:
library(dplyr)
res <- gamematrix %>%
  as.data.frame() %>%
  group_by(players, game) %>%
  mutate(new_col = ifelse(decs == lag(decs), 1, 0)) %>%
  ungroup()
res$new_col[is.na(res$new_col)] <- 0   # first round of each game has no previous decision
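For comparison, a base R sketch with ave(), grouping by player and game. Two assumptions of mine: rows are already sorted by round within each player/game (as in the printed matrix), and decs is copied from the printed output in the question, because sample() without a seed draws different values each time. The name same_choice is just an illustrative column name:

```r
players <- rep(1:2, c(10, 10))
game    <- rep(rep(1:2, c(5, 5)), 2)
round   <- rep(1:5, 4)
decs    <- c(2,2,1,2,1, 1,1,2,1,3, 2,1,3,3,3, 3,2,1,1,2)   # copied from the question

# Within each player/game, compare each decision with the one before it;
# the first round of a game has no previous round, so it gets 0.
same_choice <- ave(decs, players, game,
                   FUN = function(d) c(0, as.integer(diff(d) == 0)))
gamematrix <- cbind(players, game, round, decs, same_choice)

same_choice[players == 1]
# [1] 0 1 0 0 0 0 1 0 0 0
```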

Exporting Items from List into Single .csv File R

I have a list which contains vectors that I would like to export as a single .csv file containing all vectors as named columns.
For instance, suppose I have four vectors of roughly ten items each, from hypothetical cluster analyses of four models with a variable number of data points, created by:
library(vegan)    # vegdist()
library(cluster)  # agnes()
veglist=list.files(pattern="TXT") #create list of files
veg=lapply(veglist,read.csv,header=TRUE,row.names=1) #read list of files
vegbc=lapply(veg,vegdist,method="bray") #create dissimilarity matrix from each file
av=lapply(vegbc,agnes,method="average") #do clustering analysis with each dissimilarity matrix
av2=lapply(av,cutree,k=2) #cut the hierarchical analysis at 2 groups level
When I type fix(av2), I see:
list(c(1,1,1,1,1,1,2,2,2,2,2,2),c(1,1,1,1,1,2,2,2,2,2),c(1,1,1,2,1,2,2,2,2,2),c(1,1,1,1,2,1,2,2,2,2,2,2,2))
If I type in av2 I see
[[1]]
[1] 1 1 1 1 1 1 2 2 2 2 2 2
[[2]]
[1] 1 1 1 1 1 2 2 2 2 2
[[3]]
[1] 1 1 1 2 1 2 2 2 2 2
[[4]]
[1] 1 1 1 1 2 1 2 2 2 2 2 2 2
I have tried following the example in "How to read every .csv file in R and export them into single large file", but it did not work.
I think the underlying problem is that my vectors are not the same size. What I want to do is output the vectors into a single table that looks something like:
a b c d
1 1 1 1
1 1 1 1
1 1 1 1
1 1 2 1
1 1 1 1
1 2 2 2
2 2 2 2
2 2 2 2
2 2 2 2
2 2
2 2
2
Where a,b,c,d are in place of my actual names. Preferably it would look prettier than this, but I could work with it.
I apologize for the very long question, but I was trying to provide enough of an example to go by. I am also sorry if this has a very easy answer, but I am not yet good with R. Thanks in advance.
Here is one way to do it:
l <- list(c(1,1,1,1,1,1,2,2,2,2,2,2),c(1,1,1,1,1,2,2,2,2,2),c(1,1,1,2,1,2,2,2,2,2),c(1,1,1,1,2,1,2,2,2,2,2,2,2))
maxlength <- max(sapply(l, length))
df <- data.frame(sapply(l, function(x) c(x, rep(NA, (maxlength - length(x))))))
df
X1 X2 X3 X4
1 1 1 1 1
2 1 1 1 1
3 1 1 1 1
4 1 1 2 1
5 1 1 1 2
6 1 2 2 1
7 2 2 2 2
8 2 2 2 2
9 2 2 2 2
10 2 2 2 2
11 2 NA NA 2
12 2 NA NA 2
13 NA NA NA 2
You first need to extend each vector to the length of the longest one; then you can cbind them together so that write.csv writes them out as columns:
> maxlength <- max(sapply(l, length))
> mat <- cbind(sapply(l, `length<-`, maxlength))
> mat
[,1] [,2] [,3] [,4]
[1,] 1 1 1 1
[2,] 1 1 1 1
[3,] 1 1 1 1
[4,] 1 1 2 1
[5,] 1 1 1 2
[6,] 1 2 2 1
[7,] 2 2 2 2
[8,] 2 2 2 2
[9,] 2 2 2 2
[10,] 2 2 2 2
[11,] 2 NA NA 2
[12,] 2 NA NA 2
[13,] NA NA NA 2
> write.csv(mat, file="mycsv.csv")
This looks like the following in a text editor (and imports into Excel properly):
"","V1","V2","V3","V4"
"1",1,1,1,1
"2",1,1,1,1
"3",1,1,1,1
"4",1,1,2,1
"5",1,1,1,2
"6",1,2,2,1
"7",2,2,2,2
"8",2,2,2,2
"9",2,2,2,2
"10",2,2,2,2
"11",2,NA,NA,2
"12",2,NA,NA,2
"13",NA,NA,NA,2
This can be done with stri_list2matrix from stringi
library(stringi)
m1 <- stri_list2matrix(l)
mode(m1) <- "integer"
m1
# [,1] [,2] [,3] [,4]
# [1,] 1 1 1 1
# [2,] 1 1 1 1
# [3,] 1 1 1 1
# [4,] 1 1 2 1
# [5,] 1 1 1 2
# [6,] 1 2 2 1
# [7,] 2 2 2 2
# [8,] 2 2 2 2
# [9,] 2 2 2 2
#[10,] 2 2 2 2
#[11,] 2 NA NA 2
#[12,] 2 NA NA 2
#[13,] NA NA NA 2
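To get closer to the layout in the question (named columns, blank cells instead of NA, no row-number column), the padded vectors can be given names before writing. A sketch under my own assumptions: a, b, c, d are the placeholder names from the question, and the file goes to a temporary path rather than a real output location:

```r
# The four membership vectors from the question
l <- list(c(1,1,1,1,1,1,2,2,2,2,2,2),
          c(1,1,1,1,1,2,2,2,2,2),
          c(1,1,1,2,1,2,2,2,2,2),
          c(1,1,1,1,2,1,2,2,2,2,2,2,2))
names(l) <- c("a", "b", "c", "d")   # placeholder names, as in the question

maxlength <- max(lengths(l))
padded <- lapply(l, `length<-`, maxlength)   # pad each vector with NA
df <- as.data.frame(padded)                  # columns a, b, c, d

out_file <- tempfile(fileext = ".csv")
# row.names = FALSE drops the index column; na = "" writes empty cells for NA
write.csv(df, out_file, row.names = FALSE, na = "")
readLines(out_file)
```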

Create numbered sequence for occurrences of a given nesting variable

I'm hoping to add a variable to a data set that numbers the occurrences of each value of a grouping variable. For example:
ids <- c(rep(1,4),rep(2,6),rep(3,2))
I want another variable that counts the occurrences of each id, creating a vector like this:
1,2,3,4,1,2,3,4,5,6,1,2
With them combined looking something like this:
ids count
1 1 1
2 1 2
3 1 3
4 1 4
5 2 1
6 2 2
7 2 3
8 2 4
9 2 5
10 2 6
11 3 1
12 3 2
Any ideas? Many thanks!
I suggest ave with seq_along
ids <- c(rep(1,4),rep(2,6),rep(3,2))
count <- ave(ids,ids, FUN=seq_along)
cbind(ids, count)
# ids count
# [1,] 1 1
# [2,] 1 2
# [3,] 1 3
# [4,] 1 4
# [5,] 2 1
# [6,] 2 2
# [7,] 2 3
# [8,] 2 4
# [9,] 2 5
# [10,] 2 6
# [11,] 3 1
# [12,] 3 2
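One caveat with ave(ids, ids, FUN = seq_along): ave returns a vector of the same type as its first argument, so if ids were character, the counts would come back as character strings. Passing the positions instead of the ids avoids this and, like ave in general, does not need the ids to be sorted. The character ids below are a hypothetical example:

```r
ids <- c("b", "b", "a", "b", "a")   # hypothetical unsorted character ids

ave(ids, ids, FUN = seq_along)      # counts coerced to character
# [1] "1" "2" "1" "3" "2"

count <- ave(seq_along(ids), ids, FUN = seq_along)
count
# [1] 1 2 1 3 2
```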
Or, if ids is sorted:
cbind(ids, count=sequence(unname(table(ids))))
# ids count
# [1,] 1 1
# [2,] 1 2
# [3,] 1 3
# [4,] 1 4
# [5,] 2 1
# [6,] 2 2
# [7,] 2 3
# [8,] 2 4
# [9,] 2 5
# [10,] 2 6
# [11,] 3 1
# [12,] 3 2
Or
cbind(ids,within.list(rle(ids), lengths <- sequence(lengths))$lengths)
Or
library(data.table)
dt <- as.data.table(ids)
dt[,count:=seq_len(.N), by=ids]
Or
library(dplyr)
dat <- data.frame(ids)
dat %>%
group_by(ids) %>%
mutate(count=row_number())

Create dataframe of all array indices in R

Using R, I'm trying to construct a dataframe of the row and col numbers of a given matrix. E.g., if
a <- matrix(c(1:15), nrow=5, ncol=3)
then I'm looking to construct a dataframe that gives:
row col
1 1
1 2
1 3
. .
5 1
5 2
5 3
What I've tried:
row <- matrix(row(a), ncol=1, nrow=dim(a)[1]*dim(a)[2], byrow=T)
col <- matrix(col(a), ncol=1, nrow=dim(a)[1]*dim(a)[2], byrow=T)
out <- cbind(row, col)
colnames(out) <- c("row", "col")
results in:
row col
[1,] 1 1
[2,] 2 1
[3,] 3 1
[4,] 4 1
[5,] 5 1
[6,] 1 2
[7,] 2 2
[8,] 3 2
[9,] 4 2
[10,] 5 2
[11,] 1 3
[12,] 2 3
[13,] 3 3
[14,] 4 3
[15,] 5 3
Which isn't what I'm looking for, as the order of rows and cols is suddenly reversed, even though I specified "byrow=T". I don't see whether or where I'm making a mistake, but would hugely appreciate suggestions to overcome this problem. Thanks in advance!
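A likely explanation for why byrow had no effect, offered as a sketch: matrix() reads row(a) in its stored column-major order and only then fills the new matrix. With ncol = 1 there is only one way to fill a single column, so byrow = TRUE and byrow = FALSE produce the same result. Transposing before flattening gives the row-major order:

```r
a <- matrix(1:15, nrow = 5, ncol = 3)

# byrow only controls how the new matrix is filled; with a single column
# both fill orders are the same.
identical(matrix(row(a), ncol = 1, byrow = TRUE),
          matrix(row(a), ncol = 1, byrow = FALSE))
# [1] TRUE

# Transpose first, then flatten, to read row(a) row by row
as.vector(t(row(a)))
# [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
```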
I'd use expand.grid on the vectors 1:ncol and 1:nrow, then flip the columns with [,2:1] to get them in the order you want:
> expand.grid(seq(ncol(a)),seq(nrow(a)))[,2:1]
Var2 Var1
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
10 4 1
11 4 2
12 4 3
13 5 1
14 5 2
15 5 3
Use row and col, but manipulate the ordering of their output directly: each returns a matrix of the same shape as the input, holding every element's row (or column) index in place. Use t to get the non-default (row-major) order you want in the end:
data.frame(row = as.vector(t(row(a))), col = as.vector(t(col(a))))
row col
1 1 1
2 1 2
3 1 3
4 2 1
5 2 2
6 2 3
7 3 1
8 3 2
9 3 3
10 4 1
11 4 2
12 4 3
13 5 1
14 5 2
15 5 3
Or, as a matrix not a data.frame:
cbind(as.vector(t(row(a))), as.vector(t(col(a))))
[,1] [,2]
[1,] 1 1
[2,] 1 2
[3,] 1 3
[4,] 2 1
[5,] 2 2
[6,] 2 3
[7,] 3 1
[8,] 3 2
[9,] 3 3
[10,] 4 1
[11,] 4 2
[12,] 4 3
[13,] 5 1
[14,] 5 2
[15,] 5 3
You may want to have a look at ?expand.grid, which does just about exactly what you want to achieve.
Since there are many ways to skin a cat, I'll chip in with yet another variant based on rep:
data.frame(row=rep(seq(nrow(a)), each=ncol(a)), col=rep(seq(ncol(a)), nrow(a)))
...but to announce a "winner", I think you need to time the solutions:
# Make up a huge matrix...
a <- matrix(runif(1e7), 1e4)
system.time( a1<-data.frame(row = as.vector(t(row(a))),
col = as.vector(t(col(a)))) ) # 0.68 secs
system.time( a2<-expand.grid(col = seq(ncol(a)),
row = seq(nrow(a)))[,2:1] ) # 0.49 secs
system.time( a3<-data.frame(row=rep(seq(nrow(a)), each=ncol(a)),
col=rep(seq(ncol(a)), nrow(a))) ) # 0.59 secs
identical(a1, a2) && identical(a1, a3) # TRUE
...so it seems @Spacedman has the speediest solution!
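Another base R option, not included in the timings above, is which() with arr.ind = TRUE, which returns a (row, col) index matrix in column-major order that can then be reordered. This sketch uses an all-TRUE array of the same dimensions so that every cell is included even if a contains NA or FALSE-like values; I have not benchmarked it against the other solutions:

```r
a <- matrix(1:15, nrow = 5, ncol = 3)

# (row, col) for every cell, in column-major order; columns are named "row", "col"
idx <- which(array(TRUE, dim(a)), arr.ind = TRUE)

# Reorder to row-major: by row, then by column
idx <- idx[order(idx[, "row"], idx[, "col"]), ]
out <- as.data.frame(idx)
head(out, 4)
```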
