I have a vector of the following form:
a <- c(4, 6, 3, 6, 1)
What I want is a vector that repeats each index of a as many times as the value stored at that index.
For example, the first element has value 4, so there should be four 1s, followed by six 2s, then three 3s, and so on.
The resulting vector should be:
b <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5)
Thanks in advance.
We can use rep:
a <- c(4, 6, 3, 6, 1)
rep(seq_along(a), a)
#[1] 1 1 1 1 2 2 2 2 2 2 3 3 3 4 4 4 4 4 4 5
We can also use sequence:
cumsum(sequence(a) == 1)
#[1] 1 1 1 1 2 2 2 2 2 2 3 3 3 4 4 4 4 4 4 5
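The sequence trick works because sequence(a) restarts at 1 at the start of each run, so counting the 1s with cumsum numbers the runs. A minimal sketch of the intermediate steps follows. The a0 vector is a made-up input used only to show one caveat: a zero count produces an empty run with no 1 to count, so the cumsum version drifts while rep stays correct.

```r
a <- c(4, 6, 3, 6, 1)

s <- sequence(a)   # 1 2 3 4 1 2 3 4 5 6 1 2 3 1 2 3 4 5 6 1
starts <- s == 1   # TRUE at the first element of each run
cumsum(starts)     # running count of runs = original index
# [1] 1 1 1 1 2 2 2 2 2 2 3 3 3 4 4 4 4 4 4 5

# Hypothetical input with a zero count: index 2 should not appear at all
a0 <- c(2, 0, 1)
rep(seq_along(a0), a0)
# [1] 1 1 3
cumsum(sequence(a0) == 1)
# [1] 1 1 2
```

If zero counts can occur in your data, rep(seq_along(a), a) is the safer of the two.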
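Or using uncount from tidyr, which repeats each row according to a weight column:
library(dplyr)
library(tidyr)
tibble(a) %>%
  mutate(rn = row_number()) %>%   # keep each element's index
  uncount(a) %>%                  # repeat row i a[i] times (column a is dropped)
  pull(rn)                        # extract the index column as a vector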
I have two vectors
a <- c(1, 5, 2, 1, 2, 3, 3, 4, 5, 1, 2)
b <- c(1, 2, 3, 4, 5, 6)
I want to know how many times each element of b occurs in a, so the result should be:
c(3, 3, 2, 1, 2, 0)
None of the methods I found, such as match(), == or %in%, work on entire vectors this way. I know I can loop over all elements of b:
counts <- numeric(length(b))   # preallocate; naming the result c would mask c()
for (i in seq_along(b)) {
  counts[i] <- sum(a == b[i], na.rm = TRUE)
}
but this runs often and takes too long. That's why I'm looking for a vectorized way, or a way to use apply().
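You can do this using factor and table. Setting the factor levels from b makes values of b that never occur in a show up with a count of 0:
table(factor(a, levels = unique(b)))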
#
#1 2 3 4 5 6
#3 3 2 1 2 0
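Since you mentioned match, here is a possibility without an sapply loop (thanks to @thelatemail). Because match returns positions in b, the levels should be seq_along(b); that keeps it correct when b is not simply 1:n:
table(factor(match(a, b), levels = seq_along(b)))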
#
#1 2 3 4 5 6
#3 3 2 1 2 0
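A closely related base R option, sketched here as an alternative rather than taken from the answers, is tabulate, which counts positive integers directly. match gives each element of a its position in b (NA if absent, which tabulate ignores), and nbins = length(b) guarantees one count per element of b, zeros included. The character vectors in the second call are made up to show that b need not be numeric.

```r
a <- c(1, 5, 2, 1, 2, 3, 3, 4, 5, 1, 2)
b <- c(1, 2, 3, 4, 5, 6)

# match(): position of each element of a within b (NA if not in b)
# tabulate(): how often each position 1..length(b) occurs
counts <- tabulate(match(a, b), nbins = length(b))
counts
# [1] 3 3 2 1 2 0

# Same call with non-numeric values; "y" is not in b and is ignored
tabulate(match(c("x", "y", "x", "z"), c("x", "z", "w")), nbins = 3)
# [1] 2 1 0
```

Unlike table, the result is a plain integer vector without names.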
Here is a base R option, using sapply with which:
a <- c(1, 5, 2, 1, 2, 3, 3, 4, 5, 1, 2)
b <- c(1, 2, 3, 4, 5, 6)
sapply(b, function(x) length(which(a == x)))
# [1] 3 3 2 1 2 0
Here is a vectorised method:
x = expand.grid(b, a)   # every (b, a) pair; Var1 (from b) varies fastest
rowSums(matrix(x$Var1 == x$Var2, nrow = length(b)))   # row i counts matches of b[i]
# [1] 3 3 2 1 2 0
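The same comparison matrix can be built in one step with outer, which skips the expand.grid data frame and the reshaping. This is a sketch of that variant; like the expand.grid version it allocates a length(b) x length(a) matrix, so for long vectors the table-based answers scale better.

```r
a <- c(1, 5, 2, 1, 2, 3, 3, 4, 5, 1, 2)
b <- c(1, 2, 3, 4, 5, 6)

# hits[i, j] is TRUE when b[i] == a[j]
hits <- outer(b, a, "==")
rowSums(hits)   # number of matches for each element of b
# [1] 3 3 2 1 2 0
```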
I would like to compare the frequencies of values in two different samples. The problem is that the first sample doesn't contain every value that appears in the second. How could I combine the two count tables without writing a for loop that aligns them on the x values returned by count?
Here's a minimal working example (MWE):
library(plyr)
a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3, 6, 5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)
a.count <- count(a)
b.count <- count(b)
My desired result should look something like this (a contains no 1s, so that cell is empty):
  freq.a freq.b
1             1
2      1      1
3      2      3
4     10      2
5     13      2
6      4      7
7      3      2
If you put your data in long format (one row per observation, with a variable for which sample it is from), then you can just make a contingency table:
library(dplyr)  # provides %>%
data.frame(v = a, s = 'a') %>%
  rbind(data.frame(v = b, s = 'b')) %>%
  xtabs(formula = ~ v + s)
Produces:
s
v a b
1 0 1
2 1 1
3 2 3
4 10 2
5 13 2
6 4 7
7 3 2
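Alternatively, starting from the plyr counts you already have, a full outer merge on x keeps every value that appears in either sample; counts missing from one sample become NA:
df <- merge(a.count, b.count, by = 'x', all = TRUE)[2:3]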
names(df) <- c('freq.a', 'freq.b')
df
freq.a freq.b
1 NA 1
2 1 1
3 2 3
4 10 2
5 13 2
6 4 7
7 3 2
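Both answers can also be reduced to base R without plyr. The sketch below (lv and res are illustrative names) counts each sample against the shared set of observed values, so a value missing from one sample gets 0 instead of NA, and the value column is kept rather than relying on row names.

```r
a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3, 6, 5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)

# Every value seen in either sample, so both are counted on the same grid
lv <- sort(unique(c(a, b)))

res <- data.frame(
  value  = lv,
  freq.a = as.vector(table(factor(a, levels = lv))),  # 0 for values absent from a
  freq.b = as.vector(table(factor(b, levels = lv)))
)
res
# freq.a: 0 1 2 10 13 4 3
# freq.b: 1 1 3 2 2 7 2
```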
I have a bunch of observations:
x = c(1, 2, 4, 1, 6, 7, 11, 11, 12, 13, 14)
that I want to turn into groups:
y = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3)
I.e., I want the first 5 integers (1 to 5) to form one group, the next 5 integers (6 to 10) the next group, and so on.
Is there a straightforward way to accomplish this without a loop?
Clarification: I need to create the groups programmatically from the input vector (x).
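We can use integer division (%/%) to create the groups. Subtract 1 first so that multiples of 5 (5, 10, ...) stay in the group they close:
(x - 1) %/% 5 + 1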
#[1] 1 1 1 1 2 2 3 3 3 3 3
You can use ceiling to create groups
ceiling(x/5)
# [1] 1 1 1 1 2 2 3 3 3 3 3
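The sample x never hits a multiple of 5, so it does not reveal how each formula treats the group boundaries. A quick check with boundary values (made up for illustration) shows that x %/% 5 + 1 effectively groups 0-4, 5-9, ..., pushing 5 and 10 into the next group, while ceiling(x / 5) and (x - 1) %/% 5 + 1 give the requested 1-5, 6-10 grouping:

```r
x <- c(1, 5, 6, 10, 11)   # hypothetical values on the group boundaries

ceiling(x / 5)
# [1] 1 1 2 2 3
(x - 1) %/% 5 + 1
# [1] 1 1 2 2 3
x %/% 5 + 1               # 5 and 10 move up one group
# [1] 1 2 2 3 3
```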
I have a dataframe with around 30k observations, divided into 300 groups. For example:
id, group, x, y
1, 1, 2, 3
2, 1, 4, 3
3, 1, 2, 4
4, 2, 5, 4
5, 2, 5, 3
6, 2, 6, 4
I want to turn it into this:
pair, group, x_i, x_j, y_i, y_j
12, 1, 2, 4, 3, 3
13, 1, 2, 2, 3, 4
23, 1, 4, 2, 3, 4
45, 2, 5, 5, 4, 3
and so on. I've found a few topics, but they don't seem to apply exactly to my problem.
The combn function can be used to generate every pair of x and y values within a group. Assuming your data frame is called dat, we work group by group with lapply. Because lapply returns a list, we use do.call(rbind, ...) to stack the per-group results back into a single data frame.
new.dat = lapply(unique(dat$group), function(g) {
  data.frame(pairs = apply(t(combn(dat$id[dat$group==g], 2)), 1, paste, collapse=""),
             group = g,
             x = t(combn(dat$x[dat$group==g], 2)),
             y = t(combn(dat$y[dat$group==g], 2)))
})
do.call(rbind, new.dat)
pairs group x.1 x.2 y.1 y.2
1 12 1 2 4 3 3
2 13 1 2 2 3 4
3 23 1 4 2 3 4
4 45 2 5 5 4 3
5 46 2 5 6 4 4
6 56 2 5 6 3 4
You could also use split, which saves some typing, but is about 10% slower on my machine:
do.call(rbind, lapply(split(dat, dat$group), function(df) {
  data.frame(pairs = apply(t(combn(df$id, 2)), 1, paste, collapse=""),
             group = df$group[1],   # g does not exist here; take the group from the subset
             x = t(combn(df$x, 2)),
             y = t(combn(df$y, 2)))
}))
I won't say this is an optimal solution, but it should work:
df <- read.table(text="id, group, x, y
1,1,2,3
2,1,4,3
3,1,2,4
4,2,5,4
5,2,5,3
6,2,6,4", header=TRUE, sep=",")
# all within-group id pairs, one row per pair
df.new <- do.call(rbind, lapply(tapply(df$id, df$group, combn, m=2),
                                FUN=function(x) data.frame(pairi=x[1,], pairj=x[2,])))
# look up the group, x and y values for both members of each pair
df.new <- do.call(rbind, apply(df.new, 1, FUN=function(x) data.frame(
  pair=paste0(x[1], x[2]), group=df[df$id==x[1], 'group'],
  x_i=df[df$id==x[1], 'x'], x_j=df[df$id==x[2], 'x'],
  y_i=df[df$id==x[1], 'y'], y_j=df[df$id==x[2], 'y'])))
df.new
pair group x_i x_j y_i y_j
1.1 12 1 2 4 3 3
1.2 13 1 2 2 3 4
1.3 23 1 4 2 3 4
2.1 45 2 5 5 4 3
2.2 46 2 5 6 4 4
2.3 56 2 5 6 3 4
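One detail worth guarding against in both combn answers: when its first argument is a single positive whole number n, combn enumerates 1:n instead of treating it as a one-element vector, so a group with only one row would silently produce bogus pairs. A minimal sketch that avoids this by combining row positions and skipping groups with fewer than two rows (make_pairs is a hypothetical helper name; dat is rebuilt from the sample in the question):

```r
dat <- data.frame(id = 1:6, group = c(1, 1, 1, 2, 2, 2),
                  x = c(2, 4, 2, 5, 5, 6), y = c(3, 3, 4, 4, 3, 4))

# Hypothetical helper: all within-group pairs of rows of d
make_pairs <- function(d) {
  n <- nrow(d)
  if (n < 2) return(NULL)   # a single row has no pairs
  idx <- combn(n, 2)        # scalar n on purpose: combinations of row positions 1..n
  i <- idx[1, ]
  j <- idx[2, ]
  data.frame(pair = paste0(d$id[i], d$id[j]), group = d$group[i],
             x_i = d$x[i], x_j = d$x[j], y_i = d$y[i], y_j = d$y[j])
}

# rbind drops the NULL results of single-row groups
res <- do.call(rbind, lapply(split(dat, dat$group), make_pairs))
rownames(res) <- NULL
res
```

With ids above 9, pasting without a separator becomes ambiguous (1 and 23 vs 12 and 3 both give "123"), so paste(d$id[i], d$id[j], sep = "-") may be safer for the real data.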
I am relatively new to R and I have a problem with a dataframe.
I have a very long dataframe (df1) with coordinates x, y and a value z. I have a shorter dataframe (df2) with the same columns but fewer rows. I want to replace the z values in df1 wherever its (x, y) pair also appears in df2.
x = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4)
y = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
z = c(8, 5, 3, 1, 2, 6, 8, 5, 3, 2, 8, 4, 4, 6, 2, 1)
df1 = data.frame(x, y, z)
x1=c(1,3,4)
y1=c(2,1,4)
z1=c(58,37,23)
df2=data.frame(x1,y1,z1)
names(df2) <- c("x", "y", "z")
I thought I might use the ifelse function:
df1$znew<-ifelse((df1[,1]== df2[,1])&(df1[,2]==df2[,2]), df2[,3], df1[,3])
But the two objects do not have the same dimensions.
I have tried to use loops that analyse each row, compare x and y and then decide which z to use, but I can't make it work.
In the end, I would like to have a dataframe with a new z variable, so that I can compare the values and check that they really changed. My final dataframe would look like:
znew = c(8,58,3,1,2,6,8,5,37,2,8,4,4,6,2,23)
I really appreciate any help, and I am sorry if somebody else has posted a similar question. I have been trying to figure this out all day and can't find an example that suits my case.
Since the two data frames have the same column names (x, y, z), you can do this with merge:
tmp <- merge(df1,df2,all.x = TRUE,by = c('x','y'))
tmp$z.x[!is.na(tmp$z.y)] <- tmp$z.y[!is.na(tmp$z.y)]
> tmp
   x y z.x z.y
1  1 1   8  NA
2  1 2  58  58
3  1 3   3  NA
4  1 4   1  NA
5  2 1   2  NA
6  2 2   6  NA
7  2 3   8  NA
8  2 4   5  NA
9  3 1  37  37
10 3 2   2  NA
11 3 3   8  NA
12 3 4   4  NA
13 4 1   4  NA
14 4 2   6  NA
15 4 3   2  NA
16 4 4  23  23
Then just remove the extra column and rename the columns.
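merge returns its rows sorted by the key columns rather than in df1's original order, and it overwrites z in place. If you want to keep the original z next to a new znew column, as asked, a different technique is a match on pasted (x, y) keys. This is a sketch of that alternative, not the merge answer; idx is an illustrative name.

```r
x <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4)
y <- c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
z <- c(8, 5, 3, 1, 2, 6, 8, 5, 3, 2, 8, 4, 4, 6, 2, 1)
df1 <- data.frame(x, y, z)
df2 <- data.frame(x = c(1, 3, 4), y = c(2, 1, 4), z = c(58, 37, 23))

# Row of df2 whose (x, y) pair matches each row of df1, or NA if none
idx <- match(paste(df1$x, df1$y), paste(df2$x, df2$y))

# Keep the original z and add znew with the replacements applied
df1$znew <- ifelse(is.na(idx), df1$z, df2$z[idx])
df1$znew
# [1]  8 58  3  1  2  6  8  5 37  2  8  4  4  6  2 23
```

The row order of df1 is unchanged, so df1$z and df1$znew can be compared directly.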