I have two vectors
a <- c(18,19,19,19,21,21,22,23,24,25,26,27,28,30,31,35,36,37)
b <- c(19,25,31,37)
I need the data frame following format:
a b
18 19
19 19
19 19
19 19
21 25
21 25
22 25
23 25
24 25
25 25
26 31
27 31
28 31
30 31
31 31
35 37
36 37
37 37
Here value 19 in vector b repeat up to the value 19 in vector a.
After that 21(in a) is the greater than 19 ,so the next value of 25(in b) is be repeat until the 25(in a )
in similar way construct the dataframe.
Thank you.
We can get the position index from findInterval, use that to create the times for the rep
i1 <- findInterval(b, a)
data.frame(a, b = rep(b, c(i1[1], diff(i1))))
# a b
#1 18 19
#2 19 19
#3 19 19
#4 19 19
#5 21 25
#6 21 25
#7 22 25
#8 23 25
#9 24 25
#10 25 25
#11 26 31
#12 27 31
#13 28 31
#14 30 31
#15 31 31
#16 35 37
#17 36 37
#18 37 37
Alternatively,
data.frame(a, b = sapply(a, function(x) b[x <= b][1]))
# a b
# 1 18 19
# 2 19 19
# 3 19 19
# 4 19 19
# 5 21 25
# 6 21 25
# 7 22 25
# 8 23 25
# 9 24 25
# 10 25 25
# 11 26 31
# 12 27 31
# 13 28 31
# 14 30 31
# 15 31 31
# 16 35 37
# 17 36 37
# 18 37 37
Related
I am trying to create a vector where I have 3 repetitions of the number 1, then 3 repetitions of the number 2, and so on up to, for instance, 3 repetitions of the number 36.
c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5...)
I have tried the following use of rep() but got the following error:
Error in rep(3, seq(1:36)) : argument 'times' incorrect
What formulation do I need to use to properly generate the vector I want?
sort(rep(1:36, 3))
Or even better as #Wimpel mentioned in the comments, use the each argument of the rep function.
rep(1:36, each = 3)
output
# [1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17 17 18 18 18 19 19 19 20 20 20 21 21 21 22
# [65] 22 22 23 23 23 24 24 24 25 25 25 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34 34 34 35 35 35 36 36 36
This one should work. However probably not the most elegant.
reps = c()
n = 36
for(i in 1:n){
reps = append(reps, rep(i, 3))
}
reps
alternatively using the rep function properly (see documentation (?rep for argument each):
rep(1:36,each = 3)
rep approach is preferable (see existing answers)
Here are some other options:
> kronecker(1:36, rep(1, 3))
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9
[26] 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17
[51] 17 18 18 18 19 19 19 20 20 20 21 21 21 22 22 22 23 23 23 24 24 24 25 25 25
[76] 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34
[101] 34 34 35 35 35 36 36 36
> c(outer(rep(1, 3), 1:36))
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 9
[26] 9 9 10 10 10 11 11 11 12 12 12 13 13 13 14 14 14 15 15 15 16 16 16 17 17
[51] 17 18 18 18 19 19 19 20 20 20 21 21 21 22 22 22 23 23 23 24 24 24 25 25 25
[76] 26 26 26 27 27 27 28 28 28 29 29 29 30 30 30 31 31 31 32 32 32 33 33 33 34
[101] 34 34 35 35 35 36 36 36
I have tried to change column names based on a vector as follows:
library(data.table)
df <- fread(
"radio1 radio2 radio3 radio4 radio5 radio6 radio7
8 12 18 32 40 36 32
6 12 18 24 30 36 30
8 16 18 24 30 36 18
4 12 12 24 30 36 24
6 16 24 32 40 48 24
8 12 18 24 30 36 30
8 12 18 24 30 36 18
8 16 24 32 40 48 40
8 16 24 24 30 48 48",
header = TRUE
)
var <- c("radio1","radio2","radio3","radio4","radio5", "radio6", "radio7")
recode <- c("A","B","C","D","E", "F", "G")
variables <- cbind(var, recode)
variables <- as.data.table(variables)
for (i in seq_len(ncol(df))) {
colnames(df[[i]]) <- variables$recode[match(names(df)[i], variables $var)]
}
I however get the error:
Error in `colnames<-`(`*tmp*`, value = variables$recode[match(names(df)[i], :
attempt to set 'colnames' on an object with less than two dimensions
What am I doing wrong? Is there a better way to do this?
You can use match directly.
names(df) <- variables$recode[match(names(df), variables$var)]
df
# A B C D E F G
#1: 8 12 18 32 40 36 32
#2: 6 12 18 24 30 36 30
#3: 8 16 18 24 30 36 18
#4: 4 12 12 24 30 36 24
#5: 6 16 24 32 40 48 24
#6: 8 12 18 24 30 36 30
#7: 8 12 18 24 30 36 18
#8: 8 16 24 32 40 48 40
#9: 8 16 24 24 30 48 48
By changing colnames(df[[i]]) to colnames(df)[i], the loop works fine:
for (i in seq_len(ncol(df))) {
colnames(df)[i] <- variables$recode[match(names(df)[i], variables$var)] }
> df
A B C D E F G
1: 8 12 18 32 40 36 32
2: 6 12 18 24 30 36 30
3: 8 16 18 24 30 36 18
4: 4 12 12 24 30 36 24
5: 6 16 24 32 40 48 24
6: 8 12 18 24 30 36 30
7: 8 12 18 24 30 36 18
8: 8 16 24 32 40 48 40
9: 8 16 24 24 30 48 48
I have a dataset where the last columns indicate the number of stops extracted from that dataset.
ColA ColB ColC 1 2 3 4 5 6 7 8 9 10 (...)
a g c a q e r e r q g h q (...)
What I want is to select from column 1, until the last column, and add Stop before it, ending up with Stop1, Stop2, etc...
The problem is that those columns can vary. Sometimes I have 10 after 1 other times I have 6.
I've tried with dplyr and data.table but I'm not sure how to automate this.
EDIT: ColA to ColC are fixed and always the same.
If I correctly understood your problem, this is a sufficiently flexible code that should solve your problem. Start considering the following dataset:
set.seed(1)
df <- data.frame(matrix(rpois(130, 20),ncol=13))
names(df) <- c(paste("Col",LETTERS[1:3],sep=""),as.character(1:10))
df
#######
ColA ColB ColC 1 2 3 4 5 6 7 8 9 10
1 17 21 20 13 13 15 29 25 16 15 12 23 17
2 25 17 11 24 23 14 22 23 25 14 18 19 15
3 25 18 22 18 19 30 16 19 23 27 18 19 11
4 21 18 24 25 23 19 19 18 27 23 18 16 18
5 13 21 16 18 21 23 22 18 22 24 22 26 15
6 22 16 17 27 17 20 24 24 14 21 19 17 15
7 23 23 18 22 16 16 20 18 21 27 17 22 14
8 22 22 17 17 26 13 19 25 24 17 15 13 20
9 18 24 21 22 28 26 15 22 23 20 19 15 27
10 26 23 19 16 18 20 17 25 16 20 19 18 19
Now rename columuns as required:
k <- which(names(df)=="1")
names(df)[k:ncol(df)] <- paste("Stop",1:(ncol(df)-k+1),sep="")
df
#############
ColA ColB ColC Stop1 Stop2 Stop3 Stop4 Stop5 Stop6 Stop7 Stop8 Stop9 Stop10
1 17 21 20 13 13 15 29 25 16 15 12 23 17
2 25 17 11 24 23 14 22 23 25 14 18 19 15
3 25 18 22 18 19 30 16 19 23 27 18 19 11
4 21 18 24 25 23 19 19 18 27 23 18 16 18
5 13 21 16 18 21 23 22 18 22 24 22 26 15
6 22 16 17 27 17 20 24 24 14 21 19 17 15
7 23 23 18 22 16 16 20 18 21 27 17 22 14
8 22 22 17 17 26 13 19 25 24 17 15 13 20
9 18 24 21 22 28 26 15 22 23 20 19 15 27
10 26 23 19 16 18 20 17 25 16 20 19 18 19
I hope it can help you.
What's the name for the [1] below.
What is its significance?
Is it always only [1]? If not, then under what conditions is it something else? (example please)
> bb <- c(5,6,7)
> bb
[1] 5 6 7
It shows the count of the variables. In your case, it shows
bb <- c(5,6,7)
> bb
# [1] 5 6 7
Try,
c(1:50)
#[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
#[35] 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
You can also avoid that being displayed by using cat
cat(c(1:50))
#1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
I am pretty much new to r and I have a dummy example of a bigger table underneath. I want to split the table based on id (a,b,c,d) and do iterative simple linear regression for every subset:
x is my x variable, and columns 1:6 are y variables, to have an output of each id and each column from 1:6. Also, it would be great if I could output the model p values of the slopes into a new data frame
id x 1 2 3 4 5 6
1 a 74 18 19 NA 23 29 1
2 a 77 16 19 17 22 29 2
3 a 79 16 NA 19 23 29 3
4 a 81 17 20 18 23 29 4
5 b 74 19 20 19 23 28 11
6 b 76 15 19 18 26 28 12
7 b 79 19 21 20 24 28 NA
8 b 81 19 21 20 23 28 14
9 c 68 19 20 20 23 29 8
10 c 70 17 22 22 27 29 9
11 c 73 18 22 21 23 29 10
12 c 75 19 20 19 23 29 11
13 d 65 18 18 19 22 28 5
14 d 68 18 NA 18 20 29 6
15 d 70 18 19 18 23 28 7
16 d 72 19 17 19 22 28 8
I tried to do use plyr package but it didn't work out
regression = NULL
for ( i in 3:ncol(dumm)){
regression[i] <- dlply(dumm, .(id), function(z) lm(dumm[,i]~dumm$x, z))
}
coefs <- ldply(regression, coef)
Thanks in advance!