Using rep/seq to create ID column - r

Is there an efficient way to create an ID column using rep/seq or some other function I'm not thinking of to make a sequence such as the following:
1, 2, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10.....
So every 3 numbers the following 3 numbers get repeated an additional time. My actual data will require a sequence that is:
1:1000- 1 each
1001:2000- 2 each
2001:3000 - 3 each
....
Any ideas/help would be greatly appreciated.

We can use
v2 <- 1:7000
rep(v2, as.integer(gl(length(v2), 1000, length(v2))))
For the first case
v1 <- 1:15
rep(v1, as.integer(gl(length(v1), 3, length(v1))))
[1] 1 2 3 4 4 5 5 6 6 7 7 7 8 8 8 9 9 9 10 10 10 10 11 11 11 11 12 12 12 12 13 13 13 13 13 14 14 14 14 14 15 15 15 15 15

We can use rep inside rep.int.
First case:
rep.int(1:12, rep(1:4, each = 3))
Second case:
rep.int(1:3e3, rep(1:3, each = 1e3))
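Both answers rely on the same idea: rep accepts a vector of per-element repeat counts. A minimal sketch that wraps this in a helper, assuming all blocks have the same size (the name make_ids and its arguments are hypothetical, not from the answers above):

```r
# Hypothetical helper: IDs in the k-th block of `block_size` values are repeated k times.
make_ids <- function(block_size, n_blocks) {
  ids   <- seq_len(block_size * n_blocks)             # 1, 2, ..., block_size * n_blocks
  times <- rep(seq_len(n_blocks), each = block_size)  # e.g. 1,1,1, 2,2,2, ... for block_size = 3
  rep.int(ids, times)                                 # repeat each ID by its own count
}

small <- make_ids(3, 4)     # first case from the question
big   <- make_ids(1000, 3)  # 1:1000 once, 1001:2000 twice, 2001:3000 three times
```

The result for the second case has 1000 * (1 + 2 + 3) = 6000 elements, so it only works as an ID column if the data frame has that many rows.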

Related

Loop to sum observations of a time series in R

I'm sure this is very obvious, but I'm a beginner in R and I spent a good part of the afternoon trying to solve this...
I'm trying to create a loop to sum the observations in my time series in steps of five.
For example:
input:
1
2
3
4
5
5
6
6
7
4
5
5
4
4
5
6
5
6
4
4
output:
15
28
23
25
My time series has only one variable and 7825 observations.
The purpose of the loop is to calculate the weekly realized volatility. My observations are squared returns. Once I have my loop, I'll be able to take the square root and get my weekly realized volatility.
Thank you very much in advance for any help you can provide.
H.
We can create a grouping variable with gl and use that to get the sum in tapply
tapply(input, as.integer(gl(length(input), 5, length(input))),
FUN = sum, na.rm = TRUE)
# 1 2 3 4
# 15 28 23 25
data
input <- scan(text = "1 2 3 4 5 5 6 6 7 4 5 5 4 4 5 6 5 6 4 4", what = numeric())
Here is another base R option using sapply + split
> sapply(split(x,ceiling(seq_along(x)/5)),sum)
1 2 3 4
15 28 23 25
Data
x <- c(1, 2, 3, 4, 5, 5, 6, 6, 7, 4, 5, 5, 4, 4, 5, 6, 5, 6, 4, 4)
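When the series length is an exact multiple of five (7825 is), the grouping can also be done by reshaping the vector into a five-row matrix and summing its columns. This is a sketch of that alternative, not one of the answers above; weekly_sum and weekly_vol are illustrative names:

```r
x <- c(1, 2, 3, 4, 5, 5, 6, 6, 7, 4, 5, 5, 4, 4, 5, 6, 5, 6, 4, 4)

# matrix() fills column by column, so each column holds five consecutive observations
stopifnot(length(x) %% 5 == 0)           # this approach assumes complete blocks of five
weekly_sum <- colSums(matrix(x, nrow = 5))

# If x holds squared returns, the square root of each block sum
# is the weekly realized volatility described in the question
weekly_vol <- sqrt(weekly_sum)
```

If the length is not a multiple of five, the tapply or split versions are the safer choice, because they simply produce a shorter last group.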

For loop that only counts unique values [closed]

My data frame consists of these columns: A_NUMBER, B_NUMBER, DURATION. I would like to count how many different B_NUMBERs each A_NUMBER calls (to see how big their network is).
I first created a new column with all values set equal to 0.
df$CFU <- rep (0,nrow(df))
Next, I tried the following for loop:
for (j in 1:nrow(df)){ for (i in 1:nrow(unique(df$B_NUMBER))){
if(df$A_NUMBER[i] == df$A_NUMBER[j]) {df$CFU[j] <- sum(df$CFU[j],1) }}}
Then I get the following error:
Error in 1:nrow(unique(df$B_NUMBER)) : argument of length 0
How should I solve this?
As I understand your question, you are looking for a list of the unique B_NUMBERs for each A_NUMBER.
A_NUMBER = round(runif(100,0,10))
B_NUMBER = round(runif(100,0,10))
df = cbind(A_NUMBER, B_NUMBER)
aggregate(B_NUMBER ~ A_NUMBER, data=df, unique)
A_NUMBER B_NUMBER
1 0 10, 8
2 1 9, 3, 1, 7, 8, 0
3 2 7, 0, 6, 1, 9, 2, 10
4 3 7, 3, 6, 8, 4, 5
5 4 7, 9, 3, 10, 4, 8, 1, 2, 5
6 5 6, 5, 2, 8
7 6 4, 8, 9, 6, 10, 3
8 7 7, 3, 6, 0, 4, 1, 9, 8
9 8 7, 9, 8, 5, 2
10 9 8, 6, 2, 9, 0, 4, 1
11 10 7
and then you can call the length of the vectors as
aggregate(B_NUMBER ~ A_NUMBER, data=df, function(x) length(unique(x)))
A_NUMBER B_NUMBER
1 0 2
2 1 6
3 2 7
4 3 6
5 4 9
6 5 4
7 6 6
8 7 8
9 8 5
10 9 7
11 10 1
and check whether it was correct by
subset(df,A_NUMBER == 8)
A_NUMBER B_NUMBER
[1,] 8 7
[2,] 8 9
[3,] 8 7
[4,] 8 8
[5,] 8 5
[6,] 8 7
[7,] 8 2
[8,] 8 2
[9,] 8 8
Looks good, only 7s, 9s, 8s, 5s and 2s!
Because you did not provide example data, it is difficult to examine further what happened in your for loop. But based on the error message, it is clear that 1:nrow(unique(df$B_NUMBER)) is not working. The function unique returns a vector, which is one-dimensional. If you pass this vector to nrow, it returns NULL, and 1:NULL fails with "argument of length 0". What you probably need in this case is length, not nrow.
By the way, df$CFU <- rep(0, nrow(df)) can be simplified to df$CFU <- 0
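The nrow/length point can be checked on a small vector, and the per-row count the question asks for can be filled in without a loop. A minimal sketch, assuming a toy data frame (calls and its values are made up for illustration); ave is used here as one possible way to write the count back to every row:

```r
b <- c(3, 5, 3, 7)
is.null(nrow(unique(b)))   # TRUE: a plain vector has no rows
length(unique(b))          # 3 distinct values

# Capture the error message instead of stopping the script
msg <- tryCatch(1:nrow(unique(b)), error = function(e) conditionMessage(e))

# Toy data (hypothetical): caller 1 reaches two different numbers, caller 2 only one
calls <- data.frame(A_NUMBER = c(1, 1, 1, 2, 2),
                    B_NUMBER = c(10, 11, 10, 12, 12))

# For every row, the number of distinct B_NUMBERs called by that row's A_NUMBER
calls$CFU <- ave(calls$B_NUMBER, calls$A_NUMBER,
                 FUN = function(x) length(unique(x)))
```

Unlike aggregate, which returns one row per A_NUMBER, this keeps the original rows and adds the count as a column.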

How to exclude combinations that have less than 2 rows in a data frame? [duplicate]

I am attempting to keep only deids with multiple observations.
I have the below code
help <- data.frame(deid = c(1, 5, 5, 5, 5, 5, 5, 12, 12, 12, 12),
session.number = c(1, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4),
days.since.last = c(0, 0, 7, 14, 93, 5, 102, 0, 21, 104, 4))
deid session.number days.since.last
1 1 1 0
2 5 1 0
3 5 2 7
4 5 3 14
5 5 4 93
6 5 5 5
7 5 6 102
8 12 1 0
9 12 2 21
10 12 3 104
11 12 4 4
My feeble attempt was to use the group_by and then the filter( ) command
help %>% group_by(deid) %>% filter(session.number >=2)
However, it only keeps session.number's at 2 or greater. So I get rid of the deid = 1, but all the remaining deid data starts at session.number 2, and not session.number 1.
What I am trying to tell R is to keep the groups (deid) with greater than 1 observation (session.number)
Any assistance is greatly appreciated.
This should do it. You need to filter by the number of observations in each group, which n() returns:
help %>% group_by(deid) %>% filter(n()>1)
deid session.number days.since.last
1 5 1 0
2 5 2 7
3 5 3 14
4 5 4 93
5 5 5 5
6 5 6 102
7 12 1 0
8 12 2 21
9 12 3 104
10 12 4 4
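Using data.table instead (help must first be converted to a data.table, for example with setDT):
library(data.table)
setDT(help)  # convert the data.frame to a data.table by reference
helpcount <- help[, list(Count = .N), by = deid]
helpf <- merge(help, helpcount, by = "deid")
helpf <- helpf[Count > 1]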
EDIT: A bit more concise:
help[, Count := .N, by = deid]
help[Count > 1]
EDIT2: thelatemail's even more concise solution:
help[,if(.N > 1) .SD, by=deid]
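The same group-size filter can be written in base R without extra packages. A minimal sketch, assuming the data from the question stored under the illustrative name sessions (to avoid masking the help function); ave is a swapped-in technique here, not part of the answers above:

```r
sessions <- data.frame(
  deid            = c(1, 5, 5, 5, 5, 5, 5, 12, 12, 12, 12),
  session.number  = c(1, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4),
  days.since.last = c(0, 0, 7, 14, 93, 5, 102, 0, 21, 104, 4)
)

# Size of each row's deid group, repeated for every row of that group
group_size <- ave(sessions$deid, sessions$deid, FUN = length)

# Keep only rows whose deid occurs more than once
multi <- sessions[group_size > 1, ]
```

Here multi keeps all sessions, including session.number 1, for the deids that have more than one observation.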

Changing rows of data frame to columns in R

I have a data frame in the following format.
Drug A 4 5 4 3 2 4 3 4 4
Drug B 6 8 4 5 4 6 5 8 6
Drug C 6 7 6 6 7 5 6 5 5
I want to convert it to the following format without manually entering the values, as is done here:
pain = c(4, 5, 4, 3, 2, 4, 3, 4, 4, 6, 8, 4, 5, 4, 6, 5, 8, 6, 6, 7, 6, 6, 7, 5, 6, 5, 5)
drug = c(rep("A",9), rep("B",9), rep("C",9))
migraine = data.frame(pain,drug)
pain drug
1 4 A
2 5 A
3 4 A
4 3 A
5 2 A
6 4 A
...
25 6 C
26 5 C
27 5 C
Is there a better method to handle this?
I think this is an ideal use case for Hadley Wickham's reshape2 package. Here's a tutorial that will show you what you need. The melt function should do nicely for your purposes, I think.
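For readers without reshape2 installed, the same wide-to-long conversion can be sketched in base R. The layout of the input is an assumption (one row per drug, one column per score), and wide and scores are illustrative names:

```r
# Assumed wide layout: one row per drug, nine pain scores in unnamed columns
wide <- data.frame(
  drug = c("A", "B", "C"),
  rbind(c(4, 5, 4, 3, 2, 4, 3, 4, 4),
        c(6, 8, 4, 5, 4, 6, 5, 8, 6),
        c(6, 7, 6, 6, 7, 5, 6, 5, 5))
)

scores <- as.matrix(wide[, -1])   # numeric part only, 3 rows by 9 columns

migraine <- data.frame(
  pain = as.vector(t(scores)),                 # transpose so values are read row by row
  drug = rep(wide$drug, each = ncol(scores))   # one drug label per score
)
```

With reshape2, melt applied to wide with drug as the id variable should give an equivalent long table, though with the rows in a different order.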

Remove two outliers in multiple regression

We have a problem removing two outliers from our dataset. The data come from an experiment with two independent variables and one dependent variable. We ran the multiple regression and inspected the "Normal Q-Q" plot, which showed two outliers (cases 10 and 46). We would now like to remove those two cases and rerun the multiple regression without them.
We have already tried various commands recommended on several R platforms, but unfortunately nothing worked.
We would be glad if anyone had an idea that would help us solve our problem.
Thank you very much for helping.
Since no data was provided, I fabricated some:
> x <- data.frame(a = c(10, 12, 14, 6, 10, 8, 11, 9), b = c(1, 2, 3, 24, 4, 1, 2, 4),
c = c(2, 1, 3, 6, 3, 4, 2, 48))
> x
a b c
1 10 1 2
2 12 2 1
3 14 3 3
4 6 24 6
5 10 4 3
6 8 1 4
7 11 2 2
8 9 4 48
If the 4th case (see column x$b) and the 8th case (see column x$c) are the outliers, drop those two rows with negative indexing:
> x1 <- x[-c(4, 8), ]
> x1
a b c
1 10 1 2
2 12 2 1
3 14 3 3
5 10 4 3
6 8 1 4
7 11 2 2
Is this what you need?
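Applied to the regression in the question, the same negative indexing can go straight into the data argument of lm. A minimal sketch on simulated data (the variable names y, x1, x2 and the data itself are made up for illustration), assuming the labels 10 and 46 in the Q-Q plot are row numbers:

```r
set.seed(1)   # simulated data, for illustration only
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(50)

fit_all <- lm(y ~ x1 + x2, data = dat)       # original fit on all 50 cases

outliers <- c(10, 46)                        # cases flagged in the Q-Q plot
fit_trim <- lm(y ~ x1 + x2, data = dat[-outliers, ])   # refit without them

nobs(fit_trim)   # number of cases actually used in the refit
```

The plot labels come from the row names of the data, so if rows were removed or reordered earlier, it is safer to select by row name than by position.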
