I have a data frame in the following format:
Drug A 4 5 4 3 2 4 3 4 4
Drug B 6 8 4 5 4 6 5 8 6
Drug C 6 7 6 6 7 5 6 5 5
I want to convert it to the following format without manually entering the values.
That is, I want the same result as:
pain = c(4, 5, 4, 3, 2, 4, 3, 4, 4, 6, 8, 4, 5, 4, 6, 5, 8, 6, 6, 7, 6, 6, 7, 5, 6, 5, 5)
drug = c(rep("A",9), rep("B",9), rep("C",9))
migraine = data.frame(pain,drug)
pain drug
1 4 A
2 5 A
3 4 A
4 3 A
5 2 A
6 4 A
...
25 6 C
26 5 C
27 5 C
Is there a better method to handle this?
I think this is an ideal use case for Hadley Wickham's reshape2 package. Here's a tutorial that will show you what you need. The melt function should do nicely for your purposes, I think.
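For comparison, here is a base R sketch of the same wide-to-long conversion. It is not the melt call itself, and it assumes the wide data sit in a data frame (hypothetically named wide here) with a drug column followed by nine score columns:

```r
# Hypothetical wide data frame: one row per drug, nine pain scores each
wide <- data.frame(
  drug = c("A", "B", "C"),
  rbind(c(4, 5, 4, 3, 2, 4, 3, 4, 4),
        c(6, 8, 4, 5, 4, 6, 5, 8, 6),
        c(6, 7, 6, 6, 7, 5, 6, 5, 5))
)

scores <- as.matrix(wide[, -1])      # 3 x 9 numeric matrix of pain scores

migraine <- data.frame(
  pain = as.vector(t(scores)),       # transpose so values are read row by row
  drug = rep(wide$drug, each = ncol(scores))
)

head(migraine)
```

Transposing before flattening keeps all of drug A's scores first, then B, then C, which matches the order in the question.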
Is there an efficient way to create an ID column using rep/seq or some other function I'm not thinking of to make a sequence such as the following:
1, 2, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10.....
So every 3 numbers the following 3 numbers get repeated an additional time. My actual data will require a sequence that is:
1:1000 - 1 each
1001:2000 - 2 each
2001:3000 - 3 each
....
Any ideas/help would be greatly appreciated.
We can use
v2 <- 1:7000
rep(v2, as.integer(gl(length(v2), 1000, length(v2))))
For the first case
v1 <- 1:15
rep(v1, as.integer(gl(length(v1), 3, length(v1))))
 [1] 1 2 3 4 4 5 5 6 6 7 7 7 8 8 8 9 9 9 10 10 10 10 11 11 11 11 12 12 12 12 13 13 13 13 13 14 14 14 14 14 15 15 15 15 15
We can use rep inside rep.int.
First case:
rep.int(1:12, rep(1:4, each = 3))
Second case:
rep.int(1:3e3, rep(1:3, each = 1e3))
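As a quick sanity check, the two approaches can be compared on the small case. This is only a sketch; the variable names a, b and counts are made up for illustration:

```r
v1 <- 1:12

# gl() approach: block index 1,1,1,2,2,2,... used as the repeat count
a <- rep(v1, as.integer(gl(length(v1), 3, length(v1))))

# rep.int() approach: repeat counts built directly with rep(each = )
b <- rep.int(1:12, rep(1:4, each = 3))

identical(a, b)            # both give the same integer vector

# How often each value occurs: 1-3 once, 4-6 twice, 7-9 three times, ...
counts <- table(a)
counts
```

Both versions build the same vector of repeat counts (1, 1, 1, 2, 2, 2, ...), so which one to use is mostly a matter of taste.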
I'm sure this is very obvious, but I'm a beginner in R and I spent a good part of the afternoon trying to solve this...
I'm trying to create a loop to sum the observations in my time series in steps of five.
for example :
input:
1
2
3
4
5
5
6
6
7
4
5
5
4
4
5
6
5
6
4
4
output:
15
28
23
25
My time series has only one variable and 7825 observations.
The purpose of the loop is to calculate the weekly realized volatility. My observations are squared returns. Once I have my loop, I'll be able to take the square root and get my weekly realized volatility.
Thank you very much in advance for any help you can provide.
H.
We can create a grouping variable with gl and use that to get the sum in tapply
tapply(input, as.integer(gl(length(input), 5, length(input))),
FUN = sum, na.rm = TRUE)
# 1 2 3 4
# 15 28 23 25
data
input <- scan(text = "1 2 3 4 5 5 6 6 7 4 5 5 4 4 5 6 5 6 4 4", what = numeric())
Here is another base R option using sapply + split
> sapply(split(x,ceiling(seq_along(x)/5)),sum)
1 2 3 4
15 28 23 25
Data
x <- c(1, 2, 3, 4, 5, 5, 6, 6, 7, 4, 5, 5, 4, 4, 5, 6, 5, 6, 4, 4)
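If the series length is an exact multiple of 5 (7825 is), another possible sketch is to fold the vector into a 5-row matrix and take column sums, then the square root for the volatility step described in the question. The names weekly_sum and weekly_vol are made up here:

```r
x <- c(1, 2, 3, 4, 5, 5, 6, 6, 7, 4, 5, 5, 4, 4, 5, 6, 5, 6, 4, 4)

# Assumption: length(x) is a multiple of 5, otherwise matrix() would recycle values
stopifnot(length(x) %% 5 == 0)

# Each column holds one block of five consecutive observations
weekly_sum <- colSums(matrix(x, nrow = 5))

# Square root of the summed squared returns
weekly_vol <- sqrt(weekly_sum)

weekly_sum
weekly_vol
```

The tapply and split versions above also handle a final incomplete block, which this matrix version does not.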
I want to create the following vector with R commands:
(4, 6, 3, 4, 6, 3, ..., 4, 6, 3, 4, 6) where there are 10 occurrences of 4, 10 occurrences of 6, and 9 occurrences of 3.
Try rep and its length.out argument
x <- rep(c(4, 6, 3), length.out = 29)
x
#[1] 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6
Count the occurrences of each element
table(x)
#x
# 3 4 6
# 9 10 10
You could also use rep_len, as suggested by @snoram
rep_len(c(4, 6, 3), 29)
My data frame consists of these columns: A_NUMBER, B_NUMBER, DURATION. I would like to count how many different B_NUMBERs each A_NUMBER calls (to see how big their network is).
I first created a new column with all values set equal to 0.
df$CFU <- rep (0,nrow(df))
Next, I tried the following for loop:
for (j in 1:nrow(df)){ for (i in 1:nrow(unique(df$B_NUMBER))){
if(df$A_NUMBER[i] == df$A_NUMBER[j]) {df$CFU[j] <- sum(df$CFU[j],1) }}}
Then I get the following error:
Error in 1:nrow(unique(df$B_NUMBER)) : argument of length 0
How should I solve this?
As I understand your question, you are looking for a list of the unique B_NUMBERs for each A_NUMBER.
A_NUMBER = round(runif(100,0,10))
B_NUMBER = round(runif(100,0,10))
df = cbind(A_NUMBER, B_NUMBER)
aggregate(B_NUMBER ~ A_NUMBER, data=df, unique)
A_NUMBER B_NUMBER
1 0 10, 8
2 1 9, 3, 1, 7, 8, 0
3 2 7, 0, 6, 1, 9, 2, 10
4 3 7, 3, 6, 8, 4, 5
5 4 7, 9, 3, 10, 4, 8, 1, 2, 5
6 5 6, 5, 2, 8
7 6 4, 8, 9, 6, 10, 3
8 7 7, 3, 6, 0, 4, 1, 9, 8
9 8 7, 9, 8, 5, 2
10 9 8, 6, 2, 9, 0, 4, 1
11 10 7
and then you can call the length of the vectors as
aggregate(B_NUMBER ~ A_NUMBER, data=df, function(x) length(unique(x)))
A_NUMBER B_NUMBER
1 0 2
2 1 6
3 2 7
4 3 6
5 4 9
6 5 4
7 6 6
8 7 8
9 8 5
10 9 7
11 10 1
and check whether it was correct by
subset(df,A_NUMBER == 8)
A_NUMBER B_NUMBER
[1,] 8 7
[2,] 8 9
[3,] 8 7
[4,] 8 8
[5,] 8 5
[6,] 8 7
[7,] 8 2
[8,] 8 2
[9,] 8 8
Looks good, only 7s, 9s, 8s, 5s and 2s!
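Since the random data above change on every run, here is a small deterministic sketch of the same idea. The data frame calls and its values are made up for illustration, and ave is used to write the count back onto every row, as the CFU column in the question seems to intend:

```r
# Made-up call log: caller 1 reaches two distinct numbers, callers 2 and 3 one each
calls <- data.frame(
  A_NUMBER = c(1, 1, 1, 2, 2, 3),
  B_NUMBER = c(7, 7, 8, 9, 9, 5)
)

# One row per caller with the number of distinct B_NUMBERs
net <- aggregate(B_NUMBER ~ A_NUMBER, data = calls,
                 FUN = function(x) length(unique(x)))

# Same count repeated on every row of the original data
calls$CFU <- ave(calls$B_NUMBER, calls$A_NUMBER,
                 FUN = function(x) length(unique(x)))

net
calls
```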
Because you did not provide any example data, it is difficult to examine further what went wrong in your for loop. But based on the error message, it is clear that 1:nrow(unique(df$B_NUMBER)) is not working. The function unique returns a vector here, which has no dimensions. If you pass this vector to nrow, it returns NULL, and 1:NULL then fails with "argument of length 0". What you probably need in this case is length, not nrow.
By the way, df$CFU <- rep(0, nrow(df)) can be simplified to df$CFU <- 0
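The difference between nrow and length on a vector can be seen in a short sketch. The vector b_numbers is a made-up stand-in for df$B_NUMBER:

```r
b_numbers <- c(3, 5, 3, 7)        # hypothetical values standing in for df$B_NUMBER
uniq <- unique(b_numbers)

is.null(nrow(uniq))               # TRUE: a plain vector has no dim attribute
length(uniq)                      # number of distinct values

# 1:NULL is what triggers the error in the question; capture the message instead of stopping
res <- tryCatch(1:nrow(uniq), error = function(e) conditionMessage(e))
res

# A loop over the distinct values can use seq_along(), which is also safe for empty vectors
for (i in seq_along(uniq)) {
  # uniq[i] is the i-th distinct value
}
```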
I want to capture data values from a post on SE into RStudio, and I manage to do so by copying the values, and then pasting them into the following command in the console:
> a = as.numeric(read.table(text = "8 8 4 1 2 2 0 2 5 2 3 3 3 1 5 4 4 1 4 2", sep = " "))
> a
[1] 8 8 4 1 2 2 0 2 5 2 3 3 3 1 5 4 4 1 4 2
Now a is in the global environment. The problem is that I would like to save it into an R file containing a number of other things, let's call it file.R, where vector a would appear as:
a <- c(8, 8, 4, 1, 2, 2, 0, 2, 5, 2, 3, 3, 3, 1, 5, 4, 4, 1, 4, 2)
Unfortunately for me, the only way I know is to type the commas manually. How can I do this otherwise?
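One possible approach, sketched here with base R only, is to let R write the c(...) call itself with dput or deparse. The variable name code_line is made up, and writing to file.R is only indicated in a comment:

```r
# Read the pasted values; scan() returns a numeric vector directly
a <- scan(text = "8 8 4 1 2 2 0 2 5 2 3 3 3 1 5 4 4 1 4 2", quiet = TRUE)

# dput() prints R code that recreates the object, commas included
dput(a)

# Build the full assignment as one string; a large width.cutoff avoids line splitting
code_line <- paste0("a <- ", deparse(a, width.cutoff = 500L))
code_line

# To append it to a script, something like this could be used:
# cat(code_line, "\n", file = "file.R", append = TRUE)
```

The printed result can also simply be copied from the console and pasted into the script.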