For loop that only counts unique values [closed] - r

My data frame consists of these columns: A_NUMBER, B_NUMBER, DURATION. I would like to count how many different B_NUMBERs each A_NUMBER calls (to see how big their network is).
I first created a new column with all values set equal to 0.
df$CFU <- rep (0,nrow(df))
Next, I tried the following for loop:
for (j in 1:nrow(df)){ for (i in 1:nrow(unique(df$B_NUMBER))){
if(df$A_NUMBER[i] == df$A_NUMBER[j]) {df$CFU[j] <- sum(df$CFU[j],1) }}}
Then I get the following error:
Error in 1:nrow(unique(df$B_NUMBER)) : argument of length 0
How should I solve this?

As I understand your question, you are looking for the list of unique B_NUMBERs for each A_NUMBER.
A_NUMBER = round(runif(100,0,10))
B_NUMBER = round(runif(100,0,10))
df = cbind(A_NUMBER, B_NUMBER)
aggregate(B_NUMBER ~ A_NUMBER, data=df, unique)
A_NUMBER B_NUMBER
1 0 10, 8
2 1 9, 3, 1, 7, 8, 0
3 2 7, 0, 6, 1, 9, 2, 10
4 3 7, 3, 6, 8, 4, 5
5 4 7, 9, 3, 10, 4, 8, 1, 2, 5
6 5 6, 5, 2, 8
7 6 4, 8, 9, 6, 10, 3
8 7 7, 3, 6, 0, 4, 1, 9, 8
9 8 7, 9, 8, 5, 2
10 9 8, 6, 2, 9, 0, 4, 1
11 10 7
You can then get the length of each of those vectors with
aggregate(B_NUMBER ~ A_NUMBER, data=df, function(x) length(unique(x)))
A_NUMBER B_NUMBER
1 0 2
2 1 6
3 2 7
4 3 6
5 4 9
6 5 4
7 6 6
8 7 8
9 8 5
10 9 7
11 10 1
and check whether it was correct by
subset(df,A_NUMBER == 8)
A_NUMBER B_NUMBER
[1,] 8 7
[2,] 8 9
[3,] 8 7
[4,] 8 8
[5,] 8 5
[6,] 8 7
[7,] 8 2
[8,] 8 2
[9,] 8 8
Looks good, only 7s, 9s, 8s, 5s and 2s!
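If the goal is the per-row CFU column from the question rather than a summary table, one possible sketch uses base R's ave to broadcast the per-group count back to every row. The calls data frame and its values below are made up for illustration; only the column names come from the question.

```r
# Small hand-made call log (hypothetical values, for illustration only)
calls <- data.frame(
  A_NUMBER = c(1, 1, 1, 2, 2, 3),
  B_NUMBER = c(7, 7, 8, 9, 9, 9),
  DURATION = c(10, 20, 5, 30, 15, 60)
)

# For every row, count the distinct B_NUMBERs called by that row's A_NUMBER.
# ave() applies FUN within each A_NUMBER group and recycles the result
# so that the output has one value per original row.
calls$CFU <- ave(calls$B_NUMBER, calls$A_NUMBER,
                 FUN = function(x) length(unique(x)))

calls$CFU
```

Here caller 1 reaches two different numbers (7 and 8), while callers 2 and 3 each reach one, so no explicit loop is needed.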

Because you did not provide example data, it is difficult to examine further what went wrong in your for loop. But the error message makes it clear that 1:nrow(unique(df$B_NUMBER)) is the part that fails. Here unique returns a vector, which has no dimensions, so nrow returns NULL for it, and 1:NULL raises the "argument of length 0" error. You probably need length, not nrow, in this case.
By the way, df$CFU <- rep(0, nrow(df)) can be simplified to df$CFU <- 0
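A minimal sketch of the nrow versus length difference, using a made-up vector b as a stand-in for df$B_NUMBER:

```r
b <- c(5, 3, 5, 9)   # stand-in for df$B_NUMBER (made-up values)
u <- unique(b)       # a plain vector with the distinct values

is.null(nrow(u))     # TRUE: a vector has no dim attribute, so nrow() gives NULL
length(u)            # number of distinct values

# 1:nrow(u) would stop with "argument of length 0".
# seq_along(u) is a safe loop index and also copes with an empty vector.
n_seen <- 0
for (i in seq_along(u)) {
  n_seen <- n_seen + 1
}
n_seen
```

Using seq_along (or seq_len(length(u))) instead of 1:length(u) avoids the 1:0 trap when the vector happens to be empty.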

Related

Using rep/seq to create ID column

Is there an efficient way to create an ID column using rep/seq or some other function I'm not thinking of to make a sequence such as the following:
1, 2, 3, 4, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 10.....
So every 3 numbers the following 3 numbers get repeated an additional time. My actual data will require a sequence that is:
1:1000- 1 each
1001:2000- 2 each
2001:3000 - 3 each
....
Any ideas/help would be greatly appreciated.
We can use
v2 <- 1:7000
rep(v2, as.integer(gl(length(v2), 1000, length(v2))))
For the first case
v1 <- 1:15
rep(v1, as.integer(gl(length(v1), 3, length(v1))))
[1] 1 2 3 4 4 5 5 6 6 7 7 7 8 8 8 9 9 9 10 10 10 10 11 11 11 11 12 12 12 12 13 13 13 13 13 14 14 14 14 14 15 15 15 15 15
We can use rep inside rep.int.
First case:
rep.int(1:12, rep(1:4, each = 3))
Second case:
rep.int(1:3e3, rep(1:3, each = 1e3))
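As a quick check, the two approaches can be compared on the same 15 values. The third variant, which computes the repeat counts with integer division, is not from either answer; it is an assumption added here as one more way to express the same idea.

```r
v1 <- 1:15

# gl() approach: factor codes 1,1,1,2,2,2,... used as repeat counts
a1 <- rep(v1, as.integer(gl(length(v1), 3, length(v1))))

# nested rep approach, extended to the same 15 values
a2 <- rep.int(1:15, rep(1:5, each = 3))

# integer-division variant (not from the answers above):
# positions 1-3 repeat once, 4-6 twice, and so on
a3 <- rep(v1, (seq_along(v1) - 1) %/% 3 + 1)

identical(a1, a2)
identical(a1, a3)
table(a1)   # how often each value occurs
```

For the real data, replacing the block size 3 with 1000 gives the 1:1000, 1001:2000, ... pattern from the question.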

How do you efficiently return the order of an increasing index? [duplicate]

I have the following index vector:
TestVec = rep(c(6,8,9,11,18), each = 10)
This reads c(6, 6, ..., 6, 8, 8, ..., 8, 9, 9, ..., 9, ...).
I would like to convert this vector into c(1, 1, ..., 1, 2, 2, ..., 2, 3, 3, ..., 3, ...)
I have improvised a quick-and-dirty method, as follows:
sapply(TestVec, function(x) {which(x == unique(TestVec))})
This works, but it takes a long time on a large dataset.
Is there a more efficient way to do this?
Try
match(TestVec, unique(TestVec))
Another option:
as.numeric(as.factor(TestVec))
# [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5
Requiring data.table:
rleid(TestVec)
Here is another one:
c(0, cumsum(diff(TestVec) != 0)) + 1
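The options agree on TestVec, but they are not interchangeable in general. The sketch below compares them in base R only; rle is used as a stand-in for the run-based data.table function, and the run-based formula is written as c(0, cumsum(...)) + 1 so that the first element gets index 1. The vector x is a made-up example with unsorted, repeated values.

```r
TestVec <- rep(c(6, 8, 9, 11, 18), each = 10)

by_match  <- match(TestVec, unique(TestVec))       # order of first appearance
by_factor <- as.numeric(as.factor(TestVec))        # order of sorted values
by_diff   <- c(0, cumsum(diff(TestVec) != 0)) + 1  # new id at every change

# base-R run ids via rle(): one id per run of equal values
r <- rle(TestVec)
by_rle <- rep(seq_along(r$lengths), r$lengths)

all(by_match == by_factor, by_match == by_diff, by_match == by_rle)

# On unsorted input with repeated runs the results differ:
x <- c(8, 8, 6, 6, 8)
match(x, unique(x))              # 1 1 2 2 1
as.numeric(as.factor(x))         # 2 2 1 1 2
c(0, cumsum(diff(x) != 0)) + 1   # 1 1 2 2 3
```

Which one is right depends on whether the index should follow first appearance, sorted order, or consecutive runs.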

Create special vectors with R commands [duplicate]

I want to create the following vector with R commands:
(4, 6, 3, 4, 6, 3, ..., 4, 6, 3, 4, 6) where there are 10 occurrences of 4, 10 occurrences of 6, and 9 occurrences of 3.
Try rep and its length.out argument
x <- rep(c(4, 6, 3), length.out = 29)
x
#[1] 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6 3 4 6
Count the occurrences of each element
table(x)
#x
# 3 4 6
# 9 10 10
You could also use rep_len as suggested by @snoram
rep_len(c(4, 6, 3), 29)

Moving a Vector (or other data object) from the RStudio Environment to an .R file

I want to capture data values from a post on SE into RStudio, and I manage to do so by copying the values, and then pasting them into the following command in the console:
> a = as.numeric(read.table(text = "8 8 4 1 2 2 0 2 5 2 3 3 3 1 5 4 4 1 4 2", sep = " "))
> a
[1] 8 8 4 1 2 2 0 2 5 2 3 3 3 1 5 4 4 1 4 2
Now a is in the global environment. The problem is that I would like to save it into an R file containing a number of other things, let's call it file.R, where vector a would appear as:
a <- c(8, 8, 4, 1, 2, 2, 0, 2, 5, 2, 3, 3, 3, 1, 5, 4, 4, 1, 4, 2)
Unfortunately for me, the only way I know is to type the commas manually. How can I do this otherwise?
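One possible approach, sketched under assumptions: build the assignment as a line of text and append it to the script. The temporary file below stands in for file.R, and scan is used instead of read.table only to keep the example short. For interactive use, dput(a) prints the c(...) form to the console, ready to be copied.

```r
# Values pasted from the post; scan() splits on whitespace
# and returns a numeric vector
a <- scan(text = "8 8 4 1 2 2 0 2 5 2 3 3 3 1 5 4 4 1 4 2", quiet = TRUE)

# Build the assignment as one line of R source text
line <- paste0("a <- c(", paste(a, collapse = ", "), ")")
line

# Append it to a script; a temporary file stands in for file.R here
target <- tempfile(fileext = ".R")
cat(line, "\n", file = target, append = TRUE, sep = "")

# Read the script back to confirm what was written
readLines(target)
```

With append = TRUE the existing contents of the script are kept and the new line is added at the end.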

Remove two outliers in multiple regression

We have a problem removing two outliers from our dataset. The data come from an experiment with two independent variables and one dependent variable. We ran a multiple regression and examined the "Normal Q-Q" plot, which showed two outliers (cases 10 and 46). We would now like to remove those two cases and rerun the multiple regression without them.
We have already tried various commands recommended on several R forums, but unfortunately nothing worked.
We would be glad if any of you had an idea that would help us solve our problem.
Thank you very much for helping.
Since no data was provided, I fabricated some:
> x <- data.frame(a = c(10, 12, 14, 6, 10, 8, 11, 9), b = c(1, 2, 3, 24, 4, 1, 2, 4),
c = c(2, 1, 3, 6, 3, 4, 2, 48))
> x
a b c
1 10 1 2
2 12 2 1
3 14 3 3
4 6 24 6
5 10 4 3
6 8 1 4
7 11 2 2
8 9 4 48
If the 4th case (an outlier in column b) and the 8th case (an outlier in column c) are to be removed, drop those two rows:
> x1 <- x[-c(4, 8), ]
> x1
a b c
1 10 1 2
2 12 2 1
3 14 3 3
5 10 4 3
6 8 1 4
7 11 2 2
Is this what you need?
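Applied to the regression in the question, the same row removal could look like the sketch below. The data are simulated, and the names dat, x1, x2 and y are hypothetical; only the case labels 10 and 46 come from the question. Because plot() on an lm fit labels points by row name, the rows are dropped by row name rather than by position.

```r
set.seed(1)  # simulated data, only to make the sketch runnable
dat <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(50)

# Original fit on all cases
fit_all <- lm(y ~ x1 + x2, data = dat)

# Drop the two flagged cases by row name, then refit
outliers  <- c("10", "46")
dat_clean <- dat[!(rownames(dat) %in% outliers), ]
fit_clean <- lm(y ~ x1 + x2, data = dat_clean)

nobs(fit_all)    # number of cases in the original fit
nobs(fit_clean)  # two fewer after removing the flagged cases
```

If the data frame still has its default row names, dat[-c(10, 46), ] gives the same result.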
