I'm currently struggling with shuffling a data frame in RStudio. Let's say my data frame looks as follows:
x y
0 a
0 a
1 a
1 a
0 b
0 b
1 b
1 b
Would it be possible to shuffle the rows while ensuring that the four different sequences of variable y (i.e. aa, ab, bb, ba) occur equally often? In total, I have 24 rows in my original data frame. I hope I have made my problem clear. Thanks a lot in advance for your help!
Ema
It is possible, but there is no built-in function for it, so you will have to code it yourself.
From your data frame, it looks like 0 a and 1 a occur in a 1:1 ratio, and the same goes for b.
In this case I would recommend grouping the letters into pairs (aa, ab, ba, bb) and repeating each pair three times, which gives 12 pairs, i.e. 24 rows.
Now shuffle the pairs; this ensures that every pair occurs with the same frequency. (This only works if the pairs you want to check are rows 1 and 2, 3 and 4, etc. If you instead want to check overlapping pairs, i.e. 1 and 2, 2 and 3, etc., then I misunderstood and you can stop reading.)
Now take only the rows with a and assign them 6 ones and 6 zeroes in shuffled order.
Repeat for the rows with b.
You have your shuffle.
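The steps above can be sketched in base R roughly as follows. This is only an illustration under the answer's assumptions (non-overlapping pairs, 24 rows, 12 a rows and 12 b rows with a 1:1 ratio of 0 and 1); the seed and the object names are arbitrary choices, not part of the original question.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Each of the four pairs three times: 12 pairs, i.e. 24 rows
pairs <- sample(rep(c("aa", "ab", "ba", "bb"), times = 3))

# Split the shuffled pairs into single letters to get the y column
y <- unlist(strsplit(paste(pairs, collapse = ""), ""))

# Assign 6 zeroes and 6 ones, in random order, to the a rows and to the b rows
x <- integer(length(y))
x[y == "a"] <- sample(rep(0:1, each = 6))
x[y == "b"] <- sample(rep(0:1, each = 6))

shuffled <- data.frame(x = x, y = y)

# Check: rows (1,2), (3,4), ... form each pair exactly three times
pair_check <- paste0(y[c(TRUE, FALSE)], y[c(FALSE, TRUE)])
table(pair_check)
```

Because whole pairs are shuffled, the balance holds by construction; only the order of the pairs and the placement of 0 and 1 within each letter are random.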
I have a database with multiple patient visits, like
1 1 1 1 2 2 3 3 3 3 4 4 4 4
They are in a column (although they are shown in a row here), and I would like to know how to count how many subjects I have. In this case: 4.
I don't know which code to use in R.
Thank you.
If I understand correctly, you just want to know how many subjects you have.
In your case you have 4 subjects: 1, 2, 3 and 4.
If the column is stored in a data.frame, for example, you can use:
length(unique(data$subjects))
Or if it's stored in a vector:
length(unique(vector.subjects))
I hope this is what you were looking for.
unique returns the distinct values found in the vector, in this case 1, 2, 3 and 4.
length counts the number of elements in that vector of unique values (here 4).
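As a small worked example, the following sketch rebuilds the visit data from the question and applies the same idea. The data frame name visits and the column name subject are made up for illustration.

```r
# Hypothetical visit data: one row per visit, subject IDs repeat across visits
visits <- data.frame(subject = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4))

# Number of distinct subjects
n_subjects <- length(unique(visits$subject))
n_subjects

# If the number of visits per subject is also of interest
table(visits$subject)
```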
I want to achieve something in R. Here is the explanation.
I have a data set that contains repeated values; please see the data below:
A B
1122513454 0
1122513460 0
1600041729 0
2100002632 147905
2840007103 0
2840064133 138142
3190300079 138040
3190301011 138120
3680024411 0
4000000263 4000000263
4100002263 4100002268
4880004352 138159
4880015611 138159
4900007044 0
7084781116 142967
7124925306 0
7225002523 7225001325
23012600000 0
80880593057 0
98880000045 0
I have two columns (A and B). In column B, the same value (138159) appears twice.
I want a calculation in which a repeated value is counted only once, and in which 0 is ignored. Here, 0 appears 10 times and there are 10 non-zero entries, but 138159 appears twice and should be counted as 1, so there are 9 distinct non-zero values.
So my expected output is 9.
I have already done this in Excel, but I want to achieve the same in R. Is there any way to do it in R with the dplyr package?
I have written the following formula in Excel:
=+SUMPRODUCT((I2:I14<>0)/COUNTIFS(I2:I14,I2:I14))
How can I count only the non-zero values?
Can you help me with that?
Any suggestion is really appreciated.
Edit 1: I have done it the following way:
library(dplyr)
abc <- hardy[hardy$couponid != 0, ]
undertaker <- abc %>%
  group_by(TYC) %>%
  summarise(count_couponid = n_distinct(couponid))
Is there a smarter way to do that?
Thanks
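One possible way to get the expected count of 9 is sketched below in base R, using the B column from the question typed in as a plain vector (an assumption about how the data is stored). With dplyr loaded, n_distinct(B[B != 0]) should give the same number.

```r
# Column B from the question, entered by hand for this sketch
B <- c(0, 0, 0, 147905, 0, 138142, 138040, 138120, 0, 4000000263,
       4100002268, 138159, 138159, 0, 142967, 0, 7225001325, 0, 0, 0)

# Drop the zeros, then count each remaining value once
n_nonzero_distinct <- length(unique(B[B != 0]))
n_nonzero_distinct
```

The subsetting B != 0 plays the role of the (I2:I14<>0) term in the Excel formula, and unique does the de-duplication that COUNTIFS handles there.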
As I read about the rank function, it has a ties.method argument that specifies what happens when ties occur.
For the vector c(2,3,4,4,5,6), as Matt Krause explained:
average assigns each tied element the "average" rank. The ranks would therefore be 1, 2, 3.5, 3.5, 5, 6
first lets the "earlier" entry "win", so the ranks are in numerical order (1,2,3,4,5,6)
min assigns every tied element to the lowest rank, so you get 1,2,3,3,5,6
max does the opposite: tied elements get the highest rank (1,2,4,4,5,6)
random breaks ties randomly, so you'd get either (1,2,3,4,5,6) or (1,2,4,3,5,6).
But I need this output: (1,2,3,3,4,5), where tied elements share a rank and the next rank is not skipped. How can I get that?
I want to use the output to fill in another matrix (X) which has 5 columns. The final output for this instance should be : (1,1,2,1,1), which means that we have 2 of the third-ranked item and one of the rest.
Now, if we have (2,3,4,4,5,6) as instance 1 and (2,3,3,3,4,2) as instance 2, in matrix (X), they will be converted to:
(1,1,2,1,1)
(2,3,1,0,0)
(The number of columns of matrix (X) equals the number of unique values across all instances; since all numbers are between 2 and 6, we have 5 different values in total.)
I don't think rank handles this situation correctly.
There's probably a more efficient/shorter way to compute the unique values of the union of all instances, but otherwise this is pretty much as @whuber suggested in the comments:
Test case:
instances <- list(c(2,3,4,4,5,6),c(2,3,3,3,4,2))
The only tricky part is making sure we have the full range of levels so that zeros get counted properly:
ulevs <- sort(unique(Reduce(union,instances)))
f <- function(x) {
table(factor(x,levels=ulevs))
}
Apply and convert to a matrix:
t(sapply(instances,f))
##      2 3 4 5 6
## [1,] 1 1 2 1 1
## [2,] 2 3 1 0 0
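For the intermediate ranking the question asks for, (1,2,3,3,4,5), a "dense" rank can be sketched in base R with match. The helper name dense_rank_base is made up for this example (dplyr offers a dense_rank function with similar behaviour); passing a shared set of levels is an assumption about how the ranks should line up across instances.

```r
# Dense rank: ties share a rank and no rank is skipped afterwards.
# `levels` defaults to the values of x itself; pass the shared levels
# (e.g. 2:6) to make ranks comparable across instances.
dense_rank_base <- function(x, levels = sort(unique(x))) {
  match(x, levels)
}

x1 <- c(2, 3, 4, 4, 5, 6)
r1 <- dense_rank_base(x1)
r1

# Counting how often each rank occurs reproduces the first row of X
counts1 <- tabulate(dense_rank_base(x1, levels = 2:6), nbins = 5)
counts1
```

Ranking each instance against its own values only lines up with the columns of X when every instance starts at the smallest overall value with no gaps, which is why the table(factor(...)) approach above fixes the levels first.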
I am trying to run a cumsum on a data frame on two separate columns. They are essentially tabulation of events for two different variables. Only one variable can have an event recorded per row in the data frame. The way I attacked the problem was to create a new variable, holding the value ‘1’, and create two new columns to sum the variables totals. This works fine, and I can get the correct total amount of occurrences, but the problem I am having is that in my current ifelse statement, if the event recorded is for variable “A”, then variable “B” is assigned 0. But, for every row, I want to have the previous variable’s value assigned to the current row, so that I don’t end up with gaps where it goes from 1 to 2, to 0, to 3.
I don't want to run summarize on this either, I would prefer to keep each recorded instance and run new columns through mutate.
CURRENT DF:
Event Value Variable Total.A Total.B
1 1 A 1 0
2 1 A 2 0
3 1 B 0 1
4 1 A 3 0
DESIRED RESULT:
Event Value Variable Total.A Total.B
1 1 A 1 0
2 1 A 2 0
3 1 B 2 1
4 1 A 3 1
Thanks!
You can use the fact that logical values sum as ones and zeroes, so a cumulative sum of a comparison counts the matches so far. Therefore, you can use the cumsum function:
DF$Total.A <- cumsum(DF$Variable == "A")
Or, as a more general approach provided by @Frank, you can do:
uv <- unique(as.character(DF$Variable))
DF[, paste0("Total.", uv)] <- lapply(uv, function(x) cumsum(DF$Variable == x))
If you have many levels to your factor, you can get this in one line by dummy coding and then cumsuming the matrix.
X <- model.matrix(~Variable+0, DF)
apply(X, 2, cumsum)
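As a quick check, the sketch below rebuilds the four-row example from the question (without the Total columns) and applies the cumsum idea to each level of Variable. The loop is just one way to write it; the column names follow the question.

```r
# Example data from the question, before the totals are added
DF <- data.frame(Event = 1:4,
                 Value = 1,
                 Variable = c("A", "A", "B", "A"))

# One running total per level: TRUE/FALSE are summed as 1/0,
# so rows for the other variable simply carry the previous total forward
for (v in unique(DF$Variable)) {
  DF[[paste0("Total.", v)]] <- cumsum(DF$Variable == v)
}

DF
```

This keeps every recorded event as its own row, so it can also be written inside a mutate call if a dplyr pipeline is preferred.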
I am using R to analyze a survey. Several of the columns include numbers 1-10, depending on how survey respondents answered the respective questions. I'd like to change the 1-10 scale to a 1-3 scale. Is there a simple way to do this? I was writing a complicated set of for loops and if statements, but I feel like there must be a better way in R.
I'd like to change numbers 1-3 to 1; numbers 4 and 8 to 2; numbers 5-7 to 3, and numbers 9 and 10 to NA.
So in the snippet below, OriginalColumn would become NewColumn.
OriginalColumn=c(4,9,1,10,8,3,2,7,5,6)
NewColumn=c(2,NA,1,NA,2,1,1,3,3,3)
Is there an easy way to do this without a bunch of crazy for loops? Thanks!
You can do this using positional indexing:
> c(1,1,1,2,3,3,3,2,NA,NA)[OriginalColumn]
[1] 2 NA 1 NA 2 1 1 3 3 3
This is more compact than repeated/nested ifelse calls (thus easier to read, write, and understand, and probably faster). In essence, you're creating a new vector that contains the new value for every value you want to replace. So, for values 1:3 you want 1, thus the first three elements of the vector are 1, and so forth. You then use your original vector to extract the new values based on the positions of the original values.
You could also try
library(car)
recode(OriginalColumn, '1:3=1; c(4,8)=2; 5:7=3; else=NA')
#[1] 2 NA 1 NA 2 1 1 3 3 3
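Since the question mentions several survey columns, the positional lookup can be applied to all of them at once. The sketch below assumes a data frame in which every column holds answers from 1 to 10; the names survey, q1 and q2 are invented for illustration.

```r
# Lookup vector: position i holds the new code for original answer i
lookup <- c(1, 1, 1, 2, 3, 3, 3, 2, NA, NA)

# Hypothetical survey with two questions answered on a 1-10 scale
survey <- data.frame(q1 = c(4, 9, 1, 10, 8),
                     q2 = c(3, 2, 7, 5, 6))

# Recode every column; survey[] keeps the data frame structure
recoded <- survey
recoded[] <- lapply(survey, function(col) lookup[col])

recoded
```

To recode only some columns, the same lapply call can be restricted to a subset, for example recoded[c("q1", "q2")].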