I am trying to generate all combinations of the rows of the following data frame, taken n times.
test <- expand.grid(rep(list(0:1),3))
For example, now the test is a data frame of 3 columns and 8 rows as follows:
Var1 Var2 Var3
1 0 0 0
2 1 0 0
3 0 1 0
4 1 1 0
5 0 0 1
6 1 0 1
7 0 1 1
8 1 1 1
For example, with n=2 the combinations would give a data frame of 6 columns and 64 rows. It would also be acceptable for the result to be a list of 64 elements, where each element holds one combination of the two data frames.
I feel that I can still use expand.grid(), but I did not manage to use it correctly.
I figured it out as soon as I posted the question.
I can just generalize the code that created test:
expand.grid(rep(list(0:1),3*n))
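For an arbitrary n, a minimal sketch of the same idea (assuming the goal is every combination of test's rows taken n times, which for this test is equivalent to 0/1 over 3*n columns):

```r
n <- 2
test <- expand.grid(rep(list(0:1), 3))

# Equivalent to expand.grid(rep(list(0:1), 3 * n)), but works for any test:
# enumerate every combination of row indices, then bind the chosen rows side by side
idx <- expand.grid(rep(list(seq_len(nrow(test))), n))
res <- do.call(cbind, lapply(seq_len(n), function(i) test[idx[[i]], ]))
dim(res)  # 64 rows, 6 columns
```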
I was wondering if there's a fast way to get an incidence matrix for such a problem. I've got two data frames with three columns (the join keys):
df1 <- data.frame(K1=c(1,1,0,1,3,2,2),K2=c(1,2,1,0,2,0,1),K3=c(0,0,3,2,1,3,0))
df2 <- data.frame(K1=c(1,2,0,3),K2=c(0,1,2,0),K3=c(2,0,3,1))
and I need to obtain the corresponding incidence matrix
# IM:
# 1 2 3 4
# 1 1 1 0 0
# 2 1 0 1 0
# 3 0 1 1 0
# 4 1 0 0 0
# 5 0 0 1 1
# 6 0 1 1 0
# 7 0 1 0 0
where a cell is set to 1 if there's a match between the corresponding key (column value) of the rows of the two data frames.
I would do it with nested loops:
m <- matrix(0, nrow(df1), nrow(df2))
for (j in seq_len(nrow(df2)))
  for (k in seq_len(ncol(df2))) {
    if (df2[j, k])
      m[which(df1[, k] == df2[j, k]), j] <- 1
  }
but that's a C-style approach, and maybe there's something faster in R. Do you have any other ideas? Besides, when the data frames are quite big (around 50k and 20k rows), I cannot even allocate the matrix, as it is too large.
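One vectorized sketch of my own (not a drop-in from any package): compare each row of df2 against all of df1 at once and mask out the zero keys. This replaces the inner loops with whole-column comparisons; note that for 50k x 20k rows the dense 0/1 matrix itself is the memory bottleneck, and a sparse representation (e.g. Matrix::sparseMatrix) would be needed.

```r
df1 <- data.frame(K1 = c(1,1,0,1,3,2,2), K2 = c(1,2,1,0,2,0,1), K3 = c(0,0,3,2,1,3,0))
df2 <- data.frame(K1 = c(1,2,0,3), K2 = c(0,1,2,0), K3 = c(2,0,3,1))

incidence <- function(d1, d2) {
  m1 <- as.matrix(d1)
  sapply(seq_len(nrow(d2)), function(j) {
    key <- unlist(d2[j, ])
    hit <- sweep(m1, 2, key, `==`)  # element-wise matches against row j's keys
    hit[, key == 0] <- FALSE        # a zero key never counts as a match
    as.integer(rowSums(hit) > 0)    # 1 if any key column matched
  })
}
incidence(df1, df2)  # reproduces the IM shown above
```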
I want to style the output of table(). Suppose I have the following:
dat$a <- c(1,2,3,4,4,3,4,2,2,2)
dat$b <- c(1,2,3,4,1,2,4,3,2,2)
table(dat$a,dat$b)
1 2 3 4
1 50 0 0 0
2 0 150 50 0
3 0 50 50 0
4 50 0 0 100
There are two problems with this. First, it doesn't give me the correct frequencies. Second, it has no row or column labels. I found this, and that table works for both frequency counts and axis labels. Is the issue that my approach subsets from a data frame? I would appreciate any tips on both fixing the frequency counts and styling the table.
The only problem is the way you are passing arguments to table(). To get the desired output (with labels), use the data frame as the argument, not two vectors (the columns). If you have a larger data frame, use only the subset that you want.
a <- c(1,2,3,4,4,3,4,2,2,2)
b <- c(1,2,3,4,1,2,4,3,2,2)
dat <- data.frame(a,b)
table(dat)
Gives me the output:
b
a 1 2 3 4
1 1 0 0 0
2 0 3 1 0
3 0 1 1 0
4 1 0 0 2
It shouldn't give the wrong frequencies, even with your approach. You could try restarting your R session to check this.
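For a larger data frame, a minimal sketch of the subsetting (the 'extra' column is made up to stand in for the rest of a bigger data frame):

```r
a <- c(1,2,3,4,4,3,4,2,2,2)
b <- c(1,2,3,4,1,2,4,3,2,2)
dat <- data.frame(a, b, extra = letters[1:10])  # 'extra' is a hypothetical third column

table(dat[, c("a", "b")])  # same labelled table, ignoring 'extra'
table(a = dat$a, b = dat$b)  # alternatively, label the two-vector form explicitly
```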
I need to tabulate my data into a 2x2 table; however, because some cells have no values, the table command in R omits a column or row depending on the data. For example:
a<-matrix(c(0,1,1,1,1,1,1,1),4,2)
table(a[,1],a[,2])
This is what it gives:
1
0 1
1 3
However, I need it to be like
0 1
0 0 1
1 0 3
Any suggestion?
The problem is that your matrix a contains plain numbers, and with numbers R has no way of knowing which values should appear as rows and columns. The solution, though, is easy: transform your data into factors, providing all potential values as levels:
table(factor(a[,1], levels = unique(c(a))),factor(a[,2], levels = unique(c(a))))
# 0 1
# 0 0 1
# 1 0 3
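If you know the full set of possible values in advance, it may be safer to spell the levels out rather than derive them with unique(), since a value absent from both columns would otherwise still be dropped. A sketch for the 0/1 case:

```r
a <- matrix(c(0,1,1,1,1,1,1,1), 4, 2)
f1 <- factor(a[, 1], levels = 0:1)  # fixed levels, independent of the data
f2 <- factor(a[, 2], levels = 0:1)
table(f1, f2)  # both the 0 row and the 0 column now appear, even when empty
```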
I am trying to match the values in a column of one data frame with the values in a column of a second data frame. The tricky part is that I would like to do the matching using subsets of the second data frame (designated by a distinct column in the second data frame from the one that is being matched). This is different from the commonly posted problem of trying to subset based on matching between data frames.
My problem is the opposite - I want to match data frames based on subsets. To be specific, I would like to match subsets of the column in the second data frame with the entire column of the first data frame, and then create new columns in the first data frame that show whether or not a match has been made for each subset.
These subsets can have varying number of rows. Using the two dummy data frames below...
DF1 <- data.frame(number=1:10)
DF2 <- data.frame(category = rep(c("A","B","C"), c(5,7,3)),
number = sample(10, size=15, replace=T))
...the objective would be to create three new columns (DF1$A, DF1$B, and DF1$C) that show whether the values in DF1$number match the values in DF2$number within each respective subset of DF2$category. Ideally the rows in these new columns would show a '1' if a match has been made and a '0' if not. With the dummy data above I would end up with DF1 having 4 columns (DF1$number, DF1$A, DF1$B, and DF1$C) of 10 rows each.
Note that in my actual second data frame I have a huge number of categories, so I don't want to have to type them out individually for whatever operation is needed to accomplish this objective. I hope that makes sense! Sorry if I'm missing something obvious and thanks very much for any help you might be able to provide.
This should work:
sapply(split(DF2$number, DF2$category), function(x) DF1$number %in% x + 0)
A B C
[1,] 0 0 1
[2,] 1 1 0
[3,] 1 1 1
[4,] 0 1 0
[5,] 0 0 1
[6,] 0 1 0
[7,] 1 1 0
[8,] 1 0 0
[9,] 1 0 0
[10,] 0 1 0
You can add this back to DF1 like:
data.frame(
DF1,
sapply(split(DF2$number, DF2$category), function(x) DF1$number %in% x + 0)
)
number A B C
1 1 0 0 1
2 2 1 1 0
3 3 1 1 1
4 4 0 1 0
5 5 0 0 1
6 6 0 1 0
7 7 1 1 0
8 8 1 0 0
9 9 1 0 0
10 10 0 1 0
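Two small notes on the one-liner: the output above depends on the random draw in sample(), so a seed is needed to reproduce any particular result; and DF1$number %in% x + 0 works only because %in% binds tighter than +, so it parses as (DF1$number %in% x) + 0. A reproducible sketch with the parentheses made explicit (the seed value is arbitrary):

```r
set.seed(42)  # arbitrary seed, just for reproducibility
DF1 <- data.frame(number = 1:10)
DF2 <- data.frame(category = rep(c("A", "B", "C"), c(5, 7, 3)),
                  number = sample(10, size = 15, replace = TRUE))

flags <- sapply(split(DF2$number, DF2$category),
                function(x) (DF1$number %in% x) + 0)  # explicit parentheses
DF1 <- data.frame(DF1, flags)
```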
I have a table of several columns with values from 1 to 8. The columns have different lengths, so I have padded them with NAs at the end. I would like to transform each column of the data so that I get something like this for each column:
1 2 3 4 5 6 7 8
0-25 1 0 0 0 0 1 0 2
25-50 5 1 2 0 0 0 0 1
50-75 12 2 2 3 0 1 1 1
75-100 3 25 1 1 1 0 0 0
where the row names are percentages of the actual length of the original column (i.e. without the NAs), the column names are the original 1 to 8 values, and the cell values are the number of occurrences of the original values in each percentage band. Any ideas will be appreciated.
Best,
Lince
PS/ I realize that my original message was very confusing. The data I want to transform contain a number of columns from time series like this:
c(1, 1, 8, 1, 3, 4, 1, 5, 1, 6, 2, 7, 1, NA, NA)
and I need to calculate the frequency of occurrences of each value (1 to 8) in the 0-25%, 25-50%, et cetera portions of the series. Joris' answer is very useful; I can work from it. Thanks!
Given the lack of some information, I can offer you this:
Say 0 is no occurrence and 1 is occurrence. Then you can use the following little script for the results of one column. Wrap it in a function, apply it over the columns, and you get what you need.
x <- c(1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, NA, NA, NA, NA, NA, NA)
prop <- which(x == 1) / sum(!is.na(x)) * 100  # positions of the 1s as % of non-NA length
result <- cut(prop, breaks = c(0, 25, 50, 75, 100))
table(result)
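To get from there to the 1-to-8 frequency table in the question, a sketch (quart_freq is a made-up helper name; it assumes a value's position within the NA-free series determines its percentage band):

```r
quart_freq <- function(x) {
  x <- x[!is.na(x)]                      # drop the NA padding
  pct <- seq_along(x) / length(x) * 100  # position as % of the series length
  bin <- cut(pct, breaks = c(0, 25, 50, 75, 100))
  table(bin, factor(x, levels = 1:8))    # fixed levels keep empty value columns
}

x <- c(1, 1, 8, 1, 3, 4, 1, 5, 1, 6, 2, 7, 1, NA, NA)
quart_freq(x)
# For a whole data frame df, apply it column-wise: lapply(df, quart_freq)
```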