I looked everywhere but did not find an answer to my question. I am having trouble making a contingency table. I have data with many columns, let's say 1, 2 and 3. The first column has, say, 100 different values, the second has 20, and the third has 2 possible values: 0 and 1. First I keep only the rows with value 1 in column 3 (data<-data[Column3==1,]). Now I have only around 20 different values in column 1 and 5 in column 2. However, when I build a contingency table its size is 100x20, not 20x5, and it contains a lot of zeros (they correspond to combinations of column 1 and column 2 that have value 0 in column 3). I would be grateful for any kind of help, thanks.
I guess all three of your variables are factors, and a factor keeps its unused levels after subsetting. So convert them to character using
as.character()
on all three variables, then apply
table()
to the result.
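A minimal sketch of the behaviour described above, using a small made-up data frame (the values are illustrative only, not the asker's data). It also shows droplevels(), a base R alternative to as.character() that was not part of the original answer.

```r
# Toy data: Column1 and Column2 are factors with 3 levels each.
data <- data.frame(
  Column1 = factor(c("a", "b", "c", "a", "b")),
  Column2 = factor(c("x", "z", "x", "y", "z")),
  Column3 = c(1, 0, 1, 1, 0)
)

# Keep only the rows where Column3 is 1.
sub <- data[data$Column3 == 1, ]

# Factors remember all original levels, so the table is still 3 x 3.
tab_full <- table(sub$Column1, sub$Column2)

# Converting to character discards the unused levels: 2 x 2.
tab_chr <- table(as.character(sub$Column1), as.character(sub$Column2))

# Alternative: drop unused levels but keep the columns as factors: 2 x 2.
tab_drop <- table(droplevels(sub)[, c("Column1", "Column2")])

dim(tab_full)
dim(tab_chr)
dim(tab_drop)
```

droplevels() may be preferable if the columns need to stay factors for later modelling steps.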
I have a messy dataset with multiple entries in some cells. The numbers in parentheses refer to the specific columns "(1)", "(2)", and "(3)". In this example,
a cell has multiple entries: 30 refers to column (2) and 20 refers to column (1). There is no information for column (3).
I would like to split up/extract the values in the cells and create 3 additional columns.
Several hundred cells are affected in several columns.
Dataset
In the end I would like to have 3 new columns for each column affected. Any idea how I do that? I'm still a rookie so help is much appreciated!
I'm having some trouble randomly sampling 1 column out of a group. I have over 300 columns, and over 500 rows. I am attempting to sample 1 column out of the first 15, and then move on to sample 1 column from the next 15, etc... until there are no more.
For the basic first sample, I used:
sample(DATA[,1:15],1)
But it only outputs a single number. If I change my size to 535 (the number of rows), it grabs 535 random numbers in total from columns 1:15.
I referenced the link below, which has a somewhat similar basis. I tried its accepted answer but can't get it to work:
R: random sample of columns excluding one column
Any suggestions?
Applied to 1:15, sample() returns an integer. Use that integer as a column index into the dataframe, rather than sampling from the data itself as you did earlier.
DATA[,sample(1:15,1)]
This randomly selects one column from columns 1 to 15 and returns it, as you wanted.
Found my answer pretty quickly:
DATA[,sample(1:15,1)]
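The question also asks to repeat this for every consecutive group of 15 columns. A minimal sketch of one way to do that, assuming DATA is a 535 x 300 numeric matrix (simulated here) and using a hypothetical block_size variable:

```r
set.seed(1)  # only to make the illustration reproducible

# Simulated stand-in for the real data: 535 rows, 300 columns.
DATA <- matrix(rnorm(535 * 300), nrow = 535, ncol = 300)

block_size <- 15
starts <- seq(1, ncol(DATA), by = block_size)  # 1, 16, 31, ...

# For each block, pick one column index at random.
picked <- vapply(starts, function(s) {
  block <- s:min(s + block_size - 1, ncol(DATA))
  # sample.int avoids the sample(x) surprise when a block has length 1.
  block[sample.int(length(block), 1)]
}, integer(1))

# One sampled column per block; drop = FALSE keeps the matrix shape.
sampled <- DATA[, picked, drop = FALSE]
dim(sampled)
```

If the number of columns is not a multiple of 15, the last block is simply shorter.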
I'm relatively new in R so excuse me if I'm not even posting this question the right way.
I have a matrix generated from combination function.
double_expression_combinations <- combn(marker_column_vector,2)
This matrix has x columns and 2 rows. Each column holds 2 numbers that are column numbers in my main data frame, named initial; each pair is a combination of columns to be tested. The initial data frame has 27 columns (and thousands of rows) with values of 1 and 0. The test uses the 2 numbers given by double_expression_combinations as column numbers to select from initial, adds those 2 columns row by row, and counts how many times the sum is equal to 2.
I believe I'm able to come up with the counting part, I just don't know how to use the data from the double_expression_combinations data frame to select columns to test from the "initial" data frame.
Edited to fix corrections made by commenters
In R it's important to keep your terminology precise. double_expression_combinations is not a dataframe but rather a matrix. It's easy to loop over columns in a matrix with apply. I'm a bit unclear about the exact test, but this might work:
apply( double_expression_combinations, 2, # the 2 selects each column in turn
function(cols){ sum( initial[ , cols[1] ] + initial[ , cols[2] ] == 2) } )
Both the '+' and '==' operators are vectorised so no additional loop is needed inside the call to sum.
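A small self-contained run of the same approach, assuming a toy 3-column version of initial (the values and the column names a, b, c are made up for illustration):

```r
# Toy stand-in for the 27-column 0/1 data frame.
initial <- data.frame(
  a = c(1, 0, 1, 1),
  b = c(1, 1, 0, 1),
  c = c(0, 0, 1, 1)
)

marker_column_vector <- 1:3
# 2 x 3 matrix; its columns are the pairs (1,2), (1,3), (2,3).
double_expression_combinations <- combn(marker_column_vector, 2)

# For each pair of columns, count the rows where both values are 1.
counts <- apply(double_expression_combinations, 2, function(cols) {
  sum(initial[, cols[1]] + initial[, cols[2]] == 2)
})

counts
```

Here counts holds one value per column of double_expression_combinations, in the same order.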
I have a datatable where one of the columns should be expressed in dollars and some as percentages. I've been looking around and I'm still not sure how to do it - seems like it would be easy?
The trickier part is I have another data table where only certain entries need to be expressed as dollars (i.e. not whole rows or whole columns) - is there a way to handle this?
Imagine your datatable (myData) is 2 columns by 10 rows.
You want the second column to be in dollars:
myData[,2]<-sapply(myData[,2],function(x) paste0("$",x))
Or, you want rows 6 to 10 in the first column to be percentages:
myData[6:10,1]<-sapply(myData[6:10,1],function(x) paste0(x,"%"))
Or, you want rows 1 to 5 in the second column to be in dollars, you can do:
myData[1:5,2]<-sapply(myData[1:5,2],function(x) paste0("$",x))
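A minimal sketch of the same idea without sapply, since paste0 is vectorised. The data frame contents and the column names rate and price are made up for illustration. Note that a formatted column becomes character, so this sketch formats a copy and keeps the numeric original.

```r
# Toy 10 x 2 data frame standing in for myData.
myData <- data.frame(
  rate  = c(1.5, 2, 3.25, 4, 5, 6, 7.5, 8, 9, 10),
  price = seq(10, 100, by = 10)
)

formatted <- myData  # work on a copy; myData stays numeric

# Whole column as dollars.
formatted$price <- paste0("$", formatted$price)

# Only rows 6 to 10 of the first column as percentages.
# Convert the column to character first so the types stay consistent.
formatted$rate <- as.character(formatted$rate)
formatted$rate[6:10] <- paste0(formatted$rate[6:10], "%")

formatted
```

Keeping the numeric original around may help if the values are still needed for calculations after display formatting.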
Say I have a data.frame of arbitrary dimensions (n by p). I want to extract a vector of length n from that data.frame, one element in the vector per row in the data.frame. However, the column in which each element lies may vary by row. Is there a way to do this without loops?
For example, if I have the following (3x3) data frame, called say DATA
X Y Z
1 17 43
3 4 2
6 9 0
I want to extract one scalar value from DATA per row. I have a vector, call it column.list, c(1,3,1) (arbitrarily selected in this case) which gives the column index for the elements I want, where the kth element of column.list is the column index for row k in DATA. How do I do this without loops? I want to avoid loops because I am using this repeatedly in a simulation study that will take a lot of running time even without loops, and the row number might be 100,000 or so. Much appreciated!
You can do this by indexing your data.frame with a matrix. The first column indicates row, the second indicates column. So if you do
column.list <- c(1,3,1)
DATA[cbind(1:nrow(DATA), column.list)]
You will get
[1] 1 2 6
as desired. If the selected columns have different classes, all values will be coerced to the most accommodating data type (for example, character if any column is character).
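For completeness, a self-contained version of the example above, with DATA built explicitly from the values in the question:

```r
# The 3 x 3 data frame from the question.
DATA <- data.frame(
  X = c(1, 3, 6),
  Y = c(17, 4, 9),
  Z = c(43, 2, 0)
)

# Column to read in each row: row 1 -> col 1, row 2 -> col 3, row 3 -> col 1.
column.list <- c(1, 3, 1)

# A two-column index matrix: (row, column) pairs, one per row of DATA.
idx <- cbind(seq_len(nrow(DATA)), column.list)

picked <- DATA[idx]
picked
```

seq_len(nrow(DATA)) is used instead of 1:nrow(DATA) so that the code also behaves sensibly on a data frame with zero rows.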