How to convert row names of table into a vector - r

I have returned stats on my data using the table command as such:
subject<-c(4,4,2,2,3,3)
correct<-c(0,1,1,1,0,0)
test<-data.frame(subject,correct)
freq_test<-head(table(test$subject,test$correct))
This returns a table which looks like this
    0 1
  2 0 2
  3 2 0
  4 1 1
That's great, but the problem is that I would like the first column to be a regular vector rather than row.names (so that I can code it properly as "subject").
Is there a way to get this column to act in this way?

Just make a new data frame with the row names of freq_test as the first column:
> df<-data.frame(as.numeric(rownames(freq_test)),freq_test)
> colnames(df)[1]="subject"
> df
  subject X0 X1
2       2  0  2
3       3  2  0
4       4  1  1
Of course, you can rename X0 and X1 to whatever you want by editing colnames(df) as above.
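A variant of the same idea that does not depend on how head() treats a table object: the sketch below strips the table class with unclass(), so the counts are a plain matrix, and then names every column explicitly. The column names not_correct and is_correct are made up for illustration.

```r
subject <- c(4, 4, 2, 2, 3, 3)
correct <- c(0, 1, 1, 1, 0, 0)

# unclass() drops the "table" class, leaving an integer matrix of counts
counts <- unclass(table(subject, correct))

# row.names = NULL keeps the matrix row names from being reused as row names
df <- data.frame(subject     = as.numeric(rownames(counts)),
                 not_correct = as.vector(counts[, "0"]),
                 is_correct  = as.vector(counts[, "1"]),
                 row.names   = NULL)
df
```

Here subject is an ordinary numeric column, so it can be used in merges or models like any other variable.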

If you want the data in "long" format (useful for some models and plotting, and especially when your tables are more complicated), the table method for the generic function as.data.frame will take care of this for you:
> as.data.frame(table(test))
  subject correct Freq
1       2       0    0
2       3       0    2
3       4       0    1
4       2       1    2
5       3       1    0
6       4       1    1
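One thing to watch with the long format: as.data.frame() on a table returns the classifying columns as factors by default. A minimal sketch of turning subject back into numbers, and of naming the count column through the responseName argument (the name n is just an example):

```r
test <- data.frame(subject = c(4, 4, 2, 2, 3, 3),
                   correct = c(0, 1, 1, 1, 0, 0))

# responseName sets the name of the count column (default "Freq")
long <- as.data.frame(table(test), responseName = "n")

# subject comes back as a factor; go through character to recover the numbers
was_factor <- is.factor(long$subject)
long$subject <- as.numeric(as.character(long$subject))
long
```

Converting with as.numeric() directly on the factor would return the internal level codes rather than the original subject ids, which is why the intermediate as.character() is there.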

I think you should have used the standard method of construction of a data.frame, which is with name=values pairs:
test <- data.frame( subject=subject, correct=correct)
The first subject is interpreted as the column name (as if it were quoted) and the second subject is evaluated .... i.e., the enclosing environments are searched for an object named subject and its value is assigned to the "subject" column of "test".

Related

Use if-else function on data frame with multiple values

I have a data frame that contains multiple values in each spot, like this:
ID<-c(1,1,1,2,2,2,2,3,3,4,4,4,5,6,6)
W<-c(29,72,32,33,34,44,42,78,32,42,18,26,10,34,39)
df1<-data.frame(ID, W)
library(plyr)
df<-ddply(df1, .(ID), summarize,
X=paste(unique(W),collapse=","))
ID X
1 1 29,72,32
2 2 33,34,44,42
3 3 78,32
4 4 42,18,26
5 5 10
6 6 34,39
I am trying to generate another column using an if-else function so that every ID that has an X value greater than 70 will show a 1, and all others will show a 0, like this:
ID X Y
1 1 29,72,32 1
2 2 33,34,44,42 0
3 3 78,32 1
4 4 42,18,26 0
5 5 10 0
6 6 34,39 0
This is the code that I tried:
df$Y <- ifelse(df$X>=70, 1, 0)
But it doesn't work; it only seems to put the first value of each spot through the function:
ID X Y
1 1 29,72,32 0
2 2 33,34,44,42 0
3 3 78,32 1
4 4 42,18,26 0
5 5 10 0
6 6 34,39 0
It worked fine on my one column that has only one value per spot. Is there a way to get the if-else function to evaluate every value in each spot and assign a 1 if any of them fits the statement?
Thank you, I'm sorry that I do not know a lot of R vocabulary yet.
Because 'X' is a string, we can split it at the "," to get a list of character vectors, loop over that list with map_int, convert each vector to numeric, and check whether any value is greater than 70. The unary + turns the logical result into an integer.
library(dplyr)
library(purrr)
df %>%
mutate(Y = map_int(strsplit(X, ","), ~ +(any(as.numeric(.x) > 70))))
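For comparison, a base R sketch of the same logic without dplyr or purrr. The attempt in the question went wrong because X is a character column, so df$X >= 70 compares strings rather than numbers. The data frame is rebuilt by hand here so the block runs on its own.

```r
df <- data.frame(ID = 1:6,
                 X = c("29,72,32", "33,34,44,42", "78,32",
                       "42,18,26", "10", "34,39"),
                 stringsAsFactors = FALSE)

# Split each string at the commas, convert the pieces to numbers,
# and flag the row if any of them exceeds 70
df$Y <- vapply(strsplit(df$X, ",", fixed = TRUE),
               function(v) as.integer(any(as.numeric(v) > 70)),
               integer(1))
df
```

vapply is used instead of sapply so that the result is guaranteed to be one integer per row.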

Creating a matrix with extra row and column information in R

I am trying to create a matrix of certain pairwise values, first by doing the calculations in a matrix and then melting it and joining in some extra information. I would also like to include that extra information on the columns and rows, so that I get something like the following if I convert it back into a matrix format (or data frame or whatever is possible):
                        X.col 1 2 3 4
                        Y.col 1 2 3 4
                        Z.col 1 2 3 4
                        Col   1 2 3 4
X.row Y.row Z.row Row
1     1     1     1           1 0 0 1
2     2     2     2           0 1 0 0
3     3     3     3           0 1 1 0
4     4     4     4           0 1 0 1
or perhaps without the names, like this:
        1 2 3 4
        1 2 3 4
        1 2 3 4
        1 2 3 4
1 1 1 1 1 0 0 1
2 2 2 2 0 1 0 0
3 3 3 3 0 1 1 0
4 4 4 4 0 1 0 1
Basically x, y and z contain some extra information on some products whose IDs are stored in row and col. I'm doing some pairwise comparisons which I then present in a matrix for our managers, who would also like to see that extra information along with the matrix as shown above.
So for the data, let a df contain the melted matrix joined with the extra information:
row = c(rep(1,4), rep(2,4), rep(3,4), rep(4,4)) #e.g. product id
col = rep(c(1,2,3,4), 4) #e.g. product id
value = c(1,0,0,0,0,1,0,0,0,0,1,0,0,0,0,1) #pairwise index value, calculated from comparing product in row with product in col
x.row = c(rep(1,4), rep(2,4), rep(3,4), rep(4,4)) #some x information on row id
y.row = c(rep(1,4), rep(2,4), rep(3,4), rep(4,4)) #some y information on row id
z.row = c(rep(1,4), rep(2,4), rep(3,4), rep(4,4)) #some z information on row id
x.col = rep(c(1,2,3,4), 4) #some x information on col id
y.col = rep(c(1,2,3,4), 4) #some y information on col id
z.col = rep(c(1,2,3,4), 4) #some z information on col id
df <- data.frame(row, col, value, x.row, y.row, z.row, x.col, y.col, z.col)
The question is then: how to accomplish that matrix visual as shown above, or something like it in R?
It is fairly easy to go about the issue in Excel since it is cell-based, but I'm more interested in a solution in R (if possible). So I guess I'm looking for inspiration on how I might go about it, or maybe even a specific solution. I've been wondering whether it is possible using the openxlsx package, manipulating an Excel sheet through R. Or maybe using lists, and storing them in the data frame... Or heatmaply (which has an option for e.g. a dendrogram above a heatmap).
I must admit, however, I'm stuck. I can't get my head around it... So I guess I'm looking for your expertise :)
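One possible starting point, offered as a sketch rather than a finished solution: cast the values back to a matrix with xtabs(), then glue the row and column information around it as a character matrix. This mirrors the second layout above (no names, blank top-left corner). The object names wide, row_info, col_info and sheet are invented for this example.

```r
# Same data as in the question, written more compactly
row <- rep(1:4, each = 4)
col <- rep(1:4, times = 4)
df <- data.frame(row, col, value = as.numeric(row == col),
                 x.row = row, y.row = row, z.row = row,
                 x.col = col, y.col = col, z.col = col)

# Cast the long values back to a 4 x 4 matrix (rows = row id, columns = col id)
wide <- unclass(xtabs(value ~ row + col, data = df))

# One line of extra information per row id and per col id
row_info <- unique(df[, c("x.row", "y.row", "z.row", "row")])
col_info <- unique(df[, c("x.col", "y.col", "z.col", "col")])
row_info <- row_info[order(row_info$row), ]
col_info <- col_info[order(col_info$col), ]

# Top block: blank corner next to the column information (one line per attribute)
top <- cbind(matrix("", nrow = ncol(col_info), ncol = ncol(row_info)),
             t(as.matrix(col_info)))
# Bottom block: row information next to the pairwise values
bottom <- cbind(as.matrix(row_info), wide)

# Everything becomes character because of the blank corner cells
sheet <- unname(rbind(top, bottom))
sheet
```

Because sheet is a plain character matrix, it could then be written out cell by cell, for example with write.table(sheet, row.names = FALSE, col.names = FALSE), and styled in a spreadsheet afterwards.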

How to tidy up a character column?

What I have:
test_df <- data.frame(isolate=c(1,2,3,4,1,2,3,4,5),label=c(1,1,1,1,2,2,2,2,2),alignment=c("--at","at--","--at","--at","a--","acg","a--","a--", "agg"))
> test_df
isolate label alignment
1 1 1 --at
2 2 1 at--
3 3 1 --at
4 4 1 --at
5 1 2 a--
6 2 2 acg
7 3 2 a--
8 4 2 a--
9 5 2 agg
What I want:
I'd like to explode the alignment field into two columns, position and character:
> test_df
isolate label aln_pos aln_char
1 1 1 1 -
2 1 1 2 -
3 1 1 3 a
4 1 1 4 t
...
Not all alignments are the same length, but all alignments with the same label have the same length.
What I've tried:
I was thinking I could use separate to first give each position its own column, then use gather to turn those columns into key-value pairs. However, I haven't been able to get the separate part right.
Since you mentioned tidyr::gather, you could try this:
test_df <- data.frame(isolate=c(1,2,3,4,1,2,3,4,5),
label=c(1,1,1,1,2,2,2,2,2),
alignment=c("--at","at--","--at","--at","a--","acg","a--","a--", "agg"),
stringsAsFactors = FALSE)
library(tidyverse)
test_df %>%
mutate(alignment = strsplit(alignment,"")) %>%
unnest(alignment)
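The pipeline above yields one row per character but no position column. A sketch of one way to add it, assuming each (isolate, label) pair identifies a single alignment as in the example data, and that dplyr and tidyr are installed:

```r
library(dplyr)
library(tidyr)

test_df <- data.frame(isolate = c(1, 2, 3, 4, 1, 2, 3, 4, 5),
                      label = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
                      alignment = c("--at", "at--", "--at", "--at",
                                    "a--", "acg", "a--", "a--", "agg"),
                      stringsAsFactors = FALSE)

result <- test_df %>%
  mutate(aln_char = strsplit(alignment, "")) %>%  # list column of characters
  unnest(aln_char) %>%                            # one row per character
  group_by(isolate, label) %>%
  mutate(aln_pos = row_number()) %>%              # position within each alignment
  ungroup() %>%
  select(isolate, label, aln_pos, aln_char)
result
```

The position relies on unnest() keeping the characters in their original order within each alignment.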
In base R, you can use indexing along with creation of a list with strsplit like this.
# make variable a character vector
test_df$alignment <- as.character(test_df$alignment)
# get list of individual characters
myList <- strsplit(test_df$alignment, split="")
then build the data.frame
# construct data.frame
final_df <- cbind(test_df[rep(seq_len(nrow(test_df)), lengths(myList)),
c("isolate", "label")],
aln_pos=sequence(lengths(myList)),
aln_char=unlist(myList))
Here, we take the first two columns of the original data.frame and repeat the rows using rep, whose second argument is a vector saying how many times to repeat the corresponding value in its first argument. The number of times is calculated with lengths. The second argument of cbind is a call to sequence taking the same lengths output; this produces counts from 1 to the corresponding length. The third argument is the unlisted character values.
This returns
head(final_df, 10)
isolate label aln_pos aln_char
1 1 1 1 -
1.1 1 1 2 -
1.2 1 1 3 a
1.3 1 1 4 t
2 2 1 1 a
2.1 2 1 2 t
2.2 2 1 3 -
2.3 2 1 4 -
3 3 1 1 -
3.1 3 1 2 -
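The three building blocks of the base R answer (lengths, rep with a vector of counts, and sequence) are easier to follow in isolation. A toy illustration on two strings; the names parts, n, ids and pos are only for this example:

```r
parts <- strsplit(c("--at", "a--"), split = "")

n   <- lengths(parts)          # 4 3: number of characters in each string
ids <- rep(c("x", "y"), n)     # "x" "x" "x" "x" "y" "y" "y"
pos <- sequence(n)             # 1 2 3 4 1 2 3: position restarts per string
chr <- unlist(parts)           # "-" "-" "a" "t" "a" "-" "-"

data.frame(ids, pos, chr)
```

In the answer, the role of ids is played by row indices, which is why whole rows of test_df get repeated.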

Using Merge with an R By class object

So I have a "by" class object (which is essentially a list).
It is indexed by 2 factors [id1,id2], with a list associated with each unique pair.
e.g.
id1:1
id2:1
1,2,3
------
id1:1
id2:2
4,4,NA
------
id1:2
id2:1
NA
I would like to convert this to a data frame which has 3 columns {id1,id2,value} and would take the above and return
id1, id2, value
1 1 1
1 1 2
1 1 3
1 2 4
1 2 4
1 2 NA
2 1 NA
This can be done with a for loop but is obviously slow. I am looking to try and merge the value column back to a data frame which has indices 1 and 2.
Answer: Use the data.table package. It is ridiculously quick for these sorts of problems.
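The answer gives no code. As an alternative that avoids both the loop and an extra package, here is a base R sketch (not the data.table approach the answer recommends). It assumes the by object was built roughly as shown; the toy data frame d and the object names are invented to match the example in the question.

```r
# Toy data matching the example in the question
d <- data.frame(id1   = c(1, 1, 1, 1, 1, 1, 2),
                id2   = c(1, 1, 1, 2, 2, 2, 1),
                value = c(1, 2, 3, 4, 4, NA, NA))

# A "by" object: a list-mode array indexed by id1 x id2
b <- by(d, d[c("id1", "id2")], function(g) g$value)

# All id combinations, in the same column-major order as the cells of b
keys <- expand.grid(dimnames(b), stringsAsFactors = FALSE)

cells <- unclass(b)                     # plain list that still carries dim
n <- vapply(cells, length, integer(1))  # values per cell (0 for empty cells)

# Repeat each key once per value in its cell; unlist() drops the empty cells
out <- data.frame(id1   = as.numeric(rep(keys[[1]], n)),
                  id2   = as.numeric(rep(keys[[2]], n)),
                  value = unlist(cells))
out <- out[order(out$id1, out$id2), ]
rownames(out) <- NULL
out
```

The resulting data frame can then be combined with other data through merge(out, other, by = c("id1", "id2")), where other stands for whatever data frame holds the two indices.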

Create a list with one named column in R

Okay, I'm stupid, but:
How can I create a list with one column, 10 rows and a column name, and the same numeric value in all fields? I know how to append it, e.g.
mylist["column_name"] <- rep(1, nrow(mylist))
but not how to create it on its own.
It should look like this:
> mylist
column_name
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 1
Are you sure you want a list and not a data frame (as that is what your example looks like)? You can get it like this:
data.frame(column_name=rep(1,10))
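To make the distinction concrete, a short sketch of both objects side by side (the names as_df and as_list are just for illustration). A data frame is itself a list of equal-length columns, which is why the two are easy to confuse:

```r
# A data frame with one column and 10 rows
as_df <- data.frame(column_name = rep(1, 10))

# A plain list with one named element holding 10 values
as_list <- list(column_name = rep(1, 10))

dim(as_df)                    # 10 rows, 1 column
length(as_list$column_name)   # 10 values, but no notion of rows
is.list(as_df)                # TRUE: data frames are lists underneath
```

Only the data frame prints with row numbers as in the desired output, and nrow() only gives a row count for it.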
