How to properly convert dataframe from integer to numeric in R?

I have an abundance dataframe imported from a csv with column and row headers. After import, str(data) shows each column as int. I can't use specaccum from the vegan package unless the data is numeric. After converting my dataframe to numeric, it still produces the following error:
Error in colSums(x) : 'x' must be numeric
My sample and code:
Structure of my dataframe before any conversion:
> str(data)
'data.frame': 180 obs. of 727 variables:
$ Sample : Factor w/ 180 levels "Sample1","Sample2",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Abrostola : int 0 0 0 0 0 0 0 0 0 0 ...
$ Abrus : int 0 0 1 0 0 0 0 0 0 0 ...
$ Acanthamoeba : int 0 0 0 0 0 0 0 0 0 0 ...
$ Acanthopagrus : int 0 0 0 0 0 0 1 0 0 0 ...
$ Acetilactobacillus : int 0 1 0 0 0 0 0 0 0 0 ...
$ Acetobacter : int 0 0 0 0 0 0 0 0 0 0 ...
Then:
data2 <- data[-1] ## to get rid of factor column
data2 <- lapply(data2, as.numeric)
> str(data2)
List of 726
$ Abrostola : num [1:180] 0 0 0 0 0 0 0 0 0 0 ...
$ Abrus : num [1:180] 0 0 1 0 0 0 0 0 0 0 ...
$ Acanthamoeba : num [1:180] 0 0 0 0 0 0 0 0 0 0 ...
$ Acanthopagrus : num [1:180] 0 0 0 0 0 0 1 0 0 0 ...
$ Acetilactobacillus : num [1:180] 0 1 0 0 0 0 0 0 0 0 ...
$ Acetobacter : num [1:180] 0 0 0 0 0 0 0 0 0 0 ...
$ Achromobacter : num [1:180] 0 0 0 0 0 0 0 0 0 0 ...
$ Acinetobacter : num [1:180] 0 0 0 0 0 0 0 0 0 0 ...
Next I tried running the very basic vegan command:
mycurve <- specaccum(comm = data2, method = "random", permutations = "1000")
But it gives the same error. I don't get it - my df is clearly converted to numeric so what is the issue??
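The str() output above reports "List of 726" rather than a data.frame, because lapply() always returns a plain list. The following is a minimal sketch, using a small made-up data frame (counts and its columns are illustrative, not the real data), of how assigning into data2[] keeps the data.frame class while still converting each column:

```r
# Toy abundance table; the names and values are invented for illustration.
counts <- data.frame(
  Sample = c("S1", "S2", "S3"),
  Abrus = c(0L, 1L, 0L),
  Acetobacter = c(2L, 0L, 0L)
)

# lapply() on a data frame returns a plain list, not a data frame.
as_list <- lapply(counts[-1], as.numeric)
print(class(as_list))

# Assigning into `[]` replaces the columns but keeps the data.frame class.
as_df <- counts[-1]
as_df[] <- lapply(as_df, as.numeric)
print(class(as_df))
print(colSums(as_df))
```

As the SOLVED note further down explains, this conversion turned out not to be needed here, since integer columns already count as numeric.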
EDIT
Prior to fixing my dataframe into numeric, I was using:
mycurve <- specaccum(comm = data[-1], method = "random", permutations = "1000") ## without prior removal of factor column
But it was giving the following error:
Error in nperm + EPS : non-numeric argument to binary operator
$ Acetobacter : num [1:180] 0 0 0 0 0 0 0 0 0 0 ...
I am not sure why it is targeting this particular column, it looks exactly the same as everything else. No columns are empty (i.e. no column sums equal 0 as I thought that would be causing an issue). I checked for weird symbols/whitespace - the columns do not have anything out of the ordinary. There are no empty cells either with "NA".
Output of dput(head(data)) but due to body limit in this post I had to truncate the output.
structure(list(Sample = structure(1:6, .Label = c("Sample1", "Sample2",
"Sample3", "Sample4", "Sample5", "Sample6", "Sample180"), class = "factor"), Abrostola = c(0L,
0L, 0L, 0L, 0L, 0L), Abrus = c(0L, 0L, 1L, 0L, 0L, 0L), Acanthamoeba = c(0L,
0L, 0L, 0L, 0L, 0L), Acanthopagrus = c(0L, 0L, 0L, 0L, 0L, 0L
), Acetilactobacillus = c(0L, 1L, 0L, 0L, 0L, 0L), Acetobacter = c(0L,
0L, 0L, 0L, 0L, 0L), Achromobacter = c(0L, 0L, 0L, 0L, 0L, 0L
), Acinetobacter = c(0L, 0L, 0L, 0L, 0L, 0L), Acipenser = c(0L,
0L, 0L, 0L, 0L, 0L), Acomys = c(0L, 0L, 0L, 0L, 0L, 0L), Acremonium = c(0L,
0L, 0L, 0L, 0L, 0L), Acromyrmex = c(0L, 0L, 0L, 0L, 0L, 0L),
Acropora = c(0L, 0L, 0L, 0L, 0L, 0L), Actinidia = c(0L, 0L,
0L, 0L, 0L, 0L), Actinobacillus = c(0L, 0L, 0L, 0L, 0L, 0L
), Acyrthosiphon = c(0L, 0L, 0L, 0L, 0L, 0L), Acytostelium = c(0L,
1L, 0L, 0L, 0L, 0L), Aedes = c(0L, 0L, 0L, 0L, 0L, 0L), Aegilops = c(0L,
0L, 0L, 0L, 0L, 0L), Aeromonas = c(0L, 0L, 0L, 0L, 5L, 0L
), Ageratum = c(0L, 0L, 0L, 0L, 0L, 0L), Aggregatibacter = c(0L,
0L, 0L, 0L, 0L, 0L), Albugo = c(0L, 0L, 0L, 0L, 0L, 0L),
Alcaligenes = c(0L, 0L, 0L, 0L, 0L, 0L), Alcanivorax = c(0L,
0L, 0L, 0L, 0L, 0L), Allygidius = c(0L, 0L, 0L, 0L, 0L, 0L
), Amblyraja = c(0L, 0L, 0L, 0L, 0L, 0L), Amoebogregarina = c(0L,
0L, 1L, 1L, 0L, 0L), Amphidinium = c(0L, 0L, 0L, 0L, 0L,
0L), Amphiprion = c(0L, 0L, 0L, 0L, 0L, 0L), Amphipyra = c(0L,
1L, 1L, 1L, 0L, 1L), Amycolatopsis = c(0L, 0L, 0L, 0L, 0L,
0L), Ananas = c(1L, 1L, 1L, 1L, 0L, 0L), Anas = c(0L, 0L,
0L, 0L, 0L, 0L), Andhravirus = c(0L, 0L, 0L, 0L, 0L, 0L),
Andrena = c(0L, 0L, 0L, 0L, 0L, 0L), Anolis = c(0L, 0L, 0L,
0L, 0L, 0L), Anopheles = c(0L, 1L, 0L, 0L, 0L, 0L), Anoplophora = c(0L,
0L, 0L, 0L, 0L, 0L), Anoxybacillus = c(0L, 0L, 0L, 0L, 0L,
0L), Anthocharis = c(0L, 1L, 2L, 1L, 0L, 1L), Aphanomyces = c(0L,
0L, 0L, 1L, 0L, 0L), Aphyllon = c(0L, 0L, 0L, 0L, 0L, 0L),
Apilactobacillus = c(2L, 0L, 0L, 0L, 0L, 0L), Apotomis = c(0L,
0L, 0L, 0L, 0L, 0L), Apteryx = c(0L, 0L, 0L, 0L, 0L, 0L),
Aquila = c(0L, 0L, 0L, 0L, 0L, 0L), Arabidopsis = c(0L, 0L,
0L, 0L, 0L, 0L), Arabis = c(0L, 0L, 0L, 0L, 0L, 0L), Arachis = c(0L,
0L, 0L, 0L, 0L, 0L), Arctia = c(0L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA, 6L), class = "data.frame")
SOLVED
The code I was using for mycurve worked with other method options even with permutations in quotes. With method = "random", simply removing the quotes around the permutations value fixed it. I did not even need to convert the data to numeric.
mycurve <- specaccum(comm = data[-1], method = "random", permutations = 1000) ## REMOVED QUOTES IN PERMUTATIONS and used with INT dataframe

According to ?specaccum
permutations - Number of permutations with method = "random". Usually an integer giving the number permutations, but can also be a list of control values for the permutations as returned by the function how, or a permutation matrix where each row gives the permuted indices.
specaccum(comm = data[-1], method = "random", permutations = 1000)
-output
Species Accumulation Curve
Accumulation method: random, with 719 permutations
Call: specaccum(comm = data[-1], method = "random", permutations = 1000)
Sites    1.000000 2.000000 3.000000 4.000000 5.000000  6
Richness 3.502086 5.731572 7.297636 8.598053 9.831711 11
sd       1.894776 1.483049 1.228892 1.143491 0.897720  0
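The message "non-numeric argument to binary operator" is what base R reports when arithmetic is attempted on a character string, which fits permutations = "1000" being passed as text. A minimal sketch in base R (no vegan required; nperm here is just an illustrative variable, not a claim about vegan's internals):

```r
# A quoted number is a character string, so arithmetic on it fails.
nperm <- "1000"
msg <- tryCatch(
  nperm + 1,
  error = function(e) conditionMessage(e)
)
print(msg)

# Without quotes (or after an explicit conversion) the same arithmetic works.
ok <- as.numeric(nperm) + 1
print(ok)
```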

Related

Add new columns with defined values in R

I have a data.table named dmat. I want to add each element of missing_snps to dmat as a new column and set every row of those columns to zero. The output should remain in the same class as the input.
I would appreciate any suggestion.
dmat <- structure(list(`1:27950613:G:A` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `1:27950883:CTA:C` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `1:27952180:A:G` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `1:27953106:A:G` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `1:27953374:G:T` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `1:27953514:T:TA` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `1:27953608:T:C` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `1:27954027:G:A` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `1:27954415:T:C` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `1:27962685:T:C` = c(0L, 0L, 0L, 0L, 0L, 1L, 0L,
0L, 0L, 0L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl",
"data.frame"))
missing_snps <- c("1:169858888:G:A", "1:16985867657:T:A", "1:132862874:G:A")
dmat[,c("1:169858888:G:A", "1:16985867657:T:A", "1:132862874:G:A")] <- 0
or, equivalently:
dmat[, missing_snps] <- 0
Using data.table,
dmat <- setDT(dmat)
missing_snps <- c("1:169858888:G:A", "1:16985867657:T:A", "1:132862874:G:A")
dmat[, (missing_snps) := 0]
Output
> dmat[,..missing_snps ]
1:169858888:G:A 1:16985867657:T:A 1:132862874:G:A
1: 0 0 0
2: 0 0 0
3: 0 0 0
4: 0 0 0
5: 0 0 0
6: 0 0 0
7: 0 0 0
8: 0 0 0
9: 0 0 0
10: 0 0 0
The columns you wanted have been added.
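For readers who want to stay in base R, here is a minimal sketch of the same idea on a plain data.frame. The object dmat_demo and its single starting column are made up for illustration; the setdiff() guard is an extra assumption so that existing columns are never overwritten:

```r
# Small stand-in for dmat; check.names = FALSE keeps the SNP-style name intact.
dmat_demo <- data.frame(`1:27950613:G:A` = c(0L, 1L, 0L), check.names = FALSE)
missing_snps <- c("1:169858888:G:A", "1:132862874:G:A")

# Only add names that are not already present.
new_cols <- setdiff(missing_snps, names(dmat_demo))

# Assigning a scalar to several new column names fills every row with it.
dmat_demo[new_cols] <- 0L

print(class(dmat_demo))
print(names(dmat_demo))
```

Because the assignment goes through `[<-`, the object keeps the class it started with.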

In R, how can I find the row number of the first occurrence and last occurrence of a value in a matrix?

In R, I've created 25x25 matrices of 1s and 0s, and I need to find the height between the first occurrence of 1 in the matrix and the last occurrence of 1 in the matrix.
Here's an example of a matrix of the letter a, where each 1 represents a black pixel and each 0 represents a white pixel:
a <- read.csv(csv_files[1])
a
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
12 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
13 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
14 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
15 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
16 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
17 0 0 0 0 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0
20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
My idea is to find the row number of the last occurrence of 1 and the row number of the first occurrence of 1 and subtract one from the other, which will give me the height of the symbol.
In this case it would be 19 - 6 = 13, so the height is 13.
For context, I drew images of different letters and symbols in GIMP, then imported them into R and saved each one as a matrix in a CSV file.
Try the code below
> diff(range(which(a == 1, arr.ind = TRUE)[, "row"]))
[1] 13
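To unpack that one-liner: which(..., arr.ind = TRUE) returns one row per matching cell with "row" and "col" columns, and range() on the "row" column gives the first and last row containing a 1. A minimal sketch on a small made-up matrix (m and the positions of its 1s are illustrative only):

```r
# 6x4 matrix of zeros with a few 1s placed by hand.
m <- matrix(0L, nrow = 6, ncol = 4)
m[2, 3] <- 1L
m[4, 2] <- 1L
m[5, 1] <- 1L

# One line per matching cell, with columns named "row" and "col".
hits <- which(m == 1, arr.ind = TRUE)
rows <- hits[, "row"]

first_row <- min(rows)
last_row <- max(rows)
height <- last_row - first_row  # same value as diff(range(rows))

print(c(first = first_row, last = last_row, height = height))
```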
Data
> dput(a)
structure(list(V1 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
V2 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V3 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V4 = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L), V5 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), V6 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
), V7 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), V8 = c(0L,
0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), V9 = c(0L, 0L, 0L, 0L,
0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 0L), V10 = c(0L, 0L, 0L, 0L, 0L, 1L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 0L), V11 = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 0L), V12 = c(0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L
), V13 = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V14 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V15 = c(0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), V16 = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L,
0L, 0L, 0L, 0L, 0L), V17 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L), V18 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L), V19 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
V20 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V21 = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V22 = c(0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L), V23 = c(0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L), V24 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), V25 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25"))

R Printing ftable() output to csv with factor names

I'm working with ftable in R to create contingency tables.
I want to write an ftable object to a csv, but when I use write.csv() on the ftable object, the csv no longer lists the factor names that the ftable shows in R. This is the type of output that I get
Here's an example ftable in R
structure(c(1L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 2L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L), .Dim = c(12L, 7L), class = "ftable", row.vars = list(
ï..petal_size = c("large ", "small", "small "), stem_length = c("long",
"long ", "short", "short ")), col.vars = list(flow_color = c("blue",
"green", "indigo ", "orange", "red ", "violet", "yellow")))
Is there a solution to this such that I can keep the factor names?
One option is the function write.ftable, but it leaves a lot of manual work, because everything in the CSV file is written into a single column.
write.ftable(ftable(df), file = "table.csv", quote = FALSE)
# And the output. NOTE: WHEN OPENING THE CSV, EVERYTHING WILL BE IN A SINGLE COLUMN
              flow_color  blue green indigo orange red violet yellow
i..petal_size stem_length
large         long           1     0      1      1   2      1      1
              long           0     0      0      0   0      0      0
              short          0     0      0      0   0      1      1
              short          0     1      0      1   0      0      0
small         long           1     2      0      0   1      0      0
              long           0     0      1      0   0      0      0
              short          0     0      1      0   0      1      0
              short          1     0      0      0   0      0      1
small         long           0     0      0      0   0      0      0
              long           0     0      0      0   0      0      0
              short          0     0      0      1   0      0      0
              short          0     0      0      0   0      0      0
Another option is to format the ftable first and then pass the result to write.table:
df <- ftable(df)
cont <- format(df, quote = FALSE)  # dispatches to the ftable method, so stats::: is not needed
write.table(cont, sep = ";", file = "table.csv")
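As a self-contained sketch of the second option, the example below uses the built-in Titanic dataset instead of the asker's data (an assumption made purely so the code runs as is). format() on an ftable returns a character matrix that already contains the factor names, so writing it with a comma separator gives one value per CSV cell:

```r
# Contingency table from a built-in dataset; stands in for the asker's ftable.
counts <- ftable(Titanic, row.vars = 1:2)

# Character matrix including the header rows and the row-label columns.
cells <- format(counts, quote = FALSE)

# Strip the padding that format() adds, while keeping the matrix shape.
cells[] <- trimws(cells)

# Write without extra row/column names, since the labels are already cells.
out_file <- tempfile(fileext = ".csv")
write.table(cells, file = out_file, sep = ",",
            row.names = FALSE, col.names = FALSE, quote = TRUE)

csv_lines <- readLines(out_file)
print(head(csv_lines, 3))
```

Setting row.names and col.names to FALSE avoids the extra V1, V2, ... header and the numeric row labels that write.table would otherwise add.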
And the output

how to prepare an adjacency matrix for network analysis

I am trying to convert the raw data below to an adjacency matrix, based on the values in the column "s_chloramphenicol", in preparation for a network analysis.
df <- structure(list(studyid0 = c(1L, 5L, 6L, 8L, 9L, 11L, 3052L, 3057L,
3058L, 3058L, 3060L, 3063L, 3064L, 3067L), s_chloramphenicol = c(0L,
0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 0L, 0L)), row.names = c(NA,
-14L), class = "data.frame", .Names = c("studyid0", "s_chloramphenicol"
))
The expected output is
df<-structure(list(`1` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `5` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `6` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `8` = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L,
0L, 0L, 0L), `9` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `11` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `3052` = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 0L), `3057` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `3058` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `3060` = c(0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L), `3063` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `3064` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L), `3067` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L)), .Names = c("1", "5", "6", "8", "9", "11", "3052",
"3057", "3058", "3060", "3063", "3064", "3067"), class = "data.frame", row.names = c(1L,
5L, 6L, 8L, 9L, 11L, 3052L, 3057L, 3058L, 3060L, 3063L, 3064L,
3067L))
You can use the function outer:
df2 <- outer(df$s_chloramphenicol, df$s_chloramphenicol)
rownames(df2) <- colnames(df2) <- df$studyid0
df2
Output:
1 5 6 8 9 11 3052 3057 3058 3058 3060 3063 3064 3067
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 0 0 0 1 0 0 1 0 0 0 1 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3052 0 0 0 1 0 0 1 0 0 0 1 0 0 0
3057 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3058 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3058 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3060 0 0 0 1 0 0 1 0 0 0 1 0 0 0
3063 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3064 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3067 0 0 0 0 0 0 0 0 0 0 0 0 0 0
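A small sketch of what outer() is doing here, on made-up data (flag and ids are illustrative): the product of two 0/1 vectors is 1 only where both entries are 1. Note that this also puts 1s on the diagonal for flagged nodes, whereas the expected output in the question has zeros there, so the last step, which is an assumption about what is wanted, clears the diagonal:

```r
# Four hypothetical nodes; "b" and "d" carry the flag.
flag <- c(0L, 1L, 0L, 1L)
ids <- c("a", "b", "c", "d")

# Cell [i, j] is flag[i] * flag[j]: 1 only when both nodes are flagged.
adj <- outer(flag, flag)
dimnames(adj) <- list(ids, ids)

# Optional: remove self-loops so a flagged node is not linked to itself.
diag(adj) <- 0

print(adj)
```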

retrieving names(table) following subset where only 1 observation

My issue is that when I try to retrieve names(myresults) after subsetting a table, I get NULL when the returned subset has only 1 result. Rather than returning a named vector whose names are the row names, R returns an unnamed integer (in this case 1).
Here is a table
head(tbl)
1 2 3 4 5 6
afford 0 1 0 0 0 0
app 0 0 0 1 0 0
back 0 1 0 0 0 0
cancel 0 0 0 0 1 0
charg 0 0 0 0 0 1
download 0 0 0 0 0 1
I have been subsetting the table within a loop to return a table for each group. If a term belongs to a group it has a value of 1:
for (i in 1:ncol(tbl)) {
t <- tbl[which(tbl[,i]==1),i]
nam <- names(t)
df <- as.data.frame(nam)
names(df) <- paste0("Cluster ",i)
print(kable(df))
}
This loop seems to work OK when which() returns more than one term. But group 4, which has only one term, "app", gives me issues. Here's an example on group 4, which does not work as expected, then on group 3, which does:
> t <- tbl[which(tbl[,4]==1),4] # only 1 observation meets this criteria
> t
[1] 1
> t <- tbl[which(tbl[,3]==1),3] # 3 observations meet this criteria
> t
aword cat dog
1 1 1
So I can get names(t) for tbl[,3] where it has 3 returned instances but not for tbl[,4] which only has 1.
> t <- tbl[which(tbl[,4]==1),4]
> names(t)
NULL # expected "app"
> t <- tbl[which(tbl[,3]==1),3]
> names(t)
[1] "aword" "cat" "dog"
How can I get names(t) when I have only 1 instance returned like in the example?
Some further context following comment below:
> str(tbl)
'table' int [1:33, 1:6] 0 0 0 0 0 0 0 0 0 0 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:33] "aword" "app" "cat" "dog" ...
..$ : chr [1:6] "1" "2" "3" "4" ...
>
and
> dput(tbl)
structure(c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L), .Dim = c(33L, 6L), .Dimnames = structure(list(
c("aword", "app", "back", "cancel", "charg", "download",
"enough", "expens", "get", "great", "just", "like", "love",
"cat", "dog", "bla", "month", "much", "need",
"never", "phone", "pleas", "blabla", "realli", "term", "sign",
"thank", "time", "triangle", "use", "want", "will", "work"), c("1",
"2", "3", "4", "5", "6")), .Names = c("", "")), class = "table")
Because we are subsetting a single column, we can extract that column first, build the logical index tbl[,4] == 1, and use it to subset the column vector, which keeps its names. There is no need to wrap the index in which unless there are NAs; in that case, which removes them.
tbl[,4][tbl[,4]==1]
# app
# 1
tbl[,3][tbl[,3]==1]
# cat blabla time
# 1 1 1
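The underlying cause is that indexing a matrix or table with a single row and a single column drops both dimensions and, with them, the names. A minimal sketch on a small made-up matrix (m and its dimnames are illustrative) showing the problem and two ways around it:

```r
# 3x2 matrix with row names; column "1" has a single match, column "2" has two.
m <- matrix(c(0L, 1L, 0L,
              1L, 1L, 0L),
            nrow = 3,
            dimnames = list(c("x", "y", "z"), c("1", "2")))

# One row and one column selected: the result is an unnamed scalar.
one <- m[which(m[, 1] == 1), 1]
print(names(one))

# Extract the named column first, then subset it: names survive.
col1 <- m[, 1]
kept <- col1[col1 == 1]
print(names(kept))

# Alternatively keep the matrix shape and read the row names.
alt <- m[which(m[, 1] == 1), 1, drop = FALSE]
print(rownames(alt))
```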
