R: Why can't I use dimnames() to assign dimension names to a data frame?

fg = read.table("fungus.txt", header=TRUE, row.names=1);fg
names(dimnames(fg)) = c("Temperature", "Area");names(dimnames(fg))#doesn't work
dimnames(fg) = list("Temperature"=row.names(fg), "Area"=colnames(fg));dimnames(fg)
#doesn't work
The data I used is shown in the picture below.
Using dimnames() to assign dimension names to the data.frame doesn't work.
Neither of the two R commands works: the dimnames of fg don't change, and names(dimnames(fg)) is still NULL.
Why does this happen? How can I change the dimnames of this data.frame?

Eventually I found that converting the data frame to a matrix works well.
fg = as.matrix(read.table("fungus.txt", header=TRUE, row.names=1))
dimnames(fg) = list("Temp"=row.names(fg), "Isolate"=1:8);fg
And got the output:
Isolate
Temp 1 2 3 4 5 6 7 8
55 0.66 0.67 0.43 0.41 0.69 0.63 0.46 0.52
60 0.82 0.81 0.80 0.79 0.85 0.91 0.53 0.66
65 0.91 1.09 0.81 0.86 0.95 0.93 0.64 1.10
70 1.02 1.22 1.03 1.08 1.10 1.13 0.80 1.17
75 1.06 1.17 0.89 1.02 1.06 1.29 0.94 1.01
80 0.80 0.81 0.73 0.77 0.80 0.79 0.59 0.95
85 0.26 0.40 0.36 0.53 0.67 0.53 0.57 0.18
Reply to the comments: if you do not know anything about the code, then do not ask me why I post such a question.
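A minimal sketch of what seems to be going on, assuming a small made-up data frame in place of fungus.txt (the column names X1 and X2 are hypothetical). For a data frame, the dimnames<- method only stores the row names and the column names, so any names attached to the list itself are dropped. A matrix keeps dimnames as an attribute, which is why the conversion helps:

```r
# Hypothetical stand-in for the fungus data (two rows, two columns)
fg <- data.frame(X1 = c(0.66, 0.82), X2 = c(0.67, 0.81),
                 row.names = c("55", "60"))

# On a data frame the names of the dimnames list are silently dropped
names(dimnames(fg)) <- c("Temperature", "Area")
df_names <- names(dimnames(fg))   # NULL

# A matrix stores dimnames as an attribute, so the names survive
m <- as.matrix(fg)
names(dimnames(m)) <- c("Temperature", "Area")
m_names <- names(dimnames(m))     # "Temperature" "Area"
```

So the assignment does not fail on a data frame; it just has nowhere to keep the extra names.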

Related

Add values of multiple dataframes together cell by cell

I am trying to add multiple dataframes together but not in a bind fashion.
Is there an easy way to overlay & add dataframes on top of each other? As shown in this picture:
The number of columns will always be same; the row count will differ.
I want to sum the cells by row position, so Result[1,1] = Table1[1,1] + Table2[1,1] and so on. The resulting frame should add whatever cells have data, and it should be the size of the biggest table.
The tables are generated dynamically, so I'd like to refrain from any hardcoding.
Consider the following two data frames:
table1 <- replicate(4,round(runif(10,0,1),2)) %>% as.data.frame %>% setNames(LETTERS[1:4])
table2 <- replicate(4,round(runif(6,0,1),2)) %>% as.data.frame %>% setNames(LETTERS[1:4])
table1
A B C D
1 0.81 0.08 0.85 0.89
2 0.88 0.82 0.62 0.77
3 0.12 0.13 0.99 0.02
4 0.17 0.54 0.37 0.62
5 0.77 0.10 0.81 0.34
6 0.58 0.15 0.00 0.56
7 0.61 0.15 0.59 0.15
8 0.52 0.36 0.12 0.99
9 0.83 0.93 0.29 0.30
10 0.52 0.02 0.48 0.46
table2
A B C D
1 0.95 0.81 0.99 0.92
2 0.18 0.99 0.35 0.09
3 0.73 0.10 0.02 0.68
4 0.37 0.53 0.78 0.02
5 0.48 0.54 0.79 0.83
6 0.75 0.32 0.41 0.04
We might create a new variable called ID from their row numbers and use that to sum the values after binding the rows:
library(dplyr)
library(tibble)
bind_rows(table1 %>% rowid_to_column("ID"),table2 %>% rowid_to_column("ID")) %>%
group_by(ID) %>%
summarise(across(everything(),sum))
# A tibble: 10 x 5
ID A B C D
<int> <dbl> <dbl> <dbl> <dbl>
1 1 1.76 0.89 1.84 1.81
2 2 1.06 1.81 0.97 0.86
3 3 0.85 0.23 1.01 0.7
4 4 0.54 1.07 1.15 0.64
5 5 1.25 0.64 1.6 1.17
6 6 1.33 0.47 0.41 0.6
7 7 0.61 0.15 0.59 0.15
8 8 0.52 0.36 0.12 0.99
9 9 0.83 0.93 0.290 0.3
10 10 0.52 0.02 0.48 0.46
A potentially more dangerous base R approach is to subset table1 to the dimensions of table2, and add them together:
table1[seq(1,nrow(table2)),seq(1,ncol(table2))] <- table1[seq(1,nrow(table2)),seq(1,ncol(table2))] + table2
table1
A B C D
1 1.76 0.89 1.84 1.81
2 1.06 1.81 0.97 0.86
3 0.85 0.23 1.01 0.70
4 0.54 1.07 1.15 0.64
5 1.25 0.64 1.60 1.17
6 1.33 0.47 0.41 0.60
7 0.61 0.15 0.59 0.15
8 0.52 0.36 0.12 0.99
9 0.83 0.93 0.29 0.30
10 0.52 0.02 0.48 0.46
# Create your data frames
df1<-data.frame(a=c(1,2,3),b=c(2,3,4),c=c(3,4,5))
df2<-data.frame(a=c(1,2),b=c(2,3),c=c(3,4))
# Create a new data frame from the bigger of the two
if (nrow(df1)>nrow(df2)){
df3 <-df1
} else {
df3<-df2
}
# For each line in the smaller data frame add it to the larger
for (number in seq_len(min(nrow(df1),nrow(df2)))){
df3[number,] <- df1[number,]+df2[number,]
}
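The loop above could also be written without hardcoding which table is larger. This is only a sketch, assuming all columns are numeric and in the same order; add_padded is a hypothetical helper name, not something from a package. It pads each table with zero rows up to the larger row count, adds the two, and Reduce() extends that to any number of tables:

```r
# Hypothetical helper: pad both tables with zero rows, then add cell by cell
add_padded <- function(x, y) {
  n <- max(nrow(x), nrow(y))
  pad <- function(d) {
    m <- as.matrix(d)
    out <- matrix(0, nrow = n, ncol = ncol(m),
                  dimnames = list(NULL, colnames(m)))
    out[seq_len(nrow(m)), ] <- m   # copy the existing rows, the rest stay 0
    out
  }
  as.data.frame(pad(x) + pad(y))
}

df1 <- data.frame(a = c(1, 2, 3), b = c(2, 3, 4), c = c(3, 4, 5))
df2 <- data.frame(a = c(1, 2), b = c(2, 3), c = c(3, 4))

# Works for a list of any length, since the tables are generated dynamically
result <- Reduce(add_padded, list(df1, df2))
```

Padding with zeros means a missing row contributes nothing to the sum, which matches the "add whatever cells have data" requirement.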

Filtering all rows if any value in a row is less than a threshold value

I would like to remove all rows if any value of the row is less than 0.05. Any suggestions? I need dplyr and base R simple subset solutions.
library(magrittr)
text = '
INNO RISK PRO AMB MKT IP
1 0.00 0.01 0.00 0.00 0.19 0.24
2 1.00 0.83 0.04 0.48 0.60 0.03
3 0.01 0.07 0.79 0.05 0.19 0.00
4 0.99 0.99 0.92 0.86 0.01 0.10
5 0.72 0.93 0.28 0.48 1.00 0.90
6 0.96 1.00 1.00 0.86 1.00 0.75
7 0.02 0.07 0.01 0.86 0.60 0.00
8 0.02 0.01 0.01 0.12 0.60 0.24
9 0.02 0.93 0.92 0.02 0.19 0.90
10 0.99 0.97 0.92 0.86 0.99 0.90'
d10 = textConnection(text) %>% read.table(header = T)
Created on 2020-11-28 by the reprex package (v0.3.0)
We can use rowSums
d10[!rowSums(d10 < 0.05),]
# INNO RISK PRO AMB MKT IP
#5 0.72 0.93 0.28 0.48 1.00 0.90
#6 0.96 1.00 1.00 0.86 1.00 0.75
#10 0.99 0.97 0.92 0.86 0.99 0.90
Or with dplyr
library(dplyr)
d10 %>%
filter(across(everything(), ~ . >= 0.05))
# INNO RISK PRO AMB MKT IP
#5 0.72 0.93 0.28 0.48 1.00 0.90
#6 0.96 1.00 1.00 0.86 1.00 0.75
#10 0.99 0.97 0.92 0.86 0.99 0.90
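To spell out why the rowSums version works, here is a small sketch on a made-up three-row data frame (d and its values are hypothetical). The comparison gives a logical matrix, rowSums counts the TRUE cells in each row, and a row is kept only when that count is zero:

```r
# Hypothetical toy data
d <- data.frame(x = c(0.10, 0.01, 0.50),
                y = c(0.20, 0.90, 0.04))

below   <- d < 0.05          # logical matrix: TRUE where a cell is under the threshold
n_below <- rowSums(below)    # number of offending cells per row
kept    <- d[n_below == 0, ] # keep rows with no offending cells
```

Writing n_below == 0 is equivalent to !rowSums(d < 0.05), just a little more explicit.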

Fail to split string at each instance of character in R

I am using R to try and separate a long string of numbers all separated by the ";" character. The string looks like this:
";0,38;0,33;0,24;0,28; 0,33;0,33;0,38;0,23; 0,33;0,33; 0,38; 0,43; 0,51;0,56;0,33;0,56;0,33;0,43;0,51;0,56;\n\n0,61; 0,66;0,56; 0,66;0,56; 0,61; 0,66;0,61; 0,63; 0,66; 0,71;0,81;0,86; 0,99;0,86; 0,99; 1,12;1,27; 1,54; 1,57"
I have tried to do
strsplit(string,";")
and
str(string,";")
What is a quick way to do this so that I end up with a list of all the numbers? Is there a way to do this with the tidyverse?
The scan function allows using semicolons as separators and commas as decimal points (at least for input).
> vals <- scan(text=string, sep=";", dec=",")
Read 42 items
> vals
[1] NA 0.38 0.33 0.24 0.28 0.33 0.33 0.38 0.23 0.33 0.33 0.38 0.43 0.51 0.56 0.33 0.56 0.33
[19] 0.43 0.51 0.56 NA 0.61 0.66 0.56 0.66 0.56 0.61 0.66 0.61 0.63 0.66 0.71 0.81 0.86 0.99
[37] 0.86 0.99 1.12 1.27 1.54 1.57
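If you would rather stay with strsplit, something along these lines should also work. It is a sketch on a shortened, made-up version of the string: split on ";", trim spaces and newlines, drop the empty pieces, then swap the decimal comma for a point before converting:

```r
# Shortened, hypothetical version of the input string
string <- ";0,38;0,33; 0,24;\n\n0,61; 1,57"

parts <- strsplit(string, ";", fixed = TRUE)[[1]]  # strsplit returns a list
parts <- trimws(parts)                             # removes spaces and newlines
parts <- parts[nzchar(parts)]                      # drop empty pieces
vals  <- as.numeric(sub(",", ".", parts, fixed = TRUE))
```

Unlike the scan output above, this drops the empty fields instead of returning NA for them.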

extract entry of a vector or Matrix in R

I have this matrix:
> dist
1 2 3 4 5 6 7
1 0.00 0.52 0.34 0.37 0.37 0.52 0.54
2 0.52 0.00 0.77 0.57 0.57 0.00 0.56
3 0.34 0.77 0.00 0.29 0.29 0.77 0.53
4 0.37 0.57 0.29 0.00 0.00 0.57 0.32
5 0.37 0.57 0.29 0.00 0.00 0.57 0.32
6 0.52 0.00 0.77 0.57 0.57 0.00 0.56
7 0.54 0.56 0.53 0.32 0.32 0.56 0.00
I want to extract row 1, but only from the third position onwards:
0.34 0.37 0.37 0.52 0.54
I tried dist[1,>=3], but that gives an error.
You can give a sequence for the columns to select with [:
dist[1,3:7]
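If the number of columns is not known in advance, the same idea can be written without the hardcoded 7. A small sketch on a made-up matrix m (not the dist matrix from the question):

```r
# Hypothetical 3 x 4 matrix; row 1 is 1, 4, 7, 10
m <- matrix(1:12, nrow = 3)

from_third <- m[1, 3:ncol(m)]   # columns 3 up to the last one
drop_first <- m[1, -(1:2)]      # same result: drop columns 1 and 2
```

Both forms return 7 and 10 here. Negative indices remove positions, which is often the shortest way to say "everything from the third column on".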

A better way to plot lots of lines (in ggplot perhaps)?

Using R 3.0.2, I have a dataframe that looks like
head()
0 5 10 15 30 60 120 180 240
YKL134C 0.08 -0.03 -0.74 -0.92 -0.80 -0.56 -0.54 -0.42 -0.48
YMR056C -0.33 -0.26 -0.56 -0.58 -0.97 -1.47 -1.31 -1.53 -1.55
YBR085W 0.55 3.33 4.11 3.47 2.16 2.19 2.01 2.09 1.55
YJR155W -0.44 -0.92 -0.27 0.75 0.28 0.45 0.45 0.38 0.51
YNL331C 0.42 0.01 -0.05 0.23 0.19 0.43 0.73 0.95 0.86
YOL165C -0.49 -0.46 -0.25 0.03 -0.26 -0.16 -0.12 -0.37 -0.34
Where row.names() are variable names, names() are measurement times, and the values are measurements. It's several thousand rows deep. Let's call it tmp.
I want to do a sanity check of plotting every variable as time versus value as a line-plot on one plot. What's a better way to do it than naively plotting each line with plot() and lines():
timez <- names(tmp)
plot(x=timez, y=tmp[1,], type="l", ylim=c(-5,5))
for (i in 2:length(tmp[,1])) {
lines(x=timez,y=tmp[i,])
}
The above crude answer is good enough, but I'm looking for a way to do this right. I had a concussion recently, so sorry if I'm missing something obvious. I've been doing that a lot.
Could it be something with transposing the data.frame so it's each timepoint observed across several thousand variables? Or melt()-ing the data.frame in some meaningful way? Is there some way of handling it in ggplot using aggregate()s of data.frames or something? This isn't the right way to do this, is it?
At a loss.
I personally prefer ggplot2 for all of my plotting needs. Assuming I've understood you correctly, you can put the data in long format with reshape2 and then use ggplot2 to plot all of your lines on the same plot:
library(reshape2)
df2<-melt(df,id.var="var")
names(df2)<-c("var","time","value")
df2$time<-as.numeric(substring(df2$time,2))
library(ggplot2)
ggplot(df2,aes(x=time,y=value,colour=var))+geom_line()
You can simply use matplot as follows
DF
## 0 5 10 15 30 60 120 180 240
## YKL134C 0.08 -0.03 -0.74 -0.92 -0.80 -0.56 -0.54 -0.42 -0.48
## YMR056C -0.33 -0.26 -0.56 -0.58 -0.97 -1.47 -1.31 -1.53 -1.55
## YBR085W 0.55 3.33 4.11 3.47 2.16 2.19 2.01 2.09 1.55
## YJR155W -0.44 -0.92 -0.27 0.75 0.28 0.45 0.45 0.38 0.51
## YNL331C 0.42 0.01 -0.05 0.23 0.19 0.43 0.73 0.95 0.86
## YOL165C -0.49 -0.46 -0.25 0.03 -0.26 -0.16 -0.12 -0.37 -0.34
matplot(t(DF), type = "l", xaxt = "n", ylab = "")
axis(side = 1, at = seq_along(names(DF)), labels = names(DF))
xaxt = "n" suppresses plotting of the x-axis annotations. The axis function lets you specify the details of any axis; here it is used to set the x-axis labels.
It should produce a plot like the one below.
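A possible variation, sketched on the first two rows and first three time points of the data above: if the column names are converted to numbers and passed as x, matplot spaces the points by actual time rather than evenly, and no separate axis call is needed. The variable name times is my own.

```r
# First two rows and three time points from the data above
DF <- data.frame(`0`  = c(0.08, -0.33),
                 `5`  = c(-0.03, -0.26),
                 `10` = c(-0.74, -0.56),
                 row.names = c("YKL134C", "YMR056C"),
                 check.names = FALSE)

times <- as.numeric(names(DF))   # 0, 5, 10

# t(DF) has one column per variable, so each variable becomes one line
matplot(x = times, y = t(DF), type = "l", lty = 1,
        xlab = "time", ylab = "value")
```

With several thousand rows, lty = 1 and a single colour (for example col = "grey40") may be easier to read than the default cycling line styles.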
