A better way to plot lots of lines (in ggplot perhaps)? - r

Using R 3.0.2, I have a data frame that looks like this:
head(tmp)
0 5 10 15 30 60 120 180 240
YKL134C 0.08 -0.03 -0.74 -0.92 -0.80 -0.56 -0.54 -0.42 -0.48
YMR056C -0.33 -0.26 -0.56 -0.58 -0.97 -1.47 -1.31 -1.53 -1.55
YBR085W 0.55 3.33 4.11 3.47 2.16 2.19 2.01 2.09 1.55
YJR155W -0.44 -0.92 -0.27 0.75 0.28 0.45 0.45 0.38 0.51
YNL331C 0.42 0.01 -0.05 0.23 0.19 0.43 0.73 0.95 0.86
YOL165C -0.49 -0.46 -0.25 0.03 -0.26 -0.16 -0.12 -0.37 -0.34
Where row.names() are variable names, names() are measurement times, and the values are measurements. It's several thousand rows deep. Let's call it tmp.
I want to do a sanity check of plotting every variable as time versus value as a line-plot on one plot. What's a better way to do it than naively plotting each line with plot() and lines():
timez <- names(tmp)
plot(x=timez, y=tmp[1,], type="l", ylim=c(-5,5))
for (i in 2:length(tmp[,1])) {
lines(x=timez,y=tmp[i,])
}
The above crude answer is good enough, but I'm looking for a way to do this right. I had a concussion recently, so sorry if I'm missing something obvious. I've been doing that a lot.
Could it be something with transposing the data.frame so it's each timepoint observed across several thousand variables? Or melt()-ing the data.frame in some meaningful way? Is there some way of handling it in ggplot using aggregate()s of data.frames or something? This isn't the right way to do this, is it?
At a loss.

I personally prefer ggplot2 for all of my plotting needs. Assuming I've understood you correctly, you can put the data in long format with reshape2 and then use ggplot2 to plot all of your lines on the same plot:
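library(reshape2)
library(ggplot2)
# df is assumed to have a "var" column of variable names plus one column per time point (X0, X5, ...)
df2 <- melt(df, id.vars = "var")
names(df2) <- c("var", "time", "value")
# drop the leading "X" from the column names before converting to numeric
df2$time <- as.numeric(substring(df2$time, 2))
ggplot(df2, aes(x = time, y = value, colour = var)) + geom_line()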
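The answer above assumes a data frame with a "var" column and X-prefixed time columns, whereas the question's tmp keeps the variable names in row.names(). A minimal sketch of bridging the two, using three rows copied from the question: the long format is built with base R (as.table() instead of melt(), which is a swapped-in technique, not the answerer's), and the ggplot2 call only runs if that package is installed. Using group = var with a low alpha instead of colour = var is my own suggestion for several thousand lines, since a legend with thousands of entries is unlikely to be readable.

```r
# Three sample rows copied from the question; row names are variable names,
# column names are measurement times.
m <- rbind(
  YKL134C = c(0.08, -0.03, -0.74, -0.92, -0.80, -0.56, -0.54, -0.42, -0.48),
  YMR056C = c(-0.33, -0.26, -0.56, -0.58, -0.97, -1.47, -1.31, -1.53, -1.55),
  YBR085W = c(0.55, 3.33, 4.11, 3.47, 2.16, 2.19, 2.01, 2.09, 1.55)
)
colnames(m) <- c(0, 5, 10, 15, 30, 60, 120, 180, 240)
tmp <- data.frame(m, check.names = FALSE)

# Long format in base R: one row per (variable, time) pair.
long <- as.data.frame(as.table(as.matrix(tmp)), stringsAsFactors = FALSE)
names(long) <- c("var", "time", "value")
long$time <- as.numeric(long$time)

# Plot only if ggplot2 is available; one semi-transparent line per variable.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = time, y = value, group = var)) +
    geom_line(alpha = 0.3)
  print(p)
}
```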

You can simply use matplot as follows
DF
## 0 5 10 15 30 60 120 180 240
## YKL134C 0.08 -0.03 -0.74 -0.92 -0.80 -0.56 -0.54 -0.42 -0.48
## YMR056C -0.33 -0.26 -0.56 -0.58 -0.97 -1.47 -1.31 -1.53 -1.55
## YBR085W 0.55 3.33 4.11 3.47 2.16 2.19 2.01 2.09 1.55
## YJR155W -0.44 -0.92 -0.27 0.75 0.28 0.45 0.45 0.38 0.51
## YNL331C 0.42 0.01 -0.05 0.23 0.19 0.43 0.73 0.95 0.86
## YOL165C -0.49 -0.46 -0.25 0.03 -0.26 -0.16 -0.12 -0.37 -0.34
matplot(t(DF), type = "l", xaxt = "n", ylab = "")
axis(side = 1, at = seq_along(DF), labels = names(DF))
xaxt = "n" suppresses the default x-axis annotations. The axis() function lets you specify the details of any axis; here it is used to set the x-axis labels to the column names.
This draws one line per row of DF.
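Note that the call above places the time points at evenly spaced positions 1 to 9. If the x axis should reflect the actual measurement times, one option is to pass the numeric times to matplot() as x. A minimal sketch, assuming the column names can be converted with as.numeric() and using three rows from the question:

```r
# Three sample rows from the question, with times as column names.
m <- rbind(
  YKL134C = c(0.08, -0.03, -0.74, -0.92, -0.80, -0.56, -0.54, -0.42, -0.48),
  YMR056C = c(-0.33, -0.26, -0.56, -0.58, -0.97, -1.47, -1.31, -1.53, -1.55),
  YBR085W = c(0.55, 3.33, 4.11, 3.47, 2.16, 2.19, 2.01, 2.09, 1.55)
)
colnames(m) <- c(0, 5, 10, 15, 30, 60, 120, 180, 240)
DF <- data.frame(m, check.names = FALSE)

# Numeric time axis taken from the column names.
times_num <- as.numeric(names(DF))

# t(DF) has one column per variable, so matplot() draws one line per variable.
matplot(x = times_num, y = t(DF), type = "l", lty = 1,
        xlab = "time", ylab = "value")
```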

Related

Getting a null score vector using princomp function in R

I'm trying to do PCA analysis on some data. I'm not given the raw data, just the correlation matrix in this way:
Tmax Tmin P H PT V Vmax
Tmax 1.00 0.70 -0.08 -0.41 -0.09 -0.23 -0.08
Tmin 0.70 1.00 -0.30 0.07 0.14 -0.03 -0.01
P -0.08 -0.30 1.00 -0.18 -0.13 -0.29 -0.25
H -0.41 0.07 -0.18 1.00 0.32 -0.15 -0.19
PT -0.09 0.14 -0.13 0.32 1.00 0.11 0.07
V -0.23 -0.03 -0.29 -0.15 0.11 1.00 0.83
Vmax -0.08 -0.01 -0.25 -0.19 0.07 0.83 1.00
For this I'm trying to use the princomp() function, since it has the covmat option that lets me supply the data as a correlation matrix. For the PCA I'm using the following code:
pca_prim <- princomp(covmat=Primavera, cor = T, scores = TRUE)
I need the scores in order to plot a biplot in following steps but the scores vector I get is null:
biplot(pca_prim)
Error in biplot.princomp(pca_prim) : object 'pca_prim' has no scores
pca_prim$scores
NULL
I can't seem to find what the problem is in order to get the scores. Any suggestions?
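A likely explanation is that scores are the projections of individual observations onto the components, and a correlation matrix alone does not contain those observations. The sketch below illustrates the behaviour with the built-in USArrests data as a stand-in (an assumption; it is not the asker's data): princomp() returns NULL scores when only covmat is supplied and non-NULL scores when the raw data are supplied.

```r
# Stand-in raw data (not the asker's): built-in USArrests, 50 rows x 4 columns.
X <- USArrests
R <- cor(X)

# PCA from the correlation matrix only: loadings are available, scores are not.
pca_from_cor <- princomp(covmat = R)

# PCA from the raw data (standardised via cor = TRUE): scores are available.
pca_from_raw <- princomp(X, cor = TRUE)

is.null(pca_from_cor$scores)
dim(pca_from_raw$scores)
```

Without the raw observations, the loadings in pca_from_cor$loadings can still be inspected or plotted, but a biplot of observations is not possible.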

Show different data in top and bottom of Rcirclize

I have 2 dataframes with different number of rows and columns, and I'd like to show both of them in a circos plot with circlize.
My data looks like this:
df1=data.frame(replicate(7,sample(-200:200,200,rep=TRUE))/100)
df2=data.frame(replicate(2,sample(-200:200,200,rep=TRUE))/100)
#head(df1)
X1 X2 X3 X4 X5 X6 X7
1 -0.03 0.63 -0.33 0.73 -1.37 -1.39 1.96
2 -1.81 -1.24 -1.63 1.58 0.13 1.39 -0.76
3 0.02 -2.00 -1.93 -1.35 1.06 -0.58 -0.77
4 -1.11 -1.38 -0.66 -0.40 1.69 -0.47 -1.55
5 0.98 0.06 0.00 -0.35 1.97 1.74 0.72
6 1.51 -1.68 -0.44 -1.74 0.15 0.26 0.36
#head(df2)
X1 X2
1 0.16 -0.81
2 -1.38 -0.16
3 -0.22 -0.74
4 0.73 -0.82
5 0.58 -1.87
6 -0.63 1.50
I want to build a single circos plot where the top is showing df1 and bottom is showing df2, but I can only show individual dfs. For instance, this is how I show df1:
col_fun1=colorRamp2(c(min(df1), 0, max(df1)), c("blue", "white", "red"))
circos.heatmap(df1, col = col_fun1, cluster = T, track.height = 0.2, rownames.side = "outside", rownames.cex = 0.6)
circos.clear()
How can I show df1 only in the top half, and df2 only in the bottom half?

How to retrieve observation scores for each Principal Component in R using principal Function

pc_unrotate = principal(correlate1,nfactors = 4,rotate = "none")
print(pc_unrotate)
output:
Principal Components Analysis
Call: principal(r = correlate1, nfactors = 4, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
PC1 PC2 PC3 PC4 h2 u2 com
ProdQual 0.25 -0.50 -0.08 0.67 0.77 0.232 2.2
Ecom 0.31 0.71 0.31 0.28 0.78 0.223 2.1
TechSup 0.29 -0.37 0.79 -0.20 0.89 0.107 1.9
CompRes 0.87 0.03 -0.27 -0.22 0.88 0.119 1.3
Advertising 0.34 0.58 0.11 0.33 0.58 0.424 2.4
ProdLine 0.72 -0.45 -0.15 0.21 0.79 0.213 2.0
SalesFImage 0.38 0.75 0.31 0.23 0.86 0.141 2.1
ComPricing -0.28 0.66 -0.07 -0.35 0.64 0.359 1.9
WartyClaim 0.39 -0.31 0.78 -0.19 0.89 0.108 2.0
OrdBilling 0.81 0.04 -0.22 -0.25 0.77 0.234 1.3
DelSpeed 0.88 0.12 -0.30 -0.21 0.91 0.086 1.4
PC1 PC2 PC3 PC4
SS loadings 3.43 2.55 1.69 1.09
Proportion Var 0.31 0.23 0.15 0.10
Cumulative Var 0.31 0.54 0.70 0.80
Proportion Explained 0.39 0.29 0.19 0.12
Cumulative Proportion 0.39 0.68 0.88 1.00
Mean item complexity = 1.9
Test of the hypothesis that 4 components are sufficient.
The root mean square of the residuals (RMSR) is 0.06
Fit based upon off diagonal values = 0.97
Now I need to get the scores. I tried pc_unrotate$scores, but it returns NULL.
I ran names(pc_unrotate) and found that the scores element is missing. What can I do to get the PCA scores?
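Add the argument scores = TRUE to the principal() call (see https://www.rdocumentation.org/packages/psych/versions/1.9.12.31/topics/principal):
pc_unrotate = principal(correlate1, nfactors = 4, rotate = "none", scores = TRUE)
Note that scores can only be computed from the raw observations, so if correlate1 is a correlation matrix rather than the raw data, pass the raw data instead.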

R: Why can't I use dimnames() to assign dim names

fg = read.table("fungus.txt", header=TRUE, row.names=1);fg
names(dimnames(fg)) = c("Temperature", "Area");names(dimnames(fg))#doesn't work
dimnames(fg) = list("Temperature"=row.names(fg), "Area"=colnames(fg));dimnames(fg)
#doesn't work
Using dimnames() to assign dim names to the data frame doesn't work.
Neither of the two commands works: the dimnames of fg don't change, and names(dimnames(fg)) is still NULL.
Why does this happen? How can I change the dimnames of this data frame?
In the end I found that converting the data frame to a matrix works well.
fg = as.matrix(read.table("fungus.txt", header=TRUE, row.names=1))
dimnames(fg) = list("Temp"=row.names(fg), "Isolate"=1:8);fg
And got the output:
Isolate
Temp 1 2 3 4 5 6 7 8
55 0.66 0.67 0.43 0.41 0.69 0.63 0.46 0.52
60 0.82 0.81 0.80 0.79 0.85 0.91 0.53 0.66
65 0.91 1.09 0.81 0.86 0.95 0.93 0.64 1.10
70 1.02 1.22 1.03 1.08 1.10 1.13 0.80 1.17
75 1.06 1.17 0.89 1.02 1.06 1.29 0.94 1.01
80 0.80 0.81 0.73 0.77 0.80 0.79 0.59 0.95
85 0.26 0.40 0.36 0.53 0.67 0.53 0.57 0.18
Reply to the comments: if you do not know anything about the code, then do not ask me why I post such a question.
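The difference described above can be reproduced without fungus.txt. In the sketch below, a small made-up data frame (two rows borrowed from the output above, an assumption for illustration) shows that assigning a named list to dimnames() of a data frame only updates the row and column names, because a data frame stores just row.names and names. The same assignment on a matrix keeps the names of the list.

```r
# Small stand-in for fg: two temperatures, two isolates.
df <- data.frame(a = c(0.66, 0.82), b = c(0.67, 0.81),
                 row.names = c("55", "60"))

# On a data frame the list names ("Temp", "Isolate") are dropped.
dimnames(df) <- list(Temp = row.names(df), Isolate = 1:2)
names(dimnames(df))   # NULL

# On a matrix the list names are kept and printed as axis titles.
m <- as.matrix(df)
dimnames(m) <- list(Temp = rownames(m), Isolate = 1:2)
names(dimnames(m))    # "Temp" "Isolate"
m
```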

Put result of forecast::ma() as a matrix and compute RMSE

I am really new to R. I am trying to calculate some MA[n] forecasts in R.
Here is my code,
# simple reproducible example
set.seed(0); factory <- round(rnorm(84), 1)
library(forecast)
factory.ts <- ts(factory, start = 1947, frequency = 12)
fit_EMA <- ma(factory.ts, order=5)
It works fine. Below is what fit_EMA looks like in R console. But I don't like the format as I couldn't find a way to take those fitted points for further usage. For example, how can I extract a row or column?
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1947 NA NA 0.80 0.24 0.12 -0.20 -0.46 -0.06 0.40 0.42 0.26 0.20
1948 -0.34 -0.58 -0.36 -0.32 -0.18 -0.36 -0.32 -0.30 -0.10 -0.02 0.20 0.34
1949 0.48 0.32 -0.10 -0.08 -0.22 -0.54 -0.48 -0.34 -0.20 0.08 0.38 0.38
1950 0.74 0.54 0.66 0.58 0.56 0.16 -0.02 -0.60 -1.04 -0.70 -0.38 -0.18
1951 0.10 0.34 0.58 0.26 0.28 0.28 0.48 -0.04 -0.32 -0.56 -0.54 -0.66
1952 -0.80 -0.38 -0.28 -0.32 -0.60 -0.34 -0.28 -0.10 -0.14 0.20 0.00 -0.06
1953 0.06 0.28 0.24 0.34 0.18 -0.24 -0.62 -0.38 -0.20 -0.06 NA NA
Also, how can I calculate RMSE or other error measures? forecast::ma, TTR::SMA and TTR::EMA don't report any calculated error measures in their summary. Or have I missed a library function?
The result of forecast::ma() is always a "ts" object. Although your fit_EMA appears as a matrix when you print it to screen (because frequency = 12, so you have 12 columns), it is essentially a vector. You can use str(fit_EMA) to inspect it. You can do
mat <- matrix(fit_EMA, ncol = 12, byrow = TRUE)
to get a matrix. Then mat[1, ] gives the fitted values for the first year (year 1947).
Getting RMSE is so straightforward that a function / library routine is not needed. Do:
MSE <- mean((fit_EMA - factory.ts) ^ 2, na.rm = TRUE)
# [1] 0.55876
RMSE <- sqrt(MSE)
# [1] 0.7475025
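The steps above can be put together in one self-contained sketch. To avoid depending on the forecast package, it uses stats::filter() with equal weights as a stand-in for forecast::ma(factory.ts, order = 5); I believe the two agree for an odd order, but treat that as an assumption. The dimnames on the matrix and the window() call are additions for convenience, not part of the original answer.

```r
set.seed(0)
factory <- round(rnorm(84), 1)
factory.ts <- ts(factory, start = 1947, frequency = 12)

# Centred 5-term moving average (base-R stand-in for forecast::ma, order = 5).
fit <- stats::filter(factory.ts, rep(1 / 5, 5), sides = 2)

# Reshape the underlying vector into a year-by-month matrix.
mat <- matrix(fit, ncol = 12, byrow = TRUE,
              dimnames = list(as.character(1947:1953), month.abb))
mat["1948", ]   # all fitted values for 1948
mat[, "Mar"]    # the March value of every year

# Alternatively, subset the ts object directly by time.
y1950 <- window(fit, start = c(1950, 1), end = c(1950, 12))

# RMSE of the fitted values against the observed series, ignoring the NA ends.
rmse <- sqrt(mean((fit - factory.ts)^2, na.rm = TRUE))
rmse
```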
