R - Read matrix data from file with free spaces

I have a map of efficiency data stored as a matrix in a text file. As the row number increases, data points are missing at the end of the line because no data are available. When I try to read the data as a matrix in R:
mymatrix = as.matrix(read.table(file="~/Desktop/Map_.txt"))
I get an error message saying that line 3 (in this case the first row with missing values at its end) does not have all the expected values.
Is there a way to read the map file as a matrix even when some lines have free spaces (no values) at the end?
Thank you!
Example:
1000 2000 3000 4000 5000
-1 85 75 65 60 58
-2 86 74 64 58 52
-3 83 78 68 59
-4 86 80 72
-5 86 81 71
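One possible approach, sketched under the assumption that values are only ever missing at the end of a line: read.table has a fill argument that pads short lines with NA. The sample is inlined through text= so the snippet runs on its own; for the real file you would pass file="~/Desktop/Map_.txt" instead. check.names = FALSE is my own addition, used to keep the column names as 1000, 2000, ... rather than X1000, X2000, ...

```r
# Sample data from the question, inlined so the snippet is self-contained
txt <- "1000 2000 3000 4000 5000
-1 85 75 65 60 58
-2 86 74 64 58 52
-3 83 78 68 59
-4 86 80 72
-5 86 81 71"

# The header has one field fewer than the first data lines, so read.table()
# uses the first column as row names; fill = TRUE pads short lines with NA
mymatrix <- as.matrix(read.table(text = txt, header = TRUE, fill = TRUE,
                                 check.names = FALSE))
mymatrix
```

Note that with fill = TRUE the number of columns is determined from the first five lines of input, so this relies on a complete row appearing near the top of the file, as it does here.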

Related

rowsums accross specific row in a matrix

final.marks
# raj sanga rohan rahul
#physics 45 43 44 49
#chemistry 47 45 48 47
#total 92 88 92 96
This is the matrix I have. I want to find the total for each subject across its row and add the totals to the matrix as a 5th column. However, my code, class.marks.chemistry <- rowSums(final.marks[2,]), keeps producing this error:
Error in rowSums(final.marks[2, ]) :
'x' must be an array of at least two dimensions
Can you please help me solve it? I am very new to R and have no scripting or programming background.
Do you mean this?
# Sample data
df <- read.table(text =
" raj sanga rohan rahul
physics 45 43 44 49
chemistry 47 45 48 47
total 92 88 92 96", header = TRUE)
# Add column total with row sum
df$total <- rowSums(df)
df
# raj sanga rohan rahul total
#physics 45 43 44 49 181
#chemistry 47 45 48 47 187
#total 92 88 92 96 368
rowSums also works if df is a matrix instead of a data.frame, although the new column then has to be added with cbind rather than with $.
If you look at ?rowSums you can see that the x argument needs to be
an array of two or more dimensions, containing numeric,
complex, integer or logical values, or a numeric data frame.
So in your case we must pass the entire data.frame (or matrix) as an argument, rather than a single row as you did: final.marks[2, ] is dropped to a plain vector, which has no dimensions.
Another option would be to use addmargins on a matrix
addmargins(as.matrix(df), 2)
# raj sanga rohan rahul Sum
#physics 45 43 44 49 181
#chemistry 47 45 48 47 187
#total 92 88 92 96 368
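For the matrix case from the question, here is a minimal sketch. The matrix is rebuilt by hand from the printed values, and the names chem, chem2 and with.total are only illustrative. It shows two ways to total a single row (sum on the vector, or rowSums with drop = FALSE so the row keeps its dimensions) and how to append the totals with cbind.

```r
# Rebuild the matrix from the question
final.marks <- matrix(c(45, 43, 44, 49,
                        47, 45, 48, 47,
                        92, 88, 92, 96),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("physics", "chemistry", "total"),
                                      c("raj", "sanga", "rohan", "rahul")))

# A single row is a plain vector, so sum() is enough
chem <- sum(final.marks["chemistry", ])

# drop = FALSE keeps the row as a 1 x 4 matrix, which rowSums() accepts
chem2 <- rowSums(final.marks["chemistry", , drop = FALSE])

# Totals for all rows, appended as a 5th column
with.total <- cbind(final.marks, total = rowSums(final.marks))
with.total
```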

What can I do to find and remove semi-duplicate rows in a matrix?

Assume I have this matrix
set.seed(123)
x <- matrix(rnorm(410),205,2)
x[8,] <- c(0.13152348, -0.05235148) #similar to x[5,]
x[16,] <- c(1.21846582, 1.695452178) #similar to x[11,]
The values are very similar to the rows specified above, and in the context of the whole data, they are semi-duplicates. What could I do to find and remove them? My original data is an array that contains many such matrices, but the position of the semi duplicates is the same across all matrices.
I know of agrep but the function operates on vectors as far as I understand.
You will need to set a threshold, but you can just compute the distance between each pair of rows using dist and find the points that are sufficiently close together. Of course, each point is near itself, so you need to ignore the diagonal of the distance matrix.
DM = as.matrix(dist(x))
diag(DM) = 1 ## ignore diagonal
which(DM < 0.025, arr.ind=TRUE)
    row col
8     8   5
5     5   8
16   16  11
11   11  16
48   48  20
20   20  48
168 168  71
91   91  73
73   73  91
71   71 168
This finds the "close" points that you created and a few others that got generated at random.
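To go from finding to removing, one option is sketched below. It assumes you want to keep the first row of each close pair and drop the later one, and it reuses the 0.025 threshold from above; drop.rows, keep and x.clean are just illustrative names. Restricting the search to the upper triangle lists each pair once, with the later row in the col position.

```r
set.seed(123)
x <- matrix(rnorm(410), 205, 2)
x[8, ]  <- c(0.13152348, -0.05235148)   # similar to x[5, ]
x[16, ] <- c(1.21846582, 1.695452178)   # similar to x[11, ]

DM <- as.matrix(dist(x))

# Upper triangle only: each pair appears once, with row < col
close <- which(DM < 0.025 & upper.tri(DM), arr.ind = TRUE)

# Drop the later member of every close pair, keep everything else
drop.rows <- unique(close[, "col"])
keep <- setdiff(seq_len(nrow(x)), drop.rows)
x.clean <- x[keep, , drop = FALSE]
```

Because the positions are the same across all matrices in your array, the same keep index could be reused on the array's first dimension. setdiff is used instead of negative indexing so that an empty drop.rows does not wipe out every row.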

Converting spectral density values produced by spectrum() in R to values produced by SAS PROC SPECTRA

I am converting SAS programs that demonstrate temporal data analysis into R. I would like to reproduce SAS PROC SPECTRA output using R.
So, my question is, can the spectral density values produced by the spectrum() function in R be converted into the values for spectral density produced by SAS PROC SPECTRA?
DATA AR1_09;
INPUT t U;
OUTPUT;
CARDS;
1 -5.19859
2 4.91364
3 -3.86515
4 4.02932
5 -4.12263
6 3.46548
7 -3.01139
8 3.13753
9 -2.34875
10 2.1531
11 -2.01086
12 1.88911
13 -2.22766
14 1.94077
15 0.1786
16 0.84228
17 -1.51301
18 2.62644
19 -3.44148
20 3.13813
21 -2.34959
22 2.70754
23 -2.54789
24 2.04427
25 -2.34041
26 1.13443
27 -0.11853
28 0.74645
29 0.02448
30 0.57811
31 -1.54715
32 1.05646
33 -0.56458
34 0.6863
35 -0.53347
36 0.60813
37 -1.22044
38 0.13136
39 -0.45568
40 0.13459
41 -0.10892
42 0.46324
43 1.01367
44 -2.44015
45 1.62849
46 1.54928
47 -2.7146
48 2.20448
49 -1.58668
50 1.06419
51 -1.41402
52 1.30755
53 -1.55331
54 1.58191
55 -2.38216
56 1.45702
57 0.79562
58 -0.91078
59 -0.59827
60 1.44958
61 -1.81996
62 -0.05101
63 -0.13188
64 1.34861
65 -1.81912
66 0.73641
67 -0.32049
68 -0.37179
69 2.26288
70 -2.2773
71 0.95193
72 -1.24679
73 0.67123
74 -0.40868
75 1.46308
76 -0.71945
77 1.07481
78 -2.25127
79 1.87573
80 -1.52811
81 1.27772
82 -2.96657
83 3.58684
84 -1.7656
85 2.92004
86 -2.36525
87 2.17087
88 -1.65458
89 0.86588
90 0.19505
91 -2.34264
92 3.51124
93 -3.33501
94 3.13522
95 -1.8957
96 0.93527
97 -0.96551
98 0.08307
99 -0.14018
100 0.48641
;
PROC SPECTRA DATA=AR1_09 OUT=AR1_09PSPEC1 P S WHITETEST;
VAR U;
WEIGHTS 1 2 1;
RUN;
PROC SPECTRA DATA=AR1_09 OUT=AR1_09PSPEC2 S WHITETEST;
VAR U;
WEIGHTS 1 2 3 4 5 4 3 2 1;
RUN;
DATA AR1_09PSPEC12;
SET AR1_09PSPEC1;
n=100;
fre=0.5*n*FREQ/(4*ATAN(1));
P_01=P_01/(16*ATAN(1));
KEEP fre P_01 S_01;
RUN;
DATA AR1_09PSPEC22;
SET AR1_09PSPEC2;
n=100;
fre=0.5*n*FREQ/(4*ATAN(1));
S_02=S_01;
KEEP fre S_02;
DATA AR1_09TRUESPEC;
SET AR1_09PSPEC1;
n=100;
rho=-0.9;
theoreticalS=1.0/(8*ATAN(1)*(1-2*rho*cos(FREQ)+rho*rho));
fre=0.5*n*FREQ/(4*ATAN(1));
KEEP fre theoreticalS;
DATA AR1_09PSPEC;
MERGE AR1_09PSPEC12 AR1_09PSPEC22 AR1_09TRUESPEC;
PROC PRINT DATA=AR1_09PSPEC;
VAR fre P_01 S_01 S_02 theoreticalS;
RUN;
So far, after entering the data using read.xlsx() in R, this is what I've got:
AR1.09 <- as.ts(AR1.09[, 2])
install.packages("forecast")
library(forecast)
Using ma for the moving average smoother of order 2.
MA2AR1.09 <- ma(AR1.09, order = 2)
AR1.09PSPEC1 <- spectrum(na.omit(MA2AR1.09))
Tests for White Noise for Variable U
Box.test (MA2AR1.09)
Box.test (MA2AR1.09, type = "Ljung")
Periodogram from Fourier analysis. The periodogram values produced by SAS look to be approximately P (below) times 4. This isn't surprising; I have been told that some software produces periodogram values divided by 4pi.
n <- length(AR1.09)
FF <- abs(fft(AR1.09) / sqrt(n))^2
P <- (4 / n) * FF[1:((n / 2) + 1)]
f <- (0:(n/2)) / n
plot(f, P, type = "h")
So, the $spec values that are the spectral density values produced by R's spectrum() function are not the same as those produced by SAS PROC SPECTRA. Can I transform my R values into the SAS values?
That's all. Thank you for your time.
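Not a full answer, but a sketch that may help narrow down the differences on the R side. By default spectrum() (via spec.pgram) detrends the series, applies a 10% cosine taper, and may pad the series, so its $spec values differ from a hand-computed periodogram before any scaling convention is considered. Also, as far as I understand, WEIGHTS in PROC SPECTRA smooths the periodogram, whereas ma() smooths the series itself; in R the closer analogue would be the kernel argument. The series u below is a simulated stand-in (an assumption, not your data), and the scaling that PROC SPECTRA applies is not reproduced here; that would need to be checked against the SAS documentation.

```r
set.seed(1)
# Hypothetical stand-in for the series U: a simulated AR(1) with rho = -0.9
u <- arima.sim(model = list(ar = -0.9), n = 100)
n <- length(u)

# Raw periodogram by hand at the Fourier frequencies k/n, k = 1..n/2
manual <- (Mod(fft(as.numeric(u)))^2 / n)[2:(n / 2 + 1)]

# The same values from spec.pgram() once tapering, detrending and padding are off
raw <- spec.pgram(u, taper = 0, detrend = FALSE, fast = FALSE, plot = FALSE)
all.equal(as.numeric(raw$spec), manual)

# Smoothing the periodogram (not the series) with weights proportional to 1 2 1
smooth <- spec.pgram(u, kernel = kernel("modified.daniell", 1), taper = 0,
                     detrend = FALSE, fast = FALSE, plot = FALSE)
```

With these settings raw$freq is in cycles per observation (k/n), so any remaining difference from the SAS output should come down to a constant scale factor and the frequency units.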

Data frame to 3D array and calculate mean in Z

I have a data frame read from CSV which contains 14 columns and 990 rows. Each set of 110 rows contains repeats of structured data (not the values) with the first 5 columns being labels.
I now want to create a new grid of 14x110, such that if columns are labelled with letters and rows are numbered numerically, then A1 to E110 of the new grid are the labels and F1 contains the mean average of F1 in the original frame, and so on through to N110.
I have never used R before, and have got as far as calculating the mean of one cell with
mean(data[c(seq.int(3,nrow(data),110)),c(6)])
but I need some help with repeating this for the rest of the cells and constructing a resulting data frame, please.
To transform a matrix whose repeats are stacked row-wise into a 3D array, keep in mind that R fills arrays column by column, so the repeat index has to be the second dimension:
yourarray=array(unlist(yourmatrix),dim = c(110,9,14))
Here yourarray[i,k,n] is row i of repeat k in column n. Then, to take an average over the repeats, you can do something like
out=apply(yourarray,c(1,3),mean)
Example
a=matrix(1:125,25,5) # 5 repeats of 5 rows, 5 columns
b=array(unlist(a),dim = c(5,5,5)) # row within repeat, repeat, column
out=apply(b,c(1,3),mean)
> out
     [,1] [,2] [,3] [,4] [,5]
[1,]   11   36   61   86  111
[2,]   12   37   62   87  112
[3,]   13   38   63   88  113
[4,]   14   39   64   89  114
[5,]   15   40   65   90  115
Hope this is what you're after.
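For a data frame with label columns, as in the question, the reshaping should only be applied to the numeric columns, because unlist or as.matrix on mixed columns gives character values. A small sketch with made-up data (2 label columns, 2 value columns, 3 repeats of 4 rows; all names here are hypothetical). Since R fills arrays column by column, the repeat index is the second dimension of the array:

```r
set.seed(1)
n.rows <- 4   # rows per repeat (110 in the question)
n.rep  <- 3   # number of repeats (9 in the question)

# Made-up stand-in for the CSV data
d <- data.frame(label1 = rep(letters[1:n.rows], times = n.rep),
                label2 = rep(1:n.rows, times = n.rep),
                v1 = rnorm(n.rows * n.rep),
                v2 = rnorm(n.rows * n.rep))

num.cols <- 3:4   # value columns (6:14 in the question)
vals <- as.matrix(d[, num.cols])

# Dimensions: row within repeat, repeat, value column
arr <- array(vals, dim = c(n.rows, n.rep, length(num.cols)))

# Average over the repeat dimension
means <- apply(arr, c(1, 3), mean)
colnames(means) <- names(d)[num.cols]

# Labels from the first repeat, followed by the averaged values
result <- cbind(d[seq_len(n.rows), -num.cols], means)
result
```

The single-cell calculation from the question is a useful spot check: result$v1[3] should equal mean(d$v1[seq(3, nrow(d), n.rows)]).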

Display values on heatmap in R

I am working on a heatmap using heatmap.2 and would like to know if there is any way to display the values on all heatmap positions. For example, for the area representing "1" and rating I would like to display the value "43", for "1" and complaints the value "51", and so on.
My sample data is as follows:
rating complaints privileges learning raises critical advance
1 43 51 30 39 61 92 45
2 63 64 51 54 63 73 47
3 71 70 68 69 76 86 48
4 61 63 45 47 54 84 35
Is this what you mean? By providing the data object as the cellnote argument, the values are printed in the heatmap.
library(gplots)
heatmap.2(as.matrix(data), # heatmap.2 needs a numeric matrix
cellnote=as.matrix(data), # cell labeling
notecex=1.0,
notecol="cyan",
na.color=par("bg"))
This addresses only the follow-up question, "For cell labeling, is there any way not to display values that are 0?"
cellnote=ifelse(data==0, NA, data) will work as you want.
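Putting the two suggestions together, a sketch built on the sample data from the question. The zero in one cell is made up purely to show the effect, and notes is just an illustrative name. The plotting call is guarded so the snippet still runs when gplots is not installed.

```r
# Sample data from the question as a numeric matrix
data <- matrix(c(43, 51, 30, 39, 61, 92, 45,
                 63, 64, 51, 54, 63, 73, 47,
                 71, 70, 68, 69, 76, 86, 48,
                 61, 63, 45, 47, 54, 84, 35),
               nrow = 4, byrow = TRUE,
               dimnames = list(as.character(1:4),
                               c("rating", "complaints", "privileges",
                                 "learning", "raises", "critical", "advance")))

data[2, 3] <- 0   # hypothetical zero, only to demonstrate the masking

# Same shape as data, with zeros replaced by NA
notes <- ifelse(data == 0, NA, data)

if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(data, cellnote = notes, notecol = "black", trace = "none")
}
```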
In Python, seaborn.heatmap displays all the values in the heatmap plot when called with annot=True.
