I am importing a file and trying to display only the numbers in each row, without any commas or labels. With the following code, my output is given below:
mydata <- read.table("/home/mukhera3/Desktop/Test/part-r-00000", sep=",")
mydata
Output
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
V461 V462 V463 V464 V465 V466 V467 V468 V469 V470 V471 V472 V473 V474 V475 V476 V477 V478 V479 V480 V481 V482 V483
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
V484 V485 V486 V487 V488 V489 V490 V491 V492 V493 V494 V495 V496 V497 V498 V499 V500 V501 V502 V503 V504 V505 V506
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
V507 V508 V509 V510 V511 V512 V513 V514 V515 V516 V517 V518 V519 V520 V521 V522 V523 V524 V525 V526 V527 V528 V529
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
V530 V531 V532 V533 V534 V535 V536 V537 V538 V539 V540 V541 V542 V543 V544 V545 V546 V547 V548 V549 V550 V551 V552
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
V553 V554 V555 V556 V557 V558 V559 V560 V561 V562 V563 V564 V565 V566 V567 V568 V569 V570 V571 V572 V573 V574 V575
When I replace the "," for sep with whitespace (sep=""), keeping everything else the same, this is what I get:
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
I want to display the numbers 0,1 .. without any commas or other row numbers etc. I am new to R programming, and do not know how to do this. Any help would be appreciated.
If you want your file to be read directly as a vector and not as a dataframe, you can, for instance, use scan instead of read.table. Example with your example file saved as a.txt in my working directory:
> mydata <- scan(file="a.txt",sep=",")
Read 46 items
> mydata
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
You can also get that result from read.table with some additional steps:
> mydata <- read.table("a.txt",sep=",") # Reads your file as a data.frame
> mydata <- unlist(mydata) # Transforms into a named vector
> names(mydata) <- NULL # Gets rid of the names
> mydata
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
If you just want to "display" it like that but don't want to change the nature of your table, you can simply use cat (combined with unlist):
> mydata <- read.table("a.txt",sep=",")
> cat(unlist(mydata))
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
I have a geotiff image created using GDAL.
I would like to know if it is possible, using GDAL and Python (or only one of them), to extract the percentage of pixels that have a specific value.
In particular, if I do:
gdalinfo -hist input.tif
I get all the metadata info and, in particular,
Size is 4901, 2867
...
Band 1 Block=4901x1 Type=Byte, ColorInterp=Palette
Minimum=0.000, Maximum=5.000, Mean=2.263, StdDev=1.135
0...10...20...30...40...50...60...70...80...90...100 - done.
256 buckets from -0.5 to 255.5:
1740973 365790 6385650 3688110 1757506 113138 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Is there a way to calculate the pixel percentage of the 6 values defined in the histogram?
The size of the .tif file is 4901x2867 so if I can extract each of those fields using GDAL and/or Python then I can calculate something like this:
pixel_value_0 = 1740973/(4901x2867)
and get the percentage of the pixel value 0
If you are using Python, you can read the raster into a NumPy array and do the calculation there:
from collections import Counter
from osgeo import gdal_array
# Read raster data as a numeric array from file
rasterArray = gdal_array.LoadFile('RGB.byte.tif')
# The 3rd band of this multi-band example file
# (a single-band file such as the one in the question is loaded as a 2D array, so use rasterArray directly)
band3 = rasterArray[2]
# Flatten the 2D array to 1D and count the occurrences of each value;
# dividing a count by band3.size then gives the fraction of pixels with that value
print(Counter(band3.flatten()))
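If the histogram printed by gdalinfo is already at hand, the percentages follow from plain arithmetic. The following is a minimal sketch in pure Python, assuming the six non-zero bucket counts and the raster size are copied by hand from the output in the question; the variable names are illustrative only and not part of any GDAL API.

```python
# Bucket counts for pixel values 0..5, copied from the
# "gdalinfo -hist" output shown in the question
counts = [1740973, 365790, 6385650, 3688110, 1757506, 113138]

# Raster dimensions reported by gdalinfo ("Size is 4901, 2867")
width, height = 4901, 2867
total_pixels = width * height

# Sanity check: the histogram should account for every pixel in the band
assert sum(counts) == total_pixels

# Percentage of the image covered by each pixel value
percentages = {value: 100.0 * n / total_pixels for value, n in enumerate(counts)}

for value, pct in percentages.items():
    print(f"pixel value {value}: {pct:.2f}%")
```

The same division works on the Counter result above: divide each count by the number of elements in the band array instead of by width times height.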
I am working on a homework assignment using intervention analysis. The question is:
Generate a simulation of the difference equation $y_t = a_0 + a_1 y_{t-1} + c_0 z_t + x_t$, where $x_t$ is the forcing process $x_t = w_t$, $w_t$ is white noise, and $|a_1| < 1$. Define the intervention variable $z_t$ as binary (0,1) but you may choose the start time of the intervention; assume the intervention lasts for 2 units of time.
So I wrote this code:
set.seed(50)
y <- w <- rnorm(200, sd=1)
alpha0 <- 1
alpha1 <- 0.9
cee0 <- 1
z <-rep(0, 200)
for (t in 1:200) {z[t] <- ifelse( t = 78:79,1,0)}
So the intervention would occur at the 78th and 79th instant.
But this does not work. I keep getting this error/warning message:
In z[t] <- ifelse(t = 77:78, 1, 0) :
number of items to replace is not a multiple of replacement length
I have tried the analysis using a continuous intervention at the 100th instant and it works fine:
z <-rep(0, 200)
for (t in 1:200) {z[t] <- ifelse( t > 100,1,0)}
So why does the t > 100 work but t = 77:78 not work? Is there something I am missing here?
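Inside a function call, = names an argument rather than comparing values, so ifelse(t = 78:79, 1, 0) passes the two-element vector 78:79 as the test and returns two values, which cannot be stored in the single element z[t]. By contrast, t > 100 is a real comparison that yields one TRUE or FALSE. To test whether t is one of several values, use %in%. You could change your command as follows (the loop is not strictly needed; z[78:79] <- 1 gives the same result).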
for (t in 1:200) {z[t] <- ifelse( t %in% 78:79,1,0)}
> z
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[57] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[113] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[169] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I am trying to write a function that returns the geometric mean of some data.
I want to call it in a loop wrapped in try() so that only valid data is processed, but what I tried didn't work.
This is the function:
R=function(g)
{
k=1
n=length(g)
for(i in 1 : n)
{
ifelse(g[i]>0, k<-k*g[i], stop("Negative component"))
k
}
t=k^(1/n)
t
}
And I want to use this function in this loop:
set.seed(123)
data <- matrix(rnorm(10000, mean=3), ncol=25, dimnames=list(NULL, paste("X",
1:25, sep=".")))
v=rep(0,400)
for(i in 1 : 400)
{
try("v[i]=R(data[,i])",TRUE)
}
v
I want to get the means for the valid data, but every value comes out as 0:
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[37] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[73] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[109] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[145] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[181] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[217] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[253] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[289] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[325] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[361] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[397] 0 0 0 0
Can you tell me what went wrong?
Thanks
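The problem is the syntax of the try() call. Passing the expression as a quoted string means try() only evaluates the string itself, so the assignment never runs and v keeps its initial zeros; pass the expression unquoted instead. Note also that data has 400 rows and 25 columns, so a loop from 1 to 400 has to index rows (data[i,]) rather than columns. The following code worked for me.
R=function(g)
{
k=1
n=length(g)
for(i in 1 : n)
{
# Accumulate the product; abort if a component is not positive
if (g[i] > 0) k <- k*g[i] else stop("Negative component")
}
# The n-th root of the product is the geometric mean
k^(1/n)
}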
set.seed(123)
data <- matrix(rnorm(10000, mean=3), ncol=25, dimnames=list(NULL,paste("X",
1:25, sep=".")))
v=numeric(400)
for(i in 1 : 400)
{
try(v[i]<-R(data[i,]),TRUE)
}
v
I am able to follow the Circlize example in the description of the package on CRAN easily:
library('circlize')
set.seed(123)
mat = matrix(sample(1:100, 18, replace = TRUE), 3, 6)
rownames(mat) = letters[1:3]
colnames(mat) = LETTERS[1:6]
### basic settings
par(mfrow = c(3, 2))
par(mar = c(1, 1, 1, 1))
chordDiagram(mat)
However, when I replace mat with myMatrix I get this error:
Error in circos.initialize(factors = factor(cate, levels = cate), xlim = cbind(rep(0, :
Since `xlim` is a matrix, it should have same number of rows as the length of the level of `factors` and number of columns of 2.
Can somebody explain why I am getting that message? I do not see a difference between mat and myMatrix other than myMatrix is larger:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z A2 B2 C2 D2
A 1060360.659 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
B 0 32143148.75 996976.8445 0 4944648.524 5688385.041 61990.5913 0 0 0 0 -1563.225 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 31922242.6
C 0 0 6342776.843 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D 0 0 0 28617385.81 17842142.64 0 0 0 0 0 0 0 0 409444.5633 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 4990921.202 105686446.3 536246.2188 0 0 0 0 0 0 0 8587899.583 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 378565.5746
F 0 92732.7741 0 4282.9319 33543553.89 36773976.59 1894761.93 0 0 333209.342 0 20739.0655 327956.7365 0 1022673.163 12229.0255 0 0 386112.1743 224039.3207 0 2395066.197 268247.2897 0 0 0 0 0 0 11926701.96
G 0 0 0 0 0 0 7753767.003 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 5184133.29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 462767.7374 0 0 0 8992223.296 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
J 0 0 0 0 0 0 0 0 0 1950552.642 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
K 0 0 0 0 891032.5584 0 0 0 0 0 520107.9821 0 0 0 0 0 0 0 0 0 0 0 0 0 0 26724.8402 0 0 0 418902.5203
L 0 0 0 0 32044317.54 28147.5693 0 0 0 0 0 5383919.293 0 489912.5412 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4559115.003
M 0 0 0 0 0 3125823.41 0 0 0 0 0 0 1738293.164 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
N 0 1053825.966 -8526.9758 1283429.314 60333051.34 2621812.931 -1130.1924 0 -779545.8004 8055145.684 918.8702 -379747.1919 -177.6205 298563606.5 -9316.8654 0 0 0 0 0 2631991.077 0 0 0 0 0 1107369.803 0 0 118812465
O 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1500451.292 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7432418.396
P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Q 0 0 1496058.76 0 -4056617.74 294503 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 410.4 0 0 0 0 0 0 0 1765984767
Code
dd <- read.table(header = TRUE, text = " rn A B C D E F G H I J K L M N O P Q R S T U V W X Y Z A2 B2 C2 D2
A 1060360.659 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
B 0 32143148.75 996976.8445 0 4944648.524 5688385.041 61990.5913 0 0 0 0 -1563.225 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 31922242.6
C 0 0 6342776.843 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D 0 0 0 28617385.81 17842142.64 0 0 0 0 0 0 0 0 409444.5633 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 4990921.202 105686446.3 536246.2188 0 0 0 0 0 0 0 8587899.583 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 378565.5746
F 0 92732.7741 0 4282.9319 33543553.89 36773976.59 1894761.93 0 0 333209.342 0 20739.0655 327956.7365 0 1022673.163 12229.0255 0 0 386112.1743 224039.3207 0 2395066.197 268247.2897 0 0 0 0 0 0 11926701.96
G 0 0 0 0 0 0 7753767.003 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 5184133.29 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 462767.7374 0 0 0 8992223.296 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
J 0 0 0 0 0 0 0 0 0 1950552.642 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
K 0 0 0 0 891032.5584 0 0 0 0 0 520107.9821 0 0 0 0 0 0 0 0 0 0 0 0 0 0 26724.8402 0 0 0 418902.5203
L 0 0 0 0 32044317.54 28147.5693 0 0 0 0 0 5383919.293 0 489912.5412 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4559115.003
M 0 0 0 0 0 3125823.41 0 0 0 0 0 0 1738293.164 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
N 0 1053825.966 -8526.9758 1283429.314 60333051.34 2621812.931 -1130.1924 0 -779545.8004 8055145.684 918.8702 -379747.1919 -177.6205 298563606.5 -9316.8654 0 0 0 0 0 2631991.077 0 0 0 0 0 1107369.803 0 0 118812465
O 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1500451.292 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7432418.396
P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Q 0 0 1496058.76 0 -4056617.74 294503 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 410.4 0 0 0 0 0 0 0 1765984767")
myMatrix <- as.matrix(dd[, -1])
rownames(myMatrix) <- dd[, 1]
chordDiagram(myMatrix)
In the old version of circlize, the matrix must be of a matrix class instead of a data.frame, so you need to convert the data frame explicitly by:
myMatrix = as.matrix(A + B)
In circlize, a data frame is meant for data stored as an adjacency list (e.g. the first column for group1, the second column for group2, the third column for the strength of the relation).
Since read.table() always returns a data.frame, newer versions of circlize accept a matrix that is stored as a data frame. When the input is a data frame, chordDiagram() first checks whether the number of columns is larger than 3 and all columns are numeric. If so, the data frame is converted to a matrix internally.
I have the following sample data:
Hostname Date-Time hdisk86 hdisk88 hdisk90 hdisk89 hdisk91 hdisk92 hdisk93 hdisk94 hdisk96 hdisk95
1: hostname1 2015-01-26 00:15:22 0 0 0 0 0 0 0 0 0 0
2: hostname1 2015-01-26 00:30:24 0 0 0 0 0 0 0 0 0 0
3: hostname1 2015-01-26 00:45:25 0 0 0 0 0 0 0 0 0 0
4: hostname1 2015-01-26 01:00:25 0 0 0 0 0 0 0 0 0 0
5: hostname1 2015-01-26 01:15:28 0 0 0 0 0 0 0 0 0 0
6: hostname1 2015-01-26 01:30:29 0 0 0 0 0 0 0 0 0 0
hdisk98 hdisk97 hdisk99 hdisk100 hdisk101 hdisk102 hdisk103 hdisk108 hdisk107 hdisk104 hdisk105 hdisk109 hdisk110
1: 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk112 hdisk111 hdisk113 hdisk114 hdisk115 hdisk116 hdisk117 hdisk87 hdisk118 hdisk120 hdisk119 hdisk122
1: 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0
hdisk123 hdisk124 hdisk125 hdisk121 hdisk127 hdisk126 hdisk2 hdisk3 hdisk5 hdisk4 hdisk6 hdisk10 hdisk11 hdisk8
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk12 hdisk9 hdisk18 hdisk14 hdisk15 hdisk17 hdisk16 hdisk13 hdisk106 hdisk19 hdisk20 hdisk7 hdisk21 hdisk28
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk33 hdisk32 hdisk27 hdisk30 hdisk23 hdisk35 hdisk40 hdisk25 hdisk41 hdisk39 hdisk38 hdisk43 hdisk22 hdisk36
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk31 hdisk45 hdisk29 hdisk44 hdisk34 hdisk37 hdisk48 hdisk24 hdisk47 hdisk42 hdisk46 hdisk49 hdisk53 hdisk50
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk56 hdisk55 hdisk54 hdisk52 hdisk59 hdisk62 hdisk58 hdisk64 hdisk61 hdisk65 hdisk60 hdisk67 hdisk66 hdisk57
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk51 hdisk69 hdisk63 hdisk74 hdisk70 hdisk72 hdisk75 hdisk68 hdisk73 hdisk76 hdisk71 hdisk78 hdisk85 hdisk81
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk80 hdisk83 hdisk79 hdisk82 hdisk77 hdisk84 hdisk26 hdisk0 hdisk1 hdisk128 hdisk129 hdisk130 hdisk131 hdisk132
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk133 hdisk134 hdisk135 hdisk136 hdisk137 hdisk138 hdisk139 hdisk140 hdisk141 hdisk142 hdisk143 hdisk144
1: 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0
4: 0 0 0 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0 0 0 0
6: 0 0 0 0 0 0 0 0 0 0 0 0
hdisk145 hdisk146 hdisk147 hdisk148 hdisk149
1: 0 0 0 0 0
2: 0 0 0 0 0
3: 0 0 0 0 0
4: 0 0 0 0 0
5: 0 0 0 0 0
6: 0 0 0 0 0
What I'm trying to do is to take the mean, weighted.mean, and max values of each hdisk column, transpose this data to then sort by weighted.mean, max and mean. Then transpose back to plot in a bar chart. Here we go...
First taking the summary info (mean, weighted.mean, and max):
# Creating summary of I/O data (avg, wavg, max)...
c <- grep( "hdisk", names(DISKAVGRIO))
b <- c("Avg", "WAvg", "Max")
wavg = function(x) {
wavg.return <- weighted.mean(x, x)
if (is.nan(wavg.return)) {
return(0)
} else {
return(wavg.return)
}
}
my.summary = function(x) list(avg = mean(x), wavg = wavg(x), max = as.numeric(max(x)))
DT <- DISKAVGRIO[, lapply(.SD, my.summary), .SDcols=c]
DT[, `summary` := list("Avg", "WAvg", "Max")]
setcolorder(DT, c("summary", setdiff(names(DT), "summary")))
Then I have the following data table:
summary hdisk86 hdisk88 hdisk90 hdisk89 hdisk91 hdisk92 hdisk93 hdisk94 hdisk96 hdisk95 hdisk98 hdisk97 hdisk99
1: Avg 0 0 0 0 0 0 0 0 0 0 0 0 0
2: WAvg 0 0 0 0 0 0 0 0 0 0 0 0 0
3: Max 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk100 hdisk101 hdisk102 hdisk103 hdisk108 hdisk107 hdisk104 hdisk105 hdisk109 hdisk110 hdisk112 hdisk111
1: 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0
hdisk113 hdisk114 hdisk115 hdisk116 hdisk117 hdisk87 hdisk118 hdisk120 hdisk119 hdisk122 hdisk123 hdisk124
1: 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0
hdisk125 hdisk121 hdisk127 hdisk126 hdisk2 hdisk3 hdisk5 hdisk4 hdisk6 hdisk10 hdisk11 hdisk8 hdisk12 hdisk9
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk18 hdisk14 hdisk15 hdisk17 hdisk16 hdisk13 hdisk106 hdisk19 hdisk20 hdisk7 hdisk21 hdisk28 hdisk33 hdisk32
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk27 hdisk30 hdisk23 hdisk35 hdisk40 hdisk25 hdisk41 hdisk39 hdisk38 hdisk43 hdisk22 hdisk36 hdisk31 hdisk45
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk29 hdisk44 hdisk34 hdisk37 hdisk48 hdisk24 hdisk47 hdisk42 hdisk46 hdisk49 hdisk53 hdisk50 hdisk56 hdisk55
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk54 hdisk52 hdisk59 hdisk62 hdisk58 hdisk64 hdisk61 hdisk65 hdisk60 hdisk67 hdisk66 hdisk57 hdisk51 hdisk69
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk63 hdisk74 hdisk70 hdisk72 hdisk75 hdisk68 hdisk73 hdisk76 hdisk71 hdisk78 hdisk85 hdisk81 hdisk80 hdisk83
1: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk79 hdisk82 hdisk77 hdisk84 hdisk26 hdisk0 hdisk1 hdisk128 hdisk129 hdisk130 hdisk131 hdisk132 hdisk133
1: 0 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0 0
hdisk134 hdisk135 hdisk136 hdisk137 hdisk138 hdisk139 hdisk140 hdisk141 hdisk142 hdisk143 hdisk144 hdisk145
1: 0 0 0 0 0 0 0 0 0 0 0 0
2: 0 0 0 0 0 0 0 0 0 0 0 0
3: 0 0 0 0 0 0 0 0 0 0 0 0
hdisk146 hdisk147 hdisk148 hdisk149
1: 0 0 0 0
2: 0 0 0 0
3: 0 0 0 0
Then I transform from wide to long:
# Converting from wide to long...
d <- grep("hdisk", names(DT), value = T)
DT_mdf <- melt(DT,
id.vars="summary",
measure.vars=d,
variable.name="hdisks",
value.name="percentage")
And get the following data table:
summary hdisks percentage
1: Avg hdisk86 0
2: WAvg hdisk86 0
3: Max hdisk86 0
4: Avg hdisk88 0
5: WAvg hdisk88 0
---
446: WAvg hdisk148 0
447: Max hdisk148 0
448: Avg hdisk149 0
449: WAvg hdisk149 0
450: Max hdisk149 0
Then, I try to transpose:
# Transpose to sort by wavg...
DT3 <- dcast(DT_mdf, summary ~ hdisks)
And I get the error message:
Using percentage as value column: use value.var to override.
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
'x' must be atomic
If I try to set value.var = percentage I get the following error message:
Error in match(x, table, nomatch = 0L) :
'match' requires vector arguments
Why is this not working? As far as I can tell, it is supposed to work. Does anybody have any idea?
Your function returns a list, so applying it to each column with lapply() makes every column of the aggregated result a list column as well. You can check this by looking at the class of each column. dcast() expects atomic columns.
It's much more straightforward to get to your final result by using c() instead of list() in this case (not tested due to lack of MRE):
summary.funs = c("mean", "wavg", "max")
my.summary = function(x) c(mean(x), wavg(x), as.numeric(max(x)))
DT <- DISKAVGRIO[, lapply(.SD, my.summary), .SDcols=c][, summary := summary.funs]
should get you the result in the final format.
The Introduction to data.table vignette explains how to efficiently use j to get the data in the format you desire.
Also of use might be the Efficient reshaping using data.tables vignette.
For updates on vignettes, bookmark/check the Getting started page on the project wiki. Also keep an eye on issue #944 and the CRAN data.table page for vignettes corresponding to the current version.