I would like to plot the following matrix x so that each column's data are plotted at the x-axis position given by its column name (i.e. 0.1, 0.2, etc.).
> x
0.1 0.2 0.3 0.4 0.5
[1,] 5.000000e-01 5.000000e-01 5.000000e-01 5.000000e-01 0.5000000000
[2,] 2.500000e-02 5.000000e-02 7.500000e-02 1.000000e-01 0.1250000000
[3,] 2.437500e-03 9.500000e-03 2.081250e-02 3.600000e-02 0.0546875000
[4,] 2.431559e-04 1.881950e-03 6.113802e-03 1.388160e-02 0.0258483887
[5,] 2.430967e-05 3.756817e-04 1.822927e-03 5.475560e-03 0.0125901247
[6,] 2.430908e-06 7.510810e-05 5.458812e-04 2.178231e-03 0.0062158067
[7,] 2.430902e-07 1.502049e-05 1.636750e-04 8.693947e-04 0.0030885852
[8,] 2.430902e-08 3.004053e-06 4.909445e-05 3.474555e-04 0.0015395229
[9,] 2.430902e-09 6.008089e-07 1.472761e-05 1.389339e-04 0.0007685764
[10,] 2.430902e-10 1.201617e-07 4.418219e-06 5.556585e-05 0.0003839928
But when I use
plot(x, pch=20, ylim=c(0, 1))
I get the following plot: [image: Plot of R matrix]
I want a plot where x[1, 1] (i.e. 5.000000e-01) is plotted as a point at 0.1 on the x-axis and 0.5 on the y-axis.
set.seed(123)
mat <- matrix(rnorm(25), 5, 5)
colnames(mat) <- seq(0.1, 0.5, length.out = 5)
plot(x = matrix(rep(as.numeric(colnames(mat)), 5), 5, 5, byrow = TRUE), y = mat)
Here the first argument, x, repeats the numeric column names 5 times and arranges them row by row into a 5 x 5 matrix, so every value in a given column of y is paired with the x position given by that column's name.
matplot(x = matrix(rep(as.numeric(colnames(mat)), 5), 5, 5, byrow = TRUE), y = mat)
can also be used; it draws each column of y as a separate series.
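A shorter base-R variant of the same idea, sketched on the same hypothetical random data: because as.vector() flattens a matrix column by column, repeating each column name nrow(mat) times with rep(..., each = ) gives the matching x position for every value, so no second matrix is needed.

```r
# Hypothetical example data: a 5 x 5 matrix whose column names are x positions
set.seed(123)
mat <- matrix(rnorm(25), 5, 5)
colnames(mat) <- seq(0.1, 0.5, length.out = 5)

# as.vector() flattens column by column, so repeating each column name
# nrow(mat) times lines the x positions up with the flattened values
xpos <- rep(as.numeric(colnames(mat)), each = nrow(mat))
yval <- as.vector(mat)

plot(xpos, yval, pch = 20, xlab = "column name", ylab = "value")
```

For the matrix in the question, the same two lines with x in place of mat, plus ylim = c(0, 1), put x[1, 1] at (0.1, 0.5).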
I am given an empirical distribution FXemp of a real-valued random variable X. Suppose now that X1, ..., Xn have the same distribution as X and that their dependence is given by a copula C. I would like to produce random samples of X1, ..., Xn (each real-valued).
For example, I have a vector of samples and the corresponding empirical cdf:
x <- rnorm(1000)
df <- ecdf(x)
Suppose I pick, for example, a Student t or Clayton copula C. How can I produce random samples of, say, 10 copies of X whose dependence is determined by C?
Is there an easy way?
Or are there any packages that can be used here?
You can sample from the copula (with uniform margins) by using the copula package, and then apply the inverse ecdf to each component:
library(copula)
x <- rnorm(100) # sample of X
d <- 5 # desired number of copies
copula <- claytonCopula(param = 2, dim = d)
nsims <- 25 # number of simulations
U <- rCopula(nsims, copula) # sample from the copula (with uniform margins)
# now sample the copies of X ####
Xs <- matrix(NA_real_, nrow = nsims, ncol = d)
for(i in 1:d){
Xs[,i] <- quantile(x, probs = U[,i], type = 1) # type=1 is the inverse ecdf
}
Xs
# [,1] [,2] [,3] [,4] [,5]
# [1,] -0.5692185 -0.9254869 -0.6821624 -1.2148041 -0.682162391
# [2,] -0.4680407 -0.4263257 -0.3456553 -0.6132320 -0.925486872
# [3,] -1.1322063 -1.2148041 -0.8115089 -1.0074435 -1.430405604
# [4,] 0.9760268 1.2600186 1.0731551 1.2369623 0.835024471
# [5,] -1.1280825 -0.8995429 -0.5761037 -0.8115089 -0.543125426
# [6,] -0.1848303 -1.2148041 -0.5692185 0.8974921 -0.613232036
# [7,] -0.5692185 -0.3070884 -0.8995429 -0.8115089 -0.007292346
# [8,] 0.1696306 0.4072428 0.7646646 0.4910863 1.236962330
# [9,] -0.7908557 -1.1280825 -1.2970952 0.3655081 -0.633521404
# [10,] -1.3226053 -1.0074435 -1.6857615 -1.3226053 -1.685761474
# [11,] -2.5410325 -2.3604936 -2.3604936 -2.3604936 -2.360493569
# [12,] -2.3604936 -2.2530003 -1.9311289 -2.2956444 -2.360493569
# [13,] 0.4072428 -0.2150035 -0.3564803 -0.1051930 -0.166434458
# [14,] -0.4680407 -1.0729763 -0.6335214 -0.8995429 -0.899542914
# [15,] -0.9143225 -0.1522242 0.4053462 -1.0729763 -0.158375658
# [16,] -0.4998761 -0.7908557 -0.9813504 -0.1763604 -0.283013334
# [17,] -1.2148041 -0.9143225 -0.5176347 -0.9143225 -1.007443492
# [18,] -0.2150035 0.5675260 0.5214050 0.8310799 0.464151265
# [19,] -1.2148041 -0.6132320 -1.2970952 -1.1685962 -1.132206305
# [20,] 1.4456635 1.0444720 0.7850181 1.0742214 0.785018119
# [21,] 0.3172811 1.2369623 -0.1664345 0.9440006 1.260018624
# [22,] 0.5017980 1.4068250 1.9950305 1.2600186 0.976026807
# [23,] 0.5675260 -1.0729763 -1.2970952 -0.3653535 -0.426325703
# [24,] -2.5410325 -2.2956444 -2.3604936 -2.2956444 -2.253000326
# [25,] 0.4053462 -0.5431254 -0.5431254 0.8350245 0.950891450
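If you would rather not depend on the copula package, a Student t copula (one of the two copulas the question mentions) can be sketched in base R: draw multivariate normal rows through a Cholesky factor, divide each row by the square root of an independent chi-squared/nu variable, and push the result through pt() to get uniform margins. The exchangeable correlation rho and the degrees of freedom nu below are illustrative assumptions, not values from the question; the last step is the same inverse-ecdf trick as above.

```r
set.seed(1)
x <- rnorm(1000)   # sample of X, as in the question
d <- 10            # number of copies
nsims <- 25        # number of joint draws
rho <- 0.5         # assumed common correlation (illustrative)
nu <- 4            # assumed degrees of freedom (illustrative)

# Exchangeable correlation matrix and its upper Cholesky factor
Sigma <- matrix(rho, d, d)
diag(Sigma) <- 1
L <- chol(Sigma)   # t(L) %*% L equals Sigma

Z <- matrix(rnorm(nsims * d), nsims, d) %*% L  # rows ~ N(0, Sigma)
W <- rchisq(nsims, df = nu) / nu                # one mixing variable per row
Tmat <- Z / sqrt(W)                             # rows ~ multivariate t
U <- pt(Tmat, df = nu)                          # uniform margins: t copula sample

# Inverse empirical cdf (type = 1) applied to each column
Xs <- apply(U, 2, function(u) quantile(x, probs = u, type = 1, names = FALSE))
```

Every entry of Xs is one of the original observations in x, because the type 1 quantile is the inverse of the step-function ecdf.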
I have the following predictions, which I obtained with the vars package. Let's call this object vecm.pred:
$price
fcst lower upper CI
[1,] 4956.787 4864.032 5049.543 92.75548
[2,] 4948.936 4844.545 5053.327 104.39064
[3,] 5089.440 4979.941 5198.939 109.49891
[4,] 5076.999 4939.429 5214.569 137.56992
[5,] 5000.012 4854.955 5145.068 145.05669
[6,] 5072.107 4910.435 5233.780 161.67272
$people
fcst lower upper CI
[1,] 2529.799 2417.699 2641.899 112.1000
[2,] 2498.627 2269.438 2727.817 229.1893
[3,] 2410.037 2116.672 2703.402 293.3648
[4,] 2418.197 2094.965 2741.429 323.2320
[5,] 2371.373 2028.816 2713.929 342.5561
[6,] 2289.163 1941.386 2636.939 347.7764
I am trying to use fanchart() to plot these forecasts:
fanchart(vecm.pred, ylab = c("Price (€)","Volume"), main = c("Price","People"))
But I cannot get past the following issues:
1) How do I change the colours from the default grey scale to a heat-map palette running from red to yellow?
2) How do I give the first and second plots different y-axis labels? My ylab argument above just puts both names on each plot.
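For the first issue, a red-to-yellow palette can be built with base grDevices. Passing it through fanchart()'s colors argument is an assumption here (check ?vars::fanchart for your version), so that call is left commented out; n_bands is a hypothetical value that should match the number of confidence bands drawn.

```r
# Red-to-yellow palette built with base grDevices
n_bands <- 9  # hypothetical: one colour per confidence band
red_yellow <- colorRampPalette(c("red", "yellow"))(n_bands)
# heat.colors(n_bands) is a built-in alternative that also runs red -> yellow

# Assumption: vars::fanchart() accepts a colour vector via `colors`
# fanchart(vecm.pred, colors = red_yellow, main = c("Price", "People"))
```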
I have two vectors of latitudes and longitudes. I would like to find the maximum distance between the points. The way I see it, I should get a matrix of distances between all points and get the max of those.
So far I’ve done the following (using the geosphere package for the last command):
> lat = dt[assetId == u_assetIds[1000], latitude]
> lon = dt[assetId == u_assetIds[1000], longitude]
>
> head(cbind(lat, lon))
lat lon
[1,] 0.7266145 -1.512977
[2,] 0.7270650 -1.504216
[3,] 0.7267265 -1.499622
[4,] 0.7233676 -1.487970
[5,] 0.7232196 -1.443160
[6,] 0.7225059 -1.434848
>
> distm(c(lat_1K[1], lon_1K[1]), c(lat_1K[4], lon_1K[4]), fun = distHaversine)
[,1]
[1,] 2807.119
How do I change the last command so that it gives me a matrix of all pairwise distances? I am not familiar with how to do that in R; I have more experience with Python.
I briefly read the help page for distm; here is what it says:
distm(x, y, fun=distHaversine)
x: longitude/latitude of point(s). Can be a vector of two numbers, a matrix of 2 columns (first one is longitude, second is latitude) or a SpatialPoints* object
y: Same as x. If missing, y is the same as x
So you can simply pass cbind(lon, lat) (longitude first, as the help page requires) as the first argument x. Here is a test:
> lat <- c(0.7266145, 0.7270650, 0.7267265, 0.7233676, 0.7232196, 0.7225059)
> lon <- c(-1.512977, -1.504216, -1.499622, -1.487970, -1.443160, -1.434848)
> distm(cbind(lon,lat))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.0000 976.4802 1486.6045 2806.912 7780.5544 8708.6036
[2,] 976.4802 0.0000 512.7471 1854.601 6809.6464 7738.0538
[3,] 1486.6045 512.7471 0.0000 1349.813 6296.9308 7225.3240
[4,] 2806.9123 1854.6008 1349.8129 0.000 4987.8561 5913.8213
[5,] 7780.5544 6809.6464 6296.9308 4987.856 0.0000 928.6189
[6,] 8708.6036 7738.0538 7225.3240 5913.821 928.6189 0.0000
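Since the goal is the maximum distance, a dependency-free sketch computes the full haversine distance matrix with outer() and then picks the largest entry. haversine_matrix() is a hypothetical helper, not a geosphere function; it assumes coordinates in degrees and uses the same Earth radius (6378137 m) that distHaversine() uses by default, so it should reproduce the matrix above.

```r
# Hypothetical helper: pairwise haversine distances (metres) for points
# given in degrees; r = 6378137 matches distHaversine()'s default radius
haversine_matrix <- function(lon, lat, r = 6378137) {
  phi <- lat * pi / 180
  lam <- lon * pi / 180
  dphi <- outer(phi, phi, "-")
  dlam <- outer(lam, lam, "-")
  a <- sin(dphi / 2)^2 + outer(cos(phi), cos(phi)) * sin(dlam / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(sqrt(a), 1))
}

lat <- c(0.7266145, 0.7270650, 0.7267265, 0.7233676, 0.7232196, 0.7225059)
lon <- c(-1.512977, -1.504216, -1.499622, -1.487970, -1.443160, -1.434848)

D <- haversine_matrix(lon, lat)
max(D)                              # largest pairwise distance
which(D == max(D), arr.ind = TRUE)  # the pair (listed twice, D is symmetric)
```

With geosphere installed, max(distm(cbind(lon, lat))) gives the same maximum directly.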
I am trying to use RColorBrewer to manually colour my ggplot bar chart, but I'm having no luck.
I create a palette of blues and then pass it to colorRamp() to get a function that interpolates between them.
blues <- brewer.pal(9, "Blues")
blue_range <- colorRamp(blues)
I then plot my stacked bar chart, where I know I have 20 groups.
ggplot(Month.Summary, aes(x=Calendar.Month, y = Measure, fill = Groups)) + geom_bar(stat="Identity", position = "fill") +scale_fill_manual(values = blue_range(20))
I unfortunately get the following error:
Error: Insufficient values in manual scale. 20 needed but only 3 provided.
I'm using Groups as my fill, where I know there are 20 levels. I'm passing 20 to the blue_range function, so I'm not sure why it says I'm only passing 3 colours.
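The blue_range() function expects values between 0 and 1, and for each value it returns one row of three RGB numbers; blue_range(20) therefore gives a single row of 3 values, which is the "only 3 provided" in the error. To get the discrete palette, pass a sequence to this function: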
> blue_range(seq(0, 1, length.out = 20))
[,1] [,2] [,3]
[1,] 247.00000 251.00000 255.0000
[2,] 236.47368 244.26316 251.6316
[3,] 225.94737 237.52632 248.2632
[4,] 215.68421 230.78947 244.8947
[5,] 205.57895 224.05263 241.5263
[6,] 193.78947 217.21053 237.5263
[7,] 176.94737 210.05263 231.6316
[8,] 160.10526 202.89474 225.7368
[9,] 139.21053 191.68421 220.9474
[10,] 117.73684 179.89474 216.3158
[11,] 98.36842 168.10526 210.6316
[12,] 81.10526 156.31579 203.8947
[13,] 64.26316 144.26316 197.1053
[14,] 50.36842 130.36842 189.9474
[15,] 36.47368 116.47368 182.7895
[16,] 25.10526 102.89474 173.1053
[17,] 14.57895 89.42105 162.5789
[18,] 8.00000 75.78947 148.2632
[19,] 8.00000 61.89474 127.6316
[20,] 8.00000 48.00000 107.0000
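These rows are numeric RGB values rather than colours, so convert them first with rgb(blue_range(seq(0, 1, length.out = 20)), maxColorValue = 255); the resulting colour strings should work in the ggplot() call (not tested, because you didn't provide a reproducible example).
Note that recent ggplot2 has scale_fill_distiller(), which provides similar functionality with a more convenient interface.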
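A self-contained sketch of two ways to get 20 usable colour strings; the hard-coded blues vector stands in for brewer.pal(9, "Blues") so the example runs without RColorBrewer. colorRampPalette() wraps colorRamp() and does the rgb() conversion itself, so both vectors should come out identical.

```r
# Stand-in for brewer.pal(9, "Blues"), so no RColorBrewer is needed here
blues <- c("#F7FBFF", "#DEEBF7", "#C6DBEF", "#9ECAE1", "#6BAED6",
           "#4292C6", "#2171B5", "#08519C", "#08306B")

# Option 1: keep colorRamp(), but turn its RGB rows into colour strings
blue_range <- colorRamp(blues)
pal1 <- rgb(blue_range(seq(0, 1, length.out = 20)), maxColorValue = 255)

# Option 2: colorRampPalette() returns colour strings directly
pal2 <- colorRampPalette(blues)(20)

# Either vector can be used as scale_fill_manual(values = pal2)
```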
Here is an excerpt of a numeric matrix that I have:
[1,] 30 -33.129487 3894754.1 -39.701738 -38.356477 -34.220534
[2,] 29 -44.289487 -8217525.9 -44.801738 -47.946477 -41.020534
[3,] 28 -48.439487 -4572815.9 -49.181738 -48.086477 -46.110534
[4,] 27 -48.359487 -2454575.9 -42.031738 -43.706477 -43.900534
[5,] 26 -38.919487 -2157535.9 -47.881738 -43.576477 -46.330534
[6,] 25 -45.069487 -5122485.9 -47.831738 -47.156477 -42.860534
[7,] 24 -46.207487 -2336325.9 -53.131738 -50.576477 -50.410534
[8,] 23 -51.127487 -2637685.9 -43.121738 -47.336477 -47.040534
[9,] 22 -45.645487 3700424.1 -56.151738 -47.396477 -50.720534
[10,] 21 -56.739487 1572594.1 -49.831738 -54.386577 -52.470534
[11,] 20 -46.319487 642214.1 -39.631738 -44.406577 -41.490534
What I want to do now is to rescale each column so that its values range from 0 to 1.
I tried to accomplish this using the scale() function on my matrix (default parameters), and got this:
[1,] -0.88123100 0.53812440 -1.05963281 -1.031191482 -0.92872324
[2,] -1.17808251 -1.13538649 -1.19575096 -1.289013031 -1.11327085
[3,] -1.28847084 -0.63180980 -1.31265244 -1.292776849 -1.25141017
[4,] -1.28634287 -0.33914007 -1.12182012 -1.175023107 -1.19143220
[5,] -1.03524267 -0.29809911 -1.27795565 -1.171528133 -1.25738083
[6,] -1.19883019 -0.70775576 -1.27662116 -1.267774342 -1.16320727
[7,] -1.22910054 -0.32280189 -1.41807728 -1.359719044 -1.36810940
[8,] -1.35997055 -0.36443973 -1.15091204 -1.272613537 -1.27664977
[9,] -1.21415156 0.51127451 -1.49868058 -1.274226602 -1.37652260
[10,] -1.50924749 0.21727976 -1.33000083 -1.462151358 -1.42401647
[11,] -1.23207969 0.08873245 -1.05776452 -1.193844887 -1.12602635
This is already close to what I want, but values between 0 and 1 would be even better. I read the help page for scale(), but I don't understand how to do that.
Try the following, which seems simple enough:
## Data to make a minimal reproducible example
m <- matrix(rnorm(9), ncol=3)
## Rescale each column to range between 0 and 1
apply(m, MARGIN = 2, FUN = function(X) (X - min(X))/diff(range(X)))
# [,1] [,2] [,3]
# [1,] 0.0000000 0.0000000 0.5220198
# [2,] 0.6239273 1.0000000 0.0000000
# [3,] 1.0000000 0.9253893 1.0000000
And if you still want to use scale():
# a: the numeric matrix to rescale
maxs <- apply(a, 2, max)
mins <- apply(a, 2, min)
scale(a, center = mins, scale = maxs - mins)
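A loop-free variant of the same min-max rescaling, sketched on hypothetical random data: sweep() subtracts each column's minimum and then divides by each column's range, which is the same (x - min) / (max - min) that the answers above compute.

```r
# Hypothetical data standing in for the question's matrix
set.seed(42)
m <- matrix(rnorm(15), ncol = 3)

mins <- apply(m, 2, min)
rngs <- apply(m, 2, max) - mins

# Subtract each column's minimum, then divide by each column's range
scaled <- sweep(sweep(m, 2, mins, "-"), 2, rngs, "/")
```

A column with a constant value has range 0 and will come out as NaN, so check for that first if it can occur in your data.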
Install the clusterSim package and run the following commands (type "n4" is unitization with zero minimum, (x - min) / range):
library(clusterSim)
normX <- data.Normalization(x, type = "n4")
The scales package has a function called rescale():
set.seed(2020)
x <- runif(5, 100, 150)
scales::rescale(x)
#1.0000000 0.5053362 0.9443995 0.6671695 0.0000000
Not the prettiest, but this got the job done, since I needed to do this in a data frame:
column_zero_one_range_scale <- function(
  input_df,
  columns_to_scale # column positions in input_df to scale; must be numeric
){
  input_df_replace <- input_df
  for (columnnum in columns_to_scale) {
    vec <- input_df[[columnnum]]
    # is.numeric() is TRUE for both numeric and integer columns
    if (!is.numeric(vec)) {
      print(paste0('Column name ', colnames(input_df)[columnnum], ' not an integer or numeric, will skip'))
      next  # leave non-numeric columns unchanged
    }
    # (x - min) / (max - min), ignoring NAs
    rangevec <- max(vec, na.rm = TRUE) - min(vec, na.rm = TRUE)
    input_df_replace[[columnnum]] <- (vec - min(vec, na.rm = TRUE)) / rangevec
    colnames(input_df_replace)[columnnum] <- paste0(colnames(input_df)[columnnum], '_scaled')
  }
  return(input_df_replace)
}