xarray DataArray.where() reduced coordinate when masking - netcdf

xarray novice here. Very simple case, I have a precipitation type array (ntim x nlat x nlon) and a total precipitation array (same dimensions). Both are in separate netCDF files. I want to mask the precipitation array where A) precipitation is falling (> 1e-8 m/s rate) and B) the precipitation type is snow (maskvar = 0.0). The output array is therefore a "where is it snowing?" array.
When using xarray where() with multiple conditions from two different (but same-sized) arrays, only two latitudes persist (north and south pole) in the resulting masked array.
However, if I use a pre-masked array (from NCL, written as netCDF w/ same dims) as a test, it behaves as expected (i.e., returns an ntim x nlat x nlon array).
The only obvious thing that sticks out to me is that the lat coordinate is not identically typed between the two arrays (float32 vs. float64), although it's unclear why that would cause this to fail in this manner.
Any help appreciated.
Sample code:
import xarray as xr

ensnum = '001'
indir = '/glade/u/home/zarzycki/scratch/LENS-snow/'
files = [indir+'/b.e11.B20TRC5CNBDRD.f09_g16.'+ensnum+'.cam.h2.PTYPE.1990010100Z-2005123118Z.nc']
indir2 = '/glade/p_old/cesmLE/CESM-CAM5-BGC-LE/atm/proc/tseries/hourly6/PRECT/'
files2 = [indir2+'/b.e11.B20TRC5CNBDRD.f09_g16.'+ensnum+'.cam.h2.PRECT.1990010100Z-2005123118Z.nc']
indir3 = indir
files3 = [indir3+'/b.e11.B20TRC5CNBDRD.f09_g16.'+ensnum+'.cam.h2.PRECT_SNOW.1990010100Z-2005123118Z.nc']
for idx, val in enumerate(files):
    ds = xr.open_dataset(files[idx])
    ds2 = xr.open_dataset(files2[idx])
    ds3 = xr.open_dataset(files3[idx])
    ptype = ds.PTYPE[1:11,:,:] # 10 time x 192 lat x 288 lon
    prect1 = ds2.PRECT[1:11,:,:] # 10 time x 192 lat x 288 lon
    prect2 = ds3.PRECT_SNOW[1:11,:,:] # 10 time x 192 lat x 288 lon
    print('---------')
    print(ptype)
    print(prect1)
    print(prect2)
    # keep PTYPE only where the type is snow (~0.0) and precipitation is falling
    ptype1 = ptype.where((ptype > -0.1) & (ptype < 0.1) & (prect1 > 1e-8))
    ptype2 = ptype.where((ptype > -0.1) & (ptype < 0.1) & (prect2 > 1e-8))
    print('---------')
    print(ptype1)
    print(ptype2)
Sample output showing that all read vars are (time: 10, lat: 192, lon: 288) but returned masked vars are (time: 10, lat: 2, lon: 288) and (time: 10, lat: 192, lon: 288)
---------
<xarray.DataArray 'PTYPE' (time: 10, lat: 192, lon: 288)>
[552960 values with dtype=float32]
Coordinates:
* lat (lat) float32 -90.0 -89.0576 -88.1152 -87.1728 -86.2304 -85.288 ...
* lon (lon) float32 0.0 1.25 2.5 3.75 5.0 6.25 7.5 8.75 10.0 11.25 ...
* time (time) datetime64[ns] 1990-01-01T12:00:00 1990-01-01T18:00:00 ...
<xarray.DataArray 'PRECT' (time: 10, lat: 192, lon: 288)>
[552960 values with dtype=float32]
Coordinates:
* lat (lat) float64 -90.0 -89.06 -88.12 -87.17 -86.23 -85.29 -84.35 ...
* lon (lon) float64 0.0 1.25 2.5 3.75 5.0 6.25 7.5 8.75 10.0 11.25 ...
* time (time) datetime64[ns] 1990-01-01T12:00:00 1990-01-01T18:00:00 ...
Attributes:
units: m/s
long_name: Total (convective and large-scale) precipitation rate (liq...
cell_methods: time: mean
<xarray.DataArray 'PRECT_SNOW' (time: 10, lat: 192, lon: 288)>
[552960 values with dtype=float32]
Coordinates:
* lat (lat) float32 -90.0 -89.0576 -88.1152 -87.1728 -86.2304 -85.288 ...
* lon (lon) float32 0.0 1.25 2.5 3.75 5.0 6.25 7.5 8.75 10.0 11.25 ...
* time (time) datetime64[ns] 1990-01-01T12:00:00 1990-01-01T18:00:00 ...
Attributes:
units: m/s
---------
<xarray.DataArray (time: 10, lat: 2, lon: 288)>
array([[[ nan, nan, ..., nan, nan],
[ nan, nan, ..., nan, nan]],
[[ nan, nan, ..., nan, nan],
[ nan, nan, ..., nan, nan]],
...,
[[ nan, nan, ..., nan, nan],
[ nan, nan, ..., nan, nan]],
[[ nan, nan, ..., nan, nan],
[ nan, nan, ..., nan, nan]]], dtype=float32)
Coordinates:
* lat (lat) float64 -90.0 90.0
* lon (lon) float32 0.0 1.25 2.5 3.75 5.0 6.25 7.5 8.75 10.0 11.25 ...
* time (time) datetime64[ns] 1990-01-01T12:00:00 1990-01-01T18:00:00 ...
<xarray.DataArray (time: 10, lat: 192, lon: 288)>
array([[[ nan, nan, ..., nan, nan],
[ nan, nan, ..., nan, nan],
...,
[ nan, nan, ..., nan, nan],
[ nan, nan, ..., nan, nan]],
[[ nan, nan, ..., nan, nan],
[ nan, nan, ..., nan, nan],
...,
[ nan, nan, ..., nan, nan],
[ nan, nan, ..., nan, nan]],
...,
[[ nan, nan, ..., nan, nan],
[ nan, nan, ..., nan, nan],
...,
[ nan, nan, ..., nan, nan],
[ nan, nan, ..., nan, nan]],
[[ nan, nan, ..., nan, nan],
[ nan, nan, ..., nan, nan],
...,
[ nan, nan, ..., nan, nan],
[ nan, nan, ..., nan, nan]]], dtype=float32)
Coordinates:
* lat (lat) float32 -90.0 -89.0576 -88.1152 -87.1728 -86.2304 -85.288 ...
* lon (lon) float32 0.0 1.25 2.5 3.75 5.0 6.25 7.5 8.75 10.0 11.25 ...
* time (time) datetime64[ns] 1990-01-01T12:00:00 1990-01-01T18:00:00 ...
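A possible explanation, offered as a sketch rather than a confirmed diagnosis: xarray aligns arrays on their coordinate labels before combining them, and by default it keeps only labels that are exactly equal in both arrays (an inner join). If one file stores lat as float64 and the other stores the same grid rounded to float32, only values that are exactly representable in both precisions survive. The standard-library snippet below mimics this with a synthetic 192-point grid (an assumption standing in for the real CESM latitudes) and shows that only the two poles match.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Synthetic latitude grid: 192 evenly spaced points from -90 to 90 (float64).
nlat = 192
lat64 = [-90.0 + 180.0 * i / (nlat - 1) for i in range(nlat)]

# The same grid as it would look after being stored as float32.
lat32 = [to_float32(v) for v in lat64]

# An inner join on exact label equality keeps only the shared values.
common = sorted(set(lat64) & set(lat32))

print(len(lat64), len(lat32), len(common))  # 192 192 2
print(common)  # [-90.0, 90.0]
```

If the two grids really are the same and differ only in precision, one option is to copy the coordinate from one array onto the other before masking, for example prect1 = prect1.assign_coords(lat=ptype.lat), or to align with xr.align(ptype, prect1, join="override"). Both assume the latitudes are physically identical; neither checks it.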

Related

How can I efficiently convert a pandas dataframe with x y z coordinates into 4D numpy array?

example: Let's say we have coordinates such as x,y,z (these coordinates can be different) and specific value for each coordinate as shown below:
x y z a
0 219115 166637 923 NaN
1 219116 166637 923 NaN
2 219117 166637 923 NaN
3 219118 166637 923 NaN
4 219119 166637 923 NaN
... ... ... ... ...
124995 219160 166686 972 NaN
124996 219161 166686 972 NaN
124997 219162 166686 972 NaN
124998 219163 166686 972 NaN
124999 219164 166686 972 NaN
I want to convert it into a 4-dimensional numpy array as shown below. I am going to use this 4D array to save the data in a TIFF file, which requires a 4-dimensional numpy array, and that is why I am struggling a little bit.
array([[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
...,
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]]])
Thanks very much, hoping for the best.
I tried pd.pivot_table and then .to_numpy(), but I could only get a single 2D matrix, not an array of matrices (4D).
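One way this could be approached, as a minimal sketch under assumptions: map each distinct x, y and z value to an integer index, scatter the values into a NaN-filled 3D grid, and add a leading axis of length 1 to get four dimensions. The tiny arrays below are hypothetical stand-ins for the DataFrame columns (with a real frame they could come from df["x"].to_numpy() and so on), and the (band, z, y, x) axis order is a guess, since the layout the TIFF writer expects is not stated in the question.

```python
import numpy as np

# Hypothetical stand-ins for the columns x, y, z and a of the DataFrame.
x = np.array([10, 11, 10, 11, 10, 11, 10, 11])
y = np.array([5, 5, 6, 6, 5, 5, 6, 6])
z = np.array([1, 1, 1, 1, 2, 2, 2, 2])
a = np.arange(8, dtype=float)

# Sorted unique coordinates plus, for every row, its index into them.
xs, xi = np.unique(x, return_inverse=True)
ys, yi = np.unique(y, return_inverse=True)
zs, zi = np.unique(z, return_inverse=True)

# Start from an all-NaN grid so missing (x, y, z) combinations stay NaN.
grid = np.full((len(zs), len(ys), len(xs)), np.nan)
grid[zi, yi, xi] = a

# Add a leading axis of length 1 to obtain a 4D array.
grid4d = grid[np.newaxis, ...]
print(grid4d.shape)  # (1, 2, 2, 2)
```

Because the grid is filled by index rather than by reshaping, the rows of the DataFrame do not need to be sorted or complete.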

Looping through a matrix and plotting in R

I have two arrays in R, CCO_lag and CCO (used below to build lag_mat and r_mat), and both have dimensions 16x16x3x2x2.
I use the following code to plot them in R.
library(R.matlab)
library("wesanderson")
library("ggplot2")
library("ggsci")
library(corrplot)
library(plotly)
library(viridis)
#CCO left and right stimulation time window 2
lag_mat = matrix(CCO_lag[, , 1,2], 16)
r_mat = matrix(CCO[, , 1,2], 16)
row = c(row(lag_mat))
col = c(col(lag_mat))
dd = data.frame( lag = c(lag_mat), r = c(r_mat), row, col )
p1 <- ggplot(dd, aes(x = row, y = col, size = lag, color = r)) +
geom_point( alpha = 1.5, stroke = 2.5) +
ggtitle("CCO, RIGHT Stimulation") +
theme(plot.title = element_text(size=10, face="bold"),
legend.position = "none",
axis.title.x=element_blank(),
axis.title.y=element_blank(),
panel.grid.major = element_line(size = 0.5, linetype = 'solid',
colour = "white"),
panel.grid.minor = element_line(size = 0.5, linetype = 'solid',
colour = "white"),axis.text.x = element_text(size=8)) +
# scale_color_viridis( begin = 0.2 , end = 1, direction = 1 )+
scale_color_gradient2(low = "#4169E1" , mid = "#ffffbf" , high = "#FF8C00", limits=c(-1 ,1)) +
# scale_y_reverse() +
# scale_size_area(trans = "reverse")+
scale_size_continuous(range = c(5,0),limits=c(-12,0))+
scale_x_discrete(limits=c("CP1","P7","P3","Pz","PO3","T1", "M1","Oz","M2","T2","PO4","P4","P8", "CP2","Cz","Fz")) +
scale_y_continuous(limits = c(1,16),breaks=seq(1,16,1))
The issue that I am having is that I need to loop through the last dimension. I have run some more analyses and instead of the last dimension of the matrices being 2, it's now 21. I used to just have two scripts, one for each index of the last dimension (not very efficient, I know). One script used
r_mat = matrix(CCO[, , 1,1], 16)
and the other for
r_mat = matrix(CCO[, , 1,2], 16)
But now of course I can't have 21 scripts but I'm unsure how to loop and plot in R.
Can anyone help me with this? So I could loop through the last dimension and plot 21 figures using ggplot?
Thanks!
Here is the data, I have reproduced a smaller matrix such that both matrices are not dimension 16x16x1x2.
CCO<-structure(c(-0.492578655481339, NaN, NaN, NaN, -0.492525190114975,
-0.492525696754456, NaN, -0.492627799510956, -0.492677986621857,
-0.492468953132629, NaN, NaN, NaN, -0.49228835105896, -0.492546766996384,
-0.492437690496445, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, -0.521651923656464, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
0.473261743783951, NaN, 0.472789525985718, -0.600778460502625,
NaN, NaN, -0.600829541683197, -0.6008580327034, -0.601057589054108,
NaN, -0.600822031497955, -0.600911736488342, -0.600730240345001,
NaN, NaN, NaN, -0.600953936576843, -0.600802004337311, -0.600861430168152,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, -0.521026790142059, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, -0.577225089073181, NaN, NaN, -0.577208399772644,
-0.577145278453827, -0.577321112155914, NaN, -0.577184557914734,
-0.577165722846985, -0.577133357524872, NaN, NaN, NaN, -0.577190637588501,
-0.577230930328369, -0.577144026756287, -0.41020467877388, NaN,
NaN, NaN, -0.410186648368835, -0.410334318876266, NaN, -0.410211980342865,
-0.410197377204895, -0.410110324621201, NaN, NaN, NaN, -0.410272806882858,
NaN, NaN, -0.733388960361481, NaN, NaN, NaN, NaN, -0.733434438705444,
NaN, -0.733347833156586, -0.733303666114807, -0.733347356319427,
NaN, NaN, NaN, NaN, -0.733397245407104, -0.73332667350769, -0.702324509620667,
NaN, NaN, NaN, NaN, NaN, NaN, -0.702237844467163, -0.702238082885742,
-0.702193081378937, NaN, NaN, NaN, -0.702261865139008, -0.702301025390625,
NaN, -0.80294394493103, NaN, NaN, -0.802956938743591, -0.802938997745514,
-0.803096830844879, NaN, -0.802961885929108, -0.802923500537872,
-0.802861630916595, NaN, NaN, NaN, -0.803063333034515, -0.802979350090027,
-0.802873134613037, -0.684592604637146, NaN, NaN, -0.684580564498901,
-0.684580743312836, -0.684802889823914, NaN, -0.684630811214447,
-0.684578239917755, -0.684465110301971, NaN, NaN, NaN, -0.684730887413025,
-0.684608578681946, -0.684436023235321, -0.606923937797546, NaN,
NaN, NaN, -0.606987476348877, NaN, NaN, -0.606982827186584, NaN,
NaN, NaN, NaN, NaN, -0.606993675231934, NaN, NaN, -0.746234655380249,
NaN, NaN, -0.7463099360466, -0.746258854866028, -0.746564209461212,
NaN, -0.746362566947937, -0.746387183666229, -0.746385276317596,
NaN, NaN, NaN, -0.746756434440613, -0.746286571025848, -0.746472299098969,
NaN, NaN, NaN, NaN, -0.526792407035828, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, -0.526629209518433, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, -0.402197241783142, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, -0.515719473361969, NaN, NaN, NaN, -0.515782594680786,
-0.516006171703339, NaN, -0.515946447849274, -0.515853404998779,
-0.515883803367615, NaN, NaN, NaN, -0.515994668006897, -0.515867114067078,
-0.515911042690277, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, -0.4820496737957, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
0.535082995891571, NaN, 0.534462213516235, -0.567049205303192,
NaN, NaN, -0.567097425460815, -0.567124307155609, -0.567312657833099,
NaN, -0.567090332508087, -0.567174971103668, -0.567003667354584,
NaN, NaN, NaN, -0.567214787006378, -0.567071437835693, -0.567127525806427,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, -0.437827885150909, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, -0.496517241001129, NaN, NaN, -0.496502816677094,
-0.496448516845703, -0.496599793434143, NaN, -0.496482282876968,
-0.496466100215912, -0.496438264846802, NaN, NaN, NaN, -0.496487557888031,
-0.496522217988968, -0.496447324752808, 0.43168780207634, NaN,
NaN, NaN, 0.43162015080452, 0.431624948978424, NaN, 0.43173423409462,
0.431787043809891, 0.431506514549255, NaN, NaN, NaN, 0.431388199329376,
NaN, NaN, -0.673626005649567, NaN, NaN, NaN, NaN, -0.673667669296265,
NaN, -0.67358809709549, -0.673547565937042, -0.673587679862976,
NaN, NaN, NaN, NaN, -0.673633456230164, -0.673568665981293, -0.657320320606232,
NaN, NaN, NaN, NaN, NaN, NaN, -0.65728884935379, -0.657253861427307,
-0.657285273075104, NaN, NaN, NaN, -0.657291948795319, -0.657335460186005,
NaN, -0.793729186058044, NaN, NaN, -0.793741881847382, -0.793724238872528,
-0.793880224227905, NaN, -0.793746829032898, -0.793708860874176,
-0.793647706508636, NaN, NaN, NaN, -0.793846964836121, -0.793764173984528,
-0.793659150600433, -0.639408528804779, NaN, NaN, -0.639397382736206,
-0.63939756155014, -0.639605164527893, NaN, -0.639444351196289,
-0.63939505815506, -0.639289438724518, NaN, NaN, NaN, -0.639537692070007,
-0.639423429965973, -0.63926237821579, -0.567462205886841, NaN,
NaN, NaN, -0.567524492740631, NaN, NaN, -0.567518472671509, NaN,
NaN, NaN, NaN, NaN, -0.567527711391449, NaN, NaN, -0.76900988817215,
NaN, NaN, -0.769101619720459, -0.769054174423218, -0.769321501255035,
NaN, -0.769179046154022, -0.769175291061401, -0.769182145595551,
NaN, NaN, NaN, -0.769531965255737, -0.769078016281128, -0.769262313842773,
NaN, NaN, NaN, NaN, -0.0669489949941635, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, -0.0665916055440903, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, 0.425303876399994, NaN, NaN, NaN,
NaN, NaN, NaN, NaN), .Dim = c(16L, 16L, 2L))
CCO_lag<-structure(c(0, NaN, NaN, NaN, 0, 0, NaN, 0, 1, 0, NaN, NaN, NaN,
1, 0, 0, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, -3, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 5, NaN, 5, -3, NaN, NaN,
-3, -3, -3, NaN, -3, -3, -3, NaN, NaN, NaN, -3, -3, -3, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, -1, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, -3, NaN, NaN, -3, -3, -3, NaN, -3, -3, -3, NaN, NaN,
NaN, -3, -3, -3, -4, NaN, NaN, NaN, -4, -4, NaN, -4, -4, -4,
NaN, NaN, NaN, -4, NaN, NaN, 0, NaN, NaN, NaN, NaN, 0, NaN, 0,
0, 0, NaN, NaN, NaN, NaN, 0, 0, 0, NaN, NaN, NaN, NaN, NaN, NaN,
0, 0, 0, NaN, NaN, NaN, 0, 0, NaN, 0, NaN, NaN, 1, 0, 1, NaN,
0, 1, 1, NaN, NaN, NaN, 1, 0, 1, -2, NaN, NaN, -2, -2, -2, NaN,
-2, -2, -1, NaN, NaN, NaN, -2, -2, -2, 0, NaN, NaN, NaN, 0.5,
NaN, NaN, 0.5, NaN, NaN, NaN, NaN, NaN, 0.5, NaN, NaN, 1, NaN,
NaN, 1, 1, 1, NaN, 1, 1, 1, NaN, NaN, NaN, 1, 1, 1, NaN, NaN,
NaN, NaN, 0, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
0, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 0.5, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, 0, NaN, NaN, NaN, 0, 0, NaN, 0, 0, 0, NaN,
NaN, NaN, 0, 0, 0, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, -4, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, 5, NaN, 5,
-2, NaN, NaN, -2, -2, -2, NaN, -2, -2, -2, NaN, NaN, NaN, -2,
-2, -2, NaN, NaN, NaN, NaN, NaN, NaN, NaN, -2, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, NaN, -2, NaN, NaN, -2, -2, -2, NaN, -2, -2,
-2, NaN, NaN, NaN, -2, -2, -2, -2, NaN, NaN, NaN, -2, -2, NaN,
-2, -2, -2, NaN, NaN, NaN, -2, NaN, NaN, -1, NaN, NaN, NaN, NaN,
-1, NaN, -1, -1, -1, NaN, NaN, NaN, NaN, -1, -1, -1, NaN, NaN,
NaN, NaN, NaN, NaN, -1, -1, -1, NaN, NaN, NaN, -1, -1, NaN, 0,
NaN, NaN, 0, 0, 0, NaN, 0, 0, 0, NaN, NaN, NaN, 0, 0, 0, -3,
NaN, NaN, -3, -3, -3, NaN, -3, -3, -2, NaN, NaN, NaN, -3, -3,
-3, 0, NaN, NaN, NaN, 0, NaN, NaN, 0, NaN, NaN, NaN, NaN, NaN,
0, NaN, NaN, 0, NaN, NaN, 0, 0, 0, NaN, 0, 0, 0, NaN, NaN, NaN,
0, 0, 0, NaN, NaN, NaN, NaN, 0, NaN, NaN, NaN, NaN, NaN, NaN,
NaN, NaN, NaN, NaN, 0, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN,
0, NaN, NaN, NaN, NaN, NaN, NaN, NaN), .Dim = c(16L, 16L, 2L))
You could loop along the desired dimension of your array by using lapply(seq_len(dim(my_array)[n]), ...), wherein n is your dimension of interest.
If you then use function(i) {...} inside the lapply() and put the i at the correct spot in the subsetting operation, it should pick out the appropriate data.
If the last line of the function outputs a ggplot object, it automatically gets saved in a list. Simplified example below:
library(ggplot2)
CCO<- array(rnorm(prod(16, 2, 1, 21)), c(16, 2, 1, 21))
CCO_lag <- array(rnorm(prod(16, 2, 1, 21)), c(16, 2, 1, 21))
plots <- lapply(seq_len(dim(CCO)[4]), function(i) {
lag_mat = matrix(CCO_lag[, , 1,i], 16)
r_mat = matrix(CCO[, , 1,i], 16)
row = c(row(lag_mat))
col = c(col(lag_mat))
dd = data.frame( lag = c(lag_mat), r = c(r_mat), row, col )
ggplot(dd, aes(x = row, y = col)) +
geom_point(alpha = 1.5, stroke = 2.5)
})
# Just to show plots come out
patchwork::wrap_plots(plots)
Created on 2021-01-07 by the reprex package (v0.3.0)

How to obtain the path from a traveling salesman problem in R using TSP package

Let's suppose I have the following cost matrix, and I would like the path (and total cost) starting from node 20 from the perspective of Traveling salesman problem via nearest insertion method.
ds.ex <- structure(c(0, Inf, Inf, 1.9, 1.7, Inf, 0, 7.3, 7.4, 7.2, Inf,
7.3, 0, 7.7, 7.8, 1.9, 7.4, 7.7, 0, 9.2, 1.7, 7.2, 7.8, 9.2,
0), .Dim = c(5L, 5L), .Dimnames = list(c("2", "13", "14", "17",
"20"), c("2", "13", "14", "17", "20")))
ds.ex
2 13 14 17 20
2 0.0 Inf Inf 1.9 1.7
13 Inf 0.0 7.3 7.4 7.2
14 Inf 7.3 0.0 7.7 7.8
17 1.9 7.4 7.7 0.0 9.2
20 1.7 7.2 7.8 9.2 0.0
I am using TSP package to solve:
ds.ex.tsp <- as.TSP(ds.ex)
(a <- solve_TSP(ds.ex.tsp, method = "nearest_insertion", start=5))
object of class ‘TOUR’
result of method ‘nearest_insertion’ for 5 cities
tour length: 25.8
Can I get the path from:
attr(a, "names")
[1] "20" "2" "17" "14" "13"
?
If that is really the path, why isn't the result 20-2-17-13-14? After nodes 20, 2 and 17 have been visited, the node with the smaller cost is 13, not 14.
Thanks in advance!
We can use labels.TSP, i.e.
library(TSP)
ds.ex.tsp <- as.TSP(ds.ex)
a <- solve_TSP(ds.ex.tsp, method = "nearest_insertion", start = 5)
labels(a)
#[1] "20" "13" "14" "17" "2"
Note that the nearest insertion heuristic does not simply append the closest unvisited city to the end of the path. At each step it picks the city that is closest to any city already on the tour and then inserts it at the position where it increases the tour length the least, so a city can end up between two cities that were added earlier. A city is chosen at random if two cities have the same distance, so solve_TSP may return different tours upon replication. This seems to be the case in the example you give.
Sample data
ds.ex <- structure(c(0, Inf, Inf, 1.9, 1.7, Inf, 0, 7.3, 7.4, 7.2, Inf,
7.3, 0, 7.7, 7.8, 1.9, 7.4, 7.7, 0, 9.2, 1.7, 7.2, 7.8, 9.2,
0), .Dim = c(5L, 5L), .Dimnames = list(c("2", "13", "14", "17",
"20"), c("2", "13", "14", "17", "20")))

R - convert nan to 0 results in all 0's

I have a data frame containing NaN's that I'd like to convert to 0's. I wrote a function that I think should work:
fix_nan <- function(x){
return(x[is.nan(x)] <- 0)
}
And then I apply it to the data frame:
train_e <- structure(list(pack_id = structure(1:10, .Label = c("1", "2",
"4", "5", "7", "8", "9", "10", "11", "14"), class = "factor"),
item_1 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), item_2 = c(NaN,
NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN), item_3 = c(1.45225232891169,
0.613104472886409, NaN, 1.02450431651439, 0.735706794978741,
0.741937344729377, NaN, 0.83034830207343, 0.97650959186721,
0.750305594399894), item_4 = c(0.645137961373585, 0.615792803650477,
Inf, 0.752866415261568, 0.84901755126673, 0.646398200985872,
Inf, 0.786548355648346, 0.725113372622438, 0.709897990984761
), item_5 = c(NaN, NaN, NaN, 0, 0, 0, NaN, NaN, 0, 0), item_6 = c(0.510825623765991,
0.510825623765991, NaN, 0.510825623765991, 0.510825623765991,
0.510825623765991, NaN, 0.510825623765991, 0.847297860387204,
0.510825623765991)), .Names = c("pack_id", "item_1", "item_2",
"item_3", "item_4", "item_5", "item_6"), row.names = c(26155L,
6236L, 6281L, 6014L, 6035L, 26217L, 5576L, 6316L, 5594L, 26244L
), class = "data.frame")
vtf1 <- c('item_1','item_2','item_3','item_4','item_5','item_6')
train_e[,vtf1] <- as.data.frame(lapply(train_e[,vtf1], fix_nan))
head(train_e)
And I get all 0's:
> head(train_e)
pack_id item_1 item_2 item_3 item_4 item_5 item_6
26155 1 0 0 0 0 0 0
6236 2 0 0 0 0 0 0
6281 4 0 0 0 0 0 0
6014 5 0 0 0 0 0 0
6035 7 0 0 0 0 0 0
26217 8 0 0 0 0 0 0
Any suggestions ?
Inside return(), the assignment x[is.nan(x)] <- 0 evaluates to the assigned value (0), not to the modified x, so fix_nan returns 0 for every column. To fix this, do the assignment first and then return x:
fix_nan <- function(x){
x[is.nan(x)] <- 0
x
}

R - Error using sapply to remove constant columns in matrix

I'm trying to remove all columns that are constant in a matrix, but am receiving this error:
Error in X[, sapply(X, function(x) length(unique(x)) != 1)] :
(subscript) logical subscript too long
I'm not entirely sure why this error is popping up
Example
X <- structure(c(143.3, 152.37, 138.74, 149.87, 103.21, 130.98, 151.21,
103.34, 126.5, 86.87, 561.24, 633.21, 529.73, 621.18, 319.53,
476.16, 620.08, 279.21, 416.97, 184.58, 25.97, 30.05, 17.14,
37.7, 9.7, 15.9, 24.95, -1.84, 7.5, -9.95, 4.74, 14.32, 4.39,
5.1, 5.46, 4.87, 7.21, 4.31, 3.77, 4.32, 22.47, 205.1, 19.29,
25.96, 29.8, 23.74, 52.04, 18.6, 14.18, 18.66, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0), .Dim = c(10L, 8L), .Dimnames = list(c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10"), c("dday0_10", "dday10_30",
"dday30C", "prec", "prec_sq", "(Intercept)", "statear", "statede"
)))
X[,sapply(X,function(x) length(unique(x))!=1)]
> Error in X[, sapply(X, function(x) length(unique(x)) != 1)] :
(subscript) logical subscript too long
I'd like solutions which keep the data in a matrix format.
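sapply(X, ...) treats the matrix as a plain vector and calls the function once per element, so it returns 80 logical values for a matrix that has only 8 columns, hence "logical subscript too long". If you need to keep your data in matrix format, apply the function over the columns instead: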
X[,apply(X,2,function(x) length(unique(x))!=1)]
Output:
dday0_10 dday10_30 dday30C prec prec_sq
1 143.30 561.24 25.97 4.74 22.47
2 152.37 633.21 30.05 14.32 205.10
3 138.74 529.73 17.14 4.39 19.29
4 149.87 621.18 37.70 5.10 25.96
5 103.21 319.53 9.70 5.46 29.80
6 130.98 476.16 15.90 4.87 23.74
7 151.21 620.08 24.95 7.21 52.04
8 103.34 279.21 -1.84 4.31 18.60
9 126.50 416.97 7.50 3.77 14.18
10 86.87 184.58 -9.95 4.32 18.66
