Find the minimum number of shipments between two countries to minimize the cost of the system - recursion

Suppose we are transporting cargo between two countries. We have a list of containers with different weights, and our goal is to minimize the number of shipments, which in turn minimizes the cost of the system.
In this problem, our ships have a limited capacity to load containers for each shipment. For example:
Maximum weight per shipment = 80 and list of container weights containers = [19, 29, 43, 45, 32, 22, 51, 65, 31, 13, 62]
Here is the code I've written so far:
from itertools import chain, combinations

def powerset(list_name):
    s = list(list_name)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# Container weights from the example above
A = [19, 29, 43, 45, 32, 22, 51, 65, 31, 13, 62]
print(A)

res = []
for x in powerset(sorted(A)):
    # Keep each distinct combination whose weights exactly fill one shipment
    if sum(x) == 80 and x not in res:
        res.append(x)
print(res)
And I got the output as:
[(29, 51), (13, 22, 45), (19, 29, 32)]
Here 29 occurs in two different combinations, which shouldn't happen because each container can only be shipped once. I want to find the remaining possible combinations so that the overall output is 5.
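One way to avoid the same container appearing in two shipments is to build disjoint groups rather than enumerating all subsets. Below is a minimal first-fit-decreasing sketch (a standard bin-packing heuristic, offered as a starting point rather than the recursive approach the title asks for); min_shipments and its capacity parameter are names introduced here for illustration.

def min_shipments(weights, capacity=80):
    # Greedy first-fit-decreasing: place each container (heaviest first)
    # into the first ship with room, opening a new ship when none fits.
    # Bin packing is NP-hard, so this heuristic is not guaranteed optimal.
    ships = []
    for w in sorted(weights, reverse=True):
        for ship in ships:
            if sum(ship) + w <= capacity:
                ship.append(w)
                break
        else:
            ships.append([w])
    return ships

containers = [19, 29, 43, 45, 32, 22, 51, 65, 31, 13, 62]
groups = min_shipments(containers)
print(len(groups), groups)

Note that these weights sum to 412, so if every container must be shipped, at least ceil(412 / 80) = 6 shipments are needed; the greedy packing above attains that bound.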

Related

Kaplan-Meier (KM) / Cumulative Hazard (CH) in R

I have survival data, not as a data frame, but as vectors of group sizes and cumulative events in each group at consecutive time points (5 days apart, for that matter), like this:
Note that people usually move from group 1 to group 2 during the study period
(and some are censored from either group for other reasons):
group_1_sizes_at_time_points = c(550648, 453524, 329688, 284252, 264512, 250861, 243292, 238311, 233847)
group_2_sizes_at_time_points = c(12817, 109774, 233373, 278549, 298038, 311424, 318775, 323619, 328022)
group_1_cumulative_deaths_at_time_points = c(0, 2, 17, 51, 85, 125, 163, 193, 232)
group_2_cumulative_deaths_at_time_points = c(0, 0, 4, 17, 28, 49, 65, 79, 92)
Which function should I use to calculate the Kaplan-Meier estimate / cumulative hazard and plot something like this, and how should I organize the data in order to pass it to that function?
What I have found only works for a data frame organized the way it is expected to be.
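The usual route in R is the survival package (survfit() with a Surv object), but that typically expects individual-level rows. Since the question only has interval counts, here is a minimal sketch of the underlying arithmetic, written in Python with NumPy for illustration: the Kaplan-Meier estimate multiplies the per-interval survival fractions, and the Nelson-Aalen cumulative hazard sums the per-interval death rates. It treats each group size as the number at risk at the start of its interval, which is an assumption given the movement between groups.

import numpy as np

# Group 1's numbers from the question
n_at_risk  = np.array([550648, 453524, 329688, 284252, 264512,
                       250861, 243292, 238311, 233847])
cum_deaths = np.array([0, 2, 17, 51, 85, 125, 163, 193, 232])

# Deaths occurring within each interval
d = np.diff(cum_deaths, prepend=0)

# Kaplan-Meier survival: running product of (1 - d_i / n_i)
km = np.cumprod(1 - d / n_at_risk)

# Nelson-Aalen cumulative hazard: running sum of d_i / n_i
ch = np.cumsum(d / n_at_risk)

print(km)
print(ch)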

Creating subgraphs with overlapping vertices

I've been looking for packages with which I could create subgraphs with overlapping vertices.
From what I understand, with NetworkX and METIS one can partition a graph into two or more parts, but I couldn't find how to partition it into subgraphs with overlapping nodes.
Suggestions on libraries that support partitioning with overlapping vertices would be really helpful.
EDIT: I tried the angel algorithm in CDLIB to partition the original graph into subgraphs with 4 overlapping nodes.
import networkx as nx
from cdlib import algorithms

if __name__ == '__main__':
    g = nx.karate_club_graph()
    coms = algorithms.angel(g, threshold=4, min_community_size=10)
    print(coms.method_name)
    print(coms.method_parameters)  # clustering parameters
    print(coms.communities)
    print(coms.overlap)
    print(coms.node_coverage)
Output:
ANGEL
{'threshold': 4, 'min_community_size': 10}
[[14, 15, 18, 20, 22, 23, 27, 29, 30, 31, 32, 8], [1, 12, 13, 17, 19, 2, 21, 3, 7, 8], [14, 15, 18, 2, 20, 22, 30, 31, 33, 8]]
True
0.6470588235294118
From the communities returned, I understand that communities 1 and 3 have an overlap of 4 nodes, but 2 and 3 or 1 and 2 don't have an overlap size of 4 nodes.
It is not clear to me how the overlap threshold (4 overlapping nodes) has to be specified in algorithms.angel(g, threshold=4, min_community_size=10). I set threshold=4 there to try to define an overlap size of 4 nodes. However, the documentation available for angel says:
:param threshold: merging threshold in [0,1].
I am not sure how to translate the 4 overlaps into a value within the bounds [0, 1]. Suggestions will be really helpful.
You can check out CDLIB:
they have a large collection of community detection algorithms applicable to NetworkX graphs, including several overlapping-community algorithms.
On a side note:
the return type of these functions is called NodeClustering, which might be a little confusing at first, so here are the methods applicable to it; usually you simply want to convert the result to a Python dictionary.
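For instance, assuming the coms object from the question's snippet, NodeClustering's to_node_community_map() helper does that conversion:

# Map each node to the ids of the communities that contain it
node_to_coms = coms.to_node_community_map()
print(dict(node_to_coms))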
Specifically about the angel algorithm in CDLIB:
According to ANGEL: efficient, and effective, node-centric community discovery in static and dynamic networks, the threshold is not the overlapping threshold; it is used as follows:
If the ratio is greater than (or equal to) a given threshold, the merge is applied and the node label updated.
Basically, this value determines whether to further merge nodes into bigger communities; it is not equivalent to the number of overlapping nodes.
Also, don't confuse these "labels" with node labels (as in nx.relabel_nodes(G, labels)). The "labels" referred to here are the ones used by the Label Propagation Algorithm, on which ANGEL builds.
As for the effects of varying this threshold:
[...] Increasing the threshold, we obtain a higher number of communities since lower quality merges cannot take place.
[based on the comment by J. M. Arnold]
From ANGEL's GitHub repository you can see that when threshold >= 1, only the min_comsize value is used:
self.threshold = threshold
if self.threshold < 1:
    self.min_community_size = max([3, min_comsize, int(1. / (1 - self.threshold))])
else:
    self.min_community_size = min_comsize
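To see this in practice, here is a small sweep sketch over the karate-club graph from the question (exact community counts will depend on your cdlib version):

import networkx as nx
from cdlib import algorithms

g = nx.karate_club_graph()
# threshold is a merging ratio in [0, 1]; raising it blocks low-quality
# merges, which typically yields a higher number of communities.
for t in (0.25, 0.5, 0.75):
    coms = algorithms.angel(g, threshold=t, min_community_size=4)
    print(t, len(coms.communities))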

How to do Fourier frequency matrix multiplication if the sizes are different?

Sorry, this is not a programming issue.
I am just confused about this theorem:
The FFT of a convolution equals the pointwise multiplication of the operands' FFTs.
i.e.:
FFT(conv(x,y)) = FFT(x) * FFT(y)
For the left side:
let's say I have an image of size 100x100 and a 3x3 kernel. If I convolve them, I get a 98x98 matrix, so its FFT will also be 98x98.
For the right side:
if I take the FFT of each, I get frequency matrices of 3x3 and 100x100 respectively.
Then how should I do the multiplication? Some of you may say we can pad the 3x3 kernel to 100x100 and take the FFT, but then we still get a 100x100 matrix instead of 98x98.
Can someone give me some hints?
A convolution of two signals of size L and P respectively will have a result of size N = L + P - 1.
Therefore, the mathematically correct implementation of conv(x,y) will have size 102x102. You should zero-pad both x and y to size 102.
When you perform the convolution as CNN convolution layers do (which is what I think you are doing) without any zero padding, you are actually cropping the result (you are leaving out the border results).
Therefore, you can just compute the 102x102 FFT result and crop accordingly to get the 98x98 result (crop 2 at the start and 2 at the end).
ATTENTION: Unlike how zero padding usually works for convolutional layers, in this case add the zeros at the END. If not, you will introduce a shift that is reflected in a shift of the output, e.g. the expected result could be [1, 2, 3, 4], and if you add 1 zero at the beginning and 1 at the end (instead of 2 at the end) you will get [4, 1, 2, 3].
ATTENTION 2: Not extending the sizes to 102 when using the ifft(fft()) technique will produce something called aliasing (wrap-around). It would turn, for example, an expected result of 30, 31, 57, 47, 87, 47, 33, 27, 5 into 77, 64, 84, 52, 87. Note this result is actually the product of computing:
30, 31, 57, 47, 87
+ 47, 33, 27, 5
--------------------
77, 64, 84, 52, 87
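Here is a minimal NumPy sketch of the pad-then-crop procedure described above, using the 100x100 image / 3x3 kernel sizes from the question (SciPy's convolve2d is used only to verify the result):

import numpy as np
from scipy.signal import convolve2d

x = np.random.rand(100, 100)  # the "image"
k = np.random.rand(3, 3)      # the kernel

# Pad both to the full linear-convolution size L + P - 1 = 102;
# fft2's s argument appends zeros at the end, as required.
N = x.shape[0] + k.shape[0] - 1
full = np.real(np.fft.ifft2(np.fft.fft2(x, s=(N, N)) *
                            np.fft.fft2(k, s=(N, N))))  # 102x102

# Crop 2 at the start and 2 at the end to get the 98x98 "valid" part
valid = full[2:-2, 2:-2]

assert np.allclose(valid, convolve2d(x, k, mode="valid"))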

Finding value of Y-axis from a given X-axis value in R

Fairly new to R, and I'm trying to run an analysis of FTIR spectra for my dissertation using the ChemoSpec package. In specialist software like Spectragryph (which I can't access on my own computer, hence using R) it's possible to locate peak values very easily, but I can't seem to work out the right way to do it here.
This is the formula I'm hoping to apply to all of my spectra:
Carbonyl Index (CI) = Absorbance at 1740 cm-1 (the maximum of the carbonyl
peak) / Absorbance at 1460 cm-1
Here is an example of the plot code for the spectra:
## ChemoSpec plot
plotSpectra(HDPE_samples,
            main = "48 hr exposure",
            which = c(8, 9, 10, 11, 12, 13, 14, 15, 16,
                      39, 40, 41, 42, 43, 44, 60, 61),
            ## y axis shows absorbance (%)
            yrange = c(0, 0.9),
            offset = 0.005,
            lab.pos = 2450,
            ## x axis shows wave numbers (cm-1)
            xlim = c(1300, 3000))
For now I'd be happy just to retrieve the absorbance values associated with the wavenumbers in the formula, if anyone could give me pointers on which functions/packages to look at.
Here is an example of reading data at a specific frequency.
library(ChemoSpec)
#> Loading required package: ChemoSpecUtils
data(metMUD1)
plotSpectra(metMUD1)
# Where is the maximum of signal 1?
which.max(metMUD1$data[1,])
#> [1] 1098
# What is the frequency and intensity at the max value?
metMUD1$freq[1098]
#> [1] 1.340894
metMUD1$data[1, 1098]
#> [1] 0.0680055
Created on 2020-01-15 by the reprex package (v0.3.0)
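The same idea covers the wavenumbers in the carbonyl-index formula: find the index of the frequency-axis value closest to the target and read off the intensity there. A minimal illustration of that lookup, written in Python with hypothetical freq/absorbance arrays standing in for a Spectra object's $freq and one row of $data:

import numpy as np

freq = np.linspace(400, 4000, 3600)     # wavenumbers (cm-1), hypothetical
absorbance = np.random.rand(freq.size)  # intensities, hypothetical

def intensity_at(target):
    # Index of the wavenumber closest to the target value
    i = np.argmin(np.abs(freq - target))
    return absorbance[i]

# Carbonyl index from the question's formula
ci = intensity_at(1740) / intensity_at(1460)
print(ci)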

Extracting information on terminal nodes in partykit::ctree with a large number of multivariate responses

I am using partykit::ctree to explore my dataset, which is a set of about 15,000 beach surveys investigating the number of pieces of debris found across 50 different categories. There are lots of zeros in the data and a large spread of total debris amounts. I also have a series of independent variables, including some factors, some count data, and some continuous data.
Here is a very small sample dataset:
Counts <- as.data.frame(matrix(rpois(100, 1), ncol = 5))
colnames(Counts) <- c("Glass", "HardPlastic", "SoftPlastic", "PlasticBag", "Fragments")
State <- rep(c("CA", "OR", "WA"), each = 6)
Counts$State <- c(State, "CA", "OR")
County <- rep((1:9), each = 2)
Counts$County <- c(County, 1, 4)
Counts$Distance <- c(10, 15, 13, 19, 18, 23, 38, 40, 49, 44, 47, 45, 52, 53, 55, 59, 51, 53, 14, 33)
Year <- rep(c("2010", "2011", "2012"), times = 7)
Counts$Year <- Year[1:20]
I have used the following code to partition my data:
M.2 <- ctree(Glass + HardPlastic + SoftPlastic + PlasticBag + Fragments ~
               as.factor(State) + as.factor(County) + Distance + as.factor(Year),
             data = Counts)
plot(M.2, terminal_panel = node_barplot, cex = 0.5)
This comes up with a lovely graph, but how do I extract the membership of each of the terminal nodes? I can see it in the graph if there are only a few items, but once the number of possible categories increases to 50, it becomes much harder to read graphically. I would like to see the information contained within the nodes, particularly the relative probabilities of each individual category being contained in each terminal node.
I know that if this were a BinaryTree object I could use the nodes argument, but when I query class(M.2) it tells me it is from the constparty class, and I haven't been able to find out how to get node information from this class.
I have also run into a secondary problem: when I run ctree on my sample data set, it crashes R every time! It works fine with my actual data set, but I can't figure out what is wrong with the sample set.
EDIT: The desired output would be something along the lines of:
Node15:
Hard Plastic 30
Glass 5
Soft Plastic 23
Plastic Bag 6
Fragments 12
I just e-mailed the package maintainer (Torsten Hothorn), the principal author of ctree(), to whom such requests would really best be directed. (He currently does not participate in SO.) Apparently this is a bug in the partykit version of ctree(), and he is working on resolving it. For the time being it is best to use the old party version for this, and hopefully a fixed partykit version will become available soon.
