How can I find the percentage of variables relative to each other?

I have a large dataset from my project, and I am trying to work out the percentages of the variables relative to each other, but I could not get any result from my data. For example, in a cherry dataset I need to find the percentage of sugar for each color; in other words, I want the sugar ratio per color.

Note that you didn't specify any technology, so I'll keep this generic.
There are several ways to do this. This is one way:
1. Add up all the values in your collection to get the total.
2. For each item (or group), divide its value by that total to get the ratio, and multiply by 100 to get the percentage.
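Since no technology was specified, here is a minimal sketch in R; the `values` vector and the `cherries` data frame are made-up examples:

```r
# Percentage of the total contributed by each value.
values <- c(3, 7, 10)            # example data
total  <- sum(values)            # step 1: add up all the values
values / total * 100             # step 2: divide by the total, times 100 -> 15 35 50

# Grouped version, e.g. each colour's share of the total sugar.
cherries <- data.frame(
  color = c("red", "red", "yellow", "dark"),
  sugar = c(12, 8, 15, 5)
)
by_color <- tapply(cherries$sugar, cherries$color, sum)
by_color / sum(by_color) * 100   # dark 12.5, red 50, yellow 37.5
```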

Related

Turning a band of a Sentinel-2 image into an array

I am new to Google Earth Engine and have started playing with mathematically combining different bands to define a new index. The problem I am having is the visualisation of the new index: I need to define the min and max parameters when adding it to the map, and I am having trouble understanding what these two end points should be. So here come my two questions:
Is it possible to get the matrix of my image in terms of pixel values? Then I could easily see what range of values they cover and hence define the min and max!
What values are taken in different bands? Is it from 0 to 1, measuring intensity at a given wavelength, or is it something else?
Any help would be much appreciated, many thanks in advance!
Is it possible to get the matrix of my image in terms of pixel values? Then I could easily see what range of values they cover and hence define the min and max!
If this is what you want to do, there's a built in way to do it. Go to the layer list, click on the gear for the layer, and in the “Range” section, pick one of the “Stretch:” options from the menu, then click “Apply”. You can choose a range in standard deviations, or 100% (min and max).
You can then use the “Import” button to save these parameters as a value you can use in your script.
(All of this applies to the region of the image that's currently visible on screen — not the entire image.)
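If you would rather compute the range programmatically than through the layer settings, something along these lines should work. This sketch uses the rgee R bindings (in the Code Editor the JavaScript equivalent is ee.Reducer.minMax()); the asset ID, band name, and scale are placeholders, not values from the question:

```r
library(rgee)
ee_Initialize()

# Hypothetical Sentinel-2 scene and band; substitute your own image.
img <- ee$Image("COPERNICUS/S2_SR/PLACEHOLDER_SCENE_ID")$select("B4")

# Ask Earth Engine for the min and max pixel values over the image footprint.
stats <- img$reduceRegion(
  reducer   = ee$Reducer$minMax(),
  geometry  = img$geometry(),
  scale     = 10,      # metres; B4's native resolution for Sentinel-2
  maxPixels = 1e9
)
stats$getInfo()        # e.g. list(B4_max = ..., B4_min = ...)
```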
What values are taken in different bands? Is it from 0 to 1, measuring intensity at a given wavelength, or is it something else?
This is entirely up to the individual dataset you are using; Earth Engine only knows about numbers stored in bands and not units of measure or spectra. There may be sufficient information in the dataset's description in the data catalog, or you may need to consult the original provider's documentation.

Is there a way to use `sf::st_join` to get the proportion of area captured?

I want to use:
st_join(polygon_A, polygon_B, join=st_intersects)
to get the same output as the current version, plus an extra column that specifies the proportion of the area of polygon_A captured by polygon_B? Ideally this would work with largest=TRUE (returning unique records of polygon_A with only one match from polygon_B) or with largest=FALSE (returning all matches from polygon_B, each with the proportion of polygon_A's area it captures).
My workaround was to use st_intersection(polygon_A, polygon_B), calculate the area of the new shapes (the result of the intersection) using st_area, and then divide it by the area of the original shape from polygon_A. Is there a better way of achieving this? Or are there plans to add this as a feature to st_join?
For the record and to avoid the XY problem, my objective is to identify cases where the proportions are close to 0.50/0.50 or 0.70/0.30 and take a closer look to decide if largest=TRUE is meaningful. I am not very interested in cases where the proportions are 0.99/0.01 - largest=TRUE is good enough for these cases.
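For what it's worth, a minimal sketch of that workaround, assuming polygon_A and polygon_B are sf objects in a projected CRS so that areas are meaningful:

```r
library(sf)

# Keep each original area of polygon_A as an attribute; st_intersection
# carries attributes over to the intersected pieces.
polygon_A$area_A <- st_area(polygon_A)
pieces <- st_intersection(polygon_A, polygon_B)

# Proportion of each polygon_A feature captured by the overlapping polygon_B.
pieces$prop_of_A <- as.numeric(st_area(pieces) / pieces$area_A)
```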

How to nudge a graph to the right without altering the total area it covers

I have a document collection of size 1000; each document has one feature, a vector with 5 elements whose total sum equals 100. So, for example, a document can have the feature [10,15,40,20,15].
Each vector element corresponds to a sentiment, ranging from very negative to very positive.
The results I get for the 1000 text documents come out a little on the negative side, so I am trying to nudge them all a little to the right without altering the total sum.
For example, [10,15,40,20,15] should, after applying the formula, result in [7,13,32,40,8].
How can I manage this?
Thanks in advance!
As I understand it, you want the first (left) elements of the vector to get smaller and the right-hand ones to get bigger, right? This can be accomplished by adding something like [-10,-5,0,5,10] to each vector.
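A minimal sketch of that idea in R (the offset values are arbitrary; any offset that sums to zero leaves the total unchanged):

```r
# Shift mass to the right with an offset that sums to zero,
# so the total of each vector is preserved.
v      <- c(10, 15, 40, 20, 15)
offset <- c(-10, -5, 0, 5, 10)   # sums to zero
v + offset                       # 0 10 40 25 25 -- still sums to 100
```

One caveat: a fixed offset can push small entries below zero, so you may need to scale it down for vectors with small left-hand values.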
If the issue is that the corpus is genuinely more negative than you'd like it to be, then how about prepending to each document, just before the analysis:
I am a happy bunny!
And if that isn't enough, then also add in:
The sun is shining beautifully in Happy Bunny Land today!!
If the issue is that your analysis is producing a more negative result than what you believe is the correct answer, then fiddle with the weights (if using a weighted approach). If you are not using a weighted-word approach and you have lists of positive and negative words, then review those lists against the document context and either remove some negative words or add more words to the positive list.

Problem with Principal Component Analysis

I'm not sure this is the right place but here I go:
I have a database of 300 pictures in high resolution. I want to compute the PCA on this database, and so far here is what I do:
- reshape every image as a single column vector
- create a matrix of all my data (500x300)
- compute the average column and subtract it from my matrix; this gives me X
- compute the correlation C = X'X (300x300)
- find the eigenvectors V and eigenvalues D of C
- the PCA matrix is given by XV*D^-1/2, where each column is a principal component
This works and gives me the correct components.
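In R, that pipeline might look roughly like this (a sketch, assuming M is the 500x300 pixels-by-images matrix, not the asker's actual code):

```r
# M: 500 x 300 matrix, one image per column.
X  <- M - rowMeans(M)        # subtract the average image from every column
C  <- t(X) %*% X             # 300 x 300 "correlation" (Gram) matrix
eg <- eigen(C)               # eigenvalues come back in decreasing order,
                             # so the components below are sorted by weight
# In practice, drop near-zero eigenvalues before the next step: the centred
# data cannot have full rank, and 1/sqrt(0) blows up.
PCs <- X %*% eg$vectors %*% diag(1 / sqrt(eg$values))  # columns = components
```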
Now I am doing the same PCA on the same database, except that the images have a lower resolution.
Here are my results, low-res on the left and high-res on the right. As you can see, most of them are similar, but some images are not the same (the ones I circled).
Is there any way to explain this? I need my algorithm to produce the same images, one set in high-res and the other in low-res; how can I make this happen?
thanks
It is very possible that the filter you used did a thing or two to some of the components. After all, lower-resolution images don't contain the higher frequencies that also contribute to which components you get. If the component weights (lambdas) for those images are small, there is also a good chance of numerical errors.
I'm guessing your component images are sorted by weight. If they are, I would try a different pre-downsampling filter and see if it gives different results (essentially, obtain the lower-resolution images by different means). It is possible that the components that come out differently have a lot of frequency content in the transition band of that filter. It looks like the images circled in red are nearly perfect inversions of each other; filters can cause such things.
If your images are not sorted by weight, I wouldn't be surprised if the ones you circled have very little weight; that could simply be a computational precision error or something of that sort. In any case, we would probably need a little more information about how you downsample and how you sort the images before displaying them.

Also, I wouldn't expect all the images to be extremely similar, because you're essentially getting rid of quite a few frequency components. I'm pretty sure it has nothing to do with the fact that you're stretching the images out into vectors to compute the PCA, but try stretching them out in a different direction (take columns instead of rows, or vice versa). If that changes the result, then perhaps you should perform the PCA somewhat differently, though I'm not sure how.

How to detect a trend inside unsteady data (e.g. Trendly)?

I was wondering what kind of model / method / technique Trendly might use to achieve this:
[It tries to find the moments where significant changes set in and ignores random movements]
Any pointers very welcome! :)
I've never seen Trendly and don't know anything about it, but if I wanted to produce that red line from that blue line in an algorithmic fashion, I would try the following (a rough code sketch follows the list):
1. Fourier-transform the whole data set.
2. Choose a block size longer than the period of the dominant frequency.
3. Divide the data up into blocks of the chosen size.
4. Compare adjacent blocks with a statistical test of some sort.
5. Where the test says two blocks belong to the same underlying distribution, merge them.
6. If any were merged, go back to 4.
7. The red trend line is the mean of each block.
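A rough sketch of steps 3-7 in R, using a two-sample t-test as the "statistical test of some sort"; the block size of 50 is a placeholder for whatever step 2 produces, and each block is assumed to have enough variation for the test:

```r
merge_blocks <- function(x, block_size = 50, alpha = 0.05) {
  # Step 3: divide the data into consecutive blocks.
  blocks <- split(x, ceiling(seq_along(x) / block_size))
  repeat {
    merged <- FALSE
    i <- 1
    while (i < length(blocks)) {
      # Steps 4-5: if adjacent blocks look like the same distribution, merge.
      if (t.test(blocks[[i]], blocks[[i + 1]])$p.value > alpha) {
        blocks[[i]]     <- c(blocks[[i]], blocks[[i + 1]])
        blocks[[i + 1]] <- NULL
        merged <- TRUE
      } else {
        i <- i + 1
      }
    }
    if (!merged) break   # step 6: repeat until nothing merges
  }
  # Step 7: the trend line is the mean of each block, repeated over its length.
  unlist(lapply(blocks, function(b) rep(mean(b), length(b))))
}
```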
A simple running median could also produce a smoother curve from a mostly un-smooth one.
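In base R, for instance, runmed() does this in one line for a numeric series x (the window width must be odd; 31 is arbitrary):

```r
smoothed <- runmed(x, k = 31)   # running median of the series x
```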
Otherwise, a brute-force or genetic algorithm could be used to search for the best way to split the data into sections, where more sections = a worse solution and less accurate lines = a worse solution.
Another way would be like this: start at the beginning, and as soon as the line moves outside some radius (3 above or 3 below the first value, for instance), set the new height to the average of the current line's height and the previous marker.
If you keep doing that, it will ignore small fluctuations; if a fluctuation is large enough, however, it will still affect the result.
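That idea might be sketched like this; the radius of 3 comes from the text above, and the update rule is one reading of it:

```r
deadband_trend <- function(x, radius = 3) {
  level <- x[1]                    # start at the beginning
  out   <- numeric(length(x))
  for (i in seq_along(x)) {
    if (abs(x[i] - level) > radius) {
      # New height: average of the current value and the previous marker.
      level <- (x[i] + level) / 2
    }
    out[i] <- level
  }
  out
}
```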
