I have two texts that I convert to bags of words: one bag of words for text 1 and one for text 2.
I am trying to find a way to plot the words of both documents together to understand how much they differ.
One idea is to draw two bar plots, one above the other, and see for which words the counts agree and for which they differ.
I was able to produce a simple bar plot by following the guide here:
http://www.sthda.com/english/wiki/text-mining-and-word-cloud-fundamentals-in-r-5-simple-steps-you-should-know
(see the last plot), but I now have two bar plots that I cannot compare directly.
I was thinking, for example, of putting the words together on the same plot,
either as two histograms one above the other or as some kind of 2D clustering that shows the areas where the two documents differ as well as where they overlap.
Which package and procedure would you suggest for comparing two such bags of words?
Thanks
Alex
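One possible way to get both bags of words onto a single plot is to align the two word counts on a shared vocabulary and draw grouped bars. The sketch below uses base R only; the two example texts and the very simple tokenizer are assumptions for illustration, not part of the original question.

```r
# Hypothetical example texts; replace with your own documents.
text1 <- "the cat sat on the mat and the cat slept"
text2 <- "the dog sat on the log and the dog barked"

# Very simple tokenizer (an assumption): lowercase, split on non-letters.
tokenize <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  words[nzchar(words)]
}

w1 <- tokenize(text1)
w2 <- tokenize(text2)

# Shared vocabulary, so both documents are counted over the same words.
vocab <- sort(union(w1, w2))

# Rows are documents, columns are words; absent words get a count of 0.
counts <- rbind(
  text1 = as.vector(table(factor(w1, levels = vocab))),
  text2 = as.vector(table(factor(w2, levels = vocab)))
)
colnames(counts) <- vocab

# Put the words with the largest count difference first.
diff_order <- order(abs(counts["text1", ] - counts["text2", ]), decreasing = TRUE)
counts <- counts[, diff_order]

# Grouped bars: one pair of bars per word, one colour per document.
barplot(counts, beside = TRUE, las = 2,
        legend.text = rownames(counts), ylab = "Word count")
```

With real documents the vocabulary is usually too large to plot in full, so it may help to keep only the first 20 or so columns of `counts` after the reordering step.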
Related
I have data in Excel and I want to represent each person's data graphically as bar charts placed beside each other. All the values are numerical and I want it to look something like this:
where the w,x,y represent different variables like games played and turnovers and the three colours represent three different people. I have data on 20 people.
I don't know how to single out an individual's data or how to represent multiple data points on the bar chart.
Any questions, I'll try to describe as best as I can. Thanks in advance.
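A minimal sketch of one way this could be done in base R, assuming the Excel sheet has been exported to a table with one row per person and one column per variable. The names and numbers below are invented for illustration.

```r
# Hypothetical data: one row per person, one column per variable.
stats <- data.frame(
  person       = c("Ann", "Bob", "Cal"),
  games_played = c(12, 15, 9),
  turnovers    = c(4, 7, 2),
  assists      = c(20, 11, 16)
)

# Single out the people to compare (here three of what could be 20).
chosen <- stats[stats$person %in% c("Ann", "Bob", "Cal"), ]

# barplot() wants a matrix: rows become bars within a group (people),
# columns become the groups along the x axis (variables).
values <- as.matrix(chosen[, -1])
rownames(values) <- chosen$person

barplot(values, beside = TRUE,
        col = c("steelblue", "tomato", "darkgreen"),
        legend.text = rownames(values), ylab = "Value")
```

Changing the vector passed to `%in%` selects different people; the `col` vector needs one colour per selected person.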
I would like to count the number of lines within polygons.
The questions I would like to answer are:
How many lines are within each polygon (enveloped AND intersecting)?
How long is each line within each polygon OR how long are all the lines (combined) within each polygon?
I am using QGIS 2.18.12 (do not know how to write code)
First, calculate the length of every line;
second, use the Select By Location process (intersect, etc.);
finally, use the Statistics By Categories process.
I have this problem:
I would like to create a pie chart from a column of an attribute table, and I would like to see this pie chart above the map. The column contains names, not numbers...
I work with marine species distribution data and I built a database of records of many many species...
Specifically, I have a column called 'species' that contains many records (names) of marine species. Some species have many records and others only a few, so my objective is to see graphically how the records are distributed among the species.
If building a pie chart is very time-consuming, I'd be happy to create a new column in the attribute table with the number of different species per year (see the attachments) or to try a totally new approach with R.
Thank you for your help
img1 http://postimg.org/image/rn56c8l4z/
img2 http://postimg.org/image/e6918ynmj/
You'll most probably get many answers saying that pie charts are evil because they distort perception.
But along with better alternatives, namely stacked bar charts, you find code examples here
and, as always,
? pie
helps.
You may need to summarize your factor first, e.g. by table.
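As a rough sketch of that summarizing step, assuming the 'species' column has been read into R as a character vector (the names below are made up for illustration):

```r
# Hypothetical species column; in practice this would come from the
# attribute table, e.g. after exporting it to CSV and using read.csv().
species <- c("Tursiops truncatus", "Caretta caretta", "Tursiops truncatus",
             "Posidonia oceanica", "Caretta caretta", "Tursiops truncatus")

# table() turns the names into counts per species.
record_counts <- sort(table(species), decreasing = TRUE)

# Pie chart of the counts.
pie(record_counts, main = "Records per species")

# The same counts as a bar chart, which is usually easier to read.
barplot(record_counts, las = 2, ylab = "Number of records")
```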
I want to plot two data sets on one canvas. To make one data set stand out from the other, I'll plot it twice using two different characters, e.g. a circle around a dot.
Now I want to add this composite character into the legend.
How?
Thank you!
PK
^_^
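One approach that may work in base R graphics is to draw the legend twice at the same position, once per plotting character, so that the symbols overlap just as they do in the plot. The data and labels below are invented for illustration, and the sketch assumes base `plot()` rather than another plotting system.

```r
set.seed(1)
# Hypothetical data sets.
a <- data.frame(x = rnorm(20), y = rnorm(20))
b <- data.frame(x = rnorm(20), y = rnorm(20))

# Set A uses a cross; set B is drawn twice: a circle (pch 1) and a dot (pch 20).
plot(a$x, a$y, pch = 4,
     xlim = range(a$x, b$x), ylim = range(a$y, b$y),
     xlab = "x", ylab = "y")
points(b$x, b$y, pch = 1)
points(b$x, b$y, pch = 20)

labels <- c("Set A", "Set B")

# First legend: box, text, and the first symbol of each entry.
outer <- legend("topright", legend = labels, pch = c(4, 1))

# Second legend at the same position, without a box: only the extra dot
# for Set B is added (pch = NA draws nothing for Set A).
inner <- legend("topright", legend = labels, pch = c(NA, 20), bty = "n")
```

Because both calls use the same position, labels, and text size, the second set of symbols lands exactly on top of the first.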
Example plot:
http://i56.tinypic.com/eagjfn.jpg
Created with:
qplot(score, ..count.., data=df, fill=method, geom='density', position='stack')
Pretty much impossible to tell what goes with what. Any way to make this better? Ideally the legend draws lines "connecting" the areas to the item in the legend. Alternatively, I'd at least need some very different filling patterns for the areas.
The human eye does not do well distinguishing between more than 7-10 different categories whether they are indicated using color, shading or pattern. Adding lines or shadings here will, I think, only make this graph harder to read.
In situations like this, I often think that it's best to take a step back and rethink what message you intend for the graph to convey. Do you really need to compare all ~23 methods in a single graph, or can the methods be placed into subgroups and compared in multiple plots or facets? Are some of the methods' curves so similar that they could be combined into a single category?
For instance, I see ~3-4 natural groups just based on the similarity of the curves in your plot. You could plot a single, representative, method from each group to illustrate the large scale differences, and then create additional plots that focus in on the differences between methods within groups.
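A rough sketch of how such subgroups could be found and plotted separately, using base R only. The data are simulated, and the use of hierarchical clustering on the density curves and the choice of three groups are assumptions made for illustration; grouping by eye or by domain knowledge is just as valid.

```r
set.seed(42)
# Hypothetical scores for six methods that fall into three similar pairs.
centers <- c(m1 = 0, m2 = 0.1, m3 = 3, m4 = 3.1, m5 = 6, m6 = 6.1)
df <- do.call(rbind, lapply(names(centers), function(m) {
  data.frame(method = m, score = rnorm(200, mean = centers[[m]]))
}))

# Estimate every method's density on one shared grid so curves are comparable.
grid_range <- range(df$score)
grid_x <- seq(grid_range[1], grid_range[2], length.out = 128)
curves <- sapply(split(df$score, df$method), function(s) {
  density(s, from = grid_range[1], to = grid_range[2], n = 128)$y
})

# Group methods whose curves are similar; k = 3 is a judgment call.
groups <- cutree(hclust(dist(t(curves))), k = 3)

# One panel per group, so each plot holds only a few curves.
old_par <- par(mfrow = c(1, 3))
for (g in sort(unique(groups))) {
  members <- names(groups)[groups == g]
  matplot(grid_x, curves[, members, drop = FALSE], type = "l", lty = 1,
          col = seq_along(members), xlab = "score", ylab = "density",
          main = paste("Group", g))
  legend("topright", legend = members, lty = 1,
         col = seq_along(members), bty = "n")
}
par(old_par)
```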