ggplot2 barplot where categorical variable has three components - r

I am new to R and ggplot.
I have a data set like the one below. Each mol-code is made of 3 components, and Copies is the number of times each mol-code appears. There are 8 unique components available, and each is represented as a SMILES string.
  full.mol.code2 Copies Pair1.Acids                Pair2.Acids            Pair3.Acids
1   1.301241e+23     18 OC(C1=COC(CCl)=N1)=O       OC(C1=CC=C(CCl)C=C1)=O O=C(O)C1=C(C)OC=C1
2   1.303241e+23     18 OC(C1=CSC(CCl)=N1)=O       OC(C1=CSC(CCl)=N1)=O   OC([C#H](C)Br)=O.[R]
3   1.301241e+23     17 OC(C1=COC(CCl)=N1)=O       OC(C1=COC(CCl)=N1)=O   O=C(O)C1=C(C)OC=C1
4   1.304241e+23     12 ClC/C(C)=C/[C##H](C)C(O)=O OC(C1=COC(CCl)=N1)=O   OC([C#H](C)Cl)=O.[S]
5   1.309240e+23     12 OC(C1=CSC(CCl)=N1)=O       OC(C1=CC=C(CCl)C=C1)=O O=C(O)C1=C(C)OC=C1
6   1.301241e+23     11 OC(C1=COC(CCl)=N1)=O       OC(C1=CC=C(CCl)C=C1)=O OC([C#H](C)Cl)=O.[S]
Edit: thanks Allan for formatting this properly.
'full.mol.code2' is an identifier such as 130124051501260617102804; it should be treated as a label, not as a numeric value.
I want to show this data in a barplot where the x-axis is the mol-code, the y-axis is the number of copies, and each bar shows its combination of three components in different colors.
I hope that made sense and appreciate any help.
Thanks.
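One possible way to draw this, as a minimal sketch rather than a definitive answer: reshape the data so that each component is its own row, then stack three equal segments per bar so the total bar height equals Copies. The sketch uses the first three rows of the table above. The column name mol_code, the row numbers used as stand-in labels, and the equal three-way split of each bar are assumptions made for illustration; with the real data the codes would need to be read as text (for example with colClasses = "character" in read.csv) so that they stay distinct.

```r
library(ggplot2)

# First three rows of the table above. mol_code holds the row number as a
# stand-in label (assumption), stored as text so it is never treated as a number.
df <- data.frame(
  mol_code    = c("1", "2", "3"),
  Copies      = c(18, 18, 17),
  Pair1.Acids = c("OC(C1=COC(CCl)=N1)=O", "OC(C1=CSC(CCl)=N1)=O", "OC(C1=COC(CCl)=N1)=O"),
  Pair2.Acids = c("OC(C1=CC=C(CCl)C=C1)=O", "OC(C1=CSC(CCl)=N1)=O", "OC(C1=COC(CCl)=N1)=O"),
  Pair3.Acids = c("O=C(O)C1=C(C)OC=C1", "OC([C#H](C)Br)=O.[R]", "O=C(O)C1=C(C)OC=C1"),
  stringsAsFactors = FALSE
)

# Long format: one row per (mol-code, position). Each segment gets a third
# of Copies, so the stacked bar is exactly Copies tall.
long <- data.frame(
  mol_code  = rep(df$mol_code, times = 3),
  position  = rep(c("Pair1", "Pair2", "Pair3"), each = nrow(df)),
  component = c(df$Pair1.Acids, df$Pair2.Acids, df$Pair3.Acids),
  height    = rep(df$Copies / 3, times = 3)
)

# Stacked bars: x is the label, fill is the component (SMILES string).
p <- ggplot(long, aes(x = mol_code, y = height, fill = component)) +
  geom_col(colour = "white") +
  labs(x = "mol-code", y = "Copies", fill = "Component")
print(p)
```

The white outline keeps two segments visible when the same component appears twice in one mol-code (as in row 2).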

Related

Cluster analysis in R on large data set

I have a data set with rankings as the column names and about 15,000 contestants. My data looks like:
contestant   1   2   3   4
101         13   0   5  12
14           0   1  34   6
...        ... ... ... ...
500          0   2  23   3
I've been working on cluster analysis of this dataset. The dendrograms are not very helpful here: with this many entries they come out as a thick block of lines.
I'm wondering if there is a better way to do cluster analysis with this type of data. I've tried fviz_cluster() and similar commands, and I have gone through multiple tutorials. Many of them guided me through making dendrograms, but their data all seems to be different from mine (comparing two variables, etc.) and much smaller. Essentially, I'm asking which types of cluster analysis may work well with this type of data.
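As one option to consider (a sketch, not a recommendation for this specific data): partitioning methods such as k-means avoid the dendrogram entirely and handle 15,000 rows easily. The example below uses simulated counts in place of the real table, and the choice of 3 clusters is an arbitrary assumption that would need to be checked against the real data.

```r
set.seed(1)

# Simulated stand-in for the real data (assumption): 15,000 contestants,
# four ranking columns holding counts.
x <- matrix(rpois(15000 * 4, lambda = 5), ncol = 4,
            dimnames = list(NULL, c("1", "2", "3", "4")))

# Standardise the columns, then run k-means from several random starts.
# centers = 3 is arbitrary here.
km <- kmeans(scale(x), centers = 3, nstart = 10,
             iter.max = 100, algorithm = "Lloyd")

km$size     # number of contestants in each cluster
km$centers  # cluster centres on the standardised scale
```

Comparing km$tot.withinss across several values of centers is a common way to pick the number of clusters.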

How do I draw region borders with ggplot? - R

I have some regions defined in a dataset that looks like this:
> head(regions_gg)
      lon    lat group
12 -69.75 -19.75     3
13 -69.25 -19.75     3
14 -68.75 -19.75     3
15 -68.25 -19.75     3
16 -67.75 -19.75     3
17 -67.25 -19.75     3
where every point in space has been assigned a number that indicates which region it belongs to.
What I want to do is outline these regions and create a border around them.
So far, this is the best I could do, but as you can see, it looks hideous.
ggplot(regions_gg, aes(x = lon, y = lat, fill = group, z = group)) + theme_bw() +
  geom_raster(interpolate = FALSE) + geom_contour()
geom_contour() is trying to interpolate and adds a lot of unnecessary lines. Is there an easy way to fix this?
EDIT: here's the data
regions_gg
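One way to avoid interpolation altogether, sketched here under assumptions: compare each grid cell with its right-hand and upper neighbour, and draw a line segment on the shared edge wherever the group differs. The small grid below is invented to stand in for regions_gg (0.5-degree spacing, two groups), and neighbour_group is a hypothetical helper, not a ggplot2 function.

```r
library(ggplot2)

# Invented stand-in for regions_gg: a regular 0.5-degree grid with two groups.
regions_gg <- expand.grid(lon = seq(-69.75, -67.25, by = 0.5),
                          lat = seq(-19.75, -18.25, by = 0.5))
regions_gg$group <- ifelse(regions_gg$lon < -68.5, 3, 4)

step <- 0.5  # grid spacing (assumed constant in both directions)

# Hypothetical helper: group of the cell shifted by (dx, dy), NA if no such cell.
neighbour_group <- function(d, dx, dy) {
  key  <- paste(round(d$lon, 4), round(d$lat, 4))
  nkey <- paste(round(d$lon + dx, 4), round(d$lat + dy, 4))
  d$group[match(nkey, key)]
}

right <- neighbour_group(regions_gg, step, 0)
up    <- neighbour_group(regions_gg, 0, step)

# Vertical edge where a cell and its right-hand neighbour differ.
v <- regions_gg[!is.na(right) & right != regions_gg$group, ]
v_seg <- data.frame(x = v$lon + step / 2, xend = v$lon + step / 2,
                    y = v$lat - step / 2, yend = v$lat + step / 2)

# Horizontal edge where a cell and its upper neighbour differ.
h <- regions_gg[!is.na(up) & up != regions_gg$group, ]
h_seg <- data.frame(x = h$lon - step / 2, xend = h$lon + step / 2,
                    y = h$lat + step / 2, yend = h$lat + step / 2)

borders <- rbind(v_seg, h_seg)

p <- ggplot(regions_gg, aes(x = lon, y = lat)) +
  geom_raster(aes(fill = factor(group))) +
  geom_segment(data = borders,
               aes(x = x, y = y, xend = xend, yend = yend),
               inherit.aes = FALSE) +
  theme_bw()
print(p)
```

This only draws borders between neighbouring cells of different groups; the outer edge of the grid is left undrawn.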

Excel: Select data for graph

To put it simply, I have three columns in Excel like the ones below:
Vehicle x y
1 10 10
1 15 12
1 12 9
2 8 7
2 11 6
3 7 12
x and y are the coordinates of customers assigned to the corresponding vehicle. This file is the output of a program I run in advance. The list will always be sorted by vehicle, but the number of customers assigned to vehicle "k" may change from one experiment to the next.
I would like to plot a graph containing 3 series, one for each vehicle, where the customers of each vehicle appear as dots in 2D (based on their x and y values), with a different color per vehicle.
In my real file I have 12 vehicles and 3200 customers, and the ranges change from one experiment to the next, so I would like to automate the process, i.e. paste the list into my Excel sheet and see the graph appear automatically (if this is possible).
Thanks in advance for your time and effort.
EDIT: There is a similar post here: Use formulas to select chart data but requires the use of VB. Moreover, I am not sure whether it has been indeed answered.
You could try this free online tool: www.cloudyexcel.com/excel-to-graph/

different color for different range on x axis of line chart in flex 4.6

For different classes I have an NSCC count. I now have to make a line chart showing which range each count falls in: 1-10 is low risk, 10-20 is moderate risk, 20-50 is high risk, and above 50 is horrible. How do I plot the data with these ranges on the x-axis? And how do I color each range differently?
Please help me
One possible solution is to use multiple line series with different colors.
I suppose you have data something like this:
|NSCC| |count|
A      10
B      12
C      54
D      25
You could convert it to a matrix like this:
|NSCC| |count| |LOW| |MODERATE| |HIGH|
A      10      10    null       null
B      12      12    null       null
C      54      null  null       54
D      25      null  25         null
and create multiple line series on the chart, one per risk column.
You may find gaps between the series; to overcome this you could add dummy boundary points.
There are also other options, such as:
Use a customized background with different colors
Use a customized item renderer
Hope that helps.

Sorting data in R

I have a dataset that I need to sort by participant (RECORDING_SESSION_LABEL) and by trial_number. However, when I sort the data using R none of the sort functions I have tried put the variables in the correct numeric order that I want. The participant variable comes out ok but the trial ID variable comes out in the wrong order for what I need.
using:
fix_rep[order(as.numeric(RECORDING_SESSION_LABEL), as.numeric(trial_number)),]
Participant ID comes out as:
118 118 118 etc. 211 211 211 etc. 306 306 306 etc.(which is fine)
trial_number comes out as:
1 1 10 10 11 11 12 12 13 13 14 14 15 15 16 16 17 17 18 18 19 19 2 2 20 20 .... (which is not what I want - it seems to be sorting lexically rather than numerically)
What I would like is trial_number to be order like this within each participant number:
1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 ....
I have checked that these variables are not factors and are numeric and also tried without the 'as.numeric', but with no joy. Looking around I saw suggestions that sort() and mixedsort() might do the trick in place of 'order', both come up with errors. I am slowly pulling my hair out over what I think should be a simple thing. Can anybody help shed some light on how to do this to get what I need?
Even though you claim it is not a factor, it does behave exactly as if it were a factor. Testing if something is a factor can be tricky since a factor is just an integer vector with a levels attribute and a class label. If it is a factor, your code needs to have a call to as.character() nested inside the as.numeric():
fix_rep[order(as.numeric(RECORDING_SESSION_LABEL), as.numeric(as.character(trial_number))),]
To be really sure if it's a factor, I recommend the str() function:
str(trial_number)
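To see why the nested as.character() matters, here is a small self-contained illustration. The four-row data frame is made up for the example (only the column names come from the question), and trial_number is deliberately created as a factor.

```r
# Made-up example data; trial_number is a factor on purpose.
fix_rep <- data.frame(
  RECORDING_SESSION_LABEL = c(118, 118, 118, 118),
  trial_number = factor(c("10", "2", "1", "11"))
)

# as.numeric() on a factor returns the internal level codes, and the levels
# are sorted as text ("1", "10", "11", "2"), so the order is lexical.
wrong <- fix_rep[order(fix_rep$RECORDING_SESSION_LABEL,
                       as.numeric(fix_rep$trial_number)), ]

# Converting to character first recovers the actual numbers.
right <- fix_rep[order(fix_rep$RECORDING_SESSION_LABEL,
                       as.numeric(as.character(fix_rep$trial_number))), ]

as.character(wrong$trial_number)  # "1" "10" "11" "2"
as.character(right$trial_number)  # "1" "2" "10" "11"
```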
I think it may be worthwhile for you to design your own function in this case. It wouldn't be too hard: basically you could write a bubble-sort algorithm with a few alterations. These alterations would convert each number to a string and begin by sorting the values into bins by number of digits (easily done by checking the length of each string). Then, in a similar fashion, the numbers within each bin could be sorted by converting the least significant digit to a numeric type and checking which are the largest/smallest. If you're interested, I could come up with some code for this. However, it looks like the two answers above have beaten me to the punch with built-in functions. I've never used those functions, so I'm not sure if they'll work as you intend, but there's no use in reinventing the wheel.

Resources