How to calculate mean and sample vector containing strings - r

I need to calculate the mean of a jar variable that contains both red and blue marbles. I created two objects, red and blue, with 50 marbles each, combined them into a vector, and stored it in an object called jar. So my jar contains 50 blue and 50 red marbles.
I have to sample 10 marbles, determine how many are red, and then calculate the percentage. For the sample, I used the following code:
sample(length(jar), 10, replace = TRUE)
[1] 43 48 77 98 34 21 29 64 95 54
I am not sure if I am using the right code. Second, how do I see which marbles are red and which are blue? Sorry, I am new to R and stats. Any help is appreciated.
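A minimal sketch of one way to approach this, assuming jar is a character vector of "red" and "blue" values (the construction below is a guess at what the question describes). The call in the question returns positions within jar rather than colors, so indexing jar with those positions, or sampling jar directly, shows which marbles were drawn. Whether to sample with replacement depends on the exercise; this sketch draws without replacement, as when marbles are physically removed from a jar.

```r
set.seed(1)  # assumption: seed fixed only to make the example repeatable

# Assumed construction of the jar: 50 red and 50 blue marbles
jar <- c(rep("red", 50), rep("blue", 50))

# Sampling positions, as in the question, then looking up their colors
idx <- sample(length(jar), 10)
jar[idx]

# Equivalent shortcut: sample the marbles themselves
draw <- sample(jar, 10)

n_red   <- sum(draw == "red")         # number of red marbles in the sample
pct_red <- 100 * mean(draw == "red")  # percentage of red marbles in the sample
```

Because draw == "red" is a logical vector, sum() counts the red marbles and mean() gives their proportion, which is why no numeric recoding of the strings is needed.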

Related

Why is R adding empty factors to my data?

I have a simple data set in R with 2 conditions, stored in a variable called "COND". Within those conditions, adults chose one of 2 pictures, which we call house or car. This variable is called "SAW".
I have 69 people and 69 rows of data.
For some reason, R is adding an empty factor level to both variables. How do I get rid of it?
When I use table() to see how many are in each level, this is the output:
table(MazeData$SAW)
        car house
    2     9    59
table(MazeData$COND)
             Apples No_Apples
        2        35        33
Where are these 2 mystery rows coming from? Because of them I cannot make my simple box plots and bar plots or run t.test. Can someone help? Thanks!
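One possible explanation, offered as an assumption rather than a diagnosis: the unnamed level with count 2 looks like empty strings, which can appear when the input file contains blank rows. The sketch below uses a small made-up stand-in for MazeData (the values are hypothetical) to show how such rows could be dropped together with the unused level.

```r
# Hypothetical stand-in for MazeData with two blank rows
MazeData <- data.frame(
  SAW  = factor(c("", "car", "house", "house", "")),
  COND = factor(c("", "Apples", "No_Apples", "Apples", ""))
)
table(MazeData$SAW)  # the first, unnamed level is the empty string

# Drop rows where either variable is blank, then remove the unused levels
is_blank <- MazeData$SAW == "" | MazeData$COND == ""
clean <- droplevels(MazeData[!is_blank, ])

table(clean$SAW)
table(clean$COND)
```

If the data come from a CSV file, reading it with read.csv(..., na.strings = "") is another option, since empty fields are then treated as missing instead of becoming a factor level.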

How do I interpret the results of the findCorrelation() function?

I am having trouble with the findCorrelation() function. Here are my input and output:
>findCorrelation(cor(subset(segdata, select=-c(56))),cutoff=0.9)
[1] 16 17 14 15 30 51 31 25 40
>cor(segdata)[c(16,17,14,15,30,51,31,25,40),c(16,17,14,15,30,51,31,25,40)]
[image: the resulting correlation matrix, not preserved]
I deleted column 56 because it is a factor variable.
In the code above I use cutoff=0.9, which I take to mean: print only those variables whose correlation is greater than or equal to 0.9.
But in the resulting image, the last variable (P12002900) has a very low correlation. Since I use cutoff=0.9, variables with low correlations such as P12002900 should not be output.
Why is it printed?
So I tried the Vehicle data set available in R:
>library(mlbench)
>library(caret)
>data(Vehicle)
>findCorrelation(cor(subset(Vehicle,select=-c(Class))),cutoff=0.9)
[1]  3  8 11  7  9  2
>cor(subset(Vehicle,select=-c(Class)))[c(3,8,11,7,9,2),c(3,8,11,7,9,2)]
This is the result:
[image: the resulting correlation matrix, not preserved]
The last variable (Circ) has a correlation lower than 0.9,
but it is still printed.
Please help me. Thank you!
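For context, caret documents findCorrelation() as returning the columns to remove in order to reduce pairwise correlations. A returned column therefore needs a correlation above the cutoff with at least one other column, not with every other returned column. The base-R sketch below illustrates that idea on made-up data; the variables x1 to x4 are hypothetical, and findCorrelation() itself is not called.

```r
set.seed(1)  # assumption: seed fixed only to make the example repeatable
n  <- 200
x1 <- rnorm(n)
x3 <- rnorm(n)

# Two independent pairs of nearly identical variables
dat <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.1),  # near-copy of x1
  x3 = x3,
  x4 = x3 + rnorm(n, sd = 0.1)   # near-copy of x3
)

cm <- abs(cor(dat))
diag(cm) <- 0                     # ignore each variable's correlation with itself

max_partner <- apply(cm, 1, max)  # strongest pairwise correlation per variable
max_partner                       # every variable has some partner above 0.9
cm["x2", "x4"]                    # yet x2 and x4 are almost uncorrelated
```

Subsetting the correlation matrix to the flagged columns only, as in the question, can hide the partner that caused a column to be flagged, so low values inside that sub-matrix do not contradict the cutoff.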

Finding max and identify name's cell for another column

Hopefully someone can help me with the following problem.
I have data about different birds and their maximum lengths:
a<-c("bird1","bird2","bird1","bird3","bird2","bird2")
b<-c(32,45,35,25,51,47)
c<-data.frame(animal=a,max=b)
  animal max
1  bird1  32
2  bird2  45
3  bird1  35
4  bird3  25
5  bird2  51
6  bird2  47
My goal is to identify the name of the animal that has the maximum length. I know that max() and which.max() make it easy to find the maximum length and the corresponding row, but how can I get the name of the animal?
Any comment will be helpful!
The following returns the name of the bird with the maximum length, with the data frame given a more descriptive name:
a<-c("bird1","bird2","bird1","bird3","bird2","bird2")
b<-c(32,45,35,25,51,47)
combined_birds<-data.frame(animal=a,max=b)
combined_birds$animal[which.max(combined_birds$max)]
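One caveat worth illustrating: which.max() returns only the first position of the maximum, so ties are silently dropped. A small sketch of two variations, using a data frame named birds chosen here for illustration: selecting every row that reaches the maximum, and computing the maximum per animal with tapply().

```r
birds <- data.frame(
  animal = c("bird1", "bird2", "bird1", "bird3", "bird2", "bird2"),
  max    = c(32, 45, 35, 25, 51, 47),
  stringsAsFactors = FALSE  # keep animal as plain character strings
)

# All rows that reach the overall maximum (keeps ties, unlike which.max)
top <- birds[birds$max == max(birds$max), ]
top_names <- as.character(top$animal)

# Largest recorded length for each animal
per_animal <- tapply(birds$max, birds$animal, max)
```

With this data top_names holds a single name, but the same code would return several names if more than one bird shared the maximum.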

Looping within a loop in R

I'm trying to build quite a complex loop in R.
I have a data set stored in an object called p_int (p_int is peak intensity).
For this example the structure of p_int, i.e. str(p_int), is:
num [1:1599]
The size of p_int can vary, e.g. [1:688], [1:1200], etc.
What I'm trying to do is construct a loop that extracts the monoisotopic peaks from p_int. These are peaks with certain characteristics, and they will be extracted into a second object, mono_iso:
Take the first eight results in p_int. Of these eight, find the one with the greatest score (this score also needs to be above 50).
Once this result has been found, record it in mono_iso.
The loop then fixes on the position of this result within the large data set. From this position it skips the next result along the data set before doing the same for the next set of 8 results.
So something similar to this:
16 Results: 100 120 90 66 220 90 70 30 70 100 54 85 310 200 33 41
To begin with, the loop would take the first 8 results:
100 120 90 66 220 90 70 30
It would then decide which peak is the greatest:
220
It would then determine whether 220 is greater than 50:
IF YES: it would record 220 into mono_iso.
IF NO: it would move on to the next set of 8 results.
220 is greater than 50, so it is recorded into mono_iso.
The loop would then fix its position at 220, skip the following "90", and start again with the next set of 8 results, beginning at the next value in line, in this case the 70:
70 30 70 100 54 85 310 200
It would then record the 310 (the highest value) and carry on in the same way until the end of the data.
I hope this makes sense. If anyone could help me write such a loop in R, I'd very much appreciate it.
Use this:
mono_iso <- aggregate(p_int, by=list(group=((seq_along(p_int)-1)%/%8)+1), function(x)ifelse(max(x)>50,max(x),NA))$x
This splits p_int into consecutive blocks of eight values and returns NA for any block whose maximum is 50 or less. If you want to filter those out, use this:
mono_iso <- mono_iso[!is.na(mono_iso)]
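The aggregate() call above works on fixed, non-overlapping blocks of eight, whereas the question describes a window that restarts two positions after each recorded peak. A minimal sketch of that moving-window reading follows; the function name extract_mono_iso and its arguments are hypothetical, and it assumes p_int has no missing values and that a final window shorter than eight values is still examined.

```r
extract_mono_iso <- function(p_int, window = 8, threshold = 50) {
  mono_iso <- numeric(0)
  n <- length(p_int)
  start <- 1
  while (start <= n) {
    end   <- min(start + window - 1, n)  # last window may be shorter
    chunk <- p_int[start:end]
    k     <- which.max(chunk)            # offset of the peak inside the window
    if (chunk[k] > threshold) {
      mono_iso <- c(mono_iso, chunk[k])
      # the peak sits at start + k - 1; skip the value right after it
      start <- start + k + 1
    } else {
      start <- end + 1                   # no qualifying peak: next window
    }
  }
  mono_iso
}

# The 16 example values from the question
p_int <- c(100, 120, 90, 66, 220, 90, 70, 30, 70, 100, 54, 85, 310, 200, 33, 41)
extract_mono_iso(p_int)  # records 220, then 310
```

On the example data both approaches happen to record 220 and 310; they differ whenever a peak falls early in its window, because the moving window then restarts sooner than the next fixed block would.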

different color for different range on x axis of line chart in flex 4.6

For different classes I have an NSCC count. I now have to make a line chart showing which range each NSCC count falls in: 1-10 is low risk, 10-20 is moderate risk, 20-50 is high risk, and above 50 is horrible. How do I plot the data with these ranges on the x axis, and how do I color each range differently?
Please help me.
One possible solution is to use multiple line series with different colors.
I suppose you have data something like this:
NSCC  count
A     10
B     12
C     54
D     25
You could convert it to a matrix like this:
NSCC  count  LOW   MODERATE  HIGH
A     10     10    null      null
B     12     12    null      null
C     54     null  null      54
D     25     null  25        null
and create multiple line series on the chart.
You may find gaps between the series; to overcome this, you could add dummy boundary points.
There are also other options, such as:
Use a customized background with different colors
Use a custom item renderer
Hope that helps.
