I have the following generated data frame called Raw_Data:
Time Velocity Type
1 10 1 a
2 20 2 a
3 30 3 a
4 40 4 a
5 50 5 a
6 10 2 b
7 20 4 b
8 30 6 b
9 40 8 b
10 50 9 b
11 10 3 c
12 20 6 c
13 30 9 c
14 40 11 c
15 50 13 c
When plotting each Type, with the following:
ggplot(Raw_Data, aes(x=Time, y=Velocity))+geom_point() + facet_grid(Type ~.)
the y-axis increments as:
1, 11, 13, 2, 3, 4, 5, 6, 7, 8, 9
The y-axis labels should be in numeric order. Why have 11 and 13 appeared after 1?
I have created the data frame as follows using your sample data:
mydata <- read.table(text="Time Velocity Type
1 10 1 a
2 20 2 a
3 30 3 a
4 40 4 a
5 50 5 a
6 10 2 b
7 20 4 b
8 30 6 b
9 40 8 b
10 50 9 b
11 10 3 c
12 20 6 c
13 30 9 c
14 40 11 c
15 50 13 c", header=TRUE)
Followed by the command
ggplot(mydata, aes(x=Time, y=Velocity))+geom_point() + facet_grid(Type ~.)
which correctly displays the plot as shown in picture below
Note: changing the call to ggplot as shown below:
ggplot(mydata, aes(x=Time, y=as.character(Velocity))) +
geom_point() +
facet_grid(Type ~.)
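reproduces the problem you mentioned: when Velocity is a character (or factor) column, ggplot treats it as discrete and orders the values alphabetically, so "11" and "13" sort before "2". You therefore need to convert the Velocity variable to a numeric type, i.e. integer in your case.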
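The ordering difference can be checked without plotting. Here is a minimal sketch in base R, using Velocity values from the question (the names velocity_chr and velocity_num are illustrative, not from the original post):

```r
# Velocity values as they would look if the column had been read as text
velocity_chr <- c("1", "2", "3", "11", "13")

# Character vectors sort alphabetically, so "11" and "13" come before "2"
sorted_chr <- sort(velocity_chr)

# Converting to integer restores numeric ordering
velocity_num <- as.integer(velocity_chr)
sorted_num <- sort(velocity_num)

print(sorted_chr)
print(sorted_num)
```

If Velocity turns out to be a factor rather than a character column, as.integer(as.character(Velocity)) is the safer conversion, since as.integer() on a factor returns the internal level codes.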
I want to apply a conditional statement to consecutive values in a sliding manner.
For example, I have a dataset like this:
data <- data.frame(ID = rep.int(c("A","B"), times = c(24, 12)),
                   time = c(1:24,1:12),
                   visit = as.integer(runif(36, min = 0, max = 20)))
which gives the table below:
> data
ID time visit
1 A 1 7
2 A 2 0
3 A 3 6
4 A 4 6
5 A 5 3
6 A 6 8
7 A 7 4
8 A 8 10
9 A 9 18
10 A 10 6
11 A 11 1
12 A 12 13
13 A 13 7
14 A 14 1
15 A 15 6
16 A 16 1
17 A 17 11
18 A 18 8
19 A 19 16
20 A 20 14
21 A 21 15
22 A 22 19
23 A 23 5
24 A 24 13
25 B 1 6
26 B 2 6
27 B 3 16
28 B 4 4
29 B 5 19
30 B 6 5
31 B 7 17
32 B 8 6
33 B 9 10
34 B 10 1
35 B 11 13
36 B 12 15
I want to flag each ID based on runs of consecutive "visit" values.
If "visit" stays below 10 for 6 consecutive rows, I'd flag the ID as "empty", and as "busy" otherwise.
In the data above, "A" is below 10 from rows 1 to 6, so it is "empty". "B", on the other hand, never has 6 consecutive single-digit values, so it is "busy".
If the condition isn't fulfilled in one segment of 6 values, I want to apply it to the next segment.
I'd like to achieve this using R. Any advice will be appreciated.
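One possible reading of the rule above is "flag an ID as empty if any 6 consecutive rows all have visit below 10". Under that assumption, a minimal base R sketch can use rle() to measure run lengths. The function name flag_runs and its parameters are illustrative, and the visit values are copied from the printed table rather than regenerated with runif():

```r
data <- data.frame(
  ID = rep.int(c("A", "B"), times = c(24, 12)),
  time = c(1:24, 1:12),
  visit = c(7, 0, 6, 6, 3, 8, 4, 10, 18, 6, 1, 13,
            7, 1, 6, 1, 11, 8, 16, 14, 15, 19, 5, 13,
            6, 6, 16, 4, 19, 5, 17, 6, 10, 1, 13, 15)
)

# Return "empty" if visit stays below `threshold` for at least
# `run_length` consecutive rows anywhere in the vector, "busy" otherwise
flag_runs <- function(visit, threshold = 10, run_length = 6) {
  runs <- rle(visit < threshold)
  if (any(runs$values & runs$lengths >= run_length)) "empty" else "busy"
}

# Apply the rule separately to each ID
flags <- tapply(data$visit, data$ID, flag_runs)
print(flags)
```

Because rle() looks at every run, this checks all sliding windows at once. If the segments are instead meant to be fixed, non-overlapping blocks of 6 rows, the rule would need a different implementation.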
I'm trying to plot a ggplot graph and, instead of having the point size indicate the count, I need to print the number of overlapping points as a label. Can you help me?
https://imgur.com/a/pm1SsWd
Thank you very much!
My data:
ID CIM DD
1 8 8
2 8 8
3 8 4
4 4 4
5 2 2
6 8 8
7 8 8
8 8 8
9 2 2
10 2 2
11 2 4
12 4 4
13 8 4
14 2 2
15 4 4
16 4 8
17 2 4
18 16 8
19 8 16
20 16 16
21 2 4
22 16 8
23 8 8
24 8 8
25 8 8
26 4 4
27 1 2
28 4 8
29 8 8
30 2 4
31 8 8
32 2 2
33 1 2
34 4 8
35 8 8
36 16 8
37 8 8
38 4 4
39 4 8
40 4 8
41 8 8
42 8 8
43 2 2
I used the code below to make an overlapping count graph as shown in an image link:
https://imgur.com/a/pm1SsWd
breaks = c(1,2,4,8,16)
labels = as.character(breaks)
ggplot(data = Data,aes(CIM,DD)) +
geom_count()+
scale_x_continuous(limits = c(1, 32), breaks = breaks, labels = labels,name = "CIM")+
scale_y_continuous(limits = c(1, 32), breaks = breaks, labels = labels,name = "DD")
Take a look at this example:
Add count as label to points in geom_count
You could do the following with your data:
p <- ggplot(data = Data,aes(CIM,DD)) +
geom_count(show.legend = FALSE)+
scale_x_continuous(limits = c(1, 32), breaks = breaks, labels = labels,name = "CIM") +
scale_y_continuous(limits = c(1, 32), breaks = breaks, labels = labels,name = "DD") +
scale_size_continuous(range = c(10, 10))
p + geom_text(data = ggplot_build(p)$data[[1]], aes(x, y, label = n), color = "#ffffff")
You can adjust the range in scale_size_continuous if you wish to vary the size of points.
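The labels drawn by geom_text above are the number of rows sharing each (CIM, DD) pair. As a rough sketch, the same counts can also be computed directly in base R. The data frame below holds only the first five rows of the question's data, and the name counts is illustrative:

```r
# First five rows of the question's data
Data <- data.frame(
  ID  = 1:5,
  CIM = c(8, 8, 8, 4, 2),
  DD  = c(8, 8, 4, 4, 2)
)

# Count how many rows fall on each (CIM, DD) combination
counts <- aggregate(ID ~ CIM + DD, data = Data, FUN = length)
names(counts)[names(counts) == "ID"] <- "n"

print(counts)
```

A data frame like counts could then be supplied to geom_text(data = counts, aes(CIM, DD, label = n)) as an alternative to reading the counts back out of ggplot_build(p).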
I am trying to create an animated plot using gganimate.
When I pass the following factor dat$period to transition_states,
I get 3 static images. I would prefer to have the points "move" from state to state.
Here is my code:
plot <-
ggplot(data = dat, aes(x = age, y = value, color = period)) +
geom_point(size = 3, aes(group = period)) +
facet_wrap(~group)+
transition_states(states=period, transition_length = 2, state_length = 1) +
ease_aes('linear')+
enter_fade()+
exit_fade()
plot
Here is my data:
record period value age group
1 1 start 45 24 a
2 2 start 6 22 c
3 3 start 23 32 b
4 4 start 67 11 a
5 1 middle 42 24 a
6 2 middle 65 22 c
7 3 middle 28 32 b
8 4 middle 11 11 a
9 1 end 23 24 a
10 2 end 14 22 c
11 3 end 34 32 b
12 4 end 21 11 a
13 5 start 5 12 c
14 6 start 9 23 c
15 7 start 53 47 b
16 8 start 17 32 a
17 5 middle 15 12 c
18 6 middle 6 23 c
19 7 middle 23 47 b
20 8 middle 67 32 a
21 5 end 51 12 c
22 6 end 16 23 c
23 7 end 8 47 b
24 8 end 41 32 a
The points appear/disappear - I would like the points to travel on the screen between states - any help appreciated
The group aesthetic is used to determine which rows in each period's data are treated as the same objects. You need group = record here:
ggplot(data = dat, aes(x = age, y = value, color = period)) +
geom_point(size = 3, aes(group = record)) +
facet_wrap(~ group)+
transition_states(states=period, transition_length = 2, state_length = 1) +
ease_aes('linear')+
enter_fade()+
exit_fade()
My data looks like this:
x y
1 1
2 2
3 2
4 4
5 5
6 6
7 6
8 8
9 9
10 9
11 11
12 12
13 13
14 13
15 14
16 15
17 14
18 16
19 17
20 18
y is a grouping variable. I would like to see how well this grouping went.
To do so, I want to extract a sample of n pairs of cases that are grouped together by variable y
and n pairs of cases that are not grouped together by variable y, in order to calculate the number of
false positives and false negatives (pairs that were falsely grouped or falsely left ungrouped). How do I extract a sample of grouped pairs
and a sample of not-grouped pairs?
I would like the samples to look like this (for n=6) :
Grouped sample:
x y
2 2
3 2
9 9
10 9
15 14
17 14
Not-grouped sample:
x y
1 1
2 2
6 8
6 8
11 11
19 17
How would I go about this in R?
I'm not entirely clear on what you would like to do, partly because I feel there is some context missing as to what you're trying to achieve. I also don't quite understand your expected output (for example, the not-grouped sample contains an entry 6 8 that does not exist in your original data).
That aside, here is a possible approach.
# Maximum number of samples per group
n <- 3;
# Set fixed RNG seed for reproducibility
set.seed(2017);
# Grouped samples
# sample(nrow(x), k) draws k row indices from all rows of the subset
df.grouped <- do.call(rbind.data.frame, lapply(split(df, df$y),
function(x) if (nrow(x) > 1) x[sample(nrow(x), min(n, nrow(x))), ]));
df.grouped;
# x y
#2.3 3 2
#2.2 2 2
#6.6 6 6
#6.7 7 6
#9.10 10 9
#9.9 9 9
#13.13 13 13
#13.14 14 13
#14.15 15 14
#14.17 17 14
# Ungrouped samples
df.ungrouped <- df[sample(nrow(df.grouped)), ];
df.ungrouped;
# x y
#7 7 6
#1 1 1
#9 9 9
#4 4 4
#3 3 2
#2 2 2
#5 5 5
#6 6 6
#10 10 9
#8 8 8
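Explanation: Split df based on y; then, for each subset x containing more than one row, draw min(n, nrow(x)) rows; rbinding the results gives df.grouped. We then draw nrow(df.grouped) rows from df to produce df.ungrouped. Note that sample(nrow(df.grouped)) permutes the indices 1 to nrow(df.grouped), so these rows come from the first nrow(df.grouped) rows of df; use sample(nrow(df), nrow(df.grouped)) to draw from all rows instead.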
Sample data
df <- read.table(text =
"x y
1 1
2 2
3 2
4 4
5 5
6 6
7 6
8 8
9 9
10 9
11 11
12 12
13 13
14 13
15 14
16 15
17 14
18 16
19 17
20 18", header = TRUE)
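If the goal is to sample pairs of cases rather than individual rows, one alternative is to enumerate all pairs and split them by whether both members share the same y. This is a sketch under that assumption, not the approach used above; the names pairs, same_group, grouped_sample and ungrouped_sample are illustrative:

```r
df <- data.frame(
  x = 1:20,
  y = c(1, 2, 2, 4, 5, 6, 6, 8, 9, 9, 11, 12, 13, 13, 14, 15, 14, 16, 17, 18)
)

# All unordered pairs of row indices, one pair per row of the matrix
pairs <- t(combn(nrow(df), 2))

# TRUE where both members of a pair have the same y
same_group <- df$y[pairs[, 1]] == df$y[pairs[, 2]]

grouped_pairs   <- pairs[same_group, , drop = FALSE]
ungrouped_pairs <- pairs[!same_group, , drop = FALSE]

# Draw n pairs of each kind
n <- 3
set.seed(2017)
grouped_sample   <- grouped_pairs[sample(nrow(grouped_pairs), n), , drop = FALSE]
ungrouped_sample <- ungrouped_pairs[sample(nrow(ungrouped_pairs), n), , drop = FALSE]

# Show the x values of the two cases in each sampled pair
print(cbind(x1 = df$x[grouped_sample[, 1]], x2 = df$x[grouped_sample[, 2]]))
print(cbind(x1 = df$x[ungrouped_sample[, 1]], x2 = df$x[ungrouped_sample[, 2]]))
```

Enumerating every pair grows quadratically with the number of rows, so this is only practical for small data sets like the one here.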
I'm not sure why I am having such a problem with my x-scale labels repeating as opposed to just labeling where there is a measured point. Additionally, my labels for my legend are not working.
FamIncome Ethnicity mean.bmi
1 1 1 28.54250
2 1 2 26.66300
3 1 3 26.62105
4 1 4 29.51396
5 1 5 25.66722
6 2 1 29.62404
7 2 2 28.08393
8 2 3 28.62215
9 2 4 28.97561
10 2 5 25.57714
11 3 1 29.52630
12 3 2 28.27235
13 3 3 29.67060
14 3 4 31.36768
15 3 5 26.13361
16 4 1 30.83368
17 4 2 30.80814
18 4 3 29.29594
19 4 4 29.18521
20 4 5 24.80550
21 5 1 29.76500
22 5 2 29.24404
23 5 3 28.89435
24 5 4 31.48172
25 5 5 28.02522
26 6 1 30.05087
27 6 2 29.88574
28 6 3 29.53793
29 6 4 30.97993
30 6 5 25.57857
31 7 1 30.31787
32 7 2 29.28055
33 7 3 28.50421
34 7 4 30.65427
35 7 5 26.66094
36 8 1 29.15000
37 8 2 29.02789
38 8 3 28.36507
39 8 4 33.51915
40 8 5 28.38263
41 9 1 28.17679
42 9 2 28.74731
43 9 3 28.06196
44 9 4 31.38483
45 9 5 26.96000
46 10 1 28.71633
47 10 2 33.44409
48 10 3 30.63048
49 10 4 30.22587
50 10 5 27.36375
51 14 1 30.78161
52 14 2 27.43575
53 14 3 28.96817
54 14 4 32.22378
55 14 5 25.62778
56 15 1 29.15982
57 15 2 27.42672
58 15 3 27.60567
59 15 4 30.05013
60 15 5 26.80271
code below:
a <- ggplot(nh1, aes(x=FamIncome, y=mean.bmi)) + geom_line(aes(group=Ethnicity, colour = Ethnicity)) + geom_point()
a = a + labs(list(title="Average BMI versus Family Income", x = "Family Income", y = "Average BMI"))
a = a + scale_x_discrete(breaks=c("1","2","3","4","5","6","7","8","9","10","14","15"),
labels = c("0-4,999", "5K-9,999", "10K-14,999", "15K-19,999", "20K-24,999", "25K-34,999", "35K-44,999", "45K-54,999", "55K-64,999", "65K-74,999", "75K-100K", "Over 100K"))
a = a + theme(axis.text.x=element_text(angle=-90))
a = a + scale_colour_continuous(name = "Ethnicity",
breaks=c("5","4","3","2","1"),
labels=c("Other Race/Multi", "Black","White","Other Hispanic", "Mexican-American"))
a
I cannot post a picture of the image that I'm getting until I get 2 more "reputation" points
Try converting your x variable to a factor:
a <- ggplot(nh1, aes(x=factor(FamIncome), y=mean.bmi)) + geom_line(aes(group=Ethnicity, colour = factor(Ethnicity)))
a = a + labs(title="Average BMI versus Family Income", x = "Family Income", y = "Average BMI")
a = a + scale_x_discrete("Family Income", labels = c("0-4,999", "5K-9,999", "10K-14,999", "15K-19,999", "20K-24,999", "25K-34,999", "35K-44,999", "45K-54,999", "55K-64,999", "65K-74,999", "75K-100K", "Over 100K"))
a = a + theme(axis.text.x=element_text(angle=-90))
a = a + scale_colour_discrete(name = "Ethnicity",
breaks=c("5","4","3","2","1"),
labels=c("Other Race/Multi", "Black","White","Other Hispanic", "Mexican-American"))
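With a numeric x variable, ggplot treats it as a continuous scale, when you really intended it to be categorical. The same applies to Ethnicity: map colour to factor(Ethnicity) and use scale_colour_discrete rather than scale_colour_continuous. Also note the confusion between fill and colour: fill is for two-dimensional filled regions.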
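One way to keep the income codes and their labels explicitly paired, rather than relying on the order of the labels passed to scale_x_discrete, is to attach the labels when creating the factor. A minimal sketch in base R follows; income_codes, income_labels and fam_income are illustrative names, and the code-to-label mapping is taken from the breaks and labels in the question:

```r
# Income codes and their labels, in the order given in the question
income_codes <- c(1:10, 14, 15)
income_labels <- c("0-4,999", "5K-9,999", "10K-14,999", "15K-19,999",
                   "20K-24,999", "25K-34,999", "35K-44,999", "45K-54,999",
                   "55K-64,999", "65K-74,999", "75K-100K", "Over 100K")

# A few example codes as they appear in the FamIncome column
fam_income <- c(1, 2, 10, 14, 15)

# levels fixes the category order, labels supplies the display text
income_factor <- factor(fam_income, levels = income_codes, labels = income_labels)

print(levels(income_factor))
print(as.character(income_factor))
```

With a factor built this way, the x-axis should show one labelled tick per category in the order of income_codes, without a separate labels argument.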