I am creating a scatter plot using ggplot2. By default the x axis shows every value from 0 to 30. I'd prefer breaks every 5 units or something like that. I have been trying to use scale_x_continuous(), but I get this error:
Error: Discrete value supplied to continuous scale
Here is the code that I am trying to work with:
Daily.Average.plot <- ggplot(data = Daily.average, aes(factor(Day), Mass))+
geom_point(aes(color = factor(Temp))) +
scale_x_continuous(breaks = seq(0,30,5))
Daily.Average.plot
When I run this without the scale_x_continuous I get a graph that looks fine with no errors, just the incorrect x axis. All of the columns in the data set are numeric when I check str(), if that matters. Do I have an error in my code, or should I be using something different to change the scale?
Here is a sample of my data set:
N Day Mass Temp
1 1 0.00000000 5
2 2 0.00000000 5
3 3 0.07692308 5
4 4 0.07692308 5
5 5 0.07692308 5
6 6 0.15384615 5
7 7 0.15384615 5
8 8 0.23076923 5
9 9 0.38461538 5
10 10 0.46153846 5
11 1 0.00000000 10
12 2 0.00000000 10
13 3 0.00000000 10
14 4 0.09090909 10
15 5 0.09090909 10
16 6 0.54545455 10
17 7 0.54545455 10
18 8 0.63636364 10
19 9 0.90909091 10
20 10 1.36363636 10
21 1 0.00000000 15
22 2 0.07692308 15
23 3 0.61538462 15
24 4 0.76923077 15
25 5 0.76923077 15
26 6 1.23076923 15
27 7 1.69230769 15
28 8 2.07692308 15
29 9 2.46153846 15
30 10 3.07692308 15
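The error message points at the likely cause: wrapping Day in factor() makes the x aesthetic discrete, and scale_x_continuous() only accepts continuous values. A minimal sketch, assuming Day is meant to be treated as a number, with the sample rows above typed in by hand:

```r
library(ggplot2)

# Sample rows copied from the table above.
Daily.average <- data.frame(
  Day  = rep(1:10, times = 3),
  Mass = c(0, 0, 0.07692308, 0.07692308, 0.07692308,
           0.15384615, 0.15384615, 0.23076923, 0.38461538, 0.46153846,
           0, 0, 0, 0.09090909, 0.09090909,
           0.54545455, 0.54545455, 0.63636364, 0.90909091, 1.36363636,
           0, 0.07692308, 0.61538462, 0.76923077, 0.76923077,
           1.23076923, 1.69230769, 2.07692308, 2.46153846, 3.07692308),
  Temp = rep(c(5, 10, 15), each = 10)
)

# Map Day as a plain numeric so the continuous scale applies;
# Temp stays a factor because it only drives the color legend.
Daily.Average.plot <- ggplot(data = Daily.average, aes(Day, Mass)) +
  geom_point(aes(color = factor(Temp))) +
  scale_x_continuous(breaks = seq(0, 30, 5))
Daily.Average.plot
```

With Day left numeric, the breaks land at 0, 5, 10, and so on within the range of the data.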
I want to apply a conditional statement to consecutive values in a sliding manner.
For example, I have a dataset like this:
data <- data.frame(ID = rep.int(c("A","B"), times = c(24, 12)),
                   time = c(1:24,1:12),
                   visit = as.integer(runif(36, min = 0, max = 20)))
which gives the table below:
> data
ID time visit
1 A 1 7
2 A 2 0
3 A 3 6
4 A 4 6
5 A 5 3
6 A 6 8
7 A 7 4
8 A 8 10
9 A 9 18
10 A 10 6
11 A 11 1
12 A 12 13
13 A 13 7
14 A 14 1
15 A 15 6
16 A 16 1
17 A 17 11
18 A 18 8
19 A 19 16
20 A 20 14
21 A 21 15
22 A 22 19
23 A 23 5
24 A 24 13
25 B 1 6
26 B 2 6
27 B 3 16
28 B 4 4
29 B 5 19
30 B 6 5
31 B 7 17
32 B 8 6
33 B 9 10
34 B 10 1
35 B 11 13
36 B 12 15
I want to flag each ID based on runs of consecutive "visit" values.
If "visit" stays below 10 for 6 consecutive rows, I'd label the ID "empty", and "busy" otherwise.
In the data above, "A" is continuously below 10 from rows 1 to 6, so it is "empty". "B", on the other hand, never has 6 consecutive single-digit values, so it is "busy".
If the condition is not fulfilled in one segment of 6 values, I want to apply it to the next segment of 6 values.
I'd like to achieve this using R. Any advice will be appreciated.
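One possible approach is run-length encoding with base R's rle(). The sketch below assumes "sliding" means any run of at least 6 consecutive rows counts, wherever it starts. The helper has_low_run is a hypothetical name, and the visit values are typed in from the table above because the runif() call is not reproducible without a seed.

```r
# Visit counts copied from the table above.
data <- data.frame(
  ID    = rep.int(c("A", "B"), times = c(24, 12)),
  time  = c(1:24, 1:12),
  visit = c(7, 0, 6, 6, 3, 8, 4, 10, 18, 6, 1, 13,
            7, 1, 6, 1, 11, 8, 16, 14, 15, 19, 5, 13,
            6, 6, 16, 4, 19, 5, 17, 6, 10, 1, 13, 15)
)

# Hypothetical helper: TRUE if x contains a run of at least `len`
# consecutive values strictly below `limit`.
has_low_run <- function(x, limit = 10, len = 6) {
  runs <- rle(x < limit)
  any(runs$values & runs$lengths >= len)
}

# One flag per ID, as a named character vector.
flag <- vapply(
  split(data$visit, data$ID),
  function(v) if (has_low_run(v)) "empty" else "busy",
  character(1)
)
flag
```

If the segments are instead meant to be fixed, non-overlapping blocks of 6 rows, the helper would need to be applied block by block rather than to the whole series.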
The following randomly splits a data frame into halves.
df <- read.csv("https://raw.githubusercontent.com/HirokiYamamoto2531/data/master/data.csv")
head(df, 3)
# dv iv subject item
#1 562 -0.5 1 7
#2 790 0.5 1 21
#3 NA -0.5 1 19
r <- seq_len(nrow(df))
first <- sample(r, 240)
second <- r[!r %in% first]
df_1 <- df[first, ]
df_2 <- df[second, ]
However, split this way, neither data frame (df_1 or df_2) is balanced on subject and item. For example:
table(df_1$subject)
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
# 7 8 3 5 5 3 8 1 5 7 7 6 7 7 9 8 8 9 6 7 8 5 4 4 5 2 7 6 9
# 30 31 32 33 34 35 36 37 38 39 40
# 7 5 7 7 7 3 5 7 5 3 8
table(df_1$item)
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
# 12 11 12 12 9 11 11 8 11 12 10 8 14 7 14 10 8 7 9 9 7 11 9 8
# There are 40 subjects and 24 items, and each subject is assigned to 12 items and each item to 20 subjects.
I would like to know how to split the data frame into halves that are balanced on subject and item (i.e., exactly 6 data points from each subject and 10 data points from each item).
You can use the createDataPartition function from the caret package to create a balanced partition of one variable.
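The code below creates a balanced partition of the dataset according to the variable subject. The subject column is converted to a factor first, because createDataPartition stratifies a numeric vector by percentile groups rather than by each distinct value:
df <- read.csv("https://raw.githubusercontent.com/HirokiYamamoto2531/data/master/data.csv")
partition <- caret::createDataPartition(factor(df$subject), p = 0.5, list = FALSE)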
first.half <- df[partition, ]
second.half <- df[-partition, ]
table(first.half$subject)
table(second.half$subject)
I'm not sure whether it's possible to balance two variables at once. You can try balancing for one variable and checking if you're happy with the partition of the second variable.
With the following data set
u_data
rSLn rwave rexpd y_ij rwave2 u_ij
1 1 1 199.929886 5.302956 1 5.302956
2 1 2 27.738826 3.358249 4 3.358249
3 1 3 144.000000 4.976734 9 4.976734
4 1 4 72.000000 4.290459 16 4.290459
5 1 5 0.000000 0.000000 25 0.000000
6 2 1 392.606361 5.975351 1 5.975351
7 2 2 749.524990 6.620773 4 6.620773
8 2 3 3120.000000 8.045909 9 8.045909
9 2 4 1600.000000 7.378384 16 7.378384
10 2 5 1000.000000 6.908755 25 6.908755
11 2 6 5840.000000 8.672657 36 8.672657
12 2 7 3960.000000 8.284252 49 8.284252
13 2 8 4700.000000 8.455531 64 8.455531
14 2 9 1660.000000 7.415175 81 7.415175
15 2 10 5620.000000 8.634265 100 8.634265
16 3 1 1566.117441 7.356993 1 7.356993
17 3 2 739.702016 6.607598 4 6.607598
18 3 3 0.000000 0.000000 9 0.000000
19 3 4 0.000000 0.000000 16 0.000000
20 3 5 0.000000 0.000000 25 0.000000
21 3 6 0.000000 0.000000 36 0.000000
22 3 7 0.000000 0.000000 49 0.000000
23 3 8 0.000000 0.000000 64 0.000000
24 3 9 600.000000 6.398595 81 6.398595
25 3 10 720.000000 6.580639 100 6.580639
26 4 1 249.912358 5.525104 1 5.525104
27 4 2 9.246275 2.326914 4 2.326914
28 4 3 848.000000 6.744059 9 6.744059
29 4 4 820.000000 6.710523 16 6.710523
30 4 5 968.000000 6.876265 25 6.876265
31 4 6 4800.000000 8.476580 36 8.476580
32 4 7 1572.000000 7.360740 49 7.360740
33 4 8 1960.000000 7.581210 64 7.581210
34 4 9 1800.000000 7.496097 81 7.496097
35 4 10 1700.000000 7.438972 100 7.438972
36 5 1 0.000000 0.000000 1 0.000000
37 5 2 6768.273444 8.820149 4 8.820149
38 5 3 520.000000 6.255750 9 6.255750
39 5 4 1020.000000 6.928538 16 6.928538
40 5 5 1520.000000 7.327123 25 7.327123
41 5 6 2075.000000 7.638198 36 7.638198
42 5 7 1760.000000 7.473637 49 7.473637
43 5 8 1270.000000 7.147559 64 7.147559
44 5 9 5400.000000 8.594339 81 8.594339
45 5 10 6550.000000 8.787373 100 8.787373
and with the following values:
ux_data=as.matrix(u_data[,c(2,5)])
ux_data=cbind(1, ux_data)
class=rbinom(length(unique(u_data$rSLn)),1,0.48)+1
thet.value=c(4.25,5.85,1.26,9.78,6.86)
n_g_i=numeric()
for ( d in unique(u_data$rSLn)){
n_g_i[d]=length(u_data$rwave[u_data$rSLn==d])
}
sigma2=0.7849
SIGMA=matrix(c(100,0,0,
0,1,0,
0,0,1/100), nrow = 3, ncol = 3, byrow = T)
I would like to run the following code, which works perfectly:
u_ij_C1=(u_data$u_ij[rep(class,times=n_g_i)==1] #u_ij_new belongs to cluster-1
-rep(thet.value[class==1], n_g_i[class==1]))
m_beta_C1=(solve((t(ux_data[rep(class,times=n_g_i)==1,])%*%ux_data[rep(class,times=n_g_i)==1,]/
(sigma2))+solve(SIGMA)) %*%(t(ux_data[rep(class,times=n_g_i)==1,])%*%u_ij_C1/sigma2))
sig2_beta_C1=(solve((t(ux_data[rep(class,times=n_g_i)==1,])
%*%ux_data[rep(class,times=n_g_i)==1,]/(sigma2))+solve(SIGMA)))
u_ij_C2=(u_data$u_ij[rep(class,times=n_g_i)==2] #u_ij_new belongs to cluster-2
-rep(thet.value[class==2], n_g_i[class==2]))
m_beta_C2=(solve((t(ux_data[rep(class,times=n_g_i)==2,])%*%ux_data[rep(class,times=n_g_i)==2,]/
(sigma2))+solve(SIGMA)) %*%(t(ux_data[rep(class,times=n_g_i)==2,])%*%u_ij_C2/sigma2))
sig2_beta_C2=(solve((t(ux_data[rep(class,times=n_g_i)==2,])
%*%ux_data[rep(class,times=n_g_i)==2,]/(sigma2))+solve(SIGMA)))
Each m_beta is a vector of length 3 and each sig2_beta is a 3x3 matrix.
I am trying to do the same with a for loop. Unfortunately, it is not working:
ngrp=2
u_ij_New_C12=numeric()
mu_beta_C12=numeric()
sig_beta_C12=array()
for ( k in 1:ngrp){
u_ij_New_C12[k]=(u_data$u_ij[rep(class,times=n_g_i)==k] #u_ij_new belongs to cluster-k
-rep(thet.value[class==k], n_g_i[class==k])) #repeating thetas belonging to cluster-k
sig_beta_C12[k]=(solve((t(ux_data[rep(class,times=n_g_i)==k,])
%*%ux_data[rep(class,times=n_g_i)==k,]/
(sigma2))+solve(SIGMA)))
mu_beta_C12[k]=(sig_beta_C12[k] %*%(t(ux_data[rep(class,times=n_g_i)==k,])%*%u_ij_New_C12[k]/sigma2))
}
I expect k=1 to reproduce the cluster-1 results and k=2 the cluster-2 results. For example, mu_beta_C12[1] and sig_beta_C12[1] should be identical to m_beta_C1 and sig2_beta_C1, respectively.
Any help is appreciated.
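One likely reason the loop fails is that u_ij_New_C12[k], sig_beta_C12[k] and mu_beta_C12[k] each address a single element, while the values being assigned are a vector, a 3x3 matrix and a length-3 vector. A minimal sketch that stores each cluster's result in a list instead is shown below. It uses thet.value as defined above, keeps only the u_data columns the computation needs, and fixes the class labels by hand (an assumption, since the question draws them with rbinom()).

```r
# Columns copied from the u_data table above.
n_g_i <- c(5, 10, 10, 10, 10)            # observations per rSLn
u_data <- data.frame(
  rSLn  = rep(1:5, times = n_g_i),
  rwave = c(1:5, rep(1:10, times = 4))
)
u_data$rwave2 <- u_data$rwave^2
u_data$u_ij <- c(
  5.302956, 3.358249, 4.976734, 4.290459, 0.000000,
  5.975351, 6.620773, 8.045909, 7.378384, 6.908755,
  8.672657, 8.284252, 8.455531, 7.415175, 8.634265,
  7.356993, 6.607598, 0.000000, 0.000000, 0.000000,
  0.000000, 0.000000, 0.000000, 6.398595, 6.580639,
  5.525104, 2.326914, 6.744059, 6.710523, 6.876265,
  8.476580, 7.360740, 7.581210, 7.496097, 7.438972,
  0.000000, 8.820149, 6.255750, 6.928538, 7.327123,
  7.638198, 7.473637, 7.147559, 8.594339, 8.787373
)

ux_data    <- cbind(1, as.matrix(u_data[, c("rwave", "rwave2")]))
thet.value <- c(4.25, 5.85, 1.26, 9.78, 6.86)
sigma2     <- 0.7849
SIGMA      <- diag(c(100, 1, 1 / 100))
class      <- c(1, 2, 1, 2, 2)           # assumed labels, fixed for reproducibility

ngrp <- 2
# Lists can hold one vector or matrix per cluster; index them with [[k]].
u_ij_New_C12 <- vector("list", ngrp)
sig_beta_C12 <- vector("list", ngrp)
mu_beta_C12  <- vector("list", ngrp)

for (k in seq_len(ngrp)) {
  rows <- rep(class, times = n_g_i) == k       # rows belonging to cluster k
  X_k  <- ux_data[rows, , drop = FALSE]
  u_ij_New_C12[[k]] <- u_data$u_ij[rows] -
    rep(thet.value[class == k], n_g_i[class == k])
  # crossprod(X) is t(X) %*% X; crossprod(X, y) is t(X) %*% y.
  sig_beta_C12[[k]] <- solve(crossprod(X_k) / sigma2 + solve(SIGMA))
  mu_beta_C12[[k]]  <- sig_beta_C12[[k]] %*%
    (crossprod(X_k, u_ij_New_C12[[k]]) / sigma2)
}
```

Here mu_beta_C12[[1]] and sig_beta_C12[[1]] play the role of m_beta_C1 and sig2_beta_C1. A 3-dimensional array would also work for the covariance matrices, but a list avoids having to fix the dimensions in advance.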
I have a table with a column "Age" that has values from 1 to 10, and a column "Population" that gives the population for each age. I want to compute a cumulative sum of the population for ages 1 and above, 2 and above, and so on. That is, the resulting vector should be (203, 180, and so on). Any help would be appreciated!
Age Population Withdrawn
1 23 3
2 12 2
3 32 2
4 33 3
5 15 4
6 10 1
7 19 2
8 18 3
9 19 1
10 22 5
You can use cumsum and rev:
df$sum_above <- rev(cumsum(rev(df$Population)))
The result:
> df
Age Population sum_above
1 1 23 203
2 2 12 180
3 3 32 168
4 4 33 136
5 5 15 103
6 6 10 88
7 7 19 78
8 8 18 59
9 9 19 41
10 10 22 22
I have a data frame d and I'd like to add a Value_Group column that looks at the value field and returns the upper limit of the bucket the value falls into:
Value Value_group
0<=value<5 5
5<=value<10 10
10<=value<15 15
15<=value<20 20
As you can see, Value_Group is the maximum possible value in the bucket, i.e. for a value between 0 and 5, Value_Group = 5.
d =data.frame(group = rep("A",20),value = seq(1,20,1))
d
d$Value_Group = ??
Value_Group can be added using multiple ifelse() statements, but is there a better way?
The result would be:
group value Value_Group
1 A 1 5
2 A 2 5
3 A 3 5
4 A 4 5
5 A 5 5
6 A 6 10
7 A 7 10
8 A 8 10
9 A 9 10
10 A 10 10
11 A 11 15
12 A 12 15
13 A 13 15
14 A 14 15
15 A 15 15
16 A 16 20
17 A 17 20
18 A 18 20
19 A 19 20
20 A 20 20
Thank you.
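One way to avoid nested ifelse() calls is plain arithmetic on the bucket width. The sketch below assumes a fixed width of 5. Note that the expected result above puts 5 in bucket 5 and 10 in bucket 10, which corresponds to right-closed intervals, while the bucket definition is written as left-closed, so both variants are shown (Value_Group_left is a hypothetical column name).

```r
d <- data.frame(group = rep("A", 20), value = seq(1, 20, 1))

# Right-closed buckets (0,5], (5,10], ...: reproduces the expected result table.
d$Value_Group <- ceiling(d$value / 5) * 5

# Left-closed buckets [0,5), [5,10), ...: follows the bucket definition as written,
# so a value of 5 goes to 10 and a value of 20 goes to 25.
d$Value_Group_left <- (floor(d$value / 5) + 1) * 5

d
```

For unequal bucket widths, cut() or findInterval() with an explicit vector of break points would be the more general tools.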