Note: as I'm writing this I can't figure out how to insert images. I'll work on that after posting, but if you run the code below you should be able to see the graphs I'm talking about. Sorry!
Essentially, I have these two graphs and I want them on the same plot (overlaid on top of one another), but I need them to use different color schemes or I won't be able to tell them apart easily.
I've looked everywhere on this site and while there are a lot of similar questions, none of them have worked quite in the way that I need them to. The closest ones I've linked below, just know that I've read them and they did not solve my issues:
Distinct color palettes for two different groups in ggplot2
R ggplot two color palette on the same plot
The first graph uses this data (shortened to 50 rows; the full set has about 1000). RuleCount repeats 1-14 over and over, and TrainingPass goes up to about 60.
RuleCount TrainingPass m4Accuracy
1 1 -1 0.000000000
2 2 -1 0.000000000
3 3 -1 0.004225352
4 4 -1 0.014225352
5 5 -1 0.022816901
6 6 -1 0.182957746
7 7 -1 0.194507042
8 8 -1 0.207183099
9 9 -1 0.239859155
10 10 -1 0.362394366
11 11 -1 0.430704225
12 12 -1 0.567887324
13 13 -1 0.582535211
14 14 -1 0.602676056
15 1 0 0.000000000
16 2 0 0.000281690
17 3 0 0.006901408
18 4 0 0.018732394
19 5 0 0.031267606
20 6 0 0.202676056
21 7 0 0.215633803
22 8 0 0.231830986
23 9 0 0.262253521
24 10 0 0.373661972
25 11 0 0.440281690
26 12 0 0.573802817
27 13 0 0.588169014
28 14 0 0.608873239
29 1 1 0.000985915
30 2 1 0.014788732
31 3 1 0.032957746
32 4 1 0.071408451
33 5 1 0.113943662
34 6 1 0.276760563
35 7 1 0.290281690
36 8 1 0.303943662
37 9 1 0.335633803
38 10 1 0.438028169
39 11 1 0.501971831
40 12 1 0.625070423
41 13 1 0.637323944
42 14 1 0.658169014
43 1 2 0.000985915
44 2 2 0.015915493
45 3 2 0.030704225
46 4 2 0.076619718
47 5 2 0.119436620
48 6 2 0.280563380
49 7 2 0.294507042
50 8 2 0.308732394
I graphed it using this code:
ggplot(df_m4, aes(x=RuleCount, y=m4Accuracy, group = TrainingPass, color = TrainingPass)) +
geom_line()+
scale_color_gradient(low = "green", high = "blue")
Resulting in this graph:
m4 Accuracy
The second graph is essentially the same data and code, except rather than getting a bunch of slightly varying lines on the graph, each of the lines ends up being the same line
data:
RuleCount TrainingPass Accuracy
1 1 -1 0.000422535
2 2 -1 0.000422535
3 3 -1 0.002676056
4 4 -1 0.005915493
5 5 -1 0.007746479
6 6 -1 0.053239437
7 7 -1 0.059718310
8 8 -1 0.068309859
9 9 -1 0.099859155
10 10 -1 0.197042254
11 11 -1 0.256197183
12 12 -1 0.421971831
13 13 -1 0.440422535
14 14 -1 0.468028169
15 1 0 0.000422535
16 2 0 0.000422535
17 3 0 0.002676056
18 4 0 0.005915493
19 5 0 0.007746479
20 6 0 0.053239437
21 7 0 0.059718310
22 8 0 0.068309859
23 9 0 0.099859155
24 10 0 0.197042254
25 11 0 0.256197183
26 12 0 0.421971831
27 13 0 0.440422535
28 14 0 0.468028169
29 1 1 0.000422535
30 2 1 0.000422535
31 3 1 0.002676056
32 4 1 0.005915493
33 5 1 0.007746479
34 6 1 0.053239437
35 7 1 0.059718310
36 8 1 0.068309859
37 9 1 0.099859155
38 10 1 0.197042254
39 11 1 0.256197183
40 12 1 0.421971831
41 13 1 0.440422535
42 14 1 0.468028169
43 1 2 0.000422535
44 2 2 0.000422535
45 3 2 0.002676056
46 4 2 0.005915493
47 5 2 0.007746479
48 6 2 0.053239437
49 7 2 0.059718310
50 8 2 0.068309859
code:
ggplot(df_rules_only, aes(x=RuleCount, y=Accuracy, group = TrainingPass, color = TrainingPass)) +
geom_line() +
scale_color_gradient(low = "green", high = "blue")
Resulting in this graph:
rules only Accuracy
I understand how to get the data on to the same graph. By combining my two data frames and using the code below, I can add the 'rules_only' data to the 'm4' graph:
ggplot(df_Training, aes(x=ruleCount, y=m4Accuracy, group = training_pass, color = training_pass)) +
geom_line() +
scale_color_gradient(low = "green", high = "blue")+
geom_line(aes(x=ruleCount, y=rulesOnlyAccuracy))
Resulting in this graph:
both_data_sets
The problem is that the new data blends right in with the old because it has the same color scheme.
At first I tried keeping them in the same data frame and just adding "color = 'orange'" to the last line of the previous code, but that gives me the error: "Error: Discrete value supplied to continuous scale"
Next I split them up into the two data frames you see above and tried to graph them this way:
ggplot(df_m4, aes(x=RuleCount, y=m4Accuracy, group = TrainingPass, color = TrainingPass)) +
geom_line() +
scale_color_gradient(low = "green", high = "blue")+
geom_line(df_rules_only, aes(x=RuleCount, y=Accuracy, color = "orange"))
but I get the error: "Error: mapping must be created by aes()"
Those last two attempts were kind of shots in the dark since I couldn't find anything else to try, but I'm pretty certain R doesn't work that way.
I'd really prefer for answers to use ggplot since other graphs never look quite as good. Just really feel like I've been going about this all wrong and could really use some help! Thank you in advance :)
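A very complicated question with a very simple answer. I wanted to move this out of the comments, where #aosmith helped me out. Setting color outside of aes() gives the layer one fixed color instead of mapping it to the continuous scale, so the code below makes my second group of data a different color: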
ggplot(df_Training, aes(x=ruleCount, y=m4Accuracy, group = training_pass, color = training_pass)) +
geom_line() +
geom_line(aes(x=ruleCount, y=rulesOnlyAccuracy), color = "orange")
Just have to work on adding a second legend now!
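For anyone who would rather keep the two data frames separate, here is a minimal sketch of the same idea. The tiny data frames are made-up stand-ins with the same column names as in the question, not the real data. Two things differ from the attempts that failed: the second data frame is passed as a named data = argument (the first positional argument of geom_line() is mapping, which is why the earlier call complained that the mapping must be created by aes()), and the constant color sits outside aes().

```r
library(ggplot2)

# Made-up stand-in data with the same column layout as the question
df_m4 <- data.frame(
  RuleCount    = rep(1:3, times = 2),
  TrainingPass = rep(c(-1, 0), each = 3),
  m4Accuracy   = c(0.00, 0.01, 0.02, 0.00, 0.02, 0.03)
)
df_rules_only <- data.frame(
  RuleCount    = rep(1:3, times = 2),
  TrainingPass = rep(c(-1, 0), each = 3),
  Accuracy     = c(0.00, 0.00, 0.01, 0.00, 0.00, 0.01)
)

p <- ggplot(df_m4, aes(x = RuleCount, y = m4Accuracy,
                       group = TrainingPass, color = TrainingPass)) +
  geom_line() +
  scale_color_gradient(low = "green", high = "blue") +
  # data is named explicitly; color is a constant, so it stays outside aes()
  geom_line(data = df_rules_only,
            aes(x = RuleCount, y = Accuracy, group = TrainingPass),
            color = "orange")

print(p)
```

Because the orange color is a fixed setting rather than a mapped variable, it does not get a legend entry of its own.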
I am having trouble summing selected columns within a data frame. It is a basic problem, and I've seen numerous similar, but not identical, questions and answers about it on StackOverflow.
With this perhaps overly complex data frame:
site<-c(223,257,223,223,257,298,223,298,298,211)
moisture<-c(7,7,7,7,7,8,7,8,8,5)
shade<-c(83,18,83,83,18,76,83,76,76,51)
sampleID<-c(158,163,222,107,106,166,188,186,262,114)
bluestm<-c(3,4,6,3,0,0,1,1,1,0)
foxtail<-c(0,2,0,4,0,1,1,0,3,0)
crabgr<-c(0,0,2,0,33,0,2,1,2,0)
johnson<-c(0,0,0,7,0,8,1,0,1,0)
sedge1<-c(2,0,3,0,0,9,1,0,4,0)
sedge2<-c(0,0,1,0,1,0,0,1,1,1)
redoak<-c(9,1,0,5,0,4,0,0,5,0)
blkoak<-c(0,22,0,23,0,23,22,17,0,0)
my.data<-data.frame(site,moisture,shade,sampleID,bluestm,foxtail,crabgr,johnson,sedge1,sedge2,redoak,blkoak)
I want to sum the counts of each plant species (bluestem, foxtail, etc.; columns 5-12 in this example) within each site, by summing rows that have the same site number. I also want to keep the information about moisture and shade (these are consistent within a site, but may also be the same between sites), and I want a new column that holds the number of rows summed.
the result would look like this
site,moisture,shade,NumSamples,bluestm,foxtail,crabgr,johnson,sedge1,sedge2,redoak,blkoak
211,5,51,1,0,0,0,0,0,1,0,0
223,7,83,4,13,5,4,8,6,1,14,45
257,7,18,2,4,2,33,0,0,1,1,22
298,8,76,3,2,4,3,9,13,2,9,40
The problem I am having is that my real data sets (and I have several of them) have from 50 to 300 plant species, so I want to refer to a range of columns (in this case, [5:12]) instead of my.data$foxtail, my.data$sedge1, etc., which would be very difficult with 300 species.
I know I can start off by deleting the column I don't need (sampleID)
my.data$sampleID <- NULL
but then how do I get the sums? I've messed with the aggregate command and with ddply, and have seen lots of examples which call particular column names, but just haven't gotten anything to work. I recognize this is a variant of a commonly asked and simple type of question, but I've spent hours without resolving it on my own. So, apologies for my stupidity!
This works ok:
x <- aggregate(my.data[,5:12], by=list(site=my.data$site, moisture=my.data$moisture, shade=my.data$shade), FUN=sum, na.rm=T)
library(dplyr)
my.data %>%
group_by(site) %>%
tally %>%
left_join(x)
site n moisture shade bluestm foxtail crabgr johnson sedge1 sedge2 redoak blkoak
1 211 1 5 51 0 0 0 0 0 1 0 0
2 223 4 7 83 13 5 4 8 6 1 14 45
3 257 2 7 18 4 2 33 0 0 1 1 22
4 298 3 8 76 2 4 3 9 13 2 9 40
Or to do it all in dplyr
my.data %>%
group_by(site) %>%
tally %>%
left_join(my.data) %>%
group_by(site,moisture,shade,n) %>%
summarise_each(funs(sum=sum)) %>%
select(-sampleID)
site moisture shade n bluestm foxtail crabgr johnson sedge1 sedge2 redoak blkoak
1 211 5 51 1 0 0 0 0 0 1 0 0
2 223 7 83 4 13 5 4 8 6 1 14 45
3 257 7 18 2 4 2 33 0 0 1 1 22
4 298 8 76 3 2 4 3 9 13 2 9 40
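summarise_each() and funs() have since been deprecated in newer dplyr releases, so here is a hedged sketch of the same summary using across(), assuming dplyr 1.0.0 or later. It groups by site, moisture and shade in one step, counts the rows with n(), and sums the species by column range (bluestm:blkoak), so no species has to be named individually. The names NumSamples and site_totals are just my choices for illustration.

```r
library(dplyr)

# Same example data as in the question
my.data <- data.frame(
  site     = c(223, 257, 223, 223, 257, 298, 223, 298, 298, 211),
  moisture = c(7, 7, 7, 7, 7, 8, 7, 8, 8, 5),
  shade    = c(83, 18, 83, 83, 18, 76, 83, 76, 76, 51),
  sampleID = c(158, 163, 222, 107, 106, 166, 188, 186, 262, 114),
  bluestm  = c(3, 4, 6, 3, 0, 0, 1, 1, 1, 0),
  foxtail  = c(0, 2, 0, 4, 0, 1, 1, 0, 3, 0),
  crabgr   = c(0, 0, 2, 0, 33, 0, 2, 1, 2, 0),
  johnson  = c(0, 0, 0, 7, 0, 8, 1, 0, 1, 0),
  sedge1   = c(2, 0, 3, 0, 0, 9, 1, 0, 4, 0),
  sedge2   = c(0, 0, 1, 0, 1, 0, 0, 1, 1, 1),
  redoak   = c(9, 1, 0, 5, 0, 4, 0, 0, 5, 0),
  blkoak   = c(0, 22, 0, 23, 0, 23, 22, 17, 0, 0)
)

site_totals <- my.data %>%
  group_by(site, moisture, shade) %>%
  summarise(
    NumSamples = n(),               # number of rows summed per site
    across(bluestm:blkoak, sum),    # sum every species column in the range
    .groups = "drop"
  )

site_totals
```

Since sampleID is neither a grouping column nor inside the summed range, it drops out on its own and no select() is needed afterwards.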
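Try the following using base R (this assumes the sampleID column has already been dropped from my.data, so that the species are columns 4 to 11):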
outdf<-data.frame(site=numeric(),moisture=numeric(),shade=numeric(),bluestm=numeric(),foxtail=numeric(),crabgr=numeric(),johnson=numeric(),sedge1=numeric(),sedge2=numeric(),redoak=numeric(),blkoak=numeric())
my.data$basic = with(my.data, paste(site, moisture, shade))
for(b in unique(my.data$basic)) {
outdf[nrow(outdf)+1,1:3] = unlist(strsplit(b,' '))
for(i in 4:11)
outdf[nrow(outdf),i]= sum(my.data[my.data$basic==b,i])
}
outdf
site moisture shade bluestm foxtail crabgr johnson sedge1 sedge2 redoak blkoak
1 223 7 83 13 5 4 8 6 1 14 45
2 257 7 18 4 2 33 0 0 1 1 22
3 298 8 76 2 4 3 9 13 2 9 40
4 211 5 51 0 0 0 0 0 1 0 0
I'm trying to estimate the present value of a stream of payments using the tvm function in the financial package.
y <- tvm(pv=NA,i=2.5,n=1:10,pmt=-c(5,5,5,5,5,8,8,8,8,8))
The result that I obtain is:
y
Time Value of Money model
I% #N PV FV PMT Days #Adv P/YR C/YR
1 2.5 1 4.99 0 -5 30 0 12 12
2 2.5 2 9.97 0 -5 30 0 12 12
3 2.5 3 14.94 0 -5 30 0 12 12
4 2.5 4 19.90 0 -5 30 0 12 12
5 2.5 5 24.84 0 -5 30 0 12 12
6 2.5 6 47.65 0 -8 30 0 12 12
7 2.5 7 55.54 0 -8 30 0 12 12
8 2.5 8 63.40 0 -8 30 0 12 12
9 2.5 9 71.26 0 -8 30 0 12 12
10 2.5 10 79.09 0 -8 30 0 12 12
There is a jump in the PV from row 5 to row 6 (when the payment changes to 8) that appears to be incorrect. This affects the result in y[10,3], which is the result that I'm interested in obtaining.
Excel's NPV formula gives similar results when the payments are the same throughout the whole stream. When the vector of payments varies, however, the results from tvm and NPV differ. I need to obtain the same result that Excel's NPV formula gives.
What should I do to make this work?
The cf formula helps but it is not always consistent with Excel.
I solved my problem using the following function:
npv<-function(a,b,c) sum(a/(1+b)^c)
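A short usage sketch, in case it helps someone else. Here a is the vector of payments, b is the rate per period as a decimal, and c is the vector of period numbers. Judging from the output above, each tvm row looks like a separate annuity with one constant payment (row 6 is close to six payments of 8, not five of 5 plus one of 8), which would explain the jump; discounting each payment individually avoids that. The rate I use for the question's stream is only an assumption for illustration, so I have not written down an expected value for it.

```r
# Present value of a payment stream: discount each payment by its own period
npv <- function(a, b, c) sum(a / (1 + b)^c)

# Simple check by hand: 100 / 1.1 + 100 / 1.1^2
small_check <- npv(c(100, 100), 0.10, 1:2)
small_check

# The payment stream from the question; the per-period rate of 2.5% is an
# assumption here, so substitute whatever rate you pass to NPV in Excel
payments <- c(5, 5, 5, 5, 5, 8, 8, 8, 8, 8)
stream_pv <- npv(payments, 0.025, 1:10)
stream_pv
```

With a rate of zero the function simply returns the sum of the payments, which is a quick sanity check.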