I have two tables as follows:
A<-data.frame("Task"=c("a","b","c","d","e"),"FC"=(c(100,120,200,300,400)))
B<-data.frame("Task"=c("a","b","c"),"FC"=(c(20,50,30)))
Task FC
1 a 100
2 b 120
3 c 200
4 d 300
5 e 400
Task FC
1 a 20
2 b 50
3 c 30
How can I create a table C in which FC is the sum of the FC values for each corresponding Task in A and B?
Task FC
1 a 120
2 b 170
3 c 230
Merge the data frames:
df <- merge(A, B, by = "Task", all = FALSE)
Summarise the data:
df$sum <- apply(df[, 2:3], 1, sum) # sum, sd, min, max or ...
> df
Task FC.x FC.y sum
1 a 100 20 120
2 b 120 50 170
3 c 200 30 230
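If only Task and the summed FC are wanted, as in the desired table C, the merged columns can be added directly. This is a minimal sketch; it relies on merge() naming the two clashing FC columns FC.x and FC.y, as in the output above.

```r
A <- data.frame(Task = c("a", "b", "c", "d", "e"), FC = c(100, 120, 200, 300, 400))
B <- data.frame(Task = c("a", "b", "c"), FC = c(20, 50, 30))

# Inner join on Task: only tasks present in both A and B are kept
m <- merge(A, B, by = "Task")

# Build C from the key column and the element-wise sum of the two FC columns
C <- data.frame(Task = m$Task, FC = m$FC.x + m$FC.y)
C
```

For a plain sum, adding the columns (or using rowSums) avoids the row-by-row apply call; apply remains useful for other summaries such as sd.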
I have written some code for a university assignment. The assignment is based on various concrete samples and their tensile strengths. There are 20 types of concrete mixtures (made from four different accelerators, and five different plasticisers). Our job is to do a statistical analysis on this data frame:
TStrength accelerator plasticiser
1 3.417543 1 1
2 2.887113 1 2
3 3.600988 1 3
4 3.702631 1 4
5 3.686944 1 5
6 3.699785 1 1
7 3.112972 1 2
8 3.918160 1 3
9 3.600538 1 4
10 2.748832 1 5
11 3.404498 1 1
12 3.735437 1 2
13 3.347577 1 3
14 3.101556 1 4
15 3.527621 1 5
16 3.856831 1 1
17 3.492118 1 2
18 3.928343 1 3
19 3.511689 1 4
20 3.371985 1 5
21 3.069794 2 1
22 3.168010 2 2
23 3.316657 2 3
24 3.455162 2 4
25 2.818250 2 5
26 4.054507 2 1
27 3.065984 2 2
28 3.201351 2 3
29 3.417554 2 4
30 3.364320 2 5
31 3.218677 2 1
32 2.647151 2 2
33 3.222705 2 3
34 3.145210 2 4
35 3.636642 2 5
36 3.317620 2 1
37 3.645922 2 2
38 2.556071 2 3
39 3.177663 2 4
40 3.014374 2 5
41 3.838183 3 1
42 4.155951 3 2
43 3.886330 3 3
44 3.723898 3 4
45 4.425442 3 5
46 3.738460 3 1
47 3.217834 3 2
48 3.942241 3 3
49 3.699851 3 4
50 3.797089 3 5
51 3.652456 3 1
52 4.851609 3 2
53 3.359099 3 3
54 4.089559 3 4
55 4.282991 3 5
56 3.803784 3 1
57 3.519551 3 2
58 3.935084 3 3
59 3.890324 3 4
60 4.611936 3 5
61 3.343098 4 1
62 3.713952 4 2
63 3.629883 4 3
64 3.082509 4 4
65 3.346548 4 5
66 3.277845 4 1
67 3.509506 4 2
68 3.490567 4 3
69 3.235009 4 4
70 3.970925 4 5
71 3.504646 4 1
72 3.270798 4 2
73 3.547298 4 3
74 3.278489 4 4
75 3.322743 4 5
76 2.975010 4 1
77 3.384996 4 2
78 3.399486 4 3
79 3.703567 4 4
80 3.214973 4 5
My first step was to find the means of the TStrength values for each of the 20 concrete types (there are four samples of each unique concrete type). I am very new to R, and my code is certainly not beautiful, but this is the code I wrote to find the means:
#Setting the correct directory
setwd("C:/Users/Matthew/Desktop/Work/Engineering")
#Creating the data frame object, Concrete.
#Note that this will only work if the file
#s...-CW.dat is in the current working directory
#Therefore for this code to work, CreateData.r must
#be run on the individual computer with the
#given matriculation number, and the file must be saved
#in the specified directory
Concrete<-read.table(file='s...-CW.dat',header=TRUE)
#Since the samples of concrete are made from 4 different accelerators and
#5 different plasticisers there will be 4*5=20 unique combinations from
#which concrete samples can come from (i.e. 1,1; 1,2; 4,5 etc).
# There are four samples of each combination
#The next section of code is used to find the mean of the four samples,
#for each combination (20 total)
#creating a list with Tstrength from all (1,1) combinations
#Then finding average
combo1 = list(Concrete[1,1],Concrete[6,1],Concrete[11,1],Concrete[16,1])
combo1mean = mean(unlist(combo1))
#Repeating for (1,2)
combo2 = list(Concrete[2,1],Concrete[7,1],Concrete[12,1],Concrete[17,1])
combo2mean = mean(unlist(combo2))
#Repeating for (1,3)
combo3 = list(Concrete[3,1],Concrete[8,1],Concrete[13,1],Concrete[18,1])
combo3mean = mean(unlist(combo3))
#Repeating for (1,4)
combo4 = list(Concrete[4,1],Concrete[9,1],Concrete[14,1],Concrete[19,1])
combo4mean = mean(unlist(combo4))
#Repeating for (1,5)
combo5 = list(Concrete[5,1],Concrete[10,1],Concrete[15,1],Concrete[20,1])
combo5mean = mean(unlist(combo5))
#Repeating for (2,1)
combo6 = list(Concrete[21,1],Concrete[26,1],Concrete[31,1],Concrete[36,1])
combo6mean = mean(unlist(combo6))
#Repeating for (2,2)
combo7 = list(Concrete[22,1],Concrete[27,1],Concrete[32,1],Concrete[37,1])
combo7mean = mean(unlist(combo7))
#Repeating for (2,3)
combo8 = list(Concrete[23,1],Concrete[28,1],Concrete[33,1],Concrete[38,1])
combo8mean = mean(unlist(combo8))
#Repeating for (2,4)
combo9 = list(Concrete[24,1],Concrete[29,1],Concrete[34,1],Concrete[39,1])
combo9mean = mean(unlist(combo9))
#Repeating for (2,5)
combo10 = list(Concrete[25,1],Concrete[30,1],Concrete[35,1],Concrete[40,1])
combo10mean = mean(unlist(combo10))
#Repeating for (3,1)
combo11 = list(Concrete[41,1],Concrete[46,1],Concrete[51,1],Concrete[56,1])
combo11mean = mean(unlist(combo11))
#Repeating for (3,2)
combo12 = list(Concrete[42,1],Concrete[47,1],Concrete[52,1],Concrete[57,1])
combo12mean = mean(unlist(combo12))
#Repeating for (3,3)
combo13 = list(Concrete[43,1],Concrete[48,1],Concrete[53,1],Concrete[58,1])
combo13mean = mean(unlist(combo13))
#Repeating for (3,4)
combo14 = list(Concrete[44,1],Concrete[49,1],Concrete[54,1],Concrete[59,1])
combo14mean = mean(unlist(combo14))
#Repeating for (3,5)
combo15 = list(Concrete[45,1],Concrete[50,1],Concrete[55,1],Concrete[60,1])
combo15mean = mean(unlist(combo15))
#Repeating for (4,1)
combo16 = list(Concrete[61,1],Concrete[66,1],Concrete[71,1],Concrete[76,1])
combo16mean = mean(unlist(combo16))
#Repeating for (4,2)
combo17 = list(Concrete[62,1],Concrete[67,1],Concrete[72,1],Concrete[77,1])
combo17mean = mean(unlist(combo17))
#Repeating for (4,3)
combo18 = list(Concrete[63,1],Concrete[68,1],Concrete[73,1],Concrete[78,1])
combo18mean = mean(unlist(combo18))
#Repeating for (4,4)
combo19 = list(Concrete[64,1],Concrete[69,1],Concrete[74,1],Concrete[79,1])
combo19mean = mean(unlist(combo19))
#Repeating for (4,5)
combo20 = list(Concrete[65,1],Concrete[70,1],Concrete[75,1],Concrete[80,1])
combo20mean = mean(unlist(combo20))
A few notes about the code: "s..." is just my matriculation number. I have triple-checked that I have not made a mistake regarding either the file name or the directory where it is stored. CreateData.r is just a script provided to us that generates the data used to create 'Concrete' based on our matriculation number (so we're not just blindly copying each other, I suppose).
The problem I am having with the code is that whenever it runs, the object Concrete is created, as is combo1mean, combo2mean and combo3mean. However, I just cannot figure out why the rest of the objects aren't being created.
I have had no success running the script in the Rgui. After running the script, I check that Concrete has initialised, and I check whether combo4mean and above have initialised too, but they never do. I thought it might have to do with running the wrong file, or that I hadn't saved the data properly, but the script definitely contains all the code, and I created a new file to see if that would work, but unfortunately it didn't. Also, I have read An Introduction to R by W.N. Venables, D.M. Smith and the R Core Team, but nothing there has helped me figure this out.
PS I am not doing this as an easy way out of homework. I have genuinely tried to figure out what is going wrong but I cannot seem to find the problem. I also apologise if the question is inaccurate in any way, or if I have misunderstood anything. I am very new to R and am trying my best to learn it! Cheers in advance.
EDIT: Just in case anyone is curious, I managed to get the exact same code to work on a different computer, starting from an empty workspace. I'm still not very sure why it didn't work on the first computer, but thanks 42 for the code suggestions.
Adding code that should bypass issues related to reading a text file. This should succeed on any R installation:
Concrete <- read.table(text="TStrength accelerator plasticiser
1 3.417543 1 1
2 2.887113 1 2
3 3.600988 1 3
4 3.702631 1 4
5 3.686944 1 5
6 3.699785 1 1
7 3.112972 1 2
8 3.918160 1 3
9 3.600538 1 4
10 2.748832 1 5
11 3.404498 1 1
12 3.735437 1 2
13 3.347577 1 3
14 3.101556 1 4
15 3.527621 1 5
16 3.856831 1 1
17 3.492118 1 2
18 3.928343 1 3
19 3.511689 1 4
20 3.371985 1 5
21 3.069794 2 1
22 3.168010 2 2
23 3.316657 2 3
24 3.455162 2 4
25 2.818250 2 5
26 4.054507 2 1
27 3.065984 2 2
28 3.201351 2 3
29 3.417554 2 4
30 3.364320 2 5
31 3.218677 2 1
32 2.647151 2 2
33 3.222705 2 3
34 3.145210 2 4
35 3.636642 2 5
36 3.317620 2 1
37 3.645922 2 2
38 2.556071 2 3
39 3.177663 2 4
40 3.014374 2 5
41 3.838183 3 1
42 4.155951 3 2
43 3.886330 3 3
44 3.723898 3 4
45 4.425442 3 5
46 3.738460 3 1
47 3.217834 3 2
48 3.942241 3 3
49 3.699851 3 4
50 3.797089 3 5
51 3.652456 3 1
52 4.851609 3 2
53 3.359099 3 3
54 4.089559 3 4
55 4.282991 3 5
56 3.803784 3 1
57 3.519551 3 2
58 3.935084 3 3
59 3.890324 3 4
60 4.611936 3 5
61 3.343098 4 1
62 3.713952 4 2
63 3.629883 4 3
64 3.082509 4 4
65 3.346548 4 5
66 3.277845 4 1
67 3.509506 4 2
68 3.490567 4 3
69 3.235009 4 4
70 3.970925 4 5
71 3.504646 4 1
72 3.270798 4 2
73 3.547298 4 3
74 3.278489 4 4
75 3.322743 4 5
76 2.975010 4 1
77 3.384996 4 2
78 3.399486 4 3
79 3.703567 4 4
80 3.214973 4 5", header=TRUE)
This probably does what you are attempting with about a tenth (or less) of the code (and, more importantly, no errors):
> means.by.type <- with( Concrete, tapply(TStrength,
list( acc=accelerator, plas=plasticiser),
FUN=mean))
> means.by.type
plas
acc 1 2 3 4 5
1 3.594664 3.306910 3.698767 3.479103 3.333845
2 3.415150 3.131767 3.074196 3.298897 3.208397
3 3.758221 3.936236 3.780689 3.850908 4.279364
4 3.275150 3.469813 3.516808 3.324893 3.463797
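If a long table (one row per combination) is more convenient than the matrix above, aggregate() is one alternative. The following is a minimal sketch, not a replacement for the tapply call: to keep it short it types in only the first 20 rows of the posted data (accelerator 1), and the names Concrete1 and means.long are made up for illustration.

```r
# First 20 rows of the posted data (accelerator 1 only), typed in for brevity
Concrete1 <- data.frame(
  TStrength = c(3.417543, 2.887113, 3.600988, 3.702631, 3.686944,
                3.699785, 3.112972, 3.918160, 3.600538, 2.748832,
                3.404498, 3.735437, 3.347577, 3.101556, 3.527621,
                3.856831, 3.492118, 3.928343, 3.511689, 3.371985),
  accelerator = 1,
  plasticiser = rep(1:5, times = 4)
)

# One mean per accelerator/plasticiser combination, in long format
means.long <- aggregate(TStrength ~ accelerator + plasticiser,
                        data = Concrete1, FUN = mean)
means.long

# Cross-check against the manual approach for combination (1,1)
combo1mean <- mean(Concrete1$TStrength[c(1, 6, 11, 16)])
combo1mean
```

The five means should agree with the first row of the tapply output above, and combo1mean with its first entry.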
Importantly, you did not post the output of str or dput on Concrete, so we cannot really tell whether your problem is data prep or coding.
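For reference, this is roughly what is being asked for. A minimal sketch on a three-row stand-in (Concrete.head is a hypothetical name, built from the first three rows posted above); on the real data the same calls would be made on Concrete itself.

```r
# Hypothetical stand-in built from the first three rows of the posted data
Concrete.head <- data.frame(
  TStrength   = c(3.417543, 2.887113, 3.600988),
  accelerator = c(1L, 1L, 1L),
  plasticiser = 1:3
)

str(Concrete.head)    # shows the class of each column
dput(Concrete.head)   # prints R code that recreates the object

# mean() needs numeric input, so this is worth checking after read.table()
is.numeric(Concrete.head$TStrength)
```

If TStrength had been read in as character or factor, the str output would show it immediately, which would point to a data-prep problem and not a coding one.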
I need to replicate, or at least find an alternative solution for, a SUMIFS function I have in Excel.
I have a transactional database:
SegNbr Index Revenue SUMIF
A 1 10 30
A 1 20 30
A 2 30 100
A 2 40 100
B 1 50 110
B 1 60 110
B 3 70 260
B 3 80 260
and I need to create another column that sums the Revenue, by SegmentNumber, for all indexes that are less than or equal to the Index in that row. It is a distorted rolling revenue, as it will be the same for each SegmentNumber/Index key. The formula is:
=SUMIFS([Revenue],[SegNbr],[#SegNbr],[Index],"<="&[#Index])
Let's say you have this sample data.frame
dd<-read.table(text="SegNbr Index Revenue
A 1 10
A 1 20
A 2 30
A 2 40
B 1 50
B 1 60
B 3 70
B 3 80", header=TRUE)
Now if we make sure the data is ordered by segment and index, we can do
dd <- dd[order(dd$SegNbr, dd$Index), ]  # sort data
dd$OUT <- with(dd,
  ave(
    ave(Revenue, SegNbr, FUN = cumsum),       # get running sum per seg
    interaction(SegNbr, Index, drop = TRUE),
    FUN = function(x) max(x, na.rm = TRUE))   # find largest sum per index per seg
)
dd
This gives
SegNbr Index Revenue OUT
1 A 1 10 30
2 A 1 20 30
3 A 2 30 100
4 A 2 40 100
5 B 1 50 110
6 B 1 60 110
7 B 3 70 260
8 B 3 80 260
as desired.
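As a cross-check that does not depend on row order, the SUMIFS condition can also be written out directly, row by row. This is a minimal sketch: the column name SUMIF is chosen here to match the question's table, and because every row scans the whole table it is quadratic in the number of rows, so it is probably only suitable for small data.

```r
dd <- read.table(text = "SegNbr Index Revenue
A 1 10
A 1 20
A 2 30
A 2 40
B 1 50
B 1 60
B 3 70
B 3 80", header = TRUE)

# For each row i: sum Revenue over rows with the same SegNbr and Index <= Index[i]
dd$SUMIF <- sapply(seq_len(nrow(dd)), function(i) {
  same.seg  <- dd$SegNbr == dd$SegNbr[i]
  not.later <- dd$Index <= dd$Index[i]
  sum(dd$Revenue[same.seg & not.later])
})
dd$SUMIF
# 30 30 100 100 110 110 260 260
```

This mirrors the two criteria of the Excel formula one to one, which makes it easy to compare against the faster ave-based version.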