Deal with cycle in R - r

I have data as follows:
Schedule<-data.frame("Task"=c("A","B","C","D"),"CycleTask"=c(100,150,120,30))
FDate<-data.frame("Date"=seq(1:6),"Cycle"=c(90,100,130,150,170,200))
Task TaskCycle
1 A 100
2 B 150
3 C 120
4 D 30
Date Cycle
1 1 90
2 2 100
3 3 130
4 4 150
5 5 170
6 6 200
Each task has one required cycle and repeat over time. If the Task is done, it will be counted again as a loop. For example: Task D repeat every 30 cycles
I want to find the date that need to do the Task. Like this
Date Cycle Task
1 1 90 D
2 2 100 A
3 3 130 C,D
4 4 150 B
5 5 170 D
6 6 200 A,D
How can I use loop function to deal with this cycle?

Related

Removing rows in R where the contiguous values of rows in one column are further than 1 numeric value apart, whilst accounting for participant ID?

I am trying to clean up a timeseries with multiple data points. The data is arranged by day and by 'beep'. I want to only keep items that are 1 beep away from each other on the same day.
In order to do this I have created a dummy variable by multiplying day number by 10 and adding the beep number to it.
I was wondering if it would be possible to use some kind of clause to specify that I want to keep data that is contiguously = 1 to its lead OR lag variable, but also less than 50 (so that it will keep the days isolated). Alternatively is there a way to group by participant and then by day so that it will apply across participant and across each day in a way that won't delete incorrect data between days e.g. it should not delete day 2 beep 1 for being too far away from day 1 beep 7.
I am doing this so I can use a function called lagvar from an ESM package to created a time-lagged series. Before doing this I want to make sure that any variables in day_beep that are greater than 1 from their contiguous neighbours are removed.
E.g.
Take the following rows and day_beep values
Participant ID Day Beep Dummy Variable
1 1 1 101
1 1 2 102
1 1 4 104
**1 1 7 107**
1 2 3 203
1 2 4 204
2 1 2 102
2 1 3 103
**2 2 5 205
2 3 4 305**
**3 1 1 101**
3 2 4 204
3 2 5 205
**4 1 7 107**
4 4 4 404
4 4 5 405
In this instance I would want to remove the data held between the asterisks as it is either contiguously more than 1 beep from its neighbours, or an isolated beep in the series.
What would be the easiest way to do this for the entire dataframe?
Any help would be greatly appreciated!
You can use lead and lag from dplyr to keep only the rows that have a contiguous value before or after:
library(dplyr)
df %>%
group_by(Participant_ID) %>%
filter(((Dummy_Variable - lag(Dummy_Variable)) == 1) |
(lead(Dummy_Variable) - Dummy_Variable == 1))
output
Participant_ID Day Beep Dummy_Variable
1 1 1 1 101
2 1 1 2 102
3 1 2 3 203
4 1 2 4 204
5 2 1 2 102
6 2 1 3 103
7 3 2 4 204
8 3 2 5 205
9 4 4 4 404
10 4 4 5 405

Summarise value from two table in R

I have two tables as follows:
A<-data.frame("Task"=c("a","b","c","d","e"),"FC"=(c(100,120,200,300,400)))
B<-data.frame("Task"=c("a","b","c"),"FC"=(c(20,50,30)))
Task FC
1 a 100
2 b 120
3 c 200
4 d 300
5 e 400
Task FC
1 a 20
2 b 50
3 c 30
How can I create table C with output is summarise of coresposing Task from A and B?
Task FC
1 a 120
2 b 170
3 c 230
merge dfs
df=merge(A,B,by="Task",all=F)
summarise the data
df$sum=apply(df[,2:3],1,sum)#sum, sd, min, max or ...
> df
Task FC.x FC.y sum
1 a 100 20 120
2 b 120 50 170
3 c 200 30 230

Coloring dgCMatrix image by factor in R

I am trying to color a sparse matrix image according to a grouping factor. I know the solution is related to matrix coloring in the lattice package but I have troubles to handle it.
I have a list of hits on an app list. Every hit is related to a user and a app at a specific time.
- On the y axis are users sorted by first install of the app
Every user then has a new line for his pages hits
- On the x axis is the time
Points are hits
Here is a preview of the data:
library(Matrix)
indexUser indexInstall time
1 1 1 3
2 1 1 17
3 1 1 19
4 1 1 32
5 1 1 81
6 1 1 86
7 1 1 124
8 1 1 231
9 1 1 233
10 1 2 249
11 2 3 4
12 2 3 6
13 2 3 7
14 2 3 15
15 2 3 25
16 2 3 32
17 2 3 45
18 2 3 74
19 2 3 75
20 3 4 36
21 3 4 37
22 3 4 113
23 4 5 69
24 4 5 70
25 4 5 71
I then create a sparse matrix as the full dataset is way larger than that (10000+ x 1000)
sM <- sparseMatrix(i=dat$indexInstall, j=dat$time, x=1)
And show an image of it:
image(sM)
I want to color every lines according to the indexUser column. For example to plot user 1 in blue and all others un red
Thanks in advance

R code is not creating objects?

I have written some code for a university assignment. The assignment is based on various concrete samples and their tensile strengths. There are 20 types of concrete mixtures (made from four different accelerators, and five different plasticisers). Our job is to do a statistical analysis on this data frame:
TStrength accelerator plasticiser
1 3.417543 1 1
2 2.887113 1 2
3 3.600988 1 3
4 3.702631 1 4
5 3.686944 1 5
6 3.699785 1 1
7 3.112972 1 2
8 3.918160 1 3
9 3.600538 1 4
10 2.748832 1 5
11 3.404498 1 1
12 3.735437 1 2
13 3.347577 1 3
14 3.101556 1 4
15 3.527621 1 5
16 3.856831 1 1
17 3.492118 1 2
18 3.928343 1 3
19 3.511689 1 4
20 3.371985 1 5
21 3.069794 2 1
22 3.168010 2 2
23 3.316657 2 3
24 3.455162 2 4
25 2.818250 2 5
26 4.054507 2 1
27 3.065984 2 2
28 3.201351 2 3
29 3.417554 2 4
30 3.364320 2 5
31 3.218677 2 1
32 2.647151 2 2
33 3.222705 2 3
34 3.145210 2 4
35 3.636642 2 5
36 3.317620 2 1
37 3.645922 2 2
38 2.556071 2 3
39 3.177663 2 4
40 3.014374 2 5
41 3.838183 3 1
42 4.155951 3 2
43 3.886330 3 3
44 3.723898 3 4
45 4.425442 3 5
46 3.738460 3 1
47 3.217834 3 2
48 3.942241 3 3
49 3.699851 3 4
50 3.797089 3 5
51 3.652456 3 1
52 4.851609 3 2
53 3.359099 3 3
54 4.089559 3 4
55 4.282991 3 5
56 3.803784 3 1
57 3.519551 3 2
58 3.935084 3 3
59 3.890324 3 4
60 4.611936 3 5
61 3.343098 4 1
62 3.713952 4 2
63 3.629883 4 3
64 3.082509 4 4
65 3.346548 4 5
66 3.277845 4 1
67 3.509506 4 2
68 3.490567 4 3
69 3.235009 4 4
70 3.970925 4 5
71 3.504646 4 1
72 3.270798 4 2
73 3.547298 4 3
74 3.278489 4 4
75 3.322743 4 5
76 2.975010 4 1
77 3.384996 4 2
78 3.399486 4 3
79 3.703567 4 4
80 3.214973 4 5
My first step was to attempt to find out the means of the Tstrength values for each of the 20 concrete types (there are four types of each unique concrete sample). I am very new to R, and my code is certainly not beautiful, but this is the code I wrote to find the means:
#Setting the correct directory
setwd("C:/Users/Matthew/Desktop/Work/Engineering")
#Creating the data frame object, Concrete.
#Note that this will only work if the file
#s...-CW.dat is in the current working directory
#Therefore for this code to work, CreateData.r must
#be run on the individual computer with the
#given matriculation number, and the file must be saved
#in the specified directory
Concrete<-read.table(file='s...-CW.dat',header=TRUE)
#Since the samples of concrete are made from 4 different accelerators and
#5 different plasticisers there will be 4*5=20 unique combinations from
#which concrete samples can come from (i.e. 1,1; 1,2; 4,5 etc).
# There are four samples of each combination
#The next section of code is used to find the mean of the four samples,
#for each combination (20 total)
#creating a list with Tstrength from all (1,1) combinations
#Then finding average
combo1 = list(Concrete[1,1],Concrete[6,1],Concrete[11,1],Concrete[16,1])
combo1mean = mean(unlist(combo1))
#Repeating for (1,2)
combo2 = list(Concrete[2,1],Concrete[7,1],Concrete[12,1],Concrete[17,1])
combo2mean = mean(unlist(combo2))
#Repeating for (1,3)
combo3 = list(Concrete[3,1],Concrete[8,1],Concrete[13,1],Concrete[18,1])
combo3mean = mean(unlist(combo3))
#Repeating for (1,4)
combo4 = list(Concrete[4,1],Concrete[9,1],Concrete[14,1],Concrete[19,1])
combo4mean = mean(unlist(combo4))
#Repeating for (1,5)
combo5 = list(Concrete[5,1],Concrete[10,1],Concrete[15,1],Concrete[20,1])
combo5mean = mean(unlist(combo5))
#Repeating for (2,1)
combo6 = list(Concrete[21,1],Concrete[26,1],Concrete[31,1],Concrete[36,1])
combo6mean = mean(unlist(combo6))
#Repeating for (2,2)
combo7 = list(Concrete[22,1],Concrete[27,1],Concrete[32,1],Concrete[37,1])
combo7mean = mean(unlist(combo7))
#Repeating for (2,3)
combo8 = list(Concrete[23,1],Concrete[28,1],Concrete[33,1],Concrete[38,1])
combo8mean = mean(unlist(combo8))
#Repeating for (2,4)
combo9 = list(Concrete[24,1],Concrete[29,1],Concrete[34,1],Concrete[39,1])
combo9mean = mean(unlist(combo9))
#Repeating for (2,5)
combo10 = list(Concrete[25,1],Concrete[30,1],Concrete[35,1],Concrete[40,1])
combo10mean = mean(unlist(combo10))
#Repeating for (3,1)
combo11 = list(Concrete[41,1],Concrete[46,1],Concrete[51,1],Concrete[56,1])
combo11mean = mean(unlist(combo11))
#Repeating for (3,2)
combo12 = list(Concrete[42,1],Concrete[47,1],Concrete[52,1],Concrete[57,1])
combo12mean = mean(unlist(combo12))
#Repeating for (3,3)
combo13 = list(Concrete[43,1],Concrete[48,1],Concrete[53,1],Concrete[58,1])
combo13mean = mean(unlist(combo13))
#Repeating for (3,4)
combo14 = list(Concrete[44,1],Concrete[49,1],Concrete[54,1],Concrete[59,1])
combo14mean = mean(unlist(combo14))
#Repeating for (3,5)
combo15 = list(Concrete[45,1],Concrete[50,1],Concrete[55,1],Concrete[60,1])
combo15mean = mean(unlist(combo15))
#Repeating for (4,1)
combo16 = list(Concrete[61,1],Concrete[66,1],Concrete[71,1],Concrete[76,1])
combo16mean = mean(unlist(combo16))
#Repeating for (4,2)
combo17 = list(Concrete[62,1],Concrete[67,1],Concrete[72,1],Concrete[77,1])
combo17mean = mean(unlist(combo17))
#Repeating for (4,3)
combo18 = list(Concrete[63,1],Concrete[68,1],Concrete[73,1],Concrete[78,1])
combo18mean = mean(unlist(combo18))
#Repeating for (4,4)
combo19 = list(Concrete[64,1],Concrete[69,1],Concrete[74,1],Concrete[79,1])
combo19mean = mean(unlist(combo19))
#Repeating for (4,5)
combo20 = list(Concrete[65,1],Concrete[70,1],Concrete[75,1],Concrete[80,1])
combo20mean = mean(unlist(combo20))
A few notes about the code: "s..." is just my matriculation number. I have triple checked that I have not made a mistake here regarding either the file name or the directory with where it is stored. CreataData.r is just a script provided to us the generates the data used to create 'Concrete' based on our matriculation number (so we're not just blindly copying each other I suppose).
The problem I am having with the code is that whenever it runs, the object Concrete is created, as is combo1mean, combo2mean and combo3mean. However, I just cannot figure out why the rest of the objects aren't being created.
I have had no success using running the script in the Rgui. After running the script, it tells I check that Concrete has initialised, and I check to see if the combo4mean and above have initialised too, but they never do. I thought it maybe had to do with running the wrong file, or that I hadn't saved the data properly, but the script definitely contains all the code, and I created a new file to see if that would work, but unfortunately it didn't. Also, I have read an introduction to R by W.N. Venables, D.M. Smith and the R Core Team, but nothing there has helped me figure this out.
PS I am not doing this as an easy way out of homework. I have genuinely tried to figure out what is going wrong but I cannot seem to find the problem. I also apologise if the question is inaccurate in anyway, or if I have had misunderstandings, I am very new to R and am trying my best to learn it! Cheers in advance.
EDIT: Just in case anyone is curious, I managed to get the exact same code to work on a different computer, starting from an empty workspace. I'm still not very sure why it didn't work on the first computer, but thanks 42 for the code suggestions.
Adding code that should bypass issues related to reading a text file. This shouls succeed on any R installation:
Concrete <- read.table(text="TStrength accelerator plasticiser
1 3.417543 1 1
2 2.887113 1 2
3 3.600988 1 3
4 3.702631 1 4
5 3.686944 1 5
6 3.699785 1 1
7 3.112972 1 2
8 3.918160 1 3
9 3.600538 1 4
10 2.748832 1 5
11 3.404498 1 1
12 3.735437 1 2
13 3.347577 1 3
14 3.101556 1 4
15 3.527621 1 5
16 3.856831 1 1
17 3.492118 1 2
18 3.928343 1 3
19 3.511689 1 4
20 3.371985 1 5
21 3.069794 2 1
22 3.168010 2 2
23 3.316657 2 3
24 3.455162 2 4
25 2.818250 2 5
26 4.054507 2 1
27 3.065984 2 2
28 3.201351 2 3
29 3.417554 2 4
30 3.364320 2 5
31 3.218677 2 1
32 2.647151 2 2
33 3.222705 2 3
34 3.145210 2 4
35 3.636642 2 5
36 3.317620 2 1
37 3.645922 2 2
38 2.556071 2 3
39 3.177663 2 4
40 3.014374 2 5
41 3.838183 3 1
42 4.155951 3 2
43 3.886330 3 3
44 3.723898 3 4
45 4.425442 3 5
46 3.738460 3 1
47 3.217834 3 2
48 3.942241 3 3
49 3.699851 3 4
50 3.797089 3 5
51 3.652456 3 1
52 4.851609 3 2
53 3.359099 3 3
54 4.089559 3 4
55 4.282991 3 5
56 3.803784 3 1
57 3.519551 3 2
58 3.935084 3 3
59 3.890324 3 4
60 4.611936 3 5
61 3.343098 4 1
62 3.713952 4 2
63 3.629883 4 3
64 3.082509 4 4
65 3.346548 4 5
66 3.277845 4 1
67 3.509506 4 2
68 3.490567 4 3
69 3.235009 4 4
70 3.970925 4 5
71 3.504646 4 1
72 3.270798 4 2
73 3.547298 4 3
74 3.278489 4 4
75 3.322743 4 5
76 2.975010 4 1
77 3.384996 4 2
78 3.399486 4 3
79 3.703567 4 4
80 3.214973 4 5", header=TRUE)
This probably does what you are attempting with about 1/10th (or less) code (and more importantly no errors):
> means.by.type <- with( Concrete, tapply(TStrength,
list( acc=accelerator, plas=plasticiser),
FUN=mean))
> means.by.type
plas
acc 1 2 3 4 5
1 3.594664 3.306910 3.698767 3.479103 3.333845
2 3.415150 3.131767 3.074196 3.298897 3.208397
3 3.758221 3.936236 3.780689 3.850908 4.279364
4 3.275150 3.469813 3.516808 3.324893 3.463797
Importantly, you forgot to offer str or dput on Concrete, so cannot really tell whether you problem is data-prep or coding.

Replicating an Excel SUMIFS formula

I need to replicate - or at least find an alternative solution - for a SUMIFS function I have in Excel.
I have a transactional database:
SegNbr Index Revenue SUMIF
A 1 10 30
A 1 20 30
A 2 30 100
A 2 40 100
B 1 50 110
B 1 60 110
B 3 70 260
B 3 80 260
and I need to create another column that sums the Revenue, by SegmentNumber, for all indexes that are equal or less the Index in that row. It is a distorted rolling revenue as it will be the same for each SegmentNumber/Index key. This is the formula is this one:
=SUMIFS([Revenue],[SegNbr],[#SegNbr],[Index],"<="&[#Index])
Let's say you have this sample data.frame
dd<-read.table(text="SegNbr Index Revenue
A 1 10
A 1 20
A 2 30
A 2 40
B 1 50
B 1 60
B 3 70
B 3 80", header=T)
Now if we make sure the data is ordered by segment and index, we can do
dd<-dd[order(dd$SegNbr, dd$Index), ] #sort data
dd$OUT<-with(dd,
ave(
ave(Revenue, SegNbr, FUN=cumsum), #get running sum per seg
interaction(SegNbr, Index, drop=T),
FUN=max, na.rm=T) #find largest sum per index per seg
)
dd
This gives
SegNbr Index Revenue OUT
1 A 1 10 30
2 A 1 20 30
3 A 2 30 100
4 A 2 40 100
5 B 1 50 110
6 B 1 60 110
7 B 3 70 260
8 B 3 80 260
as desired.

Resources