I would like to do this in R rather than Excel, because Excel can't handle the amount of data. I want column c (phys_pos) to be the value in column a (position) plus the cumulative sum of column b (length). In Excel the calculation would be =A2+SUM($B$2:B2). Thanks all.
The data I would like:
position length phys_pos
      12     45       57
      97      0      142
     135      0      180
     498      0      543
     512      0      557
      16     67      128
      76      0      188
      89      0      201
     101      0      213
     152      0      264
       3    103      218
      19      0      234
      76      0      291
      88      0      303

Look into dplyr: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
install.packages("dplyr")
library(dplyr)
df <- df %>% mutate(phys_pos = cumsum(length) + position)
I am assuming your data.frame is named df.
Or with base R:
df$phys_pos <- cumsum(df$length) + df$position
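As a quick check (mine, not part of either answer), loading the example data from the question and applying the one-liner reproduces the posted phys_pos column:

# Read the example data from the question, then verify the cumulative-sum approach
df <- read.table(header = TRUE, text = "
position length
 12  45
 97   0
135   0
498   0
512   0
 16  67
 76   0
 89   0
101   0
152   0
  3 103
 19   0
 76   0
 88   0
")
df$phys_pos <- cumsum(df$length) + df$position
df$phys_pos
# 57 142 180 543 557 128 188 201 213 264 218 234 291 303   (matches the question)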

Assuming your data is stored in a dataframe called "dat":
acc <- 0
for (i in 1:nrow(dat)) {
  acc <- acc + dat[i, "length"]                    # running total of length
  dat[i, "phys_pos"] <- dat[i, "position"] + acc   # position plus cumulative length
}
This is simple stuff. If you work through a few tutorials, you will be able to do this on your own pretty quickly.

Updating Matrix using an apply

I have a predefined matrix M = matrix(0,5,4). I want to update its elements from zero to the proper values taken from a data frame df, where df$ColA identifies the matrix row and df$ColB identifies the matrix column. I have set the row names and column names of M to the respective unique ColA and ColB values; ColA and ColB are discrete integers rather than a regular sequence.
M = matrix(0, 5, 4)
rownames(M) = c(135, 138, 145, 146, 151)
colnames(M) = c(192, 204, 206, 207)
    192 204 206 207
135   0   0   0   0
138   0   0   0   0
145   0   0   0   0
146   0   0   0   0
151   0   0   0   0
df:
ColA ColB ColC
 135  192    1
 135  204    1
 135  206   -1
 138  192   -1
 138  206    1
 138  207    1
 145  192   -1
 145  204   -1
 145  206   -1
 145  207    1
 146  206    1
 146  207    1
 151  192   -1
 151  207    1
for (r in rownames(M)) {
  for (c in colnames(M)) {
    tmp <- df[df$ColA == r & df$ColB == c, ]$ColC
    if (length(tmp) != 0) {
      M[rownames(M) == r, colnames(M) == c] <- tmp
    }
  }
}
Instead of using a for loop, I am wondering if this can be achieved with an apply or outer function, with the matrix update handled by a custom function. Please help me figure out how to achieve this.
I was trying to refer to this link, but had no luck.
To modify a matrix, you can use the apply() function. The code would be:
apply(M, 1, function(x) <your_function>)  # if you want to run over each row
apply(M, 2, function(x) <your_function>)  # if you want to run over each column
If you write out the function you want, or a short example, we can offer you better help.
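As a concrete alternative (a sketch of my own, not from the original answer): base R lets you assign into a matrix by (row, column) index pairs with M[cbind(i, j)] <- value, so the whole update can be done without any loop or apply, assuming the data frame columns are named ColA, ColB and ColC as shown above:

# Map each ColA value to its row in M and each ColB value to its column,
# then assign ColC at those (row, column) positions in a single call.
rows <- match(df$ColA, rownames(M))   # match() coerces, so numeric codes match the character dimnames
cols <- match(df$ColB, colnames(M))
M[cbind(rows, cols)] <- df$ColC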

R populating columns based on previous values

I am trying to populate a series like this.
My result (ACTUAL) is the first b, c, d group; Expected is the second b, c, d group:
FWK_SEQ_NBR       a  initial_d  initial_c    b    c    d    b    c    d
        914   9.161        131         62    0   62   69    0   62   69
        915   9.087        131          0    0   53   78    0   53   78
        916   8.772        131          0    0   44  140    0   44   87
        917   8.698        131          0    0    0  140    0   35   96
        918   7.985        131          0   69   52  139   69   96   35
        919   6.985        131          0   78   63  138   78  168    0
        920   7.077        131          0  140  126  138   87  247    0
        921   6.651        131          0  140  126  138   96  336    0
        922   6.707        131          0  139  125  138   35  364    0
Logic:
a: given
b: lag of d by 4
c: initial_c for the first week; thereafter c = previous row's c + current b - current a
d: initial_d - current c
Here is the code I used:
DS1 = DS %>%
  mutate(c = ifelse(FWK_SEQ_NBR == min(FWK_SEQ_NBR), initial_c, 0)) %>%
  mutate(c = lag(c) + b - a) %>%
  mutate(d = initial_d - c) %>%
  mutate(d = ifelse(d < 0, 0, d)) %>%
  mutate(b = shift(d, n = 4, fill = 0, type = "lag"))  # shift() is from data.table
I am not getting c right; do you know what I am missing? I have also attached an image of the actual and expected output. Thank you for your help!
Actual and Expected values image
Second image: added Product and Store to the list of columns
Image: Product and Store as the first two columns. Please help.
Below is the actual code; I have also copied the image of the expected and actual output. Thank you!
Your example is not what I would call reproducible, and the code snippet did not provide much insight into what you were trying to do. However, the screen image from Excel was very helpful. Here is my solution:
df <- as.data.frame(cbind(a = c(1:9), b = 0, c = 0, d = NA))
c_init <- 62
d_init <- 131
df$d <- d_init
df$c[1] <- c_init    # initial data frame is ready at this stage
iter <- dim(df)[1]   # for the loop to run iter times
for (i in 1:iter) {
  if (i > 4) {
    df[i, "b"] <- df[i - 4, "d"]   # calculate b with the lag
  }
  if (i > 1) {
    df[i, "c"] <- df[i - 1, "c"] + df[i, "b"] - df[i, "a"]   # calc c
  }
  df[i, "d"] <- d_init - df[i, "c"]   # calc d
  if (df[i, "d"] < 0) {
    df[i, "d"] <- 0   # reset negative d values
  }
}
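As a sanity check (mine, not part of the original answer), running the same loop on the a values and initial values posted in the question reproduces the Expected b, c, d columns shown above:

df <- data.frame(a = c(9.161, 9.087, 8.772, 8.698, 7.985, 6.985, 7.077, 6.651, 6.707),
                 b = 0, c = 0, d = NA)
c_init <- 62
d_init <- 131
df$d <- d_init
df$c[1] <- c_init
for (i in 1:nrow(df)) {
  if (i > 4) df[i, "b"] <- df[i - 4, "d"]                           # b is d lagged by 4
  if (i > 1) df[i, "c"] <- df[i - 1, "c"] + df[i, "b"] - df[i, "a"]
  df[i, "d"] <- max(d_init - df[i, "c"], 0)                         # same as the negative-value reset above
}
round(df[, c("b", "c", "d")])   # matches the Expected b, c, d columns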

adding and subtracting values in multiple data frames of different lengths - flow analysis

Thank you jakub and Hack-R!
Yes, these are my actual data. The data I am starting from are the following:
[A] #first, longer dataset
CODE_t2 VALUE_t2
111 3641
112 1691
121 1271
122 185
123 522
124 0
131 0
132 0
133 0
141 626
142 170
211 0
212 0
213 0
221 0
222 0
223 0
231 95
241 0
242 0
243 0
244 0
311 129
312 1214
313 0
321 0
322 0
323 565
324 0
331 0
332 0
333 0
334 0
335 0
411 0
412 0
421 0
422 0
423 0
511 6
512 0
521 0
522 0
523 87
In the above table, we can see the 44 land use CODEs (which I inappropriately called "class" in my first entry) for a certain city. Some values are just 0, meaning that there are no land uses of that type in that city.
Starting from this table, which displays all the land use types for t2 and their corresponding values ("VALUE_t2"), I have to reconstruct the previous amount of land use ("VALUE_t1") for each type.
To do so, I have to add and subtract the value for each land use (if not 0) by using the "change land use table" from t2 to t1, which is the following:
[B] #second, shorter dataset
CODE_t2 CODE_t1 VALUE_CHANGE1
121 112 2
121 133 12
121 323 0
121 511 3
121 523 2
123 523 4
133 123 3
133 523 4
141 231 12
141 511 37
So, in order to get VALUE_t1 from VALUE_t2, I have, for instance, to subtract 2 + 12 + 0 + 3 + 2 hectares (the first 5 values of the second, shorter table) from the value of land use type/code 121 in the first, longer table (1271 ha), and add 2 hectares to land type 112, 12 hectares to land type 133, 3 hectares to land type 511 and 2 hectares to land type 523. I have to do that for all the land use types other than 0, and later also from t1 to t0.
What I need is a sort of loop that would both add and subtract, for each land use type/code, the values to go from VALUE_t2 to VALUE_t1, and from VALUE_t1 to VALUE_t0.
Once I estimated VALUE_t1 and VALUE_t0, I will put the values in a simple table showing the relative variation (here the values are not real):
CODE VALUE_t0 VALUE_t2 % VAR t2-t0
code1 50 100 ((100-50)/50)*100
code2 70 80 ((80-70)/70)*100
code3 45 34 ((34-45)/45)*100
What I could do so far is:
land_code <- names(A)[-1]
land_code
A$VALUE_t1 <- for (code in land_code) {
  cbind(A[1], A[land_code] - B[match(A$CODE_t2, B$CODE_t2), land_code])
}
If I use the loop I get an error, while if I take it away:
A$VALUE_t1 <- cbind(A[1], A[land_code] - B[match(A$CODE_t2, B$CODE_t2), land_code])
it works, but I don't really get what I want... So far I have been working on how to get a new column that would contain the new "add & subtract" values, but I haven't succeeded yet. So I worked on how to get a new column that would at least match the land use types first, so that I can then include the "add and subtract" formula.
Another problem is that, by using "match", I get a shorter A$VALUE_t1 table (13 rows instead of 44), while I would like to keep all the land use types in dataset A, because I will then have to match it with the table including VALUES_t0 (which I haven't shown here).
Sorry that I cannot do better than this at the moment... I hope I have explained better what I have to do. I am extremely grateful for any help you can provide.
Thanks a lot.
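Since no answer is included for this question, here is a minimal sketch of the add-and-subtract step described above, assuming the two data frames are named A and B exactly as posted (my reading of the logic, not a posted solution):

# Total area leaving each t2 code (to subtract) and total area gained by each
# t1 code (to add); combining them keeps all 44 rows of A.
sub_t2 <- aggregate(VALUE_CHANGE1 ~ CODE_t2, data = B, FUN = sum)
add_t1 <- aggregate(VALUE_CHANGE1 ~ CODE_t1, data = B, FUN = sum)

minus <- sub_t2$VALUE_CHANGE1[match(A$CODE_t2, sub_t2$CODE_t2)]
plus  <- add_t1$VALUE_CHANGE1[match(A$CODE_t2, add_t1$CODE_t1)]
minus[is.na(minus)] <- 0   # codes that never appear in B lose nothing...
plus[is.na(plus)]   <- 0   # ...and gain nothing

A$VALUE_t1 <- A$VALUE_t2 - minus + plus

The same step can be repeated with the t1-to-t0 change table to get VALUE_t0, and the relative variation is then ((VALUE_t2 - VALUE_t0) / VALUE_t0) * 100 for the codes where VALUE_t0 is not zero.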

Creating data continuously using rnorm until an outlier occurs in R

Sorry for the confusing title, but I wasn't sure how to describe what I am trying to do. My objective is to create a dataset of 1000 observations, each of which would be the length of a run. I have created a phase1 dataset, from which a set of control limits is produced. What I am trying to do now is create a phase2 dataset, most likely using rnorm. I want a repeat loop that continuously adds values to the phase2 dataset until one of those values falls outside the control limits produced from the phase1 dataset. For example, if the control limits were 3.0 and -3.0, the phase2 dataset would keep generating observations until, say, observation 398 happens to be 3.45, which stops the creation of data; my objective is then to record the number 398. Furthermore, I then want to loop back to the phase1/control-limits portion, create a new set of control limits, and run another phase2, until I have 1000 run lengths recorded. The code I have for the phase1/control limits works fine and looks like this:
nphase1=50
nphase2=1000
varcount=1
meanshift= 0
sigmashift= 1
##### phase1 dataset/ control limits #####
phase1 <- matrix(rnorm(nphase1*varcount, 0, 1), nrow = nphase1, ncol=varcount)
mean_var <- apply(phase1, 2, mean)
std_var <- apply(phase1, 2, sd)
df_var <- data.frame(mean_var, std_var)
Upper_SPC_Limit_Method1 <- with(df_var, mean_var + 3 * std_var)
Lower_SPC_Limit_Method1 <- with(df_var, mean_var - 3 * std_var)
df_control_limits<- data.frame(Upper_SPC_Limit_Method1, Lower_SPC_Limit_Method1)
I have previously created this in SAS, and the code looks like this; it might be a better reference for what I am trying to achieve than my explanation:
%macro phase2_dataset (n=,varcount=, meanshift=, sigmashift=, nphase1=,simID=,);
%do z=1 %to &n;
%phase1_dataset (n=&nphase1, varcount=&varcount);
data phase2; set control_limits n=lastobs;
call streaminit(0);
do until (phase2_var1<Lower_SPC_limit_method1_var1 or
phase2_var1>Upper_SPC_limit_method1_var1);
phase2_var1 = rand("normal", &meanshift, &sigmashift);
output;
end;
run;
ods exclude all;
proc means data=phase2;
var phase2_var1;
ods output summary=x;
run;
ods select all;
data run_length; set x;
keep Phase2_var1_n;
run;
proc append base= QA.Phase2_dataset&simID data=Run_length force; run;
%end;
%mend;
I have also been researching using a while loop in place of the repeat loop.
I'm new to R, so any ideas you can throw my way are greatly appreciated. Thanks!
Using a while loop indeed seems to be the way to go. Here's what I think you're looking for:
set.seed(10) #Making results reproducible
replicate(100, { # 100 is easier to display here
  phase1 <- matrix(rnorm(nphase1 * varcount, 0, 1), nrow = nphase1, ncol = varcount)
  mean_var <- colMeans(phase1)  # slightly better than apply
  std_var <- apply(phase1, 2, sd)
  df_var <- data.frame(mean_var, std_var)
  Upper_SPC_Limit_Method1 <- with(df_var, mean_var + 3 * std_var)
  Lower_SPC_Limit_Method1 <- with(df_var, mean_var - 3 * std_var)
  df_control_limits <- data.frame(Upper_SPC_Limit_Method1, Lower_SPC_Limit_Method1)
  # Phase 2
  x <- 0
  count <- 0
  while (x > Lower_SPC_Limit_Method1 && x < Upper_SPC_Limit_Method1) {
    x <- rnorm(1)
    count <- count + 1
  }
  count
})
The result is:
[1] 225 91 97 118 304 275 550 58 115 6 218 63 176 100 308 844 90 2758
[19] 161 311 1462 717 2446 74 175 91 331 210 118 1517 420 32 39 201 350 89
[37] 64 385 212 4 72 730 151 7 1159 65 36 333 97 306 531 1502 26 18
[55] 67 329 75 532 64 427 39 352 283 483 19 9 2 1018 137 160 223 98
[73] 15 182 98 41 25 1136 405 474 1025 1331 159 70 84 129 233 2 41 66
[91] 1 23 8 325 10 455 363 351 108 3
If performance becomes a problem, perhaps it would be interesting to explore some improvements, like creating more numbers with rnorm() at a time and then counting how many are necessary to exceed the limits and repeat if necessary.
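To illustrate that last idea (a sketch under my own assumptions, not part of the answer above): draw rnorm() values in blocks, look for the first one outside the limits, and only draw another block if none was found.

# Hypothetical helper: returns the run length for one pair of control limits.
run_length <- function(lower, upper, block = 1000) {
  count <- 0
  repeat {
    x   <- rnorm(block)
    hit <- which(x < lower | x > upper)   # positions outside the control limits
    if (length(hit) > 0) return(count + hit[1])
    count <- count + block                # no exceedance yet, draw another block
  }
}

Inside the replicate() call above, the while loop could then be replaced by run_length(Lower_SPC_Limit_Method1, Upper_SPC_Limit_Method1).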

Custom sorting of a dataframe in R

I have a binomial dataset that looks like this:
df <- data.frame(replicate(4,sample(1:200,1000,rep=TRUE)))
addme <- data.frame(replicate(1,sample(0:1,1000,rep=TRUE)))
df <- cbind(df,addme)
df <-df[order(df$replicate.1..sample.0.1..1000..rep...TRUE..),]
The data is currently sorted so that the rows belonging to the 0 group appear first, followed by the rows belonging to the 1 group. Is there a way I can sort the data in a 0-1-0-1-0... fashion? I mean show a row that belongs to the 0 group, then a row from the 1 group, then the 0 group, and so on...
All I can think of are complex functions. I hope there's a simple way around it.
Thank you,
Here's an attempt, which will add any extra 1's at the end:
First make some example data:
set.seed(2)
df <- data.frame(replicate(4,sample(1:200,10,rep=TRUE)),
addme=sample(0:1,10,rep=TRUE))
Then order:
with(df, df[unique(as.vector(rbind(which(addme==0),which(addme==1)))),])
# X1 X2 X3 X4 addme
#2 141 48 78 33 0
#1 37 111 133 3 1
#3 115 153 168 163 0
#5 189 82 70 103 1
#4 34 37 31 174 0
#6 189 171 98 126 1
#8 167 46 72 57 0
#7 26 196 30 169 1
#9 94 89 193 134 1
#10 110 15 27 31 1
#Warning message:
#In rbind(which(addme == 0), which(addme == 1)) :
# number of columns of result is not a multiple of vector length (arg 1)
Here's another way using dplyr, which would make it suitable for within-group ordering. It's also probably pretty quick. If there are unbalanced numbers of 0s and 1s, it will leave the extras at the end.
library(dplyr)
df %>%
  arrange(addme) %>%
  mutate(n0 = sum(addme == 0),
         orderme = seq_along(addme) - (n0 * addme) + (0.5 * addme)) %>%
  arrange(orderme) %>%
  select(-n0, -orderme)
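For comparison, a base R version of the same interleaving idea (my own sketch, not from the answers): rank each row within its addme group, then sort by that rank, breaking ties so the 0 row comes first; any surplus rows from the larger group end up at the bottom.

# Within-group position of each row (1st zero, 1st one, 2nd zero, 2nd one, ...)
rank_in_group <- ave(seq_along(df$addme), df$addme, FUN = seq_along)
df[order(rank_in_group, df$addme), ]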
