How do I use loops for automating work in R?

I have a file with data on the delivery of products to the store. I need to calculate the total number of products in the store. I want to use my knowledge of loops to compute the total quantity of product in the store, but my loop only returns the total quantity of the last product. Why?
Here is the delivery data:
"Day" "Cott.cheese, pcs." "Kefir, pcs." "Sour cream, pcs."
1 104 117 119
2 94 114 114
3 105 107 117
4 99 112 120
5 86 104 111
6 88 110 126
7 95 106 129
I put this table in the in1 variable.
Here is the code:
s <- 0
for (p in 2:ncol(in1)) {
  s <- sum(in1[, p])
}
s

I'm not sure I understand your question correctly, but if you only want to add up all the values of your data.frame except for the first column (Day), you just need to do this:
sum(in1[,-1])

You are overwriting the s variable on each iteration; that's why it only shows the result for the last column. Try
s <- c()
for (p in 2:ncol(in1)) {
  s <- c(s, sum(in1[, p]))
}
or, alternatively,
colSums(in1[, -1])
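And if the goal really is one grand total rather than per-column sums, the loop only needs to accumulate instead of overwrite. A minimal sketch, rebuilding the delivery table from the question:

```r
# rebuild the delivery table from the question as a data frame
in1 <- data.frame(
  Day         = 1:7,
  Cott.cheese = c(104, 94, 105, 99, 86, 88, 95),
  Kefir       = c(117, 114, 107, 112, 104, 110, 106),
  Sour.cream  = c(119, 114, 117, 120, 111, 126, 129)
)
s <- 0
for (p in 2:ncol(in1)) {
  s <- s + sum(in1[, p])  # accumulate instead of overwrite
}
s  # same result as sum(in1[, -1])
```

The only change from the question's loop is `s + sum(...)` in place of `sum(...)`, so each column's total is added to the running sum instead of replacing it.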

Related

Indexing through 'names' in a list and performing actions on contained values in R

I have a data set of counts from standard solutions passed through an instrument that analyses chemical concentrations (an ICPMS for those familiar). The data is over a range of different standards and for each standard I have four repeat measurements that I want to calculate the mean and variance of.
I'm importing the data from an Excel spreadsheet and then, following some housekeeping such as getting dates and times in the right format, I split the dataset up into a list identified by the name of the standard solution using Count11.sp <- split(Count11.raw, Count11.raw$Type). Count11.raw$Type then becomes the list element name, and I have the four count results for each chemical element in that list element.
So far so good.
I find I can yield an average (mean, median, etc.) easily enough by identifying the list element specifically, i.e. mean(Count11.sp$`Ca40`), or sapply(Count11.sp$`Ca40`, median), but what I'm not able to do is automate that in a loop so that I can calculate the means for each standard and drop them into a numerical matrix for further manipulation. I can extract the list element names with names(), and I can even use a loop to make a vector of all the names and reference the specific list element using these in a for loop.
For instance, Count11.sp[names(Count11.sp[i])] will extract the full list element no problem:
$`Post Ca45t`
Type Run Date 7Li 9Be 24Mg 43Ca 52Cr 55Mn 59Co 60Ni
77 Post Ca45t 1 2011-02-08 00:13:08 114 26101 4191 453525 2632 520 714 2270
78 Post Ca45t 2 2011-02-08 00:13:24 114 26045 4179 454299 2822 524 704 2444
79 Post Ca45t 3 2011-02-08 00:13:41 96 26372 3961 456293 2898 520 762 2244
80 Post Ca45t 4 2011-02-08 00:13:58 112 26244 3799 454702 2630 510 792 2356
65Cu 66Zn 85Rb 86Sr 111Cd 115In 118Sn 137Ba 140Ce 141Pr 157Gd 185Re 208Pb
77 244 1036 56 3081 44 520625 78 166 724 10 0 388998 613
78 250 982 70 3103 46 526154 76 174 744 16 4 396496 644
79 246 1014 36 3183 56 524195 60 198 744 2 0 396024 612
80 270 932 60 3137 44 523366 70 180 824 2 4 390436 632
238U
77 24
78 20
79 14
80 6
but sapply(Count11.sp[names(Count11.sp[i])], median) produces an error message: Error in median.default(X[[i]], ...) : need numeric data
while sapply(Input$`Post Ca45t`, median) (with 'Post Ca45t' being the name Count11.sp[i] for i = 4) does exactly what I want and produces the median values (I can clean that vector up later for medians that don't make sense), e.g.
Type Run Date 7Li 9Be 24Mg
NA 2.5 1297109612.5 113.0 26172.5 4070.0
43Ca 52Cr 55Mn 59Co 60Ni 65Cu
454500.5 2727.0 520.0 738.0 2313.0 248.0
66Zn 85Rb 86Sr 111Cd 115In 118Sn
998.0 58.0 3120.0 45.0 523780.5 73.0
137Ba 140Ce 141Pr 157Gd 185Re 208Pb
177.0 744.0 6.0 2.0 393230.0 622.5
238U
17.0
Can anyone give me any insight into how I can automate (i.e. loop through) these names to produce one median vector per list element? I'm sure there's just some simple disconnect in my logic here that may be easily solved.
Update: I've solved the problem. The way to do so is to use tapply on the original dataset without the need to split it. tapply allows a function to be applied to data based on a user-defined grouping criterion; its signature is tapply(X, INDEX, FUN), with the data first and the grouping factor second. In my case I can group each data column by Count11.raw$Type and take the mean of the subset, e.g. sapply(Count11.raw[, 3:ncol(Count11.raw)], function(col) tapply(col, Count11.raw$Type, mean)). Job done.
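For completeness, the loop over list-element names the question originally asked for can also be collapsed into a single sapply over the split list. A hedged sketch on a toy stand-in (the real Count11.sp isn't reproducible here; the column layout is assumed from the printed output above):

```r
# toy stand-in for Count11.sp: a named list of data frames,
# each with leading non-numeric columns followed by count columns
Count11.sp <- list(
  "Post Ca45t" = data.frame(Type = "Post Ca45t", Run = 1:2,
                            Li7 = c(114, 114), Be9 = c(26101, 26045)),
  "Ca40"       = data.frame(Type = "Ca40", Run = 1:2,
                            Li7 = c(100, 102), Be9 = c(25000, 25100))
)
# one median vector per list element; [, -(1:2)] drops the non-numeric columns
medians <- t(sapply(Count11.sp, function(d) sapply(d[, -(1:2)], median)))
```

The inner sapply computes a median per numeric column, the outer one iterates over the list elements, and t() orients the result as one row per standard, which is the numerical matrix the question wanted.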

How to sum column based on value in another column in two dataframes?

I am trying to create a limit order book and in one of the functions I want to return a list that sums the column 'size' for the ask dataframe and the bid dataframe in the limit order book.
The output should be...
$ask
oid price size
8 a 105 100
7 o 104 292
6 r 102 194
5 k 99 71
4 q 98 166
3 m 98 88
2 j 97 132
1 n 96 375
$bid
oid price size
1 b 95 100
2 l 95 29
3 p 94 87
4 s 91 102
Total volume: 318 1418
Where the input is...
oid,side,price,size
a,S,105,100
b,B,95,100
I have a function book.total_volumes <- function(book, path) { ... } that should return total volumes.
I tried to use aggregate but struggled with the fact that it is both ask and bid in the limit order book.
I appreciate any help; I am clearly a complete beginner, only here to learn :)
If there is anything more I can add to make this question clearer, feel free to leave a comment!
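Since no implementation was posted, here is a hedged sketch of one shape book.total_volumes could take, assuming book is a list with $ask and $bid data frames that each carry a size column (as in the printed output above; the path argument from the question's signature is omitted because its role isn't shown):

```r
# minimal stand-in for the book structure shown in the question
book <- list(
  ask = data.frame(oid = c("a", "o"), price = c(105, 104), size = c(100, 292)),
  bid = data.frame(oid = c("b", "l"), price = c(95, 95),  size = c(100, 29))
)
# sum the size column of each side of the book
book.total_volumes <- function(book) {
  sapply(book, function(side) sum(side$size))
}
book.total_volumes(book)  # named vector with ask and bid totals
```

No aggregate is needed here: because ask and bid live in separate data frames, summing each side's size column independently already gives the two totals.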

merge two data frames in R based on email address

I have a data file that has my subject responses listed by their emails and I have another file with each subject email next to his/her subject ID. How do replace all the emails in the main data file with their subject IDs?
One of the things that's great about R is the ease with which one can create a minimal, complete and verifiable example. For this question, it's a simple matter of generating some example data, reading it into R and working out a potential solution. We'll create a list of student email addresses, IDs, and a separate data set containing exam scores.
nameData <- "email ID
Alicia#gmail.com 1
Jane#aol.com 2
Thomas#msn.com 3
Henry#yale.edu 4
LaShawn#uga.edu 5
"
examData <- "email exam1 exam2 exam3
Alicia#gmail.com 98 77 87
Jane#aol.com 99 88 93
Thomas#msn.com 73 62 73
Henry#yale.edu 100 98 99
LaShawn#uga.edu 84 98 92"
names <- read.table(text=nameData,header=TRUE,stringsAsFactors=FALSE)
exams <- read.table(text=examData,header=TRUE,stringsAsFactors=FALSE)
# merge data and drop email, which is first column
mergedData <- merge(names,exams)[,-1]
mergedData[order(mergedData$ID),]
...and the output, sorted by ID:
> mergedData[order(mergedData$ID),]
ID exam1 exam2 exam3
1 1 98 77 87
3 2 99 88 93
5 3 73 62 73
2 4 100 98 99
4 5 84 98 92
>
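One design note: merge() matched the two data frames on their shared column name email automatically. Making the join key explicit is a one-argument change, which can be safer when the frames share more than one column name; a small sketch on the same kind of data:

```r
# same style of data as above, joined on an explicit key
nameData <- "email ID
Alicia#gmail.com 1
Jane#aol.com 2"
examData <- "email exam1
Alicia#gmail.com 98
Jane#aol.com 99"
names <- read.table(text = nameData, header = TRUE, stringsAsFactors = FALSE)
exams <- read.table(text = examData, header = TRUE, stringsAsFactors = FALSE)
merged <- merge(names, exams, by = "email")  # explicit join column
```

Dropping the email column afterwards with [,-1] then works exactly as in the answer above.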

Given a list of rectangle coordinates, how can we find all possible combinations of polygons from these coordinates?

Given the following list of region coordinates:
x y Width Height
1 1 65 62
1 59 66 87
1 139 78 114
1 218 100 122
1 311 126 84
1 366 99 67
1 402 102 99
7 110 145 99
I wish to identify all possible rectangle combinations that can be formed by combining any two or more of the above rectangles. For instance, one rectangle could be
1 1 66 146 by combining 1 1 65 62 and 1 59 66 87
What would be the most efficient way to find all possible combinations of rectangles using this list?
Okay. I'm sorry for not being specific about the problem.
I have been working on an algorithm for object detection that identifies different windows across the image that might have the object. But sometimes, the object gets divided into several windows. So, when looking for objects, I want to use all the windows that I identified as well as try their different combinations for object detection.
So far I have tried using two nested loops and going across all the windows one by one: if the x coordinate of the inner loop's window lies within the outer loop's window, I merge the two by taking the left-most, top-most, right-most and bottom-most coordinates.
However, this has been taking a lot of time, and there are a lot of duplicates present in the final output. I would like to see if there is a more efficient and easier way to do this.
I hope this information helps.
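The pairwise-merge idea described above can be sketched as follows. This is a hedged illustration, not a full solution: it assumes (x, y, width, height) columns with half-open extents, tests overlap on both axes rather than x alone, and deduplicates the resulting bounding boxes at the end:

```r
# first three regions from the question's list
rects <- data.frame(
  x = c(1, 1, 1),
  y = c(1, 59, 139),
  w = c(65, 66, 78),
  h = c(62, 87, 114)
)
# two rectangles overlap iff they intersect on both the x and y axes
overlaps <- function(a, b) {
  a$x < b$x + b$w && b$x < a$x + a$w &&
    a$y < b$y + b$h && b$y < a$y + a$h
}
merged <- list()
for (i in 1:(nrow(rects) - 1)) {
  for (j in (i + 1):nrow(rects)) {   # j > i avoids re-visiting pairs
    a <- rects[i, ]
    b <- rects[j, ]
    if (overlaps(a, b)) {
      x1 <- min(a$x, b$x)
      y1 <- min(a$y, b$y)
      x2 <- max(a$x + a$w, b$x + b$w)
      y2 <- max(a$y + a$h, b$y + b$h)
      merged[[length(merged) + 1]] <- c(x1, y1, x2 - x1, y2 - y1)
    }
  }
}
merged <- unique(do.call(rbind, merged))  # drop duplicate bounding boxes
```

Starting the inner loop at i + 1 removes the duplicate pairs the two-loop version produced; combinations of three or more windows would need the merge repeated until a fixed point, which is where the cost grows.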

data.matrix() when character involved

In order to calculate the highest contribution of a row per ID I have a beautiful script which works when the IDs are numeric. Today, however, I found out that IDs can also contain characters (for instance ABC10101). For the function to work, the dataset is converted to a matrix; however, data.matrix(df) does not support characters. Can the code be altered so that the function works with all kinds of IDs (character, numeric, etc.)? Currently I have a quick workaround that converts IDs to numeric when the ID is a character, but that will slow the process down for large datasets.
Example with code (function: extract the first entry with the highest contribution, so if 2 entries have the same contribution it selects the first):
Note: in this example ID is interpreted as a factor and data.matrix() converts it to a numeric value. In the code below the type of the ID column should be character and the output should be as shown at the bottom. Order IDs must remain the same.
tc <- textConnection('
ID contribution uniqID
ABCUD022221 40 101
ABCUD022221 40 102
ABCUD022222 20 103
ABCUD022222 10 104
ABCUD022222 90 105
ABCUD022223 75 106
ABCUD022223 15 107
ABCUD022223 10 108 ')
df <- read.table(tc,header=TRUE)
#Function that needs to be altered
uniqueMaxContr <- function(m, ID = 1, contribution = 2) {
  t(vapply(
    split(1:nrow(m), m[, ID]),
    function(i, x, contribution)
      x[i, , drop = FALSE][which.max(x[i, contribution]), ],
    m[1, ],
    x = m, contribution = contribution
  ))
}
df<-data.matrix(df) #only works when ID is numeric
highestdf<-uniqueMaxContr(df)
highestdf<-as.data.frame(highestdf)
In this case the outcome should be:
ID contribution uniqID
ABCUD022221 40 101
ABCUD022222 90 105
ABCUD022223 75 106
Others might be able to make it more concise, but this is my attempt at a data.table solution:
tc <- textConnection('
ID contribution uniqID
ABCUD022221 40 101
ABCUD022221 40 102
ABCUD022222 20 103
ABCUD022222 10 104
ABCUD022222 90 105
ABCUD022223 75 106
ABCUD022223 15 107
ABCUD022223 10 108 ')
df <- read.table(tc,header=TRUE)
library(data.table)
dt <- as.data.table(df)
setkey(dt,uniqID)
dt2 <- dt[,list(contribution=max(contribution)),by=ID]
setkeyv(dt2,c("ID","contribution"))
setkeyv(dt,c("ID","contribution"))
dt[dt2,mult="first"]
## ID contribution uniqID
## [1,] ABCUD022221 40 101
## [2,] ABCUD022222 90 105
## [3,] ABCUD022223 75 106
EDIT -- more concise solution
You can use .SD which is the subset of the data.table for the grouping, and then use which.max to extract a single row.
in one line
dt[,.SD[which.max(contribution)],by=ID]
## ID contribution uniqID
## [1,] ABCUD022221 40 101
## [2,] ABCUD022222 90 105
## [3,] ABCUD022223 75 106
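To answer the original question most directly, the matrix conversion can be avoided altogether: the same "first row with the maximum contribution per ID" rule works on the unconverted data frame, so character IDs survive untouched. A hedged base-R sketch on the question's data:

```r
df <- data.frame(
  ID           = c("ABCUD022221", "ABCUD022221", "ABCUD022222",
                   "ABCUD022222", "ABCUD022222"),
  contribution = c(40, 40, 20, 10, 90),
  uniqID       = c(101, 102, 103, 104, 105),
  stringsAsFactors = FALSE
)
# which.max returns the FIRST maximum, matching the tie-break rule in the question
highestdf <- do.call(rbind, lapply(split(df, df$ID),
  function(d) d[which.max(d$contribution), ]))
```

Because no data.matrix() call is involved, the ID column keeps its character type, and ties (ABCUD022221 here) resolve to the first entry just as the original function intended.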
