Need the Excel formula for the following function

I need help implementing the following function in an MS Excel sheet. An example sheet is as follows:
     A          B     C       D              E
1    TimeStamp  Name  Amount  UsedBy         Description
-----------------------------------------------------------
2    Date1      Me1   200     He1,She1       desc1
3    Date2      Me1   100     Me1,He1        desc2
4    Date3      She1  50      He1,She1,Me1   desc3
5    Date4      He1   70      She1,He1       desc4
6    Date5      She1  200     She1,He1,Me1   desc5
7    Date6      Me1   22      He1            desc6
I want a function that does the following sequence of steps in a single custom MS Excel formula:
1. Sum the cells of column "Amount" where the "UsedBy" cell contains "He1" as its only entity. Let's say the result is X.
2. Sum the cells of column "Amount" where the "UsedBy" cell contains two entities, one of which is "He1", then divide the sum by 2. Let's say the result is Y.
3. Sum the cells of column "Amount" where the "UsedBy" cell contains three entities, one of which is "He1", then divide the sum by 3. Let's say the result is Z.
4. Total the results of steps 1, 2 and 3, that is, X+Y+Z.
Please let me know if my question is not clear.

Try the SUMIF function.

Build some intermediate results like the number of values in UsedBy, or whether UsedBy contains He1 in separate columns, then use SUMIF().

You can't do this in a single formula unless you write it yourself in VBA. Since you haven't tagged the question as VBA I'll assume you'd rather use helper columns.
You'll need 3 helper columns, 1 for each of your criteria.
For your first, let's say you put it in column F:
=if(and(isnumber(search(",He1,",","&D2&",")),len(D2)=len(substitute(D2,",",""))),1,0)
This ensures that D2 contains 'He1' as a whole entry and that there are no commas. Wrapping both the search text and D2 in commas stops 'He1' from also matching inside 'She1'.
For your second, put it in column G:
=if(and(isnumber(search(",He1,",","&D2&",")),len(D2)-1=len(substitute(D2,",",""))),1,0)
This ensures that D2 contains 'He1' as a whole entry and that there is 1 comma.
For your third, put it in column H:
=if(and(isnumber(search(",He1,",","&D2&",")),len(D2)-2=len(substitute(D2,",",""))),1,0)
This ensures that D2 contains 'He1' as a whole entry and that there are 2 commas.
Once you have your helper criteria columns, you can do a sumif for each criterion.
For X you'll do =sumif(f2:f7,1,c2:c7)
For Y you'll do =sumif(g2:g7,1,c2:c7)/2
For Z you'll do =sumif(h2:h7,1,c2:c7)/3
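As a cross-check on the helper-column approach, here is a minimal R sketch (R rather than Excel, used only to verify the arithmetic) that recomputes X, Y and Z from the sample sheet. The vector names amount and used_by are assumptions made for this illustration, and entries are compared as whole names, so "She1" does not count as "He1".

```r
# Sample data copied from the question's sheet (columns C and D).
amount  <- c(200, 100, 50, 70, 200, 22)
used_by <- c("He1,She1", "Me1,He1", "He1,She1,Me1",
             "She1,He1", "She1,He1,Me1", "He1")

# Split each UsedBy cell into its comma-separated entries.
parts <- strsplit(used_by, ",", fixed = TRUE)

# TRUE where "He1" appears as a whole entry (not as part of "She1").
has_he1 <- vapply(parts, function(p) "He1" %in% p, logical(1))

# Number of entries per cell (number of commas + 1).
n_users <- lengths(parts)

x <- sum(amount[has_he1 & n_users == 1])      # He1 alone
y <- sum(amount[has_he1 & n_users == 2]) / 2  # He1 plus one other
z <- sum(amount[has_he1 & n_users == 3]) / 3  # He1 plus two others
total <- x + y + z
```

For the sample data this gives X = 22, Y = 185 and Z = 250/3 (about 83.33), so the total is about 290.33, which is what the three sumif formulas should also return.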

Related

Count multiple Data in a string cell

I would like to use table() in R to count how many times a value occurs in a column. However, some cells contain more than one value, separated by colons. An example:
example <- data.frame(c("A","B","A:::B"))
table(example)
the result is:
A A:::B B
1 1 1
but I want something like this:
A B
2 2
I tried duplicating the rows that have this characteristic, but the dataset is already very large and duplicating rows makes it impossible to use. How can I do this?
Thanks.
We can split the column values by ::: and get the table
table(unlist(strsplit(as.character(example[[1]]), "\\:+")))
# A B
# 2 2
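The same idea can be written out step by step. This is a minimal sketch that assumes the separator is always exactly ":::" and that the column is named value (a name chosen here for illustration, not taken from the question).

```r
# Example data; stringsAsFactors = FALSE keeps the column as character
# on R versions older than 4.0 as well.
example <- data.frame(value = c("A", "B", "A:::B"),
                      stringsAsFactors = FALSE)

# Split every cell on the literal separator and flatten into one vector.
tokens <- unlist(strsplit(example$value, ":::", fixed = TRUE))

# Count how often each individual value occurs.
counts <- table(tokens)
```

Because the split produces a temporary vector of tokens rather than extra rows, the original data frame is left at its original size.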

Using list of row numbers as criteria to populate field

I have a list of row numbers that represent rows containing outliers in a data set. I would like to add an "outlier" column to the original data set that flags the rows containing outliers, but I can't figure out how to use row numbers as criteria in R.
Example:
I have a dataframe like this:
id <-c("a","b","c","d")
values <-c(10,11,22,33)
df<-data.frame(id,values)
id values
1 a 10
2 b 11
3 c 22
4 d 33
And a list like this containing row number (more correctly "row names"):
outliers <-c(2,4)
I'd like to find a way to use the list of row numbers as criteria in something like:
df$outlier_test<-ifelse( if row number is on my list, "outlier","")
to produce something like this:
id values outlier_test
1 a 10
2 b 11 outlier
3 c 22
4 d 33 outlier
Spent quite a while trying to puzzle this out and had inspiration as soon as I posted the question. For anyone else who comes here with this question:
First:
df$rownumber<- row.names(df)
then:
df$outlier_test<- ifelse(df$rownumber %in% outliers,"outlier","")
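A variant that skips the extra rownumber column is sketched below. It assumes the values in outliers are row positions (1, 2, 3, ...), which matches the default row names of the example data frame but would not hold if the rows had been renamed or subset.

```r
# Example data from the question.
df <- data.frame(id = c("a", "b", "c", "d"),
                 values = c(10, 11, 22, 33))
outliers <- c(2, 4)

# seq_len(nrow(df)) yields the row positions 1..n; flag those in the list.
df$outlier_test <- ifelse(seq_len(nrow(df)) %in% outliers, "outlier", "")
```

If the list really holds row names rather than positions, comparing against row.names(df) as in the answer above is the safer choice.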

Match each row in a table to a row in another table based on the difference between row timestamps

I have two unevenly-spaced time series that each measure separate attributes of the same system. The two series's data points are not sampled at the same times, and the series are not the same length. I would like to match each row from series A to the row of B that is closest to it in time. What I have in mind is to add a column to A that contains indexes to the closest row in B. Both series have a time column measured in Unix time (eg. 1459719755).
for example, given two datasets
a time
2 1459719755
4 1459719772
3 1459719773
b time
45 1459719756
2 1459719763
13 1459719766
22 1459719774
The first dataset should be updated to
a time index
2 1459719755 1
4 1459719772 4
3 1459719773 4
since B[1,]$time has the closest value to A[1,]$time, B[4,]$time has the closest value to A[2,]$time and A[3,]$time.
Is there any convenient way to do this?
Try something like this:
(1+ecdf(bdat$time)(adat$time)*nrow(bdat))
[1] 1 4 4
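Why should this work? The ecdf function returns another function that has a value from 0 to 1. It returns the "position" in the "probability range" [0,1] of a new value in a distribution of values defined by the first argument to ecdf. The expression is really just rescaling that function's result to the range [1, nrow(bdat)+1]. (I think it's flipping elegant.) Note that this counts the bdat times at or before each adat time and adds 1, so it points at the next later row of the sorted bdat times rather than strictly the nearest one. It agrees with the nearest row in this example, but it can differ in general and returns nrow(bdat)+1 for times at or beyond the last bdat time.
Another approach would be to use approxfun on the sorted values of bdat$time, which would then get you interpolated values. These might need to be rounded; using them directly as indices would instead truncate to integer.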
apf <- approxfun( x=sort(bdat$time), y=seq(length( bdat$time)) ,rule=2)
apf( adat$time)
#[1] 1.000 3.750 3.875
round( apf( adat$time))
#[1] 1 4 4
In both cases you are predicting a sorted value from its "order statistic". In the second case you should check that ties are handled in the manner you desire.
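For comparison, a direct (brute-force) way to get the nearest row is sketched below. It reuses the names adat and bdat from the snippets above, with the data typed in from the question; it does not require bdat to be sorted, but it compares every time in adat with every time in bdat, so it may be slow for very long series.

```r
# Example data from the question.
adat <- data.frame(a = c(2, 4, 3),
                   time = c(1459719755, 1459719772, 1459719773))
bdat <- data.frame(b = c(45, 2, 13, 22),
                   time = c(1459719756, 1459719763, 1459719766, 1459719774))

# For each time in adat, take the row of bdat with the smallest absolute
# time difference. which.min returns the first minimum, so ties go to the
# earlier row of bdat.
adat$index <- vapply(adat$time,
                     function(t) which.min(abs(bdat$time - t)),
                     integer(1))
```

On the example this reproduces the index column 1, 4, 4 requested in the question.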

List all possible occurrences within a column?

I am trying to merge a data.frame and a column from another data.frame, but have so far been unsuccessful.
My first data.frame [Frequencies] consists of 2 columns, containing 47 upper/ lower case alpha characters and their frequency in a bigger data set. For example purposes:
Character<-c("A","a","B","b")
Frequency<-c(100,230,500,420)
The second data.frame [Sequences] is 93,000 rows in length and contains 2 columns, with the 47 same upper/ lower case alpha characters and a corresponding qualitative description. For example:
Character<-c("a","a","b","A")
Descriptor<-c("Fast","Fast","Slow","Stop")
I wish to add the descriptor column to the [Frequencies] data.frame, but not the 93,000 rows! Rather, what each "Character" represents. For example:
Character<-c("a")
Frequency<-c("230")
Descriptor<-c("Fast")
Following can also be done:
> merge(adf, bdf[!duplicated(bdf$Character),])
Character Frequency Descriptor
1 a 230 Fast
2 A 100 Fast
3 b 420 Stop
4 B 500 Slow
Why not:
df1$Descriptor <- df2$Descriptor[ match(df1$Character, df2$Character) ]
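Applied to the example vectors from the question, the match() lookup can be sketched as follows. The data frame names frequencies and sequences are assumptions standing in for the [Frequencies] and [Sequences] tables.

```r
# Small stand-ins for the two tables described in the question.
frequencies <- data.frame(Character = c("A", "a", "B", "b"),
                          Frequency = c(100, 230, 500, 420),
                          stringsAsFactors = FALSE)
sequences <- data.frame(Character = c("a", "a", "b", "A"),
                        Descriptor = c("Fast", "Fast", "Slow", "Stop"),
                        stringsAsFactors = FALSE)

# match() returns, for each character in frequencies, the position of its
# FIRST occurrence in sequences (NA if absent), so repeated rows in the
# long table do not multiply the result.
idx <- match(frequencies$Character, sequences$Character)
frequencies$Descriptor <- sequences$Descriptor[idx]
```

match() is case-sensitive, so "A" and "a" are looked up separately, and a character with no entry in the second table ("B" in this small example) gets NA instead of being dropped.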

How to build a new column (/data.frame) from a table, and assign corresponding values to the rows

I printed out the summary of a column variable like this:
summary(document$subject)
A,B,C,D,E,F,.. are the subjects belonging to a column of a data.frame where A,B,C,...appear many times in the column, and the summary above shows the number of times (frequency) these subjects have appeared in the file. Also, the term "OTHER" refers to those subjects which have appeared only once in the file, I also need to assign "1" to these subjects.
There are so many different subjects that it's difficult to list out all of them if we use command "c".
I want to build up a new column (or data.frame) and then assign these corresponding numbers (scores) to the subjects. Ideally, it will become this in the file:
A 198
B 113
C 96
D 69
A 198
E 65
F 62
A 198
C 96
BZ 21
BC 1
CJ 1
...
I wonder what command I should use to take the scores/values from the summary table and then build a new column to assign these values to the corresponding subjects in the file.
Plus, since it's a summary table printed by R, I don't know how to build it into a table in a file, or take out the values and subject names from the table. I also wonder how I could find out the subject names which appeared only once in the file, so that the summary table added them up into "OTHER".
Your question is hard to interpret without a reproducible example. Please take a look at this thread for tips on how to do that:
How to make a great R reproducible example?
Having said that, here is how I interpret your question. You have two data frames, one with a score per subject and another with the subjects multiple times in a column:
Sum <- data.frame(subject=c("A","B"),score=c(1,2))
foo <- data.frame(subject=c("A","B","A"))
> Sum
subject score
1 A 1
2 B 2
> foo
subject
1 A
2 B
3 A
You can then use match() to match the subjects in one data frame to the other and create the new variable in the second data frame:
foo$score <- Sum$score[match(foo$subject, Sum$subject)]
> foo
subject score
1 A 1
2 B 2
3 A 1
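If the scores are simply the frequencies of each subject, they can also be computed from the column itself instead of being copied from the printed summary. The sketch below assumes that interpretation; the data frame doc and its contents are made up for illustration.

```r
# Hypothetical data: a column in which subjects repeat.
doc <- data.frame(subject = c("A", "B", "A", "C", "A", "B"),
                  stringsAsFactors = FALSE)

# table() counts every subject individually, with no "OTHER" grouping.
counts <- table(doc$subject)

# Look up each row's subject by name in the table of counts.
doc$score <- as.integer(counts[doc$subject])

# Subjects that occur only once (the ones a summary lumps together).
singletons <- names(counts)[counts == 1]
```

Indexing the table by name works in the same way as the match() lookup above, and names(counts) gives the subject labels without typing them out with c().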

Resources