I would like a formula for OpenOffice Calc that counts how many rows contain a given set of two or more values. I have no idea how to do it. I can use COUNTIF for a single value, but it does not seem to work with multiple values. I would like the data to remain in its own column.
eg
34, 64 = 2
77, 35 = 0
77, 34 = 1
.
a b c d
1 77 34 64
2 75 34 64
Move the original data to start in row 2 for convenience. Then in E1 and F1, enter what we want to find, which is 34 and 64.
Now enter the following formula in E2, which determines whether the values occur in the second row.
=IFNA(IF(MATCH(E$1;$A2:$C2;0)+1=MATCH(F$1;$A2:$C2;0);1;0);0)
Drag this formula down to E3 to handle the next row, and keep dragging if there are more rows of data.
Finally in E4, add the results from each row to get the total number of occurrences: =SUM(E2:E3).
Next, enter 77 and 35 in H1 and I1, and then copy and paste the formulas from column E into column H. Do the same for the third pair as well.
Documentation: MATCH function
My question is related to R.
I have a quiz question with five answer choices, each of which is a code snippet. When I run them, every choice except one gives an error, and the output of the one that runs does not match what the question asks for.
The question is as follows.
A B C D E
1 7 4 23 68 15
2 12 53 14 10 20
3 39 88 98 50 84
4 18 38 33 47 72
5 31 6 51 38 27
6 20 15 68 99 50
This dataframe is given. To create this data frame I write the following code block.
A = c(7,12,39,18,31,20)
B = c(4,53,88,38,6,15)
C = c(23,14,98,33,51,68)
D = c(68,10,50,47,38,99)
E = c(15,20,84,72,27,50)
df_x = data.frame(A,B,C,D,E)
Question: Which of the following R code snippets will subset data frame df_x, returning the final three rows?
The answer choices are
df_x[nrow(df_x)-2:nrow(df_x)]
df_x[(nrow(df_x)-2):nrow(df_x)]
df_x[nrow(df-x)-2:,]
df_x[-3:]
df_x[(nrow(df_x)-2):nrow(df_x)
Among them, only the 1st choice, df_x[nrow(df_x)-2:nrow(df_x)], produces any output.
Output:
D C B A
1 68 23 4 7
2 10 14 53 12
3 50 98 88 39
4 47 33 38 18
5 38 51 6 31
6 99 68 15 20
I think this is not the correct one. All the other choices give an error. Can anyone tell me which one is the correct choice, or what code actually answers the question? I am new to R, so it is hard for me to work out the correct one.
df_x[(nrow(df_x)-2):nrow(df_x),]
Keep in mind that the convention is df[rows, columns]. To index by row you need the comma even when the column argument is left empty, which is why I put a comma after the row argument in the solution.
Cheers,
Joe
The answers in those choices will produce errors because they are not creating the indexes properly.
In R, when you are subsetting a data frame, you need to give the row numbers and the column numbers.
For example, df[row,col] will give you the data in the given row and the given column. df[row,] will select all columns for the given row number.
If you don't put a comma (,) in the index, you are only selecting columns. For example, df[1:2] is going to select the first and second columns.
If you want to select multiple rows or multiple columns, you can put ranges in as well, e.g. df[1:3,3:9]
When you use -, R removes the given row or column. So for example, df[-1,] removes the first row. df[,-3] removes the third column. df[-1:-5,] removes the first five rows.
Those answers all have errors in them because they don't have commas in the right places. If you want to select up to the last row or column in R, you need to give the last row or column number. You get this number by using nrow(df) or ncol(df). Leaving the end of the range open after the : (as in -3:) is the Python way of doing things and is a syntax error in R.
The closest answer here is: df_x[(nrow(df_x)-2):nrow(df_x)] but you need to add a comma: df_x[(nrow(df_x)-2):nrow(df_x),]
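As a rough illustration of the indexing rules above, here is a small sketch. The data frame df and its columns x, y and z are made up for the example and are not part of the question.

```r
# Hypothetical five-row, three-column data frame used only for illustration.
df <- data.frame(x = 1:5,
                 y = c(10, 20, 30, 40, 50),
                 z = c("a", "b", "c", "d", "e"),
                 stringsAsFactors = FALSE)

cols_only <- df[1:2]        # no comma: columns x and y, all five rows
rows_only <- df[1:2, ]      # comma: rows 1 and 2, all three columns
block     <- df[1:3, 2:3]   # rows 1 to 3 of columns y and z
no_first  <- df[-1, ]       # every row except the first
no_third  <- df[, -3]       # every column except the third

print(dim(cols_only))       # 5 rows, 2 columns
print(dim(rows_only))       # 2 rows, 3 columns
```

The only difference between the first two lines is the comma, which is what decides whether 1:2 is read as columns or as rows.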
The problem you are expected to recognize (but have not) is operator precedence. The colon operator (for sequencing) has higher precedence than the binary minus operator, so the expression nrow(df_x)-2:nrow(df_x) is evaluated as nrow(df_x) - (2:nrow(df_x)): the single value nrow(df_x) is recycled against the vector 2:nrow(df_x), which gives 4 3 2 1 0 here. Option 2, which isolates nrow(df_x)-2 from the colon operator with parentheses, gives you the correct index. Adding parentheses to make terms obvious is good programming practice. See:
?Syntax
The other problem is that there is a missing comma after those expressions ... I think your course text should have given option 2 as
df_x[(nrow(df_x)-2):nrow(df_x),]
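To see the precedence issue concretely, here is a short sketch using the data frame from the question. The variable names n, without_parens, with_parens, picked_cols and last_three are only there for the illustration.

```r
df_x <- data.frame(A = c(7, 12, 39, 18, 31, 20),
                   B = c(4, 53, 88, 38, 6, 15),
                   C = c(23, 14, 98, 33, 51, 68),
                   D = c(68, 10, 50, 47, 38, 99),
                   E = c(15, 20, 84, 72, 27, 50))
n <- nrow(df_x)                    # 6

without_parens <- n - 2:n          # parsed as n - (2:n), i.e. 4 3 2 1 0
with_parens    <- (n - 2):n        # 4 5 6

# No comma: the vector is used as a column index and the 0 is dropped,
# which is why choice 1 printed columns D, C, B, A.
picked_cols <- names(df_x[without_parens])

# Parentheses plus the comma: the last three rows.
last_three <- df_x[with_parens, ]
print(last_three)
```

If the goal is just the last three rows, tail(df_x, 3) should give the same rows without any index arithmetic.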
I am trying to merge a data.frame and a column from another data.frame, but have so far been unsuccessful.
My first data.frame [Frequencies] consists of 2 columns, containing 47 upper/ lower case alpha characters and their frequency in a bigger data set. For example purposes:
Character<-c("A","a","B","b")
Frequency<-c(100,230,500,420)
The second data.frame [Sequences] is 93,000 rows in length and contains 2 columns, with the 47 same upper/ lower case alpha characters and a corresponding qualitative description. For example:
Character<-c("a","a","b","A")
Descriptor<-c("Fast","Fast","Slow","Stop")
I wish to add the descriptor column to the [Frequencies] data.frame, but not the 93,000 rows! Rather, what each "Character" represents. For example:
Character<-c("a")
Frequency<-c("230")
Descriptor<-c("Fast")
The following can also be done:
> merge(adf, bdf[!duplicated(bdf$Character),])
Character Frequency Descriptor
1 a 230 Fast
2 A 100 Fast
3 b 420 Stop
4 B 500 Slow
Why not:
df1$Descriptor <- df2$Descriptor[ match(df1$Character, df2$Character) ]
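Here is a minimal sketch of that match() approach run on the example vectors from the question. The names df1, df2 and idx are assumptions; the real [Frequencies] and [Sequences] data frames would take the place of df1 and df2.

```r
# Stand-ins for the two data frames described in the question.
df1 <- data.frame(Character = c("A", "a", "B", "b"),
                  Frequency = c(100, 230, 500, 420),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Character  = c("a", "a", "b", "A"),
                  Descriptor = c("Fast", "Fast", "Slow", "Stop"),
                  stringsAsFactors = FALSE)

# For each df1$Character, match() gives the position of its FIRST
# occurrence in df2$Character, or NA if it never occurs. Duplicates in
# df2 are therefore ignored and df1 keeps its four rows.
idx <- match(df1$Character, df2$Character)
df1$Descriptor <- df2$Descriptor[idx]
print(df1)
```

Note that the comparison is case sensitive, and that "B" has no entry in this small df2, so its Descriptor comes out as NA rather than the row being dropped.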
I have a column with 1000 rows. It contains the names of 10 countries. How can I count how many times each country is repeated?
The most specific solution is to use table.
table(my.column)
summary does different things depending on the data type, but table always shows the number of occurrences of every unique value. If you coded the countries with ID numbers instead of character strings, for instance, summary would show quartiles, which is not what you want.
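A small sketch of the table() approach, assuming a made-up column my.column with three country names instead of ten. The names counts, counts_df and id_counts are only for illustration.

```r
# Hypothetical column of 1000 country names.
my.column <- rep(c("France", "Japan", "Peru"), times = c(500, 300, 200))

counts <- table(my.column)               # one count per unique value
print(counts)

# The same counts as a data frame, in case they are needed as columns.
counts_df <- as.data.frame(counts, stringsAsFactors = FALSE)
names(counts_df) <- c("country", "n")

# table() still counts occurrences when the values are numeric IDs,
# where summary() would report quartiles instead.
ids <- c(1, 1, 2)
id_counts <- table(ids)
```

sort(counts, decreasing = TRUE) can be used to list the most frequent countries first.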
You can use summary(name_of_data_frame) function. Example:
fff<-c("d1","d1","d2")
f1<-data.frame(fff)
summary(f1)
The result:
fff
d1:2
d2:1
If your country names are stored as a factor, you can use summary(my_data) directly; otherwise use summary(as.factor(my_data)). For instance:
my_data <- sample(LETTERS[1:10], 1000, replace=TRUE)
summary(as.factor(my_data))
A B C D E F G H I J
99 111 106 89 90 90 109 105 96 105
I have a matrix with 2 columns as described below:
TIME PRICE
10 45
11 89
13 89
15 12
16 09
17 34
19 89
20 90
23 21
26 09
In the above matrix, I need to iterate through the TIME column in steps of 5 seconds and access the PRICE in the matching row.
For example, I start with 10 and need to access 15 (10+5). I would have been able to get to 15 easily if the values in the column were continuous, but they are not. So at a time of 15 seconds I need to get hold of the corresponding price, and this goes on until the end of the data set: the next element that needs to be accessed is 20 and its corresponding price, then I add 5 seconds again, and so on. In case the element is not present, the one immediately greater than it must be accessed to obtain the corresponding price.
If the rows you want to extract are m[1,1]+5, m[1,1]+10, m[1,1]+15 etc then:
m <- cbind(TIME=c(10,11,13,15,16,17,19,20,23,26),
PRICE=c(45,89,89,12,9,34,89,90,21,9))
r <- range(m[,1]) # 10,26
r <- seq(r[1]+5, r[2], 5) # 15,20,25
r <- findInterval(r-1, m[,1])+1 # 4,8,10 (values 15,20,26)
m[r,2] # 12,90,9
findInterval finds the index of the last value that is less than or equal to the given value, so I give it a value that is smaller by 1 and then add 1 to the index. Note that subtracting 1 relies on the times being whole numbers.
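If the times are not guaranteed to be whole numbers, one possible alternative is to look up, for each target time, the first row whose TIME is at least that large. This is only a sketch under the same reading of the question as above (targets at start+5, start+10, ...); the names time, targets and idx are assumptions.

```r
m <- cbind(TIME  = c(10, 11, 13, 15, 16, 17, 19, 20, 23, 26),
           PRICE = c(45, 89, 89, 12, 9, 34, 89, 90, 21, 9))

time <- m[, "TIME"]                          # assumed sorted in increasing order
targets <- seq(time[1] + 5, max(time), by = 5)   # 15, 20, 25

# For each target, the index of the first row with TIME >= target.
# which() returns an empty vector when no such row exists, so [1] gives NA.
idx <- vapply(targets,
              function(target) which(time >= target)[1],
              integer(1))

print(m[idx, ])                              # rows for times 15, 20 and 26
```

This scans the TIME column once per target, so for very long series the findInterval version is likely to be faster.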
Breaking the question apart into sub-pieces...
Getting the row with value 15:
Call your Matrix, say, DATA, and
[1] extract the row of interest:
DATA[DATA[,1] == 15, ]
Then snag the second column.
[2] Adding 5 to the first column ( I'm pretty sure you can just do this ):
DATA[,1] = DATA[,1] + 5
This should get you started. The rest seems to just be some funky iteration, incrementing by 5, using [1] to get the price you want each time, swapping 15 for some variable.
I leave the rest of the solution as an exercise to the reader. For tips on looping in R, and more, see the below tutorial ( I don't expect it to be taken down any time soon, but may want to keep a local copy. Good luck :) )
http://www.stat.berkeley.edu/users/vigre/undergrad/reports/VIGRERintro.pdf
As @Tommy commented above, it is not clear exactly which TIME values you want to get. To me, it seems like you want to get the PRICE for the sequence 10, 15, 20, 25, ... If so, you could do that easily using the modulo operator (%%):
TIME <- c(10,11,13,15,16,17,19,20,23,26) # Your times
PRICE <- c(45,89,89,12,9,34,89,90,21,9) # your prices
PRICE[TIME %% 5 == 0] # Get prices from times in sequence 10, 15, 20, ...
I printed out the summary of a column variables as such:
summary(document$subject)
A,B,C,D,E,F,.. are the subjects belonging to a column of a data.frame where A,B,C,...appear many times in the column, and the summary above shows the number of times (frequency) these subjects have appeared in the file. Also, the term "OTHER" refers to those subjects which have appeared only once in the file, I also need to assign "1" to these subjects.
There are so many different subjects that it's difficult to list out all of them if we use command "c".
I want to build up a new column (or data.frame) and then assign these corresponding numbers (scores) to the subjects. Ideally, it will become this in the file:
A 198
B 113
C 96
D 69
A 198
E 65
F 62
A 198
C 113
BZ 21
BC 1
CJ 1
...
I wonder what command I should use to take the scores/values from the summary table and then build a new column to assign these values to the corresponding subjects in the file.
Plus, since it's a summary table printed by R, I don't know how to build it into a table in a file, or take out the values and subject names from the table. I also wonder how I could find out the subject names which appeared only once in the file, so that the summary table added them up into "OTHER".
Your question is hard to interpret without a reproducible example. Please take a look at this thread for tips on how to do that:
How to make a great R reproducible example?
Having said that, here is how I interpret your question. You have two data frames, one with a score per subject and another with the subjects multiple times in a column:
Sum <- data.frame(subject=c("A","B"),score=c(1,2))
foo <- data.frame(subject=c("A","B","A"))
> Sum
subject score
1 A 1
2 B 2
> foo
subject
1 A
2 B
3 A
You can then use match() to match the subjects in one data frame to the other and create the new variable in the second data frame:
foo$score <- Sum$score[match(foo$subject, Sum$subject)]
> foo
subject score
1 A 1
2 B 2
3 A 1
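If the scores are simply the frequencies of each subject, the lookup table can probably be built straight from the column with table() instead of copying numbers from the printed summary. The sketch below assumes a made-up subject column; as far as I know, summary() on a factor lumps the less frequent levels into "(Other)" once there are more levels than its maxsum argument, whereas table() keeps every level.

```r
# Hypothetical stand-in for document$subject.
document <- data.frame(subject = c("A", "B", "A", "C", "A", "B", "BC"),
                       stringsAsFactors = FALSE)

freq <- table(document$subject)      # count for every subject, none lumped

# Index the table by subject name so each row gets its subject's count.
document$score <- as.vector(freq[document$subject])

# Subjects that appear exactly once in the column.
once <- names(freq)[freq == 1]

print(document)
```

Here as.data.frame(freq) would give the subject names and counts as an ordinary two-column data frame, which could then be written to a file with write.csv().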