Fail to identify correct alternative from objective question - r

My question is related to R.
I have a code snippet for each of 5 answer choices. When I run them, every choice except one gives an error, and the one that runs does not return what the question asks for.
My question is
   A  B  C  D  E
1  7  4 23 68 15
2 12 53 14 10 20
3 39 88 98 50 84
4 18 38 33 47 72
5 31  6 51 38 27
6 20 15 68 99 50
This data frame is given. To create it, I wrote the following code block.
A = c(7,12,39,18,31,20)
B = c(4,53,88,38,6,15)
C = c(23,14,98,33,51,68)
D = c(68,10,50,47,38,99)
E = c(15,20,84,72,27,50)
df_x = data.frame(A,B,C,D,E)
Question: Which of the following R code will subset data frame df_x, returning the final three rows?
My answer choices are
df_x[nrow(df_x)-2:nrow(df_x)]
df_x[(nrow(df_x)-2):nrow(df_x)]
df_x[nrow(df-x)-2:,]
df_x[-3:]
df_x[(nrow(df_x)-2):nrow(df_x)
Among them, only the 1st choice, df_x[nrow(df_x)-2:nrow(df_x)], produces some output.
Output:
   D  C  B  A
1 68 23  4  7
2 10 14 53 12
3 50 98 88 39
4 47 33 38 18
5 38 51  6 31
6 99 68 15 20
I think this is not the correct one. All the other choices give an error. Can anyone tell me which one is the correct choice, or what the actual code is to answer this question? I am new to R, so it is hard for me to find the correct one.

df_x[(nrow(df_x)-2):nrow(df_x),]
Keep in mind, the convention is df[rows, columns]. To select rows you need the comma after the row argument, which is why I put one there in the solution. Without the comma, the index is treated as a column selection.
Cheers,
Joe

The answers in those choices will produce errors because they are not creating the indexes properly.
In R, when you are subsetting a data frame, you give the row numbers and the column numbers.
For example, df[row,col] will give you the data in the given row and the given column. df[row,] will select all columns for the given row number.
If you don't put a comma (,) in the index, you are only selecting columns. For example, df[1:2] is going to select the first and second columns.
If you want to select multiple rows or multiple columns, you can put the numbers in as well, e.g. df[1:3,3:9]
When you use -, R removes the given row or column. So for example, df[-1,] removes the first row. df[,-3] removes the third column. df[-1:-5,] removes the first five rows.
Those answers have errors in them because they don't have commas in the right places or because they leave a range open. If you want to select up to the last row or column in R, you need to give the last row or column number. You get this number by using nrow(df) or ncol(df). Leaving the end of the range open after the : (as in -3:) is the Python way of doing things and is a syntax error in R.
The closest answer here is: df_x[(nrow(df_x)-2):nrow(df_x)] but you need to add a comma: df_x[(nrow(df_x)-2):nrow(df_x),]
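The indexing rules described in this answer can be checked with a short sketch. It rebuilds df_x from the question; the variable names on the left-hand side are only illustrative.

```r
# Rebuild the data frame from the question.
df_x <- data.frame(
  A = c(7, 12, 39, 18, 31, 20),
  B = c(4, 53, 88, 38, 6, 15),
  C = c(23, 14, 98, 33, 51, 68),
  D = c(68, 10, 50, 47, 38, 99),
  E = c(15, 20, 84, 72, 27, 50)
)

cols_1_2  <- df_x[1:2]      # no comma: columns A and B, all rows
rows_1_2  <- df_x[1:2, ]    # comma after the index: rows 1 and 2, all columns
no_first  <- df_x[-1, ]     # negative row index: drops row 1
no_third  <- df_x[, -3]     # negative column index: drops column C
only_last <- df_x[-1:-5, ]  # drops rows 1 to 5, leaving row 6

print(names(cols_1_2))
print(dim(rows_1_2))
print(only_last)
```

The only difference between the first two lines is the comma, and it switches the meaning from "columns" to "rows".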

The problem you are being expected to recognize (but have not) is operator precedence. The colon operator (for sequencing) has a higher precedence than the binary minus operator, so the expression nrow(df_x)-2:nrow(df_x) is evaluated as nrow(df_x)-(2:nrow(df_x)): the sequence 2:nrow(df_x) is built first and then subtracted from nrow(df_x), with the single value nrow(df_x) recycled across the whole vector. Option number 2, which isolates nrow(df_x)-2 from the colon operator with parentheses, will give you the correct index. Adding parentheses to make terms obvious is good programming practice. See:
?Syntax
The other problem is that there is a missing comma after those expressions ... I think your course text should have given option 2 as
df_x[(nrow(df_x)-2):nrow(df_x),]
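A minimal sketch of the precedence issue, using the data frame from the question. The variable names are illustrative, and tail() is shown only as a base R alternative; it is not one of the quiz choices.

```r
# Same data frame as in the question.
df_x <- data.frame(
  A = c(7, 12, 39, 18, 31, 20),
  B = c(4, 53, 88, 38, 6, 15),
  C = c(23, 14, 98, 33, 51, 68),
  D = c(68, 10, 50, 47, 38, 99),
  E = c(15, 20, 84, 72, 27, 50)
)
n <- nrow(df_x)

wrong <- n - 2:n    # parsed as n - (2:n)
right <- (n - 2):n  # parentheses force the subtraction first

print(wrong)  # 4 3 2 1 0
print(right)  # 4 5 6

# Without a comma the index selects columns and the 0 is dropped,
# so this returns columns D, C, B, A: the output shown in the question.
cols_dcba <- df_x[wrong]

# With the comma, the parenthesised index selects rows 4 to 6.
last_three <- df_x[right, ]

# Without the comma it asks for columns 4 to 6 of a 5-column data frame,
# which fails; try() captures the error instead of stopping the script.
no_comma <- try(df_x[right], silent = TRUE)

# Base R alternative for the last three rows.
same_rows <- tail(df_x, 3)

print(last_three)
```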

Related

openoffice calc occurrences of combination

I would like a formula for OpenOffice Calc to get the number of occurrences of a combination of values across two or more columns in a single row, but I have no idea how to do it. I can use COUNTIF for a single value, but it does not seem to work with multiple values. I would like the data to remain in its own column.
e.g.
34, 64 = 2
77, 35 = 0
77, 34 = 1
.
a b c d
1 77 34 64
2 75 34 64
Move the original data to start in row 2 for convenience. Then in E1 and F1, enter what we want to find, which is 34 and 64.
Now enter the following formula in E2, which determines whether the values occur in the second row.
=IFNA(IF(MATCH(E$1;$A2:$C2;0)+1=MATCH(F$1;$A2:$C2;0);1;0);0)
Drag this formula down to E3 to handle the next row, and keep dragging if there are more rows of data.
Finally in E4, add the results from each row to get the total number of occurrences: =SUM(E2:E3).
Next, enter 77 and 35 in column H and I and then copy and paste the formulas. Do the same for the third pair as well.
Documentation: MATCH function

R - Update value of a column based on condition

I need to update all the values of a column, using as reference another df.
The two dataframes have equal structures:
cod name dom_by
1 A 3
2 B 4
3 C 1
4 D 2
I tried to use the following line, but apparently it did not work:
df2$name[df2$dom_by==df1$cod] <- df1$name[df2$dom_by==df1$cod]
It keeps saying that replacement has 92 rows, data has 2.
(df1 has 92 rows and df2 has 2).
Although it seems like a simple problem, I still can not solve it, even after some searches.
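One possible approach, sketched under the assumption that the goal is to look up each df2$dom_by value in df1$cod and copy the matching df1$name. The == comparison in the attempted line compares the two vectors element by element (with recycling), which is why the lengths clash; match() returns lookup positions instead. The sample values in df2 are made up for illustration.

```r
# df1 follows the structure shown in the question.
df1 <- data.frame(
  cod    = 1:4,
  name   = c("A", "B", "C", "D"),
  dom_by = c(3, 4, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical two-row df2 with the same columns.
df2 <- data.frame(
  cod    = c(10, 11),
  name   = c(NA_character_, NA_character_),
  dom_by = c(3, 1),
  stringsAsFactors = FALSE
)

# Position in df1$cod of each df2$dom_by value (NA when there is no match).
idx   <- match(df2$dom_by, df1$cod)
found <- !is.na(idx)

# Update only the rows of df2 that have a match in df1.
df2$name[found] <- df1$name[idx[found]]

print(df2)
```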

How to turn comma separated rows of a table into multiple columns

I have a table similar to the following, where there is only one column and the cells contain data that are separated with a comma.
1 height,weight
2 180,85
3 165,62
4 170,73
I want to split them into multiple columns by comma in order to get the following result
  height weight
1    180     85
2    165     62
3    170     73
However, str_split_fixed(x$type, ",", 2) command that has been proposed on a similar topic doesn't seem to work for my case.
Thank you so much in advance for your answers.
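A possible base R sketch, assuming the single column is called type (as in the x$type call above) and that its first cell holds the header. It pastes the cells back into CSV text and lets read.csv() parse it. The data frame x here is reconstructed from the example, not taken from the asker's real data.

```r
# Reconstruct the one-column table from the example.
x <- data.frame(
  type = c("height,weight", "180,85", "165,62", "170,73"),
  stringsAsFactors = FALSE
)

# Join the cells with newlines and parse the result as CSV;
# the first cell becomes the header row.
result <- read.csv(text = paste(x$type, collapse = "\n"))

print(result)
```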

Extract 100 sections from a vector

I have a vector of length 1000. It contains (numeric) survey answers of 100 participants, thus 10 answers per participant. I would like to drop the first three values for every participant to create a new vector of length 700 (including only the answers to questions 4-10).
I only know how to extract every n-th value of the vector, but cannot figure how to solve the above problem.
vector <- seq(1,1000,1)
Expected output:
4 5 6 7 8 9 10 14 15 16 17 18 19 20 24 ...
Using a matrix to first structure and then flatten is one method. Another somewhat similar method is to use what I am calling a "logical pattern index":
head( # just showing the first couple of "segments"
vector[ c( rep(FALSE, 3), rep(TRUE, 10-3) ) ],
15)
[1] 4 5 6 7 8 9 10 14 15 16 17 18 19 20 24
This method can also be used inside the two-argument version of [ to select rows or columns using a logical pattern index. This works because of R's recycling of logical indices.
Thanks for providing example data, based on which this thread is reproducible. Here is one solution
c(matrix(vector, 10)[4:10, ])
We first convert the vector to a matrix with 10 rows, so that each column corresponds to one participant. Then we use row subsetting to remove the first three rows. Finally the matrix is flattened to a vector again.
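As a quick check, the two approaches can be run side by side on the example vector. The names keep, by_pattern and by_matrix are illustrative.

```r
vector <- seq(1, 1000, 1)

# Logical pattern of length 10, recycled 100 times across the vector.
keep <- c(rep(FALSE, 3), rep(TRUE, 10 - 3))
by_pattern <- vector[keep]

# Matrix route: one column per participant, drop rows 1 to 3, flatten.
by_matrix <- c(matrix(vector, nrow = 10)[4:10, ])

print(length(by_pattern))
print(head(by_pattern, 8))
print(identical(by_pattern, by_matrix))
```

The recycling is exact here because the vector length (1000) is a multiple of the pattern length (10).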

Understanding the syntax for Column vs Row indexing in R

I'm a bit confused on the filtering scheme on an R data frame.
For example, let's say we have the following data frame titled dframe:
> str(dframe)
'data.frame': 143 obs. of 3 variables:
$ Year : int 1999 2005 2007 2008 2009 2010 2005 2006 2007 2008 ...
$ Name : Factor w/ 18 levels "AADAM","AADEN",..: 1 1 2 2 2 2 3 3 3 3 ...
$ Frequency: int 5 6 10 34 38 12 10 6 10 5 ...
Now if I want to filter dframe to the rows where Name is "AADAM", the proper filter is:
dframe[dframe$Name=="AADAM",]
The part where I'm confused is why the comma doesn't come first. Why isn't it this: dframe[,dframe$Name=="AADAM"]
UPDATE: You clarified your question is really "Please give examples of what sort of logical expressions are valid for filtering columns?"
I agree with you the syntax appears weird initially, but it has the following logic.
The bottom line is that column-filter expressions are typically less rich and expressive than row-filtering expressions, and in particular you can't chain logical indexing the way you do with rows.
Best way is to think of indexing expressions as the general form:
dframe[<row-index-expression>,<col-index-expression>]
where either index-expression is optional, so you can just do one and we (crucially!) need the comma to disambiguate whether it's row- or column-indexing:
dframe[<row-index-expression>,] # such as dframe[dframe$Name=="ADAM",]
dframe[,<col-index-expression>]
Before we look at examples of col-index-expression and what's valid (and invalid) to include in one, let's review and discuss how R does indexing - I had the same confusion when I started with it.
In this example, you have three columns. You can refer to them by their string names 'Year','Name','Frequency'. You can also refer to them by column indices 1,2,3 where the numbers 1,2,3 correspond to the entries of colnames(dframe). R does indexing using the '[' operator, and also the '[[' operator. Here are some valid examples of column-indexing:
dframe[,2] # column 2 / Name
dframe[,'Name'] # column 2 / Name
dframe[,c('Name','Frequency')] # string vector - very common
dframe[,c(2,3)] # integer vector - also very common
dframe[,c(F,T,T)] # logical vector - very rarely seen, and a pain in the butt to compute
Now, if you choose to use a logical expression for the column-index, it must produce one value per column, and it cannot test the values stored inside a column: the column-index only chooses which columns to keep.
Suppose you wanted to dynamically filter "give me only the factor columns from dframe".
Something like:
unname(sapply(dframe, is.factor)) # logical vector, TRUE for each factor column; unname() drops the column names
For more help and examples on indexing look at the '[' operator help-page:
Type ?'['
dframe[,dframe$Name=="ADAM"] is an invalid attempt at column-indexing because the columns know nothing about Name=="ADAM"
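A minimal sketch of a computed logical column index, assuming a small stand-in for dframe with the same three columns. sapply() applies the test to each column and returns one TRUE/FALSE per column, which is the shape a column index needs.

```r
# Small stand-in with the same column layout as dframe.
dframe <- data.frame(
  Year      = c(1999L, 2005L, 2007L),
  Name      = factor(c("AADAM", "AADAM", "AADEN")),
  Frequency = c(5L, 6L, 10L)
)

# One logical value per column.
is_fac <- sapply(dframe, is.factor)
print(is_fac)

# drop = FALSE keeps the result a data frame even with a single column.
factor_cols  <- dframe[, is_fac, drop = FALSE]
numeric_cols <- dframe[, sapply(dframe, is.numeric), drop = FALSE]

print(names(factor_cols))
print(names(numeric_cols))
```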
Addendum: code to generate example dataframe (because you didn't dump us a dput output)
set.seed(123)
N = 10
randomName <- function() { paste(sample(letters, size=floor(runif(1)*6+2), replace=TRUE), collapse='') } # paste() returns the string; cat() would only print it and return NULL
dframe = data.frame(Year=round(runif(N,1980,2014)),
Name = as.factor(replicate(N, randomName())),
Frequency=round(runif(N, 2,40)))
You have to remember that when you're subsetting, the part before the comma specifies which rows you want, and the part after the comma specifies which columns you want, i.e.:
dframe[rowsyouwant, columnsyouwant]
You're filtering based on columns, but you want all of the columns in your result, so the space after the comma is blank. You want some sub-set of rows, so your filtering specification goes before the comma, where the rows you want are specified.
As others have indicated, requesting a certain subset of a data frame requires the syntax [rows, columns]. Since dframe[has 143 rows, has 3 columns], any request for some part of dframe should be of the form
dframe[which of the 143 rows do I want?, which of the 3 columns do I want?].
Because dframe$Name is a vector of length 143, the comparison dframe$Name=='AADAM' is a vector of T/F values that also has length 143. So,
dframe[dframe$Name=='AADAM',]
is like saying
dframe[of the 143 rows I want these ones, I want all columns]
whereas
dframe[,dframe$Name=='AADAM']
generates an error because it's like saying
dframe[I want all rows, of the 143 columns I want these ones]
when dframe has only 3 columns, so a logical vector of length 143 cannot pick columns from it.
On a side note, you may want to look into the subset() function if you're not already familiar with it. You could get the same result by writing subset(dframe, Name=='AADAM')
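The length argument above can be checked on a small stand-in data frame (6 rows instead of 143; the values are made up for illustration). The sketch also compares the bracket form with subset().

```r
# Small stand-in: 6 rows, 3 columns.
dframe <- data.frame(
  Year      = c(1999, 2005, 2007, 2008, 2009, 2010),
  Name      = c("AADAM", "AADEN", "AADEN", "AADAM", "AADEN", "AADAM"),
  Frequency = c(5, 6, 10, 34, 38, 12),
  stringsAsFactors = FALSE
)

keep <- dframe$Name == "AADAM"

# The logical vector has one entry per row, not one per column.
print(length(keep) == nrow(dframe))
print(length(keep) == ncol(dframe))

by_bracket <- dframe[keep, ]                   # rows where Name is AADAM
by_subset  <- subset(dframe, Name == "AADAM")  # same rows via subset()

# Using the row-length vector as a column index fails;
# try() captures the error instead of stopping the script.
col_attempt <- try(dframe[, keep], silent = TRUE)

print(by_bracket)
```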
As others have said, the structure within brackets is row, then column.
One way I think of the syntax of selecting data from a data.frame using:
dframe[dframe$Name=="AADAM",]
is to think of a noun, then a verb where:
dframe[] is the noun. It is the object on which you want to perform an action
and
[dframe$Name=="AADAM",] is the verb. It is the action you want to perform.
I have a silly way of expressing this to myself, but it keeps things straight in my mind:
Hey, you! dframe! I am going to... ...in this case, select all of your rows in which Name is equal to AADAM!
By keeping the column portion of [dframe$Name=="AADAM",] blank you are saying you want to keep all columns.
Sometimes it can be a little difficult to remember that you have to write dframe both inside and outside the brackets.
As for exactly why row comes first and column comes second, I do not know, but row had to be either first or second.
dframe <- read.table(text = '
Year Name Frequency
1 ADAM 4
3 BOB 10
7 SALLY 5
2 ADAM 12
4 JIM 3
12 ADAM 7
', header = TRUE, stringsAsFactors = TRUE) # stringsAsFactors = TRUE reproduces the factor output below on R >= 4.0
dframe[,dframe$Name=="ADAM"]
# Error in `[.data.frame`(dframe, , dframe$Name == "ADAM") :
# undefined columns selected
dframe[dframe$Name=="ADAM",]
#   Year Name Frequency
# 1    1 ADAM         4
# 4    2 ADAM        12
# 6   12 ADAM         7
dframe[,'Name']
# [1] ADAM BOB SALLY ADAM JIM ADAM
# Levels: ADAM BOB JIM SALLY
dframe[dframe$Name=="ADAM",'Name']
# [1] ADAM ADAM ADAM
# Levels: ADAM BOB JIM SALLY

Resources