which() function in R

I have a data frame (A) of size 92047 x 2 and a list (B) of length 1829. I want to create a new data frame containing all rows of A whose first-column value is present in B.
How can I use which() for this? Or is there a better way to approach it?
All the values are character strings (e.g. "Vc2345").

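You can do it like this (the example uses integers, but the same code works for character values):
# example data with the same dimensions as in the question
dfA <- data.frame(C1 = sample(1:92047), C2 = sample(1:92047))
listB <- list(sample(1:1829))
# keep the rows of dfA whose C1 value occurs in listB
dfAinB <- dfA[which(dfA$C1 %in% unlist(listB)), ]
str(dfAinB)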
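Since the values in the question are character strings, here is a minimal sketch of the same idea on made-up character data. The sample IDs and the helper name keep are hypothetical. It also shows that which() is optional, because a logical vector can index rows directly.

```r
# Hypothetical toy data: character IDs in the first column, as in the question
dfA <- data.frame(C1 = c("Vc2345", "Vc0001", "Vc9999", "Vc2345"),
                  C2 = c(10, 20, 30, 40),
                  stringsAsFactors = FALSE)
listB <- list("Vc2345", "Vc7777")

# unlist() flattens the list to a character vector;
# %in% then returns one TRUE/FALSE per row of dfA
keep <- dfA$C1 %in% unlist(listB)

# A logical vector can index rows directly, so which() is not required
dfAinB <- dfA[keep, ]
```

One practical difference: which() drops NA values from the index, whereas %in% never returns NA in the first place, so both forms select the same rows here.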

Related

improving specific code efficiency - *base R* alternative to for() loop solution

Looking for a vectorized base R solution for my own edification. I'm assigning a value to a column in a data frame based on a value in another column in the data frame.
My solution creates a named vector of possible codes, looks up the code in the original column, subsets the named list by the value found, and assigns the resulting name to the new column. I'm sure there's a way to do exactly this using the named vector I created that doesn't need a for loop; is it some version of apply?
dplyr is great and useful and I'm not looking for a solution that uses it.
# reference vector for assigning more readable text to this table
tempAssessmentCodes <- setNames(c(600,301,302,601,303,304,602,305,306,603,307,308,604,309,310,605,311,312,606,699),
c("base","3m","6m","6m","9m","12m","12m","15m","18m","18m","21m","24m","24m","27m","30m","30m",
"33m","36m","36m","disch"))
for(i in 1:nrow(rawDisp)){
rawDisp$assessText[i] <- names(tempAssessmentCodes)[tempAssessmentCodes==rawDisp$assessment[i]]
}
The standard way is to use match():
rawDisp$assessText <- names(tempAssessmentCodes)[match(rawDisp$assessment, tempAssessmentCodes)]
For each element of x, match(x, y) returns the position of its first match in y (or NA if there is none). We then use those positions to pick the corresponding names of y.
Personally, I do it the opposite way: make a lookup vector whose names are the old codes and whose values are the new labels:
codes <- setNames(names(tempAssessmentCodes), tempAssessmentCodes)
Then simply select elements from the new codes using the names (old codes):
rawDisp$assessText <- codes[as.character(rawDisp$assessment)]
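To compare the two approaches side by side, here is a small sketch on made-up data. The shortened lookup vector, the sample rawDisp rows, and the names idx, viaMatch and viaNames are assumptions for illustration only.

```r
# Hypothetical shortened lookup: names are the labels, values are the codes
tempAssessmentCodes <- setNames(c(600, 301, 302, 699),
                                c("base", "3m", "6m", "disch"))
rawDisp <- data.frame(assessment = c(301, 699, 600, 301, 999))

# Approach 1: match() gives, for each assessment, its position in the code vector
idx <- match(rawDisp$assessment, tempAssessmentCodes)
viaMatch <- names(tempAssessmentCodes)[idx]

# Approach 2: invert the lookup (codes become names) and index it by name
codes <- setNames(names(tempAssessmentCodes), tempAssessmentCodes)
viaNames <- unname(codes[as.character(rawDisp$assessment)])

# A code missing from the lookup (999 here) comes back as NA in both approaches
rawDisp$assessText <- viaMatch
```

The unname() call is optional; without it the result carries the looked-up codes as element names, which is harmless when assigning to a data frame column.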

Appending a variable to existing data depending on rows

So I have two columns. I need to add a third column. However, this third column needs to contain A for a specified number of rows at the start, and B for a specified number of rows after that.
I tried data_exercise_3["newcolumn"] <- (1:6)
but it didn't work. Can someone tell me what I'm doing wrong please?
Looks like you're having a problem with subsetting a data frame correctly. I'd recommend reviewing this concept before you proceed much further, either via a Coursera course or on a website like this UCLA R learning module on subsetting data frames. Subsetting is a crucial component of data wrangling with R, and you'll go much faster with a solid foundation of the basics!
You can assign values to a subset of a data frame by using [row, column] notation. Since your data frame is called data_exercise_3 and the column you'd like to assign values to is called 'newcolumn', then assuming you want the first 6 rows as 'A' and the next 3 as 'B', you could write it like this:
data_exercise_3[1:6,'newcolumn'] <- 'A'
data_exercise_3[7:9,'newcolumn'] <- 'B'
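Alternatively, build the whole column in one step; the vector must have one element per row (12 rows in this example):
data_exercise_3$category <- c(rep("A",6),rep("B",6))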
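A minimal sketch on a made-up 9-row data frame (the columns x and y are hypothetical), building the whole column in one call. It uses rep() with a times vector so that the result has exactly one value per row:

```r
# Hypothetical 9-row data frame standing in for data_exercise_3
data_exercise_3 <- data.frame(x = 1:9, y = 9:1)

# times = c(6, 3) repeats "A" six times and then "B" three times
data_exercise_3$newcolumn <- rep(c("A", "B"), times = c(6, 3))
```

If the lengths do not add up to nrow(data_exercise_3), the assignment either recycles the values or fails with an error, so it is worth checking the counts first.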

How to access a column after subsetting data frame?

It has to be really simple but it looks like my mind is not working properly anymore.
So, what I would like to do is to store one of the columns from mtcars as a vector but after subsetting it. I need one line code for the subsetting and assigning a vector.
That's what I would like to achieve but with one line:
data <- mtcars[mtcars[,11]==4,]
vec <- data[,1]
Thx!
vec<-mtcars[mtcars[,11]==4,][,1]
Here mtcars[,11]==4 is the row index; selecting column 1 from the result gives the first column for only the rows that satisfy the condition. The two steps can also be combined into a single subscript:
mtcars[mtcars[,11]==4, 1]
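As a quick check that these forms agree, here is a short sketch using the built-in mtcars data, where column 11 is carb and column 1 is mpg. The variable names twoStep, oneStep, byName and asFrame are made up for the example.

```r
# Chained subsetting: filter rows first, then take column 1
twoStep <- mtcars[mtcars[, 11] == 4, ][, 1]

# Single subscript: row condition and column index together
oneStep <- mtcars[mtcars[, 11] == 4, 1]

# Same selection by name, which is easier to read than positions
byName <- mtcars[mtcars$carb == 4, "mpg"]

# drop = FALSE keeps a one-column data frame instead of a plain vector
asFrame <- mtcars[mtcars$carb == 4, "mpg", drop = FALSE]
```

Selecting a single column from a data frame returns a vector by default, which is exactly what the question asks for; drop = FALSE is only needed when a data frame is wanted.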

R - Subset based on column name

My data frame has over 120 columns (variables) and I would like to create subsets based on column names.
For example I would like to create a subset where the column name includes the string "mood". Is this possible?
I generally use
SubData <- myData[,grep("whatIWant", colnames(myData))]
I know very well that the "," is not necessary and that colnames could be replaced by names, but that would not work with matrices, and I hate to change the formalism when changing objects.
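Applied to the "mood" example from the question, the same idea might look like the sketch below. The data frame and its column names are invented for illustration. It uses grepl() with fixed = TRUE and drop = FALSE, which are a variation on the grep() call above rather than a requirement.

```r
# Hypothetical data frame; only the column names matter here
myData <- data.frame(id = 1:3,
                     mood_am = c(2, 4, 3),
                     mood_pm = c(5, 1, 2),
                     sleep = c(7, 6, 8))

# grepl() returns TRUE/FALSE per column name;
# fixed = TRUE treats "mood" as plain text instead of a regular expression
isMood <- grepl("mood", colnames(myData), fixed = TRUE)

# drop = FALSE keeps a data frame even if only one column matches
SubData <- myData[, isMood, drop = FALSE]
```

With a logical index, a pattern that matches nothing yields a data frame with zero columns and the original rows, which is usually easier to handle downstream.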

Retaining a value in an R dataset if it's present in another dataset

I am currently working on code that is applied to various datasets from an experiment. The experiment looks at a wide range of variables, which might not all be present in every repetition. My first step is to create an empty dataset with all the possible variables, and then write a function which retains the columns that are in the dataset being inputted and deletes the rest. Here is an example of what I want to achieve:
x<-c("a","b","c","d","e","f","g")
y<-c("c","f","g")
Is there a way of removing elements of x that aren't present in y and/or retaining values of x that are present in y?
For your first question: "My first step is to create an empty dataset with all the possible variables", I would use factor on the concatenation of all the vectors, for example:
all_vect = c(x, y)
possible = levels(factor(all_vect))
Then, for the second part, "write a function which retains columns that are in the dataset being inputted and delete the rest", I would write:
df[,names(df)%in%possible]
As akrun wrote, use intersect(x,y) or
x[x %in% y]
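Using the x and y from the question, the options can be sketched as follows. The names kept, dropped and common are made up for the example.

```r
x <- c("a", "b", "c", "d", "e", "f", "g")
y <- c("c", "f", "g")

# Elements of x that also occur in y
kept <- x[x %in% y]

# Elements of x that do not occur in y
dropped <- x[!(x %in% y)]

# intersect() gives the same elements as 'kept' here,
# but it would also remove duplicates if x had any
common <- intersect(x, y)
```

The %in% form preserves duplicates and the original order of x, so it is the safer choice when x holds column names that must stay aligned with a data frame.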

Resources