Select whole columns by a word in the first row - r

I have a data frame (df) read from an Excel sheet. The first row of the data frame is always "correct" or "wrong"; the other rows are filled with data.
Now I want to select all the columns where the first row says "correct" by using the function apply.
I tried:
apply(df,2,function(df) grepl ("correct",df))
The answer is just a data frame with TRUE and FALSE. How can I select the columns without losing the data in the other rows?

You shouldn't need a loop. The following should work,
df[,df[1,] == 'correct']

i <- sapply(df, function(x) x[1] =='correct')
df[,i]
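As a minimal sketch of the two answers above, assuming a small made-up data frame (the column names a, b, c and all values are invented for illustration), the selection can be checked like this. The drop = FALSE argument is an addition, not part of the original answers; it keeps the result a data frame even when only one column matches.

```r
# Hypothetical data: the first row holds the label, the rest is data.
# Because of the labels, every column is character.
df <- data.frame(
  a = c("correct", "1", "2"),
  b = c("wrong",   "3", "4"),
  c = c("correct", "5", "6"),
  stringsAsFactors = FALSE
)

# Logical vector with one entry per column: TRUE where row 1 is "correct".
keep <- unlist(df[1, ]) == "correct"

# Keep whole columns; drop = FALSE preserves the data frame structure.
selected <- df[, keep, drop = FALSE]
print(selected)
```

The logical vector only decides which columns survive, so the data in the remaining rows is untouched.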

Related

Is there a function to subset data using a qualitative requirement in a column?

I am having trouble creating a subset for a large dataframe. I need to extract all rows that match one of two correct cities in one of the columns, however any subset that I create ends up empty. Given the main dataframe, I try:
New = data[data$Home.port %in% c("ARDGLASS","NEWLYN")]
However R returns "undefined columns selected"
A comma is missing:
New = data[data$Home.port %in% c("ARDGLASS","NEWLYN"), ]
That is because you are selecting rows, not columns; if you leave out the comma, R tries to subset columns instead of rows.
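To make the role of the comma concrete, here is a small sketch on invented data (the Vessel column and the four rows are assumptions for illustration only). It builds the logical row index once and uses it with and without the comma.

```r
# Hypothetical data standing in for the question's data frame.
data <- data.frame(
  Vessel    = c("V1", "V2", "V3", "V4"),
  Home.port = c("ARDGLASS", "KILKEEL", "NEWLYN", "ARDGLASS"),
  stringsAsFactors = FALSE
)

# One logical value per row: TRUE where the port is one of the two wanted.
rows <- data$Home.port %in% c("ARDGLASS", "NEWLYN")

# With the comma, the logical vector is used as a row index.
New <- data[rows, ]
print(New)

# Without the comma, the same vector is treated as a column index.
# Here it has 4 entries for only 2 columns, so the call raises an error.
res <- try(data[rows], silent = TRUE)
print(inherits(res, "try-error"))
```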
I recommend to use data.table so:
# install.packages("data.table")
library(data.table)
data <- as.data.table(data)
new_data <- data[Home.port %in% c("ARDGLASS","NEWLYN")]
data.table is very fast with large data sets.
The subset function will also do this task
new <- subset(data, subset = Home.port %in% c("ARDGLASS","NEWLYN"))
The base approach is functionally the same; it's just a matter of whether you use a declarative function for the task.
When using subset(), the first argument is the data frame you want to subset. Inside the condition you can refer to columns by name, so you do not need to put "data$" in front of them. This saves time and makes the code easier to read.
datasubset <- subset(data, Home.port %in% c("ARDGLASS","NEWLYN"))
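You can also combine multiple conditions: use "&" for an AND condition or "|" for an OR condition, depending on what you plan to do. To match either of the two ports you need OR, because a single value can never equal both names at once:
datasubset <- subset(data, Home.port == "ARDGLASS" | Home.port == "NEWLYN")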
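The difference between "&" and "|" inside subset() is easy to see on a toy example. The sketch below uses invented data; the Vessel and Length columns are hypothetical and only there to show an AND across two different columns.

```r
# Hypothetical data; Length is an invented column for illustration.
data <- data.frame(
  Vessel    = c("V1", "V2", "V3", "V4"),
  Home.port = c("ARDGLASS", "KILKEEL", "NEWLYN", "ARDGLASS"),
  Length    = c(12, 18, 25, 9),
  stringsAsFactors = FALSE
)

# OR: a row qualifies if either test is TRUE.
either_port <- subset(data, Home.port == "ARDGLASS" | Home.port == "NEWLYN")

# AND on the same column can never be TRUE, so no rows come back.
both_ports <- subset(data, Home.port == "ARDGLASS" & Home.port == "NEWLYN")

# AND across different columns narrows the selection.
big_ardglass <- subset(data, Home.port == "ARDGLASS" & Length > 10)

print(nrow(either_port))
print(nrow(both_ports))
print(big_ardglass)
```

An empty result from subset() is therefore often a sign that "&" was used where "|" or %in% was intended.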

How to populate/fill a dataframe column with cell values of another dataframe

I have two dataframes
Dataframe 1 has around a million rows. It has two columns named 'row' and 'columns' that hold the row and column index of a cell in another dataframe (i.e. dataframe 2).
I want to extract the values from dataframe 2 at the indexes given in the columns 'row' and 'columns' for each row in dataframe 1.
I used a simple for loop to get the solution but it is time consuming and takes around 9 minutes, is there any other way with functions in R to solve this problem?
for(i in 1:nrow(dataframe1)) {
dataframe1$value[i] = dataframe2[dataframe1$row[i],dataframe1$columns[i]]
}
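You actually don't need a for loop to do this. Note that indexing with two separate vectors, dataframe2[rows, cols], returns every combination of those rows and columns rather than one cell per pair. To get one value per (row, column) pair, bind the two index vectors into a two-column matrix and index with that:
dataframe1$value <- dataframe2[cbind(dataframe1$row, dataframe1$columns)]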
This should work a lot faster. If you wanted to try it a different way you could try adding the values to a new vector and then using cbind to join the vector to the Data Frame. The fact that you're trying to update the whole Data Frame during the loop is most likely what's slowing it down.
Maybe you can try the code below
dataframe1$value <- dataframe2[as.matrix(dataframe1[c("row","columns")])]
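Here is a small sketch of matrix indexing on invented data, compared against the loop from the question. The values and the column names x and y are assumptions for illustration. The explicit as.matrix(dataframe2) call is a choice made here to show that the lookup happens on a matrix, which means columns of mixed types would be coerced to a common type.

```r
# Hypothetical lookup table (dataframe2) with numeric cells.
dataframe2 <- data.frame(x = c(10, 20, 30), y = c(40, 50, 60))

# Hypothetical index table (dataframe1): each row names one cell of dataframe2.
dataframe1 <- data.frame(row = c(1, 3, 2), columns = c(2, 1, 2))

# Loop version from the question, kept for comparison.
loop_value <- numeric(nrow(dataframe1))
for (i in seq_len(nrow(dataframe1))) {
  loop_value[i] <- dataframe2[dataframe1$row[i], dataframe1$columns[i]]
}

# Vectorised version: a two-column matrix picks one cell per matrix row.
idx <- as.matrix(dataframe1[c("row", "columns")])
dataframe1$value <- as.matrix(dataframe2)[idx]

print(dataframe1)
```

Each row of idx is read as one (row, column) pair, so the result has exactly one value per row of dataframe1.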
Since your loop only considers the rows in dataframe1, you can cut the surplus rows from dataframe2 and then use cbind:
dataframe2 <- dataframe2[seq_len(nrow(dataframe1)), ]
df3 <- cbind(dataframe1, dataframe2)

Count values per rows in a data frame R

I know there are other questions like this one, but none of them answers my specific problem.
In my data frame, I need to count the number of values in each row between columns 3 and 8.
I want a simple NB.VAL like in Excel..
base_graphs$NB <- rowSums(!is.na(base_graphs)) # with this code, I count all values except NAs but I can't select specific columns
How to create this new column "NB" on my data frame "base_graphs" ?
You were really close:
base_graphs$NB <- rowSums(!is.na(base_graphs[, 3:8]))
The [, 3:8] subsets and selects columns 3 through 8.
apply can apply a function to each row of a data frame. Try:
base_graphs$NB <- apply(base_graphs[3:8], 1, function (x) sum(!is.na(x)))
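As a quick check on invented data (the column names and values below are assumptions for illustration), counting the non-missing cells in columns 3 to 8 gives the same result with rowSums() and with apply(). Note the negation: is.na() alone would count the empty cells instead.

```r
# Hypothetical data frame with 8 columns; NA marks an empty cell.
base_graphs <- data.frame(
  id = 1:3, grp = c("a", "b", "c"),
  v3 = c(1, NA, 3), v4 = c(NA, NA, 1), v5 = c(2, 5, NA),
  v6 = c(4, NA, NA), v7 = c(NA, 1, 1), v8 = c(7, NA, 2)
)

# Count the non-NA cells in columns 3 to 8 of each row.
base_graphs$NB <- rowSums(!is.na(base_graphs[, 3:8]))

# Same count with apply(), going row by row (MARGIN = 1).
nb_apply <- apply(base_graphs[, 3:8], 1, function(x) sum(!is.na(x)))

print(base_graphs$NB)
```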

R: Assign values to a new column based on values of another column where a condition is satisfied

I want to create a new column in a data.frame where its value is equal to the value in another data.frame where a particular condition is satisfied between two columns in each data frame.
The R pseudo-code being something like this:
DF1$Activity <- DF2$Activity where DF2$NAME == DF1$NAME
In each data.frame values for $NAME are unique in the column.
Use the ifelse function. Here, I put NA when the condition is not met. However, you may choose any value or values from any vector.
Recycling rules apply.
DF1$Activity <- ifelse(DF2$NAME == DF1$NAME, DF2$Activity, NA)
I'm not sure this one actually needs an example. What happens when you create a column with a set of NA values and then assign the required rows with the same logical vector on both sides:
DF1$Activity <- NA
DF1$Activity[DF2$NAME == DF1$NAME] <- DF2$Activity[DF2$NAME == DF1$NAME]
Without an example it's quite hard to tell, but from your description it sounds like a base::merge or dplyr::inner_join operation. Those are quite fast in comparison to if statements.
Cheers
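A minimal sketch of a lookup by NAME, assuming invented data in which the two data frames have different row counts and row order. An element-wise comparison such as DF2$NAME == DF1$NAME only lines up when both frames have the same rows in the same order, so this sketch uses base R's match() and merge() instead; that swap is an assumption about what the question needs, not something the answers above state.

```r
# Hypothetical data frames; NAME is unique within each, order differs.
DF1 <- data.frame(NAME = c("a", "b", "c", "d"), stringsAsFactors = FALSE)
DF2 <- data.frame(
  NAME     = c("c", "a", "d"),
  Activity = c("swim", "run", "row"),
  stringsAsFactors = FALSE
)

# match() gives, for each DF1$NAME, its position in DF2$NAME (NA if absent).
DF1$Activity <- DF2$Activity[match(DF1$NAME, DF2$NAME)]
print(DF1)

# merge() with all.x = TRUE keeps every row of DF1 (a left join).
merged <- merge(DF1["NAME"], DF2, by = "NAME", all.x = TRUE)
print(merged)
```

Names that have no partner in DF2 end up with NA in Activity in both variants.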

How to access a column after subsetting data frame?

It has to be really simple but it looks like my mind is not working properly anymore.
So, what I would like to do is to store one of the columns from mtcars as a vector but after subsetting it. I need one line code for the subsetting and assigning a vector.
That's what I would like to achieve but with one line:
data <- mtcars[mtcars[,11]==4,]
vec <- data[,1]
Thx!
vec<-mtcars[mtcars[,11]==4,][,1]
The mtcars[,11]==4 would be the row index and by selecting the column index as '1', we get the first column with subset of rows based on the condition.
mtcars[mtcars[,11]==4, 1]
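The forms above can be compared directly, since mtcars ships with R. In this sketch the variable names vec_two_step and vec_named are made up for illustration, and the name-based variant is an addition that relies on column 11 being carb and column 1 being mpg.

```r
# Two steps, as in the question.
data <- mtcars[mtcars[, 11] == 4, ]
vec_two_step <- data[, 1]

# One line: row condition and column index in the same call.
vec <- mtcars[mtcars[, 11] == 4, 1]

# Equivalent form using column names instead of positions.
vec_named <- mtcars$mpg[mtcars$carb == 4]

print(identical(vec, vec_two_step))
print(identical(vec, vec_named))
```

Using names rather than positions keeps the code working if the column order of the data frame ever changes.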
