Is there R code to remove rows that contain only blanks?

I have a data frame in which the first column is email addresses and the rest of the columns are statements that the person agreed with. I want to filter out the rows in which the person agreed with nothing. If they agreed with at least one thing they stay in (so below, only doug's row would be removed).
Email C1 C2 C3
q#gmail.com agree agree
bob#gmail.com agree
doug#gmail.com
I tried this in Excel without luck, and I have not found R code that can do it.

One base R solution, using rowSums() to count non-blank cells per row:
df <- df[rowSums(df != "") > 1, ]
> df
Email C1 C2 C3
1 q#gmail.com agree agree
2 bob#gmail.com agree
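If the blank cells come in as NA rather than empty strings (this depends on how the data was imported, so treat it as an assumption), the same idea still works:
# keep rows with at least one non-blank, non-NA cell besides Email
df <- df[rowSums(!is.na(df) & df != "") > 1, ]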

Related

Matching phrases (word search) and noting findings in different columns

I'm working with feedback from teachers: multiple selections from a Google Form, which all end up in a single column.
There was also the option to type in responses, so some words/responses appear only once. There are about 50 responses, with roughly 1 to 7 words per entry. For the MRE I've included 5 as a vector.
eg_responses <- c("Difficult, Challenging, Fair", "Necessary, Useful", "Cruel", "Easy, Challenging", "School's shouldn't have to do that")
words_example <- c("Difficult","Easy","Fair","Challenging","Necessary","Useful")
While I can get a summary of the results, I need to create isolated responses so that I can select and compare with other variables later on...
What I would like to be able to do:
Have a column for each word, plus an extra column for other comments.
Look through each row for each word that was in the drop-down.
While there are rows left to check, look through word x.
If the word is present, mark it in column x.
If the word is not present, mark NA in column x.
Move to the next column and next word when the rows run out.
Any other words/phrases left at the end go in the last column.
If no other words/phrases are left, mark the last column with NA.
I'm sure I've been overcomplicating things, but I'm an absolute beginner. I'm working in R.
I have tried separating the words, turning things into factors, then turning them back to character... Being able to search for exact complete phrases, or to search through the words, would be really helpful.
There are 6 other sections with the same problem, plus other sections with other variables.
I would like to have something like this at the end...
(apologies, I'm just making this up; it's a bit of a chicken-and-egg situation)
respcolnames <- c( "Challenging", "Fair", "Unfair", "Other" )
row1 <- c("NA", "Fair", "NA", "NA")
row2 <- c("Challenging", "NA", "Unfair", "NA")
row3 <- c("NA", "NA","NA", "School's shouldn't have to do that")
teacher_responses <- rbind(row1, row2, row3)
colnames(teacher_responses) <- respcolnames
so that the corresponding response is in the correct column. I can then strip away the "Other" responses for a more social science analysis, and use the drop down response selections for some graphics.
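A minimal base R sketch of one way to approach this (assuming the responses are comma-separated as in eg_responses above, and that anything not in words_example belongs in an "Other" column; the object names here are just illustrative):
# split each response into its individual words/phrases
split_responses <- strsplit(eg_responses, ",\\s*")
# one indicator column per drop-down word: the word itself if present, NA otherwise
word_cols <- sapply(words_example, function(w) {
  ifelse(sapply(split_responses, function(resp) w %in% resp), w, NA)
})
# anything left over that is not a drop-down word goes into "Other"
other <- sapply(split_responses, function(resp) {
  leftover <- setdiff(resp, words_example)
  if (length(leftover) == 0) NA else paste(leftover, collapse = "; ")
})
teacher_responses <- data.frame(word_cols, Other = other, check.names = FALSE)
teacher_responses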

How do I find the sum of a category under a subset?

So... I'm very illiterate when it comes to RStudio, and I'm using this program for a class. I'm trying to figure out how to sum a subset of a category. I apologize in advance if this doesn't make sense, but I'll do my best to explain, because I have no clue what I'm doing and would also appreciate an explanation of why, not just what, the answer is. Note: the two lines I included are part of the directions I have to follow, not something I just typed in because I knew how to (I don't). It's the last part, the sum, that isn't explained, so I don't know what to do and would appreciate help figuring it out.
For example,
I have this:
category_name  category2_name
1              ABC
2              ABC
3              ABC
4              ABC
5              ABC
6              BDE
5              EFG
7              EFG
I wanted to find the sum of these numbers, so I was told to put in this:
sum(dataname$category_name)
After doing this, I'm asked to type this in, apparently creating a subset.
allabc <- subset(dataname, dataname$category_name2 == "abc")
I created this subset, and a new table with this subset has popped up. I'm asked to sum only the numbers of this ABC subset, but I have absolutely no clue how to do this. If someone could help me out, I'd really appreciate it!
R is the software you are using. It is case-sensitive. So "abc" is not equal to "ABC".
The arguments are the "things" you put inside functions. Some arguments have the same name as the functions (which is a little confusing at first, but you get used to this eventually). So when I say the subset argument, I am talking about your second argument to the subset function, which you didn't name. That's ok, but when starting to learn R, try to always name your arguments.
So,
allabc <- subset(dataname, dataname$category_name2 == "abc")
Needs to be changed to:
allabc <- subset(dataname, subset=category2_name == "ABC")
You also don't need to specify the name of the data again in the subset argument, since you've done that already in the first argument (which you didn't name, but almost no one bothers to name that one).
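With the corrected subset in hand, the sum the exercise asks for is just the sum of the numeric column of that subset (assuming the data frame is called dataname as in the question):
allabc <- subset(dataname, subset = category2_name == "ABC")
sum(allabc$category_name)   # sums category_name for the ABC rows only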
This is most easily done using the tidyverse.
# Your data
data <- data.frame(category_name = c(1, 2, 3, 4, 5, 6, 5, 7),
                   category_name2 = c(rep("ABC", 5), "BDE", "EFG", "EFG"))
# Installing tidyverse
install.packages("tidyverse")
# Loading tidyverse
library(tidyverse)
# For each category_name2, the category_name values are summed
data %>%
  group_by(category_name2) %>%
  summarise(sum_by_group = sum(category_name))
# Output
category_name2 sum_by_group
ABC                      15
BDE                       6
EFG                      12
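If you would rather stay in base R instead of installing a package, aggregate() gives the same per-group sums (a sketch using the same example data):
aggregate(category_name ~ category_name2, data = data, FUN = sum)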

Juxtaposing Replicate Data

I have provided a sample dataset that I have arranged in column format (called "full.table").
These data were extracted from a 96-well PCR plate, and while collecting my data I always ran a duplicate experiment, meaning each variable (aka test) has 1 replicate. I would like to take all replicates and juxtapose them (have them side by side), which would allow me to easily visualize replicates next to each other, and finally calculate an average value of the variable "Cq" between the two.
The complications stem from having done multiple tests over several days (complication one), and from NOT having my samples always run in the same fashion on the PCR plate (complication two). Typically, as you see in my data set below, Well A1 has a duplicate in Well B1; however, this is not always the case. Occasionally, Well A7 matches Well A8 (and NOT B7).
Replicates were always run on the same day, so an important variable here is "date", which I added via R before uploading to Stack Exchange. I am confused about how to rearrange the data to get my desired result (not even sure where to start).
I have provided an example of what I would like in the end, called "sample.finished.table".
Logically, having 768 observations in this example, this should divide the data in two, resulting in 384 total lines of data (385 with the header).
I appreciate any feedback. Thank you.
full.table<- read.table("https://pastebin.com/raw/kTQhuttv", header=T, sep="")
sample.finished.table <- read.table("https://pastebin.com/raw/Phg7C9xD", header=T, sep="")
You can use dplyr here to group by sample and extract the requested values:
library(dplyr)
full.table %>%
  group_by(sample, date) %>%
  summarise(Well1 = first(Well), Cq1 = first(Cq),
            Well2 = last(Well), sample1 = last(sample), Cq2 = last(Cq),
            Cq_mean = mean(Cq[Cq > 0]))
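If every sample/date pair really does contain exactly two wells, the result should have half as many rows as the input (384 here, as you noted). A quick sanity check along those lines (a sketch; the object name result is just illustrative):
result <- full.table %>%
  group_by(sample, date) %>%
  summarise(n_wells = n())
table(result$n_wells)   # every group should show exactly 2 wells
nrow(result)            # expected: 384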

R - Filter any rows and show all columns

I would like an output that shows the column names that have rows containing a string value. Assume the following...
Animals       Sex
I like Dogs   Male
I like Cats   Male
I like Dogs   Female
I like Dogs   Female
Data Missing  Male
Data Missing  Male
I found an SO thread here where David Arenburg provided an answer which works very well, but I was wondering if it is possible to get an output that doesn't show all the rows. So if I want to find the string "Data Missing", the output I would like to see is...
Animals
Data Missing
or
Animal
TRUE
instead of
Animals       Sex
Data Missing  Male
Data Missing  Male
I have also found that using filters such as df$columnName works, but I have a big file and a large number of column names, so typing column names would be tedious. Assume the string "Data Missing" is also in other columns and there could be different kinds of strings. That is why I like David Arenburg's answer; bear in mind I don't have just two columns as in the sample given above.
Cheers
One thing you could do is grep for "Data Missing" in every column, like this:
x <- apply(data, 2, grep, pattern = "Data Missing")   # matching row indices per column
sapply(x, length) > 0                                 # TRUE for columns with at least one match
This will give you:
Animals     Sex
   TRUE   FALSE
which is the result you're after. It's also good because it checks all columns, which you mentioned was something you wanted.
If we want only the first row where it matches, use match():
data[match("Data Missing", data$Animals), "Animals", drop = FALSE]
# Animals
#5 Data Missing
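If all you want is the names of the columns that contain the string anywhere (a small sketch reusing the example data above; the pattern and data name are placeholders):
names(data)[sapply(data, function(col) any(grepl("Data Missing", col)))]
# [1] "Animals"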

Select variables that contain value in R

I apologize if this question has already been answered; I have searched for way too long.
I have coded data that has a prefix of a letter and a suffix of numbers.
ex:
A01, A02, ..., A99   # for each letter A-Z
I need R code that mirrors this SAS code:
Proc SQL;
Create table NEW as
Select *
From DATA
Where VAR contains 'D';
Quit;
EDIT
Sorry y'all, I'm new! (also, mediocre in R at best.) I thought posting the SAS/SQL code would help make it easier.
Anyway, the data is manufacturing data. I have a variable whose values are the A01...A99, etc. values.
(rough) example of the dataframe:
OBS PRODUCT  PRICE  PLANT
1   phone      8.55 A87
2   paper    105.97 X67
3   cord       0.59 D24
4   monitor   98.65 D99
The scale of the data is massive, and I only want to focus on the observations that come from plant 'D', so I'm trying to subset the data based on the 'PLANT' variable containing (or starting with) 'D'. I know how to filter the data on a specific value (i.e. ==, >=, !=, etc.). I just can't figure out how to do it when only part of the value is known, and I have yet to find anything about a 'contains' operator in R. I hope that clarifies things.
Assuming DATA is your data.frame and VAR is your column value,
DATA <- data.frame(
  VAR = apply(expand.grid(LETTERS[1:4], 1:3), 1, paste0, collapse = ""),
  VAL = runif(3 * 4)
)
then you can do
subset(DATA, grepl("D", VAR))
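If you specifically want plants whose code starts with "D" (rather than merely containing a "D" anywhere), anchor the pattern:
subset(DATA, grepl("^D", VAR))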
A slight alternative to MrFlick's solution: use a vector of row-indices:
DATA[grep('D', DATA$VAR), ]
   VAR        VAL
4   D1 0.31001091
8   D2 0.71562382
12  D3 0.00981055
where we defined:
DATA <- data.frame(
  VAR = apply(expand.grid(LETTERS[1:4], 1:3), 1, paste0, collapse = ""),
  VAL = runif(3 * 4)
)
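If you are already using dplyr elsewhere, an equivalent filter (a sketch on the same example data) is:
library(dplyr)
DATA %>% filter(grepl("D", VAR))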
