Create a new column in R spreadsheet with a specific calculation - r

I have looked for an answer to this on stackexchange but the questions being asked are way more complicated than what I need.
I have a table in R
Teacher Name   Usage_in_MINS   Usage Rate
Kelper         78
Kelper         85
Smith          85
Kelper         45
Smith          65
7th Grade      45
4th Grade      34
How do I get R to create a new column called Usage Rate?
How do I get this new column to take the values in Usage_in_MINS and divide them by 60 for only those classes that are either Kelper or Smith? What if I want it to calculate usage rates for Kelper, Smith, and everyone else as well?

There are a lot of really good basic tutorials on R out there, and you would really help yourself by working through one or two of them, because your question indicates some significant naivete. :-) Here's one way to do what you'd like, assuming that your data.frame is called "data" and the teacher column is named TeacherName:
keep <- data$TeacherName %in% c("Kelper", "Smith")
data$UsageRate <- NA_real_
data$UsageRate[keep] <- data$Usage_in_MINS[keep] / 60
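For the second part of the question (usage rates for everyone, not just Kelper and Smith), a minimal sketch is shown here. The data frame below is a hypothetical reconstruction of the table in the question, and the column name TeacherName is an assumption.

```r
# Hypothetical data frame mirroring the question's table;
# TeacherName is an assumed column name.
data <- data.frame(
  TeacherName = c("Kelper", "Kelper", "Smith", "Kelper", "Smith",
                  "7th Grade", "4th Grade"),
  Usage_in_MINS = c(78, 85, 85, 45, 65, 45, 34)
)

# Usage rate for every row: arithmetic on a column is vectorised.
data$UsageRate <- data$Usage_in_MINS / 60

# Usage rate for Kelper and Smith only; all other rows are set to NA.
data$UsageRateKS <- ifelse(
  data$TeacherName %in% c("Kelper", "Smith"),
  data$Usage_in_MINS / 60,
  NA
)

print(data)
```

The ifelse() form keeps the condition and the calculation in one expression, which may be easier to read than indexing both sides of an assignment.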

Related

Why is R adding empty factors to my data?

I have a simple data set in R with 2 conditions, stored in a variable called "COND". Within those conditions, adults chose between one of 2 pictures, which we call house or car. This variable is called "SAW".
I have 69 people, and 69 rows of data.
For some reason, R is adding an empty factor level to both variables. How do I get rid of it?
When I call table() to see how many are in each, this is the output:
table(MazeData$SAW)
      car house
    2   9    59
table(MazeData$COND)
      Apples No_Apples
    2     35        33
Where the heck are these 2 mystery rows coming from? It won't let me make my simple box plots and bar plots or run t.test because of this error. Can someone help? Thanks!!
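One common cause of an unnamed level like this is blank strings in the imported column (for example, empty cells below the data in a spreadsheet), which R keeps as a factor level "". A minimal sketch of removing them, assuming that is what happened here; the small MazeData below is a hypothetical stand-in, since the real data is not shown:

```r
# Hypothetical stand-in for MazeData; the last two rows are blank.
MazeData <- data.frame(
  SAW  = factor(c("car", "house", "house", "", "")),
  COND = factor(c("Apples", "No_Apples", "Apples", "", ""))
)

# Keep only the rows where neither column is blank.
clean <- MazeData[MazeData$SAW != "" & MazeData$COND != "", ]

# Subsetting keeps the now-unused "" level; droplevels() removes it.
clean <- droplevels(clean)

print(table(clean$SAW))
print(table(clean$COND))
```

If the file is read with read.csv, passing na.strings = "" at import time is another option, since blank fields then arrive as NA instead of as a level.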

R: Getting a value from a table based on a loop

I have a loop in which I am trying to build a table by grabbing information from a driver table I import. What I'm stuck on is that I want to step through the columns as the loop advances.
On the first pass through the loop I want it to behave like
df$a <- Driver$M1[i]
and on the second pass like
df$a <- Driver$M2[i]
and so on.
Through searching I thought I had come across the solution of
df$a <- get(paste0("Driver$M",j,"[i]"))
but I get the error
object 'Driver$M1[i]' not found
so I don't think "get" works the way I thought it did.
Could someone help me find out how to make this work?
Thanks
Iterating over the columns of a table "smoke"
> smoke
        High Low Middle
current   51  43     22
former    92  28     21
never     68  22      9
is as simple as
> for (i in colnames(smoke)) {t = smoke[,i]; print(i); print(t)}
[1] "High"
current  former   never
     51      92      68
[1] "Low"
current  former   never
     43      28      22
[1] "Middle"
current  former   never
     22      21       9
Thanks to everyone for looking at this. I kept looking and came across a different way of writing it that seems to do what I was looking for: Driver[i,paste0("M",j)]
I'm not very experienced, so I don't want to share incorrect information, but it seems that the $ operator can't accept variables. With the form Driver[row, column], the column argument expects a string anyway, so paste0() now works like I want it to.
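The string-based indexing described above can be sketched as follows. The Driver table here is hypothetical (only the column names M1 and M2 come from the question), and the vector picked is an illustrative name used to collect the values.

```r
# Hypothetical driver table; column names M1 and M2 follow the question.
Driver <- data.frame(M1 = c(10, 20, 30), M2 = c(1, 2, 3))

picked <- numeric(0)
for (j in 1:2) {
  col <- paste0("M", j)              # build the column name as a string
  for (i in seq_len(nrow(Driver))) {
    a <- Driver[[col]][i]            # same value as Driver[i, col]
    picked <- c(picked, a)
  }
}

print(picked)
```

Both Driver[[col]] and Driver[i, col] evaluate col and look up the column by the resulting string, whereas $ takes the name after it literally.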

Sum row values based on previous ones

I'll try to be specific: I want to create a new column in a data frame in which each value is the running sum of another column, that is, the sum of that column's values up to and including the current row.
I already have the first two columns (ID and Value) below and want to create the third one (Sum), but I don't know how to do this.
In the column "Sum", the values are the running sum of the values in "Value", so for example, 31.098 (Sum) is the sum of 16.91 and 14.18 (Value):
ID    Value         Sum
157   16.91531834   16.91531834
142   14.18365203   31.09897037
205   11.93528052   43.03425089
89    11.83021643   54.86446732
53    6.3668838     61.23135112
204   3.99243539    65.22378651
202   3.21496113    68.43874764
17    1.93317924    70.37192688
220   1.74406388    72.11599076
147   1.59697415    73.71296491
33    1.42887161    75.14183652
138   1.28178189    76.42361841
154   1.19773062    77.62134903
This is the first time I'm posting here. Until now, everything I searched for had already been answered, so sorry if this kind of question has been answered too (it must have been!), but I wasn't able to find it. I'm not a native speaker (as you probably guessed already), so maybe I didn't use the proper keywords...
Thanks!!
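A running total like the Sum column is what base R's cumsum() computes. A minimal sketch, using the first three rows of the table above; the data frame name df is an assumption:

```r
# First three rows of the question's table; df is an assumed name.
df <- data.frame(
  ID    = c(157, 142, 205),
  Value = c(16.91531834, 14.18365203, 11.93528052)
)

# cumsum() returns the cumulative sum: element k is the sum of
# the first k values, in the current row order.
df$Sum <- cumsum(df$Value)

print(df)
```

Because the result depends on row order, any sorting of the data frame should happen before cumsum() is called.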

Merge columns with the same name R

I'm fairly new to R. I'm working with a data set that is incredibly redundant, with a lot of columns (~400). There are several duplicate column names, but the data in those columns is not duplicated, so I need to sum the columns when collapsing them.
The columns all have a similar name that allows easy identification, so I'm hoping I can use that to my advantage.
I attempted to perform the following:
ColNames <- unique(colnames(df))
CombinedDf <- data.frame(sapply(ColNames, function(i)rowSums(Test[,ColNames==i, drop=FALSE])))
This works if I sum over the range of columns that only contain integers, but the issue is that other columns have strings and such in them, so rowSums throws a fit.
Assuming that the identifier is "XXX", how can I aggregate all the columns that are of the same name leaving the other columns as is?
Thank you for your time.
Edit: Sample data has been asked for. I cannot give the exact data as it is sensitive, but I will give an example:
Name   COL1XXX  COL2XXX  COL1XXX  COL3XXX  COL2XXX  Type
Henry  5        15       25       31       1        Orange
Tom    8        16       12       4        3        Green
Should return
Name   COL1XXX  COL2XXX  COL3XXX  Type
Henry  30       16       31       Orange
Tom    20       19       4        Green
I'm not really sure, but you could try transposing the data and then aggregating by the unique names.
t_df <- as.data.frame(t(df))
new_df <- aggregate(t_df, by = list(rownames(t_df)), sum)
Again, without sample data I'm unsure whether it will work, but based on what you described, it might.
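An alternative that avoids transposing is to sum only the columns whose names contain the identifier and leave the rest alone. This is a sketch under assumptions: the data frame below is rebuilt from the example in the question, "XXX" is taken literally as the identifier, and the names is_xxx, summed and result are illustrative.

```r
# Data rebuilt from the question's example; check.names = FALSE
# keeps the duplicate column names instead of renaming them.
df <- data.frame(
  Name = c("Henry", "Tom"),
  COL1XXX = c(5, 8), COL2XXX = c(15, 16), COL1XXX = c(25, 12),
  COL3XXX = c(31, 4), COL2XXX = c(1, 3),
  Type = c("Orange", "Green"),
  check.names = FALSE
)

nms <- colnames(df)
is_xxx <- grepl("XXX", nms, fixed = TRUE)   # columns to collapse

# For each distinct XXX name, sum all columns carrying that name.
summed <- sapply(unique(nms[is_xxx]), function(nm) {
  rowSums(df[, nms == nm, drop = FALSE])
})

# Reassemble: the untouched columns followed by the collapsed ones.
result <- data.frame(df[, !is_xxx, drop = FALSE], summed,
                     check.names = FALSE)

print(result)
```

Note that the collapsed columns end up after Name and Type here, so the column order differs from the desired output, and that sapply() only returns a matrix when the data has more than one row.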

R: iterating through unique values of a vector in for loop

I'm new to R and I am having some trouble iterating through the unique elements of a vector. I have a dataframe "School" with 700 different teachers. Each teacher has around 40 students.
I want to be able to loop through each teacher, create a graph of the mean score of his/her students over time, save the graphs in a folder, and automatically email that folder to that teacher.
I'm just getting started and am having trouble setting up the for-loop. In Stata, I know how to loop through each unique element in a list, but I am having trouble doing that in R. Any help would be appreciated.
School$Teacher  School$Student  School$ScoreNovember  School$ScoreDec  School$TeacherEmail
A               1               35                    45               A#school.org
A               2               43                    65               A#school.org
B               1               66                    54               B#school.org
A               3               97                    99               A#school.org
C               1               23                    45               C#school.org
Your question seems a bit vague and it looks like you want us to write your whole project. Could you share what you have done so far and where exactly you are struggling?
See ?subset
School <- data.frame(Teacher = c("A", "B"), ScoreNovember = 10:11, ScoreDec = 13:14)
for (teacher in unique(School$Teacher)) {
  # rows belonging to the current teacher
  teacher_df <- subset(School, Teacher == teacher)
  MeanScoreNovember <- mean(teacher_df$ScoreNovember)
  MeanScoreDec <- mean(teacher_df$ScoreDec)
  # do your plot
  # send your email
}
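The "do your plot" step in the loop above could be filled in roughly as follows. This is only a sketch: the output folder under tempdir(), the PDF format and the file naming are all assumptions, the sample rows are taken from the question's table, and the emailing step is left out.

```r
# A few rows from the question's table.
School <- data.frame(
  Teacher = c("A", "A", "B"),
  ScoreNovember = c(35, 43, 66),
  ScoreDec = c(45, 65, 54)
)

# Assumed output folder; replace with a real path as needed.
out_dir <- file.path(tempdir(), "teacher_plots")
dir.create(out_dir, showWarnings = FALSE)

for (teacher in unique(School$Teacher)) {
  teacher_df <- subset(School, Teacher == teacher)
  means <- c(November = mean(teacher_df$ScoreNovember),
             December = mean(teacher_df$ScoreDec))

  # One PDF per teacher, named after the teacher.
  pdf(file.path(out_dir, paste0(teacher, ".pdf")))
  plot(means, type = "b", xaxt = "n",
       xlab = "Month", ylab = "Mean score", main = teacher)
  axis(1, at = seq_along(means), labels = names(means))
  dev.off()
}

print(list.files(out_dir))
```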
I think you have 3 questions here, each of which should be asked separately. How do I:
Create graphs
Automatically email output
Compute a subset mean based on group
For the 3rd one, I like using the plyr package; other people will recommend the data.table or dplyr packages. You can also use aggregate from base R. To get each teacher's mean:
library(plyr)
ddply(School,.(Teacher),summarise,Nov_m=mean(ScoreNovember))
If you want the mean per student per teacher, etc., just add more columns inside the brackets, like:
library(plyr)
ddply(School,.(Teacher,Student),summarise,Nov_m=mean(ScoreNovember))
You could do that for each score column (and then chart it). If your data were long rather than wide, you could also add the date ('November', 'Dec') as a group in the brackets. Otherwise:
library(plyr)
ddply(School,.(Teacher,Student),summarise,Nov_m=mean(ScoreNovember),Dec_m=mean(ScoreDec))
See if that helps with the 3rd, but look at splitting your questions up too.
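The base R route mentioned above (aggregate) can be sketched as follows, for readers who would rather not install a package. The data frame is rebuilt from the table in the question without the email column, and the result name means is an assumption.

```r
# Rows from the question's table (email column omitted).
School <- data.frame(
  Teacher = c("A", "A", "B", "A", "C"),
  Student = c(1, 2, 1, 3, 1),
  ScoreNovember = c(35, 43, 66, 97, 23),
  ScoreDec = c(45, 65, 54, 99, 45)
)

# Mean of both score columns per teacher, using the formula interface:
# the columns on the left are summarised, the right side is the group.
means <- aggregate(cbind(ScoreNovember, ScoreDec) ~ Teacher,
                   data = School, FUN = mean)

print(means)
```

This returns one row per teacher, so the result can be looped over or merged back onto the email addresses.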

Resources