R: Getting a value from a table based on a loop

I have a loop where I am trying to build a table by pulling information from a driver table that I import. What I'm stuck on is that I want the column I read from to change with the loop counter.
On the first pass through the loop I want it to behave like
df$a <- Driver$M1[i]
and on the second pass like
df$a <- Driver$M2[i]
and so on.
Through searching I thought I had found the solution:
df$a <- get(paste0("Driver$M",j,"[i]"))
but I get the error
object 'Driver$M1[i]' not found
so I don't think get() works the way I thought it did.
Could someone help me find out how to make this work?
Thanks

Iterating over the columns of a table "smoke"
> smoke
        High Low Middle
current   51  43     22
former    92  28     21
never     68  22      9
is as simple as
> for (i in colnames(smoke)) {t = smoke[,i]; print(i); print(t)}
[1] "High"
current  former   never
     51      92      68
[1] "Low"
current  former   never
     43      28      22
[1] "Middle"
current  former   never
     22      21       9

Thanks to everyone for looking at this. I kept looking and came across a different way of writing it that seems to do what I was looking for: Driver[i,paste0("M",j)]
I'm not very experienced, so I don't want to share incorrect information, but it seems the $ operator can't accept a variable. Written as Driver[row, column], the column argument expects a string anyway, so paste0() now works the way I want it to.
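To make the difference concrete, here is a minimal sketch with a made-up Driver table (the values and the result matrix are assumptions for illustration, not the original data). get() looks up an object by its name, and "Driver$M1[i]" is an expression rather than a name, which is why it fails. Indexing with a string works because [ and [[ take the column name as an ordinary value.

```r
# Hypothetical stand-in for the imported driver table
Driver <- data.frame(M1 = c(10, 20, 30), M2 = c(1, 2, 3))

# One result column per driver column (illustrative only)
result <- matrix(NA_real_, nrow = nrow(Driver), ncol = 2)

for (j in 1:2) {
  col <- paste0("M", j)               # "M1", then "M2"
  for (i in seq_len(nrow(Driver))) {
    result[i, j] <- Driver[i, col]    # equivalent to Driver[[col]][i]
  }
}

result
```

If the whole column is needed at once, Driver[[col]] returns it in one step and the inner loop can go away.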

Related

R - Using Stringr to identify a string across hundreds of rows

I have a database where some people have multiple diagnoses. I posted a similar question in the past, but now have some more nuances I need to work through:
R- How to test multiple 100s of similar variables against a condition
I have this dataset (which was an import of a SAS file)
ID  dx1  dx2  dx3  dx4  dx5  dx6  ....  dx200
1   343  432  873  129  12   123        3445
2   34   12   44
3   12
4   34   56
Initially, I wanted to create a new variable that flags whether any of the "dx" columns equals a certain code, without using hundreds of if statements. All the variables have the same name format (dx#), so I used the following code:
Ex:
dataset$highbloodpressure <- rowSums(screen[0:832] == "410") > 0
This worked great. However, there are many different codes for the same diagnosis. For example, a heart attack can be defined as:
410.1,
410.71,
410.62,
410.42,
...this goes on for 20 additional codes. BUT! They all start with 410.
I thought about using stringr (the variable is a string), to identify the common code components (410, for the example above), but am not sure how to use it in the context of rowsums.
If anyone has any suggestions for this, please let me know!
Thanks for all the help!
You can use the grepl() function, which returns TRUE if a pattern is found in a string. To check all columns at once, collapse them into one character string per row:
df$dx.410 = NA
for(i in seq_len(nrow(df))){
  # join all diagnosis codes of row i into one space-separated string
  if(grepl('(^| )410',paste(df[i,2:200],collapse=' '))){
    df$dx.410[i]="Present"
  }
}
This loops through all rows, builds one long string containing all diagnoses for that case, and writes "Present" in column dx.410 if any column holds a diagnosis that starts with 410. The pattern '(^| )410' only matches at the start of a code; a plain '410' would also match codes that merely contain 410, such as 1410.
(The solution expects the data structure you have here, with the dx variables in columns 2 to 200. If there are other columns, just adjust these numbers.)
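As an alternative to going row by row, the same check can probably be done column-wise and then combined with rowSums, as in the original question. This is only a sketch: the small data frame, the column-name pattern and the new column name heart_attack are assumptions, and it assumes the dx columns are character strings as the question describes.

```r
# Made-up example data; NA stands for an empty diagnosis slot
df <- data.frame(
  ID  = 1:4,
  dx1 = c("410.1", "34", "12", "34"),
  dx2 = c("432", "410.71", NA, "56"),
  stringsAsFactors = FALSE
)

# Pick every column named dx1, dx2, ... without hard-coding positions
dx_cols <- grep("^dx[0-9]+$", names(df), value = TRUE)

# Logical matrix: TRUE where a code starts with 410 (NA counts as FALSE)
is_410 <- sapply(df[dx_cols], function(col) grepl("^410", col))

# A row is flagged if at least one of its dx columns matched
df$heart_attack <- rowSums(is_410) > 0
df
```

stringr::str_detect(col, "^410") should be usable in place of grepl here, but it returns NA for missing values, so the rowSums call would then need na.rm = TRUE.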

Why is R adding empty factors to my data?

I have a simple data set in R: 2 conditions, stored in a variable called "COND", and within those conditions adults chose between one of 2 pictures, which we call house or car. This variable is called "SAW".
I have 69 people, and 69 rows of data.
For some reason, R is adding an empty factor level to both. How do I get rid of it?
When I use table() to see how many are in each, this is the output:
table(MazeData$SAW)
  car house
2   9    59
table(MazeData$COND)
  Apples No_Apples
2     35        33
Where the heck are these 2 mystery rows coming from? It won't let me make my simple box plots and bar plots or run t.test because of this error. Can someone help? Thanks!!
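One likely explanation, though the data itself isn't shown, is that two rows hold an empty string (for example blank lines at the end of the imported file) and R treats "" as a factor level of its own. Both tables add up to 70 rather than 69, which would fit that. A minimal sketch of the situation and one way to clean it up, using a made-up data frame maze in place of MazeData:

```r
# Hypothetical stand-in for MazeData with two blank rows
maze <- data.frame(
  SAW  = factor(c("car", "house", "", "house", "")),
  COND = factor(c("Apples", "No_Apples", "", "Apples", ""))
)
table(maze$SAW)    # shows an unnamed level with count 2

# Drop the rows where the response is blank, then drop the unused levels
maze <- droplevels(maze[maze$SAW != "", ])
table(maze$SAW)
table(maze$COND)
```

droplevels() on a data frame removes unused levels from every factor column, so COND is cleaned up in the same step.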

Why do i have to do the "<-"? Can't i design my function to bypass that?

One of the things I don't like in R is the save process. Since I am always developing, I have large working environments, and I like to save one specific object frequently. One of the most annoying things to me is how complicated the save process can be. The object (one of up to 10 at a time) is a list of 10 to 20 data frames (ranging from rasterized images to medium and large data frames) that are all used in different ways by different functions, which can get very complex.
What I have not been able to figure out is this: when a function changes that data, I would like it to save the changed object back to the directory automatically. Instead, I have to do something like the following. Note that this is fine for a list of objects in a for loop, but I would like to do it for the object I pass into the function.
# obtain the name of the object you will be passing into
# the function, in character form
dat.name<-ls(pattern="dat")
# or select it from a list if there are multiple
dat.name<-select.list(ls(pattern="dat"))
# run the function and assign the result to a new name just in case
# something doesn't work
tmp.dat<-cell.creator(dat)
# next assign the tmp to the real object (assign() needs the name as a string)
assign(dat.name, tmp.dat)
## or ## just do the straight-up overwrite if you are brave,
# and I am starting to get pretty brave with some of my functions
dat<-cell.creator(dat)
# put .rdata on the back to create a file name (paste0 avoids the space paste() inserts)
file.name<-paste0(dat.name, ".rdata")
# then... FINALLY save it
save(dat, file=file.name)
What I really want to do is internalize those commands into the function, but (unless I am misunderstanding this) nothing stores the name my object had when I passed it in, unless I pass the name in quotation marks, which doesn't allow me to use tab autocomplete in RGui. :(
So, let's say dat is
bob<-sample(seq(1,1000))
and my function sorts my object
bob.sorter<-function(dat){
dat<-sort(dat)
return(dat)}
So now when I input bob, I would like something to just go ahead and save bob
for me, basically doing the equivalent of
dat<-cell.creator(dat)
Am I missing something here?
I don't fully understand your question, but this seems to address part of it. The following function takes an object assigned to a variable (e.g. bob) and automatically saves it to a file whose name is the variable name followed by .rdata (e.g. "bob.rdata"), without the need to type the file name:
qsave <- function(dat){
dat.name <- deparse(substitute(dat))
file.name <- paste0(dat.name,".rdata")
save(list = dat.name, file=file.name)
}
To test it:
> bob <- islands
> qsave(bob)
> rm(bob) #bob is now gone
> load("bob.rdata") #restores bob; qsave wrote the file to the working directory
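Building on qsave, a function could also change the object and save the changed version under the original name. The following is a rough sketch, not a tested recipe for the original data: sort_and_save and its dir argument are made-up names, sorting stands in for whatever cell.creator does, and it only makes sense when the function is called with a plain variable name.

```r
sort_and_save <- function(dat, dir = tempdir()) {
  # Name of the variable the caller passed in, e.g. "bob"
  dat.name <- deparse(substitute(dat))
  sorted <- sort(dat)

  # Store the new value under the original name in a scratch environment,
  # so the saved file restores an object called "bob", not "sorted"
  env <- new.env()
  assign(dat.name, sorted, envir = env)
  file.name <- file.path(dir, paste0(dat.name, ".rdata"))
  save(list = dat.name, file = file.name, envir = env)

  # Return the result so the caller decides whether to overwrite
  invisible(sorted)
}

bob <- sample(seq(1, 1000))
bob <- sort_and_save(bob)   # file is written to tempdir() in this sketch
```

The explicit bob <- ... is still there on purpose: the function saves the file, but the assignment in the calling environment stays visible in the code.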
You can do this:
set.seed(1492) # reproducible science
bob <- sample(1:1000, 500) # the actual way sample() should be called
str(bob)
## int [1:500] 278 216 185 111 52 9 848 507 388 763 ...
bob_sorter <- function(dat) {
dat <- dat[order(dat)] # actual sorting happening
dat
}
str(bob_sorter(bob))
## int [1:500] 3 6 7 8 9 10 11 13 14 17 ...
bobs_silly_sorter <- function(dat) {
passed_in_name <- as.character(substitute(dat)) # pls never do this
dat <- dat[order(dat)]
assign(passed_in_name, dat, envir=.GlobalEnv) # pls never do this
}
str(bob)
## int [1:500] 278 216 185 111 52 9 848 507 388 763 ...
bobs_silly_sorter(bob)
str(bob)
## int [1:500] 3 6 7 8 9 10 11 13 14 17 ...
The second version works, but it's horribad: it silently overwrites a variable in the global environment. Your future self will probably hate you for doing it. And anyone else who has to work with your code will also end up muttering obscenities under their breath at you every time you walk by them.

R: iterating through unique values of a vector in for loop

I'm new to R and am having some trouble iterating through the unique elements of a vector. I have a data frame "School" with 700 different teachers. Each teacher has around 40 students.
I want to loop through each teacher, create a graph of the mean score of his/her students over time, save the graphs in a folder and automatically email that folder to that teacher.
I'm just getting started and am having trouble setting up the for loop. In Stata, I know how to loop through each unique element in a list, but I am having trouble doing that in R. Any help would be appreciated.
School$Teacher  School$Student  School$ScoreNovember  School$ScoreDec  School$TeacherEmail
A               1               35                    45               A@school.org
A               2               43                    65               A@school.org
B               1               66                    54               B@school.org
A               3               97                    99               A@school.org
C               1               23                    45               C@school.org
Your question seems a bit vague and it looks like you want us to write your whole project. Could you share what you have done so far and where exactly you are struggling?
see ?subset
School=data.frame(Teacher=c("A","B"), ScoreNovember=10:11, ScoreDec=13:14)
for (teacher in unique(School$Teacher)) {
teacher_df=subset(School, Teacher==teacher)
MeanScoreNovember=mean(teacher_df$ScoreNovember)
MeanScoreDec =mean(teacher_df$ScoreDec)
# do your plot
# send your email
}
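To fill in the "do your plot" step, here is one possible sketch using base graphics. The sample rows come from the question; the output folder, the file names and the choice of pdf() are assumptions, and the email step is left out because it depends on the mail setup.

```r
# Sample rows from the question (email column omitted)
School <- data.frame(
  Teacher       = c("A", "A", "B", "A", "C"),
  Student       = c(1, 2, 1, 3, 1),
  ScoreNovember = c(35, 43, 66, 97, 23),
  ScoreDec      = c(45, 65, 54, 99, 45),
  stringsAsFactors = FALSE
)

# Hypothetical output folder; tempdir() keeps the sketch side-effect free
out_dir <- file.path(tempdir(), "teacher_plots")
dir.create(out_dir, showWarnings = FALSE)

for (teacher in unique(School$Teacher)) {
  teacher_df <- subset(School, Teacher == teacher)
  # Mean score per month for this teacher's students
  means <- colMeans(teacher_df[, c("ScoreNovember", "ScoreDec")])

  # One PDF per teacher, e.g. teacher_A.pdf
  pdf(file.path(out_dir, paste0("teacher_", teacher, ".pdf")))
  plot(seq_along(means), means, type = "b", xaxt = "n",
       xlab = "Month", ylab = "Mean score", main = paste("Teacher", teacher))
  axis(1, at = seq_along(means), labels = c("November", "December"))
  dev.off()
}

list.files(out_dir)
```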
I think you have 3 questions here, which should be asked separately. How do I:
1. Create graphs
2. Automatically email output
3. Compute a subset mean based on group
For the 3rd one, I like using the plyr package; other people will recommend the data.table or dplyr packages. You can also use aggregate from base R. To get each teacher's mean:
library(plyr)
ddply(School,.(Teacher),summarise,Nov_m=mean(ScoreNovember))
If you want the mean per student per teacher, etc., just add more grouping variables, like:
library(plyr)
ddply(School,.(Teacher,Student),summarise,Nov_m=mean(ScoreNovember))
You could do that for each score column (and then chart it). If your data were long rather than wide, you could also add the date ('November', 'Dec') as a grouping variable in the brackets. Otherwise, compute both means in one call:
library(plyr)
ddply(School,.(Teacher,Student),summarise,Nov_m=mean(ScoreNovember),Dec_m=mean(ScoreDec))
See if that helps with the 3rd, but look at splitting your questions up too.
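For anyone who would rather not install a package, the base R aggregate() mentioned above can produce the same per-teacher means. A small sketch using the sample rows from the question (the object name means is an assumption):

```r
# Sample rows from the question (email column omitted)
School <- data.frame(
  Teacher       = c("A", "A", "B", "A", "C"),
  Student       = c(1, 2, 1, 3, 1),
  ScoreNovember = c(35, 43, 66, 97, 23),
  ScoreDec      = c(45, 65, 54, 99, 45),
  stringsAsFactors = FALSE
)

# cbind() on the left-hand side averages both score columns in one call
means <- aggregate(cbind(ScoreNovember, ScoreDec) ~ Teacher,
                   data = School, FUN = mean)
means
```

The result has one row per teacher, which is a convenient shape to loop over when drawing one chart per teacher.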

R stack alternative

I am trying to write code that takes the values from one column of each of many files and prints out the values of a different column, depending on the values found in the first. I have read the files in, but I am having trouble managing the table. I would like to limit the table to just those two columns, because the files are very large and cumbersome and the other columns are unnecessary. In my attempt to do so I had this line:
tmp<-stack(lapply(inputFiles,function(x) x[,3]))
But ideally I would like to include two columns (3 and 1), not just one, so that I may use a line, such as these ones:
search<-tmp[tmp$values < 100, "Target"]
write(search, file = "Five", ncolumns = 2)
But I am not sure how. I am almost certain that stack is not going to work for more than one column. I tried some different things, similar to this:
tmp<-stack(lapply(inputFiles,function(x) x[,3], x[,1]))
But of course that didn't work, and I don't know where to look. Does anyone have any suggestions?
The taRifx package has a list method for stack that will do what you want. It stacks lists of data.frames.
Untested code:
library(taRifx)
tmp<-stack(lapply(inputFiles,function(x) x[,c(1,3)]))
But you didn't change anything! Why does this work?
lapply() returns a list. In your case, it returns a list where each element is a data.frame.
Base R does not have a special method for stacking lists. So when you call stack() on your list of data.frames, it calls stack.default, which doesn't work.
Loading the taRifx library loads a method of stack that deals specifically with lists of data.frames. So everything works fine since stack() now knows how to properly handle a list of data.frames.
Tested example:
dat <- replicate(10, data.frame(x=runif(2),y=rnorm(2)), simplify=FALSE)
str(dat)
stack(dat)
            x           y
1  0.42692948  0.32023455
2  0.75388820  0.24154125
3  0.64035957  1.96580059
4  0.47690790 -1.89772855
5  0.41668993  0.78083412
6  0.12643784  0.38029833
7  0.01656855  0.51225268
8  0.40653094  1.09408159
9  0.94236491 -0.13410923
10 0.05578115  1.12475364
11 0.75651062 -0.65441493
12 0.48210444  1.67325343
13 0.95348755  0.04828449
14 0.02315498 -0.28481193
15 0.27370762  0.43927826
16 0.83045889  0.75880763
17 0.40049367  0.06945058
18 0.86212662  1.49918712
19 0.97611629  0.13959291
20 0.29107186  0.64483646
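If installing taRifx is not an option, base R can probably do the same job with do.call(rbind, ...). A sketch with two small made-up data frames standing in for inputFiles; the column names Target and values are assumptions chosen to match the filtering line in the question:

```r
# Hypothetical stand-ins for the files that were read in
inputFiles <- list(
  data.frame(Target = c("a", "b"), other = 1:2, values = c(50, 150),
             stringsAsFactors = FALSE),
  data.frame(Target = c("c", "d"), other = 3:4, values = c(99, 100),
             stringsAsFactors = FALSE)
)

# Keep columns 1 and 3 of every data frame, then bind the pieces row-wise
tmp <- do.call(rbind, lapply(inputFiles, function(x) x[, c(1, 3)]))

# Same filter as in the question
search <- tmp[tmp$values < 100, "Target"]
search
```

rbind matches columns by name, so this assumes every file uses the same names for the two columns that are kept.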
