Plotting histograms with a for loop in R

I am trying to find a more efficient way to plot these five histograms. For example, how would I use a for loop to produce the plots below in R?
hist(dat$train[dat$train[,1]==7,10])
hist(dat$train[dat$train[,1]==7,2])
hist(dat$train[dat$train[,1]==7,17])
hist(dat$train[dat$train[,1]==7,200])
hist(dat$train[dat$train[,1]==7,56])

For this kind of question, you should ideally post some sample data for dat. In this case, only one value changes from call to call: the column index. A for loop can iterate over a vector of these values. Conventionally, the loop variable is called i. I did not change your hist statement except for inserting the i:
for (i in c(10, 2, 17, 200, 56))
  hist(dat$train[dat$train[, 1] == 7, i])
Personally, I prefer descriptive variable names, so I would replace the i with breaks, like so:
for(breaks in c(10, 2, 17, 200, 56))
hist(dat$train[dat$train[,1]==7, breaks])
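Since the question includes no sample data, here is a minimal sketch of the same loop on simulated data. The shape of dat$train (500 rows, 200 columns, with a 0 to 9 label in column 1) is an assumption made only so the example runs; the titles and the 2 x 3 panel layout are optional additions.

```r
# Simulated stand-in for dat (assumption: the real dat$train is a numeric
# matrix whose first column holds a label such as 7).
set.seed(42)
n_rows <- 500
n_cols <- 200
train <- matrix(rnorm(n_rows * n_cols), nrow = n_rows, ncol = n_cols)
train[, 1] <- rep(0:9, length.out = n_rows)  # label column, 50 rows per label
dat <- list(train = train)

columns <- c(10, 2, 17, 200, 56)   # columns to plot
rows <- dat$train[, 1] == 7        # compute the row filter once

old_par <- par(mfrow = c(2, 3))    # show all five plots on one page
results <- list()
for (i in columns) {
  # hist() draws the plot and also returns the binned counts
  results[[as.character(i)]] <- hist(dat$train[rows, i],
                                     main = paste("Column", i),
                                     xlab = "Value")
}
par(old_par)                       # restore the previous layout
```

Computing the row filter once outside the loop avoids repeating the same comparison for every plot.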

Related

Writing an R function, using "for-loop", to compute the arithmetic mean from vectors with numeric data?

I need to write a function that does the above, basically. I would like it to be able to apply it to any numeric vector. I am very new to R so I'm struggling to get this off the ground. I appreciate any help!
Rather than write your own function, use mean(), which comes with base R:
numbers <- c(11, 3, 4.2, 0, -12)
numbers
result <- mean(numbers)
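If the for loop itself is the point of the exercise, a minimal sketch could look like the following. The name mean_loop is made up for this illustration, and the input checks are an assumption about what "any numeric vector" should mean.

```r
# Arithmetic mean computed with an explicit for loop.
# mean_loop is a hypothetical name, not a base R function.
mean_loop <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0)  # reject non-numeric or empty input
  total <- 0
  for (value in x) {
    total <- total + value                 # running sum
  }
  total / length(x)
}

numbers <- c(11, 3, 4.2, 0, -12)
result <- mean_loop(numbers)
all.equal(result, mean(numbers))           # compare against base R
```

Unlike mean(), this sketch has no na.rm argument, so a vector containing NA returns NA.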

Coding a new variable according to the range of existing variables

I wrote the code below using the ifelse function, but it only ever assigns "worker" to a$age_group, the new column I want to create based on the variable age.
I don't know why. Can you help me debug my code?
for(i in 1:length(a$age))
{
ifelse(a$age<17, a$age_group<-"mid",
ifelse(a$age<20, a$age_group<-"high",
ifelse(a$age<24, a$age_group<-"univ",
a$age_group<-"worker")))
}
We can use cut or findInterval:
with(a, cut(age, breaks=c(-Inf, 17, 20, 24, Inf),
            labels=c('mid', 'high', 'univ', 'worker'),
            right=FALSE))
Your code is more complicated than it needs to be, and I suspect it is also quite slow. Keep in mind that ifelse is vectorized, so you don't need the for loop here. The main problem, however, is that the assignments are made inside the ifelse calls: each one overwrites the whole a$age_group column, so later assignments overwrite earlier ones. Put the assignment outside the ifelse calls instead. Try this:
a$age_group <- ifelse(a$age < 17, "mid",
                ifelse(a$age < 20, "high",
                 ifelse(a$age < 24, "univ", "worker")))
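As a quick check, the nested ifelse can be compared with findInterval on made-up ages that include the boundary values. The data frame below and the column name age_group_fi are assumptions for illustration only.

```r
# Made-up ages, including the boundaries 17, 20 and 24.
a <- data.frame(age = c(15, 17, 19, 20, 23, 24, 30))

# Nested, vectorized ifelse with the assignment outside the calls.
a$age_group <- ifelse(a$age < 17, "mid",
                ifelse(a$age < 20, "high",
                 ifelse(a$age < 24, "univ", "worker")))

# findInterval counts how many cut points are <= age (0 to 3),
# so adding 1 gives an index into the labels.
labels <- c("mid", "high", "univ", "worker")
a$age_group_fi <- labels[findInterval(a$age, c(17, 20, 24)) + 1]

a
identical(a$age_group, a$age_group_fi)
```

Both columns treat each interval as closed on the left, so an age of exactly 17 falls into "high" in either version.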

Evaluating a function for given vectors with different lengths in R

I have written a predictor function in R and tried several combinations of inputs to see how the output changes.
The problem is that my function takes 4 numeric parameters, and I want to test it on all possible combinations of elements taken from specified vectors (the vectors have different lengths).
I've tried the replicate, apply and sapply functions, but I couldn't get the output that I wanted. I can write a for loop for each parameter, but with several parameters I need several nested loops, and I don't know how to store the values after that many loops.
So my function looks like this;
predictVAR(Dataset, ColumnNumber, Correlation, Lags, FcastHorizon)
The Dataset is kept constant (or I can just remove it from the parameter list and assign it as the default data frame in the function).
ColumnNumber takes values between 1 and 20 (each corresponds to a variable in Dataset),
Correlation takes values in seq(0.15,0.9,by=0.15),
Lags takes values in c(10, 20, 30, 50, 80, 100),
and finally FcastHorizon takes values from c(20,252).
So if I did this manually and evaluated each combination from these vectors, it would look like
predictVAR(1, 0.15, 10, 20) => predictVAR(1, 0.15, 10, 252) => predictVAR(1, 0.15, 20, 20) => predictVAR(1, 0.15, 20, 252) ... and finally predictVAR(20, 0.9, 100, 252)
By the end of the process, I should obtain 20*6*6*2=1440 different outputs and the corresponding input specifications.
Which function would help me obtain these results? I have read about the apply family of functions, but I need to evaluate the model on all cross combinations and I haven't found a solution so far.
Regards
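One common way to approach this is to build every combination with expand.grid and then call the function once per row with mapply. The sketch below assumes that Dataset has been dropped from the parameter list, as the question suggests, and that the function returns a single number; the body of predictVAR here is only a placeholder, not the asker's model.

```r
# Placeholder for the real predictor (assumption: it returns one number).
predictVAR <- function(ColumnNumber, Correlation, Lags, FcastHorizon) {
  ColumnNumber + Correlation + Lags + FcastHorizon
}

# One row per combination of the four parameter vectors.
grid <- expand.grid(
  ColumnNumber = 1:20,
  Correlation  = seq(0.15, 0.9, by = 0.15),
  Lags         = c(10, 20, 30, 50, 80, 100),
  FcastHorizon = c(20, 252)
)

# Call predictVAR once per row and keep the result next to its inputs.
grid$Output <- mapply(predictVAR,
                      grid$ColumnNumber, grid$Correlation,
                      grid$Lags, grid$FcastHorizon)

nrow(grid)   # 20 * 6 * 6 * 2 combinations
head(grid)
```

If the real function returns something other than a single number, passing SIMPLIFY = FALSE to mapply keeps the results as a list, one element per row of grid.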

Dynamically adding values to dynamically created vectors

I just started learning to code in R. I need to keep adding an unknown number of values to different vectors (the number of vectors is also unknown). I tried to implement this as follows:
clust_oo = c()
clust_oo[k] = c(clust_oo[k],init_dataset[k,1])
Without the [k], the above code works, but since I don't know the number of vectors/lists, I have to use [k] to tell them apart. For example, clust_oo[1] could hold the values 1, 23, 45, clust_oo[2] the values 4, 40, and clust_oo[3] the values 44, 67, 455, 885, with the values added dynamically.
Is this the right way to proceed for this?
Try:
clust_oo = c()
for(i in 1:3)
clust_oo[length(clust_oo)+1] = i
clust_oo
[1] 1 2 3
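The snippet above grows a single vector. To keep a separate vector per k, one option is a list indexed with [[ ]]. The following is a minimal sketch under that assumption; add_to_cluster and the incoming data frame are hypothetical names used only for illustration.

```r
# A list holds one vector per cluster; each vector can grow independently.
clust_oo <- list()

# Hypothetical helper: append value to cluster k, creating it if needed.
add_to_cluster <- function(clusters, k, value) {
  key <- as.character(k)
  # clusters[[key]] is NULL for a cluster that does not exist yet,
  # and c(NULL, value) is just value.
  clusters[[key]] <- c(clusters[[key]], value)
  clusters
}

# Made-up stream of (cluster, value) pairs arriving in arbitrary order.
incoming <- data.frame(
  k     = c(1, 2, 1, 3, 1, 2, 3, 3, 3),
  value = c(1, 4, 23, 44, 45, 40, 67, 455, 885)
)

for (row in seq_len(nrow(incoming))) {
  clust_oo <- add_to_cluster(clust_oo, incoming$k[row], incoming$value[row])
}

clust_oo[["1"]]   # values collected for cluster 1
clust_oo[["3"]]   # values collected for cluster 3
```

Character keys are used here because looking up a missing name in a list returns NULL, whereas a numeric index beyond the end of the list raises an error.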

For loop not counting correctly

I can't for the life of me figure out what is going on here. I have a data frame that has several thousand rows. One of the columns is "name" and the other columns hold various factors. I'm trying to count how many unique rows (i.e. sets of factors) belong to each "name".
Here is the loop that I am running as a script:
names<-as.matrix(unique(all.rows$name))
count<-matrix(1:length(names))
for (i in 1:length(names)) {
count[i]<-dim(unique(subset(all.rows,name==names[i])[,c(1,3,4,5)]))[1]
}
When I run the line inside the for loop from the console and replace "i" with an arbitrary number (e.g. 10, 27, 40, ...), it gives me the correct count. But when I run this line inside the for loop, the counts all end up the same. I can't figure out why it's not working. Any ideas?
Your code works for me:
# Sample data.
set.seed(1)
n=10000
all.rows=data.frame(a=sample(LETTERS,n,replace=T),b=sample(LETTERS,n,replace=T),name=sample(LETTERS,n,replace=T))
names<-as.matrix(unique(all.rows$name))
count<-matrix(1:length(names))
for (i in 1:length(names)) {
count[i]<-dim(unique(subset(all.rows,name==names[i])[,c(1,2)]))[1]
}
t(count)
If you want to stick with a for loop, this is a little clearer:
count<-c()
for (i in unique(all.rows$name))
  count[i]<-nrow(unique(all.rows[all.rows$name==i,names(all.rows)!='name']))
count
But using by would be very concise:
c(by(all.rows,all.rows$name,function(x) nrow(unique(x))))
You can do this with much simpler code. Try just pasting together the factor values in each row and then using tapply. Here is a working example:
data(trees)
trees$name <- rep(c('elm', 'oak'), length.out = nrow(trees))
trees$HV <- with(trees, paste(Height, Volume))
tapply(trees$HV, trees$name, function (x) length(unique(x)))
The last command gives you the counts that you need. As far as I can tell, the analogous code given your variable names is
# MARGIN = 1 applies paste to each row (MARGIN = 2 would paste each column instead)
all.rows$factorCombo <- apply(all.rows[, c(1, 3:5)], 1, function (x) paste(x, collapse = ''))
tapply(all.rows$factorCombo, all.rows$name, function (x) length(unique(x)))
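To see the paste and tapply approach on something small, here is a sketch with a made-up all.rows in which column 2 is name and columns 1, 3, 4 and 5 are the factor columns, mirroring the column positions in the question. The column names f1 to f4, the values, and the "|" separator are assumptions.

```r
# Made-up data: name is column 2; columns 1, 3, 4, 5 are the factors.
all.rows <- data.frame(
  f1   = c("a", "a", "b", "a", "a"),
  name = c("x", "x", "x", "y", "y"),
  f2   = c("p", "p", "q", "p", "p"),
  f3   = c("u", "u", "u", "u", "u"),
  f4   = c("k", "k", "k", "k", "k"),
  stringsAsFactors = FALSE
)
factor_cols <- c(1, 3, 4, 5)

# Paste the factor values of each row into one key; the separator keeps
# values such as "ab" + "c" and "a" + "bc" from colliding.
combo <- do.call(paste, c(all.rows[, factor_cols], sep = "|"))

# Number of distinct keys per name.
counts <- tapply(combo, all.rows$name, function(x) length(unique(x)))
counts

# Cross-check: count unique rows directly within each name.
check <- sapply(split(all.rows[, factor_cols], all.rows$name),
                function(d) nrow(unique(d)))
check
```

Here x has two distinct factor combinations and y has one, and both methods should agree on that.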
