Trying to find mean of each column in a data set

Hello everyone, I am fairly new to R programming and was wondering if someone could help me out. I was playing with R and wanted to write a function that returns a vector of the means of each column in a data set that the user passes in as an argument. I am trying to do it without mean or the apply functions, so I am computing it manually, and I feel I am very close to finishing it. Could someone check it and tell me where I made an error?
Here is my code:
findMeans<- function(data)
{
  meanVec <- numeric()
  for(i in 1:6)
  {
    mean=0
    for( j in 1:153)
    {
      value=0
      count=0
      if(is.na(data[j,i])==FALSE)
      {
        value= value + data[i,j]
        count=count+1
      }
      else
      {
        value= value +0
      }
    }
    mean =value/count
    meanVec[i]<-mean
  }
  meanVec
}
When I try to print the vector, I just get this:
> meanVec
numeric(0)
Could anyone shed some light on what I am doing wrong?

If you're looking for function-writing practice and are already aware of the colMeans function, there are a couple of errors I spotted.
1) I assume that with 1:6 you're going through each column of your data frame, and with 1:153 through each row. If so, your value=0 and count=0 statements should be moved up a level, next to mean=0. Otherwise you reset both to zero on every row, so the function only ever reports the last value it comes across.
2) In the line value= value + data[i,j], you need data[j,i] instead. You reversed the row and column indices.
With those two changes, your function seems to work for a data set with 6 columns and 153 rows. For more practice, I'd recommend generalizing the function to any number of columns and rows.
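
As a rough illustration of the two fixes plus the suggested generalization, here is a minimal sketch. It assumes every column is numeric; the toy data frame is made up for the example, and ncol/nrow replace the hard-coded 6 and 153.

```r
# Sketch: column means without mean() or the apply family.
# Assumes all columns of `data` are numeric.
findMeans <- function(data) {
  meanVec <- numeric(ncol(data))
  for (i in seq_len(ncol(data))) {
    # Reset the accumulators once per column, not once per row
    value <- 0
    count <- 0
    for (j in seq_len(nrow(data))) {
      # Rows are indexed first, columns second: data[j, i]
      if (!is.na(data[j, i])) {
        value <- value + data[j, i]
        count <- count + 1
      }
    }
    meanVec[i] <- value / count
  }
  meanVec
}

# Hypothetical toy data with some missing values
toy <- data.frame(a = c(1, 2, NA, 3), b = c(10, NA, NA, 20))
result <- findMeans(toy)
print(result)
```

Note that the vector is returned by the function, so it has to be captured (result <- findMeans(toy)); the meanVec inside the function is local and is not the same object as a meanVec typed at the console.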

Related

Mean value for different groups

I am stuck with a 'for' loop and would greatly appreciate some help.
I have a data frame called 'df' that includes the number of people per household (household_size), ranging from 0 (I replaced the missing values with 0) to 8, as well as the number of cars (car).
My aim is to write some quick code that computes the average number of cars for each household size.
I tried the following:
avg <- function(df){
  i <- df$household_size
  for (i in 0 : 8){
    print(mean(df$car))
  }
}
I'm pretty sure I'm missing something really basic here, but I don't know what.
Thanks everyone for your input.
I wouldn't have used a function for this. However, this is an exercise as part of an introductory coding with R module that specifically requires a for-loop.

Here is a solution that prints the mean for each household size using a for loop. Let me know if it works.
for(i in unique(df$household_size)){
  # the column name must be quoted when indexing a data frame
  print(paste(i,' : ',mean(df[df$household_size%in%i,"car"])))
}
As mentioned in a comment, I dropped the function because I don't see the point of having it. If a function is mandatory, you can use lapply, which in my view behaves a bit like a for loop:
lapply(unique(df$household_size), function(i){
  # the column name must be quoted when indexing a data frame
  return(paste(i,' : ',mean(df[df$household_size%in%i,"car"])))
})
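
If the exercise needs both a function and a for loop, one possible shape is sketched below. The function name avg_cars_by_size and the sample data are made up for illustration; only the column names household_size and car come from the question.

```r
# Hypothetical sample data using the column names from the question
df <- data.frame(
  household_size = c(1, 1, 2, 2, 2, 3),
  car            = c(0, 1, 1, 2, 3, 2)
)

# Sketch: return a named vector with the mean number of cars per household size
avg_cars_by_size <- function(df) {
  sizes <- sort(unique(df$household_size))
  means <- numeric(length(sizes))
  names(means) <- sizes
  for (k in seq_along(sizes)) {
    # Keep only the rows belonging to the current household size
    means[k] <- mean(df$car[df$household_size == sizes[k]])
  }
  means
}

result <- avg_cars_by_size(df)
print(result)
```

The original attempt printed mean(df$car) on every iteration, which is the overall mean; the subsetting inside the loop is what makes the result depend on the household size.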

How to refer to previous row in data table (R) without using shift function?

I am trying to use data.table to make my large modeling work easier.
I ran into this problem and have not figured out a good way to solve it.
Here's my example:
set.seed(1001)
dt1<-data.table("Country"=c("Algeria","Mongolia"),
"Year"=c(2000:2020),
"var1"=runif(20,10000,45000),
"var2"=0,
"var3"=0,
"var4"=0)
setorder(dt1,Country)
I would like to update var2 using the previous row's values within a specific period,
so I tried:
dt1[Year>2000&Year<2010,var2:=var1[i]-sum(.SD)[i-1],by=Country,.SDcols=c(var2,var3,var4)]
Obviously this did not work. The problem is that var2 and the sum of the other variables in the previous row need to be updated simultaneously, so I don't think the shift function would do the job.
For now I am using a very cumbersome for loop, but it works. Here is my for loop:
for(i in 1:nrow(dt1)){
  if (i <=10){
    for (j in 4:6){
      if(j==4){
        dt1[[i,j]]=dt1[[i,3]]-rowSums(dt1[[i-1,c(4:6)]])
      }
      else{
        dt1[[i,j]]=dt1[[i-1,j-1]] * 0.001
      }
    }
  }
}
Any suggestion will be much appreciated!
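
One way to read the loop above is as a recurrence: each row's var2, var3 and var4 depend on the already-updated values in the row before it, which is why a single vectorised shift cannot express it. The sketch below isolates that recurrence in a plain base R helper that works on one group's vectors. The name fill_recursive is hypothetical, the first row is assumed to be left as given, and the Year restriction from the question is not handled.

```r
# Hypothetical helper: fills var2..var4 row by row for a single group.
# Row 1 is left untouched; every later row uses the updated previous row.
fill_recursive <- function(var1, var2, var3, var4) {
  for (i in seq_along(var1)[-1]) {
    prev_sum <- var2[i - 1] + var3[i - 1] + var4[i - 1]
    var2[i] <- var1[i] - prev_sum
    var3[i] <- var2[i - 1] * 0.001
    var4[i] <- var3[i - 1] * 0.001
  }
  list(var2 = var2, var3 = var3, var4 = var4)
}

# Made-up input to show the behaviour
out <- fill_recursive(
  var1 = c(100, 200, 300),
  var2 = c(0, 0, 0),
  var3 = c(0, 0, 0),
  var4 = c(0, 0, 0)
)
print(out)
```

Because the helper only sees plain vectors, it could be applied once per Country group so that the first row of one country never reads the last row of another.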

Cannot figure out how to use IF statement

I want to create a categorical variable for my DB: a "Same_Region" group that includes all the people who live and work in the same Region, and a "Diff_Region" group for those who don't. I tried to use an if statement, but I don't know how to properly say "if the variables Region of residence and Region of work are the same, return...". It's the very first time I have tried to approach R by myself, and I feel a little bit lost.
I tried treating the two variables (two-letter codes, e.g. "BO") as characters and using the grep command, but that led to no results.
Then I tried converting both variables to factors, and nothing much changed.
----In R-----
extractSamepr <- function(RegionOfRes, RegionOfWo){
  if(RegionOfRes== RegionOfWo){
    return("SamePr")
  }
  else {
    return("DiffPr")
  }
}
SamePr <- NULL
for (i in 1:nrow(Data.Base)) {
  SamePr <- c(SamePr, extractSamepr(Data.Base[i, "RegionOfRes", "RegionOfWo"]))
}

The ifelse approach proposed in @deepseefan's comment is the standard way of solving this type of problem.
Here is another one. It uses the fact that FALSE/TRUE are coded as the integers 0/1: comparing the two columns gives a logical vector, and adding 1 to it gives a vector of 1/2 values. The function's final instruction uses this result to index a vector holding the two possible outcomes.
extractSamepr <- function(DF){
  i <- 1 + (DF[["RegionOfRes"]] == DF[["RegionOfWo"]])
  c("DiffPr", "SamePr")[i]
}
Data.Base$SamePr <- extractSamepr(Data.Base)
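
For completeness, the ifelse approach mentioned at the start of this answer might look like the sketch below. The three sample rows are made up; the column names and labels are the ones from the question. The columns are kept as character here, since comparing two factors with different level sets raises an error.

```r
# Hypothetical sample data; columns stored as character, not factor
Data.Base <- data.frame(
  RegionOfRes = c("BO", "MI", "RM"),
  RegionOfWo  = c("BO", "TO", "RM"),
  stringsAsFactors = FALSE
)

# ifelse is vectorised: it compares every row at once, so no loop is needed
Data.Base$SamePr <- ifelse(
  Data.Base$RegionOfRes == Data.Base$RegionOfWo,
  "SamePr",
  "DiffPr"
)
print(Data.Base)
```

If the columns are already factors, wrapping each one in as.character() before the comparison avoids the level-set problem.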

Multiple regressions with loop in loop in R

I want to run the following regressions. The variable causing the problem is EP, a dummy variable for which I must check different cases; z (length = 1000) is the threshold variable. I want to create 1000 different versions of EP from z and save the coefficients. I use a loop within a loop, but the results are completely wrong. The code runs and does not throw an error. The square brackets and parentheses are as in the code I run. The problem is that there is a huge delay: after two hours it is still running.
I reduced the sample by 99% and still did not get a result, although the code ran without a problem.
I do not want anything special: just to run a different regression for each value of z and store the estimates. I cannot understand why it takes so long. Any ideas?
for (k in 1:1000){
z<-u[k]
for (i in 1:length(dS)){
if (dS[i]>=z) {
EP[i]=1
} else {
EP[i]=0
}
fitT <- dynlm(dR ~ L(dR,1)+L(EN)+L(EP)+L(ΚΜ,1)
prob[[k]] <- summary(fitT)$coefficients[1, 2]
}

You don't have a closing } for the i-loop, and you don't have a closing ) for dynlm.
Note that you can replace your whole i-loop with
EP <- as.integer(dS >= z)
Next time you ask a question, be clear and specific. What do you mean by "I use a loop in loop but the results are completely wrong"? Is there an error message, etc.?
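
To show how the pieces fit together, here is a minimal sketch of the outer loop with the vectorised dummy and a preallocated result vector. It is not the asker's model: the data are simulated, base lm stands in for dynlm so the example runs without extra packages, there are only three made-up thresholds, and the stored quantity (the EP estimate) is chosen for illustration.

```r
set.seed(1)
n  <- 50
dS <- rnorm(n)   # simulated stand-in for the asker's dS
dR <- rnorm(n)   # simulated stand-in for the asker's dR
u  <- unname(quantile(dS, c(0.25, 0.5, 0.75)))  # hypothetical thresholds

# Preallocate the result vector instead of growing it inside the loop
est <- numeric(length(u))
for (k in seq_along(u)) {
  z  <- u[k]
  EP <- as.integer(dS >= z)   # vectorised replacement for the inner i-loop
  fit <- lm(dR ~ EP)          # stand-in for the dynlm call
  est[k] <- coef(summary(fit))["EP", "Estimate"]
}
print(est)
```

With the braces balanced, the regression runs once per threshold rather than once per observation per threshold, which is the likely source of the long run time in the original nesting.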

store results of for loop in unique objects

Here is a simple loop:
for (i in seq(1,30)) {
  mdl<-i
}
How do I get 30 mdl objects rather than just one (which happens because mdl is overwritten at every iteration of the loop)? What I want is 30 objects with names like mdl1, mdl2, ..., mdl30.
I tried this:
for (i in seq(1,30)) {
  mdli<-i
}
But if I type mdl1, it says mdl1 is not found, whereas typing mdli gives me the value of i=5
Thank you

You can create your storage variable beforehand without specifying how many values it will hold. If you want a separate variable for each value, take a look at the paste function.
x<- NULL
for (i in 1:10){
  x[i] <- i*2
}
*edit: The comment above is right. This way is not the most efficient one, but I still use it when computation time is not an issue.
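
As a sketch of both options, the example below first stores the 30 results in one named list and then creates separate variables by combining paste0 with assign. The name mdl_list is made up for the example; the mdl1 ... mdl30 names come from the question.

```r
# Option 1: one preallocated list, with elements named mdl1 ... mdl30
mdl_list <- vector("list", 30)
for (i in seq_len(30)) {
  mdl_list[[i]] <- i
}
names(mdl_list) <- paste0("mdl", seq_len(30))
print(mdl_list$mdl5)

# Option 2: separate variables mdl1 ... mdl30 in the current environment
for (i in seq_len(30)) {
  assign(paste0("mdl", i), i)
}
print(get("mdl30"))
```

The list version is usually easier to work with afterwards, since the results can be looped over or passed to lapply as a single object.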
