How do you find the median of 2 columns using R?

I am trying to compute the median vector of a data set s with columns A1 and B1. The median vector is the median for each observation (row) across both columns.
I tried to do this and it did not work.
median(s[c("A1","B1")])
Is there another way to do it?

The median of two observations is simply their mean, so you can use rowMeans(s[,c("A1","B1")]). Equivalently, apply(s[,c("A1","B1")],1,median), which also works for more than two columns.
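To illustrate the two approaches above, here is a minimal sketch on a made-up data frame. Only the column names A1 and B1 come from the question; the values are hypothetical.

```r
# Hypothetical example data; only the column names A1 and B1 come from the question
s <- data.frame(A1 = c(1, 4, 10), B1 = c(3, 8, 2))

# Row-wise mean of the two columns (equals the median when there are exactly two values)
row_means <- rowMeans(s[, c("A1", "B1")])

# Row-wise median, which also generalizes to three or more columns
row_medians <- apply(s[, c("A1", "B1")], 1, median)

# With complete (non-missing) data, both approaches agree
all.equal(unname(row_means), unname(row_medians))
```

If the columns can contain NA, the two calls may need na.rm = TRUE, passed as rowMeans(..., na.rm = TRUE) or apply(..., 1, median, na.rm = TRUE).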

Another solution, if what you want is the median of each column rather than of each row:
library(plyr)
colwise(median)(s[c("A1", "B1")])
which has the advantage of returning a data frame.

Related

Median of new data using previous data statistics

How can I find the combined median if I have percentiles of the previous data, but not all of its elements, and a new list of data?

How to conduct data transformation on almost equal-valued variables?

While doing data transformation on different variables, I was unable to transform variables which have high values and almost the same range of values. I want to know how to transform this kind of data.
MonthlyRate
Min. : 2094
1st Qu.: 8047
Median :14236
Mean :14313
3rd Qu.:20462
Max. :26999
This is the summary of a variable.
It sounds like you are trying to normalize different variables so that they are on the same unitless scale. R has a built-in scale function
scale(my_dataframe)
which centers each column on its mean and divides it by its standard deviation, so every column ends up with mean 0 and standard deviation 1 and its values are measured in standard deviations from the mean. This only works on numeric columns, so if your data frame includes other types of data you can normalize each numeric vector individually
my_dataframe$monthly_rate <- scale(my_dataframe$monthly_rate)
and achieve the same effect.
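As a rough sketch of the column-by-column approach, the following standardizes only the numeric columns of a mixed data frame. The data frame, the department column, and the specific values are hypothetical and only illustrate the idea.

```r
# Hypothetical data frame; the department column and all values are made up
my_dataframe <- data.frame(
  monthly_rate = c(2094, 8047, 14236, 20462, 26999),
  department   = c("a", "b", "a", "b", "a")
)

# Identify the numeric columns and standardize only those
numeric_cols <- sapply(my_dataframe, is.numeric)
my_dataframe[numeric_cols] <- lapply(
  my_dataframe[numeric_cols],
  function(x) as.numeric(scale(x))  # as.numeric drops the matrix attributes scale() returns
)

# After scaling, the column has mean 0 and standard deviation 1
mean(my_dataframe$monthly_rate)
sd(my_dataframe$monthly_rate)
```

Wrapping scale() in as.numeric() is a design choice in this sketch: scale() returns a one-column matrix with centering and scaling attributes, and converting it back keeps the data frame column a plain numeric vector.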

Most efficient way to replace NAs in a data frame based on a subset of other row factors (using median as an estimate) in R

I would like to estimate the values of a numeric variable in a data frame based on the median of the same variable given other factors. I would then like to replace the NA's for the numeric Variable with these estimates.
I have a data frame like this:
Fac1 Fac2 Var1
A a 20
A b 30
B a 5
B b 10
.
.
.
I have used the aggregate function to find these medians for each combination of factors:
A a = 22
A b = 28
B a = 12
B b = 8
So any NA's in Var1 would be replaced with the corresponding median based on the combinations of the factors.
I understand that this may be done by replacing the missing values for each subset of the data individually, however that would become tedious quickly given more than two factors.
I was wondering if there are some more efficient ways to get this result.
You haven't provided sample data, but based on your question I think this should work.
As @Roland mentioned, there is no need to calculate the medians separately.
Assume your data frame is called df. For every group (here defined by Fac1 and Fac2) we calculate the median, ignoring the NA values. We then select only the positions where Var1 is NA and replace them with their group's median.
df$Var1[is.na(df$Var1)] <- ave(df$Var1, df$Fac1, df$Fac2,
                               FUN = function(x) median(x, na.rm = TRUE))[is.na(df$Var1)]
UPDATE
At the OP's request, here is some information about the ave function.
The first argument to ave is the vector you want to operate on; here that is Var1, whose median we want. All the remaining unnamed arguments are grouping variables, and there can be any number of them; here they are Fac1 and Fac2. Finally, FUN is the function applied to the first argument within every group defined by the grouping variables, so for each unique combination of Fac1 and Fac2 we compute that group's median. ave returns a vector of the same length as Var1, with each position holding its own group's result.
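A small worked sketch of the ave approach may help. The data below is hypothetical, constructed so that the group medians match the ones listed in the question (22, 28, 12 and 8).

```r
# Hypothetical data in the shape described in the question:
# three rows per Fac1/Fac2 combination, one of them missing
df <- data.frame(
  Fac1 = rep(c("A", "B"), each = 6),
  Fac2 = rep(rep(c("a", "b"), each = 3), times = 2),
  Var1 = c(20, 24, NA, 30, 26, NA, 5, 19, NA, 10, 6, NA)
)

# For every row, the median of its Fac1/Fac2 group, ignoring NAs
group_median <- ave(df$Var1, df$Fac1, df$Fac2,
                    FUN = function(x) median(x, na.rm = TRUE))

# Fill only the missing entries with their group's median
missing <- is.na(df$Var1)
df$Var1[missing] <- group_median[missing]

df$Var1
```

Because grouping variables are just extra arguments to ave, adding a third factor only means passing one more vector, which avoids handling each subset by hand.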

R - Mean calculation for entire data instead of assigning each column individually

I am a beginner with R and I have a question about simple functions such as mean or standard deviation for a big data set. My data shows monthly returns for hedge funds for the past 30 years and has 1550 columns for all hedge funds. I saw that I can calculate the mean with the mean function for a specific column by referring to the column with the name of my dataset and a $ and the no. of the column. However, I was wondering how I can get the mean for every hedge fund (which is every column) without assigning every single column. Thanks in advance for your help!
We can use colMeans
colMeans(df1, na.rm=TRUE)
where 'df1' is the dataset.
or, as another option, loop through the columns and calculate the mean of each one
vapply(df1, mean, FUN.VALUE = numeric(1), na.rm = TRUE)
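As a minimal sketch, the two options can be compared on a tiny made-up data set. The fund names and return values here are hypothetical stand-ins for the 1550 fund columns.

```r
# Hypothetical returns data: one column per fund, with one missing value
df1 <- data.frame(
  fund_a = c(0.01, 0.03, NA),
  fund_b = c(0.02, -0.01, 0.05)
)

# Option 1: vectorized column means
col_means <- colMeans(df1, na.rm = TRUE)

# Option 2: apply mean() to each column, requiring one number per column
col_means_loop <- vapply(df1, mean, FUN.VALUE = numeric(1), na.rm = TRUE)

# The same pattern extends to other summaries, such as the standard deviation
col_sds <- vapply(df1, sd, FUN.VALUE = numeric(1), na.rm = TRUE)

# Both options return the same named vector
all.equal(col_means, col_means_loop)
```

The vapply form is handy for statistics like sd that have no dedicated column-wise function in base R.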

Which function in SPSS emulates R summary() function?

I'm switching from R to SPSS for a specific project (I'm not allowed to use SPSS/R integration) and need to quickly summarize a big dataset. In R, it's quite simple: one can use the summary() function and in a few seconds obtain the summary of each variable.
I would like to know whether there is a function in SPSS that does the same job. If not, how could I achieve it?
For the non-R users: summary.default would return labelled values for Min., 1st Quartile, Median, Mean, 3rd Quartile and Max. for each numeric column, and counts of the 6 most common items plus the count of the "(Other)" category for a factor or character variable.
Descriptives comes close.
descriptives var1 var2 var3
/statistics = mean median stddev variance min max .
(I'm not sure about quartiles).
If you have a mixture of continuous and categorical variables, use DESCRIPTIVES or SUMMARIZE for continuous and FREQUENCIES for categorical. You can use the SPSSINC SELECT VARIABLES extension command installed with Statistics to create macros listing variables according to the measurement level and then use the appropriate macro for each command.
