Median of new data using previous data statistics - math

How can I find the combined median if I have percentiles of the previous data, but not all of its elements, and a new list of data?

Related

How to conduct data transformation on almost equal-valued variables?

While doing data transformation on different variables, I was unable to transform variables which have higher values and almost the same range of values. How do I transform this kind of data?
MonthlyRate
Min. : 2094
1st Qu.: 8047
Median :14236
Mean :14313
3rd Qu.:20462
Max. :26999
This is the summary of a variable.
It sounds like you are trying to normalize different variables so that they are on the same unitless scale. R has a built-in scale function
scale(my_dataframe)
which centers each column on its mean and divides it by its standard deviation, so that every column is measured in standard deviations from the mean. This only works on numeric columns, but if your data frame includes other types of data you can normalize each numeric vector individually
my_dataframe$monthly_rate <- scale(my_dataframe$monthly_rate)
and achieve the same effect.
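For a data frame that mixes numeric and non-numeric columns, the per-column approach above could be sketched roughly as follows. The data frame contents and the helper name is_num are made up for illustration; only scale, vapply and lapply come from base R.

```r
# Hypothetical data frame with one numeric and one non-numeric column.
my_dataframe <- data.frame(
  monthly_rate = c(2094, 8047, 14236, 20462, 26999),
  department   = c("a", "b", "a", "c", "b")
)

# Flag the numeric columns (is_num is an illustrative name).
is_num <- vapply(my_dataframe, is.numeric, logical(1))

# Standardize only the numeric columns; scale() returns a one-column
# matrix for a vector, so as.numeric() turns it back into a plain vector.
my_dataframe[is_num] <- lapply(
  my_dataframe[is_num],
  function(x) as.numeric(scale(x))
)

# The scaled column should now have mean 0 and standard deviation 1
# (up to floating-point error).
mean(my_dataframe$monthly_rate)
sd(my_dataframe$monthly_rate)
```

The non-numeric column is left untouched, which avoids the error scale would raise on the whole data frame.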

R - Mean calculation for entire data instead of assigning each column individually

I am a beginner with R and I have a question about simple functions such as mean or standard deviation for a big data set. My data shows monthly returns for hedge funds for the past 30 years and has 1550 columns for all hedge funds. I saw that I can calculate the mean with the mean function for a specific column by referring to the column with the name of my dataset and a $ and the no. of the column. However, I was wondering how I can get the mean for every hedge fund (which is every column) without assigning every single column. Thanks in advance for your help!
We can use colMeans
colMeans(df1, na.rm=TRUE)
where 'df1' is the dataset.
Another option would be to loop through the columns and calculate the mean of each:
vapply(df1, mean, na.rm=TRUE, numeric(1))
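As a rough illustration of both options, here is a sketch on a tiny made-up data frame (the fund names and returns are hypothetical). The same vapply pattern should also work for other per-column statistics such as sd.

```r
# Hypothetical returns data: one column per fund, one row per month.
df1 <- data.frame(
  fund_a = c(0.01, 0.02, NA, -0.01),
  fund_b = c(0.03, -0.02, 0.01, 0.00)
)

# Mean of every column at once, ignoring missing values.
col_means <- colMeans(df1, na.rm = TRUE)

# The same result by applying mean() to each column in turn.
loop_means <- vapply(df1, mean, numeric(1), na.rm = TRUE)

# Standard deviation per column, using the same looping pattern.
col_sds <- vapply(df1, sd, numeric(1), na.rm = TRUE)

# Both approaches should agree.
all.equal(col_means, loop_means)
```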

Which function in SPSS emulates R summary() function?

I'm switching from R to SPSS for a specific project (I'm not allowed to use SPSS/R integration) and need to quickly summarize a big dataset. In R, it's quite simple: one can use the summary() function and in a few seconds obtain a summary of each variable.
I would like to know if there is a function in SPSS that does the same job. If not, how could I achieve it?
For non-R users: summary.default returns labelled values for Min., 1st Quartile, Median, Mean, 3rd Quartile and Max. for each numeric column, and, for a factor or character variable, counts of the 6 most common items plus the count of the "(Other)" category.
Descriptives comes close.
descriptives var1 var2 var3
/statistics = mean median stddev variance min max .
(I'm not sure about quartiles).
If you have a mixture of continuous and categorical variables, use DESCRIPTIVES or SUMMARIZE for continuous and FREQUENCIES for categorical. You can use the SPSSINC SELECT VARIABLES extension command installed with Statistics to create macros listing variables according to the measurement level and then use the appropriate macro for each command.

box plots in R comparing multiple data vectors over multiple groups

I have stream temperature data and I'd like a figure with box plots comparing daily mean and daily max (2 columns in the data set) by site and year (each a column in the data set). Daily mean and daily max would be next to each other in the plot for each year (chronologically), and then years would be grouped by site.
I can plot daily mean or daily max individually with the code:
boxplot(data~year+site)
but can I add another data variable to be plotted?

how do you find the median of 2 columns using R?

I am trying to compute the median vector of a data set s with columns A1 and B1. The median vector is the median for each observation from both columns.
I tried to do this and it did not work.
median(s[c("A1","B1")])
Is there another way to do it?
The median of two observations is simply their mean, so you can use rowMeans(s[,c("A1","B1")]). Equivalently, apply(s[,c("A1","B1")],1,median)
Another solution:
library(plyr)
colwise(median)(s[c("A1", "B1")])
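which has the advantage of returning a data frame. Note, however, that colwise(median) computes the median of each column rather than of each observation.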
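A small check of the approaches above, on a hypothetical data frame s with made-up values, may help show what each one returns. To keep the sketch free of extra packages, the per-column medians are computed with base R's vapply as a stand-in for the plyr call.

```r
# Hypothetical data set with the two columns from the question.
s <- data.frame(A1 = c(1, 4, 10), B1 = c(3, 8, 2))

# Median of each observation (row) across the two columns.
row_medians_apply <- apply(s[, c("A1", "B1")], 1, median)

# With exactly two columns, the row mean gives the same values.
row_medians_mean <- rowMeans(s[, c("A1", "B1")])

# Median of each column (base-R stand-in for the per-column approach).
col_medians <- vapply(s[c("A1", "B1")], median, numeric(1))
```

Here the row-wise results are 2, 6 and 6 for the three observations, while the column medians are 4 for A1 and 3 for B1, so only the row-wise versions answer the question as asked.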
