Which function in SPSS emulates R's summary() function?

I'm switching from R to SPSS for a specific project (I'm not allowed to use SPSS/R integration) and need to quickly summarize a big dataset. In R it's quite simple: the summary() function gives a summary of every variable in a few seconds.
I would like to know if there is a function in SPSS that does the same job. If not, how could I achieve it?
For non-R users: summary() on a data frame returns labelled values for Min., 1st Qu., Median, Mean, 3rd Qu. and Max. for each numeric column and, for each factor column, the counts of the 6 most common levels plus, when there are more levels, a count for an "(Other)" category.

Descriptives comes close.
descriptives var1 var2 var3
/statistics = mean stddev variance min max .
(DESCRIPTIVES does not compute the median or quartiles; FREQUENCIES with /FORMAT=NOTABLE /NTILES=4 /STATISTICS=MEDIAN does.)

If you have a mixture of continuous and categorical variables, use DESCRIPTIVES or SUMMARIZE for continuous and FREQUENCIES for categorical. You can use the SPSSINC SELECT VARIABLES extension command installed with Statistics to create macros listing variables according to the measurement level and then use the appropriate macro for each command.
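For readers matching SPSS output against R, here is a minimal sketch of what summary() reports, using a small hypothetical data frame (`df`, `income` and `region` are made up for illustration). One thing worth checking when comparing numbers: summary() uses R's default quantile type 7, while R's own quantile() documentation describes type 6 as the definition used by SPSS, so quartiles from FREQUENCIES may differ slightly from R's, especially in small samples.

```r
# Hypothetical mixed data set standing in for the real one
df <- data.frame(
  income = c(12, 15, 18, 22, 30, 41, 55),
  region = factor(c("N", "S", "N", "E", "W", "N", "S"))
)

# Numeric column: Min., 1st Qu., Median, Mean, 3rd Qu., Max.
num_summary <- summary(df$income)

# summary() uses quantile type 7; type 6 is the definition R's docs attribute to SPSS
q7 <- quantile(df$income, probs = c(0.25, 0.75), type = 7)
q6 <- quantile(df$income, probs = c(0.25, 0.75), type = 6)

# Factor column: counts per level (the part FREQUENCIES covers in SPSS)
fac_summary <- summary(df$region)

print(num_summary)
print(rbind(type7 = q7, type6 = q6))
print(fac_summary)
```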

Related

How to transform variables with large values and almost the same range?

While doing data transformation on different variables, I was unable to transform variables that have large values and almost the same range of values. How should I transform this kind of data?
MonthlyRate
Min. : 2094
1st Qu.: 8047
Median :14236
Mean :14313
3rd Qu.:20462
Max. :26999
This is the summary of a variable.
It sounds like you are trying to normalize different variables so that they are on the same unitless scale. R has a built-in scale() function:
scale(my_dataframe)
which centers and scales each column so that it has mean 0 and standard deviation 1, i.e. each value is expressed in standard deviations from its column mean. This only works on numeric columns, but if your data frame includes other types of data you can normalize each numeric column individually
my_dataframe$monthly_rate <- as.vector(scale(my_dataframe$monthly_rate))
and achieve the same effect. (scale() returns a one-column matrix; as.vector() turns it back into a plain numeric column.)
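A minimal sketch of standardizing every numeric column at once while leaving other columns untouched. The data frame and its columns (`monthly_rate`, `dept`) are hypothetical; the five rates are simply the Min., quartile and Max. values from the summary above, used as stand-in data.

```r
# Hypothetical data: one numeric column, one character column
my_dataframe <- data.frame(
  monthly_rate = c(2094, 8047, 14236, 20462, 26999),
  dept = c("A", "B", "A", "B", "A")
)

# Find the numeric columns, then standardize only those
num_cols <- vapply(my_dataframe, is.numeric, logical(1))
my_dataframe[num_cols] <- lapply(my_dataframe[num_cols], function(x) as.vector(scale(x)))

print(my_dataframe)
```

Selecting columns with is.numeric() first avoids the error scale() raises when a data frame contains non-numeric columns.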

How to calculate column mean at intervals of row values in R?

I have a data frame with 253 rows (locations on a chromosome, in Mbp) and 1 column (the allele score at each location). I need to produce a data frame containing the mean allele score for every 0.5 Mbp window along the chromosome. Please help with R code that can do this. Thanks.
The picture in this case is adequate to construct an answer but not adequate to support testing. You should learn to post data in a form that doesn't require re-entry by hand. (That's why you are accumulating negative votes.)
The basic R strategy would be to use cut to create a grouping variable and then use a grouped-apply function such as tapply to apply mean within each group. I will assume the data frame is named something like my_alleles, with columns Location and Allele_score:
tapply( my_alleles$Allele_score,   # act on this vector
        # in groups defined by this factor; the breaks run up to the next
        # multiple of 0.5 so the largest locations are not dropped as NA,
        # and include.lowest=TRUE keeps a location of exactly 0
        cut(my_alleles$Location,
            breaks=seq(0, ceiling(max(my_alleles$Location) * 2) / 2, by=0.5),
            include.lowest=TRUE
        ),
        # with this function
        FUN=mean)
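Since the question asks for a data frame rather than the named vector tapply() returns, here is a minimal sketch using aggregate() with the same cut() bins. The toy my_alleles data is hypothetical, and the column names Location and Allele_score are assumed as above.

```r
# Hypothetical toy data: location in Mbp and an allele score
my_alleles <- data.frame(
  Location = c(0.1, 0.3, 0.6, 0.9, 1.2),
  Allele_score = c(1, 3, 5, 7, 9)
)

# 0.5 Mbp bins from 0 up to the next multiple of 0.5 above the maximum
breaks <- seq(0, ceiling(max(my_alleles$Location) * 2) / 2, by = 0.5)
my_alleles$bin <- cut(my_alleles$Location, breaks = breaks, include.lowest = TRUE)

# One row per non-empty bin with the mean score
bin_means <- aggregate(Allele_score ~ bin, data = my_alleles, FUN = mean)
print(bin_means)
```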

Calculating the proportions of Yes or No responses in R

I am new to R and really trying to wrap my head around everything (I'm even taking an online course, which so far has not helped at all).
What I started with is a large data frame containing 97 variables pertaining to compliance with regulations.
I have created multiple dataframes based on the various geographic locations (there is probably an easier way to do it).
In each of these data frames, I have 7 variables for which I would like to find the proportion of "Yes" and "No" responses.
I first tried:
summary(urban$vio_bag)
Length Class Mode
398 character character
However, this tells me nothing useful except that I have 398 responses.
So I put this into a table:
urbanbag<-table(urban$vio_bag)
This at least provided me with the number of Yes and No responses
Var1 Freq
1 No 365
2 Yes 30
So I then converted to a data.frame:
urbanbag = as.data.frame(urbanbag)
Then viewed it:
summary(urbanbag)
Var1 Freq
No :1 Min. : 30.0
Yes:1 1st Qu.:113.8
Median :197.5
Mean :197.5
3rd Qu.:281.2
Max. :365.0
And this output was even less helpful.
I am not building these matrices in R; the data is a table imported from Excel.
I am just so lost and frustrated after spending days trying to figure out something that seems so elementary, and googling has not helped.
Is there a way to actually do this?
We can use prop.table to get the proportions:
v1 <- prop.table(table(urban$vio_bag))
then use barplot to plot it
barplot(v1)
Try dplyr's n() (which counts rows) within summarise():
library(dplyr)
data %>% group_by(yes_no_column) %>% summarise(my_counts = n())
This will give you the counts you're looking for. Adjust the group_by() variables as needed; multiple variables can be used at the same time for grouping. Just like n(), functions such as mean() and sd() can be passed to summarise(). If you want to add a calculated metric as a new column instead, use mutate().
Oscar.
prop.table is a useful way of doing this. You can also solve this using mean:
mean(urban$vio_bag == "Yes")
mean(urban$vio_bag == "No")
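To get the proportions for all seven variables at once, one option is to apply the same idea over the columns with sapply(). This is a minimal sketch: the column names (vio_bag, vio_sign) and values are hypothetical. na.rm = TRUE is included because 365 + 30 = 395 out of 398 responses suggests a few values are NA, which table() drops silently but which would make a plain mean() return NA.

```r
# Hypothetical stand-in for the imported data
urban <- data.frame(
  vio_bag  = c("No", "Yes", "No", "No"),
  vio_sign = c("Yes", "Yes", "No", NA)
)

yes_cols <- c("vio_bag", "vio_sign")  # hypothetical: list your 7 variables here

# Share of "Yes" per variable, ignoring missing answers
yes_share <- sapply(urban[yes_cols], function(x) mean(x == "Yes", na.rm = TRUE))

# Both proportions per variable; fixing the levels keeps a row even if one answer never occurs
both <- sapply(urban[yes_cols], function(x) prop.table(table(factor(x, levels = c("No", "Yes")))))

print(yes_share)
print(both)
```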

R - Mean calculation for entire data instead of assigning each column individually

I am a beginner with R and I have a question about simple functions such as mean or standard deviation for a big data set. My data shows monthly returns for hedge funds over the past 30 years and has 1550 columns, one per hedge fund. I saw that I can calculate the mean of a specific column by referring to it with the name of my data set followed by $ and the column name. However, I was wondering how I can get the mean for every hedge fund (that is, every column) without referring to each column individually. Thanks in advance for your help!
We can use colMeans
colMeans(df1, na.rm=TRUE)
where 'df1' is the dataset.
Another option is to loop through the columns and calculate the mean:
vapply(df1, mean, numeric(1), na.rm=TRUE)
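Since the question also mentions standard deviation, here is a minimal sketch that computes both per column in one pass with sapply(). The data frame and its columns (month, fund_a, fund_b) are hypothetical. It keeps only numeric columns first, because colMeans() fails with "'x' must be numeric" when the data frame also holds something like a date column.

```r
# Hypothetical returns data with a date column and two fund columns
df1 <- data.frame(
  month  = as.Date(c("2020-01-31", "2020-02-29", "2020-03-31", "2020-04-30")),
  fund_a = c(0.01, 0.02, NA, 0.03),
  fund_b = c(-0.01, 0.00, 0.01, 0.02)
)

# Keep only the numeric (fund) columns
fund_cols <- vapply(df1, is.numeric, logical(1))

# 2 x k matrix: one row per statistic, one column per fund
stats <- sapply(df1[fund_cols], function(x) c(mean = mean(x, na.rm = TRUE),
                                              sd   = sd(x, na.rm = TRUE)))

# Transpose for one row per fund
print(t(stats))
```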

Finding percentile of a particular input in R

I have a dataset column which contains values. When a new input is given, I want to check this column and find the percentile of that input value within it.
I tried the quantile function, but it gives the values at the 25th, 50th percentile and so on. I want the reverse: the percentile of a given value.
The following is my reproducible example,
data <- seq(90,100,length.out=1000)
input <- 97
My output should be the percentile of 97 in the data column. Is this possible to do?
Thanks
You may also use a somewhat more statistical version with an empirical cumulative distribution function:
ecdf(data)(input)
or
F <- ecdf(data)
F(input)
This approach also allows for vectorization over input.
I think you want to count the fraction of the data that are (is?) less than the input value:
mean(input>data)
## [1] 0.7
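The two answers agree here because no value in data equals 97 exactly, but they use different definitions: ecdf() counts values less than or equal to the input, while mean(input > data) counts strictly smaller values. This minimal sketch shows the vectorized ecdf() call and how ties split the two; the small vector x is a made-up example.

```r
data <- seq(90, 100, length.out = 1000)

# ecdf() returns a function; it accepts a vector of inputs
Fn <- ecdf(data)
pct <- Fn(c(92, 97))  # share of data <= each input

# With ties the definitions differ
x <- c(1, 2, 2, 3)
at_or_below    <- ecdf(x)(2)  # share of values <= 2
strictly_below <- mean(x < 2) # share of values < 2

print(pct)
print(c(at_or_below = at_or_below, strictly_below = strictly_below))
```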
