Getting "NA" when I run a standard deviation - r

Quick question. I read my csv file into the variable data. It has a column labeled var, which has numerical values.
When I run the command
sd(data$var)
I get
[1] NA
instead of my standard deviation.
Could you please help me figure out what I am doing wrong?

Try sd(data$var, na.rm=TRUE); any NAs in the column var will then be ignored. It also pays to check your data to make sure the NAs really should be NAs and that nothing went wrong when the file was read in. Commands like head(data), tail(data), and str(data) should help with that.
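As a minimal sketch of the behaviour described above, using made-up values rather than the asker's file: a single NA is enough to make sd() return NA, and na.rm = TRUE drops it before the calculation.

```r
# Toy stand-in for the asker's column (values are made up for illustration).
var <- c(2.5, 3.1, NA, 4.8, 5.0)

sd(var)                             # NA: one missing value propagates to the result
sum(is.na(var))                     # how many values are missing
sd_clean <- sd(var, na.rm = TRUE)   # missing values are dropped before computing
sd_clean
```

Counting the NAs first is a cheap way to see whether a few values are missing or whether most of the column failed to read in.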

I've made the mistake a time or two of reusing variable names in dplyr pipelines, which has caused issues.
library(dplyr)
mtcars %>%
  group_by(gear) %>%
  mutate(ave = mean(hp)) %>%
  ungroup() %>%
  group_by(cyl) %>%
  summarise(med = median(ave),
            ave = mean(ave), # should've named this variable something different
            sd = sd(ave))    # this is the sd of my newly created variable "ave", not the original one.
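One way to avoid the clash, sketched here under the assumption that dplyr is installed: give the summary columns names that differ from the input column. The names mean_ave and sd_ave are hypothetical, chosen only for illustration.

```r
library(dplyr)

by_cyl <- mtcars %>%
  group_by(gear) %>%
  mutate(ave = mean(hp)) %>%
  ungroup() %>%
  group_by(cyl)

# Reusing "ave": sd() sees the one-value summary just created, so it returns NA.
clobbered <- by_cyl %>%
  summarise(ave = mean(ave), sd = sd(ave))

# Distinct output names (hypothetical) keep the original column visible to sd().
fixed <- by_cyl %>%
  summarise(mean_ave = mean(ave), sd_ave = sd(ave))
```

Inside summarise(), later expressions can see columns created earlier in the same call, which is why the order and naming of the summaries matter.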

You probably have missing values in var, or the column is not numeric, or there's only one row.
Try removing missing values, which will help in the first case:
sd(dat$var, na.rm = TRUE)
If that doesn't work, check that
class(dat$var)
is "numeric" or "integer" (the second case) and that
nrow(dat)
is greater than 1 (the third case).
Finally, data is a function in R, so it is best to use a different name, which I've done here.
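For the non-numeric case, a rough sketch of how one might find the offending entries. The values below are invented; a real column could fail for other reasons. If the column is a factor rather than character, convert it with as.character() first, because as.numeric() on a factor returns the level codes.

```r
# Hypothetical column as it might arrive when a csv contains a stray text entry.
var_raw <- c("1.2", "3.4", "n/a", "5.6")

class(var_raw)                                     # "character", so sd() cannot use it directly
var_num <- suppressWarnings(as.numeric(var_raw))   # entries that are not numbers become NA
bad_entries <- var_raw[is.na(var_num) & !is.na(var_raw)]  # values that failed to convert
bad_entries

sd_num <- sd(var_num, na.rm = TRUE)
```

If the stray entries are really missing-value codes, passing them to read.csv() through its na.strings argument lets the column be read as numeric in the first place.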

There may be Inf or -Inf values in the column.
Try
is.finite(data$var)
or
min(data$var, na.rm = TRUE)
max(data$var, na.rm = TRUE)
to check whether that is indeed the case.
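A small sketch of the infinite-value check, again with made-up numbers. Note that na.rm = TRUE does not remove Inf, so the filter has to be applied explicitly.

```r
x <- c(1, 2, Inf, 4)                  # made-up values with one infinite entry

is.na(sd(x))                          # TRUE: the result is not a usable number
any(is.infinite(x))                   # TRUE when Inf or -Inf is present
range(x)                              # the extremes reveal Inf at a glance
sd_finite <- sd(x[is.finite(x)])      # keep only finite values (this also drops NA and NaN)
sd_finite
```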

Related

Why is na.rm not working for this case only?

I have been working with weather data that contains some NA values. Usually, to sum up values for one day, I use colSums like this: colSums(df, na.rm = T). This of course never caused any issue.
However, using the same call for a different analysis now returns the following error.
colSums(I_2011,na.rm=T)
Error in colSums(I_2011, na.rm = T) : 'x' must be numeric
I don't understand why. The only difference is that I_2011 is imported from a csv:
I_2011<-read.csv("2011_IMD.csv",check.names = FALSE)
Does the latter require something different?
I'm lost on what to do next. I don't need to remove the columns containing NA, only to disregard the NAs while doing colSums.
I tried
I_2011 %>%
  mutate(avg = rowSums(., na.rm=TRUE)) %>%
  bind_cols(I_2011[setdiff(names(I_2011), names(.))] , .)
which returns the same error.
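The error message suggests that at least one column of the imported data frame is not numeric (for example a date or station label); na.rm only skips NA values and does not convert text. A hedged sketch of one way to handle this, using an invented stand-in for the csv in which the station column is hypothetical:

```r
# Made-up stand-in for the imported csv: one text column alongside numeric ones.
I_2011 <- data.frame(
  station = c("A", "B", "C"),        # hypothetical non-numeric column
  rain    = c(1.5, NA, 3.0),
  temp    = c(20, 22, NA),
  stringsAsFactors = FALSE
)

sapply(I_2011, class)                           # shows which columns are not numeric
numeric_cols <- sapply(I_2011, is.numeric)      # logical vector, one entry per column
totals <- colSums(I_2011[numeric_cols], na.rm = TRUE)
totals
```

If a column that should be numeric shows up as character, the file probably contains a stray text entry in that column, and it is worth fixing that at import time rather than excluding the column.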

Mean from multiple Columns (Error Message)

I'm still fairly new to R and have been practicing a bit lately.
I have the following (simplified) data set:
It's basically a questionnaire asking random people how much they prefer each of these cities on a scale from 1 to 7.
I would like to find out which city has the highest average preference.
So what I first did was: mean(dataset[, 3], na.rm=TRUE) to find the average preference for Prague. That worked!
Now I wanted to create a table that shows the mean for each city.
My thought was: table(mean(dataset[3:8], na.rm=TRUE))
However, all I get is the following message:
In mean.default(umfrage[37:38], na.rm = TRUE) :
argument is not numeric or logical: returning NA
Does someone know what that means and how I could achieve the result?
I figured it out.
I simply used this function: lapply(dataset[3:8], mean, na.rm = TRUE)
You could also use the dplyr and tidyr packages (both are part of the tidyverse):
library(tidyverse)
result <- dataset %>%
  gather("city", "value", Pref_Prague:Pref_London) %>%
  group_by(city) %>%
  summarise(mean = mean(value, na.rm = TRUE))
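The message appears because mean() expects a vector, not a data frame with several columns. A base R sketch of the per-column approach, with invented ratings and hypothetical column names standing in for the survey data:

```r
# Made-up ratings; the column names are hypothetical stand-ins for the survey data.
dataset <- data.frame(
  id          = 1:4,
  Pref_Prague = c(7, 5, NA, 6),
  Pref_Berlin = c(4, 6, 5, NA),
  Pref_London = c(3, NA, 4, 5)
)

city_cols  <- c("Pref_Prague", "Pref_Berlin", "Pref_London")
city_means <- colMeans(dataset[city_cols], na.rm = TRUE)   # named numeric vector
city_means
best_city <- names(which.max(city_means))                  # city with the highest mean
best_city
```

colMeans() returns a named vector, which is easier to sort or pass to which.max() than the list that lapply() produces.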

How to find the mean and standard deviation of rows in dataframes with some having NAs and others not

I'm trying to find the mean and standard deviation for C and P separately.
I have toyed around with this so far:
C <- rowMeans(dplyr::select(total, C1:41), na.rm=TRUE)
This didn't yield what I needed it to.
Then I thought about just using the summary, but again it didn't give me what I needed.
So then I thought of using na.omit:
Of course though, this would take out all of the data since I have NAs throughout the dataframe.
What am I missing here? Is this a matter of aggregating my data into certain groups?
I know describeBy could produce these descriptives, but again I'm not sure how to do that.
So, I think the angle I want to take is to order these, then aggregate and find totals, and then find the descriptives using describeBy in order to avoid NAs. I'm stuck, though. Where am I going wrong?
Try using this:
library(dplyr)
total %>%
  # Select only columns whose names start with S,
  # i.e. SP and SC
  select(starts_with('S')) %>%
  # Get the data in long format, remove NA values
  tidyr::pivot_longer(cols = everything(), values_drop_na = TRUE) %>%
  # Create a group for each participant
  group_by(grp = c('Participant1', 'Participant2')[grepl('C\\d+', name) + 1]) %>%
  # Take mean and standard deviation for each group
  summarise(mean = mean(value), sd = sd(value))
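The same idea can be sketched in base R: pool all values from one group of columns and summarise them with na.rm = TRUE. The original table is not shown, so the C1/C2/P1/P2 column names and the values below are assumptions, and describe_group is a hypothetical helper name.

```r
# Made-up data; the C*/P* column names are an assumption, since the original table is not shown.
total <- data.frame(
  C1 = c(1, NA, 3),
  C2 = c(2, 4, NA),
  P1 = c(NA, 10, 20),
  P2 = c(30, NA, 40)
)

# Pool every value from the columns whose names start with `prefix`, ignoring NAs.
describe_group <- function(df, prefix) {
  values <- unlist(df[startsWith(names(df), prefix)], use.names = FALSE)
  c(mean = mean(values, na.rm = TRUE), sd = sd(values, na.rm = TRUE))
}

stats_C <- describe_group(total, "C")
stats_P <- describe_group(total, "P")
stats_C
stats_P
```

This avoids na.omit(), which would drop every row that has an NA in any column and can leave little or no data.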

Subsets in R studio (basic questions)?

I am terrible with R and I am trying to figure out subsets. I have entered the data file into R studio via:
> Vehicle_Data <-read.table("VehicleData.txt.txt", header=T,sep="\t",quote="")
> attach(Vehicle_Data)
I'm confused about subsets. One of the columns in my data is Type which includes a variety of vehicle types. I need to narrow down Car within the type column so I can calculate the mean MPG value of the cars only.
Here's what I have tried:
> TypeCar<-subset(Vehicle_Data, Type=="Car")
I think this worked to subset the data, but I'm not sure. Also I have no idea how to calculate the mean MPG from the subset?
The code for subsetting appears to be fine. To calculate the mean, you need to use the mean() function in this way:
mean_mpg <- mean(TypeCar$MPG, na.rm = TRUE)
This code will also take care of any NA values present in your data.
You can use the tidyverse to perform data transformations such as subsetting (filtering):
library(dplyr)
Vehicle_Data %>%
  filter(Type == "Car")
You can also calculate the mean MPG per Type like so:
Vehicle_Data %>%
  group_by(Type) %>%
  summarise(mean.MPG = mean(MPG, na.rm = TRUE))
If you'd like to calculate the mean of an existing subset of data (i.e. TypeCar), you can just run mean(TypeCar$MPG, na.rm = TRUE)
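To check that the subset worked, a small base R sketch with invented data may help; only the Type and MPG column names come from the question, the values are made up.

```r
# Made-up stand-in for VehicleData; only the Type and MPG column names come from the question.
Vehicle_Data <- data.frame(
  Type = c("Car", "Truck", "Car", "SUV", "Car"),
  MPG  = c(30, 18, NA, 22, 34)
)

TypeCar <- subset(Vehicle_Data, Type == "Car")
nrow(TypeCar)                        # number of rows kept by the subset
unique(TypeCar$Type)                 # should list only "Car"
mean_mpg <- mean(TypeCar$MPG, na.rm = TRUE)
mean_mpg

# Base R alternative for the mean of every type at once (rows with NA are dropped by default).
mpg_by_type <- aggregate(MPG ~ Type, data = Vehicle_Data, FUN = mean)
mpg_by_type
```

Referring to columns through the data frame (TypeCar$MPG) rather than relying on attach() makes it clear which version of the data each calculation uses.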
