I have an 11 x 8 data frame of numeric values in R, and I want to find the standard deviation of all of its values together. However, I cannot apply sd() to the whole data frame, only to individual columns, and I need every data value to be used. How do I turn this data frame into a single column so that all values are included when computing the standard deviation? Hope this makes sense.
#generate data
df <- data.frame(matrix(rbinom(8*11, 1, .5), ncol=8))
#get sd
sd(unlist(df))
edit: just saw the comment where user fra got there first
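As a quick sanity check on the answer above, here is a minimal sketch (the seed is an arbitrary choice, added only for reproducibility) showing that flattening with unlist() and flattening via as.matrix() give the same overall standard deviation:

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible

# 11 x 8 data frame of 0/1 values, as in the answer above
df <- data.frame(matrix(rbinom(8 * 11, 1, 0.5), ncol = 8))

# Flatten all 88 values into one vector, then take the sd
sd_unlist <- sd(unlist(df))

# Equivalent route: convert to a matrix, drop the dimensions, take the sd
sd_matrix <- sd(as.vector(as.matrix(df)))

all.equal(sd_unlist, sd_matrix)
```

Both routes use every value in the data frame; unlist() is simply the shorter one.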
I have the following data frame in R:
df <- data.frame(time=c("10:01","10:05","10:11","10:21"),
power=c(30,32,35,36))
Problem: I want to calculate the energy consumption, so I need the sum of the time differences multiplied by the power. But every row has one timestamp, meaning I need to do subtraction between two different rows. And that is the part I cannot figure out. I guess I would need some kind of function but I couldn't find online hints.
Example: It has to subtract row1$time from row2$time, and then multiply the difference by row1$power.
As said, I do not know how to implement the step in one call, I am confused about the subtraction part since it takes values from different rows.
Expected output: E=662
Try this:
tmp = strptime(df$time, format="%H:%M")
df$interval = c(as.numeric(diff(tmp), units="mins"), NA)  # minutes to the next row; NA for the last row
sum(df$interval*df$power, na.rm=TRUE)
I got 662 back.
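For reference, the same calculation can be sketched without adding a helper column. This version assumes the times all fall on the same day and are already sorted; the "UTC" time zone is an arbitrary choice to keep daylight-saving changes out of the picture:

```r
df <- data.frame(time  = c("10:01", "10:05", "10:11", "10:21"),
                 power = c(30, 32, 35, 36))

# Parse the clock times (today's date is filled in automatically)
tmp <- as.POSIXct(df$time, format = "%H:%M", tz = "UTC")

# Gaps between consecutive rows, forced to minutes: 4, 6, 10
minutes <- as.numeric(diff(tmp), units = "mins")

# Each gap is multiplied by the power of the earlier row
energy <- sum(minutes * head(df$power, -1))
energy  # 662
```

Asking for units = "mins" explicitly matters because diff() otherwise picks the unit automatically, so longer gaps could silently come back in hours.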
I have a data frame which has 253 rows (locations on a chromosome in Mbp) and 1 column (allele score at each location). I need to produce a data frame which contains the mean of the allele score for every 0.5 Mbp along the chromosome. Please help with R code that can do this. Thanks.
The picture in this case is adequate to construct an answer but not adequate to support testing. You should learn to post data in a form that doesn't require re-entry by hand. (That's why you are accumulating negative votes.)
The basic R strategy would be to use cut to create a grouping variable and then use tapply to apply the mean function within each group. Presumably this is in a data frame, which I will assume is named my_alleles and has columns Location and Allele_score:
tapply(my_alleles$Allele_score,   # act on this vector
       # in groups defined by this factor
       cut(my_alleles$Location,
           # round the top break up so the largest location is not dropped
           breaks = seq(0, ceiling(max(my_alleles$Location) / 0.5) * 0.5, by = 0.5),
           include.lowest = TRUE),  # keep a location of exactly 0 in the first bin
       # with this function
       FUN = mean)
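Since no data was posted, here is a small self-contained sketch of the same idea on made-up values. The data frame, its column names (Location, Allele_score) and the random scores are all assumptions for illustration; the last step turns the tapply result into the data frame the question asks for:

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical stand-in for the real data: 30 locations from 0.1 to 3.0 Mbp
my_alleles <- data.frame(Location     = (1:30) / 10,
                         Allele_score = runif(30))

# Upper break rounded up to the next multiple of 0.5
upper <- ceiling(max(my_alleles$Location) / 0.5) * 0.5

# One factor level per 0.5 Mbp window
bins <- cut(my_alleles$Location,
            breaks = seq(0, upper, by = 0.5),
            include.lowest = TRUE)

# Mean allele score within each window
bin_means <- tapply(my_alleles$Allele_score, bins, mean)

# Repackage as a data frame: one row per window
result <- data.frame(window     = names(bin_means),
                     mean_score = as.vector(bin_means))
result
```

With the real data, only the construction of my_alleles would need to change.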
I am not used to R, so to practice I am trying to do in R everything that I used to do in SPSS.
In my dataset each row is a case. The columns are survey questions (1 per question).
Say I have columns "A1" up to "A6", "B1" to "B6" and so on
I just finished calculating the mean for each person on A1 to A6
data$meandata <- rowMeans(subset(data, select=c(A1:A6)), na.rm=TRUE)
How do I calculate the standard deviation of meandata ?
Hey, the easiest way to do this is with the apply() function.
Assume you have 25 rows of data and 6 columns labeled A1 through A6.
data <- data.frame(A1=rnorm(25,50,4),A2=rnorm(25,50,4),A3=rnorm(25,50,4),
A4=rnorm(25,50,4),A5=rnorm(25,50,4),A6=rnorm(25,50,4))
You can use the apply function to find the standard deviation of each row across columns 1 through 6 with the code below. The first argument is your data object. The second argument is an integer specifying either 1 for rows or 2 for columns (this is the direction in which the function will be applied to the data frame). The final argument is the function you wish to apply to your data frame, such as mean or, in this case, standard deviation (sd). See the code below.
apply(data[,1:6],1,sd)
Indexing can be used to limit the number of rows or columns of data passed to the apply function. This is done by entering a vector of numbers for either the rows or columns you are interested in within brackets after your data object.
data[c(row.vector),c(column.vector)]
Say you only want to know the sd of the first 3 columns.
apply(data[,1:3],1,sd)
Now let's see the sd of columns 4 through 6 for rows 1 through 10.
apply(data[1:10,4:6],1,sd)
Just for good measure, let's find the sd of each column.
apply(data,2,sd)
Notice that the sd of each column is close to 4, which is what I specified when I generated the pseudo-random data for columns A1 through A6.
Hope this helps
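The question can be read two ways, so here is a hedged sketch covering both: a per-person standard deviation stored next to meandata, and the single standard deviation of the meandata column itself. The simulated data and the column name sddata are assumptions for illustration:

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# Simulated survey data: 25 respondents, items A1 to A6
data <- as.data.frame(matrix(rnorm(25 * 6, mean = 50, sd = 4), ncol = 6))
names(data) <- paste0("A", 1:6)

items <- paste0("A", 1:6)

# Per-person mean across the six items (as in the question)
data$meandata <- rowMeans(data[, items], na.rm = TRUE)

# Reading 1: per-person sd across the same six items
data$sddata <- apply(data[, items], 1, sd, na.rm = TRUE)

# Reading 2: one number, the sd of the per-person means
sd_of_means <- sd(data$meandata, na.rm = TRUE)
```

Selecting the items by name keeps meandata and sddata out of the calculation once they have been added as columns.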
My data frame has a first column of factors, and all the other columns are numeric.
       Origin spectrum_740.0 spectrum_741.0 spectrum_742.0 etc....
1     Warthog      0.6516295      0.6520196      0.6523843
2       Tiger      0.4184067      0.4183569      0.4183805
3 Sperm whale      0.9028763      0.9031688      0.9034069
I would like to convert the data frame into two variables, a vector (the first column) and a matrix (all the numeric columns), so that I can do calculations on the matrix, such as applying msc from the pls package. Basically, I want the data frame to be like the gasoline data set from pls, which has one variable as a numeric vector, and a second variable called NIR as a matrix with 401 columns.
Alternatively, if you have any suggestions for applying calculations to the numeric data while keeping the Origin column connected, that would work too, but all the examples I have seen use gasoline or similarly formatted data frames to do the calculations on the NIR matrix.
Thank you!
M = as.matrix(df[,-1])
row.names(M) = df[,1]
M
            spectrum_740.0 spectrum_741.0 spectrum_742.0
Warthog          0.6516295      0.6520196      0.6523843
Tiger            0.4184067      0.4183569      0.4183805
Sperm_whale      0.9028763      0.9031688      0.9034069
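To get a layout like the gasoline data set mentioned in the question (one plain column plus one matrix-valued column), a possible sketch is to wrap the matrix in I() so that data.frame() keeps it as a single column. The name spectra and the three-row sample are assumptions for illustration:

```r
# Small stand-in for the data frame in the question
df <- data.frame(
  Origin         = factor(c("Warthog", "Tiger", "Sperm whale")),
  spectrum_740.0 = c(0.6516295, 0.4184067, 0.9028763),
  spectrum_741.0 = c(0.6520196, 0.4183569, 0.9031688),
  spectrum_742.0 = c(0.6523843, 0.4183805, 0.9034069)
)

# Numeric block as a matrix, labelled by Origin
M <- as.matrix(df[, -1])
rownames(M) <- as.character(df$Origin)

# I() stops data.frame() from splitting the matrix back into columns
spectra <- data.frame(Origin = df$Origin, NIR = I(M))

ncol(spectra)     # two variables: Origin and NIR
dim(spectra$NIR)  # NIR is still a matrix
```

With the pls package installed, spectra$NIR can then be passed to functions that expect a matrix, such as msc, while Origin stays attached row by row.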
I have a data set in a wide format, consisting of two rows, one with the variable names and one with the corresponding values. The variables represent characteristics of individuals from a sample of size 1000. For instance I have 1000 variables regarding the size of each individual, then 1000 variables with the height, then 1000 variables with the weight etc. Now I would like to run simple regressions (say weight on calorie consumption), the only way I can think of doing this is to declare a vector that contains the 1000 observations of each variable, say for instance:
regressor1=c(mydata$height0, mydata$height1, mydata$height2, mydata$height3, ... mydata$height1000)
But given that I have a few dozen variables and each containing 1000 observations this will become cumbersome. Is there a way to do this with a loop?
I have also thought about the reshape options of R, but this again would put me in a position where I have to type 1000 variables a few dozen times.
Thank you for your help.
Here is how I would go about your issue. t() will transpose the data for you from many columns to many rows.
Note: t() also works directly on a matrix; I only coerced to a data frame to show that my example will work with your data.
# Many columns, 2 rows
x <- as.data.frame(matrix(1:2000, nrow=2, ncol=1000))
# 2 columns, many rows (t() returns a matrix, so keep the result)
tx <- t(x)
Based on your comments you are looking to generate vectors.
If you have transposed:
regressor1 <- tx[,1]
regressor2 <- tx[,2]
If you have not transposed (unlist() turns the one-row data frame into a plain vector):
regressor1 <- unlist(x[1,])
regressor2 <- unlist(x[2,])
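If the variable names are stored as column names in the pattern from the question (height0, height1, ..., weight0, ...), another possible sketch avoids typing them at all by stripping the trailing digits and splitting on what is left. The tiny mydata below and its values are made up, and the approach assumes the columns for each variable appear in the same individual order:

```r
# Hypothetical one-row wide data: 3 individuals, 2 variables
mydata <- data.frame(height0 = 170, height1 = 165, height2 = 180,
                     weight0 = 70,  weight1 = 60,  weight2 = 85)

# One named vector holding every value
values <- unlist(mydata[1, ])

# Drop the trailing individual index: "height0" becomes "height"
var_name <- sub("[0-9]+$", "", names(values))

# One column per variable, one row per individual
long <- as.data.frame(split(unname(values), var_name))

# A simple regression now works on the long data
fit <- lm(weight ~ height, data = long)
long
```

The same three lines of reshaping work unchanged for a few dozen variables with 1000 individuals each, as long as the naming pattern holds.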