How to save data in a dataframe? - r

Sorry, really beginner question: I want to generate a data frame with random data. I want my data frame to be 10 rows by 20 columns, where each row contains data from a random sample generated by rnorm. How do I do this?

Producing a matrix may be easier, but this can be converted to a dataframe:
rownum <- 10
colnum <- 20
yourdf <- as.data.frame(matrix(rnorm(rownum * colnum), nrow=rownum))
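The matrix approach draws all 200 values in one `rnorm()` call. If you want each row to be its own separate `rnorm()` sample, one option is `replicate()`. This is a minimal sketch: `set.seed()` is added only for reproducibility and was not part of the original answer.

```r
set.seed(1)  # reproducible random numbers (not in the original answer)
rownum <- 10
colnum <- 20

# replicate() calls rnorm(colnum) rownum times; each call becomes one column
m <- replicate(rownum, rnorm(colnum))  # colnum x rownum matrix
yourdf <- as.data.frame(t(m))          # transpose so that each draw is one row
dim(yourdf)                            # 10 20
```

Because the values are independent standard normal draws either way, both versions give data with the same distribution. The per-row form is only useful if you later want rows to differ, for example by passing a different `mean` to each call.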

Related

Get the standard deviation over every n rows with dplyr

I'm trying to get the standard deviation (SD) of the last 20 values for each row in a data.frame. I know how to do this in Excel, but I'm trying to do it in R, with dplyr.
Your question is thin and does not provide any data, but here is an example of what you want. It uses a data frame of 50 rows with two columns of random numbers.
The last 20 rows of that data frame (the part you are after) are selected, and a loop then steps through each of those 20 rows.
set.seed(14)
n <- sample(100, 50)
b <- sample(200, 50)
cdf <- data.frame(n, b) # new data frame with two columns of random numbers
nrow(cdf)
first <- nrow(cdf) - 19 # index of the first of the last 20 rows (31 when there are 50 rows)
for (i in first:nrow(cdf)) { # loop over the last 20 rows
  print(cdf$b[i])
  print(i)
}
sd(cdf$b[first:nrow(cdf)]) # standard deviation of the last 20 values of b
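The answer above looks at one fixed window of rows, while the question asks for the SD of the trailing 20 values at every row (a rolling SD). A minimal base-R sketch follows. `rolling_sd()` is a hypothetical helper written for this example, and rows with fewer than `k` values available get `NA`.

```r
# Trailing (right-aligned) rolling standard deviation over a window of k values.
# Element i uses x[(i - k + 1):i]; the first k - 1 elements are NA.
rolling_sd <- function(x, k = 20) {
  vapply(seq_along(x), function(i) {
    if (i < k) NA_real_ else sd(x[(i - k + 1):i])
  }, numeric(1))
}

set.seed(14)
cdf <- data.frame(n = sample(100, 50), b = sample(200, 50))
cdf$sd20 <- rolling_sd(cdf$b, k = 20)
tail(cdf)
```

Since it is an ordinary vectorised function, it should also work inside dplyr, e.g. `mutate(cdf, sd20 = rolling_sd(b, 20))`. For long series, `zoo::rollapply(b, width = 20, FUN = sd, fill = NA, align = "right")` is an established alternative.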

How to create subsets of a data frame in R?

I have two data frames, one with 94 rows and 167 columns (df_1) and the other with 94 rows and 1 column (df_2). I would like to make 167 different data frames, each combining one column of the first data frame with the single column of the second data frame. I have tried a for loop like the following:
for (i in seq_len(ncol(df_1))){
df_[[i]] <- data.frame(df_1[sort(rownames(df_1)),i,df_2[sort(rownames(df_2)),])
}
But it does not work. Can someone help me?
To combine the two data frames into one, you can use smartbind() from the gtools package, which binds data frames by rows, matching columns by name and filling any column missing from one of them with NA:
library(gtools)
df3 <- smartbind(df_1, df_2) # df_1 has 167 columns, df_2 has 1 column
This gives you a single data frame. To then split it into separate data frames, see the answer in this thread:
How to loop through the columns in an R data frame and create a new data frame using the column name in each iteration?
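Since both data frames have 94 rows, another option is to pair each column of df_1 with df_2 directly and keep the 167 results in a named list. This is a minimal sketch: the random data below is hypothetical and only mimics the shapes from the question, and rows are matched by row name because the question sorted both data frames by row name for the same reason.

```r
# Hypothetical data with the shapes from the question
set.seed(1)
df_1 <- as.data.frame(matrix(rnorm(94 * 167), nrow = 94))
df_2 <- data.frame(y = rnorm(94))

# Put df_2's rows in the same order as df_1's, matching by row name
df_2 <- df_2[match(rownames(df_1), rownames(df_2)), , drop = FALSE]

# One two-column data frame per column of df_1, stored in a named list
subsets <- lapply(seq_len(ncol(df_1)), function(i) cbind(df_1[i], df_2))
names(subsets) <- names(df_1)

dim(subsets[[1]])    # 94 2
names(subsets[[1]])  # "V1" "y"
```

A list is usually easier to loop over than 167 separate objects. If you really need them as separate variables, `list2env(subsets, envir = globalenv())` creates one object per list element.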

All unique samples from a data set in R

Here's my data
z<- c("COP","CHK","BP","BHI","CVX")
If I do:
sample(z,3,replace=FALSE)
This will give me 1 unique random sample of 3 from my data set.
I want to find all possible unique samples of 3 from my data set. In this case there will be 10 outcomes.
But how do I write R code for it?
Please help
We can use combn to get the unique combinations
t(combn(z, 3))
If we need the elements shuffled first:
t(combn(sample(z), 3))
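To tie this back to the question: `combn()` enumerates every subset of size 3, and `choose(5, 3)` confirms there are 10 of them. Picking one row at random then gives a single random unique sample. A minimal sketch follows; `set.seed()` is added only for reproducibility.

```r
z <- c("COP", "CHK", "BP", "BHI", "CVX")

combos <- t(combn(z, 3))  # one row per unique sample of 3
nrow(combos)              # 10, the same as choose(5, 3)

# Draw one of the 10 unique samples at random
set.seed(1)
combos[sample(nrow(combos), 1), ]
```

`combn()` lists the combinations in order of element position, so the first row is `COP CHK BP` and the last is `BP BHI CVX`.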

Transpose/Reshape Data in R

I have a data set in a wide format, consisting of two rows, one with the variable names and one with the corresponding values. The variables represent characteristics of individuals from a sample of size 1000. For instance, I have 1000 variables for the size of each individual, then 1000 variables with the height, then 1000 variables with the weight, etc. Now I would like to run simple regressions (say, weight on calorie consumption). The only way I can think of doing this is to declare a vector that contains the 1000 observations of each variable, for instance:
regressor1=c(mydata$height0, mydata$height1, mydata$height2, mydata$height3, ... mydata$height1000)
But given that I have a few dozen variables, each containing 1000 observations, this will become cumbersome. Is there a way to do this with a loop?
I have also thought about the reshape options of R, but this again would put me in a position where I have to type 1000 variables a few dozen times.
Thank you for your help.
Here is how I would go about your issue. t() will transpose the data for you from many columns to many rows.
Note: t() is designed for matrices (a data frame is coerced to a matrix first, and the result is a matrix); I simply coerced to a data frame to show that my example will work with your data.
# Many columns, 2 rows
x <- as.data.frame(matrix(1:2000, nrow = 2, ncol = 1000))
# 2 columns, many rows (t() returns a matrix)
xt <- t(x)
Based on your comments, you are looking to generate vectors.
If you have transposed:
regressor1 <- xt[, 1]
regressor2 <- xt[, 2]
If you have not transposed (unlist() turns the one-row data frame into a vector):
regressor1 <- unlist(x[1, ])
regressor2 <- unlist(x[2, ])
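The question's real obstacle is that the columns are named by a stem plus a number (height0, height1, ...), so typing them out is the cumbersome part. A sketch under stated assumptions: the one-row `mydata` below is hypothetical (5 individuals instead of 1000, and a made-up `calories` stem), and the code collects each stem's numbered columns with `grep()` so that no names are typed by hand.

```r
# Hypothetical one-row data: 5 individuals instead of 1000, three variable stems
set.seed(1)
n <- 5
vals <- c(
  setNames(runif(n, 1.5, 2.0), paste0("height", 0:(n - 1))),
  setNames(runif(n, 50, 100), paste0("weight", 0:(n - 1))),
  setNames(runif(n, 1500, 3000), paste0("calories", 0:(n - 1)))
)
mydata <- as.data.frame(as.list(vals))  # 1 row, 15 columns

# For each stem, collect its numbered columns (in numeric order) into one vector
stems <- c("height", "weight", "calories")
long <- as.data.frame(setNames(lapply(stems, function(s) {
  cols <- grep(paste0("^", s, "[0-9]+$"), names(mydata), value = TRUE)
  cols <- cols[order(as.integer(sub(paste0("^", s), "", cols)))]
  unlist(mydata[1, cols], use.names = FALSE)
}), stems))

# One row per individual, one column per variable: ready for lm()
fit <- lm(weight ~ calories, data = long)
```

Sorting by the numeric suffix matters because a plain string sort would put `height10` before `height2` once there are more than ten individuals.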

dplyr to reference two data frames (summarize function) in R

I created a data frame from a data set with unique marketing sources. Let's say I have 20 unique marketing sources in this new data frame, D1. I want to add another column that has the number of times each marketing source appears in my original data frame. I'm trying to use the dplyr package, but I'm not sure how to reference more than one data frame.
The original data has 16,000 observations.
The new data frame has 20 observations, as there are only 20 unique marketing sources.
How to use summarize in dplyr to reference two data frames?
My objective is to find the percentage of marketing sources.
My original data frame has two columns: NAME, MARKETING_SOURCE
This data frame has 16,000 observations and 20 distinct marketing sources (email, event, sales call, etc.).
I created a new data frame with only the unique MARKETING_SOURCE values and called that data frame D1.
In my new data frame, I want to add another column that has the number of times each marketing source appeared in the original data frame.
My new data frame should have two columns: MARKETING_SOURCE, COUNT
I don't know if you need to use dplyr for something like this...
First let's create some data.frames:
df1 <- data.frame(source = letters[sample(1:26, 400, replace = TRUE)])
df2 <- data.frame(source = letters, count = NA)
Then we can use table() to get the frequencies:
counts <- table(df1$source)
df2$count <- counts
head(df2)
source count
1 a 10
2 b 22
3 c 12
4 d 17
5 e 18
6 f 18
UPDATE:
In response to @MrFlick's wise comment below, you can take the names() of the output from table() to ensure the order is preserved:
df2$source <- names(counts)
This is certainly not as elegant, and it would be even less elegant if df2 had other columns, but it is sufficient for the simple case presented above.
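Because the asker's stated goal is the percentage of each marketing source, you can also build D1 directly from the counts, so no second data frame has to be kept in the same order. A minimal base-R sketch, assuming the column names from the question; the 16,000-row data and the three source values are hypothetical stand-ins.

```r
# Hypothetical stand-in for the original 16,000-row data frame
set.seed(42)
orig <- data.frame(
  NAME = paste0("person", 1:16000),
  MARKETING_SOURCE = sample(c("email", "event", "sales call"), 16000, replace = TRUE)
)

# Count each source; responseName names the count column
D1 <- as.data.frame(table(orig$MARKETING_SOURCE),
                    responseName = "COUNT", stringsAsFactors = FALSE)
names(D1)[1] <- "MARKETING_SOURCE"

# Share of all rows, in percent
D1$PERCENT <- 100 * D1$COUNT / sum(D1$COUNT)
D1
```

In dplyr, the equivalent is roughly `count(orig, MARKETING_SOURCE)` followed by `mutate(PERCENT = 100 * n / sum(n))`, since `count()` names its count column `n` by default.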
