R predict missing values [closed] - r

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
How should I predict missing values (NA) based on the other values in R? Filling in the mean is not enough.
All values depend on each other: the columns are tree scope rate and the rows are tree height in meters.
My excel file is here.
Is there any possible way to do that? I've been trying the predict function with no success.

There are a number of ways to go about this, but here is one. I also tried it on your dataset, but the imputation did not converge, possibly because the dataset is too small, has too many linearly dependent columns, or something else.
Amelia - http://fastml.com/impute-missing-values-with-amelia/
library(Amelia) #provides amelia()
data(mtcars)
mtcars1<-mtcars[rep(row.names(mtcars),10),] #increasing dataset
#inserting NAs into dataset
insert_nas <- function(x) {
len <- length(x)
n <- sample(1:floor(0.2*len), 1) #randomly choosing # of missing obs
i <- sample(1:len, n) #choosing which to make missing
x[i] <- NA
x
}
mtcars1 <- sapply(mtcars1, insert_nas)
ords = c( 'cyl','hp','vs','am','gear','carb' ) #integers - your dataset has no integers so don't specify this
#idvars = c( 'these', 'will', 'be', 'ignored' )
#noms = c( 'some', 'nominal', 'columns' ) #categorical
a.out = amelia( mtcars1, ords = ords)
a.out$imputations[[1]]
#you can also ensemble your imputations if you'd like. Here we ensemble 3 of the 5 returned imputations
final_data<-as.data.frame(sapply(colnames(a.out$imputations[[1]]),function(i)
rowMeans(cbind(a.out$imputations[[1]][,i],a.out$imputations[[2]][,i],a.out$imputations[[3]][,i]))))
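Since the question mentions predict, here is a minimal sketch of a simpler, single-imputation alternative: fit a regression on the complete rows and predict only the missing ones. The toy data frame `trees` and its columns `height` and `rate` are made up for illustration and are not the asker's file. Unlike Amelia, this gives one point estimate per gap and ignores imputation uncertainty.

```r
# Hypothetical toy data: height in metres and a rate that depends on it
set.seed(1)
trees <- data.frame(height = seq(2, 30, by = 2))
trees$rate <- 0.5 + 0.1 * trees$height + rnorm(nrow(trees), sd = 0.05)
trees$rate[c(3, 8, 12)] <- NA  # knock out a few values

# lm() drops incomplete rows by default, so the fit uses observed values only
fit <- lm(rate ~ height, data = trees)

# predict only for the rows whose rate is missing
missing <- is.na(trees$rate)
trees$rate_filled <- trees$rate
trees$rate_filled[missing] <- predict(fit, newdata = trees[missing, , drop = FALSE])

trees[missing, ]
```

Observed values are left untouched; only the NA entries are replaced by fitted values.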

Related

Dice Probability in R script [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
Roll five six-sided dice. Write a script in R to calculate the probability of getting between 15 and 20 as the total sum of your roll. Exact solutions are preferred.
dice <- expand.grid(1:6, 1:6, 1:6, 1:6, 1:6)
dice.sums <- rowSums(dice)
mean(15 <= dice.sums & dice.sums <=20)
[1] 0.5570988
This is the code that I have; the answer comes out to 0.5570988. Is there any other way to write it in one line of code, or to condense it? Any thoughts are welcome.
From this answer, which references this answer:
dDice <- Vectorize(function(k, m, n) {
# returns the probability of n m-sided dice summing to k
s <- 0:(floor((k - n)/m))
return(sum((-1)^(s)*choose(n, s)*choose(k - s*m - 1, n - 1))/m^n)
}, "k")
sum(dDice(15:20, 6, 5))
#> [1] 0.5570988
Note that I did not take care in the order in which I added the terms of the alternating sum, so the function may need to be modified to return accurate probabilities for larger input values.
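As a cross-check that avoids the alternating sum entirely, the number of ways to reach each total can be built up by repeated convolution of integer counts. This is only a sketch; the helper `conv` is a name introduced here, not part of either referenced answer.

```r
# Convolve two count vectors: out[k] = sum of a[i] * b[j] over i + j - 1 == k
conv <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (j in seq_along(b)) {
    idx <- j:(j + length(a) - 1)
    out[idx] <- out[idx] + a * b[j]
  }
  out
}

one_die <- rep(1, 6)                       # one way to roll each face 1..6
ways <- Reduce(conv, rep(list(one_die), 5)) # counts for totals 5..30
names(ways) <- 5:30

sum(ways[as.character(15:20)]) / 6^5
#> [1] 0.5570988
```

Because every intermediate value is a small whole number of outcomes, the only rounding happens in the final division.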

Splitting one big dataframe into multiple CSV.files [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
Firstly, I have a big data.frame with 104 rows and 12 columns. I would like to split it into 13 data.frames of 8 rows each, keeping all 12 columns.
I am trying to write code robust enough that it does not care how many rows there are but simply makes a new data.frame every 8 rows.
Also, is it possible after this point to write code that loops through the 13 data.frames for some calculations?
Here is a way using the split method from data.table:
library(data.table)
#sample data
set.seed(123)
AA <- data.frame( data = rnorm(104) )
#set number of rows to split on
chunksize = 8
#create row IDs, then split on them
l <- split( setDT(AA)[, rowID := (.I-1) %/% chunksize][], by = "rowID")
#names of the list will become the names of the data.frames
names(l) <- paste0( "df", names(l) )
#write the elements of the list to the global environment, using their names
list2env( l, envir = globalenv() )
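For the follow-up about looping over the pieces and writing CSV files, it is usually easier to keep the chunks in a list than to scatter them into the global environment. Here is a base R sketch under that assumption; the output folder under tempdir() and the names `chunks`, `chunk_means`, and `out_dir` are placeholders chosen for illustration.

```r
set.seed(123)
AA <- data.frame(data = rnorm(104))
chunksize <- 8

# group rows 1-8, 9-16, ... regardless of how many rows there are
chunk_id <- (seq_len(nrow(AA)) - 1) %/% chunksize
chunks <- split(AA, chunk_id)
names(chunks) <- paste0("df", names(chunks))

# loop over the chunks for a calculation (here: the mean of each chunk)
chunk_means <- sapply(chunks, function(d) mean(d$data))

# write each chunk to its own CSV file in a hypothetical output folder
out_dir <- file.path(tempdir(), "chunks")
dir.create(out_dir, showWarnings = FALSE)
for (nm in names(chunks)) {
  write.csv(chunks[[nm]], file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}
```

If the row count is not a multiple of chunksize, the last chunk simply holds the remaining rows.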

R: sort rows, query them and add results as column [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have an R dataframe with the dimension 32 x 11. For each row I would like to determine the highest, the second highest, and the third highest value and add these values as extra columns to the initial dataframe (making it 32 x 14). Many thanks in advance!
library(car)
data(mtcars)
mtcars
First, create a function to get the nth highest value of a vector. Then, create a copy of the dataframe, since the second highest value may change as you add more columns. Then apply your function with apply, using MARGIN = 1 to operate row-wise. I'm not sure what would happen if there are NAs in the data. I haven't tested it...
Something like this...
nth_highest <- function(x, n)sort(x, decreasing=TRUE)[n]
tmp <- mtcars
mtcars$highest <- apply(tmp, 1, function(x)nth_highest(x,1))
mtcars$second_highest <- apply(tmp, 1, function(x)nth_highest(x,2))
mtcars$third_highest <- apply(tmp, 1, function(x)nth_highest(x,3))
rm(tmp)
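On the NA question: sort() drops NA values by default, so a row with missing entries is ranked on its non-missing values only, and the result is NA when fewer than n values remain. The sketch below shows that and also computes all three columns in one pass; `top_n`, `top3`, and `result` are names introduced here for illustration.

```r
# return the n largest values of x (NAs are dropped by sort)
top_n <- function(x, n = 3) sort(x, decreasing = TRUE)[seq_len(n)]

# apply() returns one column per row of the input, so transpose the result
top3 <- t(apply(datasets::mtcars, 1, top_n))
colnames(top3) <- c("highest", "second_highest", "third_highest")
result <- cbind(datasets::mtcars, top3)
dim(result)
#> [1] 32 14

# behaviour with missing values
top_n(c(5, NA, 9, 1), 2)
#> [1] 9 5
top_n(c(5, NA), 2)
#> [1]  5 NA
```

Working from the untouched dataset in a single apply call also removes the need for the temporary copy.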

Randomly generate numbers based on given probabilities R [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I need help creating a for loop to fill in a 5x5 table using R. Each row will be one observation without replacement. The number range is 1:75, and I have a probability for each of these numbers. So how would I go about writing random number generating code that takes into account the specific probability for each number?
Here is some sample data:
A <- seq_len(75)
B <- rpois(75, 3)
B <- B / sum(B)
So now B is a probability vector for each element in A.
To pull 25 samples, simply use sample(A, size = 25, replace = FALSE, prob = B). Fill the matrix as usual: MAT <- matrix(sample(A, size = 25, replace = FALSE, prob = B), nrow = 5).
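Put together as a runnable sketch, with a fixed seed so the draw is reproducible. The seed value and the guard on non-zero probabilities are additions for illustration: rpois can produce zeros, and drawing 25 distinct numbers needs at least 25 entries with a positive probability.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

A <- seq_len(75)
B <- rpois(75, 3)
B <- B / sum(B)  # normalise to a probability vector

# sampling without replacement needs enough numbers with non-zero weight
stopifnot(sum(B > 0) >= 25)

MAT <- matrix(sample(A, size = 25, replace = FALSE, prob = B), nrow = 5)
MAT

# every number appears at most once
anyDuplicated(as.vector(MAT))
#> [1] 0
```

No for loop is needed here, since matrix() fills the 5x5 table from the 25 sampled values directly.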

computing the dot product between all column pairs in a data frame [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have an R data frame whose columns are logical variables.
I need to compute some kind of dot product between all possible pairs of columns.
This arises from text corpus analysis, where the data frame indicates which terms (rows) are present in which documents (columns). There are common, fast solutions for the case where one wishes to compute distances between each possible pair of columns, using daisy from the cluster package or cosine from the lsa package.
I would however need to use some kind of dot product between all pairs of columns instead: the goal is to count how many words are simultaneously present in both documents being compared (and this, for each pair).
Let's use this example:
df <- data.frame(x1 = c(TRUE, TRUE, FALSE), x2 = c(FALSE, FALSE, FALSE), x3 = c(TRUE, FALSE, TRUE))
I would turn the data.frame into a matrix then compute the crossproduct:
crossprod(data.matrix(df))
#    x1 x2 x3
# x1  2  0  1
# x2  0  0  0
# x3  1  0  2
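To see why the cross product gives the wanted counts: each off-diagonal entry is the number of rows where both columns are TRUE, and each diagonal entry is the number of terms in that document. A small sketch checking this against an explicit pairwise count; the names `m`, `pairs`, and `shared` are introduced here for illustration.

```r
df <- data.frame(x1 = c(TRUE, TRUE, FALSE),
                 x2 = c(FALSE, FALSE, FALSE),
                 x3 = c(TRUE, FALSE, TRUE))

m <- crossprod(data.matrix(df))

# explicit count of shared terms for every pair of columns
pairs <- combn(names(df), 2)
shared <- apply(pairs, 2, function(p) sum(df[[p[1]]] & df[[p[2]]]))
names(shared) <- apply(pairs, 2, paste, collapse = "-")
shared
#> x1-x2 x1-x3 x2-x3 
#>     0     1     0 

# the same numbers sit in the upper triangle of the cross product
all(shared == m[upper.tri(m)])
#> [1] TRUE
```

The explicit loop is only a check; crossprod does the same work in a single matrix operation.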
