I have the following code:
model$data
[[1]]
Category1 Category2 Category3 Category4
3555 1 0 0 0
6447 1 0 0 0
5523 1 0 1 0
7550 1 0 1 0
6330 1 0 1 0
2451 1 0 0 0
4308 1 0 1 0
8917 0 0 0 0
4780 1 0 1 0
6802 1 0 1 0
2021 1 0 0 0
5792 1 0 1 0
5475 1 0 1 0
4198 1 0 0 0
223 1 0 1 0
4811 1 0 1 0
678 1 0 1 0
I am trying to use this call to randomly pick one of the column names:
sample(colnames(model$data), 1)
But I receive the following error message:
Error in sample.int(length(x), size, replace, prob) :
invalid first argument
Is there a way to avoid that error?
Notice this?
model$data
[[1]]
The [[1]] means that model$data is a list, whose first component is a data frame. To do anything with it, you need to pass model$data[[1]] to your code, not model$data.
sample(colnames(model$data[[1]]), 1)
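A minimal sketch that reproduces the error and the fix. The `model` object below is a hypothetical stand-in, built only to mimic the printed structure in the question (a list whose first element is a data frame); the real object may differ.

```r
# Hypothetical stand-in for the asker's object: model$data is a list
# whose first (and only) element is a data frame.
model <- list(data = list(data.frame(
  Category1 = c(1, 1, 0),
  Category2 = c(0, 0, 0),
  Category3 = c(0, 1, 0),
  Category4 = c(0, 0, 0)
)))

class(model$data)        # "list", not "data.frame"
colnames(model$data)     # NULL: a plain list has no column names

# Sampling from NULL is what triggers "invalid first argument".
err <- tryCatch(sample(colnames(model$data), 1), error = function(e) e)
inherits(err, "error")   # TRUE

# Extract the data frame first, then sample one column name.
df <- model$data[[1]]
picked <- sample(colnames(df), 1)
picked %in% colnames(df) # TRUE
```

Running str(model$data) is a quick way to spot this kind of nesting before indexing.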
This seems to be a near-duplicate of "Random rows in dataframes in R" and should probably be closed as a duplicate. But for completeness, adapting that answer to sampling column indices is trivial:
You don't need to generate a vector of column names, only their indices. Keep it simple.
Sample your column indices from 1:ncol(df) instead of 1:nrow(df).
Then put those column indices on the RHS of the comma in df[, ...]:
df[, sample(ncol(df), 1)]
The 1 is because you apparently want a sample of size 1.
One minor complication is that your data frame is model$data[[1]], since your model$data looks like a list with one element that is a data frame, rather than a plain data frame. So first, assign df <- model$data[[1]].
Finally, if you really want the sampled column name(s) as well as their indices:
samp_col_idxs <- sample(ncol(df), 1)
samp_col_names <- colnames(df)[samp_col_idxs]
How to convert this
1,2,5,6,9
1,2
3,11
into this:
1,1,0,0,1,1,0,0,1,0,0
1,1,0,0,0,0,0,0,0,0,0
0,0,1,0,0,0,0,0,0,0,1
I thought I could read my data by inserting NA wherever an index does not exist,
then replace each NA with zero and each non-NA value with one.
But I don't know how to do that, and I searched for similar code without finding any.
You can do:
lapply(z,tabulate,nbins=max(unlist(z)))
[[1]]
[1] 1 1 0 0 1 1 0 0 1 0 0
[[2]]
[1] 1 1 0 0 0 0 0 0 0 0 0
[[3]]
[1] 0 0 1 0 0 0 0 0 0 0 1
where z is a list of vectors:
z <- list(c(1,2,5,6,9),c(1,2),c(3,11))
I'm not sure what your original numbers are stored as, but here's a solution assuming it's a list of vectors:
nums <-list(
c(1,2,5,6,9),
c(1,2),
c(3,11)
)
maxn <- max(unlist(nums))
lapply(nums, function(x) {
binary <- numeric(maxn)
binary[x] <- 1
binary
})
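If a single 0/1 matrix is more convenient than a list of vectors, the per-element results can be stacked into rows. A small sketch, assuming the input is the same list `z` of positive integer indices used above and that no index is repeated within an element (tabulate counts occurrences, so a repeated index would give a value above 1):

```r
z <- list(c(1, 2, 5, 6, 9), c(1, 2), c(3, 11))
nbins <- max(unlist(z))

# sapply() returns one column per list element, so transpose
# to get one row per original vector.
m <- t(sapply(z, tabulate, nbins = nbins))
m

# Print the rows in the comma-separated form shown in the question.
write.table(m, sep = ",", row.names = FALSE, col.names = FALSE)
```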
I am a new R user. I am currently working on a dataset in which I have to transform multiple binary columns into a single factor column.
Here is an example.
The current dataset looks like this:
$ Property.RealEstate : num 1 1 1 0 0 0 0 0 1 0 ...
$ Property.Insurance : num 0 0 0 1 0 0 1 0 0 0 ...
$ Property.CarOther : num 0 0 0 0 0 0 0 1 0 1 ...
$ Property.Unknown : num 0 0 0 0 1 1 0 0 0 0 ...
Property.RealEstate Property.Insurance Property.CarOther Property.Unknown
1 0 0 0
0 1 0 0
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
Recoded column should be:
Property
1 Real estate
2 Insurance
3 Real estate
4 Insurance
5 CarOther
6 Unknown
It is basically a reverse of melt.matrix function.
Thank you all for your valuable input. It does work.
There is one issue, though:
some rows contain only zeros:
Property.RealEstate Property.Insurance Property.CarOther Property.Unknown
0 0 0 0
I want these to be marked as NA or NULL.
It would help if you could suggest how to handle this as well.
Thank you.
> mat <- matrix(c(0,1,0,0,0,
+ 1,0,0,0,0,
+ 0,0,0,1,0,
+ 0,0,1,0,0,
+ 0,0,0,0,1), ncol = 5, byrow = TRUE)
> colnames(mat) <- c("Level1","Level2","Level3","Level4","Level5")
> mat
Level1 Level2 Level3 Level4 Level5
[1,] 0 1 0 0 0
[2,] 1 0 0 0 0
[3,] 0 0 0 1 0
[4,] 0 0 1 0 0
[5,] 0 0 0 0 1
# Create a new factor based upon the index of each 1 in each row
# Use the matrix column names as the labels for each level
NewFactor <- factor(apply(mat, 1, function(x) which(x == 1)),
labels = colnames(mat))
> NewFactor
[1] Level2 Level1 Level4 Level3 Level5
Levels: Level1 Level2 Level3 Level4 Level5
You can also try:
factor(mat%*%(1:ncol(mat)), labels = colnames(mat))
Or use Tomas's solution, which I found somewhere on SO:
as.factor(colnames(mat)[mat %*% 1:ncol(mat)])
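Rows that contain only zeros, as raised in the question's follow-up, break the matrix-product one-liners: the product is 0 for such a row, and indexing with 0 silently drops that element, so the result ends up shorter than the data. A hedged sketch of one way to get NA instead, assuming each row holds at most one 1 (the small `mat` below is illustrative data, not the asker's):

```r
mat <- matrix(c(0, 1, 0,
                1, 0, 0,
                0, 0, 0,   # all-zero row: should become NA
                0, 0, 1), ncol = 3, byrow = TRUE)
colnames(mat) <- c("Level1", "Level2", "Level3")

# Guard the assumption that no row has more than one 1.
stopifnot(all(rowSums(mat == 1) <= 1))

# Column index of the 1 in each row; 0 means "no 1 in this row".
idx <- as.vector(mat %*% seq_len(ncol(mat)))
idx[idx == 0] <- NA

# Fixing the levels explicitly keeps unused levels and avoids mislabelling.
NewFactor <- factor(colnames(mat)[idx], levels = colnames(mat))
NewFactor
```

Setting `levels` rather than `labels` matters here: with `labels`, a level that never occurs in the data would shift the remaining labels onto the wrong values.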
Melt is certainly a solution. I'd suggest using the reshape2 melt as follows:
library(reshape2)
df=data.frame(Property.RealEstate=c(0,0,1,0,0,0),
Property.Insurance=c(0,1,0,1,0,0),
Property.CarOther=c(0,0,0,0,1,0),
Property.Unknown=c(0,0,0,0,0,1))
#add id column (presumably you have ids more meaningful than row numbers)
df$row=1:nrow(df)
#melt to "long" format
long=melt(df,id="row")
#only keep 1's
long=long[which(long$value==1),]
#merge in ids for NA entries
long=merge(df[,"row",drop=F],long,all.x=T)
#clean up to match example output
long=long[order(long$row),"variable",drop=F]
names(long)="Property"
long$Property=gsub("Property.","",long$Property,fixed=T)
#results
long
Alternately, you can just do it in the naïve way. I think it's more transparent than any of the other suggestions (including my other suggestion).
df=data.frame(Property.RealEstate=c(0,0,1,0,0,0),
Property.Insurance=c(0,1,0,1,0,0),
Property.CarOther=c(0,0,0,0,1,0),
Property.Unknown=c(0,0,0,0,0,1))
propcols=c("Property.RealEstate", "Property.Insurance", "Property.CarOther", "Property.Unknown")
df$Property=NA
for(colname in propcols) {
  coldata=df[,colname]
  df$Property[which(coldata==1)]=colname
}
df$Property=gsub("Property.","",df$Property,fixed=T)
Something different:
Get the data:
dat <- data.frame(Property.RealEstate=c(1,0,1,0,0,0),Property.Insurance=c(0,1,0,1,0,0),Property.CarOther=c(0,0,0,0,1,0),Property.Unknown=c(0,0,0,0,0,1))
Reshape it:
names(dat)[row(t(dat))[t(dat)==1]]
#[1] "Property.RealEstate" "Property.Insurance" "Property.RealEstate"
#[4] "Property.Insurance" "Property.CarOther" "Property.Unknown"
If you want it cleaned up, do:
gsub("Property\\.","",names(dat)[row(t(dat))[t(dat)==1]])
#[1] "RealEstate" "Insurance" "RealEstate" "Insurance" "CarOther" "Unknown"
If you prefer a factor output:
factor(row(t(dat))[t(dat)==1],labels=names(dat))
...and cleaned up:
factor(row(t(dat))[t(dat)==1],labels=gsub("Property\\.","",names(dat)) )
I'm trying to create a matrix with 180*12 rows and 12 columns in R. I'm not sure what R code would create something like this.
Column 1: 1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,..................0
Column 2: 0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,..................0
Column 3: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,..................0
Etc., with the same pattern up to Column 12. Can someone help me out? Thanks in advance.
apply(diag(12), 2, rep, each=12)
A shorter example:
apply(diag(3), 2, rep, each=2)
## [,1] [,2] [,3]
## [1,] 1 0 0
## [2,] 1 0 0
## [3,] 0 1 0
## [4,] 0 1 0
## [5,] 0 0 1
## [6,] 0 0 1
Another very similar solution, without an explicit apply:
matrix(rep(diag(12), each=12), ncol=12)
This works because as.vector(diag(N)) is a vector with N 1's, each separated by N 0's. An example with diag(3), each=2, ncol=3 is identical to the example above.
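A quick check of that claim, as a sketch: it prints the flattened identity matrix for a small N, confirms that the two constructions agree, and then verifies the shape of the 12-column case. The variable names are illustrative only.

```r
N <- 3
as.vector(diag(N))   # 1 0 0 0 1 0 0 0 1

# Both constructions repeat every element of each column `each` times.
a <- apply(diag(N), 2, rep, each = 2)
b <- matrix(rep(diag(N), each = 2), ncol = N)
all.equal(a, b)      # TRUE

# The 12-column case: twelve 1s per column, exactly one 1 per row.
big <- matrix(rep(diag(12), each = 12), ncol = 12)
dim(big)             # 144 12
colSums(big)
```

Changing `each` controls the number of consecutive 1s per column, so a different block length only requires a different `each` value.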
Just for laughs, here is a model.matrix version of @MatthewLundberg's answer:
model.matrix( ~ rep(factor(1:3),each=2) - 1)
a <- rep(factor(1:3),each=2)
model.matrix( ~ a - 1)
a1 a2 a3
1 1 0 0
2 1 0 0
3 0 1 0
4 0 1 0
5 0 0 1
6 0 0 1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"
Or all in one line:
model.matrix( ~ rep(factor(1:3),each=2) - 1)
And the class.ind approach from the nnet package:
library(nnet)
class.ind(rep(factor(1:3),each=2))
I have a dataframe (df1) like this.
f1 f2 f3 f4 f5
d1 1 0 1 1 1
d2 1 0 0 1 0
d3 0 0 0 1 1
d4 0 1 0 0 1
The d1...d4 column holds the row names, and the f1...f5 row holds the column names.
If I do sample(df1), I get a new dataframe with the same count of 1s as df1. So the count of 1s is conserved for the whole dataframe, but not for each row or each column.
Is it possible to do the randomization row-wise or column-wise?
I want to randomize df1 column-wise for each column, i.e. the number of 1s in each column remains the same, and each column needs to be changed at least once. For example, I may have a randomized df2 like this (note that the count of 1s in each column remains the same, but the count of 1s in each row is different):
f1 f2 f3 f4 f5
d1 1 0 0 0 1
d2 0 1 0 1 1
d3 1 0 0 1 1
d4 0 0 1 1 0
Likewise, I also want to randomize df1 row-wise for each row, i.e. the number of 1s in each row remains the same, and each row needs to be changed (but the number of changed entries could be different). For example, a randomized df3 could be something like this:
f1 f2 f3 f4 f5
d1 0 1 1 1 1 <- two entries are different
d2 0 0 1 0 1 <- four entries are different
d3 1 0 0 0 1 <- two entries are different
d4 0 0 1 0 1 <- two entries are different
PS. Many thanks for the help from Gavin Simpson, Joris Meys and Chase for the previous answers to my previous question on randomizing two columns.
Given the R data.frame:
> df1
a b c
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0
Shuffle row-wise:
> df2 <- df1[sample(nrow(df1)),]
> df2
a b c
3 0 1 0
4 0 0 0
2 1 0 0
1 1 1 0
By default, sample() randomly reorders the elements passed as its first argument, so the default size is the length of that input; given a single number n, as here, it permutes 1:n. The default replace=FALSE ensures that sampling is done without replacement, which accomplishes a row-wise shuffle.
Shuffle column-wise:
> df3 <- df1[,sample(ncol(df1))]
> df3
c a b
1 0 1 1
2 0 1 0
3 0 0 1
4 0 0 0
This is another way to shuffle the data.frame, using the dplyr package (load it first with library(dplyr)):
row-wise:
df2 <- slice(df1, sample(1:n()))
or
df2 <- sample_frac(df1, 1L)
column-wise:
df2 <- select(df1, one_of(sample(names(df1))))
Take a look at permatswap() in the vegan package. Here is an example maintaining both row and column totals, but you can relax that and fix only one of the row or column sums.
library(vegan)
mat <- matrix(c(1,1,0,0,0,0,0,1,1,0,0,0,1,1,1,0,1,0,1,1), ncol = 5)
set.seed(4)
out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")
This gives:
R> out$perm[[1]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 1
[2,] 0 1 0 1 0
[3,] 0 0 0 1 1
[4,] 1 0 0 0 1
R> out$perm[[2]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 1 0 1 1
[2,] 0 0 0 1 1
[3,] 1 0 0 1 0
[4,] 0 0 1 0 1
To explain the call:
out <- permatswap(mat, times = 99, burnin = 20000, thin = 500, mtype = "prab")
times is the number of randomised matrices you want, here 99
burnin is the number of swaps made before we start taking random samples. This allows the matrix from which we sample to be quite random before we start taking each of our randomised matrices
thin says only take a random draw every thin swaps
mtype = "prab" says treat the matrix as presence/absence, i.e. binary 0/1 data.
A couple of things to note: this doesn't guarantee that any particular column or row has been randomised, but if burnin is long enough there should be a good chance of that having happened. Also, you could draw more random matrices than you need and discard the ones that don't match all your requirements.
Your requirement to have different numbers of changes per row isn't covered here either. Again, you could sample more matrices than you want and then discard the ones that don't meet this requirement.
You can also use the randomizeMatrix function in the R package picante.
Example:
test <- matrix(c(1,1,0,1,0,1,0,0,1,0,0,1,0,1,0,0),nrow=4,ncol=4)
> test
[,1] [,2] [,3] [,4]
[1,] 1 0 1 0
[2,] 1 1 0 1
[3,] 0 0 0 0
[4,] 1 0 1 0
randomizeMatrix(test,null.model = "frequency",iterations = 1000)
[,1] [,2] [,3] [,4]
[1,] 0 1 0 1
[2,] 1 0 0 0
[3,] 1 0 1 0
[4,] 1 0 1 0
randomizeMatrix(test,null.model = "richness",iterations = 1000)
[,1] [,2] [,3] [,4]
[1,] 1 0 0 1
[2,] 1 1 0 1
[3,] 0 0 0 0
[4,] 1 0 1 0
>
The option null.model = "frequency" maintains column sums, and null.model = "richness" maintains row sums.
Though mainly used for randomizing species presence/absence datasets in community ecology, it works well here.
This function has other null model options as well; see the picante documentation (page 36) for more details.
Of course you can sample each row:
sapply (1:4, function (row) df1[row,]<<-sample(df1[row,]))
will shuffle the entries within each row, so the number of 1's in each row doesn't change. With small changes it also works for columns, but that is an exercise for the reader :-P
If the goal is to randomly shuffle each column independently, some of the above answers don't work, since they shuffle the columns jointly (which preserves inter-column correlations). Others require installing a package. Yet a one-liner exists (lapply returns a list, so wrap it in as.data.frame to get a data frame back):
df2 = as.data.frame(lapply(df1, function(x) { sample(x) }))
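A per-column shuffle does not by itself guarantee that every column actually changes, which the question asks for. Below is a minimal base-R sketch that retries until at least one entry moves. `shuffle_changed` and `max_tries` are hypothetical names introduced here, and `df1` is rebuilt from the table in the question.

```r
# df1 as given in the question
df1 <- data.frame(
  f1 = c(1, 1, 0, 0),
  f2 = c(0, 0, 0, 1),
  f3 = c(1, 0, 0, 0),
  f4 = c(1, 1, 1, 0),
  f5 = c(1, 0, 1, 1),
  row.names = c("d1", "d2", "d3", "d4")
)

# Hypothetical helper: permute x, retrying until some position differs.
shuffle_changed <- function(x, max_tries = 100) {
  x <- unname(x)
  if (length(unique(x)) < 2) return(x)   # a constant vector cannot change
  for (i in seq_len(max_tries)) {
    y <- x[sample.int(length(x))]        # index-based, safe for any length
    if (any(y != x)) return(y)
  }
  stop("no changed permutation found within max_tries")
}

# Column-wise: each column is permuted on its own, so column sums are kept.
df2 <- df1
df2[] <- lapply(df1, shuffle_changed)

# Row-wise: each row is permuted on its own, so row sums are kept.
m <- as.matrix(df1)
m3 <- t(apply(m, 1, shuffle_changed))
dimnames(m3) <- dimnames(m)
df3 <- as.data.frame(m3)
```

Permuting through `sample.int(length(x))` rather than `sample(x)` sidesteps the special case where `sample()` treats a single number as the range 1:n.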
You can also "sample" the same number of items in your data frame with something like this:
nr<-dim(M)[1]
random_M = M[sample.int(nr),]
Random samples and permutations in a data frame:
If the data is in matrix form, convert it into a data.frame,
then use the sample function from the base package:
indexes = sample(1:nrow(df1), size=1*nrow(df1))
df1[indexes, ]
See the "Random Samples and Permutations" help page (?sample).
Here is a data.table option using .N with sample like this:
library(data.table)
setDT(df)
df[sample(.N)]
#> a b c
#> 1: 0 1 0
#> 2: 1 1 0
#> 3: 1 0 0
#> 4: 0 0 0
Created on 2023-01-28 with reprex v2.0.2
Data:
df <- read.table(text = " a b c
1 1 1 0
2 1 0 0
3 0 1 0
4 0 0 0", header = TRUE)