Find range in which each vector element falls in [duplicate] - r

This question already has answers here:
Cut by Defined Interval
(2 answers)
Closed 4 years ago.
I have a vector of random numbers:
x=sample(1:1000, 3)
Is there a simple way to find the range (defined by the breakpoints below) in which each element falls?
id=seq(1, 1000, by=50)
[1] 1 51 101 151 201 251 301 351 401 451 501 551
[13] 601 651 701 751 801 851 901 951
For example:
x
[1] 637 374 68
distribution
[1] "601~650" "351~400" "51~100"

Try this simple solution using findInterval() (the output below is from a different random draw of x):
cbind(x,lim_inf=id[findInterval(x,id)],lim_sup=id[findInterval(x,id)+1])
x lim_inf lim_sup
[1,] 378 351 401
[2,] 609 601 651
[3,] 496 451 501
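To get labels in the "601~650" form the question asks for, a minimal sketch, assuming integer data and the bin width of 50 implied by `id`, is to paste the lower breakpoint and the lower breakpoint plus 49. The `cut()` variant is included because the linked duplicate is about cutting by defined intervals; the upper break 1001 is an assumption that closes the last bin.

```r
id <- seq(1, 1000, by = 50)   # breakpoints 1, 51, ..., 951
x <- c(637, 374, 68)          # the example values from the question

# findInterval() returns the index of the last breakpoint <= each x
lower <- id[findInterval(x, id)]
distribution <- paste0(lower, "~", lower + 49)
print(distribution)
# [1] "601~650" "351~400" "51~100"

# Same labels with cut(): half-open bins [1, 51), [51, 101), ..., [951, 1001)
labs <- paste0(id, "~", id + 49)
distribution2 <- as.character(cut(x, breaks = c(id, 1001),
                                  right = FALSE, labels = labs))
print(distribution2)
```

`findInterval()` is convenient when you also want the numeric bounds (as in the `cbind()` answer above), while `cut()` returns a factor, which is handy for tabulating counts per bin.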

Related

Populate new dataframe with rowMeans for every four columns minus other rowMeans value

I am trying to calculate the means of my data by populating a new data frame with values corrected for my experiment's blank.
So far, I have created the new data frame:
data_mean <- data.frame(matrix(ncol = 17, # As many columns as experimental conditions plus one for "Time(h)"
nrow = nrow(data)))
Then I copied the time column:
data_mean[,1] <- data[,1]
And I populated the data frame by assigning the mean of each condition minus the mean of the blanks to its own column:
data_mean[,2] <- rowMeans(data[,5:8])-rowMeans(data[,2:4])
data_mean[,3] <- rowMeans(data[,9:12])-rowMeans(data[,2:4])
data_mean[,4] <- rowMeans(data[,13:16])-rowMeans(data[,2:4])
data_mean[,5] <- rowMeans(data[,17:20])-rowMeans(data[,2:4])
and so on.
Is there an easier way to do this rather than typing the same code over and over?
res <- sapply(split.default(data[, -1], seq(ncol(data) - 1) %/% 4), rowMeans)
res[, -1] - res[, 1] # each condition's mean minus the blank mean, as above
Here seq(ncol(data) - 1) %/% 4 labels the blank columns 2:4 as group 0 and each following block of four columns as group 1, 2, ..., so res holds one column of row means per group.
Example:
data <- data.frame(matrix(1:200, 10))
res <- sapply(split.default(data[, -1], seq(ncol(data) - 1) %/% 4), rowMeans)
res[, -1] - res[, 1]
       1  2   3   4
 [1,] 35 75 115 155
 [2,] 35 75 115 155
 [3,] 35 75 115 155
 [4,] 35 75 115 155
 [5,] 35 75 115 155
 [6,] 35 75 115 155
 [7,] 35 75 115 155
 [8,] 35 75 115 155
 [9,] 35 75 115 155
[10,] 35 75 115 155
You can check this against the explicit calculation:
rowMeans(data[, 5:8]) - rowMeans(data[, 2:4])
[1] 35 35 35 35 35 35 35 35 35 35 # first column
rowMeans(data[, 9:12]) - rowMeans(data[, 2:4])
[1] 75 75 75 75 75 75 75 75 75 75 # second column
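If you want to fill `data_mean` directly, including the time column, a sketch along these lines may be easier to read. It assumes the layout from the question (time in column 1, blanks in columns 2:4, then consecutive blocks of four replicate columns) and that the number of condition columns is a multiple of four; `starts`, `blank_mean` and `corrected` are illustrative names.

```r
# Toy data with the assumed layout: col 1 = time, cols 2:4 = blank,
# then blocks of 4 replicate columns per condition
data <- data.frame(matrix(1:200, 10))

blank_mean <- rowMeans(data[, 2:4])
# First column of each 4-column condition block: 5, 9, 13, 17
starts <- seq(5, ncol(data), by = 4)

# For each block: row means of its 4 columns minus the blank mean
corrected <- sapply(starts, function(s) rowMeans(data[, s:(s + 3)]) - blank_mean)

data_mean <- data.frame(Time = data[, 1], corrected)
head(data_mean, 3)
```

Computing `blank_mean` once avoids recomputing the blank for every condition, which is what the repeated lines in the question do.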

Turn extraction list to csv file

I have loaded a raster file and a polyline shapefile into R and used the extract function to extract the data from every pixel along the polyline. How do I turn the list output by extract into a CSV file?
Here is a simple self-contained reproducible example (this one is taken from ?raster::extract)
library(raster)
r <- raster(ncol=36, nrow=18, vals=1:(18*36))
cds1 <- rbind(c(-50,0), c(0,60), c(40,5), c(15,-45), c(-10,-25))
cds2 <- rbind(c(80,20), c(140,60), c(160,0), c(140,-55))
lines <- spLines(cds1, cds2)
e <- extract(r, lines)
e is a list
> e
[[1]]
[1] 126 127 161 162 163 164 196 197 200 201 231 232 237 266 267 273 274 302 310 311 338 346 381 382 414 417 450 451 452 453 487 488
[[2]]
[1] 139 140 141 174 175 177 208 209 210 213 243 244 249 250 279 286 322 358 359 394 429 430 465 501 537
and you cannot directly write this to a csv because the list elements (vectors) have different lengths.
So first pad them all to the same length (shorter vectors are filled with NA):
x <- max(sapply(e, length))
ee <- sapply(e, `length<-`, x)
Let's see
head(ee)
# [,1] [,2]
#[1,] 126 139
#[2,] 127 140
#[3,] 161 141
#[4,] 162 174
#[5,] 163 175
#[6,] 164 177
tail(ee)
# [,1] [,2]
#[27,] 450 NA
#[28,] 451 NA
#[29,] 452 NA
#[30,] 453 NA
#[31,] 487 NA
#[32,] 488 NA
And now you can write to a csv file
write.csv(ee, "test.csv", row.names=FALSE)
If I understand what you're asking, you could resolve this by using unlist().
d <- 1:10 # creates a sample vector to use
d <- as.list(d) # converts the vector into a list
d <- unlist(d) # converts the list back into a vector
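Building on the unlist() idea, a long-format table keeps track of which polyline each value came from and needs no NA padding. This is a sketch: the list `e` below is a small hypothetical stand-in for the output of extract(), and the output path is only an example.

```r
# Hypothetical stand-in for the list returned by extract(): one vector per line
e <- list(c(126, 127, 161), c(139, 140))

# Long format: one row per extracted value, tagged with its polyline index
long <- data.frame(line = rep(seq_along(e), lengths(e)),
                   value = unlist(e))
print(long)

# Written to a temporary file here; use your own path in practice
out <- file.path(tempdir(), "extract_long.csv")
write.csv(long, out, row.names = FALSE)
```

The padded matrix approach gives one column per line, which is convenient for viewing; the long format is usually easier to filter or summarise later.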

Calculating median based on segments in r [duplicate]

This question already has answers here:
Calculate group mean, sum, or other summary stats. and assign column to original data
(4 answers)
Closed 5 years ago.
Hi, I want to calculate the median of the values in column A within each segment, where the segment is given by column B. The initial data structure looks like this:
Column A Column B
559 1
559 1
322 1
661 2
661 2
662 2
661 2
753 3
752 3
752 3
752 3
752 3
328 4
328 4
328 4
The calculated medians would be based on column A and the output would look like this:
Column A Column B Median
559 1 559
559 1 559
322 1 559
661 2 661
661 2 661
662 2 661
661 2 661
753 3 752
752 3 752
752 3 752
752 3 752
752 3 752
328 4 328
328 4 328
328 4 328
The median is calculated over column A for each set of rows that share the same value in column B, and then pasted into every row of that set in the Median column.
I need to do this operation in R but haven't been able to crack it. Is there a way to do this with dplyr or any other package?
Thanks
You can use the data.table package and put your data in a data.table. Column names that contain spaces must be wrapped in backticks, not quotes:
library(data.table)
dt <- as.data.table(data)
dt[, Median := median(`Column A`), by = "Column B"]
Here it is done both in base R and with data.table. Apologies in advance: my base R approach might be a bit cumbersome, as I don't use it often.
exampleData=data.frame(A=runif(10,0,10),B=sample(2,10,replace=TRUE))
# Data.frame option
exampleData$Median=tapply(exampleData$A,exampleData$B,median)[as.character(exampleData$B)]
# Data.table option
library(data.table)
exampleData=data.table(exampleData)
exampleData[,Median_Data_Table_Way:=median(A),by=B]
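Another base R option is `ave()`, which applies a function within each group and returns a vector aligned with the original rows. A sketch using the data from the question, assuming the columns are named `A` and `B`:

```r
df <- data.frame(
  A = c(559, 559, 322, 661, 661, 662, 661, 753, 752, 752, 752, 752, 328, 328, 328),
  B = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4)
)

# ave() computes median(A) within each group of B and repeats it
# for every row of that group, so it can be assigned directly
df$Median <- ave(df$A, df$B, FUN = median)
print(df)
```

This reproduces the expected Median column from the question (559, 661, 752, 328 per segment). With dplyr the equivalent is `df %>% group_by(B) %>% mutate(Median = median(A))`.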

PCA on Control and treated data for different timepoints with replicates [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I am new to PCA, so I am a bit confused. My data has 12 samples, of which 6 are control and 6 are treated. There are 2 time points for each of control and treated, and 3 replicates for each time point, which makes 12 samples in total.
My data looks like this:
C21 C22 C23 C41 C42 C43 T21 T22 T23 T41 T42 T43
ENSG00000000003 660 451 493 355 495 444 743 259 422 204 149 623
ENSG00000000005 0 0 0 0 0 0 0 0 0 0 0 0
ENSG00000000419 978 928 1161 641 810 807 1265 361 998 326 239 1055
ENSG00000000457 234 248 444 192 218 326 615 122 395 134 100 406
ENSG00000000460 1096 919 1253 693 907 1185 1648 381 1119 422 269 1267
Now I want to carry out PCA on this data, showing for every gene one point for the control samples and one point for the treated samples (to calculate the Euclidean distance between the control and treated points of each gene). The first six samples should be taken as control and the last six as treated.
Note: I need the genes to be plotted on the PCA graph for control and treated samples (not the samples themselves).
I already did the PCA, but it takes all the data and gives only one point for each gene, not separate points for control and treated for every gene. How can I deal with this? Can anybody help?
DF <- read.table( text = " C21 C22 C23 C41 C42 C43 T21 T22 T23 T41 T42 T43
ENSG00000000003 660 451 493 355 495 444 743 259 422 204 149 623
ENSG00000000005 0 0 0 0 0 0 0 0 0 0 0 0
ENSG00000000419 978 928 1161 641 810 807 1265 361 998 326 239 1055
ENSG00000000457 234 248 444 192 218 326 615 122 395 134 100 406
ENSG00000000460 1096 919 1253 693 907 1185 1648 381 1119 422 269 1267", header = TRUE)
Simply rearrange the input data prior to the PCA. Control and treatment observations should be below each other.
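# Split the 12 columns into control (C*) and treated (T*) blocks
DFc <- DF[, 1:6]
DFt <- DF[, 7:12]
# Strip the letters so both blocks share column names ("21", "22", ..., "43")
names(DFc) <- gsub("[[:alpha:]*]", "", names(DFc))
names(DFt) <- gsub("[[:alpha:]*]", "", names(DFt))
# Tag treated rows so each gene appears once as control and once as treated
rownames(DFt) <- paste0(rownames(DFt), "_t")
# Stack control rows on top of treated rows, then run the PCA
DF1 <- rbind(DFc, DFt)
summary(pca <- princomp(DF1))
biplot(pca)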
Note that this answer does not endorse your statistical approach and only answers the programming question.
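For the Euclidean distance mentioned in the question, one possible sketch is to take each gene's control and treated scores on the first two components and compute their distance. Restricting to two components is an assumption made for plotting convenience, and, like the answer above, this only addresses the programming side, not whether the approach is statistically sound.

```r
DF <- read.table(text = "C21 C22 C23 C41 C42 C43 T21 T22 T23 T41 T42 T43
ENSG00000000003 660 451 493 355 495 444 743 259 422 204 149 623
ENSG00000000005 0 0 0 0 0 0 0 0 0 0 0 0
ENSG00000000419 978 928 1161 641 810 807 1265 361 998 326 239 1055
ENSG00000000457 234 248 444 192 218 326 615 122 395 134 100 406
ENSG00000000460 1096 919 1253 693 907 1185 1648 381 1119 422 269 1267",
                 header = TRUE)

# Same reshaping as the answer: control rows stacked above treated rows
DFc <- DF[, 1:6]
DFt <- DF[, 7:12]
common <- sub("^[CT]", "", names(DFc))   # "21", "22", ..., "43"
names(DFc) <- common
names(DFt) <- common
rownames(DFt) <- paste0(rownames(DFt), "_t")
pca <- princomp(rbind(DFc, DFt))

# Rows 1..n of the scores are controls, rows n+1..2n the treated copies
n <- nrow(DF)
scores <- pca$scores[, 1:2]
ctrl <- scores[seq_len(n), ]
trt  <- scores[n + seq_len(n), ]

# Euclidean distance between each gene's control and treated point (PC1, PC2)
dist_ct <- sqrt(rowSums((ctrl - trt)^2))
print(round(dist_ct, 1))
```

A gene with identical control and treated values (such as the all-zero row) ends up at distance 0.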

How can I draw a boxplot of an R table?

I have a table produced by calling table(...) on a column of data, and I get a table that looks like:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
346 351 341 333 345 415 421 425 429 437 436 469 379 424 387 419 392 396 381 421
I'd like to draw a boxplot of these frequencies, but calling boxplot on the table results in an error:
Error in Axis.table(x = c(333, 368.5, 409.5, 427, 469), side = 2) :
only for 1-D table
I've tried coercing the table to an array with as.array but it seems to make no difference. What am I doing wrong?
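If I understand you correctly, boxplot(c(tab)) or boxplot(as.vector(tab)) should work (credit to @joran as well). Both drop the "table" class, which is what leads boxplot() to Axis.table() and the error; as.array() makes no difference because a 1-D table is already an array and is returned unchanged.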
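A small sketch that rebuilds a 1-D table from the frequencies shown in the question and checks the class before and after conversion (`freq` and `tab` are illustrative names):

```r
# Frequencies from the question, labelled 0..19 like the original table
freq <- c(346, 351, 341, 333, 345, 415, 421, 425, 429, 437,
          436, 469, 379, 424, 387, 419, 392, 396, 381, 421)
tab <- as.table(setNames(freq, 0:19))

print(class(tab))            # still a "table"
v1 <- c(tab)                 # c() keeps only names, dropping the table class
v2 <- as.vector(tab)         # as.vector() drops all attributes
print(class(as.array(tab)))  # as.array() leaves the table class in place

boxplot(v2, main = "Distribution of table frequencies")
```

Either `v1` or `v2` can be passed to `boxplot()`; `v1` keeps the category labels as names, which can be useful if you later want to identify outliers.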
