How to add rownames with no dimensions in R

> Cases <- c(4,46,98,115,88,34)
> Cases
[1] 4 46 98 115 88 34
> str(Cases)
num [1:6] 4 46 98 115 88 34
I want to name the row as "total.cases", but I get the error "attempt to set rownames with no dimensions". I expect the output to be as follows:
total.cases 4 46 98 115 88 34

Your problem is that Cases as you define it is an atomic vector. There is no concept of rows or columns.
I think you probably want a list:
Cases <- list(total.cases = c(4,46,98,115,88,34))
Cases
## $total.cases
## [1] 4 46 98 115 88 34
str(Cases)
## List of 1
## $ total.cases: num [1:6] 4 46 98 115 88 34
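With the list approach, the values are then pulled out by name (a small usage note):
Cases$total.cases
## [1]   4  46  98 115  88  34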

Do you want to print the output in a particular way or do you actually want rownames?
To print Cases how you want, you could just use:
> cat("total.cases ",Cases,"\n")
total.cases 4 46 98 115 88 34
To assign a rowname, you need to actually have rows first. A vector (like Cases) doesn't have rows or columns as dimensions. You could, however, convert it to a matrix:
> matrix(Cases,nrow=1,dimnames=list("total.cases",1:length(Cases)))
            1  2  3   4  5  6
total.cases 4 46 98 115 88 34
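As a further aside (not from the original answer), rbind() on a named argument also produces a one-row matrix with that rowname:
rbind(total.cases = Cases)
##             [,1] [,2] [,3] [,4] [,5] [,6]
## total.cases    4   46   98  115   88   34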

Related

What is a memory-efficient method to spread then gather columns? (see example)

I'm trying to rearrange my data for downstream processing. I found a way to accomplish what I want, but it is memory-intensive and I'm sure there is a more-efficient way.
Here is an example from the data:
   X.1 Label       X
81  81    21 367.138
82  82    21 384.295
83  83    21 159.496
84  84    21 269.927
85  85    22 364.118
86  86    22 154.475
87  87    22 265.861
I want to rearrange the data to create a table of X values for each separate object, as shown below:
        1       2       3       4
1 367.138 384.295 159.496 269.927
2 364.118 154.475 265.861      NA
I can do this just fine using spread, apply, and ldply functions shown below:
X <- apply(tidyr::spread(X, Label,X), 2, function(x) na.omit(x))
X<-X[-1]
X<-plyr::ldply(X, rbind)
X<-as.data.frame(X[-1])
Here's the problem: the spread function generates the following table as an intermediate step:
  X.1       1       2
1  81 367.138      NA
2  82 384.295      NA
3  83 159.496      NA
4  84 269.927      NA
5  85      NA 364.118
6  86      NA 154.475
7  87      NA 265.861
This is fine for small data sets, but for large data sets the intermediate table is huge and I run out of memory, which produces the following error:
Error: cannot allocate vector of size 8.4 Gb
I'm sure there must be a more efficient way of doing this without generating that massive intermediate table. Any ideas?
An option using data.table: rleid(Label) assigns a new group id each time the Label value changes, and rowid(Label) numbers the rows within each Label value, so dcast() can spread X into columns without building the wide intermediate table.
dcast(DT, rleid(Label) ~ rowid(Label), value.var = "X")
#   Label       1       2       3       4
#1:     1 367.138 384.295 159.496 269.927
#2:     2 364.118 154.475 265.861      NA
data
library(data.table)
DT <- fread(text = " X.1 Label X
81 21 367.138
82 21 384.295
83 21 159.496
84 21 269.927
85 22 364.118
86 22 154.475
87 22 265.861")
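For comparison, a base-R sketch of the same reshape that also avoids the wide intermediate table (assuming the original data frame is called df, with columns Label and X):
vals <- split(df$X, df$Label)                  # one vector of X values per Label
n <- max(lengths(vals))                        # width of the widest group
t(sapply(vals, function(v) c(v, rep(NA, n - length(v)))))  # pad with NA, one row per Label
#       [,1]    [,2]    [,3]    [,4]
# 21 367.138 384.295 159.496 269.927
# 22 364.118 154.475 265.861      NA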

Ordering list object of IRanges to get all elements decreasing

I am having difficulties trying to order a list element-wise by decreasing order...
I have a ByPos_Mindex object, i.e. a list of 1000 IRanges objects (CG_seqP), from
C <- vmatchPattern(CG, CPGi_Seq, max.mismatch = 0, with.indels = FALSE)
IRanges object with 27 ranges and 0 metadata columns:
           start       end     width
       <integer> <integer> <integer>
   [1]         1         2         2
   [2]         3         4         2
   [3]         9        10         2
   [4]        27        28         2
   [5]        34        35         2
   ...       ...       ...       ...
  [23]       189       190         2
  [24]       207       208         2
  [25]       212       213         2
  [26]       215       216         2
  [27]       218       219         2
(the list contains 1000 of these IRanges objects)
I then change this to a list of only the start integers (which I want)
CG_SeqP <- sapply(C, function(x) sapply(as.vector(x), "[", 1))
[[1]]
[1] 1 3 9 27 34 47 52 56 62 66 68 70 89 110 112
[16] 136 140 146 154 160 163 178 189 207 212 215 218
(1000 of these)
The Problem happens when I try and order the list of elements using
CG_SeqP <- sapply(as.vector(CG_SeqP),order, decreasing = TRUE)
I get a list of what I think are row positions, so if the first IRanges object has 27 ranges I get this:
CG_SeqP[1]
[[1]]
[1] 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8
[21] 7 6 5 4 3 2 1
So the decreasing ordering has worked, but not on my actual list elements?
Any suggestions, thanks in advance.
order() returns the positions of the elements in the sorted sequence, not the actual elements of your vector. To extract the sorted values, let us look at a toy example (I am following your idea here):
set.seed(1)
alist1 <- list(a = sample(1:100, 30))
So, if you print alist1 with the current seed value, you will get the results below:
> alist1
$a
[1] 99 51 67 59 23 25 69 43 17 68 10 77 55 49 29 39 93 16 44
[20] 7 96 92 80 94 34 97 66 31 5 24
Now, to sort them you can use either the sort function or order. sort just sorts the data, whereas order returns the positions of the elements in the sorted sequence; it doesn't return the actual values. Hence we need to feed those positions back into the original vector, using square-bracket indexing, to get the sorted outcome.
lapply(as.vector(alist1),function(x)x[order(x, decreasing = TRUE)])
I have used lapply instead of sapply just to ensure the outcome stays a list; you are free to choose whichever fits your needs.
Will return:
#> lapply(as.vector(alist1),function(x)x[order(x, decreasing = TRUE)])
#$a
# [1] 99 97 96 94 93 92 80 77 69 68 67 66 59 55 51 49 44 43 39
#[20] 34 31 29 25 24 23 17 16 10 7 5
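For completeness, since sort() returns the values directly, the same result can be had without the indexing step (same toy list as above):
lapply(alist1, sort, decreasing = TRUE)
# gives the same $a vector as above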
I hope this clarifies your doubt. Thanks

Calculate number of values in vector that exceed values in column of data.frame

I have a long list of numbers, e.g.
set.seed(123)
y<-round(runif(100, 0, 200))
And I would like to store in column y the number of values that exceed each value in column x of a data frame:
df <- data.frame(x=seq(0,200,20))
I can compute the numbers manually, like this:
length(which(y>=20)) #93 values exceed 20
length(which(y>=40)) #81 values exceed 40
etc. I know I can use a for-loop with all values of x, but is there a more elegant way?
I tried this:
df$y <- length(which(y>=df$x))
But this gives a warning and does not give me the desired output.
The data frame should look like this:
df
     x   y
1    0 100
2   20  93
3   40  81
4   60  70
5   80  61
6  100  47
7  120  40
8  140  29
9  160  19
10 180   8
11 200   0
You can compare each value of df$x against all values of y using sapply:
sapply(df$x, function(a) sum(y>a))
#[1] 99 93 81 70 61 47 40 29 18 6 0
#Looking at your output, maybe you want
sapply(df$x, function(a) sum(y>=a))
#[1] 100 93 81 70 61 47 40 29 19 8 0
Here's another approach using outer, which allows element-wise comparison of the two vectors:
rowSums(outer(df$x,y, "<="))
#[1] 100 93 81 70 61 47 40 29 19 8 0
Yet one more (from alexis_laz's comment): findInterval() with left.open = TRUE counts how many of the sorted y values are strictly less than each x, so subtracting that count from length(y) gives the number of values greater than or equal to x.
length(y) - findInterval(df$x, sort(y), left.open = TRUE)
# [1] 100 93 81 70 61 47 40 29 19 8 0
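To actually fill the y column as in the expected output, the result of any of these can be assigned directly (a small usage note, using the outer() version):
df$y <- rowSums(outer(df$x, y, "<="))
df
# gives the data frame shown in the question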

ASCII Character to Decimal Value in R

Given a word, I need to find the decimal value of each letter in that word and store the values in an array.
I used the strtoi function to achieve this, but later found that the two calls below, which I assumed would give the same output, give different results. Can anyone explain why?
1st attempt
> strtoi("d",16L)
[1] 13
2nd attempt
> strtoi(charToRaw("d"),16L)
[1] 100
And what does the 16L passed as the base of strtoi mean? I am fairly new to the decimal, hex, and octal representations of ASCII characters, so please share some information about it.
For illustration purposes only:
library(purrr)
library(tibble)
input_str <- "Alphabet."
charToRaw(input_str) %>%
  map_df(~data_frame(letter = rawToChar(.),
                     hex_value = toString(.),
                     decimal_value = as.numeric(.)))
## # A tibble: 9 × 3
## letter hex_value decimal_value
## <chr> <chr> <dbl>
## 1 A 41 65
## 2 l 6c 108
## 3 p 70 112
## 4 h 68 104
## 5 a 61 97
## 6 b 62 98
## 7 e 65 101
## 8 t 74 116
## 9 . 2e 46
Since what you need to do can be done all in base R:
as.numeric(charToRaw(input_str))
## [1] 65 108 112 104 97 98 101 116 46
You can also do as.integer() vs as.numeric() if you just need/want integers.
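As for why the two original calls differ (a brief explanatory sketch; base = 16L simply tells strtoi to read the string as a base-16 number):
# "d" read as a single base-16 digit is 13:
strtoi("d", base = 16L)
## [1] 13
# charToRaw("d") is the raw byte 0x64; strtoi() coerces it to the string "64",
# and "64" read in base 16 is 6*16 + 4 = 100, i.e. the ASCII code of "d":
strtoi(charToRaw("d"), base = 16L)
## [1] 100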

In R: Indexing vectors by boolean comparison of a value in range: index==c(min : max)

In R, let's say we have a vector
area = c(rep(c(26:30), 5), rep(c(500:504), 5), rep(c(550:554), 5), rep(c(76:80), 5))
and another vector
yield = c(1:100)
Now, say I want to index like so:
> yield[area==27]
[1] 2 7 12 17 22
> yield[area==501]
[1] 27 32 37 42 47
No problem, right? But weird things start happening when I try to index using c(A, B) (and even weirder things when I try c(min:max)):
> yield[area==c(27,501)]
[1] 7 17 32 42
What I'm expecting is of course all the instances present in the two examples above combined, not just some weird subset of them. This works when I use the OR operator |:
> yield[area==27 | area==501]
[1] 2 7 12 17 22 27 32 37 42 47
But what if I'm working with a range? Say I want to index by the range c(27:503). In my real example there are many more data points and ranges, so please don't suggest I do it by hand, which would essentially mean:
yield[area==27 | area==28 | area==29 | ... | area==303 | ... | area==500 | area==501]
There must be a better way...
You want to use %in%. Also notice that c(27:503) and 27:503 yield the same object.
> yield[area %in% 27:503]
[1] 2 3 4 5 7 8 9 10 12 13 14 15 17
[14] 18 19 20 22 23 24 25 26 27 28 29 31 32
[27] 33 34 36 37 38 39 41 42 43 44 46 47 48
[40] 49 76 77 78 79 80 81 82 83 84 85 86 87
[53] 88 89 90 91 92 93 94 95 96 97 98 99 100
Why not use subset?
subset(yield, area > 26 & area < 504) ## for indexes
subset(area, area > 26 & area < 504) ## for values
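As an aside on why area == c(27, 501) misbehaves: == recycles the shorter vector, so area is compared against 27 and 501 alternately by position rather than checked against both values everywhere. A small sketch with a toy vector (x is hypothetical, just for illustration):
x <- c(27, 501, 501, 27)
x == c(27, 501)    # TRUE TRUE FALSE FALSE -- position-wise, recycled comparison
x %in% c(27, 501)  # TRUE TRUE TRUE TRUE   -- membership test, position-independent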
