Any way to edit values in a matrix in R?

I've parsed a file to extract certain values. One column contains percentages that include the "%" symbol. Is there any way to remove that "%" character?
From this:
98.9% 23 43
92.2% 342 34
98.9% 53 53
82.2% 32 76
97.9% 83 45
92.9% 92 23
to:
98.9 23 43
92.2 342 34
98.9 53 53
82.2 32 76
97.9 83 45
92.9 92 23

You say in the title that you have a matrix, in which case everything in the matrix should already be of type 'character' (a matrix holds a single type, and the "%" values force it to character). Use gsub to replace % with an empty string.
> j <- matrix(c("1%", "2%", 3, 4), ncol = 2)
> j
[,1] [,2]
[1,] "1%" "3"
[2,] "2%" "4"
> gsub("%", "", j)
[,1] [,2]
[1,] "1" "3"
[2,] "2" "4"
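If you want the result to be numeric, you can use apply along with as.numeric. Apply over columns (MARGIN = 2); applying over rows (MARGIN = 1) would return the transposed matrix:
> apply(gsub("%", "", j), 2, as.numeric)
[,1] [,2]
[1,] 1 3
[2,] 2 4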
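Another way to end up with a numeric matrix in its original orientation, sketched here in base R only, is to strip the % with gsub (which keeps the dim attribute) and then change the mode in place with `mode<-`, which converts the values and restores the dimensions:

```r
j <- matrix(c("1%", "2%", 3, 4), ncol = 2)

# gsub keeps the dim attribute, so k is still a 2 x 2 character matrix
k <- gsub("%", "", j, fixed = TRUE)

# mode<- converts every element to numeric and restores the dimensions
mode(k) <- "numeric"
k
#      [,1] [,2]
# [1,]    1    3
# [2,]    2    4
```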

Use gsub to replace the % with an empty string, then convert to numeric:
x <- c("98.9%", "92.2%", "98.9%", "82.2%", "97.9%", "92.9%")
as.numeric(gsub("%", "", x))
[1] 98.9 92.2 98.9 82.2 97.9 92.9
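For the table in the question, where only the first column carries the % sign, a minimal sketch assuming the data is whitespace-separated text without a header (so read.table names the columns V1, V2, V3) could look like this:

```r
# The table from the question, assumed to be whitespace-separated with no header
txt <- "98.9% 23 43
92.2% 342 34
98.9% 53 53
82.2% 32 76
97.9% 83 45
92.9% 92 23"
df <- read.table(text = txt, stringsAsFactors = FALSE)

# Only the first column carries the "%" sign: strip it and convert that column
df$V1 <- as.numeric(sub("%", "", df$V1, fixed = TRUE))
df
```

The other two columns are already read as numbers, so only V1 needs converting.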


How to map numeric vector to hex color codes in R

I'm going to provide an example to make this a little easier. Let's say I have a numeric vector ranging from -2 to +2. I would like to map the numeric values to hex color codes: values close to -2 would be red, values close to +2 would be blue, and values close to zero would be grey. So for example the vector below
x <- c(-2,0,2)
would become
x <- c("#FF5733","#8E8E8E","#355EDF")
Obviously I am going to have many numbers between -2 and +2 which is where I am having the issue. Any help appreciated.
You can use colorRamp and rgb:
colfunc <- colorRamp(c("#ff5733", "#838383", "#355edf"))
cols <- colfunc(seq(0,1,len=11))
cols
# [,1] [,2] [,3]
# [1,] 255.0 87.0 51.0
# [2,] 230.2 95.8 67.0
# [3,] 205.4 104.6 83.0
# [4,] 180.6 113.4 99.0
# [5,] 155.8 122.2 115.0
# [6,] 131.0 131.0 131.0
# [7,] 115.4 123.6 149.4
# [8,] 99.8 116.2 167.8
# [9,] 84.2 108.8 186.2
# [10,] 68.6 101.4 204.6
# [11,] 53.0 94.0 223.0
rgb(cols[,1], cols[,2], cols[,3], maxColorValue = 255)
# [1] "#FF5733" "#E65F43" "#CD6852" "#B47163" "#9B7A72" "#838383" "#737B95" "#6374A7" "#546CBA" "#4465CC" "#355EDF"
plot(1:11, rep(1, 11), col=rgb(cols[,1], cols[,2], cols[,3], maxColorValue = 255), pch=16, cex=3)
colorRamp returns a function that expects values normalized to [0, 1], where 0 maps to the first color and 1 to the last. (This means you are responsible for rescaling your data, e.g. from c(-2, 0, 2) to c(0, 0.5, 1).)
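The rescaling step can be wrapped into a small helper. This is a sketch: the name to_hex and the default range lo = -2, hi = 2 are assumptions taken from the question, and values outside the range are clamped to the end colors. The function returned by colorRamp may give a plain vector for a single input value, so matrix() is used to keep the three channels as columns.

```r
# Hypothetical helper: map numbers in [lo, hi] to red -> grey -> blue hex codes
to_hex <- function(x, lo = -2, hi = 2) {
  ramp <- colorRamp(c("#ff5733", "#838383", "#355edf"))
  u <- (x - lo) / (hi - lo)             # rescale [lo, hi] to [0, 1]
  u <- pmin(pmax(u, 0), 1)              # clamp out-of-range values
  m <- matrix(ramp(u), ncol = 3)        # one row per value: red, green, blue
  rgb(m[, 1], m[, 2], m[, 3], maxColorValue = 255)
}

to_hex(c(-2, 0, 2))
# [1] "#FF5733" "#838383" "#355EDF"
```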

R sum values in a column but exclude lesser of specific values

I have the following data:
>str(Maximum)
num [1:6] 1.4 1.07 1.89 0.342 0.00 1.998
I want to sum all of these values, but for the second and third, and for the fourth and fifth, include only the greater value of each pair. So in this case I'm looking for 1.4 + 1.89 + 0.342 + 1.998. How would I go about doing this in R code?
First element, plus the maximum of (2,3), plus the maximum of (4,5), plus the 6th element.
Maximum[1] + max(Maximum[2:3]) + max(Maximum[4:5]) + Maximum[6]
If your vector Maximum always has 6 elements, Florian's answer is the simplest way to do it. But if your vector is longer, you could do:
z1 = Maximum[seq(from = 2, to = length(Maximum)-1, by = 2)]
z2 = Maximum[seq(from = 3, to = length(Maximum)-1, by = 2)]
z3 = ifelse(z1>z2, z1, z2)
result = Maximum[1] + sum(z3) + Maximum[length(Maximum)]
For example:
Maximum = floor(runif(22, 1, 100))
> Maximum
[1] 96 6 1 10 90 15 58 48 94 97 78 95 42 79 61 25 61 74 93 37 44 22
z1 would be the elements at even indexes (excluding ends):
> z1
[1] 6 10 15 48 97 95 79 25 74 37
z2 the elements at odd indexes (excluding ends):
> z2
[1] 1 90 58 94 78 42 61 61 93 44
and z3 the maximum value between z1 and z2 for each index:
> z3
[1] 6 90 58 94 97 95 79 61 93 44
And then calculate the result by adding sum(z3) to the first and last elements of Maximum.
Note: the Maximum vector should have an even number of elements.
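You can select positions in a vector using <vector.name>[<positions>], and you can specify positions to skip using -. In this data the smaller member of each pair sits at positions 2 and 5 (note that this hard-codes those positions, so it only works when you know them in advance). Thus,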
Maximum <- c(1.4, 1.07, 1.89, 0.342, 0.00, 1.998)
Maximum[-c(2,5)]
# [1] 1.400 1.890 0.342 1.998
sum( Maximum[-c(2,5)] )
# [1] 5.63
A common method here is to use tapply to perform "group operations" and then aggregate the intermediate values.
vec <- c(1.4, 1.07, 1.89, 0.342, 0.00, 1.998)
group <- c(1, 2, 2, 3, 3, 4)
Here, calculate the max of each group
tapply(vec, group, max)
1 2 3 4
1.400 1.890 0.342 1.998
Then you can sum the resulting values
sum(tapply(vec, group, max))
[1] 5.63
One way to construct the group variable dynamically would be to use rep a couple of times, like this:
reps <- c(1, rep(2, (length(vec) / 2) - 1), 1)
rep(seq_along(reps), reps)
[1] 1 2 2 3 3 4
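A general version for longer vectors, assuming (as in the question) that the first and last elements stand alone and everything in between forms consecutive pairs, can use pmax on the two members of each pair. The function name sum_pairwise_max is made up for this sketch:

```r
sum_pairwise_max <- function(v) {
  n <- length(v)
  # assumption: first and last stand alone and the middle splits into pairs,
  # so n must be even and at least 2
  stopifnot(n >= 2, n %% 2 == 0)
  if (n == 2) return(sum(v))
  inner <- matrix(v[2:(n - 1)], nrow = 2)  # column k holds positions 2k and 2k + 1
  v[1] + sum(pmax(inner[1, ], inner[2, ])) + v[n]
}

Maximum <- c(1.4, 1.07, 1.89, 0.342, 0.00, 1.998)
sum_pairwise_max(Maximum)
# [1] 5.63
```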

What's the opposite function to lag for an R vector/dataframe?

I have a problem dealing with time series in R.
#--------------read data
wb = loadWorkbook("Countries_Europe_Prices.xlsx")
df = readWorksheet(wb, sheet="Sheet2")
x <- df$Year
y <- df$Index1
y <- lag(y, 1, na.pad = TRUE)
cbind(x, y)
It gives me the following output:
x y
[1,] 1974 NA
[2,] 1975 50.8
[3,] 1976 51.9
[4,] 1977 54.8
[5,] 1978 58.8
[6,] 1979 64.0
[7,] 1980 68.8
[8,] 1981 73.6
[9,] 1982 74.3
[10,] 1983 74.5
[11,] 1984 72.9
[12,] 1985 72.1
[13,] 1986 72.3
[14,] 1987 71.7
[15,] 1988 72.9
[16,] 1989 75.3
[17,] 1990 81.2
[18,] 1991 84.3
[19,] 1992 87.2
[20,] 1993 90.1
But I want the first value in y to be 50.8, and so forth. In other words, I want a negative lag. I don't understand how to do it.
My problem is very similar to this problem, but I still cannot solve it. I guess I do not understand the solution(s)...
Basic lag in R vector/dataframe
How about the lead function from the dplyr package?
Doesn't it do exactly the job of Ahmed's function?
cbind(x, lead(y, 1))
If you want to be able to calculate either positive or negative lags with the same function, I suggest a "shorter" version of his shift function:
shift <- function(x, lag) {
  # negative lag -> dplyr::lead, positive lag -> dplyr::lag
  # (calling through dplyr:: avoids attaching dplyr, which would mask stats::lag)
  switch(sign(lag) / 2 + 1.5,
         dplyr::lead(x, abs(lag)),
         dplyr::lag(x, abs(lag)))
}
The function has two branches, one using lead and the other using lag, and switch chooses between them based on the sign of lag: sign(lag) / 2 + 1.5 maps -1 to 1 and +1 to 2.
There is an easier way of doing this, which I have taken entirely from this link. Here is what you should do, step by step:
First, define the following function:
shift <- function(x, shift_by) {
  stopifnot(is.numeric(shift_by))
  stopifnot(is.numeric(x))

  # several shift amounts at once: return one column per amount
  if (length(shift_by) > 1)
    return(sapply(shift_by, shift, x = x))

  out <- NULL
  abs_shift_by <- abs(shift_by)
  if (shift_by > 0)
    out <- c(tail(x, -abs_shift_by), rep(NA, abs_shift_by))  # lead: pad the end with NA
  else if (shift_by < 0)
    out <- c(rep(NA, abs_shift_by), head(x, -abs_shift_by))  # lag: pad the start with NA
  else
    out <- x
  out
}
This defines a function called shift with two arguments: the vector you want to lag or lead, and the number of positions to shift by (positive for a lead, negative for a lag).
Example:
Suppose you have the following vector:
x <- 1:10
x
[1] 1 2 3 4 5 6 7 8 9 10
If you need the first-order lag of x:
shift(x,-1)
[1] NA 1 2 3 4 5 6 7 8 9
If you need the first-order lead of x (a negative lag):
shift(x,1)
[1] 2 3 4 5 6 7 8 9 10 NA
Simpler solution:
y = dplyr::lead(y,1)
The opposite of the lag() function is lead().
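If you would rather not depend on dplyr, a base-R sketch of a lead is to drop the first k values and pad the end with NA. The helper name lead_base is hypothetical; for k = 1 it should give the same result as dplyr::lead(y, 1) with the default NA fill:

```r
# Hypothetical helper: shift a vector k positions "up", padding the end with NA
lead_base <- function(y, k = 1) {
  stopifnot(k >= 0, k <= length(y))
  c(y[seq_len(length(y) - k) + k], rep(NA, k))
}

x <- 1974:1977
y <- c(50.8, 51.9, 54.8, 58.8)
cbind(x, y_lead = lead_base(y, 1))
```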

R regex / gsub : extract part of pattern

I have a list of weather stations and their locations by latitude and longitude. There was a formatting issue: some of them have hours and minutes, while others have hours, minutes and seconds. I can find the pattern using regex, but I'm having trouble extracting the individual pieces.
Here's data:
> head(wthrStat1 )
Station lat lon
1940 K01R 31-08N 092-34W
1941 K01T 28-08N 094-24W
1942 K03Y 48-47N 096-57W
1943 K04V 38-05-50N 106-10-07W
1944 K05F 31-25-16N 097-47-49W
1945 K06D 48-53-04N 099-37-15W
I'd like something like this:
Station latHr latMin latSec latDir lonHr lonMin lonSec lonDir
1940 K01R 31 08 00 N 092 34 00 W
1941 K01T 28 08 00 N 094 24 00 W
1942 K03Y 48 47 00 N 096 57 00 W
1943 K04V 38 05 50 N 106 10 07 W
1944 K05F 31 25 16 N 097 47 49 W
1945 K06D 48 53 04 N 099 37 15 W
I can get matches to this regex:
data.format <- "\\d{1,3}-\\d{1,3}(?:-\\d{1,3})?[NSWE]{1}"
grep(data.format, wthrStat1$lat)
But am unsure how to get the individual parts into columns. I've tried a few things like:
wthrStat1$latHr <- ifelse(grepl(data.format, wthrStat1$lat), gsub(????), NA)
but with no luck.
Here's a dput():
> dput(wthrStat1[1:10,] )
structure(list(Station = c("K01R", "K01T", "K03Y", "K04V", "K05F",
"K06D", "K07G", "K07S", "K08D", "K0B9"), lat = c("31-08N", "28-08N",
"48-47N", "38-05-50N", "31-25-16N", "48-53-04N", "42-34-28N",
"47-58-27N", "48-18-03N", "43-20N"), lon = c("092-34W", "094-24W",
"096-57W", "106-10-07W", "097-47-49W", "099-37-15W", "084-48-41W",
"117-25-42W", "102-24-23W", "070-24W")), .Names = c("Station",
"lat", "lon"), row.names = 1940:1949, class = "data.frame")
Any suggestions?
strapplyc in the gsubfn package will extract each group in the regular expression surrounded with parentheses:
library(gsubfn)
data.format <- "(\\d{1,3})-(\\d{1,3})-?(\\d{1,3})?([NSWE]{1})"
parts <- strapplyc(wthrStat1$lat, data.format, simplify = rbind)
parts[parts == ""] <- "00"
which gives:
> parts
[,1] [,2] [,3] [,4]
[1,] "31" "08" "00" "N"
[2,] "28" "08" "00" "N"
[3,] "48" "47" "00" "N"
[4,] "38" "05" "50" "N"
[5,] "31" "25" "16" "N"
[6,] "48" "53" "04" "N"
[7,] "42" "34" "28" "N"
[8,] "47" "58" "27" "N"
[9,] "48" "18" "03" "N"
[10,] "43" "20" "00" "N"
This is extremely inefficient; I hope someone else has a better solution:
dat <- read.table(text =' Station lat lon
1940 K01R 31-08N 092-34W
1941 K01T 28-08N 094-24W
1942 K03Y 48-47N 096-57W
1943 K04V 38-05-50N 106-10-07W
1944 K05F 31-25-16N 097-47-49W
1945 K06D 48-53-04N 099-37-15W', header = TRUE)
pattern <- '([0-9]+)[-]([0-9]+)([-|A-Z]+)([0-9]*)([A-Z]*)'
dat$latHr <- gsub(pattern,'\\1',dat$lat)
dat$latMin <- gsub(pattern,'\\2',dat$lat)
latSec <- gsub(pattern,'\\4',dat$lat)
latSec[nchar(latSec)==0] <- '00'
dat$latSec <- latSec
# the direction letter is always the last character, even when the seconds are missing
dat$latDir <- sub('.*([NSEW])$', '\\1', dat$lat)
dat
Station lat lon latHr latMin latSec latDir
1940 K01R 31-08N 092-34W 31 08 00 N
1941 K01T 28-08N 094-24W 28 08 00 N
1942 K03Y 48-47N 096-57W 48 47 00 N
1943 K04V 38-05-50N 106-10-07W 38 05 50 N
1944 K05F 31-25-16N 097-47-49W 31 25 16 N
1945 K06D 48-53-04N 099-37-15W 48 53 04 N
Another answer, using stringr:
# example data
data <-
"Station lat lon
1940 K01R 31-08N 092-34W
1941 K01T 28-08N 094-24W
1942 K03Y 48-47N 096-57W
1943 K04V 38-05-50N 106-10-07W
1944 K05F 31-25-16N 097-47-49W
1945 K06D 48-53-04N 099-37-15W"
## read string into a data.frame
df <- read.table(text=data, header=TRUE, stringsAsFactors=FALSE)
pattern <- "(\\d{1,3})-(\\d{1,3})(?:-(\\d{1,3}))?([NSWE]{1})"
library(stringr)
str_match(df$lat, pattern)
This produces a character matrix with one column for the whole matching string and an additional column for each capture group. Because the seconds group is optional, recent versions of stringr return NA for it in rows without seconds:
[,1] [,2] [,3] [,4] [,5]
[1,] "31-08N" "31" "08" NA "N"
[2,] "28-08N" "28" "08" NA "N"
[3,] "48-47N" "48" "47" NA "N"
[4,] "38-05-50N" "38" "05" "50" "N"
[5,] "31-25-16N" "31" "25" "16" "N"
[6,] "48-53-04N" "48" "53" "04" "N"
R's string processing ability has progressed a lot in the past few years.
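To get all eight columns from the question (for both lat and lon) without extra packages, here is a base-R sketch built on sub() back-references: an optional group that did not match yields an empty string, which is then replaced by "00". The helper name split_dms is hypothetical, and the sample rows are taken from the dput() in the question:

```r
# Hypothetical helper (base R only): split "DD-MM[-SS]X" strings into four columns
split_dms <- function(v, prefix) {
  pat <- "^([0-9]{1,3})-([0-9]{1,3})(-([0-9]{1,3}))?([NSWE])$"
  stopifnot(all(grepl(pat, v)))          # refuse to silently pass bad rows through
  out <- data.frame(
    hr  = sub(pat, "\\1", v),
    min = sub(pat, "\\2", v),
    sec = sub(pat, "\\4", v),            # "" when the seconds part is missing
    dir = sub(pat, "\\5", v),
    stringsAsFactors = FALSE
  )
  out$sec[out$sec == ""] <- "00"
  names(out) <- paste0(prefix, c("Hr", "Min", "Sec", "Dir"))
  out
}

wthrStat1 <- data.frame(
  Station = c("K01R", "K04V", "K0B9"),
  lat = c("31-08N", "38-05-50N", "43-20N"),
  lon = c("092-34W", "106-10-07W", "070-24W"),
  stringsAsFactors = FALSE
)
res <- cbind(wthrStat1["Station"],
             split_dms(wthrStat1$lat, "lat"),
             split_dms(wthrStat1$lon, "lon"))
res
```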

R - store a matrix into a single dataframe cell

I'm trying to store an entire matrix/array into a single cell of a data frame, but can't quite remember how to do it.
Now before you say it can't be done, I'm sure I remember someone asking a question on SO where it was done, although that wasn't the point of the question so I can't find it again.
For example, you can store a vector in a single cell of a matrix like so:
myMat <- array(list(), dim=c(2, 2))
myMat[[1, 1]] <- 1:5
myMat[[1, 2]] <- 6:10
# [,1] [,2]
#[1,] Integer,5 Integer,5
#[2,] NULL NULL
The trick was in using the double brackets [[]].
Now I just can't work out how to do it for a data frame (or if you can):
# attempt to make a dataframe like above (except if I use list() it gets
# interpreted to mean the `m` column doesn't exist)
myDF <- data.frame(i=1:5, m=NA)
myDF[[1, 'm']] <- 1:5
# Error in `[[<-.data.frame`(`*tmp*`, 1, "m", value = 1:5) :
# more elements supplied than there are to replace
# this seems to work but I have to do myDF$m[[1]][[1]] to get the 1:5,
# whereas I just want to do myDF$m[[1]].
myDF[[1, 'm']] <- list(1:5)
I think I'm almost there. With that last attempt I can do myDF[[1, 'm']] to retrieve list(1:5) and hence myDF[[1, 'm']][[1]] to get 1:5, but I'd prefer to just do myDF[[1, 'm']] and get 1:5.
I think I worked it out. It is important to initialise the data frame such that the column is ready to accept matrices.
To do this, give the column a list type. Note the I() to protect the list().
myDF <- data.frame(i=integer(), m=I(list()))
Then you can add rows as usual
myDF[1, 'i'] <- 1
and then add the matrix in with [[]] notation
myDF[[1, 'm']] <- matrix(rnorm(9), 3, 3)
Access with [[]] notation:
> myDF$m[[1]]
[,1] [,2] [,3]
[1,] 0.3307403 -0.2031316 1.5995385
[2,] 0.4588922 0.1631086 -0.2754463
[3,] 0.0568791 1.0358552 -0.1623794
To initialise with a non-zero number of rows, you can do the following (note the I() to protect the list, and vector('list', 5) to create a list of five NULL elements up front):
myDF <- data.frame(i=1:5, m=I(vector('list', 5)))
myDF$m[[1]] <- matrix(rnorm(9), 3, 3)
I think the trick may be to insert it in as a list:
set.seed(123)
dat <- data.frame(women, m=I(replicate(nrow(women), matrix(rnorm(4), 2, 2),
simplify=FALSE)))
str(dat)
'data.frame': 15 obs. of 3 variables:
$ height: num 58 59 60 61 62 63 64 65 66 67 ...
$ weight: num 115 117 120 123 126 129 132 135 139 142 ...
$ m :List of 15
..$ : num [1:2, 1:2] -0.5605 -0.2302 1.5587 0.0705
..$ : num [1:2, 1:2] 0.129 1.715 0.461 -1.265
...
..$ : num [1:2, 1:2] -1.549 0.585 0.124 0.216
..- attr(*, "class")= chr "AsIs"
dat[[1, "m"]]
[,1] [,2]
[1,] -0.5604756 1.55870831
[2,] -0.2301775 0.07050839
dat[[2, "m"]]
[,1] [,2]
[1,] 0.1292877 0.4609162
[2,] 1.7150650 -1.2650612
EDIT: So the question really is about initialising and then assigning. Given that, you should be able to define a data.frame like the one in your question like so:
dat <- data.frame(i=1:5, m=I(vector(mode="list", length=5)))
You can then assign to it like so:
dat[[2, "m"]] <- matrix(rnorm(9), 3, 3)
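Putting the pieces together, here is a minimal sketch that pre-allocates a list column, fills each cell with a matrix, and reads one back. The matrices here are just diag(k), chosen for illustration:

```r
n <- 3
# Pre-allocate a list column of NULLs; I() stops data.frame() from splitting the list
df <- data.frame(i = seq_len(n), m = I(vector("list", n)))

# Fill each cell with a matrix using the [[row, column]] form
for (k in seq_len(n)) df[[k, "m"]] <- diag(k)

df[[2, "m"]]   # the 2 x 2 identity matrix, not a list wrapping it
```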
