Creating a matrix by splitting a vector

The vector:
f <- c("20-04-2018","15-07-2021","11-11-2022","08-12-2021","28-01-2020")
I want to allocate one column each to the day, month and year.
Does anyone know what code to use to solve this?

One way:
> f <-c("20-04-2018","15-07-2021","11-11-2022","08-12-2021","28-01-2020")
> do.call(rbind,strsplit(f, "-"))
[,1] [,2] [,3]
[1,] "20" "04" "2018"
[2,] "15" "07" "2021"
[3,] "11" "11" "2022"
[4,] "08" "12" "2021"
[5,] "28" "01" "2020"
>
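strsplit() always returns character pieces, so the matrix above holds strings such as "04". If numeric columns are wanted, a minimal sketch (the column names day, month and year are an assumption, not part of the question) changes the storage type in place:

```r
f <- c("20-04-2018", "15-07-2021", "11-11-2022", "08-12-2021", "28-01-2020")
# Split each date on "-" and stack the pieces row-wise (character matrix)
m <- do.call(rbind, strsplit(f, "-"))
# Change the storage type to integer while keeping the dimensions ("04" -> 4)
storage.mode(m) <- "integer"
colnames(m) <- c("day", "month", "year")
m
```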
Another, better, way using date functions:
> D <- data.frame(date=as.Date(f, "%d-%m-%Y"))
> D$year <- as.integer(format(D$date, "%Y"))
> D$month <- as.integer(format(D$date, "%m"))
> D$day <- as.integer(format(D$date, "%d"))
> D
date year month day
1 2018-04-20 2018 4 20
2 2021-07-15 2021 7 15
3 2022-11-11 2022 11 11
4 2021-12-08 2021 12 8
5 2020-01-28 2020 1 28
>
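The same calendar fields can also be read straight off the parsed dates without going back through format(). A minimal sketch using the components of base R's POSIXlt class (mon is zero-based and year counts from 1900, hence the offsets):

```r
f <- c("20-04-2018", "15-07-2021", "11-11-2022", "08-12-2021", "28-01-2020")
# as.POSIXlt() exposes each date's calendar fields as integer components
lt <- as.POSIXlt(as.Date(f, "%d-%m-%Y"))
# mon runs 0-11 and year is years since 1900, so shift both
m <- cbind(day = lt$mday, month = lt$mon + 1L, year = lt$year + 1900L)
m
```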

Yet another way, using the native pipe |> (available from R 4.1.0):
f |>
strsplit(split = "-") |>
unlist() |>
matrix(ncol = 3, byrow = TRUE)
[,1] [,2] [,3]
[1,] "20" "04" "2018"
[2,] "15" "07" "2021"
[3,] "11" "11" "2022"
[4,] "08" "12" "2021"
[5,] "28" "01" "2020"

Base R using regex and strcapture():
as.matrix(
strcapture(
pattern = "^(\\d{2})\\-(\\d{2})\\-(\\d{4})$",
x = f,
proto = list(
day = integer(),
month = integer(),
year = integer()
)
)
)
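One property of strcapture() worth knowing: an element that does not match the pattern yields a row of NA rather than an error, so malformed dates are easy to spot. A small sketch with a deliberately malformed second entry (the vector g is made up for illustration):

```r
# Second entry is intentionally in a different format
g <- c("20-04-2018", "2018/04/20")
res <- strcapture(
  pattern = "^(\\d{2})-(\\d{2})-(\\d{4})$",
  x = g,
  proto = list(day = integer(), month = integer(), year = integer())
)
# Row 2 is NA in every column because it does not match the pattern
res
```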
Base R option 2:
type.convert(
simplify2array(
within(
data.frame(f_date = as.Date(f, "%d-%m-%Y")),
{
day <- strftime(f_date, "%d")
month <- strftime(f_date, "%m")
year <- strftime(f_date, "%Y")
f_date <- NULL
}
),
higher = FALSE
),
as.is = TRUE
)

Related

How to get an element from a text string in R

> my_data <- "08,23,02.06.2022,5,7,THISPRODUCT,09.02.2022,yes,89,25"
> lengths(gregexpr(",", my_data))+1
[1] 10
I need to get each element individually. I tried:
> print(gregexpr(",", my_data))[[1]][1]
[[1]]
[1] 3 6 17 19 21 33 44 48 51
attr(,"match.length")
[1] 1 1 1 1 1 1 1 1 1
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
[1] 3
but the first element of my_data is "08", and this displays 3. Can anyone give me the correct syntax to display every element?

library(tidyverse)
strings <- "08,23,02.06.2022,5,7,THISPRODUCT,09.02.2022,yes,89,25" %>%
str_split(pattern = ",") %>%
unlist()
strings[1]
#> [1] "08"
Created on 2022-06-29 by the reprex package (v2.0.1)

Let's try scan
> scan(text = my_data, what = "",sep = ",",quiet = TRUE)
[1] "08" "23" "02.06.2022" "5" "7"
[6] "THISPRODUCT" "09.02.2022" "yes" "89" "25"

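Using lapply (with `[` and no index, lapply() just returns the list from strsplit() unchanged, so strsplit(my_data, ",") on its own gives the same output):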
lapply(strsplit(my_data, ","), `[`)
Output:
[[1]]
[1] "08" "23" "02.06.2022" "5" "7" "THISPRODUCT" "09.02.2022" "yes"
[9] "89" "25"

You can simply do:
unlist(strsplit(my_data, split = ","))
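The attempt in the question printed 3 because gregexpr() returns the positions of the matches (the first comma is the third character), not the text between them. To stay with regex functions, regmatches() can extract the pieces. A minimal sketch, assuming no field is ever empty (the pattern "[^,]+" skips empty fields, which strsplit() would keep as ""):

```r
my_data <- "08,23,02.06.2022,5,7,THISPRODUCT,09.02.2022,yes,89,25"
# Positions of the commas: this is what the question's attempt printed
comma_pos <- gregexpr(",", my_data)[[1]]
# Match runs of non-comma characters instead, then pull out the matched text
fields <- regmatches(my_data, gregexpr("[^,]+", my_data))[[1]]
fields[1]
fields[6]
```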

R - Pairs of values in a single vector: How to detect missing values?

I have a long vector consisting of pairs of values: years paired with scores. The number of characters in each value is always the same (4 characters for years, 3 characters for scores).
data <- c("2018", "5.5", "2016", "8.4", "2017", "6.6", "2018", "2017", "5.5",
"2009", "7.9")
The problem is that some of the scores are missing, while all of the years are present:
matrix(data, ncol = 2, byrow = TRUE)
[,1] [,2]
[1,] "2018" "5.5"
[2,] "2016" "8.4"
[3,] "2017" "6.6"
[4,] "2018" "2017"
[5,] "5.5" "2009"
[6,] "7.9" "2018"
This way I can't structure the data by converting it to a matrix or data frame, because the pairs of values are shifted.
Is there a way to detect when a mismatch takes place, i.e. when a year is followed by another year, and insert an NA between the two values?

Sure, here's a pretty compact way:
idx <- which(nchar(data) == 4)
cbind(Year = data[idx], Score = ifelse(nchar(data[idx + 1]) == 3, data[idx + 1], NA))
# Year Score
# [1,] "2018" "5.5"
# [2,] "2016" "8.4"
# [3,] "2017" "6.6"
# [4,] "2018" NA
# [5,] "2017" "5.5"
# [6,] "2009" "7.9"
The key is using nchar() together with what you know about the lengths of the values.
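If the character counts were ever less reliable, a variant can classify values by pattern instead and use a running count of years to attach each score to the year before it. A sketch, assuming a year is exactly four digits, the vector starts with a year (a score before any year would be silently dropped) and each year has at most one score:

```r
data <- c("2018", "5.5", "2016", "8.4", "2017", "6.6", "2018", "2017", "5.5",
          "2009", "7.9")
# A value is treated as a year if it is exactly four digits
is_year <- grepl("^[0-9]{4}$", data)
# Running count of years: each value gets the number of the year it follows
grp <- cumsum(is_year)
years <- data[is_year]
scores <- rep(NA_character_, length(years))
# Put each score in the slot of its preceding year; years without one stay NA
scores[grp[!is_year]] <- data[!is_year]
out <- data.frame(Year = as.integer(years), Score = as.numeric(scores))
out
```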

strsplit split on either or depending on

Once again I'm struggling with strsplit. I'm transforming some strings into data frames, but there's a forward slash (/) and some white space in my strings that keep bugging me. I could work around it, but I'm eager to learn whether I can use some fancy either-or pattern in strsplit. My working example below should illustrate the issue.
The strsplit function I'm currently using:
str_to_df <- function(string){
t(sapply(1:length(string), function(x) strsplit(string, "\\s+")[[x]])) }
One type of string I get:
string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
str_to_df(string1)
#> [,1] [,2]
#> [1,] "One" "58/2"
#> [2,] "Two" "22/3"
#> [3,] "Three" "15/5"
Another type I get in the same place:
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')
str_to_df(string2)
#> [,1] [,2] [,3] [,4]
#> [1,] "One" "58" "/" "2"
#> [2,] "Two" "22" "/" "3"
#> [3,] "Three" "15" "/" "5"
They obviously create different outputs, and I can't figure out how to code a solution that works for both. Below is my desired outcome. Thank you in advance!
desired_outcome <- structure(c("One", "Two", "Three", "58", "22",
"15", "2", "3", "5"), .Dim = c(3L, 3L))
desired_outcome
#> [,1] [,2] [,3]
#> [1,] "One" "58" "2"
#> [2,] "Two" "22" "3"
#> [3,] "Three" "15" "5"

This works:
str_to_df <- function(string){
t(sapply(1:length(string), function(x) strsplit(string, "[/[:space:]]+")[[x]])) }
string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')
str_to_df(string1)
# [,1] [,2] [,3]
# [1,] "One" "58" "2"
# [2,] "Two" "22" "3"
# [3,] "Three" "15" "5"
str_to_df(string2)
# [,1] [,2] [,3]
# [1,] "One" "58" "2"
# [2,] "Two" "22" "3"
# [3,] "Three" "15" "5"
Another approach with tidyr could be:
library(tibble)  # tibble()
library(tidyr)   # separate(); tidyr also re-exports %>%
tibble(value = string1) %>%
separate(value, into = c("Col1", "Col2", "Col3"), sep = "[/[:space:]]+")
# A tibble: 3 x 3
# Col1 Col2 Col3
# <chr> <chr> <chr>
# 1 One 58 2
# 2 Two 22 3
# 3 Three 15 5

We can create a function that splits at one or more spaces, tabs or forward slashes:
f1 <- function(str1) do.call(rbind, strsplit(str1, "[/\t ]+"))
f1(string1)
# [,1] [,2] [,3]
#[1,] "One" "58" "2"
#[2,] "Two" "22" "3"
#[3,] "Three" "15" "5"
f1(string2)
# [,1] [,2] [,3]
#[1,] "One" "58" "2"
#[2,] "Two" "22" "3"
#[3,] "Three" "15" "5"
Or we can use read.csv() after replacing the spaces, tabs and slashes with a common delimiter:
read.csv(text=gsub("[\t/ ]+", ",", string1), header = FALSE)
# V1 V2 V3
#1 One 58 2
#2 Two 22 3
#3 Three 15 5
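All of these assume every string splits into exactly three fields. With do.call(rbind, ...) a string with a missing field is recycled into a full row with only a warning, so an explicit check can be worth having. A sketch; split_to_matrix() is a hypothetical helper name, and the default of three fields is an assumption taken from the examples:

```r
string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')

# Hypothetical helper: split once, check the field count, then fill row-wise
split_to_matrix <- function(x, sep = "[/[:space:]]+", n = 3L) {
  parts <- strsplit(x, sep)
  bad <- lengths(parts) != n
  if (any(bad)) {
    stop("expected ", n, " fields in: ", paste(x[bad], collapse = ", "))
  }
  matrix(unlist(parts), ncol = n, byrow = TRUE)
}

split_to_matrix(string1)
split_to_matrix(string2)
```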

Automatically changing matrix length and row names

My data gets longer each quarter, and the start date varies between data sets.
I have written code that runs lots of tests, produces forecasts, and is automatically documented with graphs and tables of the data.
Everything works fine until the length of the data or the start date changes, because then the data in the tables is either the wrong length or doesn't line up with the correct quarter.
Here is an example:
Test.data <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27)
Test.dates <- c("08Q1","08Q2","08Q3","08Q4","09Q1","09Q2","09Q3","09Q4","10Q1","10Q2","10Q3","10Q4","11Q1","11Q2","11Q3","11Q4","12Q1","12Q2","12Q3","12Q4","13Q1","13Q2","13Q3","13Q4","14Q1","14Q2","14Q3")
Test <- matrix(c(Test.data,""),nrow=4,byrow=FALSE)
colnames(Test) <- c("'08","'09","'10","'11","'12","'13","'14")
rownames(Test) <- c("Qtr 1", "Qtr 2", "Qtr 3", "Qtr 4")
Which quite nicely gives:
'08 '09 '10 '11 '12 '13 '14
Qtr 1 1 5 9 13 17 21 25
Qtr 2 2 6 10 14 18 22 26
Qtr 3 3 7 11 15 19 23 27
Qtr 4 4 8 12 16 20 24
However, in the next quarter the data grows by one value and I get a warning and an error:
Warning message:
In matrix(c(Test.data, ""), nrow = 4, byrow = FALSE) :
data length [29] is not a sub-multiple or multiple of the number of rows [4]
Error in `colnames<-`(`*tmp*`, value = c("'08", "'09", "'10", "'11", "'12", :
length of 'dimnames' [2] not equal to array extent
Or, if a data set begins in 08Q2 instead of 08Q1, all the data ends up next to the wrong quarter.
I need to display my data in the specific way of:
'yr1 'yr2 'yr3 ...
Qtr 1
Qtr 2
Qtr 3
Qtr 4
Does anyone have any suggestions on how I can get this to adjust automatically to fit my data without having to change anything? Very soon it will be joined to a database that will constantly produce results, so it cannot be changed by hand each time the data is a different length.
Thank you for your help.
Please comment below if you want any more information.

Test.data.padded <- as.character(Test.data)
length(Test.data.padded) <- ceiling(length(Test.data.padded) / 4) * 4
Test.data.padded[is.na(Test.data.padded)] <- ""
Test <- matrix(Test.data.padded, nrow=4, byrow=FALSE)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#[1,] "1" "5" "9" "13" "17" "21" "25"
#[2,] "2" "6" "10" "14" "18" "22" "26"
#[3,] "3" "7" "11" "15" "19" "23" "27"
#[4,] "4" "8" "12" "16" "20" "24" ""
Then use a regex to extract the years from your Test.dates.
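Following that suggestion, one way to handle both a changing length and a changing start quarter is to parse the year and quarter out of every label and place each value by those, rather than filling the matrix column by column. A sketch, assuming labels always look like "08Q1" and arrive in chronological order; quarter_table() is a hypothetical helper name:

```r
Test.data <- 1:27
# 08Q1 ... 14Q3, generated here rather than typed out (same labels as the question)
Test.dates <- paste0(rep(sprintf("%02d", 8:14), each = 4), "Q", 1:4)[1:27]

# Hypothetical helper: place each value by its own year and quarter label
quarter_table <- function(values, dates) {
  yr  <- sub("^([0-9]{2})Q[1-4]$", "\\1", dates)              # "08Q2" -> "08"
  qtr <- as.integer(sub("^[0-9]{2}Q([1-4])$", "\\1", dates))  # "08Q2" -> 2
  years <- unique(yr)  # assumes the dates are in chronological order
  out <- matrix("", nrow = 4, ncol = length(years),
                dimnames = list(paste("Qtr", 1:4), paste0("'", years)))
  # A two-column (row, column) index matrix assigns every value in one step
  out[cbind(qtr, match(yr, years))] <- as.character(values)
  out
}

quarter_table(Test.data, Test.dates)
quarter_table(1:3, c("08Q2", "08Q3", "08Q4"))  # series starting in Q2
```

Empty cells stay "", so a series that starts mid-year or stops mid-year simply leaves the unused quarters blank.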

Not sure if this helps.
library(stringi)
n <- 4
l <- length(Test.data)
m1 <- stri_list2matrix(split(Test.data,as.numeric(gl(l,n,l))), fill='')
nm1 <- do.call(rbind,strsplit(Test.dates, '(?<=[0-9])(?=[Q])', perl=TRUE))
dimnames(m1) <- list(unique(nm1[,2]), unique(nm1[,1]))
m1
# 08 09 10 11 12 13 14
#Q1 "1" "5" "9" "13" "17" "21" "25"
#Q2 "2" "6" "10" "14" "18" "22" "26"
#Q3 "3" "7" "11" "15" "19" "23" "27"
#Q4 "4" "8" "12" "16" "20" "24" ""

How to do basic row name mapping of matrix in R?

I have a very big matrix called A, and I need to add one column to it containing the row names of A mapped to their names from another matrix called B.
In B, the row names of A are in the column ID and the mapped names are in the column Sample.
Here is a simple reproducible example and the expected output:
A<-cbind(c("a","b", "c"), c(1,2,3), c(10,14,15))
rownames(A)<-c("q","w","r")
> A
[,1] [,2] [,3]
q "a" "1" "10"
w "b" "2" "14"
r "c" "3" "15"
And the second matrix, where the row names of A are in the column ID and their mapped names are in the column Sample:
B<-cbind(c("d1","f2","g5","y4"),c("q","L","w","r"),c("qw","we","zr","ls"))
colnames(B)<-c("M","ID","Sample")
> B
M ID Sample
[1,] "d1" "q" "qw"
[2,] "f2" "L" "we"
[3,] "g5" "w" "zr"
[4,] "y4" "r" "ls"
>
Here is the expected output:
A<-cbind(c("a","b", "c"), c(1,2,3), c(10,14,15), c("qw","zr","ls"))
rownames(A)<-c("q","w","r")
> A
[,1] [,2] [,3] [,4]
q "a" "1" "10" "qw"
w "b" "2" "14" "zr"
r "c" "3" "15" "ls"
>
Could someone help me implement this in R?

You can also use the merge() function in R. By default merge() sorts the result by the key, so the rows come back in the order q, r, w:
> A <-matrix( data = NA, nrow = 3, ncol =3)
> A[1,] <- c("a" , "1", "10")
> A[2,] <- c( "b" , "2" , "14")
> A[3,] <- c("c" , "3" , "15")
>
> row.names(A) = c("q","w","r")
>
>
> B <- matrix(data = "NA" , nrow = 4, ncol = 3)
> B[1,] <- c("d1" ,"q" ,"qw")
> B[2,] <- c( "f2" ,"L" ,"we")
> B[3,] <- c("g5" ,"w", "zr")
> B[4,] <- c("y4", "r", "ls" )
> colnames(B) = c("M", "ID", "Sample")
> A
[,1] [,2] [,3]
q "a" "1" "10"
w "b" "2" "14"
r "c" "3" "15"
> B
M ID Sample
[1,] "d1" "q" "qw"
[2,] "f2" "L" "we"
[3,] "g5" "w" "zr"
[4,] "y4" "r" "ls"
>
> C <- merge(A, B, by.x = 0, by.y = "ID" )
> D <- C[,-5]
> D
Row.names V1 V2 V3 Sample
1 q a 1 10 qw
2 r c 3 15 ls
3 w b 2 14 zr

You were almost there with the sample matrices.
While we cannot use the $ operator on matrices, we can use the dimnames (as well as the row/column numbers) to subset a matrix. Then we can find which IDs are in the row names of A with %in%:
> cbind(A, B[,"Sample"][B[,"ID"] %in% rownames(A)])
# [,1] [,2] [,3] [,4]
# q "a" "1" "10" "qw"
# w "b" "2" "14" "zr"
# r "c" "3" "15" "ls"
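The %in% subset returns the Sample values in B's row order, so it lines up here only because B happens to list q, w and r in the same order as A's rows. A sketch using match() instead looks up each row name of A directly; B's rows are shuffled on purpose (B_shuffled is just for illustration) to show the difference:

```r
A <- cbind(c("a", "b", "c"), c(1, 2, 3), c(10, 14, 15))
rownames(A) <- c("q", "w", "r")
B <- cbind(c("d1", "f2", "g5", "y4"), c("q", "L", "w", "r"), c("qw", "we", "zr", "ls"))
colnames(B) <- c("M", "ID", "Sample")

B_shuffled <- B[c(4, 2, 1, 3), ]  # same rows, different order
# %in% keeps B's order, giving "ls" "qw" "zr", which no longer matches q, w, r
B_shuffled[, "Sample"][B_shuffled[, "ID"] %in% rownames(A)]
# match() gives, for each row name of A, the row of B with that ID (NA if absent)
A2 <- cbind(A, B_shuffled[match(rownames(A), B_shuffled[, "ID"]), "Sample"])
A2
```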

Develop Reference
