Alphabet conversion - Cyrillic to Latin


I have a list of names and surnames written in Cyrillic.
head(text, n = 20)
unique(clients$RODITEL)
1 <NA>
2 ЃОРЃИ
3 ALEKSANDAR
4 000000000000
5 ТР4АЈЧЕ
6 0
7 HHHHHHH
8 0000000
9 TASKO
10 --------------------
11 ДРАГИ
12 СЛАВЧО
13 ACO
14 НИКОЛА
15 САШО
16 НАУМЧЕ
17 ОРЦЕ
18 САНДРА
19 МИРСАД
20 ОКТАЈ
What I need to do is convert the names written in Cyrillic, such as the last 10 rows, into Latin.
So the output would be:
1 <NA>
2 GJORGJI
3 ALEKSANDAR
4 000000000000
5 TRAJCHE
6 0
7 HHHHHHH
8 0000000
9 TASKO
10 --------------------
11 DRAGI
12 SLAVCHO
13 ACO
14 NIKOLA
15 SASHO
16 NAUMCHE
17 ORCE
18 SANDRA
19 MIRSAD
20 OKTAJ
In particular, the Cyrillic alphabet is Macedonian.
Is there an R package that handles this kind of conversion?

You can use functions from the stringi package, for example:
> library(stringi)
> stri_trans_general('ДРАГИ', 'latin')
[1] "DRAGI"
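The generic 'latin' transliteration follows ICU's rules, which for Cyrillic typically produce letters with diacritics (for example Č for Ч) rather than the CH, SH and GJ digraphs in the desired output above. If you need exactly those digraphs, a small lookup table in base R is one option. This is a sketch under stated assumptions: the mapping is the common Macedonian ASCII romanization inferred from the desired output, `mk_map` and `cyr_to_latin()` are hypothetical names, and the script must be saved and run as UTF-8.

```r
# Hypothetical lookup table: Macedonian Cyrillic (uppercase) -> ASCII Latin,
# using the digraphs seen in the desired output (Ѓ -> GJ, Ч -> CH, Ш -> SH).
# Assumes the script is saved and run as UTF-8.
mk_map <- c(
  "А" = "A",  "Б" = "B",  "В" = "V",  "Г" = "G",  "Д" = "D",  "Ѓ" = "GJ",
  "Е" = "E",  "Ж" = "ZH", "З" = "Z",  "Ѕ" = "DZ", "И" = "I",  "Ј" = "J",
  "К" = "K",  "Л" = "L",  "Љ" = "LJ", "М" = "M",  "Н" = "N",  "Њ" = "NJ",
  "О" = "O",  "П" = "P",  "Р" = "R",  "С" = "S",  "Т" = "T",  "Ќ" = "KJ",
  "У" = "U",  "Ф" = "F",  "Х" = "H",  "Ц" = "C",  "Ч" = "CH", "Џ" = "DZH",
  "Ш" = "SH"
)

# Replace each character found in mk_map; leave everything else
# (Latin letters, digits, dashes) untouched and keep NA as NA.
cyr_to_latin <- function(x) {
  x <- as.character(x)  # also accepts a factor column
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    chars <- strsplit(s, "")[[1]]
    hit <- chars %in% names(mk_map)
    chars[hit] <- mk_map[chars[hit]]
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

cyr_to_latin(c("ЃОРЃИ", "СЛАВЧО", "TASKO", NA))
```

Applied to the column from the question, `cyr_to_latin(unique(clients$RODITEL))` leaves the Latin and numeric entries as they are. The stray digit in ТР4АЈЧЕ is kept, so it becomes TR4AJCHE rather than TRAJCHE.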

Related

R Script to rearrange the elements of a vector by interleaving it

How do I write an R script that initializes a vector of integers, rearranges its elements by interleaving the first half with the second half, stores the result in the same vector without using predefined functions, and displays the updated vector?

This sounds like a homework question, and it would be nice to see some effort on your own part, but it's pretty straightforward to do this in R.
Suppose your vector looks like this:
vec <- 1:20
vec
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Then you can just do:
c(t(cbind(vec[1:10], vec[11:20])))
#> [1] 1 11 2 12 3 13 4 14 5 15 6 16 7 17 8 18 9 19 10 20
This works by joining the two vectors into a 10 x 2 matrix, then transposing that matrix and turning it into a vector.

We can also use matrix() directly and flatten the result with c():
c(matrix(vec, nrow = 2, byrow = TRUE))
Output:
[1] 1 11 2 12 3 13 4 14 5 15 6 16 7 17 8 18 9 19 10 20
Data:
vec <- 1:20

Or using mapply() (the \(x, y) lambda shorthand requires R 4.1.0 or later):
vec <- 1:20
c(mapply(\(x,y) c(x,y), vec[1:10], vec[11:20]))
#> [1] 1 11 2 12 3 13 4 14 5 15 6 16 7 17 8 18 9 19 10 20

We can also do this with order() and the modulo operator %%:
> vec[order((seq_along(vec) - 1) %% (length(vec) / 2))]
[1] 1 11 2 12 3 13 4 14 5 15 6 16 7 17 8 18 9 19 10 20

Another way is to rbind() the two halves of the vector, which creates a matrix with two rows. We can then turn the matrix into a vector, which reads it column by column (i.e., 1, 11, 2, 12, ...). However, this only works for even-length vectors.
vec <- 1:20
c(rbind(vec[1:10], vec[11:20]))
# [1] 1 11 2 12 3 13 4 14 5 15 6 16 7 17 8 18 9 19 10 20
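For odd-length vectors, we can instead use order() on each element's position within its half. Because order() breaks ties by original position, the two halves interleave and the leftover element ends up last:
vec2 <- 1:21
vec2[order(c(seq_along(vec2[1:10]), seq_along(vec2[11:21])))]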
# [1] 1 11 2 12 3 13 4 14 5 15 6 16 7 17 8 18 9 19 10 20 21
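The question rules out predefined functions, while the answers above rely on helpers such as matrix(), rbind() or order(). A plain loop comes closer to that requirement. This is a sketch only: it still uses length(), seq_len() and print(), which are hard to avoid in R, and `interleave_halves()` is a hypothetical name. For an odd-length vector the extra last element simply stays at the end, matching the order() result above.

```r
# Interleave the first half of v with the second half using an explicit loop.
interleave_halves <- function(v) {
  n <- length(v)
  half <- n %/% 2                  # integer division; an odd leftover stays last
  out <- v                         # copy, so the original halves can still be read
  for (i in seq_len(half)) {
    out[2 * i - 1] <- v[i]         # element from the first half
    out[2 * i]     <- v[half + i]  # matching element from the second half
  }
  out
}

vec <- 1:20
vec <- interleave_halves(vec)      # store the result back in the same vector
print(vec)
```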

Ordering an alphanumeric variable in R

I would like to order a data frame based on an alphanumeric variable. Here is what my dataset looks like:
sample.data <- data.frame(Grade=c(4,4,4,4,3,3,3,3,3,3,3,3),
ItemID = c(15,15,15,15,17,17,17,17,16,16,16,16),
common.names = c("15_AS_SA1_Correct","15_AS_SA10_Correct","15_AS_SA2_Correct","15_AS_SA3_Correct",
"17_AS_2_B2","17_AS_2_B1","17_AS_5_C1","17_AS_4_D1",
"16_AS_SA1_Negative","16_AS_SA11_Prediction","16_AS_SA12_UnitMeaning","16_AS_SA3_Complete"))
> sample.data
Grade ItemID common.names
1 4 15 15_AS_SA1_Correct
2 4 15 15_AS_SA10_Correct
3 4 15 15_AS_SA2_Correct
4 4 15 15_AS_SA3_Correct
5 3 17 17_AS_2_B2
6 3 17 17_AS_2_B1
7 3 17 17_AS_5_C1
8 3 17 17_AS_4_D1
9 3 16 16_AS_SA1_Negative
10 3 16 16_AS_SA11_Prediction
11 3 16 16_AS_SA12_UnitMeaning
12 3 16 16_AS_SA3_Complete
I need to order by Grade and ItemID, and then by the alphanumeric common.names variable.
I used this:
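library(dplyr)
sample.data.ordered <- sample.data %>%
  arrange(Grade, ItemID, common.names)
but it did not sort common.names correctly: the strings are compared character by character, so, for example, 15_AS_SA10_Correct comes before 15_AS_SA2_Correct.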
My desired output is:
> sample.data.ordered
Grade ItemID common.names
1 3 16 16_AS_SA1_Negative
2 3 16 16_AS_SA3_Complete
3 3 16 16_AS_SA11_Prediction
4 3 16 16_AS_SA12_UnitMeaning
5 3 17 17_AS_2_B1
6 3 17 17_AS_2_B2
7 3 17 17_AS_4_D1
8 3 17 17_AS_5_C1
9 4 15 15_AS_SA1_Correct
10 4 15 15_AS_SA2_Correct
11 4 15 15_AS_SA3_Correct
12 4 15 15_AS_SA10_Correct
Any thoughts?
Thanks!

A base R solution uses order(), with a more involved sort key for common.names: gsub() with a regular expression and several backreferences extracts the numbers in each string, and those numbers are used to order the column:
sample.data[order(sample.data$Grade,
sample.data$ItemID,
as.numeric(gsub(".*(SA|AS_)(\\d+)_(\\w)?(\\d)?.*", "\\2\\4", sample.data$common.names))),]
Grade ItemID common.names
9 3 16 16_AS_SA1_Negative
12 3 16 16_AS_SA3_Complete
10 3 16 16_AS_SA11_Prediction
11 3 16 16_AS_SA12_UnitMeaning
6 3 17 17_AS_2_B1
5 3 17 17_AS_2_B2
8 3 17 17_AS_4_D1
7 3 17 17_AS_5_C1
1 4 15 15_AS_SA1_Correct
3 4 15 15_AS_SA2_Correct
4 4 15 15_AS_SA3_Correct
2 4 15 15_AS_SA10_Correct
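The regular expression above is tailored to these particular name patterns. A more general alternative, sketched here under the assumption that every number inside common.names should compare numerically, is to zero-pad each run of digits so that plain string comparison gives numeric order. `pad_digits()` and the width of 6 digits are hypothetical choices (the width must be at least the length of the longest number), and `method = "radix"` makes order() compare strings in the C locale so the result does not depend on the system locale.

```r
# Sample data from the question (stringsAsFactors = FALSE keeps common.names as text)
sample.data <- data.frame(
  Grade  = c(4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3),
  ItemID = c(15, 15, 15, 15, 17, 17, 17, 17, 16, 16, 16, 16),
  common.names = c("15_AS_SA1_Correct", "15_AS_SA10_Correct", "15_AS_SA2_Correct",
                   "15_AS_SA3_Correct", "17_AS_2_B2", "17_AS_2_B1", "17_AS_5_C1",
                   "17_AS_4_D1", "16_AS_SA1_Negative", "16_AS_SA11_Prediction",
                   "16_AS_SA12_UnitMeaning", "16_AS_SA3_Complete"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: left-pad every run of digits with zeros to a fixed width,
# so "SA2" becomes "SA000002" and sorts before "SA000010".
pad_digits <- function(x, width = 6) {
  m <- gregexpr("[0-9]+", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(d)
    formatC(as.integer(d), width = width, flag = "0"))
  x
}

sample.data.ordered <- sample.data[order(sample.data$Grade,
                                         sample.data$ItemID,
                                         pad_digits(sample.data$common.names),
                                         method = "radix"), ]
sample.data.ordered
```

Because the padded key is an ordinary expression, it can also be used inside dplyr, e.g. `arrange(sample.data, Grade, ItemID, pad_digits(common.names))`.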

How to delete this [] from column?

I've downloaded a table from Wikipedia, and some columns have reference links next to the numbers. Is it possible to delete them?
In RStudio, a value in one of these columns looks like this:
402[38]
The [38] part is what I want to remove.

We can do this easily in base R with a regular expression:
a <- data.frame(V1 = paste0(1:21, sprintf("[%s]", 50:70)))
a$V2 <- gsub("\\[.*?\\]", "", a$V1)
V1 V2
1 1[50] 1
2 2[51] 2
3 3[52] 3
4 4[53] 4
5 5[54] 5
6 6[55] 6
7 7[56] 7
8 8[57] 8
9 9[58] 9
10 10[59] 10
11 11[60] 11
12 12[61] 12
13 13[62] 13
14 14[63] 14
15 15[64] 15
16 16[65] 16
17 17[66] 17
18 18[67] 18
19 19[68] 19
20 20[69] 20
21 21[70] 21
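This also conveniently works when a value has several references, because the non-greedy .*? stops at the first closing bracket and gsub() replaces every match:
a <- data.frame(V1 = paste0(1:21, sprintf("[%s][%s]", 50:70, 80:100)))
a$V2 <- gsub("\\[.*?\\]", "", a$V1)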
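Since the downloaded table has several affected columns, the same substitution can be applied to every character column at once. `strip_refs()` and the `wiki` data below are hypothetical. Converting the cleaned values to numbers with as.numeric() is left to the caller, because some columns (names, for instance) are not numeric.

```r
# Hypothetical helper: remove "[..]" reference markers from all character columns
strip_refs <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) gsub("\\[.*?\\]", "", col))
  df
}

# Made-up example shaped like a scraped Wikipedia table
wiki <- data.frame(city = c("Alpha[1]", "Beta"),
                   pop  = c("402[38]", "1500[2][3]"),
                   stringsAsFactors = FALSE)

clean <- strip_refs(wiki)
clean$pop <- as.numeric(clean$pop)   # numeric column once the markers are gone
clean
```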

how to deal with this kind of data type

I used the igraph package to detect communities. When I call membership(community), the result is:
1 2 3 4 5 6 7 13 17 18 19 20 22 23 24 25
12 9 1 10 12 6 12 16 1 11 6 6 3 13 16 1
29 30 31 33 34 37 38 39 40 41 42 43 44 45 46 47
9 5 11 14 13 6 13 11 12 13 1 16 11 6 12 7
...
The first line is node ID and the second line is its corresponding community ID.
Suppose the result above is called X. I ran Y=data.frame(X), and the result is:
community
1 12
2 9
3 1
4 10
5 12
6 6
7 12
13 16
...
I want to index by the first column (1, 2, 3, ..., 13, ...) so that, for instance, Y[13,] gives 16. But at the moment it is Y[8,] that gives 16. How can I do this?
This question may be very simple, but I don't know what to search for. Thanks.

Both data.frame() and as.data.frame() convert a named vector to a data frame whose row names are the names of the vector elements.
In other words, what looks like a first column is really the row names: rownames(Y)[8] returns "13". To look up a node by its ID rather than by position, index with the row name as a string, e.g. Y["13", ].
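A minimal sketch of the difference between positional and name-based indexing. Assumption: X is built by hand as a named integer vector that mimics the printed membership() output in the question, so the example runs without igraph. The `Z` data frame with an explicit node column is an alternative layout, not something igraph produces.

```r
# Stand-in for membership(community): names are node IDs, values are community IDs
X <- c("1" = 12L, "2" = 9L, "3" = 1L, "4" = 10L, "5" = 12L,
       "6" = 6L, "7" = 12L, "13" = 16L)

Y <- data.frame(community = X)   # the names of X become the row names of Y

Y[8, "community"]                # by position: the 8th row, which is node 13
Y["13", "community"]             # by row name, i.e. by node ID
X[["13"]]                        # the named vector can be indexed the same way

# Alternative: keep the node IDs as an ordinary column and filter on it
Z <- data.frame(node = as.integer(names(X)), community = unname(X))
Z$community[Z$node == 13]
```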

Read.table error in R

I'm using R to read a text file and then subsequently manipulate it.
The input file has 22 columns. This is what the first row (the header) looks like:
NAME LENGTH A C D E F G H I K L M N P Q R S T V W Y
I am currently using:
read.table("filename", stringsAsFactors=FALSE)
to read the file. When I run it, I get this error:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 2 did not have 23 elements
Not sure where I am going wrong. I'm new to R and I would really appreciate your help. I've tried to make sure this isn't a repost, but if it is, please do link me to the original.

Assuming the text file looks like this:
NAME LENGTH A C D E F G H I K L M N P Q R S T V W Y
ape:APE_0001 242 15 0 1 12 10 18 2 27 9 43 7 2 8 3 5 25 15 24 3 12
ape:APE_0002 113 7 1 6 6 1 12 3 4 10 16 4 2 4 0 10 3 5 9 4 5
ape:APE_0004 305 24 2 5 8 9 25 4 36 12 43 8 11 14 2 12 20 21 27 9 12
and is called 'dat.txt' and stored in your working directory, this should just work:
dat <- read.table("dat.txt", stringsAsFactors=FALSE, header=TRUE)
# to give:
dat
NAME LENGTH A C D E F G H I K L M N P Q R S T V W Y
1 ape:APE_0001 242 15 0 1 12 10 18 2 27 9 43 7 2 8 3 5 25 15 24 3 12
2 ape:APE_0002 113 7 1 6 6 1 12 3 4 10 16 4 2 4 0 10 3 5 9 4 5
3 ape:APE_0004 305 24 2 5 8 9 25 4 36 12 43 8 11 14 2 12 20 21 27 9 12
Since that doesn't appear to be working for you, there may be something invisible in your text file, such as hidden characters.
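One way to look for such problems is count.fields() from the utils package, which reports how many fields each line has under default whitespace splitting. The sample file below is made up, with one line deliberately missing a value, and is written to a temporary file so the example is self-contained. Note that read.table() and count.fields() by default also treat # as a comment character and ' or " as quotes, so names containing those characters can change the field count.

```r
# Made-up sample file with one short line (the ape:APE_0002 row lacks a value)
path <- tempfile(fileext = ".txt")
writeLines(c("NAME LENGTH A C D",
             "ape:APE_0001 242 15 0 1",
             "ape:APE_0002 113 7 1",
             "ape:APE_0004 305 24 2 5"), path)

# Number of whitespace-separated fields on each line
n_fields <- count.fields(path)
print(table(n_fields))

# Line numbers whose field count differs from the header line
bad <- which(n_fields != n_fields[1])
print(bad)
```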
Assuming your text file isn't enormous, one workaround is to open a new R script in RStudio and type:
dat <- read.table(stringsAsFactors=FALSE, header=TRUE, text = "")
Then paste all the text from your text file between the quotes in the line above, without changing any line breaks or formatting, select everything, and run it in the console.
For the example in your comment that would look like this:
dat <- read.table(header=TRUE, stringsAsFactors=FALSE, text = "NAME LENGTH A C D E F G H I K L M N P Q R S T V W Y
ape:APE_0001 242 15 0 1 12 10 18 2 27 9 43 7 2 8 3 5 25 15 24 3 12
ape:APE_0002 113 7 1 6 6 1 12 3 4 10 16 4 2 4 0 10 3 5 9 4 5
ape:APE_0004 305 24 2 5 8 9 25 4 36 12 43 8 11 14 2 12 20 21 27 9 12")
If that's not practical or possible, post a link to your text file in your question (for example, a temporary file-sharing link such as http://temp-share.com/show/dPf3a6oHW, which is deleted automatically after 45 days) so others can have a look.
