Conversion of strings to numbers - r

Hello I'm looking for a way to turn user inputted strings into matrices for example: 28 = SPACE, 27 = ?, 26 = 0, 25 = A, 24=B 23=C 22=D 21=E 20=F 19=G 18=H
17=I 16=J 15=K 14=L 13=M 12=N 11=O 10=P 9=Q 8=R 7=S 6=T 5=U 4=V 3=W 2=X 1=Y 0=Z
"HI HOW ARE YOU?" -> "[18 17 28 18 11][3 28 25 8 21][28 1 11 5 27]"
wherein each letter/symbol of the string is converted to a numerical value (special attention to spacebar I really don't know how to turn space into numbers). I'll be using these matrices to make a cryptograph

You could use utf8ToInt
x <- "HI HOW ARE YOU?"
We need pmin to get your condition 28 = SPACE right.
pmin(abs(utf8ToInt("HI HOW ARE YOU?") - utf8ToInt("Z")), 28)
# [1] 18 17 28 18 11 3 28 25 8 21 28 1 11 5 27
From ?utf8ToInt :
Conversion of UTF-8 encoded character vectors to and from integer vectors representing a UTF-32 encoding.
First step is
utf8ToInt("HI HOW ARE YOU?")
[1] 72 73 32 72 79 87 32 65 82 69 32 89 79 85 63
from which we substract utf8ToInt("Z"), i.e. 90 because you wrote 0=Z.
Call abs on the result to get positive numbers.
abs(utf8ToInt("HI HOW ARE YOU?") - utf8ToInt("Z"))
# [1] 18 17 58 18 11 3 58 25 8 21 58 1 11 5 27
The last piece is your condition 28 = SPACE, which is where pmin helps you out.

Related

Convert "yymm" numeric to months since 9101 (Jan 1991)

I have a data set that runs from 1991-01 to 1996-12 in R. In order to calculate some time-dependent metrics on some of the data set, I am trying to have a "months since first entry" column. To do so I need to convert a number like 9107 into 07, 9207 into 12+7=19, and 9301 into 12+12+1=25. Ie, the first two digits specify a full year (12 months), and the last two digits specify months since January (01). How would I go about this?
Thank you!
May be, we can use substr (splitted in multiple lines for more clarity and also as.integer)
yrs <- as.integer(substr(v1, 1, 2))
mths <- as.integer(substr(v1, 3, 4))
mths[mths == 1] <- 0
12 * (yrs - yrs[1]) + mths
#[1] 7 19 24
Or without substr
yrs <- v1 %/% 100
12 * (yrs - yrs[1]) + (v1 %% 100)
data
v1 <- c(9107, 9207, 9301)
Using the substrings.
(as.numeric(substr(x, 2, 2)) - 1)*12 + as.numeric(substr(x, 3, 4)) - 1
# [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
# [24] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
# [47] 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
# [70] 69 70 71
I believe, 9101 should be 0, since it is the distance of itself.
Data:
x <- c(9101, 9102, 9103, 9104, 9105, 9106, 9107, 9108, 9109, 9110,
9111, 9112, 9201, 9202, 9203, 9204, 9205, 9206, 9207, 9208, 9209,
9210, 9211, 9212, 9301, 9302, 9303, 9304, 9305, 9306, 9307, 9308,
9309, 9310, 9311, 9312, 9401, 9402, 9403, 9404, 9405, 9406, 9407,
9408, 9409, 9410, 9411, 9412, 9501, 9502, 9503, 9504, 9505, 9506,
9507, 9508, 9509, 9510, 9511, 9512, 9601, 9602, 9603, 9604, 9605,
9606, 9607, 9608, 9609, 9610, 9611, 9612)

How to isolate certain letters inside a string in R?

I need to find an index for all the values that present Q2 or Q4.
So basically skipping the first element that is a title. I need to save: 3,5,7,9..... till the end of the object.
How can I do this analyzing a string?
This is the code:
line = c("ISSUE_CUR", "1993-Q1", "1993-Q2", "1993-Q3", "1993-Q4", "1994-Q1",
"1994-Q2", "1994-Q3", "1994-Q4", "1995-Q1", "1995-Q2", "1995-Q3",
"1995-Q4", "1996-Q1", "1996-Q2", "1996-Q3", "1996-Q4", "1997-Q1",
"1997-Q2", "1997-Q3", "1997-Q4", "1998-Q1", "1998-Q2", "1998-Q3",
"1998-Q4", "1999-Q1", "1999-Q2", "1999-Q3", "1999-Q4", "2000-Q1",
"2000-Q2", "2000-Q3", "2000-Q4", "2001-Q1", "2001-Q2", "2001-Q3",
"2001-Q4", "2002-Q1", "2002-Q2", "2002-Q3", "2002-Q4", "2003-Q1",
"2003-Q2", "2003-Q3", "2003-Q4", "2004-Q1", "2004-Q2", "2004-Q3",
"2004-Q4", "2005-Q1", "2005-Q2", "2005-Q3", "2005-Q4", "2006-Q1",
"2006-Q2", "2006-Q3", "2006-Q4", "2007-Q1", "2007-Q2", "2007-Q3",
"2007-Q4", "2008-Q1", "2008-Q2", "2008-Q3", "2008-Q4", "2009-Q1",
"2009-Q2", "2009-Q3", "2009-Q4", "2010-Q1", "2010-Q2", "2010-Q3",
"2010-Q4", "2011-Q1", "2011-Q2", "2011-Q3", "2011-Q4", "2012-Q1",
"2012-Q2", "2012-Q3", "2012-Q4", "2013-Q1", "2013-Q2", "2013-Q3",
"2013-Q4", "2014-Q1", "2014-Q2", "2014-Q3", "2014-Q4", "2015-Q1",
"2015-Q2", "2015-Q3", "2015-Q4", "2016-Q1", "2016-Q2", "2016-Q3",
"2016-Q4", "2017-Q1", "2017-Q2", "2017-Q3", "2017-Q4", "2018-Q1",
"2018-Q2", "2018-Q3", "2018-Q4", "2019-Q1", "2019-Q2", "2019-Q3",
"2019-Q4")
We can use grep to return the index by matching 'Q2' or (|) 'Q4' at the end ($) of the string
grep("(Q2|Q4)$", line)
#[1] 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61
#[31] 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 101 103 105 107 109
Or the 'Q' can be placed outside as it is common
grep("Q(2|4)$", line)
Or another option is endsWith
which(endsWith(line, "Q2")|endsWith(line, "Q4"))
You can use the grep() function.
grep("[24]$", line, value = T)
The "[24]$" is a regular expression pattern. For a good tutorial on it, you may check here.
https://www.regular-expressions.info/rlanguage.html

when you need a Kinhom rather than a Kest?

envelope of the K funcition (and its derivative such as L) is very useful for validating a fitted spatial points process model. for instance, I fit a poisson model for a data J1a2, which is as following:
J1a2.points:
# X.1 X Y
1 1 118.544 1638.445
2 2 325.995 1761.223
3 3 681.625 1553.771
4 4 677.392 1816.261
5 5 986.451 1685.016
6 6 1469.093 1354.787
7 7 1608.805 1625.744
8 8 1994.071 1782.391
9 9 1968.669 1375.955
10 10 2362.403 1337.852
11 11 2701.099 1773.924
12 12 2900.083 1820.495
13 13 2963.588 1668.081
14 14 3412.360 1676.549
15 15 3378.490 1456.396
16 16 3721.420 1464.863
17 17 3823.028 1701.951
18 18 4072.817 1790.859
19 19 4089.751 1388.656
20 20 97.375 715.497
21 21 376.799 1033.025
22 22 563.082 1126.166
23 23 935.647 1206.607
24 24 512.277 486.876
25 25 935.647 757.834
26 26 1409.821 410.670
27 27 1435.223 639.290
28 28 1706.180 1045.726
29 29 1968.669 876.378
30 30 2307.365 711.263
31 31 2624.892 897.546
32 32 2654.528 1236.243
33 33 2857.746 423.371
34 34 3039.795 639.290
35 35 3298.050 707.029
36 36 3111.767 1011.856
37 37 3361.555 1227.775
38 38 4047.414 1185.438
39 39 3569.007 508.045
40 40 4250.632 469.942
41 41 4386.110 872.144
42 42 93.141 237.088
43 43 554.614 186.283
44 44 757.832 148.180
45 45 965.283 220.153
46 46 1723.115 296.360
47 47 1744.283 423.371
48 48 1913.631 203.218
49 49 2167.653 292.126
50 50 2629.126 211.685
51 51 3217.610 283.658
52 52 3827.262 325.996
and:
J1a2.Win<-owin(c(0, 4500.42),c(0, 1917.87))
if you draw evelope for the data with Lest:
library(spatstat)
env.data<-envelope(J1a2, Lest,correction="border",
nsim=19, global=TRUE)
plot(env.data,.-r~r, shade=NULL, legend=FALSE,
xlab=expression(paste("r(",mu,"m)")),ylab="L(r)-r", main = "")
the Lest() curve goes out of the envelope. however, if you use Linhom instead of Lest, you will find the Linhom() are all inside of the envelope.
it seems that this suggest a inhomogenous density kernel of the data. so I use y as covariate in fitting:
poisson.J1a2<-ppm(J1a2~1,Poisson(),correction="border")
y1.J1a2<-ppm(J1a2~y,correction="border")
anova(poisson.J1a2,y.J1a2,test="LR") #p=0.6484
I don't find any evidence of a spatial trend of density along y, or x, or their combinations.
then why the Linhom() outperform the Lest() in this case?
furthermore, when should one decide to use Linhom() instead of Lest?
You should first decide whether or not the intensity can be assumed to be constant. To help you with this you can look at kernel density estimates or do formal tests such as a quadrat test etc. If you decide that the intensity can be assumed to be constant you use Lest() if this is not the case you use Linhom().

Saving an output from R into excel format?

After running the predict function for glm i get an output in the below format:
1 2 3 4 5 6 7 8 9 10 11 12
3.954947e-01 8.938624e-01 7.775473e-01 1.294646e-02 3.954947e-01 9.625746e-01 9.144256e-01 4.739872e-01 1.443219e-01 1.180850e-04 2.138978e-01 7.775473e-01
13 14 15 16 17 18 19 20 21 22 23 24
5.425436e-03 2.069844e-04 2.723969e-01 4.739872e-01 9.144256e-01 1.091998e-01 2.070056e-02 5.114936e-01 1.443219e-01 5.922029e-01 7.578099e-02 8.937642e-01
25 26 27 28 29 30 31 32 33 34 35 36
6.069970e-02 6.069970e-02 1.337947e-01 1.090992e-01 4.841467e-02 9.205547e-01 3.954947e-01 3.874915e-05 3.855242e-02 1.344839e-01 6.318574e-04 2.723969e-01
37 38 39 40 41 42 43 44 45 46 47 48
7.400276e-04 8.593199e-01 6.666800e-01 2.069844e-04 8.161623e-01 4.916555e-05 3.060374e-02 3.402079e-01 2.256598e-03 9.363767e-01 6.116082e-01 3.940969e-03
49 50 51 52 53 54 55 56 57 58 59 60
7.336723e-01 2.425257e-02 3.369967e-03 5.624262e-02 1.090992e-01 1.357630e-06 1.278169e-04 3.046189e-01 8.938624e-01 4.535894e-01 5.132348e-01 3.220426e-01
61 62 63 64 65 66 67 68 69 70 71 72
3.366492e-03 1.357630e-06 1.014721e-01 1.294646e-02 9.144256e-01 1.636988e-02 2.070056e-02 1.012835e-01 5.000274e-03 8.165247e-02 1.357630e-06 8.033850e-03
IS there any code by which I can get the complete output vertically or in an excel format? Thank you in advance!
The simplest way is to write a character separated value file using a comma as the delimiter:
[Acknowledge Roland's comment] write.csv(data.frame(predict(yourGLM)), "file.csv")
Excel reads these automatically, especially if you save the file with a .csv extension.
If its just a matter of viewing it vertically first create the data:
# create test data
example(predict.glm)
pred <- predict(budworm.lg)
1) Separate R Window Use View to display it in a separate R window:
View(pred)
2) R Console to display it on the R console vertically:
data.frame(pred)
3) Browser to display it in the browser vertically:
library(R2HTML)
HTMLStart(); HTML(data.frame(pred)); w <- HTMLStop()
browseURL(w)
4) Excel to display it in Excel vertically using w we just computed:
shell(paste("start excel", w))

R sorts a vector on its own accord

df.sorted <- c("binned_walker1_1.grd", "binned_walker1_2.grd", "binned_walker1_3.grd",
"binned_walker1_4.grd", "binned_walker1_5.grd", "binned_walker1_6.grd",
"binned_walker2_1.grd", "binned_walker2_2.grd", "binned_walker3_1.grd",
"binned_walker3_2.grd", "binned_walker3_3.grd", "binned_walker3_4.grd",
"binned_walker3_5.grd", "binned_walker4_1.grd", "binned_walker4_2.grd",
"binned_walker4_3.grd", "binned_walker4_4.grd", "binned_walker4_5.grd",
"binned_walker5_1.grd", "binned_walker5_2.grd", "binned_walker5_3.grd",
"binned_walker5_4.grd", "binned_walker5_5.grd", "binned_walker5_6.grd",
"binned_walker6_1.grd", "binned_walker7_1.grd", "binned_walker7_2.grd",
"binned_walker7_3.grd", "binned_walker7_4.grd", "binned_walker7_5.grd",
"binned_walker8_1.grd", "binned_walker8_2.grd", "binned_walker9_1.grd",
"binned_walker9_2.grd", "binned_walker9_3.grd", "binned_walker9_4.grd",
"binned_walker10_1.grd", "binned_walker10_2.grd", "binned_walker10_3.grd")
One would expect that order of this vector would be 1:length(df.sorted), but that appears not to be the case. It looks like R internally sorts the vector according to its logic but tries really hard to display it the way it was created (and is seen in the output).
order(df.sorted)
[1] 37 38 39 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
[26] 23 24 25 26 27 28 29 30 31 32 33 34 35 36
Is there a way to "reset" the ordering to 1:length(df.sorted)? That way, ordering, and the output of the vector would be in sync.
Use the mixedsort (or) mixedorder functions in package gtools:
require(gtools)
mixedorder(df.sorted)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
[28] 28 29 30 31 32 33 34 35 36 37 38 39
construct it as an ordered factor:
> df.new <- ordered(df.sorted,levels=df.sorted)
> order(df.new)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
EDIT :
After #DWins comment, I want to add that it is even not nessecary to make it an ordered factor, just a factor is enough if you give the right order of levels :
> df.new2 <- factor(df.sorted,levels=df.sorted)
> order(df.new)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ...
The difference will be noticeable when you use those factors in a regression analysis, they can be treated differently. The advantage of ordered factors is that they let you use comparison operators as < and >. This makes life sometimes a lot easier.
> df.new2[5] < df.new2[10]
[1] NA
Warning message:
In Ops.factor(df.new[5], df.new[10]) : < not meaningful for factors
> df.new[5] < df.new[10]
[1] TRUE
Isn't this simply the same thing you get with all lexicographic shorts (as e.g. ls on directories) where walker10_foo sorts higher than walker1_foo?
The easiest way around, in my book, is to use a consistent number of digits, i.e. I would change to binned_walker01_1.grd and so on inserting a 0 for the one-digit counts.
In response to Dwin's comment on Dirk's answer: the data are always putty in your hands. "This is R. There is no if. Only how." -- Simon Blomberg
You can add 0 like so:
df.sorted <- gsub("(walker)([[:digit:]]{1}_)", "\\10\\2", df.sorted)
If you needed to add 00, you do it like this:
df.sorted <- gsub("(walker)([[:digit:]]{1}_)", "\\10\\2", df.sorted)
df.sorted <- gsub("(walker)([[:digit:]]{2}_)", "\\10\\2", df.sorted)
...and so on.

Resources