Naming the rows in a matrix in R

I am trying to name the rows in a matrix, but R adds the prefixes 'X.', 'X..', etc. in front of these names. Also, the row names don't come out properly. For example, the first row name is supposed to be 'e subscript (t+1)', but it shows something else. The even-numbered row names should be empty, but they are given names. Could you please help?
Below is the code I used:
rownames(table2a)=paste(c("$e_t+1$"," ","$r_t+1$", " ","$\\Delta y_{n,t+1}$"," ",
"$s_{n,t+1}$"," ", "$d_t+1-p_t+1$"," ", "$rb_t+1$"," "))
Loaded packages: Matrix, dplyr, tidyverse, xtable.
Below is the data from dput(table2a):
structure(c(-0.011918875562309, 0.0493186644094629, 0.00943711646402318,
0.0084043692395113, 0.0140061617086464, 0.00795790133389348,
-0.00372516684283399, 0.00631517247007723, 0.00514156266584497,
0.0039339752041611, 0.0148362913561212, 0.00793003246354337,
-0.0807656037164587, 0.0599852917766847, 0.991792361285981, 0.0102220639400435,
-0.00608828061911691, 0.00967903407684488, 0.010343002117867,
0.00768101625973846, 0.0578541455030235, 0.00478481429473926,
-0.00902328873121743, 0.00964513773477125, -0.799680080407018,
0.340494864072598, 0.0519648273240202, 0.0580235615884655, 0.0850517813830584,
0.0549411579861702, -0.0665428760388874, 0.0435997977143392,
-0.032698572959578, 0.027160069487786, 0.114163705951583, 0.0547487519805466,
0.352025366916776, 0.197746547959218, 0.0476825327079758, 0.0336978915546042,
0.0464511908714403, 0.0319077480426568, 0.904849333951824, 0.0253211146465119,
0.132904050913606, 0.0157735418364402, 0.0653710645280691, 0.0317960059066269,
0.939695537568421, 0.612311426298072, -0.0578948128653228, 0.104343687684969,
-0.0744692071400603, 0.0988006057025484, 0.121089017775182, 0.0784054537723728,
0.0345069733304992, 0.048841914052704, -0.090885199308955, 0.0984546022582597,
-0.280821673428002, 0.248826811381596, -0.0288068135696716, 0.0424024540117092,
-0.0239685609446809, 0.0401498953370305, 0.00219488911775388,
0.0318618569231297, 0.066433933135983, 0.0198480335553826, 0.871940074366622,
0.0400092888905855), .Dim = c(12L, 6L), .Dimnames = list(c("$e_t+1$",
" ", "$r_t+1$", " ", "$\\Delta y_{n,t+1}$", " ", "$s_{n,t+1}$",
" ", "$d_t+1-p_t+1$", " ", "$rb_t+1$", " "), c("ex_stock_ret_100.l1",
"real_int_100.l1", "Chg_1month.l1", "spreads.l1", "log_dp.l1",
"rb_rate_100.l1")))
My desired output (row and column names) is shown in the attached picture.

If you only need to remove the first character of each name (the X), you can use substring(); note that any dots following the X remain:
rownames(table2a) <- substring(rownames(table2a), 2)
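Prefixes like X. and X.. are exactly what R's make.names() produces, so a plausible (unconfirmed) cause is that the labels passed through it somewhere, for example via data.frame(), whose default check.names = TRUE applies make.names() to column names. The sketch below reproduces that with the question's first six labels and shows what substring(..., 2) then leaves behind; the variable names labels, mangled and blank are our own.

```r
# Hypothetical reproduction: assumes the labels went through make.names()
labels <- c("$e_t+1$", " ", "$r_t+1$", " ", "$\\Delta y_{n,t+1}$", " ")
mangled <- make.names(labels, unique = TRUE)
print(mangled)

# make.names() prepends "X" to names that do not start with a letter (or a dot
# not followed by a digit) and turns characters invalid in R names into dots;
# unique = TRUE then appends .1, .2, ... to repeats, so the three blank labels
# become "X.", "X..1" and "X..2"
blank <- mangled[labels == " "]
print(blank)

# substring(x, 2) drops only the first character, so the dots remain
print(substring(blank, 2))
```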

You can remove the leading X and all dots that directly follow it (no matter how many there are) with gsub():
rownames(table2a) <- gsub("^X\\.*","",rownames(table2a))
^ = beginning of the string;
X = the literal X;
\\. = a literal dot;
* = zero or more of the preceding item (here \\.); so in total ^X\\.* means: an X as the first character, followed by any number of dots.
gsub replaces each such match with "" (nothing), leaving only whatever comes after it.
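Applied to names of the kind make.names() generates for repeated blank labels, the pattern above strips the X and the dots but leaves the numeric suffixes behind. The sample vector nm is hypothetical.

```r
# Hypothetical names of the kind make.names(unique = TRUE) produces
nm <- c("X.", "X..1", "X..2")
# ^X\\.* removes the leading X and every dot directly after it ...
stripped <- gsub("^X\\.*", "", nm)
# ... but the suffixes 1 and 2, added to make the names unique, are left over
print(stripped)  # "" "1" "2"
```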
EDIT:
To also get rid of every second row name, add a little extra:
rownames(table2a) <- gsub("^X\\.*[0-9]*","",rownames(table2a))
This also removes any digits directly after the dots, which should leave those rows empty. [0-9] is used rather than [1-9] so that a multi-digit suffix such as .10 is removed completely.
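Separately from the prefixes, the question says the first label does not render as e subscript (t+1). In LaTeX, _ subscripts only the next single token, so $e_t+1$ renders as e_t + 1; braces are needed to group the whole subscript. A minimal sketch with the labels rewritten that way (the brace placement is our reading of the intended symbols, and m is a stand-in for table2a). paste() with a single vector argument returns it unchanged, so it is not needed. If the table is printed with xtable, passing sanitize.rownames.function = identity to print() should keep the $...$ markup from being escaped; that part is an assumption about the asker's setup and is not run here.

```r
# Labels with braces around each multi-character subscript
# (brace placement is our reading of the intended symbols)
labels <- c("$e_{t+1}$", "", "$r_{t+1}$", "", "$\\Delta y_{n,t+1}$", "",
            "$s_{n,t+1}$", "", "$d_{t+1}-p_{t+1}$", "", "$rb_{t+1}$", "")

# Hypothetical stand-in with the same 12 x 6 shape as table2a
m <- matrix(0, nrow = 12, ncol = 6)

# A matrix accepts empty and repeated row names, so no paste() or
# placeholder text is needed for the blank rows
rownames(m) <- labels
print(rownames(m))
```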

Related

Remove space in print statement in python

While using the below print command:
print(k,':',dict[k])
I get the output shown below, but I want to remove the space between the key and the colon. How can I do that?
Current Output:
Sam : 40
Required Output:
Sam: 40
You could try printing a single string built by concatenation (str() is needed when the value is a number such as 40, otherwise + raises a TypeError):
print(k + ': ' + str(dict[k]))
Python's print() function has a sep parameter that defaults to a single space, so each comma-separated argument you pass is printed with a space between it and the next one.
I think what you are looking for is
print(k, ': ', dict[k], sep='')
# prints: Sam: 40
Simply specifying the sep parameter solves your issue.

Finding a word with condition in a vector with regex on R (perl)

I would like to find the elements of a vector that contain the word 'RT' (or 'R'), but not when 'RT' is preceded by 'no'.
The word RT may be preceded by nothing, a space, a dot, etc.
With regex, I tried:
grep("(?<=[no] )RT", aaa,ignore.case = FALSE, perl = T)
which gave me all the rows with "no RT",
and
grep("(?=[^no].*)RT",aaa , perl = T)
which gave me all the rows containing 'RT', with and without 'no' at the beginning.
What is my mistake? I thought ^ meant "anything except the characters that follow it".
Example:
aaa = c("RT alone", "no RT", "CT/RT", "adj.RTx", "RT/CT", "lang, RT+","npo RT" )
(?<=[no] )RT matches any RT that is immediately preceded with "n " or "o ".
You should use a negative lookbehind:
"(?<!no )RT"
Or, if "no" must be a whole word:
"(?<!\\bno )RT"
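Here, (?<!no ) makes sure that the text immediately to the left of the current position is not "no " (n, o and a space), and only then is RT consumed. As for your second attempt: (?=...) is a lookahead, so it checks the text to the right rather than to the left, and [^no] matches a single character other than n or o. At the position where RT starts, that next character is R, so the lookahead always succeeds and every RT matches.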
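A quick check of the negative lookbehind on the question's example vector. The string "piano RT" is a hypothetical extra case showing where the \\b variant differs: there "no " is the end of a longer word.

```r
aaa <- c("RT alone", "no RT", "CT/RT", "adj.RTx", "RT/CT", "lang, RT+", "npo RT")

# Negative lookbehind: keep RT only if it is not immediately preceded by "no "
hits <- grep("(?<!no )RT", aaa, perl = TRUE)
print(aaa[hits])  # every element except "no RT"

# The \\b version only rejects "no" when it is a whole word
grepl("(?<!no )RT", "piano RT", perl = TRUE)     # FALSE
grepl("(?<!\\bno )RT", "piano RT", perl = TRUE)  # TRUE
```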

Remove quotes if "=" (equal) sign exists in the middle of the string. REGEX

In this string the character “=” differentiates attributes for a product, and commas distinguish variables within an attribute. However, we found that sometimes extra quotes have been added when there are no variables to put together.
The complete string is :
Uso="Protector para patas de silla,mesas,escaleras,muebles","Topes,4-Tipo=Topes,regatones",2-Familia=Ferretería y Plomería,regatones,7-Contenido="12 unidades,4-Origen=China,4-Material=Goma,2-Modelo=Goma transparente,9-Incluye=12 unidades,3-Color=Transparente"
This is right:
Uso="Protector para patas de silla,mesas,escaleras,muebles"
This is wrong:
"Topes,4-Tipo=Topes,regatones",2-Familia=Ferretería y Plomería,regatones,7-Contenido="12 unidades,4-Origen=China,4-Material=Goma,2-Modelo=Goma transparente,9-Incluye=12 unidades,3-Color=Transparente"
Categoría="Topes,4-Tipo=Topes,regatones",2-Familia=Ferretería y Plomería,regatones,7-Contenido="12 unidades,4-Origen=China,4-Material=Goma,2-Modelo=Goma transparente,9-Incluye=12 unidades,3-Color=Transparente"
I've tried "|w+= but it selects all quotes. I don't want to select the text between the quotes; the goal is to select and remove the quotes themselves.
We want to remove the quotes that have an equals sign between them. The quotes that are correct and need to stay are those that group comma-separated values, distinguishing the variables within the string.
The regex needs to detect an = inside an opening and closing quote pair, with other text around it, and once this is detected, remove those quotes, which don't need to be there.
I understand the quoted substring should be preceded with =. Then, you need
gsub('="([^"=]*=[^"]*)"', '=\\1', x)
Here is an R demo:
x <- '10-Uso="Protector para patas de silla,mesas,escaleras,muebles",6-Características=Regaton interior 1 1/4 plástico blanco 4 unidades,1-Marca=Nagel,Tipo=Topes,5-Medidas=3 cm,3-Categoría=Topes y regatones,7-Contenido=4 unidades,4-Tipo=Regatones,2-Familia=Ferretería y Plomería,9-Incluye=4 regatones plásticos,regatones,4-Origen="Argentina,4-Material=Plástico,2-Modelo=Regatón interior 1 1/4,3-Color=Blanco"'
cat(gsub('="([^"=]*=[^"]*)"', '=\\1', x))
## => 10-Uso="Protector para patas de silla,mesas,escaleras,muebles",6-Características=Regaton interior 1 1/4 plástico blanco 4 unidades,1-Marca=Nagel,Tipo=Topes,5-Medidas=3 cm,3-Categoría=Topes y regatones,7-Contenido=4 unidades,4-Tipo=Regatones,2-Familia=Ferretería y Plomería,9-Incluye=4 regatones plásticos,regatones,4-Origen=Argentina,4-Material=Plástico,2-Modelo=Regatón interior 1 1/4,3-Color=Blanco
So the quotes around the Uso value (ending after muebles) are kept, while the pair that starts before Argentina and ends after Blanco is removed.
How does this work?
=" - matches =" substring
([^"=]*=[^"]*) - matches and captures into Group 1:
[^"=]* - zero or more chars other than " and =
= - a = sign
[^"]* - any 0+ chars other than "
" - matches ".
The replacement pattern is a = and the value stored in Group 1 memory buffer (\1, a replacement backreference).
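A shorter check of the same pattern, on two fragments trimmed by us from the question's string: the first quoted value contains no =, so it is left alone; the second contains 4-Origen=, so its quotes are dropped.

```r
x <- c('Uso="Protector para patas de silla,mesas"',
       '7-Contenido="12 unidades,4-Origen=China"')

# =" then Group 1 (text with at least one = and no quote), then the closing "
out <- gsub('="([^"=]*=[^"]*)"', '=\\1', x)
print(out)
```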

String recognition in idl

I have the following strings:
F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_East_A.dat
F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_Froemke-Hoy.dat
and from each I want to extract three variables: 1. SWIR32, 2. the date, and 3. the text following the date. I want to automate this process for about 200 files, so selecting the positions by hand won't work for me.
so I want:
variable1=SWIR32
variable2=2005210
variable3=East_A
variable4=SWIR32
variable5=2005210
variable6=Froemke-Hoy
I am going to use these to add titles to graphs later on, but since the position of the text varies from string to string, I am unsure how to do this with STRMID.
I think you want to use a combination of STRPOS and STRSPLIT. Something like the following:
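s = ['F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_East_A.dat', $
     'F:\Sheyenne\ROI\SWIR32_subset\SWIR32_2005210_Froemke-Hoy.dat']
name = STRARR(s.length)
date = name
txt = name
foreach sub, s, i do begin
  ; keep only the file name (the text after the last backslash)
  sub = STRMID(sub, 1+STRPOS(sub, '\', /REVERSE_SEARCH))
  ; drop the extension (the text from the last dot onward), so txt is East_A, not East_A.dat
  sub = STRMID(sub, 0, STRPOS(sub, '.', /REVERSE_SEARCH))
  parts = STRSPLIT(sub, '_', /EXTRACT)
  name[i] = parts[0]
  date[i] = parts[1]
  ; rejoin the rest, since the trailing text may itself contain underscores
  txt[i] = STRJOIN(parts[2:*], '_')
endforeach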
You could also do this with a regular expression (using just STRSPLIT) but regular expressions tend to be complicated and error prone.
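For comparison, the same steps in R, the language of the other questions on this page (a translation, not IDL): strip the folder and the .dat extension, split on underscores, and rejoin everything after the date. A regex is used instead of basename() because basename() does not treat backslashes as separators outside Windows.

```r
paths <- c("F:\\Sheyenne\\ROI\\SWIR32_subset\\SWIR32_2005210_East_A.dat",
           "F:\\Sheyenne\\ROI\\SWIR32_subset\\SWIR32_2005210_Froemke-Hoy.dat")

# Drop everything up to the last backslash, then the .dat extension
file <- sub("\\.dat$", "", sub(".*\\\\", "", paths))

# Split each file name on "_"
parts <- strsplit(file, "_", fixed = TRUE)
name <- vapply(parts, `[`, character(1), 1)
date <- vapply(parts, `[`, character(1), 2)
# The trailing text may contain underscores itself (East_A), so rejoin it
txt <- vapply(parts, function(p) paste(p[-(1:2)], collapse = "_"), character(1))

print(data.frame(name, date, txt))
```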

grep on two strings

I'm working to grab two different elements in a string.
The strings look like this:
str <- c('a_abc', 'b_abc', 'abc', 'z_zxy', 'x_zxy', 'zxy')
I have tried the different options in ?grep, but I can't get it right. I'm doing something like this:
grep('[_abc]:[_zxy]',str, value = TRUE)
and what I would like is:
[1] "a_abc" "b_abc" "z_zxy" "x_zxy"
any help would be appreciated.
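Use normal parentheses ( ) for grouping, not square brackets [ ]: [_abc] is a character class that matches a single character (_, a, b or c), whereas _(abc|zxy) matches an underscore followed by either abc or zxy.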
grep('_(abc|zxy)',str, value = TRUE)
[1] "a_abc" "b_abc" "z_zxy" "x_zxy"
To make the grep a bit more flexible, you could do something like:
grep('_.{3}$',str, value = TRUE)
This matches an underscore _ followed by any character . exactly three times {3}, immediately followed by the end of the string $.
This should also work: grep('_abc|_zxy', str, value = TRUE)
X|Y matches when either X matches or Y matches
In this case just doing:
str[grep("_",str)]
will work... is it more complicated in your specific case?
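A quick check of the answers above on the question's vector (renamed x here so it does not mask base R's str() function). It also shows why the square-bracket attempt fails.

```r
x <- c("a_abc", "b_abc", "abc", "z_zxy", "x_zxy", "zxy")
wanted <- c("a_abc", "b_abc", "z_zxy", "x_zxy")

# The original attempt needs a literal ":" between two single characters,
# and no element contains a colon, so nothing matches
print(grep("[_abc]:[_zxy]", x, value = TRUE))

# [_abc] matches ONE character from the set {_, a, b, c}, so it is TRUE
# for "abc" too, even though "abc" has no underscore
print(grepl("[_abc]", x))

# Each answer's pattern returns the four wanted elements
print(identical(grep("_(abc|zxy)", x, value = TRUE), wanted))
print(identical(grep("_abc|_zxy", x, value = TRUE), wanted))
print(identical(grep("_.{3}$", x, value = TRUE), wanted))
print(identical(x[grep("_", x)], wanted))
```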
