I am trying to name the rows in a matrix, but it adds the prefixes 'X.', 'X..' etc. in front of these names. Also, the row names don't come out properly. For example, the first-row name is supposed to be 'e subscript (t+1)' but it shows something else. The even-numbered row names should be vacant, but they are given names. Could you help please?
Please see the dataset here.
Below is the code I used:
rownames(table2a)=paste(c("$e_t+1$"," ","$r_t+1$", " ","$\\Delta y_{n,t+1}$"," ",
"$s_{n,t+1}$"," ", "$d_t+1-p_t+1$"," ", "$rb_t+1$"," "))
Included libraries: matrix, dplyr, tidyverse, xtable.
Below is the data from dput(table2a):
structure(c(-0.011918875562309, 0.0493186644094629, 0.00943711646402318,
0.0084043692395113, 0.0140061617086464, 0.00795790133389348,
-0.00372516684283399, 0.00631517247007723, 0.00514156266584497,
0.0039339752041611, 0.0148362913561212, 0.00793003246354337,
-0.0807656037164587, 0.0599852917766847, 0.991792361285981, 0.0102220639400435,
-0.00608828061911691, 0.00967903407684488, 0.010343002117867,
0.00768101625973846, 0.0578541455030235, 0.00478481429473926,
-0.00902328873121743, 0.00964513773477125, -0.799680080407018,
0.340494864072598, 0.0519648273240202, 0.0580235615884655, 0.0850517813830584,
0.0549411579861702, -0.0665428760388874, 0.0435997977143392,
-0.032698572959578, 0.027160069487786, 0.114163705951583, 0.0547487519805466,
0.352025366916776, 0.197746547959218, 0.0476825327079758, 0.0336978915546042,
0.0464511908714403, 0.0319077480426568, 0.904849333951824, 0.0253211146465119,
0.132904050913606, 0.0157735418364402, 0.0653710645280691, 0.0317960059066269,
0.939695537568421, 0.612311426298072, -0.0578948128653228, 0.104343687684969,
-0.0744692071400603, 0.0988006057025484, 0.121089017775182, 0.0784054537723728,
0.0345069733304992, 0.048841914052704, -0.090885199308955, 0.0984546022582597,
-0.280821673428002, 0.248826811381596, -0.0288068135696716, 0.0424024540117092,
-0.0239685609446809, 0.0401498953370305, 0.00219488911775388,
0.0318618569231297, 0.066433933135983, 0.0198480335553826, 0.871940074366622,
0.0400092888905855), .Dim = c(12L, 6L), .Dimnames = list(c("$e_t+1$",
" ", "$r_t+1$", " ", "$\\Delta y_{n,t+1}$", " ", "$s_{n,t+1}$",
" ", "$d_t+1-p_t+1$", " ", "$rb_t+1$", " "), c("ex_stock_ret_100.l1",
"real_int_100.l1", "Chg_1month.l1", "spreads.l1", "log_dp.l1",
"rb_rate_100.l1")))
My desired output (row & column names) is as shown in this picture
In case you want to remove that prefix, you can do the following:
rownames(table2a) <- substring(rownames(table2a), 2)
Alternatively, you can remove the leading X and any dots that follow it (no matter how many there are) with gsub:
rownames(table2a) <- gsub("^X\\.*","",rownames(table2a))
^ = beginning of the string;
X = the literal X;
\\. = a literal dot;
* = zero or more of the preceding item (here \\.); so in total ^X\\.* means: an X at the very start of the string, together with any dots that follow it directly.
gsub replaces this match with "" (nothing), leaving only whatever comes after it.
EDIT:
To also blank out every second row name, extend the pattern:
rownames(table2a) <- gsub("^X\\.*[0-9]*","",rownames(table2a))
This also removes any digits directly after the dots ([0-9] rather than [1-9], so that a 0 is covered as well), which should leave those row names empty.
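As a quick check of the two patterns above, here is a minimal sketch run on a few made-up names. The vector mangled is an assumption about what the prefixed row names look like (it is not taken from the posted data), so adjust it to whatever rownames(table2a) actually returns.

```r
# Hypothetical examples of prefixed row names (assumed, not from the post)
mangled <- c("X.e_t.1.", "X.", "X.r_t.1.", "X..1")

# Strip the leading X and any dots directly after it
step1 <- gsub("^X\\.*", "", mangled)

# Also strip digits that directly follow those dots
step2 <- gsub("^X\\.*[0-9]*", "", mangled)

print(step1)
print(step2)
```

The second pattern leaves the rows that only carried a prefix completely empty, while the first still keeps a trailing counter such as the 1 in "X..1".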
I would like to insert characters at the places where a string changes its case. I tried the following, which inserts a '\n' after a fixed number of characters and then a ' ', because I couldn't figure out how to detect the case change:
s <-c("FloridaIslandE7", "FloridaIslandE9", "Meta")
gsub('^(.{7})(.{6})(.*)$', '\\1\\\n\\2 \\3', s )
[1] "Florida\nIsland E7" "Florida\nIsland E9" "Meta"
This works because the positions are fixed but I would like to know how to do it for the general case.
Surely there's a less convoluted regex for this, but you could try:
gsub('([A-Z][0-9])', ' \\1', gsub('([a-z])([A-Z])', '\\1\n\\2', s))
Output:
[1] "Florida\nIsland E7" "Florida\nIsland E9" "Meta"
Here is an option using str_replace_all from stringr (load it with library(stringr) first):
str_replace_all(s, "(?<=[a-z])(?=[A-Z])", "\n")
#[1] "Florida\nIsland\nE7" "Florida\nIsland\nE9" "Meta"
If you really want to insert \n, try this:
gsub("([a-z])([A-Z])", "\\1\\\n\\2", s)
[1] "Florida\nIsland\nE7" "Florida\nIsland\nE9" "Meta"
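If the goal is the exact output from the question (a newline at the first case change and a space at the later ones), one possible sketch in base R is below. It assumes "first boundary gets a newline, the rest get a space" is the rule you want; the helper name split_case is made up for illustration.

```r
# Hypothetical helper: newline at the first lower->upper boundary,
# a space at every remaining one
split_case <- function(x) {
  # sub() only replaces the first match in each string
  x <- sub("([a-z])([A-Z])", "\\1\n\\2", x)
  # gsub() then handles whatever boundaries are left
  gsub("([a-z])([A-Z])", "\\1 \\2", x)
}

s <- c("FloridaIslandE7", "FloridaIslandE9", "Meta")
out <- split_case(s)
print(out)
```

Strings without a case change, such as "Meta", pass through unchanged.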
Hi, I am trying to convert some XML to CSV using XQuery and found a previous post that helped me get to this point:
for $b in /root/Result
return
concat(escape-html-uri(string-join(($b/HolidayEndDate,
$b/HolidayType,
$b/FirstName,
$b/AllowanceRemainingDays,
$b/HolidayStartDate,
$b/EmployeeId,
$b/AllowanceDays,
$b/LastName,
$b/HolidayDurationDays
)
/normalize-space(),
",")
),
codepoints-to-string(10))
This returns all of the data as required but no Header row. Is there a simple addition to the above code that would also return the header row? Thanks. :)
Since your query returns a sequence of lines, you can just prepend another line before the FLWOR expression:
"HolidayEndDate,HolidayType,FirstName,AllowanceRemainingDays,HolidayStartDate,EmployeeId,AllowanceDays,LastName,HolidayDurationDays&#10;",
for $b in /root/Result
return
concat(escape-html-uri(string-join(($b/HolidayEndDate,
$b/HolidayType,
$b/FirstName,
$b/AllowanceRemainingDays,
$b/HolidayStartDate,
$b/EmployeeId,
$b/AllowanceDays,
$b/LastName,
$b/HolidayDurationDays
)
/normalize-space(),
",")
),
codepoints-to-string(10))
Because nested sequences are flattened (i.e. concatenated) in XQuery, this results in one output sequence including the header. Note also that I used a character entity '&#10;' for the newline character, which is much shorter than codepoints-to-string(10).
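To return everything as one string rather than a sequence of lines, wrap the rows in string-join and concat:
concat("HolidayEndDate,HolidayType,FirstName,AllowanceRemainingDays,HolidayStartDate,EmployeeId,AllowanceDays,LastName,HolidayDurationDays&#10;",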
string-join(
for $b in /root/Result
return
concat(escape-html-uri(string-join(($b/HolidayEndDate,
$b/HolidayType,
$b/FirstName,
$b/AllowanceRemainingDays,
$b/HolidayStartDate,
$b/EmployeeId,
$b/AllowanceDays,
$b/LastName,
$b/HolidayDurationDays
)
/normalize-space(),
",")
),
codepoints-to-string(10)), "")
)
I want to use regex to replace commands or tags around strings. My use case is converting LaTeX commands to bookdown commands, which means doing things like replacing \citep{*} with [#*], \ref{*} with \#ref(*), etc. However, let's stick to the generalized question:
Given a string <begin>somestring<end> where <begin> and <end> are known and somestring is an arbitrary sequence of characters, can we use regex to substitute <newbegin> and <newend> to get the string <newbegin>somestring<newend>?
For example, consider the LaTeX command \citep{bonobo2017}, which I want to convert to [#bonobo2017]. For this example:
<begin> = \citep{
somestring = bonobo2017
<end> = }
<newbegin> = [#
<newend> = ]
This question is basically the inverse of this question.
I'm hoping for an R or notepad++ solution.
Additional Examples
Convert \citet{bonobo2017} to #bonobo2017
Convert \ref{myfigure} to \#ref(myfigure)
Convert \section{Some title} to # Some title
Convert \emph{something important} to *something important*
I'm looking for a template regex that I can fill in my <begin>, <end>, <newbegin> and <newend> on a case-by-case basis.
You can try something like this with dplyr + stringr:
string = "\\citep{bonobo2017}"
begin = "\\citep{"
somestring = "bonobo2017"
end = "}"
newbegin = "[#"
newend = "]"
library(stringr)
library(dplyr)
string %>%
str_extract(paste0("(?<=\\Q", begin, "\\E)\\w+(?=\\Q", end, "\\E)")) %>%
paste0(newbegin, ., newend)
or:
string %>%
str_replace_all(paste0("\\Q", begin, "\\E|\\Q", end, "\\E"), "") %>%
paste0(newbegin, ., newend)
You can also make it a function for convenience:
convertLatex = function(string, BEGIN, END, NEWBEGIN, NEWEND){
string %>%
str_replace_all(paste0("\\Q", BEGIN, "\\E|\\Q", END, "\\E"), "") %>%
paste0(NEWBEGIN, ., NEWEND)
}
convertLatex(string, begin, end, newbegin, newend)
# [1] "[#bonobo2017]"
Notes:
Notice that I manually added an extra \ in "\\citep{bonobo2017}". In a regular R string literal a single \ starts an escape sequence, so the backslash itself has to be escaped with a second \. (R 4.0.0 and later also accept raw strings such as r"(\citep{bonobo2017})", which avoid the doubling.)
The regex in str_extract uses a positive lookbehind and a positive lookahead to extract the somestring between begin and end.
str_replace_all takes the other approach: it removes begin and end from string.
The "\\Q" ... "\\E" pair makes the regex engine treat everything between them as literal text: "\\Q" starts the quoted section and "\\E" ends it. This is especially useful here because LaTeX commands usually contain special characters such as \ and {, and the pair escapes them for you.
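If the commands sit inside longer text, a capture group lets you swap the delimiters in place instead of stripping them and wrapping the whole string. Below is a minimal base R sketch of that idea. The function name convert_command and the sample sentence are made up for illustration, and it assumes the delimiters never contain the sequence \E and that commands are not nested.

```r
# Hypothetical template: replace <begin>...<end> with <newbegin>...<newend>
convert_command <- function(x, begin, end, newbegin, newend) {
  # \Q...\E quotes the delimiters literally; (.*?) lazily captures the content
  pattern <- paste0("\\Q", begin, "\\E(.*?)\\Q", end, "\\E")
  # Double backslashes so gsub() keeps them literal in the replacement
  esc <- function(s) gsub("\\", "\\\\", s, fixed = TRUE)
  gsub(pattern, paste0(esc(newbegin), "\\1", esc(newend)), x, perl = TRUE)
}

x <- "See \\citep{bonobo2017} and \\ref{myfigure}."
x <- convert_command(x, "\\citep{", "}", "[#", "]")
x <- convert_command(x, "\\ref{", "}", "\\#ref(", ")")
cat(x, "\n")
```

Each call handles one command, so the other conversions from the question (\citet, \section, \emph) would be further calls with their own begin/end pairs.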
I'm working to grab two different elements in a string.
The string look like this,
str <- c('a_abc', 'b_abc', 'abc', 'z_zxy', 'x_zxy', 'zxy')
I have tried the different options in ?grep, but I can't get it right. I'm doing something like this,
grep('[_abc]:[_zxy]',str, value = TRUE)
and what I would like is,
[1] "a_abc" "b_abc" "z_zxy" "x_zxy"
any help would be appreciated.
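Use normal parentheses ( ) for grouping, not square brackets [ ]. Square brackets define a character class, which matches a single character from the set, so [_abc] does not mean the string _abc: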
grep('_(abc|zxy)',str, value = TRUE)
[1] "a_abc" "b_abc" "z_zxy" "x_zxy"
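To see the difference between the two kinds of brackets, here is a small sketch on the same data. The vector is called x here instead of str, only to avoid masking the built-in str() function.

```r
x <- c("a_abc", "b_abc", "abc", "z_zxy", "x_zxy", "zxy")

# Character class: TRUE wherever ANY one of _, a, b, c occurs
class_hits <- grepl("[_abc]", x)

# The original attempt also requires a literal ':' so nothing matches
original <- grep("[_abc]:[_zxy]", x, value = TRUE)

# Grouping with alternation: an underscore followed by abc or zxy
grouped <- grep("_(abc|zxy)", x, value = TRUE)

print(class_hits)
print(original)
print(grouped)
```

Only "zxy" fails the character-class test, because it contains none of the four listed characters, which shows that the class is checking single characters rather than the sequence _abc.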
To make the grep a bit more flexible, you could do something like:
grep('_.{3}$',str, value = TRUE)
This matches an underscore (_), followed by any three characters (.{3}), followed immediately by the end of the string ($).
This should work: grep('_abc|_zxy', str, value=TRUE)
X|Y matches when either X matches or Y matches
In this case just doing:
str[grep("_",str)]
will work... is it more complicated in your specific case?