A set of positive integers that do not begin with 0, except for 0 - bnf

While working on the following exercise for a programming languages course, I found that my answer can't generate the string 201, and I can't figure out how to handle this case.
Problem: L(G) is a set of positive decimal numbers that do not start with 0, except zero. Design grammar G.
My answer:
G is:
S -> Digit
NonZeroDigit -> 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Digit -> 0 | NonZeroDigit | NonZeroDigit 0 | NonZeroDigit Digit
Check correctness:
Digit => 0
Digit => NonZeroDigit => 1
Digit => NonZeroDigit Digit => 2 Digit => 20
If I add Digit -> Digit Digit, it would create Digit => Digit Digit => Digit Digit Digit => 201, but this also can create Digit => Digit Digit => Digit Digit Digit => 000. What?
How do I change the grammar I define so I can meet the condition?

Why not just split the cases n = 0 and n > 0?
S -> 0 | posDig digit
posDig -> 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
digit -> digit digit | 0 | posDig | <epsilon>
Instead of (posDig digit) in S, you could also introduce a separate nonterminal, e.g. number (though 1 to 9 would now be a number as well).
From there on, you just need to make sure the first digit is not 0.
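To check a grammar like the one above, one option is to enumerate everything it derives up to some length and compare that with the intended language. The sketch below is in Python and rests on two assumptions of mine: the target language is the regular expression 0|[1-9][0-9]*, and a length bound of 3 is enough to include the problem strings 201 and 000. The function names are made up for illustration.

```python
import re
from itertools import product

POS_DIG = "123456789"
MAX_LEN = 3  # assumed bound, long enough to cover "201" and "000"


def digit_strings(max_len):
    """Strings derivable from `digit`, up to max_len characters."""
    # digit -> <epsilon> | 0 | posDig
    known = {"", "0"} | set(POS_DIG)
    while True:
        # digit -> digit digit
        new = {a + b for a in known for b in known if len(a + b) <= max_len}
        if new <= known:
            return known
        known |= new


def language(max_len):
    """Strings derivable from S, up to max_len characters."""
    tails = digit_strings(max_len - 1)
    # S -> 0 | posDig digit
    return {"0"} | {p + t for p in POS_DIG for t in tails}


TARGET = re.compile(r"0|[1-9][0-9]*")
generated = language(MAX_LEN)
expected = {
    "".join(t)
    for n in range(1, MAX_LEN + 1)
    for t in product("0123456789", repeat=n)
    if TARGET.fullmatch("".join(t))
}
print(generated == expected, "201" in generated, "000" in generated)
```

Within that bound the two sets agree, so the grammar produces 201 but neither 000 nor 01.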

Does the stem function in R handle large counts correctly?

I am wondering if the stem function in R is producing the stem and leaf plot correctly for this example. The code
X <- c(rep(1,1000),2:15)
stem(X,width = 20)
produces the output
The decimal point is at the |
1 | 00000000+980
2 | 0
3 | 0
4 | 0
5 | 0
6 | 0
7 | 0
8 | 0
9 | 0
10 | 0
11 | 0
12 | 0
13 | 0
14 | 0
15 | 0
There are 1000 ones in the data, but the output of the stem function seems to indicate that there are 988 ones (if you count the zeros in the first row and add 980). Instead of +980, I think it should display +992 at the end of the first row.
Is there an error in the stem function or am I not reading the output correctly?

understanding max in summation notation

This problem is found at https://codeforces.com/problemset/problem/1326/C. I don't seem to get the summation notation with the max function.
Here's the explanation of one of the samples, where n = 7 and k = 3. I don't get why the partition value of each partition is 18. How was 18 derived?
Although the core of the question is not about programming, I'll answer.
The essential point is that the numbers in brackets are positions of elements, not values. So for case 3 the mapping is:
index | value
1 | 2
2 | 7
3 | 3
4 | 1
5 | 5
6 | 4
7 | 6
So the first partitioning ({[1,2],[3,5],[6,7]}) divides the elements into these subsets: {{2,7}, {3, 1, 5}, {4, 6}}. Applying max to each subset gives:
{2, 7} -> 7
{3, 1, 5} -> 5
{4, 6} -> 6
7 + 5 + 6 = 18
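The same computation can be written as a short Python sketch. The permutation and the segments are taken from the table and the partitioning above; the function name partition_value is just an illustrative choice, and the segment bounds are treated as 1-based and inclusive, as in the problem statement.

```python
def partition_value(p, segments):
    """Sum of the maximum over each segment; bounds are 1-based, inclusive."""
    return sum(max(p[l - 1:r]) for l, r in segments)


p = [2, 7, 3, 1, 5, 4, 6]              # value at positions 1..7
segments = [(1, 2), (3, 5), (6, 7)]    # positions, not values

maxima = [max(p[l - 1:r]) for l, r in segments]
total = partition_value(p, segments)
print(maxima, total)
```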

How to fill in observations using other observations R or Stata

I have a dataset like this:
ID dum1 dum2 dum3 var1
1 0 1 . hi
1 0 . 0 hi
2 1 . . bye
2 0 0 1 .
What I'm trying to do is that I want to fill in information based on the same ID if observations are missing. So my end product would be something like:
ID dum1 dum2 dum3 var1
1 0 1 0 hi
1 0 1 0 hi
2 1 0 1 bye
2 0 0 1 bye
Is there any way I can do this in R or Stata?
This continues discussion of Stata solutions. The solution by @Pearly Spencer looks backward and forward from observations with missing values and so is fine for the example with just two observations per group, and possibly fine for some other situations.
An alternative approach makes use, as appropriate, of the community-contributed commands mipolate and stripolate from SSC as explained also at https://www.statalist.org/forums/forum/general-stata-discussion/general/1308786-mipolate-now-available-from-ssc-new-program-for-interpolation
Examples first, then commentary:
clear
input ID dum1a dum2a dum3a str3 var1a
1 0 1 . "hi"
1 0 . 0 "hi"
2 1 . . "bye"
2 0 0 1 ""
2 0 1 . ""
end
gen long obsno = _n
foreach v of var dum*a {
    quietly count if missing(`v')
    if r(N) > 0 capture noisily mipolate `v' obsno, groupwise by(ID) generate(`v'_2)
}
foreach v of var var*a {
    quietly count if missing(`v')
    if r(N) > 0 capture noisily stripolate `v' obsno, groupwise by(ID) generate(`v'_2)
}
list
+----------------------------------------------------------------+
| ID dum1a dum2a dum3a var1a obsno dum3a_2 var1a_2 |
|----------------------------------------------------------------|
1. | 1 0 1 . hi 1 0 hi |
2. | 1 0 . 0 hi 2 0 hi |
3. | 2 1 . . bye 3 1 bye |
4. | 2 0 0 1 4 1 bye |
5. | 2 0 1 . 5 1 bye |
+----------------------------------------------------------------+
Notes:
The groupwise option of mipolate and stripolate uses the rule: replace missing values within groups with the non-missing value in that group if and only if there is only one distinct non-missing value in that group. Thus if the non-missing values in a group are all 1, or all 42, or whatever it is, then interpolation uses 1 or 42 or whatever it is. If the non-missing values in a group are 0 and 1, then no go.
The variable obsno created here plays no role in that interpolation and is needed solely to match the general syntax of mipolate.
There is no assumption here that groups consist of just two observations or have the same number of observations. A common playground for these problems is data on families whenever some variables were recorded only for certain family members but it is desired to spread the values recorded to other family members. Naturally, in real data families often have more than two members and the number of family members will vary.
This question exposed a small bug in mipolate, groupwise and stripolate, groupwise: it doesn't exit as appropriate if there is nothing to do, as in dum1a where there are no missing values. In the code above, this is trapped by asking for interpolation if and only if missing values are counted. At some future date, the bug will be fixed and the code in this answer simplified accordingly, or so I intend as program author.
mipolate, groupwise and stripolate, groupwise both exit with an error message if any group is found with two or more distinct non-missing values; no interpolation is then done for any groups, even if some groups are fine. That is the point of the code capture noisily: the error message for dum2a is not echoed above. As program author I am thinking of adding an option whereby such groups will be ignored but that interpolation will take place for groups with just one distinct non-missing value.
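For readers without Stata, the groupwise rule described in these notes can be sketched in Python. This is only an illustration of the rule, not the code of mipolate or stripolate: the function name fill_groupwise is invented, missing values are represented by None, and (unlike the current Stata commands, which stop with an error) groups with two or more distinct non-missing values are simply left untouched.

```python
def fill_groupwise(rows, key, column):
    """Fill missing `column` values in groups with exactly one distinct value."""
    distinct = {}
    for row in rows:
        if row[column] is not None:
            distinct.setdefault(row[key], set()).add(row[column])
    filled = []
    for row in rows:
        values = distinct.get(row[key], set())
        new_row = dict(row)  # copy, so the input rows are not modified
        if row[column] is None and len(values) == 1:
            new_row[column] = next(iter(values))
        filled.append(new_row)
    return filled


# Same five observations as in the Stata example; None marks a missing value.
rows = [
    {"ID": 1, "dum2a": 1, "dum3a": None, "var1a": "hi"},
    {"ID": 1, "dum2a": None, "dum3a": 0, "var1a": "hi"},
    {"ID": 2, "dum2a": None, "dum3a": None, "var1a": "bye"},
    {"ID": 2, "dum2a": 0, "dum3a": 1, "var1a": None},
    {"ID": 2, "dum2a": 1, "dum3a": None, "var1a": None},
]
dum3 = [r["dum3a"] for r in fill_groupwise(rows, "ID", "dum3a")]
var1 = [r["var1a"] for r in fill_groupwise(rows, "ID", "var1a")]
dum2 = [r["dum2a"] for r in fill_groupwise(rows, "ID", "dum2a")]
print(dum3, var1, dum2)
```

For dum2a, group 2 contains both 0 and 1, so its missing value stays missing in this sketch.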
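Assuming your data is in df
library(dplyr)
# Assumes missing values are stored as the string "."; if they are NA,
# use !is.na(dum1) etc. as the condition instead.
df %>%
  group_by(ID) %>%
  mutate(dum1 = dum1[dum1 != "."][1],   # first non-missing value in the group
         dum2 = dum2[dum2 != "."][1],
         dum3 = dum3[dum3 != "."][1],
         var1 = var1[var1 != "."][1])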
Using your toy example:
clear
input ID dum1a dum2a dum3a str3 var1a
1 0 1 . "hi"
1 0 . 0 "hi"
2 1 . . "bye"
2 0 0 1 "."
end
replace var1a = "" if var1a == "."
sort ID (dum2a)
list
+------------------------------------+
| ID dum1a dum2a dum3a var1a |
|------------------------------------|
1. | 1 0 1 . hi |
2. | 1 0 . 0 hi |
3. | 2 0 0 1 |
4. | 2 1 . . bye |
+------------------------------------+
In Stata you can do the following:
ds ID, not
local varlist `r(varlist)'
foreach var of local varlist {
    generate `var'b = `var'
    bysort ID (`var'): replace `var'b = cond(!missing(`var'[_n-1]), `var'[_n-1], ///
        `var'[_n+1]) if missing(`var')
}
list ID dum?ab var?ab
+----------------------------------------+
| ID dum1ab dum2ab dum3ab var1ab |
|----------------------------------------|
1. | 1 0 1 0 hi |
2. | 1 0 1 0 hi |
3. | 2 0 0 1 bye |
4. | 2 1 0 1 bye |
+----------------------------------------+

In Unix what does the ^ do when placed in a math expression

I have been searching for the answer to this and was unable to find an exact answer. Help will be much appreciated.
echo $[ 2 ^ 2 ]
returns value 0
echo $[ 2 ^ 3 ]
returns 1
echo $[ 2 ^ 4 ]
returns 6
My question is what math operation is taking place when using the ^ in this context?
I expected it to act as a power function. Would really appreciate any clarification, thanks in advance.
It's a bitwise XOR operation.
It compares the two numbers bit by bit: if, for a given position, exactly one of the two bits is 1, the resulting bit is set to 1. In all other cases (both bits 0 or both bits 1), the resulting bit is 0.
So, for your examples:
2 ^ 2:
2 010
2 010
--------
0 000
2 ^ 3:
2 010
3 011
--------
1 001
2 ^ 4:
2 010
4 100
--------
6 110
I would say, your commands are doing a bit-xor with the numbers.
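Python happens to use ^ for bitwise XOR as well, so the three results can be reproduced there and set beside the power the asker expected. This is only a sketch for comparison, not shell code. In Bash arithmetic, for instance, exponentiation is written ** rather than ^.

```python
pairs = [(2, 2), (2, 3), (2, 4)]

# (a, b, a XOR b, a to the power b)
results = [(a, b, a ^ b, a ** b) for a, b in pairs]

for a, b, xor, power in results:
    # Show the operands and the XOR result as 3-bit binary numbers.
    print(f"{a:03b} XOR {b:03b} = {xor:03b} ({xor}), but {a} ** {b} = {power}")
```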

Is preprocessing file with awk needed or it can be done directly in R?

I used to process CSV files with awk. Here is my first script:
tail -n +2 shifted_final.csv | awk -F, 'BEGIN {old=$2} {if($2!=old){print $0; old=$2;}}' | less
This script looks for repeating values in the 2nd column (i.e. the value on line n is the same as on lines n+1, n+2, ...) and prints only the first occurrence. For example, if you feed it the following input:
ord,orig,pred,as,o-p
1,0,0,1.0,0
2,0,0,1.0,0
3,0,0,1.0,0
4,0,0,0.0,0
5,0,0,0.0,0
6,0,0,0.0,0
7,0,0,0.0,0
8,0,0,0.0,0
9,0,0,0.0,0
10,0,0,0.0,0
11,0,0,0.0,0
12,0,0,0.0,0
13,0,0,0.0,0
14,0,0,0.0,0
15,0,0,0.0,0
16,0,0,0.0,0
17,0,0,0.0,0
18,0,0,0.0,0
19,0,0,0.0,0
20,0,0,0.0,0
21,0,0,0.0,0
22,0,0,0.0,0
23,4,0,0.0,4
24,402,0,1.0,402
25,0,0,1.0,0
Then the output will be:
1,0,0,1.0,0
23,4,0,0.0,4
24,402,0,1.0,402
25,0,0,1.0,0
EDIT:
I've made this a bit more challenging by adding a second script.
The second script does the same but prints the last occurrence of each run of duplicates:
tail -n +2 shifted_final.csv | awk -F, 'BEGIN {old=$2; line=$0} {if($2==old){line=$0}else{print line; old=$2; line=$0}} END {print $0}' | less
Its output will be:
22,0,0,0.0,0
23,4,0,0.0,4
24,402,0,1.0,402
25,0,0,1.0,0
I suppose R is a powerful language that should handle such tasks, but I've only found questions about calling awk scripts from R and the like. How can I do this in R?
Regarding the update to your question, a more general solution, thanks to @nicola:
Idx.first <- c(TRUE, tbl$orig[-1] != tbl$orig[-nrow(tbl)])
##
R> tbl[Idx.first,]
# ord orig pred as o.p
# 1 1 0 0 1 0
# 23 23 4 0 0 4
# 24 24 402 0 1 402
# 25 25 0 0 1 0
If you want to use the last occurrence of a value in a run, rather than the first, just append TRUE to @nicola's indexing expression instead of prepending it:
Idx.last <- c(tbl$orig[-1] != tbl$orig[-nrow(tbl)], TRUE)
##
R> tbl[Idx.last,]
# ord orig pred as o.p
# 22 22 0 0 0 0
# 23 23 4 0 0 4
# 24 24 402 0 1 402
# 25 25 0 0 1 0
In either case, tbl$orig[-1] != tbl$orig[-nrow(tbl)] is comparing the 2nd through nth values in column 2 with the 1st through n-1th values in column 2. The result is a logical vector, where TRUE elements indicate a change in consecutive values. Since the comparison is of length n-1, pushing an extra TRUE value to the front (case 1) will select the first occurrence in a run, whereas adding an extra TRUE to the back (case 2) will select the last occurrence in a run.
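The same shifted comparison can be sketched outside R. The Python version below is an illustration under my own naming (run_boundaries is not from the answer): it builds the list of changes between consecutive values, then puts the extra True at the front or at the back, exactly as described above.

```python
def run_boundaries(values):
    """Return (first, last) boolean masks marking run starts and run ends."""
    if not values:
        return [], []
    # changed[i] is True when values[i + 1] differs from values[i]
    changed = [values[i] != values[i - 1] for i in range(1, len(values))]
    first = [True] + changed   # extra True at the front: first row of each run
    last = changed + [True]    # extra True at the back: last row of each run
    return first, last


# Column orig from the question: 22 zeros, then 4, 402, 0 (rows 1..25).
orig = [0] * 22 + [4, 402, 0]
ord_ = list(range(1, 26))

first, last = run_boundaries(orig)
first_rows = [o for o, keep in zip(ord_, first) if keep]
last_rows = [o for o, keep in zip(ord_, last) if keep]
print(first_rows, last_rows)
```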
Data:
tbl <- read.table(text = "ord,orig,pred,as,o-p
1,0,0,1.0,0
2,0,0,1.0,0
3,0,0,1.0,0
4,0,0,0.0,0
5,0,0,0.0,0
6,0,0,0.0,0
7,0,0,0.0,0
8,0,0,0.0,0
9,0,0,0.0,0
10,0,0,0.0,0
11,0,0,0.0,0
12,0,0,0.0,0
13,0,0,0.0,0
14,0,0,0.0,0
15,0,0,0.0,0
16,0,0,0.0,0
17,0,0,0.0,0
18,0,0,0.0,0
19,0,0,0.0,0
20,0,0,0.0,0
21,0,0,0.0,0
22,0,0,0.0,0
23,4,0,0.0,4
24,402,0,1.0,402
25,0,0,1.0,0",
header = TRUE,
sep = ",")
For the (updated) question, you could use for example (thanks to @nrussell for his comment and suggestion):
idx <- c(1, head(cumsum(rle(tbl[,2])[[1]]), -1) + 1)
tbl[idx,]
# ord orig pred as o.p x
#1 1 0 0 1 0 1
#23 23 4 0 0 4 2
#24 24 402 0 1 402 3
#25 25 0 0 1 0 4
It will return the first row of each 'block' of identical values in column orig.
rle(tbl[,2])[[1]] computes the run lengths of each new (different than previous) value that appears in column orig
cumsum(...) computes the cumulative sum of those run lengths, i.e. the row number on which each run ends
head(..., -1) + 1 drops the last of those end positions and adds 1 to the rest, which gives the row on which each run after the first starts
finally, c(1, ...) prepends 1, so that the very first line of the data will always be present
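The run-length idea can also be sketched in Python, with itertools.groupby standing in for rle. The function name run_starts is an illustrative choice. The point it shows is that a run starts one row after the previous run ends, which matters as soon as a later run is longer than one row.

```python
from itertools import accumulate, groupby


def run_starts(values):
    """1-based positions of the first element of each run of equal values."""
    lengths = [len(list(group)) for _, group in groupby(values)]
    if not lengths:
        return []
    ends = list(accumulate(lengths))          # last position of each run
    return [1] + [end + 1 for end in ends[:-1]]


orig = [0] * 22 + [4, 402, 0]                 # column orig from the question
starts = run_starts(orig)
starts_longer_runs = run_starts([0, 0, 4, 4, 5])
print(starts, starts_longer_runs)
```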
