Find substrings that have a correlation greater than k in R

I have a big problem in R :(. We have a data frame named "hcmut" that shows the students' answers in the half-term test, like this:
hcmut
Code | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
2011 | B | D | A | A | C | B | A | B | C | C
2012 | A | D | AC | B | D | B | A | B | C | C
2013 | A | D | A | A | C | D | C | B | D | D
2014 | A | B | A | C | BC | D | D | D | D | D
Question 1: find the substrings that have a correlation greater than k.
I think k is in the range 0 to 1.
Question 2: find the substring that has the greatest correlation and show it (like "ABCD", ...).
Could you help me with this problem in R? :(
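Since "correlation" is not defined in the question, here is a rough sketch of one possible reading: paste each student's answers into one string and treat a substring's "correlation" as the share of students whose string contains it. The threshold k = 0.5 below is only an example.
# paste each row of answers (all columns except Code) into one string
answers <- apply(hcmut[, -1], 1, paste0, collapse = "")
# enumerate every substring that occurs in any answer string
all_subs <- unique(unlist(lapply(answers, function(s) {
  n <- nchar(s)
  unlist(lapply(seq_len(n), function(i) substring(s, i, i:n)))
})))
# share of students whose answer string contains each substring
share <- sapply(all_subs, function(p) mean(grepl(p, answers, fixed = TRUE)))
k <- 0.5                        # example threshold in [0, 1]
share[share > k]                # Question 1: substrings with share > k
names(share)[which.max(share)]  # Question 2: substring with the greatest share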

Related

How do you assign groups to larger groups with dplyr?

I would like to assign groups to larger groups in order to assign them to cores for processing. I have 16 cores. This is what I have so far:
test <- data_extract %>% group_by(group_id) %>% sample_n(16, replace = TRUE)
This takes samples of 16 from each group.
This is an example of what I would like the final product to look like (with two clusters). All I really want is for rows with the same group_id to belong to the same cluster, with a set number of clusters.
________________________________
balance | group_id | cluster|
454452 | a | 1 |
5450441 | a | 1 |
5444531 | b | 1 |
5404051 | b | 1 |
5404501 | b | 1 |
5404041 | b | 1 |
544251 | b | 1 |
254252 | b | 1 |
541254 | c | 2 |
54123254 | d | 1 |
542541 | d | 1 |
5442341 | e | 2 |
541 | f | 1 |
________________________________
test <- data %>% group_by(group_id) %>% mutate(group = sample(1:16, 1))
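For what it is worth, here is a minimal sketch of one way to keep every group_id in a single cluster with dplyr (it assumes a data frame data_extract with a group_id column, 16 cores, and dplyr >= 1.0 for cur_group_id()):
library(dplyr)
n_cores <- 16
# give every distinct group_id one stable cluster number (round robin over
# the 16 cores), so all rows sharing a group_id land in the same cluster
test <- data_extract %>%
  group_by(group_id) %>%
  mutate(cluster = (cur_group_id() - 1) %% n_cores + 1) %>%
  ungroup()
Unlike sample(1:16, 1) inside mutate(), which draws an independent number for each group, the modulo assignment spreads the groups over the cores evenly.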

Find pattern matches in every column in Excel, output the number of times each pattern repeats

I have an Excel file full of letters ("a", "b", "c" only). There are 20 columns and 300 rows in total. I want to find pattern matches across all columns. For example, if I search for 4-row patterns such as "aabc", "bcdb", etc., the output should be how many times each pattern repeats in the file; if I search for 5-row patterns such as "abcaa", "bbaca", etc., the output should likewise be how many times each pattern repeats. The matches do not have to be in the same rows; a pattern occurring anywhere else in the file counts too. The output can go on the next sheet. I have tried VBA and R with regex, but I only managed to count single cells. Any advice on how to find the pattern matches in the Excel file would be greatly appreciated. Thanks in advance.
Excel File:
   | A | B | C | D
 1 | a | a | a | b
 2 | a | a | a | c
 3 | b | b | b | a
 4 | c | c | c | c
 5 | d | b | d | b
 6 | b | a | b | c
 7 | b | b | b | b
 8 | a | a | a | c
 9 | a | c | a | a
10 | c | c | c | c
11 | c | a | c | c
12 | a | a | a | a
13 | b | b | a | b
14 | b | b | b | a
15 | c | c | c | c
Output Example:
If searching for 4 rows:
aabc 3
dbba 2
baac 2
so on...
If searching for 5 rows:
aabcd 2
aacca 3
so on..
In E1 enter:
=TEXTJOIN("",TRUE,A1:D1)
and copy down. Then copy column E and Paste Special > Values into column F. Then apply Remove Duplicates to column F. Then in G1 enter:
=COUNTIF(E:E,F1)
and copy down.
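Since the question also mentions trying R, here is a rough R sketch (untested against the real workbook; the file name patterns.xlsx and the readxl package are assumptions) that slides a window of n rows down every column and tabulates how often each resulting string occurs anywhere in the file:
library(readxl)
answers <- read_excel("patterns.xlsx")   # sheet of single letters, file name assumed
count_patterns <- function(df, n) {
  windows <- unlist(lapply(df, function(col) {
    col <- as.character(col)
    if (length(col) < n) return(character(0))
    vapply(seq_len(length(col) - n + 1),
           function(i) paste(col[i:(i + n - 1)], collapse = ""),
           character(1))
  }))
  sort(table(windows), decreasing = TRUE)   # pattern counts, most frequent first
}
count_patterns(answers, 4)   # 4-row patterns, e.g. "aabc"
count_patterns(answers, 5)   # 5-row patterns, e.g. "abcaa"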

R: How to count rows with same factor levels and a numeric in a range

I've got data looking like this:
A | B | C
--------------
f | 1 | 1420h
f | 1 | 1540h
f | 3 | 600h
g | 2 | 900h
g | 2 | 930h
h | 1 | 700h
h | 3 | 400h
Now I want to create a new column which counts other rows in the data frame that meet certain conditions.
In this case I would like to know, for each row, how often the same combination of A and B occurred within a range of 100 around C.
So the result with this data would be:
A | B | C | D
------------------
f | 1 | 1420 | 0
f | 1 | 1540 | 0
f | 3 | 1321 | 0
g | 2 | 900 | 1
g | 2 | 930 | 1
h | 1 | 700 | 0
h | 3 | 400 | 0
I actually came up with a solution using nested for loops, but the time R needs to compute the results is far too long.
for (i in 1:nrow(df)) {
  # for row i, count how many rows p share A and B and have C within +/- 100
  df[i, "D"] <- sum(sapply(1:nrow(df), function(p) {
    df[p, "A"] == df[i, "A"] &
      df[p, "B"] == df[i, "B"] &
      df[i, "C"] + 100 > df[p, "C"] &
      df[p, "C"] > df[i, "C"] - 100
  }))
}
Is there a better way?
Thanks a lot!
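One possible speed-up, as a sketch only (it assumes C has already been converted to a number, i.e. the "h" suffix stripped, and that dplyr is available): group by A and B once, then count the other rows whose C lies within 100 of the current row.
library(dplyr)
df <- df %>%
  group_by(A, B) %>%
  # count the other rows in the same (A, B) group whose C is strictly
  # within +/- 100; subtract 1 to exclude the row itself
  mutate(D = sapply(C, function(x) sum(abs(C - x) < 100) - 1)) %>%
  ungroup()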

Creation of Panel Data set in R

Programmers,
I have some difficulties in structuring my panel data set.
My panel data set, for the moment, has the following structure:
Shown here as an example with only T = 2 and N = 3. (My real data set, however, is of size T = 6 and N = 20 000 000.)
Panel data structure 1:
Year | ID | Variable_1 | ... | Variable_k |
1 | 1 | A | ... | B |
1 | 2 | C | ... | D |
1 | 3 | E | ... | F |
2 | 1 | G | ... | H |
2 | 2 | I | ... | J |
2 | 3 | K | ... | L |
The desired structure is:
Panel data structure 2:
Year | ID | Variable_1 | ... | Variable_k |
1 | 1 | A | ... | B |
2 | 1 | G | ... | H |
1 | 2 | C | ... | D |
2 | 2 | I | ... | J |
1 | 3 | E | ... | F |
2 | 3 | K | ... | L |
This data structure represents the classic panel data structure, where the yearly observations over the whole period are structured for all individuals block by block.
My question: is there any simple and efficient R solution that changes the data structure from Table 1 to Table 2 for very large data sets (data.frame)?
Thank you very much for all responses in advance!!
Enrico
You can reorder the rows of your dataframe using order():
df <- df[order(df$ID, df$Year), ]
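For 20 million rows, a data.table sketch (it assumes the data.table package) should be considerably faster, because it sorts by reference instead of copying the whole data frame:
library(data.table)
setDT(df)               # convert the data.frame to a data.table in place
setorder(df, ID, Year)  # sort by ID, then Year, by reference (no copy)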

Adding "&" as an attribute Woocommerce

Hello, thanks for reading.
How would I be able to add the "&" symbol as an option that can be added to the cart?
Current variations:
A | B | C | D | E | F | G | H | I | J | k | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
I want to add | & | at the end, but it doesn't work.
When I add | & |, it turns into the default option and it can't be added to the shopping cart.
