I am very confused about the purpose of copy. As illustrated in this post, copy creates a shallow copy whereas deepcopy creates an independent object. If you used copy, the underlying references remain unchanged. If I did b = copy(a) and altered a, then b would change as well.
Then what is the purpose of copy? We already have =. Is there something that copy can do but = cannot do?
The documentation says:
copy(x)
Create a shallow copy of x: the outer structure is copied, but not all internal values. For example, copying an array produces a new array with identically-same elements as the original.
So if you do b = copy(a), then replace an element in b, a's contents are unchanged, because it's a different object. If you just did b = a, they'd both refer to the same array, and any replacement would show up regardless of whether you looked in a or b.
Example:
> a = [1, 2]
2-element Array{Int64,1}:
1
2
> b = a
2-element Array{Int64,1}:
1
2
> c = copy(a)
2-element Array{Int64,1}:
1
2
> a[1] = 42
42
> a
2-element Array{Int64,1}:
42
2
> b
2-element Array{Int64,1}:
42
2
> c
2-element Array{Int64,1}:
1
2
>
In the above, a refers to an array with [1, 2] in it (to start with). b is just another variable referring to the same array, but c is a shallow copy — a different array with (initially) the same elements in it. When we replace the 1 in a[1] with 42, we see that replacement whether we look through a or b because they're both looking at the same object, but c is a different object and is unaffected.
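The same distinction can be tried outside Julia. The following is a minimal sketch in Python, not Julia: it assumes that Python's plain assignment and `copy.copy` play the roles of `=` and `copy` here, and note that Python indexes from 0 rather than 1.

```python
import copy

a = [1, 2]
b = a             # b is just another name for the same list
c = copy.copy(a)  # c is a new list that initially holds the same elements

a[0] = 42         # replace the first element through a

print(a)               # [42, 2]
print(b)               # [42, 2]
print(c)               # [1, 2]
print(b is a, c is a)  # True False
```

The identity check on the last line is the point: `b` and `a` are one object, while `c` is a separate one.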
In a comment you've asked:
Why does this differ from the chosen answer in the link in my post?
The answer you refer to isn't modifying the top-level array (a), the one we're either assigning to b (b = a) or copying (b = copy(a)). It's modifying the contents of an array nested inside it. A shallow copy still shares that nested array with the original, which is why you see the modification.
Here's a conceptual picture of memory after a = [1, 2]:
          +−−−−−−−−−−−−−+
a−−−−−−−−>|   (Array)   |
          +−−−−−−−−−−−−−+
          | Index 1: 1  |
          | Index 2: 2  |
          +−−−−−−−−−−−−−+
Then after b = a:
a−−−−+
     |    +−−−−−−−−−−−−−+
     +−−−>|   (Array)   |
     |    +−−−−−−−−−−−−−+
b−−−−+    | Index 1: 1  |
          | Index 2: 2  |
          +−−−−−−−−−−−−−+
Then after c = copy(a):
a−−−−+
     |    +−−−−−−−−−−−−−+
     +−−−>|   (Array)   |
     |    +−−−−−−−−−−−−−+
b−−−−+    | Index 1: 1  |
          | Index 2: 2  |
          +−−−−−−−−−−−−−+
          +−−−−−−−−−−−−−+
c−−−−−−−−>|   (Array)   |
          +−−−−−−−−−−−−−+
          | Index 1: 1  |
          | Index 2: 2  |
          +−−−−−−−−−−−−−+
Then after a[1] = 42:
a−−−−+
     |    +−−−−−−−−−−−−−+
     +−−−>|   (Array)   |
     |    +−−−−−−−−−−−−−+
b−−−−+    | Index 1: 42 |
          | Index 2: 2  |
          +−−−−−−−−−−−−−+
          +−−−−−−−−−−−−−+
c−−−−−−−−>|   (Array)   |
          +−−−−−−−−−−−−−+
          | Index 1: 1  |
          | Index 2: 2  |
          +−−−−−−−−−−−−−+
In contrast, the answer you refer to was dealing with an array of arrays:
# The `a`, `b`, and `c` from the other answer (without the [4,5,6] array)
          +−−−−−−−−−−−−−−+
a−−−−−−−−>|   (Array)    |
          +−−−−−−−−−−−−−−+
          | Index 1:     |−−−−−+
          | Index 2: ... |     |
          +−−−−−−−−−−−−−−+     |
                               |
                               |     +−−−−−−−−−−−−−−+
                               +−−−−>|   (Array)    |
                               |     +−−−−−−−−−−−−−−+
          +−−−−−−−−−−−−−−+     |     | Index 1: 1   |
b−−−−−−−−>|   (Array)    |     |     | Index 2: 2   |
          +−−−−−−−−−−−−−−+     |     | Index 3: 3   |
          | Index 1:     |−−−−−+     +−−−−−−−−−−−−−−+
          | Index 2: ... |
          +−−−−−−−−−−−−−−+
          +−−−−−−−−−−−−−−+
c−−−−−−−−>|   (Array)    |
          +−−−−−−−−−−−−−−+           +−−−−−−−−−−−−−−+
          | Index 1:     |−−−−−−−−−−>|   (Array)    |
          | Index 2: ... |           +−−−−−−−−−−−−−−+
          +−−−−−−−−−−−−−−+           | Index 1: 1   |
                                     | Index 2: 2   |
                                     | Index 3: 3   |
                                     +−−−−−−−−−−−−−−+
So when they did a[1][1] = 111, it changed the inner array that a and b both (indirectly) point to, but not the separate inner array that c points to:
# The `a`, `b`, and `c` from the other answer (without the [4,5,6] array)
          +−−−−−−−−−−−−−−+
a−−−−−−−−>|   (Array)    |
          +−−−−−−−−−−−−−−+
          | Index 1:     |−−−−−+
          | Index 2: ... |     |
          +−−−−−−−−−−−−−−+     |
                               |
                               |     +−−−−−−−−−−−−−−+
                               +−−−−>|   (Array)    |
                               |     +−−−−−−−−−−−−−−+
          +−−−−−−−−−−−−−−+     |     | Index 1: 111 |
b−−−−−−−−>|   (Array)    |     |     | Index 2: 2   |
          +−−−−−−−−−−−−−−+     |     | Index 3: 3   |
          | Index 1:     |−−−−−+     +−−−−−−−−−−−−−−+
          | Index 2: ... |
          +−−−−−−−−−−−−−−+
          +−−−−−−−−−−−−−−+
c−−−−−−−−>|   (Array)    |
          +−−−−−−−−−−−−−−+           +−−−−−−−−−−−−−−+
          | Index 1:     |−−−−−−−−−−>|   (Array)    |
          | Index 2: ... |           +−−−−−−−−−−−−−−+
          +−−−−−−−−−−−−−−+           | Index 1: 1   |
                                     | Index 2: 2   |
                                     | Index 3: 3   |
                                     +−−−−−−−−−−−−−−+
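The nested case in the diagrams can be reproduced with a small sketch. This is Python rather than Julia, on the assumption that `copy.copy` and `copy.deepcopy` behave like Julia's `copy` and `deepcopy` for nested lists. Here `b` is the shallow copy and `c` the deep copy, as in the picture.

```python
import copy

a = [[1, 2, 3], [4, 5, 6]]
b = copy.copy(a)      # shallow: new outer list, same inner lists
c = copy.deepcopy(a)  # deep: the inner lists are copied as well

a[0][0] = 111         # mutate an inner list: visible through b, not through c
a[1] = [7, 8, 9]      # replace an outer slot: visible only through a

print(a)  # [[111, 2, 3], [7, 8, 9]]
print(b)  # [[111, 2, 3], [4, 5, 6]]
print(c)  # [[1, 2, 3], [4, 5, 6]]
```

So a shallow copy protects you from replacements in the outer structure, but not from mutations of objects that both structures still share.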
I would like to assign groups to larger groups (clusters) so that each cluster can be sent to a core for processing. I have 16 cores. This is what I have so far:
test<-data_extract%>%group_by(group_id)%>%sample_n(16,replace = TRUE)
This takes samples of 16 from each group.
This is an example of what I would like the final product to look like (with two clusters). All I really want is for rows with the same group id to belong to the same cluster, with a set number of clusters.
________________________________
balance | group_id | cluster|
454452 | a | 1 |
5450441 | a | 1 |
5444531 | b | 1 |
5404051 | b | 1 |
5404501 | b | 1 |
5404041 | b | 1 |
544251 | b | 1 |
254252 | b | 1 |
541254 | c | 2 |
54123254 | d | 1 |
542541 | d | 1 |
5442341 | e | 2 |
541 | f | 1 |
________________________________
test<-data%>%group_by(group_id)%>% mutate(group = sample(1:16,1))
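If the clusters should also end up roughly the same size, a random draw per group may not be enough. The following is a minimal sketch in Python (not R) of one alternative, a greedy "largest group first" assignment. The function name `assign_clusters` and the balancing goal are assumptions for illustration and are not part of the question.

```python
from collections import Counter

def assign_clusters(group_ids, n_clusters):
    """Map each distinct group id to one cluster number in 1..n_clusters."""
    sizes = Counter(group_ids)
    loads = [0] * n_clusters  # rows assigned to each cluster so far
    mapping = {}
    # Largest groups first; ties broken by group id so the result is deterministic.
    for gid, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        target = loads.index(min(loads))  # least-loaded cluster so far
        mapping[gid] = target + 1
        loads[target] += size
    return mapping

# group_id column from the example table in the question
rows = ["a", "a", "b", "b", "b", "b", "b", "b", "c", "d", "d", "e", "f"]
mapping = assign_clusters(rows, n_clusters=2)
clusters = [mapping[g] for g in rows]  # one cluster label per row
print(mapping)
print(Counter(clusters))
```

Every row of a group gets the same label because the label is looked up by group id, which is the property the question asks for.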
It seems MonetDB does not support recursive CTEs. This is a useful feature that I have used to get a BOM from ERP systems. For greater flexibility I used Firebird recursive stored procedures to enhance the output with extra calculations. A good example of a SQL Server recursive CTE can be found here: https://www.essentialsql.com/recursive-ctes-explained/
The question is: is there any way I can achieve similar results in MonetDB?
There is currently no support for recursive CTEs in MonetDB[Lite]. The solution you have proposed yourself seems like the way to go.
It is clear that once I have access to procedures, variables, and WHILE loops, something can be done. The following code gives me the desired result using temporary tables. I would appreciate it if anybody could provide an alternative that gives the same results without the overhead of temporary tables.
CREATE TEMPORARY TABLE BOM (parent_id string, comp_id string, qty double) ON COMMIT PRESERVE ROWS;
INSERT INTO BOM VALUES('a','b',5), ('a','c',2), ('b','d',4), ('b','c',7), ('c','e',3);
select * from BOM;
+-----------+---------+--------------------------+
| parent_id | comp_id | qty |
+===========+=========+==========================+
| a | b | 5 |
| a | c | 2 |
| b | d | 4 |
| b | c | 7 |
| c | e | 3 |
+-----------+---------+--------------------------+
CREATE TEMPORARY TABLE EXPLODED_BOM (parent_id string, comp_id string, path string, qty double, level integer) ON COMMIT PRESERVE ROWS;
CREATE OR REPLACE PROCEDURE UPDATE_BOM()
BEGIN
DECLARE prev_count int;
DECLARE crt_count int;
DECLARE crt_level int;
delete from EXPLODED_BOM; -- make sure the table is empty
insert into EXPLODED_BOM select parent_id, comp_id, parent_id||'-'||comp_id, qty, 0 from BOM; --insert first level
SET prev_count = 0;
SET crt_count = (select count(*) from EXPLODED_BOM);
SET crt_level = 0;
-- (crt_level < 100) avoids possible infinite loop, if BOM is malformed
WHILE (crt_level < 100) and (crt_count > prev_count) DO
SET prev_count = crt_count;
insert into EXPLODED_BOM select e.parent_id, a.comp_id, e.path||'-'||a.comp_id, a.qty*e.qty, crt_level+1
from BOM a, EXPLODED_BOM e
where a.parent_id = e.comp_id and e.level=crt_level;
-- Is there any way to get the number of rows affected by an insert, update, or delete statement? That would avoid re-counting the table.
SET crt_count = (select count(*) from EXPLODED_BOM);
SET crt_level = crt_level +1;
END WHILE;
END;
call UPDATE_BOM();
select * from EXPLODED_BOM;
+-----------+---------+---------+--------------------------+-------+
| parent_id | comp_id | path | qty | level |
+===========+=========+=========+==========================+=======+
| a | b | a-b | 5 | 0 |
| a | c | a-c | 2 | 0 |
| b | d | b-d | 4 | 0 |
| b | c | b-c | 7 | 0 |
| c | e | c-e | 3 | 0 |
| a | d | a-b-d | 20 | 1 |
| a | c | a-b-c | 35 | 1 |
| a | e | a-c-e | 6 | 1 |
| b | e | b-c-e | 21 | 1 |
| a | e | a-b-c-e | 105 | 2 |
+-----------+---------+---------+--------------------------+-------+
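If pulling the BOM rows into a client program is acceptable, the same level-by-level expansion can be done without temporary tables. The following is a minimal sketch in Python, not a MonetDB feature. The function name `explode_bom` and its `max_levels` parameter are hypothetical, and the rows are assumed to have already been fetched as tuples.

```python
def explode_bom(bom, max_levels=100):
    """bom: list of (parent_id, comp_id, qty) tuples.

    Returns (parent_id, comp_id, path, qty, level) tuples, expanded
    one level at a time like the WHILE loop in the procedure.
    """
    children = {}
    for parent, comp, qty in bom:
        children.setdefault(parent, []).append((comp, qty))

    frontier = [(p, c, f"{p}-{c}", q, 0) for p, c, q in bom]  # level 0
    exploded = list(frontier)
    level = 0
    # max_levels plays the same role as the (crt_level < 100) guard
    while frontier and level < max_levels:
        next_frontier = []
        for parent, comp, path, qty, _ in frontier:
            for sub, sub_qty in children.get(comp, []):
                next_frontier.append(
                    (parent, sub, f"{path}-{sub}", qty * sub_qty, level + 1)
                )
        exploded.extend(next_frontier)
        frontier = next_frontier
        level += 1
    return exploded

bom = [("a", "b", 5), ("a", "c", 2), ("b", "d", 4), ("b", "c", 7), ("c", "e", 3)]
exploded = explode_bom(bom)
for row in exploded:
    print(row)
```

Only the rows produced in the previous pass are expanded in the next one, which corresponds to the `e.level = crt_level` filter in the procedure, and the loop stops as soon as a pass adds nothing.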
Programmers,
I have some difficulties in structuring my panel data set.
My panel data set, for the moment, has the following structure:
It is shown here as an example with only T = 2 and N = 3. (My real data set, however, has T = 6 and N = 20 000 000.)
Panel data structure 1:
Year | ID | Variable_1 | ... | Variable_k |
1 | 1 | A | ... | B |
1 | 2 | C | ... | D |
1 | 3 | E | ... | F |
2 | 1 | G | ... | H |
2 | 2 | I | ... | J |
2 | 3 | K | ... | L |
The desired structure is:
Panel data structure 2:
Year | ID | Variable_1 | ... | Variable_k |
1 | 1 | A | ... | B |
2 | 1 | G | ... | H |
1 | 2 | C | ... | D |
2 | 2 | I | ... | J |
1 | 3 | E | ... | F |
2 | 3 | K | ... | L |
This is the classic panel data structure, in which the yearly observations over the whole period are arranged block by block, one block per individual.
My question: is there a simple and efficient R solution that changes the data structure from structure 1 to structure 2 for very large data sets (data.frame)?
Thank you very much for all responses in advance!!
Enrico
You can reorder the rows of your dataframe using order():
df=df[order(df$ID,df$Year),]
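The idea behind that line is a sort on a compound key: ID first, then Year within each ID. A minimal sketch of the same idea in Python (not R), using plain tuples instead of a data frame and only one variable column for brevity:

```python
from operator import itemgetter

# (Year, ID, Variable_1) rows in the "structure 1" order from the question
rows = [
    (1, 1, "A"), (1, 2, "C"), (1, 3, "E"),
    (2, 1, "G"), (2, 2, "I"), (2, 3, "K"),
]

# Sort by ID (position 1), then by Year (position 0) within each ID
panel = sorted(rows, key=itemgetter(1, 0))

for row in panel:
    print(row)
```

After sorting, each individual's years sit next to each other, which is the block-by-block layout of structure 2.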
Consider the following sqlite3 table:
+------+------+
| col1 | col2 |
+------+------+
| 1 | 200 |
| 1 | 200 |
| 1 | 100 |
| 1 | 200 |
| 2 | 400 |
| 2 | 200 |
| 2 | 100 |
| 3 | 200 |
| 3 | 200 |
| 3 | 100 |
+------+------+
I'm trying to write a query that will select the entire table and return 1 if the value in col2 is 200, and 0 otherwise. For example:
+------+--------------------+
| col1 | SOMEFUNCTION(col2) |
+------+--------------------+
| 1 | 1 |
| 1 | 1 |
| 1 | 0 |
| 1 | 1 |
| 2 | 0 |
| 2 | 1 |
| 2 | 0 |
| 3 | 1 |
| 3 | 1 |
| 3 | 0 |
+------+--------------------+
What is SOMEFUNCTION()?
Thanks in advance...
In SQLite, boolean values are just integer values 0 and 1, so you can use the comparison directly:
SELECT col1, col2 = 200 AS SomeFunction FROM MyTable
As described in "Does sqlite support any kind of IF(condition) statement in a select", you can use the CASE keyword:
SELECT col1,CASE WHEN col2=200 THEN 1 ELSE 0 END AS col2 FROM table1
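Both suggestions can be checked against the sample data. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The table name `t` is an assumption, since the question does not name its table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER)")
data = [(1, 200), (1, 200), (1, 100), (1, 200), (2, 400),
        (2, 200), (2, 100), (3, 200), (3, 200), (3, 100)]
conn.executemany("INSERT INTO t VALUES (?, ?)", data)

# Variant 1: use the comparison result directly (0 or 1 in SQLite)
by_comparison = conn.execute(
    "SELECT col1, col2 = 200 FROM t ORDER BY rowid"
).fetchall()

# Variant 2: spell the mapping out with CASE
by_case = conn.execute(
    "SELECT col1, CASE WHEN col2 = 200 THEN 1 ELSE 0 END FROM t ORDER BY rowid"
).fetchall()
conn.close()

print([flag for _, flag in by_comparison])  # [1, 1, 0, 1, 0, 1, 0, 1, 1, 0]
print(by_comparison == by_case)             # True
```

The two variants differ only when col2 is NULL: the bare comparison then yields NULL, while the CASE expression falls through to its ELSE branch and yields 0.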
I have 2 tables:
table 1 (let it be 'products'):
----------------
| id | product |
----------------
| 1 | Apple |
| 2 | Grape |
| 3 | Orange |
table 2 (let it be 'tags'):
------------------------------
| id | product_id | tag |
------------------------------
| 1 | 1 | tag1 |
| 2 | 1 | tag2 |
| 3 | 2 | tag2 |
| 4 | 2 | tag3 |
| 5 | 3 | tag4 |
I want to run a query against an SQLite database that produces a table like this:
---------------------------
| product | tags |
---------------------------
| Apple | tag1, tag2 |
| Grape | tag2, tag3 |
| Orange | tag4 |
How can I achieve this? How can I combine tags into one column using only SQLite query language?
I think what you're looking for is "group by" and "group_concat()":
select products.product, group_concat(tags.tag, ', ') as tags from products join tags on tags.product_id = products.id group by products.id;
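The group_concat approach can be tried end to end with a minimal sketch using Python's built-in `sqlite3` module. The column types and the `INTEGER PRIMARY KEY` declarations are assumptions, since the question shows only the table contents. The second argument to `group_concat` sets the separator to match the ", " in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, product TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, product_id INTEGER, tag TEXT);
    INSERT INTO products VALUES (1, 'Apple'), (2, 'Grape'), (3, 'Orange');
    INSERT INTO tags VALUES (1, 1, 'tag1'), (2, 1, 'tag2'), (3, 2, 'tag2'),
                            (4, 2, 'tag3'), (5, 3, 'tag4');
""")

rows = conn.execute("""
    SELECT products.product, group_concat(tags.tag, ', ') AS tags
    FROM products
    JOIN tags ON tags.product_id = products.id
    GROUP BY products.id
    ORDER BY products.id
""").fetchall()
conn.close()

for product, tags in rows:
    print(product, "|", tags)
```

SQLite does not guarantee the order of the tags inside each concatenated string, so code that depends on that order should not rely on this query alone. Because of the inner join, a product with no tags would not appear in the result at all.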