For homework, I am trying to figure out how to update a column using a JOIN, but I can't seem to get it right.
The first part is this:
The 'company' you work for has decided to hire for a new bilingual support position. Your job is to locate all users who have purchased a Spanish-language track so they can be assigned to a new support representative. Sales have begun to identify all the albums that are classified as Spanish language; so far they have found AlbumIds 8, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 41, 42, 45, 47, 52, 53.
I solved this with this query:
SELECT * FROM customers
JOIN invoices USING (CustomerId)
JOIN invoice_items USING (InvoiceId)
JOIN tracks USING (TrackId)
WHERE tracks.AlbumId IN (8, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 41, 42, 45, 47, 52, 53)
GROUP BY invoices.InvoiceId;
Now the second part is this:
Help out the sales team by modifying your query from Part 1. Instead of just listing all the customers, it should update the customer's assigned support representative. The new support representative's id is 6.
I tried running this:
UPDATE customers
SET SupportRepId = 6
WHERE(SELECT * FROM customers
JOIN invoices USING (CustomerId)
JOIN invoice_items USING (InvoiceId)
JOIN tracks USING (TrackId)
WHERE tracks.AlbumId IN (8, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 41, 42, 45, 47, 52, 53)
GROUP BY invoices.InvoiceId);
But I am getting an error that says:
SQLITE_ERROR: sub-select returns 33 columns - expected 1
errno: 1
code: SQLITE_ERROR
name: Error
I got this working by using the following command:
UPDATE customers
SET SupportRepId = 6
WHERE CustomerId IN (
SELECT customers.CustomerId FROM customers
JOIN invoices USING (CustomerId)
JOIN invoice_items USING (InvoiceId)
JOIN tracks USING (TrackId)
WHERE AlbumId IN (8, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 41, 42, 45, 47, 52, 53)
);
The last error is the clue. A WHERE clause evaluates a single value per row: 0 is false, non-zero is true. SELECT * is returning 33 values (one per column), which is why the sub-select is rejected.
To update multiple rows you need a subquery that yields a result for each row processed by the UPDATE, e.g. WHERE ... IN (...) or a correlated WHERE EXISTS (...).
Using WHERE EXISTS, the query could be:
UPDATE customers
SET SupportRepId = 6
WHERE EXISTS (
SELECT 1 FROM customers AS B
JOIN invoices USING (CustomerId)
JOIN invoice_items USING (InvoiceId)
JOIN tracks USING (TrackId)
WHERE tracks.AlbumId IN (8, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 41, 42, 45, 47, 52, 53)
AND B.CustomerId = customers.CustomerId
);
Thus, for each row processed by the UPDATE, the subquery returns 1 if the customer has any invoice containing a track from one of the listed albums AND the CustomerId from the correlated subquery (B.CustomerId) is the same as the CustomerId of the row being updated.
Using WHERE IN, the query could be:
UPDATE customers
SET SupportRepId = 6
WHERE CustomerId IN (
SELECT CustomerId FROM customers
JOIN invoices USING (CustomerId)
JOIN invoice_items USING (InvoiceId)
JOIN tracks USING (TrackId)
WHERE tracks.AlbumId IN (8, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 41, 42, 45, 47, 52, 53)
);
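As an aside, the customers table is not strictly needed inside the subquery, since invoices already carries CustomerId. If you want to preview which customers the UPDATE will touch before running it, a check along these lines should work (a sketch, assuming the same Chinook-style schema as above):
SELECT DISTINCT CustomerId
FROM invoices
JOIN invoice_items USING (InvoiceId)
JOIN tracks USING (TrackId)
WHERE AlbumId IN (8, 21, 22, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 41, 42, 45, 47, 52, 53);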
I am going to play the Brazilian lottery with my friends. I asked each of them to choose seven numbers, and I created a variable for each of them.
pestana = c(04, 15, 29, 36, 54, 25, 07)
carol = c(7, 22, 30, 35, 44, 51, 57)
davi = c(8, 13, 21, 29, 37, 42, 55)
valerio = c(30, 20, 33, 14, 7, 41, 54)
victor = c(09, 11, 26, 33, 38, 52, 57)
Then I created a list with all of the numbers, and a list with only the unique numbers (to avoid repeated numbers):
list = c(carol, davi, pestana, valerio, victor, diuli, cynara)
list2 = unique(list)
Finally, I called sample() on list2:
sample(list2, 7)
After that, I was wondering: is it possible to skip unique() and still not draw repeated numbers? With unique(), every number has the same probability of appearing, when in fact the repeated ones should be more likely (for instance, seven was chosen three times).
How about this:
pestana = c(04, 15, 29, 36, 54, 25, 07)
carol = c(7, 22, 30, 35, 44, 51, 57)
davi = c(8, 13, 21, 29, 37, 42, 55)
valerio = c(30, 20, 33, 14, 7, 41, 54)
victor = c(09, 11, 26, 33, 38, 52, 57)
list = c(carol, davi, pestana, valerio, victor)
l <- c(unlist(list))
nums <- table(l)
probs <- nums/sum(nums)
sample(names(probs), 7, prob = probs, replace=FALSE)
#> [1] "4" "33" "44" "11" "29" "52" "8"
Created on 2022-12-14 by the reprex package (v2.0.1)
Using the prob argument, you can make some values more likely to show up than others.
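Note that sample() here returns the names of the table, which are character strings. If you want the draw back as numbers, converting the result is enough (a small sketch continuing from the code above):
drawn <- sample(names(probs), 7, prob = probs, replace = FALSE)
as.numeric(drawn)  # the drawn numbers as a numeric vector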
I created a graph G and its node view looks like <0, 1, 2, ..., 100>.
I randomly removed 20 nodes, so the node view of the new graph is missing those nodes. To be precise, the new graph looks like:
node view <0, 1, 3, 5, 6, 7, 9, ..., 100>
However, I want the new graph to have a node view like the following:
<0, 1, 2, ..., 80>
Is there any solution? I tried relabeling and copying the graph, but they didn't work.
P.S. My nodes have an attribute label equal to either 0 or 1, and I want to preserve it.
Here is one approach you can take. After removing your nodes from the graph you can relabel the remaining nodes using nx.relabel_nodes to get the node view you want. See example below:
import networkx as nx
import numpy as np
#Creating random graph
N_nodes=50
G=nx.erdos_renyi_graph(N_nodes,p=0.25)
#Removing random nodes
N_del_nodes=10
del_node_list=np.random.choice(N_nodes,size=N_del_nodes,replace=False)
G.remove_nodes_from(del_node_list)
print('Node view without relabelling:' +str(G.nodes))
#Relabelling graph
label_mapping={list(G.nodes)[j]:j for j in range(N_nodes-N_del_nodes)}
G_rel=nx.relabel_nodes(G, label_mapping)
print('Node view with relabelling:' +str(G_rel.nodes))
And the output gives:
Node view without relabelling:[0, 1, 2, 5, 6, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 30, 31, 32, 33, 34, 36, 37, 38, 40, 41, 44, 45, 46, 47, 48, 49]
Node view with relabelling:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
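Regarding the label attribute mentioned in the question: nx.relabel_nodes carries node attributes over to the renamed nodes, so nothing extra is needed to preserve them. A small self-contained check (the 0/1 labels here are just illustrative):
import networkx as nx
import numpy as np
G = nx.erdos_renyi_graph(20, p=0.25)
#Give every node a 0/1 'label' attribute
nx.set_node_attributes(G, {n: int(np.random.randint(0, 2)) for n in G.nodes}, name='label')
G.remove_nodes_from([3, 7, 11])
#Relabel to consecutive integers; attributes travel with the nodes
mapping = {old: new for new, old in enumerate(G.nodes)}
G_rel = nx.relabel_nodes(G, mapping)
print(nx.get_node_attributes(G_rel, 'label'))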
I want to perform an operation over a vector without using a loop. For each element, the operation multiplies its deviation from the mean by the next element's deviation from the mean.
This is how I am coding it in R:
meanx <- mean(rankx)
Numerador <- (rankx[] - meanx)*(rankx[+1] - meanx)
This is the input:
> dput(rankx)
c(15, 11, 12, 30, 58, 14, 41, 10, 57, 32, 28, 52, 61, 18, 54,
37, 19, 7, 29, 66, 5, 47, 25, 6, 50, 65, 62, 23, 40, 63, 42,
64, 38, 56, 45, 17, 8, 59, 55, 67, 24, 60, 2, 35, 44, 20, 3,
39, 4, 31, 26, 51, 21, 22, 53, 33, 46, 9, 16, 36, 13, 27, 34,
48, 1, 49, 43)
For example, for the first case it will be: (15 - mean(rankx)) * (11 - mean(rankx))
For the next: (11 - mean(rankx)) * (12 - mean(rankx))
I am not sure how to refer to the next element; my error is in rankx[+1].
Any idea how to do this operation without using a loop?
You can use dplyr::lead
rankx[+1] is equivalent to rankx[1], which is 15.
If you want a copy of rankx shifted by one position, use dplyr::lead(rankx), like this:
rankx <- c(15, 11, 12, 30, 58, 14, 41)
dplyr::lead(rankx)
#> [1] 11 12 30 58 14 41 NA
meanx <- mean(rankx)
Numerador <- (rankx - meanx)*(dplyr::lead(rankx) - meanx)
Numerador
#> [1] 161.30612 205.87755 -57.40816 133.16327 -381.12245 -179.55102 NA
Created on 2021-04-20 by the reprex package (v1.0.0)
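If you prefer to stay in base R, the same pairing of each element with the next one can be done with head() and tail(); this drops the trailing NA and gives a result one element shorter, but is otherwise equivalent (a sketch):
meanx <- mean(rankx)
# pair each element with the next: drop the last element on one side
# and the first element on the other
Numerador <- (head(rankx, -1) - meanx) * (tail(rankx, -1) - meanx)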
My friend suggested that I try to solve this problem before an interview, but I have no idea how to approach it.
I need to write a code to shuffle a deck of 52 cards without using a built-in standard random function.
Update
Thanks to Yifei Wu; his answer was very helpful.
Here is a link to my GitHub project where I implemented the given algorithm:
https://github.com/Dantsj16/Shuffle-Without-Random.git
Your question does not say it must be a random shuffle of 52 cards. There is such a thing as a perfect shuffle, where a riffle shuffle is done with the top card remaining on the top and every other card comes from the other half of the deck. Many magicians and card sharks can do this shuffle as desired. It is well known that eight perfect shuffles in a row of a standard 52-card deck returns the cards to their original order, if the top card remains on top for each shuffle.
Here are 8 perfect shuffles in Python. Note that this shuffle is done differently from how an actual manual shuffle would be done, to simplify the code.
In [1]: d0=[x for x in range(1,53)] # the card deck
In [2]: print(d0)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]
In [3]: d1=d0[::2]+d0[1::2] # a perfect shuffle
In [4]: print(d1)
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52]
In [5]: d2=d1[::2]+d1[1::2]
In [6]: d3=d2[::2]+d2[1::2]
In [7]: d4=d3[::2]+d3[1::2]
In [8]: d5=d4[::2]+d4[1::2]
In [9]: d6=d5[::2]+d5[1::2]
In [10]: d7=d6[::2]+d6[1::2]
In [11]: d8=d7[::2]+d7[1::2]
In [12]: print(d8)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]
In [13]: print(d0 == d8)
True
If you want the perfect shuffle as done by hand, use
d1=[None]*52
d1[::2]=d0[:26]
d1[1::2]=d0[26:]
This gives, for d1,
[1, 27, 2, 28, 3, 29, 4, 30, 5, 31, 6, 32, 7, 33, 8, 34, 9, 35, 10, 36, 11, 37, 12, 38, 13, 39, 14, 40, 15, 41, 16, 42, 17, 43, 18, 44, 19, 45, 20, 46, 21, 47, 22, 48, 23, 49, 24, 50, 25, 51, 26, 52]
Let me know if you really need a random shuffle. I can adapt my Borland Delphi code into Python if you need it.
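For reference, the same idea wrapped in a function that repeats the hand-style out-shuffle until the deck returns to its original order (a sketch mirroring the slicing above):
def perfect_shuffle(deck):
    # out-shuffle: interleave the two halves, keeping the top card on top
    half = len(deck) // 2
    result = [None] * len(deck)
    result[::2] = deck[:half]
    result[1::2] = deck[half:]
    return result

deck = list(range(1, 53))
shuffled = perfect_shuffle(deck)
count = 1
while shuffled != deck:
    shuffled = perfect_shuffle(shuffled)
    count += 1
print(count)  # 8 for a 52-card deck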
I have the following data:
HDL=c(26.41779, 45.99568, 70.74717, NA, 22.35012, 60.46269, 48.03072, 23.87873 ,54.92130 ,64.95151, 57.94742, 44.75888 ,32.57670,
39.26278, 53.05259, 38.76336, 74.73936, 63.62279, 99.27376, 35.42466, 35.48192, 43.56407, 45.23391, 58.27397, 15.15950, 46.20695,
35.95102, 55.05239, 39.73222, 53.68377, 35.21194, 95.44346, 38.53242, 62.78161, 56.30534, 67.35458, 22.75741, 51.94800, 66.61517,
35.35236)
LDL=c(183.83648, 169.59815 ,106.58631 ,137.96398, 164.25937, 94.39745, 189.70669, 167.62298 ,176.85359, 127.54434, 115.63603, 140.43276,
165.68687, 150.71473, 131.29033, 150.66534, 137.26902, 156.01673, 118.18147, NA, 161.35154, 157.89021, 138.93356, 139.51652,
206.24948, 168.27322, 176.91744, 92.03747, 144.61200, 127.93379, 142.59781, 88.22650, 157.32140, 149.79619, 121.23857, 141.68063,
173.50586, 133.91838, 123.99608, 138.68897)
BMI=c(35, 33 ,25, 27 ,31, 21, 32, 34, 33, 29, 23, 27, 33, 26, 26, 32, 25, 30, 22, 36, 33, 30, 27, 29, 36, 35, 35, 20, 29, 27, 29, 20, 32, 30, 22, 29, 33, 27, 22, 28)
newdf=data.frame(HDL, LDL, BMI)
newdf$BMI_group[newdf$BMI<25]="lean"
newdf$BMI_group[newdf$BMI>=25 & newdf$BMI<30]="overweight"
newdf$BMI_group[newdf$BMI>=30]="obese"
The data is completely fictitious. When I use the following code, facet_grid and facet_wrap do not split the plot correctly with respect to the grouping variable.
ggplot(data = plotdf, aes(x = plotdf$HDL, y = plotdf$LDL)) +
geom_point() +
facet_wrap(~BMI_group)
In detail, the plot is split, but, for example, there are no lean patients with an LDL over 150 in the data frame, yet the plot shows exactly that. Maybe I have an error in my reasoning.
I have no solution or explanation for this behavior. Hope you can help me.
Edit: I added the data as suggested by @Roland.
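A likely culprit is referring to the columns with plotdf$ inside aes(): if plotdf is not exactly the same object (same rows, same order) as the data frame being faceted, the x and y values can end up paired with the wrong BMI_group. Using bare column names against the data frame built above avoids that (a sketch, assuming newdf is the frame you intend to plot):
library(ggplot2)
ggplot(data = newdf, aes(x = HDL, y = LDL)) +
  geom_point() +
  facet_wrap(~BMI_group)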