R Programming - Update mysql db rows where a condition is met - r

I have a data frame that is structured identically to a table in my MySQL database. I want to update the rows of that table where its primary key matches the primary key of my data frame.
For example
DF 1
PK Count Temperature
3 1 111
4 2 100
5 3 190
6 4 200
MySQL Table
PK Count Temperature
1 1 100
2 10 11
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
Notice that I can't simply overwrite the table because I have rows in my DB that do not exist in my data frame.
After the update, what I would like to have is the following table.
PK Count Temperature
1 1 100
2 10 11
3 1 111
4 2 100
5 3 190
6 4 200
7 0 0
8 0 0
Thoughts?

So, I haven't been able to update a row directly. What I have done instead is create a holding table in my DB that I can append to from R. I then created a trigger in my DB that updates the matching rows in the target table. From there, I created another trigger to empty the holding table.
This is sort of what Dean was suggesting, but a little different.

Here is an alternative to writing the data frame to a temp table and updating the main table from it, and to appending to a holding table and updating the target table with a trigger.
I believe the following approach is simple and effective because it updates the record directly in the target table.
# install.packages("RMySQL")
# install.packages("DBI")
library(DBI)
library(RMySQL)
# Establish the connection
mydb <- dbConnect(MySQL(),
                  user = 'your user',
                  password = 'your password',
                  dbname = 'your DB name',
                  host = 'Host Name')
# Check that the connection works by listing the tables
dbListTables(mydb)
# Apply the update statement directly, then release the result set
rs <- dbSendQuery(mydb, "UPDATE DB_NAME.TABLE_1
                         SET FIELD_1 = 0
                         WHERE ID = 5")
dbClearResult(rs)
# Verify the result
rs <- dbSendQuery(mydb, "SELECT * FROM DB_NAME.TABLE_1
                         WHERE ID = 5")
data <- fetch(rs, n = -1)
dbClearResult(rs)
print(data)
I have tried the above code in RStudio version 1.1.453 and R 3.5.0 (64-bit).
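To cover every row of the data frame from the question rather than one hard-coded ID, one option is to generate an UPDATE statement per row. The sketch below uses base R only and just builds the SQL strings. The function name `build_updates` and the table name `my_table` are hypothetical, and it assumes every column is numeric (string values would need proper quoting, for example with DBI's dbQuoteString).

```r
# Data frame from the question.
df <- data.frame(PK = c(3, 4, 5, 6),
                 Count = c(1, 2, 3, 4),
                 Temperature = c(111, 100, 190, 200))

# Build one UPDATE statement per row of `df`.
# Assumption: all columns are numeric, so no quoting is needed.
build_updates <- function(df, table, key = "PK") {
  value_cols <- setdiff(names(df), key)
  vapply(seq_len(nrow(df)), function(i) {
    values <- unlist(df[i, value_cols, drop = FALSE])
    assignments <- paste0(value_cols, " = ", values, collapse = ", ")
    sprintf("UPDATE %s SET %s WHERE %s = %s",
            table, assignments, key, df[[key]][i])
  }, character(1))
}

# `my_table` is a placeholder for the real table name.
statements <- build_updates(df, "my_table")
print(statements)

# With an open connection `mydb`, each statement could then be run with:
# for (stmt in statements) DBI::dbExecute(mydb, stmt)
```

Rows of the table whose primary key does not appear in the data frame are never mentioned in a statement, so they stay untouched.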

Related

Ordered Hash Keys in dynamodb

My DynamoDB table has a hash key, a range key, and other data columns that we can insert.
My understanding is that when items are inserted into a GSI or base table, they are sorted in ascending order by range key, but the hash keys are not ordered.
Example :
hashId - rangeKey
1 - 1
1 - 2
1 - 3
3 - 1
3 - 2
3 - 3
2 - 1
2 - 2
2 - 3
Is there any way to have ordered hash keys as well in DynamoDB?
Like this, when the data is saved in any random order:
hashId - rangeId
1 - 1
1 - 2
1 - 3
2 - 1
2 - 2
2 - 3
3 - 1
3 - 2
3 - 3
I don't think this is possible. DynamoDB hashes the partition (hash) key and stores each item in the partition that the hash maps to, so there is no ordering across hash keys. Within a single partition key, however, the items are kept sorted by the range key.

Working with a matrix in R by rows and columns and deleting rows iteratively

I want to compute how long each key of the keyboard is held down while a message is being written.
To get the data I use a program that produces a .csv file after a text has been written. This .csv has three columns: the first holds the key that was pressed or released, the second says whether the key was pressed (0) or released (1), and the last records the time of each event.
The idea is then to compute the time interval for each key, from the moment it is pressed until it is released.
In the following very simple example, key 16777248 is pressed at time 5.0067901611328125e-06 and released at time 0.21875882148742676, so the time interval for this key is 0.21875882148742676-5.0067901611328125e-06. The time interval for key 72 should be 0.1861410140991211-0.08675289154052734.
16777248 0 5.0067901611328125e-06
72 0 0.08675289154052734
72 1 0.1861410140991211
16777248 1 0.21875882148742676
At the moment I have written code in R that first reads the table from the .csv. Then it searches for the first 1 in the second column and takes the corresponding key. Next, it searches the preceding rows for the same key with a 0, computes the time interval, saves this value in a vector, and deletes these two rows from the matrix. It should then repeat this until there are no more rows.
data.csv <- read.table("example.csv", header=F, sep=",", dec=".")
myTable <- data.csv
keySearched = 0
timeInterval = c( rep( 0, length(myTable[,1]) ) )
L = (length(myTable[,1]))
for( i in 1:L ){
  if( myTable[i,2]==1 ){
    keySearched <- myTable[i,1]
    for( j in 1:(i-1) ){
      if( myTable[j,1]==keySearched ){
        timeInterval[j] <- (myTable[i,3]-myTable[j,3])
        myTable <- myTable[ -c(j,i), ]
      }
    }
  }
}
The problem is that sometimes the value myTable[x,y] is NA because the corresponding row has been deleted. In each iteration two rows are deleted (the one with the pressed key and the corresponding released key).
At this point I get the following error:
Error in if (myTable[j, 1] == keySearched) { :
missing value where TRUE/FALSE needed
How could I solve this problem?
You could try doing it like this:
key = c(3,6,3,8,8,3,6,3)
pressed = c(0,0,1,0,1,0,1,1)
time = c(12,14,16,17,19,22,34,35)
a = data.frame(key,time,pressed)
> a
  key time pressed
1   3   12       0
2   6   14       0
3   3   16       1
4   8   17       0
5   8   19       1
6   3   22       0
7   6   34       1
8   3   35       1
First order your data frame (or matrix if you prefer) by the key number and then the time. This should group pressed and released keys together. Then you calculate the time difference between the same keys using diff. And finally, set to NA those diffs that don't make sense.
a = a[order(a$key,a$time),]
a$lapse = c(0,diff(a$time))
a$lapse[seq(1,nrow(a),2)] = NA
> a
  key time pressed lapse
1   3   12       0    NA
3   3   16       1     4
6   3   22       0    NA
8   3   35       1    13
2   6   14       0    NA
7   6   34       1    20
4   8   17       0    NA
5   8   19       1     2
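As a rough sketch of the same ordering idea applied to the four rows from the question, the presses and releases can be split after sorting and subtracted pairwise, which avoids deleting rows inside a loop. The column names `key`, `state` and `time` are made up for illustration, and the code assumes every press has exactly one matching release and that a key is not pressed again before it is released.

```r
# The four sample rows from the question; column names are assumptions.
events <- data.frame(
  key   = c(16777248, 72, 72, 16777248),
  state = c(0, 0, 1, 1),   # 0 = pressed, 1 = released
  time  = c(5.0067901611328125e-06, 0.08675289154052734,
            0.1861410140991211, 0.21875882148742676)
)

# Sort by key and then by time so each press sits next to its release.
events <- events[order(events$key, events$time), ]

# Split into presses and releases; after sorting they line up row by row.
press   <- events[events$state == 0, ]
release <- events[events$state == 1, ]

# Guard against unmatched events before subtracting.
stopifnot(nrow(press) == nrow(release), all(press$key == release$key))

intervals <- data.frame(key = press$key,
                        interval = release$time - press$time)
print(intervals)
```

If a key can be pressed several times, the sort by time within each key still pairs the n-th press with the n-th release, as long as presses and releases strictly alternate.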

sqlite update a column with itself

I got a table like this
a b c
-- -- --
1 1 10
2 1 0
3 1 0
4 4 20
5 4 0
6 4 0
The b column 'points' to 'a', a bit as if a were the parent.
c was computed for the parents. Now I need to propagate each parent's c value to its children.
The result would be
a b c
-- -- --
1 1 10
2 1 10
3 1 10
4 4 20
5 4 20
6 4 20
I can't come up with an UPDATE/SELECT combo that works.
So far I have a SELECT that produces the c column I'd like to get:
select t1.c from t t1 join t t2 on t1.a=t2.b;
c
----------
10
10
10
20
20
20
But I dunno how to stuff that into c
Thanx in advance
Cheers, phi
You have to look up the value with a correlated subquery:
UPDATE t
SET c = (SELECT c
         FROM t AS parent
         WHERE parent.a = t.b)
WHERE c = 0;
I finally found a way to copy my initial 'temp' SELECT JOIN back into table 't'. Something like this:
create temp table u as select t1.c from t t1 join t t2 on t1.a=t2.b;
update t set c=(select * from u where rowid=t.rowid);
I'd like to know how the two solutions compare performance-wise: yours, with a single UPDATE and a correlated SELECT, and mine, with two queries, one of which contains a correlated subquery. Mine seems heavier and less elegant, but I wonder about the actual performance.
On the algorithm side, yours takes care not to copy the parent data and only updates the children. Mine copies each parent's value onto itself, which is a no-op but still consumes some cycles :)
Cheers, Phi

New calculation loop

I want to have a loop that will perform a calculation for me, and export the variable (along with identifying information) into a new data frame.
My data look like this:
Each unique sampling point (UNIQUE) has 4 data points associated with it (they differ by WAVE).
WAVE REFLECT REFEREN PLOT LOCAT COMCOMP DATE UNIQUE
1 679.9 119 0 1 1 1 11.16.12 1
2 799.9 119 0 1 1 1 11.16.12 1
3 899.8 117 0 1 1 1 11.16.12 1
4 970.3 113 0 1 1 1 11.16.12 1
5 679.9 914 31504 1 2 1 11.16.12 2
6 799.9 1693 25194 1 2 1 11.16.12 2
And I want to create a new data frame that will look like this:
For each unique sampling point, I want to calculate "WBI" from 2 specific "WAVE" measurements.
WBI PLOT .... UNIQUE
(WAVE==899.8/WAVE==970) 1 1
(WAVE==899.8/WAVE==970) 1 2
(WAVE==899.8/WAVE==970) 1 3
Depending on the size of your input data.frame there may be a more efficient solution, but the following should work fine for small or medium data sets and is fairly simple:
out.unique = unique(input$UNIQUE)
out.plot = sapply(out.unique, simplify = TRUE, function(uq) {
  # assuming that plot is simply the first PLOT of those belonging to that
  # unique number. If not, you should change this.
  subset(input, subset = UNIQUE == uq)$PLOT[1]
})
out.wbi = sapply(out.unique, simplify = TRUE, function(uq) {
  # not sure how you compose WBI, but I assume it uses the last two
  # records with that unique number, so it matches the first output of your example
  uq.subset = subset(input, subset = UNIQUE == uq)
  uq.nrow = nrow(uq.subset)
  paste("(WAVE=", uq.subset$WAVE[uq.nrow-1], "/WAVE=", uq.subset$WAVE[uq.nrow], ")", sep="")
})
output = data.frame(WBI = out.wbi, PLOT = out.plot, UNIQUE = out.unique)
If the input data is big, however, you may want to exploit the fact that the records seem to be sorted by "UNIQUE", because repeated data.frame subsetting is costly. Both sapply calls could also be combined into one, but that would make the code a bit more cumbersome, so I have left it like this.
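If WBI is meant to be a numeric ratio rather than a label, a minimal sketch could look like the following. It assumes (this is a guess, not something the question states) that WBI is the REFLECT value at WAVE 899.8 divided by the REFLECT value at WAVE 970.3, and it reads the first unlabeled column of the sample as a row index. The helper `wbi_for` is a hypothetical name, and only the columns needed here are reproduced from the six sample rows.

```r
# Six sample rows from the question (only the columns used below).
input <- data.frame(
  WAVE    = c(679.9, 799.9, 899.8, 970.3, 679.9, 799.9),
  REFLECT = c(119, 119, 117, 113, 914, 1693),
  PLOT    = c(1, 1, 1, 1, 1, 1),
  UNIQUE  = c(1, 1, 1, 1, 2, 2)
)

# Assumed definition: REFLECT at WAVE 899.8 divided by REFLECT at WAVE 970.3.
# Returns NA when either wavelength is missing for a sampling point.
# Note: exact comparison of decimals works here because the values are
# literals; measured wavelengths may need a tolerance instead.
wbi_for <- function(d) {
  num <- d$REFLECT[d$WAVE == 899.8]
  den <- d$REFLECT[d$WAVE == 970.3]
  if (length(num) != 1 || length(den) != 1) return(NA_real_)
  num / den
}

# Split once by sampling point instead of subsetting repeatedly.
groups <- split(input, input$UNIQUE)
output <- data.frame(
  WBI    = unname(vapply(groups, wbi_for, numeric(1))),
  PLOT   = unname(vapply(groups, function(d) d$PLOT[1], numeric(1))),
  UNIQUE = as.numeric(names(groups))
)
print(output)
```

Sampling point 2 has no 899.8 or 970.3 rows in the excerpt, so its WBI comes out as NA here. Using split() once also avoids the repeated subsetting mentioned above.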

Ref cursor with dynamic columns

I am using Oracle 11g and have written a stored procedure that stores values in a temporary table as follows:
id count hour age range
-------------------------------------
0 5 10 61 10-200
1 6 20 61 10-200
2 7 15 61 10-200
5 9 5 61 201-300
7 10 25 61 201-300
0 5 10 62 10-20
1 6 20 62 10-20
2 7 15 62 10-20
5 9 5 62 21-30
1 8 6 62 21-30
7 10 25 62 21-30
10 15 30 62 31-40
Using this temp table, I now want to return two cursors: one for age 61 and one for age 62.
In each cursor, the distinct range values should become the columns. For example, the cursor for age 62 should return the following dataset:
user   10-20         21-30         31-40
       count  hour   count  hour   count  hour
-----------------------------------------------
0      5      10     -      -      -      -
1      6      20     8      6      -      -
2      7      15     -      -      -      -
5      -      -      9      5      -      -
7      -      -      10     25     -      -
10     -      -      -      -      15     30
The values in the range column of the temp table are not fixed; they are taken from another table.
Edited: I am using PIVOT for the above problem, but all the examples I found on the internet use fixed column values (range, in my case). How can I use dynamic values? The following is an example query:
SELECT *
FROM (SELECT column_2, column_1
      FROM test_table)
PIVOT (SUM(column_1) AS sum_values FOR (column_2) IN ('value1' AS a, 'value2' AS b, 'value3' AS c));
Instead of handwritten values, I am using the following query inside the IN clause:
SELECT * from(
  with x as (
    SELECT DISTINCT range
    FROM test_table
    WHERE age = 62 )
  select ltrim( max( sys_connect_by_path(range, ','))
                keep (dense_rank last order by curr),
                ',') range
  from (select range,
               row_number() over (order by range) as curr,
               row_number() over (order by range) -1 as prev
        from x)
  connect by prev = PRIOR curr
  start with curr = 1 )
This gives an error. But when I use handwritten values, it gives the right output:
select * from (select user_id, nvl(count,0) count, nvl(hour,0) hour, nvl(range,0) range, nvl(age,0) age
               from test_table)
PIVOT (SUM(count) as sum_count, sum(hour) as sum_hour for (range) IN
  (
    '10-20','21-30','31-40'
  )
) where age = 62 order by userid
How can I supply those values dynamically?
Cursors are slow, so I would recommend trying to do this in a query unless there's no alternative (or speed doesn't matter). You may want to look into PIVOT / UNPIVOT, which can rotate values into columns (in this case "range").
Here's some PIVOT / UNPIVOT documentation and examples:
http://www.oracle-developer.net/display.php?id=506
Based on your last edit:
Pretty sure you have two options:
1. Build dynamic SQL based on the distinct values found in the "range" column. You'll probably be stuck using a cursor again to build the column names, but at least it will be limited to just the distinct ranges.
2. Oracle has a PIVOT XML command that you can use for this. See http://www.oracle.com/technetwork/articles/sql/11g-pivot-097235.html and scroll down to the section "XML Type".
