I have the following table
CREATE TABLE "shots" (
"player" INTEGER,
"tournament" TEXT,
"year" INTEGER,
"course" INTEGER,
"round" INTEGER,
"hole" INTEGER,
"shot" INTEGER,
"text" TEXT,
"distance" REAL,
"x" TEXT,
"y" TEXT,
"z" TEXT
);
With a sample of the data:
28237 470 2015 717 1 1 1 Shot 1 302 yds to left fairway, 257 yds to hole 10874 11451.596 10623.774 78.251
28237 470 2015 717 1 1 2 Shot 2 234 yds to right fairway, 71 ft to hole 8437 12150.454 10700.381 86.035
28237 470 2015 717 1 1 3 Shot 3 70 ft to green, 4 ft to hole 838 12215.728 10725.134 88.408
28237 470 2015 717 1 1 4 Shot 4 in the hole 46 12215.1 10729.1 88.371
28237 470 2015 717 1 2 1 Shot 1 199 yds to green, 29 ft to hole 7162 12776.03 10398.086 91.017
28237 470 2015 717 1 2 2 Shot 2 putt 26 ft 7 in., 2 ft 4 in. to hole 319 12749.444 10398.854 90.998
28237 470 2015 717 1 2 3 Shot 3 in the hole 28 12747.3 10397.6 91.027
28237 470 2015 717 1 3 1 Shot 1 296 yds to left intermediate, 204 yds to hole 10651 12596.857 9448.27 94.296
28237 470 2015 717 1 3 2 Shot 2 208 yds to green, 15 ft to hole 7478 12571.0 8825.648 94.673
28237 470 2015 717 1 3 3 Shot 3 putt 17 ft 6 in., 2 ft 5 in. to hole 210 12561.831 8840.539 94.362
For each shot, I want to get the previous shot's location (x, y, z). I wrote the query below:
SELECT cur.player, cur.tournament, cur.year, cur.course, cur.round, cur.hole, cur.shot, cur.x, cur.y, cur.z, prev.x, prev.y, prev.z
FROM shots cur
INNER JOIN shots prev
ON (cur.player, cur.tournament, cur.year, cur.course, cur.round, cur.hole, cur.shot) =
(prev.player, prev.tournament, prev.year, prev.course, prev.round, prev.hole, prev.shot - 1)
This query takes practically forever to run. How can I rewrite it to make it faster?
In addition, I need to make an adjustment for the first shot on a hole (shot = 1). This shot is made from tee_x, tee_y and tee_z. These values are available in the table holes:
CREATE TABLE "holes" (
"tournament" TEXT,
"year" INTEGER,
"course" INTEGER,
"round" INTEGER,
"hole" INTEGER,
"tee_x" TEXT,
"tee_y" TEXT,
"tee_z" TEXT
);
With data:
470 2015 717 1 1 11450 10625 78.25
470 2015 717 1 2 12750 10400 91
470 2015 717 1 3 2565 8840.5 95
Thanks
First, you need a composite index to speed up the operation:
CREATE INDEX idx_shots ON shots (player, tournament, year, course, round, hole, shot);
With that index, your query should run faster:
SELECT cur.player, cur.tournament, cur.year, cur.course, cur.round, cur.hole, cur.shot, cur.x, cur.y, cur.z,
prev.x AS prev_x, prev.y AS prev_y, prev.z AS prev_z
FROM shots cur LEFT JOIN shots prev
ON (cur.player, cur.tournament, cur.year, cur.course, cur.round, cur.hole, cur.shot) =
(prev.player, prev.tournament, prev.year, prev.course, prev.round, prev.hole, prev.shot + 1);
The changes I made:
the join should be a LEFT join, so that all rows are included and not only the ones that have a previous row
-1 should be +1 because the previous row's shot is 1 less than the current row's shot
added aliases for the previous row's x, y and z
But, if your version of SQLite is 3.25.0+ it would be better to use window function LAG() instead of a self join:
SELECT *,
LAG(x) OVER w AS prev_x,
LAG(y) OVER w AS prev_y,
LAG(z) OVER w AS prev_z
FROM shots
WINDOW w AS (PARTITION BY player, tournament, year, course, round, hole ORDER BY shot);
See the demo (I include the query plan for both queries where you can see the use of the composite index).
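The question also asks for the tee position to be used as the previous location of the first shot on a hole. A minimal sketch of one way to do that, assuming SQLite 3.25.0+ and run here through Python's sqlite3 module: join holes and wrap each LAG() in COALESCE() so the tee values fill in where there is no previous shot. The schema is trimmed (text and distance are left out) and only a few of the sample rows are loaded, purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shots (player INTEGER, tournament TEXT, year INTEGER, course INTEGER,
                    round INTEGER, hole INTEGER, shot INTEGER, x TEXT, y TEXT, z TEXT);
CREATE TABLE holes (tournament TEXT, year INTEGER, course INTEGER, round INTEGER,
                    hole INTEGER, tee_x TEXT, tee_y TEXT, tee_z TEXT);
INSERT INTO shots VALUES
  (28237, '470', 2015, 717, 1, 1, 1, '11451.596', '10623.774', '78.251'),
  (28237, '470', 2015, 717, 1, 1, 2, '12150.454', '10700.381', '86.035'),
  (28237, '470', 2015, 717, 1, 2, 1, '12776.03', '10398.086', '91.017');
INSERT INTO holes VALUES
  ('470', 2015, 717, 1, 1, '11450', '10625', '78.25'),
  ('470', 2015, 717, 1, 2, '12750', '10400', '91');
""")

# LAG() returns NULL for the first shot of each hole; COALESCE() then
# falls back to the tee coordinates taken from the joined holes row.
query = """
SELECT s.player, s.hole, s.shot,
       COALESCE(LAG(s.x) OVER w, h.tee_x) AS prev_x,
       COALESCE(LAG(s.y) OVER w, h.tee_y) AS prev_y,
       COALESCE(LAG(s.z) OVER w, h.tee_z) AS prev_z
FROM shots s
LEFT JOIN holes h
  ON h.tournament = s.tournament AND h.year = s.year
 AND h.course = s.course AND h.round = s.round AND h.hole = s.hole
WINDOW w AS (PARTITION BY s.player, s.tournament, s.year, s.course, s.round, s.hole
             ORDER BY s.shot)
ORDER BY s.player, s.hole, s.shot
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The same COALESCE() fallback should also work with the self-join version if window functions are not available.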
I am supposed to read in this file and fill in the arrays from that data file. But every time I try to run it, I receive an error saying I have invalid memory. The file looks like this:
4
SanDiego
0
350
900
1100
Phoenix
350
0
560
604
Denver
900
560
0
389
Dallas
1100
604
389
0
It is basically a traveling salesman algorithm that gives the best distance. Here is my whole code:
Program P4
IMPLICIT NONE
!Variable Declarations
INTEGER :: count, i, j, ios, distance=0, permutations=0, best_distance
CHARACTER(50) :: filename
TYPE city
CHARACTER(20) :: name
END TYPE
TYPE(city), ALLOCATABLE, DIMENSION(:) :: city_list
INTEGER, ALLOCATABLE, DIMENSION(:,:) :: d_table
INTEGER, ALLOCATABLE, DIMENSION(:) :: path, best_path
PRINT *, "Enter filename"
READ *, filename
!Open the file and read number of cities
OPEN(UNIT = 15, FILE = filename, FORM="FORMATTED", ACTION="READ", STATUS="OLD", IOSTAT=ios)
IF(ios /= 0) THEN
PRINT *, "ERROR, could not open file.", TRIM(filename), "Error code: ", ios
STOP
END IF
READ (UNIT=15, FMT=100) count
PRINT *, "Number of cities: ", count
!Allocate memory for all needed arrays
ALLOCATE(city_list(1:count), d_table(1:count,1:count), best_path(1:count), path(1:count), STAT=ios)
IF(ios /= 0) THEN
PRINT *, "ERROR, could not allocate memory."
STOP
END IF
!Fill in arrays from data file
DO i=1, count
path(i) = i
READ(UNIT=15, FMT=200) city_list(i)
IF(ios < 0) THEN
EXIT
END IF
DO j=1, 4
PRINT *, i, j, city_list(i)
READ(UNIT=15, FMT=100) d_table(i,j)
END DO
END DO
!Use recursion to find minimal distance
CALL permute(2, count)
!Print formatted output
PRINT *
DO i=1, count
PRINT *, path(i)
END DO
DO i=1, count
PRINT *, (city_list(i))
END DO
DO i=1, count
DO j=1, count
PRINT *, d_table(i,j)
END DO
END DO
100 FORMAT (I6)
200 FORMAT (A)
CONTAINS
!Permute function
RECURSIVE SUBROUTINE permute(first, last)
!Declare intent of parameter variables
IMPLICIT NONE
INTEGER, INTENT(in) :: first, last
INTEGER :: i, temp
IF(first == last) THEN
distance = d_table(1,path(2))
PRINT *, city_list(1)%name, city_list(path(2))%name, " ", d_table(1, path(2))
DO i=2, last-1
distance = distance + d_table(path(i),path(i+1))
print *, city_list(path(i))%name, " ", city_list(path(i+1))%name, d_table(path(i),path(i+1))
END DO
distance = distance + d_table(path(last),path(1))
PRINT *, city_list(path(last))%name," ",city_list(path(1))%name, d_table(path(last),path(1))
PRINT *, "Distance is ",distance
PRINT *
permutations = permutations + 1
IF(distance < best_distance) THEN
best_distance = distance
DO i=2, count
best_path(i) = path(i)
END DO
END IF
ELSE
DO i=first, last
temp = path(first)
path(first) = path(i)
path(i) = temp
call permute(first+1,last)
temp = path(first)
path(first) = path(i)
path(i) = temp
END DO
END IF
END SUBROUTINE permute
END PROGRAM P4
I am no longer getting an error message and am able to run the program, but it is not running properly. It is supposed to output this:
Number of cities: 4
San Diego Phoenix 350
Phoenix Denver 560
Denver Dallas 389
Dallas San Diego 1100
Distance is 2399

San Diego Phoenix 350
Phoenix Dallas 604
Dallas Denver 389
Denver San Diego 900
Distance is 2243

San Diego Denver 900
Denver Phoenix 560
Phoenix Dallas 604
Dallas San Diego 1100
Distance is 3164

San Diego Denver 900
Denver Dallas 389
Dallas Phoenix 604
Phoenix San Diego 350
Distance is 2243

San Diego Dallas 1100
Dallas Denver 389
Denver Phoenix 560
Phoenix San Diego 350
Distance is 2399

San Diego Dallas 1100
Dallas Phoenix 604
Phoenix Denver 560
Denver San Diego 900
Distance is 3164


San Diego to Phoenix -- 350 miles
Phoenix to Dallas -- 604 miles
Dallas to Denver -- 389 miles
Denver to San Diego -- 900 miles

Best distance is: 2243
Number of permutations: 6
But instead it outputs this:
Enter filename
data.txt
Number of cities: 4
1 1 SanDiego
1 2 SanDiego
1 3 SanDiego
1 4 SanDiego
2 1 Phoenix
2 2 Phoenix
2 3 Phoenix
2 4 Phoenix
3 1 Denver
3 2 Denver
3 3 Denver
3 4 Denver
4 1 Dallas
4 2 Dallas
4 3 Dallas
4 4 Dallas
SanDiego Phoenix 1100
Phoenix Denver 604
Denver Dallas 389
Dallas SanDiego 0
Distance is 2093
SanDiego Phoenix 1100
Phoenix Dallas 604
Dallas Denver 0
Denver SanDiego 389
Distance is 2093
SanDiego Denver 1100
Denver Phoenix 389
Phoenix Dallas 604
Dallas SanDiego 0
Distance is 2093
SanDiego Denver 1100
Denver Dallas 389
Dallas Phoenix 0
Phoenix SanDiego 604
Distance is 2093
SanDiego Dallas 1100
Dallas Denver 0
Denver Phoenix 389
Phoenix SanDiego 604
Distance is 2093
SanDiego Dallas 1100
Dallas Phoenix 0
Phoenix Denver 604
Denver SanDiego 389
Distance is 2093
1
2
3
4
SanDiego
Phoenix
Denver
Dallas
1100
1100
1100
1100
604
604
604
604
389
389
389
389
0
0
0
0
When you first open the file, the READ command reads the first line. If you call READ again, you will read line two.
I think that this error comes because you are trying to READ the first line (for the second time) but you are actually reading the second line that is a STRING.
You have two options to fix the error in line 39 (READ statement):
Call the OPEN statement (before the line 39) for a second time in order to READ the first line
Or delete the READ statement because you have already read "count" before (best option in my opinion)
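To illustrate the sequential-read behaviour described above, here is a minimal Python sketch (not the Fortran fix itself) that consumes the same file layout one line at a time: the count first, then each city name followed by its row of distances. The function name read_cities and the inline sample text are assumptions made for illustration.

```python
import io

def read_cities(stream):
    """Read the city count, then each city's name and its `count` distances."""
    count = int(stream.readline())  # the first read consumes line 1 only
    names, table = [], []
    for _ in range(count):
        # every further read continues where the previous one stopped
        names.append(stream.readline().strip())
        table.append([int(stream.readline()) for _ in range(count)])
    return names, table

sample = (
    "4\n"
    "SanDiego\n0\n350\n900\n1100\n"
    "Phoenix\n350\n0\n560\n604\n"
    "Denver\n900\n560\n0\n389\n"
    "Dallas\n1100\n604\n389\n0\n"
)
names, table = read_cities(io.StringIO(sample))
print(names)
print(table[0])
```

Reading the count a second time would consume the line holding the first city name, which is the mismatch the answer points out.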
Update: for the segmentation fault error
I made the following changes:
Comment out line 39, the second READ of count (to solve the first problem)
In line 42: use FMT=200 to read the city name
In line 46: DO j=1, 4 (because there are only 4 numbers between the city names)
The following code worked for me:
Program P4
IMPLICIT NONE
!Variable Declarations
INTEGER :: count, i, j, ios, distance=0, permutations=0, best_distance=HUGE(0)
CHARACTER(50) :: filename
TYPE city
CHARACTER(20) :: name
END TYPE
TYPE(city), ALLOCATABLE, DIMENSION(:) :: city_list
INTEGER, ALLOCATABLE, DIMENSION(:,:) :: d_table
INTEGER, ALLOCATABLE, DIMENSION(:) :: path, best_path
PRINT *, "Enter filename"
READ *, filename
!Open the file and read number of cities
OPEN(UNIT = 15, FILE = filename, FORM="FORMATTED", ACTION="READ", STATUS="OLD", IOSTAT=ios)
IF(ios /= 0) THEN
PRINT *, "ERROR, could not open file.", TRIM(filename), "Error code: ", ios
STOP
END IF
READ (UNIT=15, FMT=100) count
PRINT *, "Number of cities: ", count
!Allocate memory for all needed arrays
ALLOCATE(city_list(1:count), d_table(1:count,1:count), best_path(1:count), path(1:count), STAT=ios)
IF(ios /= 0) THEN
PRINT *, "ERROR, could not allocate memory."
STOP
END IF
!Fill in arrays from data file
!READ (UNIT=15, FMT=100) count
DO i=1, count
path(i) = i
READ (UNIT=15, FMT=200, IOSTAT=ios) city_list(i)
IF(ios < 0) THEN
EXIT
END IF
DO j=1, 4
print*,i,j,city_list(i)
READ (UNIT=15, FMT=100) d_table(i,j)
END DO
END DO
!Use recursion to find minimal distance
CALL permute(2, count)
!Print formatted output
PRINT *
DO i=1, count
PRINT *, path(i)
END DO
DO i=1, count
PRINT *, (city_list(i))
END DO
DO i=1, count
DO j=1, count
PRINT *, d_table(i,j)
END DO
END DO
100 FORMAT (I6)
200 FORMAT (A)
CONTAINS
!Permute function
RECURSIVE SUBROUTINE permute(first, last)
!Declare intent of parameter variables
IMPLICIT NONE
INTEGER, INTENT(in) :: first, last
INTEGER :: i, temp
IF(first == last) THEN
distance = d_table(1,path(2))
PRINT *, city_list(1)%name, city_list(path(2))%name, " ", d_table(1, path(2))
DO i=2, last-1
distance = distance + d_table(path(i),path(i+1))
print *, city_list(path(i))%name, " ", city_list(path(i+1))%name, d_table(path(i),path(i+1))
END DO
distance = distance + d_table(path(last),path(1))
PRINT *, city_list(path(last))%name," ",city_list(path(1))%name, d_table(path(last),path(1))
PRINT *, "Distance is ",distance
PRINT *
permutations = permutations + 1
IF(distance < best_distance) THEN
best_distance = distance
DO i=2, count
best_path(i) = path(i)
END DO
END IF
ELSE
DO i=first, last
temp = path(first)
path(first) = path(i)
path(i) = temp
call permute(first+1,last)
temp = path(first)
path(first) = path(i)
path(i) = temp
END DO
END IF
END SUBROUTINE permute
END PROGRAM P4
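As a cross-check on the expected result, here is a small Python sketch of the same brute-force search over the sample distance table. It is an independent illustration, not a translation of the Fortran program: it uses itertools.permutations instead of the recursive swap routine, and the names tour_length, tours and best are assumptions.

```python
from itertools import permutations

names = ["SanDiego", "Phoenix", "Denver", "Dallas"]
d = [
    [0, 350, 900, 1100],
    [350, 0, 560, 604],
    [900, 560, 0, 389],
    [1100, 604, 389, 0],
]

def tour_length(order):
    # Sum each leg of the tour, including the leg back to the starting city.
    return sum(d[a][b] for a, b in zip(order, order[1:] + order[:1]))

# Fix city 0 as the start and permute the remaining cities.
tours = [(0,) + p for p in permutations(range(1, len(names)))]
best = min(tours, key=tour_length)

print("Number of permutations:", len(tours))
print("Best distance is:", tour_length(best))
print(" -> ".join(names[i] for i in best))
```

On the sample file this gives 6 permutations and a best distance of 2243, in line with the expected output quoted in the question. Note that a running minimum only works if it starts from a very large value, which is worth checking for best_distance in the Fortran code.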
I have a data frame as shown below which has around 130k data values.
Eng_RPM Veh_Spd
340 56
450 65
670 0
800 0
890 0
870 0
... ..
800 0
790 0
940 0
... ...
1490 67
1540 78
1880 81
I need another variable called Idling_Count that increments whenever Eng_RPM >= 400 and Veh_Spd == 0. The counter must only start 960 data points after the data point that first satisfied the condition. The condition should also not be applied to the first 960 data points, as shown below.
Expected Output
Eng_RPM Veh_Spd Idling_Count
340 56 0
450 65 0
670 0 0
... ... 0 (Upto first 960 values)
600 0 0(The Idling time starts but counter should wait for another 960 values to increment the counter value)
... ... 0
800 0 1(This is the 961st Values after start of Idling time i.e Eng_RPM>400 and Veh_Spd==0)
890 0 2
870 0 3
... .. ..
800 1 0
790 2 0
940 3 0
450 0 0(Data point which satisfies the condition but counter should not increment for another 960 values)
1490 0 4(961st Value from the above data point)
1540 0 5
1880 81 0
.... ... ... (This cycle should continue for rest of the data points)
Here is how to do it with data.table (without a for loop, which is known to be slow in R).
library(data.table)
setDT(df)
# create a serial number for observation
df[, serial := seq_len(nrow(df))]
# find series of consecutive observations matching the condition
# then create internal serial id within each series
df[Eng_RPM > 400 & Veh_Spd == 0, group_serial:= seq_len(.N),
by = cumsum((serial - shift(serial, type = "lag", fill = 1)) != 1) ]
df[is.na(group_serial), group_serial := 0]
# identify observations with group_serial larger than 960, add id
df[group_serial > 960, Idling_Count := seq_len(.N)]
df[is.na(Idling_Count), Idling_Count := 0]
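The counting logic above can also be sketched in plain Python, which may make the rule easier to follow: track the length of the current run of consecutive idling rows and only start numbering once the run is longer than the waiting period. This is a sketch under assumptions: the function name idling_count and its wait and min_rpm parameters are hypothetical, it uses the question's Eng_RPM >= 400 condition, and like the data.table code it keeps one counter running across all idling runs.

```python
def idling_count(rpm, spd, wait=960, min_rpm=400):
    """Return the Idling_Count column for parallel lists of RPM and speed."""
    counts = []
    run_len = 0  # length of the current run of consecutive idling rows
    total = 0    # idling rows counted so far, across all runs
    for r, s in zip(rpm, spd):
        if r >= min_rpm and s == 0:
            run_len += 1
        else:
            run_len = 0
        if run_len > wait:
            total += 1
            counts.append(total)
        else:
            counts.append(0)
    return counts

# Tiny demonstration with wait=2 instead of 960.
rpm = [340, 450, 670, 800, 890, 870, 800, 450, 500, 1490, 1880]
spd = [56, 65, 0, 0, 0, 0, 1, 0, 0, 0, 81]
result = idling_count(rpm, spd, wait=2)
print(result)
```

With the real data the call would use the default wait=960.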
You can do this with a for loop like this.
Create sample data and an empty column Indling_Cnt:
End_RMP <- round(runif(1800,340,1880),0)
Veh_Spd <- round(runif(1800,0,2),0)
dta <- data.frame(End_RMP,Veh_Spd)
dta$Indling_Cnt <- rep(0,1800)
To fill Indling_Cnt you can use a for loop with a few if conditions. This is probably not the most efficient way to do it, but it should work. There are better, though more complex, solutions, for example using packages such as data.table, as mentioned in the other answers.
for(i in 2:dim(dta)[1]){
n <- which(dta$End_RMP[-(1:960)]>=400&dta$Veh_Spd[-(1:960)]==0)[1]+960+960
if(i>=n){
if(dta$End_RMP[i]>=400&dta$Veh_Spd[i]==0){
dta$Indling_Cnt[i] <- dta$Indling_Cnt[i-1]+1
}else{
dta$Indling_Cnt[i] <- dta$Indling_Cnt[i-1]
}
}
}
I'm currently performing a multiple sequence alignment using the 'msa' package from Bioconductor. I'm using this to calculate the consensus sequence (msaConsensusSequence) and conservation score (msaConservationScore). This gives me outputs that are values ...
e.g.
ConsensusSequence:
i.llE etc (str = chr)
(lower case = 20%+ conservation, uppercase = 80%+ conservation, . = <20% conservation)
ConservationScore:
221 -296 579 71 423 etc (str = named num)
I would like to convert these into a table where the first row contains columns where each is a different letter in the consensus sequence and the second row is the corresponding conservation score.
e.g.
i . l l E
221 -296 579 71 423
Could people please advise on the best way to go about this?
Thanks
Natalie
Based on what you said in the comments, you can get a data frame like this:
library(msa)
data(BLOSUM62)
alignment <- msa(mySequences)
conservation <- msaConservationScore(alignment, BLOSUM62)
# Now create the data frame: the names of the score vector hold the consensus characters
df <- data.frame(consensus = names(conservation), conservation = conservation)
head(df)
consensus conservation
1 T 141
2 E 160
3 E 165
4 E 325
5 ? 179
6 ? 71
7 T 216
8 W 891
9 ? 38
10 T 405
11 L 204
If you prefer to transpose it you can:
df <- t(df)
colnames(df) <- 1:ncol(df)
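The reshaping itself is just a pairing of each consensus character with the score at the same position. As a language-neutral illustration, here is a minimal Python sketch using the five example values from the question. The variable names are assumptions, and the sketch does not call the msa package.

```python
consensus = "i.llE"                  # example consensus string from the question
scores = [221, -296, 579, 71, 423]   # matching conservation scores

# One score per consensus position is required for the table to line up.
assert len(consensus) == len(scores)

# Row 1: one column per consensus character; row 2: the corresponding score.
table = [list(consensus), scores]
for row in table:
    print("\t".join(str(value) for value in row))
```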
I am trying to add two columns to data.table. The original structure is below:
> aTable
word freq
1: thanks for the follow 612
2: the end of the 491
3: the rest of the 462
4: at the end of 409
5: is going to be 359
6: for the first time 355
7: at the same time 346
8: cant wait to see 338
9: thank you for the 334
10: thanks for the rt 321
My code is as follows:
myKeyValfun <- function(line) {
ret1 = paste(head(strsplit(dtable4G$word,split=" ")[[1]],3), collapse=" ")
ret2 = tail(strsplit(line,split=" ")[[1]],1)
return(list(key = ret1, value = ret2))
}
aTable[, c("key","value") := myKeyValfun(word)]
After I execute this, I noticed that the values are not updated correctly. Only the first row has the correct values. The other rows have the same values as the first row.
See below:
> aTable
                     word freq            key  value
 1: thanks for the follow  612 thanks for the follow
 2:        the end of the  491 thanks for the follow
 3:       the rest of the  462 thanks for the follow
 4:         at the end of  409 thanks for the follow
 5:        is going to be  359 thanks for the follow
 6:    for the first time  355 thanks for the follow
 7:      at the same time  346 thanks for the follow
 8:      cant wait to see  338 thanks for the follow
 9:     thank you for the  334 thanks for the follow
10:     thanks for the rt  321 thanks for the follow
Any ideas?
Adding the expected result as requested by akrun:
> aTable
                     word freq            key  value
 1: thanks for the follow  612 thanks for the follow
 2:        the end of the  491     the end of    the
 3:       the rest of the  462    the rest of    the
 4:         at the end of  409     at the end     of
 5:        is going to be  359    is going to     be
 6:    for the first time  355  for the first   time
 7:      at the same time  346    at the same   time
 8:      cant wait to see  338   cant wait to    see
 9:     thank you for the  334  thank you for    the
10:     thanks for the rt  321 thanks for the     rt
If we need to extract the first three words into 'key' and the last word into 'value', one option is sub:
aTable[, c('key', 'value') := list(sub('(.*)\\s+.*', '\\1', word), sub('.*\\s+', '', word))]
aTable
#                     word freq            key  value
# 1: thanks for the follow  612 thanks for the follow
# 2:        the end of the  491     the end of    the
# 3:       the rest of the  462    the rest of    the
# 4:         at the end of  409     at the end     of
# 5:        is going to be  359    is going to     be
# 6:    for the first time  355  for the first   time
# 7:      at the same time  346    at the same   time
# 8:      cant wait to see  338   cant wait to    see
# 9:     thank you for the  334  thank you for    the
#10:     thanks for the rt  321 thanks for the     rt
Or we use tstrsplit
aTable[, c('key', 'value') := {
tmp <- tstrsplit(word, ' ')
list(do.call(paste, tmp[1:3]), tmp[[4]])}]
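Both options work because they split every row, whereas the function in the question indexes the strsplit result with [[1]] and therefore only ever looks at the first row. The per-row split can be sketched in Python as follows. This is an illustration only: split_key_value is a hypothetical helper and the list holds just three of the sample phrases.

```python
def split_key_value(phrase):
    """Split a phrase into (all words but the last, the last word)."""
    key, value = phrase.rsplit(" ", 1)  # split once, at the last space
    return key, value

words = ["thanks for the follow", "the end of the", "at the end of"]

# Apply the split to every element, not just the first one.
pairs = [split_key_value(w) for w in words]
for key, value in pairs:
    print(key, "|", value)
```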