I used != to exclude the United States so that the query lists every other country, but the results still include the U.S. Does anybody know how to fix this? The query I typed is shown below.
SecurityEvent
| where AccountType == "User" and EventID == 4624 and TimeGenerated > ago(1d)
SigninLogs
| where LocationDetails != 'United States' and ResultType == 0
Your query looks correct.
Maybe for some records the value of LocationDetails is 'United States ' (note the space at the end), 'United  States' (note the two spaces between the words), 'United states' (note the lowercase s in states) or something like that, and that's why they are not filtered out?
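To make the point above concrete, here is a minimal Python sketch of why an exact != comparison lets near-duplicates through, and how normalizing both sides first catches them. The list of values and the normalize helper are hypothetical, made up for illustration; they are not taken from SigninLogs.

```python
import re


def normalize(value: str) -> str:
    """Collapse whitespace runs, trim both ends, and case-fold the string."""
    return re.sub(r"\s+", " ", value).strip().casefold()


# Hypothetical values illustrating the variants described above.
locations = [
    "United States",
    "United States ",   # trailing space
    "United  States",   # two spaces between the words
    "United states",    # lowercase s
    "Canada",
]

# An exact != comparison only excludes the exact string.
naive = [loc for loc in locations if loc != "United States"]

# Comparing normalized forms excludes every variant.
robust = [loc for loc in locations if normalize(loc) != normalize("United States")]

print(naive)
print(robust)
```

In KQL itself, the case-insensitive counterpart of != is the !~ operator, which would handle the lowercase variant; stray whitespace would still need trimming separately.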
I'm trying to select some rows and columns from bigquery data on help requests in NYC. I want to select five columns - date request created, city where the request was made, the agency that received the request, etc.
First, I managed to select the columns I want:
library(DBI)
library(RSQLite)
conn <- dbConnect(SQLite(),'nyc311.db')
dbListFields(conn, "requests")
df<-dbGetQuery(conn, 'SELECT "Agency", "Created Date", "Complaint Type", "City", Descriptor FROM requests')
Agency Created Date Complaint Type City Descriptor
1 DOHMH 01/25/2016 02:11:12 AM Indoor Air Quality BRONX Chemical Vapors/Gases/Odors
2 NYPD 01/25/2016 02:08:08 AM Noise - Vehicle NEW YORK Car/Truck Horn
3 NYPD 01/25/2016 02:07:24 AM Noise - Street/Sidewalk NEW YORK Loud Talking
4 CHALL 01/25/2016 02:05:00 AM Opinion for the Mayor HOUSING
5 HRA 01/25/2016 02:01:46 AM Benefit Card Replacement Medicaid
6 NYPD 01/25/2016 01:54:56 AM Blocked Driveway CORONA No Access
How can I select from the .db file so that I get agency=NYPD, City=Bronx and Queens; and Created Date=year 2015? I tried the following but I am getting syntax errors.
df<-dbGetQuery(conn, 'SELECT "Agency", "Created Date", "Complaint Type", "City", Descriptor
FROM requests WHERE City IN ("BRONX", "QUEENS") AND Agency="NYPD"
AND YEAR(Created Date)=2015')
I'm a beginner so I'm not clear about how to subset the year, since Created Date shows date and time in character format, not integer. I also noticed that the code runs except for the part YEAR(Created Date)=2015
There is no YEAR() function in SQLite (although MySQL has one, hence your confusion). First note that you are storing your dates as text, and in the non-ANSI format mm/dd/yyyy. To compare the year of each record, you have to extract it using SQLite's string functions. Since the four-digit year starts at character 7 of that format, the following should work:
SUBSTR("Created Date", 7, 4)
Note that you also need to place the Created Date column name in double quotes to escape whitespace.
Here is the actual query I would use:
SELECT "Agency",
"Created Date",
"Complaint Type",
"City",
"Descriptor"
FROM requests
WHERE City IN ('BRONX', 'QUEENS') AND
Agency = 'NYPD' AND
SUBSTR("Created Date", 7, 4) = '2015' -- compare against the string '2015'
Some notes: It is convention in SQL to use single quotes for string data. You may put all your column names in double quotes, but it is only necessary if you have whitespace, keywords, etc.
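For anyone who wants to try the SUBSTR approach without the original database, here is a small self-contained sketch using Python's built-in sqlite3 module. The rows are made up for illustration (they are not from nyc311.db), and the agency and year are passed as parameters so they are easy to change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE requests ("Agency" TEXT, "Created Date" TEXT, '
    '"Complaint Type" TEXT, "City" TEXT, "Descriptor" TEXT)'
)

# Hypothetical rows in the same mm/dd/yyyy text format as the question.
rows = [
    ("NYPD", "03/14/2015 09:30:00 AM", "Noise - Vehicle", "BRONX", "Car/Truck Horn"),
    ("NYPD", "01/25/2016 02:08:08 AM", "Noise - Vehicle", "BRONX", "Car/Truck Horn"),
    ("NYPD", "07/04/2015 11:15:00 PM", "Blocked Driveway", "QUEENS", "No Access"),
    ("DOHMH", "05/02/2015 08:00:00 AM", "Indoor Air Quality", "BRONX", "Chemical Vapors/Gases/Odors"),
]
conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?, ?)", rows)

# Characters 7-10 of 'mm/dd/yyyy ...' hold the year, so compare them as text.
query = """
SELECT "Agency", "Created Date", "City"
FROM requests
WHERE "City" IN ('BRONX', 'QUEENS')
  AND "Agency" = ?
  AND SUBSTR("Created Date", 7, 4) = ?
ORDER BY "Created Date"
"""
result = conn.execute(query, ("NYPD", "2015")).fetchall()
for row in result:
    print(row)
```

Only the two 2015 NYPD rows should come back; the 2016 row and the DOHMH row are filtered out.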
I think the syntax error is in the Created Date column name, which needs to be quoted.
Check this:
'SELECT "Agency", "Created Date", "Complaint Type", "City", Descriptor
FROM requests WHERE City IN ("BRONX", "QUEENS") AND Agency="DOHMH"
AND YEAR("Created Date")=2015'
I have a basic understanding of R that mostly entails the ability to run regressions and summary statistics, so if there appear any gaps in my knowledge I would appreciate being pointed in the correct direction.
I have time series data in CSV that is formatted as follows:
Facility ID, Utility Type, Account No, Unit Name, Date 1, Date 2, Date 3, Date 4
There will be multiple rows for a specific account number referencing a unique utility type and facility (i.e., one row entry for Unit Name = L, one row entry for Unit Name = USD). The account number values for a particular unit at every date are entered in each "date" column. I would like to be able to write a script that enables me to re-export the data where each Date column doesn't contain entries for multiple units. I would also like to then designate to R that the Date columns represent monthly time series data points, and from there do various time series analysis.
I appreciate your help in telling me how to clean up this data.
As requested, sample data:
Facility ID, Facility Name, State, Utility Type, Supplier, Account No., Unit Name, 7/1/14, 8/1/14
4015, Palm Court Apts, CA, Chilled Water, PG&E, 87993, USD, 42333, 41775
4015, Palm Court Apts, CA, Chilled Water, PG&E, 87993, ton-hr, 244278, 238035
4044, 18 Sawtelle, CA, Natural Gas, Chevron, 17965, USD, 4860, 5890
4044, 18 Sawtelle, CA, Natural Gas, Chevron, 17965, M^3, 7639, 8895
Example output:
Facility ID, Facility Name, State, Utility Type, Supplier, Account No., Quantity Consumed, Unit of Measure, Utility Bill, Currency, Date
4015, Palm Court Apts, CA, Chilled Water, PG&E, 87993, 244278, ton-hr, 42333, USD, 7/1/14
4015, Palm Court Apts, CA, Chilled Water, PG&E, 87993, 238035, ton-hr, 41775, USD, 8/1/14
4044, 18 Sawtelle, CA, Natural Gas, Chevron, 17965, 7639, M^3, 4860, USD, 7/1/14
4044, 18 Sawtelle, CA, Natural Gas, Chevron, 17965, 8895, M^3, 5890, USD, 8/1/14
library(reshape2)
d = read.csv("data.csv")
d.molten = melt(d,
id.vars=c("Facility.ID", "Facility.Name", "State", "Utility.Type", "Supplier", "Account.No.", "Unit.Name"),
variable.name = "Date"
)
The melt function converts a "wide" format (with an arbitrary number of date columns) to a "long" format, where each row is one observation. This is the preferred format for most things you'd do in R, at least when using packages from the "Hadleyverse", and especially for time series.
But we're not done yet. Now you have the following structure:
Facility.ID Facility.Name … Date value
4015 Palm Court Apts X7.1.14 42333
We have to fix the dates, which are currently just strings. read.csv prepended an "X" and replaced the slashes with dots, since column names cannot start with a number or contain characters such as "/".
d.molten$Date=as.Date(d.molten$Date, "X%m.%d.%y")
Now your dates will look correct, and you have one row for each observation:
Facility.ID Facility.Name … Date value
4015 Palm Court Apts 2014-07-01 42333
And now we can easily plot time series:
library(ggplot2)
ggplot(d.molten,
aes(x = Date, y = value, color = Facility.Name)) +
geom_point()
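The melted data still has separate rows for the USD and the physical-unit values. As a rough sketch of how the two could be paired up to get the layout from the "Example output" in the question, here is a version in plain Python (standard library only). It assumes that "USD" is the only currency and that every other Unit Name is a physical unit; the reshape function and the CURRENCIES set are names made up for this example.

```python
import csv
import io
from datetime import datetime

# Sample data copied from the question.
SAMPLE = """\
Facility ID, Facility Name, State, Utility Type, Supplier, Account No., Unit Name, 7/1/14, 8/1/14
4015, Palm Court Apts, CA, Chilled Water, PG&E, 87993, USD, 42333, 41775
4015, Palm Court Apts, CA, Chilled Water, PG&E, 87993, ton-hr, 244278, 238035
4044, 18 Sawtelle, CA, Natural Gas, Chevron, 17965, USD, 4860, 5890
4044, 18 Sawtelle, CA, Natural Gas, Chevron, 17965, M^3, 7639, 8895
"""

ID_COLS = ["Facility ID", "Facility Name", "State", "Utility Type", "Supplier", "Account No."]
CURRENCIES = {"USD"}  # assumption: any other unit is a physical quantity


def reshape(text):
    """Return one record per account and date, pairing cost with quantity."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    date_cols = [c for c in reader.fieldnames if c not in ID_COLS and c != "Unit Name"]
    merged = {}  # (id values..., date) -> output record
    for row in reader:
        key_base = tuple(row[c] for c in ID_COLS)
        for col in date_cols:
            date = datetime.strptime(col, "%m/%d/%y").date()
            rec = merged.setdefault(key_base + (date,), dict(zip(ID_COLS, key_base), Date=date))
            if row["Unit Name"] in CURRENCIES:
                rec["Utility Bill"], rec["Currency"] = int(row[col]), row["Unit Name"]
            else:
                rec["Quantity Consumed"], rec["Unit of Measure"] = int(row[col]), row["Unit Name"]
    return list(merged.values())


records = reshape(SAMPLE)
for r in records:
    print(r["Facility ID"], r["Date"], r["Quantity Consumed"],
          r["Unit of Measure"], r["Utility Bill"], r["Currency"])
```

Keying the dictionary on the account columns plus the date is what merges the two source rows into one; in R the same idea would be a cast or merge after the melt.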
I have a requirement to "select all rows from the fund table whose own fund_id is not found as a replacement fund_id on other rows in the fund table".
Every fund record has history records created with an old status and a new status.
Whenever a particular fund goes through the void process (i.e. old status to new status: null --> 'Issued' --> 'void' --> 'reissue'), a replacement fund_id is generated and linked to the original record. The replacement is treated as a new fund record whose history is null --> 'issued'.
Please see the data below for clarification.
FUND HISTORY TABLE:
columns and data are
fund_hist_id fund_id old_status new_status
128 2444582 null I
127 2445579 V R
124 2445579 I v
123 2445579 null I
129 2445562 null I
FUND TABLE:
columns and it's data are
FUND_ID FUND_NAME ORIGINAL_FUND_ID REPLACEMENT_FUND_ID
2444582 ABC FUND 2444582 NULL
2445579 ABC FUND 2445579 2444582
2445562 XYZ FUND 2445562 NULL
Please note: as per my requirement I have to select the original fund ids from the fund table: 2445579 and 2445562.
Since 2444582 is linked as the replacement fund id of another record in the fund table, I have to ignore it, but pick 2445579 because it is the original record and one of its history records goes from null to 'issued'. 2445562 has no replacement record linked to it either, so I need to select it as well.
Can anybody provide a query, keeping performance in mind?
Please let me know if any of the details are unclear.
regards
rajesh
Assuming the two tables are named hist and fund, the required query is:
select *
from fund f, hist h
where f.FUND_ID=h.FUND_ID and f.fund_id is not null
and f.FUND_ID not in (select nvl(REPLACEMENT_FUND_ID,'0') from fund)
and h.OLD_STATUS is null and h.NEW_STATUS='I';
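The nvl() in the subquery matters: NOT IN never evaluates to true when the list contains a NULL, so without it the query would return nothing. A NOT EXISTS subquery sidesteps the problem. The sketch below illustrates both behaviours using Python's built-in sqlite3 module rather than Oracle, with the fund data from the question; the history join is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fund (
    fund_id INTEGER,
    fund_name TEXT,
    original_fund_id INTEGER,
    replacement_fund_id INTEGER
);
INSERT INTO fund VALUES
    (2444582, 'ABC FUND', 2444582, NULL),
    (2445579, 'ABC FUND', 2445579, 2444582),
    (2445562, 'XYZ FUND', 2445562, NULL);
""")

# NOT IN against a list that contains NULL: no row can ever qualify.
not_in = conn.execute(
    "SELECT fund_id FROM fund "
    "WHERE fund_id NOT IN (SELECT replacement_fund_id FROM fund)"
).fetchall()

# NOT EXISTS only looks for a matching row, so NULLs do not interfere.
not_exists = conn.execute("""
    SELECT f.fund_id
    FROM fund f
    WHERE NOT EXISTS (
        SELECT 1 FROM fund r WHERE r.replacement_fund_id = f.fund_id
    )
    ORDER BY f.fund_id
""").fetchall()

print(not_in)
print(not_exists)
```

The NOT EXISTS form returns the two funds asked for in the question and leaves out 2444582, which appears as a replacement on another row.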
I have a table in Teradata with a particular column "location" like
location
Rockville County, TX
Green River County, IL
Joliet County, CA
Jones County, FL
.
.
.
What I need to do is strip off everything after the county's name and turn the column into something like
location
Rockville
Green River
Joliet
Jones
I've been trying to use the function trim like
trim(trailing ' County' from location)
but it's not working. Any ideas?
The trim function only removes whitespace (or repetitions of a single specified character) from the ends of a string, so it cannot strip a multi-character suffix, and ' County' is not at the end of these values anyway.
You can use a combination of index and substr, e.g.:
select
'Green River County, IL' your_string
, substr(your_string, 1, index(your_string, ' County') - 1) your_desired_result
index(target_string, string_to_find) gives you the position of a string within another string
substr(target_string, start_index, length) allows you to pull out a specific part of a string
This is the Standard SQL way:
substring(location from 0 for position(' County' in location))
In TD14 you might also use a regular expression:
regexp_substr(location, '.*(?= County)')
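To check the logic of both approaches outside Teradata, here is a small Python sketch that applies the same two ideas (cut at the position of ' County', or match everything before it with a lookahead) to the values from the question. The function names are made up for this example.

```python
import re

locations = [
    "Rockville County, TX",
    "Green River County, IL",
    "Joliet County, CA",
    "Jones County, FL",
]


def strip_by_position(location: str) -> str:
    """Keep everything before the first ' County'; leave other values as-is."""
    idx = location.find(" County")
    return location if idx == -1 else location[:idx]


def strip_by_regex(location: str) -> str:
    """Same result using the lookahead pattern from the regexp_substr example."""
    match = re.match(r".*(?= County)", location)
    return match.group(0) if match else location


by_position = [strip_by_position(loc) for loc in locations]
by_regex = [strip_by_regex(loc) for loc in locations]

print(by_position)
print(by_regex)
```

Both lists should contain just the county names, with no trailing space.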