I want to plot country against a date range of COVID-19 exposures, as a learning exercise in RStudio.
I've been trying to read the CSV into a data frame and then plot it with ggplot, but I think I'm doing this incorrectly, since the dates are spread across the columns.
How could I approach plotting the infected countries against the dates, given that a new date column is added to the header every day?
| Province/State | 1/21/2020 22:00 | 1/22/2020 12:00 | 1/23/2020 12:00 | 1/24/2020 0:00 |...
|----------------|-----------------|-----------------|-----------------|----------------|
| Anhui | 1 | 1 | 2 | 5 |...
| Beijing | 1 | 1 | 3 | 4 |...
| Chongqing | 2 | 4 | 5 | 6 |...
These case counts are not real; I made them up to provide a sample table in Markdown.
Thank you!
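One common way to approach this is to reshape the table from wide (one column per timestamp) to long (one row per region and timestamp) before plotting. The sketch below shows that reshape in plain Python, only to make the idea concrete; the sample CSV text and the `wide_to_long` helper are hypothetical and built from the made-up table above. In R, `tidyr::pivot_longer()` would be the usual tool for the same step, after which ggplot can map the date to x and the count to y.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample mirroring the made-up table in the question.
WIDE_CSV = """Province/State,1/21/2020 22:00,1/22/2020 12:00,1/23/2020 12:00,1/24/2020 0:00
Anhui,1,1,2,5
Beijing,1,1,3,4
Chongqing,2,4,5,6
"""


def wide_to_long(text):
    """Turn one column per timestamp into one row per (region, timestamp)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        region = record.pop("Province/State")
        # Every remaining column header is a timestamp, however many there are.
        for stamp, count in record.items():
            when = datetime.strptime(stamp, "%m/%d/%Y %H:%M")
            rows.append((region, when, int(count)))
    return rows


long_rows = wide_to_long(WIDE_CSV)
for region, when, count in long_rows[:4]:
    print(region, when.date(), count)
```

Because the loop reads whatever timestamp columns are present, it keeps working as new dates are appended to the header each day.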
I want to merge two files using a unique ID and timestamp, and also get measurements for the next n intervals.
The first file has over 15,000 unique IDs. Each ID has measurements taken at 15 minute intervals from Jan 1, 00:00 to Dec 31, 23:45. The database is quite big (35 GB) with over 500 million rows. The file looks something like this.
First file
| ID | Time | Measurement|
|:----:|:---------------:|:------:|
| 1 |2012-12-31 22:45| 61 |
| 1 |2012-12-31 23:00| 60 |
| 1 |2012-12-31 23:15| 61 |
| 1 |2012-12-31 23:30| 59 |
| 1 |2012-12-31 23:45| 59 |
| 2 |2012-01-01 0:00| 60 |
| 2 |2012-01-01 0:15| 61 |
| 2 |2012-01-01 0:30| 60 |
| 2 |2012-01-01 0:45| 62 |
The second file has unique IDs and a timestamp. The IDs in this file are a subset of the IDs in the first file. The file is relatively small (~50 MB) compared to the first file.
Second file
| ID | Time |
|:----:|:---------------:|
| 1 |2012-12-31 22:48|
| 1 |2012-12-31 23:48|
| 2 |2012-01-01 0:16|
I want to merge the two files such that the measurements are extracted for the current interval and the next n intervals. I also want to be able to specify n and run the code dynamically.
The merged file should look like this for n = 3. Note that the measurements for the next intervals should not be taken from another ID (see the second row, for example).
After merge
| ID | Time | Measurement 1| Measurement 2| Measurement 3|
|:----:|:---------------:|:----:|:---------------:|:----:|
| 1 | 2012-12-31 22:48| 61| 60| 61 |
| 1 | 2012-12-31 23:48| 59| 59| 59 |
| 2 | 2012-01-01 0:16| 61| 60| 62 |
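The lookup described above can be sketched as: round each query timestamp down to its 15-minute interval, then fetch that interval and the following ones for the same ID. This is a minimal in-memory Python sketch, not a solution sized for 35 GB; the helper names (`floor_to_interval`, `extract`, `t`) are hypothetical. It reads n as the number of measurement columns, matching the n = 3 table above, and it leaves intervals with no data for that ID as `None`, which is an assumption (the expected table repeats 59 there instead).

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)  # interval length stated in the question


def floor_to_interval(ts):
    """Round a timestamp down to the start of its 15-minute interval."""
    return ts - timedelta(minutes=ts.minute % 15, seconds=ts.second,
                          microseconds=ts.microsecond)


def extract(measurements, queries, n):
    """Return (id, timestamp, values) with n values per query.

    measurements maps (id, interval_start) to a value. Lookups include the
    ID, so values never come from another ID; gaps are None (assumption).
    """
    out = []
    for id_, ts in queries:
        start = floor_to_interval(ts)
        values = [measurements.get((id_, start + k * STEP)) for k in range(n)]
        out.append((id_, ts, values))
    return out


def t(text):
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


measurements = {
    (1, t("2012-12-31 22:45")): 61, (1, t("2012-12-31 23:00")): 60,
    (1, t("2012-12-31 23:15")): 61, (1, t("2012-12-31 23:30")): 59,
    (1, t("2012-12-31 23:45")): 59, (2, t("2012-01-01 0:00")): 60,
    (2, t("2012-01-01 0:15")): 61, (2, t("2012-01-01 0:30")): 60,
    (2, t("2012-01-01 0:45")): 62,
}
queries = [(1, t("2012-12-31 22:48")), (1, t("2012-12-31 23:48")),
           (2, t("2012-01-01 0:16"))]

merged = extract(measurements, queries, n=3)
for id_, ts, values in merged:
    print(id_, ts, values)
```

At the scale described, the dictionary would have to be replaced by an indexed table or a keyed join, but the rounding and per-ID lookup logic would stay the same.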
I started out using Firth's logistic (logistf) to deal with my small sample size (n=80), but wanted to try out exact logistic regression using the elrm package. However, I'm having trouble figuring out how to create the "collapsed" data required for elrm to run. I have a csv that I import into R as a dataframe that has the following variables/columns. Here is some example data (real data has a few more columns and 80 rows):
+------------+-----------+-----+--------+----------------+
| patien_num | asymmetry | age | female | field_strength |
+------------+-----------+-----+--------+----------------+
| 1 | 1 | 25 | 1 | 1.5 |
| 2 | 0 | 50 | 0 | 3 |
| 3 | 0 | 75 | 1 | 1.5 |
| 4 | 0 | 33 | 1 | 3 |
| 5 | 0 | 66 | 1 | 3 |
| 6 | 0 | 99 | 0 | 3 |
| 7 | 1 | 20 | 0 | 1.5 |
| 8 | 1 | 40 | 1 | 3 |
| 9 | 0 | 60 | 1 | 3 |
| 10 | 0 | 80 | 0 | 1.5 |
+------------+-----------+-----+--------+----------------+
Basically my data is one line per patient (not a frequency table). I'm trying to run a regression with asymmetry as the dependent variable and age (continuous), female (binary), and field_strength (factor) as independent variables. I'm trying to understand how to collapse this into the appropriate format so I can get that "ntrials" part required for the elrm formula.
I've looked at https://stats.idre.ucla.edu/r/dae/exact-logistic-regression/ but they start with data in a different format than mine, and I'm having trouble adapting it. Any help appreciated!
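"Collapsing" here means grouping the patients by covariate pattern and counting, for each pattern, how many had asymmetry = 1 (successes) and how many patients there were (the trials count). The Python sketch below only illustrates that counting on the ten example rows; the `collapse` helper is hypothetical and is not part of elrm. It also shows that when a continuous variable such as age is included, every patient in this sample is its own pattern, so each trials count is 1.

```python
from collections import defaultdict

COLUMNS = ("patien_num", "asymmetry", "age", "female", "field_strength")
RAW = [
    (1, 1, 25, 1, 1.5), (2, 0, 50, 0, 3), (3, 0, 75, 1, 1.5),
    (4, 0, 33, 1, 3), (5, 0, 66, 1, 3), (6, 0, 99, 0, 3),
    (7, 1, 20, 0, 1.5), (8, 1, 40, 1, 3), (9, 0, 60, 1, 3),
    (10, 0, 80, 0, 1.5),
]
patients = [dict(zip(COLUMNS, row)) for row in RAW]


def collapse(rows, covariates, outcome="asymmetry"):
    """Map each covariate pattern to (successes, trials)."""
    counts = defaultdict(lambda: [0, 0])
    for row in rows:
        cell = counts[tuple(row[c] for c in covariates)]
        cell[0] += row[outcome]  # successes: outcome is coded 0/1
        cell[1] += 1             # trials: one per patient
    return {key: tuple(cell) for key, cell in sorted(counts.items())}


# Categorical covariates only: patients share patterns, so trials > 1.
by_pattern = collapse(patients, ("female", "field_strength"))
# Including continuous age: every pattern is unique in this sample.
with_age = collapse(patients, ("age", "female", "field_strength"))

for pattern, (successes, trials) in by_pattern.items():
    print(pattern, successes, trials)
```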
I have messy data in this format:
Brand <- c("Brand1","Brand2","Brand3")
Sold_quantity_this_week <- c(5,8,17)
Sold_dollar_amount_this_week <- c(150,350,780)
Sold_quantity_minus_1_week <- c(7,6,8)
Sold_dollar_amount_minus_1_week <- c(200,300,350)
Sold_quantity_minus_2_week <- c(8,9,10)
Sold_dollar_amount_minus_2_week <- c(220,400,420)
| Brand | Sold quantity(this week) | Sold $amount(this week) | Sold quantity(-1 week) | Sold $amount(-1 week) | Sold quantity(-2 week) | Sold $amount(-2 week) |
|--------|--------------------------|-------------------------|------------------------|-----------------------|------------------------|-----------------------|
| Brand1 | 5 | 150 | 7 | 200 | 8 | 220 |
| Brand2 | 8 | 350 | 6 | 300 | 9 | 400 |
| Brand3 | 17 | 780 | 8 | 350 | 10 | 420 |
This is a simplified version of my problem. I have weekly sales data covering 35 weeks. I want to treat the column names as dates so that I can rename all the columns with a few lines of code.
My goal is to set the name of column i to a date and then name column i+2 as that date minus 7 days, and so on, so that the columns show the values for the previous weeks. Then I would coerce the names back to character, add "quantity" to each name (and do the same for the dollar amount), and reshape the data to long format.
How can I do this?
names(data)[2] <-"26.08.2018"
for(i in seq(2,72,2)){
names(data)[,i+2]=names(data)[,i]-7
}
My code is not working, maybe because column names cannot be stored in Date format, I guess. However, I do not want to rename all the columns manually before reshaping to long format. Can you please suggest possible solutions? Thanks.
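Column names are plain strings, so one approach is to do the date arithmetic on real date values and only format them back to text when building the names. The Python sketch below illustrates that idea and the long-format reshape on the three-week example; `weekly_names` and `to_long` are hypothetical helpers, and the `quantity_`/`amount_` prefixes are an assumed naming scheme. The same steps carry over to R with `as.Date()`, `format()` and a reshape function.

```python
from datetime import datetime, timedelta


def weekly_names(first_week, n_weeks, fmt="%d.%m.%Y"):
    """Build quantity/amount column names, stepping back 7 days per week."""
    start = datetime.strptime(first_week, fmt)
    names = []
    for k in range(n_weeks):
        label = (start - timedelta(weeks=k)).strftime(fmt)
        names += ["quantity_" + label, "amount_" + label]
    return names


def to_long(brands, table, names):
    """table holds one list of values per brand, in the wide column order."""
    long_rows = []
    for brand, values in zip(brands, table):
        for name, value in zip(names, values):
            measure, week = name.split("_", 1)
            long_rows.append((brand, week, measure, value))
    return long_rows


names = weekly_names("26.08.2018", 3)
brands = ["Brand1", "Brand2", "Brand3"]
table = [
    [5, 150, 7, 200, 8, 220],
    [8, 350, 6, 300, 9, 400],
    [17, 780, 8, 350, 10, 420],
]
long_rows = to_long(brands, table, names)
print(names)
print(long_rows[:2])
```

For the full data set, `n_weeks` would be 35 and the first date would be whichever week the first column pair represents.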
I have two tables that are something as follows:
WORKDAYS
DATE | WORKDAY_LENGHT |
-----------+----------------+
12-05-2018 | 8 |
13-05-2018 | 6.5 |
14-05-2018 | 7.5 |
15-05-2018 | 8 |
ACCIDENTS
TOD | SEVERITY |
-----------------+-----------+
12-05-2018 12:00 | minor |
12-05-2018 15:00 | minor |
13-05-2018 08:00 | severe |
13-05-2018 12:00 | severe |
14-05-2018 10:30 | severe |
And I need a result that is as follows:
WORKDAYS
DATE | WORKDAY_LENGHT | ACCIDENTS_COUNT|
-----------+----------------+----------------+
12-05-2018 | 8 | 2 |
13-05-2018 | 6.5 | 2 |
14-05-2018 | 7.5 | 1 |
15-05-2018 | 8 | 0 |
What I have tried so far is this:
SELECT DISTINCT
w.date,
(
SELECT
COUNT(*)
FROM
accidents a
WHERE
date(w.date) = date(a.tod)
)
AS accidents_count
FROM
workdays w
Which gives me an answer that is somewhat in the right direction. Something like this:
WORKDAYS
DATE | WORKDAY_LENGHT | ACCIDENTS_COUNT|
-----------+----------------+----------------+
12-05-2018 | 8 | 1 |
12-05-2018 | 8 | 1 |
13-05-2018 | 6.5 | 1 |
13-05-2018 | 6.5 | 1 |
14-05-2018 | 7.5 | 1 |
15-05-2018 | 8 | 0 |
This is sqlite, so the date values are stored as strings. The date function therefore should make them just dates, right? Or is that the one causing problems?
I was missing a GROUP BY, and I feel ashamed for opening a question before figuring this out.
Adding GROUP BY date(w.date) is the solution here.
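For reference, here is a small self-contained sketch of the grouped count, run against an in-memory SQLite database from Python. It is a variation on the fix above rather than the exact query: it uses a LEFT JOIN with GROUP BY so that days without accidents still appear with a count of 0. The sample dates are written as YYYY-MM-DD, which is an assumption, because SQLite's date() only parses ISO-8601 style strings and returns NULL for a value like 12-05-2018.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workdays (date TEXT, workday_lenght REAL);
CREATE TABLE accidents (tod TEXT, severity TEXT);
-- Sample rows from the question, rewritten in ISO format (assumption).
INSERT INTO workdays VALUES
    ('2018-05-12', 8), ('2018-05-13', 6.5),
    ('2018-05-14', 7.5), ('2018-05-15', 8);
INSERT INTO accidents VALUES
    ('2018-05-12 12:00', 'minor'), ('2018-05-12 15:00', 'minor'),
    ('2018-05-13 08:00', 'severe'), ('2018-05-13 12:00', 'severe'),
    ('2018-05-14 10:30', 'severe');
""")

# COUNT(a.tod) skips the NULLs produced by the LEFT JOIN, so a workday
# with no matching accident gets 0 instead of disappearing.
rows = conn.execute("""
    SELECT w.date, w.workday_lenght, COUNT(a.tod) AS accidents_count
    FROM workdays w
    LEFT JOIN accidents a ON date(a.tod) = date(w.date)
    GROUP BY w.date, w.workday_lenght
    ORDER BY w.date
""").fetchall()

for row in rows:
    print(row)
```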
I'm trying to create a report for my asp.net application which will show the quantity of each item, in combination with its unit, that was ordered on each day of the week. The days of the week are columns.
To be more specific:
I have two tables. One is the Orders table, with order id, customer name, date, etc.
The second table is OrderItems. This table has order id as a foreign key, order item id, item name, unit (e.g. each, box, case), quantity, and price.
When a user picks a date range for the report, for example from 3/2/12 to 4/2/12, on my asp page, the report will group order items by week and will look as follows:
**week (1) starting from sunday of such date to saturday of such date**
item | unit | Sun | Mon | Tues | Wedn | Thur | Fri | Sat | Total Price for week
item1 | bag | 3 | 0 | 12 | 8 | 45 | 1 | 4 | $1234
item4 | box | 2 | 4 | 5 | 0 | 5 | 2 | 6 | $1234
**week (2) starting from sunday of such date to saturday of such date**
item | unit | Sun | Mon | Tues | Wedn | Thur | Fri | Sat | Total Price for week
item1 | bag | 3 | 0 | 12 | 8 | 45 | 1 | 4 | $1234
item4 | box | 2 | 4 | 5 | 0 | 5 | 2 | 6 | $2354
**week (2) starting from sunday of such date to saturday of such date**
item | unit | Sun | Mon | Tues | Wedn | Thur | Fri | Sat | Total Price for week
item1 | bag | 3 | 0 | 12 | 8 | 45 | 1 | 4 | $1234
item4 | box | 2 | 4 | 5 | 0 | 5 | 2 | 6 | $2354
I wish I had something to show that I have already started, but Crystal isn't my strong point and I don't even know where to start tackling this one. I do know how to pass parameters and a datatable that I pre-filtered myself before passing it to the report, for example filtering items by date range and customer or order id.
Any help would be much appreciated.
Create a formula for each day of the week that totals the quantity ordered on that day.
For example, the Sunday quantity:
if dayOfWeek(dateField) = 'Sun'
then order.quantity
else 0
Add each day formula to the detail section of the report and then summarize it for each group level. To group it by week, just group by the date field, then set the grouping option to by week. Suppress the detail, and you'll have what you are looking for.
I don't remember the exact name of the dayOfWeek function, but it's something like that.
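The grouping logic behind that report can be sketched outside Crystal as well. The Python below is a rough illustration only: it buckets order items by the Sunday that starts their week and by item and unit, then sums the quantity into one slot per weekday. The tuple layout, the `week_start` and `weekly_report` helpers, and treating price as a per-unit price when computing the weekly total are all assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

DAYS = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")


def week_start(d):
    """Return the Sunday on or before d (date.weekday() has Monday == 0)."""
    return d - timedelta(days=(d.weekday() + 1) % 7)


def weekly_report(items):
    """items: (order_date, item, unit, quantity, unit_price) tuples."""
    report = defaultdict(lambda: {"days": [0] * 7, "total": 0.0})
    for order_date, item, unit, quantity, unit_price in items:
        row = report[(week_start(order_date), item, unit)]
        # Index 0 is Sunday, matching the column order in the report.
        row["days"][(order_date.weekday() + 1) % 7] += quantity
        row["total"] += quantity * unit_price
    return report


# Hypothetical order items inside the 3/2/12 to 4/2/12 range.
items = [
    (date(2012, 3, 4), "item1", "bag", 3, 2.0),
    (date(2012, 3, 6), "item1", "bag", 12, 2.0),
    (date(2012, 3, 6), "item1", "bag", 1, 2.0),
    (date(2012, 3, 12), "item4", "box", 4, 5.0),
]
report = weekly_report(items)
for (start, item, unit), row in sorted(report.items()):
    print(start, item, unit, dict(zip(DAYS, row["days"])), row["total"])
```

Each per-day slot plays the role of one day formula, and the dictionary key plays the role of the week group plus the item and unit grouping.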