Insert timestamp data into MySql database from R - r

I want to insert data from an R dataframe into a MySql table.
Everything works fine except the column geburtsdatum, which is of type timestamp.
The class of the column geburtsdatum in the data frame is "POSIXct" "POSIXt".
The result in the database is always 0000-00-00 00:00:00.
Here is my R session:
library(XLConnect)
excel.file <- file.path("c:/path/test.xlsx")
elements <- readWorksheetFromFile(excel.file, sheet=1)
elements
name nummer geburtsdatum
1 Anton 1 1967-05-11
2 Berti 2 1964-05-14
3 Conni 3 1967-01-01
4 Det 4 1967-01-01
5 Edi 5 1967-01-01
6 Fritzchen 6 1967-01-01
class(elements$geburtsdatum)
[1] "POSIXct" "POSIXt"
library(RMySQL)
library(DBI)
con <- dbConnect(RMySQL::MySQL(), host = "127.0.0.1", user = "root", password = "xxxx", dbname = "test")
dbWriteTable(
+ conn = con,
+ name='testdaten3',
+ value = elements,
+ row.names = FALSE,
+ append = TRUE,
+ field.types = c(
+ name = "varchar(45)",
+ nummer = "tinyint",
+ geburtsdatum = 'timestamp'
+ )
+ )
[1] TRUE
--- end of R session ---
MySql database table testdaten3:
id name nummer geburtsdatum
1 Anton 1 0000-00-00 00:00:00
2 Berti 2 0000-00-00 00:00:00
3 Conni 3 0000-00-00 00:00:00
4 Det 4 0000-00-00 00:00:00
5 Edi 5 0000-00-00 00:00:00
6 Fritzchen 6 0000-00-00 00:00:00
I already tried to convert the data like that:
elements$geburtsdatum <- format(elements$geburtsdatum,'%Y-%m-%d %H:%M:%S')
But the result was the same.
I use RStudio Version 1.1.456 with R 3.5.1 under Windows 8.1 and a MySql Server 5.6.
Can anybody help?
Kind regards
Goetz Edinger

From your example, it seems like geburtsdatum is just a date, with no time value. In that case, why not use as.Date(elements$geburtsdatum) to change it to a date type in your data frame and then append a time value before writing it to the MySQL db? CONCAT is a MySQL function; on the R side the equivalent is paste.
Like this:
paste(elements$geburtsdatum, "00:00:00")
Basically, you are adding the birthday to a placeholder time value in order to make a timestamp.

Thank You!!
I found the mistake. If I use a date before '1970-01-01 01:00:01', the database changes it to '0000-00-00 00:00:00'. If I use a date equal to or later than '1970-01-01 01:00:01', the result is correct. It doesn't matter whether I do it from R or from MySQL Workbench.
* PROBLEM SOLVED *
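The cutoff matches the documented range of MySQL's TIMESTAMP type, which starts at '1970-01-01 00:00:01' UTC; on a server running at UTC+1 that shows up as 01:00:01. A minimal base-R sketch for checking which values would fit before writing them (the third date is a made-up value for contrast, not from the question):

```r
# Dates as in the question, parsed as UTC so the check does not depend on
# the local time zone. The 1970-01-02 entry is a made-up value for contrast.
geburtsdatum <- as.POSIXct(c("1967-05-11", "1964-05-14", "1970-01-02"), tz = "UTC")

# Lower bound of MySQL's TIMESTAMP type: one second after the Unix epoch (UTC).
timestamp_min <- as.POSIXct("1970-01-01 00:00:01", tz = "UTC")

# TRUE where the value can be stored in a TIMESTAMP column.
fits_timestamp <- geburtsdatum >= timestamp_min
print(data.frame(geburtsdatum, fits_timestamp))
```

For birthdays, a DATE or DATETIME column avoids the limit. With dbWriteTable that would presumably mean passing geburtsdatum = "date" (or "datetime") in field.types when the table is created; this is a suggestion that has not been tested against the setup in the question.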

Related

Plotting of pandas DataFrame and xaxis as Timestamp produces empty plot

I have a pandas.DataFrame (df), which consists of some values and a datetime which is a string at first but which I convert to a Timestamp using
df['datetime'] = pd.to_datetime(df['Time [dd.mm.yyyy hh:mm:ss.ms]'], format="%d.%m.%Y %H:%M:%S.%f")
It seems to work, and I can access properties of the new column's elements, such as obj.day. So the resulting column contains Timestamps. When I try to plot this using either pyplot.plot(df['datetime'], df['value_name']) or df.plot(x='datetime', y='value_name'), the picture below is the result. I tried converting the Timestamps using obj.to_pydatetime(), but that did not change anything. The dataframe is populated with data coming from CSV files. What confuses me is that it works with certain CSVs but not with others. I am pretty sure that the conversion to Timestamps was successful, but I could be wrong. Also, my time window should be 2015-2016, not 1981-1700. If I look up the min and max Timestamp in the DataFrame, I get the right Timestamps in 2015 and 2016 respectively.
Resulting picture from pyplot.plot
Edit:
df.head() gives:
Sweep Time [dd.mm.yyyy hh:mm:ss.ms] Frequency [Hz] Voltage [V]
0 1.0 11.03.2014 10:13:04.270 50.0252 230.529
1 2.0 11.03.2014 10:13:06.254 49.9515 231.842
2 3.0 11.03.2014 10:13:08.254 49.9527 231.754
3 4.0 11.03.2014 10:13:10.254 49.9490 231.678
4 5.0 11.03.2014 10:13:12.254 49.9512 231.719
datetime
0 2014-03-11 10:13:04.270
1 2014-03-11 10:13:06.254
2 2014-03-11 10:13:08.254
3 2014-03-11 10:13:10.254
4 2014-03-11 10:13:12.254
and df.info() gives:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 33270741 entries, 0 to 9140687
Data columns (total 5 columns):
Sweep float64
Time [dd.mm.yyyy hh:mm:ss.ms] object
Frequency [Hz] float64
Voltage [V] float64
datetime datetime64[ns]
dtypes: datetime64[ns](1), float64(3), object(1)
memory usage: 1.5+ GB
I am trying to plot 'Frequency [Hz]' vs 'datetime'.
I think you need set_index and then to set the formatting of both axes:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
# parse the string column into Timestamps
df['datetime'] = pd.to_datetime(df['Time [dd.mm.yyyy hh:mm:ss.ms]'],
                                format="%d.%m.%Y %H:%M:%S.%f")
print(df)
df.set_index('datetime', inplace=True)
ax = df['Frequency [Hz]'].plot()
ticklabels = df.index.strftime('%Y-%m-%d')
ax.xaxis.set_major_formatter(ticker.FixedFormatter(ticklabels))
ax.yaxis.set_major_formatter(ticker.FormatStrFormatter('%.2f'))
plt.show()

How to extract only year from the date in dataframes? [duplicate]

This is my data. I need to extract the year using only base R (no MySQL, PHP, Python, C# or anything else).
**service** **Date**
disconnected 2013-01-14
disconnected 2013-03-15
disconnected 2012-02-24
disconnected 2012-12-05
disconnected 2012-06-08
disconnected 2011-05-08
disconnected 2010-10-11
disconnected 2010-12-02
I need to extract only the year from the date, and later assign it to a new variable or vector.
The output should be:
OUTPUT
**service** **Date**
disconnected 2013
disconnected 2013
disconnected 2012
disconnected 2012
disconnected 2012
disconnected 2011
disconnected 2010
disconnected 2010
There are many options. One way is to use substr to get the first 4 characters of the 'Date' column (assuming every year has four digits):
df1$Year <- substr(df1$Date, 1,4)
Or we can match the substring that starts at the first - and runs to the end of the string, and replace it with '' using sub:
df1$Year <- sub('-.*$', '', df1$Date)
Or we can extract the year by converting to POSIXlt class
strptime(df1$Date, '%Y-%m-%d')$year+1900
If we are allowed to use packages, lubridate has a convenient function for this, year:
library(lubridate)
year(df1$Date)
data
df1 <- structure(list(service = c("disconnected", "disconnected", "disconnected",
"disconnected", "disconnected", "disconnected", "disconnected",
"disconnected"), Date = c("2013-01-14", "2013-03-15", "2012-02-24",
"2012-12-05", "2012-06-08", "2011-05-08", "2010-10-11", "2010-12-02"
)), .Names = c("service", "Date"), class = "data.frame",
row.names = c(NA, -8L))
If you make date a Date variable, format can pull out the year quite easily.
D <- data.frame(service = rep("disconnected", 3),
date = c("2013-01-14", "2013-03-15", "2012-02-24"))
D$year <- format(as.Date(D$date), format = "%Y")
D
service date year
1 disconnected 2013-01-14 2013
2 disconnected 2013-03-15 2013
3 disconnected 2012-02-24 2012
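One detail the base-R options above differ on is the type of the result. A small sketch on three of the dates from the question (the variable names are only for illustration): substr, sub and format return character strings, while the strptime route returns a number.

```r
dates <- c("2013-01-14", "2012-02-24", "2010-12-02")

# Character results
year_substr <- substr(dates, 1, 4)
year_sub    <- sub("-.*$", "", dates)
year_format <- format(as.Date(dates), "%Y")

# Numeric result: POSIXlt stores the year as an offset from 1900
year_posix <- strptime(dates, "%Y-%m-%d")$year + 1900

# Convert a character result when an integer year is needed
year_int <- as.integer(year_format)

print(class(year_format))
print(class(year_posix))
print(year_int)
```

If the year is going to be used for arithmetic or numeric sorting, converting with as.integer avoids surprises from string comparison.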

sqlSave, How to Write data to SQL developer having date Column containing hyphen

I have a data frame, data, which contains integer columns and columns containing date and time, as shown:
>head(data,2)
PRESSURE AMBIENT_TEMP OUTLET_PRESSURE COMP_STATUS DATE TIME predict
1 14 65 21 0 2014-01-09 12:45:00 0.6025863
2 17 65 22 0 2014-01-10 06:00:00 0.6657910
Now I want to write this back to the SQL database with the following chunk:
sqlSave(channel,data,tablename = "ANL_ASSET_CO",append = T)
where channel is the connection, but this gives an error:
[RODBC] Failed exec in Update
22018 1722 [Oracle][ODBC][Ora]ORA-01722: invalid number
But when I exclude the date column, it writes back without any error.
> sqlSave(channel,data[,c(1:4,7)],tablename = "ANL_ASSET_CO",append = T)
> sqlSave(channel,data[,c(1:4,6:7)],tablename = "ANL_ASSET_CO",append = T)
Because of the date column, the data is not written to Oracle. Could the problem be the hyphen?
How can I write it? Any help is appreciated!
>class(data$DATE)
[1] "POSIXct" "POSIXt"
So I had to change the data type to character:
>data$DATE <- as.character(data$DATE)
>sqlSave(channel,data,tablename = "ANL_ASSET_CO",append=T)
This one worked!!
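A variation on the same fix, as a sketch rather than something verified against Oracle: format with an explicit pattern makes the resulting string independent of how as.character happens to print a POSIXct value. The two dates are the ones shown in head(data, 2); the UTC time zone is an assumption made so the example is reproducible.

```r
# Stand-in for data$DATE, built from the two dates shown in the question
DATE <- as.POSIXct(c("2014-01-09", "2014-01-10"), tz = "UTC")

# Explicit pattern: always yields plain "YYYY-MM-DD" strings
DATE_chr <- format(DATE, "%Y-%m-%d")
print(DATE_chr)
```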

R: POSIXct in sqldf

First-time question, so I apologize if I missed something:
I imported an Excel file into R using XLConnect; the str() output is as follows:
'data.frame': 931 obs. of 5 variables:
$ Media : chr "EEM" "EEM" "EEM" "EEM" ...
$ Month : POSIXct, format: "2014-08-01" "2014-08-01" "2014-08-01" "2014-08-01" ...
$ Request_Row : num 8 25 26 37 38 44 53 62 69 83 ...
$ Total_Click : num 12 9 9 8 8 8 7 7 7 7 ...
$ Match_Type : chr "S" "S" "S" "S" ...
When I use the following sqldf call I get no rows selected. Any idea what could be wrong?
sqldf(" select Media, sum(Total_Click) , avg(Request_Row), min(Request_Row) , max(Request_Row), count(distinct(Media)) from All_Data
where Request_Row < 100
and month='2014-09-01'
group by 1,2 order by 2,6 desc ")
<0 rows> (or 0-length row.names)
Thanks for the help
Vj
It's not clear what is intended, but the code shown has these problems:
Month is used in the data but month is used in the SQL statement.
SQLite has no date or time types, so if you send a POSIXct value to SQLite it will be interpreted as the number of seconds since the UNIX epoch (in the GMT time zone). Thus the comparison of the month to a character string won't work. You can convert the number of seconds to yyyy-mm-dd using the SQLite strftime or date functions. Alternatively, use a database that has datetime types. sqldf supports the H2 database, which has date and time types.
The statement is trying to group by Media and sum(Total_Click). Grouping by an aggregated value is not legal, although perhaps it could be done by nesting selects, depending on what you intended.
Since the statement is grouping by Media, the expression count(distinct(Media)) from All_Data will always be 1, since there can only be one Media in such a group.
You will need to clarify what your intent is but if we drop or fix up the various points we can get this:
sqldf("select
Media,
sum(Total_Click) sum_Total_Click,
avg(Request_Row) avg_Request_Row,
min(Request_Row) min_Request_Row,
max(Request_Row) max_Request_Row
from All_Data
where Request_Row < 100
and date(month, 'unixepoch', 'localtime') = '2014-08-01'
group by 1 order by 2 desc")
which gives:
Media sum_Total_Click avg_Request_Row min_Request_Row max_Request_Row
1 EEM 38 24 8 37
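To make the POSIXct point above concrete, here is a small base-R sketch (no database involved; the UTC time zone is chosen only to make the numbers reproducible). It shows the number that a POSIXct value carries, which is what ends up being compared against the string '2014-08-01', and the conversion back to a date string.

```r
month <- as.POSIXct("2014-08-01", tz = "UTC")

# Underlying value: seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(month)
print(secs)

# Converting the seconds back gives the date string the query wanted to match
back <- format(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"), "%Y-%m-%d")
print(back)
```

The date(month, 'unixepoch', 'localtime') call in the SQL statement performs the same conversion on the SQLite side.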
RH2: To use the RH2 package and the H2 database instead, be sure you have Java and RH2 installed (RH2 includes the H2 database, so that does not need to be installed separately) and then:
library(RH2)
library(sqldf)
sqldf("...")
where the ... is replaced with the same SQL statement except the date comparison simplifies to this line:
and month = '2014-08-01'
Data: When posting to the SO R tag please show your data using dput. In this case this was used:
All_Data <-
structure(list(Media = c("EEM", "EEM", "EEM", "EEM"), Month = structure(c(1406865600,
1406865600, 1406865600, 1406865600), class = c("POSIXct", "POSIXt"
), tzone = ""), Request_Row = c(8, 25, 26, 37), Total_Click = c(12,
9, 9, 8), Match_Type = c("S", "S", "S", "S")), .Names = c("Media",
"Month", "Request_Row", "Total_Click", "Match_Type"), row.names = c(NA,
-4L), class = "data.frame")

How do I change the index in a csv file to a proper time format?

I have a CSV file of 1000 daily prices
They are of this format:
1 1.6
2 2.5
3 0.2
4 ..
5 ..
6
7 ..
.
.
1700 1.3
The index is from 1:1700
But I need to specify a begin date and end date this way:
The start date is, let's say, 25 January 2009,
and the last (1700th) value corresponds to 14 May 2013.
So far I've gotten this close to solving the problem:
> dseries <- ts(dseries[,1], start = ??time??, freq = 30)
How do I go about this? Thanks
UPDATE:
I managed to create a separate object with dates as suggested in the answers and plotted it, but the y axis is weird, as shown in the screenshot.
Something like this?
as.Date("25-01-2009",format="%d-%m-%Y") + (seq(1:1700)-1)
A better way, thanks to @AnandaMahto:
seq(as.Date("2009-01-25"), by="1 day", length.out=1700)
Plotting:
df <- data.frame(
myDate=seq(as.Date("2009-01-25"), by="1 day", length.out=1700),
myPrice=runif(1700)
)
plot(df)
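A quick sanity check on the daily sequence, as a base-R sketch using the dates stated in the question: 1700 consecutive calendar days starting on 2009-01-25 run past 14 May 2013. If the series really ends on that date, it probably skips some days (weekends or holidays, for example); that is an assumption about the data, not something the question states.

```r
start <- as.Date("2009-01-25")
stated_end <- as.Date("2013-05-14")

# 1700 consecutive calendar days, as in the answer above
dates <- seq(start, by = "1 day", length.out = 1700)
print(range(dates))

# Does the sequence overshoot the end date given in the question?
print(dates[length(dates)] > stated_end)

# Number of calendar days between the two stated dates, inclusive
n_needed <- as.integer(stated_end - start) + 1L
print(n_needed)
```

If the actual observation dates are not available, any regular sequence is only an approximation of the true time axis.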
R stores Date-classed objects as the number of days since "1970-01-01", but the as.Date.numeric function needs an offset ('origin'), which can be any starting date:
rDate <- as.Date.numeric(dseries[,1], origin="2009-01-24")
Testing:
> rDate <- as.Date.numeric(1:10, origin="2009-01-24")
> rDate
[1] "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29"
[6] "2009-01-30" "2009-01-31" "2009-02-01" "2009-02-02" "2009-02-03"
You don't need to add the extension .numeric, since R automatically dispatches to that method when you call the generic, as.Date, with a numeric argument. I just put it in because as.Date.numeric has different arguments than as.Date.character.
