Unable to create pmml file from dataframe object - r

I created an interpolation model for my data. Now I'm trying to create a PMML file from the result. See my code below, where I've attempted to create the PMML file as well. It throws an error at the saveXML function, and the PMML file that gets generated has no data.
Error Message:
saveXML(dfi_pmml,file="test1.pmml")
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘saveXML’ for signature ‘"list"’
library(pmml)
library(XML)
library(gmodels)
library(zoo)
library("data.table")
library(pmmlTransformations)
df <- fread("C:/Users/myprofile/Desktop/test logs/test3.csv",
select = c("Timestamp", "Var1"))
head(df)
df[['Timestamp']] <- as.POSIXct(df[['Timestamp']],
format = "%Y %m %d %H:%M:%S:%OS")
rng <- range(df$Timestamp)
seq1 <- seq(df$Timestamp[[12]], df$Timestamp[[13]], 1)
dfi <- with(df, data.frame(approx(Timestamp, Var1, seq1)))
z <- read.zoo(dfi)
dfi
#pmml file gen
dfi_pmml <- WrapData(dfi)
write(toString(dfi_pmml),file="test1.pmml")
saveXML(dfi_pmml,file="test1.pmml")
Input File Data:
|Timestamp |Var1 |
|:----------------------|----:|
|2020 07 08 00:00:00:893|-0.02|
|2020 07 08 00:00:09:793|-0.02|
|2020 07 08 00:00:10:993|-0.01|
|2020 07 08 00:00:12:193|0.26 |
|2020 07 08 00:00:13:393|0.48 |
|2020 07 08 00:00:14:593|0.63 |
|2020 07 08 00:00:15:793|0.75 |
|2020 07 08 00:00:16:993|0.86 |
|2020 07 08 00:00:18:193|0.97 |
|2020 07 08 00:00:19:393|1.2 |
|2020 07 08 00:00:20:493|2.27 |
|2020 07 08 00:00:40:693|4 |
|2020 07 08 00:01:00:893|4.3 |
|2020 07 08 00:01:21:093|3.02 |
|2020 07 08 00:01:41:293|2.23 |
|2020 07 08 00:02:01:493|1.79 |
|2020 07 08 00:02:21:693|1.62 |
|2020 07 08 00:02:41:893|1.59 |
|2020 07 08 00:03:02:093|1.63 |

Related

Covert string into date in Pyspark dataframe

I was trying to convert a string column in my dataframe into date type. The string looks like this:
Fri Oct 12 18:14:29 +0000 2018
And I have tried this code:
df_en.withColumn('date_timestamp',unix_timestamp('created_at','ddd MMM dd HH:mm:ss K yyyy')).show()
But I got this result:
+--------------------+--------------------+--------------------+--------------+
| created_at| text| sentiment|date_timestamp|
+--------------------+--------------------+--------------------+--------------+
|Mon Oct 15 20:53:...|What a shock hey,...|-0.07755102040816327| null|
|Fri Oct 12 18:14:...|No Bucky, people ...| 0.0| null|
|Wed Oct 10 07:51:...|If Sarah Hanson Y...| 0.05| null|
|Mon Oct 15 02:30:...| 365 days| 0.0| null|
|Sun Oct 14 06:17:...|#HimToo: how an a...| -0.5| null|
|Tue Oct 09 07:30:...|hopefully the #Hi...| 0.0| null|
|Tue Oct 09 23:30:...|If Labor win Gove...| 0.8| null|
|Thu Oct 11 01:09:...|Hello #Perth - th...| 0.75| null|
|Sat Oct 13 21:47:...|#MeToo changed th...| 0.0| null|
|Tue Oct 09 00:41:...|Rich for Queensla...| 0.375| null|
|Mon Oct 15 12:59:...|Wonder what else ...| 0.0| null|
|Mon Oct 15 05:12:...|#dani_ries #metoo...| 0.0| null|
|Wed Oct 10 00:30:...|Hey #JackieTrad a...| 0.25| null|
|Tue Oct 16 04:00:...|“There's this ide...| 0.03611111111111113| null|
|Sun Oct 14 08:14:...|Is this the attit...|-0.01499999999999999| null|
|Sat Oct 13 11:26:...|#metoo official s...| 0.1| null|
|Tue Oct 09 00:23:...|On the limited an...|-0.01904761904761...| null|
|Tue Oct 16 14:41:...|Domestic Violence...| 0.0| null|
|Wed Oct 10 23:34:...|#australian Note ...| 0.0| null|
|Sat Oct 06 20:07:...|Wtaf, America. I ...| 0.0| null|
+--------------------+--------------------+--------------------+--------------+
Also, I have tried
df_en.select(col("created_at"),to_date(col("created_at")).alias("to_date") ).show()
The result is exactly the same. I don't know why; could anybody help me?
Try the pattern EEE MMM dd HH:mm:ss Z yyyy with the Spark config .config('spark.sql.legacy.timeParserPolicy', 'LEGACY').
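As a quick sanity check outside Spark: the suggested pattern maps to Python's strptime directives, so a plain-Python sketch (not Spark code) confirms the sample string is parseable once the day-name token is EEE/%a rather than ddd:

```python
from datetime import datetime

# Spark's "EEE MMM dd HH:mm:ss Z yyyy" corresponds to
# "%a %b %d %H:%M:%S %z %Y" in Python's strptime.
s = "Fri Oct 12 18:14:29 +0000 2018"
dt = datetime.strptime(s, "%a %b %d %H:%M:%S %z %Y")
print(dt.isoformat())  # 2018-10-12T18:14:29+00:00
```

The point of the exercise: the original attempt's ddd token is not a valid day-of-week pattern, which is why every row came back null.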

How can I write a shell script that calls another script for each day starting from a start date up to the current date [duplicate]

This question already has answers here:
How to loop through dates using Bash?
(10 answers)
Closed 1 year ago.
How can I write a shell script that calls another script for each day?
I.e., currently I have:
#!/bin/sh
./update.sh 2020 02 01
./update.sh 2020 02 02
./update.sh 2020 02 03
but I want to just specify the start date (2020 02 01) and let it run update.sh for every day up to the current date. I don't know how to manipulate dates in a shell script.
I made a stab at it, but it's rather messy; I would prefer if the script could work out the dates itself.
#!/bin/bash
for j in {4..9}
do
for k in {1..9}
do
echo "update.sh" 2020 0$j 0$k
./update.sh 2020 0$j 0$k
done
done
for j in {10..12}
do
for k in {10..31}
do
echo "update.sh" 2020 $j $k
./update.sh 2020 $j $k
done
done
for j in {1..9}
do
for k in {1..9}
do
echo "update.sh" 2021 0$j 0$k
./update.sh 2021 0$j 0$k
done
done
for j in {1..9}
do
for k in {10..31}
do
echo "update.sh" 2021 0$j $k
./update.sh 2021 0$j $k
done
done
You can use date to convert your input dates into seconds in order to compare them, and also to add one day at a time.
#!/bin/bash
start_date=$(date -I -d "$1") # Input in format yyyy-mm-dd
end_date=$(date -I) # Today in format yyyy-mm-dd
echo "Start: $start_date"
echo "Today: $end_date"
d=$start_date # In case you want start_date for later?
end_d=$(date -d "$end_date" +%s) # End date in seconds
while [ $(date -d "$d" +%s) -le $end_d ]; do # Check dates in seconds
# Replace `echo` in the below with your command/script
echo ${d//-/ } # Output the date but replace - with [space]
d=$(date -I -d "$d + 1 day") # Next day
done
In this example, I use echo but replace this with the path to your update.sh.
Sample output:
[user#server:~]$ ./dateloop.sh 2021-08-29
Start: 2021-08-29
Today: 2021-09-20
2021 08 29
2021 08 30
2021 08 31
2021 09 01
2021 09 02
2021 09 03
2021 09 04
2021 09 05
2021 09 06
2021 09 07
2021 09 08
2021 09 09
2021 09 10
2021 09 11
2021 09 12
2021 09 13
2021 09 14
2021 09 15
2021 09 16
2021 09 17
2021 09 18
2021 09 19
2021 09 20
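If shelling out to date repeatedly ever becomes a bottleneck or portability problem (e.g. on BSD/macOS, where date -I -d differs), the same day-by-day walk can be sketched in Python; this is a hypothetical port of the loop above, not the answer's script:

```python
from datetime import date, timedelta

def dates_between(start: date, end: date):
    """Yield every date from start to end, inclusive, one day at a time."""
    d = start
    while d <= end:
        yield d
        d += timedelta(days=1)

# Print each date as "YYYY MM DD", the argument format update.sh expects;
# in practice you would invoke update.sh here instead of printing.
for d in dates_between(date(2021, 8, 29), date(2021, 9, 2)):
    print(d.strftime("%Y %m %d"))
```

The structure is identical to the bash version: compare, emit, add one day.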

How to combine two files sequentially based on certain conditions in Unix

I am trying to format files in Unix (in this case RHEL).
File 1
AAAAA|AAA|1582|YNYY
BBBBB|BAV|1234|NYYY
File 1 has two sample records (rows). There are 4 columns in each record, and column 4 holds 4 status values.
File 2
20190103|W 2019 01
20190203|W 2019 02
20190303|W 2019 03
20190403|W 2019 04
Output has to be as follows:
AAAAA|1582|Y|20190103|W 2019 01
AAAAA|1582|N|20190203|W 2019 02
AAAAA|1582|Y|20190303|W 2019 03
AAAAA|1582|Y|20190403|W 2019 04
BBBBB|1234|N|20190103|W 2019 01
BBBBB|1234|Y|20190203|W 2019 02
BBBBB|1234|Y|20190303|W 2019 03
BBBBB|1234|Y|20190403|W 2019 04
I have tried AWK and Paste but am not able to get the required output.
Using awk
awk -F'|' '{split($4,a,""); b=$1"|"$2"|"$3} { getline < "file2"; for (i in a ) print b"|"a[i]"|"$0 }' < file1`
Demo:
$cat file1 file2
AAAAA|AAA|1582|YNYY
BBBBB|BAV|1234|NYYY
20190103|W 2019 01
20190203|W 2019 02
20190303|W 2019 03
20190403|W 2019 04
$awk -F'|' '{split($4,a,""); b=$1"|"$2"|"$3} { getline < "file2"; for (i in a ) print b"|"a[i]"|"$0 }' < file1
AAAAA|AAA|1582|Y|20190103|W 2019 01
AAAAA|AAA|1582|N|20190103|W 2019 01
AAAAA|AAA|1582|Y|20190103|W 2019 01
AAAAA|AAA|1582|Y|20190103|W 2019 01
BBBBB|BAV|1234|N|20190203|W 2019 02
BBBBB|BAV|1234|Y|20190203|W 2019 02
BBBBB|BAV|1234|Y|20190203|W 2019 02
BBBBB|BAV|1234|Y|20190203|W 2019 02
$
Explanation:
awk -F'|' <-- Set field separator as |
'{split($4,a,""); <-- Split 4th field and store in array a
b=$1"|"$2"|"$3} <-- Store columns 1-3 in variable b
getline < "file2"; <-- read input from file2 row by row
for (i in a ) print b"|"a[i]"|"$0 <-- Loop through array a and append variable b and input record from file2
Note: when you use getline, the values of the internal variables $0, NF, and NR change.
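Note that the demo output above pairs every flag of a record with the same file 2 row, which is not quite the pairing the question asks for (i-th flag with i-th row of file 2, dropping column 2). That intended pairing can be sketched in Python with the sample data inlined; this is a hypothetical illustration, not a translation of the awk answer:

```python
file1 = ["AAAAA|AAA|1582|YNYY", "BBBBB|BAV|1234|NYYY"]
file2 = ["20190103|W 2019 01", "20190203|W 2019 02",
         "20190303|W 2019 03", "20190403|W 2019 04"]

out = []
for rec in file1:
    c1, c2, c3, flags = rec.split("|")
    # Pair the i-th status flag with the i-th row of file 2,
    # keeping only columns 1 and 3 from file 1.
    for flag, row in zip(flags, file2):
        out.append(f"{c1}|{c3}|{flag}|{row}")

print("\n".join(out))
```

The first line produced is AAAAA|1582|Y|20190103|W 2019 01, matching the required output.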

File manipulation in Unix

cat sample_file.txt (extracted job info from Control-M):
upctm,pmdw_bip,pmdw_bip_mnt_35-FOLDistAutoRpt,Oct 7 2019 4:45 AM,Oct 7 2019 4:45 AM,1,1,Oct 6 2019 12:00 AM,Ended OK,3ppnc
upctm,pmdw_ddm,pmdw_ddm_dum_01-StartProjDCSDemand,Oct 17 2019 4:02 AM,Oct 17 2019 4:02 AM,3,1,Oct 16 2019 12:00 AM,Ended OK,3pqgq
I need to process this file into a DB table (Oracle).
But I need to make sure that the day has 2 digits (for example, 7 to 07).
(Example: Oct 07 2019 6:32 AM)
I used this command to get all the dates in every line:
cat sample_file.txt | grep "," | while read line
do
l_start_date=`echo $line|cut -d ',' -f4`
l_end_date=`echo $line|cut -d ',' -f5`
l_order_date=`echo $line|cut -d ',' -f8`
echo $l_start_date
echo $l_end_date
echo $l_order_date
done
Output:
Oct 7 2019 4:45 AM
Oct 7 2019 4:45 AM
Oct 6 2019 12:00 AM
Oct 17 2019 4:02 AM
Oct 17 2019 4:02 AM
Oct 16 2019 12:00 AM
expected output:
FROM: Oct 7 2019 6:32 AM
To: Oct 07 2019 6:32 AM
I used this sed command, but it also pads the 2-digit day (17):
sed 's|,Oct |,Oct 0|g' sample_file.txt
Oct 17 was changed to Oct 017:
upctm,pmdw_bip,pmdw_bip_mnt_35-FOLDistAutoRpt,Oct 07 2019 4:45 AM,Oct 07 2019 4:45 AM,1,1,Oct 06 2019 12:00 AM,Ended OK,3ppnc
upctm,pmdw_ddm,pmdw_ddm_dum_01-StartProjDCSDemand,Oct 017 2019 4:02 AM,Oct 017 2019 4:02 AM,3,1,Oct 016 2019 12:00 AM,Ended OK,3pqgq
I wish it was easier, but I only managed the following:
f.awk:
function fmt(s) {
split(s,a," "); a[2]=substr(a[2]+100,2)
return a[1] " " a[2] " "a[3] " " a[4] " " a[5]
}
BEGIN {FS=",";OFS=","}
{gsub(/ +/," ");
$4=fmt($4); $5=fmt($5); $8=fmt($8);
print}
This is a little awk script that first removes superfluous blanks and then picks out particular columns (4,5 and 8) and reformats the second part of each date string into a two-digit number.
You run the script like this:
awk -f f.awk sample_file.txt
output:
upctm,pmdw_aud,pmdw_aud_ext_06-GAPAnalysYTD,Oct 07 2019 6:32 AM,Oct 07 2019 6:32 AM,17,17,Oct 06 2019 12:00 AM,Ended OK,3pu9v
upctm,pmdw_ddm,pmdw_ddm_dum_01-StartProjDCSDemand,Oct 07 2019 4:02 AM,Oct 07 2019 4:02 AM,3,1,Oct 06 2019 12:00 AM,Ended OK,3pqgq
upctm,pmdw_bip,pmdw_bip_mnt_35-FOLDistAutoRpt,Oct 07 2019 4:45 AM,Oct 07 2019 4:45 AM,1,1,Oct 06 2019 12:00 AM,Ended OK,3ppnc
With a fixed locale, you can make a fixed replacement like
sed -r 's/(Jan|Feb|Oct|Whatever) ([1-9]) /\1 0\2 /g' sample_file.txt
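The same idea, pad a single-digit day only when it is followed by a space, can be checked in Python's re module (a hedged sketch using one sample line from the question, not part of either answer):

```python
import re

line = ("upctm,pmdw_ddm,pmdw_ddm_dum_01-StartProjDCSDemand,"
        "Oct 17 2019 4:02 AM,Oct 17 2019 4:02 AM,3,1,"
        "Oct 6 2019 12:00 AM,Ended OK,3pqgq")

# Match "<Mon> <single digit> " so "Oct 6 " becomes "Oct 06 "
# while "Oct 17 " is left untouched (two digits never match).
fixed = re.sub(r"\b([A-Z][a-z]{2}) (\d) ", r"\1 0\2 ", line)
print(fixed)
```

The trailing space in the pattern is what prevents the "Oct 017" problem from the first sed attempt.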

GridView Layout/Output

I have a website using ASP.NET 2.0 with SQL Server as the database and C# 2005 as the programming language. On one of the pages I have a GridView with the following layout.
Date -> Time -> QtyUsed
The sample values are as follows. (Since this GridView/report is generated for a specific month only, I have extracted and am displaying only the day part of the date, ignoring the month and year.)
01 -> 09:00 AM -> 05
01 -> 09:30 AM -> 03
01 -> 10:00 AM -> 09
02 -> 09:00 AM -> 10
02 -> 09:30 AM -> 09
02 -> 10:00 AM -> 11
03 -> 09:00 AM -> 08
03 -> 09:30 AM -> 09
03 -> 10:00 AM -> 12
Now the user wants the layout to be like:
Time 01 02 03 04 05 06 07 08 09
-------------------------------------------------------------------------
09:00 AM -> 05 10 08
09:30 AM -> 03 09 09
10:00 AM -> 09 11 12
The main requirement is that the days should be in the column header, from 01 to the last date (which is why I extracted only the day part from the date). The time slots should go down as rows.
From my experience with Excel, the idea of Transpose comes to my mind to solve this, but I am not sure.
Please help me in solving this problem.
Thank you.
Lalit Kumar Barik
You will have to generate the dataset accordingly. I am guessing you are doing some kind of grouping based on the hour, so generate a column for each hour of the day and populate the dataset accordingly.
In SQL Server, there is a PIVOT function that may be of use.
The MSDN article specifies usage and gives an example.
The example is as follows
Table DailyIncome looks like
VendorId IncomeDay IncomeAmount
---------- ---------- ------------
SPIKE FRI 100
SPIKE MON 300
FREDS SUN 400
SPIKE WED 500
...
To show
VendorId MON TUE WED THU FRI SAT SUN
---------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
FREDS 500 350 500 800 900 500 400
JOHNS 300 600 900 800 300 800 600
SPIKE 600 150 500 300 200 100 400
Use this select
SELECT * FROM DailyIncome
PIVOT( AVG( IncomeAmount )
FOR IncomeDay IN
([MON],[TUE],[WED],[THU],[FRI],[SAT],[SUN])) AS AvgIncomePerDay
Alternatively, you could select all of the data from DailyIncome and build a DataTable with the data pivoted.
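The pivot the PIVOT query performs can also be done in application code. As a language-neutral sketch of the transpose logic (hypothetical Python, using the question's day/time/quantity sample rather than the MSDN table):

```python
# Pivot (day, time, qty) rows so each time becomes a row
# and each day becomes a column, as the user's layout requires.
rows = [
    ("01", "09:00 AM", 5), ("01", "09:30 AM", 3), ("01", "10:00 AM", 9),
    ("02", "09:00 AM", 10), ("02", "09:30 AM", 9), ("02", "10:00 AM", 11),
    ("03", "09:00 AM", 8), ("03", "09:30 AM", 9), ("03", "10:00 AM", 12),
]

days = sorted({d for d, _, _ in rows})
pivot = {}  # time -> {day: qty}
for day, time, qty in rows:
    pivot.setdefault(time, {})[day] = qty

print("Time      " + "  ".join(days))
for time in sorted(pivot):
    print(time, " ".join(f"{pivot[time].get(d, ''):>2}" for d in days))
```

The same two steps, collect the distinct column keys, then bucket each row under its (row key, column key) pair, are what a pivoted DataTable builder in C# would do.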