How to split date-time and data from the same column in a CSV using Python (Arduino)?

I am storing data from an Arduino on a Raspberry Pi. My code is working well, but the date-time and the data are being logged together in the first column (picture 1), although they are shown separately on the Raspberry Pi display (picture 2). How can I log them in separate columns? I have attached my code here.
import time, datetime
from datetime import datetime
import csv
import serial

arduino_port = "/dev/ttyACM0"
baud = 9600
fileName = "Arduino1.csv"

ser = serial.Serial(arduino_port, baud)
print("Connected to Arduino port:" + arduino_port)

file = open(fileName, "a")
print("Created file")

samples = float('inf')
print_labels = False
line = 0

print('Press Ctrl-C to quit...')
print('Time, CO2(ppm), Temp(C), Humi(%), Vis, IR, UV, pH')
print('-' * 85)

while line <= samples:
    d = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    if print_labels:
        if line == 0:
            print("Printing Column Headers")
        else:
            print("Line " + str(line) + ": writing...")
    getData = str(ser.readline())
    data = getData[1:][1:-5]
    print(d, data)
    file = open(fileName, "a")
    file.write(d + data + "\n")
    line = line + 1
    file.close()

Try adding a comma after the date-time string to separate the data into another column. The date-time and data appear separate on the Raspberry Pi display, but they are not comma-separated, so both end up in the same CSV column.
Try this -
file.write(d + "," + data + "\n")
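If you would rather let the csv module handle the delimiter and quoting for you, a minimal sketch of the write step could look like this (same fileName, d and data variables as above; whether data itself needs further splitting depends on what the Arduino sends):
import csv

with open(fileName, "a", newline="") as f:
    writer = csv.writer(f)
    # one row with two cells: the timestamp and the raw Arduino line
    writer.writerow([d, data])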

Related

How to import data from an HTML table on a website to Excel?

I would like to do some statistical analysis with Python on the live casino game called Crazy Time from Evolution Gaming. There is a website that has the data to do this: https://tracksino.com/crazytime. I want the data from the bottom table, 'Spin History', to be imported into Excel. However, I do not know how this can be done. Could anyone give me an idea of where to start?
Thanks in advance!
Try the below code:
import json
import requests
from urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
import csv
import datetime

def scrap_history():
    csv_headers = []
    file_path = ''  # path on your system where the file should be saved
    file_name = 'spin_history.csv'  # filename
    page_number = 1
    while True:
        # Dynamic URL fetching data in chunks of 100
        url = 'https://api.tracksino.com/crazytime_history?filter=&sort_by=&sort_desc=false&page_num=' + str(page_number) + '&per_page=100&period=24hours'
        print('-' * 100)
        print('URL created : ', url)
        response = requests.get(url, verify=False)
        result = json.loads(response.text)  # load the response text as JSON
        history_data = result['data']
        print(history_data)
        if history_data != []:
            with open(file_path + file_name, 'a+') as history:
                # Headers for the file
                csv_headers = ['Occured At', 'Slot Result', 'Spin Result', 'Total Winners', 'Total Payout']
                csvwriter = csv.DictWriter(history, delimiter=',', lineterminator='\n', fieldnames=csv_headers)
                if page_number == 1:
                    print('Writing CSV header now...')
                    csvwriter.writeheader()
                # write the extracted data into the csv file one row at a time
                for item in history_data:
                    value = datetime.datetime.fromtimestamp(item['when'])
                    occured_at = f'{value:%d-%B-%Y # %H:%M:%S}'
                    csvwriter.writerow({'Occured At': occured_at,
                                        'Slot Result': item['slot_result'],
                                        'Spin Result': item['result'],
                                        'Total Winners': item['total_winners'],
                                        'Total Payout': item['total_payout'],
                                        })
            print('-' * 100)
            page_number += 1
            print(page_number)
            print('-' * 100)
        else:
            break
Explanation:
I implemented the script above using the Python requests approach. The API URL https://api.tracksino.com/crazytime_history?filter=&sort_by=&sort_desc=false&page_num=1&per_page=50&period=24hours was extracted from the website itself (refer to the screenshot). The script builds a dynamic URL in which the page number changes on every iteration: first page_num=1, then page_num=2, and so on until all the data has been extracted.
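As a minimal sketch of the same pagination idea, the query string can also be passed through the params argument of requests.get (same endpoint and JSON 'data' key as the script above; what you do with each page of rows is up to you):
import requests

def fetch_page(page_number):
    params = {'filter': '', 'sort_by': '', 'sort_desc': 'false',
              'page_num': page_number, 'per_page': 100, 'period': '24hours'}
    response = requests.get('https://api.tracksino.com/crazytime_history',
                            params=params, verify=False)
    return response.json().get('data', [])

page_number = 1
while True:
    rows = fetch_page(page_number)
    if not rows:
        break
    # ...write rows to the CSV as in scrap_history()...
    page_number += 1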

Why is so much duplicated data being saved to my Excel sheet by my code?

This code is used to scrape data from a website, but the problem is that a large amount of duplicated data is being produced and saved to my Excel sheet.
# Imports implied by the snippet; driver, siteurl, nextbutton() and the all* lists
# (allmails, allnames, allphone, alltitles) are defined elsewhere in the full script.
import re
import time
import urllib.parse

import requests
import pandas as pd
from lxml import html
from pandas import ExcelWriter
from selenium.common.exceptions import WebDriverException, NoSuchElementException

def extractor():
    time.sleep(10)
    souptree = html.fromstring(driver.page_source)
    tburl = souptree.xpath("//table[contains(@id, 'theDataTable')]//tbody//tr//td[4]//a//@href")
    for tbu in tburl:
        allurl = []
        allurl.append(urllib.parse.urljoin(siteurl, tbu))
        for tb in allurl:
            get_url = requests.get(tb)
            get_soup = html.fromstring(get_url.content)
            pattern = re.compile(r"^\s+|\s*,\s*|\s+$")
            name = get_soup.xpath('//td[@headers="contactName"]//text()')
            phone = get_soup.xpath('//td[@headers="contactPhone"]//text()')
            mail = get_soup.xpath('//td[@headers="contactEmail"]//a//text()')
            artitle = get_soup.xpath('//td[@headers="contactEmail"]//a//@href')
            artit = ([x for x in pattern.split(str(artitle)) if x][-1])
            title = artit[:-2]
            for (nam, pho, mai) in zip(name, phone, mail):
                fname = nam[9:]
                allmails.append(mai)
                allnames.append(fname)
                allphone.append(pho)
                alltitles.append(title)
                fullfile = pd.DataFrame({'Names': allnames, 'Mails': allmails, 'Title': alltitles, 'Phone Numbers': allphone})
                writer = ExcelWriter('G:\\Sheet_Name.xlsx')
                fullfile.to_excel(writer, 'Sheet1', index=False)
                writer.save()
                print(fname, pho, mai, title, sep='\t')

while True:
    time.sleep(10)
    extractor()
    try:
        nextbutton()
    except WebDriverException:
        driver.refresh()
    except NoSuchElementException:
        time.sleep(10)
        driver.quit()
I do not want the output to be duplicated, but half or more of the data is duplicated each time I run the code.

How to fix Python writing multiple lines to a .csv document instead of one?

I am trying to scrape data from a public forum for a school project, but every time I run the code, the resulting .csv file has multiple rows for the text variable instead of just one.
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq

my_url = 'https://www.emimino.cz/diskuse/1ivf-repromeda-56566/'
uClient = uReq(my_url)
page_soup = soup(uClient.read(), "html.parser")
uClient.close()

containers = page_soup.findAll("div", {"class": "discussion_post"})

out_filename = "Repromeda.csv"
headers = "text,user_name,date \n"

f = open(out_filename, "w")
f.write(headers)

for container in containers:
    text1 = container.div.p
    text = text1.text
    user_container = container.findAll("span", {"class": "user_category"})
    user_id = user_container[0].text
    date_container = container.findAll("span", {"class": "date"})
    date = date_container[1].text

    print("text: " + text + "\n")
    print("user_id: " + user_id + "\n")
    print("date: " + date + "\n")

    # writes the dataset to file
    f.write(text.replace(",", "|") + ", " + user_id + ", " + date + "\n")

f.close()
Ideally I am trying to create one row for each data entry (i.e. text, user_id, and date in one row), but instead I get multiple rows for one text entry and only one row for the user_id and date.
this is the actual output
this is the expected output
Just replace the newline characters with a space:
for container in containers:
    text1 = container.div.p
    text = text1.text.replace('\n', ' ')
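Alternatively, here is a sketch of the write step using Python's csv module, which quotes commas and newlines inside the text for you (same variables and selectors as the question's code):
import csv

with open(out_filename, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text", "user_name", "date"])
    for container in containers:
        text = container.div.p.text.replace("\n", " ")
        user_id = container.findAll("span", {"class": "user_category"})[0].text
        date = container.findAll("span", {"class": "date"})[1].text
        writer.writerow([text, user_id, date])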

Python ArcPy - Print Layer with highest field value

I have some python code that goes through layers in my ArcGIS project and prints out the layer names and their corresponding highest value within the field "SUM_USER_VisitCount".
Output Picture
What I want the code to do is only print out the layer name and SUM_USER_VisitCount field value for the one layer with the absolute highest value.
Desired Output
I have been unable to figure out how to achieve this and can't find anything online either. Can someone help me achieve my desired output?
Sorry if the code layout is a little weird. It got messed up when I pasted it into the "code sample"
Here is my code:
import arcpy
import datetime
from datetime import timedelta
import time

#Document Start Time in order to calculate Run Time
time1 = time.clock()

#assign project and map frame
p = arcpy.mp.ArcGISProject(r'E:\arcGIS_Shared\Python\CumulativeHeatMaps.aprx')
m = p.listMaps('Map')[0]

Markets = [3000]

### Centers to loop through
CA_Centers = ['Castro', 'ColeValley', 'Excelsior', 'GlenPark',
              'LowerPacificHeights', 'Marina', 'NorthBeach', 'RedwoodCity', 'SanBruno',
              'DalyCity']

for Market in Markets:
    print(Market)
    for CA_Center in CA_Centers:
        Layers = m.listLayers("CumulativeSumWithin{0}_{1}_Jun2018".format(Market, CA_Center))
        fields = ['SUM_USER_VisitCount']
        for Layer in Layers:
            print(Layer)
            sqlClause = (None, 'ORDER BY ' + 'SUM_USER_VisitCount')  # + 'DESC'
            with arcpy.da.SearchCursor(in_table=Layer, field_names=fields,
                                       sql_clause=sqlClause) as searchCursor:
                print(max(searchCursor))
You can create a dictionary that stores the result from each query and then print out the highest one at the end.
results_dict = {}
for Market in Markets:
    print(Market)
    for CA_Center in CA_Centers:
        Layers = m.listLayers("CumulativeSumWithin{0}_{1}_Jun2018".format(Market, CA_Center))
        fields = ['SUM_USER_VisitCount']
        for Layer in Layers:
            print(Layer)
            sqlClause = (None, 'ORDER BY ' + 'SUM_USER_VisitCount')  # + 'DESC'
            with arcpy.da.SearchCursor(in_table=Layer, field_names=fields,
                                       sql_clause=sqlClause) as searchCursor:
                # store the highest value once (the cursor can only be consumed one time)
                results_dict[Layer] = max(searchCursor)
                print(results_dict[Layer])

# get the key for the dictionary item with the highest value
highest_count_layer = max(results_dict, key=results_dict.get)
print(highest_count_layer)
print(results_dict[highest_count_layer])
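For reference, here is a toy illustration of the max-by-value lookup used at the end (the layer names and counts are made up):
results_dict = {"LayerA": (12,), "LayerB": (47,), "LayerC": (5,)}
highest_count_layer = max(results_dict, key=results_dict.get)
print(highest_count_layer, results_dict[highest_count_layer])  # LayerB (47,)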

How can I cut large csv files using any R packages like ff or data.table?

I want to cut large CSV files (file size larger than RAM) into pieces and use them, or save each piece to disk for later use. Which R package is best for doing this with large files?
I haven't tried it, but using the skip and nrows parameters of read.table or read.csv is worth a try. These are from ?read.table:
skip integer: the number of lines of the data file to skip before beginning to read data.
nrows integer: the maximum number of rows to read in. Negative and other invalid values are ignored.
To avoid troublesome issues at the end you need to do some error handling; in other words, I don't know what happens when the skip value is greater than the number of rows in your big CSV.
P.S. I also don't know whether header=TRUE affects skip or not; you will have to check that as well.
The answer given by @berkorbay is OK, and I can confirm that header can be used with skip. However, if your file is really large it gets painfully slow, since each reading after the first must skip over all previously read lines.
I had to do something similar and, after wasting quite a bit of time, I wrote a short script in Perl which fragments the original file into chunks that you can read one after the other. It is much faster. I enclose the source here, translating some parts so that the intent is clear:
#!/usr/bin/perl
system("cls");
print("Fragment .csv file keeping header in each chunk\n") ;
print("\nEnter input file name = ") ;
$entrada = <STDIN> ;
print("\nEnter maximum number of lines in each fragment = ") ;
$nlineas = <STDIN> ;
print("\nEnter output file name stem = ") ;
$salida = <STDIN> ;
chop($salida) ;
open(IN,$entrada) || die "Cannot open input file: $!\n" ;
$cabecera = <IN> ;
$leidas = 0 ;
$fragmento = 1 ;
$fichero = $salida.$fragmento ;
open(OUT,">$fichero") || die "Cannot open output file: $!\n" ;
print OUT $cabecera ;
while(<IN>) {
    if ($leidas > $nlineas) {
        close(OUT) ;
        $fragmento++ ;
        $fichero = $salida.$fragmento ;
        open(OUT,">$fichero") || die "Cannot open output file: $!\n" ;
        print OUT $cabecera ;
        $leidas = 0;
    }
    $leidas++ ;
    print OUT $_ ;
}
close(OUT) ;
Just save it with whatever name and execute it. The first line might have to be changed if you have Perl in a different place (and, if you are on Windows, you might have to invoke the script as "perl name-of-script").
One should use read.csv.ffdf from the ff package with specific parameters like this to read a big file:
library(ff)
a <- read.csv.ffdf(file="big.csv", header=TRUE, VERBOSE=TRUE, first.rows=1000000, next.rows=1000000, colClasses=NA)
Once the big file is read into an ff object, the ff object can be subset into data frames using:
a[1000:1000000,]
The rest of the code, for subsetting and saving the broken-up data frames:
totalrows = dim(a)[1]
row.size = as.integer(object.size(a[1:10000,])) / 10000  # in bytes
block.size = 200000000  # in bytes, i.e. 200 Mb
# rows.block is rows per block
rows.block = ceiling(block.size / row.size)
# nmaps is the number of chunks/maps of the big dataframe (ff); nmaps = number of maps - 1
nmaps = floor(totalrows / rows.block)
for (i in (0:nmaps)) {
  if (i == nmaps) {
    df = a[(i*rows.block + 1):totalrows, ]
  }
  else {
    df = a[(i*rows.block + 1):((i+1)*rows.block), ]
  }
  # process df or save it
  write.csv(df, paste0("M", i+1, ".csv"))
  # remove df
  rm(df)
}
Alternatively, you can first read the file into MySQL using dbWriteTable and then use the read.dbi.ffdf function from the ETLUtils package to read it back into R. Consider the function below:
read.csv.sql.ffdf <- function(file, name, overwrite = TRUE, header = TRUE, drv = MySQL(), dbname = "new",
                              username = "root", host = 'localhost', password = "1234") {
  conn = dbConnect(drv, user = username, password = password, host = host, dbname = dbname)
  dbWriteTable(conn, name, file, header = header, overwrite = overwrite)
  on.exit(dbRemoveTable(conn, name))
  command = paste0("select * from ", name)
  ret = read.dbi.ffdf(command, dbConnect.args = list(drv = drv, dbname = dbname,
                                                     username = username, password = password))
  return(ret)
}
