I am learning Apache Arrow for R and ran into the following issue.
My dataset has 85+ million rows, which makes Arrow really useful.
I do the following very simple steps.
Open the existing dataset as an Arrow dataset:
Sales_data <- open_dataset("Sales_Arrow", format = "csv")
Sales_data
The result is:
FileSystemDataset with 4 csv files
SBLOC: string
Cust-Loc: string
Cust-Item-Loc: string
SBPHYP: int64
SBINV: int64
Cust(Child)Entity: string
SBCUST: string
SBITEM: string
SBTYPE: string
Qty: double
SBPRIC: double
SBICST: double
Unit_Cost_Net: double
SBINDT: date32[day]
SASHIP: string
Entity: int64
ParentCustID: string
ParentCustName: string
Customer-ShipID-Loc: string
Pred_Entity_Loc: string
Cust(Child)-Entity: string
Item-Entity: string
Right after that I write the dataset to disk as partitioned Arrow data:
write_dataset(Sales_data, "Sales All Partitioned", partitioning = c("Entity", "SBPHYP"))
and get the following error:
Error: Invalid: In CSV column #4: Row #444155: CSV conversion error to int64: invalid value '5e+06'
I checked the value in Sales_data[444155, 4]. It is 201901, exactly the same as in several preceding and following rows.
Please help me understand what is going on and how to fix this issue.
This seems to be related to ARROW-17241: the Arrow CSV reader does not recognize integers saved in scientific notation (such as '5e+06') as int64.
The issue only shows up when writing because open_dataset is lazy: the data is not actually read until write_dataset runs.
A workaround is to pass a schema when opening the dataset, or to cast the column to float64:
# Get the automatically inferred schema
csv_schema <- Sales_data$schema
# Change col 4 to float64()
csv_schema$SBPHYP <- float64()
# Cast to float64
Sales_data <- Sales_data$cast(target_schema = csv_schema)
You should then be able to cast back to int if you require it.
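The parsing behaviour described above can be sketched outside of Arrow. This is only an illustration in Python: the built-in `int` and `float` parsers stand in for the Arrow CSV reader, and the sample values and helper names are made up for the example.

```python
# Illustration only: Python's parsers stand in for the Arrow CSV reader.
# The sample values below are assumptions, not rows from the real dataset.
raw_values = ["201901", "5e+06", "201902"]


def strict_int_fails(text):
    """Return True if a strict integer parser rejects the text."""
    try:
        int(text)
    except ValueError:
        return True
    return False


def parse_via_float(text):
    """Parse as a float first, then convert to int if the value is whole."""
    number = float(text)
    if not number.is_integer():
        raise ValueError(f"not a whole number: {text!r}")
    return int(number)


# Only the value written in scientific notation is rejected as an integer.
rejected = [v for v in raw_values if strict_int_fails(v)]

# Reading as float and casting back recovers integers for every value.
converted = [parse_via_float(v) for v in raw_values]

print(rejected)
print(converted)
```

The float detour works here because every value is a whole number; a column holding values too large for a 64-bit float to represent exactly would need a different approach.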
I have an Excel file which I saved in RDS format with saveRDS, but now when I try to view it I get "no data available in the table".
countries_all2 <- saveRDS(countries_map, "as_countries_map_2019.RDS")
summary(countries_all2)
Length Class Mode
0 NULL NULL
readRDS(countries_all2)
Error in readRDS(countries_all2) : bad 'file' argument
readRDS(countries_all2.RDS)
Error in readRDS(countries_all2.RDS) :
object 'countries_all2.RDS' not found
readRDS(as_countries_map_2019.RDS)
Error in readRDS(as_countries_map_2019.RDS) :
object 'as_countries_map_2019.RDS' not found
summary(countries_all2)
Length Class Mode
0 NULL NULL
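You should not assign the result of saveRDS to the variable countries_all2. saveRDS returns NULL (invisibly), which is why summary(countries_all2) reports a NULL object. Just call it for its side effect: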
saveRDS(countries_map, "as_countries_map_2019.RDS")
Then you need to quote the file name in readRDS. Assuming you want to read into the object countries_all2:
countries_all2 <- readRDS("as_countries_map_2019.RDS")
I have a json string which is constructed by the following code:
string path1 = "C:\\Program Files (x86)\\IMAGE\\model\\net.mat";
string path2 = "C:\\Program Files (x86)\\IMAGE\\png\\Lab.png";
string path3 = "D:\\temp\\";
string[] strs={path1 ,path2 ,path3};
string json = JsonConvert.SerializeObject(strs);
Console.WriteLine(json);
List<string> paths = JsonConvert.DeserializeObject<List<string>>(json);
Console.WriteLine(paths.Count);
There is no error when I serialize or deserialize it.
The JSON string is as follows:
"[\"C:\\\\Program Files (x86)\\\\IMAGE\\\\model\\\\net.mat\",\"C:\\\\Program Files (x86)\\\\IMAGE\\\\png\\\\Lab.png\",\"D:\\\\temp\\\\\"]"
Then I pass the JSON string to a *.exe file and deserialize it there.
The string the exe receives has changed in transit and looks as follows:
string json="[C:\\\\Program Files (x86)\\\\IMAGE\\\\model\\\\net.mat,C:\\\\Program Files (x86)\\\\IMAGE\\\\png\\\\Lab.png,D:\\\\temp\\]";
When I use the following code to deserialize it, an error appears:
List<string> paths = JsonConvert.DeserializeObject<List<string>>(json);
The error is:
Unhandled JsonReaderException:Unexpected character encountered while parsing value: C. Path '', line 1, position 2.
I wonder why this is the case. Thanks.
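The double quotes around each path were lost on the way to the exe, so the parser finds a bare C where a string should start. You have to put quotes around every path (single quotes also work with Json.NET), and the backslashes inside the JSON must stay escaped:
string json = "['C:\\\\Program Files (x86)\\\\IMAGEDL\\\\model\\\\net-e-100.mat','C:\\\\Program Files (x86)\\\\IMAGEDL\\\\Labelpng\\\\Lab.png','D:\\\\temp\\\\']";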
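The effect of losing the quotes can be reproduced in a few lines. The sketch below uses Python's standard `json` module rather than Json.NET, and the quote-stripping step is an assumption that merely imitates what appears to have happened to the string in transit.

```python
import json

# Paths taken from the question; the list is shortened for brevity.
paths = ["C:\\Program Files (x86)\\IMAGE\\model\\net.mat", "D:\\temp\\"]

# A properly serialized array survives a round trip unchanged.
encoded = json.dumps(paths)
round_trip = json.loads(encoded)

# Assumption: the transport step removes every double quote from the text.
stripped = encoded.replace('"', "")

try:
    json.loads(stripped)
    parse_error = None
except json.JSONDecodeError as exc:
    # Without quotes the parser meets a bare C where a value should start.
    parse_error = exc

print(round_trip == paths)
print(type(parse_error).__name__)
```

If the receiving side cannot be changed, encoding the JSON in a form that contains no quotes (Base64, for example) before passing it along is one way to avoid the problem altogether.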
I'm trying to concatenate two variables, Address and Payload, and then send them to a server over HTTP, but I have two problems. When I try to join the two variables with the delimiter ';' it doesn't work. Sending the data of Payload or Address doesn't work either. This is my code:
handle_rx(Gateway, #link{devaddr=DevAddr}=Link, #rxdata{port=Port, data=RxData}, RxQ) ->
    Data = base64:encode(RxData),
    Devaddr = base64:encode(DevAddr),
    TextAddr = "Device address: ",
    TextPayload = "Payload: ",
    Address = string:concat(TextAddr, Devaddr),
    Payload = string:concat(TextPayload, Data),
    Json = string:join([Address, Payload], "; "),
    file:write_file("/tmp/foo.txt", io_lib:fwrite("~s.\n", [Json])),
    inets:start(),
    ssl:start(),
    httpc:request(post, {"http://192.168.0.121/apiv1/lorapacket/rx", [], "application/x-www-form-urlencoded", Address}, [], []),
    ok;
handle_rx(_Gateway, _Link, RxData, _RxQ) ->
    {error, {unexpected_data, RxData}}.
I have no errors that I can show you. When I write Address or Payload individually to the file it works but sending doesn't work...
Thank you for your help!
When i try to concat the 2 variables with a delimiter ';' it doesn't work.
5> string:join(["hello", <<"world">>], ";").
[104,101,108,108,111,59|<<"world">>]
6> string:join(["hello", "world"], ";").
"hello;world"
base64:encode() returns a binary, yet string:join() requires string arguments. You can do this:
7> string:join(["hello", binary_to_list(<<"world">>)], ";").
"hello;world"
Response to comment:
In Erlang the string "abc" is equivalent to the list [97,98,99]. However, the binary syntax <<"abc">> is not equivalent to <<[97,98,99]>>; rather, <<"abc">> is special shorthand notation for the binary <<97, 98, 99>>.
Therefore, if you write:
Address = [97,98,99].
then the code:
Bin = <<Address>>.
after variable substitution becomes:
Bin = <<[97,98,99]>>.
and that isn't legal binary syntax.
If you need to convert a string/list contained in a variable, like Address, to a binary, use list_to_binary(Address), not <<Address>>.
In your code here:
Json = string:join([binary_to_list(<<Address>>),
binary_to_list(<<Payload>>)],
";").
Address and Payload were previously assigned the return value of string:concat(), which returns a string, so there is no reason to attempt to convert Address to a binary with <<Address>> and then immediately convert the binary back to a string with binary_to_list(). Instead, you would just write:
Json = string:join([Address, Payload], ";")
The problem with your original code is that you called string:concat() with a string as the first argument and a binary as the second argument--yet string:concat() takes two string arguments. You can use binary_to_list() to convert a binary to the string that you need for the second argument.
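For readers who know Python better than Erlang, the same kind of type mismatch can be sketched there. This is only an analogy, not Erlang behaviour: Python raises a TypeError when text and bytes are mixed, whereas Erlang quietly builds an improper list. The variable names are made up for the example.

```python
# Analogy only: Python str and bytes stand in for Erlang strings and binaries.
text_part = "hello"
binary_part = b"world"

try:
    # Joining text with raw bytes is rejected outright in Python.
    ";".join([text_part, binary_part])
    join_error = None
except TypeError as exc:
    join_error = exc

# Converting the bytes to text first (the counterpart of binary_to_list)
# makes the join succeed.
joined = ";".join([text_part, binary_part.decode("utf-8")])

print(joined)
```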
Sorry I'm new to Erlang
As with any language, you have to study the basics and write numerous toy examples before you can start writing code that actually does something.
You don't have to concatenate strings at all. You can build an iolist instead, which is one of the best things in Erlang:
1> RxData = "Hello World!", DevAddr = "Earth",
1> Data = base64:encode(RxData), Devaddr = base64:encode(DevAddr),
1> TextAddr="Device address", TextPayload="Payload",
1> Json=["{'", TextAddr, "': '", Devaddr, "', '", TextPayload, "': '", Data, "'}"].
["{'","Device address","': '",<<"RWFydGg=">>,"', '",
"Payload","': '",<<"SGVsbG8gV29ybGQh">>,"'}"]
2> file:write_file("/tmp/foo.txt", Json).
ok
3> file:read_file("/tmp/foo.txt").
{ok,<<"{'Device address': 'RWFydGg=', 'Payload': 'SGVsbG8gV29ybGQh'}">>}
I am trying to save these two objects into a CSV file. The code runs inside a function, which is why I call write.table with append = TRUE.
dt <- "Error in .verify.JDBC.result(r, \"Unable to retrieve JDBC result set for \", : \n Unable to retrieve JDBC result set for SELECT * FROM rAXA (ORA-00942: table or view does not exist\n)\n"
er <- "error"
er_file <- cbind(er,dt)
write.table(er_file, file = "E:\\Hama_Hex\\Project\\Predictive\\log1.csv",sep=",",
col.names = FALSE, append=TRUE)
But when I execute the above script, the data is not saved properly in the CSV file. The special character \n in the dt object causes the problem: it makes the text continue on a new line, while I want to store the entire message in one cell. Also, since I am saving in CSV format and the value contains commas, it gets split across several cells.
You just need to escape the newline characters:
write.table(gsub("\\n", "\\\\n", er_file),
file = "E:\\Hama_Hex\\Project\\Predictive\\log1.csv",sep=",",
col.names = FALSE, append=TRUE)
This replaces every newline character with the literal two-character sequence \n, so the whole message stays on one line. Note that the backslashes have to be escaped in both the pattern and the replacement passed to gsub.
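The same idea, escaping newlines before the row is written, can be sketched in Python with the standard `csv` module. The shortened error message and the in-memory buffer are assumptions made to keep the example self-contained; they are not the original log file.

```python
import csv
import io

# Shortened stand-in for the JDBC error text; it contains a comma and newlines.
message = "Error in f(r, x): \n table or view does not exist\n"
row = ["error", message]

# Replace each real newline with the two characters backslash and n.
escaped_row = [field.replace("\n", "\\n") for field in row]

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
csv.writer(buffer).writerow(escaped_row)
csv_text = buffer.getvalue()

# Reading the text back yields a single record with two cells.
parsed = list(csv.reader(io.StringIO(csv_text)))

print(csv_text)
```

The writer also wraps the message in quotes because it contains a comma, which is what keeps it in a single cell.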