Invalid multibyte string when writing a table in R

Example of what I'm trying to do:
columnA <- c(1:10)
columnB <- c("A","A","B","B","B","B","C","D","D","D")
df <- data.frame(columnA,columnB)
colBtable <- sort(table(df$columnB),decreasing=T)
write.table(colBtable,"colB.csv",col.names = FALSE)
This works, and does what I want it to do (i.e., write a file that says B 4, D 3, A 2, C 1).
However, with my (rather large) data set, I get the error:
Error in data.frame(x) : invalid multibyte string 360
There are several "invalid multibyte string" type errors on Stack Overflow, and I've tried some of the solutions. These also give errors, such as:
iconv(enc2utf8(df$columnB),sub="byte")
argument is not a character vector
or
tolower(df$columnB)
invalid multibyte string 1880
I suspect this is because there are special characters in my data. Any suggestions on how to resolve these errors?
Alternatively, any suggestions on other ways to export this data? I need to share it with colleagues who may not be using R (so a txt or csv file would be ideal).
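One possible approach, sketched under assumptions: the "argument is not a character vector" error suggests columnB is a factor, since enc2utf8() only accepts character vectors, and the multibyte errors suggest some values contain bytes that are not valid UTF-8. The sample values, the Latin-1 source encoding, and the output path below are hypothetical; adjust `from` to whatever encoding the data really uses.

```r
# Hypothetical data: "\xe9" is a Latin-1 byte that is not valid UTF-8 on its own.
columnB <- c("A", "caf\xe9", "caf\xe9", "B")

# Work on a character vector; enc2utf8() rejects factors.
columnB <- as.character(columnB)

# Flag the elements that are not valid UTF-8.
bad <- !validUTF8(columnB)

# Assumption: the stray bytes are Latin-1. Change `from` if the data differs.
columnB[bad] <- iconv(columnB[bad], from = "latin1", to = "UTF-8")

# Same tabulation as in the question, now on clean strings.
colBtable <- sort(table(columnB), decreasing = TRUE)

# Write a real CSV (comma-separated, UTF-8) that non-R users can open.
out_file <- file.path(tempdir(), "colB.csv")
write.csv(as.data.frame(colBtable), out_file,
          row.names = FALSE, fileEncoding = "UTF-8")
```

Converting only the flagged elements leaves strings that are already valid UTF-8 untouched.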

Related

Reordering columns in large data frame

Passing a long vector of characters to reorder a data.frame gives me the errors seen below.
I tried to order the columns manually by using a long string (7152 char). As a workaround, I tried to save the same string in a text file and read in the text file. None of it worked.
df<-all_reps[,c(1,2,827,828,3,4,829,830,5,6,831,832,7,8,833,834,9,10,835,836,11,12,837,838,13,14,839,840,15,16,841,842,17,18,843,844,19,20,845,846,21,22,847,848,23,24,849,850,25,26,851,852,27,28,853,854,29,30,855,856,31,32,857,858,33,34,859,860,35,36,861,862,37,38,863,864,39,40,865,866,41,42,867,868,43,44,869,870,45,46,871,872,47,48,873,874,49,50,875,876,51,52,877,878,53,54,879,880,55,56,881,882,57,58,883,884,59,60,885,886,61,62,887,888,63,64,889,890,65,66,891,892,67,68,893,894,69,70,895,896,71,72,897,898,73,74,899,900,75,76,901,902,77,78,903,904,79,80,905,906,81,82,907,908,83,84,909,910,85,86,911,912,87,88,913,914,89,90,915,916,91,92,917,918,93,94,919,920,95,96,921,922,97,98,923,924,99,100,925,926,101,102,927,928,103,104,929,930,105,106,931,932,107,108,933,934,109,110,935,936,111,112,937,938,113,114,939,940,115,116,941,942,117,118,943,944,119,120,945,946,121,122,947,948,123,124,949,950,125,126,951,952,127,128,953,954,129,130,955,956,131,132,957,958,133,134,959,960,135,136,961,962,137,138,963,964,139,140,965,966,141,142,967,968,143,144,969,970,145,146,971,972,147,148,973,974,149,150,975,976,151,152,977,978,153,154,979,980,155,156,981,982,157,158,983,984,159,160,985,986,161,162,987,988,163,164,989,990,165,166,991,992,167,168,993,994,169,170,995,996,171,172,997,998,173,174,999,1000,175,176,1001,1002,177,178,1003,1004,179,180,1005,1006,181,182,1007,1008,183,184,1009,1010,185,186,1011,1012,187,188,1013,1014,189,190,1015,1016,191,192,1017,1018,193,194,1019,1020,195,196,1021,1022,197,198,1023,1024,199,200,1025,1026,201,202,1027,1028,203,204,1029,1030,205,206,1031,1032,207,208,1033,1034,209,210,1035,1036,211,212,1037,1038,213,214,1039,1040,215,216,1041,1042,217,218,1043,1044,219,220,1045,1046,221,222,1047,1048,223,224,1049,1050,225,226,1051,1052,227,228,1053,1054,229,230,1055,1056,231,232,1057,1058,233,234,1059,1060,235,236,1061,1062,237,238,1063,1064,239,240,1065,1066,241,242,1067,1068,243,244,1069,1070,245,246,1071,1072,247,248,1073,1074,249,250,1075,1076,251,252,1077,10
78,253,254,1079,1080,255,256,1081,1082,257,258,1083,1084,259,260,1085,1086,261,262,1087,1088,263,264,1089,1090,265,266,1091,1092,267,268,1093,1094,269,270,1095,1096,271,272,1097,1098,273,274,1099,1100,275,276,1101,1102,277,278,1103,1104,279,280,1105,1106,281,282,1107,1108,283,284,1109,1110,285,286,1111,1112,287,288,1113,1114,289,290,1115,1116,291,292,1117,1118,293,294,1119,1120,295,296,1121,1122,297,298,1123,1124,299,300,1125,1126,301,302,1127,1128,303,304,1129,1130,305,306,1131,1132,307,308,1133,1134,309,310,1135,1136,311,312,1137,1138,313,314,1139,1140,315,316,1141,1142,317,318,1143,1144,319,320,1145,1146,321,322,1147,1148,323,324,1149,1150,325,326,1151,1152,327,328,1153,1154,329,330,1155,1156,331,332,1157,1158,333,334,1159,1160,335,336,1161,1162,337,338,1163,1164,339,340,1165,1166,341,342,1167,1168,343,344,1169,1170,345,346,1171,1172,347,348,1173,1174,349,350,1175,1176,351,352,1177,1178,353,354,1179,1180,355,356,1181,1182,357,358,1183,1184,359,360,1185,1186,361,362,1187,1188,363,364,1189,1190,365,366,1191,1192,367,368,1193,1194,369,370,1195,1196,371,372,1197,1198,373,374,1199,1200,375,376,1201,1202,377,378,1203,1204,379,380,1205,1206,381,382,1207,1208,383,384,1209,1210,385,386,1211,1212,387,388,1213,1214,389,390,1215,1216,391,392,1217,1218,393,394,1219,1220,395,396,1221,1222,397,398,1223,1224,399,400,1225,1226,401,402,1227,1228,403,404,1229,1230,405,406,1231,1232,407,408,1233,1234,409,410,1235,1236,411,412,1237,1238,413,414,1239,1240,415,416,1241,1242,417,418,1243,1244,419,420,1245,1246,421,422,1247,1248,423,424,1249,1250,425,426,1251,1252,427,428,1253,1254,429,430,1255,1256,431,432,1257,1258,433,434,1259,1260,435,436,1261,1262,437,438,1263,1264,439,440,1265,1266,441,442,1267,1268,443,444,1269,1270,445,446,1271,1272,447,448,1273,1274,449,450,1275,1276,451,452,1277,1278,453,454,1279,1280,455,456,1281,1282,457,458,1283,1284,459,460,1285,1286,461,462,1287,1288,463,464,1289,1290,465,466,1291,1292,467,468,1293,1294,469,470,1295,1296,471,472,1297,1298,473,474,1299,1300
,475,476,1301,1302,477,478,1303,1304,479,480,1305,1306,481,482,1307,1308,483,484,1309,1310,485,486,1311,1312,487,488,1313,1314,489,490,1315,1316,491,492,1317,1318,493,494,1319,1320,495,496,1321,1322,497,498,1323,1324,499,500,1325,1326,501,502,1327,1328,503,504,1329,1330,505,506,1331,1332,507,508,1333,1334,509,510,1335,1336,511,512,1337,1338,513,514,1339,1340,515,516,1341,1342,517,518,1343,1344,519,520,1345,1346,521,522,1347,1348,523,524,1349,1350,525,526,1351,1352,527,528,1353,1354,529,530,1355,1356,531,532,1357,1358,533,534,1359,1360,535,536,1361,1362,537,538,1363,1364,539,540,1365,1366,541,542,1367,1368,543,544,1369,1370,545,546,1371,1372,547,548,1373,1374,549,550,1375,1376,551,552,1377,1378,553,554,1379,1380,555,556,1381,1382,557,558,1383,1384,559,560,1385,1386,561,562,1387,1388,563,564,1389,1390,565,566,1391,1392,567,568,1393,1394,569,570,1395,1396,571,572,1397,1398,573,574,1399,1400,575,576,1401,1402,577,578,1403,1404,579,580,1405,1406,581,582,1407,1408,583,584,1409,1410,585,586,1411,1412,587,588,1413,1414,589,590,1415,1416,591,592,1417,1418,593,594,1419,1420,595,596,1421,1422,597,598,1423,1424,599,600,1425,1426,601,602,1427,1428,603,604,1429,1430,605,606,1431,1432,607,608,1433,1434,609,610,1435,1436,611,612,1437,1438,613,614,1439,1440,615,616,1441,1442,617,618,1443,1444,619,620,1445,1446,621,622,1447,1448,623,624,1449,1450,625,626,1451,1452,627,628,1453,1454,629,630,1455,1456,631,632,1457,1458,633,634,1459,1460,635,636,1461,1462,637,638,1463,1464,639,640,1465,1466,641,642,1467,1468,643,644,1469,1470,645,646,1471,1472,647,648,1473,1474,649,650,1475,1476,651,652,1477,1478,653,654,1479,1480,655,656,1481,1482,657,658,1483,1484,659,660,1485,1486,661,662,1487,1488,663,664,1489,1490,665,666,1491,1492,667,668,1493,1494,669,670,1495,1496,671,672,1497,1498,673,674,1499,1500,675,676,1501,1502,677,678,1503,1504,679,680,1505,1506,681,682,1507,1508,683,684,1509,1510,685,686,1511,1512,687,688,1513,1514,689,690,1515,1516,691,692,1517,1518,693,694,1519,1520,695,696,1521,1522,6
97,698,1523,1524,699,700,1525,1526,701,702,1527,1528,703,704,1529,1530,705,706,1531,1532,707,708,1533,1534,709,710,1535,1536,711,712,1537,1538,713,714,1539,1540,715,716,1541,1542,717,718,1543,1544,719,720,1545,1546,721,722,1547,1548,723,724,1549,1550,725,726,1551,1552,727,728,1553,1554,729,730,1555,1556,731,732,1557,1558,733,734,1559,1560,735,736,1561,1562,737,738,1563,1564,739,740,1565,1566,741,742,1567,1568,743,744,1569,1570,745,746,1571,1572,747,748,1573,1574,749,750,1575,1576,751,752,1577,1578,753,754,1579,1580,755,756,1581,1582,757,758,1583,1584,759,760,1585,1586,761,762,1587,1588,763,764,1589,1590,765,766,1591,1592,767,768,1593,1594,769,770,1595,1596,771,772,1597,1598,773,774,1599,1600,775,776,1601,1602,777,778,1603,1604,779,780,1605,1606,781,782,1607,1608,783,784,1609,1610,785,786,1611,1612,787,788,1613,1614,789,790,1615,1616,791,792,1617,1618,793,794,1619,1620,795,796,1621,1622,797,798,1623,1624,799,800,1625,1626,801,802,1627,1628,803,804,1629,1630,805,806,1631,1632,807,808,1633,1634,809,810,1635,1636,811,812,1637,1638,813,814,1639,1640,815,816,1641,1642,817,818,1643,1644,819,820,1645,1646,821,822,1647,1648,823,824,1649,1650,825,826,1651,1652)]
Error: unexpected symbol in:
"282,457,458,1283,1284,459,460,1285,1286,461,462,1287,1288,463,464,1289,1290,465,466,1291,1292,467,468,1293,1294,469,470,1295,1296,471,472,1297,1298,473,474,1299,1300,475,476,1301,1302,477,478,
test<-read.table('order.txt',stringsAsFactors = FALSE)
test<-as.character(test)
df<-all_reps[,c(test)]
Error in all_reps[, c(test)] : subscript out of bounds
Is the problem that the column vector consists of 7152 chars?
A better option would be to read the order with scan() and use the result to rearrange the columns:
test <- scan('order.txt', sep=",", quiet = TRUE)
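The index vector in the question follows a regular pattern (two columns from the first half, then the two matching columns from the second half), so it can also be generated instead of typed or read from a file. This avoids the very long input line, which is a likely cause of the "unexpected symbol" error: the R introduction manual notes that console command lines are limited to about 4095 bytes. The sketch below assumes the pattern holds for all 1652 columns; the stand-in data frame is hypothetical.

```r
# Hypothetical stand-in for the real data: 1652 columns, 2 rows.
all_reps <- as.data.frame(matrix(0, nrow = 2, ncol = 1652))

# First column of each pair in the first half: 1, 3, ..., 825.
first <- seq(1, 825, by = 2)

# Stack the four indices of each group as rows, then read them off
# column by column: 1, 2, 827, 828, 3, 4, 829, 830, ...
idx <- as.vector(rbind(first, first + 1, first + 826, first + 827))

df <- all_reps[, idx]
```

rbind() followed by as.vector() interleaves the four sequences because R stores matrices column by column.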

R: invalid multibyte string 1 (with spread)

I'm trying to spread two columns but R is returning
Error in make.names(x) : invalid multibyte string 1
There are plenty of questions here and elsewhere about invalid multibyte strings, but they are all for reading files into R; the issue seems to always be about encoding. Here, though, I already have my file read into R and it is only when spreading that I run into the issue.
I cannot reproduce the problem but here is my code:
df <- spread(df, Var1, Var2)
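A possible way to narrow this down, as a sketch: spread() turns the values of the key column into column names, so a value in Var1 that is not valid in the session's encoding could plausibly trigger the make.names() error even though the file read in without complaint. The values below are hypothetical, and the Latin-1 source encoding is an assumption.

```r
# Hypothetical key column; "\xef" is a Latin-1 byte that is invalid as UTF-8.
Var1 <- c("height", "na\xefve", "weight")

# Positions of the values that are not valid UTF-8.
bad_rows <- which(!validUTF8(Var1))

# Assumption: the offending values are Latin-1; re-encode only those.
Var1[bad_rows] <- iconv(Var1[bad_rows], from = "latin1", to = "UTF-8")

# make.names() is the call that failed in the question.
new_names <- make.names(Var1)
```

If bad_rows comes back empty on the real data, encoding is probably not the cause and the problem lies elsewhere.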

Error in coercing R data.frame to a nz.data.frame

One of the columns in my R data frame contains a "," (comma). Because of it, converting the data frame into a Netezza data frame throws the error below:
Error in nzQuery(sqlCommandUpload) : HY008 51 Operation canceled
01000 1 Unable to write nzlog/bad files
01000 1 Unable to write nzlog/bad files
HY000 46 ERROR: External Table : count of bad input rows reached maxerrors limit
How can I achieve this without making any changes to data?
With a dataframe like this, everything works fine:
I get the error when the dataframe is like this:
library(nzr)
library(forecast)
library (reshape2)
library(doBy)
nzDisconnect()
nzConnectDSN('DSNInfo', force=FALSE , verbose=TRUE)
#read file
test2<-read.csv("test_df.csv", stringsAsFactors = F)
# convert to nz dataframe, no error
#nzdf.test2<-as.nz.data.frame(test2)
nzdf.d<-as.nz.data.frame(d)
# copy
#test<-test2
testd<-d
#replace one of the values containing a ","
#test$Category[1]<-"a,b"
testd$Category[1]<-"Bed, Bath & Towels"
# converting to nz gives error
#nzdf.test<-as.nz.data.frame(test)
nzdf.testd<-as.nz.data.frame(testd)
# remove the "," from the copy that failed
testd$Category <- gsub(",", "", testd$Category, fixed = TRUE)
# converting to an nz data frame now gives no error
nzdf.testd <- as.nz.data.frame(testd)
Did you check whether you have nulls (NAs) in your data? I faced the same problem, and when I checked the Netezza-R documentation I found that you cannot write nulls into a Netezza table from another system. The documentation mentions using the setOutputNull function in such cases.
A workaround is to replace the nulls with the string "NULL" in your R data frame. Mind you, this turns the numeric columns into varchar, but fortunately "NULL" becomes null in your Netezza table automatically. The only extra effort is that you have to convert the columns back to numeric later.
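The replacement step can be sketched in base R as follows. The data frame is hypothetical, and the claim that Netezza turns "NULL" back into a real null is the answerer's report, not something this sketch checks.

```r
# Hypothetical data frame with missing values in both column types.
d <- data.frame(Category = c("Bath", NA),
                Sales = c(10.5, NA),
                stringsAsFactors = FALSE)

# Convert every column to character and replace NA with the string "NULL".
d_out <- d
d_out[] <- lapply(d_out, function(col) {
  col <- as.character(col)
  col[is.na(col)] <- "NULL"
  col
})
```

Assigning into d_out[] keeps the data frame structure, whereas sapply() would return a matrix.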
Hope this helps

R convert exponent (read by R as string) into simple number

I read a CSV file into R with the following command:
myfile <- read.csv('C:/Users/myfilepath.csv', sep=',', header = F)
With this I get a nice data frame looking a little like this:
year / Variable1 / Variable2 / etc.
1958 / 1.42547014192473E-08 / 3.06399766669684E-10 / etc.
1959 / 2.05022315791225E-09 / 8.80152568089836E-08 / etc.
1960 / etc. .... ....
However, R seems to treat the letter E for exponents as string. So I need to convert these first into a simple number before I can analyze the data. The data set has 50 rows and 12 columns.
I tried as.numeric but get the error message
Error: (list) object cannot be coerced to type 'double'
Any ideas?
You can format the DF using:
format(myfile,scientific=FALSE)
You can use options(scipen = 100) before you read the file.
If you see trailing zeros, I suggest checking the CSV file before importing it.
The answers by Soto and Alistair work if the cells in the imported CSV are formatted as 'scientific'; otherwise they don't. Thanks guys!
Code used:
mydata<- read.csv('C:/Users/mydata.csv', sep=',', na.strings=c("", "NA"), header = F)
mydata <- sapply(mydata, as.numeric)
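A variant that keeps the result as a data frame, as a sketch: as.numeric() fails on a whole data frame (hence the "(list) object cannot be coerced" error) but works column by column. The inline sample reuses the values shown in the question; reading the columns as character is an assumption made here to mimic the problem.

```r
# Sample rows from the question, read as text to mimic the string columns.
csv_text <- c("1958,1.42547014192473E-08,3.06399766669684E-10",
              "1959,2.05022315791225E-09,8.80152568089836E-08")
myfile <- read.csv(text = csv_text, header = FALSE, colClasses = "character")

# Convert each column; as.character() first guards against factor columns,
# where as.numeric() alone would return the internal level codes.
myfile[] <- lapply(myfile, function(col) as.numeric(as.character(col)))

# Display without exponents if wanted (this returns character strings).
shown <- format(myfile, scientific = FALSE)
```

The stored values stay numeric; only the output of format() is text.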

Error while trying to read .data file in R

I am trying to read the car.data file at https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data using read.table as shown below. I tried various solutions posted earlier, but none of them worked. I am using Windows 8 and R version 3.2.3. I can save the file as a .txt file and then read it, but I cannot read the .data file with read.table, either directly from the URL or after saving it.
t <- read.table(
"https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data",
fileEncoding="UTF-16",
sep = ",",
header=F
)
Here are the warnings I get; the result is an empty data frame with a single cell containing "?":
Warning messages:
1: In read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data", : invalid input found on input connection 'https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'
2: In read.table("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data", :
incomplete final line found by readTableHeader on 'https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'
Please help!
The data at that link is in comma-separated format, so read it as CSV. One way is to fetch the text with the RCurl package and pass it to read.csv. The file has no header row, so set header = FALSE:
library(RCurl)
x <- getURL("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data")
y <- read.csv(text = x, header = FALSE)
Now y contains your data.
Thanks to cory, here is the solution: just use read.csv directly (with header = FALSE, because the file has no header row):
x <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data", header = FALSE)
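As a further sketch, read.table itself can parse this format once fileEncoding = "UTF-16" is dropped; the file appears to be plain ASCII, so that setting is the likely source of the "invalid input" warning. To stay runnable offline, the example parses a few inline sample rows instead of the URL. The column names are an assumption taken from the dataset's description file.

```r
# A few rows in the same layout as car.data (sample values for illustration).
sample_rows <- c("vhigh,vhigh,2,2,small,low,unacc",
                 "vhigh,vhigh,2,2,small,med,unacc",
                 "low,low,5more,more,big,high,vgood")

# Assumed column names; the file itself has no header row.
car_cols <- c("buying", "maint", "doors", "persons",
              "lug_boot", "safety", "class")

cars <- read.table(text = sample_rows, sep = ",", header = FALSE,
                   col.names = car_cols, stringsAsFactors = FALSE)
```

To read the real file, replace the text argument with the URL as the first argument and leave the other arguments unchanged.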
