R: invalid multibyte string 1 (with spread)

I'm trying to spread two columns but R is returning
Error in make.names(x) : invalid multibyte string 1
There are plenty of questions here and elsewhere about invalid multibyte strings, but they are all for reading files into R; the issue seems to always be about encoding. Here, though, I already have my file read into R and it is only when spreading that I run into the issue.
I cannot reproduce the problem but here is my code:
df <- spread(df, Var1, Var2)
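The error is raised in make.names(), which suggests that one of the values being turned into a column name is not valid in the session's encoding. One possible first step is to look for such values in the key column. The sketch below is only an assumption about the cause: `var1` is a hypothetical stand-in for `df$Var1`, and Latin-1 is a guessed source encoding.

```r
# Hypothetical stand-in for df$Var1; "\xe9" is "é" in Latin-1 but is not valid UTF-8
var1 <- c("alpha", "beta", "caf\xe9")

# Positions of the values that are not valid UTF-8
bad <- which(!validUTF8(var1))

# Assumption: the offending values were originally Latin-1, so re-encode only those
fixed <- var1
fixed[bad] <- iconv(var1[bad], from = "latin1", to = "UTF-8")

# Every value should now be valid UTF-8
all(validUTF8(fixed))
```

If the check finds nothing, the encoding is probably not the culprit and the guess above does not apply.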

Related

Reordering columns in large data frame

Passing a long vector of characters to reorder a data.frame gives me the errors shown below.
I tried to order the columns manually by using a long string (7152 char). As a workaround, I tried to save the same string in a text file and read in the text file. None of it worked.
df<-all_reps[,c(1,2,827,828,3,4,829,830,5,6,831,832,7,8,833,834,9,10,835,836,11,12,837,838,13,14,839,840,15,16,841,842,17,18,843,844,19,20,845,846,21,22,847,848,23,24,849,850,25,26,851,852,27,28,853,854,29,30,855,856,31,32,857,858,33,34,859,860,35,36,861,862,37,38,863,864,39,40,865,866,41,42,867,868,43,44,869,870,45,46,871,872,47,48,873,874,49,50,875,876,51,52,877,878,53,54,879,880,55,56,881,882,57,58,883,884,59,60,885,886,61,62,887,888,63,64,889,890,65,66,891,892,67,68,893,894,69,70,895,896,71,72,897,898,73,74,899,900,75,76,901,902,77,78,903,904,79,80,905,906,81,82,907,908,83,84,909,910,85,86,911,912,87,88,913,914,89,90,915,916,91,92,917,918,93,94,919,920,95,96,921,922,97,98,923,924,99,100,925,926,101,102,927,928,103,104,929,930,105,106,931,932,107,108,933,934,109,110,935,936,111,112,937,938,113,114,939,940,115,116,941,942,117,118,943,944,119,120,945,946,121,122,947,948,123,124,949,950,125,126,951,952,127,128,953,954,129,130,955,956,131,132,957,958,133,134,959,960,135,136,961,962,137,138,963,964,139,140,965,966,141,142,967,968,143,144,969,970,145,146,971,972,147,148,973,974,149,150,975,976,151,152,977,978,153,154,979,980,155,156,981,982,157,158,983,984,159,160,985,986,161,162,987,988,163,164,989,990,165,166,991,992,167,168,993,994,169,170,995,996,171,172,997,998,173,174,999,1000,175,176,1001,1002,177,178,1003,1004,179,180,1005,1006,181,182,1007,1008,183,184,1009,1010,185,186,1011,1012,187,188,1013,1014,189,190,1015,1016,191,192,1017,1018,193,194,1019,1020,195,196,1021,1022,197,198,1023,1024,199,200,1025,1026,201,202,1027,1028,203,204,1029,1030,205,206,1031,1032,207,208,1033,1034,209,210,1035,1036,211,212,1037,1038,213,214,1039,1040,215,216,1041,1042,217,218,1043,1044,219,220,1045,1046,221,222,1047,1048,223,224,1049,1050,225,226,1051,1052,227,228,1053,1054,229,230,1055,1056,231,232,1057,1058,233,234,1059,1060,235,236,1061,1062,237,238,1063,1064,239,240,1065,1066,241,242,1067,1068,243,244,1069,1070,245,246,1071,1072,247,248,1073,1074,249,250,1075,1076,251,252,1077,10
78,253,254,1079,1080,255,256,1081,1082,257,258,1083,1084,259,260,1085,1086,261,262,1087,1088,263,264,1089,1090,265,266,1091,1092,267,268,1093,1094,269,270,1095,1096,271,272,1097,1098,273,274,1099,1100,275,276,1101,1102,277,278,1103,1104,279,280,1105,1106,281,282,1107,1108,283,284,1109,1110,285,286,1111,1112,287,288,1113,1114,289,290,1115,1116,291,292,1117,1118,293,294,1119,1120,295,296,1121,1122,297,298,1123,1124,299,300,1125,1126,301,302,1127,1128,303,304,1129,1130,305,306,1131,1132,307,308,1133,1134,309,310,1135,1136,311,312,1137,1138,313,314,1139,1140,315,316,1141,1142,317,318,1143,1144,319,320,1145,1146,321,322,1147,1148,323,324,1149,1150,325,326,1151,1152,327,328,1153,1154,329,330,1155,1156,331,332,1157,1158,333,334,1159,1160,335,336,1161,1162,337,338,1163,1164,339,340,1165,1166,341,342,1167,1168,343,344,1169,1170,345,346,1171,1172,347,348,1173,1174,349,350,1175,1176,351,352,1177,1178,353,354,1179,1180,355,356,1181,1182,357,358,1183,1184,359,360,1185,1186,361,362,1187,1188,363,364,1189,1190,365,366,1191,1192,367,368,1193,1194,369,370,1195,1196,371,372,1197,1198,373,374,1199,1200,375,376,1201,1202,377,378,1203,1204,379,380,1205,1206,381,382,1207,1208,383,384,1209,1210,385,386,1211,1212,387,388,1213,1214,389,390,1215,1216,391,392,1217,1218,393,394,1219,1220,395,396,1221,1222,397,398,1223,1224,399,400,1225,1226,401,402,1227,1228,403,404,1229,1230,405,406,1231,1232,407,408,1233,1234,409,410,1235,1236,411,412,1237,1238,413,414,1239,1240,415,416,1241,1242,417,418,1243,1244,419,420,1245,1246,421,422,1247,1248,423,424,1249,1250,425,426,1251,1252,427,428,1253,1254,429,430,1255,1256,431,432,1257,1258,433,434,1259,1260,435,436,1261,1262,437,438,1263,1264,439,440,1265,1266,441,442,1267,1268,443,444,1269,1270,445,446,1271,1272,447,448,1273,1274,449,450,1275,1276,451,452,1277,1278,453,454,1279,1280,455,456,1281,1282,457,458,1283,1284,459,460,1285,1286,461,462,1287,1288,463,464,1289,1290,465,466,1291,1292,467,468,1293,1294,469,470,1295,1296,471,472,1297,1298,473,474,1299,1300
,475,476,1301,1302,477,478,1303,1304,479,480,1305,1306,481,482,1307,1308,483,484,1309,1310,485,486,1311,1312,487,488,1313,1314,489,490,1315,1316,491,492,1317,1318,493,494,1319,1320,495,496,1321,1322,497,498,1323,1324,499,500,1325,1326,501,502,1327,1328,503,504,1329,1330,505,506,1331,1332,507,508,1333,1334,509,510,1335,1336,511,512,1337,1338,513,514,1339,1340,515,516,1341,1342,517,518,1343,1344,519,520,1345,1346,521,522,1347,1348,523,524,1349,1350,525,526,1351,1352,527,528,1353,1354,529,530,1355,1356,531,532,1357,1358,533,534,1359,1360,535,536,1361,1362,537,538,1363,1364,539,540,1365,1366,541,542,1367,1368,543,544,1369,1370,545,546,1371,1372,547,548,1373,1374,549,550,1375,1376,551,552,1377,1378,553,554,1379,1380,555,556,1381,1382,557,558,1383,1384,559,560,1385,1386,561,562,1387,1388,563,564,1389,1390,565,566,1391,1392,567,568,1393,1394,569,570,1395,1396,571,572,1397,1398,573,574,1399,1400,575,576,1401,1402,577,578,1403,1404,579,580,1405,1406,581,582,1407,1408,583,584,1409,1410,585,586,1411,1412,587,588,1413,1414,589,590,1415,1416,591,592,1417,1418,593,594,1419,1420,595,596,1421,1422,597,598,1423,1424,599,600,1425,1426,601,602,1427,1428,603,604,1429,1430,605,606,1431,1432,607,608,1433,1434,609,610,1435,1436,611,612,1437,1438,613,614,1439,1440,615,616,1441,1442,617,618,1443,1444,619,620,1445,1446,621,622,1447,1448,623,624,1449,1450,625,626,1451,1452,627,628,1453,1454,629,630,1455,1456,631,632,1457,1458,633,634,1459,1460,635,636,1461,1462,637,638,1463,1464,639,640,1465,1466,641,642,1467,1468,643,644,1469,1470,645,646,1471,1472,647,648,1473,1474,649,650,1475,1476,651,652,1477,1478,653,654,1479,1480,655,656,1481,1482,657,658,1483,1484,659,660,1485,1486,661,662,1487,1488,663,664,1489,1490,665,666,1491,1492,667,668,1493,1494,669,670,1495,1496,671,672,1497,1498,673,674,1499,1500,675,676,1501,1502,677,678,1503,1504,679,680,1505,1506,681,682,1507,1508,683,684,1509,1510,685,686,1511,1512,687,688,1513,1514,689,690,1515,1516,691,692,1517,1518,693,694,1519,1520,695,696,1521,1522,6
97,698,1523,1524,699,700,1525,1526,701,702,1527,1528,703,704,1529,1530,705,706,1531,1532,707,708,1533,1534,709,710,1535,1536,711,712,1537,1538,713,714,1539,1540,715,716,1541,1542,717,718,1543,1544,719,720,1545,1546,721,722,1547,1548,723,724,1549,1550,725,726,1551,1552,727,728,1553,1554,729,730,1555,1556,731,732,1557,1558,733,734,1559,1560,735,736,1561,1562,737,738,1563,1564,739,740,1565,1566,741,742,1567,1568,743,744,1569,1570,745,746,1571,1572,747,748,1573,1574,749,750,1575,1576,751,752,1577,1578,753,754,1579,1580,755,756,1581,1582,757,758,1583,1584,759,760,1585,1586,761,762,1587,1588,763,764,1589,1590,765,766,1591,1592,767,768,1593,1594,769,770,1595,1596,771,772,1597,1598,773,774,1599,1600,775,776,1601,1602,777,778,1603,1604,779,780,1605,1606,781,782,1607,1608,783,784,1609,1610,785,786,1611,1612,787,788,1613,1614,789,790,1615,1616,791,792,1617,1618,793,794,1619,1620,795,796,1621,1622,797,798,1623,1624,799,800,1625,1626,801,802,1627,1628,803,804,1629,1630,805,806,1631,1632,807,808,1633,1634,809,810,1635,1636,811,812,1637,1638,813,814,1639,1640,815,816,1641,1642,817,818,1643,1644,819,820,1645,1646,821,822,1647,1648,823,824,1649,1650,825,826,1651,1652)]
Error: unexpected symbol in:
"282,457,458,1283,1284,459,460,1285,1286,461,462,1287,1288,463,464,1289,1290,465,466,1291,1292,467,468,1293,1294,469,470,1295,1296,471,472,1297,1298,473,474,1299,1300,475,476,1301,1302,477,478,
test<-read.table('order.txt',stringsAsFactors = FALSE)
test<-as.character(test)
df<-all_reps[,c(test)]
Error in all_reps[, c(test)] : subscript out of bounds
Is the problem that the column vector consists of 7152 chars?
A better option is to scan() the file, which returns a plain numeric vector, and use that to rearrange the columns:
test <- scan('order.txt', sep=",", quiet = TRUE)
df <- all_reps[, test]
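The pasted index vector follows a regular pattern (two columns from the first block of 826, then the matching two from the second block), so it may be simpler to generate it than to paste it or read it from a file. A minimal sketch, assuming the pattern holds for the whole vector; the `all_reps` built here is a hypothetical toy stand-in for the real data frame.

```r
n_cols <- 1652L          # total number of columns, taken from the last index in the question
half   <- n_cols %/% 2L  # 826 columns in each block

# Each matrix column holds one pair: (1,2), (3,4), ... and (827,828), (829,830), ...
first  <- matrix(seq_len(half), nrow = 2)
second <- matrix(half + seq_len(half), nrow = 2)

# Stack the pairs and read column by column: 1,2,827,828,3,4,829,830,...
idx <- as.vector(rbind(first, second))

# Toy stand-in for all_reps: one row, column i holds the value i
all_reps <- as.data.frame(matrix(seq_len(n_cols), nrow = 1))
df <- all_reps[, idx]
```

Generating the indices also avoids pasting a single very long line into the console.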

How to convert a factor type into a numeric type in R after reading a csv file?

After reading a csv file
data<-read.table(paste0('C:/Users/data/','30092017ARB.csv'),header=TRUE, sep=";")
almost all numeric variables come in as factors, especially the last column.
I tried all the suggestions here. However, every one of them gives me a warning:
Warning message:
NAs introduced by coercion
Someone even mentioned in this post:
"Every answer in this post failed to generate results for me , NAs were getting generated."
Any idea how I can solve this problem?
Addendum: the following picture shows one possible approach suggested here.
However, I always get the same NA.
The percent sign is clearly the problem. Replace the "%" with the empty string, "", and then convert to numeric.
data[[3]] <- sub("%", "", data[[3]])
data[[3]] <- as.numeric(data[[3]])
You can do this in one line of code:
data[[3]] <- as.numeric(sub("%", "", data[[3]]))
Also, two notes on reading the data in.
First, some files use the semicolon as a column separator. This is very common in countries where the decimal separator is the comma. That is why R has two functions to read files in the CSV format.
These functions are both calls to read.table with some defaults changed.
read.csv - Sets arguments header = TRUE and sep = ",".
read.csv2 - Sets arguments header = TRUE, sep = ";" and dec = ",".
For a full explanation see read.table or at an R prompt run help("read.table").
Second, you can avoid factor problems if you use argument stringsAsFactors = FALSE from the start, when reading in the data.
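The two notes and the "%" fix can be combined in one pass. The sketch below assumes a hypothetical three-row file with a semicolon separator, decimal commas and a trailing percent sign. The column names and values are invented for illustration.

```r
# Hypothetical file contents: semicolon separator, decimal comma, percent sign
csv_text <- "id;group;share\n1;a;12,5%\n2;b;7,25%\n3;c;40%"

# read.csv2() presets sep = ";" and dec = ","; keep strings as character
data <- read.csv2(text = csv_text, stringsAsFactors = FALSE)

# The "%" keeps the column as character, so strip it first,
# then turn the decimal comma into a point before converting
share <- sub("%", "", data$share, fixed = TRUE)
share <- sub(",", ".", share, fixed = TRUE)
data$share <- as.numeric(share)

str(data)
```

If the real file uses a decimal point, the second sub() call is unnecessary.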

Error in coercing R data.frame to a nz.data.frame

One of the columns in my R data frame contains a comma (","), and because of it, converting the data frame to a Netezza data frame throws the error below:
Error in nzQuery(sqlCommandUpload) : HY008 51 Operation canceled
01000 1 Unable to write nzlog/bad files
01000 1 Unable to write nzlog/bad files
HY000 46 ERROR: External Table : count of bad input rows reached maxerrors limit
How can I achieve this without making any changes to data?
With a dataframe like this, everything works fine:
I get an error when the dataframe is like this:
library(nzr)
library(forecast)
library (reshape2)
library(doBy)
nzDisconnect()
nzConnectDSN('DSNInfo', force=FALSE , verbose=TRUE)
#read file
test2<-read.csv("test_df.csv", stringsAsFactors = F)
# convert to nz dataframe, no error
#nzdf.test2<-as.nz.data.frame(test2)
nzdf.d<-as.nz.data.frame(d)
# copy
#test<-test2
testd<-d
#replace one of the values containing a ","
#test$Category[1]<-"a,b"
testd$Category[1]<-"Bed, Bath & Towels"
# converting to nz gives error
#nzdf.test<-as.nz.data.frame(test)
nzdf.testd<-as.nz.data.frame(testd)
# remove the ","
testd$Category <- gsub(",","",testd$Category)
# converting to nz dataframe gives no error
nzdf.testd<-as.nz.data.frame(testd)
Did you check whether you have nulls (NAs) in your data? I faced the same problem, but when I checked the Netezza-R documentation I found that you cannot write nulls into a Netezza table from another system. It mentions using the setOutputNull function in such cases.
So a workaround is to replace the nulls with the string "NULL" in your R data frame. Mind you, this turns the numerical columns into varchar, but fortunately "NULL" becomes null in your Netezza table automatically. The only extra effort is that you have to convert the columns back to numeric later.
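The replacement step of that workaround can be sketched in base R as follows. The data frame `d` here is a hypothetical example, and the upload itself (as.nz.data.frame) is left out, so whether Netezza then stores "NULL" as a real null is not shown by this code.

```r
# Hypothetical data frame with missing values in a numeric and a character column
d <- data.frame(
  Sales    = c(10.5, NA, 3),
  Category = c("Bed", NA, "Bath"),
  stringsAsFactors = FALSE
)

# Replace every NA with the string "NULL";
# numeric columns become character as a side effect
d_out <- d
d_out[] <- lapply(d_out, function(col) {
  col <- as.character(col)
  col[is.na(col)] <- "NULL"
  col
})

str(d_out)
```

Working on a copy (`d_out`) keeps the original numeric columns available in R.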
Hope this helps

Invalid multibyte string when writing a table in R

Example of what I'm trying to do:
columnA <- c(1:10)
columnB <- c("A","A","B","B","B","B","C","D","D","D")
df <- data.frame(columnA,columnB)
colBtable <- sort(table(df$columnB),decreasing=T)
write.table(colBtable,"colB.csv",col.names = FALSE)
This works, and does what I want it to do (ie: make a CSV file that says B 4, D 3, C 2, A 1).
However, with my (rather large) data set, I get the error:
Error in data.frame(x) : invalid multibyte string 360
There are several "invalid multibyte string" type errors on Stack Overflow, and I've tried some of the solutions. These also give errors, such as:
iconv(enc2utf8(df$columnB),sub="byte")
argument is not a character vector
or
tolower(df$columnB)
invalid multibyte string 1880
I suspect this is because there are special characters in my data. Any suggestions on how to resolve these errors?
Alternatively, any suggestions on other ways to export this data? I need to share it with colleagues who may not be using R (so a txt or csv file would be ideal).
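One way to get past both errors, sketched under assumptions: convert the column to character first, then let iconv() substitute any byte that is not valid UTF-8. The vector `column_b` below is a hypothetical stand-in for `df$columnB`. This does not recover the original characters; it only makes the strings safe to tabulate and write.

```r
# Hypothetical stand-in for df$columnB, with one value containing a byte
# that is not valid UTF-8
column_b <- c("A", "B", "B", "caf\xe9")

# If the column is a factor, convert it first: enc2utf8() expects a character vector
x <- as.character(column_b)

# Replace each invalid byte with its hex code, e.g. "<e9>"
clean <- iconv(x, from = "UTF-8", to = "UTF-8", sub = "byte")

# Tabulate and export as before, writing to a temporary file here
counts <- sort(table(clean), decreasing = TRUE)
out_file <- tempfile(fileext = ".csv")
write.table(as.data.frame(counts), out_file, col.names = FALSE)
```

If the source encoding is known, passing it as `from` (and dropping `sub`) would preserve the special characters instead of replacing them.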

Error importing SPSS data into R

I imported a dataset in the .sav SPSS format, and I'm getting an error that I haven't seen before.
1: In read.spss("C:\\Users\\acer\\Desktop\\X\\X\\PIREDEU\\ees2009_v0.9_20110622.sav", ... :
C:\Users\acer\Desktop\X\X\PIREDEU\ees2009_v0.9_20110622.sav: File contains duplicate label for value 1.1 for variable V200
Error in cat(list(...), file, sep, fill, labels, append) :
argument 2 (type 'list') cannot be handled by 'cat'
This came up after I typed warnings(PIREDEU). I imported the data using the foreign library:
library(foreign)
PIREDEU<-read.spss("C:\\Users\\acer\\Desktop\\X\\X\\PIREDEU\\ees2009_v0.9_20110622.sav", use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
I've fiddled with various combinations for the latter three arguments of the read.spss function, and I've gotten nowhere.
Anyone have any suggestions?
I used the call below and it worked perfectly; just ignore the warning message and check the data by typing its name:
mydata4<-read.spss("C:\\Work\\data.sav",use.value.labels=F,to.data.frame=T)
mydata4 # check data
Do you have long strings in the file - longer than 8 bytes? Statistics uses some special arrangements to handle those. It looks like the problem is with the value labels. If you can delete those (using SPSS) you might be able to get the rest of the data.
Try to read data without labels.
library(foreign)
PIREDEU <- read.spss("C:\\Users\\acer\\Desktop\\X\\X\\PIREDEU\\ees2009_v0.9_20110622.sav",
use.value.labels = F,
to.data.frame = T)
Does it work?
Convert the SPSS data file to .por (portable file). In R, install the packages Hmisc, memisc and foreign, and load them using library(foreign), library(Hmisc) and library(memisc).
Then type the following:
mydata <- spss.get("c:/mydata.por", use.value.labels=TRUE)
# last option converts value labels to R factors
