Reordering columns in large data frame - r

Passing a long vector of column indices to reorder a data.frame gives me the errors seen below.
I tried to order the columns manually by using a long string (7152 characters). As a workaround, I tried to save the same string in a text file and read the text file back in. Neither worked.
df<-all_reps[,c(1,2,827,828,3,4,829,830,5,6,831,832,7,8,833,834,9,10,835,836,11,12,837,838,13,14,839,840,15,16,841,842,17,18,843,844,19,20,845,846,21,22,847,848,23,24,849,850,25,26,851,852,27,28,853,854,29,30,855,856,31,32,857,858,33,34,859,860,35,36,861,862,37,38,863,864,39,40,865,866,41,42,867,868,43,44,869,870,45,46,871,872,47,48,873,874,49,50,875,876,51,52,877,878,53,54,879,880,55,56,881,882,57,58,883,884,59,60,885,886,61,62,887,888,63,64,889,890,65,66,891,892,67,68,893,894,69,70,895,896,71,72,897,898,73,74,899,900,75,76,901,902,77,78,903,904,79,80,905,906,81,82,907,908,83,84,909,910,85,86,911,912,87,88,913,914,89,90,915,916,91,92,917,918,93,94,919,920,95,96,921,922,97,98,923,924,99,100,925,926,101,102,927,928,103,104,929,930,105,106,931,932,107,108,933,934,109,110,935,936,111,112,937,938,113,114,939,940,115,116,941,942,117,118,943,944,119,120,945,946,121,122,947,948,123,124,949,950,125,126,951,952,127,128,953,954,129,130,955,956,131,132,957,958,133,134,959,960,135,136,961,962,137,138,963,964,139,140,965,966,141,142,967,968,143,144,969,970,145,146,971,972,147,148,973,974,149,150,975,976,151,152,977,978,153,154,979,980,155,156,981,982,157,158,983,984,159,160,985,986,161,162,987,988,163,164,989,990,165,166,991,992,167,168,993,994,169,170,995,996,171,172,997,998,173,174,999,1000,175,176,1001,1002,177,178,1003,1004,179,180,1005,1006,181,182,1007,1008,183,184,1009,1010,185,186,1011,1012,187,188,1013,1014,189,190,1015,1016,191,192,1017,1018,193,194,1019,1020,195,196,1021,1022,197,198,1023,1024,199,200,1025,1026,201,202,1027,1028,203,204,1029,1030,205,206,1031,1032,207,208,1033,1034,209,210,1035,1036,211,212,1037,1038,213,214,1039,1040,215,216,1041,1042,217,218,1043,1044,219,220,1045,1046,221,222,1047,1048,223,224,1049,1050,225,226,1051,1052,227,228,1053,1054,229,230,1055,1056,231,232,1057,1058,233,234,1059,1060,235,236,1061,1062,237,238,1063,1064,239,240,1065,1066,241,242,1067,1068,243,244,1069,1070,245,246,1071,1072,247,248,1073,1074,249,250,1075,1076,251,252,1077,
1078,253,254,1079,1080,255,256,1081,1082,257,258,1083,1084,259,260,1085,1086,261,262,1087,1088,263,264,1089,1090,265,266,1091,1092,267,268,1093,1094,269,270,1095,1096,271,272,1097,1098,273,274,1099,1100,275,276,1101,1102,277,278,1103,1104,279,280,1105,1106,281,282,1107,1108,283,284,1109,1110,285,286,1111,1112,287,288,1113,1114,289,290,1115,1116,291,292,1117,1118,293,294,1119,1120,295,296,1121,1122,297,298,1123,1124,299,300,1125,1126,301,302,1127,1128,303,304,1129,1130,305,306,1131,1132,307,308,1133,1134,309,310,1135,1136,311,312,1137,1138,313,314,1139,1140,315,316,1141,1142,317,318,1143,1144,319,320,1145,1146,321,322,1147,1148,323,324,1149,1150,325,326,1151,1152,327,328,1153,1154,329,330,1155,1156,331,332,1157,1158,333,334,1159,1160,335,336,1161,1162,337,338,1163,1164,339,340,1165,1166,341,342,1167,1168,343,344,1169,1170,345,346,1171,1172,347,348,1173,1174,349,350,1175,1176,351,352,1177,1178,353,354,1179,1180,355,356,1181,1182,357,358,1183,1184,359,360,1185,1186,361,362,1187,1188,363,364,1189,1190,365,366,1191,1192,367,368,1193,1194,369,370,1195,1196,371,372,1197,1198,373,374,1199,1200,375,376,1201,1202,377,378,1203,1204,379,380,1205,1206,381,382,1207,1208,383,384,1209,1210,385,386,1211,1212,387,388,1213,1214,389,390,1215,1216,391,392,1217,1218,393,394,1219,1220,395,396,1221,1222,397,398,1223,1224,399,400,1225,1226,401,402,1227,1228,403,404,1229,1230,405,406,1231,1232,407,408,1233,1234,409,410,1235,1236,411,412,1237,1238,413,414,1239,1240,415,416,1241,1242,417,418,1243,1244,419,420,1245,1246,421,422,1247,1248,423,424,1249,1250,425,426,1251,1252,427,428,1253,1254,429,430,1255,1256,431,432,1257,1258,433,434,1259,1260,435,436,1261,1262,437,438,1263,1264,439,440,1265,1266,441,442,1267,1268,443,444,1269,1270,445,446,1271,1272,447,448,1273,1274,449,450,1275,1276,451,452,1277,1278,453,454,1279,1280,455,456,1281,1282,457,458,1283,1284,459,460,1285,1286,461,462,1287,1288,463,464,1289,1290,465,466,1291,1292,467,468,1293,1294,469,470,1295,1296,471,472,1297,1298,473,474,1299,1300,
475,476,1301,1302,477,478,1303,1304,479,480,1305,1306,481,482,1307,1308,483,484,1309,1310,485,486,1311,1312,487,488,1313,1314,489,490,1315,1316,491,492,1317,1318,493,494,1319,1320,495,496,1321,1322,497,498,1323,1324,499,500,1325,1326,501,502,1327,1328,503,504,1329,1330,505,506,1331,1332,507,508,1333,1334,509,510,1335,1336,511,512,1337,1338,513,514,1339,1340,515,516,1341,1342,517,518,1343,1344,519,520,1345,1346,521,522,1347,1348,523,524,1349,1350,525,526,1351,1352,527,528,1353,1354,529,530,1355,1356,531,532,1357,1358,533,534,1359,1360,535,536,1361,1362,537,538,1363,1364,539,540,1365,1366,541,542,1367,1368,543,544,1369,1370,545,546,1371,1372,547,548,1373,1374,549,550,1375,1376,551,552,1377,1378,553,554,1379,1380,555,556,1381,1382,557,558,1383,1384,559,560,1385,1386,561,562,1387,1388,563,564,1389,1390,565,566,1391,1392,567,568,1393,1394,569,570,1395,1396,571,572,1397,1398,573,574,1399,1400,575,576,1401,1402,577,578,1403,1404,579,580,1405,1406,581,582,1407,1408,583,584,1409,1410,585,586,1411,1412,587,588,1413,1414,589,590,1415,1416,591,592,1417,1418,593,594,1419,1420,595,596,1421,1422,597,598,1423,1424,599,600,1425,1426,601,602,1427,1428,603,604,1429,1430,605,606,1431,1432,607,608,1433,1434,609,610,1435,1436,611,612,1437,1438,613,614,1439,1440,615,616,1441,1442,617,618,1443,1444,619,620,1445,1446,621,622,1447,1448,623,624,1449,1450,625,626,1451,1452,627,628,1453,1454,629,630,1455,1456,631,632,1457,1458,633,634,1459,1460,635,636,1461,1462,637,638,1463,1464,639,640,1465,1466,641,642,1467,1468,643,644,1469,1470,645,646,1471,1472,647,648,1473,1474,649,650,1475,1476,651,652,1477,1478,653,654,1479,1480,655,656,1481,1482,657,658,1483,1484,659,660,1485,1486,661,662,1487,1488,663,664,1489,1490,665,666,1491,1492,667,668,1493,1494,669,670,1495,1496,671,672,1497,1498,673,674,1499,1500,675,676,1501,1502,677,678,1503,1504,679,680,1505,1506,681,682,1507,1508,683,684,1509,1510,685,686,1511,1512,687,688,1513,1514,689,690,1515,1516,691,692,1517,1518,693,694,1519,1520,695,696,1521,1522,
697,698,1523,1524,699,700,1525,1526,701,702,1527,1528,703,704,1529,1530,705,706,1531,1532,707,708,1533,1534,709,710,1535,1536,711,712,1537,1538,713,714,1539,1540,715,716,1541,1542,717,718,1543,1544,719,720,1545,1546,721,722,1547,1548,723,724,1549,1550,725,726,1551,1552,727,728,1553,1554,729,730,1555,1556,731,732,1557,1558,733,734,1559,1560,735,736,1561,1562,737,738,1563,1564,739,740,1565,1566,741,742,1567,1568,743,744,1569,1570,745,746,1571,1572,747,748,1573,1574,749,750,1575,1576,751,752,1577,1578,753,754,1579,1580,755,756,1581,1582,757,758,1583,1584,759,760,1585,1586,761,762,1587,1588,763,764,1589,1590,765,766,1591,1592,767,768,1593,1594,769,770,1595,1596,771,772,1597,1598,773,774,1599,1600,775,776,1601,1602,777,778,1603,1604,779,780,1605,1606,781,782,1607,1608,783,784,1609,1610,785,786,1611,1612,787,788,1613,1614,789,790,1615,1616,791,792,1617,1618,793,794,1619,1620,795,796,1621,1622,797,798,1623,1624,799,800,1625,1626,801,802,1627,1628,803,804,1629,1630,805,806,1631,1632,807,808,1633,1634,809,810,1635,1636,811,812,1637,1638,813,814,1639,1640,815,816,1641,1642,817,818,1643,1644,819,820,1645,1646,821,822,1647,1648,823,824,1649,1650,825,826,1651,1652)]
Error: unexpected symbol in:
"282,457,458,1283,1284,459,460,1285,1286,461,462,1287,1288,463,464,1289,1290,465,466,1291,1292,467,468,1293,1294,469,470,1295,1296,471,472,1297,1298,473,474,1299,1300,475,476,1301,1302,477,478,
test<-read.table('order.txt',stringsAsFactors = FALSE)
test<-as.character(test)
df<-all_reps[,c(test)]
Error in all_reps[, c(test)] : subscript out of bounds
Is the problem that the column vector consists of 7152 chars?

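The problem is indeed the length: commands entered at the R console are limited to about 4096 bytes per line, so pasting the 7152-character line into the console breaks it and R cannot parse it. The read.table() attempt fails for a different reason: read.table() followed by as.character() does not give you a numeric vector of column indices. A better option is to read the indices with scan() and use the result to rearrange the columns:
# order.txt holds the comma-separated column indices
test <- scan('order.txt', sep=",", quiet = TRUE)
df <- all_reps[, test]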
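Because the order in the question follows a regular pattern (pairs of columns from the first 826 interleaved with pairs from the last 826), the index vector can also be generated instead of typed. This is a sketch that assumes all_reps has exactly 1652 columns; the data frame below is a made-up stand-in, and a temporary file plays the role of order.txt:

```r
# Hypothetical stand-in for all_reps: 1652 columns named V1..V1652
all_reps <- as.data.frame(matrix(seq_len(2 * 1652), nrow = 2))

# Interleave pairs from the first half (1-826) with pairs from the
# second half (827-1652): 1, 2, 827, 828, 3, 4, 829, 830, ...
n <- 826
idx <- as.vector(rbind(seq(1, n, by = 2), seq(2, n, by = 2),
                       seq(n + 1, 2 * n, by = 2), seq(n + 2, 2 * n, by = 2)))

# Round trip through a comma-separated file, as with order.txt
f <- tempfile(fileext = ".txt")
writeLines(paste(idx, collapse = ","), f)
test <- scan(f, sep = ",", quiet = TRUE)

# Reorder the columns with the numeric index vector
df <- all_reps[, test]
head(names(df))  # "V1" "V2" "V827" "V828" "V3" "V4"
```

rbind() builds a 4-row matrix whose columns are the groups of four, and as.vector() reads it column by column, which yields exactly the interleaved order.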

Related

CSV-imported data table cannot be used for a histogram plot

I created my own data set, Kwality.csv, in Excel. When I execute the code below, I do not get a histogram for the data; instead it throws this error:
Error in hist.default(mydata) : 'x' must be numeric
library(data.table)
mydata = fread("Kwality.csv", header = FALSE)
View(mydata)
hist(mydata)
I tried to reproduce your workflow and exported the xlsx file to a CSV file (using export to comma-separated file).
First, you should check which characters are used as the field separator and as the decimal mark. In my case, fields are separated by a semicolon (;) and the decimal mark is a comma (,).
Then you should select the column you want to plot with [[ ]]. The data table itself is not a valid argument for hist(), which expects a numeric vector.
Taking this into consideration, you can execute your code as below:
library(data.table)
# load a CSV generated by NORMSINV(RAND()) in Excel
# (semicolon field separator, comma decimal mark)
mydata = fread("check.csv", header = FALSE, sep = ";", dec = ",")
mydata
# hist(mydata)
# Error in hist.default(mydata) : 'x' must be numeric
# does not work: the whole table is not a numeric vector
# access a single column instead, e.g. the third column - OK
hist(mydata[[3]])
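The same idea can be sketched in base R by swapping fread() for read.csv2(), which sets sep = ";" and dec = "," by default. The data below is made up to mimic a semicolon/comma-decimal export, and plot = FALSE is used only so the histogram's bin counts can be inspected without drawing:

```r
# Hypothetical data mimicking the exported CSV: semicolon field
# separator and comma decimal mark
csv_text <- "0,12;-1,30;0,85
-0,47;0,22;1,10
1,05;-0,64;-0,20
0,33;0,91;-1,45"

# read.csv2() defaults to sep = ";" and dec = ","
mydata <- read.csv2(text = csv_text, header = FALSE)

# Passing the whole table reproduces the error from the question
msg <- tryCatch(hist(mydata), error = conditionMessage)
msg  # "'x' must be numeric"

# Passing one numeric column works; plot = FALSE returns the binned counts
h <- hist(mydata[[3]], plot = FALSE)
h$counts
```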

How to convert a factor type into a numeric type in R after reading a csv file?

After reading a csv file
data<-read.table(paste0('C:/Users/data/','30092017ARB.csv'),header=TRUE, sep=";")
I get factor as the type for almost all numeric variables, especially for the last column.
I tried all the suggestions here. However, I get a warning for each of them:
Warning message:
NAs introduced by coercion
Someone even mentioned in this post:
"Every answer in this post failed to generate results for me , NAs were getting generated."
Any idea how I can solve this problem?
Addendum: the following picture shows one possible approach suggested there.
However, I always get the same NAs.
The percent sign is clearly the problem. Replace the "%" by the empty string, "", and then convert to numeric.
data[[3]] <- sub("%", "", data[[3]])
data[[3]] <- as.numeric(data[[3]])
You can do this in one line of code:
data[[3]] <- as.numeric(sub("%", "", data[[3]]))
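A self-contained sketch of why the earlier suggestions produced NAs, using a made-up percent column stored as a factor (which is what read.table() produced by default before R 4.0.0):

```r
# Hypothetical percent column, read in as a factor
pct <- factor(c("12.5%", "3%", "100%", "12.5%"))

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(pct)

# Converting the labels fails because of the "%" sign: every element becomes NA
with_sign <- suppressWarnings(as.numeric(as.character(pct)))

# Stripping the "%" first gives the intended numbers
values <- as.numeric(sub("%", "", pct))
values  # 12.5 3 100 12.5
```

sub() converts the factor to its character labels before substituting, so no explicit as.character() is needed.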
Also, two notes on reading the data in.
First, some files use the semicolon as a column separator. This is very common in countries where the decimal mark is a comma. That is why R has two functions to read files in the CSV format.
These functions are both calls to read.table with some defaults changed.
read.csv - Sets arguments header = TRUE and sep = ",".
read.csv2 - Sets arguments header = TRUE, sep = ";" and dec = ",".
For a full explanation see read.table or at an R prompt run help("read.table").
Second, you can avoid factor problems if you use argument stringsAsFactors = FALSE from the start, when reading in the data.
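Both notes together, as a minimal sketch on made-up semicolon-separated data with a comma decimal mark and a percent column:

```r
# Hypothetical semicolon-separated data with a comma decimal mark
csv_text <- "id;value;share
1;2,5;12%
2;0,75;3%
3;10;100%"

# read.csv2() handles sep = ";" and dec = ","; stringsAsFactors = FALSE
# keeps the "%" column as plain character instead of a factor
data <- read.csv2(text = csv_text, stringsAsFactors = FALSE)
str(data)

# value is already numeric; share is character, so strip the sign first
data[[3]] <- as.numeric(sub("%", "", data[[3]]))
```

Note that dec = "," only applies to columns that read.table() can convert to numbers; a column that stays character (like share here) keeps its original text.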

Reading data into R

I am trying to read data from the msigdb database into my R environment, but I am having trouble reading it into the format I would like. Right now, when I read the data in, it is read as type "integer". I want it read in as type "character" (or any other type) so that when I transfer data between data frames/matrices I get the written letters that make up the item's name instead of an integer value.
df<-read.table("msigdb.v5.2.symbols.txt", fill = TRUE)
This is what I currently have, but like I said when I do typeof(df[1,1]) I get "integer".
To summarize:
After reading in data with columns that should be character, the current behavior is: typeof(df[1,1]) produces "integer". The desired behavior is: typeof(df[1,1]) produces "character".
Reproducible example:
library(dplyr)
write.table(band_instruments, "test.txt")
df <- read.table("test.txt", header = TRUE)
typeof(df[1,1])
# [1] "integer"
Thank you!
df<-read.table("msigdb.v5.2.symbols.txt", fill = TRUE, stringsAsFactors = FALSE)
By default, read.table reads all columns as character unless specified otherwise in colClasses*, and (before R 4.0.0) read.table and data.frame then convert character columns to factors. A single cell extracted from a factor column is still a factor, and typeof() reports its storage type, the internal integer code.
Setting stringsAsFactors = FALSE in the call to read.table resolves this. Since R 4.0.0, FALSE is the default.
*Despite a comment to the contrary, this is true: read.table reads all columns as character first, then converts them. This is in the documentation, and you can see it in the source code. You can confirm it with the following code:
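A minimal sketch of the difference, using a small made-up table in place of the msigdb file and setting stringsAsFactors explicitly so the result does not depend on the R version:

```r
# Hypothetical two-column table standing in for the real file
txt <- "name plays
John guitar
Paul bass"

as_factor <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
as_char   <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

typeof(as_factor[1, 1])        # "integer": a factor is stored as integer codes
typeof(as_char[1, 1])          # "character"
as.character(as_factor[1, 1])  # "John": converting back recovers the label
```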
write.table(mtcars, "mtcars.txt")
read.table("mtcars.txt", header = TRUE, quote = ".")
# Fails because it reads the decimals in the numeric data as quotes
# From the documentation: Quoting is only considered for columns read
# as character, which is all of them unless colClasses is specified

write.table unintentionally adds an X prefix to the header

I have got a comma delimited csv document with predefined headers and a few rows. I just want to exchange the comma delimiter to a pipe delimiter. So my naive approach is:
myData <- read.csv(file="C:/test.CSV", header=TRUE, sep=",", check.names = FALSE)
Viewing myData gives me results without an X prefix in the column headers. If I set check.names = TRUE, the column headers get an X prefix.
Now I am trying to write a new CSV with a pipe delimiter.
write.table(myData, file = "C:/test_pipe.CSV",row.names=FALSE, na="",col.names=TRUE, sep="|")
In the next step I am going to test my results:
mydata.test <- read.csv(file="C:/test_pipe.CSV", header=TRUE, sep="|")
The import seems fine, but unfortunately the X prefixes in the column headers appear again. Now my question is:
Is there something wrong with the original file or is there an error in my naive approach?
The original CSV, test.csv, was created with Excel, of course without X prefixes in the column headers.
Thanks in advance
You have to keep using check.names = FALSE, also the second time.
Otherwise your header will be modified, because it apparently contains names that are not syntactically valid column names of a data.frame. For example, special characters are replaced by dots (.), and names that start with a number are prefixed with X.
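A small sketch of the effect, with a made-up header (one name starting with a digit, one containing a space and parentheses) and a temporary file standing in for C:/test_pipe.CSV:

```r
# Hypothetical pipe-delimited data with "invalid" column names
csv_text <- "2019|sales (EUR)|id
1|2|3"

checked   <- read.csv(text = csv_text, sep = "|")
unchecked <- read.csv(text = csv_text, sep = "|", check.names = FALSE)
names(checked)    # "X2019" "sales..EUR." "id"  (rewritten by make.names())
names(unchecked)  # "2019" "sales (EUR)" "id"

# Round trip: write pipe-delimited, then re-read with check.names = FALSE again
f <- tempfile(fileext = ".csv")
write.table(unchecked, file = f, row.names = FALSE, na = "",
            col.names = TRUE, sep = "|")
back <- read.csv(f, sep = "|", check.names = FALSE)
names(back)       # "2019" "sales (EUR)" "id"
```

write.table() itself writes the names unchanged; the X prefix and the dots are added only when the file is read back with check.names = TRUE, which is the read.csv() default.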

R Programming - String getting converted to numbers

I have a text file with 100 names that I am trying to concatenate into a single large string using the following code. However, my output shows a number for each name instead of the actual names.
It seems that when these names are being converted into character using the paste function, they are being converted into numbers. Any help will be greatly appreciated.
input=(read.csv("names.txt"))
final_output1 = paste(input, collapse = '')
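The read.csv() function has a stringsAsFactors argument that you can set to FALSE. Also pass the column, not the whole data frame, to paste(); header = FALSE keeps the first name from being used as the column header, assuming names.txt has no header line.
input <- read.csv("names.txt", header = FALSE, stringsAsFactors = FALSE)
final_output1 <- paste(input[[1]], collapse = '')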
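A self-contained sketch of the whole fix, with three made-up names standing in for names.txt (assumed to have one name per line and no header line):

```r
# Hypothetical stand-in for names.txt: one name per line, no header
names_txt <- "Alice
Bob
Carol"

input <- read.csv(text = names_txt, header = FALSE, stringsAsFactors = FALSE)

# paste() the column (a character vector), not the whole data frame
final_output1 <- paste(input[[1]], collapse = "")
final_output1  # "AliceBobCarol"
```

Passing the data frame itself to paste() converts each column as a whole, which gives a deparsed vector rather than the concatenated names, so extracting the column matters as much as stringsAsFactors.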
