R Programming - String getting converted to numbers
I have a text file with 100 names that I am trying to concatenate into one large string using the following code. However, my output shows a number for each name instead of the actual names themselves.
It seems that when these names are converted to character by the paste function, they are being converted into numbers. Any help will be greatly appreciated.
input=(read.csv("names.txt"))
final_output1 = paste(input, collapse = '')
The read.csv() function has a stringsAsFactors argument that you can set to FALSE (since R 4.0.0, FALSE is already the default):
input <- read.csv("names.txt", stringsAsFactors=FALSE)
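A minimal end-to-end sketch, assuming names.txt holds one name per line with no header row (the file contents below are made up). Two further details matter besides stringsAsFactors: header = FALSE keeps the first name from being consumed as the column name, and paste() should receive the column vector rather than the whole data frame, since pasting a data frame deparses each column into a single "c(...)"-style string.

```r
# Hypothetical stand-in for names.txt: one name per line, no header row
names_file <- tempfile(fileext = ".txt")
writeLines(c("Alice", "Bob", "Carol"), names_file)

# header = FALSE keeps the first name as data;
# stringsAsFactors = FALSE keeps names as character (default since R 4.0.0)
input <- read.csv(names_file, header = FALSE, stringsAsFactors = FALSE)

# Paste the first column (a character vector), not the data frame itself
final_output1 <- paste(input[[1]], collapse = "")
print(final_output1)
# [1] "AliceBobCarol"

# For a plain one-name-per-line file, readLines() is a simpler alternative
final_output2 <- paste(readLines(names_file), collapse = "")
```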
Related
Outputting an R dataframe to a .txt file - Align positive and negative values
I am trying to output a data frame in R to a .txt file. I want the .txt file to mirror the data frame output, with columns and rows all aligned. I found a post on SO that mostly gave me the desired output with the following (now modified) code:
gene_names_only <- select(deseq2_hits_table_df, Gene, L2F)
colnames(gene_names_only) <- c()
capture.output(
  print.data.frame(gene_names_only, row.names=F, col.names=F, print.gap=0, quote=F, right=F),
  file="all_samples_comparison_gene_list.txt"
)
The resulting output, however, does not align negative and positive values. I ultimately want positive and negative values to be aligned with one another: for -0.00012 followed by 4.00046, the '-' of the first number should sit in the same column as the '4' of the next number. How could I accomplish this?
Two other questions:
The output file has a blank line at the beginning. How can I remove it?
The output file also puts far more spaces between the left column and the right column than I want. Is there any way to change this?
Maybe try a finer-grained treatment of the printing using sprintf, with a different format string for positive and negative numbers, e.g.:
> df = data.frame(x=c('PICALM','Luc','SEC22B'),y=c(-2.261085123,-2.235376098,2.227728912))
> sprintf('%-15s%.6f',df$x[1],df$y[1])
[1] "PICALM         -2.261085"
> sprintf('%-15s%.6f',df$x[2],df$y[2])
[1] "Luc            -2.235376"
> sprintf('%-15s%.7f',df$x[3],df$y[3])
[1] "SEC22B         2.2277289"
EDIT: I don't think write.table or similar functions accept custom format strings, so one option could be to create a data frame of formatted strings and then use write.table or writeLines to write it to a file, e.g.
dfstr = data.frame(x=sprintf('%-15s', df$x), y=sprintf(paste0('%.', 7-1*(df$y<0),'f'), df$y))
(The format string for y here is essentially what I previously proposed.) Next, write dfstr directly:
write.table(x=dfstr,file='filename.txt', quote=F,row.names=F,col.names=F)
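A vectorised sketch of the same idea that writes the file with writeLines(), using the sample values above. No header is written, so there is no blank first line, and the name field is only as wide as the longest name plus two spaces (an arbitrary choice that can be tuned). The precision trick assumes every value has a single digit before the decimal point; the output path is a temporary file, not the question's file name.

```r
df <- data.frame(x = c("PICALM", "Luc", "SEC22B"),
                 y = c(-2.261085123, -2.235376098, 2.227728912),
                 stringsAsFactors = FALSE)

# Name field: left-justified, longest name plus two spaces of padding
name_width <- max(nchar(df$x)) + 2

# One decimal fewer for negatives, so "-" takes the place of a digit and
# every number has the same width (assumes one digit before the point)
fmt <- paste0("%-", name_width, "s%.", 7 - (df$y < 0), "f")
lines <- sprintf(fmt, df$x, df$y)

out_file <- tempfile(fileext = ".txt")
writeLines(lines, out_file)  # no header line, so no blank first line
cat(readLines(out_file), sep = "\n")
```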
Reordering columns in large data frame
Passing a long vector of characters to to reorder a data.frame gives me the errors seen below I tried to order the columns manually by using a long string (7152 char). As a workaround, I tried to save the same string in a text file and read in the text file. None of it worked. df<-all_reps[,c(1,2,827,828,3,4,829,830,5,6,831,832,7,8,833,834,9,10,835,836,11,12,837,838,13,14,839,840,15,16,841,842,17,18,843,844,19,20,845,846,21,22,847,848,23,24,849,850,25,26,851,852,27,28,853,854,29,30,855,856,31,32,857,858,33,34,859,860,35,36,861,862,37,38,863,864,39,40,865,866,41,42,867,868,43,44,869,870,45,46,871,872,47,48,873,874,49,50,875,876,51,52,877,878,53,54,879,880,55,56,881,882,57,58,883,884,59,60,885,886,61,62,887,888,63,64,889,890,65,66,891,892,67,68,893,894,69,70,895,896,71,72,897,898,73,74,899,900,75,76,901,902,77,78,903,904,79,80,905,906,81,82,907,908,83,84,909,910,85,86,911,912,87,88,913,914,89,90,915,916,91,92,917,918,93,94,919,920,95,96,921,922,97,98,923,924,99,100,925,926,101,102,927,928,103,104,929,930,105,106,931,932,107,108,933,934,109,110,935,936,111,112,937,938,113,114,939,940,115,116,941,942,117,118,943,944,119,120,945,946,121,122,947,948,123,124,949,950,125,126,951,952,127,128,953,954,129,130,955,956,131,132,957,958,133,134,959,960,135,136,961,962,137,138,963,964,139,140,965,966,141,142,967,968,143,144,969,970,145,146,971,972,147,148,973,974,149,150,975,976,151,152,977,978,153,154,979,980,155,156,981,982,157,158,983,984,159,160,985,986,161,162,987,988,163,164,989,990,165,166,991,992,167,168,993,994,169,170,995,996,171,172,997,998,173,174,999,1000,175,176,1001,1002,177,178,1003,1004,179,180,1005,1006,181,182,1007,1008,183,184,1009,1010,185,186,1011,1012,187,188,1013,1014,189,190,1015,1016,191,192,1017,1018,193,194,1019,1020,195,196,1021,1022,197,198,1023,1024,199,200,1025,1026,201,202,1027,1028,203,204,1029,1030,205,206,1031,1032,207,208,1033,1034,209,210,1035,1036,211,212,1037,1038,213,214,1039,1040,215,216,1041,1042,217,218,1043,1044,219,220,1045,1046,221,222
,1047,1048,223,224,1049,1050,225,226,1051,1052,227,228,1053,1054,229,230,1055,1056,231,232,1057,1058,233,234,1059,1060,235,236,1061,1062,237,238,1063,1064,239,240,1065,1066,241,242,1067,1068,243,244,1069,1070,245,246,1071,1072,247,248,1073,1074,249,250,1075,1076,251,252,1077,1078,253,254,1079,1080,255,256,1081,1082,257,258,1083,1084,259,260,1085,1086,261,262,1087,1088,263,264,1089,1090,265,266,1091,1092,267,268,1093,1094,269,270,1095,1096,271,272,1097,1098,273,274,1099,1100,275,276,1101,1102,277,278,1103,1104,279,280,1105,1106,281,282,1107,1108,283,284,1109,1110,285,286,1111,1112,287,288,1113,1114,289,290,1115,1116,291,292,1117,1118,293,294,1119,1120,295,296,1121,1122,297,298,1123,1124,299,300,1125,1126,301,302,1127,1128,303,304,1129,1130,305,306,1131,1132,307,308,1133,1134,309,310,1135,1136,311,312,1137,1138,313,314,1139,1140,315,316,1141,1142,317,318,1143,1144,319,320,1145,1146,321,322,1147,1148,323,324,1149,1150,325,326,1151,1152,327,328,1153,1154,329,330,1155,1156,331,332,1157,1158,333,334,1159,1160,335,336,1161,1162,337,338,1163,1164,339,340,1165,1166,341,342,1167,1168,343,344,1169,1170,345,346,1171,1172,347,348,1173,1174,349,350,1175,1176,351,352,1177,1178,353,354,1179,1180,355,356,1181,1182,357,358,1183,1184,359,360,1185,1186,361,362,1187,1188,363,364,1189,1190,365,366,1191,1192,367,368,1193,1194,369,370,1195,1196,371,372,1197,1198,373,374,1199,1200,375,376,1201,1202,377,378,1203,1204,379,380,1205,1206,381,382,1207,1208,383,384,1209,1210,385,386,1211,1212,387,388,1213,1214,389,390,1215,1216,391,392,1217,1218,393,394,1219,1220,395,396,1221,1222,397,398,1223,1224,399,400,1225,1226,401,402,1227,1228,403,404,1229,1230,405,406,1231,1232,407,408,1233,1234,409,410,1235,1236,411,412,1237,1238,413,414,1239,1240,415,416,1241,1242,417,418,1243,1244,419,420,1245,1246,421,422,1247,1248,423,424,1249,1250,425,426,1251,1252,427,428,1253,1254,429,430,1255,1256,431,432,1257,1258,433,434,1259,1260,435,436,1261,1262,437,438,1263,1264,439,440,1265,1266,441,442,1267,1268,443,444,1
269,1270,445,446,1271,1272,447,448,1273,1274,449,450,1275,1276,451,452,1277,1278,453,454,1279,1280,455,456,1281,1282,457,458,1283,1284,459,460,1285,1286,461,462,1287,1288,463,464,1289,1290,465,466,1291,1292,467,468,1293,1294,469,470,1295,1296,471,472,1297,1298,473,474,1299,1300,475,476,1301,1302,477,478,1303,1304,479,480,1305,1306,481,482,1307,1308,483,484,1309,1310,485,486,1311,1312,487,488,1313,1314,489,490,1315,1316,491,492,1317,1318,493,494,1319,1320,495,496,1321,1322,497,498,1323,1324,499,500,1325,1326,501,502,1327,1328,503,504,1329,1330,505,506,1331,1332,507,508,1333,1334,509,510,1335,1336,511,512,1337,1338,513,514,1339,1340,515,516,1341,1342,517,518,1343,1344,519,520,1345,1346,521,522,1347,1348,523,524,1349,1350,525,526,1351,1352,527,528,1353,1354,529,530,1355,1356,531,532,1357,1358,533,534,1359,1360,535,536,1361,1362,537,538,1363,1364,539,540,1365,1366,541,542,1367,1368,543,544,1369,1370,545,546,1371,1372,547,548,1373,1374,549,550,1375,1376,551,552,1377,1378,553,554,1379,1380,555,556,1381,1382,557,558,1383,1384,559,560,1385,1386,561,562,1387,1388,563,564,1389,1390,565,566,1391,1392,567,568,1393,1394,569,570,1395,1396,571,572,1397,1398,573,574,1399,1400,575,576,1401,1402,577,578,1403,1404,579,580,1405,1406,581,582,1407,1408,583,584,1409,1410,585,586,1411,1412,587,588,1413,1414,589,590,1415,1416,591,592,1417,1418,593,594,1419,1420,595,596,1421,1422,597,598,1423,1424,599,600,1425,1426,601,602,1427,1428,603,604,1429,1430,605,606,1431,1432,607,608,1433,1434,609,610,1435,1436,611,612,1437,1438,613,614,1439,1440,615,616,1441,1442,617,618,1443,1444,619,620,1445,1446,621,622,1447,1448,623,624,1449,1450,625,626,1451,1452,627,628,1453,1454,629,630,1455,1456,631,632,1457,1458,633,634,1459,1460,635,636,1461,1462,637,638,1463,1464,639,640,1465,1466,641,642,1467,1468,643,644,1469,1470,645,646,1471,1472,647,648,1473,1474,649,650,1475,1476,651,652,1477,1478,653,654,1479,1480,655,656,1481,1482,657,658,1483,1484,659,660,1485,1486,661,662,1487,1488,663,664,1489,1490,665,666,149
1,1492,667,668,1493,1494,669,670,1495,1496,671,672,1497,1498,673,674,1499,1500,675,676,1501,1502,677,678,1503,1504,679,680,1505,1506,681,682,1507,1508,683,684,1509,1510,685,686,1511,1512,687,688,1513,1514,689,690,1515,1516,691,692,1517,1518,693,694,1519,1520,695,696,1521,1522,697,698,1523,1524,699,700,1525,1526,701,702,1527,1528,703,704,1529,1530,705,706,1531,1532,707,708,1533,1534,709,710,1535,1536,711,712,1537,1538,713,714,1539,1540,715,716,1541,1542,717,718,1543,1544,719,720,1545,1546,721,722,1547,1548,723,724,1549,1550,725,726,1551,1552,727,728,1553,1554,729,730,1555,1556,731,732,1557,1558,733,734,1559,1560,735,736,1561,1562,737,738,1563,1564,739,740,1565,1566,741,742,1567,1568,743,744,1569,1570,745,746,1571,1572,747,748,1573,1574,749,750,1575,1576,751,752,1577,1578,753,754,1579,1580,755,756,1581,1582,757,758,1583,1584,759,760,1585,1586,761,762,1587,1588,763,764,1589,1590,765,766,1591,1592,767,768,1593,1594,769,770,1595,1596,771,772,1597,1598,773,774,1599,1600,775,776,1601,1602,777,778,1603,1604,779,780,1605,1606,781,782,1607,1608,783,784,1609,1610,785,786,1611,1612,787,788,1613,1614,789,790,1615,1616,791,792,1617,1618,793,794,1619,1620,795,796,1621,1622,797,798,1623,1624,799,800,1625,1626,801,802,1627,1628,803,804,1629,1630,805,806,1631,1632,807,808,1633,1634,809,810,1635,1636,811,812,1637,1638,813,814,1639,1640,815,816,1641,1642,817,818,1643,1644,819,820,1645,1646,821,822,1647,1648,823,824,1649,1650,825,826,1651,1652)] Error: unexpected symbol in: "282,457,458,1283,1284,459,460,1285,1286,461,462,1287,1288,463,464,1289,1290,465,466,1291,1292,467,468,1293,1294,469,470,1295,1296,471,472,1297,1298,473,474,1299,1300,475,476,1301,1302,477,478, test<-read.table('order.txt',stringsAsFactors = FALSE) test<-as.character(test) df<-all_reps[,c(test)] Error in all_reps[, c(test)] : subscript out of bounds Is the problem that the column vector consists of 7152 chars?
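A better option is to read the indices with scan(), which returns a numeric vector, and use that to rearrange the columns:
test <- scan('order.txt', sep=",", quiet = TRUE)
df <- all_reps[, test]
As for why the original attempts failed: R limits console command lines to about 4095 bytes, so a 7152-character literal on one line is likely to be broken up, and read.table() returns a data frame rather than a vector of column positions.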
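Since the order in the question follows a regular pattern (each pair of columns i, i+1 from the first 826 columns followed by the matching pair i+826, i+827), it can also be generated instead of typed. A sketch, with a small made-up all_reps data frame standing in for the real one, that builds the order, round-trips it through a comma-separated file with scan(), and reorders the columns:

```r
# Hypothetical stand-in for all_reps: 4 rows, 1652 columns V1..V1652
all_reps <- as.data.frame(matrix(seq_len(4 * 1652), nrow = 4))

# Rebuild the interleaved order: 1,2,827,828, 3,4,829,830, ..., 825,826,1651,1652
first  <- matrix(1:826, nrow = 2)    # column k holds the pair (2k-1, 2k)
second <- first + 826                # matching pair from the second half
ord    <- as.vector(rbind(first, second))

# Write the indices as one comma-separated line, then read them back with scan()
order_file <- tempfile(fileext = ".txt")
writeLines(paste(ord, collapse = ","), order_file)
test <- scan(order_file, sep = ",", quiet = TRUE)

df <- all_reps[, test]
```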
How can a data frame be transformed into a string with a csv format on R?
I don't want to write a csv into a file, but to get a string representation of the dataframe with a csv format (to send it over the network). I'm using R.NET, if it helps to know.
If you are not limited to base functions, you may try readr::format_csv.
library(readr)
format_csv(iris[1:2, 1:3])
# [1] "Sepal.Length,Sepal.Width,Petal.Length\n5.1,3.5,1.4\n4.9,3.0,1.4\n"
If you want a single string in csv format, you could capture the output from write.csv. Let's use mtcars as an example:
paste(capture.output(write.csv(mtcars)), collapse = "\n")
This reads back into R fine with read.csv(text = ..., row.names = 1). You can make adjustments for the printing of row names and other attributes in write.csv.
Alternatively:
con <- textConnection("output", "w")
write.csv(mtcars, con, row.names=FALSE)
close(con)
This creates the variable output in the global environment and stores the CSV lines in it as a character vector. You can then do paste0(output, collapse="\n") to make it one big character string, similar to Rich's answer (but paste0() is marginally faster).
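A small sketch comparing the two base approaches on a made-up two-row data frame, and checking that the string parses back with read.csv(text = ...). Here local = TRUE is passed to textConnection() so that output is created in the current environment instead of the global one; that is a choice for this sketch, not a requirement.

```r
df <- data.frame(a = c(1, 2), b = c("x", "y"), stringsAsFactors = FALSE)

# Approach 1: capture what write.csv() would print to the console
csv_string <- paste(capture.output(write.csv(df, row.names = FALSE)),
                    collapse = "\n")

# Approach 2: write into a text connection, then close it
con <- textConnection("output", "w", local = TRUE)
write.csv(df, con, row.names = FALSE)
close(con)
csv_string2 <- paste0(output, collapse = "\n")

# Round trip: the string parses back into an equivalent data frame
back <- read.csv(text = csv_string)
```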
Vectorise an imported variable in R
I have imported a CSV file to R but now I would like to extract a variable into a vector and analyse it separately. Could you please tell me how I could do that? I know that the summary() function gives a rough idea but I would like to learn more. I apologise if this is a trivial question but I have watched a number of tutorial videos and have not seen that anywhere.
Read the data into a data frame using read.csv. Get the names of the data frame; they should be the names of the CSV columns unless you've done something wrong. Use dollar notation to get vectors by name. Try reading some tutorials instead of watching videos, then you can try stuff out.
d = read.csv("foo.csv")
names(d)
v = d$whatever # for example
hist(v) # for example
This is totally trivial stuff.
I assume you have used the read.csv() or the read.table() function to import your data into R. (You can get help directly in R with ?, e.g. ?read.csv.) So normally you have a data.frame, and if you check the documentation, a data.frame is described as "[...] tightly coupled collections of variables which share many of the properties of matrices and of lists [...]". So each column of your data is already a vector that you can work with directly.
A quick search on SO gave back these two posts, among others: Converting a dataframe to a vector (by rows) and Extract Column from data.frame as a Vector. I am sure there are more relevant ones.
Try some good tutorials on R (videos are not so formative in this case). There are a ton of good ones on the Internet, e.g.:
* http://www.introductoryr.co.uk/R_Resources_for_Beginners.html (which lists some) or
* http://tryr.codeschool.com/
Anyway, one way to deal with your csv would be:
#import the data to R as a data.frame
mydata = read.csv(file="SomeFile.csv", header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "")
#extract a column to a vector
firstColumn = mydata$col1 # extract the column named "col1" of mydata to a vector
#This previous line is equivalent to:
firstColumn = mydata[,"col1"]
#extract a row
firstline = mydata[1,] #extract the first row of mydata
Edit: mydata[1,] returns a one-row data.frame rather than a vector, which matters in some cases[1]. To get a plain vector, coerce the row with functions such as as.numeric or as.character:
firstline = as.numeric(mydata[1,]) #extract the first row of mydata to a vector
#Note: the entire row *has to be* numeric or compatible with that class
[1] e.g. it happened to me when I wanted to extract a row of a data.frame inside a nested function
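A small self-contained sketch of the column and row extraction described above. The CSV contents and the column names col1/col2 are made up for illustration, and unlist() is used here as one way to turn the one-row data frame into a named vector.

```r
# Hypothetical CSV written to a temp file so the example is self-contained
csv_file <- tempfile(fileext = ".csv")
writeLines(c("col1,col2", "1,10", "2,20", "3,30"), csv_file)

mydata <- read.csv(csv_file)

v1 <- mydata$col1          # dollar notation
v2 <- mydata[["col1"]]     # same column, selected by name
v3 <- mydata[, "col1"]     # matrix-style indexing also returns a vector

row_df  <- mydata[1, ]     # still a (one-row) data frame
row_vec <- unlist(row_df)  # named vector: col1 = 1, col2 = 10

summary(v1)
hist(v1)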
Importing csv file into R - numeric values read as characters
I am aware that there are similar questions on this site; however, none of them seem to answer my question sufficiently. This is what I have done so far: I have a csv file which I open in Excel. I manipulate the columns algebraically to obtain a new column "A". I import the file into R using read.csv(), and the entries in column A are stored as factors; I want them to be stored as numeric. I found this question on the topic: Imported a csv-dataset to R but the values becomes factors. Following the advice, I included stringsAsFactors = FALSE as an argument in read.csv(); however, as Hong Ooi suggested on the page linked above, this doesn't cause the entries in column A to be stored as numeric values. A possible solution is to use the advice given on the following page: How to convert a factor to an integer\numeric without a loss of information? However, I would like a cleaner solution, i.e. a way to import the file so that the entries of column A are stored as numeric values. Cheers for any help!
Whatever algebra you are doing in Excel to create the new column could probably be done more effectively in R. Please try the following: read the raw file (before any Excel manipulation) into R using read.csv(... stringsAsFactors=FALSE). [If that does not work, please take a look at ?read.table (which read.csv wraps); however, there may be some other underlying issue.] For example:
delim = "," # or is it "\t" ?
dec = "." # or is it "," ?
myDataFrame <- read.csv("path/to/file.csv", header=TRUE, sep=delim, dec=dec, stringsAsFactors=FALSE)
Then, let's say your numeric column is column 4:
myDataFrame[, 4] <- as.numeric(myDataFrame[, 4]) # you can also refer to the column by "itsName"
Lastly, if you need any help with accomplishing in R the same tasks that you've done in Excel, there are plenty of folks here who would be happy to help you out.
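If the column has already been read in as a factor (the default before R 4.0.0), calling as.numeric() on it returns the internal level codes rather than the values. A short sketch of the usual conversion idiom (also given in R FAQ 7.10), with made-up values:

```r
# Hypothetical numeric-looking column that was read in as a factor
f <- factor(c("10", "5", "7.5"))

codes  <- as.numeric(f)                # level codes 1, 2, 3, NOT the values
vals   <- as.numeric(as.character(f))  # the actual numbers: 10, 5, 7.5
vals2  <- as.numeric(levels(f))[f]     # same result, converts each level once
```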
In read.table (and its relatives) it is the na.strings argument that specifies which strings are to be interpreted as missing values (NA). The default value is na.strings = "NA". If missing values in an otherwise numeric column are coded as something other than "NA", e.g. "." or "N/A", those entries cannot be parsed as numbers, so the whole column is read as character (or as factor, if stringsAsFactors = TRUE). Thus, if your missing values are coded as something other than "NA", you need to specify them in na.strings.
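A minimal sketch of the effect, using inline made-up data in which missing values are coded as "N/A" and ".":

```r
# Hypothetical data: missing values coded as "N/A" and "." instead of "NA"
csv_text <- "id,value\n1,3.5\n2,N/A\n3,.\n4,7"

bad  <- read.csv(text = csv_text)
good <- read.csv(text = csv_text, na.strings = c("NA", "N/A", "."))

class(bad$value)   # character (factor before R 4.0.0)
class(good$value)  # numeric
```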
If you're dealing with large datasets (i.e. datasets with a high number of columns), the solutions noted above can be cumbersome to apply manually, and they require you to know a priori which columns are numeric. Try this instead:
char_data <- read.csv(input_filename, stringsAsFactors = F)
num_data <- as.data.frame(lapply(char_data, function(x) suppressWarnings(as.numeric(x))))
numeric_columns <- sapply(num_data, function(x) mean(is.na(x)) < 0.5)
final_data <- data.frame(num_data[, numeric_columns, drop = FALSE], char_data[, !numeric_columns, drop = FALSE])
The code does the following:
Imports your data, keeping text columns as character.
Creates a copy of your data with every column converted to numeric (entries that cannot be parsed become NA).
Identifies which columns are numeric (assuming that columns with less than 50% NAs after the conversion are indeed numeric).
Combines the numeric and character columns into a final dataset (numeric columns first).
This essentially automates the import of your .csv file while preserving the data types of the original columns (character and numeric).
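A self-contained run of this heuristic on a small inline CSV (the data and column names are made up). It uses lapply() with as.numeric() rather than data.matrix(), because data.matrix() converts character columns to factor codes instead of NA, which would make every column look numeric. The 50% threshold is the same assumption as above.

```r
csv_text <- "id,score,label,weight\n1,3.5,a,10\n2,N/A,b,12\n3,4.0,c,n/a"
char_data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# as.numeric() turns unparseable entries into NA (warnings suppressed on purpose)
num_data <- as.data.frame(lapply(char_data,
                                 function(x) suppressWarnings(as.numeric(x))))

# Treat a column as numeric if fewer than half of its values became NA
numeric_columns <- sapply(num_data, function(x) mean(is.na(x)) < 0.5)

final_data <- data.frame(num_data[, numeric_columns, drop = FALSE],
                         char_data[, !numeric_columns, drop = FALSE],
                         stringsAsFactors = FALSE)
str(final_data)
```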
Including this in the read.csv command worked for me: strip.white = TRUE (I found this solution here.)
A version for data.table, based on the code from dmanuge:
library(data.table)
convNumValues <- function(ds) {
  ds <- data.table(ds)
  dsnum <- ds[, lapply(.SD, function(x) suppressWarnings(as.numeric(x)))]
  num_cols <- sapply(dsnum, function(x) mean(is.na(x)) < 0.5)
  nds <- data.table(
    dsnum[, .SD, .SDcols = names(num_cols)[num_cols]],
    ds[, .SD, .SDcols = names(num_cols)[!num_cols]]
  )
  return(nds)
}
I had a similar problem. Based on Joshua's premise that Excel was the problem, I looked at the file and found that the numbers were formatted with commas as thousands separators (between every third digit). Reformatting them without commas fixed the problem.
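If re-saving from Excel is not convenient, the separators can also be stripped in R after import. A sketch, assuming the commas are only thousands separators (not decimal marks), with made-up values:

```r
# Hypothetical CSV in which Excel wrote quoted numbers with thousands separators
csv_text <- 'id,amount\n1,"1,234"\n2,"56,789.5"\n3,12'

raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(raw$amount)  # character, because of the embedded commas

# Drop the separators, then convert to numeric
raw$amount <- as.numeric(gsub(",", "", raw$amount, fixed = TRUE))
```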
I had a similar situation with my data file when I read it in as a CSV: all the numeric values were turned into char. In my file there was a value with the word "Filtered" instead of NA. I converted "Filtered" to NA in the vim editor in a Linux terminal with the command %s/Filtered/NA/g, saved the file, and read it into R again: all the values were now num type instead of char type. It looks like the character value "Filtered" was forcing the values in its column to be read as char. Charu
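The same substitution can also be done from inside R, without editing the file. A sketch, assuming "Filtered" is the only non-numeric marker; the file and its contents are made up:

```r
# Hypothetical file in which "Filtered" stands in for missing values
f <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2.5", "Filtered,3", "4,Filtered"), f)

# R equivalent of the vim command %s/Filtered/NA/g
txt <- gsub("Filtered", "NA", readLines(f), fixed = TRUE)

# read.csv() can parse the modified lines directly via its text argument
data <- read.csv(text = txt)
str(data)
```

Passing na.strings = c("NA", "Filtered") to read.csv() achieves the same result without rewriting any text.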
Hello @Shawn Hemelstrand, here are the steps in detail, using an example matrix file, file.csv, that contains the word 'Filtered':
1. Open file.csv in vim from a Linux terminal: vi file.csv
2. Press Esc, type : (Shift+;), enter the command %s/Filtered/NA/g at the bottom, and press Enter.
3. Press Esc, type : again, then wq and press Enter (this saves the file and quits vim).
4. In the R script, read the file:
data <- read.csv("file.csv", sep = ',', header = TRUE)
str(data)
All the columns that were previously char type were now num type. If you need more help, it would be easier if you shared your txt or csv file.