I made a table like this (the table's name is a):
variable relative_importance scaled_importance percentage
1 x005 68046.078125 1.000000 0.195396
2 x004 63890.796875 0.938934 0.183464
3 x007 48253.820312 0.709134 0.138562
4 x012 43492.117188 0.639157 0.124889
5 x008 43132.035156 0.633865 0.123855
6 x013 32495.070312 0.477545 0.093310
7 x009 18466.910156 0.271388 0.053028
8 x015 10625.453125 0.156151 0.030511
9 x010 8893.750977 0.130702 0.025539
10 x014 4904.361816 0.072074 0.014083
11 x002 1812.269531 0.026633 0.005204
12 x001 1704.574585 0.025050 0.004895
13 x006 1438.692139 0.021143 0.004131
14 x011 1080.584106 0.015880 0.003103
15 x003 10.152302 0.000149 0.000029
and used this code to order the table:
setorder(a, variable)
Now I want to get only the second column:
a[2]
relative_importance
12 380.4296
11 645.4594
15 10.1440
4 8599.7715
2 10749.5752
13 263.7065
5 8434.3760
6 7443.8530
7 3602.8850
10 935.6713
14 256.7183
3 9160.4062
1 12071.1826
9 1173.0701
8 1698.0955
I want to copy "relative_importance" and paste it into Excel.
But I couldn't delete the row names (12, 11, 15, ..., 9, 8).
Is there any way to print only "relative_importance" (i.e., print without row names, or hide them)?
Thank you :)
You could simply use writeClipboard(as.character(a$relative_importance)) and paste it into Excel.
You could create a CSV file, which you can open with Excel:
write.csv(a[2], "myfile.csv", row.names = FALSE)
Note that write.csv() ignores col.names (it warns if you try to set it); row.names = FALSE is what drops the row numbers. The file will be created in your current working directory, which you can find by running getwd().
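If all you want is to print the bare values with nothing else attached, cat() or writeLines() also work; a minimal sketch using a stand-in vector in place of a$relative_importance:

```r
# Stand-in values for a$relative_importance from the question
relative_importance <- c(68046.078125, 63890.796875, 48253.820312)

# cat() prints the bare values, one per line, with no row names attached
cat(relative_importance, sep = "\n")
```

From there the printed values can be copied straight into Excel.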
On a different note, are you trying to get the column into Excel for further analysis? If you are, I encourage you to learn how to do that analysis in R.
Say I have two files, file1.txt and file2.txt, that look like this:
file1.txt
blablabla
lorem ipsum
year: 2007
Jan Feb Mar
1 2 3
4 5 6
file2.txt
blablabla
lorem ipsum
year: 2008
Jan Feb Mar
7 8 9
10 11 12
I can read these files with purrr::map_df(c("file1.txt", "file2.txt"), read_table, skip = 3).
But what I want to do is extract the year from each file and assign it on a new year column so that my final dataframe looks like this:
Jan Feb Mar Year
1 2 3 2007
4 5 6 2007
7 8 9 2008
10 11 12 2008
I am looking at something along the lines of using readr::read_lines first and then readr::read_table via rlang::exec, but I don't know exactly how to do this.
Base R implements streaming connections with readLines:
f <- function(path) {
  ## Open connection and close on exit
  zzz <- file(path, open = "rt")
  on.exit(close(zzz))
  ## Read first three lines into character vector and extract year
  y <- as.integer(gsub("\\D", "", readLines(zzz, n = 3L)[3L]))
  ## Read remaining lines into data frame
  d <- read.table(zzz, header = TRUE)
  d$Year <- y
  d
}
nms <- c("file1.txt", "file2.txt")
do.call(rbind, lapply(nms, f))
Jan Feb Mar Year
1 1 2 3 2007
2 4 5 6 2007
3 7 8 9 2008
4 10 11 12 2008
It's not clear to me that readr has this functionality:
library("readr")
zzz <- file("file1.txt", open = "rb")
read_lines(zzz, skip = 2L, n_max = 1L)
## [1] "year: 2007"
read_table(zzz)
## # A tibble: 0 × 0
close(zzz)
Even though we only asked read_lines for the third line of file1.txt, it seems to have (invisibly) read all of the lines, leaving nothing for read_table.
On the other hand, this GitHub issue was "fixed" last year, so it is strange not to see support for streaming connections in the latest release version of readr. Maybe I'm missing something...?
Another solution is to use the id argument of read_csv(), which in your example creates a new column containing the file name (e.g. "file1.txt"), showing which file each row came from.
Note that you don't need map_df(): you can pass the vector of file names directly to read_csv(), and it will read each file and compile the results into one data frame.
From there, you can create a second data frame holding just the third row of each file (e.g. "year: 2007"), again with the id argument, this time using the skip and n_max arguments so that you only pull in that row.
With those two data frames, you can then left-join on the column set by the id argument to pull that row in.
You will still need to extract the "year" text, which can easily be done with the str_extract() function.
file_path <- c("file1.txt", "file2.txt")
df_missing_year <- readr::read_csv(file = file_path, id = "source", skip = 3)
df_year_only <- readr::read_csv(file = file_path, id = "source", skip = 2, n_max = 1)
df_complete <- dplyr::left_join(x = df_missing_year, y = df_year_only, by = "source")
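The year extraction mentioned in the last step could look like the following sketch; it uses a base-R equivalent of stringr::str_extract() to stay self-contained:

```r
# Pull the 4-digit year out of a string like "year: 2007"
# (base-R equivalent of stringr::str_extract(year_string, "\\d{4}"))
year_string <- "year: 2007"
year <- as.integer(regmatches(year_string, regexpr("[0-9]{4}", year_string)))
```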
I tried to write a data frame to a txt file without a header, but it kept adding column names. When I open the file directly from the drive, it has 21 rows and no header; but when I open it with read.delim(), I see a header with some symbols in it.
Here is the code
write.table(trans_sequence, file="mytxtout.txt", sep=";", col.names =FALSE, row.names = FALSE,
quote = FALSE)
When I retrieve the data using read.delim(), it looks like below. It should have 21 rows, but the top row was made into a column name, leaving 20 rows. The first row should be:
2745;9;2;HbA1c;LDL-C Tests
But it created a header instead:
read.delim("mytxtout.txt")
X2745.9.2.HbA1c.LDL.C.Tests
1 10433;9;2;BMI;Blood Pressure
2 13601;0;1;LDL-C Tests
3 13601;6;1;LDL-C Tests
4 36127;2;2;BMI;Blood Pressure
5 36127;5;1;Blood Pressure
6 36127;9;2;BMI;Blood Pressure
7 36127;10;2;BMI;Blood Pressure
8 54881;9;2;HbA1c;LDL-C Tests
9 59650;0;2;BMI;Blood Pressure
10 59650;3;2;BMI;Blood Pressure
11 66741;0;1;LDL-C Tests
12 72772;3;1;LDL-C Tests
13 77618;2;3;BMI;BMI Percentile;Blood Pressure
14 77618;3;2;BMI;BMI Percentile
15 81397;4;1;BMI
16 81397;6;2;BMI;Blood Pressure
17 81397;9;2;BMI;Blood Pressure
18 81397;9;1;BMI
19 83520;6;3;BMI;BMI Percentile;Blood Pressure
20 85178;10;1;LDL-C Tests
Any help will be greatly appreciated
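For reference, the mangled header comes from read.delim()'s defaults: header = TRUE promotes the first data row to column names, and make.names() turns the ; and - characters into dots. A minimal sketch of the round trip with header = FALSE and a matching sep (using a small stand-in data frame):

```r
# Small stand-in for trans_sequence, written out as in the question
trans_sequence <- data.frame(id = c(2745, 10433), week = c(9, 9),
                             measure = c("HbA1c", "BMI"))
f <- tempfile(fileext = ".txt")
write.table(trans_sequence, file = f, sep = ";",
            col.names = FALSE, row.names = FALSE, quote = FALSE)

# header = FALSE keeps the first line as data; sep = ";" matches the writer
x <- read.delim(f, header = FALSE, sep = ";")
nrow(x)  # both rows survive, no row is consumed as a header
```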
I have a small issue with a dataset I am using. Suppose I have a dataset called mergedData2, defined from a subset of mergedData with these commands:
mergedData=rbind(test_set,training_set)
lookformean<-grep("mean()",names(mergedData),fixed=TRUE)
lookforstd<-grep("std()",names(mergedData),fixed=TRUE)
varsofinterests<-sort(c(lookformean,lookforstd))
mergedData2<-mergedData[,c(1:2,varsofinterests)]
If I do names(mergedData2), I get:
[1] "volunteer_identifier" "type_of_experiment"
[3] "body_acceleration_mean()-X" "body_acceleration_mean()-Y"
[5] "body_acceleration_mean()-Z" "body_acceleration_std()-X"
(I take these first 6 names as an MWE, but I have a vector of 68 names.)
Now, suppose I want to take the average of each of the measurements per volunteer_identifier and type_of_experiment. For this, I used a combination of split and lapply:
mylist<-split(mergedData2,list(mergedData2$volunteer_identifier,mergedData2$type_of_experiment))
average_activities<-lapply(mylist,function(x) colMeans(x))
average_dataset<-t(as.data.frame(average_activities))
As average_activities is a list, I converted it into a data frame and transposed it to keep the same format as mergedData and mergedData2. The problem is the following: when I call names(average_dataset), it returns NULL! But, more strangely, when I do head(average_dataset), it returns:
volunteer_identifier type_of_experiment body_acceleration_mean()-X body_acceleration_mean()-Y
1 1 0.2773308 -0.01738382
2 1 0.2764266 -0.01859492
3 1 0.2755675 -0.01717678
4 1 0.2785820 -0.01483995
5 1 0.2778423 -0.01728503
6 1 0.2836589 -0.01689542
This is just a small sample of the output, to show that the variable names are there. So why does names(average_dataset) return NULL?
Thanks in advance for your reply.
EDIT: Here is an MWE for mergedData2:
volunteer_identifier type_of_experiment body_acceleration_mean()-X body_acceleration_mean()-Y
1 2 5 0.2571778 -0.02328523
2 2 5 0.2860267 -0.01316336
3 2 5 0.2754848 -0.02605042
4 2 5 0.2702982 -0.03261387
5 2 5 0.2748330 -0.02784779
6 2 5 0.2792199 -0.01862040
body_acceleration_mean()-Z body_acceleration_std()-X body_acceleration_std()-Y body_acceleration_std()-Z
1 -0.01465376 -0.9384040 -0.9200908 -0.6676833
2 -0.11908252 -0.9754147 -0.9674579 -0.9449582
3 -0.11815167 -0.9938190 -0.9699255 -0.9627480
4 -0.11752018 -0.9947428 -0.9732676 -0.9670907
5 -0.12952716 -0.9938525 -0.9674455 -0.9782950
6 -0.11390197 -0.9944552 -0.9704169 -0.9653163
gravity_acceleration_mean()-X gravity_acceleration_mean()-Y gravity_acceleration_mean()-Z
1 0.9364893 -0.2827192 0.1152882
2 0.9274036 -0.2892151 0.1525683
3 0.9299150 -0.2875128 0.1460856
4 0.9288814 -0.2933958 0.1429259
5 0.9265997 -0.3029609 0.1383067
6 0.9256632 -0.3089397 0.1305608
gravity_acceleration_std()-X gravity_acceleration_std()-Y gravity_acceleration_std()-Z
1 -0.9254273 -0.9370141 -0.5642884
2 -0.9890571 -0.9838872 -0.9647811
3 -0.9959365 -0.9882505 -0.9815796
4 -0.9931392 -0.9704192 -0.9915917
5 -0.9955746 -0.9709604 -0.9680853
6 -0.9988423 -0.9907387 -0.9712319
My goal is to get this average_dataset, a dataset which contains the average value of each physical quantity (columns 3 and onwards) for each volunteer and type of experiment (e.g. 1 1 mean1 mean2 mean3 ... mean68, 2 1 mean1 mean2 mean3 ... mean68, etc.).
After this I will have to export it as a txt file (so I think using write.table with row.names = FALSE and col.names = TRUE). Note that for now, if I do this and re-import the generated file using read.table, I don't recover the column names of the dataset, even while specifying col.names = TRUE.
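For what it's worth, the NULL comes from t(): transposing a data frame returns a matrix, and a matrix stores its labels in dimnames (accessed via rownames()/colnames()), not in names(). A small sketch of the distinction:

```r
df <- data.frame(a = 1:2, b = 3:4)
m <- t(df)     # t() returns a matrix, not a data frame

names(m)       # NULL: a matrix has no names() attribute
rownames(m)    # "a" "b": the old column names moved into dimnames
colnames(m)    # the old row names
```

So on the transposed object, colnames(average_dataset) is the call that returns the variable names.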
I am having problems with the following code (I'm a beginner, so please go easy on me):
COW$id<- (COW$tcode1*1000 + COW$tcode2)
COW$id<- (COW$tcode2*1000 + COW$tcode1)
I want the first line of code to be executed on the condition that the value of tcode1 (a variable in COW dataframe) is less than tcode2 (tcode1 < tcode2), and I want the second line of code to be executed if tcode1 is greater than tcode2 (tcode1 > tcode2). The end result I am looking for is a single column "ID" in my dataframe, on the basis of the conditions above. Does anyone know how to achieve this?
COW = data.frame(tcode1=c(5,7,18,9),tcode2=c(4,15,8,10))
head(COW)
  tcode1 tcode2
1      5      4
2      7     15
3     18      8
4      9     10
id <- ifelse(COW$tcode1 < COW$tcode2,
             COW$tcode1 * 1000 + COW$tcode2,
             COW$tcode2 * 1000 + COW$tcode1)
COW <- data.frame(id = id, COW)
head(COW)
    id tcode1 tcode2
1 4005      5      4
2 7015      7     15
3 8018     18      8
4 9010      9     10
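An equivalent way to express the same condition is pmin()/pmax(), which makes the "smaller code in the thousands place" intent explicit; a sketch on the same sample data:

```r
COW <- data.frame(tcode1 = c(5, 7, 18, 9), tcode2 = c(4, 15, 8, 10))

# The smaller code always lands in the thousands place and the larger in the
# units, which is exactly what the ifelse() version computes
COW$id <- pmin(COW$tcode1, COW$tcode2) * 1000 + pmax(COW$tcode1, COW$tcode2)
COW$id  # 4005 7015 8018 9010
```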
I have a cross-correlation function, crosscor, and I would like to loop the function over each of the columns in my data matrix. Each time it is run, the function outputs a cross-correlation table that looks something like this:
Lags Cross.Correlation P.value
1 0 -0.0006844958 0.993233547
2 1 0.1021006478 0.204691627
3 2 0.0976746274 0.226628526
4 3 0.1150337867 0.155426784
5 4 0.1943150900 0.016092041
6 5 0.2360415470 0.003416147
7 6 0.1855274375 0.022566685
8 7 0.0800646242 0.330081900
9 8 0.1111071269 0.177338885
10 9 0.0689602574 0.404948252
11 10 -0.0097332533 0.906856279
12 11 0.0146241719 0.860926388
13 12 0.0862549791 0.302268025
14 13 0.1283308019 0.125302070
15 14 0.0909537922 0.279988895
16 15 0.0628012627 0.457795228
17 16 0.1669241304 0.047886605
18 17 0.2019811994 0.016703619
19 18 0.1440124960 0.090764520
20 19 0.1104842808 0.197035340
21 20 0.1247428178 0.146396407
I would like to put all of the outputs together in one data frame, and ultimately export it to a csv file, with the columns as follows: Lags.3, Cross.Correlation.3, P.value.3, Lags.4, Cross.Correlation.4, ... and so on up to P.value.50.
I have tried to use do.call as follows, but have not been successful:
for (i in 3:50) {
  l1 <- crosscor(data[, 2], data[, i], lagmax = 20)
  ccdata <- do.call(rbind, l1)
  cat("Data row", i)
}
I've also tried just creating the data frame directly, but I'm only getting the lag column names:
ccdata <- data.frame()
for (i in 3:50) {
  ccdata[i-2:i+1] <- crosscor(data[, 2], data[, i], lagmax = 20)
  cat("Data row", i)
}
What am I doing wrong? Or is there an online resource with example data sets I could use to figure this out? Best,
There is a transpose method for data frames. If "crosscor" is the name of the object, just try this:
tcrosscor <- t(crosscor)
write.csv(tcrosscor, file="my_crosscor_1.csv")
The first row would be the Lags; the second row, the Cross.Correlations; the third row, the P.values. I suppose you could "flatten" it further so it would be entirely "horizontal" or "wide". That seems painful, but it might go something like this:
single_line <- as.data.frame(t(unlist(tcrosscor)))
names(single_line) <- paste(c("Lag", "Cross.Correlation", "P.value"), rep(1:21, each = 3), sep = ".")
write.csv(single_line, file="my_single_1.csv")
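Going back to the original loop, the problem in the question's for loop is that ccdata is overwritten on every iteration. A pattern that keeps every result is to collect them in a list and bind once at the end; a sketch, where crosscor and data are hypothetical stand-ins with the same shape as in the question:

```r
set.seed(1)
data <- matrix(rnorm(50 * 6), ncol = 6)  # toy stand-in for the asker's data matrix

# Hypothetical stand-in for crosscor(): same Lags/Cross.Correlation/P.value shape
crosscor <- function(x, y, lagmax = 20) {
  data.frame(Lags = 0:lagmax,
             Cross.Correlation = runif(lagmax + 1),
             P.value = runif(lagmax + 1))
}

# Run once per column, suffix the names with the column index, bind side by side
results <- lapply(3:6, function(i) {
  cc <- crosscor(data[, 2], data[, i], lagmax = 20)
  names(cc) <- paste(names(cc), i, sep = ".")
  cc
})
ccdata <- do.call(cbind, results)
# write.csv(ccdata, "crosscor_all.csv", row.names = FALSE)
```

With 3:50 instead of 3:6, this produces exactly the Lags.3, Cross.Correlation.3, P.value.3, ... P.value.50 layout the question asks for.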