R - get a single column from many columns


I have wavelengths from 350 to 2500, and each one has data:
x350 x351 x352 x353 x354 ...... x2500
0.18 0.17 0.17 0.17 0.16 ...... 0.3
0.16 0.15 0.15 0.15 0.15 ...... 0.47
0.14 0.14 0.13 0.13 0.13 ...... 0.35
I need to stack them into one column, without the wavelength names, and give this new column a name:
Wave
0.18
0.16
0.14
0.17
0.15
0.14
0.17
0.15
0.13
0.16
0.15
0.13
.
.
.
0.3
0.47
0.35
m is my data frame, and the wavelength columns run from column 17 to column 2167. I tried:
a <- list(m[1:16,17:2167])
but I get the list with the names of the columns in between:
list(structure(list(X350 = c(0.15723315, 0.138406682, 0.174909807,
0.143139974, 0.123193808, 0.154449448, 0.163255619, 0.126194713,
0.14327512, 0.066265248, 0.139851395, 0.158271497, 0.158060045,
0.145313933, 0.143890661), X351 = c(0.154324452, 0.135509959,
0.173350322, 0.139867145, 0.121439474, 0.15276091, 0.160391152,
0.125592826, 0.140349489, 0.065316491, 0.137927937, 0.158400317,
0.156211611, 0.142498763, 0.141353986), X352 = c(0.151243533....
How can I get just one column with one name from the 2151 wavelength columns (columns 17 to 2167)?
More info
str(m)
'data.frame': 16 obs. of 2167 variables:
$ pott : int 48 49 50 51 52 53 54 55 56 57 ...
$ b : chr "B1" "B1" "B1" "B1" ...
$ F : int 1 1 1 1 1 1 1 1 1 1 ...
$ G : chr "Sunstar" "Quarrion" "Nacozari" "W130114" ...
$ R : int 3 3 3 3 3 3 3 3 3 3 ...
$ D : int 80 80 81 80 81 80 82 82 82 82 ...
$ W: num 1.8 1.5 1.3 1.9 1.8 1.25 1.85 2.1 1.6 2.4 ...
$ S : num 43.4 35.7 44.7 48.6 45.3 35.5 49.2 49.1 46.8 41.5 ...
$ R : num -0.327 1.149 2.348 1.636 1.952 ...
$ V : num 76.4 49 118.9 108 114.5 ...
$ J : num 158 114 191 169 183 ...
$ P: num 19.9 10.6 24.1 21.1 23.6 ...
$ Ce : num 0.367 0.13 0.466 0.36 0.462 ...
$ Ci : num 273 246 280 263 272 ...
$ S : num 23.5 29 30.9 29.4 24.1 ...
$ L : num 42.5 34.4 32.4 34 41.4 ...
$ X350 : num 0.176 0.157 0.138 0.175 0.143 ...
$ X351 : num 0.172 0.154 0.136 0.173 0.14 ...
$ X352 : num 0.169 0.151 0.133 0.172 0.138 ...
$ X353 : num 0.167 0.147 0.132 0.17 0.137 ...
$ X354 : num 0.165 0.147 0.13 0.167 0.133 ...
$ X355 : num 0.162 0.146 0.127 0.166 0.13 ...
$ X356 : num 0.159 0.144 0.126 0.164 0.128 ...
$ X357 : num 0.158 0.14 0.125 0.161 0.125 ...
$ X358 : num 0.155 0.138 0.123 0.159 0.124 ...
$ X359 : num 0.153 0.137 0.121 0.157 0.123 ...
$ X360 : num 0.15 0.135 0.12 0.154 0.122 ...
....$2500

I guess your data are in a text file:
data <- read.table("your_file", header = TRUE, quote = "\"")
so data will look like
structure(list(x350 = c(0.18, 0.16, 0.14), x351 = c(0.17, 0.15,
0.14), x352 = c(0.17, 0.15, 0.13), x353 = c(0.17, 0.15, 0.13)), .Names = c("x350",
"x351", "x352", "x353"), class = "data.frame", row.names = c(NA,
-3L))
and
result <- data.frame(Wave = unlist(data,use.names=FALSE))
will produce
Wave
1 0.18
2 0.16
3 0.14
4 0.17
5 0.15
6 0.14
7 0.17
8 0.15
9 0.13
10 0.17
11 0.15
12 0.13
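In the question, the data frame m also holds 16 non-wavelength columns, so calling unlist() on the whole frame would mix those in. A minimal sketch of restricting the stacking to the wavelength columns follows. The toy data frame and the name pattern are assumptions for illustration; with the real data, m[, 17:2167] should select the same block by position.

```r
# Toy stand-in for `m`: two metadata columns followed by three wavelength columns.
m <- data.frame(
  pott = 48:50,
  G    = c("Sunstar", "Quarrion", "Nacozari"),
  X350 = c(0.18, 0.16, 0.14),
  X351 = c(0.17, 0.15, 0.14),
  X352 = c(0.17, 0.15, 0.13)
)

# Pick the wavelength columns by name: "X" followed by digits only.
# (Assumption: no metadata column is named this way.)
wave_cols <- grep("^X[0-9]+$", names(m))

# unlist() stacks the selected columns one after another, column by column.
result <- data.frame(Wave = unlist(m[, wave_cols], use.names = FALSE))
```

Here result has one column named Wave with 3 x 3 = 9 rows, ordered X350 first, then X351, then X352.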

Related

Problem trying to convert a characterfile in R -- I can't seem to get as.numeric() to work right

Here is the setup:
I have read in a data file called "abalone.data" from the Machine Learning Repository:
dat=read.csv(file="abalone.data",header=FALSE)
colnames(dat)<-c('Sex','Length','Diameter','Height','Whole weight',
'Shucked wieght','Viscera weight','Shell weight','Rings')
Here is a sample:
head(dat)
Sex Length Diameter Height Whole weight Shucked wieght Viscera weight Shell weight Rings
1 M 0.455 0.365 0.095 0.5140 0.2245 0.1010 0.150 15
2 M 0.350 0.265 0.090 0.2255 0.0995 0.0485 0.070 7
3 F 0.530 0.420 0.135 0.6770 0.2565 0.1415 0.210 9
And here is the structure:
str(dat)
'data.frame': 4177 obs. of 9 variables:
$ Sex : chr "M" "M" "F" "M" ...
$ Length : num 0.455 0.35 0.53 0.44 0.33 0.425 0.53 0.545 0.475 0.55 ...
$ Diameter : num 0.365 0.265 0.42 0.365 0.255 0.3 0.415 0.425 0.37 0.44 ...
$ Height : num 0.095 0.09 0.135 0.125 0.08 0.095 0.15 0.125 0.125 0.15 ...
$ Whole weight : num 0.514 0.226 0.677 0.516 0.205 ...
$ Shucked wieght: num 0.2245 0.0995 0.2565 0.2155 0.0895 ...
$ Viscera weight: num 0.101 0.0485 0.1415 0.114 0.0395 ...
$ Shell weight : num 0.15 0.07 0.21 0.155 0.055 0.12 0.33 0.26 0.165 0.32 ...
$ Rings : int 15 7 9 10 7 8 20 16 9 19 ...
Here is the problem:
I want to convert the first column to numeric; e.g. "M" to 1, "F" to 2 and "I" to 3.
So, I try
Sex <- as.numeric(dat$Sex)
but I get:
Sex<-as.numeric(dat$sex)
> Sex[1:5]
[1] NA NA NA NA NA
I've tried a lot of similar commands; e.g.:
as.numeric(dat$Sex=character(),levels=levels)
Error: unexpected '=' in " as.numeric(dat$Sex="
I cannot figure this out.
Please help

That's because the Sex variable is a character vector. You first need to change it to a factor:
dat$Sex <- as.numeric(factor(dat$Sex))
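One caveat: by default factor() sorts its levels, so the codes would come out as F = 1, I = 2, M = 3 rather than the mapping requested in the question. A minimal sketch of forcing M = 1, F = 2, I = 3 by giving the levels explicitly; the short sex vector is a made-up stand-in for dat$Sex:

```r
# Made-up stand-in for dat$Sex.
sex <- c("M", "M", "F", "I", "F")

# Explicit levels fix the code assigned to each label: M = 1, F = 2, I = 3.
sex_code <- as.integer(factor(sex, levels = c("M", "F", "I")))

# match() gives the same mapping without going through a factor.
sex_code_alt <- match(sex, c("M", "F", "I"))
```

With the real data this would be dat$Sex <- as.integer(factor(dat$Sex, levels = c("M", "F", "I"))).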

Subset DataFrame by one column then value in another column [duplicate]

I have a data.frame, TeamFourFactorsRAPM, with 44 columns. I want to subset it based on two columns:
column 41, teamName (the team name of every player in the NBA)
column 44, mp (how many minutes a player played throughout the season)
I want to get the 8 players with the most minutes played for every team.
'data.frame': 539 obs. of 44 variables:
$ playerId : int 101108 1628415 101150 1627783 1627846 1629620 1626153 1629641 1628021 1628035 ...
$ playerName : chr "Chris Paul" "Dillon Brooks" "Lou Williams" "Pascal Siakam" ...
$ LA_RAPM : num 1.37 1.33 -0.82 1.91 -0.45 -0.48 0.65 -0.7 0.96 -0.08 ...
$ LA_RAPM_Rank : int 37 42 463 20 386 400 104 439 73 268 ...
$ LA_RAPM__Def : num 0.58 0.84 -0.69 0.47 0.1 -0.33 0.44 0.11 0.4 0.05 ...
$ LA_RAPM__Def_Rank: int 76 39 486 100 198 401 102 197 112 237 ...
$ LA_RAPM__Off : num 0.79 0.49 -0.14 1.44 -0.55 -0.15 0.2 -0.81 0.56 -0.13 ...
$ LA_RAPM__Off_Rank: int 63 105 322 15 443 333 166 489 94 318 ...
$ RA_EFG : num 0.42 0.88 -0.57 0.48 0.09 -0.22 -1.08 0.05 -0.03 -0.02 ...
$ RA_EFG_Rank : int 117 37 460 99 207 357 518 229 259 258 ...
$ RA_EFG__Def : num 0.08 0.36 -0.17 -0.04 0.13 -0.09 -0.69 0.25 0.35 0.19 ...
$ RA_EFG__Def_Rank : int 208 84 387 297 169 346 522 110 88 132 ...
$ RA_EFG__Off : num 0.33 0.52 -0.4 0.52 -0.05 -0.13 -0.4 -0.2 -0.37 -0.21 ...
$ RA_EFG__Off_Rank : int 109 70 449 71 264 329 448 366 443 372 ...
$ RA_FTR : num -0.12 -0.24 0.55 0.26 -0.18 0.05 1.18 -0.42 0.39 -0.19 ...
$ RA_FTR_Rank : int 315 356 90 161 337 247 19 414 124 340 ...
$ RA_FTR__Def : num 0.53 -0.2 0.01 0.34 0.08 -0.04 0.69 -0.08 -0.09 -0.75 ...
$ RA_FTR__Def_Rank : int 61 373 241 102 201 276 41 302 315 522 ...
$ RA_FTR__Off : num -0.64 -0.04 0.53 -0.08 -0.26 0.09 0.48 -0.34 0.47 0.56 ...
$ RA_FTR__Off_Rank : int 504 289 53 305 397 218 65 430 68 49 ...
$ RA_ORBD : num -0.02 -0.3 -1.06 0.01 -0.63 -0.7 -1.59 -0.13 -0.67 0.68 ...
$ RA_ORBD_Rank : int 269 357 485 253 437 446 522 300 441 103 ...
$ RA_ORBD__Def : num 0.83 0.06 -0.82 -0.25 0.38 -0.78 -1.05 0.08 -0.73 0.01 ...
$ RA_ORBD__Def_Rank: int 40 236 506 373 123 502 524 227 496 264 ...
$ RA_ORBD__Off : num -0.85 -0.35 -0.23 0.26 -1.01 0.08 -0.54 -0.2 0.06 0.69 ...
$ RA_ORBD__Off_Rank: int 496 392 355 169 511 220 437 341 227 65 ...
$ RA_TOV : num 0.93 0.41 0.13 0.93 -0.24 -0.39 1.84 -0.51 0.1 0.51 ...
$ RA_TOV_Rank : int 11 82 191 12 412 457 1 482 211 60 ...
$ RA_TOV__Def : num 0.5 0.51 -0.27 0.4 0.09 -0.16 1.12 -0.28 0.17 0.36 ...
$ RA_TOV__Def_Rank : int 36 33 452 55 193 394 2 456 139 66 ...
$ RA_TOV__Off : num 0.43 -0.1 0.4 0.52 -0.32 -0.24 0.72 -0.23 -0.08 0.15 ...
$ RA_TOV__Off_Rank : int 51 358 62 26 466 438 10 433 336 154 ...
$ RAPM : num 1.67 1.84 -0.43 1.68 -0.02 -0.76 -0.02 -0.48 -0.08 0.21 ...
$ RAPM_Rank : int 44 34 356 43 248 421 246 368 274 197 ...
$ RAPM__Def : num 0.92 1 -0.64 0.22 0.5 -0.51 0.03 0.2 0.44 0.43 ...
$ RAPM__Def_Rank : int 53 41 457 177 112 440 248 189 121 122 ...
$ RAPM__Off : num 0.75 0.84 0.21 1.46 -0.53 -0.25 -0.04 -0.68 -0.53 -0.23 ...
$ RAPM__Off_Rank : int 94 81 179 38 406 342 250 440 407 331 ...
$ season : chr "2020-21" "2020-21" "2020-21" "2020-21" ...
$ teamId : int 1610612756 1610612763 1610612746 1610612761 1610612756 1610612760 1610612765 1610612738 1610612745 1610612747 ...
$ teamName : chr "PHX" "MEM" "LAC" "TOR" ...
$ primaryKey : chr "101108_2020-21" "1628415_2020-21" "101150_2020-21" "1627783_2020-21" ...
$ playerRole : chr "Ball Handler, Primary Playmaker" "Wing, Shooter" "Ball Handler, Primary Playmaker" "Wing, Playmaker" ...
$ mp : num 2199 1997 2846 2006 355 ...

If every player has only one row, this should work:
library(dplyr)
result <- TeamFourFactorsRAPM %>%
  group_by(teamName) %>%
  slice_max(mp, n = 8) %>%
  ungroup()
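For comparison, the same "top n per group" idea can be sketched in base R. The toy data frame and the n_top value are assumptions for illustration; the column names follow the question.

```r
# Toy stand-in for TeamFourFactorsRAPM with only the columns that matter here.
players <- data.frame(
  teamName   = c("PHX", "PHX", "PHX", "MEM", "MEM", "MEM"),
  playerName = c("A", "B", "C", "D", "E", "F"),
  mp         = c(2199, 355, 1200, 1997, 800, 1500)
)
n_top <- 2  # the question asks for 8; 2 keeps the toy example small

# Sort by team, then by minutes played in descending order.
ordered <- players[order(players$teamName, -players$mp), ]

# Number the rows within each team: 1 = most minutes, 2 = second most, ...
rank_in_team <- ave(seq_len(nrow(ordered)), ordered$teamName, FUN = seq_along)

# Keep the first n_top rows of every team.
top_players <- ordered[rank_in_team <= n_top, ]
```

One difference to be aware of: slice_max() keeps tied rows by default (with_ties = TRUE), so a team can return more than n rows, whereas this sketch always cuts at n_top and breaks ties by row order.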

ggbiplot graphical display in groups

I am learning to make biplots with the wine data set. How does R know that Barolo, Grignolino and Barbera are the wine.class values when we don't see a wine class column in the data set?
More details about the wine data set are in the following links
ggbiplot - how not to use the feature vectors in the plot
https://github.com/vqv/ggbiplot
Thanks very much

The wine dataset contains two objects. One is a data.frame, wine, with 178 observations of 13 quantitative variables:
str(wine)
'data.frame': 178 obs. of 13 variables:
$ Alcohol : num 14.2 13.2 13.2 14.4 13.2 ...
$ MalicAcid : num 1.71 1.78 2.36 1.95 2.59 1.76 1.87 2.15 1.64 1.35 ...
$ Ash : num 2.43 2.14 2.67 2.5 2.87 2.45 2.45 2.61 2.17 2.27 ...
$ AlcAsh : num 15.6 11.2 18.6 16.8 21 15.2 14.6 17.6 14 16 ...
$ Mg : int 127 100 101 113 118 112 96 121 97 98 ...
$ Phenols : num 2.8 2.65 2.8 3.85 2.8 3.27 2.5 2.6 2.8 2.98 ...
$ Flav : num 3.06 2.76 3.24 3.49 2.69 3.39 2.52 2.51 2.98 3.15 ...
$ NonFlavPhenols: num 0.28 0.26 0.3 0.24 0.39 0.34 0.3 0.31 0.29 0.22 ...
$ Proa : num 2.29 1.28 2.81 2.18 1.82 1.97 1.98 1.25 1.98 1.85 ...
$ Color : num 5.64 4.38 5.68 7.8 4.32 6.75 5.25 5.05 5.2 7.22 ...
$ Hue : num 1.04 1.05 1.03 0.86 1.04 1.05 1.02 1.06 1.08 1.01 ...
$ OD : num 3.92 3.4 3.17 3.45 2.93 2.85 3.58 3.58 2.85 3.55 ...
$ Proline : int 1065 1050 1185 1480 735 1450 1290 1295 1045 1045 ...
The other is a factor, wine.class, which holds the wine class of each of the 178 observations:
str(wine.class)
Factor w/ 3 levels "barolo","grignolino",..: 1 1 1 1 1 1 1 1 1 1 ...
The 13 quantitative variables are used to compute the PCA:
wine.pca <- prcomp(wine, scale. = TRUE)
while the wine.class variable is only used to color the points on the plot (it is passed separately, through the groups argument of ggbiplot()).
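The same separation between the numeric data and the grouping factor can be sketched in base R. This example swaps in the built-in iris data and plain plot() so that it runs without extra packages; it is not the ggbiplot call itself.

```r
# The numeric columns go into the PCA; the class labels are kept separately,
# just like `wine` and `wine.class`.
measurements <- iris[, 1:4]
species      <- iris$Species

pca <- prcomp(measurements, scale. = TRUE)

# The grouping factor never enters the PCA. It is matched to the scores
# row by row and used only to pick a color for each point.
plot(pca$x[, 1], pca$x[, 2],
     col = as.integer(species), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(species),
       col = seq_along(levels(species)), pch = 19)
```

This works because the factor has exactly one entry per row of the data frame, in the same order, which is also what ties wine.class to wine.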

Error using IRMI imputation

I've been trying to use irmi() from the VIM package to replace NAs.
My data looks something like this:
> str(sub_mex)
'data.frame': 21 obs. of 83 variables:
$ pH : num 7.2 7.4 7.4 7.36 7.2 7.82 7.67 7.73 7.79 7.7 ...
$ Cond : num 1152 1078 1076 1076 1018 ...
$ CO3 : num NA NA NA NA NA ...
$ Mg : num 25.8 24.9 24.3 24.8 23.4 ...
$ NO3 : num 49.7 25.6 27.1 39.6 52.8 ...
$ Cd : num 0.0088 0.0104 0.0085 0.0092 0.0086 ...
$ As_H : num 0.006 0.0059 0.0056 0.0068 0.0073 ...
$ As_F : num 0.0056 0.0058 0.0057 0.0066 0.0065 0.004 0.004 0.004 0.0048 0.0078 ...
$ As_FC : num NA NA NA NA NA NA NA NA NA 0.0028 ...
$ Pb : num 0.0097 0.0096 0.0092 0.01 0.0093 0.0275 0.024 0.0255 0.031 0.024 ...
$ Fe : num 0.39 0.26 0.27 0.28 0.32 0.135 0.08 NA 0.13 NA ...
$ No_EPT : int 0 0 0 0 0 0 0 0 0 0 ...
I've subset my sub_mex dataset to analyze observations separately, which gives me the sub_t dataset. It looks something like this:
> str(sub_t)
'data.frame': 5 obs. of 83 variables:
$ pH : num 7.82 7.67 7.73 7.79 7.7
$ CO3 : num 45 NA 37.2 41.9 40.3
$ Mg : num 41.3 51.4 47.7 51.8 53
$ NO3 : num 47.1 40.7 39.9 42.1 37.6
$ Cd : num 0.0173 0.0145 0.016 0.016 0.0154
$ As_H : num 0.00949 0.01009 0.00907 0.00972 0.00954
$ As_F : num 0.004 0.004 0.004 0.0048 0.0078
$ As_FC : num NA NA NA NA 0.0028
$ Pb : num 0.0275 0.024 0.0255 0.031 0.024
$ Fe : num 0.135 0.08 NA 0.13 NA
$ No_EPT : int 0 0 0 0 0
I impute the NAs of the sub_mex dataset using:
imp_mexi <- irmi(sub_mex)
which works fine. However, when I try to impute the subset sub_t, I get the following error message:
> imp_t <- irmi(sub_t)
Error in indexNA2s[, variable[j]] : subscript out of bounds
Does anyone have an idea of how to solve this? I want to impute sub_t directly, and I don't want to use a subset of the imp_mexi imputed dataset.
Any help will be deeply appreciated.

I had a similar issue and discovered that one of the columns in my data frame was entirely missing, hence the out-of-bounds error.
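A quick way to check for this is sketched below, assuming that "entirely missing" means every value in the column is NA. The toy data frame is made up; with the real data the check would be run on sub_t, where a small subset can easily leave a column with no observed values.

```r
# Made-up stand-in for sub_t: As_FC has no observed values in this subset.
sub_t_toy <- data.frame(
  pH    = c(7.82, 7.67, 7.73),
  Fe    = c(0.135, 0.08, NA),
  As_FC = c(NA_real_, NA_real_, NA_real_)
)

# TRUE for every column in which all rows are NA.
all_missing <- colSums(is.na(sub_t_toy)) == nrow(sub_t_toy)

# Names of the columns that cannot be imputed from this subset alone.
names(sub_t_toy)[all_missing]

# Drop them before passing the data on to irmi().
sub_t_clean <- sub_t_toy[, !all_missing, drop = FALSE]
```

Whether dropping such columns is acceptable depends on the analysis; the sketch only shows how to find them.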

Plot results from describe.by{psych} output [closed]

I have the results of describe.by {psych} applied to a data frame. The result is a list:
List of 1000
$ 1 :Classes ‘psych’, ‘describe’ and 'data.frame': 20 obs. of 13 variables:
..$ var : int [1:20] 1 2 3 4 5 6 7 8 9 10 ...
..$ n : num [1:20] 24 24 24 24 24 24 24 24 24 24 ...
..$ mean : num [1:20] 24 30.8 24 31.6 240 ...
..$ sd : num [1:20] 0.937 3.667 0.937 3.537 9.367 ...
..$ median : num [1:20] 23.9 31 23.9 31.9 238.6 ...
..$ trimmed : num [1:20] 24 30.9 24 31.7 239.7 ...
..$ mad : num [1:20] 1.11 4.12 1.11 3.29 11.09 ...
..$ min : num [1:20] 22.6 24 22.6 25.3 225.9 ...
..$ max : num [1:20] 25.6 36.9 25.6 36.9 256 ...
..$ range : num [1:20] 3 12.9 3 11.6 30 ...
..$ skew : num [1:20] 0.309 -0.258 0.309 -0.411 0.309 ...
..$ kurtosis: num [1:20] -1.163 -0.898 -1.163 -0.819 -1.163 ...
..$ se : num [1:20] 0.191 0.749 0.191 0.722 1.912 ...
$ 2 :Classes ‘psych’, ‘describe’ and 'data.frame': 20 obs. of 13 variables:
..$ var : int [1:20] 1 2 3 4 5 6 7 8 9 10 ...
..$ n : num [1:20] 7 7 7 7 7 7 7 7 7 7 ...
..$ mean : num [1:20] 16.3 39.3 16.3 40.7 162.9 ...
..$ sd : num [1:20] 0.609 8.045 0.609 8.394 6.086 ...
..$ median : num [1:20] 16.4 39.1 16.4 39.6 164.2 ...
..$ trimmed : num [1:20] 16.3 39.3 16.3 40.7 162.9 ...
I would like to plot a graph (probably a candlestick chart) or boxplots from this sample for each of the 13 metrics. Is there a package in which I can directly use the summary stats already computed?

Your question is vague.
describeBy (describe.by is deprecated) reports basic summary statistics by a grouping variable.
So I guess that a boxplot is the nearest plot.
For example:
describeBy(sat.act,sat.act$gender)
group: 1
var n mean sd median trimmed mad min max range skew kurtosis se
gender 1 247 1.00 0.00 1 1.00 0.00 1 1 0 NaN NaN 0.00
education 2 247 3.00 1.54 3 3.12 1.48 0 5 5 -0.54 -0.60 0.10
age 3 247 25.86 9.74 22 24.23 5.93 14 58 44 1.43 1.43 0.62
ACT 4 247 28.79 5.06 30 29.23 4.45 3 36 33 -1.06 1.89 0.32
SATV 5 247 615.11 114.16 630 622.07 118.61 200 800 600 -0.63 0.13 7.26
SATQ 6 245 635.87 116.02 660 645.53 94.89 300 800 500 -0.72 -0.12 7.41
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
group: 2
var n mean sd median trimmed mad min max range skew kurtosis se
gender 1 453 2.00 0.00 2 2.00 0.00 2 2 0 NaN NaN 0.00
education 2 453 3.26 1.35 3 3.40 1.48 0 5 5 -0.74 0.27 0.06
age 3 453 25.45 9.37 22 23.70 5.93 13 65 52 1.77 3.03 0.44
ACT 4 453 28.42 4.69 29 28.63 4.45 15 36 21 -0.39 -0.42 0.22
SATV 5 453 610.66 112.31 620 617.91 103.78 200 800 600 -0.65 0.42 5.28
SATQ 6 442 596.00 113.07 600 602.21 133.43 200 800 600 -0.58 0.13 5.38
You can plot this like:
boxplot(sat.act, sat.act$gender, col = 'pink')
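Note that passing a whole data frame to boxplot() draws one box per column rather than splitting by group. If the goal is one box per group for a single variable, the formula interface is a common way to get it. A minimal sketch follows; the toy data frame is a made-up stand-in for sat.act, with simulated ACT scores.

```r
# Made-up stand-in for sat.act: a grouping column and one numeric variable.
set.seed(1)
toy <- data.frame(
  gender = rep(c(1, 2), each = 20),
  ACT    = c(rnorm(20, mean = 29, sd = 5), rnorm(20, mean = 28, sd = 5))
)

# `ACT ~ gender` means: one box of ACT values for each level of gender.
bp <- boxplot(ACT ~ gender, data = toy, col = "pink",
              xlab = "gender", ylab = "ACT")
```

boxplot() needs the raw observations. Base R's bxp() can draw boxes from precomputed statistics, but it expects quartiles, which are not among the 13 describeBy columns shown in the question.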
