How to make a data frame from two vectors in R?

I have two vectors here. One is all the data about population for various countries:
## [1] "China 1439323776 0.39 5540090 153 9388211 -348399 1.7 38 61 18.47"
## [2] "India 1380004385 0.99 13586631 464 2973190 -532687 2.2 28 35 17.70"
## [3] "United States 331002651 0.59 1937734 36 9147420 954806 1.8 38 83 4.25"
## [4] "Indonesia 273523615 1.07 2898047 151 1811570 -98955 2.3 30 56 3.51"
## [5] "Pakistan 220892340 2.00 4327022 287 770880 -233379 3.6 23 35 2.83"
## [6] "Brazil 212559417 0.72 1509890 25 8358140 21200 1.7 33 88 2.73"
## [7] "Nigeria 206139589 2.58 5175990 226 910770 -60000 5.4 18 52 2.64"
## [8] "Bangladesh 164689383 1.01 1643222 1265 130170 -369501 2.1 28 39 2.11"
## [9] "Russia 145934462 0.04 62206 9 16376870 182456 1.8 40 74 1.87 "
## [10] "Tokelau 1357 1.27 17 136 10 N.A. N.A. 0 0.00"
## [11] "Holy See 801 0.25 2 2003 0 N.A. N.A. N.A. 0.00"
The other vector contains all the column names, in the exact order corresponding to the country name and the numbers above:
## [1] "Country(ordependency)" "Population(2020)" "YearlyChange"
## [4] "NetChange" "Density(P/Km²)" "LandArea(Km²)"
## [7] "Migrants(net)" "Fert.Rate" "Med.Age"
## [10] "UrbanPop%" "WorldShare"
How do I make a data frame that matches each column name to its data, like this:
head(population)
Country (or dependency) Population (2020) Yearly Change Net Change Density (P/Km²) ......
1 China 1439323776 0.39 5540090 ... ....
2 India 1380004385 0.99 13586631 .......
3 United States 331002651 0.59 1937734 .......
4 Indonesia 273523615 1.07 2898047 .......
5 Pakistan 220892340 2.00 4327022 .......
Note: for the last two countries, Tokelau and Holy See, there is no "Migrants(net)" data.
TIA!
EDIT:
Some more samples are here:
## [53] "Côte d'Ivoire 26378274 2.57 661730 83 318000 -8000 4.7 19 51 0.34"
## [86] "Czech Republic (Czechia) 10708981 0.18 19772 139 77240 22011 1.6 43 74 0.14"
## [93] "United Arab Emirates 9890402 1.23 119873 118 83600 40000 1.4 33 86 0.13"
## [98] "Papua New Guinea 8947024 1.95 170915 20 452860 -800 3.6 22 13 0.11"
## [135] "Bosnia and Herzegovina 3280819 -0.61 -20181 64 51000 -21585 1.3 43 52 0.04"
## [230] "Saint Pierre & Miquelon 5794 -0.48 -28 25 230 N.A. N.A. 100 0.00"
UPDATES:
Here is the problem:
tail(population)
## Country(ordependency) Population(2020) YearlyChange NetChange
## 230 Saint Pierre & Miquelon 5794 -0.48 -28
## 231 Montserrat 4992 0.06 3
## 232 Falkland Islands 3480 3.05 103
## 233 Niue 1626 0.68 11
## 234 Tokelau 1357 1.27 17
## 235 Holy See 801 0.25 2
## Density(P/Km²) LandArea(Km²) Migrants(net) Fert.Rate **Med.Age** **UrbanPop%**
## 230 25 230 N.A. N.A. 100 0.00
## 231 50 100 N.A. N.A. 10 0.00
## 232 0 12170 N.A. N.A. 66 0.00
## 233 6 260 N.A. N.A. 46 0.00
## 234 136 10 N.A. N.A. 0 0.00
## 235 2003 0 N.A. N.A. N.A. 0.00
## **WorldShare**
## 230 NA
## 231 NA
## 232 NA
## 233 NA
## 234 NA
## 235 NA
All the rows with 10 variables instead of 11 are here:
## [202] "Isle of Man 85033 0.53 449 149 570 N.A. N.A. 53 0.00"
## [203] "Andorra 77265 0.16 123 164 470 N.A. N.A. 88 0.00"
## [204] "Dominica 71986 0.25 178 96 750 N.A. N.A. 74 0.00"
## [205] "Cayman Islands 65722 1.19 774 274 240 N.A. N.A. 97 0.00"
## [206] "Bermuda 62278 -0.36 -228 1246 50 N.A. N.A. 97 0.00"
## [207] "Marshall Islands 59190 0.68 399 329 180 N.A. N.A. 70 0.00"
## [208] "Northern Mariana Islands 57559 0.60 343 125 460 N.A. N.A. 88 0.00"
## [209] "Greenland 56770 0.17 98 0 410450 N.A. N.A. 87 0.00"
## [210] "American Samoa 55191 -0.22 -121 276 200 N.A. N.A. 88 0.00"
## [211] "Saint Kitts & Nevis 53199 0.71 376 205 260 N.A. N.A. 33 0.00"
## [212] "Faeroe Islands 48863 0.38 185 35 1396 N.A. N.A. 43 0.00"
## [213] "Sint Maarten 42876 1.15 488 1261 34 N.A. N.A. 96 0.00"
## [214] "Monaco 39242 0.71 278 26337 1 N.A. N.A. N.A. 0.00"
## [215] "Turks and Caicos 38717 1.38 526 41 950 N.A. N.A. 89 0.00"
## [216] "Saint Martin 38666 1.75 664 730 53 N.A. N.A. 0 0.00"
## [217] "Liechtenstein 38128 0.29 109 238 160 N.A. N.A. 15 0.00"
## [218] "San Marino 33931 0.21 71 566 60 N.A. N.A. 97 0.00"
## [219] "Gibraltar 33691 -0.03 -10 3369 10 N.A. N.A. N.A. 0.00"
## [220] "British Virgin Islands 30231 0.67 201 202 150 N.A. N.A. 52 0.00"
## [221] "Caribbean Netherlands 26223 0.94 244 80 328 N.A. N.A. 75 0.00"
## [222] "Palau 18094 0.48 86 39 460 N.A. N.A. N.A. 0.00"
## [223] "Cook Islands 17564 0.09 16 73 240 N.A. N.A. 75 0.00"
## [224] "Anguilla 15003 0.90 134 167 90 N.A. N.A. N.A. 0.00"
## [225] "Tuvalu 11792 1.25 146 393 30 N.A. N.A. 62 0.00"
## [226] "Wallis & Futuna 11239 -1.69 -193 80 140 N.A. N.A. 0 0.00"
## [227] "Nauru 10824 0.63 68 541 20 N.A. N.A. N.A. 0.00"
## [228] "Saint Barthelemy 9877 0.30 30 470 21 N.A. N.A. 0 0.00"
## [229] "Saint Helena 6077 0.30 18 16 390 N.A. N.A. 27 0.00"
## [230] "Saint Pierre & Miquelon 5794 -0.48 -28 25 230 N.A. N.A. 100 0.00"
## [231] "Montserrat 4992 0.06 3 50 100 N.A. N.A. 10 0.00"
## [232] "Falkland Islands 3480 3.05 103 0 12170 N.A. N.A. 66 0.00"
## [233] "Niue 1626 0.68 11 6 260 N.A. N.A. 46 0.00"
## [234] "Tokelau 1357 1.27 17 136 10 N.A. N.A. 0 0.00"
## [235] "Holy See 801 0.25 2 2003 0 N.A. N.A. N.A. 0.00"

It would be easier to read the data with read.table using space as the delimiter. But there is an issue: the 'Country' may contain multiple words, and those should be read as a single column. To handle that, we can insert quotes as a boundary around the Country part using sub and then read with read.table, specifying the col.names as 'v2'.
df1 <- read.table(text = sub("^([^0-9]+)\\s", ' "\\1"', v1),
                  header = FALSE, col.names = v2, fill = TRUE, check.names = FALSE)
Output:
df1
Country(ordependency) Population(2020) YearlyChange NetChange Density(P/Km²) LandArea(Km²) Migrants(net) Fert.Rate Med.Age UrbanPop%
1 China 1439323776 0.39 5540090 153 9388211 -348399 1.7 38 61
2 India 1380004385 0.99 13586631 464 2973190 -532687 2.2 28 35
3 United States 331002651 0.59 1937734 36 9147420 954806 1.8 38 83
4 Indonesia 273523615 1.07 2898047 151 1811570 -98955 2.3 30 56
5 Pakistan 220892340 2.00 4327022 287 770880 -233379 3.6 23 35
6 Brazil 212559417 0.72 1509890 25 8358140 21200 1.7 33 88
7 Nigeria 206139589 2.58 5175990 226 910770 -60000 5.4 18 52
8 Bangladesh 164689383 1.01 1643222 1265 130170 -369501 2.1 28 39
9 Russia 145934462 0.04 62206 9 16376870 182456 1.8 40 74
10 Tokelau 1357 1.27 17 136 10 N.A. N.A. 0 0
11 Holy See 801 0.25 2 2003 0 N.A. N.A. N.A. 0
12 Côte d'Ivoire 26378274 2.57 661730 83 318000 -8000 4.7 19 51
13 Czech Republic (Czechia) 10708981 0.18 19772 139 77240 22011 1.6 43 74
14 United Arab Emirates 9890402 1.23 119873 118 83600 40000 1.4 33 86
15 Papua New Guinea 8947024 1.95 170915 20 452860 -800 3.6 22 13
16 Bosnia and Herzegovina 3280819 -0.61 -20181 64 51000 -21585 1.3 43 52
17 Saint Pierre & Miquelon 5794 -0.48 -28 25 230 N.A. N.A. 100 0
WorldShare
1 18.47
2 17.70
3 4.25
4 3.51
5 2.83
6 2.73
7 2.64
8 2.11
9 1.87
10 NA
11 NA
12 0.34
13 0.14
14 0.13
15 0.11
16 0.04
17 NA
For the rows where the field count is lower, we can update the column values by shifting them with row/column indexing:
library(stringr)
# count the fields in each row; rows with one missing value end up with a count of 10
cnt <- str_count(sub("^([^0-9]+)\\s", '', v1), "\\s+") + 2
i1 <- cnt == 10
# shift the last two values one column to the right and blank the vacated column
df1[i1, 10:11] <- df1[i1, 9:10]
df1[i1, 9] <- NA
data
v1 <- c("China 1439323776 0.39 5540090 153 9388211 -348399 1.7 38 61 18.47",
"India 1380004385 0.99 13586631 464 2973190 -532687 2.2 28 35 17.70",
"United States 331002651 0.59 1937734 36 9147420 954806 1.8 38 83 4.25",
"Indonesia 273523615 1.07 2898047 151 1811570 -98955 2.3 30 56 3.51",
"Pakistan 220892340 2.00 4327022 287 770880 -233379 3.6 23 35 2.83",
"Brazil 212559417 0.72 1509890 25 8358140 21200 1.7 33 88 2.73",
"Nigeria 206139589 2.58 5175990 226 910770 -60000 5.4 18 52 2.64",
"Bangladesh 164689383 1.01 1643222 1265 130170 -369501 2.1 28 39 2.11",
"Russia 145934462 0.04 62206 9 16376870 182456 1.8 40 74 1.87 ",
"Tokelau 1357 1.27 17 136 10 N.A. N.A. 0 0.00", "Holy See 801 0.25 2 2003 0 N.A. N.A. N.A. 0.00",
"Côte d'Ivoire 26378274 2.57 661730 83 318000 -8000 4.7 19 51 0.34",
"Czech Republic (Czechia) 10708981 0.18 19772 139 77240 22011 1.6 43 74 0.14",
"United Arab Emirates 9890402 1.23 119873 118 83600 40000 1.4 33 86 0.13",
"Papua New Guinea 8947024 1.95 170915 20 452860 -800 3.6 22 13 0.11",
"Bosnia and Herzegovina 3280819 -0.61 -20181 64 51000 -21585 1.3 43 52 0.04",
"Saint Pierre & Miquelon 5794 -0.48 -28 25 230 N.A. N.A. 100 0.00"
)
v2 <- c("Country(ordependency)", "Population(2020)", "YearlyChange",
"NetChange", "Density(P/Km²)", "LandArea(Km²)", "Migrants(net)",
"Fert.Rate", "Med.Age", "UrbanPop%", "WorldShare")

I am not sure what you mean but you could try:
df <- do.call(rbind.data.frame, vector1)
colnames(df) <- vector2
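Note that this keeps each line as a single field, since the strings are never split. A minimal base-R sketch (not from the original answers; it uses the v1/v2 objects from the data section above and assumes country names never contain digits):
# capture the country name (everything before the first digit) and the rest of the line
parts <- strcapture("^([^0-9]+)\\s+(.*)$", v1,
                    proto = data.frame(Country = character(), Rest = character(),
                                       stringsAsFactors = FALSE))
# parse the remaining whitespace-separated values; short rows are padded with NA
nums <- read.table(text = parts$Rest, header = FALSE, fill = TRUE)
population <- cbind(parts["Country"], nums)
names(population) <- v2[seq_len(ncol(population))]
The same caveat as above applies: rows that are missing a value get their trailing columns shifted left, so they still need the row/column-indexing fix.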

Related

Wanted to extract table info from the webpage

I just wanted to scrape the "Countries in the world by population" table from the page below:
"https://www.worldometers.info/world-population/population-by-country/"
sample data
You can grab the desired table data using pandas:
import pandas as pd
import requests

headers = {"User-Agent": "mozilla/5.0"}
url = 'https://www.worldometers.info/world-population/population-by-country/'
red = requests.get(url, headers=headers).text
df = pd.read_html(red)[0]
print(df)
Output:
0 1 China 1439323776 ... 38 61 % 18.47 %
1 2 India 1380004385 ... 28 35 % 17.70 %
2 3 United States 331002651 ... 38 83 % 4.25 %
3 4 Indonesia 273523615 ... 30 56 % 3.51 %
4 5 Pakistan 220892340 ... 23 35 % 2.83 %
.. ... ... ... ... ... ... ...
230 231 Montserrat 4992 ... N.A. 10 % 0.00 %
231 232 Falkland Islands 3480 ... N.A. 66 % 0.00 %
232 233 Niue 1626 ... N.A. 46 % 0.00 %
233 234 Tokelau 1357 ... N.A. 0 % 0.00 %
234 235 Holy See 801 ... N.A. N.A. 0.00 %
[235 rows x 12 columns]
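If an R equivalent is preferred, here is a minimal rvest sketch of the same scrape (a sketch only; it assumes the population table is the first <table> on the page):
library(rvest)

url <- "https://www.worldometers.info/world-population/population-by-country/"
page <- read_html(url)
# html_table() turns every <table> node into a data frame; the population table
# is assumed to be the first one on the page
population <- html_table(page, fill = TRUE)[[1]]
head(population)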

How to round data values into new column and group in R

I am working with temperature (Kelvin) and incident data and have created new columns with a Celsius conversion. I would like to round the Celsius values to the nearest whole number and also group them into bins of 4, so that, for example, 29.15 Celsius is rounded to 29 and falls into the corresponding bin. The groups would start at zero and each span 4 values, e.g. 0-3, 4-7, 8-11, 12-15, etc. Sorry, I am trying to think of better words to use, but I am quite new to R. How would I round and group this way? Below is the code I have used so far and the result, but it isn't rounding or grouping as I need. Thanks so much!
tempDF <- data.frame(Kelvin = seq(240,320)) %>% # define an empty data frame with temperatures going from 240 to 320 Kelvin
  mutate(Celsius = Kelvin - 273.15) %>%
  merge(New_AllTime_Temp, by.x = "Kelvin", by.y = "Temp", all.x = TRUE) %>% # merge New_AllTime_Temp into the empty data frame, mapping each temperature to the data frame
  merge(New_Incident_Temp, by.x = "Kelvin", by.y = "temp", all.x = TRUE) %>% # merge New_Incident_Temp into the empty data frame, keeping the temperature mapping
  replace(is.na(.), 0) %>% # replace NA values with zeroes
  mutate(norm_counnt = scales::rescale(counnt, to = c(0,1))) %>%
  mutate(norm_incident = scales::rescale(incidents, to = c(0,1))) %>%
  mutate(diffs = norm_incident - norm_counnt) %>%
  mutate(rounded = round(Celsius, -2:4))
"Kelvin" "Celsius" "counnt" "incidents" "norm_counnt" "norm_incident" "diffs" "rounded"
"1" 240 -33.15 0 0 0 0 0 0
"2" 241 -32.15 0 0 0 0 0 -30
"3" 242 -31.15 0 0 0 0 0 -31
"4" 243 -30.15 3 0 0.00146056475170399 0 -0.00146056475170399 -30.1
"5" 244 -29.15 9 0 0.00438169425511198 0 -0.00438169425511198 -29.15
"6" 245 -28.15 7 0 0.00340798442064265 0 -0.00340798442064265 -28.15
"7" 246 -27.15 11 1 0.0053554040895813 0.0196078431372549 0.0142524390476736 -27.15
"8" 247 -26.15 15 0 0.00730282375851996 0 -0.00730282375851996 0
"9" 248 -25.15 22 1 0.0107108081791626 0.0196078431372549 0.00889703495809229 -30
"10" 249 -24.15 11 1 0.0053554040895813 0.0196078431372549 0.0142524390476736 -24
"11" 250 -23.15 32 0 0.0155793573515093 0 -0.0155793573515093 -23.1
"12" 251 -22.15 33 0 0.0160662122687439 0 -0.0160662122687439 -22.15
"13" 252 -21.15 47 0 0.0228821811100292 0 -0.0228821811100292 -21.15
"14" 253 -20.15 107 1 0.0520934761441091 0.0196078431372549 -0.0324856330068542 -20.15
"15" 254 -19.15 117 0 0.0569620253164557 0 -0.0569620253164557 0
"16" 255 -18.15 162 2 0.0788704965920156 0.0392156862745098 -0.0396548103175058 -20
"17" 256 -17.15 221 4 0.107594936708861 0.0784313725490196 -0.0291635641598412 -17
"18" 257 -16.15 258 2 0.125608568646543 0.0392156862745098 -0.0863928823720335 -16.1
"19" 258 -15.15 272 3 0.132424537487829 0.0588235294117647 -0.0736010080760639 -15.15
"20" 259 -14.15 314 4 0.152872444011685 0.0784313725490196 -0.0744410714626649 -14.15
"21" 260 -13.15 409 4 0.199123661148978 0.0784313725490196 -0.120692288599958 -13.15
"22" 261 -12.15 478 11 0.232716650438169 0.215686274509804 -0.0170303759283655 0
"23" 262 -11.15 523 13 0.254625121713729 0.254901960784314 0.0002768390705844 -10
"24" 263 -10.15 574 8 0.279454722492697 0.156862745098039 -0.122591977394658 -10
"25" 264 -9.14999999999998 793 9 0.386075949367089 0.176470588235294 -0.209605361131794 -9.1
"26" 265 -8.14999999999998 924 14 0.44985394352483 0.274509803921569 -0.175344139603261 -8.15
"27" 266 -7.14999999999998 1108 18 0.539435248296008 0.352941176470588 -0.186494071825419 -7.15
"28" 267 -6.14999999999998 1082 17 0.526777020447907 0.333333333333333 -0.193443687114573 -6.15
"29" 268 -5.14999999999998 1198 15 0.583252190847128 0.294117647058824 -0.289134543788304 0
"30" 269 -4.14999999999998 1233 13 0.600292112950341 0.254901960784314 -0.345390152166027 0
"31" 270 -3.14999999999998 1244 17 0.605647517039922 0.333333333333333 -0.272314183706589 -3
"32" 271 -2.14999999999998 1496 32 0.728334956183057 0.627450980392157 -0.100883975790901 -2.1
"33" 272 -1.14999999999998 1565 25 0.761927945472249 0.490196078431373 -0.271731867040877 -1.15
"34" 273 -0.149999999999977 1870 35 0.910418695228822 0.686274509803922 -0.2241441854249 -0.15
"35" 274 0.850000000000023 2054 31 1 0.607843137254902 -0.392156862745098 0.85
"36" 275 1.85000000000002 2034 29 0.990262901655307 0.568627450980392 -0.421635450674915 0
"37" 276 2.85000000000002 1974 33 0.961051606621227 0.647058823529412 -0.313992783091815 0
"38" 277 3.85000000000002 1966 32 0.95715676728335 0.627450980392157 -0.329705786891193 4
"39" 278 4.85000000000002 2040 51 0.993184031158715 1 0.00681596884128532 4.9
"40" 279 5.85000000000002 1949 29 0.94888023369036 0.568627450980392 -0.380252782709968 5.85
"41" 280 6.85000000000002 2053 40 0.999513145082765 0.784313725490196 -0.215199419592569 6.85
"42" 281 7.85000000000002 1987 34 0.967380720545277 0.666666666666667 -0.300714053878611 7.85
"43" 282 8.85000000000002 1959 40 0.953748782862707 0.784313725490196 -0.169435057372511 0
"44" 283 9.85000000000002 1770 32 0.861733203505355 0.627450980392157 -0.234282223113199 10
"45" 284 10.85 1816 27 0.88412852969815 0.529411764705882 -0.354716764992268 11
"46" 285 11.85 1859 39 0.905063291139241 0.764705882352941 -0.140357408786299 11.9
"47" 286 12.85 2029 35 0.987828627069133 0.686274509803922 -0.301554117265212 12.85
"48" 287 13.85 1926 33 0.937682570593963 0.647058823529412 -0.290623747064551 13.85
"49" 288 14.85 1848 43 0.899707887049659 0.843137254901961 -0.0565706321476984 14.85
"50" 289 15.85 1823 33 0.887536514118793 0.647058823529412 -0.240477690589381 0
"51" 290 16.85 1662 24 0.809152872444012 0.470588235294118 -0.338564637149894 20
"52" 291 17.85 1578 31 0.7682570593963 0.607843137254902 -0.160413922141398 18
"53" 292 18.85 1425 12 0.693768257059396 0.235294117647059 -0.458474139412337 18.9
"54" 293 19.85 1318 17 0.641674780915287 0.333333333333333 -0.308341447581954 19.85
"55" 294 20.85 1204 19 0.586173320350535 0.372549019607843 -0.213624300742692 20.85
"56" 295 21.85 1029 18 0.500973709834469 0.352941176470588 -0.148032533363881 21.85
"57" 296 22.85 876 12 0.426484907497566 0.235294117647059 -0.191190789850507 0
"58" 297 23.85 735 13 0.357838364167478 0.254901960784314 -0.102936403383164 20
"59" 298 24.85 623 5 0.303310613437196 0.0980392156862745 -0.205271397750921 25
"60" 299 25.85 571 7 0.277994157740993 0.137254901960784 -0.140739255780209 25.9
"61" 300 26.85 512 5 0.249269717624148 0.0980392156862745 -0.151230501937874 26.85
"62" 301 27.85 417 5 0.203018500486855 0.0980392156862745 -0.10497928480058 27.85
"63" 302 28.85 345 14 0.167964946445959 0.274509803921569 0.10654485747561 28.85
"64" 303 29.85 294 6 0.143135345666991 0.117647058823529 -0.0254882868434618 0
"65" 304 30.85 253 3 0.12317429406037 0.0588235294117647 -0.0643507646486053 30
"66" 305 31.85 198 3 0.0963972736124635 0.0588235294117647 -0.0375737442006988 32
"67" 306 32.85 128 2 0.062317429406037 0.0392156862745098 -0.0231017431315272 32.9
"68" 307 33.85 88 2 0.0428432327166504 0.0392156862745098 -0.00362754644214063 33.85
"69" 308 34.85 64 1 0.0311587147030185 0.0196078431372549 -0.0115508715657636 34.85
"70" 309 35.85 48 0 0.0233690360272639 0 -0.0233690360272639 35.85
"71" 310 36.85 20 0 0.00973709834469328 0 -0.00973709834469328 0
"72" 311 37.85 16 0 0.00778967867575463 0 -0.00778967867575463 40
"73" 312 38.85 7 0 0.00340798442064265 0 -0.00340798442064265 39
"74" 313 39.85 1 0 0.000486854917234664 0 -0.000486854917234664 39.9
"75" 314 40.85 0 0 0 0 0 40.85
"76" 315 41.85 0 0 0 0 0 41.85
"77" 316 42.85 0 0 0 0 0 42.85
"78" 317 43.85 0 0 0 0 0 0
"79" 318 44.85 0 0 0 0 0 40
"80" 319 45.85 0 0 0 0 0 46
"81" 320 46.85 0 0 0 0 0 46.9
Rounding can be done via the aptly named round function.
The cut function is made for continuous data, so instead of one group ranging from 0 to 3 and another from 4 to 7, we can just cut the continuum of real numbers at -0.5, 3.5, 7.5, 11.5, ...
library(magrittr)
unrounded <- c(-12.6, -12.4, -.01, +.01, 12.4, 12.6)
rounded <- unrounded %>% round(digits = 0)
values <- c(1, 2, 4, 7, 10, 20)
group <- values %>% cut(breaks = seq(-.5, 1000, 4))
It wasn't clear to me what you want to do with values less than zero but here's a tidyverse solution...
library(dplyr)
tempDF <- data.frame(Kelvin = seq(240,320)) %>%
  mutate(Celsius = Kelvin - 273.15) %>%
  mutate(Celsius_rounded = round(Celsius)) %>%
  mutate(Celsius_groups = cut(Celsius_rounded, breaks = seq(-.5, 1000, 4)))
tempDF
#> Kelvin Celsius Celsius_rounded Celsius_groups
#> 1 240 -33.15 -33 <NA>
#> 2 241 -32.15 -32 <NA>
#> 3 242 -31.15 -31 <NA>
#> 4 243 -30.15 -30 <NA>
#> 5 244 -29.15 -29 <NA>
#> 6 245 -28.15 -28 <NA>
#> 7 246 -27.15 -27 <NA>
#> 8 247 -26.15 -26 <NA>
#> 9 248 -25.15 -25 <NA>
#> 10 249 -24.15 -24 <NA>
#> 11 250 -23.15 -23 <NA>
#> 12 251 -22.15 -22 <NA>
#> 13 252 -21.15 -21 <NA>
#> 14 253 -20.15 -20 <NA>
#> 15 254 -19.15 -19 <NA>
#> 16 255 -18.15 -18 <NA>
#> 17 256 -17.15 -17 <NA>
#> 18 257 -16.15 -16 <NA>
#> 19 258 -15.15 -15 <NA>
#> 20 259 -14.15 -14 <NA>
#> 21 260 -13.15 -13 <NA>
#> 22 261 -12.15 -12 <NA>
#> 23 262 -11.15 -11 <NA>
#> 24 263 -10.15 -10 <NA>
#> 25 264 -9.15 -9 <NA>
#> 26 265 -8.15 -8 <NA>
#> 27 266 -7.15 -7 <NA>
#> 28 267 -6.15 -6 <NA>
#> 29 268 -5.15 -5 <NA>
#> 30 269 -4.15 -4 <NA>
#> 31 270 -3.15 -3 <NA>
#> 32 271 -2.15 -2 <NA>
#> 33 272 -1.15 -1 <NA>
#> 34 273 -0.15 0 (-0.5,3.5]
#> 35 274 0.85 1 (-0.5,3.5]
#> 36 275 1.85 2 (-0.5,3.5]
#> 37 276 2.85 3 (-0.5,3.5]
#> 38 277 3.85 4 (3.5,7.5]
#> 39 278 4.85 5 (3.5,7.5]
#> 40 279 5.85 6 (3.5,7.5]
#> 41 280 6.85 7 (3.5,7.5]
#> 42 281 7.85 8 (7.5,11.5]
#> 43 282 8.85 9 (7.5,11.5]
#> 44 283 9.85 10 (7.5,11.5]
#> 45 284 10.85 11 (7.5,11.5]
#> 46 285 11.85 12 (11.5,15.5]
#> 47 286 12.85 13 (11.5,15.5]
#> 48 287 13.85 14 (11.5,15.5]
#> 49 288 14.85 15 (11.5,15.5]
#> 50 289 15.85 16 (15.5,19.5]
#> 51 290 16.85 17 (15.5,19.5]
#> 52 291 17.85 18 (15.5,19.5]
#> 53 292 18.85 19 (15.5,19.5]
#> 54 293 19.85 20 (19.5,23.5]
#> 55 294 20.85 21 (19.5,23.5]
#> 56 295 21.85 22 (19.5,23.5]
#> 57 296 22.85 23 (19.5,23.5]
#> 58 297 23.85 24 (23.5,27.5]
#> 59 298 24.85 25 (23.5,27.5]
#> 60 299 25.85 26 (23.5,27.5]
#> 61 300 26.85 27 (23.5,27.5]
#> 62 301 27.85 28 (27.5,31.5]
#> 63 302 28.85 29 (27.5,31.5]
#> 64 303 29.85 30 (27.5,31.5]
#> 65 304 30.85 31 (27.5,31.5]
#> 66 305 31.85 32 (31.5,35.5]
#> 67 306 32.85 33 (31.5,35.5]
#> 68 307 33.85 34 (31.5,35.5]
#> 69 308 34.85 35 (31.5,35.5]
#> 70 309 35.85 36 (35.5,39.5]
#> 71 310 36.85 37 (35.5,39.5]
#> 72 311 37.85 38 (35.5,39.5]
#> 73 312 38.85 39 (35.5,39.5]
#> 74 313 39.85 40 (39.5,43.5]
#> 75 314 40.85 41 (39.5,43.5]
#> 76 315 41.85 42 (39.5,43.5]
#> 77 316 42.85 43 (39.5,43.5]
#> 78 317 43.85 44 (43.5,47.5]
#> 79 318 44.85 45 (43.5,47.5]
#> 80 319 45.85 46 (43.5,47.5]
#> 81 320 46.85 47 (43.5,47.5]
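If labels such as "0-3", "4-7", ... are preferred over the default interval notation, they can be supplied to cut() directly. A small sketch building on the solution above (same breaks; the upper limit of 1000 is arbitrary):
breaks <- seq(-0.5, 1000, 4)
# one label per interval: "0-3", "4-7", "8-11", ...
labs <- paste(breaks[-length(breaks)] + 0.5, breaks[-1] - 0.5, sep = "-")
tempDF$Celsius_groups <- cut(tempDF$Celsius_rounded, breaks = breaks, labels = labs)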

Split one column into multiple based on spaces in r

How can I split one column into multiple columns in R, using spaces as separators?
I tried to find an answer for a few hours (even days), but now I count on you guys to help me!
This is what my data set looks like; it's all in one column. I don't really care about the column names, as in the end I will only need a few of them for my analysis:
[1] 1000.0 246
[2] 970.0 491 -3.3 -5.0 88 2.73 200 4 272.2 279.8 272.7
[3] 909.0 1002 -4.7 -6.6 87 2.58 200 12 275.9 283.2 276.3
[4] 900.0 1080 -5.5 -7.5 86 2.43 200 13 275.8 282.8 276.2
[5] 879.0 1264 -6.5 -8.8 84 2.25 200 16 276.7 283.1 277.0
[6] 850.0 1525 -6.5 -12.5 62 1.73 200 20 279.3 284.4 279.6
Also, I tried the separate function and it gave me an error telling me that this is not possible for a function-class object.
Thanks a lot for your help!
It's always easier to help if there is a minimal reproducible example in the question. The data you show is not easily usable...
MRE:
data_vector <- c("1000.0 246",
"970.0 491 -3.3 -5.0 88 2.73 200 4 272.2 279.8 272.7",
"909.0 1002 -4.7 -6.6 87 2.58 200 12 275.9 283.2 276.3",
"900.0 1080 -5.5 -7.5 86 2.43 200 13 275.8 282.8 276.2",
"879.0 1264 -6.5 -8.8 84 2.25 200 16 276.7 283.1 277.0",
"850.0 1525 -6.5 -12.5 62 1.73 200 20 279.3 284.4 279.6")
And here is a solution using gsub and read.csv:
oo <- read.csv(text=gsub(" +", " ", paste0(data_vector, collapse="\n")), sep=" ", header=FALSE)
Which produces this output:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
1 1000 246 NA NA NA NA NA NA NA NA NA
2 970 491 -3.3 -5.0 88 2.73 200 4 272.2 279.8 272.7
3 909 1002 -4.7 -6.6 87 2.58 200 12 275.9 283.2 276.3
4 900 1080 -5.5 -7.5 86 2.43 200 13 275.8 282.8 276.2
5 879 1264 -6.5 -8.8 84 2.25 200 16 276.7 283.1 277.0
6 850 1525 -6.5 -12.5 62 1.73 200 20 279.3 284.4 279.6
read.table/read.csv would work if we pass the character vector to the text argument:
read.table(text = data_vector, header = FALSE, fill = TRUE)
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
#1 1000 246 NA NA NA NA NA NA NA NA NA
#2 970 491 -3.3 -5.0 88 2.73 200 4 272.2 279.8 272.7
#3 909 1002 -4.7 -6.6 87 2.58 200 12 275.9 283.2 276.3
#4 900 1080 -5.5 -7.5 86 2.43 200 13 275.8 282.8 276.2
#5 879 1264 -6.5 -8.8 84 2.25 200 16 276.7 283.1 277.0
#6 850 1525 -6.5 -12.5 62 1.73 200 20 279.3 284.4 279.6
data
data_vector <- c("1000.0 246",
"970.0 491 -3.3 -5.0 88 2.73 200 4 272.2 279.8 272.7",
"909.0 1002 -4.7 -6.6 87 2.58 200 12 275.9 283.2 276.3",
"900.0 1080 -5.5 -7.5 86 2.43 200 13 275.8 282.8 276.2",
"879.0 1264 -6.5 -8.8 84 2.25 200 16 276.7 283.1 277.0",
"850.0 1525 -6.5 -12.5 62 1.73 200 20 279.3 284.4 279.6")

Delete Several Lines in txt file with conditional in R

I have a problem with deleting several lines from a txt file and then converting it into a csv with R, because I just want to get the data from the txt.
My code can't delete properly, because it deletes lines which contain the date of the data.
Here is the code I used:
setwd("D:/tugasmaritim/")
FILES <- list.files( pattern = ".txt")
for (i in 1:length(FILES)) {
l <- readLines(FILES[i],skip=4)
l2 <- l[-sapply(grep("</PRE><H3>", l), function(x) seq(x, x + 30))]
l3 <- l2[-sapply(grep("<P>Description", l2), function(x) seq(x, x + 29))]
l4 <- l3[-sapply(grep("<HTML>", l3), function(x) seq(x, x + 3))]
write.csv(l4,row.names=FALSE,file=paste0("D:/tugasmaritim/",sub(".txt","",FILES[i]),".csv"))
}
My data looks like this:
<HTML>
<TITLE>University of Wyoming - Radiosonde Data</TITLE>
<LINK REL="StyleSheet" HREF="/resources/select.css" TYPE="text/css">
<BODY BGCOLOR="white">
<H2>96749 WIII Jakarta Observations at 00Z 02 Oct 1995</H2>
<PRE>
-----------------------------------------------------------------------------
PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV
hPa m C C % g/kg deg knot K K K
-----------------------------------------------------------------------------
1011.0 8 23.2 22.5 96 17.30 0 0 295.4 345.3 298.5
1000.0 98 23.6 22.4 93 17.39 105 8 296.8 347.1 299.8
977.3 300 24.6 22.1 86 17.49 105 8 299.7 351.0 302.8
976.0 311 24.6 22.1 86 17.50 104 8 299.8 351.2 303.0
950.0 548 23.0 22.0 94 17.87 88 12 300.5 353.2 303.7
944.4 600 22.6 21.8 95 17.73 85 13 300.6 352.9 303.8
925.0 781 21.2 21.0 99 17.25 90 20 301.0 351.9 304.1
918.0 847 20.6 20.6 100 16.95 90 23 301.0 351.0 304.1
912.4 900 20.4 18.6 89 15.00 90 26 301.4 345.7 304.1
897.0 1047 20.0 13.0 64 10.60 90 26 302.4 334.1 304.3
881.2 1200 19.4 11.4 60 9.70 90 26 303.3 332.5 305.1
850.0 1510 18.2 8.2 52 8.09 95 18 305.2 329.9 306.7
845.0 1560 18.0 7.0 49 7.49 91 17 305.5 328.4 306.9
810.0 1920 15.0 9.0 67 8.97 60 11 306.0 333.4 307.7
792.9 2100 14.3 3.1 47 6.06 45 8 307.1 325.9 308.2
765.1 2400 13.1 -6.8 24 3.01 40 8 309.0 318.7 309.5
746.0 2612 12.2 -13.8 15 1.77 38 10 310.3 316.2 310.6
712.0 3000 10.3 -15.0 15 1.69 35 13 312.3 318.1 312.6
700.0 3141 9.6 -15.4 16 1.66 35 13 313.1 318.7 313.4
653.0 3714 6.6 -16.4 18 1.63 32 12 316.0 321.6 316.3
631.0 3995 4.8 -2.2 60 5.19 31 11 317.0 333.9 318.0
615.3 4200 3.1 -3.9 60 4.70 30 11 317.4 332.8 318.3
601.0 4391 1.6 -5.4 60 4.28 20 8 317.8 331.9 318.6
592.9 4500 0.6 -12.0 38 2.59 15 6 317.9 326.6 318.4
588.0 4567 0.0 -16.0 29 1.88 11 6 317.9 324.4 318.3
571.0 4800 -1.2 -18.9 25 1.51 355 5 319.1 324.4 319.4
549.8 5100 -2.8 -22.8 20 1.12 45 6 320.7 324.8 321.0
513.0 5649 -5.7 -29.7 13 0.64 125 10 323.6 326.0 323.8
500.0 5850 -5.1 -30.1 12 0.63 155 11 326.8 329.1 326.9
494.0 5945 -4.9 -29.9 12 0.65 146 11 328.1 330.6 328.3
471.7 6300 -7.4 -32.0 12 0.56 110 13 329.3 331.5 329.4
453.7 6600 -9.6 -33.8 12 0.49 100 14 330.3 332.2 330.4
400.0 7570 -16.5 -39.5 12 0.31 105 14 333.5 334.7 333.5
398.0 7607 -16.9 -39.9 12 0.30 104 14 333.4 334.6 333.5
371.9 8100 -20.4 -42.6 12 0.24 95 16 335.4 336.3 335.4
300.0 9660 -31.3 -51.3 12 0.11 115 18 341.1 341.6 341.2
269.0 10420 -36.3 -55.3 12 0.08 79 20 344.7 345.0 344.7
265.9 10500 -36.9 75 20 344.9 344.9
250.0 10920 -40.3 80 28 346.0 346.0
243.4 11100 -41.8 85 37 346.4 346.4
222.5 11700 -46.9 75 14 347.6 347.6
214.0 11960 -49.1 68 16 348.1 348.1
200.0 12400 -52.7 55 20 349.1 349.1
156.0 13953 -66.1 55 25 352.1 352.1
152.3 14100 -67.2 55 26 352.6 352.6
150.0 14190 -67.9 55 26 352.9 352.9
144.7 14400 -69.6 60 26 353.6 353.6
137.5 14700 -72.0 60 39 354.6 354.6
130.7 15000 -74.3 50 28 355.6 355.6
124.2 15300 -76.7 40 36 356.5 356.5
118.0 15600 -79.1 50 48 357.4 357.4
116.0 15698 -79.9 45 44 357.6 357.6
112.0 15900 -79.1 45 26 362.6 362.6
106.3 16200 -78.0 35 24 370.2 370.2
100.0 16550 -76.7 35 24 379.3 379.3
</PRE><H3>Station information and sounding indices</H3><PRE>
Station identifier: WIII
Station number: 96749
Observation time: 951002/0000
Station latitude: -6.11
Station longitude: 106.65
Station elevation: 8.0
Showalter index: 6.30
Lifted index: -1.91
LIFT computed using virtual temperature: -2.80
SWEAT index: 145.41
K index: 6.50
Cross totals index: 13.30
Vertical totals index: 23.30
Totals totals index: 36.60
Convective Available Potential Energy: 799.02
CAPE using virtual temperature: 1070.13
Convective Inhibition: -26.70
CINS using virtual temperature: -12.88
Equilibrum Level: 202.64
Equilibrum Level using virtual temperature: 202.60
Level of Free Convection: 828.70
LFCT using virtual temperature: 909.19
Bulk Richardson Number: 210.78
Bulk Richardson Number using CAPV: 282.30
Temp [K] of the Lifted Condensation Level: 294.96
Pres [hPa] of the Lifted Condensation Level: 958.67
Mean mixed layer potential temperature: 298.56
Mean mixed layer mixing ratio: 17.50
1000 hPa to 500 hPa thickness: 5752.00
Precipitable water [mm] for entire sounding: 36.31
</PRE>
<H2>96749 WIII Jakarta Observations at 00Z 03 Oct 1995</H2>
<PRE>
-----------------------------------------------------------------------------
PRES HGHT TEMP DWPT RELH MIXR DRCT SKNT THTA THTE THTV
hPa m C C % g/kg deg knot K K K
-----------------------------------------------------------------------------
1012.0 8 23.6 22.9 96 17.72 140 2 295.7 346.9 298.9
1000.0 107 24.0 21.6 86 16.54 135 3 297.1 345.2 300.1
990.0 195 24.4 20.3 78 15.39 128 4 298.4 343.4 301.2
945.4 600 22.9 20.2 85 16.00 95 7 300.9 348.0 303.7
925.0 791 22.2 20.1 88 16.29 100 6 302.0 350.3 304.9
913.5 900 21.9 18.2 80 14.63 105 6 302.8 346.3 305.4
911.0 924 21.8 17.8 78 14.28 108 6 302.9 345.4 305.5
850.0 1522 17.4 16.7 96 14.28 175 6 304.4 347.1 307.0
836.0 1665 16.4 16.4 100 14.24 157 7 304.8 347.5 307.4
811.0 1925 15.0 14.7 98 13.14 123 8 305.9 345.6 308.3
795.0 2095 14.2 7.2 63 8.08 101 9 306.8 331.6 308.3
794.5 2100 14.2 7.2 63 8.05 100 9 306.8 331.5 308.3
745.0 2642 10.4 2.4 58 6.14 64 11 308.4 327.6 309.6
736.0 2744 11.0 0.0 47 5.23 57 11 310.2 326.7 311.1
713.8 3000 9.2 5.0 75 7.70 40 12 310.9 335.0 312.4
711.0 3033 9.0 5.6 79 8.08 40 12 311.0 336.2 312.6
700.0 3163 8.6 1.6 61 6.18 40 12 312.0 331.5 313.1
688.5 3300 8.3 -6.0 36 3.57 60 12 313.1 324.8 313.8
678.0 3427 8.0 -13.0 21 2.08 70 12 314.2 321.2 314.6
642.0 3874 5.0 -2.0 61 5.17 108 11 315.7 332.4 316.7
633.0 3989 4.4 -11.6 30 2.50 117 10 316.3 324.7 316.8
616.6 4200 3.1 -14.1 27 2.09 135 10 317.1 324.3 317.6
580.0 4694 0.0 -20.0 21 1.36 164 13 319.1 323.9 319.4
572.3 4800 -0.4 -20.7 20 1.29 170 14 319.9 324.5 320.1
510.8 5700 -4.0 -26.6 15 0.86 80 10 326.1 329.2 326.2
500.0 5870 -4.7 -27.7 15 0.79 80 10 327.2 330.2 327.4
497.0 5917 -4.9 -27.9 15 0.78 71 13 327.6 330.5 327.7
491.7 6000 -5.5 -28.3 15 0.76 55 19 327.9 330.7 328.0
473.0 6300 -7.6 -29.9 15 0.68 55 16 328.9 331.4 329.0
436.0 6930 -12.1 -33.1 16 0.54 77 17 330.9 333.0 331.0
400.0 7580 -17.9 -37.9 16 0.37 100 19 331.6 333.1 331.7
388.3 7800 -19.9 -39.9 15 0.31 105 20 331.8 333.1 331.9
386.0 7844 -20.3 -40.3 15 0.30 103 20 331.9 333.1 331.9
372.0 8117 -18.3 -38.3 16 0.38 91 23 338.1 339.6 338.1
343.6 8700 -22.1 -41.4 16 0.30 65 29 340.7 342.0 340.8
329.0 9018 -24.1 -43.1 16 0.26 73 27 342.2 343.2 342.2
300.0 9680 -29.9 -44.9 22 0.23 90 22 343.1 344.1 343.2
278.6 10200 -34.3 85 37 344.1 344.1
266.9 10500 -36.8 60 32 344.7 344.7
255.8 10800 -39.4 65 27 345.2 345.2
250.0 10960 -40.7 65 27 345.4 345.4
204.0 12300 -51.8 55 23 348.6 348.6
200.0 12430 -52.9 55 23 348.8 348.8
194.6 12600 -55.0 60 23 348.1 348.1
160.7 13800 -70.1 35 39 342.4 342.4
153.2 14100 -73.9 35 41 340.6 340.6
150.0 14230 -75.5 35 41 339.9 339.9
131.5 15000 -76.3 50 53 351.6 351.6
124.9 15300 -76.6 50 57 356.2 356.2
122.0 15436 -76.7 57 45 358.3 358.3
118.6 15600 -77.3 65 31 360.2 360.2
115.0 15779 -77.9 65 31 362.2 362.2
112.6 15900 -77.7 85 17 364.8 364.8
107.0 16200 -77.2 130 10 371.2 371.2
100.0 16590 -76.5 120 18 379.7 379.7
</PRE><H3>Station information and sounding indices</H3><PRE>
Station identifier: WIII
Station number: 96749
Observation time: 951003/0000
Station latitude: -6.11
Station longitude: 106.65
Station elevation: 8.0
Showalter index: -0.58
Lifted index: 0.17
LIFT computed using virtual temperature: -0.57
SWEAT index: 222.41
K index: 31.80
Cross totals index: 21.40
Vertical totals index: 22.10
Totals totals index: 43.50
Convective Available Potential Energy: 268.43
CAPE using virtual temperature: 431.38
Convective Inhibition: -84.04
CINS using virtual temperature: -81.56
Equilibrum Level: 141.42
Equilibrum Level using virtual temperature: 141.35
Level of Free Convection: 784.91
LFCT using virtual temperature: 804.89
Bulk Richardson Number: 221.19
Bulk Richardson Number using CAPV: 355.46
Temp [K] of the Lifted Condensation Level: 293.21
Pres [hPa] of the Lifted Condensation Level: 940.03
Mean mixed layer potential temperature: 298.46
Mean mixed layer mixing ratio: 16.01
1000 hPa to 500 hPa thickness: 5763.00
Precipitable water [mm] for entire sounding: 44.54
Here is my data:
data
And this is what I want to get:
contoh (example)
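One possible approach (a sketch only, not a tested answer; the file name below is made up) is to keep only the numeric sounding rows by pattern, instead of deleting fixed-size blocks around the HTML markers:
l <- readLines("RadiosondeJakarta.txt")  # hypothetical file name
# keep only lines that start with a pressure value such as "1011.0"
keep <- grepl("^\\s*[0-9]+\\.[0-9]\\s", l)
obs <- read.table(text = l[keep], header = FALSE, fill = TRUE,
                  col.names = c("PRES", "HGHT", "TEMP", "DWPT", "RELH", "MIXR",
                                "DRCT", "SKNT", "THTA", "THTE", "THTV"))
write.csv(obs, "RadiosondeJakarta.csv", row.names = FALSE)
Note that rows missing the middle columns (DWPT/RELH/MIXR) will have their values shifted left under fill = TRUE, so those rows may still need extra handling.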

Trouble with character column from a file read in with read.csv in r

On the website:
http://naturalstattrick.com/teamtable.php?season=20172018&stype=2&sit=pp&score=all&rate=n&vs=all&loc=B&gpf=82&fd=2017-10-04&td=2018-04-07
At the bottom of the page there is an option to download a csv. I downloaded the csv file and renamed it Team Season Totals - Natural Stat Trick 2007-2008 5 vs 5 (Counts).csv. I also put the csv file in my working directory.
I successfully read in the file using read.csv.
teams <- read.csv(file = "Team Season Totals - Natural Stat Trick 2007-2008 5 vs 5 (Counts).csv", stringsAsFactors = FALSE)
head(teams)
ï.. Team GP TOI W L OTL ROW CF CA CF. FF FA FF. SF SA SF. GF GA GF. SCF SCA SCF. SCGF SCGA SCGF. SCSH.
1 1 Atlanta Thrashers 82 3539.050 34 40 8 25 2638 3512 42.89 2002 2717 42.42 1505 2052 42.31 125 172 42.09 1195 1500 44.34 83 126 39.71 6.95
2 2 Pittsburgh Penguins 82 3435.417 47 27 8 40 2820 3380 45.48 2192 2542 46.30 1580 1812 46.58 142 122 53.79 1343 1374 49.43 112 90 55.45 8.34
3 3 Los Angeles Kings 82 3502.333 32 43 7 27 3008 3576 45.69 2306 2787 45.28 1649 1961 45.68 137 174 44.05 1049 1286 44.93 63 80 44.06 6.01
4 4 Montreal Canadiens 82 3475.183 47 25 10 42 3089 3601 46.17 2266 2603 46.54 1617 1863 46.47 144 138 51.06 1156 1221 48.63 62 61 50.41 5.36
5 5 Edmonton Oilers 82 3442.633 41 35 6 26 2958 3424 46.35 2255 2585 46.59 1601 1830 46.66 143 166 46.28 1334 1398 48.83 104 116 47.27 7.80
6 6 Philadelphia Flyers 82 3374.800 42 29 11 39 2902 3343 46.47 2188 2505 46.62 1609 1857 46.42 125 137 47.71 919 1028 47.20 61 68 47.29 6.64
SCSV. HDCF HDCA HDCF. HDGF HDGA HDGF. HDSH. HDSV. SH. SV. PDO
1 91.60 388 468 45.33 51 82 38.35 13.14 82.48 8.31 91.62 0.999
2 93.45 503 444 53.12 79 49 61.72 15.71 88.96 8.99 93.27 1.023
3 93.78 270 356 43.13 29 36 44.62 10.74 89.89 8.31 91.13 0.994
4 95.00 271 322 45.70 25 31 44.64 9.23 90.37 8.91 92.59 1.015
5 91.70 443 452 49.50 57 61 48.31 12.87 86.50 8.93 90.93 0.999
6 93.39 257 266 49.14 24 24 50.00 9.34 90.98 7.77 92.62 1.004
The one thing I noticed was that the Team column had an accent in it:
teams$Team
[1] "Atlanta Thrashers" "Pittsburgh Penguins" "Los Angeles Kings" "Montreal Canadiens" "Edmonton Oilers" "Philadelphia Flyers"
[7] "St Louis Blues" "Colorado Avalanche" "Vancouver Canucks" "Minnesota Wild" "Florida Panthers" "Phoenix Coyotes"
[13] "Tampa Bay Lightning" "Buffalo Sabres" "Chicago Blackhawks" "New York Islanders" "Nashville Predators" "Anaheim Ducks"
[19] "Boston Bruins" "Ottawa Senators" "Dallas Stars" "Toronto Maple Leafs" "Carolina Hurricanes" "Columbus Blue Jackets"
[25] "New Jersey Devils" "Calgary Flames" "San Jose Sharks" "New York Rangers" "Washington Capitals" "Detroit Red Wings"
Removing the accent:
teams$Team <- sub(pattern = "Â", replacement = "", teams$Team)
teams$Team[1]
[1] "Atlanta Thrashers"
Now when I want to subset the data based on Team, all the values come back FALSE:
teams$Team[1]
[1] "Atlanta Thrashers"
teams$Team[1] == "Atlanta Thrashers"
[1] FALSE
dplyr::filter(teams, Team == "Atlanta Thrashers")
[1] ï.. Team GP TOI W L OTL ROW CF CA CF. FF FA FF. SF SA SF. GF GA GF. SCF SCA SCF. SCGF SCGA
[26] SCGF. SCSH. SCSV. HDCF HDCA HDCF. HDGF HDGA HDGF. HDSH. HDSV. SH. SV. PDO
<0 rows> (or 0-length row.names)
It comes back FALSE for every team and I don't understand why. Is it something to do with the accent that I removed? Does it have something to do with encoding, i.e., UTF-8? If someone could please assist me I would appreciate it. Thanks.
I figured it out. It had to do with the accent. I used:
iconv(teams$Team, "UTF-8", "UTF-8", sub = ' ')
iconv(teams$Team, "UTF-8", "UTF-8", sub = ' ')[1] == "Atlanta Thrashers"
[1] TRUE
I never had that happen to me and have no experience with encoding and utf-8.
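An alternative is to declare the file encoding when reading, so the stray characters never appear in the first place. A sketch, assuming the downloaded CSV is UTF-8 with a byte-order mark:
teams <- read.csv("Team Season Totals - Natural Stat Trick 2007-2008 5 vs 5 (Counts).csv",
                  stringsAsFactors = FALSE, fileEncoding = "UTF-8-BOM")
# if the team names still contain non-breaking spaces, normalise them before comparing
teams$Team <- gsub("\u00a0", " ", teams$Team, fixed = TRUE)
This should also avoid the mangled first column name (ï..).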
