Kaggle notebook multiple commands in one cell? - jupyter-notebook

I'm trying to get the modes of several columns by putting all the commands in one cell, rather than running each column in a separate cell, but the output shows only one result. How do I do it?
data['director'].mode()
data['cast'].mode()
data['country'].mode()
data['date_added'].mode()
data['rating'].mode()
Output:
0 TV-MA
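The output shown (TV-MA) looks like the mode of rating, the last line of the cell. That fits how Jupyter-style notebooks behave by default: only the value of the final expression in a cell is displayed. A minimal sketch of two ways around this, assuming data is a pandas DataFrame (the small frame below is a made-up stand-in, not the asker's dataset):

```python
import pandas as pd

# Hypothetical stand-in for the asker's `data` frame.
data = pd.DataFrame({
    "country": ["India", "India", "Japan"],
    "rating": ["TV-MA", "TV-14", "TV-MA"],
})

columns = ["country", "rating"]

# Option 1: print each result explicitly, so nothing depends on
# the notebook echoing only the last expression of the cell.
for col in columns:
    print(col)
    print(data[col].mode())

# Option 2: ask for the modes of all columns at once; the result is
# a single DataFrame with one column per requested column.
modes = data[columns].mode()
print(modes)
```

Another option is to set IPython's InteractiveShell.ast_node_interactivity to "all", which makes the notebook display every bare expression in a cell rather than only the last one.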

Related

Why won't R access all of the values in a certain row?

I read a large (24,000 observations and 1900 variables) dataset into R using this command:
expression_data<-read.table("data_expression_median.txt", sep="\t", header=TRUE, fill=TRUE)
When I look at my data using view(expression_data) and when I pull a limited number of rows/columns out with expression_data[1:3,1:5], all of the data shows up correctly. Also, when I use the command expression_data[3, 1:5] it prints the column headers AND the actual values (which is the expected result):
Hugo_Symbol Entrez_Gene_Id MB.0362 MB.0346 MB.0386
3 CD049690 NA 5.453928 5.454185 5.501577
However, when I try to subset an entire row using expression_data[3,] or any other command to pull out an entire row, I only get the column headers:
Hugo_Symbol Entrez_Gene_Id MB.0362 MB.0346 MB.0386
MB.0574 MB.0503 MB.0641 MB.0201 MB.0218 MB.0316 MB.0189
MB.0891 MB.0658 MB.0899 MB.0605 MB.0258 MB.0506 MB.0420
MB.0223 MB.0445 MB.0199 MB.0517 MB.0155 MB.0428 MB.0117
Why is this? What am I doing wrong? I need to do operations on a row basis so I need to be able to access the data from entire rows.
R has printing limits and your data are very wide. expression_data[3,] has all the values and you can access them, they just won't be printed by default.
You can play with the print options, especially the max.print option to get it to print more in your console, but the R console is really the wrong tool to view thousands of columns of data.
If you're doing a lot of math on the rows of a data frame, consider converting it to a matrix for efficiency.

data frame accessing specific rows and col from csv file in R programming

I have a CSV file containing the iPhone device roadmap: version number, name of model, release date of model, price, etc. I have done the following:
I imported the data set into RStudio as a variable named iphonedetail with the following command: iphonedetail <-read.csv("iphodedata.csv")
Then I changed the attribute "name of model" to character using: iphonedetail$nameofmodel <- as.character(iphonedetail$nameofmodel)
Now I need to access the first 5 model names and store them in a vector.
I tried this: iphonesubset <- data.frame(iphonedetail$nameofmodel)
Then I typed iphonesubset in the console, but it showed 0 columns and 0 rows.
Could someone tell me whether the first two steps are correct, and suggest how to fix the third?
If you want to extract the first five rows (not necessarily unique):
iphonedf1to5 <- df[1:5,]
That gives you the first 5 rows and all columns. If you want only the unique rows among those first five:
iphonedf1to5 <- unique(df[1:5,])
Edit:
Here df stands for the data frame read from the CSV file, iphonedetail in your case.

Import table with irregular linebreaks

I have a 2.8gb text file that I'm trying to import into R.
1) I used fread(file='file.txt',sep = ';',header = T,nrows = 1000,stringsAsFactors = F,fill=T) to take a quick look, and I saw that some rows have NAs in some columns, with the values that belong in those NA positions appearing in the row below.
2) Next, I used HJSplit to view part of the file in Notepad and noticed that there are line breaks in the middle of some rows, so those rows span two lines. Here's an illustration of what's happening (example of a ';'-separated file with 4 columns):
id;name;age;sex
150;bob;40;F
151;luke;20;M
152;mary
20;F
153;larry;30;M
Question: Is there a way that I can solve this problem?
One thing that came to mind was to use the fact that the number of columns is fixed, but I don't know how.
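The fixed-column-count idea could be sketched as a preprocessing pass: keep collecting fields from consecutive physical lines until a full row has been assembled. This is only a sketch in Python (not R). It assumes every break falls at a field boundary, as in the illustration above, and the function name repair_rows is made up for this example.

```python
def repair_rows(lines, n_cols, sep=";"):
    """Yield rows as lists of fields, merging physical lines that were split.

    Assumes a stray line break always falls between two fields, so the
    fields of the broken pieces can simply be collected in order.
    """
    pending = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip empty lines
        pending.extend(line.split(sep))
        if len(pending) >= n_cols:
            yield pending
            pending = []
    if pending:
        # Incomplete trailing row: hand it back so the caller can inspect it.
        yield pending


sample = """id;name;age;sex
150;bob;40;F
151;luke;20;M
152;mary
20;F
153;larry;30;M
""".splitlines()

rows = list(repair_rows(sample, n_cols=4))
for row in rows:
    print(";".join(row))
```

For a file of several gigabytes, the same generator can be fed an open file object line by line and the repaired rows written to a new file, which can then be read with fread as before. If a break can also fall in the middle of a field, the merge rule would need to join the last and first fragments instead of treating them as separate fields.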

R - Extract multiple tables from text file

I have a .txt file containing text (which I don't want) and 65 tables, as shown below (just the top of the .txt file)
Does anyone know how I can extract only the tables from this text file, so that I can open the resulting .txt file as a data.frame containing my 65 tables in R? Above each table is a fixed number of lines (starting with "The result of abcpred on seq..." and ending with "Predicted B cell epitopes"), and below each is a variable number of lines, depending on how many rows the table has. Then comes the next table, and so on until the 65th table.
Given that the tables are the only elements that start with numbers, grepping for digits at the beginning of the line is indeed the best solution. Using the shell (not R), the command:
grep '^[0-9]' input > output
did exactly what I wanted.
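Where a shell is not available, the same filter is easy to reproduce in a script. A minimal Python sketch of the "keep only lines that start with a digit" rule follows; the sample lines are invented placeholders, not the real abcpred output, and table_lines is a hypothetical helper name.

```python
import re

# Same pattern as the grep call; re.match only matches at the start of the string.
STARTS_WITH_DIGIT = re.compile(r"[0-9]")


def table_lines(lines):
    """Return only the lines whose first character is a digit."""
    return [line for line in lines if STARTS_WITH_DIGIT.match(line)]


# Invented sample: two header lines, two table rows, then trailing text.
sample = [
    "The result of abcpred on seq...",
    "Predicted B cell epitopes",
    "1 first table row",
    "2 second table row",
    "",
    "some trailing text",
]

kept = table_lines(sample)
print("\n".join(kept))
```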

How to create a table in R from a csv file?

I have a CSV file and am unsure how to get R to interpret it as a table, because all the title info is in one cell and all the data relating to the titles is in a separate cell. So all the info I need is in 2 cells, but it actually needs to be split up.
Cell A3 has a value called 'Team', which corresponds to the part of cell A4 that says 'Visitor'. Each part after that corresponds to the bit below it. Sorry, I don't know how to describe it, but ultimately it would look like this …
Looks like the field separator in your data is a ;
read.csv has a parameter sep to change the field separator and another parameter header to tell it there is an initial line containing the column names. Use read.csv like this:
data = read.csv(file="/mydir/myfile.csv", sep=";", header=T)
To test you can print out the first 5 lines of the data table with:
head(data,5)
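For comparison, the same two settings (a ';' field separator and a header line) can be sketched with Python's standard csv module. The column names and values below are invented for illustration, since the actual file contents are not shown.

```python
import csv
import io

# Invented semicolon-separated content with a header line.
raw = "Team;Score\nVisitor;3\nHome;1\n"

# delimiter=";" plays the role of sep=";"; DictReader treats the first
# line as column names, much like header=T.
reader = csv.DictReader(io.StringIO(raw), delimiter=";")
rows = list(reader)

# Rough counterpart of head(data, 5): show at most the first five rows.
for row in rows[:5]:
    print(row)
```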
