RStudio datasets

10/2/2023

To get the list of data sets available in base R, we can use data(). To get the list of data sets available in a package, we first need to load that package; data() then lists that package's data sets as well. For the data sets in base R, we can also use ls("package:datasets"). These built-in data sets are helpful to everyone who wants to learn R programming.

Live Demo

> data()

Output (excerpt)

Data sets in package ‘datasets’:

AirPassengers      Monthly Airline Passenger Numbers 1949-1960
BJsales            Sales Data with Leading Indicator
CO2                Carbon Dioxide Uptake in Grass Plants
ChickWeight        Weight versus age of chicks on different diets
EuStockMarkets     Daily Closing Prices of Major European Stock Indices, 1991-1998
Formaldehyde       Determination of Formaldehyde
HairEyeColor       Hair and Eye Color of Statistics Students
Indometh           Pharmacokinetics of Indomethacin
InsectSprays       Effectiveness of Insect Sprays
JohnsonJohnson     Quarterly Earnings per Johnson & Johnson Share
LifeCycleSavings   Intercountry Life-Cycle Savings Data
PlantGrowth        Results from an Experiment on Plant Growth
Puromycin          Reaction Velocity of an Enzymatic Reaction
Seatbelts          Road Casualties in Great Britain 1969-84
Titanic            Survival of passengers on the Titanic
ToothGrowth        The Effect of Vitamin C on Tooth Growth in Guinea Pigs
UCBAdmissions      Student Admissions at UC Berkeley
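The two listing approaches mentioned above can be tried out with a short sketch. It assumes a standard R session in which the datasets package is attached (the default); the variable names base_sets and info are only illustrative.

```r
# List the objects exported by the base 'datasets' package.
base_sets <- ls("package:datasets")
head(base_sets)

# data() returns an object whose $results matrix has one row per data set,
# with the columns "Package", "LibPath", "Item" and "Title".
info <- data(package = "datasets")$results
head(info[, c("Item", "Title")])

# The data sets differ a lot in structure:
class(AirPassengers)  # a time series object
str(ToothGrowth)      # a data frame with numeric and factor columns
```

Calling data(package = "somePackage") works the same way for an installed add-on package, without attaching it first.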

There are many data sets available in base R and in different R packages. The characteristics of these data sets are very different: some are time series, some have only numerical columns, some have numerical as well as factor columns, and some include character columns alongside other types of columns.

We now have a dataset with over 50,000 rows (you can scroll through the first 6 of them in the box above) and 14 variables in our global environment (the ‘workspace’). Just for the sake of simulating a real workflow, I will do some very light data manipulation. Here, I’m assigning a new column data$ which is TRUE whenever data$gender is "Unknown" or NA.

data$ <- is.na(data$gender) | data$gender == "Unknown"

As I wrote above: saving the current state of your dataset in R makes sense when all the preparations take a lot of time. If they don’t, you can just run your pre-processing code every time you get back to analyzing the dataset. In the scope of this post, let’s suppose that the calculation above took very long and you absolutely don’t want to run it every time.

The difference between save() and saveRDS()

So, you might ask “why should I use saveRDS() instead of save()”? Actually, I like saveRDS() better, for one specific reason that you might not have noticed in the calls above. When we use load(), we do not assign the result of the loading process to a variable because the original names of the objects are used. But this also means that you have to “remember” the names of the previously used objects when using load(). When we use readRDS(), we have to assign the result of the reading process to a variable. This might mean more typing, but it also has the advantage that you can choose a new name for the variable to integrate it into the rest of the new script more smoothly. Also, it is more similar to the behavior of all the other “reading functions” like read.table(): for these, you also have to assign the result to a variable. The only real advantage of save() is that you can save several objects into one file, but in the end it might be better to have one file for one object.

You might come into a situation where you want to export your dataset to an Excel file. Maybe some colleagues only work with Excel (because you have not yet managed to convince them to switch to R), or you want to annotate your dataset in a spreadsheet editor. In this case, you can use the WriteXLS() function from the WriteXLS package. I tried a few packages for writing Excel files and I find this one the most convenient to use.

WriteXLS(data, ExcelFileName = "data.xlsx",

This is what the resulting Excel file looks like on my machine.

You can save several data frames in one Excel file by including the names of the objects at the first position. Here, you could replace data with c("data", "data2"). With the parameter SheetNames you can set the names of the data sheets (visible at the bottom of Excel, not included in the screenshot). If you want to write several data frames into several sheets of the Excel file, you can put several names in a vector here that have to correspond with the names of the objects at the first position. AdjWidth is a nice parameter because it tries to adjust the width of the columns in Excel in a way that every entry fits in the cells. BoldHeaderRow is self-explanatory, I guess. You can see the effect in the screenshot. Oh, and by the way, you can set the entries for NA values with the na parameter. It’s "" by default.
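The difference between save()/load() and saveRDS()/readRDS() discussed in this section can be sketched with a small stand-in dataset. This is only an illustration under assumptions: the toy values, the column name gender_missing, and the variable names survey, data2 and restored are hypothetical and do not come from the original post.

```r
# Toy stand-in for the post's dataset (hypothetical values).
data <- data.frame(
  id = 1:4,
  gender = c("Female", "Unknown", NA, "Male"),
  stringsAsFactors = FALSE
)

# Flag rows whose gender is NA or "Unknown".
# 'gender_missing' is a hypothetical column name chosen for this sketch.
data$gender_missing <- is.na(data$gender) | data$gender == "Unknown"

rds_file   <- tempfile(fileext = ".rds")
rdata_file <- tempfile(fileext = ".RData")

# saveRDS() stores a single object without its name ...
saveRDS(data, rds_file)
# ... so readRDS() returns the object and we pick the variable name ourselves.
survey <- readRDS(rds_file)

# save() stores one or more objects together with their names ...
data2 <- head(data, 2)
save(data, data2, file = rdata_file)

# ... and load() recreates them under those original names.
# Its (invisible) return value is a character vector of the restored names.
rm(data, data2)
restored <- load(rdata_file)
restored

# The object read back with readRDS() matches the one restored by load().
identical(survey, data)
```

Capturing the return value of load(), as done with restored here, is one way to avoid having to remember which object names a file contains.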
