Lesson 1 – Introduction to Data Analysis Using R

nbiswas
In this lesson we will first look at the features and uses of R.

R is a free software environment that is excellent for data analysis and graphics.
It was created in 1993 by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, as a language to help teach introductory statistics to their students. They based R on the S language, which was developed earlier at Bell Labs in the 1970s.
R was later released as free, open source software under the GNU General Public License and became part of the GNU Project. A large, active R community has since grown up around the world.
R is considered a domain-specific language (DSL) because it was designed primarily for data analysis.
R programs are built mainly from functions and are executed by the R interpreter.
R is more than a programming language: it also has native support for creating high-quality data visualizations.
R is used across many industries such as healthcare, retail, and financial services.
R can be used to analyze both structured and unstructured datasets.
R can help you explore a new dataset and perform descriptive analysis.
R is also excellent at building predictive models.
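To make the last two points concrete, here is a minimal sketch of descriptive analysis followed by a simple predictive model. It uses the `mtcars` dataset that ships with R; the choice of dataset, the `mpg ~ wt` linear model, and the two example weights are our own illustrative assumptions, not part of the lesson.

```r
# mtcars ships with R: fuel economy and design data for 32 car models.
data(mtcars)

# Descriptive analysis: structure, summary statistics and spread.
str(mtcars)
print(summary(mtcars$mpg))   # minimum, quartiles, mean and maximum of mpg
mean_mpg <- mean(mtcars$mpg)
sd_mpg <- sd(mtcars$mpg)
cat("Mean mpg:", round(mean_mpg, 2), " SD:", round(sd_mpg, 2), "\n")

# Predictive model: linear regression of mpg on weight (wt, in 1000 lb).
fit <- lm(mpg ~ wt, data = mtcars)
print(summary(fit))

# Predict mpg for two hypothetical cars weighing 2,500 lb and 3,500 lb.
new_cars <- data.frame(wt = c(2.5, 3.5))
predicted_mpg <- predict(fit, newdata = new_cars)
print(predicted_mpg)
```

Every line can be typed at the R console one at a time, which is how most exploratory analysis in R starts.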

There are many reasons why learning R is beneficial.
As a data analyst or data scientist, you can use R to dig deeper into your data than spreadsheet-based tools alone allow.
As a software developer, you can use R to add data analytics computations and graphics to new or existing applications with minimal effort.
With the explosion of Big Data, there are many new scenarios where R is an excellent choice for meeting user demands.
As a data analyst, you can use R to perform classical statistical tests and build predictive models.
R also has native support for handling time-series datasets.
Classification and clustering models can be used to detect patterns in your data.
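The three kinds of analysis just mentioned can each be tried in a few lines using datasets bundled with R. This is only a sketch: the particular test (a Welch two-sample t-test), the yearly aggregation, and k-means with three clusters are illustrative choices of ours, not methods prescribed by the lesson.

```r
# Classical statistical test: do automatic (am = 0) and manual (am = 1)
# cars differ in mean fuel economy? t.test() defaults to Welch's t-test.
tt <- t.test(mpg ~ am, data = mtcars)
print(tt$p.value)

# Time series: AirPassengers is a built-in monthly series (1949-1960).
print(frequency(AirPassengers))    # 12 observations per year
yearly_totals <- aggregate(AirPassengers, nfrequency = 1, FUN = sum)
print(yearly_totals)

# Clustering: group the iris flowers into 3 clusters with k-means, using
# only the four numeric measurements (species labels are not used).
set.seed(42)                        # k-means starts from random centres
km <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

# Compare the clusters found with the known species.
print(table(cluster = km$cluster, species = iris$Species))
```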
As a developer, you will find that R is a powerful functional programming language.
Because R scripts are interpreted, R encourages an interactive approach to development.
R scripts are typically written using expressions and built-in functions.
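A small sketch of what "functional" means in practice: functions are ordinary values that can be passed to and returned from other functions. The names `square`, `values`, `make_scaler` and `times_two` are hypothetical; `sapply()`, `Filter()` and `Reduce()` are standard base R functions.

```r
# Functions are ordinary values: they can be stored, passed and returned.
square <- function(x) x^2

# Vectorised arithmetic operates on a whole vector at once.
values <- c(3, 1, 4, 1, 5)
print(square(values))                        # 9 1 16 1 25

# Higher-order functions from base R: apply a function to each element,
# keep elements that satisfy a predicate, and fold a vector to one value.
print(sapply(values, function(v) v + 10))    # 13 11 14 11 15
print(Filter(function(v) v > 2, values))     # 3 4 5
print(Reduce(`+`, values))                   # 14

# A function that returns a new function (a closure over 'factor').
make_scaler <- function(factor) function(x) x * factor
times_two <- make_scaler(2)
print(times_two(values))                     # 6 2 8 2 10
```

Typing these expressions one by one at the console and inspecting each result is the interactive style of development described above.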
R provides native support for many useful types of data structures. Many of these data structures will be explored in other lessons.
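As a preview, the sketch below creates one example of each of R's most common built-in data structures. All variable names and values here are made up for illustration.

```r
# Atomic vector: every element has the same type.
temps <- c(21.5, 23.0, 19.8)

# Factor: categorical data with a fixed set of levels.
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

# Matrix: a two-dimensional vector (2 rows x 3 columns, filled by column).
m <- matrix(1:6, nrow = 2)

# List: an ordered collection whose elements may have different types.
record <- list(name = "sensor A", readings = temps, active = TRUE)

# Data frame: a table of equal-length columns, the workhorse for analysis.
df <- data.frame(city = c("Auckland", "Wellington", "Christchurch"),
                 temp = temps)

str(record)
print(table(sizes))          # counts per level
print(dim(m))                # number of rows and columns
print(df[df$temp > 20, ])    # rows where temp exceeds 20
```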
External packages can be used to extend the capabilities of R.
As your R skills improve, you will likely start to define your own functions and possibly new classes to meet the demands of your users.
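Here is a minimal sketch of both ideas: a user-defined function, plus a new class built with S3, the simplest of R's object systems. The names `summarise_numeric` and `numeric_summary` are hypothetical; the pattern of setting `class()` on a list and defining a `print.<class>` method is standard S3.

```r
# A user-defined function with a default argument and input checking.
summarise_numeric <- function(x, na.rm = TRUE) {
  stopifnot(is.numeric(x))
  result <- list(n = sum(!is.na(x)),
                 mean = mean(x, na.rm = na.rm),
                 range = range(x, na.rm = na.rm))
  class(result) <- "numeric_summary"   # tag the list as an S3 object
  result
}

# An S3 print method: R dispatches to it when the object is printed.
print.numeric_summary <- function(x, ...) {
  cat("n =", x$n, " mean =", format(x$mean, digits = 4),
      " range =", x$range[1], "to", x$range[2], "\n")
  invisible(x)
}

s <- summarise_numeric(c(4, 8, NA, 15, 16))
print(s)    # uses print.numeric_summary
```

Because `print()` is generic, any object whose class is `"numeric_summary"` gets the custom output automatically, without changing `print()` itself.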
Installing R is quite simple.
Go to the R Project website (www.r-project.org) and follow the CRAN (Comprehensive R Archive Network) link.
CRAN is a network of servers around the world that store identical, up-to-date versions of code and documentation for R.
Binary installers are available for Windows, Linux, and Mac OS. It is possible to build R from source, but it is best to avoid this step if you can, so that you can start using R more quickly.
On Windows, installing R involves downloading the installer from CRAN and running it.
There are 32-bit and 64-bit installation options available. We will use the 64-bit version for our coursework because it can use considerably more memory.
Once the installation has finished, you can start R by launching either the R command line environment or the RGui tool.
RGui provides some useful productivity features beyond those of the plain R command line.
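After installing, you can confirm from the console which build you are running. This sketch relies on the fact that pointers are 8 bytes in a 64-bit build of R and 4 bytes in a 32-bit build; the variable name `bits` is our own.

```r
# Report the running R version and platform string.
print(R.version.string)
print(R.version$platform)

# Pointer size in bytes: 8 on a 64-bit build of R, 4 on a 32-bit build.
bits <- 8 * .Machine$sizeof.pointer
cat("This R session is a", bits, "bit build\n")
```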
On Linux, you can install R either by downloading the appropriate RPM file from the CRAN website or by using a package manager such as YUM.
Note that you must be logged in as the root user, or have sudo privileges, to complete the installation.
Once R is installed, any user on the system can run it.
By default, R provides a command line and a GUI, but many R users prefer a more comprehensive environment such as R Commander (Rcmdr) or an Integrated Development Environment (IDE) such as RStudio.
RStudio is an excellent alternative to the RGui tool provided with R, and it is available on Linux, Mac OS X, and Windows.
In the configuration described here, we are using RStudio on a Linux server, accessed from a web browser.
This setup is ideal for occasional R users, because they do not need to install R on their own computers.
Let's examine the four tiled panes in this layout:
• In the top left corner we are able to view the 2013_cars.csv data file and an R source file called cars.R.
• In the bottom left corner we have the R Console.
• In the top right corner we have access to the objects in the current R workspace and a history of recently used R commands.
• In the bottom right corner we have a histogram plot of data along with access to the R help utility.
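A script like cars.R might look roughly like the sketch below. The lesson does not show the contents of 2013_cars.csv or cars.R, so this is an assumption throughout: to keep it runnable anywhere, it writes R's built-in `cars` dataset (speed and stopping distance) to a temporary CSV and reads that instead. The names `csv_path`, `car_data` and `h` are hypothetical.

```r
# Stand-in data: the lesson's 2013_cars.csv is not available, so write the
# built-in 'cars' dataset to a temporary CSV file and use that instead.
csv_path <- tempfile(fileext = ".csv")
write.csv(cars, csv_path, row.names = FALSE)

# Read the CSV into a data frame; in RStudio it appears in the workspace pane.
car_data <- read.csv(csv_path)
str(car_data)

# Draw a histogram; in RStudio it appears in the bottom right (Plots) pane.
h <- hist(car_data$dist,
          main = "Stopping distance",
          xlab = "Distance (ft)",
          col = "grey")

# Typing ?hist at the console opens the help page in the bottom right pane.
```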

It is worth the time and effort to install an IDE such as RStudio as you learn R.
Previously we stated that R can be extended using packages.
There are over 4000 packages available on CRAN, and more are added frequently.
Many CRAN packages are grouped by functionality into Task Views.
During this course we will primarily use the built-in or standard set of packages, but you may wish to explore some of the additional packages along the way.
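To see which standard packages you already have, you can query the installation directly. This sketch uses `installed.packages()` with its `priority` argument to list the packages that ship with every R installation, and `search()` to show what is attached in the current session; the variable name `base_pkgs` is our own.

```r
# Packages that ship with every R installation ("base" priority).
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Packages (and other environments) attached in the current session.
print(search())
```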
The base R environment provides a significant set of functions for data analysis, but there are many excellent packages available from the R Community.
New packages can be installed using the install.packages() function.
By default, the package is downloaded from CRAN. If you have been given a package that is not available on CRAN, use the same function and point it at the package's compressed archive file.
For example, installing the RJDBC package enables R to connect to database servers such as Informix or DB2 through a JDBC driver.
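One common pattern is to install a package only if it is missing and then load it. The helper below, `load_or_install`, is a hypothetical name of ours, not part of R; it uses the real functions `requireNamespace()`, `install.packages()` and `library()`. The mirror URL `https://cloud.r-project.org` and the archive path in the comments are assumptions for illustration.

```r
# Hypothetical helper: install a package from CRAN only if it is missing,
# then attach it. Returns TRUE if the package ends up attached.
load_or_install <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  library(pkg, character.only = TRUE)
  invisible(paste0("package:", pkg) %in% search())
}

# Example from the lesson (needs internet access; RJDBC also needs Java):
# load_or_install("RJDBC")

# Installing from an archive file you were given (path is hypothetical):
# install.packages("path/to/mypackage_1.0.tar.gz", repos = NULL, type = "source")
```

Setting `repos = NULL` tells `install.packages()` to treat its argument as a local file instead of a CRAN package name.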
If you develop an R script that uses functions outside base R, call library() or require() within the first few lines of the script so that the required packages are loaded when the script runs.
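The two functions behave differently when a package is missing: `library()` stops with an error, while `require()` returns `FALSE` (usually with a warning), so the script can decide what to do. A sketch of a script header showing both; the package name `notARealPackage` is deliberately fictitious.

```r
# --- top of a script ---

# library() stops the script with an error if the package is missing,
# which is usually what you want: fail early with a clear message.
library(stats)
library(utils)

# require() returns TRUE or FALSE instead of stopping, so the script can
# fall back gracefully. 'notARealPackage' is a made-up name.
has_pkg <- suppressWarnings(
  require("notARealPackage", character.only = TRUE, quietly = TRUE)
)
if (!has_pkg) {
  message("Optional package not available; continuing without it.")
}

# ... rest of the analysis ...
```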
Sep 4 '14
zmbd
nbiswas:
Please provide proper citations for these articles.
-z
Sep 29 '14
