Bytes | Software Development & Data Engineering Community
Lesson 1 – Introduction to Data Analysis Using R

nbiswas
In this lesson we will learn about the features and uses of R.

R is a software environment that is excellent for data analysis and graphics.
R was first created in 1993 by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, as a language to help teach introductory statistics to their students. They based it on the S language, which was developed at Bell Labs in the 1970s.
After some time they made R available as an open-source GNU project. A very active R community now exists around the world.
R is considered a domain-specific language because it was designed primarily for data analysis.
R programs are typically created using functions and the programs are executed by an R interpreter.
R is not just a programming language as it has native support for creating high quality data visualizations.
R is used across many industries such as healthcare, retail, and financial services.
R can be used to analyze both structured and unstructured datasets.
R can help you explore a new dataset and perform descriptive analysis.
R is also excellent at building predictive models.
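As a minimal sketch of the exploration and modeling workflow described above, the following uses the mtcars dataset that ships with R. The choice of dataset and of a simple linear model are assumptions made for illustration; they are not part of the original lesson.

```r
# Explore a dataset: structure and descriptive statistics
data(mtcars)                      # built-in example dataset
str(mtcars)                       # column types and a preview of the values
summary(mtcars$mpg)               # min, quartiles, mean, and max of fuel economy

# Build a simple predictive model: fuel economy as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)                         # intercept and slope of the fitted line

# Predict mpg for a hypothetical car weighing 3,000 lbs (wt is in 1000 lbs)
predict(fit, newdata = data.frame(wt = 3))
```

Because R is interpreted, each of these lines can be typed into the console one at a time and inspected before moving on.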

There are many reasons why learning R is beneficial.
As a Data Analyst or Data Scientist – R can be used to dig deeper into your data than is possible using spreadsheet-based tools alone.
As a software developer – R can add data analytics computations and graphics to new or existing applications with minimal effort.
With the explosion of Big Data, there are many new scenarios where using R is an excellent choice to help meet user demands.
As a data analyst, you can use R to run classical statistical tests and build predictive models.
R also has native support for handling time-series datasets.
Classification and clustering models can be used to better detect patterns.
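The three capabilities just mentioned can be sketched with functions from the standard R distribution. The datasets (mtcars, AirPassengers, iris) and the particular test and clustering method are assumptions chosen for illustration only.

```r
# Classical statistical test: do automatic and manual cars differ in mpg?
tt <- t.test(mpg ~ am, data = mtcars)
tt$p.value

# Time series: AirPassengers is a built-in monthly "ts" object
class(AirPassengers)
frequency(AirPassengers)          # number of observations per year
window(AirPassengers, start = c(1960, 1), end = c(1960, 12))

# Clustering: group the four iris measurements into three clusters
set.seed(42)                      # k-means starts from random centers
km <- kmeans(iris[, 1:4], centers = 3)
table(km$cluster, iris$Species)   # compare clusters with the known species
```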
For developers, R is a powerful language with strong support for functional programming.
Because R scripts are interpreted, R encourages an interactive approach to development.
R scripts are typically written using expressions and built-in functions.
R provides native support for many useful types of data structures. Many of these data structures will be explored in other lessons.
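As a brief preview, here is a sketch of some of the most common built-in data structures. The variable names and values are made up for illustration.

```r
v  <- c(2.5, 3.1, 4.8)                    # numeric vector: values of one type
l  <- list(name = "cars", n = 3L)         # list: elements of mixed types
m  <- matrix(1:6, nrow = 2)               # matrix: 2 rows by 3 columns
f  <- factor(c("low", "high", "low"))     # factor: categorical values
df <- data.frame(id = 1:3, score = v)     # data frame: table of named columns

str(df)                                   # compact view of the structure
```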
External libraries can be used to extend the capabilities of R.
As your R skills improve, you will likely start to define your own functions, and possibly new classes, to meet the demands of your users.
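A minimal sketch of both ideas follows. The function cv() and the "reading" class are hypothetical names invented for this example, and S3 is only one of several class systems available in R.

```r
# A user-defined function: coefficient of variation
cv <- function(x) sd(x) / mean(x)
cv(c(10, 12, 14))

# A small S3 class with a constructor and its own print method
new_reading <- function(value, unit) {
  structure(list(value = value, unit = unit), class = "reading")
}

print.reading <- function(x, ...) {
  cat("Reading:", x$value, x$unit, "\n")
  invisible(x)
}

r <- new_reading(21.5, "C")
print(r)                          # dispatches to print.reading()
```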
Installing R is quite simple.
Simply navigate to the R Project page and click on the Comprehensive R Archive Network or CRAN link.
CRAN is a set of servers around the world that store identical, up-to-date versions of code and documentation for R.
There are binary installers available for Windows, Linux, and Mac OS platforms. It is possible to build R from source, but it is best to avoid this step if possible so you can get started using R more quickly.
Installing R on Windows involves downloading the installer (an .exe file) from CRAN and running it.
There are 32-bit and 64-bit installation options available. We will use the 64-bit version for our coursework as it has higher limits on the amount of memory that can be used.
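If you are unsure which build you are running, a quick check from the R console is sketched below. The variable name pointer_bits is just for illustration.

```r
# Architecture the running R was built for
R.version$arch

# Pointer size in bits: 64 on a 64-bit build, 32 on a 32-bit build
pointer_bits <- .Machine$sizeof.pointer * 8
pointer_bits
```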
Once the Windows installation has finished you can get started with R by launching the R command line environment or the RGui tool.
RGui provides some useful productivity features beyond the R command line environment for R users.
Installing R on Linux involves either downloading the appropriate RPM file from the CRAN website or using a Linux package manager such as YUM.
Note that you must be logged in as a root user or have sudo privileges on your Linux system to complete the installation.
Once installed on the system any user can use R.
By default, there is an R command line and GUI provided, but many R users prefer to use a more comprehensive Integrated Development Environment (IDE) such as Rcmdr or RStudio.
RStudio is an excellent alternative to the RGui tool provided with R. RStudio is available on Linux, Mac OS X, and Windows.
In this configuration we are using RStudio on a Linux server from within a browser.
This environment is ideal for occasional R users as they would not need to install R on their own computer to use it.
Let's examine some of the tiled windows shown here:
• In the top left corner we are able to view the 2013_cars.csv data file and an R source file called cars.R.
• In the bottom left corner we have the R Console.
• In the top right corner we have access to the objects in the current R workspace and a history of recently used R commands.
• In the bottom right corner we have a histogram plot of data along with access to the R help utility.
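The contents of cars.R and 2013_cars.csv are not reproduced in this lesson, so the following is only a hypothetical sketch of the kind of script that would fill those panes. It uses the built-in mtcars dataset as a stand-in for the data file, and all file names are placeholders.

```r
# Stand-in for the data file: write a built-in dataset to a temporary CSV
csv_path <- file.path(tempdir(), "cars_example.csv")
write.csv(mtcars, csv_path, row.names = TRUE)

# Read the CSV into a data frame; the object then appears in the workspace pane
cars <- read.csv(csv_path, row.names = 1)
head(cars)

# Draw a histogram. Here it is written to a PDF file so the script also runs
# outside an IDE; in RStudio a plain hist() call shows up in the plot pane.
pdf_path <- file.path(tempdir(), "mpg_hist.pdf")
pdf(pdf_path)
hist(cars$mpg, main = "Fuel economy", xlab = "Miles per gallon")
dev.off()
```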

It is worth the time and effort to install an IDE such as RStudio as you learn R.
Previously we stated that R can be extended using packages.
At the time of writing there are over 4000 different packages available in CRAN, with more being added frequently.
The packages published in CRAN are categorized based on their functionality into Task Views.
During this course we will primarily use the built-in or standard set of packages, but you may wish to explore some of the additional packages along the way.
The base R environment provides a significant set of functions for data analysis, but there are many excellent packages available from the R Community.
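To see what the standard set looks like on your own installation, a short sketch follows. The exact list can differ between R versions, so no output is shown.

```r
# Packages that make up base R on this installation
base_pkgs <- rownames(installed.packages(priority = "base"))
base_pkgs

# Packages currently attached to the session and ready to use
search()
```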
New packages can be installed using the install.packages() function.
By default CRAN is searched for the package, but you may have been given a package that is not available in CRAN.
In that case, simply use the same function and direct it to the compressed archive file for the new package.
Here we see that the RJDBC package is being installed to enable connectivity to database servers such as Informix or DB2 through a JDBC driver.
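Both installation routes can be sketched as follows. The helper names install_if_missing() and install_from_archive(), the mirror URL, and the archive path are assumptions made for this example; only install.packages() and requireNamespace() are part of R itself.

```r
# Install from CRAN only when the package is not already available.
# install_if_missing() is a hypothetical helper, not part of R.
install_if_missing <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Install from a local compressed archive instead of CRAN.
# repos = NULL makes install.packages() treat its first argument as a file path.
install_from_archive <- function(path) {
  install.packages(path, repos = NULL, type = "source")
}

# Example calls, left commented out because they need network access
# or a real archive file:
# install_if_missing("RJDBC")
# install_from_archive("path/to/package.tar.gz")   # placeholder path
```

Note that a package such as RJDBC also depends on software outside R (a Java runtime), so installation may need extra setup on your system.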
If you develop an R script that uses functions that are not part of base R, the script should call library() or require() within its first few lines so that the package is loaded into memory at runtime.
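The two functions behave differently when a package is missing, which the sketch below illustrates. The package name notARealPackage123 is made up and assumed not to be installed.

```r
# library() stops the script with an error if the package is missing,
# which is usually what you want for a required dependency.
library(stats)

# require() returns FALSE (with a warning) instead of stopping,
# which suits optional dependencies.
pkg <- "notARealPackage123"       # hypothetical package name
loaded <- suppressWarnings(
  require(pkg, character.only = TRUE, quietly = TRUE)
)

if (!loaded) {
  message("Package '", pkg, "' is not installed; skipping the optional step.")
}
```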
Sep 4 '14 #1
zmbd
nbiswas:
Please provide proper citations for these articles.
-z
Sep 29 '14 #2
