We have been using many built-in R functions throughout the lessons.
As you use R more, you will reach the point where you want to create reusable sections of statements.
A function is simply a named group of R expressions, stored as an R object of class "function".
Input values, or function arguments, are passed to functions by value. Each argument is matched to the function definition either by position or by name.
We have used built-in functions that are also considered generic, such as the plot() function. Generic functions dispatch to different methods depending on the class of the argument they receive, which is why they can accept a variety of different arguments. R functions can accept a variable number of input arguments, and they can even accept other functions as arguments.
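As a minimal sketch of these ideas, the following defines a function that takes another function plus a variable number of values, and a small S3 generic. The names applyToAll and describe are hypothetical and are not part of the lesson.

```r
# Hypothetical helper: accepts a function f and any number of values via `...`.
applyToAll <- function(f, ...) {
  values <- c(...)   # collect the variable-length arguments into one vector
  f(values)          # call the function that was passed in as an argument
}

total   <- applyToAll(sum, 1, 2, 3)   # 6
largest <- applyToAll(max, 4, 9, 2)   # 9

# Hypothetical generic: UseMethod() picks a method based on the class of x.
describe <- function(x) UseMethod("describe")
describe.numeric   <- function(x) "a numeric vector"
describe.character <- function(x) "a character vector"
describe.default   <- function(x) "some other object"

describe(c(1.5, 2))   # dispatches to describe.numeric
describe("abc")       # dispatches to describe.character
describe(TRUE)        # falls back to describe.default
```

plot() works along the same lines: it selects a plotting method according to the class of the object it is given.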
When a function is called, the actual arguments are assigned to local variables within the function. The statements in the function body are then evaluated using the input data. Control returns to the caller when a return() call is reached or when the final expression in the function body has been evaluated.
If there is no explicit return() call, the value of the last evaluated expression is returned to the caller.
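The sketch below illustrates positional and named matching, an explicit return(), an implicit return value, and pass-by-value behaviour. The functions percentage and addBonus are hypothetical examples, not code from the lesson.

```r
# Hypothetical function: converts a mark into a percentage.
percentage <- function(mark, outOf = 100) {
  if (outOf <= 0) {
    return(NA_real_)     # explicit return(): control goes back to the caller here
  }
  100 * mark / outOf     # no return(): the value of the last expression is returned
}

byPosition  <- percentage(45, 50)                # matched by position
byName      <- percentage(outOf = 50, mark = 45) # matched by name, order is free
usesDefault <- percentage(72)                    # outOf falls back to its default
earlyExit   <- percentage(10, 0)                 # takes the explicit return() branch

# Arguments are passed by value: changing `marks` inside the function
# does not change the caller's vector.
addBonus <- function(marks) {
  marks <- marks + 5
  marks
}
original <- c(40, 60)
adjusted <- addBonus(original)   # original is still c(40, 60)
```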
Let's take a look at an example of a simple R function.
We find that we are often writing code to determine the number of students who passed a course, so we decide to define a function to simplify our code.
The function is defined using the function() expression. The name of this function has been specified as numPassed and the function takes a single argument.
The comments on lines 3 through 6 help explain the usage of the function, including the expected input and output data types.
The body or content of the function is contained within braces ({}). We decided to use the vectorized sum() function with a conditional expression to calculate the number of students with marks of 50 or higher. The pass object on line 8 is a vector of logical values based on the criteria defined on line 7.
If you want to examine the source code of a function, you can type the name of the function, without parentheses or arguments, in the R Console.
This function accepts a single data frame or list and evaluates a conditional expression for each element of the data structure. If the student has obtained a mark of 50 or more, a logical value of TRUE is generated. These logical values are collected in a local vector called pass.
We then use the sum() function to count the number of TRUE values and return the data to the calling function as an integer.
Finally, the function is being called within the concatenate or cat() function.
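The original listing is not reproduced in this post, so the following is only a reconstruction from the description above. It assumes the argument is a numeric vector of marks (or a data frame column); the sample marks are made up, and the line numbers mentioned in the text refer to the original listing, not to this sketch.

```r
numPassed <- function(marks) {
  # Usage:  numPassed(marks)
  # Input:  marks - numeric vector of student marks (assumed input type)
  # Output: integer - number of marks that are 50 or higher
  pass <- marks >= 50   # logical vector: TRUE where the student passed
  sum(pass)             # TRUE counts as 1, so the sum is the number of passes
}

mathMarks <- c(45, 72, 50, 38, 91)   # hypothetical data
cat("Number of students who passed:", numPassed(mathMarks), "\n")
```

Typing numPassed on its own at the console, without parentheses, prints this source.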
One of the most used features of R is the *apply() family of functions.
Here we are using the lapply() or List Apply function. The function that we are invoking is the mean or average function within the base package.
The mean function will be executed across each column of the data.frame called m.
The lapply() function always returns a list as its output. In our scenario, the list contains the mean or average for each subject.
In the box on the right side we see how the sapply() or simplified apply function can be used. sapply() performs the same task as lapply(), but it coerces the output into a vector or a matrix where possible. Notice how sapply() generated a vector of double values where each element has a name associated with it. The final expression demonstrates how to retrieve the element named "Math".
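A minimal sketch of the two calls, assuming a data frame m with one column of marks per subject. The column names and values are invented for illustration.

```r
# Hypothetical marks; only the data frame name m comes from the text.
m <- data.frame(
  Math    = c(45, 72, 50, 38, 91),
  English = c(66, 58, 49, 80, 73),
  Science = c(55, 61, 47, 90, 52)
)

meansList <- lapply(m, mean)   # list with one element per column
meansVec  <- sapply(m, mean)   # same values, simplified to a named numeric vector

meansVec["Math"]               # retrieve the element named "Math"
```

A data frame is a list of columns, which is why lapply() and sapply() iterate over subjects rather than over rows.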
Here we are using the same approach of using lapply() to invoke a function across a dataset.
In the previous example we used the built-in function mean; in this example we use the function numPassedCourse, which we created, to find the number of students who passed the course.
These examples of using the apply family of functions are quite simple, but you can probably see how a single line of R code can perform a complex operation that includes iteration.
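The same pattern with a user-defined function might look like the sketch below. The name numPassedCourse comes from the text, but its body, the pass mark of 50, and the data frame contents are assumptions.

```r
# Assumed definition: counts marks of 50 or higher in one column.
numPassedCourse <- function(marks) {
  sum(marks >= 50)
}

# Hypothetical marks, one column per subject.
m <- data.frame(
  Math    = c(45, 72, 50, 38, 91),
  English = c(66, 58, 49, 80, 73),
  Science = c(55, 61, 47, 90, 52)
)

passedList <- lapply(m, numPassedCourse)   # list of counts, one per subject
passedVec  <- sapply(m, numPassedCourse)   # named integer vector of the same counts
```

The single lapply() call replaces an explicit loop over the columns of m.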
Now that we have learned about flow of control and functions, let's examine data relationships across multiple variables.
In our case study we want to determine if there is a relationship between the height of a person and their shoe size.
We will use R to simulate data sets for our analysis and we will use quantitative measures and visualizations to perform our data analysis.
There are a few different quantitative measurements of the degree to which two variables vary together. The first is covariance.
A positive covariance would indicate that there is a positive linear relationship between the two variables, and a negative covariance would indicate the opposite.
Correlation is a similar statistical measurement of the degree of linear relationship between variables, but it has the added benefit of a well-defined range of values. The coefficient ranges from -1 to 1.
For example, a correlation coefficient of 0.8 would indicate that there is a strong linear relationship between the two variables. Therefore, if variable 1 increases, there is a similar observed increase in variable 2. If the correlation coefficient is close to zero, there is minimal linear relationship between the changes in the values of the variables. A negative coefficient would indicate that the values of the two variables change in opposite directions.
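A small sketch, using made-up vectors x and y, of why correlation is easier to interpret than covariance: correlation is covariance rescaled by the two standard deviations, so changing the units of a variable changes the covariance but not the correlation.

```r
x <- c(1, 2, 3, 4, 5)   # hypothetical data
y <- c(2, 4, 5, 4, 5)

covXY <- cov(x, y)      # 1.5, depends on the units of x and y
corXY <- cor(x, y)      # about 0.775, always between -1 and 1

# Correlation is covariance divided by the product of the standard deviations.
rescaled <- covXY / (sd(x) * sd(y))

covScaled <- cov(x, 10 * y)   # rescaling y multiplies the covariance by 10
corScaled <- cor(x, 10 * y)   # but leaves the correlation unchanged

corNegative <- cor(x, -y)     # reversing the direction flips the sign
```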
Here we are creating two samples, one of heights and one of shoe sizes, each drawn from a uniform distribution, for our analysis.
We use the covariance or cov() function to compute the covariance measurement and we use the correlation or cor() function to compute the correlation.
These values are stored in variables as they will be used in our final visualization.
We are using the plot() function from the base graphics package to create a scatterplot of the generated data. Visually, there seems to be no relationship between these two measurements. The correlation and covariance values support this visual impression, as both are close to zero.
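The simulation described above can be sketched as follows. The seed, sample size, and the ranges for height (in cm) and shoe size are assumptions, since the original listing is not shown in this post.

```r
set.seed(42)   # assumed seed, only so the sketch is reproducible
n <- 200       # assumed sample size

# Two independent uniform samples (ranges are assumptions).
height   <- runif(n, min = 150, max = 200)   # heights in cm
shoeSize <- runif(n, min = 5, max = 13)      # shoe sizes

# Store the measurements for later use in the plot title.
covHS <- cov(height, shoeSize)
corHS <- cor(height, shoeSize)

plot(height, shoeSize,
     main = sprintf("cov = %.2f, cor = %.2f", covHS, corHS),
     xlab = "Height (cm)", ylab = "Shoe size")
```

Because the two samples are generated independently, the correlation should be close to zero, and the scatterplot should show no visible trend.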
On line 22 a linear regression model is created and stored in a variable called lm1.
The linear model function, lm(), can be used to analyze multiple variables and determine a linear approximation of the relationship between them. Notice that the lm() function is provided with a formula. In R, a formula is written with a tilde (~) character; in this case the variable on the left of the tilde is the response, and the variable on the right is the term under consideration.
The object lm1 contains quite a large amount of detailed information. In this scenario we use the object to add a line to our graph to visualize the relationship.
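A minimal sketch of fitting the model and drawing the fitted line. The name lm1 comes from the text; the simulated data, seed, and ranges are assumptions.

```r
set.seed(42)                                   # assumed seed
height   <- runif(200, min = 150, max = 200)   # assumed ranges
shoeSize <- runif(200, min = 5, max = 13)

# Formula: response on the left of the tilde, explanatory term on the right.
lm1 <- lm(shoeSize ~ height)

coefficients <- coef(lm1)   # intercept and slope of the fitted line

plot(height, shoeSize, xlab = "Height (cm)", ylab = "Shoe size")
abline(lm1, col = "red")    # add the regression line to the existing scatterplot
```

summary(lm1) prints more of the detail stored in the object, such as residuals and the R-squared value.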
We use a math expression on line 26 to adjust the shoe size for our next analysis.
As an aside, the expression uses integer division to introduce a relationship between the variables.
Again we determine the covariance and correlation coefficients of the revised variables.
In our initial simulation of randomly generated uniform shoe sizes and heights there was very little correlation.
Here we see visually that there is a strong linear correlation and the value of the correlation coefficient agrees with our visual analysis.
We have also fitted a revised linear regression model and plotted the result on the graph.
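The adjustment expression itself is not shown in this post, so the sketch below uses a hypothetical one. It ties shoe size to height through integer division (%/%) and keeps a little of the original random shoe size as noise. The names revisedShoeSize and lm2, the seed, and the ranges are also assumptions.

```r
set.seed(42)                                   # assumed seed
height   <- runif(200, min = 150, max = 200)   # assumed ranges
shoeSize <- runif(200, min = 5, max = 13)

# Hypothetical adjustment: integer division of height by 10 introduces
# the dependence on height; shoeSize / 8 adds a small random component.
revisedShoeSize <- (height %/% 10) - 10 + shoeSize / 8

covRevised <- cov(height, revisedShoeSize)
corRevised <- cor(height, revisedShoeSize)

lm2 <- lm(revisedShoeSize ~ height)            # revised regression model

plot(height, revisedShoeSize,
     main = sprintf("cov = %.2f, cor = %.2f", covRevised, corRevised),
     xlab = "Height (cm)", ylab = "Revised shoe size")
abline(lm2, col = "red")
```

Integer division produces a staircase pattern in the scatterplot, but the overall trend is still close to linear, which is what the correlation coefficient measures.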
Here we see a summary of the before and after of our measurements for heights and shoe sizes.
The strong correlation is quite evident in the revised data for shoe sizes shown on the right.
There are many lab exercises that you can work through following this lesson, including dice simulations, correlations, and recoding of datasets using the apply family of functions in R.
zmbd (Recognized Expert, Moderator):
nbiswas:
Please provide proper citations for these articles.
-z
Thanks for the information