Getting Started With R
Introduction
R is a programming language and software environment for statistical computing and graphics, supported by the R Foundation. Unlike general-purpose programming languages such as Java and C, R was created by statisticians as an interactive environment, and that interactivity is what makes R effective for exploring data. It covers a wide range of statistical techniques, including linear and non-linear modeling, classification, and statistical tests, and it can produce the many types of plots that data analysis often requires.
To run R, we typically use an Integrated Development Environment (IDE), which Wikipedia describes as a software application that provides comprehensive facilities for software development. Every R program also needs base R, the core installation that contains the essential functions required to run our code.
History of R
Bell Labs developed the S language in 1976. In 1993, Ross Ihaka and Robert Gentleman created R in New Zealand. R became open source in 1995. R version 1.0.0 was released to the public in 2000. The RStudio IDE was released in 2011.
Drawbacks
- R is based on the S language. If we want to build general-purpose apps, R is probably not our first choice.
- The objects that we work with must be stored in memory, and working with large datasets can quickly become a challenge.
Installing and Setting Up R on Windows
Step 1: Downloading the installation file
- Download the R installer from the official website.
- Next, we need an IDE; the most popular one is RStudio. We can download it from this link.
After downloading the installation file, install it in the desired location.
After installation is complete, open R, and we will get a window just like the one below.
Now we can write our R code in the console, or we can do it in RStudio.
I prefer to use Jupyter Notebook for running R because it is friendlier for me. A good tutorial is available in Anaconda’s documentation.
My First R program
My first R program assigns a variable.
Assigning Variables and Operators in R
A variable is a container that stores values. An assignment statement sets or resets the value stored in the storage location(s) denoted by the variable name (according to Wikipedia). The assignment operator <- is the command that does this: product <- 'apple' tells the computer to assign the text “apple” to the variable “product.” We can also assign it using assign('product', 'apple'). In R, we can assign variables in many ways, as shown below.
Way 1
('apple' -> product)
Way 2
(product = 'apple')
Way 3
assign('product', 'apple')
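To see that these forms are interchangeable, we can run them side by side. This is only a minimal sketch; the names product_a to product_d are made up for illustration.

```r
# Four ways to bind the text 'apple' to a name.
product_a <- 'apple'            # leftward arrow, the most common style
'apple' -> product_b            # rightward arrow
product_c = 'apple'             # equals sign
assign('product_d', 'apple')    # assign() takes the name as a string

# Each comparison returns TRUE because all names hold the same value.
identical(product_a, product_b)
identical(product_c, product_d)
identical(product_a, product_d)
```

Wrapping an assignment in parentheses, as in the examples above, simply makes R print the assigned value as well.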
Logical Operators in R
Logical operators are those that return a TRUE or FALSE value. The comparison operators >, <, ==, and != used in the examples below all return such a value.
Example 1
apple <- 2
banana <- 3
most_expensive <- banana > apple
most_expensive
Output of above code is,
TRUE
Example 2
apple <- 2
banana <- 3
most_expensive <- banana < apple
most_expensive
Output of above code is,
FALSE
Example 3
apple <- 2
banana <- 2
most_expensive <- banana == apple
most_expensive
Output is,
TRUE
Example 4
apple <- 2
banana <- 2
most_expensive <- banana != apple
most_expensive
Output is,
FALSE
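The TRUE and FALSE results from comparisons like these can be combined with R's logical operators &, |, and !. Here is a small sketch that reuses the apple and banana prices; the names is_cheaper and same_price are only illustrative.

```r
apple <- 2
banana <- 3

is_cheaper <- apple < banana    # TRUE
same_price <- apple == banana   # FALSE

is_cheaper & same_price   # AND: TRUE only if both sides are TRUE
is_cheaper | same_price   # OR: TRUE if at least one side is TRUE
!same_price               # NOT: flips FALSE to TRUE
```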
Some Commonly Used Data Types in R
Data is the center for analysis; if there is no data, there is no analysis. Every piece of data works with certain characteristics. These characteristics can be summarized with data types.
- Character: Anything inside quotation marks is a character.
- Number: Numbers in R are double by default. A double can hold both whole and fractional numbers. Another numeric type is integer.
- Integer: An integer stores whole numbers only. To create one, we must append the capital letter L to the number, as in 2L. For most uses, double is sufficient and we rarely need integer.
- Logical (Boolean): Represents TRUE or FALSE, which can also be abbreviated as T or F.
- Complex Number: Example: (2 + 6i).
- Raw: Holds raw bytes. This is not a very popular data type, and it is not easy to create a raw variable directly. If we really need one, we call a function such as charToRaw() or as.raw(), which returns raw type data.
All the fundamental data types are called atomic data types.
Example of numbers
An integer:
a <- 2L
class(a)
Output is,
'integer'
A numeric:
a <- 2
class(a)
Output is,
'numeric'
quantity <- 2
typeof(quantity)
Output is,
'double'
quantity_integer <- 2L
typeof(quantity_integer)
Output is,
'integer'
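The same check works for the other atomic types. The sketch below uses made-up variable names, and charToRaw() is just one convenient way to obtain raw data.

```r
name <- "apple"            # character
in_stock <- TRUE           # logical
z <- 2 + 6i                # complex
bytes <- charToRaw("A")    # raw: the single byte behind the letter A

typeof(name)       # "character"
typeof(in_stock)   # "logical"
typeof(z)          # "complex"
typeof(bytes)      # "raw"
```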
Comments
Comments are used to provide important information about the code. They are not executed by the program but are written by the programmer to enhance the explanation of the code.
# This is a comment in R
Exploring vectors and factors
A data structure, as the name suggests, represents a way to organize data to facilitate different operations and perform calculations more quickly.
- Vectors: A collection of data of the same type.
- Factors: Used to store categorical data.
- Array: A matrix is a two-dimensional generalization of a vector, and an array extends this to any number of dimensions.
- List\DataFrame: A list can hold elements of different types, and a data frame is a list of equal-length columns. Lists are more complex data structures because they allow us to store other lists as well. We can think of a data frame as a spreadsheet where data is organized into columns and rows, with each column having a specific data type. Within a data frame, we can have various data types, but within one column, we have only one data type.
Another criterion for categorizing our data is by dimension:
- Vectors and lists are one-dimensional objects.
- Matrices and data frames are two-dimensional data structures.
- Arrays are objects that can have more than two dimensions.
Vectors have two properties: they are one-dimensional and contain elements of the same type.
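To make these structures concrete, here is a minimal sketch that creates one of each. All names and values are invented for illustration.

```r
v   <- c(10, 20, 30)                       # vector: one type, one dimension
f   <- factor(c("low", "high", "low"))     # factor: categorical data
m   <- matrix(1:6, nrow = 2)               # matrix: 2 rows by 3 columns
arr <- array(1:24, dim = c(2, 3, 4))       # array: three dimensions
l   <- list(1, "two", TRUE)                # list: elements of mixed types
df  <- data.frame(item  = c("apple", "banana"),
                  price = c(2, 3))         # data frame: one type per column

levels(f)   # "high" "low" (levels are sorted alphabetically)
dim(m)      # 2 3
dim(arr)    # 2 3 4
dim(v)      # NULL, because a plain vector has no dim attribute
str(df)     # shows the type of each column
```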
Assigning a vector
Let's assign a vector,
assign('b',c(1,2,3,4))
print(b)
Output is,
1 2 3 4
Vector attributes:
- Length: length(a) gives the number of elements.
- Names: names(a) gets or sets the names of the elements.
- Type: typeof(a) gives the type of the data.
There are six vector types:
- Double
- Logical
- Character
- Complex
- Raw
- Integer
vector <- c("Durga", "Puja", "Ram", "Hari")
vector
length(vector)            # length
names(vector) <- "Sita"   # names: only the first element gets a name
typeof(vector)            # type
vector
Output is,
'Durga' 'Puja' 'Ram' 'Hari'
4
'character'
Sita 'Durga' 2 'Puja' 3 'Ram' 4 'Hari'
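In the example above, a single name labels only the first element. To name every element, we can pass one name per element, after which elements can be looked up by name. A short sketch with hypothetical fruit data:

```r
fruits <- c("apple", "banana", "mango")
names(fruits) <- c("a", "b", "m")   # one name per element

names(fruits)    # "a" "b" "m"
fruits["b"]      # look up an element by its name
length(fruits)   # 3
```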
Manipulating vectors
Manipulating vectors consists of sorting, ordering, and indexing.
- Sorting: sort() arranges the data in ascending order by default.
- Ordering: order() returns the indices needed to sort the vector.
- Indexing: Selecting specific items by position.
quantity <- c(1,3,2,5,6,7)
sort(quantity)
order(quantity)
Output is,
1 2 3 5 6 7
1 3 2 4 5 6
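The link between the two functions is easier to see when we use the result of order() as an index. A minimal sketch, where idx is just an illustrative name:

```r
quantity <- c(1, 3, 2, 5, 6, 7)

idx <- order(quantity)              # positions that would sort the vector
quantity[idx]                       # same result as sort(quantity)
sort(quantity, decreasing = TRUE)   # 7 6 5 3 2 1
```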
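a <- c(1, 7, 36, 0, 7, 5)
a[2]                 # second element
a[3:5]               # elements 3 through 5
a[c(2, 4)]           # elements 2 and 4
a[c(4, 7)]           # element 4, then NA because position 7 does not exist
a[-2]                # everything except element 2
a[-(2:4)]            # everything except elements 2 through 4
a[a == 1]            # elements equal to 1
a[a > 3]             # elements greater than 3
a[a %in% c(2, 4)]    # elements whose value is 2 or 4 (none here, so the result is empty)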
Output is,
7
36 0 7
7 0
0 <NA>
1 36 0 7 5
1 7 5
1
7 36 7 5
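Conditions such as a > 3 work because they first produce a logical vector, and R keeps the elements where that vector is TRUE. The sketch below shows the intermediate step, using the same vector a:

```r
a <- c(1, 7, 36, 0, 7, 5)

a > 3                # FALSE TRUE TRUE FALSE TRUE TRUE
which(a > 3)         # 2 3 5 6, the positions where the condition holds
a %in% c(7, 0)       # FALSE TRUE FALSE TRUE TRUE FALSE
a[a %in% c(7, 0)]    # 7 0 7
```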
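Operating on vectors
When we add or multiply vectors of different lengths, R repeats the shorter vector until it matches the longer one. This is called the recycling rule. The length of the longer vector should be a multiple of the length of the shorter one; otherwise R still recycles but issues a warning.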
c <- 1:6
d <- 1:3
c * d
Output is,
1 4 9 4 10 18
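Recycling applies to addition and to single numbers in the same way. A small sketch with illustrative names; suppressWarnings() is used only so the last line runs quietly, since R would otherwise warn that the lengths do not divide evenly.

```r
long  <- 1:6
short <- 1:3

long + short   # 2 4 6 5 7 9  (short is reused as 1 2 3 1 2 3)
long * 10      # 10 20 30 40 50 60  (a single number is recycled too)

# Lengths 5 and 2 do not divide evenly: R still recycles, with a warning.
uneven <- suppressWarnings(1:5 + 1:2)
uneven         # 2 4 4 6 6
```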
Sequence generation
It is used to create a sequence of elements in a vector. The seq() function takes the start and end values, plus either the difference between values (by) or the desired length (length.out) as optional arguments. In the code below, I took elements in the range from 1 to 5 with an interval of 1.5.
Example:
seq(1,5,by = 1.5)
Output is,
1 2.5 4
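Instead of a step size, we can also ask for a fixed number of values. A brief sketch comparing the two styles:

```r
seq(1, 5, by = 1.5)          # 1.0 2.5 4.0  (fixed step)
seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00  (fixed count)
seq_len(4)                   # 1 2 3 4  (shortcut for counting from 1)
```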
Replicating elements
It is used to repeat the elements of a vector a specified number of times. In the following code, I replicated the numbers from 1 to 6 two times using the built-in function rep().
Example:
e<- rep(1:6,times = 2)
e
Output is,
1 2 3 4 5 6 1 2 3 4 5 6
We can also repeat the same number as many times as we want.
x <- rep(1, each = 10)
x
Output is,
1 1 1 1 1 1 1 1 1 1
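The times and each arguments behave differently once the vector has more than one element. A short sketch of the difference:

```r
rep(1:3, times = 2)                  # 1 2 3 1 2 3  (whole vector repeated)
rep(1:3, each = 2)                   # 1 1 2 2 3 3  (each element repeated)
rep(c("a", "b"), times = c(2, 1))    # "a" "a" "b"  (a separate count per element)
```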
Scan Function
The scan() function reads a file into a vector. By default it splits the input on whitespace, so each line of a file that contains no spaces becomes one element. In the code given below, the scan() function reads a CSV file of COVID-19 data.
f <- scan("covid data.csv", what = "character")
f
Output of the above code is,
'date,totalCases,newCases,totalRecoveries,newRecoveries,totalDeaths,newDeaths' '1/23/2020,1,1,0,0,0,0' '1/24/2020,0,0,0,0,0,0' '1/25/2020,0,0,0,0,0,0' '1/26/2020,0,0,0,0,0,0' '1/27/2020,0,0,0,0,0,0' '1/28/2020,0,0,0,0,0,0' '1/29/2020,0,0,0,0,0,0' '1/30/2020,0,0,0,0,0,0' '1/31/2020,0,0,1,1,0,0' '2/1/2020,0,0,1,0,0,0' '2/2/2020,0,0,1,0,0,0' '2/3/2020,0,0,1,0,0,0' '2/4/2020,0,0,1,0,0,0' '2/5/2020,0,0,1,0,0,0' '2/6/2020,0,0,1,0,0,0' '2/7/2020,0,0,1,0,0,0' '2/8/2020,0,0,1,0,0,0' '2/9/2020,0,0,1,0,0,0' '2/10/2020,0,0,1,0,0,0' '2/11/2020,0,0,1,0,0,0' '2/12/2020,0,0,1,0,0,0' '2/13/2020,0,0,1,0,0,0' '2/14/2020,0,0,1,0,0,0' '2/15/2020,0,0,1,0,0,0' '2/16/2020,0,0,1,0,0,0' '2/17/2020,0,0,1,0,0,0' '2/18/2020,0,0,1,0,0,0' '2/19/2020,0,0,1,0,0,0' '2/20/2020,0,0,2,1,0,0' '2/21/2020,0,0,2,0,0,0' '2/22/2020,0,0,2,0,0,0' '2/23/2020,0,0,2,0,0,0' '2/24/2020,0,0,2,0,0,0' '2/25/2020,0,0,2,0,0,0' '2/26/2020,0,0,2,0,0,0' '2/27/2020,0,0,2,0,0,0' '2/28/2020,0,0,2,0,0,0' '2/29/2020,0,0,2,0,0,0' '3/1/2020,0,0,2,0,0,0' '3/2/2020,0,0,2,0,0,0' '3/3/2020,0,0,2,0,0,0' '3/4/2020,0,0,2,0,0,0' '3/5/2020,0,0,2,0,0,0' '3/6/2020,0,0,2,0,0,0' '3/7/2020,0,0,2,0,0,0' '3/8/2020,0,0,2,0,0,0' '3/9/2020,0,0,2,0,0,0' '3/10/2020,0,0,2,0,0,0' '3/11/2020,0,0,2,0,0,0' '3/12/2020,0,0,2,0,0,0' '3/13/2020,0,0,2,0,0,0' '3/14/2020,0,0,2,0,0,0' '3/15/2020,0,0,2,0,0,0' '3/16/2020,0,0,2,0,0,0' '3/17/2020,0,0,2,0,0,0' '3/18/2020,0,0,2,0,0,0' '3/19/2020,0,0,2,0,0,0' '3/20/2020,0,0,2,0,0,0' '3/21/2020,0,0,2,0,0,0' '3/22/2020,0,0,2,0,0,0' '3/23/2020,1,1,2,0,0,0' '3/24/2020,1,0,2,0,0,0' '3/25/2020,2,1,2,0,0,0' '3/26/2020,2,0,2,0,0,0' '3/27/2020,3,1,2,0,0,0' '3/28/2020,4,1,2,0,0,0' '3/29/2020,4,0,2,0,0,0' '3/30/2020,4,0,2,0,0,0' '3/31/2020,4,0,2,0,0,0' '4/1/2020,4,0,2,0,0,0' '4/2/2020,5,1,2,0,0,0' '4/3/2020,5,0,2,0,0,0' '4/4/2020,8,3,2,0,0,0' '4/5/2020,8,0,2,0,0,0' '4/6/2020,8,0,2,0,0,0' '4/7/2020,8,0,2,0,0,0' '4/8/2020,8,0,2,0,0,0' '4/9/2020,8,0,2,0,0,0' '4/10/2020,8,0,2,0,0,0' '4/11/2020,8,0,2,0,0,0' '4/12/2020,11,3,2,0,0,0' 
'4/13/2020,13,2,2,0,0,0' '4/14/2020,15,2,2,0,0,0' '4/15/2020,15,0,2,0,0,0' '4/16/2020,15,0,2,0,0,0' '4/17/2020,29,14,2,0,0,0' '4/18/2020,30,1,4,2,0,0' '4/19/2020,30,0,5,1,0,0' '4/20/2020,30,0,5,0,0,0' '4/21/2020,41,11,6,1,0,0' '4/22/2020,44,3,8,2,0,0' '4/23/2020,47,3,9,1,0,0' '4/24/2020,48,1,11,2,0,0' '4/25/2020,48,0,12,1,0,0' '4/26/2020,51,3,14,2,0,0' '4/27/2020,51,0,14,0,0,0' '4/28/2020,53,2,14,0,0,0' '4/29/2020,56,3,14,0,0,0' '4/30/2020,56,0,14,0,0,0' '5/1/2020,58,2,14,0,0,0' '5/2/2020,58,0,14,0,0,0' '5/3/2020,74,16,14,0,0,0' '5/4/2020,74,0,14,0,0,0' '5/5/2020,81,7,14,0,0,0' '5/6/2020,98,17,20,6,0,0' '5/7/2020,100,2,20,0,0,0' '5/8/2020,101,1,28,8,0,0' '5/9/2020,108,7,29,1,0,0' '5/10/2020,109,1,29,0,0,0' '5/11/2020,133,24,31,2,0,0' '5/12/2020,216,83,31,0,0,0' '5/13/2020,242,26,33,2,0,0' '5/14/2020,248,6,33,0,1,1' '5/15/2020,266,18,34,1,1,0' '5/16/2020,280,14,34,0,1,0' '5/17/2020,294,14,34,0,3,2' '5/18/2020,374,80,34,0,3,0' '5/19/2020,401,27,35,1,3,0' '5/20/2020,426,25,43,8,3,0' '5/21/2020,456,30,47,4,4,1' '5/22/2020,515,59,68,21,4,0' '5/23/2020,583,68,68,0,5,1' '5/24/2020,602,19,85,17,5,0' '5/25/2020,681,79,110,25,5,0' '5/26/2020,771,90,152,42,5,0' '5/27/2020,885,114,180,28,6,1' '5/28/2020,1041,156,184,4,6,0' '5/29/2020,1211,170,184,0,6,0' '5/30/2020,1400,189,188,4,7,1' '5/31/2020,1571,171,189,1,8,1' '6/1/2020,1810,239,190,1,8,0' '6/2/2020,2098,288,235,45,9,1' '6/3/2020,2299,201,238,3,11,2' '6/4/2020,2633,334,256,18,12,1' '6/5/2020,2911,278,289,33,12,0' '6/6/2020,3234,323,295,6,13,1' '6/7/2020,3447,213,340,45,13,0' '6/8/2020,3760,313,363,23,14,1' '6/9/2020,4083,323,394,31,15,1' '6/10/2020,4362,279,394,0,17,2' '6/11/2020,4612,250,394,0,17,0' '6/12/2020,5059,447,394,0,18,1' '6/13/2020,5334,275,394,0,19,1' '6/14/2020,5759,425,394,0,19,0' '6/15/2020,6210,451,1044,650,20,1' '6/16/2020,6590,380,1161,117,20,0' '6/17/2020,7176,586,1170,9,22,2' '6/18/2020,7847,671,1189,19,22,0' '6/19/2020,8273,426,1405,216,23,1' '6/20/2020,8604,331,1581,176,23,0' 
'6/21/2020,9025,421,1775,194,23,0' '6/22/2020,9558,533,2151,376,24,1' '6/23/2020,10098,540,2225,74,24,0' '6/24/2020,10727,629,2339,114,25,1' '6/25/2020,11161,434,2651,312,27,2' '6/26/2020,11754,593,2699,48,27,0' '6/27/2020,12308,554,2835,136,29,2' '6/28/2020,12771,463,3014,179,30,1' '6/29/2020,13247,476,3135,121,30,0' '6/30/2020,13563,316,3195,60,30,0' '7/1/2020,14045,482,4657,1462,33,3' '7/2/2020,14518,473,5321,664,33,0' '7/3/2020,15258,740,6144,823,33,0' '7/4/2020,15490,232,6416,272,34,1' '7/5/2020,15783,293,6548,132,35,1' '7/6/2020,15963,180,6812,264,35,0' '7/7/2020,16167,204,7500,688,36,1' '7/8/2020,16422,255,7753,253,36,0' '7/9/2020,16530,108,7892,139,38,2' '7/10/2020,16648,118,8012,120,39,1' '7/11/2020,16718,70,8443,431,39,0' '7/12/2020,16800,82,8590,147,39,0' '7/13/2020,16944,144,10295,1705,39,0' '7/14/2020,17060,116,10329,34,39,0' '7/15/2020,17176,116,11026,697,40,1' '7/16/2020,17343,167,11250,224,41,1' '7/17/2020,17444,101,11388,138,41,0' '7/18/2020,17501,57,11491,103,41,0' '7/19/2020,17657,156,11549,58,41,0' '7/20/2020,17843,186,11722,173,41,0' '7/21/2020,17993,150,12331,609,42,1' '7/22/2020,18093,100,12538,207,44,2' '7/23/2020,18240,147,12694,156,44,0' '7/24/2020,18373,133,12801,107,47,3' '7/25/2020,18482,109,12907,106,47,0' '7/26/2020,18612,130,12982,75,49,2' '7/27/2020,18751,139,13608,626,50,1' '7/28/2020,19062,311,13729,121,50,0' '7/29/2020,19272,210,13875,146,53,3' '7/30/2020,19546,274,14102,227,56,3' '7/31/2020,19770,224,14253,151,57,1' '8/1/2020,20085,315,14346,93,59,2' '8/2/2020,20331,246,14457,111,59,0' '8/3/2020,20749,418,14815,358,61,2' '8/4/2020,21008,259,14880,65,62,1' '8/5/2020,21389,381,15010,130,67,5' '8/6/2020,21749,360,15243,233,71,4' '8/7/2020,22213,464,15668,425,74,3' '8/8/2020,22591,378,16167,499,76,2' '8/9/2020,22971,380,16207,40,80,4' '8/10/2020,23309,338,16347,140,83,3' '8/11/2020,23947,638,16518,171,86,3' '8/12/2020,24431,484,16582,64,95,9' '8/13/2020,24956,525,16691,109,96,1' '8/14/2020,25550,594,16931,240,101,5' 
'8/15/2020,26018,468,17055,124,102,1' '8/16/2020,26659,641,17189,134,104,2' '8/17/2020,27240,581,17349,160,107,3' '8/18/2020,28256,1016,17434,85,114,7' '8/19/2020,28937,681,17554,120,120,6' '8/20/2020,29644,707,17818,264,126,6' '8/21/2020,30482,838,18068,250,137,11' '8/22/2020,31116,634,18204,136,146,9' '8/23/2020,31934,818,18485,281,149,3' '8/24/2020,32677,743,18660,175,157,8' '8/25/2020,33532,855,18973,313,164,7' '8/26/2020,34417,885,19358,385,175,11' '8/27/2020,35528,1111,19927,569,183,8' '8/28/2020,36455,927,20096,169,195,12' '8/29/2020,37339,884,20409,313,207,12' '8/30/2020,38560,1221,20676,267,221,14' '8/31/2020,39459,899,21264,588,228,7' '9/1/2020,40528,1069,22032,768,239,11' '9/2/2020,41648,1120,23144,1112,250,11' '9/3/2020,42876,1228,24061,917,257,7' '9/4/2020,44235,1359,25415,1354,271,14' '9/5/2020,45276,1041,26981,1566,280,9' '9/6/2020,46256,980,28795,1814,289,9' '9/7/2020,47235,979,30531,1736,300,11' '9/8/2020,48137,902,32818,2287,306,6' '9/9/2020,49218,1081,33736,918,312,6' '9/10/2020,50464,1246,35554,1818,317,5' '9/11/2020,51918,1454,36526,972,322,5' '9/12/2020,53119,1201,37378,852,336,14' '9/13/2020,54158,1039,38551,1173,345,9' '9/14/2020,55328,1170,39430,879,360,15' '9/15/2020,56787,1459,40492,1062,371,11' '9/16/2020,58326,1539,41560,1068,379,8' '9/17/2020,59572,1246,42803,1243,383,4' '9/18/2020,61592,2020,43674,871,390,7' '9/19/2020,62796,1204,45121,1447,401,11' '9/20/2020,64121,1325,46087,966,411,10' '9/21/2020,65275,1154,47092,1005,427,16' '9/22/2020,66631,1356,47915,823,429,2' '9/23/2020,67803,1172,49808,1893,436,7' '9/24/2020,69300,1497,50265,457,452,16' '9/25/2020,70613,1313,51720,1455,458,6' '9/26/2020,71820,1207,52867,1147,466,8' '9/27/2020,73393,1573,53752,885,476,10' '9/28/2020,74744,1351,54494,742,481,5' '9/29/2020,76257,1513,55225,731,491,10' '9/30/2020,77816,1559,56282,1057,498,7'
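Because scan() returns each line as one long string here, the columns are not yet separated. If we want a table instead, read.csv() is one option. The sketch below is self-contained: it writes a tiny three-column file to a temporary location (a made-up miniature of the data, not the real file) and reads it both ways.

```r
# Write a tiny stand-in CSV to a temporary file (hypothetical sample data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("date,totalCases,newCases",
             "1/23/2020,1,1",
             "1/24/2020,0,0"), csv_path)

# scan(): a character vector with one string per line
raw_lines <- scan(csv_path, what = character(), quiet = TRUE)
raw_lines

# read.csv(): a data frame with one column per field
covid <- read.csv(csv_path)
covid$totalCases   # 1 0
```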
Implicit type coercion
When a vector mixes different types of data, R automatically converts all elements to a common type. This is called implicit coercion. When numbers are mixed with text, R converts everything to character.
x <- c(1,'two',4,"durga")
x
typeof(x)
Output is,
'1' 'two' '4' 'durga'
'character'
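Explicit type coercion
- We do this by calling as.<desired data type>(), for example as.character() or as.numeric(). Explicit type coercion helps us deal with incorrectly categorized data.
- We can convert numeric into character.
- We can convert character into numeric only when the text represents a number; anything else becomes NA with a warning.
num <- 1:5
num_char <- as.character(num)
num_char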
Output is,
'1' '2' '3' '4' '5'
product <- c("apple", 1, "banana")
as.numeric(product)
Output is,
Warning message in eval(expr, envir, enclos):
"NAs introduced by coercion"
<NA> 1 <NA>
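A few more conversions from the same as.* family show what to expect. This is only a sketch with arbitrary values:

```r
as.numeric(c("1", "2.5"))       # 1.0 2.5  (text that looks like numbers)
as.integer(3.9)                 # 3  (the fraction is dropped, not rounded)
as.logical(c("TRUE", "yes"))    # TRUE NA  ("yes" is not recognized)
as.character(TRUE)              # "TRUE"
```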
Installing Packages in R
There are numerous useful packages for various tasks in R, and with those packages we can do things better and faster. One simple way to install packages is via the console:
install.packages("haven")
library("haven") # allows us to read sav files
saq8 <- read_sav("F:/Statisticts with R/CSV file for covid data/SAQ8.sav")
In the above example, I first installed the package named haven and then used it to read a sav file.
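Running install.packages() every time downloads the package again, so it can help to install only when the package is missing. The helper below, ensure_package, is a hypothetical name of my own and not part of R. The demo call uses stats, which ships with R, so nothing is downloaded; for the example above we would pass "haven" instead.

```r
# Hypothetical helper: install a package only if it is not already available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can now be loaded.
  requireNamespace(pkg, quietly = TRUE)
}

ensure_package("stats")   # TRUE, because stats is always installed with R
```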
This is all for this blog, and I hope you enjoyed it. Your feedback is highly appreciated, and I would love to hear your thoughts. Please leave your comments and suggestions below. Stay tuned for my next blog, where we’ll explore more exciting topics and insights.