Getting Started With R

Introduction

R is a programming language and software environment for statistical computing and graphics, supported by the R Foundation. Unlike general-purpose programming languages such as Java and C, R was created by statisticians as an interactive environment. Interactivity is a critical characteristic because it lets us explore data quickly. R serves as both a programming language and an environment for statistical and graphical analysis, covering various statistical techniques, including linear and non-linear modeling, statistical tests, classification, and more. Data analysis often requires different types of plots, and R can produce them as well.

To write and run R code, we typically use an Integrated Development Environment (IDE), which is a software application that provides comprehensive facilities for software development, according to Wikipedia. The core component required for every R program is Base R, the set of built-in functions and packages that ships with every R installation.

History of R

Bell Labs developed the S language in 1976. In 1993, Ross Ihaka and Robert Gentleman created R in New Zealand. R became open source in 1995. R version 1.0.0 was released to the public in 2000. The RStudio IDE was released in 2011.

Drawbacks

  • R is modeled on the S language and is designed for statistical work. If we want to build general-purpose apps, R is probably not one of our first choices.
  • The objects that we work with must be stored in memory, so working with large datasets can quickly become a challenge.

Installing and Setting Up R on Windows

Step 1: Downloading the installation files

  • Download the R installer from the official website.
  • Next, we need an IDE; the most popular one is RStudio. We can download it from the RStudio website.

After downloading the installation files, install them in the desired location and then open the console.

After the installation is complete, open R; the R console window appears.

Now we can write our R code in the console, or we can do it in RStudio.

I prefer to use Jupyter Notebook for running R because I find it friendlier. A good tutorial is available in Anaconda’s documentation.

My First R program

My first R program assigns a value to a variable.

Assigning Variable and operator in R

A variable is a container that stores values. An assignment statement sets or resets the value stored in the storage location(s) denoted by the variable name (according to Wikipedia). In R, the usual assignment operator is <-; for example, product <- 'apple' tells the computer to assign the text “apple” to the variable “product.” We can also assign it using assign('product', 'apple'). R offers several other ways to assign variables, as shown below.

Way 1

('apple'-> product)

Way 2

(product = 'apple')

Way 3

assign('product', 'apple')
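
As a quick check, the following sketch puts the different assignment forms side by side and confirms that they all store the same value. The variable names product_a to product_d are made up for this illustration.

```r
# Four equivalent ways to bind the text "apple" to a name.
product_a <- "apple"            # leftward arrow, the most common style
"apple" -> product_b            # rightward arrow
product_c = "apple"             # equals sign
assign("product_d", "apple")    # assign() takes the variable name as a string

# All four variables hold the same value.
identical(product_a, product_b)
identical(product_c, product_d)

# get() looks a variable up by its name, the counterpart of assign().
get("product_d")
```

Most R code uses the <- form, so that is the one used in the rest of the examples.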

Logical Operators in R

Comparison operators such as >, <, == and != return a logical value, TRUE or FALSE. For example:

Example 1

apple <- 2
banana <- 3
most_expensive <- banana > apple
most_expensive

Output of the above code is,

TRUE

Example 2

apple <- 2
banana <- 3
most_expensive <- banana < apple
most_expensive

Output of the above code is,

FALSE

Example 3

apple <- 2
banana <- 2
most_expensive <- banana == apple
most_expensive

Output is,

TRUE

Example 4

apple <- 2
banana <- 2
most_expensive <- banana != apple
most_expensive

Output is,

FALSE
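
Comparisons like the ones above can be combined with the logical operators & (and), | (or) and ! (not). The sketch below is only an illustration; the variable cherry and the result names are made up for it.

```r
apple <- 2
banana <- 3
cherry <- 5

# & is TRUE only when both comparisons are TRUE.
apple_is_cheapest <- (apple < banana) & (apple < cherry)

# | is TRUE when at least one comparison is TRUE.
banana_beats_one <- (banana > apple) | (banana > cherry)

# ! flips a logical value.
prices_differ <- !(banana == apple)

apple_is_cheapest
banana_beats_one
prices_differ
```

All three results are TRUE for these prices.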

Some Commonly Used Data Types in R

Data is central to analysis: without data there is no analysis. Every piece of data we work with has certain characteristics, and these characteristics are summarized by its data type.

  • Character: Anything inside quotation marks is a character value.
  • Number: By default, a number in R is a double, which can hold both whole numbers and fractions. The other numeric type is integer.
  • Integer: A whole number with no fractional part. To create one we must append a capital letter L to the number, as in 2L. In everyday use we mostly work with doubles rather than integers.
  • Logical (Boolean): TRUE or FALSE, which can be abbreviated as T or F.
  • Complex Number: for example, \(2 + 6i\)
  • Raw: Holds raw bytes. It is not a commonly used data type, and there is no literal syntax for it; we get raw data by calling a function such as raw() or charToRaw().

All of these fundamental data types are called atomic data types.

Example of numbers

An integer:

a <- 2L
class(a)

Output is,

'integer'

A numeric:

a <- 2
class(a)

Output is,

'numeric'

quantity <- 2
typeof(quantity)

Output is,

'double'

quantity_integer <- 2L
typeof(quantity_integer)

Output is,

'integer'
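
The other atomic types can be checked in the same way. This is a small sketch with made-up variable names; charToRaw() is used here as one convenient way to obtain raw data.

```r
fruit_name <- "apple"          # character
in_stock <- TRUE               # logical
z <- 2 + 6i                    # complex
letter_bytes <- charToRaw("A") # raw: the byte behind the letter A

class(fruit_name)
class(in_stock)
class(z)
class(letter_bytes)

# is.atomic() is TRUE for vectors of any atomic type.
is.atomic(fruit_name)
```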

Comments

Comments are used to provide important information about the code. They are not executed by the program but are written by the programmer to enhance the explanation of the code.

# This is a comment in R

Exploring vectors and factors

A data structure, as the name suggests, represents a way to organize data to facilitate different operations and perform faster calculations.

  • Vectors: Collections of elements of the same type.
  • Factors: Used to store categorical data.
  • Matrices and arrays: Generalizations of vectors to two or more dimensions; a matrix is a two-dimensional array.
  • Lists and data frames: A list can hold elements of different types, including other lists, which makes lists more complex data structures. A data frame is a list of equal-length columns. We can think of a data frame as a spreadsheet where data is organized into columns and rows, with each column having a specific data type. Within a data frame, we can have various data types, but within one column, we have only one data type.

Another criterion for categorizing our data structures is by dimension:

  • Vectors and lists are one-dimensional objects.
  • Matrices and data frames are two-dimensional data structures.
  • Arrays are objects that can have more than two dimensions.

A vector has two properties: it is one-dimensional, and it contains elements of the same type.
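
To make these structures concrete, here is a minimal sketch that creates one object of each kind and inspects its size. All object names and values are made up for the illustration.

```r
v <- c(10, 20, 30)                      # vector: one dimension, one type
f <- factor(c("low", "high", "low"))    # factor: categorical data with two levels
m <- matrix(1:6, nrow = 2)              # matrix: 2 rows by 3 columns
arr <- array(1:24, dim = c(2, 3, 4))    # array: three dimensions
l <- list(1, "a", TRUE, list(2, 3))     # list: mixed types, including another list
df <- data.frame(fruit = c("apple", "banana"), price = c(2, 3))

levels(f)   # "high" "low"
dim(m)      # 2 3
dim(arr)    # 2 3 4
length(l)   # 4
dim(df)     # 2 2
sapply(df, class)   # each column of the data frame has its own type
```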

Assigning a vector

Let's assign a vector,

assign('b',c(1,2,3,4))
print(b)

Output is,

1 2 3 4

Vector attributes:

  • Length: length(a) gives the number of elements.
  • Names: names(a) gets or sets the names of the elements.
  • Type: typeof(a) gives the type of the data.

There are six vector types:

  • Double
  • Logical
  • Character
  • Complex
  • Raw
  • Integer

vector <- c("Durga","Puja","Ram","Hari")
vector
length(vector) # number of elements
names(vector) <- "Sita" # names: only the first element gets a name
typeof(vector) # type
vector

Output is,

'Durga' 'Puja' 'Ram' 'Hari'
4
'character'
Sita 'Durga' 2 'Puja' 3 'Ram' 4 'Hari'
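
Names are most useful when every element has one. The following sketch supplies a name for each element and then looks an element up by name; the labels "first" to "fourth" are hypothetical.

```r
people <- c("Durga", "Puja", "Ram", "Hari")
names(people) <- c("first", "second", "third", "fourth")  # hypothetical labels

names(people)      # all four names
people["third"]    # look up an element by its name
unname(people)     # the same values with the names removed
```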

Manipulating vectors

Manipulating vectors consists of sorting, ordering, and indexing.

  • Sorting: Puts the data in order (ascending by default).
  • Ordering: The order() function returns the indices needed to sort the vector.
  • Indexing: Selecting specific items by position.

quantity <- c(1,3,2,5,6,7)
sort(quantity)
order(quantity)

Output is,

1 2 3 5 6 7
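1 3 2 4 5 6

a <- c(1,7,36,0,7,5)
a[2]       # the element at position 2
a[3:5]     # positions 3 to 5
a[c(2,4)]  # positions 2 and 4
a[c(4,7)]  # position 7 does not exist, so R returns NA for it
a[-2]      # everything except position 2
a[-(2:4)]  # everything except positions 2 to 4
a[a==1]    # elements equal to 1
a[a>3]     # elements greater than 3
a[a %in% c(2,4)] # elements that match 2 or 4; none do, so the result is empty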

Output is,

7
36 0 7
7 0
0 <NA>
1 36 0 7 5
1 7 5
1
7 36 7 5
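
Sorting, ordering and indexing also work together. As a small sketch (the result names are made up), indexing a vector with the output of order() gives the same result as sort(), and which() returns the positions of the matching elements instead of their values.

```r
quantity <- c(1, 3, 2, 5, 6, 7)

idx <- order(quantity)                           # positions that put the values in ascending order
sorted_by_index <- quantity[idx]                 # same result as sort(quantity)
descending <- sort(quantity, decreasing = TRUE)  # largest value first
big_positions <- which(quantity > 3)             # positions, not values, of elements above 3

sorted_by_index
descending
big_positions
```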

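Operating on vectors

When we add or multiply vectors of different lengths, R repeats the elements of the shorter vector until it matches the length of the longer one. This is called the recycling rule. Recycling works cleanly when the length of the longer vector is a multiple of the length of the shorter one; otherwise R still recycles but issues a warning.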

c <- 1:6
d <- 1:3
c * d

Output is,

1 4 9 4 10 18
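
To see what happens when the longer length is not a multiple of the shorter one, here is a small sketch. The names long_vec and short_vec are made up, and withCallingHandlers() is used only so that the warning is reported as a message and the script keeps running.

```r
long_vec <- 1:6
short_vec <- 1:4   # 6 is not a multiple of 4

result <- withCallingHandlers(
  long_vec + short_vec,
  warning = function(w) {
    # Report the warning text, then carry on with the computation.
    message("R warned: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

result   # 2 4 6 8 6 8: short_vec was reused as 1 2 3 4 1 2
```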

Sequence generation

seq() creates a sequence of elements in a vector. The seq() function takes the step between values (by) and the length of the result (length.out) as optional arguments. In the code below, I took elements in the range from 1 to 5 with a step of 1.5.

Example:

seq(1,5,by = 1.5)

Output is,

1 2.5 4
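
The by argument fixes the step; the length.out argument fixes the number of values instead. A short sketch comparing the two, with made-up result names:

```r
by_step <- seq(1, 5, by = 1.5)           # fixed step; stops before going past 5
by_length <- seq(1, 5, length.out = 5)   # five evenly spaced values from 1 to 5
first_four <- seq_len(4)                 # shorthand for the integers 1 to 4

by_step
by_length
first_four
```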

Replicating elements

rep() returns the elements of a vector replicated a specified number of times. In the following code, I replicated the numbers from 1 to 6 two times using the built-in function rep().

Example:

e<- rep(1:6,times = 2)
e

Output is,

1 2 3 4 5 6 1 2 3 4 5 6

We can also replicate a single number as many times as we want.

x <- rep(c(1),each = 10)
x

Output is,

1 1 1 1 1 1 1 1 1 1
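
The difference between times and each is easier to see on a vector with more than one element. A brief sketch, with made-up result names:

```r
whole_twice <- rep(1:3, times = 2)             # 1 2 3 1 2 3: the whole vector is repeated
each_twice <- rep(1:3, each = 2)               # 1 1 2 2 3 3: every element is repeated in place
varying <- rep(c("a", "b"), times = c(3, 1))   # "a" "a" "a" "b": one count per element

whole_twice
each_twice
varying
```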

Scan Function

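The scan() function is a powerful function that reads a file into a vector. In the code given below, scan() reads the file covid data.csv as character data. By default scan() splits its input on whitespace, and because the rows of this file contain no spaces, each row becomes one element of the vector.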

f <- scan("covid data.csv", what = "Character")
f

Output of the above code is,

'date,totalCases,newCases,totalRecoveries,newRecoveries,totalDeaths,newDeaths' '1/23/2020,1,1,0,0,0,0' '1/24/2020,0,0,0,0,0,0' '1/25/2020,0,0,0,0,0,0' '1/26/2020,0,0,0,0,0,0' '1/27/2020,0,0,0,0,0,0' '1/28/2020,0,0,0,0,0,0' '1/29/2020,0,0,0,0,0,0' '1/30/2020,0,0,0,0,0,0' '1/31/2020,0,0,1,1,0,0' '2/1/2020,0,0,1,0,0,0' '2/2/2020,0,0,1,0,0,0' '2/3/2020,0,0,1,0,0,0' '2/4/2020,0,0,1,0,0,0' '2/5/2020,0,0,1,0,0,0' '2/6/2020,0,0,1,0,0,0' '2/7/2020,0,0,1,0,0,0' '2/8/2020,0,0,1,0,0,0' '2/9/2020,0,0,1,0,0,0' '2/10/2020,0,0,1,0,0,0' '2/11/2020,0,0,1,0,0,0' '2/12/2020,0,0,1,0,0,0' '2/13/2020,0,0,1,0,0,0' '2/14/2020,0,0,1,0,0,0' '2/15/2020,0,0,1,0,0,0' '2/16/2020,0,0,1,0,0,0' '2/17/2020,0,0,1,0,0,0' '2/18/2020,0,0,1,0,0,0' '2/19/2020,0,0,1,0,0,0' '2/20/2020,0,0,2,1,0,0' '2/21/2020,0,0,2,0,0,0' '2/22/2020,0,0,2,0,0,0' '2/23/2020,0,0,2,0,0,0' '2/24/2020,0,0,2,0,0,0' '2/25/2020,0,0,2,0,0,0' '2/26/2020,0,0,2,0,0,0' '2/27/2020,0,0,2,0,0,0' '2/28/2020,0,0,2,0,0,0' '2/29/2020,0,0,2,0,0,0' '3/1/2020,0,0,2,0,0,0' '3/2/2020,0,0,2,0,0,0' '3/3/2020,0,0,2,0,0,0' '3/4/2020,0,0,2,0,0,0' '3/5/2020,0,0,2,0,0,0' '3/6/2020,0,0,2,0,0,0' '3/7/2020,0,0,2,0,0,0' '3/8/2020,0,0,2,0,0,0' '3/9/2020,0,0,2,0,0,0' '3/10/2020,0,0,2,0,0,0' '3/11/2020,0,0,2,0,0,0' '3/12/2020,0,0,2,0,0,0' '3/13/2020,0,0,2,0,0,0' '3/14/2020,0,0,2,0,0,0' '3/15/2020,0,0,2,0,0,0' '3/16/2020,0,0,2,0,0,0' '3/17/2020,0,0,2,0,0,0' '3/18/2020,0,0,2,0,0,0' '3/19/2020,0,0,2,0,0,0' '3/20/2020,0,0,2,0,0,0' '3/21/2020,0,0,2,0,0,0' '3/22/2020,0,0,2,0,0,0' '3/23/2020,1,1,2,0,0,0' '3/24/2020,1,0,2,0,0,0' '3/25/2020,2,1,2,0,0,0' '3/26/2020,2,0,2,0,0,0' '3/27/2020,3,1,2,0,0,0' '3/28/2020,4,1,2,0,0,0' '3/29/2020,4,0,2,0,0,0' '3/30/2020,4,0,2,0,0,0' '3/31/2020,4,0,2,0,0,0' '4/1/2020,4,0,2,0,0,0' '4/2/2020,5,1,2,0,0,0' '4/3/2020,5,0,2,0,0,0' '4/4/2020,8,3,2,0,0,0' '4/5/2020,8,0,2,0,0,0' '4/6/2020,8,0,2,0,0,0' '4/7/2020,8,0,2,0,0,0' '4/8/2020,8,0,2,0,0,0' '4/9/2020,8,0,2,0,0,0' '4/10/2020,8,0,2,0,0,0' '4/11/2020,8,0,2,0,0,0' '4/12/2020,11,3,2,0,0,0' 
'4/13/2020,13,2,2,0,0,0' '4/14/2020,15,2,2,0,0,0' '4/15/2020,15,0,2,0,0,0' '4/16/2020,15,0,2,0,0,0' '4/17/2020,29,14,2,0,0,0' '4/18/2020,30,1,4,2,0,0' '4/19/2020,30,0,5,1,0,0' '4/20/2020,30,0,5,0,0,0' '4/21/2020,41,11,6,1,0,0' '4/22/2020,44,3,8,2,0,0' '4/23/2020,47,3,9,1,0,0' '4/24/2020,48,1,11,2,0,0' '4/25/2020,48,0,12,1,0,0' '4/26/2020,51,3,14,2,0,0' '4/27/2020,51,0,14,0,0,0' '4/28/2020,53,2,14,0,0,0' '4/29/2020,56,3,14,0,0,0' '4/30/2020,56,0,14,0,0,0' '5/1/2020,58,2,14,0,0,0' '5/2/2020,58,0,14,0,0,0' '5/3/2020,74,16,14,0,0,0' '5/4/2020,74,0,14,0,0,0' '5/5/2020,81,7,14,0,0,0' '5/6/2020,98,17,20,6,0,0' '5/7/2020,100,2,20,0,0,0' '5/8/2020,101,1,28,8,0,0' '5/9/2020,108,7,29,1,0,0' '5/10/2020,109,1,29,0,0,0' '5/11/2020,133,24,31,2,0,0' '5/12/2020,216,83,31,0,0,0' '5/13/2020,242,26,33,2,0,0' '5/14/2020,248,6,33,0,1,1' '5/15/2020,266,18,34,1,1,0' '5/16/2020,280,14,34,0,1,0' '5/17/2020,294,14,34,0,3,2' '5/18/2020,374,80,34,0,3,0' '5/19/2020,401,27,35,1,3,0' '5/20/2020,426,25,43,8,3,0' '5/21/2020,456,30,47,4,4,1' '5/22/2020,515,59,68,21,4,0' '5/23/2020,583,68,68,0,5,1' '5/24/2020,602,19,85,17,5,0' '5/25/2020,681,79,110,25,5,0' '5/26/2020,771,90,152,42,5,0' '5/27/2020,885,114,180,28,6,1' '5/28/2020,1041,156,184,4,6,0' '5/29/2020,1211,170,184,0,6,0' '5/30/2020,1400,189,188,4,7,1' '5/31/2020,1571,171,189,1,8,1' '6/1/2020,1810,239,190,1,8,0' '6/2/2020,2098,288,235,45,9,1' '6/3/2020,2299,201,238,3,11,2' '6/4/2020,2633,334,256,18,12,1' '6/5/2020,2911,278,289,33,12,0' '6/6/2020,3234,323,295,6,13,1' '6/7/2020,3447,213,340,45,13,0' '6/8/2020,3760,313,363,23,14,1' '6/9/2020,4083,323,394,31,15,1' '6/10/2020,4362,279,394,0,17,2' '6/11/2020,4612,250,394,0,17,0' '6/12/2020,5059,447,394,0,18,1' '6/13/2020,5334,275,394,0,19,1' '6/14/2020,5759,425,394,0,19,0' '6/15/2020,6210,451,1044,650,20,1' '6/16/2020,6590,380,1161,117,20,0' '6/17/2020,7176,586,1170,9,22,2' '6/18/2020,7847,671,1189,19,22,0' '6/19/2020,8273,426,1405,216,23,1' '6/20/2020,8604,331,1581,176,23,0' 
'6/21/2020,9025,421,1775,194,23,0' '6/22/2020,9558,533,2151,376,24,1' '6/23/2020,10098,540,2225,74,24,0' '6/24/2020,10727,629,2339,114,25,1' '6/25/2020,11161,434,2651,312,27,2' '6/26/2020,11754,593,2699,48,27,0' '6/27/2020,12308,554,2835,136,29,2' '6/28/2020,12771,463,3014,179,30,1' '6/29/2020,13247,476,3135,121,30,0' '6/30/2020,13563,316,3195,60,30,0' '7/1/2020,14045,482,4657,1462,33,3' '7/2/2020,14518,473,5321,664,33,0' '7/3/2020,15258,740,6144,823,33,0' '7/4/2020,15490,232,6416,272,34,1' '7/5/2020,15783,293,6548,132,35,1' '7/6/2020,15963,180,6812,264,35,0' '7/7/2020,16167,204,7500,688,36,1' '7/8/2020,16422,255,7753,253,36,0' '7/9/2020,16530,108,7892,139,38,2' '7/10/2020,16648,118,8012,120,39,1' '7/11/2020,16718,70,8443,431,39,0' '7/12/2020,16800,82,8590,147,39,0' '7/13/2020,16944,144,10295,1705,39,0' '7/14/2020,17060,116,10329,34,39,0' '7/15/2020,17176,116,11026,697,40,1' '7/16/2020,17343,167,11250,224,41,1' '7/17/2020,17444,101,11388,138,41,0' '7/18/2020,17501,57,11491,103,41,0' '7/19/2020,17657,156,11549,58,41,0' '7/20/2020,17843,186,11722,173,41,0' '7/21/2020,17993,150,12331,609,42,1' '7/22/2020,18093,100,12538,207,44,2' '7/23/2020,18240,147,12694,156,44,0' '7/24/2020,18373,133,12801,107,47,3' '7/25/2020,18482,109,12907,106,47,0' '7/26/2020,18612,130,12982,75,49,2' '7/27/2020,18751,139,13608,626,50,1' '7/28/2020,19062,311,13729,121,50,0' '7/29/2020,19272,210,13875,146,53,3' '7/30/2020,19546,274,14102,227,56,3' '7/31/2020,19770,224,14253,151,57,1' '8/1/2020,20085,315,14346,93,59,2' '8/2/2020,20331,246,14457,111,59,0' '8/3/2020,20749,418,14815,358,61,2' '8/4/2020,21008,259,14880,65,62,1' '8/5/2020,21389,381,15010,130,67,5' '8/6/2020,21749,360,15243,233,71,4' '8/7/2020,22213,464,15668,425,74,3' '8/8/2020,22591,378,16167,499,76,2' '8/9/2020,22971,380,16207,40,80,4' '8/10/2020,23309,338,16347,140,83,3' '8/11/2020,23947,638,16518,171,86,3' '8/12/2020,24431,484,16582,64,95,9' '8/13/2020,24956,525,16691,109,96,1' '8/14/2020,25550,594,16931,240,101,5' 
'8/15/2020,26018,468,17055,124,102,1' '8/16/2020,26659,641,17189,134,104,2' '8/17/2020,27240,581,17349,160,107,3' '8/18/2020,28256,1016,17434,85,114,7' '8/19/2020,28937,681,17554,120,120,6' '8/20/2020,29644,707,17818,264,126,6' '8/21/2020,30482,838,18068,250,137,11' '8/22/2020,31116,634,18204,136,146,9' '8/23/2020,31934,818,18485,281,149,3' '8/24/2020,32677,743,18660,175,157,8' '8/25/2020,33532,855,18973,313,164,7' '8/26/2020,34417,885,19358,385,175,11' '8/27/2020,35528,1111,19927,569,183,8' '8/28/2020,36455,927,20096,169,195,12' '8/29/2020,37339,884,20409,313,207,12' '8/30/2020,38560,1221,20676,267,221,14' '8/31/2020,39459,899,21264,588,228,7' '9/1/2020,40528,1069,22032,768,239,11' '9/2/2020,41648,1120,23144,1112,250,11' '9/3/2020,42876,1228,24061,917,257,7' '9/4/2020,44235,1359,25415,1354,271,14' '9/5/2020,45276,1041,26981,1566,280,9' '9/6/2020,46256,980,28795,1814,289,9' '9/7/2020,47235,979,30531,1736,300,11' '9/8/2020,48137,902,32818,2287,306,6' '9/9/2020,49218,1081,33736,918,312,6' '9/10/2020,50464,1246,35554,1818,317,5' '9/11/2020,51918,1454,36526,972,322,5' '9/12/2020,53119,1201,37378,852,336,14' '9/13/2020,54158,1039,38551,1173,345,9' '9/14/2020,55328,1170,39430,879,360,15' '9/15/2020,56787,1459,40492,1062,371,11' '9/16/2020,58326,1539,41560,1068,379,8' '9/17/2020,59572,1246,42803,1243,383,4' '9/18/2020,61592,2020,43674,871,390,7' '9/19/2020,62796,1204,45121,1447,401,11' '9/20/2020,64121,1325,46087,966,411,10' '9/21/2020,65275,1154,47092,1005,427,16' '9/22/2020,66631,1356,47915,823,429,2' '9/23/2020,67803,1172,49808,1893,436,7' '9/24/2020,69300,1497,50265,457,452,16' '9/25/2020,70613,1313,51720,1455,458,6' '9/26/2020,71820,1207,52867,1147,466,8' '9/27/2020,73393,1573,53752,885,476,10' '9/28/2020,74744,1351,54494,742,481,5' '9/29/2020,76257,1513,55225,731,491,10' '9/30/2020,77816,1559,56282,1057,498,7'
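
Since every row arrives as one long string, it can help to split the fields. The sketch below is only an illustration: it writes the header and the first two rows shown above to a temporary file (so it runs without the original file) and then reads them back with scan() using sep = "," and with read.csv(), which keeps the table shape.

```r
# Write a two-row sample to a temporary file so the sketch is self-contained.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "date,totalCases,newCases,totalRecoveries,newRecoveries,totalDeaths,newDeaths",
  "1/23/2020,1,1,0,0,0,0",
  "1/24/2020,0,0,0,0,0,0"
), tmp)

# scan() with sep = "," splits every field into one flat character vector.
fields <- scan(tmp, what = character(), sep = ",", quiet = TRUE)
length(fields)   # 21 fields: 3 lines of 7 values each

# read.csv() keeps the table shape: one row per line, one column per field.
covid <- read.csv(tmp)
dim(covid)       # 2 rows, 7 columns
names(covid)

unlink(tmp)      # remove the temporary file
```

For tabular files like this one, read.csv() is usually the more convenient choice because the result is a data frame with typed columns.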

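Implicit type coercion

When a vector contains different types of data, R automatically converts all of the elements to a single common type. This is called implicit coercion. In the example below, the numbers are coerced to character.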

x <- c(1,'two',4,"durga")
x
typeof(x)

Output is,

'1' 'two' '4' 'durga'
'character'
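
Character is only the last step of the coercion order: R picks the most flexible type present, going from logical to integer to double to character. A few one-line checks as a sketch:

```r
typeof(c(TRUE, 2L))         # "integer": TRUE becomes 1L
typeof(c(1L, 2.5))          # "double"
typeof(c(2.5, "a"))         # "character"
c(TRUE, "a")                # "TRUE" "a"
sum(c(TRUE, FALSE, TRUE))   # 2: logical values count as 1 and 0 in arithmetic
```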

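Explicit type coercion

  • We do this by calling the function as.<type>(), where <type> is the desired data type, for example as.character() or as.numeric(). Explicit type coercion helps us deal with incorrectly categorized data.
  • We can always convert numeric values into character.
  • Converting character into numeric works only when the text represents a number; any other text becomes NA with a warning.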
    num <- 1:5
    num_char <- as.character(num)
    num_char
    

    Output is,

    '1' '2' '3' '4' '5'
    
    product <- c("apple",1,"banana")
    as.numeric(product)
    

    Output is,

Warning message in eval(expr, envir, enclos):
"NAs introduced by coercion"
<NA> 1 <NA>
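
A few more conversions, as a sketch, show when explicit coercion succeeds and when it produces NA. suppressWarnings() is used here only to keep the example output quiet; the name converted is made up.

```r
as.numeric(c("1", "2.5"))     # 1.0 2.5: text that looks like a number converts cleanly
as.integer(3.9)               # 3: the fractional part is dropped, not rounded
as.logical(c("TRUE", "no"))   # TRUE NA: text that R does not recognize becomes NA

product <- c("apple", 1, "banana")
converted <- suppressWarnings(as.numeric(product))
is.na(converted)              # TRUE FALSE TRUE: marks the values that failed to convert
```

Checking the result with is.na() is a simple way to find the entries that need cleaning before analysis.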

Installing Packages in R

There are numerous useful packages for various tasks in R, and with those packages we can do things better and faster. One simple way to install packages is via the console:

install.packages("haven")
library("haven") # provides read_sav() for reading .sav files
saq8 <- read_sav("F:/Statisticts with R/CSV file for covid data/SAQ8.sav")

In the above example, I first installed the package named haven and then used it to read a .sav file.
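
Running install.packages() every time a script starts downloads the package again. One possible pattern, sketched below, is to install only when the package is missing. The helper name ensure_package is hypothetical and not part of R or haven.

```r
# Hypothetical helper: install a package only if it is not already available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package can now be loaded.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

ensure_package("stats")    # 'stats' ships with R, so nothing is downloaded here
# ensure_package("haven")  # would install haven only if it is missing
```

After the helper has run, library("haven") loads the package as before.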

This is all for this blog, and I hope you enjoyed it. Your feedback is highly appreciated, and I would love to hear your thoughts. Please leave your comments and suggestions below. Stay tuned for my next blog, where we’ll explore more exciting topics and insights.
