Intro To R: February 2016

Saturday, February 27, 2016

Module 7 : R Objects

I came into this week feeling very optimistic about the coursework, largely because I'm familiar with object-oriented programming from C++ and Java courses in past semesters. This doesn't make the assignment any less challenging, mind you, but at least I have a firmer grasp on these concepts and can hit the ground running, so to speak. On to the assignment!

I chose to use a dataset called “discoveries” from the datasets package, which contains the yearly numbers of important discoveries from 1860 to 1959.

The base type of the 'discoveries' object is double, which is easily determined with typeof(discoveries). I also tried some other datasets in my environment that were stored as integers, and typeof(a) (for example) confirmed that.
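The difference between an object's base type and its class can be checked directly. This is a short sketch; the integer comparison uses 1:10 as a stand-in, because the other datasets mentioned above are not named in this post.

```r
# discoveries ships with the datasets package, which R attaches by default
typeof(discoveries)  # base (storage) type: "double"
class(discoveries)   # class used for method dispatch: "ts"
tsp(discoveries)     # start, end, frequency of the series: 1860 1959 1
typeof(1:10)         # an integer vector, for comparison: "integer"
```

typeof() reports how the values are stored, while class() is what generic functions look at when choosing a method.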

The second step in our assignment addressed generic functions: functions that dispatch to a class-specific method depending on the class of the object they are given. Examples of generic functions in R include plot, mean, residuals, predict, summary and others. I chose to test whether a generic function could be applied to my "discoveries" dataset by using the plot function, plot(discoveries).
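To make the idea of dispatch concrete, here is a minimal sketch of a home-made S3 generic. The name describe and its two methods are hypothetical, written only for illustration; UseMethod() is the real base R mechanism that plot, summary and the other generics use.

```r
# A hypothetical generic: UseMethod() looks up describe.<class of x>
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) {
  cat("An object of class ", class(x)[1], "\n", sep = "")
}

# Method chosen for objects of class "ts", such as discoveries
describe.ts <- function(x, ...) {
  cat("A time series from ", start(x)[1], " to ", end(x)[1], "\n", sep = "")
}

describe(discoveries)  # dispatches to describe.ts
describe(1:3)          # no describe.integer, so describe.default runs
```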

I also tried others, such as summary(discoveries).

I had success with a variety of generic functions, but plenty of others did not work, such as logLik and predict. Each returned an error of the following form (shown for predict; logLik gives the same message with its own name):

Error in UseMethod("predict") :
  no applicable method for 'predict' applied to an object of class "ts"
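One way to see why plot works but predict fails is to ask R whether a method exists for class "ts". This sketch uses utils::getS3method() and tryCatch(); the helper name has_method is hypothetical.

```r
library(utils)  # getS3method() lives here

# Hypothetical helper: TRUE if generic.cls is a registered S3 method
has_method <- function(generic, cls) {
  !is.null(getS3method(generic, cls, optional = TRUE))
}

has_method("plot", "ts")     # TRUE: plot() dispatches to plot.ts
has_method("predict", "ts")  # FALSE: there is no predict.ts method

# Catch the error instead of stopping, so the message can be inspected
msg <- tryCatch(predict(discoveries), error = function(e) conditionMessage(e))
msg
```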

The final step of the assignment was to determine whether S3 or S4 applies to the dataset I chose. S3 and S4 are two object systems used in the R programming language. S3 is informal: an object's class is just an attribute, and a generic finds its method by name (for example, plot.ts for a "ts" object). S4 is more rigorous: classes are declared formally with setClass(), their slots have declared types, and methods are registered with setMethod(). To check which system an object belongs to, use the isS4() function. For example, isS4(discoveries) returns FALSE, which means that discoveries is an S3 object.
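The contrast between the two systems shows up clearly when the same data is wrapped both ways. This is a sketch: the class names discoveryCounts and DiscoverySeries are hypothetical, and isS4() is the base R check.

```r
library(methods)  # setClass() and new(); attached by default in most sessions

isS4(discoveries)   # FALSE: discoveries is a plain S3 object
class(discoveries)  # "ts"

# S3: a class is just an attribute, so no definition is needed up front
counts <- structure(list(years = 1860:1959, n = as.numeric(discoveries)),
                    class = "discoveryCounts")  # hypothetical class name
print.discoveryCounts <- function(x, ...) {
  cat("Discovery counts for", length(x$years), "years\n")
  invisible(x)
}
print(counts)  # dispatches to print.discoveryCounts
isS4(counts)   # FALSE

# S4: the class and its typed slots must be declared before use
setClass("DiscoverySeries", slots = c(start = "numeric", counts = "numeric"))
s4obj <- new("DiscoverySeries", start = 1860, counts = as.numeric(discoveries))
isS4(s4obj)    # TRUE
s4obj@start    # slots are accessed with @
```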

Saturday, February 20, 2016

Module 6 : Math & Simulations II

Time is really moving fast this semester, isn't it? We are already submitting our Module 6 assignment, part two of a block focused on math and simulations. This module provided a lot of insight and instruction about transposing a matrix, multiplying it by a vector, finding its inverse, and finding its determinant. These are some challenging concepts to grasp for some of us.

Instead of using 6 for nrow, I went with 10 for both the A and B matrices. In my opinion, the fewer conflicts the better: 100 and 1,000 are multiples of 10 but not of 6, so nrow=10 fills both matrices evenly, while nrow=6 makes R warn and recycle values. Transposing a matrix is very easy: t(matrix). The command t(A) output a matrix with 10 rows and 10 columns containing the values 1 to 100. The command t(B) output a matrix with 100 rows and 10 columns containing the values 1 to 1,000.
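A quick way to confirm what t() does is to compare dimensions and individual elements. A minimal sketch with the nrow = 10 matrices described above:

```r
# nrow = 10 divides 100 and 1000 evenly, so there is no recycling warning
A <- matrix(1:100, nrow = 10)   # 10 x 10, filled column by column
B <- matrix(1:1000, nrow = 10)  # 10 x 100

dim(t(A))  # 10 10: a square matrix keeps its shape
dim(t(B))  # 100 10: rows and columns swap
t(A)[1, 2] == A[2, 1]  # TRUE: element [i, j] of t(A) is element [j, i] of A
```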

Multiplying a matrix by a vector was the next step, so I first needed to create a vector. I did so easily and multiplied: X = a*A, Y = b*B. I also computed Z = a*B and displayed it as well to see how it differed. Note that * multiplies element by element, recycling the vector down the matrix's columns; true matrix multiplication uses %*%. After this, I reverted to nrow=6 for both 1:100 and 1:1000, which gives a 6 x 17 and a 6 x 167 matrix. I then reassigned a to 1:17 and b to 1:167 so their lengths matched the column counts, and used %*% to compute A %*% a and B %*% b.
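The difference between * and %*% is easiest to see on a tiny matrix. In this sketch the 2 x 3 matrix M and the vector v are hypothetical; the second half rebuilds the nrow = 6 matrices described above, wrapping matrix() in suppressWarnings() because 100 and 1000 are not multiples of 6.

```r
# Hypothetical small example
M <- matrix(1:6, nrow = 2)   # rows (1, 3, 5) and (2, 4, 6)
v <- c(10, 100)
M * v                        # element-wise: row 1 times 10, row 2 times 100
M %*% c(1, 1, 1)             # matrix product: a 2 x 1 column of row sums (9, 12)

# nrow = 6: R recycles the data to fill 6 x 17 and 6 x 167 matrices
A <- suppressWarnings(matrix(1:100, nrow = 6))
B <- suppressWarnings(matrix(1:1000, nrow = 6))
a <- 1:17    # length matches ncol(A)
b <- 1:167   # length matches ncol(B)
dim(A %*% a)  # 6 1
dim(B %*% b)  # 6 1
```

For %*%, the vector's length has to equal the number of columns of the matrix on its left, which is why a and b were resized to 17 and 167.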

The next step in the assignment was to invert the matrix. I changed A to a 2 x 2 matrix of 1:4 (nrow=2) and was able to invert it using solve(A). It clearly became inverted! I then created a matrix of random numbers between 0 and 50 using runif(), which draws from a uniform distribution whose median is 25. I then found the determinant by using det(A).
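Both steps can be checked numerically. A sketch: the inverse of matrix(1:4, nrow = 2) is verified by multiplying back to the identity, and for the random matrix the 5 x 5 size and the seed are assumptions, since the post does not give them.

```r
A <- matrix(1:4, nrow = 2)   # rows (1, 3) and (2, 4); det = 1*4 - 3*2 = -2
A_inv <- solve(A)            # invertible because the determinant is non-zero
A_inv                        # rows (-2, 1.5) and (1, -0.5)
all.equal(A %*% A_inv, diag(2))  # TRUE: A times its inverse is the identity

set.seed(1)                  # assumed seed, only for reproducibility
Rnd <- matrix(runif(25, min = 0, max = 50), nrow = 5)  # assumed 5 x 5 size
det(Rnd)                     # det() needs a square matrix
```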

R can be pretty simple if you know your equations and can wrap your head around these concepts. It's not always easy, but I am not finding it quite as difficult to understand as C++ or Java, for example. I'm looking forward to more!

Sunday, February 14, 2016

Module 5 : Math and Simulations

This was a very challenging and interesting assignment for me. I am familiar with entering data sets now, so I was very comfortable doing that. I created side-by-side box plots to represent the data.

The left-most boxplot represents the first assessment of blood pressure levels. The middle boxplot represents the second assessment of blood pressure levels. The right-most boxplot represents the final decision. It is very interesting to see how the data sets are transformed into visual representations; they let us see the data in a different way and perhaps gain a little more insight into what it actually means.
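Side-by-side box plots like these can be drawn with base R's boxplot() by passing a named list, one box per element. The assignment's data are not reproduced in this post, so the three vectors below are simulated stand-ins with hypothetical names; substitute the real values.

```r
# Simulated stand-in values, NOT the assignment's data
set.seed(7)
first  <- rnorm(10, mean = 120, sd = 30)  # hypothetical first assessment
second <- rnorm(10, mean = 125, sd = 30)  # hypothetical second assessment
final  <- rnorm(10, mean = 118, sd = 30)  # hypothetical final decision

# A named list gives one box per element, drawn left to right
bp <- boxplot(list(First = first, Second = second, Final = final),
              ylab = "Blood pressure", main = "Assessments side by side")
bp$stats  # one column per box: lower whisker, Q1, median, Q3, upper whisker
```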

I also created histograms for this module's assignment.

I am a little less clear on this portion of the assignment, and I feel like I may have done something wrong here. I was looking for a way to combine these into one histogram rather than have four separate visualizations. The histogram that depicts blood pressure makes the most sense to me because it clearly shows the distribution of blood pressures according to frequency.
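Base R offers two common ways to bring several histograms together: draw them as panels of one figure with par(mfrow = ...), or overlay two of them on shared breaks with add = TRUE. A sketch under the same caveat as before: the four vectors are simulated stand-ins with hypothetical names, not the assignment's data.

```r
# Simulated stand-in values, NOT the assignment's data
set.seed(42)
readings <- list(
  First         = rnorm(30, mean = 120, sd = 20),
  Second        = rnorm(30, mean = 125, sd = 20),
  Final         = rnorm(30, mean = 118, sd = 20),
  BloodPressure = rnorm(30, mean = 121, sd = 25)
)

# Option 1: four histograms as panels of a single 2 x 2 figure
old_par <- par(mfrow = c(2, 2))
for (nm in names(readings)) {
  hist(readings[[nm]], main = nm, xlab = "Blood pressure")
}
par(old_par)  # restore the previous layout

# Option 2: overlay two distributions on one set of axes
breaks <- pretty(range(unlist(readings)), n = 10)  # shared bins
red  <- rgb(1, 0, 0, 0.4)  # semi-transparent fills so overlaps stay visible
blue <- rgb(0, 0, 1, 0.4)
h1 <- hist(readings$First, breaks = breaks, col = red,
           main = "First vs. second assessment", xlab = "Blood pressure")
h2 <- hist(readings$Second, breaks = breaks, col = blue, add = TRUE)
legend("topright", legend = c("First", "Second"), fill = c(red, blue))
```

Using the same breaks for both histograms matters in the overlay: otherwise the bars would not line up and the comparison would be misleading.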