Sunday, March 27, 2016

Module 11 : Debugging!

This module is perhaps one of the most helpful for a new programmer. At least it is for me, because my code is usually filled with bugs, and I spend a lot of time working through it to resolve the errors and get it up and running. The study materials explored a variety of ways to debug, and I appreciate that because it's good to have options. That said, nothing really seemed to work on the code we were given in this assignment. Most likely that's because I'm new to R and wasn't entirely successful in applying those debugging steps!

As you're well aware, we started out with the following code:

tukey_multiple <- function(x) { 
   outliers <- array(TRUE,dim=dim(x)) 
   for (j in 1:ncol(x)) 
    { 
    outliers[,j] <- outliers[,j] && tukey.outlier(x[,j]) 
    } 
outlier.vec <- vector(length=nrow(x)) 
    for (i in 1:nrow(x)) 
    { outlier.vec[i] <- all(outliers[i,]) } return(outlier.vec) }

I set x equal to 5 and entered the code into RStudio. This is the error message I received:

Error: unexpected symbol in:
"    for (i in 1:nrow(x)) 
    { outlier.vec[i] <- all(outliers[i,]) } return"

How disappointing! I actually did a lot of research at this point into the error itself, looking over several entirely unhelpful posts from Stack Overflow. The good news is that in doing that kind of digging/browsing for information, you often learn about other aspects that maybe weren't part of the assignment. I don't mind this phase of research because it sparks my interest and keeps me engaged.
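An "unexpected symbol" message like this one can be reproduced on a much smaller function, which makes it easier to see what the parser objects to. The sketch below uses a made-up function f (not part of the assignment) and base R's parse() to check two versions of the same text without running them. The only difference is whether anything separates the loop's closing brace from return().

```r
# f is a hypothetical toy function, used only to reproduce the parse error.
# Version 1: the loop's closing brace and return() share a line with nothing between them.
bad <- "f <- function(x) { for (i in 1:2) { x <- x + i } return(x) }"

# Version 2: identical, except a semicolon separates the two statements.
good <- "f <- function(x) { for (i in 1:2) { x <- x + i }; return(x) }"

# parse() only checks syntax; tryCatch() captures the error message instead of stopping.
bad_result <- tryCatch(parse(text = bad), error = function(e) conditionMessage(e))
good_result <- tryCatch(parse(text = good), error = function(e) conditionMessage(e))

print(bad_result)          # an error message from the parser
print(class(good_result))  # "expression", meaning the text parsed cleanly
```

A newline between the brace and return() works just as well as the semicolon.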

After a while I decided to clean up my code:

tukey_multiple <- function(x) {
   outliers <- array(TRUE,dim=dim(x))
   for (j in 1:ncol(x))
    {
    # & compares element by element; && is meant for single values
    outliers[,j] <- outliers[,j] & tukey.outlier(x[,j])
    }
   outlier.vec <- vector(length=nrow(x))
   for (i in 1:nrow(x))
    {
    outlier.vec[i] <- all(outliers[i,])
    }
   # return() goes on its own line, after the loop's closing brace
   return(outlier.vec)
}
This code yields no errors in RStudio, so the bug appears to have been resolved by putting each statement on its own line. In the original, the closing brace of the second for loop and return(outlier.vec) shared a line with nothing between them. R expects a newline or a semicolon between two statements, which matches the "unexpected symbol" message ending at return. If there is something more that should have been done, please let me know! I was sort of expecting to see some kind of result in RStudio other than a simple return without an error message, but I'm not sure what else I may be missing.
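One likely reason nothing appears is that defining a function only stores it; R prints a result when the function is called. The sketch below calls tukey_multiple on a small matrix. Two things in it are assumptions: tukey.outlier is a stand-in written for this example (the course's own version is not shown in this post), and the matrix m is made up. The function expects a matrix, since a single number such as 5 has no dimensions (dim(5) is NULL).

```r
# Hypothetical stand-in for the course's tukey.outlier(): flags values that lie
# more than 1.5 * IQR below the first quartile or above the third quartile.
tukey.outlier <- function(v) {
  q <- quantile(v, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  v < q[1] - fence | v > q[2] + fence
}

tukey_multiple <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    # element-wise & keeps one TRUE/FALSE per row of the column
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j])
  }
  outlier.vec <- vector(length = nrow(x))
  for (i in seq_len(nrow(x))) {
    # a row counts only if it is an outlier in every column
    outlier.vec[i] <- all(outliers[i, ])
  }
  return(outlier.vec)
}

# Made-up test matrix: the last row is extreme in both columns.
m <- cbind(c(1, 2, 3, 4, 100), c(2, 3, 4, 5, 200))
result <- tukey_multiple(m)
print(result)
```

Calling the function is also a quick way to test it: if the flagged rows are not the ones expected, traceback() or debug(tukey_multiple) are places to look next.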

Sunday, March 13, 2016

Module 9 : Visualization, Graphics & R

This latest module in our course focuses on visualizing data sets using base graphics as well as the more complex lattice and ggplot2 packages. These packages and methods offer a lot of excellent features that let the user visualize data in ways that accentuate results or clarify scenarios.

I spent a lot of time reading about and playing around with the code, and seeing how it behaved with different data sets. The data set I chose for this assignment is USPop, which records the population of the United States from 1790 to 2000. Below is a visualization example that I created with each of base graphics, lattice and ggplot2.

Base Graphics

Base graphics are the plotting functions built into R itself, with no extra packages required. I experimented a lot with plot(), and came up with this:


plot(USPop, col="green", type="b", cex=1.5, pch=4)

Apologies for the small size of the image, but making it any larger would have conflicted with Blogger's format. You can click the image and save it to view it larger, if you like! I made the following modifications:

  • Changed color from black to green.
  • Set point character to X.
  • Increased plot point size 1.5 times.
  • Set plot style to points connected by line segments (b).
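The four modifications above map onto four plot() arguments. The sketch below shows them on R's built-in uspop series, used here as a stand-in for USPop (uspop ships with R, stops at 1970, and is measured in millions); the column names year and population and the axis labels are assumptions for this example. The plot is written to a temporary PDF so the code also runs outside RStudio.

```r
# Build a small data frame from the built-in uspop time series (a stand-in for USPop).
us <- data.frame(year = as.numeric(time(uspop)),
                 population = as.numeric(uspop))

out <- tempfile(fileext = ".pdf")
pdf(out)                      # send the plot to a file instead of the screen
plot(us$year, us$population,
     col  = "green",          # colour: black -> green
     pch  = 4,                # point character 4 is an X
     cex  = 1.5,              # points drawn 1.5 times larger
     type = "b",              # "b" = both points and connecting line segments
     xlab = "Year", ylab = "Population (millions)")
dev.off()                     # close the file so it is written to disk
```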


Lattice Package



The lattice package is useful because it creates the entire plot in a single call. It can also display many plot points and handle them more easily than base graphics would. Lattice also falls short in some areas. It can be difficult to manipulate, and it is harder to add to a lattice plot after it has been created. Lattice did not suit my data set very well, I believe because I only had two variables (year and population). As such, I chose the xyplot:


xyplot(population~year, data=USPop, pch="*", cex=3)

I personally found this more difficult to use than base graphics, because I don't think my data set was complex enough to leverage the benefits of lattice.
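A detail that may help here: xyplot() returns a plot object rather than drawing straight away. At the console the object prints itself, but inside a script or function it needs an explicit print(). The sketch below again uses the built-in uspop series as a stand-in for USPop, with assumed column names and labels, and uses update() to change the labels on the stored object.

```r
library(lattice)

# Stand-in data built from R's built-in uspop series (not the USPop data set itself).
us <- data.frame(year = as.numeric(time(uspop)),
                 population = as.numeric(uspop))

# The same call as in the post, stored as an object instead of drawn immediately.
p <- xyplot(population ~ year, data = us, pch = "*", cex = 3)

# update() returns a modified copy of the stored plot; here it only sets labels.
p2 <- update(p, xlab = "Year", ylab = "Population (millions)",
             main = "US population by census year")

out <- tempfile(fileext = ".pdf")
pdf(out)
print(p2)   # explicit print() is what actually draws a lattice plot in a script
dev.off()
```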

ggplot2 Package

ggplot2 is more like a design app than either of the other graphical methods available in R. It is excellent, and it offers more variety in the changes the user can make than either of the other approaches. Following is an example I created using my data set:


ggplot(USPop, aes(year, population))

In my opinion, ggplot2 is the best package to use because it provides the most options for building different visualizations, and it is far and away the most powerful tool in this respect. I would have liked to work longer on this particular portion but I'm afraid I've run out of time! I look forward to using ggplot2 in future assignments and projects.
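One thing worth knowing about ggplot2 is that ggplot() with aes() only sets up the data and axes; nothing is drawn on the panel until at least one geom layer is added with +. The sketch below is one possible way to finish such a plot, not necessarily the plot from this assignment. It uses the built-in uspop series as a stand-in for USPop, and the choice of geoms, colours and labels is an assumption made for illustration.

```r
library(ggplot2)

# Stand-in data built from R's built-in uspop series (not the USPop data set itself).
us <- data.frame(year = as.numeric(time(uspop)),
                 population = as.numeric(uspop))

p <- ggplot(us, aes(x = year, y = population)) +  # data and axis mappings only
  geom_line(colour = "green") +                   # layer 1: connecting line
  geom_point(shape = 4, size = 3) +               # layer 2: X-shaped points
  labs(x = "Year", y = "Population (millions)")   # axis labels

# ggsave() picks the output format from the file extension.
out <- tempfile(fileext = ".pdf")
ggsave(out, plot = p, width = 6, height = 4)
```

Each + adds one layer or setting, which is where the "design app" feel comes from: layers can be added, removed or restyled one at a time.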

Sunday, March 6, 2016

Module 8 : Input/Output, String Manipulation & PLYR Package

Module 8 was one of the most interesting areas of focus so far in this course. I really enjoyed learning about how to use data from files, both in terms of input and output. I am beginning to envision different ways in which the R language can be used, and the ability to use files adds a very useful dimension to it.

The first step in this week's assignment was to import the dataset, which was initially provided as a .txt file. We have done this before, and it wasn't much of a challenge! I also installed the "plyr" package, which is used for working with groups of information in a larger set, among other things.



I then used plyr to generate the mean of both Age and Grade, split by gender. After this, I output the data from y into a file called Sorted_Average:
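A minimal sketch of this step, assuming a data frame with columns Name, Sex, Age and Grade. The column names and the four rows are made up for illustration and are not the assignment's data. plyr's ddply() splits the data by Sex, computes the two means for each group, and combines the results into y.

```r
library(plyr)

# Made-up stand-in for the imported data set (values are illustrative only).
students <- data.frame(
  Name  = c("Raul", "Booker", "Lauri", "Leonie"),
  Sex   = c("Male", "Male", "Female", "Female"),
  Age   = c(25, 18, 21, 21),
  Grade = c(80, 83, 90, 91)
)

# Split by Sex, summarise each group, combine into one data frame.
y <- ddply(students, "Sex", summarise,
           Age.Average   = mean(Age),
           Grade.Average = mean(Grade))
print(y)

# Write y to a plain-text file; tempdir() keeps the example from cluttering the working directory.
out_file <- file.path(tempdir(), "Sorted_Average")
write.table(y, out_file, row.names = FALSE)
```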


In order to make a .csv file, I made the following adjustment:
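A minimal sketch of the .csv step, assuming the adjustment was to set the separator to a comma. The data frame below is made up for illustration and stands in for y. Base R's write.table() accepts sep = ",", and write.csv() is a convenience wrapper that does the same thing.

```r
# Made-up stand-in for the summarised data (values are illustrative only).
y <- data.frame(
  Sex           = c("Female", "Male"),
  Age.Average   = c(21, 21.5),
  Grade.Average = c(90.5, 81.5)
)

out_csv <- file.path(tempdir(), "Sorted_Average.csv")

# sep = "," turns the space-separated default into comma-separated values.
write.table(y, out_csv, sep = ",", row.names = FALSE)

# Reading the file back is a quick check that it is a valid .csv.
back <- read.csv(out_csv)
print(back)
```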


The final phase of the assignment was to output only those names that included the letter i or I, and then to output that to a file:
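A minimal sketch of the filtering step, assuming the names sit in a column called Name. The column names, the rows and the output file name are all made up for illustration. grepl() returns TRUE for each name that matches the pattern, and ignore.case = TRUE makes it match both i and I.

```r
# Made-up stand-in for the imported data set (values are illustrative only).
students <- data.frame(
  Name  = c("Raul", "Lilly", "Ian", "Booker", "Sabrina"),
  Grade = c(80, 88, 92, 83, 79),
  stringsAsFactors = FALSE
)

# TRUE where the name contains the letter i in either case.
has_i <- grepl("i", students$Name, ignore.case = TRUE)

# Keep only the matching rows.
subset_i <- students[has_i, ]
print(subset_i$Name)

# Hypothetical output file name, written to a temporary directory.
out_file <- file.path(tempdir(), "Names_With_I.csv")
write.csv(subset_i, out_file, row.names = FALSE)
```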