Input/Output, String Manipulation and Plyr

First, I installed the plyr package in RStudio and loaded it with library().

install.packages("plyr")

library(plyr)


For step one, I downloaded the Assignment 6 dataset (a text file) to my computer and imported it into R.

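At first, I put the file name ("<FileName>.txt") directly in the read call, but R returned the error "cannot open file 'Assignment 6 Dataset.txt': No such file or directory". R looks for a bare file name in the current working directory, and the file wasn't there, so I switched to file.choose(), which opens a dialog for picking the file.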
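The "cannot open file" error happens because a bare file name is resolved against the working directory. file.choose() avoids this interactively; the sketch below shows two non-interactive alternatives. It assumes a tiny, hypothetical stand-in file written to a temporary folder, since the real file's location isn't known here.

```r
# Hypothetical stand-in for the real dataset, written to a temporary folder
demo_dir <- tempdir()
demo_path <- file.path(demo_dir, "Assignment 6 Dataset.txt")
writeLines(c("Name,Sex,Grade", "Alice,Female,90", "Bob,Male,70"), demo_path)

# A bare file name is looked up relative to this directory
print(getwd())

# Option 1: pass the full path instead of a bare file name
by_path <- read.table(demo_path, header = TRUE, sep = ",")

# Option 2: switch the working directory, read by name, then switch back
old_wd <- setwd(demo_dir)  # setwd() returns the previous directory
by_name <- read.table("Assignment 6 Dataset.txt", header = TRUE, sep = ",")
setwd(old_wd)

print(by_path)
```

Option 1 is usually the safer habit, because it doesn't change global state for the rest of the session.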

Next, R read each line of the file as a single column, so there was no Sex column and I got "Error in FUN(X[[i]], ...) : object 'Sex' not found." By default, read.table() splits columns on whitespace, but this file separates its columns with commas, so I added sep = "," to the read call.


#Step 1

# file.choose() opens a file picker; sep = "," splits the comma-separated columns
student_assignment_6 <- read.table(file.choose(), header = TRUE, sep = ",")
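To see why sep = "," matters, here is a small sketch on hypothetical comma-separated lines (invented names and grades, same Name/Sex/Grade layout). read.table()'s text argument reads from a string instead of a file, which makes the difference easy to reproduce.

```r
# Hypothetical comma-separated lines in the same layout as the dataset
csv_lines <- "Name,Sex,Grade
Alice,Female,90
Bob,Male,70"

# Default sep = "" splits on whitespace, so each line stays one field
one_col <- read.table(text = csv_lines, header = TRUE)
print(names(one_col))             # "Name.Sex.Grade"
print("Sex" %in% names(one_col))  # FALSE, hence "object 'Sex' not found"

# With sep = ",", each comma starts a new column
three_cols <- read.table(text = csv_lines, header = TRUE, sep = ",")
print(names(three_cols))          # "Name" "Sex" "Grade"

# read.csv() is read.table() with header = TRUE and sep = "," preset
same_cols <- read.csv(text = csv_lines)
```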


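Using the ddply() function, I calculated the mean grade for each Sex category. ddply() splits the data by Sex, applies transform() to each group, and stacks the results, so the new file is grouped: the females are in rows 2-17 and the males in rows 18-21 (row 1 holds the column names). A new column, Grade.Average, holds the mean grade of each student's sex group, repeated on every row of that group.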

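# Add each Sex group's mean grade as a new column on every row of that group
students_gendered_mean <- ddply(student_assignment_6, "Sex", transform, Grade.Average = mean(Grade))

# Save the grouped result as a text file in the working directory
write.table(students_gendered_mean, "Students_Gendered_Mean.txt")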
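To see what ddply(..., transform, ...) adds, here is a base-R cross-check on a hypothetical four-student data frame (names and grades invented for illustration, and not using plyr). ave() returns each group's mean on every row of that group, which mirrors what transform does inside each Sex group; ordering by Sex mimics how ddply returns the groups one after another. aggregate() gives the one-row-per-group summary instead.

```r
# Hypothetical toy data, not the assignment dataset
toy <- data.frame(
  Name  = c("Alice", "Bob", "Carla", "Dave"),
  Sex   = c("Female", "Male", "Female", "Male"),
  Grade = c(90, 70, 80, 60)
)

# Put each Sex group together, as ddply's output does
by_sex <- toy[order(toy$Sex), ]

# ave() returns the group mean for every row in that group
by_sex$Grade.Average <- ave(by_sex$Grade, by_sex$Sex, FUN = mean)
print(by_sex)

# One row per group instead of one row per student
group_means <- aggregate(Grade ~ Sex, data = toy, FUN = mean)
print(group_means)
```

With plyr itself, replacing transform with summarise in the ddply() call returns the one-row-per-group table instead of adding a column.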


In step two, I filtered the dataset to keep only the students who have the letter i in their name. I used grepl() to test each name for the letter i, with ignore.case = TRUE so that both lowercase i and uppercase I count as a match.

#Step 2

# Keep only the rows whose Name contains an i or I
i_students <- subset(student_assignment_6, grepl("i", student_assignment_6$Name, ignore.case = TRUE))
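Here is a short sketch of how ignore.case changes the match, using hypothetical names chosen so that one starts with a capital I and has no lowercase i. grepl() returns one TRUE/FALSE per element, and that logical vector can index rows directly, which is what subset() does with it.

```r
# Hypothetical names, chosen to show the effect of ignore.case
names_vec <- c("Ivan", "Bob", "Linda", "Sam")

# Case-sensitive: the capital I in "Ivan" is not matched
case_sensitive <- grepl("i", names_vec)
print(case_sensitive)  # FALSE FALSE TRUE FALSE

# ignore.case = TRUE matches both "i" and "I"
any_case <- grepl("i", names_vec, ignore.case = TRUE)
print(any_case)        # TRUE FALSE TRUE FALSE

# The logical vector can index a data frame's rows directly
toy <- data.frame(Name = names_vec, Grade = c(75, 82, 91, 68))
i_toy <- toy[grepl("i", toy$Name, ignore.case = TRUE), ]
print(i_toy$Name)
```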


For the final step, I wrote the filtered data to a new CSV file. Filtering cut the dataset from 20 students to the 14 who have an i in their name: 13 of the 16 females and only 1 of the 4 males.
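Before-and-after counts by sex, like the 13 of 16 and 1 of 4 above, can be tallied with a cross-tabulation. The sketch below uses a hypothetical five-student data frame rather than the real dataset; table() counts each Sex and has-an-i combination, and rowSums() gives the per-sex totals.

```r
# Hypothetical toy data: 3 females and 2 males
toy <- data.frame(
  Name = c("Iris", "Linda", "Joan", "Kevin", "Tom"),
  Sex  = c("Female", "Female", "Female", "Male", "Male")
)
has_i <- grepl("i", toy$Name, ignore.case = TRUE)

# Cross-tabulate sex against whether the name contains an i
counts <- table(Sex = toy$Sex, Has_i = has_i)
print(counts)

# Students kept per sex, out of all students of that sex
kept  <- counts[, "TRUE"]
total <- rowSums(counts)
print(paste(kept, "of", total, names(total)))

# Overall: students kept out of all students
cat(sum(has_i), "of", nrow(toy), "students have an i in their name\n")
```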

#Step 3

# row.names = FALSE keeps the header aligned with the data columns
write.table(i_students, "DataSubset.csv", sep = ",", row.names = FALSE)
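One caveat when writing CSV with write.table(): by default it also writes row names, but it writes no matching header cell for them, so the header row has one field fewer than the data rows. R's own read.table() treats that as a row-name column, but spreadsheet programs and other CSV readers will shift the headers by one column. The sketch below shows this on a hypothetical two-row data frame, written to temporary files.

```r
# Hypothetical toy data, not the real filtered subset
toy <- data.frame(Name = c("Iris", "Linda"), Sex = c("Female", "Female"), Grade = c(88, 92))

# With row names: header has 3 fields, each data row has 4
with_rownames <- tempfile(fileext = ".csv")
write.table(toy, with_rownames, sep = ",")
raw_lines <- readLines(with_rownames)
print(raw_lines)

# Without row names: a regular CSV that reads back cleanly
clean_csv <- tempfile(fileext = ".csv")
write.csv(toy, clean_csv, row.names = FALSE)
back <- read.csv(clean_csv)
print(back)
```

write.csv() is write.table() with sep = "," preset, so either function works once row.names = FALSE is set.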
