Input/Output, String Manipulation and Plyr
First, I had to install the plyr package in RStudio.
install.packages("plyr")
library(plyr)
For step one, I downloaded the assignment 6 dataset text file to my computer and imported it into R.
At first, R read each line of the file as a single column, so there was no Sex column and the later step that refers to Sex failed with "Error in FUN(X[[i]], ...) : object 'Sex' not found." To fix this, I added sep = "," to the read.table call, because the columns in the file are comma-separated.
#Step 1
student_assignment_6 <- read.table(file.choose(), header = TRUE, sep = ",")
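To see why the missing sep argument caused that error, here is a small sketch using made-up inline data (the names and grades are hypothetical and not taken from the assignment file). It reads the same text twice, once with the default separator and once with sep = ",".

```r
# Hypothetical three-line sample standing in for the assignment file
sample_text <- "Name,Sex,Grade\nAna,Female,90\nBill,Male,80"

# The default separator is whitespace, so each line is read as one field
one_column <- read.table(text = sample_text, header = TRUE)
ncol(one_column)

# With sep = "," the line is split into three columns
three_columns <- read.table(text = sample_text, header = TRUE, sep = ",")
names(three_columns)
```

In the first case there is only one column and nothing called Sex, which is why any later code that refers to Sex cannot find it.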
Using the ddply function, I calculated the mean grade for each Sex category. In the new file, the females are in rows 2-17 and the males are in rows 18-21, and a new column, Grade.Average, holds the mean grade of each student's Sex group (not a separate mean per student).
students_gendered_mean <- ddply(student_assignment_6, "Sex", transform, Grade.Average = mean(Grade))
write.table(students_gendered_mean, "Students_Gendered_Mean.txt")
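As a rough illustration of what transform does inside ddply, the sketch below uses a hypothetical four-row data frame (not the assignment data) and compares it with summarise, which plyr also provides. It assumes the plyr package is installed.

```r
library(plyr)

# Hypothetical toy data, not the assignment file
toy <- data.frame(
  Name  = c("Ana", "Bea", "Carl", "Dan"),
  Sex   = c("Female", "Female", "Male", "Male"),
  Grade = c(90, 80, 70, 60)
)

# transform keeps every row and repeats the group mean on each one
per_row <- ddply(toy, "Sex", transform, Grade.Average = mean(Grade))

# summarise collapses each group to a single row
per_group <- ddply(toy, "Sex", summarise, Grade.Average = mean(Grade))
```

Here per_row still has four rows, with the same Grade.Average repeated within each Sex group, while per_group has one row per group. The transform version matches the output described above, where every student keeps their own row.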
In step two, I needed to filter the dataset to include only those students who have the letter i in their name. I used the grepl function to search for the letter i, with ignore.case = TRUE so that the search matches both lowercase and uppercase i.
#Step 2
i_students <- subset(student_assignment_6, grepl("i", student_assignment_6$Name, ignore.case = TRUE))
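The effect of ignore.case is easiest to see on a few names. This is a minimal sketch with hypothetical names chosen for the example, not names from the dataset.

```r
# Hypothetical names, chosen to show the effect of ignore.case
student_names <- c("Ian", "Lilly", "Bob", "IRIS")

# Case-sensitive search: only a lowercase i counts
case_sensitive <- grepl("i", student_names)

# Case-insensitive search: uppercase I counts as well
case_insensitive <- grepl("i", student_names, ignore.case = TRUE)

# Keep only the names that matched
student_names[case_insensitive]
```

Without ignore.case = TRUE, a name such as "Ian" would be dropped because its only i is uppercase.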
For the final step, I wrote the filtered data to a new CSV file. The dataset went from 20 students to 14 with an i in their name: 13 of the 16 females and only 1 of the 4 males.
#Step 3
write.table(i_students, "DataSubset.csv", sep = ",")
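One thing that may be worth checking in the output file: by default write.table also writes the row names, but it does not add a header entry for them, so the header line has one field fewer than the data lines. The sketch below shows this on a hypothetical two-row data frame written to a temporary file, and how row.names = FALSE avoids it.

```r
# Hypothetical two-row data frame standing in for i_students
demo <- data.frame(Name = c("Ian", "Lilly"), Grade = c(90, 85))
out_file <- tempfile(fileext = ".csv")

# Default row.names = TRUE: the header has 2 fields, each data line has 3
write.table(demo, out_file, sep = ",")
default_lines <- readLines(out_file)

# row.names = FALSE keeps the header and the data lines aligned
write.table(demo, out_file, sep = ",", row.names = FALSE)
aligned_lines <- readLines(out_file)
```

If the file is meant to be opened in a spreadsheet or read by another program, the aligned version is usually the safer choice. write.csv(demo, out_file, row.names = FALSE) is a base R alternative that sets the comma separator for you.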


