Creating an Evaluation Tool & Dashboard
THE GOAL
My goal for this project is to create an evaluation tool and dashboard that can be used to track teacher observation data and provide insights for growth management and coaching. In this case, we’ll be building the tool for an imaginary school, "Charest Academy," in Rhode Island.
STEP 1: RESEARCH
To start off, I did some research on the requirements and structures that exist for teacher evaluation in Rhode Island through the Rhode Island Department of Education. Here are some of the main resources I examined at this stage:
I kept track of what I learned on the "Research" tab of my Observation Workbook. In reviewing these materials, I got a better sense of what the evaluation system requires and, more importantly, what the observation tools are lacking. To conduct a meaningful observation, an evaluator has to have intricate knowledge of the professional practice rubric, OR keep a copy of it with them while completing the observation.
This is all the state-created observation form includes:
There is VERY little guidance here on the form itself.
The framework outline in the appendix of the teacher guidance document offers a little more detail on what evaluators are looking for, as well as a full breakdown of each domain and component.
However, if I’m conducting an evaluation, the observation tool on its own is not enough, and the total rubric (all 16 pages of it for the relevant domains and subfactors) is too much.
STEP 2: DESIGNING THE OBSERVATION FORM
A. Developing the Scale
The first step was figuring out the scale for the observer to follow (because this will influence the language of the measures in the observation form). Both the state-created observation form and rubric have a 4-point scale.
However, there’s no clear delineation between each step of that scale. In addition, the actual scoring isn’t beholden to it either: the “Final Effectiveness Rating” is out of 100 points. So, for ease of this project, and for our would-be observers, I used a 5-point Likert scale that I had created for one of my projects at work (see the "Scale" tab of the Observation Workbook).
This works out perfectly, because the “component sum” can just be the sum of each component’s average (min 4, max 20), and you can then multiply that score by 5 to get the weighted points (out of 100) for the overall “Final Effectiveness Rating.”
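For example, with made-up component averages (and assuming four components, each scored 1 to 5):

```
4.2 + 3.8 + 4.6 + 4.0 = 16.6    component sum (range 4–20)
16.6 × 5 = 83                    weighted points (out of 100)
```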
B. Creating the Measures
The next step was creating a new observation form, with specific measurements for each component. To do this, I looked at the “Indicators” and Level 4 “Critical Attributes” for each component (trying to connect the critical attributes to appropriate indicators).
I organized these into a Google Sheets file (the "Overview" tab of the Observation Workbook) to determine the individual measures I would include in the form itself. Once I had that organized, I reviewed it alongside the language used in each component’s listed elements from the rubric to create a list of metrics to be measured in the observation form.
There were some challenges to this, since the attributes didn’t always align well with the indicators on the rubric document, and there was some overlap between the domain components. It took me a little while to create a list of metrics that were 1) clear statements, 2) measurable, and 3) properly aligned.
In the end, I developed 5 distinct measures for each component. You can see the full list in the "Measurements" tab of the Observation Workbook.
C. Creating the Form
This was probably the easiest step of all. There are many tools to collect data (SurveyMonkey, Typeform, Qualtrics, etc.), but I went with Old Faithful: Google Forms. I know from my time as a teacher that this is something EVERYONE knows how to use. In fact, most schools use Google Workspace as their main toolset, so it’d be easily accessible for school leadership and evaluators.
I also made sure the form captured details about the lesson: the date, start time, stop time, etc. This can be helpful for analysis later on.
D. Creating Fake Data
For this, as much as I have issues with it, I relied on ChatGPT rather than creating a bunch of responses manually. It was a bit of a task, and I had to do some finagling to get the data to match the output of the form. The first dataset was fairly good with the non-metrics but used the wrong Likert scale. The second dataset was a little better, but the information didn’t always make sense. I ran out of free usage on ChatGPT and didn’t want to wait, so I made the best of it and made some slight modifications to the raw data so it fit the correct format of the form. (Admittedly, I spent more time fiddling with this than I should have…)
STEP 3: CLEANING & PREPARING THE DATA
A. Organizing the Data
First, I always like to have a clean, organized sheet of just the form responses, to make it easier to read and review quickly. So I created a new tab called “Responses_Wide” in the Response sheet, where the questions are easier to read and more clearly connected to their domains and components.
In addition, I created a “Reference” tab in the sheet that did the following:
B. Aggregating the Data
This is something that could be done in Looker Studio with calculated fields, but I always find it easier to do in Sheets. Plus, some folks love spreadsheets, so having all the data in one source that's easy to pull from later works best for them.
The first thing I did was create a field for “Minutes Observed,” the difference between the end time and the start time. This revealed the first flaw of the ChatGPT dataset: I had to adjust several start/end times so we wouldn’t have negative minutes or thousand-minute observations. I didn’t go crazy here, so my fake dataset still has some… lengthy observations.
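A minimal version of that calculation, assuming (hypothetically) that the start time lives in column B and the end time in column C. Sheets stores times as fractions of a day, so multiplying the difference by 1440 converts it to minutes:

```
=IF(ISBLANK(B2),"", (C2-B2)*1440)
```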
Then I created averages for each of the components and domains, plus a calculation for the total scores. I wrapped each calculation in an =IF(ISBLANK(...),"",...) check and carried it down the full column. This way, as more observations are completed, the sheet will keep aggregating the data, and empty rows stay blank instead of showing errors.
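As a sketch (the ranges here are hypothetical, not the workbook's actual columns):

```
Component average (five measures in D2:H2):
=IF(ISBLANK($A2),"", AVERAGE(D2:H2))

Domain average (across that domain's component-average columns):
=IF(ISBLANK($A2),"", AVERAGE(AA2:AD2))
```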
C. Converting the Data to Long Format
The final adjustment in Sheets was to convert the data to long format, since some of the visualizations I planned for Looker Studio only work with long-format data.
To do this, I used a combination of the ARRAYFORMULA, SPLIT, and FLATTEN formulas, combined with a FILTER that limited the result to rows starting with a number (since my first column was a date).
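Here's a sketch of that unpivot pattern, with hypothetical ranges: dates in Responses_Wide column A, measure names in the header row C1:AZ1, and scores in C2:AZ. (Concatenation turns the dates into serial numbers, which is why the rows "start with a number"; instead of checking for that, this version filters out blank score cells, which achieves the same cleanup.)

```
=ARRAYFORMULA(SPLIT(FILTER(
    FLATTEN(Responses_Wide!A2:A & "|" & Responses_Wide!C1:AZ1 & "|" & Responses_Wide!C2:AZ),
    LEN(FLATTEN(Responses_Wide!C2:AZ)) > 0),
  "|"))
```

This spills three columns: date, measure, and score.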
I then used a few VLOOKUPs wrapped in IFERRORs to connect each measurement to its proper domain and component. And voilà, the data is ready for the dashboard.
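Assuming a lookup table on the "Reference" tab with the measure name in column A, its domain in column B, and its component in column C (again, a hypothetical layout), and the long-format measure name sitting in column B of the long tab, those lookups could look like:

```
Domain:     =IFERROR(VLOOKUP($B2, Reference!$A:$C, 2, FALSE), "")
Component:  =IFERROR(VLOOKUP($B2, Reference!$A:$C, 3, FALSE), "")
```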
STEP 4: THE DASHBOARD TOOL
I created the dashboard in Looker Studio, with six distinct pages, each serving a separate purpose:
Here you can see the full dashboard:
With two different datasets (wide and long format), you would normally set up “Blends” to connect the data. BUT, this is sometimes a challenge in Looker Studio because of the limits on the number of fields in blended data. One way to work around this is to create connected field IDs in both datasets.
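A sketch of that kind of key, assuming (hypothetically) the observation date is in column A and the teacher name in column B. Computing the same formula in both the wide and long tabs gives Looker Studio a single field to match the sources on:

```
=IF(ISBLANK($A2),"", $B2 & "_" & TEXT($A2, "yyyy-mm-dd"))
```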
STEP 5: FEEDBACK FORM
I created a template feedback form in Google Docs with the key information that evaluators would want to share with teachers post-observation. Specifically, it includes the scores (out of 5) for each domain and its components, as well as any comments from the observation.
With the help of ChatGPT (because I do NOT know JavaScript), I created a Google Apps Script that copies the template and fills in all the relevant information from a feeder sheet (which I created using a combined IMPORTRANGE and QUERY function to bring in the relevant fields).
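The gist of the script is something like the sketch below. Everything here is a placeholder standing in for the real thing: the file IDs, the feeder sheet name and column order, and the %Token% markers in the template.

```javascript
// Sketch: generate one feedback doc per observation row in the feeder sheet.
function generateFeedbackForms() {
  const TEMPLATE_ID = 'TEMPLATE_DOC_ID';   // placeholder: the Google Docs template
  const FOLDER_ID = 'OUTPUT_FOLDER_ID';    // placeholder: destination Drive folder
  const feeder = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Feeder');
  const folder = DriveApp.getFolderById(FOLDER_ID);
  const rows = feeder.getDataRange().getValues();

  rows.slice(1).forEach(function (row) {    // skip the header row
    const [teacher, date, domainAvg, componentAvg, comments] = row;  // hypothetical columns
    const copy = DriveApp.getFileById(TEMPLATE_ID)
        .makeCopy('Feedback - ' + teacher + ' - ' + date, folder);
    const doc = DocumentApp.openById(copy.getId());
    const body = doc.getBody();

    // Swap each %Token% in the template for the row's values.
    body.replaceText('%TEACHER%', String(teacher));
    body.replaceText('%DATE%', String(date));
    body.replaceText('%DOMAIN_AVG%', String(domainAvg));
    body.replaceText('%COMPONENT_AVG%', String(componentAvg));
    body.replaceText('%COMMENTS%', String(comments));
    doc.saveAndClose();
  });
}
```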
As a result of the code, I had a folder of feedback forms, each with the appropriate information filled in for its observation.
FINAL THOUGHTS
While this dashboard and reporting tool is tailored to the imaginary "Charest Academy," it really could be adapted by anyone using the Rhode Island Model of evaluation set forth by the Rhode Island Department of Education. The insights from this could be far more substantial than what the DOE currently provides in terms of: