Basics of Manipulating Data

Key Terms:

  • Function: A method that calculates some quantity based on inputs that can be applied to a range of data

  • Filter: A feature that allows for the sorting and slicing of data

  • Cell: A single reference point for data

  • Range: A rectangular group of cells, such as rows or columns

Filters

A simple yet powerful tool for organizing data is filters, which also include a feature called slicing. Filters require no special syntax, but have certain conditions to be applied. They can only be applied vertically, and must be applied in such a way that accurately sorts all affected columns (an example of bad filtering: only applying the filters to a column of values, not including the names, so when we sort the values, the names are not accordingly reordered and stay where they were).

Once a filter is applied, it can be easily customized to fit various needs. Filters can span across multiple columns, allowing a single dataset to be sorted by any one column attribute instantly. Additionally, when applying a filter, the drop-down menu allows for slicing, or selective viewing of a dataset. Filters are incredibly versatile because they not only sort numerically, but can also take in various inputs such as if the cell contains a certain phrase or even sort by the color of the cell! Filters are a simple way to make use of an existing well-organized table and easily reorder and limit the user view to relevant information.

Functions

The key to using functions is understanding each function’s unique syntax. It is not practical nor useful to provide a comprehensive list of functions here when one already exists at https://support.google.com/docs/table/25273?hl=en. Rather, this section will focus on providing a basic understanding of how functions work, in particular, viewing cells and ranges as large pieces of data, and how to understand the general notation of a function through some commonly used examples.

First and foremost, it is important to understand how cells and ranges are referenced in functions; after all, functions need input data to work with and manipulate! Individual cells are referenced by their column letter and row number: for example, A3 is in the first column and third row. Ranges are referenced by the top left cell, then a colon, then the bottom right cell: for example, A3:B7 is a range that spans from columns A to B and rows 3 to 7. In fact, rows and columns themselves are ranges of data: we can call upon the entire first column by simply saying A:A, or go from the sixth to tenth rows by saying 6:10. This notation is shared across all functions. If this notation is not intuitive, be aware that when typing out a function’s input, you are able to highlight a range of cells, and that range will be automatically translated into this colon notation. Finally, to reference a cell that’s not in the current sheet, or if you want to be extra sure about your formula’s references in case that formula is copied over to another sheet, use the notation Sheet!Range to specify which sheet should be referenced, since all sheets share the same cell location convention.

In regards to function syntax, all functions follow the same convention of name(arguments), which is fairly similar to Python syntax! In order to insert a function into a cell, use the = character to signal that a function formula is beginning. Thus, for example, if we wanted to have a cell that calculates the average of the range A3:B9, we would input =AVERAGE(A3:B9) into the cell that we want to display the average. Functions like SUM, MIN, MAX, AVERAGE, and COUNT are helpfully provided in the toolbar under the sigma icon, and take expected arguments: since these all calculate a single quantity out of a range of data, the only input they take is the range in question.

Aside from the usual basic mathematical functions, there are some functions that are useful to know when creating more complex spreadsheets. There are conditional functions, such as IF, IFS, and SWITCH which are powerful tools to calculate different quantities based on the inputs. Beginning with IF, this function takes in 3 arguments: a logical expression, an output if true, and an output if false. This is expanded upon with the IFS function, which takes in pairs of expressions and outputs, which is more useful if you know every single case for what you might want to calculate. Finally, SWITCH is another tool that allows for flexible outputs depending on a logical expression. It takes in an expression, then case-value pairs. For example, we might set our expression as a particular cell that contains a student’s grade, then check it against a couple cases: If the grade is above 90, we return the value “A”, and if the grade is above 80, we return the value “B”, and so on. It is important to note that multiple values will not be returned, rather, the IFS and SWITCH functions will short-circuit and return the value of the first condition that is true.

Another important tool are lookup-based functions, which are incredibly useful when we have a constant, documented connection between keys and values. Some of these functions are LOOKUP, VLOOKUP, and HLOOKUP. LOOKUP takes in a search key, a search range, and an optional result range. It looks for the search key within the search range, and will return the corresponding value from the result range based on the search key’s location. For example, we might have a reference table of product names and their prices. We could use LOOKUP to search for the particular product name the user has inputted into a cell, then specify the result range as the column of product prices. The result would be the price of the specified product! VLOOKUP and HLOOKUP are very similar, but helpfully work when the data is not sorted. As the names suggest, VLOOKUP is a vertical-only version of LOOKUP, and HLOOKUP is the same except horizontal. They take the same arguments, but apply to either columns (VLOOKUP) or rows (HLOOKUP). For the purposes of this module, VLOOKUP will be used as it is generally more common to store data vertically, but know that “column” can be exchanged for “row” and vice versa in this context. VLOOKUP takes in 4 arguments: a key, a range, an index, and an optional parameter for if the data is sorted. The key and range work exactly the same as in LOOKUP, while index is analogous to the result range. Index refers to the index of the column in which data should be returned from, where the first column specified in RANGE is numbered 1. The sorted parameter can take either TRUE (default) or FALSE values, where TRUE will return the closest match to the search key (“less than or equal to” the search key) while FALSE forces the function to find an exact match, or returns an error if there is no such value.

This section will conclude with a discussion of some other useful mathematical functions, but there are many other functions that you may find useful for specific calculations. The existing documentation for functions is usually quite detailed, and also provides examples of the functions in use. Chances are, if you can think of a function that you might need, it probably exists in some form or as a combination of a couple functions! One useful statistical function is CORREL, which takes in 2 ranges of data and calculates the correlation coefficient (how strongly and in what direction the variables are related) of the 2 data sets. We might, for example, examine the correlation coefficient between the amount of money dedicated to advertising and the amount of product sold, and see if there is a relationship between the two to make a business decision. There are also a variety of variations on the basic AVERAGE function, namely AVERAGEIF, AVERAGEIFS, TRIMMEAN, and AVERAGE.WEIGHTED. AVERAGEIF and AVERAGEIFS work similarly to the IF and IFS functions, in that they take in a range, then check that range against a specific criteria, then take the average of the values (potentially in a separate range) that satisfied the criteria. For example, we might want to take the average of the prices of products which have product numbers above 100, so we would input =AVERAGEIF([range of product numbers], “>100”, [range of product prices]). AVERAGEIFS works similarly, except the range to average is listed first, then followed by pairs of ranges and criterion to check those ranges against. TRIMMEAN is another helpful variation on the AVERAGE function that “trims” a specified proportion of data from the extremes of the data set, then calculates the average of the remaining values. Finally, AVERAGE.WEIGHTED calculates a weighted average and takes in pairs of values and weights. For example, we might use this to calculate a student’s grade in a class where homework is worth 25%, quizzes are worth 20%, tests are worth 30%, and the final exam is worth 25%. We would store the averages for each category in one column, then the weights in another, then call =AVERAGE.WEIGHTED(scores column, weights column) to calculate the student’s grade.

Last updated