Using roxygen now
marberts committed Oct 14, 2023
Package: rsmatrix
Title: Matrices for Repeat-Sales Price Indexes
Authors@R: c(
person(given = "Steve", family = "Martin", role = c("aut", "cre", "cph"), email = "[email protected]", comment = c(ORCID = "0000-0003-2544-9480"))
)
Config/testthat/edition: 3
VignetteBuilder: knitr
RoxygenNote: 7.2.3
Roxygen: list(markdown = TRUE)
# Generated by roxygen2: do not edit by hand
export(rs_matrix, rs_pairs, rs_var)

importFrom(Matrix, sparseMatrix, rowSums)
importMethodsFrom(Matrix, solve, crossprod, tcrossprod)
#---- Helper functions (internal) ----
#' Test if inputs have the same length
#' @noRd
different_lengths <- function(...) {
res <- lengths(list(...))
any(res != res[1L])
}

#' Compute the Z matrix
#' @noRd
rs_z_ <- function(t2, t1, f = NULL, sparse = FALSE) {
# coerce t2 and t1 into characters prior to taking the union
# so that both dates and factors are treated the same
t2 <- factor(t2, lev)
t1 <- factor(t1, lev)
if (any(unclass(t2) <= unclass(t1))) {
warning(
"all elements of 't2' should be greater than the corresponding ",
"elements in 't1'"
)
}

# make row names before interacting with f
t2 <- t2[non_zero]
t1 <- t1[non_zero]
if (sparse) {
res <- Matrix::sparseMatrix(
rep(i, 2), c(t2, t1),
x = rep(c(1, -1), each = length(i)),
dims = dims
)
} else {
res <- rep(0, prod(dims))
res[(t2 - 1L) * dims[1L] + i] <- 1

#' Compute X matrix
#' @noRd
rs_x_ <- function(z, p2, p1) (z > 0) * p2 - (z < 0) * p1

#---- All matrices ----
#' Shiller's repeat-sales matrices
#'
#' Create a function to compute the \eqn{Z}, \eqn{X}, \eqn{y}, and \eqn{Y}
#' matrices in Shiller (1991, sections I-II) from sales-pair data in order to
#' calculate a repeat-sales price index.
#'
#' The function returned by `rs_matrix()` computes a generalization of the
#' matrices in Shiller (1991, sections I-II) that is applicable to grouped
#' data. These generalized matrices are useful for calculating separate
#' indexes for many groups (say, cities) without needing an explicit loop.
#'
#' The \eqn{Z}, \eqn{X}, and \eqn{Y} matrices are not well defined if either
#' `t1` or `t2` has missing values, and an error is thrown in this
#' case. Similarly, it should always be the case that `t2 > t1`; otherwise,
#' a warning is given.
#' @param t2,t1 A pair of vectors giving the time period of the second and
#' first sale, respectively. Usually a vector of dates, but other values are
#' possible if they can be coerced to character vectors and sorted in
#' chronological order (i.e., with [`order()`]).
#' @param p2,p1 A pair of numeric vectors giving the price of the second and
#' first sale, respectively.
#' @param f An optional factor the same length as `t1` and `t2`, or a
#' vector to be turned into a factor, that is used to group sales.
#' @param sparse Should sparse matrices from the \pkg{Matrix} package be used
#' (faster for large datasets), or regular dense matrices (the default)?
#' @return A function that takes a single argument naming the desired matrix.
#' It returns one of two matrices (\eqn{Z} and \eqn{X}) or two vectors
#' (\eqn{y} and \eqn{Y}), either regular matrices if `sparse = FALSE`, or sparse
#' matrices of class `dgCMatrix` if `sparse = TRUE`.
#' @seealso [rs_pairs()] for turning sales data into sales pairs.
#' @references Bailey, M. J., Muth, R. F., and Nourse, H. O. (1963). A
#' regression method for real estate price index construction.
#' *Journal of the American Statistical Association*, 58(304):933-942.
#'
#' Shiller, R. J. (1991). Arithmetic repeat sales price estimators.
#' *Journal of Housing Economics*, 1(1):110-126.
#' @examples
#' # Make some data
#' x <- data.frame(
#' date = c(3, 2, 3, 2, 3, 3),
#' date_prev = c(1, 1, 2, 1, 2, 1),
#' price = 6:1,
#' price_prev = 1
#' )
#' # Calculate matrices
#' mat <- with(x, rs_matrix(date, date_prev, price, price_prev))
#' Z <- mat("Z") # Z matrix
#' X <- mat("X") # X matrix
#' y <- mat("y") # y vector
#' Y <- mat("Y") # Y vector
#' # Calculate the GRS index in Bailey, Muth, and Nourse (1963)
#' b <- solve(crossprod(Z), crossprod(Z, y))[, 1]
#' # or b <- qr.coef(qr(Z), y)
#' (grs <- exp(b) * 100)
#' # Standard errors
#' vcov <- rs_var(y - Z %*% b, Z)
#' sqrt(diag(vcov)) * grs # delta method
#' # Calculate the ARS index in Shiller (1991)
#' b <- solve(crossprod(Z, X), crossprod(Z, Y))[, 1]
#' # or b <- qr.coef(qr(crossprod(Z, X)), crossprod(Z, Y))
#' (ars <- 100 / b)
#' # Standard errors
#' vcov <- rs_var(Y - X %*% b, Z, X)
#' sqrt(diag(vcov)) * ars^2 / 100 # delta method
#' # Works with grouped data
#' x <- data.frame(
#' date = c(3, 2, 3, 2),
#' date_prev = c(2, 1, 2, 1),
#' price = 4:1,
#' price_prev = 1,
#' group = c("a", "a", "b", "b")
#' )
#' mat <- with(x, rs_matrix(date, date_prev, price, price_prev, group))
#' b <- solve(crossprod(mat("Z"), mat("X")), crossprod(mat("Z"), mat("Y")))[, 1]
#' 100 / b
#' @export rs_matrix
rs_matrix <- function(t2, t1, p2, p1, f = NULL, sparse = FALSE) {
if (is.null(f)) {
if (different_lengths(t2, t1, p2, p1)) {
n <- max(1L, nlevels(f)) * (ncol(z) > 0)
# return value
res <- function(matrix = c("Z", "X", "y", "Y")) {
switch(match.arg(matrix),
Z = z[, -seq_len(n), drop = FALSE],
X = rs_x_(z[, -seq_len(n), drop = FALSE], p2, p1),
y = structure(log(p2 / p1), names = rownames(z)),
# rowSums() gets the single value in the base period
# for each group
Y = -Matrix::rowSums(rs_x_(z[, seq_len(n), drop = FALSE], p2, p1))
)
}
# clean up enclosing environment
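To see what the four arms of `res()` produce without the package, here is a minimal base-R sketch. It assumes periods are coded as integers `1, ..., T`, that period 1 is the base period, and that every group has enough sales for `Z'X` to be invertible. `shiller_matrices()` is a hypothetical helper, not part of rsmatrix. It lays columns out period by period so that the first `g` columns hold the base period of each group, mirroring how the `Y` arm uses `z[, seq_len(n)]` as the base-period columns, but the package's internal layout may differ.

```r
# Hypothetical helper (not part of rsmatrix): Shiller-style Z, X, y, and Y
# matrices for sales pairs, optionally grouped by f. Assumes t2 and t1 are
# integer period codes 1, ..., T with period 1 as the base period.
shiller_matrices <- function(t2, t1, p2, p1, f = rep(1L, length(t2))) {
  f <- as.factor(f)
  g <- nlevels(f)
  n_periods <- max(t2, t1)
  rows <- seq_along(t2)
  # Column for (period t, group j) is (t - 1) * g + j, so the first g
  # columns hold the base period of each group
  z <- matrix(0, nrow = length(t2), ncol = n_periods * g)
  z[cbind(rows, (t2 - 1) * g + as.integer(f))] <- 1   # second sale
  z[cbind(rows, (t1 - 1) * g + as.integer(f))] <- -1  # first sale
  # Same rule as rs_x_(): p2 where z > 0, -p1 where z < 0
  x <- (z > 0) * p2 - (z < 0) * p1
  base <- seq_len(g)
  list(
    Z = z[, -base, drop = FALSE],
    X = x[, -base, drop = FALSE],
    y = log(p2 / p1),
    # At most one non-zero base-period value per row, as in the Y arm of res()
    Y = -rowSums(x[, base, drop = FALSE])
  )
}

# Grouped data from the rs_matrix() examples
m <- shiller_matrices(
  t2 = c(3, 2, 3, 2), t1 = c(2, 1, 2, 1),
  p2 = 4:1, p1 = 1, f = c("a", "a", "b", "b")
)
# ARS index; columns are (period 2, a), (period 2, b), (period 3, a), (period 3, b)
ars <- 100 / solve(crossprod(m$Z, m$X), crossprod(m$Z, m$Y))[, 1]
ars
```

Because the two groups share no columns, the pooled solve gives the same ARS indexes as running the regression separately for each group: about 300 and 1200 for group `a`, and 100 and 200 for group `b`, in periods 2 and 3.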
#' Sales pairs
#'
#' Turn repeat-sales data into sales pairs that are suitable for making
#' repeat-sales matrices.
#' @param period A vector that gives the time period for each sale. Usually a
#' date vector, or a factor with the levels in chronological order, but other
#' values are possible if they can be sorted in chronological order (i.e., with
#' [order()]).
#' @param product A vector that gives the product identifier for each sale.
#' Usually a factor or vector of integer codes for each product.
#' @return A numeric vector of indices giving the position of the previous sale
#' for each `product`, with the convention that the previous sale for the
#' first sale is itself. The first position is returned in the case of ties.
#' @note [`order()`] is the workhorse of `rs_pairs()`,
#' so performance can be sensitive to the types of `period` and
#' `product`, and can be slow for large character vectors.
#' @seealso [rs_matrix()] for using sales pairs to make a
#' repeat-sales index.
#' @examples
#' # Make sales pairs
#' x <- data.frame(
#' id = c(1, 1, 1, 3, 2, 2, 3, 3),
#' date = c(1, 2, 3, 2, 1, 3, 4, 1),
#' price = c(1, 3, 2, 3, 1, 1, 1, 2)
#' )
#' pairs <- rs_pairs(x$date, x$id)
#' x[c("date_prev", "price_prev")] <- x[c("date", "price")][pairs, ]
#' x
#' @export rs_pairs
rs_pairs <- function(period, product) {
n <- length(period)

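The return convention documented for `rs_pairs()` can be illustrated with a short base-R sketch. `prev_sale_index()` is a hypothetical name and this is not the package's implementation, although, per the note above, `order()` does the heavy lifting in both. The sketch does not try to reproduce `rs_pairs()`'s exact rule for ties; it simply relies on `order()` being stable.

```r
# Hypothetical helper illustrating the rs_pairs() return convention;
# it is not the package's implementation.
prev_sale_index <- function(period, product) {
  if (length(period) == 0L) return(integer(0))
  # Sort positions by product, then by period within product
  ord <- order(product, period)
  # In sorted order, the previous sale is the preceding position...
  prev <- c(NA_integer_, ord[-length(ord)])
  # ...except for the first sale of each product, which points to itself
  first <- !duplicated(product[ord])
  prev[first] <- ord[first]
  # Map back from sorted order to the original row order
  res <- integer(length(ord))
  res[ord] <- prev
  res
}

# Data from the rs_pairs() example
x <- data.frame(
  id = c(1, 1, 1, 3, 2, 2, 3, 3),
  date = c(1, 2, 3, 2, 1, 3, 4, 1),
  price = c(1, 3, 2, 3, 1, 1, 1, 2)
)
pairs <- prev_sale_index(x$date, x$id)
x[c("date_prev", "price_prev")] <- x[c("date", "price")][pairs, ]
x
```

On this data the sketch gives `c(1, 1, 2, 8, 5, 5, 4, 8)`: each product's earliest sale points to itself, and every later sale points to the row of the sale just before it.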
#---- Variance matrix ----
#' Stata's small-sample degrees of freedom correction
#' @noRd
sss <- function(n, k, g) {
g / (g - 1L) * (n - 1L) / (n - k)
}

#' Robust variance matrix for repeat-sales indexes
#'
#' Convenience function to compute a cluster-robust variance matrix for a
#' linear regression, with or without instruments, where clustering occurs
#' along one dimension. Useful for calculating a variance matrix when a
#' regression is calculated manually.
#'
#' This function calculates the standard robust variance matrix for a linear
#' regression, as in Manski (1988, section 8.1.2) or White (2001, Theorem 6.3);
#' that is, \eqn{(Z'X)^{-1} V (X'Z)^{-1}}{(Z'X)^-1 V (X'Z)^-1}. This
#' generalizes the variance … than twice.
#' … = 'group')` from the \pkg{plm} package.
#' @param u An \eqn{n \times 1}{n x 1} vector of residuals from a linear
#' regression.
#' @param Z An \eqn{n \times k}{n x k} matrix of instruments.
#' @param X An \eqn{n \times k}{n x k} matrix of covariates.
#' @param ids A factor of length \eqn{n}, or something that can be coerced into
#' one, that groups observations in `u`. By default each observation
#' belongs to its own group.
#' @param df An optional degrees of freedom correction. Default is Stata's
#' small sample degrees of freedom correction.
#' @return A \eqn{k \times k}{k x k} covariance matrix.
#' @references Manski, C. (1988). *Analog Estimation Methods in
#' Econometrics*. Chapman and Hall.
#'
#' Shiller, R. J. (1991). Arithmetic repeat sales price estimators.
#' *Journal of Housing Economics*, 1(1):110-126.
#'
#' White, H. (2001). *Asymptotic Theory for Econometricians* (revised
#' edition). Emerald Publishing.
#' @examples
#' # Make some groups in mtcars
#' mtcars$clust <- letters[1:4]
#' # Matrices for regression
#' x <- model.matrix(~ cyl + disp, mtcars)
#' y <- matrix(mtcars$mpg)
#' # Regression coefficients
#' b <- solve(crossprod(x), crossprod(x, y))
#' # Residuals
#' r <- y - x %*% b
#' # Robust variance matrix
#' vcov <- rs_var(r, x, ids = mtcars$clust)
#' \dontrun{
#' # Same as plm
#' library(plm)
#' mdl <- plm(mpg ~ cyl + disp, mtcars, model = "pooling", index = "clust")
#' vcov2 <- vcovHC(mdl, type = "sss", cluster = "group")
#' vcov - vcov2
#' }
#' @export rs_var
#' @importMethodsFrom Matrix solve crossprod tcrossprod
rs_var <- function(u, Z, X = Z, ids = seq_len(nrow(X)), df = NULL) {
ids <- as.factor(ids)
df <- if (is.null(df)) {
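The sandwich formula in the `rs_var()` documentation can be written out directly in base R. This is a minimal sketch, assuming that `V` is the usual sum over clusters of the outer products of the cluster-summed scores `Z_g' u_g`, scaled by the small-sample factor computed by `sss()`. `cluster_vcov()` is a hypothetical name and is not claimed to match `rs_var()` line for line.

```r
# Hypothetical sketch (not rsmatrix::rs_var): cluster-robust variance
# (Z'X)^-1 V (X'Z)^-1, where V sums (Z_g' u_g)(Z_g' u_g)' over clusters g,
# scaled by g / (g - 1) * (n - 1) / (n - k) as in sss()
cluster_vcov <- function(u, Z, X = Z, ids = seq_len(nrow(X))) {
  ids <- factor(ids)  # factor() also drops unused levels
  n <- nrow(X)
  k <- ncol(X)
  g <- nlevels(ids)
  # Cluster-summed scores: one row of Z_g' u_g per cluster (g x k)
  scores <- rowsum(Z * as.vector(u), ids)
  V <- crossprod(scores)
  bread <- solve(crossprod(Z, X))  # (Z'X)^-1; its transpose is (X'Z)^-1
  g / (g - 1) * (n - 1) / (n - k) * bread %*% V %*% t(bread)
}

# OLS with four clusters in mtcars, as in the rs_var() example
x <- model.matrix(~ cyl + disp, mtcars)
y <- mtcars$mpg
b <- solve(crossprod(x), crossprod(x, y))
r <- y - x %*% b
cluster_vcov(r, x, ids = rep(letters[1:4], length.out = nrow(x)))
```

With `ids` left at its default (each observation in its own cluster), the scale factor reduces to `n / (n - k)`, so the sketch returns the familiar HC1 heteroskedasticity-robust matrix.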
#' @keywords internal
A small package for calculating the matrices in Shiller (1991) that serve as the foundation for many repeat-sales price indexes. Builds on the 'rsi' package by Kirby-McGregor and Martin (2019).

## Installation

Get the stable release from CRAN with `install.packages("rsmatrix")`.
Get the development version from GitHub.

```{r, eval=FALSE}
```
Or from R-universe.

```{r, eval=FALSE}
repos = c("", "")
```

## Usage

Most repeat-sales price indexes used in practice require the matrices in Shiller (1991, sections I-II), e.g., S&P's Case-Shiller index, Teranet-National Bank's HPI, and formerly Statistics Canada's RPPI. The `rs_matrix()` function produces a function to easily construct these matrices. In most cases data need to be structured as sales pairs, which can be done with the `rs_pairs()` function.

```{r}
# Make some data
sales <- data.frame(
id = c(1, 1, 1, 2, 2),
date = c(1, 2, 3, 1, 3),
price = c(1, 3, 2, 1, 1)
)

# Turn into sales pairs
sales[c("date_prev", "price_prev")] <- sales[rs_pairs(sales$date, sales$id), c("date", "price")]

b <- with(matrices, solve(crossprod(Z, X), crossprod(Z, Y))[, 1])
(ars <- 100 / b)
```

## Contribution

The `McSpatial` package (formerly on CRAN) has some functionality for making repeat-sales indices. The functions in this package build on those in the `rsi` package in Kirby-McGregor and Martin (2019), which also gives a good background on the theory of repeat-sales indexes.

## References

ILO, IMF, OECD, UN, World Bank, Eurostat. (2013). *Handbook on Residential Property Prices Indices (RPPIs)*. Eurostat.

Kirby-McGregor, M., and Martin, S. (2019). An R package for calculating repeat-sale price indices. *Romanian Statistical Review*, 3:17-33.

Shiller, R. J. (1991). Arithmetic repeat sales price estimators. *Journal of Housing Economics*, 1(1):110-126.

