Using roxygen now
marberts committed Oct 14, 2023
1 parent a4389e7 commit c617d40
Showing 19 changed files with 644 additions and 258 deletions.
4 changes: 3 additions & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: rsmatrix
Title: Matrices for Repeat-Sales Price Indexes
Version: 0.2.7.9002
Version: 0.2.7.9003
Authors@R: c(
person(given = "Steve", family = "Martin", role = c("aut", "cre", "cph"), email = "[email protected]", comment = c(ORCID = "0000-0003-2544-9480"))
)
@@ -19,3 +19,5 @@ URL: https://marberts.github.io/rsmatrix, https://github.com/marberts/rsmatrix
BugReports: https://github.com/marberts/rsmatrix/issues
Config/testthat/edition: 3
VignetteBuilder: knitr
RoxygenNote: 7.2.3
Roxygen: list(markdown = TRUE)
10 changes: 7 additions & 3 deletions NAMESPACE
@@ -1,4 +1,8 @@
export(rs_matrix, rs_pairs, rs_var)
# Generated by roxygen2: do not edit by hand

importFrom(Matrix, sparseMatrix, rowSums)
importMethodsFrom(Matrix, solve, crossprod, tcrossprod)
export(rs_matrix)
export(rs_pairs)
export(rs_var)
importMethodsFrom(Matrix,crossprod)
importMethodsFrom(Matrix,solve)
importMethodsFrom(Matrix,tcrossprod)
111 changes: 101 additions & 10 deletions R/rs_matrix.R
@@ -1,9 +1,12 @@
#---- Helper functions (internal) ----
#' Test if inputs have the same length
#' @noRd
different_lengths <- function(...) {
res <- lengths(list(...))
any(res != res[1L])
}

#' Compute the Z matrix
#' @noRd
rs_z_ <- function(t2, t1, f = NULL, sparse = FALSE) {
# coerce t2 and t1 into characters prior to taking the union
# so that both dates and factors are treated the same
@@ -13,8 +16,10 @@ rs_z_ <- function(t2, t1, f = NULL, sparse = FALSE) {
t2 <- factor(t2, lev)
t1 <- factor(t1, lev)
if (any(unclass(t2) <= unclass(t1))) {
warning("all elements of 't2' should be greater than the corresponding ",
"elements in 't1'")
warning(
"all elements of 't2' should be greater than the corresponding ",
"elements in 't1'"
)
}

# make row names before interacting with f
@@ -44,9 +49,10 @@ rs_z_ <- function(t2, t1, f = NULL, sparse = FALSE) {
t2 <- t2[non_zero]
t1 <- t1[non_zero]
if (sparse) {
res <- sparseMatrix(rep.int(i, 2), c(t2, t1),
x = rep(c(1, -1), each = length(i)),
dims = dims)
res <- Matrix::sparseMatrix(rep.int(i, 2), c(t2, t1),
x = rep(c(1, -1), each = length(i)),
dims = dims
)
} else {
res <- rep.int(0, prod(dims))
res[(t2 - 1L) * dims[1L] + i] <- 1
@@ -62,9 +68,95 @@ rs_z_ <- function(t2, t1, f = NULL, sparse = FALSE) {
res
}

#' Compute X matrix
#' @noRd
rs_x_ <- function(z, p2, p1) (z > 0) * p2 - (z < 0) * p1

#---- All matrices ----
#' Shiller's repeat-sales matrices
#'
#' Create a function to compute the \eqn{Z}, \eqn{X}, \eqn{y}, and \eqn{Y}
#' matrices in Shiller (1991, sections I-II) from sales-pair data in order to
#' calculate a repeat-sales price index.
#'
#' The function returned by `rs_matrix()` computes a generalization of the
#' matrices in Shiller (1991, sections I-II) that are applicable to grouped
#' data. These are useful for calculating separate indexes for many, say,
#' cities without needing an explicit loop.
#'
#' The \eqn{Z}, \eqn{X}, and \eqn{Y} matrices are not well defined if either
#' `t1` or `t2` has missing values, and an error is thrown in this
#' case. Similarly, it should always be the case that `t2 > t1`; otherwise,
#' a warning is given.
#'
#' @param t2,t1 A pair of vectors giving the time period of the second and
#' first sale, respectively. Usually a vector of dates, but other values are
#' possible if they can be coerced to character vectors and sorted in
#' chronological order (i.e., with [`order()`]).
#' @param p2,p1 A pair of numeric vectors giving the price of the second and
#' first sale, respectively.
#' @param f An optional factor the same length as `t1` and `t2`, or a
#' vector to be turned into a factor, that is used to group sales.
#' @param sparse Should sparse matrices from the \pkg{Matrix} package be used
#' (faster for large datasets), or regular dense matrices (the default)?
#' @return A function that takes a single argument naming the desired matrix.
#' It returns one of two matrices (\eqn{Z} and \eqn{X}) or two vectors
#' (\eqn{y} and \eqn{Y}), either regular matrices if `sparse = FALSE`, or sparse
#' matrices of class `dgCMatrix` if `sparse = TRUE`.
#' @seealso [rs_pairs()] for turning sales data into sales pairs.
#' @references Bailey, M. J., Muth, R. F., and Nourse, H. O. (1963). A
#' regression method for real estate price index construction.
#' *Journal of the American Statistical Association*, 58(304):933-942.
#'
#' Shiller, R. J. (1991). Arithmetic repeat sales price estimators.
#' *Journal of Housing Economics*, 1(1):110-126.
#' @examples
#' # Make some data
#' x <- data.frame(
#' date = c(3, 2, 3, 2, 3, 3),
#' date_prev = c(1, 1, 2, 1, 2, 1),
#' price = 6:1,
#' price_prev = 1
#' )
#'
#' # Calculate matrices
#' mat <- with(x, rs_matrix(date, date_prev, price, price_prev))
#' Z <- mat("Z") # Z matrix
#' X <- mat("X") # X matrix
#' y <- mat("y") # y vector
#' Y <- mat("Y") # Y vector
#'
#' # Calculate the GRS index in Bailey, Muth, and Nourse (1963)
#' b <- solve(crossprod(Z), crossprod(Z, y))[, 1]
#' # or b <- qr.coef(qr(Z), y)
#' (grs <- exp(b) * 100)
#'
#' # Standard errors
#' vcov <- rs_var(y - Z %*% b, Z)
#' sqrt(diag(vcov)) * grs # delta method
#'
#' # Calculate the ARS index in Shiller (1991)
#' b <- solve(crossprod(Z, X), crossprod(Z, Y))[, 1]
#' # or b <- qr.coef(qr(crossprod(Z, X)), crossprod(Z, Y))
#' (ars <- 100 / b)
#'
#' # Standard errors
#' vcov <- rs_var(Y - X %*% b, Z, X)
#' sqrt(diag(vcov)) * ars^2 / 100 # delta method
#'
#' # Works with grouped data
#' x <- data.frame(
#' date = c(3, 2, 3, 2),
#' date_prev = c(2, 1, 2, 1),
#' price = 4:1,
#' price_prev = 1,
#' group = c("a", "a", "b", "b")
#' )
#'
#' mat <- with(x, rs_matrix(date, date_prev, price, price_prev, group))
#' b <- solve(crossprod(mat("Z"), mat("X")), crossprod(mat("Z"), mat("Y")))[, 1]
#' 100 / b
#'
#' @export rs_matrix
rs_matrix <- function(t2, t1, p2, p1, f = NULL, sparse = FALSE) {
if (is.null(f)) {
if (different_lengths(t2, t1, p2, p1)) {
@@ -89,14 +181,13 @@ rs_matrix <- function(t2, t1, p2, p1, f = NULL, sparse = FALSE) {
n <- max(1L, nlevels(f)) * (ncol(z) > 0)
# return value
res <- function(matrix = c("Z", "X", "y", "Y")) {
switch(
match.arg(matrix),
switch(match.arg(matrix),
Z = z[, -seq_len(n), drop = FALSE],
X = rs_x_(z[, -seq_len(n), drop = FALSE], p2, p1),
y = structure(log(p2 / p1), names = rownames(z)),
# rowSums() gets the single value in the base period
# for each group
Y = -rowSums(rs_x_(z[, seq_len(n), drop = FALSE], p2, p1))
Y = -Matrix::rowSums(rs_x_(z[, seq_len(n), drop = FALSE], p2, p1))
)
}
# clean up enclosing environment
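
As a rough illustration of what the `rs_z_()` and `rs_x_()` helpers in this file compute, the sketch below builds a dense Z matrix by hand for ungrouped data. Each row is a sales pair, with 1 in the column for the period of the second sale and -1 in the column for the period of the first. The name `z_dense_sketch()` is hypothetical and not part of the package; the sketch assumes `t2` and `t1` differ in every pair and skips grouping, sparse output, and the input checks done by the real function.

```r
# Hypothetical helper for illustration only; not part of rsmatrix.
# Assumes no grouping factor and t2 != t1 for every pair.
z_dense_sketch <- function(t2, t1) {
  # Shared, sorted set of periods; characters so dates and factors behave alike
  lev <- sort(unique(c(as.character(t2), as.character(t1))))
  j2 <- match(as.character(t2), lev) # column of the second sale
  j1 <- match(as.character(t1), lev) # column of the first sale
  i <- seq_along(j2)
  z <- matrix(
    0,
    nrow = length(i), ncol = length(lev),
    dimnames = list(as.character(i), lev)
  )
  z[cbind(i, j2)] <- 1
  z[cbind(i, j1)] <- -1
  z
}

# Same data as the rs_matrix() example
t2 <- c(3, 2, 3, 2, 3, 3)
t1 <- c(1, 1, 2, 1, 2, 1)
p2 <- 6:1
p1 <- rep(1, 6)

z <- z_dense_sketch(t2, t1)

# X swaps the +1/-1 entries for the corresponding prices,
# using the same expression as the one-line rs_x_() helper
x <- (z > 0) * p2 - (z < 0) * p1

# Dropping the base-period column gives matrices shaped like
# the Z and X used in the regressions
z[, -1, drop = FALSE]
x[, -1, drop = FALSE]
```

Every row of `z` sums to zero, and every row of `x` sums to `p2 - p1`, which is a quick sanity check on the construction.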
35 changes: 35 additions & 0 deletions R/rs_pairs.R
@@ -1,3 +1,38 @@
#' Sales pairs
#'
#' Turn repeat-sales data into sales pairs that are suitable for making
#' repeat-sales matrices.
#'
#'
#' @param period A vector that gives the time period for each sale. Usually a
#' date vector, or a factor with the levels in chronological order, but other
#' values are possible if they can be sorted in chronological order (i.e., with
#' [order()]).
#' @param product A vector that gives the product identifier for each sale.
#' Usually a factor or vector of integer codes for each product.
#' @return A numeric vector of indices giving the position of the previous sale
#' for each `product`, with the convention that the previous sale for the
#' first sale is itself. The first position is returned in the case of ties.
#' @note [`order()`] is the workhorse of `rs_pairs()`,
#' so performance can be sensitive to the types of `period` and
#' `product`, and can be slow for large character vectors.
#' @seealso [rs_matrix()] for using sales pairs to make a
#' repeat-sales index.
#' @examples
#' # Make sales pairs
#' x <- data.frame(
#' id = c(1, 1, 1, 3, 2, 2, 3, 3),
#' date = c(1, 2, 3, 2, 1, 3, 4, 1),
#' price = c(1, 3, 2, 3, 1, 1, 1, 2)
#' )
#'
#' pairs <- rs_pairs(x$date, x$id)
#'
#' x[c("date_prev", "price_prev")] <- x[c("date", "price")][pairs, ]
#'
#' x
#'
#' @export rs_pairs
rs_pairs <- function(period, product) {
n <- length(period)

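
The documentation above says `order()` is the workhorse of `rs_pairs()`. One way such a lookup could be written is sketched below. `prev_sale_sketch()` is a hypothetical re-implementation for illustration, not the package's code; it assumes there are no missing values, and its handling of tied periods may differ from `rs_pairs()`.

```r
# Hypothetical helper for illustration only; not the code in rs_pairs().
# Assumes period and product have the same length and no missing values.
prev_sale_sketch <- function(period, product) {
  n <- length(period)
  if (n == 0L) {
    return(integer(0L))
  }
  # Sort sales by product, then by period within each product
  ord <- order(product, period)
  # In sorted order, the previous sale is the preceding row ...
  prev <- c(ord[1L], ord[-n])
  # ... unless that row belongs to a different product, in which case
  # this is a product's first sale and it points to itself
  sorted_product <- product[ord]
  first <- c(TRUE, sorted_product[-1L] != sorted_product[-n])
  prev[first] <- ord[first]
  # Map the result back to the original row order
  res <- integer(n)
  res[ord] <- prev
  res
}

# Same data as the rs_pairs() example
id <- c(1, 1, 1, 3, 2, 2, 3, 3)
date <- c(1, 2, 3, 2, 1, 3, 4, 1)

prev_sale_sketch(date, id)
# → 1 1 2 8 5 5 4 8
```

Row 4 (product 3, period 2) points to row 8 (product 3, period 1), and each product's first sale points to itself, matching the convention described in the return value.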
66 changes: 64 additions & 2 deletions R/rs_var.R
@@ -1,9 +1,71 @@
#---- Helper functions (internal) ----
#' Stata's degrees-of-freedom correction
#' @noRd
sss <- function(n, k, g) {
g / (g - 1L) * (n - 1L) / (n - k)
}

#---- Variance matrix ----
#' Robust variance matrix for repeat-sales indexes
#'
#' Convenience function to compute a cluster-robust variance matrix for a
#' linear regression, with or without instruments, where clustering occurs
#' along one dimension. Useful for calculating a variance matrix when a
#' regression is calculated manually.
#'
#' This function calculates the standard robust variance matrix for a linear
#' regression, as in Manski (1988, section 8.1.2) or White (2001, Theorem 6.3);
#' that is, \eqn{(Z'X)^{-1} V (X'Z)^{-1}}{(Z'X)^-1 V (X'Z)^-1}. It is useful
#' when a regression is calculated by hand. This generalizes the variance
#' matrix proposed by Shiller (1991, section II) when a property sells more
#' than twice.
#'
#' This function gives the same result as `vcovHC(x, type = 'sss', cluster
#' = 'group')` from the \pkg{plm} package.
#'
#' @param u An \eqn{n \times 1}{n x 1} vector of residuals from a linear
#' regression.
#' @param Z An \eqn{n \times k}{n x k} matrix of instruments.
#' @param X An \eqn{n \times k}{n x k} matrix of covariates.
#' @param ids A factor of length \eqn{n}, or something that can be coerced into
#' one, that groups observations in `u`. By default each observation
#' belongs to its own group.
#' @param df An optional degrees-of-freedom correction. The default is Stata's
#' small-sample degrees-of-freedom correction.
#' @return A \eqn{k \times k}{k x k} covariance matrix.
#' @references Manski, C. (1988). *Analog Estimation Methods in
#' Econometrics*. Chapman and Hall.
#'
#' Shiller, R. J. (1991). Arithmetic repeat sales price estimators.
#' *Journal of Housing Economics*, 1(1):110-126.
#'
#' White, H. (2001). *Asymptotic Theory for Econometricians* (revised
#' edition). Emerald Publishing.
#' @examples
#' # Make some groups in mtcars
#' mtcars$clust <- letters[1:4]
#'
#' # Matrices for regression
#' x <- model.matrix(~ cyl + disp, mtcars)
#' y <- matrix(mtcars$mpg)
#'
#' # Regression coefficients
#' b <- solve(crossprod(x), crossprod(x, y))
#'
#' # Residuals
#' r <- y - x %*% b
#'
#' # Robust variance matrix
#' vcov <- rs_var(r, x, ids = mtcars$clust)
#'
#' \dontrun{
#' # Same as plm
#' library(plm)
#' mdl <- plm(mpg ~ cyl + disp, mtcars, model = "pooling", index = "clust")
#' vcov2 <- vcovHC(mdl, type = "sss", cluster = "group")
#' vcov - vcov2
#' }
#'
#' @export rs_var
#' @importMethodsFrom Matrix solve crossprod tcrossprod
rs_var <- function(u, Z, X = Z, ids = seq_len(nrow(X)), df = NULL) {
ids <- as.factor(ids)
df <- if (is.null(df)) {
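
The sandwich formula in the documentation above, \eqn{(Z'X)^{-1} V (X'Z)^{-1}}, can be sketched in a few lines of base R. `cluster_vcov_sketch()` is a hypothetical name used for illustration and is not the body of `rs_var()`; it assumes dense matrices and takes \eqn{V} to be the sum over clusters of \eqn{(Z_g' u_g)(Z_g' u_g)'}, scaled by the correction in the `sss()` helper.

```r
# Hypothetical helper for illustration only; not the code in rs_var().
# Assumes dense matrices and at least two clusters.
cluster_vcov_sketch <- function(u, Z, X = Z, ids = seq_len(nrow(X))) {
  ids <- as.factor(ids)
  n <- nrow(X)
  k <- ncol(X)
  g <- nlevels(ids)
  # Scale row i of Z by residual i, then sum rows within each cluster;
  # row g of the result is (Z_g' u_g)'
  scores <- rowsum(Z * as.vector(u), ids)
  # The "meat": sum over clusters of (Z_g' u_g)(Z_g' u_g)'
  V <- crossprod(scores)
  # The "bread": (Z'X)^-1
  B <- solve(crossprod(Z, X))
  # Same correction as sss(n, k, g) in the diff
  df <- g / (g - 1) * (n - 1) / (n - k)
  df * B %*% V %*% t(B)
}

# Regression by hand on mtcars, as in the rs_var() example
x <- model.matrix(~ cyl + disp, mtcars)
y <- mtcars$mpg
b <- solve(crossprod(x), crossprod(x, y))
r <- y - x %*% b
clust <- rep_len(letters[1:4], nrow(mtcars))

v <- cluster_vcov_sketch(r, x, ids = clust)
sqrt(diag(v)) # cluster-robust standard errors
```

When every observation is its own cluster, the correction reduces to `n / (n - k)`, so the sketch collapses to the usual heteroskedasticity-robust estimator with a degrees-of-freedom adjustment.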
3 changes: 3 additions & 0 deletions R/rsmatrix-package.R
@@ -0,0 +1,3 @@
#' @keywords internal
"_PACKAGE"
NULL
32 changes: 26 additions & 6 deletions README.Rmd
@@ -16,7 +16,7 @@ knitr::opts_chunk$set(echo = TRUE)
[![R-CMD-check](https://github.com/marberts/rsmatrix/workflows/R-CMD-check/badge.svg)](https://github.com/marberts/rsmatrix/actions)
[![codecov](https://codecov.io/gh/marberts/rsmatrix/branch/master/graph/badge.svg)](https://app.codecov.io/gh/marberts/rsmatrix)

A small package for calculating the matrices in Shiller (1991) that serve as the foundation for many repeat-sales price indexes. Builds on the 'rsi' package by Kirby-McGregor and Martin (2019).
A small package for calculating the matrices in Shiller (1991) that serve as the foundation for many repeat-sales price indexes.

## Installation

@@ -27,18 +27,31 @@ install.package("rsmatrix")
Get the development version from GitHub.

```{r, eval=FALSE}
devtools::install_github("marberts/rsmatrix")
pak::pkg_install("marberts/rsmatrix")
```

Or from R-universe.

```{r, eval=FALSE}
install.packages(
"rsmatrix",
repos = c("https://marberts.r-universe.dev", "https://cloud.r-project.org")
)
```

## Usage

Most repeat-sales price indexes used in practice require the matrices in Shiller (1991, sections I-II), e.g., S&P's Case-Shiller index, Teranet-National Bank's HPI, and formerly Statistics Canada's RPPI. The `rs_matrix()` function produces a function to easily construct these matrices. In most cases data need to be structured as sales pairs, which can be done with the `rs_pairs()` function.

```{r}
library(rsmatrix)
# Make some data
sales <- data.frame(id = c(1, 1, 1, 2, 2),
date = c(1, 2, 3, 1, 3),
price = c(1, 3, 2, 1, 1))
sales <- data.frame(
id = c(1, 1, 1, 2, 2),
date = c(1, 2, 3, 1, 3),
price = c(1, 3, 2, 1, 1)
)
# Turn into sales pairs
sales[c("date_prev", "price_prev")] <- sales[rs_pairs(sales$date, sales$id), c("date", "price")]
@@ -62,8 +75,15 @@ b <- with(matrices, solve(crossprod(Z, X), crossprod(Z, Y))[, 1])
(ars <- 100 / b)
```

## Contribution

The `McSpatial` package (formerly on CRAN) has some functionality for making repeat-sales indexes. The functions in this package build on those in the `rsi` package in Kirby-McGregor and Martin (2019), which also gives a good background on the theory of repeat-sales indexes.

## References

ILO, IMF, OECD, UN, World Bank, Eurostat. (2013). *Handbook on Residential Property Prices Indices (RPPIs)*. Eurostat.

Kirby-McGregor, M., and Martin, S. (2019). An R package for calculating repeat-sale price indices. *Romanian Statistical Review*, 3:17-33.

Shiller, R. J. (1991). Arithmetic repeat sales price estimators. *Journal of Housing Economics*, 1(1):110-126.
Shiller, R. J. (1991). Arithmetic repeat sales price estimators. *Journal of Housing Economics*, 1(1):110-126.