diff --git a/docs/index.html b/docs/index.html
index 297aa32..5adad27 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -203,8 +203,12 @@

Tablemath

+

Math and stats modelling with table ergonomics

1 Preface

+

This project is an initial attempt to create a Clojure library for math and statistics which is friendly to tech.ml.dataset and Tablecloth datasets and uses the functionality of Fastmath. It is also intended to compose well with Tableplot layered plotting. It is highly inspired by R and its package.

+

In a way, it is intended to be a user-friendly compatibility layer across these libraries.

+

Possibly, after the details clarify, it will be merged into one of the other Scicloj libraries.

Tablemath is a Clojure library for math and statistical modeling with table ergonomics, inspired by R.

It composes Tablecloth datasets with Fastmath modeling.

diff --git a/docs/search.json b/docs/search.json index 2d61428..2912156 100644 --- a/docs/search.json +++ b/docs/search.json @@ -4,7 +4,7 @@ "href": "index.html", "title": "Tablemath", "section": "", - "text": "1 Preface\nTablemath is a Clojure library for math and statistical modeling with table ergonomics, inspired by R.\nIt composes Tablecloth datasets with Fastmath modeling.", + "text": "1 Preface\nMath and stats modelling with table ergonomics\nThis project is an initial attempt to create a Clojure library for math and statistics which is friendly to tech.ml.dataset and Tablecloth datasets and uses the functionality of Fastmath. It is also intended to compose well with Tableplot layered plotting. It is highly inspired by R and its package.\nIn a way, it is intended to be a user-friendly compatiblity layer across these libraries.\nPossibly, after the details clarify, it will be merged into one of the other Scicloj libraries.\nTablemath is a Clojure library for math and statistical modeling with table ergonomics, inspired by R.\nIt composes Tablecloth datasets with Fastmath modeling.", "crumbs": [ "1  Preface" ] @@ -54,7 +54,7 @@ "href": "tablemath_book.reference.html#reference", "title": "2  API reference", "section": "Reference", - "text": "Reference\n\nwith\n[m expr]\nEvaluate expression expr in the context of destructuring all the keys of map m.\n\nExamples\n\n(tm/with {:x 3 :y 9}\n '(+ x y))\n\n\n12\n\n\n(tm/with (tc/dataset {:x (range 4)\n :y 9})\n '(tcc/+ x y))\n\n\n#tech.v3.dataset.column<int64>[4]\nnull\n[9, 10, 11, 12]\n\n\n\n\ncolumns-with\n[dataset specs]\nCompute a sequence of named columns by a given sequence of specs in the context of a given dataset.\nEach spec is one of the following:\n\n\na keyword or string - in that case, we just take the corresponding column of the original dataset.\n\n\na vector of two elements [nam expr], where the first is a string or a keyword. 
In that case, nam is interpreted as a name or a name-prefix for the resulting columns, and expr is handled as an expression as in (3).\n\n\nany other Clojure form - in that case, we treat it as an expression, and evaluate it while destructuring the column names of dataset as well as all the columns created by previous specs; the evaluation is expected to return one of the following:\n\n\na column (or the data to create a column (e.g., a vector of numbers))\na sequential of columns\na map from column names to columns\n\n\nIn any case, the result of the spec is turned into a sequence of named columns, which is conctenated to the columns from the previous specs. Some default naming mechanisms are invoked if column names are missing.\nEventually, the sequence of all resulting columns is returned.\n\nExamples\nNote the naming of the resulting columns, and note they can sequentially depend on each other.\n\n(tm/columns-with (tc/dataset {\"w\" [:A :B :C]\n :x (range 3)\n :y (reverse (range 3))})\n [\"w\"\n :x\n '(tcc/+ x y)\n [:z '(tcc/+ x y)]\n [:z1000 '(tcc/* z 1000)]\n '((juxt tcc/+ tcc/*) x y)\n [:p '((juxt tcc/+ tcc/*) x y)]\n '{:a (tcc/+ x y)\n :b (tcc/* x y)}\n [:p '{:a (tcc/+ x y)\n :b (tcc/* x y)}]\n '[(tcc/column (tcc/+ x y) {:name :c})\n (tcc/column (tcc/* x y) {:name :d})]\n [:p '[(tcc/column (tcc/+ x y) {:name :c})\n (tcc/column (tcc/* x y) {:name :d})]]])\n\n\n(#tech.v3.dataset.column<keyword>[3]\nw\n[:A, :B, :C] #tech.v3.dataset.column<int64>[3]\n:x\n[0, 1, 2] #tech.v3.dataset.column<int64>[3]\n(tcc/+ x y)\n[2, 2, 2] #tech.v3.dataset.column<int64>[3]\n:z\n[2, 2, 2] #tech.v3.dataset.column<int64>[3]\n:z1000\n[2000, 2000, 2000] #tech.v3.dataset.column<int64>[3]\n((juxt tcc/+ tcc/*) x y)_0\n[2, 2, 2] #tech.v3.dataset.column<int64>[3]\n((juxt tcc/+ tcc/*) x y)_1\n[0, 1, 0] #tech.v3.dataset.column<int64>[3]\n:p_0\n[2, 2, 2] #tech.v3.dataset.column<int64>[3]\n:p_1\n[0, 1, 0] #tech.v3.dataset.column<int64>[3]\n:a\n[2, 2, 2] 
#tech.v3.dataset.column<int64>[3]\n:b\n[0, 1, 0] #tech.v3.dataset.column<int64>[3]\n:pa\n[2, 2, 2] #tech.v3.dataset.column<int64>[3]\n:pb\n[0, 1, 0] #tech.v3.dataset.column<int64>[3]\n:c\n[2, 2, 2] #tech.v3.dataset.column<int64>[3]\n:d\n[0, 1, 0] #tech.v3.dataset.column<int64>[3]\n:pc\n[2, 2, 2] #tech.v3.dataset.column<int64>[3]\n:pd\n[0, 1, 0])\n\n\n\n\ndesign\n[dataset target-specs feature-specs]\nGiven a dataset and sequences target-specs, feature-specs, generate a new dataset from the columns generated by columns-with from these two sequences. The columns from target-specs will be marked as targets for modelling (e.g., regression, classification).\n(Inspired by metamorph.ml.design-matrix but adapted for columnwise computation.)\n\nExamples\n\n(tm/design (tc/dataset {\"w\" [:A :B :C]\n :x (range 3)\n :y (reverse (range 3))})\n [:y]\n [\"w\"\n :x\n '(tcc/+ x y)\n [:z '(tcc/+ x y)]\n [:z1000 '(tcc/* z 1000)]\n '((juxt tcc/+ tcc/*) x y)\n [:p '((juxt tcc/+ tcc/*) x y)]\n '{:a (tcc/+ x y)\n :b (tcc/* x y)}\n [:p '{:a (tcc/+ x y)\n :b (tcc/* x y)}]\n '[(tcc/column (tcc/+ x y)\n {:name :c})\n (tcc/column (tcc/* x y)\n {:name :d})]\n [:p '[(tcc/column (tcc/+ x y)\n {:name :c})\n (tcc/column (tcc/* x y)\n {:name :d})]]])\n\n\n_unnamed [3 18]:\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n:y\nw\n:x\n(tcc/+ x y)\n:z\n:z1000\n((juxt tcc/+ tcc/*) x y)_0\n((juxt tcc/+ tcc/*) x y)_1\n:p_0\n:p_1\n:a\n:b\n:pa\n:pb\n:c\n:d\n:pc\n:pd\n\n\n\n\n2\n:A\n0\n2\n2\n2000\n2\n0\n2\n0\n2\n0\n2\n0\n2\n0\n2\n0\n\n\n1\n:B\n1\n2\n2\n2000\n2\n1\n2\n1\n2\n1\n2\n1\n2\n1\n2\n1\n\n\n0\n:C\n2\n2\n2\n2000\n2\n0\n2\n0\n2\n0\n2\n0\n2\n0\n2\n0\n\n\n\n\n\n\n\npolynomial\n[column degree]\nGiven a column and an integer degree, return a vector of columns with all its powers up to that degree, named appropriately.\n\nExamples\n\n(-> [1 2 3]\n (tcc/column {:name :x})\n (tm/polynomial 4))\n\n\n[#tech.v3.dataset.column<int64>[3]\n:x\n[1, 2, 3] #tech.v3.dataset.column<int64>[3]\n:x2\n[1, 4, 9] 
#tech.v3.dataset.column<int64>[3]\n:x3\n[1, 8, 27] #tech.v3.dataset.column<int64>[3]\n:x4\n[1, 16, 81]]\n\n\n\n\none-hot\n[column]\n[column {:keys [values include-last], :or {values (distinct column), include-last false}}]\nGiven a column, create a vector of integer binary columns, each encoding the presence of absence of one of its values.\nE.g., if the column name is :x, and one of the values is :A, then a resulting binary column will have 1 in all the rows where column has :A.\nThe sequence of values to generate the binary columns is defined as follows: either the value provided for the :values key if present, or the distinct values in column in their order of appearance. If the value of the option key :include-last is false (which is the default), then the last value is ommitted. This is handy for avoiding multicollinearity in linear regression.\nSupported options: - :values - the values to encode as columns - default nil - :include-last - should the last value be included - default false\n\nExamples\n\n(tm/one-hot (tcc/column [:B :A :A :B :B :C]\n {:name :x}))\n\n\n[#tech.v3.dataset.column<int64>[6]\n:x=:B\n[1, 0, 0, 1, 1, 0] #tech.v3.dataset.column<int64>[6]\n:x=:A\n[0, 1, 1, 0, 0, 0]]\n\n\n(tm/one-hot (tcc/column [:B :A :A :B :B :C]\n {:name :x})\n {:values [:A :B :C]})\n\n\n[#tech.v3.dataset.column<int64>[6]\n:x=:A\n[0, 1, 1, 0, 0, 0] #tech.v3.dataset.column<int64>[6]\n:x=:B\n[1, 0, 0, 1, 1, 0]]\n\n\n(tm/one-hot (tcc/column [:B :A :A :B :B :C]\n {:name :x})\n {:values [:A :B :C]\n :include-last true})\n\n\n[#tech.v3.dataset.column<int64>[6]\n:x=:A\n[0, 1, 1, 0, 0, 0] #tech.v3.dataset.column<int64>[6]\n:x=:B\n[1, 0, 0, 1, 1, 0] #tech.v3.dataset.column<int64>[6]\n:x=:C\n[0, 0, 0, 0, 0, 1]]\n\n\n\n\nlm\n[dataset]\n[dataset options]\nCompute a linear regression model for dataset. The first column marked as target is the target. All the columns unmarked as target are the features. 
The resulting model is of type fastmath.ml.regression.LMData, a generated by Fastmath. It can be summarized by summary.\nSee fastmath.ml.regression.lm for options.\n\nExamples:\n\nLinear relationship\n\n(def linear-toydata\n (-> {:x (range 9)}\n tc/dataset\n (tc/map-columns :y\n [:x]\n (fn [x]\n (+ (* 2 x)\n -3\n (* 3 (rand)))))))\n\n\n(-> linear-toydata\n plotly/layer-point)\n\n\nNote how the coefficients fit the way we generated the data:\n\n(-> linear-toydata\n (tm/design [:y]\n [:x])\n tm/lm\n tm/summary)\n\n\nResiduals:\n\n| :min | :q1 | :median | :q3 | :max |\n|-----------+-----------+----------+----------+----------|\n| -1.368905 | -0.861143 | 0.166645 | 0.756204 | 1.300869 |\n\nCoefficients:\n\n| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval |\n|-----------+-----------+----------+-----------+----------+-----------------------|\n| Intercept | -1.538249 | 0.612071 | -2.513188 | 0.040209 | [-2.985566 -0.090932] |\n| :x | 2.0943 | 0.128561 | 16.290374 | 1.0E-6 | [1.790302 2.398297] |\n\nF-statistic: 265.37627410003336 on degrees of freedom: {:residual 7, :model 1, :intercept 1}\np-value: 7.999644764389302E-7\n\nR2: 0.9743002578946022\nAdjusted R2: 0.9706288661652597\nResidual standard error: 0.9958258484709785 on 7 degrees of freedom\nAIC: 29.203771766022996\n\n\n\n\nCubic relationship\n\n(def cubic-toydata\n (-> {:x (range 9)}\n tc/dataset\n (tc/map-columns :y\n [:x]\n (fn [x]\n (+ 50\n (* 4 x)\n (* -9 x x)\n (* x x x)\n (* 10 (rand)))))))\n\n\n(-> cubic-toydata\n plotly/layer-point)\n\n\nNote how the coefficients fit the way we generated the data:\n\n(-> cubic-toydata\n (tm/design [:y]\n ['(tm/polynomial x 3)])\n tm/lm\n tm/summary)\n\n\nResiduals:\n\n| :min | :q1 | :median | :q3 | :max |\n|----------+-----------+-----------+----------+----------|\n| -2.44235 | -0.989378 | -0.005178 | 1.338093 | 1.814442 |\n\nCoefficients:\n\n| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval 
|\n|-----------+-----------+----------+------------+----------+------------------------|\n| Intercept | 55.293656 | 1.625722 | 34.011757 | 0.0 | [51.114605 59.472707] |\n| :x | 4.165365 | 1.876318 | 2.219968 | 0.077131 | [-0.657864 8.988593] |\n| :x2 | -9.064755 | 0.566509 | -16.001083 | 1.7E-5 | [-10.521012 -7.608498] |\n| :x3 | 1.00047 | 0.046468 | 21.53023 | 4.0E-6 | [0.88102 1.119921] |\n\nF-statistic: 915.1047291904271 on degrees of freedom: {:residual 5, :model 3, :intercept 1}\np-value: 2.86891162715186E-7\n\nR2: 0.9981820258854079\nAdjusted R2: 0.9970912414166527\nResidual standard error: 1.754504054818821 on 5 degrees of freedom\nAIC: 40.37016570199448\n\n\n\n\nCategorical relationship\n\n(def days-of-week\n [:Mon :Tue :Wed :Thu :Fri :Sat :Sun])\n\n\n(def categorical-toydata\n (-> {:t (range 21)\n :day-of-week (->> days-of-week\n (repeat 3)\n (apply concat)\n (drop 3))}\n tc/dataset\n (tc/map-columns :traffic\n [:day-of-week]\n (fn [dow]\n (+ (case dow\n :Sat 50\n :Sun 50\n 60)\n (* 5 (rand)))))))\n\n\n(-> categorical-toydata\n (plotly/layer-point {:=x :t\n :=y :traffic\n :=color :day-of-week\n :=mark-size 10})\n (plotly/layer-line {:=x :t\n :=y :traffic}))\n\n\nA model with all days except for one, dropping one category to avoid multicolinearity (note we begin with Thursday due to the order of appearance):\n\n(-> categorical-toydata\n (tm/design [:traffic]\n ['(tm/one-hot day-of-week)])\n tm/lm\n tm/summary)\n\n\nResiduals:\n\n| :min | :q1 | :median | :q3 | :max |\n|-----------+-----------+-----------+----------+----------|\n| -1.961295 | -0.815524 | -0.292471 | 1.061911 | 2.390817 |\n\nCoefficients:\n\n| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval |\n|-------------------+------------+----------+-----------+----------+------------------------|\n| Intercept | 61.865991 | 0.956053 | 64.70982 | 0.0 | [59.800565 63.931417] |\n| :day-of-week=:Thu | 0.892344 | 1.352063 | 0.659987 | 0.520785 | [-2.02861 3.813297] |\n| 
:day-of-week=:Fri | 0.566153 | 1.352063 | 0.418733 | 0.682247 | [-2.354801 3.487106] |\n| :day-of-week=:Sat | -10.656278 | 1.352063 | -7.881498 | 3.0E-6 | [-13.577232 -7.735324] |\n| :day-of-week=:Sun | -9.426566 | 1.352063 | -6.971989 | 1.0E-5 | [-12.347519 -6.505612] |\n| :day-of-week=:Mon | 0.075719 | 1.511652 | 0.05009 | 0.960812 | [-3.190007 3.341444] |\n| :day-of-week=:Tue | -0.344939 | 1.511652 | -0.228187 | 0.823051 | [-3.610664 2.920787] |\n| :day-of-week=:Wed | 0.705859 | 1.511652 | 0.466946 | 0.648268 | [-2.559866 3.971585] |\n\nF-statistic: 24.37118021053251 on degrees of freedom: {:residual 13, :model 7, :intercept 1}\np-value: 1.6704908561981924E-6\n\nR2: 0.9291932293059307\nAdjusted R2: 0.8910665066245088\nResidual standard error: 1.6559316650031601 on 13 degrees of freedom\nAIC: 88.70766288980772\n\n\nA model with all days except for one, dropping one category to avoid multicolinearity, and speciftying the order of encoded values:\n\n(-> categorical-toydata\n (tm/design [:traffic]\n ['(tm/one-hot day-of-week\n {:values days-of-week})])\n tm/lm\n tm/summary)\n\n\nResiduals:\n\n| :min | :q1 | :median | :q3 | :max |\n|-----------+-----------+-----------+----------+----------|\n| -6.674578 | -1.203647 | -0.191458 | 1.387996 | 7.048119 |\n\nCoefficients:\n\n| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval |\n|-------------------+-----------+----------+-----------+----------+------------------------|\n| Intercept | 57.152709 | 1.418155 | 40.300762 | 0.0 | [54.11107 60.194348] |\n| :day-of-week=:Mon | 4.789001 | 2.836309 | 1.688462 | 0.113462 | [-1.294277 10.872279] |\n| :day-of-week=:Tue | 4.368344 | 2.836309 | 1.540151 | 0.145817 | [-1.714934 10.451622] |\n| :day-of-week=:Wed | 5.419142 | 2.836309 | 1.910632 | 0.07675 | [-0.664136 11.50242] |\n| :day-of-week=:Thu | 5.605626 | 2.456316 | 2.282128 | 0.038636 | [0.337353 10.8739] |\n| :day-of-week=:Fri | 5.279436 | 2.456316 | 2.149331 | 0.049579 | [0.011162 10.547709] |\n| 
:day-of-week=:Sat | -5.942995 | 2.456316 | -2.419475 | 0.029738 | [-11.211268 -0.674722] |\n\nF-statistic: 4.6201715895605755 on degrees of freedom: {:residual 14, :model 6, :intercept 1}\np-value: 0.008629460026389202\n\nR2: 0.6644378109734268\nAdjusted R2: 0.5206254442477526\nResidual standard error: 3.473754998366949 on 14 degrees of freedom\nAIC: 119.38056904506934\n\n\nA model with all days and no intercept, dropping the intercept to avoid multicolinearity and have an easier interpretation of the coefficients:\nNote how the coefficients fit the way we generated the data:\n\n(-> categorical-toydata\n (tm/design [:traffic]\n ['(tm/one-hot day-of-week\n {:values days-of-week\n :include-last true})])\n (tm/lm {:intercept? false})\n tm/summary)\n\n\nResiduals:\n\n| :min | :q1 | :median | :q3 | :max |\n|-----------+-----------+-----------+---------+-----------|\n| -1.961295 | -0.736394 | -0.081753 | 1.94849 | 64.200827 |\n\nCoefficients:\n\n| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval |\n|-------------------+-----------+-----------+----------+----------+------------------------|\n| :day-of-week=:Mon | 61.94171 | 20.281809 | 3.054052 | 0.008581 | [18.441555 105.441865] |\n| :day-of-week=:Tue | 61.521053 | 20.281809 | 3.033312 | 0.008941 | [18.020898 105.021207] |\n| :day-of-week=:Wed | 62.571851 | 20.281809 | 3.085122 | 0.008067 | [19.071696 106.072005] |\n| :day-of-week=:Thu | 62.758335 | 16.560028 | 3.789748 | 0.001991 | [27.240608 98.276063] |\n| :day-of-week=:Fri | 62.432144 | 16.560028 | 3.770051 | 0.00207 | [26.914417 97.949872] |\n| :day-of-week=:Sat | 51.209713 | 16.560028 | 3.092369 | 0.007952 | [15.691986 86.727441] |\n| :day-of-week=:Sun | 52.439426 | 16.560028 | 3.166627 | 0.006861 | [16.921698 87.957153] |\n\nF-statistic: 10.887419319252851 on degrees of freedom: {:residual 14, :model 7, :intercept 0}\np-value: 1.0191663237901771E-4\n\nR2: 0.84480989168932\nAdjusted R2: 0.76721483753398\nResidual standard error: 
28.68280980538554 on 14 degrees of freedom\nAIC: 208.04516636125462\n\n\n\nsource: notebooks/tablemath_book/reference.clj", + "text": "Reference\n\nwith\n[m expr]\nEvaluate expression expr in the context of destructuring all the keys of map m.\n\nExamples\n\n(tm/with {:x 3 :y 9}\n '(+ x y))\n\n\n12\n\n\n(tm/with (tc/dataset {:x (range 4)\n :y 9})\n '(tcc/+ x y))\n\n\n#tech.v3.dataset.column<int64>[4]\nnull\n[9, 10, 11, 12]\n\n\n\n\ncolumns-with\n[dataset specs]\nCompute a sequence of named columns by a given sequence of specs in the context of a given dataset.\nEach spec is one of the following:\n\n\na keyword or string - in that case, we just take the corresponding column of the original dataset.\n\n\na vector of two elements [nam expr], where the first is a string or a keyword. In that case, nam is interpreted as a name or a name-prefix for the resulting columns, and expr is handled as an expression as in (3).\n\n\nany other Clojure form - in that case, we treat it as an expression, and evaluate it while destructuring the column names of dataset as well as all the columns created by previous specs; the evaluation is expected to return one of the following:\n\n\na column (or the data to create a column (e.g., a vector of numbers))\na sequential of columns\na map from column names to columns\n\n\nIn any case, the result of the spec is turned into a sequence of named columns, which is conctenated to the columns from the previous specs. 
Some default naming mechanisms are invoked if column names are missing.\nEventually, the sequence of all resulting columns is returned.\n\nExamples\nNote the naming of the resulting columns, and note they can sequentially depend on each other.\n\n(tm/columns-with (tc/dataset {\"w\" [:A :B :C]\n :x (range 3)\n :y (reverse (range 3))})\n [\"w\"\n :x\n '(tcc/+ x y)\n [:z '(tcc/+ x y)]\n [:z1000 '(tcc/* z 1000)]\n '((juxt tcc/+ tcc/*) x y)\n [:p '((juxt tcc/+ tcc/*) x y)]\n '{:a (tcc/+ x y)\n :b (tcc/* x y)}\n [:p '{:a (tcc/+ x y)\n :b (tcc/* x y)}]\n '[(tcc/column (tcc/+ x y) {:name :c})\n (tcc/column (tcc/* x y) {:name :d})]\n [:p '[(tcc/column (tcc/+ x y) {:name :c})\n (tcc/column (tcc/* x y) {:name :d})]]])\n\n\n(#tech.v3.dataset.column<keyword>[3]\nw\n[:A, :B, :C] #tech.v3.dataset.column<int64>[3]\n:x\n[0, 1, 2] #tech.v3.dataset.column<int64>[3]\n(tcc/+ x y)\n[2, 2, 2] #tech.v3.dataset.column<int64>[3]\n:z\n[2, 2, 2] #tech.v3.dataset.column<int64>[3]\n:z1000\n[2000, 2000, 2000] #tech.v3.dataset.column<int64>[3]\n((juxt tcc/+ tcc/*) x y)_0\n[2, 2, 2] #tech.v3.dataset.column<int64>[3]\n((juxt tcc/+ tcc/*) x y)_1\n[0, 1, 0] #tech.v3.dataset.column<int64>[3]\n:p_0\n[2, 2, 2] #tech.v3.dataset.column<int64>[3]\n:p_1\n[0, 1, 0] #tech.v3.dataset.column<int64>[3]\n:a\n[2, 2, 2] #tech.v3.dataset.column<int64>[3]\n:b\n[0, 1, 0] #tech.v3.dataset.column<int64>[3]\n:pa\n[2, 2, 2] #tech.v3.dataset.column<int64>[3]\n:pb\n[0, 1, 0] #tech.v3.dataset.column<int64>[3]\n:c\n[2, 2, 2] #tech.v3.dataset.column<int64>[3]\n:d\n[0, 1, 0] #tech.v3.dataset.column<int64>[3]\n:pc\n[2, 2, 2] #tech.v3.dataset.column<int64>[3]\n:pd\n[0, 1, 0])\n\n\n\n\ndesign\n[dataset target-specs feature-specs]\nGiven a dataset and sequences target-specs, feature-specs, generate a new dataset from the columns generated by columns-with from these two sequences. 
The columns from target-specs will be marked as targets for modelling (e.g., regression, classification).\n(Inspired by metamorph.ml.design-matrix but adapted for columnwise computation.)\n\nExamples\n\n(tm/design (tc/dataset {\"w\" [:A :B :C]\n :x (range 3)\n :y (reverse (range 3))})\n [:y]\n [\"w\"\n :x\n '(tcc/+ x y)\n [:z '(tcc/+ x y)]\n [:z1000 '(tcc/* z 1000)]\n '((juxt tcc/+ tcc/*) x y)\n [:p '((juxt tcc/+ tcc/*) x y)]\n '{:a (tcc/+ x y)\n :b (tcc/* x y)}\n [:p '{:a (tcc/+ x y)\n :b (tcc/* x y)}]\n '[(tcc/column (tcc/+ x y)\n {:name :c})\n (tcc/column (tcc/* x y)\n {:name :d})]\n [:p '[(tcc/column (tcc/+ x y)\n {:name :c})\n (tcc/column (tcc/* x y)\n {:name :d})]]])\n\n\n_unnamed [3 18]:\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n:y\nw\n:x\n(tcc/+ x y)\n:z\n:z1000\n((juxt tcc/+ tcc/*) x y)_0\n((juxt tcc/+ tcc/*) x y)_1\n:p_0\n:p_1\n:a\n:b\n:pa\n:pb\n:c\n:d\n:pc\n:pd\n\n\n\n\n2\n:A\n0\n2\n2\n2000\n2\n0\n2\n0\n2\n0\n2\n0\n2\n0\n2\n0\n\n\n1\n:B\n1\n2\n2\n2000\n2\n1\n2\n1\n2\n1\n2\n1\n2\n1\n2\n1\n\n\n0\n:C\n2\n2\n2\n2000\n2\n0\n2\n0\n2\n0\n2\n0\n2\n0\n2\n0\n\n\n\n\n\n\n\npolynomial\n[column degree]\nGiven a column and an integer degree, return a vector of columns with all its powers up to that degree, named appropriately.\n\nExamples\n\n(-> [1 2 3]\n (tcc/column {:name :x})\n (tm/polynomial 4))\n\n\n[#tech.v3.dataset.column<int64>[3]\n:x\n[1, 2, 3] #tech.v3.dataset.column<int64>[3]\n:x2\n[1, 4, 9] #tech.v3.dataset.column<int64>[3]\n:x3\n[1, 8, 27] #tech.v3.dataset.column<int64>[3]\n:x4\n[1, 16, 81]]\n\n\n\n\none-hot\n[column]\n[column {:keys [values include-last], :or {values (distinct column), include-last false}}]\nGiven a column, create a vector of integer binary columns, each encoding the presence of absence of one of its values.\nE.g., if the column name is :x, and one of the values is :A, then a resulting binary column will have 1 in all the rows where column has :A.\nThe sequence of values to generate the binary columns is defined as follows: either 
the value provided for the :values key if present, or the distinct values in column in their order of appearance. If the value of the option key :include-last is false (which is the default), then the last value is ommitted. This is handy for avoiding multicollinearity in linear regression.\nSupported options: - :values - the values to encode as columns - default nil - :include-last - should the last value be included - default false\n\nExamples\n\n(tm/one-hot (tcc/column [:B :A :A :B :B :C]\n {:name :x}))\n\n\n[#tech.v3.dataset.column<int64>[6]\n:x=:B\n[1, 0, 0, 1, 1, 0] #tech.v3.dataset.column<int64>[6]\n:x=:A\n[0, 1, 1, 0, 0, 0]]\n\n\n(tm/one-hot (tcc/column [:B :A :A :B :B :C]\n {:name :x})\n {:values [:A :B :C]})\n\n\n[#tech.v3.dataset.column<int64>[6]\n:x=:A\n[0, 1, 1, 0, 0, 0] #tech.v3.dataset.column<int64>[6]\n:x=:B\n[1, 0, 0, 1, 1, 0]]\n\n\n(tm/one-hot (tcc/column [:B :A :A :B :B :C]\n {:name :x})\n {:values [:A :B :C]\n :include-last true})\n\n\n[#tech.v3.dataset.column<int64>[6]\n:x=:A\n[0, 1, 1, 0, 0, 0] #tech.v3.dataset.column<int64>[6]\n:x=:B\n[1, 0, 0, 1, 1, 0] #tech.v3.dataset.column<int64>[6]\n:x=:C\n[0, 0, 0, 0, 0, 1]]\n\n\n\n\nlm\n[dataset]\n[dataset options]\nCompute a linear regression model for dataset. The first column marked as target is the target. All the columns unmarked as target are the features. The resulting model is of type fastmath.ml.regression.LMData, a generated by Fastmath. 
It can be summarized by summary.\nSee fastmath.ml.regression.lm for options.\n\nExamples:\n\nLinear relationship\n\n(def linear-toydata\n (-> {:x (range 9)}\n tc/dataset\n (tc/map-columns :y\n [:x]\n (fn [x]\n (+ (* 2 x)\n -3\n (* 3 (rand)))))))\n\n\n(-> linear-toydata\n plotly/layer-point)\n\n\nNote how the coefficients fit the way we generated the data:\n\n(-> linear-toydata\n (tm/design [:y]\n [:x])\n tm/lm\n tm/summary)\n\n\nResiduals:\n\n| :min | :q1 | :median | :q3 | :max |\n|-----------+-----------+----------+----------+----------|\n| -1.112596 | -1.018804 | 0.033899 | 0.992519 | 1.331589 |\n\nCoefficients:\n\n| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval |\n|-----------+-----------+----------+-----------+----------+----------------------|\n| Intercept | -0.964159 | 0.65805 | -1.465177 | 0.186293 | [-2.520199 0.591881] |\n| :x | 1.918419 | 0.138218 | 13.879659 | 2.0E-6 | [1.591586 2.245253] |\n\nF-statistic: 192.64492536036985 on degrees of freedom: {:residual 7, :model 1, :intercept 1}\np-value: 2.3816713911051224E-6\n\nR2: 0.9649377514236106\nAdjusted R2: 0.9599288587698407\nResidual standard error: 1.07063246079246 on 7 degrees of freedom\nAIC: 30.50755579984122\n\n\n\n\nCubic relationship\n\n(def cubic-toydata\n (-> {:x (range 9)}\n tc/dataset\n (tc/map-columns :y\n [:x]\n (fn [x]\n (+ 50\n (* 4 x)\n (* -9 x x)\n (* x x x)\n (* 10 (rand)))))))\n\n\n(-> cubic-toydata\n plotly/layer-point)\n\n\nNote how the coefficients fit the way we generated the data:\n\n(-> cubic-toydata\n (tm/design [:y]\n ['(tm/polynomial x 3)])\n tm/lm\n tm/summary)\n\n\nResiduals:\n\n| :min | :q1 | :median | :q3 | :max |\n|----------+-----------+-----------+----------+----------|\n| -4.46192 | -1.251405 | -0.588872 | 2.680837 | 3.474228 |\n\nCoefficients:\n\n| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval |\n|-----------+-----------+----------+-----------+----------+------------------------|\n| Intercept | 54.035832 | 2.959708 
| 18.257151 | 9.0E-6 | [46.42766 61.644003] |\n| :x | 5.516304 | 3.41593 | 1.614876 | 0.167259 | [-3.264624 14.297233] |\n| :x2 | -9.303257 | 1.031358 | -9.020399 | 2.8E-4 | [-11.954446 -6.652068] |\n| :x3 | 1.016625 | 0.084598 | 12.01718 | 7.0E-5 | [0.79916 1.23409] |\n\nF-statistic: 261.654706742745 on degrees of freedom: {:residual 5, :model 3, :intercept 1}\np-value: 6.478174587432051E-6\n\nR2: 0.9936705986107883\nAdjusted R2: 0.9898729577772614\nResidual standard error: 3.1941622811238553 on 5 degrees of freedom\nAIC: 51.15466103269903\n\n\n\n\nCategorical relationship\n\n(def days-of-week\n [:Mon :Tue :Wed :Thu :Fri :Sat :Sun])\n\n\n(def categorical-toydata\n (-> {:t (range 21)\n :day-of-week (->> days-of-week\n (repeat 3)\n (apply concat)\n (drop 3))}\n tc/dataset\n (tc/map-columns :traffic\n [:day-of-week]\n (fn [dow]\n (+ (case dow\n :Sat 50\n :Sun 50\n 60)\n (* 5 (rand)))))))\n\n\n(-> categorical-toydata\n (plotly/layer-point {:=x :t\n :=y :traffic\n :=color :day-of-week\n :=mark-size 10})\n (plotly/layer-line {:=x :t\n :=y :traffic}))\n\n\nA model with all days except for one, dropping one category to avoid multicolinearity (note we begin with Thursday due to the order of appearance):\n\n(-> categorical-toydata\n (tm/design [:traffic]\n ['(tm/one-hot day-of-week)])\n tm/lm\n tm/summary)\n\n\nResiduals:\n\n| :min | :q1 | :median | :q3 | :max |\n|-----------+-----------+-----------+----------+----------|\n| -1.868088 | -1.234805 | -0.350261 | 1.130497 | 2.551595 |\n\nCoefficients:\n\n| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval |\n|-------------------+-----------+----------+-----------+----------+------------------------|\n| Intercept | 61.55921 | 0.9472 | 64.990704 | 0.0 | [59.512909 63.605512] |\n| :day-of-week=:Thu | 0.494675 | 1.339543 | 0.369286 | 0.71786 | [-2.399233 3.388582] |\n| :day-of-week=:Fri | 0.677621 | 1.339543 | 0.50586 | 0.621424 | [-2.216286 3.571529] |\n| :day-of-week=:Sat | -9.970723 | 1.339543 | -7.443374 
| 5.0E-6 | [-12.864631 -7.076816] |\n| :day-of-week=:Sun | -9.612128 | 1.339543 | -7.175675 | 7.0E-6 | [-12.506035 -6.71822] |\n| :day-of-week=:Mon | 0.732698 | 1.497655 | 0.48923 | 0.63283 | [-2.502789 3.968185] |\n| :day-of-week=:Tue | -0.146058 | 1.497655 | -0.097525 | 0.923797 | [-3.381545 3.089429] |\n| :day-of-week=:Wed | 0.597514 | 1.497655 | 0.398967 | 0.696394 | [-2.637973 3.833001] |\n\nF-statistic: 23.687437499366894 on degrees of freedom: {:residual 13, :model 7, :intercept 1}\np-value: 1.9747517513435398E-6\n\nR2: 0.9272979696192353\nAdjusted R2: 0.8881507224911311\nResidual standard error: 1.6405989318486354 on 13 degrees of freedom\nAIC: 88.31696156462964\n\n\nA model with all days except for one, dropping one category to avoid multicolinearity, and speciftying the order of encoded values:\n\n(-> categorical-toydata\n (tm/design [:traffic]\n ['(tm/one-hot day-of-week\n {:values days-of-week})])\n tm/lm\n tm/summary)\n\n\nResiduals:\n\n| :min | :q1 | :median | :q3 | :max |\n|-----------+-----------+-----------+----------+----------|\n| -5.156325 | -1.383976 | -0.622956 | 1.668952 | 6.806543 |\n\nCoefficients:\n\n| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval |\n|-------------------+-----------+----------+-----------+----------+-----------------------|\n| Intercept | 56.753147 | 1.437507 | 39.480254 | 0.0 | [53.67 59.836293] |\n| :day-of-week=:Mon | 5.538762 | 2.875014 | 1.926516 | 0.074588 | [-0.627531 11.705054] |\n| :day-of-week=:Tue | 4.660006 | 2.875014 | 1.620864 | 0.127345 | [-1.506287 10.826298] |\n| :day-of-week=:Wed | 5.403578 | 2.875014 | 1.879496 | 0.081152 | [-0.762714 11.56987] |\n| :day-of-week=:Thu | 5.300739 | 2.489835 | 2.128951 | 0.051494 | [-0.039427 10.640904] |\n| :day-of-week=:Fri | 5.483685 | 2.489835 | 2.202429 | 0.044895 | [0.143519 10.823851] |\n| :day-of-week=:Sat | -5.164659 | 2.489835 | -2.074298 | 0.056974 | [-10.504825 0.175506] |\n\nF-statistic: 4.136292370106478 on degrees of freedom: 
{:residual 14, :model 6, :intercept 1}\np-value: 0.013430676507271366\n\nR2: 0.639340289486495\nAdjusted R2: 0.4847718421235643\nResidual standard error: 3.5211589305922124 on 14 degrees of freedom\nAIC: 119.94983856115492\n\n\nA model with all days and no intercept, dropping the intercept to avoid multicolinearity and have an easier interpretation of the coefficients:\nNote how the coefficients fit the way we generated the data:\n\n(-> categorical-toydata\n (tm/design [:traffic]\n ['(tm/one-hot day-of-week\n {:values days-of-week\n :include-last true})])\n (tm/lm {:intercept? false})\n tm/summary)\n\n\nResiduals:\n\n| :min | :q1 | :median | :q3 | :max |\n|-----------+-----------+----------+----------+-----------|\n| -1.868088 | -1.206448 | 0.411365 | 1.668952 | 63.559689 |\n\nCoefficients:\n\n| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval |\n|-------------------+-----------+-----------+----------+----------+------------------------|\n| :day-of-week=:Mon | 62.291908 | 20.180967 | 3.086666 | 0.008043 | [19.00804 105.575777] |\n| :day-of-week=:Tue | 61.413152 | 20.180967 | 3.043122 | 0.008769 | [18.129284 104.697021] |\n| :day-of-week=:Wed | 62.156725 | 20.180967 | 3.079968 | 0.00815 | [18.872856 105.440593] |\n| :day-of-week=:Thu | 62.053885 | 16.47769 | 3.765933 | 0.002087 | [26.712755 97.395016] |\n| :day-of-week=:Fri | 62.236832 | 16.47769 | 3.777036 | 0.002041 | [26.895701 97.577962] |\n| :day-of-week=:Sat | 51.588487 | 16.47769 | 3.130808 | 0.007367 | [16.247357 86.929618] |\n| :day-of-week=:Sun | 51.947083 | 16.47769 | 3.152571 | 0.007055 | [16.605952 87.288213] |\n\nF-statistic: 10.923317487831053 on degrees of freedom: {:residual 14, :model 7, :intercept 0}\np-value: 1.0005997960205182E-4\n\nR2: 0.8452409760974104\nAdjusted R2: 0.7678614641461157\nResidual standard error: 28.54019658104785 on 14 degrees of freedom\nAIC: 207.83581812173543\n\n\n\nsource: notebooks/tablemath_book/reference.clj", "crumbs": [ "2  API reference" ] diff 
--git a/docs/tablemath_book.reference.html b/docs/tablemath_book.reference.html index 23ef2e3..bcf0658 100644 --- a/docs/tablemath_book.reference.html +++ b/docs/tablemath_book.reference.html @@ -594,7 +594,7 @@
Linear relationship
plotly/layer-point)
+ [{"y":[-0.9302594494813192,-0.04467912898078241,3.452881108851568,5.880769725917473,7.6048877162325725,7.589270414810594,9.765834225272872,11.352180913287826,15.714785493493274],"r":null,"name":"","fill":null,"mode":"markers","width":null,"type":"scatter","theta":null,"z":null,"lon":null,"lat":null,"x":[0,1,2,3,4,5,6,7,8],"text":null}], {"width":500,"height":400,"margin":{"t":25},"automargin":false,"plot_bgcolor":"rgb(235,235,235)","xaxis":{"gridcolor":"rgb(255,255,255)","title":"x"},"yaxis":{"gridcolor":"rgb(255,255,255)","title":"y"},"title":null}, {});

Note how the coefficients fit the way we generated the data:

(-> linear-toydata
@@ -608,22 +608,22 @@ 
Linear relationship | :min | :q1 | :median | :q3 | :max | |-----------+-----------+----------+----------+----------| -| -1.368905 | -0.861143 | 0.166645 | 0.756204 | 1.300869 | +| -1.112596 | -1.018804 | 0.033899 | 0.992519 | 1.331589 | Coefficients: -| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval | -|-----------+-----------+----------+-----------+----------+-----------------------| -| Intercept | -1.538249 | 0.612071 | -2.513188 | 0.040209 | [-2.985566 -0.090932] | -| :x | 2.0943 | 0.128561 | 16.290374 | 1.0E-6 | [1.790302 2.398297] | +| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval | +|-----------+-----------+----------+-----------+----------+----------------------| +| Intercept | -0.964159 | 0.65805 | -1.465177 | 0.186293 | [-2.520199 0.591881] | +| :x | 1.918419 | 0.138218 | 13.879659 | 2.0E-6 | [1.591586 2.245253] | -F-statistic: 265.37627410003336 on degrees of freedom: {:residual 7, :model 1, :intercept 1} -p-value: 7.999644764389302E-7 +F-statistic: 192.64492536036985 on degrees of freedom: {:residual 7, :model 1, :intercept 1} +p-value: 2.3816713911051224E-6 -R2: 0.9743002578946022 -Adjusted R2: 0.9706288661652597 -Residual standard error: 0.9958258484709785 on 7 degrees of freedom -AIC: 29.203771766022996 +R2: 0.9649377514236106 +Adjusted R2: 0.9599288587698407 +Residual standard error: 1.07063246079246 on 7 degrees of freedom +AIC: 30.50755579984122
@@ -647,7 +647,7 @@
Cubic relationship
plotly/layer-point)
+ [{"y":[53.010669958097274,53.76965184592953,34.77078649819222,13.019111854028552,-4.829555383545323,-28.347900321238843,-24.718421951879073,-15.096214739524445,23.012552422732945],"r":null,"name":"","fill":null,"mode":"markers","width":null,"type":"scatter","theta":null,"z":null,"lon":null,"lat":null,"x":[0,1,2,3,4,5,6,7,8],"text":null}], {"width":500,"height":400,"margin":{"t":25},"automargin":false,"plot_bgcolor":"rgb(235,235,235)","xaxis":{"gridcolor":"rgb(255,255,255)","title":"x"},"yaxis":{"gridcolor":"rgb(255,255,255)","title":"y"},"title":null}, {});

Note how the coefficients fit the way we generated the data:

(-> cubic-toydata
@@ -661,24 +661,24 @@ 
Cubic relationship
| :min | :q1 | :median | :q3 | :max | |----------+-----------+-----------+----------+----------| -| -2.44235 | -0.989378 | -0.005178 | 1.338093 | 1.814442 | +| -4.46192 | -1.251405 | -0.588872 | 2.680837 | 3.474228 | Coefficients: -| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval | -|-----------+-----------+----------+------------+----------+------------------------| -| Intercept | 55.293656 | 1.625722 | 34.011757 | 0.0 | [51.114605 59.472707] | -| :x | 4.165365 | 1.876318 | 2.219968 | 0.077131 | [-0.657864 8.988593] | -| :x2 | -9.064755 | 0.566509 | -16.001083 | 1.7E-5 | [-10.521012 -7.608498] | -| :x3 | 1.00047 | 0.046468 | 21.53023 | 4.0E-6 | [0.88102 1.119921] | +| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval | +|-----------+-----------+----------+-----------+----------+------------------------| +| Intercept | 54.035832 | 2.959708 | 18.257151 | 9.0E-6 | [46.42766 61.644003] | +| :x | 5.516304 | 3.41593 | 1.614876 | 0.167259 | [-3.264624 14.297233] | +| :x2 | -9.303257 | 1.031358 | -9.020399 | 2.8E-4 | [-11.954446 -6.652068] | +| :x3 | 1.016625 | 0.084598 | 12.01718 | 7.0E-5 | [0.79916 1.23409] | -F-statistic: 915.1047291904271 on degrees of freedom: {:residual 5, :model 3, :intercept 1} -p-value: 2.86891162715186E-7 +F-statistic: 261.654706742745 on degrees of freedom: {:residual 5, :model 3, :intercept 1} +p-value: 6.478174587432051E-6 -R2: 0.9981820258854079 -Adjusted R2: 0.9970912414166527 -Residual standard error: 1.754504054818821 on 5 degrees of freedom -AIC: 40.37016570199448 +R2: 0.9936705986107883 +Adjusted R2: 0.9898729577772614 +Residual standard error: 3.1941622811238553 on 5 degrees of freedom +AIC: 51.15466103269903
@@ -715,7 +715,7 @@
Categorical relat :=y :traffic}))
+ [{"y":[60.755146590412316,63.983183174231016,61.42332582791156],"r":null,"name":":Thu","marker":{"color":"#1B9E77","size":10},"fill":null,"mode":"markers","width":null,"type":"scatter","theta":null,"z":null,"lon":null,"lat":null,"x":[0,7,14],"text":null},{"y":[63.06456972407513,60.36874400412142,63.27718122903792],"r":null,"name":":Fri","marker":{"color":"#D95F02","size":10},"fill":null,"mode":"markers","width":null,"type":"scatter","theta":null,"z":null,"lon":null,"lat":null,"x":[1,8,15],"text":null},{"y":[50.22914268861699,54.14008212890955,50.39623664903687],"r":null,"name":":Sat","marker":{"color":"#7570B3","size":10},"fill":null,"mode":"markers","width":null,"type":"scatter","theta":null,"z":null,"lon":null,"lat":null,"x":[2,9,16],"text":null},{"y":[52.35844740618999,51.88597911823592,51.59682137084755],"r":null,"name":":Sun","marker":{"color":"#E7298A","size":10},"fill":null,"mode":"markers","width":null,"type":"scatter","theta":null,"z":null,"lon":null,"lat":null,"x":[3,10,17],"text":null},{"y":[60.88330161869939,63.7005147617168],"r":null,"name":":Mon","marker":{"color":"#66A61E","size":10},"fill":null,"mode":"markers","width":null,"type":"scatter","theta":null,"z":null,"lon":null,"lat":null,"x":[4,11],"text":null},{"y":[62.633796949211124,60.19250738457099],"r":null,"name":":Tue","marker":{"color":"#E6AB02","size":10},"fill":null,"mode":"markers","width":null,"type":"scatter","theta":null,"z":null,"lon":null,"lat":null,"x":[5,12],"text":null},{"y":[61.53376888034932,62.779680593624164],"r":null,"name":":Wed","marker":{"color":"#A6761D","size":10},"fill":null,"mode":"markers","width":null,"type":"scatter","theta":null,"z":null,"lon":null,"lat":null,"x":[6,13],"text":null},{"y":[63.55968947992022,60.80769555001729,60.310246121785354],"r":null,"name":"","marker":{"color":"#666666","size":10},"fill":null,"mode":"markers","width":null,"type":"scatter","theta":null,"z":null,"lon":null,"lat":null,"x":[18,19,20],"text":null},{"y":[60.755146590412316,63.0645697240
7513,50.22914268861699,52.35844740618999,60.88330161869939,62.633796949211124,61.53376888034932,63.983183174231016,60.36874400412142,54.14008212890955,51.88597911823592,63.7005147617168,60.19250738457099,62.779680593624164,61.42332582791156,63.27718122903792,50.39623664903687,51.59682137084755,63.55968947992022,60.80769555001729,60.310246121785354],"r":null,"name":"","fill":null,"mode":"lines","width":null,"type":"scatter","theta":null,"z":null,"lon":null,"lat":null,"x":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],"text":null}], {"width":500,"height":400,"margin":{"t":25},"automargin":false,"plot_bgcolor":"rgb(235,235,235)","xaxis":{"gridcolor":"rgb(255,255,255)","title":"t"},"yaxis":{"gridcolor":"rgb(255,255,255)","title":"traffic"},"title":null}, {});

A model with all days except for one, dropping one category to avoid multicollinearity (note we begin with Thursday due to the order of appearance):

(-> categorical-toydata
@@ -729,28 +729,28 @@ 
Categorical relat | :min | :q1 | :median | :q3 | :max | |-----------+-----------+-----------+----------+----------| -| -1.961295 | -0.815524 | -0.292471 | 1.061911 | 2.390817 | +| -1.868088 | -1.234805 | -0.350261 | 1.130497 | 2.551595 | Coefficients: -| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval | -|-------------------+------------+----------+-----------+----------+------------------------| -| Intercept | 61.865991 | 0.956053 | 64.70982 | 0.0 | [59.800565 63.931417] | -| :day-of-week=:Thu | 0.892344 | 1.352063 | 0.659987 | 0.520785 | [-2.02861 3.813297] | -| :day-of-week=:Fri | 0.566153 | 1.352063 | 0.418733 | 0.682247 | [-2.354801 3.487106] | -| :day-of-week=:Sat | -10.656278 | 1.352063 | -7.881498 | 3.0E-6 | [-13.577232 -7.735324] | -| :day-of-week=:Sun | -9.426566 | 1.352063 | -6.971989 | 1.0E-5 | [-12.347519 -6.505612] | -| :day-of-week=:Mon | 0.075719 | 1.511652 | 0.05009 | 0.960812 | [-3.190007 3.341444] | -| :day-of-week=:Tue | -0.344939 | 1.511652 | -0.228187 | 0.823051 | [-3.610664 2.920787] | -| :day-of-week=:Wed | 0.705859 | 1.511652 | 0.466946 | 0.648268 | [-2.559866 3.971585] | +| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval | +|-------------------+-----------+----------+-----------+----------+------------------------| +| Intercept | 61.55921 | 0.9472 | 64.990704 | 0.0 | [59.512909 63.605512] | +| :day-of-week=:Thu | 0.494675 | 1.339543 | 0.369286 | 0.71786 | [-2.399233 3.388582] | +| :day-of-week=:Fri | 0.677621 | 1.339543 | 0.50586 | 0.621424 | [-2.216286 3.571529] | +| :day-of-week=:Sat | -9.970723 | 1.339543 | -7.443374 | 5.0E-6 | [-12.864631 -7.076816] | +| :day-of-week=:Sun | -9.612128 | 1.339543 | -7.175675 | 7.0E-6 | [-12.506035 -6.71822] | +| :day-of-week=:Mon | 0.732698 | 1.497655 | 0.48923 | 0.63283 | [-2.502789 3.968185] | +| :day-of-week=:Tue | -0.146058 | 1.497655 | -0.097525 | 0.923797 | [-3.381545 3.089429] | +| :day-of-week=:Wed | 0.597514 | 1.497655 | 0.398967 | 0.696394 | 
[-2.637973 3.833001] | -F-statistic: 24.37118021053251 on degrees of freedom: {:residual 13, :model 7, :intercept 1} -p-value: 1.6704908561981924E-6 +F-statistic: 23.687437499366894 on degrees of freedom: {:residual 13, :model 7, :intercept 1} +p-value: 1.9747517513435398E-6 -R2: 0.9291932293059307 -Adjusted R2: 0.8910665066245088 -Residual standard error: 1.6559316650031601 on 13 degrees of freedom -AIC: 88.70766288980772 +R2: 0.9272979696192353 +Adjusted R2: 0.8881507224911311 +Residual standard error: 1.6405989318486354 on 13 degrees of freedom +AIC: 88.31696156462964

A model with all days except for one, dropping one category to avoid multicollinearity, and specifying the order of encoded values:

@@ -767,27 +767,27 @@
Categorical relat | :min | :q1 | :median | :q3 | :max | |-----------+-----------+-----------+----------+----------| -| -6.674578 | -1.203647 | -0.191458 | 1.387996 | 7.048119 | +| -5.156325 | -1.383976 | -0.622956 | 1.668952 | 6.806543 | Coefficients: -| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval | -|-------------------+-----------+----------+-----------+----------+------------------------| -| Intercept | 57.152709 | 1.418155 | 40.300762 | 0.0 | [54.11107 60.194348] | -| :day-of-week=:Mon | 4.789001 | 2.836309 | 1.688462 | 0.113462 | [-1.294277 10.872279] | -| :day-of-week=:Tue | 4.368344 | 2.836309 | 1.540151 | 0.145817 | [-1.714934 10.451622] | -| :day-of-week=:Wed | 5.419142 | 2.836309 | 1.910632 | 0.07675 | [-0.664136 11.50242] | -| :day-of-week=:Thu | 5.605626 | 2.456316 | 2.282128 | 0.038636 | [0.337353 10.8739] | -| :day-of-week=:Fri | 5.279436 | 2.456316 | 2.149331 | 0.049579 | [0.011162 10.547709] | -| :day-of-week=:Sat | -5.942995 | 2.456316 | -2.419475 | 0.029738 | [-11.211268 -0.674722] | +| :name | :estimate | :stderr | :t-value | :p-value | :confidence-interval | +|-------------------+-----------+----------+-----------+----------+-----------------------| +| Intercept | 56.753147 | 1.437507 | 39.480254 | 0.0 | [53.67 59.836293] | +| :day-of-week=:Mon | 5.538762 | 2.875014 | 1.926516 | 0.074588 | [-0.627531 11.705054] | +| :day-of-week=:Tue | 4.660006 | 2.875014 | 1.620864 | 0.127345 | [-1.506287 10.826298] | +| :day-of-week=:Wed | 5.403578 | 2.875014 | 1.879496 | 0.081152 | [-0.762714 11.56987] | +| :day-of-week=:Thu | 5.300739 | 2.489835 | 2.128951 | 0.051494 | [-0.039427 10.640904] | +| :day-of-week=:Fri | 5.483685 | 2.489835 | 2.202429 | 0.044895 | [0.143519 10.823851] | +| :day-of-week=:Sat | -5.164659 | 2.489835 | -2.074298 | 0.056974 | [-10.504825 0.175506] | -F-statistic: 4.6201715895605755 on degrees of freedom: {:residual 14, :model 6, :intercept 1} -p-value: 0.008629460026389202 +F-statistic: 4.136292370106478 on 
degrees of freedom: {:residual 14, :model 6, :intercept 1} +p-value: 0.013430676507271366 -R2: 0.6644378109734268 -Adjusted R2: 0.5206254442477526 -Residual standard error: 3.473754998366949 on 14 degrees of freedom -AIC: 119.38056904506934 +R2: 0.639340289486495 +Adjusted R2: 0.4847718421235643 +Residual standard error: 3.5211589305922124 on 14 degrees of freedom +AIC: 119.94983856115492

A model with all days and no intercept, dropping the intercept to avoid multicollinearity and have an easier interpretation of the coefficients:

@@ -804,29 +804,29 @@
Categorical relat
Residuals:
 
-|      :min |       :q1 |   :median |     :q3 |      :max |
-|-----------+-----------+-----------+---------+-----------|
-| -1.961295 | -0.736394 | -0.081753 | 1.94849 | 64.200827 |
+|      :min |       :q1 |  :median |      :q3 |      :max |
+|-----------+-----------+----------+----------+-----------|
+| -1.868088 | -1.206448 | 0.411365 | 1.668952 | 63.559689 |
 
 Coefficients:
 
 |             :name | :estimate |   :stderr | :t-value | :p-value |   :confidence-interval |
 |-------------------+-----------+-----------+----------+----------+------------------------|
-| :day-of-week=:Mon |  61.94171 | 20.281809 | 3.054052 | 0.008581 | [18.441555 105.441865] |
-| :day-of-week=:Tue | 61.521053 | 20.281809 | 3.033312 | 0.008941 | [18.020898 105.021207] |
-| :day-of-week=:Wed | 62.571851 | 20.281809 | 3.085122 | 0.008067 | [19.071696 106.072005] |
-| :day-of-week=:Thu | 62.758335 | 16.560028 | 3.789748 | 0.001991 |  [27.240608 98.276063] |
-| :day-of-week=:Fri | 62.432144 | 16.560028 | 3.770051 |  0.00207 |  [26.914417 97.949872] |
-| :day-of-week=:Sat | 51.209713 | 16.560028 | 3.092369 | 0.007952 |  [15.691986 86.727441] |
-| :day-of-week=:Sun | 52.439426 | 16.560028 | 3.166627 | 0.006861 |  [16.921698 87.957153] |
+| :day-of-week=:Mon | 62.291908 | 20.180967 | 3.086666 | 0.008043 |  [19.00804 105.575777] |
+| :day-of-week=:Tue | 61.413152 | 20.180967 | 3.043122 | 0.008769 | [18.129284 104.697021] |
+| :day-of-week=:Wed | 62.156725 | 20.180967 | 3.079968 |  0.00815 | [18.872856 105.440593] |
+| :day-of-week=:Thu | 62.053885 |  16.47769 | 3.765933 | 0.002087 |  [26.712755 97.395016] |
+| :day-of-week=:Fri | 62.236832 |  16.47769 | 3.777036 | 0.002041 |  [26.895701 97.577962] |
+| :day-of-week=:Sat | 51.588487 |  16.47769 | 3.130808 | 0.007367 |  [16.247357 86.929618] |
+| :day-of-week=:Sun | 51.947083 |  16.47769 | 3.152571 | 0.007055 |  [16.605952 87.288213] |
 
-F-statistic: 10.887419319252851 on degrees of freedom: {:residual 14, :model 7, :intercept 0}
-p-value: 1.0191663237901771E-4
+F-statistic: 10.923317487831053 on degrees of freedom: {:residual 14, :model 7, :intercept 0}
+p-value: 1.0005997960205182E-4
 
-R2: 0.84480989168932
-Adjusted R2: 0.76721483753398
-Residual standard error: 28.68280980538554 on 14 degrees of freedom
-AIC: 208.04516636125462
+R2: 0.8452409760974104
+Adjusted R2: 0.7678614641461157
+Residual standard error: 28.54019658104785 on 14 degrees of freedom
+AIC: 207.83581812173543
 
diff --git a/notebooks/index.clj b/notebooks/index.clj index ed70f11..ecab881 100644 --- a/notebooks/index.clj +++ b/notebooks/index.clj @@ -1,5 +1,13 @@ +;; Math and stats modelling with table ergonomics + ;; # Preface +;; This project is an initial attempt to create a Clojure library for math and statistics which is friendly to [tech.ml.dataset](https://github.com/techascent/tech.ml.dataset) and [Tablecloth](https://scicloj.github.io/tablecloth) datasets and uses the functionality of [Fastmath](https://github.com/generateme/fastmath). It is also intended to compose well with [Tableplot](https://scicloj.github.io/tableplot/) layered plotting. It is highly inspired by [R](https://www.r-project.org/) and its package. + +;; In a way, it is intended to be a user-friendly compatibility layer across these libraries. + +;; Possibly, after the details clarify, it will be merged into one of the other Scicloj libraries. + ;; Tablemath is a Clojure library for math and statistical modeling ;; with table ergonomics, inspired by R.