Perform linear sequential g-estimation to estimate the controlled direct effect of a treatment net the effect of a mediator.
Source:R/DirectEffects.R
sequential_g.Rd
Usage
sequential_g(
  formula,
  data,
  subset,
  weights,
  na.action,
  offset,
  contrasts = NULL,
  verbose = TRUE,
  ...
)
Arguments
- formula
formula specification of the first-stage, second-stage, and blip-down models. The right-hand side of the formula should have three components separated by the | operator, with the first component specifying the first-stage model with treatment and any baseline covariates, the second component specifying the intermediate covariates for the first-stage, and the third component specifying the blip-down model. See Details below for more information.
- data
A data frame to which formula is applied.
- subset
A vector of logicals indicating which rows of data to keep.
- weights
an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used with weights weights (that is, minimizing sum(w*e^2)); otherwise ordinary least squares is used. See also ‘Details’.
- na.action
a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.
- offset
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector or matrix of extents matching those of the response. One or more offset terms can be included in the formula instead or as well, and if more than one are specified their sum is used. See model.offset.
- contrasts
an optional list. See the contrasts.arg of model.matrix.default.
- verbose
logical indicating whether to show a progress bar. Default is TRUE.
- ...
additional arguments to be passed to the low-level regression fitting functions.
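As a rough illustration of what the weights argument means, the base R sketch below fits a weighted regression with lm() and checks it against the closed-form weighted least squares solution, which minimizes sum(w*e^2). The simulated data and every variable name here are hypothetical; the sketch does not call sequential_g itself.

```r
# Hypothetical simulated data, for illustration only.
set.seed(42)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
w <- runif(n, 0.5, 2)  # positive case weights

# Weighted fit, as lm() does when weights is non-NULL.
fit_wls <- lm(y ~ x, weights = w)

# Closed-form weighted least squares: solve (X'WX) b = X'Wy.
X <- cbind(1, x)
b_closed <- solve(t(X) %*% (w * X), t(X) %*% (w * y))

# The objective being minimized: sum(w * e^2).
wss <- function(b) sum(w * (y - X %*% b)^2)

# The weighted fit attains a weighted sum of squares no larger than
# the one produced by the unweighted (OLS) coefficients.
wss(coef(fit_wls))
wss(coef(lm(y ~ x)))
```

The two coefficient vectors, coef(fit_wls) and b_closed, should agree up to numerical precision.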
Value
Returns an object of class "seqg", similar to the output of a call to lm. It contains the following components:
coefficients: a vector of named coefficients for the direct effects model.
residuals: the residuals, that is the blipped-down outcome minus the fitted values.
rank: the numeric rank of the fitted linear direct effects model.
fitted.values: the fitted mean values of the direct effects model.
weights: (only for weighted fits) the specified weights.
df.residual: the residual degrees of freedom for the direct effects model.
aliased: logical vector indicating if any of the terms were dropped or aliased due to perfect collinearity.
terms: the list of terms objects used. One for the baseline covariates and treatment (X) and one for the variables in the blip-down model (M).
formula: the formula object used, possibly modified to drop a constant in the blip-down model.
call: the matched call.
na.action: (where relevant) information returned by model.frame on the special handling of NAs.
xlevels: the levels of the factor variables.
contrasts: the contrasts used for the factor variables.
first_mod: the output from the first-stage regression model.
model: full model frame, including all variables.
Ytilde: the blipped-down response vector.
X: the model matrix for the second stage.
M: the model matrix for demediation/blip-down function.
In addition, non-null fits will have components assign, effects, and qr from the output of lm.fit or lm.wfit, whichever is used.
Details
The sequential_g function implements the linear sequential g-estimator developed by Vansteelandt (2009) with the consistent variance estimator developed by Acharya, Blackwell, and Sen (2016).
The formula specifies the full first-stage model including treatment, baseline confounders, intermediate confounders, and the mediators. The user places | bars to separate out these different components of the model. For example, the formula should have the form y ~ tr + x1 + x2 | z1 + z2 | m1 + m2, where tr is the name of the treatment variable, x1 and x2 are baseline covariates, z1 and z2 are intermediate covariates, and m1 and m2 are the names of the mediator variables. This last set of variables specifies the 'blip-down' or 'demediation' function that is used to remove the average effect of the mediator (possibly interacted) from the outcome to create the blipped-down outcome. This blipped-down outcome is then passed to a standard linear model with the covariates as specified for the direct effects model.
See the references below for more details.
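The two-stage procedure described above can be sketched by hand in base R. This is a minimal illustration on hypothetical simulated data, not the package's implementation: the variable names mirror the formula example, the data-generating process is an assumption made for the sketch, and the naive second-stage standard errors ignore first-stage uncertainty, which is what the consistent variance estimator of Acharya, Blackwell, and Sen (2016) accounts for.

```r
# Hypothetical simulated data; names mirror y ~ tr + x1 | z1 | m1.
set.seed(1)
n  <- 2000
x1 <- rnorm(n)                                  # baseline covariate
tr <- rbinom(n, 1, plogis(0.5 * x1))            # treatment
z1 <- 0.5 * tr + 0.3 * x1 + rnorm(n)            # intermediate covariate, affected by treatment
m1 <- 0.8 * tr + 0.5 * z1 + rnorm(n)            # mediator
y  <- 1 + 2 * tr + 0.7 * x1 + 0.4 * z1 + 1.5 * m1 + rnorm(n)
dat <- data.frame(y, tr, x1, z1, m1)

# First stage: outcome on treatment, baseline covariates,
# intermediate covariates, and the mediator.
first <- lm(y ~ tr + x1 + z1 + m1, data = dat)

# Blip-down: remove the estimated mediator effect from the outcome.
dat$ytilde <- dat$y - coef(first)["m1"] * dat$m1

# Second stage: blipped-down outcome on treatment and baseline covariates only.
second <- lm(ytilde ~ tr + x1, data = dat)
acde_hat <- unname(coef(second)["tr"])
acde_hat
```

Under this simulated design the controlled direct effect of tr is 2 + 0.5 * 0.4 = 2.2 (the direct path plus the path through z1), so acde_hat should land near that value.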
References
Vansteelandt, S. (2009). Estimating Direct Effects in Cohort and Case-Control Studies. Epidemiology, 20(6), 851-860.
Acharya, A., Blackwell, M., and Sen, M. (2016). Explaining Causal Effects Without Bias: Detecting and Assessing Direct Effects. American Political Science Review, 110(3), 512-529.
Examples
library(DirectEffects)
data(ploughs)
# Baseline model | intermediate covariates | blip-down (mediator) terms
form_main <- women_politics ~ plow +
  agricultural_suitability + tropical_climate + large_animals +
  political_hierarchies + economic_complexity +
  rugged | years_civil_conflict +
  years_interstate_conflict + oil_pc +
  european_descent + communist_dummy + polity2_2000 +
  serv_va_gdp2000 | centered_ln_inc + centered_ln_incsq
direct <- sequential_g(form_main, ploughs)
summary(direct)
#>
#> t test of coefficients:
#>
#>                          Estimate Std. Err. t value Pr(>|t|)
#> (Intercept)              12.18450   3.64442  3.3433 0.001121 **
#> plow                     -4.83879   2.34467 -2.0637 0.041312 *
#> agricultural_suitability  4.57388   3.10477  1.4732 0.143458
#> tropical_climate         -2.18919   2.10505 -1.0400 0.300554
#> large_animals            -1.33001   3.40008 -0.3912 0.696401
#> political_hierarchies     0.49575   1.09060  0.4546 0.650283
#> economic_complexity      -0.10521   0.42973 -0.2448 0.807029
#> rugged                   -0.30869   0.47821 -0.6455 0.519888
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>