Perform linear sequential g-estimation to estimate the controlled direct effect of a treatment, net of the effect of a mediator.

Usage

sequential_g(
  formula,
  data,
  subset,
  weights,
  na.action,
  offset,
  contrasts = NULL,
  verbose = TRUE,
  ...
)

Arguments

formula: formula specification of the first-stage, second-stage, and blip-down models. The right-hand side of the formula should have three components separated by |, with the first component specifying the treatment and any baseline covariates, the second component specifying the intermediate covariates for the first-stage model, and the third component specifying the blip-down model. See Details below for more information.
data: a data frame containing the variables named in formula.
subset: A vector of logicals indicating which rows of data to keep.
weights: an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used with these weights (that is, minimizing sum(w*e^2)); otherwise ordinary least squares is used.
na.action: a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.
offset: this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector or matrix of extents matching those of the response. One or more offset terms can be included in the formula instead or as well, and if more than one are specified their sum is used. See model.offset.
contrasts: an optional list. See the contrasts.arg of model.matrix.default.
verbose: logical indicating whether to show the progress bar. Default is TRUE.
...: additional arguments to be passed to the low-level regression fitting functions.
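
To make the three-part structure of the formula argument concrete, here is a minimal base R sketch that splits a formula's right-hand side on its | separators. The helper split_bars is hypothetical and written only for illustration; it is not part of the package and is not necessarily how sequential_g parses its input.

```r
# Hypothetical helper (not part of the package): split the right-hand side
# of a formula into its components at each top-level `|`, using base R only.
split_bars <- function(f) {
  rhs <- f[[3]]
  parts <- list()
  # `|` is left-associative, so a | b | c parses as (a | b) | c;
  # peel components off from the right until no `|` call remains.
  while (is.call(rhs) && identical(rhs[[1]], as.name("|"))) {
    parts <- c(list(rhs[[3]]), parts)
    rhs <- rhs[[2]]
  }
  c(list(rhs), parts)
}

f <- y ~ tr + x1 + x2 | z1 + z2 | m1 + m2
parts <- split_bars(f)

# Variable names in each component: treatment and baseline covariates,
# intermediate covariates, and blip-down (mediator) variables.
part_vars <- lapply(parts, all.vars)
print(part_vars)
```

The three resulting components correspond, in order, to the treatment and baseline covariates, the intermediate covariates, and the blip-down model.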

Value

Returns an object of class "seqg". Similar to the output of a call to lm. Contains the following components:

coefficients: a vector of named coefficients for the direct effects model.
residuals: the residuals, that is the blipped-down outcome minus the fitted values.
rank: the numeric rank of the fitted linear direct effects model.
fitted.values: the fitted mean values of the direct effects model.
weights: (only for weighted fits) the specified weights.
df.residual: the residual degrees of freedom for the direct effects model.
aliased: logical vector indicating if any of the terms were dropped or aliased due to perfect collinearity.
terms: the list of terms objects used: one for the baseline covariates and treatment (X) and one for the variables in the blip-down model (M).
formula: the formula object used, possibly modified to drop a constant in the blip-down model.
call: the matched call.
na.action: (where relevant) information returned by model.frame of the special handling of NAs.
xlevels: the levels of the factor variables.
contrasts: the contrasts used for the factor variables.
first_mod: the output from the first-stage regression model.
model: full model frame, including all variables.
Ytilde: the blipped-down response vector.
X: the model matrix for the second stage.
M: the model matrix for the demediation/blip-down function.

In addition, non-null fits will have components assign, effects, and qr from the output of lm.fit or lm.wfit, whichever is used.

Details

The sequential_g function implements the linear sequential g-estimator developed by Vansteelandt (2009) with the consistent variance estimator developed by Acharya, Blackwell, and Sen (2016).

The formula specifies the full first-stage model, including treatment, baseline confounders, intermediate confounders, and the mediators. The user places | bars to separate these components of the model. For example, the formula should have the form y ~ tr + x1 + x2 | z1 + z2 | m1 + m2, where tr is the name of the treatment variable, x1 and x2 are baseline covariates, z1 and z2 are intermediate covariates, and m1 and m2 are the names of the mediator variables. This last set of variables specifies the 'blip-down' or 'demediation' function that is used to remove the average effect of the mediator (possibly interacted) from the outcome to create the blipped-down outcome. This blipped-down outcome is then passed to a standard linear model with the covariates as specified for the direct effects model.
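
The two stages described above can be sketched by hand with lm on simulated data. This is an illustration of the idea only, not the package's implementation: the data-generating process and all variable names are assumptions made up for the example, and the standard errors reported by the second lm call ignore the uncertainty from the first stage, so they are not the consistent variance estimates that sequential_g is documented to provide.

```r
# Simulated data (assumed for illustration): one baseline covariate,
# one intermediate covariate affected by treatment, and one mediator.
set.seed(1)
n  <- 500
x1 <- rnorm(n)                         # baseline covariate
tr <- rbinom(n, 1, plogis(x1))         # treatment
z1 <- 0.5 * tr + rnorm(n)              # intermediate covariate
m1 <- 0.5 * tr + 0.5 * z1 + rnorm(n)   # mediator
y  <- 1 + 2 * tr + x1 + z1 + 1.5 * m1 + rnorm(n)
dat <- data.frame(y, tr, x1, z1, m1)

# First stage: outcome on treatment, baseline covariates,
# intermediate covariates, and the mediator.
first <- lm(y ~ tr + x1 + z1 + m1, data = dat)

# Blip-down: subtract the estimated mediator effect from the outcome.
dat$ytilde <- dat$y - coef(first)[["m1"]] * dat$m1

# Second stage: blipped-down outcome on treatment and baseline covariates.
# The coefficient on tr is the point estimate of the controlled direct effect.
second <- lm(ytilde ~ tr + x1, data = dat)
print(coef(second))
```

In this simulation the controlled direct effect of tr is 2.5 by construction (the coefficient of 2 on tr plus 0.5 transmitted through z1), so the tr coefficient from the second stage should land near that value.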

See the references below for more details.

References

Vansteelandt, S. (2009). Estimating Direct Effects in Cohort and Case-Control Studies. Epidemiology, 20(6), 851-860.

Acharya, A., Blackwell, M., and Sen, M. (2016). Explaining Causal Effects Without Bias: Detecting and Assessing Direct Effects. American Political Science Review, 110(3), 512-529.

Examples

data(ploughs)

form_main <- women_politics ~ plow +
  agricultural_suitability + tropical_climate + large_animals +
  political_hierarchies + economic_complexity +
  rugged | years_civil_conflict +
  years_interstate_conflict  + oil_pc +
  european_descent + communist_dummy + polity2_2000 +
  serv_va_gdp2000 | centered_ln_inc + centered_ln_incsq

direct <- sequential_g(form_main, ploughs)

summary(direct)
#> 
#> t test of coefficients: 
#> 
#>                          Estimate Std. Err. t value Pr(>|t|)   
#> (Intercept)              12.18450   3.64442  3.3433 0.001121 **
#> plow                     -4.83879   2.34467 -2.0637 0.041312 * 
#> agricultural_suitability  4.57388   3.10477  1.4732 0.143458   
#> tropical_climate         -2.18919   2.10505 -1.0400 0.300554   
#> large_animals            -1.33001   3.40008 -0.3912 0.696401   
#> political_hierarchies     0.49575   1.09060  0.4546 0.650283   
#> economic_complexity      -0.10521   0.42973 -0.2448 0.807029   
#> rugged                   -0.30869   0.47821 -0.6455 0.519888   
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>