Useful R code snippets Transforming the output of xtabs to a dataframeFirst method, which changes formatting: > realization.xt = xtabs(~ verbs$Verb + verbs$RealizationOfRec) > real.xt.df = data.frame(realization.xt) > head(real.xt.df) verbs.Verb verbs.RealizationOfRec Freq 1 accord NP 1 2 allocate NP 0 3 allow NP 6 4 assess NP 1 5 assure NP 2 6 award NP 7 Second method, keeps formatting: > real.xt.df.2 = data.frame(realization.xt[,]) > head(real.xt.df2) NP PP accord 1 0 allocate 0 3 allow 6 0 assess 1 0 assure 2 0 award 7 11 Computing the mode of a distributionFirst method, only gives you one mode, even if the distribution has multiple items that occur with maximum frequency: > x = c(3,6,8,5,2,1,1,9,34,23,4567,1) > freq.df = data.frame(xtabs(~x)) > freq.df[which.max(freq.df$Freq),]$x [1] 1 Second method, which gives you all modes: > x = c(1,1,2,2) > freq.df = data.frame(xtabs(~x)) > freq.df[which(freq.df$Freq == max(freq.df$Freq)),]$x [1] 1 2 Levels: 1 2 Controlling the number of digits printedUse options() to set general options, among them the number of digits printed: > pi [1] 3.141593 > options(digits=3) > pi [1] 3.14 Switching the levels of the dependent variable in logistic regression In logistic regression, if the values of the dependent variable are a factor, they are re-coded into 0 and 1.By default, the alphabetically prior level is coded as 0, and the other as 1:
Here is how to switch them: > switch.dependent.levels <- function(vec) { abs(as.numeric(vec) - 2)} > lrm(switch.dependent.levels(dative$RealizationOfRecipient) ~ LengthOfTheme, data = dative) Logistic Regression Model lrm(formula = switch.dependent.levels(dative$RealizationOfRecipient) ~ LengthOfTheme, data = dative) Model Likelihood Discrimination Rank Discrim. Ratio Test Indexes Indexes Obs 3263 LR chi2 157.66 R2 0.069 C 0.655 0 849 d.f. 1 g 0.655 Dxy 0.311 1 2414 Pr(> chi2) <0.0001 gr 1.925 gamma 0.367 max |deriv| 3e-08 gp 0.095 tau-a 0.120 Brier 0.184 Coef S.E. Wald Z Pr(>|Z|) Intercept 0.4430 0.0644 6.88 <0.0001 LengthOfTheme 0.1668 0.0160 10.39 <0.0001 > lrm(abs(as.numeric(RealizationOfRecipient) - 2) ~ LengthOfTheme,data=dative) Plotting partial effects of predictors when using ols()Baayen's code for plotting partial effects (p.175), which worked for the Design package, does not work for the rms package. Here is how you need to adapt your code. We use as an example the same regression model with a nonlinearity that Baayen uses: predicting reaction time in the "english" data frame. First, in this case you do need a datadist object. english.dd = datadist(english) options(datadist = "english.dd" english.olsB = ols(RTlexdec ~ pol(WrittenFrequency, 2) + AgeSubject + LengthInLetters, data = english) Now instead of plotting english.olsB directly, you just plot the Predict object. plot(Predict(english.olsB)) Converting categorical variables to a binary vector for use in clusteringTo convert a many-valued categorical nominal (not ordered) variable for a dataset to a series of binary vectors, for e.g. clustering, you can do use the dummies package as follows: install.packages("dummies") library(dummies) dummy(dative$SemanticClass) Say you want to do get distances between datapoints with multiple many-valued features(e.g. in order to allow you to do clustering), you will need to convert them all and combine them. For example to convert and combine two categorical variables in the first 100 datapoints from the dative dataset, you can do the following: recodeddativecats = cbind(dummy(dative[1:100,]$SemanticClass),dummy(dative[1:100,]$AccessOfTheme)) This represents each data-point as a series of binary values. You can then find the distances between the datapoints as follows: x = dist(recodeddativecats,method="binary") Comparing two nested lrm models M1 and M2 using a likelihood ratio testLRtest <- function(model1,model2){ print(model1$dev[2] - model2$dev[2]) print(1 - pchisq((model1$dev[2] - model2$dev[2]),(model2$stats['d.f.'] - model1$stats['d.f.']))) } LRtest(M1,M2) Calculating AIC for an lrm model M1aic <- function(model) { round(unname(model$dev[2] + 2*(model$stats['d.f.']+1)),2) } aic(M1) Model comparison for mlogit models M1, M2For nested mlogit models you can use the command lrtest to perform a likelihood ratio test. So for the example models from class you might do the following: > lrtest(M1,M2) Likelihood ratio test Model 1: EtymAge ~ 1 Model 2: EtymAge ~ 1 | NcountStem #Df LogLik Df Chisq Pr(>Chisq) 1 4 -332.39 2 8 -328.63 4 7.5124 0.1112 For non-nested models, the following function can be used to get AIC values. Just paste this into the console and you can then do get to AIC value for a model M1 by calling mlogitAIC(M1). mlogitAIC <- function(model){ (-model$logLik[1]*2) + 2*length(model$coeff) } |