[Update on 2019-03-01] I completely rewrote the Python program. The updates include:
- I include two domain-specific dictionaries: Loughran and McDonald’s and Henry’s dictionaries, and you can choose which dictionary to use.
- I add a negation check as suggested by Loughran and McDonald (2011). That is, any occurrence of a negation word (e.g., isn’t, not, never) within the three words preceding a positive word flips that positive word into a negative one. The negation check applies only to positive words because Loughran and McDonald (2011) suggest that double negation (i.e., a negation word preceding a negative word) is not common. I expand their negation word list, though, since theirs seems incomplete. In my sample of 90,000+ press releases, the negation check finds that 5.7% of press releases have at least one positive word preceded by a negation word.
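The negation rule described above can be sketched in a few lines. This is only a minimal illustration, not the full program: the three word sets are tiny hand-picked subsets (the NEGATE set in particular is an assumed sample, not the expanded list), and the function name count_with_negation is hypothetical.

```python
# Tiny illustrative subsets; the real program uses the full word lists.
POSITIVE = {"able", "better", "benefit", "effective"}
NEGATIVE = {"loss", "weak", "decline"}
NEGATE = {"not", "never", "no", "isn't", "cannot"}  # assumed sample only


def count_with_negation(words, window=3):
    """Return (positive, negative) counts for a list of lowercase words.

    A positive word counts as negative when any of the `window` words
    immediately before it is a negation word. Negative words are never
    flipped, following the rule described in the text.
    """
    pos = neg = 0
    for i, word in enumerate(words):
        if word in NEGATIVE:
            neg += 1
        elif word in POSITIVE:
            preceding = words[max(0, i - window):i]
            if any(w in NEGATE for w in preceding):
                neg += 1  # negated positive word is treated as negative
            else:
                pos += 1
    return pos, neg


words = "the results were not as effective as hoped despite better margins".split()
print(count_with_negation(words))  # "effective" is flipped, "better" is not
```

The window is a parameter so that the three-word distance can be changed without touching the counting logic.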
Please note:
- The Python program first transforms an article into a list of words in their original order (the order matters for the negation check). Different research questions may define “word” differently. For example, some research questions only look at alphabetic words (i.e., they remove all numbers in an article). I use this definition in the following Python program, but you may want to change it to suit your research question. In addition, there are many nuances in splitting sentences into words. The splitting method in the following Python program is simple but, of course, imperfect.
- To use the Python program, you have to know how to assign the full text of an article to the variable `article` (using a loop) and how to output the results into a database-like file (SQLite or CSV).
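As a rough sketch of the two notes above, the snippet below splits text into alphabetic words and loops over a few articles, writing one row per article to a CSV file. Everything here is an assumption for illustration: the tokenize function, its regular expression, the sample articles, and the file name tone_counts.csv are hypothetical, and the pattern only recognizes the straight apostrophe ('), so curly apostrophes would need to be normalized first.

```python
import csv
import re


def tokenize(text):
    """Lowercase the text and keep alphabetic words only.

    Numbers and punctuation are dropped; an internal straight apostrophe
    is kept so that contractions such as "weren't" stay in one piece.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


# Hypothetical sample data; in practice the texts would come from files
# or a database.
articles = {
    "pr001": "Revenue rose 12% in 2018, but margins weren't better.",
    "pr002": "The company reported a loss.",
}

with open("tone_counts.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["article_id", "word_count"])
    for article_id, article in articles.items():
        words = tokenize(article)  # list of words in original order
        writer.writerow([article_id, len(words)])
```

In a real run, the word list for each article would be passed on to the tone-counting step, and the resulting counts would be written as extra columns.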
I acknowledge the work done by C.J. Hutto (see his work at GitHub).
import re

# Loughran and McDonald Sentiment Word Lists (https://sraf.nd.edu/textual-analysis/resources/)
lmdict = {'Negative': ['abandon', 'abandoned', 'abandoning', 'abandonment', 'abandonments', 'abandons', 'abdicated', 'abdicates', 'abdicating', 'abdication', 'abdications', 'aberrant', 'aberration', 'aberrational', 'aberrations', 'abetting', 'abnormal', 'abnormalities', 'abnormality', 'abnormally', 'abolish', 'abolished', 'abolishes', 'abolishing', 'abrogate', 'abrogated', 'abrogates', 'abrogating', 'abrogation', 'abrogations', 'abrupt', 'abruptly', 'abruptness', 'absence', 'absences', 'absenteeism', 'abuse', 'abused', 'abuses', 'abusing', 'abusive', 'abusively', 'abusiveness', 'accident', 'accidental', 'accidentally', 'accidents', 'accusation', 'accusations', 'accuse', 'accused', 'accuses', 'accusing', 'acquiesce', 'acquiesced', 'acquiesces', 'acquiescing', 'acquit', 'acquits', 'acquittal', 'acquittals', 'acquitted', 'acquitting', 'adulterate', 'adulterated', 'adulterating', 'adulteration', 'adulterations', 'adversarial', 'adversaries', 'adversary', 'adverse', 'adversely', 'adversities', 'adversity', 'aftermath', 'aftermaths', 'against', 'aggravate', 'aggravated', 'aggravates', 'aggravating', 'aggravation', 'aggravations', 'alerted', 'alerting', 'alienate', 'alienated', 'alienates', 'alienating', 'alienation', 'alienations', 'allegation', 'allegations', 'allege', 'alleged', 'allegedly', 'alleges', 'alleging', 'annoy', 'annoyance', 'annoyances', 'annoyed', 'annoying', 'annoys', 'annul', 'annulled', 'annulling', 'annulment', 'annulments', 'annuls', 'anomalies', 'anomalous', 'anomalously', 'anomaly', 'anticompetitive', 'antitrust', 'argue', 'argued', 'arguing', 'argument', 'argumentative', 'arguments', 'arrearage', 'arrearages', 'arrears', 'arrest', 'arrested', 'arrests', 'artificially', 'assault', 'assaulted', 'assaulting', 'assaults', 'assertions', 'attrition', 'aversely', 'backdating', 'bad', 'bail', 'bailout', 'balk', 'balked', 'bankrupt', 'bankruptcies',
'bankruptcy', 'bankrupted', 'bankrupting', 'bankrupts', 'bans', 'barred', 'barrier', 'barriers', 'bottleneck', 'bottlenecks', 'boycott', 'boycotted', 'boycotting', 'boycotts', 'breach', 'breached', 'breaches', 'breaching', 'break', 'breakage', 'breakages', 'breakdown', 'breakdowns', 'breaking', 'breaks', 'bribe', 'bribed', 'briberies', 'bribery', 'bribes', 'bribing', 'bridge', 'broken', 'burden', 'burdened', 'burdening', 'burdens', 'burdensome', 'burned', 'calamities', 'calamitous', 'calamity', 'cancel', 'canceled', 'canceling', 'cancellation', 'cancellations', 'cancelled', 'cancelling', 'cancels', 'careless', 'carelessly', 'carelessness', 'catastrophe', 'catastrophes', 'catastrophic', 'catastrophically', 'caution', 'cautionary', 'cautioned', 'cautioning', 'cautions', 'cease', 'ceased', 'ceases', 'ceasing', 'censure', 'censured', 'censures', 'censuring', 'challenge', 'challenged', 'challenges', 'challenging', 'chargeoffs', 'circumvent', 'circumvented', 'circumventing', 'circumvention', 'circumventions', 'circumvents', 'claiming', 'claims', 'clawback', 'closed', 'closeout', 'closeouts', 'closing', 'closings', 'closure', 'closures', 'coerce', 'coerced', 'coerces', 'coercing', 'coercion', 'coercive', 'collapse', 'collapsed', 'collapses', 'collapsing', 'collision', 'collisions', 'collude', 'colluded', 'colludes', 'colluding', 'collusion', 'collusions', 'collusive', 'complain', 'complained', 'complaining', 'complains', 'complaint', 'complaints', 'complicate', 'complicated', 'complicates', 'complicating', 'complication', 'complications', 'compulsion', 'concealed', 'concealing', 'concede', 'conceded', 'concedes', 'conceding', 'concern', 'concerned', 'concerns', 'conciliating', 'conciliation', 'conciliations', 'condemn', 'condemnation', 'condemnations', 'condemned', 'condemning', 'condemns', 'condone', 'condoned', 'confess', 'confessed', 'confesses', 'confessing', 'confession', 'confine', 'confined', 'confinement', 'confinements', 'confines', 'confining', 'confiscate', 
'confiscated', 'confiscates', 'confiscating', 'confiscation', 'confiscations', 'conflict', 'conflicted', 'conflicting', 'conflicts', 'confront', 'confrontation', 'confrontational', 'confrontations', 'confronted', 'confronting', 'confronts', 'confuse', 'confused', 'confuses', 'confusing', 'confusingly', 'confusion', 'conspiracies', 'conspiracy', 'conspirator', 'conspiratorial', 'conspirators', 'conspire', 'conspired', 'conspires', 'conspiring', 'contempt', 'contend', 'contended', 'contending', 'contends', 'contention', 'contentions', 'contentious', 'contentiously', 'contested', 'contesting', 'contraction', 'contractions', 'contradict', 'contradicted', 'contradicting', 'contradiction', 'contradictions', 'contradictory', 'contradicts', 'contrary', 'controversial', 'controversies', 'controversy', 'convict', 'convicted', 'convicting', 'conviction', 'convictions', 'corrected', 'correcting', 'correction', 'corrections', 'corrects', 'corrupt', 'corrupted', 'corrupting', 'corruption', 'corruptions', 'corruptly', 'corruptness', 'costly', 'counterclaim', 'counterclaimed', 'counterclaiming', 'counterclaims', 'counterfeit', 'counterfeited', 'counterfeiter', 'counterfeiters', 'counterfeiting', 'counterfeits', 'countermeasure', 'countermeasures', 'crime', 'crimes', 'criminal', 'criminally', 'criminals', 'crises', 'crisis', 'critical', 'critically', 'criticism', 'criticisms', 'criticize', 'criticized', 'criticizes', 'criticizing', 'crucial', 'crucially', 'culpability', 'culpable', 'culpably', 'cumbersome', 'curtail', 'curtailed', 'curtailing', 'curtailment', 'curtailments', 'curtails', 'cut', 'cutback', 'cutbacks', 'cyberattack', 'cyberattacks', 'cyberbullying', 'cybercrime', 'cybercrimes', 'cybercriminal', 'cybercriminals', 'damage', 'damaged', 'damages', 'damaging', 'dampen', 'dampened', 'danger', 'dangerous', 'dangerously', 'dangers', 'deadlock', 'deadlocked', 'deadlocking', 'deadlocks', 'deadweight', 'deadweights', 'debarment', 'debarments', 'debarred', 'deceased', 'deceit', 
'deceitful', 'deceitfulness', 'deceive', 'deceived', 'deceives', 'deceiving', 'deception', 'deceptions', 'deceptive', 'deceptively', 'decline', 'declined', 'declines', 'declining', 'deface', 'defaced', 'defacement', 'defamation', 'defamations', 'defamatory', 'defame', 'defamed', 'defames', 'defaming', 'default', 'defaulted', 'defaulting', 'defaults', 'defeat', 'defeated', 'defeating', 'defeats', 'defect', 'defective', 'defects', 'defend', 'defendant', 'defendants', 'defended', 'defending', 'defends', 'defensive', 'defer', 'deficiencies', 'deficiency', 'deficient', 'deficit', 'deficits', 'defraud', 'defrauded', 'defrauding', 'defrauds', 'defunct', 'degradation', 'degradations', 'degrade', 'degraded', 'degrades', 'degrading', 'delay', 'delayed', 'delaying', 'delays', 'deleterious', 'deliberate', 'deliberated', 'deliberately', 'delinquencies', 'delinquency', 'delinquent', 'delinquently', 'delinquents', 'delist', 'delisted', 'delisting', 'delists', 'demise', 'demised', 'demises', 'demising', 'demolish', 'demolished', 'demolishes', 'demolishing', 'demolition', 'demolitions', 'demote', 'demoted', 'demotes', 'demoting', 'demotion', 'demotions', 'denial', 'denials', 'denied', 'denies', 'denigrate', 'denigrated', 'denigrates', 'denigrating', 'denigration', 'deny', 'denying', 'deplete', 'depleted', 'depletes', 'depleting', 'depletion', 'depletions', 'deprecation', 'depress', 'depressed', 'depresses', 'depressing', 'deprivation', 'deprive', 'deprived', 'deprives', 'depriving', 'derelict', 'dereliction', 'derogatory', 'destabilization', 'destabilize', 'destabilized', 'destabilizing', 'destroy', 'destroyed', 'destroying', 'destroys', 'destruction', 'destructive', 'detain', 'detained', 'detention', 'detentions', 'deter', 'deteriorate', 'deteriorated', 'deteriorates', 'deteriorating', 'deterioration', 'deteriorations', 'deterred', 'deterrence', 'deterrences', 'deterrent', 'deterrents', 'deterring', 'deters', 'detract', 'detracted', 'detracting', 'detriment', 'detrimental', 
'detrimentally', 'detriments', 'devalue', 'devalued', 'devalues', 'devaluing', 'devastate', 'devastated', 'devastating', 'devastation', 'deviate', 'deviated', 'deviates', 'deviating', 'deviation', 'deviations', 'devolve', 'devolved', 'devolves', 'devolving', 'difficult', 'difficulties', 'difficultly', 'difficulty', 'diminish', 'diminished', 'diminishes', 'diminishing', 'diminution', 'disadvantage', 'disadvantaged', 'disadvantageous', 'disadvantages', 'disaffiliation', 'disagree', 'disagreeable', 'disagreed', 'disagreeing', 'disagreement', 'disagreements', 'disagrees', 'disallow', 'disallowance', 'disallowances', 'disallowed', 'disallowing', 'disallows', 'disappear', 'disappearance', 'disappearances', 'disappeared', 'disappearing', 'disappears', 'disappoint', 'disappointed', 'disappointing', 'disappointingly', 'disappointment', 'disappointments', 'disappoints', 'disapproval', 'disapprovals', 'disapprove', 'disapproved', 'disapproves', 'disapproving', 'disassociates', 'disassociating', 'disassociation', 'disassociations', 'disaster', 'disasters', 'disastrous', 'disastrously', 'disavow', 'disavowal', 'disavowed', 'disavowing', 'disavows', 'disciplinary', 'disclaim', 'disclaimed', 'disclaimer', 'disclaimers', 'disclaiming', 'disclaims', 'disclose', 'disclosed', 'discloses', 'disclosing', 'discontinuance', 'discontinuances', 'discontinuation', 'discontinuations', 'discontinue', 'discontinued', 'discontinues', 'discontinuing', 'discourage', 'discouraged', 'discourages', 'discouraging', 'discredit', 'discredited', 'discrediting', 'discredits', 'discrepancies', 'discrepancy', 'disfavor', 'disfavored', 'disfavoring', 'disfavors', 'disgorge', 'disgorged', 'disgorgement', 'disgorgements', 'disgorges', 'disgorging', 'disgrace', 'disgraceful', 'disgracefully', 'dishonest', 'dishonestly', 'dishonesty', 'dishonor', 'dishonorable', 'dishonorably', 'dishonored', 'dishonoring', 'dishonors', 'disincentives', 'disinterested', 'disinterestedly', 'disinterestedness', 'disloyal', 
'disloyally', 'disloyalty', 'dismal', 'dismally', 'dismiss', 'dismissal', 'dismissals', 'dismissed', 'dismisses', 'dismissing', 'disorderly', 'disparage', 'disparaged', 'disparagement', 'disparagements', 'disparages', 'disparaging', 'disparagingly', 'disparities', 'disparity', 'displace', 'displaced', 'displacement', 'displacements', 'displaces', 'displacing', 'dispose', 'dispossess', 'dispossessed', 'dispossesses', 'dispossessing', 'disproportion', 'disproportional', 'disproportionate', 'disproportionately', 'dispute', 'disputed', 'disputes', 'disputing', 'disqualification', 'disqualifications', 'disqualified', 'disqualifies', 'disqualify', 'disqualifying', 'disregard', 'disregarded', 'disregarding', 'disregards', 'disreputable', 'disrepute', 'disrupt', 'disrupted', 'disrupting', 'disruption', 'disruptions', 'disruptive', 'disrupts', 'dissatisfaction', 'dissatisfied', 'dissent', 'dissented', 'dissenter', 'dissenters', 'dissenting', 'dissents', 'dissident', 'dissidents', 'dissolution', 'dissolutions', 'distort', 'distorted', 'distorting', 'distortion', 'distortions', 'distorts', 'distract', 'distracted', 'distracting', 'distraction', 'distractions', 'distracts', 'distress', 'distressed', 'disturb', 'disturbance', 'disturbances', 'disturbed', 'disturbing', 'disturbs', 'diversion', 'divert', 'diverted', 'diverting', 'diverts', 'divest', 'divested', 'divesting', 'divestiture', 'divestitures', 'divestment', 'divestments', 'divests', 'divorce', 'divorced', 'divulge', 'divulged', 'divulges', 'divulging', 'doubt', 'doubted', 'doubtful', 'doubts', 'downgrade', 'downgraded', 'downgrades', 'downgrading', 'downsize', 'downsized', 'downsizes', 'downsizing', 'downsizings', 'downtime', 'downtimes', 'downturn', 'downturns', 'downward', 'downwards', 'drag', 'drastic', 'drastically', 'drawback', 'drawbacks', 'dropped', 'drought', 'droughts', 'duress', 'dysfunction', 'dysfunctional', 'dysfunctions', 'easing', 'egregious', 'egregiously', 'embargo', 'embargoed', 'embargoes', 
'embargoing', 'embarrass', 'embarrassed', 'embarrasses', 'embarrassing', 'embarrassment', 'embarrassments', 'embezzle', 'embezzled', 'embezzlement', 'embezzlements', 'embezzler', 'embezzles', 'embezzling', 'encroach', 'encroached', 'encroaches', 'encroaching', 'encroachment', 'encroachments', 'encumber', 'encumbered', 'encumbering', 'encumbers', 'encumbrance', 'encumbrances', 'endanger', 'endangered', 'endangering', 'endangerment', 'endangers', 'enjoin', 'enjoined', 'enjoining', 'enjoins', 'erode', 'eroded', 'erodes', 'eroding', 'erosion', 'erratic', 'erratically', 'erred', 'erring', 'erroneous', 'erroneously', 'error', 'errors', 'errs', 'escalate', 'escalated', 'escalates', 'escalating', 'evade', 'evaded', 'evades', 'evading', 'evasion', 'evasions', 'evasive', 'evict', 'evicted', 'evicting', 'eviction', 'evictions', 'evicts', 'exacerbate', 'exacerbated', 'exacerbates', 'exacerbating', 'exacerbation', 'exacerbations', 'exaggerate', 'exaggerated', 'exaggerates', 'exaggerating', 'exaggeration', 'excessive', 'excessively', 'exculpate', 'exculpated', 'exculpates', 'exculpating', 'exculpation', 'exculpations', 'exculpatory', 'exonerate', 'exonerated', 'exonerates', 'exonerating', 'exoneration', 'exonerations', 'exploit', 'exploitation', 'exploitations', 'exploitative', 'exploited', 'exploiting', 'exploits', 'expose', 'exposed', 'exposes', 'exposing', 'expropriate', 'expropriated', 'expropriates', 'expropriating', 'expropriation', 'expropriations', 'expulsion', 'expulsions', 'extenuating', 'fail', 'failed', 'failing', 'failings', 'fails', 'failure', 'failures', 'fallout', 'false', 'falsely', 'falsification', 'falsifications', 'falsified', 'falsifies', 'falsify', 'falsifying', 'falsity', 'fatalities', 'fatality', 'fatally', 'fault', 'faulted', 'faults', 'faulty', 'fear', 'fears', 'felonies', 'felonious', 'felony', 'fictitious', 'fined', 'fines', 'fired', 'firing', 'flaw', 'flawed', 'flaws', 'forbid', 'forbidden', 'forbidding', 'forbids', 'force', 'forced', 'forcing', 
'foreclose', 'foreclosed', 'forecloses', 'foreclosing', 'foreclosure', 'foreclosures', 'forego', 'foregoes', 'foregone', 'forestall', 'forestalled', 'forestalling', 'forestalls', 'forfeit', 'forfeited', 'forfeiting', 'forfeits', 'forfeiture', 'forfeitures', 'forgers', 'forgery', 'fraud', 'frauds', 'fraudulence', 'fraudulent', 'fraudulently', 'frivolous', 'frivolously', 'frustrate', 'frustrated', 'frustrates', 'frustrating', 'frustratingly', 'frustration', 'frustrations', 'fugitive', 'fugitives', 'gratuitous', 'gratuitously', 'grievance', 'grievances', 'grossly', 'groundless', 'guilty', 'halt', 'halted', 'hamper', 'hampered', 'hampering', 'hampers', 'harass', 'harassed', 'harassing', 'harassment', 'hardship', 'hardships', 'harm', 'harmed', 'harmful', 'harmfully', 'harming', 'harms', 'harsh', 'harsher', 'harshest', 'harshly', 'harshness', 'hazard', 'hazardous', 'hazards', 'hinder', 'hindered', 'hindering', 'hinders', 'hindrance', 'hindrances', 'hostile', 'hostility', 'hurt', 'hurting', 'idle', 'idled', 'idling', 'ignore', 'ignored', 'ignores', 'ignoring', 'ill', 'illegal', 'illegalities', 'illegality', 'illegally', 'illegible', 'illicit', 'illicitly', 'illiquid', 'illiquidity', 'imbalance', 'imbalances', 'immature', 'immoral', 'impair', 'impaired', 'impairing', 'impairment', 'impairments', 'impairs', 'impasse', 'impasses', 'impede', 'impeded', 'impedes', 'impediment', 'impediments', 'impeding', 'impending', 'imperative', 'imperfection', 'imperfections', 'imperil', 'impermissible', 'implicate', 'implicated', 'implicates', 'implicating', 'impossibility', 'impossible', 'impound', 'impounded', 'impounding', 'impounds', 'impracticable', 'impractical', 'impracticalities', 'impracticality', 'imprisonment', 'improper', 'improperly', 'improprieties', 'impropriety', 'imprudent', 'imprudently', 'inability', 'inaccessible', 'inaccuracies', 'inaccuracy', 'inaccurate', 'inaccurately', 'inaction', 'inactions', 'inactivate', 'inactivated', 'inactivates', 'inactivating', 
'inactivation', 'inactivations', 'inactivity', 'inadequacies', 'inadequacy', 'inadequate', 'inadequately', 'inadvertent', 'inadvertently', 'inadvisability', 'inadvisable', 'inappropriate', 'inappropriately', 'inattention', 'incapable', 'incapacitated', 'incapacity', 'incarcerate', 'incarcerated', 'incarcerates', 'incarcerating', 'incarceration', 'incarcerations', 'incidence', 'incidences', 'incident', 'incidents', 'incompatibilities', 'incompatibility', 'incompatible', 'incompetence', 'incompetency', 'incompetent', 'incompetently', 'incompetents', 'incomplete', 'incompletely', 'incompleteness', 'inconclusive', 'inconsistencies', 'inconsistency', 'inconsistent', 'inconsistently', 'inconvenience', 'inconveniences', 'inconvenient', 'incorrect', 'incorrectly', 'incorrectness', 'indecency', 'indecent', 'indefeasible', 'indefeasibly', 'indict', 'indictable', 'indicted', 'indicting', 'indictment', 'indictments', 'ineffective', 'ineffectively', 'ineffectiveness', 'inefficiencies', 'inefficiency', 'inefficient', 'inefficiently', 'ineligibility', 'ineligible', 'inequitable', 'inequitably', 'inequities', 'inequity', 'inevitable', 'inexperience', 'inexperienced', 'inferior', 'inflicted', 'infraction', 'infractions', 'infringe', 'infringed', 'infringement', 'infringements', 'infringes', 'infringing', 'inhibited', 'inimical', 'injunction', 'injunctions', 'injure', 'injured', 'injures', 'injuries', 'injuring', 'injurious', 'injury', 'inordinate', 'inordinately', 'inquiry', 'insecure', 'insensitive', 'insolvencies', 'insolvency', 'insolvent', 'instability', 'insubordination', 'insufficiency', 'insufficient', 'insufficiently', 'insurrection', 'insurrections', 'intentional', 'interfere', 'interfered', 'interference', 'interferences', 'interferes', 'interfering', 'intermittent', 'intermittently', 'interrupt', 'interrupted', 'interrupting', 'interruption', 'interruptions', 'interrupts', 'intimidation', 'intrusion', 'invalid', 'invalidate', 'invalidated', 'invalidates', 'invalidating', 
'invalidation', 'invalidity', 'investigate', 'investigated', 'investigates', 'investigating', 'investigation', 'investigations', 'involuntarily', 'involuntary', 'irreconcilable', 'irreconcilably', 'irrecoverable', 'irrecoverably', 'irregular', 'irregularities', 'irregularity', 'irregularly', 'irreparable', 'irreparably', 'irreversible', 'jeopardize', 'jeopardized', 'justifiable', 'kickback', 'kickbacks', 'knowingly', 'lack', 'lacked', 'lacking', 'lackluster', 'lacks', 'lag', 'lagged', 'lagging', 'lags', 'lapse', 'lapsed', 'lapses', 'lapsing', 'late', 'laundering', 'layoff', 'layoffs', 'lie', 'limitation', 'limitations', 'lingering', 'liquidate', 'liquidated', 'liquidates', 'liquidating', 'liquidation', 'liquidations', 'liquidator', 'liquidators', 'litigant', 'litigants', 'litigate', 'litigated', 'litigates', 'litigating', 'litigation', 'litigations', 'lockout', 'lockouts', 'lose', 'loses', 'losing', 'loss', 'losses', 'lost', 'lying', 'malfeasance', 'malfunction', 'malfunctioned', 'malfunctioning', 'malfunctions', 'malice', 'malicious', 'maliciously', 'malpractice', 'manipulate', 'manipulated', 'manipulates', 'manipulating', 'manipulation', 'manipulations', 'manipulative', 'markdown', 'markdowns', 'misapplication', 'misapplications', 'misapplied', 'misapplies', 'misapply', 'misapplying', 'misappropriate', 'misappropriated', 'misappropriates', 'misappropriating', 'misappropriation', 'misappropriations', 'misbranded', 'miscalculate', 'miscalculated', 'miscalculates', 'miscalculating', 'miscalculation', 'miscalculations', 'mischaracterization', 'mischief', 'misclassification', 'misclassifications', 'misclassified', 'misclassify', 'miscommunication', 'misconduct', 'misdated', 'misdemeanor', 'misdemeanors', 'misdirected', 'mishandle', 'mishandled', 'mishandles', 'mishandling', 'misinform', 'misinformation', 'misinformed', 'misinforming', 'misinforms', 'misinterpret', 'misinterpretation', 'misinterpretations', 'misinterpreted', 'misinterpreting', 'misinterprets', 
'misjudge', 'misjudged', 'misjudges', 'misjudging', 'misjudgment', 'misjudgments', 'mislabel', 'mislabeled', 'mislabeling', 'mislabelled', 'mislabels', 'mislead', 'misleading', 'misleadingly', 'misleads', 'misled', 'mismanage', 'mismanaged', 'mismanagement', 'mismanages', 'mismanaging', 'mismatch', 'mismatched', 'mismatches', 'mismatching', 'misplaced', 'misprice', 'mispricing', 'mispricings', 'misrepresent', 'misrepresentation', 'misrepresentations', 'misrepresented', 'misrepresenting', 'misrepresents', 'miss', 'missed', 'misses', 'misstate', 'misstated', 'misstatement', 'misstatements', 'misstates', 'misstating', 'misstep', 'missteps', 'mistake', 'mistaken', 'mistakenly', 'mistakes', 'mistaking', 'mistrial', 'mistrials', 'misunderstand', 'misunderstanding', 'misunderstandings', 'misunderstood', 'misuse', 'misused', 'misuses', 'misusing', 'monopolistic', 'monopolists', 'monopolization', 'monopolize', 'monopolized', 'monopolizes', 'monopolizing', 'monopoly', 'moratoria', 'moratorium', 'moratoriums', 'mothballed', 'mothballing', 'negative', 'negatively', 'negatives', 'neglect', 'neglected', 'neglectful', 'neglecting', 'neglects', 'negligence', 'negligences', 'negligent', 'negligently', 'nonattainment', 'noncompetitive', 'noncompliance', 'noncompliances', 'noncompliant', 'noncomplying', 'nonconforming', 'nonconformities', 'nonconformity', 'nondisclosure', 'nonfunctional', 'nonpayment', 'nonpayments', 'nonperformance', 'nonperformances', 'nonperforming', 'nonproducing', 'nonproductive', 'nonrecoverable', 'nonrenewal', 'nuisance', 'nuisances', 'nullification', 'nullifications', 'nullified', 'nullifies', 'nullify', 'nullifying', 'objected', 'objecting', 'objection', 'objectionable', 'objectionably', 'objections', 'obscene', 'obscenity', 'obsolescence', 'obsolete', 'obstacle', 'obstacles', 'obstruct', 'obstructed', 'obstructing', 'obstruction', 'obstructions', 'offence', 'offences', 'offend', 'offended', 'offender', 'offenders', 'offending', 'offends', 'omission', 
'omissions', 'omit', 'omits', 'omitted', 'omitting', 'onerous', 'opportunistic', 'opportunistically', 'oppose', 'opposed', 'opposes', 'opposing', 'opposition', 'oppositions', 'outage', 'outages', 'outdated', 'outmoded', 'overage', 'overages', 'overbuild', 'overbuilding', 'overbuilds', 'overbuilt', 'overburden', 'overburdened', 'overburdening', 'overcapacities', 'overcapacity', 'overcharge', 'overcharged', 'overcharges', 'overcharging', 'overcome', 'overcomes', 'overcoming', 'overdue', 'overestimate', 'overestimated', 'overestimates', 'overestimating', 'overestimation', 'overestimations', 'overload', 'overloaded', 'overloading', 'overloads', 'overlook', 'overlooked', 'overlooking', 'overlooks', 'overpaid', 'overpayment', 'overpayments', 'overproduced', 'overproduces', 'overproducing', 'overproduction', 'overrun', 'overrunning', 'overruns', 'overshadow', 'overshadowed', 'overshadowing', 'overshadows', 'overstate', 'overstated', 'overstatement', 'overstatements', 'overstates', 'overstating', 'oversupplied', 'oversupplies', 'oversupply', 'oversupplying', 'overtly', 'overturn', 'overturned', 'overturning', 'overturns', 'overvalue', 'overvalued', 'overvaluing', 'panic', 'panics', 'penalize', 'penalized', 'penalizes', 'penalizing', 'penalties', 'penalty', 'peril', 'perils', 'perjury', 'perpetrate', 'perpetrated', 'perpetrates', 'perpetrating', 'perpetration', 'persist', 'persisted', 'persistence', 'persistent', 'persistently', 'persisting', 'persists', 'pervasive', 'pervasively', 'pervasiveness', 'petty', 'picket', 'picketed', 'picketing', 'plaintiff', 'plaintiffs', 'plea', 'plead', 'pleaded', 'pleading', 'pleadings', 'pleads', 'pleas', 'pled', 'poor', 'poorly', 'poses', 'posing', 'postpone', 'postponed', 'postponement', 'postponements', 'postpones', 'postponing', 'precipitated', 'precipitous', 'precipitously', 'preclude', 'precluded', 'precludes', 'precluding', 'predatory', 'prejudice', 'prejudiced', 'prejudices', 'prejudicial', 'prejudicing', 'premature', 'prematurely', 
'pressing', 'pretrial', 'preventing', 'prevention', 'prevents', 'problem', 'problematic', 'problematical', 'problems', 'prolong', 'prolongation', 'prolongations', 'prolonged', 'prolonging', 'prolongs', 'prone', 'prosecute', 'prosecuted', 'prosecutes', 'prosecuting', 'prosecution', 'prosecutions', 'protest', 'protested', 'protester', 'protesters', 'protesting', 'protestor', 'protestors', 'protests', 'protracted', 'protraction', 'provoke', 'provoked', 'provokes', 'provoking', 'punished', 'punishes', 'punishing', 'punishment', 'punishments', 'punitive', 'purport', 'purported', 'purportedly', 'purporting', 'purports', 'question', 'questionable', 'questionably', 'questioned', 'questioning', 'questions', 'quit', 'quitting', 'racketeer', 'racketeering', 'rationalization', 'rationalizations', 'rationalize', 'rationalized', 'rationalizes', 'rationalizing', 'reassessment', 'reassessments', 'reassign', 'reassigned', 'reassigning', 'reassignment', 'reassignments', 'reassigns', 'recall', 'recalled', 'recalling', 'recalls', 'recession', 'recessionary', 'recessions', 'reckless', 'recklessly', 'recklessness', 'redact', 'redacted', 'redacting', 'redaction', 'redactions', 'redefault', 'redefaulted', 'redefaults', 'redress', 'redressed', 'redresses', 'redressing', 'refusal', 'refusals', 'refuse', 'refused', 'refuses', 'refusing', 'reject', 'rejected', 'rejecting', 'rejection', 'rejections', 'rejects', 'relinquish', 'relinquished', 'relinquishes', 'relinquishing', 'relinquishment', 'relinquishments', 'reluctance', 'reluctant', 'renegotiate', 'renegotiated', 'renegotiates', 'renegotiating', 'renegotiation', 'renegotiations', 'renounce', 'renounced', 'renouncement', 'renouncements', 'renounces', 'renouncing', 'reparation', 'reparations', 'repossessed', 'repossesses', 'repossessing', 'repossession', 'repossessions', 'repudiate', 'repudiated', 'repudiates', 'repudiating', 'repudiation', 'repudiations', 'resign', 'resignation', 'resignations', 'resigned', 'resigning', 'resigns', 'restate', 
'restated', 'restatement', 'restatements', 'restates', 'restating', 'restructure', 'restructured', 'restructures', 'restructuring', 'restructurings', 'retaliate', 'retaliated', 'retaliates', 'retaliating', 'retaliation', 'retaliations', 'retaliatory', 'retribution', 'retributions', 'revocation', 'revocations', 'revoke', 'revoked', 'revokes', 'revoking', 'ridicule', 'ridiculed', 'ridicules', 'ridiculing', 'riskier', 'riskiest', 'risky', 'sabotage', 'sacrifice', 'sacrificed', 'sacrifices', 'sacrificial', 'sacrificing', 'scandalous', 'scandals', 'scrutinize', 'scrutinized', 'scrutinizes', 'scrutinizing', 'scrutiny', 'secrecy', 'seize', 'seized', 'seizes', 'seizing', 'sentenced', 'sentencing', 'serious', 'seriously', 'seriousness', 'setback', 'setbacks', 'sever', 'severe', 'severed', 'severely', 'severities', 'severity', 'sharply', 'shocked', 'shortage', 'shortages', 'shortfall', 'shortfalls', 'shrinkage', 'shrinkages', 'shut', 'shutdown', 'shutdowns', 'shuts', 'shutting', 'slander', 'slandered', 'slanderous', 'slanders', 'slippage', 'slippages', 'slow', 'slowdown', 'slowdowns', 'slowed', 'slower', 'slowest', 'slowing', 'slowly', 'slowness', 'sluggish', 'sluggishly', 'sluggishness', 'solvencies', 'solvency', 'spam', 'spammers', 'spamming', 'staggering', 'stagnant', 'stagnate', 'stagnated', 'stagnates', 'stagnating', 'stagnation', 'standstill', 'standstills', 'stolen', 'stoppage', 'stoppages', 'stopped', 'stopping', 'stops', 'strain', 'strained', 'straining', 'strains', 'stress', 'stressed', 'stresses', 'stressful', 'stressing', 'stringent', 'subjected', 'subjecting', 'subjection', 'subpoena', 'subpoenaed', 'subpoenas', 'substandard', 'sue', 'sued', 'sues', 'suffer', 'suffered', 'suffering', 'suffers', 'suing', 'summoned', 'summoning', 'summons', 'summonses', 'susceptibility', 'susceptible', 'suspect', 'suspected', 'suspects', 'suspend', 'suspended', 'suspending', 'suspends', 'suspension', 'suspensions', 'suspicion', 'suspicions', 'suspicious', 'suspiciously', 'taint', 
'tainted', 'tainting', 'taints', 'tampered', 'tense', 'terminate', 'terminated', 'terminates', 'terminating', 'termination', 'terminations', 'testify', 'testifying', 'threat', 'threaten', 'threatened', 'threatening', 'threatens', 'threats', 'tightening', 'tolerate', 'tolerated', 'tolerates', 'tolerating', 'toleration', 'tortuous', 'tortuously', 'tragedies', 'tragedy', 'tragic', 'tragically', 'traumatic', 'trouble', 'troubled', 'troubles', 'turbulence', 'turmoil', 'unable', 'unacceptable', 'unacceptably', 'unaccounted', 'unannounced', 'unanticipated', 'unapproved', 'unattractive', 'unauthorized', 'unavailability', 'unavailable', 'unavoidable', 'unavoidably', 'unaware', 'uncollectable', 'uncollected', 'uncollectibility', 'uncollectible', 'uncollectibles', 'uncompetitive', 'uncompleted', 'unconscionable', 'unconscionably', 'uncontrollable', 'uncontrollably', 'uncontrolled', 'uncorrected', 'uncover', 'uncovered', 'uncovering', 'uncovers', 'undeliverable', 'undelivered', 'undercapitalized', 'undercut', 'undercuts', 'undercutting', 'underestimate', 'underestimated', 'underestimates', 'underestimating', 'underestimation', 'underfunded', 'underinsured', 'undermine', 'undermined', 'undermines', 'undermining', 'underpaid', 'underpayment', 'underpayments', 'underpays', 'underperform', 'underperformance', 'underperformed', 'underperforming', 'underperforms', 'underproduced', 'underproduction', 'underreporting', 'understate', 'understated', 'understatement', 'understatements', 'understates', 'understating', 'underutilization', 'underutilized', 'undesirable', 'undesired', 'undetected', 'undetermined', 'undisclosed', 'undocumented', 'undue', 'unduly', 'uneconomic', 'uneconomical', 'uneconomically', 'unemployed', 'unemployment', 'unethical', 'unethically', 'unexcused', 'unexpected', 'unexpectedly', 'unfair', 'unfairly', 'unfavorability', 'unfavorable', 'unfavorably', 'unfavourable', 'unfeasible', 'unfit', 'unfitness', 'unforeseeable', 'unforeseen', 'unforseen', 'unfortunate', 
'unfortunately', 'unfounded', 'unfriendly', 'unfulfilled', 'unfunded', 'uninsured', 'unintended', 'unintentional', 'unintentionally', 'unjust', 'unjustifiable', 'unjustifiably', 'unjustified', 'unjustly', 'unknowing', 'unknowingly', 'unlawful', 'unlawfully', 'unlicensed', 'unliquidated', 'unmarketable', 'unmerchantable', 'unmeritorious', 'unnecessarily', 'unnecessary', 'unneeded', 'unobtainable', 'unoccupied', 'unpaid', 'unperformed', 'unplanned', 'unpopular', 'unpredictability', 'unpredictable', 'unpredictably', 'unpredicted', 'unproductive', 'unprofitability', 'unprofitable', 'unqualified', 'unrealistic', 'unreasonable', 'unreasonableness', 'unreasonably', 'unreceptive', 'unrecoverable', 'unrecovered', 'unreimbursed', 'unreliable', 'unremedied', 'unreported', 'unresolved', 'unrest', 'unsafe', 'unsalable', 'unsaleable', 'unsatisfactory', 'unsatisfied', 'unsavory', 'unscheduled', 'unsellable', 'unsold', 'unsound', 'unstabilized', 'unstable', 'unsubstantiated', 'unsuccessful', 'unsuccessfully', 'unsuitability', 'unsuitable', 'unsuitably', 'unsuited', 'unsure', 'unsuspected', 'unsuspecting', 'unsustainable', 'untenable', 'untimely', 'untrusted', 'untruth', 'untruthful', 'untruthfully', 'untruthfulness', 'untruths', 'unusable', 'unwanted', 'unwarranted', 'unwelcome', 'unwilling', 'unwillingness', 'upset', 'urgency', 'urgent', 'usurious', 'usurp', 'usurped', 'usurping', 'usurps', 'usury', 'vandalism', 'verdict', 'verdicts', 'vetoed', 'victims', 'violate', 'violated', 'violates', 'violating', 'violation', 'violations', 'violative', 'violator', 'violators', 'violence', 'violent', 'violently', 'vitiate', 'vitiated', 'vitiates', 'vitiating', 'vitiation', 'voided', 'voiding', 'volatile', 'volatility', 'vulnerabilities', 'vulnerability', 'vulnerable', 'vulnerably', 'warn', 'warned', 'warning', 'warnings', 'warns', 'wasted', 'wasteful', 'wasting', 'weak', 'weaken', 'weakened', 'weakening', 'weakens', 'weaker', 'weakest', 'weakly', 'weakness', 'weaknesses', 'willfully', 
'worries', 'worry', 'worrying', 'worse', 'worsen', 'worsened', 'worsening', 'worsens', 'worst', 'worthless', 'writedown', 'writedowns', 'writeoff', 'writeoffs', 'wrong', 'wrongdoing', 'wrongdoings', 'wrongful', 'wrongfully', 'wrongly'], 'Positive': ['able', 'abundance', 'abundant', 'acclaimed', 'accomplish', 'accomplished', 'accomplishes', 'accomplishing', 'accomplishment', 'accomplishments', 'achieve', 'achieved', 'achievement', 'achievements', 'achieves', 'achieving', 'adequately', 'advancement', 'advancements', 'advances', 'advancing', 'advantage', 'advantaged', 'advantageous', 'advantageously', 'advantages', 'alliance', 'alliances', 'assure', 'assured', 'assures', 'assuring', 'attain', 'attained', 'attaining', 'attainment', 'attainments', 'attains', 'attractive', 'attractiveness', 'beautiful', 'beautifully', 'beneficial', 'beneficially', 'benefit', 'benefited', 'benefiting', 'benefitted', 'benefitting', 'best', 'better', 'bolstered', 'bolstering', 'bolsters', 'boom', 'booming', 'boost', 'boosted', 'breakthrough', 'breakthroughs', 'brilliant', 'charitable', 'collaborate', 'collaborated', 'collaborates', 'collaborating', 'collaboration', 'collaborations', 'collaborative', 'collaborator', 'collaborators', 'compliment', 'complimentary', 'complimented', 'complimenting', 'compliments', 'conclusive', 'conclusively', 'conducive', 'confident', 'constructive', 'constructively', 'courteous', 'creative', 'creatively', 'creativeness', 'creativity', 'delight', 'delighted', 'delightful', 'delightfully', 'delighting', 'delights', 'dependability', 'dependable', 'desirable', 'desired', 'despite', 'destined', 'diligent', 'diligently', 'distinction', 'distinctions', 'distinctive', 'distinctively', 'distinctiveness', 'dream', 'easier', 'easily', 'easy', 'effective', 'efficiencies', 'efficiency', 'efficient', 'efficiently', 'empower', 'empowered', 'empowering', 'empowers', 'enable', 'enabled', 'enables', 'enabling', 'encouraged', 'encouragement', 'encourages', 'encouraging', 
'enhance', 'enhanced', 'enhancement', 'enhancements', 'enhances', 'enhancing', 'enjoy', 'enjoyable', 'enjoyably', 'enjoyed', 'enjoying', 'enjoyment', 'enjoys', 'enthusiasm', 'enthusiastic', 'enthusiastically', 'excellence', 'excellent', 'excelling', 'excels', 'exceptional', 'exceptionally', 'excited', 'excitement', 'exciting', 'exclusive', 'exclusively', 'exclusiveness', 'exclusives', 'exclusivity', 'exemplary', 'fantastic', 'favorable', 'favorably', 'favored', 'favoring', 'favorite', 'favorites', 'friendly', 'gain', 'gained', 'gaining', 'gains', 'good', 'great', 'greater', 'greatest', 'greatly', 'greatness', 'happiest', 'happily', 'happiness', 'happy', 'highest', 'honor', 'honorable', 'honored', 'honoring', 'honors', 'ideal', 'impress', 'impressed', 'impresses', 'impressing', 'impressive', 'impressively', 'improve', 'improved', 'improvement', 'improvements', 'improves', 'improving', 'incredible', 'incredibly', 'influential', 'informative', 'ingenuity', 'innovate', 'innovated', 'innovates', 'innovating', 'innovation', 'innovations', 'innovative', 'innovativeness', 'innovator', 'innovators', 'insightful', 'inspiration', 'inspirational', 'integrity', 'invent', 'invented', 'inventing', 'invention', 'inventions', 'inventive', 'inventiveness', 'inventor', 'inventors', 'leadership', 'leading', 'loyal', 'lucrative', 'meritorious', 'opportunities', 'opportunity', 'optimistic', 'outperform', 'outperformed', 'outperforming', 'outperforms', 'perfect', 'perfected', 'perfectly', 'perfects', 'pleasant', 'pleasantly', 'pleased', 'pleasure', 'plentiful', 'popular', 'popularity', 'positive', 'positively', 'preeminence', 'preeminent', 'premier', 'premiere', 'prestige', 'prestigious', 'proactive', 'proactively', 'proficiency', 'proficient', 'proficiently', 'profitability', 'profitable', 'profitably', 'progress', 'progressed', 'progresses', 'progressing', 'prospered', 'prospering', 'prosperity', 'prosperous', 'prospers', 'rebound', 'rebounded', 'rebounding', 'receptive', 'regain', 
'regained', 'regaining', 'resolve', 'revolutionize', 'revolutionized', 'revolutionizes', 'revolutionizing', 'reward', 'rewarded', 'rewarding', 'rewards', 'satisfaction', 'satisfactorily', 'satisfactory', 'satisfied', 'satisfies', 'satisfy', 'satisfying', 'smooth', 'smoothing', 'smoothly', 'smooths', 'solves', 'solving', 'spectacular', 'spectacularly', 'stability', 'stabilization', 'stabilizations', 'stabilize', 'stabilized', 'stabilizes', 'stabilizing', 'stable', 'strength', 'strengthen', 'strengthened', 'strengthening', 'strengthens', 'strengths', 'strong', 'stronger', 'strongest', 'succeed', 'succeeded', 'succeeding', 'succeeds', 'success', 'successes', 'successful', 'successfully', 'superior', 'surpass', 'surpassed', 'surpasses', 'surpassing', 'transparency', 'tremendous', 'tremendously', 'unmatched', 'unparalleled', 'unsurpassed', 'upturn', 'upturns', 'valuable', 'versatile', 'versatility', 'vibrancy', 'vibrant', 'win', 'winner', 'winners', 'winning', 'worthy']} # Henry's (2008) Word List # Henry, Elaine. “Are Investors Influenced By How Earnings Press Releases Are Written.” The Journal of Business # Communication (1973) 45, no. 4 (2008): 363–407. 
hdict = {'Negative': ['negative', 'negatives', 'fail', 'fails', 'failing', 'failure', 'weak', 'weakness', 'weaknesses', 'difficult', 'difficulty', 'hurdle', 'hurdles', 'obstacle', 'obstacles', 'slump', 'slumps', 'slumping', 'slumped', 'uncertain', 'uncertainty', 'unsettled', 'unfavorable', 'downturn', 'depressed', 'disappoint', 'disappoints', 'disappointing', 'disappointed', 'disappointment', 'risk', 'risks', 'risky', 'threat', 'threats', 'penalty', 'penalties', 'down', 'decrease', 'decreases', 'decreasing', 'decreased', 'decline', 'declines', 'declining', 'declined', 'fall', 'falls', 'falling', 'fell', 'fallen', 'drop', 'drops', 'dropping', 'dropped', 'deteriorate', 'deteriorates', 'deteriorating', 'deteriorated', 'worsen', 'worsens', 'worsening', 'weaken', 'weakens', 'weakening', 'weakened', 'worse', 'worst', 'low', 'lower', 'lowest', 'less', 'least', 'smaller', 'smallest', 'shrink', 'shrinks', 'shrinking', 'shrunk', 'below', 'under', 'challenge', 'challenges', 'challenging', 'challenged'], 'Positive': ['positive', 'positives', 'success', 'successes', 'successful', 'succeed', 'succeeds', 'succeeding', 'succeeded', 'accomplish', 'accomplishes', 'accomplishing', 'accomplished', 'accomplishment', 'accomplishments', 'strong', 'strength', 'strengths', 'certain', 'certainty', 'definite', 'solid', 'excellent', 'good', 'leading', 'achieve', 'achieves', 'achieved', 'achieving', 'achievement', 'achievements', 'progress', 'progressing', 'deliver', 'delivers', 'delivered', 'delivering', 'leader', 'leading', 'pleased', 'reward', 'rewards', 'rewarding', 'rewarded', 'opportunity', 'opportunities', 'enjoy', 'enjoys', 'enjoying', 'enjoyed', 'encouraged', 'encouraging', 'up', 'increase', 'increases', 'increasing', 'increased', 'rise', 'rises', 'rising', 'rose', 'risen', 'improve', 'improves', 'improving', 'improved', 'improvement', 'improvements', 'strengthen', 'strengthens', 'strengthening', 'strengthened', 'stronger', 'strongest', 'better', 'best', 'more', 'most', 'above', 
'record', 'high', 'higher', 'highest', 'greater', 'greatest', 'larger', 'largest', 'grow', 'grows', 'growing', 'grew', 'grown', 'growth', 'expand', 'expands', 'expanding', 'expanded', 'expansion', 'exceed', 'exceeds', 'exceeded', 'exceeding', 'beat', 'beats', 'beating']}

negate = ["aint", "arent", "cannot", "cant", "couldnt", "darent", "didnt", "doesnt", "ain't", "aren't", "can't",
          "couldn't", "daren't", "didn't", "doesn't", "dont", "hadnt", "hasnt", "havent", "isnt", "mightnt", "mustnt",
          "neither", "don't", "hadn't", "hasn't", "haven't", "isn't", "mightn't", "mustn't", "neednt", "needn't",
          "never", "none", "nope", "nor", "not", "nothing", "nowhere", "oughtnt", "shant", "shouldnt", "wasnt",
          "werent", "oughtn't", "shan't", "shouldn't", "wasn't", "weren't", "without", "wont", "wouldnt", "won't",
          "wouldn't", "rarely", "seldom", "despite", "no", "nobody"]


def negated(word):
    """
    Determine if preceding word is a negation word
    """
    return word.lower() in negate


def tone_count_with_negation_check(tone_dict, article):
    """
    Count positive and negative words with negation check. Account for simple negation only for positive words.
    Simple negation is taken to be a negate word occurring within three words preceding a positive word.
    """
    pos_count = 0
    neg_count = 0

    pos_words = []
    neg_words = []

    input_words = re.findall(r'\b([a-zA-Z]+n\'t|[a-zA-Z]+\'s|[a-zA-Z]+)\b', article.lower())

    word_count = len(input_words)

    for i in range(0, word_count):
        if input_words[i] in tone_dict['Negative']:
            neg_count += 1
            neg_words.append(input_words[i])
        if input_words[i] in tone_dict['Positive']:
            # Look back at up to three preceding words (fewer near the start of the article)
            if any(negated(w) for w in input_words[max(0, i - 3):i]):
                neg_count += 1
                neg_words.append(input_words[i] + ' (with negation)')
            else:
                pos_count += 1
                pos_words.append(input_words[i])

    print('The results with negation check:', end='\n\n')
    print('The # of positive words:', pos_count)
    print('The # of negative words:', neg_count)
    print('The list of found positive words:', pos_words)
    print('The list of found negative words:', neg_words)
    print('\n', end='')

    results = [word_count, pos_count, neg_count, pos_words, neg_words]

    return results


# A sample output
article = '''Patent infringement pursued against same companies in U.S. District Court. Test "wasn't good". SUNNYVALE, Calif.--(BUSINESS WIRE)--December 02, 2010-- Rambus Inc. (Nasdaq:RMBS), one of the world's premier technology licensing companies, today announced it has filed a complaint with the United States International Trade Commission (ITC) requesting the commencement of an investigation pertaining to products from Broadcom Corporation, Freescale Semiconductor, Inc., LSI Corporation, MediaTek Inc., NVIDIA Corporation and STMicroelectronics N. V.
The complaint seeks an exclusion order barring the importation, sale for importation, or sale after importation of products from Broadcom, Freescale, LSI, NVIDIA and STMicroelectronics that infringe certain patents from the Dally1 family of patents, and of products from Broadcom, Freescale, LSI, MediaTek and STMicroelectronics that infringe certain patents from the Barth family of patents. In an earlier investigation requested by Rambus the ITC found that these same Barth patents were valid and infringed by NVIDIA products, and issued an exclusion order in July of this year. "We have been attempting to license these companies for some time to no avail. One of the respondents frankly told us that the only way they would get serious is if we sued them. Others pursued a strategy of delay rather than negotiate a reasonable resolution," said Harold Hughes, president and chief executive officer at Rambus. "Rambus has invested hundreds of millions of dollars developing a portfolio of technologies that are foundational for many digital electronics. There is widespread knowledge within the industry about our patents including their use in standards-compatible products accused in these actions. In fairness to our shareholders and to our paying licensees, we take these steps to protect our patented innovations and pursue fair compensation for their use." For the Dally patents, the accused semiconductor products from these companies include ones that incorporate PCI Express, certain Serial ATA, certain Serial Attached SCSI (SAS), and DisplayPort interfaces. In the case of the Barth patents, the accused semiconductor products include ones that incorporate DDR, DDR2, DDR3, mobile DDR, LPDDR, LPDDR2, and GDDR3 memory controllers. Accused semiconductor products in the complaint include graphics processors, media processors, communications processors, chip sets and other logic integrated circuits (ICs). 
In addition to Broadcom, Freescale, LSI, MediaTek, NVIDIA and STMicroelectronics, the ITC complaint names companies whose products incorporate the accused semiconductor products and are imported, sold for importation, or sold after importation into the United States. These products include personal computers, workstations, servers, routers, mobile phones and other handheld devices, set-top boxes, Blu-ray players motherboards, plug-in cards, hard drives and modems. The ITC is expected to decide whether to initiate an investigation under this complaint within 30-45 days. Rambus today also filed separate actions for patent infringement against Broadcom, Freescale, LSI, MediaTek and STMicroelectronics in the United States District Court for the Northern District of California. The lawsuits allege that semiconductor products with certain memory controllers and/or serial links from the above companies infringe certain patents from the Farmwald-Horowitz, Barth, and Dally patent families. In the case of MediaTek, only infringement of the Barth and Farmwald-Horowitz patents for certain memory controllers is alleged. Rambus also filed an action in the United States District Court for the Northern District of California against NVIDIA for infringement of certain Dally patents. The categories of accused semiconductor products in the District Court complaints include the same categories accused in the ITC complaint, as well as SDR memory controllers. Rambus is seeking injunctive relief barring the infringement, contributory infringement, and inducement to infringe the patents, as well as monetary damages. Rambus management will discuss the filing of these actions during a special conference call today at 5:00 p.m. PT. The call will be webcast and can be accessed through the Rambus website. 
A replay will be available following the call on Rambus' Investor Relations website or for one week at the following numbers: (800) 642-1687 (domestic) or (706) 645-9291 (international) with ID# 29122159. Further information regarding these legal actions will be made available at http://investor.rambus.com in the Litigation Update section. 1 Rambus is the exclusive licensee for the Dally family of patents which are owned by Massachusetts Institute of Technology. This license was assigned to Rambus as a part of its 2003 acquisition of technology and IP from Velio Communications, a company founded by Dr. William Dally.
'''

tone_count_with_negation_check(lmdict, article)
[Original Post] I found two internet resources for this task (thanks to both authors):
- https://iangow.wordpress.com/2014/07/22/get-tone-from-corporate-disclosures-postgresql-python-and-r/
- http://conjugateprior.org/software/ca-in-python/
The first solution is way more efficient than the second, but the second is more straightforward. The first needs extra knowledge of PostgreSQL and R besides Python. I borrowed from both resources and wrote the Python code below.
Please note, to use the Python code, you have to know how to assign the full text of an article of interest to the variable text, and how to output the total word count and the counts of positive/negative words in text.
import re

# Get tone dictionary
with open('lmdict.txt') as f:
    lines = f.readlines()

dict = {}
for l in lines:
    if l[0:2] == '>>':
        cat = l[2:].strip()
        dict[cat] = []
    else:
        l = l.strip()
        if l:
            dict[cat].append(l)

# Set up regular expressions
regex = {}
for cat in dict.keys():
    pattern = '\\b(?:' + '|'.join(dict[cat]) + ')\\b'
    regex[cat] = re.compile(pattern, re.IGNORECASE)

# Get tone count (the variable text must already hold the full text of the article)
count = {}
wordcount = len(text.split())
for cat in dict.keys():
    count[cat] = len(regex[cat].findall(text))
print(count)
In the first part of the code, I read the dictionary or the word list into a Python dictionary variable. The word list used here is supposed to be a .txt file and in the following format:
>>positive
BETTER
SUCCESS
VALUABLE

>>negative
ABANDON
ABNORMAL
ANNOY
For accounting and finance research, a commonly used positive/negative word list was developed by Bill McDonald. See his website.
In the second part of the code, I create regular expressions that are used to find occurrences of positive/negative words. The last few lines of code get the counts of positive/negative words in the text.
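The two steps described above can be tried without any files. The following is only a minimal sketch under stated assumptions: the tiny in-memory word lists stand in for lmdict.txt, and the names sample_lists, build_patterns and count_tone are hypothetical, not part of the original code.

```python
import re

# Hypothetical miniature word lists standing in for the real dictionary file.
sample_lists = {
    'positive': ['BETTER', 'SUCCESS', 'VALUABLE'],
    'negative': ['ABANDON', 'ABNORMAL', 'ANNOY'],
}


def build_patterns(word_lists):
    """Compile one case-insensitive, whole-word regex per category."""
    patterns = {}
    for cat, words in word_lists.items():
        # re.escape is a precaution in case a list entry contains regex metacharacters
        alternation = '|'.join(re.escape(w) for w in words)
        patterns[cat] = re.compile(r'\b(?:' + alternation + r')\b', re.IGNORECASE)
    return patterns


def count_tone(text, patterns):
    """Return the number of matches per category plus a whitespace-based word count."""
    counts = {cat: len(p.findall(text)) for cat, p in patterns.items()}
    counts['total'] = len(text.split())
    return counts


sample_text = 'A better product is a success, but do not abandon caution.'
result = count_tone(sample_text, build_patterns(sample_lists))
print(result)
```

Compiling one alternation per category means each article is scanned once per category rather than once per dictionary word, which is where the speed-up over word-by-word loops comes from.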
I agree that my solution is more complex. But in part that’s because it’s a more complete solution. One has to download and process the data from Bill McDonald (“see his website for download” implies undocumented steps in the process). Then one has to organize and perhaps process the text so it can be fed to the Python function. Finally, one needs to handle the output.
I think the first step on my site could be done in Python (rather than R … my decision to use R is more a reflection of my comparative advantage in R than anything inherent to Python). And the second step could be done without PostgreSQL (especially if the first step is done in Python). I think a “pure Python” approach would be more elegant than what I have, at least as a code illustration.
Hi Ian, happy to hear your thoughts promptly – I like your blog and really benefit from it.
I like how you deal with the regular expression pattern. It is very efficient, saving the trouble of using too many loops. In my experiment, your code is about 6 times faster than the other. I agree that your solution is more complete, and that reading texts from and outputting tone counts to a database is a better idea than reading/writing CSV. In my code, I do bypass the feeding and outputting parts.
Hi Kai, I’m new to Python, so I really appreciate your code!
Unfortunately, it doesn’t work for me though. A few errors occurred:
#1 NameError: name ‘re’ is not defined -> I added “import re”, which helped I guess
#2 NameError: name ‘text’ is not defined -> I defined text as text = “Bsp.text” (which is the document I would like to analyse). This also seemed to help, at least the error does not occur anymore.
#3 NameError: name ‘count’ is not defined -> I really don’t know how to fix this one though… Can you help me please?
Thanks in advance!
Hi Kai,
I’ve already solved my problem.
Here is the last part of the code (if anyone should be interested):
# Get tone count
with open('Bsp.txt', 'r') as content_file:
    content = content_file.read()
count = {}
wordcount = len(content.split())
for cat in dict.keys():
    count[cat] = len(regex[cat].findall(content))
print(count)
Thanks and have a nice day. 🙂
Thank you Mu Civ, it helps a lot!
Apart from the fact that your code doesn’t actually work, it’s great.
I never meant to provide click-and-run code. If the code does not work on your computer, you should do more debugging on your own. Many factors (Python version, operating system, …) can cause it to break at runtime.
The code does work. I really appreciate your efforts, Kai Chen. Thanks for sharing.
Great website! Good job! Thank you!
Good code!
For my own code, I realize that I only tested for negating words immediately preceding the positive words, instead of within 3 words. I didn’t read Loughran and McDonald (2011) carefully.
I also realize that it would be even better if we first tokenize an article into sentences and do the negation test within the boundary of each sentence.
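As a rough illustration of the sentence-boundary idea, here is a sketch that runs the three-word negation window inside each sentence only. Everything in it is an assumption made for illustration: the word sets are tiny placeholders, the sentence splitter is a naive regex that will mis-split abbreviations such as "U.S.", and the function names are hypothetical.

```python
import re

# Illustrative subsets only; the real lists are far longer.
NEGATE = {'not', 'never', 'no', "isn't", "wasn't"}
POSITIVE = {'good', 'strong', 'improved'}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def count_positive_by_sentence(text, window=3):
    """Return (positive, negated-positive) counts, never looking past a sentence start."""
    pos, flipped = 0, 0
    for sentence in split_sentences(text):
        words = re.findall(r"[a-z]+n't|[a-z]+", sentence.lower())
        for i, w in enumerate(words):
            if w in POSITIVE:
                # Only words from the same sentence can negate this one
                if any(p in NEGATE for p in words[max(0, i - window):i]):
                    flipped += 1
                else:
                    pos += 1
    return pos, flipped


example = "Margins did not move. Strong sales followed. The test wasn't good."
print(count_positive_by_sentence(example))
```

In this example "Strong" stays positive because the "not" belongs to the previous sentence; a window that ignores sentence boundaries would have flipped it.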
For the definition of words, there is indeed no single definition. For example, Loughran and McDonald seem to define a word as [a-zA-Z]+. In their master dictionary, you can see “email”, but not “e-mail”. “e-mail” will become two words: “e” and “mail”. By the same definition, “10-K” will become “K”. Sometimes people remove single-letter words. If you use nltk’s word tokenizer, “couldn’t” will become “could” and “n’t”, “company’s” will become “company” and “’s”, “e-mail” will still be “e-mail”, and “$5.0” will become “$” and “5.0”. People often apply further screening to remove punctuation and tokens containing digits or punctuation. I find that after removing punctuation, the nltk tokens will be very close to Microsoft’s definition of words.
Papers often do not make their own definitions clear. This makes replication difficult.
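To see how much the definition of a “word” matters, the short sketch below applies two regex-based definitions to the same string: the plain [a-zA-Z]+ rule discussed above, and the contraction-aware pattern used in the program at the top of this post. The function names and the sample string are made up for illustration, and nltk is deliberately left out so that the example needs nothing beyond the standard library.

```python
import re


def alphabetic_tokens(text):
    """Treat every maximal run of ASCII letters as one word."""
    return re.findall(r'[a-zA-Z]+', text)


def tokens_keeping_contractions(text):
    """Variant that keeps "n't" contractions and "'s" possessives in one piece."""
    return re.findall(r"\b([a-zA-Z]+n't|[a-zA-Z]+'s|[a-zA-Z]+)\b", text)


sample = "The company's 10-K said e-mail couldn't cost $5.0"
print(alphabetic_tokens(sample))
print(tokens_keeping_contractions(sample))
```

The first definition yields ten tokens for this string and the second yields eight, so even the total word count used as a denominator depends on the choice.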
Hi Victor, thanks for your inputs. FYI – McDonald didn’t use 3 words; other researchers did, which I think is better.
Hi Kai,
Thank you for this. Do you think you could inform me what the results of the test article return? I’d like to ensure the slight terminology modifications I made return the same results as intended. I find 4 positive words, 38 negative words, and 726 total words.
Thanks!
Hi Kai,
I just wanted to say thank you for providing the code! It is simple, flexible and addresses the issues of negation. I’m relatively new to python and could easily apply and adapt it.
Hi Kai!
First of all, thanks a lot for sharing your code! As a Python newbie, you really helped me out with that a lot! As I am trying to conduct a sentiment analysis of corporate CSR reports, I am looking for ways to make my analysis more robust. With that in mind, I am wondering whether it is possible to adjust your code in a way that the dictionary words are weighted on basis of their Inverse Document Frequency (IDF) instead of weighting them equally.
Do you perhaps know a way how to include bag-of-words and TF-IDF in your above code?
I would be extremely grateful for any help that you could possibly provide me with!
Thanks so much in advance and best of wishes!
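One possible way to approach the weighting question is sketched below. It is only an illustration under stated assumptions: it uses one common tf-idf variant (raw count times log(N/df)), which is not necessarily the weighting scheme of any particular paper, the three-document corpus and word set are invented, and the names tokenize, idf_weights and weighted_tone are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r'[a-z]+', text.lower())


def idf_weights(documents, vocabulary):
    """idf(w) = log(N / df(w)) for each dictionary word found in at least one document."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # A word counts once per document, however often it appears there
        doc_freq.update(set(tokenize(doc)) & vocabulary)
    return {w: math.log(n_docs / df) for w, df in doc_freq.items()}


def weighted_tone(text, words, weights):
    """Sum of (count in this text) * (idf weight) over the dictionary words."""
    counts = Counter(t for t in tokenize(text) if t in words)
    return sum(n * weights.get(w, 0.0) for w, n in counts.items())


# Invented mini-corpus and word set, for illustration only.
corpus = [
    'Strong growth and strong margins',
    'Growth was weak',
    'Weak demand hurt growth',
]
positive = {'strong', 'growth'}

weights = idf_weights(corpus, positive)
score = weighted_tone(corpus[0], positive, weights)
print(weights, score)
```

Here “growth” appears in every document and so receives a weight of zero, while “strong” appears in one document out of three and carries the whole score. Negation handling would still have to be applied before the weighted sum.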