Create Plain Text Transcript From Incremental SRT Subtitle

Hi, my goal is a script that removes duplicate lines of text from incremental-style subtitles and merges what remains into a plain text transcript containing only the text of the subtitle file (without subtitle timings, formatting commands, or control characters).

An example to help you help me.

I want this SRT file with incremental subtitles (two rows of subtitles are shown at all times: the currently spoken line and the previous line for slow readers, similar to the real-time captions you see on TV):

Code:

18
00:00:23,999 --> 00:00:24,009
we've already discussed research and

19
00:00:24,009 --> 00:00:26,460
we've already discussed research and
prevalence barriers and assessment shoes

20
00:00:26,460 --> 00:00:26,470
prevalence barriers and assessment shoes

21
00:00:26,470 --> 00:00:28,769
prevalence barriers and assessment shoes
and now we'll discuss where you can send

22
00:00:28,769 --> 00:00:28,779
and now we'll discuss where you can send

23
00:00:28,779 --> 00:00:31,649
and now we'll discuss where you can send
a client what kind of treatment you can

24
00:00:31,649 --> 00:00:31,659
a client what kind of treatment you can

25
00:00:31,659 --> 00:00:35,000
a client what kind of treatment you can
use and what approaches are available

26
00:00:35,000 --> 00:00:35,010
use and what approaches are available

27
00:00:35,010 --> 00:00:37,560
use and what approaches are available
within the general population

28
00:00:37,560 --> 00:00:37,570
within the general population

29
00:00:37,570 --> 00:00:38,970
within the general population
there are many different approaches to

30
00:00:38,970 --> 00:00:38,980
there are many different approaches to

To become this:

Code:

we've already discussed research and prevalence barriers and assessment shoes and now we'll discuss where you can send a client what kind of treatment you can use and what approaches are available within the general population there are many different approaches to

All the text is merged onto one line with no overlapping duplicates. I'd prefer to use command-line tools available on macOS if possible; I use MacPorts and have GNU Core Utilities installed. I can use GUI software, or Windows 10 in a virtual machine, for a one-off quick fix, but I'd like an automated bash script or similar that I can trigger on macOS.

My scripting abilities are quite rudimentary, even though I dabble from time to time. My first thought is to regex out the subtitle numbers and time codes, which should be easy enough even for me. But how to set up the array, how to compare complete lines against partial lines up to maybe 5 or 6 subtitles forward or back in the array, and then how to concatenate whatever is left is well over my head. I would much appreciate guidance on figuring this out.
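
For anyone picking this up, here is a minimal sketch of the steps described above, written in Python rather than pure bash and assuming Python 3 is available (for example through MacPorts). The function name srt_to_transcript, the lookback parameter, and the tag-stripping pattern are illustrative choices, not part of any existing tool. It relies on the observation that, in the sample, each repeated line immediately follows its earlier copy once the cue numbers and time codes are gone, so comparing against the most recently kept line is enough; a larger lookback would also drop phrases the speaker genuinely repeats.

```python
import re
import sys

# Matches an SRT timing line such as "00:00:23,999 --> 00:00:24,009".
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)
# Assumption: formatting commands look like <i>...</i> or {\an8}.
TAGS = re.compile(r"<[^>]+>|\{[^}]*\}")


def srt_to_transcript(srt_text, lookback=1):
    """Return the subtitle text as one line, dropping cue numbers,
    time codes, and lines already seen among the last `lookback` kept lines."""
    # Drop blank lines first so double-spaced files behave the same way.
    lines = [ln.strip() for ln in srt_text.lstrip("\ufeff").splitlines() if ln.strip()]
    kept = []
    for i, line in enumerate(lines):
        if TIMING.match(line):
            continue  # time code line
        next_is_timing = i + 1 < len(lines) and TIMING.match(lines[i + 1])
        if line.isdigit() and next_is_timing:
            continue  # cue number (a digits-only line right before a time code)
        text = TAGS.sub("", line).strip()
        if not text or text in kept[-lookback:]:
            continue  # empty after tag removal, or carried over from an earlier cue
        kept.append(text)
    return " ".join(kept)


# Only runs when a file name is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8-sig") as handle:
        print(srt_to_transcript(handle.read()))
```

Saved under a hypothetical name such as srt2txt.py, it could be triggered from a bash script with: python3 srt2txt.py samplesubtitle.srt > transcript.txt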

I've attached the sample subtitle as seen in the screenshot for the convenience of anyone who wants to play around with this and help me out.

Thanks

Attached Files

samplesubtitle.srt (1.1 KB)
