tl;dr How to use:
Install Tampermonkey (or another userscript manager) in your browser.
Install the userscript:
- Click here.
- You should be prompted with a Tampermonkey screen asking to install Panopto Caption.
- Click Install.
Download subtitles!
- Navigate to whatever Panopto video you want to download captions from.
- Within moments, the Panopto Caption userscript will automatically download a subtitle file for that video.
Motive
Panopto is a video hosting platform made for businesses and universities.
Because of the kind of videos hosted on Panopto (e.g. university lectures), being able to download them can be convenient. There are already methods (such as HLS downloader browser extensions) for downloading the videos when the viewer interface offers no download option. However, these usually download only the video and not the captions that accompany it in Panopto.
To fill this gap, I created the userscript Panopto Caption (a userscript is a custom file that can execute JavaScript on the client side of a website) to automatically generate a subtitle file from Panopto. Here is how Panopto Caption works:
How it works
Subtitle files (ending with ‘.srt’) are formatted as follows:
```
1
00:00:00,000 --> 00:00:03,111
First caption

2
00:00:03,111 --> 00:00:05,123
Second caption

3
00:00:05,123 --> 00:00:10,321
Third caption

```
Each caption in the subtitle file is accompanied by two things:
- Sequence number
- Time span
The sequence number indicates the caption's position in the sequence of captions (e.g. the caption with sequence number 3 is the third caption). The time span gives the caption's start and end times (e.g. the second caption above appears 3.111 seconds into the video and disappears at 5.123 seconds).
So for the first 3.111 seconds, the subtitle is “First caption.”
From 3.111 to 5.123 seconds, the subtitle is “Second caption.”
From 5.123 to 10.321 seconds, the subtitle is “Third caption.”
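To make the layout concrete, here is a minimal sketch that builds the example above from a list of cues. The helpers `toSrtTimestamp` and `buildSrt`, and the `{ start, end, text }` cue shape (times in seconds), are hypothetical names chosen for illustration; they are not part of Panopto Caption.

```javascript
// Hypothetical helper: convert a time in seconds to the SRT "HH:MM:SS,mmm" form.
function toSrtTimestamp(totalSeconds) {
  const totalMs = Math.round(totalSeconds * 1000);
  const ms = totalMs % 1000;
  const s = Math.floor(totalMs / 1000) % 60;
  const m = Math.floor(totalMs / 60000) % 60;
  const h = Math.floor(totalMs / 3600000);
  const pad = (n, width) => String(n).padStart(width, '0');
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Hypothetical helper: build SRT text from cues shaped like { start, end, text }.
// Each cue is "sequence number, time span, text", and cues are separated by a blank line.
function buildSrt(cues) {
  return cues
    .map((cue, i) =>
      `${i + 1}\n${toSrtTimestamp(cue.start)} --> ${toSrtTimestamp(cue.end)}\n${cue.text}\n`)
    .join('\n');
}

const exampleSrt = buildSrt([
  { start: 0, end: 3.111, text: 'First caption' },
  { start: 3.111, end: 5.123, text: 'Second caption' },
  { start: 5.123, end: 10.321, text: 'Third caption' },
]);
console.log(exampleSrt);
```

The sequence numbers come from the array index, so cues only need to be supplied in display order.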
Panopto Caption generates and downloads the subtitle file for the Panopto video that you want to watch offline. This is done in four steps.
1. Waiting for captions to be accessible
```javascript
function waitForCaptions(querySelector, callback) {
  // Re-check for the caption element every time the DOM changes
  const observer = new MutationObserver((mutationList, obs) => {
    if (document.querySelector(querySelector)) {
      // Found it: stop observing and run the callback after a 3-second delay
      obs.disconnect();
      setTimeout(callback, 3000);
      return;
    } else {
      console.log('PCE: captions not found yet...');
    }
  });
  observer.observe(document, {
    attributes: true,
    childList: true,
    subtree: true
  });
}
```
As with most web scraping projects, waiting for elements to be present and interactable is crucial. For this I use `MutationObserver`: whenever the DOM tree changes, it checks whether the caption element (the one matching `querySelector`) is available.
The main part of Panopto Caption is passed in as the `callback` function to `waitForCaptions`, so that everything (subtitle extraction, generation, and download) runs once the caption element is available in the DOM tree (after a 3-second delay set by `setTimeout`).
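One thing an observer-only approach does not cover is the case where the element already exists when observation starts and no further mutations occur. The sketch below is a possible variant, not the userscript's code: it checks once up front and gives up after a timeout. The name `waitForElement` and the options `root`, `ObserverClass`, `timeoutMs`, and `onTimeout` are all hypothetical; `root` and `ObserverClass` are injectable only so the function can be exercised outside a browser.

```javascript
// Hypothetical variant of waitForCaptions: checks immediately, then observes,
// and stops after timeoutMs. In a userscript the defaults (document and
// MutationObserver) apply.
function waitForElement(selector, onFound, options = {}) {
  const root = options.root || document;
  const ObserverClass = options.ObserverClass || MutationObserver;
  const timeoutMs = options.timeoutMs ?? 30000;

  // The element may already exist, in which case no mutation would ever fire.
  const existing = root.querySelector(selector);
  if (existing) {
    onFound(existing);
    return;
  }

  let timer = null;
  const observer = new ObserverClass(() => {
    const element = root.querySelector(selector);
    if (element) {
      observer.disconnect();
      clearTimeout(timer);
      onFound(element);
    }
  });
  observer.observe(root, { childList: true, subtree: true });

  // Stop observing if the element never appears.
  timer = setTimeout(() => {
    observer.disconnect();
    if (options.onTimeout) options.onTimeout();
  }, timeoutMs);
}
```

If the 3-second settling delay from the original is still wanted, it could be applied inside `onFound` with `setTimeout`.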
2. Extract and generate subtitles text
```javascript
let captionString = '';
let captionHour = 0;
let captionMinute = 0;
let captionSeconds = 0;
const captionHTMLElements = document.querySelector("#transcriptTabPane > div.event-tab-scroll-pane > ul").children;

// Generate full caption string
for (let i = 0; i < captionHTMLElements.length; i++) {
  // Add caption index
  captionString += (i + 1) + '\n';

  // Extract caption start time
  const captionTime = captionHTMLElements[i].children[1].children[2].innerText;
  [captionHour, captionMinute, captionSeconds] = extractTime(captionTime, captionHour, captionMinute, captionSeconds);

  // Format + add start time
  let formattedTime = formatTime(captionHour, captionMinute, captionSeconds);
  captionString += formattedTime + ',000 --> ';

  if (i === captionHTMLElements.length - 1) {
    // Get end of video time for last caption (time elapsed + time remaining)
    let videoEndHour = 0;
    let videoEndMinute = 0;
    let videoEndSeconds = 0;
    const timeElapsed = document.getElementById('timeElapsed').innerText;
    [videoEndHour, videoEndMinute, videoEndSeconds] = extractTime(timeElapsed, videoEndHour, videoEndMinute, videoEndSeconds);

    // slice(1) drops the leading character of the remaining-time text
    const timeRemaining = document.getElementById('timeRemaining').innerText.slice(1);
    const timeRemainingComponents = timeRemaining.split(':');
    if (timeRemainingComponents.length === 2) {
      videoEndMinute += parseInt(timeRemainingComponents[0], 10);
      videoEndSeconds += parseInt(timeRemainingComponents[1], 10);
    } else if (timeRemainingComponents.length === 3) {
      videoEndHour += parseInt(timeRemainingComponents[0], 10);
      videoEndMinute += parseInt(timeRemainingComponents[1], 10);
      videoEndSeconds += parseInt(timeRemainingComponents[2], 10);
    }

    // Correct format to standard time units (carry seconds -> minutes -> hours)
    const adjustedSeconds = videoEndSeconds % 60;
    const adjustedMinute = (videoEndMinute + Math.floor(videoEndSeconds / 60)) % 60;
    const adjustedHour = videoEndHour + Math.floor((videoEndMinute + Math.floor(videoEndSeconds / 60)) / 60);

    // Format + add video end time
    formattedTime = formatTime(adjustedHour, adjustedMinute, adjustedSeconds);
    captionString += formattedTime + ',000\n';
  } else {
    // For non-last captions, the end time is the next caption's start time
    const nextCaptionTime = captionHTMLElements[i + 1].children[1].children[2].innerText;
    [captionHour, captionMinute, captionSeconds] = extractTime(nextCaptionTime, captionHour, captionMinute, captionSeconds);

    // Format + add end time
    formattedTime = formatTime(captionHour, captionMinute, captionSeconds);
    captionString += formattedTime + ',000\n';
  }

  // Extract + add caption text
  const captionText = captionHTMLElements[i].children[1].children[1].innerText;
  captionString += captionText.trim() + '\n\n';
}
```
Throughout I use two helper functions, `extractTime` and `formatTime`.
```javascript
function extractTime(timeString, hour, minute, seconds) {
  const timeComponents = timeString.split(':');
  if (timeComponents.length === 2) {
    // "m:ss": the hour passed in is kept unchanged
    minute = parseInt(timeComponents[0], 10);
    seconds = parseInt(timeComponents[1], 10);
  } else if (timeComponents.length === 3) {
    // "h:mm:ss"
    hour = parseInt(timeComponents[0], 10);
    minute = parseInt(timeComponents[1], 10);
    seconds = parseInt(timeComponents[2], 10);
  }
  return [hour, minute, seconds];
}
```
`extractTime` takes in `timeString` and extracts and returns the hour, minute, and seconds. If `timeComponents.length === 2`, then the given `timeString` only has numbers for minute and seconds. If `timeComponents.length === 3`, then the given `timeString` has numbers for hour, minute, and seconds.
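As an alternative way to look at the same parsing, the two cases can be folded into a single total-seconds value. This is a sketch under the assumption that the timestamps are always of the form "m:ss" or "h:mm:ss"; the name `clockToSeconds` is hypothetical and not part of the userscript.

```javascript
// Hypothetical pure variant of extractTime: "m:ss" or "h:mm:ss" -> total seconds.
function clockToSeconds(timeString) {
  const parts = timeString.trim().split(':').map((p) => parseInt(p, 10));
  if (parts.length < 2 || parts.length > 3 || parts.some(Number.isNaN)) {
    throw new Error(`Unrecognized time string: "${timeString}"`);
  }
  // Fold left: each step shifts the running total by one base-60 place.
  return parts.reduce((total, part) => total * 60 + part, 0);
}

console.log(clockToSeconds('1:05'));    // 65
console.log(clockToSeconds('1:02:03')); // 3723
```

Unlike `extractTime`, this version carries no state between calls and rejects input it does not recognize instead of silently returning the previous values.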
```javascript
function formatTime(hour, minute, seconds) {
  return [
    hour.toString().padStart(2, '0'),
    minute.toString().padStart(2, '0'),
    seconds.toString().padStart(2, '0')
  ].join(':');
}
```
`formatTime` takes in the hour, minute, and seconds and returns a zero-padded string of the form “HH:MM:SS” (e.g. “00:03:07”).
All of this goes into building `captionString`, which holds the fully formatted text of the subtitle file.
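The last caption has no following caption, so the loop above derives its end time by adding the player's elapsed and remaining times and then carrying seconds into minutes and minutes into hours. A minimal sketch of the same arithmetic done in total seconds is shown here. The helper names (`toSeconds`, `toClock`, `videoEndClock`) are hypothetical, and the sketch assumes the remaining-time text carries a leading "-" (the original code drops one leading character with `slice(1)`).

```javascript
// Hypothetical helpers; the names are illustrative and not part of the userscript.

// "m:ss" or "h:mm:ss" -> total seconds
function toSeconds(clock) {
  return clock.split(':').reduce((total, part) => total * 60 + parseInt(part, 10), 0);
}

// total seconds -> zero-padded "HH:MM:SS"
function toClock(totalSeconds) {
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = totalSeconds % 60;
  return [h, m, s].map((n) => String(n).padStart(2, '0')).join(':');
}

// elapsedText like "12:34"; remainingText like "-47:40" (leading minus is an assumption).
function videoEndClock(elapsedText, remainingText) {
  const remaining = remainingText.replace(/^-/, '');
  return toClock(toSeconds(elapsedText) + toSeconds(remaining));
}

console.log(videoEndClock('12:34', '-47:40') + ',000'); // 01:00:14,000
```

Working in total seconds means the carry from seconds to minutes to hours happens once, in `toClock`, rather than separately for each unit.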
3. Create subtitles file
```javascript
let textFile = null;
const data = new Blob([captionString], {type: 'text/plain'});
textFile = URL.createObjectURL(data);
```
To create the actual subtitle file, I build a `Blob` object from the accumulated `captionString`. Then I use `URL.createObjectURL()` to generate a URL that points to the `Blob`. This URL serves as the download link for the subtitle file.
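A small point that matters for captions with accented or non-Latin characters: strings placed in a `Blob` are encoded as UTF-8, so the blob's size is measured in bytes, not characters. The sketch below illustrates this; `makeSubtitleBlob` is a hypothetical helper, and adding `charset=utf-8` to the type is an optional label rather than something the userscript does.

```javascript
// Hypothetical helper: wrap subtitle text in a Blob with an explicit UTF-8 label.
function makeSubtitleBlob(text) {
  return new Blob([text], { type: 'text/plain;charset=utf-8' });
}

const sampleBlob = makeSubtitleBlob('1\n00:00:00,000 --> 00:00:03,111\nCafé\n');
console.log(sampleBlob.size); // byte count; "é" takes two bytes in UTF-8

// In a browser, the blob would then be turned into a download link:
//   const url = URL.createObjectURL(sampleBlob);
//   ...use url as an anchor's href...
//   URL.revokeObjectURL(url);
```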
4. Download subtitle file
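```javascript
// Create a temporary anchor whose href is the object URL
const downloadCaption = document.createElement('a');
downloadCaption.href = textFile;
// Name the file after the page title, with the .srt extension
downloadCaption.download = document.getElementsByTagName('title')[0].innerText + '.srt';
// Simulate a click to start the download, then release the object URL
downloadCaption.click();
URL.revokeObjectURL(textFile);
```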
Here we create an anchor element to hold the object URL as the download link. Subtitle files have the ‘.srt’ extension, so we append it to the title of the video. We call `click()` on the created anchor element to simulate a mouse click on it. Finally, we revoke the object URL to prevent memory leaks.
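Page titles can contain characters that some operating systems do not allow in file names (for example ":" or "/"). As a possible refinement, and not something the userscript is known to do, the title could be cleaned before it is used as the download name. The helper `toSrtFilename` and the fallback name `captions` are hypothetical.

```javascript
// Hypothetical helper: turn a page title into a safe ".srt" file name.
function toSrtFilename(title) {
  const cleaned = title
    .replace(/[\\/:*?"<>|]/g, '_') // characters Windows rejects in file names
    .replace(/\s+/g, ' ')          // collapse runs of whitespace
    .trim();
  // Fall back to a generic name if nothing usable is left.
  return (cleaned || 'captions') + '.srt';
}

console.log(toSrtFilename('CS 101: Lecture 3 / Recursion'));
// CS 101_ Lecture 3 _ Recursion.srt
```

In the snippet above, this would replace the direct use of the title text when setting `downloadCaption.download`.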
Repo
The code for Panopto Caption is hosted on GitHub.
Issues
Please report any issues to the repository’s issue section.