Title: | R wrapper for yt-dlp, focused on extracting and processing subtitles |
---|---|
Description: | Get subtitles from YouTube and parse them. |
Authors: | Giorgio Comai [aut, cre, cph] |
Maintainer: | Giorgio Comai <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.1.9003 |
Built: | 2024-11-04 04:18:30 UTC |
Source: | https://github.com/giocomai/ytdlpr |
Concatenate trimmed video files into a single video clip
yt_concatenate( trimmed_df, sort_by_date = FALSE, info_df = NULL, destination_filename = "concatenated", destination_folder = "0_concatenated_video", destination_path = NULL, overwrite = FALSE, yt_base_folder = NULL )
trimmed_df |
A data frame, typically generated as output of |
sort_by_date |
Defaults to FALSE. If TRUE, retrieves (and, if necessary, downloads) info files for all video clips, and reads them with |
info_df |
Defaults to NULL. If given, a data frame typically generated with |
destination_folder |
Defaults to "0_concatenated_video". |
destination_path |
Defaults to NULL. Location where trimmed video files
will be stored. If given, takes precedence over |
overwrite |
Defaults to FALSE. If TRUE, overwrites concatenated file if it already exists. |
yt_base_folder |
Base folder, defaults to NULL. Can be set with
|
The path of the generated video clip.
Extract the YouTube identifier from a URL.
yt_extract_id(yt_id)
yt_id |
URL or unique identifier of a YouTube video. |
A character vector of YouTube identifiers.
yt_extract_id("https://youtu.be/WXPBOfRtXQE?feature=shared")

# if already an identifier, just returns it:
yt_extract_id("WXPBOfRtXQE")
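To make the behaviour described above concrete, here is a minimal base R sketch of how an identifier could be pulled out of common URL shapes. The helper name `extract_id_sketch()` and its regular expressions are assumptions made for illustration only; this is not the package's implementation, and `yt_extract_id()` may well cover more cases.

```r
# Hypothetical helper, for illustration only (not part of ytdlpr).
extract_id_sketch <- function(x) {
  # Drop everything up to "v=" (watch URLs) or "youtu.be/" (short URLs)
  id <- sub("^.*(v=|youtu\\.be/)", "", x)
  # Drop any trailing query string, extra parameter, or fragment
  id <- sub("[?&#].*$", "", id)
  id
}

ids <- extract_id_sketch(c(
  "https://youtu.be/WXPBOfRtXQE?feature=shared",
  "https://www.youtube.com/watch?v=WXPBOfRtXQE",
  "WXPBOfRtXQE" # already an identifier: returned unchanged
))
print(ids)
```

All three inputs resolve to the same identifier, which mirrors the "if already an identifier, just returns it" behaviour noted in the example.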
Filter subtitles and link back the original source
yt_filter( subtitles_df, pattern, ignore_case = TRUE, regex = TRUE, playlist = NULL, sub_lang = NULL, sub_format = "vtt", lag = -3, yt_base_folder = NULL )
subtitles_df |
Defaults to NULL. If given must be a data frame,
typically generated with |
pattern |
A character string. |
ignore_case |
Defaults to TRUE. |
regex |
Defaults to TRUE. |
playlist |
Playlist, either as full URL from YouTube or as id. |
sub_lang |
Defaults to NULL. If not given, all local subtitles are returned. If given, only subtitles in the given sub_lang are returned. |
sub_format |
Defaults to "vtt". File extension of the subtitle. |
lag |
Defaults to -3. |
yt_base_folder |
Base folder, defaults to NULL. Can be set with
|
A data frame, including only lines where the given pattern is found.
## Not run:
yt_get_subtitles_playlist(
  playlist = "https://www.youtube.com/playlist?list=PLbyvawxScNbuSi7sJaJbHNyyx3iYJeW3P"
)

subtitles_df <- yt_get_local_subtitles(
  playlist = "https://www.youtube.com/playlist?list=PLbyvawxScNbuSi7sJaJbHNyyx3iYJeW3P"
) |>
  yt_read_vtt()

yt_filter(
  pattern = "rover",
  subtitles_df = subtitles_df
)
## End(Not run)
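The interplay of `pattern`, `ignore_case` and `regex` can be sketched in base R on a toy data frame. The helper `filter_sketch()` and the column names used here (`yt_id`, `start_time`, `text`) are assumptions for illustration; the actual columns returned by `yt_read_vtt()` and the internals of `yt_filter()` may differ.

```r
# Toy data; column names are assumptions for illustration.
subs <- data.frame(
  yt_id = c("aaa", "aaa", "bbb"),
  start_time = c(12, 40, 7),
  text = c("The Rover landed", "nothing to see", "rovers (plural)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the `pattern`, `ignore_case` and `regex` arguments.
filter_sketch <- function(df, pattern, ignore_case = TRUE, regex = TRUE) {
  text <- df$text
  if (!regex && ignore_case) {
    # grepl() ignores `ignore.case` when `fixed = TRUE`, so lower-case both sides
    text <- tolower(text)
    pattern <- tolower(pattern)
  }
  keep <- grepl(pattern, text, ignore.case = ignore_case && regex, fixed = !regex)
  df[keep, , drop = FALSE]
}

filter_sketch(subs, "rover")                      # case-insensitive regex match
filter_sketch(subs, "rover", ignore_case = FALSE) # case-sensitive match
filter_sketch(subs, "(plural)", regex = FALSE)    # literal match, parentheses not special
```

With `regex = FALSE` the pattern is treated as a literal string, which matters for patterns containing characters such as parentheses or dots.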
If subs or auto_subs is set to TRUE, it checks that the subtitles in the language set with sub_lang are available. In all other cases, by default, it skips downloads if any previous file associated with a given video identifier has been downloaded. Set check_previous to FALSE to always download files.
yt_get( yt_id = NULL, playlist = NULL, check_previous = TRUE, sub_lang = "en", sub_format = "vtt", subs = FALSE, auto_subs = FALSE, video = FALSE, description = FALSE, info_json = FALSE, comments = FALSE, thumbnail = FALSE, min_sleep_interval = 1, max_sleep_interval = 8, sleep_subtitles = 2, custom_options = "", yt_base_folder = NULL )
yt_id |
YouTube identifier of a video or full url to a video. Normally a
vector, but if a data frame is given, |
playlist |
Playlist, either as full URL from YouTube or as id. |
check_previous |
Defaults to TRUE. If FALSE, input is always downloaded.
If TRUE, and |
sub_lang |
Defaults to "en". If more than one, can be given as comma-separated two-letter codes, or as a vector. |
sub_format |
Defaults to "vtt". Other formats not yet supported. |
subs |
Defaults to FALSE. "Write subtitle file". |
auto_subs |
Defaults to FALSE. "Write automatically generated subtitle file". |
video |
Defaults to FALSE. Download the video files. |
description |
Defaults to FALSE. "Write video description to a .description file" |
info_json |
Defaults to FALSE. "Write video metadata to a .info.json file" |
comments |
Defaults to FALSE. "Retrieve video comments to be placed in the infojson." |
thumbnail |
Defaults to FALSE. "Write thumbnail image to disk" |
min_sleep_interval |
"Number of seconds to sleep before each download. This is the minimum time to sleep when used along with --max-sleep-interval" |
max_sleep_interval |
"Maximum number of seconds to sleep. Can only be used along with --min-sleep-interval" |
sleep_subtitles |
"Number of seconds to sleep before each subtitle download" |
custom_options |
Defaults to an empty string. If given, it should correspond to parameters exactly as they would be used on command line. For a full list, see the original documentation. |
yt_base_folder |
Base folder, defaults to NULL. Can be set with
|
Argument definitions are quoted from the original yt-dlp project.
A data frame, with details about locally available subtitles.
## Not run:
yt_get(
  playlist = "https://www.youtube.com/playlist?list=PLbyvawxScNbtMcDKmT2dRAfjmSFwOt1Vj"
)

# `subs` is the argument name given in the usage above
yt_get(
  yt_id = "https://youtu.be/WXPBOfRtXQE",
  subs = TRUE,
  video = TRUE,
  description = TRUE,
  info_json = TRUE
)
## End(Not run)
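Since the argument definitions are quoted from yt-dlp, it can help to see how such arguments might translate into a command line. The following is a minimal sketch under stated assumptions: the helper `build_ytdlp_args()` is hypothetical, the mapping of each argument to a yt-dlp flag is my reading of the quoted descriptions rather than the package's actual code, and the command is only built as a string, never executed.

```r
# Hypothetical helper: builds (but does not run) a yt-dlp command line.
# The argument-to-flag mapping is an assumption based on yt-dlp's documented options.
build_ytdlp_args <- function(url,
                             sub_lang = "en",
                             sub_format = "vtt",
                             subs = FALSE,
                             auto_subs = FALSE,
                             video = FALSE,
                             info_json = FALSE,
                             min_sleep_interval = 1,
                             max_sleep_interval = 8,
                             sleep_subtitles = 2) {
  args <- c(
    if (subs) "--write-subs",
    if (auto_subs) "--write-auto-subs",
    if (subs || auto_subs) c(
      # a vector of languages is collapsed into comma-separated codes
      "--sub-langs", paste(sub_lang, collapse = ","),
      "--sub-format", sub_format
    ),
    if (info_json) "--write-info-json",
    if (!video) "--skip-download",
    "--min-sleep-interval", min_sleep_interval,
    "--max-sleep-interval", max_sleep_interval,
    "--sleep-subtitles", sleep_subtitles,
    url
  )
  paste(c("yt-dlp", args), collapse = " ")
}

cmd <- build_ytdlp_args(
  "https://youtu.be/WXPBOfRtXQE",
  auto_subs = TRUE,
  sub_lang = c("en", "it")
)
print(cmd)
```

Anything not covered by dedicated arguments would, per the description of custom_options, be passed exactly as it would be written on the command line.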
Checks which clips have subtitles or automatic captions, based on info json files
yt_get_available_subtitles( info_json_df = NULL, sub_lang = "en", automatic_captions = TRUE, subtitles = TRUE, yt_id = NULL, playlist = NULL )
info_json_df |
Defaults to NULL. Generally created with
|
sub_lang |
Defaults to "en", subtitles language. |
automatic_captions |
Defaults to TRUE. If TRUE, checks if subtitles are available as "automatic captions". |
subtitles |
Defaults to TRUE. If TRUE, checks if subtitles are available as (manually added or approved) "subtitles". |
yt_id |
YouTube video identifier. Ignored if |
playlist |
YouTube list. Ignored if |
A data frame with three columns, yt_id, sub_lang, and sub_type (sub_type can be either automatic_captions or subtitles).
## Not run:
yt_get_available_subtitles(
  playlist = "https://www.youtube.com/playlist?list=PLbyvawxScNbtMcDKmT2dRAfjmSFwOt1Vj"
)
## End(Not run)
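The three-column return value can be illustrated with a toy stand-in for a parsed info json file. This sketch assumes the file exposes `subtitles` and `automatic_captions` entries keyed by language code, as yt-dlp's info json does; the helper `available_subs_sketch()` and the toy list are hypothetical and not taken from the package.

```r
# Toy stand-in for a parsed .info.json file (real files hold many more fields).
info <- list(
  id = "WXPBOfRtXQE",
  subtitles = list(en = list(), it = list()),
  automatic_captions = list(en = list(), fr = list())
)

# Hypothetical helper returning one row per matching language and subtitle type.
available_subs_sketch <- function(info, sub_lang = "en") {
  types <- c("automatic_captions", "subtitles")
  rows <- lapply(types, function(type) {
    # keep only the requested languages that exist for this type
    langs <- as.character(intersect(names(info[[type]]), sub_lang))
    data.frame(
      yt_id = rep(info$id, length(langs)),
      sub_lang = langs,
      sub_type = rep(type, length(langs)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

en_df <- available_subs_sketch(info, sub_lang = "en")
fr_df <- available_subs_sketch(info, sub_lang = "fr")
print(en_df)
print(fr_df)
```

In the toy data, English is available both as automatic captions and as subtitles, while French is available only as automatic captions.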
Retrieves base folder where all retrieved files will be stored for the current session
yt_get_base_folder(path = NULL)
path |
Defaults to NULL. If not set, checks if a path has been previously set, and if not, defaults to current folder. |
The given path (or, if left to NULL, the path previously set) is returned invisibly.
yt_set_base_folder(path = fs::path(
  fs::path_home_r(),
  "R",
  "ytdlpr"
))

yt_get_base_folder()
Lists locally available files that can be attributed to a video id
yt_get_local( yt_id = NULL, playlist = NULL, file_extension = NULL, yt_base_folder = NULL )
yt_id |
URL or unique identifier of a YouTube video. |
playlist |
Playlist, either as full URL from YouTube or as id. |
file_extension |
Defaults to NULL. Only file names with the given extension are returned. |
yt_base_folder |
Base folder, defaults to NULL. Can be set with
|
A data frame with two columns, yt_id and path.
## Not run:
yt_get_local()

yt_get_local(
  yt_id = "WXPBOfRtXQE",
  file_extension = "webm"
)
## End(Not run)
Check for the availability of local subtitles and return details in a data frame
yt_get_local_subtitles( yt_id = NULL, playlist = NULL, sub_lang = NULL, sub_format = "vtt", yt_base_folder = NULL )
yt_id |
URL or unique identifier of a YouTube video. |
playlist |
Playlist, either as full URL from YouTube or as id. |
sub_lang |
Defaults to NULL. If not given, all local subtitles are returned. If given, only subtitles in the given sub_lang are returned. |
sub_format |
Defaults to "vtt". File extension of the subtitle. |
yt_base_folder |
Base folder, defaults to NULL. Can be set with
|
A data frame (a tibble) with details on locally available subtitle files.
## Not run:
yt_get_local_subtitles()
## End(Not run)
Get local path to folder where files for the given playlist will be stored
yt_get_playlist_folder(playlist, yt_base_folder = NULL)
playlist |
Playlist, either as full URL from YouTube or as id. |
yt_base_folder |
Base folder, defaults to NULL. Can be set with
|
Path to playlist folder.
## Not run:
yt_get_playlist_folder(
  playlist = "https://www.youtube.com/playlist?list=PLbyvawxScNbtMcDKmT2dRAfjmSFwOt1Vj"
)
## End(Not run)
Get all YouTube identifiers for a given playlist
yt_get_playlist_id(playlist, update = FALSE, yt_base_folder = NULL)
playlist |
The full URL of a YouTube playlist. |
update |
Defaults to FALSE. If FALSE, data is returned immediately if previously stored. If TRUE, it checks the playlist on YouTube again to see if new content has been added. |
yt_base_folder |
Base folder, defaults to NULL. Can be set with
|
A data frame (a tibble) with a single column named yt_id.
## Not run:
yt_get_playlist_id(
  playlist = "https://www.youtube.com/playlist?list=PLbyvawxScNbtMcDKmT2dRAfjmSFwOt1Vj"
)
## End(Not run)
Read info json files, and extract key data in a data frame
yt_read_info_json(path)
path |
Path to one or more info files in the json format, such as those downloaded by |
A data frame
## Not run:
yt_get(
  playlist = "https://www.youtube.com/playlist?list=PLbyvawxScNbuSi7sJaJbHNyyx3iYJeW3P",
  info_json = TRUE
) |>
  yt_read_info_json()
## End(Not run)
Read vtt subtitles
yt_read_vtt(path)
path |
Path to one or more subtitle files in the vtt format. If a data frame is used as input, a column named "path" in that data frame will be used as source. |
A data frame with
## Not run:
yt_get(
  playlist = "https://www.youtube.com/playlist?list=PLbyvawxScNbuSi7sJaJbHNyyx3iYJeW3P",
  auto_subs = TRUE
) |>
  yt_read_vtt()
## End(Not run)
Set for the current session base folder where all retrieved files will be stored
yt_set_base_folder(path)
path |
Defaults to NULL. If not set, checks if a path has been previously set, and if not, defaults to current folder. |
The given path (or, if left to NULL, the path previously set) is returned invisibly.
yt_set_base_folder(path = fs::path(
  fs::path_home_r(),
  "R",
  "ytdlpr"
))

yt_get_base_folder()
Trim video files so that they show only the part with relevant subtitles
yt_trim( subtitles_df, only_local = TRUE, lag = -3, duration = 5, check_previous = TRUE, video_file_extension = "webm|mp4|mkv", simulate = FALSE, destination_folder = "0_trimmed_video", destination_path = NULL, yt_base_folder = NULL )
subtitles_df |
A data frame with subtitles, typically generated with a
combination of |
only_local |
Defaults to TRUE. If FALSE, downloads missing video files. |
lag |
Defaults to -3. |
duration |
Duration in seconds of the trimmed video. Defaults to 5. |
check_previous |
Defaults to TRUE. If a file with the same id, same starting time, and same duration has already been stored in the destination folder, skip it. |
video_file_extension |
Defaults to "webm|mp4|mkv". Can be set to explicitly rely only on video files stored in a specific file format. |
simulate |
Defaults to FALSE. Similarly to the same argument in
|
destination_folder |
Defaults to "0_trimmed_video". |
destination_path |
Defaults to NULL. Location where trimmed video files
will be stored. If given, takes precedence over |
yt_base_folder |
Base folder, defaults to NULL. Can be set with
|
A data frame with details about the export and the ffmpeg command. Mostly used for side effects (creates trimmed video files).
## Not run:
# `subs` is the argument name given in the usage of yt_get()
filtered_subs_df <- yt_get(
  yt_id = "-0pPBAiJaYk",
  subs = TRUE,
  video = TRUE
) |>
  yt_read_vtt() |>
  yt_filter(pattern = "community")

yt_trim(filtered_subs_df)
## End(Not run)
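The effect of `lag` and `duration` can be sketched as the construction of an ffmpeg command: the clip starts `lag` seconds relative to the subtitle's start time and lasts `duration` seconds. The helper `trim_command_sketch()` is hypothetical, and the choice of `-ss`/`-t` is an assumption about how such a trim could be expressed, not a statement about the exact command the package generates. The command is built as a string and not executed.

```r
# Hypothetical helper: builds (but does not run) an ffmpeg command.
trim_command_sketch <- function(input, output, start_time, lag = -3, duration = 5) {
  # Shift the start by `lag` seconds, but never before the beginning of the file
  start <- max(0, start_time + lag)
  paste(
    "ffmpeg",
    "-ss", start,         # seek to the (lag-adjusted) start, in seconds
    "-i", shQuote(input), # source video file
    "-t", duration,       # length of the clip, in seconds
    shQuote(output)       # destination file
  )
}

cmd_mid <- trim_command_sketch("WXPBOfRtXQE.webm", "clip.webm", start_time = 62)
cmd_start <- trim_command_sketch("WXPBOfRtXQE.webm", "clip.webm", start_time = 1)
print(cmd_mid)
print(cmd_start)
```

A negative lag is useful when the matching subtitle line appears slightly after the words are spoken, so the clip begins a few seconds earlier than the subtitle timestamp.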
Arguments taken from FFMPEG's drawtext filter
yt_trim_with_text( subtitles_df, only_local = TRUE, font = "Mono", fontcolor = "white", fontsize = 32, box = 1, boxcolor = "black", boxopacity = 0.5, boxborderw = 5, position_x = 10, position_y = 10, yt_base_folder = NULL, ... )
subtitles_df |
A data frame with subtitles, typically generated with a
combination of |
only_local |
Defaults to TRUE. If FALSE, downloads missing video files. |
font |
Defaults to "Mono". |
fontcolor |
Defaults to "white". |
fontsize |
Defaults to 32. |
box |
Defaults to 1. |
boxcolor |
Defaults to "black". |
boxopacity |
Defaults to 0.5. |
boxborderw |
Defaults to 5. |
position_x |
Defaults to 10. |
position_y |
Defaults to 10. |
yt_base_folder |
Base folder, defaults to NULL. Can be set with
|
... |
Passed to |
Nothing, used for side effects.
## Not run:
yt_get(
  playlist = "https://www.youtube.com/playlist?list=PLbyvawxScNbuSi7sJaJbHNyyx3iYJeW3P",
  auto_subs = TRUE
) |> # download subtitles
  yt_read_vtt() |> # read them
  yt_filter(pattern = "rover") |> # keep only those with "rover" in the text
  dplyr::slice_sample(n = 2) |> # keep two, as this is only an example
  yt_trim_with_text(only_local = FALSE) # download video files and json files and trim video
## End(Not run)
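Because the styling arguments are taken from FFmpeg's drawtext filter, it may help to see how they could be assembled into a filter string. This is a sketch under stated assumptions: the helper `drawtext_sketch()` is hypothetical, the way `boxcolor` and `boxopacity` are combined (`color@opacity`) follows FFmpeg's colour syntax but is not confirmed as the package's approach, and no escaping of special characters in the text is attempted.

```r
# Hypothetical helper: assembles a drawtext filter string from the styling arguments.
# Text escaping (quotes, colons, backslashes) is deliberately left out of this sketch.
drawtext_sketch <- function(text,
                            font = "Mono",
                            fontcolor = "white",
                            fontsize = 32,
                            box = 1,
                            boxcolor = "black",
                            boxopacity = 0.5,
                            boxborderw = 5,
                            position_x = 10,
                            position_y = 10) {
  paste0(
    "drawtext=",
    "font=", font,
    ":text='", text, "'",
    ":fontcolor=", fontcolor,
    ":fontsize=", fontsize,
    ":box=", box,
    ":boxcolor=", boxcolor, "@", boxopacity, # colour with opacity, e.g. black@0.5
    ":boxborderw=", boxborderw,
    ":x=", position_x,
    ":y=", position_y
  )
}

filter_string <- drawtext_sketch("hello")
print(filter_string)
```

A string of this shape would typically be passed to ffmpeg through its `-vf` option; which text the package actually overlays is not stated in this manual.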