Package: castarter 0.3.0.9017

castarter: Content Analysis Starter Toolkit

Consistent approaches for basic web scraping, text mining and word frequency analysis of textual datasets.

Authors: Giorgio Comai [aut, cre, cph], EDJNet [fnd]

castarter_0.3.0.9017.tar.gz
castarter_0.3.0.9017.zip (r-4.7) | castarter_0.3.0.9017.zip (r-4.6) | castarter_0.3.0.9017.zip (r-4.5)
castarter_0.3.0.9017.tgz (r-4.6-any) | castarter_0.3.0.9017.tgz (r-4.5-any)
castarter_0.3.0.9017.tar.gz (r-4.7-any) | castarter_0.3.0.9017.tar.gz (r-4.6-any)
castarter_0.3.0.9017.tgz (r-4.6-emscripten)
manual.pdf | manual.html
DESCRIPTION | NEWS
card.svg | card.png
castarter/json (API)

# Install 'castarter' in R:

install.packages('castarter', repos = c('https://giocomai.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/giocomai/castarter/issues

Pkgdown/docs site: https://castarter.tadadit.xyz

Datasets:

casdb_empty_index_id - Empty data frame with the same format as data stored in the 'index_id' table

On CRAN:

tada text-mining

Score: 4.98 | 6 stars | 2 scripts | 108 exports | 138 dependencies

Last updated from: 009a59695d. Checks: 7 WARNING, 2 OK. Indexed: yes.

Target	Result	Time
linux-devel-x86_64	WARNING	197
source / vignettes	OK	250
linux-release-x86_64	WARNING	194
macos-release-arm64	WARNING	103
macos-oldrel-arm64	WARNING	127
windows-devel	WARNING	120
windows-release	WARNING	388
windows-oldrel	WARNING	451
wasm-release	OK	169

Exports: := .data %>% as_label as_name cas_archive cas_backup_gd cas_browse cas_build_urls cas_check_corpus cas_check_db_folder cas_check_read_db_contents_data cas_check_response cas_check_use_db cas_check_website_folder cas_connect_to_db cas_convert_db_type cas_count cas_count_relative cas_count_total_words cas_create_db_folder cas_delete_corpus cas_delete_from_db cas_disable_db cas_disconnect_from_db cas_download cas_download_chromote cas_download_httr cas_download_httr2 cas_download_index cas_download_internal cas_enable_db cas_explorer cas_export_tables cas_extract cas_extract_html cas_extract_html_custom cas_extract_links cas_extract_script cas_find_extractor cas_generate_metadata cas_get_base_folder cas_get_base_path cas_get_corpus_path cas_get_db cas_get_db_file cas_get_db_folder cas_get_db_settings cas_get_files_to_download cas_get_files_to_extract cas_get_options cas_get_path_to_files cas_get_sitemap cas_get_urls_df cas_get_website_folder cas_ia_check cas_ia_save cas_ignore_id cas_kwic cas_kwic_single_pattern cas_read_corpus cas_read_db cas_read_db_contents_data cas_read_db_contents_id cas_read_db_download cas_read_db_ia cas_read_db_ignore_id cas_read_db_index cas_read_db_response cas_read_db_sitemap cas_read_db_urls cas_read_from_db cas_reset_db cas_reset_db_contents_data cas_reset_db_contents_id cas_reset_db_ignore_id cas_reset_db_index_id cas_reset_download_contents cas_reset_download_index cas_restore cas_set_db cas_set_db_folder cas_set_options cas_show_barchart_ggiraph cas_show_barchart_ggplot2 cas_show_gg_base cas_show_ts_dygraph cas_summarise cas_update cas_write_corpus cas_write_db_contents_data cas_write_db_contents_id cas_write_db_ignore_id cas_write_db_index cas_write_db_sitemap cas_write_db_urls cas_write_to_db cass_build_urls cass_combine_into_pattern cass_download_csv_app cass_highlight cass_show_ts_dygraph_app cass_split_string enquo enquos expr sym syms

Dependencies: askpass assertthat attempt base64enc bit bit64 blob bslib cachem callr cicerone cli clipr codetools commonmark config cpp11 crayon credentials crosstalk curl DBI dbplyr desc digest dplyr DT dygraphs ellipsis evaluate farver fastmap fontawesome fontBitstreamVera fontLiberation fontquiver forcats fs gdtools generics gert ggiraph ggplot2 gitcreds glue golem gtable highr hms htmltools htmlwidgets httpuv httr httr2 ini isoband janeaustenr jquerylib jsonlite knitr labeling later lattice lazyeval lifecycle lubridate magrittr MASS Matrix memoise mime openssl otel pillar pkgconfig PrettyCols prettyunits processx progress promises ps purrr R.cache R.methodsS3 R.oo R.utils R6 rappdirs RColorBrewer Rcpp reactable reactR readr rlang rmarkdown rprojroot RSQLite rstudioapi rvest S7 sass scales selectr shiny shinymeta slider SnowballC sourcetools stringi stringr styler sys systemfonts tbl2xts tibble tidyr tidyselect tidytext timechange tinytex tokenizers tzdb usethis utf8 vctrs viridisLite vroom waiter warp whisker withr xfun xml2 xtable xts yaml zip zoo

Shiny modules included in castarter

Time series | Dygraphs | Barchart

Rendered from castarter-shiny-modules.Rmd using knitr::rmarkdown

Last update: 2026-02-07
Started: 2022-12-11

Database structure in castarter

Database location and database file naming conventions | Main database tables and column names | Additional tables

Rendered from castarter-database.Rmd using knitr::rmarkdown

Last update: 2023-12-24
Started: 2022-12-11


Help Manual

Help page	Topics
Archive originals of downloaded files in compressed folders	cas_archive
Backup files to Google Drive	cas_backup_gd
Open in a browser a URL stored in the local database	cas_browse
URL builder	cas_build_urls
Checks if given corpus exists, and, optionally updates it	cas_check_corpus
Checks if database folder exists, if not returns an informative message	cas_check_db_folder
Returns a corpus from the 'contents_data' table in the database; if corpus is given, it just returns that instead.	cas_check_read_db_contents_data
Check httr response code and cache results locally	cas_check_response
Check caching status in the current session, and override it upon request	cas_check_use_db
Checks if current website folder exists	cas_check_website_folder
Return a connection to be used for caching	cas_connect_to_db
Convert database type, e.g. from DuckDB to SQLite	cas_convert_db_type
Count strings in a corpus	cas_count
Count strings in a corpus relative to the number of words	cas_count_relative
Count total words in a dataset	cas_count_total_words
Creates the base folder where 'castarter' stores the project database.	cas_create_db_folder
Delete previously stored corpora written with 'cas_write_corpus()'.	cas_delete_corpus
Delete rows from selected database table	cas_delete_from_db
Disable caching for the current session	cas_disable_db
Ensure that connection to database is disconnected consistently	cas_disconnect_from_db
Downloads files systematically, and stores details about the download in a local database	cas_download
Downloads one file at a time with 'chromote'	cas_download_chromote
Downloads one file at a time with 'httr'	cas_download_httr
Downloads one file at a time with 'httr2'	cas_download_httr2
Downloads index files systematically, and stores details about the download in a local database	cas_download_index
Downloads one file at a time with the default R function for downloading files	cas_download_internal
Enable caching for the current session	cas_enable_db
Run a web interface allowing basic word frequency analysis	cas_explorer
Export database tables to another format such as csv	cas_export_tables
Extract fields and contents from downloaded files	cas_extract
Facilitates extraction of contents from an html file	cas_extract_html
Facilitates extraction of contents from an html file	cas_extract_html_custom
Extract direct links to individual content pages from index pages	cas_extract_links
Extracts scripts from an html page	cas_extract_script
Facilitate finding extractors, typically to be used with 'cas_extract_html()'	cas_find_extractor
Generate basic metadata about the corpus, including start and end date and total number of items available.	cas_generate_metadata
Get base folder under which files will be stored.	cas_get_base_folder
Build full path to base folder where batches of files will be stored.	cas_get_base_path
Get path to folder where the corpus is stored.	cas_get_corpus_path
Get connection to database with details about current website	cas_get_db
Gets location of database file	cas_get_db_file
Get database connection settings from the environment	cas_get_db_settings
Create a data frame with not yet downloaded files	cas_get_files_to_download
Get path to (locally available) files to be extracted	cas_get_files_to_extract
Get key project parameters that determine the folder used for storing project files	cas_get_options
Get path to locally downloaded files	cas_get_path_to_files
Checks for availability of a sitemap in xml format.	cas_get_sitemap
Checks that a given input corresponds to the format expected of a download data frame, and consistently returns it in the expected format	cas_get_urls_df
Get folder where files and data related to the current website are stored	cas_get_website_folder
Gets an Archive.org Wayback Machine URL	cas_ia_check
Save a URL to the Internet Archive's Wayback Machine	cas_ia_save
Adds a column with n words before and after the selected pattern to see keywords in context	cas_kwic
Adds a column with n words before and after the selected pattern to see keywords in context	cas_kwic_single_pattern
Read datasets created with 'cas_write_corpus()'	cas_read_corpus
Read contents data from local database	cas_read_db_contents_data
Read contents from local database	cas_read_db_contents_id
Read index from local database	cas_read_db_download
Read status on the Internet Archive of given URLs	cas_read_db_ia
Read identifiers to be ignored from the local database	cas_read_db_ignore_id
Read index from local database	cas_read_db_index
Check response type of URLs as stored in the local database.	cas_read_db_response
Read sitemap from local database	cas_read_db_sitemap
Read urls stored in the local database	cas_read_db_urls
Reads data from local database	cas_read_db cas_read_from_db
Delete a specific table from database	cas_reset_db
Removes from the local database the table where extracted data are stored	cas_reset_db_contents_data
Removes from the local database the table where links to contents associated with their id are stored	cas_reset_db_contents_id
Removes from the local database all identifiers included in the ignore list	cas_reset_db_ignore_id
Removes from the local database the table where links to index urls are stored	cas_reset_db_index_id
Delete all files and database records for the contents pages of the current website	cas_reset_download_contents
Delete all files and database records for the index pages of the current website	cas_reset_download_index
Restore files from compressed files	cas_restore
Set database connection settings for the session	cas_set_db
Set folder for storing the database	cas_get_db_folder cas_set_db_folder
Set key project parameters that determine the folder used for storing project files	cas_set_options
Creates interactive barchart with ggiraph	cas_show_barchart_ggiraph
Creates barchart with ggplot2	cas_show_barchart_ggplot2
Creates base ggplot2 object to be used by ggplot or ggiraph	cas_show_gg_base
Create dygraphs based on a data frame typically generated with 'cas_count()'	cas_show_ts_dygraph
Summarise word counts for a given time period, typically calculated with 'cas_count()'	cas_summarise
Update corpus	cas_update
Export the textual dataset for the current website	cas_write_corpus
Write extracted contents to local database	cas_write_db_contents_data
Write contents URLs to local database	cas_write_db_contents_id
Ignore a set of ids from the download or processing step	cas_ignore_id cas_write_db_ignore_id
Write index URLs to local database	cas_write_db_index
Write index URLs to local database	cas_write_db_sitemap
Write index or contents urls directly to the local database	cas_write_db_urls
Generic function for writing to database	cas_write_to_db
Empty data frame with the same format as data stored in the 'index_id' table	casdb_empty_index_id
Helps you define the parameters you need for building index urls	cass_build_urls
A minimal shiny app that demonstrates the functioning of related modules	cass_download_csv_app
Takes a character vector and returns it with matches of pattern wrapped in html tags used for highlighting	cass_highlight
A minimal shiny app that demonstrates the functioning of related modules	cass_show_ts_dygraph_app
Split string into multiple inputs	cass_split_string
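Two of the functions listed above, 'cas_count_relative()' (counting strings relative to the number of words) and 'cas_kwic()' (showing n words before and after a pattern), describe computations that can be illustrated in a few lines of base R. The sketch below is only meant to convey the idea: it does not use or reproduce castarter's API or implementation, the helpers `tokenise()`, `count_relative_sketch()` and `kwic_sketch()` are hypothetical, and the tokenisation rule (lowercase, split on anything that is not a letter, digit or apostrophe) is an assumption.

```r
# Hypothetical helpers for illustration only; they are NOT part of castarter
# and do not reproduce its implementation. Assumes simple tokenisation:
# lowercase text split on any run of characters that are not letters,
# digits or apostrophes.
tokenise <- function(text) {
  words <- strsplit(tolower(text), "[^[:alnum:]']+")
  lapply(words, function(w) w[nzchar(w)])  # drop empty leading tokens
}

# Count regex matches of `pattern` (expected in lowercase) in each text and
# divide by that text's word count. Substring matches count too, so "fox"
# would also match "foxes".
count_relative_sketch <- function(text, pattern) {
  n_words <- vapply(tokenise(text), length, integer(1))
  n_matches <- vapply(
    gregexpr(pattern, tolower(text)),
    function(m) sum(m > 0),  # gregexpr() returns -1 when nothing matches
    integer(1)
  )
  data.frame(n = n_matches, total_words = n_words,
             relative = n_matches / n_words)
}

# For every exact occurrence of `word`, return up to `n` words before and
# after it, one row per occurrence (NULL if there are no occurrences).
kwic_sketch <- function(text, word, n = 3) {
  rows <- list()
  for (doc in seq_along(text)) {
    w <- tokenise(text[doc])[[1]]
    for (pos in which(w == tolower(word))) {
      before <- utils::tail(w[seq_len(pos - 1)], n)
      after <- utils::head(w[pos + seq_len(length(w) - pos)], n)
      rows[[length(rows) + 1]] <- data.frame(
        doc = doc,
        before = paste(before, collapse = " "),
        keyword = w[pos],
        after = paste(after, collapse = " ")
      )
    }
  }
  do.call(rbind, rows)
}

# Usage with two toy texts
texts <- c("The quick brown fox jumps over the lazy dog", "A fox is not a dog")
count_relative_sketch(texts, "fox")  # relative: 1/9 and 1/6
kwic_sketch(texts, "fox", n = 2)     # "quick brown" / "jumps over"; "a" / "is not"
```

Dividing by the per-text word count is what makes frequencies comparable across texts of different length; a real corpus pipeline would typically rely on a dedicated tokeniser rather than the crude regular expression assumed here.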