Package: castarter 0.2.0.9010

castarter: Content Analysis Starter Toolkit

Consistent approaches for basic web scraping, text mining and word frequency analysis of textual datasets

Authors: Giorgio Comai [aut, cre, cph]

castarter_0.2.0.9010.tar.gz
castarter_0.2.0.9010.zip (r-4.5) | castarter_0.2.0.9010.zip (r-4.4) | castarter_0.2.0.9010.zip (r-4.3)
castarter_0.2.0.9010.tgz (r-4.4-any) | castarter_0.2.0.9010.tgz (r-4.3-any)
castarter_0.2.0.9010.tar.gz (r-4.5-noble) | castarter_0.2.0.9010.tar.gz (r-4.4-noble)
castarter_0.2.0.9010.tgz (r-4.4-emscripten) | castarter_0.2.0.9010.tgz (r-4.3-emscripten)
castarter.pdf | castarter.html
castarter/json (API)
NEWS

# Install 'castarter' in R:
install.packages('castarter', repos = c('https://giocomai.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/giocomai/castarter/issues

tada | text-mining

100 exports 3 stars 2.14 score 136 dependencies 2 scripts

Last updated 2 days ago from: c13159106e. Checks: OK: 1, ERROR: 6. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Sep 16 2024
R-4.5-win | ERROR | Sep 16 2024
R-4.5-linux | ERROR | Sep 16 2024
R-4.4-win | ERROR | Sep 16 2024
R-4.4-mac | ERROR | Sep 16 2024
R-4.3-win | ERROR | Sep 16 2024
R-4.3-mac | ERROR | Sep 16 2024

Exports: :=, .data, %>%, as_label, as_name, cas_archive, cas_backup_gd, cas_browse, cas_build_urls, cas_check_corpus, cas_check_db_folder, cas_check_read_db_contents_data, cas_check_use_db, cas_check_website_folder, cas_connect_to_db, cas_convert_db_type, cas_count, cas_count_relative, cas_count_total_words, cas_create_db_folder, cas_delete_corpus, cas_delete_from_db, cas_disable_db, cas_disconnect_from_db, cas_download, cas_download_chromote, cas_download_httr, cas_download_index, cas_download_internal, cas_download_legacy, cas_enable_db, cas_explorer, cas_explorer_legacy, cas_export_tables, cas_extract, cas_extract_html, cas_extract_links, cas_extract_script, cas_find_extractor, cas_generate_metadata, cas_get_base_folder, cas_get_base_path, cas_get_corpus_path, cas_get_db, cas_get_db_file, cas_get_db_folder, cas_get_db_settings, cas_get_files_to_download, cas_get_options, cas_get_path_to_files, cas_get_urls_df, cas_get_website_folder, cas_ia_check, cas_ia_save, cas_ignore_id, cas_kwic, cas_kwic_single_pattern, cas_read_corpus, cas_read_db_contents_data, cas_read_db_contents_id, cas_read_db_download, cas_read_db_ia, cas_read_db_ignore_id, cas_read_db_index, cas_read_db_urls, cas_read_from_db, cas_reset_db, cas_reset_db_contents_data, cas_reset_db_contents_id, cas_reset_db_ignore_id, cas_reset_db_index_id, cas_reset_download_contents, cas_reset_download_index, cas_restore, cas_set_db, cas_set_db_folder, cas_set_options, cas_show_barchart_ggiraph, cas_show_barchart_ggplot2, cas_show_gg_base, cas_show_ts_dygraph, cas_summarise, cas_update, cas_write_corpus, cas_write_db_contents_data, cas_write_db_contents_id, cas_write_db_ignore_id, cas_write_db_index, cas_write_db_urls, cas_write_to_db, cass_build_urls, cass_download_csv_app, cass_highlight, cass_show_ts_dygraph_app, cass_split_string, enquo, enquos, expr, sym, syms

Dependencies: askpass, assertthat, attempt, base64enc, bit, bit64, blob, bslib, cachem, callr, cicerone, cli, clipr, colorspace, commonmark, config, cpp11, crayon, credentials, crosstalk, curl, DBI, dbplyr, desc, digest, dplyr, DT, dygraphs, ellipsis, evaluate, fansi, farver, fastmap, fontawesome, fs, generics, gert, ggiraph, ggplot2, gh, gitcreds, glue, golem, gtable, here, highr, hms, htmltools, htmlwidgets, httpuv, httr, httr2, ini, isoband, janeaustenr, jquerylib, jsonlite, knitr, labeling, later, lattice, lazyeval, lifecycle, lubridate, magrittr, MASS, Matrix, memoise, mgcv, mime, munsell, nlme, openssl, pillar, pkgconfig, plogr, PrettyCols, prettyunits, processx, progress, promises, ps, purrr, R.cache, R.methodsS3, R.oo, R.utils, R6, rappdirs, RColorBrewer, Rcpp, reactable, reactR, rlang, rmarkdown, rprojroot, RSQLite, rstudioapi, rvest, sass, scales, selectr, shiny, shinymeta, slider, SnowballC, sourcetools, stringi, stringr, styler, sys, systemfonts, tbl2xts, tibble, tidyr, tidyselect, tidytext, timechange, tinytex, tokenizers, usethis, utf8, uuid, vctrs, viridisLite, waiter, warp, whisker, withr, xfun, xml2, xtable, xts, yaml, zip, zoo

Database structure in castarter

Rendered from castarter-database.Rmd using knitr::rmarkdown on Sep 16 2024.

Last update: 2023-12-24
Started: 2022-12-11

Shiny modules included in castarter

Rendered from castarter-shiny-modules.Rmd using knitr::rmarkdown on Sep 16 2024.

Last update: 2023-12-24
Started: 2022-12-11

Readme and manuals

Help Manual

Help page | Topics
Archive originals of downloaded files in compressed folders | cas_archive
Backup files to Google Drive | cas_backup_gd
Open in a browser a URL stored in the local database | cas_browse
URL builder | cas_build_urls
Checks if given corpus exists and, optionally, updates it | cas_check_corpus
Checks if database folder exists, if not returns an informative message | cas_check_db_folder
Returns a corpus from the 'contents_data' table in the database; if corpus is given, it just returns that instead. | cas_check_read_db_contents_data
Check caching status in the current session, and override it upon request | cas_check_use_db
Checks if current website folder exists | cas_check_website_folder
Return a connection to be used for caching | cas_connect_to_db
Convert database type, e.g. from DuckDB to SQLite | cas_convert_db_type
Count strings in a corpus | cas_count
Count strings in a corpus relative to the number of words | cas_count_relative
Count total words in a dataset | cas_count_total_words
Creates the base folder where 'castarter' stores the project database. | cas_create_db_folder
Delete previously stored corpora written with 'cas_write_corpus()'. | cas_delete_corpus
Delete rows from selected database table | cas_delete_from_db
Disable caching for the current session | cas_disable_db
Ensure that connection to database is disconnected consistently | cas_disconnect_from_db
Downloads files systematically, and stores details about the download in a local database | cas_download
Downloads one file at a time with chromote | cas_download_chromote
Downloads one file at a time with httr | cas_download_httr
Downloads index files systematically, and stores details about the download in a local database | cas_download_index
Downloads one file at a time with readLines | cas_download_internal
Downloads html pages based on a vector of links | cas_download_legacy
Enable caching for the current session | cas_enable_db
Run the Shiny Application | cas_explorer
Run the Shiny Application | cas_explorer_legacy
Export database tables to another format such as csv | cas_export_tables
Extract fields and contents from downloaded files | cas_extract
Facilitates extraction of contents from an html file | cas_extract_html
Extract direct links to individual content pages from index pages | cas_extract_links
Extracts scripts from an html page | cas_extract_script
Facilitate finding extractors, typically to be used with 'cas_extract_html()' | cas_find_extractor
Generate basic metadata about the corpus, including start and end date and total number of items available. | cas_generate_metadata
Get base folder under which files will be stored. | cas_get_base_folder
Build full path to base working folder | cas_get_base_path
Get path to folder where the corpus is stored. | cas_get_corpus_path
Get connection to database with details about current website | cas_get_db
Gets location of database file | cas_get_db_file
Get database connection settings from the environment | cas_get_db_settings
Create a data frame with not yet downloaded files | cas_get_files_to_download
Get key project parameters that determine the folder used for storing project files | cas_get_options
Get path to locally downloaded files | cas_get_path_to_files
Checks that a given input corresponds to the format expected of a download data frame, consistently returns expected format | cas_get_urls_df
Get folder where files and data related to the current website are stored | cas_get_website_folder
Gets an Archive.org Wayback Machine URL | cas_ia_check
Save a URL to the Internet Archive's Wayback Machine | cas_ia_save
Adds a column with n words before and after the selected pattern to see keywords in context | cas_kwic
Adds a column with n words before and after the selected pattern to see keywords in context | cas_kwic_single_pattern
Read datasets created with 'cas_write_dataset' | cas_read_corpus
Read contents data from local database | cas_read_db_contents_data
Read contents from local database | cas_read_db_contents_id
Read index from local database | cas_read_db_download
Read status on the Internet Archive of given URLs | cas_read_db_ia
Read identifiers to be ignored from the local database | cas_read_db_ignore_id
Read index from local database | cas_read_db_index
Read urls stored in the local database | cas_read_db_urls
Reads data from local database | cas_read_from_db
Delete a specific table from database | cas_reset_db
Removes from the local database the folder where extracted data are stored | cas_reset_db_contents_data
Removes from the local database the folder where links to contents associated with their id are stored | cas_reset_db_contents_id
Removes from the local database all identifiers included in the ignore list | cas_reset_db_ignore_id
Removes from the local database the table where links to index urls are stored | cas_reset_db_index_id
Delete all files and database records for the contents pages of the current website | cas_reset_download_contents
Delete all files and database records for the index pages of the current website | cas_reset_download_index
Restore files from compressed files | cas_restore
Set database connection settings for the session | cas_set_db
Set folder for storing the database | cas_get_db_folder, cas_set_db_folder
Set key project parameters that determine the folder used for storing project files | cas_set_options
Creates interactive barchart with ggiraph | cas_show_barchart_ggiraph
Creates barchart with ggplot2 | cas_show_barchart_ggplot2
Creates base ggplot2 object to be used by ggplot or ggiraph | cas_show_gg_base
Create dygraphs based on a data frame typically generated with cas_count() | cas_show_ts_dygraph
Summarise word counts for a given time period, typically calculated with 'cas_count()' | cas_summarise
Update corpus | cas_update
Export the textual dataset for the current website | cas_write_corpus
Write extracted contents to local database | cas_write_db_contents_data
Write contents URLs to local database | cas_write_db_contents_id
Ignore a set of ids from the download or processing step | cas_ignore_id, cas_write_db_ignore_id
Write index URLs to local database | cas_write_db_index
Write index or contents urls directly to the local database | cas_write_db_urls
Generic function for writing to database | cas_write_to_db
Empty data frame with the same format as data stored in the 'index_id' table | casdb_empty_index_id
Helps you define the parameters you need for building index urls | cass_build_urls
Combines a vector of words into a string to be used for regex matching. | cass_combine_into_pattern
A minimal shiny app that demonstrates the functioning of related modules | cass_download_csv_app
Takes a character vector and returns it with matches of pattern wrapped in html tags used for highlighting | cass_highlight
A minimal shiny app that demonstrates the functioning of related modules | cass_show_ts_dygraph_app
Split string into multiple inputs | cass_split_string
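
Several help topics describe counting strings in a corpus, either as raw matches or relative to the number of words. The sketch below illustrates that idea in base R only. It does not call 'castarter' and is not the package's implementation; the toy corpus, the column names (id, text, n, total_words, relative) and the pattern are hypothetical, chosen for illustration.

```r
# Toy corpus; the data and column names are hypothetical, for illustration only.
corpus <- data.frame(
  id = 1:2,
  text = c(
    "Energy prices rise as energy demand grows",
    "Markets calm after a quiet week"
  ),
  stringsAsFactors = FALSE
)

# Pattern to look for (an assumption for this example).
pattern <- "energy"

# Number of case-insensitive matches of the pattern in each text.
# gregexpr() returns -1 when there is no match, so only positive positions count.
matches <- gregexpr(pattern, corpus$text, ignore.case = TRUE)
corpus$n <- vapply(matches, function(m) sum(m > 0), integer(1))

# Total words per text, splitting on runs of whitespace.
corpus$total_words <- lengths(strsplit(trimws(corpus$text), "\\s+"))

# Matches relative to the number of words in each text.
corpus$relative <- corpus$n / corpus$total_words

print(corpus[, c("id", "n", "total_words", "relative")])
```

Dividing by the word count makes texts of different lengths comparable, which is presumably why a relative count is offered alongside the raw one.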