I got the following question for the blog: “How can I scrape texts from Chabad.org in R?”
This question turned out to be fun and challenging! I organized the scraped text of the Tanakh (the Old Testament for Christians) into the R package “tanakh” (https://github.com/pachadotdev/tanakh/). The code using purrr and RSelenium is here.
I would love to receive more ideas about non-English datasets in languages such as Arabic, Akkadian, Enochian, Greek, and others.
The ‘Tanakh’ R package
The tanakh R package provides tidy, verse-level access to the full Hebrew Bible (Tanakh) with English and Hebrew text, organized by book, chapter, and verse. Data is sourced from Chabad.org and includes diacritics (niqqud) in Hebrew.
Features
- Three datasets: torah (Pentateuch), neviim (Prophets), ketuvim (Writings)
- Each dataset is a tibble with columns: chapter_number, chapter_name, line, english, hebrew, rashi_english, rashi_hebrew
- Hebrew text is normalized and includes diacritics
- Includes Rashi’s commentary in both English and Hebrew (thanks to Rab. Rapoport for the suggestion)
- Easy filtering and analysis in R
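Because the Hebrew text keeps its diacritics, a search term typed without niqqud will not match the pointed text directly. The sketch below shows one possible workaround in base R: strip the vowel points and cantillation marks before comparing. The helper name strip_niqqud is hypothetical (it is not part of the package), and the sketch assumes the hebrew column holds ordinary UTF-8 strings.

```r
# Hypothetical helper (not part of tanakh): drop Hebrew vowel points and
# cantillation marks. The ranges skip U+05BE (maqaf), U+05C0 (paseq),
# U+05C3 (sof pasuq) and U+05C6 (nun hafukha), which are punctuation
# rather than diacritics.
strip_niqqud <- function(x) {
  gsub("[\u0591-\u05bd\u05bf\u05c1\u05c2\u05c4\u05c5\u05c7]", "", x, perl = TRUE)
}

# "Bereshit" with niqqud, written with escapes so the script stays ASCII.
pointed <- "\u05d1\u05bc\u05b0\u05e8\u05b5\u05d0\u05e9\u05c1\u05b4\u05d9\u05ea"
plain <- strip_niqqud(pointed)
print(plain)

# Assumed usage once the package is loaded (not run here):
# torah[grepl(plain, strip_niqqud(torah$hebrew), fixed = TRUE), ]
```

Stripping marks from both the query and the column keeps the comparison symmetric, at the cost of losing distinctions that only the vowel points carry.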
Installation
You can install the development version of tanakh from the R console:
# Install remotes only if it is missing (requireNamespace() does not attach it)
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
# Install tanakh from GitHub only if it is missing
if (!requireNamespace("tanakh", quietly = TRUE)) remotes::install_github("pachadotdev/tanakh")
Datasets
Torah (Pentateuch): Bereshit (Genesis), Shemot (Exodus), Vayikra (Leviticus), Bamidbar (Numbers), Devarim (Deuteronomy)
Nevi’im (Prophets): Yehoshua (Joshua), Shoftim (Judges), Shmuel I & II, Melachim I & II, Yeshayahu, Yirmiyahu, Yechezkel, Hoshea, Yoel, Amos, Ovadiah, Yonah, Michah, Nachum, Chavakuk, Tzefaniah, Chaggai, Zechariah, Malachi
Ketuvim (Writings): Tehillim (Psalms), Mishlei (Proverbs), Iyov (Job), Shir Hashirim, Rut, Eichah, Kohelet, Esther, Daniel, Ezra, Nechemiah, Divrei Hayamim I & II
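In the outputs shown in this post, the chapter_name column holds the book name, so counting rows per chapter_name gives the number of verses per book. Here is a minimal base R sketch of that idea. The helper verses_per_book is hypothetical, and the small data frame is a stand-in with the same column names so the snippet runs without the package.

```r
# Stand-in rows shaped like the package's tibbles (structure only, no text).
toy <- data.frame(
  chapter_number = c(1L, 1L, 2L, 1L),
  chapter_name   = factor(c("Rut", "Rut", "Rut", "Esther")),
  line           = c(1L, 2L, 1L, 1L)
)

# Hypothetical helper: number of rows (verses) per book, largest first.
verses_per_book <- function(df) {
  counts <- table(droplevels(as.factor(df$chapter_name)))
  sort(counts, decreasing = TRUE)
}

book_counts <- verses_per_book(toy)
print(book_counts)

# Assumed usage with the real data once the package is loaded (not run here):
# verses_per_book(ketuvim)
```

droplevels() is there because chapter_name is a factor, and a filtered subset may still carry levels for books that no longer have rows.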
Example Usage
library(tanakh)
library(dplyr)
# Find all verses mentioning Moses in the Torah
torah %>%
  filter(grepl("Moses", english))
# A tibble: 599 x 7
   chapter_number chapter_name    line english hebrew rashi_english rashi_hebrew
            <int> <fct>          <int> <chr>   <chr>  <chr>         <chr>
 1              2 Shemot (Exodu~    10 "The c~ "\u05~ "For I drew ~ "\u05de\u05~
 2              2 Shemot (Exodu~    11 "Now i~ "\u05~ "Moses grew ~ "\u05d5\u05~
 3              2 Shemot (Exodu~    14 "And h~ "\u05~ "Who made yo~ "\u05de\u05~
 4              2 Shemot (Exodu~    15 "Phara~ "\u05~ "Pharaoh hea~ "\u05d5\u05~
 5              2 Shemot (Exodu~    17 "But t~ "\u05~ "and drove t~ "\u05d5\u05~
 6              2 Shemot (Exodu~    21 "Moses~ "\u05~ "consented. ~ "\u05d5\u05~
 7              3 Shemot (Exodu~     1 "Moses~ "\u05~ "after the f~ "\u05d0\u05~
 8              3 Shemot (Exodu~     3 "So Mo~ "\u05~ "Let me turn~ "\u05d0\u05~
 9              3 Shemot (Exodu~     4 "The L~ "\u05~ <NA>          <NA>
10              3 Shemot (Exodu~     6 "And H~ "\u05~ <NA>          <NA>
# i 589 more rows
# Find all verses mentioning Amos in Nevi'im
neviim %>%
  filter(grepl("Amos", english))
# A tibble: 7 x 7
  chapter_number chapter_name  line english    hebrew rashi_english rashi_hebrew
           <int> <fct>        <int> <chr>      <chr>  <chr>         <chr>
1              1 Amos             1 "The word~ "\u05~ "who was amo~ "\u05d0\u05~
2              7 Amos             8 "And the ~ "\u05~ "Behold I pl~ "\u05d4\u05~
3              7 Amos            10 "And Amaz~ "\u05~ "the priest ~ "\u05db\u05~
4              7 Amos            11 "For so s~ "\u05~ "Jeroboam sh~ "\u05d1\u05~
5              7 Amos            12 "And Amaz~ "\u05~ "\u201cSeer\~ "\u05d7\u05~
6              7 Amos            14 "And Amos~ "\u05~ "I am neithe~ "\u05dc\u05~
7              8 Amos             2 "And He s~ "\u05~ <NA>          <NA>
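A plain grepl() call matches substrings, so a short name can also hit longer words that contain it. The following sketch shows one way to restrict a search to whole words with the \b word-boundary anchor. The helper mentions_word is hypothetical, and the sample strings are illustrative, not quotations from the dataset.

```r
# Hypothetical helper: TRUE where `word` occurs as a whole word in `text`.
# `word` is pasted into a regular expression, so it should not contain
# regex metacharacters.
mentions_word <- function(text, word, ignore_case = FALSE) {
  pattern <- paste0("\\b", word, "\\b")
  grepl(pattern, text, perl = TRUE, ignore.case = ignore_case)
}

# Illustrative strings, not taken from the dataset.
samples <- c("Dan went out", "Daniel answered", "beyond the Jordan")

substring_hits  <- grepl("Dan", samples)          # also matches "Daniel"
whole_word_hits <- mentions_word(samples, "Dan")  # matches only "Dan"
print(data.frame(samples, substring_hits, whole_word_hits))

# Assumed usage inside a dplyr pipeline once the package is loaded (not run here):
# torah %>% filter(mentions_word(english, "Dan"))
```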
# Find all verses from Esther in Ketuvim
ketuvim %>%
  filter(chapter_name == "Esther")
# A tibble: 167 x 7
   chapter_number chapter_name  line english   hebrew rashi_english rashi_hebrew
            <int> <fct>        <int> <chr>     <chr>  <chr>         <chr>
 1              1 Esther           1 Now it c~ "\u05~ "Now it came~ "\u05d5\u05~
 2              1 Esther           2 In those~ "\u05~ "when King A~ "\u05db\u05~
 3              1 Esther           3 In the t~ "\u05~ "the nobles.~ "\u05d4\u05~
 4              1 Esther           4 When he ~ "\u05~ "many days. ~ "\u05d9\u05~
 5              1 Esther           5 And when~ "\u05~ "the garden.~ "\u05d2\u05~
 6              1 Esther           6 [There w~ "\u05~ "white, fine~ "\u05d7\u05~
 7              1 Esther           7 And they~ "\u05~ "And they ga~ "\u05d5\u05~
 8              1 Esther           8 And the ~ "\u05~ "according t~ "\u05db\u05~
 9              1 Esther           9 Also, Va~ "\u05~ <NA>          <NA>
10              1 Esther          10 On the s~ "\u05~ "On the seve~ "\u05d1\u05~
# i 157 more rows
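The outputs above show NA in the rashi_english and rashi_hebrew columns for verses that have no commentary. A possible follow-up, sketched here in base R, is to compute the share of verses per book that do have a comment. The helper rashi_coverage is hypothetical, and the data frame is a stand-in with placeholder text rather than real commentary.

```r
# Stand-in rows with the relevant columns (placeholder text, not real commentary).
toy <- data.frame(
  chapter_name  = c("Esther", "Esther", "Esther", "Rut"),
  line          = c(8L, 9L, 10L, 1L),
  rashi_english = c("placeholder comment", NA, "placeholder comment", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: share of verses per book with a non-missing comment.
rashi_coverage <- function(df) {
  has_comment <- !is.na(df$rashi_english)
  tapply(has_comment, as.character(df$chapter_name), mean)
}

coverage <- rashi_coverage(toy)
print(round(coverage, 2))

# Assumed usage with the real data once the package is loaded (not run here):
# rashi_coverage(ketuvim)
```

Taking the mean of a logical vector gives the proportion of TRUE values, which is why no explicit counting is needed.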
Font
To display the Hebrew text with niqqud correctly, make sure a font that supports Hebrew diacritics is installed on your system. Recommended font: Noto Sans Hebrew. See the repository for more information on font installation.