English, Guest Posts, Methodology

It Functions, and that’s (almost) All: Another Look at “Tagging the Talmud”

 

Itay Marienberg-Milikowsky is currently a visiting scholar at the Interdisciplinary Center for Narratology, Universität Hamburg, where he conducts his postdoctoral research, entitled “The Rise of Narrativity in Talmudic Literature: Computational Perspectives.” This is our third post in an ongoing series on Digital Humanities and Rabbinic Literature.


In Alfred Döblin’s famous novel Berlin Alexanderplatz, a certain Franz Biberkopf rejoins the modern city after a prolonged incarceration and is astonished by the relentless, alienating pace of change. Biberkopf gradually becomes entrapped in a net of forces stronger than himself, and his bewilderment is reflected in the splitting of his voice – or, maybe, the narrator’s voice – into two (if not more) contradictory points of view. Thus, the telegraph is described in one sentence as “astonishing, clever, tricky,” while in a subsequent sentence, we read: “It’s hard to get enthusiastic about all this; it functions, and that’s all” (p. 76). Continue reading

English, Guest Posts

Natural Language Processing of Rabbinic Texts: Contexts, Challenges, Opportunities

The Talmud Blog is happy to continue our series on the interface of Digital Humanities and the study of Rabbinic Literature with a post by Marton Ribary of the University of Manchester.

I read Michael Satlow’s enthusiastic report on the Classical Philology Goes Digital workshop with great pleasure. I am delighted to see how the study of Rabbinic literature is moving towards the use of digital tools, and especially Natural Language Processing (NLP) methods. Below I shall sketch the background of NLP methods applied to Rabbinic literature and what we can learn from projects concentrating on other classical linguistic data, notably Latin. I shall briefly discuss the enormous obstacles Rabbinic literature poses even compared to Latin, which means that our expectations of achieving meaningful results in standard 3-5 year research projects should be very moderate. Nevertheless, I shall argue that we should dream big and aim for courageous projects, accompanied by an outward-looking strategy, in order to attract big corporate money. Continue reading

English, Guest Posts

Digital Humanities and Rabbinic Literature

The Talmud Blog is happy to be hosting a series on the interface of Digital Humanities and the study of Rabbinic Literature. Our first post comes from Prof. Michael Satlow, of Brown University. 

The other week I attended a workshop called Classical Philology Goes Digital in Potsdam, Germany. The major goal of the workshop, which was also tied to the Humboldt Chair of Digital Humanities, was to further the work of creating and analyzing open texts of the “classics”, broadly construed. We have been thinking about adding natural language processing (including morphological and syntactic tagging – or, as I learned at the workshop, more accurately “annotation”) to the Inscriptions of Israel/Palestine project. While we learned much and are better positioned to add this functionality, I was most struck by how far the world of “digital classical philology,” focused mainly on texts, has progressed, and it got me thinking about the state of our own field. Continue reading

Talk of the Town

“We Read Thus”: On Hachi Garsinan and Learning Talmud in the 21st century


A screenshot from Hachi Garsinan 2.0: a synopsis of b. Bab. Kam. 89a (left) and a Genizah fragment from Oxford, Heb.c.21/31-36, with a marginal note in Judaeo-Arabic

Since its creation, the text of the Talmud has been the object of critical inquiry. Amoraim inquired after the exact wording of Tannaitic texts, Geonim struggled to establish the correct version of the Oral Talmud, as did the Rishonim with their written copies. The advent of philology, from the Renaissance on, prompted the collection and collation of manuscript copies of the Talmud as well as scholarly emendations and corrections. The Vilna Shas is the product of many centuries of scholarly work, pious and less so, presented to the discerning student in what was the best technology available. Continue reading

Around the Web, English

A Synopsis of the Entire Talmud

Few texts present philologists with as many difficulties as the Babylonian Talmud, whose complicated transmission history – oral and written, spanning centuries and continents – has created countless conundrums that we are only now beginning to understand. At the same time, scholars dispute how the text of the Bavli developed, and opinions accordingly vary as to how one should explain any given difference in the Talmudic text: Is a variant a consequence of fluidity during an early stage of oral transmission, or is it perhaps a later interpolation of a learned scribe? Such differences between textual witnesses of the Bavli are countless, and the various scholarly attempts to approach them are bound up with different ways of understanding how the Talmud developed over time. Thus, the close study of thousands of differences between manuscript versions of the Bavli not only helps explain the sugya at hand, but also sheds light on the development of the Bavli itself. Continue reading

Guest Posts

The ‘Status of The Talmud’ on Sefaria

Like many Jewish text geeks, I’ve been following the goings-on at Sefaria closely. Beyond providing free versions and translations of oodles of Jewish texts, Sefaria has made them available through a stunning, extremely accessible platform that allows for an expansion of the community of learners and an enrichment of the dialogue within that community. I’ve asked Sefaria’s Ari Elias-Bachrach to share his recent report on the status of the Talmud on Sefaria and to invite our readers to share what they would most like to see on the website. – Y.L.

Sefaria is a non-profit organization that is creating a massive library of interconnected Torah texts, all free and all in the public domain. To do this we’ve done a number of things, including importing text from other open-source projects on the web, like WikiSource, and digitizing public-domain sefarim. One of the things we’re working on is building a Talmud that gives a better learning experience than anything that came before it. Our goal is to have the standard Talmud text with not just Rashi and Tosafot but also the other major commentators, all in the same place. Additionally, citations from the likes of the Mesorat HaShas and Ein Mishpat will be linked, so you can see the relevant halachot automatically.

Our Talmud text comes from WikiSource, and we’ve been correcting it to ensure it matches the text of the Vilna shas. We realized that, given the number of mefarshim we plan on having, an amud was simply too large a unit of measure to reasonably use. When we did the Tanach it was comparatively simple – the commentators usually comment on specific verses, so any given verse just needs to link to those commentaries. However, a single amud might contain 100 comments from Rashi, Tosafot, the Rosh, and the other major commentators. Without breaking up the amud into smaller units, there’s no way to know which subset of those 100 commentaries to display. When you click on a pasuk in the Torah, you see all the commentaries on that pasuk. We wanted something similar here – when you click on a sentence in the Talmud, we wanted to display the relevant commentaries on that sentence. The conclusion was clear – we needed a way to break up the dapim. Thankfully, Koren Publishers graciously allowed us to use their punctuation to break up the amud. Each line of the amud now corresponds to a grammatical phrase (not a line of the Vilna printing). We undertook a massive project to segment all of shas in this manner, and finished in the fall of 2014. In the process we also double-checked the text we had from WikiSource to make sure it matched the Vilna shas. (As a side note, we found a significant number of errors in the WikiSource Talmud, in both the Talmud text and the Rashi and Tosafot. These errors have unfortunately propagated to many sites across the web, and in many cases it is clear we’re the first people to actually check the text for accuracy.)
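The segment-level linking described here can be pictured in miniature. The sketch below is not Sefaria’s actual data model – the reference format and the names are invented for illustration – but it shows why phrase-level references let a click retrieve just the relevant handful of comments rather than everything on the amud:

```python
# Hypothetical sketch of segment-level linking; the reference format and
# names are invented for illustration and are not Sefaria's actual schema.

segments = {
    "Berakhot.2a.1": "מאימתי קורין את שמע בערבין",
    "Berakhot.2a.2": "משעה שהכהנים נכנסים לאכול בתרומתן",
}

# Commentaries point at segment references, not at whole amudim.
links = [
    {"commentator": "Rashi",   "ref": "Berakhot.2a.1"},
    {"commentator": "Tosafot", "ref": "Berakhot.2a.1"},
    {"commentator": "Rashi",   "ref": "Berakhot.2a.2"},
]

def comments_on(ref):
    """Return only the comments attached to one clicked segment."""
    return [link["commentator"] for link in links if link["ref"] == ref]

print(comments_on("Berakhot.2a.1"))  # ['Rashi', 'Tosafot'], not all 100
```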

Next up, of course, are the commentaries of Rashi and Tosafot. With the Talmud segmented, we needed to associate each comment with the appropriate line. One of our wonderful volunteer developers, Noah Santacruz, made a commentary poster – a program that looks at the dibur hamatchil and tries to place the comment in the right place. Unfortunately, it cannot place every comment based solely on that information: sometimes the text of the dibur hamatchil appears multiple times in a daf, or it might not match at all if there are roshei teivot in use, or if the commentator decides to abbreviate the text in some other fashion. To fix those, we’ve had people manually working through the appropriate masechtot, placing the commentaries where they belong as they learn. At the same time they’ve also been checking the contents of the comments against the Vilna shas to make sure our text is accurate. So far we’ve finished Brachot, Megillah, and Taanit; Kiddushin and Ketubot are in progress. Those of you doing Daf Yomi will be happy to know we’ll be keeping the Ketubot progress ahead of the Daf Yomi cycle, so you don’t have to worry. This is still an ongoing process, and while we’re looking for ways to improve our automation, we’re also looking for volunteers. If you, your chevruta, or your school group is learning Talmud and wants to help out the cause of Talmud learning on the internet, you could help by placing the missing commentaries in the right place as you learn. If you’re interested, please let us know and we’ll help get you started.
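For the curious, the matching logic of such a placement tool can be sketched in a few lines. This is not Noah Santacruz’s actual program, only the general idea; it also shows why roshei teivot and abbreviated headwords defeat automation and leave a residue for manual placement:

```python
# Sketch of automatic commentary placement: anchor each comment at the unique
# segment containing its dibur hamatchil (opening words). Not the actual
# Sefaria tool; real matching must also handle roshei teivot and otherwise
# abbreviated headwords, which is exactly where manual review comes in.

def place_comment(daf_segments, dibur_hamatchil):
    """Index of the unique segment containing the headword, or None when the
    match is absent or ambiguous (i.e. a human must place the comment)."""
    hits = [i for i, seg in enumerate(daf_segments) if dibur_hamatchil in seg]
    return hits[0] if len(hits) == 1 else None

daf = ["מאימתי קורין את שמע בערבין",
       "משעה שהכהנים נכנסים לאכול בתרומתן",
       "עד סוף האשמורה הראשונה"]

print(place_comment(daf, "משעה שהכהנים"))  # 1 -> placed automatically
print(place_comment(daf, "מצותה"))         # None -> queued for manual placement
```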

Throughout this process we’ve been checking the text of the Talmud and the commentaries. We’ve found a significant number of mistakes and typos, most of which have been copied over and over again by countless websites. One of the advantages to our system is that we’re able to spot and correct these errors quickly and easily. Sefaria currently has the most accurate Talmud text freely available on the internet today (using the Vilna shas as the standard), and when we’re done we will have the most accurate copies of Rashi and Tosafot too.

After Rashi and Tosafot, of course, come the other major commentaries. We’ve recently finished digitizing the Rosh and the Nosei Keilim on it. We’re currently working on digitizing the Maharsha, Maharal, Maharsham, the Rif, and the Nosei Keilim on the Rif. We’ve also acquired digitized versions of the Pnei Yehoshua, Yad Ramah, Ramban, Shita Mekubetzet, Rashba, and Tosafot Rid. So far we’ve done the Shita Mekubetzet on Brachot. While getting these into our system is difficult for the same reasons as Rashi and Tosafot, you can expect to start seeing all these commentaries appearing on Sefaria starting in a few months.

Lastly, we’re also working on a few other features that should be helpful, including an integrated dictionary with data from Jastrow and the Comprehensive Aramaic Lexicon, as well as a way of integrating the Mesorat HaShas and Ein Mishpat Ner Mitzvah. We’re also going to put in links to the Mishnah whenever the Gemara quotes one, so that you can easily navigate to the Mishnah and see the various Mishnah commentaries we have. Currently those include Ovadia M’Bartenura and the Tosafot Yom Tov, but we should be adding the Rambam this summer.

What other features would you find useful? One of the advantages to our system is that while extracting text is much more difficult than just putting images online, it also gives us a lot more flexibility and allows for the building of some features which may not have been possible before.

English, Events

How Open is ‘Open’?

For those of you who couldn’t make it to last night’s event – we’re sorry, but we unfortunately failed to take Germany-USA into account when reserving the space a few weeks back. I would like to summarize some of the topics and projects that were discussed, for the benefit of the larger Talmud Blog community and also to help developers and digital humanists at large understand the problems facing talmudists.

After an extremely helpful introduction to what the term “open” means from Sinai Rusinek, the apostle of digital humanities in the Holy Land, yours truly attempted to briefly summarize the main stages of rabbinic text curatorship and which websites talmudists use while performing such “manuscript work.” One of the issues I raised is the simple inconvenience of having to keep track of what is out there: the internet is a big place, and the number of websites containing either images or transcriptions of manuscripts is constantly growing. Between the National Library’s online catalog, which links to every available online image of Hebrew manuscripts, and The Talmud Blog‘s “Toolbox,” one can get to all of these different resources, but it is unfortunate that there has not been more of an attempt to centralize all of these different projects under one roof.

The conversation quickly turned to the question of what questions we can ask using our computers, and three different subfields quickly emerged. The first two dealt with working with manuscripts:

A) How can we use computers to help us edit rabbinic texts? This question was addressed via two parallel projects, one led by Hayim Lapin and another by Daniel Stoekl ben Ezra, both of whom began editing the text of the Mishnah ahead of English and French translations and are now working together. Among the issues that arose under this topic were how to crowd-source such relatively mundane tasks as transcribing manuscripts and tagging lemmata, how to use Hebrew in TEI, and to what extent scholars should make their synopses or databases of manuscripts available to the public. For example, many of the transcriptions currently available online are presented as PDFs, but others may want to access them in different formats so that they can more easily (and legally, where those issues arise) use them in making their own editions or ask other questions of the text.


Karl Lachmann (1793-1851)

B) Can computers help us ask questions with regard to what one participant termed “fundamental issues of how the Bavli was formed and transmitted”? Time and again, and especially now that the Cairo Genizah corpus is more readily available, a certain brand of philological work on the Bavli has problematized the notion that we can use Lachmannian stemmatics to understand the relationships between manuscripts. How can we use computers simply to keep track of, quantify, and characterize the various differences between talmudic manuscripts? As mentioned, these questions pertain not just to how to “edit” the talmudic text – what the text can be – but also to the very question of what the talmudic text is.
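As a toy illustration of what “keeping track of, quantifying, and characterizing” could look like in practice, the following sketch scores the pairwise similarity of witness readings using only the Python standard library. The witness texts are invented placeholders, and a real study would regularize orthography before comparing:

```python
# Toy sketch: quantify pairwise disagreement between textual witnesses with a
# word-level similarity score. The witness texts are invented placeholders.
from difflib import SequenceMatcher
from itertools import combinations

witnesses = {
    "Munich 95":     "אמר רבי יוחנן משום רבי שמעון בן יוחי",
    "Vatican 113":   "אמר ר' יוחנן משום ר' שמעון בן יוחאי",
    "Genizah frag.": "אמר רבי יוחנן משום רבי שמעון",
}

def similarity(a, b):
    """Word-sequence similarity in [0, 1]; 1.0 means identical wording."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

for (n1, t1), (n2, t2) in combinations(witnesses.items(), 2):
    print(f"{n1} ~ {n2}: {similarity(t1, t2):.2f}")
```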

C) Manuscripts aside, what other questions can we ask of rabbinic texts using computers? The questions that came up for the most part dealt with the Bavli, harkening back to some issues discussed here on the blog. Itay Marienberg-Milikowsky of Ben Gurion University’s Department of Hebrew Literature described some of the projects he has started working on in a Franco Moretti-inspired lab for digital humanities at Ben Gurion. One of these projects tries to restructure how we think of the sugya. Given that the term itself is somewhat foreign to the Bavli and is largely a construct of later interpreters, Itay has been using word-frequency statistics and other quantitative computations to map the Bavli differently, taking special note of how – and perhaps even why – the Bavli repeats itself (“חזרתיות” in his words). You can be assured that Shai and I will be pressuring Itay to guest-post on his findings.
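By way of illustration only – this is not Itay’s actual method – a repetition map of the kind described could start from something as simple as counting repeated word n-grams across a tractate’s running text:

```python
# Toy sketch of a repetition map: count repeated word n-grams in a running
# text. Only an illustration of the kind of quantitative computation
# described above, not the lab's actual method.
from collections import Counter

def repeated_ngrams(text, n=4, min_count=2):
    words = text.split()
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return [(g, c) for g, c in grams.most_common() if c >= min_count]

# Placeholder string standing in for a tractate's running text.
corpus = "תא שמע רבי אומר תא שמע רבנן אמרי תא שמע"
for gram, count in repeated_ngrams(corpus, n=2):
    print(count, " ".join(gram))  # e.g. "3 תא שמע"
```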

Additional themes that arose related to the relationship between the academic talmudist and the programming or non-academic other. In terms of the latter, how much can projects such as Sefaria (presented by Ephraim Damboritz) and Sefer haAggadah (presented by Amit Assis) benefit from academic Talmud study? In terms of the former – how and where should talmudists be working with programmers? On the one hand, some talmudist participants were adamant about not studying programming and insisted on working alongside programmers instead. At the other end of the spectrum were those eager to create programming boot camps for talmudists and other humanities scholars. In between were the newbies who got lost after the ‘m’ of XML. Either way, it was clear that talk of “open resources” brings together scholars who are themselves rather open to new ways of thinking about their research.

Next Steps

I would like to see a couple of things come from this evening. First of all, it is clear that a growing number of talmudists are excited by the possibilities that digital humanities opens up before them, and that many of them already have some idea of how they would like to use moderately sophisticated programs in their own research. I think it would be great if we could use The Talmud Blog – perhaps in the comments section here, through guest posts, and maybe even by adding a forum – to create some kind of clearinghouse for ideas and projects. Such a space is needed to connect people who may have similar projects in mind, to generate discussions about what we can do with digital humanities, and to address the more philosophical questions of how humanities scholarship is changing before our eyes in this digital age. Let us know what you think!

English, Guest Posts

H. Lapin on ‘The Digital Mishnah Project’

Ahead of tomorrow’s joint event of The Talmud Blog and Digital Humanities Israel on “Open digital Bible, Mishna and Talmud,” Hayim Lapin of the University of Maryland has written this guest post on The Digital Mishnah Project. Feel free to leave Hayim your feedback in the comments section below or on the project’s site.

I am pleased to formally announce the relocation of my project to a server at the University of Maryland. I would also like to thank the Talmud Blog for hosting this guest post, which will also appear on my project blog at blog.umd.edu/digitalmishnah. The transition to the new site is not entirely complete, but it is complete enough to talk about it here.

In this post, I’d like to describe the project, give a brief user’s guide, and talk about next steps. At Yitz Landes’s suggestion I’ve also provided a PDF of my paper for the Peter Schäfer festschrift on this topic.

1. The project to date

The project initially was conceived as a side project to an annotated translation of the Mishnah that I am editing with Shaye Cohen (Harvard) and Bob Goldenberg (SUNY Stony Brook, emer.), under contract with Oxford, for which I am also contributing translation and annotations for tractate Neziqin – that is, Bava Qamma, Bava Metsi’a, and Bava Batra. Since I was going to spend time looking at manuscript variants anyway, I reasoned, why not just work more systematically and develop a digital edition? Much more complicated than I imagined!

The demo as it is available today on the development server is not significantly different in functionality from what was available in April 2013. However, there is now much more text available. While all the texts need further editing (including for encoding errors that interfere with output), there is enough available to get a very good start at a critical edition (or at least a variorum edition) of tractate Neziqin including several Genizah fragments, with an emphasis on transcribing fragments that constitute joinable sections of larger manuscripts.

2. User’s guide

When you access the demo edition, you will have two choices: “Browse” and “Compare.”

Following the link to the Browse page gives you the list of transcribed witnesses. Selecting one will allow you to browse through the manuscript page by page, column by column, or chapter by chapter with a more compact display of the text.

When you first get to the Compare page, all the witnesses installed to date are displayed and the interactive form is not yet activated [bad design alert!]. It will only be activated when you use the collapsing menus on the left to select text at the Chapter or Mishnah level. You are welcome to poke around and see where there are pieces of text (typically based on Genizah fragments) outside of the Bavot. However, significant amounts of text and of witnesses are only available for these three tractates.

To compare texts, select order (Neziqin) > tractate (one of the Bavot) > chapter (1-10) > mishnah (any one, or whole chapter).

The system will limit the table to those witnesses that have text for the selected passage and enable the table. At that point, you can select witnesses by putting a numeral into the text field; the output will be sorted based on that number. Then select the Compare button on the right. Output will appear below the form, and you may need to scroll down to it.

There are three output options. By default, you get an alignment in tabular form. Also possible are a text and apparatus (really, just a proof of concept) and a pretty featureless presentation in parallel columns.

3. Desired features

Here are a number of features I would like to see. I would also be glad to hear about additional features that potential users would find useful.

  • Highlighting of orthographic and/or substantive variants in the alignment table view. (Current highlighting uses CollateX’s own output, and there appear to be some errors. In addition to correcting these errors, it would be useful to highlight and color-code types of variation.)
  • In parallel column view, selecting one text highlights the corresponding text in the other columns.
  • Downloading results (Excel, TEI parallel segmentation, etc.).
  • Quantitative analysis (distance, grouping, stemmatics); a minimal sketch of the kind of computation intended appears after this list.
  • Correcting output. The preceding features are only as useful as the alignment is good. Down the road, the project will be able to generate an edited alignment table that will obviate the on-the-fly collation currently offered; for the present, users need a way of adjusting the output to reflect what a human eye and brain (carbon-based liveware) can see. At a minimum, this corrected alignment should be available to be reprocessed in the various output and analysis options.
  • Morphological tagging. This is something that Daniel Stoekl ben Ezra and I have been working on (see below).
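To make the quantitative-analysis item concrete, here is a minimal sketch, over invented data, of the distance computation that a corrected alignment table would enable; grouping and stemmatic analysis would build on such a distance matrix. None of this is the Digital Mishnah Project’s actual code:

```python
# Minimal sketch of witness distances from an alignment table. Rows are
# aligned comparison units, columns are witnesses, and distance is the share
# of jointly attested units on which two witnesses disagree. Invented data;
# note that a physical gap (None) is excluded from the count rather than
# treated as a disagreement.
from itertools import combinations

alignment = [
    {"Kaufmann": "שנים",   "Parma": "שנים",   "Genizah": "שנים"},
    {"Kaufmann": "אוחזין", "Parma": "אוחזים", "Genizah": "אוחזין"},
    {"Kaufmann": "בטלית",  "Parma": "בטלית",  "Genizah": None},  # physical gap
]

def distance(w1, w2):
    shared = [r for r in alignment if r[w1] is not None and r[w2] is not None]
    if not shared:
        return float("nan")
    return sum(r[w1] != r[w2] for r in shared) / len(shared)

for a, b in combinations(["Kaufmann", "Parma", "Genizah"], 2):
    print(f"{a} vs {b}: {distance(a, b):.2f}")
```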

4. Behind the scenes

For those interested in what is happening behind the scenes, here is a brief description.

The application is built in Apache Cocoon. The source code is available at https://github.com/umd-mith/mishnah.

The texts are transcribed and encoded in TEI, an XML specification developed for textual editing. The transcriptions aim at a fairly detailed representation of textual and codicological features, so that the database might also be useful to those interested in the history of the book.

For the Browse function, the system selects the requested source and citation, transforms the base XML into HTML using XSLT, and presents it on screen.
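Outside of Cocoon, that pipeline can be reproduced in a few lines – here with Python’s lxml, and with placeholder file names standing in for the project’s actual sources and stylesheets:

```python
# The Browse pipeline in miniature: parse the TEI source, apply an XSLT
# stylesheet, serialize HTML. The real application runs inside Cocoon; this
# standalone sketch uses lxml, and both file names are placeholders.
from lxml import etree

tei_doc = etree.parse("witness_kaufmann.xml")         # placeholder path
to_html = etree.XSLT(etree.parse("tei_to_html.xsl"))  # placeholder stylesheet
result = to_html(tei_doc)

with open("browse_view.html", "wb") as out:
    out.write(etree.tostring(result, pretty_print=True, method="html"))
```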

For the Compare function, the system extracts the selected text from the selected witnesses, passes it through the CollateX program, and then transforms the result into HTML using XSLT. The process is actually somewhat more complicated, since in order to get good alignment a certain amount of massaging is required before passing the information into CollateX, and the output then requires some further handling. The text is tokenized (broken into words as comparison units) and the tokens are regularized (all abbreviations are expanded, matres lectionis are removed, final aleph is treated as final heh, etc.). CollateX then aligns the regularized tokens, but the output needs to be re-merged with the full tokens. This re-merged text is what is presented in the output.
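A simplified sketch of that massaging step follows. The regularization rules shown (a toy abbreviation table, naive stripping of internal vav and yod, final aleph to heh) are crude stand-ins for the project’s actual rules; the point is the pairing of each full token with a regularized form, which is what makes the re-merge possible. The {"t": …, "n": …} shape mirrors the token format of CollateX’s JSON input, where alignment runs on “n” while display uses “t”:

```python
# Simplified sketch of the pre-CollateX massaging. Each full token ("t") is
# paired with a regularized form ("n"): alignment runs on "n", and the full
# tokens are re-merged into the output afterwards. The rules below are crude
# stand-ins for the project's actual regularization.

ABBREVIATIONS = {"ר'": "רבי", "אמ'": "אמר"}  # toy expansion table

def regularize(token):
    token = ABBREVIATIONS.get(token, token)
    if token.endswith("א"):                  # treat final aleph as final heh
        token = token[:-1] + "ה"
    if len(token) > 2:                       # naively drop internal vav/yod
        token = token[0] + token[1:-1].replace("ו", "").replace("י", "") + token[-1]
    return token

def tokenize(text):
    """Pair each full token with the regularized form the aligner will see."""
    return [{"t": w, "n": regularize(w)} for w in text.split()]

for tok in tokenize("אמ' ר' יוחנן מעשה בחסיד אחד"):
    print(tok["t"], "->", tok["n"])
```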

Ideally, the transcriptions would be done directly in TEI using an XML editor. (Oxygen now has fairly good support for right-to-left editing in “author” mode.) In practice, and especially when supervising at a distance, it is easier for transcribers to work in Word. I have developed a set of Word macros that allow transcribers to do the kind of full inputting of data the project requires, and an XSLT transformation that converts the Word document from Microsoft’s Open XML to TEI, with a certain amount of post-processing required.

5. Technical next steps

Both for internal processing and in order to align this project more closely with that of Daniel Stoekl ben Ezra (with whom I have been collaborating), the XML schema that governs how to do the markup will need to be changed so that each word has its own unique address (@xml:id). This also means that tags that can straddle others (say, damage that extends from the end of one text to the beginning of another) will need to be revised. While I am revising the schema, it will also be useful to tighten it and limit the values that can be used for attributes. This will make it easier for transcribers to work directly in XML.
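As a sketch of what word-level addressing could look like – the id scheme and element choices here are illustrative, not the project’s eventual schema – each word of a segment can be wrapped in a TEI <w> element carrying a unique @xml:id:

```python
# Sketch of word-level addressing: wrap each word in a TEI <w> element with a
# unique @xml:id. Id scheme and element choices are illustrative only.
from lxml import etree

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id

def wrap_words(siglum, segment_ref, text):
    seg = etree.Element(f"{{{TEI_NS}}}seg")
    for i, word in enumerate(text.split(), start=1):
        w = etree.SubElement(seg, f"{{{TEI_NS}}}w")
        w.set(XML_ID, f"{siglum}.{segment_ref}.w{i}")  # every word addressable
        w.text = word
    return seg

seg = wrap_words("K", "BM.1.1", "שנים אוחזין בטלית")
print(etree.tostring(seg, pretty_print=True, encoding="unicode"))
```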

The encoding of Genizah fragments poses a particular set of problems. (1) We need a method for virtually joining fragments that belong to a single manuscript while also retaining the integrity of the original fragment. At present, each fragment is encoded separately and breaks have pointers to locations in a central reference text; a procedure is then necessary to process the texts and generate a composite. For now this is envisioned for texts that are known to join; at a later stage the approach could be generalized to search for possible joins. (2) Fragmentary texts in general pose a problem for alignment, since the alignment program needs to distinguish textual absences from physical gaps due to the state of preservation. The system of pointing described in (1) will facilitate this.

The above are necessary to make the present version of the demo function effectively. Toward a next phase, one essential feature is the correction of the alignment output (see also above). I can envision two use cases. In one, the individual user makes corrections for his or her own use, and the edition does not provide a “curated” alignment. Alternatively, we build a content management system that allows the editors to oversee the construction of corrected alignments that become part of the application. When completed, this “curated” version replaces the collation that takes place on the fly. In either case, corrected output makes possible a suite of statistical functions that I would like to implement.

Finally, for our proposed model digital edition, Daniel Stoekl ben Ezra and I have discussed a morphological tagging component. Here we have worked with Meni Adler (BGU) to create preliminary morphological analysis based on modern Hebrew and with a programmer to create a markup tool. Ideally, the corrected markup could then be recycled to train morphological analysis programs on rabbinic Hebrew and Medieval/Early Modern orthography.

I have benefited from the support of the Meyerhoff Center for Jewish Studies, the History Department, and the Maryland Institute for Technology in the Humanities (MITH). MITH, and in particular then assistant director Travis Brown, built the web application and worked with me as I slowly and painfully learned how to build the parts that I built directly. Trevor Muñoz of MITH was insistent that I develop a schema and helped to do so, the wisdom of which I am only now learning. Many professional transcribers and students worked on transcriptions and markup. These are listed in the file ref.xml available in the project repository, and I hope to make an honor roll more visible at a later date.

Many scholars and institutions have been gracious about sharing texts and information. Michael Krupp shared transcriptions of the first four orders of the Mishnah. Peter Schaefer and Gottfried Reeg made available the Mishnah texts from all editions of the Yerushalmi included in the Synopsis. Accordance, through the good offices of Marty Abegg, made pointed transcriptions of the Kaufmann manuscript available. Daniel Stoekl ben Ezra and I have been collaborating for some time now, and have proposed a jointly edited model edition of Bava Metsi’a pending funding.

English, Events

The Talmud Blog and Digital Humanities Israel

Joint meeting of the Talmud Blog and Digital Humanities Israel

Next Thursday evening, June 26th, The Talmud Blog and “Digital Humanities Israel” will be holding a joint meeting at the “Open Hub” of The National Library in Givat Ram (1st floor) on the topic of open digital resources in the study of the Bible, Mishnah, and Talmud (BMT):

How can open digital resources contribute to BMT scholarship? What new questions can be asked using digital tools and methods?

The Talmud Blog and Digital Humanities Israel will dedicate a special joint meeting to open digital resources, tools and studies in Bible, Mishnah and Talmud scholarship.

Resources and projects dedicated to ancient Jewish sources will be presented, along with new ideas for possible implementations of digital tools and methods for the study of these resources. No prior technological knowledge is required! An open discussion session will follow.

Doors open at 19:00. Presentations by Yitz Landes (The Talmud Blog), Sinai Rusinek (Digital Humanities Israel), Ephraim Damboritz (Sefaria), and others will begin promptly at 19:30.


An open Talmud.

Conferences, English

A Conference on “Aggadic Midrash in the Communities of the Genizah”

Along with fellow students in Hebrew University’s Program for the Study of Late Antiquity, I set out for Haifa University early yesterday morning to attend a conference organized by an inter-university research group on “Physical Culture and Textual Culture in the History of the Land of Israel.” One of the day’s highlights was that I happened to sit next to Haifa University’s Dr. Moshe Lavee, who asked that I share the following information about an exciting conference that he is organizing, to take place at Haifa University next week:
The newly founded Center for Genizah Research at the University of Haifa will hold the second conference of the research group on “Aggadic Midrash in the Communities of the Genizah” on Wednesday and Thursday next week, 15-16/1/2014. The conference will present the fruits of the group’s work, as well as lectures by scholars who deal with the subject and adjacent topics, such as the relationship between Piyyut and Midrash, the question of oral homilies and sermons, the representation of Midrash in Judeo-Arabic materials, and more.
The first day will also include an event marking the recent publication of Uri Ehrlich’s edition of the Amidah prayer according to Genizah fragments. On the second day there will be a special session on the use of new computational tools for classifying and analysing material from the Genizah and from Qumran.

