% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/text.R
\name{html_text}
\alias{html_text}
\alias{html_text2}
\title{Get element text}
\usage{
html_text(x, trim = FALSE)

html_text2(x, preserve_nbsp = FALSE)
}
\arguments{
\item{x}{A document, node, or node set.}

\item{trim}{If \code{TRUE}, trims leading and trailing spaces.}

\item{preserve_nbsp}{Should non-breaking spaces be preserved? By default,
\code{html_text2()} converts them to ordinary spaces to ease further computation.
When \code{preserve_nbsp} is \code{TRUE}, \verb{&nbsp;} will appear in strings as
\code{"\\ua0"}. This often causes confusion because it prints the same way as
\code{" "}.}
}
\value{
A character vector the same length as \code{x}.
}
\description{
There are two ways to retrieve text from an element: \code{html_text()} and
\code{html_text2()}. \code{html_text()} is a thin wrapper around \code{\link[xml2:xml_text]{xml2::xml_text()}}
which returns just the raw underlying text. \code{html_text2()} simulates how
text looks in a browser, using an approach inspired by JavaScript's
\href{https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/innerText}{innerText}.
Roughly speaking, it converts \verb{<br />} to \code{"\\n"}, adds blank lines
around \verb{<p>} tags, and lightly formats tabular data.

\code{html_text2()} is usually what you want, but it is much slower than
\code{html_text()}, so for simple applications where performance is important
you may want to use \code{html_text()} instead.
}
\examples{
# To understand the difference between html_text() and html_text2()
# take the following html:

html <- minimal_html(
  "<p>This is a paragraph.
    This is another sentence.<br>This should start on a new line"
)

# html_text() returns the raw underlying text, which includes whitespace
# that would be ignored by a browser, and ignores the <br>
html |> html_element("p") |> html_text() |> writeLines()

# html_text2() simulates what a browser would display. Non-significant
# whitespace is collapsed, and <br> is turned into a line break
html |> html_element("p") |> html_text2() |> writeLines()
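
# A hedged sketch of the trim argument: with trim = TRUE, html_text()
# should drop leading and trailing spaces from the raw text. The padded
# markup and the name `padded` are made up for illustration.
padded <- minimal_html("<p>   padded text   </p>")
padded |> html_element("p") |> html_text()
padded |> html_element("p") |> html_text(trim = TRUE)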

# By default, html_text2() also converts non-breaking spaces to regular
# spaces:
html <- minimal_html("<p>x&nbsp;y</p>")
x1 <- html |> html_element("p") |> html_text()
x2 <- html |> html_element("p") |> html_text2()

# When printed, non-breaking spaces look exactly like regular spaces
x1
x2
# But aren't actually the same:
x1 == x2
# Which you can confirm by looking at their underlying binary
# representation:
charToRaw(x1)
charToRaw(x2)
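
# A hedged sketch of preserve_nbsp: with preserve_nbsp = TRUE, html_text2()
# should keep the non-breaking space, so for this input the result is
# expected to match html_text(). The names `nbsp_html`, `raw_text`, and
# `kept` are introduced here for illustration only.
nbsp_html <- minimal_html("<p>x&nbsp;y</p>")
raw_text <- nbsp_html |> html_element("p") |> html_text()
kept <- nbsp_html |> html_element("p") |> html_text2(preserve_nbsp = TRUE)
kept == raw_text
charToRaw(kept)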
}
