To use Python in R Markdown, we need to load the
{reticulate} R package in a
code chunk, we do it here in the setup code chunk. We also load the
{ggplot2} R package so we can plot
some data from a Python pandas data frame later on!
.Rmd filesJust like R, you can use Python in .Rmd files! Here we import the
Python package that we need to import our data into Python, pandas.
pandas is a Python package that adds data reading, wrangling, and
simple data visualization functionality to Python. It holds a similar
place as the {tidyverse} R meta-package does (however pandas is not a
meta-package, just a very large package). If you want to learn more
about pandas, see the “10 minutes to
pandas”
in the docs our checkout this free interactive course:
https://prog-learn.mds.ubc.ca/en.
import pandas as pd
Let’s load the titanic data (which lives in the data directory of this
project) and view the data:
titanic  = pd.read_csv("data/titanic.csv")
pandasFirst, let’s use some quick and simple pandas plotting functions:
titanic.plot.scatter(x='age', y='fare', alpha=0.3)

Want to learn more about getting started plotting in Python using
pandas, see this page to get started:
https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html
We can also access the Python environment in this R Markdown document
from R! This allows us to apply R’s functions on our Python objects! We
can use the py$obbject syntax to do this and create a data
visualization using {ggplot2}!
ggplot2::ggplot(py$titanic, aes(x = age, y = fare)) +
  geom_point()

pandasHere we subset the age and fare columns:
titanic[["age", "fare"]]
##           age      fare
## 0     29.0000  211.3375
## 1      0.9167  151.5500
## 2      2.0000  151.5500
## 3     30.0000  151.5500
## 4     25.0000  151.5500
## ...       ...       ...
## 1304  14.5000   14.4542
## 1305      NaN   14.4542
## 1306  26.5000    7.2250
## 1307  27.0000    7.2250
## 1308  29.0000    7.8750
## 
## [1309 rows x 2 columns]
Here we filter for rows containing people over 85:
titanic[titanic["age"] > 70]
##       pclass  survived  ...   body                home.dest
## 9          1         0  ...   22.0      Montevideo, Uruguay
## 14         1         1  ...    NaN            Hessle, Yorks
## 61         1         1  ...    NaN  Little Onn Hall, Staffs
## 135        1         0  ...    NaN             New York, NY
## 727        3         0  ...  171.0                      NaN
## 1235       3         0  ...    NaN                      NaN
## 
## [6 rows x 14 columns]
Here we find the destination of the first passenger:
first_dest = titanic["home.dest"][0]
The destination of the first passenger is St Louis, MO.