To use Python in R Markdown, we need to load the {reticulate} R package
in a code chunk; here we do it in the setup code chunk. We also load the
{ggplot2} R package so that we can plot some data from a Python pandas
data frame later on!
Just like R, you can use Python in .Rmd files! Here we import pandas,
the Python package that we need to read our data into Python.
pandas is a Python package that adds data reading, wrangling, and
simple data visualization functionality to Python. It holds a similar
place in Python to the one the {tidyverse} R meta-package holds in R
(however, pandas is not a meta-package, just a very large package). If
you want to learn more about pandas, see “10 minutes to pandas” in the
docs or check out this free interactive course:
https://prog-learn.mds.ubc.ca/en.
import pandas as pd
Let’s load the titanic data (which lives in the data
directory of this
project) and view the data:
titanic = pd.read_csv("data/titanic.csv")
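The call above loads the file but does not print anything by itself. A minimal sketch of how one might inspect a freshly loaded data frame follows. It uses a small made-up CSV string in place of data/titanic.csv, so the rows and the names `sample_csv` and `sample` are illustrative assumptions, not the project's data:

```python
import io

import pandas as pd

# Hypothetical stand-in for data/titanic.csv: three made-up rows,
# with empty fields in the last row to show how missing values load.
sample_csv = io.StringIO(
    'age,fare,home.dest\n'
    '29.0,211.3375,"St Louis, MO"\n'
    '40.0,20.0,"Hypothetical Town"\n'
    ',14.4542,\n'
)
sample = pd.read_csv(sample_csv)

print(sample.head())   # first rows of the data frame
print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # column types inferred by read_csv
```

Empty fields are read as NaN, which is why missing ages and destinations show up as NaN in the outputs further down.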
First, let’s use some of pandas’ quick and simple plotting functions:
titanic.plot.scatter(x='age', y='fare', alpha=0.3)
Want to learn more about plotting in Python using pandas? See this page
to get started:
https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html
We can also access the Python environment in this R Markdown document
from R! This allows us to apply R’s functions to our Python objects! We
can use the py$object syntax to do this and create a data visualization
using {ggplot2}!
ggplot2::ggplot(py$titanic, aes(x = age, y = fare)) +
geom_point()
Here we subset the age and fare columns:
titanic[["age", "fare"]]
## age fare
## 0 29.0000 211.3375
## 1 0.9167 151.5500
## 2 2.0000 151.5500
## 3 30.0000 151.5500
## 4 25.0000 151.5500
## ... ... ...
## 1304 14.5000 14.4542
## 1305 NaN 14.4542
## 1306 26.5000 7.2250
## 1307 27.0000 7.2250
## 1308 29.0000 7.8750
##
## [1309 rows x 2 columns]
Here we filter for rows containing people over 70:
titanic[titanic["age"] > 70]
## pclass survived ... body home.dest
## 9 1 0 ... 22.0 Montevideo, Uruguay
## 14 1 1 ... NaN Hessle, Yorks
## 61 1 1 ... NaN Little Onn Hall, Staffs
## 135 1 0 ... NaN New York, NY
## 727 3 0 ... 171.0 NaN
## 1235 3 0 ... NaN NaN
##
## [6 rows x 14 columns]
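The expression inside the brackets is a boolean Series with one entry per row, and pandas keeps only the rows where it is True. A rough sketch of that mechanism on a toy data frame follows; the values and the names `toy` and `over_70` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy data frame with one missing age (illustrative values only).
toy = pd.DataFrame({
    "age": [29.0, 71.0, np.nan, 80.0],
    "fare": [211.34, 49.50, 14.45, 30.00],
})

# Comparing a column to a number gives a boolean Series;
# a missing age (NaN) compares as False, so that row is dropped.
over_70 = toy["age"] > 70
print(over_70.tolist())

# .loc can filter rows and select columns in a single step.
print(toy.loc[over_70, ["age", "fare"]])
```

Note that the filtered result keeps the original row labels, which is why the index in the output above jumps from 9 to 14 to 61 and so on.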
Here we find the destination of the first passenger:
first_dest = titanic["home.dest"][0]
The destination of the first passenger is St Louis, MO.
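Indexing with `[0]` looks the value up by row label, which matches the first row here only because the data frame has the default 0, 1, 2, ... index. A small sketch of the more explicit alternatives, `.loc` (by label) and `.iloc` (by position), on a made-up data frame (all names and values other than "St Louis, MO" are hypothetical):

```python
import pandas as pd

# Toy data frame; destinations other than the first are invented.
toy = pd.DataFrame({
    "age": [29.0, 71.0, 80.0],
    "home.dest": ["St Louis, MO", "Hypothetical Town A", "Hypothetical Town B"],
})

# Label-based and position-based access agree on a default index.
first_by_label = toy.loc[0, "home.dest"]
first_by_position = toy["home.dest"].iloc[0]

# After filtering, the row labels are 1 and 2, so label 0 no longer
# exists; .iloc[0] still returns the first remaining row.
older = toy[toy["age"] > 70]
first_older = older["home.dest"].iloc[0]

print(first_by_label, "|", first_by_position, "|", first_older)
```

Using `.iloc[0]` is the safer choice whenever "first" means first in the current row order rather than the row labelled 0.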