Generalized model for mapping bicycle ridership with crowdsourced data

Trisalyn Nelson; Avipsa Roy; Colin Ferster; Jaimy Fischer; Vanessa Brum-Bastos; Karen Laberee; Hanchen Yu; Meghan Winters

doi:10.1016/j.trc.2021.102981

Research output: Contribution to journal › Article › peer-review

27 Scopus citations

Abstract

Fitness apps, such as Strava, are a growing source of data for mapping bicycling ridership, due to large samples and high resolution. To overcome bias introduced by data generated from only fitness app users, researchers build statistical models that predict total bicycling by integrating Strava data with official counts and geographic data. However, studies conducted on single cities provide limited insight on best practices for modeling bicycling with Strava as generalizability is difficult to assess. Our goal is to develop a generalized approach to modeling bicycling ridership using Strava data. In doing so we enable detailed mapping that is more inclusive of all bicyclists and will support more equitable decision-making across cities. We used Strava data, official counts, and geographic data to model Average Annual Daily Bicycling (AADB) in five cities: Boulder, Ottawa, Phoenix, San Francisco, and Victoria. Using a machine learning approach, LASSO, we identify variables important for predicting ridership in all cities, and independently in each city. Using the LASSO-selected variables as predictors in Poisson regression, we built generalized and city-specific models and compared accuracy. Our results indicate generalized prediction of bicycling ridership on a road segment in concert with Strava data should include the following variables: number of Strava riders, percentage of Strava trips categorized as commuting, bicycling safety, and income. Inclusion of city-specific variables increased model performance, as the R² for generalized and city-specific models ranged from 0.08–0.80 and 0.68–0.92, respectively. However, model accuracy was influenced most by the official count data used for model training. For best results, official count data should capture diverse street conditions, including low ridership areas. Counts collected continuously over a long time period, rather than at peak periods, may also improve modeling. 
Modeling bicycling from Strava and geographic data enables mapping of bicycling ridership that is more inclusive of all bicyclists and better able to support decision-making.
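
The two-step workflow described in the abstract (LASSO to select variables, then Poisson regression on the selected variables) can be sketched roughly as follows. This is a minimal illustration and not the authors' code. The data are synthetic, and the predictor names, effect sizes, and penalty value `lam` are hypothetical. Fitting the LASSO to log-transformed counts is also an assumption made here for simplicity. The coordinate-descent LASSO and Newton-step Poisson fit are written in plain Python so that the sketch runs without third-party packages.

```python
import math
import random

# Synthetic stand-in data (NOT the study's data): two informative predictors
# and two pure-noise predictors, all drawn as standard normal values.
NAMES = ["strava_riders", "commute_share", "noise_a", "noise_b"]
rng = random.Random(0)

def poisson_draw(mu):
    """Draw one Poisson(mu) value with Knuth's multiplication method."""
    limit, k, p = math.exp(-mu), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

X = [[rng.gauss(0, 1) for _ in NAMES] for _ in range(500)]
# Hypothetical log-linear effects: 0.6 and 0.4 on the first two predictors.
counts = [poisson_draw(math.exp(2.5 + 0.6 * r[0] + 0.4 * r[1])) for r in X]

def lasso(X, y, lam, sweeps=100):
    """Coordinate-descent LASSO on standardized columns and centered y."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        cols.append([(v - mean) / sd for v in col])
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = sum(c * r for c, r in zip(cols[j], resid)) / n + beta[j]
            new = math.copysign(max(abs(rho) - lam, 0.0), rho)  # soft threshold
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * c for r, c in zip(resid, cols[j])]
                beta[j] = new
    return beta

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(m):
        piv = max(range(i, m), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, m):
            f = M[r][i] / M[i][i]
            for c in range(i, m + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * m
    for i in reversed(range(m)):
        x[i] = (M[i][m] - sum(M[i][c] * x[c] for c in range(i + 1, m))) / M[i][i]
    return x

def poisson_fit(X, y, iters=25):
    """Poisson regression (log link) with intercept, fitted by Newton steps."""
    Z = [[1.0] + row for row in X]
    k = len(Z[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)
    for _ in range(iters):
        mu = [math.exp(sum(b * z for b, z in zip(beta, row))) for row in Z]
        grad = [sum(row[a] * (yi - mi) for row, yi, mi in zip(Z, y, mu))
                for a in range(k)]
        hess = [[sum(mi * row[a] * row[b] for row, mi in zip(Z, mu))
                 for b in range(k)] for a in range(k)]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

# Step 1: LASSO on log(count + 1) picks the predictors with nonzero weight.
lasso_beta = lasso(X, [math.log1p(c) for c in counts], lam=0.15)
selected_idx = [j for j, b in enumerate(lasso_beta) if b != 0.0]
selected = [NAMES[j] for j in selected_idx]

# Step 2: Poisson regression of the raw counts on the selected predictors.
X_sel = [[row[j] for j in selected_idx] for row in X]
coefs = dict(zip(["intercept"] + selected, poisson_fit(X_sel, counts)))
print(selected)
print(coefs)
```

In this sketch a predicted count for a road segment would be `exp(intercept + sum of coefficient * predictor)`. A city-specific variant would repeat both steps on one city's segments only.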

Original language	English (US)
Article number	102981
Journal	Transportation Research Part C: Emerging Technologies
Volume	125
DOIs	https://doi.org/10.1016/j.trc.2021.102981
State	Published - Apr 2021
Externally published	Yes

Keywords

Bias-correction
Bicycling ridership
Big data
Exposure
LASSO
Strava

ASJC Scopus subject areas

Transportation
Automotive Engineering
Civil and Structural Engineering
Management Science and Operations Research

Cite this

@article{7e43b952dcbb409bbb5d5c6f49f5e408,

title = "Generalized model for mapping bicycle ridership with crowdsourced data",

abstract = "Fitness apps, such as Strava, are a growing source of data for mapping bicycling ridership, due to large samples and high resolution. To overcome bias introduced by data generated from only fitness app users, researchers build statistical models that predict total bicycling by integrating Strava data with official counts and geographic data. However, studies conducted on single cities provide limited insight on best practices for modeling bicycling with Strava as generalizability is difficult to assess. Our goal is to develop a generalized approach to modeling bicycling ridership using Strava data. In doing so we enable detailed mapping that is more inclusive of all bicyclists and will support more equitable decision-making across cities. We used Strava data, official counts, and geographic data to model Average Annual Daily Bicycling (AADB) in five cities: Boulder, Ottawa, Phoenix, San Francisco, and Victoria. Using a machine learning approach, LASSO, we identify variables important for predicting ridership in all cities, and independently in each city. Using the LASSO-selected variables as predictors in Poisson regression, we built generalized and city-specific models and compared accuracy. Our results indicate generalized prediction of bicycling ridership on a road segment in concert with Strava data should include the following variables: number of Strava riders, percentage of Strava trips categorized as commuting, bicycling safety, and income. Inclusion of city-specific variables increased model performance, as the R$^{2}$ for generalized and city-specific models ranged from 0.08–0.80 and 0.68–0.92, respectively. However, model accuracy was influenced most by the official count data used for model training. For best results, official count data should capture diverse street conditions, including low ridership areas. Counts collected continuously over a long time period, rather than at peak periods, may also improve modeling.
Modeling bicycling from Strava and geographic data enables mapping of bicycling ridership that is more inclusive of all bicyclists and better able to support decision-making.",

keywords = "Bias-correction, Bicycling ridership, Big data, Exposure, LASSO, Strava",

author = "Trisalyn Nelson and Avipsa Roy and Colin Ferster and Jaimy Fischer and Vanessa Brum-Bastos and Karen Laberee and Hanchen Yu and Meghan Winters",

note = "Publisher Copyright: {\textcopyright} 2021 The Author(s)",

year = "2021",

month = apr,

doi = "10.1016/j.trc.2021.102981",

language = "English (US)",

volume = "125",

journal = "Transportation Research Part C: Emerging Technologies",

issn = "0968-090X",

publisher = "Elsevier Limited",

}

TY - JOUR

T1 - Generalized model for mapping bicycle ridership with crowdsourced data

AU - Nelson, Trisalyn

AU - Roy, Avipsa

AU - Ferster, Colin

AU - Fischer, Jaimy

AU - Brum-Bastos, Vanessa

AU - Laberee, Karen

AU - Yu, Hanchen

AU - Winters, Meghan

PY - 2021/4

Y1 - 2021/4

N2 - Fitness apps, such as Strava, are a growing source of data for mapping bicycling ridership, due to large samples and high resolution. To overcome bias introduced by data generated from only fitness app users, researchers build statistical models that predict total bicycling by integrating Strava data with official counts and geographic data. However, studies conducted on single cities provide limited insight on best practices for modeling bicycling with Strava as generalizability is difficult to assess. Our goal is to develop a generalized approach to modeling bicycling ridership using Strava data. In doing so we enable detailed mapping that is more inclusive of all bicyclists and will support more equitable decision-making across cities. We used Strava data, official counts, and geographic data to model Average Annual Daily Bicycling (AADB) in five cities: Boulder, Ottawa, Phoenix, San Francisco, and Victoria. Using a machine learning approach, LASSO, we identify variables important for predicting ridership in all cities, and independently in each city. Using the LASSO-selected variables as predictors in Poisson regression, we built generalized and city-specific models and compared accuracy. Our results indicate generalized prediction of bicycling ridership on a road segment in concert with Strava data should include the following variables: number of Strava riders, percentage of Strava trips categorized as commuting, bicycling safety, and income. Inclusion of city-specific variables increased model performance, as the R² for generalized and city-specific models ranged from 0.08–0.80 and 0.68–0.92, respectively. However, model accuracy was influenced most by the official count data used for model training. For best results, official count data should capture diverse street conditions, including low ridership areas. Counts collected continuously over a long time period, rather than at peak periods, may also improve modeling.
Modeling bicycling from Strava and geographic data enables mapping of bicycling ridership that is more inclusive of all bicyclists and better able to support decision-making.

KW - Bias-correction

KW - Bicycling ridership

KW - Big data

KW - Exposure

KW - LASSO

KW - Strava

UR - http://www.scopus.com/inward/record.url?scp=85101619467&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85101619467&partnerID=8YFLogxK

U2 - 10.1016/j.trc.2021.102981

DO - 10.1016/j.trc.2021.102981

M3 - Article

AN - SCOPUS:85101619467

SN - 0968-090X

VL - 125

JO - Transportation Research Part C: Emerging Technologies

JF - Transportation Research Part C: Emerging Technologies

M1 - 102981

ER -
