CEE 5/7303

Public Data

The Space team has made the following datasets and collections publicly available. You must be a logged-in member of the Space to access all the datasets and collections.

Datasets

Final Project Memo (Mario Farag)

Capital Bikeshare Memorandum

The memorandum and appendix for my AI project.

Capital Bikeshare Dataset Used for Project

These are the datasets used for the project. Original source: https://www.kaggle.com/datasets/taweilo/capital-bikeshare-dataset-202005202408
A supervised dataset and a clustered dataset are included, along with my combined datasets.
Attributes:
- date (observation date)
- pickup_counts (number of bikes rented per day)
- dropoff_counts (number of bikes dropped off per day)
- tempmax (maximum daily temperature, °F)
- tempmin (minimum daily temperature, °F)
- humidity (average daily humidity, %)
- precip (total daily precipitation, inches)
- windspeed (average daily wind speed, mph)
- weekday (numeric day of the week, 1 = Monday through 7 = Sunday)
- month (numeric month, 1-12)
- holiday (1 = federal or major holiday, 0 = normal day)
- total_usage (total usage of a station)
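
The attribute list above implies a few range checks that can be run on a file before modeling. The following is a minimal sketch, not the author's code: it assumes the files are CSVs whose headers match the attribute names exactly and that weekday, month, and holiday are stored as integer text. The function names `check_row` and `check_csv` are hypothetical.

```python
import csv
import io

# Column names as listed in the attribute description (assumed to match the file headers).
EXPECTED_COLUMNS = [
    "date", "pickup_counts", "dropoff_counts", "tempmax", "tempmin",
    "humidity", "precip", "windspeed", "weekday", "month", "holiday",
    "total_usage",
]


def check_row(row):
    """Return a list of problems found in one row (empty list if none)."""
    missing = [c for c in EXPECTED_COLUMNS if c not in row]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    if not 1 <= int(row["weekday"]) <= 7:
        problems.append("weekday outside 1-7")
    if not 1 <= int(row["month"]) <= 12:
        problems.append("month outside 1-12")
    if int(row["holiday"]) not in (0, 1):
        problems.append("holiday is not 0 or 1")
    if float(row["tempmin"]) > float(row["tempmax"]):
        problems.append("tempmin exceeds tempmax")
    if not 0 <= float(row["humidity"]) <= 100:
        problems.append("humidity outside 0-100 %")
    if float(row["precip"]) < 0:
        problems.append("negative precipitation")
    return problems


def check_csv(text):
    """Map 1-based data row number to its problems, for rows that have any."""
    reader = csv.DictReader(io.StringIO(text))
    report = {}
    for number, row in enumerate(reader, start=1):
        problems = check_row(row)
        if problems:
            report[number] = problems
    return report
```

To check a downloaded file, pass its contents as text, for example `check_csv(open(path).read())`, where `path` is wherever the file was saved. An empty dictionary means no row violated the listed ranges.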

correlation_matrix_clean.csv

ZnSolventDataset.arff

ZnSolventDataset.csv

Citizen-Generated Incident Reports

https://www.dallasopendata.com/Archive/Dallas-Police-Public-Data-RMS-Incidents-with-GeoLo/4ea4-q4ui/about_data

Mobile/GPS Location Data

https://www.dallasopendata.com/dataset/Geolocation-2016/2byq-ux7x/about_data

High Crash Rate Intersections

https://www.dallasopendata.com/Public-Safety/High-Crash-Rate-Intersections-in-Dallas/cyd9-x7py/about_data

Collections

Reyes_Vector-Borne_Diseases_Risk

Datasets used to predict vector-borne disease risk based on weather patterns and weekly disease reports.

Ye Tian – Machine Learning Analysis of Organic Solvents

Edens-Predicting Capital Bike Ridership

This collection contains the datasets used for the project, my memo, and any appendices or extra data used.

Flury- Coffee Shop Data

Optimizing Campus Coffee Shop Operations: Reducing Waste and Improving Efficiency

Hunter – Construction Cost Estimation

Below are my datasets and their attributes (GPT-5 used for formatting):

Dataset Name: Construction Estimation Data

Source Link: https://www.kaggle.com/datasets/sasakitetsuya/construction-estimation-data

Description: A simulated dataset of 1,000 construction project cost estimates including cost components and pricing adjustments.

Attributes:

material_cost – Estimated material cost for each project (USD)

labor_cost – Estimated labor cost for each project (USD)

profit_rate – Contractor markup percentage (%)

discount_or_markup – Additional price adjustment applied (USD)

policy_reason – Text category describing the reason for markup/discount

total_estimate – Final estimated project cost (USD)

Instances: 1,000 projects

Units: USD & Percentage

Spatial Scope: National (synthetic)

Temporal Scope: Static cross-section

Purpose: Core supervised ML dataset used to train models predicting final construction cost based on cost inputs and pricing policy factors.
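
As a rough illustration of the supervised setup described in the Purpose line, the rows can be split into input features and the `total_estimate` target. This is a sketch under assumptions, not the author's pipeline: it assumes the rows are dictionaries keyed by the attribute names above, and the helper name `split_features_target` is hypothetical.

```python
# Input columns and target column, taken from the attribute list above.
NUMERIC_FEATURES = ["material_cost", "labor_cost", "profit_rate", "discount_or_markup"]
CATEGORICAL_FEATURES = ["policy_reason"]
TARGET_COLUMN = "total_estimate"


def split_features_target(rows):
    """Split row dictionaries into (features, targets) for supervised learning.

    Numeric inputs are converted to float; policy_reason is kept as text so a
    later step can encode it however the chosen model requires.
    """
    features, targets = [], []
    for row in rows:
        record = {name: float(row[name]) for name in NUMERIC_FEATURES}
        for name in CATEGORICAL_FEATURES:
            record[name] = str(row[name])
        features.append(record)
        targets.append(float(row[TARGET_COLUMN]))
    return features, targets
```

Rows read with `csv.DictReader` can be passed in directly, since every value arrives as text and is converted here.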

__

Dataset Name: U.S. Construction Spending Dataset

Source Link:
https://www.kaggle.com/datasets/shashwatwork/construction-spending-dataset/data

Description:
This dataset contains historical U.S. construction spending data across multiple market segments. It includes monthly spending totals for public, private, and total construction activity. The dataset provides insight into macro-level construction trends over time and serves as an external economic context dataset supporting project-level cost estimation.

Attributes:

Date – Month and year of reported spending

Total Construction Spending – Aggregate U.S. construction spending (in USD millions)

Private Construction Spending – Spending on private-sector construction projects (USD millions)

Public Construction Spending – Spending on publicly funded construction projects (USD millions)

(Column names may vary slightly depending on the file; adjust after loading.)

Instances:
Monthly observations across multiple years

Units:
U.S. Dollars (millions)

Spatial Scope:
United States, national-level data

Temporal Scope:
Monthly time series

Purpose:
To incorporate real-world construction market activity trends into the modeling process by adding a macro-economic indicator that reflects industry spending levels and demand cycles.

Justification:
The primary Kaggle construction cost dataset includes project-level variables such as material cost, labor cost, and profit factors, which directly influence individual project estimates. The Construction Spending dataset adds external industry context by providing monthly U.S. construction spending trends. Including this variable supports a more realistic modeling approach by aligning project-level cost estimates with broader construction market activity and economic conditions.

Farag_Leveraging Machine Learning to Reduce Traffic Accidents in Dallas: A Data-Driven Safety Approach

The following datasets have been published through this Space and any affiliated Spaces.

Statistics

Collections: 6
Datasets: 11
Files: 15
Bytes: 321.6 MB
Users: 8

External Links

No External Links

Access

PUBLIC