Nov 29th – Project Updates

In my latest project, I wanted to develop a predictive model for property valuations. After separating the dataset into features and the target variable, I preprocessed the data: missing values in numerical columns were imputed with the median, and missing values in categorical columns were imputed with the most frequent value before one-hot encoding. This preparation ensured that each property’s attributes were ready to inform the predictive process.

Choosing a Decision Tree Regressor for its straightforward approach to learning, I created a pipeline that included both preprocessing and modeling stages. The training phase was an exercise in pattern recognition, with the model analyzing 80% of the data to understand the intricacies of real estate valuation. The decision tree’s method of breaking down data into a series of binary decisions made it an excellent tool for navigating the complex relationships between property features and their market values.
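The preprocessing and modeling steps described above could be wired together roughly as follows. This is only a sketch on synthetic data: the column names (LIVING_AREA, BED_RMS, LU), the generated values, and the random seeds are assumptions for illustration, not the actual dataset or the exact code used.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data; names and values are hypothetical.
rng = np.random.default_rng(0)
n = 500
land_use = rng.choice(["R1", "R2", "CD"], size=n).astype(object)
living_area = rng.uniform(500, 4000, size=n)
bedrooms = rng.integers(1, 6, size=n).astype(float)
total_value = 200 * living_area + 15000 * bedrooms + rng.normal(0, 20000, size=n)

# Blank out a few entries so the imputers have something to fill.
living_area[rng.choice(n, 25, replace=False)] = np.nan
land_use[rng.choice(n, 25, replace=False)] = np.nan

X = pd.DataFrame({"LIVING_AREA": living_area, "BED_RMS": bedrooms, "LU": land_use})
y = pd.Series(total_value)

numeric_cols = ["LIVING_AREA", "BED_RMS"]
categorical_cols = ["LU"]

# Median imputation for numbers; most-frequent imputation plus one-hot for categories.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric_cols),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("tree", DecisionTreeRegressor(random_state=0)),
])

# Train on 80% of the rows and score on the remaining 20%.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

mse = mean_squared_error(y_test, pred)
r2 = r2_score(y_test, pred)
print(f"MSE: {mse:.0f}, R-squared: {r2:.3f}")
```

Keeping the imputers and encoder inside the pipeline means they are fitted on the training rows only, so no information from the evaluation rows leaks into preprocessing.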

The model’s performance was evaluated using mean squared error and R-squared, yielding an R-squared value of 0.976. This means the model explains about 97.6% of the variance in property values, which points to highly accurate predictions on this dataset. Precision at this level could make the model a useful tool for investors and policymakers in the real estate market when assessing future valuations.
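For reference, both metrics can be computed by hand. The sketch below uses four made-up values (in thousands of dollars) purely to show the arithmetic; they are not taken from the project.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted values.
actual = [300.0, 450.0, 500.0, 650.0]
predicted = [310.0, 440.0, 520.0, 640.0]

mse = mean_squared_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(mse, round(r2, 4))  # 175.0 0.9888
```

An R-squared of 0.976 therefore says that the squared prediction errors add up to about 2.4% of the total squared variation of the values around their mean.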

Nov 27th – Project Updates

My analysis of the dataset has led me to a striking conclusion: the size of a property, encompassing both its gross and living areas, strongly influences its value. This insight came from observing correlations above 0.83 between these size metrics and land value. Moreover, the number of rooms and bedrooms also shows a moderate positive relationship with land value, reinforcing the idea that more spacious properties fetch higher land valuations. This relationship holds across the board, from land value to building value and, ultimately, the total value of a property.
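The correlations mentioned here are Pearson coefficients. A minimal sketch of the calculation follows, using five invented (living area, land value) pairs rather than rows from the assessment data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical values: living area in square feet, land value in thousands.
living_area = [1000, 1500, 2000, 2500, 3000]
land_value = [200, 260, 330, 380, 470]

r = pearson_r(living_area, land_value)
print(round(r, 3))
```

A value near 1 means the two columns rise together almost linearly, a value near 0 means no linear relationship, and a value near -1 means one falls as the other rises.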

While size stands out as a crucial determinant of value, the age and renovation history of a building tell a subtler story. My observations reveal that these factors are not key players in determining land value. However, when it comes to building value, renovations (YR_REMODEL) seem to carry slightly more weight than the year of construction (YR_BUILT), hinting that modern updates can indeed enhance a building’s worth, albeit modestly.

Extending these insights into the total value of properties, I’ve noticed that the patterns observed for land and building values echo here as well. The total value is most responsive to the tangible, functional aspects of a property—its size, structure, and capacity. This suggests a consistent market trend: buyers and assessors alike prioritize the present capabilities and amenities of a property over its historical narrative or cosmetic improvements.

Through this data-driven journey, I’ve learned that in the realm of property valuation, the physical attributes of a property—its expansiveness and utility—reign supreme. The age and renovation history, while they do play roles, are secondary in the grand scheme of property valuation. This analysis not only informs potential investors and homeowners but also enriches my understanding of the real estate market’s valuation principles.

Nov 24th – Project Updates

Diving into the dataset, I have charted a course through the construction and remodeling history of our urban landscape. The data stretches back to the 1700s, revealing an enduring legacy of architectural history with a mean construction year in the 1920s, a time when the city was burgeoning with new structures. The standard deviation tells a story of diversity: a 42-year spread indicates a wide range of property ages, painting a picture of a city that grew in bursts and waves rather than in a steady stream.

The histogram I’ve examined shows a city that reached its zenith of construction in the 1900s, a testament to a bygone era of rapid expansion and the industrial boom. Over 50,600 buildings from that period still stand today, marking it as the most prolific in the city’s history. Fast forward to the 1980s, and the data reveals a peak in renovations, with over 104,000 properties rejuvenated, suggesting a decade of renewal and transformation, echoing a renewed spirit of urban revitalization.

Interestingly, the skewness of the construction years leans towards older properties, hinting at a city that values and preserves its past. Yet, the kurtosis presents a flat distribution, suggesting few anomalies in this historical narrative. When it comes to renovations, the data shows a less skewed distribution, revealing a consistent effort to update and modernize. However, the weak correlation between construction or remodel years and total property value challenges common perceptions, indicating that a property’s age or its updates do not necessarily equate to its market worth—a fascinating observation that suggests the city’s real estate value is dictated by more than just its history or its facelifts.
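Skewness and kurtosis can both be derived from central moments. The sketch below uses the simple population (moment-based) definitions, which may differ slightly from the bias-corrected versions some libraries report; the list of years is invented for illustration.

```python
def central_moment(values, k):
    """k-th central moment: mean of (value - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Third central moment scaled by the variance to the power 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth central moment scaled by variance squared, minus 3 (normal = 0)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical construction years with one much older building.
years = [1890, 1900, 1905, 1910, 1920, 1925, 1930, 1780]

print(round(skewness(years), 3), round(excess_kurtosis(years), 3))
```

A long tail of older years pulls the skewness negative, while a negative excess kurtosis indicates a flatter distribution with lighter tails than a normal curve.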

Nov 22nd – Project updates – Correlations in Property Assessments

In my analysis of the “FY2023 Property Assessment Data,” I’ve uncovered intriguing correlations that shed light on the complex dynamics of property valuation. The data revealed a robust positive correlation between the gross area, living area, and the value metrics, including land and building values. This compelling link underscores a fundamental principle in real estate: larger properties tend to command higher values, reflecting the premium placed on space.

Another interesting observation is the positive correlation between the number of residential and commercial units and the total property value. This suggests that multi-unit properties, which offer more living or business space, naturally carry a higher valuation. Moreover, the strong correlation between gross tax and property values reaffirms the direct impact of valuation on tax obligations, highlighting the fiscal implications of property assessments.

Curiously, the year a building was constructed or remodeled shows little to no correlation with its size or value, pointing to the nuanced ways that age and modern updates intersect with market worth. Meanwhile, features like the number of parking spaces and fireplaces appear to have a varied influence on value, with parking showing a moderate link to property size, while fireplaces bear a negligible relationship, perhaps challenging conventional wisdom about the value these features add to a property.

Nov 20th – Geographic Distribution of Property Values

In my recent analysis of property values across different neighborhoods, I’ve discovered a vibrant patchwork of economic climates that define our city’s landscape. The chart I’ve crafted here is a horizontal bar chart that lays bare the distribution of property counts across various parts of the city. East Boston, the lengthiest bar on the chart, suggests a bustling hub of real estate activity, possibly due to a combination of historical significance and contemporary development. Meanwhile, areas like Newton and Brookline, with shorter bars, might indicate exclusivity and higher property values despite a smaller count of properties.

This chart is more than a collection of colored bars; it’s a series of narratives about each neighborhood’s character and economic status. For instance, the substantial number of properties in Dorchester and Jamaica Plain could reflect a density of residential zones or a surge in property development. In contrast, the modest bars representing Chestnut Hill and Dedham may speak to a quieter real estate landscape, potentially one with larger properties and more green spaces.

Nov 17th – Exploring Property Construction Years

In my recent analysis of the “FY2023 Property Assessment Data,” I was particularly captivated by the ‘Year Built’ aspect of the dataset. It felt like unearthing a historical timeline, where each property’s construction year tells a story of the past. The accompanying chart, a histogram of these years, visually narrates the peaks of construction activity, reflecting the region’s economic and social transformations over time.

The chart reveals a fascinating blend of historic and modern structures. It’s intriguing to see how different periods have left their architectural signatures, from historic buildings that echo the past to contemporary ones symbolizing modernity and progress. This mix not only highlights the region’s architectural evolution but also its adaptability to changing times and needs.

This exploration was more than an analysis; it was a journey through the architectural history of our region. It underscored the importance of balancing the preservation of historical buildings with the embrace of modern development. As I delved into these construction years, I gained a deeper appreciation for the stories embedded in our built environment, each structure a chapter in the ongoing narrative of our community’s growth.

Nov 15th – Project Boston Property Insight

In my initial exploration of the “FY2023 Property Assessment Data,” I was struck by the depth and breadth of information captured in this dataset. It encompasses a vast array of 180,627 properties, each detailed across 34 diverse attributes. These range from basic identifiers like street name, city, and zip code, to more intricate details such as land use classifications, building values, and the physical characteristics of the properties.

As I delved into the dataset, I noticed that it paints a rich tapestry of the area’s property landscape. The variety in attributes like land and building values particularly caught my attention, revealing the economic diversity of the region’s real estate. Furthermore, the dataset provides a window into the architectural history and development trends of the area, as evidenced by data points like the year properties were built or remodeled, and their physical conditions.

From a statistical standpoint, the range and spread of the data are remarkable. The properties’ construction years span from the early 18th century to the present, illustrating a fascinating mix of historical and contemporary architecture. The economic aspects, such as land and building values, show significant variation, underscoring the varied economic strata within the region. This initial analysis of the dataset has been enlightening, offering a comprehensive overview of the housing market and property dynamics. It lays the groundwork for more detailed investigations, which I anticipate will yield further insights into specific trends and patterns in property assessments.

Nov 10th – The Significance of RSS in Decision Trees: A Comprehensive Overview

In my exploration of decision trees within the realm of predictive modeling, a pivotal concept that has significantly enriched my understanding is the Residual Sum of Squares (RSS). This unassuming yet powerful metric serves as the linchpin in the decision tree algorithm, contributing substantially to the precision and efficacy of predictive modeling.

In essence, RSS functions as a guiding principle for decision trees, particularly when choosing optimal splits. RSS is the sum of squared differences between predicted values and actual outcomes, and the algorithm aims to minimize it. As the algorithm works through the dataset, it evaluates candidate feature splits at each node and selects the one that yields the smallest RSS.
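To make the split search concrete, here is a minimal sketch for a single numeric feature. It assumes each side of a split predicts the mean of its target values, which is the usual choice for regression trees; the function names and toy data are illustrative only.

```python
def rss(values):
    """Residual sum of squares around the mean of the values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_rss) for the split with the smallest RSS."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_rss = None, float("inf")
    for i in range(1, len(pairs)):
        # No threshold can separate two identical feature values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = rss(left) + rss(right)
        if total < best_rss:
            best_threshold, best_rss = threshold, total
    return best_threshold, best_rss


# Toy data with an obvious jump between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [5, 6, 7, 20, 21, 22]

threshold, total_rss = best_split(xs, ys)
print(threshold, total_rss)  # 6.5 4.0
```

A full tree repeats this search over every feature at every node and then recurses into the two resulting subsets.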

The role of RSS extends beyond the initial training phase, manifesting in the crucial process of pruning to prevent overfitting. Pruning, guided by RSS, strategically trims branches of the tree that contribute minimally to reducing the overall RSS. This delicate balance between complexity and accuracy ensures the decision tree’s capacity to generalize effectively to new and unseen data, cementing RSS as an integral component in the journey from model creation to refinement. In conclusion, my exploration of RSS in decision trees has underscored its significance as a decision-making criterion and a key contributor to the model’s predictive prowess.
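One common way to formalize this trade-off is cost-complexity pruning, which scores a tree as its RSS plus a penalty alpha for every leaf. The sketch below assumes that criterion; the text above does not say which pruning method was studied, and the function name and numbers are hypothetical.

```python
def should_prune(rss_as_leaf, rss_of_subtree, subtree_leaves, alpha):
    """Decide whether collapsing a subtree into a single leaf is worthwhile.

    Cost of keeping the subtree: its RSS plus alpha per leaf.
    Cost of collapsing it: the RSS of one leaf plus alpha for that leaf.
    """
    cost_collapsed = rss_as_leaf + alpha * 1
    cost_kept = rss_of_subtree + alpha * subtree_leaves
    return cost_collapsed <= cost_kept


# A split that barely lowers RSS (100 -> 98) is not worth an extra leaf.
print(should_prune(rss_as_leaf=100.0, rss_of_subtree=98.0, subtree_leaves=2, alpha=5.0))  # True

# A split that lowers RSS substantially (100 -> 40) is kept.
print(should_prune(rss_as_leaf=100.0, rss_of_subtree=40.0, subtree_leaves=2, alpha=5.0))  # False
```

A larger alpha prunes more aggressively and gives a smaller tree, while alpha = 0 keeps every split that lowers RSS at all.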

Nov 8th – Project Updates

Our latest analysis provides insights into the proportions of individuals shot relative to their estimated population percentage in the U.S. This comparison reveals notable disparities across different racial categories:

  • Black (B): Black individuals are shot at a proportion approximately 1.70 times their representation in the U.S. population.
  • Native American (N): The proportion of Native American individuals shot is approximately 1.31 times their representation in the U.S. population.
  • Hispanic (H): Hispanic individuals are shot at a proportion approximately 0.77 times their representation in the U.S. population.
  • White (W): The proportion of White individuals shot is approximately 0.67 times their representation in the U.S. population.
  • Asian (A): Asian individuals are shot at a proportion approximately 0.29 times their representation in the U.S. population.
  • Other (O): Individuals from other racial categories are shot at a proportion approximately 0.09 times their representation in the U.S. population.

This analysis sheds light on the disparities in the likelihood of being shot across various racial groups. Our next steps involve exploring potential factors contributing to these disparities and further examining the broader implications of these findings.
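Each ratio above is a group's share of shootings divided by its share of the population. A minimal sketch of that calculation follows, using placeholder groups and invented counts rather than the actual data.

```python
def representation_ratios(shot_counts, population_share):
    """Share of shootings divided by share of population, per group."""
    total_shot = sum(shot_counts.values())
    return {
        group: (count / total_shot) / population_share[group]
        for group, count in shot_counts.items()
    }


# Hypothetical inputs: two placeholder groups, not real figures.
shot_counts = {"X": 30, "Y": 70}
population_share = {"X": 0.20, "Y": 0.80}

ratios = representation_ratios(shot_counts, population_share)
for group, ratio in ratios.items():
    print(group, round(ratio, 3))
```

A ratio above 1 means the group is over-represented among those shot relative to its population share, and a ratio below 1 means it is under-represented.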

Nov 3rd – Project Updates

To understand the magnitude of the difference alongside its statistical significance, we used the “Cohen’s d” effect size measure, which is commonly used in conjunction with t-tests. Cohen’s d quantifies the difference between two groups in terms of standard deviations.

Cohen’s d is calculated using the formula:

d = mean difference / pooled standard deviation

where the mean difference is the difference in means between the two groups, and the pooled standard deviation is the square root of the weighted average of the two groups’ variances, each weighted by its degrees of freedom.

Calculating this for the mean ages of the Black and White groups, we found a Cohen’s d of 0.57, which is a medium effect size by Cohen’s conventional thresholds (0.2 small, 0.5 medium, 0.8 large). In other words, the average age in the White group is higher than in the Black group.
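The calculation can be sketched in a few lines of Python. The two age lists are invented for illustration, and the interpretation helper applies Cohen's conventional cut-offs of 0.2, 0.5, and 0.8.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def interpret(d):
    """Label an effect size using Cohen's conventional thresholds."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Hypothetical ages for two groups.
ages_a = [40, 42, 44, 46, 48]
ages_b = [36, 38, 40, 42, 44]

d = cohens_d(ages_a, ages_b)
print(round(d, 3), interpret(d))
print(interpret(0.57))  # medium
```

The sign of d only reflects which group is listed first, so the label is based on its absolute value.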

Nov 1st – Project updates

Logistic regression serves as a statistical modeling technique utilized for examining the connection between a binary outcome variable and one or more predictor variables. In this analysis, I designated “manner_of_death” as the binary outcome variable, where the column specifies whether the death resulted from being “shot” or “shot and tasered.” Additional columns such as “armed,” “age,” “gender,” and “race” were considered as predictor variables to gauge the likelihood of a specific mode of death.

Subsequently, I explored and preprocessed the data, addressing missing values and encoding categorical variables. A logistic regression model was then fitted to the data, with the binary variable as the dependent variable and the other columns as independent variables, acting as predictors.
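To show what fitting a logistic regression involves, here is a bare-bones sketch that uses plain gradient descent on a toy one-feature dataset. The data, learning rate, and number of epochs are assumptions for illustration; in practice a library implementation would normally be used instead.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by gradient descent on the average log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for row, label in zip(X, y):
            p = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)
            error = p - label
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(row, weights, bias):
    """Predicted probability that the outcome is 1."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


# Toy data: a single predictor and a binary outcome.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic(X, y)
low = predict_proba([0.0], weights, bias)
high = predict_proba([5.0], weights, bias)
print(round(low, 3), round(high, 3))
```

Categorical predictors such as race or gender would first need to be encoded as numeric columns, for example with one-hot encoding, before being passed to a model like this.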

The model’s performance was then evaluated using accuracy, precision, recall, F1 score, and R2 score, all of which were satisfactory. R2 is not a standard metric for a classification model, so the first four metrics are the more informative ones here. This approach gave a clearer picture of the relationship between the chosen predictors and the binary outcome, and of the estimated risk for each mode of death.
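The four classification metrics all come from the counts of true and false positives and negatives. A minimal sketch follows, with made-up labels rather than the model's actual predictions.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical true and predicted labels.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

When one class is much rarer than the other, accuracy alone can look high even if the rare class is never predicted, which is why precision, recall, and F1 are worth reporting alongside it.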