Wolfgang McCormack on 28 May 2021

Answered: Image Analyst on 29 May 2021

Hi all,

I have a scatter plot with some dots on it. Is there any option to get the X and Y of those points from the scatter plot? Furthermore, is there any option to run polyfit on those points directly on the scatter plot?

Thanks

Cris LaPierre on 28 May 2021

How did you create the scatter plot?

Walter Roberson on 28 May 2021

Do you still have the data that was used to create the scatter plot?

If not, is the scatter plot itself still available on the display, so that the data could be extracted from it?

Or is what you have an image of a scatter plot?

dpb on 28 May 2021

Presuming you still have the figure, recent releases (I don't know how recent is required) let you do this interactively from the figure's Tools menu (Tools > Basic Fitting).

As Walter says, if you have either the original data or the figure, you can certainly do whatever you want; you may just have to retrieve the XData and YData from the figure in the latter case.
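A minimal sketch of retrieving the plotted values from an open figure, assuming the figure contains a single scatter object. The data and the variable names (`xIn`, `yIn`, `hScatter`) are made up for illustration and are not from the original post.

```matlab
% Create a stand-in scatter plot (illustrative data only).
xIn = [1 2 3 4 5 6];
yIn = [2.1 3.9 6.2 8.1 9.8 12.2];
fig = figure('Visible', 'off');
scatter(xIn, yIn, 'filled');

% Find the scatter object in the figure and read its data back.
hScatter = findobj(fig, 'Type', 'scatter');
xOut = get(hScatter, 'XData');
yOut = get(hScatter, 'YData');

% Fit a straight line to the recovered points.
p = polyfit(xOut, yOut, 1);
```

With a figure that is already open, `gcf` could be passed to `findobj` in place of `fig`. If several scatter objects are present, `findobj` returns an array of handles and each one would need to be read separately.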

Wolfgang McCormack on 29 May 2021

@Walter Roberson @dpb @Cris LaPierre, I have both the initial data and the figure itself. But DataA and DataB are not simple ordinal X and Y; rather, they only gain meaning in a scatter plot. There are more than 10,000 data entries, but the scatter plot created from them shows just 6 points. Now I want a fit through these 6 points and their locations. Also, any idea how to compute in code the statistics the toolbox reports, such as R^2 and RMSE? Right now, I copy them from the toolbox into a text annotation by hand, but that would be a pain for the rest of my plots.

dpb on 29 May 2021

Even if the data are redundant samples of some small number of values, the statistics of the fit of the data are based on the weighted errors; you could either fit the raw data or use the counts to classify the points and fit the unique values.

Depending on the distribution of the points relative to the locations, fitting just the N unique values as one point each may return a quite different result if the number of points differs grossly between these values.

In the Statistics TB, regstats will return all the diagnostic statistics you could want...
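The two options described above (fit the raw data, or count the repeats and fit the unique values with those counts as weights) can be sketched as follows. The data are invented for illustration, and the weighted fit is written out by hand with the backslash operator because `polyfit` itself takes no weights.

```matlab
% Illustrative quantized data: three distinct (x, y) pairs, unequal counts.
x = [ones(3000,1)*1; ones(2000,1)*2; ones(5,1)*3];
y = [ones(3000,1)*4.36; ones(2000,1)*6.54; ones(5,1)*2.00];

% Option 1: fit the raw data directly; repeats weight the fit implicitly.
pRaw = polyfit(x, y, 1);

% Option 2: collapse to the unique (x, y) pairs and count occurrences.
[xy, ~, grp] = unique([x y], 'rows');
counts = accumarray(grp, 1);

% Weighted least squares on the unique points, using counts as weights.
V  = [xy(:,1), ones(size(xy,1),1)];   % design matrix for a straight line
sw = sqrt(counts);                    % scale each row by sqrt(weight)
pWeighted = ((V .* sw) \ (xy(:,2) .* sw)).';

% Unweighted fit of the unique points, for comparison.
pUnique = polyfit(xy(:,1), xy(:,2), 1);
```

Here `pRaw` and `pWeighted` agree to within rounding, while `pUnique` differs markedly because the third point, which occurs only 5 times, counts as much as the points that occur thousands of times.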

Star Strider on 29 May 2021

@Wolfgang McCormack — If there are 10,000 points represented by only 6 distinct points on the scatter plot, are they all the same values, or do they have a third dimension as well?

Example —

x = repmat(rand(6,1),1, 100);

y = repmat(rand(6,1),1, 100);

z = reshape(rand(600,1),6, 100);

figure

scatter(x, y, 'filled')

grid

xlabel('x')

ylabel('y')

title('2D Scatter Plot')

figure

scatter3(x(:), y(:), z(:), 'filled')

grid on

xlabel('x')

ylabel('y')

zlabel('z')

title('3D Scatter Plot')

The polyfit (and polyval) functions would likely not have problems with only ‘x’ and ‘y’, however if weighting with respect to the ‘z’ coordinates would be required, that could be more difficult. The Statistics and Machine Learning Toolbox functions (such as fitlm and fitnlm) would likely be more appropriate here, not only because they would provide the desired statistics, but also because they will allow weighting.
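A rough sketch of the `fitlm` route with weights, assuming the Statistics and Machine Learning Toolbox is installed. The unique points `xu`, `yu` and their `counts` are invented for illustration; in practice they would come from the real data.

```matlab
% Requires the Statistics and Machine Learning Toolbox.
% Illustrative unique points and how often each occurs (made-up values).
xu     = [1; 2; 3; 4; 5; 6];
yu     = [4.36; 6.54; 8.10; 9.95; 12.30; 13.80];
counts = [3000; 2000; 1500; 1500; 1200; 800];

% Weighted straight-line fit; counts act as observation weights.
mdl = fitlm(xu, yu, 'linear', 'Weights', counts);

coeffs = mdl.Coefficients.Estimate;   % [intercept; slope]
r2     = mdl.Rsquared.Ordinary;       % coefficient of determination
rmse   = mdl.RMSE;                    % root mean squared error of the model
```

Note that `fitlm` lists the intercept first, the opposite of `polyfit`'s descending-power order. The reported statistics are for a model fitted to six weighted rows, so it is worth comparing them against a fit to the raw data before quoting them.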

.

Wolfgang McCormack on 29 May 2021

@Star Strider Yes, there are 10k points, but the variable y changes between only 6 values. For example, there are 3000 values of 4.36 and 2000 of 6.54 and so on. The same thing happens in variable x too. Are they duplicates? They are not actually duplicates because the data is hourly. But in another sense, you can say they are the same in many hours. There is no third dimension in my case but thanks for pointing that out. It'll def help me in future. :D I guess I can write a book based on all my questions so far on MATLAB, naming it MATLAB for junior researchers :D Thank you all

Star Strider on 29 May 2021

@Wolfgang McCormack — I am doing my best to understand what the data are in the absence of the data themselves.

So they are actually something like this, then —

N = 25;

x = rand(1,N);

y = repmat(randn(6,1),1,N);

figure

scatter(x, y, 'filled')

grid

or this —

figure

scatter(x, y(randi(6,1,N)), 'filled')

grid

?

.

### Answers (2)

Cris LaPierre on 29 May 2021

Polyfit is not going to return the X and Y values of those 6 points. It's going to return the polynomial coefficients for the equation that best fits the data. You then supply that equation with whatever X values you want to obtain the corresponding Y values. Using those, you can plot the fit line.
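A short sketch of that workflow, using made-up data: `polyfit` returns the coefficients, `polyval` evaluates them at chosen X values, and the result is drawn over the scatter plot.

```matlab
% Illustrative data (made-up values).
x = [1 2 3 4 5 6];
y = [2.2 4.1 5.9 8.3 9.9 12.1];

% polyfit returns coefficients in descending powers: p(1)*x + p(2).
p = polyfit(x, y, 1);

% Evaluate the fitted line at whatever x values you choose.
xFit = linspace(min(x), max(x), 50);
yFit = polyval(p, xFit);

% Overlay the fit line on the scatter plot.
figure
scatter(x, y, 'filled')
hold on
plot(xFit, yFit, '-')
hold off
grid on
```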

If you need the X and Y of the 6 groups, I'd suggest using something like kmeans clustering to identify 6 clusters and return their centroids first.

[idx,C] = kmeans(___)

Use the centroids as inputs to polyfit.

Calculating R^2 and RMSE is fairly simple. You just need to do some math to calculate SSR and SST. See this answer.
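One way the R^2 and RMSE arithmetic could look, sketched with made-up data and a straight-line fit. The RMSE here divides by the number of points; toolbox functions such as `fitlm` divide by the error degrees of freedom instead, so their value will be slightly larger.

```matlab
% Illustrative data (made-up values) and a straight-line fit.
x = [1 2 3 4 5 6];
y = [2.2 4.1 5.9 8.3 9.9 12.1];
p = polyfit(x, y, 1);
yHat = polyval(p, x);

% Sums of squares.
SSE = sum((y - yHat).^2);       % residual (error) sum of squares
SST = sum((y - mean(y)).^2);    % total sum of squares
SSR = SST - SSE;                % regression sum of squares (least-squares fit with intercept)

% Goodness-of-fit measures.
R2   = SSR / SST;               % equivalently 1 - SSE/SST
RMSE = sqrt(SSE / numel(y));    % divides by n, not by degrees of freedom
```

The values can then be written onto the plot with, for example, `text` and `sprintf`, which avoids retyping them for each figure.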

Wolfgang McCormack on 29 May 2021

@Cris LaPierre Thank you Cris, this will def help me in future. However, in my current case, each variable takes only 6 distinct values; for example, there are 3000 values of 3.46 in X and 5000 of 4.6 and so on.

Cris LaPierre on 29 May 2021

Share your data. It will be easier than trying to guess what is going on. Save your variables to a mat file and attach them to your post using the paperclip icon.

Image Analyst on 29 May 2021

Cris suggests a nice trick. And attach your data like he says in his comment above.

save('answers.mat', 'DataA', 'DataB');

Use the paperclip icon.

Another trick I've used when you have quantized data (multiple points with the same x value) is to add a very slight amount of noise to the x data. Add enough noise to make them unique and avoid the error polyfit throws, but not enough to change the formula it will find:

% Determine range of data.

minx = min(x)

maxx = max(x)

% Add a fraction of a percent of random noise to x to make the values unique.

xNoisy = x + 0.00001 * (maxx - minx) * rand(size(x));

% Determine the formula with the noisy x instead of the actual x.

% Below we will use a second order polynomial.

coefficients = polyfit(xNoisy, y, 2); % Fit a quadratic.

% Get estimated y from arbitrary x (thisX). polyfit returns coefficients in descending powers.

estimatedY = coefficients(1) * thisX .^2 + coefficients(2) * thisX + coefficients(3);

Note that this will give a different formula than Cris's because it takes into account how many points are in each cluster, so clusters with more points influence the line more, while Cris's uses the centroids of the clusters, which ignores how many points are in each cluster. If you have about the same number of points in each cluster, it won't make much of a difference, but if some clusters have wildly different numbers of points than others, it could make a noticeable difference.
